Demo projects
Interactive US power grid mix visualization from public data.
Client-only CSV cleaning: quality report, deterministic steps, export—no backend.
Phishing URL classifiers: precomputed metrics, precision vs recall, and plain-language model guidance—static JSON only.
NORMAL vs PNEUMONIA classification on curated pediatric chest X-rays: TensorFlow.js inference in the browser—bundled samples only, no uploads.
Offline demo: synthetic radiology reports with audience-specific summaries—static JSON only, no API.
Anonymized DICOM study converted to representative slices and analyzed in one Anthropic multimodal request, then rendered from static JSON.
Previous projects
Built a classification pipeline to predict employee attrition, identifying the key behavioral and structural drivers of retention risk. Designed so HR teams can intervene proactively before talent loss occurs.
- Engineered a full scikit-learn pipeline with preprocessing, cross-validation, and robust evaluation metrics
- Applied feature importance analysis to surface the top predictors of employee turnover
- Evaluated performance using AUC-ROC curves and precision/recall tradeoffs to tune for real-world usability
- Documented findings in a structured report suitable for both technical and non-technical stakeholders
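The pipeline structure described above can be sketched as follows. This is an illustrative outline, not the original code: the column names (`tenure`, `salary`, `department`, `attrited`) and the tiny synthetic frame are hypothetical stand-ins.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny synthetic stand-in for the HR dataset.
df = pd.DataFrame({
    "tenure": [1, 5, 3, 8, 2, 7, 4, 6],
    "salary": [50_000, 80_000, 60_000, 95_000, 52_000, 88_000, 61_000, 77_000],
    "department": ["sales", "eng", "sales", "eng", "hr", "eng", "hr", "sales"],
    "attrited": [1, 0, 1, 0, 1, 0, 0, 1],
})

# Preprocessing lives inside the pipeline, so cross-validation
# re-fits the scalers/encoders per fold and avoids leakage.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["tenure", "salary"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["department"]),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Cross-validated AUC-ROC, mirroring the evaluation approach above.
scores = cross_val_score(model, df.drop(columns="attrited"), df["attrited"],
                         cv=2, scoring="roc_auc")
print(f"mean AUC: {scores.mean():.3f}")
```

Keeping preprocessing inside the `Pipeline` is what makes the cross-validation scores honest: each fold fits its own scaler and encoder on training data only.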
End-to-end analysis of the LA Crime Dataset (1M+ records), from raw ingestion through statistical analysis to formatted stakeholder reports. Demonstrates scalable data processing and clear visual communication of findings.
- Processed and cleaned a 1M+ record dataset using Python and pandas — handling nulls, inconsistent formats, and outliers
- Applied statistical analysis and visualization to surface actionable crime trends and geographic patterns
- Automated the full data cleaning and reporting pipeline, generating Excel/CSV outputs for non-technical audiences
- Focused on making findings operationally useful, not just analytically interesting
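The null/format/outlier handling described above can be sketched on a toy frame. The column names below (`date_rptd`, `area_name`, `vict_age`) echo the LA dataset's conventions but are assumptions, and the plausible-age bounds are illustrative.

```python
import pandas as pd

# Toy stand-in for raw records: mixed date formats, inconsistent
# casing/whitespace, missing values, and implausible ages.
raw = pd.DataFrame({
    "date_rptd": ["01/05/2023", "2023-01-07", None, "01/09/2023"],
    "area_name": ["Central ", "central", "Hollywood", None],
    "vict_age": [34, -2, 120, 41],
})

clean = raw.copy()
# Normalize inconsistent date formats into a single datetime dtype.
clean["date_rptd"] = pd.to_datetime(clean["date_rptd"], format="mixed",
                                    errors="coerce")
# Normalize casing and stray whitespace in categorical text.
clean["area_name"] = clean["area_name"].str.strip().str.title()
# Treat out-of-range ages as missing rather than keeping outliers.
clean.loc[~clean["vict_age"].between(0, 110), "vict_age"] = pd.NA
# Drop rows with no usable date, then export for non-technical readers.
clean = clean.dropna(subset=["date_rptd"])
clean.to_csv("crime_clean.csv", index=False)
```

The same steps scale to the 1M+ record case unchanged; only the export format (Excel vs CSV) and chunking strategy need revisiting.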
Developed a custom image classification model using the FastAI framework and transfer learning, achieving strong accuracy on a limited training dataset. Demonstrates the practical application of modern CV techniques without requiring large-scale data.
- Leveraged transfer learning from pretrained models to achieve high performance with a small custom dataset
- Trained, validated, and evaluated the model with FastAI's high-level API and learning rate finder
- Demonstrated that state-of-the-art CV results are achievable outside of large research environments
A systematic comparison of multiple classification and regression models on large-scale datasets, focused on quantifying real performance differences across algorithms rather than picking a winner by intuition.
- Built and compared Logistic Regression, Decision Tree, and Random Forest models on datasets exceeding 1M records
- Applied cross-validation and feature selection to produce reliable, generalizable results
- Evaluated with multiple metrics — accuracy, precision, recall, F1, AUC — choosing metrics appropriate to each problem type
- Compiled findings into clear technical reports with visualizations for model comparison
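The comparison protocol can be sketched as below, shrunk to synthetic data for illustration (the original work used real datasets with 1M+ records). The key point is holding the folds and metric fixed across models so measured differences reflect the algorithm, not the evaluation setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification task standing in for the real data.
X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1_000),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Same CV splits and same metric for every model.
results = {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
           for name, m in models.items()}

for name, auc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} AUC={auc:.3f}")
```

Swapping `scoring` to `"f1"` or `"precision"` reuses the same harness for the other metrics listed above, which is what makes per-problem metric selection cheap.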
Built Python-based automation to transform raw, unstructured firewall log data into clean, structured security reports — reducing manual analyst time and improving turnaround for client security teams at Nuvodia.
- Developed scripts to parse, clean, and normalize messy raw log data at scale
- Automated the pipeline from raw input to formatted client-ready report with no manual steps
- Reduced manual analysis time significantly, freeing engineers for higher-value investigations
- Deployed in a production environment — actively used by client security teams
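The raw-log-to-report flow can be sketched with the standard library alone. The log format, field names, and sample lines below are hypothetical; the real Nuvodia formats and report layout differ.

```python
import csv
import io
import re

# Hypothetical firewall log line format.
LINE_RE = re.compile(
    r"(?P<ts>\S+ \S+) (?P<action>ALLOW|DENY) "
    r"src=(?P<src>[\d.]+) dst=(?P<dst>[\d.]+) dport=(?P<dport>\d+)"
)

raw_logs = """\
2024-03-01 10:15:02 DENY src=10.0.0.5 dst=203.0.113.7 dport=22
2024-03-01 10:15:03 ALLOW src=10.0.0.8 dst=198.51.100.2 dport=443
garbage line that should be skipped
2024-03-01 10:15:09 DENY src=10.0.0.5 dst=203.0.113.7 dport=3389
"""

def to_report(text: str) -> str:
    """Parse raw log lines and emit a client-ready CSV report,
    silently dropping lines that do not match the expected format."""
    rows = [m.groupdict() for line in text.splitlines()
            if (m := LINE_RE.match(line))]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["ts", "action", "src", "dst", "dport"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

report = to_report(raw_logs)
print(report)
```

Everything from parse to export runs with no manual steps, which is the property that let the pipeline replace hand-built reports in production.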