Portfolio

All Projects

All of my projects, including coursework, tagged by the tools, methods, and domains I worked with. Click or search any keyword to see every project that uses it.

Soccer Player Market Value Scout

Deployed ML service that predicts and explains soccer players' market value. Streamlit app with a FastAPI prediction service on Google Cloud Run.

Python, Machine Learning, scikit-learn, Random Forest, Regression, Feature Engineering, Model Deployment, FastAPI, REST API, Streamlit, Docker, Google Cloud, Web Scraping, Data Visualization, Predictive Modeling, Sports Analytics
🤖

LLM Multi-Agent Research Teams

M.S. thesis: LLM multi-agent simulations of five real research teams (~24K utterances) with a four-layer fidelity framework and a 55-metric NLP evaluation suite. Under review at EMNLP 2026.

Python, NLP, LLMs, Multi-Agent Systems, Deep Learning, Statistical Inference, Hypothesis Testing, Bootstrap, Experimental Design, Model Evaluation, Data Pipeline, Research, Reproducibility
🧮

Multimodal LLM Merging

Samsung Electronics AI research: layer-wise merge strategies and a scalable safetensors pipeline that lift language-benchmark scores while preserving multimodal performance.

Python, PyTorch, Deep Learning, LLMs, Computer Vision, Neural Networks, Model Evaluation, Research
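
For illustration only: one common form of layer-wise merging is linear interpolation between two checkpoints with a separate coefficient per layer. The sketch below assumes that form. The function name `merge_layerwise`, the `layerN.` key naming, the plain-list weights, and the coefficients are all hypothetical and are not taken from the project itself.

```python
def merge_layerwise(base, donor, alphas, default_alpha=0.5):
    """Interpolate two weight dictionaries with one coefficient per layer.

    base, donor: dicts mapping parameter names to lists of floats
                 (stand-ins for real tensors; an assumption of this sketch).
    alphas:      dict mapping a layer prefix (text before the first '.')
                 to the share of the donor weights used for that layer.
    """
    if base.keys() != donor.keys():
        raise ValueError("checkpoints must contain the same parameter names")

    merged = {}
    for name, w_base in base.items():
        layer = name.split(".")[0]            # hypothetical naming scheme
        alpha = alphas.get(layer, default_alpha)
        w_donor = donor[name]
        # alpha = 0 keeps the base weights, alpha = 1 takes the donor weights.
        merged[name] = [(1 - alpha) * b + alpha * d
                        for b, d in zip(w_base, w_donor)]
    return merged


# Toy checkpoints with two "layers".
base = {"layer0.weight": [0.0, 2.0], "layer1.weight": [4.0]}
donor = {"layer0.weight": [2.0, 4.0], "layer1.weight": [0.0]}
merged = merge_layerwise(base, donor, alphas={"layer0": 0.0, "layer1": 1.0})
```

With per-layer coefficients, some layers can stay close to the multimodal base model while others move toward the language-focused donor, which is the kind of trade-off a layer-wise strategy is meant to expose.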
🔬

SLIViT Vision Transformer Reproduction

Reproduced and verified the SLIViT vision-transformer experiments from the paper and open-source code, analyzing model structure and performance across settings.

Python, PyTorch, Deep Learning, Computer Vision, Transformers, Neural Networks, Model Evaluation, Reproducibility, Research

Enhancing Emotion Recognition in AI

Improved interpretability and reduced bias of an image emotion classifier using CLIP-Dissect and concept models on a ResNet-50 backbone.

Python, PyTorch, Deep Learning, Computer Vision, CLIP, Transfer Learning, Model Interpretability, Neural Networks, Research

CLIP Image Retrieval & VLM Classification

CLIP embeddings with k-NN retrieval and Qwen2-VL classification on CIFAR-10, with retrieval-quality checks against a random baseline.

Python, PyTorch, Deep Learning, Computer Vision, CLIP, scikit-learn, Machine Learning, Model Evaluation, Information Retrieval
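
A minimal sketch of how k-NN retrieval quality can be checked against a random baseline. It assumes cosine similarity and precision@k as the metric. The two-dimensional vectors stand in for real CLIP embeddings, and every name here (`knn_retrieve`, `precision_at_k`, `random_baseline`) is hypothetical rather than taken from the project.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_retrieve(query, gallery, k):
    """Return the indices of the k gallery vectors most similar to the query."""
    ranked = sorted(range(len(gallery)),
                    key=lambda i: cosine(query, gallery[i]),
                    reverse=True)
    return ranked[:k]


def precision_at_k(retrieved, labels, query_label):
    """Fraction of retrieved items that share the query's label."""
    return sum(labels[i] == query_label for i in retrieved) / len(retrieved)


def random_baseline(labels, query_label, k, trials=1000, seed=0):
    """Average precision@k when the k items are drawn uniformly at random."""
    rng = random.Random(seed)
    indices = range(len(labels))
    total = 0.0
    for _ in range(trials):
        total += precision_at_k(rng.sample(indices, k), labels, query_label)
    return total / trials


# Toy stand-ins for image embeddings and their class labels.
gallery = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
           [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
query, query_label = [1.0, 0.05], "cat"

retrieved = knn_retrieve(query, gallery, k=3)
precision = precision_at_k(retrieved, labels, query_label)
baseline = random_baseline(labels, query_label, k=3)
```

If the embeddings carry class information, `precision` should sit clearly above `baseline`; a gap near zero would suggest the retrieval is no better than chance.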
📊

Congressional Stock-Trading Analysis

Analysis and modeling of U.S. House members' stock trades: data collection, missingness analysis, hypothesis testing, and a Random Forest party-prediction model (99% accuracy) with a fairness permutation test.

Python, pandas, Data Analysis, EDA, Data Cleaning, Missing Data, Data Collection, Hypothesis Testing, Permutation Testing, Classification, Random Forest, Hyperparameter Tuning, Fairness, Web Scraping, Statistical Modeling, Finance
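
A generic sketch of a two-group permutation test, the kind of check a fairness analysis can use to ask whether a model's per-sample correctness differs between two groups by more than chance. The project's exact test statistic and grouping are not stated here, so the absolute difference in means, the function name `permutation_test`, and the toy inputs are assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    group_a, group_b: per-sample scores, for example 1 if the model's
    prediction was correct and 0 otherwise (an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled scores breaks any link between score and group.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Toy example: the model is always right on group A and always wrong on group B.
p_value = permutation_test([1] * 20, [0] * 20, n_permutations=2000)
```

A small p-value means a gap as large as the observed one rarely appears when group membership is assigned at random, which would point to the model treating the two groups differently.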
🚗

Used Car Price Prediction

Predicting used-car prices from vehicle attributes with feature engineering, exploratory analysis, and regression model selection.

Python, scikit-learn, Machine Learning, Regression, Feature Engineering, EDA, Data Analysis, Predictive Modeling

Email Spam Classification

Comparing preprocessing techniques and classifiers (logistic regression, LDA/QDA, SVM, random forest) on 4,601 emails. Built in R.

R, Machine Learning, Classification, Logistic Regression, SVM, Random Forest, Statistical Modeling, Feature Engineering, Data Analysis

Optimizing Hospital Locations in San Diego

Geospatial analysis using traffic-crash and census data to recommend hospital locations for high-risk areas, compared against actual locations.

Python, GIS, Geospatial Analysis, Data Analysis, Data Visualization, Optimization, Healthcare, pandas
🎲

Gradient Boosting from Scratch

Implemented gradient boosting (XGBoost-style) from scratch with gradient- and Hessian-based optimization, matching the library's performance under the same hyperparameters.

Python, Machine Learning, Gradient Boosting, XGBoost, Regression, Optimization, scikit-learn, Model Evaluation
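
A minimal sketch of the gradient- and Hessian-based step behind XGBoost-style boosting, not the project's implementation. It assumes squared-error loss (gradient `pred - y`, Hessian `1`), a single feature, depth-one trees, and an L2 penalty `lam`. Each leaf gets the weight `-G / (H + lam)`, and a split is scored by the usual gain `0.5 * (GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam))`. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    threshold: float
    left_weight: float
    right_weight: float

    def predict(self, x):
        return self.left_weight if x <= self.threshold else self.right_weight


def leaf_weight(g_sum, h_sum, lam):
    # Newton step for a leaf: minimizes the second-order loss approximation.
    return -g_sum / (h_sum + lam)


def fit_stump(xs, grads, hess, lam=1.0):
    """Pick the threshold with the largest second-order gain."""
    G, H = sum(grads), sum(hess)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    # Fallback: no split, every sample goes to the left leaf.
    best_gain, best = 0.0, Stump(float("inf"), leaf_weight(G, H, lam), 0.0)
    gl = hl = 0.0
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        gl += grads[i]
        hl += hess[i]
        if xs[i] == xs[nxt]:
            continue  # cannot split between identical feature values
        gr, hr = G - gl, H - hl
        gain = 0.5 * (gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam))
        if gain > best_gain:
            best_gain = gain
            best = Stump((xs[i] + xs[nxt]) / 2,
                         leaf_weight(gl, hl, lam),
                         leaf_weight(gr, hr, lam))
    return best


def fit_boosting(xs, ys, n_rounds=50, lr=0.3, lam=1.0):
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        grads = [p - y for p, y in zip(preds, ys)]  # d/dpred of 0.5*(pred-y)^2
        hess = [1.0] * len(xs)                      # second derivative
        stump = fit_stump(xs, grads, hess, lam)
        stumps.append(stump)
        preds = [p + lr * stump.predict(x) for p, x in zip(preds, xs)]
    return stumps


def predict(stumps, x, lr=0.3):
    # lr must match the learning rate used in fit_boosting.
    return sum(lr * s.predict(x) for s in stumps)


model = fit_boosting([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
```

Swapping in another twice-differentiable loss only changes how `grads` and `hess` are computed; the split search and leaf weights stay the same, which is what makes the second-order formulation convenient.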
🏢

Real-Estate Demographic Case Study

Case study joining 2023 Census demographics with LODES employment data to analyze where people live versus work, using statistical modeling and data collection from public APIs.

Python, Data Analysis, Statistical Modeling, Data Collection, Data Visualization, Census Data, Economics, Business Analytics, pandas
📈

Market Intelligence & Competitor Analysis

Samsung Biologics internship: market and competitor analysis to support strategy. Automated workflows cut recurring manual effort by over 90%, and findings were communicated to stakeholders.

Market Research, Competitor Analysis, Business Analytics, Data Analysis, Stakeholder Communication
📋

Customer Analytics Dashboards

Suntek Systems: built Tableau dashboards from customer data that cut reporting time by over 50% and delivered data-driven insights to align the product with customer needs.

Tableau, SQL, Data Visualization, Dashboards, Business Analytics, Data Analysis, Customer Success, Stakeholder Communication