My Projects

Reddit Upvote Prediction

End-to-end, scalable Spark ML pipeline for Reddit upvote prediction, combining custom TF-IDF features, a from-scratch logistic regression, and MLlib models.

Logistic Regression
Naive Bayes
NumPy
Pandas
PySpark
Python
RDD
SHAP
Spark MLlib
TF-IDF

Adaptive Operator Migration for Edge-Cloud Stream Processing

Flink-based framework for live operator migration in edge-cloud systems, using cost-based metrics and savepoint-based state transfer to reduce WAN latency and maintain SLA compliance.

Apache Flink
Distributed Systems
Edge Computing
Fault Tolerance
Java
Python
Stream Processing

Library Booking Website - Boston University

A real-time, email-verified room booking platform for BU students and faculty, with mobile optimization, an admin dashboard, and smart conflict detection. Built with React, TypeScript, Flask, and Supabase.

EmailJS
Flask
Full Stack
Mapbox
PostgreSQL
Python
React
Room Booking
Supabase
TypeScript

Respiratory Illness Data Warehouse

Designed a star-schema data warehouse in SQL Server, with SCD Type 2 dimensions, ETL pipelines, and Kafka streaming, to track respiratory illness trends.

Data Engineering
ETL
Kafka
Python
SCD Type 2
SQL Server
Star Schema

CourtDoc Classifier

End-to-end, scalable NLP pipeline built on low-level Spark RDD APIs, with custom TF-IDF feature extraction and a custom logistic regression implementation.

Distributed Systems
Logistic Regression
NumPy
PySpark
Python
RDD
TF-IDF

Paper-to-Podcast

Built a tool that converts academic research papers into conversational podcasts with Host, Learner, and Expert personas using retrieval-augmented generation and text-to-speech synthesis.

LangChain
Natural Language Processing
OpenAI API
Prompt Engineering
Python
RAG
Text-to-Speech