My Projects
Reddit Upvote Prediction
End-to-end, scalable Spark ML pipeline for Reddit upvote prediction, combining custom TF-IDF features, a logistic regression implemented from scratch, and MLlib models.
Adaptive Operator Migration for Edge-Cloud Stream Processing
Flink-based framework for live operator migration in edge-cloud systems, using cost-based metrics and savepoint-based state transfer to reduce WAN latency and maintain SLA compliance.
Library Booking Website - Boston University
A real-time room booking platform for BU students and faculty with email verification, mobile optimization, an admin dashboard, and smart conflict detection, built with React, TypeScript, Flask, and Supabase.
Respiratory Illness Data Warehouse
Designed a star-schema data warehouse in SQL Server with SCD Type 2 dimensions, ETL pipelines, and Kafka streaming to track respiratory illness trends.
CourtDoc Classifier
End-to-end scalable NLP pipeline using low-level Spark RDD APIs, implementing custom TF-IDF feature extraction and custom logistic regression.
Paper-to-Podcast
Built a tool that converts academic research papers into conversational podcasts with Host, Learner, and Expert personas using retrieval-augmented generation and text-to-speech synthesis.