HELLO, I AM

Arya Patel

DATA SCIENTIST

About Me

Most automation projects don't fail because the idea is wrong. They fail because no one built the pipeline to make it real. I'm a Data Science graduate student (Class of 2026) obsessed with closing that gap.

My work ranges from building cloud-native AWS platforms for algorithmic trading research to automating customer discovery pipelines that surface revenue opportunities a sales team would never find manually.

I've learned that the hardest part of data science isn't the model — it's understanding the messy, manual process you're trying to replace. One experience that shaped me early: watching a perfectly backtested trading strategy fall apart the moment it hit live market conditions. Debugging that taught me to think in terms of failure modes and system reliability.

Now I'm channeling that into GenAI and Agentic AI — because the most exciting automation isn't just faster. It reasons.


CERTIFICATIONS & BADGES

Data Analysis Using Python  |  IBM Certified
Data Fundamentals  |  IBM SkillsBuild
Data Visualization Using Python  |  IBM Certified
Google Skills  |  Silver League · 14,000 pts
Insights from Data with BigQuery  |  Google Cloud Certified
Integrate with Machine Learning APIs  |  Google Cloud Certified

What I Do

Technical Skills

I work across the full data science stack, from raw data pipelines to deployed ML models.

Languages
Python, SQL, R, Bash
Data Engineering
Apache Spark, Airflow, dbt, Snowflake, Pandas, NumPy, ETL Design
ML / Statistics
Scikit-learn, XGBoost, PyTorch, Statsmodels, SciPy, A/B Testing, Causal Inference
Cloud & MLOps
AWS (S3, Glue, Athena, SageMaker, Lambda), Docker, MLflow
NLP & LLMs
Hugging Face, Prompt Engineering, RAG, TF-IDF, Transformers
Data Viz & Geo
Power BI, Plotly, Matplotlib, Seaborn, QuickSight, GeoPandas

My Resume

Education & Experience

MY EDUCATION
MS in Data Science
Illinois Institute of Technology, Chicago, IL  |  2024 – May 2026
Teaching Assistant for Data Preparation and Analysis (CSP 571), Fall 2025 & Spring 2026. GPA 3.9/4.0.

Coursework: Big Data Technologies, Statistical Learning, Decentralized ML, Advanced Database, Time Series, Regression, Social Network Analysis, Causal Inference.
BE in Information Technology
Gujarat Technological University, India  |  2020 – 2024
GPA 3.9/4.0. Smart India Hackathon 2022 Winner — National-level government hackathon.
MY EXPERIENCE
Data Scientist Co-op
Jan 2026 – May 2026  |  LabelMaster, Chicago, IL
Integrated 9 regulatory datasets (EPA, OSHA, RCRA, DOT) with 132K+ transactions into a 26K+ site dataset. LLM pipeline generating hazmat profiles at 0.81 mean confidence. Hybrid recommendation engine exposing an 83.6% cross-sell packaging gap.
Data Scientist
May 2025 – Aug 2025  |  Shoptaki, NY (Remote)
Led a team of 5 building a Python backtesting framework; Parquet storage cut data size by 70%. Trading bots connected through the Interactive Brokers API using an async, concurrent architecture.
Data Scientist
Jul 2022 – Aug 2024  |  Algobazar, Gujarat, India
Engineered 30+ features from 29K+ price bars using Amazon Athena & S3. TensorFlow MLP on SageMaker — backtested strategy generating 9.5% return. Spark ML Random Forest — 65.52% win rate across 87 live trades. Dockerized models deployed to ECR/ECS.
Teaching Assistant — CSP 571
Fall 2025 & Spring 2026  |  Illinois Institute of Technology
Mentored graduate students in Python-based data wrangling, EDA, and statistical analysis. Held office hours and graded assignments.

My Projects

CardShield
Streaming · Fraud Detection
End-to-end fraud detection pipeline for streaming credit card transactions. Combines Kafka event streaming, real-time scoring, Cassandra persistence, and a live monitoring dashboard.
Python, Apache Kafka, PySpark, Spark ML, Cassandra
CatSense
Multimodal AI · Computer Vision
Real-time multimodal AI inspection co-pilot for Caterpillar heavy equipment. Combines vision, audio, and manual-grounded reasoning to produce structured safety reports in seconds.
React, TypeScript, FastAPI, Google Gemini, Cloudflare
FedAsync
Federated Learning · Distributed ML
Implements FedAsync and FedBuff using PyTorch Lightning and ResNet-18. Simulates heterogeneous client behavior and asynchronous aggregation across distributed nodes.
PyTorch, Flower (flwr), Python, ResNet-18
SmartBites
Recommendation · Big Data
ML-based restaurant recommendation system powered by the Yelp Open Dataset. Distributed PySpark pipelines for feature engineering and AWS SageMaker for model training at scale.
Python, PySpark, AWS SageMaker
Disaster Severity Classifier
NLP · Real-Time Classification
Transformer-based NLP pipeline classifying disaster severity from 11,000+ social media posts at 91% F1. Streaming inference at 100+ posts/sec with geospatial hotspot aggregation.
PyTorch, Hugging Face, Scikit-learn, GeoPandas
Equitable Transit Access
Geospatial · Equity Analysis
Distributed geospatial pipeline joining 50M+ GTFS transit records with census demographics. Revealed 23% lower transit access in low-income neighborhoods; 4 route recommendations for 40K+ residents.
Apache Spark, GeoPandas, Scikit-learn, AWS
Time Series Forecasting
Time Series · Forecasting
IBM stock price analysis and forecasting: stationarity testing (ADF), ARIMA modeling, and out-of-sample prediction with confidence intervals.
R, ARIMA, Time Series

Get In Touch

Let's work together!

I'm currently open to full-time AI/ML, Data Science, and ML Engineering roles starting 2026. If you're building systems where AI meets real constraints and real users, I'd love to connect 🤝

Location
Chicago, IL
Phone
(312) 284-9308
