Rajashekar Reddy Vedire
Applied scientist building production ML systems across cricket analytics, agricultural forecasting, healthcare cost prediction, and autonomous web agents.
MS Applied Data Science, IU Indianapolis (Aug 2026) · STEM OPT Eligible
Education
MS, Applied Data Science
Indiana University Indianapolis (Luddy School)
Aug 2024 – Aug 2026
Indianapolis, IN
- Graduate Research Assistant, Sport Innovation Institute (SII)
- ML, Applied Deep Learning, NLP, Statistical Computing, Data Mining, Big Data Analytics, Cloud Computing
B.Tech, Mechanical Engineering (Automotive)
Vellore Institute of Technology
2013 – 2017
India
Technical Skills
Logistic Regression · Isotonic Calibration · Bayesian Inference · Monte Carlo Simulation · LightGBM · XGBoost · Random Forest · SARIMAX · Holt-Winters · Tabular Q-Learning · Integer Linear Programming · SHAP Explainability · Shrinkage Estimation · Mahalanobis Anomaly Detection · TF-IDF + SVM · Walk-Forward CV · LangGraph · RAG (FAISS + Ollama) · Vision-Language Models · Anthropic API · Prompt Engineering · Python · TypeScript · SQL · FastAPI · Next.js · Streamlit · Playwright · scikit-learn · pandas · NumPy · SciPy · statsmodels · PyArrow · Pydantic v2 · SQLAlchemy 2.0 · Alembic · PostgreSQL · Redis · DuckDB · BigQuery · Databricks · PySpark · AWS · Azure · GCP · Docker + CUDA · GitHub Actions · Vercel · Railway · MLflow · APScheduler · Power BI · Plotly · Recharts · Deck.gl · Chart.js
Experience
Graduate Research Assistant
Aug 2024 – Present · Sport Innovation Institute, Indiana University
Indianapolis, IN
- Designed and deployed the Horizon League Budget Dashboard: migrated from Power BI ($250/mo) to custom Next.js on Azure Static Web Apps ($59/mo), serving 11 NCAA Division I Athletic Directors with 8 analytics pages and Entra ID multi-tenant auth
- Architected DataSkrive cohort analytics on GCP/BigQuery: reverse-engineered 30+ tables, found that behavioral segment is 114x more predictive of conversion than cohort assignment, and identified $857M in attributed revenue across 48 cohort profiles
- Built Intent Quotient (IQ): original NLP metric quantifying batting aggression intent from commentary using TF-IDF + SVM classification (40+ attack pattern regexes) with Bayesian shrinkage regularization
- Engineered IPL Playing XI Selector: 4-layer prescriptive pipeline combining PuLP ILP, mixed-effects synergy regression from cross-league data, and tabular Q-learning MDP with 384 discretized game states
- Developed PlayerData athlete benchmark: data pipeline for 281 collegiate athletes with percentile computation by cohort and rule-based conversational chatbot
Programmer Analyst, Client: Microsoft Research
Aug 2021 – May 2024 · Cognizant Technology Solutions
Bangalore, India
- Evaluated multiple time series prediction algorithms for anomaly detection, comparing feasibility and performance across Microsoft Research engagement metrics
- Implemented a modified Holt-Winters algorithm with forward testing and back testing in Databricks for performance analysis
- Achieved a 45% reduction in false positives at the user level with no loss of recall
Programmer Analyst, Client: Microsoft Advertising
Cognizant Technology Solutions
Bangalore, India
- Performed data analysis across Microsoft Ads products using internal tools (Agora, Scope, PyScope) and SQL/Python
- Designed business-specific Power BI dashboards that cut time spent on operational procedures by 20-30%
- Integrated Power BI data pipelines with SQL and Python for comprehensive automation
Selected Projects
CoverDrive Cricket
Mar 2026 – Present · FastAPI, Next.js 16, PostgreSQL, Redis, scikit-learn, Anthropic API
- 3-stage Bayesian win probability engine: logistic regression + isotonic calibration (Brier = 0.194), Bayesian log-odds update, NumPy-vectorized Monte Carlo (10K sims, p99 latency 47ms)
- Context-adjusted metrics (SR+, Avg+, Eco+) with 4-factor decomposition from 2M+ delivery records
- Shrinkage matchup model: Laplace-smoothed 5x2 prior matrix with 20% min prior weight
- Stale-while-revalidate caching with Redis; N+1 query batching that cut queries from 88 to 23
- SportMonks live scoring (30s polling), Claude narrative pipeline, Razorpay freemium monetization
- Phase A cross-league lambda estimation: Beta priors across 6 T20 leagues
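The shrinkage idea behind the matchup model can be illustrated with a minimal sketch. It is not the project's code: it assumes a single prior rate instead of the 5x2 prior matrix, and `prior_strength` is a hypothetical pseudo-count chosen only for illustration. The one detail taken from the bullet above is the 20% floor on the prior weight.

```python
def shrunk_rate(successes: int, trials: int, prior_rate: float,
                prior_strength: float = 30.0,
                min_prior_weight: float = 0.20) -> float:
    """Blend an observed rate with a prior rate.

    prior_strength is a hypothetical pseudo-observation count, not a
    value from the project.
    """
    if trials < 0 or successes < 0 or successes > trials:
        raise ValueError("need 0 <= successes <= trials")
    # Standard shrinkage weight: the prior dominates when trials are few.
    prior_weight = prior_strength / (prior_strength + trials)
    # Floor the prior weight so a large sample never fully overrides the prior.
    prior_weight = max(prior_weight, min_prior_weight)
    # With no data at all, fall back to the prior rate.
    observed = successes / trials if trials else prior_rate
    return prior_weight * prior_rate + (1.0 - prior_weight) * observed
```

With 1,000 trials the unfloored prior weight would be about 3%, so the floor is what keeps the estimate anchored to the prior in large samples.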
Agricultural Data Analysis (QuickStats)
Nov 2025 – Present · Next.js, FastAPI, LightGBM, SARIMAX, AWS
- Ensemble forecasting: SARIMAX + LightGBM quantile regression + Ridge meta-learner + isotonic calibration for p10/p50/p90 forecasts
- 18-feature matrix from 4 data sources (CME futures, WASDE, DXY, ERS) with Pandera schema enforcement
- Mahalanobis distance regime anomaly detection deferring to futures curve during regime shifts
- SHAP TreeExplainer for per-forecast key driver identification
- Walk-forward validation (2010-2024 split) with futures-baseline MAPE gate
- Total infrastructure: $22/month (RDS $15, EC2 $6, S3/Athena <$1)
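Walk-forward validation with a baseline gate can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's pipeline: the expanding-window layout, the `min_train` value, and the helper names are all hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(years: Sequence[int],
                        min_train: int = 5) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_years, test_year) pairs with an expanding training window."""
    ordered = sorted(years)
    for i in range(min_train, len(ordered)):
        # Train on everything strictly before the test year: no look-ahead.
        yield ordered[:i], ordered[i]


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Actuals must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return 100.0 * sum(abs(p - a) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)


def passes_baseline_gate(model_mape: float, baseline_mape: float) -> bool:
    """Accept the model only if it is at least as accurate as the baseline."""
    return model_mape <= baseline_mape
```

Applied to 2010 through 2024 with `min_train=5`, this yields ten folds, the first tested on 2015 and the last on 2024.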
DataSkrive Cohort Analytics
Mar 2026 – Present · BigQuery, PySpark, statsmodels, LangGraph
- Reverse-engineered GCP/BigQuery database: 30+ tables, 250+ columns, two coexisting cohort architectures
- Behavioral segment 114x more predictive of conversion than cohort assignment (AUC 0.885, 194M rows); 183x conversion spread across 19 scenarios
- Holt-Winters anomaly detection: 7-day seasonal decomposition, validated CVR collapse (0.25 to 0.08) and $267K NBA halftime value destruction
- Local RAG: LangGraph 6-node state machine, FAISS vector index, Ollama Qwen3 generation, SQLite persistence
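Holt-Winters anomaly detection with a 7-day season can be sketched in plain Python. This is the textbook additive form with one-step-ahead residuals, not the project's implementation; the smoothing constants and the fixed residual threshold are hypothetical.

```python
from typing import List


def holt_winters_residuals(series: List[float], season: int = 7,
                           alpha: float = 0.3, beta: float = 0.05,
                           gamma: float = 0.2) -> List[float]:
    """One-step-ahead residuals from additive Holt-Winters smoothing."""
    if len(series) < 2 * season:
        raise ValueError("need at least two full seasons")
    # Initialise level and trend from the first two seasons,
    # and seasonal offsets from the first season.
    first = sum(series[:season]) / season
    second = sum(series[season:2 * season]) / season
    level, trend = first, (second - first) / season
    seasonal = [x - first for x in series[:season]]
    residuals = []
    for t in range(season, len(series)):
        s = seasonal[t % season]
        forecast = level + trend + s  # forecast made before seeing series[t]
        residuals.append(series[t] - forecast)
        prev_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season] = gamma * (series[t] - level) + (1 - gamma) * s
    return residuals


def flag_anomalies(residuals: List[float], threshold: float) -> List[int]:
    """Indices (into residuals) whose absolute residual exceeds the threshold."""
    return [i for i, r in enumerate(residuals) if abs(r) > threshold]
```

Residual index `i` corresponds to observation `i + season`, because the first season is used only for initialisation.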
Medicare Provider Cost Analysis
Mar 2026 – Present · scikit-learn, XGBoost, RAPIDS cuML, MLflow, Databricks
- Medallion pipeline (Bronze/Silver/Gold) processing 103M rows of CMS data (2013-2023) with dual execution: Databricks + local pandas/PyArrow
- Regional batch training: XGBoost booster continuation + Random Forest warm start with CUDA auto-detection on RTX 5070 Ti
- R² = 0.884 (RF, test MAE $12.04) after removing data leakage from payment-derived features
- HCC risk scores via NPI+year join; 10-feature set with clinical HCPCS bucketing
Peruse AI (Open Source)
Feb 2026 · Python, Playwright, Ollama, LM Studio, Jina
- Local-first perceive-plan-act loop: dual-channel perception (DOM + visual) with VLM decision-making via Playwright
- 5-strategy VLM response parsing fallback for malformed JSON from local models
- Concurrent focus groups: multiple personas (UX designer, accessibility auditor, data analyst) with independent browser instances
- Smart loop recovery: detects 7+ repeated actions, issues progressive nudges with element blocking
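A fallback chain for malformed JSON from a local model might look like the sketch below. The strategies shown (raw text, fenced block, outermost brace span, trailing-comma repair) are illustrative guesses and are not necessarily the five strategies used in Peruse AI; `parse_model_json` is a hypothetical name.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this listing fence-safe


def parse_model_json(text: str) -> Optional[Any]:
    """Try progressively more forgiving strategies; return None if all fail."""
    candidates = [text.strip()]  # Strategy 1: the reply is already valid JSON.
    # Strategy 2: unwrap a fenced block, optionally tagged "json".
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1).strip())
    # Strategy 3: take the outermost {...} span and ignore surrounding chatter.
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        # Strategy 4: retry each candidate with trailing commas removed.
        # Caveat: this regex can also alter a string value containing ",}".
        repaired = re.sub(r",\s*([}\]])", r"\1", candidate)
        for attempt in (candidate, repaired):
            try:
                return json.loads(attempt)
            except json.JSONDecodeError:
                continue
    return None
```

Returning `None` rather than raising lets the caller decide whether to re-prompt the model or skip the step.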
APEX (Algorithmic Trading System)
Mar 2026 · Python, DuckDB, LightGBM, Alpaca, FRED, SEC EDGAR
- 8-layer pipeline: raw sources (Alpaca, FRED, SEC EDGAR, Finnhub) through signal engineering (47 features) to DuckDB feature store to execution
- 5 production ingestors with BaseIngestor pattern, rate limiting, and INSERT OR REPLACE deduplication
- 9 sequential circuit breaker risk gates: confidence, position, sector, correlation, drawdown, VIX, liquidity, streak
- Half-Kelly position sizing with calibrated probability inputs; vectorbt backtesting engine
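Half-Kelly sizing from a calibrated win probability reduces to a short formula. The sketch below uses the standard Kelly fraction for a binary bet, f* = p - (1 - p) / b, and halves it; the `cap` parameter and the function name are hypothetical and are not taken from APEX.

```python
def half_kelly_fraction(p_win: float, win_loss_ratio: float,
                        cap: float = 0.10) -> float:
    """Fraction of capital to risk under half-Kelly sizing.

    p_win: calibrated probability that the trade wins.
    win_loss_ratio: average win divided by average loss (b in the formula).
    cap: hypothetical upper bound on any single position.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability")
    if win_loss_ratio <= 0.0:
        raise ValueError("win_loss_ratio must be positive")
    full_kelly = p_win - (1.0 - p_win) / win_loss_ratio
    # Never size a negative-edge trade, and never exceed the cap.
    return min(max(0.5 * full_kelly, 0.0), cap)
```

Halving the Kelly fraction gives up some expected growth in exchange for lower variance, which matters when the probability inputs are only approximately calibrated.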
Ball View
Feb 2026 · FastAPI, YOLOv8, OpenCV, Kalman Filter, EasyOCR, Docker/CUDA
- Real-time CV pipeline: browser capture at 30 FPS via WebSocket, YOLOv8n detection, Kalman Filter trajectory tracking (20-frame history)
- OCR-to-match sync: EasyOCR reads scoreboard every 30 frames, fuzzy-matches against Cricsheet ball-by-ball JSON
- Deployed on NVIDIA Docker (CUDA 12.4) with GPU passthrough on RTX 5070 Ti
Horizon League Budget Dashboard
Jan – Feb 2026 · Next.js, Azure SQL, Entra ID, GitHub Actions
- Replaced Power BI ($250/mo for 20 users) with Next.js on Azure Static Web Apps ($59/mo): 8 analytics pages with Recharts and TanStack React Table
- Azure SQL star schema (11 dimensions, 7 facts) with Python ETL, Managed Identity for credential-free access, multi-tenant Entra ID auth for 11 institutional tenants
Open Export (Open Source)
Feb 2026 · Python, Playwright, Click, Rich
- Published CLI tool to PyPI: Chrome DevTools Protocol connection, paginated API, tree-based message linearization, JSON + Markdown export with SHA-256 deduplication
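SHA-256 deduplication of exported messages can be sketched as below. The message shape (a plain dict) and the choice to hash a canonical JSON encoding are assumptions made for illustration; the published tool may hash different fields.

```python
import hashlib
import json
from typing import Dict, List


def content_hash(message: Dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(message, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(messages: List[Dict]) -> List[Dict]:
    """Keep the first occurrence of each distinct message, preserving order."""
    seen = set()
    unique = []
    for message in messages:
        digest = content_hash(message)
        if digest not in seen:
            seen.add(digest)
            unique.append(message)
    return unique
```

Sorting keys before hashing means two messages with the same fields in a different order count as duplicates.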
PocketLedger
Mar 2026 · FastAPI, Tesseract OCR, OpenCV, Ollama, SQLite
- OCR-powered bank statement parser with auto-detecting bank routing (Chase, BofA, Discover), image preprocessing pipeline, and dual categorization engine (SQL pattern matching + Ollama LLM fallback)