Rajashekar Reddy Vedire

Applied scientist building production ML systems across cricket analytics, agricultural forecasting, healthcare cost prediction, and autonomous web agents.

Indianapolis, IN | rvedire.com | LinkedIn | GitHub
MS Applied Data Science, IU Indianapolis (Aug 2026) · STEM OPT Eligible
Education

MS, Applied Data Science

Indiana University Indianapolis (Luddy School)
Aug 2024 – Aug 2026
Indianapolis, IN
  • Graduate Research Assistant, Sport Innovation Institute (SII)
  • ML, Applied Deep Learning, NLP, Statistical Computing, Data Mining, Big Data Analytics, Cloud Computing

B.Tech, Mechanical Engineering (Automotive)

Vellore Institute of Technology
2013 – 2017
India
Technical Skills
Logistic Regression, Isotonic Calibration, Bayesian Inference, Monte Carlo Simulation, LightGBM, XGBoost, Random Forest, SARIMAX, Holt-Winters, Tabular Q-Learning, Integer Linear Programming, SHAP Explainability, Shrinkage Estimation, Mahalanobis Anomaly Detection, TF-IDF + SVM, Walk-Forward CV
LangGraph, RAG (FAISS + Ollama), Vision-Language Models, Anthropic API, Prompt Engineering
Python, TypeScript, SQL, FastAPI, Next.js, Streamlit, Playwright, scikit-learn, pandas, NumPy, SciPy, statsmodels, PyArrow, Pydantic v2, SQLAlchemy 2.0, Alembic
PostgreSQL, Redis, DuckDB, BigQuery, Databricks, PySpark
AWS, Azure, GCP, Docker + CUDA, GitHub Actions, Vercel, Railway, MLflow, APScheduler
Power BI, Plotly, Recharts, Deck.gl, Chart.js
Experience

Graduate Research Assistant

Aug 2024 – Present
Sport Innovation Institute, Indiana University
Indianapolis, IN
  • Designed and deployed the Horizon League Budget Dashboard: migrated from Power BI ($250/mo) to custom Next.js on Azure Static Web Apps ($59/mo), serving 11 NCAA Division I Athletic Directors with 8 analytics pages and Entra ID multi-tenant auth
  • Architected DataSkrive cohort analytics on GCP/BigQuery: reverse-engineered 30+ tables, discovered behavioral segment is 114x more predictive of conversion than cohort assignment, identified $857M in attributed revenue across 48 cohort profiles
  • Built Intent Quotient (IQ): original NLP metric quantifying batting aggression intent from commentary using TF-IDF + SVM classification (40+ attack pattern regexes) with Bayesian shrinkage regularization
  • Engineered IPL Playing XI Selector: 4-layer prescriptive pipeline combining PuLP ILP, mixed-effects synergy regression from cross-league data, and tabular Q-learning MDP with 384 discretized game states
  • Developed PlayerData athlete benchmark: data pipeline for 281 collegiate athletes with percentile computation by cohort and rule-based conversational chatbot

Programmer Analyst, Client: Microsoft Research

Aug 2021 – May 2024
Cognizant Technology Solutions
Bangalore, India
  • Evaluated multiple time series prediction algorithms for anomaly detection, comparing feasibility and performance across Microsoft Research engagement metrics
  • Implemented a modified Holt-Winters algorithm with forward testing and back testing in Databricks for performance analysis
  • Achieved 45% reduction in false positives at user-level with no compromise in recall

Programmer Analyst, Client: Microsoft Advertising

Cognizant Technology Solutions
Bangalore, India
  • Performed data analysis across Microsoft Ads products using internal tools (Agora, Scope, PyScope) and SQL/Python
  • Designed business-specific Power BI dashboards saving 20-30% time in operational procedures
  • Integrated Power BI data pipelines with SQL and Python for comprehensive automation
Selected Projects

CoverDrive Cricket

Mar 2026 – Present
FastAPI, Next.js 16, PostgreSQL, Redis, scikit-learn, Anthropic API
Full-stack IPL analytics platform
  • 3-stage Bayesian win probability engine: logistic regression + isotonic calibration (Brier = 0.194), Bayesian log-odds update, NumPy-vectorized Monte Carlo (10K sims, p99 latency 47ms)
  • Context-adjusted metrics (SR+, Avg+, Eco+) with 4-factor decomposition from 2M+ delivery records
  • Shrinkage matchup model: Laplace-smoothed 5x2 prior matrix with 20% min prior weight
  • Stale-while-revalidate caching with Redis; N+1 query batching that cut queries from 88 to 23
  • SportMonks live scoring (30s polling), Claude narrative pipeline, Razorpay freemium monetization
  • Phase A cross-league lambda estimation: Beta priors across 6 T20 leagues
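
The shrinkage matchup bullet above can be illustrated with a minimal sketch. It assumes a pseudo-count form of shrinkage with a floor on the prior weight. The function name `shrunk_rate`, the pseudo-count `k`, and all example numbers are hypothetical and not taken from the CoverDrive codebase.

```python
def shrunk_rate(observed_sum: float, n: int, prior_mean: float,
                k: float = 30.0, min_prior_weight: float = 0.20) -> float:
    """Blend an observed per-ball rate with a prior mean.

    The prior weight k / (k + n) falls as the sample grows, but it is
    floored at min_prior_weight (20%, matching the bullet above).
    k is a hypothetical pseudo-count chosen only for illustration.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    prior_weight = max(min_prior_weight, k / (k + n))
    # With no observations, fall back entirely on the prior.
    observed_mean = observed_sum / n if n > 0 else prior_mean
    return prior_weight * prior_mean + (1.0 - prior_weight) * observed_mean


# Small sample: the estimate sits halfway between prior and observation.
small_sample = shrunk_rate(observed_sum=60.0, n=30, prior_mean=1.0)
# Large sample: the 20% floor keeps the prior from vanishing.
large_sample = shrunk_rate(observed_sum=1940.0, n=970, prior_mean=1.0)
```

The floor means even a heavily sampled batter-bowler matchup never ignores the prior completely, which is one plausible reading of "20% min prior weight".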

Agricultural Data Analysis (QuickStats)

Nov 2025 – Present
Next.js, FastAPI, LightGBM, SARIMAX, AWS
USDA analytics with commodity forecasting
  • Ensemble forecasting: SARIMAX + LightGBM quantile regression + Ridge meta-learner + isotonic calibration for p10/p50/p90 forecasts
  • 18-feature matrix from 4 data sources (CME futures, WASDE, DXY, ERS) with Pandera schema enforcement
  • Mahalanobis distance regime anomaly detection deferring to futures curve during regime shifts
  • SHAP TreeExplainer for per-forecast key driver identification
  • Walk-forward validation (2010-2024 split) with futures-baseline MAPE gate
  • Total infrastructure: $22/month (RDS $15, EC2 $6, S3/Athena <$1)
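
As a rough illustration of the walk-forward validation bullet above, the sketch below generates expanding-window folds by year. The function name `walk_forward_splits` and the `min_train_years` parameter are assumptions made for this example; the project's actual split logic is not shown in this document.

```python
from typing import Iterator


def walk_forward_splits(years: list[int],
                        min_train_years: int = 5
                        ) -> Iterator[tuple[list[int], int]]:
    """Yield (train_years, test_year) pairs with an expanding window.

    Each fold trains only on years strictly before the test year, so no
    future information leaks into the fit.
    """
    ordered = sorted(set(years))
    for i in range(min_train_years, len(ordered)):
        yield ordered[:i], ordered[i]


folds = list(walk_forward_splits(list(range(2010, 2025))))
first_train, first_test = folds[0]
```

With years 2010 through 2024 and a five-year minimum, the first fold trains on 2010 to 2014 and tests on 2015, and the final fold tests on 2024.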

DataSkrive Cohort Analytics

Mar 2026 – Present
BigQuery, PySpark, statsmodels, LangGraph
Cohort audit for sports betting platform
  • Reverse-engineered GCP/BigQuery database: 30+ tables, 250+ columns, two coexisting cohort architectures
  • Behavioral segment 114x more predictive (AUC 0.885, 194M rows); 183x conversion spread across 19 scenarios
  • Holt-Winters anomaly detection: 7-day seasonal decomposition, validated CVR collapse (0.25 to 0.08) and $267K NBA halftime value destruction
  • Local RAG: LangGraph 6-node state machine, FAISS vector index, Ollama Qwen3 generation, SQLite persistence

Medicare Provider Cost Analysis

Mar 2026 – Present
scikit-learn, XGBoost, RAPIDS cuML, MLflow, Databricks
National-scale 103M-row ML pipeline
  • Medallion pipeline (Bronze/Silver/Gold) processing 103M rows of CMS data (2013-2023) with dual execution: Databricks + local pandas/PyArrow
  • Regional batch training: XGBoost booster continuation + Random Forest warm start with CUDA auto-detection on RTX 5070 Ti
  • R² = 0.884 (RF, test MAE $12.04) after removing data leakage from payment-derived features
  • HCC risk scores via NPI+year join; 10-feature set with clinical HCPCS bucketing

Peruse AI (Open Source)

Feb 2026
Python, Playwright, Ollama, LM Studio, Jina
Autonomous VLM web agent on PyPI
  • Local-first perceive-plan-act loop: dual-channel perception (DOM + visual) with VLM decision-making via Playwright
  • 5-strategy VLM response parsing fallback for malformed JSON from local models
  • Concurrent focus groups: multiple personas (UX designer, accessibility auditor, data analyst) with independent browser instances
  • Smart loop recovery: detects 7+ repeated actions, issues progressive nudges with element blocking

APEX (Algorithmic Trading System)

Mar 2026
Python, DuckDB, LightGBM, Alpaca, FRED, SEC EDGAR
Multi-signal ensemble for US equities
  • 8-layer pipeline: raw sources (Alpaca, FRED, SEC EDGAR, Finnhub) through signal engineering (47 features) to DuckDB feature store to execution
  • 5 production ingestors with BaseIngestor pattern, rate limiting, and INSERT OR REPLACE deduplication
  • 9 sequential circuit breaker risk gates: confidence, position, sector, correlation, drawdown, VIX, liquidity, streak
  • Half-Kelly position sizing with calibrated probability inputs; vectorbt backtesting engine
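
The half-Kelly sizing bullet above can be sketched with the textbook Kelly formula for a binary outcome. The function name `half_kelly_fraction` and the example inputs are hypothetical, and the APEX implementation may differ (for instance in how it caps or scales the fraction).

```python
def half_kelly_fraction(p_win: float, payoff_ratio: float) -> float:
    """Return half of the Kelly-optimal capital fraction for a binary bet.

    p_win        : calibrated probability that the trade wins
    payoff_ratio : average win divided by average loss (b in f* = p - q / b)
    Negative-edge bets are sized at zero rather than shorted.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be in [0, 1]")
    if payoff_ratio <= 0.0:
        raise ValueError("payoff_ratio must be positive")
    full_kelly = p_win - (1.0 - p_win) / payoff_ratio
    return max(0.0, 0.5 * full_kelly)


# A 60% win probability at even payoff gives a 20% full-Kelly stake,
# so half-Kelly allocates roughly 10% of capital.
stake = half_kelly_fraction(p_win=0.60, payoff_ratio=1.0)
```

Halving the Kelly fraction trades some expected growth for lower variance, which matters when the probability inputs are only approximately calibrated.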

Ball View

Feb 2026
FastAPI, YOLOv8, OpenCV, Kalman Filter, EasyOCR, Docker/CUDA
Real-time cricket ball tracking
  • Real-time CV pipeline: browser capture at 30 FPS via WebSocket, YOLOv8n detection, Kalman Filter trajectory tracking (20-frame history)
  • OCR-to-match sync: EasyOCR reads scoreboard every 30 frames, fuzzy-matches against Cricsheet ball-by-ball JSON
  • Deployed on NVIDIA Docker (CUDA 12.4) with GPU passthrough on RTX 5070 Ti

Horizon League Budget Dashboard

Jan – Feb 2026
Next.js, Azure SQL, Entra ID, GitHub Actions
Budget benchmarking for 11 NCAA D1 universities
  • Replaced Power BI ($250/mo for 20 users) with Next.js on Azure Static Web Apps ($59/mo): 8 analytics pages with Recharts and TanStack React Table
  • Azure SQL star schema (11 dimensions, 7 facts) with Python ETL, Managed Identity for credential-free access, multi-tenant Entra ID auth for 11 institutional tenants

Open Export (Open Source)

Feb 2026
Python, Playwright, Click, Rich
ChatGPT conversation archival CLI tool
  • Published CLI tool to PyPI: Chrome DevTools Protocol connection, paginated API, tree-based message linearization, JSON + Markdown export with SHA-256 deduplication

PocketLedger

Mar 2026
FastAPI, Tesseract OCR, OpenCV, Ollama, SQLite
Local-first personal finance tracker
  • OCR-powered bank statement parser with auto-detecting bank routing (Chase, BofA, Discover), image preprocessing pipeline, and dual categorization engine (SQL pattern matching + Ollama LLM fallback)