Raj

CoverDrive Cricket

Status: live

Analyst-grade IPL match intelligence for the serious cricket fan

7 progress reports
ETL · Research · Engineering · Modeling · Launch
Next.js 16 · TypeScript · Tailwind CSS 4 · Recharts · Clerk · FastAPI · Python 3.11+ · SQLAlchemy 2.0
Overview

What It Is

CoverDrive Cricket is a public-facing IPL analytics platform that covers the full match lifecycle: pre-match previews, live scoring, and post-match analysis. It targets serious cricket fans and analysts who want deeper insight than traditional scorecards provide, without crossing into betting territory.

The platform serves contextual match intelligence: who is likely to perform, why venue and conditions matter, how teams match up historically, and what the numbers say about probable outcomes.

The system is a two-tier web application with background data pipelines:

  • Frontend (Next.js App Router on Vercel): Dynamic match pages with ISR, authenticated user tiers via Clerk, interactive charts (Recharts), and SEO-optimized routes with JSON-LD structured data.
  • Backend (FastAPI on Railway): 21 API routers, 28 service modules, async PostgreSQL via SQLAlchemy 2.0, Redis caching with stale-while-revalidate patterns, and in-process background scheduling (APScheduler).
  • Database: PostgreSQL (Supabase) with 24+ tables, anchored by a ball-by-ball delivery table (~2M+ rows) that powers all derived analytics.
  • Background Jobs: Cache warming (2h cycles), live score polling (30s during match windows), automated post-match data ingestion, and form snapshot pre-computation.
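The stale-while-revalidate pattern named in the backend bullet can be sketched in a few lines. This is a minimal in-memory stand-in, not CoverDrive's Redis code: the `SWRCache` class, its `fresh_for`/`stale_for` windows, and the explicit `revalidate` step (which a real deployment might run from a background scheduler job) are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Entry:
    value: Any
    stored_at: float


class SWRCache:
    """In-memory stand-in for a Redis-backed stale-while-revalidate cache (hypothetical)."""

    def __init__(self, fresh_for: float, stale_for: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fresh_for = fresh_for    # seconds an entry is served as fresh
        self.stale_for = stale_for    # extra seconds it may still be served while stale
        self.clock = clock
        self.entries: Dict[str, Entry] = {}
        self.pending_refresh: set = set()

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self.entries.get(key)
        if entry is None or now - entry.stored_at > self.fresh_for + self.stale_for:
            # Miss, or too old to serve at all: load synchronously.
            value = loader()
            self.entries[key] = Entry(value, now)
            return value
        if now - entry.stored_at > self.fresh_for:
            # Stale but still servable: answer immediately, refresh later.
            self.pending_refresh.add(key)
        return entry.value

    def revalidate(self, loaders: Dict[str, Callable[[], Any]]) -> None:
        """Refresh every stale key; in production this would run off the request path."""
        for key in list(self.pending_refresh):
            self.entries[key] = Entry(loaders[key](), self.clock())
            self.pending_refresh.discard(key)
```

The point of the pattern is that a user never waits on a slow rebuild while a usable (slightly stale) value exists; only a true miss or an expired entry pays the load cost inline.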

Data flows through four ingestion channels:

  1. Cricsheet (primary historical source): Open-license JSON ball-by-ball data for IPL and five other T20 leagues (BBL, CPL, PSL, SA20, T20I). Parsed into delivery-level records with idempotent upserts. The parser also captures impact player substitution events.
  2. SportMonks Cricket API: Real-time live scores, fixtures, standings, and confirmed playing XIs. Polled every 30 seconds during IPL match windows (1:30 PM to 11:30 PM IST).
  3. Open-Meteo Weather API: Venue-specific temperature, humidity, and dew point data, refreshed every 15 minutes. Feeds the dew index computation.
  4. Manual seeds: Coach tenures, venue metadata, and season rosters maintained as CSV files, ingested via seed scripts.

Pre-computed lookup tables (strike rate percentiles, era baselines, matchup priors, context factors) are built offline and stored as JSON artifacts used at query time.

Key Features

Monetization

Freemium model with usage gates. Free tier provides basic match previews and scores. Paid tier (IPL Season Pass via Razorpay) unlocks full analytics depth: detailed matchups, adjusted metrics, H2H deep views, and AI narratives. Gates are enforced at the API level with per-user tracking.


What It Does Not Do

  • No betting odds, tips, or wagering signals. Win probability is framed as match context, not a betting tool.
  • No real-time video or highlights.
  • No fantasy league integration or team-building tools.
  • No social features (comments, forums, user-generated content).

Current State

The platform is live and operational for IPL 2026. Core analytics (previews, live scores, post-match, profiles, win probability, adjusted metrics) are all in production. Background pipelines run autonomously during the IPL season window.

Active development areas include cross-league normalization (using other T20 leagues to enrich IPL-specific models), advanced form modeling, and deeper team structure analysis through network-based approaches.

Progress Reports
Report #7 of 7
CoverDrive Cricket: Phase A, Cross-League Lambda Research
Apr 3 - Apr 6, 2026

Devlog

With the win probability v2 engine live and user analytics flowing via PostHog, I started the research phase that underpins everything planned for the next generation of CoverDrive's analytics: estimating league quality coefficients that normalize non-IPL T20 performance into IPL-equivalent units.

The core problem is straightforward. A player who averages 35 in the BBL and another who averages 35 in the IPL have not necessarily shown the same ability. League difficulty varies, and without accounting for it, any cross-league enrichment (using BBL data to inform IPL predictions for players who play in both) introduces systematic bias. The goal is a set of coefficients across six leagues (IPL as reference, plus BBL, CPL, PSL, SA20, and T20I) covering batting phases and bowling types.

The estimation methodology uses a Bayesian approach with Beta-distributed priors. Players who have competed in multiple leagues serve as the bridge population. Their performance differentials across leagues, after controlling for phase and opposition, inform the posterior distributions. The quality gate is a coefficient of variation below 0.25 for all cells with more than 50 overlap players.
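One way to read "Beta priors plus bridge-player evidence" is a Beta-Binomial model on a per-ball rate, with the league coefficient defined as a ratio of rates. The sketch below is that interpretation only, not the estimation code described here: the choice of outcome (dismissals, boundaries, or anything else per ball), the `Beta(2, 8)` prior, the ratio definition of lambda, and all names are assumptions. It does apply the stated quality gate: CV below 0.25 for every cell with more than 50 overlap players.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class CellEvidence:
    """Pooled outcomes for the bridge (overlap) players in one league x phase cell.

    Hypothetical: "events" could be dismissals or boundaries per ball; the
    outcome actually modeled is not specified in the text.
    """
    n_overlap_players: int
    league_events: int
    league_balls: int
    ipl_events: int
    ipl_balls: int


def lambda_posterior(cell, prior_a=2.0, prior_b=8.0, draws=4000, seed=0):
    """Return (posterior mean, coefficient of variation) of lambda = rate_league / rate_ipl.

    Each per-ball rate gets a Beta(prior_a, prior_b) prior, updated conjugately
    with the observed events (Beta-Binomial); the ratio is then sampled.
    """
    rng = random.Random(seed)
    a_league = prior_a + cell.league_events
    b_league = prior_b + cell.league_balls - cell.league_events
    a_ipl = prior_a + cell.ipl_events
    b_ipl = prior_b + cell.ipl_balls - cell.ipl_events
    samples = [
        rng.betavariate(a_league, b_league) / rng.betavariate(a_ipl, b_ipl)
        for _ in range(draws)
    ]
    mean = statistics.fmean(samples)
    return mean, statistics.stdev(samples) / mean


def passes_quality_gate(cells, max_cv=0.25, min_overlap=50):
    """Every cell with more than `min_overlap` bridge players must have CV below `max_cv`."""
    return all(
        lambda_posterior(cell)[1] < max_cv
        for cell in cells
        if cell.n_overlap_players > min_overlap
    )
```

A cell with thousands of pooled balls produces a tight posterior and passes easily; a cell with a few dozen balls produces a wide one and fails the gate if it has enough overlap players to be gated at all.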

I built the Cricsheet multi-league ingest pipeline to pull ball-by-ball data for all five non-IPL leagues alongside the existing IPL data. Player identity resolution across leagues was a significant effort: the same player can appear under different name formats across Cricsheet files (full name vs initials, different transliterations). I implemented fuzzy matching with manual override aliases for known problem cases.
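A minimal sketch of "fuzzy matching with manual override aliases", assuming Python's standard-library `difflib` as the matcher (the actual matcher is not named). The player names, the `MANUAL_ALIASES` entry, the 0.85 cutoff, and `resolve_player` are all hypothetical. Order matters: the manual override wins, then an exact match after normalization, then the fuzzy fallback, and anything unresolved returns `None` so it can be queued for review instead of being silently mis-assigned.

```python
import difflib
import unicodedata
from typing import List, Optional

# Hypothetical manual overrides, keyed by normalized spelling. Real entries would
# cover known problem cases that fuzzy matching gets wrong or misses entirely.
MANUAL_ALIASES = {
    "r verma": "Rahul Verma",
}


def normalize(name: str) -> str:
    """Fold accents to ASCII, drop dots, lowercase, and collapse whitespace."""
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return " ".join(folded.replace(".", "").lower().split())


def resolve_player(raw: str, canonical: List[str], cutoff: float = 0.85) -> Optional[str]:
    """Map a raw Cricsheet name to a canonical player name, or None for manual review."""
    key = normalize(raw)
    if key in MANUAL_ALIASES:  # manual override always wins
        return MANUAL_ALIASES[key]
    by_key = {normalize(name): name for name in canonical}
    if key in by_key:  # exact match after normalization
        return by_key[key]
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
    return by_key[close[0]] if close else None
```

Initials are the classic case fuzzy matching cannot rescue ("r verma" vs "rahul verma" scores well under 0.85), which is why the alias table exists alongside the matcher.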

The lambda_coefficients table and supporting migration are in place on Supabase. Alongside this research work, I also shipped rain delay and play-stopped status handling from SportMonks, and fixed some edge cases in live match detection and league filtering.

This phase is a prerequisite for everything on the roadmap: the form model upgrades, the hierarchical matchup matrix, and the network centrality research all depend on being able to fold cross-league evidence into IPL-specific predictions without introducing league-quality bias.

What's next: Running the Bayesian estimation, validating posterior distributions, and publishing coefficients to the lambda table. After that, Phase B begins with advanced form modeling and trajectory estimation.

Changelog

Added

  • Add Cricsheet ingest pipeline for BBL, CPL, PSL, SA20, and T20I leagues
  • Add player identity resolution with fuzzy matching and manual alias overrides
  • Add lambda_coefficients table and Alembic migration on Supabase
  • Add Bayesian lambda estimation framework with Beta priors and player-overlap evidence
  • Add cross-league delivery and match tables for multi-league data storage
  • Add rain delay and play-stopped status handling from SportMonks

Changed

  • Update Cricsheet parser to support non-IPL league data formats

Fixed

  • Fix live match detection edge cases with league filtering
  • Fix win probability persistence for interrupted matches
  • Fix bowling XI display for confirmed lineups
bayesian · cricsheet · cross-league · supabase · research · player-identity-resolution · fuzzy-matching · lambda-coefficients · bbl · cpl · psl · sa20 · t20i · data-pipeline · cricket-analytics · normalization
Report #6 of 7
CoverDrive Cricket: Win Probability v2, 3-Stage Bayesian Engine
Apr 1 - Apr 2, 2026

Devlog

With the landing page and tournament stats polished, I replaced the v1 win probability model with a 3-stage Bayesian engine that updates across the full match lifecycle: pre-match, post-toss, and in-play.

Stage 1 is a logistic regression with isotonic calibration. On 2024 held-out data it achieves a Brier score of 0.194, which is solid for a pre-match estimate with no toss or lineup information. Stage 2 applies a Bayesian log-odds update after the toss, incorporating toss advantage, XI quality delta (when confirmed lineups are available), and matchup surplus from the shrinkage engine. Stage 3 is a NumPy-vectorized Monte Carlo engine running thousands of simulations per update, producing ball-by-ball win probability estimates during the match. Latency stays low enough to keep it viable for live updates without blocking the scorecard rendering.
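The Stage 2 idea is easiest to see on the log-odds scale: convert the Stage 1 probability to a logit, add weighted evidence terms, and convert back. The sketch assumes a simple additive form; the `post_toss_update` name and the weights are hypothetical, and Stage 3's Monte Carlo engine is not shown. The XI term is skipped when lineups are unconfirmed, mirroring the "when confirmed lineups are available" condition.

```python
import math
from typing import Optional, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def post_toss_update(
    p_pre: float,
    toss_shift: float,
    xi_delta: Optional[float] = None,
    matchup_surplus: float = 0.0,
    weights: Tuple[float, float, float] = (1.0, 0.5, 0.3),  # hypothetical coefficients
) -> float:
    """Shift a Stage 1 probability on the log-odds scale by toss, XI and matchup evidence."""
    w_toss, w_xi, w_matchup = weights
    shift = w_toss * toss_shift + w_matchup * matchup_surplus
    if xi_delta is not None:  # only when confirmed lineups are available
        shift += w_xi * xi_delta
    return sigmoid(logit(p_pre) + shift)
```

Working in log-odds keeps the result inside (0, 1) no matter how many terms are added, and a zero shift returns the Stage 1 estimate unchanged.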

Each stage has its own Redis cache with appropriate TTLs: pre-match predictions cache until toss, post-toss until match start, and in-play on a short TTL to stay fresh during live matches. The caching was essential because Stage 1 and 2 predictions do not change once computed for a given match, but Stage 3 needs to update frequently without re-running the earlier stages.
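The lifecycle-based TTLs can be expressed as a small policy function whose result would be passed as the expiry when writing to Redis. A sketch under assumptions: the stage names, the 30-second in-play TTL, and `ttl_for_stage` are hypothetical; only the "until toss / until match start / short" rule comes from the text.

```python
from datetime import datetime, timedelta, timezone

IN_PLAY_TTL_SECONDS = 30  # hypothetical "short" in-play TTL


def ttl_for_stage(stage: str, now: datetime, toss_at: datetime, start_at: datetime) -> int:
    """Seconds a cached prediction for `stage` should live (e.g. as a Redis expiry)."""
    if stage == "in_play":
        return IN_PLAY_TTL_SECONDS
    if stage == "pre_match":
        expires_at = toss_at      # invalidated once the toss is known
    elif stage == "post_toss":
        expires_at = start_at     # invalidated once play starts
    else:
        raise ValueError(f"unknown stage: {stage}")
    # Never return a non-positive TTL; a window that has already passed gets 1 second.
    return max(1, int((expires_at - now).total_seconds()))


if __name__ == "__main__":
    now = datetime(2026, 4, 5, 12, 0, tzinfo=timezone.utc)
    toss = now + timedelta(hours=2)
    print(ttl_for_stage("pre_match", now, toss, toss + timedelta(minutes=30)))  # 7200
```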

The win probability chart on the match page now shows the full 0-to-20-over progression. For completed matches, the chart renders from stored history rather than re-computing, and I enriched the stored data with full game state (score, wickets, required rate, bowling figures) so post-match analysis can show what drove probability shifts at each point.

I also integrated PostHog for event tracking and added an in-app feedback collection layer. Understanding how users interact with the analytics surfaces will inform what to build next.

One thing I learned: the isotonic calibration step matters more than the model complexity. A well-calibrated simple model produces better probability estimates than a complex uncalibrated one. The v1 model had raw outputs that were overconfident in close matches. Isotonic calibration flattened that bias significantly.
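To make the calibration step concrete, here is a standalone Pool Adjacent Violators (PAV) fit, the standard algorithm behind isotonic regression, plus the Brier score used for evaluation. This is an illustrative sketch, not the production code; in practice scikit-learn's `IsotonicRegression` is the usual tool and it interpolates between thresholds, whereas this version uses a plain step lookup. All function names are hypothetical.

```python
import bisect


def isotonic_fit(scores, outcomes):
    """Pool Adjacent Violators: fit a non-decreasing map from raw score to observed win rate."""
    blocks = []  # each block: [sum of outcomes, count, highest raw score in the block]
    for score, won in sorted(zip(scores, outcomes)):
        blocks.append([float(won), 1, score])
        # Merge backwards while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values


def calibrate(score, thresholds, values):
    """Step-function lookup; scores above the last threshold reuse the last value."""
    i = bisect.bisect_left(thresholds, score)
    return values[min(i, len(values) - 1)]


def brier(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
```

Because the fitted map is monotone, calibration never reorders matches by strength; it only pulls overconfident raw outputs toward the win rates actually observed at those scores.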

What's next: Starting the Phase A research pipeline for cross-league lambda coefficient estimation, which will normalize performance metrics from other T20 leagues into IPL-equivalent units.

Changelog

Added

  • Add win probability v2: 3-stage Bayesian engine (pre-match, post-toss, in-play)
  • Add Stage 1: logistic regression with isotonic calibration
  • Add Stage 2: Bayesian log-odds update for toss, XI delta, and matchup surplus
  • Add Stage 3: NumPy-vectorized Monte Carlo engine for ball-by-ball live updates
  • Add per-stage Redis caching with lifecycle-appropriate TTLs
  • Add win probability chart with full 0-20 over x-axis progression
  • Add stored win prob history for completed matches with enriched game state
  • Add PostHog analytics integration for user behavior tracking
  • Add in-app feedback collection layer

Changed

  • Update win prob chart polling interval from 30 seconds to 90 seconds
  • Update match page to serve win prob chart from stored history for completed matches
bayesian · monte-carlo · isotonic-calibration · numpy · posthog · win-probability · live-chart · redis · cricket-analytics · logistic-regression · model-evaluation · brier-score · user-analytics · feedback
Report #5 of 7
CoverDrive Cricket: Landing Page, Stats & Social Export
Mar 28 - Mar 31, 2026

Devlog

With live scoring and post-match analysis shipped, I turned to making the product feel complete from the outside. This was the "make it feel like a real product" milestone: four iterations of the landing page, a tournament statistics page, the teams hub, a social media export feature, and a design system overhaul.

The landing page went through v1 (basic hero), v2 (linked match preview zone), v3 (compact layout with arrow navigation), and finally v4. The current state has a hero section, a match zone that auto-selects the next upcoming match (not the first completed one, which was a surprisingly common UX mistake), dynamic season snapshot tiles (matches analyzed, top run scorer, top wicket taker), an SVG-based cap race visualization, and a live points table pulled from the API. The points table rows link directly to team profiles. Most of this data is now dynamic from the season snapshot API endpoint, cached for 5 minutes.
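The auto-select rule is small but easy to get wrong. A sketch, assuming a hypothetical `Match` record with a `completed` flag: taking the earliest match that has not completed picks a live match first (it started earliest), then the next upcoming one, and only falls back to the most recent completed match when nothing is pending.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Match:
    slug: str
    starts_at: datetime
    completed: bool


def pick_featured_match(matches: List[Match]) -> Optional[Match]:
    """Earliest non-completed match (live, then next upcoming); else the latest completed one."""
    pending = [m for m in matches if not m.completed]
    if pending:
        return min(pending, key=lambda m: m.starts_at)
    completed = [m for m in matches if m.completed]
    return max(completed, key=lambda m: m.starts_at) if completed else None


if __name__ == "__main__":
    t0 = datetime(2026, 4, 5, 14, 0, tzinfo=timezone.utc)
    schedule = [
        Match("yesterday", t0 - timedelta(days=1), True),   # the "first completed" trap
        Match("tomorrow", t0 + timedelta(days=1), False),
    ]
    print(pick_featured_match(schedule).slug)  # tomorrow
```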

The tournament statistics page surfaces 14 leaderboard categories across batting and bowling dimensions. The teams hub shows all active IPL franchises as cards with official brand colors, linking to squad pages with role-based filtering and inline player stats modals.

I also shipped the post-match social media export: a 4-slide square carousel (hero summary, worm chart, partnerships, scorecard) rendered for download and sharing. The initial implementation used html2canvas, but it had letter-spacing displacement issues and CORS problems with cross-origin fonts. Switching to html-to-image with a double-pass rendering strategy (resource priming on the first pass, capture on the second) fixed both issues.

The light mode theming and design system v5 shipped alongside this work. The system uses CSS variables with attribute-based switching and localStorage persistence. Having a consistent token system made the landing page iterations much faster since each v2/v3/v4 pass only needed to adjust layout, not re-specify colors.

What's next: Replacing the v1 win probability model with a 3-stage Bayesian engine that updates across the full match lifecycle.

Changelog

Added

  • Add landing page v4: hero, match zone with auto-select, cap race SVGs, dynamic points table
  • Add season snapshot API with top scorers, leading bowlers, and match count tiles
  • Add tournament statistics page with 14 leaderboard categories
  • Add teams hub with franchise cards and official brand colors
  • Add squad pages with role-based filtering and inline player stats modals
  • Add post-match social media export: 4-slide square carousel
  • Add light mode theming with CSS variable tokens and localStorage persistence
  • Add design system v5 with consistent token naming convention

Changed

  • Update landing page to auto-select next upcoming match instead of first completed
  • Update match zone to support upcoming, live, and recent match tabs
  • Update points table to link team rows to team profile pages

Fixed

  • Fix letter-spacing displacement in social export by replacing html2canvas with html-to-image
  • Fix CORS font loading in export by implementing double-pass rendering (resource priming)
nextjs · recharts · html-to-image · design-system · tailwind · landing-page · leaderboards · teams-hub · social-media-export · light-mode · css-variables · svg · cap-race · points-table · season-snapshot
Report #4 of 7
CoverDrive Cricket: Live Scoring & SportMonks Migration
Mar 27 - Mar 30, 2026

Devlog

With the analytics surfaces and monetization in place, I tackled the biggest infrastructure pivot so far: migrating the entire live data layer from CricketData.org to SportMonks Cricket API v2. The previous live feed was unreliable during peak match hours and lacked structured lineup data. SportMonks provides confirmed playing XIs, toss details, ball-by-ball scoring, and standings sync, all through a single token-authenticated API.

The live scorecard now polls every 30 seconds during IPL match hours (1:30 PM to 11:30 PM IST). Outside that window, polling stops entirely to conserve API quota. Each poll result is cached in Redis with a 30-second TTL. The scorecard UI shows innings tabs (defaulting to the current innings), batting and bowling cards with extras handling, toss information, playing XI from the lineup field, and a "yet-to-bat" row.
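The polling window check can be sketched as a pure function, which also makes it easy to test. Assumptions: the names are hypothetical, the window is treated as inclusive at both ends, and IST is modeled as a fixed UTC+5:30 offset (India does not observe daylight saving time). How the result is wired into the scheduler is not shown.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Optional

IST = timezone(timedelta(hours=5, minutes=30))  # fixed offset; IST has no DST
WINDOW_START = time(13, 30)  # 1:30 PM IST
WINDOW_END = time(23, 30)    # 11:30 PM IST
POLL_INTERVAL_SECONDS = 30


def in_match_window(now: datetime) -> bool:
    """`now` must be timezone-aware; a naive datetime would be read as server-local time."""
    local = now.astimezone(IST).time()
    return WINDOW_START <= local <= WINDOW_END


def next_poll_delay(now: datetime) -> Optional[int]:
    """Seconds until the next live-score poll, or None to stop polling outside the window."""
    return POLL_INTERVAL_SECONDS if in_match_window(now) else None
```

Converting to IST before comparing matters because the server (Railway) is unlikely to run in Indian time, and 1:30 PM IST is 08:00 UTC.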

I also built the full post-match analysis view. Once a match completes, the system generates innings charts from ball-by-ball data: worm diagrams (cumulative run progression), partnership breakdowns, bowler economy heatmaps, dot-ball pressure sequences, and phase-wise scoring summaries. Chart data comes from SportMonks as the primary source, with Cricsheet as a fallback for when SportMonks data is delayed. Key performers and match verdicts are auto-generated from the delivery data.
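Two of these charts reduce to simple folds over the delivery stream. A minimal sketch, assuming each ball arrives as a hypothetical `(over_index, runs_on_ball, wicket_fell)` tuple in delivery order: the worm is the running total at the end of each over, and partnerships are the runs accumulated between wickets.

```python
from itertools import accumulate


def worm(balls):
    """Cumulative team score at the end of each over.

    `balls` is a list of (over_index, runs_on_ball, wicket_fell) in delivery order.
    """
    per_over = {}
    for over, runs, _ in balls:
        per_over[over] = per_over.get(over, 0) + runs
    return list(accumulate(per_over[o] for o in sorted(per_over)))


def partnerships(balls):
    """Runs added for each wicket; the last entry is the final (possibly unbroken) stand.

    Runs scored on the ball a wicket falls are credited to the stand that ended.
    """
    stands, current = [], 0
    for _, runs, wicket in balls:
        current += runs
        if wicket:
            stands.append(current)
            current = 0
    stands.append(current)
    return stands
```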

The trickiest part was the data source fallback logic. SportMonks provides ball-by-ball data in a different structure than Cricsheet, so the chart service needed to normalize both formats into a common internal representation. I also had to stop post-match chart pages from running into SportMonks rate limits: completed match data does not change, so I now cache it for 7 days instead of re-fetching it on every page load.
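A sketch of the "common internal representation" idea for the Cricsheet side. The input keys (`innings`, `overs`, `over`, `deliveries`, `batter`, `bowler`, `runs.total`, `extras`, `wickets`) follow Cricsheet's published JSON format; the `Ball` record and `from_cricsheet` are hypothetical. A SportMonks adapter would map onto the same `Ball` type, but its payload shape is not described here, so it is not shown. Super-over innings are not treated specially.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Ball:
    """Source-agnostic delivery record that chart code can consume."""
    innings: int
    over: int          # 0-based over index, as in Cricsheet JSON
    batter: str
    bowler: str
    runs_total: int    # bat runs plus extras
    is_legal: bool     # wides and no-balls do not count as legal deliveries
    is_wicket: bool


def from_cricsheet(match: dict) -> List[Ball]:
    """Flatten a parsed Cricsheet JSON match into Ball records."""
    balls = []
    for number, innings in enumerate(match["innings"], start=1):
        for over in innings["overs"]:
            for d in over["deliveries"]:
                extras = d.get("extras", {})
                balls.append(Ball(
                    innings=number,
                    over=over["over"],
                    batter=d["batter"],
                    bowler=d["bowler"],
                    runs_total=d["runs"]["total"],
                    is_legal="wides" not in extras and "noballs" not in extras,
                    is_wicket=bool(d.get("wickets")),
                ))
    return balls
```

Keeping the chart code dependent only on `Ball` is what makes the fallback cheap: switching sources changes which adapter runs, not how any chart is computed.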

Live match detection was more nuanced than I expected. A match can be in progress, not started, finished, or in a rain delay, and each state requires different UI treatment and different polling behavior.
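One way to keep those per-state differences in one place is an explicit state enum mapped to a policy. A minimal sketch: the state names and every number except the 30-second in-play cadence (stated earlier in this report) are hypothetical, as are the view names; mapping raw API status strings onto these states is not shown.

```python
from enum import Enum
from typing import NamedTuple, Optional


class MatchState(Enum):
    NOT_STARTED = "not_started"
    IN_PROGRESS = "in_progress"
    RAIN_DELAY = "rain_delay"
    FINISHED = "finished"


class StatePolicy(NamedTuple):
    poll_seconds: Optional[int]  # None means stop polling
    view: str                    # which UI treatment to render


# Hypothetical values; only the 30-second in-progress cadence comes from the text.
POLICIES = {
    MatchState.NOT_STARTED: StatePolicy(300, "preview"),
    MatchState.IN_PROGRESS: StatePolicy(30, "live_scorecard"),
    MatchState.RAIN_DELAY: StatePolicy(120, "live_scorecard_with_delay_banner"),
    MatchState.FINISHED: StatePolicy(None, "post_match_analysis"),
}


def policy_for(state: MatchState) -> StatePolicy:
    """Look up polling and UI behavior for a state; a missing entry raises KeyError."""
    return POLICIES[state]
```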

What's next: Redesigning the landing page with a proper match zone, building tournament stats leaderboards, and launching the teams hub.

Changelog

Added

  • Add SportMonks Cricket API v2 integration with token-based authentication
  • Add live scorecard with 30-second polling during IPL match hours (1:30-11:30 PM IST)
  • Add playing XI display from SportMonks lineup data
  • Add toss information display on live scorecard
  • Add yet-to-bat row in batting card
  • Add post-match analysis view with innings charts (worm, partnerships, heatmap, dot pressure, phase scoring)
  • Add key performers and match verdict auto-generation
  • Add auto-sync for standings and results from SportMonks
  • Add Cricsheet fallback for ball-by-ball chart data when SportMonks is delayed

Changed

  • Update chart data source priority: SportMonks primary, Cricsheet fallback
  • Update completed match chart cache TTL from 30 seconds to 7 days

Fixed

  • Fix SportMonks rate limiting on post-match chart pages for completed matches
  • Fix extras handling in live scorecard (wides, no-balls, byes, leg-byes)

Removed

  • Remove CricketData.org live feed integration
sportmonks · live-data · redis · cricket-analytics · post-match-analysis · worm-chart · partnerships · ball-by-ball · innings-charts · data-source-migration · api-integration · caching
Report #3 of 7
CoverDrive Cricket: Profiles, Narratives & Monetization
Mar 24 - Mar 26, 2026

Devlog

Three major systems shipped this week, and they all needed to land together: player and team profiles, AI-generated match narratives, and the full monetization stack. The freemium model only works if there is enough depth behind the paywall to justify upgrading, so the analytics surfaces had to be in place before the gate went live.

Player profiles aggregate career and season stats with venue splits (home vs away), phase-wise breakdowns (powerplay, middle, death), batting position distribution, percentile rankings against the active player pool, and dismissal patterns. I also built a peer exploration view with scatter plots comparing a player against their squad and opposition on key metrics. The H2H deep view shows batter-vs-bowler analysis: strike rate curves, venue-conditioned performance, and adjacent matchup context so you can see how a batter performs against similar bowler types, not just the specific bowler.

The narrative pipeline uses the Anthropic API to generate analytical bullets per match. Each bullet is typed by signal category (matchup, venue, dew, form, win probability, composition) so the frontend can render them contextually. There is a manual review step before publishing: narratives go into a staging table, get reviewed, and then a publish action pushes them live. This was a deliberate choice. Auto-publishing AI-generated cricket analysis felt risky given how specific and opinionated the domain is.
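The typed-bullet and review-gate workflow can be modeled as a tiny state machine. A sketch only: the enum values mirror the six signal categories listed above, but the `NarrativeBullet` record, the three status names, and the all-or-nothing `publish` rule are assumptions, not the actual staging table schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Signal(Enum):
    MATCHUP = "matchup"
    VENUE = "venue"
    DEW = "dew"
    FORM = "form"
    WIN_PROBABILITY = "win_probability"
    COMPOSITION = "composition"


class Status(Enum):
    STAGED = "staged"
    APPROVED = "approved"
    PUBLISHED = "published"


@dataclass
class NarrativeBullet:
    match_id: str
    signal: Signal  # lets the frontend render the bullet in the right context
    text: str
    status: Status = Status.STAGED


def approve(bullet: NarrativeBullet) -> None:
    """Manual review step: only staged bullets can be approved."""
    if bullet.status is not Status.STAGED:
        raise ValueError("only staged bullets can be approved")
    bullet.status = Status.APPROVED


def publish(bullets: List[NarrativeBullet]) -> None:
    """Publish a match's bullets only if every one has passed review; otherwise change nothing."""
    pending = [b for b in bullets if b.status is not Status.APPROVED]
    if pending:
        raise ValueError(f"{len(pending)} bullet(s) not yet approved")
    for b in bullets:
        b.status = Status.PUBLISHED
```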

For monetization, I integrated Clerk for authentication and Razorpay for payments. I initially set up Stripe alongside Razorpay but removed Stripe the same day: for an India-focused IPL product, Razorpay handles UPI and Indian card payments far better. The freemium model gates advanced analytics (H2H deep views, structural matchups, full player profiles) behind a season pass, with usage tracking enforced at the API level via Redis counters. One important decision: the gates fail open when Redis is unavailable. I would rather give everyone free access temporarily than return 503 errors during a live match.
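The fail-open decision is worth seeing in code because it is a single `except` branch. A sketch, assuming a counter store with an atomic increment (Redis `INCR` via redis-py's `incr` returns the new count). The key format, the free limit of 3, and `allow_request` are hypothetical, and key expiry is omitted. Note that redis-py raises its own `redis.exceptions.ConnectionError` (a `RedisError` subclass, not the built-in); this stand-in catches the built-in `ConnectionError` so it runs without Redis, and a real version would catch `redis.exceptions.RedisError`.

```python
from typing import Protocol


class CounterStore(Protocol):
    """Anything with an atomic increment that returns the new count, like Redis INCR."""
    def incr(self, key: str) -> int: ...


def allow_request(store: CounterStore, user_id: str, has_season_pass: bool,
                  free_limit: int = 3) -> bool:
    """Usage gate that fails open when the counter store is unreachable."""
    if has_season_pass:
        return True
    try:
        used = store.incr(f"usage:{user_id}")
    except ConnectionError:
        # Fail open: temporary free access beats a 503 during a live match.
        return True
    return used <= free_limit
```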

What's next: Migrating the live score feed to SportMonks Cricket API for real-time match data and building out the live scorecard UI.

Changelog

Added

  • Add player profile pages with stats, percentiles, venue splits, phase breakdowns, dismissal patterns
  • Add peer exploration scatter plots (squad and opposition comparisons)
  • Add H2H deep view: batter-vs-bowler SR curves, venue splits, adjacent matchup context
  • Add team profile pages with squad composition and trajectory charts
  • Add Claude narrative pipeline with typed bullets (matchup, venue, dew, form, win prob, composition)
  • Add narrative staging and publish workflow with manual review gate
  • Add Clerk authentication integration with email identification
  • Add Razorpay checkout flow with IPL Season Pass subscription
  • Add freemium usage gates with Redis-backed per-user rate limiting
  • Add user_plans table for subscription tracking

Changed

  • Update match preview to include narrative bullets section

Fixed

  • Fix 503 errors on match preview when Redis unavailable by failing usage gates open
  • Fix dew scale mismatch in narrative signal guards

Removed

  • Remove Stripe integration (replaced by Razorpay-only checkout)
clerk · razorpay · anthropic-api · fastapi · nextjs · player-profiles · h2h-analysis · narrative-generation · freemium · authentication · payments · redis · usage-gates · cricket-analytics
Report #2 of 7
CoverDrive Cricket: Adjusted Metrics & Win Probability v1
Mar 19 - Mar 23, 2026

Devlog

With the data pipeline stable, I turned to the analytics layer that differentiates CoverDrive from a basic scorecard app. This milestone shipped three proprietary metric systems: context-adjusted performance metrics, a win probability model, and a shrinkage-blended matchup engine.

The adjusted metrics (SR+, Avg+, Eco+) decompose each player's career numbers across four factors: era, venue, match phase (powerplay, middle, death), and opposition quality. The era boundary is set at 2023 because the Impact Player Rule fundamentally changed T20 batting patterns in that season. Venue and phase factors are computed from the delivery table, and opposition factors use season-level aggregates. The result is a set of metrics that let you compare a batter's powerplay strike rate at one ground against a different batter's death-overs economy at another, on a fair basis.
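One plausible way to combine four context factors into an index like SR+ is to compute the runs a baseline batter would be expected to score in exactly the same contexts, then express actual runs as a percentage of that (100 = baseline). The sketch below assumes multiplicative factors and a 130 baseline strike rate; neither the functional form, the baseline, nor the names come from the text. Eco+ would invert the ratio, since a lower economy is better.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContextSplit:
    """A batter's balls and runs in one context, with that context's factors (1.0 = neutral)."""
    balls: int
    runs: int
    era: float
    venue: float
    phase: float
    opposition: float


def sr_plus(splits: List[ContextSplit], baseline_sr: float = 130.0) -> float:
    """100 means exactly what a baseline batter would score in the same contexts."""
    actual_runs = sum(s.runs for s in splits)
    expected_runs = sum(
        s.balls * (baseline_sr / 100.0) * s.era * s.venue * s.phase * s.opposition
        for s in splits
    )
    return 100.0 * actual_runs / expected_runs
```

Summing expected runs per context, rather than averaging factors, weights each context by how many balls the batter actually faced there, which is what makes a powerplay-heavy opener comparable to a death-overs finisher.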

Win probability v1 is a logistic regression using team strength deltas, form differentials, bowling edge, historical head-to-head prior, toss advantage, venue fitness, and team composition metrics. It runs pre-match and updates post-toss. I also built the EWA form system (exponentially weighted average over recent innings) and a shrinkage matchup model that blends empirical batter-vs-bowler records with a smoothed prior matrix, keeping a minimum prior weight to avoid noisy small-sample estimates dominating.
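Both building blocks in that paragraph have compact textbook forms. A sketch, with the smoothing factor `alpha`, the shrinkage constant `k`, and the 0.2 prior-weight floor all hypothetical: the EWA gives each newer innings weight `alpha` and decays older ones geometrically, and the shrinkage blend gives the empirical matchup weight `balls / (balls + k)`, capped so the prior never falls below its minimum share.

```python
from typing import List


def ewa_form(scores: List[float], alpha: float = 0.3) -> float:
    """Exponentially weighted average of innings scores, ordered oldest to newest."""
    if not scores:
        raise ValueError("need at least one innings")
    form = float(scores[0])
    for score in scores[1:]:
        form = alpha * score + (1.0 - alpha) * form
    return form


def shrunk_matchup_sr(empirical_sr: float, balls: int, prior_sr: float,
                      k: float = 60.0, min_prior_weight: float = 0.2) -> float:
    """Blend a batter-vs-bowler strike rate with its prior-matrix value."""
    data_weight = balls / (balls + k)                       # grows with sample size
    data_weight = min(data_weight, 1.0 - min_prior_weight)  # keep the minimum prior weight
    return data_weight * empirical_sr + (1.0 - data_weight) * prior_sr
```

With `k = 60`, a matchup of 60 balls is trusted half-and-half; a 6-ball sample at a strike rate of 300 barely moves the estimate off the prior.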

The biggest performance lesson came from the predicted XI feature. The initial implementation made 88 database queries per match preview (one per player per metric). I refactored to batch queries, bringing cold preview builds down to about 23 queries, with warm reads hitting a single Redis GET. That N+1 fix was the difference between a 4-second page load and a sub-second one.
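The N+1 shape and its batched fix are easy to demonstrate with the standard library's `sqlite3` standing in for PostgreSQL. The `batting` table and both function names are hypothetical; in SQLAlchemy 2.0 the batched form would be a single `select(...).where(Model.player_id.in_(ids))`.

```python
import sqlite3
from typing import Dict, List


def stats_one_by_one(conn: sqlite3.Connection, player_ids: List[int]) -> Dict[int, tuple]:
    """The N+1 shape: one database round-trip per player."""
    out = {}
    for pid in player_ids:
        row = conn.execute("SELECT runs, balls FROM batting WHERE player_id = ?", (pid,)).fetchone()
        if row is not None:
            out[pid] = row
    return out


def stats_batched(conn: sqlite3.Connection, player_ids: List[int]) -> Dict[int, tuple]:
    """The batched shape: one query with a parameterized IN list covering every player."""
    if not player_ids:
        return {}
    placeholders = ", ".join("?" for _ in player_ids)
    rows = conn.execute(
        f"SELECT player_id, runs, balls FROM batting WHERE player_id IN ({placeholders})",
        list(player_ids),
    ).fetchall()
    return {pid: (runs, balls) for pid, runs, balls in rows}
```

Attaching a callback with `conn.set_trace_callback(...)` and counting executed statements is a quick way to confirm the difference: one statement per player versus one statement total.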

I also shipped team composition analysis: Batting Composition Index (how balanced the top 7 is), bowling depth (how many bowlers can complete their quota), and allrounder ratio. These feed into the win probability model as structural features, and they turned out to be more predictive than I expected.

What's next: Player profile pages with percentile rankings, a Claude-powered narrative pipeline for match storylines, and the monetization layer to gate the analytics behind a freemium paywall.

Changelog

Added

  • Add SR+, Avg+, Eco+ adjusted metrics with 4-factor decomposition (era, venue, phase, opposition)
  • Add win probability v1: logistic regression model with team strength and composition features
  • Add EWA form indicators with exponential weighting over recent innings
  • Add shrinkage matchup model with smoothed prior matrix blending
  • Add team composition metrics: Batting Composition Index, bowling depth, allrounder ratio
  • Add predicted XI engine with impact substitute optimization
  • Add expected SR and economy lookup table builders
  • Add prior matrix computation pipeline
  • Add form caching system in Redis
  • Add cache warming infrastructure with background APScheduler tasks

Changed

  • Update match preview to batch player queries (88 queries reduced to 23)

Fixed

  • Fix N+1 query pattern in adjusted metrics service for predicted XI cards
scikit-learn · redis · cricket-analytics · logistic-regression · adjusted-metrics · matchups · form-indicators · team-composition · predicted-xi · cache-warming · apscheduler · bayesian-shrinkage
Report #1 of 7
CoverDrive Cricket: MVP & Core Data Pipeline
Mar 18 - Mar 19, 2026

Devlog

I shipped the first working version of CoverDrive Cricket today: an IPL match preview platform with a full data pipeline, deployed to production.

The core bet was building around Cricsheet's open ball-by-ball JSON data as the foundation layer. I wrote a parser that ingests historical IPL data (2008 onward) into a PostgreSQL schema on Supabase, covering matches, players, teams, deliveries, venues, rosters, and standings. The schema is normalized enough to support the analytics I have planned, but not so rigid that adding new dimensions later will be painful. The delivery table alone holds over two million rows, and every downstream metric I want to build (adjusted stats, win probability, matchup matrices) will query against it.

The backend is FastAPI with async SQLAlchemy and Redis caching, deployed on Railway. The frontend is Next.js with TypeScript on Vercel. Match preview pages are the primary surface: for each upcoming IPL match, the page pulls team context, venue history, and basic player stats into a single view. I also wired up weather data from Open-Meteo and seeded coach tenures, venue metadata, and IPL 2026 rosters manually via CSV.

The harder-than-expected part was deployment. Railway's Python buildpack needed explicit version pinning to 3.12, the DATABASE_URL scheme had to be auto-corrected from `postgres://` to `postgresql+asyncpg://`, and Vercel builds were hanging because the API client had no timeout on startup health checks. Small things, but they ate hours. I also added a Procfile with pre-deploy Alembic migrations so schema changes apply automatically on push.
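The DATABASE_URL correction is needed because hosting providers commonly hand out `postgres://` URLs, while SQLAlchemy (since 1.4) only recognizes the `postgresql` dialect name, and an async engine additionally needs an async driver such as asyncpg. A minimal sketch; the function name is hypothetical, and also rewriting plain `postgresql://` is an extra assumption beyond the fix described.

```python
def normalize_database_url(url: str) -> str:
    """Rewrite Postgres URLs to the SQLAlchemy async driver form (postgresql+asyncpg://)."""
    for prefix in ("postgres://", "postgresql://"):
        if url.startswith(prefix):
            return "postgresql+asyncpg://" + url[len(prefix):]
    return url  # already names an explicit driver, or is not a Postgres URL
```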

One decision I am already glad I made: separating the Cricsheet ingest pipeline from the live data layer. The ingest is idempotent (upsert on conflict), so I can re-run it safely after every match without worrying about duplicates. That will matter when I add automated post-match ingestion.
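Idempotency comes from the conflict target: the natural key decides what "the same delivery" means, so re-running the ingest updates rows in place instead of duplicating them. A sketch using `sqlite3` as a runnable stand-in (SQLite 3.24+ accepts the same `ON CONFLICT ... DO UPDATE` syntax as PostgreSQL); the table and column names are hypothetical, and with SQLAlchemy on Postgres the equivalent is `sqlalchemy.dialects.postgresql.insert(...).on_conflict_do_update(...)`.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS deliveries (
    match_id   TEXT    NOT NULL,
    innings    INTEGER NOT NULL,
    ball_seq   INTEGER NOT NULL,
    runs_total INTEGER NOT NULL,
    PRIMARY KEY (match_id, innings, ball_seq)
)
"""

# Re-ingesting a delivery with the same natural key updates it instead of inserting a duplicate.
UPSERT = """
INSERT INTO deliveries (match_id, innings, ball_seq, runs_total)
VALUES (?, ?, ?, ?)
ON CONFLICT (match_id, innings, ball_seq) DO UPDATE SET runs_total = excluded.runs_total
"""


def ingest(conn: sqlite3.Connection, rows: List[Tuple[str, int, int, int]]) -> None:
    with conn:  # one transaction per batch: commits on success, rolls back on error
        conn.executemany(UPSERT, rows)
```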

What's next: Building out the analytics layer: adjusted batting and bowling metrics that account for era, venue, phase, and opposition context.

Changelog

Added

  • Add FastAPI backend with async SQLAlchemy 2.0 and Redis caching
  • Add Next.js 16 frontend with TypeScript and App Router
  • Add Cricsheet JSON parser for IPL ball-by-ball data (2008-2024)
  • Add PostgreSQL schema: matches, players, teams, deliveries, venues, rosters, standings
  • Add Open-Meteo weather pipeline for venue-specific conditions
  • Add match preview page at `/match/[season]/[slug]`
  • Add landing page with upcoming and recent match listings
  • Add coach tenure, venue metadata, and IPL 2026 roster CSV seeds
  • Add Railway deployment with Procfile and pre-deploy Alembic migrations
  • Add Vercel deployment with 10-second API client timeout

Fixed

  • Fix Railway build by pinning Python 3.12 and auto-correcting DATABASE_URL scheme
  • Fix Vercel build hangs by adding timeout to API health checks
fastapi · nextjs · cricsheet · supabase · railway · vercel · postgresql · redis · python · typescript · cricket-analytics · data-pipeline · alembic