
Ball View

in-progress

Real-time cricket ball tracking overlay for live broadcasts

2 progress reports
Phases: ETL · Research · Engineering · Modeling · Launch
Stack: FastAPI · OpenCV · YOLOv8 · EasyOCR · Kalman Filter · Chrome Extension (Manifest V3) · Tampermonkey Userscript · Canvas API
Overview

What It Is

Ball View is a real-time video analysis system that augments live cricket broadcasts with intelligent ball tracking and contextual match data. It captures video from the browser, processes it using computer vision (object detection and OCR), and renders analytical overlays back onto the video stream with minimal latency.

The system detects the cricket ball in broadcast footage, tracks its trajectory across frames, and synchronizes detections with ball-by-ball match data to display contextual information (current batter, bowler, score) alongside the visual tracking.


Architecture

A client-server model with three main components:

  • Browser Capture (Chrome Extension or Tampermonkey userscript): Captures video frames from a browser tab at approximately 30 FPS, downscales to 1280x720 JPEG, and sends binary frames via WebSocket to the backend.
  • Processing Server (FastAPI on Docker): Receives frames, runs object detection, spatial filtering, trajectory tracking, and periodic OCR. Returns JSON with ball position, trajectory history, pitch polygon, and match context.
  • Overlay Rendering: The browser client renders a transparent canvas on top of the video element, displaying ball position (green circle), trajectory lines (cyan), and a context box with player names.

Processing Pipeline

Each frame goes through five stages:

  1. Scene Classification: HSV color segmentation determines whether the frame shows active pitch play versus replays or crowd shots.
  2. Object Detection: YOLOv8 Nano identifies "sports ball" class with a low confidence threshold optimized for small cricket ball detection.
  3. Spatial Filtering: Detections outside the pitch polygon (derived from color segmentation) are rejected to eliminate false positives.
  4. Trajectory Tracking: A Kalman Filter (constant velocity model) smooths noisy detections and maintains a rolling 20-frame trajectory history.
  5. Match Context Sync: Every 30 frames, EasyOCR reads the scoreboard region. A data synchronizer fuzzy-matches extracted text against Cricsheet JSON to retrieve current batter, bowler, and run context.
Key Features

Data Sources

  • Cricsheet JSON: Ball-by-ball match data used for context synchronization. Currently loaded with IPL match data.
  • Browser Video: Real-time frame capture from any cricket streaming tab.

Current State

The core pipeline is functional: WebSocket communication, YOLO detection, Kalman Filter tracking, pitch detection, OCR integration, data synchronization, the Chrome Extension, and Docker containerization are all implemented. The project is in an active debugging phase, with ongoing work on pitch polygon stability, performance optimization, and evaluation of alternative OCR engines.

Progress Reports
Report #2 of 2
Ball View: Tracking, Vision & Docker Deployment
Feb 8, 2026

Devlog

Less than 20 hours after the initial commit, I shipped the second major build: trajectory tracking, intelligent scene understanding, match context synchronization, and GPU-accelerated Docker deployment. This turned Ball View from "detect the ball in a frame" into "track the ball across frames, understand what is happening in the match, and show it on screen."

The Kalman Filter uses a constant velocity model with four state dimensions (x, y, velocity_x, velocity_y). I tuned it to trust predictions more than measurements (higher measurement noise than process noise), which smooths out the jitter from YOLO giving slightly different bounding boxes frame to frame. Trajectory history keeps a rolling window of 20 positions for visualization. The tracking only updates when three conditions are met: a ball is detected, the frame shows an active pitch view, and the detection point falls inside the pitch polygon. This prevents the tracker from jumping to false positives in crowd shots or replay graphics.

Pitch detection uses HSV color segmentation rather than a deep learning model. It looks for beige/brown tones (the pitch strip) and green (the grass), classifies the frame as "active pitch view" based on color ratios, and extracts a polygon around the pitch area using morphological cleanup and contour analysis. This is simpler than training a pitch segmentation model and works well enough for the broadcast angles I have tested. It will need per-stadium HSV tuning for different grounds and lighting conditions.

The data synchronizer connects OCR output to Cricsheet ball-by-ball records. Every 30 frames, EasyOCR reads the bottom 18% of the frame (where scoreboards typically sit), extracts team names and over counts via fuzzy matching (handling OCR misreads like O-to-0 and L-to-1), and looks up the current delivery in an in-memory ball map built from the Cricsheet JSON. When it finds a match, the system knows the current batter, bowler, runs, and extras without needing a separate data feed.
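The cleanup-then-match step can be sketched with the standard library alone: normalize the classic OCR digit misreads, then fuzzy-match the noisy team string against known names. The helpers and thresholds here are illustrative, not the project's actual synchronizer.

```python
import difflib
import re

# Common scoreboard OCR confusions: letter O for zero, l/I for one
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})

def parse_over(raw: str):
    """'Over l2.3' -> (12, 3); None if no over counter is present."""
    cleaned = raw.translate(OCR_DIGIT_FIXES)
    m = re.search(r"(\d+)\.(\d)", cleaned)
    return (int(m.group(1)), int(m.group(2))) if m else None

def match_team(raw: str, teams):
    """Fuzzy-match a noisy OCR string against known team abbreviations."""
    hits = difflib.get_close_matches(raw.upper(), teams, n=1, cutoff=0.5)
    return hits[0] if hits else None

teams = ["CSK", "GT", "MI", "RCB"]
```

With team and over in hand, the `(team, over, ball)` triple indexes straight into the in-memory ball map built from the Cricsheet JSON.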

Docker deployment uses NVIDIA's PyTorch container image with CUDA 12.4 support. A PowerShell launcher script handles the container startup on WSL2 with GPU passthrough for the RTX 5070 Ti. I also built a debug visualization pipeline that saves annotated frames (pitch polygon, ball circle, trajectory lines) at regular intervals for offline analysis.

The system is functionally complete but still in a debugging phase. Pitch polygon stability needs temporal smoothing, and the OCR stage is CPU-bound, creating a bottleneck on the frames where it runs.

What's next: Improving pitch polygon stability via temporal smoothing, investigating PaddleOCR as a lighter alternative to EasyOCR, and performance optimization for the critical processing loops.

Changelog

Added

  • Add Kalman Filter trajectory tracking with constant velocity model and 20-frame history
  • Add pitch detection via HSV color segmentation (beige pitch + green grass thresholds)
  • Add scene classification: active pitch view vs replay/crowd based on color ratios
  • Add pitch polygon extraction using morphological cleanup and contour analysis
  • Add Cricsheet data synchronizer with OCR-to-ball-by-ball fuzzy matching
  • Add team and over detection from scoreboard OCR with error correction (O/0, L/1)
  • Add Docker containerization with NVIDIA CUDA 12.4 base image
  • Add PowerShell launcher for WSL2 GPU passthrough (RTX 5070 Ti)
  • Add debug visualization: annotated frames with pitch polygon, ball detection, trajectory
  • Add technical summary documentation

Changed

  • Update YOLO confidence threshold to 0.3 (optimized for small cricket ball detection)
  • Update OCR frequency from every 60 frames to every 30 frames for faster state sync
  • Update main processing pipeline with detection gating (pitch view + polygon containment)
  • Update frontend capture_client.js with improved stream lifecycle and DOM-attached video element

Fixed

  • Fix data synchronization fallback logic across innings
Tags: kalman-filter, pitch-detection, hsv-segmentation, data-synchronization, cricsheet, docker, nvidia-cuda, wsl2, rtx-5070-ti, debug-visualization, trajectory-tracking, scene-classification, contour-analysis
Report #1 of 2
Ball View: Core Backend & Browser Capture
Feb 7, 2026

Devlog

I built the first working version of Ball View today: a real-time system that captures video from a cricket broadcast in the browser, sends frames to a backend for object detection, and identifies the cricket ball using YOLOv8.

The architecture is a client-server model over WebSocket. The browser captures video frames at approximately 30 FPS, downscales them to 1280x720, compresses to JPEG at 0.7 quality, and streams the binary data to a FastAPI backend. The backend decodes each frame, runs YOLOv8n inference to detect the ball, and logs detections. I also wired up EasyOCR for scoreboard text extraction, throttled to run every 60 frames to avoid overwhelming the CPU.
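The throttling pattern used for the OCR (run an expensive stage only every N frames, reuse the last result in between) is worth isolating, since it also applies to any other heavy stage added later. A generic sketch; the wrapped function stands in for the EasyOCR call:

```python
class Throttled:
    """Run an expensive per-frame stage only every `every` frames,
    returning the cached result on the frames in between."""

    def __init__(self, fn, every):
        self.fn, self.every = fn, every
        self.count, self.last = 0, None

    def __call__(self, frame):
        if self.count % self.every == 0:
            self.last = self.fn(frame)  # expensive call (e.g. EasyOCR)
        self.count += 1
        return self.last
```

At 30 FPS capture, `every=60` means the OCR fires roughly every two seconds, which is ample for a scoreboard that changes once per delivery.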

I built two frontend capture clients to maximize compatibility. The primary is a Chrome Extension using Manifest V3 with the tabCapture API, which gives clean access to the active tab's video stream. The fallback is a Tampermonkey userscript that heuristically finds video elements on any page and captures from a canvas overlay. Both clients share the same WebSocket protocol and JPEG encoding strategy. Having two capture paths means the system works regardless of whether someone wants to install an extension or just paste a userscript.

The model choice was YOLOv8 Nano. The cricket ball is a small object in broadcast footage (often only 5-20 pixels across), so I needed a model fast enough for real-time inference that could still detect at that scale. Nano is the smallest YOLOv8 variant, which keeps frame processing fast enough not to bottleneck the 30 FPS capture rate.

I also pulled in Cricsheet IPL match data (16 match files plus a full CSK vs GT match) as the reference database for matching detected game state to ball-by-ball records. The data sync logic is not wired up yet, but the files are in place.

What's next: Adding trajectory tracking to smooth noisy detections across frames, pitch detection to filter false positives from crowd shots, and connecting the OCR output to the Cricsheet data for real-time match context.

Changelog

Added

  • Add FastAPI backend with WebSocket `/stream` endpoint for frame processing
  • Add YOLOv8n object detection for cricket ball identification
  • Add EasyOCR integration for scoreboard text extraction (throttled to every 60 frames)
  • Add Chrome Extension (Manifest V3) with tabCapture API for video stream capture
  • Add Tampermonkey userscript as cross-browser capture fallback
  • Add 30 FPS frame capture with 1280x720 downscaling and JPEG 0.7 compression
  • Add FPS monitoring and logging for backend performance tracking
  • Add Cricsheet IPL match JSON files (16 matches + CSK vs GT full data)
  • Add system architecture documentation
Tags: fastapi, websocket, yolov8, easyocr, chrome-extension, manifest-v3, tampermonkey, userscript, opencv, pytorch, ultralytics, cricket, real-time, video-capture