PlayerData x IU Indianapolis Analytics

Data engineering and analytics pipeline for athlete tracking data, part of the IU Indianapolis–PlayerData partnership. Includes synthetic data generation, integrity checks, and agent-ready metric design.

01. Project Overview

Overview

A collaborative data analytics project using PlayerData athlete tracking data as part of the IU Indianapolis–PlayerData partnership for the Spring 2026 semester. The project spans multiple roles (Business Analytics, Data Engineering, Data Science, Project Management) with the goal of turning raw athlete tracking data into actionable performance insights.


My Contributions (Data Engineering)

Synthetic Data Generation

Developed a statistical blueprint system that analyzes sample data to generate realistic synthetic datasets for development and testing:

  • Statistical Profiling — Automated univariate (distributions, skewness, kurtosis) and multivariate (correlations, missingness patterns) analysis of the sample data
  • Blueprint Conversion — Translated statistical profiles into generator configurations, preserving inter-column relationships and distribution shapes
  • Synthetic Data Output — Generated multiple synthetic datasets (men's/women's soccer, divisions) with verifiable statistical fidelity to the original data
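The profile-to-generator flow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (profile, generate), the column names (total_distance_m, top_speed_ms), and the demo values are all hypothetical. It also assumes a multivariate normal generator, which reproduces means, spreads, correlations, and per-column missing rates but not skewness or kurtosis; matching distribution shapes would need a richer generator.

```python
import numpy as np
import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Build a simple blueprint from the numeric columns of a sample."""
    numeric = df.select_dtypes(include="number")
    return {
        "columns": list(numeric.columns),
        "mean": numeric.mean().to_dict(),
        "std": numeric.std().to_dict(),
        "skew": numeric.skew().to_dict(),
        "kurtosis": numeric.kurt().to_dict(),  # excess kurtosis
        "missing_rate": numeric.isna().mean().to_dict(),
        "corr": numeric.corr().to_numpy(),  # pairwise Pearson correlations
    }


def generate(blueprint: dict, n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Sample rows that follow the blueprint's means, spreads and correlations."""
    rng = np.random.default_rng(seed)
    cols = blueprint["columns"]
    mean = np.array([blueprint["mean"][c] for c in cols])
    std = np.array([blueprint["std"][c] for c in cols])
    # Rebuild the covariance matrix from correlations and standard deviations.
    cov = blueprint["corr"] * np.outer(std, std)
    out = pd.DataFrame(rng.multivariate_normal(mean, cov, size=n_rows), columns=cols)
    # Re-introduce missing values independently per column (an assumption:
    # joint missingness patterns are not modelled in this sketch).
    for c in cols:
        mask = rng.random(n_rows) < blueprint["missing_rate"][c]
        out.loc[mask, c] = np.nan
    return out


# Made-up sample standing in for a real export.
rng = np.random.default_rng(1)
distance = rng.normal(9000, 1200, 500)
speed = 5 + 0.001 * (distance - 9000) + rng.normal(0, 0.5, 500)
sample = pd.DataFrame({"total_distance_m": distance, "top_speed_ms": speed})
sample.loc[:49, "top_speed_ms"] = np.nan  # first 50 rows missing (10%)

blueprint = profile(sample)
synthetic = generate(blueprint, n_rows=5000)
```

Comparing profile(synthetic) against blueprint is one simple way to check fidelity: the correlation and missing-rate entries should land close to the originals, while the skew and kurtosis entries show how much shape information the Gaussian assumption discards.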

Data Integrity & Quality

  • Built data validation pipelines to ensure incoming PlayerData exports meet expected schemas, value ranges, and completeness thresholds
  • Performed cross-dataset integrity checks to flag anomalies and inconsistencies
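As an illustration of the checks above, here is a minimal sketch of a rule-based validator plus one cross-dataset check. The rule format, the function names (validate, orphan_ids), the column names, and the thresholds are assumptions made for this example and do not describe the actual PlayerData export schema.

```python
import pandas as pd

# Hypothetical rules: allowed value range and minimum share of non-null values.
RULES = {
    "athlete_id": {"min_complete": 1.0},
    "total_distance_m": {"min": 0, "max": 20000, "min_complete": 0.95},
    "top_speed_ms": {"min": 0, "max": 13, "min_complete": 0.95},
}


def validate(df: pd.DataFrame, rules: dict) -> list[str]:
    """Return a list of human-readable issues; an empty list means the export passed."""
    issues = []
    for col, rule in rules.items():
        if col not in df.columns:
            issues.append(f"{col}: missing column")
            continue
        complete = df[col].notna().mean()
        if complete < rule.get("min_complete", 0.0):
            issues.append(f"{col}: only {complete:.0%} complete")
        values = df[col].dropna()
        if "min" in rule and (values < rule["min"]).any():
            issues.append(f"{col}: values below {rule['min']}")
        if "max" in rule and (values > rule["max"]).any():
            issues.append(f"{col}: values above {rule['max']}")
    return issues


def orphan_ids(sessions: pd.DataFrame, roster: pd.DataFrame, key: str) -> list:
    """Cross-dataset check: keys present in sessions but absent from the roster."""
    missing = sessions.loc[~sessions[key].isin(roster[key]), key]
    return sorted(missing.unique())


# Made-up data for demonstration only.
export = pd.DataFrame({
    "athlete_id": ["a1", "a2", "a3", "a9"],
    "total_distance_m": [8500, -10, 9100, 10200],
    "top_speed_ms": [7.9, 8.4, None, 8.1],
})
roster = pd.DataFrame({"athlete_id": ["a1", "a2", "a3"]})

issues = validate(export, RULES)
orphans = orphan_ids(export, roster, "athlete_id")
```

Returning a list of issues instead of raising on the first failure lets one run report every problem in an export at once, which tends to be more useful when reviewing a batch of files.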

Future Direction

Planning a chatbot/agent that, once connected to larger datasets, applies the metrics created by the Data Science team so that coaches and staff can query performance insights in natural language.


Tech Stack

  • Python, Pandas, NumPy
  • Statistical analysis (SciPy, custom profilers)
  • Synthetic data generation pipeline

Technologies

Python, Pandas, Data Engineering, Synthetic Data, Statistical Analysis

Role

Data Engineer

Timeline

Jan 2026 - Present

Category

Data Engineering / Sports Analytics