PlayerData x IU Indianapolis Analytics
Data engineering and analytics pipeline for athlete tracking data, part of the IU Indianapolis–PlayerData partnership. Includes synthetic data generation, integrity checks, and agent-ready metric design.
Overview
A collaborative analytics project for the Spring 2026 semester, built on PlayerData athlete tracking data through the IU Indianapolis–PlayerData partnership. The team spans four roles (Business Analytics, Data Engineering, Data Science, and Project Management) and aims to turn raw athlete tracking data into actionable performance insights.
My Contributions (Data Engineering)
Synthetic Data Generation
Developed a statistical blueprint system that analyzes sample data to generate realistic synthetic datasets for development and testing:
- Statistical Profiling — Automated univariate (distributions, skewness, kurtosis) and multivariate (correlations, missingness patterns) analysis of the sample data
- Blueprint Conversion — Translated statistical profiles into generator configurations, preserving inter-column relationships and distribution shapes
- Synthetic Data Output — Generated multiple synthetic datasets (men's/women's soccer, divisions) with verifiable statistical fidelity to the original data
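The profile-to-blueprint-to-output flow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the column names (`total_distance_m`, `sprint_count`), the `build_blueprint` and `generate_synthetic` helpers, and the choice of a multivariate normal sampler are all assumptions. A normal sampler reproduces means and correlations but not skewness or kurtosis, so matching distribution shapes would need a richer generator.

```python
import numpy as np
import pandas as pd


def build_blueprint(sample: pd.DataFrame) -> dict:
    """Summarize numeric columns: moments, missingness, and covariance."""
    numeric = sample.select_dtypes(include="number")
    return {
        "columns": list(numeric.columns),
        "mean": numeric.mean().to_numpy(),
        "cov": numeric.cov().to_numpy(),
        # Recorded for reporting; the normal sampler below does not use them.
        "skew": numeric.skew().to_dict(),
        "kurtosis": numeric.kurt().to_dict(),
        "missing_rate": numeric.isna().mean().to_dict(),
    }


def generate_synthetic(blueprint: dict, n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Draw rows that keep the blueprint's means and inter-column covariance."""
    rng = np.random.default_rng(seed)
    values = rng.multivariate_normal(blueprint["mean"], blueprint["cov"], size=n_rows)
    synthetic = pd.DataFrame(values, columns=blueprint["columns"])
    # Reintroduce missing values at each column's observed rate
    # (independently per column, a simplification of real missingness patterns).
    for col, rate in blueprint["missing_rate"].items():
        mask = rng.random(n_rows) < rate
        synthetic.loc[mask, col] = np.nan
    return synthetic


# Toy sample standing in for real tracking data (values are made up).
rng = np.random.default_rng(42)
distance = rng.normal(9000, 1200, 500)
sample = pd.DataFrame({
    "total_distance_m": distance,
    "sprint_count": distance / 400 + rng.normal(0, 2, 500),
})

blueprint = build_blueprint(sample)
synthetic = generate_synthetic(blueprint, n_rows=5000, seed=1)

# Fidelity check: largest absolute gap between the two correlation matrices.
print((synthetic.corr() - sample.corr()).abs().max().max())
```

Comparing correlation matrices is one simple way to make "statistical fidelity" checkable. Per-column distribution tests could be added alongside it.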
Data Integrity & Quality
- Built data validation pipelines that check incoming PlayerData exports against expected schemas, value ranges, and completeness thresholds
- Performed cross-dataset integrity checks to flag anomalies and inconsistencies
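As a rough sketch of what these checks might look like, the example below validates one export against a schema, value ranges, and a completeness threshold, then runs a simple cross-dataset check. The column names, range limits, 95% threshold, and the `validate_export` and `orphan_ids` helpers are hypothetical. The real PlayerData export format is not described here.

```python
import pandas as pd

# Hypothetical expectations; real column names and limits depend on the export.
EXPECTED_RANGES = {
    "total_distance_m": (0, 20000),
    "top_speed_ms": (0, 12.5),
}
REQUIRED_COLUMNS = ["athlete_id", "session_date", *EXPECTED_RANGES]
MIN_COMPLETENESS = 0.95


def validate_export(df: pd.DataFrame) -> list[str]:
    """Return human-readable issues; an empty list means the export passed."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    issues = []
    # Range check: count non-null values that fall outside the allowed interval.
    for col, (low, high) in EXPECTED_RANGES.items():
        out_of_range = ((~df[col].between(low, high)) & df[col].notna()).sum()
        if out_of_range:
            issues.append(f"{col}: {out_of_range} value(s) outside [{low}, {high}]")
    # Completeness check: share of non-null values per required column.
    for col in REQUIRED_COLUMNS:
        completeness = df[col].notna().mean()
        if completeness < MIN_COMPLETENESS:
            issues.append(
                f"{col}: completeness {completeness:.0%} below {MIN_COMPLETENESS:.0%}"
            )
    return issues


def orphan_ids(sessions: pd.DataFrame, roster: pd.DataFrame, key: str = "athlete_id") -> list:
    """Cross-dataset check: keys present in sessions but absent from the roster."""
    return sorted(set(sessions[key]) - set(roster[key]))


# Made-up example data with one negative distance and one missing speed.
export = pd.DataFrame({
    "athlete_id": ["a1", "a2", "a3"],
    "session_date": ["2026-02-01"] * 3,
    "total_distance_m": [8500.0, -10.0, 9100.0],
    "top_speed_ms": [8.1, 7.9, None],
})
roster = pd.DataFrame({"athlete_id": ["a1", "a2"]})

issues = validate_export(export)
for issue in issues:
    print(issue)
print("athletes missing from roster:", orphan_ids(export, roster))
```

Returning a list of issues rather than raising on the first failure lets a single run report every problem in an export.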
Future Direction
Planning a chatbot/agent that applies the metrics created by the Data Science team to larger datasets once it is connected to them, so coaches and staff can query performance insights in natural language.
Tech Stack
- Python, Pandas, NumPy
- Statistical analysis (SciPy, custom profilers)
- Synthetic data generation pipeline
Role: Data Engineer
Timeline: Jan 2026 - Present
Category: Data Engineering / Sports Analytics