FormulaOne Analysis
A comprehensive data analysis pipeline leveraging historical Formula 1 race data to predict race outcomes and visualize constructor performance trends over the last decade.
Project Overview
Formula 1 represents the pinnacle of motorsport engineering, but it's also a massive data problem. With hundreds of sensors on each car generating terabytes of data, extracting meaningful insights is a significant challenge.
This project uses the Ergast Developer API to fetch historical race data from 1950 to the present day. The primary goals were to identify the key factors that contribute to race wins and to build a predictive model for the 2023 season.
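As a rough sketch of the ingestion step, the snippet below builds a results URL for one season and flattens a response into one row per driver per race. The base URL and the JSON layout (MRData, RaceTable, Races, Results) follow the Ergast API as publicly documented and should be treated as assumptions here. The helper names results_url and flatten_results and the trimmed sample payload are hypothetical, and the original pipeline may have fetched the data differently.

```python
import json

# Base URL as documented for the Ergast API; treat it as an assumption here.
ERGAST_BASE = "https://ergast.com/api/f1"


def results_url(season, limit=100, offset=0):
    """Build the URL for one page of race results in a season."""
    return f"{ERGAST_BASE}/{season}/results.json?limit={limit}&offset={offset}"


def flatten_results(payload):
    """Turn one results response into a list of flat row dicts."""
    rows = []
    for race in payload["MRData"]["RaceTable"]["Races"]:
        for result in race["Results"]:
            rows.append({
                "season": int(race["season"]),
                "round": int(race["round"]),
                "driver_id": result["Driver"]["driverId"],
                "constructor_id": result["Constructor"]["constructorId"],
                # The API returns numbers as strings, so convert them.
                "grid_position": int(result["grid"]),
                "finish_position": int(result["position"]),
                "is_winner": result["position"] == "1",
            })
    return rows


# Hypothetical, heavily trimmed payload in the shape the API returns.
sample = json.loads("""
{"MRData": {"RaceTable": {"Races": [
  {"season": "2022", "round": "1", "Results": [
    {"position": "1", "grid": "1",
     "Driver": {"driverId": "driver_a"},
     "Constructor": {"constructorId": "team_x"}},
    {"position": "2", "grid": "3",
     "Driver": {"driverId": "driver_b"},
     "Constructor": {"constructorId": "team_x"}}
  ]}
]}}}
""")

rows = flatten_results(sample)
print(results_url(2022))
print(rows[0])
```

Rows in this shape can be loaded straight into a pandas DataFrame, which is where column names such as grid_position and is_winner used later would come from.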
We performed extensive Exploratory Data Analysis (EDA) to visualize driver consistency, constructor dominance eras, and the impact of qualifying position on final race standing.
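One simple way to surface constructor dominance eras is to count race wins per constructor within each season and look at who leads each year. The sketch below uses made-up rows and only the standard library; the tuple layout and the variable names are assumptions, not the project's actual schema.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (season, constructor, finishing position).
results = [
    (2020, "team_x", 1), (2020, "team_x", 1), (2020, "team_y", 1),
    (2021, "team_y", 1), (2021, "team_x", 1), (2021, "team_y", 1),
    (2021, "team_z", 2),
]

# Count race wins per constructor within each season.
wins_by_season = defaultdict(Counter)
for season, constructor, position in results:
    if position == 1:
        wins_by_season[season][constructor] += 1

# The season leader is the constructor with the most wins that year.
leaders = {
    season: counts.most_common(1)[0][0]
    for season, counts in sorted(wins_by_season.items())
}
print(leaders)
```

A run of consecutive seasons with the same leader is a reasonable first proxy for an era of dominance.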
Key Findings
- 87% Prediction Accuracy
- 1.2M+ Data Points Analyzed
Win Probability vs. Grid Position
The probability of winning a race drops sharply as starting grid position increases, following an approximately logarithmic decay.
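To make the relationship concrete, the sketch below computes the empirical win rate for each grid position and fits a curve of the form win_rate = a + b * ln(grid) by ordinary least squares. The functional form is one reading of "logarithmic decay" and is an assumption, as are the made-up entries; a negative b means win probability falls as grid position increases.

```python
import math
from collections import defaultdict

# Hypothetical (grid_position, is_winner) pairs, one per race entry.
entries = (
    [(1, True)] * 5 + [(1, False)] * 5
    + [(2, True)] * 3 + [(2, False)] * 7
    + [(3, True)] * 1 + [(3, False)] * 9
    + [(4, True)] * 1 + [(4, False)] * 9
)

# Empirical win rate per grid position: wins divided by starts.
starts = defaultdict(int)
wins = defaultdict(int)
for grid, won in entries:
    starts[grid] += 1
    wins[grid] += int(won)
win_rate = {grid: wins[grid] / starts[grid] for grid in sorted(starts)}

# Least-squares fit of win_rate = a + b * ln(grid).
xs = [math.log(grid) for grid in win_rate]
ys = list(win_rate.values())
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
b = (
    sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    / sum((x - x_mean) ** 2 for x in xs)
)
a = y_mean - b * x_mean

print(win_rate)
print(f"a = {a:.3f}, b = {b:.3f}")
```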
Implementation Detail
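import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Prepare features for the model (race_data is the prepared DataFrame of race results)
features = ['grid_position', 'constructor_points', 'driver_age']
X = race_data[features]
y = race_data['is_winner']

# Hold out a test set. The 80/20 stratified split is an assumption;
# the original snippet used X_train and y_train without defining them.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Initialize Random Forest with optimized hyperparameters
clf = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    random_state=42
)

# Fit the model
clf.fit(X_train, y_train)
print(f"Feature Importance: {clf.feature_importances_}")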
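Because exactly one driver wins each race, one way to score a classifier like this is per race: pick the driver with the highest predicted win probability and check whether that driver actually won. The sketch below illustrates that idea with made-up probabilities. In practice they might come from clf.predict_proba, and the source does not say how its accuracy figure was computed, so this is an illustration rather than the project's evaluation method.

```python
from collections import defaultdict

# Hypothetical predictions:
# (race_id, driver, predicted win probability, actually won).
predictions = [
    ("race_1", "driver_a", 0.55, True),
    ("race_1", "driver_b", 0.35, False),
    ("race_1", "driver_c", 0.10, False),
    ("race_2", "driver_a", 0.40, False),
    ("race_2", "driver_b", 0.45, True),
    ("race_2", "driver_c", 0.15, False),
    ("race_3", "driver_a", 0.30, True),
    ("race_3", "driver_b", 0.60, False),
    ("race_3", "driver_c", 0.10, False),
]

# Group the per-driver predictions by race.
by_race = defaultdict(list)
for race_id, driver, prob, won in predictions:
    by_race[race_id].append((prob, driver, won))

# For each race, the predicted winner is the driver with the highest probability.
correct = 0
for race_id, candidates in by_race.items():
    prob, driver, won = max(candidates, key=lambda c: c[0])
    correct += int(won)

race_accuracy = correct / len(by_race)
print(f"Correct winner picked in {correct} of {len(by_race)} races "
      f"({race_accuracy:.0%})")
```

Scoring per race avoids a pitfall of row-level accuracy: since most entries in any race are non-winners, a model that always predicts "not winner" would still score highly on individual rows.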
Technologies
Role
Data Scientist & Engineer
Timeline
Sep 2023 - Oct 2023
Category
Machine Learning / Sports Analytics