Project Overview

Formula 1 represents the pinnacle of motorsport engineering, but it's also a massive data problem. With hundreds of sensors on each car generating terabytes of data, extracting meaningful insights is a significant challenge.

This project utilizes the Ergast Developer API to fetch historical race data from 1950 to present day. The primary goal was to identify key factors contributing to race wins and to build a predictive model for the 2023 season.

We performed extensive Exploratory Data Analysis (EDA) to visualize driver consistency, constructor dominance eras, and the impact of qualifying position on final race standing.

Key Findings

87% Prediction Accuracy
1.2M+ Data Points Analyzed

Win Probability vs. Grid Position

There is a strong logarithmic decay correlation between starting grid position and probability of winning a race.

Implementation Detail

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Prepare features for the model
features = ['grid_position', 'constructor_points', 'driver_age']
X = race_data[features]
y = race_data['is_winner']

# Initialize Random Forest with optimized hyperparameters
clf = RandomForestClassifier(
    n_estimators=100, 
    max_depth=10, 
    random_state=42
)

# Fit the model
clf.fit(X_train, y_train)
print(f"Feature Importance: {clf.feature_importances_}")

FormulaOne Analysis

01.Project Overview

Project Overview

Key Findings

Win Probability vs. Grid Position

Implementation Detail

Technologies

Role

Timeline

Category

Related Projects