FormulaOne Analysis

A comprehensive data analysis pipeline leveraging historical Formula 1 race data to predict race outcomes and visualize constructor performance trends over the last decade.

Project Overview

Formula 1 represents the pinnacle of motorsport engineering, but it's also a massive data problem. With hundreds of sensors on each car generating terabytes of data, extracting meaningful insights is a significant challenge.

This project uses the Ergast Developer API to fetch historical race data from 1950 to the present day. The primary goal was to identify the key factors that contribute to race wins and to build a predictive model for the 2023 season.
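The fetch step could look roughly like the sketch below. It is not the project's own code. It assumes the Ergast base URL shown (adjust it if the service has moved) and the usual Ergast JSON layout (`MRData` → `RaceTable` → `Races` → `Results`). The function names and output columns (`results_to_frame`, `grid_position`, `is_winner`, and so on) are hypothetical, chosen to match the feature names used in the model code further down.

```python
import json
import urllib.request

import pandas as pd

# Assumed base URL for the Ergast API; change it if the service has moved.
ERGAST_BASE = "https://ergast.com/api/f1"


def fetch_season_results(season: int, limit: int = 1000, offset: int = 0) -> dict:
    """Download one page of a season's race results as parsed JSON (network call)."""
    url = f"{ERGAST_BASE}/{season}/results.json?limit={limit}&offset={offset}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def results_to_frame(payload: dict) -> pd.DataFrame:
    """Flatten an Ergast results payload into one row per driver per race."""
    rows = []
    for race in payload["MRData"]["RaceTable"]["Races"]:
        for result in race["Results"]:
            rows.append({
                "season": int(race["season"]),
                "round": int(race["round"]),
                "driver_id": result["Driver"]["driverId"],
                "constructor_id": result["Constructor"]["constructorId"],
                "grid_position": int(result["grid"]),
                "finish_position": int(result["position"]),
                "points": float(result["points"]),
                # Binary target: 1 for the race winner, 0 for everyone else
                "is_winner": int(result["position"] == "1"),
            })
    return pd.DataFrame(rows)
```

For example, `results_to_frame(fetch_season_results(2023))` would return one row per driver per race for that season. The large `limit` is meant to fit a full season into one page; for bigger queries you would step through pages with `offset`.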

We performed extensive Exploratory Data Analysis (EDA) to visualize driver consistency, constructor dominance eras, and the impact of qualifying position on final race standing.
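The three EDA questions above map onto a few pandas aggregations. The following is a minimal sketch, not the project's notebook. It assumes a DataFrame with one row per driver per race and the hypothetical columns `season`, `driver_id`, `constructor_id`, `grid_position`, `finish_position`, and `points`. Consistency is measured here as the standard deviation of finishing position, and qualifying impact as a Spearman rank correlation. Both are illustrative choices.

```python
import pandas as pd


def driver_consistency(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and std of finishing position per driver; a lower std means more consistent."""
    return (
        df.groupby("driver_id")["finish_position"]
        .agg(["mean", "std", "count"])
        .sort_values("std")
    )


def constructor_points_by_season(df: pd.DataFrame) -> pd.DataFrame:
    """Season-by-constructor table of total points, for spotting dominance eras."""
    return df.pivot_table(
        index="season", columns="constructor_id", values="points",
        aggfunc="sum", fill_value=0,
    )


def grid_vs_finish(df: pd.DataFrame) -> float:
    """Spearman correlation between grid and finish position (Pearson on average ranks)."""
    return df["grid_position"].rank().corr(df["finish_position"].rank())
```

Calling `constructor_points_by_season(df).plot()` would draw one line per constructor with Matplotlib, which is one way to produce the constructor trend view described above.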

Key Findings

  • 87% Prediction Accuracy
  • 1.2M+ Data Points Analyzed

Win Probability vs. Grid Position

Win probability drops steeply as starting grid position worsens, following a strong logarithmic-decay relationship.
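The relationship can be estimated by computing the empirical win rate for each grid slot and then fitting a curve to it. The sketch below reads "logarithmic decay" as the model p(g) = a + b·ln(g) with b < 0, fitted by least squares. That reading is an assumption, not necessarily the project's exact fit. It also assumes `grid_position` and `is_winner` columns and drops grid values outside 1 to 20, such as non-standard starts (also an assumption).

```python
import numpy as np
import pandas as pd


def win_rate_by_grid(df: pd.DataFrame, max_grid: int = 20) -> pd.Series:
    """Empirical P(win | grid position) for grid slots 1..max_grid."""
    on_grid = df[(df["grid_position"] >= 1) & (df["grid_position"] <= max_grid)]
    return on_grid.groupby("grid_position")["is_winner"].mean()


def fit_log_decay(win_rate: pd.Series) -> tuple[float, float]:
    """Least-squares fit of p(g) = a + b * ln(g); a negative b means decay with grid slot."""
    g = win_rate.index.to_numpy(dtype=float)
    # np.polyfit returns coefficients highest degree first: [slope, intercept]
    b, a = np.polyfit(np.log(g), win_rate.to_numpy(dtype=float), deg=1)
    return a, b
```

A log model falls fastest over the first few grid slots and flattens further back, which matches the shape the heading describes. An exponential fit would be a reasonable alternative to compare against.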

Implementation Detail

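import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# race_data: one row per driver per race, built from the Ergast data.
# The file name below is a placeholder for wherever that table is stored.
race_data = pd.read_csv('race_data.csv')

# Prepare features and the binary target for the model
features = ['grid_position', 'constructor_points', 'driver_age']
X = race_data[features]
y = race_data['is_winner']

# Hold out 20% of rows for evaluation (split ratio assumed); stratify because
# winners are a small minority of rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Initialize Random Forest with optimized hyperparameters
clf = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    random_state=42
)

# Fit the model and evaluate it on the held-out rows
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")

# Report each feature's importance next to its name
for name, importance in zip(features, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")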
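The classifier scores each driver-race row independently, so its win probabilities within one race do not sum to 1. One simple way to turn them into a single predicted winner per race is to take the highest-probability driver in each race. The sketch below is hypothetical. It assumes `race_id` and `driver_id` columns that are not part of the model code and works with any fitted binary classifier like `clf`.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def predict_race_winners(clf: RandomForestClassifier, races: pd.DataFrame,
                         features: list[str]) -> pd.DataFrame:
    """Pick, for each race, the driver with the highest predicted win probability."""
    scored = races.copy()
    # Locate the predict_proba column for the positive class (is_winner == 1)
    winner_col = list(clf.classes_).index(1)
    scored["win_prob"] = clf.predict_proba(scored[features])[:, winner_col]
    # idxmax returns the row label of the top-scoring driver within each race
    best = scored.loc[scored.groupby("race_id")["win_prob"].idxmax()]
    return best[["race_id", "driver_id", "win_prob"]].reset_index(drop=True)
```

Taking the per-race argmax guarantees exactly one predicted winner per race. That makes race-level hit rate a more direct measure of "predicting race outcomes" than row-level accuracy, where most rows are non-winners.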

Technologies

Python, Pandas, Scikit-Learn, Matplotlib, Jupyter, Ergast API

Role

Data Scientist & Engineer

Timeline

Sep 2023 - Oct 2023

Category

Machine Learning / Sports Analytics