Plotting F1 race data

Casus Plotting formule 1 data

Data is provided by Ivon Kok.

It contains points per location per F1 coureur. The progress in points will be plotted by a cumsum plot

import pandas as pd

Load data

df = pd.read_csv('data/points.csv')

Inspect data

df.head(2)

Coureur

BHR

EMI

POR

SPA

MON

AZE

FRA

STI

OOS

...

NED

ITA

RUS

TUR

VST

MXS

SAP

QAT

ABU

Hamilton

...

Verstappen

...

2 rows × 23 columns

df.tail(2)

Coureur

BHR

EMI

POR

SPA

MON

AZE

FRA

STI

OOS

...

NED

ITA

RUS

TUR

VST

MXS

SAP

QAT

ABU

Kubica

...

Mazepin

...

2 rows × 23 columns

# check if all are numbers
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 23 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Coureur  21 non-null     object 
 1   BHR      21 non-null     int64  
 2   EMI      21 non-null     int64  
 3   POR      21 non-null     int64  
 4   SPA      21 non-null     int64  
 5   MON      21 non-null     int64  
 6   AZE      21 non-null     int64  
 7   FRA      21 non-null     int64  
 8   STI      21 non-null     int64  
 9   OOS      21 non-null     int64  
 10  GBR      21 non-null     int64  
 11  HON      21 non-null     int64  
 12  BEL      21 non-null     float64
 13  NED      21 non-null     int64  
 14  ITA      21 non-null     int64  
 15  RUS      21 non-null     int64  
 16  TUR      21 non-null     int64  
 17  VST      21 non-null     int64  
 18  MXS      21 non-null     int64  
 19  SAP      21 non-null     int64  
 20  QAT      21 non-null     int64  
 21  SAU      21 non-null     int64  
 22  ABU      21 non-null     int64  
dtypes: float64(1), int64(21), object(1)
memory usage: 3.9+ KB

#some statistics
df.describe()

BHR

EMI

POR

SPA

MON

AZE

FRA

STI

OOS

GBR

...

NED

ITA

RUS

TUR

VST

MXS

SAP

QAT

ABU

count

21.000000

...

21.000000

21.0

mean

4.857143

4.809524

4.857143

4.809524

...

4.857143

0.0

std

7.329978

7.350413

7.329978

7.350413

7.268327

7.256852

7.397876

7.350413

7.397876

7.256852

...

7.350413

7.397876

7.268327

0.0

min

0.000000

...

0.000000

0.0

25%

0.000000

...

0.000000

0.0

50%

0.000000

...

0.000000

0.0

75%

8.000000

...

8.000000

0.0

max

25.000000

26.000000

25.000000

26.000000

25.000000

...

25.000000

26.000000

25.000000

0.0

8 rows × 22 columns

#visualize
df.boxplot()

Prepare data

#select only races that happened
df2=df.iloc[:,:-7]

df2.boxplot()

df2.head(3)

Coureur

BHR

EMI

POR

SPA

MON

AZE

FRA

STI

OOS

GBR

HON

BEL

NED

ITA

RUS

Hamilton

7.5

Verstappen

12.5

Bottas

0.0

#For plotting easiest is to set coureur as index
df2 = df2.set_index('Coureur')

df2.head(3)

BHR

EMI

POR

SPA

MON

AZE

FRA

STI

OOS

GBR

HON

BEL

NED

ITA

RUS

Coureur

Hamilton

7.5

Verstappen

12.5

Bottas

0.0

# Create cummulative dataframe in columns direction
df_cum = df2.cumsum(axis = 1)
#select only top 10
df_cum = df_cum.nlargest(10, 'RUS')

Plot Data

#Plot the cummulative numbers
#Plot by index, not by column -> df_cum.T
df_cum.T.plot(title = 'top 10 F1 coureurs', ylabel = 'points')

PreviousMissing time related data

Last updated 3 years ago

Was this helpful?