Plotting F1 race data

Casus Plotting formule 1 data

Data is provided by Ivon Kok.

It contains points per location per F1 coureur. The progress in points will be plotted by a cumsum plot

import pandas as pd

Load data

df = pd.read_csv('data/points.csv')

Inspect data

df.head(2)

Coureur
BHR
EMI
POR
SPA
MON
AZE
FRA
STI
OOS
...
NED
ITA
RUS
TUR
VST
MXS
SAP
QAT
SAU
ABU

0

Hamilton

25

19

25

25

7

0

18

19

12

...

19

0

25

0

0

0

0

0

0

0

1

Verstappen

18

25

18

19

25

0

26

25

26

...

25

0

18

0

0

0

0

0

0

0

2 rows × 23 columns

df.tail(2)

Coureur
BHR
EMI
POR
SPA
MON
AZE
FRA
STI
OOS
...
NED
ITA
RUS
TUR
VST
MXS
SAP
QAT
SAU
ABU

19

Kubica

0

0

0

0

0

0

0

0

0

...

0

0

0

0

0

0

0

0

0

0

20

Mazepin

0

0

0

0

0

0

0

0

0

...

0

0

0

0

0

0

0

0

0

0

2 rows × 23 columns

# check if all are numbers
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 23 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Coureur  21 non-null     object 
 1   BHR      21 non-null     int64  
 2   EMI      21 non-null     int64  
 3   POR      21 non-null     int64  
 4   SPA      21 non-null     int64  
 5   MON      21 non-null     int64  
 6   AZE      21 non-null     int64  
 7   FRA      21 non-null     int64  
 8   STI      21 non-null     int64  
 9   OOS      21 non-null     int64  
 10  GBR      21 non-null     int64  
 11  HON      21 non-null     int64  
 12  BEL      21 non-null     float64
 13  NED      21 non-null     int64  
 14  ITA      21 non-null     int64  
 15  RUS      21 non-null     int64  
 16  TUR      21 non-null     int64  
 17  VST      21 non-null     int64  
 18  MXS      21 non-null     int64  
 19  SAP      21 non-null     int64  
 20  QAT      21 non-null     int64  
 21  SAU      21 non-null     int64  
 22  ABU      21 non-null     int64  
dtypes: float64(1), int64(21), object(1)
memory usage: 3.9+ KB
#some statistics
df.describe()
BHR
EMI
POR
SPA
MON
AZE
FRA
STI
OOS
GBR
...
NED
ITA
RUS
TUR
VST
MXS
SAP
QAT
SAU
ABU

count

21.000000

21.000000

21.000000

21.000000

21.000000

21.000000

21.000000

21.000000

21.000000

21.000000

...

21.000000

21.000000

21.000000

21.0

21.0

21.0

21.0

21.0

21.0

21.0

mean

4.857143

4.857143

4.857143

4.857143

4.857143

4.809524

4.857143

4.857143

4.857143

4.809524

...

4.857143

4.857143

4.857143

0.0

0.0

0.0

0.0

0.0

0.0

0.0

std

7.329978

7.350413

7.329978

7.350413

7.268327

7.256852

7.397876

7.350413

7.397876

7.256852

...

7.350413

7.397876

7.268327

0.0

0.0

0.0

0.0

0.0

0.0

0.0

min

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

...

0.000000

0.000000

0.000000

0.0

0.0

0.0

0.0

0.0

0.0

0.0

25%

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

...

0.000000

0.000000

0.000000

0.0

0.0

0.0

0.0

0.0

0.0

0.0

50%

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

0.000000

...

0.000000

0.000000

0.000000

0.0

0.0

0.0

0.0

0.0

0.0

0.0

75%

8.000000

8.000000

8.000000

8.000000

8.000000

8.000000

8.000000

8.000000

8.000000

8.000000

...

8.000000

8.000000

8.000000

0.0

0.0

0.0

0.0

0.0

0.0

0.0

max

25.000000

25.000000

25.000000

25.000000

25.000000

25.000000

26.000000

25.000000

26.000000

25.000000

...

25.000000

26.000000

25.000000

0.0

0.0

0.0

0.0

0.0

0.0

0.0

8 rows × 22 columns

#visualize
df.boxplot()

Prepare data

#select only races that happened
df2=df.iloc[:,:-7]
df2.boxplot()
df2.head(3)

Coureur
BHR
EMI
POR
SPA
MON
AZE
FRA
STI
OOS
GBR
HON
BEL
NED
ITA
RUS

0

Hamilton

25

19

25

25

7

0

18

19

12

25

18

7.5

19

0

25

1

Verstappen

18

25

18

19

25

0

26

25

26

0

2

12.5

25

0

18

2

Bottas

16

0

16

15

0

0

12

15

18

15

0

0.0

15

15

10

#For plotting easiest is to set coureur as index
df2 = df2.set_index('Coureur')
df2.head(3)

BHR
EMI
POR
SPA
MON
AZE
FRA
STI
OOS
GBR
HON
BEL
NED
ITA
RUS

Coureur

Hamilton

25

19

25

25

7

0

18

19

12

25

18

7.5

19

0

25

Verstappen

18

25

18

19

25

0

26

25

26

0

2

12.5

25

0

18

Bottas

16

0

16

15

0

0

12

15

18

15

0

0.0

15

15

10

# Create cummulative dataframe in columns direction
df_cum = df2.cumsum(axis = 1)
#select only top 10
df_cum = df_cum.nlargest(10, 'RUS')

Plot Data

#Plot the cummulative numbers
#Plot by index, not by column -> df_cum.T
df_cum.T.plot(title = 'top 10 F1 coureurs', ylabel = 'points')

Last updated

Was this helpful?