Plotting F1 race data
Casus Plotting formule 1 data
Data is provided by Ivon Kok.
It contains points per location per F1 coureur. The progress in points will be plotted by a cumsum plot
import pandas as pd
Load data
df = pd.read_csv('data/points.csv')
Inspect data
df.head(2)
0
Hamilton
25
19
25
25
7
0
18
19
12
...
19
0
25
0
0
0
0
0
0
0
1
Verstappen
18
25
18
19
25
0
26
25
26
...
25
0
18
0
0
0
0
0
0
0
2 rows × 23 columns
df.tail(2)
19
Kubica
0
0
0
0
0
0
0
0
0
...
0
0
0
0
0
0
0
0
0
0
20
Mazepin
0
0
0
0
0
0
0
0
0
...
0
0
0
0
0
0
0
0
0
0
2 rows × 23 columns
# check if all are numbers
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Coureur 21 non-null object
1 BHR 21 non-null int64
2 EMI 21 non-null int64
3 POR 21 non-null int64
4 SPA 21 non-null int64
5 MON 21 non-null int64
6 AZE 21 non-null int64
7 FRA 21 non-null int64
8 STI 21 non-null int64
9 OOS 21 non-null int64
10 GBR 21 non-null int64
11 HON 21 non-null int64
12 BEL 21 non-null float64
13 NED 21 non-null int64
14 ITA 21 non-null int64
15 RUS 21 non-null int64
16 TUR 21 non-null int64
17 VST 21 non-null int64
18 MXS 21 non-null int64
19 SAP 21 non-null int64
20 QAT 21 non-null int64
21 SAU 21 non-null int64
22 ABU 21 non-null int64
dtypes: float64(1), int64(21), object(1)
memory usage: 3.9+ KB
#some statistics
df.describe()
count
21.000000
21.000000
21.000000
21.000000
21.000000
21.000000
21.000000
21.000000
21.000000
21.000000
...
21.000000
21.000000
21.000000
21.0
21.0
21.0
21.0
21.0
21.0
21.0
mean
4.857143
4.857143
4.857143
4.857143
4.857143
4.809524
4.857143
4.857143
4.857143
4.809524
...
4.857143
4.857143
4.857143
0.0
0.0
0.0
0.0
0.0
0.0
0.0
std
7.329978
7.350413
7.329978
7.350413
7.268327
7.256852
7.397876
7.350413
7.397876
7.256852
...
7.350413
7.397876
7.268327
0.0
0.0
0.0
0.0
0.0
0.0
0.0
min
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
...
0.000000
0.000000
0.000000
0.0
0.0
0.0
0.0
0.0
0.0
0.0
25%
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
...
0.000000
0.000000
0.000000
0.0
0.0
0.0
0.0
0.0
0.0
0.0
50%
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
...
0.000000
0.000000
0.000000
0.0
0.0
0.0
0.0
0.0
0.0
0.0
75%
8.000000
8.000000
8.000000
8.000000
8.000000
8.000000
8.000000
8.000000
8.000000
8.000000
...
8.000000
8.000000
8.000000
0.0
0.0
0.0
0.0
0.0
0.0
0.0
max
25.000000
25.000000
25.000000
25.000000
25.000000
25.000000
26.000000
25.000000
26.000000
25.000000
...
25.000000
26.000000
25.000000
0.0
0.0
0.0
0.0
0.0
0.0
0.0
8 rows × 22 columns
#visualize
df.boxplot()

Prepare data
#select only races that happened
df2=df.iloc[:,:-7]
df2.boxplot()
df2.head(3)

0
Hamilton
25
19
25
25
7
0
18
19
12
25
18
7.5
19
0
25
1
Verstappen
18
25
18
19
25
0
26
25
26
0
2
12.5
25
0
18
2
Bottas
16
0
16
15
0
0
12
15
18
15
0
0.0
15
15
10
#For plotting easiest is to set coureur as index
df2 = df2.set_index('Coureur')
df2.head(3)
Coureur
Hamilton
25
19
25
25
7
0
18
19
12
25
18
7.5
19
0
25
Verstappen
18
25
18
19
25
0
26
25
26
0
2
12.5
25
0
18
Bottas
16
0
16
15
0
0
12
15
18
15
0
0.0
15
15
10
# Create cummulative dataframe in columns direction
df_cum = df2.cumsum(axis = 1)
#select only top 10
df_cum = df_cum.nlargest(10, 'RUS')
Plot Data
#Plot the cummulative numbers
#Plot by index, not by column -> df_cum.T
df_cum.T.plot(title = 'top 10 F1 coureurs', ylabel = 'points')

Last updated
Was this helpful?