Plotting time data with panel dashboard and bokeh

In this notebook we first generate count data per business day at several locations over two years and merge the two years into one dataframe. We then resample the data to monthly means at month-end frequency. Finally, we display the data to compare the average monthly counts of the two years.

from bokeh.io import output_file, show
from bokeh.io import output_notebook
import pandas as pd
import numpy as np
output_notebook()


Pandas datetime

pandas stores timestamps using NumPy’s datetime64 data type at nanosecond resolution. We can demonstrate this by creating a pandas Series from a date range with nanosecond frequency:

#pd.date_range?
#pd.Series(pd.date_range('2021-07-01', periods=3, freq='D')) # day
#pd.Series(pd.date_range('2021-07-01', periods=3, freq='H')) # hour
pd.Series(pd.date_range('2021-07-01', periods=3, freq='N')) # nanosecond
0   2021-07-01 00:00:00.000000000
1   2021-07-01 00:00:00.000000001
2   2021-07-01 00:00:00.000000002
dtype: datetime64[ns]

We see that the pandas Series has the dtype datetime64[ns] (ns = nanoseconds).

Create date_range

With date_range we can create all sorts of time intervals. For example, if you want a date index containing the last business day of each month, you pass the 'BM' frequency (business month end).

Another example is a 2-hour period.
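The two frequencies mentioned above can be sketched as follows. The dates are arbitrary and chosen only for illustration. Offset objects are passed instead of string aliases, because the alias for business month end is spelled differently across pandas versions ('BM' in older releases, 'BME' in newer ones).

```python
import pandas as pd

# Last business day of each month in the first quarter of 2021.
# pd.offsets.BMonthEnd() is the offset object behind the 'BM' alias.
month_ends = pd.date_range("2021-01-01", "2021-03-31", freq=pd.offsets.BMonthEnd())
print(month_ends)

# Four timestamps spaced two hours apart.
two_hourly = pd.date_range("2021-07-01", periods=4, freq=pd.Timedelta(hours=2))
print(two_hourly)
```

Note that January and February 2021 end on a weekend, so the business month end falls on the preceding Friday.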

In the offset frequency reference (https://pandas.pydata.org/docs/reference/offset_frequency.html) you can find more about frequencies, and in the time series user guide (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html) you can read about all the methods of this datetime object.

Let us create some time-related data. The data represent counts per business day at several locations.
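The original generating code is not shown here, so the following is only a sketch of one way to build such data. The seed, the value range (7 to 39, taken from the summary statistics) and the helper name make_counts are assumptions, and the numbers it produces will not match the tables in this notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # assumed seed, for reproducibility only
locations = ["Assen", "Groningen", "Hoogeveen"]

def make_counts(start, end, low, high):
    """Hypothetical helper: random integer counts per business day."""
    index = pd.bdate_range(start, end)  # Monday to Friday only
    data = rng.integers(low, high, size=(len(index), len(locations)))  # high is exclusive
    return pd.DataFrame(data, index=index, columns=locations)

# Generate each year separately, then merge them into one dataframe.
df_2019 = make_counts("2019-01-01", "2019-12-31", 7, 40)
df_2020 = make_counts("2020-01-01", "2020-12-31", 7, 40)
df = pd.concat([df_2019, df_2020])

print(df.head(2))
print(df.tail(2))
print(df.describe().T)
```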

            Assen  Groningen  Hoogeveen
2019-01-01     12         17          8
2019-01-02     24          8          7

            Assen  Groningen  Hoogeveen
2020-12-14     26         28         25
2020-12-15     33         12         32

           count    mean       std  min   25%   50%    75%   max
Assen      500.0  21.952  8.973223  7.0  14.0  21.0  29.25  39.0
Groningen  500.0  21.934  8.185331  7.0  15.0  22.0  28.00  39.0
Hoogeveen  500.0  22.042  8.554683  7.0  14.0  22.0  29.00  39.0

Plot the counts data

We can plot the data simply by calling pandas.DataFrame.plot().
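A minimal sketch of that call, using stand-in random data (an assumption, not the notebook's own dataframe) and requiring matplotlib to be installed:

```python
import numpy as np
import pandas as pd

# Stand-in data (assumed): random counts per business day over two years.
rng = np.random.default_rng(0)
index = pd.bdate_range("2019-01-01", "2020-12-31")
df = pd.DataFrame(
    rng.integers(7, 40, size=(len(index), 3)),
    index=index,
    columns=["Assen", "Groningen", "Hoogeveen"],
)

# One line per column, with the datetime index on the x-axis.
ax = df.plot(figsize=(12, 4), title="Counts per business day")
ax.set_ylabel("count")
```

With more than 500 daily points per location the three lines overlap heavily, which is why the raw plot is hard to read.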


Resample

This plot is hard to read, so we should consider resampling. Resampling is necessary when you are given a data set recorded at some time interval and you want to change the interval to something else, for example to aggregate daily numbers into monthly numbers. The syntax is df.resample(rule) followed by an aggregation such as .sum() or .mean().
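As a sketch of daily-to-monthly aggregation, the example below resamples one quarter of stand-in random data (an assumption) to month-end sums and means. An offset object is used as the rule because the string alias is 'M' in older pandas releases and 'ME' in newer ones.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumed): random counts per business day for one quarter.
rng = np.random.default_rng(0)
index = pd.bdate_range("2020-01-01", "2020-03-31")
df = pd.DataFrame(
    rng.integers(7, 40, size=(len(index), 3)),
    index=index,
    columns=["Assen", "Groningen", "Hoogeveen"],
)

# Each bin is labelled with the last calendar day of its month.
monthly_sum = df.resample(pd.offsets.MonthEnd()).sum()
monthly_mean = df.resample(pd.offsets.MonthEnd()).mean()

print(monthly_sum)
print(monthly_mean.round(1))
```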

            Assen  Groningen  Hoogeveen
2020-08-31    556        479        560
2020-09-30    490        538        488
2020-10-31    495        514        474
2020-11-30    519        496        570
2020-12-31    251        268        295

             Assen    Groningen    Hoogeveen
count     2.000000     2.000000     2.000000
mean   5488.000000  5483.500000  5510.500000
std    1107.329219   884.590583   853.477885
min    4705.000000  4858.000000  4907.000000
25%    5096.500000  5170.750000  5208.750000
50%    5488.000000  5483.500000  5510.500000
75%    5879.500000  5796.250000  5812.250000
max    6271.000000  6109.000000  6114.000000


We can also resample to evaluate a part of the dataset. For instance, we could get the mean value of Assen and Hoogeveen combined.
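One possible reading of "combined" is sketched below: average the two selected columns per day, then take the monthly mean of that series. The data are stand-in random values (an assumption).

```python
import numpy as np
import pandas as pd

# Stand-in data (assumed): random counts per business day for one quarter.
rng = np.random.default_rng(0)
index = pd.bdate_range("2020-01-01", "2020-03-31")
df = pd.DataFrame(
    rng.integers(7, 40, size=(len(index), 3)),
    index=index,
    columns=["Assen", "Groningen", "Hoogeveen"],
)

# Select the two locations, average them row by row, then resample.
subset = df[["Assen", "Hoogeveen"]]
combined = subset.mean(axis=1).resample(pd.offsets.MonthEnd()).mean()
print(combined.round(2))
```

Because both columns have a value on every business day, this equals the mean over all Assen and Hoogeveen values within each month.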


Locators and Formatters using matplotlib

This is not the kind of plot we want. Remember we can access the objects of the figure. The two relevant classes are Locators and Formatters. Locators determine where the ticks are, and formatters control the formatting of tick labels.
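A minimal sketch of setting a locator and a formatter on a date axis, using stand-in monthly values (an assumption). The tick interval and the "%b %Y" label format are illustrative choices, not necessarily those used for the figure in this notebook.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data (assumed): 24 month-end values.
rng = np.random.default_rng(0)
index = pd.date_range("2019-01-31", periods=24, freq=pd.offsets.MonthEnd())
monthly = pd.Series(rng.integers(300, 600, size=len(index)), index=index)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(monthly.index, monthly.values, marker="o")

# Locator: where the ticks go (a major tick every third month, minor ticks monthly).
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_minor_locator(mdates.MonthLocator())
# Formatter: how each major tick is labelled, e.g. "Jan 2019".
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

The sketch calls ax.plot directly rather than Series.plot, since the pandas plotting wrapper installs its own date tick handling for time series, which may not combine well with matplotlib.dates locators.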


However, we want to compare the same months across the two years. Let's pivot the table so that each year becomes a column and each month a row.
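A sketch of such a pivot, assuming a single series of monthly means built from stand-in random data. The column names year, month and count are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumed): random counts per business day over two years.
rng = np.random.default_rng(0)
index = pd.bdate_range("2019-01-01", "2020-12-31")
counts = pd.Series(rng.integers(7, 40, size=len(index)), index=index, name="count")

# Monthly means, with year and month split out of the datetime index.
monthly = counts.resample(pd.offsets.MonthEnd()).mean().to_frame()
monthly["year"] = monthly.index.year
monthly["month"] = monthly.index.month

# Rows: month 1..12, columns: one per year.
pivot = monthly.pivot(index="month", columns="year", values="count")
print(pivot.round(1))
```

Calling pivot.reset_index() afterwards turns month into a regular column, which matches the layout of the summary table in this section.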

       count       mean       std        min        25%        50%        75%        max
year
month   12.0   6.500000  3.605551   1.000000   3.750000   6.500000   9.250000  12.000000
2019    12.0  38.485756  2.406670  34.857143  37.238636  38.195652  39.326299  42.750000
2020    12.0  49.624818  3.694032  44.045455  46.689723  50.627273  52.545455  54.047619


The bokeh way

The example above is a grouped bar chart. Bokeh can handle up to three levels of nested (hierarchical) categories, and will automatically group output according to the outermost level. To specify nested categorical coordinates, the columns of the data source should contain tuples, for example:

x = [ ("jan", "2019"), ("jan", "2020"), ("feb", "2019"), ("feb", "2020"), ... ]
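A minimal sketch of a grouped bar chart built from such tuples. The values are made up for illustration, and the column names x and counts are arbitrary choices.

```python
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure

months = ["jan", "feb", "mar"]
years = ["2019", "2020"]
# Made-up monthly means, for illustration only.
values = {"2019": [38.2, 37.1, 39.5], "2020": [49.0, 51.3, 47.8]}

# Nested categories: the month is the outer level, the year the inner level.
x = [(month, year) for month in months for year in years]
counts = [values[year][i] for i in range(len(months)) for year in years]

source = ColumnDataSource(data=dict(x=x, counts=counts))

# FactorRange receives the tuples and groups the bars by month.
p = figure(x_range=FactorRange(*x), title="Mean count per month",
           toolbar_location=None)
p.vbar(x="x", top="counts", width=0.9, source=source)
p.y_range.start = 0
p.xaxis.major_label_orientation = 1  # rotate the inner labels (radians)

# In a notebook, show(p) renders the chart after output_notebook() has been called.
```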

Furthermore, we need some styling; see https://docs.bokeh.org/en/latest/docs/user_guide/styling.html

For legends, see also https://docs.bokeh.org/en/latest/docs/user_guide/interaction/legends.html

Make it interactive using panel

We can make this plot interactive using panel. We need to turn our plotting code into a general function that can handle the year 2019, 2020 or both, and we need to create a widget to select the years. We will make a grid layout to display the widget, the plot and the table. First we make sure we can run panel from the notebook by calling panel.extension().

Important! If you use matplotlib instead of bokeh or HoloViews, you must make sure to return the figure wrapped in a panel pane, using pn.pane.Matplotlib(fig).

Making the panel nice with a template

The code generates interactive plots, but it does not look nice. Let us use a template. See https://panel.holoviz.org/user_guide/Templates.html
