Monthly sales data
In this example we first generate sales data per business day at several locations over two years. Then we merge the two years into one dataframe and resample the data to monthly means at month-end frequency. Finally, we display the data so that the average monthly sales of the two years can be compared.
Pandas datetime
By default, pandas stores timestamps using NumPy's datetime64 data type at nanosecond resolution. We can demonstrate this by creating a pandas Series from a date range with nanosecond frequency:

import pandas as pd

# pandas 2.2 and later prefer the lowercase aliases 'h' (hour) and 'ns' (nanosecond)
#pd.Series(pd.date_range('2021-07-01', periods=3, freq='D')) # day
#pd.Series(pd.date_range('2021-07-01', periods=3, freq='H')) # hour
pd.Series(pd.date_range('2021-07-01', periods=3, freq='N')) # nanosecond

0   2021-07-01 00:00:00.000000000
1   2021-07-01 00:00:00.000000001
2   2021-07-01 00:00:00.000000002
dtype: datetime64[ns]

We see that the pandas Series has the dtype datetime64[ns] (ns = nanoseconds).
Create date_range
With date_range we can create all sorts of time intervals. For example, if you want a date index containing the last business day of each month, you pass the 'BM' frequency (business month end; pandas 2.2 and later prefer the alias 'BME'):
dates = pd.date_range('1/1/2021', periods=3, freq='BM')
pd.Series(dates)

0   2021-01-29
1   2021-02-26
2   2021-03-31
dtype: datetime64[ns]

Another example is a 2-hour period.
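The code for the 2-hour example did not survive on this page, so the following is only a minimal sketch of what it could look like. It passes an offset object (`pd.offsets.Hour(2)`) rather than a string alias; that is a choice made here because the spelling of the string aliases changed between pandas versions, not something taken from the original.

```python
import pandas as pd

# Three timestamps spaced two hours apart, starting at midnight.
# Using an offset object instead of a string alias is an assumption
# made for portability across pandas versions.
two_hourly = pd.date_range("2021-07-01", periods=3, freq=pd.offsets.Hour(2))

print(pd.Series(two_hourly))
```

The resulting timestamps fall at 00:00, 02:00 and 04:00 on 2021-07-01.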
You can find more about frequencies in the offset frequency reference at https://pandas.pydata.org/docs/reference/offset_frequency.html, and the time series user guide at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html describes the methods available on these datetime objects.
Create time related sales data
Let us create some time-related data. The data represent sales per business day at several locations:
            Assen  Groningen  Hoogeveen
2019-01-01     31         28         19
2019-01-02     10         16         25
2019-01-03     21         25         22
2019-01-04     19         23         14
2019-01-07     19         23         10
...           ...        ...        ...
2020-12-09     24         15         19
2020-12-10     23         20         26
2020-12-11     16         26         11
2020-12-14     21         35         24
2020-12-15     33         29         32

500 rows × 3 columns
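The code that generated this table is not shown on the page. A minimal sketch of one way to build comparable data follows. The helper `make_year`, the random seed, and the value range are all hypothetical. The choice of 250 business days per year is inferred from the 500 rows and the last date (2020-12-15) in the table above, so treat it as an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # arbitrary seed, only for reproducibility
locations = ["Assen", "Groningen", "Hoogeveen"]


def make_year(start, periods=250):
    """Return random sales for `periods` business days beginning at `start`."""
    index = pd.bdate_range(start=start, periods=periods)
    # Hypothetical sales figures: integers from 10 up to and including 35.
    values = rng.integers(10, 36, size=(periods, len(locations)))
    return pd.DataFrame(values, index=index, columns=locations)


sales_2019 = make_year("2019-01-01")
sales_2020 = make_year("2020-01-01")

# Merge the two years into one dataframe by stacking them along the index.
df = pd.concat([sales_2019, sales_2020])
print(df.shape)  # (500, 3)
```

Because the values are random, the individual numbers will differ from the table above; only the shape and the date index are comparable.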
Plot the sales data
We can plot the data just by calling pandas.DataFrame.plot()

A plot of the raw daily values is far from the overview we would like to see. A compact table of monthly figures would be much easier to read.

Resample
The daily data is not readable, so we should consider resampling. Resampling is necessary when you are given a data set recorded at some time interval and you want to change that interval to something else, for example to aggregate daily numbers into monthly numbers. The last rows of the data aggregated per month look like this:
            Assen  Groningen  Hoogeveen
2020-08-31    552        492        548
2020-09-30    586        490        475
2020-10-31    599        558        543
2020-11-30    618        466        488
2020-12-31    255        263        290
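The resample call itself is missing from this page. As a hedged sketch, monthly totals like the ones above could be produced as shown below. The data frame here is a hypothetical stand-in with random values, and the month-end rule is passed as an offset object (`pd.offsets.MonthEnd()`) rather than a string alias, which is a choice made for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sales data: 250 business days in 2020.
rng = np.random.default_rng(seed=0)
index = pd.bdate_range("2020-01-01", periods=250)
df = pd.DataFrame(
    rng.integers(10, 36, size=(250, 3)),
    index=index,
    columns=["Assen", "Groningen", "Hoogeveen"],
)

# Group the daily rows by calendar month (labelled with the month-end date)
# and add up the sales within each month.
monthly_totals = df.resample(pd.offsets.MonthEnd()).sum()

print(monthly_totals.tail())
```

Replacing `.sum()` with `.mean()` gives monthly means instead of totals. Note that the last month only covers the business days up to 2020-12-15, which is why its total is much lower than the others.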

We can also resample to evaluate just a part of the dataset. For instance, we could get the monthly mean of Assen and Hoogeveen combined.
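A minimal sketch of this idea follows. It reads "combined" as the daily sum of the two locations, which is one possible interpretation and not confirmed by the page; the data is again a hypothetical random stand-in.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sales data: 250 business days in 2020.
rng = np.random.default_rng(seed=1)
index = pd.bdate_range("2020-01-01", periods=250)
df = pd.DataFrame(
    rng.integers(10, 36, size=(250, 3)),
    index=index,
    columns=["Assen", "Groningen", "Hoogeveen"],
)

# Assumption: "combined" means the two locations added together per day.
combined = df[["Assen", "Hoogeveen"]].sum(axis=1)

# Monthly mean of the combined daily sales, labelled by month end.
monthly_mean = combined.resample(pd.offsets.MonthEnd()).mean()

print(monthly_mean.head())
```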

Locators and Formatters
This is not the kind of plot we want. Remember we can access the objects of the figure. The two relevant classes are Locators and Formatters. Locators determine where the ticks are, and formatters control the formatting of tick labels.
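The plotting code is not included on this page, so the following is only a sketch of how a locator and a formatter from `matplotlib.dates` can be attached to the x-axis. The data, the three-month tick interval, and the `"%b %Y"` label format are illustrative assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sales data: 500 business days from 2019-01-01.
rng = np.random.default_rng(seed=2)
index = pd.bdate_range("2019-01-01", periods=500)
df = pd.DataFrame(
    rng.integers(10, 36, size=(500, 3)),
    index=index,
    columns=["Assen", "Groningen", "Hoogeveen"],
)
monthly_mean = df.resample(pd.offsets.MonthEnd()).mean()

fig, ax = plt.subplots(figsize=(10, 4))
for name in monthly_mean.columns:
    ax.plot(monthly_mean.index, monthly_mean[name], marker="o", label=name)

# Locator: place a major tick every third month.
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
# Formatter: label each tick as abbreviated month name plus year, e.g. "Apr 2019".
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

ax.set_ylabel("mean daily sales")
ax.legend()
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

To view the result, call `plt.show()` in an interactive session or `fig.savefig(...)` with a file name of your choice.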

However, we want to compare the same months across the two years. Let's pivot the table so that the months become rows and the years become columns:
year        2019       2020
month
1      37.173913  46.434783
2      36.550000  53.050000
3      36.857143  51.636364
4      38.136364  52.454545
5      40.521739  46.714286
6      36.050000  54.136364
7      39.608696  51.304348
8      39.363636  52.380952
9      45.428571  48.227273
10     34.478261  51.909091
11     37.904762  52.666667
12     40.454545  49.545455
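The pivot step is not shown on the page. One way to arrive at a month-by-year table of this shape is sketched below using `pivot_table`. The data is a hypothetical random stand-in, and combining Assen and Hoogeveen by daily sum is an assumption carried over from the earlier step.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: 250 business days for each of 2019 and 2020.
rng = np.random.default_rng(seed=3)
index = pd.bdate_range("2019-01-01", periods=250).append(
    pd.bdate_range("2020-01-01", periods=250)
)
df = pd.DataFrame(
    rng.integers(10, 36, size=(500, 3)),
    index=index,
    columns=["Assen", "Groningen", "Hoogeveen"],
)

# Assumption: the compared quantity is the daily sum of two locations.
combined = df[["Assen", "Hoogeveen"]].sum(axis=1)

# Put year and month in ordinary columns so they can serve as pivot keys.
table = pd.DataFrame(
    {
        "year": combined.index.year,
        "month": combined.index.month,
        "sales": combined.to_numpy(),
    }
)

# Rows: month 1..12, columns: year, cells: mean daily sales in that month.
pivot = table.pivot_table(index="month", columns="year", values="sales", aggfunc="mean")

print(pivot)
```

With the years side by side, `pivot.plot()` draws one line per year over the twelve months, which makes the year-over-year comparison direct.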
