Montly sales data

In this example we first generate sales data per business day at several locations over two years. Then we merge the two years into one dataframe. We resample the data to monthly means by end of month frequencies. After that we display the data to compare the average month sales of the two years.

Pandas datetime

pandas stores timestamps using NumPy’s datetime64 data type at the nanosecond resolution. We can demonstrate this by creating a pandas series object with a data range in nanosecond frequency

#pd.Series(pd.date_range('2021-07-01', periods=3, freq='D')) # day
#pd.Series(pd.date_range('2021-07-01', periods=3, freq='H')) # hour
pd.Series(pd.date_range('2021-07-01', periods=3, freq='N')) # nanosecond
0   2021-07-01 00:00:00.000000000
1   2021-07-01 00:00:00.000000001
2   2021-07-01 00:00:00.000000002
dtype: datetime64[ns]

We see that the pandas series object is of the dtype datetime64[ns] (ns = nanoseconds)

Create date_range

With date_range we can create all sorts of time intervals For example, if you wanted a date index containing the last business day of each month, you would pass the 'BM' frequency (business end of month)

dates = pd.date_range('1/1/2021', periods = 3, freq='BM')
pd.Series(dates)
0   2021-01-29
1   2021-02-26
2   2021-03-31
dtype: datetime64[ns]

Another example is a 2hour period

In the https://pandas.pydata.org/docs/reference/offset_frequency.html you can find more about frequencies and in the documentation https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html you can read all the methods of this datetime object.

Let us create some time related data. The data represent sales per business days at several locations

Assen

Groningen

Hoogeveen

2019-01-01

31

28

19

2019-01-02

10

16

25

2019-01-03

21

25

22

2019-01-04

19

23

14

2019-01-07

19

23

10

...

...

...

...

2020-12-09

24

15

19

2020-12-10

23

20

26

2020-12-11

16

26

11

2020-12-14

21

35

24

2020-12-15

33

29

32

500 rows × 3 columns

Plot the sales data

We can plot the data just by calling pandas.DataFrame.plot()

This is far away from a table we would like to see. For instance a table below:

Resample

This data is not readable. We should consider Resampling. Resampling is necessary when you’re given a data set recorded in some time interval and you want to change the time interval to something else. For example, aggregate daily numbers into monthly numbers. The syntax

Assen

Groningen

Hoogeveen

2020-08-31

552

492

548

2020-09-30

586

490

475

2020-10-31

599

558

543

2020-11-30

618

466

488

2020-12-31

255

263

290

We can also resample to evaluate a part of the dataset. For instance we could get the mean value of Assen en Hoogeveen combined

Locators and Formatters

This is not the kind of plot we want. Remember we can access the objects of the figure. The two relevant classes are Locators and Formatters. Locators determine where the ticks are, and formatters control the formatting of tick labels.

However, we want to compare the different dates over the year per year. Let's pivot the table

year

2019

2020

month

1

37.173913

46.434783

2

36.550000

53.050000

3

36.857143

51.636364

4

38.136364

52.454545

5

40.521739

46.714286

6

36.050000

54.136364

7

39.608696

51.304348

8

39.363636

52.380952

9

45.428571

48.227273

10

34.478261

51.909091

11

37.904762

52.666667

12

40.454545

49.545455

Last updated

Was this helpful?