Time flies, whether you plot it or not
It's well known that histograms are a simple way to get a quick read on data.
But when the values in that data have a temporal ordering, things are not so straightforward. Here I'll show you a way to quickly create histograms from dates, grouped at different scales (months, days, and others).
The steps can be summarized as follows:
Let's assume we have a dataframe with 4 columns. Here we print the last rows:
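As a stand-in for that dataframe, here is a minimal sketch. The four column names and all the values are made up for illustration; only the idea of a date column stored as text comes from this post.

```python
import pandas as pd

# Hypothetical data: column names and values are placeholders,
# not the dataset used in this post.
df = pd.DataFrame({
    "date": ["2019-01-05", "2019-01-20", "2019-02-10",
             "2019-02-11", "2019-02-11", "2019-03-01"],
    "city": ["Lima", "Quito", "Lima", "Bogota", "Quito", "Lima"],
    "item": ["a", "b", "a", "c", "b", "a"],
    "amount": [10.0, 12.5, 7.0, 3.2, 8.8, 15.1],
})

# Print the last rows to check how the date column looks.
print(df.tail(3))
```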
Sometimes we'll need to fix the decoding of the date column, in case it is stored as bytes rather than as 'utf-8' text. How to detect it? If we see b'2019-02-10', then the column was not properly decoded when our dataframe was generated.
The decoding can be done by simply:
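One way to do this is with the pandas string accessor `Series.str.decode`. This is a small sketch and assumes the column is called `date`, which is a placeholder name:

```python
import pandas as pd

# Hypothetical column of undecoded dates (note the b'' prefix).
df = pd.DataFrame({"date": [b"2019-02-09", b"2019-02-10"]})

# Detect the problem: the values are bytes objects, not strings.
needs_decoding = isinstance(df["date"].iloc[0], bytes)

# Decode every value in the column from bytes to text.
if needs_decoding:
    df["date"] = df["date"].str.decode("utf-8")

print(df["date"].tolist())
```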
Then, we use 3 pandas methods:
These 3 methods are used in the block of code below, generating colorful histograms out of the box.
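As a rough sketch of this kind of grouping (not necessarily the same three methods), the snippet below lets pandas parse the dates with `pd.to_datetime`, counts rows per month and per weekday with `groupby` on the `.dt` accessor, and draws bar charts with `.plot`. The column name `date`, the sample values, and the helper `plot_counts` are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; "date" is a placeholder column name.
df = pd.DataFrame({
    "date": ["2019-01-05", "2019-01-20", "2019-02-10",
             "2019-02-11", "2019-02-11", "2019-03-01"],
})

# Let pandas parse the ISO-formatted strings into datetime values.
dates = pd.to_datetime(df["date"])

# Count rows at two different scales: calendar month and day of the week.
per_month = dates.groupby(dates.dt.month).count()
per_weekday = dates.groupby(dates.dt.day_name()).count()


def plot_counts(counts, title):
    """Draw the counts as a bar chart (hypothetical helper, needs matplotlib)."""
    ax = counts.plot(kind="bar", title=title)
    ax.set_ylabel("rows")
    return ax
```

In a notebook, calling `plot_counts(per_month, "Rows per month")` and `plot_counts(per_weekday, "Rows per weekday")` would draw one histogram per scale. Other scales follow the same pattern, for example `dates.dt.year` or `dates.dt.day`.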
And here we have our histograms, in various flavors for different tastes:
As you can see, this was pretty simple, with no need to format the data and convert it to datetime by hand. Pandas does the heavy lifting for us, letting us quickly create simple visualizations with a few lines of code. A real time saver, isn't it?
Hope you found it useful!
You can find the complete Jupyter notebook on my GitHub.