As described in the article “Why data disappoints?”, data needs to be handled with considerable care to enable discoveries and meaningful insights, with the goal of moving towards automation using machine learning.
In that article, for example, we highlighted the danger of misusing seemingly simple tools such as averaging or smoothing on time series. Following requests to clarify this point, here is a simple use case.
Let’s assume that you want to compare the revenues generated by two products (A and B) over a period of 6 months. As shown in the figures below, in case (1) the revenues from the two products evolve very differently, while in case (2) they evolve, more or less, in the same way.
When comparing the averages over this 6-month period, both cases yield the same result: the two products generate the same revenue to within a few percent. But the variance is completely different: in the ballpark of 100% in case (1), and a few percent in case (2).
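To make the comparison concrete, here is a minimal sketch in Python. The monthly revenue figures are invented for illustration and are not the data behind the figures. The helper name compare is hypothetical, and so is the choice of metric: the standard deviation of the monthly gap between A and B, relative to the overall average revenue. It is only one possible way to quantify the spread.

```python
from statistics import mean, pstdev

# Hypothetical monthly revenues (arbitrary units), invented for illustration.
# Case (1): A rises while B falls. Case (2): A and B stay close together.
case1_a = [10, 30, 50, 70, 90, 110]
case1_b = [112, 92, 72, 52, 32, 12]
case2_a = [58, 60, 61, 59, 62, 60]
case2_b = [61, 60, 64, 60, 65, 62]


def compare(a, b):
    """Return (relative gap between the averages, relative spread of the monthly gap).

    Both values are expressed as a fraction of the overall average revenue.
    The spread is an assumed metric, not necessarily the one used in the article.
    """
    overall = mean(a + b)                           # average revenue over both products
    avg_gap = abs(mean(a) - mean(b)) / overall      # what a report based on averages shows
    monthly_gap = [x - y for x, y in zip(a, b)]     # month-by-month difference A - B
    spread = pstdev(monthly_gap) / overall          # how much that difference fluctuates
    return avg_gap, spread


avg1, spread1 = compare(case1_a, case1_b)
avg2, spread2 = compare(case2_a, case2_b)

print(f"case (1): average gap {avg1:.1%}, spread {spread1:.1%}")
print(f"case (2): average gap {avg2:.1%}, spread {spread2:.1%}")
```

With these made-up numbers, the gap between the averages is about 3% in both cases, while the spread is above 100% in case (1) and about 2% in case (2). The averages alone cannot tell the two situations apart.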
This demonstrates that variance is a critical measurement to account for, yet it seldom appears in the statistics of business reports and presentations, even those for internal use. Smoothing a time series can have dangerous implications for business strategy.
Once again, data doesn’t disappoint; only the unwise use of statistics does.