I recently came across something that blew my mind, the way a magic trick does for a kid. I am fascinated by numbers and correlations, and if you are too, you either already know Benford’s law or you should definitely check it out.
Cutting to the chase, Benford’s law is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. For example, in sets that obey the law, the digit 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time.
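Those percentages come from a simple formula: the expected share of numbers with leading digit d is log10(1 + 1/d). Here is a minimal Python sketch that prints the expected share for each digit (the function name `benford_expected` is my own, purely for illustration):

```python
import math


def benford_expected(digit: int) -> float:
    """Expected share of numbers whose leading digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Print the expected share for every possible leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1%}")
```

The nine shares add up to 100%, with 1 at roughly 30.1% and 9 at roughly 4.6%.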
Benford’s law applies to a surprisingly wide range of things, as long as you can represent them in numbers that span several orders of magnitude. Much like the golden ratio, it turns up in many patterns in nature. Not only that, but it also shows up in population figures, tax records, election results, and many more. This makes it an important tool for identifying inconsistencies in data sets.
You can check it out yourself with the data around you, such as the populations of the various states in your country or even your bank statements for the last 5-10 years. As long as the data has not been manipulated, you will be amazed at how often the numbers follow Benford’s law.
BTW, even when just 5% of the data is randomly manipulated, you can see it start to deviate from Benford’s law.
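If you want to try this on your own numbers, here is a rough Python sketch of the check. The helper names `leading_digit` and `leading_digit_shares` are hypothetical, and the sample values are simply the first 100 powers of 2 (a sequence well known to follow Benford’s law), standing in for whatever data you have:

```python
import math
from collections import Counter
from typing import Optional


def leading_digit(value: float) -> Optional[int]:
    """Return the first non-zero digit of a number, or None for zero."""
    # Strip leading zeros and the decimal point: '0.0035' -> '35', '42000' stays.
    text = str(abs(value)).lstrip("0.")
    return int(text[0]) if text else None


def leading_digit_shares(values) -> dict:
    """Share of values starting with each digit 1-9 (zeros are skipped)."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Stand-in data: replace this list with your own numbers.
sample = [2 ** n for n in range(1, 101)]
shares = leading_digit_shares(sample)

# Compare the observed share with the Benford expectation log10(1 + 1/d).
for d in range(1, 10):
    print(f"{d}: observed {shares[d]:.1%}, expected {math.log10(1 + 1 / d):.1%}")
```

With real data you would load the column from your file or spreadsheet export instead of building `sample` by hand.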
Of course, I got curious, so I applied Benford’s law to a Covid-19 data set consisting of:
- Countries: 196
- Date Range: Jan 2020 to Aug 2020
- Parameters: Total Confirmed Cases & Total Deaths
See for yourself: for the cumulative data from all countries combined, we get a near-perfect Benford’s curve. Amazing, isn’t it?
I am sure you already guessed what I am going to do next and you want to see it too. So, let’s see what happens if we use the data for individual countries.
Below are the graphs for a few selected countries:
Looking at these graphs, we see that some countries clearly follow Benford’s law and others clearly do not.
Let’s put them in three categories, based on how they comply with Benford’s law:
Low compliance with Benford’s law suggests that there may be some issues with the data being presented to us. Some notable callouts:
- Canada, Japan, and New Zealand conform to the law for their number of confirmed cases but not for their number of deaths
- Australia and Russia conform to the law for their number of deaths but not for their number of confirmed cases
- China, Italy, Singapore, UAE, and the UK do not conform for either. Interesting!
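I judged compliance above by looking at the graphs. If you would rather put a number on it, one common option (not what I used here) is a chi-squared goodness-of-fit statistic. The sketch below is only that, a sketch: the function names are hypothetical, it assumes whole-number counts such as cases or deaths, and because daily cumulative totals are not independent observations, the 5% threshold is a rough guide rather than a strict test.

```python
import math
from collections import Counter

# Critical value of the chi-squared distribution with 8 degrees of freedom
# (9 digits minus 1) at the 5% significance level.
CHI2_CRITICAL_8DF_5PCT = 15.507


def first_digit(count: int) -> int:
    """Leading digit of a non-zero whole number."""
    return int(str(abs(count))[0])


def benford_chi_squared(values) -> float:
    """Chi-squared distance between observed leading digits and Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]
    total = len(digits)
    if total == 0:
        raise ValueError("need at least one non-zero value")
    observed = Counter(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        chi2 += (observed.get(d, 0) - expected) ** 2 / expected
    return chi2


# Two stand-in series (not Covid data): powers of 2 follow Benford's law,
# while the integers 1..999 have equally common leading digits and do not.
for name, series in [("powers of 2", [2 ** n for n in range(1, 101)]),
                     ("1 to 999", range(1, 1000))]:
    score = benford_chi_squared(series)
    verdict = "conforms" if score < CHI2_CRITICAL_8DF_5PCT else "does not conform"
    print(f"{name}: chi-squared = {score:.2f} ({verdict})")
```

A score below the threshold means the leading digits are consistent with Benford’s law; a score far above it is the numerical counterpart of a graph that clearly does not follow the curve.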
Benford’s law wouldn’t lie, so who would???