Benford's Law

Theory, the General Law of Relative Quantities, and Forensic Fraud Detection Applications

Alex Ely Kossovsky

  1. 672 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

Contrary to the common intuition that all digits should occur randomly with equal chances in real data, empirical examinations consistently show that not all digits are created equal. Low digits such as {1, 2, 3} occur much more frequently than high digits such as {7, 8, 9} in almost all data types, such as those relating to geology, chemistry, astronomy, physics, and engineering, as well as in accounting, financial, econometric, and demographic data sets. This intriguing digital phenomenon is known as Benford's Law.

This book represents an attempt to give a comprehensive and in-depth account of all the theoretical aspects, results, causes and explanations of Benford's Law, with a strong emphasis on the connection to real-life data and the physical manifestation of the law. In addition to such a bird's eye view of the digital phenomenon, the conceptual distinctions between digits, numbers, and quantities are explored, leading to the key finding that the phenomenon is actually quantitative in nature. It originates from the fact that, in extreme generality, nature creates many small quantities but very few big quantities, corroborating the motto "small is beautiful". All this therefore applies just as well to data written by the ancient Roman, Mayan, Egyptian, and other digit-less civilizations.

Fraudsters are typically not aware of this digital pattern and tend to invent numbers with approximately equal digital frequencies. The digital analyst can easily check reported data for compliance with this digital law, enabling the detection of tax evasion, Ponzi schemes, and other financial scams. The forensic fraud detection section in this book is written in a very concise and reader-friendly style. It gathers all known methods and standards in the accounting and auditing industry, summarizes and fuses them into a single coherent whole, and can be understood without deep knowledge of statistical theory or advanced mathematics. In addition, a digital algorithm is presented, enabling the auditor to detect fraud even when the sophisticated cheater is aware of the law and invents numbers accordingly. The algorithm employs a subtle inner digital pattern within the Benford pattern itself. This newly discovered pattern is deemed to be nearly universal, being even more prevalent than the Benford phenomenon, as it is found in all random data sets, Benford as well as non-Benford types.


  • Benford's Law
  • Forensic Digital Analysis & Fraud Detection
  • Data Compliance Tests
  • Conceptual and Mathematical Foundations
  • Benford's Law in the Physical Sciences
  • Topics in Benford's Law
  • The Law of Relative Quantities

Readership: Researchers in probability and statistics, forensic data analysis.

Key Features:

  • The book is a concise account of all known aspects of the practical application of the phenomenon to fraud detection. It also corrects several errors in the field, where mistaken applications are in use.
  • The perceptive reader, such as an accountant, an auditor, or an official at any governmental tax authority worldwide, interested in the use of this digital law in fraud detection, will be able to learn about it with ease and with a minimal amount of effort and time, instead of searching through literally hundreds of small articles on the topic.
  • The book provides numerous new theoretical points of view on the phenomenon and new methods for testing data for compliance, and fuses many different aspects of the law into a single explanation.

Section 1
The typical statistician, during a typical day at the office, spends most of the time intensely staring at data charts and scatter plots, seeking real or imaginary patterns where perhaps none exist, summarizing data, calculating averages and standard deviations, regressing and correlating seemingly unrelated variables, analyzing subtle variances between related data sets to determine whether they are significantly or randomly different from each other, dissecting and bisecting those pesky numbers sent by clients, government agencies, companies, and research institutes.
Interestingly, the statistician has recently taken on the role of a philosopher of sorts: instead of examining the numbers themselves, as is the standard practice, he or she investigates the digital language used in writing those numbers. What letters are to words, digits are to numbers. Why should a poetry lover seek any patterns or beauty by looking into the letters in Shakespeare’s prose instead of the elegantly combined words? Yet the relative proportions of our ten digits 0 to 9 occurring within our typical everyday numbers are now routinely recorded and investigated by statisticians and data analysts, who even theorize, by applying mathematical and statistical reasoning, about how exactly the digits should be spread within any given data set. Moreover, the study of digit proportions is further subdivided by classifying them into different categories according to position. For example, the proportions of the leftmost digit, namely the first digit of numbers, are examined separately. Another separate analysis is performed on the second-leftmost digit, which indeed shows quite different digital proportions than those of the first digit. But aren’t all digits supposed to occur randomly and thus be equally distributed? Why should the digit 4, for example, have a higher or lower chance of occurring within numbers than, say, the digit 5? One wonders whether the occurrences of digits themselves within numbers are just ‘too random’ for the statistician to even consider and analyze. Is there indeed a particular statistical law governing digital proportions? In addition, it seems doubtful that there would be any use or consequence in looking into these digital proportions in the first place. Are there any applications that can exploit the examination of these digital proportions?
The answers to the latter two questions are both decisively positive, as evidenced by the role newly assigned to the statistician as a private detective who uses known digital patterns in data to detect fraud, knowing that fake data probably lacks those particular patterns. Previously, the task of the statistician was merely to analyze data, but never to decide on the authenticity of the provided data. Data was traditionally always taken as a given without any ability to authenticate. For how could the unsuspecting, honest and naive statistician know that people were sending him or her fake data that was merely invented? One incentive to fake data and reduce reported revenues and income would naturally be to lower tax payments. Another incentive is the temptation to inflate revenues and profits in order to impress investors and present the company in a better light as being financially sound. Therefore there is a strong need on the part of tax authorities, governmental financial regulatory and supervisory agencies worldwide, as well as auditing and accounting companies and others, to obtain professional statistical advice as to how to detect fake data. By wearing that philosopher’s hat and examining the digital language used in writing the numbers in provided data sets, the statistician is then able to wear his or her other hat, namely the detective’s hat, and forensically analyze data for any possible fraud.
As our civilization progresses, we are able to do things previously thought impossible. Our collective mathematical and technological abilities have reached fantastic heights. We literally perform magic with our computers and other gadgets. But can we perform the simple task of telling when a friend or a spouse lies? Perhaps not, but the truly sophisticated statistician, aware of the latest developments in the field, can nowadays detect straight-faced fraudsters when presented with their fake data. Underpinning this ability is the fact that to concoct authentic-looking data one must know something about the particular properties of the digital language of numbers, while most fraudsters haven’t got a clue about the topic, and mistakenly believe that digital equality rules the universe of numbers. Yet in fact, low digits such as 1, 2, and 3 occur with very high frequencies within the first-place position of typical everyday data, while high digits such as 7, 8, and 9 occur with very low frequencies. So much so that the proportion of everyday typical numbers starting with digit 1 is about seven times that of numbers starting with digit 9! About 30% of typical everyday numbers in use start with digit 1, while only about 4.6% start with digit 9.
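The percentages quoted above can be reproduced with a short calculation. The sketch below assumes the standard logarithmic statement of Benford's Law for the first digit, P(d) = log10(1 + 1/d), which is not spelled out in this excerpt; the function name `benford_first_digit` is merely illustrative.

```python
import math


def benford_first_digit(d: int) -> float:
    """Expected proportion of numbers led by digit d (1 to 9).

    Assumption: the standard logarithmic form P(d) = log10(1 + 1/d),
    which this excerpt does not state explicitly.
    """
    if not 1 <= d <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Expected first-digit proportions for digits 1 through 9.
expected = {d: benford_first_digit(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"digit {d}: {p:.1%}")

# How many times more often digit 1 leads than digit 9.
ratio = expected[1] / expected[9]
print(f"ratio of digit 1 to digit 9: {ratio:.1f}")
```

Under this assumption digit 1 leads about 30.1% of numbers and digit 9 about 4.6%, a ratio of roughly 6.6, which the text rounds to "about seven times".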
In order to illustrate how this peculiar digital phenomenon can be used in fraud detection, we shall digitally analyze hypothetical accounting data from five different companies where amounts represent revenues. The table in Fig. 1.1 shows 25 dollar amounts from each company. Nothing seems unusual or suspicious if we merely focus on the numbers themselves. Yet, if we forensically investigate the digital language used in writing those numbers, namely the digits at the very beginning of each number (the leftmost ones), we can immediately reveal an abnormality in one particular data set. Figure 1.2 shows the proportions of the first digits for all five companies.
Clearly, MF Capital comes under strong suspicion in the eyes of the expert statistician, since typical accounting data rarely comes with anything near digital equality for the first position. First-digit proportions of the other four companies show an overall pattern of gradual decrease, consistent with the expected pattern in almost all types of accounting data. The sequence of first digits for MF Capital revenue data (commas omitted) is {4736281255914389752766432}, which is distinctly different from, say, Alcoa’s {6111119321441128225618431}. Digits at the second and third positions are much more equal in proportions for all five companies and do not show any particular pattern; they also do not single out MF Capital in any way. Had the statistician focused on those digits instead, there would have been no clue about MF Capital’s possible fraudulent activities.
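The contrast between the two digit sequences quoted above can be checked by a simple tally. The following sketch uses only the two strings given in the text; the helper name `first_digit_proportions` is an illustrative choice, and the underlying dollar amounts of Fig. 1.1 are not reproduced here.

```python
from collections import Counter

# First-digit sequences exactly as quoted in the text (25 amounts each).
mf_capital = "4736281255914389752766432"
alcoa = "6111119321441128225618431"


def first_digit_proportions(digits: str) -> dict:
    """Return the share of each digit 1 to 9 within a string of first digits."""
    counts = Counter(int(ch) for ch in digits)
    total = len(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


mf_props = first_digit_proportions(mf_capital)
alcoa_props = first_digit_proportions(alcoa)

print("digit  MF Capital  Alcoa")
for d in range(1, 10):
    print(f"{d:>5}  {mf_props[d]:>10.0%}  {alcoa_props[d]:>5.0%}")
```

For MF Capital every digit leads between 8% and 16% of the amounts, which is close to digital equality, whereas for Alcoa digit 1 alone leads 40% of the amounts and digit 7 leads none.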
Figure 1.1 Hypothetical Accounting Data for Five Companies
Figure 1.2 1st Digits Proportions of the Data of Five Companies
First Leading Digit (LD) or First Significant Digit is the first (non-zero) digit of a given number appearing on the leftmost side. For 567.34 the leading digit is 5. For 0.0367 the leading digit is 3, as we discard the zeros. For the lone integer 6 the leading digit is 6. For negative numbers we simply discard the sign, hence for -62.97 the leading digit is 6. Another way of defining the first digit of any number is by writing it in scientific notation as $A \times 10^{N}$, with N being an integer and A being a real number such that 1 ≤ |A| < 10. For such a representation of numbers, the integral part of A (excluding the fractional part), with the positive or negative sign ignored, is what we consider the first leading digit. For example, the number 311.75 is scientifically written as $3.1175 \times 10^{2}$ and digit 3 leads the number. Naturally, when digit d appears first in a number composed of several digits, we call d the ‘leader’, as it leads all the other digits trailing behind it to the right.
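The definition above translates directly into a few lines of code. This is a minimal sketch rather than a prescribed method: the function name `leading_digit` and the choice of Python's `decimal` module are assumptions made for illustration.

```python
from decimal import Decimal


def leading_digit(x) -> int:
    """Return the first significant (non-zero) digit of x, ignoring its sign."""
    # Going through str() keeps the digits as written (e.g. 0.0367),
    # rather than the binary floating-point approximation of the value.
    d = Decimal(str(x))
    if not d.is_finite() or d == 0:
        raise ValueError("leading digit is undefined for zero, NaN, or infinity")
    # Decimal stores the coefficient without leading zeros and keeps the sign
    # separately, so the first stored digit is the leading digit.
    return d.as_tuple().digits[0]


# The examples given in the text.
for value in (567.34, 0.0367, 6, -62.97, 311.75):
    print(value, "->", leading_digit(value))
```

Zero has no leading digit under this definition, so the sketch raises an error for it instead of returning a value.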
Perhaps it is tempting to intuit that for numbers in typical real-life data sets, all nine digits {1, 2, 3, 4, 5, 6, 7, 8, 9} should be equally likely to occur and thus uniformly distributed. Let us examine three typical data sets from a variety of real-life situations where digital results run counter to that misguided intuition and where, surprisingly, low digits such as 1, 2, and 3 are strongly favored over high digits such as 7, 8, and 9. The three data sets to be digitally examined are: (I) stock market prices and volume of stock traded, (II) the 10 by 10 multiplication table, and (III) house numbers in typical address data.
Examination of first digits of closing prices and daily volume of stocks traded on the New York Stock Exchange on December 23, 2011 reveals a definite pattern in which digital proportions are almost monotonically and consistently decreasing. The first 31 companies on top of the alphabetically-sorted list were arbitrarily chosen. Figure 1.3 shows the extracted data.
Low digits lead much more often than high digits, for both stock prices and volume. Figure 1.4 shows the exact LD distributions for this limited set of 31 companies. It should be noted that almost all other such subsets down the long list on the NYSE website yield quite similar results, that there was nothing unusual about the trading day of the 23rd of December 2011, and that very similar digital results are obtained on other trading days.
Let us examine the LD of the 10 by 10 multiplication table that we all were forced to memorize at school against our will, as shown in Fig. 1.5(A).
Surprisingly, out of 100 numbers, 21 start with the lowest digit 1 (shown in large and bold font), and only five start with the highest digit 9 (shown within circles), a ratio of roughly 4:1. This result is surprising yet approximately compatible with the digital results seen in the example with stock prices and volume data. In this digital analysis the numbers 1, 10, and 100 are grouped together under the same category since all of them are led by digit 1. Digital proportions here, for digits 1 through 9, are {21%, 17%, 13%, 14%, 8%, 9%, 6%, 7%, 5%}.
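The tally for the multiplication table is easy to verify by brute force. The sketch below builds all 100 products of the 10 by 10 table and counts their first digits; the variable names are illustrative only.

```python
from collections import Counter


def first_digit(n: int) -> int:
    """First digit of a positive integer, read off its decimal string."""
    return int(str(n)[0])


# All 100 entries of the 10 by 10 multiplication table.
products = [i * j for i in range(1, 11) for j in range(1, 11)]

counts = Counter(first_digit(p) for p in products)
table_counts = [counts[d] for d in range(1, 10)]

print(table_counts)  # [21, 17, 13, 14, 8, 9, 6, 7, 5]
```

Since the table holds exactly 100 numbers, the counts double as percentages and match the proportions listed above.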
Figure 1.3 Price and Volume of Stocks Traded on the NYSE
Interestingly, if the digital as...
