This book examines the use of Big Data and statistical analyses in litigation. The topic is timely because the use of, and reliance on, Big Data by business and government has exploded. Organizations of all types increasingly rely on analytics and Big Data systems to support and inform most, if not all, of their functions. Crucially, in a corollary to Moore's Law, which states that the density of transistors on an integrated circuit will double roughly every two years, leading to a doubling of computing power, it is likewise true that the volume of data created and used by business and government doubles every 18 months.
From a public policy and legal perspective, the implications of this growth are monumental. In addition, technological innovation in Big Data, as well as in areas such as Artificial Intelligence, the Internet of Things, and Smart Contracts, has significant implications for how organizations function, for privacy and security, for how transactions are carried out, and for myriad other factors with enormous legal consequences. Using examples, court decisions, and discussions from a range of lawsuits and courts, we draw connections across many different types of litigation.
What Is Big Data?
What's "big" in big data isn't necessarily the size of the databases, it's the big number of data sources we have, as digital sensors and behavior trackers migrate across the world.1
Before proceeding with the discussion of Big Data, some clarity is required on terms that are often used interchangeably. Statistics is properly understood as the use of samples to make inferences about populations. Surveys, sampling, and hypothesis testing are staples of statistical analysis, which is widely used in litigation. Data analysis entails analyzing the data of a particular set or population. A financial data analyst examines his firm's stocks, an insurance claims analyst examines her company's extensive claims data, a human resources analyst dives into his agency's personnel data to assess the risk of key staff retiring, and so on. Like inferential statistics, data analysis has been used extensively in litigation (audits, patterns, averages, anomalies, etc.).
Big Data is an elaboration of data analysis, but it is also different in ways that have significant implications for litigation. It is not necessarily inferential, and it relies upon computational techniques that examine patterns, trends, and other features of behavior in the data. Examples of Big Data include credit card transactions, health insurance claims, and online behavior, among others. A key difference between data analysis and Big Data analysis can be gleaned by example. Data analysis generates reports on, say, sales by month. Big Data analysis also examines sales but seeks to find patterns in the effect of the time of day consumers shop, the weather, the location of the store, the type of credit card, the bundle of goods bought, and so on. Big Data analysis is made possible by the decreasing cost of storage space, the use of cloud computing, and the recognition that Big Data analytics can confer a competitive advantage or, at a minimum, efficiency-enhancing benefits on an organization.2
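The contrast between a monthly sales report and a pattern-seeking analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transaction records, the field names (`month`, `hour`, `weather`, `amount`), and the 5 p.m. cutoff for "evening" are all invented here, and a real Big Data system would combine far more sources and run on distributed infrastructure.

```python
from collections import defaultdict

# Hypothetical transaction records; every field name and value is invented
# for illustration and does not come from any real dataset.
transactions = [
    {"month": "2023-01", "hour": 9,  "weather": "rain",  "amount": 20.0},
    {"month": "2023-01", "hour": 18, "weather": "clear", "amount": 55.0},
    {"month": "2023-02", "hour": 10, "weather": "clear", "amount": 30.0},
    {"month": "2023-02", "hour": 19, "weather": "rain",  "amount": 80.0},
    {"month": "2023-02", "hour": 20, "weather": "rain",  "amount": 70.0},
]


def sales_by_month(rows):
    """Traditional data analysis: total sales for each month."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["month"]] += row["amount"]
    return dict(totals)


def average_sale_by_segment(rows):
    """Pattern-seeking analysis: average sale by time of day and weather."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        # Assumed cutoff: purchases at 17:00 or later count as "evening".
        part_of_day = "evening" if row["hour"] >= 17 else "daytime"
        key = (part_of_day, row["weather"])
        sums[key] += row["amount"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


monthly = sales_by_month(transactions)
segments = average_sale_by_segment(transactions)
print(monthly)
print(segments)
```

The first function answers a fixed reporting question. The second starts to ask which combinations of circumstances go with larger purchases, which is closer in spirit to the Big Data analysis described above.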
The National Institute of Standards and Technology defines Big Data as follows: Big Data consists of extensive datasets primarily in the characteristics of volume, variety, velocity, and/or variability that require a scalable architecture for efficient storage, manipulation, and analysis. Further, the Big Data paradigm consists of the distribution of data systems across horizontally coupled, independent resources to achieve the scalability needed for the efficient processing of extensive datasets.3 While seemingly abstruse, this definition of Big Data has implications for organizational behavior, risk, and ultimately litigation. Unlike merely storing large amounts of data and then analyzing them, the use of Big Data requires linking disparate systems, reconfiguring or acquiring complex information system architectures, and, in increasingly common cases, reconfiguring an organization's very structure. For example, some define Big Data this way: "Big data is not all about volume, it is more about combining different data sets and to analyze it in real-time to get insights for your organization. Therefore, the right definition of big data should in fact be: mixed data."4
These processes are complex, expensive, and can be risky. The risk lies partly in the possibility that an organization's leadership may not fully understand the implications of using or relying on Big Data. Further, Big Data systems may fail to comply with the administrative processes and laws governing the use of personal or confidential data. Combining different datasets is fraught with risk and uncertainty. In addition, true Big Data systems require complex infrastructure, at times connected to the internet, which exposes data and systems to hacking and ultimately to legal risk.
Data and Big Data in Litigation
In an increasing number of legal cases, large collections of electronic data, or in some instances Big Data, determine which party ultimately prevails. In business disputes, employment cases, consumer class actions, and even personal injury lawsuits, the analysis of enormous amounts of electronic data often provides evidence that would be unattainable from witness testimony alone. In some instances, electronic data is the only way to analyze the parties' dueling allegations in a lawsuit.
Employment discrimination lawsuits are an area where statistics and Big Data have been used extensively. In employment cases, especially those involving many plaintiffs, the compilation, tabulation, and analysis of Big Data have been relied upon heavily since the rise of electronic computing. In class action employment lawsuits, litigants frequently introduce mountains of data and analysis to support or refute allegations of gender, race, age, or other types of illegal employment discrimination.
Litigation over allegations of unpaid overtime and off-the-clock work is another area where statistical and large data analyses are used extensively. In these cases, commonly referred to as wage and hour cases, former or current employees allege that the defendant illegally denied them their legal right to overtime premium pay, that the defendant's timekeeping system illegally shaved work time, or that the defendant required them to perform work before or after punching in for work or while punched out for a work break. In these types of wage and hour cases, it is typical for parties to present evidence based on the collection and analysis of extensive electronic databases of daily time and salary records. In some instances, where data is not collected by the defendant, the parties may perform sophisticated statistical surveys, based on the electronic information that is available, to provide insights into the allegations in the case.
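A minimal sketch of the kind of calculation that underlies such an analysis is shown below. It rests on loudly stated assumptions: a 40-hour weekly overtime threshold and a 1.5x premium are used purely for illustration (the governing rule depends on the jurisdiction and the facts of the case), and the punch records, hourly rate, and amount paid are hypothetical. Actual cases repeat this sort of computation across many employees and pay periods.

```python
from datetime import datetime

# Illustrative assumptions only: the applicable overtime rule varies by
# jurisdiction and case, so these constants are not a statement of the law.
WEEKLY_THRESHOLD_HOURS = 40.0
OVERTIME_MULTIPLIER = 1.5


def hours_worked(punches):
    """Total hours across (clock_in, clock_out) pairs for one workweek."""
    total_seconds = sum(
        (clock_out - clock_in).total_seconds() for clock_in, clock_out in punches
    )
    return total_seconds / 3600.0


def unpaid_wages(punches, hourly_rate, amount_paid):
    """Wages owed under the assumed rule, minus what was actually paid."""
    hours = hours_worked(punches)
    regular_hours = min(hours, WEEKLY_THRESHOLD_HOURS)
    overtime_hours = max(0.0, hours - WEEKLY_THRESHOLD_HOURS)
    owed = (regular_hours * hourly_rate
            + overtime_hours * hourly_rate * OVERTIME_MULTIPLIER)
    return owed - amount_paid


# Hypothetical week: five shifts from 8:00 to 17:30 with no recorded break.
punches = [
    (datetime(2023, 3, day, 8, 0), datetime(2023, 3, day, 17, 30))
    for day in range(6, 11)
]

total_hours = hours_worked(punches)
# Hypothetical pay: every hour was paid at the straight-time rate of 20.0.
shortfall = unpaid_wages(punches, hourly_rate=20.0, amount_paid=950.0)
print(total_hours, shortfall)
```

In this invented week the employee is paid straight time for every hour, so the shortfall is the overtime premium alone. An analysis of time shaving would instead compare the recorded punch times against the time the payroll system actually credited.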
In response to the complexity of these cases, a number of court systems have established specialized courts to handle complex matters such as those involving massive amounts of electronic data. In California, numerous state courts have been set up to handle complex cases, with judges who are particularly well versed in the nuances these cases involve. A quick review of the dockets of these complex courts shows that a number of the cases are large employment wage and hour class actions that involve analysis and calculations based on these large electronic datasets. Many litigants report that the cases flow more efficiently and that a number of the issues are adjudicated rapidly. It is often noted, however, that the monetary cost of establishing and maintaining these complex courts is relatively high.
The Average Wholesale Pricing (AWP) pharmaceutical drug litigation that began in the mid-2000s is also instructive on the use of large electronic databases as evidence in litigation. Medical billing data typically conforms to the true definition of Big Data, with separate systems communicating, exchanging information, and generating new and enormous sets of data. In these cases, individual plaintiff whistleblowers and state's attorneys alleged that pharmaceutical drug companies conspired to overcharge Medicaid programs for their pharmaceutical products. The plaintiffs alleged that drug companies fraudulently reported prices to drug pricing reporting agencies that were higher than their actual average wholesale prices. Medicaid programs use the average wholesale price calculated by the companies that report prices to determine the reimbursement to pharmacists for the medicines they provide to patients. Accordingly, the plaintiffs alleged, because the reported average wholesale price was inflated, the reimbursement to pharmacists and others who receive Medicaid payments was inflated as well.
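The arithmetic behind this allegation can be illustrated with a short sketch. It is deliberately simplified and hypothetical: it assumes reimbursement equals the reported price per unit, whereas actual Medicaid reimbursement formulas differ by program, and the claim records, drug identifiers, and prices are invented for illustration.

```python
# Hypothetical claims: (drug_id, units, reported_price, actual_average_price).
# All identifiers and figures are invented for illustration.
claims = [
    ("drug_a", 100, 12.00, 9.00),
    ("drug_a", 50, 12.00, 9.00),
    ("drug_b", 200, 5.00, 4.50),
]


def alleged_overpayment(rows):
    """Sum, per drug, the gap between reported and actual price times units.

    Simplifying assumption: reimbursement is paid at the reported price per
    unit, so the per-unit overpayment is the reported price minus the actual
    average price.
    """
    totals = {}
    for drug_id, units, reported_price, actual_price in rows:
        gap = units * (reported_price - actual_price)
        totals[drug_id] = totals.get(drug_id, 0.0) + gap
    return totals


overpayment = alleged_overpayment(claims)
total_overpayment = sum(overpayment.values())
print(overpayment, total_overpayment)
```

The calculation for a single claim is trivial. The difficulty in practice comes from scale, since the same comparison has to be carried out claim by claim across a program's full billing history.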
The analysis of the defendant's actions, and liability, and ultimately the calculation of any damages incurred as a result of the defendant's alleged actions requires the analysis of massive amounts of electronic data. Even in small states with relatively small Medicaid programs, the investigation into the plaintiff's allegations and calculation of economic damages requires the a...




