1.1 Introduction
This book will demonstrate that there is great potential for better availability, understanding, and use of data about the law itself and about legal institutions, which can be used to improve understanding and processes within the sector. Sources of this data are varied and include data created as part of processes within legal organizations, the documents that make up the law itself, and experimental data generated to explore legal aspects of human societies. This data, used creatively and carefully, can help accomplish many things, such as:
- Improving the profitability of a legal practice
- Delivering insights into the way decisions are made
- Conveying understanding of how the legal system affects different segments of society
- Developing computer applications that create value
There are significant benefits to using data to better understand processes and to support decision making. This is widely practiced in other disciplines and industries. Until recently this has been less the case in law, but this is starting to change. Improvements in the availability of suitable technologies and data make it easier to use these techniques to make more evidence-based decisions and gain insights into how law affects society and how to improve methodologies.
Data is not a panacea, but there is a great deal of room to improve things in law. The introduction of artificial intelligence will not end war or resolve all contract disputes, but ending a given percentage of unnecessary disputes is within reach (Walker 2019, 13). Laws define the rules that human beings use to navigate complex relationships at societal levels, and these relationships will always contain complexities that require the creativity and insight humans possess. In the immediate term, however, focusing on incremental improvements can bring considerable benefits, and in time the full potential of data-driven methodologies will become clear.
1.2 Why look at data?
One of the great stories of the last 250 years has been the increase in productivity in many industries and the accompanying improvements in living conditions for a large proportion of the world's population. However, these productivity gains have not been consistent across sectors, with fields like medicine, education, and law, which require extensive human inputs that cannot be automated, lagging. This has been followed in recent years by a narrative about how artificial intelligence will change the ways people work and live, and that knowledge workers, like lawyers, will have their jobs disrupted in the near future in the same way manufacturing workers have had their jobs disrupted in the last twenty to thirty years.
How this will happen, or whether it will happen at all, remains to be seen, but it is clear that these changes will rely on data to develop the proposed systems, and they will not be possible without it. All the applications being proposed and launched that are described as being based on artificial intelligence or algorithms could just as correctly be described as being based on data. Many of these systems are anticipated to run on machine learning. These are sophisticated applications which use extensive data to draw conclusions on different situations which are anticipated to follow the same patterns. How to get this data and how to know whether it is predictive of expected patterns in other situations is less certain.
Prior generations of artificial intelligence development in the 1980s were usually based on hard coding behaviors into systems, so that direct decisions could be made about how they would react in particular situations. This approach had issues with scale, because it could not use additional computing power without significant additional human input. In contrast, artificial intelligence applications now primarily use machine learning and complex statistical analyses based on ingesting enormous amounts of existing data, and the systems create the complexity needed primarily through computing power with considerably less human input. This is one of the shifts that makes artificial intelligence now seem more transformative than it did then. The ability to scale these applications creates the promise that artificial intelligence can deliver efficiency gains in knowledge-based work where in the past it could not achieve the required complexity.
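The contrast between the two approaches can be illustrated with a deliberately small sketch. Everything in it is hypothetical: the task (deciding whether a sentence looks like a lease clause), the function names, and the five training sentences are invented for illustration, and the word-counting "learner" is a toy stand-in, far simpler than the machine learning systems discussed here. The point is only that the first function changes when a person edits its rule, while the second changes when it is given more data.

```python
from collections import Counter

def rule_based_is_lease(text):
    """Hard-coded approach: a person wrote this rule by hand."""
    lowered = text.lower()
    return "landlord" in lowered or "tenant" in lowered

def train_word_scores(labelled_examples):
    """Data-driven approach: derive a score per word from labelled examples.

    A word's score is the number of lease examples it appears in minus the
    number of non-lease examples it appears in.
    """
    lease_counts, other_counts = Counter(), Counter()
    for text, is_lease in labelled_examples:
        words = set(text.lower().split())
        (lease_counts if is_lease else other_counts).update(words)
    vocabulary = set(lease_counts) | set(other_counts)
    return {word: lease_counts[word] - other_counts[word] for word in vocabulary}

def learned_is_lease(text, scores):
    """Classify by summing learned word scores; unseen words count as zero."""
    words = set(text.lower().split())
    return sum(scores.get(word, 0) for word in words) > 0

# Hypothetical training data, invented for this sketch.
examples = [
    ("the landlord shall maintain the premises", True),
    ("the tenant shall pay rent monthly", True),
    ("the lessee shall pay rent on the first day", True),
    ("the employee shall receive a salary", False),
    ("the buyer shall pay the purchase price", False),
]
scores = train_word_scores(examples)

clause = "the lessee must pay rent"
print(rule_based_is_lease(clause))       # False: the hand-written rule never mentions "lessee"
print(learned_is_lease(clause, scores))  # True: "lessee" and "rent" were seen in lease examples
```

Extending the hand-written rule to cover "lessee" requires a person to notice the gap and edit the code, whereas the learned version picked the word up from the examples it was given.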
In his book On Legal AI, Joshua Walker describes a “Cambrian explosion” of legal innovation happening now, but he argues that much of the publicity surrounding it is “marketing pablum” to fool demanding clients, and it is not based in substantial value or technical developments that warrant the hype (Walker 2019, 26). Whether his pessimism is warranted remains to be seen, but “artificial intelligence” is at least as much a marketing term as a technical one. That said, better quality data that is used in mathematically justified ways can significantly improve the quality of the resulting applications. Being sophisticated about what particular datasets can support makes clients better able to select products effectively. This is important because there are real needs for improvements in legal systems around the world, and contrary to the alarmist rhetoric some have used to discuss these changes, there is room for many stakeholders to benefit.
There are substantial unmet needs for legal services in society, and data-based innovation can generate business opportunities and ways to improve access to justice. Currently, clients ask lawyers for information and in many cases they get educated guesses on outcomes. These situations are high stakes for clients, and they need better data on probable outcomes (Walters 2018, 1–2). Even relatively small increases in the quality of information lawyers give clients lead to substantial improvements in the probable value that clients can anticipate (Sutherland 2017). A longitudinal report published in 2019 found that the difference between the business statistics for successful and unsuccessful firms is very small, but the compounding effects of a few percentage points add up to flourishing or failing practices over time (Clio 2019). Better use of data in decision making can generate these changes.
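The compounding effect can be made concrete with a short calculation. The figures below are hypothetical and are not taken from the Clio report: two firms start with the same revenue, indexed to 100, and grow at rates three percentage points apart for ten years.

```python
def compound(start, annual_growth, years):
    """Value after `years` of compounding at `annual_growth` (0.05 means 5%)."""
    return start * (1 + annual_growth) ** years

# Hypothetical firms with identical starting revenue (indexed to 100).
firm_a = compound(100.0, 0.05, 10)  # grows 5% a year
firm_b = compound(100.0, 0.02, 10)  # grows 2% a year

print(round(firm_a, 1), round(firm_b, 1))  # 162.9 121.9
```

Under these assumed rates, a gap of three points a year leaves the first firm roughly a third larger than the second after a decade.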
There are important aspects of the legal industry that make data applications more complicated than they are in other disciplines. The law and legal organizations have long histories; in many cases they outlive the jurisdictions that created them. For example, in Namibia there are acts that have origins in Dutch, English, South African, and Indigenous law, because the laws associated with different colonial and local governments were brought forward as time passed (Badeva-Bright 2021). On a smaller scale, many law firms continue in business over decades, and they need to make sure that the information they need from that whole period is available to them, from wills that were stored decades ago to the emails that may need to be included in e-discovery from last week.
Inclusion of that depth of history is imperative to handle the complexity of human communities and how they are governed. At the same time, law is ever changing, and any successful data-driven system will have to accommodate these changes. The passage of time in legal systems is essential to understand, and it makes the analysis of legal data different from the analysis of scientific data. It would be as if scientific methodology required researchers working in a contemporary lab to include a single body of data from past experiments going back centuries, one that is also continually updated.
Experts in law need to be involved in the development of projects from the beginning. It is common for software to be developed without the input of legal experts until the last stage when it is prohibitively expensive to make changes. In contrast, having legal input at the preliminary design stage makes integrating these considerations into systems comparatively cheap and easy (Walker 2019, 145).
1.3 Applications
There are so many possible ways for data to be used in legal scholarship and the legal industry that it is impossible to list them all here. The following sections are intended to give an outline of some possibilities to help give context to the coming chapters.
Business improvement
One of the most relevant and easiest ways to use legal data is to collect and use it to improve business metrics within law firms, legal departments, and other legal businesses. This data can be collected and analyzed using well-developed business techniques to improve performance in many ways. It can be used to:
- Manage financial performance
- Increase productivity
- Improve client retention and acquisition
- Quantify the impacts of particular units or practice areas
- Measure the contributions of particular lawyers beyond billable hours
This can be as simple as being more careful about how website analytics are interpreted, but it can also become quite complex. Happily, practice management tools are starting to include better data recording and analysis functionality, so this data is becoming easier to collect and use.
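Two of the measures listed above, the contribution of particular practice areas and client retention, can be sketched in a few lines. The record layout, client names, and fee amounts are all hypothetical, and real practice management tools will structure and export this data differently. The retention definition used here (the share of one year's clients who return the next year) is one assumption among several possible ones.

```python
from collections import defaultdict

# Hypothetical matter records: (client, practice_area, year, fees_billed)
matters = [
    ("Acme", "employment", 2022, 12000.0),
    ("Acme", "employment", 2023, 8000.0),
    ("Birch", "real estate", 2022, 5000.0),
    ("Cedar", "real estate", 2023, 7000.0),
    ("Dunn", "wills", 2022, 1500.0),
    ("Dunn", "wills", 2023, 2000.0),
]

def fees_by_practice_area(records):
    """Total fees billed per practice area."""
    totals = defaultdict(float)
    for _client, area, _year, fees in records:
        totals[area] += fees
    return dict(totals)

def retention_rate(records, from_year, to_year):
    """Share of clients with a matter in `from_year` who also have one in `to_year`."""
    earlier = {client for client, _area, year, _fees in records if year == from_year}
    later = {client for client, _area, year, _fees in records if year == to_year}
    if not earlier:
        return 0.0
    return len(earlier & later) / len(earlier)

print(fees_by_practice_area(matters))
print(retention_rate(matters, 2022, 2023))
```

With these invented records, two of the three 2022 clients return in 2023, so the retention rate is two thirds.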
Legal research
Legal research has changed a great deal in the last thirty years as it has become digitized, and the majority of it is now done online. Part of the reason this took so long is that there was so much historical data that needed to be digitized in order to make the transition viable. This is still not complete, and anyone who is doing in-depth historical legal research can expect to have to go to print volumes with some regularity, but for most questions online tools are sufficient now.
The big driver of methodological change until now has been search, with an ongoing shift from controlled vocabularies, taxonomies, and curated indices to more automated systems. Improved data techniques have certainly been instrumental in driving improvements in search and now machine learning is creating new generations of systems that integrate the data that years of use have generated.
Tools like machine learning are now taking this further. They are improving search by allowing for more flexible searches to be run from documents and longer texts rather than just from keywords. Search systems are getting better at delivering what researchers mean rather than just the results that follow from the direct terms entered. Now with systems like Google's search algorithm BERT, it is no longer necessary to include precise words that appear in results, as it is efficient at searching for concepts regardless of vocabulary (Arredondo 2021).
Machine learning is also being used to automatically generate value-added finding tools such as case summaries, classification, legal dictionaries, and annotated legislation. The ease of keyword searching has made these tools less popular with researchers in recent years, so it will be interesting to see how the social dynamics of adoption of these new tools change as they become more cost effective, but they can be anticipated to be a focus of development in the foreseeable future.
The next segment of the legal research market that will likely develop and be widely adopted is writing-assistance software, which will automatically generate document drafts based on inputs or suggest the next clauses that can be included in contracts. Such tools have existed for some time, but they will improve and become more accessible and cost effective.
Academic research
The possibilities for academic research driven by data analysis are substantial. There are many opportunities to explore techniques from other disciplines to understand the law and social aspects of legal systems. Researchers are particularly encouraged to increase their use of experimental methodologies such as randomized controlled testing to better quantify the impacts of different i...