INTRODUCTION
Current Approaches
We wish to air some of the more important practical considerations around making data available for data-driven usage. This could be for static, offline studies or for operationalized, online reviews. We introduce the concept of data engineering (how to engineer data for fit-for-purpose use outside the domain applications), and we take the reader from the first baby steps in getting started through to thoughts on highly operationalized data analysis.
A geoscience team uses an extensive collection of methods, tools, and datasets to achieve scientific understanding. The data ranges from voluminous pre-stack seismic to single-point measurements of rock lithology in an outcrop. Modeling approaches are constrained by:
- Size and scarcity of data
- Computational complexity
- Time available to achieve a "good enough" solution
- Cloud computing
- Budget
- Workflow lubrication
It is this last constraint that has proven the largest inhibitor to the emergence of a data-driven approach in exploration and production (E&P). The term describes the ease with which data and insight move from one piece of software to another.
These constraints have led to a brittle digital infrastructure. This is problematic not only within the individual geoscientific silos but also across the wider E&P domain. Because the current hardware and software stacks have evolved symbiotically, we can end up excluding a rich array of data types and restricting innovative methodologies. The application-centric landscape undermines E&P solutions that strive to integrate multidimensional and multivariate datasets.
It was not meant to be this way. Back when it all began, it was acceptable for decisions to be made in an expert's head. High-performance computers (HPCs) were power tools that gave the expert better images or more robust simulations, but at the end of the workflow, all that number crunching led to a human decision based on the experience of that individual and his or her team of peers. Today, too much rides on this approach.
So, how do we become data-driven if it is hard to get at the data?
Is There a Crisis in Geophysical and Petrophysical Analysis?
There is a movement to adopt data-driven analytical workflows across the industry, particularly in E&P. However, an existing group of Luddites offers not constructive criticism but deliberate and subversive rhetoric intended to undermine the inevitable implementation of data-driven analytics in the industry. It is true that data scientists sometimes lack robust experimental data. The critics ask: How certain are we that we can quantify uncertainties? How can we understand the phenomena that manifest themselves in the real world, in the hydrocarbon reservoirs? They argue that without concrete experimental evidence, theory risks retreating into metaphysics. In their view, predictive and prescriptive models are merely a source of philosophical discourse, tantamount to solving the problem of how many leprechauns live at the end of our garden. Science is not philosophy. Thus, without recourse to experiment, geoscientists play in the realm of pure speculation and march to the metaphysical drumbeat of ancient philosophers. The slide into metaphysics is not always obvious; the language of perplexing mathematical algorithms can mask it. Theoretical physics, especially quantum physics, and the theories that underpin the geosciences and E&P engineering disciplines can be jam-packed with opaque, impermeable, thorny mathematical structures. The Luddites, looking over soft computing techniques and data-driven workflows, are misled into believing that only high mathematics and classical physical laws can deliver rigor, a wisdom of the absolute, the lucidity of the distinction between right and wrong. No doubt there is rigor. But the answers we get depend so much on the questions we ask and the way we ask them. Additionally, first principles can be applied incorrectly, leaving the business problem unresolved for the engineers asking the questions.
So, there is no crisis unless we wish to create one. The marriage between traditional deterministic interpretation and data-driven deep learning and data mining is a union that, when established on the grounds of mutual recognition, addresses a wide range of business issues.
Applying an Analytical Approach
The premise of this book is to demonstrate the value of taking a data-driven approach. Put simply, if the data could speak for itself, what would you learn beyond what your current applications can tell you?
In the first place, the experience of many other industries is that statistical context can be established. This could involve testing the validity of a scientific assumption (for example, whether water flood or overburden compaction is the cause of a 4D velocity change), or it could mean demonstrating whether a set of observations is mainstream or anomalous when viewed at the formation, basin, or analog scale.
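As a minimal sketch of the second kind of test, assuming we simply want to flag whether one measurement is typical of a reference population at each scale, the snippet below computes a robust z-score (median and median absolute deviation, so the reference populations are not themselves distorted by a few extreme values). The function names, the 3.5 cutoff (a common rule of thumb for robust z-scores), and the porosity values are all hypothetical and chosen only for illustration.

```python
from statistics import median


def robust_z(value, population):
    """Robust z-score of `value` against `population` (median / MAD).

    The 1.4826 factor scales the MAD so that it estimates the standard
    deviation when the population is normally distributed.
    """
    med = median(population)
    mad = median([abs(x - med) for x in population])
    if mad == 0:
        raise ValueError("population has zero spread (MAD == 0)")
    return (value - med) / (1.4826 * mad)


def classify(value, populations, threshold=3.5):
    """Label `value` as 'mainstream' or 'outlier' at each named scale.

    `populations` maps a scale name (e.g. 'formation', 'basin', 'analog')
    to the reference measurements at that scale. The threshold is an
    assumed rule-of-thumb cutoff, not a domain standard.
    """
    return {
        scale: "outlier" if abs(robust_z(value, pop)) > threshold else "mainstream"
        for scale, pop in populations.items()
    }


if __name__ == "__main__":
    # Hypothetical porosity fractions, invented for illustration only.
    reference = {
        "formation": [0.18, 0.19, 0.20, 0.21, 0.22],
        "basin": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
    }
    print(classify(0.28, reference))
    # → {'formation': 'outlier', 'basin': 'mainstream'}
```

The same observation can be anomalous within its formation yet unremarkable at basin scale, which is exactly the kind of statistical context described above.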
The current crop of applications:
- Lack the computational platform for scale-out analysis
- Can only consume and analyze data for which they have an input filter
- Can use only the algorithms available in their code base or exposed via their application programming interfaces (APIs)
We discuss in greater detail later how to get G&G (geological and geophysical) data into a usable format, but first let us set out a vision of what could be plausible, and this takes us into the world of analytics.
What Are Analytics and Data Science?
Analytics is a term that has suffered from overuse. It means many things in many industries and disciplines but is almost universally accepted to mean the mathematical and statistical analysis of data for patterns or relationships.
We use this term in customer- and transaction-rich industries, as well as in domains where businesses operate on the thinnest of margins. In the UK in the 1950s, the Lyons Tea Company implemented what we now recognize as centralized business intelligence: a digital computer that performed analytics across its empire-wide supply chain of thousands of teashops and hundreds of bakeries. Its business analytics grew from the company's ability to understand and articulate its business processes in terms of a data model: a description of the relationships between entities such as customers and inventory items. The team that built this system (called LEO) went on to create similar platforms for other organizations and even sold computing capacity. This presaged the central mainframes of IBM by a decade, the supply chains of Starbucks by four decades, and the cooperation/competition of computing resources pioneered by Amazon. This history is well documented (Ferry, G., 2010, "A Computer Called LEO") and is worth bearing in mind as we consider how the paradigm applies to the geoscientific domain.
Let us fast-forward to the late 1990s and the evolution of the Internet beyond its academic and military homelands. Data could be collected from across an organization and transmitted into, around, and beyond its conventional boundaries. This left businesses with no technical reason not to emulate Lyons's example of 40 years before, and those that could exploit the ability to process and assimilate their data for business impact pulled ahead of those that proved unwilling or unable to embrace this technical potential. Davenport's "Competing on Analytics" is a mesmerizing overview of this dynamic period in business history (Davenport, Harris, 2007).
Alongside the ability to move data around using well-designed and well-implemented protocols (that is, via the Internet), organizations gained new data, generated by:
- Interactions between people and organizations via interfaces such as point-of-sale terminals or ATMs
- Communications between individuals and agencies via web-based services
- The capture of data along a supply chain as goods and materials (or, in the travel and hospitality industries, people) moved around a complex system
Data arising from a transaction could be captured trivially at sufficient quality and richness to enable statistical insight to be gained, often in real time, as when assessing the likelihood that someone other than a bank card's owner is using it at a given location and time.
Analytics is enabled by the integration and contextualization of diverse data types. Moreover, it is predicated on timely access to reliable, granular data. If we look to the downstream domains of our industry, this would mean real-time access to data about refinery operations and productivity, passed through to trading desks so that capacity can be provisioned against spot pricing options.
The economic luxury of $100 oil insulated much of the upstream domain from adopting this type of integration. With the growth of factory-style drilling for unconventional plays, development and lifting costs became a major component of the economics. Since 2014, it has become less unusual (but still not mainstream) for drilling engineers to be guided in their quest for best practices by analytical dashboards that combine petrophysical, technical, and operational data in statistical models. Engineers can use such guidance to characterize the likelihood of bit failure or stuck pipe under given geological and operational parameters.
The big surprise from working on such projects is not the willingness of rough-necked senior drillers to embrace such an approach (money, especially saved costs, always talks), but rather that the data types in question could be brought together and used in such a manner. This combined an approach that used to be called data mining (it is still an appropriate term but is now deeply unfashionable) with soft computing techniques, which currently fall under the definition of data science.
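To make "characterize likelihoods under given parameters" concrete, here is a deliberately simple sketch: an empirical event rate per combination of conditions, with additive (Laplace) smoothing so that sparsely sampled conditions are not reported as certain success or failure. This is our illustrative substitute, not the model behind any particular dashboard; the record schema (`lithology`, `mud_weight_band`, `stuck_pipe`) and the example records are hypothetical.

```python
from collections import defaultdict


def event_rates(records, keys, event="stuck_pipe", alpha=1.0):
    """Estimate P(event | condition) for every observed combination of `keys`.

    records: iterable of dicts, one per drilled section (hypothetical schema).
    keys:    field names that define the geological/operational condition.
    alpha:   additive smoothing; an unseen outcome still gets some probability.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [events, total]
    for rec in records:
        condition = tuple(rec[k] for k in keys)
        counts[condition][0] += int(bool(rec[event]))
        counts[condition][1] += 1
    # Smoothed rate: (events + alpha) / (total + 2 * alpha) for a yes/no event.
    return {
        condition: (events + alpha) / (total + 2 * alpha)
        for condition, (events, total) in counts.items()
    }


if __name__ == "__main__":
    # Invented records for illustration only.
    history = [
        {"lithology": "shale", "mud_weight_band": "high", "stuck_pipe": True},
        {"lithology": "shale", "mud_weight_band": "high", "stuck_pipe": True},
        {"lithology": "shale", "mud_weight_band": "high", "stuck_pipe": False},
        {"lithology": "sandstone", "mud_weight_band": "low", "stuck_pipe": False},
        {"lithology": "sandstone", "mud_weight_band": "low", "stuck_pipe": False},
    ]
    for condition, p in event_rates(history, ["lithology", "mud_weight_band"]).items():
        print(condition, round(p, 2))
```

With these toy records, shale drilled with a high mud weight gets a smoothed stuck-pipe rate of 3/5 = 0.6, and sandstone with a low mud weight gets 1/4 = 0.25. In practice a regression or tree-based model would replace the lookup table once the number of conditioning variables grows beyond what the data can populate.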
To a dyed-in-the-wool data miner (and probably a senior drilling engineer), data science is one of those unpleasant necessities of modern life (so it's probably an age-related t...