CHAPTER 1
THE OPPOSITE OF CERTAINTY
During the past century, research in the medical, social, and economic sciences has led to major improvements in longevity and living conditions. Statistical methods grounded in the mathematics of probability have played a major role in much of this progress. Our confidence in these quantitative tools has grown, along with our ability to wield them with great proficiency. We have an enormous investment of tangible and intellectual capital in scientific research that is predicated on this framework. We assume that the statistical methods as applied in the past so successfully will continue to be productive. Yet, something is amiss.
New findings often contradict previously accepted theories. Faith in the ability of science to provide reliable answers is being steadily eroded, as expert opinion on many critical issues flip-flops. Scientists in some fields seriously debate whether a majority of their published research findings are ultimately overturned1; the term "decline effect" has been coined to describe how even strongly positive results often fade over time in the light of subsequent study2; revelations of errors in the findings published in prestigious scientific journals, and even fraud, are becoming more common.3 Instead of achieving greater certainty, we seem to be moving backwards. What is going on?
Consider efforts to help disadvantaged children through early childhood educational intervention. Beginning around 1970, the U.S. government sponsored several major programs to help overcome social and economic disadvantage. The most famous of these, Project Head Start, aimed to close the perceived gap in cognitive development between richer and poorer children that was already evident in kindergarten. The aims of this program were admirable and the rationale compelling. However, policy debates about the efficacy and cost of this initiative have gone on for four decades, with no resolution in sight. Research on the impact of Head Start has been extensive and costly, but answers are few and equivocal.
Medical research is often held up as the paragon of statistical research methodology. Evidence-based medicine, based on randomized clinical trials, can provide proof of the effectiveness and safety of various drugs and other therapies. But cracks are appearing even in this apparently solid foundation. Low-dose aspirin for prevention of heart attacks was gospel for years but is now being questioned. Perhaps the benefits are smaller, and the risks greater, than we previously believed. Hormone replacement therapy for postmenopausal women was considered almost miraculous until a decade ago, when a landmark study overturned previous findings. Not a year goes by without some new recommendation regarding whether, how, and by whom hormone replacement should be used.
These are not isolated instances. The ideal of science is an evolution of useful theory coupled with improved practice, as new research builds upon and refines previous findings. Each individual study should be a piece of a larger puzzle to which it contributes. Instead, research in the biomedical and social sciences is rarely cumulative, and each research paper tends to stand alone. We fill millions of pages in scientific journals with "statistically significant" results that add little to our store of practical knowledge and often cannot be replicated. Practitioners, whose clinical judgment should be informed by hard data, gain little that is truly useful to them.
TWO DEAD ENDS
If I am correct in observing that scientific research has contributed so little to our understanding of "what works" in areas like education, health care, and economic development, it is important to ask why this is the case. I believe that much of the problem lies with our research methodology. At one end of the spectrum, we have what can be called the quantitative approach, grounded in modern probability-based statistical methods. At the other extreme are researchers who support a radically different paradigm, one that is primarily qualitative and more subjective. This school of thought emphasizes the use of case studies and in-depth participatory observation to understand the dynamics of complex causal processes.
Both statistical and qualitative approaches have important contributions to make. However, researchers in either of these traditions tend to view those in the other with suspicion, like warriors in two opposing camps peering across a great divide. Nowadays, the statistical types dominate, because methods based on probability and statistics virtually define our standard of what is deemed "scientific." The perspective of qualitative researchers is much closer to that of clinicians but lacks the authority that the objectivity of statistics seems to provide.
Sadly, each side in this fruitless debate is stuck in a mindset that is too restricted to address the kinds of problems we face. Conventional statistical methods make it difficult to think seriously about causal processes underlying observable data. Qualitative researchers, on the other hand, tend to underestimate the value of statistical generalizations based on patterns of data. One approach willfully ignores all salient distinctions among individuals, while the other drowns in infinite complexity.
The resulting intellectual gridlock is especially unfortunate as we enter an era in which the potential to organize and analyze data is expanding exponentially. We already have the ability to assemble databases in ways that could not even be imagined when the modern statistical paradigm was formulated. Innovative statistical analyses that transcend twentieth-century data limitations are possible if we can summon the will and imagination to fully embrace the opportunities presented by new technology.
Unfortunately, as statistical methodology has matured, it has grown more timid. For many, the concept of scientific method has been restricted to a narrow range of approved techniques, often applied mechanically. The result is to limit the scope of individual creativity and inspiration in a futile attempt to attain virtual certainty. Already in 1962, the iconoclastic statistical genius John Tukey counseled that data analysts "must be willing to err moderately often in order that inadequate evidence shall more often suggest the right answer."4
Instead, to achieve an illusory pseudo-certainty, we dutifully perform the ritual of computing a significance level or confidence interval, having forgotten the original purposes and assumptions underlying such techniques. This "technology" for interpreting evidence and generating conclusions has come to replace expert judgment to a large extent. Scientists no longer trust their own intuition and judgment enough to risk modest failure in the quest for great success. As a result, we are raising a generation of young researchers who are highly adept technically but have, in many cases, forgotten how to think for themselves.
ANALYTICAL ENGINES
The dream of "automating" the human sciences by substituting calculation for intuition arose about two centuries ago. Adolphe Quetelet's famous treatise on his statistically based "social physics" was published in 1835, and Siméon Poisson's masterwork on probability theory and judgments in civil and criminal matters appeared in 1837.5, 6 It is perhaps not coincidental that in 1834 Charles Babbage first began to design a mechanical computer, which he called an analytical engine.7 Optimism about the potential ability of mathematical analysis, and especially the theory of probability, to resolve various medical, social, and economic problems was at its zenith.
Shortly after this historical moment, the tide turned. The attempt to supplant human judgment by automated procedures was criticized as hopelessly naïve. Reliance on mathematical probability and statistical methods to deal with such subtle issues went out of favor. The philosopher John Stuart Mill termed such uses of mathematical probability "the real opprobrium of mathematics."8 The famous physiologist Claude Bernard objected that "statistics teach absolutely nothing about the mode of action of medicine nor the mechanics of cure" in any particular patient.9 Probability was again relegated to a modest supporting role, suitable for augmenting our reasoning. Acquiring and evaluating relevant information, and reaching final conclusions and decisions remained human prerogatives.
Early in the twentieth century, the balance between judgment and calculation began to shift once again. Gradually, mathematical probability and statistical methods based on it came to be regarded as more objective, reliable, and generally "scientific" than human theorizing and subjective weighing of evidence. Supported by rapidly developing computational capabilities, probability and statistics were increasingly viewed as methods to generate definitive solutions and decisions. Conversely, human intuition came to be seen as an outmoded and flawed aspect of scientific investigation.
Instead of serving as an adjunct to scientific reasoning, statistical methods today are widely perceived as a corrective to the many cognitive biases that often lead us astray. In particular, our naïve tendencies to misinterpret and overreact to limited data must be countered by a better understanding of probability and statistics. Thus, the genie that was put back in the bottle after 1837 has emerged in a new and more sophisticated guise. Poisson's ambition of rationalizing such activities as medical research and social policy development is alive and well. Mathematical probability, implemented by modern analytical engines, is widely perceived to be capable of providing scientific evidence-based answers to guide us in such matters.
Regrettably, modern science has bought into the misconception that probability and statistics can arbitrate truth. Evidence that is "tainted" by personal intuition and judgment is often denigrated as merely descriptive or "anecdotal." This radical change in perspective has come about because probability appears capable of objectively quantifying our uncertainty in the same unambiguous way as measurement techniques in the...