Although the theme of this book is reinventing clinical decision support, we have not lost sight of the adage about not reinventing the wheel. There is much that can be done to reduce the number of diagnostic errors and to improve clinical decision making that does not require the latest innovations in artificial intelligence (AI), machine learning (ML), and data analytics. With that in mind, Chapter 1 will focus primarily on the basics of clinical reasoning, cognitive errors, and diagnostic errors and what can be done to remedy these errors with currently available technology and human intelligence.
Measuring Diagnostic Errors
Although the professional press has given much attention to patient safety in general, relatively little of it has focused on one of the most important aspects of patient safety, namely, diagnostic errors. A 2015 report from the National Academy of Medicine points out that about 5% of adult outpatients in the United States experience a diagnostic error annually.1 The same report found that diagnostic mishaps contribute to about 1 out of 10 patient deaths, cause as much as 17% of hospital adverse events, and affect approximately 12 million adult outpatients a year, which translates into 1 out of 20 US adults. About half of these errors may be harmful, according to Singh et al.2 Among the 850,000 patients who die in US hospitals annually, about 71,400 of these deaths involve a major diagnosis that had not been detected.
One reason why it has been difficult to reduce the number of diagnostic errors is that we have yet to find an accurate way to measure the problem, and without accurate metrics, there is no reliable way to determine if potential solutions are having a significant impact. Traditionally, we have relied on several metrics to estimate the incidence of diagnostic mistakes: medical records review, malpractice claims data, insurance claims, autopsies, reviews of diagnostic tests, reviews of medical imaging, clinician surveys, and patient surveys. Each has its strengths and weaknesses, and most are labor intensive.
Postmortem reviews. Autopsies can unearth diagnostic errors by detecting discrepancies with medical records and interviews with clinicians and families. Diagnostic errors that may impact patient outcomes, labeled as Class I errors, have been observed in 10% of autopsies. Class I and II errors, considered major errors, are estimated to occur in 1 out of 4 autopsies.3 Since autopsies are not randomly performed on the population as a whole but rather in special circumstances, some have suggested that the US Department of Health and Human Services fund more routine postmortem reviews to help the healthcare community obtain a more representative sample of patient deaths.
Medical records. The Harvard Medical Practice Study (1991), which examined more than 30,000 patient records, found diagnostic errors contributed to 17% of all identified adverse events, while an analysis of Colorado and Utah hospitals (2000) concluded that diagnostic errors caused 6.9% of adverse events.4,5 A more recent investigation in the Netherlands found diagnostic adverse events accounted for 6.4% of all adverse events reported in a hospital setting.6 When the researchers divided these errors into subcategories, they found about 96% had resulted from human failures. The primary causes of diagnostic adverse events were classified as "knowledge-based failures (physicians did not have sufficient knowledge or applied their knowledge incorrectly) and information transfer failures (physicians did not receive the most current updates about a patient)."
Malpractice claims. An analysis of 25 years of medical malpractice lawsuits gleaned from the National Practitioner Data Bank found that the most common reason for payment of a claim was a diagnostic error (28.6%).7 The same analysis concluded that such errors were far more likely to be linked to patients dying, when compared to other issues, including surgery, drugs, and treatment options. The National Academy of Medicine report also pointed out that about 70% of diagnostic error malpractice claims happened in an outpatient setting, but "inpatient diagnostic error claims were more likely to be associated with patient death."1 The Doctors Company's review of malpractice claims looked at 10 medical specialties and found that the share of claims related to diagnosis ranged from 9% in obstetrics to 61% in pediatrics. The most common disorders represented in malpractice claims included acute MI, cancer, appendicitis, and acute stroke.8
Health insurance claims. It is now possible to link insurer databases to federal death registries. These types of correlations have been used to detect potential diagnostic errors as they are related to congestive heart failure, 30-day hospital readmissions, and other expensive complications that are now of keen interest to the US government. One such analysis looked at patients admitted to the hospital with stroke who had been treated in the ED and released within the previous 30 days.9 More than 12% of the admissions may have been the result of a missed diagnosis, and 1.2% reflected "probable missed diagnoses."
Diagnostic testing. Reports on the frequency of laboratory test errors vary widely, but most agree that the pre- and post-analytic phases of lab testing are the most vulnerable to error. One analysis found 62% of errors occurred during the pre-analytic phase, 15% during the actual testing, and 23% during the post-analytic phase.10 Test follow-up is also an issue that contributes to diagnostic errors, with failure rates as high as 23% among hospital patients and 16.5% in the ED.11
Physician surveys. A survey of nearly 600 physicians found that diagnostic errors were most likely to occur in pulmonary embolism, cancer, drug reactions, stroke, and acute coronary syndrome.12 An independent survey found that more than a third of physicians had either experienced a diagnostic error themselves or observed one in a family member.13 It is probably obvious to most readers that surveys are not the most reliable or accurate way to estimate the frequency of diagnostic errors since they are subject to many biases.
Patient surveys. A 1997 survey from the National Patient Safety Foundation found that about 1 out of 6 patients (16.6%) reported a diagnostic error, either happening to themselves or a close friend or relative.14 A more recent survey found that 23% of respondents said they or someone close to them had experienced a medical error, and about half of these errors were labeled diagnostic mistakes.15
As all these metrics have shortcomings and require considerable resources to implement, there has been a growing movement to enlist AI-enhanced tools to supplement or even replace them. Ava Liberman from the Department of Neurology at Albert Einstein College of Medicine and David Newman-Toker from Johns Hopkins have developed an AI system that has the potential to replace these legacy approaches to diagnostic error tracking.16
Liberman and Newman-Toker's approach uses well-documented symptom/disease pairs that have been shown to occur together during diagnostic mishaps. The Symptom-Disease Pair Analysis of Diagnostic Error or SPADE relies on readily available administrative and clinical data from electronic health records (EHRs), billing, and insurance claims to measure the rate at which seemingly benign ED diagnoses are followed up in a short period of time by rehospitalization for a much more serious diagnosis that apparently was missed during the initial patient presentation. For example, dizziness in the ED is sometimes mistakenly attributed to an inner ear infection when in fact its root cause is cerebral ischemia and stroke. As Liberman and Newman-Toker point out: "With untreated TIA [transient ischemic attack] and minor stroke, there is a marked increased short-term risk of major stroke in the subsequent 30 days that tapers off by 90 days. A clinically relevant and statistically significant temporal association between ED discharge for supposedly 'benign' vertigo followed by a stroke diagnosis within 30 days is therefore a biologically plausible marker of diagnostic error. If this missed diagnosis of cerebral ischaemia resulted in a clinically meaningful adverse health outcome (e.g., stroke hospitalisation), this would suggest misdiagnosis-related harm."16
For a health system to implement the SPADE approach, it must have access to a large data set of patient information that includes 2 specific points in time for each patient: the initial diagnosis and when it was given, and the final diagnosis and its timing. It is also important to have established a "clinically relevant and statistically significant temporal association" between the 2 events. To establish the symptom/disease pairs worth considering as part of a diagnostic error metric, Liberman and Newman-Toker used look-back and look-forward analyses. The look-back analysis starts with a specific disease and looks back to determine which symptomatic presentations are most likely to be missed. The look-forward analysis starts with a symptom in the patient population and determines which diseases are most likely to be missed. Additional symptom/disease pairs that are credible candidates for this metrics system include headache/aneurysm, chest pain/myocardial infarction, and fainting/pulmonary embolism.
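The look-forward idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Visit` record, the free-text diagnosis labels, the `look_forward_rate` function, and the sample data are all hypothetical, and a real analysis would use coded diagnoses and would also test whether the temporal association is statistically significant, which this sketch does not attempt.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Visit:
    patient_id: str
    visit_date: date
    diagnosis: str  # simplified label; real data would use coded diagnoses
    setting: str    # "ED_discharge" or "admission" (hypothetical values)


def look_forward_rate(visits, index_dx, outcome_dx, window_days=30):
    """Count index ED discharges followed by an outcome admission in the window.

    Returns (flagged_index_visits, total_index_visits).
    """
    index_visits = [v for v in visits
                    if v.setting == "ED_discharge" and v.diagnosis == index_dx]
    outcomes = [v for v in visits
                if v.setting == "admission" and v.diagnosis == outcome_dx]
    flagged = 0
    for iv in index_visits:
        # An index visit is flagged if the same patient has an outcome
        # admission between 0 and window_days after the ED discharge.
        if any(o.patient_id == iv.patient_id
               and 0 <= (o.visit_date - iv.visit_date).days <= window_days
               for o in outcomes):
            flagged += 1
    return flagged, len(index_visits)


# Made-up records for illustration only.
sample_visits = [
    Visit("A", date(2020, 1, 1), "benign vertigo", "ED_discharge"),
    Visit("A", date(2020, 1, 15), "stroke", "admission"),   # 14 days later
    Visit("B", date(2020, 1, 1), "benign vertigo", "ED_discharge"),
    Visit("B", date(2020, 3, 15), "stroke", "admission"),   # 74 days later
    Visit("C", date(2020, 1, 1), "benign vertigo", "ED_discharge"),
]

flagged, total = look_forward_rate(sample_visits, "benign vertigo", "stroke")
print(f"{flagged} of {total} index visits flagged")
```

A look-back analysis would reverse the direction: start from the outcome admissions and search the preceding window for ED discharges.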
How large should the data set be for this approach to work? At least 5,000 to 50,000 visits are needed, which would generate about 50 to 100 diagnostic error outcome events. This estimate is based on previous research that found misdiagnosis harm rates of about 0.2% to 2%.
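The arithmetic behind these figures can be checked with a short Python sketch. It assumes simple proportional scaling (expected events = visits x harm rate), which is an assumption of this example rather than a method described in the source; the function names are hypothetical. Rates are expressed as events per 1,000 visits so that the calculation stays in whole numbers: 0.2% is 2 per 1,000 and 2% is 20 per 1,000.

```python
def visits_needed(target_events, rate_per_thousand):
    """Smallest number of visits expected to yield target_events.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    return -(-target_events * 1000 // rate_per_thousand)


def expected_events(visits, rate_per_thousand):
    """Expected number of outcome events under proportional scaling."""
    return visits * rate_per_thousand / 1000


# 100 events at a 2% harm rate versus a 0.2% harm rate
print(visits_needed(100, 20))  # 5000
print(visits_needed(100, 2))   # 50000
```

Under this assumption, the 5,000 and 50,000 visit figures correspond to 100 outcome events at harm rates of 2% and 0.2%, respectively.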
One weak link in the SPADE model is the out-of-network patient. If a significant number of patients with the initial benign diagnosis return to a different health system when they experience the more serious outcome disease, that would skew the results. One study, for instance, suggested that during a 1-year period, 25% of patients crossed over to another unaffiliated treatment facility. Thus, the model is most likely to yield an accurate estimate of diagnostic errors when the data is drawn either from a regional health information exchange or from a health system that has a built-in insurance plan that tracks patients who decide to use facilities outside the one that recorded the index diagnosis.
The SPADE approach is also not well suited to detect diagnostic errors involving many chronic diseases. For example, the emergence of diabetes or hypertension may appear slowly over time, making it difficult to detect a diagnostic error using the symptom/disease pairing discussed above. Similarly, certain disorders with complex presentations may not be easily tracked with SPADE. As Liberman and Newman-Toker point out: "For diseases with a sub-acute time course presenting non-specific symptoms (e.g., tuberculosis and cancer), a more complex analytical approach is required. For example, it might be necessary to bundle symptoms and combine with visit/test-ordering patterns over time (e.g., increased odds of general practitioner visits for new complaints/tests in the 6 months before a cancer diagnosis)."16
There may be other ways to measure diagnostic errors besides symptom/disease dyads, including EHR triggers. With the assistance of data mining, it is possible to identify patient records that include clinical findings that suggest the need for diagnostic testing and to track follow-up on these signposts to determine if they have in fact been acted upon by clinicians. A delayed diagnosis is one of 4 common types of diagnostic error; the others are missed diagnosis, misdiagnosis (an incorrectly diagnosed disease), and overdiagnosis.
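A trigger of this kind can be sketched in a few lines of Python. This is an illustration only, not any published algorithm: the `Finding` record, the `missed_followups` function, the finding labels, and the 60-day follow-up window are all hypothetical assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Finding:
    patient_id: str
    finding: str                 # hypothetical label, e.g. "abnormal_PSA"
    found_on: date
    followup_on: Optional[date]  # date of follow-up testing, None if none recorded


def missed_followups(findings, as_of, max_delay_days=60):
    """Return findings whose follow-up was late or is overdue as of `as_of`.

    max_delay_days is an assumed threshold, not a clinical standard.
    """
    limit = timedelta(days=max_delay_days)
    flagged = []
    for f in findings:
        deadline = f.found_on + limit
        if f.followup_on is not None and f.followup_on <= deadline:
            continue  # follow-up happened in time
        if f.followup_on is None and as_of <= deadline:
            continue  # still inside the follow-up window
        flagged.append(f)  # delayed follow-up, or none after the deadline
    return flagged


# Made-up records for illustration only.
sample_findings = [
    Finding("p1", "abnormal_PSA", date(2020, 1, 1), date(2020, 1, 20)),
    Finding("p2", "positive_FOBT", date(2020, 1, 1), date(2020, 5, 1)),
    Finding("p3", "iron_deficiency_anemia", date(2020, 1, 1), None),
    Finding("p4", "abnormal_PSA", date(2020, 5, 15), None),
]

overdue = missed_followups(sample_findings, as_of=date(2020, 6, 1))
print([f.patient_id for f in overdue])
```

Records flagged this way are candidates for manual review; a trigger identifies a possible delay, not a confirmed diagnostic error.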
To demonstrate the value of such EHR triggers, Daniel Murphy with the Michael DeBakey VA Medical Center in Houston, Texas, and his colleagues analyzed nearly 300,000 patient records to look for patient demographics and abnormal clinical findings that would usually warrant a recommendation for follow-up diagnostic testing.17 The algorithms scanned the data repositories of 2 large integrated health systems for 4 diagnostic clues: abnormal prostate-specific antigen (PSA), positive fecal occult blood test (FOBT) results, the existence of iron deficiency anemia, and the passage of fresh blood in the stool or from the anus, called hematochezia.
The algorithm found 1,564 trigger-positive patients for these 4 diagnostic clues. Further analysis concluded that: "Use of all four triggers at the study sites could detect an estimated 1048 instances of delayed or missed follow-up of abnormal findings annually and 47 high-grade cancers." The analysis suggests that many patients fall through the cracks, for a variety of reasons, and setting up a better reminder system to a...