Panning for Positives
WHEN ARE SCREENING TESTS WORTH THE RISKS?
The New Physician
Pyrite, also known as “fool’s gold,” is a very attractive rock. New World explorers sometimes loaded their ships with it, returning to Europe with false hopes and worthless cargo. If you toss your pan into any old stream, you’re more likely to find pyrite than real gold. But the experienced prospector knows that, expects that, and knows how to tell the difference.
There’s a lot of pyrite in medicine too. It’s the misleading test result—the one that threatens to send us down the wrong path. And like the prospector, we find it most often when we pan the streams where gold is especially rare. But if you know how to tell the difference, you won’t be fooled—or at least you’ll be properly suspicious when you find a bright, shiny rock.
You must understand such subtleties of testing to make sense of the screening debate surrounding breast and prostate cancer, HIV and illicit drug use. Unfortunately, this topic is often taught with equations that don’t give you an intuitive sense of what is going on in this most unintuitive of topics. So let’s skip the equations this time, briefly lay the groundwork, and then go directly to an example that I hope will make screening tests clearer.
Introduction to Testing. Physicians use medical tests to help classify their individual patients. If the test result is X, the patient is healthy. If the result is Y, the patient is diseased. To really understand testing, however, we need some altitude.
Looking down from 10,000 feet, we no longer have individual patients, but populations of patients, diseased and healthy. The ideal test would allow us to perfectly separate the diseased patients from the healthy ones. But tests aren’t perfect. The test results of the healthy people and the diseased people usually overlap (see Figure 1).
This means the “normal range” for a test is actually a compromise designed to balance the consequences of mislabeling healthy people as abnormal (“false positives”) with the consequences of missing people who really have the disease (“false negatives”). Incidentally, the other patients are either “true negative” (healthy and tested negative) or “true positive” (diseased and tested positive).
The ability of a test to separate the healthy and diseased groups is generally described by two somewhat confusing terms: “sensitivity” and “specificity.”
Sensitivity, or the True Positive Rate, describes the ability of the test to correctly classify the diseased population. If you have 100 diseased patients and the test sensitivity is .95, the test will be positive in 95 of them. The other five patients are “false negatives.”
Specificity, or the True Negative Rate, does the same for the healthy population. If you have 100 healthy patients, and the specificity is .97, the test will correctly classify 97 of them as normal. The other three will be “false positives.”
In Figure 1, notice how the sensitivity and specificity can be changed by moving the line that marks the edge of the “normal range.” Moving it left reduces the number of false negatives and will increase your sensitivity. But also notice that by doing this, you’ll increase the number of false positives and reduce your specificity. Moving it right reduces the number of false positives, but increases the false negatives. Whether it’s better to reduce false negatives or false positives depends upon the situation.
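For readers who like to see the numbers move, here is a minimal sketch of that trade-off. It is an illustration only, not a model of any real test: it assumes healthy and diseased test values follow two overlapping bell curves (the means of 100 and 130 and the spread of 10 are invented), and that values above the cutoff count as positive, which matches the description of Figure 1. The function name `rates` is hypothetical.

```python
from statistics import NormalDist

# Hypothetical distributions of test values (assumed for illustration only).
healthy = NormalDist(mu=100, sigma=10)
diseased = NormalDist(mu=130, sigma=10)


def rates(cutoff):
    """Return (sensitivity, specificity) when values above cutoff are positive."""
    sensitivity = 1 - diseased.cdf(cutoff)  # diseased correctly called positive
    specificity = healthy.cdf(cutoff)       # healthy correctly called negative
    return sensitivity, specificity


# Sliding the cutoff right trades sensitivity for specificity.
for cutoff in (110, 115, 120):
    sens, spec = rates(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Lower cutoffs catch more of the diseased group at the price of flagging more healthy people; higher cutoffs do the reverse.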
The discussion thus far has only focused on the characteristics of the test itself. Don’t forget that there are many other places where things can go wrong. The sample can be improperly drawn, labeled, transported, stored, processed or reported. Usually the system works, but mistakes can happen.
The Perils of Screening. Here’s an example of what could go wrong. A nervous Caucasian couple and their apparently healthy newborn arrive in your office and say, “Our best friends’ new baby was just diagnosed with cystic fibrosis [CF]. We understand you can do a cheap screening test to ensure our baby is OK. Would you please do it?”
You calmly explain that the baby seems fine. And since there is no family history of CF, their baby doesn’t need the test. The parents plead some more and convince you to order the test. The result comes back “positive.” What is the likelihood the baby actually has CF?
The prevalence of CF in the Caucasian population is 1 in 3,400. The CF screening test has a sensitivity of .85 and a specificity of .9985. You can plug these values into an equation, but you won’t really understand the answer when you’re done. So instead, let’s use Figure 2, the “2 x 2” table.
Pick a convenient number of newborns for this hypothetical experiment. How about 34,000? Plug that number into the lower right corner of the table. Then use the prevalence to fill in the bottom totals (divide 34,000 by 3,400). In your sample of 34,000 kids, 10 will have CF, and 33,990 are healthy.
Next, use the sensitivity to fill in the “Diseased” column. To find the number of diseased newborns testing positive (true positives), multiply the sensitivity by the total number of diseased newborns in this example (.85 x 10 = 8.5). Subtract this figure from the total number of diseased patients (10 – 8.5 = 1.5). This number (1.5) represents the number of false negatives.
Use the specificity to complete the “Healthy” column. Put the number of healthy children who test negative for CF (.9985 x 33,990, or about 33,939) in the true negative box. Subtract this figure from the total number of healthy children (33,990 – 33,939 = 51). This gives you the number of false positives (51). Total the rows: 59.5 positives and 33,940.5 negatives.
Now you can answer the question: What is the likelihood that the baby really has CF? Or, in statistical language, what is the “predictive value of a positive test?” Simply look at the number of true positives (8.5) and divide that by the total number of positives (59.5). The likelihood is 14 percent.
To figure out what happened, look at the table. The low prevalence of CF in Caucasians caused the number of false positives (51) to greatly outnumber the true positives (8.5), even though the specificity was extremely high (.9985).
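The same bookkeeping can be written as a short script. This is a sketch of the 2 x 2 walk-through above, using the prevalence, sensitivity and specificity quoted in the text; the function name `two_by_two` and the dictionary keys are made up for the example. Because the script does not round along the way, the false-positive count comes out just under 51, and the predictive value is still about 14 percent.

```python
def two_by_two(population, prevalence, sensitivity, specificity):
    """Fill in the 2 x 2 table and return its cells as a dict."""
    diseased = population * prevalence      # bottom total, "Diseased" column
    healthy = population - diseased         # bottom total, "Healthy" column
    true_pos = sensitivity * diseased
    false_neg = diseased - true_pos
    true_neg = specificity * healthy
    false_pos = healthy - true_neg
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
        # Predictive value of a positive test: true positives / all positives.
        "ppv": true_pos / (true_pos + false_pos),
    }


# Figures from the CF example in the text.
table = two_by_two(34_000, 1 / 3_400, 0.85, 0.9985)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```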
But suppose the situation were different. What if the newborn had a sibling with CF? The prevalence would be 25 percent instead of 1 in 3,400 (CF is autosomal recessive). Run the table again and you’ll find that, if the screen is positive, the chances are better than 99 percent that the child has CF. Same test, different prevalence, different predictive value. Not intuitive, until you work through the 2 x 2s.
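To see how strongly prevalence drives the answer, the comparison can be sketched in a few lines. The sensitivity and specificity defaults are the CF figures from the text; the function name `positive_predictive_value` is hypothetical, and the calculation is the same true-positives-over-all-positives ratio as in the 2 x 2 table, just expressed per person instead of per 34,000.

```python
def positive_predictive_value(prevalence, sensitivity=0.85, specificity=0.9985):
    """Probability of disease given a positive result."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same test, two different starting prevalences.
for label, prevalence in [("general population", 1 / 3_400),
                          ("group with 25 percent prevalence", 0.25)]:
    ppv = positive_predictive_value(prevalence)
    print(f"{label}: {ppv:.1%}")
```

Nothing about the test changes between the two lines of output; only the group being tested does.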
Usually, when a test is ordered by a physician, a positive result means something. That’s because an experienced physician doesn’t order a test without a good reason. The “good reason” ordinarily means that the patient is in a higher risk group. He may have symptoms of CF or a family history. The prevalence is higher in those selected patients, and the test works.
Calculating the Cost of Screening. When talking about a single newborn patient, the cost of screening is only the cost of the screening test. But how much does it cost to screen a population?
For this example, start with 3,400 newborns and screen them all at $4 each (total cost = $13,600). Now take the 5.95 newborns that test positive (.85 true positives plus about 5.1 false positives) and give them the confirmatory test at $60 each ($357). That’s $13,957 per .85 CF patients found, or about $16,420 per one CF patient found.
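The population arithmetic can be sketched the same way. The prices and test characteristics below come from the text; the function name `cost_per_case_found` is hypothetical, and the sketch assumes every positive screen gets exactly one confirmatory test and ignores any downstream costs. Run without rounding, it lands near $16,400 per case found.

```python
def cost_per_case_found(population, prevalence, sensitivity, specificity,
                        screen_cost, confirm_cost):
    """Total screening cost divided by the number of true cases detected."""
    diseased = population * prevalence
    healthy = population - diseased
    true_pos = sensitivity * diseased
    false_pos = (1 - specificity) * healthy
    # Everyone is screened; every positive (true or false) is retested.
    total = population * screen_cost + (true_pos + false_pos) * confirm_cost
    return total / true_pos


cost = cost_per_case_found(3_400, 1 / 3_400, 0.85, 0.9985,
                           screen_cost=4, confirm_cost=60)
print(f"${cost:,.0f} per CF patient found")
```

Almost all of the bill is the $4 screen itself, multiplied across thousands of healthy newborns.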
Next, look at the cost (both financial and human) of not finding the patients early and decide whether it makes sense to do the screening. For CF perhaps it doesn’t, although it may make sense to screen the parents before the child is even conceived. But that’s a different story.
Take-Home Messages. There are several morals to this story. First, remember that the “normal range” for a test is a somewhat arbitrary compromise between creating false negatives and false positives. It is not etched in stone. If your patient and the test result don’t seem to make sense together, take some time to think. If the patient looks hyperthyroid, but the test is “high normal,” maybe the patient is a false negative.
Remember the old adage: Treat the patient, not the test.
Second, mass screening for rare conditions invariably results in many false positives. Be sure you and the patient understand the potential for false positives and what risks could be involved if you get a positive. Patients who screen positive for HIV may attempt suicide if they don’t understand the possibility that the test result could be in error. And there are other risks. Insurance companies may attempt to deny that individual coverage. Positive drug screens cause employment and legal problems. This doesn’t mean you shouldn’t do the test, but it certainly shows you need to take great care with the results.
Finally, remember that the cost of screening for rare conditions is not just the small individual cost to the patient, but also the much larger cost to society.
Panning for medical gold is sometimes a useful endeavor. The nuggets found could be valuable—or a risky, costly distraction. These issues will come up again and again as the medical community continues to work out what role screening plays in maintaining public health.
Next Time: Medical privacy—the struggle continues. As our data become computerized, the questions intensify. Who has access to our medical data? Under what circumstances? Do we have any control at all? What is the government doing about it? What can you do?
New Physician contributing editor Rick Stahlhut is a medical informatics writer and consultant. Contact him with questions or suggestions for column topics at firstname.lastname@example.org, or check out his Web site, at web.net-link.net/