Reaction to Swets, Dawes, & Monahan (2000) by Richard Thripp
EXP 6506 Section 0002: Fall 2015 – UCF, Dr. Joseph Schmidt
October 7, 2015 [Week 7]
Swets, Dawes, and Monahan (2000) have given us a strong exposition of probability modeling, including engaging and practical applications, with the intention of shifting public policy for the better (p. 23). I love this topic, and the need for it is summed up in this quote: “The distribution of the degrees of evidence produced by the positive condition overlaps the distribution of the degrees produced by the negative condition” (p. 2). In other words, diagnosis is a tradeoff: because there is an overlapping range where scores can indicate both having and not having a condition (such as glaucoma or dangerously cracked airplane wings), whatever decision model is adopted will yield both false positives and false negatives.
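The overlap the quote describes can be made concrete with a minimal sketch. The distributions and threshold values below are illustrative assumptions (not figures from the article): evidence scores for negative and positive cases are modeled as two overlapping normal distributions, and moving the decision threshold trades hits against false alarms.

```python
from statistics import NormalDist

# Illustrative assumption: "negative" cases produce evidence scores
# distributed N(0, 1) and "positive" cases N(1.5, 1). Because the two
# distributions overlap, no threshold separates them perfectly.
negative = NormalDist(mu=0.0, sigma=1.0)
positive = NormalDist(mu=1.5, sigma=1.0)

for threshold in (0.5, 1.0, 1.5):
    hit_rate = 1 - positive.cdf(threshold)          # true positive rate
    false_alarm_rate = 1 - negative.cdf(threshold)  # false positive rate
    print(f"threshold={threshold}: hits={hit_rate:.2f}, "
          f"false alarms={false_alarm_rate:.2f}")
# threshold=0.5: hits=0.84, false alarms=0.31
# threshold=1.0: hits=0.69, false alarms=0.16
# threshold=1.5: hits=0.50, false alarms=0.07
```

Raising the threshold lowers both rates together; this is exactly the (hit rate, false-alarm rate) pairing that a ROC curve traces out.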
I cannot understand why the authors never compare decision making to type I and type II errors from statistical hypothesis testing. For the statistically inclined, this seems to be an analogy with immense expository power. While the graphs and explanations in the article are helpful and clear, they become repetitive: 9 of the 12 figures are receiver operating characteristic (ROC) curves, and 2 more are ROC scatterplots. Figures relating to statistical prediction rules (SPRs) would have been welcome, such as a graph showing how reliability increases with the number of cases (p. 8).
The possibilities with probability are endless, and while it may initially appear that they are valuable only to highly educated professionals such as actuaries and medical diagnosticians, they are actually quite relevant even for personal financial literacy. For instance, I was recently tempted by a postcard advertisement to enter a $5,000 sweepstakes that requires calling in and listening to a life insurance pitch. However, after noticing the odds were listed as 1:250,000, I realized that an entry is worth, on average, 2¢. If the phone call takes five minutes of undivided attention, that works out to 24¢ per hour, a shockingly low return. Would not many of our decisions and practices change if we made a habit of seeking out solid probabilistic statistics? For example, we might drive far less after realizing how high our risk of bodily injury or death is, and we would understand one reason why auto insurance is expensive.
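The sweepstakes arithmetic above can be written out directly; nothing here is assumed beyond the figures already stated ($5,000 prize, 1:250,000 odds, a five-minute call):

```python
# Expected value of one sweepstakes entry: prize times probability of winning.
prize = 5000.00
probability = 1 / 250_000
expected_value = prize * probability            # dollars per entry
print(f"expected value per entry: {expected_value * 100:.0f} cents")  # 2 cents

# Converting the five-minute call into an effective hourly wage.
minutes_per_call = 5
hourly_rate = expected_value * (60 / minutes_per_call)
print(f"effective hourly rate: {hourly_rate * 100:.0f} cents/hour")   # 24 cents/hour
```

The same two-line calculation applies to any sweepstakes or lottery: multiply the prize by the odds, then divide by the time the entry costs you.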
One grievance with Swets et al. (2000) is that they focused heavily on binary decisions. Diagnosing cancer (pp. 11–15) is a true/false decision, and so is the decision to use a particular test. While there may be a choice between tests of varying accuracy and expense, you cannot do a little bit of a biopsy because you are a little bit concerned about breast cancer; you either perform the test or you do not. This might be a criticism of the field in general, since SPRs and ROC analyses are clearly geared toward binary tests, though a series of binary tests can approximate a continuous scale. Nonetheless, I would have liked more examples of non-binary decisions, such as deciding what interest rate and credit line to extend to a borrower, or rating the structural integrity of a bridge. The weather forecasting example addresses this, but only briefly (p. 18).
Screenings with a low frequency of “hits” are an interesting topic (pp. 16, 19). A detector for plastic explosives produces 5 million false positives per true positive (p. 19); 85% of positive HIV results among low-risk patients might be false positives (p. 16). Statistics like these prompt us to question whether we should even bother testing low-risk cases. However, airport security is an area where comprehensive screening is required; we cannot simply select every nth passenger, because the cost of missing a terrorist is so high and the prevalence of terrorists is so low. On the other hand, the U.S. Postal Service does not need to open every package sent via media mail to ensure the contents are eligible, because there is no loss of life at stake. Of course, when both false positives and false negatives are costly, as with detecting cracks in airplane wings (pp. 16–18), detecting HIV, or detecting terrorists, SPRs and ROCs shine. We can then choose how many true positives we want and exactly how far past the point of diminishing returns we are willing to go.
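The low-base-rate effect behind these figures follows from Bayes’ rule, and a minimal sketch shows it. The prevalence, sensitivity, and specificity below are illustrative assumptions, not numbers from the article; even a test that is rarely wrong produces mostly false positives when the condition is rare enough.

```python
# Illustrative assumptions for a screening test in a low-risk population.
prevalence = 1 / 5_000      # P(infected) among those screened
sensitivity = 0.99          # P(test positive | infected)
specificity = 0.999         # P(test negative | healthy)

# Bayes' rule: what fraction of positive results are true positives?
p_true_positive = prevalence * sensitivity
p_false_positive = (1 - prevalence) * (1 - specificity)
ppv = p_true_positive / (p_true_positive + p_false_positive)

print(f"share of positive results that are FALSE: {1 - ppv:.0%}")
```

With these assumed numbers, over 80% of positive results are false positives, in the same spirit as the article’s 85% figure: the rarer the condition, the more the false positives from the healthy majority swamp the true positives.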
Reference
Swets, J. A., Dawes, R. M., & Monahan, J. (2000). Psychological science can improve diagnostic decisions. Psychological Science in the Public Interest, 1(1), 1–26.