# PMean: What does large mean when talking about negative values?

Dear Professor Mean, I saw a paper where the authors said that they wanted a diagnostic test with a large negative likelihood ratio, because it was important to rule out a condition. False negatives mean leaving a high-risk condition untreated. But don’t they mean that they want a diagnostic test with a small likelihood ratio?

Okay, I agree with you, but it’s an understandable mistake. Let’s quickly review the idea of likelihood ratios. A positive likelihood ratio is defined as Sn / (1-Sp), where Sn is the sensitivity of the diagnostic test and Sp is the specificity. For a diagnostic test with a very high specificity, you get a very large ratio, because you are putting a really small value in the denominator. For Sp=0.99, for example, the denominator is 0.01, so you would get a positive likelihood ratio of 50 or more (assuming that Sn is at least 0.5).
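Here is a quick sketch of that arithmetic. The post itself has no code, so this is Python, and the function name `positive_lr` is my own invention for illustration, not part of any library. With Sp fixed at 0.99, the ratio works out to 100 times the sensitivity.

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """Positive likelihood ratio: Sn / (1 - Sp)."""
    return sensitivity / (1.0 - specificity)

# With Sp = 0.99 the denominator is 0.01, so LR+ is 100 times Sn.
for sn in (0.5, 0.8, 0.99):
    print(f"Sn={sn}: LR+ = {positive_lr(sn, 0.99):.1f}")
```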

The positive likelihood ratio is a measure of how much the odds of disease are increased if the diagnostic test is positive.

A negative likelihood ratio is defined as (1-Sn) / Sp. For a diagnostic test with a very high sensitivity, the negative likelihood ratio is very close to zero. For Sn=0.99, the negative likelihood ratio will be 0.02 or smaller, assuming that Sp is at least 0.5.
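The same kind of sketch works for the negative likelihood ratio; again, `negative_lr` is a hypothetical name chosen for illustration. Holding Sn at 0.99 makes the numerator 0.01, so any specificity of 0.5 or more keeps the ratio at or below 0.02.

```python
def negative_lr(sensitivity: float, specificity: float) -> float:
    """Negative likelihood ratio: (1 - Sn) / Sp."""
    return (1.0 - sensitivity) / specificity

# With Sn = 0.99 the numerator is 0.01, so LR- is 0.01 / Sp.
for sp in (0.5, 0.8, 0.99):
    print(f"Sp={sp}: LR- = {negative_lr(0.99, sp):.4f}")
```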

The negative likelihood ratio is a measure of how much the odds of disease are decreased if the diagnostic test is negative.
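Both statements about odds come from the odds form of Bayes’ theorem: post-test odds = pre-test odds × likelihood ratio. The sketch below is illustrative only. The pre-test probability of 10% and the test with Sn=0.99 and Sp=0.90 are hypothetical values I picked, not numbers from the paper in question, and the helper names are my own.

```python
def to_odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)

def to_prob(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)

def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Multiply the pre-test odds by the likelihood ratio, then convert back."""
    return to_prob(to_odds(pre_test_prob) * likelihood_ratio)

# Hypothetical test characteristics and pre-test probability.
sn, sp, prior = 0.99, 0.90, 0.10
lr_pos = sn / (1.0 - sp)   # about 9.9
lr_neg = (1.0 - sn) / sp   # about 0.011
print(f"after a positive test: {post_test_probability(prior, lr_pos):.3f}")
print(f"after a negative test: {post_test_probability(prior, lr_neg):.4f}")
```

With these made-up numbers, a positive result raises the probability of disease from 10% to roughly 52%, while a negative result drops it to about 0.12%. A likelihood ratio of exactly 1 would leave the probability unchanged.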

The two likelihood ratios should remind you of the acronyms SpIn and SnOut. SpIn means that if specificity is high, then a positive diagnostic test is good at ruling in the disease. Sadly, this isn’t always the case, and for many diagnostic tests, the next step after a positive test is not to treat the disease, but to double-check things using a more expensive or more invasive test.

SnOut means that if sensitivity is high, then a negative diagnostic test is good at ruling out the disease. In some settings you can safely send the patient home; in others, you can start looking for other diseases.

That sounds great, but sometimes you are very concerned about false negatives, and you don’t want to send someone home if they actually have the disease. If you are worried about a cervical fracture, ruling out the fracture and sending someone home might lead to paralysis or death if you have a false negative. So you want to be very sure of yourself in this setting.

Now, with regard to the comment above, I think it is just a case of careless language. When the authors say “large negative likelihood ratio”, they should have said “extreme negative likelihood ratio”, meaning a likelihood ratio much smaller than one. I’ve done it myself when I describe a correlation of -0.8 as a “big” correlation because it is far from zero.

We tend to shy away from words like “small” when we talk about a negative likelihood ratio being much less than 1, because “small” in some people’s minds means “inconsequential” when the opposite is true. When I am careful in my language, I try to use the word “extreme” to mean very far away from the null value (1 for a likelihood ratio or 0 for a correlation) rather than “large” or “small”.
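The post does not define a formal measure of “extreme”, so what follows is only one reasonable convention, my assumption rather than a standard: measure a likelihood ratio’s distance from its null value of 1 on the log scale, and a correlation’s distance from 0 as its absolute value. On the log scale, a likelihood ratio of 50 and one of 1/50 = 0.02 are equally extreme, since one multiplies the odds by 50 and the other divides them by 50. The function names are hypothetical.

```python
import math

def lr_extremeness(likelihood_ratio: float) -> float:
    """Distance of a likelihood ratio from its null value of 1, on the log scale."""
    return abs(math.log(likelihood_ratio))

def correlation_extremeness(r: float) -> float:
    """Distance of a correlation from its null value of 0."""
    return abs(r)

print(lr_extremeness(50), lr_extremeness(0.02))  # both about 3.91
print(correlation_extremeness(-0.8))
```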

# Recommended: OpenRefine: A free, open source, powerful tool for working with messy data

I have not had a chance to use this, but it comes highly recommended. OpenRefine is a program that uses a graphical user interface to clean up messy data, but it saves all the cleanup steps to ensure that your work is well documented and reproducible. I listed Martin Magdinier as the “author” in the citation below because he has posted most of the blog entries about OpenRefine, but there are many contributors to this package and website.

# Recommended: How to increase value and reduce waste when research priorities are set

This is the first in a series of articles on reducing waste in research. It focuses on funding agencies and recommends that funders support more work on making research replicable, be more transparent about how they set priorities, make sure that research proposals are justified through a systematic review of previous research, and promote greater openness of research in progress to encourage collaboration. Other articles in this series cover research design, conduct, and analysis; regulation and management; inaccessible research; and incomplete reports of research.

# Recommended: Randomized Controlled Trials in Health Insurance Systems

While researchers often use data from health insurance systems to conduct observational studies, the authors of this research paper point out that you can also conduct randomized trials. You can randomly assign different levels of insurance coverage and then use claims data to evaluate how much difference, if any, the level of coverage makes. This approach is attractive because you do not need a lot of resources, and you can quickly obtain a very large sample. Since insurance data is collected for administrative needs rather than research needs, you have to contend with inaccurate or incomplete data, which can reduce statistical efficiency or produce biased results. The authors offer some interesting examples of actual studies, propose new potential studies, and offer general guidance on how to conduct a randomized trial within a health insurance system.

# Recommended: Announcing a new monthly feature: What’s going on in this graph

Through the efforts of a team of statisticians with the American Statistical Association, the New York Times is producing a new resource for educators called “What’s Going On in This Graph?” This is similar to another New York Times effort called “What’s Going On in This Picture?”

Every month the New York Times will publish a graph stripped of some key information and ask three questions: What do you notice? What do you wonder? What do you think is going on in this graph?

The content will be suitable for middle school and high school students, but I suspect that even college students will find the exercise interesting.

The first graph will appear on September 19 and on the second Tuesday of every month afterwards.

# Recommended: Good enough practices in scientific computing

There is more than one way to approach a data analysis, and some approaches make modifications and updates easier and help make your work more reproducible. This paper describes practices that the authors recommend based on years of teaching Software Carpentry and Data Carpentry classes. One of the software products mentioned in this article, OpenRefine, looks like a very interesting way to clean up messy data in a way that leaves a well-documented trail.

# PMean: Syllabus for Introduction to SPSS, Fall semester 2017

I am teaching a class, Introduction to SPSS (MEDB 5506). Here is the syllabus for Fall Semester 2017.