Tag Archives: Human side of statistics

PMean: What does large mean when talking about negative values?

Dear Professor Mean, I saw a paper where the authors said that they wanted a diagnostic test with a large negative likelihood ratio, because it was important to rule out a condition. False negatives mean leaving a high risk condition untreated. But don’t they mean that they want a diagnostic test with a small likelihood ratio?

Okay, I agree with you, but it’s an understandable mistake. Let’s quickly review the idea of likelihood ratios. A positive likelihood ratio is defined as Sn / (1-Sp), where Sn is the sensitivity of the diagnostic test and Sp is the specificity. For a diagnostic test with a very high specificity, you get a very large ratio, because you are putting a really small value in the denominator. For Sp=0.99, for example, you would end up with a positive likelihood ratio of 50 or more (assuming that Sn is at least 0.5).

The positive likelihood ratio is a measure of how much the odds of disease are increased if the diagnostic test is positive.

A negative likelihood ratio is defined as (1-Sn) / Sp. For a diagnostic test with a very large sensitivity, the negative likelihood ratio is very close to zero. For Sn=0.99, the negative likelihood ratio is going to be 0.02 or smaller, assuming that Sp is at least 0.5.

The negative likelihood ratio is a measure of how much the odds of disease are decreased if the diagnostic test is negative.
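The two definitions above are easy to sketch in code. Here is a minimal Python illustration (the function names are mine, not from any particular package), including the standard conversion from a pre-test probability to a post-test probability by multiplying the odds by the likelihood ratio:

```python
def positive_likelihood_ratio(sn, sp):
    """LR+ = Sn / (1 - Sp): how much a positive test raises the odds of disease."""
    return sn / (1 - sp)

def negative_likelihood_ratio(sn, sp):
    """LR- = (1 - Sn) / Sp: how much a negative test lowers the odds of disease."""
    return (1 - sn) / sp

def post_test_probability(pretest_prob, lr):
    """Convert probability to odds, multiply by the likelihood ratio, convert back."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)

# The examples from the text: a highly specific test gives a large LR+,
# and a highly sensitive test gives an LR- close to zero.
print(round(positive_likelihood_ratio(sn=0.5, sp=0.99)))      # about 50
print(round(negative_likelihood_ratio(sn=0.99, sp=0.5), 2))   # about 0.02
```

With a pre-test probability of 10% and an LR+ of 50, for example, `post_test_probability(0.1, 50)` pushes the probability of disease up to roughly 85%, which is what "ruling in" looks like numerically.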

The two likelihood ratios should remind you of the acronyms SpIn and SnOut. SpIn means that if specificity is large, then a positive diagnostic test is good at ruling in the disease. This isn’t always the case, sadly, and for many diagnostic tests, the next step after a positive test is not to treat the disease, but to double check things using a more expensive or more invasive test.

SnOut means that if the sensitivity is large, then a negative diagnostic test is good at ruling out the disease. You can safely send the patient home in some settings, or start looking for other diseases in different settings.

That sounds great, but sometimes you are very concerned about false negatives, and you don’t want to send someone home if they actually have the disease. If you are worried about a cervical fracture, ruling out the fracture and sending someone home might lead to paralysis or death if you have a false negative. So you want to be very sure of yourself in this setting.

Now with regard to the comment above, I think it is just a case of careless language. When the authors say “large negative likelihood ratio”, they should have said “extreme negative likelihood ratio” meaning a likelihood ratio much much smaller than one. I’ve done it myself when I talk about a correlation of -0.8 as being a “big” correlation because it is very far away from zero.

We tend to shy away from words like “small” when we talk about a negative likelihood ratio being much less than 1, because “small” in some people’s minds means “inconsequential” when the opposite is true. When I am careful in my language, I try to use the word “extreme” to mean very far away from the null value (1 for a likelihood ratio or 0 for a correlation) rather than “large” or “small”.

Why secondary data analysis takes a lot longer

Someone posted a question noting that while most of the statistical consulting projects they worked on finished in a reasonable time frame, a few were outliers that took a lot longer and required a lot more effort from the statisticians. They wondered whether these outliers had any common features, so they asked if anyone else had identified methodological features of projects that went overtime. I only had a subjective impression, but thought it was still worth sharing. Continue reading

Recommended: Practical advice for analysis of large, complex data sets

This is a nice compilation of issues that you should be concerned about. The examples are mostly drawn from things that interest Google, but you will find the advice useful no matter what type of data you work with. The advice is split into three broad categories: technical (e.g., look at your distributions), process (e.g., separate validation, description, and evaluation), and communication (e.g., data analysis starts with questions, not data or a technique). Continue reading

PMean: Independent consulting and the cold call

There’s been some more discussion about getting started as an independent statistical consultant. One person is ready to hang their shingle and proposes to “find a niche I can serve, contact companies in that niche, etc.” but didn’t know what that niche might be. I had one cautionary comment and then discussed finding your niche. Continue reading

PMean: What do you hate most about independent consulting

Someone on the Statistical Consulting forum mentioned that she is going to become an independent consultant when she graduates and wanted to find out from people currently in that position what the one thing is that they hate most. This email drew a lot of responses, including several people who cautioned this woman about the difficulties a young person faces in becoming an independent consultant. Here are the thoughts I shared on the thing I hate most and on the issues with striking out on your own as an independent consultant early in your career. Continue reading

PMean: Never consult by email if you can help it

Consulting is always a back and forth process and often you will find yourself re-working things because of communication problems. That’s okay, but keep in mind that communication problems are even worse when they are done solely through email. Sometimes you have to consult this way, but it greatly increases the amount of rework needed. Here’s an example. Continue reading