The question reads: “*Prove that a random variable with a distribution on [0,1] (that is, the density function is equal to 0 outside [0, 1]) has an expectation always between 0 and 1. Prove that its variance is maximum and equal to 1/12 if and only if the distribution is uniform on [0, 1].*”

The first half is okay. Because 0 ≤ x ≤ 1 wherever f(x) is nonzero, the integral of x*f(x) over the interval 0 to 1 is nonnegative and is bounded above by the integral of f(x) over the same interval, and you know that the latter has to equal 1 for f to be a density function. So the expectation has to land between 0 and 1.
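You can see both bounds numerically in R. The beta(2, 5) density here is an arbitrary choice; any density on [0, 1] would do.

```r
# An arbitrary density on [0, 1]; beta(2, 5) is just an example.
f <- function(x) dbeta(x, 2, 5)

# The density integrates to 1 over [0, 1]...
integrate(f, 0, 1)$value

# ...and the expectation, the integral of x*f(x), lands between 0 and 1.
integrate(function(x) x * f(x), 0, 1)$value  # 2/7, about 0.286
```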

The second half, though, is wrong. A Bernoulli random variable with probability 1/2 is bounded between 0 and 1 and has variance 1/4. That’s a whole lot bigger than 1/12.

You can confirm this with a quick simulation in R. The single line

var(rbinom(1000, 1, 0.5))

will give you an answer that is pretty close to 0.25.

No fair, you claim. The Bernoulli distribution does not have a density function because it is not a continuous random variable.

That’s true. Let’s consider a different case, then: a beta random variable with parameters alpha=0.5 and beta=0.5. I had to peek at Wikipedia, but the variance of a beta distribution is

alpha*beta / ((alpha+beta)^2 * (alpha+beta+1)).

Plug in alpha=1 and beta=1 as a quick check and you do indeed get 1/12. When you plug in alpha=0.5 and beta=0.5, you get 1/8. Check this in R with

var(rbeta(1000, 0.5, 0.5))

and you’ll get a value close to 0.125.
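That Wikipedia formula is short enough to wrap in a one-line function (the name beta_var is my own) and check both cases exactly:

```r
# Variance of a beta(alpha, beta) distribution, from the formula above.
# beta_var is just a made-up name for this illustration.
beta_var <- function(a, b) a * b / ((a + b)^2 * (a + b + 1))

beta_var(1, 1)      # 1/12, about 0.083 -- the uniform case
beta_var(0.5, 0.5)  # 1/8, 0.125 -- bigger than 1/12
```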

Maybe you’ll say something like: no fair, because the density function of this particular beta distribution is unbounded at both 0 and 1.

Fair enough. How about a distribution that is uniform on the union of the intervals 0 to 1/3 and 2/3 to 1? That has a variance of 13/108, roughly 0.12, still well above 1/12.
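There’s no built-in R function for this split uniform, but you can simulate it by drawing from the two halves with equal probability; the sample variance comes out close to 0.12.

```r
# Uniform on [0, 1/3] with probability 1/2, uniform on [2/3, 1] otherwise.
n <- 100000
lower <- runif(n, 0, 1/3)
upper <- runif(n, 2/3, 1)
x <- ifelse(runif(n) < 0.5, lower, upper)
var(x)  # about 0.12, comfortably bigger than 1/12
```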

You say that you were talking about unimodal distributions only. I’m not sure, but I think you might be right about 1/12 being the largest variance possible…

…except that a uniform distribution is multi-modal.

Hedyeh Ziai, Rujun Zhang, An-Wen Chan, Nav Persaud. Search for unpublished data by systematic reviewers: an audit. BMJ Open (2017); 7(10). Available at http://bmjopen.bmj.com/content/7/10/e017737.


Dirk Kruger. Get credit for your data – BMC Research Notes launches data notes. Research in Progress blog. September 29, 2017. Available at http://blogs.biomedcentral.com/bmcblog/2017/09/29/get-credit-for-your-data-bmc-research-notes-launches-data-notes/.

Okay, I agree with you, but it’s an understandable mistake. Let’s quickly review the idea of likelihood ratios. A positive likelihood ratio is defined as Sn / (1-Sp), where Sn is the sensitivity of the diagnostic test and Sp is the specificity. For a diagnostic test with a very high specificity, you get a very large ratio, because you are putting a really small value in the denominator. For Sp=0.99, for example, you would end up getting a positive likelihood ratio of 50 or more (assuming that Sn is at least 0.5).

The positive likelihood ratio is a measure of how much the odds of disease are increased if the diagnostic test is positive.

A negative likelihood ratio is defined as (1-Sn) / Sp. For a diagnostic test with a very large sensitivity, the negative likelihood ratio is very close to zero. For Sn=0.99, the likelihood ratio is going to be 0.02 or smaller, assuming that Sp is at least 0.5.

The negative likelihood ratio is a measure of how much the odds of disease are decreased if the diagnostic test is negative.
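The two definitions translate directly into R; the function names here are my own.

```r
# Positive and negative likelihood ratios from sensitivity (sn) and specificity (sp).
lr_pos <- function(sn, sp) sn / (1 - sp)
lr_neg <- function(sn, sp) (1 - sn) / sp

lr_pos(0.5, 0.99)  # 50 -- the high-specificity example above
lr_neg(0.99, 0.5)  # 0.02 -- the high-sensitivity example above
```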

The two likelihood ratios should remind you of the acronyms SpIn and SnOut. SpIn means that if specificity is large, then a positive diagnostic test is good at ruling in the disease. This isn’t always the case, sadly, and for many diagnostic tests, the next step after a positive test is not to treat the disease, but to double check things using a more expensive or more invasive test.

SnOut means that if the sensitivity is large, then a negative diagnostic test is good at ruling out the disease. You can safely send the patient home in some settings, or start looking for other diseases in different settings.

That sounds great, but sometimes you are very concerned about false negatives, and you don’t want to send someone home if they actually have the disease. If you are worried about a cervical fracture, ruling out the fracture and sending someone home might lead to paralysis or death if you have a false negative. So you want to be very sure of yourself in this setting.

Now with regard to the comment above, I think it is just a case of careless language. When the authors say “large negative likelihood ratio”, they should have said “extreme negative likelihood ratio” meaning a likelihood ratio much much smaller than one. I’ve done it myself when I talk about a correlation of -0.8 as being a “big” correlation because it is very far away from zero.

We tend to shy away from words like “small” when we talk about a negative likelihood ratio being much less than 1, because “small” in some people’s minds means “inconsequential” when the opposite is true. When I am careful in my language, I try to use the word “extreme” to mean very far away from the null value (1 for a likelihood ratio or 0 for a correlation) rather than “large” or “small”.

Martin Magdinier. OpenRefine: A free, open source, powerful tool for working with messy data. Available at http://openrefine.org/index.html.

Iain Chalmers, Michael B Bracken, Ben Djulbegovic, Silvio Garattini, Jonathan Grant, A Metin Gülmezoglu, David W Howells, John P A Ioannidis, Sandy Oliver. How to increase value and reduce waste when research priorities are set. Lancet 2014; 383: 156–65. Available at http://www.testingtreatments.org/wp-content/uploads/2014/03/1-Chalmers-et-al.-Paper-1.pdf.


Choudhry NK. Randomized, Controlled Trials in Health Insurance Systems. N Engl J Med 2017 (Sept. 7); 377: 957-964. Available at http://www.nejm.org/doi/full/10.1056/NEJMra1510058.

Every month the New York Times will publish a graph stripped of some key information and ask three questions: What do you notice? What do you wonder? and What do you think is going on in this graph?

The content will be suitable for middle school and high school students, but I suspect that even college students will find the exercise interesting.

The first graph will appear on September 19, with a new one on the second Tuesday of every month afterwards.

Michael Gonchar and Katherine Schulten. Announcing a new monthly feature: What’s going on in this graph. The New York Times, September 6, 2017. Available at https://www.nytimes.com/2017/09/06/learning/announcing-a-new-monthly-feature-whats-going-on-in-this-graph.html.