Monthly Archives: February 2014

PMean: NIH is interested in big data

The National Institutes of Health has recently shown interest in “big data.” You can define big data in several ways, but a common characteristic is the three V’s. Big data takes up a lot of space (volume) and/or it comes at you very rapidly (velocity) and/or it comes in a wide range of differing formats (variety). One of the recent Requests for Applications (RFAs) from NIH spells out what types of big data research they are interested in seeing. I might be interested in applying, and would love to find some collaborators. Here’s a summary of what the RFA is all about.

PMean: How many months should you wait before re-testing?

I got a question that I had never heard before, and it sort of is a statistics question and sort of isn’t. A researcher was comparing two methods of training residents in a particular surgical procedure and wanted to know how long you should wait between the training and the evaluation of whether that training was effective.

Recommended: Why Coincidences, Miracles And Rare Events Happen Every Day.

This is an interview with David Hand, the author of a new book, The Improbability Principle: Why Coincidences, Miracles and Rare Events Happen Every Day. The discussion raises the issue of events that seem highly improbable based on a simple probability calculation, but which are nevertheless not that uncommon.

PMean: Summary of my research interest in patient accrual in clinical trials.

My boss at UMKC (I’m part-time at UMKC and part-time an independent statistical consultant) asked me for one of those “summarize the research you’ve been working on” write-ups so she could mention all the work being done by our Department in a talk she’s giving. Recently, I’ve been focused almost exclusively on one thing, and although she knew it very well, I sent her a summary anyway. Then, I thought, why not share the same summary on my blog? Maybe you’re curious or maybe you might be interested in collaborating. So here’s my summary about my work on Bayesian models for patient accrual in clinical trials.

Recommended: Why randomized controlled trials fail but needn’t: 2. Failure to employ physiological statistics, or the only formula a clinician-trialist is ever likely to need (or understand!)

This is a classic article about the relationship between signal, noise, sample size, and confidence. It provides pragmatic guidance on designing a trial to make the best use of limited resources. It should be the first article that you read before you sit down and design a clinical trial.
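
The formula that the article's title refers to is commonly stated as confidence = (signal / noise) × √(sample size). Here is a minimal sketch of that relationship, assuming that form. The function names and the numbers are made up for illustration and do not come from the article.

```python
import math


def confidence(signal, noise, sample_size):
    """Signal-to-noise ratio scaled by the square root of the sample size."""
    return (signal / noise) * math.sqrt(sample_size)


def sample_size_for(target_confidence, signal, noise):
    """Invert the formula: the smallest sample size reaching the target."""
    return math.ceil((target_confidence * noise / signal) ** 2)


# Illustrative numbers only: a signal of 2 units against noise of 4 units.
print(confidence(2, 4, 100))      # 100 subjects
print(confidence(2, 4, 400))      # four times the subjects, twice the confidence
print(sample_size_for(5, 2, 4))   # subjects needed at the original signal
print(sample_size_for(5, 1, 4))   # halving the signal quadruples the requirement
```

Under this form, the sample size enters only through its square root, so strengthening the signal or reducing the noise pays off faster than recruiting more patients.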

Quote: Because statistics has too often been presented …

Because statistics has too often been presented as a bag of specialized computational tools, with morbid emphasis on calculation, it is no wonder that survivors of such courses regard their statistical tools as instruments of torture [rather] than as diagnostic aids in the art and science of data analysis. — George W. Cobb, as cited at

Recommended: Troubleshooting Public Data Archiving: Suggestions to Increase Participation

I’ve always been a big fan of data sharing, mostly for selfish reasons. I like to see interesting data sets and use them as teaching examples and on my website. There are also unselfish reasons for sharing data, such as the increase in research transparency and the ability to pursue new avenues of research. But if we want to see more progress in sharing data, there need to be some improvements in public data archives. In particular, there needs to be more flexibility in data embargoes, better communication between the original data set owners and those who would like to re-use their data, better understanding of the ethics of data re-use, and more rewards for those who take the time and trouble to share their data.