Category Archives: Statistics

Recommended: Sample size of 12 per group rule of thumb for a pilot study

This paper is (sadly) not freely available on the Internet, but it is still worth highlighting here. Steven Julious provides some justification for using twelve patients per group in a pilot study. This is a good starting point for discussion, and it may serve as a reasonable lower bound. I would suggest, however, that you also consider the size of the larger trial that you are piloting. For a study that might require thousands or tens of thousands of patients, a pilot study of 12 patients per group is woefully inadequate.

PMean: What is the probability of a probability of one

Someone wrote asking me about a variation of the “Rule of Three”. This rule says that if you observe zero events out of n, an approximate upper 95% confidence limit for the event probability is 3/n. So suppose you operated on 10 patients and none of them died after surgery. Then you would be 95% confident that the mortality rate is 30% (3/10) or less. This person asked: “Suppose I repeatedly sample from a population and every patient in the sample was a G. How likely is it that the entire population is Gs?” This flips the problem around, and is equivalent to saying that the probability of survival is at least 1 - 3/n (70% in the surgery example). But this person wanted an estimate of the probability that the probability in the population is exactly 1.
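The rule of three comes from setting the chance of seeing zero events in n independent trials, (1 - p)^n, equal to 0.05 and solving for p. That gives p = 1 - 0.05^(1/n), which is approximately -ln(0.05)/n, and -ln(0.05) is about 2.996, hence "3". The short Python sketch below compares the exact bound with the 3/n shortcut; the function names are hypothetical, chosen only for illustration. For n = 10 the exact limit is about 0.259, a bit tighter than the 0.30 from the shortcut. Note that this is a confidence bound on the proportion, not the probability that the proportion equals 1.

```python
def exact_upper_limit(n, alpha=0.05):
    """One-sided upper confidence limit for p after 0 events in n trials.

    Solves (1 - p) ** n == alpha for p.
    """
    return 1 - alpha ** (1 / n)


def rule_of_three(n):
    """The 3/n approximation (since -ln(0.05) is roughly 2.996)."""
    return 3 / n


# Compare the exact bound with the shortcut for a few sample sizes.
# The corresponding lower bound for the "all Gs" (or survival) proportion
# is simply 1 minus the upper limit.
for n in (10, 30, 100):
    print(n, round(exact_upper_limit(n), 4), round(rule_of_three(n), 4))
```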

PMean: An example of a simple sample size justification

Someone asked me for a sample size justification for a study involving a historical control group of 30 patients and a treatment group of unspecified size. I thought it would be nice to document the mechanics of this calculation here, as an example for future clients. It uses a program, Piface, developed by Russ Lenth for sample size calculations.
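The post does not say what the outcome is, so the following is only a rough sketch under stated assumptions: a continuous outcome, a standardized effect size d, a two-sided test at alpha = 0.05, and a normal approximation rather than the t-test that Piface would typically use. It is not a reproduction of Piface's output. The key point it illustrates is that with the control group fixed at 30, power is capped no matter how many treated patients you add, because 1/30 + 1/n_treat can never fall below 1/30. The function names are hypothetical.

```python
from statistics import NormalDist


def power_two_sample(d, n_control, n_treat, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    Ignores the (negligible) chance of rejecting in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d / (1 / n_control + 1 / n_treat) ** 0.5
    return z.cdf(ncp - z_crit)


def min_treatment_n(d, n_control=30, target=0.80, alpha=0.05, n_max=100_000):
    """Smallest treatment-group size reaching the target power, or None.

    Returns None when even an unlimited treatment group cannot reach the
    target, because the fixed control group limits the achievable power.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Limiting power as n_treat grows without bound.
    if z.cdf(d * n_control ** 0.5 - z_crit) < target:
        return None
    for n in range(2, n_max + 1):
        if power_two_sample(d, n_control, n, alpha) >= target:
            return n
    return None


print(min_treatment_n(0.8))  # a large effect is detectable
print(min_treatment_n(0.5))  # a medium effect is not, with only 30 controls
```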

PMean: Simple longitudinal data sets to illustrate data management

I am working on a class that will teach basic data management and graphics using the R programming language, with parallel classes in SPSS and SAS. On the third or fourth day of the class, we will look at managing longitudinal data sets, as these require special skills. I wanted to find a couple of reasonably simple longitudinal data sets that were available on the web and which had at least a few missing values in them. Here are a couple of data sets that might work.
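One of the special skills longitudinal data require is moving between "wide" format (one row per subject, one column per visit) and "long" format (one row per subject per visit), while keeping track of missing values. The Python sketch below shows the idea using only the standard library; the data values, column names, and function name are hypothetical, made up purely for illustration and not taken from any of the data sets mentioned here. In R, the same reshaping is commonly done with tidyr's pivot_longer.

```python
# Hypothetical toy data in wide format: one row per subject,
# one column per visit. None marks a missing measurement.
wide = [
    {"id": 1, "visit1": 5.0, "visit2": None, "visit3": 6.5},
    {"id": 2, "visit1": 4.2, "visit2": 4.8, "visit3": None},
]


def wide_to_long(rows, id_key, time_keys):
    """Reshape wide records to long: one record per subject per visit."""
    long_rows = []
    for row in rows:
        for visit, key in enumerate(time_keys, start=1):
            long_rows.append({id_key: row[id_key], "visit": visit, "value": row[key]})
    return long_rows


long_data = wide_to_long(wide, "id", ["visit1", "visit2", "visit3"])

# Missing values survive the reshape, so they can be counted or dropped.
n_missing = sum(1 for r in long_data if r["value"] is None)
complete = [r for r in long_data if r["value"] is not None]
print(len(long_data), n_missing, len(complete))
```

Keeping the missing visits as explicit rows in the long format, rather than silently dropping them, makes it easy to see which subjects skipped which visits before deciding how to handle them.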