Category Archives: Statistics

PMean: Which R package should I use?

Working with R is great in that if anything has been done in Statistics, there is an R package that will do it. The problem is that sometimes there are four packages that will do it. So what do you do when this happens? I want to outline what I did recently when I needed to find a package to calculate Cronbach’s alpha.

Recommended: Interpretation of Changes in Health-related Quality of Life: The Remarkable Universality of Half a Standard Deviation

I’ve typically mocked the use of effect sizes in research, but perhaps I need to be a bit more open-minded. This paper looked at the “minimally important difference” (note: not quite the same thing as the minimal clinically important difference) across 33 published studies of health-related quality of life measures. Even though the structure of many of these measures was radically different, the minimally important difference was almost always close to 0.5. The authors draw an analogy to measurement on a seven-point scale, where one unit is understood from previous psychological research to represent (roughly) the limit of human discrimination.

PMean: Some simple examples of single imputation

I wanted to use R to show some simple approaches to imputing missing values. These approaches are difficult to support because they require that you make some questionable and unverifiable assumptions about your data. They still may prove useful as a sensitivity check or as a springboard into more complex approaches for imputing missing values. I have a link to the code that generated most of these results.

PMean: Using version control through Git, GitHub, and RStudio

I’m definitely “old school” when it comes to programming, but there comes a time when even this old dog needs to learn a new trick. I decided yesterday to use version control for my own R programs. Nothing for clients, mind you, because of confidentiality concerns, but the R code that I use to develop teaching examples is certainly fair game. I’m not totally clueless on version control because of my work for the Greater Plains Collaborative, but it’s a different thing to do it totally by yourself. Here’s a brief outline of what I needed to do to get version control up and running.

Recommended: Reporting and methodological quality of sample size calculations in cluster randomized trials could be improved: a review

The sample size justification for a cluster randomized trial is messy. It requires the use of an intra-class correlation or something similar (the authors use the term within-cluster correlation). In a review of 300 cluster randomized trials, the authors found that only about a third of the trials specified the within-cluster correlation. Even fewer compared this to the within-cluster correlation observed in the data. We need to do better.

PMean: The biggest statistics papers of all time

I’m giving a short talk about the Kaplan-Meier curve and found out an interesting fact about the 1958 paper by Edward Kaplan and Paul Meier that introduced this curve. It represents the 11th most cited research paper of all time. There’s a nice graphic in a Nature paper that allows you to review the top 100 most cited papers of all time. There are a few other statistics papers on this list as well.