Category Archives: Statistics

PMean: Which R package should I use?

Working with R is great in that if anything has been done in Statistics, there is an R package that will do it. The problem is that sometimes there are four packages that will do it. So when this happens, what do you do? I want to outline what I did recently when I needed to find a package to calculate Cronbach’s alpha. Continue reading →

PMean: So you’re thinking about a retrospective chart review

If you are designing a retrospective chart review, you should talk to a statistician early in the process. There are lots of statistical issues that you must think about during the concept development phase of your research. Here is a broad overview of these issues. Continue reading →

PMean: Calculating 90 day readmission rates

Someone asked me how to calculate a 90-day readmission rate from a large database. It’s a tricky problem because, for many databases, it requires you to examine the data from a longitudinal perspective. Here’s some general advice. Continue reading →

Recommended: Interpretation of Changes in Health-related Quality of Life: The Remarkable Universality of Half a Standard Deviation

I’ve typically mocked the use of effect sizes in research, but perhaps I need to be a bit more open-minded. This paper looked at the “minimally important difference” (note: not quite the same thing as the minimal clinically important difference) across 33 published studies of health-related quality of life measures. Even though the structure of many of these measures was radically different, the minimally important difference was almost always close to 0.5 standard deviations. The authors draw an analogy to measurement on a seven-point scale, where one unit is understood from previous psychological research to represent (roughly) the limit of human discrimination. Continue reading →

PMean: Some simple examples of single imputation

I wanted to use R to show some simple approaches to imputing missing values. These approaches are difficult to support because they require that you make some questionable and unverifiable assumptions about your data. They still may prove useful as a sensitivity check or as a springboard into more complex approaches for imputing missing values. I have a link to the code that generated most of these results. Continue reading →

PMean: Using version control through git, github, and R Studio

I’m definitely “old school” when it comes to programming, but there comes a time when even this old dog needs to learn a new trick. I decided yesterday to use version control for my own R programs. Nothing for clients, mind you, because of confidentiality concerns, but the R code that I use to develop teaching examples is certainly fair game. I’m not totally clueless on version control because of my work for the Greater Plains Collaborative, but it’s a different thing to do it totally by yourself. Here’s a brief outline of what I needed to do to get version control up and running. Continue reading →

PMean: Some open source Kaplan Meier curves

I’m giving a talk on the Kaplan-Meier survival curve and wanted to show and interpret a few real examples from the open source literature. Continue reading →

Recommended: Reporting and methodological quality of sample size calculations in cluster randomized trials could be improved: a review

The sample size justification for a cluster randomized trial is messy. It requires the use of an intra-class correlation or something similar (the authors use the term within-cluster correlation). In a review of 300 cluster randomized trials, the authors found that only about a third of the trials specified the within-cluster correlation. Even fewer compared this to the within-cluster correlation observed in the data. We need to do better. Continue reading →

PMean: Is my odds ratio zero or infinity?

Dear Professor Mean, You told me that when one of the row probabilities in a two by two table is 0% or 100%, the odds ratio is either 0 or infinity. But how do I tell which? Continue reading →
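One way to see which extreme you get is to work through the arithmetic directly. The sketch below is only an illustration, not necessarily the answer given in the full post. It assumes the odds ratio is defined as the odds in row 1 divided by the odds in row 2 (the opposite convention swaps 0 and infinity), and the function names `odds` and `odds_ratio` are made up for this example.

```python
import math


def odds(p):
    """Convert a probability to odds. A probability of 1 gives infinite odds."""
    if p == 1:
        return math.inf
    return p / (1 - p)


def odds_ratio(p_row1, p_row2):
    """Odds ratio of row 1 relative to row 2 (an assumed convention)."""
    o1, o2 = odds(p_row1), odds(p_row2)
    # 0/0 and infinity/infinity are undefined, so report "not a number".
    if o1 == o2 and (o1 == 0 or math.isinf(o1)):
        return math.nan
    # Zero odds in the denominator row pushes the ratio to infinity.
    if o2 == 0:
        return math.inf
    # Covers the remaining cases: 0/x = 0, inf/x = inf, x/inf = 0.
    return o1 / o2


print(odds_ratio(0.0, 0.3))  # row 1 at 0%
print(odds_ratio(1.0, 0.3))  # row 1 at 100%
print(odds_ratio(0.3, 0.0))  # row 2 at 0%
print(odds_ratio(0.3, 1.0))  # row 2 at 100%
```

Under this convention, a 0% probability in the numerator row or a 100% probability in the denominator row gives an odds ratio of 0, while a 100% probability in the numerator row or a 0% probability in the denominator row gives infinity.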

PMean: The biggest statistics papers of all time

I’m giving a short talk about the Kaplan-Meier curve and found out an interesting fact about the 1958 paper by Edward Kaplan and Paul Meier that introduced this curve. It is the 11th most cited research paper of all time. There’s a nice graphic in a Nature paper that allows you to review the top 100 most cited papers of all time. There are a few other statistics papers on this list as well. Continue reading →