Author Archives: pmean

PMean: Calculating 90 day readmission rates

Someone asked me how to calculate a 90 day readmission rate from a large database. It’s a tricky problem because for many databases, it requires you to examine the data from a longitudinal perspective. Here’s some general advice. Continue reading →
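To make the longitudinal idea concrete, here is a minimal Python sketch. It is not the method from the full post. It assumes a hypothetical table of stays with a patient identifier, an admission date, and a discharge date, treats every stay as an index admission, and ignores complications such as transfers, deaths, and stays discharged too close to the end of the data to be followed for 90 days.

```python
from collections import defaultdict
from datetime import date


def readmission_rate(stays, window_days=90):
    """Share of stays followed by another admission within window_days.

    stays: iterable of (patient_id, admit_date, discharge_date) tuples.
    This layout is an assumption made for illustration.
    """
    # Group the stays by patient so each patient can be followed over time.
    by_patient = defaultdict(list)
    for patient_id, admit, discharge in stays:
        by_patient[patient_id].append((admit, discharge))

    index_stays = 0
    readmitted = 0
    for visits in by_patient.values():
        visits.sort()  # chronological order within a patient
        for i, (_, discharge) in enumerate(visits):
            index_stays += 1
            # Only the next admission for the same patient matters.
            if i + 1 < len(visits):
                gap = (visits[i + 1][0] - discharge).days
                if 0 <= gap <= window_days:
                    readmitted += 1
    return readmitted / index_stays if index_stays else float("nan")


# Made-up example data, not taken from any real database.
stays = [
    ("A", date(2020, 1, 1), date(2020, 1, 5)),
    ("A", date(2020, 2, 1), date(2020, 2, 3)),   # 27 days after discharge
    ("B", date(2020, 1, 10), date(2020, 1, 12)),
    ("C", date(2020, 3, 1), date(2020, 3, 4)),
    ("C", date(2020, 9, 1), date(2020, 9, 2)),   # 181 days after discharge
]
rate = readmission_rate(stays)
```

The key step is sorting within each patient before comparing dates, which is what makes the calculation longitudinal rather than row by row.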

Recommended: Interpretation of Changes in Health-related Quality of Life: The Remarkable Universality of Half a Standard Deviation

I’ve typically mocked the use of effect sizes in research, but perhaps I need to be a bit more open minded. This paper looked at the “minimally important difference” (note: not quite the same thing as the minimal clinically important difference) across 33 published studies of health-related quality of life measures. Even though the structure of many of these measures was radically different, the minimally important difference was almost always close to 0.5 standard deviations. The authors draw an analogy to measurement on a seven-point scale, where one unit is understood from previous psychological research to represent (roughly) the limit of human discrimination. Continue reading →

PMean: Some simple examples of single imputation

I wanted to use R to show some simple approaches to imputing missing values. These approaches are difficult to support because they require that you make some questionable and unverifiable assumptions about your data. They still may prove useful as a sensitivity check or as a springboard into more complex approaches for imputing missing values. I have a link to the code that generated most of these results. Continue reading →
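As a rough illustration of what single imputation looks like, here is a small Python sketch of two common approaches: mean imputation and last observation carried forward. It is not the R code linked from the post. The function names and the example values are made up, and missing values are represented by `None`.

```python
from statistics import mean


def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_locf(values):
    """Last observation carried forward: reuse the most recent observed value.

    A missing value with no earlier observation stays missing.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Made-up repeated measurements with two gaps.
weights = [70.0, None, 74.0, None, 78.0]
mean_filled = impute_mean(weights)
locf_filled = impute_locf(weights)
```

Both functions fill each gap with a single value, which understates the uncertainty about what the missing value really was. That is one reason these approaches are better suited to a sensitivity check than to a primary analysis.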

PMean: Using version control through git, github, and R Studio

I’m definitely “old school” when it comes to programming, but there comes a time when even this old dog needs to learn a new trick. I decided yesterday to use version control for my own R programs. Nothing for clients, mind you, because of confidentiality concerns, but the R code that I use to develop teaching examples is certainly fair game. I’m not totally clueless on version control because of my work for the Greater Plains Collaborative, but it’s a different thing to do it totally by yourself. Here’s a brief outline of what I needed to do to get version control up and running. Continue reading →

PMean: Some open source Kaplan Meier curves

I’m giving a talk on the Kaplan-Meier survival curve and wanted to show and interpret a few real examples from the open source literature. Continue reading →

Recommended: Reporting and methodological quality of sample size calculations in cluster randomized trials could be improved: a review

The sample size justification for a cluster randomized trial is messy. It requires the use of an intra-class correlation or something similar (the authors use the term within-cluster correlation). In a review of 300 cluster randomized trials, the authors found that only about a third of the trials specified the within-cluster correlation. Even fewer compared this to the within-cluster correlation observed in the data. We need to do better. Continue reading →

Recommended: The number of subjects per variable required in linear regression analyses

There are several rules of thumb out there about how many subjects you need for a multiple linear regression model. Most of these rules look at the ratio of subjects per variable (SPV). If you have 100 subjects and 20 independent variables in your regression model, then the SPV is 5. This article comes to the surprising conclusion that an SPV of 2 is just fine. In other words, you could have 40 subjects and 20 independent variables and still be okay. This is independent of power considerations, by the way, but it still seems rather small to me. Read the paper yourself and let me know what you think. Continue reading →
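The SPV arithmetic is simple enough to check in a couple of lines. This Python sketch uses hypothetical function names and only reproduces the two calculations mentioned above. It says nothing about power.

```python
import math


def subjects_per_variable(n_subjects, n_variables):
    """SPV: the number of subjects divided by the number of independent variables."""
    return n_subjects / n_variables


def min_subjects(n_variables, spv):
    """Smallest whole number of subjects that reaches a target SPV."""
    return math.ceil(n_variables * spv)


spv_example = subjects_per_variable(100, 20)  # 100 subjects, 20 variables
n_needed = min_subjects(20, 2)                # 20 variables at an SPV of 2
```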

PMean: Is my odds ratio zero or infinity?

Dear Professor Mean, I know you told me that when one of the row probabilities in a two by two table is 0% or 100%, the odds ratio is either 0 or infinity. But how do I tell which? Continue reading →
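A small Python sketch can show which way the odds ratio goes. It assumes one particular layout, which may differ from the one used in the full post: the rows are the two groups, the first column counts events, the second column counts non-events, and the odds ratio is the odds in row 1 divided by the odds in row 2. Swapping the rows or the columns flips 0 and infinity.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]], computed as (a*d) / (b*c).

    Assumed layout: a and c count events, b and d count non-events.
    """
    numerator = a * d
    denominator = b * c
    if numerator == 0 and denominator == 0:
        return math.nan  # 0/0: the odds ratio is undefined
    if denominator == 0:
        return math.inf  # row 1 is 100% events, or row 2 is 0% events
    return numerator / denominator  # equals 0 when the numerator is 0


# Made-up counts for illustration.
or_row1_zero_percent = odds_ratio(0, 10, 5, 5)   # row 1 has 0% events
or_row1_all_events = odds_ratio(10, 0, 5, 5)     # row 1 has 100% events
```

Under this layout, a zero in the numerator product `a*d` gives an odds ratio of 0, and a zero in the denominator product `b*c` gives infinity. If both products are zero, the odds ratio is undefined.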

PMean: The biggest statistics papers of all time

I’m giving a short talk about the Kaplan-Meier curve and found out an interesting fact about the 1958 paper by Edward Kaplan and Paul Meier that introduced this curve. It is the 11th most cited research paper of all time. There’s a nice graphic in a Nature paper that allows you to review the top 100 most cited papers of all time. There are a few other statistics papers on this list as well. Continue reading →

PMean: The perils of shortening a survey

Dear Professor Mean, I’m trying to publish a research study that involves some survey data, but the peer reviewer is complaining about something I did. I used a scale that had five items, but because the survey was already very long, I used only three of the five items. The peer reviewer seems to think that I arbitrarily chose these three items after looking at the data. How should I respond? Continue reading →