The sample size justification for a cluster randomized trial is messy. It requires an intra-class correlation or something similar (the authors use the term within-cluster correlation). In a review of 300 cluster randomized trials, the authors found that only about a third of the trials specified the within-cluster correlation. Even fewer compared this to the within-cluster correlation observed in the data. We need to do better.
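To see why the within-cluster correlation matters so much, here is a minimal Python sketch using the standard design effect, 1 + (m - 1) × ICC, for clusters of equal size m. This is a textbook approximation, not necessarily the calculation used in the paper. The function names and the inputs are made up for illustration.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def inflated_sample_size(n_individual: int, cluster_size: float, icc: float) -> int:
    """Scale a sample size computed for individual randomization
    by the design effect, rounding up to a whole subject."""
    return math.ceil(n_individual * design_effect(cluster_size, icc))


if __name__ == "__main__":
    # Hypothetical inputs: 200 subjects would suffice under individual
    # randomization, clusters hold 20 subjects, and the assumed ICC is 0.05.
    print(design_effect(20, 0.05))
    print(inflated_sample_size(200, 20, 0.05))
```

Even a small within-cluster correlation can inflate the required sample size substantially when clusters are large, which is why leaving it unspecified is a problem.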

# Monthly Archives: March 2016

# Recommended: The number of subjects per variable required in linear regression analyses

There are several rules of thumb out there about how many subjects you need for a multiple linear regression model. Most of these rules look at the ratio of subjects per variable (SPV). If you have 100 subjects and 20 independent variables in your regression model, then the SPV is 5. This article comes to the surprising conclusion that an SPV of 2 is just fine. In other words, you could have 40 subjects and 20 independent variables and still be okay. This is independent of power considerations, by the way, but it still seems rather small to me. Read the paper yourself and let me know what you think.
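The arithmetic behind SPV is simple enough to sketch in a few lines of Python. The function names here are hypothetical, chosen only for this illustration.

```python
import math


def subjects_per_variable(n_subjects: int, n_variables: int) -> float:
    """SPV: number of subjects divided by number of independent variables."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_subjects / n_variables


def min_subjects(n_variables: int, spv: float) -> int:
    """Smallest number of subjects that reaches a target SPV."""
    return math.ceil(n_variables * spv)


if __name__ == "__main__":
    print(subjects_per_variable(100, 20))  # the example above: SPV of 5
    print(min_subjects(20, 2))             # an SPV of 2 with 20 variables: 40 subjects
```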

# PMean: Is my odds ratio zero or infinity?

*Dear Professor Mean, I know you told me that when one of the row probabilities in a two by two table is 0% or 100%, the odds ratio is either 0 or infinity. But how do I tell which?*
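One way to keep the cases straight is to work from the cross-product form of the odds ratio. The Python sketch below assumes a particular layout: each row is a group, the first column counts events, the second counts non-events, and the odds for row 1 go in the numerator. With a different layout, the 0 and infinity cases swap. The function name is hypothetical.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the table [[a, b], [c, d]].

    Assumed layout: a, b are events and non-events in row 1;
    c, d are events and non-events in row 2.
    OR = (a / b) / (c / d) = (a * d) / (b * c).
    """
    numerator = a * d
    denominator = b * c
    if numerator == 0 and denominator == 0:
        return math.nan   # both sides degenerate: the ratio is undefined
    if denominator == 0:
        return math.inf   # row 1 is 100% events, or row 2 is 0% events
    return numerator / denominator  # equals 0.0 when a == 0 or d == 0


if __name__ == "__main__":
    print(odds_ratio(0, 10, 5, 5))   # row 1 probability is 0%
    print(odds_ratio(10, 0, 5, 5))   # row 1 probability is 100%
    print(odds_ratio(5, 5, 0, 10))   # row 2 probability is 0%
    print(odds_ratio(5, 5, 10, 0))   # row 2 probability is 100%
```

Under this layout, a 0% row on top or a 100% row on the bottom gives an odds ratio of 0, and the reverse gives infinity.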

# PMean: The biggest statistics papers of all time

I’m giving a short talk about the Kaplan-Meier curve and found out an interesting fact about the 1958 paper by Edward Kaplan and Paul Meier that introduced this curve. It represents the 11th most cited research paper of all time. There’s a nice graphic in a Nature paper that allows you to review the top 100 most cited papers of all time. There are a few other statistics papers on this list as well.

# PMean: The perils of shortening a survey

Dear Professor Mean, I’m trying to publish a research study that involves some survey data, but the peer reviewer is complaining about something I did. There was a scale that I used that had five items, but because the survey was already very long, I used only three of the five items. The peer reviewer seems to think that I arbitrarily chose these three items after looking at the data. How should I respond?

# PMean: Do we really need to teach all this math stuff?

I got tagged in a Facebook post about an article that criticizes the emphasis on math in high school and proposes replacing some of the more theory-based courses like Algebra II and Calculus with “a practical course in statistics for citizenship”. It’s an interesting article, and although it made some good points, I had to disagree with the overall premise. Here’s what I said.

# PMean: My current work at the Greater Plains Collaborative

I’m spending a fair amount of time over the next few months working with Russ Waitman and the Greater Plains Collaborative (GPC). It’s an interesting job so far, and one of the things that I find quite appealing about the job is the openness that permeates all of their work.

# Recommended: PLOS ONE 2015 Reviewer Thank You

I reviewed a paper for PLOS ONE in 2014 and got a nice acknowledgment, but I also reviewed a paper for the same journal in 2015. Here’s the acknowledgment for that contribution. They’re still having a bit of trouble with alphabetization (Steve Simon should be the last “Simon” on the list, but it’s not). Still, it’s nice to have a public record of my small contribution.

# PMean: The data structure in i2b2

I’m working with the Greater Plains Collaborative on a research project and my work requires me to understand the underlying data structure of a system known as i2b2. It’s not a difficult data structure, but it is uncommon, so it is worthwhile to document what is going on.