I’m giving a talk for the Kansas City R Users Group on how to get a preliminary impression of relationships between pairs of variables. Here is the R code and output that I will use. Continue reading
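The talk's actual code is behind the link, but a minimal sketch of this kind of pairwise exploration, using only base R and the built-in mtcars data (my own example, not the talk's material), might look like this:

```r
# Quick first impressions of pairwise relationships (a sketch using
# base R and the built-in mtcars data; not the talk's actual code).
d <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Scatterplot matrix: one panel per pair of variables
pairs(d)

# Correlation matrix, rounded for a quick numerical summary
round(cor(d), 2)
```

A scatterplot matrix plus a correlation matrix is usually enough for a first pass; anything that looks curved or clustered in the plots deserves a closer look before trusting the correlations.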
Author Archives: pmean
PMean: Acceptable response rates
Dear Professor Mean, I review a lot of observational studies in the literature, and I am concerned about the response rates and when they fall so low that they tend to produce problems with selection bias. I’ve heard that anything lower than 80% is a problem. Is that correct? Continue reading
PMean: Nonparametric tests for multifactor designs
Dear Professor Mean, I want to run nonparametric tests like the Kruskal-Wallis test and the Friedman test in a setting where there may be more than one factor. Everything I’ve seen for these two tests only works for a single factor. Is there any extension of these tests that I could use when I suspect that my data are not normally distributed? Continue reading
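One common workaround (my suggestion here, not necessarily the answer given in the post) is the rank-transform approach of Conover and Iman: rank the response, then fit an ordinary factorial ANOVA on the ranks. A base-R sketch using the built-in warpbreaks data, which has two factors:

```r
# Rank-transform approach for a two-factor design (a sketch, not the
# post's actual answer): rank the response, then run a standard
# two-way ANOVA on the ranks (Conover-Iman rank transform).
warp <- warpbreaks
warp$r <- rank(warp$breaks)

fit <- aov(r ~ wool * tension, data = warp)
summary(fit)
```

Note that the simple rank transform is known to behave poorly for interaction tests; when interactions matter, the aligned rank transform (implemented, for example, in the ARTool package) is usually preferred.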
PMean: Forget confounding, and think of things in terms of covariate imbalance
Someone mentioned in passing in an email that they found the term “confounding” difficult and confusing. I’ve been doing this stuff for over thirty years, and to be quite honest, I get a little nervous about it as well. But I took the time to explain a simpler concept, “covariate imbalance.” Continue reading
Recommended: Rich Data, Poor Data
Nate Silver emphasizes an important point about when statistical models can really shine: when there is a rich source of data and lots of opportunities to test the predictive power of your models. This is why baseball statistics provide such a great platform for teaching modelling techniques. Continue reading
Recommended: Editorial (Basic and Applied Social Psychology)
Recommended does not always mean that I agree with what’s written. In this case, it means that this is something worth reading because it offers an important perspective. This editorial takes the position that all p-values and all confidence intervals are so fatally flawed that they are banned from all future publications in this journal. The editorial goes further and criticizes most Bayesian methods because of problems with the “Laplacian assumption.” The authors have trouble with some of the ambiguities associated with creating a non-informative prior distribution, that is, a prior distribution that represents a “state of ignorance.” They will accept Bayesian analyses on a case-by-case basis. Throwing out most Bayesian analyses, all p-values, and all confidence intervals makes you wonder what they will accept. They suggest larger than typical sample sizes, strong descriptive statistics (which they fail to define), and effect sizes. They believe that “banning the NHSTP will have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP thinking thereby eliminating an important obstacle to creative thinking.” The issue is worth debating, though I think these recommendations are far too extreme. Continue reading
PMean: Missing values in R talk
I’m talking a bit about missing values in R this afternoon for the Kansas City R Users Group. Here is what I’ll be talking about. Continue reading
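The slides themselves are behind the link, but the basics of missing values in R (a sketch of the sort of ground such a talk typically covers, not the actual slides) are easy to show:

```r
# NA basics in R (a sketch, not the talk's actual material).
x <- c(1, 2, NA, 4)

mean(x)                # NA: missing values propagate through arithmetic
mean(x, na.rm = TRUE)  # drop the NA first, then average

is.na(x)               # element-wise test; never test with x == NA
x[!is.na(x)]           # keep only the observed values

# For data frames, complete.cases() flags rows with no missing values
df <- data.frame(a = x, b = c(NA, 2, 3, 4))
df[complete.cases(df), ]
```

The `x == NA` pitfall is worth emphasizing: comparison with NA returns NA, not FALSE, so `is.na()` is the only reliable test.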
Recommended: P-Values
Randall Munroe, author of the xkcd comic strip, often comments on statistics. This cartoon shows how p-values are typically interpreted in practice. Continue reading
Recommended: New R Package: cdcfluview
I work a lot with secondary datasets and I’m always looking for new and interesting resources. There is a CDC site that tracks flu reports and with a bit of effort, you can get the raw data behind these reports. A blogger, hrbrmstr (Bob Rudis, if you dig long enough to find his real name), developed an R package that makes it easy to import this data into R. He illustrates the use of this package with a graph that shows some interesting trend lines across several major cities. Continue reading
Recommended: The answer is 17 years, what is the question: understanding time lags in translational research
A widely quoted statistic is that it takes 17 years for research to find its way from initial discovery to clinical practice. That statistic has always bothered me. How do you know that it takes this long? How could you measure such a thing? Wouldn’t it depend on the type of discovery? Apparently, I’m not the only one bothered by this statistic. The authors of this research paper looked at all the publications that purported to estimate the time lag between discovery and clinical adoption. They found that different authors used different markers for the date of discovery and the date of clinical adoption. Furthermore, the reporting was generally poor, with little discussion of the variation in the estimated time lag. Continue reading