Category Archives: Statistics

Recommended: Restoring invisible and abandoned trials

Too much research data goes unreported, leading to a serious distortion of the evidence base that clinicians need to make intelligent medical decisions. The authors of this paper in BMJ argue that if you can document that a study has been abandoned before publication, and if you formally request that the researchers publish the data, and if they fail to act within a certain amount of time, then the data should be considered public access so that you or anyone else could publish those results. It’s an interesting proposal and one that will generate a lot of controversy. Continue reading →

PMean: Equations using MathType

I’m ordinarily not a big fan of commercial software, but one product that I would have a hard time living without is MathType. It produces mathematical equations with ease and the appearance is almost always perfect. It’s hard to do this, especially with equations that have lots of superscripts and subscripts. You get the size or spacing wrong and all of a sudden things look really ugly and it is hard to fix. TeX is a very good product, too, but I have grown so used to MathType that it is really hard to make the switch. I had to upgrade MathType recently to version 6.9 and I wanted to experiment with MathType equations on my blog. Here are some examples. Continue reading →

PMean: Examining relationships in R

I’m giving a talk for the Kansas City R Users Group on how to get a preliminary impression of relationships between pairs of variables. Here is the R code and output that I will use. Continue reading →

PMean: Acceptable response rates

Dear Professor Mean, I review a lot of observational studies in the literature, and I am concerned about the response rates and when they fall so low that they tend to produce problems with selection bias. I’ve heard that anything lower than 80% is a problem. Is that correct? Continue reading →

PMean: Nonparametric tests for multifactor designs

Dear Professor Mean, I want to run nonparametric tests like the Kruskal-Wallis test and the Friedman test for a setting where there may be more than one factor. Everything I’ve seen for these two tests only works for a single factor. Is there any extension of these tests that I could use when I suspect that my data is not normally distributed? Continue reading →

PMean: Forget confounding, and think of things in terms of covariate imbalance

Someone noted in a passing comment in their email that they found the term “confounding” to be difficult and confusing. I’ve been doing this stuff for over thirty years, but to be quite honest, I get a little nervous about this as well. But I took the time to explain a simpler concept, “covariate imbalance.” Continue reading →

PMean: Missing values in R talk

I’m talking a bit about missing values in R this afternoon for the Kansas City R Users Group. Here is what I’ll be talking about. Continue reading →

Recommended: The answer is 17 years, what is the question: understanding time lags in translational research

A widely quoted statistic is that it takes 17 years for research to find its way from the initial discovery to clinical practice. That statistic has always bothered me. How do you know that it takes this long? How could you measure such a thing? Wouldn’t it depend on the type of discovery? Apparently, I’m not the only one bothered by this statistic. The authors of this research paper looked at all the publications that purported to estimate the time lag between discovery and clinical adoption. They found that different authors used different markers for the date of discovery and the date of clinical adoption. Furthermore, reporting is poor, with little discussion of the variation in the estimated time lag. Continue reading →

PMean: Analyzing ordinal salary categories

Dear Professor Mean,

I have three variables: physicians (%), dentists (%), and salary categories. I want to know if there is a difference in the percentage of physicians and dentists in each salary category. What test do I need to use? ANOVA is not appropriate because the outcome is not continuous. Continue reading →

Recommended: Report on Survey Participation Refusals

This page has moved to a new website.