Monthly Archives: May 2017

PMean: It only looks like a blank

This page is moving to a new website.

I was having trouble with leading and trailing blanks in an R program. There were some strings that looked like “ Y” and “Y ”, and it’s easy enough to fix these, but one of the “Y ” values was not converting properly. The second character wasn’t a blank, but it looked like one. Here’s what I had to do. Continue reading →
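
The post does not say which character was masquerading as a blank. A non-breaking space (U+00A0), which commonly comes along with text copied from web pages, is a frequent culprit and is assumed in this sketch. It is written in Python rather than the R of the original program, and the helper names `show_code_points` and `clean_blank` are hypothetical. Printing each character's code point exposes the impostor, and replacing it before trimming makes the values compare equal.

```python
import unicodedata

def show_code_points(s):
    """Return each character of s with its Unicode code point and name."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in s]

def clean_blank(s):
    """Replace non-breaking spaces with ordinary spaces, then trim both ends.

    Assumes the look-alike character is U+00A0 (NO-BREAK SPACE); other
    invisible characters would have to be added to the replacement.
    """
    return s.replace("\u00a0", " ").strip(" ")

values = [" Y", "Y ", "Y\u00a0"]  # the last value only *looks* like "Y "

# Trimming ordinary spaces leaves the look-alike value unchanged.
print([v.strip(" ") for v in values])    # ['Y', 'Y', 'Y\xa0']
print(show_code_points("Y\u00a0"))       # [('Y', 'U+0059', 'LATIN CAPITAL LETTER Y'), ('\xa0', 'U+00A0', 'NO-BREAK SPACE')]
print([clean_blank(v) for v in values])  # ['Y', 'Y', 'Y']
```

Python's argument-free `str.strip()` already treats U+00A0 as whitespace, so the sketch calls `strip(" ")` to imitate a trimming routine that removes only ordinary spaces.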

Recommended: This is your machine learning system?

This xkcd cartoon by Randall Munroe is released under a Creative Commons license, so I can hotlink the image directly. But if you go to the source, https://xkcd.com/1838/, be sure to hover over the image for a second punch line.

Recommended: Why R is Bad for You

Arguing about R versus SAS often takes on a religious fervor, so I normally hesitate to recommend articles that trash one package or the other. But this one raises an interesting point which makes it worth reading. Note that “recommended” does not mean that I endorse these conclusions. But rather than bias you with my perception of the issue, just read this on your own. Continue reading →

PMean: A p-value of .000

Dear Professor Mean, I ran a statistical test in SPSS and got a p-value of .000. I re-ran the same data in Microsoft Excel and got a p-value of 3.9433E-9. I know from scientific notation that this is the same as 0.0000000039433. Why are these numbers different? Continue reading →
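
Assuming both programs ran the same test, the two outputs are most likely the same p-value displayed differently. SPSS output typically shows significance values rounded to three decimal places, so anything below .0005 prints as .000, while Excel shows the full value in scientific notation. The Python sketch below reproduces both displays; `spss_style` and `report_p` are hypothetical helpers, and reporting tiny values as "p < 0.001" is a common convention rather than something SPSS does for you.

```python
def spss_style(p, decimals=3):
    """Hypothetical helper: round p to a fixed number of decimals and drop
    the leading zero, mimicking a three-decimal display such as SPSS's."""
    return f"{p:.{decimals}f}".lstrip("0")

def report_p(p, floor=0.001):
    """Hypothetical helper: report tiny p-values as an inequality, not .000."""
    return f"p < {floor}" if p < floor else f"p = {p:.3f}"

p = 3.9433e-9  # Excel's value from the question

print(f"{p:.13f}")    # 0.0000000039433  (same number, fixed notation)
print(spss_style(p))  # .000             (rounded to three decimals)
print(report_p(p))    # p < 0.001
```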

Recommended: One in Five Clinical Trials for Adults with Cancer Never Finish

This is a research summary of a study which found that one out of every five cancer trials “did not finish,” which, if I am reading between the lines properly, means that they stopped early for futility. Of those studies, 40% stopped early because of poor accrual. Continue reading →
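
Taken together, the two figures give the share of all trials that ended early for poor accrual: 40% of one fifth is 8%, or roughly 80 of every 1,000 trials. A quick check with exact fractions (the 1,000 is an illustrative round number, not a figure from the study):

```python
from fractions import Fraction

not_finished = Fraction(1, 5)     # one in five trials did not finish
poor_accrual = Fraction(40, 100)  # 40% of those stopped for poor accrual

# Share of ALL trials that stopped early because of poor accrual.
share_of_all = not_finished * poor_accrual
print(share_of_all, float(share_of_all))  # 2/25 0.08

# Scaled to a hypothetical 1,000 trials.
trials = 1000
print(trials * not_finished, trials * share_of_all)  # 200 80
```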

PMean: Extremely imbalanced multi-center trials

There was some recent discussion of issues with multi-center trials where one center dominates, contributing as much as 94% of all the patients. What does this do to the generalizability of the study? I wanted to summarize these comments here, because they relate to some of the issues I’m looking at right now in accrual models for multi-center trials. Continue reading →
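
One way to put a number on how lopsided such a trial is (borrowed here as an assumption, not taken from the discussion) is the inverse Simpson index: one divided by the sum of the squared enrollment shares, read as an "effective number of centers." With hypothetical counts where one center enrolls 94 of 100 patients and three others enroll 2 each, a nominally four-center trial behaves almost like a single-center one. `center_summary` is a hypothetical helper.

```python
def center_summary(counts):
    """Summarize how concentrated enrollment is across centers.

    Returns the largest center's share of patients and an "effective number
    of centers" (inverse Simpson index: 1 / sum of squared shares).
    """
    total = sum(counts)
    shares = [n / total for n in counts]
    largest = max(shares)
    effective = 1 / sum(s * s for s in shares)
    return largest, effective

# Hypothetical enrollment: one center contributes 94 of 100 patients.
largest, effective = center_summary([94, 2, 2, 2])
print(f"largest share = {largest:.0%}, effective centers = {effective:.2f}")
# largest share = 94%, effective centers = 1.13
```

For comparison, four centers with equal enrollment give an effective number of exactly 4.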

Recommended: The numbers for the Science March…

I normally don’t recommend other people’s tweets, but these two, from Siobhan Tompson, were too funny to pass up. Continue reading →

PMean: Getting out of the free consulting trap

Someone on the Statistical Consulting Section message board asked a question about how to handle a situation where a colleague was repeatedly asking for advice. How do you make a transition from offering free advice to getting paid as a consultant? There were lots of good answers, and here’s the suggestion that I offered. Continue reading →

Recommended: ROSE: A package for binary imbalanced learning

Logistic regression and other statistical methods for predicting a binary outcome run into problems when the outcome being predicted is very rare, even in data sets big enough to ensure that the rare outcome occurs hundreds or thousands of times. The problem is that attempts to optimize the model across all of the data will end up predominantly optimizing the fit for the negative cases, and could easily ignore and misclassify all or almost all of the positive cases, since they constitute such a small percentage of the data. The ROSE package generates artificial balanced samples to allow for better estimation and better evaluation of the accuracy of the model. Continue reading →
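
The sketch below illustrates both halves of the argument using only Python's standard library, on hypothetical data with 1% positives. First, a "model" that always predicts the negative class scores 99% accuracy while catching none of the positive cases. Second, `smoothed_bootstrap` (a hypothetical name) is a simplified version of the smoothed-bootstrap idea that ROSE is built on: pick a class with probability one half, resample a row from that class, and jitter it with Gaussian noise. It is not the ROSE package's code or API (ROSE itself is an R package), and the fixed noise bandwidth `h` is an assumption.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of true positive cases that the predictions catch."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / sum(y_true)

def smoothed_bootstrap(X, y, n, h=0.1, p_pos=0.5, seed=0):
    """Hypothetical, simplified sketch of a ROSE-style smoothed bootstrap.

    For each new row: choose class 1 with probability p_pos (else class 0),
    resample an existing row of that class, and add Gaussian noise with
    standard deviation h to every feature. Not the ROSE package's code.
    """
    rng = random.Random(seed)
    by_class = {1: [x for x, lab in zip(X, y) if lab == 1],
                0: [x for x, lab in zip(X, y) if lab == 0]}
    X_new, y_new = [], []
    for _ in range(n):
        label = 1 if rng.random() < p_pos else 0
        base = rng.choice(by_class[label])
        X_new.append([v + rng.gauss(0.0, h) for v in base])
        y_new.append(label)
    return X_new, y_new

# Hypothetical data: 1,000 rows, 10 positives (1%), one numeric feature.
data_rng = random.Random(42)
y = [1] * 10 + [0] * 990
X = [[data_rng.gauss(2.0 if lab == 1 else 0.0, 1.0)] for lab in y]

# Always predicting "negative" looks excellent on accuracy alone.
always_negative = [0] * len(y)
print(accuracy(y, always_negative))  # 0.99
print(recall(y, always_negative))    # 0.0

# Balanced artificial sample: about half the new rows are positive.
X_bal, y_bal = smoothed_bootstrap(X, y, n=1000)
print(sum(y_bal) / len(y_bal))
```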

PMean: How big is the stuff I’m working on

I have been working part-time on a project for the Great Plains Collaborative (GPC) under the direction of Russ Waitman and the gentle guidance of Dan Connolly, both at Kansas University Medical Center. I am hoping to submit a paper soon on the work I’ve done, but if you are curious about the size and scope of the electronic health records that I’ve been slinging around, this blog entry might help. Continue reading →