Monthly Archives: May 2017

Recommended: Why R is Bad for You

This page is moving to a new website.

Arguing about R versus SAS often takes on a religious fervor, so I normally hesitate to recommend articles that trash one package or the other. But this one raises an interesting point that makes it worth reading. Note that “recommended” does not mean that I endorse these conclusions. But rather than bias you with my perception of the issue, just read this on your own.

PMean: Extremely imbalanced multi-center trials

There was some recent discussion of multi-center trials where one center dominates, contributing as much as 94% of all the patients. What does this do to the generalizability of the study? I wanted to summarize these comments here because they relate to some of the issues I’m looking at right now in accrual models for multi-center trials.

PMean: Getting out of the free consulting trap

Someone on the Statistical Consulting Section message board asked a question about how to handle a situation where a colleague was repeatedly asking for advice. How do you make a transition from offering free advice to getting paid as a consultant? There were lots of good answers, and here’s the suggestion that I offered.

Recommended: ROSE: A package for binary imbalanced learning

Logistic regression and other statistical methods for predicting a binary outcome run into problems when the outcome is very rare, even in data sets big enough to ensure that the rare outcome occurs hundreds or thousands of times. The problem is that attempts to optimize the model across all of the data end up predominantly optimizing for the negative cases, and could easily ignore and misclassify all or almost all of the positive cases, since they constitute such a small percentage of the data. The ROSE package generates artificial balanced samples to allow for better estimation and better evaluation of the accuracy of the model.
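
To make the imbalance problem concrete, here is a minimal Python sketch using only the standard library. It is not the ROSE package and does not reproduce its method; it uses plain random oversampling of the minority class as a simpler stand-in for the idea of a balanced sample. The data, the 1% event rate, and the function names are all made up for illustration.

```python
import random


def accuracy_and_recall(labels, predictions):
    """Return (accuracy, recall on the positive class) for 0/1 labels."""
    pairs = list(zip(labels, predictions))
    correct = sum(1 for y, p in pairs if y == p)
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    positives = sum(labels)
    return correct / len(labels), true_pos / positives


def oversample_minority(rows, rng):
    """Resample the smaller class with replacement until both classes
    are the same size. rows is a list of (features, label) tuples."""
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 10,000 cases, of which 100 (1%) are positive.
labels = [1] * 100 + [0] * 9900

# A "model" that ignores the data and always predicts the negative class.
always_negative = [0] * len(labels)
acc, rec = accuracy_and_recall(labels, always_negative)
print(f"accuracy = {acc:.2f}, recall = {rec:.2f}")

# Attach a made-up single feature to each case, then balance the classes.
rows = [((rng.random(),), y) for y in labels]
balanced = oversample_minority(rows, rng)
n_pos = sum(1 for _, y in balanced if y == 1)
n_neg = len(balanced) - n_pos
print(f"balanced sample: {n_pos} positive, {n_neg} negative")
```

The always-negative rule scores 99% accuracy while finding none of the positive cases, which is why overall accuracy is a misleading yardstick for rare outcomes. After oversampling, both classes are the same size, so a model fit to the balanced sample can no longer do well by ignoring the positives.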

PMean: How big is the stuff I’m working on

I have been working part-time on a project for the Great Plains Collaborative (GPC) under the direction of Russ Waitman and the gentle guidance of Dan Connolly, both at Kansas University Medical Center. I am hoping to submit a paper soon on the work I’ve done, but if you are curious about the size and scope of the electronic health records that I’ve been slinging around, this blog entry might help.