Tag Archives: Logistic regression

Recommended: A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models

This is one of those articles where you have to restrain yourself. Its message, that good old statistical tools like logistic regression can perform as well as the newfangled machine learning approaches you haven’t taken the time to learn, is quite tempting. But I’d be cautious here. Maybe logistic regression is still competitive, but maybe the systematic review pulled in a bunch of biased studies. It’s worth citing whenever someone makes an overly strong claim about machine learning models, but don’t use it as an excuse to keep from learning the new stuff yourself. This article is stuck behind a paywall. Sorry! Continue reading

Recommended: ROSE: A package for binary imbalanced learning

Logistic regression and other statistical methods for predicting a binary outcome run into problems when the outcome being tested is very rare, even in data sets big enough to ensure that the rare outcome occurs hundreds or thousands of times. The problem is that attempts to optimize the model across all of the data end up focusing predominantly on the negative cases, and can easily ignore or misclassify all or almost all of the positive cases, since they constitute such a small percentage of the data. The ROSE package generates artificial balanced samples to allow for better estimation and better evaluation of the accuracy of the model. Continue reading
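ROSE itself is an R package, but the underlying idea, drawing a sample that is balanced across classes and jittering each resampled point with a little noise (a smoothed bootstrap), can be sketched in a few lines of Python. Everything here (the simulated data, the `noise_sd` parameter, the `balanced_sample` helper) is made up for illustration, not ROSE's actual algorithm:

```python
import random

random.seed(1)

# Hypothetical imbalanced data: 990 negatives, only 10 positives.
data = [(random.gauss(0, 1), 0) for _ in range(990)] + \
       [(random.gauss(2, 1), 1) for _ in range(10)]

def balanced_sample(data, n, noise_sd=0.1, rng=random):
    """Draw n points, half from each class, adding Gaussian noise
    to each resampled predictor (a crude smoothed bootstrap)."""
    neg = [d for d in data if d[1] == 0]
    pos = [d for d in data if d[1] == 1]
    out = []
    for cls, half in ((neg, n // 2), (pos, n - n // 2)):
        for _ in range(half):
            x, y = rng.choice(cls)
            out.append((x + rng.gauss(0, noise_sd), y))
    rng.shuffle(out)
    return out

sample = balanced_sample(data, 1000)
print(sum(y for _, y in sample))  # 500: half the new sample is positive
```

A model fit to `sample` can no longer drive its loss down by simply predicting the majority class, which is the failure mode described above.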

PMean: Nonparametric tests for multifactor designs

Dear Professor Mean, I want to run nonparametric tests like the Kruskal-Wallis test and the Friedman test in a setting where there may be more than one factor. Everything I’ve seen for these two tests only works for a single factor. Is there any extension of these tests that I could use when I suspect that my data is not normally distributed? Continue reading
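For reference, the single-factor versions the question mentions are available in scipy.stats; here is a quick sketch on made-up measurements (the three groups below are invented purely for illustration):

```python
from scipy.stats import kruskal, friedmanchisquare

# Made-up measurements for three levels of a single factor.
g1 = [2.9, 3.0, 2.5, 2.6, 3.2]
g2 = [3.8, 2.7, 4.0, 2.4, 3.1]
g3 = [2.8, 3.4, 3.7, 2.2, 2.0]

# Kruskal-Wallis: independent groups.
h, p_kw = kruskal(g1, g2, g3)

# Friedman: the same subjects measured under each condition.
chi2, p_fr = friedmanchisquare(g1, g2, g3)
print(p_kw, p_fr)
```

As the question notes, neither function accepts a second factor, which is exactly the limitation Professor Mean's answer addresses.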