Category Archives: Recommended

Recommended: ProbOnto

If you work with probability distributions a lot, you will find that there are multiple parameterizations (e.g., the rate and scale forms of the exponential distribution), as well as interesting relationships among distributions (the geometric distribution is a discrete analogue of the exponential distribution). I have found Wikipedia to be a helpful guide for some of this, but its coverage is uneven in quality. One of the Wikipedia pages mentioned a new website, ProbOnto, that offers a systematic, standardized attempt to catalog every important probability distribution and the relationships among these distributions.
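To make the two examples in that paragraph concrete, here is a minimal Python sketch. It is not taken from ProbOnto or Wikipedia, and the function names are hypothetical. It checks that the rate form, f(x) = λ·exp(−λx), and the scale form, f(x) = exp(−x/θ)/θ with θ = 1/λ, describe the same density, and that rounding an exponential variable down to an integer yields a geometric distribution with p = 1 − exp(−λ), counting failures before the first success.

```python
import math

def exp_pdf_rate(x, rate):
    """Exponential density, rate parameterization: rate * exp(-rate * x)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def exp_pdf_scale(x, scale):
    """Exponential density, scale (mean) parameterization: exp(-x / scale) / scale."""
    return math.exp(-x / scale) / scale if x >= 0 else 0.0

def exp_cdf(x, rate):
    """P(X <= x) for X ~ Exponential(rate)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def geometric_pmf(k, p):
    """P(K = k) for k = 0, 1, 2, ... (failures before the first success)."""
    return (1.0 - p) ** k * p

rate = 0.5

# Same distribution, two parameterizations: scale = 1 / rate.
for x in (0.0, 1.0, 2.5):
    assert math.isclose(exp_pdf_rate(x, rate), exp_pdf_scale(x, 1.0 / rate))

# Discretizing: if X ~ Exponential(rate), then floor(X) ~ Geometric(p = 1 - exp(-rate)),
# because P(k <= X < k + 1) = exp(-rate * k) * (1 - exp(-rate)).
p = 1.0 - math.exp(-rate)
for k in range(5):
    prob_floor_is_k = exp_cdf(k + 1, rate) - exp_cdf(k, rate)
    assert math.isclose(prob_floor_is_k, geometric_pmf(k, p))
```

Keeping track of which parameterization a formula uses (rate λ versus scale θ = 1/λ) is exactly the kind of bookkeeping a systematic catalog helps with.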

Recommended: Why R is Bad for You

Arguing about R versus SAS often takes on a religious fervor, so I normally hesitate to recommend articles that trash one package or the other. But this one raises an interesting point that makes it worth reading. Note that "recommended" does not mean that I endorse its conclusions. Rather than bias you with my own view of the issue, I will let you read it for yourself.

Recommended: ROSE: A package for binary imbalanced learning

Logistic regression and other statistical methods for predicting a binary outcome run into problems when the outcome is very rare, even in data sets big enough to ensure that the rare outcome occurs hundreds or thousands of times. The problem is that fitting the model across all of the data ends up optimizing predominantly for the negative cases, and can easily misclassify all or almost all of the positive cases, since they constitute such a small percentage of the data. The ROSE package generates artificial balanced samples to allow for better estimation of the model and better evaluation of its accuracy.
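The imbalance problem, and the idea of drawing an artificial balanced sample, can be illustrated with a short Python sketch. This is not the ROSE package (an R package) or its API: the function names, the toy data, and the fixed noise bandwidth `h` are all hypothetical. The resampler is only loosely modeled on the smoothed-bootstrap idea behind ROSE (pick a class with probability 1/2, pick a row from that class, jitter it); ROSE's own kernel and bandwidth choices are not reproduced here.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_smoothed_sample(X, y, n, h=0.1, seed=0):
    """Draw n synthetic rows with roughly equal class counts (hypothetical helper).

    For each draw: choose class 0 or 1 with probability 1/2, pick a random
    row of that class, and add Gaussian noise with standard deviation h
    (a fixed, hypothetical bandwidth) to every feature.
    """
    rng = random.Random(seed)
    by_class = {0: [x for x, c in zip(X, y) if c == 0],
                1: [x for x, c in zip(X, y) if c == 1]}
    X_new, y_new = [], []
    for _ in range(n):
        c = rng.randint(0, 1)  # inclusive on both ends
        row = rng.choice(by_class[c])
        X_new.append([v + rng.gauss(0.0, h) for v in row])
        y_new.append(c)
    return X_new, y_new

# Toy data: 990 negatives, 10 positives (1% prevalence), one feature each.
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0)] for _ in range(990)] + [[rng.gauss(2.0, 1.0)] for _ in range(10)]
y = [0] * 990 + [1] * 10

# A "model" that always predicts the majority class scores 99% accuracy
# while misclassifying every positive case.
print(accuracy(y, [0] * len(y)))  # 0.99

# The synthetic sample has roughly as many positives as negatives.
X_bal, y_bal = balanced_smoothed_sample(X, y, n=1000)
print(sum(y_bal) / len(y_bal))
```

The first printed value shows why overall accuracy is a poor yardstick on rare outcomes; fitting on a balanced sample forces the model to pay attention to the positive cases, which is the motivation the paragraph gives for ROSE.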

Recommended: Proving the null hypothesis in clinical trials

I’m attending a great short course on non-inferiority trials, and the speaker provided a key reference of historical interest: the paper that got the statistics community interested in the concept of non-inferiority. The full text is behind a paywall, but you can look at the abstract. As a footnote, an earlier paper, Dunnett and Gent (1977), also behind a paywall, addressed this problem before that.