Category Archives: Recommended

Recommended: Tibbles (Tibbles are a modern take on data frames)

I’m an old dog R programmer who tends to rely on features of R that were available 10 years ago (an eternity for computers). But it’s time for this old dog to learn new tricks. One thing I need to start using in my R programs is the “tibble” (sometimes called a “tidy tibble”). It’s a minor but important improvement on the data frame, and many of the newer packages use tibbles instead of data frames. Tibbles are available in the tibble package. This web page offers a nice description of the improvements that tibbles make over data frames.

Recommended: dplyr and pipes: the basics

One of the recent developments in R that I was unaware of until I attended some talks at the Joint Statistical Meetings is the use of dplyr and pipes. It’s an approach to data management that isn’t fundamentally different from earlier approaches, but the resulting code is much easier to read and maintain. This blog post explains in simple terms how dplyr and pipes work and why you would use them.

Recommended: Institute for Digital Research and Education — Statistical Computing

This is a wonderful site, but for some reason, it is difficult to find. The Institute for Digital Research and Education (IDRE) at UCLA has put together some excellent resources on how to do simple data analyses in R, SAS, SPSS, and Stata. The examples cover just about everything you’d ever want to do in any of these statistical packages. If you are making a transition from one statistical package to another, this site lets you see how things are done in the package you know well and compare that to how they are done in the package you are learning. Of special note are the worked examples from many classic statistics textbooks.

Recommended: Bayesian computing with INLA

This page promotes a new approach to fitting a broad class of models (spatio-temporal models, latent variable models, mixed models) using a fast approximation to the Bayesian solution. It runs under R and appears to handle very large datasets. I have not had a chance to try this, but it looks very interesting.

Recommended: The number of subjects per variable required in linear regression analyses

There are several rules of thumb out there about how many subjects you need for a multiple linear regression model. Most of these rules look at the ratio of subjects per variable (SPV). If you have 100 subjects and 20 independent variables in your regression model, then the SPV is 5. This article comes to the surprising conclusion that an SPV of 2 is just fine. In other words, you could have 40 subjects and 20 independent variables and still be okay. This is independent of power considerations, by the way, but it still seems rather small to me. Read the paper yourself and let me know what you think.