I’m an old dog R programmer who tends to rely on features of R that were available 10 years ago (an eternity for computers). But it’s time for this old dog to learn new tricks. One thing I need to use in my R programs is called a “tibble” (sometimes called a “tidy tibble”). It’s a minor but important improvement on data frames, and many of the newer packages use tibbles instead of data frames. Tibbles are available in the tibble package. This web page offers a nice description of how tibbles improve on data frames. Continue reading
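For readers who have not seen a tibble yet, here is a minimal sketch of how one is built and how it behaves. It assumes the tibble package is installed, and the patients data is invented purely for illustration.

```r
library(tibble)

# A small tibble built column by column (made-up values).
patients <- tibble(
  id  = 1:3,
  age = c(34, 51, 47),
  sex = c("F", "M", "F")
)

# tibble() never converts strings to factors
# (data.frame() did so by default before R 4.0).
class(patients$sex)

# Single-bracket column subsetting returns a tibble, not a bare vector.
is_tibble(patients[, "age"])

# An existing data frame can be converted with as_tibble().
cars_tbl <- as_tibble(mtcars)
```

Printing cars_tbl at the console shows only the first few rows along with each column's type, which is one of the conveniences tibbles add over data frames.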
Category Archives: Recommended
Recommended: dplyr and pipes: the basics
One of the recent developments in R that I was unaware of until I attended some talks at the Joint Statistical Meetings was the use of dplyr and pipes. It’s an approach to data management that isn’t fundamentally different from earlier approaches, but the code is much easier to read and maintain. This blog post explains in simple terms how these work and why you would use them. Continue reading
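To give a flavor of the style, here is a small sketch using the built-in mtcars data. It assumes the dplyr package is installed. The cutoff of 100 horsepower and the name by_cyl are arbitrary choices for illustration and do not come from the blog post.

```r
library(dplyr)

# Each step reads top to bottom: filter rows, group, summarise, sort.
by_cyl <- mtcars %>%
  filter(hp > 100) %>%                       # keep cars above 100 horsepower
  group_by(cyl) %>%                          # one group per cylinder count
  summarise(mean_mpg = mean(mpg), n = n()) %>%
  arrange(desc(mean_mpg))                    # best average mileage first

by_cyl
```

The pipe operator %>% passes the result on its left as the first argument of the function on its right, so the code lists the steps in the order they happen instead of nesting them inside one another.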
Recommended: Hadley Wickham, the Man Who Revolutionized R
Hadley Wickham has written many popular R packages, so many that they are sometimes referred to as the “Hadleyverse.” This is a nice biography that emphasizes the impact that Dr. Wickham has had on R. Continue reading
Recommended: 100+ Interesting Data Sets for Statistics
This list starts out with a data set of 216,930 previous Jeopardy questions and goes from there. Not everything suggested is easily amenable to statistical analysis, but the list is extremely interesting and diverse. In particular, this list is very helpful for anyone interested in text data. Continue reading
Recommended: Institute for Digital Research and Education — Statistical Computing
This is a wonderful site, but for some reason, it is difficult to find. The Institute for Digital Research and Education (IDRE) at UCLA has put together some wonderful resources on how to do simple data analyses in R, SAS, SPSS, and Stata. The examples cover just about everything you’d ever want to do in any of these statistical packages. If you are making a transition from one statistical package to another, this site offers you the opportunity to see how things are done in the package you know well and compare it to how things are done in the package you are learning. Of special note are the worked textbook examples from many classic statistics textbooks. Continue reading
Recommended: Handling date-times in R
Dates in R, like dates in any other software package, are tricky to work with. Here’s a nice guide that will help you get started. Continue reading
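As a quick taste of what is involved, here is a sketch that uses only base R. It is not taken from the guide, and the dates are made up.

```r
# Dates in ISO format parse directly; other layouts need a format string.
visit    <- as.Date("2017-03-15")
followup <- as.Date("04/30/2017", format = "%m/%d/%Y")

# Subtracting dates gives a difftime; convert it to a plain number of days.
days_between <- as.numeric(difftime(followup, visit, units = "days"))
days_between   # 46

# Date-times are stored as seconds, so adding 3600 moves forward one hour.
start <- as.POSIXct("2017-03-15 14:30:00", tz = "UTC")
later <- start + 3600
format(later, "%H:%M")   # "15:30"
```

Setting tz explicitly is a defensive habit here. Without it, R falls back on the time zone of the machine running the code, which is a common source of surprises.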
Recommended: Ten Simple Rules for Effective Statistical Practice
This article has good general advice about how to run a statistical analysis, such as Rule 1: Statistical Methods Should Enable Data to Answer Scientific Questions.
Recommended: Bayesian computing with INLA
This page promotes a new approach to a broad class of models (spatio-temporal models, latent variable models, mixed models) using a fast approximation to the Bayesian solution. It runs under R and appears to handle very large datasets. I have not had a chance to try this, but it looks very interesting. Continue reading
Recommended: The number of subjects per variable required in linear regression analyses
There are several rules of thumb out there about how many subjects you need for a multiple linear regression model. Most of these rules look at the ratio of subjects per variable (SPV). If you have 100 subjects and 20 independent variables in your regression model, then the SPV is 5. This article comes to the surprising conclusion that an SPV of 2 is just fine. In other words, you could have 40 subjects and 20 independent variables and still be okay. This is independent of power considerations, by the way, but it still seems rather small to me. Read the paper yourself and let me know what you think. Continue reading
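The arithmetic behind these numbers is simple enough to sketch in R. The function names spv and min_subjects are made up for this illustration and do not come from the paper.

```r
# Subjects per variable: sample size divided by number of independent variables.
spv <- function(n_subjects, n_variables) n_subjects / n_variables

spv(100, 20)   # 5
spv(40, 20)    # 2

# Turned around: the smallest sample size that reaches a target SPV.
min_subjects <- function(n_variables, target_spv) {
  ceiling(n_variables * target_spv)
}

min_subjects(20, 2)   # 40
```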
Recommended: The Survey Statistician
The International Association of Survey Statisticians (IASS) has a twice-yearly newsletter that talks about meetings and events sponsored by the association, informal overview articles about new methodologies, and book reviews. This is the archive page for the current and all previous issues of this newsletter. Continue reading