This page is moving to a new website.
I heard a story a long time ago. I don’t remember who told it to me, and I’m probably getting the details wrong, but I want to recreate it from memory because it illustrates one of the perils of relying blindly on statistical models to identify “important” variables.
I wrote a program in R Markdown that shows how the lasso regression model works. It has too many pictures to port easily to this blog, so I’ll share a link to a PDF file instead. You can also find the R Markdown code in my GitHub repository.
There are several rules of thumb out there about how many subjects you need for a multiple linear regression model. Most of these rules look at the ratio of subjects per variable (SPV). If you have 100 subjects and 20 independent variables in your regression model, then the SPV is 5. This article comes to the surprising conclusion that an SPV of 2 is just fine. In other words, you could have 40 subjects and 20 independent variables and still be okay. This is independent of power considerations, by the way, but it still seems rather small to me. Read the paper yourself and let me know what you think.
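
The SPV arithmetic is simple enough to check in a few lines. Here is a minimal sketch in Python; the function names `subjects_per_variable` and `subjects_needed` are made up for illustration and do not come from the paper. It only reproduces the ratio itself and says nothing about power.

```python
import math


def subjects_per_variable(n_subjects: int, n_variables: int) -> float:
    """Return the subjects-per-variable (SPV) ratio.

    Hypothetical helper for illustration: SPV is the number of subjects
    divided by the number of independent variables in the model.
    """
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_subjects / n_variables


def subjects_needed(n_variables: int, spv: float) -> int:
    """Return the smallest number of subjects that reaches a target SPV.

    Hypothetical helper for illustration: rounds up, because you cannot
    recruit a fraction of a subject.
    """
    if n_variables <= 0 or spv <= 0:
        raise ValueError("n_variables and spv must be positive")
    return math.ceil(n_variables * spv)


# The two examples from the text above.
print(subjects_per_variable(100, 20))  # 5.0
print(subjects_per_variable(40, 20))   # 2.0

# Working backwards: 20 variables at an SPV of 2 needs 40 subjects.
print(subjects_needed(20, 2))          # 40
```

Whether an SPV of 2 is actually enough is the paper's claim, not something this calculation can settle.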