Everybody has different standards for documentation, and if you are already using a standard you like, don’t let me stop you. But if you’ve never used much documentation and decide that you need to do better, here’s a guideline that I developed.
I’ve been teaching a variety of classes that require students to run a statistical analysis in a package like SAS or R and report the results. Students use a tremendous variety of formats, and I thought it would be helpful to offer some guidance. It would save me time in grading, but more importantly it would emphasize that students need to think about what they produce rather than just tossing together whatever comes out of the computer. The five requirements for homework assignments are that they be complete, concise, clear, error-free, and interpretable.
I have been somewhat lax in making my work accessible for people with disabilities. This video covers some of the basic things you can do with a PDF file to ensure that it can be easily read by screen reading software. There are similar videos for Microsoft Word and Microsoft PowerPoint files.
I’m teaching an online workshop for The Analysis Factor on survival analysis. It’s not announced yet, and I have a LOT of work to do before it is ready. One thing that will save me time is that I am taking many of my examples from the excellent textbook, Applied Survival Analysis, Second Edition. One nice perk of this book is that the helpful folks at UCLA have taken every textbook example and written up code (with comments!) to reproduce the book’s results. With the exception of a few advanced methods in later chapters, where only one or two software packages have the right capability, the code is written in parallel in R, SAS, SPSS, and Stata. They also have links to the raw data at the publisher’s website, and datasets stored in SAS format and SPSS format. How nice! Browse around and you’ll find software code for all the examples in other popular statistics textbooks as well.
Warning! The R examples look like they are from the first edition, not the second edition. A small nitpick for an otherwise very nice resource.
I got this recommendation from a friend. IBM has a large number of free resources explaining things like cloud computing and blockchain. I’m most interested in their section on analytics. There’s a nice introduction, for example, to natural language processing.
I attended several talks about R at the Joint Statistical Meetings and noted some interesting packages and other resources during these talks. I lost track of that list until recently, but the items on it are still relevant, so here they are.
This is a series of videos and homework exercises that you can work on at your own pace. I have only viewed the outline for this, but anything from DataCamp comes highly recommended.
This is the GitHub repository of Ben Baumer. He is one of the co-authors of “Modern Data Science with R” and the data and code from that book are available here. He also provides code and data for OpenWAR, an open source method for calculating a baseball statistic, Wins Above Replacement. Finally, there is an R library for extracting, transforming, and loading “medium” sized datasets into SQL. Medium here means multi-gigabyte sized files. Related to this are a couple of “medium” sized data sets from the Internet Movie Database and from the NYC CitiBike dataset.
This paper talks about how to get students to think about large databases in an introductory class that normally uses “toy” problems with a few dozen rows of data.
Through the effort of a team of statisticians with the American Statistical Association, the New York Times is producing a new resource for educators called “What’s Going On in This Graph?”. This is similar to another New York Times effort called “What’s Going On in This Picture?”
Every month the New York Times will publish a graph stripped of some key information and ask three questions: What do you notice? What do you wonder? What do you think is going on in this graph?
The content will be suitable for middle school and high school students, but I suspect that even college students will find the exercise interesting.
The first graph will appear on September 19, and a new graph will follow on the second Tuesday of every month afterwards.