Monthly Archives: November 2017

PMean: January talk at KU

This page is moving to a new website.

Networking is important, and until recently I have failed to build bridges with some of the very smart people working at the University of Kansas in Lawrence. But I will be giving a colloquium talk to a group (the Center for Research Methods and Data Analysis) at KU in January. The talk may end up being for a different but closely related group, but that doesn’t matter. It’s an excuse to get out of the office and meet people. Here’s the tentative title and abstract for my talk and a brief review of some other talks I’ll be giving.

Recommended: Can A.I. be taught to explain itself?

This is a nice article in the popular press about some of the problems with “black box” models (in particular, deep neural nets) that are used extensively in many big data projects. It is a bit light on technical details, which is understandable for a newspaper like the New York Times. Even so, the stories are quite intriguing. This is a wake-up call for those who fail to recognize the serious problems with many big data models.

PMean: Losing track of your transformed variables in R

I got an interesting question from one of my students, and it illustrates a subtle issue that may confuse beginning R programmers. The student was trying to compute the ratio of brain weight to body weight in a small data set, but then was unable to calculate any summary statistics on that ratio. Here’s what caused the problem.

Recommended: beanumber repository

This is the GitHub repository of Ben Baumer. He is one of the co-authors of “Modern Data Science with R”, and the data and code from that book are available here. He also provides code and data for OpenWAR, an open source method for calculating a baseball statistic, Wins Above Replacement. Finally, there is an R package for extracting, transforming, and loading “medium” sized data sets into SQL. Medium here means multi-gigabyte sized files. Related to this are a couple of “medium” sized data sets from the Internet Movie Database and from NYC CitiBike.

Recommended: Writing about numbers

This is a chapter in a classic book, Medical Uses of Statistics. The writer of this particular chapter, Frederick Mosteller, was a giant in statistics. The chapter covers some of the style issues associated with the data that you would normally present in the results section of a research paper. The advice is a bit dated, perhaps, but still well worth reading.