Recommended: Defining Urban and Rural Areas in U.S. Epidemiologic Studies

I’m somewhat new to geocoding. One of the first things you might be interested in, if you have geographic data, is an indicator of whether a certain address, zip code, or county is urban or rural. This is actually quite a complex topic. This paper outlines some of the basic systems for classifying a location as urban, rural, or something in between (e.g., suburban).

Recommended: Practical advice for analysis of large, complex data sets

This is a nice compilation of issues that you should be concerned about. The examples are mostly from things that interest Google, but you will find the advice useful no matter what type of data you work with. The advice is split into three broad categories: technical (e.g., look at your distributions), process (e.g., separate validation, description, and evaluation), and communication (e.g., data analysis starts with questions, not data or a technique).

PMean: About those “awful” election predictions

If you were on Mars for the past few days, you may not have noticed that Donald Trump has won the election. There has been a lot of commentary lately about how bad the predictions about the U.S. election were, and someone mentioned that even Nate Silver at the fivethirtyeight website put the probability of a Clinton win at 71%. I wrote a brief comment that predicting an event with 71% probability does not mean that your prediction was “wrong” if the other event occurs.

PMean: Misunderstanding autism

A friend of mine posted an inspiring story published in the Washington Post. Unfortunately, it did not inspire me, but rather made me worried about how often we misunderstand autism and how much trouble this causes. It’s not statistics, per se, but rather represents an example of how research on new approaches for patients with autism can end up being abusive.

PMean: Measuring pixels in an R graph

I have an R cheat sheet, How Big Is Your Graph, that explains how to measure the size of various features of your graph in R. This blog post illustrates unit conversions. If you want to measure the length of a diagonal line segment in an R graph, you need to calculate the size of the plotting region in pixels, compare that to the range of the plotting region in the x and y directions, and then apply the Pythagorean Theorem.
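
The arithmetic behind that conversion can be sketched in a few lines. This is only an illustration in Python, not the code from the post or the cheat sheet: the function name, its arguments, and the example numbers are all made up here. In R, the axis ranges and the size of the plotting region would presumably come from graphics parameters such as par("usr") and par("pin"), combined with the device resolution.

```python
import math

def segment_length_pixels(x0, y0, x1, y1, x_range, y_range, width_px, height_px):
    """Length in pixels of a line segment given in data (user) coordinates.

    x_range and y_range are (min, max) pairs for the plotting region;
    width_px and height_px are the size of that region in pixels.
    All names here are hypothetical, chosen for this sketch.
    """
    # Pixels per data unit, which can differ between the two directions.
    px_per_x = width_px / (x_range[1] - x_range[0])
    px_per_y = height_px / (y_range[1] - y_range[0])

    # Convert the horizontal and vertical extents of the segment to pixels.
    dx_px = (x1 - x0) * px_per_x
    dy_px = (y1 - y0) * px_per_y

    # Pythagorean Theorem on the pixel extents.
    return math.hypot(dx_px, dy_px)

# A 500 x 500 pixel plotting region with both axes running from 0 to 10:
# each data unit is 50 pixels, so the segment spans 150 by 200 pixels.
length = segment_length_pixels(0, 0, 3, 4, (0, 10), (0, 10), 500, 500)
print(length)  # 250.0
```

The conversion has to happen before the Pythagorean step because the two axes usually have different scales, so a distance computed directly in data units would not correspond to what appears on screen.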

PMean: Independent consulting and the cold call

There’s been some more discussion about getting started as an independent statistical consultant. One person is ready to hang their shingle and proposes to “find a niche I can serve, contact companies in that niche, etc.” but doesn’t know what that niche might be. I had one cautionary comment and then discussed finding your niche.