I’m somewhat new to geocoding. If you have geographic data, one of the first things you might want is an indicator of whether a given address, zip code, or county is urban or rural. This is actually quite a complex topic. This paper outlines some of the basic systems for classifying a location as urban, rural, or something in between (e.g., suburban).

# Recommended: Practical advice for analysis of large, complex data sets

This is a nice compilation of issues that you should be concerned about. The examples come mostly from problems that interest Google, but the advice itself is useful no matter what type of data you work with. It is split into three broad categories: technical (e.g., look at your distributions), process (e.g., separate validation, description, and evaluation), and communication (e.g., data analysis starts with questions, not with data or a technique).

# PMean: About those “awful” election predictions

If you were on Mars for the past few days, you may not have noticed that Donald Trump has won the election. There has been a lot of commentary lately about how bad the predictions of the U.S. election were, and someone mentioned that even Nate Silver at the FiveThirtyEight website had put the probability of a Clinton win at 71%. I wrote a brief comment noting that predicting an event with 71% probability does not mean your prediction was “wrong” if the other outcome occurs.
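A quick way to see the point: if a forecaster's 71% predictions are well calibrated, the less likely outcome should still happen in roughly 29% of such cases. The sketch below simulates many hypothetical, independent events that each have a true 71% chance of going to the favorite, using only Python's standard library. The function name `upset_rate`, the number of events, and the seed are arbitrary choices for illustration and have nothing to do with FiveThirtyEight's actual model.

```python
import random

def upset_rate(p_favorite: float, n_events: int, seed: int = 2016) -> float:
    """Fraction of simulated events in which the favorite loses.

    Each hypothetical event is won by the favorite with probability
    p_favorite, independently of the others.
    """
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so a draw >= p_favorite
    # happens with probability 1 - p_favorite.
    upsets = sum(1 for _ in range(n_events) if rng.random() >= p_favorite)
    return upsets / n_events

rate = upset_rate(0.71, 100_000)
print(f"Favorite lost in {rate:.1%} of simulated events")
```

Being on the wrong side of a single 71% forecast is unsurprising; judging a forecaster fairly means looking at how often the favorites win across many forecasts.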

# PMean: A simple example of pipes in R

At the Joint Statistical Meetings this year, I learned a lot about recent developments in R, and about some not-so-recent developments that I had been totally clueless about. One of those developments was the use of pipes in R. Here is a simple example of how pipes can simplify your code.
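The post itself works in R, where (assuming the magrittr `%>%` operator) an expression like `x %>% f() %>% g()` means `g(f(x))`: the value on the left becomes the first argument of the function on the right. Python has no pipe operator, so the sketch below is only an analogy. It uses a hypothetical helper, `pipe`, built on `functools.reduce`, to show the same left-to-right reading order next to the equivalent nested call.

```python
from functools import partial, reduce

def pipe(value, *funcs):
    """Pass value through each function in turn, left to right.

    pipe(x, f, g) computes g(f(x)), mirroring x %>% f() %>% g() in R.
    """
    return reduce(lambda acc, fn: fn(acc), funcs, value)

data = [3.2, 1.5, 4.8, 2.0]

# Nested style: has to be read from the inside out.
nested = round(sum(sorted(data)[1:]), 1)

# Piped style: reads top to bottom, one step per line.
piped = pipe(
    data,
    sorted,                     # sort the values
    lambda xs: xs[1:],          # drop the smallest value
    sum,                        # add up what is left
    partial(round, ndigits=1),  # round the total
)
print(nested, piped)
```

Both versions compute the same thing; the piped form simply lists the steps in the order they happen.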

# PMean: Small group presentations using screen sharing tools

Someone suggested that the Kansas City R Users Group use screen sharing tools. I am going to experiment with this a bit. Here are two tools worth trying.

# PMean: Misunderstanding autism

A friend of mine posted an inspiring story published in the Washington Post. Unfortunately, it did not inspire me; instead, it made me worried about how often we misunderstand autism and how much trouble this causes. It’s not statistics, per se, but it is an example of how research on new approaches for patients with autism can end up being abusive.

# PMean: Measuring pixels in an R graph

I have an R cheat sheet, How Big Is Your Graph, that explains how to measure the size of various features of your graph in R. This blog post illustrates unit conversions. If you want to measure the length of a diagonal line segment in an R graph, you need to calculate the size of the plotting region in pixels, compare it to the range of the plotting region (in user coordinates) in the x and y directions, and then apply the Pythagorean Theorem.
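The arithmetic fits in a few lines. In R you would read the plotting region's size from `par("pin")` (in inches, which you multiply by the device resolution in pixels per inch) and its coordinate limits from `par("usr")`; the Python function below simply takes those numbers as arguments. The function name `segment_length_px` and the example dimensions are hypothetical.

```python
import math

def segment_length_px(x0, y0, x1, y1, xlim, ylim, width_px, height_px):
    """Length in pixels of the segment from (x0, y0) to (x1, y1).

    xlim and ylim are (low, high) user-coordinate limits of the plotting
    region; width_px and height_px are that region's size in pixels.
    """
    # Pixels per user unit, computed separately for each axis.
    px_per_x = width_px / (xlim[1] - xlim[0])
    px_per_y = height_px / (ylim[1] - ylim[0])
    # Convert each leg of the right triangle to pixels...
    dx_px = (x1 - x0) * px_per_x
    dy_px = (y1 - y0) * px_per_y
    # ...then apply the Pythagorean Theorem.
    return math.hypot(dx_px, dy_px)

# A 600 x 400 pixel plotting region with x from 0 to 10 and y from 0 to 100.
print(segment_length_px(0, 0, 5, 100, (0, 10), (0, 100), 600, 400))  # 500.0
```

Here the horizontal leg is 5 units × 60 px/unit = 300 px and the vertical leg is 100 units × 4 px/unit = 400 px, giving a 300-400-500 right triangle.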

# PMean: Rotating text in an R graph

I have an R cheat sheet, How Big Is Your Graph, that explains how to measure the size of various features of your graph in R. This blog post illustrates how you can use some of the commands described in that cheat sheet to rotate text to match a diagonal line in an R graph. It’s trickier than it seems.
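One reason it is tricky: the angle given to R's `text()` through its `srt` argument is measured on the page, so it has to come from the line's slope in pixels, not its slope in user coordinates. A minimal sketch of that calculation, assuming the same kind of hypothetical 600 x 400 pixel region as above (the function name `text_angle_deg` is illustrative):

```python
import math

def text_angle_deg(x0, y0, x1, y1, xlim, ylim, width_px, height_px):
    """Angle in degrees, counterclockwise from horizontal, at which the
    line from (x0, y0) to (x1, y1) appears on screen."""
    # Convert the rise and run to pixels before taking the angle.
    px_per_x = width_px / (xlim[1] - xlim[0])
    px_per_y = height_px / (ylim[1] - ylim[0])
    dx_px = (x1 - x0) * px_per_x
    dy_px = (y1 - y0) * px_per_y
    return math.degrees(math.atan2(dy_px, dx_px))

# In user units this line rises 20 for every 1 unit across, which would
# suggest a nearly vertical angle (about 87 degrees). On a 600 x 400 pixel
# region with x in [0, 10] and y in [0, 100] it is actually drawn flatter.
angle = text_angle_deg(0, 0, 5, 100, (0, 10), (0, 100), 600, 400)
print(round(angle, 2))  # 53.13
```

The resulting value is what you would pass as `srt`; it changes whenever the device size or axis limits change.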

# PMean: Drawing the perfect circle

I have an R cheat sheet, How Big Is Your Graph, that explains how to measure the size of various features of your graph in R. This blog post illustrates how you can use some of the commands described in that cheat sheet to draw a perfect circle.
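A circle drawn with a single radius in user coordinates comes out as an ellipse whenever the x and y axes have different scales. One way around this, sketched below under the same hypothetical setup, is to fix the radius in pixels and convert it back to user units separately for each axis; the resulting vertices could then be handed to something like R's `polygon()`. The function name `circle_vertices` and the default of 100 vertices are illustrative assumptions.

```python
import math

def circle_vertices(cx, cy, radius_px, xlim, ylim, width_px, height_px, n=100):
    """Vertices, in user coordinates, of a polygon that appears on screen
    as a circle of radius_px pixels centred at (cx, cy)."""
    # User units per pixel along each axis.
    x_per_px = (xlim[1] - xlim[0]) / width_px
    y_per_px = (ylim[1] - ylim[0]) / height_px
    vertices = []
    for k in range(n):
        t = 2 * math.pi * k / n
        vertices.append((cx + radius_px * math.cos(t) * x_per_px,
                         cy + radius_px * math.sin(t) * y_per_px))
    return vertices

# A 100-pixel circle centred at (5, 50) on a 600 x 400 region with
# x in [0, 10] and y in [0, 100]. In user units the shape is stretched
# (half-width 100/60 on x, 25 on y), but on screen both are 100 pixels.
pts = circle_vertices(5, 50, 100, (0, 10), (0, 100), 600, 400)
```

Because both conversion factors depend on the device size and the axis limits, the vertices need to be recomputed after either one changes.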

# PMean: Independent consulting and the cold call

There’s been some more discussion about getting started as an independent statistical consultant. One person is ready to hang out a shingle and proposes to “find a niche I can serve, contact companies in that niche, etc.” but doesn’t know what that niche might be. I offered one cautionary comment and then discussed how to find your niche.