This is an O’Reilly book (the cute animal on the cover is a rabbit) that is available online for free. It’s a great resource for someone just getting started with text mining. Continue reading

# Category Archives: Statistics

# PMean: Cases and cohorts and controls, oh my!

Someone asked a question about a retrospective study where you have a control cohort matched to a case cohort so the cohorts are similar on important (potentially confounding) variables. I pointed out that the two consecutive words “case cohort” are ambiguous and tried to explain how I define a retrospective cohort design versus a (retrospective) case-control design. Continue reading

# Recommended: R and SQL Server 2016

I have not viewed this video yet, but it comes from a good friend. There is a substantial effort at Microsoft to better integrate the R programming language and their flagship database product, SQL Server. Continue reading

# PMean: Looking inside the brains of scientists

I found an interesting research study that shows what happens inside the brains of scientists as they view statistical graphs of the type commonly used in peer-reviewed research. I don’t have the citation in front of me, but it was published in a very prominent research journal. Here’s a brief summary of the research. Continue reading

# PMean: How to run your first Bayesian analysis using jags software in R

Someone wanted to know how to run a Bayesian data analysis for a two group longitudinal study. There are several ways you can do this, but I had to confess I did not have an immediate answer. So I took some time to figure out how to do this using jags software inside of R. I’ve done a fair amount of stuff in jags, but not anything close to a longitudinal design. The general principle is to start with something easy and work your way slowly up to the final analysis. Continue reading

# PMean: Another example of pipes in R

I have been using pipes in R (the magrittr package) a lot recently. Pipes reduce the number of errors due to nested functions, among other things. I’ve given a simple example before, and here’s another. Continue reading
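
To illustrate the point about nested functions, here is a minimal sketch. It is not the example from the full post; the vector `x` and the particular chain of functions are made up for illustration. It assumes the magrittr package is installed.

```r
library(magrittr)  # provides the %>% pipe operator

# Hypothetical data, chosen only for illustration.
x <- c(1, 4, 9, 16, 25)

# Nested form: you read it from the inside out, and every
# extra step adds another pair of parentheses to balance.
nested <- round(mean(sqrt(x)), 1)

# Piped form: you read it left to right, one step at a time.
piped <- x %>% sqrt() %>% mean() %>% round(1)

# Both forms compute the same value.
identical(nested, piped)
```

The piped version puts each argument, such as the `1` passed to `round()`, right next to the function it belongs to, which is where most of the nesting mistakes tend to creep in.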

# PMean: When differing versions of R packages matter

When you use R, you are using a program that is constantly evolving. The user-contributed packages are evolving as well. Normally this is not that big a deal. But sometimes it is. Continue reading

# Why secondary data analysis takes a lot longer

Someone posted a question noting that most of the statistical consulting projects that they worked on finished in a reasonable time frame, but a few were outliers. Those took a lot longer and required a lot more effort by the statisticians. They wondered whether these outliers had any features in common, so they asked if anyone else had identified methodological features of projects that went overtime. I only had a subjective impression, but thought it was still worth sharing. Continue reading

# PMean: About those “awful” election predictions

If you were on Mars for the past few days, you may not have noticed that Donald Trump has won the election. There has been a lot of commentary lately about how bad the predictions about the U.S. election were, and someone mentioned that even Nate Silver at the fivethirtyeight website put the predicted probability of a Clinton win at 71%. I wrote a brief comment that predicting an event with 71% probability does not mean that your prediction was “wrong” if the other event occurs. Continue reading
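
The arithmetic behind that comment can be sketched in a few lines of R. This is only an illustration of what a 71% forecast implies, using the 71% figure quoted above; the simulation size and the seed are arbitrary choices, and none of this reflects how the fivethirtyeight model itself works.

```r
# Forecast probability quoted above, and its complement.
p_favorite <- 0.71
p_upset <- 1 - p_favorite

# If forecasts made at 71% are well calibrated, the favored
# outcome should still lose in about 29 of every 100 such forecasts.
# Simulate many independent forecasts of this kind (assumed setup).
set.seed(1)
n_forecasts <- 10000
upsets <- rbinom(n_forecasts, size = 1, prob = p_upset)

# Observed share of forecasts where the underdog won.
mean(upsets)
```

An event that happens more than a quarter of the time is not rare, so a single upset says very little about whether the forecast was any good. You would need to look at many forecasts to judge that.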

# PMean: A simple example of pipes in R

At the Joint Statistical Meetings this year, I learned a lot about recent developments in R, and not so recent developments that I was totally clueless about. One of those developments was the use of pipes in R. I wanted to show a simple example of how pipes can simplify your code. Continue reading