PMean: Getting R to shut the heck up

When you are using R Markdown to create various documents, you are often interested in displaying any informative messages that appear along the way. This is especially true for documents you plan to use yourself. But when you are preparing a report or a presentation for someone else, you may want to suppress these messages. That’s not always easy because different functions in R use different means to display messages, especially warning messages. So the option that might suppress a warning message from one function might not work for another function. Warnings when loading packages are notoriously difficult to suppress. I want to list, for my own benefit, all of the options that are available for getting R to shut the heck up. Continue reading
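As a rough illustration of the kinds of options this post has in mind, here is a minimal sketch using knitr chunk options and base R wrappers. It is not necessarily the list the full post gives, and `noisy()` is a hypothetical function invented for the example.

```r
# Chunk-level switches (knitr chunk options), written in the chunk header:
#   {r setup, message=FALSE, warning=FALSE}
# or set once for the whole document, typically in a setup chunk:
#   knitr::opts_chunk$set(message = FALSE, warning = FALSE)

# Hypothetical function that emits both a message and a warning.
noisy <- function() {
  message("an informative message")
  warning("a warning")
  42
}

# Function-level wrappers from base R: the value is still returned,
# but the message and the warning are muffled.
quiet_result <- suppressWarnings(suppressMessages(noisy()))

# Package startup messages have their own wrapper.
# `stats` stands in here for whatever package you are loading.
suppressPackageStartupMessages(library(stats))
```

Output written with `cat()` or `print()` is neither a message nor a warning, so none of these wrappers affect it. That is one reason a single option rarely silences everything.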

Recommended: 1.1 Billion Taxi Rides with Spark 2.2 & 3 Raspberry Pi 3 Model Bs

Mark Litwintschik took a large open source data set (1.1 billion taxi rides, with data storage on the order of hundreds of gigabytes) and ran some benchmark queries on a variety of different systems. Perhaps the most humble of these systems is a cluster of three Raspberry Pi computers. This webpage talks about how he set up the software on this cluster. Continue reading

PMean: What to do about claims of borderline statistical significance

A comment about the phrase “trend towards efficiency” on the Statistical Consulting Section discussion board prompted a lot of interesting commentary. The phrase refers to a setting where the p-value is not small enough to allow you to claim statistical significance, but is still close enough to 0.05 to be worth commenting on. Most of the responses were fairly negative and stressed that we need to refuse to sign off on any report or publication using that phrase. I posted a response that differed from the others. Here’s the gist of what I said. Continue reading

Recommended: Making it easier to discover data sets

I heard about this from the UMKC Bioinformatics Twitter feed. Google has a blog entry highlighting a new search feature they’ve developed, Dataset Search. It lets you find interesting data sets using standard Google search criteria. The system only works if people on the web provide reasonable documentation of their data sets. I’ve not had a chance to work with this yet, but it looks interesting. Continue reading