Recommended: In UC’s battle with the world’s largest scientific publisher, the future of information is at stake

The University of California (UC) is in the midst of a difficult negotiation with Elsevier, a major publisher of research journals. The dispute centers on the traditional model of publishing, where the author writes for a journal for free and the journal sells subscriptions to individuals and libraries. A newer publication model is open access, where the author pays a fee to get the article published, and the article is then made available for free to any and all readers. The UC library wants a large reduction in subscription fees and is threatening to cancel the Elsevier subscription and rely solely on open access journals. The issues are complicated, and this article lays out both sides carefully.

Recommended: Standardized Mortality Ratio

I was at a talk where mortality rates were presented in one column and the standardized mortality ratio (SMR) was presented in a different column. I was a bit confused; I could not remember how or why you calculate an SMR. It’s not because SMR calculations are complicated; it’s because my brain can’t remember things as well as it used to. So when I got back to my office, I searched for a website with a simple tutorial on SMRs with a worked-out example. This page popped up right away, and I was impressed with the clarity of the writing style.
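As a quick reminder of the arithmetic, here is a minimal sketch of the usual (indirect standardization) form of the SMR. The numbers are made up for illustration and do not come from the talk or from the recommended page: two hypothetical age groups with reference death rates of 0.002 and 0.010 per person-year, 5,000 and 1,500 person-years of follow-up in the study group, and 30 observed deaths.

```latex
\[
\mathrm{SMR} = \frac{\text{observed deaths}}{\text{expected deaths}},
\qquad
\text{expected deaths} = \sum_{i} r_i \, n_i
\]
% r_i = death rate in stratum i of the reference population
% n_i = person-years in stratum i of the study population
\[
\text{expected deaths} = (0.002)(5000) + (0.010)(1500) = 10 + 15 = 25,
\qquad
\mathrm{SMR} = \frac{30}{25} = 1.2
\]
```

An SMR above 1 means more deaths were observed than the reference rates would predict for a group with this age mix; here, 20% more.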

Recommended: Welcome to DASL – The Data And Story Library

The Data and Story Library (DASL) is a collection of small and simple data sets useful for teaching basic statistical concepts. It was originally housed on the Carnegie Mellon University website, but (like many classic websites) it disappeared one day. The nice folks at Data Description, Inc. (makers of Data Desk software) have revived and updated this resource.

PMean: Getting R to shut the heck up

When you are using R Markdown to create various documents, you often want to display any informative messages that appear along the way. This is especially true for documents you plan to use yourself. But when you are preparing a report or a presentation for someone else, you may want to suppress these messages. That’s not always easy, because different functions in R use different means to display messages, especially warning messages. So the option that suppresses a warning message from one function might not work for another function. Warnings when loading packages are notoriously difficult to suppress. I want to list, for my own benefit, all of the options that are available for getting R to shut the heck up.
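A minimal sketch of a few of the more common options, assuming the code sits in the first chunk of an R Markdown document. This is not the full list from the post, and dplyr is just a stand-in for any package that prints startup messages.

```r
# Chunk-level defaults: hide message() and warning() output
# in every chunk that follows this one.
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE
)

# Package startup messages go through their own mechanism,
# so wrap the library() call itself.
# dplyr is only an example; substitute the package you are loading.
suppressPackageStartupMessages(library(dplyr))

# Silence a single expression instead of a whole chunk.
x <- suppressWarnings(as.numeric("abc"))  # would otherwise warn about NAs
y <- suppressMessages({
  message("this message is not shown")
  42
})
```

None of these affect output produced with `print()` or `cat()`; for a function that prints directly, `invisible(capture.output(...))` is one way to swallow it.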

Recommended: 1.1 Billion Taxi Rides with Spark 2.2 & 3 Raspberry Pi 3 Model Bs

Mark Litwintschik took a large, publicly available data set (1.1 billion taxi rides, with data storage on the order of hundreds of gigabytes) and ran some benchmark queries on a variety of different systems. Perhaps the most humble of these systems is a cluster of three Raspberry Pi computers. This webpage describes how he set up the software on this cluster.