A rather harsh and cynical take on data science, but still worth reading. Let me share a story about this. Back in my college days (that would be the 1970s), someone found a New Yorker cartoon and shared it with me. It showed a politician, obviously a very powerful politician because his office had a view of the Washington Monument. He was speaking to his aide: “That’s the gist of what I want to say. Now go and find me some statistics to base it on.” So the issues that this person brings up are no different from those of four decades ago. There’s no easy solution to the problem. You can’t say, “I’ll only work with people who have a commitment to the truth, no matter where it might lead” because even people without strong overt biases still have subtle biases that can profoundly skew the results. Requiring a priori specifications and reserving a hold-out sample for a final quality check can help, but mostly it is just being careful and detail-oriented and transparent in all your work. Continue reading

# Recommended: Harvard University Program on Survey Research

This is a series of guides on survey research, written for the beginning student. It is written from the perspective of Political Science, but the advice works for other areas as well. Continue reading

# Recommended: A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models

This is one of those articles where you have to restrain yourself. Its message, that good old statistical tools like logistic regression can perform as well as these newfangled machine learning approaches that you haven’t taken the time to learn, is quite tempting. But I’d be cautious here. Maybe logistic regression is still competitive, but maybe the systematic review gathered a bunch of biased studies. It’s worthwhile to cite this whenever someone makes an overly strong claim about machine learning models, but don’t use this as an excuse to keep from learning the new stuff yourself. This article is stuck behind a paywall. Sorry! Continue reading

# Recommended: Webinar Series, Congressionally Directed Medical Research Programs

We live in a golden age of learning, where you can find just about anything you’d ever need to learn on the Internet. One example of this is a series of webinars about how to get research funding through the Congressionally Directed Medical Research Programs (CDMRP). I have not listened yet to any of these webinars, but they look like they would be very helpful for anyone seeking funding through this program. Continue reading

# Recommended: R Markdown Basics

This is actually a nice “peek under the hood” approach with lots of practical advice about getting that last tweak in to make your results go from good to great. Continue reading

# Recommended: LaTeX/Mathematics

You can incorporate very nice looking mathematical formulas in R Markdown fairly easily. The system relies on LaTeX for displaying formulas and is surprisingly easy to learn. But every once in a while you want to do something a bit exotic, like placing a “hat” in your equation. I’ve typically just done a quick Google search on something like “LaTeX hat symbol” and each different search yields a different website. Recently, I stumbled upon a fairly comprehensive guide to displaying mathematical formulas in LaTeX. It is published as an eBook.
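
To make the “hat” example concrete, here is a short sketch of what this looks like in an R Markdown math block. The specific formulas are just illustrations, not anything from the guide itself:

```latex
% Inline math in R Markdown goes between single dollar signs,
% display math between double dollar signs.

% \hat puts a hat over a single symbol, the usual notation for an estimate:
$$
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x
$$

% \widehat stretches the hat over a longer expression,
% and \bar gives the companion "bar" accent:
$$
\widehat{\sigma^2} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
$$
```

Everything above uses only core LaTeX math commands, so it should render in R Markdown without loading any extra packages.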

Note: Some of the examples require additional libraries like amsmath and I haven’t figured out yet how to take advantage of these libraries in R Markdown. Continue reading

# Quote: Did you hear about the mathematician…

“Did you hear about the mathematician who was afraid of negative numbers? He would stop at nothing to avoid them.” (This joke is all over the Internet, and I’m not sure where the original source would be).

# Recommended: Stop Saying ‘Exponential.’ Sincerely, a Math Nerd.

This is a brief plea to avoid using the word “exponential” when you really mean “a lot.” Continue reading

# Recommended: 12 things I wish I’d known before starting as a Data Scientist

This article was recommended to me at a webinar I attended. The author offers very personal and practical advice. The author’s third point “You’ll never have to know all the tools” is quite reassuring. Continue reading

# PMean: Slapping the word “pilot” on a failed study

Someone was asking on the MedStats listserv about a study that had gone off the rails. They had recruited only about a third of the patients that they had wanted. Things were going pretty well in the first arm of the study, but the second arm had a dropout rate of 50%.

Anyway, they decided to end the study (good call!) and wanted to know what they should do with the data that they had already collected. There were three options that they were considering (I’m paraphrasing a bit here).

- Analyze the study as originally planned, including a classic test of hypothesis for the primary outcome.
- Call this a pilot study and provide descriptive analyses only.
- Recognize that the data is so fatally flawed that any analysis of the data would be inappropriate.

This is what I suggested. Continue reading