Tag Archives: Big data

Recommended: Data Science Has Become About Lending False Credibility To Decisions We’ve Already Made

A rather harsh and cynical take on data science, but still worth reading. Let me share a story about this. Back in my college days (that would be the 1970s), someone found a New Yorker cartoon and shared it with me. It showed a politician, obviously a very powerful politician because his office had a view of the Washington Monument. He was speaking to his aide: “That’s the gist of what I want to say. Now go and find me some statistics to base it on.” So the issues that this person brings up are no different from those of four decades ago. There’s no easy solution to the problem. You can’t say, “I’ll only work with people who have a commitment to the truth, no matter where it might lead,” because even people without strong overt biases still have subtle biases that can profoundly skew the results. Requiring a priori specifications and reserving a hold-out sample for a final quality check can help, but mostly it comes down to being careful, detail-oriented, and transparent in all your work.
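The hold-out safeguard mentioned above can be sketched in a few lines. This is only an illustration on synthetic data, assuming scikit-learn is available; the model and dataset are hypothetical stand-ins, not anything from the article.

```python
# Sketch: reserve a hold-out sample up front, before any modeling decisions,
# so the final quality check cannot be influenced by exploration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real study dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Split once, first thing; the hold-out set is then locked away.
X_work, X_holdout, y_work, y_holdout = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# All exploration, feature selection, and tuning happens on the working set.
model = LogisticRegression(max_iter=1000).fit(X_work, y_work)

# The hold-out set is touched exactly once, at the very end.
print(f"Hold-out accuracy: {model.score(X_holdout, y_holdout):.3f}")
```

The point is procedural, not statistical: the split happens before anyone has looked at the data, which is what keeps the final check honest.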

Recommended: A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models

This is one of those articles where you have to restrain yourself. Its message, that good old statistical tools like logistic regression can perform as well as the newfangled machine learning approaches you haven’t taken the time to learn, is quite tempting. But I’d be cautious here. Maybe logistic regression is still competitive, but maybe the systematic review swept in a batch of biased studies. It’s worthwhile to cite this whenever someone makes an overly strong claim about machine learning models, but don’t use it as an excuse to keep from learning the new stuff yourself. This article is stuck behind a paywall. Sorry!
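A head-to-head comparison of this kind can be sketched as follows. This is not the review’s methodology, just a hedged illustration on synthetic data, assuming scikit-learn is available; the random forest is my own stand-in for “a machine learning model.”

```python
# Sketch: compare logistic regression against a machine learning model
# with cross-validated AUC, the usual metric for clinical prediction models.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for a clinical prediction dataset.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=1
)

# Cross-validated AUC for each model; higher is better, 0.5 is chance.
lr_auc = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="roc_auc"
).mean()
rf_auc = cross_val_score(
    RandomForestClassifier(random_state=1), X, y, cv=5, scoring="roc_auc"
).mean()

print(f"Logistic regression AUC: {lr_auc:.3f}")
print(f"Random forest AUC:       {rf_auc:.3f}")
```

On simple tabular data like this, the gap between the two is often small, which is the pattern the review reports; whether that holds for any particular clinical dataset is an empirical question.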

Quote: In a world where the price of calculation continues to decrease rapidly…

“In a world where the price of calculation continues to decrease rapidly, but the price of theorem proving continues to hold steady or increase, elementary economics indicates that we ought to spend a larger and larger fraction of our time on calculation.” John Tukey, as quoted in “Sunset Salvo”, The American Statistician 1986; 40(1): 72-76.

Recommended: How a Feel-Good AI Story Went Wrong in Flint

Building a great statistical model does no one any good if you don’t pay attention to non-statistical issues. This story describes a machine learning model to identify which houses in Flint, Michigan, were the best candidates for removal of lead pipes. The model worked fairly well, but ran up against problems like individual city council members wanting to assure their constituents that enough was being done in their district. I’m not sure what the actual moral of this story is, but it does serve as a warning to be careful when you are modeling data in a contentious area.