PMean: The Dark Side of Data Science

I’m planning to give a talk on “The Dark Side of Data Science” and I’m hoping to get some interesting references and articles from my colleagues. Here is a first draft of my abstract, with a few references that I am already familiar with.

Statistical modeling has advanced faster than our ability to assess the individual and societal impact of these models. We can now attach numbers or labels to people that are surprisingly effective at predicting future behavior, but we are in danger of losing sight of accountability and fairness in the process. In this talk, I will show examples of statistical models that have caused far more harm than good and characterize their common features: reification, lack of reproducibility, implicit redlining, and the inability to conduct independent audits.


Keith A. Baggerly, Kevin R. Coombes. Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology. The Annals of Applied Statistics 2009, 3(4), 1309-1344.

I. Glenn Cohen, Ruben Amarasingham, Anand Shah, Bin Xie, Bernard Lo. The Legal And Ethical Concerns That Arise From Using Complex Predictive Analytics In Health Care. Health Affairs 2014, 33(7).

Gina Kolata. How Bright Promise in Cancer Testing Fell Apart. The New York Times (2011, July 7).

Cliff Kuang. Can AI Be Taught to Explain Itself? The New York Times (2017, Nov 22).

Will Knight. Microsoft is creating an oracle for catching biased AI algorithms. MIT Technology Review (2018, May 25).

Cathy O’Neil. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown Publishers (2016). ISBN: 978-0553418811. Also check out Cathy O’Neil’s blog: Mathbabe: Exploring and Venting About Quantitative Issues.