Douglas Zahn. How to be more effective in your professional life. Amstat News, September 2009. Available at http://magazine.amstat.org/blog/2009/09/01/heartofastatcareersept09/.

]]>Everyone wants a rule on this, and no rule will work. It depends on how or why the data is missing. Sometimes 5% missing will kill you and sometimes 90% missing is not a serious issue. Seriously, I have heard arguments that even 90% missing is not a problem if you make a few reasonable assumptions about your data and use a rigorous approach to imputing the missing values. It’s easy enough to verify this. Take a data set with no missing values and remove 90% of the data using a particular set of assumptions. Then impute using a rigorous approach and see if the results match those of the original data set.
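A minimal sketch of that check, using only the Python standard library. Everything specific here is an assumption made for illustration: the simulated variables `x` and `y`, the rule that drops about 90% of `y` depending only on `x`, and the simplified stochastic regression imputation (which skips the parameter-uncertainty step of a full multiple imputation).

```python
import random
import statistics

def simulate(n=20000, seed=1):
    """Complete data: y depends linearly on x. All values are hypothetical."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
    # Assumed mechanism: y is observed with a probability that depends
    # only on x (18% if x > 0, 2% otherwise), so about 90% of y is removed.
    observed = [rng.random() < (0.18 if x > 0 else 0.02) for x in xs]
    return rng, xs, ys, observed

def ols(xs, ys):
    """Least-squares intercept, slope, and residual standard deviation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (sse / (len(xs) - 2)) ** 0.5

def imputed_mean(rng, xs, ys, observed, m=20):
    """Average the mean of y over m stochastically imputed data sets."""
    ox = [x for x, o in zip(xs, observed) if o]
    oy = [y for y, o in zip(ys, observed) if o]
    a, b, s = ols(ox, oy)
    means = []
    for _ in range(m):
        filled = [y if o else a + b * x + rng.gauss(0, s)
                  for x, y, o in zip(xs, ys, observed)]
        means.append(statistics.fmean(filled))
    return statistics.fmean(means)

rng, xs, ys, observed = simulate()
frac_missing = 1 - sum(observed) / len(observed)
full_mean = statistics.fmean(ys)                      # answer with no missing data
cc_mean = statistics.fmean(y for y, o in zip(ys, observed) if o)  # complete cases
mi_mean = imputed_mean(rng, xs, ys, observed)         # after imputation

print(f"fraction missing:   {frac_missing:.2f}")
print(f"full-data mean:     {full_mean:.2f}")
print(f"complete-case mean: {cc_mean:.2f}")
print(f"imputed mean:       {mi_mean:.2f}")
```

Under these assumptions the complete-case mean should land well away from the full-data mean, while the imputed mean should land close to it. Change the removal rule so that it depends on `y` itself and the imputed mean should drift too, which is the sense in which the answer depends on how or why the data is missing.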

But here’s what I tell people. If less than 10% of your data is missing and there is nothing too unusual about the data that is missing, then you’re probably okay. If more than 30% is missing, there’s a good chance you have problems unless you proceed very carefully.

That’s a rule of thumb and I remind people that no one gets thrown in jail for violating a rule of thumb.

There’s also the method being used to consider. If you rely on a simplistic approach, such as complete case analysis or last observation carried forward, you cannot afford to have more than a small percentage of missing values. If you use rigorous approaches like multiple imputation or maximum likelihood methods for missing data, then you can tolerate even a very large fraction of missing data. Anyone using these rigorous approaches, of course, should work closely with a statistician. A large amount of missing data is one of the “red flags” that I look for when deciding how much help a client needs.

Finally, you need to ask yourself why the data is missing, and this often requires speculation beyond the data set itself. Certain types of missing value mechanisms, like missing completely at random, are easily addressed, even if you have a large fraction of missing data. But watch out for informative missingness, also called data missing not at random. These are settings where the fact that a value is missing is directly associated with the value that you don’t know. If you are measuring something sensitive, and you know that people are more likely to skip answering if they have this sensitive condition, then even a small fraction of missing data can cause serious problems.
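The sensitive-question scenario can be sketched as a small simulation. The numbers are hypothetical and chosen only for illustration: a 20% true prevalence, a 30% skip rate among people with the condition, and a 2% skip rate among everyone else, which gives under 10% missing overall. For comparison, the same overall fraction is also removed completely at random.

```python
import random

def observed_prevalence(has_condition, skip_prob, rng):
    """Prevalence among respondents who answered, plus the fraction missing.

    skip_prob maps a respondent's true status to the chance they skip.
    """
    kept = [c for c in has_condition if rng.random() >= skip_prob(c)]
    prevalence = sum(kept) / len(kept)
    missing = 1 - len(kept) / len(has_condition)
    return prevalence, missing

rng = random.Random(7)
n = 100_000
# Hypothetical population: about 20% have the sensitive condition.
has_condition = [rng.random() < 0.20 for _ in range(n)]
true_prev = sum(has_condition) / n

# Missing not at random: skipping depends on the unobserved answer itself.
mnar_prev, mnar_missing = observed_prevalence(
    has_condition, lambda c: 0.30 if c else 0.02, rng)

# Missing completely at random: same overall skip rate for everyone.
mcar_prev, mcar_missing = observed_prevalence(
    has_condition, lambda c: 0.076, rng)

print(f"true prevalence:         {true_prev:.3f}")
print(f"not at random:           {mnar_prev:.3f} ({mnar_missing:.1%} missing)")
print(f"completely at random:    {mcar_prev:.3f} ({mcar_missing:.1%} missing)")
```

With these assumed rates, both runs lose a similar share of responses, but only the run where skipping depends on the answer should understate the prevalence noticeably.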

]]>

John K. Kruschke. Bayesian meta-analysis of two proportions in random control trials. Doing Bayesian Data Analysis blog, November 3, 2016. Available at http://doingbayesiandataanalysis.blogspot.com/2016/11/bayesian-meta-analysis-of-two.html.

]]>National Clearinghouse of Rehabilitation Training Materials. PDF Accessibility – Part One. Published on Feb 6, 2017. Available at https://www.youtube.com/watch?v=PJvDoArpZ0g.

]]>Will Knight. Microsoft is creating an oracle for catching biased AI algorithms. MIT Technology Review, May 25, 2018. Available at https://www.technologyreview.com/s/611138/microsoft-is-creating-an-oracle-for-catching-biased-ai-algorithms.