This page is moving to a new website.
I got a question about how much missing data you can have in a study and still feel comfortable with your data analysis. There is no hard and fast answer, but I get the question so often that I have developed some general guidance. Continue reading →
I wanted to use R to show some simple approaches to imputing missing values. These approaches are difficult to support because they require you to make questionable and unverifiable assumptions about your data. They may still prove useful as a sensitivity check or as a springboard into more complex approaches for imputing missing values. I have a link to the code that generated most of these results. Continue reading →
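As a rough illustration of what a simple approach can look like, here is a minimal sketch of mean imputation. The teaser does not say which approaches the post covers, so mean imputation is an assumption on my part, the data are made up, and the sketch is written in Python rather than the R used in the post.

```python
import statistics

# Hypothetical data for illustration only; None marks a missing value.
values = [4.0, None, 6.0, 8.0, None, 10.0]

# Keep only the values that were actually observed.
observed = [v for v in values if v is not None]
observed_mean = statistics.mean(observed)

# Mean imputation: replace each missing value with the observed mean.
imputed = [observed_mean if v is None else v for v in values]

# Compare the spread before and after imputation.
sd_observed = statistics.stdev(observed)
sd_imputed = statistics.stdev(imputed)

print("mean of observed values:", observed_mean)
print("mean after imputation:  ", statistics.mean(imputed))
print("sd of observed values:  ", sd_observed)
print("sd after imputation:    ", sd_imputed)
```

The mean is unchanged, but the standard deviation shrinks because every filled-in value sits exactly at the center of the data. That is one reason an approach like this is better treated as a sensitivity check than as a final analysis.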
Dear Professor Mean, I have a large data set from a household budget survey with 20,000 records. When I calculate the mean for some of the variables, there are some missing values. Sometimes it is an average of almost 20,000 observations and sometimes it is an average of far fewer than 20,000. Can I replace all the missing values with zero so I am averaging exactly 20,000 observations for each variable? Continue reading →
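To make the question concrete, here is a small sketch that compares the mean of the reported values with the mean after replacing missing values with zero. The five spending values are hypothetical and the code is only an illustration of the arithmetic, not the answer given in the full post.

```python
import statistics

# Hypothetical household spending values; None marks a missing response.
spending = [200.0, None, 300.0, None, 400.0]

# Mean of the values that were actually reported (3 observations).
observed = [v for v in spending if v is not None]
mean_observed = statistics.mean(observed)

# Mean after replacing every missing value with zero (all 5 records).
zero_filled = [0.0 if v is None else v for v in spending]
mean_zero_filled = statistics.mean(zero_filled)

print("mean of reported values:", mean_observed)
print("mean with zeros filled: ", mean_zero_filled)
```

In this toy example the zero-filled mean drops from 300 to 180. Replacing missing values with zero gives a sensible average only if a missing response really does mean the household spent nothing.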
I am working on a class that will teach basic data management and graphics using the R programming language, with parallel classes in SPSS and SAS. On the third or fourth day of the class, we will look at managing longitudinal data sets, as these require special skills. I wanted to find a couple of reasonably simple longitudinal data sets that were available on the web and that had at least a few missing values in them. Here are a couple of data sets that might work. Continue reading →
This is a nice summary of the advantages and disadvantages of various methods for handling missing values. Continue reading →