Tag Archives: Missing data

PMean: Some simple examples of single imputation

I wanted to use R to show some simple approaches to imputing missing values. These approaches are difficult to support because they require you to make questionable and unverifiable assumptions about your data. They may still prove useful as a sensitivity check or as a springboard into more complex approaches for imputing missing values. I have a link to the code that generated most of these results.
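As a rough illustration of what a single imputation can look like, here is a minimal sketch of mean imputation in R. The data vector `x` is made up for this example, and mean imputation is only one simple approach; it is not necessarily one of the approaches covered in the full post.

```r
# Hypothetical data: a numeric vector with two missing values.
x <- c(12, 15, NA, 20, NA, 13)

# Single imputation with the observed mean: every NA gets the same value.
x_mean <- x
x_mean[is.na(x_mean)] <- mean(x, na.rm = TRUE)

# Compare summaries before and after imputation.
mean(x, na.rm = TRUE)  # mean of the four observed values
mean(x_mean)           # mean after imputation (unchanged)
sd(x, na.rm = TRUE)    # spread of the observed values
sd(x_mean)             # spread after imputation (smaller)
```

The mean stays the same, but the standard deviation shrinks because the imputed values add no variability. That is one reason an approach like this is better treated as a sensitivity check than as a final analysis.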

PMean: Can I replace missing values with zero?

Dear Professor Mean, I have a large data set from a household budget survey with 20,000 records. Some of the variables have missing values, so when I calculate their means, sometimes the result is an average of almost 20,000 observations and sometimes it is an average of far fewer than 20,000. Can I replace all the missing values with zero so that I am averaging exactly 20,000 observations for each variable?
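To see what that substitution does to an average, here is a minimal sketch in R. The five `spending` values are made up for illustration and are not taken from the survey described in the question.

```r
# Hypothetical spending values for five households; two did not answer.
spending <- c(200, 150, NA, 250, NA)

# Mean of the three households that answered.
mean_observed <- mean(spending, na.rm = TRUE)

# Mean after recoding missing values as zero:
# the total is the same, but it is now divided by five instead of three.
spending_zero <- ifelse(is.na(spending), 0, spending)
mean_zero <- mean(spending_zero)

mean_observed  # 200
mean_zero      # 120
```

Recoding to zero pulls the average down from 200 to 120 in this toy example. That is only defensible if a missing value really does mean the household spent nothing on that item.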

PMean: Simple longitudinal data sets to illustrate data management

I am working on a class that will teach basic data management and graphics using the R programming language, with parallel classes in SPSS and SAS. On the third or fourth day of the class, we will look at managing longitudinal data sets, as these require special skills. I wanted to find a couple of reasonably simple longitudinal data sets that were available on the web and that had at least a few missing values in them. Here are a couple of data sets that might work.