PMean: Resources for a beginner’s workshop in R

I’m helping to teach a beginner’s workshop on R. Here are some resources that we will get from the web, but if you can download these files to your computer ahead of time, that would be great.

Here are the files for the latest R version (3.0.2 as of the writing of this blog entry):

Here are the files for R Studio (what is R Studio?)

Here are the files for R Commander (what is R Commander?)

I’ll be using a data set on childhood respiratory diseases (read documentation for this file)

Here are the R commands and the associated output.

For your convenience, I am also including the R commands file below.

# For this example, I am reading data from a text file
# located at the OzDASL (Australian Data And Story Library)
# website. It is a tab delimited file with headers in the first row.

fnam <- “http://www.statsci.org/data/general/fev.txt”
crd <- read.table(file=fnam,header=TRUE)

# It is always good to peek at your data, and the
# head and tail functions give the first few and
# the last few bits of your data.
head(crd)
tail(crd)

# All statistical analyses in R are functions. The
# mean function gives you the average.
mean(crd$Age)

# Pop quiz #1, what is the average height for this data?

# Always be careful about missing values. Most functions
# in R have options for handling missing values different
# ways.
mean(crd$Age,na.rm=TRUE)

# The sd, range, and quantile fuctions are self-explanatory.
# The summary function gives you the mean and quantiles combined.
# Use the table function to get counts for categorical data.
sd(crd$Age)
range(crd$Age)
quantile(crd$Age)
quantile(crd$Age,probs=c(0.1,0.9))
summary(crd$Age)
table(crd$Sex)

# Pop quiz #2, what is the range for height?
# Pop quiz #3, how many smokers are there?

# Again, you must always be careful about missing values.
# You can specify different methods for handling missing
# values in the table function with the useNA argument.
table(crd$Sex,useNA=”always”)

# Pop quiz #4, look at the help file for table. What are the
# other options for the useNA argument?

# Since all statistical analyses in R is done through
# functions, you can store the results of that function
# and re-use it.
age.mn <- mean(crd$Age)
age.sd <- sd(crd$Age)
ti <- paste(“The mean age is”,age.mn,”+/-”,age.sd)
hist(crd$Age,main=ti)

# Here’s an example of a more advanced analysis.
# The lm function fits a linear model, which in
# this example is equivalent to simple linear
# regression. Store the results of lm because you
# can then use it in several different ways.

lm.mod1 <- lm(FEV~Height,data=crd)

# Pop quiz #5, look at the help file for the lm
# function. What is one possible option for handling
# missing values?

# You can get a quick idea of what is in lm.mod1 by
# printing it.
print(lm.mod1)

# The summary function gives more details
summary(lm.mod1)

# The anova function gives a different display than summary.
anova(lm.mod1)

# The resid function produces residuals and the predict function
# produces the predicted values. Use head to avoid printing out
# all 654 values.
head(resid(lm.mod1))
head(predict(lm.mod1))

# There is quite a bit stored in lm.mod1, and you can explore it
# by first finding out the names of everything within it.
names(lm.mod1)

# This is an example of re-using information in lm.mod1 in order
# to enhance the information in your plot. First you should store
# the coefficients of the regression equation for later re-use.
co.mod1 <- lm.mod1$coefficients
b0 <- co.mod1[1]
b1 <- co.mod1[2]
r0 <- round(b0,1)
r1 <- round(b1,2)

plot(crd$Height,crd$FEV)
# abline draws a reference line with a given intercept and slope.
abline(a=b0,b=b1)
# you can also insert the equation into the title of the graph.
ti <- paste(“FEV =”,r0,”+”,r1,”* Height”)
title(ti)

# The lm function illustrates some of the object-oriented features
# of the R programming language–specifically the use of classes and
# methods. lm produces an object of class “lm”. In contrast, the
# nlm function (non-linear model) produces an object of class “nlm”.
# The predict function has a method for lm objects and a different
# method for nlm objects. The object oriented nature of R takes some
# getting used to, but once you get over the initial learning curve,
# it actually makes your programming life a lot simpler.

# end of file

Here are the two graphs produced by this output.