I am teaching a class, Introduction to R (MEDB 5505). Here is the syllabus for Fall Semester 2017.
MEDB 5505: Introduction to R
A working knowledge of statistical software is a vital skill for anyone involved in quantitative research. This class will introduce data management, simple descriptive statistics, and basic graphical display using the R software package. Students will develop the fundamental skills needed to prepare data sets for analysis, and to conduct simple descriptive and graphic analyses and report those analyses.
1.1 Course content
This online course is intended to provide a working familiarity with R. Students are not expected to have advanced programming or statistical analysis skills, but a basic understanding of statistical terminology is necessary. The class will introduce basic methods for data import, data management, simple graphics, and basic statistical analysis. This class will not cover advanced statistical methods, but will provide you with a firm foundation to address these areas in your statistics classes or in your thesis/dissertation research.
1.2 Student learning objectives
At the completion of this course, students will be able to:
- Prepare and manipulate datasets for analysis in R.
- Conduct simple descriptive and graphic analyses of data in R.
- Prepare a report with a summary of analyses conducted in R.
1.3 Course framework
This class will be taught as a self-paced, asynchronous online course. The instructor will provide datasets and instructions for running various functions within R. You will apply these functions to import datasets, manipulate these datasets, and produce basic summaries of these datasets. As the final project of the class, you will produce an independent analysis on a dataset of your own choice that demonstrates the techniques and skills that were covered in the course.
Steve Simon, PhD
Department of Biomedical and Health Informatics
School of Medicine, M5-117
I am very grateful for the work of Karen Williams and Mary Gerkovich, who helped develop this class along with two others: MEDB 5506, Introduction to SPSS; and MEDB 5507, Introduction to SAS. Over time, I plan to update the materials in this class to keep it more closely aligned with the other two classes. These updates will be optional viewing for Fall Semester 2017.
My email contact information is at http://www.pmean.com/contact.html. The preferred email address is the UMKC account, but any of these email addresses are fine. I do not check email on my days off from work, so please don’t worry if you don’t get an immediate response. If you have not heard back within 48 hours, please feel free to contact me again.
You are also welcome to call me. My office number is 816-235-6617 and my cell phone number is 913-912-2076.
2.3 Discussion forum
Before you contact me by email or telephone, consider posting your problem or question on the Blackboard discussion board. Others will benefit from the exchange. You are welcome, however, to discuss problems and ask questions via email or telephone if you prefer.
2.4 Office hours
You can get help for many of your questions by email, but sometimes a face-to-face appointment is needed. I am part-time at UMKC and hold two other part-time jobs, so I cannot hold regular office hours. I am more than happy to meet with you face-to-face by appointment. Because of child care responsibilities, I generally cannot meet prior to 9am (10am on Thursdays) and I have to leave most days by 2pm. I have a lot more flexibility for meeting via telephone.
3. Class structure
This class will be taught as a self-paced, asynchronous online course. The course material is presented in five parts, with four assignments that will be submitted to the instructor. To achieve the learning objectives, students are expected to review the provided course materials, including the recorded lectures.
During the lectures, the instructor will provide examples to demonstrate application of R. Students should open R on their own and replicate the work shown in the video. Then students will be asked to conduct similar work on a different dataset and turn that in as homework. Students need to download R (as described in an early video) and optionally, RStudio. Both R and RStudio are free.
3.1 Student projects
Students will be given instructions on using R functions, syntax files, or both, and will be shown how to apply them to a provided dataset. After replicating the work shown in the R video, students will apply these skills to a different provided dataset. At the end of each class section, students will turn in the produced output and a brief written interpretation. The interpretation is very important. Output without any interpretation will be returned. For the final project, students will use a dataset of their own choosing, import that dataset, manipulate it as needed, and produce a statistical report using at least one graphical display and at least one descriptive statistical method. This final project will include a written explanation of the results. The dataset you work with for this final project should include at least four variables, of which two are continuous and two are categorical.
This course will be graded as Pass / Fail. In order to receive a passing grade, the student must successfully complete the four assignments, plus the final project, which is an independent analysis of a data set of the student's own choosing. This work must be completed and submitted to the instructor by the last day of the semester, Friday, December 8, 2017, in order to get credit for the course.
If a student feels that he/she has been unfairly graded, information on the appeal process can be found in the academic regulations information (http://www.umkc.edu/catalog/Procedure_for_Appeal_of_Grades.html).
There is NO required textbook for this class. However, the following books are possible resources you might want to purchase for your future work with R:
- William N. Venables, David M. Smith, and the R Core Development Team. An Introduction to R, Second Edition. Available in book form or as a free PDF file at https://cran.r-project.org/doc/manuals/R-intro.pdf.
- Peter Dalgaard. Introductory Statistics with R.
4.2 R software
You should download a copy of R from https://cran.r-project.org/. This is fairly easy, but you can review the steps for downloading in one of the early videos for this class. If you have any trouble downloading R, please contact me.
RStudio is an integrated development environment for the R programming language. Use of RStudio is encouraged, but it is not a requirement. You can download RStudio at https://www.rstudio.com/. This is also fairly easy, but you can review the steps in one of the early videos for the class.
Some of the files that you need will be stored at my github site, https://github.com/pmean. There are several ways that you can access these files, and this is also described in an early video. If you have any trouble getting files stored at github, send me an email and I can give you those files as an email attachment.
4.5 Web sites
All of the material you need for this class will be available on Blackboard. Here are some optional websites that you can use to supplement your learning.
- http://www.ats.ucla.edu/stat/. Institute for Digital Research and Education, UCLA. This website provides comprehensive guidance on the use of R (as well as other statistical software). Highly recommended.
- https://www.r-bloggers.com/. A mash up of blog posts about R from 750 different R bloggers. The posts are a mix of beginning, intermediate, and advanced levels. The quality of the posts is uniformly high.
- http://blog.pmean.com/tag/r-software/. My own blog, by contrast, is of uneven quality. The blog covers a range of topics, but I tag any R-related posts.
4.6 Discussion board
Blackboard has a discussion forum and I would encourage you to describe any problems or post any questions to the discussion board. Other students will benefit from seeing your question and are welcome to post suggested solutions.
5. Course outline
For each part of the course, the instructor will provide a recorded lecture that shows a demonstration of R program execution. If you are trying to work in R at the same time as viewing the recorded lectures, it is recommended that you use a system that will allow you to follow along with the recordings and make notes as needed. The most effective system is using two monitors. This can be accomplished by playing the recording on one screen and doing other operations on the other screen.
All of the files and videos referenced below are available at http://www.pmean.com/15/r.html or at one of my github repositories.
Part 0. Installing R and optionally, RStudio
This section has two written handouts, a short quiz, and six videos (total viewing time: 42 minutes). The videos cover
- installing R,
- installing RStudio,
- installing git,
- running R commands,
- getting the files you need, and
- history of R.
Part 1. Introduction and data sets with mostly continuous variables.
This section has nine videos (total viewing time: 3 hours, 3 minutes). These videos cover
- Cleaning house
- Definitions of categorical and continuous variables
- Reading the body fat measurement dataset (read.table)
- head, tail, and names functions
- saving the R environment
- loading the R environment
- selecting specific rows and columns
- finding and modifying specific values
- missing values (NA)
- saving output
- correlation matrices
- adding regression line or smooth curve
- reading in comma separated values, space delimited files, and fixed format files.
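As a rough preview of the functions listed above, here is a minimal sketch. It is not taken from the course videos. Because the body fat file is not reproduced in this syllabus, the sketch assumes R's built-in airquality dataset as a stand-in, and the object names (csv_file, aq, first_rows, and so on) are only illustrative.

```r
# Stand-in data: R's built-in airquality dataset (an assumption; the course
# itself uses a body fat measurement file that is not reproduced here).
data(airquality)

# Write the data to a temporary comma separated file, then read it back in.
csv_file <- tempfile(fileext = ".csv")
write.csv(airquality, csv_file, row.names = FALSE)
aq <- read.csv(csv_file)

# Look at the first rows, the last rows, and the variable names.
print(head(aq, 3))
print(tail(aq, 3))
print(names(aq))

# Select specific rows and columns.
first_rows <- aq[1:5, c("Ozone", "Temp")]
hot_days <- aq[aq$Temp > 90, ]

# Count the missing values (NA) in each column.
na_counts <- colSums(is.na(aq))
print(na_counts)

# Correlation matrix, using only the rows with no missing values.
cor_matrix <- cor(aq[, c("Ozone", "Wind", "Temp")], use = "complete.obs")
print(round(cor_matrix, 2))

# Fit a regression line; abline(fit) would draw it on an existing scatterplot.
fit <- lm(Ozone ~ Temp, data = aq)
print(coef(fit))
```

The use = "complete.obs" argument is one of several ways to handle missing values in cor; the videos may take a different approach.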
Part 2. Data sets with mostly categorical variables.
This section has seven videos (total viewing time: 1 hour 54 minutes). These videos cover:
- reading in the titanic data set
- frequency counts (table)
- counting missing values
- recoding to a binary category
- recoding to a multi-level category
- odds ratios and risk ratios
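The topics in this part can be sketched in a few lines of R. This is only an illustration under stated assumptions: it uses the Titanic table that ships with R, which may differ from the titanic file used in the videos, and the variable names (counts, titanic, group, tab) are hypothetical.

```r
# Stand-in data: the built-in Titanic table (an assumption; the course file
# may have a different layout). Expand the counts to one row per person.
counts <- as.data.frame(Titanic)
titanic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                  c("Class", "Sex", "Age", "Survived")]

# Frequency counts for one variable and for two variables.
print(table(titanic$Class))
tab <- table(titanic$Sex, titanic$Survived)
print(tab)

# Count the missing values in each column.
print(colSums(is.na(titanic)))

# Recode a multi-level category (Class) to a binary category.
titanic$group <- ifelse(titanic$Class == "Crew", "Crew", "Passenger")
print(table(titanic$group))

# Risk ratio: probability of survival for women relative to men.
risk_female <- tab["Female", "Yes"] / sum(tab["Female", ])
risk_male <- tab["Male", "Yes"] / sum(tab["Male", ])
risk_ratio <- risk_female / risk_male

# Odds ratio: odds of survival for women relative to men.
odds_female <- tab["Female", "Yes"] / tab["Female", "No"]
odds_male <- tab["Male", "Yes"] / tab["Male", "No"]
odds_ratio <- odds_female / odds_male

print(c(risk_ratio = risk_ratio, odds_ratio = odds_ratio))
```

Computing the ratios by hand from the two-by-two table keeps the arithmetic visible; add-on packages can produce the same quantities with confidence intervals.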
Part 3. Mixture of categorical and continuous variables.
This section has four videos (total viewing time: 45 minutes). These videos cover:
- Reading in the fev dataset
- means by group
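Means by group can be computed in more than one way. The sketch below assumes R's built-in ToothGrowth dataset as a stand-in for the fev dataset, which is not reproduced in this syllabus; it has one continuous variable (len) and two grouping variables (supp and dose).

```r
# Stand-in data: built-in ToothGrowth dataset (an assumption; the course
# uses the fev dataset instead).
data(ToothGrowth)

# Mean of a continuous variable within each level of a categorical variable.
group_means <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
print(group_means)

# The same idea with a formula, here for every combination of two groups.
means_table <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(means_table)
```

tapply returns a named vector, while aggregate returns a data frame that is easier to export or merge with other results.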
Part 4. Longitudinal data.
This section has four videos (total viewing time: 2 hours 19 minutes).
On Your Own Assignment
For the final project, use a dataset of your own choosing from those available through this course or a dataset of your own. Demonstrate your mastery of elements that have been covered in the course, including importing and manipulating the dataset as needed, and produce a brief statistical report using at least one graphical display and at least one descriptive statistical method. This final project should include a written explanation of the results. The dataset you work with for this final project should include at least four variables of which two variables will be measured using continuous data and two variables measured using categorical data.
6. Course expectations, course policies, requirements, and standards for student coursework and student behavior
Important UMKC Resources and Policies are applicable to every course and every student at UMKC. These are located in the Blackboard site for this course under the “UMKC Policies” tab. As a UMKC student, you are expected to review and abide by these policies. If you have any questions, please contact your instructor for clarification. In addition to the standard UMKC policies, the Department of Biomedical and Health Informatics includes self-plagiarism in its definition of plagiarism. Self-plagiarism is the reuse, without prior discussion and consent of the course director, of an existing paper that has been submitted for credit in a different course.