PMean: History of R

I’m helping to put together three separate classes: Basic data management and analysis with R, with SAS, and with SPSS. As part of these classes, I need to discuss the history of these programs, because that history helps explain the strengths and weaknesses of each statistical package. Here’s a brief history of R.

R has its roots in a program called S. S was developed at a time when single-letter names were in vogue (as in the C programming language). The principal author of S, John Chambers, was a statistician at Bell Laboratories who wrote several versions from the 1970s through the 1990s. This package was intended for internal research use, but the code was freely available to anyone who was interested.

Two distinctive features of the S programming language were the use of functions rather than macros to extend the language and the introduction of object-oriented features (classes, objects, and methods).

A nice history of the development of S was written by John Chambers and is available at http://cm.bell-labs.com/cm/ms/departments/sia/S/history.html.

A commercial adaptation of S, S-PLUS (S+), was introduced by Statistical Sciences, Inc. in the late 1980s and became very popular in the 1990s. Through various mergers and buyouts, S+ has been marketed by MathSoft, Insightful Corporation, and more recently TIBCO Software.

At about the same time, Ross Ihaka and Robert Gentleman started an effort to produce an open source, freely distributed implementation of the S language, called R. Their publication:

Ross Ihaka and Robert Gentleman. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299-314, 1996. Available at https://www.stat.auckland.ac.nz/~ihaka/downloads/R-paper.pdf

outlined the features of the R programming language. The first major release of R (version 1.0.0) appeared in 2000. Soon R eclipsed S+ in popularity. One measure of the breadth of R’s impact was a New York Times article published in 2009.

Vance, Ashlee. Data Analysts Captivated by R’s Power. The New York Times, 2009 (January 6). Available at http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html.

A non-profit group, the R Foundation for Statistical Computing, coordinates many of the efforts to maintain and develop the R programming language. Several commercial companies have piggybacked on R, including Revolution Analytics, which sells an enhanced version of R with capabilities for handling very large data sets.

One of the most popular features of R is the ease with which outside developers can extend the language through add-on packages (often loosely called libraries). Most of these packages are available for free under an open source license at the Comprehensive R Archive Network (CRAN, mirrored at various sites, including http://cran.us.r-project.org/). The Bioconductor project, available at http://www.bioconductor.org/, is a major effort to develop freely available packages for the statistical analysis of genetic data.

R is an interactive, command-driven programming language, but menu-driven interfaces to R are available. The most notable of these is R Commander, available at http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/.

Update (August 19, 2014). The Revolutions blog from Revolution Analytics posted a nice summary of a John Chambers talk on the history of S at the useR! 2014 conference. That article links to the slides (PDF format) of a 2006 talk by John Chambers (again on the history of S), a video interview of John Chambers by Trevor Hastie, and a 1998 paper (PDF format) by Ross Ihaka on the past (!) and future of R, presented at the Interface conference.