Tag Archives: Statistical computing

Recommended: The history of Hadoop

If you want to understand big data, you need to understand Hadoop. Hadoop is the technology underlying many big data efforts. But most descriptions of Hadoop are jargon-laden and impenetrable to newcomers. Well, maybe just impenetrable to this newcomer. But one great revelation to me was a historical note on WHY there was a need to develop Hadoop in the first place: all those web pages that had to be indexed by the search engines at Google and Yahoo. So I went out to find more details. This article, with a ton of references throughout, is an excellent introduction to the precursors of Hadoop, the development of Hadoop itself, and the explosion of systems that used Hadoop as their foundation. Continue reading

PMean: Using version control through git, GitHub, and RStudio

I’m definitely “old school” when it comes to programming, but there comes a time when even this old dog needs to learn a new trick. I decided yesterday to use version control for my own R programs. Nothing for clients, mind you, because of confidentiality concerns, but the R code that I use to develop teaching examples is certainly fair game. I’m not totally clueless on version control because of my work for the Greater Plains Collaborative, but it’s a different thing to do it totally by yourself. Here’s a brief outline of what I needed to do to get version control up and running. Continue reading
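The outline boils down to a handful of git commands run from the project folder. Here is a minimal sketch of that workflow from the command line; the folder name, the GitHub username, and the repository URL are placeholders for illustration, not my actual repository, and you can do the same steps through RStudio's Git pane once you enable version control under Tools > Project Options.

```shell
# Minimal sketch: put an existing folder of R scripts under git version
# control and link it to GitHub. "my-r-examples" and the URL below are
# hypothetical placeholders.
cd my-r-examples                # the folder holding your .R teaching examples
git init                        # create a local repository in this folder
git add *.R                     # stage the R scripts
git commit -m "Initial commit of teaching examples"

# After creating an empty repository on github.com, connect and push:
git remote add origin https://github.com/USERNAME/my-r-examples.git
git push -u origin main         # -u sets the default upstream branch
```

After the push, RStudio's Git pane picks up the repository automatically when you open the project, so day-to-day commits can be done with checkboxes and buttons rather than the command line.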

PMean: History of SPSS

I’m helping to put together three separate classes: Basic data management and analysis with R [or SAS, or SPSS]. As part of these classes, I need to discuss the history of these programs, because understanding that history will help you better understand the strengths and weaknesses of each statistical package. Here’s a brief history of SPSS. Continue reading