Tag Archives: Statistical computing

PMean: Finding those weird characters

This page is moving to a new website.

PMean: A simple structure for documentation

This page is moving to a new website.

PMean: Grading rubric for computer assignments

This page is moving to a new website.

Recommended: The history of Hadoop

This page is moving to a new website.

If you want to understand big data, you need to understand Hadoop. Hadoop is the technology underlying many big data efforts. But most of the descriptions of Hadoop are jargon laden and impenetrable to newcomers. Well, maybe just impenetrable to this newcomer. But one great revelation to me was a historical note as to WHY there was a need to develop Hadoop. It was all those pages that had to be indexed by search engines at Google and Yahoo. So I went out to try to find more details. This article, with a ton of references throughout, is an excellent introduction to the precursors to Hadoop, the development of Hadoop itself, and the explosion of systems that used Hadoop as their foundation. Continue reading →

PMean: What greedy means to a geek

I’ve run across the term “greedy” in several work related contexts, so I thought it might be worth explaining what it means. Continue reading →

PMean: Minimum standards for a github repository

Using a github repository is new to me, and I need to document some things that may be obvious to more experienced programmers. Here are, as far as I understand it, the minimal documentation standards for a github repository. Continue reading →

PMean: A megabyte is not a million bytes

Sometimes you forget things. Here’s an example. Continue reading →

PMean: Using version control through git, github, and R Studio

I’m definitely “old school” when it comes to programming, but there comes a time when even this old dog needs to learn a new trick. I decided yesterday to use version control for my own R programs. Nothing for clients, mind you, because of confidentiality concerns, but the R code that I use to develop teaching examples is certainly fair game. I’m not totally clueless on version control because of my work for the Greater Plains Collaborative, but it’s a different thing to do it totally by yourself. Here’s a brief outline of what I needed to do to get version control up and running. Continue reading →

PMean: The data structure in i2b2

I’m working with the Greater Plains Collaborative on a research project and my work requires me to understand the underlying data structure of a system known as i2b2. It’s not a difficult data structure, but it is uncommon, so it is worthwhile to document what is going on. Continue reading →

PMean: The need for documentation standards

I’m working in a team programming environment and I would have to characterize the quality of the documentation as uneven. I’m going to make the case for having detailed documentation standards at a meeting tomorrow. Here’s the a general overview of what I will say. Continue reading →