Monthly Archives: August 2016

Recommended: 10 Easy Steps to a Complete Understanding of SQL

This page outlines some of the fundamental properties of SQL programming that you need to know as you start learning SQL. For example, SQL is a declarative language, meaning that you tell it what you want and not how to compute it. Also, SQL syntax is not well-ordered, meaning that the order in which the clauses of a SQL statement are evaluated is not the same as the order in which they appear. Continue reading →
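To make the point about evaluation order concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `visits` table and its rows are invented purely for illustration and are not from the recommended page. It assumes the usual logical order of FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, so a filter on an aggregate has to go in HAVING rather than WHERE, even though SELECT is written first.

```python
import sqlite3

# Hypothetical table and data, made up for this illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (clinic TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("north", 100.0), ("north", 250.0), ("south", 80.0)],
)

# Declarative: the query says which totals are wanted, not how to loop.
# Written order:  SELECT, FROM, GROUP BY, HAVING, ORDER BY
# Logical order:  FROM, GROUP BY, HAVING, SELECT, ORDER BY
good_query = """
    SELECT clinic, SUM(cost) AS total
    FROM visits
    GROUP BY clinic
    HAVING SUM(cost) > 90
    ORDER BY clinic
"""
rows = conn.execute(good_query).fetchall()
print(rows)

# WHERE is applied before the rows are grouped, so an aggregate such as
# SUM(cost) is not available there and SQLite rejects the statement.
bad_query = """
    SELECT clinic, SUM(cost) AS total
    FROM visits
    WHERE SUM(cost) > 90
    GROUP BY clinic
"""
try:
    conn.execute(bad_query)
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)
print(where_error)
```

Other database engines may word the error differently, but the same rule about WHERE and aggregates generally applies.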

Recommended: Tibbles (Tibbles are a modern take on data frames)

I’m an old dog R programmer who tends to rely on features of R that were available 10 years ago (an eternity for computers). But it’s time for this old dog to learn new tricks. One thing I need to use in my R programs is called a “tibble” (sometimes called a “tidy tibble”). It’s a minor but important improvement on data frames, and many of the newer packages are using tibbles instead of data frames. Tibbles are available in the tibble package. This web page offers a nice description of the improvements that tibbles make over data frames. Continue reading →

PMean: Changing the font size in RStudio

Suppose you’re giving a talk and using RStudio. You want to make the fonts a bit larger so your audience can read them. It’s easy to do, once you know where to look. Continue reading →

PMean: Changing the font size in R

This is one of those obvious things that’s not obvious when you need it most. Suppose I’m doing a demo of R for a group like our wonderful Kansas City R Users Group. I want the font to be large enough to read. Here’s how you do it. Continue reading →

Recommended: dplyr and pipes: the basics

One of the recent developments in R that I was unaware of until I attended some talks at the Joint Statistical Meetings was the use of dplyr and pipes. It’s an approach to data management that isn’t fundamentally different from earlier approaches, but the code is much easier to read and maintain. This blog post explains in simple terms how these work and why you would use them. Continue reading →

Recommended: Hadley Wickham, the Man Who Revolutionized R

Hadley Wickham has written many popular R packages, so many that they are sometimes referred to as the “Hadleyverse.” This is a nice biography that emphasizes the impact that Dr. Wickham has had on R. Continue reading →

PMean: Bad examples of data analysis are bad examples to use in teaching

I’m on various email discussion groups and every once in a while someone sends out a request that sounds something like this.

I’m teaching a class (or running a journal club or giving a seminar) on research design (or evidence based medicine or statistics) and I’d like to find an example of a research study that uses bad statistical analysis.

And there’s always a flood of responses back. But if I were less busy, I’d jump into the conversation and say “Stop! Don’t do it!” Here’s why. Continue reading →

Recommended: The Importance of Reproducible Research in High-Throughput Biology

I have not viewed this video yet, but have attended a similar talk and read a similar research paper by Keith Baggerly. His general message is that large biological and genetic experiments are sometimes designed so poorly as to invalidate the results. You can often discover these design flaws through a careful examination of the data sets themselves and their metadata. This process of uncovering design flaws is sometimes called “Forensic Statistics.” Continue reading →

Recommended: Enrichment design studies should enhance signals of effectiveness.

I noticed several talks at JSM 2016 on enrichment designs. I was only vaguely familiar with what this meant, so I did a quick Google search. I found this very nice non-technical overview. Continue reading →