I’ve always been supportive of efforts to share data. For me, it’s a bit selfish, because I want to find interesting real world examples to use in teaching and on my web pages. But the issue goes way beyond this, of course. Sharing data is an ethical imperative, especially for federally funded research or research that relies on volunteer subjects. It has led to many important discoveries beyond the realm of the original context in which the data was collected. In order for data sharing to be effective, you need to embrace four guiding principles: your data needs to be findable, accessible, interoperable, and reusable. This paper highlights those principles and offers some current examples of data sharing systems.
Monthly Archives: August 2016
Recommended: 10 Easy Steps to a Complete Understanding of SQL
This page outlines some of the fundamental properties of SQL programming that you need to know as you start learning SQL. For example, SQL is a declarative language, meaning that you tell it what you want and not how to compute it. Also, SQL syntax is not well-ordered, meaning that the clauses of a SQL statement are evaluated in a different order than the one in which they are written. Continue reading
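To make both points concrete, here is a small sketch using Python's built-in sqlite3 module and an in-memory table. The table name `visits` and its values are made up for illustration. The comments mark the conventional logical processing order of the clauses. This order is a conceptual model only. The database's query planner may execute the query differently, as long as the result is the same.

```python
import sqlite3

# Hypothetical table of clinic visit costs, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (clinic TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("A", 100.0), ("A", 300.0), ("B", 50.0), ("B", 70.0), ("C", 500.0)],
)

# Declarative: the query describes the result wanted, not the loop that builds it.
# Written order: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY
# Logical order: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY
query = """
SELECT clinic, AVG(cost) AS avg_cost  -- 5th: computed per group, written first
FROM visits                           -- 1st: pick the source table
WHERE cost > 60                       -- 2nd: filter individual rows
GROUP BY clinic                       -- 3rd: form groups from surviving rows
HAVING AVG(cost) < 400                -- 4th: filter whole groups
ORDER BY avg_cost DESC                -- 6th: runs after SELECT, so the alias exists
"""
rows = conn.execute(query).fetchall()
print(rows)
```

ORDER BY is logically processed after SELECT, so it can refer to the alias `avg_cost`. This works even though the alias is defined in the first clause that appears in the statement.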
Recommended: Tibbles (Tibbles are a modern take on data frames)
I’m an old dog R programmer who tends to rely on features of R that were available 10 years ago (an eternity for computers). But it’s time for this old dog to learn new tricks. One thing I need to start using in my R programs is the “tibble” (sometimes called a “tidy tibble”). It’s a minor but important improvement on data frames, and many of the newer packages use tibbles instead of data frames. Tibbles are available in the tibble package. This web page offers a nice description of how tibbles improve on data frames. Continue reading
PMean: Changing the font size in RStudio
Suppose you’re giving a talk and using RStudio. You want to make the fonts a bit larger so your audience can read them. It’s easy to do, once you know where to look. Continue reading
PMean: Changing the font size in R
This is one of those obvious things that’s not obvious when you need it most. Suppose I’m doing a demo of R for a group like our wonderful Kansas City R Users Group. I want a font size that is easy to read. Here’s how you do it. Continue reading
Recommended: dplyr and pipes: the basics
One of the recent developments in R that I was unaware of until I attended some talks at the Joint Statistical Meetings was the use of dplyr and pipes. It’s an approach to data management that isn’t fundamentally different from earlier approaches, but the code is much easier to read and maintain. This blog post explains in simple terms how these work and why you would use them. Continue reading
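The readability gain from pipes is easiest to see by comparing nested calls with a left-to-right chain. The sketch below is a Python analogy, not dplyr. The helper `pipe` and the step functions `adults`, `by_age`, and `names` are hypothetical. They only mimic the general shape of passing a data set through a sequence of verbs, the way R's `x %>% f() %>% g()` does.

```python
from functools import reduce

def pipe(value, *funcs):
    """Hypothetical helper: pass value through each function in turn,
    left to right, loosely mimicking R's x %>% f() %>% g()."""
    return reduce(lambda acc, f: f(acc), funcs, value)

# Hypothetical data: (name, age) records.
people = [("Ann", 34), ("Bob", 17), ("Cy", 52), ("Di", 29)]

# Hypothetical steps that each take a data set in and return a data set out,
# similar in spirit (not in API) to dplyr's filter(), arrange(), and select().
def adults(rows):
    return [r for r in rows if r[1] >= 18]

def by_age(rows):
    return sorted(rows, key=lambda r: r[1])

def names(rows):
    return [r[0] for r in rows]

# Nested calls must be read inside-out: the first step is the innermost call.
nested = names(by_age(adults(people)))

# The piped version reads in the same order the steps are applied.
piped = pipe(people, adults, by_age, names)
print(piped)
```

Both versions compute the same thing. The piped form simply lists the steps in the order a reader thinks about them, which is the main argument for pipes.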
Recommended: Hadley Wickham, the Man Who Revolutionized R
Hadley Wickham has written many popular R packages, so many that they are sometimes referred to as the “Hadleyverse.” This is a nice biography that emphasizes the impact that Dr. Wickham has had on R. Continue reading
PMean: Bad examples of data analysis are bad examples to use in teaching
I’m on various email discussion groups and every once in a while someone sends out a request that sounds something like this.
I’m teaching a class (or running a journal club or giving a seminar) on research design (or evidence based medicine or statistics) and I’d like to find an example of a research study that uses bad statistical analysis.
And there’s always a flood of responses back. But if I were less busy, I’d jump into the conversation and say “Stop! Don’t do it!” Here’s why. Continue reading
Recommended: The Importance of Reproducible Research in High-Throughput Biology
I have not viewed this video yet, but I have attended a similar talk and read a similar research paper by Keith Baggerly. His general message is that large biological and genetic experiments are sometimes designed so poorly as to invalidate the results. You can often discover these design flaws through a careful examination of the data sets themselves and their metadata. This process of uncovering design flaws is sometimes called “Forensic Statistics.” Continue reading
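Here is one small example of the kind of metadata check involved. The sketch cross-tabulates treatment group against processing batch. It assumes each sample's metadata records a processing batch. The field layout and values are hypothetical and not taken from Baggerly's work. If each batch holds samples from only one group, batch effects and group differences cannot be separated.

```python
from collections import Counter

# Hypothetical sample metadata: (sample_id, group, processing_batch).
metadata = [
    ("s1", "tumor", "batch1"), ("s2", "tumor", "batch1"),
    ("s3", "tumor", "batch1"), ("s4", "normal", "batch2"),
    ("s5", "normal", "batch2"), ("s6", "normal", "batch2"),
]

def crosstab(rows):
    """Count samples in each (group, batch) cell."""
    return Counter((group, batch) for _, group, batch in rows)

def fully_confounded(rows):
    """Return True if every batch contains samples from only one group,
    so any batch effect is indistinguishable from a group effect."""
    groups_per_batch = {}
    for _, group, batch in rows:
        groups_per_batch.setdefault(batch, set()).add(group)
    return all(len(groups) == 1 for groups in groups_per_batch.values())

print(crosstab(metadata))
print(fully_confounded(metadata))
```

A balanced layout, where each batch contains both tumor and normal samples, makes `fully_confounded` return False. Partial imbalance can also cause trouble, so the full cross-tabulation is worth reading, not just the yes/no check.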
Recommended: Enrichment design studies should enhance signals of effectiveness.
I noticed several talks at the JSM 2016 on enrichment designs. I was only very vaguely familiar with what this meant, so I did a quick Google search. I found this very nice non-technical overview. Continue reading
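A rough calculation shows why enrolling a more responsive population can strengthen the signal. This is a simplified sketch under hypothetical assumptions:

- The treatment benefit δ occurs only in a responsive subgroup that makes up a fraction p of an unselected population.
- The outcome standard deviation σ is the same in both populations. This ignores the extra variance that mixing subgroups would add.
- Sample size follows the usual normal-approximation formula for comparing two means, n per arm = 2(z₁₋α/₂ + z₁₋β)²σ²/δ².

In the unselected population the average effect shrinks to pδ, so the required sample size grows by roughly 1/p².

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided, two-sample
    comparison of means, using the normal approximation."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)

# Hypothetical numbers: an effect of 0.5 SD among responders only,
# with responders making up 40% of the unselected population.
sigma = 1.0
delta_responders = 0.5
p_responders = 0.4

# Enriched trial: enroll (mostly) responders, so the full effect is visible.
enriched = n_per_arm(delta_responders, sigma)

# Unselected trial: the average effect is diluted to p * delta.
unselected = n_per_arm(p_responders * delta_responders, sigma)

print(enriched, unselected, unselected / enriched)
```

With these made-up inputs the unselected trial needs about six times as many patients per arm, close to 1/0.4² = 6.25. The trade-off is that an enriched trial's results speak most directly to the enriched population.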