I’ve always been supportive of efforts to share data. For me, it’s a bit selfish, because I want to find interesting real world examples to use in teaching and on my web pages. But the issue goes way beyond this, of course. Sharing data is an ethical imperative, especially for federally funded research or research that relies on volunteer subjects. It has led to many important discoveries beyond the realm of the original context in which the data was collected. In order for data sharing to be effective, you need to embrace four guiding principles: your data needs to findable, accessible, interoperable, and re-usable. This paper highlights those principles and offers some current examples of data sharing systems.
This page outlines some of the fundamental properties of SQL programming that you need to know as you start learning SQL. For example, SQL is a declarative language, meaning that you tell it what you want and not how to compute it. Also SQL syntax is not well-ordered, meaning that the order in which SQL statements are evaluated is not the same as the order that they appear. Continue reading
I’m an old dog R programmer who tends to rely on features of R that were available 10 years ago (an eternity for computers). But it’s time for this old dog to learn new tricks. One thing I need to use in my R programs is called a “tibble” (sometimes called a “tidy tibble”). It’s a minor but important improvement on data frames and many of the newer packages are using tibbles instead of data frames. Tibbles are available in the package, tibble. This web page offers a nice description of the improvements on tibbles. Continue reading
Suppose you’re giving a talk and using R Studio. You want to make the fonts a bit larger so your audience can read them. It’s easy to do, once you know where to look. Continue reading
One of the recent developments in R that I was unaware of until I attended some talks at the Joint Statistical Meetings was the use of dplyr and pipes. It’s an approach to data management that isn’t different from earlier approaches, but the code is much easier to read and maintain. This blog post explains in simple terms how these work and why you would use them. Continue reading
Hadley Wickham has written many popular R packages, so many that they are sometimes referred to as the “Hadleyverse.” This is a nice biography that emphasizes the impact that Dr. Wickham has had on R. Continue reading
I’m on various email discussion groups and every once in a while someone sends out a request that sounds something like this.
I’m teaching a class (or running a journal club or giving a seminar) on research design (or evidence based medicine or statistics) and I’d like to find an example of a research study that use bad statistical analysis.
And there’s always a flood of responses back. But if I were less busy, I’d jump into the conversation and say “Stop! Don’t do it!” Here’s why. Continue reading
I have not viewed this video yet, but have attended a similar talk and read a similar research paper by Keith Baggerly. His general message is that large biological and genetic experiments are sometimes designed so poorly as to invalidate the results. You can often discover these design flaws through a careful examination of the data sets themselves and their metadata. This process of uncovering design flaws is sometimes called “Forensic Statistics.” Continue reading
I noticed several talks at the JSM 2016 on enrichment designs. I was only very vaguely familiar with what this meant, so I did a quick Google search. I found this very nice non-technical overview. Continue reading