Recommended: Published methodological quality of randomized controlled trials does not reflect the actual quality assessed in protocols

When evaluating a series of research articles, you often have to assess the quality of the individual papers based on features such as the type of blinding. What do you do if the paper does not discuss these items? I have usually advocated a “no news is bad news” policy: if a paper does not mention blinding, assume that no blinding was done. It seems reasonable, but the paper by Mhaskar et al provides empirical evidence that sometimes authors leave out information that would strengthen the credibility of their study. A similar paper is at https://www.ncbi.nlm.nih.gov/pubmed/22424985.

PMean: By the skin of my teeth

I have to brag a bit. I’m working part-time at Kansas University Medical Center (along with a couple of other part-time jobs), and two weeks ago my boss asked me if I was interested in writing a paper on the data analyses I had been working on. It would be submitted to the AMIA 2017 Joint Summit on Translational Research and I’d be the first author.

PMean: Turning off large blocks of an R Markdown document

When you’re running a large and complicated program using R Markdown, you can use the cache chunk option to save a lot of time. With cache=TRUE, knitr will notice if a program chunk has stayed the same and avoid running it again. I tend to avoid using the cache option, though, because sometimes it fails to execute something that you want executed, even though it looks on the surface like nothing has changed. So I created some simple program chunks that allow me to explicitly turn off parts of the R Markdown program that I don’t need to evaluate at the time. Think of it as a manual cache.

It’s a very simple thing, but one which confounded me for a while, so I am writing about it here. That way I won’t forget six months down the road.

Recommended: Diverse Perspectives on a Flipped Biostatistics Classroom

This article is a synthesis of a panel discussion at the 2014 Joint Statistical Meetings on the flipped classroom. The article discusses it solely from the perspective of Biostatistics classes, though the authors offer some references for the flipped classroom in a more general setting. A flipped classroom is a course where the traditional didactic lectures are recorded and watched at home and the homework that would normally be done at home is done instead in the classroom. This homework in a Biostatistics class often takes the form of active learning in small groups, such as critiquing published research studies or conducting analyses on real-world data sets. The key component, according to the authors, is the in-class interactions during these assignments. Students learn from each other as they work in groups.

Now you could do active learning in a traditional course format. What a flipped classroom does is increase both the emphasis on active learning and the amount of time spent in it.

The common theme of the paper is that the flipped classroom has been successfully applied in a variety of settings. It is not a “one size fits all” approach, but rather can be adapted to the needs of the particular class. Some students may not like the flipped classroom format, and you shouldn’t underestimate the amount of time needed to prepare the videotaped lectures (one rule of thumb is ten hours of work for every hour of video). Still, the student reactions and the instructors’ perceptions of the flipped classroom are generally positive.

PMean: Merging in dplyr is a lot faster

At the Joint Statistical Meetings, I found out that the advantages of some of the new libraries for data manipulation (like dplyr and tidyr) go beyond just the flexibility of the new methods of data manipulation. These libraries produce code that is easier to read and that also runs a lot faster. I did not appreciate how much faster until I tried a test today.

Recommended: What I need from statisticians

This interview with Nate Silver was conducted shortly after his keynote address at the 2013 Joint Statistical Meetings. I was at those meetings, but was stuck in a class (a very good class by the way, but I still felt stuck) on software engineering for statisticians. This article summarizes the main points of Mr. Silver’s keynote address and adds some extra insights through an interview after the speech. The best part was the quote at the end.

When asked, “Data science is the term of the day. Do you think there is a difference between data science and statistics?” Silver replied, “I think data-scientist is a sexed up term for a statistician.” The reaction from the audience was, for most, one of instantaneous laughter and applause. “Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.”

If Nate Silver can say something this controversial, then maybe I shouldn’t be so bashful.

Recommended: The FAIR Guiding Principles for scientific data management and stewardship

I’ve always been supportive of efforts to share data. For me, it’s a bit selfish, because I want to find interesting real-world examples to use in teaching and on my web pages. But the issue goes way beyond this, of course. Sharing data is an ethical imperative, especially for federally funded research or research that relies on volunteer subjects. It has led to many important discoveries beyond the original context in which the data was collected. In order for data sharing to be effective, you need to embrace four guiding principles: your data needs to be findable, accessible, interoperable, and re-usable. This paper highlights those principles and offers some current examples of data sharing systems.


Recommended: 10 Easy Steps to a Complete Understanding of SQL

This page outlines some of the fundamental properties of SQL programming that you need to know as you start learning SQL. For example, SQL is a declarative language, meaning that you tell it what you want and not how to compute it. Also, SQL syntax is not well-ordered, meaning that the order in which the clauses of a SQL statement are evaluated is not the same as the order in which they appear.
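
Here is a small sketch of that last point, run through Python’s built-in sqlite3 module. The visits table and its columns are made up for illustration, and the conventional logical order (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY) is assumed rather than taken from the page itself. The query is written with SELECT first, yet WHERE filters rows before any grouping happens, which is why an aggregate is rejected in WHERE but accepted in HAVING.

```python
import sqlite3

# Hypothetical data: one row per patient visit (names invented for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient TEXT, clinic TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("a", "north"), ("b", "north"), ("c", "south"),
     ("d", "north"), ("e", "south"), ("f", "east")],
)

# Declarative: the query states the result wanted, not how to compute it.
# Written order:  SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY.
# Logical order:  FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.
query = """
SELECT clinic, COUNT(*) AS n_visits
FROM visits
WHERE clinic <> 'east'
GROUP BY clinic
HAVING COUNT(*) >= 2
ORDER BY n_visits DESC
"""
rows = conn.execute(query).fetchall()
print(rows)

# WHERE is evaluated before the groups exist, so an aggregate there is an error.
try:
    conn.execute("SELECT clinic FROM visits WHERE COUNT(*) >= 2 GROUP BY clinic")
    where_allows_aggregate = True
except sqlite3.OperationalError as err:
    where_allows_aggregate = False
    print("Rejected:", err)
```

ORDER BY can refer to the n_visits alias because it is evaluated after SELECT. One caveat: SQLite is more lenient than some other databases about where aliases may appear, so a sketch like this may behave differently on another engine.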

Recommended: Tibbles (Tibbles are a modern take on data frames)

I’m an old dog R programmer who tends to rely on features of R that were available 10 years ago (an eternity for computers). But it’s time for this old dog to learn new tricks. One thing I need to use in my R programs is called a “tibble” (sometimes called a “tidy tibble”). It’s a minor but important improvement on data frames, and many of the newer packages are using tibbles instead of data frames. Tibbles are available in the tibble package. This web page offers a nice description of the improvements that tibbles make over data frames.