Category Archives: Recommended

Recommended: beanumber repository

This is the GitHub repository of Ben Baumer. He is one of the co-authors of “Modern Data Science with R”, and the data and code from that book are available here. He also provides code and data for OpenWAR, an open source method for calculating a baseball statistic, Wins Above Replacement. Finally, there is an R library for extracting, transforming, and loading “medium” sized datasets into SQL. Medium here means multi-gigabyte files. Related to this are a couple of “medium” sized data sets from the Internet Movie Database and from NYC CitiBike.

Recommended: When the revolution came for Amy Cuddy

This is one of the best articles I have ever read in the popular press about the complexities of the research process.

This article by Susan Dominus covers some high profile research by Amy Cuddy. She and two co-authors found that your body language not only influences how others view you, but also influences how you view yourself. Striking a “power pose”, something like “legs astride or feet up on a desk”, can improve your sense of power and control, and these subjective feelings are matched by physiological changes. Your testosterone goes up and your cortisol goes down. Both of these, apparently, are good things.

The research team publishes these findings in Psychological Science, a prominent journal in this field. The article receives a lot of press coverage. Dr. Cuddy becomes the public face of this research, most notably by garnering an invitation to give a TED talk, where she does a bang-up job. Her talk becomes the second most viewed TED talk of all time.

But there’s a problem. The results of the Psychological Science publication do not get replicated. One of the other two authors expresses doubt about the original research findings. Another research team reviews the data analysis and labels the work “p-hacking”.

It turns out that there is a movement in the research world to critically examine existing research findings and to see if the data truly support the conclusions that have been made. Are the people leading this movement noble warriors for truth, or are they shameless bullies who tear down peer-reviewed research in non-peer-reviewed blogs?

I vote for “noble warriors” but read the article and decide for yourself what you think. It’s a complicated area and every perspective has more than one side to it.

One of the noble warriors/shameless bullies is Andrew Gelman, a popular statistician and social scientist. He comments extensively on the New York Times article on his blog. His post is worth reading, as are the many comments that others have left on it. It’s also worth digging up some of his earlier commentary about Dr. Cuddy.

Recommended: Search for unpublished data by systematic reviewers: an audit

The authors looked at all systematic reviews (excluding methodological reviews) published in a few key journals, as well as a random sample of Cochrane reviews, to see how often the authors tried to search for unpublished data. The answer is not often enough (64% or 130/203). The article also describes the success rate in getting unpublished data when the attempt was made (89% or 116/130) and how often authors found evidence of publication bias when they did such an assessment (40% or 27/68). Although some people have argued that it is not that important to search for unpublished data, this is still a big concern. A closely related article is Searching for unpublished data for Cochrane reviews: cross sectional study.
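The three percentages quoted in this summary can be checked against their counts. This is only a minimal sketch: the counts are the ones given above, while the labels and the `percent` helper are my own shorthand and not anything from the article.

```python
# Counts quoted in the summary, as (numerator, denominator).
# The labels are shorthand chosen for this sketch, not the article's wording.
reported = {
    "searched for unpublished data": (130, 203),
    "obtained data when they tried": (116, 130),
    "found evidence of publication bias": (27, 68),
}


def percent(numerator, denominator):
    """Return the proportion as a percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


percentages = {label: percent(n, d) for label, (n, d) in reported.items()}

for label, value in percentages.items():
    n, d = reported[label]
    print(f"{label}: {n}/{d} = {value}%")
```

The rounded values agree with the 64%, 89%, and 40% quoted in the summary.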

Recommended: OpenRefine: A free, open source, powerful tool for working with messy data

I have not had a chance to use this, but it comes highly recommended. OpenRefine is a program that uses a graphical user interface to clean up messy data, but it saves all the cleanup steps to ensure that your work is well documented and reproducible. I listed Martin Magdinier as the “author” in the citation below because he has posted most of the blog entries about OpenRefine, but there are many contributors to this package and website.

Recommended: How to increase value and reduce waste when research priorities are set

This is the first in a series of articles on reducing waste in research. It focuses on funding agencies and recommends that funders support more work on making research replicable, be more transparent about how they set priorities, make sure that research proposals are justified through a systematic review of previous research, and promote greater openness about research in progress to encourage collaboration. Other articles in this series cover research design, conduct, and analysis; regulation and management; inaccessible research; and incomplete reports of research.

Recommended: Randomized Controlled Trials in Health Insurance Systems

While researchers often use data from health insurance systems to conduct observational studies, the authors of this research paper point out that you can also conduct randomized trials. You can randomly assign different levels of insurance coverage and then use claims data to evaluate how much difference, if any, the level of coverage makes. This approach is attractive because you do not need a lot of resources, and you can very quickly get a very large sample size. Since insurance data are collected for administrative needs rather than research needs, you have to contend with inaccurate or incomplete data, which can cause a loss of statistical efficiency or produce biased results. The authors offer some interesting examples of actual studies, propose new potential studies, and offer general guidance on how to conduct a randomized trial within a health insurance system.