Category Archives: Recommended

Recommended: Network analysis in cross-sectional data using R

This page is moving to a new website.

These are the slides for a very nice webinar presented by Eiko Fried. Dr. Fried provided a wealth of resources during his webinar (some of these are behind paywalls).

He offered examples of network analysis in the study of bereavement and depression and of post-traumatic stress disorder. He also provided tutorial papers on network models with binary data and on regularized partial correlation networks, as well as a nice general overview of network models in mental health. He shared a blog post on the relationship between a latent variable model and a network model and a Facebook page on psychological dynamics. He also showed analyses from several R packages: qgraph, IsingFit, and bootnet. I'm putting those links here so I don't lose track of them when I revisit this stuff six months from now.
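If you want to try this yourself, here is a minimal sketch of the kind of analysis those packages support: a regularized partial correlation network estimated with bootnet and then bootstrapped for stability. The data frame df is a placeholder for your own item-level data, and the options shown are common defaults, not Dr. Fried's exact settings.

# Estimate and plot a regularized partial correlation network.
# `df` is assumed to be a data frame of item responses (one column per item).
library(bootnet)
library(qgraph)

# EBIC-tuned graphical lasso, the usual choice for continuous or ordinal items
# (the IsingFit package covers the binary case)
network <- estimateNetwork(df, default = "EBICglasso")
plot(network, layout = "spring")

# Bootstrap the edge weights to see how stable they are
boots <- bootnet(network, nBoots = 1000)
plot(boots, order = "sample")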

Continue reading

Recommended: Can A.I. be taught to explain itself?

This page is moving to a new website.

This is a nice article in the popular press that talks about some of the problems with “black box” models (in particular deep neural nets) used extensively in many big data projects. It is a bit shy on technical details, which is understandable for a paper like the New York Times. Even so, the stories are quite intriguing. This is a wake-up call for those people who fail to recognize the serious problems with many big data models.

Continue reading

Recommended: beanumber repository

This page is moving to a new website.

This is the GitHub repository of Ben Baumer. He is one of the co-authors of “Modern Data Science with R,” and the data and code from that book are available here. He also provides code and data for OpenWAR, an open-source method for calculating a baseball statistic, Wins Above Replacement. Finally, there is an R library for extracting, transforming, and loading “medium” sized datasets into SQL; medium here means multi-gigabyte files. Related to this are a couple of “medium” sized datasets, one from the Internet Movie Database and one from the NYC CitiBike system.

Continue reading

Recommended: When the revolution came for Amy Cuddy

This page is moving to a new website.

This is one of the best articles I have ever read in the popular press about the complexities of the research process.

This article by Susan Dominus covers some high-profile research by Amy Cuddy. She and two co-authors found that your body language not only influences how others view you, but also influences how you view yourself. Striking a “power pose,” meaning something like “legs astride or feet up on a desk,” can improve your sense of power and control, and these subjective feelings are matched by physiological changes: your testosterone goes up and your cortisol goes down. Both of these, apparently, are good things.

The research team publishes these findings in Psychological Science, a prominent journal in this field. The article receives a lot of press coverage. Dr. Cuddy becomes the public face of this research, most notably by garnering an invitation to give a TED talk, where she does a bang-up job. Her talk becomes the second most viewed TED talk of all time.

But there’s a problem. The results of the Psychological Science publication do not get replicated. One of the other two authors expresses doubt about the original research findings. Another research team reviews the data analysis and labels the work “p-hacking”.

The term “p-hacking” is fairly new, but other terms, like “data dredging” and “fishing expedition,” have been around for a lot longer. There's a quote attributed to the economist Ronald Coase that is commonly cited in this context: “If you torture the data long enough, it will confess to anything.” I have described it as “running ten tests and then picking the one with the smallest p-value.” Also relevant is this XKCD cartoon.
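A quick simulation shows why that habit is a problem. This is just an illustrative sketch, not anything from the article: run ten independent tests on pure noise, keep only the smallest p-value, and you will “find” something at the 0.05 level roughly 1 - 0.95^10, or about 40%, of the time.

# Simulate "run ten tests and pick the smallest p-value" when nothing is real.
set.seed(42)
n_sims  <- 10000
n_tests <- 10

min_p <- replicate(n_sims, {
  p_values <- replicate(n_tests, {
    # two groups drawn from the same distribution, so every null is true
    t.test(rnorm(30), rnorm(30))$p.value
  })
  min(p_values)
})

mean(min_p < 0.05)  # roughly 0.40, far above the nominal 0.05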

If p-hacking is a real thing (and there's some debate about that), then it is a lot more subtle than the quotes and cartoon mentioned above. You can find serious and detailed explanations in a FiveThirtyEight article by Christie Aschwanden and in a 2015 PLOS article by Megan Head et al.

If p-hacking is a problem, then how do you fix it? It turns out that there is a movement in the research world to critically examine existing research findings and to see if the data truly supports the conclusions that have been made. Are the people leading this movement noble warriors for truth or are they shameless bullies who tear down peer-reviewed research in non-peer-reviewed blogs?

I vote for “noble warriors” but read the article and decide for yourself what you think. It’s a complicated area and every perspective has more than one side to it.

One of the noble warriors/shameless bullies is Andrew Gelman, a popular statistician and social scientist. He comments extensively on the New York Times article at his blog, and both his post and the many comments that others have left on it are worth reading. It's also worth digging up some of his earlier commentary about Dr. Cuddy.

Continue reading

Recommended: Search for unpublished data by systematic reviewers: an audit

The authors looked at all systematic reviews (excluding methodological reviews) published in a few key journals as well as a random sample of Cochrane reviews to see how often the authors tried to search for unpublised data. The answer is not often enough (64% or 130/203). The article also describes the success rate in getting unpublished data when the attempt was made (89% or 116/130) and how often authors found evidence of publication bias when they did such an assessment (40% or 27/68). Although some people have argued that it is not that important to search for unpublised data, this is still a big concern. A closely related article is Searching for unpublished data for Cochrane reviews: cross sectional study. Continue reading