While researchers often use data from health insurance systems to conduct observational studies, the authors of this research paper point out that you can also conduct randomized trials. You can randomly assign different levels of insurance coverage and then use claims data to evaluate how much difference, if any, the level of coverage makes. This approach is attractive because you do not need a lot of resources, and you can very quickly get a very large sample size. Because insurance data are collected for administrative rather than research purposes, however, you have to contend with inaccurate or incomplete data, which can cause a loss of statistical efficiency or produce biased results. The authors offer some interesting examples of actual studies, propose new potential studies, and offer general guidance on how to conduct a randomized trial within a health insurance system.
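The basic design described above can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not the authors' method: the member IDs, the two coverage arms named "low" and "high", and the claims amounts are all hypothetical. Members without a claims record are simply dropped, which mimics incomplete administrative data.

```python
import random
import statistics


def randomize(member_ids, arms, seed=0):
    """Assign each member to one coverage arm by simple randomization."""
    rng = random.Random(seed)
    return {member: rng.choice(arms) for member in member_ids}


def mean_claims_by_arm(assignment, claims):
    """Average claims per member in each arm.

    Members with no claims record are skipped, mimicking incomplete
    administrative data. Dropping them shrinks the sample (less efficiency)
    and can bias the comparison if missingness depends on the arm.
    """
    by_arm = {}
    for member, arm in assignment.items():
        if member in claims:
            by_arm.setdefault(arm, []).append(claims[member])
    return {arm: statistics.mean(values) for arm, values in by_arm.items()}


# Hypothetical example: 1,000 members, two coverage levels.
members = range(1000)
assignment = randomize(members, ["low", "high"], seed=42)

# Simulated (made-up) annual claims: every member in an arm gets the same
# amount, and every tenth member has no claims record at all.
claims = {
    m: (100.0 if arm == "low" else 150.0)
    for m, arm in assignment.items()
    if m % 10 != 0
}

means = mean_claims_by_arm(assignment, claims)
print(means)
print("Estimated difference:", means["high"] - means["low"])
```

Because the simulated claims are constant within each arm, the estimated difference here is exactly 50.0; with real claims data it would vary, and the precision of the estimate would depend on how many usable records survive.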
Through the efforts of a team of statisticians with the American Statistical Association, the New York Times is producing a new resource for educators called “What’s Going On in This Graph?”. This is similar to another New York Times effort called “What’s Going On in This Picture?”.
Every month the New York Times will publish a graph stripped of some key information and ask three questions: What do you notice? What do you wonder? What do you think is going on in this graph?
The content will be suitable for middle school and high school students, but I suspect that even college students will find the exercise interesting.
The first graph will appear on September 19, and new graphs will follow on the second Tuesday of every month afterward.
This is a nice example of using R for text mining of Twitter feeds, and the author gives lots of links and hints on how you could do something similar.
There is more than one way to approach a data analysis, and some approaches make your work easier to modify and update, and more reproducible. This paper describes the steps the authors recommend, based on years of teaching Software Carpentry and Data Carpentry classes.
I’ve been looking for something like this for a while. It is a repository for data sets associated with peer-reviewed publications. I have only glanced at it briefly, but it looks fairly easy to use and has a fair number of interesting data sets and publications.
This is a nice summary of the prosecution of a statistician, Andreas Georgiou, who was only doing his job.
I haven’t had a chance to test this code, but it looks pretty good for anyone who wants to analyze one of the dozens of large databases produced by the U.S. Government.
I attended a talk about a decade ago on the problems with for-profit publishing of scientific research and the need to aggressively adopt the open source publication model. It was a message I was ready for, because I had benefited greatly from citing open source resources on my website. I knew that if I cited an open source resource, anyone anywhere could look it up. They didn’t need access to a university library.
This article explains how the for-profit research journals (perhaps better described as a reader-pays model, in contrast to an author-pays model) developed a system that locked research libraries into their products and then hiked the prices. Then they developed journal bundles that squeezed libraries further by forcing them into a take-it-all-or-leave-it-all system that devastated their budgets.
There is still a struggle between the reader-pays model of for-profit publishing and the author-pays model of open source publishing. I believe there is room for both approaches, though I would argue that we need to promote open source publishing more aggressively than we currently do.
This article provides a very nice historical context to the development of for-profit publishing in scientific research. It oversimplifies things, perhaps, and may be a bit too harsh, but it is definitely worth reading.
As an ironic footnote, newspapers have been devastated by the Internet because readers expect all of their content to be available for free. There is a note at the bottom of the Guardian article that reads: “Since you’re here we have a small favour to ask. More people are reading the Guardian than ever but advertising revenues across the media are falling fast. And unlike many news organisations, we haven’t put up a paywall – we want to keep our journalism as open as we can. So you can see why we need to ask for your help. The Guardian’s independent, investigative journalism takes a lot of time, money and hard work to produce. But we do it because we believe our perspective matters – because it might well be your perspective, too.”
Take some time to read this and think about it. I normally ignore pitches like this on Wikipedia and elsewhere, but the irony of citing a newspaper article available for free to criticize for-profit research publishing got to me, so I became a supporter of the Guardian at $6.99 per month.
This is yet another interesting source of data. This site specializes in databases prepared by the United States government.