I have not had time to preview this software, but it looks very interesting. It takes large problems and converts them to a form suitable for parallel processing, not by changing the underlying algorithm, which would be very messy, but by splitting the data into subsets, analyzing each subset, and recombining the results. Such a method, “Divide and Recombine,” should work well for some analyses, but perhaps not so well for others. It is based on the R programming language. If I get a chance to work with this software, I’ll let you know what I think. Continue reading
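The divide-and-recombine idea can be sketched in a few lines. This is a toy Python illustration with made-up data, not the actual R-based software; the per-subset analysis step is the part that can run in parallel.

```python
# Toy divide-and-recombine sketch: estimate a mean by splitting the
# data into subsets, analyzing each subset independently (the step
# that parallelizes), and recombining the per-subset results.
data = list(range(1, 101))   # stand-in for a large dataset

k = 4                        # number of subsets
subsets = [data[i::k] for i in range(k)]         # divide

partials = [(sum(s), len(s)) for s in subsets]   # analyze each subset

total = sum(t for t, _ in partials)              # recombine
n = sum(c for _, c in partials)
print(total / n)   # 50.5, identical to the all-data mean here
```

For a mean the recombination is exact; for more complex analyses (e.g. regression coefficients) the recombined estimate is generally an approximation, which is presumably why the method suits some analyses better than others.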
Category Archives: Recommended
Recommended: Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement
If you are writing up a paper that uses a complex regression model (complex meaning multiple independent variables), you need to document information that allows the reader to assess the quality of the predictions that your model would produce. This paper provides a checklist of things that you need to document in such a paper, and does for this type of research what the CONSORT guidelines do for randomized trials. Continue reading
Recommended: In search of justification for the unpredictability paradox
This is a commentary on a 2011 Cochrane Review that found substantial differences between studies that were adequately randomized and those that were not. The direction of the difference was not predictable, however, meaning that there was no consistent bias, on average, towards either overstating or understating the treatment effect. This led the authors of the Cochrane Review to conclude that “the unpredictability of random allocation is the best protection against the unpredictability of the extent to which non-randomised studies may be biased.” The authors of the commentary critique this conclusion on several grounds. Continue reading
Recommended: Requiring fuel gauges. A pitch for justifying impact evaluation sample size assumptions
This blog entry from the International Initiative for Impact Evaluation discusses deficiencies in many of the research proposals sent to that organization. The proposals rely too heavily on standardized effect sizes, which are impossible to interpret and often misleading. The authors also criticize the intraclass correlation coefficients (ICCs) included in the sample size justifications for many cluster-based or hierarchical research designs: the ICCs, they say, often seem to be pulled out of thin air. The ICC can be a hard number to obtain, so they suggest that you consider a range of ICCs in your calculations or that you run a pilot study. Continue reading
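One way to follow that advice is to carry the sample size calculation across a range of plausible ICCs using the standard design effect formula, DEFF = 1 + (m − 1) × ICC. The numbers below are hypothetical, purely for illustration.

```python
# Hypothetical illustration: how the assumed ICC inflates the sample
# size a cluster design needs, via the design effect.
n_srs = 200   # subjects needed under simple random sampling (assumed)
m = 20        # planned number of subjects per cluster (assumed)

for icc in (0.01, 0.05, 0.10):
    deff = 1 + (m - 1) * icc   # design effect
    print(f"ICC={icc:.2f}: DEFF={deff:.2f}, need about {round(n_srs * deff)}")
```

Even modest ICCs roughly double or triple the required sample size in this example, which is why a value pulled out of thin air matters.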
Recommended: What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum
Tim Hesterberg has been a long-time advocate of the use of the bootstrap. In this article, he provides a nice general overview of the bootstrap with examples of how it works in several common settings that might be covered in an introductory-level statistics class. Continue reading
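To give a flavor of the method, here is a generic percentile bootstrap for the mean (my own toy sketch with made-up data, not code from the article):

```python
import random

# Percentile bootstrap for the mean: resample with replacement,
# recompute the statistic each time, and read a confidence interval
# off the percentiles of the bootstrap distribution.
random.seed(1)                                     # for reproducibility
sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]   # made-up data
B = 2000                                           # bootstrap resamples

boot_means = []
for _ in range(B):
    resample = [random.choice(sample) for _ in sample]
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
lo, hi = boot_means[int(0.025 * B)], boot_means[int(0.975 * B) - 1]
print(f"approximate 95% CI for the mean: ({lo:.1f}, {hi:.1f})")
```

The appeal for teaching is that the same resampling loop works for medians, correlations, or any other statistic: only the line computing the statistic changes.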
Recommended: Communicating Statistical Findings to Consulting Clients Operating in a Decisionmaking Climate: Best and Worst Practices
There were a large number of excellent talks at the 2014 Joint Statistical Meetings. This session discussed practical issues associated with communication. Although I did not attend it, it looks pretty good, and the speakers have placed all their slides in a single location. Continue reading
Recommended: Special issue–Using Big Data to Transform Care
The July 2014 issue of Health Affairs is devoted entirely to “big data”. The articles provide a general overview of big data, several applications of big data, big data and genomics, the use of electronic health records, and ethical issues, including privacy concerns. For now, at least, the articles are available for free to any user. Continue reading
Recommended: MLPowSim software
This site provides a description of a free software package, MLPowSim, that calculates power for complex random effects models. It was developed by the Centre for Multilevel Modelling, the same group that developed the MLwiN package for the analysis of complex random effects models. Continue reading
Recommended: Comparisons within randomised groups can be very misleading
In studies with a baseline measurement, examining the decline exclusively within the treated group, or examining the decline in the treated group and then separately examining the decline in the control group, is a bad idea, note two famous statisticians in the British Medical Journal. They explain why you need instead to look at comparisons between the two groups, ideally with analysis of covariance. Continue reading
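A toy numeric illustration of their point (made-up numbers; the paper recommends analysis of covariance, which is only gestured at here by a simple difference in mean changes): both groups can show an impressive decline on their own, while the between-group comparison, the quantity that actually matters, is small.

```python
# Made-up (baseline, follow-up) pairs for a treated and a control group.
treated = [(140, 130), (150, 138), (145, 136)]
control = [(142, 133), (148, 139), (146, 138)]

def mean_change(group):
    """Average baseline-to-follow-up decline within a group."""
    return sum(b - f for b, f in group) / len(group)

# Both groups decline, so a within-group test in either arm can look
# impressive; the treatment effect is the *difference* between declines.
print(round(mean_change(treated), 2))                         # 10.33
print(round(mean_change(control), 2))                         # 8.67
print(round(mean_change(treated) - mean_change(control), 2))  # 1.67
```

Reporting only the first number (or the first two separately) would overstate what the treatment accomplished; the third number is the between-group comparison.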
Recommended: FDA: R OK for drug trials
This blog post reviews a presentation by Jae Brodowsky, a statistician with the U.S. Food and Drug Administration, that put to bed the rumor that the FDA will only accept submissions where the data analysis was done in SAS. The summary does note that the FDA has certain regulatory requirements for R (or any other statistical package, including SAS). Continue reading