Category Archives: Recommended

Recommended: Data Sharing Network (SHRINE)

I’m giving a talk about i2b2 (among other things), and when browsing through their website, I came across an interesting project, SHRINE. This is an acronym for Shared Health Research Informatics NEtwork, and it represents a way of allowing users to review information across multiple i2b2 sites. It requires the individual institutions that have i2b2 systems to cooperate with one another, which is not always easy. But this has tremendous potential.

Recommended: TinyTeX: A lightweight, cross-platform, portable, and easy-to-maintain LaTeX distribution based on TeX Live

I’ve been using a version of LaTeX (MiKTeX) for a couple of years, and it’s not bad. But when I heard about Yihui Xie’s R package, tinytex, I jumped at the opportunity to try it. Dr. Xie is the author of knitr, a package that makes it easy to create well-documented R programs where the code and the output are gracefully merged. He created this new package, tinytex, because he felt that the current versions of LaTeX had complex installation processes and forced you to choose between a minimal installation that couldn’t do anything useful and a full installation that was bloated with features you’d never use. I can’t say too much about the package yet except that he is right in that it is very easy to install. If I find out more, I’ll let you know.

Recommended: EuSpRIG horror stories.

There has been a lot written about data management problems with using spreadsheets, and there is a group, the European Spreadsheet Risks Interest Group (EuSpRIG), that has documented the problem carefully and thoroughly. This page on their website lists the big, big, big problems that have occurred because of spreadsheet mistakes. Any program is capable of producing mistakes, of course, but spreadsheets are particularly prone to errors for a variety of reasons that this group documents.

Recommended: The Reinhart-Rogoff error – or how not to Excel at economics

There has been a lot written about how lousy Microsoft Excel (and other spreadsheet products) are at data management, but the warning sinks in so much more effectively when you can cite an example where the use of Excel leads to an embarrassing retraction. Perhaps the best example is the paper by Carmen Reinhart and Kenneth Rogoff where a major conclusion was invalidated when a formula inside their Excel spreadsheet accidentally included only 15 of the relevant 20 countries. Here’s a nice description of that event and some suggestions on how to avoid this in the future.
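To make the "15 of 20" problem concrete, here is a minimal sketch in Python. The numbers are made up purely for illustration (they are not the Reinhart-Rogoff data), and the function names `range_mean` and `checked_mean` are hypothetical. It shows how an average over a range that stops five rows short quietly gives a different answer, and how a simple row-count check would catch the mistake.

```python
from statistics import mean

# Made-up growth figures for 20 hypothetical countries (NOT real data).
growth = {f"country_{i:02d}": 0.5 * i - 3.0 for i in range(1, 21)}
values = list(growth.values())


def range_mean(rows, first_row, last_row):
    """Average rows first_row..last_row (1-indexed, inclusive),
    the way a spreadsheet range such as A1:A15 would."""
    return mean(rows[first_row - 1:last_row])


def checked_mean(rows, expected_count):
    """Average all rows, but fail loudly if the row count is not what we expect."""
    if len(rows) != expected_count:
        raise ValueError(f"expected {expected_count} rows, got {len(rows)}")
    return mean(rows)


truncated = range_mean(values, 1, 15)  # range that stops five rows short
full = checked_mean(values, 20)        # all 20 rows, with a sanity check

print("15 rows:", truncated)
print("20 rows:", full)
```

The spreadsheet gives no warning when the range is too short. The explicit count check in `checked_mean` is one cheap way to turn a silent error into a loud one.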

Recommended: Good Publication Practice for Communicating Company-Sponsored Medical Research: GPP3

Very little of my research fits into the category of company-sponsored medical research, but it is important to be aware of the special concerns and the extra oversight that this research requires. This article covers a consensus set of guidelines that make a lot of sense, in my opinion, to avoid some of the recent controversies about research abuses. It is also a pretty good guideline, for the most part, for other medical research beyond company-sponsored research.

Recommended: Textbook Examples Applied Survival Analysis

I’m teaching an online workshop for The Analysis Factor on survival analysis. It’s not announced yet, and I have a LOT of work to do before it is ready. One thing that will save me time is that I am taking many of my examples from the excellent textbook, Applied Survival Analysis, Second Edition. One nice perk of this book is that the helpful folks at UCLA have taken every textbook example and written up code (with comments!) to reproduce the book’s results. With the exception of a few advanced methods in later chapters, where only one or two software packages have the right capability, the code is written in parallel in R, SAS, SPSS, and Stata. They also have links to the raw data at the publisher’s website, and datasets stored in SAS format and SPSS format. How nice! Browse around and you’ll find software code for all the examples in other popular statistics textbooks as well.

Warning! The R examples look like they are from the first edition, not the second edition. A small nitpick for an otherwise very nice resource.

Recommended: Getting Started with the SAS 9.4 Output Delivery System

I don’t use SAS that much anymore. Not because it’s a bad program. Mostly it’s because it’s hard to keep on top of too many statistical packages all at once. But I’m teaching an Introduction to SAS class this semester, and I need to keep up with recent innovations. One of the more important of these is ODS, which is short for Output Delivery System. ODS allows you to customize the output using formats like HTML, RTF, PDF, or PostScript. ODS also produces PowerPoint and Excel files.

ODS also allows you to customize how your output appears. Finally, ODS makes some big changes to procedures that used to only produce printed output. With ODS enabled, these procedures will add in extra high-resolution plots, which you can also customize.

I do not know if the Introduction to SAS class should incorporate ODS or not. It’s similar to asking if the Introduction to R class should incorporate markdown documents or not. In general, I tend to think that we should teach plain vanilla versions of SAS and R, but I do worry that we may be missing something important if we don’t teach ODS or markdown.