This article describes how high maternal mortality rates are in the United States and how poorly we have done at trying to quantify those rates.
This is a classic data set for testing out image analysis. It contains 25,000 images, each labelled as either a dog or a cat. Telling them apart is easy for a human, but can you develop an algorithm that can tell the difference?
If you are interested in text mining, this is a good data set to start with. It is a bunch of text messages, each one line long, that have been classified by a human as either spam or ham (ham is a legitimate message).
I have to help write NIH grants from time to time, and I always need to keep front and center the criteria that NIH peer reviewers use when they evaluate grants. They look at five broad areas: significance, investigators, innovation, approach, and environment. This document explains what each of these five areas means.
This xkcd cartoon by Randall Munroe is released under a Creative Commons license, so I can hotlink the image directly. But if you go to the source, https://xkcd.com/327/, be sure to hover over the image for a second punch line.
I’m giving a talk about i2b2 (among other things), and while browsing through its website I came across an interesting project, SHRINE. This is an acronym for Shared Health Research Information NEtwork, and it lets users review information across multiple i2b2 sites. It requires the individual institutions that have i2b2 systems to cooperate with one another, which is not always easy. But this has tremendous potential.
This xkcd cartoon by Randall Munroe is released under a Creative Commons license, so I can hotlink the image directly. But if you go to the source, https://xkcd.com/1179/, be sure to hover over the image for a second punch line.
I’ve been using a version of LaTeX (MiKTeX) for a couple of years, and it’s not bad. But when I heard about Yihui Xie’s R package, tinytex, I jumped at the opportunity to try it. Dr. Xie is the author of knitr, a package that makes it easy to create well-documented R programs where the code and the output are gracefully merged. He created tinytex because he felt that existing LaTeX distributions had complex installation processes and forced you to choose between a minimal installation that couldn’t do anything useful and a full installation bloated with features you’d never use. I can’t say much about the package yet, except that he is right: it is very easy to install. If I find out more, I’ll let you know.
What percentage effort is reasonable for Biostatistics support on a research grant? The UC Davis Biostatistics Group recommends 10% as a bare minimum, 35-60% for straightforward projects with uncomplicated analyses, and 50-100%+ for large or complex projects. They give examples of large and complex projects: interim analysis, multi-site projects, development of novel statistical methods, and assembly of data from large, complex, or poorly documented administrative or survey data sets.
They also describe how to split the effort between a PhD Biostatistician, who supervises the overall effort, and an MS Biostatistician, who does most of the data management and statistical analysis.
Another point worth noting is that any grant listing less than 10% effort for a Biostatistician requires a special sign-off.
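If you want to sanity-check a budget against these numbers, the guideline can be encoded in a few lines. This is a hypothetical Python sketch of my own, not anything the UC Davis group publishes: the function name `check_effort`, the category labels, and the decision to treat the open-ended "100%+" as having no upper cap are all assumptions.

```python
"""Hypothetical checker for the UC Davis Biostatistics effort guidelines.

The category labels and function name are illustrative assumptions,
not part of the published guideline.
"""

# Suggested percentage effort (low, high) by project type.
# The upper bound for large or complex projects is "100%+", so None = no cap.
EFFORT_RANGES = {
    "straightforward": (35.0, 60.0),
    "large_or_complex": (50.0, None),
}

# Bare minimum; anything below this requires a special sign-off.
MINIMUM_EFFORT = 10.0


def check_effort(percent: float, project_type: str) -> list[str]:
    """Return warnings comparing a proposed effort to the guideline ranges."""
    if project_type not in EFFORT_RANGES:
        raise ValueError(f"unknown project type: {project_type!r}")

    warnings = []
    if percent < MINIMUM_EFFORT:
        warnings.append("below the 10% minimum: requires a special sign-off")

    low, high = EFFORT_RANGES[project_type]
    if percent < low:
        warnings.append(f"below the suggested {low:g}% for a {project_type} project")
    elif high is not None and percent > high:
        warnings.append(f"above the suggested {high:g}% for a {project_type} project")
    return warnings


if __name__ == "__main__":
    for effort, kind in [(5, "straightforward"), (45, "straightforward"), (40, "large_or_complex")]:
        print(effort, kind, check_effort(effort, kind))
```

An empty list means the proposed effort falls inside the suggested range. A 5% effort on a straightforward project gets two warnings: one for the sign-off threshold and one for falling short of the 35% floor.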
There has been a lot written about data management problems with spreadsheets, and one group, the European Spreadsheet Risks Interest Group, has documented the problem carefully and thoroughly. This page on their website lists the big, big, big problems that have occurred because of spreadsheet mistakes. Any program is capable of producing mistakes, of course, but spreadsheets are particularly prone to errors for a variety of reasons that this group documents.