I’m teaching an online workshop for The Analysis Factor on survival analysis. It’s not announced yet, and I have a LOT of work to do before it is ready. One thing that will save me time is that I am taking many of my examples from the excellent textbook, Applied Survival Analysis, Second Edition. One nice perk of this book is that the helpful folks at UCLA have taken every textbook example and written up code (with comments!) to reproduce the book’s results. With the exception of a few advanced methods in later chapters, where only one or two software packages have the right capability, the code is written in parallel in R, SAS, SPSS, and Stata. They also have links to the raw data at the publisher’s website, and datasets stored in SAS format and SPSS format. How nice! Browse around and you’ll find software code for all the examples in other popular statistics textbooks as well.
Warning! The R examples look like they are from the first edition, not the second edition. A small nitpick for an otherwise very nice resource.
I don’t use SAS that much anymore. Not because it’s a bad program. Mostly it’s because it’s hard to keep on top of too many statistical packages all at once. But I’m teaching an Introduction to SAS class this semester, and I need to keep up with recent innovations. One of the more important of these is ODS, which is short for Output Delivery System. ODS lets you deliver your output in formats like HTML, RTF, PDF, or PostScript. It can also produce PowerPoint and Excel files.
ODS also allows you to customize how your output appears. Finally, ODS makes some big changes to procedures that used to only produce printed output. With ODS enabled, these procedures will add in extra high resolution plots, which you can also customize.
I do not know if the Introduction to SAS class should incorporate ODS or not. It’s similar to asking if the Introduction to R class should incorporate markdown documents or not. In general, I tend to think that we should teach plain vanilla versions of SAS and R, but I do worry that we may be missing something important if we don’t teach ODS or markdown.
I got this recommendation from a friend. IBM has a large number of free resources explaining things like cloud computing and blockchain. I’m most interested in their section on analytics. There’s a nice introduction, for example, to natural language processing.
This is a non-technical discussion of the difference between effectiveness and efficacy (two easily confused terms) in the context of vaccination. Short answer: efficacy is a measurement under ideal circumstances while effectiveness is a measurement in a “real-world” setting.
This is a list (with single-paragraph descriptions) of 186 women who have accomplished great things in the area of Analytics. There is an accompanying article at the Forbes magazine website, but it is very brief. The author of this list, Meta S. Brown, defines Analytics quite broadly, so the women have very diverse backgrounds and interests. I only recognized one name off the bat, Grace Wahba, an excellent researcher, but someone, unfortunately, that I haven’t met. If I get a chance, I’ll include in a separate blog post a list of outstanding women in Analytics that I HAVE met. Meta Brown’s list includes links so you can find out more about these talented women.
This is a nice explanation of what goes into a data dictionary, written from the perspective of research data management.
If you want to understand big data, you need to understand Hadoop. Hadoop is the technology underlying many big data efforts. But most of the descriptions of Hadoop are jargon-laden and impenetrable to newcomers. Well, maybe just impenetrable to this newcomer. But one great revelation to me was a historical note as to WHY there was a need to develop Hadoop. It was all those pages that had to be indexed by search engines at Google and Yahoo. So I went out to try to find more details. This article, with a ton of references throughout, is an excellent introduction to the precursors to Hadoop, the development of Hadoop itself, and the explosion of systems that used Hadoop as their foundation.
In many text mining or natural language processing applications, you will have problems with words that are very similar, but which are counted separately. An example might be the words win, winner, and winning. You can combine these words into a single category using stemming. This blog post gives a nice overview of stemming.
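To make the idea concrete, here is a minimal sketch of stemming in Python. It is a deliberately simplified toy, not the Porter stemmer or any other published algorithm: the function name `toy_stem`, the three suffixes it strips, and the sample sentence are all my own assumptions, chosen only so that win, winner, and winning land in the same category.

```python
from collections import Counter


def toy_stem(word):
    """Reduce a word to a crude stem (illustrative only, not a real stemmer)."""
    stem = word.lower()
    # Hypothetical, very short suffix list; real stemmers use many more rules.
    for suffix in ("ing", "er", "s"):
        # Only strip if at least three letters remain, so "her" stays "her".
        if stem.endswith(suffix) and len(stem) - len(suffix) >= 3:
            stem = stem[: -len(suffix)]
            # Undo consonant doubling left behind by the suffix: "winn" -> "win".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            break
    return stem


# Made-up sample text: three different surface forms of the same root.
tokens = "The winner keeps winning and will win again".split()

raw_counts = Counter(token.lower() for token in tokens)
stemmed_counts = Counter(toy_stem(token) for token in tokens)
```

Without stemming, `raw_counts` treats win, winner, and winning as three separate words with one occurrence each. After stemming, `stemmed_counts` combines them into a single entry. A toy like this will mangle plenty of real words, so for actual work you would want an established stemmer from a text mining library.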
I normally don’t recommend articles that are stuck behind paywalls, but this is an important article. It shows how 85% of a sample of research studies using the National Inpatient Sample database failed to follow at least one of seven well-documented practice recommendations of the Agency for Healthcare Research and Quality.
I’ve done a lot of work with Evidence-Based Health, but one big and largely unsolved problem is how to get health care professionals to change their practices once the evidence for these changes becomes obvious. If no one changes in the face of evidence, then all the effort to produce and critically appraise the evidence becomes worthless. A new field, implementation science, has been developed to study methods that encourage the adoption of new evidence-based practices. This paper outlines how implementation science is supposed to work and offers two real-world examples of implementation science studies.