Category Archives: Recommended

Recommended: A sampling of outstanding women in analytics

This is a list (with single-paragraph descriptions) of 186 women who have accomplished great things in the area of Analytics. There is an accompanying article at the Forbes magazine website, but it is very brief. The author of this list, Meta S. Brown, defines Analytics quite broadly, so the women have very diverse backgrounds and interests. I only recognized one name off the bat, Grace Wahba, an excellent researcher, but unfortunately someone I haven’t met. If I get a chance, I’ll write a separate blog post listing outstanding women in Analytics that I HAVE met. Meta Brown’s list includes links so you can find out more about these talented women.

Recommended: The history of Hadoop

If you want to understand big data, you need to understand Hadoop. Hadoop is the technology underlying many big data efforts. But most descriptions of Hadoop are jargon-laden and impenetrable to newcomers. Well, maybe just impenetrable to this newcomer. One great revelation to me was a historical note as to WHY there was a need to develop Hadoop: all those pages that had to be indexed by search engines at Google and Yahoo. So I went looking for more details. This article, with a ton of references throughout, is an excellent introduction to the precursors to Hadoop, the development of Hadoop itself, and the explosion of systems that used Hadoop as their foundation.

Recommended: Cleaning Words with R: Stemming, Lemmatization & Replacing with More Common Synonym

In many text mining or natural language processing applications, you will have problems with words that are very similar but are counted separately. An example might be the words win, winner, and winning. You can combine these words into a single category using stemming. This blog post gives a nice overview of stemming.
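To make the idea concrete, here is a minimal sketch of stemming in Python. It is not the R code from the linked post. The function name `toy_stem` and its three-suffix rule set are assumptions invented for this illustration. Real stemmers, such as the Porter algorithm, use much larger rule sets and may group these particular words differently.

```python
from collections import Counter

# Hypothetical suffix list for illustration only; real stemmers use many more rules.
SUFFIXES = ("ing", "er", "s")
VOWELS = "aeiou"


def toy_stem(word):
    """Strip one common suffix, then undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip if at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # "winn" -> "win": drop one letter of a doubled final consonant.
            if stem[-1] == stem[-2] and stem[-1] not in VOWELS:
                stem = stem[:-1]
            return stem
    return word


words = ["win", "winner", "winning", "Wins"]
# Counting stems instead of raw words puts all four forms in one category.
stem_counts = Counter(toy_stem(w) for w in words)
print(stem_counts)
```

Counting the stems rather than the raw words places all four word forms in a single category, which is the effect the paragraph above describes.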

Recommended: Adherence to Methodological Standards in Research Using the National Inpatient Sample

I normally don’t recommend articles that are stuck behind paywalls, but this is an important article. It shows that 85% of a sample of research studies using the National Inpatient Sample database failed to follow at least one of seven well-documented practice recommendations from the Agency for Healthcare Research and Quality.

Recommended: An introduction to implementation science for the non-specialist

I’ve done a lot of work with Evidence-Based Health, but one big and largely unsolved problem is how to get health care professionals to change their practices once the evidence for these changes becomes obvious. If no one changes in the face of evidence, then all the effort to produce and critically appraise the evidence becomes worthless. A new field, implementation science, has been developed to study methods that encourage the adoption of new evidence-based practices. This paper outlines how implementation science is supposed to work and offers two real-world examples of implementation science studies.

Recommended: Hi, I’m Mike Bostock.

This is an AMA (Ask Me Anything) session with Mike Bostock, a former graphics editor for the New York Times and creator of the d3.js data visualization package. I’ll be writing a few things about d3.js once I figure things out. Mike is someone worth watching, because he is working on high-visibility, high-impact stuff.

Recommended: How to be more effective in your professional life

Doug Zahn has done a tremendous amount of work on what I like to call the human factors in statistical consulting. He summarizes some key ideas in this article. His humorous anecdote about his prized Mustang car illustrates the tendency of all of us to be poor listeners. Pay special attention to Table 1, where he outlines the five steps you should always follow in any consulting interaction.