Author Archives: pmean

Recommended: A sampling of outstanding women in analytics

This is a list (with single paragraph descriptions) of 186 women who have accomplished great things in the area of Analytics. There is an accompanying article at the Forbes magazine website, but it is very brief. The author of this list, Meta S. Brown, defines Analytics quite broadly, so the women have very diverse backgrounds and interests. I only recognized one name off the bat, Grace Wahba, an excellent researcher, but someone that, unfortunately, I haven’t met. If I get a chance, I’ll include in a separate blog post a list of outstanding women in Analytics that I HAVE met. Meta Brown’s list includes links so you can find out more about these talented women.

PMean: Mixed up variable names in SAS

Some of my students in the Introduction to SAS class were having trouble reading in a tab-delimited text file, and it’s not too surprising, because some of the students in the Introduction to R class were having problems with the same file. Here are some details about the data set, what problems it caused, and a couple of ways that you could fix it.

Recommended: The history of Hadoop

If you want to understand big data, you need to understand Hadoop. Hadoop is the technology underlying many big data efforts. But most of the descriptions of Hadoop are jargon-laden and impenetrable to newcomers. Well, maybe just impenetrable to this newcomer. But one great revelation to me was a historical note as to WHY there was a need to develop Hadoop. It was all those pages that had to be indexed by the search engines at Google and Yahoo. So I went out to try to find more details. This article, with a ton of references throughout, is an excellent introduction to the precursors to Hadoop, the development of Hadoop itself, and the explosion of systems that used Hadoop as their foundation.

Recommended: Cleaning Words with R: Stemming, Lemmatization & Replacing with More Common Synonym

In many text mining or natural language processing applications, you will have problems with words that are very similar, but which are counted separately. An example might be the words win, winner, and winning. You can combine these words into a single category using stemming. This blog post gives a nice overview of stemming.
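
To make the idea concrete, here is a minimal sketch in Python of what stemming does to win, winner, and winning. This is a toy suffix stripper that I made up for illustration, not the Porter stemmer and not the R code from the recommended post. The suffix list and the names `toy_stem` and `count_stems` are assumptions, and a rule set this crude will mangle plenty of ordinary English words.

```python
from collections import Counter

# Hypothetical suffix list for illustration only; checked in this order
# so that "ers" is tried before "er" and "s".
SUFFIXES = ("ing", "ers", "er", "s")


def toy_stem(word):
    """Strip one common suffix, then undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # "winn" -> "win": collapse a doubled consonant left behind.
            if len(stem) >= 4 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def count_stems(words):
    """Count words after mapping each one to its toy stem."""
    return Counter(toy_stem(w) for w in words)


counts = count_stems(["win", "winner", "winning", "wins"])
print(counts)
```

All four words land in the single category "win", which is the point of stemming: the counts for near-identical words get pooled instead of being split across separate entries.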

Recommended: Adherence to Methodological Standards in Research Using the National Inpatient Sample

I normally don’t recommend articles that are stuck behind paywalls, but this is an important article. It shows how 85% of a sample of research studies using the National Inpatient Sample database failed to follow at least one of seven well-documented practice recommendations from the Agency for Healthcare Research and Quality.

PMean: Sentiment analysis of A Christmas Carol

I was at an interesting talk about sentiment analysis and decided to try something simple myself. Sentiment analysis is a text analytics method that compares text data with a list of words with positive or negative sentiments. The relative frequency of the positive or negative words is a crude measure of the general sentiment of the text item. I ran a sentiment analysis on the text of the famous Charles Dickens novel, A Christmas Carol.
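
The word-list comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis I actually ran: the tiny `POSITIVE` and `NEGATIVE` sets are made up (a real analysis would use a published sentiment lexicon), the function name `sentiment_score` is hypothetical, and the sample sentence is invented rather than quoted from the novel.

```python
import re

# Made-up miniature word lists, for illustration only.
POSITIVE = {"merry", "happy", "good", "joy", "kind"}
NEGATIVE = {"cold", "dark", "dead", "hard", "poor"}


def sentiment_score(text):
    """Return the relative frequency of positive and negative words."""
    # Lowercase the text and split it into runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"positive": 0.0, "negative": 0.0, "net": 0.0}
    n_pos = sum(w in POSITIVE for w in words)
    n_neg = sum(w in NEGATIVE for w in words)
    total = len(words)
    return {
        "positive": n_pos / total,
        "negative": n_neg / total,
        # Net score: positive share minus negative share.
        "net": (n_pos - n_neg) / total,
    }


# Invented sample text, not a quotation.
score = sentiment_score("merry merry cold day")
print(score)
```

For the four-word sample, two words are on the positive list and one is on the negative list, so the positive share is 0.5, the negative share is 0.25, and the net score is 0.25. Dividing by the total word count is what makes texts of different lengths comparable.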