Category Archives: Statistics

PMean: Sentiment analysis of A Christmas Carol

I was at an interesting talk about sentiment analysis and decided to try something simple myself. Sentiment analysis is a text analytics method that compares text data with a list of words with positive or negative sentiments. The relative frequency of the positive or negative words is a crude measure of the general sentiment of the text item. I ran a sentiment analysis on the text of the famous Charles Dickens novel, A Christmas Carol.
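As a rough illustration of the word-list approach described above, here is a minimal sketch in base R. The word lists, the function name `sentiment_score`, and the sample string are all made up for illustration; this is not the code from the post, and a real analysis would use a published sentiment lexicon and the full text of the novel.

```r
# Hypothetical word lists, for illustration only.
positive_words <- c("merry", "happy", "good", "joy")
negative_words <- c("humbug", "cold", "dismal", "miser")

# Relative frequency of positive and negative words in a text.
# (Hypothetical helper, not a function from any package.)
sentiment_score <- function(text, positive, negative) {
  # Lowercase, then split on anything that is not a letter or apostrophe.
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[words != ""]
  n_pos <- sum(words %in% positive)
  n_neg <- sum(words %in% negative)
  c(positive = n_pos / length(words),
    negative = n_neg / length(words),
    net      = (n_pos - n_neg) / length(words))
}

# A short sample string; replace with the text you want to score.
sample_text <- "A merry Christmas, uncle! Bah! Humbug!"
scores <- sentiment_score(sample_text, positive_words, negative_words)
print(scores)
```

The `net` value is positive when positive words outnumber negative ones and negative in the opposite case. It ignores negation and context, which is part of why the measure is crude.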

PMean: Do you need to name your function arguments in R?

If you program anything in R, you’ll end up calling a lot of functions. You pass your data or your constants to these functions, and you can do it in one of two ways. You can either pass the data/constants in the order in which the function expects the arguments or you can match each data/constant value with a particular argument name. This came up in the context of a question: do I need to save everything using

save.image(file="foo.RData")

or can I save it with

save.image("foo.RData")?
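A small sketch of how R matches arguments may help with this question. It relies on `file` being the first formal argument of `save.image`, which the code checks with `formals`. The function `describe` is a hypothetical stand-in, used so that nothing is written to disk.

```r
# Look up the name of the first formal argument of save.image().
# A single unnamed value is matched to this argument by position.
first_arg <- names(formals(save.image))[1]
print(first_arg)

# Hypothetical helper that only reports how its arguments were matched.
describe <- function(file, ascii = FALSE) {
  paste0("file=", file, ", ascii=", ascii)
}

by_name     <- describe(file = "foo.RData")   # matched by name
by_position <- describe("foo.RData")          # matched by position

# Named arguments are matched first; the remaining unnamed value
# then fills the first unmatched formal, which is `file` here.
swapped <- describe(ascii = TRUE, "foo.RData")

print(identical(by_name, by_position))
print(swapped)
```

Under these rules the two calls behave the same way. Naming the argument mostly matters for readability, or when you skip over earlier arguments.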

PMean: My work on a CTSA grant

I’m on a Clinical and Translational Science Award (CTSA) research grant (5UL1TR000001-05, formerly 1U54RR031295-01A1), which is pretty cool. My name is even mentioned a few times in the grant. I thought that as I plan what I would do for this grant, I would see what the grant promised and write down what, exactly, those promises mean. As I talk with various people (especially Russ Waitman, who is supervising my work on this grant), I will revise and update my plans. Still, I thought it would be valuable to put some thoughts down now, both to help me focus on what I should be doing and to offer an early draft of those ideas to the various people that I will end up interacting with.

PMean: SAS University. It’s SAS and it’s free

I am teaching a class, Introduction to SAS, that I helped design, but one where another faculty member did all the heavy lifting. I used to teach SAS classes, and I even helped organize a regional SAS conference, but stopped abruptly in 1998. So I’m relearning SAS, and one thing that is helping a lot is a product called SAS University, which allows you to use SAS for non-commercial purposes for free. Here’s how SAS University works.

PMean: Another big data publication

I dislike the term “big data” because it implies a class of problems that are immune from normal statistical considerations. I will admit that certain concepts, such as the p-value, become meaningless when you have millions of observations. But other concepts, like selection bias, become even more important for big data.

Anyway, I now have a second publication that is directly tied to the big data movement.