Tag Archives: Text mining

Recommended: GloVe word vector embeddings

This page is moving to a new website.

In text mining, you often want to reduce the dimensionality of your problem. The word2vec algorithm, developed by Tomas Mikolov and others at Google, offers a nice approach: it represents each word as a dense numeric vector. This page shows how to apply this algorithm within R. Continue reading →

Recommended: Cleaning Words with R: Stemming, Lemmatization & Replacing with More Common Synonym

In many text mining or natural language processing applications, you will have problems with words that are very similar, but which are counted separately. An example might be the words win, winner, and winning. You can combine these words into a single category using stemming. This blog post gives a nice overview of stemming. Continue reading →
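To make the idea concrete, here is a minimal sketch of stemming, written in Python rather than R for brevity. The function `toy_stem` and its three suffix rules are hypothetical and purely illustrative; this is not the Porter algorithm or the method from the linked post, and a real project would use an established stemmer.

```python
from collections import Counter


def toy_stem(word):
    """Crude illustrative stemmer: strip one common suffix, then undo doubling."""
    word = word.lower()
    # Strip at most one suffix, keeping at least three letters of the word.
    for suffix in ("ing", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Undo consonant doubling left behind by the suffix, e.g. "winn" -> "win".
    if len(word) >= 4 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


# The three words from the text collapse into a single category.
words = ["win", "winner", "winning"]
stem_counts = Counter(toy_stem(w) for w in words)
print(stem_counts)
```

With these rules, all three words are counted under the single stem "win". The rules are deliberately crude and will mangle many other words, which is why dedicated stemming and lemmatization tools exist.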

PMean: Sentiment analysis of A Christmas Carol

I was at an interesting talk about sentiment analysis and decided to try something simple myself. Sentiment analysis is a text analytics method that compares text data with a list of words with positive or negative sentiments. The relative frequency of the positive or negative words is a crude measure of the general sentiment of the text item. I ran a sentiment analysis on the text of the famous Charles Dickens novel, A Christmas Carol. Continue reading →
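The word-list approach described above can be sketched in a few lines, shown here in Python rather than R. The `POSITIVE` and `NEGATIVE` sets and the `sentiment_score` function are made-up placeholders, not the lexicon or code used in the post; a real analysis would use a published sentiment lexicon.

```python
import re

# Tiny hypothetical word lists, for illustration only.
POSITIVE = {"merry", "happy", "good", "joy"}
NEGATIVE = {"humbug", "dismal", "cold", "dead"}


def sentiment_score(text):
    """Return (positive count - negative count) / total words, or 0.0 if empty."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    return (positive - negative) / len(words)


print(sentiment_score("A merry Christmas"))
print(sentiment_score("Bah! Humbug!"))
```

A score above zero means positive words outnumber negative ones in the passage. As the text says, this is a crude measure: it ignores negation, sarcasm, and context.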

Recommended: Trump’s Android and iPhone tweets, one year later

This is a nice example of using R for text mining of Twitter feeds, and the author gives lots of links and hints on how you could do something similar. Continue reading →