Data Mining techniques on Twitter to discover user sentiment

Twitter provides a comprehensive web service API that can be used to retrieve its data. At the University of Waikato, some researchers used Weka (a popular Open Source Data Mining tool) to process real-time Twitter data. Their objective was to analyze users' sentiment, in other words to determine whether each individual post expressed a positive or a negative feeling.

For example, a message like "Of course I will" has a positive meaning, while "It's wrong" has a negative one. Emoticons such as :-) or :-| can be taken into account as well.
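To make the idea concrete, here is a minimal Python sketch of a rule-based labeler that scores a message from its emoticons and a few keywords. It is not the method used by the Waikato researchers: the word and emoticon lists and the function name label_message are hypothetical, chosen only for illustration.

```python
# Hypothetical, deliberately tiny lexicons (assumptions for illustration only).
POSITIVE_EMOTICONS = {":-)", ":)", ":-D"}
NEGATIVE_EMOTICONS = {":-(", ":("}
POSITIVE_WORDS = {"good", "great", "love"}
NEGATIVE_WORDS = {"wrong", "bad", "hate"}


def label_message(text: str) -> str:
    """Return 'positive', 'negative' or 'neutral' for a single message."""
    score = 0
    for token in text.split():
        # Emoticons are matched on the raw token so that case is preserved.
        if token in POSITIVE_EMOTICONS:
            score += 1
        elif token in NEGATIVE_EMOTICONS:
            score -= 1
        else:
            # Plain words are lowercased and stripped of common punctuation.
            word = token.lower().strip(".,!?\"'")
            if word in POSITIVE_WORDS:
                score += 1
            elif word in NEGATIVE_WORDS:
                score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for message in ["It's wrong", "See you tomorrow :-)", "Of course I will"]:
        print(message, "->", label_message(message))
```

With these toy lists, "Of course I will" comes out as neutral because none of its words appear in the lexicon. This is precisely the kind of message for which a classifier trained on real data is needed instead of fixed rules.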

By processing a sample of 2 million messages, they found that 15% of Twitter messages were positive and 85% negative.

This is an example of advanced Business Intelligence with Open Source tools and Data Mining techniques. The complete paper can be found at:

Data Mining for the Business Intelligence

Weka is an Open Source Data Mining tool developed at the University of Waikato (New Zealand). It is included in some Business Intelligence suites (only a few, to tell the truth, given the complexity of this kind of problem and the academic nature of Weka).

Data Mining is the process of extracting concise, useful information from a large amount of raw data. This discipline requires a good background in statistics and can be practiced with Open Source software such as R or Weka.

The process is not easy: it involves data sampling, data cleansing (Data Quality), and algorithm selection from a wide range of techniques. The Weka experiment is a good example of how challenging Business Intelligence can be.
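The three steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Weka workflow: the function names, the cleaning rules (dropping URLs and @mentions, removing duplicates) and the two toy classifiers are all hypothetical, and a real project would use far larger data and stronger algorithms.

```python
import random
import re
from collections import Counter, defaultdict


def sample_messages(messages, k, seed=0):
    """Data sampling: draw up to k messages at random, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(messages, min(k, len(messages)))


def clean_message(text):
    """Data cleansing: drop URLs and @mentions, lowercase, squeeze spaces."""
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"@\w+", "", text)
    return " ".join(text.lower().split())


def clean_dataset(labelled):
    """Clean every (text, label) pair and drop empty or duplicate texts."""
    seen, cleaned = set(), []
    for text, label in labelled:
        text = clean_message(text)
        if text and text not in seen:
            seen.add(text)
            cleaned.append((text, label))
    return cleaned


def majority_classifier(train):
    """Baseline: always predict the most frequent training label."""
    labels = [label for _, label in train]
    top = max(sorted(set(labels)), key=labels.count)
    return lambda text: top


def keyword_classifier(train):
    """Toy model: each known word votes for the labels it was seen with."""
    counts = defaultdict(Counter)
    for text, label in train:
        for word in text.split():
            counts[word][label] += 1
    fallback = majority_classifier(train)

    def predict(text):
        votes = Counter()
        for word in text.split():
            votes.update(counts.get(word, {}))
        return votes.most_common(1)[0][0] if votes else fallback(text)

    return predict


def accuracy(classifier, test):
    """Fraction of held-out messages labelled correctly."""
    return sum(classifier(text) == label for text, label in test) / len(test)


def select_algorithm(candidates, train, test):
    """Algorithm selection: keep the candidate with the best accuracy."""
    scores = {name: accuracy(build(train), test)
              for name, build in candidates.items()}
    return max(scores, key=scores.get), scores
```

A typical use would be to call clean_dataset on a labelled sample, split it into training and test parts, and pass a dictionary such as {"majority": majority_classifier, "keyword": keyword_classifier} to select_algorithm. Comparing against a trivial baseline like the majority classifier is a simple way to check that a more elaborate algorithm actually learns something.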

Weka provides a set of clustering and classification algorithms that can be used to perform sentiment analysis.

Weka Data Mining