How to: Sentiment Analysis of Tweets Using Java

Sentiment analysis of tweets is an interesting topic, so I will describe a simple but accurate process for doing it in Java, based on two libraries: Twitter4J (to collect tweets) and LingPipe (to classify their sentiment).
The code used in this tutorial can be downloaded from GitHub here: https://github.com/johnbyron/TwitterSentiment

Phase 1 - Train the classifier

In this phase you will learn how to train a sentiment classifier and save it to a file named "classifier.txt" for later use (despite the .txt extension, the file holds a serialized Java object, not plain text). If you don't want to train your own classifier, you can simply download one from here and skip to the next phase. That classifier was trained on random English-language tweets and distinguishes between positive, negative and neutral tweets with an accuracy of approximately 75%.
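The 75% figure is an accuracy: the share of labeled tweets whose predicted category matches the human label. If you train your own classifier, one way to get a comparable number is to keep some labeled tweets out of trainDirectory and score the classifier on them afterwards. The sketch below is a minimal, hypothetical helper (LabeledTweet and HeldOutAccuracy are illustrative names, not LingPipe classes) that takes the classifier as a plain Function<String, String>.

```java
import java.util.List;
import java.util.function.Function;

// Minimal sketch: accuracy = correctly classified tweets / all held-out tweets.
// LabeledTweet and HeldOutAccuracy are hypothetical, not part of LingPipe.
final class HeldOutAccuracy {

    // One hand-labeled tweet, e.g. ("I love it", "pos").
    static final class LabeledTweet {
        final String text;
        final String category;

        LabeledTweet(String text, String category) {
            this.text = text;
            this.category = category;
        }
    }

    // Fraction of held-out tweets whose predicted category equals the gold label.
    static double accuracy(List<LabeledTweet> heldOut, Function<String, String> classify) {
        if (heldOut.isEmpty()) {
            throw new IllegalArgumentException("need at least one labeled tweet");
        }
        int correct = 0;
        for (LabeledTweet t : heldOut) {
            if (t.category.equals(classify.apply(t.text))) {
                correct++;
            }
        }
        return (double) correct / heldOut.size();
    }
}
```

With the SentimentClassifier class from Phase 2, the function argument could be the method reference sentClassifier::classify. The held-out tweets must not also appear in trainDirectory, otherwise the score will be optimistic.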

First, download LingPipe, a library for text processing, from here and add it to your project. Then you need a training dataset, that is, a collection of labeled tweets used to train the classifier. You can download one from here or create your own.

Create a folder named "trainDirectory" containing three sub-folders named "pos", "neg" and "neu" (you can also use only "pos" and "neg"). Put the positive tweets in "pos", the negative tweets in "neg" and the neutral tweets in "neu", with each tweet in its own .txt file. The sub-folder names become the category names, so "trainDirectory" should contain nothing else. Use the code below to train a classifier on this dataset and save it as a file named "classifier.txt".

 import java.io.File;
 import java.io.IOException;

 import com.aliasi.classify.Classification;
 import com.aliasi.classify.Classified;
 import com.aliasi.classify.DynamicLMClassifier;
 import com.aliasi.lm.NGramProcessLM;
 import com.aliasi.util.AbstractExternalizable;
 import com.aliasi.util.Files;

 void train() throws IOException {
    File trainDir = new File("trainDirectory");
    // each sub-folder name ("pos", "neg", "neu") is used as a category
    String[] categories = trainDir.list();
    int nGram = 7; // the n-gram level, any value between 7 and 12 works
    // "class" is a reserved word in Java, so the variable is named "classifier"
    DynamicLMClassifier<NGramProcessLM> classifier =
          DynamicLMClassifier.createNGramProcess(categories, nGram);
    for (String category : categories) {
       Classification classification = new Classification(category);
       File[] trainFiles = new File(trainDir, category).listFiles();
       for (File trainFile : trainFiles) {
          // one tweet per .txt file
          String tweet = Files.readFromFile(trainFile, "ISO-8859-1");
          classifier.handle(new Classified<CharSequence>(tweet, classification));
       }
    }
    // compile the trained model and serialize it to disk
    AbstractExternalizable.compileTo(classifier, new File("classifier.txt"));
 }
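If your labeled tweets are already in memory (for example, parsed from a CSV file), they still need to be written into the folder layout that train() reads: one sub-folder per category and one .txt file per tweet. The helper below is a sketch of one way to do that with java.nio.file; the class name TrainingLayout and the numbered file names (0.txt, 1.txt, ...) are illustrative choices, not something LingPipe requires.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

// Sketch (hypothetical helper): lay out labeled tweets as
// trainDirectory/<category>/<n>.txt, one tweet per file.
final class TrainingLayout {

    static void write(Path trainDir, Map<String, List<String>> tweetsByCategory) throws IOException {
        for (Map.Entry<String, List<String>> entry : tweetsByCategory.entrySet()) {
            // one sub-folder per category, e.g. "pos", "neg", "neu"
            Path categoryDir = Files.createDirectories(trainDir.resolve(entry.getKey()));
            int n = 0;
            for (String tweet : entry.getValue()) {
                Path file = categoryDir.resolve(n + ".txt");
                // train() reads files as ISO-8859-1, so write them the same way
                Files.write(file, tweet.getBytes(StandardCharsets.ISO_8859_1));
                n++;
            }
        }
    }
}
```

Because the encoding matches the one train() uses, characters outside ISO-8859-1 (emoji, for instance) are written as '?'. If your tweets rely on such characters, use the same other encoding in both places.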

Phase 2 - Load the classifier

In this phase you will load the previously created (or downloaded) classifier, saved as "classifier.txt".

The code below creates a class called "SentimentClassifier", which classifies the tweets collected in the next phase.

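 import java.io.File;
 import java.io.IOException;

 import com.aliasi.classify.ConditionalClassification;
 import com.aliasi.classify.LMClassifier;
 import com.aliasi.util.AbstractExternalizable;

 public class SentimentClassifier {
    String[] categories;
    LMClassifier<?, ?> classifier; // "class" is a reserved word in Java

    public SentimentClassifier() {
       try {
          // load the model saved by train() (or the downloaded one)
          classifier = (LMClassifier<?, ?>) AbstractExternalizable.readObject(new File("classifier.txt"));
          categories = classifier.categories();
       } catch (ClassNotFoundException | IOException e) {
          // fail fast instead of leaving classifier null
          throw new IllegalStateException("Could not load classifier.txt", e);
       }
    }

    // returns the most likely category, e.g. "pos", "neg" or "neu"
    public String classify(String text) {
       ConditionalClassification classification = classifier.classify(text);
       return classification.bestCategory();
    }
 }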
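The classifier trained in Phase 1 comes from DynamicLMClassifier.createNGramProcess, which keeps a character n-gram language model for each category; classify() then reports the category whose model fits the text best. The toy class below illustrates that idea in plain Java. It is a simplified stand-in, not LingPipe's implementation: it uses add-one smoothing and ignores the category prior and the more careful smoothing LingPipe applies. ToyNGramClassifier and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy character n-gram classifier: one language model per category; the best
// category is the one whose model gives the text the highest log-probability.
// Add-one smoothing and no category prior: a simplification, not LingPipe's model.
final class ToyNGramClassifier {
    private final int n;
    // per category: counts of (context + next char) and of each context
    private final Map<String, Map<String, Integer>> ngramCounts = new HashMap<>();
    private final Map<String, Map<String, Integer>> contextCounts = new HashMap<>();
    private final Set<Character> vocabulary = new HashSet<>();

    ToyNGramClassifier(int n) {
        this.n = n;
    }

    void train(String category, String text) {
        Map<String, Integer> grams = ngramCounts.computeIfAbsent(category, k -> new HashMap<>());
        Map<String, Integer> contexts = contextCounts.computeIfAbsent(category, k -> new HashMap<>());
        String padded = pad(text);
        for (int i = n - 1; i < padded.length(); i++) {
            String context = padded.substring(i - n + 1, i);
            grams.merge(context + padded.charAt(i), 1, Integer::sum);
            contexts.merge(context, 1, Integer::sum);
            vocabulary.add(padded.charAt(i));
        }
    }

    double logProb(String category, String text) {
        Map<String, Integer> grams = ngramCounts.get(category);
        Map<String, Integer> contexts = contextCounts.get(category);
        int v = vocabulary.size() + 1; // +1 reserves probability for unseen characters
        String padded = pad(text);
        double sum = 0;
        for (int i = n - 1; i < padded.length(); i++) {
            String context = padded.substring(i - n + 1, i);
            int num = grams.getOrDefault(context + padded.charAt(i), 0) + 1;
            int den = contexts.getOrDefault(context, 0) + v;
            sum += Math.log((double) num / den);
        }
        return sum;
    }

    String bestCategory(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : ngramCounts.keySet()) {
            double score = logProb(category, text);
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    private String pad(String text) {
        // '\0' marks the start of the text so the first characters have a context
        return "\0".repeat(n - 1) + text;
    }
}
```

The nGram = 7 setting in train() plays the role of n here: longer contexts capture more of each word, at the cost of needing more training tweets.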

Phase 3 - Download and classify the tweets

To collect tweets from the Twitter API, you need to download the Twitter4J library from here and add it to your project.

The code below builds a class named TwitterManager that takes care of collecting and classifying tweets. In the constructor, enter your Twitter Developer credentials for the Search API, which can be obtained here.

 import java.util.List;

 import twitter4j.Query;
 import twitter4j.QueryResult;
 import twitter4j.Status;
 import twitter4j.Twitter;
 import twitter4j.TwitterException;
 import twitter4j.TwitterFactory;
 import twitter4j.conf.ConfigurationBuilder;

 public class TwitterManager {
    SentimentClassifier sentClassifier;
    int LIMIT = 500; // the maximum number of retrieved tweets
    ConfigurationBuilder cb;
    Twitter twitter;

    public TwitterManager() {
       cb = new ConfigurationBuilder();
       cb.setOAuthConsumerKey("***");
       cb.setOAuthConsumerSecret("***");
       cb.setOAuthAccessToken("***");
       cb.setOAuthAccessTokenSecret("***");
       twitter = new TwitterFactory(cb.build()).getInstance();
       sentClassifier = new SentimentClassifier();
    }

    public void performQuery(String inQuery) {
       Query query = new Query(inQuery);
       query.setCount(100); // tweets per page (the Search API maximum)
       try {
          int count = 0;
          QueryResult r;
          do {
             r = twitter.search(query);
             List<Status> ts = r.getTweets();
             for (int i = 0; i < ts.size() && count < LIMIT; i++) {
                count++;
                Status t = ts.get(i);
                String text = t.getText();
                System.out.println("Text: " + text);
                String name = t.getUser().getScreenName();
                System.out.println("User: " + name);
                String sent = sentClassifier.classify(text);
                System.out.println("Sentiment: " + sent);
             }
             // nextQuery() returns null when there are no more result pages
          } while ((query = r.nextQuery()) != null && count < LIMIT);
       } catch (TwitterException te) {
          System.out.println("Couldn't connect: " + te);
       }
    }
 }
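The do/while loop in performQuery combines two stopping conditions: the LIMIT on processed tweets and the end of the result pages (nextQuery() returning null). The same pattern, separated from Twitter4J, is sketched below with the pages simulated by an Iterator; LimitedPaging and take are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of the paging pattern in performQuery: keep fetching pages until
// `limit` items are collected or there is no next page. In performQuery the
// pages come from twitter.search(query) and r.nextQuery() instead.
final class LimitedPaging {

    static <T> List<T> take(Iterator<List<T>> pages, int limit) {
        List<T> taken = new ArrayList<>();
        while (taken.size() < limit && pages.hasNext()) {
            for (T item : pages.next()) {
                if (taken.size() == limit) {
                    break; // stop mid-page once the limit is reached
                }
                taken.add(item);
            }
        }
        return taken;
    }
}
```

With setCount(100) and LIMIT = 500, this means five search requests when every page comes back full; pages with fewer tweets simply lead to more requests, until nextQuery() runs out.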

Now you can query the Twitter API and classify the retrieved tweets simply by calling the "performQuery" method from your main method. Below is an example that retrieves tweets about "Obama" and classifies their sentiment.

 TwitterManager twitterManager = new TwitterManager();  
 twitterManager.performQuery("Obama");  

24 comments:

  1. If I accidentally close the Python script, will I be able to restart it and continue my downloads? I can't seem to do so, by the way. The Python script just exits after I press Enter three times. I have only managed to download 800+ tweets. Any ideas?

  2. Hi Giorgio Cavaggion,

    I am new to sentiment analysis. I followed your blog post and tried running the code files mentioned above, but I am getting an error. Here are the steps I followed; please let me know if I missed something.
    1. Downloaded the pre-built classifier.txt file from the link.
    2. Created a Twitter developer account and registered an application to get the access tokens.
    3. Downloaded the code files posted on GitHub and the required third-party JAR files.
    4. Added all code files and JARs to a Java project and compiled it.
    5. When I run the code, it gives me an error while loading the classifier at the following line:
    lmclassifier= (LMClassifier) AbstractExternalizable.readObject(new File("classifier.txt"));
    "java.io.StreamCorruptedException: invalid stream header: ACED2005
    at java.io.ObjectInputStream.readStreamHeader(Unknown Source)
    at java.io.ObjectInputStream.<init>(Unknown Source)
    at com.aliasi.util.AbstractExternalizable.readObject(AbstractExternalizable.java:309)"

    Any help in this regard would be appreciated.

    Regards,
    Devanshu

    Replies
    1. This means that the .txt file doesn't contain a valid serialized object. Try downloading the .txt file again, or train your own.

    2. Hi,
      You described the steps very well. I am working on Twitter sentiment analysis and I am new to this field.
      Could you please provide the classifier.txt file? The download has been archived.
      I also need some information about this topic (for example, will I need to use Hadoop or any other software to implement this work?).
      I am stuck implementing this. Please, please help me. I will be grateful.
      Regards,
      Farhana

  3. This comment has been removed by the author.

  4. While copying the trainer method, I get an error on this line: "((ObjectHandler>) class).handle(classified);". Please help me, sir.

    Replies
    1. For some reason the code gives several compilation errors. I am really grateful for this example, otherwise I'd be lost. Nevertheless, here is corrected code that compiles and runs:

      Note: I put it in a static method so you don't have to instantiate the class; just call it.

      Trainer:

      import java.io.File;
      import java.io.IOException;

      import com.aliasi.classify.Classification;
      import com.aliasi.classify.Classified;
      import com.aliasi.classify.DynamicLMClassifier;
      import com.aliasi.classify.LMClassifier;
      import com.aliasi.corpus.ObjectHandler;
      import com.aliasi.util.AbstractExternalizable;
      import com.aliasi.util.Compilable;
      import com.aliasi.util.Files;

      public class Trainer {

          public static void main(String[] args) {
              try {
                  Trainer.trainModel();
              } catch (ClassNotFoundException e) {
                  e.printStackTrace();
              } catch (IOException e) {
                  e.printStackTrace();
              }
          }

          public static void trainModel() throws IOException, ClassNotFoundException {
              File trainDir;
              String[] categories;
              LMClassifier classifier;
              trainDir = new File("C:/Users/Nicolas/Corpus/in");
              categories = trainDir.list();
              int nGram = 7; //the nGram level, any value between 7 and 12 works
              classifier = DynamicLMClassifier.createNGramProcess(categories, nGram);

              for (int i = 0; i < categories.length; ++i) {
                  String category = categories[i];
                  Classification classification = new Classification(category);
                  File file = new File(trainDir, categories[i]);
                  File[] trainFiles = file.listFiles();
                  for (int j = 0; j < trainFiles.length; ++j) {
                      File trainFile = trainFiles[j];
                      String review = Files.readFromFile(trainFile, "ISO-8859-1");
                      Classified classified = new Classified(review, classification);
                      ((ObjectHandler) classifier).handle(classified);
                  }
                  System.out.println("Current Folder: " + (i + 1));
              }
              AbstractExternalizable.compileTo((Compilable) classifier, new File("./datos/classifier.lingPipe"));
          }
      }


      For the SentimentClassifier, just rename the variable "class" to "classificator" or any other valid name.

      Hope it is useful.

    2. This comment has been removed by the author.

    3. I'm running this project in the Eclipse IDE. Can you give detailed steps on how to run the code?

  5. I tried to train it on movie reviews,
    but I get the wrong sentiment for reviews not used to train the model.
    Is there a specific format for the training files?

  6. I tried the trainer code written above, but it gives an error on
    import com.aliasi.classify.Classified;

    because Classified is not included in com.aliasi.classify.
    I need to know which version of LingPipe is used in the code above.

    Thanks.

  7. Hi everyone,

    I am trying to use this code to classify Arabic tweets. Does anyone know what should be added to this code? Where do we start?

    Thanks in advance.

  8. Which technique does it use for classification?

  9. Hi. I love what you've done, but seriously, what's the point of writing a tutorial if you cannot answer people's questions? I SEE NO POINT!

  10. Can anyone help me run this code?

  11. Can anyone tell me which classifier is used in the linked (downloadable) one?

  12. How do I import this project into the Eclipse IDE?

  13. It will surely be too good for the students to think in accordance with such regarded ideas because these are said to be so important while going to decide more sentiments.

  14. Hi,

    I am new to sentiment analysis. I crawled tweets using Twitter4J and retrieved them. My problem is to classify these retrieved tweets into categories like technology, science, food, sports, etc. How should this be done? Please help.

  15. java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
    at com.aliasi.util.AbstractExternalizable.readObject(AbstractExternalizable.java:309)
    at SentimentAnalisys.SentimentClassifier.<init>(SentimentClassifier.java:21)
    at SentimentAnalisys.Main.classify(Main.java:33)
    at SentimentAnalisys.Main.main(Main.java:18)
    Exception in thread "main" java.lang.NullPointerException
    at SentimentAnalisys.SentimentClassifier.classify(SentimentClassifier.java:30)
    at SentimentAnalisys.Main.classify(Main.java:34)
    at SentimentAnalisys.Main.main(Main.java:18)
    Java Result: 1

  16. Hi, I am using Ubuntu and no IDE. I have installed LingPipe but I'm having some trouble compiling my Java code. Please help me.

  17. Hey, I am doing a final-year project on sentiment analysis of text in an SMS chat app, but I don't know anything about sentiment analysis. :(
