Sentiment Analysis with SentiWordNet¶

The basic goal of a 'sentiment analysis' is to classify a given text into positive, negative or neutral.

SentiWordNet is a lexical resource for opinion mining. It assigns sentiments to each synset of WordNet which makes it possible to "calculate" an overall sentiment for a text. A SentiWordNet implementation can be found in DL4J in the deeplearning4j-nlp-uima artifact.

This demo has been implemented in Scala using the Jupyter BeakerX Notebook.

Setup¶

In [1]:

%classpath config resolver maven-public http://192.168.1.10:8081/repository/maven-public/
%%classpath add mvn 
org.deeplearning4j:deeplearning4j-nlp-uima:1.0.0-beta2

Added new repo: maven-public

org.deeplearning4j:deeplearning4j-core:1.0.0-beta2 org.deeplearning4j:deeplearning4j-nlp-uima:1.0.0-beta2 org.deeplearning4j:deeplearning4j-nlp:1.0.0-beta2 org.nd4j:nd4j-native-platform:1.0.0-beta2 com.github.habernal:confusion-matrix:1.0

Classifying a Text¶

In order to classify a text we just need to call the classify method on the SWN3 class. Here is an example:

In [2]:

import org.deeplearning4j.text.corpora.sentiwordnet.SWN3

var svn3 = new SWN3()
var txt = "For years  Apple was the innovative leader in personal computers. Not anymore."
var result = svn3.classify(txt)

s"$txt ==> $result"

Out[2]:

For years  Apple was the innovative leader in personal computers. Not anymore. ==> weak_negative

Comparing the Result of SentiWordNet with Sentiment140¶

In quite a few Blogs the Sentiment140 dataset (with 1.6 Million Twitter entries) is used as input for the training of Sentiment Analyisis models. So we would like to double check how good this dataset is relating to the predictions done with Sentiwordnet.

In [7]:

import scala.io.Source

def getLabel(number:String):String = {
    var v = number
    if (v=="\"0\"") v = "negative"
    else if (v=="\"2\"") v = "neutral"
    else if (v=="\"4\"") v = "positive"
    return v
}    

var dataList = Source.fromFile("training.1600000.processed.noemoticon.csv","ISO-8859-1")
    .getLines()
    .map(str => str.split(","))
    .map(array => (array(5),getLabel(array(0))))
    .toList

dataList.length

Out[7]:

Accuracy¶

We can easily caluclate the accuracy by dividing the correctly classified entries with the total entries. For doing this we need to remove the strong_ and weak_ prefixes which are generated by the SentiWordNet.

In [20]:

import scala.util.Random

def stdLabel(label:String):String = {
    var result = label.replace("strong_","")
    result = result.replace("weak_","")
    return result
}

dataList = Random.shuffle(dataList)
val resultCompareList = dataList.slice(0, 20000).par.map(r => (r._2, stdLabel(svn3.classify(r._1))))

resultCompareList.size

Out[20]:

We just compare the classification result values which should be identical

In [21]:

var swnSet = resultCompareList.map(_._2).toSet
var s140Set = resultCompareList.map(_._1).toSet

s"$swnSet <=> $s140Set"

Out[21]:

ParSet(negative, positive, neutral) <=> ParSet(negative, positive)

...and we calculate the Accuracy

In [22]:

val correctCount:Double = resultCompareList.count(v => v._1 == v._2) 

s"Accuracy = ${correctCount / resultCompareList.size}"

Out[22]:

Accuracy = 0.4147

In [25]:

val neutralCount:Double = resultCompareList.count(v => v._2 == "neutral") 
neutralCount / resultCompareList.size

Out[25]:

0.32095

The accuracy is surprsingly low. This can partly be explained by the fact that the Sentiment140 data provides almost no neutral classifications.

In [ ]: