Problem Statement (see competition on Kaggle here)

This is an advanced sentiment analysis problem because it aims to capture signals that are not obvious, such as sarcasm or frustration, and it does so by breaking each sentence down into phrases. For instance, take “The movie was OK but not that awesome”. What do you think the sentiment of this sentence is? It reads as neutral: the author expected the movie to be awesome, but it turned out to be just fine. Now try “The movie was OK”. The meaning shifts, because we no longer know what the author was expecting of the movie.

The dataset provides many sentences broken down into phrases, along with a sentiment label for each phrase, which lets you build your own sentiment mining algorithm. Here is how the problem reads on Kaggle:


Let’s get started

Let’s load the data and GraphLab.
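The loading code did not survive in this copy of the post, so here is a minimal sketch of the step. It assumes the competition's tab-separated layout (PhraseId, SentenceId, Phrase, Sentiment) and uses Python's standard csv module as a stand-in for GraphLab; the `load_phrases` helper and the inline sample are illustrative names, and the three sample rows are copied from the output printed further down.

```python
import csv
import io

# Small inline sample in the assumed train.tsv layout
# (tab-separated: PhraseId, SentenceId, Phrase, Sentiment).
SAMPLE_TSV = (
    "PhraseId\tSentenceId\tPhrase\tSentiment\n"
    "112\t3\tsitting\t2\n"
    "113\t3\tthrough this one\t2\n"
    "116\t3\tone\t2\n"
)


def load_phrases(handle):
    """Read tab-separated phrase rows into a list of dicts with typed fields."""
    # QUOTE_NONE keeps any quote characters inside a phrase as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        rows.append({
            "PhraseId": int(record["PhraseId"]),
            "SentenceId": int(record["SentenceId"]),
            "Phrase": record["Phrase"],
            "Sentiment": int(record["Sentiment"]),
        })
    return rows


rows = load_phrases(io.StringIO(SAMPLE_TSV))
print(len(rows), rows[0]["Phrase"])
```

To read the real file, pass an open file handle instead of the in-memory sample, for example `open("train.tsv", newline="", encoding="utf-8")`. In GraphLab itself the equivalent step would most likely be `SFrame.read_csv` with a tab delimiter.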

Kaggle Sentiment Analysis on Movie Reviews

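+----------+------------+------------------------------------------+-----------+--------+
| PhraseId | SentenceId | Phrase                                   | Sentiment | target |
+----------+------------+------------------------------------------+-----------+--------+
| 112      | 3          | sitting                                  | 2         | 0      |
| 113      | 3          | through this one                         | 2         | 0      |
| 114      | 3          | through                                  | 2         | 0      |
| 115      | 3          | this one                                 | 2         | 0      |
| 116      | 3          | one                                      | 2         | 0      |
| 117      | 4          | A positively thrilling combination of …  | 3         | 0      |
| 118      | 4          | A positively thrilling combination of …  | 4         | 1      |
| 119      | 4          | A positively thrilling combination of …  | 4         | 1      |
| 120      | 4          | A positively thrilling combination …     | 3         | 0      |
| 121      | 4          | positively thrilling combination …       | 3         | 0      |
+----------+------------+------------------------------------------+-----------+--------+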

[79 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
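The cell that created the target column is missing here, but the printed rows are consistent with a simple rule: target is 1 when Sentiment is 4 and 0 otherwise. The sketch below encodes that rule as an assumption inferred from the output, not as the original code; `to_target` is a hypothetical helper name.

```python
# (PhraseId, Sentiment, target) triples copied from the printed rows.
printed_rows = [
    (112, 2, 0),
    (116, 2, 0),
    (117, 3, 0),
    (118, 4, 1),
    (119, 4, 1),
    (121, 3, 0),
]


def to_target(sentiment):
    """Assumed rule: 1 for the most positive label (4), 0 for everything else."""
    return 1 if sentiment == 4 else 0


# Recompute the target for each row and compare it with the printed value.
derived = [to_target(sentiment) for _, sentiment, _ in printed_rows]
expected = [target for _, _, target in printed_rows]
print(derived == expected)
```

A rule such as "Sentiment greater than 3" would give the same result on the 0 to 4 label scale, so the printed rows alone cannot distinguish between the two.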



As the evaluations show, with accuracy as the primary metric, GBM comes out as the best model on the test data.
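For reference, comparing models on accuracy can be sketched as follows. The labels, the predictions, and the names `model_a` and `model_b` are made up purely for illustration; they are not results from this post.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


# Hypothetical test labels and predictions (illustration only).
y_true = [0, 0, 1, 1, 0]
predictions = {
    "model_a": [0, 0, 1, 0, 0],
    "model_b": [0, 1, 1, 1, 1],
}

# Score every model, then pick the one with the highest accuracy.
scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Keep in mind that when most targets are 0, a model that always predicts 0 can still score a high accuracy, so it is worth checking the class balance before relying on this metric alone.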