“Classical” sentiment analysis, as defined in Pang and Lee’s seminal experiments, classifies reviews into two categories: positive and negative. Pang and Lee created training data from Rotten Tomatoes
reviews, which are published with stars. Their data consisted of two
classes representing negative and positive reviews. Neutral reviews
(those getting 3/5 stars) were not included in the data set, and thus
the resulting systems aren’t tested on their ability to reject neutral
reviews as being neither positive nor negative.
Here’s what I’ve been suggesting: use two classifiers, one for
positive/non-positive and one for negative/non-negative. Then you get a
4-way classification into positive (+pos,-neg), negative (-pos,+neg),
mixed (+pos,+neg) and neutral (-pos,-neg). The problem here is that I
need data to try it out. This level of annotation can't be extracted
from review text plus star rating; I'd need to know, sentence by
sentence, which polarity (if any) is being expressed.
Someone just forwarded me a pointer to a 2005 IJCAI paper that actually takes neutral sentiments seriously:
M. Koppel and J. Schler (2005) Using Neutral Examples for Learning Polarity. In IJCAI.
What Koppel and Schler built was a three-way classifier for
positive, negative and neutral sentiment. They do it by combining
binary classifiers into n-way classifiers using a standard round-robin
approach. Specifically, they build three binary classifiers:
positive/negative, positive/neutral, and negative/neutral. Then they
run all three and take a vote (with some tie-breaking scheme). They use
SVMs, but any binary classifier may be used.
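The voting step itself is simple enough to sketch. Here the three pairwise classifiers are stubbed out as precomputed decisions, and I've assumed a tie-breaking scheme that falls back to "neutral" on a three-way split (the paper's actual tie-breaking rule may differ):

```python
from collections import Counter

def round_robin(decisions, tie_break="neu"):
    """decisions: dict mapping (classifier pair) -> winning label.
    Majority vote over the three pairwise decisions; a 1-1-1 split
    falls back to the tie-breaking label."""
    votes = Counter(decisions.values())
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]

# pos/neg says pos, pos/neu says pos, neg/neu says neu -> pos wins 2-1
print(round_robin({
    ("pos", "neg"): "pos",
    ("pos", "neu"): "pos",
    ("neg", "neu"): "neu",
}))  # pos
```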
This approach is generalizable. You can slice and dice the
categories differently. What I suggested was actually combining two
classifiers each of which would be trained on all the data, namely
positive/neutral+negative and negative/neutral+positive. You can go
further and fully expand all the combinations, adding
positive+negative/neutral to round out the set of six binary
classification problems, and feed all six into the same voting scheme.
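Enumerating the full set makes the count obvious: three pairwise problems plus three one-vs-rest problems gives the six binary classifiers.

```python
from itertools import combinations

cats = ["pos", "neg", "neu"]

# Three pairwise problems: pos/neg, pos/neu, neg/neu.
pairwise = list(combinations(cats, 2))

# Three one-vs-rest problems: pos/neg+neu, neg/pos+neu, neu/pos+neg.
one_vs_rest = [(c, [o for o in cats if o != c]) for c in cats]

for a, b in pairwise:
    print(f"{a} vs {b}")
for c, rest in one_vs_rest:
    print(f"{c} vs {'+'.join(rest)}")
```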
On another note, I’d actually like to get the time to take my
proposed approach and build it into a hierarchical model, like McDonald
et al.’s hierarchical SVM-based approach.
I’d use something like latent Dirichlet allocation (coming soon to
LingPipe) instead of SVMs, so I could predict posterior probabilities,
but that’s a relatively trifling detail compared to the overall
structure of the model. It would actually be possible to partially
supervise LDA, or the whole model could be induced as latent structure
from the top-level review ratings. Even more fun would ensue if we
could use a kind of hierarchical LDA, with a level dedicated to overall
sentiment and then another level per genre/domain (this’d be
hierarchical on the word distributions, not the topic distributions as
in standard hierarchical LDA).