h2. Introduction

Sentiment prediction can be supervised, semi-supervised or unsupervised.

_Supervised_ approaches rely on annotated datasets. Because sentiment is strongly domain-specific, a large annotated corpus from the target domain is important. When no such corpus is available, domain adaptation methods can be used instead; these rely on a large out-of-domain corpus plus a small supplementary annotated corpus from the target domain.

_Unsupervised_ methods rely on sentiment dictionaries: large lists of words with scores that quantify their polarity. Words in the free text are mapped to dictionary entries, and aggregate statistics over the matched scores are used to evaluate the text's sentiment.
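The dictionary mapping and aggregation steps can be sketched as follows. This is a minimal illustration, not the method used here: @LEXICON@ is a hypothetical toy dictionary with scores in [-1, 1], tokenization is a simple regular expression, and the aggregation statistic is assumed to be the mean of the matched scores (sums or counts of positive versus negative hits would be other reasonable choices).

```python
import re
from statistics import mean

# Hypothetical toy lexicon: word -> polarity score in [-1, 1].
# A real sentiment dictionary would be much larger and loaded from a file.
LEXICON = {
    "good": 0.7,
    "excellent": 0.9,
    "bad": -0.7,
    "terrible": -0.9,
    "boring": -0.5,
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text: str, lexicon: dict[str, float] = LEXICON) -> float:
    """Map tokens to dictionary scores and average the matches.

    Returns 0.0 (neutral) when no token appears in the lexicon.
    """
    hits = [lexicon[tok] for tok in tokenize(text) if tok in lexicon]
    return mean(hits) if hits else 0.0
```

For example, @lexicon_score("An excellent cast, but a boring plot.")@ averages 0.9 and -0.5, giving 0.2.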

_Semi-supervised_ approaches start from a small set of annotated texts or small polarity dictionaries, which are then expanded either by bootstrapping or by using external knowledge bases such as WordNet.
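Dictionary expansion from a small seed set can be sketched as a breadth-first propagation over lexical relations: synonyms inherit a word's polarity and antonyms receive the opposite one. The relation tables below are hypothetical stand-ins for a knowledge base such as WordNet, and the @max_depth@ cut-off is an assumed parameter that limits how far polarity drifts from the seeds.

```python
from collections import deque

# Hypothetical seed dictionary: word -> polarity (+1 positive, -1 negative).
SEEDS = {"good": 1, "bad": -1}

# Hypothetical WordNet-like relations (illustrative only).
SYNONYMS = {
    "good": ["fine", "nice"],
    "nice": ["pleasant"],
    "bad": ["poor"],
    "poor": ["inferior"],
}
ANTONYMS = {
    "good": ["bad"],
    "pleasant": ["unpleasant"],
}


def expand_lexicon(seeds, synonyms, antonyms, max_depth=3):
    """Breadth-first bootstrap: propagate seed polarities along relations.

    Synonyms keep the polarity, antonyms flip it. A word keeps the label
    from the first (shortest) path that reaches it; later labels are ignored.
    Words at depth >= max_depth are not expanded further.
    """
    lexicon = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue
        polarity = lexicon[word]
        neighbours = [(w, polarity) for w in synonyms.get(word, [])]
        neighbours += [(w, -polarity) for w in antonyms.get(word, [])]
        for other, label in neighbours:
            if other not in lexicon:
                lexicon[other] = label
                queue.append((other, depth + 1))
    return lexicon
```

With the toy tables, the two seeds grow to eight labelled words, including "unpleasant" (negative) reached through the antonym of "pleasant".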

h2. Our approach (for English)

Our aim is to evaluate sentiment polarity (Negative/Positive) at several levels of granularity:
* *document* (overall sentiment): appropriate for blog posts or technical review articles; estimates whether the author's opinion on the topic is generally positive or negative. Strong polarity indicates that the author is highly subjective.
* *paragraph* (aspect-oriented)
* *entity* (very specific target)
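The document and paragraph levels can be illustrated with a small sketch. It assumes a hypothetical toy lexicon, treats blank-line-separated blocks as paragraphs, and computes the document score as the mean of the paragraph scores; averaging over all tokens of the document would be an equally plausible design.

```python
import re
from statistics import mean

# Hypothetical toy lexicon: word -> polarity score in [-1, 1].
LEXICON = {"great": 0.8, "love": 0.7, "slow": -0.4, "awful": -0.9}


def score(text, lexicon=LEXICON):
    """Average lexicon score of the tokens in `text` (0.0 if none match)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return mean(hits) if hits else 0.0


def paragraph_scores(document):
    """Paragraph (aspect) level: score each blank-line-separated paragraph."""
    paragraphs = [p for p in re.split(r"\n\s*\n", document) if p.strip()]
    return [score(p) for p in paragraphs]


def document_score(document):
    """Document level: mean of the paragraph scores (0.0 for empty input)."""
    scores = paragraph_scores(document)
    return mean(scores) if scores else 0.0
```

Following the description of the document level, the magnitude @abs(document_score(doc))@ could serve as a rough indicator of how subjective the author is.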

Input: a *sentiment dictionary* of sentiwords, i.e. words that carry polarity, together with scores that quantify how positive or how negative each one is.

h2. Sentiment dictionary
We assembled a sentiment dictionary from three sources:
* SentiWordNet
* Stanford IMDB
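Combining several sources into one dictionary can be sketched as a per-word average. Loading the sources is not shown: the sketch assumes each one has already been converted to a word -> score mapping on a common [-1, 1] scale (for SentiWordNet, for instance, by taking PosScore minus NegScore, which is an assumed conversion, not necessarily the one used here). The two example mappings are hypothetical and their values illustrative only.

```python
from collections import defaultdict


def merge_dictionaries(*sources):
    """Merge word -> score mappings by averaging each word's scores.

    Assumes every source is already normalised to the same scale;
    words present in only one source keep that source's score.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for source in sources:
        for word, value in source.items():
            totals[word] += value
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


# Hypothetical, already-normalised excerpts (values are illustrative only).
sentiwordnet_like = {"good": 0.6, "dull": -0.4}
imdb_like = {"good": 0.4, "masterpiece": 0.9}
merged = merge_dictionaries(sentiwordnet_like, imdb_like)
```

Here "good" appears in both sources and gets 0.5, while "dull" and "masterpiece" keep their single-source scores. A weighted average could be used instead if one source is considered more reliable for the target domain.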

h2. Sentiment evaluation algorithms

Pipeline for document sentiment:
# Sentiment mapping

Pipeline for entity sentiment:
# Concept tagging
# Segmentation
# Sentiment mapping
# Sentiment evaluation
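The four steps of the entity pipeline can be sketched end to end under simplifying assumptions: concept tagging is replaced by exact matching against a hypothetical entity list, segmentation is a naive sentence split, sentiment mapping uses a hypothetical toy lexicon, and evaluation averages the scores of every sentence that mentions the entity. A real system would use a proper concept tagger and a finer-grained segmentation.

```python
import re
from statistics import mean

# Hypothetical toy lexicon and entity list (illustrative only).
LEXICON = {"fast": 0.6, "sharp": 0.7, "noisy": -0.6, "expensive": -0.3}
ENTITIES = {"camera", "autofocus", "price"}


def tokens(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tag_concepts(sentence, entities=ENTITIES):
    """Step 1, concept tagging: entities mentioned in the sentence."""
    return set(tokens(sentence)) & entities


def segment(text):
    """Step 2, segmentation: naive split into sentences after . ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def map_sentiment(sentence, lexicon=LEXICON):
    """Step 3, sentiment mapping: scores of the tokens found in the lexicon."""
    return [lexicon[t] for t in tokens(sentence) if t in lexicon]


def entity_sentiment(text):
    """Step 4, evaluation: mean score of the sentences naming each entity.

    Entities mentioned only in sentences without sentiment words get 0.0.
    """
    per_entity = {}
    for sentence in segment(text):
        scores = map_sentiment(sentence)
        for entity in tag_concepts(sentence):
            per_entity.setdefault(entity, []).extend(scores)
    return {e: (mean(s) if s else 0.0) for e, s in per_entity.items()}
```

For "The autofocus is fast and sharp. The price is expensive!", the sketch assigns a positive score to "autofocus" and a negative one to "price"; the document-level view of the same text would blend the two.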