h2. Introduction

Sentiment prediction can be supervised, semi-supervised or unsupervised.

Supervised approaches rely on annotated datasets. Because sentiment is strongly domain-specific, a large annotated corpus from the same domain as the target application should be available. When it is not, domain adaptation methods can be used instead; these rely on a large out-of-domain corpus supplemented by a small annotated corpus from the target domain.

Unsupervised methods rely on sentiment dictionaries: large lists of words with scores quantifying their polarity. Sentiment in free text is evaluated by mapping its words to the dictionary and aggregating the matched scores.
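The dictionary-based idea can be illustrated with a minimal sketch. The toy dictionary, the function names {{tokenize}} and {{score_text}}, and the choice of the mean as the aggregation statistic are all assumptions made for illustration; they are not taken from the system described here.

```python
import re

# Hypothetical toy dictionary: word -> polarity score in [-1, 1].
# A real sentiment dictionary would contain many thousands of entries.
TOY_DICTIONARY = {
    "good": 0.6,
    "excellent": 0.9,
    "bad": -0.6,
    "terrible": -0.9,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(text, dictionary):
    """Return the mean polarity of the dictionary words found in text.

    Returns 0.0 when no token matches the dictionary.
    The mean is only one possible aggregation statistic (assumption).
    """
    scores = [dictionary[t] for t in tokenize(text) if t in dictionary]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(score_text("The plot was excellent but the acting was bad", TOY_DICTIONARY))
```

A positive result suggests overall positive polarity and a negative result suggests negative polarity. Other aggregations (sum, maximum, counts of positive versus negative words) would fit the same structure.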

h2. Our approach (for English)

Our aim is to evaluate sentiment polarity (Negative/Positive) at several levels of granularity:
* *document* (overall sentiment): appropriate for blog posts or technical review articles; estimates whether the author's opinion on the topic is generally positive or negative. Strong polarity means the author is very subjective.
* *paragraph* (aspect oriented)
* *entity* (very specific target)

Input: a *sentiment dictionary* of sentiwords, i.e. words that have some polarity, together with scores that quantify how positive or how negative they are.

h2. Sentiment dictionary
We assembled a sentiment dictionary from three sources:
* SentiWordNet
* Stanford IMDB
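How the sources are combined is not specified here. As one possible approach, the sketch below merges several word-to-score mappings by averaging the scores of words that appear in more than one source. The function name {{merge_dictionaries}}, the averaging rule, and the sample entries are hypothetical, and the sketch assumes all sources have already been normalised to a common score range.

```python
from collections import defaultdict


def merge_dictionaries(*sources):
    """Merge word -> score mappings into a single dictionary.

    Words are lowercased; a word present in several sources receives
    the mean of its scores (an assumed rule, not the documented one).
    """
    collected = defaultdict(list)
    for source in sources:
        for word, score in source.items():
            collected[word.lower()].append(score)
    return {word: sum(scores) / len(scores) for word, scores in collected.items()}


if __name__ == "__main__":
    # Hypothetical entries standing in for two already-normalised sources.
    source_a = {"good": 0.5, "bad": -0.5}
    source_b = {"Good": 1.0, "awful": -0.75}
    print(merge_dictionaries(source_a, source_b))
```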

h2. Sentiment evaluation algorithms

Pipeline for document sentiment:
# Sentiment mapping

Pipeline for entity sentiment:
# Concept tagging
# Segmentation
# Sentiment mapping
# Sentiment evaluation
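The four steps of the entity pipeline can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: concept tagging is reduced to case-insensitive string matching against a given list of entity names, segmentation to splitting on sentence-ending punctuation, and evaluation to the mean score of the sentiwords in segments that mention the entity. All function names and the sample data are hypothetical.

```python
import re
from collections import defaultdict


def tag_concepts(text, entities):
    """Step 1, concept tagging: return (entity, character offset) mentions."""
    lowered = text.lower()
    mentions = []
    for entity in entities:
        for match in re.finditer(re.escape(entity.lower()), lowered):
            mentions.append((entity, match.start()))
    return mentions


def segment(text):
    """Step 2, segmentation: return (start, end, segment_text) tuples."""
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(r"[^.!?]+[.!?]*", text)]


def map_sentiment(segment_text, dictionary):
    """Step 3, sentiment mapping: scores of the sentiwords in a segment."""
    tokens = re.findall(r"[a-z']+", segment_text.lower())
    return [dictionary[t] for t in tokens if t in dictionary]


def entity_sentiment(text, entities, dictionary):
    """Step 4, sentiment evaluation: mean sentiword score per entity.

    Entities whose segments contain no sentiwords are omitted.
    """
    mentions = tag_concepts(text, entities)
    per_entity = defaultdict(list)
    for start, end, seg_text in segment(text):
        scores = map_sentiment(seg_text, dictionary)
        for entity, position in mentions:
            if start <= position < end:
                per_entity[entity].extend(scores)
    return {e: sum(s) / len(s) for e, s in per_entity.items() if s}


if __name__ == "__main__":
    toy_dictionary = {"excellent": 0.9, "terrible": -0.9}  # hypothetical scores
    review = "The camera is excellent. The battery is terrible."
    print(entity_sentiment(review, ["camera", "battery"], toy_dictionary))
```

Attaching scores to every entity mentioned in the same segment is the simplest possible scoping rule. A finer segmentation (clauses, dependency subtrees) would reduce the chance of a sentiword being attributed to the wrong entity.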