The aim is to evaluate sentiment polarity (Negative/Positive) at several levels of granularity:
Sentiment prediction can be supervised, semi-supervised or unsupervised.
Supervised approaches rely on annotated datasets. Given the strong domain specificity, it is important that a large corpus from the target domain is available. When not available, domain adaptation methods can be used, that rely on a large out-of-domain corpus and a small supplementary target-domain annotated corpus.
Unsupervised methods rely on sentiment dictionaries: large lists of words with scores quantifying their polarity. Mapping to dictionary and aggregation statistics are used to evaluate sentiment in free text.
Semi-supervised approaches rely on a small set of annotated texts or small polarity dictionaries, that are expanded by either bootstrap methods, or by using external knowledge-bases like Wordnet.
Our tagging services in intended for generic documents, without specified domain (at least in the early stages). Therefore we opted for an unsupervised approach. We composed a large sentiment dictionary from several open sources, as described below.
We assembled a sentiment dictionary from three sources:
Pipeline for document sentiment:
Pipeline for entity sentiment: