Sentiment extraction (or opinion mining) is the task of inferring the subjective attitude expressed in a text, either as a whole or with respect to a specific product, company, person, event, etc. The main interest is usually in quantifying the polarity of the sentiment: positive or negative. More refined sentiment analysis can identify specific emotions such as anger, fear, hope, love, or hate.
Sentiment extraction is a very difficult NLP problem, mainly because algorithms must uncover subjective human emotions that are often mixed, subtle, or marked by irony. Even a human reading subjective text can have a hard time quantifying the polarity of an opinion: what exactly is a 20% positive opinion, or a 90% negative one?
Sentiment extraction is not a single task; it requires solving several classical NLP sub-problems:
- negation: which verbs, adjectives, or adverbs that normally carry a specific polarity are negated, so that their polarity is reversed? E.g. like (positive), don't like (negative); convenient (positive), not convenient (negative). More complicated expressions also occur, as in "I would hate to miss that movie": hate and miss are both negative, yet together they express a positive opinion of the movie.
- segmentation: which words refer to which entity?
- named entity extraction, or concept extraction: target concepts, for which the sentiment needs to be evaluated, have to be identified in the text
- anaphora identification: concepts are often referenced by "it", "that", "him", etc., which need to be disambiguated
- domain specificity: some words are positive in some domains but negative in others. An unpredictable movie is a good thing, whereas an unpredictable kitchen robot is probably bad.
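
Two of the sub-problems above, negation and domain specificity, can be illustrated with a minimal lexicon-based sketch. Everything in it is an assumption made for illustration: the toy word lists, the domain names, the `polarity` function, and the rule that a negator flips the polarity of the word immediately following it. It is not a description of how any particular sentiment system works.

```python
import re

# Hypothetical toy lexicons; real systems rely on much larger resources.
BASE_LEXICON = {"like": 1, "convenient": 1, "good": 1, "hate": -1, "bad": -1}

# Hypothetical domain overrides: the same word can change polarity.
DOMAIN_LEXICON = {
    "movies": {"unpredictable": 1},
    "appliances": {"unpredictable": -1},
}

NEGATORS = {"not", "no", "never", "n't"}


def tokenize(text):
    """Lowercase the text and split contractions such as "don't" into "do" + "n't"."""
    text = text.lower().replace("n't", " n't")
    return re.findall(r"n't|[a-z]+", text)


def polarity(text, domain=None, window=1):
    """Sum word polarities, flipping a word if a negator occurs
    within `window` tokens before it (a deliberately naive rule)."""
    lexicon = {**BASE_LEXICON, **DOMAIN_LEXICON.get(domain, {})}
    tokens = tokenize(text)
    score = 0
    for i, tok in enumerate(tokens):
        value = lexicon.get(tok, 0)
        if value == 0:
            continue  # word carries no polarity in this lexicon
        preceding = tokens[max(0, i - window):i]
        if any(t in NEGATORS for t in preceding):
            value = -value  # negation reverses the polarity
        score += value
    return score


print(polarity("I like it"))                                    # 1
print(polarity("I don't like it"))                              # -1
print(polarity("not convenient"))                               # -1
print(polarity("an unpredictable movie", domain="movies"))      # 1
print(polarity("an unpredictable robot", domain="appliances"))  # -1
```

With these toy lexicons, the sketch scores "I would hate to miss that movie" as negative, because it sees only the word "hate". This shows why simple lexicon-and-negation rules are not enough for the more complicated expressions mentioned above.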