Sentiment extraction is a very difficult NLP problem, mainly because algorithms must uncover subjective human emotions that are often mixed, subtle, or marked by irony. Even a human reading subjective text can have a hard time quantifying the polarity of an opinion: what is a 20% positive opinion, or a 90% negative one?
- Supervised (annotated corpora)
- Unsupervised (dictionaries)
Sentiment extraction is not a single task; it requires solving several classical NLP sub-problems:
* negation: which verbs, adjectives or adverbs that otherwise carry a specific polarity are negated, so that their polarity is reversed? E.g. like (positive), don't like (negative); convenient (positive), not convenient (negative). Some expressions are even more complicated, such as: I would hate to miss that movie (hate and miss are both negative, yet together the opinion on the movie is positive).
* segmentation: which words refer to which entity?
* named entity extraction, or concept extraction: target concepts, for which the sentiment needs to be evaluated, have to be identified in the text
* anaphora identification: concepts are often referenced by "it", "that", "him", etc., and these references need to be resolved to the concept they stand for
* domain specificity: some words are positive in some domains, but negative in others. An unpredictable movie is probably good, whereas an unpredictable kitchen robot is probably bad.
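
To make the negation and domain-specificity points concrete, here is a minimal sketch of a dictionary-based (unsupervised) polarity scorer. Everything in it is an assumption made for illustration: the toy lexicon, the `DOMAIN_OVERRIDES` table, the negator list, the two-token negation window and the function names are hypothetical, not taken from any particular library or from this page.

```python
import re

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
BASE_LEXICON = {
    "like": 1,
    "convenient": 1,
    "hate": -1,
    "miss": -1,
    "unpredictable": -1,
}

# Hypothetical per-domain overrides: the same word can change polarity.
DOMAIN_OVERRIDES = {
    "movies": {"unpredictable": 1},
}

# Hypothetical list of negation cues.
NEGATORS = {"not", "no", "never", "don't", "doesn't", "didn't"}


def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def score(text, domain=None, window=2):
    """Sum word polarities, flipping any polar word that appears
    within `window` tokens after a negation cue."""
    lexicon = dict(BASE_LEXICON)
    lexicon.update(DOMAIN_OVERRIDES.get(domain, {}))

    total = 0
    negate_left = 0  # number of upcoming tokens still under negation
    for token in tokenize(text):
        if token in NEGATORS:
            negate_left = window
            continue
        polarity = lexicon.get(token, 0)
        if polarity and negate_left > 0:
            polarity = -polarity
        total += polarity
        negate_left = max(0, negate_left - 1)
    return total


print(score("I like it"))                                # 1
print(score("I don't like it"))                          # -1
print(score("An unpredictable movie", domain="movies"))  # 1
print(score("An unpredictable kitchen robot"))           # -1
print(score("I would hate to miss that movie"))          # -2
```

The last call shows the limits of this kind of approach: the sketch simply adds up two negative words and returns a negative score, although the opinion expressed about the movie is positive. It also ignores segmentation and anaphora entirely, since it never determines which entity a polar word refers to.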