* An ensemble model that combines a gazetteer and a classifier. The classifier outputs "yes" or "no" for each category and is based on a small number of features (up to 30). It also uses a reduced language model that hashes words to categories.
* An unsupervised model that works with the tagged entities in the documents and tries to find DBpedia supercategories that cover those entities well. Drawback: very broad supercategories such as "Living_people" are output very often and are unspecific. The approach is promising, but a specificity score for the output categories must be introduced.
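The ensemble model described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual implementation: words are hashed into at most 30 feature buckets (standing in for the "reduced language model"), each category has its own linear yes/no classifier, and the gazetteer and classifier outputs are combined by a simple union. The function names, the crc32 hash, the union rule and the example weights are all hypothetical.

```python
import zlib
from typing import Dict, List, Set, Tuple

NUM_FEATURES = 30  # the notes mention "up to 30" features

def hashed_features(tokens: List[str], num_features: int = NUM_FEATURES) -> List[float]:
    """Feature hashing: map each token to one of num_features buckets (stable crc32 hash)."""
    vec = [0.0] * num_features
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % num_features] += 1.0
    return vec

def gazetteer_categories(tokens: List[str], gazetteer: Dict[str, str]) -> Set[str]:
    """Look up single tokens in a (hypothetical) token -> category gazetteer."""
    return {gazetteer[t] for t in tokens if t in gazetteer}

def classifier_says_yes(vec: List[float], weights: List[float], bias: float) -> bool:
    """Binary yes/no decision of one per-category linear classifier."""
    return bias + sum(w * x for w, x in zip(weights, vec)) > 0.0

def ensemble_categories(
    tokens: List[str],
    gazetteer: Dict[str, str],
    models: Dict[str, Tuple[List[float], float]],
) -> Set[str]:
    """Assumed combination rule: union of gazetteer hits and classifier 'yes' answers."""
    categories = gazetteer_categories(tokens, gazetteer)
    vec = hashed_features(tokens)
    for category, (weights, bias) in models.items():
        if classifier_says_yes(vec, weights, bias):
            categories.add(category)
    return categories

# Usage with made-up weights: any non-empty input pushes the "Sports" score above zero.
gaz = {"paris": "Geography"}
models = {"Sports": ([1.0] * NUM_FEATURES, -0.5)}
print(ensemble_categories(["paris"], gaz, models))
```

A union is the simplest way to merge the two components; a weighted vote or letting the gazetteer override the classifier would be equally plausible choices.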
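One possible specificity score for the unsupervised supercategory approach is an IDF-style penalty: multiply how many of the document's entities a category covers by log(total entities / entities in that category), so that huge categories such as "Living_people" sink in the ranking. The sketch below assumes precomputed entity-to-category mappings and category sizes (the example values are invented); it illustrates the idea and is not the method actually used.

```python
import math
from typing import Dict, List, Set, Tuple

def rank_supercategories(
    doc_entities: List[str],
    entity_categories: Dict[str, Set[str]],  # hypothetical entity -> DBpedia categories map
    category_sizes: Dict[str, int],          # hypothetical: number of entities per category
    total_entities: int,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank categories by coverage of the document's entities times an IDF-like specificity."""
    entities = set(doc_entities)
    covered: Dict[str, int] = {}
    for ent in entities:
        for cat in entity_categories.get(ent, set()):
            covered[cat] = covered.get(cat, 0) + 1
    n = len(entities) or 1
    scored = []
    for cat, count in covered.items():
        coverage = count / n                                       # share of entities covered
        specificity = math.log(total_entities / category_sizes[cat])  # broad -> small
        scored.append((cat, coverage * specificity))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Invented example data.
ent_cats = {
    "Lionel_Messi": {"Living_people", "Association_football_forwards", "Argentine_footballers"},
    "Cristiano_Ronaldo": {"Living_people", "Association_football_forwards"},
}
sizes = {"Living_people": 900_000, "Association_football_forwards": 20_000,
         "Argentine_footballers": 3_000}
ranking = rank_supercategories(["Lionel_Messi", "Cristiano_Ronaldo"], ent_cats, sizes, 4_000_000)
print(ranking)
```

Here "Living_people" still covers every entity, but its low specificity places it last, behind the narrower football categories.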
Features:
* We currently use stopword removal, stemming, and a bigram model for feature extraction.
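The feature pipeline above (stopword removal, stemming, bigrams) can be sketched as follows. To keep the example self-contained, it uses a tiny hand-picked stopword list and a crude suffix-stripping stemmer as stand-ins for a real stopword list and a proper stemmer such as Porter's; both are assumptions, and only bigram counts are returned.

```python
import re
from collections import Counter
from typing import Tuple

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is", "are", "for"}
# Crude stand-in for a real stemmer; "es" is checked before "s".
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

def extract_features(text: str) -> Counter:
    """Lowercase, tokenize, drop stopwords, stem, then count adjacent-token bigrams."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [crude_stem(t) for t in tokens if t not in STOPWORDS]
    bigrams: Counter = Counter(zip(tokens, tokens[1:]))
    return bigrams

feats = extract_features("The players are scoring goals in the matches")
print(feats)
```

Because stopwords are removed before the bigrams are formed, "goals in the matches" yields the bigram ("goal", "match") rather than pairs involving "in" or "the".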