* An ensemble model that combines a gazetteer with a classifier. The classifier outputs "yes" or "no" for each category and is based on a small number of features (up to 30). There is also a reduced language model of sorts that hashes words to categories.
* An unsupervised model that works with the tagged entities in the documents and tries to find DBpedia supercategories that cover those entities well. Drawback: very broad supercategories such as "Living_people" are output very often and are too unspecific. The approach is promising, but some specificity score for the output categories must be introduced.
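
One possible shape for such a specificity score is sketched below. This is only an illustration, not the method in use: it assumes an IDF-style weight, `log(total_entities / category_size)`, multiplied by the fraction of the document's entities that a category covers. The function name `rank_categories`, the entity names, the category `Example_specific_category`, and all sizes are hypothetical toy values.

```python
import math

def rank_categories(doc_entities, category_members, total_entities):
    """Rank candidate supercategories by coverage * specificity.

    coverage    = share of the document's entities that the category contains
    specificity = log(total_entities / category size), an IDF-style weight
                  (an assumption of this sketch) that penalises broad categories
    """
    doc_entities = set(doc_entities)
    if not doc_entities:
        return []
    scored = []
    for category, members in category_members.items():
        covered = doc_entities & members
        if not covered:
            continue  # category does not cover any entity of this document
        coverage = len(covered) / len(doc_entities)
        specificity = math.log(total_entities / len(members))
        scored.append((category, coverage * specificity))
    # Highest score first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data: both categories cover the two document entities,
# but "Living_people" is far broader than the other one.
doc = {"Entity_A", "Entity_B"}
categories = {
    "Living_people": doc | {f"person_{i}" for i in range(998)},
    "Example_specific_category": doc | {"Entity_C"},
}
ranked = rank_categories(doc, categories, total_entities=10_000)
```

With equal coverage, the narrower category ranks above "Living_people" because its specificity weight is larger. Whether a logarithmic weight is the right choice would have to be checked against real data.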

* We are currently using stop-word elimination, stemming, and a bigram model for feature extraction.
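
The feature extraction steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the stop-word list is a tiny placeholder, `crude_stem` is a simplistic suffix stripper standing in for a real stemmer (such as Porter), and returning unigrams together with bigrams is a guess at what "bigram model" means here. All names are hypothetical.

```python
import re

# Tiny illustrative stop-word list (assumption; a real list is much larger).
STOPWORDS = {"a", "an", "the", "of", "and", "in", "is", "are", "to", "on"}

def crude_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters of the stem
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def extract_features(text):
    """Lowercase, tokenize, drop stop words, stem, then add adjacent-pair bigrams."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOPWORDS]
    bigrams = [f"{a}_{b}" for a, b in zip(stems, stems[1:])]
    return stems + bigrams

features = extract_features("The players are training in the stadium")
# features == ["player", "train", "stadium", "player_train", "train_stadium"]
```

Note that in this sketch the bigrams are formed after stop-word removal, so words that were not adjacent in the original text can end up paired.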