B. For unsupervised approaches, where the categories are not specified a priori, one can use ontology terms, such as DBpedia categories, at various levels of specificity.

Corpora
A. A corpus of long abstracts from DBpedia, taken from articles that belong to the 17 IPTC categories, as shown here: . The corpus is available in English (EN) and Bulgarian (BG).
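
As an illustration of how such abstracts could be collected, the following minimal sketch builds a SPARQL query for the long abstracts (`dbo:abstract`) of articles filed under one DBpedia category (`dct:subject`), filtered by language. It is not the procedure used to build the corpus above. The function name `build_abstract_query`, the example category `Sports`, and the idea of standing in a DBpedia category for an IPTC category are all assumptions made for illustration; the mapping from IPTC categories to DBpedia categories is not given here. The code only builds the query string and does not contact any endpoint.

```python
def build_abstract_query(category: str, lang: str = "en", limit: int = 100) -> str:
    """Build a SPARQL query for long abstracts of articles in one DBpedia category.

    `category` is the local name of a DBpedia category (hypothetical example:
    "Sports"). Names containing special characters would need escaping,
    which this sketch does not handle.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>

SELECT ?article ?abstract WHERE {{
  ?article dct:subject dbc:{category} ;
           dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "{lang}")
}}
LIMIT {limit}
""".strip()


# Assumed example: Bulgarian abstracts for a hypothetical category choice.
query = build_abstract_query("Sports", lang="bg", limit=10)
print(query)
```

The resulting string can be submitted to any SPARQL endpoint that serves DBpedia data; whether a given endpoint carries Bulgarian abstracts depends on which language editions it loads.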