* For feature extraction we currently use stopword removal, stemming, and a bigram model.
* Multi-label classification is achieved by training K independent binary classifiers (perceptrons or sigmoid perceptrons), one for each of the K possible labels (a one-vs-rest scheme). Each classifier answers the question: how likely is it that sample x has label l, as opposed to not having it? Once all K classifiers are trained, the labels with the highest likelihoods for a sample form its predicted label set, and a rule of thumb decides how many labels to return.
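The feature-extraction pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The stopword list is a tiny hypothetical one. The suffix-stripping `stem` function is a crude stand-in for a real stemmer such as Porter's. The name `extract_features` and the choice to form bigrams *after* stopword removal are also assumptions, not necessarily what the project does.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "on", "for", "with"}

# Crude suffix stripper standing in for a real stemmer (e.g. Porter); an assumption.
# Longer suffixes are listed before their shorter endings ("edly" before "ed", "es" before "s").
SUFFIXES = ("ing", "edly", "ed", "ly", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(text: str) -> Counter:
    """Lowercase, tokenize, drop stopwords, stem, then count adjacent-stem bigrams."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [stem(t) for t in tokens if t not in STOPWORDS]
    # Each feature is a (stem_i, stem_{i+1}) pair; the value is its count in the text.
    return Counter(zip(stems, stems[1:]))
```

For example, `extract_features("The cats are running in the gardens")` yields the bigrams `('cat', 'runn')` and `('runn', 'garden')`, each with count 1. Because stopwords are removed first, bigrams can bridge a dropped word ("cats are running" becomes `cat runn`). Forming bigrams before removal would be an equally reasonable choice.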
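The one-vs-rest scheme with sigmoid perceptrons can be sketched as follows. Each label gets its own logistic unit, trained by stochastic gradient descent on the log loss. At prediction time the per-label probabilities are ranked. The notes do not spell out the actual rule of thumb, so the `max_labels` and `min_prob` parameters are hypothetical stand-ins for it. All class and method names here are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class SigmoidPerceptron:
    """One binary model: estimates P(x has label l) against P(x does not)."""

    def __init__(self) -> None:
        self.w: dict = {}  # sparse weights keyed by feature
        self.b = 0.0       # bias term

    def prob(self, x: dict) -> float:
        z = self.b + sum(self.w.get(f, 0.0) * v for f, v in x.items())
        return sigmoid(z)

    def update(self, x: dict, y: int, lr: float) -> None:
        # One SGD step on the log loss: move the weights by lr * (y - p) * x.
        err = y - self.prob(x)
        for f, v in x.items():
            self.w[f] = self.w.get(f, 0.0) + lr * err * v
        self.b += lr * err


class OneVsRest:
    """K independent binary models, one per label."""

    def __init__(self, labels) -> None:
        self.models = {label: SigmoidPerceptron() for label in labels}

    def fit(self, X, Y, epochs: int = 100, lr: float = 0.5) -> "OneVsRest":
        for _ in range(epochs):
            for x, ys in zip(X, Y):
                # The models share no parameters, so updating them side by side
                # is equivalent to training each one on its own.
                for label, model in self.models.items():
                    model.update(x, 1 if label in ys else 0, lr)
        return self

    def predict(self, x: dict, max_labels: int = 2, min_prob: float = 0.5) -> list:
        # Hypothetical stand-in for the rule of thumb: keep at most max_labels
        # of the most likely labels whose probability reaches min_prob, but
        # always return at least the single most likely label.
        probs = {label: m.prob(x) for label, m in self.models.items()}
        ranked = sorted(probs, key=probs.get, reverse=True)
        chosen = [label for label in ranked[:max_labels] if probs[label] >= min_prob]
        return chosen or ranked[:1]
```

Each sample `x` is a sparse feature dict, such as a mapping from bigrams to counts, and each entry of `Y` is the set of labels for that sample. The fallback to the top label means every sample receives at least one label. Whether that matches the project's rule of thumb is an open assumption.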