** Stochastic gradient ascent (fast; a minimal update sketch follows this list)
* Parallelization:
** Multithreaded maxent training, following Mann et al. (2009) ([Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models|http://research.google.com/pubs/archive/35648.pdf])
* Modified objective for targeted optimization of a particular Precision/Recall trade-off:
** We implemented a weighted likelihood objective that optimizes a specific F_beta for a given beta, so a desired Precision/Recall trade-off can be specified. In practice, we can therefore train models with very high Precision, or very high Recall, at the expense of the complementary measure (see the formula after this list).
** Main publication: [Georgi Dimitroff, Laura Tolosi, Borislav Popov and Georgi Georgiev. Weighted maximum likelihood as a convenient shortcut to optimize the F-measure of maximum entropy classifiers, RANLP 2013|http://www.aclweb.org/anthology/R13-1027]
* Regularization:
** L1 regularization is often used in practice to obtain sparse models and reduce overfitting. An L1-regularized maxent can also serve as a feature selection procedure (the sketch after this list includes an L1 term).
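
As a sketch of the weighted objective (notation ours, simplified from the paper): each example's log-likelihood contribution is scaled by a class-dependent weight, with the weights chosen to favour the desired Precision/Recall balance:

{noformat}
L_w(\theta) = \sum_i w_{y_i} \log p_\theta(y_i | x_i)
{noformat}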
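A minimal sketch of how such a stochastic update can look for a binary maxent (logistic) model, with a per-class example weight (the weighted likelihood above) and an L1 subgradient term; class and method names are illustrative, not Edlin's actual API:

{code:java}
/**
 * Illustrative stochastic gradient ascent for a binary maxent model
 * with class-weighted likelihood and L1 shrinkage. Not Edlin's API.
 */
public class SgaMaxentSketch {
    private final double[] w;           // one weight per feature
    private final double learningRate = 0.1;
    private final double l1 = 1e-4;     // L1 strength; 0 disables the penalty

    public SgaMaxentSketch(int numFeatures) { w = new double[numFeatures]; }

    /** P(y = 1 | x) under the current weights. */
    public double probPositive(double[] x) {
        double z = 0;
        for (int j = 0; j < x.length; j++) z += w[j] * x[j];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One stochastic step on example (x, y), y in {0, 1};
        classWeight > 1 boosts the importance of this example's class. */
    public void update(double[] x, int y, double classWeight) {
        double err = y - probPositive(x);    // gradient of the log-likelihood
        for (int j = 0; j < x.length; j++) {
            w[j] += learningRate * (classWeight * err * x[j]
                    - l1 * Math.signum(w[j]));  // L1 subgradient pulls weights toward 0
        }
    }
}
{code}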
h5. Perceptron

The structured perceptron, with parallelization, is implemented after McDonald et al. (2010) ([Distributed Training Strategies for the Structured Perceptron|http://aclweb.org/anthology/N/N10/N10-1069.pdf]).
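
A minimal sketch of the iterative parameter mixing idea from that paper, reduced to a binary perceptron for brevity (the structured case replaces the sign check with an argmax over output structures); the names are illustrative, not Edlin's interface:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Sketch of iterative parameter mixing (after McDonald et al., 2010):
 * each epoch, every data shard trains a perceptron in parallel starting
 * from the current mixed weights; the shard weights are then averaged.
 */
public class ParamMixingSketch {

    /** One perceptron pass over a shard; labels y are in {-1, +1}. */
    static double[] trainShard(double[][] x, int[] y, double[] init) {
        double[] w = init.clone();
        for (int i = 0; i < x.length; i++) {
            double score = 0;
            for (int j = 0; j < w.length; j++) score += w[j] * x[i][j];
            if (y[i] * score <= 0)                      // mistake-driven update
                for (int j = 0; j < w.length; j++) w[j] += y[i] * x[i][j];
        }
        return w;
    }

    static double[] train(List<double[][]> xs, List<int[]> ys, int dim, int epochs)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(xs.size());
        double[] mixed = new double[dim];
        for (int e = 0; e < epochs; e++) {
            final double[] start = mixed.clone();
            List<Future<double[]>> futures = new ArrayList<>();
            for (int s = 0; s < xs.size(); s++) {
                final double[][] x = xs.get(s);
                final int[] y = ys.get(s);
                futures.add(pool.submit(() -> trainShard(x, y, start)));
            }
            double[] sum = new double[dim];
            for (Future<double[]> f : futures) {
                double[] w = f.get();
                for (int j = 0; j < dim; j++) sum[j] += w[j];
            }
            for (int j = 0; j < dim; j++) mixed[j] = sum[j] / xs.size();  // uniform mixing
        }
        pool.shutdown();
        return mixed;
    }
}
{code}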


h3. Algorithms for feature selection
NLP datasets are characterized by a large number of features, sometimes orders of magnitude larger than the number of available training samples. To avoid overfitting, feature selection can be applied prior to or during model training. We provide several approaches to feature selection:
* Filter by Fisher test (association between feature and outcome), keeping either a fixed percentage of features or all features with a small enough p-value.
* Filter by mutual information (between feature and outcome), keeping either a fixed percentage of features or all features with large enough mutual information (see the sketch after this list).
* A feature induction algorithm, described in ([Tolosi et al. 2013, A Feature Induction Algorithm with Application to Named Entity Disambiguation, RANLP 2013|http://www.aclweb.org/anthology/R13-1089])
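
As an illustration of the mutual information filter above (the Fisher filter follows the same keep-top-fraction pattern), a minimal sketch for binary features and a binary outcome; the names are ours, not Edlin's:

{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch of a mutual-information feature filter (binary features/labels). */
public class MiFilterSketch {

    /** MI (in nats) between feature column j of x and labels y; values in {0, 1}. */
    static double mutualInfo(int[][] x, int[] y, int j) {
        int n = x.length;
        double[][] joint = new double[2][2];            // empirical joint distribution
        for (int i = 0; i < n; i++) joint[x[i][j]][y[i]] += 1.0 / n;
        double mi = 0;
        for (int a = 0; a < 2; a++)
            for (int b = 0; b < 2; b++) {
                double pa = joint[a][0] + joint[a][1];  // marginal of the feature
                double pb = joint[0][b] + joint[1][b];  // marginal of the label
                if (joint[a][b] > 0)
                    mi += joint[a][b] * Math.log(joint[a][b] / (pa * pb));
            }
        return mi;
    }

    /** Indices of the top fraction of features by MI, e.g. keep = 0.1 for 10%. */
    static List<Integer> select(int[][] x, int[] y, double keep) {
        int d = x[0].length;
        final double[] mi = new double[d];
        Integer[] idx = new Integer[d];
        for (int j = 0; j < d; j++) { idx[j] = j; mi[j] = mutualInfo(x, y, j); }
        Arrays.sort(idx, Comparator.comparingDouble(j -> -mi[j]));  // descending MI
        return Arrays.asList(idx).subList(0, (int) Math.ceil(keep * d));
    }
}
{code}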


h2. Feature Extraction module

A module for feature extraction: classification instances are produced automatically, as features are extracted from documents using a set of [Groovy|http://groovy.codehaus.org/] rules.

h2. Mallet-Wrapper (for GATE)
Mallet-Wrapper wraps the algorithms of [Mallet|http://mallet.cs.umass.edu/], so that they can be used in [GATE|http://gate.ac.uk/] for multiple information extraction purposes.
The algorithms are wrapped as ProcessingResources and LanguageResources and can be applied directly in a pipeline.
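
For illustration, a wrapped PR could be loaded into a GATE Embedded pipeline roughly as follows; the PR class name is a placeholder, not the wrapper's real identifier:

{code:java}
import gate.Corpus;
import gate.Factory;
import gate.Gate;
import gate.ProcessingResource;
import gate.creole.SerialAnalyserController;

public class PipelineSketch {
    public static void main(String[] args) throws Exception {
        Gate.init();  // start GATE Embedded

        // assemble a corpus pipeline and add a wrapped PR to it
        SerialAnalyserController pipeline = (SerialAnalyserController)
            Factory.createResource("gate.creole.SerialAnalyserController");
        ProcessingResource tagger = (ProcessingResource)
            Factory.createResource("com.example.WrappedClassifierPR");  // placeholder name
        pipeline.add(tagger);

        // run the pipeline over a one-document corpus
        Corpus corpus = Factory.newCorpus("corpus");
        corpus.add(Factory.newDocument("Some text to classify."));
        pipeline.setCorpus(corpus);
        pipeline.execute();
    }
}
{code}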

h2. Document classification API (DAPI)

Currently not part of Edlin.