h5. Perceptron

h3. Algorithms for feature selection
NLP datasets are characterized by a large number of features, sometimes orders of magnitude greater than the number of training samples available. To avoid overfitting, feature selection can be applied before or during model training. We have a large number of approaches to feature selection:
* Filter by Fisher test (association between feature and outcome), either keeping a fixed percentage of the features, or keeping the features that yield a small enough p-value.
* Filter by mutual information (between feature and outcome), either keeping a fixed percentage of the features, or keeping the features that yield a large enough mutual information.
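
The two filters above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the library's own implementation: the function names (`mutual_information`, `fisher_p_value`, `keep_top_percent`, `keep_by_threshold`), the toy feature names, and the restriction to binary features and binary outcomes for the Fisher test are all assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Mutual information (in nats) between a discrete feature and the outcome."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (feature_counts[x] * label_counts[y]))
    return mi


def fisher_p_value(feature_values, labels):
    """Two-sided Fisher exact test p-value (binary feature, binary outcome)."""
    pairs = list(zip(feature_values, labels))
    a = sum(1 for x, y in pairs if x and y)          # feature present, positive
    b = sum(1 for x, y in pairs if x and not y)      # feature present, negative
    c = sum(1 for x, y in pairs if not x and y)      # feature absent, positive
    n = len(pairs)
    row1, col1 = a + b, a + c

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed
        return math.comb(col1, k) * math.comb(n - col1, row1 - k) / math.comb(n, row1)

    observed = prob(a)
    low, high = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    return sum(p for p in map(prob, range(low, high + 1)) if p <= observed * (1 + 1e-9))


def keep_top_percent(scores, percent, largest_first=True):
    """Keep the given percentage of features, ranked by score."""
    k = max(1, round(len(scores) * percent / 100))
    ranked = sorted(scores, key=scores.get, reverse=largest_first)
    return set(ranked[:k])


def keep_by_threshold(scores, threshold, keep_large=True):
    """Keep features whose score is large enough (or small enough, for p-values)."""
    if keep_large:
        return {name for name, s in scores.items() if s >= threshold}
    return {name for name, s in scores.items() if s <= threshold}


# Toy data (hypothetical): one informative feature, one uninformative feature
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "f_good": [1, 1, 1, 0, 0, 0],
    "f_noise": [1, 0, 1, 0, 1, 0],
}

mi_scores = {name: mutual_information(col, labels) for name, col in columns.items()}
p_values = {name: fisher_p_value(col, labels) for name, col in columns.items()}

selected_by_mi = keep_top_percent(mi_scores, percent=50)
selected_by_fisher = keep_by_threshold(p_values, threshold=0.2, keep_large=False)
```

On this toy data both filters keep only `f_good`. For the Fisher test a lower score is better, so the percentage variant would be called with `largest_first=False` and the threshold variant with `keep_large=False`.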


h2. Feature Extraction module

A module for feature extraction: classification instances are produced automatically, as features are extracted from documents using a set of Groovy rules.

h2. Edlin-Wrapper (for GATE)