h1. Edlin
Edlin* is Ontotext's machine learning framework, written in Java. It was originally started by Georgi Georgiev and Kuzman Ganchev and is now actively developed by the Text-Analysis team of Ontotext.
*Kuzman Ganchev and Georgi Georgiev, Edlin: an easy to read linear learning framework, Recent Advances in Natural Language Processing (RANLP), 2009.
{toc}
h2. Introduction
Edlin is a collection of machine learning algorithms comprising a large number of state-of-the-art methods for classification and sequence tagging. Even though at their core they are general machine-learning approaches (perceptrons, logistic regression), the implementation is optimized for NLP learning tasks:
* inputs are represented as sparse document-term matrices
* computation is parallelized wherever possible, so that very large datasets can be handled
* task-specific evaluation metrics such as precision, recall and F-measure are reported
* feature selection methods are provided to reduce dimensionality, etc.
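To make the first and third points concrete, here is a minimal sketch of a binary perceptron that works directly on sparse feature vectors (only non-zero entries are stored, as in a sparse document-term matrix row) and of a precision/recall/F1 computation over its predictions. The class name `SparsePerceptron` and its methods are hypothetical illustrations, not Edlin's actual API; the sketch omits parallelism and feature selection, and it assumes the balanced F-measure (F1).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a perceptron on sparse inputs; not Edlin's API.
public class SparsePerceptron {
    // Weights are stored sparsely: only features seen in a mistake get an entry.
    private final Map<Integer, Double> weights = new HashMap<>();
    private double bias = 0.0;

    // Dot product over the non-zero entries of a sparse feature vector.
    public double score(Map<Integer, Double> x) {
        double s = bias;
        for (Map.Entry<Integer, Double> e : x.entrySet()) {
            s += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return s;
    }

    public boolean predict(Map<Integer, Double> x) {
        return score(x) > 0;
    }

    // Classic perceptron rule: on a mistake, add the input for a positive
    // example and subtract it for a negative one.
    public void train(List<Map<Integer, Double>> xs, List<Boolean> ys, int epochs) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < xs.size(); i++) {
                boolean gold = ys.get(i);
                if (predict(xs.get(i)) != gold) {
                    double sign = gold ? 1.0 : -1.0;
                    for (Map.Entry<Integer, Double> e : xs.get(i).entrySet()) {
                        weights.merge(e.getKey(), sign * e.getValue(), Double::sum);
                    }
                    bias += sign;
                }
            }
        }
    }

    // Returns {precision, recall, F1} for the positive class.
    public static double[] precisionRecallF1(List<Boolean> gold, List<Boolean> predicted) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < gold.size(); i++) {
            boolean g = gold.get(i), p = predicted.get(i);
            if (p && g) tp++;
            else if (p) fp++;
            else if (g) fn++;
        }
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        double f1 = precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }
}
```

Keeping both the inputs and the weights sparse means each update costs time proportional to the number of non-zero features in a document rather than to the vocabulary size, which is what makes this style of learner practical for text.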
Edlin consists of four sub-projects (Basics, Edlin-Wrapper, Mallet-Wrapper and Feature Extraction) and is closely bound to another in-house project, Doc-Classif API (a.k.a. DAPI).
The source can be found [here|https://svn.ontotext.com/svn/kim/others/edlin/trunk/].
h2. Edlin Basics
The core of the framework: it contains the implementations of all algorithms.
They fall into two general groups: classification and sequence tagging.
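On the sequence side, linear taggers usually score each label at each position plus each label-to-label transition, and then decode the best label sequence with the Viterbi algorithm. The sketch below shows only that decoding step, under the assumption that the per-position and transition scores have already been computed by some trained model; the class `ViterbiDecoder` is hypothetical, and this page does not state how Edlin's own taggers perform decoding.

```java
// Hypothetical sketch of Viterbi decoding for a linear sequence tagger; not Edlin's API.
public class ViterbiDecoder {
    /**
     * emission[t][y]: score of label y at position t (e.g. a weight vector dotted with features).
     * transition[p][y]: score of moving from label p to label y.
     * Returns the label sequence with the highest total score.
     */
    public static int[] decode(double[][] emission, double[][] transition) {
        int n = emission.length;
        if (n == 0) return new int[0];
        int k = emission[0].length;
        double[][] best = new double[n][k]; // best score of any path ending in label y at t
        int[][] back = new int[n][k];       // predecessor label on that best path
        for (int y = 0; y < k; y++) best[0][y] = emission[0][y];
        for (int t = 1; t < n; t++) {
            for (int y = 0; y < k; y++) {
                double max = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double s = best[t - 1][p] + transition[p][y];
                    if (s > max) {
                        max = s;
                        arg = p;
                    }
                }
                best[t][y] = max + emission[t][y];
                back[t][y] = arg;
            }
        }
        // Pick the best final label, then follow back-pointers to recover the path.
        int last = 0;
        for (int y = 1; y < k; y++) {
            if (best[n - 1][y] > best[n - 1][last]) last = y;
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
        return path;
    }
}
```

The dynamic program runs in O(n·k²) time for n tokens and k labels, instead of enumerating all kⁿ possible label sequences.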
h2. Feature Extraction module
A module for feature extraction.
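As a rough illustration of what feature extraction produces for the learners above, the sketch below turns raw text into a sparse bag-of-words vector, using a feature "alphabet" that maps each feature string to a stable integer index. Everything here (the class `BagOfWordsExtractor`, the `w=` feature prefix, the tokenization on non-word characters, dropping unseen features once the alphabet is frozen) is a hypothetical assumption and not the actual interface or behavior of Edlin's Feature Extraction module.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of bag-of-words feature extraction; not the module's API.
public class BagOfWordsExtractor {
    // Maps each feature string to a stable integer index (a "feature alphabet").
    private final Map<String, Integer> alphabet = new HashMap<>();
    private boolean frozen = false;

    // After freezing (e.g. at test time), unseen features are dropped
    // instead of receiving new indices.
    public void freeze() {
        frozen = true;
    }

    public int size() {
        return alphabet.size();
    }

    // Returns a sparse vector: feature index -> term count.
    public Map<Integer, Double> extract(String text) {
        Map<Integer, Double> features = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue; // split can yield a leading empty token
            String feature = "w=" + token;
            Integer index = alphabet.get(feature);
            if (index == null) {
                if (frozen) continue;
                index = alphabet.size();
                alphabet.put(feature, index);
            }
            features.merge(index, 1.0, Double::sum);
        }
        return features;
    }
}
```

Freezing the alphabet after training keeps feature indices consistent between the training data and new documents, so the learned weight vector can be applied to both.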
h2. Edlin-Wrapper (for GATE)
Edlin-Wrapper wraps the algorithms of Edlin so that they can be used in [GATE|http://gate.ac.uk/] for various information extraction tasks.
The algorithms are wrapped as ProcessingResources and LanguageResources and can be applied directly in a pipeline.
More about [Edlin-Wrapper|Edlin-Wrapper].
h2. Mallet-Wrapper (for GATE)
Mallet-Wrapper wraps the algorithms of [Mallet|http://mallet.cs.umass.edu/] so that they can be used in [GATE|http://gate.ac.uk/] for various information extraction tasks.
The algorithms are wrapped as ProcessingResources and LanguageResources and can be applied directly in a pipeline.
h2. Document Classification API (DAPI)
Currently not part of Edlin.