Current version by georgi.georgiev on Dec 05, 2014 11:41.

h1. Edlin

Edlin \[1\] is Ontotext's machine learning framework, written in Java. It was started by Georgi Georgiev and Kuzman Ganchev and is now actively developed by the Text-Analysis team at Ontotext.


{code}
[1] [Kuzman Ganchev and Georgi Georgiev, Edlin: an easy to read linear learning framework, Recent Advances in Natural Language Processing (RANLP), 2009.|http://www.aclweb.org/anthology/R09-1018]
{code}

{toc}

h2. Introduction

Edlin is a collection of machine learning algorithms comprising a large number of state-of-the-art methods for classification and sequence tagging. Even though at their core they are general machine-learning approaches (perceptrons, logistic regression), the implementation is optimized for NLP learning tasks:
* inputs are represented as sparse document-term matrices
* parallel computation is used whenever possible (in order to deal with very large datasets)
* task-specific evaluation metrics such as precision, recall and F-measure are reported
* appropriate feature selection methods are added in order to reduce dimensionality, etc.
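
As a rough illustration of the points above, the following sketch trains a binary perceptron on sparse feature vectors and reports precision, recall and F1. It is not taken from the Edlin code base: the class {{SparsePerceptronSketch}}, its methods and the toy feature ids are all hypothetical and chosen only for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: all names here are hypothetical, not Edlin's API. */
final class SparsePerceptronSketch {

    /** Sparse weight vector: only features seen in an update are stored. */
    private final Map<Integer, Double> weights = new HashMap<>();

    /** Dot product of the weights with a sparse input (feature id -> value). */
    double score(Map<Integer, Double> x) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : x.entrySet()) {
            sum += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return sum;
    }

    /** Predicts +1 when the score is positive, otherwise -1. */
    int predict(Map<Integer, Double> x) {
        return score(x) > 0.0 ? 1 : -1;
    }

    /** Standard perceptron training: the weights change only on mistakes. */
    void train(List<Map<Integer, Double>> inputs, int[] labels, int epochs) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < inputs.size(); i++) {
                Map<Integer, Double> x = inputs.get(i);
                int y = labels[i];
                if (predict(x) != y) {
                    for (Map.Entry<Integer, Double> e : x.entrySet()) {
                        weights.merge(e.getKey(), y * e.getValue(), Double::sum);
                    }
                }
            }
        }
    }

    /** Returns {precision, recall, F1} for the positive class (+1). */
    static double[] precisionRecallF1(int[] gold, int[] predicted) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < gold.length; i++) {
            if (predicted[i] == 1 && gold[i] == 1) tp++;
            else if (predicted[i] == 1) fp++;
            else if (gold[i] == 1) fn++;
        }
        double p = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double r = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        double f1 = p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
        return new double[] {p, r, f1};
    }

    public static void main(String[] args) {
        // Toy documents; assumed feature ids: 0 = "good", 1 = "bad", 2 = "movie".
        List<Map<Integer, Double>> docs = List.of(
                Map.of(0, 1.0, 2, 1.0),
                Map.of(1, 1.0, 2, 1.0),
                Map.of(0, 2.0),
                Map.of(1, 1.0));
        int[] gold = {1, -1, 1, -1};

        SparsePerceptronSketch model = new SparsePerceptronSketch();
        model.train(docs, gold, 5);

        int[] predicted = new int[gold.length];
        for (int i = 0; i < gold.length; i++) {
            predicted[i] = model.predict(docs.get(i));
        }
        double[] m = precisionRecallF1(gold, predicted);
        System.out.printf("P=%.2f R=%.2f F1=%.2f%n", m[0], m[1], m[2]);
    }
}
```

Storing inputs and weights as maps keyed by feature id keeps memory proportional to the number of non-zero features, which is the usual motivation for sparse document-term representations in NLP. A production implementation would more likely use primitive arrays or a dedicated sparse-vector type than boxed maps.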



Edlin consists of four sub-projects (Basics, Edlin-Wrapper, Mallet-Wrapper and Feature Extraction) and is closely tied to another in-house project, the Doc-Classif API (a.k.a. DAPI).
The source code is available [here|https://svn.ontotext.com/svn/kim/others/edlin/trunk/].


h2. Edlin Basics

The core part of the framework: it contains the implementations of all algorithms.
They fall into two general groups: classification and sequence tagging.

h2. Feature Extraction module

A module for feature extraction.

h2. Edlin-Wrapper (for GATE)

Edlin-Wrapper wraps the algorithms of Edlin, so that they can be used in [GATE|http://gate.ac.uk/] for multiple information extraction purposes.
The algorithms are wrapped as ProcessingResources and LanguageResources and can be applied directly in a pipeline.
See [Edlin-Wrapper|Edlin-Wrapper] for more details.


h2. Mallet-Wrapper (for GATE)

Mallet-Wrapper wraps the algorithms of [Mallet|http://mallet.cs.umass.edu/], so that they can be used in [GATE|http://gate.ac.uk/] for multiple information extraction purposes.
The algorithms are wrapped as ProcessingResources and LanguageResources and can be applied directly in a pipeline.

h2. Document classification API (DAPI)

Currently not part of Edlin.
Edlin ([Kuzman Ganchev and Georgi Georgiev, Edlin: an easy to read linear learning framework, RANLP, 2009.|http://www.aclweb.org/anthology/R09-1018]) is Ontotext's machine learning framework, written in Java and designed to be easy to read and understand. It was started by Georgi Georgiev and Kuzman Ganchev, is now actively developed by the Text-Analysis team at Ontotext, and has become a fully featured, enterprise-grade machine learning framework for big data.