
Introduction

Edlin is a collection of machine learning algorithms that implements a number of state-of-the-art methods for classification and sequence tagging. Although these are, at their core, general machine-learning approaches (perceptrons, logistic regression), the implementation is optimized for NLP learning tasks:

  • inputs are represented as sparse document-term matrices
  • parallel computation is used wherever possible, in order to handle very large datasets
  • task-specific evaluation metrics such as precision, recall and F-measure are reported
  • appropriate feature selection methods are included to reduce dimensionality
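
To make the evaluation metrics mentioned above concrete, here is a minimal sketch of how precision, recall and F1 can be computed for a single target label. The class and method names (PrfSketch, precisionRecallF1) are hypothetical and chosen for illustration only; this is not Edlin's own evaluation code, and Edlin's reports may be computed differently.

```java
import java.util.List;

/** Hypothetical helper, not part of Edlin: precision, recall and F1 for one target label. */
final class PrfSketch {

    /** Returns {precision, recall, f1} for the given target label. */
    static double[] precisionRecallF1(List<String> gold, List<String> predicted, String target) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("gold and predicted must have the same length");
        }
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < gold.size(); i++) {
            boolean isGold = target.equals(gold.get(i));
            boolean isPred = target.equals(predicted.get(i));
            if (isGold && isPred) {
                tp++;          // correctly predicted target label
            } else if (isPred) {
                fp++;          // predicted target, gold says otherwise
            } else if (isGold) {
                fn++;          // missed a gold target label
            }
        }
        // Assumption: a metric with an empty denominator is reported as 0.
        double precision = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
        double f1 = (precision + recall == 0.0)
                ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }
}
```

Calling PrfSketch.precisionRecallF1(gold, predicted, "PER") on two equally long label lists returns the three values for the label "PER". Reporting 0 when a denominator is empty is a convention picked for this sketch.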

Edlin consists of four sub-projects (Basics, Edlin-Wrapper, Mallet-Wrapper and Feature Extraction). Technical details on each sub-project are given below. The source can be found here.

Edlin Basics

Edlin Basics is the core of the tool. It contains all the ML algorithms, divided into two general groups: classification and sequence tagging.

Algorithms for classification

Maxent
Perceptron
Naive Bayes
MIRA
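
As an illustration of how one of the listed algorithms can work on sparse inputs, the following is a minimal sketch of a binary perceptron whose documents are sparse feature vectors (feature index mapped to value). The class SparsePerceptronSketch and its methods are hypothetical and written for this page; they are not Edlin's classes, and Edlin's actual implementation (for example multi-class handling, averaging or parallel training) may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Edlin's implementation: binary perceptron over sparse vectors. */
final class SparsePerceptronSketch {

    // Weights are stored sparsely too: features never seen in an update stay implicit zeros.
    private final Map<Integer, Double> weights = new HashMap<>();

    /** Dot product that touches only the non-zero features of the document. */
    double score(Map<Integer, Double> doc) {
        double s = 0.0;
        for (Map.Entry<Integer, Double> e : doc.entrySet()) {
            s += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return s;
    }

    /** Predicts +1 if the score is positive, otherwise -1. */
    int predict(Map<Integer, Double> doc) {
        return score(doc) > 0 ? 1 : -1;
    }

    /**
     * One pass over the data. On each mistake the weights are moved
     * towards the example: w += label * x. Labels must be +1 or -1.
     * Returns the number of mistakes made during the pass.
     */
    int trainEpoch(List<Map<Integer, Double>> docs, int[] labels) {
        int mistakes = 0;
        for (int i = 0; i < docs.size(); i++) {
            if (predict(docs.get(i)) != labels[i]) {
                mistakes++;
                for (Map.Entry<Integer, Double> e : docs.get(i).entrySet()) {
                    weights.merge(e.getKey(), labels[i] * e.getValue(), Double::sum);
                }
            }
        }
        return mistakes;
    }
}
```

A typical use is to call trainEpoch repeatedly until it returns 0 or a maximum number of epochs is reached, then call predict on unseen documents. Using a map per document keeps the cost of scoring and updating proportional to the number of non-zero features rather than to the vocabulary size.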

Feature Extraction module

A module for feature extraction.

Edlin-Wrapper (for GATE)

Edlin-Wrapper wraps the algorithms of Edlin so that they can be used in GATE for a variety of information extraction tasks.
The algorithms are wrapped as ProcessingResources and LanguageResources and can be applied directly in a pipeline.
More about [Edlin-Wrapper]

Mallet-Wrapper (for GATE)

Mallet-Wrapper wraps the algorithms of Mallet so that they can be used in GATE for a variety of information extraction tasks.
The algorithms are wrapped as ProcessingResources and LanguageResources and can be applied directly in a pipeline.

Document classification API (DAPI)

Currently not part of Edlin.
