GATE track 1 session: Difference between revisions

From zooid Wiki

Revision as of 19:28, 30 August 2010

Notes from a full week of learning GATE, covering text mining, information extraction, and language processing, plus talks. See the session wiki.

GATE screenshot.png

GATE is written in Java and is very Java-centric, which makes it portable and fast but also heavyweight. A programming library is available for using it from other Java code. As of 2010 it is 14 years old and has many users and contributors.

Using GATE Developer

  • GATE Developer is used to process Language Resources (documents grouped into a Corpus) using Processing Resources. The processed documents are typically saved to a serialized Datastore.
  • Processing Resources include ANNIE and the VG (verb group) processor.
  • Saving with "preserve formatting" embeds annotation tags in the original HTML or XML.
    • GATE's graph-based (node/offset) XML and the preserved-formatting output (original XML/HTML) have different strengths.
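
The difference between offset-based (stand-off) annotations and inline tags can be sketched in plain Java. This is a toy model, not GATE's API: the class `StandoffDemo`, the `Span` type, and `embedInline` are hypothetical names made up for illustration. It assumes annotations are simple (start, end, type) triples over the document text, and it shows one plausible trade-off: stand-off spans can overlap freely, while inline tags must nest properly, so this sketch simply rejects overlapping spans.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StandoffDemo {

    /** Hypothetical stand-off annotation: a type plus character offsets (end is exclusive). */
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Embeds non-overlapping spans as inline tags; the original text is left untouched. */
    static String embedInline(String text, List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.start));
        StringBuilder out = new StringBuilder();
        int pos = 0; // first character of text not yet copied to the output
        for (Span s : sorted) {
            // Inline tags cannot express overlapping spans, so reject them here.
            if (s.start < pos || s.end < s.start || s.end > text.length()) {
                throw new IllegalArgumentException(
                    "overlapping or out-of-range span: " + s.type + "[" + s.start + "," + s.end + ")");
            }
            out.append(text, pos, s.start);
            out.append('<').append(s.type).append('>');
            out.append(text, s.start, s.end);
            out.append("</").append(s.type).append('>');
            pos = s.end;
        }
        out.append(text, pos, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Ada Lovelace visited London.";
        // Stand-off form: the annotations live outside the text and refer to it by offset.
        List<Span> spans = List.of(
            new Span(21, 27, "Location"),
            new Span(0, 12, "Person"));
        // prints: <Person>Ada Lovelace</Person> visited <Location>London</Location>.
        System.out.println(embedInline(text, spans));
    }
}
```

In this sketch the stand-off list is the richer representation, and the inline output is a lossy but human-readable rendering of it, which may be why both save formats are worth having.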

To investigate

  • markupAware for HTML/XML (keeps tags in editor)
  • AnnotationStack
  • Advanced Options




Blikied on Aug 30, 2010