GATE track 1 session
Revision as of 19:28, 30 August 2010
A full week of learning GATE for text mining, information extraction, and language processing, plus talks. Session wiki
GATE is written in Java and is very Java-centric, which makes it portable and fast but also heavyweight. A programming library is available. It is 14 years old and has many users and contributors.
Using GATE Developer
- GATE Developer is used to process sets of Language Resources (documents grouped into a Corpus) using Processing Resources. The results are typically saved to a serialized Datastore.
- Processing Resources covered: ANNIE and the VG (verb group) processor.
- The "Preserve formatting" option embeds annotation tags in the original HTML or XML.
- The two output forms have different strengths: GATE's graph-based (node/offset) XML versus preserved formatting (the original XML/HTML with tags embedded).
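
To make the node/offset versus preserved-formatting distinction concrete, here is a minimal sketch in plain Java. It does not use GATE's API: the `StandoffSketch` class, the `Span` type, and the `inline` method are all hypothetical names made up for illustration. It keeps annotations as start/end character offsets separate from the text (the standoff, graph-style view) and then renders them as inline tags (the embedded view), assuming the spans do not overlap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names come from GATE's API.
class StandoffSketch {

    // A standoff annotation: a type plus start/end character offsets into the text.
    // The end offset is exclusive.
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Render standoff spans as inline tags. Assumes the spans do not overlap.
    static String inline(String text, List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.start));

        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span s : sorted) {
            out.append(text, cursor, s.start);           // untagged text before the span
            out.append('<').append(s.type).append('>');
            out.append(text, s.start, s.end);            // the annotated text itself
            out.append("</").append(s.type).append('>');
            cursor = s.end;
        }
        out.append(text.substring(cursor));              // trailing untagged text
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Alice visited Paris.";
        List<Span> spans = new ArrayList<>();
        spans.add(new Span(14, 19, "Location"));
        spans.add(new Span(0, 5, "Person"));
        // prints: <Person>Alice</Person> visited <Location>Paris</Location>.
        System.out.println(inline(text, spans));
    }
}
```

One trade-off the sketch hints at: offset-based annotations can overlap freely because they live outside the text, whereas inline tags must nest properly to stay well-formed, which is why this `inline` method assumes non-overlapping spans.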
To investigate
- markupAware for HTML/XML (keeps tags in editor)
- AnnotationStack
- Advanced Options