Wednesday, March 9, 2011

GATE Plugins and CREOLE Resources

GATE (or General Architecture for Text Engineering) is very extensible. Its architecture is based on components (or resources). Its framework functions as a backplane into which users can plug components.

Each component (i.e., a Java Bean) is a reusable chunk of software with a well-defined interface that may be deployed in a variety of contexts. You can define applications as processing pipelines built from these reusable components. In GATE, the set of these resources is officially named CREOLE (a Collection of REusable Objects for Language Engineering).

A set of components plus the framework is a deployment unit that can be embedded in users' applications.

CREOLE Resources

GATE components are one of three types:

  1. Language Resources (LRs) represent entities such as lexicons (e.g. WordNet), corpora or ontologies
  2. Processing Resources (PRs) represent entities that are primarily algorithmic, such as parsers, generators or n-gram modellers
  3. Visual Resources (VRs) represent visualisation and editing components that participate in GUIs

CREOLE plugins are used to organize CREOLE resources: resource implementations can be grouped together as ‘plugins’ and stored at a URL. When the resources are stored in the local file system, this can be a file URL (e.g. file:///D:/Gate/Workspace/GoldFish/).
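
As a minimal sketch of how a local plugin directory maps to such a file URL, the snippet below uses only the JDK. The class name PluginUrlDemo and the relative GoldFish directory are assumptions made for illustration; they are not part of GATE.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

class PluginUrlDemo {
    // Convert a local plugin directory into a file URL.
    static URL toPluginUrl(File pluginDir) throws MalformedURLException {
        return pluginDir.getAbsoluteFile().toURI().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        // "GoldFish" is a hypothetical directory relative to the working directory.
        URL url = toPluginUrl(new File("GoldFish"));
        System.out.println(url);
    }
}
```

Note that File.toURI() appends the trailing slash only when it can determine that the path is an existing directory, so the printed URL may differ slightly depending on whether the folder exists.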

CREOLE Plugins

To create a CREOLE plugin, you lay out its contents in a directory. The directory can contain a jar that holds the resource implementations, a configuration file (i.e., creole.xml), and external resources such as rules, gazetteer lists, and schemas in a resources folder.

To create one, you can use the BootStrap Wizard in GATE Developer. For example, we create a new plugin with a single Processing Resource named GoldFish.



The following files and directories are created:
GoldFish/
+-- src/
      put all your Java sources in here.
+-- resources/
      any external files used by your plugin (e.g. configuration files,
      JAPE grammars, gazetteer lists, etc.) go in here.
+-- build.xml
      Ant build file for building your plugin.
+-- build.properties
      property definitions that control the build process go in here,
      in particular, make sure that gate.home points to your copy of GATE.
+-- creole.xml
      plugin configuration file for GATE - edit this to add parameters, etc.,
      for your resources.

Using CREOLE Resources

In applications that use GATE Embedded, you can construct an information extraction (IE) pipeline using CREOLE resources from different CREOLE plugins. For example, the Gold Fish example constructs a pipeline (i.e., a SerialAnalyserController) from three different PRs:
// Class names of the PRs to run, in pipeline order
String[] processingResources = {
    "gate.creole.tokeniser.DefaultTokeniser",
    "gate.creole.splitter.SentenceSplitter",
    "sheffield.creole.example.GoldFish"};
SerialAnalyserController pipeline = (SerialAnalyserController) Factory
    .createResource("gate.creole.SerialAnalyserController");

// Instantiate each PR and append it to the pipeline
for (int pr = 0; pr < processingResources.length; pr++) {
    System.out.print("\t* Loading " + processingResources[pr] + " ... ");
    pipeline.add((gate.LanguageAnalyser) Factory
        .createResource(processingResources[pr]));
}

Two of them are provided by the ANNIE plugin, and the third (i.e., sheffield.creole.example.GoldFish) is provided by the GoldFish plugin.

To use a CREOLE resource, the relevant CREOLE plugin must be loaded first. For example, the Gold Fish example loads two plugins as follows:
// Load the GoldFish plugin
Gate.getCreoleRegister().registerDirectories(
    new File(System.getProperty("user.dir")).toURI().toURL());
// Load the ANNIE plugin for the DefaultTokeniser and SentenceSplitter
Gate.getCreoleRegister().registerDirectories(
    new File(Gate.getPluginsHome(), ANNIEConstants.PLUGIN_DIR).toURI().toURL());

Note that all CREOLE resources (i.e., LRs, PRs, and VRs) require that the appropriate plugin be loaded first. The only exceptions are Document, Corpus, and DataStore, which do not need a plugin to be loaded.

In the above statements, we use the registerDirectories() API to load plugins from a given CREOLE directory URL. Note that a CREOLE directory URL should point to the directory that contains the creole.xml file.

When a plugin is loaded, GATE looks for a configuration file called creole.xml relative to the plugin URL. It uses the contents of this file to determine which resources the plugin declares and where to find the classes that implement those resource types (typically these classes are stored in a JAR file in the plugin directory).
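
To make the idea of reading resource declarations from creole.xml concrete, here is a minimal sketch that lists the NAME and CLASS of each RESOURCE element using the JDK's DOM parser. It is not GATE's actual loading code; the class CreoleXmlDemo and its method are hypothetical, and the XML is a shortened inline string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class CreoleXmlDemo {
    // Map each declared resource NAME to its implementing CLASS.
    static Map<String, String> declaredResources(String creoleXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(creoleXml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> resources = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("RESOURCE");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element resource = (Element) nodes.item(i);
            String name = resource.getElementsByTagName("NAME").item(0).getTextContent().trim();
            String cls = resource.getElementsByTagName("CLASS").item(0).getTextContent().trim();
            resources.put(name, cls);
        }
        return resources;
    }

    public static void main(String[] args) throws Exception {
        // Shortened, illustrative creole.xml content.
        String xml = "<CREOLE-DIRECTORY>"
                + "<RESOURCE>"
                + "<NAME>GATE Unicode Tokeniser</NAME>"
                + "<CLASS>gate.creole.tokeniser.SimpleTokeniser</CLASS>"
                + "</RESOURCE>"
                + "</CREOLE-DIRECTORY>";
        declaredResources(xml).forEach((name, cls) -> System.out.println(name + " -> " + cls));
    }
}
```

The sketch assumes every RESOURCE element has both a NAME and a CLASS child; a real loader would also have to handle parameters, JAR entries, and annotation-based declarations.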

In the next sections, we will examine the structures of two CREOLE plugins:

  1. ANNIE plugin
  2. GoldFish plugin

ANNIE Plugin

The ANNIE plugin has the following layout:
/plugins/ANNIE/ (i.e., ANNIE's plugin directory)
+-- resources/
    +-- BengaliNE/
    +-- gazetteer/
    +-- heptag/
    +-- NE/
    +-- orthomatcher/
    +-- regex-splitter/
    +-- schema/
    +-- sentenceSplitter/
    +-- tokeniser/
    +-- VP/
+-- build.xml
+-- creole.xml

From creole.xml (i.e., plugin configuration file), we can find the following resources declared:

  • Annotation Schema
  • PRs
    • GATE Unicode Tokeniser
    • ANNIE English Tokeniser
    • ANNIE Gazetteer
    • Sharable Gazetteer
    • Hash Gazetteer
    • Jape Transducer
    • ANNIE NE Transducer
    • ANNIE Sentence Splitter
    • RegEx Sentence Splitter
    • ANNIE POS Tagger
    • ANNIE OrthoMatcher
    • ANNIE Pronominal Coreferencer
    • ANNIE Nominal Coreferencer
    • Document Reset PR
  • VR
    • Jape Viewer

ANNIE is unique in that it is part of the GATE framework: all of its components are implemented in the framework itself (i.e., included in gate.jar), so its plugin directory contains no jar file. It does, however, provide external resources such as gazetteer lists, JAPE rules, and schemas, which are referenced from the CREOLE resource definitions. For example, GATE Unicode Tokeniser is defined as:

<RESOURCE>
  <NAME>GATE Unicode Tokeniser</NAME>
  <CLASS>gate.creole.tokeniser.SimpleTokeniser</CLASS>
  <COMMENT>A customisable Unicode tokeniser.</COMMENT>
  <HELPURL>http://gate.ac.uk/userguide/sec:annie:tokeniser</HELPURL>
  <PARAMETER NAME="document"
    COMMENT="The document to be tokenised" RUNTIME="true">
    gate.Document
  </PARAMETER>
  <PARAMETER NAME="annotationSetName" RUNTIME="true"
    COMMENT="The annotation set to be used for the generated annotations"
    OPTIONAL="true">
    java.lang.String
  </PARAMETER>
  <PARAMETER
    DEFAULT="resources/tokeniser/DefaultTokeniser.rules"
    COMMENT="The URL to the rules file" SUFFIXES="rules"
    NAME="rulesURL">
    java.net.URL
  </PARAMETER>
  <PARAMETER DEFAULT="UTF-8"
    COMMENT="The encoding used for reading the definitions"
    NAME="encoding">
    java.lang.String
  </PARAMETER>
  <ICON>tokeniser</ICON>
</RESOURCE>

Its rulesURL parameter has a default value that points to a rules file stored in the resources subfolder:
resources/tokeniser/DefaultTokeniser.rules
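
As a rough illustration of how such a relative default can turn into a full location, the sketch below resolves it against a plugin base URI with the JDK's URI.resolve. It assumes the relative value is resolved against the directory that contains creole.xml; the install path /opt/gate/plugins/ANNIE/ and the class RulesUrlDemo are hypothetical.

```java
import java.net.URI;

class RulesUrlDemo {
    // Resolve a relative default value against the plugin's base location.
    static URI resolveDefault(URI pluginBase, String relativeDefault) {
        return pluginBase.resolve(relativeDefault);
    }

    public static void main(String[] args) {
        // Hypothetical install location; the trailing slash matters for resolution.
        URI pluginBase = URI.create("file:///opt/gate/plugins/ANNIE/");
        URI rules = resolveDefault(pluginBase, "resources/tokeniser/DefaultTokeniser.rules");
        System.out.println(rules.getPath());
    }
}
```

Without the trailing slash on the base, URI.resolve would replace the last path segment (ANNIE) instead of appending to it, which is one reason plugin URLs are usually written as directory URLs.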

Gold Fish Plugin

The GoldFish plugin has the following layout:
GoldFish/ (i.e., GoldFish's plugin directory)
+-- build.xml
+-- build.properties
+-- creole.xml
+-- GoldFish.jar

The class "sheffield.creole.example.GoldFish" in GoldFish.jar provides the implementation of the new PR. Because this PR doesn't need any gazetteer lists or rules, it has an empty resources folder. Its creole.xml is as simple as:
<CREOLE-DIRECTORY>
  <JAR SCAN="true">GoldFish.jar</JAR>
</CREOLE-DIRECTORY>

This tells GATE to load GoldFish.jar and scan its contents looking for resource classes annotated with @CreoleResource.

Configuration Data

Configuration data for the resources may be stored directly in the creole.xml file, or it may be stored as Java annotations on the resource classes themselves; in either case GATE retrieves this configuration information and adds the resource definitions to the CREOLE register. When a user requests an instantiation of a resource, GATE creates an instance of the resource class in the virtual machine.
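
To illustrate the general mechanism of keeping configuration as annotations on a class, the sketch below defines its own runtime annotation and reads it back by reflection. The ResourceInfo annotation, the GoldFishStub class, and the resourceName helper are hypothetical stand-ins; this is not GATE's @CreoleResource and not how GATE itself scans jars.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class AnnotationConfigDemo {
    // Hypothetical stand-in for a resource annotation; NOT GATE's @CreoleResource.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ResourceInfo {
        String name();
        String comment() default "";
    }

    // A stub class carrying its own configuration as an annotation.
    @ResourceInfo(name = "GoldFish", comment = "Example processing resource")
    static class GoldFishStub { }

    // Read the metadata back, returning null when the class is not annotated.
    static String resourceName(Class<?> type) {
        ResourceInfo info = type.getAnnotation(ResourceInfo.class);
        return info == null ? null : info.name();
    }

    public static void main(String[] args) {
        System.out.println(resourceName(GoldFishStub.class));
    }
}
```

The annotation must use RUNTIME retention; otherwise the metadata is discarded at compile or class-load time and reflection cannot see it.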

To learn more about creole.xml, read this section of GATE's user guide. To learn more about Java annotations, read this section.

Comments:

Unknown said...

Thank you so much for the tutorial, it's so useful.
But i didn't well understand how did you get "GoldFish.jar" ???

Thank you for your help.
