Non-programmers can use GATE Developer for all of their NLP tasks. Programmers can use GATE Embedded to embed its language processing functionality in their applications. In this article, we will demonstrate how to use GATE in a standalone application named GoldFish.
Gold Fish Example
Andrew Golightly noticed that many programmers have problems running GATE outside of the GUI (i.e., as a standalone program).
So, he has written a sample program which essentially implements the "Goldfish" example in the User Guide for GATE (see http://www.gate.ac.uk/sale/tao/index.html#x1-220002.7).
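This program counts the number of times the word "Goldfish" appears in a sentence. It uses three Processing Resources (PRs) to achieve that: a tokeniser, a sentence splitter, and the GoldFish PR itself (the three resources loaded in the console output at the end of this article).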
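To make the goal concrete, here is a minimal plain-Java sketch of per-sentence "Goldfish" counting. It does not use GATE at all: the JDK's BreakIterator stands in for the sentence splitter and a regular expression stands in for the tokeniser. The class name GoldfishCountSketch and the whole-word, case-insensitive matching rule are assumptions made for illustration, not taken from the original example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration only; the real example relies on GATE's
// tokeniser and sentence splitter rather than on this code.
class GoldfishCountSketch {

    // Assumption: count whole-word matches, ignoring case.
    private static final Pattern GOLDFISH =
            Pattern.compile("\\bgoldfish\\b", Pattern.CASE_INSENSITIVE);

    /** Splits text into sentences with the JDK's locale-aware BreakIterator. */
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    /** Counts occurrences of the word "goldfish" in one sentence. */
    static int countGoldfish(String sentence) {
        Matcher m = GOLDFISH.matcher(sentence);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "I have a goldfish. My Goldfish likes other goldfish!";
        for (String sentence : splitSentences(text)) {
            System.out.println(countGoldfish(sentence) + " : " + sentence);
        }
    }
}
```

In the GATE version, the counts are stored as features on the document instead of being printed per sentence, as the console output at the end of this article shows.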
BootStrap Wizard
To create a new PR you need to:
- Write a Java class (i.e., GoldFish.java) that implements GATE’s beans model
- Compile the class, and any others that it uses, into a Java Archive (JAR) file
- Write some XML configuration data (i.e., creole.xml) for the new resource
- Tell GATE the URL of the new JAR and XML files.
The BootStrap Wizard in GATE Developer generates a skeleton for these files. To use it:
- Start up GATE Developer
- Start up BootStrap Wizard (Tools > BootStrap Wizard)
- Fill in the information as shown below
The wizard creates a project skeleton that contains:
- src: put all your Java sources in here.
- A folder for any external files used by your plugin (e.g. configuration files, JAPE grammars, gazetteer lists, etc.).
- build.xml: Ant build file for building your plugin.
- build.properties: property definitions that control the build process go in here. In particular, make sure that gate.home points to your copy of GATE.
- creole.xml: plugin configuration file for GATE. Edit this to add parameters, etc., for your resources.
For my environment, I need to update the gate.home property in build.properties to point to my new GATE installation location.
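One quick way to check this setting before building is sketched below in plain Java. It reads a properties file, extracts gate.home, and reports whether the value is an existing directory. The class name GateHomeCheck and the default file name passed to main are assumptions for illustration; only the gate.home key and build.properties come from the article.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of GATE or of the generated project.
class GateHomeCheck {

    /** Returns the gate.home value from properties text, or null if it is absent. */
    static String parseGateHome(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("gate.home");
    }

    public static void main(String[] args) throws IOException {
        // Defaults to build.properties in the current directory.
        Path file = Paths.get(args.length > 0 ? args[0] : "build.properties");
        if (!Files.isRegularFile(file)) {
            System.out.println("No properties file found at " + file);
            return;
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String gateHome = parseGateHome(text);
        if (gateHome == null) {
            System.out.println("gate.home is not set in " + file);
        } else if (Files.isDirectory(Paths.get(gateHome))) {
            System.out.println("gate.home points to an existing directory: " + gateHome);
        } else {
            System.out.println("gate.home does not point to a directory: " + gateHome);
        }
    }
}
```

Note that java.util.Properties treats a backslash as an escape character, so a Windows path in a properties file is safer written with forward slashes or doubled backslashes.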
Next, we use the Eclipse IDE for project development. Proceed as follows:
- Start up Eclipse
- Bring up the New Project dialog (File > New > Project)
- Select the "Java Project from Existing Ant Buildfile" wizard
- Specify GoldFish as your project name
- Select GoldFish/build.xml as your Ant buildfile
- Click Finish
A new project named GoldFish is created.
Under the src folder, there is a file named GoldFish.java in a package named sheffield.creole.example, which was created by the BootStrap Wizard. Now, copy the two example files from the GATE example code repository and put them in src/sheffield/creole/example. Note that you need to fix up the package name (i.e., from andrewgolightly.nlp.gate to sheffield.creole.example) and the class name (from Goldfish to GoldFish).
Before we proceed, we need to add two statements below Gate.init() in TotalGoldfishCount.java:
// need resource data for GoldFish
// need ANNIE plugin for the DefaultTokeniser and SentenceSplitter
new File(Gate.getPluginsHome(), ANNIEConstants.PLUGIN_DIR).toURL()
Without these fixes, you'll see the following ResourceInstantiationException:
gate.creole.ResourceInstantiationException: Couldn't get ...
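The plugin location above is built by converting a File to a URL with File.toURL(). That method is deprecated in the JDK because it does not escape characters that are illegal in URLs, such as spaces. The sketch below shows the alternative recommended by the JDK documentation, going through toURI() first. It uses only the standard library: the class name PluginUrlSketch and the directory names are hypothetical, and the string "ANNIE" is an assumed stand-in for ANNIEConstants.PLUGIN_DIR.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Standard-library sketch; it does not call any GATE API.
class PluginUrlSketch {

    /** Builds a directory URL via toURI(), which escapes characters such as spaces. */
    static URL pluginDirUrl(File pluginsHome, String pluginDirName) {
        try {
            return new File(pluginsHome, pluginDirName).toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot convert to URL: " + pluginDirName, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical location; in the article this value comes from Gate.getPluginsHome().
        File pluginsHome = new File("gate home", "plugins");
        // "ANNIE" stands in for ANNIEConstants.PLUGIN_DIR (assumed value).
        System.out.println(pluginDirUrl(pluginsHome, "ANNIE"));
    }
}
```

With a space in the path, the resulting URL contains %20 instead of a raw space, which is the difference from the deprecated call.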
Next, edit the Run/Debug Settings in the project properties to add a single program argument, the path to the input text file:
This is our input document to be processed by the GATE pipeline. You can copy it from here.
Now we need to compile the class and package it into a JAR file. The BootStrap Wizard creates an Ant build file that makes this very easy: so long as you have Ant set up properly, you can simply run Ant from the command line. This will compile the Java source code and package the resulting classes into GoldFish.jar.
Finally, you can run TotalGoldfishCount by right-clicking it in Package Explorer (TotalGoldfishCount.java > Run As > Java Application). If everything is set up correctly, you should see the following output in the console:
== OBTAINING DOCUMENTS ==
1) testFile.txt -- success
== USING GATE TO PROCESS THE DOCUMENTS ==
* Loading gate.creole.tokeniser.DefaultTokeniser ... done
* Loading gate.creole.splitter.SentenceSplitter ... done
* Loading sheffield.creole.example.GoldFish ... done
Creating corpus from documents obtained...done
Running processing resources over corpus...done
== DOCUMENT FEATURES ==
The features of document "/D:/Gate/GoldFishExample/GoldFish/testFile.txt" are:
*) Number of tokens --> 56
*) Total "Goldfish" count --> 9
*) Number of words --> 46
*) Number of characters --> 322
*) Number of sentences --> 7
Demo done... :)