Xml and More: Understanding Module Implementation in ROME

ROME is a set of Atom/RSS Java utilities that make it easy to work in Java with most syndication formats. It provides a Java-friendly abstraction layer on top of the various syndication specifications that maps the commonalities of the feed formats into a single, simple JavaBeans data model.

ROME is designed to be extensible. It uses a plugin mechanism as described here. All the supported feed types (the RSS versions and Atom) are handled by plugins.

The article How Rome works describes what happens during ROME Newsfeed parsing:

1. Your code calls SyndFeedInput to parse a Newsfeed, for example (see also Using Rome to read a syndication feed):

   SyndFeedInput input = new SyndFeedInput();
   SyndFeed feed = input.build(new XmlReader(feedUrl));

2. SyndFeedInput delegates to WireFeedInput to do the actual parsing.

3. WireFeedInput uses a PluginManager of class FeedParsers to pick the right parser for the feed and then calls that parser to parse the Newsfeed.

4. The selected parser parses the Newsfeed, using JDOM, into a WireFeed. If the Newsfeed is in an RSS format, the WireFeed is of class Channel and contains Items, Clouds, and other RSS things from the com.sun.syndication.feed.rss package. If the Newsfeed is in Atom format, the WireFeed is of class Feed from the com.sun.syndication.feed.atom package. In the end, WireFeedInput returns a WireFeed.

5. SyndFeedInput uses the returned WireFeed to create a SyndFeedImpl, which implements SyndFeed. SyndFeed is an interface, the root of an abstraction that represents a format-independent Newsfeed.

6. SyndFeedImpl uses a Converter to convert between the format-specific WireFeed representation and a format-independent SyndFeed.

7. SyndFeedInput returns to you a SyndFeed containing the parsed Newsfeed.
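The layering described above (a format-specific feed, then a converter chosen by feed type, then a format-independent feed) can be pictured with a small stand-alone sketch. None of the types below are ROME classes: WireFeedSketch, RssChannelSketch, AtomFeedSketch, SyndFeedSketch, ConverterSketch and PipelineSketch are hypothetical stand-ins, and the sketch assumes a conversion only has to copy a title.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a format-specific feed (the role WireFeed plays).
interface WireFeedSketch {
    String getFeedType();
}

// Stand-in for a parsed RSS feed.
class RssChannelSketch implements WireFeedSketch {
    final String title;
    RssChannelSketch(String title) { this.title = title; }
    public String getFeedType() { return "rss_2.0"; }
}

// Stand-in for a parsed Atom feed.
class AtomFeedSketch implements WireFeedSketch {
    final String title;
    AtomFeedSketch(String title) { this.title = title; }
    public String getFeedType() { return "atom_1.0"; }
}

// Stand-in for the format-independent feed handed back to the caller.
class SyndFeedSketch {
    String feedType;
    String title;
}

// Hypothetical converter plugin, keyed by the feed type it handles.
interface ConverterSketch {
    String getType();
    void copyInto(WireFeedSketch wire, SyndFeedSketch synd);
}

class PipelineSketch {
    private final Map<String, ConverterSketch> converters = new HashMap<>();

    void register(ConverterSketch converter) {
        converters.put(converter.getType(), converter);
    }

    // Picks the converter by feed type and fills a format-independent feed.
    SyndFeedSketch toSynd(WireFeedSketch wire) {
        ConverterSketch converter = converters.get(wire.getFeedType());
        if (converter == null) {
            throw new IllegalArgumentException("No converter for " + wire.getFeedType());
        }
        SyndFeedSketch synd = new SyndFeedSketch();
        synd.feedType = wire.getFeedType();
        converter.copyInto(wire, synd);
        return synd;
    }

    // Registers one converter per feed type used in this sketch.
    static PipelineSketch withDefaults() {
        PipelineSketch pipeline = new PipelineSketch();
        pipeline.register(new ConverterSketch() {
            public String getType() { return "rss_2.0"; }
            public void copyInto(WireFeedSketch wire, SyndFeedSketch synd) {
                synd.title = ((RssChannelSketch) wire).title;
            }
        });
        pipeline.register(new ConverterSketch() {
            public String getType() { return "atom_1.0"; }
            public void copyInto(WireFeedSketch wire, SyndFeedSketch synd) {
                synd.title = ((AtomFeedSketch) wire).title;
            }
        });
        return pipeline;
    }
}
```

The point of the design is that calling code only ever touches the format-independent type, while everything format-specific stays behind a lookup keyed by feed type.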

How the Extensibility Is Supported

Using parsing as an example, the key implementation is the FeedParsers class (a subclass of PluginManager). At runtime, the parsers that support the different feed types are identified and created on demand using the context ClassLoader of the current thread. Parser classes are defined in properties files (i.e., rome.properties) as below:

WireFeedParser.classes=com.sun.syndication.io.impl.RSS090Parser \
       com.sun.syndication.io.impl.RSS091NetscapeParser \
       com.sun.syndication.io.impl.RSS091UserlandParser \
       com.sun.syndication.io.impl.RSS092Parser \
       com.sun.syndication.io.impl.RSS093Parser \
       com.sun.syndication.io.impl.RSS094Parser \
       com.sun.syndication.io.impl.RSS10Parser  \
       com.sun.syndication.io.impl.RSS20wNSParser  \
       com.sun.syndication.io.impl.RSS20Parser  \
       com.sun.syndication.io.impl.Atom10Parser \
       com.sun.syndication.io.impl.Atom03Parser
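To make the mechanism concrete, here is a minimal sketch of how a property value like the one above could be turned into plugin instances. It is not ROME's PluginManager code: PluginLoaderSketch and the Demo.classes key are hypothetical, splitting the value on whitespace and commas is an assumption, and two JDK classes stand in for real parser classes so the example runs without ROME on the classpath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class PluginLoaderSketch {

    // Splits a property value such as "a.B c.D" into individual class names.
    static List<String> classNames(Properties props, String key) {
        List<String> names = new ArrayList<>();
        String value = props.getProperty(key, "");
        for (String token : value.trim().split("[\\s,]+")) {
            if (!token.isEmpty()) {
                names.add(token);
            }
        }
        return names;
    }

    // Loads each class through the thread's context ClassLoader and
    // creates an instance with its no-argument constructor.
    static List<Object> instantiate(List<String> names) throws ReflectiveOperationException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = PluginLoaderSketch.class.getClassLoader();
        }
        List<Object> plugins = new ArrayList<>();
        for (String name : names) {
            Class<?> type = Class.forName(name, true, loader);
            plugins.add(type.getDeclaredConstructor().newInstance());
        }
        return plugins;
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // The trailing backslash continues the value on the next line,
        // just as in the rome.properties listing.
        props.load(new StringReader(
                "Demo.classes=java.util.ArrayList \\\n"
                + "       java.lang.StringBuilder\n"));
        for (Object plugin : instantiate(classNames(props, "Demo.classes"))) {
            System.out.println(plugin.getClass().getName());
        }
    }
}
```

Because the classes are only named in a properties file, a new parser can be added without touching the code that does the loading.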

In step 3 described above, the WireFeedInput class picks the parser based on the default namespace declaration in the document (i.e., the XML feed). For example, the following document is an Atom 1.0 feed:


<feed xmlns="http://www.w3.org/2005/Atom">
...
</feed>

and WireFeedInput will choose com.sun.syndication.io.impl.Atom10Parser as its parser.
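The selection can be sketched as asking each registered parser whether the document is of its type. The code below is an illustration only: FeedParserSketch, NamespaceParserSketch and ParserSelectionSketch are hypothetical names, the JDK's DOM parser is used in place of JDOM, and only the two Atom namespaces are registered.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical parser plugin: reports its feed type and recognises its documents.
interface FeedParserSketch {
    String getType();
    boolean isMyType(Document doc);
}

// Recognises a feed by the local name and namespace of the root element.
class NamespaceParserSketch implements FeedParserSketch {
    private final String type;
    private final String rootName;
    private final String namespace;

    NamespaceParserSketch(String type, String rootName, String namespace) {
        this.type = type;
        this.rootName = rootName;
        this.namespace = namespace;
    }

    public String getType() { return type; }

    public boolean isMyType(Document doc) {
        Element root = doc.getDocumentElement();
        return rootName.equals(root.getLocalName())
                && namespace.equals(root.getNamespaceURI());
    }
}

class ParserSelectionSketch {
    static final List<FeedParserSketch> PARSERS = Arrays.<FeedParserSketch>asList(
            new NamespaceParserSketch("atom_1.0", "feed", "http://www.w3.org/2005/Atom"),
            new NamespaceParserSketch("atom_0.3", "feed", "http://purl.org/atom/ns#"));

    // Returns the type of the first parser that accepts the document, or null.
    static String detectType(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        for (FeedParserSketch parser : PARSERS) {
            if (parser.isMyType(doc)) {
                return parser.getType();
            }
        }
        return null;
    }
}
```

With this sketch, ParserSelectionSketch.detectType applied to the Atom 1.0 document above would return atom_1.0, and a document that no registered parser recognises would return null.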

Module

Modules are supported in RSS 1.0, RSS 2.0, Atom 0.3, and Atom 1.0. The primary objective of modules is to extend the basic XML schema established for more robust syndication of content. This inherently allows for more diverse, yet standardized, transactions without modifying the core syndication specification.

To establish this extension, a tightly controlled vocabulary for each module is declared through an XML namespace, which gives names to concepts and to the relationships between those concepts. For example, some RSS 2.0 modules with established namespaces are:

The extensibility of ROME also includes support for module plugins. There are two types of module plugins:

Module parser plugins
Module generator plugins

Both types of module plugins can be defined at feed and item (or entry) level.

Module Plugins

When a parser is instantiated, the modules for the same feed type are identified and created on demand using the context ClassLoader of the current thread. Module classes are also defined in properties files (i.e., rome.properties) as below:

atom_1.0.feed.ModuleParser.classes=com.sun.syndication.feed.module.georss.SimpleParser \
com.sun.syndication.feed.module.georss.W3CGeoParser
atom_1.0.item.ModuleParser.classes=com.sun.syndication.feed.module.georss.SimpleParser \
com.sun.syndication.feed.module.georss.W3CGeoParser

As shown above, two module parser plugins are specified:

SimpleParser
W3CGeoParser

for the Atom 1.0 feed type at both feed and item levels. Similarly, module generator plugins can be specified like this:


atom_1.0.feed.ModuleGenerator.classes=com.sun.syndication.feed.module.georss.SimpleGenerator \
com.sun.syndication.feed.module.georss.W3CGeoGenerator \
com.sun.syndication.feed.module.georss.GMLGenerator
atom_1.0.item.ModuleGenerator.classes=com.sun.syndication.feed.module.georss.SimpleGenerator \
com.sun.syndication.feed.module.georss.W3CGeoGenerator \
com.sun.syndication.feed.module.georss.GMLGenerator

To specify module parser or generator plugins for other feed types, just replace the type prefix (atom_1.0 in the examples above) with one of the other types:

atom_0.3
rss_1.0
rss_2.0
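The property names in the listings follow a simple pattern: feed type, then level, then plugin kind, then the suffix classes. A small helper makes that pattern explicit. ModuleKeySketch is a hypothetical name and not part of ROME; it only rebuilds the key strings shown in the listings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ModuleKeySketch {

    // Builds a key such as "rss_2.0.item.ModuleParser.classes".
    static String key(String feedType, String level, String kind) {
        return feedType + "." + level + "." + kind + ".classes";
    }

    // Lists the parser and generator keys for one feed type,
    // at both feed and item level.
    static List<String> keysFor(String feedType) {
        List<String> keys = new ArrayList<>();
        for (String level : Arrays.asList("feed", "item")) {
            for (String kind : Arrays.asList("ModuleParser", "ModuleGenerator")) {
                keys.add(key(feedType, level, kind));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        for (String k : keysFor("rss_2.0")) {
            System.out.println(k);
        }
    }
}
```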

In the above, we have used the GeoRSS modules as examples. With GeoRSS modules, users can quickly and easily add location information to their existing feeds in an interoperable manner, as shown in the example below:


<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:georss="http://www.georss.org/georss"
xmlns:gml="http://www.opengis.net/gml">
<title>Earthquakes</title>

<subtitle>International earthquake observation labs</subtitle>
<link href="http://example.org/"/>
<updated>2005-12-13T18:30:02Z</updated>
<author>
<name>Dr. Thaddeus Remor</name>
<email>tremor@quakelab.edu</email>
</author>
<id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
<entry>
<title>M 3.2, Mona Passage</title>
<link href="http://example.org/2005/09/09/atom01"/>
<id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
<updated>2005-08-17T07:02:32Z</updated>
<summary>We just had a big one.</summary>
<georss:where>
<gml:Point>
   <gml:pos>45.256 -71.92</gml:pos>
</gml:Point>
</georss:where>
</entry>
</feed>
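To show what a module parser has to extract from such an entry, here is a minimal sketch that reads the first gml:pos value with the JDK's namespace-aware DOM parser. It is not the GeoRSS module's own code: GeoRssPointSketch is a hypothetical name, and the sketch assumes the position is written as latitude followed by longitude, as in the feed above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GeoRssPointSketch {
    static final String GML_NS = "http://www.opengis.net/gml";

    // Returns {latitude, longitude} from the first gml:pos element,
    // or null when the document contains none.
    static double[] firstPosition(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match elements by namespace URI
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList positions = doc.getElementsByTagNameNS(GML_NS, "pos");
        if (positions.getLength() == 0) {
            return null;
        }
        String[] parts = positions.item(0).getTextContent().trim().split("\\s+");
        return new double[] {
            Double.parseDouble(parts[0]),
            Double.parseDouble(parts[1])
        };
    }
}
```

Matching on the namespace URI rather than on the gml prefix matters here, because a feed is free to bind the same namespace to a different prefix.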

Summary

The default plugins definition file, com/sun/syndication/rome.properties, is included in the ROME JAR file and is the first plugins definition file to be processed. It defines the default parsers, generators and converters for feeds and modules that ROME provides.

After loading the default plugins definition file, ROME looks for additional plugins definition files in all the CLASSPATH entries, this time at root level (/rome.properties), and appends their plugin definitions to the existing ones. If there are several /rome.properties files in different CLASSPATH entries, all of them are processed. The order of processing depends on how the ClassLoader processes the CLASSPATH entries, which is normally the order in which the entries appear in the CLASSPATH.

The plugin classes are then loaded and instantiated. All plugins have some kind of primary key. In the case of parsers, generators and converters, the primary key is the type of feed they handle. In the case of modules, the primary key is the module URI.
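The append behaviour can be sketched with plain java.util.Properties objects. This is an illustration under assumptions, not ROME's loading code: PluginDefinitionsSketch is a hypothetical name, values for the same key are joined with a single space, and loadFromClasspath uses the standard ClassLoader.getResources call to find every copy of a resource.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Properties;

class PluginDefinitionsSketch {

    // Loads every resource with the given name that the context
    // ClassLoader can see, in the order the ClassLoader reports them.
    static List<Properties> loadFromClasspath(String resourceName) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = PluginDefinitionsSketch.class.getClassLoader();
        }
        List<Properties> files = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources(resourceName);
        while (urls.hasMoreElements()) {
            try (InputStream in = urls.nextElement().openStream()) {
                Properties props = new Properties();
                props.load(in);
                files.add(props);
            }
        }
        return files;
    }

    // Merges the files in processing order; when a key appears in several
    // files, the later values are appended to the earlier ones.
    static Properties merge(List<Properties> filesInOrder) {
        Properties merged = new Properties();
        for (Properties file : filesInOrder) {
            for (String key : file.stringPropertyNames()) {
                String existing = merged.getProperty(key);
                String added = file.getProperty(key).trim();
                merged.setProperty(key, existing == null ? added : existing + " " + added);
            }
        }
        return merged;
    }
}
```

A caller would pass the default definitions first, followed by the result of loadFromClasspath("rome.properties"), so that additional definitions end up after the defaults.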

Friday, April 9, 2010
