org.greybird.xmliter
Class SaxIterator

java.lang.Object
  |
  +--org.greybird.xmliter.SaxIterator
All Implemented Interfaces:
XmlIterator

public class SaxIterator
extends java.lang.Object
implements XmlIterator

An XmlIterator that obtains its data from a SaxEventSource. SaxEventSource.generateEvents() is called to generate SAX events, which are then queued by this class and returned during iteration.

SaxIterator adds 20% to 30% processing overhead compared to direct use of the SAX API, depending on the event source: around 20% with a XercesSaxEventSource and around 30% with a ThreadedSaxEventSource. Compared to direct use of the DOM API, SaxIterator takes only 50% of the time. These percentages are relative to the total time to parse and traverse a document, and come from the PerformanceTest included with this package.

As events are iterated they are removed from the queue and more events are incrementally added. This allows iterating over large documents while keeping only a portion of the document in memory at one time. Therefore documents of any size may be processed.
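
The queueing behavior described above can be pictured with a bounded queue between a parsing thread and a consumer. The following is a minimal sketch using only JDK classes; it is not the implementation used by SaxIterator or ThreadedSaxEventSource, and the class BoundedEventQueueSketch and its methods are hypothetical names introduced for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of org.greybird.xmliter.
class BoundedEventQueueSketch {
    // Marker queued after the last event; '<' cannot occur in an element name.
    private static final String END = "<end>";

    // Parses xml on a background thread and returns the element names in
    // document order. At most `capacity` events are pending at any time.
    static List<String> elementNames(String xml, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            enqueue(queue, qName); // blocks while the queue is full
                        }
                    });
            } catch (Exception parseError) {
                // A real event source would report this error to the consumer.
            } finally {
                enqueue(queue, END);
            }
        });
        producer.setDaemon(true);
        producer.start();

        List<String> names = new ArrayList<>();
        try {
            // Each take() frees a slot, letting the parser deliver one more event.
            for (String event = queue.take(); !event.equals(END); event = queue.take()) {
                names.add(event);
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return names;
    }

    private static void enqueue(BlockingQueue<String> queue, String event) {
        try {
            queue.put(event);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
    }
}
```

Because the queue capacity is fixed, memory use in this sketch depends on the capacity rather than on the size of the document.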

Additionally, when advance() or advanceMixed() is called before the end tag (endElement event) of the prior element has been received, no further events for the prior element's children are queued. This allows a subtree to be skipped without spending resources on events that will never be read.
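
The effect of staying at one level while skipping subtrees can be approximated with plain SAX by tracking element depth. This is only a sketch under that assumption: a plain SAX handler still receives every event and merely ignores the nested ones, whereas SaxIterator is described above as not queueing them at all. The class CurrentLevelSketch is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of org.greybird.xmliter.
class CurrentLevelSketch {
    // Returns the names of the root element's direct children, in document
    // order, ignoring every deeper descendant.
    static List<String> childNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // number of elements currently open

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                depth++;
                if (depth == 2) {
                    names.add(qName); // a child of the root element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return names;
    }
}
```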

There are two ways to use a SaxIterator. The first way is to use the SaxIterator(InputSource) constructor, and let SaxIterator choose a default SAX event source class and parser. A XercesSaxEventSource is used if the Xerces Native Interface parser is available, and a ThreadedSaxEventSource is used otherwise. This technique is easy to use but does not allow any control over the parser.

 import org.greybird.xmliter.SaxIterator;
 import org.greybird.xmliter.XmlIterator;
 import org.xml.sax.InputSource;
 ...
 //
 // Create a SaxIterator from a SAX InputSource.
 //
 InputSource input = new InputSource(...);
 XmlIterator iter = new SaxIterator(input);
 while (iter.advance()) {
     ...
 }
 

The second way is to use the SaxIterator(SaxEventSource) constructor, and explicitly specify the event source you wish to use. The event source may be a XercesSaxEventSource, a ThreadedSaxEventSource, or any other implementation of the SaxEventSource interface. The following example shows the use of a ThreadedSaxEventSource with a SAX parser explicitly created using JAXP.

 import javax.xml.parsers.SAXParserFactory;
 import org.greybird.xmliter.SaxEventSource;
 import org.greybird.xmliter.SaxIterator;
 import org.greybird.xmliter.XmlIterator;
 import org.greybird.xmliter.ThreadedSaxEventSource;
 import org.xml.sax.InputSource;
 import org.xml.sax.XMLReader;
 ...
 //
 // Create a ThreadedSaxEventSource from a SAX InputSource and XMLReader
 //
 InputSource input = new InputSource(...);
 XMLReader parser =
     SAXParserFactory.newInstance().newSAXParser().getXMLReader();
 SaxEventSource eventSource = new ThreadedSaxEventSource(input, parser);
 //
 // Create a SaxIterator from a ThreadedSaxEventSource
 //
 XmlIterator iter = new SaxIterator(eventSource);
 while (iter.advance()) {
     ...
 }
 


Constructor Summary
SaxIterator(org.xml.sax.InputSource inputSource)
          Creates a SAX iterator for parsing a given input source using a default SAX parser.
SaxIterator(SaxEventSource eventSource)
          Creates a SAX iterator from a given SAX event source.
 
Method Summary
 boolean advance()
          Moves to the next element at the current level and returns true, or returns false if there are no more elements.
 boolean advanceMixed()
          Moves to the next element or text item at the current level and returns true, or returns false if there are no more items.
 java.lang.String attribute(java.lang.String namespace, java.lang.String name)
          Returns the value of the attribute with the given name belonging to the current element, or null if no such attribute exists or if positioned on a mixed text item.
 XmlAttributes attributes()
          Returns an iterator over the attributes belonging to the current element, or null if positioned on a mixed text item.
 XmlIterator children()
          Returns an iterator positioned before the first child of the current element, or null if the current element does not contain any element children or if positioned on a mixed text item.
 java.lang.String name()
          Returns the local name of the current element, or null if positioned on a mixed text item.
 java.lang.String namespace()
          Returns the namespace URI of the current element, or an empty string if the current element has no namespace, or null if positioned on a mixed text item.
 java.lang.String toString()
          Returns a string for error reporting that identifies the current element.
 java.lang.String value()
          Returns the text contained by the current element or text item, or null if positioned on an element that contains one or more element children.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SaxIterator

public SaxIterator(org.xml.sax.InputSource inputSource)
            throws java.io.IOException,
                   org.xml.sax.SAXException,
                   org.apache.xerces.xni.XNIException,
                   javax.xml.parsers.ParserConfigurationException
Creates a SAX iterator for parsing a given input source using a default SAX parser.

If XercesSaxEventSource.isAvailable() returns true, then the XercesSaxEventSource(InputSource) constructor is used; otherwise, the ThreadedSaxEventSource(InputSource) constructor is used. This behavior may change in a future version, but a default, standard parser configuration will always be used.

Parameters:
inputSource - is the input document to be parsed.

SaxIterator

public SaxIterator(SaxEventSource eventSource)
Creates a SAX iterator from a given SAX event source.
Parameters:
eventSource - is a SAX event source.

Method Detail

advance

public boolean advance()
                throws java.lang.IllegalStateException,
                       java.io.IOException,
                       XmlIteratorException
Description copied from interface: XmlIterator
Moves to the next element at the current level and returns true, or returns false if there are no more elements.
Specified by:
advance in interface XmlIterator
Following copied from interface: org.greybird.xmliter.XmlIterator
Throws:
java.lang.IllegalStateException - if an element would be returned out of depth-first tree order.
java.io.IOException - if an error occurs retrieving input data.
XmlIteratorException - if an error occurs processing input data.

advanceMixed

public boolean advanceMixed()
                     throws java.lang.IllegalStateException,
                            java.io.IOException,
                            XmlIteratorException
Description copied from interface: XmlIterator
Moves to the next element or text item at the current level and returns true, or returns false if there are no more items.
Specified by:
advanceMixed in interface XmlIterator
Following copied from interface: org.greybird.xmliter.XmlIterator
Throws:
java.lang.IllegalStateException - if an element or text item would be returned out of depth-first tree order.
java.io.IOException - if an error occurs retrieving input data.
XmlIteratorException - if an error occurs processing input data.

name

public java.lang.String name()
                      throws java.lang.IllegalStateException
Description copied from interface: XmlIterator
Returns the local name of the current element, or null if positioned on a mixed text item. An empty string is never returned, and null is never returned if the current position was reached by calling advance().
Specified by:
name in interface XmlIterator
Following copied from interface: org.greybird.xmliter.XmlIterator
Throws:
java.lang.IllegalStateException - if advance() or advanceMixed() has not yet been called.

namespace

public java.lang.String namespace()
                           throws java.lang.IllegalStateException
Description copied from interface: XmlIterator
Returns the namespace URI of the current element, or an empty string if the current element has no namespace, or null if positioned on a mixed text item.
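
The empty-string convention matches what a namespace-aware SAX parser reports for an element that has no namespace. The sketch below shows this with the JDK parser only; whether SaxIterator passes the SAX value through unchanged is an assumption, and the class NamespaceUriSketch is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of org.greybird.xmliter.
class NamespaceUriSketch {
    // Returns "{namespaceURI}localName" for every element, in document order.
    static List<String> qualifiedNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // report URIs and local names
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        // uri is "" (not null) when the element has no namespace
                        names.add("{" + uri + "}" + localName);
                    }
                });
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return names;
    }
}
```
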
Specified by:
namespace in interface XmlIterator
Following copied from interface: org.greybird.xmliter.XmlIterator
Throws:
java.lang.IllegalStateException - if advance() or advanceMixed() has not yet been called.

attribute

public java.lang.String attribute(java.lang.String namespace,
                                  java.lang.String name)
                           throws java.lang.IllegalStateException,
                                  java.io.IOException,
                                  XmlIteratorException
Description copied from interface: XmlIterator
Returns the value of the attribute with the given name belonging to the current element, or null if no such attribute exists or if positioned on a mixed text item. Namespace nodes are not attributes and are not returned by this method.

WARNING: Unlike the DOM methods Element.getAttribute() and Element.getAttributeNS(), this method returns null, not an empty string, when the attribute does not exist.
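
The DOM side of this difference can be observed with the JDK's own DOM implementation. The sketch below is only an illustration: the class DomMissingAttributeSketch and the helper attributeOrNull are hypothetical and not part of this package. It shows that Element.getAttribute() yields an empty string for a missing attribute, and one way a DOM caller could obtain the null convention instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of org.greybird.xmliter.
class DomMissingAttributeSketch {
    // Parses xml with the default DOM parser and returns the root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    // Returns null for a missing attribute, so that a missing attribute can be
    // told apart from one whose value is the empty string.
    static String attributeOrNull(Element element, String name) {
        return element.hasAttribute(name) ? element.getAttribute(name) : null;
    }
}
```

With DOM alone, a missing attribute and an attribute with an empty value both read back as an empty string unless hasAttribute() is checked first.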

Specified by:
attribute in interface XmlIterator
Following copied from interface: org.greybird.xmliter.XmlIterator
Parameters:
namespace - is the namespace URI of the attribute, or null or an empty string if the attribute has no namespace.
name - is the local name of the attribute.
Throws:
java.lang.IllegalStateException - if advance() or advanceMixed() has not yet been called.
java.io.IOException - if an error occurs retrieving input data.
XmlIteratorException - if an error occurs processing input data.

attributes

public XmlAttributes attributes()
                         throws java.lang.IllegalStateException,
                                java.io.IOException,
                                XmlIteratorException
Description copied from interface: XmlIterator
Returns an iterator over the attributes belonging to the current element, or null if positioned on a mixed text item. Namespace nodes are not attributes and are not returned by the iterator.
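
The exclusion of namespace declarations mirrors the default behavior of a namespace-aware JAXP SAX parser, which reports xmlns declarations separately from ordinary attributes. The sketch below demonstrates that behavior with JDK classes only; it does not use XmlAttributes, and the class AttributeListingSketch is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of org.greybird.xmliter.
class AttributeListingSketch {
    // Returns "{namespaceURI}localName=value" for each attribute of every
    // element in the document.
    static List<String> attributes(String xml) {
        List<String> result = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        for (int i = 0; i < atts.getLength(); i++) {
                            result.add("{" + atts.getURI(i) + "}"
                                + atts.getLocalName(i) + "=" + atts.getValue(i));
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return result;
    }
}
```
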
Specified by:
attributes in interface XmlIterator
Following copied from interface: org.greybird.xmliter.XmlIterator
Throws:
java.lang.IllegalStateException - if advance() or advanceMixed() has not yet been called.
java.io.IOException - if an error occurs retrieving input data.
XmlIteratorException - if an error occurs processing input data.

value

public java.lang.String value()
                       throws java.lang.IllegalStateException,
                              java.io.IOException,
                              XmlIteratorException
Description copied from interface: XmlIterator
Returns the text contained by the current element or text item, or null if positioned on an element that contains one or more element children.

This method always returns the complete text between elements, which may be the result of concatenating multiple SAX character events or DOM text nodes.
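
The need for concatenation comes from SAX itself: a parser may deliver the text of one element in several characters() calls, for example around CDATA sections or entity references. The following sketch shows the usual accumulation technique with JDK classes only. It is not the code used by SaxIterator, the class TextAccumulatorSketch is a hypothetical name, and it assumes a document whose root element contains only text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of org.greybird.xmliter.
class TextAccumulatorSketch {
    // Returns the complete text of a root element that has no element children.
    static String rootText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // May be called several times for one run of text.
                        text.append(ch, start, length);
                    }
                });
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return text.toString();
    }
}
```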

Specified by:
value in interface XmlIterator
Following copied from interface: org.greybird.xmliter.XmlIterator
Throws:
java.lang.IllegalStateException - if advance() or advanceMixed() has not yet been called.
java.io.IOException - if an error occurs retrieving input data.
XmlIteratorException - if an error occurs processing input data.

children

public XmlIterator children()
                     throws java.lang.IllegalStateException,
                            java.io.IOException,
                            XmlIteratorException
Description copied from interface: XmlIterator
Returns an iterator positioned before the first child of the current element, or null if the current element does not contain any element children or if positioned on a mixed text item. If a non-null iterator is returned, it will always contain at least one element.

If this method is called more than once for the same element, it returns the same iterator instance each time. The returned iterator may therefore no longer be positioned before its first child.

Specified by:
children in interface XmlIterator
Following copied from interface: org.greybird.xmliter.XmlIterator
Throws:
java.lang.IllegalStateException - if advance() or advanceMixed() has not yet been called.
java.io.IOException - if an error occurs retrieving input data.
XmlIteratorException - if an error occurs processing input data.

toString

public java.lang.String toString()
Description copied from interface: XmlIterator
Returns a string for error reporting that identifies the current element. The returned value should be in square brackets and contain an XPath identifier or an approximation of one for the current element. For example:
[/ns:TopElement/ns:ChildElement]
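
A string in this format can be assembled from the qualified names of the elements on the path from the root to the current element. The helper below is a hypothetical sketch of that formatting only; how SaxIterator tracks the path is not specified here.

```java
import java.util.List;

// Hypothetical illustration only; not part of org.greybird.xmliter.
class PathLabelSketch {
    // Joins the names from the root down to the current element into an
    // XPath-like identifier enclosed in square brackets.
    static String label(List<String> qualifiedNames) {
        return "[/" + String.join("/", qualifiedNames) + "]";
    }
}
```
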
Specified by:
toString in interface XmlIterator
Overrides:
toString in class java.lang.Object

Copyright (c) 2003 Mark T. Hayes; All Rights Reserved