org.greybird.xmliter
Interface XmlIterator

All Known Implementing Classes:
SaxIterator, DomIterator

public interface XmlIterator

An iterator for traversing the elements and text of an XML document.

Rather than a callback model like SAX, XmlIterator uses an iterator object that is moved forward through the items in an XML document. Because it is designed to work efficiently with an underlying SAX event stream, the iterator does not allow moving backward in the document.

An iterator is always in one of the following states.

  1. Before advance() or advanceMixed() has been called for the first time, the iterator is positioned before the first element. In this state all other methods will throw IllegalStateException.
  2. When positioned at an element containing one or more element children, children() will return an iterator over the element's children and value() will return null.
  3. When positioned at an element containing text but no element children, value() will return the contained text and children() will return null.
  4. When positioned at an empty element, value() will return an empty string and children() will return null.

(Note that advanceMixed() may be used instead of advance(), in which case the rules differ slightly, as explained below.)
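
The three positioned states listed above can be illustrated with the standard JDK DOM API. The following is only a sketch under stated assumptions: the class ElementStates and its helper methods are hypothetical names that are not part of org.greybird.xmliter, and the code does not show how SaxIterator or DomIterator are implemented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; it is not part of org.greybird.xmliter.
class ElementStates {

    // Parses a namespace-aware DOM document from a string.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // State 2 applies when at least one child node is an element.
    static boolean hasElementChildren(Element element) {
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the documented value() rule: null in state 2, the contained
    // text in state 3, and an empty string in state 4.
    static String valueOf(Element element) {
        return hasElementChildren(element) ? null : element.getTextContent();
    }

    public static void main(String[] args) {
        Element top = parse("<Top><One><Two/></One>"
                + "<Three>some text...</Three><Four/></Top>").getDocumentElement();
        // Top has no whitespace between tags, so every child is an element.
        for (Node n = top.getFirstChild(); n != null; n = n.getNextSibling()) {
            Element child = (Element) n;
            System.out.println(child.getTagName() + ": children="
                    + hasElementChildren(child) + ", value=" + valueOf(child));
        }
    }
}
```

The helper returns null rather than concatenated descendant text when element children exist, which matches the rule that children() and value() are never both non-null for the same element.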

The following method iterates recursively over all elements in a document.

 import java.io.IOException;
 import org.greybird.xmliter.XmlIterator;
 import org.greybird.xmliter.XmlIteratorException;
 ...
 static void traverse(XmlIterator iter, String indent)
     throws IOException, XmlIteratorException
 {
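     while (iter.advance()) {
         // Print the local name and namespace URI of the current element.
         System.out.println(indent + "- " + iter.name() +
                            "  (" + iter.namespace() + ')');
         XmlIterator children = iter.children();
         if (children != null) {
             // The element has element children: recurse one level deeper.
             traverse(children, indent + "   ");
         } else {
             // No element children: value() is the text, or "" if empty.
             String value = iter.value();
             if (value.length() > 0) {
                 System.out.println(indent + "   = " + value);
             }
         }
     }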
 }
 

Given the following XML document, traverse() will output the text below it.

 <Top xmlns="http://namespaceURI/">
   <One>
     <Two/>
   </One>
   <Three>some text...</Three>
   <Four/>
 </Top>
 ------------------------------------
 - Top  (http://namespaceURI/)
    - One  (http://namespaceURI/)
       - Two  (http://namespaceURI/)
    - Three  (http://namespaceURI/)
       = some text...
    - Four  (http://namespaceURI/)
 

Notice that the whitespace between element tags is not returned by the iterator. This is because the advance() method returns the next element and skips text that is mixed with elements at the same level of the document. This is convenient for structured XML documents where such mixing only occurs with whitespace that is irrelevant.

Mixed text and elements at the same level of the document are called "mixed content", and are used with unstructured XML documents such as XHTML (XML-ized HTML). Mixed content is supported by calling the advanceMixed() method, which moves to the next element or text item.
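
The difference between the items visited by advance() and by advanceMixed() can be sketched with the JDK DOM API by filtering the children of a mixed-content element. This is an illustration only: MixedContentSketch and its methods are hypothetical names that are not part of this package, and the sketch assumes that the text items reported by advanceMixed() correspond to normalized DOM text nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; it is not part of org.greybird.xmliter.
class MixedContentSketch {

    // Parses the XML and returns its root element with adjacent text merged.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            root.normalize();
            return root;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Lists the children of parent in document order. Elements appear as
    // "<name>"; text runs appear verbatim, and only when includeText is true.
    static List<String> items(Element parent, boolean includeText) {
        List<String> result = new ArrayList<String>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add("<" + n.getLocalName() + ">");
            } else if (includeText && n.getNodeType() == Node.TEXT_NODE) {
                result.add(n.getNodeValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element p = parseRoot("<p>This is a <b>bold</b> word.</p>");
        System.out.println(items(p, false)); // elements only, like advance()
        System.out.println(items(p, true));  // elements and text, like advanceMixed()
    }
}
```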

The following method iterates recursively over all elements and mixed text items in a document.

 import java.io.IOException;
 import org.greybird.xmliter.XmlIterator;
 import org.greybird.xmliter.XmlIteratorException;
 ...
 static void traverseMixed(XmlIterator iter, String indent)
     throws IOException, XmlIteratorException
 {
     while (iter.advanceMixed()) {
         if (iter.name() != null) {
             // Positioned on an element: handle it as traverse() does.
             System.out.println(indent + "- " + iter.name() +
                                "  (" + iter.namespace() + ')');
             XmlIterator children = iter.children();
             if (children != null) {
                 traverseMixed(children, indent + "   ");
             } else {
                 String value = iter.value();
                 if (value.length() > 0) {
                     System.out.println(indent + "   = " + value);
                 }
             }
         } else {
             // Positioned on a mixed text item: name() is null.
             System.out.println(indent + "= " + iter.value());
         }
     }
 }
 

Given the following XHTML document, traverseMixed() will output the text below it.

 <html xmlns="http://www.w3.org/1999/xhtml">
 <body>
 <p>This is a <b>bold</b> word.</p>
 </body>
 </html>
 ------------------------------------
 - html  (http://www.w3.org/1999/xhtml)
    = 
 
    - body  (http://www.w3.org/1999/xhtml)
       = 
 
       - p  (http://www.w3.org/1999/xhtml)
          = This is a 
          - b  (http://www.w3.org/1999/xhtml)
             = bold
          =  word.
       = 
 
    = 
 

If a document does not have a mixed content model, it is preferable to use advance() instead of advanceMixed() to avoid dealing with irrelevant whitespace. After advance() is called, name() is guaranteed to return a non-null element name.

But if you do need to use advanceMixed() there are a couple of things to keep in mind.

The following output from running traverse() with the XHTML input above shows that mixed text is ignored by the advance() method. Notice that "This is a" and "word" are not listed because they are mixed with elements, while "bold" is listed because no elements appear at the same level.

 - html  (http://www.w3.org/1999/xhtml)
    - body  (http://www.w3.org/1999/xhtml)
       - p  (http://www.w3.org/1999/xhtml)
          - b  (http://www.w3.org/1999/xhtml)
             = bold
 

Author:
Mark Hayes

Method Summary
 boolean advance()
          Moves to the next element at the current level and returns true, or returns false if there are no more elements.
 boolean advanceMixed()
          Moves to the next element or text item at the current level and returns true, or returns false if there are no more items.
 java.lang.String attribute(java.lang.String namespace, java.lang.String name)
          Returns the value of the attribute with the given name belonging to the current element, or null if no such attribute exists or if positioned on a mixed text item.
 XmlAttributes attributes()
          Returns an iterator over the attributes belonging to the current element, or null if positioned on a mixed text item.
 XmlIterator children()
          Returns an iterator positioned before the first child of the current element, or null if the current element contains no element children or if positioned on a mixed text item.
 java.lang.String name()
          Returns the local name of the current element, or null if positioned on a mixed text item.
 java.lang.String namespace()
          Returns the namespace URI of the current element, or an empty string if the current element has no namespace, or null if positioned on a mixed text item.
 java.lang.String toString()
          Returns a string for error reporting that identifies the current element.
 java.lang.String value()
          Returns the text contained by the current element or text item, or null if positioned on an element that contains one or more element children.
 

Method Detail

advance

public boolean advance()
                throws java.lang.IllegalStateException,
                       java.io.IOException,
                       XmlIteratorException
Moves to the next element at the current level and returns true, or returns false if there are no more elements.
Throws:
java.lang.IllegalStateException - if an element would be returned out of depth-first tree order.
java.io.IOException - if an error occurs retrieving input data.
XmlIteratorException - if an error occurs processing input data.

advanceMixed

public boolean advanceMixed()
                     throws java.lang.IllegalStateException,
                            java.io.IOException,
                            XmlIteratorException
Moves to the next element or text item at the current level and returns true, or returns false if there are no more items.
Throws:
java.lang.IllegalStateException - if an element or text item would be returned out of depth-first tree order.
java.io.IOException - if an error occurs retrieving input data.
XmlIteratorException - if an error occurs processing input data.

name

public java.lang.String name()
                      throws java.lang.IllegalStateException
Returns the local name of the current element, or null if positioned on a mixed text item. An empty string is never returned, and null is never returned after a call to advance().
Throws:
java.lang.IllegalStateException - if advance() or advanceMixed() has not yet been called.

namespace

public java.lang.String namespace()
                           throws java.lang.IllegalStateException
Returns the namespace URI of the current element, or an empty string if the current element has no namespace, or null if positioned on a mixed text item.
Throws:
java.lang.IllegalStateException - if advance() or advanceMixed() has not yet been called.

value

public java.lang.String value()
                       throws java.lang.IllegalStateException,
                              java.io.IOException,
                              XmlIteratorException
Returns the text contained by the current element or text item, or null if positioned on an element that contains one or more element children.

This method always returns the complete text between elements, which may be the result of concatenating multiple SAX character events or DOM text nodes.
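
The concatenation described above can be sketched with the JDK SAX API, where a parser is free to deliver the text of one element in several characters() callbacks. This is a minimal illustration, not the SaxIterator implementation: TextAccumulator is a hypothetical class, and it assumes a document consisting of a single text-only element.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; it is not part of org.greybird.xmliter.
class TextAccumulator extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();

    // May be called several times for one run of text, for example around
    // entity references and CDATA sections, so each chunk is appended.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Parses a document holding one text-only element and returns its
    // complete text, however many character events the parser produced.
    static String completeText(String xml) {
        TextAccumulator handler = new TextAccumulator();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(
                completeText("<Three>some &amp; more<![CDATA[ text...]]></Three>"));
    }
}
```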

Throws:
java.lang.IllegalStateException - if advance() or advanceMixed() has not yet been called.
java.io.IOException - if an error occurs retrieving input data.
XmlIteratorException - if an error occurs processing input data.

children

public XmlIterator children()
                     throws java.lang.IllegalStateException,
                            java.io.IOException,
                            XmlIteratorException
Returns an iterator positioned before the first child of the current element, or null if the current element contains no element children or if positioned on a mixed text item. If a non-null iterator is returned, it will always contain at least one element.

If this method is called for an element more than once it will return the same iterator instance as was returned the first time it was called, and therefore the returned iterator may no longer be positioned before its first child.

Throws:
java.lang.IllegalStateException - if advance() or advanceMixed() has not yet been called.
java.io.IOException - if an error occurs retrieving input data.
XmlIteratorException - if an error occurs processing input data.

attribute

public java.lang.String attribute(java.lang.String namespace,
                                  java.lang.String name)
                           throws java.lang.IllegalStateException,
                                  java.io.IOException,
                                  XmlIteratorException
Returns the value of the attribute with the given name belonging to the current element, or null if no such attribute exists or if positioned on a mixed text item. Namespace nodes are not attributes and are not returned by this method.

WARNING: Unlike the DOM methods Element.getAttribute() and Element.getAttributeNS(), this method returns null, not an empty string, when the attribute does not exist.
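
The difference from DOM noted in the warning above can be seen with the JDK DOM API alone. In the sketch below, attributeOrNull is a hypothetical adapter written for illustration; it is not the implementation used by DomIterator. It assumes that a null or empty namespace argument both mean "no namespace", as the parameter description for this method states.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration class; it is not part of org.greybird.xmliter.
class MissingAttributeSketch {

    // Parses the XML with namespace awareness and returns its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Returns the attribute value, or null when the attribute is absent.
    // A null or empty namespace selects attributes with no namespace.
    static String attributeOrNull(Element element, String namespace, String name) {
        String ns = (namespace == null || namespace.isEmpty()) ? null : namespace;
        return element.hasAttributeNS(ns, name)
                ? element.getAttributeNS(ns, name)
                : null;
    }

    public static void main(String[] args) {
        Element item = parseRoot("<item id=\"7\" note=\"\"/>");
        // DOM cannot tell a missing attribute from an empty one by value.
        System.out.println("[" + item.getAttribute("missing") + "]");
        System.out.println("[" + item.getAttribute("note") + "]");
        // The null-returning form keeps the two cases distinct.
        System.out.println(attributeOrNull(item, null, "missing"));
        System.out.println("[" + attributeOrNull(item, "", "note") + "]");
    }
}
```

Returning null lets a caller distinguish an absent attribute from one whose value is an empty string without a separate existence check.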

Parameters:
namespace - the namespace URI of the attribute; may be null or an empty string if the attribute has no namespace.
name - the local name of the attribute.
Throws:
java.lang.IllegalStateException - if advance() or advanceMixed() has not yet been called.
java.io.IOException - if an error occurs retrieving input data.
XmlIteratorException - if an error occurs processing input data.

attributes

public XmlAttributes attributes()
                         throws java.lang.IllegalStateException,
                                java.io.IOException,
                                XmlIteratorException
Returns an iterator over the attributes belonging to the current element, or null if positioned on a mixed text item. Namespace nodes are not attributes and are not returned by the iterator.
Throws:
java.lang.IllegalStateException - if advance() or advanceMixed() has not yet been called.
java.io.IOException - if an error occurs retrieving input data.
XmlIteratorException - if an error occurs processing input data.
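
The rule for attributes() that namespace nodes are not attributes can be illustrated with the JDK DOM API, where namespace declarations do appear in an element's attribute map and must be filtered out. The following is a sketch under stated assumptions: RealAttributesSketch and its methods are hypothetical names, and the filter shown is one plausible way to apply the rule, not necessarily the one this package uses.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; it is not part of org.greybird.xmliter.
class RealAttributesSketch {

    // Parses the XML with namespace awareness and returns its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // True for xmlns="..." and xmlns:prefix="..." declarations.
    static boolean isNamespaceDeclaration(Node attr) {
        String name = attr.getNodeName();
        return XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())
                || "xmlns".equals(name)
                || name.startsWith("xmlns:");
    }

    // Returns the sorted local names of the attributes that remain after
    // namespace declarations are removed.
    static List<String> attributeNames(Element element) {
        List<String> names = new ArrayList<String>();
        NamedNodeMap all = element.getAttributes();
        for (int i = 0; i < all.getLength(); i++) {
            Node attr = all.item(i);
            if (!isNamespaceDeclaration(attr)) {
                names.add(attr.getLocalName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        Element top = parseRoot("<Top xmlns=\"http://namespaceURI/\" "
                + "xmlns:p=\"urn:p\" id=\"1\" p:kind=\"x\"/>");
        System.out.println(attributeNames(top));
    }
}
```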

toString

public java.lang.String toString()
Returns a string for error reporting that identifies the current element. The returned value should be in square brackets and contain an XPath identifier or an approximation of one for the current element. For example:
[/ns:TopElement/ns:ChildElement]
Overrides:
toString in class java.lang.Object
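
An identifier in the bracketed format described for toString() could be built as in the following sketch, which walks from a DOM element up to the document root using the JDK DOM API. ElementPathSketch and describe() are hypothetical names used for illustration; the sketch produces an approximation of an XPath (no positional predicates) and is not the implementation used by this package.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; it is not part of org.greybird.xmliter.
class ElementPathSketch {

    // Parses the XML with namespace awareness and returns its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Builds "[/a/b/c]" from the qualified names of the element and its
    // ancestors, stopping when the parent is no longer an element.
    static String describe(Element element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element;
             n != null && n.getNodeType() == Node.ELEMENT_NODE;
             n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName());
        }
        return "[" + path + "]";
    }

    public static void main(String[] args) {
        Element top = parseRoot("<ns:TopElement xmlns:ns=\"urn:example\">"
                + "<ns:ChildElement/></ns:TopElement>");
        Element child = (Element) top.getFirstChild();
        System.out.println(describe(child));
    }
}
```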

Copyright (c) 2003 Mark T. Hayes; All Rights Reserved