XML::Filter::DOMFilter::LibXML 0.02 review

by rbytes.net

License: Perl Artistic License
File size: 4K
Developer: Petr Pajas
0 stars award from rbytes.net

XML::Filter::DOMFilter::LibXML is a SAX Filter allowing DOM processing of selected subtrees.

SYNOPSIS

use XML::LibXML;
use XML::Filter::DOMFilter::LibXML;

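my $filter = XML::Filter::DOMFilter::LibXML->new(
    Handler      => $handler,    # any SAX handler, e.g. a writer or another filter
    XPathContext => XML::LibXML::XPathContext->new(),
    Process      => [
        # XPath expression => reference to the callback run for each match
        '/foo[@A="aaa"]/*/bar'    => \&process_bar,
        'baz[parent::*/@B="bbb"]' => \&process_baz,
    ],
);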

my $parser = XML::SAX::YourFavoriteDriver->new( Handler => $filter );

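# Some DOM processing

# Receives the root element of each completed subtree selected by the first expression.
sub process_bar {
    my ($node) = @_;
    my $doc = $node->ownerDocument;
    # Append <note>hallo world!</note> as the last child of the matched element.
    $node->appendTextChild("note", "hallo world!");
    # Insert a new empty <foo/> element right after the matched element.
    $node->parentNode->insertAfter($doc->createElement("foo"), $node);
}

# Detaches the matched element from the tree, so it never reaches Handler.
sub process_baz {
    my ($node) = @_;
    $node->unbindNode;
}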

This module provides a compromise between SAX and DOM processing: the DOM API is used only on reasonably small parts of an XML document. It works as a SAX filter that temporarily builds small DOM trees around the parts selected by the given XPath expressions (with some limitations, see "Limitations").

The filter has two states, which will be referred to as A and B here. The initial state of the filter is A.

In state A, only a limited vertical portion of the DOM tree is built. All SAX events other than start_element are immediately passed to Handler. On a start_element event, a new element node is created in the DOM tree and any existing siblings of the newly created node are removed. Thus, while in state A, there is exactly one node on every level of the tree. All the XPath expressions are then checked in the context of the newly created node. If none of the expressions matches, the filter remains in state A and passes the start_element event to Handler. Otherwise, the callback associated with the first expression that matched is remembered and the filter changes its state to B.

In state B, the filter builds a complete DOM subtree of the new element from the incoming events. No events are passed to Handler at this stage. When the subtree is complete (i.e. the corresponding end tag is encountered), the callback associated with the XPath expression that matched is executed. The root element of the subtree is passed to the callback subroutine as its only argument.

The callback may perform any DOM operations on the subtree, and may even replace it with one or more new subtrees. The callback must, however, leave the element's parent node and all its ancestor nodes intact. Failing to do so can result in an error or unpredictable results.

When the callback returns, all subtrees that now appear in the DOM tree under the original element's parent are serialized to SAX events and passed to Handler. After that, they are deleted from the DOM tree and the filter returns to state A.
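
The A/B cycle described above can be tried end to end with a concrete parser and output handler. The following is only a sketch, not verified against a particular release of the module: the input document, the callback name annotate_bar and the choice of XML::SAX::ParserFactory and XML::SAX::Writer as driver and Handler are assumptions made for illustration, and both of those modules must be installed separately.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;
use XML::SAX::Writer;
use XML::LibXML;
use XML::Filter::DOMFilter::LibXML;

# Illustrative input: two <bar> elements that the expression below selects.
my $xml = '<foo A="aaa"><group><bar>one</bar></group>'
        . '<group><bar>two</bar></group></foo>';

# Downstream Handler: collects the filtered SAX stream as text.
my $output = '';
my $writer = XML::SAX::Writer->new(Output => \$output);

my $filter = XML::Filter::DOMFilter::LibXML->new(
    Handler      => $writer,
    XPathContext => XML::LibXML::XPathContext->new(),
    Process      => [
        '/foo[@A="aaa"]/*/bar' => \&annotate_bar,
    ],
);

# Any SAX2 driver will do; the factory picks whichever one is installed.
my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
$parser->parse_string($xml);

print $output, "\n";

# Hypothetical callback: runs once per completed <bar> subtree (state B).
sub annotate_bar {
    my ($node) = @_;
    $node->appendTextChild('note', 'processed');
}
```

Everything outside the selected bar elements should pass straight through to the writer as ordinary SAX events, while each bar element is held back until its end tag, modified by the callback, and only then serialized.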

Limitations:
Note that this type of processing severely limits the amount of information the XPath engine can use. Most notably, elements cannot be selected by their content. The only information present in the tree when the XPath expressions are evaluated is the element's name and attributes, and the same information for each of its ancestors. Nothing is known at that point about the element's child nodes or about its position among its siblings.
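
To make the limitation concrete, the sketch below uses plain XML::LibXML to build by hand the kind of tree the filter would hold in state A, then asks the XPath engine what it can see from the newest element. The element names, attribute values and the %visible hash are made up for illustration, and the code does not reproduce how the filter itself decides whether an expression matches.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hand-built stand-in for the state A tree after the start tags
# <foo A="aaa">, <section> and <bar id="b1"> have been seen:
# one element per level, and no children under the newest element.
my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
my $foo = $doc->createElement('foo');
$foo->setAttribute('A', 'aaa');
$doc->setDocumentElement($foo);
my $section = $doc->createElement('section');
$foo->appendChild($section);
my $bar = $doc->createElement('bar');
$bar->setAttribute('id', 'b1');
$section->appendChild($bar);

my $xpc = XML::LibXML::XPathContext->new($doc);

# Each expression is evaluated with the new <bar> element as context node.
my %visible = (
    own_id        => $xpc->findvalue('@id', $bar),
    root_attr     => $xpc->findvalue('/foo/@A', $bar),
    parent_name   => $xpc->findvalue('name(..)', $bar),
    ancestors     => $xpc->findvalue('count(ancestor::*)', $bar),
    children      => $xpc->findvalue('count(node())', $bar),
    text_content  => $xpc->findvalue('string(.)', $bar),
    prev_siblings => $xpc->findvalue('count(preceding-sibling::*)', $bar),
);

printf "%-14s '%s'\n", $_, $visible{$_} for sort keys %visible;
```

Names, attributes and ancestors yield real values, whereas the child count, the text content and the preceding-sibling count come back empty or zero regardless of what the source document contains at that spot. A predicate such as [text()="x"] or [position()=2] therefore cannot be relied on in a Process expression.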

Requirements:
Perl
