XML::SAX::Intro 0.14 review

XML::SAX::Intro is an introduction to SAX parsing with Perl. XML::SAX is a new way to work with XML parsers in Perl.

License: Perl Artistic License
File size: 57K
Developer: Matt Sergeant

XML::SAX is a new way to work with XML parsers in Perl. In this article we'll discuss why you should be using SAX, why you should be using XML::SAX, and we'll look at some of the finer implementation details. The text below assumes some familiarity with callback-based (push) parsing. If you are unfamiliar with these techniques, a good place to start is Kip Hampton's excellent series of articles on XML.com.

Replacing XML::Parser

The de facto way of parsing XML under Perl is to use Larry Wall and Clark Cooper's XML::Parser. This module is a Perl and XS wrapper around the expat XML parser library by James Clark. It has been a hugely successful project, but it suffers from a couple of rather major flaws. Firstly, it has a proprietary API, designed before the SAX API was conceived, which means that it is not easily replaceable by other streaming parsers. Secondly, its callbacks are subrefs. This doesn't sound like much of an issue, but unfortunately it leads to code like:

    sub handle_start {
        my ($e, $el, %attrs) = @_;
        if ($el eq 'foo') {
            $e->{inside_foo}++; # BAD! $e is an XML::Parser::Expat object.
        }
    }

As you can see, we're using the $e object to hold our state information, which is a bad idea because we don't own that object - we didn't create it. It's an internal object of XML::Parser that happens to be a hashref. We could all too easily overwrite XML::Parser's internal state variables by using it this way, or Clark could change it to an array ref (not that he would, because it would break so much code, but he could).

Currently, the only safe way to maintain state with XML::Parser is to use a closure:

    my $state = MyState->new();
    $parser->setHandlers(Start => sub { handle_start($state, @_) });

This closure captures the $state variable, which is then passed as the first parameter to your callback. Unfortunately, very few people use this technique, as it is not documented in the XML::Parser POD files.
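To make the closure technique concrete, here is a minimal, self-contained sketch. It assumes XML::Parser is installed. The contents of MyState, the handle_end callback, the foo_count field, and the sample document are all hypothetical, chosen only for illustration. The point is that all state lives in an object we own, and $e is never written to.

```perl
use strict;
use warnings;
use XML::Parser;

{
    # Hypothetical state holder: the name MyState comes from the text above,
    # but its fields are an assumption made for this sketch.
    package MyState;
    sub new { return bless { inside_foo => 0, foo_count => 0 }, shift }
}

# The closure prepends $state, so it arrives before XML::Parser's own arguments.
sub handle_start {
    my ($state, $e, $el, %attrs) = @_;
    if ($el eq 'foo') {
        $state->{inside_foo}++;   # our own object, not XML::Parser's
        $state->{foo_count}++;
    }
}

sub handle_end {
    my ($state, $e, $el) = @_;
    $state->{inside_foo}-- if $el eq 'foo';
}

my $state  = MyState->new();
my $parser = XML::Parser->new();
$parser->setHandlers(
    Start => sub { handle_start($state, @_) },
    End   => sub { handle_end($state, @_) },
);

# Illustrative input only.
$parser->parse('<doc><foo/><bar><foo>text</foo></bar></doc>');

print "foo elements seen: $state->{foo_count}\n";   # prints: foo elements seen: 2
```

Because both closures capture the same $state, the start and end handlers share one state object without ever touching the XML::Parser::Expat object.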

You might also want to avoid XML::Parser because you need a feature that it doesn't provide (such as validation), or because you cannot rely on expat being available, either because it is not installed on your system or because a restrictive ISP won't let you install it. Using SAX allows you to work around these restrictions.
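For comparison, the following is a rough sketch of the same element-counting task written against XML::SAX, assuming the XML::SAX distribution is installed. The FooCounter class name, the foo_count field, and the sample document are hypothetical. The handler is an object we create ourselves, so keeping state in it is safe, and XML::SAX::ParserFactory chooses whichever SAX parser is available, which need not be based on expat.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;

{
    # Hypothetical handler class, named for this sketch only.
    package FooCounter;
    use base 'XML::SAX::Base';

    # SAX passes each element as a hashref with keys such as Name and LocalName.
    sub start_element {
        my ($self, $el) = @_;
        $self->{foo_count}++ if $el->{LocalName} eq 'foo';
    }
}

my $handler = FooCounter->new();

# The factory returns an installed SAX parser; our code does not name one.
my $parser = XML::SAX::ParserFactory->parser(Handler => $handler);

# Illustrative input only.
$parser->parse_string('<doc><foo/><bar><foo>text</foo></bar></doc>');

print "foo elements seen: $handler->{foo_count}\n";   # prints: foo elements seen: 2
```

Since the handler only depends on the SAX callback interface, a different parser can be used later without changing FooCounter.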

Requirements:
Perl
