DomSax 1.0.0 review


License:	LGPL (GNU Lesser General Public License)
File size:	18K
Developer:	Richard A.

DomSax is an XML parser based on the standard Document Object Model principle (and Sun's implementation of it), combined with the flexibility and potentially low memory consumption of the SAX parser (also Sun's implementation).

Most XML documents contain repeating blocks (e.g. the same structure of elements repeated over and over). For each repeating block the parser creates a complete document whose document root is the start element of that block. This lets the programmer keep the code clean and the memory consumption within bounds.

The parser has been tested on Java 1.5.1.

For parsing XML files there are currently two options: SAX and DOM. With SAX you get the flexibility to load specific elements from a stream, minimizing memory consumption and load time, but complicating searches. With DOM you get a nice interface for searching elements in the completely loaded document, but this interface comes at a high cost in memory consumption and speed.
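As an illustration of the DOM side of this trade-off, here is a minimal sketch using the standard JAXP DOM API (not DomSax itself). The class name `DomLookup` and the element name `name` are made up for the example, and the code assumes a newer Java than the 1.5 release mentioned above. The whole input is turned into a tree before the search can start, which is where the memory cost comes from.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DomLookup {
    /** Loads the complete document into memory, then searches the tree. */
    public static List<String> namesOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Searching is a single call once everything has been loaded.
        NodeList nodes = doc.getElementsByTagName("name");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><record><name>a</name></record>"
                   + "<record><name>b</name></record></records>";
        System.out.println(namesOf(xml));
    }
}
```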

When I started this project, one of the requirements was the ability to process XML files of 100+ MB. This effectively left me only the choice of SAX, which allows parsing the file element by element and so keeps the memory consumption within bounds. However, I didn't like the implications for the project's code. Anyone who has ever created a parser with SAX will agree that you're left with a mess, because the open tag, the data and the close tag are received separately.
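To make that separation concrete, here is a small sketch with the standard JAXP SAX API (again not DomSax code; `SaxNames` and the element name `name` are invented for the example). Even for a single element, the handler has to carry state from the open-tag callback through the data callback to the close-tag callback.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxNames {
    /** Collects the text of every <name> element using plain SAX callbacks. */
    public static List<String> namesOf(String xml) throws Exception {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inName;                          // set by the open tag
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("name".equals(qName)) {
                    inName = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inName) {
                    text.append(ch, start, length);          // data may arrive in several chunks
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString());              // only now is the value complete
                    inName = false;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(namesOf("<records><record><name>a</name></record></records>"));
    }
}
```

With more element types and nesting, the number of flags and buffers grows quickly, which is the mess referred to above.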

So what I wanted was the flexibility of the SAX parser combined with the ease of use of the DOM approach. The underlying principle of DomSax is repeating blocks, which can be indicated with the existing XPath technology. Most XML files store records, which are always described in the same manner (i.e. as repeating blocks).

In the example below there is a single header, which is always the first element within the document-root tag (blue box). The record elements follow after the header (orange boxes). For each box that is indicated to the parser with an XPath expression, a complete document is created that contains only the data within that box. Once the document is complete, it is passed to the registered listeners.

DomSax 1.0.0 screenshot
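
The idea of one small document per repeating block, handed to a listener, can be sketched on top of the standard JAXP SAX and DOM APIs. This is not DomSax's actual code or API: `BlockSplitter`, `BlockListener` and `blockCompleted` are hypothetical names, and a plain absolute element path such as `/records/record` stands in for real XPath matching. The example also assumes Java 8 or later for the lambda.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BlockSplitter extends DefaultHandler {
    /** Hypothetical callback; not DomSax's real listener interface. */
    public interface BlockListener {
        void blockCompleted(Document block);
    }

    private final String blockPath;            // assumption: simple path, e.g. "/records/record"
    private final BlockListener listener;
    private final DocumentBuilder builder;
    private final StringBuilder path = new StringBuilder();
    private Document current;                  // document under construction, or null
    private Node cursor;                       // node that receives the next child

    public BlockSplitter(String blockPath, BlockListener listener) throws Exception {
        this.blockPath = blockPath;
        this.listener = listener;
        this.builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        path.append('/').append(qName);
        if (current == null) {
            if (!blockPath.contentEquals(path)) {
                return;                        // outside any block: nothing is kept
            }
            current = builder.newDocument();   // a block starts: begin a fresh document
            cursor = current;
        }
        Element element = current.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            element.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        cursor.appendChild(element);
        cursor = element;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            cursor.appendChild(current.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current != null) {
            cursor = cursor.getParentNode();
            if (cursor == current) {           // the block's root element just closed
                listener.blockCompleted(current);
                current = null;                // let the finished block be garbage collected
                cursor = null;
            }
        }
        path.setLength(path.lastIndexOf("/")); // drop the last path segment
    }

    public static void parse(String xml, String blockPath, BlockListener listener) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new BlockSplitter(blockPath, listener));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><header>h</header>"
                   + "<record id=\"1\"><name>a</name></record>"
                   + "<record id=\"2\"><name>b</name></record></records>";
        parse(xml, "/records/record", block ->
                System.out.println(block.getDocumentElement().getAttribute("id")
                        + " -> " + block.getDocumentElement().getTextContent()));
    }
}
```

Only one block is held in memory at a time, while the listener still gets a normal DOM document to search. A second splitter with the path `/records/header` would deliver the header block in the same way.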

DomSax 1.0.0 search tags

XML::DOM::Document 1.44: XML::DOM::Document is an XML document node in XML::DOM. XML::DOM::Document extends XML::DOM::Node. It is the main root of the X
XML::Parser 2.34: XML::Parser is a perl module for parsing XML documents. SYNOPSIS use XML::Parser; $p1 = new XML::Parser(Style => 'Debu
XML::DOM 1.44: XML::DOM is a perl module for building DOM Level 1 compliant document structures. SYNOPSIS use XML::DOM; my $parser = new
XML::Parser::EasyTree 0.01: XML::Parser::EasyTree is an easier tree style for XML::Parser. SYNOPSIS use XML::Parser; use XML::Parser::EasyTree; $XM
CyberNeko HTML Parser 0.9.5: NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the inform
XML::Parser::Style::Stream 2.34: XML::Parser::Style::Stream is a Stream style for XML::Parser. SYNOPSIS use XML::Parser; my $p = XML::Parser->new(Style =>
