DomSax 1.0.0 review


License:	LGPL (GNU Lesser General Public License)
File size:	18K
Developer:	Richard A.

DomSax is an XML parser based on the standard Document Object Model (DOM) principle and Sun's implementation of it. It combines DOM with the flexibility and low memory consumption of the SAX parser (also Sun's implementation).

Most XML documents contain repeating blocks (e.g. the same structure of elements repeated over and over). For each repeating block the parser creates a complete document whose document root is the start element of that block. This lets the programmer keep the code clean and the memory consumption within bounds.

The parser has been tested on Java 1.5.1.

For parsing XML files there are currently two options: SAX and DOM. With SAX you can load specific elements from a stream, which minimizes memory consumption and load time but complicates searches. With DOM you get a convenient interface for searching elements in the completely loaded document, but that interface comes at a high cost in memory consumption and speed.
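To make the DOM side of this trade-off concrete, here is a minimal sketch using only the standard JAXP classes. The class name `DomLookup` and the method `find` are made up for illustration and are not part of DomSax. The whole document is held in memory before the XPath search runs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of DomSax: plain DOM loading plus an XPath search.
class DomLookup {
    // Loads the complete document into memory, then evaluates the XPath
    // expression against it and returns the result as a string.
    public static String find(String xml, String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
    }
}
```

Searching is a one-liner, but a 100+ MB input would have to be loaded completely before the first lookup.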

When I started this project, one of the requirements was the ability to process XML files of 100+ MB. That effectively left me only SAX, which parses the file element by element and so keeps memory consumption within bounds. However, I didn't like what that implied for the project's code. Anyone who has ever written a parser with SAX will agree that you are left with a mess, because the open tag, the data and the close tag are received separately.
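The separation of open tag, data and close tag can be seen in even a tiny handler. The sketch below uses the standard `org.xml.sax` API. The class `SaxNameCollector` and the element name `name` are assumptions chosen for illustration. Collecting the text of a single element type already needs a state flag and a buffer shared across three callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example, not part of DomSax: collects the text of every <name> element.
class SaxNameCollector extends DefaultHandler {
    private final List<String> names = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inName = false; // state carried between callbacks

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("name".equals(qName)) { // open tag arrives on its own
            inName = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inName) text.append(ch, start, length); // data may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) { // close tag completes the value
            names.add(text.toString());
            inName = false;
        }
    }

    public static List<String> collectNames(String xml) throws Exception {
        SaxNameCollector handler = new SaxNameCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }
}
```

Every additional element of interest adds another flag or a stack, which is where the mess comes from.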

So what I wanted was the flexibility of the SAX parser combined with the ease of use of the DOM approach. The underlying principle of DomSax is the repeating block, which can be indicated with existing XPath technology. Most XML files store records, which are always described in the same manner (i.e. as repeating blocks).

In the example below there is a single header, which is always the first element within the document-root tag (blue box). The records follow the header (orange boxes). For each box indicated to the parser with an XPath expression, a complete document is created that contains only the data within that box. Once the document is complete, it is passed to the registered listeners.

DomSax 1.0.0 screenshot
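The repeating-block idea can be sketched with the standard SAX and DOM classes. This is not DomSax's actual API. The class `BlockSplitter` is hypothetical, the listener is modelled as a plain `Consumer<Document>`, and blocks are selected by element name rather than by an XPath expression to keep the sketch short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not DomSax itself: builds one small DOM document per block.
class BlockSplitter extends DefaultHandler {
    private final String blockName;            // element name marking a repeating block (assumption)
    private final Consumer<Document> listener; // receives each completed block
    private final DocumentBuilder builder;
    private Document doc;                      // document under construction, or null outside a block
    private Node current;                      // insertion point inside doc

    BlockSplitter(String blockName, Consumer<Document> listener) throws Exception {
        this.blockName = blockName;
        this.listener = listener;
        this.builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (doc == null) {
            if (!blockName.equals(qName)) return; // outside any block: skip
            doc = builder.newDocument();          // block start becomes the document root
            current = doc;
        }
        Element e = doc.createElement(qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            e.setAttribute(attrs.getQName(i), attrs.getValue(i));
        }
        current.appendChild(e);
        current = e;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (doc != null) current.appendChild(doc.createTextNode(new String(ch, start, length)));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (doc == null) return;
        current = current.getParentNode();
        if (current == doc) {      // the block's root element has just closed
            listener.accept(doc);  // hand the finished block to the listener
            doc = null;            // drop the reference so the block can be collected
            current = null;
        }
    }

    // Demo helper: returns the text content of every block found in the input.
    public static List<String> blockTexts(String xml, String blockName) throws Exception {
        List<String> texts = new ArrayList<>();
        BlockSplitter splitter = new BlockSplitter(blockName,
                d -> texts.add(d.getDocumentElement().getTextContent()));
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), splitter);
        return texts;
    }
}
```

Only one block is held in memory at a time, while the listener still works with an ordinary `Document` and can use the usual DOM calls on it.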



Alternative/similar