RDFStore::Parser::SiRPAC 0.50 review


RDFStore::Parser::SiRPAC is a Perl module that implements a streaming RDF Parser as a direct implementation of XML::Parser::Expat.

License: GPL (GNU General Public License)
File size: 476K
Developer: Alberto Reggiori

SYNOPSIS

use RDFStore::Parser::SiRPAC;
use RDFStore::NodeFactory;
my $p = new RDFStore::Parser::SiRPAC(
    ErrorContext => 2,
    Handlers     => {
        Init   => sub { print "INIT\n"; },
        Final  => sub { print "FINAL\n"; },
        Assert => sub { print "STATEMENT - @_\n"; }
    },
    NodeFactory  => new RDFStore::NodeFactory() );

$p->parsefile('http://www.gils.net/bsr-gils.rdfs');
$p->parsefile('http://www.gils.net/rdf/bsr-gils.rdfs');
$p->parsefile('/some/where/my.rdf');
$p->parsefile('file:/some/where/my.rdf');
$p->parse(*STDIN); # parses the stream with *blocking* Expat (see the example below for non-blocking parsing using XML::Parser::ExpatNB)

use RDFStore::Parser::SiRPAC;
use RDFStore::NodeFactory;
my $pstore = new RDFStore::Parser::SiRPAC(
    ErrorContext => 2,
    Style        => 'RDFStore::Parser::Styles::RDFStore::Model',
    NodeFactory  => new RDFStore::NodeFactory(),
    store        => {
        persistent => 1,
        seevalues  => 1,
        options    => { Name => '/tmp/test' }
    }
);
my $rdfstore_model = $pstore->parsefile('http://www.gils.net/bsr-gils.rdfs');

# using the Expat non-blocking feature (generally for large XML streams) - see XML::Parser::Expat(3)
my $rdfstore_stream_model = $pstore->parsestream(*STDIN);

This module implements a Resource Description Framework (RDF) streaming parser completely in Perl using the XML::Parser::Expat(3) module. The actual RDF parsing happens using an instance of XML::Parser::Expat with the Namespaces option enabled and start/stop and char handlers set. The RDF-specific code is based on Sergey Melnik's modified Java version of SiRPAC; many changes and adaptations were made to run it under Perl. Expat options may be provided when the RDFStore::Parser::SiRPAC object is created. These options are then passed on to the Expat object on each parse call.

Exactly as with XML::Parser(3), the behavior of the parser is controlled by the Style and/or Handlers options, or by the setHandlers method. These all provide mechanisms for RDFStore::Parser::SiRPAC to set the handlers needed by Expat. If neither Style nor Handlers is specified, parsing just checks the RDF document syntax against the W3C RDF Recommendation. When the underlying handlers are called, they receive the Expat object as their first parameter, not the Parser object.
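
Because handlers receive the Expat object first, a handler usually shifts it off before looking at the rest of its arguments. The following is a minimal sketch of that pattern, using only the constructor options shown in the SYNOPSIS. It assumes that whatever follows the Expat object in the Assert handler is the parsed statement; the counter $seen and the %handlers table are illustrative names, not part of the module.

```perl
use strict;
use warnings;

my $seen = 0;    # number of statements reported so far (illustrative)

my %handlers = (
    Init => sub {
        my ($expat) = @_;          # the Expat object, not the SiRPAC parser
        $seen = 0;
        print "INIT\n";
    },
    Assert => sub {
        my ($expat, @rest) = @_;   # assumption: @rest carries the statement
        $seen++;
        print "STATEMENT $seen - @rest\n";
    },
    Final => sub {
        my ($expat) = @_;
        print "FINAL - $seen statement(s)\n";
    },
);

# Parse only when a file name or URL is given on the command line.
# This part needs the RDFStore distribution to be installed.
if (@ARGV) {
    require RDFStore::Parser::SiRPAC;
    require RDFStore::NodeFactory;
    my $p = RDFStore::Parser::SiRPAC->new(
        ErrorContext => 2,
        Handlers     => \%handlers,
        NodeFactory  => RDFStore::NodeFactory->new(),
    );
    $p->parsefile($ARGV[0]);
}
```

Keeping the handlers in a named hash rather than inline makes it possible to exercise them on their own, without a parser, by calling them with a placeholder first argument.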

For usage examples, see the sections below and the samples and utils directories that come with this software distribution.
For example, with RDFStore::Parser::SiRPAC you can easily write an rdfingest.pl script to do something like this:

fetch -o - -q http://dmoz.org/rdf/content.rdf.u8.gz |
gunzip - |
sed -f dmoz.content.sed | rdfingest.pl -
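
What such an rdfingest.pl might look like is sketched below. This is not the script shipped in the distribution. It is a hypothetical minimal version that reuses only the constructor options and parse methods from the SYNOPSIS. The helper names input_mode and ingest, and the store name '/tmp/test', are assumptions made for illustration.

```perl
#!/usr/bin/perl
# rdfingest.pl - hypothetical sketch, not the distribution's own script
use strict;
use warnings;

# "-" means read RDF from standard input; anything else is a file name or URL.
sub input_mode {
    my ($arg) = @_;
    return (defined $arg && $arg eq '-') ? 'stdin' : 'file';
}

# Parse the input into a persistent model, as in the second SYNOPSIS example.
# Needs the RDFStore distribution to be installed.
sub ingest {
    my ($arg, $store_name) = @_;
    require RDFStore::Parser::SiRPAC;
    require RDFStore::NodeFactory;
    my $parser = RDFStore::Parser::SiRPAC->new(
        ErrorContext => 2,
        Style        => 'RDFStore::Parser::Styles::RDFStore::Model',
        NodeFactory  => RDFStore::NodeFactory->new(),
        store        => {
            persistent => 1,
            seevalues  => 1,
            options    => { Name => $store_name },
        },
    );
    return input_mode($arg) eq 'stdin'
        ? $parser->parsestream(*STDIN)    # non-blocking, suited to large streams
        : $parser->parsefile($arg);
}

# Run only when called with an argument, e.g.:  ... | rdfingest.pl -
ingest($ARGV[0], '/tmp/test') if @ARGV;
```

The sketch uses parsestream for standard input because the pipeline above feeds a large, uncompressed stream; parse(*STDIN) would also work, but with blocking Expat.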

Requirements:
Perl
