
HTML::Parser 3.54


HTML::Parser is an HTML parser class
Developer:   Gisle Aas
Price:  0.00
License:   Perl Artistic License
File size:   82K


HTML::Parser is an HTML parser class. Objects of the HTML::Parser class recognize markup and separate it from plain text (also called data content) in HTML documents. As the different kinds of markup and text are recognized, the corresponding event handlers are invoked.
HTML::Parser is not a generic SGML parser.

We have tried to make it able to deal with the HTML that is actually "out there", and it normally parses as closely as possible to the way the popular web browsers do, instead of strictly following one of the many HTML specifications from the W3C. Where there is disagreement, there is often an option that you can enable to get the official behaviour.

The document to be parsed may be supplied in arbitrary chunks. This makes it possible to parse documents on the fly as they are received from the network.
If event-driven parsing does not feel right for your application, you might want to use HTML::PullParser. This is an HTML::Parser subclass that allows a more conventional program structure.
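
As a rough illustration of that more conventional structure, the sketch below pulls tokens from HTML::PullParser in an ordinary while loop instead of registering callbacks. The sample markup in $html and the @tags array are assumptions made for this example only.

```perl
use strict;
use warnings;
use HTML::PullParser ();

# Hypothetical sample document, used only for this illustration.
my $html = '<h1>Title</h1><p>Some <b>bold</b> text</p>';

# Ask for the event name plus one extra value per token.
my $p = HTML::PullParser->new(
    doc   => \$html,
    start => 'event, tagname',
    end   => 'event, tagname',
    text  => 'event, dtext',
) or die "Cannot create parser";

my @tags;    # start/end events in document order
while (my $token = $p->get_token) {
    my ($event, $value) = @$token;
    if ($event eq 'text') {
        print "text: $value\n";
    }
    else {
        push @tags, "$event:$value";
        print "$event tag: $value\n";
    }
}
```

The loop owns the control flow here, so state can live in ordinary lexical variables rather than being shared between handlers.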

SYNOPSIS:

use HTML::Parser ();

# Create parser object
$p = HTML::Parser->new( api_version => 3,
                        start_h => [\&start, "tagname, attr"],
                        end_h   => [\&end,   "tagname"],
                        marked_sections => 1,
                      );

# Parse document text chunk by chunk
$p->parse($chunk1);
$p->parse($chunk2);
#...
$p->eof; # signal end of document

# Parse directly from file
$p->parse_file("foo.html");
# or
open(my $fh, "<:utf8", "foo.html") || die;
$p->parse_file($fh);
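
The synopsis refers to handler subroutines that it does not define. As a minimal, self-contained sketch of chunk-by-chunk parsing, the following collects the href attribute of every anchor tag while the input arrives in two pieces, with one tag deliberately split across the chunk boundary. The sample markup, the @links array and the anonymous handler are illustrative assumptions, not part of the original synopsis.

```perl
use strict;
use warnings;
use HTML::Parser ();

my @links;    # href values seen so far

my $p = HTML::Parser->new(
    api_version => 3,
    # The handler receives the arguments named in the argspec string.
    start_h => [
        sub {
            my ($tagname, $attr) = @_;
            push @links, $attr->{href}
                if $tagname eq 'a' && defined $attr->{href};
        },
        "tagname, attr",
    ],
);

# Simulated network chunks; the second <a> tag is split between them.
my @chunks = (
    '<p><a href="one.html">One</a> <a hr',
    'ef="two.html">Two</a></p>',
);

$p->parse($_) for @chunks;
$p->eof;      # signal end of document

print "$_\n" for @links;
```

The start handler should only fire for the second anchor once the rest of the tag has arrived, which is why the caller does not need to align chunks with tag boundaries.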

Download HTML::Parser 3.54


 http://mirrors.evolva.ro/CPAN/authors/id/G/GA/GAAS/HTML-Parser-3.54.tar.gz


Author's software

libwww-perl 5.64 (by Gisle Aas)
libwww-perl distribution is a collection of Perl modules, which provides a simple and consistent programming interface (API) to the W

URI 1.35 (by Gisle Aas)
URI is Uniform Resource Identifiers (absolute and relative)

MIME::Base64 3.07 (by Gisle Aas)
MIME::Base64 provides encoding and decoding of base64 strings.

SYNOPSIS

use MIME::Base64;

$encoded = encode_base64('Aladdin:o

Convert::Recode 1.04 (by Gisle Aas)
Convert::Recode is a Perl module to make mapping functions between character sets.

SYNOPSIS

use Convert::Recode qw(ebcdic_to_


Similar software

MP3::M3U::Parser 2.20 (by Burak Gürsoy)
MP3::M3U::Parser is an MP3 playlist parser.

SYNOPSIS

use MP3::M3U::Parser;
my $parser = MP3::M3U::Parser->new(%options);

ShaniXmlParser 1.4.7 (by Quentin Anciaux)
ShaniXmlParser is an XML/HTML DOM/SAX parser that can be validating

XML::Parser 2.34 (by Larry Wall)
XML::Parser is a Perl module for parsing XML documents.

SYNOPSIS

use XML::Parser;

$p1 = new XML::Parser(Style => 'Debu

Jericho HTML Parser 2.3 (by Martin Jericho)
Jericho HTML Parser is a simple but powerful Java library allowing analysis and manipulation of parts of an HTML document, including s

Cobra 0.96 (by Jose)
Cobra HTML Toolkit is an open source library that provides a pure Java HTML parser and a renderer

libsgml 1.1.4 (by Matt Miller)
libsgml is a fast, lightweight state machine SGML parser capable of parsing HTML, XML, and most other markup languages in their most

HTML::Latex 1.1 (by HTML::Latex Team)
HTML::Latex is a Perl module that creates a LaTeX file from an HTML file.

SYNOPSIS

use HTML::Latex

my $parser = new HTML::Latex(

Pod::HTML_Elements 0.05 (by Nick Ing-Simmons)
Pod::HTML_Elements is a Perl module that converts POD to a tree of LWP's HTML::Element objects, and hence to HTML or PostScript.

XML::Parser::PerlSAX 0.08 (by Ken MacLeod)
XML::Parser::PerlSAX is a Perl SAX parser using XML::Parser.

SYNOPSIS

use XML::Parser::PerlSAX;

$parser = XML::Parser::Per

