HTML::TableExtract 2.07 review

License:	GPL (GNU General Public License)
File size:	23K
Developer:	Matthew P Sisk

HTML::TableExtract is a Perl module that simplifies the extraction of information from tables within HTML documents.

Tables, no matter how nested or clustered, can be targeted symbolically with column headers or by more specific depth and count information.
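As a minimal sketch of header-based targeting, the snippet below assumes the module's documented headers constructor option and its tables and rows accessors. The sample HTML, its column names, and the @rows variable are invented for illustration and are not taken from the module's own examples.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical input: two sibling tables; only the second has the wanted headers.
my $html = join '',
    '<table><tr><td>navigation</td></tr></table>',
    '<table>',
    '<tr><th>Name</th><th>Price</th></tr>',
    '<tr><td>Widget</td><td>4.50</td></tr>',
    '<tr><td>Gadget</td><td>7.25</td></tr>',
    '</table>';

# Target any table that has a row containing both column labels.
my $te = HTML::TableExtract->new( headers => [ 'Name', 'Price' ] );
$te->parse($html);
$te->eof;

# Collect each extracted row as a comma-joined string.
my @rows;
for my $ts ( $te->tables ) {
    for my $row ( $ts->rows ) {
        push @rows, join( ',', map { defined($_) ? $_ : '' } @$row );
    }
}
print "$_\n" for @rows;
```

Only the table carrying both headers should be picked up; the unrelated table is ignored without any need to know where it sits in the document.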

Each table is labeled in the first row with coordinates in terms of depth and count, both of which start at 0. Some of the tables have headers in the second row; in this example these header cells are <th> tags, but header cells can be either <th> or <td>. The remaining cells in each table show that cell's row and column along with the table coordinates, in the form depth,count:row,column. Rows and columns also begin at 0, so the table label and the headers, if present, affect these cell coordinates.
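The depth and count coordinates can be illustrated with a short sketch. It assumes the module's documented depth and count constructor options and the coords accessor on each extracted table; the nested sample HTML and the @found variable are hypothetical.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical input: one outer table (depth 0) holding two nested tables (depth 1).
my $html = join '',
    '<table><tr><td>',
    '<table><tr><td>inner A</td></tr></table>',
    '<table><tr><td>inner B</td></tr></table>',
    '</td></tr></table>';

# Depth and count both start at 0, so this asks for the second table at depth 1.
my $te = HTML::TableExtract->new( depth => 1, count => 1 );
$te->parse($html);
$te->eof;

# Record each row together with the coordinates of the table it came from.
my @found;
for my $ts ( $te->tables ) {
    my ( $depth, $count ) = $ts->coords;
    for my $row ( $ts->rows ) {
        push @found, "$depth,$count: " . join( ',', map { defined($_) ? $_ : '' } @$row );
    }
}
print "$_\n" for @found;
```

Here the count is relative to the given depth, so the request skips the outer table and the first nested table.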

In the illustrations of what is extracted from these tables, content in italics is notational in nature; it was not actually extracted from the tables. In particular, whenever headers are used for extraction, the order in which the headers were provided is noted by listing the headers, but the header row is not actually extracted from the target table.
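The point about header order can be sketched as follows, assuming the module's default column mapping is left enabled. The sample table and the @rows variable are made up for illustration; the headers are deliberately requested in the reverse of their order in the HTML.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical input: the table lists Name before Price.
my $html = join '',
    '<table>',
    '<tr><th>Name</th><th>Price</th></tr>',
    '<tr><td>Widget</td><td>4.50</td></tr>',
    '</table>';

# Request the columns in the opposite order: Price first, then Name.
my $te = HTML::TableExtract->new( headers => [ 'Price', 'Name' ] );
$te->parse($html);
$te->eof;

# Rows come back in the requested header order; the header row is not among them.
my @rows;
for my $ts ( $te->tables ) {
    push @rows, map { join( ',', map { defined($_) ? $_ : '' } @$_ ) } $ts->rows;
}
print "$_\n" for @rows;
```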

What's New in This Release:
A subtable slicing bug and an hrow() attachment bug were fixed.
Tests were added.

HTML::TableExtract 2.07 keywords

PDFreactor 1.1.936.7: RealObjects PDFreactor is a powerful formatting processor for converting XML and XHTML/HTML documents into PDF
HTML Parser 1.6-20060610: HTMLParser is a super-fast real-time parser for real-world HTML
FoX Desktop 1.0 Professional: FoX Desktop Professional, the first Professional edition of FoX Desktop Linux, a Fedora-based desktop-oriented distribution, is now
Featherweight Linux 1.3: Featherweight Linux is my Live-CD installable Linux distribution that I remastered from Feather Linux, which is built on knoppix te
ASPseek 1.2.10: ASPseek is an Internet search engine software developed by SWsoft and licensed as free software under GNU GPL. ASPseek consists of
SiSU 0.48.8: SiSU (Serialized information, Structured Units for Electronic Documents) is a document creation and management framework. Here

Alternative/similar