HTML::TableExtract 2.07 review


License:	GPL (GNU General Public License)
File size:	23K
Developer:	Matthew P Sisk

HTML::TableExtract is a Perl module that simplifies the extraction of information from tables within HTML documents.

Tables, no matter how nested or clustered, can be targeted symbolically with column headers or by more specific depth and count information.
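As a minimal sketch of the two targeting styles, assuming the module is installed from CPAN: the sample HTML and the variable names below are made up for illustration, while `new`, `parse`, `tables`, and `rows` are the module's documented methods.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample document with a single table.
my $html = <<'HTML';
<table>
  <tr><th>Name</th><th>Price</th><th>Notes</th></tr>
  <tr><td>Widget</td><td>4.50</td><td>blue</td></tr>
  <tr><td>Gadget</td><td>7.25</td><td>red</td></tr>
</table>
HTML

# Symbolic targeting: match any table whose header row has these columns.
my $by_headers = HTML::TableExtract->new( headers => [ 'Price', 'Name' ] );
$by_headers->parse($html);

my @rows_by_header;
for my $table ( $by_headers->tables ) {
    for my $row ( $table->rows ) {
        push @rows_by_header, [@$row];
        print join( ', ', @$row ), "\n";
    }
}

# Positional targeting: the first table (count 0) at the top level (depth 0).
my $by_position = HTML::TableExtract->new( depth => 0, count => 0 );
$by_position->parse($html);

my ($first)  = $by_position->tables;
my @all_rows = $first->rows;    # no headers given, so every row is returned
print scalar(@all_rows), " rows in table 0,0\n";
```

With headers, only the requested columns come back, arranged in the order the headers were listed; with depth and count alone, the header row is returned like any other row.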

Each table is labeled in the first row with coordinates in terms of depth and count, which both start at 0. Some of the tables have headers in the second row; although in this example these header cells are in fact <th> tags, header cells can be either <th> or <td>. The remaining cells in the table indicate row and column information from that cell, along with the table coordinates: depth,count:row,column. Rows and columns begin at 0 as well, so the table label and headers, if present, will affect these cell coordinates.

In the illustrations of what is extracted from these tables, content in italics is notational in nature; it was not actually extracted from the tables. In particular, whenever headers are used for extraction, the order in which the headers were provided is noted by listing the headers, but the header row is not actually extracted from the target table.
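The depth and count coordinates can be inspected directly. The following is a rough sketch, again assuming the module is installed; the nested sample HTML is hypothetical, and `coords` is the module's documented accessor for a table's depth and count.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample: two top-level tables, the first containing a nested one.
my $html = <<'HTML';
<table>
  <tr><td>outer A
    <table><tr><td>inner</td></tr></table>
  </td></tr>
</table>
<table><tr><td>outer B</td></tr></table>
HTML

# With no headers, depth, or count given, every table matches.
my $te = HTML::TableExtract->new;
$te->parse($html);

my @coords;
for my $ts ( $te->tables ) {
    my ( $depth, $count ) = $ts->coords;
    my @rows = $ts->rows;
    push @coords, "$depth,$count";
    printf "table %d,%d has %d row(s)\n", $depth, $count, scalar @rows;
}
```

Here the two top-level tables sit at depth 0 with counts 0 and 1, and the nested table is the first table at depth 1.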

What's New in This Release:
A subtable slicing bug and an hrow() attachment bug were fixed.
Tests were added.
