TRE 0.7.5 review
TRE is a robust, lightweight, and efficient POSIX compliant regexp matching library with some exciting features such as approximate (fuzzy) matching.
At the core of TRE is a new algorithm for regular expression matching with submatch addressing. In the worst case, the algorithm runs in time linear in the length of the text being searched and quadratic in the length of the regular expression.
In other words, the time complexity of the algorithm is O(M^2 N), where M is the length of the regular expression and N is the length of the text. Space usage is also quadratic in the length of the regex, but does not depend on the searched string. The quadratic behaviour occurs only in pathological cases, which are probably very rare in practice.
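For readers unfamiliar with the term, "submatch addressing" means reporting where each parenthesized subexpression matched, not just the match as a whole. The short sketch below illustrates the idea with Python's standard re module. This is not TRE and not its algorithm (Python's engine is a backtracking matcher with different complexity guarantees); the pattern and sample text are made up for illustration.

```python
import re

# Illustration only: Python's re module, not TRE.
# The pattern has two parenthesized subexpressions (submatches).
pattern = re.compile(r"([a-z]+)=([a-z]+)")
text = "set name=tre now"

match = pattern.search(text)

# span(0) is the whole match; span(1) and span(2) are the submatches,
# each given as (start offset, end offset) into the searched text.
whole = match.span(0)
key = match.span(1)
value = match.span(2)

print(whole, key, value)
print(text[key[0]:key[1]], text[value[0]:value[1]])
```

A POSIX-style matcher reports the same kind of information as an array of start and end offsets, one pair per subexpression.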
Here are some key features of "TRE":
TRE is not just yet another regexp matcher. TRE has some features which are not there in most free POSIX compatible implementations. Most of these features are not present in non-free implementations either, for that matter.
Approximate matching
Approximate pattern matching allows matches to be approximate, that is, allows the matches to be close to the searched pattern under some measure of closeness. TRE uses the edit-distance measure (also known as the Levenshtein distance), where characters can be inserted, deleted, or substituted in the searched text in order to get an exact match. Each insertion, deletion, or substitution adds to the distance, or cost, of the match. TRE can report the matches which have a cost lower than some given threshold value. TRE can also be used to search for matches with the lowest cost.
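The edit-distance measure itself is easy to sketch. The Python below is a textbook dynamic-programming Levenshtein distance between two plain strings, shown only to make the cost idea concrete; it is not TRE's matcher, which works on full regexps. The function names edit_distance and closest are hypothetical helpers invented for this example.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    # prev[j] holds the distance between the prefix of a processed so far
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i characters into the empty string: i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def closest(pattern, candidates):
    """Return the candidate with the lowest edit distance to pattern."""
    return min(candidates, key=lambda c: edit_distance(pattern, c))


print(edit_distance("kitten", "sitting"))
print(closest("regexp", ["regex", "regular", "grep"]))
```

Comparing the distance against a fixed limit gives threshold-style matching, while taking the minimum over candidates gives lowest-cost matching.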
TRE includes a version of the agrep (approximate grep) command line tool for approximate regexp matching in the style of grep. Unlike other agrep implementations (such as the one by Sun Wu and Udi Manber from the University of Arizona), TRE agrep allows full regexps of any length, any number of errors, and non-uniform costs for insertion, deletion, and substitution.
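To show what non-uniform costs change in practice, here is a rough Python sketch of an agrep-like line filter. It handles only literal patterns (no regexps), so it is far simpler than TRE agrep, and it uses a standard approximate substring search rather than TRE's algorithm. The names best_match_cost, approx_grep, cost_ins, cost_del, cost_subst, and max_cost are assumptions made for this example, as is the choice to accept lines whose cost is at most max_cost.

```python
def best_match_cost(pattern, line, cost_ins=1, cost_del=1, cost_subst=1):
    """Lowest cost of matching the literal pattern against any substring
    of line. Insertion: an extra character in the line. Deletion: a
    pattern character missing from the line."""
    # Row 0 is all zeros because a match may start anywhere in the line.
    prev = [0] * (len(line) + 1)
    for i, pc in enumerate(pattern, start=1):
        # No line characters consumed: all i pattern characters are missing.
        curr = [i * cost_del]
        for j, lc in enumerate(line, start=1):
            curr.append(min(
                prev[j] + cost_del,      # pattern character missing
                curr[j - 1] + cost_ins,  # extra character in the line
                prev[j - 1] + (0 if pc == lc else cost_subst),
            ))
        prev = curr
    # A match may also end anywhere, so take the cheapest end position.
    return min(prev)


def approx_grep(pattern, lines, max_cost, **costs):
    """Keep the lines containing a match whose cost is at most max_cost."""
    return [ln for ln in lines
            if best_match_cost(pattern, ln, **costs) <= max_cost]


lines = ["approximate matching", "aproximate matchng", "exact only"]
print(approx_grep("matching", lines, max_cost=1))
print(approx_grep("matching", lines, max_cost=2, cost_del=3))
```

Raising cost_del makes a missing character more expensive than other errors, so the misspelled line passes the first filter but not the second.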
What's New in This Release:
A Swedish translation has been added.
Documentation has been updated.
The -q command line option has been added.
A number of bugs have been fixed.