minpair 0.5 review

by rbytes.net on

License: GPL (GNU General Public License)
File size: 132K
Developer: Bill Poser

minpair generates a complete list of minimal pairs (words differing in exactly one segment) from a list of words. The input should consist of one entry per line, encoded in UTF-8 Unicode. By default, each entry consists of two fields separated by a tab. The first field is the word. The second field is an identifier, typically a gloss or record number.
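To make the input format concrete, here is a minimal Python sketch of how such a word list could be read. It is not minpair's own code (minpair is a separate program); the function name `parse_entries` and the handling of blank lines and missing identifiers are assumptions made for illustration.

```python
def parse_entries(lines):
    """Split each input line into (word, identifier).

    The identifier is returned as an empty string when a line has no tab.
    Skipping blank lines is an assumption, not documented minpair behaviour.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # partition splits at the first tab only; any later tabs stay in the identifier
        word, _, ident = line.partition("\t")
        entries.append((word, ident))
    return entries


# Hypothetical sample data: two entries with identifiers, one without
sample = ["pat\tgloss1\n", "bat\t17\n", "mat\n"]
entries = parse_entries(sample)
```

When reading from a real file, opening it with `encoding="utf-8"` matches the encoding the description calls for.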

The output lists the two segments that contrast in the minimal pair, then the two words, each followed by its identifier if one was supplied, and finally the context of the difference, with a marker (by default an underscore) at the site of the difference. The differing segments are listed in a fixed order (that of their character codes) so that all tokens of the same pair will sort together.
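The record described above can be sketched in Python as follows. This is an illustrative reconstruction, not minpair's implementation: the names `substitution_pair` and `minimal_pairs` are hypothetical, the brute-force comparison of every pair is chosen for clarity rather than speed, and reordering the two words along with the segments is an assumption.

```python
from itertools import combinations


def substitution_pair(a, b, marker="_"):
    """Return (seg1, seg2, word1, word2, context) if a and b have the same
    length and differ in exactly one position, otherwise None."""
    if len(a) != len(b):
        return None
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    # List the segments in code-point order so that every token of the
    # same contrast sorts together (the words are swapped too: an assumption).
    if a[i] > b[i]:
        a, b = b, a
    # The context is the shared material with the marker at the difference site
    context = a[:i] + marker + a[i + 1:]
    return (a[i], b[i], a, b, context)


def minimal_pairs(words):
    """Yield one record per minimal pair found by comparing every pair of words."""
    for a, b in combinations(words, 2):
        record = substitution_pair(a, b)
        if record is not None:
            yield record


pairs = list(minimal_pairs(["pat", "bat", "bad"]))
```

Here "pat" and "bat" contrast b and p in the context `_at`, "bat" and "bad" contrast d and t in the context `ba_`, and "pat" and "bad" are not reported because they differ in two positions.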

By default minpair searches only for pairs of words of the same length differing in exactly one segment. Command line options allow the addition of single insertions/deletions and single transpositions.
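The two optional pair types can be illustrated with a short Python sketch. The function names are hypothetical, and treating a transposition as a swap of two adjacent segments is an assumption; the description does not say how minpair defines it.

```python
def is_single_indel(a, b):
    """True if one word is the other with exactly one segment inserted or deleted."""
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # Try deleting each position of the longer word in turn
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def is_single_transposition(a, b):
    """True if the words differ only by swapping two adjacent segments
    (adjacency is an assumption made for this sketch)."""
    if len(a) != len(b):
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )
```

For example, "pat" and "spat" count as a single insertion, while "ask" and "aks" count as a single transposition.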

In order to find all minimal pairs it is normally necessary for the input notation to use one character for each segment. Even in IPA transcription, this is often not the case. minpair provides for this situation by accepting definitions of multigraphs. For instance, if you put the sequences p', t', and k', representing glottalized /p/, /t/, and /k/, in the multigraph definition file, minpair will treat them as single segments. The multigraph definition file should consist of the character sequences that are to be treated as single segments, one per line. Like all other input, this file should be encoded in UTF-8 Unicode. Sequences declared as multigraphs are compressed to a single UTF-32 codepoint so that they will compare as single segments, then decompressed on output.
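The compress-then-decompress idea can be sketched in Python as below. This is only an illustration of the technique: mapping multigraphs into the Private Use Area starting at U+E000, matching longer multigraphs first, and all function names are assumptions, not details taken from minpair.

```python
PUA_START = 0xE000  # first Private Use Area code point (an assumed choice)


def build_tables(multigraphs):
    """Assign one private-use character to each declared multigraph."""
    encode = {mg: chr(PUA_START + i) for i, mg in enumerate(multigraphs)}
    decode = {ch: mg for mg, ch in encode.items()}
    return encode, decode


def compress(word, encode):
    """Replace every multigraph in word with its single-character stand-in."""
    # Longer multigraphs are tried first so that a longer match wins
    keys = sorted(encode, key=len, reverse=True)
    out = []
    i = 0
    while i < len(word):
        for mg in keys:
            if word.startswith(mg, i):
                out.append(encode[mg])
                i += len(mg)
                break
        else:
            # No multigraph starts here: keep the character unchanged
            out.append(word[i])
            i += 1
    return "".join(out)


def decompress(word, decode):
    """Expand stand-in characters back into their multigraphs for output."""
    return "".join(decode.get(ch, ch) for ch in word)


# The glottalized stops from the example above, one multigraph per entry
encode, decode = build_tables(["p'", "t'", "k'"])
compressed = compress("t'ap'a", encode)
```

After compression "t'ap'a" is four segments long, so it can be compared position by position with other four-segment words such as "tapa".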

The basic program has a command-line interface. mpg provides an optional graphical interface. mpg will also arrange for the output of minpair to be sorted if a suitable sort utility is available. Standard sort utilities like Unix sort will do, but if the data contains multigraphs, the best results will be obtained using msort since it can read and use the same multigraph definitions as does minpair.

It is also possible to use mpg without minpair. mpg can find minimal pairs involving substitutions but currently cannot handle indels and transpositions. mpg is slower than minpair but fast enough to be usable with lists of a few thousand words. mpg can also find pairs of words that differ in two positions, which minpair cannot do. This is useful when looking for phonological rules. The maximum distance between the two positions may be specified.
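The two-position search can be pictured with a small Python sketch. It is not mpg's code (mpg is a separate graphical program); the function name, the `max_distance` parameter, and measuring distance as the difference between the two positions are assumptions.

```python
def differs_in_two(a, b, max_distance=None):
    """Return the two differing positions if a and b have the same length and
    differ in exactly two places no further apart than max_distance, else None."""
    if len(a) != len(b):
        return None
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) != 2:
        return None
    if max_distance is not None and diffs[1] - diffs[0] > max_distance:
        return None
    return (diffs[0], diffs[1])
```

With the hypothetical words "pata" and "bati", the differences fall at positions 0 and 3, so the pair is reported with no limit but rejected when `max_distance` is 2.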

Requirements:
msort

What's New in This Release:
GNU autoconfiguration is now available.
It is now possible to run mpg without minpair. mpg can find minimal pairs involving substitutions but not indels or transpositions. mpg is slower than minpair but fast enough to be tolerable for lists of several thousand words.
mpg can find pairs of words differing in exactly two positions. This is useful in looking for phonological rules.
Improved codepoint validation in the popup for entering characters by Unicode codepoint. The message window is now cleared at the beginning of each attempt to insert a character. The popup now also has a title.
Corrected error in accented letter chart in mpg that had an erroneous value for i with double grave.
Updated font control panel to new version that provides color control.
Scrollbars now scroll by a large increment if the right mouse button is used in mpg.
Added list of Tcl commands available in init file to the help menu of mpg.
Added Save Configuration command to mpg.
Made a number of changes in the system for defining custom character insertion charts. There are now two commands available in init files: ReadCharacterChart, which takes a filename as argument and reads the chart from that file, and DefineCharacterChart, which takes an in-place Tcl list as argument. Character chart specifications now require u immediately preceding the hex codepoints.
