Uplug 0.2.0
Uplug is a collection of tools for linguistic corpus processing, word alignment, and term extraction from parallel corpora. Several tools have been integrated in Uplug.
Pre-processing tools include a sentence splitter, a tokenizer, and external part-of-speech taggers and shallow parsers. The following external tools are used: the Grok system for English (tagging and chunking) and the morphological analyzer ChaSen for Japanese.
Other tools, such as the TreeTagger, can easily be added. Translated documents can be sentence-aligned using the length-based approach by Gale & Church. Words and phrases can be aligned using the clue alignment approach and GIZA++, a toolbox for training statistical alignment models.
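To make the length-based idea concrete, here is a minimal Python sketch in the spirit of Gale & Church. It is not Uplug's implementation. The function names (match_cost, align) are hypothetical, and the prior probabilities and variance are the values commonly quoted for the original method, so treat them as assumptions. Each candidate pairing is scored by how well the character lengths of the two sides match, and dynamic programming picks the cheapest sequence of 1-1, 1-0, 0-1, 2-1, 1-2 and 2-2 pairings.

```python
import math

# Prior probability of each alignment type (source sentences, target sentences).
# Values as commonly quoted for Gale & Church; treat them as assumptions.
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 1.0   # assumed expected number of target characters per source character
S2 = 6.8  # assumed variance of that ratio


def match_cost(len1, len2, prior):
    """Negative log-probability of pairing len1 source chars with len2 target chars."""
    if len1 == 0 and len2 == 0:
        return -math.log(prior)
    mean = (len1 + len2 / C) / 2.0
    delta = (len2 - len1 * C) / math.sqrt(mean * S2)
    # Two-tailed probability of a deviation at least this large (normal model).
    p = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(p, 1e-300) * prior)  # guard against underflow to zero


def align(src, tgt):
    """Return a list of (source sentences, target sentences) pairings."""
    n, m = len(src), len(tgt)
    sl = [len(s) for s in src]
    tl = [len(t) for t in tgt]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward dynamic programming over sentence prefixes.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + match_cost(sum(sl[i:ni]), sum(tl[j:nj]), prior)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow the back-pointers from the end to recover the pairings.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

For example, align(["Hello world.", "How are you today?"], ["Hallo Welt.", "Wie geht es dir heute?"]) pairs the sentences one to one. Because only lengths are compared, no dictionary or language-specific resource is needed.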
What's New in This Release:
User management for the Web-based alignment interfaces (ICA & ISA) using simple password protection and user-specific data storage.
Two new sentence aligners integrated into Uplug: hunalign and GMA.
Another sentence alignment approach: uplugalign (length-based sentence alignment with cognate/dictionary filters).
Quickstart documentation for the new features.