Poliqarp 1.0
Poliqarp is a utility for searching large corpora.
Key features of Poliqarp:
Support for tagged corpora
The searched collection can contain not only raw text, but also information about the words and texts that constitute it (grammatical forms of words; structure of the texts; various meta-information about the texts such as authorship and date of writing).
Expressive query language
Poliqarp's query language is based on regular expressions and allows you to search not only for a given word or sequences of words, but also, for example, for:
an adjective followed by a noun
five nouns in a row
five, six, or seven nouns in a row
a given word occurring close, but not necessarily next, to another given word
words starting with 'z' that occur in texts published in the 19th century
sentences longer than 100 words
...and many more
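The idea of applying regular expressions to sequences of tagged words, rather than to characters, can be illustrated with a minimal Python sketch. This is not Poliqarp's query syntax or implementation: the toy corpus, the tag names, the single-letter codes and the `find_tag_pattern` helper are all assumptions made for illustration.

```python
import re

# Hypothetical toy corpus of (word, part-of-speech) pairs. The tag names are
# illustrative and are not taken from any real Poliqarp tagset.
CORPUS = [
    ("the", "det"), ("quick", "adj"), ("fox", "noun"),
    ("saw", "verb"), ("old", "adj"), ("stone", "noun"),
    ("garden", "noun"), ("walls", "noun"),
]

# Map each tag to one character, so that a token sequence becomes a string
# that an ordinary regular expression can scan.
TAG_CODES = {"det": "D", "adj": "A", "noun": "N", "verb": "V"}


def find_tag_pattern(corpus, pattern):
    """Return the word sequences whose tag codes match the regex `pattern`."""
    encoded = "".join(TAG_CODES[tag] for _, tag in corpus)
    return [
        [word for word, _ in corpus[m.start():m.end()]]
        for m in re.finditer(pattern, encoded)
    ]


# An adjective followed by a noun.
adj_noun = find_tag_pattern(CORPUS, r"AN")

# Two or three nouns in a row (the analogue of "five, six, or seven nouns").
noun_runs = find_tag_pattern(CORPUS, r"N{2,3}")
```

Because every token maps to exactly one character, match offsets in the encoded string are also token offsets, which is what lets the sketch slice the corpus directly.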
Support for positional tagsets
The tags assigned to words can have an internal structure, and this structure may be incorporated in queries. For instance, nouns might have gender, number or case, verbs might have aspect, and so on.
This is especially useful with languages that are rich in inflection, such as Polish. In fact, Poliqarp was originally developed for, and is used within, a Polish corpus project: the IPI PAN Corpus.
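As a rough illustration of what a structured (positional) tag makes possible, the following Python sketch splits a tag into named fields and filters words by individual attributes. The colon-separated layout, the field order, the attribute values and the helper names are assumptions chosen for the example, not the format of any particular tagset; the sketch also assumes that every tag uses the same field order.

```python
# Hypothetical positional layout: class:number:case:gender.
FIELDS = ("pos", "number", "case", "gender")


def parse_tag(tag):
    """Split a colon-separated tag into a dict of named attributes."""
    return dict(zip(FIELDS, tag.split(":")))


def tag_matches(tag, **constraints):
    """True if every requested attribute has the requested value."""
    parsed = parse_tag(tag)
    return all(parsed.get(name) == value for name, value in constraints.items())


# Illustrative tokens only; the tags are written by hand for this example.
TOKENS = [
    ("domu", "noun:sg:gen:m"),
    ("kot", "noun:sg:nom:m"),
    ("nowej", "adj:sg:gen:f"),
]

# Select nouns in the genitive case, regardless of number or gender.
genitive_nouns = [
    word for word, tag in TOKENS if tag_matches(tag, pos="noun", case="gen")
]
```

A query can then constrain any subset of the attributes, which is the property the text above describes.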
Does not depend on a particular tagset
Support for Unicode
You can create corpora of texts written in almost any language in its native script (English, Polish, Japanese, Thai and so on), as long as they are encoded in UTF-8.
Support for ambiguities
Tags of a word are not necessarily unique: a word can sometimes be interpreted in several ways and thus have several tags assigned to it. Poliqarp handles such cases and lets you specify whether your query must match any of the possible interpretations or all of them. Few, if any, other concordancers have this ability.
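The difference between matching any interpretation and matching all of them can be sketched in a few lines of Python. The function names, the tag strings and the example word are hypothetical and only illustrate the two modes; they say nothing about how Poliqarp implements them.

```python
def matches_any(interpretations, predicate):
    """True if at least one interpretation satisfies the predicate."""
    return any(predicate(tag) for tag in interpretations)


def matches_all(interpretations, predicate):
    """True if the word has interpretations and every one satisfies the predicate."""
    return bool(interpretations) and all(predicate(tag) for tag in interpretations)


def is_noun(tag):
    # Assumes the word class is the first colon-separated field of the tag.
    return tag.split(":")[0] == "noun"


# English "saw" is ambiguous between a noun and a past-tense verb.
saw = ["noun:sg", "verb:past"]
# "fox" is treated as unambiguous in this toy example.
fox = ["noun:sg"]

saw_any = matches_any(saw, is_noun)
saw_all = matches_all(saw, is_noun)
fox_all = matches_all(fox, is_noun)
```

Here the ambiguous word satisfies the "any" mode but not the "all" mode, while the unambiguous word satisfies both. The explicit emptiness check avoids treating a word with no interpretations as a match in the "all" mode.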
Multi-platform
Poliqarp is written in Java and portable C, and is thus available for Windows and most Unix-like systems, including Linux, *BSD and Solaris. Currently, it supports only little-endian architectures, but work is underway to make it endian-neutral.
Efficient
It is hard to estimate the average time of searching a corpus, since it depends heavily on the structure of the query. However, simple queries (for a word or phrase) take a few seconds even on corpora containing more than a hundred million words (in terms of raw texts, that's several gigabytes including tags and metadata). More complex queries take longer to execute, but even then you get the results as soon as they are found, so you don't have to wait long.
Free
Poliqarp is free/open source software, available under the terms of the GNU General Public License.
Requirements:
Java 1.5
Download Poliqarp 1.0
http://prdownloads.sourceforge.net/poliqarp/poliqarp-1.0.tar.bz2?use_mirror=kent
http://prdownloads.sourceforge.net/poliqarp/poliqarp-1.0.tar.bz2?use_mirror=switch
http://prdownloads.sourceforge.net/poliqarp/poliqarp-1.0.tar.bz2?use_mirror=superb
Author:
Daniel Janus