Poliqarp 1.0


Developer:   Daniel Janus
Price:  0.00
License:   GPL (GNU General Public License)
File size:   0K


Poliqarp is a utility for searching large corpora.

Here are some key features of "Poliqarp":

    Support for tagged corpora

  • The searched collection can contain not only raw text, but also information about the words and texts that constitute it (grammatical forms of words; structure of the texts; various meta-information about the texts such as authorship and date of writing).
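To illustrate what such a tagged corpus might hold, here is a minimal Python sketch of one possible in-memory model. Every token carries its word form, lemma and grammatical tag, and every text carries a dictionary of meta-information. The class names, field names and the "year" metadata key are hypothetical and were chosen for this example. Poliqarp's own storage format is different and is not described here. The helper answers one of the example queries from the query-language feature (words starting with 'z' in 19th-century texts) by brute force.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    orth: str  # word form as it appears in the text
    base: str  # lemma (dictionary form)
    tag: str   # grammatical tag; the colon-separated layout is an assumption


@dataclass
class Text:
    meta: dict  # meta-information, e.g. author and (hypothetical) "year" key
    sentences: list = field(default_factory=list)  # list of lists of Token


def words_starting_with(texts, prefix, year_from, year_to):
    """Yield word forms starting with `prefix` from texts whose 'year'
    metadata field lies in [year_from, year_to]."""
    for text in texts:
        year = int(text.meta.get("year", 0))
        if not (year_from <= year <= year_to):
            continue  # metadata filter: skip texts outside the period
        for sentence in text.sentences:
            for token in sentence:
                if token.orth.lower().startswith(prefix):
                    yield token.orth


corpus = [
    Text({"author": "A", "year": "1850"},
         [[Token("Zima", "zima", "subst:sg:nom:f"),
           Token("przyszła", "przyjść", "praet:sg:f:perf")]]),
    Text({"author": "B", "year": "1950"},
         [[Token("Zamek", "zamek", "subst:sg:nom:m3")]]),
]

print(list(words_starting_with(corpus, "z", 1801, 1900)))  # ['Zima']
```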

    Expressive query language

  • Poliqarp's query language is based on regular expressions and lets you search not only for a given word or sequence of words, but also, for example, for:
  • an adjective followed by a noun
  • five nouns in a row
  • five, six, or seven nouns in a row
  • a given word occurring close, but not necessarily next, to another given word
  • words starting with 'z' that occur in texts published in the 19th century
  • sentences longer than 100 words
  • ...and many more
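The core idea behind a regular-expression query language over a tagged corpus can be sketched in a few lines of Python. Each token's grammatical class is encoded as one character, and an ordinary regular expression is then run over that string. This is not Poliqarp's query syntax or implementation. The one-letter codes, the function names and the Polish example tags are assumptions made for the illustration. Under this encoding, "AN" finds an adjective followed by a noun, "N{5}" five nouns in a row and "N{5,7}" five to seven of them. Proximity of two given words is checked separately by the hypothetical near() helper.

```python
import re

# Hypothetical one-letter codes: each token becomes exactly one character,
# so character offsets in the encoded string equal token offsets.
CODES = {"adj": "A", "subst": "N"}


def encode(tags):
    """Encode tokens by grammatical class (the part before the first ':')."""
    return "".join(CODES.get(tag.split(":")[0], "x") for tag in tags)


def find(pattern, words, tags):
    """Return the word sequences whose class codes match a regex `pattern`."""
    encoded = encode(tags)
    return [words[m.start():m.end()] for m in re.finditer(pattern, encoded)]


def near(words, first, second, max_gap):
    """True if `second` follows `first` with at most `max_gap` words between."""
    for i, word in enumerate(words):
        if word == first and second in words[i + 1:i + 2 + max_gap]:
            return True
    return False


words = ["stary", "dom", "stoi", "przy", "drodze"]
tags = ["adj:sg:nom:m3:pos", "subst:sg:nom:m3", "fin:sg:ter:imperf",
        "prep:loc", "subst:sg:loc:f"]

print(find("AN", words, tags))          # [['stary', 'dom']]
print(find("N{5,7}", words, tags))      # [] (no run of 5-7 nouns here)
print(near(words, "dom", "drodze", 2))  # True
```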

    Support for positional tagsets

  • The tags assigned to words can have an internal structure, and this structure can be used in queries. For instance, nouns may carry gender, number and case, verbs may carry aspect, and so on.
  • This is especially useful for highly inflected languages such as Polish (in fact, Poliqarp was originally developed for, and is used within, a Polish corpus project: the IPI PAN Corpus).
  • Poliqarp does not depend on a particular tagset.
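A positional tag can be read by splitting it into fields and naming each field according to the token's grammatical class. The sketch below assumes colon-separated tags and a small hand-written LAYOUT table. Both the table and the attribute names are hypothetical and are not Poliqarp's tagset configuration. Keeping the layout as data rather than as code is one way a tool can avoid depending on a particular tagset.

```python
# Hypothetical attribute layout per grammatical class. Keeping it as data means
# a different tagset only needs a different table, not different code.
LAYOUT = {
    "subst": ["number", "case", "gender"],
    "adj": ["number", "case", "gender", "degree"],
}


def parse_tag(tag):
    """Split a colon-separated positional tag into named attributes."""
    pos, *values = tag.split(":")
    attrs = dict(zip(LAYOUT.get(pos, []), values))
    attrs["pos"] = pos
    return attrs


def matches(tag, **wanted):
    """True if every requested attribute has the requested value."""
    attrs = parse_tag(tag)
    return all(attrs.get(name) == value for name, value in wanted.items())


print(parse_tag("subst:sg:gen:f"))
# {'number': 'sg', 'case': 'gen', 'gender': 'f', 'pos': 'subst'}
print(matches("subst:sg:gen:f", pos="subst", case="gen"))  # True
print(matches("subst:pl:nom:m1", case="gen"))              # False
```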

    Support for Unicode
  • You can create corpora of texts written in almost any language in its native script — be it English, Polish, Japanese or Thai — as long as they are encoded in the UTF-8 format.

    Support for ambiguities

  • Tags of a word are not necessarily unique: a word can sometimes be interpreted in several ways (and thus have several tags assigned to it). Poliqarp handles such situations and lets you specify whether your query must match any of the possible interpretations or all of them. Few, if any, other concordancers have this ability.
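The difference between "any" and "all" interpretations can be shown with two Python predicates over a token's list of candidate tags. The function names and tag strings are illustrative assumptions, not Poliqarp's query syntax. The Polish form "kota" (from "kot", cat) can be genitive or accusative singular. It therefore matches "genitive" under any-semantics but not under all-semantics, while "is a noun" holds under both.

```python
def match_any(interpretations, predicate):
    """Match if at least one interpretation of the token satisfies predicate."""
    return any(predicate(tag) for tag in interpretations)


def match_all(interpretations, predicate):
    """Match only if every interpretation of the token satisfies predicate."""
    return all(predicate(tag) for tag in interpretations)


# "kota" is genitive or accusative singular; the form alone does not say
# which, so both candidate tags are kept.
kota = ["subst:sg:gen:m2", "subst:sg:acc:m2"]


def is_genitive(tag):
    return tag.split(":")[2] == "gen"


def is_noun(tag):
    return tag.split(":")[0] == "subst"


print(match_any(kota, is_genitive), match_all(kota, is_genitive))  # True False
print(match_all(kota, is_noun))                                    # True
```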

    Multi-platform

  • Poliqarp is written in Java and portable C, and is thus available for Windows and most Unix-like systems, including Linux, *BSD and Solaris. Currently, it supports only little-endian architectures, but work is underway to make it endian-neutral.
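Endian-neutrality usually means fixing the byte order in the file format itself instead of using whatever order the host CPU happens to have. Here is a minimal Python sketch using the standard struct module. It assumes a hypothetical index record of two unsigned 32-bit integers, since Poliqarp's actual file layout is not described on this page. The "<" prefix always writes little-endian bytes, whereas native order ("@" or "=") would produce different bytes on a big-endian machine.

```python
import struct

# Hypothetical index record: (document id, token offset) as two unsigned
# 32-bit integers. '<' fixes little-endian byte order and standard sizes,
# so the bytes are identical on little- and big-endian hosts.
RECORD = struct.Struct("<II")


def pack_record(doc_id, offset):
    return RECORD.pack(doc_id, offset)


def unpack_record(data):
    return RECORD.unpack(data)


raw = pack_record(1, 258)
print(raw.hex())           # 0100000002010000
print(unpack_record(raw))  # (1, 258)
```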

    Efficient

  • It is hard to estimate the average time needed to search a corpus, since it depends heavily on the structure of the query. However, simple queries (for a word or phrase) take a few seconds even on corpora of more than a hundred million words (as raw text plus tags and metadata, that is several gigabytes). More complex queries take longer to execute, but even then results are displayed as soon as they are found, so you do not have to wait for the whole search to finish.
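Returning results as soon as they are found amounts to producing matches lazily, which a Python generator expresses directly. The sketch below is a hypothetical brute-force token scan, not Poliqarp's search engine. It shows only that a caller can consume the first hits before the rest of the corpus has been examined.

```python
import itertools


def search(sentences, predicate):
    """Lazily yield (sentence number, token number, word) for each match.

    Because this is a generator, each match reaches the caller as soon as
    it is found instead of after the whole corpus has been scanned."""
    for s_no, sentence in enumerate(sentences):
        for t_no, word in enumerate(sentence):
            if predicate(word):
                yield s_no, t_no, word


def starts_with_k(word):
    return word.startswith("k")


sentences = [["ala", "ma", "kota"], ["kot", "ma", "alę"]]

# Take only the first hit; scanning stops right after it.
print(list(itertools.islice(search(sentences, starts_with_k), 1)))
# [(0, 2, 'kota')]
```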

    Free

  • Poliqarp is free/open source software, available under the terms of the GNU General Public License.

    Requirements:
  • Java 1.5

    Download Poliqarp 1.0


     http://prdownloads.sourceforge.net/poliqarp/poliqarp-1.0.tar.bz2?use_mirror=kent
     http://prdownloads.sourceforge.net/poliqarp/poliqarp-1.0.tar.bz2?use_mirror=switch
     http://prdownloads.sourceforge.net/poliqarp/poliqarp-1.0.tar.bz2?use_mirror=superb



    Similar software

    Full Text for SQLite3 0.2.2 (by Pierre Aubert)
    Full Text for SQLite3 is a full-text indexer for data stored in an SQLite3 database

    Diogenes 1.4.2 (by Peter Heslin)
    Diogenes is a tool for searching and browsing the databases of ancient texts, primarily in Latin and Greek, that are published by the

    Anagramarama 0.2 (by Colm Gallagher)
    Anagramarama is a FREE word game for Linux, Windows and BeOS.

    The aim is to find as many words as possible in the time available

    kdsing 0.3.0 (by Rolf Jakob)
    kdsing is a project that searches in words list (for translations).

    kdsing uses a word list (e.g

    Memorize Words Flashcard System 2.1.1.0 (by Mohammad Reza Alavi)
    Memorize word Flashcard System is a Leitner Flashcard English learning tool.

    Memorize Words Flashcard System was originally based

    Uplug 0.2.0 (by Joerg Tiedemann)
    Uplug is a collection of tools for linguistic corpus processing, word alignment, and term extraction from parallel corpora

    The Comprehensive Danish Dictionary 1.6.0 (by Skåne Sjælland Linux User Group)
    The Comprehensive Danish Dictionary is a word list for spell checking of Danish texts.

    The Comprehensive Danish Dictionary (DSDO)

    JLearnIt 5.0 (by Anthony Goubard)
    JLearnIt is a multilingual dictionary sorted by categories that helps you learn the vocabulary of another language progressively (eac

    ispell-da 1.6.0 (by Jacob Sparre Andersen)
    ispell-da is an ispell dictionary for spell-checking of Danish texts


    Other software in this category

    HTMLDOC 1.8.27 (by Michael Sweet)
    HTMLDOC converts HTML files and web pages into indexed HTML, PostScript, and PDF files suitable for on-line viewing and printing.

    Auto Directory Index PHP Script 1.5.4 (by Justin Hagstrom)

    harvest 1.9.15 (by kjl)
    Harvest is a system to collect information and make them searchable using a web interface

    SWISH++ 6.1.4 (by Paul J. Lucas)
    SWISH++ is a Unix-based file indexing and searching engine (typically used to index and search files on web sites).

    SWISH++ projec

    PyLucene 2.0 (by Andi Vajda)
    PyLucene is a GCJ-compiled version of Java Lucene integrated with Python via SWIG.

    PyLucene goal is to allow you to use Lucene's t
