
JBootCat 0.2


JBootCat is a Java implementation of the BootCat scripts written by Marco Baroni et al. for generating corpora from the Internet.
Developer:   Andy Roberts
Price:  0.00
License:   LGPL (GNU Lesser General Public License)
File size:   989K


JBootCat is a Java implementation of the BootCat scripts written by Marco Baroni et al. for generating corpora from the Internet. JBootCat's main goal is to encapsulate the BootCat functionality within a user-friendly desktop application.

The advantage of using the Java platform is that JBootCat can be run easily on most major operating systems.

Here are some key features of "JBootCat":
  • Step-by-step "wizard" interface: review each step of the process.
  • Enter "seeds" directly or load them from a file (and save them to a file for future use).
  • Generate "tuples" directly or load them from a file (and save them to a file for future use).
  • Queries Google's massive online index to obtain relevant web pages (only HTML pages are supported at the moment).
  • HTML cleanser and advanced tokeniser (courtesy of jTokeniser).
  • URL review.
  • Selected URLs are downloaded to a text file (using BootCat's "Raw" format) and saved as UTF-8.
  • Multi-platform: runs on any computer with Java installed.
  • Free and Open Source (LGPL).
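
The "seeds" and "tuples" steps in the list above can be illustrated with a rough sketch. This is not JBootCat's actual code: the class name TupleSketch and the methods randomTuples, binomial, and toQuery are hypothetical, and the sketch assumes that a tuple is a random combination of distinct seeds that is later joined into one search query string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration only; not taken from the JBootCat source.
public class TupleSketch {

    /** Returns up to `count` distinct tuples, each containing `size` different seeds. */
    static List<List<String>> randomTuples(List<String> seeds, int size, int count, long randomSeed) {
        // Drop duplicate seeds so one tuple never repeats a term.
        List<String> pool = new ArrayList<>(new LinkedHashSet<>(seeds));
        if (size < 1 || size > pool.size()) {
            throw new IllegalArgumentException("tuple size must be between 1 and the number of distinct seeds");
        }
        Random random = new Random(randomSeed);
        Set<List<String>> tuples = new LinkedHashSet<>();
        // Never ask for more tuples than there are possible combinations.
        long maxDistinct = binomial(pool.size(), size);
        while (tuples.size() < count && tuples.size() < maxDistinct) {
            Collections.shuffle(pool, random);
            List<String> tuple = new ArrayList<>(pool.subList(0, size));
            Collections.sort(tuple); // canonical order, so {a, b} and {b, a} count once
            tuples.add(tuple);
        }
        return new ArrayList<>(tuples);
    }

    /** Number of ways to choose k items from n, saturating at Long.MAX_VALUE. */
    static long binomial(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            long factor = n - k + i;
            if (result > Long.MAX_VALUE / factor) {
                return Long.MAX_VALUE;
            }
            result = result * factor / i;
        }
        return result;
    }

    /** Joins a tuple into one query string, quoting multi-word seeds. */
    static String toQuery(List<String> tuple) {
        StringBuilder query = new StringBuilder();
        for (String seed : tuple) {
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append(seed.contains(" ") ? "\"" + seed + "\"" : seed);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        List<String> seeds = Arrays.asList("corpus", "tokeniser", "web crawl", "linguistics");
        // Print one query string per generated tuple.
        for (List<String> tuple : randomTuples(seeds, 2, 3, 42L)) {
            System.out.println(toQuery(tuple));
        }
    }
}
```

Sorting each tuple before storing it is a design choice made for this sketch, so that the same combination of seeds is never sent to the search engine twice; whether JBootCat does the same is not stated here.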

    What's New in This Release:
  • This version contains the core functionality for searching Google for relevant pages and then downloading, filtering, and tokenising.

    Download JBootCat 0.2


     http://www.andy-roberts.net/software/jbootcat/releases/0.2/jbootcat-0.2.zip
     http://www.andy-roberts.net/software/jbootcat/releases/0.2/jbootcat-0.2_src.zip


