JBootCat 0.2
JBootCat is a Java implementation of the BootCat scripts written by Marco Baroni et al. for generating corpora from the Internet. Its main goal is to encapsulate the BootCat functionality within a user-friendly desktop application.
The advantage of using the Java platform is that JBootCat can be run easily on most major operating systems.
Here are some key features of "JBootCat":
Step-by-step "wizard" interface - review each step of the process
Enter "seeds" directly or load them from a file (and save them to a file for future use).
Generate "tuples" directly or load them from a file (and save them to a file for future use).
Queries Google's massive online index to obtain relevant web pages (only HTML pages supported at the moment).
HTML cleanser and advanced tokeniser (courtesy of jTokeniser).
URL review
Selected URLs are downloaded to a text file (using BootCat's "Raw" format) and saved as UTF-8.
Multi-platform - runs on any computer with Java installed.
Free and Open Source (LGPL)
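The seed-to-tuple step in the feature list can be pictured with a short Java sketch. In the BootCat approach, tuples are random combinations of seed terms that are then sent to the search engine as queries. The class and method names below (TupleSketch, randomTuples, toQuery) and the sample seeds are hypothetical and are not taken from JBootCat's source; this is only an illustration under those assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch: not JBootCat's actual code.
public class TupleSketch {

    /**
     * Draws up to `count` distinct tuples, each made of `size` different seeds.
     * Fewer tuples are returned if not enough distinct combinations are found.
     */
    public static List<List<String>> randomTuples(List<String> seeds, int size,
                                                  int count, long randomSeed) {
        // Drop duplicate seeds so a tuple never repeats a term.
        List<String> unique = new ArrayList<>(new LinkedHashSet<>(seeds));
        if (size < 1 || size > unique.size()) {
            throw new IllegalArgumentException("tuple size must be between 1 and the number of seeds");
        }
        Random rng = new Random(randomSeed);
        Set<List<String>> tuples = new LinkedHashSet<>();
        int maxAttempts = count * 100; // guard against asking for more tuples than exist
        for (int attempt = 0; tuples.size() < count && attempt < maxAttempts; attempt++) {
            List<String> shuffled = new ArrayList<>(unique);
            Collections.shuffle(shuffled, rng);
            List<String> tuple = new ArrayList<>(shuffled.subList(0, size));
            Collections.sort(tuple); // canonical order: {a, b} and {b, a} count as one tuple
            tuples.add(tuple);
        }
        return new ArrayList<>(tuples);
    }

    /** Joins a tuple into a query string, quoting multi-word seeds. */
    public static String toQuery(List<String> tuple) {
        StringBuilder query = new StringBuilder();
        for (String term : tuple) {
            if (query.length() > 0) {
                query.append(' ');
            }
            boolean multiWord = term.trim().contains(" ");
            query.append(multiWord ? "\"" + term + "\"" : term);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Sample seeds chosen only for illustration.
        List<String> seeds = Arrays.asList(
                "corpus", "tokeniser", "concordance", "lemma", "word list");
        for (List<String> tuple : randomTuples(seeds, 3, 4, 42L)) {
            System.out.println(toQuery(tuple));
        }
    }
}
```

Sorting each tuple before storing it is a design choice of this sketch: it keeps two orderings of the same seeds from being submitted as separate queries.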
What's New in This Release:
This version contains the core functionality for searching Google for relevant pages and then downloading, filtering, and tokenising.
Download JBootCat 0.2
http://www.andy-roberts.net/software/jbootcat/releases/0.2/jbootcat-0.2.zip
http://www.andy-roberts.net/software/jbootcat/releases/0.2/jbootcat-0.2_src.zip
Author's software
Jacman 0.4 (by Andy Roberts)
Jacman is a frontend for the pacman software management tool that comes with the equally excellent ArchLinux
jTokeniser 2.0 (by Andy Roberts)
jTokeniser is a Java library for tokenising strings into a list of tokens.
Similar software
Forrest 0.7 (by Forrest Community)
Apache Forrest is a publishing framework that transforms input from various sources into a unified presentation in one or more output
Java Image Album 1.0 (by Mirko)
Java Image Album is a free, open source, easy-to-use, wizard-style Java application that generates HTML photo albums.
jTokeniser 2.0 (by Andy Roberts)
jTokeniser is a Java library for tokenising strings into a list of tokens.