
UnicodeConverter 1.3


Developer:   Quan Nguyen
Price:  0.00
License:   GPL (GNU General Public License)


UnicodeConverter is a Java program that converts text and HTML files in ISC, TCVN3 (ABC), VISCII, VNI, and VPS format to Unicode UTF-8. Conversion support for Unicode Composite, Numeric Character References (NCR), and VIQR (Vietnet) is also included. In all cases, the output will be in Unicode Normalization Form C, better known as the Unicode Precomposed format.
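
To make the NCR case concrete, here is a minimal sketch of how numeric character references such as &#7879; or &#x1EC7; might be decoded into Unicode characters. The class and method names are hypothetical and are not taken from the UnicodeConverter source; the program's own implementation may work differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of UnicodeConverter).
final class NcrDecoder {
    // Matches hexadecimal (&#x1EC7;) and decimal (&#7879;) references.
    private static final Pattern NCR =
            Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    static String decode(String input) {
        Matcher m = NCR.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            // References outside the Unicode range are left untouched.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Decodes to "Việt Nam, tiếng Việt"; console display depends on the terminal encoding.
        System.out.println(decode("Vi&#7879;t Nam, ti&#x1EBF;ng Vi&#x1EC7;t"));
    }
}
```

Named entities such as &amp; are deliberately left alone in this sketch; only numeric references are rewritten.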

UnicodeConverter runs in both graphical user interface (GUI) and command-line modes. It can convert multiple files in a directory, or an entire directory including its subdirectories. In effect, this makes it possible to convert an entire website to Unicode UTF-8 format with a single command or a few mouse clicks. Drag-and-drop support is also included.

Support for conversion of Word documents and Excel workbooks on the Windows platform is included. This feature is implemented using JACOB, a Java-COM Bridge that allows clients to call COM Automation components from Java. JACOB uses the Java Native Interface (JNI) to make native calls into the COM and Win32 libraries; consequently, the added functionality is neither portable nor available on other platforms. Conversion support for Rich Text Format files is also provided.

UnicodeConverter is released and distributed under the GNU General Public License. Its homepage is at http://unicodeconvert.sourceforge.net.

SYSTEM REQUIREMENTS

To run UnicodeConverter, you need the Java 2 Runtime Environment, Standard Edition (JRE) 1.4 or later installed on your machine. The JRE can be downloaded free of charge from http://java.sun.com/j2se/. It consists of the Java virtual machine, the Java platform core classes, and the supporting files needed to run applications written in the Java programming language.

On Mac OS X Tiger or Panther, UnicodeConverter runs without additional requirements. For Jaguar 10.2.6 or later, Java 1.4.1 Update 1 can be installed.

To be able to convert Word or Excel documents, you'll need to be on a Windows system with Microsoft Word or Excel installed. Put the file jacob.dll in your path, for example, into the system32 or jre/bin folder.

HOW TO RUN UnicodeConverter

UnicodeConverter is written in Java and packaged as an executable Java archive (JAR). Download and unzip UnicodeConverter-1.3.zip; the file UnicodeConverter.jar is the program to run. To launch it in GUI mode, either double-click the UnicodeConverter.jar file or execute the command uni at the command line. Alternatively, the longer commands

java -jar UnicodeConverter.jar

or (on Windows)

javaw -jar UnicodeConverter.jar

will work, too. The filename is case-sensitive on some operating systems. Be sure the directory that contains the UnicodeConverter.jar file is the current directory.

Note: Make sure Microsoft Word/Excel has no files open when you convert Word/Excel documents. Open files may cause errors or slow down the conversion process.

Tip: Keep the number of text boxes in Word documents to a minimum; having too many will slow down conversion significantly.

You can select a single file, multiple files, or a directory d for conversion. The resulting Unicode output files will be placed in a d_Unicode directory located at the same tree level as the source directory that contains the original files, which remain unchanged. You can also drag files or a directory from the native file manager and drop them onto the application window to start the conversion.
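
The output layout described above can be sketched as a simple path mapping. This is only an illustration: the class and method names are hypothetical, and the sketch assumes that the subdirectory structure of d is mirrored inside d_Unicode, which the description does not state explicitly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only (not part of UnicodeConverter).
final class OutputLayoutSketch {
    // Maps a file under sourceDir to the matching path under the sibling
    // directory "<sourceDir name>_Unicode". Assumes sourceDir is not a root.
    static Path targetFor(Path sourceDir, Path sourceFile) {
        Path outputDir = sourceDir.resolveSibling(sourceDir.getFileName() + "_Unicode");
        return outputDir.resolve(sourceDir.relativize(sourceFile));
    }

    public static void main(String[] args) throws IOException {
        Path sourceDir = Paths.get(args.length > 0 ? args[0] : ".")
                .toAbsolutePath().normalize();
        // Walk the whole tree, subdirectories included, and print where each
        // file would land. Nothing is read, converted, or written here.
        try (Stream<Path> files = Files.walk(sourceDir)) {
            files.filter(Files::isRegularFile)
                 .forEach(f -> System.out.println(f + " -> " + targetFor(sourceDir, f)));
        }
    }
}
```

For a source directory named site, a file site/a/b.html would map to site_Unicode/a/b.html under this assumption, and the original file is never touched.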

The program can also function as a command-line program, which is frequently used in batch file processing:

java -jar UnicodeConverter.jar <SourceEncoding> <SourceFile/Dir> <TargetFile/Dir>

where possible options for source encoding are VNI, VISCII, VPS, VIQR, TCVN3, and UNI-COMP. This functionality works for text-based files only, not Word/Excel documents.
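
To give a feel for what one of these source encodings involves, the following is a rough sketch of VIQR decoding: each VIQR mark that follows a vowel is turned into the corresponding Unicode combining character, and the result is then composed into precomposed form. The class name is hypothetical, the code is not taken from UnicodeConverter, and it ignores details such as backslash escapes, so a sentence-ending period directly after a vowel or a plain "dd" in a non-Vietnamese word would be misread.

```java
import java.text.Normalizer;

// Hypothetical sketch, for illustration only (not part of UnicodeConverter).
final class ViqrSketch {
    static String toUnicode(String viqr) {
        StringBuilder out = new StringBuilder();
        boolean afterVowel = false; // true while marks may attach to the last vowel
        for (int i = 0; i < viqr.length(); i++) {
            char c = viqr.charAt(i);
            // "dd" becomes d-with-stroke; "DD" or "Dd" becomes its capital form.
            if ((c == 'd' || c == 'D') && i + 1 < viqr.length()
                    && (viqr.charAt(i + 1) == 'd' || viqr.charAt(i + 1) == 'D')) {
                out.append(c == 'd' ? '\u0111' : '\u0110');
                i++;
                afterVowel = false;
                continue;
            }
            char mark = afterVowel ? combiningMarkFor(c) : 0;
            if (mark != 0) {
                out.append(mark); // more marks may follow, so afterVowel stays true
            } else {
                out.append(c);
                afterVowel = "aeiouyAEIOUY".indexOf(c) >= 0;
            }
        }
        // Compose base letters and combining marks into precomposed characters.
        return Normalizer.normalize(out, Normalizer.Form.NFC);
    }

    // Returns the combining character for a VIQR mark, or 0 if c is not a mark.
    private static char combiningMarkFor(char c) {
        switch (c) {
            case '\'': return '\u0301'; // acute
            case '`':  return '\u0300'; // grave
            case '?':  return '\u0309'; // hook above
            case '~':  return '\u0303'; // tilde
            case '.':  return '\u0323'; // dot below
            case '^':  return '\u0302'; // circumflex
            case '(':  return '\u0306'; // breve
            case '+':  return '\u031B'; // horn
            default:   return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(toUnicode("Vie^.t Nam")); // "Việt Nam" on a UTF-8 console
    }
}
```

Note that java.text.Normalizer requires Java 6 or later, which is newer than the JRE 1.4 baseline mentioned for the program itself.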

Unicode composite (UNI-COMP) source text files should be saved in UTF-8 format for correct conversion to Unicode precomposed.
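
The difference between Unicode composite and Unicode precomposed text can be illustrated with the standard java.text.Normalizer class (available since Java 6). This is a minimal sketch with a hypothetical class name; it does not claim to show how UnicodeConverter performs the conversion internally.

```java
import java.text.Normalizer;

// Hypothetical sketch, for illustration only (not part of UnicodeConverter).
final class NfcSketch {
    // Converts composite (decomposed) text to Normalization Form C (precomposed).
    static String toPrecomposed(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "Viet" written with a base letter e followed by two combining marks:
        // U+0323 (dot below) and U+0302 (circumflex).
        String composite = "Vie\u0323\u0302t";
        String precomposed = toPrecomposed(composite);
        // The three code units e + U+0323 + U+0302 collapse into the single U+1EC7.
        System.out.println(composite.length() + " -> " + precomposed.length()); // prints "6 -> 4"
    }
}
```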

The default fonts for the output UTF-8 HTML files are Times New Roman and Arial. Users can change to other Unicode-compliant fonts using Unicode-compatible HTML editors such as FrontPage or Composer. Do not use Unicode-incompatible editors (such as Notepad on Windows 9x/Me) to edit UTF-8 files. Doing so would corrupt the UTF-8 byte sequences, rendering the characters or the file unreadable.
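
The following sketch shows why an editor that is not Unicode-aware damages such files: a single Vietnamese letter occupies several bytes in UTF-8, and a tool that treats every byte as one character sees unrelated symbols instead. ISO-8859-1 is used here only as a stand-in for a legacy single-byte code page, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, for illustration only (not part of UnicodeConverter).
final class Utf8CorruptionSketch {
    // Simulates a non-Unicode tool: UTF-8 bytes read as one character per byte.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // A Unicode-aware tool decodes the same bytes back to the original text.
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "Vi\u1EC7t"; // 4 characters; U+1EC7 is encoded as bytes E1 BB 87
        System.out.println(word.getBytes(StandardCharsets.UTF_8).length); // prints "6"
        System.out.println(misread(word).length());                       // prints "6"
        System.out.println(roundTrip(word).equals(word));                 // prints "true"
    }
}
```

If the misread text is then edited and saved in the legacy encoding, the three bytes of the original letter are no longer guaranteed to form a valid UTF-8 sequence.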

Use the Firefox, Netscape, Internet Explorer (Windows), Opera, Mozilla, Safari, OmniWeb, or Chimera web browsers to view UTF-8 HTML files. You will not need to change their default settings; the charset meta tag in the converted pages tells the browser to use the Unicode UTF-8 character encoding when displaying the page.

FILE PREPARATIONS FOR CONVERSION

To ensure successful conversion of HTML files in legacy formats and to minimize post-conversion editing, some pre-conversion conditioning of the source files may be needed. If a document uses uncommon fonts, change them to the more common fonts for its original encoding (see table below). Removing obsolete dynamic font links (.pfr or .eot) and associated ActiveX control scripts (e.g., tdserver.js) is also recommended, since leaving them in will needlessly slow down page download.

These basic editing tasks should be done before the actual conversion. They can be performed quickly with MDI (multiple document interface) text editors, which can open multiple files and run a global find/replace on all open files at once. CuteHTML, TextPad, UltraEdit, EditPlus, and EditPad are some text editors that offer these features. They can be found and downloaded at http://www.download.com.

Source Encoding    Fonts for original HTML documents
VNI                VNI-Times, VNI Times, VNI-Aptima, VNI Aptima, VNI-Helve, VNI Helve
VPS                VPS Times, VPS Helv
VISCII             VI Times, VI Arial, HoangYen, MinhQu, PhuongThao, ThaHuong, UHo
TCVN3              .VnTime, .VnTimeH, .VnArial, .VnArialH
VIQR               No font formatting

Note: Due to the nature of the TCVN3 encoding, some Vietnamese capital vowels will be incorrectly converted to lower case. Some post-conversion editing may be necessary.

UNICODE-COMPLIANT FONTS

Windows 95/98/Me have only limited Unicode support, but they are still capable of displaying all Vietnamese characters using appropriate Unicode fonts. Full Unicode support is built into Windows NT/2000/XP. Linux and Mac OS 8.5 or later have begun to support Unicode. Mac OS X and Palm OS provide full Unicode support.

The following TrueType fonts, which come supplied with Windows 98SE/Me/2000/XP, contain many Unicode characters, including Vietnamese:

Times New Roman, Courier New, Arial, Tahoma, Verdana, Palatino Linotype

This list of Unicode fonts is by no means comprehensive, as more and more fonts are being commercially developed or expanded to include Unicode characters.

Requirements:
  • Java 1.4.2 or later

What's New in This Release:
  • Refactored using Design Patterns to improve code reusability, program extensibility and maintainability
  • Updated JACOB library to version 1.9.1

    Download UnicodeConverter 1.3


     http://prdownloads.sourceforge.net/unicodeconvert/UnicodeConverter-1.3.zip?use_mirror=voxel
     http://prdownloads.sourceforge.net/unicodeconvert/UnicodeConverter-1.3.zip?use_mirror=heanet
     http://prdownloads.sourceforge.net/unicodeconvert/UnicodeConverter-1.3.zip?use_mirror=switch
     http://prdownloads.sourceforge.net/unicodeconvert/UnicodeConverter-1.3-src.zip?use_mirror=jaist

