uni2ascii 3.10 review

License:	GPL (GNU General Public License)
File size:	120K
Developer:	Bill Poser

uni2ascii and ascii2uni convert between UTF-8 Unicode and any of a variety of 7-bit ASCII equivalents including: hexadecimal and decimal HTML numeric character references, u-escapes, standard hexadecimal, and raw hexadecimal.

Such ASCII equivalents are useful when including Unicode text in program source, when entering text into Web programs that can handle the Unicode character set but are not 8-bit safe, and when debugging.

The Unicode escapes available are:

HTML hexadecimal numeric character references (e.g. &#x00E9;)
HTML decimal numeric character references (e.g. &#563; for ȳ)
\u-escapes, as used in Python (e.g. \u00E9)
\u-escapes within the BMP and \U-escapes beyond the BMP (e.g. \u00E9 but \U00010024)
U+-escapes (e.g. U+00E9)
U-escapes (e.g. U00E9)
u-escapes (e.g. u00E9)
U-escapes within angle brackets (e.g. <U00E9>)
\x-escapes (e.g. \x00E9)
\x-escapes with braces (e.g. \x{00E9})
Standard hexadecimal (e.g. 0x00E9)
Raw hexadecimal (e.g. 00E9)
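
To make the formats above concrete, here is a minimal Python sketch that spells one character in several of them. It is an illustration only, not the code of uni2ascii itself; the function name escape_forms, the dictionary keys, and the upper parameter are all hypothetical names chosen for this example.

```python
def escape_forms(ch, upper=True):
    """Return several 7-bit ASCII spellings of a single character.

    Hypothetical helper for illustration; not part of uni2ascii.
    """
    cp = ord(ch)
    case = "X" if upper else "x"      # upper-case A-F or lower-case a-f
    h4 = format(cp, "04" + case)      # at least four hex digits
    h8 = format(cp, "08" + case)      # eight hex digits, for \U beyond the BMP
    return {
        "html_hex": "&#x" + h4 + ";",
        "html_dec": "&#" + str(cp) + ";",
        "backslash_u": ("\\u" + h4) if cp <= 0xFFFF else ("\\U" + h8),
        "u_plus": "U+" + h4,
        "angle": "<U" + h4 + ">",
        "backslash_x": "\\x" + h4,
        "braced_x": "\\x{" + h4 + "}",
        "standard_hex": "0x" + h4,
        "raw_hex": h4,
    }


if __name__ == "__main__":
    for name, text in escape_forms(chr(0xE9)).items():
        print(name, text)
```

Calling escape_forms(chr(0xE9), upper=False) yields the same spellings with lower-case hexadecimal digits.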

uni2ascii accepts a command-line flag determining whether to generate upper-case A-F or lower-case a-f as hexadecimal digits, since some programs accept only one or the other. ascii2uni accepts either.

By default, uni2ascii converts only characters outside the ASCII range. Even if ASCII characters are also converted, newlines are preserved unless their conversion is explicitly requested. Space characters are likewise preserved unless their conversion is explicitly requested. If space characters are not converted, the three non-ASCII space characters (Ethiopic word space, Ogham space, and ideographic space) are replaced with ASCII space (0x20) so as to keep the output within the 7-bit ASCII range.
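
The conversion rules just described can be sketched in Python as follows. This is one reading of the prose, not the actual uni2ascii logic: the function to_ascii, its keyword parameters, and the choice of \u and \U escapes as the output format are assumptions made for the example. The three code points are the Unicode values of the named space characters (U+1361, U+1680, U+3000).

```python
# Ethiopic wordspace, Ogham space mark, ideographic space.
NON_ASCII_SPACES = {0x1361, 0x1680, 0x3000}


def escape(cp):
    """Spell a code point as a \\u escape (BMP) or \\U escape (beyond)."""
    return "\\u%04X" % cp if cp <= 0xFFFF else "\\U%08X" % cp


def to_ascii(text, convert_ascii=False, convert_newlines=False,
             convert_spaces=False):
    """Hypothetical converter following the rules described in the text."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\n":
            keep = not convert_newlines       # newlines survive by default
        elif ch == " ":
            keep = not convert_spaces         # so do ASCII spaces
        elif cp in NON_ASCII_SPACES and not convert_spaces:
            ch, keep = " ", True              # fold to ASCII space (0x20)
        else:
            keep = cp < 0x80 and not convert_ascii
        out.append(ch if keep else escape(cp))
    return "".join(out)
```

With the defaults, only the non-ASCII characters are escaped and an ideographic space becomes a plain space; passing convert_ascii=True escapes letters as well but still leaves spaces and newlines alone.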

This package contains four programs. The main program is uni2ascii. It is written in C and must be compiled. uni2html.py is the predecessor to uni2ascii. As it is written in Python, it does not need to be compiled and should run on just about any current computer. uni2ascii is otherwise superior in that:

It generates a wider range of output formats.
It is approximately 20 times faster.
It handles input in the full 32-bit Unicode range. In contrast, uni2html handles only the Basic Multilingual Plane (Plane 0) because at present Python represents Unicode-encoded text internally using 16-bit integers. If you've got text in, say, Linear B or Ugaritic, you need uni2ascii.

It does a better job of reporting errors. If it encounters an error in its input, such as malformed UTF-8, it reports the location of the error both in terms of the character count from the beginning of the file (starting at 0) and in terms of the byte count from the beginning of the file (also starting at 0). (Character counts and byte counts are generally not the same, since a UTF-8 encoded character occupies from one to four bytes.) The Python version reports only the character count. uni2ascii also provides information about the nature of the error.
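
The difference between the two offsets is easy to see in a short Python sketch. It relies on the standard UnicodeDecodeError exception, whose start attribute is the byte index of the first undecodable byte; the function name first_utf8_error is hypothetical, and the sketch does not reproduce the messages uni2ascii prints.

```python
def first_utf8_error(data: bytes):
    """Locate the first malformed UTF-8 sequence in data.

    Returns (character offset, byte offset, reason), both offsets
    counted from 0, or None if the input decodes cleanly.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        byte_offset = err.start
        # Everything before the bad byte is valid, so decoding that
        # prefix and counting characters gives the character offset.
        char_offset = len(data[:byte_offset].decode("utf-8"))
        return char_offset, byte_offset, err.reason
    return None


if __name__ == "__main__":
    sample = "h\u00e9llo".encode("utf-8") + b"\xff" + b"x"
    print(first_utf8_error(sample))
```

In the sample, the two-byte character before the error makes the byte offset (6) one larger than the character offset (5).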

The third program, ascii2uni, is the inverse of uni2ascii. It accepts text containing a variety of ASCII representations of Unicode characters and generates UTF-8 Unicode.

The fourth program, ascii2uni.py, reads 7-bit ASCII containing u-escaped Unicode, as used in Python and Tcl, and converts it to UTF-8 Unicode. It is the original program of which ascii2uni is a generalization.
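
The reverse direction can be sketched in a few lines of Python. This is not the source of ascii2uni.py; the function name unescape_to_utf8 and the regular expression are assumptions for illustration, and the sketch accepts both \u (four digits) and \U (eight digits) escapes with hexadecimal digits in either case.

```python
import re

# \u followed by exactly four hex digits, or \U followed by exactly eight.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_to_utf8(ascii_text: str) -> bytes:
    """Replace \\u and \\U escapes with the characters they name and
    return the result encoded as UTF-8. Hypothetical helper."""
    def replace(match):
        digits = match.group(1) or match.group(2)
        return chr(int(digits, 16))

    return _U_ESCAPE.sub(replace, ascii_text).encode("utf-8")


if __name__ == "__main__":
    print(unescape_to_utf8("caf\\u00E9"))
```

The sketch makes no attempt to handle surrogate pairs written as two \u escapes or values above 10FFFF; both raise an exception here, where a real tool would want to report the error and its location.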

What's New in This Release:
This release fixes several bugs and adds support for Common Lisp format hexadecimal numbers.
