libunicode 0.7 review

License: GPL (GNU General Public License)
File size: 0K
Developer: Matthew Parry

Libunicode offers low-level Unicode (UTF-16) text processing functionality,
which can be divided into three categories:

- Character handling
- String handling
- Charset handling

Libunicode uses the UTF-16 encoding defined by ISO/IEC 10646 for storing and manipulating all character entities. It will support other encoding standards (e.g., UTF-8, ISO 8859-x) for input and output only.
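To illustrate the "other encodings for input only" idea, here is a minimal sketch of widening ISO 8859-1 input into 16-bit code units. It relies only on the fact that ISO 8859-1 byte values equal the Unicode code points U+0000 to U+00FF. The type name utf16_unit and the function latin1_to_utf16 are hypothetical and are not part of libunicode; the library's real conversion routines are not shown in this description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 16-bit unsigned integer stands in for the library's
 * internal UTF-16 storage type. */
typedef uint16_t utf16_unit;

/*
 * Hypothetical helper (not a libunicode function): convert a
 * NUL-terminated ISO 8859-1 string to NUL-terminated UTF-16 code units.
 * Returns the number of units written, not counting the terminator,
 * or (size_t)-1 if the output buffer is too small.
 */
size_t latin1_to_utf16(const unsigned char *in, utf16_unit *out, size_t out_cap)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (out_cap == 0)
        return (size_t)-1;          /* no room even for the terminator */

    while (in[n] != 0) {
        if (n + 1 >= out_cap)       /* need room for this unit and the terminator */
            return (size_t)-1;
        out[n] = (utf16_unit)in[n]; /* Latin-1 byte value == Unicode code point */
        n++;
    }
    out[n] = 0;
    return n;
}
```

Encodings such as UTF-8 need real decoding rather than simple widening, so a conversion of that kind would be noticeably longer than this sketch.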

Where applicable, libunicode is based on the "Single Unix Specification, Version 2(R)" (susv2) as its API and semantics reference. susv2 unifies, and is a superset of, the de jure POSIX and ANSI C (run-time library part) standards and the de facto BSD standards. This means that if you know the standard character and string handling functions, you can readily use libunicode; and if you have an application that uses the standard character/string processing facilities, you can make it Unicode-aware with minimal trouble.

Also, don't let the word "Unix" in the standard's name confuse you. susv2, like POSIX, is a standard for *Open* operating systems, a category that MS Windows, Mac OS, etc. fit into. The name was chosen by The Open Group, the maintainer of susv2, to unite and defend market sectors actively attacked by Microsoft with its "decommoditizing" tactics. Libunicode is a bright example of the opposite approach, offering cross-platform portability and compatibility for Unix and Win32 systems. (*)

(*) The opinions presented in the paragraph above are solely those of the documentation author and should not be considered as reflecting the real state of things.

Libunicode defines a new type, 'Uchar', which can hold any non-surrogate UTF-16 character without space overhead.
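The "non-surrogate" restriction can be made concrete with a small sketch. The description does not show how 'Uchar' is actually defined, so the typedef below (an unsigned 16-bit integer) is an assumption, and the helper functions are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the real definition of Uchar is not given in this
 * description; a 16-bit unsigned integer is used as a stand-in. */
typedef uint16_t Uchar;

/* Hypothetical helper: nonzero if c is a UTF-16 surrogate code unit
 * (U+D800..U+DFFF), i.e. one half of a pair rather than a character. */
int is_surrogate(Uchar c)
{
    return c >= 0xD800u && c <= 0xDFFFu;
}

/* Hypothetical helper: nonzero if code point cp can be stored in a
 * single 16-bit unit, i.e. it lies in the Basic Multilingual Plane
 * and is not in the surrogate range. */
int fits_in_one_uchar(uint32_t cp)
{
    return cp <= 0xFFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu);
}

/* Hypothetical helper: narrow a code point to a Uchar, asserting
 * that no information is lost. */
Uchar to_uchar(uint32_t cp)
{
    assert(fits_in_one_uchar(cp));
    return (Uchar)cp;
}
```

Under this assumption, a character such as U+0416 fits in one unit, while a code point above U+FFFF (for example U+1F600) does not, because UTF-16 encodes it as a pair of surrogates.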

The library offers two APIs: one is a precise remapping of the susv2 functions, and the other is a slightly higher-level API with automatic memory management fully controlled by the user.

Functions of the first API (fully standard-compliant, and the one you will probably use) use the 'u_' prefix. For example, the standard

char *strchr(const char *s, int c);

becomes

Uchar *u_strchr(const Uchar *s, Uchar c);
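As a rough illustration of what "the same semantics as strchr, but on Uchar strings" means, here is a sketch of a search function with the prototype shape shown above. It is not the library's implementation: the name sketch_u_strchr is hypothetical, and the Uchar typedef is an assumption (a 16-bit unsigned integer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for the library's Uchar type. */
typedef uint16_t Uchar;

/*
 * Hypothetical sketch mirroring strchr() semantics on a
 * zero-terminated Uchar string: return a pointer to the first
 * occurrence of c in s, or NULL if c does not occur. As with
 * strchr(), the terminating zero counts as part of the string,
 * so searching for 0 returns a pointer to the terminator.
 */
Uchar *sketch_u_strchr(const Uchar *s, Uchar c)
{
    assert(s != NULL);
    for (;; s++) {
        if (*s == c)
            return (Uchar *)s;  /* cast away const, as strchr() does */
        if (*s == 0)
            return NULL;        /* reached the end without a match */
    }
}
```

Code that already calls strchr on char strings would, under these assumptions, change only the string type and the function name when moving to the 'u_' variant.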

Functions of the second API use the 'uni_' prefix. They are intended for special environments, for example Apache web server modules. Most functions have completely identical 'u_' and 'uni_' implementations, but some have an argument structure and semantics that differ from the standard.

You should consult the library reference for their full description.
