Enca 1.9 review
Enca detects the encoding of text files, on the basis of knowledge of their language.
Enca is an Extremely Naive Charset Analyser. It detects character set and encoding of text files and can also convert them to other encodings.
The charset detection functionality is also available as a library, libenca. Work has begun on pyenca, a Python interface to libenca.
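To make the idea of language-based detection concrete, here is a minimal Python sketch. It is not Enca's algorithm and does not use libenca or pyenca; the candidate list, the Czech letter set and the function names (score, guess_encoding) are all assumptions made for illustration. It decodes the input under each candidate charset and prefers the one whose non-ASCII characters look most like letters of the expected language.

```python
# Illustrative sketch only: a toy language-aware charset guesser.
# It is NOT Enca's algorithm; every name below is hypothetical.

# Accented letters used in Czech (the "known language" in this example).
CZECH_LETTERS = set("áčďéěíňóřšťúůýž" "ÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ")

# Candidate charsets to try, using Python codec names.
CANDIDATES = ("utf-8", "iso8859_2", "cp1250")


def score(data, encoding):
    """Return hits minus misses for non-ASCII characters, or None if undecodable."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return None
    non_ascii = [ch for ch in text if ord(ch) > 127]
    hits = sum(ch in CZECH_LETTERS for ch in non_ascii)
    misses = len(non_ascii) - hits
    return hits - misses


def guess_encoding(data, candidates=CANDIDATES):
    """Pick the candidate whose decoding looks most like Czech text."""
    scored = []
    for enc in candidates:
        s = score(data, enc)
        if s is not None:
            scored.append((s, enc))
    if not scored:
        return None
    # On a tie (for example pure ASCII input) the earliest candidate wins.
    return max(scored, key=lambda pair: pair[0])[1]


sample = "Příliš žluťoučký kůň úpěl ďábelské ódy"
print(guess_encoding(sample.encode("cp1250")))
print(guess_encoding(sample.encode("iso8859_2")))
```

A real analyser needs far more than a letter set, but the sketch shows why knowing the language matters: iso8859-2 and cp1250 disagree on only a few Czech letters, and those letters decide the result.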
Here are some key features of "Enca":
recognises several multibyte encodings: UCS-2, UCS-4, UTF-8, UTF-7 and TeX accents
recognises all common EOL types, byte orders and also Quoted-printables
detects files accidentally converted twice to UTF-8 from some 8-bit encoding
can report charset names after various conventions (or programs) as well as human-readable descriptions; accepts all common charset aliases
works with multiple files and can act as an intelligent filter
converts files using a built-in converter, the GNU recode library, UNIX98 iconv functions or an external converter that can be specified on the command line (e.g. cstocs, GNU recode)
automagically converts files to your locale's preferred character set when called as enconv
has a special ambiguous mode for very short texts
can filter out binary parts of file and/or box drawing characters before guessing so it can determine encoding of pretty messy files
uses various tricks to resolve hard-to-decide cases, such as distinguishing between iso8859-2 and cp1250
is fairly portable, runs on GNU/Linux and all sane Unices
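The double UTF-8 conversion mentioned in the feature list can be illustrated with a short Python sketch. This is not how Enca implements the check; the function name undo_double_utf8 and the default 8-bit charset (iso8859_2) are assumptions chosen for the example. The idea is that if UTF-8 text was misread as an 8-bit charset and encoded to UTF-8 again, then reversing that one step should yield bytes that are themselves valid UTF-8.

```python
# Illustrative sketch only: detect and undo an accidental second UTF-8 conversion.
# The function name and the default 8-bit charset are hypothetical choices.


def undo_double_utf8(data, eight_bit="iso8859_2"):
    """Return the repaired text if data looks double-encoded, else None."""
    try:
        once = data.decode("utf-8")       # undo the outer (second) UTF-8 layer
        inner = once.encode(eight_bit)    # recover the bytes that were misread
        repaired = inner.decode("utf-8")  # they should be valid UTF-8 themselves
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None
    if inner == data:
        # Pure ASCII input: nothing was converted twice.
        return None
    return repaired


original = "kůň"
# Simulate the accident: UTF-8 bytes misread as iso8859-2, then encoded again.
damaged = original.encode("utf-8").decode("iso8859_2").encode("utf-8")
print(undo_double_utf8(damaged) == original)
```

Such a check can give false positives on legitimate text that happens to contain the right character pairs, which is one reason a real detector would combine it with language knowledge.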
What's New in This Release:
Support for HZ encoded GB2312 was added.
GB2312 and Big5 detection were improved.