bug-gnulib

Re: Endianness-specific


From: Bruno Haible
Subject: Re: Endianness-specific
Date: Sat, 6 Oct 2007 20:22:18 +0200
User-agent: KMail/1.5.4

Hi Ludovic,

> I'm trying to implement functions that convert a string in the current
> locale encoding to its UTF-{16,32} representation, for a given
> endianness.

This kind of task is outside the scope of the uniconv/* modules.
'unistr' and 'uniconv' deal with UTF-{8,16,32} as an internal representation
of strings in memory; therefore they assume the machine's native endianness
and alignment, and can thus fetch every unit with a single memory
access.

If the endianness or alignment is different, the code needs to access
every unit byte after byte; this is not the way it's done in the 'unistr'
and 'uniconv' libraries.
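
To illustrate what "byte after byte" access means, here is a minimal sketch
in plain C. The helper names (read_u16be, read_u16le, write_u16le) are
hypothetical and not part of gnulib; they only show how a UTF-16 unit with an
explicit byte order can be read or written independently of the machine's
endianness and alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one UTF-16 unit stored in big-endian order. */
static uint16_t read_u16be(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Hypothetical helper: read one UTF-16 unit stored in little-endian order. */
static uint16_t read_u16le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}

/* Hypothetical helper: store one UTF-16 unit in little-endian order. */
static void write_u16le(unsigned char *p, uint16_t u)
{
    assert(p != NULL);
    p[0] = (unsigned char)(u & 0xFF);
    p[1] = (unsigned char)(u >> 8);
}
```

Because each unit is assembled from individual bytes, the pointer may be
unaligned, at the cost of two byte loads plus a shift per unit.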

Therefore I would recommend using the mem_cd_iconveh function from the
'striconveh' module, with FROMCODE = locale_charset() and TOCODE =
"UTF-16BE" or "UTF-16LE" (or vice versa). Or mem_iconveh if you don't
want to reuse the conversion descriptors.

The str_cd_iconveh and str_iconveh functions are not usable here because they
look for the end of string via strlen().
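
A small sketch of why strlen() is a problem for UTF-16 data: ASCII characters
encoded in UTF-16 contain zero bytes, so a byte-oriented scan stops too early.
The array and the helper utf16_byte_length below are hypothetical
illustrations, not gnulib code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Hi" in UTF-16LE, followed by a 16-bit zero terminator. */
static const char hi_utf16le[] = { 'H', 0, 'i', 0, 0, 0 };

/* Hypothetical helper: number of bytes before the first 16-bit zero unit. */
static size_t utf16_byte_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0 || s[n + 1] != 0)
        n += 2;
    return n;
}
```

Here strlen(hi_utf16le) stops at the zero byte inside the first unit, while
the unit-wise scan finds all four bytes. The mem_* functions avoid the issue
by taking an explicit length.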

I recommend the 'striconveh' module here over the 'striconv' module, because
it will work even with Solaris iconv() which can convert from anything to
UTF-8 and vice versa, but cannot convert directly e.g. between ISO-8859-2
and UTF-16LE. The 'striconveh' module does the conversion in two steps in
such a case.
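
The two-step idea can be sketched with plain POSIX iconv(), going through
UTF-8 as the intermediate encoding. This is only an illustration under stated
assumptions, not the 'striconveh' implementation: convert_once and
convert_via_utf8 are hypothetical names, the output buffer is sized for
simple encodings such as the ones mentioned here, and no handling of
unconvertible characters is attempted.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert srclen bytes in a single iconv() pass.
   Returns a malloc'd buffer (length in *outlen), or NULL on any failure. */
static char *convert_once(const char *from, const char *to,
                          const char *src, size_t srclen, size_t *outlen)
{
    assert(src != NULL && outlen != NULL);
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    /* Assumption: at most 4 output bytes per input byte, which holds for
       8-bit charsets, UTF-8, UTF-16 and UTF-32. */
    size_t cap = 4 * srclen + 4;
    char *buf = malloc(cap);
    if (buf == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *)src;   /* iconv() does not modify the input bytes */
    size_t inleft = srclen;
    char *out = buf;
    size_t outleft = cap;

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out, &outleft);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(buf);
        return NULL;
    }
    *outlen = (size_t)(out - buf);
    return buf;
}

/* Hypothetical helper: convert from -> UTF-8 -> to, in two steps. */
static char *convert_via_utf8(const char *from, const char *to,
                              const char *src, size_t srclen, size_t *outlen)
{
    size_t u8len;
    char *u8 = convert_once(from, "UTF-8", src, srclen, &u8len);
    if (u8 == NULL)
        return NULL;
    char *result = convert_once("UTF-8", to, u8, u8len, outlen);
    free(u8);
    return result;
}
```

A caller would pass the source bytes with an explicit length, for example
convert_via_utf8("ISO-8859-2", "UTF-16LE", buf, len, &outlen), and free() the
result afterwards.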

Bruno




