Re: UTF8 string storage and retrieval in XML

From: Mark H Weaver
Subject: Re: UTF8 string storage and retrieval in XML
Date: Mon, 01 Feb 2016 13:16:38 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

Richard Shann <address@hidden> writes:

> Can anyone explain what is going on when you try to store strings with
> non-ASCII characters? Here is an example:
> guile> (define another-data "Čć")           
> guile> another-data
> "�\x8c�\x87"
> guile> (display another-data)
> Čćguile> 

I guess this is Guile 1.x, where strings are merely byte sequences.
Your terminal is using the UTF-8 encoding, where "Čć" is represented as
the byte sequence:

  0xC4 0x8C 0xC4 0x87

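When printing this using 'write' (which is how values are printed at the
REPL), Guile 1.x treats this byte sequence as Latin-1.  The byte 0xC4 is
the Latin-1 representation of the character "Ä", so it is written out
unescaped; on its own it is not valid UTF-8, which is why your terminal
shows it as "�".  The bytes 0x8C and 0x87 are not printable characters in
Latin-1 (they fall in the C1 control range) and so are escaped as "\x8c"
and "\x87".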
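
For illustration, here is a minimal sketch of the same byte-level view,
written for Guile 2.0 (it will not run on 1.x).  It assumes the
(rnrs bytevectors) and (ice-9 iconv) modules are available; the names
'text', 'bytes' and 'misread' are mine.  The string is built from code
points so the result does not depend on your locale or source encoding.

```scheme
(use-modules (rnrs bytevectors)   ; string->utf8, bytevector->u8-list
             (ice-9 iconv))       ; bytevector->string

;; "Čć" built from its code points: U+010C and U+0107.
(define text (string (integer->char #x10C) (integer->char #x107)))

;; Encode as UTF-8: two characters become four bytes.
(define bytes (string->utf8 text))
(write (bytevector->u8-list bytes))   ; (196 140 196 135) = C4 8C C4 87
(newline)

;; Decode the same bytes as Latin-1, one character per byte,
;; which is roughly the view 'write' takes in Guile 1.x.
(define misread (bytevector->string bytes "ISO-8859-1"))
(write (string-length misread))       ; 4
(newline)
(write (map char->integer (string->list misread)))  ; (196 140 196 135)
(newline)
```

Code point 196 is "Ä"; 140 and 135 are the C1 controls that end up
escaped.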

When printing using display, Guile is simply writing the bytes out
unescaped, which your terminal interprets as UTF-8.

Obviously this is terrible, which is why Guile 2.0+ strings are
sequences of Unicode code points.  Can you switch to Guile 2.0?
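
As a rough sketch of what that means in practice (Guile 2.0 or later
assumed; the variable name 'text' is only for illustration), the string
below has two elements, each a full code point rather than a byte:

```scheme
;; "Čć" built from its code points, so the example does not depend
;; on the encoding of the terminal or source file.
(define text (string (integer->char #x10C) (integer->char #x107)))

(write (string-length text))                     ; 2
(newline)
(write (map char->integer (string->list text)))  ; (268 263)
(newline)
```

The UTF-8 encoding only comes into play when the string is read from or
written to a port.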
