Re: guile can't find a chinese named file


From: Andy Wingo
Subject: Re: guile can't find a chinese named file
Date: Sun, 26 Feb 2017 22:20:31 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

Hello,

I feel the need to correct points in this mail for the benefit of
guile-user.  No reply is needed.

On Wed 15 Feb 2017 00:58, David Kastrup <address@hidden> writes:

> Mike Gran <address@hidden> writes:
>
>> But, for what it is worth, the Latin-1/UCS-32 design decision came
>> from a couple of conflicting requirements.  The switch happened in the
>> 1.9.x series.
>>
>> There were several examples of legacy C code using Guile as an
>> extension language that accessed the bytes of a string directly, using
>> SCM_STRING_CHARS or scm_i_string_chars.  To keep from breaking legacy
>> code, we needed to retain this (then already deprecated) capability
>> for C programs to access 8-bit-locale string internals directly.
>
> But if you don't know whether the strings are Latin-1 or UCS-32, that's
> sort of academical.

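Not at all.  Legacy programs don't use codepoints >255, so their strings
stay in the Latin-1 representation and the bytes can be accessed
directly.  For a string stored as UTF-32, attempting to get the string
data would throw an exception.  The SCM_STRING_CHARS hack was a good
trade-off.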

> The problem is that Guile is _constantly_ required to recode strings it
> is processing.  And to add insult to injury, it cannot do this without
> data loss when its string encoding assumptions are wrong.

In Scheme, strings are sequences of characters.  Encoding and decoding
is only needed when going to and from bytes.  Guile supports a finite
number of encodings, so in general some encoding/decoding will always be
needed.  The specific encoding may change over time.
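
As a minimal sketch of this point, assuming Guile 2.0 or later with the
(ice-9 iconv) module and an iconv that knows "UTF-16LE": the string below
is built from code points, and an encoding only enters the picture when
bytes are requested.  The variable names are illustrative only.

```scheme
(use-modules (rnrs bytevectors)   ; bytevector-length
             (ice-9 iconv))       ; string->bytevector, bytevector->string

;; A string is a sequence of characters; no encoding is involved yet.
;; U+6587 and U+4EF6 are the two characters of the Chinese word for "file".
(define s (string (integer->char #x6587) (integer->char #x4ef6)))
(string-length s)                 ; 2 characters

;; Encoding happens only when we ask for bytes.
(define utf8-bytes  (string->bytevector s "UTF-8"))
(define utf16-bytes (string->bytevector s "UTF-16LE"))
(bytevector-length utf8-bytes)    ; 6 bytes (3 per character)
(bytevector-length utf16-bytes)   ; 4 bytes (2 per character)

;; Decoding with the matching encoding gives back an equal string.
(define decoded (bytevector->string utf8-bytes "UTF-8"))
(string=? s decoded)
```

The same two characters produce different byte sequences depending on the
encoding chosen, while the string itself never changes.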

> PostScript files are usually encoded in Latin-1 with occasional UCS-16
> passages.  Reading and writing and copying such files byte-correctly
> while trying to actually parse their contents is not feasible with
> Guile.

Works perfectly well.  The web server, for example, reads the request as
Latin-1 and the body as something else.  Just re-set the port encoding
with set-port-encoding! and there you go.
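
A small sketch of switching encodings mid-stream, under the assumption
that the input is a made-up byte sequence with one Latin-1 line followed
by one UTF-8 line.  The bytes and variable names are invented for
illustration; this is not the web server's actual code.

```scheme
(use-modules (rnrs bytevectors)     ; u8-list->bytevector
             (ice-9 binary-ports)   ; open-bytevector-input-port
             (ice-9 rdelim))        ; read-line

;; Hypothetical input: "café\n" encoded as Latin-1, then "café\n" as UTF-8.
(define bytes
  (u8-list->bytevector
   '(#x63 #x61 #x66 #xe9 #x0a            ; Latin-1: é is the single byte E9
     #x63 #x61 #x66 #xc3 #xa9 #x0a)))    ; UTF-8:   é is the two bytes C3 A9

(define port (open-bytevector-input-port bytes))

;; Read the first line as Latin-1.
(set-port-encoding! port "ISO-8859-1")
(define first-line (read-line port))

;; Re-set the encoding on the same port and keep reading.
(set-port-encoding! port "UTF-8")
(define second-line (read-line port))

;; Both lines decode to the same four-character string.
(string=? first-line second-line)
```

Because every byte is a valid Latin-1 character, reading with
"ISO-8859-1" never loses data; the encoding only determines how bytes map
to characters.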

>> I still maintain that this design decision was a good one based on the
>> simplicity of implementation.
>
> As I said: the problem is not the chosen internal representation.  The
> problem is that there is no API to access it, and it does not even map
> to string ports.

String ports have nothing to do with the discussion AFAIU.  (Ports in
Guile are also sequences of bytes, which may be accessed using textual
interfaces as well.  Therefore a string port must have an associated
encoding to read/write the bytes.  But no error is possible for textual
I/O with the default UTF-8 encoding, as all characters are representable.
Encoding to UTF-8 is fast and space-efficient.)
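
To illustrate, here is a hedged sketch of a string port round trip.  It
binds %default-port-encoding to "UTF-8" explicitly so that the example
does not depend on which default a given Guile version uses; the test
string is arbitrary.

```scheme
;; An arbitrary string mixing ASCII, a CJK character, and a character
;; outside the Basic Multilingual Plane.
(define s (string #\a (integer->char #x6587) (integer->char #x1F600)))

;; Write the string to a string port and collect the result.
;; The explicit UTF-8 binding is an assumption made for portability
;; across Guile versions.
(define round-tripped
  (with-fluids ((%default-port-encoding "UTF-8"))
    (call-with-output-string
      (lambda (port)
        (display s port)))))

;; With UTF-8 every character is representable, so nothing is lost.
(string=? s round-tripped)
```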

Andy


