guile-user

Re: guile can't find a chinese named file


From: David Kastrup
Subject: Re: guile can't find a chinese named file
Date: Mon, 30 Jan 2017 20:27:34 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)

Marko Rauhamaa <address@hidden> writes:

> David Kastrup <address@hidden>:
>
>> Marko Rauhamaa <address@hidden> writes:
>>> Guile's mistake was to move to Unicode strings in the operating system
>>> interface.
>>
>> Emacs uses an UTF-8 based encoding internally [...]
>
> C uses 8-bit characters. That is a model worth emulating.

That's Guile-1.8.  Guile-2 uses either Latin-1 or UTF-32 in its string
internals, either Latin-1 or UTF-8 in its string API, and UTF-8 in its
string port internals.

> UTF-8 beautifully bridges the interpretation gap between 8-bit
> character strings and text. However, the interpretation step should be
> done in the application and not in the programming language.

Elisp is focused enough on text that I think its choice of going
UTF-8 internally with a Unicode character type is reasonably sane.  Its
strings (the quirky unibyte strings excluded) are its own variant of
UTF-8 internally, and its string port equivalent (buffers) uses that
same variant of UTF-8.  Its API talks UTF-8 for strings and Unicode (or
higher) for characters, and it indexes strings and buffers via Unicode
character counts.  That is not O(1), but with enough trickery it works
well enough in practice.  If strings are to be implemented strictly
Scheme-standard-conforming, they need to be O(1) indexable.  The Scheme
standard is rather silent about Unicode, however.  I am not sure that
sticking to the standard where it does not deal with reality is the best
choice.

I think the case for Guile-2 to _also_ support "unibyte strings" would
be considerably stronger than for Emacs (byte arrays and binary string
ports don't allow using Guile's string processing functions).  As it
stands, the design of Guile-2 in my book currently involves too many
mandatory conversions for just passing data around within Guile itself
and Guile-based applications.
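
As an illustration of why a mandatory conversion hurts when data is only
being passed through, here is a small sketch, again in Python and not
Guile code.  The byte string is a made-up file name that is not valid
UTF-8.  Decoding it as UTF-8 either fails or loses information, while a
byte-preserving view (Latin-1, or Python's own "surrogateescape" error
handler) round-trips exactly.

```python
# Hypothetical file name: the lone 0xE9 byte is not valid UTF-8,
# while 0xE4 0xB8 0xAD is the valid UTF-8 encoding of U+4E2D.
raw = b"caf\xe9-\xe4\xb8\xad.txt"

# Strict decoding refuses the name outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding substitutes U+FFFD, so the original byte is gone.
lossy = raw.decode("utf-8", errors="replace").encode("utf-8")

# Treating each byte as one Latin-1 character is lossless.
via_latin1 = raw.decode("latin-1").encode("latin-1")

# Python's escape hatch for file names keeps undecodable bytes around.
via_escape = raw.decode("utf-8", errors="surrogateescape").encode(
    "utf-8", errors="surrogateescape"
)

print(strict_ok)          # prints False
print(lossy == raw)       # prints False
print(via_latin1 == raw)  # prints True
print(via_escape == raw)  # prints True
```

A name that went through the lossy path can no longer be handed back to
the operating system to open the same file.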

> Support libraries for Unicode are naturally welcome.
>
> Plain Unicode text is actually quite a rare programming need. It is
> woefully inadequate for the human interface, which generally requires
> numerous other typesetting effects. But is also causing unnecessary
> grief in the computer-computer interface, where the classic textual
> naming and textual protocols are actually cutely chosen octet-aligned
> binary formats.

Sometimes yes, sometimes not.  As long as Guile wants to be a
general-purpose programming and extension language, it should deal
reliably and robustly and reproducibly with whatever is thrown at it.
Its choice of libraries does not currently make it so, but that could be
fixed by either working on the (GNU) libraries or by giving Guile its
own implementation.

But that needs to be considered a priority.  Nobody will do this just
for fun and kicks.

-- 
David Kastrup


