Re: Worrying development

From: Marius Vollmer
Subject: Re: Worrying development
Date: Sat, 24 Jan 2004 01:27:51 +0100

Tom Lord <address@hidden> writes:

>     > I'd say that the real 'trouble' is that strings are mutable at
>     > all.
> Worried mostly about variable-length character encodings in string?
> Or you'd just rather be programming in an ML-family language? :-)

Heh, no, I'm not really worried, I was actually trying to comment on
Dirk's concerns.

>     > Also, I still like the idea of using mutation-sharing substrings as
>     > markers that allow O(1) access into variable-width encoded strings.
> Interesting.  The interaction with STRING-SET! will be tricky.  I
> think you'll either have to "timestamp" strings (one tick per mutation
> -- and you'll likely have to use a GC'ed value rather than an inline
> integer for timestamps) or wind up with O(K) for mutations where K is
> the number of shared substrings.

Yes.  What I have in mind is that accessing strings is efficient as
long as no mutations are performed.  I.e., instead of indicating
positions in a string with an integer index, you create a shared
substring that starts at the desired position.  (This could be done
with COW substrings, tho.)
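
To make the marker idea concrete, here is a rough sketch in Python
(not Guile code).  Everything in it is an assumption made up for
illustration: the names Utf8String, Marker, marker_at and set_char are
hypothetical, and the single "ticks" counter stands in for the
timestamp scheme Tom mentions.  A marker remembers a byte offset, so
reading the character there is O(1); any mutation bumps the counter
and makes existing markers stale.

```python
class Utf8String:
    """Mutable UTF-8 buffer with a mutation counter (hypothetical sketch)."""

    def __init__(self, text):
        self.data = bytearray(text.encode("utf-8"))
        self.ticks = 0  # one tick per mutation

    def width_at(self, offset):
        """Byte length of the UTF-8 sequence starting at offset."""
        lead = self.data[offset]
        if lead < 0x80:
            return 1
        if lead >> 5 == 0b110:
            return 2
        if lead >> 4 == 0b1110:
            return 3
        return 4

    def marker_at(self, char_index):
        """O(n) scan from the start; done once, then reuse the marker."""
        offset = 0
        for _ in range(char_index):
            offset += self.width_at(offset)
        return Marker(self, offset)

    def set_char(self, char_index, ch):
        """Replace one character; the byte length may change."""
        offset = self.marker_at(char_index).offset
        end = offset + self.width_at(offset)
        self.data[offset:end] = ch.encode("utf-8")
        self.ticks += 1  # every existing marker is now stale


class Marker:
    """A position in a Utf8String, valid only until the next mutation."""

    def __init__(self, string, offset):
        self.string = string
        self.offset = offset
        self.ticks = string.ticks

    def valid(self):
        return self.ticks == self.string.ticks

    def char(self):
        """O(1) access to the character at this marker."""
        if not self.valid():
            raise ValueError("stale marker: string was mutated")
        end = self.offset + self.string.width_at(self.offset)
        return self.string.data[self.offset:end].decode("utf-8")

    def next(self):
        """Marker for the following character, also O(1)."""
        if not self.valid():
            raise ValueError("stale marker: string was mutated")
        step = self.string.width_at(self.offset)
        return Marker(self.string, self.offset + step)
```

A stale marker could of course be recomputed instead of rejected; that
is where the O(K) cost per mutation would show up if the string kept
track of all K markers and fixed them eagerly.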

>     > Also, there is the possibility on the horizon that we turn
>     > string-ref etc into 'primitive generics' which means that people
>     > could implement new kinds of strings using GOOPS.
> Well, heck.  In that case, maybe consider what I'm planning for Pika
> (at least initially).  Purely ASCII strings are stored 1-byte per
> character.  Most other strings 2-bytes per character.   Strings using
> characters outside the Basic Multilingual Plane, 4 bytes per
> character.

Yes, that's an attractive approach.  But I also find simply using
UTF-8 exclusively very attractive.  It might fit better with what
other people are doing and we might need fewer conversions when
wrapping external libraries.  Or maybe not.
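
For comparison, the width-per-string scheme could look roughly like
the Python sketch below.  It is only an illustration under my own
assumptions, not Pika's actual code: the function names and the
little-endian layout are made up.  The point is that indexing stays
O(1) because every character in a given string has the same width,
while UTF-8 is often more compact for mixed text.

```python
def storage_width(text):
    """Bytes per character: 1 for pure ASCII, 2 within the BMP, else 4."""
    top = max(map(ord, text), default=0)
    if top < 0x80:
        return 1
    if top <= 0xFFFF:
        return 2
    return 4


def encode_fixed(text):
    """Store every character in the same number of bytes."""
    width = storage_width(text)
    buf = b"".join(ord(c).to_bytes(width, "little") for c in text)
    return width, buf


def char_at(width, buf, index):
    """O(1) indexing: the character lives at a computable byte offset."""
    start = index * width
    return chr(int.from_bytes(buf[start:start + width], "little"))
```

With this layout a single non-ASCII character doubles the size of an
otherwise ASCII string ("héllo" takes 10 bytes here versus 6 in
UTF-8), which is part of the trade-off against using UTF-8 throughout.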

