chicken-hackers

Re: Proposed alternative encoding for stray UTF-8 bytes in strings


From: felix . winkelmann
Subject: Re: Proposed alternative encoding for stray UTF-8 bytes in strings
Date: Mon, 27 Nov 2023 14:41:59 +0100

> Question: if there is no translation at all, won't the invalid chars cause 
> issues with things like string-length and string-copy procs? That is, since 
> the number of octets can't be correctly translated to a number of glyphs, 
> there will be some unpleasant side effects.

Converting an octet sequence to a string involves a decoding step to compute
the length. Any invalid embedded UTF-8 sequence is taken as one or more
"illegal" code points, counting as one or more characters in the final string
length. Note that the length of the "backing store" bytevector for the string
is retained together with the number of code points that the string holds
(the former is stored in the header of the string's bytevector buffer, the
latter in a slot of the string).
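
To illustrate the two lengths, here is a rough sketch in Python rather than
in the actual runtime code. It is not the proposed encoding itself: it borrows
Python's "surrogateescape" error handler, which maps every undecodable byte to
exactly one lone surrogate code point, as a stand-in for the "illegal"
code points. The SketchString class and from_octets function are hypothetical
names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class SketchString:
    """Hypothetical stand-in for a string object (not CHICKEN's layout)."""
    buf: bytes        # backing store; len(buf) is the byte length
    codepoints: int   # code-point count, computed once when decoding


def from_octets(octets: bytes) -> SketchString:
    # Assumption: each stray byte counts as one character. Python's
    # surrogateescape handler does this by turning every undecodable
    # byte into a single lone surrogate (U+DC80..U+DCFF).
    decoded = octets.decode("utf-8", errors="surrogateescape")
    return SketchString(buf=octets, codepoints=len(decoded))


# "café", a space, one stray 0xFF byte, and "!"
s = from_octets(b"caf\xc3\xa9 \xff!")
print(len(s.buf), s.codepoints)  # prints "8 7"
```

The two-byte sequence for "é" counts as one character and the stray 0xFF byte
also counts as one, so a length query can return the stored code-point count
without rescanning the bytes.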


felix



