Re: Proposed alternative encoding for stray UTF-8 bytes in strings

chicken-hackers

From:	felix . winkelmann
Subject:	Re: Proposed alternative encoding for stray UTF-8 bytes in strings
Date:	Mon, 27 Nov 2023 14:41:59 +0100

> Question: if there is no translation at all, won't the invalid chars cause 
> issues with things like string-length and string-copy procs? That is, since 
> the number of octets can't be correctly translated to a number of glyphs, 
> there will be some unpleasant side effects.

Converting an octet sequence to a string involves a decoding step to compute
the length. Any invalid embedded UTF-8 sequence is taken as one or more
"illegal" code points, each counting as one character in the final string
length. Note that the length of the "backing store" bytevector for the string
is retained together with the number of code points that the string holds
(the former is stored in the header of the string's bytevector buffer, the
latter in a slot of the string).
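A minimal Python sketch of the decoding step described above, assuming that
every byte which does not begin a well-formed UTF-8 sequence is counted as one
"illegal" code point (how CHICKEN actually represents those code points is not
shown here). `count_code_points` and `SketchString` are hypothetical names: the
`length` field stands in for the code-point slot, and `len(buf)` for the byte
length kept in the bytevector header.

```python
from dataclasses import dataclass
from typing import Tuple


def _seq_len(lead: int) -> int:
    """Length of the UTF-8 sequence a lead byte announces, or 0 if it cannot start one."""
    if lead <= 0x7F:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # 0x80-0xC1 and 0xF5-0xFF never start a well-formed sequence


def _second_byte_range(lead: int) -> Tuple[int, int]:
    """Allowed range for the 2nd byte; narrow ranges reject overlongs, surrogates, > U+10FFFF."""
    if lead == 0xE0:
        return 0xA0, 0xBF
    if lead == 0xED:
        return 0x80, 0x9F
    if lead == 0xF0:
        return 0x90, 0xBF
    if lead == 0xF4:
        return 0x80, 0x8F
    return 0x80, 0xBF


def count_code_points(buf: bytes) -> int:
    """Count code points, treating each stray byte as one 'illegal' code point (assumption)."""
    count = 0
    i = 0
    n = len(buf)
    while i < n:
        need = _seq_len(buf[i])
        ok = need > 0 and i + need <= n
        if ok and need > 1:
            lo, hi = _second_byte_range(buf[i])
            ok = lo <= buf[i + 1] <= hi and all(
                0x80 <= b <= 0xBF for b in buf[i + 2:i + need]
            )
        # A well-formed sequence is consumed whole; otherwise skip one stray byte.
        i += need if ok else 1
        count += 1
    return count


@dataclass(frozen=True)
class SketchString:
    """Hypothetical model: raw backing bytes plus a code-point count computed once."""
    buf: bytes   # backing store; len(buf) plays the role of the bytevector header
    length: int  # number of code points, cached at construction

    @classmethod
    def from_bytes(cls, data: bytes) -> "SketchString":
        return cls(bytes(data), count_code_points(data))

    @property
    def byte_length(self) -> int:
        return len(self.buf)


if __name__ == "__main__":
    s = SketchString.from_bytes(b"caf\xc3\xa9 \xff!")
    print(s.byte_length, s.length)  # 8 7
```

For comparison, Python's own `bytes.decode("utf-8", "surrogateescape")` also
maps each undecodable byte to a separate code point (a lone surrogate in
U+DC80..U+DCFF), so its `len()` should agree with this count.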


felix

Current Thread

Proposed alternative encoding for stray UTF-8 bytes in strings, John Cowan, 2023/11/24
- Re: Proposed alternative encoding for stray UTF-8 bytes in strings, felix . winkelmann, 2023/11/27
  - Re: Proposed alternative encoding for stray UTF-8 bytes in strings, elf, 2023/11/27
  - Re: Proposed alternative encoding for stray UTF-8 bytes in strings, felix . winkelmann, 2023/11/27 (this message)
    - Re: Proposed alternative encoding for stray UTF-8 bytes in strings, elf, 2023/11/27
    - Re: Proposed alternative encoding for stray UTF-8 bytes in strings, felix . winkelmann, 2023/11/28