Re: string types
From: Bruno Haible
Subject: Re: string types
Date: Fri, 27 Dec 2019 11:51:18 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-170-generic; KDE/5.18.0; x86_64; ; )

Aga wrote:
> I do not know if you can (or if it is possible, how it can be done), extract
> with a way a specific a functionality from gnulib, with the absolute necessary
> code and only that.
gnulib-tool does this. With its --avoid option, the developer can even customize
their notion of "absolutely necessary".
> In a myriad of codebases a string type is implemented at least as:
> size_t mem_size;
> size_t num_bytes;
> char *bytes;
This is actually a string-buffer type. A string type does not need two size_t
members. Long-term experience has shown that using different types for string
and string-buffer is a win, because
- a string can be put in a read-only virtual memory area, thus enforcing
immutability (-> reducing multithread problems),
- providing primitives for string allocation reduces the amount of buffer
overflow bugs that otherwise occur in this area. [1]
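A minimal sketch of the distinction, under stated assumptions: the names str_t, strbuf_t, strbuf_append and strbuf_finish are hypothetical and are not gnulib API. The buffer type keeps the three members quoted above; the string type needs only one length, and all growth is confined to a single allocating primitive.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable string: one length, no spare capacity. */
typedef struct {
    size_t len;
    const char *bytes;   /* not modified after creation */
} str_t;

/* Hypothetical string buffer: the three-member layout quoted above. */
typedef struct {
    size_t mem_size;     /* allocated bytes */
    size_t num_bytes;    /* used bytes */
    char *bytes;
} strbuf_t;

/* Append n bytes, growing the allocation as needed.
   Returns 0 on success, -1 on overflow or allocation failure. */
static int strbuf_append(strbuf_t *sb, const char *src, size_t n)
{
    if (n > SIZE_MAX - sb->num_bytes)
        return -1;                       /* size computation would overflow */
    size_t needed = sb->num_bytes + n;
    if (needed > sb->mem_size) {
        size_t new_size = sb->mem_size ? sb->mem_size : 16;
        while (new_size < needed) {
            if (new_size > SIZE_MAX / 2) {
                new_size = needed;       /* cannot double any further */
                break;
            }
            new_size *= 2;
        }
        char *p = realloc(sb->bytes, new_size);
        if (p == NULL)
            return -1;
        sb->bytes = p;
        sb->mem_size = new_size;
    }
    if (n > 0)
        memcpy(sb->bytes + sb->num_bytes, src, n);
    sb->num_bytes = needed;
    return 0;
}

/* Freeze the buffer into an exactly sized string and reset the buffer.
   Returns 0 on success, -1 on allocation failure. */
static int strbuf_finish(strbuf_t *sb, str_t *out)
{
    char *copy = malloc(sb->num_bytes ? sb->num_bytes : 1);
    if (copy == NULL)
        return -1;
    if (sb->num_bytes > 0)
        memcpy(copy, sb->bytes, sb->num_bytes);
    out->len = sb->num_bytes;
    out->bytes = copy;
    free(sb->bytes);
    sb->bytes = NULL;
    sb->mem_size = 0;
    sb->num_bytes = 0;
    return 0;
}
```

Callers never index past num_bytes themselves, so the bounds arithmetic lives in one place. The resulting str_t has no capacity field because nothing is expected to grow it.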
Unfortunately, the common string type in C is 'char *' with NUL termination,
and a different type is hard to establish
- because developers already know how to use 'char *',
- because existing functions like printf consume 'char *' strings,
- because few programs have had the need to correctly handle strings with
embedded NULs.
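To illustrate the embedded-NUL point, here is a small sketch, assuming a hypothetical length-carrying type str_view (not an existing API). Anything that goes through strlen, or through printf with %s, stops at the first NUL byte; a type that carries its length does not.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical length-carrying string view. */
typedef struct {
    size_t len;
    const char *bytes;
} str_view;

/* Build a view the 'char *' way: the length comes from NUL termination. */
static str_view str_view_from_cstr(const char *s)
{
    str_view v = { strlen(s), s };
    return v;
}

/* Build a view from an explicit byte count: embedded NULs are kept. */
static str_view str_view_from_bytes(const char *s, size_t n)
{
    str_view v = { n, s };
    return v;
}

/* Compare by length and content rather than up to the first NUL. */
static int str_view_equal(str_view a, str_view b)
{
    return a.len == b.len
           && (a.len == 0 || memcmp(a.bytes, b.bytes, a.len) == 0);
}
```

For the five bytes 'a', 'b', NUL, 'c', 'd', str_view_from_cstr sees a length of 2, while str_view_from_bytes with an explicit count of 5 keeps all of them.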
> An extended ustring (unicode|utf8) type can include information for its bytes
> with character semantics, like:
> (utf8 typedef'ed as signed int)
> utf8 code; // the integer representation
> int len; // the number of the needed bytes
> int width; // the number of the occupied cells
> char buf[5]; // and probably the character representation
Such a type would have a niche use, IMO, because
- 99% of the processing would not need to access the width (screen columns) -
so why spend CPU time and RAM to store it and keep it up-to-date?
- 80% of the processing does not care about the Unicode code points either,
and libraries like libunistring can do the Unicode-aware processing.
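As a rough illustration of computing such per-character data on demand instead of storing it, here is a sketch of a hand-written UTF-8 decoder. The names utf8_decode and utf8_count are hypothetical; this is not libunistring's API, and real code would more likely call into such a library. Width in screen columns is deliberately left out, since it would also be derived only when needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from the first bytes of s (n bytes available).
   Returns the number of bytes consumed, or 0 if the input is empty,
   truncated, or not valid UTF-8. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;
    unsigned char b0 = s[0];
    size_t len;
    uint32_t c, min;
    if (b0 < 0x80) {
        *cp = b0;
        return 1;
    } else if ((b0 & 0xE0) == 0xC0) {
        len = 2; c = b0 & 0x1F; min = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {
        len = 3; c = b0 & 0x0F; min = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {
        len = 4; c = b0 & 0x07; min = 0x10000;
    } else {
        return 0;                        /* stray continuation or invalid lead byte */
    }
    if (n < len)
        return 0;                        /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                    /* not a continuation byte */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

/* Count the code points in n bytes; returns (size_t)-1 on invalid input. */
static size_t utf8_count(const unsigned char *s, size_t n)
{
    size_t count = 0;
    while (n > 0) {
        uint32_t cp;
        size_t used = utf8_decode(s, n, &cp);
        if (used == 0)
            return (size_t)-1;
        s += used;
        n -= used;
        count++;
    }
    return count;
}
```

The string itself stays a plain byte sequence; the code point and its byte length exist only for the duration of the call that asks for them.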
> But the programmer mind would be probably best if could concentrate to how to
> express the thought (with whatever meaning of what we are calling "thought")
> and follow this flow, or if could concentrate the energy to understand the
> intentions (while reading) of the code (instead of wasting self with the
> "details" of the code) and finally to the actual algorithm (usually conditions
> that can or can't be met).
That is the idea behind the container types (list, map) in gnulib. However, I
don't see how to reasonably transpose this principle to string types.
Bruno
[1] https://lists.gnu.org/archive/html/bug-gnulib/2019-09/msg00031.html