[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: string types
From: |
Tim Rühsen |
Subject: |
Re: string types |
Date: |
Mon, 6 Jan 2020 17:08:32 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.3.1 |
Hi Bruno,
On 1/6/20 1:46 PM, Bruno Haible wrote:
>>>>> - providing primitives for string allocation reduces the amount of
>>>>> buffer
>>>>> overflow bugs that otherwise occur in this area. [1]
>>>>> [1] https://lists.gnu.org/archive/html/bug-gnulib/2019-09/msg00031.html
>>> ...
>> We created a 'catch them all' string/buffer type plus API. It is a good
>> compromise for all kinds of situations, works like a memory buffer but
>> is guaranteed 0-terminated, allows custom stack buffers with fallback to
>> heap if to small.
>>
>> https://gitlab.com/gnuwget/wget2/blob/master/libwget/buffer.c
>>
>>
>> There also is a sprintf functionality (glibc compatible) using these
>> buffers - and the operation is well faster than glibc's sprintf-like
>> functions for all format strings tested (tested back a few years). The
>> code is also much smaller (380 C code lines), the return values are
>> size_t. It doesn't support float/double.
>>
>> https://gitlab.com/gnuwget/wget2/blob/master/libwget/buffer_printf.c
>>
>> If there is serious interest, I could prepare modules for gnulib.
>
> It is interesting that your solution does not only cover the simple cases
> (string concatenation, etc.), but also the more complex one, possibly
> with if()s in the generation logic, and all this without blatant potential
> for buffer overflow bugs.
>
> So, the solution would consists of the following parts:
> (A) A growable buffer type, with up to N (128 or 1024 or so) bytes on
> the stack.
Preferable, the initial size and if starting with heap or stack buffer
should be (runtime) configurable.
- initial size because it allows fine-tuning to better avoid reallocations
- initial stack if used as local / temporary buffer
- initial heap when you already know that the resulting string has to
persist the function return
Currently there is are two init functions (I leave away the wget namespace):
int buffer_init(buffer *buf, char *data, size_t size);
buffer *buffer_alloc(size_t size);
buffer_alloc creates a buffer instance on the heap and initializes it
with a heap buffer of size.
buffer_init(buf, data, date_size) initializes 'buf' with the given data
and data_size. data will not be free'd, so stack data can be used here.
buffer_init(buf, NULL, date_size) initializes 'buf' with freshly
allocated heap data of size 'data_size'.
buffer_init(buf, NULL, 0) initializes 'buf' with freshly allocated heap
data of size 128. We could leave this out - it's a currently unused
special case to avoid error handling.
Then there is
int buffer_ensure_capacity(buffer *buf, size_t size);
> (B) A set of functions for appending to such a growable buffer.
To copy a number of bytes to the beginning (effectively dropping the
previous content):
size_t buffer_memcpy(buffer *buf, const void *data, size_t length);
To append a number of bytes:
size_t buffer_memcat(buffer *buf, const void *data, size_t length);
To copy a string to the beginning (effectively dropping the previous
content):
size_t buffer_strcpy(buffer *buf, const char *s);
To append a string:
size_t buffer_strcat(buffer *buf, const char *s);
To set a number of bytes at the beginning (effectively dropping the
previous content):
size_t buffer_memset(buffer *buf, char c, size_t length);
To append a number of the same bytes:
size_t buffer_memset_append(buffer *buf, char c, size_t length);
> (C) A function for creating a heap-allocated 'char *' from a growable
> buffer.
Currently we do:
buffer buf;
buffer_init(&buf, NULL, date_size); // allocate buf.data on heap
... add stuff to buf ...
mydata = buf.data; buf.data = NULL;
buffer_deinit(&buf);
We could make up a (static inline) for this, named
void *buffer_deinit_transfer(buffer *buf);
This function could also call realloc() to shrink 'data' to it's
occupied length.
> (D) Short-hand functions for the simple cases (like string concatenation).
See above, e.g.
buffer_strcpy(buf, scheme);
buffer_strcat(buf, "://");
buffer_strcat(buf, domain);
buffer_memcat(buf, ":", 1);
buffer_strcat(buf, port_s);
buffer_memcat(buf, "/", 1);
buffer_strcat(buf, path);
But I prefer the slightly slower but better readable form
buffer_printf("%s://%s:%d/%s", scheme, domain, port, path);
Since our printf-like functions directly write into a buffer, there is
no overhead for copying data.
> It would be good to have all these well integrated (in terms of function
> names and calling conventions). So far, in gnulib, we have only pieces of
> it:
> - Module 'scratch_buffer' is (A) without (B), (C), (D).
> - Modules 'vasnprintf', 'asprintf' are (B), (C), (D) but without (A).
>
> Before you start writing the code, it's worth looking at the following
> questions:
> * Should the module 'scratch_buffer' be reused for (A)? Or is this
> not possible? If not, can it still have a memory-leak prevention
> like described in lib/malloc/scratch_buffer.h?
I don't see the advantage of the described memory-leak prevention. On
memory error scratch_buffer_grow() releases all memory, so you can
simply return ? Most functions have a 'cleanup' part anyways, which
needs to be jumped to before return. Why not add a scratch_buffer_free()
there ?
Also, scratch_buffer has IMO these flaws.
(1) The fixed initial size of 1024 on the stack. This should be tunable.
(2) If you need a scratch_buffer itself on the heap, you can't free
those initial 1024 bytes.
(3) The 'expected usage' as described in scratch_buffer.h expects the
developer to explicitly design his functions in a certain way.
> * What about reusing the complete vasnprintf.c for (B), rather than
> adding another, limited printf-like implementation?
Yes, would be nice. At least for appending we need an extra
malloc/malloc/memcpy/free.
vasnprintf reallocates RESULTBUF that means we can't use a stack
allocated buffer - thus we lose the performance advantage. Or should we
try snprintf first and fallback to vasnprintf in case of truncation ?
We want another module e.g. buffer-printf to not pull in vasnprintf when
not needing printf-like buffer functions.
Once the buffer module is done, we could think of amending vasnprintf to
better play with the buffer type.
> * Is it best to implement (D) based on (A), (B), (C), or directly
> from scratch?
Better from scratch, but the code already exists (with FSF copyright)
and has been used a lot. So it's not really from scratch, though we can
amend/optimize a few things.
Regards, Tim
signature.asc
Description: OpenPGP digital signature