bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: xmemdup0


From: Eric Blake
Subject: Re: xmemdup0
Date: Wed, 14 May 2008 06:43:49 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080421 Thunderbird/2.0.0.14 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Bruno Haible on 5/9/2008 4:13 PM:
|
| I would prefer if you could move it to a module of its own.

Sure.  Done as follows.

|
| We have two data types:

This statement should be qualified by "when dealing with byte arrays".

|   (M) memory with arbitrary contents, where the start address and the length
|       are given,
|   (S) memory that is terminated with a NUL byte and otherwise does not
contain
|       NULs; here only the start address is usually given.

Actually, I consider there to be a third type:

~    (M0) memory with arbitrary contents, where the start address and
length are given, and where one past the length is guaranteed to be NUL

Functions such as Posix 200x' getline() and open_memstream(), glibc's
obstack_grow0 and obstack_copy0, and gnulib's getndelim2() intentionally
use (M0), not (M) or (S), as their underlying byte array type.

|
| xmemdup0 is a function to convert from (M) to (S) in certain situations.

Yes, but it is more than that.  xmemdup0 is also a function to create a
substring of type (M0) from type (M) or (S); I have several use cases for
this in m4, as part of my series of patches converting m4 to pass embedded
NUL transparently from input to output.  For example, m4_format(address@hidden,
1) expands to three characters, 'a', '\0', '1'.  Now consider
m4_format(address@hidden): it turns out that checking for valid % sequences
is somewhat faster (fewer conditionals per iteration of the outer loop) if
I can always blindly read the bytes after % until determining that one of
the bytes created an invalid specifier (the ^@, guaranteed by (M0), is
always an invalid specifier), rather than checking on encountering each %
and successive byte whether it falls in the last byte of the string (which
I would have to do if I only had (M)), and different than stopping at the
first NUL (which it used to do, when m4 used (S) instead of (M0)).  Only
when encountering ^@ do I then decide if I saw an invalid sequence %^@ in
the middle, or an unterminated % at the end (and even then, only so that
the error message is more informative).  Another instance of faster
processing due to (M0) is escape processing in the replacement string of
regexp and patsubst builtins; GNU regex can transparently handle NUL, and
identifying dangling \ is faster when a read of \^@ is guaranteed.

| I don't deny that such a function can occasionally be useful. But it can
| also easily be the source of several kinds of bugs:
|
|   Bug 1: What if the memory contents contains a NUL? You do nothing.

Yes, by design.

| No conversion of a NUL to a backslashed syntax or other escaping?
|          Nothing? It would be better to call abort() in this case, then.

Absolutely not.  xmemdup0 is doing what it was designed for, creating an
instance of (M0).

|
|   Bug 2: Off-by-one errors. They are also latent when you convert from
|          (S) to (M), but here as well.

xmemdup0(s, len) is probably less prone to off-by-one errors than
xmemdup(s, len + 1), in cases where it is assumed that s already has a
trailing NUL byte.

|   - 19 places where the task is to extract a substring from a string (S).
|     Here it is guaranteed that there is no embedded NUL. The function
|     does not need to check it; it could be called
|         char * xsubstring (char *ptr, size_t len);
|
|   - 9 places where the task is to transform some memory (M) into (S).
|     Here the name xmemdup0 is appropriate (or xmemtostr ?), and it
|     should better verify that the memory does not contain NUL bytes.
|     (Better abort() than produce wrong results.)

It seems like both xsubstring (blind copy of (S) to (S), without worry
about embedded NUL) and xmemtostr (checking copy, and abort() on embedded
NUL) are useful, but neither has the same semantics as xmemdup0.

| The result type should be 'char *', in C++ and also in C. 'unsigned char *'
| is not a useful return type, because users want a result in data type (S),
| in particular they want strlen() to be applicable to it.

I originally used void* only because xmemdup does.  But you've convinced
me - xmemdup0 makes sense only when treating the memory block as an array
of bytes, at which point returning char* makes it much easier to manage
(in converting m4 to transparently handle NUL, I have found a number of
places where having mem* return char* instead of void* would be so much
more convenient).

Meanwhile, should we be using __attribute__((__malloc__)) on xmemdup0 and
the various xalloc.h functions, to aid in gcc optimization?

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkgq3f8ACgkQ84KuGfSFAYAOOACfQ0k+zkizodcJj6/cxUiK2zNa
ekkAoLpkatizQT9jmKOss1q1kCnEGtqP
=K/Up
-----END PGP SIGNATURE-----
>From 929dcebaff7c1ee0501e83ee6e3016aa6d95b4cf Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Tue, 13 May 2008 21:28:43 -0600
Subject: [PATCH] Split xmemdup0 into its own module.

* modules/xmemdup0: New file.
* lib/xmemdup0.h: Likewise.
* lib/xmemdup0.c: Likewise.
* MODULES.html.sh (Memory management functions): Add xmemdup0.
* lib/xalloc.h (xmemdup0): Remove.
* lib/xmalloc.c (xmemdup0): Likewise.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog        |   10 ++++++++++
 MODULES.html.sh  |    1 +
 lib/xalloc.h     |    7 -------
 lib/xmalloc.c    |   18 ------------------
 lib/xmemdup0.c   |   44 ++++++++++++++++++++++++++++++++++++++++++++
 lib/xmemdup0.h   |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 modules/xmemdup0 |   23 +++++++++++++++++++++++
 7 files changed, 129 insertions(+), 25 deletions(-)
 create mode 100644 lib/xmemdup0.c
 create mode 100644 lib/xmemdup0.h
 create mode 100644 modules/xmemdup0

diff --git a/ChangeLog b/ChangeLog
index 147ea61..70035ae 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,4 +1,14 @@
 2008-05-13  Eric Blake  <address@hidden>
+
+       Split xmemdup0 into its own module.
+       * modules/xmemdup0: New file.
+       * lib/xmemdup0.h: Likewise.
+       * lib/xmemdup0.c: Likewise.
+       * MODULES.html.sh (Memory management functions): Add xmemdup0.
+       * lib/xalloc.h (xmemdup0): Remove.
+       * lib/xmalloc.c (xmemdup0): Likewise.
+
+2008-05-13  Eric Blake  <address@hidden>
             Bruno Haible  <address@hidden>
 
        Reduce number of forks required during autoconf.
diff --git a/MODULES.html.sh b/MODULES.html.sh
index f07d11c..f390271 100755
--- a/MODULES.html.sh
+++ b/MODULES.html.sh
@@ -1609,6 +1609,7 @@ func_all_modules ()
   func_module alloca-opt
   func_module malloca
   func_module xmalloca
+  func_module xmemdup0
   func_end_table
 
   element="Integer arithmetic functions <stdlib.h>"
diff --git a/lib/xalloc.h b/lib/xalloc.h
index 314ed6d..40dcf4b 100644
--- a/lib/xalloc.h
+++ b/lib/xalloc.h
@@ -50,7 +50,6 @@ void *xcalloc (size_t n, size_t s);
 void *xrealloc (void *p, size_t s);
 void *x2realloc (void *p, size_t *pn);
 void *xmemdup (void const *p, size_t s);
-void *xmemdup0 (void const *p, size_t s);
 char *xstrdup (char const *str);
 
 /* Return 1 if an array of N objects, each of size S, cannot exist due
@@ -265,12 +264,6 @@ xmemdup (T const *p, size_t s)
   return (T *) xmemdup ((void const *) p, s);
 }
 
-template <typename T> inline T *
-xmemdup0 (T const *p, size_t s)
-{
-  return (T *) xmemdup0 ((void const *) p, s);
-}
-
 # endif
 
 
diff --git a/lib/xmalloc.c b/lib/xmalloc.c
index b1f6993..89ecf17 100644
--- a/lib/xmalloc.c
+++ b/lib/xmalloc.c
@@ -113,24 +113,6 @@ xmemdup (void const *p, size_t s)
   return memcpy (xmalloc (s), p, s);
 }
 
-/* Clone an object P of size S, with error checking, and include a
-   terminating NUL byte.
-
-   The terminating NUL makes it safe to use strlen or rawmemchr to
-   check for embedded NUL; it also speeds up algorithms such as escape
-   sequence processing on arbitrary memory, by making it always safe
-   to read the byte after the escape character rather than having to
-   check if each escape character is the last byte in the object.  */
-
-void *
-xmemdup0 (void const *p, size_t s)
-{
-  char *result = xcharalloc (s + 1);
-  memcpy (result, p, s);
-  result[s] = 0;
-  return result;
-}
-
 /* Clone STRING.  */
 
 char *
diff --git a/lib/xmemdup0.c b/lib/xmemdup0.c
new file mode 100644
index 0000000..0c20111
--- /dev/null
+++ b/lib/xmemdup0.c
@@ -0,0 +1,44 @@
+/* xmemdup0.c -- copy a block of arbitrary bytes, plus a trailing NUL
+
+   Copyright (C) 2008 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <config.h>
+
+#include "xmemdup0.h"
+#include "xalloc.h"
+
+#include <string.h>
+
+/* Clone an arbitrary block of bytes P of size S, with error checking,
+   and include a terminating NUL byte.  P is of type `void const *',
+   to make it easier to use this with other mem* functions that return
+   `void *', but since appending a NUL byte only makes sense on bytes,
+   the return type is `char *'.
+
+   The terminating NUL makes it safe to use strlen or rawmemchr to
+   check for embedded NUL; it also speeds up algorithms such as escape
+   sequence processing on arbitrary memory, by making it always safe
+   to read the byte after the escape character rather than having to
+   check if each escape character is the last byte in the object.  */
+
+char *
+xmemdup0 (void const *p, size_t s)
+{
+  char *result = xcharalloc (s + 1);
+  memcpy (result, p, s);
+  result[s] = 0;
+  return result;
+}
diff --git a/lib/xmemdup0.h b/lib/xmemdup0.h
new file mode 100644
index 0000000..ca19542
--- /dev/null
+++ b/lib/xmemdup0.h
@@ -0,0 +1,51 @@
+/* xmemdup0.h -- copy a block of arbitrary bytes, plus a trailing NUL
+
+   Copyright (C) 2008 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef XMEMDUP_H_
+# define XMEMDUP_H_
+
+# include <stddef.h>
+
+
+# ifdef __cplusplus
+extern "C" {
+# endif
+
+# ifndef __attribute__
+#  if __GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 8)
+#   define __attribute__(x)
+#  endif
+# endif
+
+# ifndef ATTRIBUTE_NORETURN
+#  define ATTRIBUTE_NORETURN __attribute__ ((__noreturn__))
+# endif
+
+/* This function is always triggered when memory is exhausted.
+   It must be defined by the application, either explicitly
+   or by using gnulib's xalloc-die module.  This is the
+   function to call when one wants the program to die because of a
+   memory allocation failure.  */
+extern void xalloc_die (void) ATTRIBUTE_NORETURN;
+
+char *xmemdup0 (void const *p, size_t s);
+
+# ifdef __cplusplus
+}
+# endif
+
+#endif /* !XMEMDUP0_H_ */
diff --git a/modules/xmemdup0 b/modules/xmemdup0
new file mode 100644
index 0000000..d7ea438
--- /dev/null
+++ b/modules/xmemdup0
@@ -0,0 +1,23 @@
+Description:
+Copy a block of arbitrary bytes, and append a trailing NUL.
+
+Files:
+lib/xmemdup0.h
+lib/xmemdup0.c
+
+Depends-on:
+xalloc
+
+configure.ac:
+AC_LIBOBJ([xmemdup0])
+
+Makefile.am:
+
+Include:
+"xmemdup0.h"
+
+License:
+GPL
+
+Maintainer:
+Eric Blake
-- 
1.5.5.1


>From 4b81fbec65dc291f0ecd74f5f0115d9947daa421 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Wed, 14 May 2008 06:12:16 -0600
Subject: [PATCH] Test xmemdup0.

* modules/xmemdup0-tests: New file.
* tests/test-xmemdup0.c: Likewise.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog              |    6 +++
 modules/xmemdup0-tests |   12 ++++++
 tests/test-xmemdup0.c  |   90 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 108 insertions(+), 0 deletions(-)
 create mode 100644 modules/xmemdup0-tests
 create mode 100644 tests/test-xmemdup0.c

diff --git a/ChangeLog b/ChangeLog
index 70035ae..4962795 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2008-05-14  Eric Blake  <address@hidden>
+
+       Test xmemdup0.
+       * modules/xmemdup0-tests: New file.
+       * tests/test-xmemdup0.c: Likewise.
+
 2008-05-13  Eric Blake  <address@hidden>
 
        Split xmemdup0 into its own module.
diff --git a/modules/xmemdup0-tests b/modules/xmemdup0-tests
new file mode 100644
index 0000000..bcbc7ab
--- /dev/null
+++ b/modules/xmemdup0-tests
@@ -0,0 +1,12 @@
+Files:
+tests/test-xmemdup0.c
+
+Depends-on:
+progname
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-xmemdup0
+check_PROGRAMS += test-xmemdup0
+test_xmemdup0_LDADD = $(LDADD) @LIBINTL@
diff --git a/tests/test-xmemdup0.c b/tests/test-xmemdup0.c
new file mode 100644
index 0000000..7295cc0
--- /dev/null
+++ b/tests/test-xmemdup0.c
@@ -0,0 +1,90 @@
+/* Test of xmemdup0() function.
+   Copyright (C) 2008 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */
+
+/* Written by Eric Blake <address@hidden>, 2008.  */
+
+#include <config.h>
+
+#include "xmemdup0.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define ASSERT(expr) \
+  do                                                                        \
+    {                                                                       \
+      if (!(expr))                                                          \
+        {                                                                   \
+          fprintf (stderr, "%s:%d: assertion failed\n", __FILE__, __LINE__); \
+          fflush (stderr);                                                  \
+          abort ();                                                         \
+        }                                                                   \
+    }                                                                       \
+  while (0)
+
+int
+main (int argc, char **argv)
+{
+  char buffer[10] = { 'a', 'b', 'c', 'd', '\0',
+                     'f', 'g', 'h', 'i', 'j'   };
+
+  /* Empty string.  */
+  {
+    char *result = xmemdup0 (NULL, 0);
+    ASSERT (result);
+    ASSERT (!*result);
+    free (result);
+  }
+  {
+    char *result = xmemdup0 ("", 0);
+    ASSERT (result);
+    ASSERT (!*result);
+    free (result);
+  }
+
+  /* Various buffer lengths.  */
+  {
+    char *result = xmemdup0 (buffer, 4);
+    ASSERT (result);
+    ASSERT (strcmp (result, buffer) == 0);
+    free (result);
+  }
+  {
+    char *result = xmemdup0 (buffer, 5);
+    ASSERT (result);
+    ASSERT (strcmp (result, buffer) == 0);
+    ASSERT (result[5] == '\0');
+    free (result);
+  }
+  {
+    char *result = xmemdup0 (buffer, 9);
+    ASSERT (result);
+    ASSERT (memcmp (result, buffer, 9) == 0);
+    ASSERT (result[9] == '\0');
+    free (result);
+  }
+  {
+    char *result = xmemdup0 (buffer, 10);
+    ASSERT (result);
+    ASSERT (memcmp (result, buffer, 10) == 0);
+    ASSERT (result[10] == '\0');
+    free (result);
+  }
+
+  return 0;
+}
-- 
1.5.5.1


reply via email to

[Prev in Thread] Current Thread [Next in Thread]