bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-gnulib] getline & getline_safe


From: Derek Robert Price
Subject: Re: [Bug-gnulib] getline & getline_safe
Date: Fri, 25 Jul 2003 10:17:48 -0400
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 Netscape/7.1

Bruno Haible wrote:

Derek Robert Price wrote:
I still don't agree. In the single case where the number of characters
read (the return value of the function) is the same as the character
limit passed to it, the caller can test the last character (ignoring
the NUL byte) of it's buffer to see if it contains a delimiter or not.
Thus, with a character limit specified, the edge case is detectable and
can be dealt with in something like a typical while(read) loop, where
each time the buffer is filled, it is dealt with and the next block of
bytes are read until the caller is satisfied.

And what can the program do inside this while(read) loop, assuming that
it absolutely wants to avoid unlimited memory growth? It cannot accumulate
the result, since that would take too much space. It cannot easily apply a
regexp, since the regex function which works on a concatenation of two
strings is not portable (a GNU extension). No sane programmer will
really want to handle all of the line's pieces *and* limit the memory.


You do assume here that the regexp would need to be applied to the whole line. It is certainly possible that regexp or some other means could parse usable tokens off the beginning of the line. How do you parse language? Not many people remember everything word-for-word. Your brain converts words into more easily digestible and manipulatable "concepts" which you probably remember. This sort of functionality can be useful to a computer program too.

Perhaps a sane programmer knows that each "line" may be very long but that it is usually more efficient to start a subprocess to deal with early chunks of the data when the line exceeds a certain length.

I'm sure I could come up with more examples. The point is that I like to apply principles of open-ended and extensibility to my software before sending it out into common usage to avoid rewriting APIs or having to create new functions to do mostly the same thing.

Indeed, in CVS itself you don't need the remainder of the line, you
just bail out like this:
http://ccvs.cvshome.org/source/browse/ccvs/src/server.c?rev=1.307&content-type=text/x-cvsweb-markup&sortby=log

   if( getnline( &tmp, &tmp_allocated, PATH_MAX, stdin ) < 0 )
        {
            error (1, 0, "bad auth protocol start: EOF");
        }

And then later in the same file:

   getnline( &repository, &repository_allocated, PATH_MAX, stdin );
   getnline( &username, &username_allocated, PATH_MAX, stdin );
   getnline( &password, &password_allocated, PATH_MAX, stdin );

So you see, the average programmer does not want to get into complications,
reading the line piece by piece, doing a while(read) loop etc. He wants
to emit just 1 getnline() call per line. And for this case it is better
to truncate the username after PATH_MAX bytes than to pretend that the
password were the remainder of the username's line.


In CVS, there is no point to doing things this way. If the repository exceeds PATH_MAX, it won't be found anyhow. As for possibly parsing username and password off of the remainder of the line, this has no more chance and probably less of accidentally hitting a working username & password than an attacker would with a dictionary attack.

Now that you point it out however, I think I may make maxing the allocation an error here to avoid even a chance of attempting a username and password that were not intentionally presented as such.

I'll grant that CVS not actually needing this functionality just now is almost enough to make me drop this, but I know that someday it or somebody else might and my pride won't let me release non-extensible code without a fight.

As for "line-oriented protocols", I would assume the developer knew what
they were doing when they made a call to "getnline" as opposed to getline.

When I see the code cited above, from src/server.c, I'm not so sure about
this.

And when I read Karl Fogel's original mail
http://www.mail-archive.com/address@hidden/msg00327.html
where he talks about "impose appropriate restrictions on line length"
I'm sure he did not mean to fake newline characters where there are none.


I'll grant this and fix it as per the above.

Also please consider simplicity!

Correctness goes over simplicity.


I still maintain that limiting the characters read and not the memory allocation is the correct method. Limiting the characters read can be applied to the memory allocation case but not visa versa.

You've also neglected to comment on the reordering of the arguments to
getndelim2.  The new order is much more intuitive

I can hardly argue about matters of taste. For questions of taste, polls
are better than just my or just your opinion.


I don't think this is strictly a matter of taste. The pattern has already been established in the previous functions:

extern ssize_t getline (char **_lineptr, size_t *_linesize, FILE *_stream);

extern ssize_t getdelim (char **_lineptr, size_t *_linesize, int _delimiter,
                         FILE *_stream);

extern ssize_t getnline (char **lineptr, size_t *linesize, size_t nmax,
                        FILE *stream);

extern ssize_t getndelim (char **lineptr, size_t *linesize, size_t nmax,
                         int delimiter, FILE *stream);


Note that the FILE * keeps being pushed out by new arguments. The same goes for the delimiter arguments relative to the size_t nmax (which should be limit) argument. In contrast, getndelim2 suddenly reshuffles all the arguments:

extern ssize_t getndelim2 (char **lineptr, size_t *linesize, size_t nmax,
                          FILE *stream, int delim1, int delim2,
                          size_t offset);


I am arguing in favor of consistency to make the function more easily accessible to humans who are familiar with its sister functions. I also think that the grouping of related arguments make the function API more accessible to humans. This is not strictly a style issue. It is an organizational, in some ways a psychological (perhaps ergonomics would be an easier word to apply), issue, and it seems that many others design similarly. For instance in the regex.c from GNULIB:

extern int re_search_2 (struct re_pattern_buffer *buffer, const char *string1,
                       int length1, const char *string2, int length2,
int start, int range, struct re_registers *regs, int stop);

Note that the int which describes the length of a string always follows the string it describes. Likewise, regex.c functions which contain struct re_registers arguments and int stop arguments always put them at the end of the argument list.

Finally, the original bug reported by my user was a typing conflict
which occurred on 64-bit systems where ssize_t was not the same thing as
an int.  Now, I'm not sure how they managed to get past your $if
__GLIBC__ < 2 on that system to hit the prototypes in getline.h, but
they did.

It's very simple: The original report
http://ccvs.cvshome.org/issues/show_bug.cgi?id=130
dates from 2003-07-13, and from
http://ccvs.cvshome.org/source/browse/ccvs/lib/getline.h
you can see that your lib/getline.h did not have a #if protection at
all at that time.


Hrm.  Yep.  Sorry I missed that.  Thanks.

I switched the header to switch on HAVE_WORKING_GETLINE &
HAVE_GETDELIM as detected by configure

This is not necessary. since from the gettext releases I know that
the "#if __GLIBC__ < 2" condition is the right one for all existing
systems, including glibc-2.0.x and Linux libc5. And btw, setting
HAVE_GETDELIM without testing for it explicitly is against the
spirit of autoconf. It may break other modules.

Okay, I have to grant you that as well. Thanks again. I'll merge these two items.

New shorter patch follows:

Index: lib/getline.c
===================================================================
RCS file: /cvsroot/gnulib/gnulib/lib/getline.c,v
retrieving revision 1.15
diff -u -r1.15 getline.c
--- lib/getline.c       24 Jul 2003 15:23:15 -0000      1.15
+++ lib/getline.c       25 Jul 2003 14:10:07 -0000
@@ -26,12 +26,6 @@
/* Specification.  */
#include "getline.h"

-#include <stddef.h>
-#include <stdio.h>
-
-/* Get ssize_t.  */
-#include <sys/types.h>
-
#if defined __GNU_LIBRARY__ && HAVE_GETDELIM

ssize_t
@@ -47,12 +41,14 @@
ssize_t
getline (char **lineptr, size_t *linesize, FILE *stream)
{
-  return getndelim2 (lineptr, linesize, (size_t)(-1), stream, '\n', 0, 0);
+  return getndelim2 (lineptr, linesize, 0, GETNDELIM_NO_LIMIT, '\n', 0,
+                     stream);
}

ssize_t
getdelim (char **lineptr, size_t *linesize, int delimiter, FILE *stream)
{
-  return getndelim2 (lineptr, linesize, (size_t)(-1), stream, delimiter, 0, 0);
+  return getndelim2 (lineptr, linesize, 0, GETNDELIM_NO_LIMIT, delimiter, 0,
+                     stream);
}
#endif
Index: lib/getndelim2.c
===================================================================
RCS file: /cvsroot/gnulib/gnulib/lib/getndelim2.c,v
retrieving revision 1.2
diff -u -r1.2 getndelim2.c
--- lib/getndelim2.c    18 Jul 2003 16:58:06 -0000      1.2
+++ lib/getndelim2.c    25 Jul 2003 14:10:07 -0000
@@ -1,5 +1,5 @@
-/* getndelim2 - Read a line from a stream, stopping at one of 2 delimiters,
-   with bounded memory allocation.
+/* getndelim2 - Read n characters or less from a stream, stopping at one of up
+   to two specified delimiters.

   Copyright (C) 1993, 1996, 1997, 1998, 2000, 2003 Free Software
   Foundation, Inc.
@@ -39,23 +39,18 @@
#define MIN_CHUNK 64

ssize_t
-getndelim2 (char **lineptr, size_t *linesize, size_t nmax,
-           FILE *stream, int delim1, int delim2, size_t offset)
+getndelim2 (char **lineptr, size_t *linesize, size_t offset, size_t limit,
+            int delim1, int delim2, FILE *stream)
{
  size_t nbytes_avail;          /* Allocated but unused chars in *LINEPTR.  */
  char *read_pos;               /* Where we're reading into *LINEPTR. */

-  if (!lineptr || !linesize || !nmax || !stream)
+  if (!lineptr || !linesize || !stream)
    return -1;

  if (!*lineptr)
    {
-      size_t newlinesize = MIN_CHUNK;
-
-      if (newlinesize > nmax)
-       newlinesize = nmax;
-
-      *linesize = newlinesize;
+      *linesize = MIN_CHUNK;
      *lineptr = malloc (*linesize);
      if (!*lineptr)
        return -1;
@@ -67,36 +62,35 @@
  nbytes_avail = *linesize - offset;
  read_pos = *lineptr + offset;

-  if (nbytes_avail == 0 && *linesize >= nmax)
-    return -1;
-
  for (;;)
    {
      /* Here always *lineptr + *linesize == read_pos + nbytes_avail.  */
+      register int c;
+
+      if (limit == 0)
+       break;
+
+      c = getc (stream);

-      register int c = getc (stream);
+      if (limit != GETNDELIM_NO_LIMIT)
+       limit--;

      /* We always want at least one char left in the buffer, since we
         always (unless we get an error while reading the first char)
         NUL-terminate the line buffer.  */

-      if (nbytes_avail < 2 && *linesize < nmax)
+      if (nbytes_avail < 2)
        {
-         size_t newlinesize =
-           (*linesize > MIN_CHUNK ? 2 * *linesize : *linesize + MIN_CHUNK);
-
-         if (newlinesize > nmax)
-           newlinesize = nmax;
+         if (*linesize > MIN_CHUNK)
+           *linesize *= 2;
+         else
+           *linesize += MIN_CHUNK;

-         if (newlinesize > *linesize)
-           {
-             *linesize = newlinesize;
-             nbytes_avail = *linesize + *lineptr - read_pos;
-             *lineptr = realloc (*lineptr, *linesize);
-             if (!*lineptr)
-               return -1;
-             read_pos = *linesize - nbytes_avail + *lineptr;
-           }
+         nbytes_avail = *linesize + *lineptr - read_pos;
+         *lineptr = realloc (*lineptr, *linesize);
+         if (!*lineptr)
+           return -1;
+         read_pos = *linesize - nbytes_avail + *lineptr;
        }

      if (c == EOF || ferror (stream))
@@ -108,11 +102,8 @@
            break;
        }

-      if (nbytes_avail >= 2)
-       {
-         *read_pos++ = c;
-         nbytes_avail--;
-       }
+      *read_pos++ = c;
+      nbytes_avail--;

      if (c == delim1 || (delim2 && c == delim2))
        /* Return the line.  */
Index: lib/getndelim2.h
===================================================================
RCS file: /cvsroot/gnulib/gnulib/lib/getndelim2.h,v
retrieving revision 1.1
diff -u -r1.1 getndelim2.h
--- lib/getndelim2.h    18 Jul 2003 16:58:06 -0000      1.1
+++ lib/getndelim2.h    25 Jul 2003 14:10:07 -0000
@@ -1,5 +1,5 @@
-/* getndelim2 - Read a line from a stream, stopping at one of 2 delimiters,
-   with bounded memory allocation.
+/* getndelim2 - Read n characters or less from a stream, stopping at one of up
+   to two specified delimiters.

   Copyright (C) 2003 Free Software Foundation, Inc.

@@ -26,17 +26,17 @@
/* Get ssize_t.  */
#include <sys/types.h>

+#define GETNDELIM_NO_LIMIT (ssize_t)-1
+
/* Read up to (and including) a delimiter DELIM1 from STREAM into *LINEPTR
   + OFFSET (and NUL-terminate it).  If DELIM2 is non-zero, then read up
   and including the first occurrence of DELIM1 or DELIM2.  *LINEPTR is
   a pointer returned from malloc (or NULL), pointing to *LINESIZE bytes of
-   space.  It is realloc'd as necessary.  Reallocation is limited to
-   NMAX bytes; if the line is longer than that, the extra bytes are read but
-   thrown away.
+   space.  It is realloc'd as necessary.  Read no more than LIMIT bytes.
   Return the number of bytes read and stored at *LINEPTR + OFFSET (not
   including the NUL terminator), or -1 on error or EOF.  */
-extern ssize_t getndelim2 (char **lineptr, size_t *linesize, size_t nmax,
-                          FILE *stream, int delim1, int delim2,
-                          size_t offset);
+extern ssize_t getndelim2 (char **_lineptr, size_t *_linesize, size_t _offset,
+                           size_t _limit, int _delim1, int _delim2,
+                           FILE *_stream);

#endif /* GETNDELIM2_H */
Index: lib/getnline.c
===================================================================
RCS file: /cvsroot/gnulib/gnulib/lib/getnline.c,v
retrieving revision 1.2
diff -u -r1.2 getnline.c
--- lib/getnline.c      18 Jul 2003 16:58:06 -0000      1.2
+++ lib/getnline.c      25 Jul 2003 14:10:07 -0000
@@ -1,4 +1,4 @@
-/* getnline - Read a line from a stream, with bounded memory allocation.
+/* getnline - Read a line of n characters or less from a stream.

   Copyright (C) 2003 Free Software Foundation, Inc.

@@ -26,14 +26,14 @@
#include "getndelim2.h"

ssize_t
-getnline (char **lineptr, size_t *linesize, size_t nmax, FILE *stream)
+getnline (char **lineptr, size_t *linesize, size_t limit, FILE *stream)
{
-  return getndelim2 (lineptr, linesize, (size_t)(-1), stream, '\n', 0, 0);
+  return getndelim2 (lineptr, linesize, 0, limit, '\n', 0, stream);
}

ssize_t
-getndelim (char **lineptr, size_t *linesize, size_t nmax,
-          int delimiter, FILE *stream)
+getndelim (char **lineptr, size_t *linesize, size_t limit, int delimiter,
+           FILE *stream)
{
-  return getndelim2 (lineptr, linesize, (size_t)(-1), stream, delimiter, 0, 0);
+  return getndelim2 (lineptr, linesize, 0, limit, delimiter, 0, stream);
}
Index: lib/getnline.h
===================================================================
RCS file: /cvsroot/gnulib/gnulib/lib/getnline.h,v
retrieving revision 1.1
diff -u -r1.1 getnline.h
--- lib/getnline.h      17 Jul 2003 16:23:52 -0000      1.1
+++ lib/getnline.h      25 Jul 2003 14:10:07 -0000
@@ -1,4 +1,4 @@
-/* getnline - Read a line from a stream, with bounded memory allocation.
+/* getnline - Read a line of n characters or less from a stream.

   Copyright (C) 2003 Free Software Foundation, Inc.

@@ -27,23 +27,21 @@

/* Read a line, up to the next newline, from STREAM, and store it in *LINEPTR.
   *LINEPTR is a pointer returned from malloc (or NULL), pointing to *LINESIZE
-   bytes of space.  It is realloc'd as necessary.  Reallocation is limited to
-   NMAX bytes; if the line is longer than that, the extra bytes are read but
-   thrown away.
+   bytes of space.  It is realloc'd as necessary.  Read a maximum of LIMIT
+   bytes.
   Return the number of bytes read and stored at *LINEPTR (not including the
   NUL terminator), or -1 on error or EOF.  */
-extern ssize_t getnline (char **lineptr, size_t *linesize, size_t nmax,
-                        FILE *stream);
+extern ssize_t getnline (char **_lineptr, size_t *_linesize, size_t limit,
+                         FILE *_stream);

/* Read a line, up to the next occurrence of DELIMITER, from STREAM, and store
   it in *LINEPTR.
   *LINEPTR is a pointer returned from malloc (or NULL), pointing to *LINESIZE
-   bytes of space.  It is realloc'd as necessary.  Reallocation is limited to
-   NMAX bytes; if the line is longer than that, the extra bytes are read but
-   thrown away.
+   bytes of space.  It is realloc'd as necessary.  Read a maximum of LIMIT
+   bytes.
   Return the number of bytes read and stored at *LINEPTR (not including the
   NUL terminator), or -1 on error or EOF.  */
-extern ssize_t getndelim (char **lineptr, size_t *linesize, size_t nmax,
-                         int delimiter, FILE *stream);
+extern ssize_t getndelim (char **_lineptr, size_t *_linesize, size_t limit,
+                          int _delimiter, FILE *_stream);

#endif /* GETNLINE_H */



Derek

--
               *8^)

Email: address@hidden

Get CVS support at <http://ximbiot.com>!
--
"It's difficult to work in a group when you're omnipotent."

        -Q, "Deja Q"






reply via email to

[Prev in Thread] Current Thread [Next in Thread]