bug-bash
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Why does "mapfile -d delim" (delim != '\n') use unbuffered read?


From: Koichi Murase
Subject: Re: Why does "mapfile -d delim" (delim != '\n') use unbuffered read?
Date: Tue, 4 May 2021 15:31:02 +0900

I'm still thinking about whether it is possible to use buffered read
in more cases of mapfile.

(1) Suggestion: use buffered read for a regular file for delim != '\n'?

> It won't work on any system that doesn't return -1/ESPIPE when you try
> to lseek on a terminal device.

I found a related interesting discussion in a FreeBSD mailing list:
https://lists.freebsd.org/pipermail/freebsd-hackers/2011-November/036863.html
The BUGS section mentioned in this FreeBSD discussion can be found on
the FreeBSD man page:
https://www.freebsd.org/cgi/man.cgi?query=lseek&sektion=2

lseek(2) may succeed on non-seekable streams including some terminal
device (mentioned in Chet's reply) and a tape device (mentioned in the
above FreeBSD discussion).  According to the discussion, BSD tar has
changed to use lseek(2) for the seek capability test in FreeBSD only
when the file descriptor is a regular file or a block device.

The current devel branch (builtins/mapfile.def L187..L194) is
equivalent to the following code:

#ifndef __CYGWIN__
  if (delim == '\n')
    unbuffered_read = (lseek (fd, 0L, SEEK_CUR) < 0) && (errno == ESPIPE);
  else
    unbuffered_read = 1; /* #1 */
#else
  unbuffered_read = 1;
#endif

Here I suggest modifying the line #1 in the following way so as to
switch to the buffered read when the file descriptor is a regular file
and lseek(2) succeeds on it. (BSD tar seemed to treat block devices
the same as regular files, but I'm afraid that the above-mentioned
tape device might be implemented as a block device, so I just allowed
the buffered read only for regular files):

#ifndef __CYGWIN__
  if (delim == '\n')
    unbuffered_read = (lseek (fd, 0L, SEEK_CUR) < 0) && (errno == ESPIPE);
  else
    {
      struct stat st;
      unbuffered_read = 1;
      if (fstat (fd, &st) >= 0 &&
        S_ISREG(st.st_mode) &&
        lseek (fd, 0L, SEEK_CUR) >= 0)
        unbuffered_read = 0;
    }
#else
  unbuffered_read = 1;
#endif

----
(2) Suggestion: use unbuffered_read only when the options `-n/-C' are used?

It seems that the unbuffered read is only needed when `mapfile' is
supplied with the option `-n max_nlines' or the option `-C callback'.
In the other cases, `mapfile' anyway reads the stream till the end
without stopping, so it seems that we don't have to care about the
unprocessed data in the read buffer.  What do you think of turning on
unbuffered_read only when the option `-n' or the option `-C' is
specified?

----
(3) Suggestion: change the order of `run_callback' and `zsyncfd'?

I noticed that `zsyncfd' is called before `run_callback':

builtins/mapfile.def L215..L223
> /* Has a callback been registered and if so is it time to call it? */
> if (callback && line_count && (line_count % callback_quantum) == 0)
>   {
>     run_callback (callback, array_index, line);
>
>     /* Reset the buffer for bash own stream. */
>     if (unbuffered_read == 0)
>       zsyncfd (fd);
>   }

I feel it is better to call `zsyncfd' before `run_callback'.  If the
`callback' reads data from the same file descriptor when there is
unprocessed data in the read buffer, something strange will happen.
`zsyncfd' will just shift the stream position backward by the same
number of bytes as the unprocessed data.  Since it is more natural
that `callback' finds the unprocessed data in the stream, I think
`zsyncfd' is better to be called before `run_callback'.

----
I attach a combined patch for (1)--(3).  This time, I didn't include
the special treatment on __GLIBC__, but maybe we can skip the check of
S_ISREG (st.st_mode) if __GLIBC__ is defined.

Remark: In the current devel branch, when delim == '\n' and lseek(2)
fails with errno != ESPIPE, `mapfile' still performs buffered read.  I
kept this behavior in the attached patch, but maybe this could also be
changed.  I doubt that we could still expect that `zsyncfd' will work
when lseek(2) fails (even though errno != ESPIPE).

Attachment: 0001-mapfile-buffered-read-in-more-cases.patch
Description: Binary data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]