bug-bash

Why does "mapfile -d delim" (delim != '\n') use unbuffered read?


From: Koichi Murase
Subject: Why does "mapfile -d delim" (delim != '\n') use unbuffered read?
Date: Sun, 2 May 2021 22:51:54 +0900

Maybe I'm asking a stupid question, but, as in the subject, why does
the builtin "mapfile -d delim" use unbuffered reads when delim != '\n'?
Can we use buffered reads for seekable file descriptors, the same as
in the `delim == '\n'' case?

Background:

`mapfile' can efficiently read entries delimited by LF from a file.
The following mapfile call runs in 67 msec on my machine
(x86_64-redhat-linux-gnu, bash-5.0.11):

$ a=({000000..500000})
$ printf '%s\n' "${a[@]}" > tmp
$ time mapfile -t a < tmp

However, when `mapfile' reads the same entries delimited by NUL
instead, it takes over 20 times as long (1603 msec) on my machine:

$ printf '%s\0' "${a[@]}" > tmp
$ time mapfile -d '' a < tmp
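
For anyone who wants to reproduce the comparison in one go, a minimal
sketch follows. It is not part of the measurements above: the smaller
entry count, the temporary directory, and the variable names are
assumptions chosen to keep the run short. It assumes bash 4.4 or later
(for `mapfile -d'), and it also checks that both delimiters yield the
same array.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction script; names and entry count are assumptions.
tmpdir=$(mktemp -d)

# Zero-padded entries, as in the commands above but with a smaller range.
entries=({000000..100000})
printf '%s\n' "${entries[@]}" > "$tmpdir/lf"   # LF-delimited copy
printf '%s\0' "${entries[@]}" > "$tmpdir/nul"  # NUL-delimited copy

# `time' is the bash keyword; its report goes to stderr.
time mapfile -t lf_arr < "$tmpdir/lf"
time mapfile -t -d '' nul_arr < "$tmpdir/nul"

# Both runs should have produced identical arrays.
if [[ "${lf_arr[*]}" == "${nul_arr[*]}" ]]; then
  echo "same contents: ${#lf_arr[@]} entries"
else
  echo "contents differ"
fi

rm -rf "$tmpdir"
```

The absolute timings will depend on the machine and the bash version;
only the ratio between the two runs is of interest here.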

The slowdown arises because mapfile switches to unbuffered reads when
the delimiter is not LF:

builtins/mapfile.def L187..L194
> #ifndef __CYGWIN__
>   unbuffered_read = (lseek (fd, 0L, SEEK_CUR) < 0) && (errno == ESPIPE);
> #else
>   unbuffered_read = 1;
> #endif
>
>   if (delim != '\n')
>     unbuffered_read = 1;

`mapfile' calls `zgetline', which calls `zread' or `zreadc' depending
on the buffering mode. In the unbuffered mode, `zread' calls read(2)
with a third argument of 1 (one byte per call); in the buffered mode,
`zreadc' calls read(2) with the buffer size. In the buffered mode, the
position in the stream is then adjusted using lseek called through
`zsyncfd'. What I'm wondering about is the reason for avoiding buffered
reads when delim != '\n'.
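
The observable effect of that position adjustment can be checked from
the shell. The sketch below is an illustration only, not taken from the
bash sources: it reads one entry with `mapfile -n 1' and then lets
`read' continue on the same descriptor. If the file offset were left
wherever a buffered read(2) stopped, `read' would not see the second
entry. The file and variable names are made up, and bash 4.4 or later
and a regular (seekable) temporary file are assumed.

```shell
#!/usr/bin/env bash
# Illustration only; file and variable names are made up for this sketch.
# Assumes bash 4.4+ (mapfile -d) and a regular, seekable temp file.
tmp=$(mktemp)

# LF-delimited input: the buffered path on a seekable descriptor.
printf '%s\n' one two three > "$tmp"
{
  mapfile -t -n 1 first_lf     # consume exactly one entry
  read -r second_lf            # continues from the same file offset
} < "$tmp"

# NUL-delimited input: the unbuffered path (one byte per read).
printf '%s\0' one two three > "$tmp"
{
  mapfile -t -d '' -n 1 first_nul
  read -r -d '' second_nul
} < "$tmp"

rm -f "$tmp"
echo "LF:  ${first_lf[0]} / $second_lf"
echo "NUL: ${first_nul[0]} / $second_nul"
```

Both lines should report "one / two": on a seekable file the buffered
path hands the descriptor back at the same position as the unbuffered
one, which is the property a buffered read for other delimiters would
also have to keep.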

This treatment of "delim != '\n'" in `mapfile' has existed since the
mapfile delimiter option was first introduced by commit 25a0eacfe
"commit bash-20140625 snapshot". Would it be a problem to switch to
buffered reads for non-LF delimiters as well? If we could remove the
above two lines (i.e., if (delim != '\n') unbuffered_read = 1;), I'd be
very happy...

In case someone doubts the use case, I'm looking for a way to solve
the performance issue reported at
https://github.com/akinomyoga/ble.sh/pull/65#issuecomment-801245808 .
Using `mapfile' is one of the possible solutions, and for it I'd like
to use NUL as the delimiter so that any non-NUL character can appear in
the entries. If someone (e.g. Greg) is interested, I can explain what I
want to achieve using a reduced problem setup.

--
Koichi


