Why does "mapfile -d delim" (delim != '\n') use unbuffered read?
From: Koichi Murase
Subject: Why does "mapfile -d delim" (delim != '\n') use unbuffered read?
Date: Sun, 2 May 2021 22:51:54 +0900
Maybe I'm asking a stupid question, but, as in the subject, why does
the builtin "mapfile -d delim" use unbuffered read when delim != '\n'?
Can we use buffered read for seekable file descriptors, as in the
`delim == '\n'' case?
Background:
`mapfile' can efficiently read entries delimited by LF from a file.
The following mapfile call runs in 67 msec on my machine
(x86_64-redhat-linux-gnu, bash-5.0.11):
$ a=({000000..500000})
$ printf '%s\n' "${a[@]}" > tmp
$ time mapfile -t a < tmp
However, when `mapfile' reads the same entries delimited by NUL
instead, it takes more than 20 times longer (1603 msec) on my machine:
$ printf '%s\0' "${a[@]}" > tmp
$ time mapfile -d '' a < tmp
This is because mapfile switches to unbuffered reads when the
delimiter is not LF:
builtins/mapfile.def L187..L194
> #ifndef __CYGWIN__
> unbuffered_read = (lseek (fd, 0L, SEEK_CUR) < 0) && (errno == ESPIPE);
> #else
> unbuffered_read = 1;
> #endif
>
> if (delim != '\n')
> unbuffered_read = 1;
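
The decision made by the quoted lines can be tried in isolation. The
following is only a minimal sketch, not bash code: `choose_unbuffered'
and `report' are hypothetical names introduced here for illustration,
and the __CYGWIN__ branch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not part of bash: pick the read mode the way the
   quoted lines do.  The __CYGWIN__ branch is left out. */
int choose_unbuffered(int fd, int delim)
{
    assert(fd >= 0);

    /* lseek fails with ESPIPE on pipes, FIFOs and sockets. */
    int unbuffered = (lseek(fd, 0L, SEEK_CUR) < 0) && (errno == ESPIPE);

    /* The override being questioned: a non-LF delimiter forces 1-byte reads. */
    if (delim != '\n')
        unbuffered = 1;
    return unbuffered;
}

/* Print the decision for one descriptor and delimiter. */
void report(const char *label, int fd, int delim)
{
    printf("%-28s unbuffered_read = %d\n", label, choose_unbuffered(fd, delim));
}
```

Calling `report' on a regular file with delim '\n' and then with
delim '\0' shows that the second case is forced to unbuffered mode
even though the descriptor is seekable.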
`mapfile' calls `zgetline', which calls `zread' or `zreadc' depending
on the buffering mode. `zread' calls read(2) with a third argument of
1, while `zreadc' passes the buffer size. In the buffered mode, the
position in the stream is adjusted afterwards using lseek, called
through `zsyncfd'. What I'm wondering is why buffered read is avoided
for delim != '\n'.
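
To make the two modes concrete, here is a rough sketch of both
strategies for a single record. It is not bash's implementation: the
function names, the fixed 4096-byte block, and the error handling are
assumptions made for illustration, and the buffered variant only works
on seekable descriptors and only when the delimiter falls inside the
first block.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not bash's zgetline: read one record ending in
   `delim' with a single block read(2), then move the file offset back so
   that it sits just past the delimiter.  Returns the record length, or -1
   on error, EOF, or when no delimiter is found in the block. */
ssize_t read_record_buffered(int fd, int delim, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);

    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);      /* one syscall, many bytes */
    if (n <= 0)
        return -1;

    char *end = memchr(buf, delim, (size_t)n);
    if (end == NULL)
        return -1;      /* sketch only: a real reader would refill the buffer */

    size_t len = (size_t)(end - buf);
    if (len >= cap)
        return -1;      /* sketch only: the offset is not restored here */
    memcpy(out, buf, len);
    out[len] = '\0';

    /* Give back the bytes read past the delimiter (the role the post
       attributes to zsyncfd). */
    off_t unused = (off_t)n - (off_t)(len + 1);
    if (unused > 0 && lseek(fd, -unused, SEEK_CUR) < 0)
        return -1;
    return (ssize_t)len;
}

/* Unbuffered counterpart: one read(2) per byte, so no resync is needed. */
ssize_t read_record_unbuffered(int fd, int delim, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);

    size_t len = 0;
    char c;
    for (;;) {
        if (read(fd, &c, 1) <= 0)
            return -1;
        if ((unsigned char)c == (unsigned char)delim)
            break;
        if (len + 1 >= cap)
            return -1;
        out[len++] = c;
    }
    out[len] = '\0';
    return (ssize_t)len;
}

/* Demo: write "ab\0xy\0" to a temporary file, read the first record with
   the buffered reader and the second with the unbuffered one.  Returns 0
   when both records come back intact and the offset was resynchronized. */
int demo(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (write(fd, "ab\0xy\0", 6) != 6 || lseek(fd, 0, SEEK_SET) < 0)
        return -1;

    char rec[16];
    if (read_record_buffered(fd, '\0', rec, sizeof rec) != 2 || strcmp(rec, "ab") != 0)
        return 1;
    if (lseek(fd, 0, SEEK_CUR) != 3)    /* offset must sit just past "ab\0" */
        return 2;
    if (read_record_unbuffered(fd, '\0', rec, sizeof rec) != 2 || strcmp(rec, "xy") != 0)
        return 3;
    fclose(f);
    return 0;
}
```

Nothing in the buffered variant depends on the delimiter being LF: it
only needs lseek to succeed, which is why the question is restricted
to seekable file descriptors.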
This treatment of "delim != '\n'" in `mapfile' has existed since the
mapfile delimiter option was first introduced in commit 25a0eacfe
("commit bash-20140625 snapshot"). Would it be a problem to switch to
buffered read for non-LF delimiters as well? If we could remove the
above two lines (i.e., if (delim != '\n') unbuffered_read = 1;), I'd
be very happy...
In case someone doubts the use case, I'm looking for a way to solve
the performance issue reported at
https://github.com/akinomyoga/ble.sh/pull/65#issuecomment-801245808 .
A solution with `mapfile' is one of the possible solutions, for which
I'd like to use NUL as the delimiter so that any non-NUL character can
be used for the entries. If someone (e.g. Greg) is interested, I can
explain what I want to achieve using a reduced problem setup.
--
Koichi