bug-bash

Re: bash 5.1 heredoc pipes problematic, shopt needed


From: Chet Ramey
Subject: Re: bash 5.1 heredoc pipes problematic, shopt needed
Date: Thu, 28 Apr 2022 15:29:14 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.8.1

On 4/28/22 11:26 AM, Alexey wrote:

> I promised you more examples, and here they are.
> A very common case is building a list of files for further processing:
>    declare -a FILES
>    #1
>    FILES=(); time readarray -t FILES <<<"$(find "$d" -xdev -maxdepth 5 -type f)"
>    #2
>    # <<< acts as a temp file here (because the result is bigger than 64K)
>    FILES=(); time while IFS= read -r f; do FILES+=("$f"); done <<<"$(find / -xdev -maxdepth 5 -type f)"
>    #3
>    FILES=(); time while IFS= read -r f; do FILES+=("$f"); done < <(find / -xdev -maxdepth 5 -type f)
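The three variants above can be reproduced without walking the filesystem. This is a hypothetical, self-contained sketch: `seq` output stands in for the `find` results (an assumption made so the run is repeatable), the helper name `gen` is made up, and the loops use `IFS= read -r` so leading and trailing blanks in names would be kept. All three should collect the same number of elements; only the timings that `time` prints on stderr should differ.

```bash
#!/usr/bin/env bash
# Hypothetical, self-contained version of examples #1-#3: `gen` stands in
# for `find` so the input is identical on every machine.
set -u

# 50,000 lines (about 290 KB), well above a 64K pipe buffer.
gen() { seq 1 50000; }

declare -a FILES

# #1: readarray fed by a here-string
FILES=(); time readarray -t FILES <<<"$(gen)"
n1=${#FILES[@]}

# #2: read loop fed by a here-string (large enough to use a temp file)
FILES=(); time while IFS= read -r f; do FILES+=("$f"); done <<<"$(gen)"
n2=${#FILES[@]}

# #3: read loop fed by process substitution (always a pipe)
FILES=(); time while IFS= read -r f; do FILES+=("$f"); done < <(gen)
n3=${#FILES[@]}

# Every variant should collect the same lines.
echo "counts: $n1 $n2 $n3"
```

It should print `counts: 50000 50000 50000`; the interesting numbers are the `real`/`user`/`sys` lines for #2 versus #3.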

OK, what we see here is what we knew: that as the amount of data approaches
64K (or whatever the pipe capacity is), the pipe solution falls farther and
farther behind temp files.
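Which mechanism a given redirection ends up using can be checked from inside the shell. A minimal sketch, assuming a Unix-like system where `[ -p /dev/stdin ]` reflects the type of file descriptor 0; the function name `stdin_kind` is hypothetical. With bash 5.1 or later a short here-string should report `pipe`, while a 1 MiB here-string should report `file` (a temp file) on any version; bash 5.0 and earlier report `file` for both.

```bash
#!/usr/bin/env bash
# Hypothetical helper: "pipe" if fd 0 is a pipe, otherwise "file"
# (for the here-string cases below, that means a temp file).
stdin_kind() {
  if [ -p /dev/stdin ]; then echo pipe; else echo file; fi
}

small='hello'
printf -v big '%*s' 1048576 ''   # 1 MiB of spaces, far above 64K

small_kind=$(stdin_kind <<<"$small")
big_kind=$(stdin_kind <<<"$big")
procsub_kind=$(stdin_kind < <(printf '%s\n' "$small"))

echo "bash $BASH_VERSION"
echo "small here-string:    $small_kind"
echo "large here-string:    $big_kind"
echo "process substitution: $procsub_kind"
```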



>  - Example #3 calls read() for only 1 byte at a time (the worst way to do it).
>    Yes, I know that we can't lseek() on a pipe, and this is the main reason for the 1-byte read().
>    But bash could do a 4096-byte read() into an internal buffer tied to the file descriptor and
>    emulate lseek() within that buffer.

How would this make a difference? The performance issue you're talking
about is due to reading from the pipe. That doesn't change if bash uses an
internal buffer before handing the data to `read'.

Example 1 doesn't sync the fd because you don't use callbacks or stop
reading before EOF. For Example 2, since read is required to read only
a line at a time, you have to make sure you don't read more than one line
from your input source.
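The one-line-at-a-time constraint can be seen directly from the shell. In this sketch (the sample input is made up), `read` and a separate `cat` process consume the same pipe one after the other. Because a pipe cannot be rewound, the only way for `cat` to still see the second and third lines is for `read` never to pull bytes past the first newline out of the pipe, which is why bash falls back to 1-byte reads on unseekable input.

```bash
#!/usr/bin/env bash
# `read` and `cat` consume the same pipe one after the other.
pipe_out=$(printf 'one\ntwo\nthree\n' | {
  IFS= read -r first   # must consume exactly "one\n" and nothing more
  rest=$(cat)          # a separate process reads whatever is left
  printf 'first=%s rest=%s\n' "$first" "${rest//$'\n'/,}"
})
echo "$pipe_out"       # first=one rest=two,three
```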

Now, if you're talking about using this auxiliary buffer with Example 2,
yes, it's possible to augment the bash data reading code (zread, zreadc,
zgetline, etc.) to add a function that reads while keeping track of where
it thinks it's read in the file and seeks around in that buffer. If you
want to take a crack at writing that code, which may be as simple as some
changes to zsyncfd, I'd be glad to see what you come up with. Make sure the
code can handle arbitrary changes to the file descriptor that may happen
due to, for example, redirection.
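For contrast, here are the same two consumers on a seekable regular file. As far as I understand bash's `read` builtin, on a seekable descriptor it may read ahead into its buffer and then seek the descriptor back to just past the consumed line (the zsyncfd step mentioned above), so `cat`, which shares fd 0's offset, again starts at the second line. The visible result matches the pipe case; only the pattern of read() and lseek() calls differs, which a tracer such as strace would show. The temp file from `mktemp` exists only for this demonstration.

```bash
#!/usr/bin/env bash
# Same two consumers, but reading a seekable regular file instead of a pipe.
tmp=$(mktemp) || exit 1
printf 'one\ntwo\nthree\n' > "$tmp"

file_out=$({
  IFS= read -r first   # may read ahead, then lseek() fd 0 back
  rest=$(cat)          # shares fd 0's offset, so it starts at "two"
  printf 'first=%s rest=%s\n' "$first" "${rest//$'\n'/,}"
} < "$tmp")

rm -f "$tmp"
echo "$file_out"       # first=one rest=two,three
```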

> CONCLUSION:
>  - we shouldn't switch from temp files to pipes if it slows down code execution;

This is why it will be a compat setting in bash-5.2.

>  - BUT moving from temp files to pipes is a worthwhile attempt IF bash creates an internal buffer
>    for reading, to offset the problem of 1-byte read() calls on a pipe.
>    Bash could do a 4096-byte read() into an internal buffer tied to the file descriptor and
>    emulate lseek() within that buffer.

This makes no sense and doesn't change anything.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/


