bug-bash

Re: bash 5.1 heredoc pipes problematic, shopt needed


From: Alexey
Subject: Re: bash 5.1 heredoc pipes problematic, shopt needed
Date: Thu, 28 Apr 2022 19:26:19 +0400
User-agent: Mail UserAgent

On 2022-04-26 01:05, Alexey via Bug reports for the GNU Bourne Again SHell wrote:
> On 2022-04-26 00:54, Chet Ramey wrote:
>> On 4/25/22 4:33 PM, Alexey wrote:
>>
>>> My key point is that we have two choices for the future:
>>>   - make read from pipe faster, or
>>
>> You mean the read builtin, right? I already explained those semantics.
>>
>>>   - provide an option to force here-strings to use temp files.
>>
>> Yes, the absolute worst case scenario has a performance penalty. The
>> question is how that affects things that run in real life scenarios. I
>> think making this a shell compatibility mode option is the best place
>> to start.
>>
>>> I don't see any other options for fast-enough performance.
>>
>> Since you don't define `fast-enough', it's not really a question that
>> can be answered.
>
> Sure, I'll try to provide more real-life scenarios later rather than
> just an empty for loop.
>
> But getting a performance degradation compared to bash 4.4 (which always
> uses temp files for here-strings) is a sad evolution.
>
> p.s. I disagree that I should choose other scripting languages (not bash
> or other shells) for performance-critical tasks when we are talking
> about system interactions. Bash is well suited for most admin tasks.

Hello.

I promised you more examples, and here they are:
A very common case is building a list of files for further processing:
  declare -a FILES

  #1
  FILES=(); time readarray -t FILES <<<"$(find / -xdev -maxdepth 5 -type f)"

  #2
  # <<< acts as a tmp file (because the result is bigger than 64K)
  FILES=(); time while read -r f; do FILES+=("$f"); done <<<"$(find / -xdev -maxdepth 5 -type f)"

  #3
  FILES=(); time while read -r f; do FILES+=("$f"); done < <(find / -xdev -maxdepth 5 -type f)
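As a side note, on Linux you can check which backing mechanism bash picked for a
here-string by inspecting fd 0. The following is only a sketch of what I observe on
bash 5.1; the 64K threshold (the pipe buffer size) and the sh-thd temp-file name
are observations, not documented guarantees:

  readlink /proc/self/fd/0 <<<"small"
  # -> pipe:[1234567]            (payload fits in the pipe buffer)
  readlink /proc/self/fd/0 <<<"$(head -c 100000 /dev/zero | tr '\0' x)"
  # -> /tmp/sh-thd.XXXXXX (deleted)   (payload too big, temp-file fallback)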

From these examples we can see that:
  - example #1 is approximately 2 times faster than example #2, and 4 times faster
    than example #3;
  - to be fair, example #1 should be followed by at least an empty loop over the result:
      for f in "${FILES[@]}"; do :; done
    after that modification, example #2 becomes comparable with example #1.

Also, there is a problem that we can't use `mapfile -t <<<"$()"' as an equivalent of
`mapfile -t < <()': a here-string appends a newline, so MAPFILE gets one empty element
instead of no elements when the subshell produces no output. So it's one more situation
where we have to use a PIPE instead of a tmp file.
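A minimal demonstration of the difference (the output values are what I observe on
bash 5.1):

  mapfile -t A <<<"$(true)"   # here-string appends '\n'
  echo "${#A[@]}"             # prints 1 -- one empty element
  mapfile -t B < <(true)
  echo "${#B[@]}"             # prints 0 -- no elements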

If we dig into `strace' we can see that:
  - example #1 does sequential read()s of 4096 bytes (the most efficient way);
  - example #2 does a read() of 4096 bytes and then an lseek() back when bash finds
    the `delimiter' in the read buffer;
  - example #3 does read()s of only 1 byte at a time (the worst way to do it).
Yes, I know that we can't lseek() on a PIPE, and this is the main reason for the
1-byte read(). But bash could do a 4096-byte read() into an internal buffer tied to
the file descriptor and emulate lseek() within that buffer.
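For anyone who wants to reproduce the observation, a sketch along these lines should
work (the path and the find arguments are just placeholders):

  strace -f -e trace=read,lseek -o /tmp/read.log \
      bash -c 'while read -r f; do :; done < <(find /usr -maxdepth 2 -type f)'
  grep -c ', 1) ' /tmp/read.log    # rough count of 1-byte read()s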


CONCLUSION:
  - we shouldn't change tmp files to pipes if it slows down code execution;
  - BUT moving away from tmp files to pipes is a good direction IF bash creates an
    internal read buffer to mitigate the 1-byte read() problem: as described above,
    bash could do a 4096-byte read() into an internal buffer tied to the file
    descriptor and emulate lseek() within that buffer;
  - we could add an option to the read/readarray builtins (for example -b) to force
    the read buffering described above. Such an option would let the script writer
    decide how read() should be done, based on his knowledge of how the pipe will be
    used afterwards (e.g. whether a subshell will be started or the process will exec
    a new program entirely). A sketch of the idea follows below.
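To make the proposal concrete, here is what the buffering logic could look like,
sketched in pure bash. The -b option does not exist; a real implementation would live
in the builtin's C code, and this script-level version still pays bash's own 1-byte
reads underneath, so it only illustrates the carve-lines-from-a-buffer idea:

  buffered_readlines() {
      # Read input in big chunks and carve complete lines out of a buffer,
      # instead of consuming the stream one byte at a time per line.
      local buf='' chunk line
      while IFS= read -r -N 4096 chunk || [[ -n $chunk ]]; do
          buf+=$chunk
          while [[ $buf == *$'\n'* ]]; do
              line=${buf%%$'\n'*}    # everything before the first newline
              buf=${buf#*$'\n'}      # "emulated lseek": drop the consumed line
              printf '%s\n' "$line"
          done
      done
      [[ -n $buf ]] && printf '%s\n' "$buf"   # trailing line without a newline
  }

  # usage:
  buffered_readlines < <(find /usr -maxdepth 2 -type f)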


Regards,
Alexey.


