bug-bash

Re: "here strings" and tmpfiles


From: L A Walsh
Subject: Re: "here strings" and tmpfiles
Date: Tue, 09 Apr 2019 16:48:30 -0700
User-agent: Thunderbird


On 4/8/2019 9:19 PM, Robert Elz wrote:
>  
>   | Optionally, I would accept that
>   | an implementation would support forward seeking as some equivalent
>   | to having read the bytes.
>
> I suppose one could make pipes do that, but no implementation I have
> ever seen does, so I don't think you should hold your breath waiting for that 
> one to happen.
>   
Never seen it either; I was only saying that I could see
forward seeking being supported, since one can always skip
input. Seeking backwards, however, would be counter-intuitive
in any such mechanism.
>   | > 2. Have limited capacity. Writers will sleep when the pipe becomes full.
>   | >   
>   | So does a read-only disk, except writer doesn't flag the error to
>   | the reader in the same way a broken pipe would.
>
> Broken pipe wasn't Chet's point, rather with pipes it is possible to
> deadlock - an obvious example where a shell needs to be careful is
> in something like
>
>       X=$( cat << FOO )
>   
----
    I am aware of that; however, if a pipe implementation
*stops* on reaching a full condition in some 'tmp-storage-space'
and waits for space to become available, a similar dynamic would
apply.  That's all.
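A minimal sketch of that dynamic, assuming a Linux-style pipe (the sizes are illustrative; the default kernel pipe buffer is 64 KiB): the writer fills the pipe, blocks inside write(2), and resumes only as the reader drains it.

```shell
# Write 256 KiB into a pipe whose kernel buffer is much smaller.
# dd fills the buffer quickly, then sleeps inside write(2)
# until the reader wakes up and starts consuming.
dd if=/dev/zero bs=1024 count=256 2>/dev/null | { sleep 1; wc -c; }
# wc reports all 262144 bytes: the writer merely stalled, no data was lost.
```

The same "stall and retry" behavior is what a disk-spooled buffer would have to emulate when the spool filled up.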

Example:  Suppose output from a program
were buffered to disk files 64k in size.  The reader
process would take its input from those buffers on disk and
free the files as it read them.  If the writer ran out of
space, then sleeping and retrying the operation would make
sense, since the reader could be expected to be
freeing blocks on disk as it read them.  It's not always
a safe assumption, but what else can it do?

[explanation of data piping elided -- seems to be similar
to using a tmp-space in a manner similar to my example].


> In general here docs (and here strings) are overused ...
>   
---
    Often the choice is a matter of intent and of
script formatting.

> ...
>
>   | since writing to a read-only tmp or reading from a non-
>   | existent file should be regarded as writing to a pipe with no
>   | listeners (because no one will ever be able to read from that
>   | 'tmp' file since it doesn't exist).
>
> Sorry, that makes no sense.   The file cases have no valid fd
> (opening a non-existent file fails, opening a file for writing
> on a read-only filesys fails).   A better analogy would be when
> writing to a file fails when the filesystem becomes full, or the
> user's quota is exceeded.
>   
Precisely, you are correct.  I was referring to an attempt at
mapping the errors from using a file as tmp-space onto the types
of errors one would normally get from a real pipe.

That said, I could also imagine trying to open output to a
process of a different security level on a
mandatory-access-controlled OS, where the writer doesn't
have permission to write or send information to the
'reader'.  If that happened, I would expect error
semantics equivalent to trying to open
a write FD on a read-only file system.  This would especially be
true if the device's read-only state wasn't known until a write
was attempted (like unwritable CD media in a CD-writer device).

>   | Using a file doesn't sequence -- the writer can still continue
>   | execution past the point of bash possibly flagging an internal
>   | error for a non-existent tmp file (writable media), and the
>   | reader won't get that the "pipe" (file) had no successful writer,
>   | but will instead get an EOF indication and continue, not knowing
>   | that a fatal error had just occurred.
>
> I doubt that is what happens.
>   
----
    That is what appeared to happen in the post mentioned by Chet.
The boot process got a '/dev/fd/99 not found' error and continued
on as though there had been no input.
>   | However, that would
>   | be code in the pipe implementation or an IO library on top
>   | of some StdIO implementation using such.
>
> Pipes are implemented in the kernel - userland does nothing different
> at all (except the way they are created.)
>   
----
    They usually are.  That doesn't prevent a stdlib implementation
from putting a wrapper around some "non-compliant" kernel call
to present a different 'view' to the users of that library.

>   | W/pipes, there is the race condition of the reader not being able
>   | to read in the condition where the writer has already gone away.
>
> Huh?   That's nonsense.   It is perfectly normal for a reader
> to read long after the writer has finished and exited.   Try this
>
>       printf %s\\n hello | { sleep 5; cat; }
>   
===
    It may be normal in some cases, but:

https://superuser.com/questions/554855/how-can-i-fix-a-broken-pipe-error

I've encountered this error when I've used pipes.  You may
not be seeing it because of buffer sizes (the default maximum
pipe size on Linux is 1M).
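The broken-pipe case is easy to reproduce when the reader exits before the writer finishes. A small bash illustration (exit status 141 = 128 + SIGPIPE; `PIPESTATUS` is bash-specific):

```shell
# head exits after reading one line; yes keeps writing to the now
# reader-less pipe and is killed by SIGPIPE.
yes | head -n 1                               # prints: y
echo "writer exit status: ${PIPESTATUS[0]}"   # prints: writer exit status: 141
```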
>   | "Various purposes"...  Ok, so how do I give that file name
>   | to 'cp' in the next line and copy it somewhere?
>
> You mean
>
>       cp <(process) /tmp/foo
>
> It is, it has to be to work.
>   
---
    *red face*  I'd never tried to copy something that
looked like input redirection.  My apologies for my misconception.
> You are still missing Chet's point.   There is no "< <()" operator.
> That is two bash syntax elements being combined.  "<" (redirect stdin)
> and "<()" (create a name to refer to the output from the command).
>   
----
    I've never seen <() used without '<', so I thought it was
part of the syntax '< <()'.  As I pointed out in the old post
I reposted, there seem to be multiple syntaxes, varying in
implementation, that _look_ similar.
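To make the distinction concrete (bash is assumed; the exact /dev/fd path is illustrative): `<(cmd)` by itself expands to a filename, and the leading `<` is just ordinary stdin redirection applied to that filename.

```shell
echo <(true)                # <() alone expands to a pathname such as /dev/fd/63
cat < <(printf 'hello\n')   # '<' then redirects stdin from that pathname
```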
> Pipes have no size limit.  What is limited is how much the kernel
> will store before stalling the sending process, until the reader consumes
> data, leaving more space.   That process can go on forever.
>
>   | Going to disk will create a pipe as large as the
>   | free space on partition '/tmp'.
>
> I assume "pipe" there is some confused way of saying "here doc".
>   
---
    We are talking about the tradeoffs of using pipes to communicate
heredocs vs. using a temporary file (presumably on /tmp), no?
The statement reflected my thinking about how, currently,
the entire contents of the pipe are being "spilled to disk"
(spilled in the sense of there being insufficient room in
memory, or, in this case, of there being no 'in-memory'
implementation at all).

>   | On *my* system, tmp is on a partition of size 7.8G (w/4.7G free)
>   | Running 'df' on tmpfs give me '79G'.
>
> So, you have lots of ram / swap space, and no desire to limit how
> much of that your tmpfs consumes.   I doubt that's a good idea, but
> if it meets your needs, fine.
>   
---
    On my system /tmp is a file system on disk, not in memory.
Anything that uses it can run out of space -- unlike a pipe
with a reader running in parallel with the writer, which can go on
forever, as you say.  FWIW, I limit the tmp space
consumed by having a tmp partition of a fixed, limited size.
>   | If bash uses /tmp, it can have a pipe of size 4.7G.  If
>   | it uses memory, it would have pipe of 79G.
>
> That's gibberish.
>   
Oh please, it's not that obtuse.  If bash currently writes the
entire contents of "whatever" it is (the here doc) to a temporary
file, then it is limited by the space on the temporary file system.

If I chose to mount 'tmpfs' on /tmp, 'df' shows it having a size
of 79G.  However, if bash executes the reader and writer in
parallel, and uses a pipe, and not a tmp-file on disk, then
up to 79G of information could potentially be *buffered* (though
I think the kernel limits that to a lower value) before the
writer had to be paused.

If they are running in parallel, then, conceivably there would
be no need for such a large buffer if the consuming process was
as fast as the producing process.
>   |  If it uses
>   | an OS pipe...that's OS dependent, no?  If the OS transparently
>   | used memory to add dynamic space to a pipe, it would
>   | also get 79G, or at least, some value like
>   | /proc/sys/fs/pipe-max-size.
>
> You clearly have no idea what a pipe is, or what that parameter
> represents.
>   
---
    If you can't decipher what I said, it is
premature to say that I have no idea what a pipe is.  *hmph*

    The 79G has to do with the space available for buffering
(which would be comparable to the amount of space on the tmp
file system if pipes were spilled to disk, as in my example
above).

> Think of a garden hose, with a tap at one end (the writer) and a
> spray nozzle with a trigger at the other.  ...   Either end 
> can stop temporarily and start again, as many times as you like.
>
> That is what a pipe is like.
>   
----

    But here-documents and here-strings in bash aren't
implemented that way in the currently released version.  Bash
uses a tmp file on a disk of fixed size to store *all* of the output
of the 'writer' before the reader is started.

    That's another limitation of file-based semantics: the
writer must write its full output before the reader is
allowed to start.




    Clearly, in addition to defaulting to buffering in
memory, the two processes need to be run in parallel for there
to be any real "safety" from a potentially full tmp storage.

    If they are running in parallel, one would hope the reader
would be able to empty the pipe storage so the writer could
continue and eventually finish.

    Would that be your understanding as well?  Or how do you
see bash handling the absence of a writable tmp file system,
or running out of room on it, before the reader can start?
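As a sketch of that parallel alternative (the mktemp/mkfifo plumbing is illustrative, not bash's actual mechanism): a named pipe gives exactly this behavior, with the reader draining while the writer produces, so no tmp-file-sized buffer is ever needed.

```shell
fifo=$(mktemp -u) && mkfifo "$fifo"        # create a FIFO at a fresh path
printf '%s\n' one two three > "$fifo" &    # writer runs in the background
cat "$fifo"                                # reader consumes in parallel
wait && rm -f "$fifo"                      # reap the writer, clean up
```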





