bug-gawk

From: arnold
Subject: Re: gawk 5.2.2 fatal crash when closing a two-way pipe for a process that does not have a pid anymore
Date: Thu, 24 Aug 2023 12:23:55 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Thank you Finn and Andy.

I agree, there's nothing else to do in the gawk code. It reports
an error but isn't crashing, and that's about all it can do.

Thanks,

Arnold

"Andrew J. Schorr" <aschorr@telemetry-investments.com> wrote:

> Hi,
>
> Hmmm. In my test reproducer, I'm writing to a dead pipe, and that probably
> should be a fatal error. But I see your point -- when I try on 5.1.1, it's
> not giving a fatal error. It looks like this logic was added in
> patch 9eb357e00, which was committed after the release of 5.1.1:
>
> 2021-11-30         Andrew J. Schorr      <aschorr@telemetry-investments.com>
>
>         Improve output redirection error handling for problems not detected
>         until the final flush or close. Thanks to Miguel Pineiro Jr.
>         <mpj@pineiro.cc> for the bug report and suggesting a fix.
>
>         * awk.h (efflush): Add declaration.
>         * builtin.c (efwrite): Break up into 3 functions by moving the
>         flushing logic into efflush and the error handling logic into
>         wrerror.
>         (wrerror): New function containing the error-handling logic extracted
>         from efwrite.
>         (efflush): New function containing the fflush logic extracted from
>         efwrite.
>         * io.c (close_redir): Call efflush prior to closing the redirection
>         to identify any problems with flushing output and to take advantage
>         of the error-handling logic used for print and printf.
>
> There's a discussion thread here:
> https://lists.gnu.org/archive/html/bug-gawk/2021-11/msg00022.html
>
> In your actual usage where the crash is occurring, are you writing to
> the process after it has gone away? If so, I think NONFATAL should
> reasonably be required.
>
> So yes, it does seem to be an incompatible change, but it's fixing a bug
> where I/O was silently failing.
>
> Regards,
> Andy
>
> On Thu, Aug 24, 2023 at 03:28:36PM +0000, Finn Magnusson wrote:
> > Hi
> > Many thanks for the clarification.
> > Well done on the reproducer :-)
> > The reason why I thought it is a bug is due to the difference in behaviour
> > compared to gawk 5.1.1, where this scenario does not trigger a crash even
> > if "NONFATAL" is not set.
> > If not a bug, then is this a non-backward-compatible change?
> > Many thanks.
> > BR
> > Finn
> > 
> > On Thursday, August 24, 2023 at 04:55:24 PM GMT+2, Andrew J. Schorr
> > <aschorr@telemetry-investments.com> wrote:
> > 
> > 
> > Hi,
> > 
> > I made a reproducer. It's not so hard. :-)
> > 
> > Using the master branch:
> > 
> > bash-4.2$ cat /tmp/bug.gawk
> > BEGIN {
> >   cmd = "ssh `hostname` uptime"
> >   print "hello" |& cmd
> >   system("ps -ef | grep ssh")
> >   print "sleeping while waiting for ssh to exit"
> >   sleep(1)
> >   print "another write after process is gone" |& cmd
> >   system("ps -ef | grep ssh")
> >   print "closing now"
> >   close(cmd)
> > }
> > 
> > bash-4.2$ ./gawk -l extension/.libs/time.so -f /tmp/bug.gawk
> > schorr    3684    1  0  2021 ?        00:00:00 ssh-agent
> > root    19396 26130  0 10:34 ?        00:00:00 sshd: schorr [priv]
> > schorr  19399 19396  0 10:34 ?        00:00:01 sshd: schorr
> > schorr  22087 22086  0 10:48 pts/9    00:00:00 ssh ti139 uptime
> > schorr  22088 22086  0 10:48 pts/9    00:00:00 sh -c ps -ef | grep ssh
> > schorr  22091 22088  0 10:48 pts/9    00:00:00 grep ssh
> > root    24832 26130  0 Apr14 ?        00:00:00 sshd: schorr [priv]
> > schorr  24834 24832  0 Apr14 ?        00:00:05 [sshd] <defunct>
> > root    26130    1  0  2021 ?        00:00:37 /usr/sbin/sshd -D
> > sleeping while waiting for ssh to exit
> > schorr    3684    1  0  2021 ?        00:00:00 ssh-agent
> > root    19396 26130  0 10:34 ?        00:00:00 sshd: schorr [priv]
> > schorr  19399 19396  0 10:34 ?        00:00:01 sshd: schorr
> > schorr  22087 22086  1 10:48 pts/9    00:00:00 [ssh] <defunct>
> > schorr  22121 22086  0 10:48 pts/9    00:00:00 sh -c ps -ef | grep ssh
> > schorr  22123 22121  0 10:48 pts/9    00:00:00 grep ssh
> > root    24832 26130  0 Apr14 ?        00:00:00 sshd: schorr [priv]
> > schorr  24834 24832  0 Apr14 ?        00:00:05 [sshd] <defunct>
> > root    26130    1  0  2021 ?        00:00:37 /usr/sbin/sshd -D
> > closing now
> > gawk: /tmp/bug.gawk:10: fatal: flush to "ssh `hostname` uptime" failed: reason unknown
> > 
> > The defunct process (pid 22087) doesn't seem to be relevant. As you noted,
> > gawk gives a fatal error.
> > 
> > I'm not sure that this is actually a bug. Have you considered using
> > non-fatal I/O?
> > 
> > If I add 'PROCINFO["NONFATAL"] = 1' to the script, it no longer gives
> > a fatal error. Or limit it to the command in question:
> > 
> > BEGIN {
> >   cmd = "ssh `hostname` uptime"
> >   PROCINFO[cmd, "NONFATAL"] = 1
> >   print "hello" |& cmd
> >   system("ps -ef | grep ssh")
> >   print "sleeping while waiting for ssh to exit"
> >   sleep(1)
> >   print "another write after process is gone" |& cmd
> >   system("ps -ef | grep ssh")
> >   print "closing now"
> >   close(cmd)
> > }
> > 
> > Why do you think it's a bug?
> > 
> > Regards,
> > Andy
> > 
> > On Thu, Aug 24, 2023 at 01:31:24PM +0000, Finn Magnusson wrote:
> > > Hi
> > > I wish I could manage to reproduce the issue with a simple recipe.
> > > But whichever way I try to close the process associated with the
> > > two-way pipe, it stays as a defunct process and then the issue does
> > > not occur.
> > > The only way I get the issue is in my program, where I start a two-way
> > > pipe toward an ssh client which opens a netconf session on a remote
> > > machine. On the remote machine, I issue a command to close the netconf
> > > session. This causes the ssh client to close down completely on my
> > > machine, and no defunct process remains. Then, when using the close()
> > > function in gawk to close the two-way pipe, it crashes because the ssh
> > > client process does not exist anymore, not even as a defunct process.
> > > So that is not so easy to reproduce outside of my environment, since
> > > the netconf server that I use is a proprietary system here at the
> > > company where I work.
> > > In case you make a fix, I can always try it in my environment and let
> > > you know whether it solved the issue.
> > > If that is not satisfactory, then feel free to discard this bug report
> > > until I find a way to reproduce it that could be done in any
> > > environment.
> > > Many thanks.
> > > BR
> > > Finn
> > >
> > > On Thursday, August 24, 2023 at 03:06:37 PM GMT+2, Andrew J. Schorr
> > > <aschorr@telemetry-investments.com> wrote:
> > >
> > >
> > > Hi,
> > >
> > > Thanks for the bug report. Can you please provide a simple recipe
> > > for how to reproduce this problem?
> > >
> > > Thanks,
> > > Andy
> > >
> > > On Thu, Aug 24, 2023 at 09:56:54AM +0000, Finn Magnusson via Bug reports
> > > only for gawk. wrote:
> > > > Dear gawk developers
> > > > I noticed the below issue in gawk 5.2.2 which was not present in the
> > > > previous gawk version I was using (5.1.1): when using the close()
> > > > function to close a two-way pipe to a process that does not have a PID
> > > > anymore (e.g. because the process was closed by an external command),
> > > > I got the below fatal crash:
> > > > gawk.lin64: /app/moshell/23.2h/moshell/prog.awk:19919: fatal: flush to "/app/moshell/23.2h/moshell/commonjars/ssh.lin64 -p 2022 -z '/proj/wcdma-userarea/users/eanzmagn/moshell_logfiles/logs_moshell/tempfiles/20230824-114538_6552/sshz6592' -l expert -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o HostKeyAlgorithms="ssh-dss,ssh-rsa,rsa-sha2-512,rsa-sha2-256" -o NumberOfPasswordPrompts=1 -o ConnectTimeout=10 -o ServerAliveInterval=300 -o ConnectionAttempts=1 -o ServerAliveCountMax=0 -o TCPKeepAlive=no -o PreferredAuthentications=publickey,password 10.136.72.120 -s netconf 2>&1" failed: reason unknown
> > > >
> > > > I was able to solve it by commenting out the below efflush statement in
> > > > gawk-5.2.2/io.c:
> > > >   /* flush before closing to leverage special error handling */
> > > >   efflush(rp->output.fp, "flush", rp);
> > > > Is it possible to make a fix for this in a coming gawk release?
> > > > Many thanks.
> > > > BR
> > > > Finn
>
> -- 
> Andrew Schorr                      e-mail: aschorr@telemetry-investments.com
> Telemetry Investments, L.L.C.      phone:  917-305-1748
> 152 W 36th St, #402                fax:    212-425-5550
> New York, NY 10018-8765


