octave-maintainers

Re: segfaults building documentation when machine under load


From: Dmitri A. Sergatskov
Subject: Re: segfaults building documentation when machine under load
Date: Sat, 23 May 2020 00:37:53 -0400



On Sat, May 23, 2020 at 12:24 AM Daniel J Sebald <address@hidden> wrote:
On 5/22/20 4:52 PM, John W. Eaton wrote:
> On 5/19/20 4:11 PM, Dmitri A. Sergatskov wrote:
>>
>>
>> On Tue, May 19, 2020 at 4:02 PM John W. Eaton <address@hidden
>> <mailto:address@hidden>> wrote:
>>
>>     On 5/19/20 3:26 PM, Dmitri A. Sergatskov wrote:
>>
>>      >     Should we switch to bug-tracker?
>>      >     I was able to get a crash when I bumped the jobs to 200.
>>      >     bt is attached. The relevant part seems to be:
>>
>>     If I use a large number of jobs, I see
>>
>>         error: imwrite: invalid empty image
>>         error: called from
>>             __imwrite__ at line 40 column 5
>>             imwrite at line 125 column 5
>>             print at line 755 column 13
>>             interpimages at line 72 column 5
>>
>>     but no segfaults.
>>
>>     It does look like a threading issue.
>>
>>
>> I used a simplified test by Andreas:
>>
>> parallel -N0 -q octave --norc --silent --no-history --eval 'figure (1,"visible", "off");' ::: {1..200}
>
> Thanks.
>
> After much confusion, I think I arrived at a solution.  I pushed the
> following changeset to stable and merged it with default:
>
>    http://hg.savannah.gnu.org/hgweb/octave/rev/00a9a49c7670
>
> These most recent changes appear to improve the situation for the test
> case shown above.  I'm no longer able to cause a segfault with the
> following parallel execution:
>
>      parallel -j 50 -N0 -q octave --norc --silent --no-history --eval 'figure (1, "visible", "off");' ::: {1..1000}
>
> Here's the summary from the changeset commit message:
>
> ----
> This change is a further attempt to avoid segfaults when shutting down
> the interpreter and exiting the GUI event loop.  The latest approach is
> to have the interpreter signal that it is finished with "normal" command
> execution (REPL, command line script, or --eval option code), then let
> the GUI thread process any remaining functions in its event loop(s) then
> signal back to the interpreter that it is OK to shutdown.  Once the
> shutdown has happened (which may involve further calls to the GUI thread
> while executing atexit functions or finish.m or other shutdown code), the
> interpreter signals back to the GUI that shutdown is complete.  At that
> point, the GUI can delete the interpreter object and exit.
> ----
>
> Before this change, the GUI could still be processing events (displaying
> the figure window, for example) while the interpreter was being deleted.
> Obviously, that causes trouble.
>
> Although we recognized this problem before, none of the previous
> solutions have really worked.  See the commit message for
> https://hg.savannah.gnu.org/hgweb/octave/rev/cdb681adc85a, for example,
> where I noted that
>
>    ... the crash described in bug report #56952 appeared to be happening
> when the Qt event loop was calling
> QtHandles::qt_graphics_toolkit::create_object when the interpreter was
> being deleted and the gh_manager object was already invalid, ...
>
> I noticed this again and finally realized that we could probably use the
> Qt event queue to ensure that pending graphics events are allowed to
> finish before shutting down the interpreter.  It seems to work for all
> the tests I've tried so far, including creating a figure in the finish.m
> script or using "atexit ('sombrero')".
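
The three-signal handshake described in the commit message above can be sketched roughly as follows. This is only a Python stand-in built on threading.Event and queue.Queue, not the Octave/Qt code from the changeset; the function name, the log strings, and the event names are all made up for illustration.

```python
import queue
import threading


def run_shutdown_handshake(pending_events, shutdown_events):
    """Simulate the shutdown handshake with two threads; return the event log.

    Hypothetical model only: the real change uses the Qt event queue.
    """
    log = []                               # list.append is atomic in CPython
    gui_queue = queue.Queue()              # stand-in for the GUI event queue
    interp_finished = threading.Event()    # signal 1: interpreter -> GUI
    ok_to_shutdown = threading.Event()     # signal 2: GUI -> interpreter
    shutdown_complete = threading.Event()  # signal 3: interpreter -> GUI

    # Events still queued when normal command execution ends.
    for event in pending_events:
        gui_queue.put(event)

    def gui():
        interp_finished.wait()
        # Drain everything that was queued during normal execution.
        while not gui_queue.empty():
            log.append("gui: processed " + gui_queue.get())
        ok_to_shutdown.set()
        # Keep serving events posted by shutdown code (atexit, finish.m)
        # until the interpreter reports that shutdown is complete.
        while not (shutdown_complete.is_set() and gui_queue.empty()):
            try:
                event = gui_queue.get(timeout=0.01)
            except queue.Empty:
                continue
            log.append("gui: processed " + event)
        log.append("gui: deleted interpreter, exiting")

    def interpreter():
        log.append("interp: normal execution finished")
        interp_finished.set()
        ok_to_shutdown.wait()
        # Shutdown code may still post work to the GUI thread.
        for event in shutdown_events:
            gui_queue.put(event)
        log.append("interp: shutdown complete")
        shutdown_complete.set()

    threads = [threading.Thread(target=gui), threading.Thread(target=interpreter)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling run_shutdown_handshake(["draw figure 1"], ["figure from finish.m"]) returns a log in which the pending figure event is handled before the interpreter shuts down, and the GUI's final entry comes only after the events posted during shutdown have been processed.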

Some time ago a group of us looked at the problem of exiting the GUI
when the worker core is busy:

https://savannah.gnu.org/bugs/?44485

I had put some effort into a nice system whereby a QTimer waits for the
core to finish and after a certain amount of time it would signal that a
dialog box appear asking if the user wants to force an exit.  Of course,
if the core does then quit while the user hasn't answered the dialog yet
then the dialog box should disappear.  It all had to do with saving
files in the editor and closing the editor and so on.

However, I never completed the patch because I could never get the
sequencing just right.  There was always something like "What if the
user does this?", or "What if the core finishes at this point?".  This
shutdown signal might be just the thing to make it work.  I'll revisit
that bug when I can.

Dan
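
For reference, the timer-plus-dialog sequencing Dan describes might look something like the sketch below. It is a Python model, not the unfinished Qt patch from bug #44485: threading.Event objects stand in for the core's "finished" signal and the user's "force exit" click, and the name decide_exit and the returned strings are hypothetical.

```python
import threading
import time


def decide_exit(core_done, force_requested, grace_seconds, poll_seconds=0.01):
    """Return how the GUI ends up exiting.

    core_done       -- Event set when the worker core has finished
    force_requested -- Event set when the user confirms a forced exit
    grace_seconds   -- how long to wait quietly before showing the dialog
    """
    # Phase 1: wait quietly; if the core finishes in time, no dialog is needed.
    if core_done.wait(timeout=grace_seconds):
        return "clean exit, no dialog shown"

    # Phase 2: the dialog is (notionally) visible.  Whichever happens first
    # wins: the core finishing dismisses the dialog, or the user forces exit.
    while True:
        if core_done.is_set():
            return "clean exit, dialog dismissed"
        if force_requested.is_set():
            return "forced exit"
        time.sleep(poll_seconds)
```

The core is checked before the user's answer on every pass, so if both happen at nearly the same moment the clean exit wins. That ordering is a choice made for this sketch, not something taken from the bug report.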

I posted this on the bug list, but perhaps it is worth reposting here.
After John's latest patch (c6d10df71863 tip @), the segfault is gone.
The failed builds are due to missing files.
I tried the following test with parallel:

rm -rf /tmp/t1/*

parallel -j 32 -N0 -q ./run-octave --norc --silent --no-history --eval 'figure(1, "visible", "off"); plot (1:2); print(tempname("/tmp/t1", "t1-"));' ::: {1..128}

ls -c /tmp/t1/ | wc -l
92

I expect to have 128 files in /tmp/t1; the actual number varies from run to run. Adding pause(1) after plot and print
improves the situation but does not solve it. Increasing the number of jobs also seems to make it worse.
But maybe I am not using parallel correctly.

Sincerely,

Dmitri.
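
Because tempname picks random names, the count from ls cannot say which jobs lost their output or whether they exited with an error. One way to narrow that down is to give every job its own known output path and record the exit status. The following is a rough Python sketch of that idea, not a replacement for parallel; run_jobs and octave_command are hypothetical helpers, and the explicit .png suffix is an assumption that differs from the original test, which passed print a name without an extension.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_jobs(make_command, out_dir, n_jobs, max_parallel):
    """Run n_jobs commands, at most max_parallel at a time.

    make_command(target) must return the argv list for one job that is
    expected to create the file `target`.  Returns a list of
    (job number, exit code, file exists, stderr) for every job that
    failed or did not produce its file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def run_one(i):
        target = out_dir / f"t1-{i:04d}.png"   # one known name per job
        proc = subprocess.run(make_command(target),
                              capture_output=True, text=True)
        return i, proc.returncode, target.exists(), proc.stderr.strip()

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(run_one, range(1, n_jobs + 1)))
    return [r for r in results if r[1] != 0 or not r[2]]


def octave_command(target):
    """Hypothetical command line modelled on the test in this message."""
    code = f'figure (1, "visible", "off"); plot (1:2); print ("{target}");'
    return ["./run-octave", "--norc", "--silent", "--no-history",
            "--eval", code]


if __name__ == "__main__":
    # Same job count and parallelism as the parallel invocation above.
    for job, code, exists, err in run_jobs(octave_command, "/tmp/t1", 128, 32):
        print(f"job {job}: exit={code} file_exists={exists} stderr={err!r}")
```

A job that exits with status 0 but leaves no file points at print silently failing; a nonzero status with text on stderr points at the interpreter itself.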