qemu-ppc

Re: [PATCH] tests/avocado: Fix console data loss


From: Nicholas Piggin
Subject: Re: [PATCH] tests/avocado: Fix console data loss
Date: Fri, 15 Sep 2023 10:07:15 +1000

On Wed Sep 13, 2023 at 6:51 PM AEST, Alex Bennée wrote:
>
> Nicholas Piggin <npiggin@gmail.com> writes:
>
> > Occasionally some avocado tests will fail waiting for console line
> > despite the machine running correctly. Console data goes missing, as can
> > be seen in the console log. This is due to _console_interaction calling
> > makefile() on the console socket each time it is invoked, which must be
> > losing old buffer contents when the file object goes out of scope.
> >
> > It is not enough to makefile() with buffered=0. That helps significantly,
> > but data loss is still possible. My guess is that readline() has a line
> > buffer even when the file is in unbuffered mode, which can eat data.
> >
> > Fix this by providing a console file that persists for the life of the
> > console.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>
> Queued to testing/next, thanks.
>
> > ---
> >
> > For some reason, ppc_prep_40p.py:IbmPrep40pMachine.test_openbios_192m
> > was flaky for me due to this bug. I don't know why that test in
> > particular; its 3 calls to wait_for_console_pattern probably help.
> >
> > I didn't pinpoint when the bug was introduced. The original code was
> > probably not buggy, because it was only run once at the end of the
> > test. At some point after it was moved to common code, something would
> > have started to call it more than once, which is where the potential
> > for the bug was introduced.
>
> There is a sprawling mass somewhere between:
>
>   - Python's buffering of I/O
>   - device models dropping chars when blocked
>   - noisy tests with competing console output
>
> that adds up to unreliable tests that rely on seeing certain patterns on
> the console.

Yeah, it's a tricky bug and a difficult stack to diagnose. I started by
looking at the 40p machine's firmware console, since that was where it
was happening.

It's actually not too bad now. I was provoking it by putting delays in
various places in avocado's console socket reading, which can trigger it
easily (my guess is that the delay allows the file buffer to pull in more
data than is consumed). With the patch, the only check-avocado failures I
was getting were some OS watchdog timeouts in their console print code,
caused by back pressure.
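A minimal Python sketch of the failure mode, assuming a plain socket.socketpair() stands in for the QEMU console socket. The names readline_fresh_file and PersistentConsoleFile are hypothetical; this is not the avocado or QEMU code. Sending both lines before the first read mimics a delay that lets data pile up. A fresh buffered makefile() pulls both lines into its buffer, returns only the first, and discards the rest when it is closed. Keeping one file object for the console's lifetime keeps the leftover bytes available.

```python
import socket


def readline_fresh_file(sock):
    # Lossy pattern: a new buffered file object on every call. Its read
    # buffer can pull in more bytes than the single line returned, and
    # those extra bytes are thrown away when the file object is closed.
    f = sock.makefile(mode="rb")
    try:
        return f.readline()
    finally:
        f.close()  # closes only the file wrapper; the socket stays open


class PersistentConsoleFile:
    # Hypothetical helper: one file object kept for the console's lifetime,
    # so bytes buffered by one readline() are still there for the next.
    def __init__(self, sock):
        self._file = sock.makefile(mode="rb")

    def readline(self):
        return self._file.readline()

    def close(self):
        self._file.close()


def _two_line_socketpair():
    # Writer sends two lines, then signals EOF; caller reads from `reader`.
    writer, reader = socket.socketpair()
    writer.sendall(b"first\nsecond\n")
    writer.shutdown(socket.SHUT_WR)
    return writer, reader


def demo_lossy():
    writer, reader = _two_line_socketpair()
    try:
        return [readline_fresh_file(reader), readline_fresh_file(reader)]
    finally:
        writer.close()
        reader.close()


def demo_persistent():
    writer, reader = _two_line_socketpair()
    console = PersistentConsoleFile(reader)
    try:
        return [console.readline(), console.readline()]
    finally:
        console.close()
        writer.close()
        reader.close()


print(demo_lossy())       # on Linux: [b'first\n', b''] ("second" was lost)
print(demo_persistent())  # [b'first\n', b'second\n']
```

The second readline_fresh_file() call returns b"" (EOF) rather than b"second\n". Those bytes were already read off the socket into the first file's buffer and were dropped with it.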

Thanks,
Nick


