qemu-ppc

Re: [PATCH v3 00/29] PowerPC interrupt rework


From: Matheus K. Ferst
Subject: Re: [PATCH v3 00/29] PowerPC interrupt rework
Date: Fri, 21 Oct 2022 11:21:47 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.2.2

On 21/10/2022 07:56, Daniel Henrique Barboza wrote:
Matheus,

I did some digging yesterday. There are 2 distinct things happening:

- the apparent problem with the avocado test. After doing more and more tests
it seems like the test failure rate is lower than 10%. With a simple script
to exercise it on my laptop:

n=1
while true; do
        make -j check-avocado \
                AVOCADO_TESTS='tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500'
        if [ $? -ne 0 ]; then
                echo "test failed after $n iterations"
                exit 1
        fi
        ((n=n+1))
done

On master I managed to get up to 100+ runs without failure. Sometimes I get 90, 50, 30 runs before failure and so on. This is an OK failure rate in my opinion, so as long as a code contribution does not dramatically increase it, I'm
fine with it. This also means that I won't be skipping the test.


Thanks for this testing, I suspect we may have more than one bug that causes this test failure.
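
To put a number on the failure rate instead of stopping at the first failure, a variant of the loop above could count failures over a fixed number of runs. This is only a minimal sketch, assuming bash; count_failures is a hypothetical helper name, not something that exists in the QEMU tree:

```shell
#!/usr/bin/env bash
# count_failures RUNS CMD [ARGS...]
# Runs CMD RUNS times and prints how many of those runs exited non-zero.
# Hypothetical helper for illustration only; not part of the QEMU tree.
count_failures() {
    local runs=$1 failures=0 i
    shift
    for ((i = 0; i < runs; i++)); do
        # Discard the command's own output so only the final count is printed.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
    done
    echo "$failures"
}

# Possible usage (assumed invocation, mirroring the script quoted above):
#   count_failures 100 make -j check-avocado \
#       AVOCADO_TESTS='tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500'
```

Comparing the count on master against the count with a given patch applied should make it easier to tell a real regression from the existing flakiness, although with failures this rare a few hundred runs per configuration are probably needed.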

- back to this series, I couldn't manage to get a single successful run with
patch 27 applied. On the other hand, running the aforementioned script with
patches 1-26 I just got 96 test runs before the first failure. This is enough evidence for me to believe that, yeah, patch 27 is really doing something that is
messing with the icount replay for e500 one way or the other.

Patch 27 is definitely wrong: other places that write to special registers and SPRs and may cause an interrupt (e.g., gen_helper_store_decr, gen_mtmsr[d]) call gen_io_start, so we should also call it before helper_ppc_maybe_interrupt. Without that call, when running with icount we hit the cpu_abort in icount_handle_interrupt if wrtee[i] unmasks a pending interrupt.

The current wrtee[i] may also be wrong in not calling it, as it may cause an interrupt to be delivered. However, before the interrupt rework, CPU_INTERRUPT_HARD was set somewhere else, so it wouldn't trigger the abort.

That said, even after adding this call I still see failures after ~200 iterations of this test, so we may have more problems to tackle here. However, it's not a CPU abort anymore: the second QEMU invocation exits with status zero without writing anything to the console.


All that said, patches 1-26 are queued in ppc-next.


On 10/20/22 10:40, Matheus K. Ferst wrote:
On 20/10/2022 08:18, Daniel Henrique Barboza wrote:
On 10/19/22 18:55, Daniel Henrique Barboza wrote:
Matheus,

This series fails 'make check-avocado' in an e500 test. This is the error output:

Scrap that.

This avocado test is also failing on master 10% of the time, give or take. It might be the case that patch 27 makes the failure more consistent, but I can't
say it's the culprit.


I'll take a closer look and see if I can identify one particular commit that is making the test fail 1 out of 10 times. It may be the case that I need
to skip the test altogether.


Nice catch. I guess we need a gen_icount_io_start before calling helper_ppc_maybe_interrupt, so maybe it's better to make a gen_ppc_maybe_interrupt that calls icount and the helper. I'll give it a bit more testing and re-spin the series.


Don't need to re-spin everything (unless you needed to do some changes in
the patches prior). Just resend patch 27+.



Ok, I'll send 27-29 based on ppc-next.

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>



