qemu-ppc

Re: [RFC PATCH 0/6] target/ppc: Improve 4xx and 440 tlbwe


From: BALATON Zoltan
Subject: Re: [RFC PATCH 0/6] target/ppc: Improve 4xx and 440 tlbwe
Date: Thu, 7 Dec 2023 02:35:03 +0100 (CET)

Hello,

On Wed, 15 Nov 2023, BALATON Zoltan wrote:
On Tue, 14 Nov 2023, Nicholas Piggin wrote:
Well I split out these patches and looked a bit closer and added
a few more things.

I think it may be a bit too much to do the optimisations for
this release, because 4xx TLB flushing has some quirks too so
it's not just simple implementation of 4xx scheme in 440. We
could try for next time.

The bug fix patch 1 maybe we should do. We haven't been able to
confirm it fixes anything but there was mention of occasional
random crashes.

I did some quick testing of this series and found that patch 1 alone makes it slower and is not known to fix any issue, so I'd say don't commit just this patch without the rest. The current version works well enough that we can live with it until the next version. With the other patches it's faster, and the last patch does make a difference: it makes it a bit faster. I did not record the numbers and only did one measurement, so this is only approximate, but unless you plan to take the whole series now, keep patch 1 for the next devel cycle as well.

We've done some more experiments and I've now collected some numbers. The test was to run lame to convert a wav file to mp3 right after boot and then get "info jit" once it finished. The same executable runs on pegasos2 and sam460ex, so we can compare sam460ex before and after this series, and against pegasos2 as well. All runs were on the same host machine, so the numbers should be comparable. (This test also hits the slow FPU emulation on the PPC target, which is another reason it runs slowly.)

On pegasos2 I get:

Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=2)
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
  1149/1149  (100%)|    0:33/    0:33|    0:33/    0:33|   0.8982x|    0:00
QEMU 8.1.92 monitor - type 'help' for more information
Accelerator settings:
one-insn-per-tb: off

Translation buffer state:
gen code size       29666515/1023052800
TB count            52723
TB avg target size  24 max=2048 bytes
TB avg host size    325 bytes (expansion ratio: 13.4)
cross page TB count 0 (0%)
direct jump count   31917 (60%) (2 jumps=25829 48%)
TB hash buckets     24452/32768 (74.62% head buckets used)
TB hash occupancy   33.37% avg chain occ. Histogram: [0,10)%|▆ █  ▅▁▃▁▁|[9
TB hash avg chain   1.018 buckets. Histogram: 1|█▁3

Statistics:
TB flush count      0
TB invalidate count 7841
TLB full flushes    0
TLB partial flushes 13298
TLB elided flushes  100190
[TCG profiler not compiled]

On sam460ex *without* this series:

    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
  1149/1149  (100%)|    0:37/    0:37|    0:37/    0:37|   0.8093x|    0:00
QEMU 8.1.92 monitor - type 'help' for more information
Accelerator settings:
one-insn-per-tb: off

Translation buffer state:
gen code size       32917427/1023052800
TB count            60534
TB avg target size  22 max=2048 bytes
TB avg host size    306 bytes (expansion ratio: 13.9)
cross page TB count 0 (0%)
direct jump count   37047 (61%) (2 jumps=29011 47%)
TB hash buckets     26619/32768 (81.23% head buckets used)
TB hash occupancy   40.02% avg chain occ. Histogram: [0,10)%|▅ █  ▆▁▄▁▂|[9
TB hash avg chain   1.035 buckets. Histogram: 1|█▁3

Statistics:
TB flush count      0
TB invalidate count 5629
TLB full flushes    0
TLB partial flushes 508238
TLB elided flushes  7680722
[TCG profiler not compiled]

On sam460ex *with* this series:

    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
  1149/1149  (100%)|    0:34/    0:34|    0:34/    0:34|   0.8595x|    0:00
QEMU 8.1.92 monitor - type 'help' for more information
Accelerator settings:
one-insn-per-tb: off

Translation buffer state:
gen code size       33094883/1023052800
TB count            60607
TB avg target size  22 max=2048 bytes
TB avg host size    308 bytes (expansion ratio: 13.9)
cross page TB count 0 (0%)
direct jump count   37093 (61%) (2 jumps=29038 47%)
TB hash buckets     26682/32768 (81.43% head buckets used)
TB hash occupancy   40.12% avg chain occ. Histogram: [0,10)%|▅ █  ▆▁▄▁▂|[9
TB hash avg chain   1.034 buckets. Histogram: 1|█▁3

Statistics:
TB flush count      0
TB invalidate count 5628
TLB full flushes    0
TLB partial flushes 73
TLB elided flushes  1143
[TCG profiler not compiled]

The excessive TLB flushes are resolved; there are now even far fewer than on pegasos2, which uses a G4 CPU. I wonder why, and whether that could be reduced further for books as well. It still runs slower on sam460ex than on pegasos2, but that will need further profiling to find out what the next bottleneck is.
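
To put the three runs above in proportion, here is a rough Python sketch that only does arithmetic on the figures already pasted in this mail. The dictionary layout and the helper names (`ratio`, `percent_saved`) are made up for illustration; nothing here comes from QEMU itself, and with one measurement per configuration the results are approximate at best.

```python
# Figures copied by hand from the lame and "info jit" output in this mail.
# The dictionary layout and helper names are illustrative only.
RUNS = {
    "pegasos2":         {"seconds": 33, "partial": 13298,  "elided": 100190},
    "sam460ex without": {"seconds": 37, "partial": 508238, "elided": 7680722},
    "sam460ex with":    {"seconds": 34, "partial": 73,     "elided": 1143},
}


def ratio(before, after):
    """Return how many times larger `before` is than `after`."""
    return before / after


def percent_saved(before, after):
    """Return the reduction from `before` to `after` as a percentage."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    old = RUNS["sam460ex without"]
    new = RUNS["sam460ex with"]
    ref = RUNS["pegasos2"]

    # Effect of the series on sam460ex.
    print(f"encode time saved:      {percent_saved(old['seconds'], new['seconds']):.1f}%")
    print(f"partial flushes:        {ratio(old['partial'], new['partial']):.0f}x fewer")
    print(f"elided flushes:         {ratio(old['elided'], new['elided']):.0f}x fewer")

    # Remaining gap to pegasos2 after the series.
    print(f"still slower than peg2: {percent_saved(new['seconds'], ref['seconds']):.1f}% of the time")
```

The point of the comparison is that the flush counts drop by several orders of magnitude while the encode time improves only modestly, which fits the remark that something other than TLB flushing is now the limiting factor on sam460ex.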

Regards,
BALATON Zoltan

Thanks,
Nick

Nicholas Piggin (6):
 target/ppc: Fix 440 tlbwe TLB invalidation gaps
 target/ppc: Factor out 4xx ppcemb_tlb_t flushing
 target/ppc: 4xx don't flush TLB for a newly written software TLB entry
 target/ppc: 4xx optimise tlbwe_lo TLB flushing
 target/ppc: 440 optimise tlbwe TLB flushing
 target/ppc: optimise ppcemb_tlb_t flushing

target/ppc/mmu_helper.c | 105 +++++++++++++++++++++++-----------------
1 file changed, 61 insertions(+), 44 deletions(-)



