qemu-ppc

Re: [RFC PATCH 0/6] target/ppc: Improve 4xx and 440 tlbwe


From: Nicholas Piggin
Subject: Re: [RFC PATCH 0/6] target/ppc: Improve 4xx and 440 tlbwe
Date: Thu, 07 Dec 2023 14:22:06 +1000

On Thu Dec 7, 2023 at 11:35 AM AEST, BALATON Zoltan wrote:
> Hello,
>
> On Wed, 15 Nov 2023, BALATON Zoltan wrote:
> > On Tue, 14 Nov 2023, Nicholas Piggin wrote:
> >> Well I split out these patches and looked a bit closer and added
> >> a few more things.
> >> 
> >> I think it may be a bit too much to do the optimisations for
> >> this release, because 4xx TLB flushing has some quirks too, so
> >> it's not just a simple implementation of the 4xx scheme in 440. We
> >> could try for next time.
> >> 
> >> Maybe we should take the bug fix, patch 1. We haven't been able to
> >> confirm it fixes anything, but there was mention of occasional
> >> random crashes.
> >
> > I did some quick testing of this series and found that patch 1 alone
> > makes it slower but is not known to fix any issue, so I'd say don't
> > commit just this patch without the rest. The current version works well
> > enough, so we can live with that until the next version. With the other
> > patches it's faster, and the last patch does make a difference: it makes
> > it a bit faster. I did not record the numbers and only did one
> > measurement, so it's only approximate, but unless you plan to take the
> > whole series now, keep patch 1 for the next devel cycle as well.
>
> We've done some more experiments and I've collected some numbers now. The
> test was running lame to convert a wav file to mp3 right after boot and
> then getting the "info jit" output after it finished. The same executable
> runs on pegasos2 and sam460ex, so we can compare sam460ex before and after
> this series, and against pegasos2 as well. These were run on the same host
> machine, so the numbers should be comparable. (This test also hits the
> slow FPU emulation on the PPC target, which is another reason it runs
> slowly.)
>
> On pegasos2 I get:
>
> Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=2)
>      Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
>    1149/1149  (100%)|    0:33/    0:33|    0:33/    0:33|   0.8982x|    0:00
> QEMU 8.1.92 monitor - type 'help' for more information
> Accelerator settings:
> one-insn-per-tb: off
>
> Translation buffer state:
> gen code size       29666515/1023052800
> TB count            52723
> TB avg target size  24 max=2048 bytes
> TB avg host size    325 bytes (expansion ratio: 13.4)
> cross page TB count 0 (0%)
> direct jump count   31917 (60%) (2 jumps=25829 48%)
> TB hash buckets     24452/32768 (74.62% head buckets used)
> TB hash occupancy   33.37% avg chain occ. Histogram: [0,10)%|▆ █  ▅▁▃▁▁|[9
> TB hash avg chain   1.018 buckets. Histogram: 1|█▁3
>
> Statistics:
> TB flush count      0
> TB invalidate count 7841
> TLB full flushes    0
> TLB partial flushes 13298
> TLB elided flushes  100190
> [TCG profiler not compiled]
>
> On sam460ex *without* this series:
>
>      Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
>    1149/1149  (100%)|    0:37/    0:37|    0:37/    0:37|   0.8093x|    0:00
> QEMU 8.1.92 monitor - type 'help' for more information
> Accelerator settings:
> one-insn-per-tb: off
>
> Translation buffer state:
> gen code size       32917427/1023052800
> TB count            60534
> TB avg target size  22 max=2048 bytes
> TB avg host size    306 bytes (expansion ratio: 13.9)
> cross page TB count 0 (0%)
> direct jump count   37047 (61%) (2 jumps=29011 47%)
> TB hash buckets     26619/32768 (81.23% head buckets used)
> TB hash occupancy   40.02% avg chain occ. Histogram: [0,10)%|▅ █  ▆▁▄▁▂|[9
> TB hash avg chain   1.035 buckets. Histogram: 1|█▁3
>
> Statistics:
> TB flush count      0
> TB invalidate count 5629
> TLB full flushes    0
> TLB partial flushes 508238
> TLB elided flushes  7680722
> [TCG profiler not compiled]
>
> On sam460ex *with* this series:
>
>      Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
>    1149/1149  (100%)|    0:34/    0:34|    0:34/    0:34|   0.8595x|    0:00
> QEMU 8.1.92 monitor - type 'help' for more information
> Accelerator settings:
> one-insn-per-tb: off
>
> Translation buffer state:
> gen code size       33094883/1023052800
> TB count            60607
> TB avg target size  22 max=2048 bytes
> TB avg host size    308 bytes (expansion ratio: 13.9)
> cross page TB count 0 (0%)
> direct jump count   37093 (61%) (2 jumps=29038 47%)
> TB hash buckets     26682/32768 (81.43% head buckets used)
> TB hash occupancy   40.12% avg chain occ. Histogram: [0,10)%|▅ █  ▆▁▄▁▂|[9
> TB hash avg chain   1.034 buckets. Histogram: 1|█▁3
>
> Statistics:
> TB flush count      0
> TB invalidate count 5628
> TLB full flushes    0
> TLB partial flushes 73
> TLB elided flushes  1143
> [TCG profiler not compiled]

Great, thanks for the numbers.

> The excessive TLB flushes are resolved; there are now even far fewer than
> on pegasos2, which uses a G4 CPU. I wonder why, and whether that could be
> reduced further for BookS as well. It still runs slower on sam460ex than
> on pegasos2, but finding the next bottleneck will need further
> profiling.
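To put the statistics above in proportion, here is a small C sketch that records the figures from the three runs and computes the ratios between them. The type, variable and function names are made up for illustration. The wall times are the whole seconds printed by lame, so the percentages are only approximate.

```c
#include <assert.h>

/*
 * Figures copied from the lame progress line and the "info jit"
 * output of each run above. Wall times are whole seconds.
 */
typedef struct {
    const char *machine;
    double real_seconds;            /* REAL time reported by lame */
    unsigned long partial_flushes;  /* "TLB partial flushes" */
    unsigned long elided_flushes;   /* "TLB elided flushes" */
} JitRun;

static const JitRun pegasos2   = { "pegasos2",                33.0,  13298,  100190 };
static const JitRun sam_before = { "sam460ex without series", 37.0, 508238, 7680722 };
static const JitRun sam_after  = { "sam460ex with series",    34.0,     73,    1143 };

/* How many times larger `before` is than `after`. */
static double reduction(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}

/* How much slower `t` is than `baseline`, in percent. */
static double percent_slower(double t, double baseline)
{
    assert(baseline > 0.0);
    return (t - baseline) / baseline * 100.0;
}
```

Plugging in the numbers, the series cuts partial flushes by a factor of about 6960 and elided flushes by about 6720. sam460ex now does about 180 times fewer partial flushes than pegasos2, and its wall-time gap to pegasos2 shrinks from about 12% (37 s vs 33 s) to about 3% (34 s vs 33 s).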

G4 uses segments and a hash table? I think the problem with that is that
the QEMU TLB does not match the MMU well, so a TLBIE address cannot easily
be matched to a QEMU TLB address.

So it would not be as easy to improve as it was in this series. It could
be an interesting project: I think you would need some way to quickly map
a hash virtual address to the possible segment effective addresses that
could be mapping it, so that you can invalidate those addresses (which is
what TCG TLBs cache).

Thanks,
Nick


