qemu-devel

From: Cédric Le Goater
Subject: Re: [RFC PATCH] cputlb and ssi: cache class to avoid expensive object_dynamic_cast_assert (HACK!!!)
Date: Thu, 4 Aug 2022 19:33:56 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0

On 8/4/22 18:51, Alex Bennée wrote:

Cédric Le Goater <clg@kaod.org> writes:

Hello Alex,

Thanks for putting some time into this problem,

On 8/4/22 11:20, Alex Bennée wrote:
While investigating why some BMC models are so slow compared to a plain
ARM virt machine, I did some profiling of:
    ./qemu-system-arm -M romulus-bmc -nic user \
      -drive file=obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd \
      -nographic -serial mon:stdio
And saw that object_dynamic_cast was dominating the profile times. We
have a number of cases in the CPU hot path and, more importantly for
this model, in the SSI bus. As the class is static once the object is
created, we just cache it and use it instead of the dynamic cast
macros.
[AJB: I suspect a proper fix for this is for QOM to support a cached
class lookup, abortive macro attempt #if 0'd in this patch].
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Cédric Le Goater <clg@kaod.org>
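
To illustrate the caching idea described in the commit message above, here
is a minimal, self-contained C sketch. It does not use the real QOM API:
ToyClass, ToyObject, toy_dynamic_cast() and every other name in it are
hypothetical stand-ins, and the name-walking lookup is only an assumption
about what makes a checked cast expensive. It is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for a class hierarchy; NOT the real QEMU/QOM structures. */
typedef struct ToyClass {
    const char *name;
    const struct ToyClass *parent;   /* NULL at the root of the hierarchy */
    int (*transfer)(int value);      /* a "virtual method" on the class */
} ToyClass;

typedef struct ToyObject {
    const ToyClass *klass;            /* fixed once the object is created */
    const ToyClass *cached_ssi_class; /* filled lazily on first transfer */
} ToyObject;

/* Counts how often the slow lookup runs, so the effect is observable. */
static unsigned long cast_walks;

/* Slow path: walk the parent chain, comparing type names as strings. */
static const ToyClass *toy_dynamic_cast(const ToyClass *klass,
                                        const char *type_name)
{
    cast_walks++;
    for (const ToyClass *c = klass; c != NULL; c = c->parent) {
        if (strcmp(c->name, type_name) == 0) {
            return klass;
        }
    }
    return NULL;
}

static int double_it(int value)
{
    return 2 * value;
}

static const ToyClass toy_device_class = { "toy-device", NULL, NULL };
static const ToyClass toy_ssi_class = {
    "toy-ssi-peripheral", &toy_device_class, double_it
};
static const ToyClass toy_flash_class = {
    "toy-flash", &toy_ssi_class, double_it
};

/* Uncached: one checked cast for every single transfer. */
static int toy_transfer_uncached(ToyObject *obj, int value)
{
    const ToyClass *k = toy_dynamic_cast(obj->klass, "toy-ssi-peripheral");

    assert(k != NULL);
    return k->transfer(value);
}

/* Cached: the checked cast runs on first use only, then is reused. */
static int toy_transfer_cached(ToyObject *obj, int value)
{
    if (obj->cached_ssi_class == NULL) {
        obj->cached_ssi_class =
            toy_dynamic_cast(obj->klass, "toy-ssi-peripheral");
        assert(obj->cached_ssi_class != NULL);
    }
    return obj->cached_ssi_class->transfer(value);
}
```

Creating a ToyObject as { &toy_flash_class, NULL } and calling
toy_transfer_cached() in a loop performs the name walk once, whereas
toy_transfer_uncached() repeats it on every call. The cache is only valid
because the class of an object never changes after creation.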


Here are some results,

* romulus-bmc, OpenBMC login prompt

   without : 82s
   with    : 56s

Looks like I lucked out picking the lowest hanging fruit.

That's a huge improvement. I tend to use buildroot mostly for FW and
kernel dev but OpenBMC has become as complex as a common server distro.
The above result is probably faster than real HW, for the AST2400 and
AST2500 at least.



* ast2500-evb,execute-in-place=true, U-boot 2019.04 prompt

   without : 30s
   with    : 22s

* witherspoon-bmc,execute-in-place=true, U-boot 2016.07 prompt

   without : 5.5s
   with    : 4.1s

There is definitely an improvement in all scenarios.

Applying a similar technique on AspeedSMCClass, I could gain
another ~10% and boot the ast2500-evb,execute-in-place=true
machine, up to the U-boot 2019.04 prompt, in less than 20s.

There are some fundamentals to XIP which mean it will be slower if
each instruction is being sucked through io_readx/device emulation

Yes. But when using XIP, there is a huge time difference between two
U-boot versions. See above. It takes 4s to reach the U-boot prompt of
the older 2016.07 and 22s on the newer U-boot 2019.04.

although I'm not sure what the exact mechanism is because surely a ROM
can just be mapped into the address space and run from there?

It can, and that's the default QEMU mode for the Aspeed machines. The flash
contents are copied into a ROM at 0x0. See commit 1a15311a12fa ("hw/arm/aspeed:
add a 'execute-in-place' property to boot directly from CE0")


That's not exactly how the HW works, and there is still some FW (like U-Boot
on the AST2600 BMC of some Meta boards) which fetches instructions to
execute from the flash contents region at 0x20000000 and does not use the ROM
region copied at 0x0.

However, newer U-Boot versions are still quite slow to boot when executing
from the flash device.

For any of those machines?

Yes. It gets worse with the AST2600, which has 2 CPUs.

What's the next command line for me to dig into?

Here are images to reproduce.

* U-Boot 2016.07:

  wget https://github.com/openbmc/openbmc/releases/download/2.9.0/obmc-phosphor-image-romulus.static.mtd

  qemu-system-arm -M romulus-bmc -drive file=./obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd -nographic
  qemu-system-arm -M romulus-bmc,execute-in-place=true -drive file=./obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd -nographic

* U-Boot 2019.04:

  wget https://www.kaod.org/qemu/aspeed/romulus/flash-romulus-bmc

  same commands

Thanks,

C.



