I am trying to run a large real-time, multi-threaded PPC (32-bit, big-endian) application via QEMU on an x86_64 system (Fedora 35). I have written several test programs that mimic the application's behavior, in order to test system calls, shared memory and multithreading, and to measure emulation performance.
- First try: qemu-system emulation of a full ppc64 Red Hat 7.9 VM. I dropped it immediately because of very poor performance.
- Then I tried user-space emulation:
  - 32-bit PPC user emulation using qemu-ppc (version 6.1.0, QEMU_CPU=7457a, glibc-2.5.90-19.0.109.1602824 ppc_74xx): did not work (some mutex and sem_post/sem_wait operations did not behave as expected).
  - qemu-ppc64 cannot open a 32-bit binary.
- I revived the ppc64abi32 user mode within QEMU (qemu-ppc64abi32 version 6.1.0, QEMU_CPU=power7+, glibc-2.17-325.el7_9.ppc). I ran into several issues here, the most notable being the incorrect behavior of pthread_self(); after replacing it with gettid(), things started to work:
  - The test program reached on average 18% of the performance of a POWER9 processor (about 6 times slower) on memset loops and memcpy operations, which is quite acceptable.
  - I noticed poor sem_post/sem_wait performance, but the measurement loops were not affected by it.
  - Measuring the REAL APPLICATION, performance is below 2.5% (more than 40 times slower)!
The application and the test programs were compiled with the same gcc options (I tried both gcc 3.2 and gcc 4.8.5).
The only explanation I can find for the performance drop of the main application is its size: it is large, it even loads several copies of the same .so at different addresses, it uses a large amount of shared memory (IPC), etc.
Has anyone experienced this kind of QEMU behavior? I am referring to the performance difference between small and large binaries. Could it be explained by host processor cache effects? I even tried increasing the size of the QEMU translation block cache (hashed with a 12-bit width) to a ridiculous size (24-bit width), without any effect.
Is there a way to get some statistics out of qemu-user?
Reference PPC machine: 8-core VM running RH7.9 ppc64, on top of an RH8 ppc64le host OS via KVM, on a POWER9 host.
Emulator: 4-core VM running Fedora 35 x86_64, on top of a Fedora 34 x86_64 host OS via KVM, on an Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz host.
Regards
Szilard.