Re: [RFC PATCH 0/5] Experimenting with tb-lookup tweaks
From: Richard Henderson
Subject: Re: [RFC PATCH 0/5] Experimenting with tb-lookup tweaks
Date: Wed, 24 Feb 2021 16:28:54 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 2/24/21 8:58 AM, Alex Bennée wrote:
> Hi Richard,
>
> Well I spun up some of the ideas we talked about to see if there was
> anything to be squeezed out of the function. In the end the results
> seem to be a washout with my pigz benchmark:
>
> qemu-system-aarch64 -cpu cortex-a57 \
> -machine type=virt,virtualization=on,gic-version=3 \
> -serial mon:stdio \
> -netdev user,id=unet,hostfwd=tcp::2222-:22 \
> -device virtio-net-pci,netdev=unet,id=virt-net,disable-legacy=on \
> -device virtio-scsi-pci,id=virt-scsi,disable-legacy=on \
> -blockdev driver=raw,node-name=hd,discard=unmap,file.driver=host_device,file.filename=/dev/zen-disk/debian-buster-arm64 \
> -device scsi-hd,drive=hd,id=virt-scsi-hd \
> -smp 4 -m 4096 \
> -kernel ~/lsrc/linux.git/builds/arm64/arch/arm64/boot/Image \
> -append "root=/dev/sda2 systemd.unit=benchmark-pigz.service" \
> -display none -snapshot
>
> | Command | Mean [s] | Min [s] | Max [s] | Relative |
> |---------+----------------+---------+---------+----------|
> | Before | 46.597 ± 2.482 | 45.208 | 53.618 | 1.00 |
> | After | 46.867 ± 2.242 | 45.871 | 53.180 | 1.00 |
Well that's disappointing.
> Maybe the code cleanup itself makes it worthwhile. WDYT?
I think there's little doubt that the first 3 patches are a good code cleanup.
Patch 4 I think is still beneficial, simply so that we can add that "Above
fields" comment.
Patch 5 would only be worthwhile if we could measure any positive difference,
which it seems we cannot.
I have a follow-up patch to remove the parallel_cpus global variable which I
will post in a moment. While it removes a handful of insns from this
fast-path, I doubt it helps. But getting rid of a global is probably always
positive, no?
I was glancing through the lookup function for alpha, instead of aarch64 and
saw:
21e: 33 43 18 xor 0x18(%rbx),%eax
221: 4c 31 e1 xor %r12,%rcx
224: 44 31 ea xor %r13d,%edx
227: 09 c2 or %eax,%edx
229: 48 0b 4b 08 or 0x8(%rbx),%rcx
and thought -- hang on, how come we're just ORing, not XORing, here? Of course
it's the cs_base field, which alpha has set to zero. The compiler has
simplified bits |= 0 ^ tb->cs_base.
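
As a minimal sketch of the pattern being discussed (not the actual QEMU lookup code), the comparison accumulates differences with XOR and OR so that the result is zero only when every field matches. DemoTB and demo_tb_match are hypothetical names, and the field widths are assumptions for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields compared during lookup. */
typedef struct {
    uint64_t pc;
    uint64_t cs_base;
    uint32_t flags;
} DemoTB;

/*
 * Branchless match: XOR yields the differing bits of each field and
 * OR accumulates them, so bits == 0 only if all fields are equal.
 */
static inline bool demo_tb_match(const DemoTB *tb, uint64_t pc,
                                 uint64_t cs_base, uint32_t flags)
{
    uint64_t bits = 0;

    bits |= pc ^ tb->pc;
    /* If cs_base is a constant 0, this term reduces to tb->cs_base. */
    bits |= cs_base ^ tb->cs_base;
    bits |= flags ^ tb->flags;
    return bits == 0;
}
```

When the caller always passes a constant zero for cs_base, the XOR against zero disappears and only the OR remains, which matches the disassembly above.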
Which got me thinking: what if we had a per-cpu

    typedef struct {
        target_ulong pc;
        ...
    } TranslationBlockID;

    static inline bool arch_tbid_cmp(TranslationBlockID x,
                                     TranslationBlockID y)
    {
        return x.pc == y.pc && ...;
    }

We could potentially reduce this to memcmp(&x, &y, sizeof(x)) == 0.
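
A rough sketch of the memcmp variant, assuming a hypothetical two-field layout (DemoTBID and demo_tbid_eq are illustrative names, and the fields beyond pc are guesses, not a proposal for any particular target). A bytewise compare is only equivalent to a fieldwise one when the struct has no padding, so the sketch checks that at compile time:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-target ID; only pc comes from the proposal above. */
typedef struct {
    uint64_t pc;
    uint64_t flags;
} DemoTBID;

/* memcmp would also compare padding bytes, so require that none exist. */
_Static_assert(sizeof(DemoTBID) == 2 * sizeof(uint64_t),
               "DemoTBID must not contain padding");

/* Bytewise equality; same result as comparing each field in turn. */
static inline bool demo_tbid_eq(const DemoTBID *x, const DemoTBID *y)
{
    return memcmp(x, y, sizeof(*x)) == 0;
}
```

A target whose ID mixes field widths would either need explicit padding members that are always zeroed, or would have to stay with the fieldwise comparison.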
First, this would allow cs_base to be eliminated where it is not used. Second,
this would allow cs_base to be renamed for the non-x86 targets for which it is
being abused. Third, it would allow tb->flags to be either (a) elided or (b)
extended by the target as needed.
This final point is directed at ARM, of course, where we've overflowed the
uint32_t that is tb->flags. We could now extend that to 64 bits.
Obviously, some tweaks to tb_hash_func would be required as well, but that's
manageable.
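
To illustrate the kind of tweak meant here, one option would be to hash the whole ID struct rather than individual fields. The sketch below uses FNV-1a purely as a simple stand-in; it is not what tb_hash_func does, and DemoTBID and demo_tbid_hash are hypothetical names with an assumed padding-free layout:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-target ID with a 64-bit flags field. */
typedef struct {
    uint64_t pc;
    uint64_t flags;
} DemoTBID;

/*
 * 32-bit FNV-1a over the raw bytes of the ID. Because every byte is
 * hashed, the struct must be padding-free for equal IDs to hash equal.
 */
static inline uint32_t demo_tbid_hash(const DemoTBID *id)
{
    const unsigned char *p = (const unsigned char *)id;
    uint32_t h = 2166136261u;          /* FNV offset basis */

    for (size_t i = 0; i < sizeof(*id); i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}
```

The point is only that the hash input can follow whatever fields the target puts in its ID, so widening flags needs no change at the call sites.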
What do you think about this last idea?
r~