Re: [RFC PATCH 0/5] Experimenting with tb-lookup tweaks


From: Alex Bennée
Subject: Re: [RFC PATCH 0/5] Experimenting with tb-lookup tweaks
Date: Thu, 25 Feb 2021 10:15:42 +0000
User-agent: mu4e 1.5.8; emacs 28.0.50

Richard Henderson <richard.henderson@linaro.org> writes:

> On 2/24/21 8:58 AM, Alex Bennée wrote:
>> Hi Richard,
>> 
>> Well I spun up some of the ideas we talked about to see if there was
>> anything to be squeezed out of the function. In the end the results
>> seem to be a washout with my pigz benchmark:
>> 
>>  qemu-system-aarch64 -cpu cortex-a57 \
>>    -machine type=virt,virtualization=on,gic-version=3 \
>>    -serial mon:stdio \
>>    -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>>    -device virtio-net-pci,netdev=unet,id=virt-net,disable-legacy=on \
>>    -device virtio-scsi-pci,id=virt-scsi,disable-legacy=on \
>>    -blockdev driver=raw,node-name=hd,discard=unmap,file.driver=host_device,file.filename=/dev/zen-disk/debian-buster-arm64 \
>>    -device scsi-hd,drive=hd,id=virt-scsi-hd \
>>    -smp 4 -m 4096 \
>>    -kernel ~/lsrc/linux.git/builds/arm64/arch/arm64/boot/Image \
>>    -append "root=/dev/sda2 systemd.unit=benchmark-pigz.service" \
>>    -display none -snapshot
>> 
>> | Command | Mean [s]       | Min [s] | Max [s] | Relative |
>> |---------+----------------+---------+---------+----------|
>> | Before  | 46.597 ± 2.482 |  45.208 |  53.618 |     1.00 |
>> | After   | 46.867 ± 2.242 |  45.871 |  53.180 |     1.00 |
>
> Well that's disappointing.
>
>> Maybe the code cleanup itself makes it worthwhile. WDYT?
>
> I think there's little doubt that the first 3 patches are a good code cleanup.
>
> Patch 4 I think is still beneficial, simply so that we can add that "Above
> fields" comment.
>
> Patch 5 would only be worthwhile if we could measure any positive difference,
> which it seems we cannot.
>
> I have a follow-up patch to remove the parallel_cpus global variable which I
> will post in a moment.  While it removes a handful of insns from this
> fast-path, I doubt it helps.  But getting rid of a global is probably always
> positive, no?
>
> I was glancing through the lookup function for alpha, instead of aarch64, and
> saw:
>
>  21e:   33 43 18                xor    0x18(%rbx),%eax
>  221:   4c 31 e1                xor    %r12,%rcx
>  224:   44 31 ea                xor    %r13d,%edx
>  227:   09 c2                   or     %eax,%edx
>  229:   48 0b 4b 08             or     0x8(%rbx),%rcx
>
> and thought -- hang on, how come we're just ORing, not XORing, here?  Of course
> it's the cs_base field, which alpha has set to zero.  The compiler has
> simplified bits |= 0 ^ tb->cs_base.
>
> Which got me thinking: what if we had a per-cpu
>
> typedef struct {
>     target_ulong pc;
>     ...
> } TranslationBlockID;
>
> static inline bool arch_tbid_cmp(TranslationBlockID x,
>                                  TranslationBlockID y)
> {
>     return x.pc == y.pc && ...;
> }
>
> We could potentially reduce this to memcmp(&x, &y, sizeof(x)) == 0.
>
> First, this would allow cs_base to be eliminated where it is not used.
> Second, this would allow cs_base to be renamed for the non-x86 targets for
> which it is being abused.  Third, it would allow tb->flags to be either (a)
> elided or (b) extended by the target as needed.
>
> This final point is directed at ARM, of course, where we've overflowed the
> uint32_t that is tb->flags.  We could now extend that to 64 bits.
>
> Obviously, some tweaks to tb_hash_func would be required as well, but that's
> manageable.
>
> What do you think about this last?

Sounds like a good idea for clean-up, especially if it lets us drop
cs_base where it is unused and extend tb->flags where needed. One concern
is where we go when we get to heterogeneous emulation. Will the different
CPUs share the same translation area, like the current cpu->cluster_index
stuff, or will that only be for similar but not quite identical
architectures? Maybe I'm thinking too far ahead...
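
To make the per-target ID idea concrete, here is a rough sketch of the two
comparison styles discussed above. It is only an illustration: the field set,
the 64-bit widths and the helper names tbid_eq_fields/tbid_eq_memcmp are
assumptions of mine, not anything that exists in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-target ID; fields and widths are illustrative only. */
typedef struct {
    uint64_t pc;
    uint64_t flags;   /* assumed widened from uint32_t, as ARM might want */
} TranslationBlockID;

/* memcmp() is only safe if there are no padding bytes in the struct. */
static_assert(sizeof(TranslationBlockID) == 2 * sizeof(uint64_t),
              "padding would make memcmp compare indeterminate bytes");

/* Field-by-field compare: accumulate differences with XOR/OR, no branches. */
static inline bool tbid_eq_fields(const TranslationBlockID *x,
                                  const TranslationBlockID *y)
{
    uint64_t diff = (x->pc ^ y->pc) | (x->flags ^ y->flags);
    return diff == 0;
}

/* Whole-struct compare, relying on the no-padding assertion above. */
static inline bool tbid_eq_memcmp(const TranslationBlockID *x,
                                  const TranslationBlockID *y)
{
    return memcmp(x, y, sizeof(*x)) == 0;
}
```

A target that has no use for a field would simply leave it out of its struct,
and the hash function would presumably need to cover the same set of fields.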

>
>
> r~


-- 
Alex Bennée


