[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t
From: |
Dale Whitfield |
Subject: |
Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t |
Date: |
Wed, 22 Apr 2009 09:23:49 +0200 |
User-agent: |
Mutt/1.5.18 (2008-05-17) |
Hi David >@2009.04.21_19:42:34_+0200
> Dale Whitfield wrote:
>> Hi,
>>
>> All this talk of super-optimisers, etc. brings this issue I've had,
>> to the fore again.
>>
>> Perhaps there are suggestions on how to encourage the compiler to do the
>> optimisations rather than having to hand-code to get the best result.
>>
>> Take this code:
>>
>> volatile uint32_t status;
>> static inline void set_status(uint32_t flag, uint8_t set)
>> __attribute__((always_inline));
>>
>> void set_status(uint32_t flag, uint8_t set) {
>> if (set) status |= flag;
>> else status &= ~flag;
>> }
>>
>> Which when used in an interrupt generates:
>>
>> ISR (INT7_vect)
>> {
>> 4e9a: 1f 92 push r1
>> 4e9c: 0f 92 push r0
>> 4e9e: 0f b6 in r0, 0x3f ; 63
>> 4ea0: 0f 92 push r0
>> 4ea2: 11 24 eor r1, r1
>> 4ea4: 8f 93 push r24
>> 4ea6: 9f 93 push r25
>> 4ea8: af 93 push r26
>> 4eaa: bf 93 push r27
>> 4eac: 80 91 1a 18 lds r24, 0x181A
>> 4eb0: 90 91 1b 18 lds r25, 0x181B
>> 4eb4: a0 91 1c 18 lds r26, 0x181C
>> 4eb8: b0 91 1d 18 lds r27, 0x181D
>> 4ebc: a0 64 ori r26, 0x40 ; 64
>> 4ebe: 80 93 1a 18 sts 0x181A, r24
>> 4ec2: 90 93 1b 18 sts 0x181B, r25
>> 4ec6: a0 93 1c 18 sts 0x181C, r26
>> 4eca: b0 93 1d 18 sts 0x181D, r27
>> set_status(0x00400000UL, 1);
>> 4ed0: bf 91 pop r27
>> 4ed2: af 91 pop r26
>> 4ed4: 9f 91 pop r25
>> 4ed6: 8f 91 pop r24
>> 4ed8: 0f 90 pop r0
>> 4eda: 0f be out 0x3f, r0 ; 63
>> 4edc: 0f 90 pop r0
>> 4ede: 1f 90 pop r1
>> 4ee0: 18 95 reti
>>
>> That's way too much code... Its fairly obvious where the optimisations
>> should be, although I can see that some have been done already.
>>
>
> I can't see much possibility for improving the above code (except by
> removing the push and zeroing of r1). You asked for "status" to be a
> volatile 32-bit int, so that's what you got. The code below is
> semantically different, and thus compiles differently.
>
I don't believe it is semantically different. Which is one reason I
raised this. I am using the version below in existing code and it
behaves correctly.
Loading 4 registers and then storing back 3 that are unchanged makes no
sense at all.
Where volatile comes in here is that the optimisation shouldn't use any
previous loads or modify register values more than once without
reloading/storing etc. Here, the value is loaded once, one byte is
changed and then all 4 are stored. That's wasteful.
In this example, the interrupt routine is half as long when, as I see
it, correctly optimised. On an interrupt particularly, this could be
critical time/cycle saving.
>> If I recode the routine as follows, I get what I want:
>>
>> void set_status(uint32_t flag, uint8_t set) {
>> volatile uint8_t *pstatus = (uint8_t *)&status;
>>
>> if (set) {
>> if (flag & 0x000000ff)
>> pstatus[0] |= (uint8_t)(flag & 0xff);
>> if (flag & 0x0000ff00)
>> pstatus[1] |= (uint8_t)(flag >> 8);
>> if (flag & 0x00ff0000)
>> pstatus[2] |= (uint8_t)(flag >> 16);
>> if (flag & 0xff000000)
>> pstatus[3] |= (uint8_t)(flag >> 24);
>> } else {
>> if (flag & 0x000000ff)
>> pstatus[0] &= ~(uint8_t)(flag & 0xff);
>> if (flag & 0x0000ff00)
>> pstatus[1] &= ~(uint8_t)(flag >> 8);
>> if (flag & 0x00ff0000)
>> pstatus[2] &= ~(uint8_t)(flag >> 16);
>> if (flag & 0xff000000)
>> pstatus[3] &= ~(uint8_t)(flag >> 24);
>> }
>> }
>>
>> Which is:
>>
>> ISR (INT7_vect)
>> {
>> 6b80: 1f 92 push r1
>> 6b82: 0f 92 push r0
>> 6b84: 0f b6 in r0, 0x3f ; 63
>> 6b86: 0f 92 push r0
>> 6b88: 11 24 eor r1, r1
>> 6b8a: 8f 93 push r24
>> 6b8c: 80 91 4e 18 lds r24, 0x184E
>> 6b90: 80 64 ori r24, 0x40 ; 64
>> 6b92: 80 93 4e 18 sts 0x184E, r24
>> set_status(0x00400000UL, 1);
>> 6b98: 8f 91 pop r24
>> 6b9a: 0f 90 pop r0
>> 6b9c: 0f be out 0x3f, r0 ; 63
>> 6b9e: 0f 90 pop r0
>> 6ba0: 1f 90 pop r1
>> 6ba2: 18 95 reti
>>
>> Compiler options are:
>>
>> avr-gcc -mmcu=atmega2560 -Wall -gdwarf-2 -std=gnu99 -Os -funsigned-char
>> -funsigned-bitfields -fpack-struct -fshort-enums -finline-limit=10
>> -DF_CPU=7372800UL
>>
>> version 4.3.3
>>
>> In a 75k byte application this reduces code size by 1632 bytes.
>>
>> Dale.
>
>
>
> _______________________________________________
> AVR-GCC-list mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
>
- [avr-gcc-list] Optimisation of bit set/clear in uint32_t, Dale Whitfield, 2009/04/21
- Re: [avr-gcc-list] Optimisation of bit set/clear in uint32_t, Georg-Johann Lay, 2009/04/21
- [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, David Brown, 2009/04/21
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t,
Dale Whitfield <=
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Alex Wenger, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Dale Whitfield, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Alex Wenger, 2009/04/22
- [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, David Brown, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Paulo Marques, 2009/04/22
- [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, David Brown, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Dale Whitfield, 2009/04/22
- [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, David Brown, 2009/04/22
- Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, Dale Whitfield, 2009/04/22
- [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t, David Brown, 2009/04/22