|
From: | address@hidden |
Subject: | Re: [lwip-devel] byte order, packing, optimizations |
Date: | Mon, 15 Feb 2010 20:17:09 +0100 |
User-agent: | Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; de; rv:1.9.1.7) Gecko/20100111 Thunderbird/3.0.1 |
Bill Auerbach wrote:
All the patches so far have the downside (to me) that they effectively require you to have incoming data aligned in a specific way. Up to now, we solved this the other way round by telling the compiler to load data from unaligned addresses.I think (hope) we're in agreement that struct padding of 2 should be the default (over 1) since there is benefit in doing so and there is no need to ever pack by 1.
I do think that pack(2) and asserting for packets to be aligned is good regarding speed, but worse regarding flexibility and portability of our code, as well as harder to track down bugs in the future (if we don't include appropriate checks for aligment).
I'm happy with adding defines to let stack work faster if you *know* your incoming packets are aligned, but I'm not happy with optimizing the actual source code for this. There are platforms which benefit from this while others don't. A good example is Stephane's proposed change from htonX + shift to accessing as u8_t pointer + offset: if your htonX-function is fast (e.g. a custom OP on the NIOS), the old code may be faster than loading 4 bytes (which can really be 4 loads when loading from a struct pointer).
All I'm saying here is that I'd like these performance-improvements to be tested on more platforms and making them overridable - this also implies that we can't have too many of them or anyone wanting to optimize fails to get an overview.
A comment might be nice stating padding of 2 or 4 is only necessary if the processor has 16 or 32-bit alignment requirements.
Padding? Where?
Personally, (and this is also a summary to the lines above) I prefer the macro patch over hard-coded u8_t-array access: if you know the packets will be aligned, this is faster, if you don't know what's best for your platform, the default version still "just works" (even if it is a bit faster than it could be). After all, there are people who don't want to spend time on increasing throughput!The patch you attached does far more than that! And it's not clear to me if/why we need all that. Also, it interferes with / task #10173 <http://savannah.nongnu.org/task/?10173>/ (Add IPCOPY and HWACOPY): http://savannah.nongnu.org/task/?10173I wondered if his patch in fact covers 10173. And where does it leave us with IPv6?
Simon
[Prev in Thread] | Current Thread | [Next in Thread] |