Portable device driver considerations
Peter 'p2' De Schrijver
Sun, 4 Aug 2002 17:25:54 +0200
Some things I considered when thinking about writing portable device
drivers. (Those of you who were at the LSM will remember this :))
I probably should add a section on IRQs at some point.
little endian : 0x12345678 is stored as 0x78 0x56 0x34 0x12
big endian (aka network byte order) : 0x12345678 is stored as 0x12 0x34 0x56 0x78
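A minimal sketch in C of the two layouts above. The helper names store_le32
and store_be32 are made up for this example. They build the byte sequence with
shifts, so the result does not depend on the byte order of the CPU running the
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32 bit value least significant byte first (little endian). */
static void store_le32(uint8_t *dst, uint32_t v)
{
    assert(dst != NULL);
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)((v >> 8) & 0xff);
    dst[2] = (uint8_t)((v >> 16) & 0xff);
    dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Store a 32 bit value most significant byte first (big endian). */
static void store_be32(uint8_t *dst, uint32_t v)
{
    assert(dst != NULL);
    dst[0] = (uint8_t)((v >> 24) & 0xff);
    dst[1] = (uint8_t)((v >> 16) & 0xff);
    dst[2] = (uint8_t)((v >> 8) & 0xff);
    dst[3] = (uint8_t)(v & 0xff);
}
```

Calling store_le32(buf, 0x12345678) fills buf with 0x78 0x56 0x34 0x12, and
store_be32 with the same value gives 0x12 0x34 0x56 0x78.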
Some people also define bit order to be different. Typically bit 0 is the least
significant bit and bit 7 is the most significant bit. But some people define
bit 0 to be the most significant bit and bit 7 to be the least significant bit.
You will find this convention in the ppc manuals for example.
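To make the two bit numbering conventions concrete, here is a small sketch for
an 8 bit register. The function names lsb0_mask8 and msb0_mask8 are
hypothetical, they only illustrate how a bit number from a manual maps to a
mask under each convention.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 is the least significant bit (the usual convention). */
static uint8_t lsb0_mask8(unsigned bit)
{
    assert(bit < 8);
    return (uint8_t)(1u << bit);
}

/* Bit 0 is the most significant bit (as found in the ppc manuals). */
static uint8_t msb0_mask8(unsigned bit)
{
    assert(bit < 8);
    return (uint8_t)(1u << (7u - bit));
}
```

So "bit 0" of a byte is mask 0x01 in one manual and mask 0x80 in another. When
transcribing register definitions it helps to note which convention the manual
uses.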
Most RISC CPUs cannot do unaligned accesses, i.e. an access to a 16, 32 or 64
bit object should be aligned on a 16, 32 or 64 bit boundary respectively.
Most CPUs will raise an exception if an unaligned access is attempted. System
software can trap this exception and perform the access in software. This is
quite slow however, so it is to be avoided if possible. Some CPUs don't generate
a trap, but just don't do what you would expect. It's always better to handle
unaligned accesses in userland if you can do this without having to check every
pointer. In most cases the compiler handles this for you, but in some cases
this is not possible. Particularly when you process data coming from the
outside world (e.g. network or file I/O), the compiler may not be aware of
alignment issues. If you need a load or store to be atomic, be sure to use a
native register sized load/store. Any other size might not be atomic on some
CPUs.
These 2 issues typically matter if you have to talk to the outside world via
the network or exchange files with the outside world.
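As an illustration of handling both issues when parsing data from the outside
world, the sketch below reads a 32 bit field from an arbitrary offset in a
byte buffer. The names load_u32_unaligned and load_be32 are invented for this
example. memcpy lets the compiler pick a safe access sequence, and assembling
the value byte by byte fixes the byte order regardless of the host CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a host byte order 32 bit value from a possibly unaligned address. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);   /* never dereferences a misaligned uint32_t* */
    return v;
}

/* Read a big endian (network byte order) 32 bit value, any alignment. */
static uint32_t load_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Casting the buffer pointer to uint32_t * and dereferencing it would be the
tempting shortcut, but that is exactly the access that traps or misbehaves on
CPUs without unaligned access support.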
Device driver issues :
1. non-coherent I/O
A lot of CPU/bus controller implementations don't snoop the I/O busses to make
sure their caches are coherent with the data in main memory. They rely on the
OS software instead to provide this coherency. Basically we can distinguish
between 2 kinds of DMA accesses : streaming and non-streaming. Streaming DMA
accesses are typically a number of sequential accesses by the device to the
memory buffer after which the buffer is passed to some other entity. Examples
include network frames, disk buffers, ... Non-streaming accesses are typically
accesses towards control structures which reside in main memory and which live
as long as the device is in use. Examples include ring buffer structures,
mailboxes, microcode, ...
For each of these DMA types we need a different infrastructure to support them.
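A rough sketch of what software managed coherency for a streaming buffer could
look like. Everything here is an assumption made for illustration: the two
cache_*_range functions stand in for platform specific cache maintenance
operations and only count calls, and the xfer_dir enum and stream_* names are
invented. This is not any particular kernel's DMA API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for platform cache operations; they only count. */
static unsigned writebacks, invalidates;

static void cache_writeback_range(const void *p, size_t n)
{
    (void)p; (void)n;
    writebacks++;
}

static void cache_invalidate_range(const void *p, size_t n)
{
    (void)p; (void)n;
    invalidates++;
}

enum xfer_dir { XFER_TO_DEVICE, XFER_FROM_DEVICE };

/* Hand a streaming buffer over to the device. */
static void stream_give_to_device(void *buf, size_t len, enum xfer_dir dir)
{
    assert(buf != NULL && len > 0);
    if (dir == XFER_TO_DEVICE)
        cache_writeback_range(buf, len);   /* device must see the CPU's data */
    else
        cache_invalidate_range(buf, len);  /* no dirty lines may overwrite DMA */
}

/* Take a streaming buffer back once the device is done with it. */
static void stream_take_from_device(void *buf, size_t len, enum xfer_dir dir)
{
    assert(buf != NULL && len > 0);
    if (dir == XFER_FROM_DEVICE)
        cache_invalidate_range(buf, len);  /* CPU must re-read from memory */
}
```

A non-streaming structure such as a descriptor ring would not go through this
per transfer hand-over. One common approach, assumed here and not shown, is to
allocate it once from memory that is mapped uncached.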
In general we have 3 different types of addresses in a system :
+ Virtual addresses which are used by software
+ Physical addresses which appear on the address bus of the CPU
+ Bus addresses which appear on the address bus of a bus attached to the CPU
The translation between Virtual and Physical addresses is probably well known,
so I won't discuss it here.
But there are also translations between physical and bus addresses.
These translations are typically platform dependent.
We can roughly distinguish between 3 types of relations between physical and
bus addresses :
- identity mapping : physical address == bus address. This is for example the
case on PC systems. This means the bridge device between the CPU and the bus
will forward any access which is not destined for a memory device to the bus.
This effectively means bus devices and memory share the same address space.
- fixed offset mapping : bus address = physical address + offset.
- non-memory space mapping : In this case the bus address cannot be generated
by a simple memory access, but special instructions are necessary to access
devices on the bus. The most common example is probably IA32 I/O ports.
Obviously the processor MMU cannot translate these accesses.
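The first two mapping types can be sketched as a small translation helper. The
struct bus_map type and the phys_to_bus / bus_to_phys names are made up for
this example, and the offset value used below is arbitrary. The third type has
no such helper because it is reached with special instructions instead of a
memory address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum bus_mapping { MAP_IDENTITY, MAP_FIXED_OFFSET };

struct bus_map {
    enum bus_mapping kind;
    uint64_t offset;        /* only used for MAP_FIXED_OFFSET */
};

/* bus address = physical address (+ offset for the fixed offset case) */
static uint64_t phys_to_bus(const struct bus_map *m, uint64_t phys)
{
    assert(m != NULL);
    return m->kind == MAP_FIXED_OFFSET ? phys + m->offset : phys;
}

/* Inverse translation: physical address = bus address (- offset). */
static uint64_t bus_to_phys(const struct bus_map *m, uint64_t bus)
{
    assert(m != NULL);
    return m->kind == MAP_FIXED_OFFSET ? bus - m->offset : bus;
}
```

With an offset of 0x80000000, physical address 0x1000 becomes bus address
0x80001000. A driver that goes through such a helper instead of assuming
physical == bus keeps working on both kinds of platform.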
On some busses not only the CPU can be master, but other devices as well. In
this case memory also has bus addresses which are not necessarily the same
as the physical addresses generated by the CPU. There can be an identity
or fixed offset mapping between them, or there can be a software programmable
mapping between them. This is useful to allow 32 bit PCI cards to access memory
above the 4GB limit or to allow PCI cards to access large blocks of memory without
having to allocate a large contiguous block of memory. An example of such a
setup is the AGP GART.
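A software programmable mapping can be pictured as a page table on the bus
side. The sketch below is a toy model, not the layout of any real GART: the
page size, the table size and all names (struct remap_table,
remap_bus_to_phys) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REMAP_PAGE_SHIFT 12
#define REMAP_PAGE_MASK  ((UINT64_C(1) << REMAP_PAGE_SHIFT) - 1)
#define REMAP_ENTRIES    4

struct remap_table {
    uint64_t bus_base;                   /* page aligned start of the window */
    uint64_t phys_page[REMAP_ENTRIES];   /* physical page behind each slot */
};

/* Translate a bus address inside the window; returns 0 on success. */
static int remap_bus_to_phys(const struct remap_table *t, uint64_t bus,
                             uint64_t *phys)
{
    uint64_t rel, idx;

    assert(t != NULL && phys != NULL);
    if (bus < t->bus_base)
        return -1;
    rel = bus - t->bus_base;
    idx = rel >> REMAP_PAGE_SHIFT;
    if (idx >= REMAP_ENTRIES)
        return -1;                       /* outside the remapped window */
    *phys = t->phys_page[idx] + (rel & REMAP_PAGE_MASK);
    return 0;
}
```

The device sees one contiguous 16KB window of 32 bit bus addresses, while the
pages behind it can be scattered through physical memory, including above the
4GB limit.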
3. out of order memory accesses
Some architectures (most notably PowerPC) can reorder memory writes or reads.
This can be a problem when doing memory mapped I/O because the CPU doesn't realize
the order of the memory accesses is important. The solution is to introduce a
barrier instruction which tells the CPU it should not do any memory access
before all pending memory accesses have been completed. Note that this problem
is not limited to the CPU. Any device which sits between the CPU and the actual
I/O chip and has some form of caching or reordering may exhibit similar
problems.
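A sketch of where a barrier goes in a typical "fill in data, then ring the
doorbell" sequence. The struct fake_dev register layout and the io_barrier and
dev_submit names are hypothetical. The C11 atomic_thread_fence is used here
only as a portable stand-in: a real driver would use whatever its OS or
architecture provides for I/O ordering (on PowerPC, instructions such as eieio
or sync).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device: a data register and a doorbell register. */
struct fake_dev {
    volatile uint32_t data;
    volatile uint32_t doorbell;
};

/* Stand-in for an architecture specific I/O barrier instruction. */
static void io_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* The device must see the data before it sees the doorbell write. */
static void dev_submit(struct fake_dev *dev, uint32_t value)
{
    assert(dev != NULL);
    dev->data = value;
    io_barrier();          /* earlier accesses complete before later ones */
    dev->doorbell = 1;
}
```

volatile keeps the compiler from dropping or reordering the two stores against
each other, but it does nothing about reordering done by the CPU or by a
bridge in between. That is the part the barrier is for.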