Re: 64-bit virtual adresses and registers
From: Espen Skoglund
Subject: Re: 64-bit virtual adresses and registers
Date: Tue, 7 Aug 2007 16:17:42 +0200
[Jonathan S Shapiro]
> On Mon, 2007-08-06 at 16:52 -0300, Fortes Marcelo wrote:
>> In time (pardon my ignorance), is EROS/Coyotos IPC message
>> passing synchronous like L4 or asynchronous like Mach?
> Synchronous. We did discuss asynchronous IPC for a while, but the
> idea was much too complicated. It was eventually dropped.
I assume that you might still support some sort of asynchronous
notification mechanism? This is very useful when one needs an
efficient and reliable signal delivery mechanism. Some time ago I
implemented something similar to the async notification mechanism
described in the NICTA Nx APIs. The following numbers (in cycles)
were measured on a quad-core Intel Xeon E5310 @ 1.60GHz:
                                            SMP-kern  SMP-kern  SMP-kern
                        Inter-AS  Intra-AS  Inter-AS  Inter-AS      XCPU
Send async notify:         99.13     97.15    113.67    113.66    123.13
Poll async notify:          1.55      1.55      1.55      1.55      1.55
Pingpong async notify:    543.07    233.57    618.21    315.42   2837.07
Pingpong (single MR):     523.26    207.33    453.07    208.28   8184.09
As one can see, polling is extremely cheap since it can be done
completely in user-mode. Sending a notification enters the kernel,
updates a bitmask in the destination, and checks if the destination
needs to be woken up (in this case no wakeup). Async pingpong is two
threads using the async notification mechanism to wake each other up,
and single MR pingpong is the standard pingpong measurement for a
single message register.
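The send and poll paths described above can be sketched roughly as
follows. This is only an illustration using C11 atomics, not the
actual kernel code: the notify_state_t layout and the names
notify_send and notify_poll are hypothetical, and the plain load in
the poll fast path is an assumption about why polling can stay so
cheap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread notification state (not the real layout). */
typedef struct {
    _Atomic uint64_t pending;  /* one bit per notification source */
    _Atomic bool     waiting;  /* true while the receiver is blocked */
} notify_state_t;

/*
 * Send side, which would run inside the kernel: set the bit in the
 * destination's bitmask and report whether the destination needs a
 * wakeup.  On SMP the fetch-or is a bus-locked operation.
 */
static bool notify_send(notify_state_t *dst, unsigned bit)
{
    assert(bit < 64);
    atomic_fetch_or(&dst->pending, UINT64_C(1) << bit);
    return atomic_load(&dst->waiting);
}

/*
 * Poll side, which can run entirely in user mode: a plain load in
 * the common "nothing pending" case, and an atomic exchange only
 * when there are bits to collect and clear.
 */
static uint64_t notify_poll(notify_state_t *self)
{
    if (atomic_load_explicit(&self->pending, memory_order_relaxed) == 0)
        return 0;
    return atomic_exchange(&self->pending, 0);
}
```

A receiver would call notify_poll() in its loop and block (setting
waiting) only when it returns 0; a sender that gets true back from
notify_send() would then have to schedule the destination.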
Some observations:
- Sending a notification (i.e., with kernel entry and checking for
  whom to schedule) takes less than 100 cycles (some more on SMP
  because we have to use bus-locking instructions and do some more
  tests).
- Sending a notification has pretty much the same cost whether it be
  to a local or remote CPU. The small remaining difference arises
  because the remote thread is constantly polling for notifications,
  so the cache line keeps transitioning from modified to shared.
- Async pingpong (single CPU) is typically a little more costly than
  sending a 0-byte synchronous IPC. This is to be expected because
  of the extra work involved.
- Async pingpong between CPUs is about 1/3 the cost of synchronous
  IPC. This is due to not having to wait for the other CPU to sync
  up so that one can do a rendezvous between the threads.
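The ratios quoted in the observations can be checked directly against
the table. The helper below is just a convenience for that
arithmetic; the name cost_ratio is made up for this example, and the
constants are copied from the pingpong rows above.

```c
#include <assert.h>

/* Ratio of async-notify pingpong cost to synchronous IPC pingpong cost. */
static double cost_ratio(double async_cycles, double sync_cycles)
{
    assert(sync_cycles > 0.0);
    return async_cycles / sync_cycles;
}

/* Pingpong figures from the table (cycles). */
static const double ASYNC_INTER_AS = 543.07,  SYNC_INTER_AS = 523.26;
static const double ASYNC_INTRA_AS = 233.57,  SYNC_INTRA_AS = 207.33;
static const double ASYNC_XCPU     = 2837.07, SYNC_XCPU     = 8184.09;
```

On a single CPU the ratio comes out slightly above 1 (roughly 1.04
inter-AS and 1.13 intra-AS), while across CPUs it is roughly 0.35,
which is the "about 1/3" mentioned above.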
I've also done some measurements on an AMD box. These are quite a bit
faster (especially for inter-AS due to the TLB flush filter), but I
don't have the numbers at hand right now.
eSk