[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: NSRunLoop Tidying
From: |
Richard Frith-Macdonald |
Subject: |
Re: NSRunLoop Tidying |
Date: |
Sun, 10 Oct 2010 12:43:36 +0100 |
On 9 Oct 2010, at 12:19, David Chisnall wrote:
> On 8 Oct 2010, at 17:14, Richard Frith-Macdonald wrote:
>
>> Actually the current implementation uses an unsorted list as depending on
>> the ordering introduced a subtle bug ... a timer can be present in more than
>> one run loop, and/or in the same loop in more than one mode.
>
> From the NSTimer documentation (CFRunLoopTimerRef contains a similar line):
>
> 'A timer object can be registered in only one run loop at a time, although it
> can be added to multiple run loop modes within that run loop.'
>
> Therefore, the second case is the only one that should apply.
That's good ... perhaps it changed or I misread it. If a timer is restricted
to a single loop, we can avoid the need for locking (since a loop is restricted
to a single thread) , which makes keeping an ordered list much easier to do
efficiently.
>> If it fires in one loop and changes its next fire date (ie is a repeating
>> timer), that automatically breaks the ordering for the other loops/modes
>> that it exists in. Of course, what we really want is probably something
>> more complex to keep track of all the loops/modes the timer is in, and tell
>> them that they need to re-order their list of timers ... if we did that we
>> could depend on the order in the list.
>> The API demands that we keep track of the timers some way (so we can return
>> the correct value from the -limitDateForMode: method), but it doesn't have
>> to be an ordered list if a better alternative is available (though I can't
>> really think of one).
>
> I was thinking that it would be simpler to just keep an unsorted list and
> scan it when one of these methods is called. It's O(n) in terms of the
> number of timers, but hopefully these methods are not called very often (and
> n is relatively small - I doubt many runloops have hundreds of timers
> associated with them).
That's what the current code does ... and it's been fine for most applications
(because most only have a small number of timers). However, I recently had to
rewrite one application's timer handling for precisely this reason ... when
scaling to handle large volumes of traffic it was grinding to a halt because it
was searching a huge list of timers in each loop iteration ... the code used to
have good scalable performance when we used an ordered list but was no longer
workable for the unsorted list. So, while the current code works for most case,
there are certainly ones where it's performance is not good enough. Unless the
operating system supports efficient lookups to find the earliest timer in a
run loop mode (and I don't believe it does), we really need to maintain that
list ourselves. In fact in terms of scalability of the code, the current
unordered list seems to be the main bottleneck ... I get reasonable performance
with thousands of file descriptors, but not with thousands of timers. I'd like
to have even better performance with huge numbers of descriptors (and using the
epoll API ought to provide that, I guess kqueue probably would too), but the
current performance is good enough for the vast majority of applications.
>
>>> This is required with traditional select() and poll(), but Win32 has
>>> SetTimer, Linux has timerfd(), and *BSD has kevent(), all of which allow
>>> you to schedule timer events and wait on them just like fd events.
>>
>> I don't really see how that can help since the API explicitly separates
>> timer firing (done in -limitDateForMode:) and I/O event handling ... so
>> having timers and I/O events use the same mechanism in the operating system
>> does not mean that they can be done at the same place in the code or with a
>> single system call. So using things like SetTimer() and timerfd() may just
>> add complexity.
>
> No. Timers and timeouts are different in the high-level API.
> -limitDateForMode: is an exception. It is possible to implement the entire
> API on top of this, but that requires doing something in userspace that the
> kernel already has code for. It is significantly simpler to just schedule
> the timers and sleep.
I can't see how. The API does not allow timers to fire except as part of the
runloop ... so you are always waiting for I/O (except in the rare case where a
loop contains no input sources) at the same time as waiting for a timer to fire
... so using two codepaths (one to sleep waiting for a timer but no I/O events,
and one to poll/select for both I/O events and timers) is clearly more complex
than using a single one.... you have twice the code to maintain.
> The most common way of using a runloop is either -run (no timeout), or
> -runUntilDate: / -runMode:beforeDate:, which have an explicit timeout that is
> orthogonal to the timers.
> As these are currently implemented, we:
>
> 1) Look through the timers list to find the next timer.
> 2) Find whether the timer will fire before the timeout.
> 3) Set the minimum as the timeout.
> 4) Call down into the kernel.
> 5) If the timeout occurred, determine whether it was due to a timer firing or
> due to the user-specified timeout elapsing.
>
> In contrast, with kqueue timers, win32 timers, or timerfd, we need to:
>
> 1) Call down to the kernel with the user-specified timeout.
>
> This is vastly simpler.
I see what you mean, though I think you've overstated the case a bit since you
are omitting loads of stuff to do with managing the timers, and not comparing
like with like (eg 4/5 (system call and branch on result) in the first instance
are just counted as 1 operation in the second instance). But clearly there is
the possibility of a simpler (and faster) codepath there. Adding this sort of
thing will of course make the code a little more complex and less maintainable
overall (since you still need the existing code or something similar for POSIX
platforms), but it will get you something more efficient on some modern
platforms in most circumstances ... though when you have lots of timers firing,
an implementation with an ordered list of timers might be just as efficient
since it runs in user space apart from a single call to get the current time.
> This is especially true when you consider multiple run loop iterations. If
> timers change during the loop, then we need to recalculate the timer /
> timeout minimum with the first approach, while with the second we just need
> to modify a single timer in the kernel.
> On vaguely modern platforms, this means that entering a run loop is much
> simpler, meaning that we spend a lot more time sleeping in userspace waiting
> for the kernel to wake us up, which helps preserve battery life as it means
> more time the CPU can spend in low-power mode.
So what you are talking about here is sacrificing a little
maintainability/simplicity for a little performance ... probably a good
trade-off, but not obviously so since the majority of apps don't need that
performance boost (they spend almost all their time either waiting for an event
or handling it, with a tiny portion of time spent in the runloop code itsself).
I happen to write server code where a very high throughput is important ... so
I'm probably more sympathetic to the idea than most people.
>
>> I agree with "Dr. H. Nikolaus Schaller" <address@hidden> ... that it would
>> make most sense to start by implementing the CFRunLoop API, and then
>> re-implement NSRunLoop on top of it (it's quite hard to do the other way
>> round).
>
> CFRunLoop is quite a messy API and is actually not sufficiently expressive
> for NSRunLoop. For example, you can add Mach ports as sources to a
> CFRunLoop, but the API does not provide a mechanism for specifying whether
> you want read or write events (presumably it only generates data-available?),
> and it does not provide an API for adding file descriptors, rather than Mach
> ports, so you'd have to modify it slightly for all non-Darwin platforms.
Yes, you would need CFRunLoop+ to implement NSRunLoop, and you would need
NSRunLoop+ to implement CFRunLoop ... but it does look easier and more
efficient to do it by extending CFRunLoop.
>> Also, you need to put together *LOTS* to testcases in the testsuite to
>> ensure that the implementation behaves exactly like OSX does ... this is
>> because NSRunLoop is a really key part of so much of the system and very
>> subtle/minor changes in behavior can have really nasty effects on
>> applications. It's really not good enough to have an implementation which
>> does what the documentation says, we have to mimic actual OSX behavior
>> pretty closely.
>
> Agreed.
>
>> My guess is that your idea of re-implementing timer handling in a platform
>> specific way is actually a recipe for a less maintainable and more complex
>> codebase, since the timeout behavior is largely mandated by the API and I
>> can't currently see how platform specific APIs will help...
>
> Timers are not timeouts. They are completely orthogonal in the API, it's
> only an accident of implementation that we treat them as being the same.
>
>> but maybe you have some ideas you didn't mention here. In general it's a
>> good idea (for maintainability) to keep platform specific code to a minimum,
>> so for instance I wouldn't use PostThreadMessage() as it's probably no more
>> efficient than the current code using SetEvent(), and would demand changes
>> to NSThread to hold additional thread data and a message queue.
>
> Why would it need NSThread to be modified?
To send a message to a thread, you need to pass the windows thread-id as a
parameter to the call ... so of course you must know that ID, which means that
NSThread must store it so that it can get to it when you try to perform a
selector on a particular thread. That much is clearly needed. A quick look at
the documentation says that a message queue must be created for the thread too
... but that's probably fairly trivial (a call to some windows function when
the thread is created). I'm not saying there's any reason we can't use
PostThreadMessage(), just that it would take more work to code/change, and make
the codebase slightly less maintainable. I generally believe we should keep
things simple and avoid platform specific code unless there is good reason to
make things more complex (ie where we have to do so for exact OSX
compatibility, or we want to do so for substantial performance improvement).
>> On the other hand, the kqueue API is supposed to be significantly more
>> efficient than poll/select ... so using that on BSD and the similar epoll
>> API on Linux would be good.
>
> Using kqueue() to emulate poll() does not provide much of a performance
> advantage. The advantage comes from actually using the features available
> with the new APIs, not simply making the existing kludge use a tiny subset of
> the new APIs.
I don't know kqueue, but I thought it was supposed to be like epoll, and that
should offer significant performance advantages when a very high number of
events are to be monitored.
At the moment, each time we call select/poll we must build a data structure to
describe the events we are polling for ... which is quite expensive (and dwarfs
the overhead of handling timers separately from I/O events).
With epoll we could avoid that, since the data structure would be held in the
kernel and we would modify it only when adding/removing a descriptor to the set
of those we are interested in. So adding/removing a descriptor would be a bit
more expensive, but usually that's pretty rare so performance would improve.
The obvious case where this would be great is some sort of server process which
is listening for data from thousands of clients ... each time round the loop
it's probably only dealing with input from one client, so why should it need to
build new list of (say) two thousand descriptors to be polled?
>
>> So, what would be great is to:
>> 1. Implement the CFRunLoop
>> 2. Keep platform specific code to a minimum (though make use of it where it
>> really helps performance) for maintainability
>> 3. Have testcases to check that it behaves like OSX on all platforms
>> 4. Finally, re-implement NSRunLoop on top of CFRunLoop
>
> It is not possible to implement NSRunLoop on top of CFRunLoop using only the
> public APIs. It should, however, be possible to create a thin
> platform-specific layer which permits implementation of both NSRunLoop and
> CFRunLoop. On OS X 10.6, libdispatch is used to implement both.
Yes, well if you want to implement a libdispatch clone and build on top of that
I wouldn't complain ... sounds like fun.
When you rewrote thread handling, I think we got a really big benefit ... you
ware able to make the code simpler (more maintainable) and more efficient
because you could work to a single thread API for all platforms. Yes there
were teething problems and extra work involved, but not too much and the result
was a definite improvement.
When you rewrote NSNumber the result was much less successful because the new
code, while smaller, is not actually much/any easier to understand (except to
you perhaps), and there were enough glitches to iron out for OSX compatibility
that overall benefit is hard to judge.
I'd like any NSRunLoop rewrite to be more like the NSThread rewrite, but the
big problem is that there is no single API to use, so whatever you do needs to
maintain something like the existing code for posix and add lots more code for
different modern platforms. This inherently means more code and makes things
less maintainable, so you need to look for *real* improvements in performance
(eg scaling to many thousands of descriptors and/or timers) and/or
power/flexibility (eg adding CFRunLoop and perhaps libdispatch) without
breaking any of the existing API of course :-)