bug#41865: ns_select behavior in "recent" (2013+, 10.9/Mavericks+) mac-o

Hi,

This might be more of a discussion than a bug, or a documentation suggestion, or something like that.

While debugging a performance issue, I discovered that `ns_select` in nsterm.m uses `scheduledTimerWithTimeInterval` to rewake the run loop when it is called with "only" a timeout. This happens frequently when the only pending work is a timer scheduled with `run-with-timer`. For example, start `emacs -Q` and evaluate:

(run-with-timer 1 1 (lambda ()
                      (print (format "%s" (format-time-string "%s"))
                             'external-debugging-output)))

Then tab away so that Emacs loses focus (i.e. `focus-out` occurs, e.g. when `applicationDidResignActive` fires). After a short while (how long depends on whether you are on battery power), the Unix epoch times printed are delayed by ~10 seconds, and then (of course) the code in `timer.el` repeats the timer to catch up.
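To put numbers on the delay, a small variant of the snippet above can log the gap between consecutive runs. This is only a sketch; `my/last-tick` and `my/log-tick-gap` are names made up for illustration.

```elisp
;; Hypothetical helper names; only `run-with-timer', `float-time' and
;; `external-debugging-output' are real Emacs functions.
(defvar my/last-tick nil
  "Time of the previous timer run as a float, or nil before the first run.")

(defun my/log-tick-gap ()
  "Print the number of seconds elapsed since the previous run."
  (let ((now (float-time)))
    (when my/last-tick
      (princ (format "gap: %.2fs\n" (- now my/last-tick))
             'external-debugging-output))
    (setq my/last-tick now)))

;; Same schedule as the reproduction above: start after 1s, repeat every 1s.
(run-with-timer 1 1 #'my/log-tick-gap)
```

With Emacs focused, the gaps should stay close to one second. After tabbing away, I would expect one large gap followed by several very small ones while the timer catches up.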

This is because in 2013, OS X Mavericks added "App Nap", a feature designed to conserve battery life and reduce power consumption. One thing it does is lower the timer frequency of background applications.

More context:

1. The Elisp manual's "Timers" node already says clearly that a timer may run multiple times in a row after being delayed if "repeat" is given.

2. `timer-max-repeats` exists as a defcustom, with a default of 10 (!).

3. `suspend-frame` on a TTY does something similar: no timers run while Emacs is suspended, of course, but when you unsuspend, all the timers "catch up".

4. I grepped built-in elisp. There are actually only very few cases where people use repeat for `run-with-timer`, `autorevert` being a major one.

5. I grepped major Emacs packages (company, lsp, doom, spacemacs, etc.) and there is almost no usage of `run-with-timer` with repeat.

A few options occur to me:

1. Add more words to the documentation in the timers node to clarify that things like suspending, OS scheduling decisions, and GC can affect when a rescheduled timer runs again. Based on my brief survey, almost everyone really wants "run again, once". Make it even clearer that this is probably what you really want.

2. Support the usual use case better by making `timer-max-repeats` bindable around `run-with-timer`, so that one could control how many times in a row a late timer is rerun. To be honest, the developer of a timer is more likely than a user to know how often it should run.

3. Relatedly, consider changing the default value of `timer-max-repeats` to 1. To me, there actually seems to be no point in having this value be configurable. Timer code must already be robust to being rate limited. In what use case is 10 better than 1? Changing defaults is always impactful, though... so maybe not.

4. And/or, extend `run-with-timer` with a new API argument, or have a new function `run-with-timer-exactly-once`, or something, to better cater for the more common use case.

5. In any case, update `auto-revert` to run `auto-revert-buffers` exactly once. There's no point running the revert timer handler again and again in very quick succession (up to the default of 10 times) after being suspended or delayed. Especially in the case of remote connections, this might be expensive. Also, various packages use those hooks to do further heavyweight actions (e.g. magit, lsp).

6. Extend the ns* layer code to understand the timer list, so that timers can be scheduled at the OS level. That would allow one to suggest to the OS that certain tasks are user-initiated and others are background tasks - e.g. "refresh this buffer tailing" versus "autosave/gc on idle". macOS has this level of granularity. This would be a giant undertaking.

My suggestion would be 1+2+5.
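For reference, this is roughly what is possible today without any of the changes above. It is a sketch based on my understanding that `timer-max-repeats` is consulted when a timer fires rather than when it is created, which is why a `let` around `run-with-timer` would not help.

```elisp
;; Global, user-level setting: applies to every repeating timer in the
;; session, not just one package's timer.
(setq timer-max-repeats 1)

;; Assumption: the binding below has already been undone by the time the
;; timer fires, so it would not limit catch-up runs for this timer alone.
(let ((timer-max-repeats 1))
  (run-with-timer 60 60 #'ignore))
```

Option 2 is about making something like the second form actually work per timer.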

(4) seems a little adventurous, and perhaps overkill. However, guiding developers to do the right thing easily is the key goal of API design, so maybe it's worth the extra API surface. I would hypothesize that power management and control of scheduled tasks is likely to spread to other OSes - I haven't checked, but I wouldn't be surprised if it exists on Windows already - so there might be value there for the longer-term future.
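To make (4) concrete, here is a minimal sketch of the "run again, once" behavior built on one-shot timers that re-arm themselves. The function name is hypothetical and this is not a proposed implementation; it assumes lexical binding is in effect.

```elisp
;; -*- lexical-binding: t -*-
;; Hypothetical helper, not an existing Emacs API.
(defun my/run-with-timer-no-catch-up (secs function &rest args)
  "Call FUNCTION with ARGS roughly every SECS seconds, without catch-up runs.
Each run schedules the next one, so a long delay causes a single late
call instead of a burst.  Return a function that stops the loop."
  (let (timer stopped)
    (letrec ((tick (lambda ()
                     (unwind-protect
                         (apply function args)
                       ;; Re-arm only after FUNCTION has finished (or failed).
                       (unless stopped
                         (setq timer (run-with-timer secs nil tick)))))))
      (setq timer (run-with-timer secs nil tick)))
    (lambda ()
      (setq stopped t)
      (when timer (cancel-timer timer)))))

;; Usage: log the time once per second; call the returned function to stop.
(defvar my/stop-ticker
  (my/run-with-timer-no-catch-up
   1 (lambda () (message "tick %s" (format-time-string "%s")))))
;; (funcall my/stop-ticker)
```

The tradeoff is that the interval is measured from the end of one run to the start of the next, so the schedule drifts rather than staying aligned to the original start time. For the use cases I surveyed, that seems acceptable.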

(6) is scary work: I looked at the number of reverted attempts to improve `ns_select` in `nsterm.m` since 2013. It would also require deeply understanding and changing how the various process.c / keyboard.c / read / event handling code works; most of that is platform independent and pretty gnarly, so it would be tough going. Perhaps in some long-term future it might be interesting to attempt, since deeper integration of OS scheduling features into Emacs would probably improve latency and performance. I mention it here because in a pure sense this would be the "best" fix, but I think the tradeoffs are not worth it unless there is support for this across Linux, Windows and macOS.

If there is interest for 1 + 2 + 5, I would be willing to submit a patch along those lines in parts for each.

Thanks,

Lin

From:	Lin Xu
Subject:	bug#41865: ns_select behavior in "recent" (2013+, 10.9/Mavericks+) mac-osx
Date:	Sun, 14 Jun 2020 16:23:55 -0700