Hi Everyone,
I initiated a discussion back in 2015 [1] about the fragility of Emacs
with respect to filesystem operations on a stale NFS mount. No solution
actually came out of that discussion, and I still find this issue very
disruptive. Yet another example is `recentf-cleanup', which in my case
is triggered on Emacs startup: when a file in the list lives on a stale
NFS mount, the corresponding `file-readable-p' down the stack hangs
indefinitely, and there is no way to unfreeze it apart from issuing
'kill -9' to that Emacs instance. Don't you find this unacceptable for
daily usage? Well, I do. Such hangs always disrupt daily work and take
quite some time to track down, as they cannot be debugged from Lisp in
a straightforward way with e.g. <C-g> (these are dead hangs in C code,
where even attaching GDB does not work).
Well, enough rant. I think I have a proposal for how to fix the issue,
even given the blocking nature of Emacs. How about introducing a
variable `file-access-timeout', defaulting to `nil', which would set a
configurable timeout for all file access operations (such as
`file-readable-p')? This would be achieved via `SIGALRM' in the C
code, which would guard every such operation. For example,
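  #include <errno.h>
  #include <signal.h>
  #include <string.h>
  #include <sys/stat.h>
  #include <unistd.h>

  /* Empty handler: its only purpose is to interrupt the blocked call. */
  static void alarm_handler(int sig)
  {
    (void) sig;
  }

  /* Like stat(), but gives up after `seconds' seconds (0 = no timeout).
     On timeout, returns -1 with errno set to EINTR. */
  int emacs_stat(const char* path, struct stat* s, unsigned int seconds)
  {
    struct sigaction newact;
    struct sigaction oldact;

    memset(&newact, 0, sizeof(newact));
    memset(&oldact, 0, sizeof(oldact));
    sigemptyset(&newact.sa_mask);
    newact.sa_flags = 0;  /* no SA_RESTART, so stat() is not restarted */
    newact.sa_handler = alarm_handler;
    sigaction(SIGALRM, &newact, &oldact);

    alarm(seconds);
    errno = 0;
    const int rc = stat(path, s);
    const int saved_errno = errno;

    /* Cancel the pending alarm and restore the previous handler. */
    alarm(0);
    sigaction(SIGALRM, &oldact, NULL);
    errno = saved_errno;
    return rc;
  }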
where `seconds' would be initialized with the value of
`file-access-timeout'. The advantage I see is that one can also
selectively `let'-bind different values of `file-access-timeout', and
thus have full control over the cases in which one wants to be
protected from indefinite hangs.
Kind regards,
Alexander
[1] https://lists.gnu.org/archive/html/help-gnu-emacs/2015-11/msg00251.html