[PATCH] Re: Hurd shutdown problems

Aloha -

OK, I seem to have gotten a handle on this thing now.

First, there's a missing mutex unlock in mach_defpager. I'm attaching two patches. The first fixes the debug printfs in mach_defpager/default_pager.c, which obviously haven't been compiled for a while: it uses %p and %lx instead of %x to silence compiler warnings, and accesses pthread_mutex_t's internal structure member __held instead of held when printing mutex state. The second patch actually fixes the problem.
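To illustrate the two kinds of fix (this is not the code from the patches), here is a minimal sketch. It assumes a made-up struct demo_pager and made-up function names; nothing here is taken from default_pager.c. It shows an early-return path that must release the lock, and debug formatting with %p and %lx.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical pager-like object; NOT the real default_pager.c structure. */
struct demo_pager {
    pthread_mutex_t lock;
    unsigned long   offset;
};

/* Returns 0 on success, -1 on error. Every return path releases the lock. */
int demo_pager_seek(struct demo_pager *p, unsigned long offset,
                    unsigned long limit)
{
    pthread_mutex_lock(&p->lock);
    if (offset > limit) {
        /* Omitting this unlock on the early return leaves the mutex
           held forever, so the next caller blocks silently. */
        pthread_mutex_unlock(&p->lock);
        return -1;
    }
    p->offset = offset;
    pthread_mutex_unlock(&p->lock);
    return 0;
}

/* Debug formatting: %p for pointers (cast to void *), %lx for unsigned long.
   Using %x for either one draws compiler warnings on 64-bit targets. */
int demo_pager_describe(char *buf, size_t len, const struct demo_pager *p)
{
    return snprintf(buf, len, "pager %p offset 0x%lx",
                    (const void *)p, p->offset);
}
```

After the error return, a pthread_mutex_trylock() on the same mutex should succeed; if it reports EBUSY, some path forgot to unlock.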

Second, the sysvinit scripts are killing mach_defpager during the shutdown sequence, and this wreaks havoc. The big culprit is /sbin/killall5, a C program in the sysvinit-utils package. Its readproc() function operates by reading each process's stat file and parsing its startcode and endcode values (Linux man page proc(5) - the address range of the program text), and flagging the PID as a 'kernel' process, not to be killed, if these values are both zero. Obviously, this doesn't work on the Hurd.
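To make the heuristic concrete, here is a rough sketch of that kind of check. It is not killall5's actual code; the function name is made up, and the field positions (startcode is field 26, endcode is field 27) are taken from the Linux proc(5) layout.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from killall5.
 * Returns 1 if a /proc/<pid>/stat line has startcode == endcode == 0,
 * 0 if either is nonzero, and -1 if the line cannot be parsed.
 */
int looks_like_kernel_process(const char *stat_line)
{
    unsigned long startcode = 0, endcode = 0;

    /* Field 2 (comm) is parenthesized and may itself contain spaces or
       ')', so resume parsing after the LAST ')' on the line. */
    const char *p = strrchr(stat_line, ')');
    if (p == NULL)
        return -1;
    p++;

    /* Skip fields 3 (state) through 25 (rsslim): 23 tokens. */
    for (int field = 3; field < 26; field++) {
        p += strspn(p, " ");
        size_t len = strcspn(p, " ");
        if (len == 0)
            return -1;          /* line ended too early */
        p += len;
    }

    /* Fields 26 and 27 are startcode and endcode. */
    if (sscanf(p, "%lu %lu", &startcode, &endcode) != 2)
        return -1;

    return startcode == 0 && endcode == 0;
}
```

The test only means something if the stat file reports real text-segment addresses for ordinary processes, which is a Linux assumption.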

I've tinkered with several band-aids - strcmp on the program name, not killing PIDs below 100 - but obviously none of this is suitable to submit as a patch. killall5's internal logic is just too Linux-specific, IMHO. What's the Hurdish way to do it? I'm thinking killall5 should check that 'important' flag on the process and skip processes for which that flag is set. Yet, I don't understand what that flag is really intended for. Does this make sense?

I think this means changing killall5 so it accesses the Hurd process server directly, instead of walking /proc. Incidentally, the program currently works by mounting /proc if it isn't mounted already - odd behavior for a program that's supposed to be shutting things down, not starting them up! I might have problems getting such a Hurd-specific patch into the upstream code base; who knows?

Also, what should the kernel do if it has problems with the default pager? After I fixed the mutex bug, I started getting a bunch of memory_object_data_request failed messages on console. Still mysterious, but I guess that's better than nothing! The error code prints in hex, and when I looked it up it was MACH_SEND_INVALID_DEST. Is that what you get when you send to a dead port?

Yet when the mutex locked up, the result was a silent, locked system. A timeout of some kind, accompanied by complaints on console, would be better, I think, but I don't understand the vm code enough to attempt such a change right now.

Also, there's this proxy-defpager. Is that the actual default pager, acting as a front end to mach-defpager? Yet killall5 seems to be able to kill proxy-defpager without consequence. I don't understand.

For me, though, I now have a qemu VM that can cleanly start up, use swap, and shut down, so I have a real sense of accomplishment!

agape

brent

From:	Brent W. Baccala
Subject:	[PATCH] Re: Hurd shutdown problems
Date:	Wed, 17 Aug 2016 10:58:41 -1000