bug-hurd
Hurd on a cluster computer


From: Brent W. Baccala
Subject: Hurd on a cluster computer
Date: Tue, 26 Jul 2016 13:42:16 -1000

Hi -

I've been wondering about using Hurd on a cluster computer, i.e., a configuration where each node has multiple identical cores and its own memory.  For example, an eight-node cluster where each node has 8 GB of RAM and eight cores.  I stress that the cores are identical, so processes can run on any core and even migrate between them.

I'd like to see the whole thing running an integrated POSIX operating system.  So, when I run 'top', I see 64 processors and 64 GB of RAM.

Code tuned for this architecture might behave like this:  A program forks eight processes, and each process spawns eight threads.  Our basic programming paradigm is that threads share memory and run on the same node, while processes do not share memory and likely run on different nodes.  We can set up shared memory between processes (System V IPC), but we know that this is expensive because it has to be emulated, so we try to avoid it.  Process migration is expensive, too, so we try to avoid it as well.  In the example, we migrate seven of our eight processes right at the beginning, during program initialization, take the performance hit once, and then leave them to run on their separate nodes.
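
As a rough POSIX sketch (nothing Hurd-specific here; the eights are just the example numbers above, the parent counts as one of the eight processes, and I'm assuming placement on nodes is handled elsewhere), the paradigm looks something like this:

    /* Eight processes, each running eight threads.  Threads share
     * memory within a process; the processes do not.  Error
     * handling omitted. */
    #include <pthread.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NODES 8   /* processes, ideally one per node */
    #define CORES 8   /* threads per process, one per core */

    static void *worker(void *arg)
    {
        /* per-core work; this thread shares memory only with its
         * siblings in the same process */
        (void) arg;
        return NULL;
    }

    int main(void)
    {
        for (int n = 1; n < NODES; n++)
            if (fork() == 0)
                break;              /* children don't fork further */

        pthread_t t[CORES];
        for (int c = 0; c < CORES; c++)
            pthread_create(&t[c], NULL, worker, NULL);
        for (int c = 0; c < CORES; c++)
            pthread_join(t[c], NULL);

        while (wait(NULL) > 0)      /* only the parent has children */
            ;
        return 0;
    }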

All we really have between the nodes is a fast LAN, so we can do message passing.  Yet that's exactly what Mach/Hurd is designed for - a kernel built around message passing.

Can Hurd work, well, in such an environment?

Since I haven't dug into a single line of Hurd, I don't know - that's why I'm asking.  I've done some homework, though, and there are some things that I am aware of.

First, it's basically Mach that would have to be modified, right?  Changes to Hurd servers might be required for performance reasons, but so long as Mach works on the cluster, Hurd should work.

Next, Mach/Hurd's memory limitations and 32-bit pointers.  My first thought was to ignore them for right now, since these are well known problems.  If we could get Hurd running at all on a cluster computer, then we'd have to come back and make sure it can actually use the entire 8 GB of RAM on a single node.  Yet I'm not sure.  There might be situations where we have to address the entire cluster's RAM, even though accessing a non-local part of it will be slow.

Sending large blocks of data in Mach messages becomes problematic, since we can't play shared memory games.  It would have to be emulated, and avoided whenever possible.  These are the kinds of changes that would be needed to the Hurd servers themselves - they can no longer assume that firing virtual memory across a port is fast.
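
For concreteness, here's roughly what handing a big buffer to another task out-of-line looks like with GNU Mach's typed IPC - a sketch from memory, so the exact header bits and type fields should be checked against mach/message.h, and 'dest' is assumed to be a valid send right:

    #include <string.h>
    #include <mach.h>
    #include <mach/message.h>

    struct ool_msg {
        mach_msg_header_t    head;
        mach_msg_type_long_t type;  /* long form, for big counts */
        vm_offset_t          data;  /* a pointer, not the bytes */
    };

    kern_return_t send_big_buffer(mach_port_t dest, vm_offset_t buf,
                                  vm_size_t len)
    {
        struct ool_msg msg;

        memset(&msg, 0, sizeof msg);
        msg.head.msgh_bits = MACH_MSGH_BITS_COMPLEX
            | MACH_MSGH_BITS(MACH_MSG_TYPE_COPY_SEND, 0);
        msg.head.msgh_size = sizeof msg;
        msg.head.msgh_remote_port = dest;
        msg.head.msgh_local_port = MACH_PORT_NULL;
        msg.head.msgh_id = 4000;                     /* arbitrary */

        msg.type.msgtl_header.msgt_inline = FALSE;   /* out-of-line */
        msg.type.msgtl_header.msgt_longform = TRUE;
        msg.type.msgtl_header.msgt_deallocate = FALSE;
        msg.type.msgtl_name = MACH_MSG_TYPE_BYTE;
        msg.type.msgtl_size = 8;
        msg.type.msgtl_number = len;

        msg.data = buf;   /* on one node, the kernel can map these
                           * pages copy-on-write into the receiver;
                           * across a LAN, every byte must be copied */

        return mach_msg(&msg.head, MACH_SEND_MSG, sizeof msg, 0,
                        MACH_PORT_NULL, MACH_MSG_TIMEOUT_NONE,
                        MACH_PORT_NULL);
    }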

In-order and guaranteed delivery: for the moment, let's assume that our LAN can do this natively.  Since we're not going through routers, only a single Ethernet switch (maybe virtualized), this might work.

Can a Hurd network driver be built to pass kernel messages, or is this a huge problem?  Something like: you load an Ethernet driver, and it exposes some kind of interface that allows Mach messages to be passed through it?
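
To make the question concrete, I'm picturing an interface roughly like this - purely hypothetical names, nothing that exists today:

    #include <mach/message.h>

    typedef unsigned int node_id_t;     /* hypothetical */

    /* Hand an outbound kernel message to the Ethernet driver,
     * addressed to the Mach instance running on another node. */
    kern_return_t cluster_msg_send(node_id_t dest_node,
                                   mach_msg_header_t *msg,
                                   mach_msg_size_t size);

    /* Callback the driver invokes when a kernel message arrives
     * from a remote node, before normal port delivery. */
    typedef void (*cluster_msg_receive_fn)(node_id_t src_node,
                                           mach_msg_header_t *msg,
                                           mach_msg_size_t size);

    kern_return_t cluster_msg_register(cluster_msg_receive_fn handler);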

Protected data, like port rights - let's assume that we use a dedicated Ethertype that isn't routed and can't be addressed by anything but trusted Mach kernels.  Yes, this means that our Ethernet driver now becomes a potential security hole that can be used to steal port rights, but let's keep noting and then ignoring stuff like that...
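
Something like this for the framing - 0x88B5 is one of the IEEE local-experimental Ethertypes, used here only as a placeholder, and the extra fields are hypothetical:

    #include <stdint.h>

    #define ETHERTYPE_MACH_CLUSTER 0x88B5   /* placeholder value */

    struct cluster_frame {
        uint8_t  dst_mac[6];    /* destination node's kernel */
        uint8_t  src_mac[6];
        uint16_t ethertype;     /* ETHERTYPE_MACH_CLUSTER, network order */
        uint16_t msg_len;       /* hypothetical: Mach message length */
        uint32_t seqno;         /* hypothetical: for in-order delivery */
        /* Mach message bytes follow */
    } __attribute__((packed));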

And, oh yes, a "Mach kernel" is now something that runs across multiple processors with no shared memory.  This is the biggest problem that I can see - Mach is multithreaded, so threading itself isn't the issue, but I'll bet it assumes shared memory structures between those threads, and that assumption is pervasive throughout its code.  Am I right?
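
By shared memory structures I mean the usual pattern below (generic C, not actual Mach code): a global structure that any processor can simply lock and touch, which on a cluster would have to become a message to whichever node owns it.

    #include <pthread.h>

    struct run_queue {              /* stand-in for any global
                                     * kernel structure */
        pthread_mutex_t lock;
        int             length;
        /* ... runnable threads ... */
    };

    static struct run_queue global_rq = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
    };

    /* Today: any processor locks the structure and reads it directly.
     * On a cluster, a remote node can't do this at all - the same
     * query has to turn into a round trip to the queue's owner. */
    int rq_length(void)
    {
        pthread_mutex_lock(&global_rq.lock);
        int n = global_rq.length;
        pthread_mutex_unlock(&global_rq.lock);
        return n;
    }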

If so, then the first step would be to modify Mach, probably throughout its code, so that it can handle threads with no shared memory between them, only a communication interface provided by a network driver.  That gets it running on a cluster; then we need to remove the memory limitations and start tuning things to make it run well.

The payoff is a supercomputer operating system that presents an entire cluster as a single POSIX system with hundreds of processors, terabytes of RAM, and petabytes of disk space.

Any thoughts?

Any prior work in this direction?

Thank you!

    agape
    brent



