Linux Binary Compatibility

From: Farid Hajji
Subject: Linux Binary Compatibility
Date: Sat, 28 Apr 2001 03:49:00 +0200

CAVEAT: Theoretic discussion ahead! Comments welcome.

Under Lites/Mach, BSD binaries can be executed directly through
a technique known as dynamic relinking. See Helander's Lites thesis
for details.

We _could_ use this Lites approach in the Hurd as well, to provide
binary compatibility with, say, Linux binaries. Marcus suggested
an even better/easier way: since Linux and Hurd binaries share the
same glibc _ABI_, one could simply relink Linux binaries against
the Hurd's glibc at load time. As long as the binaries themselves
make no Linux syscalls, they won't even notice that they are not
running under Linux. [Note that by 'syscall', I explicitly mean
calling the kernel, e.g. through a trap, _not_ simple library
calls!]

Of course, it's not so easy! Here are the common pitfalls to watch
out for (please expand this list):

* some binaries are not strictly POSIX compliant and expect
  Linux-specific features. Examples are binaries that use Linux's
  /proc or special drivers/ioctls.
* some glibc functions are not [yet] fully implemented as sysdeps
  in the Hurd's glibc. If I remember correctly, Marcus suggested
  that the main problem was linuxthreads, though I'm not sure...
* statically linked binaries contain syscalls/traps to the Linux
  kernel. They cannot benefit from switching glibcs at load
  time and they would require full emulation (meaning: another

Even if we disregard the above problems, the number of Linux binaries
that could run directly under the Hurd seems quite big.
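
Whether a given binary falls into the statically linked category
can be estimated from its ELF program headers. The following is a
minimal sketch, not part of any existing Hurd tool: the function
name has_interpreter is made up for illustration, and treating
"no PT_INTERP header" as "statically linked" is a heuristic.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_interpreter(image: bytes) -> bool:
    """Return True if the ELF image carries a PT_INTERP program header.

    Hypothetical helper: dynamically linked executables normally
    request a loader via PT_INTERP, statically linked ones do not.
    """
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is_64 = image[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if image[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    if is_64:
        # ELF64: e_phoff at offset 32, e_phentsize/e_phnum at 54/56
        (phoff,) = struct.unpack_from(endian + "Q", image, 32)
        phentsize, phnum = struct.unpack_from(endian + "HH", image, 54)
    else:
        # ELF32: e_phoff at offset 28, e_phentsize/e_phnum at 42/44
        (phoff,) = struct.unpack_from(endian + "I", image, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", image, 42)
    for index in range(phnum):
        # p_type is the first 32-bit field of every program header
        (p_type,) = struct.unpack_from(endian + "I", image,
                                       phoff + index * phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

A loader could call this on the first bytes of the file and fall
back to full emulation (or refuse) when it returns False.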

By swapping glibcs at load time, running Linux binaries under
the Hurd would probably not be [much?] slower than running native
Hurd binaries. Apart from the overhead at exec() time [which
unfortunately applies to _every_ binary, and exec() is already
sloooowww], no further delays would be introduced.

One minor (logistical) problem I see here (besides working out how
to swap glibcs in exec()) is that not only glibc needs to be
swapped, but probably other libraries as well. If a Linux binary uses
libraries l1, l2 and l3, then l1.so, l2.so and l3.so will have
to be linked as well (or replaced?). More importantly, l1 may clash
by name with the Hurd's l1, so all Linux libraries that will be
linked to the binaries have to be stashed in a safe place, distinct
from, say, /lib. The loader would have to know about this peculiarity.
Any ideas?
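
To make the question more concrete, here is one way the lookup
could work, as a rough sketch only. The function resolve_library,
its parameters and the directory /emulator/linux/lib are all
assumptions made for illustration; nothing like this exists in
the Hurd's loader as far as this discussion goes.

```python
import os
import posixpath


def resolve_library(name, emulated_dirs, native_dirs,
                    native_only=(), exists=os.path.isfile):
    """Pick the file to load for library `name` of a Linux binary.

    Hypothetical policy: libraries listed in `native_only` (glibc,
    for example) are always taken from the native directories, so
    that they get swapped. Everything else is looked up only in the
    separate Linux directories, which avoids name clashes with /lib.
    """
    directories = native_dirs if name in native_only else emulated_dirs
    for directory in directories:
        candidate = posixpath.join(directory, name)
        if exists(candidate):
            return candidate
    return None  # the loader would have to fail the exec() here
```

With l1.so present in both /lib and /emulator/linux/lib, the call
resolve_library("l1.so", ["/emulator/linux/lib"], ["/lib"]) would
pick the Linux copy, while a name listed in native_only would be
resolved from /lib.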

The mid-term goal is, of course, to realize something similar in
functionality to the Linuxulator of FreeBSD. This is a (kernel-based)
mechanism in FreeBSD that 1. links dynamic binaries against the
correct libraries at load time, and 2. redirects Linux syscalls to
a kernel module that intercepts those syscalls and either translates
them directly into native syscalls or adapts them accordingly.
Actually, most Linux binaries run even faster under the Linuxulator
than under Linux (I benchmarked quite a few and can confirm this
surprising result)!
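
The second part, redirecting syscalls through a translation table,
can be pictured with a toy model. This is only an illustration in
user space and says nothing about how FreeBSD implements it: the
names TRANSLATION_TABLE and emulate_syscall are invented, and only
three Linux/i386 syscall numbers are listed.

```python
import errno
import os

# Linux/i386 system call numbers (small illustrative subset).
LINUX_SYS_EXIT = 1
LINUX_SYS_WRITE = 4
LINUX_SYS_GETPID = 20


def _adapted_write(fd, data):
    """Adapt the native call to the Linux kernel convention of
    returning a negative errno value instead of raising an error."""
    try:
        return os.write(fd, data)
    except OSError as exc:
        return -exc.errno


# Hypothetical table: Linux syscall number -> native handler.
TRANSLATION_TABLE = {
    LINUX_SYS_GETPID: os.getpid,      # translated directly
    LINUX_SYS_WRITE: _adapted_write,  # adapted
}


def emulate_syscall(number, *args):
    """Dispatch an intercepted Linux syscall to its native handler."""
    handler = TRANSLATION_TABLE.get(number)
    if handler is None:
        return -errno.ENOSYS  # not implemented by this emulator
    return handler(*args)
```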

A similar emulator system for the Hurd could be considerably more
sophisticated. Here's one possible approach:

Let's assume that binaries are stored in a hierarchy like, say:

Now suppose that prog1 is started. exec() would get a hint from
the translator serving /emulator/linux to use Linux emulation for
this file. By the same token, exec()ing prog3 would use FreeBSD
emulation and so on... OTOH, binaries _not_ located under /emulator
could be exec()d the usual way (no hints from the underlying file_t
object, and therefore no additional overhead).
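
The decision exec() would take can be modelled as a simple lookup
on the path. This sketch only mimics the idea with string handling;
in the Hurd the hint would come from the translator, not from
parsing names. The function emulation_hint and the example paths
under /emulator are assumptions for illustration.

```python
import posixpath


def emulation_hint(path):
    """Return the emulation to use for `path`, or None for native.

    Hypothetical rule: anything below /emulator/<os>/ is run under
    the emulation named <os>; everything else is a native binary.
    """
    parts = posixpath.normpath(path).split("/")
    # "/emulator/linux/bin/prog1" -> ['', 'emulator', 'linux', 'bin', 'prog1']
    if len(parts) >= 4 and parts[1] == "emulator":
        return parts[2]
    return None
```

For example, emulation_hint("/emulator/linux/bin/prog1") yields
"linux", while emulation_hint("/bin/ls") yields None, so the native
path pays nothing extra.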

An alternative is to 'brand' each non-native binary with the
right OS type, somewhat like brandelf does for FreeBSD's Linuxulator,
or, where this is not necessary, rely on the information the linker
stored in the ELF header.
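
Reading such a brand back is cheap. The sketch below looks only at
the EI_OSABI byte of the ELF identification; the function name
elf_brand is made up, and the table lists just three well-known
values. Note that many binaries carry the generic value 0 there,
so this byte alone would often not be enough.

```python
EI_OSABI = 7  # index of the OS/ABI byte inside e_ident

# A few ELFOSABI_* values; anything else is reported as "unknown".
OSABI_NAMES = {
    0: "sysv",     # ELFOSABI_NONE / ELFOSABI_SYSV (generic)
    3: "linux",    # ELFOSABI_LINUX (also called ELFOSABI_GNU)
    9: "freebsd",  # ELFOSABI_FREEBSD
}


def elf_brand(image: bytes) -> str:
    """Return a name for the OS/ABI brand stored in an ELF header."""
    if len(image) < 16 or image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    return OSABI_NAMES.get(image[EI_OSABI], "unknown")
```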

While we're at it, a context-sensitive exec() as described above
could also be put to creative use later in SMP environments, e.g.
like this:

Let's assume the following pseudo-filesystem:
Starting a program on a specific CPU <n> would be a matter of
cd'ing to /cpus/<n> and exec()ing the program from there.
Alternatively, cd'ing to /cpus/round-robin and then
exec()ing a program would assign the new process to this
special scheduler. Actually, two glibc calls are not necessary
(there is no need to chdir()). What about this?
  exec ("/cpus/1/bin/myprog");
  exec ("/cpus/5/bin/myprog");
  exec ("/cpus/realtime/bin/myprog");
Here, /bin/myprog would be started three times, once on cpu #1,
once on cpu #5 and once under the realtime scheduler.

Of course, this generalizes to multiple hosts:
  exec ("/cluster/host5.mynet.net/cpus/2/bin/myprog");
  exec ("/cluster/host9.mynet.net/cpus/1/bin/myprog");
  exec ("/cluster/host11.mynet.net/bin/myprog");
Here, /bin/myprog (on "our" host, that is, the host that calls exec()),
gets exec()d on host5's cpu #2, host9's cpu #1 and host11 (using its
default scheduler) respectively.
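
How such a path carries its context can be shown by taking it
apart. This is purely a sketch of the naming scheme used in the
examples above; the function parse_exec_path is invented, and a
real implementation would live in translators rather than in
string parsing.

```python
def parse_exec_path(path):
    """Split a context-carrying path into (host, target, program).

    `host` is the name after /cluster/ (or None for the local host),
    `target` is the CPU number or scheduler name after /cpus/ (or
    None for the default scheduler), `program` is the remaining path.
    """
    parts = path.split("/")[1:]  # drop the empty string before the leading "/"
    host = target = None
    if len(parts) >= 2 and parts[0] == "cluster":
        host, parts = parts[1], parts[2:]
    if len(parts) >= 2 and parts[0] == "cpus":
        target, parts = parts[1], parts[2:]
    return host, target, "/" + "/".join(parts)
```

For instance, "/cluster/host5.mynet.net/cpus/2/bin/myprog" splits
into host "host5.mynet.net", target "2" and program "/bin/myprog".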

Okay, that's enough SF for now ;-). Could someone more knowledgeable
in glibc and exec() please point out what needs to be done in
order to swap Linux's glibc for the Hurd's glibc in binaries dynamically
linked against Linux's glibc? That would be a first step towards
binary compatibility.



Farid Hajji -- Unix Systems and Network Admin | Phone: +49-2131-67-555
Broicherdorfstr. 83, D-41564 Kaarst, Germany  | farid.hajji@ob.kamp.net
- - - - - - - - - - - - - - - - - - - - - - - + - - - - - - - - - - - -
One OS To Rule Them All And In The Darkness Bind Them... --Bill Gates.
