Re: Improving object mobility within the Hurd


From: Carl Fredrik Hammar
Subject: Re: Improving object mobility within the Hurd
Date: Thu, 22 Jan 2009 10:54:53 +0100
User-agent: Mutt/1.5.18 (2008-05-17)

Hi,

On Fri, Jan 16, 2009 at 01:11:09PM +0100, olafBuddenhagen@gmx.net wrote:
> BTW, it's rather sad that we already run into terminology issues
> again... We need to sort this out, to make a clear distinction between
> abstract objects and RPC objects.
> 
> (Judging by the terminology used in libstore, it seems that my use of
> "object" as a more abstract entity is closer to traditional use in Hurd:
> The store classes are clearly classes of abstract objects, not of RPC
> objects...)

Let's sort this terminology business out first then.

I actually think we agree on what an object is: a bundle of state and
code with a specific interface, i.e. what you call abstract objects.
The interface can be RPCs, function calls, direct state manipulation,
or some other way of using the object.

A /remote/ object is an object that can be called remotely.  A /local/
object is one that can be called locally.  These are the terms used by
Java's RMI framework.

The biggest disadvantage is that they overload the terms local and remote,
as they can also be used for location.  However, the two senses usually
coincide, and if not it can be clarified, e.g. `a remote object that is
local to the process' or `a local object that is remotely used'.

They are the best /pair/ of terms I've found so far.  `RPC object'
is more specific than remote, but I haven't been able to find a good
substitute for /local/, the best I have mustered is /C/ object.

A /Hurd object/ is a remote object implementing one of the Hurd's
interfaces.  This is the way it's used in the critique.  It is somewhat
confusing as it could be taken as /any/ object in the Hurd, e.g. including
stores.  I will try to avoid it in favor of remote object.

A mobile object is one that can be copied from one process to another,
code and all.  Note that both local and remote objects can be mobile
or not.

Some might object (he he) to the fact that mobile objects are copied and
not moved.  However, movement is typically trivial to implement on top
of a copy: just remove the original object and, in the case of RPC
objects, transfer the port's receive right.

An /object system/ is a framework for implementing objects and controls
how they may be formed.  libstore is a trivial object system where all
objects have the same single interface.  Mach's IPC and MiG form the
object system for remote objects, which allows objects with several
interfaces.

A /mob/ is an object specifically implemented through my future object
system.  Unless otherwise mentioned, a mob is assumed to be mobile as
it is the framework's primary purpose.

/Transparent/ in this context means that either a local or remote object
can be used with the same interface (using a wrapper).  This is to make
it possible to fall back on using the object remotely if the object
can't be transferred.
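
As a rough illustration of the wrapper idea, here is a minimal sketch in
Python (chosen only for brevity).  Every name in it (LocalStore,
RemoteStore, TransparentStore, TransferRefused, transfer) is hypothetical
and not part of libstore or any Hurd interface.  A method call on
RemoteStore merely stands in for an RPC.

```python
import copy


class TransferRefused(Exception):
    """Raised when the (hypothetical) server will not hand the object over."""


class LocalStore:
    """A local object: state plus code, called directly in-process."""

    def __init__(self, data):
        self._data = bytes(data)

    def read(self, offset, length):
        return self._data[offset:offset + length]


class RemoteStore:
    """Stand-in for a remote object; every read() models one RPC."""

    def __init__(self, served, mobile):
        self._served = served
        self._mobile = mobile
        self.rpc_count = 0

    def read(self, offset, length):
        self.rpc_count += 1
        return self._served.read(offset, length)

    def transfer(self):
        # Models copying the object, code and all, into the caller.
        if not self._mobile:
            raise TransferRefused("object is not mobile")
        return copy.deepcopy(self._served)


class TransparentStore:
    """Same interface whether the object was transferred or stays remote."""

    def __init__(self, remote):
        try:
            self._impl = remote.transfer()
            self.is_local = True
        except TransferRefused:
            # Fall back on using the object remotely.
            self._impl = remote
            self.is_local = False

    def read(self, offset, length):
        return self._impl.read(offset, length)
```

The caller only ever sees read(); whether a call becomes an RPC is
decided once, when the wrapper is constructed.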

> On Sat, Jan 10, 2009 at 06:56:15PM +0100, Carl Fredrik Hammar wrote:
> > On Sat, Jan 10, 2009 at 08:50:28AM +0100, olafBuddenhagen@gmx.net
> > wrote:
> 
> > > > I've split the project into three main parts: improving authority
> > > > verification, improving code transfer and an object system that
> > > > can emulate Hurd objects.  This because while they reinforce
> > > > each-other in respect to mobility, they are orthogonal and might
> > > > be useful in other areas.
> > > 
> > > I must admit that I don't see it (yet)... Can you please explain how
> > > you imagine these to be useful in different contexts?
> > 
> > The first two parts could be used to improve mobility in libstore
> > directly, without porting stores to the new object system.  It could
> > also be useful for alternative mobility framework, which might want to
> > use a tailored object system.  Most likely they might consider my
> > object system overkill if they don't actually want to emulate Hurd
> > objects.
> 
> I see. It all hinges on the definition of "Hurd object"... I wasn't
> paying enough attention: By object, you mean strictly the server code
> serving RPCs on a port it seems. 

By Hurd object I mean remote object as explained above.

Also, by `emulate' I mostly mean `look like', not full emulation that
would make it possible to plug in a Hurd object implementation and have
it work directly (as I aimed for in our last discussion).

This mostly means that objects should be dynamically typed and be able
to implement several interfaces.

> Object migration would simply mean loading the code in the client
> instead, and pretending to do real RPCs on it.

This would be one possibility.  I'm trying to avoid making assumptions
about what interfaces might look like.

> I was thinking of objects in a somewhat more abstract sense: The actual
> functionality provided by the server, which could be accessed by RPCs,
> but also by an alternative interface optimized for direct invocation
> when migrated to the client...

Interfaces need not be transparent.  An interface that can't fail can't
be transparent as it can't deal with the errors associated with RPCs
which are out of the object's control.

I do, however, suspect that transparent interfaces will be optimized
for the local object case.  For an io interface that would probably mean
a POSIX style, rather than a Hurd style, interface.

> So I guess by your definition, the use case I'm interested in for
> translator stacking, would actually not classify under object migration,
> but under other uses... I guess you remember that I don't consider
> actual RPC emulation particularily useful :-)

I'm guessing it classifies under partially transparent or non-transparent
object migration.  Though my framework will support both so I gather
it's still to be considered object migration.

> > The object system itself can be used in contexts outside of mobility.
> > For instance, libstores mechanism for specifying stores on the command
> > line.  Thus the process don't need other translators to provide the
> > store for it (using it remotely or transferring it).  This is useful
> > for the root file system translator during boot, e.g. if the file
> > system is on some RAID configuration.
> 
> Well, strictly speaking this is not mobility, as the object is not
> actually transferred from one process to the other, but rather loaded by
> the client directly... But conceptually, I don't see much difference
> really. The idea of loading an object directly instead of talking to it
> through RPC is exactly the same.
>
> > The command line mechanism can ignore many of the issues that arise in
> > mobility, e.g. consistency between different copies.
> 
> I must admit that I don't see the difference... Please explain.

Take copy store as an example.  The copy store makes a copy-on-write
copy of another store and discards changes when closed.  For instance,
a copy store over a zero store is useful for backing /tmp.

If a copy store were to migrate, then all modifications would also be
copied.  Writes made to the copy would not be reflected in the original
and vice versa.  Because of this, the copy store has the enforced flag set,
which makes storeio refuse migration requests.

When creating an object instead, there will only be a single copy, which
circumvents the problem entirely.

This problem might be fixable for the copy store by sharing memory.
However, the solution doesn't generalize for all objects with shared
state.
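
To make the divergence concrete, here is a toy model in Python.  It is
only a sketch: ZeroStore and CopyStore below are simplified stand-ins
invented for illustration, not libstore's actual classes, and
copy.deepcopy merely plays the role of migration.

```python
import copy


class ZeroStore:
    """Toy backing store: every block reads as a single zero byte."""

    def read(self, block):
        return b"\0"


class CopyStore:
    """Toy copy-on-write layer: writes are kept privately, never written back."""

    def __init__(self, backing):
        self.backing = backing
        self.changes = {}

    def read(self, block):
        if block in self.changes:
            return self.changes[block]
        return self.backing.read(block)

    def write(self, block, data):
        self.changes[block] = data


original = CopyStore(ZeroStore())
original.write(0, b"a")

# "Migration" copies the object, including its accumulated modifications.
migrated = copy.deepcopy(original)

# From here on the two copies drift apart.
migrated.write(1, b"b")
original.write(2, b"c")
```

After this, both copies agree on block 0, but block 1 is only modified in
the migrated copy and block 2 only in the original.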

> > > As I said before, I believe it is best for the server to provide the
> > > object code directly (by an RPC), rather than letting the client
> > > look it up somewhere. This would entirely remove the naming problem,
> > > along with some others.
> > 
> > This is one of the methods I'm considering, i.e. ``naming'' the code
> > with a port to the .so file.
> 
> Actually, I meant the server providing the object code directly through
> the RPC... But providing a port to a file containing it indeed seems
> preferable :-)

I wouldn't know where to begin doing it that way.  :-)

Using a port, or more specifically a file descriptor, instead of a file
name would be fairly straightforward.  dlopen takes a file name argument,
and I haven't found an alternative that takes a file descriptor.  But I
imagine I could copy and paste most of dlopen to create such an
alternative.  Or even do something like: dlopen("/dev/fd/$CODE_FD").
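
A minimal sketch of the /dev/fd variant, written in Python for brevity.
On POSIX systems ctypes.CDLL loads a shared object through the dynamic
linker, so it plays the role of dlopen here.  The helper names fd_path
and load_code_from_fd are made up for this example, and whether the
dynamic linker accepts a /dev/fd path at all depends on the system; this
is an assumption, not something verified on the Hurd.

```python
import ctypes


def fd_path(fd):
    """Build the /dev/fd name that refers to an already open descriptor."""
    return "/dev/fd/%d" % fd


def load_code_from_fd(fd):
    """Load a shared object named only by an open file descriptor.

    Assumes /dev/fd is available and that the dynamic linker will open
    the path like any other file name.
    """
    return ctypes.CDLL(fd_path(fd))


# Hypothetical usage, where code_fd was received from the server:
#     lib = load_code_from_fd(code_fd)
```

Note that this still goes through a file name lookup; it only avoids
having to know the original name of the file.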

> > I do not have high hopes for this method though, mostly because it's
> > hard for the recipient to determine if it can trust the code.
> 
> Well, in the simple case -- using the traditional UNIX model -- it's
> pretty trivial: The client trusts the code if it trust the server, which
> is the case when the server is run by the same user, or by root. In this
> case, there is no problem at all.

Ah, but -- as per the Hurd's design goals -- we want to reduce the
trust needed between normal users to take advantage of this feature
when cooperating.  And the client doesn't need to trust the server if
it acquires the code from a trusted source, e.g. from /lib, /usr/lib,
or $LD_LIBRARY_PATH, or even statically linked code.

> Admittedly, this is more tricky when leaving the UNIX model, and working
> with pure capabilities... I'm not sure that an object named through a
> textual file name is indeed more trustworthy than one named through a
> port directly -- but I haven't really thought about it yet. I'm curious
> what you have to say on that in the promised later mails :-)

Using a file name, you can figure out who controls the file, and decide
whether you trust it based on that.  (Or at least I think so, I'm not
sure yet if a malicious file system can't fool you.)

This might not be impossible with ports, but I imagine it's trickier.

> > > Regarding trust, I think this is complementary to authority.
> > > Probably they should be considered together.
> > 
> > Really?  Being allowed to do something is quite distinct from trusting
> > that it really does what it claims to do IMHO.
> 
> Of course it is distrinct. I said they are *complementary*, not that
> they are the same :-)
> 
> In the UNIX case, it is actually quite symmetrical: The client trusts
> the object code provided by the server, if the server is the same user
> or root. The server entrusts the client with the content of the object,
> if the client is the same user or root.

I'm hoping to make it so that the server doesn't need to trust that the
client won't misuse the content of the object.  This is done by verifying
that the client already has the authority needed to hold it, and would
thus already be able to acquire the content through other means.

Also note that checking that it's the same user is not enough, a process
can have its authority limited by chroots and sub-Hurds.  Root might
still be an exception though.

As I stated above, I hope to derive trust from the name of the code, which
links the two subjects.  Authority verification, on the other hand, seems
pretty orthogonal, at least mechanism-wise.

> > One idea I have been toying with is using it for alleviating the
> > problem of resource accountability.  I.e. making the client load an
> > object which manages state needed by the server.  Consider an object
> > that maintains the current position within a file, then the translator
> > wouldn't need to maintain such session state.
> 
> I see serious problems with this use case: The reason the file pointer
> for example is maintained by the server is that it can be shared between
> multiple clients, if clients share an fd. According to POSIX, they must
> share the file position as well in this case. So you can't trivially
> move the handling to the client. (Otherwise it would have been
> implemented in libc from the beginning...)

I'm aware of this issue.

> We could still move the handling to the client in the more common case
> that there is only one client -- but that wouldn't solve the resource
> management problem, as there are still the cases where it must remain in
> the server.

It doesn't need to be in *the* server, though someone must act as a
server for the file cursor object.  This could be the original client,
the new client, the server, or a third-party server in the system/per
user/per login/whatever.

My thoughts mostly revolve around clients pushing the cursor to a
third-party server and reloading if it becomes the sole client again.
This solution might be a tad messy, but it's fun to think about.  :-)

> It might actually be possible to move it to the clients and coordinate
> use somehow, but this is really tricky. Or we could just ignore the case
> alltogether, as it's a rather obscure feature, and the few (if any)
> programs actually making use of it probably could easier be adapted than
> handling it "properly" in the Hurd... But this is all beside the point:
> Object migration doesn't help with addressing the real problems here.

I'm not convinced it's obscure.  stdin & co. are very often shared with
child processes.  Redirecting one of them to a file means a cursor must
be shared.  Even a trivial command like below would fail otherwise:

        (cat foo; cat bar) > foobar

Logging a build is perhaps more compelling:

        make > make.log

A semi-plausible and slightly contrived concurrent example:

        (tail -f foo.log & tail -f bar.log|grep IMPORTANT) \
                > foobar.log

(tail -f foo.log bar.log would have the problem; hence the pipe to grep.)
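
The first shell example can be modelled in a few lines of Python on a
POSIX system.  This is only a sketch, and write_from_children is a name
made up for it: the parent opens the file once, and each forked child
inherits the descriptor and therefore the same file position.

```python
import os
import tempfile


def write_from_children(chunks):
    """Fork one child per chunk; each child writes to the inherited fd."""
    fd, path = tempfile.mkstemp()
    try:
        for chunk in chunks:
            pid = os.fork()
            if pid == 0:
                # Child: shares the open file, and thus the cursor,
                # with the parent and with its siblings.
                try:
                    os.write(fd, chunk)
                finally:
                    os._exit(0)
            # Parent: run the children one after another, like
            # (cat foo; cat bar).
            os.waitpid(pid, 0)
        with open(path, "rb") as result:
            return result.read()
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling write_from_children([b"foo\n", b"bar\n"]) returns both chunks in
order, because the second child continues where the first one stopped.
If each child had its own cursor starting at zero, the second write would
overwrite the first.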

> I like the translator concept, because it allows intuitively naming
> objects through filesystem locations; and the objects are standalone,
> i.e. can be accessed directly from the command line, typically through a
> filesystem interface.

I'm not sure what you mean by an object being standalone...

> I never considered these two aspects distictly, but I see now that it
> makes sense to do so: Not all translators implement a standard interface
> that can actually be used with standard tools. Even if technically they
> are standalone, the objects can only be handled by specific programs
> really. So the standalone aspect is pretty meaningless here -- but the
> naming aspect is still important.
> 
> For such objects it would make perfect sense to be implemented as a
> special kind of translators, so they can still be named through the
> filesystem; but the actual code always being executed in the client --
> i.e. using a migration mechanism that needn't be transparent.

I have also considered completely non-transparent objects and they might
indeed be useful.  However, it forces the client to trust the code base of
the object, so you can't use such objects with code from an untrusted user.

> An obvious use case are ioctl handlers: I believed for a long time that
> rather than being hardcoded in libc, they should be handled by some kind
> of loadable modules. This was actually discussed as part of the channel
> concept, but I discarded it back then, as it doesn't fulfill the
> transparency requirement, and thus didn't seem useful to me back then.

I never did look into how ioctls are handled so I can't tell offhand
whether this is a good idea.  Perhaps I'll revisit them as a plausible
use case later on.

> Now I see that it might be still useful to implement this using a common
> mobility framework, so they can be handled like something akin to
> translators -- providing objects that are not really standalone, but are
> named through filesystem locations.

They should be implementable as mobs.  However, as they are more
specialized I don't think they need more than a single interface, so
they might want to use a separate object system.

Regards,
  Fredrik



