bug-hurd

Re: New channel concept


From: Carl Fredrik Hammar
Subject: Re: New channel concept
Date: Sat, 19 Jan 2008 13:57:55 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)

Hello,

<olafBuddenhagen@gmx.net> writes:
> Hi,
>
> This time I have a bit more to say... ;-)

Indeed. :-)

>> There are also objects which I have called ``channel hubs'' (yuck!),
>> that act as trivial filesystem objects, and they implement a subset of
>> the Hurd's fs interface.
>
> Let me get this right: The hub is the entity that handles any RPC
> requests made to a channel translator?

No, a hub is an object similar to a channel, except that it deals with
filesystem requests, i.e. opens, which result in a channel.  (You
wanted to call hubs ``channels'', and call channels ``channel
sessions'' in the old discussion.)


>> It's such an object that actually gets transferred over IPC, which
>> then produces channel objects.
>
> You mean the process of a channel translator "uploading" the necessary
> information so the client using libchannel can handle stuff itself
> instead of further using the translator?

Yes.


>> I then started toying with the idea of hubs as a type of channels that
>> implement the fs interface, one that can return new channels.
>
> I fear I'm totally lost now.
>
> I thought channels basically offer a standard I/O interface, plus
> individual additions, plus the channel management interface that allows
> "uploading" the channel information so it can be handled in the client.
> (Using libchannel modules.)
>
> What are you suggesting here? Separating the channel management
> interface from the actual I/O+extensions, so they are implemented by
> different objects? If so, how would you uphold the concept of channels:
> That a translator can either handle I/O requests itself, or optionally
> upload the functionality to the client?
>
> Well, maybe it's not that important to understand it, as it seems only a
> temporary idea?...

My original idea was to implement hubs as channels, using channels'
ability to implement extra interfaces while not implementing io.  This
didn't fit with the channel concept.  If we expand the concept to
allow it, we can throw out the hub as a special concept: it's ``just a
channel implementing the fs interface''.  (The channel fs interface
would return channels instead of ports on `open()'.)


>> Of course this doesn't fit at all with the old concept of a channel,
>> instead I now propose that channels be able to implement *arbitrary*
>> Hurd objects.  All a server object has to do is implement a channel
>> interface in order for a client to transfer it.
>
> Well, I guess that fits with my proposal to turn channels into a generic
> translator stacking framework. Not sure though what exactly you mean by
> "Hurd objects" in this context. Did I mention already that the word
> "object" is way overloaded? ;-)

By Hurd objects I meant server-side objects, as in the objects that
the servers of the Hurd provide.  Typically file objects, but also
others, for instance user identities and processes through the auth
and proc servers respectively.  (I'm not suggesting these particular
objects are useful in the context of channels.)

I'll stick with the term `server-side object' instead.


>> Because channels are designed to deal with stacked translators,
>> channels can be considered a *logical communication channel* going
>> through the stack.  Therefore I still regard ``channel'' a valid name
>> for this abstraction.  (But by all means, do feel free to object!)
>> Henceforth, ``channel'' will refer to the new concept unless otherwise
>> specified.
>
> Considering the more generic scope, maybe we should drop the whole
> "channel" terminology altogether, and try to find something more
> intuitively describing translator stacking... I have no suggestions
> offhand, though :-)

How about `virtual port', or `vport' for short?  Channels should
have the same semantics as a port, the difference being that you may
reference a local object instead of a remote one.  In this sense it is
a virtualizable port.  At least, I think this fits with the concept I
have proposed.

While it doesn't capture the stacking part, the channel mechanism
isn't really limited to this use-case.  I just can't think of any
others at the moment.  ;-)
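
To make the `vport' idea a bit more concrete, here is a rough sketch
in C.  Every name in it (`struct vport', `vport_ops', `vport_read',
the in-memory backend) is made up for illustration and is not part of
any existing libchannel.  The port-wrapping backend is only a stub
standing in for a real RPC.

```c
#include <stddef.h>
#include <string.h>

struct vport;

/* Hypothetical: the operations every vport backend provides.  */
struct vport_ops
{
  int (*read) (struct vport *vp, char *buf, size_t amount, size_t *len);
};

/* Hypothetical: a vport references either a local object or a port.  */
struct vport
{
  const struct vport_ops *ops;
  void *hook;                   /* Backend-private state.  */
};

/* Local backend: reads from memory in the client's address space.  */
struct mem_state
{
  const char *data;
  size_t size;
  size_t pos;
};

static int
mem_read (struct vport *vp, char *buf, size_t amount, size_t *len)
{
  struct mem_state *st = vp->hook;
  size_t left = st->size - st->pos;

  *len = amount < left ? amount : left;
  memcpy (buf, st->data + st->pos, *len);
  st->pos += *len;
  return 0;
}

/* Port-wrapping backend: a stub where the actual RPC would go.
   It always fails with -1 in this sketch.  */
static int
port_read (struct vport *vp, char *buf, size_t amount, size_t *len)
{
  (void) vp;
  (void) buf;
  (void) amount;
  *len = 0;
  return -1;
}

static const struct vport_ops mem_ops = { mem_read };
static const struct vport_ops port_ops = { port_read };

/* Clients make the same call whichever backend is behind the vport.  */
int
vport_read (struct vport *vp, char *buf, size_t amount, size_t *len)
{
  return vp->ops->read (vp, buf, amount, len);
}
```

A client would fill in a `struct vport' with `mem_ops' when the object
could be fetched and with `port_ops' otherwise, and then only ever
call `vport_read'.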


>> Arbitrary objects you say?  So now we can just skip IPC altogether and
>> reimplement the Hurd using channels, and the Hurd will be lightning
>> fast?  Unfortunately not.  ;-)
>> 
>> Channels can't provide any functionality to a process which it
>> couldn't otherwise do.  This because channels can't offer any
>> protection whatsoever for their content.  So an effort to only use
>> channels would eventually lead to the entire system running inside a
>> single process, or worse, inside the kernel.
>
> That's not really what would happen. What libchannel does is moving
> functionality into library modules within the clients, i.e. towards
> top-level application programs. Taken to the extreme, you still have
> exactly one process for each top-level application; but this process
> implements almost all functionality it relies upon, including stuff that
> traditionally is provided by other components; only at the very lowest
> level (access to actual hardware and peripheral interfaces), you still
> have central management. In short, you get the exokernel approach. Are
> you familiar with that?

I have heard about it, but not looked into it further.  But I agree
that it does seem closer to what actually would happen.


> [snipped (not so) short introduction to exokernels]
>
> All in all I believe that the Exokernel model is interesting, and
> worthwhile to use for some specific subsystems -- for graphics hardware,
> Mesa's/Xorg's DRI/DRM model basically implements this model for example.
> (As well as the older, now defunct KGI approach...) But doing it on a
> large scale is not a good idea IMHO. For the most part, I think a good
> design is about choosing the *right* abstractions in resource
> management, not about avoiding them altogether.

I agree with this sentiment.  Thanks for the primer on exokernels.
;-)


> (For stores for example it's probably not really useful most of the
> time... It seems to me that the major motivation behind libstore was
> actually to allow the root FS to run without relying on other processes.
> Personally, I'm not convinced this was really a good idea. But well,
> what do I know :-) )

Interesting.  I think I agree with you; at the very least, libstore
seems much too complicated a solution for the problem.


>> This determines a first requirement of libchannel: there must be a
>> special channel type that wraps around a port referencing the object,
>> if it could not be fetched.
>
> I wouldn't call it a special channel type. Rather, it's an inherent part
> of the framework. Not only logically, but also technically it's probably
> more useful to implement it specially, rather than trying to fit it into
> the channel module mechanism...

This is actually what I had in mind, but I didn't wish to elaborate.
(Such details are the subject of my next mail in my little series.)


>> (This excludes channel junctions mentioned in the old discussion, but
>> these couldn't be transferred in the old libchannel either, they were
>> only coaxed in as channels to ease their implementation.  A decision I
>> believe was a mistake in hindsight, as they only need to *use*
>> channels.)
>
> Very true. This point totally escaped me in the original discussion: As
> junctions always have to run in a separate process from the clients,
> i.e. can't be uploaded from their original translator, there is no point
> at all to fit them into the channel model! I feel so stupid now :-)

Don't worry, it made me feel the same way.  ;-)


> Perhaps it might be still desirable to have some way to handle junctions
> as part of a channel, e.g. to allow them to be used with channel
> stacking syntax... But I'm not at all sure this is worthwhile, or even
> desirable. I guess we should just ignore such a possibility for the base
> concept -- that simplifies things a lot :-)

Their omission would have simplified the old libchannel, but my
current proposal would handle them gracefully.  Too bad they're
(probably) irrelevant.


>> Now there's a clear goal to strive for: the closer channel objects are
>> to the semantics of using the object through a port, the more
>> interfaces can be implemented by channels.  More specifically, the
>> goal is to emulate full RPC calls (as opposed to the underlying
>> message passing).  Ideally channel functions should act exactly like
>> those generated by MIG, only differing in that their names are prefixed
>> with ``channel_'', that their port parameters are dealt with in a
>> uniform manner, and that they might perform better.  Given that MIG is
>> well documented, it will be easy to evaluate this property of a
>> design.  Any automation would also be welcomed.
>
> I must disagree here. The idea of channels (or more generally,
> optimizing translator stacking) is *not* merely to avoid actual RPC
> calls. I don't think that would be a worthwhile goal. The actual IPC is
> very slow on Mach, but modern microkernels show that it is possible to
> do much better. The cost of IPC itself is not exactly negligible, but
> only in few situations really a relevant performance factor.

I would argue that (potentially deep) translator stacks are one such
situation.  Also my impression is that we will be stuck with Mach
for quite a while, and that IPC on Mach is inherently slow.  So the
fast IPC argument doesn't really apply.

However the argument that the IPC context inhibits faster interfaces
is perhaps valid and interesting.


> The main overhead of RPCs is not from the actual calls, but from the
> implications of an RPC interface -- from the fact that client and server
> can run in different processes (address spaces), possibly even on
> different hosts. (Though we don't employ this latter possibility
> presently, and I'm not convinced it is really useful to preserve network
> transparency at such a low level.) Meaning the client needs to be
> prepared for communication to fail; meaning that the interface is
> constrained to passing mostly plain values, no pointers, no global
> variables, no function pointers etc.; meaning client and server don't
> have access to the same resources; meaning server and client threads run
> asynchronously (unless using passive objects, which we don't).

I'll tackle the issues one at a time, in the order you enumerated
them.

* Communication failure

  We have to deal with this in either case, since we might be using a
  port wrapper.  Even if not using a port wrapper directly, the bottom
  layer of a channel stack probably uses IPC, (it is most likely a
  port wrapper).


* No pointers

  The problem here is unnecessary copying.  To illustrate this, let's
  compare the Hurd's `io_read' to POSIX's `read'.

  `io_read' optionally takes a buffer as input and returns a buffer
  which is either the input buffer or a newly allocated one.  Note
  that the input buffer is deallocated from the client on a successful
  send and that the output buffer is deallocated on a successful
  reply.  Mach can take advantage of this and avoid copying the
  buffer if the server reuses the input buffer.

  Instead of taking a buffer, `read' takes a pointer as input and
  writes to the underlying memory, thus avoiding any copy.

  The problem with `io_read' is that we have to pass page aligned
  data.  (We can pass unaligned data, but the entire page would be
  visible to the server.)  This means that we have to resort to
  copying if we want to store the data at an unaligned address.
  Unfortunately, this is quite common; for instance, it's needed for
  buffering.

  But note that `io_read' is more efficient in some cases, such as
  when the result is simply forwarded to a client with no buffering
  required.  I suspect this case is actually the most common in the
  context of translator stacks.

  This might imply that we actually want both.  But that would in turn
  complicate channel modules.


* No globals

  The use of globals implies that memory be shared between different
  clients each having a channel from the same translator.  I think we
  can agree that is a bad thing in this context, (unless read-only
  like code).


* No function pointers

  Right.  But we do have ports which can do the same thing, just send
  it and listen for call-backs.  When using channels one might provide
  a channel instead, which means we will avoid IPC if the channel is
  local.

  To support this, libchannel needs to provide channel wrapping ports.
  This is more complex than a port wrapping channel, because of the
  asynchronous nature of ports.  Translators need this infrastructure
  anyway and so libchannel can probably be designed to utilize it (or
  the other way around).


* Asynchronism

  While Mach's IPC primitive `mach_msg' is asynchronous, we are only
  interested in RPCs and these are synchronous.  In some sense an RPC
  is just a function call to a function in another address space.

  Also, you wrongly state that we don't use passive objects.  The
  objects a translator implements are passive: they can only respond
  to requests from a client.  Sure, translators have threads and can
  make requests to other servers, but these are auxiliary in nature.

  Of course, we also have asynchronous interfaces, but they are
  implemented over synchronous RPCs.  The only difference is that the
  client implements a passive object and passes it to the server.  The
  channel wrapping port described above covers this problem also.

* etc.

  It's hard for me to counter this one.  I hope you don't mind me
  skipping it.  ;-)
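
Going back to the `No pointers' point, the difference between the two
interface shapes can be sketched in plain C.  This is only a
simulation: `ptr_read', `buf_read', `fill_ptr' and `fill_buf' are
made-up names, `malloc' stands in for memory handed over by Mach, and
`buf_read' is not the actual MIG-generated signature of `io_read'.

```c
#include <stdlib.h>
#include <string.h>

static const char source[] = "translator stack";

/* read-style: the callee writes straight into caller-provided memory,
   so the data can land at any (unaligned) address.  */
size_t
ptr_read (char *dst, size_t amount)
{
  size_t max = sizeof source - 1;
  size_t n = amount < max ? amount : max;

  memcpy (dst, source, n);
  return n;
}

/* io_read-style: the callee hands back a buffer of its own choosing.
   Here it is always a fresh allocation that the caller must free.
   Returns 0 on success, -1 if the allocation fails.  */
int
buf_read (char **data, size_t *len, size_t amount)
{
  size_t max = sizeof source - 1;
  size_t n = amount < max ? amount : max;
  char *p = malloc (n ? n : 1);

  if (p == NULL)
    return -1;
  memcpy (p, source, n);
  *data = p;
  *len = n;
  return 0;
}

/* Append to a client-side buffer at offset FILL, read-style:
   no copy beyond the one the callee does.  */
size_t
fill_ptr (char *buffer, size_t fill, size_t amount)
{
  return ptr_read (buffer + fill, amount);
}

/* The same, io_read-style: the returned buffer has to be copied
   into place and then released.  */
size_t
fill_buf (char *buffer, size_t fill, size_t amount)
{
  char *data;
  size_t len;

  if (buf_read (&data, &len, amount) != 0)
    return 0;
  memcpy (buffer + fill, data, len);    /* The extra copy.  */
  free (data);
  return len;
}
```

When the result is merely forwarded, the caller of `buf_read' can pass
`data' along untouched, which is the case where the buffer-returning
shape wins.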


> All these properties require a lot of overhead, which won't go away by
> simply avoiding the actual RPC call. To really optimize here, it's
> necessary to take it up at a much higher level. (In fact, the major
> advantage of the channel concept over my earlier ideas on optimizing
> translator stacking is the fact that it works at a higher level!)

As is hopefully clear by now, the only property that introduces
any significant overhead is unnecessary copying in some cases.  But I
do agree that this is something we do not want, especially in the
context of deep stacks.


> When client and server are from the same channel family, they can
> interact at a level that is in fact totally detached from the standard
> I/O-based interface, using something much more abstract instead.

I agree that this is useful, and although I was aware of it, I have
perhaps overlooked its importance.


> (I must confess that I don't know the actual store interfaces; but
> my guess is that libstore uses such a special interface between the
> modules internally?)

It seems the only difference is that offsets are mandatory and given
in blocks instead of bytes, while the amount to be read is in bytes
but must be a multiple of the block size.  In this case it would have
been better to keep it wholly consistent with the io interface, and
just use block aligned offsets.
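
A rough sketch of what I mean, with a made-up helper `byte_to_block'
(not something libstore provides): the interface takes io-style byte
offsets and amounts, rejects anything that isn't block aligned, and
derives the block address itself.

```c
#include <stddef.h>

/* Hypothetical helper: translate an io-style byte OFFSET and AMOUNT
   into a block address and block count.  Returns 0 on success, or -1
   if BLOCK_SIZE is zero or the request is not block aligned.  */
int
byte_to_block (size_t block_size, size_t offset, size_t amount,
               size_t *block, size_t *nblocks)
{
  if (block_size == 0
      || offset % block_size != 0
      || amount % block_size != 0)
    return -1;

  *block = offset / block_size;
  *nblocks = amount / block_size;
  return 0;
}
```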

Also it has some funky functionality to remap the blocks of a store
without any cooperation from the back-end.  But this just seems
awfully complex and could probably be reimplemented through a store
module with only a slight loss of performance.

It seems that libstore really could use a clean-up.  :-/


> In some cases the actual functionality of the individual layers
> perhaps could even be implemented using some kind of abstract
> description, rather than C code.

I don't really see how that would work, do you want to elaborate?


> When client and server are from different families, obviously they can't
> use such a specific interface; they will be bound by the I/O interface
> as common denominator. But it still can happen through a high-level
> variant totally detached from RPC mechanisms.

Right.


> Even for "dumb" clients, not knowing anything about channels at all,
> it might still be useful to wrap the I/O system calls at a level
> above actual RPCs. (In this case, libc would have to call out to
> libchannel...)

Yes, an interface closer to what libc provides would probably be more
efficient in this case.


>> If a design achieves the goal stated above, channels can handle
>> anything the old libchannel could.  The client obtains the port to a
>> channel translator's fs object, fetches the underlying channel (which
>> corresponds to a hub) and then calls ``channel_dir_lookup'' to get io
>> channels (which correspond to old channels).
>
> I don't understand the "channel_dir_lookup" step. What does that do?

`dir_lookup' is the fs interface for file name lookups, and the
primitive used to implement POSIX's `open'.

It seems, though, that I was slightly mistaken about the details of
the lookup protocol when it comes to trivial single-node filesystems.
The result seems to be an io object, and file io objects (as opposed
to socket io objects) are required to implement `dir_lookup' so that
clients can reopen the underlying file.

This is interesting because one of the reasons I introduced hubs was
that I thought they modeled a trivial filesystem better.  It seems
hubs were a silly idea all along.  :-)


>> If an interface makes little sense in an IPC context, the translator
>> can simply not handle such calls over IPC but still provide an
>> implementation in the channel itself.  To deal with this the client
>> should be able to detect when a transfer fails, so it can apply a
>> compatibility channel over the original port.  If such a channel can't
>> be implemented then the entire operation must fail if transfer fails.
>> (This case is probably uninteresting, it would basically be a
>> complicated way to implement plug-ins.)
>
> I really think channels should only be used as an optimization of
> translator stacking, i.e. be limited to things that can be expressed
> through RPC interfaces.
>
> Is there any use case where you think it would be important to do
> otherwise?
>
> -antrik-

I didn't mean functionality that can't be implemented through RPC; I
was actually thinking about interfaces that would be more efficient in
a single address space context than their RPC counterparts, i.e.
exactly what you were asking for earlier in your mail.  ;-)

Although I mostly added it as an afterthought, and I'll admit it was a
bit hammered on.

So where do we go from here?  As I see it we have two extremes, which
I will call dynamic and static channels (at least for now).

Dynamic channels are the ones I have presented, sans the non-RPC
interfaces.  That is, they closely emulate the existing RPC
interfaces, and appear as nothing more than fast RPCs to clients.

The big selling point here is transparency.  Using a channel is just
like using a port, so existing clients need little change to benefit
from using channels.  Same thing with servers, since they implement
RPCs already, it's mostly a matter of hooking the existing RPC
implementations to channels instead.

Unfortunately we miss out on the conveniences offered by libc, unless
we reimplement them over channels and supply them also.  (We could
also integrate channels into libc directly, but I suspect that
wouldn't happen anytime soon, if at all.)

Another benefit is that different channel families need not be
aware of each other to benefit from using the channel interface.
Though I suspect interoperability isn't very useful.  Why would an
audio channel want to be layered over a network channel?

We also get some increased interface stability, because the
interfaces are derived from RPC interfaces, which more applications
rely upon and which presumably will remain stable for long periods of
time.

With static channels each channel belongs to a family, where each
family corresponds to its own abstraction, and where each channel
implements an interface optimized for this abstraction.

That is, we introduce a libaudio, libnet, etc., one for each family,
each being like libstore currently is, only cleaner and sharing common
code through libchannel.  Here libchannel itself doesn't introduce a
channel abstraction per se; it's just a support library.

(Although in this case I'd much rather split libchannel into smaller,
clear cut pieces, for instance a `libenc' to deal with encoding and
decoding transferred data.)

The thing about static channels is that they are simple, both to
implement and to use, because their interface can be brought closer to
the problem domain.

The downside is that a suitable abstraction must be engineered for
each channel family, including support libraries to implement the
translators corresponding to each module.  Also any interoperability
must be explicit, by creating modules that adapt one channel type to
another.

The middle ground as I see it is dynamic channels with non-RPC
interfaces, where each such interface corresponds to a channel family.
But this seems like a clumsy way to implement abstractions tailored
for each family.  Somehow I think providing *both* static and dynamic
channels would be cleaner and more straightforward, using whichever
handles the task well enough.  Dynamic channels could then be
considered a fancy static channel.

Anyway, I'm a bit torn over which route is best.  Regardless I'll
return to writing the mail describing how dynamic channels could be
implemented, which I was in the middle of when I got your mail.  It's
an interesting mechanism even if it doesn't turn out to be very
useful.  ;-)

Regards,
  Fredrik



