From: Marcus Brinkmann
Subject: Part 2: System Structure
Date: Fri, 12 May 2006 17:45:22 +0200
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.7 (Sanjō) APEL/10.6 Emacs/21.4 (i486-pc-linux-gnu) MULE/5.0 (SAKAKI)

This is part 2 in a small series of notes explaining my opinion on
what makes a good system structure for the Hurd.  While the ideas in
part 1 motivate the system structure presented here, the feasibility of
this system structure in turn justifies my opinion as presented in
part 1.  However, either part can also be taken individually.  There
will probably not be a third part.

Part 2: System Structure
------------------------

I will start by presenting the process hierarchy, explain some
abstract design patterns, and then show some specific applications.

Note that within this document, I will limit myself to certain types
of operations and features.  This does not mean that the system
itself, by design, contains any measures to forbid or ban other types
of operations.


Process Hierarchy

A process is a protection domain.  The initial configuration of the
machine contains one or more processes with specific, but unspecified,
relationships.  These processes are called the "root processes".  From
the initial configuration, processes can be created and destroyed.


Resource Management

I do not make a distinction between data and capability pages.  Both
are, for the purposes of this discussion, memory pages.

Processes require at the very least some memory resources to keep the
process state.  Memory is allocated from containers, which therefore
provide an abstraction for memory reserves.  It is required that one
of the root processes is a server implementing container objects.

A container provides an interface for allocating and returning memory
frames, and for creating new containers with a new reserve limit
(thus, containers form a hierarchy).  Any successful allocation and
deallocation from such a derived container will also be accounted for
in all containers from which it is derived.  A container can be
destroyed, which will return all memory frames allocated from it, and
thus recursively destroy all containers derived from it as well.
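
As a rough illustration, such a container interface could look like
the following C sketch.  All names and signatures here are made up
for this note; they are not part of any existing Hurd or L4
interface.

    /* Hypothetical sketch of the container interface described above.
       cap_t stands for a capability; none of these names exist in any
       current Hurd or L4 API.  */

    #include <stddef.h>

    typedef unsigned long cap_t;

    /* Allocate COUNT memory frames from CONTAINER.  The allocation is
       accounted for in CONTAINER and in every container from which it
       is derived.  */
    int container_allocate (cap_t container, size_t count, cap_t *frames);

    /* Return previously allocated frames to CONTAINER.  */
    int container_deallocate (cap_t container, cap_t *frames, size_t count);

    /* Create a new container, derived from CONTAINER, with a reserve
       limit of LIMIT frames.  */
    int container_derive (cap_t container, size_t limit, cap_t *derived);

    /* Destroy CONTAINER.  All frames allocated from it are returned,
       and all containers derived from it are destroyed recursively.  */
    int container_destroy (cap_t container);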


Process Creation And Destruction

Any process that has access to a container from which a sufficient
amount of memory can be allocated can convert this memory into a
process.  The process is destroyed by deallocating the memory from
which it was created.
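
Continuing the hypothetical sketch above, the conversion of memory
into a process might be expressed as a single operation; destruction
then needs no separate interface, because it falls out of
container_deallocate (or container_destroy).

    /* Hypothetical continuation of the container sketch: convert
       memory frames allocated from CONTAINER into a new process
       (protection domain).  The process is destroyed by deallocating
       these frames again, or by destroying the container they were
       allocated from.  */
    int process_create (cap_t container, cap_t *frames, size_t count,
                        cap_t *process);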


Filling In the Gaps

The above description is actually mostly complete.  What is missing is
the description of a somewhat unrelated feature which allows process
identification, a description of what the default mechanisms are in
the system to support common design patterns, and an illustration that
these design patterns are sufficient.


Canonical Invariants

By default, every process is associated with one memory container, the
primary container of the process.  This is the container from which
the process is allocated, and from which the process does all
allocations for its own needs.  Primary containers are by default not
shared.


Canonical Process Creation

By default, to create a new process, the parent creates a new
container from its own primary container, allocates some memory from
it, and converts that memory into a new process, the child.  The
parent then prepares the child to get it into a runnable state.  This
includes the following steps: First, a special executable image
(allocated from the primary container of the child) is installed into
the child's address space; this startup code runs a cooperative
protocol with the parent.  Then, the parent provides the primary
container of the child, and any other initial state that the child
should receive, to the startup code.  The startup code finally
installs this initial state and starts executing it.
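
A minimal parent-side sketch of this sequence, using the hypothetical
interfaces from the sketches above, could look as follows.  The
startup protocol calls (startup_install, startup_send_initial_state)
are placeholders for whatever cooperative protocol the special
executable image speaks.

    /* Hypothetical startup protocol between the parent and the
       special executable image in the child; placeholder
       declarations only.  */
    int startup_install (cap_t process, cap_t *frames, size_t count);
    int startup_send_initial_state (cap_t process, cap_t container);

    /* Sketch of canonical process creation, parent side.  */
    int
    spawn_child (cap_t primary, size_t limit,
                 cap_t *child, cap_t *child_container)
    {
      cap_t frames[16];
      int err;

      /* Derive the child's primary container from our own primary
         container.  */
      err = container_derive (primary, limit, child_container);
      if (err)
        return err;

      /* Allocate some memory from it and convert it into a process.  */
      err = container_allocate (*child_container, 16, frames);
      if (!err)
        err = process_create (*child_container, frames, 16, child);
      if (err)
        {
          container_destroy (*child_container);
          return err;
        }

      /* Install the startup code, then hand it the child's primary
         container and any other initial state; the startup code
         installs that state and starts executing it.  */
      err = startup_install (*child, frames, 16);
      if (!err)
        err = startup_send_initial_state (*child, *child_container);
      if (err)
        {
          container_destroy (*child_container);
          return err;
        }

      /* The parent retains *CHILD_CONTAINER so that it can later
         destroy the child forcibly.  */
      return 0;
    }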

It is clear from this description that the child's existence is
completely determined by the parent.


Canonical Process Destruction

Process destruction can be done either cooperatively, or forcibly.
The difference corresponds approximately to the difference between
SIGTERM and SIGKILL in Unix.  To destroy a process cooperatively, a
request message is sent to a special capability implemented by the
child process.  The child can then begin to tear down the program, and
at some time send a request back to the parent process to ask for
forced process destruction.

Forced process destruction can be done by the parent process without
any cooperation by the child process.  The parent process simply
destroys the primary container of the child (this means that the
parent process should retain the primary container capability).

Because container destruction works recursively, forced process
destruction works recursively as well.
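
The two destruction paths could be sketched like this; cap_send and
MSG_DESTROY_REQUEST are placeholders for whatever IPC primitive and
message label the underlying system provides.

    /* Placeholders for the underlying IPC mechanism.  */
    enum { MSG_DESTROY_REQUEST = 1 };
    int cap_send (cap_t cap, int msg);

    /* Cooperative destruction: ask the child, via the special
       capability it implements, to tear itself down.  The child will
       eventually ask the parent for forced destruction.  */
    void
    destroy_child_cooperatively (cap_t child_destroy_cap)
    {
      cap_send (child_destroy_cap, MSG_DESTROY_REQUEST);
    }

    /* Forced destruction: no cooperation from the child is needed.
       Destroying the child's primary container returns its memory and
       recursively destroys every container derived from it, and thus
       every descendant process as well.  */
    void
    destroy_child_forcibly (cap_t child_container)
    {
      container_destroy (child_container);
    }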


Process Hierarchy

From the above description it should be clear that containers and
processes are organized in the same hierarchical tree structure, where
every node corresponds to a process and its primary container, and
every edge corresponds to a parent-child relationship.


Isolation

The ability to subdivide a container's resource reserves makes it
possible to completely isolate sibling processes in the process
hierarchy.  By default, two processes, where neither is an ancestor of
the other, are completely isolated.  Also, an ancestor is
partially isolated from its child.  To overcome this isolation, the
two processes need the cooperation of at least all their respective
ancestors up to the first common ancestor in the tree.  An example
should illustrate that:

                A
               / \
              B   C
                 / \
                D   E

In this picture, A is the direct parent of B and C, and C is the
direct parent of D and E.  A is a common ancestor of B, C, D and E.  C
is a common ancestor of D and E.  The isolation is by default complete
between (B C), (B D), (B E), and (D E).  There is partial isolation
between (A B), (A C), (A D), (A E), (C D) and (C E).  The isolation
properties of A are, if it is a root node, defined by the initial
configuration.

If, for example, B and D should be able to communicate, the explicit
or implicit permission needs to be provided by both A and C.


Confinement

Because of the recursive nature of the process hierarchy, and because
the existence of a child is completely determined by its direct parent
(whose existence is completely determined by _its_ direct parent, and
so on), processes can be confined, and the confinement extends to all
their child processes as well.

In the above example, A confines B, C, D and E.  C confines D and E.
Thus, B and C are only confined by A, whereas D and E are confined by
A and C.


Meaning Of Words Such As Secure, Want, External, etc

Because the existence of a child process is completely defined by its
parent, its understanding of what is secure, what its needs are, what
is "external" to itself and what is internal, etc, is completely
defined by the parent as well.  It therefore does not make sense to
object to the above model by claiming that the child can not do what
it wants to do, because what the child wants to do is completely
defined by the parent, as are its abilities to do it.  It also does
not make sense to object that the child can not determine if a
capability it got from the parent is safe to use, because it is the
parent which defines for the child if a capability is safe to use or
not.

Any such objection has, at its root, some assumption that is different
from the assumptions made in this model, and thus needs to be analysed
and reasoned about outside of the model.


Identify Operation

A branding operation exists which, at the micro-level, allows a
server process to check whether a certain capability is implemented
by itself.  The server can then provide an identify operation to its
clients, which allows the clients to check with the server whether a
certain capability is implemented by it.  The client can then refuse
to use the capability if it is not authentic.
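
A sketch of how the identify operation could be surfaced follows; the
micro-level branding check is abstracted as cap_implemented_by_self()
and the client-side RPC as server_call_identify(), both placeholder
names.

    /* Placeholder for the micro-level branding check: does CAP refer
       to an object implemented by the calling server itself?  */
    int cap_implemented_by_self (cap_t cap);

    /* Placeholder for the RPC that invokes the server's identify
       operation on CAP.  */
    int server_call_identify (cap_t server, cap_t cap);

    /* Server side: answer a client's identify request.  */
    int
    server_identify (cap_t cap)
    {
      return cap_implemented_by_self (cap);
    }

    /* Client side: refuse to use a capability that the server does
       not recognize as one of its own.  */
    int
    client_use_if_authentic (cap_t server, cap_t cap)
    {
      if (!server_call_identify (server, cap))
        return -1;                /* Not authentic; refuse to use it.  */
      /* ... use CAP ... */
      return 0;
    }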


Applications

I will now describe some common applications that need to be
supported, and how they can be supported in the above system
structure.  To make this brief, I only include applications that have
any significance in the confined+isolated discussion.  There are other
applications (pipes, daemonization, process management), which are
important to discuss, but can be solved in identical ways in both
types of system structures, so I am excluding them here.


System Services

Unix-style suid applications have been proposed as one application for
alternative process construction mechanisms.  However, suid
applications in Unix are, from the perspective of the parent, not
confined, only isolated.  Thus, they are readily replaced by a system
service that is created by the system software, and that runs as a
sibling to any user process.  Only the ability to invoke the system
service needs to be given to the user, not the ability to instantiate
it.

In fact, no gain can be derived from letting the user instantiate
services.  In Unix, system services run on durable resources, which
the user can not revoke.  Thus, the system service needs to acquire
its resources from a container that is not derived from the user's
primary container.
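
A sketch of how the system software might arrange this, reusing the
spawn_child sketch from above; give_capability and
service_get_invoke_cap are again placeholder names.  The user process
receives only the capability to invoke the service, never the means
to instantiate or revoke it.

    /* Placeholders: transfer a capability to another process, and
       obtain the invocation capability of a freshly started service
       (e.g. via its startup protocol).  */
    int give_capability (cap_t process, cap_t cap);
    int service_get_invoke_cap (cap_t service, cap_t *invoke_cap);

    /* Sketch: the system software instantiates the service from a
       container that is *not* derived from the user's primary
       container, so the user cannot revoke the service's resources.
       The user gets only the capability to invoke the service.  */
    int
    install_system_service (cap_t system_container, size_t limit,
                            cap_t user_process)
    {
      cap_t service, service_container, invoke_cap;
      int err;

      err = spawn_child (system_container, limit,
                         &service, &service_container);
      if (err)
        return err;

      err = service_get_invoke_cap (service, &invoke_cap);
      if (err)
        return err;

      return give_capability (user_process, invoke_cap);
    }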


Cut & Paste

In "Design of the EROS Trusted Window System", Shap et al describe a
uni-directional communication mechanism that can be used for a
cut&paste operation in a window manager, that is guaranteed to not
allow backflow of information.  The main challenge to do this is
format conversion, which traditionally requires negotiation between
the two parties.  In the mechanism proposed, confined constructors are
used to allow the sending party to provide format converters that can
be used by the receiving party to convert into a format it understands.

I think that in the context of a free software operating system, and
considering the threat caused by proprietary document formats, it is
fully sufficient and in fact appropriate for our needs to replace this
mechanism with one in which the format converters are not provided as
isolated programs, but where instead at least the binary image of the
format converter is provided in read-only fashion to the receiver.

Accepting this means, in practice, that in the proposed protocol, the
format converter constructor capability can be replaced by the vector
of capabilities (which must be transitively read-only) that the
sending party would otherwise put into the constructor before sealing.
The receiving party can then instantiate these programs itself.
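
A sketch of the receiving side under this alternative: the receiver
gets the transitively read-only capability vector (in the simplest
case just the converter's binary image), instantiates the converter
from its own resources, and feeds it the pasted data.
spawn_from_image and converter_convert are placeholder names.

    /* Placeholders: start a process from a read-only binary image,
       and invoke the converter on some input data.  */
    int spawn_from_image (cap_t container, cap_t ro_image, cap_t *process);
    int converter_convert (cap_t converter, cap_t input, cap_t *output);

    /* Sketch of the receiving side.  RO_IMAGE is the transitively
       read-only capability (vector) supplied by the sending party.
       The receiver instantiates the converter from its own primary
       container, so it pays for and fully controls the converter.  */
    int
    run_format_converter (cap_t primary, size_t limit, cap_t ro_image,
                          cap_t input, cap_t *output)
    {
      cap_t converter, converter_container;
      int err;

      err = container_derive (primary, limit, &converter_container);
      if (err)
        return err;

      err = spawn_from_image (converter_container, ro_image, &converter);
      if (err)
        {
          container_destroy (converter_container);
          return err;
        }

      err = converter_convert (converter, input, output);

      /* Destroy the converter (and reclaim its memory) when done.  */
      container_destroy (converter_container);
      return err;
    }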

This alternative mechanism breaks with the principle of least
authority, because it values other principles more highly.


Suspicious Collaboration

Two agents in the system can collaborate suspiciously by means of a
third agent.  In the process, they rely on the third agent to
implement the common will.  This third agent can even be a
constructor-like service.  The validity of the service can either be
established by the "Identify" operation described above, or, in
principle, if the underlying operating system exposes the
functionality of a "trusted computing" component, the two agents can
even get all the guarantees and restrictions imposed by such a
component.  There is nothing in the system structure above that can
prevent this[1].  The changes needed in the underlying operating
system are purely local changes with no effect on the overall system
structure.

  [1] I should add here that my analysis is limited to technical
  constraints.  There may be further legal constraints imposed by
  software licenses such as the upcoming GPL v3, whose draft has an
  anti-DRM provision.

I said earlier that this makes it hard for me to understand why it has
been said that the above system structure constitutes a "ban" on this
mechanism.  I believe, without having inquired further, that the
reason must be that the suspicious collaboration in the above sense is
a contract with limited scope.  Any information that is passed from
the mediating agent to either of the two parties will subsequently not
be controlled further.  This is in fact always true.  The only
difference is what the scope of the mediating agent is.

In "locked down" computer systems, the mediating agent has a scope
that extends to all of the operating system.  For example, the window
manager would be part of the mediating agent, and conspire with other
components to not allow some information displayed to be read out or
modified.  Or it could reduce the quality of the information if such a
read out occurs (as is required by HDCP licenses, for example).  At
the risk of repeating myself here, the differences that surfaced in
the discussion are probably rooted in the issue of scope.  The scope
problem is not visible under a microscope, but is only revealed as
emergent behaviour by a macroscopic analysis of the resulting system.

Thanks,
Marcus




