Re: Networking design proposal

bug-hurd

From:	Niels Möller
Subject:	Re: Networking design proposal
Date:	06 Nov 2002 14:52:25 +0100
User-agent:	Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Hi again. Now I've read your description twice, and there are still a
few things that I find unclear, and there are also a few things
that I think I understand, and which I believe are wrong.

Olivier Péningault <peningault@free.fr> writes:

> You didn't understand correctly. Layer 2 translator performs ethernet +
> arp, not ip !

I think it's unclear to talk about a "layer 2" translator. To me,
"layer 2" is naturally an interface, not an action or a translation.
So you need to say explicitly what the interfaces are on each side of
the translator.

From your description, my guess is that you mean that on one end, the
translator talks to some device driver to send and receive ethernet
frames. And on the other end, it sends ip packets to and receives them
from the upper layer. If so, it's about the same thing as what I called
"layer 3 part I" (I tried to name my blocks after the interface spoken
at the upper end of each block. I'll try to stick to your terminology
for the rest of this message).

Is that right, or do I still misunderstand you?

> Here is a drawing of my idea (for ethernet/ip) :
> 
> __________________   _____________________  ___________________
> | L3 translator  |   |  L3 translator    |  | L3 translator   |
> | 192.168.1.1/32 |   | 192.168.2.1/32    |  | 2001::1/128 ;)  |
> ------------------   ---------------------  -------------------
>  | Mach              || Mach     | Mach       | Mach
>  | port              || port     | port       | port
>  | 1                 || 2 5      | 3          | 4
> ______________________________  _______________________________
> | L2 trans on /dev/eth0      |  | L2 trans on /dev/eth1       |
> |Registered L3 trans :       |  |Registered L3 trans:         |
> | + 1 192.168.1.1/32 0x800   |  | + 3 192.168.2.1/32 0x800    |
> | + 2 192.168.2.0/24 0x800   |  | + 4 2001::1/128    0x86DD   |
> | + 0.0.0.0/0        0x800   |  |                             |
> ------------------------------  -------------------------------
>                |                               |
> -------------------------------------------------------------------
> 
>  K  E  R  N  E  L    W /    D  E  V  I  C  E    D  R  I  V  E  R  S
> 
> --------------------------------------------------------------------

Looks reasonable to me. I'd do some details a little differently, but
I think it's basically right.
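One way to read the registration tables in the drawing is as a dispatch rule: an incoming frame goes to the registered L3 translator whose ethertype matches and whose prefix is the longest match for the destination address. Below is a minimal Python sketch under that assumption (longest-prefix matching is my reading of the table, not something the drawing states); the class and method names are hypothetical, and plain integers stand in for the Mach ports.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Registration:
    port: int  # stands in for the Mach port to one L3 translator (hypothetical)
    prefix: Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
    ethertype: int  # e.g. 0x800 for IPv4, 0x86DD for IPv6


class L2Dispatcher:
    """Per-interface table of registered L3 translators (hypothetical names)."""

    def __init__(self) -> None:
        self.registrations: List[Registration] = []

    def register(self, port: int, prefix: str, ethertype: int) -> None:
        self.registrations.append(
            Registration(port, ipaddress.ip_network(prefix), ethertype))

    def lookup(self, ethertype: int, dst: str) -> Optional[int]:
        """Return the port of the longest matching prefix for this ethertype."""
        addr = ipaddress.ip_address(dst)
        best: Optional[Registration] = None
        for reg in self.registrations:
            if reg.ethertype != ethertype or reg.prefix.version != addr.version:
                continue
            if addr in reg.prefix and (
                    best is None or reg.prefix.prefixlen > best.prefix.prefixlen):
                best = reg
        return None if best is None else best.port
```

With the eth0 entries from the drawing (and a made-up port 9 for the 0.0.0.0/0 entry, which the drawing leaves unnumbered), a packet for 192.168.2.7 would go to port 2 and one for 10.0.0.1 to port 9.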

> > Such an icmp service makes sense if one has several independent
> > processes doing transport that talk to the same interface. But I'm not
> > sure that makes sense; I think it's better to have at most one
> > transport process per ip number, to get easy management of port numbers.
> This would require a lot of memory !

I don't understand this comment.

> > Are there any icmp messages that you can process without knowledge of
> > transport level state?
> No, but think about ISO-CLNP. It is a standalone layer 3 protocol; data
> transmission (a la ip) and control (a la icmp) are implemented in this
> protocol. If icmp is implemented with layer 4 protocols, you won't be
> able to implement protocols built like CLNP.

You sure can. Consider a hardware bridge that talks plain IP (e.g. over
ethernet) on one port, and IP over CLNP on another. If CLNP requires
that icmp packets are encapsulated differently from other ip packets,
then the bridge will have to look into the IP packets it receives and
do the right thing. (I don't know CLNP, but that's the way it has to
work if you want to use it with IP and interoperate with any other
link technology). A hurd CLNP driver could do the same thing, just
looking into the ip packets to figure out how to transmit them. And I
don't think we need any tweaks in the interfaces to optimize for the
CLNP case.
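The "looking into the ip packets" step needs very little: such a driver only has to read the protocol field of each IPv4 header. The Python sketch below assumes two made-up encapsulations called "control" and "data"; it says nothing about how CLNP actually frames anything, and the function name is hypothetical.

```python
ICMP_PROTO = 1  # IP protocol number assigned to ICMP


def choose_encapsulation(ip_packet: bytes) -> str:
    """Pick a (hypothetical) link encapsulation by peeking at the IPv4 header."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    protocol = ip_packet[9]  # the protocol field is byte 9 of the IPv4 header
    return "control" if protocol == ICMP_PROTO else "data"
```

No interface above the driver changes: the stack still hands it ordinary ip packets.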

> I want ip and icmp to run together, because they provide the two
> services of the layer 3 : data transmission, and control.

That sentence is very strange to me. To me, layer 3 provides only one
single and quite primitive service: transport of IP datagrams between
nodes in the network. No more, no less. (And I don't think of the
routing that layer 3 has to do under the hood as a service: A layer 3
user will ask layer 3 "please deliver this IP packet", it won't ask
"please figure out the route for this packet" or "please figure out
the path MTU to this remote address").

> If you receive an icmp packet, you'll have enough information (ip
> header + 8 bytes of the layer 4 packet) to know whom to notify of
> this error.

You also need to understand some of layer 4. If you have a received
icmp packet and a bunch of clients, then you need some more
information about your clients, like "client 1 sends tcp-packets from
address 1.2.3.4" or "client 2 sends udp-packets from address 5.6.7.8,
port 47".
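To make the point concrete: the quoted ip header plus 8 bytes is enough to recover the protocol, source address and source port of the offending datagram, but mapping that triple to a client still needs a table describing what each client sends. A minimal Python sketch, assuming an ICMPv4 error whose embedded datagram is TCP or UDP; the function names and the table layout (with `None` meaning "any port") are hypothetical.

```python
import ipaddress
import struct


def icmp_error_key(icmp: bytes):
    """Return (protocol, source address, source port) of the datagram an
    ICMPv4 error refers to. Assumes the embedded datagram is TCP or UDP."""
    inner = icmp[8:]                   # skip type, code, checksum, 4 unused bytes
    ihl = (inner[0] & 0x0F) * 4        # embedded IP header length in bytes
    protocol = inner[9]
    src = ipaddress.IPv4Address(inner[12:16])
    (src_port,) = struct.unpack("!H", inner[ihl:ihl + 2])  # first field of TCP/UDP
    return protocol, src, src_port


def find_client(clients, icmp):
    """Look up a client by exact (proto, addr, port), then by (proto, addr, None)."""
    protocol, src, src_port = icmp_error_key(icmp)
    return clients.get((protocol, src, src_port), clients.get((protocol, src, None)))
```

The two clients from the example above would be the table entries `(6, 1.2.3.4, None)` and `(17, 5.6.7.8, 47)`.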

You could have a subscription interface where clients specify to the
layer 3 code what an icmp packet should look like in order to be
interesting. But then it's more general to have a subscription
interface where clients can specify what an *ip* packet should look
like in order to be interesting. And then the layer 3 code need not
know anything in particular about icmp.

How powerful do the rules need to be? I can think of at least four levels of power:

1. IP address only. Say what ip-addresses you want packets for.
2. IP address and upper-level protocol code.
3. IP address, upper-level protocol code, prefix of upper-level packet.
4. Regexp matched against full ip packet.

With (1), we can really only have at most one transport server for
each ip-address. (2) is quite useless, as it's not powerful enough to
distinguish between icmp messages for different transport servers. I
hope (3) is powerful enough to handle several transport servers, icmp
messages, ipsec security associations, etc.
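Rule level (3) might look like the following Python sketch, assuming IPv4 and a first-match delivery policy; `Rule`, `deliver` and the subscriber names are hypothetical. Here the payload prefix is just the ICMP type byte, so echo requests and destination-unreachable errors can go to different subscribers without the layer 3 code knowing anything about icmp.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Level (3) rule: destination address, protocol, prefix of the payload."""
    dst: ipaddress.IPv4Address
    protocol: int
    payload_prefix: bytes = b""

    def matches(self, packet: bytes) -> bool:
        ihl = (packet[0] & 0x0F) * 4   # IPv4 header length in bytes
        return (ipaddress.IPv4Address(packet[16:20]) == self.dst
                and packet[9] == self.protocol
                and packet[ihl:].startswith(self.payload_prefix))


def deliver(subscriptions, packet: bytes):
    """Return the first subscriber whose rule matches, or None (hypothetical policy)."""
    for rule, subscriber in subscriptions:
        if rule.matches(packet):
            return subscriber
    return None
```

A rule with an empty prefix degrades to level (2), and one that also ignores the protocol would be level (1).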

> > That sounds odd. Port number space is a part of layer 4, not layer 3.
> I know. But I thought about the way it would have to be implemented:
> - in the layer 4. If you run (at least) 2 layer 4 translators, I've
> found race conditions that will disturb the service.

I think it's a reasonable restriction that you can't have two
transport programs/translators do the same protocol (udp or tcp) on
the same ip address. That should solve the problem, I think.

And if you later want to get rid of that restriction, you need to
figure out some service and/or protocol that lets two processes
share port number space. But that should still be the responsibility of
the (multiple) transport servers, and it can be implemented without
any modifications to any components in the stack above or below the
transport server.

> - in the layer 3 translator. This doesn't respect ISO layers, BUT :
>  * L3 translators will only know that L-4 translators want to get a
> number; the L3 translator will not know it is a network port. Only a number !!!

It also has to manage separate namespaces, since port numbers are local
to an ip-address and protocol.

>  * no race condition is possible (it is useful, AFAIK)
>  * less rpc calls.

I don't see what you gain by putting it into layer 3 rather than in a
general "number allocation" server. I really think the allocation and
management belongs in the transport server.
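A minimal sketch of that bookkeeping inside one transport server, in Python. The class name, the protocol labels and the ephemeral range 49152 to 65535 (borrowed from the IANA dynamic port range) are assumptions for illustration. It keeps one namespace per (address, protocol) pair, since port numbers are local to an ip-address and protocol, and it is single-threaded; a multithreaded server would put one lock around these methods.

```python
import ipaddress
from collections import defaultdict


class PortAllocator:
    """Hypothetical port bookkeeping owned by a single transport server."""

    def __init__(self, first: int = 49152, last: int = 65535) -> None:
        self.first, self.last = first, last
        self.in_use = defaultdict(set)  # (address, protocol) -> ports in use

    def _key(self, address: str, protocol: str):
        return (ipaddress.ip_address(address), protocol)

    def bind(self, address: str, protocol: str, port: int) -> None:
        """Claim a specific port, as an explicit bind() would."""
        ports = self.in_use[self._key(address, protocol)]
        if port in ports:
            raise OSError(f"port {port} already in use")
        ports.add(port)

    def allocate(self, address: str, protocol: str) -> int:
        """Hand out the lowest free port in the ephemeral range."""
        ports = self.in_use[self._key(address, protocol)]
        for port in range(self.first, self.last + 1):
            if port not in ports:
                ports.add(port)
                return port
        raise OSError("no free ports")

    def release(self, address: str, protocol: str, port: int) -> None:
        self.in_use[self._key(address, protocol)].discard(port)
```

Because only one server owns each namespace, no other component in the stack needs to see these numbers.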

> This is a way of avoiding too much mach port allocation/release. For
> tcp, we need a mach port (and a network port) for each session.

The program that communicates with the transport service should have
one port per open socket, just like for any other open files. In the
transport server's end, that port should be associated with a sending
address and port (and that applies to *both* udp and tcp), and for tcp
also with the address and port at the remote end.

For communication between the transport server and the ip (layer 3)
server, you should not need more than one or two ports. The way I see
it, the transport server should give the ip server complete ip
datagrams, with source and destination addresses and port numbers
already filled in. There may be other calls to figure out available
addresses, source and destination address selection, etc, but nothing
that has to be done for every packet.
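As an illustration of handing over "complete ip datagrams, with source and destination addresses and port numbers already filled in", here is a minimal Python sketch of what a UDP transport server might pass to the ip server. The function names are hypothetical, IP options and fragmentation are ignored, and the checksums follow RFC 1071 (internet checksum) and RFC 768 (UDP pseudo-header). Which header fields, if any, the ip server would still touch (TTL, identification) is left open here.

```python
import ipaddress
import struct


def checksum(data: bytes) -> int:
    """RFC 1071 internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_udp_datagram(src: str, dst: str, sport: int, dport: int,
                       payload: bytes, ttl: int = 64) -> bytes:
    """Build an IPv4 header plus UDP header plus payload, checksums included."""
    s = ipaddress.IPv4Address(src).packed
    d = ipaddress.IPv4Address(dst).packed
    udp_len = 8 + len(payload)
    pseudo = s + d + struct.pack("!BBH", 0, 17, udp_len)  # UDP pseudo-header
    udp = struct.pack("!HHHH", sport, dport, udp_len, 0) + payload
    csum = checksum(pseudo + udp) or 0xFFFF  # a zero checksum means "none" in UDP
    udp = udp[:6] + struct.pack("!H", csum) + udp[8:]
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + udp_len, 0, 0,
                         ttl, 17, 0, s, d)
    header = header[:10] + struct.pack("!H", checksum(header)) + header[12:]
    return header + udp
```

Everything the ip server needs for routing is then in the first 20 bytes; it never has to interpret the port numbers.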

> This is beautiful, here is what I think :
> 
>   +---------------------+
>   | random posix socket |
>   |    applications     |
>   +---------------------+
> --------------------------- The standard socket API
>   +--------------+
>   | glibc glue   | (with some help of L-4 and L3 translators. as you said
>   +--------------+  socket.defs will be split here, in -lsocket)
> --------------------------- Layer 4+ interface
>   +----------------------+ 
>   | transport protocols  |
>   +----------------------+
> --------------------------- Layer 3/4 interface
>   +----------------------------------+
>   | layer 3 data transmission +      |
>   | control + <<numbers>> allocation |
>   +----------------------------------+
> --------------------------- Layer 2/3 interface
>   +--------------------+
>   | layer 2 stuff      |
>   +--------------------+
> --------------------------- Kernel/user land
>   +--------------+
>   | network card |
>   +--------------+
> --------------------------- Physical interface

> Please think about it. But take your time to answer, I won't be there in
> the next days. :)

I think my main remaining objections to this model concern the "layer
3/4 interface". I want this interface to do nothing but read and write
ip packets (plus some calls for configuration of addresses, etc). I
want to move the responsibility for "control + <<numbers>> allocation"
up one level. The transport server should be the only component that
knows details about port numbers.
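The narrow layer 3/4 interface argued for here could be as small as the following Python sketch. The method names and the loopback class are hypothetical, and in the Hurd these would be RPCs over Mach ports rather than method calls. Note that nothing in it mentions port numbers or icmp.

```python
import ipaddress
from collections import deque
from typing import List, Optional, Protocol


class Layer3(Protocol):
    """Hypothetical layer 3/4 interface: packets in, packets out, plus config."""

    def write_packet(self, packet: bytes) -> None: ...
    def read_packet(self) -> Optional[bytes]: ...
    def addresses(self) -> List[ipaddress.IPv4Address]: ...


class LoopbackLayer3:
    """Toy implementation: packets written to a local address are queued for reading."""

    def __init__(self, *addrs: str) -> None:
        self._addrs = [ipaddress.ip_address(a) for a in addrs]
        self._queue = deque()

    def write_packet(self, packet: bytes) -> None:
        dst = ipaddress.IPv4Address(packet[16:20])  # IPv4 destination address
        if dst in self._addrs:
            self._queue.append(packet)
        # A real server would route non-local packets to a layer 2 translator.

    def read_packet(self) -> Optional[bytes]:
        return self._queue.popleft() if self._queue else None

    def addresses(self) -> List[ipaddress.IPv4Address]:
        return list(self._addrs)
```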

You also put some routing into layer 2; I'd prefer to move that up one
level, either into the "layer 3" block, or into a separate forwarding
process that talks to the "layer 2" block independently of the layer 3
block, which then wouldn't handle forwarded traffic.

Regards,
/Niels
