Re: socket.c added support for unix domain socket datagram transport
From: Ralph Schmieder
Subject: Re: socket.c added support for unix domain socket datagram transport
Date: Mon, 26 Apr 2021 13:14:48 +0200

> On Apr 23, 2021, at 18:39, Stefano Brivio <sbrivio@redhat.com> wrote:
>
> On Fri, 23 Apr 2021 17:48:08 +0200
> Ralph Schmieder <ralph.schmieder@gmail.com> wrote:
>
>> Hi, Stefano... Thanks for the detailed response... inline
>> Thanks,
>> -ralph
>>
>>
>>> On Apr 23, 2021, at 17:29, Stefano Brivio <sbrivio@redhat.com>
>>> wrote:
>>>
>>> Hi Ralph,
>>>
>>> On Fri, 23 Apr 2021 08:56:48 +0200
>>> Ralph Schmieder <ralph.schmieder@gmail.com> wrote:
>>>
>>>> Hey... new to this list. I was looking for a way to use Unix
>>>> domain sockets as a network transport between local VMs.
>>>>
>>>> I'm part of a team where we run dozens if not hundreds of VMs on a
>>>> single compute instance which are highly interconnected.
>>>>
>>>> In the current implementation, I use UDP sockets (e.g. something
>>>> like
>>>>
>>>> -netdev
>>>> id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678)
>>>>
>>>> which works great.
>>>>
>>>> The downside of this approach is that I need to keep track of all
>>>> the UDP ports in use and that there's a potential for clashes.
>>>> Clearly, having Unix domain sockets where I could store the
>>>> sockets in the VM's directory would be much easier to manage.
>>>>
>>>> However, even though there is some AF_UNIX support in net/socket.c,
>>>> it's
>>>>
>>>> - not configurable
>>>> - it doesn't work
>>>
>>> I hate to say this, but I've been working on something very similar
>>> just in the past days, with the notable difference that I'm using
>>> stream-oriented AF_UNIX sockets instead of datagram-oriented.
>>>
>>> I have a similar use case, and after some experiments I picked a
>>> stream-oriented socket over datagram-oriented because:
>>>
>>> - overhead appears to be the same
>>>
>>> - message boundaries make little sense here -- you already have a
>>> 32-bit vnet header with the message size defining the message
>>> boundaries
>>>
>>> - datagram-oriented AF_UNIX sockets are actually reliable and
>>> there's no packet reordering on Linux, but this is not "formally"
>>> guaranteed
>>>
>>> - it's helpful for me to know when a qemu instance disconnects for
>>> any reason
>>>
>>
>> IMO, dgram is the right tool for this as it is symmetrical to using a
>> UDP transport... Since I need to pick up those packets from outside
>> of Qemu (inside of a custom networking fabric) I'd have to make
>> assumptions about the data and don't know the length of the packet in
>> advance.
>
> Okay, so it doesn't seem to fit your case, but this specific point is
> where you actually have a small advantage using a stream-oriented
> socket. If you receive a packet and have a smaller receive buffer, you
> can read the length of the packet from the vnet header and then read
> the rest of the packet at a later time.
>
> With a datagram-oriented socket, you would need to know the maximum
> packet size in advance, and use a receive buffer that's large enough to
> contain it, because if you don't, you'll discard data.

For me, the maximum packet size is a jumbo frame (e.g. 9 * 1024 bytes) anyway,
so everything must fit into a single atomic write of that size.

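
As a rough illustration of the buffer-size point for datagram sockets, here is a minimal sketch (not code from either patch). It assumes Linux, where passing MSG_TRUNC to recv() makes the call report the real datagram length even when the buffer was too small. The names recv_frame and MAX_FRAME are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical upper bound: a 9 KiB jumbo frame, as mentioned above. */
#define MAX_FRAME (9 * 1024)

/*
 * Receive one datagram into buf (capacity cap).
 * Returns the number of bytes stored, -1 on error, or -2 if the datagram
 * was larger than cap; in that case the excess bytes are already lost.
 * Relies on Linux's MSG_TRUNC input flag, which makes recv() report the
 * real datagram length instead of the truncated one.
 */
static ssize_t recv_frame(int fd, void *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);

    ssize_t n = recv(fd, buf, cap, MSG_TRUNC);
    if (n < 0)
        return -1;
    if ((size_t)n > cap)
        return -2;
    return n;
}
```

With a buffer of MAX_FRAME bytes the -2 case should never trigger as long as no peer sends anything larger than a jumbo frame; with a smaller buffer the tail of the frame is gone for good, which is the data loss described above.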
>
> The same reasoning applies to a receive buffer that's larger than the
> maximum packet size you can get -- you can then read multiple packets at
> a time, filling your buffer, partially reading a packet at the end of
> it, and reading the rest later.
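
For comparison, the stream-oriented reading described in the two paragraphs above could look roughly like the sketch below: read the length prefix first, then exactly that many payload bytes, possibly across several read() calls. It assumes the prefix is a 32-bit length in network byte order; that detail and the helper names (read_exact, read_frame, write_frame) are assumptions for illustration, not taken from either patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int read_exact(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, just retry */
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Read one length-prefixed frame from a stream socket.
 * Assumption: the prefix is a 32-bit length in network byte order.
 * Returns the payload length, or -1 on error, EOF or an oversized frame.
 */
static ssize_t read_frame(int fd, void *buf, size_t cap)
{
    uint32_t be_len;

    assert(buf != NULL);
    if (read_exact(fd, &be_len, sizeof(be_len)) < 0)
        return -1;

    uint32_t len = ntohl(be_len);
    if (len > cap)
        return -1;  /* a real reader would drain len bytes to stay in sync */
    if (read_exact(fd, buf, len) < 0)
        return -1;
    return (ssize_t)len;
}

/* Write one frame with the same prefix; returns 0 on success, -1 on error. */
static int write_frame(int fd, const void *buf, uint32_t len)
{
    uint32_t be_len = htonl(len);

    if (write(fd, &be_len, sizeof(be_len)) != (ssize_t)sizeof(be_len))
        return -1;
    return write(fd, buf, len) == (ssize_t)len ? 0 : -1;
}
```

Because the length is known before the payload is read, the receiver is free to size or grow its buffer per frame, which a datagram receiver cannot do.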
>
> With a datagram-oriented socket you need to resort to recvmmsg() to
> receive multiple packets with one syscall (nothing against it, it's
> just slightly more tedious).
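
A minimal sketch of the recvmmsg() variant mentioned above, assuming Linux with glibc (recvmmsg() is not portable). BATCH, MAX_FRAME and recv_batch are hypothetical names and sizes chosen for the example.

```c
#define _GNU_SOURCE             /* recvmmsg() is a Linux/glibc extension */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH     8             /* hypothetical batch size */
#define MAX_FRAME (9 * 1024)    /* hypothetical per-datagram buffer size */

/*
 * Receive up to BATCH datagrams with a single syscall, without blocking.
 * bufs[i] receives datagram i and lens[i] its length.
 * Returns the number of datagrams received, or -1 on error
 * (including EAGAIN when nothing is queued).
 */
static int recv_batch(int fd, char bufs[BATCH][MAX_FRAME],
                      unsigned int lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];

    assert(bufs != NULL && lens != NULL);
    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = MAX_FRAME;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}
```

Each datagram still lands in its own fixed-size buffer, so message boundaries are preserved; the extra bookkeeping with the mmsghdr and iovec arrays is the "slightly more tedious" part.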
>
>> Using the datagram approach fits nicely into this concept.
>> So, yes, in my instance the transport IS truly connectionless and VMs
>> just keep sending packets if the fabric isn't there or doesn't pick
>> up their packets.
>
> I see, in that case I guess you really need a datagram-oriented
> socket... even though what happens with my patch (just like with the
> existing TCP support) is that your fabric would need to be there when
> qemu starts, but if it disappears later, qemu will simply close the
> socket. Indeed, it's not "hotplug", which is probably what you need.

That's exactly the point: this is peer-to-peer (point-to-point), not client/server.

>
>> And maybe there's use for both, as there's currently already support
>> for connection oriented (TCP) and connectionless (UDP) inet
>> transports.
>
> Yes, I think so.
>
>>>> As a side note, I tried to pass in an already open FD, but that
>>>> didn't work either.
>>>
>>> This actually worked for me as a quick work-around, either with:
>>> https://github.com/StevenVanAcker/udstools
>>>
>>> or with a trivial C implementation of that, that does essentially:
>>>
>>>   errno = 0;      /* strtol() sets errno only on failure, so clear it first */
>>>   fd = strtol(argv[1], NULL, 0);
>>>   if (fd < 3 || fd > INT_MAX || errno)
>>>           usage(argv[0]);
>>>
>>>   /* addr: a struct sockaddr_un for the socket path (setup not shown) */
>>>   s = socket(AF_UNIX, SOCK_STREAM, 0);
>>>   if (s < 0) {
>>>           perror("socket");
>>>           exit(EXIT_FAILURE);
>>>   }
>>>
>>>   if (connect(s, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
>>>           perror("connect");
>>>           exit(EXIT_FAILURE);
>>>   }
>>>
>>>   if (dup2(s, (int)fd) < 0) {
>>>           perror("dup2");
>>>           exit(EXIT_FAILURE);
>>>   }
>>>
>>>   close(s);
>>>
>>>   execvp(argv[2], argv + 2);
>>>   perror("execvp");
>>>
>>> where argv[1] is the socket number you pass in the qemu command line
>>> (-net socket,fd=X) and argv[2] is the path to qemu.
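
The excerpt above leaves out the address setup, so here is a self-contained sketch of the same idea split into two helpers. The names parse_fd and connect_as_fd are hypothetical, and the socket type is a parameter because the same approach is said to work with SOCK_DGRAM as well.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Parse a descriptor number; returns it, or -1 if it is not a usable fd. */
static int parse_fd(const char *s)
{
    char *end;

    errno = 0;
    long fd = strtol(s, &end, 0);
    if (errno || end == s || *end != '\0' || fd < 3 || fd > INT_MAX)
        return -1;
    return (int)fd;
}

/*
 * Connect an AF_UNIX socket of the given type (SOCK_STREAM or SOCK_DGRAM)
 * to path and install it as descriptor number fd.
 * Returns 0 on success, -1 on error with errno set.
 */
static int connect_as_fd(const char *path, int type, int fd)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    strcpy(addr.sun_path, path);

    int s = socket(AF_UNIX, type, 0);
    if (s < 0)
        return -1;
    if (connect(s, (const struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        dup2(s, fd) < 0) {
        int saved = errno;
        close(s);
        errno = saved;
        return -1;
    }
    if (s != fd)
        close(s);   /* keep only the duplicate at the requested number */
    return 0;
}
```

A wrapper's main() could call parse_fd() on the descriptor argument, call connect_as_fd() with a socket path obtained however the wrapper prefers (the excerpt does not show this), and then execvp() the QEMU command line. The descriptor created by dup2() does not carry the close-on-exec flag, so it stays open across the exec.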
>>
>> As I was looking for dgram support I didn't even try with a stream
>> socket ;)
>
> Mind that it also works with a SOCK_DGRAM ;) ...that was my original
> attempt, actually.
>
>>>> So, I added some code which does work for me... e.g.
>>>>
>>>> - can specify the socket paths like -netdev
>>>> id=bla,type=socket,unix=/tmp/in:/tmp/out
>>>> - it does forward packets between two Qemu instances running
>>>> back-to-back
>>>>
>>>> I'm wondering if this is of interest for the wider community and,
>>>> if so, how to proceed.
>>>>
>>>> Thanks,
>>>> -ralph
>>>>
>>>> Commit
>>>> https://github.com/rschmied/qemu/commit/73f02114e718ec898c7cd8e855d0d5d5d7abe362
>>>>
>>>
>>> I think your patch could be a bit simpler, as you could mostly reuse
>>> net_socket_udp_init() for your initialisation, and perhaps rename
>>> it to net_socket_dgram_init().
>>
>> Thanks... I agree that my code can likely be shortened... it was just
>> a PoC that I cobbled together yesterday and it still has a lot of
>> to-be-removed lines.
>
> I'm not sure if it helps, but I guess you could "conceptually" recycle
> my patch and in some sense "extend" the UDP parts to a generic datagram
> interface, just like mine extends the TCP implementation to a generic
> stream interface.
>
> About command line and documentation, I guess it's clear that
> "connect=" implies something stream-oriented, so I would prefer to
> leave it like that for a stream-oriented AF_UNIX socket -- it behaves
> just like TCP.
>
> On the other hand, you can't recycle the current UDP "mcast=" stuff
> because it's not multicast (AF_UNIX multicast support for Linux was
> proposed some years ago, https://lwn.net/Articles/482523/, but not
> merged), and of course not "udp="... would "unix_dgram=" make sense
> to you?
>
> On a side note, I wonder why you need two named sockets instead of
> one -- I mean, they're bidirectional...

Hmm... each peer needs to send unsolicited frames/packets to the other end
and therefore needs to bind to its own socket. That's pretty much the same
reason the UDP transport requires you to specify both a local and a remote
address/port pair. For AF_INET the local port doesn't strictly have to be
specified, but only because the OS then assigns an ephemeral port to make the
endpoint unique. Am I missing something?
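
To make the two-path argument concrete, here is a minimal sketch of one side of such a peer-to-peer datagram link: each peer binds its own path and addresses the other peer's path on every send. The helper names (unix_addr, dgram_bind, dgram_send) are hypothetical, and treating ENOENT/ECONNREFUSED as "peer absent" reflects Linux behaviour as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill addr with path; returns 0, or -1 if path does not fit. */
static int unix_addr(struct sockaddr_un *addr, const char *path)
{
    assert(addr != NULL && path != NULL);
    if (strlen(path) >= sizeof(addr->sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

/* Create a datagram socket bound to local_path; returns the fd or -1. */
static int dgram_bind(const char *local_path)
{
    struct sockaddr_un addr;

    if (unix_addr(&addr, local_path) < 0)
        return -1;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    unlink(local_path);     /* remove a stale socket file from an earlier run */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Send one frame to the peer bound at remote_path.
 * Returns 0 if sent, 1 if the peer is absent (ENOENT or ECONNREFUSED),
 * -1 on any other error. No connection state is kept between calls.
 */
static int dgram_send(int fd, const char *remote_path,
                      const void *buf, size_t len)
{
    struct sockaddr_un addr;

    if (unix_addr(&addr, remote_path) < 0)
        return -1;
    if (sendto(fd, buf, len, 0, (struct sockaddr *)&addr, sizeof(addr)) >= 0)
        return 0;
    return (errno == ENOENT || errno == ECONNREFUSED) ? 1 : -1;
}
```

Since the sender is bound, the receiver sees the sender's path in recvfrom() and can reply to it. A sender can also simply keep calling dgram_send() while the other end is missing and pick up again once it appears.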

Another thing: on Windows, there's an AF_UNIX/SOCK_STREAM implementation, so
technically it should be possible to use that code path on Windows, too. I'm
not a Windows guy, though, so I can't say whether it would simply work or not:
https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/

>
> --
> Stefano
>