qemu-devel

Re: socket.c added support for unix domain socket datagram transport


From: Stefano Brivio
Subject: Re: socket.c added support for unix domain socket datagram transport
Date: Fri, 23 Apr 2021 18:39:01 +0200

On Fri, 23 Apr 2021 17:48:08 +0200
Ralph Schmieder <ralph.schmieder@gmail.com> wrote:

> Hi, Stefano... Thanks for the detailed response... inline
> Thanks,
> -ralph
> 
> 
> > On Apr 23, 2021, at 17:29, Stefano Brivio <sbrivio@redhat.com>
> > wrote:
> > 
> > Hi Ralph,
> > 
> > On Fri, 23 Apr 2021 08:56:48 +0200
> > Ralph Schmieder <ralph.schmieder@gmail.com> wrote:
> >   
> >> Hey...  new to this list.  I was looking for a way to use Unix
> >> domain sockets as a network transport between local VMs.
> >> 
> >> I'm part of a team where we run dozens if not hundreds of VMs on a
> >> single compute instance which are highly interconnected.
> >> 
> >> In the current implementation, I use UDP sockets (e.g. something
> >> like 
> >> 
> >> -netdev
> >> id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 
> >> 
> >> which works great.
> >> 
> >> The downside of this approach is that I need to keep track of all
> >> the UDP ports in use and that there's a potential for clashes.
> >> Clearly, having Unix domain sockets where I could store the
> >> sockets in the VM's directory would be much easier to manage.
> >> 
> >> However, even though there is some AF_UNIX support in net/socket.c,
> >> it's
> >> 
> >> - not configurable
> >> - it doesn't work  
> > 
> > I hate to say this, but I've been working on something very similar
> > just in the past days, with the notable difference that I'm using
> > stream-oriented AF_UNIX sockets instead of datagram-oriented.
> > 
> > I have a similar use case, and after some experiments I picked a
> > stream-oriented socket over datagram-oriented because:
> > 
> > - overhead appears to be the same
> > 
> > - message boundaries make little sense here -- you already have a
> >  32-bit vnet header with the message size defining the message
> >  boundaries
> > 
> > - datagram-oriented AF_UNIX sockets are actually reliable and
> > there's no packet reordering on Linux, but this is not "formally"
> > guaranteed
> > 
> > - it's helpful for me to know when a qemu instance disconnects for
> > any reason
> >   
> 
> IMO, dgram is the right tool for this as it is symmetrical to using a
> UDP transport... Since I need to pick up those packets from outside
> of Qemu (inside of a custom networking fabric) I'd have to make
> assumptions about the data and don't know the length of the packet in
> advance.

Okay, so it doesn't seem to fit your case, but this specific point is
where a stream-oriented socket actually has a small advantage. If a
packet is larger than your receive buffer, you can read the packet
length from the vnet header first, and then read the rest of the packet
at a later time.
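A minimal sketch of that, assuming the header is a 32-bit length in
network byte order directly followed by the packet (read_full() and
read_len() are made-up helper names, not qemu code):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read exactly n bytes from a stream socket, retrying short reads.
 * Returns 0 on success, -1 on error or if the peer closed early. */
int read_full(int fd, void *buf, size_t n)
{
	uint8_t *p = buf;

	while (n) {
		ssize_t r = read(fd, p, n);

		if (r < 0 && errno == EINTR)
			continue;
		if (r <= 0)
			return -1;
		p += r;
		n -= (size_t)r;
	}
	return 0;
}

/* Read the (assumed) 32-bit big-endian length header of the next packet */
int read_len(int fd, uint32_t *len)
{
	uint32_t be;

	if (read_full(fd, &be, sizeof(be)))
		return -1;
	*len = ntohl(be);
	return 0;
}
```

After read_len(), the payload can be consumed with read_full() in pieces
as small as your buffer allows, e.g. reads of 4, 4 and 2 bytes for a
10-byte packet: the kernel keeps the remaining bytes queued in between.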

With a datagram-oriented socket, you would need to know the maximum
packet size in advance, and use a receive buffer that's large enough to
contain it, because whatever doesn't fit in the buffer is discarded.
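A small sketch of what happens with a short buffer, using the MSG_TRUNC
flag that recvmsg() reports in msg_flags (recv_dgram() is a made-up
helper name):

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receive one datagram into buf. Returns the number of bytes stored, and
 * sets *truncated if the datagram was longer than the buffer: the excess
 * bytes are gone, and the next recv() returns the next datagram. */
ssize_t recv_dgram(int fd, void *buf, size_t size, int *truncated)
{
	struct iovec iov = { .iov_base = buf, .iov_len = size };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	ssize_t n = recvmsg(fd, &msg, 0);

	*truncated = n >= 0 && (msg.msg_flags & MSG_TRUNC);
	return n;
}
```

With a 4-byte buffer, a 10-byte datagram yields 4 bytes plus the
truncation flag, and the remaining 6 bytes can't be read anymore.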

The same reasoning applies to a receive buffer that's larger than the
maximum packet size you can get -- you can then read multiple packets at
a time, filling your buffer, partially reading a packet at the end of
it, and reading the rest later.
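Sketched under the same assumption of a 32-bit big-endian length before
each packet, walking such a buffer could look like this
(count_frames() is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walk a receive buffer holding back-to-back packets, each prefixed by an
 * assumed 32-bit big-endian length. Returns the number of complete packets
 * and stores in *consumed how many bytes they occupy; anything after that
 * is a partial header or packet, to be completed by the next read. */
size_t count_frames(const uint8_t *buf, size_t len, size_t *consumed)
{
	size_t off = 0, n = 0;

	while (len - off >= 4) {
		uint32_t plen = (uint32_t)buf[off] << 24 |
				(uint32_t)buf[off + 1] << 16 |
				(uint32_t)buf[off + 2] << 8 | buf[off + 3];

		if (len - off - 4 < plen)
			break;		/* packet continues past the end of buf */
		off += 4 + plen;	/* a real reader would hand it on here */
		n++;
	}
	*consumed = off;
	return n;
}
```

After the loop, memmove(buf, buf + consumed, len - consumed) moves the
partial packet to the front of the buffer, so that the next read() can
append to it.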

With a datagram-oriented socket you need to resort to recvmmsg() to
receive multiple packets with one syscall (nothing against it, it's
just slightly more tedious).
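For comparison, a sketch of the recvmmsg() way (Linux-specific; the
batch size, the 64 KiB per-packet bound and recv_batch() itself are
assumptions for illustration):

```c
#define _GNU_SOURCE		/* recvmmsg() and struct mmsghdr are Linux-specific */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

#define BATCH	8
#define PKT_MAX	65536		/* assumption: upper bound on packet size */

/* Receive up to BATCH datagrams with one syscall. MSG_WAITFORONE blocks for
 * the first datagram only, then returns whatever else is already queued.
 * Returns the number of datagrams received; lens[i] holds their sizes. */
int recv_batch(int fd, char (*bufs)[PKT_MAX], size_t lens[BATCH])
{
	struct mmsghdr msgs[BATCH];
	struct iovec iovs[BATCH];
	int i, n;

	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < BATCH; i++) {
		iovs[i].iov_base = bufs[i];
		iovs[i].iov_len = PKT_MAX;
		msgs[i].msg_hdr.msg_iov = &iovs[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}

	n = recvmmsg(fd, msgs, BATCH, MSG_WAITFORONE, NULL);
	for (i = 0; i < n; i++)
		lens[i] = msgs[i].msg_len;
	return n;
}
```

Note that every slot still needs a buffer sized for the largest possible
packet, which is the "know the maximum size in advance" problem again.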

> Using the datagram approach fits nicely into this concept.
> So, yes, in my instance the transport IS truly connectionless and VMs
> just keep sending packets if the fabric isn't there or doesn't pick
> up their packets.

I see, in that case I guess you really need a datagram-oriented
socket: with my patch (just like with the existing TCP support), your
fabric would need to be there when qemu starts, and if it disappears
later, qemu will simply close the socket. That is, there's no
"hotplug", which is probably what you need.
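To illustrate the difference in failure modes with plain socket calls
(stream_peer_gone() and dgram_send_to() are made-up helper names):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Stream socket: recv() returning 0 means the peer closed its end. MSG_PEEK
 * leaves any pending data queued, MSG_DONTWAIT avoids blocking. */
int stream_peer_gone(int fd)
{
	char c;

	return recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT) == 0;
}

/* Datagram socket: sending to a path nobody is bound to just fails (ENOENT
 * if the path doesn't exist, ECONNREFUSED if it's a stale socket file), so
 * the sender can drop the packet and keep going. Returns 0 or -errno. */
int dgram_send_to(int fd, const char *path, const void *buf, size_t len)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };

	strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
	if (sendto(fd, buf, len, 0, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return -errno;
	return 0;
}
```

The stream side learns about the disconnection and has to reconnect,
while the datagram side never notices anything beyond failed sends.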

> And maybe there's use for both, as there's currently already support
> for connection oriented (TCP) and connectionless (UDP) inet
> transports. 

Yes, I think so.

> >> As a side note, I tried to pass in an already open FD, but that
> >> didn't work either.  
> > 
> > This actually worked for me as a quick work-around, either with:
> >     https://github.com/StevenVanAcker/udstools
> > 
> > or with a trivial C implementation of the same idea, which does
> > essentially:
> > 
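> >     #include <errno.h>
> >     #include <limits.h>
> >     #include <stdio.h>
> >     #include <stdlib.h>
> >     #include <sys/socket.h>
> >     #include <sys/un.h>
> >     #include <unistd.h>
> > 
> >     /* addr is a struct sockaddr_un holding the socket path */
> >     if (argc < 3)
> >             usage(argv[0]);
> > 
> >     errno = 0;      /* strtol() sets errno only on failure */
> >     fd = strtol(argv[1], NULL, 0);
> >     if (fd < 3 || fd > INT_MAX || errno)
> >             usage(argv[0]);
> > 
> >     s = socket(AF_UNIX, SOCK_STREAM, 0);
> >     if (s < 0) {
> >             perror("socket");
> >             exit(EXIT_FAILURE);
> >     }
> > 
> >     if (connect(s, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
> >             perror("connect");
> >             exit(EXIT_FAILURE);
> >     }
> > 
> >     /* Move the socket to the descriptor number qemu was told about */
> >     if (dup2(s, (int)fd) < 0) {
> >             perror("dup2");
> >             exit(EXIT_FAILURE);
> >     }
> > 
> >     /* If socket() already returned fd, dup2() was a no-op: keep it */
> >     if (s != (int)fd)
> >             close(s);
> > 
> >     execvp(argv[2], argv + 2);
> >     perror("execvp");
> >     exit(EXIT_FAILURE);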
> > 
> > where argv[1] is the file descriptor number X you pass on the qemu
> > command line (-net socket,fd=X), argv[2] is the path to the qemu
> > binary, and any further arguments are passed on to qemu.
> 
> As I was looking for dgram support I didn't even try with a stream
> socket ;)

Note that it also works with a SOCK_DGRAM socket ;) ...that was my
original attempt, actually.
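The SOCK_DGRAM variant of that wrapper mostly differs in that the
socket should also be bound to its own path, so that the other side has
an address to send packets back to. A minimal sketch, where
unix_dgram_to_fd() and the paths are hypothetical:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a datagram socket bound to local_path and
 * connected to peer_path (which must already be bound), then move it to
 * target_fd, the X in -net socket,fd=X. Returns 0 on success, -1 on error. */
int unix_dgram_to_fd(const char *local_path, const char *peer_path,
		     int target_fd)
{
	struct sockaddr_un local = { .sun_family = AF_UNIX };
	struct sockaddr_un peer = { .sun_family = AF_UNIX };
	int s = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (s < 0)
		return -1;

	strncpy(local.sun_path, local_path, sizeof(local.sun_path) - 1);
	strncpy(peer.sun_path, peer_path, sizeof(peer.sun_path) - 1);

	/* Binding gives the peer an address to send packets back to */
	if (bind(s, (struct sockaddr *)&local, sizeof(local)) < 0 ||
	    connect(s, (struct sockaddr *)&peer, sizeof(peer)) < 0 ||
	    dup2(s, target_fd) < 0) {
		close(s);
		return -1;
	}

	/* If socket() already returned target_fd, dup2() was a no-op */
	if (s != target_fd)
		close(s);
	return 0;
}
```

In the wrapper, this would take the place of the socket(), connect()
and dup2() calls, with the fabric binding peer_path before qemu starts.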

> >> So, I added some code which does work for me... e.g.
> >> 
> >> - can specify the socket paths like -netdev
> >> id=bla,type=socket,unix=/tmp/in:/tmp/out
> >> - it does forward packets between two Qemu instances running
> >> back-to-back
> >> 
> >> I'm wondering if this is of interest for the wider community and,
> >> if so, how to proceed.
> >> 
> >> Thanks,
> >> -ralph
> >> 
> >> Commit
> >> https://github.com/rschmied/qemu/commit/73f02114e718ec898c7cd8e855d0d5d5d7abe362
> >>  
> > 
> > I think your patch could be a bit simpler, as you could mostly reuse
> > net_socket_udp_init() for your initialisation, and perhaps rename
> > it to net_socket_dgram_init().  
> 
> Thanks... I agree that my code can likely be shortened... it was just
> a PoC that I cobbled together yesterday and it still has a lot of
> to-be-removed lines.

I'm not sure if it helps, but I guess you could "conceptually" recycle
my patch and in some sense "extend" the UDP parts to a generic datagram
interface, just like mine extends the TCP implementation to a generic
stream interface.

As for the command line and documentation, I guess it's clear that
"connect=" implies something stream-oriented, so I would prefer to
keep it for a stream-oriented AF_UNIX socket -- it behaves just like
TCP.

On the other hand, you can't reuse the current UDP "mcast=" option,
because this isn't multicast (AF_UNIX multicast support for Linux was
proposed some years ago, https://lwn.net/Articles/482523/, but not
merged), and of course not "udp="... would "unix_dgram=" make sense
to you?

On a side note, I wonder why you need two named sockets instead of
one -- I mean, they're bidirectional...
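For the datagram case, a sketch of how one named socket could be
enough, assuming Linux: only the fabric binds a named socket, and each
qemu side autobinds to an abstract address (bind() with just
sun_family, a Linux-specific feature), so that the fabric still has an
address to reply to. dgram_connect_autobind() is a made-up name:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical qemu-side helper: autobind a datagram socket to an abstract
 * address (Linux only) and connect it to the single named fabric socket.
 * Returns the socket, or -1 on error. */
int dgram_connect_autobind(const char *fabric_path)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int s = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (s < 0)
		return -1;

	/* addrlen == sizeof(sa_family_t): the kernel picks an abstract name */
	if (bind(s, (struct sockaddr *)&addr, sizeof(sa_family_t)) < 0)
		goto fail;

	strncpy(addr.sun_path, fabric_path, sizeof(addr.sun_path) - 1);
	if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;

	return s;
fail:
	close(s);
	return -1;
}
```

The fabric then learns each peer's address from recvfrom() and replies
with sendto(), so both directions use the same pair of sockets.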

-- 
Stefano
