From: Jonathan Larmour
Subject: [lwip-devel] [task #6735] Provide new pbuf type: PBUF_RAM_NOCOPY
Date: Thu, 09 Aug 2007 16:16:27 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070530 Fedora/1.5.0.12-1.fc5 Firefox/1.5.0.12

Follow-up Comment #18, task #6735 (project lwip):

> So if implementing PBUF_RAM_NOCOPY (and using it in TCP and all
> other stack-internal uses of PBUF_RAM), this would still leave
> us with the fact that the present zero-copy sending for UDP
> sockets would not work any more! At least for most hardware
> (as DMA MACs are rather widely used). 

But this isn't a surprise, is it? If you can't guarantee a packet has truly
been sent when handed off to the driver, then you can't allow the application
to potentially change the buffer. Maybe you could argue that the UDP socket
send call should not return until the packet has definitely left the hardware,
but I think that would create bad inefficiencies of its own.

> I'd like to know how other stacks (that are advertising 
> zero-copy) solve this... Although this seems a problem only for
> the sockets layer, a netconn app can use the new pbuf type.

Indeed, you may have noticed me commenting before that a socket API won't be
as efficient as the netconn API ;-). Netconn can pass incoming packet data
directly as received from the hardware, and, with PBUF_RAM_NOCOPY, send
directly too. Any alignment constraints for the data are currently left as the
application's problem; we should probably address alignment requirements in
due course, but there's no immediate need.

But data passed through the socket API is harder. Here are some links to what
the BSD people have tried to do:
http://freebsd-man.page2go2.com/man9/zero_copy_9.html
http://www.cs.duke.edu/ari/trapeze/freenix/node6.html
http://people.freebsd.org/~ken/zero_copy/

Even there they have had to jump through quite exceptional hoops (only one
hardware device is supported now, and even that requires specially patched
firmware), and much of that comes from the inherent properties of the socket
API they're trying to fit it into. Additionally, relying on the presence of
copy-on-write capable VM is something they may be able to do, but lwIP
certainly couldn't (and shouldn't).

Maybe there's something more specific that can be done, like having a
setsockopt that indicates that subsequent UDP sends should use
PBUF_RAM_NOCOPY, but it's all hairy and obviously non-standard, which raises
the question of why use the socket API in that case anyway.

> One thing: I would always let the allocator deallocate the pbuf
> (e.g. alloc; send; free) and let the lower layers (network 
> driver?) ref the pbuf and free it later, instead of the way
> described here earlier. Like Jared, I have the feeling this
> could otherwise lead to many 'bugs' reported on lwip-users and
> I also think the code should be the same for all pbuf types.

It is a valid alternative solution, and I agreed with Jared too in comment
#13. As per comment #15, it seems Kieran would prefer to go with the
PBUF_RAM_NOCOPY approach. Certainly we have to do something, because the way
the stack works now is not good: you choose between unsafe behaviour (unless
you're using a polled driver) and extra, unnecessary copies.

The API contract for PBUF_RAM_NOCOPY is explicitly that the application
doesn't free it. Where it gets freed inside the stack or the driver is up to
us, but the current interface between the stack and driver implies it should
be freed in the driver. I don't think this will be too hard for users to
understand, and only affects things if they choose to use PBUF_RAM_NOCOPY in
the first place.

Jifl


    _______________________________________________________

Reply to this item at:

  <http://savannah.nongnu.org/task/?6735>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.nongnu.org/




