bug-hurd

[GSoC 2017] Work done so far


From: Joan Lledó
Subject: [GSoC 2017] Work done so far
Date: Tue, 6 Jun 2017 17:03:17 +0200

It's useful to review the work done over the last few months and list
some of the problems that arose during this period.

Writing the socket and I/O operations has been quite
straightforward. Most of the actions performed by pfinet's operations
are already implemented by LwIP, including managing the state of the
sockets and concurrency, so many operations in the LwIP translator
only check the RPC credentials, call the proper function in LwIP's
sockets API and return errno. The connect[1] operation is a good
example. As you may see, some operations, like recv and connect
itself, have needed additional changes to meet the requirements of
Glibc, but in general, the problems have come later.
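
The "check credentials, call the sockets API, return errno" pattern
can be sketched roughly as below. This is not the code at [1]: the
struct, the function-pointer type and the stub functions are
hypothetical stand-ins so the example compiles on its own, and
returning EOPNOTSUPP for an unrecognized caller is an assumption of
the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-client state attached to an RPC port. */
struct sketch_sock_user {
  int fd; /* descriptor in the stack's sockets API */
};

/* Shape of the wrapped stack call: returns -1 and sets errno on failure. */
typedef int (*sketch_connect_fn)(int fd, const void *addr, unsigned addr_len);

/* Translator-side operation: validate the caller, delegate, return errno. */
int sketch_socket_connect(const struct sketch_sock_user *user,
                          const void *addr, unsigned addr_len,
                          sketch_connect_fn do_connect)
{
  assert(do_connect != NULL);

  if (user == NULL)          /* caller did not present a valid socket port */
    return EOPNOTSUPP;       /* assumption: error used for a bad port */

  if (do_connect(user->fd, addr, addr_len) == -1)
    return errno;            /* pass the stack's error straight through */

  return 0;
}

/* Stubs standing in for the stack, only for exercising the sketch. */
int sketch_connect_ok(int fd, const void *addr, unsigned addr_len)
{
  (void)fd; (void)addr; (void)addr_len;
  return 0;
}

int sketch_connect_refused(int fd, const void *addr, unsigned addr_len)
{
  (void)fd; (void)addr; (void)addr_len;
  errno = ECONNREFUSED;
  return -1;
}
```

In the real translator the delegate would be the corresponding call
in LwIP's sockets API rather than a function pointer.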

One of the major issues I had was related to the get_openmodes[2]
I/O operation. The implementation in pfinet[3] returns O_WRITE if our
local socket hasn't sent the FIN message, and O_READ if the peer
hasn't sent it. The operation also returns O_NONBLOCK if that flag is
enabled on the local socket. In LwIP, only the O_NONBLOCK flag was
supported by lwip_fcntl()[4], so I had to make some changes in that
function in order to support the other two flags. My first patch[5]
was rejected, as it was based on some misconceptions and wasn't
polished, but I finally managed to fix it, and it was accepted to be
part of the next LwIP release, 2.0.3.
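
The flag rules described above can be written down as a small
function. This is only a sketch of the rule, not the code in pfinet
or LwIP: the state struct is hypothetical, and the SKETCH_O_* values
are placeholders for the real O_READ, O_WRITE and O_NONBLOCK macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag values for this sketch only. */
enum {
  SKETCH_O_READ     = 0x1,
  SKETCH_O_WRITE    = 0x2,
  SKETCH_O_NONBLOCK = 0x4
};

/* Hypothetical summary of the socket state the operation looks at. */
struct sketch_sock_state {
  bool local_fin_sent;    /* we have already sent our FIN */
  bool peer_fin_received; /* the peer has already sent its FIN */
  bool nonblocking;       /* O_NONBLOCK is enabled on the socket */
};

/* Compute the open modes from the connection state. */
int sketch_get_openmodes(const struct sketch_sock_state *s)
{
  int modes = 0;

  assert(s != NULL);

  if (!s->local_fin_sent)
    modes |= SKETCH_O_WRITE;    /* we can still send data */
  if (!s->peer_fin_received)
    modes |= SKETCH_O_READ;     /* the peer can still send data */
  if (s->nonblocking)
    modes |= SKETCH_O_NONBLOCK;

  return modes;
}
```

For example, a socket whose local side has sent FIN while the peer
has not would report only the read flag.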

Another time, I observed that the stack always failed when trying to
download a file with wget for the first time, but subsequent attempts
worked fine. After some research, I found the problem was related
to ARP. The first time the stack tries to send a
message, the ARP table is empty and an ARP request is sent to get the
MAC address of the destination. In this case, that first message was a
DNS request that was stored in an internal buffer while waiting for
the ARP response to arrive. The problem was that a second DNS query
was generated to get the IPv6 address of the same domain, and this
second query kicked out the first one, as there was only space for one
packet in the queue. It was easy to fix, since only one line was
needed to increase the size of the queue, but finding the problem took
me a few days.
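
The effect of the queue size can be reproduced with a toy model. The
code below is not LwIP's ARP code; the names are hypothetical and the
"drop the oldest packet when full" policy is an assumption chosen to
match the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PENDING 4

/* Toy model of packets parked while a MAC address is being resolved. */
struct sketch_pending_queue {
  size_t capacity;                    /* configured queue length */
  size_t count;                       /* packets currently parked */
  int packet_ids[SKETCH_MAX_PENDING]; /* oldest first */
};

void sketch_queue_init(struct sketch_pending_queue *q, size_t capacity)
{
  assert(q != NULL && capacity >= 1 && capacity <= SKETCH_MAX_PENDING);
  q->capacity = capacity;
  q->count = 0;
}

/* Park a packet. If the queue is full, the oldest packet is dropped.
   Returns the id of the dropped packet, or -1 if nothing was dropped. */
int sketch_queue_push(struct sketch_pending_queue *q, int packet_id)
{
  int dropped = -1;

  assert(q != NULL);

  if (q->count == q->capacity) {
    dropped = q->packet_ids[0];
    for (size_t i = 1; i < q->count; i++)
      q->packet_ids[i - 1] = q->packet_ids[i];
    q->count--;
  }
  q->packet_ids[q->count++] = packet_id;
  return dropped;
}
```

With a capacity of 1, pushing the IPv4 query and then the IPv6 query
drops the first one; with a capacity of 2, both survive until the ARP
reply arrives.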

I had some inglorious moments during these months. One of them was
when I lost a couple of afternoons trying to figure out why the I/O
select operation was receiving a strange value for the timeout
parameter, completely different from the one sent by the user program.
The answer was... Glibc was converting it to Unix time (an absolute
timestamp rather than a relative interval) :-/. The devil is in the
details.
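
If the value received is treated as an absolute deadline, getting
back a relative timeout is a small subtraction. The helper below is a
sketch under that assumption; its name is hypothetical, and the
caller is expected to supply the current time from the same clock the
deadline was computed with.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Milliseconds left until `deadline`, seen from `now`.
   Returns 0 if the deadline has already passed. */
long sketch_remaining_ms(const struct timespec *deadline,
                         const struct timespec *now)
{
  assert(deadline != NULL && now != NULL);

  long long ms = (long long)(deadline->tv_sec - now->tv_sec) * 1000
                 + (deadline->tv_nsec - now->tv_nsec) / 1000000;

  return ms > 0 ? (long)ms : 0;
}
```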

I also had a problem when sending frames to the NIC that I still
haven't understood completely. In LwIP, the frames generated by the
stack are structured in a singly linked list:

struct pbuf {
  struct pbuf *next; // next pbuf in singly linked pbuf chain
  void *payload; // pointer to the actual data in the buffer
  u16_t tot_len; // total length of this buffer and all next buffers in chain
  u16_t len; // length of this buffer
  u8_t type;
  u8_t flags;
  u16_t ref; // reference count
};

This way, it's easy for the stack to assemble the frame by adding
headers in each layer. The user program generates some data that is
stored in a pbuf, then the lower layers generate their own headers and
store them in another pbuf that points to the first one, and so on. At
the end, the Ethernet module receives the pointer to the first pbuf in
the chain and only has to follow the links to get the entire frame in
the correct order. That's why I used to send the data to the NIC as I
was going through the chain. But doing it this way, the frame never
reached the wire. The only way to make the data reach the wire is to
concatenate all the parts in a single buffer containing an entire
frame before sending it to the device. That makes me think that
somewhere between the stack and the device, maybe in the driver, these
frame parts are treated as malformed frames and discarded. I'd like to
have one of those blogs whose author seems to know what s/he is
talking about :P, but the truth is I'm still not sure what's going on
here.
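
Concatenating the chain into one buffer amounts to walking the links
and copying each payload in order. The sketch below uses a reduced,
hypothetical stand-in for struct pbuf so that it compiles on its own;
it is not the translator's code. LwIP itself provides
pbuf_copy_partial() for this kind of copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-in for struct pbuf with only the fields used here. */
struct sketch_pbuf {
  struct sketch_pbuf *next; /* next buffer in the chain */
  void *payload;            /* data held by this buffer */
  unsigned short tot_len;   /* length of this buffer plus all following */
  unsigned short len;       /* length of this buffer */
};

/* Copy every payload in the chain, in order, into `frame`.
   Returns the number of bytes copied, or 0 if the frame does not fit. */
size_t sketch_flatten(const struct sketch_pbuf *head,
                      unsigned char *frame, size_t frame_size)
{
  size_t offset = 0;

  assert(head != NULL && frame != NULL);

  if (head->tot_len > frame_size)
    return 0;

  for (const struct sketch_pbuf *p = head; p != NULL; p = p->next) {
    if (offset + p->len > frame_size) /* guard against inconsistent lengths */
      return 0;
    memcpy(frame + offset, p->payload, p->len);
    offset += p->len;
  }
  return offset;
}
```

The resulting buffer can then be handed to the device in a single
write, which is the approach that made the frames reach the wire.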

And that's all for now. From the list of tasks I included in my
proposal[6], the following are still pending:

- Add support for IPv6.
- Implement other interfaces' operations if needed.
- Implement support for more than one Ethernet interface.
- Add support for command-line parameters.
- Add support for fsysopts.

The prototype is working and is able to connect to the Internet. But
serious testing still turns up many errors, so it's far from stable
and there's a lot left to polish.

------
[1] https://github.com/jlledom/lwip-hurd/blob/748e51859c5c504c45e43758c5e5dd42886959cd/socket-ops.c#L142
[2] https://github.com/jlledom/lwip-hurd/blob/748e51859c5c504c45e43758c5e5dd42886959cd/io-ops.c#L152
[3] https://git.savannah.gnu.org/cgit/hurd/hurd.git/tree/pfinet/io-ops.c?h=v0.9#n202
[4] https://git.savannah.nongnu.org/cgit/lwip.git/tree/src/api/sockets.c?h=STABLE-2_0_2_RELEASE#n2707
[5] https://savannah.nongnu.org/patch/index.php?9283
[6] http://blogs.uoc.edu/jlledom/files/2017/05/PortingLwIPtotheGNUHurd_public.pdf


