Re: [lwip-users] current_iphdr_* + loopif = non-reentrant code

From:

David Empson

Subject:

Date:

Fri, 20 May 2011 10:27:38 +1200

Hi Luca

Your analysis of what went wrong is correct, but it is not a bug in LWIP. Your usage of LWIP is at fault.

For all versions of LWIP, the core is not reentrant.

In this case you have your main loop invoking LWIP and an interrupt attempting to invoke LWIP before it has returned to the main loop, which results in two parts of the LWIP core executing at the same time, with attendant risks of overwriting global variables or other state information. If it worked before, you probably got lucky: UDP has less state information than TCP, so there is less that can go wrong due to reentrancy.

For NO_SYS = 1, the best solution is usually to have your Ethernet interrupt not directly call LWIP, but instead do something like set a flag which is picked up by the main loop to process incoming Ethernet packets.
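A minimal sketch of the flag approach, assuming a single Ethernet interrupt and a polled main loop. All names here (eth_rx_pending, eth_isr, main_loop_step and the two stub functions) are hypothetical and not part of LWIP; the stubs stand in for the driver code that would read frames and pass them to the netif input function, and for the loopback poll.

```c
#include <stdbool.h>

/* Hypothetical names throughout; nothing here is an LWIP API. */

/* Set by the interrupt handler, cleared by the main loop. */
static volatile bool eth_rx_pending = false;

/* Counts how often the stubbed receive path ran (for illustration only). */
static int frames_processed = 0;

/* Stub: in a real port this would read frames from the MAC and hand each
 * one to the netif's input function. It runs in main-loop context only. */
static void process_received_frames(void)
{
    frames_processed++;
}

/* Stub: stands in for polling the loopback netif from the main loop. */
static void poll_loopback(void)
{
}

/* Interrupt handler: touches no LWIP state, only records that work exists. */
void eth_isr(void)
{
    eth_rx_pending = true;
}

/* One pass of the main loop: the only context that ever enters LWIP. */
void main_loop_step(void)
{
    if (eth_rx_pending) {
        /* Clear the flag before draining, so an interrupt that arrives
         * during processing sets it again and is not lost. */
        eth_rx_pending = false;
        process_received_frames();
    }
    poll_loopback();
}
```

With this structure the interrupt can fire at any point without ever entering the stack, so the core only ever runs from one context.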

An alternative is to lock out your Ethernet interrupt (or all interrupts) around all calls to any LWIP function invoked from the main loop (or from any code called by the main loop). Just blocking interrupts for netif_poll() is not sufficient. You would also have to block interrupts during any calls which transmit data, open or close PCBs, etc.
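A rough sketch of that alternative, assuming the platform offers some way to mask and restore the Ethernet interrupt. The mask is modelled here with a plain flag so the example runs anywhere; eth_irq_disable, eth_irq_restore, lwip_call_stub, app_send and app_poll are hypothetical names, and lwip_call_stub stands in for whichever LWIP function the application would really call.

```c
#include <stdbool.h>

/* Hypothetical model of the Ethernet interrupt mask; a real port would
 * use the interrupt controller or a CPU instruction instead of a flag. */
static bool eth_irq_enabled = true;

/* Disable the interrupt and return the previous state so calls can nest. */
static bool eth_irq_disable(void)
{
    bool was_enabled = eth_irq_enabled;
    eth_irq_enabled = false;
    return was_enabled;
}

/* Restore the state saved by eth_irq_disable(). */
static void eth_irq_restore(bool was_enabled)
{
    eth_irq_enabled = was_enabled;
}

/* Illustration-only bookkeeping: records whether the stub ever ran while
 * the interrupt was still enabled. */
static int lwip_calls = 0;
static bool entered_lwip_unprotected = false;

/* Stub standing in for any LWIP call made from the main loop. */
static void lwip_call_stub(void)
{
    lwip_calls++;
    if (eth_irq_enabled) {
        entered_lwip_unprotected = true;
    }
}

/* Every main-loop entry point into LWIP gets the same wrapper, not just
 * the poll: sending, opening and closing PCBs, and so on. */
void app_send(void)
{
    bool was_enabled = eth_irq_disable();
    lwip_call_stub();              /* e.g. a transmit call */
    eth_irq_restore(was_enabled);
}

void app_poll(void)
{
    bool was_enabled = eth_irq_disable();
    lwip_call_stub();              /* e.g. the loopback poll */
    eth_irq_restore(was_enabled);
}
```

Saving and restoring the previous state, rather than unconditionally re-enabling, keeps the wrappers safe if one protected function ends up calling another. The cost is added interrupt latency for every LWIP call.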

----- Original Message -----

From: Luca Ceresoli

To: address@hidden

Sent: Friday, May 20, 2011 8:16 AM

Subject: [lwip-users] current_iphdr_* + loopif = non-reentrant code

Hi,

after upgrading lwIP from 1.3.2 to lwIP 1.4.0-rc1 a problem started showing
up in our products, still present in -rc2. I bet it's there in 1.4.0
too, even without having tested it, as the relevant code has not changed.

I think this problem can be considered a bug, although I may be doing
something wrong, so I'm asking here.

The problem is caused by the introduction of the current_iphdr_dest/_src
global variables, which means that udp_input() and probably other functions
are no longer reentrant.

The context in which the bug showed up is the following.

A NO_SYS==1 lwIP-based device has 1 Ethernet netif plus a loopback netif
(LWIP_NETIF_LOOPBACK=1).

Two different logical entities (A and B) exist and run independently on the
device.
Entity A sends UDP packets to a target entity, which may be entity B; in
that case the packets sent from A to B are queued into the loopif queue.

If the packet flow is sufficiently high, the following happens frequently:
 1. entity A sends a packet to entity B; the packet (pck1) is enqueued into the
    loopif queue;
 2. the main application loop calls netif_poll, which calls ip_input passing
    pck1;
 3. ip_input (line ~314) sets current_iphdr_dest/_src with values from pck1;
 4. ip_input calls udp_input for pck1;
 5. a little before or after this function call, an incoming Ethernet packet
    (pck2) triggers an interrupt;
 6. in interrupt context, the new packet reaches ip_input which overwrites
    current_iphdr_dest/_src with values from pck2 (non-reentrancy!);
 7. when pck2 has been handled, the CPU exits the interrupt context;
 8. execution continues in udp_input, where it was about to handle pck1;
 9. udp_input checks current_iphdr_dest/_src, which have been overwritten
    with values from pck2 while pck1 is being handled; this is clearly
    wrong, and leads to (at least) dropped packets.

Do you think I did something wrong, or is my analysis correct?
Can this be considered a bug? Should I then file a bug report?

I could not closely follow the discussion that led to the introduction of
current_iphdr_dest/_src, and I do not have a plan to solve this issue.

At least I found a simple workaround: blocking interrupts before calling
netif_poll() makes the product work as before.

Thanks,

Luca