Re: [lwip-users] Race condition in tpcip.c causing memory corruption
From: Stephane Lesage
Subject: Re: [lwip-users] Race condition in tpcip.c causing memory corruption
Date: Wed, 27 Feb 2013 21:47:02 +0100
Hi,
Very interesting multithread tracing, but...
>In this scenario a thread doing an outbound socket write results in a msg for
>do_write getting posted to the mbox.
>This causes a context switch to the tcpip_thread() which fetches the msg from
>the mailbox and begins processing.
>This thread gets context switched out before getting to the TCPIP_APIMSG_ACK().
>Execution is passed to a thread that is passing packets into lwip.
OK
>This thread gets into tcpip_apimsg() and posts to the mbox.
If you're talking about your netif driver handing received packets to the
stack, then I think this is wrong: the driver should not go through
tcpip_apimsg().
You should use tcpip_input().
This function creates a TCPIP_MSG_INPKT message and posts it to the tcpip
thread's mailbox with sys_mbox_trypost(), which does not block.
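A minimal sketch of that hand-off, for illustration only. The block between the
"stand-ins" markers is not lwIP code: it is a tiny model of the mailbox so the
example compiles on its own, and in a real port you would include lwip/tcpip.h
and lwip/pbuf.h instead. driver_rx_deliver() is a hypothetical name for the
driver's per-frame hook. The point shown is that a failed tcpip_input() leaves
the pbuf with the caller, who must free it.

```c
#include <stddef.h>

/* ---- Stand-ins so the sketch compiles without lwIP (NOT the real code) ---- */
typedef signed char err_t;
#define ERR_OK   0
#define ERR_MEM -1
struct pbuf  { int len; };
struct netif { int unused; };

#define MBOX_SIZE 2                 /* assumed mailbox depth for the model */
static struct pbuf *mbox[MBOX_SIZE];
static int mbox_count;
static int freed_count;

/* Models the non-blocking post: fails instead of waiting when the mbox is full. */
static err_t tcpip_input(struct pbuf *p, struct netif *inp)
{
  (void)inp;
  if (mbox_count == MBOX_SIZE) {
    return ERR_MEM;
  }
  mbox[mbox_count++] = p;
  return ERR_OK;
}

static void pbuf_free(struct pbuf *p) { (void)p; freed_count++; }
/* ---- End of stand-ins ---- */

/* Hypothetical driver hook, called from the RX thread for each received frame.
 * Returns 1 if the stack accepted the frame, 0 if it was dropped. */
static int driver_rx_deliver(struct netif *netif, struct pbuf *p)
{
  if (tcpip_input(p, netif) != ERR_OK) {
    pbuf_free(p);                   /* the stack did not take ownership */
    return 0;
  }
  return 1;                         /* the tcpip thread now owns the pbuf */
}
```

Note that nothing in this path touches a netconn or its op_completed
semaphore, so the RX thread never blocks on the API-message acknowledgement.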
>No context switch occurs (because tcpip_thread() is not currently waiting in
>the fetch call)
>so this receive thread makes it to the
>sys_arch_sem_wait(&apimsg->msg.conn->op_completed, 0) call and blocks.
To be clear, passing a packet to the stack happens at the lowest level: you
hand over a pbuf from your netif.
It cannot involve a PCB or a netconn and its semaphore...
>Now a context switch occurs back to the outbound thread which finally makes it
>to the same sys_arch_sem_wait() call and blocks.
>Now context is switched to the tcpip_thread which finishes the do_write()
>execution and calls TCPIP_APIMSG_ACK().
>This should have unblocked the outbound thread however the first one to block
>on that sem was the inbound thread
>(which still has its message posted in the mbox) so the inbound thread
>receives the signal.
>Now the tcpip_thread() grabs the inbound msg (whose container was on the
>inbound thread's stack which has been popped)
>and starts processing the message. That container can now be corrupted since
>the stack has been popped.
>Bad things happen after this.....
Of course, and this is why lwIP does not support multiple threads using the
same socket (unless the core locking option is enabled).
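To illustrate why two callers sharing one op_completed semaphore is unsafe,
here is a toy, single-threaded model of the wake-up order. Everything in it is
hypothetical: toy_sem is not lwIP's sys_sem_t, and it assumes waiters are woken
in FIFO order, which depends on your RTOS port. It only shows that the
acknowledgement for the first caller's request can be delivered to the second
caller if that one started waiting first.

```c
#include <stddef.h>

#define MAX_WAITERS 4

/* Toy semaphore that only records who is waiting, in arrival order.
 * Assumption: the underlying OS wakes waiters first-in, first-out. */
struct toy_sem {
  int waiters[MAX_WAITERS];
  size_t head;
  size_t tail;
};

/* Record that thread_id is now blocked on the semaphore. */
static void toy_sem_wait(struct toy_sem *s, int thread_id)
{
  s->waiters[s->tail % MAX_WAITERS] = thread_id;
  s->tail++;
}

/* Wake one waiter; returns its id, or -1 if nobody is waiting. */
static int toy_sem_signal(struct toy_sem *s)
{
  int id;
  if (s->head == s->tail) {
    return -1;
  }
  id = s->waiters[s->head % MAX_WAITERS];
  s->head++;
  return id;
}

enum { FIRST_CALLER = 1, SECOND_CALLER = 2 };

/* Replays the interleaving: the first caller's request is already being
 * processed, but the second caller reaches the wait before the first does. */
static int who_receives_first_ack(void)
{
  struct toy_sem op_completed = { {0}, 0, 0 };

  toy_sem_wait(&op_completed, SECOND_CALLER); /* its message is still queued */
  toy_sem_wait(&op_completed, FIRST_CALLER);  /* preempted earlier, waits now */

  /* The tcpip thread finishes the FIRST caller's request and acknowledges it. */
  return toy_sem_signal(&op_completed);
}
```

In this model who_receives_first_ack() returns SECOND_CALLER: that thread
resumes and unwinds its stack while its own message is still sitting in the
mailbox, which is the corruption described in the quoted trace.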
>I'm wondering if I'm somehow using the interfaces wrong to cause this to
>happen.
>I fixed this by protecting the tcpip_apimsg() call with a semaphore to stop
>reentrancy.
>Am I doing something wrong or is this a real bug?
If I understand correctly, then you just need to use tcpip_input(pbuf, netif)
in your driver RX thread.
PS: I personally do not like the overhead of using an RX thread and/or the
tcpip_input() function, which dynamically allocates a message.
My init function allocates a static rxmsg = tcpip_callbackmsg_new(rx_callback,
netif);
My interrupts do fast/minimal DMA queue processing and call
tcpip_trycallback(rxmsg) (only if necessary).
Then my rx_callback() does the actual job in the tcpip thread context:
- loop to extract pbuf from the "completed" DMA descriptors queue
- SNMP/statistics update
- call ethernet_input(pbuf, netif)
- try to reallocate a new pbuf to reuse the now free DMA descriptor
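The steps above could be sketched roughly as follows. This is not the actual
driver: the block between the "stand-ins" markers replaces lwIP and the DMA
hardware with a minimal model so the example compiles on its own, and
rx_scheduled, rx_isr(), driver_init(), dma_pop_completed() and
run_tcpip_thread_once() are hypothetical names. The rx_scheduled flag is one
possible reading of "only if necessary": do not post the message again while a
callback is already pending.

```c
#include <stddef.h>

/* ---- Stand-ins so the sketch compiles without lwIP or hardware ---- */
typedef signed char err_t;
#define ERR_OK 0
struct pbuf  { int len; };
struct netif { int frames_in; };
typedef void (*tcpip_callback_fn)(void *ctx);
struct tcpip_callback_msg { tcpip_callback_fn fn; void *ctx; };

static struct tcpip_callback_msg the_msg;
static struct tcpip_callback_msg *queued;    /* models the tcpip mbox, depth 1 */

static struct tcpip_callback_msg *tcpip_callbackmsg_new(tcpip_callback_fn fn,
                                                        void *ctx)
{
  the_msg.fn = fn;
  the_msg.ctx = ctx;
  return &the_msg;
}
static err_t tcpip_trycallback(struct tcpip_callback_msg *msg)
{
  queued = msg;
  return ERR_OK;
}
static err_t ethernet_input(struct pbuf *p, struct netif *netif)
{
  (void)p;
  netif->frames_in++;
  return ERR_OK;
}
/* Models one pass of the tcpip thread's message loop. */
static void run_tcpip_thread_once(void)
{
  if (queued != NULL) {
    struct tcpip_callback_msg *m = queued;
    queued = NULL;
    m->fn(m->ctx);
  }
}
/* Hypothetical "completed" DMA descriptor queue. */
#define RING 4
static struct pbuf ring_bufs[RING];
static int completed;
static struct pbuf *dma_pop_completed(void)
{
  return completed > 0 ? &ring_bufs[--completed] : NULL;
}
/* ---- End of stand-ins ---- */

static struct tcpip_callback_msg *rxmsg;     /* allocated once at init */
static volatile int rx_scheduled;            /* 1 while a callback is pending */

/* Runs in the tcpip thread context: drains the completed descriptors. */
static void rx_callback(void *ctx)
{
  struct netif *netif = (struct netif *)ctx;
  struct pbuf *p;

  rx_scheduled = 0;                          /* later interrupts may re-post */
  while ((p = dma_pop_completed()) != NULL) {
    /* A real driver would update its SNMP/statistics counters here. */
    ethernet_input(p, netif);
    /* A real driver would then allocate a fresh pbuf for the descriptor. */
  }
}

static void driver_init(struct netif *netif)
{
  rxmsg = tcpip_callbackmsg_new(rx_callback, netif);
}

/* Interrupt handler: minimal work, then wake the tcpip thread if needed. */
static void rx_isr(int newly_completed)
{
  completed += newly_completed;
  if (!rx_scheduled && tcpip_trycallback(rxmsg) == ERR_OK) {
    rx_scheduled = 1;
  }
}
```

Because the message is allocated once and reused, the interrupt path performs
no allocation, and all pbuf handling stays inside the tcpip thread.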
--
Stephane Lesage