lwip-users

Re: [lwip-users] Crash when doing stresstest with ppp and sockets/UDP


From: Sylvain Rochet
Subject: Re: [lwip-users] Crash when doing stresstest with ppp and sockets/UDP
Date: Thu, 21 Dec 2017 19:42:18 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

Hi zulu4711,


On Thu, Dec 21, 2017 at 10:57:55AM -0700, zulu4711 wrote:
> Thanks for input Sylvain!
> 
> I'm sure that I'm doing something wrong, I just need to find out what ;)
> In the project this will eventually go into, I have around 300.000 lines of
> C code and more than 50 separate threads running so a "non RTOS" operation
> is not possible.
> 
> I was led to believe (not paying enough attention) that once you decide that
> you want to run lwip in "RTOS mode", most will be protected and thread safe
> and this is clearly not the case (my fault!)
> (and I have actually been reading the documentation on lwip/ppp, but maybe
> I'm just not getting all the hints in the various documents, I'm sure I'm
> not alone in that)
> 
> The test I have running was/is organized as follows:
> 
> "sio thread":
>   loops and:
>     calls pppos_input() with data from serial port (only when ppp != NULL,
> i.e. pppos_create() etc. has been called)
>     Note: PPP_INPROC_IRQ_SAFE is set to 1

IRQ safe and thread safe are not the same thing. IRQ safe means an
interrupt context can interrupt the main context but *NOT* the contrary.
Thread safe means context A can interrupt context B and the other way
around. Setting PPP_INPROC_IRQ_SAFE does *NOT* make pppos_input() thread
safe. This is properly documented in the PPPoS input path section of the
PPP documentation...


>     when re-connecting is needed (signalled using flags) it calls
> ppp_close() (when DCD goes away) and then ppp_connect() (when DCD comes
> again)

Doing that is very dangerous, even with the thread-safe pppapi_* variants.
ppp_connect and ppp_close are just initiators, and you have to wait for
the PPP status callback to fire before initiating a new state
transition. Calling ppp_connect while closing is still in progress will
do nothing, and since you discard the return values of all ppp_* functions
you won't have a hint that it failed...
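The "wait for the status callback before starting the next transition" rule can be sketched as a small gate on the application side. This is only an illustration: link_ctl_t, link_phase_t and the link_* functions are hypothetical names, not part of lwIP, and the sketch assumes the PPP status callback is reduced to a single "connected or not" flag (in lwIP the callback receives an error code, with PPPERR_NONE meaning the link is up).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical application-side bookkeeping; these are not lwIP types. */
typedef enum { LINK_DEAD, LINK_CONNECTING, LINK_UP, LINK_CLOSING } link_phase_t;

typedef struct {
    link_phase_t phase;
} link_ctl_t;

void link_init(link_ctl_t *ctl)
{
    assert(ctl != NULL);
    ctl->phase = LINK_DEAD;
}

/* Ask to connect. Returns 0 if the caller may now start the connect,
 * -1 if a previous transition has not finished yet. */
int link_request_connect(link_ctl_t *ctl)
{
    assert(ctl != NULL);
    if (ctl->phase != LINK_DEAD) {
        return -1;
    }
    ctl->phase = LINK_CONNECTING;
    return 0;
}

/* Ask to close. Returns 0 if the caller may now start the close,
 * -1 if there is nothing to close or a close is already pending. */
int link_request_close(link_ctl_t *ctl)
{
    assert(ctl != NULL);
    if (ctl->phase != LINK_CONNECTING && ctl->phase != LINK_UP) {
        return -1;
    }
    ctl->phase = LINK_CLOSING;
    return 0;
}

/* Call from the PPP status callback: connected is true when the link
 * came up, false when the session has ended for any reason. */
void link_on_status(link_ctl_t *ctl, bool connected)
{
    assert(ctl != NULL);
    ctl->phase = connected ? LINK_UP : LINK_DEAD;
}
```

The caller would only issue the real connect or close when the matching link_request_* call returns 0, and would still check the return value of that real call. In a multi-threaded build the link_ctl_t itself has to be touched from one context only, or be protected by a lock; the sketch leaves that out.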


> "UDP worker thread":
>   call netconn_send() and netconn_recv() in a loop forever
>   
> "TCP worker thread":
>   call send() and recv() in a loop forever
>   
> Both the "UDP worker" and the "TCP worker" threads are each started twice
> (so 4 "worker" threads are running)
> 
> 
> "main thread":
>   at startup/initialization it:
>   starts the "SIO thread"
>   dials the modem and waits for DCD (connected)
>   calls tcpip_init(), pppos_create(), ppp_connect()
>   waits for first time connection, then starts the other threads
>   
>   after this nothing more is done in the "main thread"!
>   
> This crashed (after 2 to 5 reconnect sessions).
> I then changed the ppp_close() and ppp_connect() in the "SIO thread" to
> pppapi_close() and pppapi_connect().
> This seems to work, at least it has been running much longer than without
> using pppapi_ functions.
> 
> Once again, thanks for a nice project and all the help :)

It still looks wrong: you have to use pppapi_* *everywhere* you want to
call a ppp_* function outside the lwIP core thread. This is a must even
at the init stage. Please don't just fix the spot where it crashed; it
only crashed there because that code runs in a loop and so is much more
likely to trigger a race condition.
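The idea behind the pppapi_* wrappers (and behind pppos_input_tcpip() on the input side) is that other threads never touch stack state themselves; they hand the call over to the lwIP core thread, which runs it. A rough, single-threaded model of that hand-off is sketched below. Everything in it (core_queue_t, core_post, core_run_pending, demo_bump) is a hypothetical name used for illustration, and the queue has no locking of its own; in a real port that role is played by the RTOS mailbox that lwIP already uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of "run it in the core context"; not lwIP code. */
typedef void (*core_fn_t)(void *arg);

typedef struct {
    core_fn_t fn;
    void *arg;
} core_job_t;

#define CORE_QUEUE_LEN 8

typedef struct {
    core_job_t jobs[CORE_QUEUE_LEN];
    size_t head;   /* index of the next job to run */
    size_t count;  /* number of jobs waiting */
} core_queue_t;

void core_queue_init(core_queue_t *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Called by a worker: queue the call instead of touching stack state.
 * Returns 0 on success, -1 if the queue is full (callers must check). */
int core_post(core_queue_t *q, core_fn_t fn, void *arg)
{
    assert(q != NULL && fn != NULL);
    if (q->count == CORE_QUEUE_LEN) {
        return -1;
    }
    size_t tail = (q->head + q->count) % CORE_QUEUE_LEN;
    q->jobs[tail].fn = fn;
    q->jobs[tail].arg = arg;
    q->count++;
    return 0;
}

/* Called only by the core context: run all queued jobs in order.
 * Returns the number of jobs that were run. */
size_t core_run_pending(core_queue_t *q)
{
    size_t ran = 0;
    assert(q != NULL);
    while (q->count > 0) {
        core_job_t job = q->jobs[q->head];
        q->head = (q->head + 1) % CORE_QUEUE_LEN;
        q->count--;
        job.fn(job.arg);
        ran++;
    }
    return ran;
}

/* Example job for demonstration: increments the int that arg points to. */
void demo_bump(void *arg)
{
    (*(int *)arg)++;
}
```

Posting a job changes nothing until the core context calls core_run_pending(), so all state changes happen in one place and in order. With lwIP itself there is no need to build this by hand: calling the pppapi_* variants from every non-core thread, including during init, gives the same serialization.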

It looks like you are calling netif_* functions outside the lwIP core
thread. Guess what? This is not thread safe either (or maybe it is with
core locking enabled).


Sylvain
