RE : RE : [lwip-users] local connection failed on loopif, race?

From:	Frédéric BERNON
Subject:	RE : RE : [lwip-users] local connection failed on loopif, race?
Date:	Fri, 30 Mar 2007 09:36:49 +0200

>It is interesting to note that with a "sys_msleep(1)" inside the patch, the 
>performance is much better than the patch without one.  I guess that's 
>probably because the sys_msleep() gives another thread a chance to run, which 
>is crucial in my testing case
Yes, even if it is not the nicest solution, it avoids consuming all CPU cycles 
on this check, and lets other threads (like tcpip_thread) run.

>my test usually crashed into a NULL conn->pcb, which requires another 
>(conn->pcb.tcp != NULL) workaround:
Yes, I have also added it now, even if a crash in this part is more likely the 
result of another problem (an allocation error somewhere else). 
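
As a rough illustration of the two points above (sleeping between checks, and 
guarding against a NULL pcb), here is a minimal single-threaded sketch. It does 
not use the real lwIP types: mock_conn, mock_tcp_pcb, mock_stack_progress and 
the max_polls bound are all hypothetical names invented for this example, and 
nanosleep() stands in for sys_msleep(1).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-ins for the lwIP structures, for illustration only. */
struct mock_tcp_pcb { void *unacked; void *unsent; };
struct mock_conn { struct mock_tcp_pcb *tcp; int err; };

/* Sleeps for the given number of milliseconds (stands in for sys_msleep). */
static void sleep_ms(long ms)
{
  struct timespec ts;
  ts.tv_sec = ms / 1000;
  ts.tv_nsec = (ms % 1000) * 1000000L;
  nanosleep(&ts, NULL);
}

/* Returns 1 while data is still queued; the NULL test comes first so a
   missing pcb is never dereferenced. */
static int mock_tx_pending(const struct mock_conn *c)
{
  return c->tcp != NULL
      && (c->tcp->unacked != NULL || c->tcp->unsent != NULL)
      && c->err == 0;
}

/* Simulates the work tcpip_thread would do while the caller sleeps:
   one queue is emptied per call. */
static void mock_stack_progress(struct mock_conn *c)
{
  if (c->tcp->unsent != NULL) {
    c->tcp->unsent = NULL;
  } else {
    c->tcp->unacked = NULL;
  }
}

/* Polls until the queues are empty or max_polls is reached (the bound is
   an assumption of this sketch). Returns the number of polls performed. */
static int mock_wait_drained(struct mock_conn *c, int max_polls)
{
  int polls = 0;
  while (mock_tx_pending(c) && polls < max_polls) {
    sleep_ms(1);              /* give up the CPU instead of spinning */
    mock_stack_progress(c);   /* in a real system another thread does this */
    polls++;
  }
  return polls;
}
```

With both queues non-empty, mock_wait_drained() needs two polls in this 
simulation; with a NULL pcb it returns immediately without sleeping.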

>why does an armed sys_timeout() inside lwIP seem to get "lost" when the system 
>is heavily stressed?
In general, lwIP is not very reliable in its upper layers, because some malloc 
results are not checked. Moreover, some critical internal features could hang 
the stack or a calling thread, or cause strange behavior. If you have 
MEM_LIBC_MALLOC==0, you use the internal lwIP heap, with its limitations. 
mem_malloc is used in protocols such as DHCP, SNMP and IGMP, but its main use 
is for pbuf_alloc(..., PBUF_RAM) and for loopif_output. The first case is 
normal, but I think its use in loopif_output has some problems:

- If the mem_malloc(sizeof(void *[2])) call fails, the "r" pbuf is not freed 
(and "one day" you will get a "not enough memory" error).
- Inside sys_timeout, if memp_malloc(MEMP_SYS_TIMEOUT) fails, the caller has 
no way to know whether the timer was armed, so it cannot free any resources in 
this case (and again, "one day", you will get a "not enough memory" error).
- Finally, using a timer to pass a pbuf between two "contexts" (which are in 
fact the same one, tcpip_thread) seems a little strange...

So, the things to do seem to be:

- add a pbuf_free when "if( NULL == arg ) {" is true
- add a return type to sys_timeout, and check the result in order to do a 
pbuf_free and a mem_free on failure.
- perhaps redesign with an mbox + a new thread (but I don't think it's a good 
idea)?
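
To make the first two items concrete, here is a minimal sketch of the cleanup 
logic. It is not the actual loopif code: every mock_* name is a hypothetical 
stand-in, malloc/free replace mem_malloc/mem_free, and a sys_timeout that 
returns an err_t is an assumption (it is exactly the change proposed above, 
not something the function provides today).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical stand-ins, not the real lwIP API. */
typedef int err_t;
#define ERR_OK   0
#define ERR_MEM (-1)

struct mock_pbuf { int freed; };

static int timeouts_left = 1;            /* simulated MEMP_SYS_TIMEOUT pool */
static void (*pending_handler)(void *);  /* the single armed timeout, if any */
static void *pending_arg;

static void mock_pbuf_free(struct mock_pbuf *p) { p->freed = 1; }

/* Assumed variant of sys_timeout() that reports pool exhaustion. */
static err_t mock_sys_timeout(void (*handler)(void *), void *arg)
{
  if (timeouts_left == 0) {
    return ERR_MEM;
  }
  timeouts_left--;
  pending_handler = handler;
  pending_arg = arg;
  return ERR_OK;
}

/* Timer handler: owns the argument block and the pbuf once it runs. */
static void mock_loopif_input(void *arg)
{
  void **slots = arg;
  mock_pbuf_free(slots[1]);
  free(slots);
}

/* Fires the armed timeout, as the timer code would do later. */
static void mock_fire_timeout(void)
{
  if (pending_handler != NULL) {
    void (*handler)(void *) = pending_handler;
    pending_handler = NULL;
    timeouts_left++;
    handler(pending_arg);
  }
}

/* Output path: every failure releases what was acquired so far. */
static err_t mock_loopif_output(struct mock_pbuf *r)
{
  void **arg = malloc(sizeof(void *[2]));
  if (arg == NULL) {
    mock_pbuf_free(r);   /* first item: do not leak the copied pbuf */
    return ERR_MEM;
  }
  arg[0] = NULL;         /* the netif pointer would go here */
  arg[1] = r;
  if (mock_sys_timeout(mock_loopif_input, arg) != ERR_OK) {
    mock_pbuf_free(r);   /* second item: undo both allocations */
    free(arg);
    return ERR_MEM;
  }
  return ERR_OK;         /* ownership has passed to the timer handler */
}
```

In this simulation the pool holds one timeout, so a second call to 
mock_loopif_output() before the first timeout fires fails with ERR_MEM and 
frees its pbuf immediately instead of leaking it.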

I don't use loopif in my applications, so I suggest you open a new bug on the 
savannah bug tracker to explain these problems, in the hope that a developer 
can help you, or, better, propose your own patch...

====================================
Frédéric BERNON 
HYMATOM SA 
IT Project Manager 
Microsoft Certified Professional 
Tel.: +33 (0)4-67-87-61-10 
Fax. : +33 (0)4-67-70-85-44 
Email : address@hidden 
Web Site : http://www.hymatom.fr 
====================================


-----Original Message-----
From: address@hidden [mailto:address@hidden On Behalf Of Tai-hwa Liang
Sent: Friday, 30 March 2007 03:23
To: Mailing list for lwIP users
Subject: Re: RE : [lwip-users] local connection failed on loopif, race?


On Mon, 26 Mar 2007, Frédéric BERNON wrote:
> The workaround you have found is "something like" the one I proposed in 
> https://savannah.nongnu.org/bugs/?19157 ("lwip_close problems"). It 
> would be good to add a netconn_close in your code. Following the 
> information in bug #19157 (lwip_close problems), I will propose a patch 
> like this (not yet tested in this form):
>
> err_t
> netconn_close(struct netconn *conn)
> {
>  struct api_msg msg;
>
>  if (conn == NULL) {
>    return ERR_VAL;
>  }
>
>  conn->state = NETCONN_CLOSE;
> again:
>  msg.type = API_MSG_CLOSE;
>  msg.msg.conn = conn;
>  api_msg_post(&msg);
>  if (conn->err == ERR_MEM && conn->sem != SYS_SEM_NULL) {
>    sys_sem_wait(conn->sem);
>    goto again;
>  }
>  if (conn->type == NETCONN_TCP) {
>    /* wait until all queued data has been sent and acknowledged */
>    if (((conn->pcb.tcp->unacked != NULL) ||
>         (conn->pcb.tcp->unsent != NULL)) && (conn->err == ERR_OK)) {
>      sys_msleep(1); /* I don't like that, but... */
>      goto again;
>    }
>  }
>  conn->state = NETCONN_NONE;
>  return conn->err;
> }

Hi Frédéric,

   With your patch, my testing code never hangs (it completes millions of 
client/server transactions).  It is interesting to note that with a 
"sys_msleep(1)" inside the patch, the performance is much better than with the 
patch without one.  I guess that's probably because the sys_msleep() gives 
another thread a chance to run, which is crucial in my testing case.

   On the other hand, under certain low memory constraints, say, MEM_SIZE=1024, 
my test usually crashed on a NULL conn->pcb, which requires another 
(conn->pcb.tcp != NULL) workaround:

       if ((conn->pcb.tcp != NULL) && ((conn->pcb.tcp->unacked != NULL) ||
         (conn->pcb.tcp->unsent != NULL)) && (conn->err == ERR_OK)) {
         sys_msleep(1);
         goto again;
       }

   However, this workaround does not solve the hanging problem I mentioned in 
my first post.  It turns out that raising MEM_SIZE to 2048 is the only 
'solution' at the moment.

> -----Original Message-----
> From: address@hidden 
> [mailto:address@hidden On Behalf Of Tai-hwa Liang
> Sent: Monday, 26 March 2007 08:28
> To: Mailing list for lwIP users
> Subject: Re: [lwip-users] local connection failed on loopif, race?
>
>
> On Mon, 19 Mar 2007, Tai-hwa Liang wrote:
>> Hi,
>>
>>  Is there anyone having success on using loopif in a multi-threaded 
>> application?  The attached is a modified unix/proj/unixsim/simhost.c. 
>> Basically it is a self-contained server/client which binds/connects 
>> to 127.0.0.2 through loopif.
>>
>>  The problem is that this program failed to run indefinitely and hung 
>> after a few client <-> server transactions(lwIP CVS version, with 
>> LWIP_HAVE_LOOPIF = 1, running on FreeBSD-CURRENT):
>>
>> freebsd> ./simhost -d
>> Host at 127.0.0.2 mask 255.0.0.0 gateway 127.0.0.1
>> System initialized.
>> TCP/IP initialized.
>> netif_set_ipaddr: netif address being changed
>> netif: IP address of interface API message 0x8065a28
>> cli connecting...
[...]
>> tcp_output: snd_wnd 8096, cwnd 2048, wnd 2048, seg == NULL, ack 6539
>> State: FIN_WAIT_1
>> srv listening return, proceed to accept
>> [...hanging...]
>
>   I managed to work around this hang by adding a 1 second delay 
> between netconn_write() and netconn_delete():
>
>       /* pseudo code */
>       for (;;) {
>               srv = netconn_new(NETCONN_TCP);
>               netconn_connect(srv, srv_addr, srv_port);
>               netconn_write(srv, buf, len, NETCONN_NOCOPY);
>               sleep(1);       /* this is vital! */
>               netconn_delete(srv);
>       }
>
>   In addition to that, the default setting of MEMP_NUM_SYS_TIMEOUT in 
> unixsim still isn't able to cope with the load required in my loopif 
> testing case.  Raising the default value from 3 to 8 plus the 
> aforementioned 1 second delay seems to "resolve" the hanging observed 
> in my case.
>
>   Now the question becomes: why does an armed sys_timeout() inside lwIP 
> seem to get "lost" when the system is heavily stressed?

-- 
Thanks,

Tai-hwa Liang
