Subject: RE : RE : [lwip-users] local connection failed on loopif, race?
From: Frédéric BERNON
Date: Fri, 30 Mar 2007 09:36:49 +0200
>It is interesting to note that with a "sys_msleep(1)" inside the patch, the
>performance is much better than the patch without one. I guess that's
>probably because the sys_msleep() gives another thread a chance to run, which
>is crucial in my testing case
Yes. Even if it is not the nicest solution, it avoids consuming all CPU cycles
in this check and lets other threads (like tcpip_thread) run.
>my test usually crashed into a NULL conn->pcb, which requires another
>(conn->pcb.tcp != NULL) workaround:
Yes, I have added it too, even though a crash here is more likely the result of
another problem (an allocation error somewhere else).
>why armed sys_timeout() inside lwIP seems to "lost" when the system is heavily
>stressed?
In general, lwIP is not very robust in its upper layers, because some malloc
results are not checked. Moreover, some critical internal features can hang the
stack or a calling thread, or cause strange behavior. If you have
MEM_LIBC_MALLOC==0, you use the internal lwIP heap, with its limitations.
mem_malloc is used by protocols such as DHCP, SNMP and IGMP, but its main uses
are pbuf_alloc(..., PBUF_RAM) and loopif_output. The first case is fine, but I
think loopif_output has some problems:
- If mem_malloc(sizeof(void *[2])) fails, the "r" pbuf is not freed (and "one
  day" you will get "not enough memory").
- Inside sys_timeout, if memp_malloc(MEMP_SYS_TIMEOUT) fails, the caller has no
  way to know whether the timer was set up, and therefore cannot free the
  associated resources (so, again, "one day" you will get "not enough memory").
- Finally, using a timer to pass a pbuf between two "contexts" (which are in
  fact the same one, tcpip_thread) seems a little strange...
So, the things to do seem to be:
- add a pbuf_free when "if( NULL == arg ) {" is true;
- give sys_timeout a return type, and check the result to call pbuf_free and
  mem_free on failure;
- perhaps redesign with an mbox + a new thread (but I don't think it's a good
  idea)?
I don't use loopif in my applications, so I suggest you file a new bug on the
Savannah bug tracker describing these problems, in the hope that a developer
can help you, or, better, propose your own patch...
====================================
Frédéric BERNON
HYMATOM SA
IT Project Manager
Microsoft Certified Professional
Tel.: +33 (0)4-67-87-61-10
Fax: +33 (0)4-67-70-85-44
Email : address@hidden
Web Site : http://www.hymatom.fr
====================================
Please consider the environment before printing.
-----Original Message-----
From: address@hidden [mailto:address@hidden On Behalf Of Tai-hwa Liang
Sent: Friday, 30 March 2007 03:23
To: Mailing list for lwIP users
Subject: Re: RE : [lwip-users] local connection failed on loopif, race?
On Mon, 26 Mar 2007, Frédéric BERNON wrote:
> The workaround you found is "something like" the one I proposed in
> https://savannah.nongnu.org/bugs/?19157 ("lwip_close problems"). It
> would be good to add a netconn_close in your code. As a follow-up to
> bug #19157 (lwip_close problems), I will propose a patch like this
> one (not yet tested in this form):
>
> err_t
> netconn_close(struct netconn *conn)
> {
>   struct api_msg msg;
>
>   if (conn == NULL) {
>     return ERR_VAL;
>   }
>
>   conn->state = NETCONN_CLOSE;
> again:
>   msg.type = API_MSG_CLOSE;
>   msg.msg.conn = conn;
>   api_msg_post(&msg);
>   /* Out of memory: wait on the connection semaphore, then retry. */
>   if (conn->err == ERR_MEM && conn->sem != SYS_SEM_NULL) {
>     sys_sem_wait(conn->sem);
>     goto again;
>   }
>   /* TCP: retry until all unacked/unsent data has been flushed. */
>   if (conn->type == NETCONN_TCP) {
>     if (((conn->pcb.tcp->unacked != NULL) ||
>          (conn->pcb.tcp->unsent != NULL)) && (conn->err == ERR_OK)) {
>       sys_msleep(1); /* I don't like that, but... */
>       goto again;
>     }
>   }
>   conn->state = NETCONN_NONE;
>   return conn->err;
> }
Hi Frédéric,
With your patch, my test code never hangs (it completed millions of
client/server transactions). It is interesting to note that with a
"sys_msleep(1)" inside the patch, the performance is much better than
without it. I guess that's because sys_msleep() gives another thread a
chance to run, which is crucial in my test case.
On the other hand, under tight memory constraints (say, MEM_SIZE=1024), my
test usually crashed on a NULL conn->pcb, which requires an additional
(conn->pcb.tcp != NULL) check:
if ((conn->pcb.tcp != NULL) &&
    ((conn->pcb.tcp->unacked != NULL) || (conn->pcb.tcp->unsent != NULL)) &&
    (conn->err == ERR_OK)) {
  sys_msleep(1);
  goto again;
}
However, this workaround does not solve the hang I mentioned in my first
post. It turns out that raising MEM_SIZE to 2048 is the only 'solution' at
this moment.
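For reference, the two sizing changes reported in this thread would look roughly like this, assuming they are set in the port's lwipopts.h (as in the unixsim port). The values are the ones quoted in the messages; whether they suffice for another workload is not established here.

```c
/* lwipopts.h (sketch): values reported in this thread for the loopif stress test */
#define MEM_SIZE              2048  /* 1024 led to crashes/hangs under load */
#define MEMP_NUM_SYS_TIMEOUT  8     /* unixsim default was 3 */
```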
> -----Original Message-----
> From: address@hidden
> [mailto:address@hidden On Behalf Of Tai-hwa Liang
> Sent: Monday, 26 March 2007 08:28
> To: Mailing list for lwIP users
> Subject: Re: [lwip-users] local connection failed on loopif, race?
>
>
> On Mon, 19 Mar 2007, Tai-hwa Liang wrote:
>> Hi,
>>
>> Has anyone had success using loopif in a multi-threaded
>> application? Attached is a modified unix/proj/unixsim/simhost.c.
>> Basically, it is a self-contained server/client that binds/connects
>> to 127.0.0.2 through loopif.
>>
>> The problem is that this program cannot run indefinitely: it hangs
>> after a few client <-> server transactions (lwIP CVS version, with
>> LWIP_HAVE_LOOPIF = 1, running on FreeBSD-CURRENT):
>>
>> freebsd> ./simhost -d
>> Host at 127.0.0.2 mask 255.0.0.0 gateway 127.0.0.1
>> System initialized.
>> TCP/IP initialized.
>> netif_set_ipaddr: netif address being changed
>> netif: IP address of interface API message 0x8065a28
>> cli connecting...
[...]
>> tcp_output: snd_wnd 8096, cwnd 2048, wnd 2048, seg == NULL, ack 6539
>> State: FIN_WAIT_1
>> srv listening return, proceed to accept
>> [...hanging...]
>
> I managed to work around this hang by adding a 1-second delay
> between netconn_write() and netconn_delete():
>
> /* pseudo code */
> for (;;) {
>   srv = netconn_new(NETCONN_TCP);
>   netconn_connect(srv, srv_addr, srv_port);
>   netconn_write(srv, buf, len, NETCONN_NOCOPY);
>   sleep(1); /* this is vital! */
>   netconn_delete(srv);
> }
>
> In addition, the default setting of MEMP_NUM_SYS_TIMEOUT in unixsim
> still cannot cope with the load required by my loopif test case.
> Raising the default value from 3 to 8, together with the
> aforementioned 1-second delay, seems to "resolve" the hang observed
> in my case.
>
> Now the question becomes: why do armed sys_timeout() timers inside
> lwIP seem to get "lost" when the system is heavily stressed?
--
Thanks,
Tai-hwa Liang