From: Tazzari Davide
Subject: Re: [lwip-users] lwip lock
Date: Tue, 12 Apr 2011 17:36:17 +0200
Hi Kieran,

I have made the following improvement to my web server.

WebServer task (old version):

    ...
    for (;;)
    {
        iRestartBinding = 0;
        pxHTTPListener = netconn_new( NETCONN_TCP );
        netconn_bind(pxHTTPListener, NULL, webHTTP_PORT );
        netconn_listen( pxHTTPListener );
        int iTimeout = 1000;
        for( ; (iRestartBinding < 10) && (gucRestartWebServer == FALSE); iRestartBinding++)
        {
            xLastFocusTime = xTaskGetTickCount();
            vTaskDelayUntil( &xLastFocusTime, xDelayLength );
            if (iGlobalWtdBomb == FALSE) // TRUE: I am waiting for a WDT suicide
            {
                // Wait for a first connection.
    #if LWIP_SO_RCVTIMEO
                pxHTTPListener->recv_timeout = iTimeout;
    #endif
                pxNewConnection = netconn_accept(pxHTTPListener);
                if(pxNewConnection != NULL)
                {
                    prvweb_ParseHTMLRequest( pxNewConnection );
                    netconn_close( pxNewConnection );
                    netconn_delete( pxNewConnection );
                    iRestartBinding = 0;
                    iTimeout = 5000;
                }// end if new connection
                else
                {
                    iTimeout = 1000;
                }
            }
        } // end acquisition loop
        gucRestartWebServer = FALSE;
        netconn_close(pxHTTPListener);
        while(netconn_delete(pxHTTPListener) != 0)
        {
            vTaskDelay(20);
        }
        pxHTTPListener = NULL;
    }
    ...
    static unsigned char prvweb_ParseHTMLRequest( struct netconn *pxNetCon )
    {
        struct netbuf *pxRxBuffer;
        portCHAR *pcRxString;
        unsigned portSHORT usLength;

        /* We expect to immediately get data. */
        pxNetCon->recv_timeout = 1000;
        pxRxBuffer = netconn_recv( pxNetCon );
        if( pxRxBuffer != NULL )
        {
            /* Where is the data? */
            netbuf_data( pxRxBuffer, ( void * ) &pcRxString, &usLength );
            ...
            netbuf_delete( pxRxBuffer );
            return 0;
        }
        else
        {
            return -1;
        }
    }

This was my first implementation. Why these two loops? Because, in this case, when the Ethernet cable is unplugged and then plugged in again, I detect it and create the listener again. Anyway, I lose the Ethernet!!! I don't know if this is THE solution but, at least, it is a solution.
After your comments I changed the web server task into a more flexible structure: for each accepted connection, I create a task to serve it, in this way:

    portTASK_FUNCTION( WebServerAnswerTask, pvParameters )
    {
        struct netconn * pxNewConnection = (struct netconn *) pvParameters;
        prvweb_ParseHTMLRequest( pxNewConnection );
        netconn_close( pxNewConnection );
        netconn_delete( pxNewConnection );
        vTaskDelete( NULL );
    }

And

    ...
    if(pxNewConnection != NULL)
    {
        if (xTaskCreate(WebServerAnswerTask, ( signed portCHAR * ) "WebServerAnswer",
                        WEB_SERVER_STACK_SIZE, pxNewConnection, ethWEBSERVER_PRIORITY,
                        ( xTaskHandle * ) NULL ) != pdPASS)
        {
            // Task not correctly created!!!
            netconn_write( pxNewConnection, (char *) webHTTP_HTM_INTERNAL_ERROR,
                           (u16_t) strlen( webHTTP_HTM_INTERNAL_ERROR ), NETCONN_COPY ); // error HTTP 500
            netconn_close( pxNewConnection );
            netconn_delete( pxNewConnection );
        }
        iRestartBinding = 0;
        iTimeout = 5000;
    }// end if new connection
    else
    {
        iTimeout = 1000;
    }

instead of

    prvweb_ParseHTMLRequest( pxNewConnection );
    netconn_close( pxNewConnection );
    netconn_delete( pxNewConnection );

directly in the main web server task.

Results:
The web server is faster.
MBOX full has never happened again.
Mem area stuck is reduced but, unfortunately, not to zero. A few times I have seen the lfree pointer stuck at a value different from the ram pointer. In those few cases the web server remains inaccessible until a reset. I monitor this value and reset the machine (WDT) if it occurs. I dislike this very much but... anyway... it doesn't happen very often.

Let's consider memp. Now TCP_SEG seems correct and it seems that no blocks are lost. TCP_PCB, instead, goes to full usage almost immediately. I have set the limit to 12 and then to 30, but every time a connection appears this number increases until it reaches the limit.
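To make the TCP_PCB counting concrete, here is a toy model of a fixed-size PCB pool. It is only a sketch and not lwIP code: it assumes that a closed connection keeps its slot for a fixed linger time and that the allocator takes a free slot first and recycles the oldest lingering slot only when none is free. All names (pcb_pool, pool_open, pool_close, pool_tick, pool_serve) and the tick-based timing are hypothetical.

```c
#include <assert.h>

/* Toy model of a fixed-size TCP PCB pool. NOT lwIP code: all names are
 * hypothetical and time is simplified to integer ticks (1 tick = 1 second). */
#define POOL_SIZE    12    /* plays the role of a PCB limit of 12 */
#define LINGER_TICKS 120   /* assumed linger time after close (2 minutes) */

enum slot_state { SLOT_FREE, SLOT_ACTIVE, SLOT_LINGER };

struct pcb_pool {
    enum slot_state state[POOL_SIZE];
    int closed_at[POOL_SIZE];   /* tick at which the slot started lingering */
};

void pool_init(struct pcb_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        p->state[i] = SLOT_FREE;
        p->closed_at[i] = 0;
    }
}

/* Slots counted as "used": active and lingering ones alike. */
int pool_used(const struct pcb_pool *p)
{
    int used = 0;
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->state[i] != SLOT_FREE) {
            used++;
        }
    }
    return used;
}

/* Release every lingering slot whose linger time has expired. */
void pool_tick(struct pcb_pool *p, int now)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->state[i] == SLOT_LINGER && now - p->closed_at[i] >= LINGER_TICKS) {
            p->state[i] = SLOT_FREE;
        }
    }
}

/* Take a slot: a free one if possible, otherwise the oldest lingering one.
 * Returns the slot index, or -1 if every slot is active. */
int pool_open(struct pcb_pool *p)
{
    int oldest = -1;
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->state[i] == SLOT_FREE) {
            p->state[i] = SLOT_ACTIVE;
            return i;
        }
        if (p->state[i] == SLOT_LINGER &&
            (oldest < 0 || p->closed_at[i] < p->closed_at[oldest])) {
            oldest = i;
        }
    }
    if (oldest >= 0) {
        p->state[oldest] = SLOT_ACTIVE;
    }
    return oldest;
}

/* Close an active connection: the slot lingers instead of becoming free. */
void pool_close(struct pcb_pool *p, int idx, int now)
{
    assert(idx >= 0 && idx < POOL_SIZE && p->state[idx] == SLOT_ACTIVE);
    p->state[idx] = SLOT_LINGER;
    p->closed_at[idx] = now;
}

/* Serve n short connections one after another at time `now`.
 * Returns how many of them obtained a slot. */
int pool_serve(struct pcb_pool *p, int n, int now)
{
    int served = 0;
    for (int k = 0; k < n; k++) {
        int idx = pool_open(p);
        if (idx < 0) {
            continue;   /* pool exhausted by active connections */
        }
        pool_close(p, idx, now);
        served++;
    }
    return served;
}
```

In this model, calling pool_serve() with 11 connections twice in a row leaves pool_used() at the limit of 12 even though every connection was closed, and the count only drops once pool_tick() is called with a time past the linger period. A "used" count that stays at the limit long after the linger time would therefore point at something other than this recycling rule.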
My home page contains 8 images, 1 CSS file and 1 JS file, so after a couple of reloads I reach the limit, whatever it is set to.

I have read somewhere that even after a connection is closed the PCB remains in a wait state for a couple of minutes (to wait for connection synchronisation packets lost in the net), and that the rule is to use the "not used" PCBs first and then the "wait" PCBs, so at the beginning I didn't worry about it. After waiting 10 minutes, the relevant lwip_stats.memp[i].used is still equal to the limit or, at least, one less: limit 12, used 11. The "used" count never goes to zero.

What I see is that when the used PCB count is well below the limit the web server is almost fantastic; when it is near the limit the web server is slower and (this is the bad thing) sometimes locks. In those cases the lfree pointer of the mem area is stuck at a value different from the ram pointer.

Moreover, because of code memory problems I moved all the code to SDRAM. Of course I see a speed reduction, but I expected it. The problem is that after a few web server connections the web server sometimes locks, i.e. connections are refused ([RST, ACK] immediately after a [SYN] request) and there is no way to restart it until a reset. It seems that this happens when the TCP_PCB limit is reached, no matter the value of this limit. But sometimes everything works regardless of these values. The code is exactly the same; the only difference is where the code is fetched from.

About the lwIP interface: I used only the netconn API in all the code (or at least this is my intention!). I suppose I am making some mistake, or somewhere in the code (FreeRTOS? lwIP itself? My fault? ...) there is something that uses a low-level lwIP access I didn't find.

Here are the lwIP connections I have:

1) WebServer (shown above)

2) PortalConnection

    ...
    // Send and receive function
    * pps_Connection = netconn_new(NETCONN_TCP);
    error_get_web = netconn_connect(* pps_Connection, &ipaddr, gs_EthernetParameters.siPort);
    if(error_get_web == 0)
    {
        if ((* pps_Connection)->pcb.tcp->state != ESTABLISHED) // if the portal doesn't respond I don't receive any error at all!
        {
            DestroyConnection(pps_Connection);
            return ERR_CONN;
        }
        else
        {
            error_get_web = netconn_write(* pps_Connection, pcBuffer, iSize, NETCONN_COPY );
            if ((* pps_Connection)->state != NETCONN_NONE)
            {
                // error code but connection not destroyed. I don't know what to do here and if I have to do something!!!
            }
        }
    }
    #if LWIP_SO_RCVTIMEO
    ps_Connection->recv_timeout = 10000; // 10 sec max
    #endif
    unsigned char ucFlagFirstPage = TRUE;
    while( (nb = netconn_recv( ps_Connection ) ) != NULL )
    {
        netbuf_data( nb, (void *) & pcPageData, & usLength );
        ... // transfer data to a temporary file to be analyzed later.
        netbuf_delete(nb);
    }
    #if LWIP_SO_RCVTIMEO
    if (ps_Connection->err == ERR_TIMEOUT)
    {
        DestroyConnection(& ps_Connection);
        return ERR_TIMEOUT;
    }
    #endif
    DestroyConnection(& ps_Connection);

And

    void DestroyConnection(struct netconn ** pps_Connection)
    /// \brief Destroy active connection
    /// \param pps_Connection pointer to pointer to connection
    {
        if (pps_Connection == NULL) return;
        netconn_close(* pps_Connection);
        while(netconn_delete(* pps_Connection) != 0)
        {
            vTaskDelay(DELAY_TO_WAIT_DISPOSE_CONNECTION);
        }
        * pps_Connection = NULL;
    }

3) UDP Debug Client (this task sends data to a remote client).
    conn = netconn_new(NETCONN_UDP);
    if (conn != NULL)
    {
        nb = netbuf_new();
        netconn_connect(conn, &ipaddr, ti_UdpPortDebug.i);
        while (xQueueReceive(xQueueUdpDebug, & s_Block, 0) == pdTRUE)
        {
            sprintf(pcBitmaskCode, "####### Code: %02x - %02x #######\r\n", s_Block.ucClassCode, s_Block.ucSubClassCode);
            netbuf_ref(nb, pcBitmaskCode, strlen(pcBitmaskCode));
            cError = netconn_send(conn, nb);
            vTaskDelay(10);
            int len = strlen(s_Block.pcTextBloc);
            int i = 0;
            while ((len - i) > 1000)
            {
                netbuf_ref(nb, (char *) & s_Block.pcTextBloc[i], (unsigned short)1000);
                cError = netconn_send(conn, nb);
                i += 1000;
                vTaskDelay(20);
            }
            if (len - i)
            {
                netbuf_ref(nb, (char *) & s_Block.pcTextBloc[i], (unsigned short)(len - i));
                cError = netconn_send(conn, nb);
                vTaskDelay(20);
            }
            netbuf_ref(nb, "\r\n\r\n", 4);
            cError = netconn_send(conn, nb);
            vTaskDelay(5);
            vPortFree(s_Block.pcTextBloc);
            vTaskDelay(5);
        }
        netconn_disconnect(conn);
        netbuf_free(nb);
        netbuf_delete(nb);
    }

Forget the s_Block data for the moment; it is a structure used to enqueue a debug message text.

4) UDP Configuration Server

    portTASK_FUNCTION( vBasicUDPCOMSERVER, pvParameters )
    {
        struct udp_pcb *connUdp;
        err_t myError;

        connUdp = udp_new();
        myError = udp_bind(connUdp, IP_ADDR_ANY, UDPCOMNET_PORT);
        udp_recv(connUdp, Server_udp_recv, NULL);
        cUDPTxBuffer[0] = myError;
        // Loop forever
        for( ;; )
        {
            vTaskDelay(1000);
            __asm__ __volatile__("nop");
        }
    }

    void Server_udp_recv(void *_args, struct udp_pcb *upcb, struct pbuf *pBuffUdp, struct ip_addr *Remoteaddr, u16_t Remote_port_udp)
    {
        int uiUdpLenMessage= 0;
        if(pBuffUdp!= NULL)
        {
            .. // message analyzed
            udp_sendto(...); // send the answer
            pbuf_free(pBuffUdp);
        }
    }

5) UDP Management Server

Exactly the same as the UDP Configuration Server.

Anyway, during the web server lock, the UDP servers and client were not in use.

To my knowledge, nothing else is using lwIP. Where do I have to look for unknown low-level lwIP access? Any further clever ideas about all these problems?
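One place that may be worth checking: the UDP Configuration Server in item 4 calls udp_new(), udp_bind() and udp_recv() directly from a FreeRTOS task, and these belong to the raw API rather than the netconn API. The sketch below is a toy, single-threaded model of handing such a call to one worker context through a mailbox instead of making it from the calling task. It is not lwIP code and has no locking; the names mbox, mbox_post, mbox_drain, bind_request and do_bind are hypothetical, and the port number is made up. In lwIP itself the real hand-off to tcpip_thread is, as far as I know, done with tcpip_callback().

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of "queue work for one thread". NOT lwIP code: every name is
 * hypothetical and there is no locking. A real port would use the RTOS
 * mailbox and a real worker thread. */
#define MBOX_SIZE 8

typedef void (*work_fn)(void *ctx);

struct work_item {
    work_fn fn;
    void *ctx;
};

struct mbox {
    struct work_item items[MBOX_SIZE];
    size_t head;    /* index of the next item to run */
    size_t count;   /* number of items waiting */
};

void mbox_init(struct mbox *m)
{
    m->head = 0;
    m->count = 0;
}

/* Called by a task that wants work done: only enqueues, never runs it.
 * Returns 0 on success, -1 if the mailbox is full or fn is NULL. */
int mbox_post(struct mbox *m, work_fn fn, void *ctx)
{
    if (fn == NULL || m->count == MBOX_SIZE) {
        return -1;
    }
    size_t tail = (m->head + m->count) % MBOX_SIZE;
    m->items[tail].fn = fn;
    m->items[tail].ctx = ctx;
    m->count++;
    return 0;
}

/* Called only by the single worker: runs everything queued so far, in
 * order. Returns the number of items executed. */
int mbox_drain(struct mbox *m)
{
    int ran = 0;
    while (m->count > 0) {
        struct work_item it = m->items[m->head];
        m->head = (m->head + 1) % MBOX_SIZE;
        m->count--;
        assert(it.fn != NULL);
        it.fn(it.ctx);
        ran++;
    }
    return ran;
}

/* Example work item: stands in for a setup step such as binding a UDP port. */
struct bind_request {
    int port;   /* hypothetical port number */
    int done;   /* set to 1 once the worker has handled the request */
};

void do_bind(void *ctx)
{
    struct bind_request *req = ctx;
    req->done = 1;   /* a real callback would call into the stack here */
}
```

A task would fill in a bind_request, call mbox_post(&m, do_bind, &req) and wait until req.done is set, while the worker alone calls mbox_drain(). The point of the pattern is that the stack's internal lists are only ever touched from one context.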
Sorry for boring you with such a huge e-mail.

Best regards
Davide

-----Original Message-----
On Tue, 2011-03-22 at 16:41 +0100, Tazzari Davide wrote:
> > Anyone has ever seen such a problem?

It sounds like you're corrupting internal stack state by having more than one thread active in lwIP's core at the same time. This would also explain tcpip_thread being stalled, as it is probably stuck in a loop iterating a corrupt list.

> Any suggestion on how to solve it?

Make sure that only one thread is active in lwIP at once. This should in your case be the tcpip_thread. All other threads (including interrupts) should make sure they're not calling directly into lwIP and are instead queueing work for the tcpip_thread to perform for them. If you're using the sockets API then most of this will be done for you, but you still need to be careful; you can't use one socket in two different threads, for example.

Make sure your driver is interfacing to lwIP correctly, as that was a common source of porting errors.

Kieran

_______________________________________________
lwip-users mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/lwip-users