lwip-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lwip-devel] [bug #62141] LWIP crashes in tcp_close(), freeing the same


From: Ryan
Subject: [lwip-devel] [bug #62141] LWIP crashes in tcp_close(), freeing the same PCB twice
Date: Fri, 4 Mar 2022 21:17:24 -0500 (EST)

URL:
  <https://savannah.nongnu.org/bugs/?62141>

                 Summary: LWIP crashes in tcp_close(), freeing the same PCB
twice
                 Project: lwIP - A Lightweight TCP/IP stack
            Submitted by: rwyc
            Submitted on: Sat 05 Mar 2022 02:17:22 AM UTC
                Category: TCP
                Severity: 3 - Normal
              Item Group: Crash Error
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None
            lwIP version: 2.1.3

    _______________________________________________________

Details:

Hi, 

I'm occasionally seeing a crash in my application where LWIP is freeing the
same PCB twice. 
This seems to be similar to the behavior reported in [bug
#62071](https://savannah.nongnu.org/bugs/?62071). 

Here's a backtrace of when the crash happened:

```
#0  0x000285c4 in ParmCrash (parm=parm@entry=0, pCond=pCond@entry=0xc54e4 "",
pText=pText@entry=0xc54e4 "",
    pFileName=pFileName@entry=0xae0f0 "src/core/memp.c",
lineNum=lineNum@entry=389)
    at <our_specific_file>/fatal_error.c:97
#1  0x00028638 in do_memp_free_pool (desc=<optimized out>,
mem=mem@entry=0x20012c90 <memp_memory_TCP_PCB_base+20664>)
    at src/core/memp.c:389
#2  0x0002875c in memp_free (type=type@entry=MEMP_TCP_PCB,
mem=mem@entry=0x20012c90 <memp_memory_TCP_PCB_base+20664>)
    at src/core/memp.c:440
#3  0x0007a1ce in tcp_free (pcb=<optimized out>) at src/core/tcp.c:216
#4  tcp_close_shutdown (rst_on_unacked_data=<optimized out>, pcb=<optimized
out>) at src/core/tcp.c:370
#5  tcp_close (pcb=0x20012c90 <memp_memory_TCP_PCB_base+20664>,
pcb@entry=0x2002119c <ucHeap+56860>)
    at src/core/tcp.c:498
#6  0x000834f4 in lwip_netconn_do_close_internal (conn=0x200073dc
<memp_memory_NETCONN_base+528>, delayed=delayed@entry=0 '\000')
    at src/api/api_msg.c:1012
#7  0x00084228 in lwip_netconn_do_close (m=0x20021344 <ucHeap+57284>) at
src/api/api_msg.c:1991
#8  0x000844de in tcpip_send_msg_wait_sem (fn=0x841c9 <lwip_netconn_do_close>,
apimsg=apimsg@entry=0x20021344 <ucHeap+57284>, sem=<optimized out>)
    at src/api/tcpip.c:442
#9  0x00082faa in netconn_apimsg (apimsg=0x20021344 <ucHeap+57284>,
fn=<optimized out>) at src/api/api_lib.c:131
#10 netconn_close_shutdown (how=3 '\003', conn=<optimized out>) at
src/api/api_lib.c:1112
#11 netconn_close (conn=<optimized out>) at src/api/api_lib.c:1129
```

This is on the server side, where the application layer (using `netconn`) is
not notified of the connection's PCB being freed, which leads to the same PCB
being freed twice. 

In my application, I have an http server that does the following using the
`netconn` API (code below is simplified but should suffice to indicate how I'm
using the library):
```
netconn_accept() // Accept an incoming connection
netconn_recv() // Read data from connection
netconn_shutdown() // Initiate shutdown with only shut_tx = true
netconn_recv() // Dummy read to gracefully close connection
netconn_close() // Close the connection
netconn_delete() // Delete the connection
```

The following sequence of events could happen:
* Before the server shuts down the TX side of the connection, a `FIN` is
received from the client.   
* At this point, the TCP thread will send a `TCP_EVENT_CLOSED` message to the
connection's recv mailbox   
* The server then shuts down the TX side of the connection
* When the application performs the dummy read, the dummy read will initiate a
closing of the RX side of the connection.  
* Before the connection is closed by the server using `netconn_close()`, the
`LAST_ACK` is received by the server. 
* This calls `tcp_input_delayed_close` which then frees the PCB. 
    * However, because both the RX and TX side of the connection is already
closed, the application is never notified that the PCB is freed. 
* Finally, the application calls `netconn_close()` which will then try and
free the PCB again, leading to the crash. 

```
netconn_accept() // Accept an incoming connection
netconn_recv() // Read data from connection
// <--------- FIN recved, sending TCP_EVENT_CLOSED message
netconn_shutdown() // Initiate shutdown with only shut_tx = true 
                   // TX side is now closed
netconn_recv() // Dummy read to gracefully close connection
               // TCP_EVENT_CLOSED message received, RX side is now closed
// <--------- LAST_ACK recved, free PCB
netconn_close() // Close the connection
                // Free PCB again (assert thrown)
```

Below are some logs that show this sequence of events (these were log
statements that I added). 
The numbers added are the pointer values of the connection and PCB objects in
our application.
```
{0:00:01:20.531} [err] src/api/api_msg.c(573): New connection with conn:
200073dc pcb: 20012c90
{0:00:01:20.532} [err] src/core/tcp_in.c(541): Got FIN for pcb 20012c90, send
event closed
{0:00:01:20.580} [err] src/api/api_lib.c(1096): netconn_close_shutdown conn:
200073dc: pcb: 20012c90 how: 2
{0:00:01:20.581} [err] src/api/api_lib.c(1096): netconn_close_shutdown conn:
200073dc: pcb: 20012c90 how: 1
{0:00:01:20.581} [err] src/core/tcp_in.c(1039): Got LAST_ACK for pcb 20012c90
{0:00:01:20.581} [err] src/core/tcp_in.c(615): tcp_input_delayed_close: free
pcb 20012c90
{0:00:01:20.582} [err] src/api/api_lib.c(1096): netconn_close_shutdown conn:
200073dc: pcb: 20012c90 how: 3
{0:00:01:20.582} Crash: 0, "", "", file src/core/memp.c, line 389
```

Combining with the `netconn` sequence above, what happens is:

```
netconn_accept() // Accepts an incoming connection
                 // {0:00:01:20.531} [err] src/api/api_msg.c(573): New
connection with conn: 200073dc pcb: 20012c90
netconn_recv() // Read data from connection
// <--------- FIN recved, sending TCP_EVENT_CLOSED message
// {0:00:01:20.532} [err] src/core/tcp_in.c(541): Got FIN for pcb 20012c90,
send event closed
netconn_shutdown() // Initiate shutdown with only shut_tx = true 
                   // TX side is now closed
                   // {0:00:01:20.580} [err] src/api/api_lib.c(1096):
netconn_close_shutdown conn: 200073dc: pcb: 20012c90 how: 2
netconn_recv() // Dummy read to gracefully close connection
               // TCP_EVENT_CLOSED message received, RX side is now closed
               // {0:00:01:20.581} [err] src/api/api_lib.c(1096):
netconn_close_shutdown conn: 200073dc: pcb: 20012c90 how: 1
// <--------- LAST_ACK recved, free PCB
// {0:00:01:20.581} [err] src/core/tcp_in.c(1039): Got LAST_ACK for pcb
20012c90
// {0:00:01:20.581} [err] src/core/tcp_in.c(615): tcp_input_delayed_close:
free pcb 20012c90
netconn_close() // Close the connection
                // Free PCB again (assert thrown)
                // {0:00:01:20.582} [err] src/api/api_lib.c(1096):
netconn_close_shutdown conn: 200073dc: pcb: 20012c90 how: 3
                // {0:00:01:20.582} Crash: 0, "", "", file src/core/memp.c,
line 389
```

In most instances, the dummy read doesn't happen fast enough to close the RX
side of the connection before the `LAST_ACK` is received, which frees the PCB
and notifies the application about it. It's only in these rare instances that
the TX and RX side of the connection is already closed before the `LAST_ACK`
is received where the application is not notified of the PCB being freed.

I've created a small application (attached) that generates 
requests to our server and immediately shuts down the connection, then waits a
bit before performing the final close. Executing this continually will cause
my application to crash almost immediately (in my case maybe after 50 or so
requests?).

I'm not sure whether this is a problem in `tcp_input_delayed_close`, where we
should always notify the application layer that the PCB has been freed, or
whether it's something that should be fixed in `netconn`. I've placed this in
the TCP category for now. 

In the meantime, I've updated our code to always notify the application layer
when the PCB is freed to get around this problem:

```
--- a/src/core/tcp_in.c
+++ b/src/core/tcp_in.c
@@ -605,14 +605,7 @@ tcp_input_delayed_close(struct tcp_pcb *pcb)
   if (recv_flags & TF_CLOSED) {
     /* The connection has been closed and we will deallocate the
         PCB. */
-    if (!(pcb->flags & TF_RXCLOSED)) {
-      /* Connection closed although the application has only shut down the
-          tx side: call the PCB's err callback and indicate the closure to
-          ensure the application doesn't continue using the PCB. */
-      TCP_EVENT_ERR(pcb->state, pcb->errf, pcb->callback_arg, ERR_CLSD);
-    }
+    TCP_EVENT_ERR(pcb->state, pcb->errf, pcb->callback_arg, ERR_CLSD);
     tcp_pcb_remove(&tcp_active_pcbs, pcb);
```

This seems to have resolved the problem from my testing but I'm not familiar
enough with the LWIP code to know whether this is the correct change. Would
appreciate any input on how this should be resolved! 



    _______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Sat 05 Mar 2022 02:17:23 AM UTC  Name: main.cpp  Size: 676B   By: rwyc

<http://savannah.nongnu.org/bugs/download.php?file_id=52956>

    _______________________________________________________

Reply to this item at:

  <https://savannah.nongnu.org/bugs/?62141>

_______________________________________________
  Message sent via Savannah
  https://savannah.nongnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]