From: Geoff Simmons
Subject: [lwip-users] Raw TCP, intermittent long delays in accept after close
Date: Tue, 4 Oct 2022 19:16:40 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.3.0

Hello,

I'm a new subscriber, and am working on my first "serious" LWIP project (meaning more than just a sample). This is an HTTP server for the Raspberry Pi PicoW, using the raw TCP API, accessed via the Pico C SDK (in which LWIP is a git submodule):

https://gitlab.com/slimhazard/picow_http

It's going well, except for one problem that has me stumped after trying to fix it for days. If a client attempts to connect shortly after a number of connections were closed, then intermittently (not always, but fairly often) the accept process stalls for a long time -- seemingly as long as 10 seconds, maybe more.

So I'm hoping that someone on the list can help spot the error.

When this happens, I see sequences like this in debug output:

TCP connection request 40270 -> 80.
tcp_enqueue_flags: queueing 27040:27041 (0x12)
tcp_output_segment: 27040:27040
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_output_segment: 27040:27040
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_slowtmr: processing active pcb
tcp_output_segment: 27040:27040
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_output_segment: 27040:27040
TCP connection established 40270 -> 80.

The pattern is always:

- "TCP connection request"
- tcp_enqueue_flags with a range of 1 ("queueing n:(n+1)"), always with the hex value 0x12
- this sequence, repeated many times:
  - "tcp_slowtmr: processing active pcb"
  - "tcp_slowtmr: polling application"
  - "tcp_output: nothing to send (00000000)",

tcp_output_segment with range 0 ("n:n") is interspersed in the repeating sequence. After the stall comes "TCP connection established", and then everything proceeds normally. With long timeouts on the client side, all of the requests succeed, despite the long stall.

All of this happens before the tcp_accept callback is invoked. When the stalls happen, I see the client side sending SYN retransmissions in Wireshark. I haven't noticed anything else unusual in the capture (of course it's easy to overlook something).
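
For context, the listener is set up in what I understand to be the usual raw-API way. This is a simplified sketch, not the actual code from the repo (the names http_accept and http_listen are illustrative):

    #include "lwip/tcp.h"

    /* Illustrative accept callback; the real one is in picow_http. */
    static err_t
    http_accept(void *arg, struct tcp_pcb *client, err_t err)
    {
            LWIP_UNUSED_ARG(arg);
            if (err != ERR_OK || client == NULL)
                    return ERR_VAL;
            /* tcp_arg()/tcp_recv()/tcp_sent()/tcp_err()/tcp_poll()
               are registered for the new connection here. */
            return ERR_OK;
    }

    static err_t
    http_listen(void)
    {
            struct tcp_pcb *pcb = tcp_new_ip_type(IPADDR_TYPE_ANY);
            if (pcb == NULL)
                    return ERR_MEM;
            err_t err = tcp_bind(pcb, IP_ANY_TYPE, 80);
            if (err != ERR_OK) {
                    tcp_close(pcb); /* frees an unbound pcb */
                    return err;
            }
            /* tcp_listen_with_backlog() frees pcb and returns a new,
               smaller listener pcb. */
            pcb = tcp_listen_with_backlog(pcb, TCP_DEFAULT_LISTEN_BACKLOG);
            if (pcb == NULL)
                    return ERR_MEM;
            tcp_accept(pcb, http_accept);
            return ERR_OK;
    }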

I usually see this when repeating a test script that sends a few dozen requests. There's no stall on the first connection after server startup. There's also no stall if I wait long enough between sending batches of requests. But if I run the test script and then start it again shortly afterward, it can stall for quite a while on the second run.

During a stall, I see MEM TCP_PCB stats showing "used" == "max", i.e. all tcp_pcbs in the pool are used. I assume that after all connections are closed following a series of requests, they *should* be in TIME_WAIT, and that for the next connection, the oldest PCB in TIME_WAIT gets re-used. I have seen tcp debug output saying exactly that.
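
For reference, this is roughly how I read the pool stats (assuming LWIP_STATS and MEMP_STATS are enabled in lwipopts.h; the struct layout may differ between lwIP versions):

    #include <stdio.h>
    #include "lwip/stats.h"

    /* Print TCP PCB pool usage; a sketch, not the code in the repo. */
    void
    print_tcp_pcb_stats(void)
    {
            const struct stats_mem *s = lwip_stats.memp[MEMP_TCP_PCB];

            printf("TCP_PCB: avail=%u used=%u max=%u err=%u\n",
                   (unsigned)s->avail, (unsigned)s->used,
                   (unsigned)s->max, (unsigned)s->err);
    }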

But I suspect that my application code is not doing everything right about closing connections. Bearing in mind that there's a lot I don't know about LWIP, this hypothesis may be nonsense -- the feeling is that I have discarded a connection, thinking that it is fully closed and should be in TIME_WAIT, when it isn't. Then on the next client connection, the PCB thinks it still needs to send something like an ACK or FIN, and stalls while doing so, which would account for the long sequence of "processing active pcb" and "nothing to send". Eventually (because a timeout elapses?) the PCB gives up and the accept can proceed.
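
For what it's worth, my close path follows what I believe is the canonical raw-API pattern, roughly like this (a simplified sketch; the actual code is in the repo linked above):

    /* Unhook all callbacks, then close, falling back to abort. */
    static void
    http_close(struct tcp_pcb *pcb)
    {
            tcp_arg(pcb, NULL);
            tcp_recv(pcb, NULL);
            tcp_sent(pcb, NULL);
            tcp_err(pcb, NULL);
            tcp_poll(pcb, NULL, 0);

            if (tcp_close(pcb) != ERR_OK) {
                    /* tcp_close() can fail (e.g. out of memory);
                       tcp_abort() always frees the pcb (sends RST). */
                    tcp_abort(pcb);
            }
    }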

Does the (0x12) in the tcp_enqueue_flags debug output refer to a PCB's tcp flags? If so, then the value would be TF_RXCLOSED | TF_ACK_NOW, which doesn't "sound right" for a PCB being used for an incoming connection. Or does it refer to the TCP header flags of the queued segment, in which case 0x12 would be SYN|ACK? Is that significant?
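
These are the bit values as I read them from lwIP's headers (worth double-checking against the version in the SDK's submodule):

    /* TCP header flags, from tcp_priv.h: */
    #define TCP_FIN 0x01U
    #define TCP_SYN 0x02U
    #define TCP_RST 0x04U
    #define TCP_PSH 0x08U
    #define TCP_ACK 0x10U
    /* so 0x12 == TCP_SYN | TCP_ACK, i.e. a SYN-ACK */

    /* PCB flags, also from tcp_priv.h: */
    #define TF_ACK_DELAY 0x01U
    #define TF_ACK_NOW   0x02U
    #define TF_RXCLOSED  0x10U
    /* so 0x12 == TF_RXCLOSED | TF_ACK_NOW under that reading */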

Some things I've tried to fix the problem, none of which have succeeded:

- Wait until all bytes of a sent response have been ACKed (using the tcp_sent callback; see the sketch after this list) before closing. This may mean that HTTP request pipelining is not possible. And in any case, it hasn't helped.

- Increase MEMP_NUM_TCP_PCB. But it doesn't seem to matter: if there have been enough requests that all of the PCBs are used (and should be in TIME_WAIT), then the same thing happens. When I "wait long enough" between batches of requests, 6 PCBs are enough (I've tried up to 24).

- Increase MEM_SIZE. MEM HEAP stats show very consistently that it never needs more than 4800 bytes.

- Increase MEMP_NUM_TCP_SEG from 32 to 64. A bit of a desperation move, since I don't understand exactly what it does; at any rate, it didn't help. (The settings I've been changing are collected in the lwipopts.h excerpt after this list.)
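
Here's a sketch of the first attempt above (waiting for the full ACK before closing); this is simplified and the names are illustrative, not the actual code in the repo:

    /* Illustrative per-connection state. */
    struct conn {
            u32_t unacked;  /* bytes sent but not yet ACKed */
            int   done;     /* nonzero when the full response is written */
    };

    /* tcp_sent callback: lwIP reports len bytes newly ACKed. */
    static err_t
    http_sent(void *arg, struct tcp_pcb *pcb, u16_t len)
    {
            struct conn *c = (struct conn *)arg;

            c->unacked -= len;
            if (c->done && c->unacked == 0) {
                    /* Response fully ACKed, now close. */
                    tcp_arg(pcb, NULL);
                    tcp_sent(pcb, NULL);
                    if (tcp_close(pcb) != ERR_OK)
                            tcp_abort(pcb);
            }
            return ERR_OK;
    }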
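
And the lwipopts.h settings mentioned above (the values here are examples only, to show which knobs I've been turning, not my exact configuration):

    /* Excerpt from lwipopts.h -- illustrative values. */
    #define MEMP_NUM_TCP_PCB  24    /* tried 6 up to 24 */
    #define MEMP_NUM_TCP_SEG  64    /* raised from 32, no change */
    #define MEM_SIZE          8000  /* observed heap high-water mark: 4800 */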

Sorry for the long introductory post; I'm trying to cover what I think I've understood about the problem. I assume that I've misunderstood something about the TCP API, and that someone can set me straight.


Thanks,
Geoff


