From: Geoff Simmons
Subject: [lwip-users] Raw TCP, intermittent long delays in accept after close
Date: Tue, 4 Oct 2022 19:16:40 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.3.0

Hello,

I'm a new subscriber, and am working on my first "serious" LWIP project (meaning more than just a sample). This is an HTTP server for the Raspberry Pi PicoW, using the raw TCP API, accessed via the Pico C SDK (in which LWIP is a git submodule):

https://gitlab.com/slimhazard/picow_http

It's going well, except for one problem that has me stumped after trying to fix it for days. If a client attempts to connect shortly after a number of connections were closed, then intermittently (not always, but fairly often) the accept process stalls for a long time -- seemingly as long as 10 seconds, maybe more.

So I'm hoping that someone on the list can help spot the error.

When this happens, I see sequences like this in debug output:

TCP connection request 40270 -> 80.
tcp_enqueue_flags: queueing 27040:27041 (0x12)
tcp_output_segment: 27040:27040
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_output_segment: 27040:27040
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_slowtmr: processing active pcb
tcp_output_segment: 27040:27040
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_output_segment: 27040:27040
TCP connection established 40270 -> 80.

The pattern is always:

- "TCP connection request"
- tcp_enqueue_flags with a range of 1 ("queueing n:(n+1)"), always with the hex value 0x12
- this sequence, repeated many times:
  - "tcp_slowtmr: processing active pcb"
  - "tcp_slowtmr: polling application"
  - "tcp_output: nothing to send (00000000)",

tcp_output_segment with range 0 ("n:n") is interspersed in the repeating sequence. After the stall comes "TCP connection established", and then everything proceeds normally. With long timeouts on the client side, all of the requests succeed, despite the long stall.

All of this happens before the tcp_accept callback is invoked. When the stalls happen, I see the client side sending SYN retransmissions in Wireshark. I haven't noticed anything else unusual in the capture (of course it's easy to overlook something).
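
For context, the listener is set up in what I understand to be the usual raw-API way. This is a simplified sketch, not the actual code from the repo (the names http_accept and http_listen are illustrative):

    #include "lwip/tcp.h"

    /* Illustrative accept callback; the real one is in picow_http. */
    static err_t
    http_accept(void *arg, struct tcp_pcb *client, err_t err)
    {
            LWIP_UNUSED_ARG(arg);
            if (err != ERR_OK || client == NULL)
                    return ERR_VAL;
            /* tcp_arg()/tcp_recv()/tcp_sent()/tcp_err()/tcp_poll()
               are registered for the new connection here. */
            return ERR_OK;
    }

    static err_t
    http_listen(void)
    {
            struct tcp_pcb *pcb = tcp_new_ip_type(IPADDR_TYPE_ANY);
            if (pcb == NULL)
                    return ERR_MEM;
            err_t err = tcp_bind(pcb, IP_ANY_TYPE, 80);
            if (err != ERR_OK) {
                    tcp_close(pcb); /* frees an unbound pcb */
                    return err;
            }
            /* tcp_listen_with_backlog() frees pcb and returns a new,
               smaller listener pcb. */
            pcb = tcp_listen_with_backlog(pcb, TCP_DEFAULT_LISTEN_BACKLOG);
            if (pcb == NULL)
                    return ERR_MEM;
            tcp_accept(pcb, http_accept);
            return ERR_OK;
    }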

I usually see this when repeating a test script that sends a few dozen requests. There's no stall on the first connection after server startup. There's also no stall if I wait long enough between sending batches of requests. But if I run the test script and then start it again shortly afterward, it can stall for quite a while on the second run.

During a stall, I see MEM TCP_PCB stats showing "used" == "max", i.e. all tcp_pcbs in the pool are used. I assume that after all connections are closed following a series of requests, they *should* be in TIME_WAIT, and that for the next connection, the oldest PCB in TIME_WAIT gets re-used. I have seen tcp debug output saying exactly that.
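
For reference, this is roughly how I read the pool stats (assuming LWIP_STATS and MEMP_STATS are enabled in lwipopts.h; the struct layout may differ between lwIP versions):

    #include <stdio.h>
    #include "lwip/stats.h"

    /* Print TCP PCB pool usage; a sketch, not the code in the repo. */
    void
    print_tcp_pcb_stats(void)
    {
            const struct stats_mem *s = lwip_stats.memp[MEMP_TCP_PCB];

            printf("TCP_PCB: avail=%u used=%u max=%u err=%u\n",
                   (unsigned)s->avail, (unsigned)s->used,
                   (unsigned)s->max, (unsigned)s->err);
    }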

But I suspect that my application code is not doing everything right about closing connections. Bearing in mind that there's a lot I don't know about LWIP, this hypothesis may be nonsense -- the feeling is that I have discarded a connection, thinking that it is fully closed and should be in TIME_WAIT, when it isn't. Then on the next client connection, the PCB thinks it still needs to send something like an ACK or FIN, and stalls while doing so, which would account for the long sequence of "processing active pcb" and "nothing to send". Eventually (because a timeout elapses?) the PCB gives up and the accept can proceed.
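
For what it's worth, my close path follows what I believe is the canonical raw-API pattern, roughly like this (a simplified sketch; the actual code is in the repo linked above):

    /* Unhook all callbacks, then close, falling back to abort. */
    static void
    http_close(struct tcp_pcb *pcb)
    {
            tcp_arg(pcb, NULL);
            tcp_recv(pcb, NULL);
            tcp_sent(pcb, NULL);
            tcp_err(pcb, NULL);
            tcp_poll(pcb, NULL, 0);

            if (tcp_close(pcb) != ERR_OK) {
                    /* tcp_close() can fail (e.g. out of memory);
                       tcp_abort() always frees the pcb (sends RST). */
                    tcp_abort(pcb);
            }
    }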

Does the (0x12) in the tcp_enqueue_flags debug output refer to a PCB's tcp flags? If so, then the value would be TF_RXCLOSED | TF_ACK_NOW, which doesn't "sound right" for a PCB being used for an incoming connection. Or does it refer to the TCP header flags of the queued segment, in which case 0x12 would be SYN|ACK? Is that significant?
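
These are the bit values as I read them from lwIP's headers (worth double-checking against the version in the SDK's submodule):

    /* TCP header flags, from tcp_priv.h: */
    #define TCP_FIN 0x01U
    #define TCP_SYN 0x02U
    #define TCP_RST 0x04U
    #define TCP_PSH 0x08U
    #define TCP_ACK 0x10U
    /* so 0x12 == TCP_SYN | TCP_ACK, i.e. a SYN-ACK */

    /* PCB flags, also from tcp_priv.h: */
    #define TF_ACK_DELAY 0x01U
    #define TF_ACK_NOW   0x02U
    #define TF_RXCLOSED  0x10U
    /* so 0x12 == TF_RXCLOSED | TF_ACK_NOW under that reading */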

Some things I've tried to fix the problem, none of which have succeeded:

- Wait until all bytes of a sent response have been ACKed (using the tcp_sent callback; see the sketch after this list) before closing. This may mean that HTTP request pipelining is not possible. And in any case, it hasn't helped.

- Increase MEMP_NUM_TCP_PCB. But it doesn't seem to matter: if there have been enough requests that all of the PCBs are used (and should be in TIME_WAIT), then the same thing happens. When I "wait long enough" between batches of requests, 6 PCBs are enough (I've tried up to 24).

- Increase MEM_SIZE. MEM HEAP stats show very consistently that it never needs more than 4800 bytes.

- Increase MEMP_NUM_TCP_SEG from 32 to 64. A bit of a desperation move, since I don't understand exactly what it does; at any rate, it didn't help. (The settings I've been changing are collected in the lwipopts.h excerpt after this list.)
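
Here's a sketch of the first attempt above (waiting for the full ACK before closing); this is simplified and the names are illustrative, not the actual code in the repo:

    /* Illustrative per-connection state. */
    struct conn {
            u32_t unacked;  /* bytes sent but not yet ACKed */
            int   done;     /* nonzero when the full response is written */
    };

    /* tcp_sent callback: lwIP reports len bytes newly ACKed. */
    static err_t
    http_sent(void *arg, struct tcp_pcb *pcb, u16_t len)
    {
            struct conn *c = (struct conn *)arg;

            c->unacked -= len;
            if (c->done && c->unacked == 0) {
                    /* Response fully ACKed, now close. */
                    tcp_arg(pcb, NULL);
                    tcp_sent(pcb, NULL);
                    if (tcp_close(pcb) != ERR_OK)
                            tcp_abort(pcb);
            }
            return ERR_OK;
    }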
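
And the lwipopts.h settings mentioned above (the values here are examples only, to show which knobs I've been turning, not my exact configuration):

    /* Excerpt from lwipopts.h -- illustrative values. */
    #define MEMP_NUM_TCP_PCB  24    /* tried 6 up to 24 */
    #define MEMP_NUM_TCP_SEG  64    /* raised from 32, no change */
    #define MEM_SIZE          8000  /* observed heap high-water mark: 4800 */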

Sorry for the long introductory post; I'm trying to cover what I think I've understood about the problem. I assume that I've misunderstood something about the TCP API, and that someone can set me straight.


Thanks,
Geoff


