From: J.P.
Subject: bug#54458: 27.2; erc-dcc-get: Re-entering top level after C stack overflow
Date: Mon, 28 Mar 2022 05:08:56 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

Hi Mattias (and Eli),

Mattias Engdegård <mattiase@acm.org> writes:

> 27 mars 2022 kl. 22.54 skrev Mattias Engdegård <mattiase@acm.org>:
>
>> Not sure where this happens but it looks like it might be 
>> erc-dcc-send-filter.
>
> Presumably we should add a note to the documentation of process filter
> functions that they shouldn't be used to send more data to the process. (Not
> sure what to recommend, an idle timer maybe?)

As you highlighted up thread, inhibited sends appeal to
`wait_reading_process_output' to try to shake something loose. I agree
such occasions seem likeliest to trigger "unexpected" filter nesting.

As far as best practices go, I'm not sure how a successful
request-response dialog can happen without participants being able to
react in a timely fashion. If the main worry is stack growth, then
perhaps scheduling something on the event loop with a timer (as you say)
makes the most sense.

I actually tried that (via `run-at-time') in the tinkering detailed
below but still managed to incur "error running timer" messages that
referred to "excessive variable binding." Should I have used something
in the "idle" department instead?

The approach that seemed to "work" mimics the one relied on by ERC's
main client filter, which takes pains to ensure overlapping calls merely
stash the input and bail (via `erc-server-processing-p'). Perhaps a
synchronization primitive especially suited to process filters would
make more sense? (No idea.)
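
Schematically, that pattern amounts to something like the following
(identifiers invented; `my-handle-input' stands in for the real
dispatch). Nested invocations only append to the pending data and
return; the outermost call drains it in a loop:

    (defvar-local my-processing-p nil)
    (defvar-local my-pending-data "")

    (defun my-filter (proc string)
      (with-current-buffer (process-buffer proc)
        (setq my-pending-data (concat my-pending-data string))
        (unless my-processing-p          ; nested call: stash and bail
          (setq my-processing-p t)
          (unwind-protect
              (while (> (length my-pending-data) 0)
                (let ((data my-pending-data))
                  (setq my-pending-data "")
                  (my-handle-input proc data))) ; may re-enter us
            (setq my-processing-p nil)))))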

> I'm not going to fix this because I don't know ERC very well and wouldn't be
> able to test it sufficiently, but our ERC maintainers do and can!

You're very generous, and your expertise is much appreciated. (But as
Eli says, "hopefully".)


                              . . .

Hi Fernando,

I've managed to trigger behavior (somewhat) resembling what you've
described. It's quite possible, even likely, that this is total baloney,
but please humor me anyway this one round (unless you spot something
totally egregious, that is). The basic "hypothesis" seems to comport
with Mattias's analysis. It posits that the peer you're connecting to is
misbehaving and engaged in some combination of:

 1. not reading frequently enough amid sends

 2. being too stingy with its read buffer

It'd be great if you could confirm this suspicion by checking whether
the "window" portion of the TCP ACK segments containing actual payloads
goes to zero after a spell. This can be done with tcpdump or Wireshark.
A well-behaved peer respecting the protocol should normally advertise
nonzero windows throughout.
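
If it helps, something along these lines should do (adjust the
interface and port to taste):

    tcpdump -nn -i lo 'tcp port 9899'

with the telltale sign being data-bearing segments from the peer
advertising "win 0".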

On the client side (Emacs master this time), here is what I observe
after following the steps enumerated further down: There appears to be
another buffer of around 64KB that needs filling before I (the receiver)
encounter the error, which in my case is a "variable binding depth
exceeds max-specpdl-size" message along with an unresponsive UI. For me,
this happens without fail at around 800MB, after the TCP window has
gone to 0 (at around 100MB). Strangely, increasing the value of
`max-specpdl-size' doesn't change things perceptibly.

Anyway, the file-writing operation continues for around 200MB and
eventually peters out. But the connection only dies after the sender
closes it normally (having sent everything). The IRC connection of
course PINGs out and is severed by the IRC server. The Emacs receiver
process eventually recovers responsiveness (if you wait long enough).

These are the steps I followed. They require two emacs -Q instances, an
IRC server, and the attached script:

 1. connect to the server and make sure the sender can /whois the
    receiver (have them join the same #chan if necessary)
 
 2. start the script:

    python ./serve.py ./some_large_file.bin misbehave
 
 3. on the sender:

    /msg RecvNick ^ADCC SEND some_large_file.bin 2130706433 9899 1234567890^A

    where 1234567890 is the size of the file in bytes, ^A is an actual
    control char, and 2130706433 is 127.0.0.1 expressed as a decimal
    32-bit integer (9899 being the port the script listens on)
 
 4. on the receiver:

    /dcc get SendNick some_large_file.bin

As mentioned earlier, I've attached a crude experiment (patch) that just
records whether we're currently sending (so receipts can be skipped when
the flag is set). I used the process plist for now, but erc-dcc does
keep a global context-state object called `erc-dcc-entry-data', which I
suppose may be more fitting. The idea is to roll with the punches from a
pathological peer but also (of course) interoperate correctly with an
obedient, protocol-abiding one.
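
In case the intent isn't obvious from the diff, the gist is roughly
this (simplified, with invented names; see the attached patch for the
real thing):

    (defvar-local my-received 0)

    (defun my-pack-uint32 (n)
      "Encode N as the 4-byte network-order count DCC GET reports."
      (unibyte-string (logand (ash n -24) #xff)
                      (logand (ash n -16) #xff)
                      (logand (ash n -8) #xff)
                      (logand n #xff)))

    (defun my-dcc-get-filter (proc string)
      (with-current-buffer (process-buffer proc)
        (insert string)                   ; the file data proper
        (setq my-received (+ my-received (length string)))
        ;; Skip the receipt while a previous send is still in flight,
        ;; so a blocking send can't nest us into another blocking send.
        (unless (process-get proc :my-sending-p)
          (process-put proc :my-sending-p t)
          (unwind-protect
              (process-send-string proc (my-pack-uint32 my-received))
            (process-put proc :my-sending-p nil)))))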

Normal sender
- Sends 1405135128 bytes
- Sees 343051 reports

Aberrant sender
- Sends 1405135128 bytes
- Ignores 238662 + unknown reports

Let me know if anything needs clarifying. And thanks for your patience!

Attachment: serve.py
Description: Binary data

Attachment: 0001-EXPERIMENT-regulate-ACK-updates-in-erc-dcc-get-filte.patch
Description: Text Data

