From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH v6 4/5] block/nbd: drop connection_co
Date: Mon, 27 Sep 2021 14:57:36 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0

24.09.2021 20:44, Eric Blake wrote:
On Thu, Sep 02, 2021 at 01:38:04PM +0300, Vladimir Sementsov-Ogievskiy wrote:
OK, that's a big rewrite of the logic.

And a time-consuming review on my part!


Pre-patch we have an always-running coroutine - connection_co. It does
reply receiving and reconnecting. And it leads to a lot of difficult
and unobvious code around drained sections and context switching. We also
abuse the bs->in_flight counter, which is increased for connection_co and
temporarily decreased at points where we want to allow a drained section to
begin. One of these places is in another file: in nbd_read_eof() in
nbd/client.c.

We also cancel reconnect and requests waiting for reconnect on drained
begin, which is not correct. This patch fixes that.

Let's finally drop this always running coroutine and go another way:
do both reconnect and receiving in request coroutines.

The detailed list of changes below (in the sequence of diff hunks).

Well, depending on the git order file in use ;)


1. receiving coroutines are woken directly from nbd_channel_error, when
    we change s->state
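
For illustration, point 1 roughly takes the following shape (a sketch, not the
literal patch; the exact state classification and the requests[].receiving /
.coroutine fields are assumptions based on the description):

/*
 * Sketch only: on a channel error, update s->state and directly wake
 * every request coroutine that is parked in nbd_receive_replies(), so
 * that it can notice the state change.
 */
static void nbd_channel_error(BDRVNBDState *s, int ret)
{
    if (ret == -EIO) {
        /* Connection problem: try to reconnect, with or without a delay. */
        s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT
                                      : NBD_CLIENT_CONNECTING_NOWAIT;
    } else {
        s->state = NBD_CLIENT_QUIT;    /* fatal error, give up */
    }

    for (int i = 0; i < MAX_NBD_REQUESTS; i++) {
        if (s->requests[i].receiving) {
            aio_co_wake(s->requests[i].coroutine);
        }
    }
}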

2. nbd_co_establish_connection_cancel(): we don't have drain_begin now,
    and in nbd_teardown_connection() all requests should already be
    finished (and reconnect is done from request). So
    nbd_co_establish_connection_cancel() is called from
    nbd_cancel_in_flight() (to cancel the request that is doing
    nbd_co_establish_connection()) and from reconnect_delay_timer_cb()
    (previously we didn't need it, as reconnect delay only should cancel
    active requests not the reconnection itself. But now reconnection

Missing )

    itself is done in the separate thread (we now call
    nbd_client_connection_enable_retry() in nbd_open()), and we need to
    cancel the requests that waits in nbd_co_establish_connection()

Singular/plural disagreement. I think the intended meaning is
'requests that wait' and not 'request that waits'.

    now).
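
For illustration, the timer callback presumably ends up looking roughly like
this (a sketch, not a quote of the patch; reconnect_delay_timer_del() and the
s->conn field are assumptions):

/*
 * Sketch: when the reconnect delay expires, stop waiting for the
 * reconnect and also cancel a request that may currently be blocked
 * inside nbd_co_establish_connection().
 */
static void reconnect_delay_timer_cb(void *opaque)
{
    BDRVNBDState *s = opaque;

    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
        nbd_co_establish_connection_cancel(s->conn);
    }
    reconnect_delay_timer_del(s);
}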

2. We do receive headers in request coroutine. But we also should

Second point 2; I'll call it 2A below, because it looks related to
point 8.

Ohh. OK.


    dispatch replies for another pending requests. So,

s/another/other/

    nbd_connection_entry() is turned into nbd_receive_replies(), which
    does reply dispatching until it receive another request headers, and

s/until/while/, s/another/other/

    returns when it receive the requested header.

receives


3. All the old stuff around drained sections and context switching is
    dropped. In detail:
    - we don't need to move connection_co to a new aio context, as we
      don't have connection_co anymore
    - we don't have a fake "request" of connection_co (the extra increase
      of in_flight), so we don't need to care about it in drain_begin/end
    - we don't stop reconnection during a drained section anymore. This
      means that drain_begin may wait for a long time (up to
      reconnect_delay). But that's an improvement and more correct
      behavior; see below [*]

4. In nbd_teardown_connection() we don't have to wait for
    connection_co, as it is dropped. And the cleanup of s->ioc and nbd_yank
    is moved here from the removed connection_co.

5. In nbd_co_do_establish_connection() we now should handle
    NBD_CLIENT_CONNECTING_NOWAIT: if new request comes when we are in
    NBD_CLIENT_CONNECTING_NOWAIT, it still should call
    nbd_co_establish_connection() (who knows, maybe connection already
    established by thread in background). But we shouldn't wait: if

maybe the connection was already established by another thread in the background

"another thread" sounds vague. I think better: by connection thread.


    nbd_co_establish_connection() can't return a new channel immediately,
    the request should fail (we are in the NBD_CLIENT_CONNECTING_NOWAIT
    state).
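
In other words, something along these lines (a sketch with a hypothetical
helper name; the 'blocking' parameter and the s->conn / s->info fields are
assumptions, and negotiation details are omitted):

static int coroutine_fn
nbd_do_establish_connection_sketch(BDRVNBDState *s, Error **errp)
{
    /* Only block if we are in the WAIT state; NOWAIT must not wait. */
    bool blocking = (s->state == NBD_CLIENT_CONNECTING_WAIT);

    /* The connection thread may already have connected in the background. */
    s->ioc = nbd_co_establish_connection(s->conn, &s->info, blocking, errp);
    if (!s->ioc) {
        /* NOWAIT and no ready channel: fail the request instead of waiting. */
        return -ECONNREFUSED;
    }
    return 0;
}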

6. nbd_reconnect_attempt() is simplified: it's now easier to wait for
    other requests in the caller, so here we just assert that fact.
    Also the delay timer is now initialized here: we can easily detect the
    first attempt and start the timer.

7. nbd_co_reconnect_loop() is dropped, we don't need it. Reconnect
    retries are fully handled by the thread (nbd/client-connection.c), the
    delay timer is initialized in nbd_reconnect_attempt(), and we don't
    have to bother with s->drained and friends. nbd_reconnect_attempt() is
    now called from nbd_co_send_request().

A lot going on there, but it's making sense so far.
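
To make points 6 and 7 more concrete, here is a sketch of what
nbd_reconnect_attempt() boils down to (helper names and the "no timer yet
means first attempt" detection are assumptions, not a quote of the patch):

static void coroutine_fn nbd_reconnect_attempt(BDRVNBDState *s)
{
    /* The caller guarantees that no other request is in flight. */
    assert(nbd_client_connecting(s));
    assert(s->in_flight == 0);

    if (s->state == NBD_CLIENT_CONNECTING_WAIT && s->reconnect_delay &&
        !s->reconnect_delay_timer) {
        /* No timer yet, so this is the first attempt: arm the delay timer. */
        reconnect_delay_timer_init(
            s, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
               s->reconnect_delay * NANOSECONDS_PER_SECOND);
    }

    /* Retries themselves are driven by the thread in nbd/client-connection.c. */
    nbd_co_do_establish_connection(s->bs, NULL);
}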


8. nbd_connection_entry is dropped: reconnect is now handled by
    nbd_co_send_request(), and receiving replies is now handled by
    nbd_receive_replies(): everything is handled from request coroutines.

9. So, welcome the new nbd_receive_replies(), called from request
    coroutines, which receives reply headers instead of
    nbd_connection_entry(). As with sending requests, only one coroutine
    may receive at a time. So we introduce receive_mutex, which is locked
    around nbd_receive_reply(). It also protects some related fields.
    Still, a full audit of thread-safety in the nbd driver is a separate
    task.

It sounds like you're more worried about auditing for which locks are
held longer than necessary, rather than expecting to find cases where
we aren't thread-safe because we are lacking locks?  At any rate, as
this patch is already easier to reason about, I'm okay deferring that
audit to later patches.

I just try not to introduce new non-threadsafe things. But I'm not sure whether
we have good thread-safety in the nbd driver code now.


    The new function waits until a reply with the specified handle has been
    received, and works rather simply:

    Under the mutex:
      - if the current handle is 0, do the receive by hand. If another
        handle is received, wake the other request coroutine, release the
        mutex and yield. Otherwise return success
      - if the current handle == the requested handle, we are done
      - otherwise, release the mutex and yield
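
A sketch of that logic, to make the cases explicit (this paraphrases the
description above and is not the patch itself; HANDLE_TO_INDEX(), the
requests[] bookkeeping and the lock-guard usage are assumptions):

static int coroutine_fn nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
{
    uint64_t ind = HANDLE_TO_INDEX(s, handle), ind2;

    QEMU_LOCK_GUARD(&s->receive_mutex);

    while (true) {
        if (s->reply.handle == handle) {
            return 0;                         /* our header has arrived */
        }
        if (s->state != NBD_CLIENT_CONNECTED) {
            return -EIO;
        }

        if (s->reply.handle != 0) {
            /*
             * Another request's reply is currently being processed by its
             * own coroutine: park until somebody wakes us.
             */
            s->requests[ind].receiving = true;
            qemu_co_mutex_unlock(&s->receive_mutex);
            qemu_coroutine_yield();
            qemu_co_mutex_lock(&s->receive_mutex);
            s->requests[ind].receiving = false;
            continue;
        }

        /* s->reply.handle == 0: nobody is receiving, do the work ourselves. */
        if (nbd_receive_reply(s->bs, s->ioc, &s->reply, NULL) <= 0) {
            return -EIO;
        }
        if (s->reply.handle != handle) {
            /*
             * The header belongs to another request: wake its coroutine if
             * it is already parked (otherwise it will pick the reply up when
             * it gets here), then loop and park ourselves.
             */
            ind2 = HANDLE_TO_INDEX(s, s->reply.handle);
            if (s->requests[ind2].receiving) {
                aio_co_wake(s->requests[ind2].coroutine);
            }
        }
    }
}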

10. in nbd_co_send_request() we now do nbd_reconnect_attempt() if
     needed. Also, when waiting in the free_sema queue we now wait for one
     of two conditions:
     - connectED, in_flight < MAX_NBD_REQUESTS (so we can start a new one)
     - connectING, in_flight == 0, so we can call
       nbd_reconnect_attempt()
     And this logic is protected by s->send_mutex (see the sketch below).

     Also, on failure we don't have to care about the removed
     s->connection_co
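
A sketch of that gate at the start of nbd_co_send_request() (hypothetical
helper name; nbd_client_connected()/nbd_client_connecting() and the
NBD_CLIENT_QUIT check are assumptions; the real function then goes on to fill
a request slot and send the request header):

static int coroutine_fn nbd_co_send_request_prologue(BDRVNBDState *s)
{
    qemu_co_mutex_lock(&s->send_mutex);

    while (true) {
        if (nbd_client_connecting(s) && s->in_flight == 0) {
            /* All other requests have finished: we drive the reconnect. */
            nbd_reconnect_attempt(s);
        }
        if (nbd_client_connected(s) && s->in_flight < MAX_NBD_REQUESTS) {
            break;                       /* a free slot on a live connection */
        }
        if (s->state == NBD_CLIENT_QUIT) {
            qemu_co_mutex_unlock(&s->send_mutex);
            return -EIO;
        }
        /* Drops send_mutex while sleeping, reacquires it on wakeup. */
        qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
    }

    s->in_flight++;
    qemu_co_mutex_unlock(&s->send_mutex);
    return 0;
}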

11. nbd_co_do_receive_one_chunk(): now, instead of yielding and waiting
     for s->connection_co, we just call the new nbd_receive_replies().

12. nbd_co_receive_one_chunk(): the place where s->reply.handle becomes 0,
     which means that handling of the whole reply is finished. Here we
     need to wake one of the coroutines sleeping in nbd_receive_replies().
     If now one sleeps - do nothing. That's another behavior change: we

s/now one sleeps/none are sleeping/

     don't have an endless recv() in idle time. It may be considered a
     drawback. If so, it may be fixed later.

The case where no coroutine is waiting to receive a reply should only
happen due to a malicious server (a compliant server will only send
replies matching an existing pending request).

With the new code, when we don't have any in-flight requests there are no
waiting coroutines either. And nobody will notice if the connection silently
gets lost.

With the old code, we always have a recv() call running from connection_co, so
we should catch disconnects (at least some kinds of disconnects, or if
keep-alive is set up properly) sooner and start the reconnection procedure.
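
A sketch of the wake-up side of point 12 (the helper below is illustrative,
not from the patch):

/*
 * Sketch: called once the whole reply has been processed.  Clear
 * s->reply.handle and hand the "receiver" role over to one parked
 * coroutine, if there is any.
 */
static void coroutine_fn nbd_recv_coroutine_wake_one(BDRVNBDState *s)
{
    int i;

    s->reply.handle = 0;
    for (i = 0; i < MAX_NBD_REQUESTS; i++) {
        if (s->requests[i].receiving) {
            aio_co_wake(s->requests[i].coroutine);
            return;    /* one waiter is enough; it will recv() for the rest */
        }
    }
    /* No waiters: nothing watches the socket until the next request. */
}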



13. nbd_reply_chunk_iter_receive(): don't care about the removed
     connection_co, just ping the in_flight waiters.

14. Don't create connection_co; enable retry in the connection thread
     (we don't have our own reconnect loop anymore).

15. We need now nbd_co_establish_connection_cancel() call in
     nbd_cancel_in_flight(), to cancel the request that doing connection
     attempt.

s/need now/now need to add a/
s/request that doing/request that is doing a/


[*] OK, now we don't cancel reconnect on drain begin. That's correct:
     the reconnect feature leads to the possibility of long-running
     requests (up to the reconnect delay). Still, drain begin is not a
     reason to kill long requests. We should wait for them.

     This also means that we can again reproduce the dead-lock described
     in 8c517de24a8a1dcbeb54e7e12b5b0fda42a90ace.
     Why we are OK with it:
     1. Now this is not an absolutely-dead dead-lock: the vm is unfrozen
        after the reconnect delay. Actually 8c517de24a8a1dc fixed a bug in
        the NBD logic that was not described in 8c517de24a8a1dc and led to
        forever dead-lock. The problem was that nobody woken free_sema

s/woken/woke the/

        queue, but drain_begin can't finish while there is a request in
        the free_sema queue. Now we have a reconnect delay timer that works
        well.
     2. It's not a problem of NBD driver, it's a problem of ide code,
        that does drain_begin under global mutex

I agree that this point means that it is appropriate to address that
in a separate patch series.  Hopefully someone tackles it before 6.2,
but even if not, I agree that the worst that happens is that we have a
command stalled waiting on a long timeout, rather than actual
deadlock.

     3. That doesn't reproduce if chose scsi instead of ide.

Seems like this sentence fragment fits better as the tail of 2, rather
than as a separate bullet 3?  I'll reword as:

...it's a problem of the ide code, because it does drain_begin under
the global mutex; the problem doesn't reproduce when using scsi
instead of ide.

Sounds good



Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

[..]


Verdict: I found some typos/grammar fixes, but I think I can touch
those up.  I'm now hammering on the patches in my NBD tree, and
barring any surprises, this will be in my next pull request Monday at
the latest.

Reviewed-by: Eric Blake <eblake@redhat.com>


Thanks a lot for reviewing, and for the time spent understanding and fixing my English )

--
Best regards,
Vladimir


