qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 12/13] block: Don't poll in bdrv_replace_child_noperm()


From: Hanna Reitz
Subject: Re: [PATCH 12/13] block: Don't poll in bdrv_replace_child_noperm()
Date: Mon, 14 Nov 2022 21:22:32 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0

On 08.11.22 13:37, Kevin Wolf wrote:
In order to make sure that bdrv_replace_child_noperm() doesn't have to
poll any more, get rid of the bdrv_parent_drained_begin_single() call.

This is possible now because we can require that the child is already
drained when the function is called (it better be, having in-flight
requests while modifying the graph isn't going to end well!) and we
don't call the parent drain callbacks more than once.

The additional drain calls needed in callers cause the test case to run
its code in the drain handler too early (bdrv_attach_child() drains
now), so modify it to only enable the code after the test setup has
completed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
  include/block/block-io.h     |  8 ++++
  block.c                      | 72 +++++++++++++++++++++++++-----------
  block/io.c                   |  2 +-
  tests/unit/test-bdrv-drain.c | 10 +++++
  4 files changed, 70 insertions(+), 22 deletions(-)

I find this change complicated.  I understand it’s the point of the series, but I find it difficult to grasp.  But I guess there can be no drain series without such a patch.

As usual, I was very skeptical of the code at first, and over time slowly realized that I’m mostly confused by the comments, and the code seems fine.  Ah, well.

[...]

diff --git a/block.c b/block.c
index 5f5f79cd16..12039e9b8a 100644
--- a/block.c
+++ b/block.c

[...]

@@ -2414,12 +2428,20 @@ static TransactionActionDrv bdrv_replace_child_drv = {
   *
   * Note: real unref of old_bs is done only on commit.
   *
+ * Both child and new_bs (if non-NULL) must be drained. new_bs must be kept
+ * drained until the transaction is completed (this automatically implies that
+ * child remains drained, too).

I find “child” extremely ambiguous.  The problem is that there generally is no way to drain a BdrvChild object, is there?  You can only drain the BDS in it, which then drains the parent through the BdrvChild object.  Historically, I don’t think there was ever a place where we cared about the BdrvChild object between the two to be drained, was there?  I mean, now there apparently is, in bdrv_child_attach_common(), but that’s a different story.

So the problem is that “draining a BdrvChild object” generally appears in the context of bdrv_parent_drained_*() functions, i.e. actually functions draining the parent.  Which makes it a bit confusing to refer to a BdrvChild object just as “child”.

I know that “child” here refers to the variable (or does it not?), but that is why I really prefer marking variables that are just plain English words, e.g. as @child or `child`, so it’s clear they are a name and not a noun.

In any case, because the concept is generally to drain the `child->bs` instead of the BdrvChild object directly, I understand the comment to mean: “Both the old child (`child->bs`) and `new_bs` (if non-NULL) must be drained.  `new_bs` must be kept drained until the transaction is completed.  This implies that the parent too will be kept drained until the transaction is completed by the BdrvChild object `child`.”

Or am I misunderstanding something, and the distinction between `child` and `child->bs` and the parent node is important here? (Would be good to know. :))

+ *
   * The function doesn't update permissions, caller is responsible for this.
   */
  static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState 
*new_bs,
                                      Transaction *tran)
  {
      BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
+
+    assert(child->parent_quiesce_counter);
+    assert(!new_bs || new_bs->quiesce_counter);
+
      *s = (BdrvReplaceChildState) {
          .child = child,
          .old_bs = child->bs,
@@ -2818,6 +2840,12 @@ static void bdrv_replace_child_noperm(BdrvChild *child,

This function now has its callers fulfill kind of a complicated contract.  I would prefer that to be written out in a doc comment, especially because it sounds like the assertions can’t cover everything (i.e. callers that remove a child are required to have stopped issuing requests to that child, but they are free to do that in any way they want, so no assertion will check for it here).

      int new_bs_quiesce_counter;
assert(!child->frozen);
+    /*
+     * When removing the child, it's the callers responsibility to make sure
+     * that no requests are in flight any more. Usually the parent is drained,
+     * but not through child->parent_quiesce_counter.
+     */

When I see a comment above an assertion, I immediately assume it is going to describe what the assertion checks.  Unless I’m misunderstanding something (again?), this comment actually describes what the assertion *does not* check.  I find that confusing, especially because the comment leads with “it’s the caller’s responsibility”, which to me implies “and that’s why we check it here in this assertion”, because assertions are there to verify that contracts are met.

The assertion verifies that the parent must be drained (through @child), unless the child is removed, which case isn’t covered by the assertion.  That “isn’t covered” is then described by the comment, right?

I’d prefer the comment to lead with describing what the assertion does check, and then transitioning to “But in case the child is removed, we ignore that, and just note that it’s the caller’s responsibility to...”.

Also, the comment doesn’t explicitly say why we don’t check it in the assertion.  It says “usually” and “child->parent_quiesce_counter”, which implies “can’t get any information from child->parent_quiesce_counter, and regardless, callers can do what they want do achieve quiescing in regards to this child, so there’s nothing we can check”.  It feels like we can just say outright that there’s an informal contract that we can’t formally verify here, but callers naturally still must adhere to it.  It would be interesting to know (and note) why that is, though, i.e. why we can’t have parents be drained through the BdrvChild object for the child that is being removed.

I understand the intention behind the assertion to be: “We require the parent not to have in-flight requests to the BdrvChild object manipulated here.  In most cases, we verify that by requiring the parent be drained through this BdrvChild object.  However, when a child is being removed, we skip formal verification, because we leave callers free in deciding how to ensure that no requests are in flight.  Usually, they will still have the parent be drained (even if not through this BdrvChild object), but we don’t require that.”

I may well be wrong, but then it would be good for a comment to correct me. :)

(Interestingly, because bdrv_replace_child_noperm() no longer polls itself, it can’t know for sure that `child->parent_quiesce_counter > 0` means that there are no requests in flight.)

+    assert(!new_bs || child->parent_quiesce_counter);
      assert(old_bs != new_bs);
      GLOBAL_STATE_CODE();

[...]

@@ -2865,9 +2875,6 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
      /*
       * If the old child node was drained but the new one is not, allow

This now also covers the case where there was no old child node, but the parent was simply drained via an empty BdrvChild by the caller.

       * requests to come in only after the new node has been attached.
-     *
-     * Update new_bs_quiesce_counter because bdrv_parent_drained_begin_single()
-     * polls, which could have changed the value.
       */
      new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
      if (!new_bs_quiesce_counter && child->parent_quiesce_counter) {
@@ -3004,6 +3011,12 @@ static BdrvChild 
*bdrv_attach_child_common(BlockDriverState *child_bs,
      }
bdrv_ref(child_bs);
+    /*
+     * Let every new BdrvChild start drained, inserting it in the graph with
+     * bdrv_replace_child_noperm() will undrain it if the child node is not
+     * drained. The child was only just created, so polling is not necessary.

I feel like this is hiding some complexity.  Unless I missed something, draining a BdrvChild always meant draining the parent. But here, it absolutely does not mean that, and maybe that deserves a big warning sign?

Beginning a drain without poll means quiescing.  You assert that there can be no requests to the new child, which I agree on[1].  The combination of no new requests coming in, and no requests being there at this point is what being drained means.  So @new_child is indeed “drained”.

But the parent isn’t drained, because it isn’t polled.  There may still be requests in flight to its other children.  That’s really interesting, and I found it extremely confusing until I wrote ten paragraphs in reply here and scrapped most of them again.  Whenever I find this to be my reaction to something, I really wish for a detailed comment that explains the situation.

I would like the comment to:
- Expand on what “only just created” means.  As it’s written, that could mean relying on a race condition.  At which point would the parent be able to send requests?  (I assume either the .attach() in bdrv_replace_child_noperm(), or when this function returns, whichever comes first.  (The former always comes first.)) - Say in more detail that calling bdrv_parent_drained_begin_single() without polling will quiesce the parent, preventing new requests from appearing. - Note that because there are no requests in flight, and because no new requests can then appear, the BdrvChild is drained. - Note that the parent is only quiesced, not drained, and may still have requests in flight to other children, but naturally we don’t care about them.

I feel like the comment tries to hide all that complexity simply by avoiding the word “parent”.

[1] As far as I can piece together, no requests to the new child can have started yet, because this function creates the BdrvChild object, so before it is returned to the caller (or BdrvChildClass.attach() is called in bdrv_replace_child_noperm()), the block driver won’t generate requests to it.

Hanna

+     */
+    bdrv_parent_drained_begin_single(new_child, false);
      bdrv_replace_child_noperm(new_child, child_bs);
BdrvAttachChildCommonState *s = g_new(BdrvAttachChildCommonState, 1);




reply via email to

[Prev in Thread] Current Thread [Next in Thread]