qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v10 11/21] jobs: group together API calls under the same job


From: Kevin Wolf
Subject: Re: [PATCH v10 11/21] jobs: group together API calls under the same job lock
Date: Wed, 17 Aug 2022 10:46:54 +0200

Am 16.08.2022 um 16:54 hat Emanuele Giuseppe Esposito geschrieben:
> Am 04/08/2022 um 19:10 schrieb Kevin Wolf:
> > Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben:
> >> Now that the API offers also _locked() functions, take advantage
> >> of it and give also the caller control to take the lock and call
> >> _locked functions.
> >>
> >> This makes sense especially when we have for loops, because it
> >> makes no sense to have:
> >>
> >> for(job = job_next(); ...)
> >>
> >> where each job_next() takes the lock internally.
> >> Instead we want
> >>
> >> JOB_LOCK_GUARD();
> >> for(job = job_next_locked(); ...)
> >>
> >> In addition, protect also direct field accesses, by either creating a
> >> new critical section or widening the existing ones.
> > 
> > "In addition" sounds like it should be a separate patch. I was indeed
> > surprised when after a few for loops where you just pulled the existing
> > locking up a bit, I saw some hunks that add completely new locking.
> 
> Would it be okay if we don't split it in two? There would be two
> microscopical patches.

If it would be just a hunk or two, fair enough.

> >> Note: at this stage, job_{lock/unlock} and job lock guard macros
> >> are *nop*.
> >>
> >> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> >> ---
> >>  block.c            | 17 ++++++++++-------
> >>  blockdev.c         | 12 +++++++++---
> >>  blockjob.c         | 35 ++++++++++++++++++++++-------------
> >>  job-qmp.c          |  4 +++-
> >>  job.c              |  7 +++++--
> >>  monitor/qmp-cmds.c |  7 +++++--
> >>  qemu-img.c         | 37 +++++++++++++++++++++----------------
> >>  7 files changed, 75 insertions(+), 44 deletions(-)
> >>
> >> diff --git a/block.c b/block.c
> >> index 2c00dddd80..7559965dbc 100644
> >> --- a/block.c
> >> +++ b/block.c
> >> @@ -4978,8 +4978,8 @@ static void bdrv_close(BlockDriverState *bs)
> >>  
> >>  void bdrv_close_all(void)
> >>  {
> >> -    assert(job_next(NULL) == NULL);
> >>      GLOBAL_STATE_CODE();
> >> +    assert(job_next(NULL) == NULL);
> >>  
> >>      /* Drop references from requests still in flight, such as canceled 
> >> block
> >>       * jobs whose AIO context has not been polled yet */
> >> @@ -6165,13 +6165,16 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error 
> >> **errp)
> >>          }
> >>      }
> >>  
> >> -    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> >> -        GSList *el;
> >> +    WITH_JOB_LOCK_GUARD() {
> >> +        for (job = block_job_next_locked(NULL); job;
> >> +             job = block_job_next_locked(job)) {
> >> +            GSList *el;
> >>  
> >> -        xdbg_graph_add_node(gr, job, 
> >> X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
> >> -                           job->job.id);
> >> -        for (el = job->nodes; el; el = el->next) {
> >> -            xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
> >> +            xdbg_graph_add_node(gr, job, 
> >> X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
> >> +                                job->job.id);
> >> +            for (el = job->nodes; el; el = el->next) {
> >> +                xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
> >> +            }
> >>          }
> >>      }
> >>  
> >> diff --git a/blockdev.c b/blockdev.c
> >> index 71f793c4ab..5b79093155 100644
> >> --- a/blockdev.c
> >> +++ b/blockdev.c
> >> @@ -150,12 +150,15 @@ void blockdev_mark_auto_del(BlockBackend *blk)
> >>          return;
> >>      }
> >>  
> >> -    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> >> +    JOB_LOCK_GUARD();
> >> +
> >> +    for (job = block_job_next_locked(NULL); job;
> >> +         job = block_job_next_locked(job)) {
> >>          if (block_job_has_bdrv(job, blk_bs(blk))) {
> > 
> > Should this be renamed to block_job_has_bdrv_locked() now?
> > 
> > It looks to me like it does need the locking. (Which actually makes
> > this patch a fix and not just an optimisation as the commit message
> > suggests.)
> 
> Nope, as GSList *nodes; is always read and written under BQL.

Ah, right. I wonder if we should later take the lock anyway even for
fields where it's not strictly necessary to simplify the locking rules.
Having to check the rules for each field separately is kind of hard. But
let's do only the necessary things in this series.

> > 
> >>              AioContext *aio_context = job->job.aio_context;
> >>              aio_context_acquire(aio_context);
> >>  
> >> -            job_cancel(&job->job, false);
> >> +            job_cancel_locked(&job->job, false);
> >>  
> >>              aio_context_release(aio_context);
> >>          }
> >> @@ -3745,7 +3748,10 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
> >>      BlockJobInfoList *head = NULL, **tail = &head;
> >>      BlockJob *job;
> >>  
> >> -    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> >> +    JOB_LOCK_GUARD();
> >> +
> >> +    for (job = block_job_next_locked(NULL); job;
> >> +         job = block_job_next_locked(job)) {
> >>          BlockJobInfo *value;
> >>          AioContext *aio_context;
> > 
> > More context:
> > 
> >         BlockJobInfo *value;
> >         AioContext *aio_context;
> > 
> >         if (block_job_is_internal(job)) {
> >             continue;
> >         }
> >         aio_context = block_job_get_aio_context(job);
> >         aio_context_acquire(aio_context);
> >         value = block_job_query(job, errp);
> >         aio_context_release(aio_context);
> > 
> > This should become block_job_query_locked(). (You do that in patch 18,
> > but it looks a bit out of place there - which is precisely because it
> > really belongs in this one.)
> 
> Ok
> > 
> >> diff --git a/blockjob.c b/blockjob.c
> >> index 0d59aba439..96fb9d9f73 100644
> >> --- a/blockjob.c
> >> +++ b/blockjob.c
> >> @@ -111,8 +111,10 @@ static bool child_job_drained_poll(BdrvChild *c)
> >>      /* An inactive or completed job doesn't have any pending requests. 
> >> Jobs
> >>       * with !job->busy are either already paused or have a pause point 
> >> after
> >>       * being reentered, so no job driver code will run before they pause. 
> >> */
> >> -    if (!job->busy || job_is_completed(job)) {
> >> -        return false;
> >> +    WITH_JOB_LOCK_GUARD() {
> >> +        if (!job->busy || job_is_completed_locked(job)) {
> >> +            return false;
> >> +        }
> >>      }
> >>  
> >>      /* Otherwise, assume that it isn't fully stopped yet, but allow the 
> >> job to
> > 
> > Assuming that the job status can actually change, don't we need the
> > locking for the rest of the function, too? Otherwise we might call
> > drv->drained_poll() for a job that has already paused or completed.
> > 
> > Of course, this goes against the assumption that all callbacks are
> > called without holding the job lock. Maybe it's not a good assumption.
> > 
> >> @@ -475,13 +477,15 @@ void *block_job_create(const char *job_id, const 
> >> BlockJobDriver *driver,
> >>      job->ready_notifier.notify = block_job_event_ready;
> >>      job->idle_notifier.notify = block_job_on_idle;
> >>  
> >> -    notifier_list_add(&job->job.on_finalize_cancelled,
> >> -                      &job->finalize_cancelled_notifier);
> >> -    notifier_list_add(&job->job.on_finalize_completed,
> >> -                      &job->finalize_completed_notifier);
> >> -    notifier_list_add(&job->job.on_pending, &job->pending_notifier);
> >> -    notifier_list_add(&job->job.on_ready, &job->ready_notifier);
> >> -    notifier_list_add(&job->job.on_idle, &job->idle_notifier);
> >> +    WITH_JOB_LOCK_GUARD() {
> >> +        notifier_list_add(&job->job.on_finalize_cancelled,
> >> +                          &job->finalize_cancelled_notifier);
> >> +        notifier_list_add(&job->job.on_finalize_completed,
> >> +                          &job->finalize_completed_notifier);
> >> +        notifier_list_add(&job->job.on_pending, &job->pending_notifier);
> >> +        notifier_list_add(&job->job.on_ready, &job->ready_notifier);
> >> +        notifier_list_add(&job->job.on_idle, &job->idle_notifier);
> >> +    }
> >>  
> >>      error_setg(&job->blocker, "block device is in use by block job: %s",
> >>                 job_type_str(&job->job));
> > 
> > Why is this the right scope for the lock? It looks very arbitrary to
> > lock only here, but not for the assignments above or the function calls
> > below.
> > 
> > Given that job_create() already puts the job in the job_list so it
> > becomes visible for other code, should we not keep the job lock from the
> > moment that we create the job until it is fully initialised?
> 
> I try to protect only what needs protection, nothing more. Otherwise
> then it is not clear what are we protecting and why. According to the
> split I made in job.h, things like job_type_str and whatever I did not
> protect are not protected because they don't need the lock.

I think the last paragraph above explains what it would protect?

Having a half-initialised job in the job list without holding the lock
sounds dangerous to me. Maybe it's actually okay in practice because
this is GLOBAL_STATE_CODE() and we can assume that code accessing
the job list outside of the main thread probably skips over the
half-initialised job, but it's another case where relying on the BQL is
confusing when there would be a more specific lock for it.

> > 
> >> @@ -558,10 +562,15 @@ BlockErrorAction block_job_error_action(BlockJob 
> >> *job, BlockdevOnError on_err,
> >>                                          action);
> >>      }
> >>      if (action == BLOCK_ERROR_ACTION_STOP) {
> >> -        if (!job->job.user_paused) {
> >> -            job_pause(&job->job);
> >> -            /* make the pause user visible, which will be resumed from 
> >> QMP. */
> >> -            job->job.user_paused = true;
> >> +        WITH_JOB_LOCK_GUARD() {
> >> +            if (!job->job.user_paused) {
> >> +                job_pause_locked(&job->job);
> >> +                /*
> >> +                 * make the pause user visible, which will be
> >> +                 * resumed from QMP.
> >> +                 */
> >> +                job->job.user_paused = true;
> >> +            }
> >>          }
> >>          block_job_iostatus_set_err(job, error);
> > 
> > Why is this call not in the critical section? It accesses job->iostatus.
> 
> But the blockjob is not yet "classified". Comes after.

Ok.

Kevin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]