On 08.04.21 19:26, Vladimir Sementsov-Ogievskiy wrote:
08.04.2021 20:04, John Snow wrote:
On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:
job-complete command is async. Can we instead just add a boolean like
job->completion_requested, and set it if job-complete called in STANDBY state,
and on job_resume job_complete will be called automatically if this boolean is
true?
job_complete has a synchronous setup, though -- we lose out on a lot of
synchronous error checking in that circumstance.
yes, that's a problem..
I was not able to audit it to determine that it'd be safe to attempt that setup
during a drained section -- I imagine it won't work and will fail, though.
So I thought we'd have to signal completion and run the setup *later*, but what do we do
if we get an error then? Does the entire job fail? Do we emit some new event?
("BLOCK_JOB_COMPLETION_FAILED" ?) Is it recoverable?
Isn't it possible even now, that after successful job-complete job still fails
and we report BLOCK_JOB_COMPLETED with error?
And actually, how much benefit user get from the fact that job-complete may
fail?
We can make job-complete a simple always-success boolean flag setter like
job-pause.
I wanted to say the following:
But job-pause does always succeed, in contrast to block-job-complete.
block-job-complete is more akin to job-finalize, which too is a
synchronous operation.
But when I wrote that last sentence, I asked myself whether what
mirror_complete() does isn’t actually a remnant of what we had to do when we
didn’t have job-finalize yet. Shouldn’t that all be in mirror_exit_common()?
What’s the advantage of opening the backing chain or putting blockers on the
to-replace node in block-job-complete? Aren’t that all graph-changing
operation, basically, i.e. stuff that should be done in job-finalize?
If we move everything to mirror_exit_common(), all that remains to do is
basically set some should_complete flag (could even be part of the Job struct),
and then the whole problem disappears.
Thoughts?