

From: Anand Avati
Subject: Re: [Gluster-devel] [Feature request]: Regression to take more patches in single instance
Date: Thu, 1 Aug 2013 20:09:36 -0700

On Wed, Jul 31, 2013 at 5:11 AM, Jeff Darcy <address@hidden> wrote:
On 07/31/2013 07:35 AM, Amar Tumballi wrote:
I was trying to fire some regression builds on very minor patches today, and
noticed (always known, but felt the pain of waiting today) that we can fire a
regression build on only one patch (or a patchset if it's submitted with the
dependency added at submission time). Each regression run takes approximately
30 minutes.

With this model, we can take at most ~45 patches in a day, which won't scale
up as more people participate in code contribution. It would be great to have
an option to submit a regression run with multiple patch numbers (technically
they should be applicable on top of each other in any order if they are not
dependent), and it should work fine. That way, we can handle more review load
in the future.

Maybe my brain has been baked too much by the sun, but I thought I'd seen cases
where a regression run on a patch with dependencies automatically validated
everything in the stack.  Not so?  That still places a burden on patch
submitters to make sure dependencies are specified (shouldn't be a problem
since the current tendency is to *over*specify dependencies) and on the person
starting the run to pick the top of the stack, but it does allow us to kill
multiple birds with one stone.

As for scaling, isn't the basic solution to add more worker machines?  That
would multiply the daily throughput by the number of workers, and decrease
latency for simultaneously submitted runs proportionally.


The flip side of having too many patches regression-tested in parallel is that, since the regression test applies the patch in question on top of the current git HEAD _at the time of test execution_, we lose out on testing the "combined effect" of those multiple patches. This can result in the master branch ending up in a broken state even though every patch was tested (in isolation). And the breakage becomes visible much later - when an unrelated patch is tested after the patches have been (successfully tested and) merged independently. This has happened before, even with the current "test one patch at a time" model. For example:

1 - Patch A is tested [success]
2 - Patch B is tested [success]
3 - Patch A is merged
4 - Patch B is merged
<git HEAD is broken now>
5 - Patch C is tested [failure, because combined effect of A + B is tested only now]
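
To make the sequence concrete, here is a toy sketch of it. The patch contents and the "build" check are made up purely for illustration; only the ordering of events mirrors the steps above. Each patch passes when tested against HEAD as it was at test time, yet the combination breaks HEAD once both are merged:

def build_ok(tree):
    # Toy "regression test": every call site must refer to a defined function.
    return all(callee in tree["defs"] for callee in tree["calls"])

def apply_patch_a(tree):
    # Patch A renames old_name -> new_name (definition and existing callers).
    rename = {"old_name": "new_name"}
    return {"defs": {rename.get(n, n) for n in tree["defs"]},
            "calls": {rename.get(n, n) for n in tree["calls"]}}

def apply_patch_b(tree):
    # Patch B adds another caller of old_name, written against the old HEAD.
    return {"defs": set(tree["defs"]),
            "calls": set(tree["calls"]) | {"old_name"}}

HEAD = {"defs": {"old_name"}, "calls": {"old_name"}}

print(build_ok(apply_patch_a(HEAD)))       # 1 - Patch A tested: True
print(build_ok(apply_patch_b(HEAD)))       # 2 - Patch B tested: True
HEAD = apply_patch_b(apply_patch_a(HEAD))  # 3, 4 - both patches merged
print(build_ok(HEAD))                      # 5 - HEAD is now broken: False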

The serial nature of today's testing limits such delays to some extent, as tested patches keep getting merged before regression tests of new patches start. Parallelizing tests too much could potentially widen this "danger window".

On the other hand, to guarantee that master is never broken, test + merge must be a strictly serial operation (i.e. do not even start a new regression job until the previous patch is tested and merged). That is even worse, for sure.
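
Just to make that trade-off explicit, here is a minimal sketch of such a strictly serial gate. The pending_changes, rebase_onto_head, run_regression and merge helpers are hypothetical placeholders, not real Gerrit/Jenkins calls:

def serial_gate(pending_changes, rebase_onto_head, run_regression, merge):
    # Test and merge exactly one change at a time.
    for change in pending_changes():
        candidate = rebase_onto_head(change)  # always test against current HEAD
        if run_regression(candidate):         # ~30 minutes per run
            merge(candidate)                  # HEAD can never end up broken...
        # ...but throughput is capped at one change per regression run, and
        # nothing else may even start while this run is in flight.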

In the end we probably need a combination of the two strategies:

- Ability to test multiple patches at the same time (solves regression throughput to some extent and increases "integrated testing" of patches for their combined effect).

- Ability to run tests in parallel across patch sets, where the patch sets are formed such that the groups are really independent and there is very little chance of their combined effect resulting in a regression (e.g. one patch set for a bunch of patches in glusterd and another for a bunch of patches in the data path). A rough sketch of how these two ideas could fit together follows.
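
Purely as an illustration: group pending patches by the subsystem they touch, apply each group together on one tree so its combined effect gets tested, and run independent groups in parallel. The path prefixes and the touched_paths(), apply_all() and run_regression() helpers are assumptions for the sketch, not existing infrastructure:

from concurrent.futures import ThreadPoolExecutor

SUBSYSTEMS = {
    "mgmt": ("xlators/mgmt/",),                                # glusterd side
    "datapath": ("xlators/cluster/", "xlators/performance/"),  # data path side
}

def group_of(patch, touched_paths):
    # Assign a patch to a subsystem group, or "mixed" if it spans groups.
    for name, prefixes in SUBSYSTEMS.items():
        if all(path.startswith(prefixes) for path in touched_paths(patch)):
            return name
    return "mixed"  # spans subsystems: still needs its own careful run

def test_patch_groups(patches, touched_paths, apply_all, run_regression):
    groups = {}
    for patch in patches:
        groups.setdefault(group_of(patch, touched_paths), []).append(patch)
    # One regression run per group, testing the group's combined effect;
    # independent groups run in parallel on separate workers.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(run_regression, apply_all(batch))
                   for name, batch in groups.items()}
        return {name: future.result() for name, future in futures.items()}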

Avati
