Re: [PATCH v2 2/2] GitLab Gating CI: initial set of jobs, documentation
From: Cleber Rosa
Subject: Re: [PATCH v2 2/2] GitLab Gating CI: initial set of jobs, documentation and scripts
Date: Fri, 4 Sep 2020 11:10:00 -0400

On Fri, Sep 04, 2020 at 09:18:16AM +0100, Daniel P. Berrangé wrote:
> On Thu, Sep 03, 2020 at 08:11:39PM -0400, Cleber Rosa wrote:
> > On Thu, Jul 09, 2020 at 11:30:29AM +0100, Daniel P. Berrangé wrote:
> > > On Wed, Jul 08, 2020 at 10:46:57PM -0400, Cleber Rosa wrote:
> > > > This is a mapping of Peter's "remake-merge-builds" and
> > > > "pull-buildtest" scripts, gone through some updates, adding some build
> > > > options and removing others.
> > > >
> > > > The jobs currently cover the machines that the QEMU project owns, and
> > > > that are set up and ready to run jobs:
> > > >
> > > > - Ubuntu 18.04 on s390x
> > > > - Ubuntu 20.04 on aarch64
> > > >
> > > > During the development of this set of jobs, the GitLab CI was tested
> > > > with many other architectures, including ppc64, s390x and aarch64,
> > > > along with the other OSs (not included here):
> > > >
> > > > - Fedora 30
> > > > - FreeBSD 12.1
> > > >
> > > > More information can be found in the documentation itself.
> > > >
> > > > Signed-off-by: Cleber Rosa <crosa@redhat.com>
> > > > ---
> > > > .gitlab-ci.d/gating.yml | 146 +++++++++++++++++
> > >
> > > AFAIK, the jobs in this file just augment what is already defined
> > > in the main .gitlab-ci.yml. Also since we're providing setup info
> > > for other people to configure custom runners, these jobs are usable
> > > for non-gating CI scenarios too.
> > >
> >
> > If you mean that they introduced new jobs, you're right.
> >
> > > IOW, the jobs in this file happen to be usable for gating, but they
> > > are not the only gating jobs, and can be used for non-gating reasons.
> > >
> >
> > Right, I do not doubt these jobs may be useful to other people and on
> > scenarios other than "before merging a patch series".
> >
> > > This is a complicated way of saying that gating.yml is not a desirable
> > > filename, so I'd suggest splitting it in two and having these files
> > > > named based on what their contents are, rather than their use case:
> > >
> > > .gitlab-ci.d/runners-s390x.yml
> > > .gitlab-ci.d/runners-aarch64.yml
> > >
> > > The existing jobs in .gitlab-ci.yml could possibly be moved into
> > > a .gitlab-ci.d/runners-shared.yml file for consistency.
> > >
> >
> > Do you imply that every GitLab CI job should be a gating job? And
> > that the same jobs should be used by other people with their own
> > forks? I find this problematic because:
> >
> > * It would trigger pipelines that, unless every user has the same
> > runners configured, would have unfulfilled jobs with no matching
> > hardware.
>
> Jobs that require a custom runner should not be set to run by default,
> but individual contributors must absolutely be able to opt-in to running
> those jobs simply by registering a runner on their account.
>
Agreed, and that's why they have been put into this different "gating"
class here.
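
For illustration, the opt-in behavior discussed above could be expressed
in GitLab CI roughly as follows. This is only a sketch, not the content
of the patch: the job name, the runner tag and the
QEMU_CI_CUSTOM_RUNNERS variable are hypothetical names chosen for the
example. It relies only on the standard "tags" and "rules" job keywords.

```yaml
# Hypothetical job: the name, tag and variable below are assumptions
# made for this example, not taken from the patch.
build-custom-runner-s390x:
  # Only runners registered with this tag will pick up the job.
  tags:
    - s390x
  rules:
    # Opt-in: the job is added to the pipeline only when the
    # (hypothetical) variable is set to "1" on the project or fork.
    - if: '$QEMU_CI_CUSTOM_RUNNERS == "1"'
    # Otherwise the job is left out of the pipeline entirely.
    - when: never
  script:
    - mkdir -p build
    - cd build
    - ../configure
    - make
```

Under these assumptions, a contributor who registers a matching runner
on a fork would set the variable in the fork's CI/CD settings; without
it, no job is created that would wait for a runner that does not exist.
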
> > * It dilutes the idea that those jobs are inherently different with
> > regards to the management of their infrastructure.
>
> I don't really know what you mean here, but "Inherently different"
> does not sound like a desirable property.
>
Organizations and individuals will have responsibility over the
infrastructure they choose to add, which is "inherently different"
from the GitLab shared machines. Not sure there's a way around it.

> > * It destroys the notion of layered testing, for whatever people find
> > that worth it, where a faster turnaround could/would be possible
> > with fewer jobs for every push, and many more jobs before a merge.
>
> The key goal of CI is to reduce the burden on maintainers. The biggest
> cost is if we merge code and failure is noticed after merge. It is
> still a large cost, however, if Peter only finds a CI failure when he
> attempts the pre-merge test. He has to throw out the pull request
> putting more work on the subsystem maintainer. The subsystem maintainer
> may have to throw it back to the original author.
>
> The ideal scenario that we need to strive towards is that the original
> author has tested their code with 100% coverage of all the CI jobs QEMU
> has defined.
>
I agree... but it's also unrealistic at this point, right? For
instance, do we have s390x boxes to run all of those? Avocado has
been using Travis CI for s390x/ppc64/aarch64, and those are quite
unreliable even with a load many orders of magnitude smaller than the
QEMU project's. So, resources are needed to have this flat, 100%
coverage, "ideal scenario" you describe.

> Any time there is a job that is not run by authors, but only by the
> maintainers, we are putting increased burden on the maintainers, so
> we must minimize that.
>
I agree. But if resources are limited, then should the testing scope
be decreased so that it's equalized?

> IOW, layered testing is not desirable as a goal. Rather layered testing
> is just a default setup, but we'd encourage contributors to run the
> full set of CI jobs, especially if they are frequent contributors.
> The more they run themselves, the less burden on subsystem maintainers
> and Peter, and thus the better we all scale.
>
We agree on the goals; we don't agree on the strategy, though.

> > Finally, I find the split by runner architecture you suggested
> > problematic because different organizations may have jobs for the same
> > architecture. I believe that files for different organizations may be
> > a better organization instead. Entries in the MAINTAINERS are one
> > example where the grouping by architecture may not be optimal.
>
> I don't think we should be structuring jobs around organizations. We
> should be defining a set of desired jobs we wish to be able to run.
> Any organization can bring a runner that is capable of running the
> jobs and donate it to the QEMU project for our formal CI runner.
> The organization is not defining the job though - QEMU is defining
> the jobs we expect to have used for testing.
>
This was discussed previously[1].

> This is key because any contributor needs to be able to spin up an
> identical environment to replicate any build failures. We don't want
> runners for merge testing that are built as a blackbox by someone.
> That is the single biggest painpoint with Peter's current merge
> jobs - we can't easily replicate Peter's merge env even if we had
> the matching hardware available.
>
With the right automation, such as the playbooks introduced here, any
person with the same hardware should have an environment to replicate
a job and debug an issue.

[1] - https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg00231.html

Best regards,
- Cleber.

> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|