lilypond-devel

Re: testing out Docker CI scripts?


From: David Kastrup
Subject: Re: testing out Docker CI scripts?
Date: Sun, 23 Feb 2020 00:16:54 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

Han-Wen Nienhuys <address@hidden> writes:

> On Sat, Feb 22, 2020 at 10:01 PM David Kastrup <address@hidden> wrote:
>> >> That's oversimplifying the staging->master push process a bit which goes
>> >> to considerable pain to make sure that its own copy of staging is still
>> >> part of the upstream staging branch before pushing the tested result.
>> >> That makes it possible to manually stop a bad staging commit from
>> >> reaching master even if it would compile: once you reset staging, no
>> >> already running Patchy process will override that decision with a
>> >> version of staging that has become stale.
>> >
>> > If there are multiple patchy processes, you'd hope they all come to
>> > the same conclusion.
>>
>> Patchy processes don't reset staging.  Humans do.  But our various
>> patchies run on different platforms with different version libraries.
>> That actually has turned out helpful in discovering portability problems
>> at times.
>
> How many patchies are there, and on what platforms do they run?

At this point in time, I think that my laptop is the main workhorse.
Dan has another one that he runs at times, particularly when he has
committed, well, a commit of his own into staging.  James does the
manual Patchy runs with visual inspection but now runs the staging
Patchy at most on weekends and at home (he used to have a scheduled
Patchy at work checking for a run every two hours, but a change in
office policies stopped that practice).
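
The stale-staging check described in the quoted text further up (only
push the tested result if one's own copy of staging is still part of
upstream staging) can be sketched roughly as below.  This is not
Patchy's actual code: the commit graph is a plain dictionary, and the
names is_ancestor and safe_to_push are made up for illustration.  On a
real repository, "git merge-base --is-ancestor" answers the same
question.

```python
def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant`.

    `parents` maps each commit id to a list of its parent ids
    (a hypothetical in-memory stand-in for the real history).
    """
    stack = [descendant]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False


def safe_to_push(parents, tested_commit, upstream_staging):
    """Allow the push to master only if the tested commit is still
    contained in the current upstream staging branch."""
    return is_ancestor(parents, tested_commit, upstream_staging)


# History: a <- b <- c was staging when the test run started.
# Meanwhile a human reset staging to b and added d on top of it.
history = {"b": ["a"], "c": ["b"], "d": ["b"]}

stale_run_may_push = safe_to_push(history, "c", "d")
fresh_run_may_push = safe_to_push(history, "b", "d")
```

In this sketch the run that tested c is refused because c is no longer
part of staging, while a run that tested b would still be allowed.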

> Yes, it's an oversimplification that fits in an email so we can
> discuss next steps. My question is: are there fundamental features
> that you think are missing?

In what we are doing now?  Or in what you have proposed?  Since the
latter apparently is to include oversimplification, I don't see how I
could answer that without actually seeing the full version.

The current version has evolved to do a reasonable job within the
framework of people running it on their personal setup.  A considerable
change of framework would obviously trigger the question again, and the
per-patch Patchy, operated only by James, has obviously sunk to a state
where fundamental features are not so much missing as inoperative.

>> > 1) we get a reproducible test process, because everyone can use the
>> > same base images.
>>
>> Which makes it less likely that we discover portability problems.  I
>> am not sure what problem you are trying to address here.
>
> It means that if CI sees an error (because it does testing on multiple
> platforms), it is trivial for me to reproduce that error, and fix it
> locally.

The thing with "multiple platforms" is that our testing does not
actually cover multiple platforms.  The serious testing happens after
installers are released.  The most release-critical testing is GUB going
through.  The binaries and installers coming out of GUB never get to see
a single regtest except possibly manually.

I see absolutely no chance that we can change that significantly without
leaving both the free and the affordable tiers of CI services.

> By contrast, today if there is an error (see the Pango problem), we
> have to email back and forth to figure out what is going on.

Yes.  But if every developer tests on the same platform, we will have to
email back and forth with users to figure out what is going on when the
code does not blow up on our unified platform but does on theirs.  We
have had that situation with floating point on Windows (or rather 32-bit
platforms generally) just now.  Windows-only problems are really tricky
things.  So I am skeptical that a unification of test platforms among
developers will make it easier rather than harder to track down problems
among us.

>> > 2) we can test against different configurations (Pango 1.44
>> > vs. 1.36, GUILE 1.8 vs 2.2) simultaneously, which catches problems
>> > like the recent Pango one earlier.
>>
>> That's definitely an advantage against our more haphazard setup now.
>> It does come at the cost of _everyone_ (or the CI system) having to
>> test _all_ pertinent configurations rather than just a personal
>> sampling if it is supposed to increase the covered base.
>
> Nobody _has_ to test all configurations. But one _can_.

If one does, the bill will come due eventually.  For better or worse,
LilyPond is a real pig regarding resource usage for full builds/tests.
Our strategy so far has been working in a spotty manner, and with
volunteers giving their computers significant workouts.

That's not how you would do things in a corporate setting.  But we don't
have a corporate setting.
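
To put a rough number on "all pertinent configurations": a minimal
sketch that enumerates the two axes mentioned in the quoted text (Pango
1.36/1.44, GUILE 1.8/2.2) and multiplies by a per-build cost.  The
function names and the hours figure are placeholders of mine, not
measurements of an actual LilyPond build.

```python
from itertools import product


def build_matrix(options):
    """Return every combination of the given configuration options."""
    names = sorted(options)
    combos = product(*(options[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def total_cost(options, hours_per_build):
    """Total machine time if every combination gets a full build/test."""
    return len(build_matrix(options)) * hours_per_build


options = {
    "pango": ["1.36", "1.44"],
    "guile": ["1.8", "2.2"],
}

matrix = build_matrix(options)
# hours_per_build=2.0 is an assumed placeholder value.
cost = total_cost(options, hours_per_build=2.0)
```

Two axes with two values each already mean four full builds per tested
commit, and every further axis multiplies that count again.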

-- 
David Kastrup
My replies have a tendency to cause friction.  To help mitigating
damage, feel free to forward problematic posts to me adding a subject
like "timeout 1d" (for a suggested timeout of 1 day) or "offensive".


