savannah-hackers-public

Re: [Savannah-hackers-public] Emacs git repository clone limits


From: Philippe Vaucher
Subject: Re: [Savannah-hackers-public] Emacs git repository clone limits
Date: Wed, 27 May 2020 23:52:18 +0200

Thanks for your answers!

> > You can see one of the builds here:
> > https://gitlab.com/Silex777/docker-emacs/pipelines/148568952
>
> Wow!  I see 40 different builds there.  Which is totally awesome btw.

Yeah, I meant "one of the builds of the 40 images that you see on
https://hub.docker.com/r/silex/emacs" :-)

> > Is there a limit and/or maintenance going on? Was I put in some sort of
> > throttle list?
>
> We do have rate limits in place.  Because otherwise the general
> background radiation activity of the Internet will break things.
>
> However nothing has changed recently in regard to the rate limits for
> a long time.  As I look at the logs the last rate limit change was Sat
> Dec 7 06:31:50 2019 -0500 which seems long enough ago that it isn't
> anything recent.  Meaning that this is probably simply just you
> competing with other users on the Internet for resources.

Until recently I ran at most ~8 concurrent git clones, but recent
infrastructure changes let me run many more.

> Cloning with the https transport that uses git-http-backend for the
> backend.  We are using Nginx rate limiting.  Which you can read about
> how the algorithm works here.  It is basically a smoothing process.

Until recently I was cloning git://git.sv.gnu.org/emacs.git; the
switch to https is an attempt at working around the limitation I hit
recently. My thinking was that http:// is easier to scale than
git://. If you say otherwise, I can go back to git:// clones.

> How are you doing the clone?  Is it a fresh clone into an empty
> directory?  Because that would be the worst case.  In that worst case
> scenario it would need to transport 100% of the repository every time.
> That's pretty heavy.  The Emacs git repository is about 359MB in
> size.  Seeding that with the previous version and updating it would
> make that transfer much more efficient.

You can see how the whole image is built here (26.3 example):
https://gitlab.com/Silex777/docker-emacs/-/blob/master/26.3/ubuntu/18.04/dev/Dockerfile

I do your worst case scenario except I limit it with `--depth 1` (git
clone --depth 1 --branch $EMACS_BRANCH $EMACS_REPOSITORY /opt/emacs)

> Whenever I have set up continuous integration builds I always set up a
> local mirror of all remote repositories.  I poll with a cronjob to
> refresh those repositories.  Since the update is incremental it is
> pretty efficient for keeping the local mirrors up to date.  Then all
> of the continuous integration builds pull from the local mirror.
>
> This has a pretty good result in that the LAN is very robust and only
> shows infrastructure failures when there is something really
> catastrophic happening on the local network.  Since 100% of everything
> is local in that case.
>
> Transient WAN glitches such as routing problems or server brownouts
> happen periodically and are unavoidable but then will catch up the
> local mirror image with the next cronjob refresh.  Can refresh on a
> pretty quick cycle.  I would do every four hours for example.  Plus
> those errors are much more understandable as a network issue separate
> from the CI build errors.  It's a good way to protect the CI builds
> and reduce noise from them to a minimum.

That's actually what I used back in the day: the Dockerfile didn't
clone the repository but copied an already checked-out repository
into the image. That has all the advantages you cited, but cloning
straight from your repository makes my images more trustworthy because
the user sees that nothing fishy is going on. They can also just take
my Dockerfile and build it directly without having to clone something
locally first.
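
For reference, the mirror-and-poll setup described in the quoted text
above could look roughly like the sketch below. The function name, the
mirror path, and the script path in the crontab line are assumptions
made for illustration; only the four-hour cycle comes from the thread.

```shell
#!/bin/sh
# Sketch only: create a local mirror on the first run and refresh it
# incrementally on later runs.
#   $1: upstream repository URL
#   $2: local mirror directory (hypothetical, e.g. /srv/mirrors/emacs.git)
sync_mirror() {
    upstream=$1
    mirror_dir=$2
    if [ -d "$mirror_dir" ]; then
        # Incremental update: only new objects are transferred.
        git -C "$mirror_dir" remote update --prune
    else
        # First run: full bare copy of all refs.
        git clone --mirror "$upstream" "$mirror_dir"
    fi
}

# Example crontab entry for a four-hour cycle, assuming this function is
# wrapped in a script at the (hypothetical) path below:
# 0 */4 * * * /usr/local/bin/sync-emacs-mirror.sh
```

CI builds would then clone from the mirror directory instead of the
remote, so a WAN glitch only delays the next refresh rather than
failing a build.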

To be honest, I think my realistic alternative here is to find the
right clone limit (4? 8? 20? depending on the hour of the day) and use
one which is reasonable both in build time and in load on your
servers. The images are usually only built once per day, and because
it's all cached they are only rebuilt when the base image changes,
which is about once per month. So most of the time I do *not* clone
anything from your repositories, and that's when I'd like all the
images building in parallel. But when every image suddenly requires a
clone, I'd like at most 2 images building simultaneously to ensure it
works.

I could also switch to the GitHub mirror
(https://github.com/emacs-mirror/emacs), because I expect GitHub to
have so many resources that I can clone from them like crazy. But it
feels a bit wrong; cloning from the official repo sounds better and
more trustworthy. I'll probably go the "limit to N clones" route;
right now it's limited to ~16 (4 concurrent jobs, each of them
building for 4 architectures).
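
As a rough illustration of the "limit to N clones" idea, here is a
minimal sketch that caps how many builds run at once. It is not how
the GitLab pipeline is actually configured; `build_limited` and
`./build-image.sh` are hypothetical names, and it relies on the `-P`
option of xargs, which GNU and BSD xargs provide but POSIX does not
require.

```shell
#!/bin/sh
# Sketch only: read one image name per line from stdin and run the given
# command once per name, with at most $1 invocations running at a time.
# The image name is appended as the last argument of the command.
build_limited() {
    max_parallel=$1
    shift
    xargs -P "$max_parallel" -n 1 "$@"
}

# Example (hypothetical build script), at most 2 builds at a time:
# printf '%s\n' 26.3/ubuntu/18.04/dev 26.3/alpine/3.9/dev |
#     build_limited 2 ./build-image.sh
```

The limit could be raised when no clone is needed (cached builds) and
lowered to 2 when every image has to clone.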

I just had the thought that maybe I could play man-in-the-middle with
/etc/hosts and make it look as if git.sv.gnu.org were a local
repository, syncing that local repo with the real one once per day.
That way the Dockerfile would appear to clone the real repo, yet
caching would still be done.
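
A different technique with a similar effect, named here only as an
alternative to the /etc/hosts trick, is git's own `url.<base>.insteadOf`
setting, which rewrites a URL on the client before any connection is
made. The sketch below assumes a local mirror already exists at a
hypothetical path, and that the configuration is present in the
environment where the clone runs (inside a Docker build that would
need extra setup).

```shell
#!/bin/sh
# Sketch only: make git transparently substitute a local mirror for the
# upstream URL.
#   $1: upstream URL as written in the Dockerfile
#   $2: URL of the local mirror (hypothetical)
use_local_mirror() {
    upstream=$1
    mirror=$2
    git config --global "url.$mirror.insteadOf" "$upstream"
}

# Example, with an assumed mirror location:
# use_local_mirror git://git.sv.gnu.org/emacs.git file:///srv/mirrors/emacs.git
# git ls-remote --get-url git://git.sv.gnu.org/emacs.git   # shows the rewritten URL
```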

> Do the failures you are seeing have a periodic time of day cycle when
> they are more likely to happen in the middle of the US nighttime?  If
> so then that is probably related.

What do you reckon would be the best schedule?

Kind regards,
Philippe


