savannah-hackers

Re: Savannah git gateway down now


From: Bob Proulx
Subject: Re: Savannah git gateway down now
Date: Mon, 3 Feb 2025 12:29:07 -0700

Sorry I did not see this thread until now.

Paul Eggert wrote:
> Can't browse source code via Savannah now. Here are symptoms:

A 502 Bad Gateway is a symptom of the machine being too overloaded to
process the request before the web server times out talking to the CGI
process.  In this case the gitweb process.

The machine is surging in load because the Internet is a hostile place
and people are the problem.  It is getting hit with a large botnet
that has over 3 million addresses.  (Therefore I suspect it is
composed of either compromised security cameras or compromised phones
but no real idea.)  And there is never just one botnet.  There are
actually multiple botnets operating concurrently!  Because people are
the problem.

We are mitigating things as best as we can to shed the abuse load but
still provide member services.

Basically though the problem is that on average everything is fine but
when the thundering herd of the botnets increases then the machine
browns out for a while and we see 502 returns due to the load and so
sometimes it is not usable.  Please be patient and try again after a
while.
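
For scripted clients, "try again after a while" can be sketched as a
retry loop with exponential backoff.  This is only an illustration,
not anything Savannah provides: the function name `retry_on_502`, the
`fetch` callable, and the default delays are all assumptions.

```python
import time
from typing import Callable, Tuple


def retry_on_502(fetch: Callable[[], Tuple[int, bytes]],
                 max_attempts: int = 5,
                 base_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep
                 ) -> Tuple[int, bytes]:
    """Call fetch() until it returns a status other than 502.

    fetch is a hypothetical callable returning (status, body).
    The wait doubles after every 502 so that a struggling server
    is not hit again immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        # Stop on anything that is not a 502, or when out of attempts.
        if status != 502 or attempt == max_attempts:
            return status, body
        sleep(delay)
        delay *= 2
    raise AssertionError("unreachable")
```

Keeping the delays long and the attempt count small matters here: an
aggressive retry loop would only add to the load that causes the 502
in the first place.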

Member access through ssh is almost always better than the web http
based protocols because ssh is authenticated and http is anonymous and
anonymity leads to more abuse.

I have scripted in some mitigations which are blocking the botnet
addresses dynamically.  This means that even though things surge, the
system is able to shed that load after a bit of running.
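
The mail does not say how abusive addresses are identified.  One
common approach, shown here purely as a sketch, is a per-address
sliding-window request counter.  The class name `AbuseTracker`, the
thresholds, and the in-memory block set are assumptions for
illustration; a real deployment would feed the result into the
firewall.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class AbuseTracker:
    """Flag addresses that exceed max_requests within window_seconds."""

    def __init__(self, max_requests: int = 100,
                 window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.blocked: Set[str] = set()

    def record(self, address: str, now: float) -> bool:
        """Record one request; return True if the address is blocked."""
        if address in self.blocked:
            return True
        window = self.hits[address]
        window.append(now)
        # Drop timestamps that have fallen out of the window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        if len(window) > self.max_requests:
            self.blocked.add(address)
            del self.hits[address]
            return True
        return False
```

A counter like this only reacts after an address has already sent
some requests, which matches the behavior described above: load
surges first and is shed after a bit of running.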

> Browsing via the Savannah web didn't work. The not-working was for some time
> (a few minutes? don't recall exactly), which is why I reported it.

We always appreciate feedback!  And it allows us to communicate and
share what is happening that is causing the problems.

> Now it's failing, with different symptoms:
>
>   $ wget -S -O- 'https://git.savannah.gnu.org/gitweb/?p=gettext.git'
...
> ... and at this point it hangs indefinitely.
...
> So I suspect vcs3 is the culprit somehow.

Yes.  This happened yesterday.  I saw it mostly in real-time because
another user reported it on IRC and I spotted it and was able to jump
onto the problem within a minute.  And then the machine locked up
entirely.  Which prevented me from getting more than the main part of
the problem.

Earlier in the day something caused the vcs3 machine to have wonky
kernel problems (hard to describe, and I have no idea of the cause)
and to drop the configured swap partition.  And without swap the OOM
Killer started killing things.  Therefore I rolled the git service
from vcs3 to vcs2 so I could work on vcs3.  It looked like systemd had
disabled the custom swap-on service, which I know had been enabled
because it is part of the scripted provisioning.  I enabled it again
and rebooted to verify.  Then I noticed that it was trying to set up
suspend-resume from swap, which is undesired there, so I reconfigured
that and rebooted again.  All being good, I rolled git from vcs2 back
onto vcs3 again.

And that's when more trouble started after a while.  The problem there
was that the botnet block list didn't get loaded after reboot.  I had
_thought_ that I had it working but that detail did not work.  And so
vcs3 was open to the large 3 million strong botnet and eventually it
was overwhelmed and fell prey to it.  I don't think it should have
locked up regardless but eventually it did anyway.
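
A failure like this can be caught with a post-boot check that compares
the saved block list against what is actually loaded.  The sketch
below assumes a simple one-entry-per-line format with `#` comments;
the function names and the file format are hypothetical, not the
actual Savannah tooling.

```python
from typing import Iterable, Set


def parse_block_list(lines: Iterable[str]) -> Set[str]:
    """Return the set of entries, ignoring blanks and '#' comments."""
    entries: Set[str] = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            entries.add(entry)
    return entries


def missing_after_boot(saved: Iterable[str],
                       loaded: Iterable[str]) -> Set[str]:
    """Entries in the saved list that are absent from the loaded one."""
    return parse_block_list(saved) - parse_block_list(loaded)
```

An empty result means every saved entry is active; anything else is a
reason to hold off on moving a service back onto the machine.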

I rolled git back from vcs3 onto vcs2 and let it run the night there
because I did not have more time to deal with it at that moment.

Now you might be asking why not leave it on vcs2?  vcs2 is Trisquel 9
on xfs which is a good combination but it needs to be upgraded to
Trisquel 11 and currently is running with rsync disabled due to the
recent rsync vulnerability.  vcs3 is Trisquel 11 on btrfs and fully
updated and all services including rsync fully upgraded and running.
(Note that I personally am not a fan of btrfs and do not like the
configuration there but that was not my decision to make with it.)
The theory goes that vcs3 is the better system and more new VMs will
have the exact same configuration in the future.  If it can't handle
it then better for us to find out.

I need to spend some time to do various cleaning on vcs2 and then
upgrade it to Trisquel 11.  It's just life and time which is keeping
everything from happening all at once.  And lately for me life has
been a problem.  Having vcs2 to fall back onto when vcs3 suffers has
been a huge help and I am a little worried about disturbing the
functionality on vcs2.  But we can't stay on Trisquel 9 forever due to
the ongoing security issues.

Bob


