Re: [Sks-devel] SKS intermittently stalls with 100% CPU & rate-limiting


From: Pete Stephenson
Subject: Re: [Sks-devel] SKS intermittently stalls with 100% CPU & rate-limiting
Date: Sun, 17 Jun 2018 20:08:53 -0700
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 6/17/2018 12:59 AM, Paul M Furley wrote:
> Hi Pete,
> 
> On 17/06/18 04:53, Pete Stephenson wrote:
>> Thanks.
>>
>> I then have three more questions:
>>
>> 1. If this issue is affecting my server to the point of it being booted
>> from the pool (since it's stalling near-continuously and can't respond
>> to queries), why are other servers not being similarly affected? There
>> are lots of servers still in the pool.
> 
> I certainly should've been booted from the pool since my server has
> filled up its disk and trashed its database (twice) so it was offline
> all of yesterday.
> 
> I'm bringing it back up with the `set_flags DB_LOG_AUTOREMOVE` setting
> this time which will hopefully save it.

Yeah, I added the same line. There are now just two log files rather than
dozens. It seems to work OK for controlling disk space usage, but it
doesn't seem to do anything about the spikes in CPU usage,
non-responsiveness, etc.

>> 2. Is there some countermeasure one can use to protect their server? I
>> have LimitRequestBody set to 8000000 (8MB) to prevent blatant abuse, but
>> clearly something is still annoying the server.
> 
> It appears from Rob's previous email that our servers are failing to
> synchronise a 22M key (because of settings like this) which is causing
> sks to continually retry:
> 
> https://lists.nongnu.org/archive/html/sks-devel/2018-06/msg00014.html :

The server had been running with no limits on the request body size for
several years without problems. I added that line in the hope of
keeping things from getting worse. I've since removed it, but removing
it doesn't seem to have had much of an effect.

Is there some step I can take now to (a) resolve the problem with this
key (e.g. by adding it to the server locally, so it won't keep choking
while retrying) and (b) prevent such issues from occurring in the
future?

>> 3. Any suggestions on how to deal with the unreasonably high-speed
>> queries from corporate mail systems? Ideally, they'd run their own
>> server locally to handle their huge amount of queries, but I have no
>> real way of communicating that to them. I'd love to slow down their
>> queries (tarpitting, maybe?) to minimize excess resource consumption
>> while still answering their queries as opposed to just cutting them off
>> once they hit a rate limit.
> 
> Are you sure these users are the cause of your troubles? Or is it this
> constant-retry loop caused by this large key?

I don't know.

Regardless, I do think that the high-volume users are being a bit
unreasonable: SKS queries are relatively "heavy" compared to lightweight
queries like those to DNSbls, so making queries to the SKS pool for each
email sent or received seems excessive, but that may just be me.

Anyway, I've removed the rate limits since they didn't seem to have any
effect on the constant-retry loop or stalling.

> I'd suggest contacting them before rate limiting them, ask them to point
> at the pool or slow down their queries.

I think they already were querying the pool, and just happened to get my
server as part of the rotation. I just sent them an email inquiring
about the number of queries and encouraging them to run their own server
and have it join the pool, but in general I don't have the time or
motivation to contact every potentially abusive user. I'm just curious
if there are any recommended practices for throttling abusive users.
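
To make the tarpitting idea concrete, here is a minimal Python sketch of
the policy I have in mind: answer every query, but make clients that
exceed a per-window allowance wait progressively longer. Nothing here is
part of SKS or of any web server; the `Tarpit` class, its parameters,
and the default numbers are all hypothetical, and the caller is assumed
to sit in front of SKS (e.g. in a reverse proxy) and to sleep for the
returned delay before forwarding the request.

```python
from collections import defaultdict, deque


class Tarpit:
    """Hypothetical per-client throttle that delays rather than rejects."""

    def __init__(self, free_requests=10, window_seconds=60.0,
                 delay_step=0.5, max_delay=10.0):
        self.free_requests = free_requests    # requests per window served at full speed
        self.window_seconds = window_seconds  # length of the sliding window
        self.delay_step = delay_step          # extra seconds per request over the allowance
        self.max_delay = max_delay            # upper bound on any single delay
        self._hits = defaultdict(deque)       # client -> timestamps of recent requests

    def delay_for(self, client, now):
        """Record a request from `client` at time `now`; return seconds to wait."""
        hits = self._hits[client]
        # Forget requests that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        excess = len(hits) - self.free_requests
        if excess <= 0:
            return 0.0
        # Delay grows linearly with the excess, capped at max_delay.
        return min(excess * self.delay_step, self.max_delay)
```

A caller would do something like `time.sleep(tarpit.delay_for(ip,
time.monotonic()))` before passing the query on. One caveat with this
approach: each delayed request still holds a connection open while it
waits, so the cap on the delay matters.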

Cheers!
-Pete

-- 
Pete Stephenson
