
[Savannah-hackers] How to fight spam on GNU mailing lists


From: Milan Zamazal
Subject: [Savannah-hackers] How to fight spam on GNU mailing lists
Date: 26 Apr 2002 10:43:22 +0200
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

[I am sorry, I don't know what the proper place to discuss the issues
below is.  Please resend it there and CC me.  Thank you.]

[This message is long, but I hope some of you will read it, since I'm
trying to *solve* a *real* problem.]

The excessive number of spam messages coming to GNU mailing lists
nowadays is a real problem.  I had to start fighting it on the GNU GNATS
mailing lists for two reasons:

- Users are annoyed by the spam and sometimes even unsubscribe from the
  spammed mailing lists.  This is wrong in two ways: people are annoyed
  unnecessarily, and we lose contributors.  A particular example is the
  bug-gnats mailing list, where the `spam : legitimate messages' ratio
  was about 10:1 before I started to fight spam there.  Several people
  have unsubscribed from it because of the spam, which lowers the
  chances of properly diagnosing and fixing reported bugs.

- Mailing list archives contain a lot of spam, or even mostly spam.
  This makes the archives less usable or nearly unusable, and
  additionally the archives serve as a permanent marketing place for
  the spammers.  Both are very wrong things damaging the GNU project.

So we have a real problem and we should try to solve it.  I'm in no way
calling for cheap solutions, like limiting posting to subscribers only,
that harm users or waste administrators' resources in a different way
and thus bring nothing good.  I'd like to explain what I do, quite
successfully, with the GNU GNATS mailing lists and what could IMHO be
done on a wider basis with a much lower `man work : positive result'
ratio.

The idea is very simple: If a message looks like spam, let's delay it
for moderation.  A surprisingly small set of rules can give quite
satisfactory results.  Particularly:

- GNU Mailman already offers delaying mail with an implicit destination
  (the mailing list address is mentioned in neither To nor CC).

- A lot of spam comes with subjects containing many non-ASCII characters
  (properly encoded subjects shouldn't contain any non-ASCII characters
  at all) or in exotic encodings (subject in Korean encodings coming to
  an English list is suspicious).

- There are other typical spam subjects, like a number at the end of the
  subject separated by many spaces from the rest of the subject.

- Some spam messages are identified in headers either by themselves
  (spamming software identifier) or by external means (X-RBL-Warning).

- I also delay messages coming from hotmail.com or yahoo.com, since
  those sites are unable to stop being spam sources.  This rule is
  controversial, but it applies well in the particular case of the GNU
  GNATS lists.
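
The rules above could be sketched in Python roughly as follows.  This is
only an illustration, not the filter actually running on the GNU GNATS
lists: the function name hold_reasons, the set of expected charsets, and
the "five or more spaces" threshold are all assumptions made for the
example.  Only the header name X-RBL-Warning and the two domains come
from the list above.

```python
import re
from email.header import decode_header
from email.message import Message
from email.utils import getaddresses

# Assumed values for illustration; a real list would tune these.
EXPECTED_CHARSETS = {"us-ascii", "utf-8", "iso-8859-1", "iso-8859-2"}
HELD_DOMAINS = {"hotmail.com", "yahoo.com"}
# A number at the end of the subject, preceded by many spaces.
TRAILING_NUMBER = re.compile(r"\s{5,}\d+\s*$")


def hold_reasons(msg, list_address):
    """Return the reasons to delay msg for moderation (empty list = pass)."""
    reasons = []

    # Implicit destination: the list address is in neither To nor Cc.
    fields = [str(v) for v in msg.get_all("To", []) + msg.get_all("Cc", [])]
    recipients = {addr.lower() for _, addr in getaddresses(fields)}
    if list_address.lower() not in recipients:
        reasons.append("implicit destination")

    subject = str(msg.get("Subject", ""))

    # A properly encoded subject contains no raw non-ASCII characters.
    if any(ord(ch) > 127 for ch in subject):
        reasons.append("raw non-ASCII subject")

    # Encoded words in a charset the list does not expect.
    for _, charset in decode_header(subject):
        if charset is not None and charset.lower() not in EXPECTED_CHARSETS:
            reasons.append("unexpected subject charset")
            break

    if TRAILING_NUMBER.search(subject):
        reasons.append("trailing number in subject")

    # Marked as suspicious by external means.
    if msg.get("X-RBL-Warning") is not None:
        reasons.append("X-RBL-Warning header")

    senders = getaddresses([str(v) for v in msg.get_all("From", [])])
    if any(addr.rpartition("@")[2].lower() in HELD_DOMAINS
           for _, addr in senders):
        reasons.append("sender domain held")

    return reasons


if __name__ == "__main__":
    demo = Message()
    demo["From"] = "someone@hotmail.com"
    demo["To"] = "undisclosed-recipients@example.com"
    demo["Subject"] = "Great offer          48213"
    print(hold_reasons(demo, "bug-gnats@gnu.org"))
```

A message for which the function returns an empty list would go straight
to the list; anything else would only wait for the moderator, so no mail
is thrown away by the filter itself.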

You may object that applying such rules is a screening of users,
unacceptable in the GNU project.  But it is not, for the following
reasons:

- No legitimate mail is thrown away.  It only has to wait for moderation
  and will pass later.

- If someone can't write his message properly, he can't complain that
  the message is delayed a bit.

Another objection may be that such an approach requires human
intervention, so a volunteer is needed.  I've found a volunteer for the
GNU GNATS mailing lists from within the quite small set of people
interested in GNU GNATS.  I think that if we fight spam on a wider
basis, then:

- I guess not much more manpower will be needed.  Ideally, a single
  volunteer with not much more work than the one who helps GNU GNATS
  might do the job.

- The wider basis may help to better identify what is spam and what is
  not, thus easing the moderator's work.

BTW, please note that much of the moderator's work is currently consumed
by the primitive moderating interface in GNU Mailman, which is
especially unsuitable for us poor dial-up users.  I've heard that some
newer version will improve in this area; I hope it gets installed on
gnu.org once it is available.

So I propose to set up or implement a system similar to the one I use
with GNU GNATS for all the gnu.org mailing lists that *wish* to join it.
I think that:

- A single moderator could kill, with a single action, spam coming to
  all the participating mailing lists.  A cooperative model should also
  be considered -- if I kill a spam in the GNU GNATS moderation queue,
  then it gets killed in all participating mailing list queues.

- Since spam is typically cross-posted heavily, it can be identified
  better by a shared system.  If a message *looking like spam* comes
  to many gnu.org mailing list queues within a certain time interval,
  then it will be queued for global moderation; otherwise it will be
  passed to local moderation.  Thus the global moderator mostly kills
  spam, while the particular mailing list moderators mostly pass
  legitimate messages.

- If something is killed as spam somewhere, it will be killed as such
  automatically if it arrives later at another mailing list, based on
  a checksum or message ID.

- We need a way to remove from the mailing list archives those spam
  messages that slip through, i.e. that are not delayed by the filter
  and then killed by moderators.  There is currently no manageable way
  to do this.
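
To make the routing between global and local moderation more concrete,
here is a minimal Python sketch of the shared part.  Everything in it is
hypothetical: the class name SharedModeration, the threshold of three
lists, the one-hour window, and the use of a SHA-256 checksum over the
whitespace-normalized body are assumptions for illustration, not an
existing Mailman feature.  It assumes the message has already been
flagged as looking like spam by the per-list filter.

```python
import hashlib


class SharedModeration:
    """Shared state for all participating lists (illustrative only)."""

    def __init__(self, threshold=3, window=3600.0):
        self.threshold = threshold    # lists needed for global moderation
        self.window = window          # time interval in seconds
        self.killed_ids = set()       # message IDs already killed as spam
        self.killed_sums = set()      # body checksums already killed
        self.sightings = {}           # checksum -> {list name: last seen}

    @staticmethod
    def checksum(body):
        # Normalize whitespace and case so trivial variations still match.
        normalized = " ".join(body.split()).lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def kill(self, message_id, body):
        """Record a moderator's decision so other queues can reuse it."""
        self.killed_ids.add(message_id)
        self.killed_sums.add(self.checksum(body))

    def route(self, list_name, message_id, body, now):
        """Return 'killed', 'global' or 'local' for a suspicious message."""
        digest = self.checksum(body)
        if message_id in self.killed_ids or digest in self.killed_sums:
            return "killed"
        seen = self.sightings.setdefault(digest, {})
        seen[list_name] = now
        # Count the distinct lists that saw this message within the window.
        recent = [t for t in seen.values() if now - t <= self.window]
        return "global" if len(recent) >= self.threshold else "local"


if __name__ == "__main__":
    shared = SharedModeration(threshold=2, window=600.0)
    print(shared.route("bug-gnats", "<1@example.com>", "Buy now", 0.0))
    print(shared.route("help-gnats", "<2@example.com>", "buy   NOW", 100.0))
    shared.kill("<2@example.com>", "buy now")
    print(shared.route("bug-emacs", "<3@example.com>", "Buy now", 200.0))
```

Time is passed in explicitly so the behaviour is easy to test; a real
system would also need persistent storage and a way to expire old
sightings, which the sketch leaves out.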

I'm aware that at least one cooperative antispam system exists as free
software, but I don't know any details.  Maybe something existing can be
used, maybe a new system would have to be implemented.

Please note I am unable to participate in the actual setup or
implementation work; I can help only in a limited way.  I hope my
message is constructive enough otherwise, so you'll forgive me that.
People seem to be annoyed enough by spam on gnu.org mailing lists to
volunteer in this area, *once we agree on a proper solution*.

Thank you for your attention.

Milan Zamazal


