info-gnus-english

Re: gnus-parameters


From: reader
Subject: Re: gnus-parameters
Date: Sun, 16 Dec 2007 16:40:50 -0600
User-agent: Gnus/5.110007 (No Gnus v0.7) Emacs/22.1.50 (gnu/linux)

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> On Sun, Dec 16 2007, reader@newsguy.com wrote:
>
>>  (setq gnus-parameters
>>        '(("[Ss]pam[0-9]*$\\|_ex$"
>>           (total-expire . t)
>>           (expiry-wait . 18))
>>          (gnus-visible-headers
>>            ("\.bk$"
>>             ("^X-Spam-Report:.*_FARAWAY_")))))
>
> Please describe *what* you are trying to achieve.

I guess I thought that, since I was trying to add gnus-visible-headers
to gnus-parameters, it would be fairly clear that I wanted to make a
header visible by way of gnus-parameters, which is largely regexp-based.

But I can see now that, since I had it completely inside out and
backwards, it would be pretty confusing.

Anyway, you made a good and accurate guess... thanks.
I only wanted the X-Spam-Report headers that contain _FARAWAY_ hits, but
that is an easy refinement now that you've shown the main code.  Thanks
again.
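For anyone finding this thread later, here is a sketch of what a working version of the quoted setting might look like. It is an assumption, not the code from Reiner's reply. It relies on the Gnus convention that a group parameter of the form (VARIABLE FORM) binds VARIABLE to the value of FORM in matching groups, and on `gnus-visible-headers' holding a single regexp string (its default); if yours is a list of regexps, the `concat' call would need adjusting. The group patterns and expiry settings are carried over from the quoted attempt.

```elisp
;; Sketch only: each entry is (GROUP-REGEXP PARAMETER...).
(setq gnus-parameters
      '(("[Ss]pam[0-9]*$\\|_ex$"
         (total-expire . t)
         (expiry-wait . 18))
        ;; In a Lisp string "\." reads as plain ".", so a literal
        ;; dot in the regexp has to be written "\\.".
        ("\\.bk$"
         ;; (VARIABLE FORM) parameter: in groups ending in ".bk",
         ;; keep the usual visible headers and also show X-Spam-Report.
         ;; Assumes gnus-visible-headers is a regexp string.
         (gnus-visible-headers
          (concat gnus-visible-headers "\\|^X-Spam-Report:")))))
```

Appending to the existing value keeps From, Subject and the other normally visible headers; setting `gnus-visible-headers' to the X-Spam-Report pattern alone would hide every header that does not match it. Also note that `.' in an Emacs regexp does not match a newline, so a narrower pattern such as "^X-Spam-Report:.*_FARAWAY_" can only match when _FARAWAY_ appears on the first line of that (usually folded) header.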

Since there is little traffic on this Sunday afternoon (here), I guess
I'll bore you to death with some commentary about _FARAWAY_.

PS -

_FARAWAY_ is part of the string SpamAssassin inserts if you tell it to
look for messages that are not in your native language (in my case,
English).

I've wondered from time to time how SpamAssassin accomplishes that.
Charset declaration headers, or `Received' headers with addresses
ending in the country codes of non-English-speaking countries (.kr,
.cn, etc.), are how most of it gets done, but I've noticed that not
all messages have those things in them.  Yet SpamAssassin still
mostly gets it right.

I've found quite a few messages with the _FARAWAY_ tag where I could
find no evidence to base it on... but the tag was correct, or at least
the message was in a non-English language.

The reason I'm curious is that at one time I spent a few days trying
to create procmail recipes that did that very thing.  The most
reliable one (wrapped below for mail) produces very few false
positives, if any:

* ^(Subject|Message-ID|From|Received):.*(@|\.)[a-z0-9][a-z0-9]*\.
     (ar|br|cl|ch|cn|co|cz|de|hu|it|jp|kr|mx|pe|pl|ro|ru|th|tr|tw|ua|uk|
     [a-z][a-z])[^a-z0-9.@]

I add a new country every once in a while.  But as I said, not all
non-English mail has that kind of handy string or a non-English
character set header in it.

For those, I tried to create a few Subject-based regular expression
character sets containing letters that are common in some non-English
language but never occur in English.  Finding such letters when you
know nothing at all about a language, however, is very time-consuming.

Now I happily let SpamAssassin do the chore... but I doubt it uses
that method.

I still do the heavy lifting with that country-code regex, to keep
down SpamAssassin's high CPU cost by handling most of the non-English
mail before it gets that far.

If I knew more about how SA does it, I might try to catch still more
of it with procmail.  I asked on the SA list twice but was shunned both
times.  Apparently they felt it was either obvious or well documented.

Neither seemed true to me.
