Re: [Ifile-discuss] Large spam-only .idata file available
From: Karl Vogel
Subject: Re: [Ifile-discuss] Large spam-only .idata file available
Date: 15 May 2003 00:46:26 -0400
>> On 04 May 2003 23:03:48 -0400,
>> "Jonadab the Unsightly One" <address@hidden> said:
J> "Karl Vogel" <address@hidden> writes:
>> The only real change is the number of spam messages used to generate
>> the spam-only .idata file mentioned in the page. I included Bruce
>> Guenter's collection (45,000 messages) and saw an immediate
>> improvement in my mailbox.
J> Is 45 thousand enough to give solid results, or would it be helpful to
J> have an additional twenty-six-thousand-message spam collection?
A few thousand more is always good...
J> I currently have 15302 messages in spam.general, 4298 in
J> spam.filtered.charset.chinese.gb2312, 1793 in
J> spam.filtered.charset.euc_kr, and 5084 in spam.filtered.charset.ks_c_
I don't need any of the charset stuff; any incoming message whose
percentage of 8-bit characters exceeds a set threshold is automatically
filed as spam.
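The 8-bit rule can be sketched in a few lines of Python. This is a minimal illustration, not the filter actually used here. It measures the fraction of bytes in the raw message that have the high bit set. The 0.30 threshold is a hypothetical value, because the post does not say what percentage triggers the rule, and a real filter might look only at the body rather than the whole message.

```python
# Hypothetical sketch of an "8-bit percentage" spam rule.
# EIGHT_BIT_THRESHOLD is an assumed value; the original post gives none.
EIGHT_BIT_THRESHOLD = 0.30


def eight_bit_ratio(raw: bytes) -> float:
    """Return the fraction of bytes in `raw` that are >= 0x80 (high bit set)."""
    if not raw:
        return 0.0
    high = sum(1 for b in raw if b >= 0x80)
    return high / len(raw)


def is_probably_spam(raw: bytes, threshold: float = EIGHT_BIT_THRESHOLD) -> bool:
    """File a message as spam when its 8-bit ratio exceeds `threshold`."""
    return eight_bit_ratio(raw) > threshold
```

A GB2312- or EUC-KR-encoded message scores high because every byte of each double-byte character is in the 0xA1-0xFE range, so a plain ASCII note scores 0.0 and passes through.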
J> I'm using the nnml storage backend, which means each message is a file
J> and each folder/group is a directory, so I could tar or zip the whole
J> spam hierarchy up pretty easily.
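Because nnml keeps one file per message and one directory per group, message counts like the ones above can be reproduced with a directory listing. The following is a minimal sketch under the assumption that nnml names each article file by its article number and keeps its bookkeeping in dotfiles such as `.overview`. The function name `count_nnml_articles` is hypothetical.

```python
from pathlib import Path
from typing import Union


def count_nnml_articles(group_dir: Union[str, Path]) -> int:
    """Count article files in one nnml group directory.

    Assumes articles are plain files named by number ("1", "2", ...);
    dotfiles such as ".overview" and any subdirectories are skipped.
    """
    return sum(
        1
        for entry in Path(group_dir).iterdir()
        # Keep only regular files whose whole name is an article number.
        if entry.is_file() and entry.name.isdigit()
    )
```

For example, `count_nnml_articles(Path.home() / "Mail" / "spam" / "general")` would count a group stored under that (hypothetical) path.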
Feel free to tar up the spam.general collection and send it on.
I don't know how snotty my ISP is about quotas; could you do the old
tar | gzip | base64 (or uuencode) routine?
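The tar, gzip, base64 routine amounts to roughly `tar cf - spam.general | gzip | base64` in a shell. A minimal Python equivalent using only the standard library is sketched below. The function names are hypothetical, the archive is built in memory (fine for a few tens of megabytes), and it uses base64 rather than uuencode.

```python
import base64
import io
import tarfile


def pack_for_mail(src_dir: str, arcname: str) -> str:
    """tar + gzip `src_dir` in memory, then base64-encode it as mail-safe text.

    base64.encodebytes wraps its output at 76 characters per line.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(src_dir, arcname=arcname)  # adds the directory recursively
    return base64.encodebytes(buf.getvalue()).decode("ascii")


def unpack_from_mail(text: str, dest_dir: str) -> None:
    """Reverse pack_for_mail: base64-decode, gunzip, and untar into dest_dir."""
    data = base64.b64decode(text)  # newlines from the wrapped text are ignored
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, "..", and device files.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

The sender would paste the output of `pack_for_mail("spam.general", "spam.general")` into a message, and the recipient would feed the received text to `unpack_from_mail`.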
Thanks.
--
Karl Vogel I don't speak for the USAF or my company
address@hidden http://www.pobox.com/~vogelke
Mary had a little lamb,
that walked into a pylon,
10,000 volts went up its arse,
And turned its wool to nylon. --updated nursery rhymes