[Ifile-discuss] Re: html tag stripping
From: clemens fischer
Subject: [Ifile-discuss] Re: html tag stripping
Date: 25 Jun 2003 22:48:17 +0200
User-agent: Gnus/5.1003 (Gnus v5.10.3) Emacs/21.3 (berkeley-unix)
* David Bushong:
> Yo<kc34sma21py2>uve rea<khuyowp1wuizl>d about them in the
> P<ks4nj3w258mkq1>apers....
>
> (If you're reading this list in HTML, try turning it off). Basically, this
> completely ruins ifile's effectiveness. However a simple addition to the
> word tokenizer to skip anything between matched <>'s would completely avoid
> this problem (as well as stop making "font", "color", etc. my most popular
> words, spam or otherwise).
you've got my vote, because it's simple. then again, people who use
ifile for something other than spam filtering may not like it. i think
nothing i've seen in ifile development has ever diminished its
applicability to text messages, be they mail or usenet, though there
have been many attempts to let it un-base64 MIME parts or whatnot. up
to now, that hasn't happened.
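for illustration, here is a rough sketch of the quoted idea: drop anything
between a matched < and > before splitting into words. it is written in
Python only to keep it short; ifile itself is C, and the names strip_tags
and tokenize are made up for this example, not taken from ifile's source.

```python
import re

# Hypothetical helpers for illustration; not ifile's actual tokenizer.
TAG = re.compile(r"<[^<>]*>")    # a '<', then anything but '<' or '>', then '>'
WORD = re.compile(r"[A-Za-z']+")  # assumed word definition: letters and apostrophes

def strip_tags(text):
    """Delete every matched <...> span; a lone '<' or '>' is left alone."""
    return TAG.sub("", text)

def tokenize(text):
    """Return lowercase word tokens after removing matched <...> spans."""
    return [w.lower() for w in WORD.findall(strip_tags(text))]

sample = "Yo<kc34sma21py2>uve rea<khuyowp1wuizl>d about them in the P<ks4nj3w258mkq1>apers"
print(tokenize(sample))
# ['youve', 'read', 'about', 'them', 'in', 'the', 'papers']
```

note that this also swallows ordinary text that happens to sit between a
"<" and a later ">", as in "a < b and c > d", which is one reason users
filtering non-spam text may not want it on by default.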
have you thought about testing bogofilter?
clemens