[Ifile-discuss] Re: html tag stripping

ifile-discuss



From:	clemens fischer
Subject:	[Ifile-discuss] Re: html tag stripping
Date:	26 Jun 2003 10:12:12 +0200
User-agent:	Gnus/5.1003 (Gnus v5.10.3) Emacs/21.3 (berkeley-unix)

* David Bushong:

> Well, even if people are filtering non-email through it, it doesn't
> handle tagged input gracefully.  An option to do a simple, naïve
> tag-strip seems like a win to me.

have you thought about piping emails through "sed -E 's/<.+>//g'" to
check whether a naive approach suffices?  i just tried it:  it fails when a
tag is opened on one line and closed on another, because sed works on one
line at a time.  also, sed(1) unfortunately doesn't have non-greedy
versions of RE closures, so on a line with a `<' somewhere and a `>' later
on, everything in between is stripped regardless of how the tags
balance:  this loses perfectly readable text.
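to make both failure modes concrete, here is a minimal python sketch (python rather than sed, purely for illustration; the helper names `strip_like_sed` and `strip_negated` are hypothetical).  it mimics the sed one-liner by applying the greedy `<.+>` per line, then compares it with the usual workaround of a negated character class, `<[^>]*>`, applied to the whole message so that tags split across lines are matched too.  this is still a naive strip, not a real fix.

```python
import re

# the greedy pattern from the sed one-liner: runs from the first '<'
# on a line to the LAST '>' on that line
GREEDY = re.compile(r"<.+>")
# negated character class: stops at the first '>'; unlike '.', [^>]
# also matches '\n', so a tag split across lines is still one match
NEGATED = re.compile(r"<[^>]*>")


def strip_like_sed(text: str) -> str:
    """mimic `sed -E 's/<.+>//g'`: substitute on each line independently."""
    return "\n".join(GREEDY.sub("", line) for line in text.split("\n"))


def strip_negated(text: str) -> str:
    """naive strip over the whole text using the non-'>' character class."""
    return NEGATED.sub("", text)


sample = "<b>keep</b> this text\n<a\nhref=\"x\">link</a>"
print(repr(strip_like_sed(sample)))  # 'keep' is eaten, the split <a ...> tag survives
print(repr(strip_negated(sample)))
```

the negated class fixes the two problems shown here, but it still misfires on a `>` inside an attribute value or a comment, and on plain text such as "a < b and c > d", which is where a structure-aware tool earns its keep.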

you could try another simple tool:  sgrep(1), "Structured Grep":

  http://www.cs.helsinki.fi/~jjaakkol/sgrep.html
  ftp://ftp.cs.helsinki.fi/pub/Software/Local

  Sgrep was created by Jani Jaakkola (address@hidden) and
  Pekka Kilpeläinen (address@hidden).

it is meant to find balanced, SGML-like markup, and you can customize
the output format.  it includes HTML examples.  if you get it working
with sgrep(1), please drop a few lines to this list.
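for comparison, a rough sketch of the same structure-aware idea using python's standard-library `html.parser.HTMLParser` instead of sgrep (a different tool, swapped in only because it ships with python; this is not sgrep's query language).  a tokenizing parser copes with tags that span lines and with escaped `&lt;` in text.  the class `TextExtractor`, the helper `strip_tags`, and the choice to drop `<script>`/`<style>` bodies are all assumptions made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """hypothetical helper: collect text, drop tags and <script>/<style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &lt; etc. into characters
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased, so <STYLE> is caught as well
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # keep text only when we are not inside a skipped element
        if not self._skip_depth:
            self.parts.append(data)


def strip_tags(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(repr(strip_tags('<p>a &lt; b and\n<a\nhref="x">link</a></p><style>p{}</style>')))
```

unlike the regex versions, the `<a ...>` tag split over two lines is removed cleanly and the escaped `<` in the text survives as a literal character.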

  clemens
