[Ifile-discuss] Re: html tag stripping


From: clemens fischer
Subject: [Ifile-discuss] Re: html tag stripping
Date: 26 Jun 2003 10:12:12 +0200
User-agent: Gnus/5.1003 (Gnus v5.10.3) Emacs/21.3 (berkeley-unix)

* David Bushong:

> Well, even if people are filtering non-email through it, it doesn't
> handle tagged input gracefully.  An option to do a simple, naïve
> tag-strip seems like a win to me.

have you thought about piping emails through "sed -E 's/<.+>//g'" to
check if a naive approach suffices?  i just tried it:  it fails when a
tag is opened on one line and closed on another.  also, sed(1)
unfortunately doesn't have non-greedy versions of RE closures, so on
a line with `<' somewhere and `>' later on, everything in between is
stripped regardless of how the tags balance:  this loses perfectly
readable text.
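
to make both failure modes concrete, here is a minimal python sketch
(not part of ifile; the function names and the sample string are made
up for illustration).  it compares the greedy per-line pattern from the
sed command above with a negated character class applied to the whole
message at once.  this is still a naive strip, assuming no `>' appears
inside attribute values or comments:

```python
import re

# Same pattern as the sed command: greedy, so it runs to the LAST '>' on a line.
GREEDY = re.compile(r"<.+>")
# Negated class: stops at the FIRST '>' and, unlike '.', also matches newlines.
NEGATED = re.compile(r"<[^>]*>")


def strip_greedy_per_line(text):
    """Mimic sed: apply the greedy pattern to each line separately."""
    return "\n".join(GREEDY.sub("", line) for line in text.split("\n"))


def strip_tags(text):
    """Apply the negated-class pattern to the whole text, across line breaks."""
    return NEGATED.sub("", text)


# Hypothetical input: one line with balanced tags, one tag split over two lines.
sample = '<b>keep</b> this\n<a\nhref="x">link</a>'

print(repr(strip_greedy_per_line(sample)))  # loses "keep", leaves tag debris
print(repr(strip_tags(sample)))             # keeps the readable text
```

the negated class also works in sed itself ('s/<[^>]*>//g'), which
cures the greediness, but sed still sees one line at a time, so tags
split across lines survive there.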

you could try with another simple tool:  sgrep(1) "Structured Grep":

  http://www.cs.helsinki.fi/~jjaakkol/sgrep.html
  ftp://ftp.cs.helsinki.fi/pub/Software/Local

  Sgrep was created by Jani Jaakkola and Pekka Kilpeläinen.

it is meant to find balanced, SGML-like markup, and you can customize
the output format.  it has HTML examples included.  if you get it
working with sgrep(1), please drop a few lines to this list.

  clemens



