Re: How to get title of web page by url?
From: Andreas Röhler
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 18:03:58 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; de; rv:1.9.1.11) Gecko/20100711 Thunderbird/3.0.6
[ ... ]
The real solution for extracting the title from HTML text is not regular
expressions but a dedicated HTML parser. The Lisp way to write such a
parser would be to turn the document (or only the head part) into nested
lists and other s-expressions and then dive into the list to find the
title. Such parsers already exist for Common Lisp, but I'm not sure about
Emacs Lisp.
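
To illustrate the "dive into the list" step, here is a minimal sketch in
Emacs Lisp. It assumes the document has already been turned into nested
lists of the shape (TAG ATTRIBUTES CHILDREN...), which is the shape that
libxml-parse-html-region returns in Emacs builds with libxml2 support.
The names my-html-find-node and my-html-title are hypothetical and do
not come from any library mentioned in this thread.

```elisp
;; Hypothetical helpers; the (TAG ATTRIBUTES CHILDREN...) tree shape is
;; an assumption, not something the original message specifies.
(defun my-html-find-node (tree tag)
  "Return the first node in TREE whose tag is TAG, searching depth-first.
TREE has the form (TAG ATTRIBUTES CHILDREN...), where each child is
either a string or another such list."
  (cond
   ((not (consp tree)) nil)             ; strings and nil carry no tag
   ((eq (car tree) tag) tree)           ; found the node itself
   (t (let ((children (cddr tree))
            found)
        ;; Visit children in order until one of them yields a match.
        (while (and children (not found))
          (setq found (my-html-find-node (car children) tag)
                children (cdr children)))
        found))))

(defun my-html-title (tree)
  "Return the text of the first title node in TREE, or nil if none."
  (let ((node (my-html-find-node tree 'title)))
    (when node
      ;; Concatenate only the string children of the title node.
      (mapconcat #'identity
                 (delq nil (mapcar (lambda (child)
                                     (and (stringp child) child))
                                   (cddr node)))
                 ""))))

;; Usage with a hand-written tree:
(my-html-title
 '(html nil
        (head nil (title nil "Example page"))
        (body nil (p nil "Hello"))))
;; => "Example page"
```

The search stops at the first match, so it never descends into the body
once the title has been found in the head.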
beg-end.el, available at
http://bazaar.launchpad.net/~a-roehler/s-x-emacs-werkstatt
is an attempt at such a parser.
See thing-at-point-markup.el too, which serves markup languages such as
XML and HTML.
thing-at-point-utils.el offers functions to grab everything between
angle brackets, and it counts nesting.
Try ar-angled-lesser-atpt, for example.
All of this needs thingatpt-utils-base.el,
where the core routines reside.
Have a look at how the parser mentioned above is employed there via
beginning-of-form-base and end-of-form-base.
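
The idea of grabbing everything between angle brackets while counting
nesting can be sketched as follows. This is only an illustration of the
technique on a plain string; it is not the code of ar-angled-lesser-atpt
or of the form-base routines, and the name my-angled-bounds is
hypothetical.

```elisp
;; Hypothetical helper; illustrates nesting-aware matching only.
(defun my-angled-bounds (string start)
  "Return (BEG . END) of the angle-bracketed form in STRING opening at START.
START must be the index of a `<'.  END is the index just past the
matching `>'.  Return nil if START is not at a `<' or no match exists."
  (let ((len (length string)))
    (when (and (< start len)
               (eq (aref string start) ?<))
      (let ((depth 0)
            (pos start)
            end)
        (while (and (< pos len) (not end))
          (let ((ch (aref string pos)))
            (cond
             ((eq ch ?<) (setq depth (1+ depth)))   ; one level deeper
             ((eq ch ?>) (setq depth (1- depth))    ; one level back up
              (when (zerop depth)
                (setq end (1+ pos))))))             ; outermost form closed
          (setq pos (1+ pos)))
        (when end (cons start end))))))

;; Usage: the inner <c> does not end the outer form.
(let* ((text "a <b <c> d> e")
       (bounds (my-angled-bounds text 2)))
  (substring text (car bounds) (cdr bounds)))
;; => "<b <c> d>"
```

A depth counter is enough here because only one kind of delimiter is
involved; real markup also needs care with comments and quoted
attribute values, which this sketch ignores.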
Andreas
--
https://code.launchpad.net/~a-roehler/python-mode
https://code.launchpad.net/s-x-emacs-werkstatt/