Re: How to get title of web page by url?
From: Thamer Mahmoud
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 18:34:56 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
filebat Mark <filebat.mark@gmail.com> writes:
> Thanks, Thamer. It works.
>
> Below is the code snippet.
>
> Well, I still have an encoding problem.
> To get the title of "http://www.baidu.com", the title we get is displayed as
> unrecognizable codes.
>
> I have tried to encode it, in the way of "(setq web_title_str
> (encode-coding-string web_title_str 'utf-8-dos))", but it fails.
I'm also new to Elisp (well, sort of), but here is a modified version
that should handle both charsets and newlines, along with other issues
noticed by Deniz Dogan (thanks).
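(defun www-get-page-title (url)
  "Return the title of the page at URL, decoded with its declared charset."
  (let ((buffer (url-retrieve-synchronously url))
        title coding)
    (when buffer
      (with-current-buffer buffer
        (goto-char (point-min))
        ;; [^<]* also matches newlines, so multi-line titles work.
        (when (re-search-forward "<title>\\([^<]*\\)</title>" nil t)
          (setq title (match-string 1)))
        (goto-char (point-min))
        (when (re-search-forward "charset=\"?\\([-0-9a-zA-Z]+\\)" nil t)
          (setq coding (intern (downcase (match-string 1))))))
      (kill-buffer buffer)
      (if (and title coding (coding-system-p coding))
          (decode-coding-string title coding)
        title))))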
The robustness of this code still depends on whether the HTML is
well-formed: it expects a plain <title> tag and a charset= declaration
somewhere in the headers or markup. It should be good enough, I think.
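
As for why the encode-coding-string attempt fails: the buffer returned by
url-retrieve-synchronously holds raw bytes, so the title has to be decoded
(bytes to characters), not encoded (characters to bytes). Here is a minimal
sketch of just that step, which can be tried without a network connection.
The names my-raw-title and my-decode-title are made up for illustration,
and the sample bytes are simply the UTF-8 encoding of a two-character
title, not something fetched from a real page.

```elisp
;; Hypothetical sample data: the UTF-8 bytes of the characters
;; U+767E U+5EA6, as they would appear in a unibyte response buffer.
(defvar my-raw-title "\347\231\276\345\272\246"
  "Raw (undecoded) bytes of a page title, for illustration only.")

(defun my-decode-title (raw charset)
  "Decode the byte string RAW using the charset named by string CHARSET.
Return RAW unchanged when CHARSET is nil or is not a known coding system."
  ;; Coding system symbols are lower case, while pages often say "UTF-8".
  (let ((coding (and charset (intern (downcase charset)))))
    (if (and coding (coding-system-p coding))
        (decode-coding-string raw coding)
      raw)))

;; Decoding turns the six bytes into two characters.
(length my-raw-title)                            ; => 6
(length (my-decode-title my-raw-title "UTF-8"))  ; => 2
```

Falling back to the raw string for an unknown charset is only one possible
choice; signalling an error instead would be just as reasonable.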
--
Thamer
> Since I am a newbie for emacs encoding, can you please help me to point what
> the problem is?
>
> ;; -------------------------- separator --------------------------
> (defun get-page-title()
> "Get title of web page, whose url can be found in current line"
> (interactive)
> ;; Get url from current line
> (copy-region-as-kill (re-search-backward "^") (re-search-forward "$"))
> (setq url (substring-no-properties (current-kill 0)))
> ;; Get title of web page, with the help of functions in url.el
> (with-current-buffer (url-retrieve-synchronously url)
> (goto-char 0)
> (re-search-forward "<title>\\(.*\\)<[/]title>" nil t 1)
> (setq web_title_str (match-string 1)))
> (setq web_title_str (encode-coding-string web_title_str 'utf-8-dos))
> ;; Insert the title in the next line
> (reindent-then-newline-and-indent)
> (insert web_title_str)
> )
>
>
> On 7/28/10, Thamer Mahmoud <thamer.mahmoud@gmail.com> wrote:
>>
>> filebat Mark <filebat.mark@gmail.com> writes:
>>
>> > Such as, given "http://www.emacswiki.org/emacs/Git", we will get the
>> > title of this web page, which is "EmacsWiki: Git:".
>> >
>> > Function of w3m-current-title is quite close, but a standalone lisp
>> > function is much preferred.
>>
>>
>> Using the url.el package,
>>
>> (defun www-get-page-title (url)
>> (with-current-buffer (url-retrieve-synchronously url)
>> (goto-char 0)
>> (re-search-forward "<title>\\(.*\\)<[/]title>" nil t 1)
>> (match-string 1)))
>>
>> (www-get-page-title "http://www.emacswiki.org/emacs/Git")
>> => "EmacsWiki: Git"
>>
>> hth,
>>
>> Thamer