info-gnus-english

Re: Recognizing repeats in RSS feeds


From: Ted Zlatanov
Subject: Re: Recognizing repeats in RSS feeds
Date: Fri, 16 Jan 2009 16:05:07 -0600
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.0.60 (gnu/linux)

On Fri, 16 Jan 2009 13:12:37 -0500 Desmond Rivet <desmond_news@videotron.ca> wrote:

DR> In addition to reading news and email, I use Gnus to keep track of
DR> various RSS feeds.

DR> For some of these feeds, certain articles will, over time, show up
DR> repeatedly in my summary list.  I'm not sure why, but I assume it has
DR> something to do with updates to the article itself.  Or maybe it happens
DR> when someone posts a new comment on the article.  I don't know.
...
DR> Is there any way to score a repeated (updated) article down, so that
DR> they wouldn't show up in my group unless I asked?  I have no idea where
DR> to even start with this; a simple push in the right direction would be
DR> appreciated.

You want to ignore updates which only affect irrelevant fields.  Here's
how I do it:

(setq nnrss-ignore-article-fields
      '(description slash:comments slash:hit_parade))

This eliminates duplicates completely for me; on some sites
"description", for instance, changes very frequently.  nnrss identifies
unique articles by hashing the content of all their fields that are not
ignored, so an update that only touches an ignored field no longer shows
up as a new article.
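
To make the hashing idea concrete, here is a minimal Emacs Lisp sketch.
It is not the actual nnrss code: the function name my-rss-item-hash is
hypothetical, and it assumes an item is a list of the form
(TAG ATTRIBUTES CHILDREN...) containing only ASCII text.

```elisp
;; Illustrative sketch only; `my-rss-item-hash' is a hypothetical helper,
;; not a function provided by nnrss.
(defun my-rss-item-hash (item ignored-fields)
  "Return an MD5 hex digest of ITEM, skipping IGNORED-FIELDS.
ITEM is assumed to look like (TAG ATTRIBUTES CHILDREN...).
IGNORED-FIELDS is a list of child tag symbols to leave out."
  (let (kept)
    ;; Keep every child except elements whose tag is in IGNORED-FIELDS.
    (dolist (child (cddr item))
      (unless (and (consp child)
                   (memq (car child) ignored-fields))
        (push child kept)))
    ;; Hash the printed form of the tag, attributes and kept children.
    (md5 (prin1-to-string
          (append (list (car item) (cadr item))
                  (nreverse kept))))))

;; Two versions of one item that differ only in `description':
(let ((old '(item nil (title nil "T") (description nil "first text")))
      (new '(item nil (title nil "T") (description nil "edited text"))))
  (equal (my-rss-item-hash old '(description))
         (my-rss-item-hash new '(description))))
```

With description ignored, the final form evaluates to t, so both
versions count as the same article; with an empty ignore list the two
hashes differ.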

To find out exactly what's happening, set gnus-verbose to 10 and refresh
an nnrss group.  In *Messages* you'll see a full dump of the RSS segment
that describes each article, and from that you can easily figure out
what's causing duplicates.  You need a recent CVS Gnus for this; I added
the dump fairly recently.
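
As a small sketch, the verbosity can be raised by evaluating the form
below (for instance with M-: or in *scratch*) before refreshing the
group.  Where you put it is up to you; setting it permanently in your
init file is an assumption, not a recommendation, since level 10 is
noisy.

```elisp
;; Raise Gnus message verbosity to the maximum so the nnrss
;; "Making hash index of ..." lines appear in *Messages*.
;; Set it back to your usual value when you are done debugging.
(setq gnus-verbose 10)
```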

For example, here's one entry from the Dilbert Blog:

nnrss: Making hash index of (item nil "
" (title nil "From Blog to Reality: Three Interesting Things") "
" (link nil "http://dilbert.com/blog/entry/from_blog_to_reality_three_things/") "
" (description nil "...cut because it's too much text...") "
" (pubDate nil "Fri, 16 Jan 2009 01:00:01 PST") "
" (guid ((isPermaLink . "false")) "http://dilbert.com/blog/entry/203/") "
")

So the fields here are guid, pubDate, title, link, and description.
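
If a dump is long, the field names can also be pulled out
programmatically.  The following is only a sketch: my-rss-item-fields is
a hypothetical helper, and it assumes you have pasted the dumped
(item ...) form into a quoted list.

```elisp
;; Illustrative sketch only; `my-rss-item-fields' is a hypothetical helper.
(defun my-rss-item-fields (item)
  "Return the tag names of the element children of ITEM.
ITEM is assumed to look like (TAG ATTRIBUTES CHILDREN...); string
children such as the newlines between elements are skipped."
  (let (fields)
    (dolist (child (cddr item) (nreverse fields))
      (when (consp child)
        (push (car child) fields)))))

;; A shortened version of the entry above:
(my-rss-item-fields
 '(item nil "\n"
        (title nil "From Blog to Reality: Three Interesting Things") "\n"
        (pubDate nil "Fri, 16 Jan 2009 01:00:01 PST") "\n"
        (guid ((isPermaLink . "false")) "http://dilbert.com/blog/entry/203/")
        "\n"))
```

The last form evaluates to (title pubDate guid).  Any of the returned
symbols is a candidate for nnrss-ignore-article-fields if its content
changes between fetches of the same article.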

If you need more help, tell us what feeds specifically are causing the
problem and I can take a look.

Ted

