straw-devel

Re: [Straw-devel] UTF-8


From: Juri Pakaste
Subject: Re: [Straw-devel] UTF-8
Date: Wed, 13 Apr 2005 22:22:25 +0300

On Tue, 2005-04-12 at 18:33 -0400, Steve Laniel wrote:
> I just noticed that Straw hasn't been properly refreshing my
> RSS feed for a couple days, and the reason seems to be that
> a recent post contains a demonstration of UTF-8's
> capabilities. The offending post is here:
> 
> http://laniels.org/weblog/tech/unicode.html
> 
> and the RSS feed containing it is here:
> 
> http://laniels.org/rss
> 
> Firefox loads it fine, but Straw falls to pieces on it.
> 
> That seems like an odd failure mode when Straw can't display
> UTF-8. Shouldn't it just display a block in the offending
> post that says, "This portion contains characters that Straw
> doesn't know how to display?"

Well, it depends. Generally speaking, Straw doesn't have a problem with
UTF-8. Now, I haven't looked too closely at your feed (and I wouldn't
know what's a valid UTF-8 byte sequence anyway), but see what the feed
validator says:
http://feedvalidator.org/check.cgi?url=http%3A%2F%2Flaniels.org%2Frss

It looks like something in there claims to be UTF-8 but isn't.
Also, the Tamil bit seems to be causing trouble. Maybe Straw/feedparser
should cope with the RSS despite these problems, but when the character
sets go wonky and aren't what they claim to be, parsing is difficult.
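
To make "claims to be UTF-8 but isn't" concrete, here is a minimal sketch
in Python of how one could locate the first byte sequence that is not valid
UTF-8. The function name and the sample bytes are made up for illustration;
this is not how Straw or feedparser actually handle a feed.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, reason) for the first invalid UTF-8 sequence, or None.

    Hypothetical helper for illustration only; not part of Straw or feedparser.
    """
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# Assumed sample data, not taken from the feed in question.
good = "தமிழ்".encode("utf-8")   # Tamil text, correctly encoded
bad = b"caf\xe9 latte"           # 0xE9 is a Latin-1 byte, not valid UTF-8 here

print(find_invalid_utf8(good))
print(find_invalid_utf8(bad))
```

Running something like this over the raw bytes of a feed would show whether
the declared encoding matches the content, and at which offset it stops
matching.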

Of course, it's also possible that your feed is OK and it's a problem
with Python's character set handlers, but I would be rather surprised if
that turned out to be the case.
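
One rough way to rule out Python's codecs is to round-trip some known-good
text through them, and to see what a lenient decode does with bad input.
The sketch below assumes a made-up sample string and a hypothetical helper
name; it says nothing about what Straw does internally.

```python
# Assumed sample text; any well-formed Unicode string would do.
sample = "தமிழ் UTF-8 ✓"

# If the codec is sound, encoding and decoding gives back the same text.
encoded = sample.encode("utf-8")
round_tripped = encoded.decode("utf-8")
print(round_tripped == sample)


def decode_leniently(data: bytes) -> str:
    """Decode as UTF-8, replacing invalid bytes with U+FFFD.

    Hypothetical helper showing one way a reader could degrade gracefully
    instead of failing on the whole feed.
    """
    return data.decode("utf-8", errors="replace")


print(decode_leniently(b"caf\xe9 latte"))
```

The `errors="replace"` handler is roughly the "display a block" behaviour
suggested in the quoted message: bad bytes become a visible replacement
character and the rest of the text survives.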

-- 
[ Juri Pakaste | address@hidden | http://www.iki.fi/juri/ ]


