Re: [Bug-gnupod] input not verified as sane
From: H. Langos
Subject: Re: [Bug-gnupod] input not verified as sane
Date: Thu, 10 Apr 2008 13:39:55 +0200
User-agent: Mutt/1.5.13 (2006-08-11)

Hi Dylan,
On Tue, Apr 08, 2008 at 01:48:35PM -0700, Dylan Martin wrote:
> Hi, I use gnupod all the time. Thanks for making it!
>
> I just downloaded a podcast with non-ascii characters and possibly
> problematical '<' and '>' marks in the title.
>
If you take a look at gnupod/src/ext/XMLhelper.pm you'll see that the
XML file is generated mostly by text manipulation. So chances are quite
high that something will slip through the cracks ...
Could you post the URL of that podcast and the downloaded XML file (it
should be called something like /tmp/gnupodcast1_47f61cb0_87903dd),
or at least the relevant part of your GNUtunesDB.xml?
It would make it far easier to reproduce the bug and fix it.
As I read gnupod/src/ext/XMLhelper.pm all attribute names (why that?) and
their values get filtered by this function:

    sub xescaped {
        my ($ret) = @_;
        $ret =~ s/&/&amp;/g;
        $ret =~ s/"/&quot;/g;
        $ret =~ s/</&lt;/g;
        $ret =~ s/>/&gt;/g;
        #$ret =~ s/^\s*-+//g;
        my $xutf = Unicode::String::utf8($ret)->utf8;
        #Remove 0x00 - 0x1f chars (we don't need them)
        $xutf =~ tr/\000-\037//d;
        return $xutf;
    }

So your < and > marks should be taken care of. Non-ASCII characters,
however, are a different story. There are people out there who seem
to write their RSS files in WordPad, and the line
    my $xutf = Unicode::String::utf8($ret)->utf8;
assumes that $ret is in UTF-8 and outputs it in UTF-8 again, so
there is no real conversion happening here.
The only effect of this line, as far as I can tell, is some filtering.
Try this:
    perl -e 'use Unicode::String; my $xutf = Unicode::String::utf8("foo")->hex; print $xutf."\n";'
Output should be:
    U+0066 U+006f U+006f
Now replace "foo" with a string that contains non-ASCII characters. If
your terminal is UTF-8 you should see those characters in hex
notation.
If the input isn't UTF-8, the invalid characters seem to be ignored in
my case, but the behavior for invalid input is undefined (at least
my documentation of Unicode::String doesn't tell me anything) and
your perl version might be different.
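
If the encoding of a string ever needs to be checked explicitly rather
than assumed, the core Encode module can do a strict decode. The
following is only a sketch and not something GNUpod does: the helper
name to_utf8 is made up, and falling back to cp1252 for anything that
is not valid UTF-8 is purely an assumption about what a Windows editor
might have written.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of GNUpod: return $bytes as valid UTF-8.
sub to_utf8 {
    my ($bytes) = @_;

    # With FB_CROAK, decode() dies on malformed UTF-8 instead of guessing.
    # Work on a copy, because decode() may modify its source when a
    # check mode is given.
    my $copy  = $bytes;
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };

    # Assumption: input that is not valid UTF-8 is treated as cp1252.
    $chars = decode('cp1252', $bytes) unless defined $chars;

    return encode('UTF-8', $chars);
}

# Usage: the lone byte 0xE9 is not valid UTF-8, so the fallback is used.
print to_utf8("caf\xE9 au lait"), "\n";
```

Input that is already valid UTF-8 passes through unchanged, so the
helper could be applied to every value without checking first.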
Anyway, there should be no invalid UTF-8 characters here, since the input
that comes from the podcast's XML file is already filtered and converted
by XML/Parser.pm.
After thinking about it again, I assume that by "title" you mean the
title that is extracted from the ID3 tag of that podcast.
One more reason to follow up on this with more information about the
podcast.
> gnupod_addsong.pl added these items to GNUtunesDB.xml and every
> subsequent attempt to read the database produces an error.
>
> not well-formed (invalid token) at line 654, column 398, byte 548847
> at /usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi/XML/Parser.pm
> line 187
>
> I was able to fix this by changing the questionable string to
> something reasonable in the GNUtunesDB.xml file.
>
> Also, it would be really nice if the error message was more helpful,
> e.g. said which file contained the problem.
At the point where the parser dies it only knows a file handle, and no
filename anymore. So the output can't be done there.
I don't really understand perl, so somebody please correct me if I talk
bullshit here, but as I understand it perl doesn't have any useful means
of exception handling/propagation. So if an error is not handled where
it occurs, it crashes your program right there and then.
The only way to avoid this is to use "eval" and see if the eval block
died. This is what XML/Parser.pm does.
Before ranting on about the inherent evil of "eval" I decided to take
a look at the parser and there's at least a way to make it output the
offending line.
I've also wrapped the call to the parser in an eval block to catch the
dying parser and make it output the file name.
(I don't have a clue if in this case perl will really assign the right
value to $p, but it is not used by anybody who calls the "doxml" sub
anyway.)
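
For readers without the patch at hand, the idea of wrapping the parser
call can be sketched roughly like this. It is not the attached patch:
parse_handle is only a stand-in for the real XML::Parser call, and
parse_file is a made-up name. The point is just that the caller still
knows the file name and can add it when the eval block dies.

```perl
use strict;
use warnings;

# Stand-in for the real parser call (an assumption for illustration):
# it only sees a file handle and dies without mentioning any file name.
sub parse_handle {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        # Crude check: an '&' that does not start a named entity.
        die "not well-formed (invalid token) at line $.\n"
            if $line =~ /&(?!\w+;)/;
    }
    return 1;
}

# Hypothetical wrapper: catch the die and re-raise it with the file name.
sub parse_file {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Could not open $file: $!\n";

    # eval returns undef if the block died; the message is left in $@.
    my $ok  = eval { parse_handle($fh); 1 };
    my $err = $@;
    close($fh);

    die "Error parsing $file: $err" unless $ok;
    return 1;
}
```

A caller would then see a message that starts with "Error parsing",
followed by the file name and the parser's original message.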
I guess I will add the same safety mechanism to addsong for handling
badly formatted RSS feeds.
cheers
-henrik
gnupod_ext_XMLhelper-improve-error-output.patch
Description: Text Data