Re: [Bug-gnupod] Encoding of non-ascii characters in GNUtunesDB.xml
From: H. Langos
Subject: Re: [Bug-gnupod] Encoding of non-ascii characters in GNUtunesDB.xml
Date: Tue, 22 Apr 2008 21:44:24 +0200
User-agent: Mutt/1.5.13 (2006-08-11)
Patch to the patch of the patch ... or rather not...
I rewrote the UTF-8 to XML entity conversion once again. This time the
conversion itself uses only Perl regular expressions and no Unicode::String
methods at all.
Instead of making another patch, I'll simply paste the xescaped sub.
################################################################
# Escape chars
sub xescaped {
    my ($ret) = @_;
    $ret =~ s/&/&amp;/g;
    $ret =~ s/"/&quot;/g;
    $ret =~ s/\'/&apos;/g;
    $ret =~ s/</&lt;/g;
    $ret =~ s/>/&gt;/g;
    #$ret =~ s/^\s*-+//g;
    my $xutf = Unicode::String::utf8($ret)->utf8;
    # Remove 0x00 - 0x1f chars (we don't need them)
    $xutf =~ tr/\000-\037//d;
    # Convert to XML encoded unicode: each 2-, 3- and 4-byte UTF-8
    # sequence becomes a decimal character reference (&#NNN;)
    $xutf =~ s/([\xC2-\xDF])([\x80-\xBF])/"&#".( ((ord($1) % 32) << 6) + (ord($2) % 64) ).";"/eg;
    $xutf =~ s/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/"&#".( ((ord($1) % 16) << 12) + ((ord($2) % 64) << 6) + (ord($3) % 64) ).";"/eg;
    $xutf =~ s/([\xF0-\xF4])([\x80-\xBF])([\x80-\xBF])([\x80-\xBF])/"&#".( ((ord($1) % 8) << 18) + ((ord($2) % 64) << 12) + ((ord($3) % 64) << 6) + (ord($4) % 64) ).";"/eg;
    return $xutf;
}
################################################################
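
A quick way to sanity-check the bit arithmetic is to run the three
substitutions on a few known byte sequences and compare the result with what
the core Encode module decodes. The sketch below is only an illustration:
utf8_bytes_to_entities and reference_entities are hypothetical helper names
(not part of GNUpod), the sample string is made up, and the input is assumed
to be well-formed UTF-8 held in a byte string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: the same three substitutions as in xescaped,
# applied to a byte string assumed to contain well-formed UTF-8.
sub utf8_bytes_to_entities {
    my ($bytes) = @_;
    # 2-byte sequences: 5 payload bits + 6 payload bits
    $bytes =~ s/([\xC2-\xDF])([\x80-\xBF])/"&#".( ((ord($1) % 32) << 6)
        + (ord($2) % 64) ).";"/eg;
    # 3-byte sequences: 4 + 6 + 6 payload bits
    $bytes =~ s/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/"&#".( ((ord($1) % 16) << 12)
        + ((ord($2) % 64) << 6) + (ord($3) % 64) ).";"/eg;
    # 4-byte sequences: 3 + 6 + 6 + 6 payload bits
    $bytes =~ s/([\xF0-\xF4])([\x80-\xBF])([\x80-\xBF])([\x80-\xBF])/"&#".( ((ord($1) % 8) << 18)
        + ((ord($2) % 64) << 12) + ((ord($3) % 64) << 6) + (ord($4) % 64) ).";"/eg;
    return $bytes;
}

# Hypothetical reference: decode with Encode, then number every
# non-ASCII character by its code point.
sub reference_entities {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);
    $chars =~ s/([^\x00-\x7F])/"&#".ord($1).";"/eg;
    return $chars;
}

# U+00E9 (2 bytes), U+20AC (3 bytes) and U+1F3B5 (4 bytes) as raw UTF-8 bytes
my $sample = "caf\xC3\xA9 \xE2\x82\xAC \xF0\x9F\x8E\xB5";

print utf8_bytes_to_entities($sample), "\n";   # caf&#233; &#8364; &#127925;
print reference_entities($sample), "\n";       # same line again
```

For example, the two bytes C3 A9 give ((0xC3 % 32) << 6) + (0xA9 % 64) =
(3 << 6) + 41 = 233, which is U+00E9. Note that the regexes do not validate
their input, so overlong or otherwise malformed sequences would still be
converted to some number.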
cheers
-henrik