savannah-hackers

From: Bob Proulx
Subject: [savannah-help-public] [sr #109439] Commit notification hook mishandles non-ASCII author names
Date: Tue, 9 Jan 2018 14:22:10 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36

Follow-up Comment #2, sr #109439 (project administration):

I spent some time looking into this problem and the issue is much too
complicated to type into a web page text area.  I thought about dragging this
conversation over to the mailing list but decided to give it a shot here
anyway.  Glenn is correct about the Latin1 encoding being the problem.

There are many problems.  One is that Savannah's web interface is designed
around Latin1, not UTF-8.  I don't know what needs to be done to migrate the
web UI from Latin1 to UTF-8.  I haven't tried it, but I am pretty sure that if
I updated the database to contain UTF-8 content instead of Latin1 content then
the web page would show the reverse mangling.

https://savannah.gnu.org/users/civodul

Oh, and there is also a lot of content stored in the database as UTF-8, even
though the database character encoding is specified as Latin1.  Assaf has an
entry describing this problem in the TODO list.  That mismatch is also a
problem for that other data, in the other direction.

In any case here are some data factoids just as general information.  I will
dump some data from the MySQL database.


vcs0:~# getent passwd civodul | awk -F: '{print$5}' | od -tx1 -c
0000000  4c  75  64  6f  76  69  63  20  43  6f  75  72  74  e8  73  0a
          L   u   d   o   v   i   c       C   o   u   r   t 350   s  \n

vcs0:~# getent passwd civodul | awk -F: '{print$5}' | iconv -f LATIN1 -t UTF-8 | od -tx1 -c
0000000  4c  75  64  6f  76  69  63  20  43  6f  75  72  74  c3  a8  73
          L   u   d   o   v   i   c       C   o   u   r   t 303 250   s
0000020  0a
         \n


This shows that indeed the content from the database is returned in a Latin1
encoding.  This is then used by git-multimail and onward.  If it were UTF-8
then from here onward through the email it should all work okay.
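
For a Python person, the same conversion can be reproduced without iconv.  This
is only a minimal sketch (Python 3 assumed): the byte values are copied from
the first od dump above with the trailing newline dropped, and the variable
names are made up for illustration.

```python
# Bytes of the name field as shown in the first od dump (newline omitted).
latin1_bytes = bytes.fromhex("4c 75 64 6f 76 69 63 20 43 6f 75 72 74 e8 73")

# Decode as Latin1, then re-encode as UTF-8.  This is the same conversion
# that "iconv -f LATIN1 -t UTF-8" performs in the second command.
name = latin1_bytes.decode("latin-1")
utf8_bytes = name.encode("utf-8")

# Print the result in the same style as "od -tx1".
print(" ".join("%02x" % b for b in utf8_bytes))
```

The single Latin1 byte e8 becomes the two-byte UTF-8 sequence c3 a8, matching
the second dump.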

At the moment I think a reasonable workaround would be to handle this in the
wrapper that we are already using around git-multimail.  It's all Python and
I am a Perl guy, so please forgive me if I don't know Python well enough to
make the changes myself.  But if someone were to propose patches to the Python
then I think this could be fixed there.  Here is raw access to the git
repository, including the config for git-multimail.  The file needing patching
is post-receive.  Looking at that file should give a Python person enough
information on the process, and they should be able to hack in a workaround.

https://git.savannah.gnu.org/git/guix.git/hooks/

If the fromaddr could be passed through "iconv -f LATIN1 -t UTF-8" then I
think the result would work around the current Latin1 issues.  Patches
solicited.
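
As a starting point for such a patch, here is a rough sketch of the conversion
in Python instead of iconv.  It is not taken from the actual post-receive
hook: the function name normalize_to_utf8 is hypothetical, Python 3 is
assumed, and where it would be called on the fromaddr depends on code not
shown here.  Because the database also holds some UTF-8 content, the sketch
leaves input alone when it already decodes as UTF-8 and only treats it as
Latin1 otherwise.

```python
def normalize_to_utf8(raw):
    """Return UTF-8 bytes for a name that may be Latin1 or already UTF-8.

    Hypothetical helper, not part of git-multimail or the Savannah hook.
    """
    try:
        # Already valid UTF-8 (this includes plain ASCII): keep it unchanged.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Otherwise assume Latin1, as returned by getent above, and convert.
        return raw.decode("latin-1").encode("utf-8")


# The Latin1 bytes from the first od dump are converted.
print(normalize_to_utf8(b"Ludovic Court\xe8s"))
# Bytes that are already UTF-8 pass through untouched.
print(normalize_to_utf8(b"Ludovic Court\xc3\xa8s"))
```

One caveat with this guess-the-encoding approach: a Latin1 string whose bytes
happen to form valid UTF-8 would be passed through unconverted.  That should
be rare for names, but it is a heuristic and not a real fix.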

And one more thing.  We are using git-multimail from just after the 1.0.0 tag
(plus 3), with two local changes from 2014 on top of that.  It's been working
well so there hasn't been a need to update.  But if someone were bothered that
we aren't using the latest version of git-multimail and were willing to test
out the new version then I'd be happy to work through the upgrade with them.


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/support/?109439>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



