bug-gnu-emacs

bug#58281: 27.1; windows mangles encoding on command line


From: Daniel Bastos
Subject: bug#58281: 27.1; windows mangles encoding on command line
Date: Wed, 12 Oct 2022 08:49:32 -0300

On Wed, Oct 12, 2022 at 5:45 AM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Daniel Bastos <dbastos@id.uff.br>
> > Date: Thu, 6 Oct 2022 09:03:50 -0300
> > Cc: Wayne Harris <dbastos@toledo.com>, 58281@debbugs.gnu.org
> >
> > On Tue, Oct 4, 2022 at 7:02 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > > > From: Wayne Harris <dbastos@toledo.com>
> > > > Date: Mon, 03 Oct 2022 22:18:35 -0300
> > > >
> > > > I run emacs -Q.  I open eshell.  Then I try to use fossil (which is a
> > > > version control system like git) and try to put accented letters on the
> > > > commit message.  No choice of encoding seems to avoid the mangling.
> > > >
> > > > c:/my/path $ alias fs 'fossil $*'
> > > > c:/my/path $ echo kkk >> encoding.txt
> > > > c:/my/path $ fs changes
> > > > EDITED     encoding.txt
> > > >
> > > > c:/my/path $ (print default-process-coding-system)
> > > > (undecided-dos . undecided-unix)
> > > >
> > > > c:/my/path $ (or buffer-file-coding-system "it is nil")
> > > > it is nil
> > > >
> > > > c:/my/path $ fs commit -m 'Naiveté'
> > > > [...]
> > > > Sync done, wire bytes sent: 3234  received: 309  ip: 5.161.138.46
> > > >
> > > > c:/my/path $ fs timeline -n 1
> > > > === 2022-10-02 ===
> > > > 13:11:20 [febbbf0441] *CURRENT* Naiveté (user: mer tags: trunk)
> > > > --- entry limit (1) reached ---
> > > > c:/my/path $
> > >
> > > Where did you download Fossil for MS-Windows?  Is it a native Windows
> > > program, or a Cygwin program?  Is 'fs' a program (i.e. fs.exe) or some
> > > kind of shell script, and if the latter, can you post the script?
> >
> > I went to
> >
> >   https://fossil-scm.org/home/uv/download.html
> >
> > and chose the last one --- Windows64 ---, which is the ZIP at
> >
> >   https://fossil-scm.org/home/uv/fossil-w64-2.19.zip
> >
> > Inside this ZIP, there's a fossil.exe binary.  All evidence points to
> > a native Windows program, not a Cygwin program.
> >
> > %file c:/my/path/fossil.exe
> > c:/my/path/fossil.exe: PE32+ executable (console) x86-64, for MS Windows
> > %
> >
> > There's no fs.exe and no script fs.  (Sorry about that.)  That's just
> > my alias in ESHELL.  You can safely assume that /fs/ just means
> > /fossil/.  (I shouldn't have used the alias in this bug report.
> > Sorry.)
> >
> > > Also, do you know whether Fossil expects the message text in some
> > > particular encoding?
> >
> > That I don't know.  I've looked into the documentation, but I did not
> > find anything that looked relevant.  I did find old commit messages in
> > the repository of fossil itself that little by little the developers
> > have been adding UTF-8 support to it.  But I can't say it expects any
> > particular encoding.
>
> I think you said at some point that using non-ASCII commit log
> messages from a shell outside of Emacs did succeed?  If so, can you

Not from a shell, but from a regular GNU EMACS buffer.  I showed an
ESHELL session in which I did not specify the commit message on the
command line, so emacsclientw was invoked.  In the buffer that
opened, I typed a UTF-8-encoded message, and that was not mangled.

--8<---------------cut here---------------start------------->8---
However, if instead of the command-line, I use a regular GNU EMACS
buffer, it works just fine.

%echo kkk >> encoding.txt

%fs commit
Pull from https://mer@somewhere.edu/test
Round-trips: 1   Artifacts sent: 0  received: 0
Pull done, wire bytes sent: 437  received: 2118  ip: 5.161.138.46
emacsclientw ./ci-comment-A2803F45F10B.txt
Waiting for Emacs...
Pull from https://mer@somewhere.edu/test
Round-trips: 1   Artifacts sent: 0  received: 0
Pull done, wire bytes sent: 441  received: 2118  ip: 5.161.138.46
New_Version: 09ea1b5d5b8d776d61a74bb412cd58bd8b6f82323c2f539a1eb0d915f7026f20
Sync with https://mer@somewhere.edu/test
Round-trips: 1   Artifacts sent: 2  received: 0
Sync done, wire bytes sent: 2496  received: 309  ip: 5.161.138.46

%fs timeline
=== 2022-10-01 ===
14:09:39 [09ea1b5d5b] *CURRENT* Naiveté. (user: mer tags: trunk)
--8<---------------cut here---------------end--------------->8---

> describe how you do that, i.e. which shell do you use and how you type
> 'Naiveté' from the shell?  Also, what does the command "chcp" report
> in that shell, if you invoke it with no arguments?

I had not tested with a different shell, so I'm testing with cmd.exe
below.  The text is not mangled, but I don't know which encoding is
applied there because I have no idea how cmd.exe works.  The command
chcp reports code page 850.

--8<---------------cut here---------------start------------->8---
c:\my\path>chcp
Active code page: 850

c:\my\path>fossil commit -m 'Naiveté'
Pull from https://mer@somewhere.edu/mer
Round-trips: 1   Artifacts sent: 0  received: 0
Pull done, wire bytes sent: 438  received: 3250  ip: 5.161.138.46
New_Version: 8cce649b5236e507e84ce8114ab273e3b9ea246dd00e42484b47ab86517cf028
Sync with https://mer@somewhere.edu/mer
Round-trips: 1   Artifacts sent: 2  received: 0
Sync done, wire bytes sent: 3615  received: 307  ip: 5.161.138.46

c:\my\path>fossil timeline -n 1
=== 2022-10-12 ===
11:31:30 [8cce649b52] *CURRENT* 'Naiveté' (user: mer tags: trunk)
--- entry limit (1) reached ---

c:\my\path>
--8<---------------cut here---------------end--------------->8---
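
For reference, here is a minimal Python 3 sketch of what the single
character "é" looks like as bytes under code page 850 (what chcp
reports), under Windows-1252, and under UTF-8.  The choice of these
three encodings is only an assumption about what could plausibly be
involved; the helper name encodings_of is made up for illustration,
and nothing here shows what cmd.exe or fossil actually does.

```python
# Illustration only: byte representation of one character under a few
# encodings.  The list of encodings is an assumption, not a diagnosis.

def encodings_of(ch, names=("cp850", "cp1252", "utf-8")):
    """Return {encoding name: space-separated hex of ch in that encoding}."""
    return {name: ch.encode(name).hex(" ") for name in names}

if __name__ == "__main__":
    for name, hexbytes in encodings_of("é").items():
        print(f"{name:8} {hexbytes}")
```

If the bytes stored for a message were known, comparing them against
a table like this would be one way to guess which encoding produced
them.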

However, there is some evidence that UTF-8 is the encoding used by
cmd.exe.  I committed again with the message "água aaaaa".

--8<---------------cut here---------------start------------->8---
c:\my\path>fossil timeline -n 1
=== 2022-10-12 ===
11:38:30 [148c174ad3] *CURRENT* água aaaaa (user: mer tags: trunk)
--- entry limit (1) reached ---
--8<---------------cut here---------------end--------------->8---

I know "á" encodes to the two-byte sequence c3 a1 in UTF-8.  Asking
/od/ to show me the bytes, I see c3 a1 in there.  First notice the
position of the two-byte sequence of interest --- it's on line
0000060, starting at the 4th column.

--8<---------------cut here---------------start------------->8---
c:\my\path>fossil timeline -n 1 | od -t c
0000000   =   =   =       2   0   2   2   -   1   0   -   1   2       =
0000020   =   =  \n   1   1   :   3   8   :   3   0       [   1   4   8
0000040   c   1   7   4   a   d   3   ]       *   C   U   R   R   E   N
0000060   T   *       Ã   ¡   g   u   a       a   a   a   a   a       (
[...]
--8<---------------cut here---------------end--------------->8---

If we look at which bytes are there, we find c3 a1.  I do not
understand this: I have no idea why my cmd.exe is UTF-8 encoding
anything.

--8<---------------cut here---------------start------------->8---
c:\my\path>fossil timeline -n 1 | od -t x1
0000000 3d 3d 3d 20 32 30 32 32 2d 31 30 2d 31 32 20 3d
0000020 3d 3d 0a 31 31 3a 33 38 3a 33 30 20 5b 31 34 38
0000040 63 31 37 34 61 64 33 5d 20 2a 43 55 52 52 45 4e
0000060 54 2a 20 c3 a1 67 75 61 20 61 61 61 61 61 20 28
[...]
--8<---------------cut here---------------end--------------->8---
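
The two dumps above can be cross-checked with a small Python 3
sketch.  It takes the bytes from the line at offset 0000060 of the
"od -t x1" output and decodes them twice: once as UTF-8, and once as
Latin-1, which maps every byte to one character.  The helper name
decode_dump is made up for illustration, and the use of Latin-1 to
mimic the "od -t c" display is an assumption.

```python
# Bytes copied from the "od -t x1" line at offset 0000060.
HEX_LINE = "54 2a 20 c3 a1 67 75 61 20 61 61 61 61 61 20 28"

def decode_dump(hex_line, encoding):
    """Turn a space-separated hex dump line into text using `encoding`."""
    return bytes.fromhex(hex_line).decode(encoding)

if __name__ == "__main__":
    # Read as UTF-8: c3 a1 becomes the single character "á".
    print(decode_dump(HEX_LINE, "utf-8"))
    # Read one byte per character (Latin-1): c3 a1 becomes two characters.
    print(decode_dump(HEX_LINE, "latin-1"))
```

This only says something about the bytes fossil wrote to the pipe for
the timeline; by itself it does not show which encoding was used for
the command-line argument.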

Feel free to ask me any further questions. Thank you!
