bug-gnu-emacs

bug#38587: base64-decode-region breaks encoding


From: Eli Zaretskii
Subject: bug#38587: base64-decode-region breaks encoding
Date: Tue, 17 Dec 2019 18:04:16 +0200

> From: Juri Linkov <juri@linkov.net>
> Cc: schwab@linux-m68k.org,  larsi@gnus.org,  38587@debbugs.gnu.org
> Date: Mon, 16 Dec 2019 23:51:48 +0200
> 
> >> Is there an equivalent of force_encoding('UTF-8') in Emacs?
> >
> > "C-x RET c utf-8 RET M-x SOME-COMMAND RET"
> 
> I see that 'C-x RET c' just sets coding-system-for-read and
> coding-system-for-write for the next command, so could
> base64-decode-region get coding from these variables?

Yes, just access the variable and use the value.
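
As a rough sketch of that idea, the command below decodes a base64 region and then decodes the resulting bytes with whatever coding system "C-x RET c" selected. The command name is hypothetical (nothing like it ships with Emacs), and both the choice of coding-system-for-read over coding-system-for-write and the UTF-8 fallback are assumptions.

```elisp
;; Hypothetical command; the name and the UTF-8 fallback are assumptions.
(defun my-base64-decode-region-as-text (beg end)
  "Base64-decode the region from BEG to END, then decode the bytes as text.
Use `coding-system-for-read' when it is non-nil (for example after
\\[universal-coding-system-argument]), otherwise assume UTF-8."
  (interactive "r")
  (let ((coding (or coding-system-for-read 'utf-8))
        ;; A marker keeps track of the region end while the text shrinks.
        (end-marker (copy-marker end)))
    ;; Step 1: replace the base64 text with the raw bytes it stands for.
    (base64-decode-region beg end-marker)
    ;; Step 2: turn those raw bytes into characters.
    (decode-coding-region beg end-marker coding)
    ;; Detach the marker so it no longer costs anything in this buffer.
    (set-marker end-marker nil)))
```

With this defined, "C-x RET c latin-1 RET M-x my-base64-decode-region-as-text RET" would decode the bytes as Latin-1 instead of UTF-8.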

> >   (decode-coding-string (base64-decode-string
> >                          (base64-encode-string
> >                           (encode-coding-string "ä" 'utf-8)))
> >                         'utf-8)
> 
> Thanks, this works for strings.
> 
> My real need was to find a way to decode base64 regions
> that were encoded with UTF-8 coding.

Then you just need base64-decode-region followed by
decode-coding-region, assuming that I understand what you mean,
i.e. that the region you want to decode includes only ASCII characters
and raw bytes (otherwise it is not correct to say that it is "encoded
with UTF-8").
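
The precondition mentioned here can be checked mechanically. The helper below is only an illustration: its name is hypothetical, and it relies on the fact that in a multibyte buffer raw bytes are represented as the characters #x3FFF80 through #x3FFFFF.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-ascii-or-raw-bytes-p (beg end)
  "Return non-nil if the text from BEG to END is only ASCII and raw bytes."
  (save-excursion
    (goto-char beg)
    ;; Advance while each character is ASCII (below 128) or a raw byte
    ;; (#x3FFF80 and above in a multibyte buffer).
    (while (and (< (point) end)
                (let ((ch (char-after)))
                  (or (< ch 128) (>= ch #x3fff80))))
      (forward-char 1))
    ;; We reached END only if no other kind of character stopped the loop.
    (>= (point) end)))
```

If it returns nil for a region, that region already contains decoded characters, and running decode-coding-region over it is probably not what you want.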

> First I tried to find such post-processing that would
> recover "broken" characters inserted by base64-decode-region.
> It seems these characters represent bytes that are parts
> of the UTF-8 characters encoded in the UTF-8 buffer
> using eight-bit charset.  I failed to find such functions
> that would convert the result of base64-decode-region
> to UTF-8 characters in the UTF-8 buffer.

decode-coding-region should be what you want.  It decodes raw bytes
(a.k.a. "eight-bit charset") into characters.
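
A small demonstration of that step, as a sketch only: the function name is hypothetical, and it inserts the two UTF-8 bytes of U+00E4 as raw-byte characters to imitate what a multibyte buffer holds after base64-decode-region.

```elisp
;; Hypothetical demo function, not part of Emacs.
(defun my-raw-bytes-demo ()
  "Return (COUNT-BEFORE . TEXT-AFTER) for the two UTF-8 bytes of U+00E4."
  (with-temp-buffer
    ;; "\303\244" is a unibyte string holding bytes C3 A4;
    ;; string-to-multibyte turns each byte into a raw-byte character.
    (insert (string-to-multibyte "\303\244"))
    (let ((count-before (- (point-max) (point-min))))
      ;; Decode the raw bytes in place as UTF-8.
      (decode-coding-region (point-min) (point-max) 'utf-8)
      (cons count-before (buffer-string)))))
```

Before decoding the buffer holds two raw-byte characters; afterwards it holds the single character they encode.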

> So I wrote a replacement of base64-decode-region:
> 
> (defun base64-decode-utf8-region (beg end)
>   (interactive "r")
>   (replace-region-contents beg end
>    (lambda ()
>      (decode-coding-string
>       (base64-decode-string
>        (buffer-substring beg end))
>       (or coding-system-for-write 'utf-8)))))
> 
> But the question remains: is it possible to do the same
> in a simpler way without the need to write a new command?

Yes, see above.  In particular, decode-coding-region already knows how
to replace the region with the decoded text.




