guile-devel

Re: [PATCHES] Discard BOMs at stream start for UTF-{8,16,32} encodings


From: Andy Wingo
Subject: Re: [PATCHES] Discard BOMs at stream start for UTF-{8,16,32} encodings
Date: Thu, 31 Jan 2013 10:39:14 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (gnu/linux)

Hi Mark,

On Thu 31 Jan 2013 05:40, Mark H Weaver <address@hidden> writes:

> From ceccaf59267bc98c86aae33809905f26b017ebc8 Mon Sep 17 00:00:00 2001
> From: Mark H Weaver <address@hidden>
> Date: Wed, 30 Jan 2013 10:16:37 -0500
> Subject: [PATCH 1/3] Rewrite get_iconv_codepoint to fix a bug involving
>  byte-order marks.

LGTM, thanks!

> From 65e0cca752e005d75c8eade1c92f084a8518f209 Mon Sep 17 00:00:00 2001
> From: Mark H Weaver <address@hidden>
> Date: Wed, 30 Jan 2013 12:21:20 -0500
> Subject: [PATCH 2/3] Discard UTF-8 byte-order marks at stream start, and
>  improve efficiency.
>
> * libguile/ports.c (SCM_ICONV_UNINITIALIZED, SCM_ICONV_UTF8_AT_START,
>   SCM_ICONV_UTF8_NOT_AT_START, SCM_ICONV_SPECIAL_P, SCM_UNICODE_BOM):
>   New macros.

I would prefer a name other than "special".  Perhaps reverse the sense
of the test and call it SCM_ICONV_DESCRIPTOR_OPEN_P or something.
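
A minimal sketch of what the reversed test might look like, assuming the
new macros are small sentinel values stored in the slot that would
otherwise hold a real conversion descriptor.  Every demo_/DEMO_ name and
the sentinel values below are hypothetical; they are not taken from the
patch or from libguile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for iconv_t, so the sketch needs no iconv.  */
typedef void *demo_iconv_t;

/* Hypothetical sentinels kept in the descriptor slot (assumed values).  */
#define DEMO_ICONV_UNINITIALIZED      ((demo_iconv_t) 0)
#define DEMO_ICONV_UTF8_AT_START      ((demo_iconv_t) 1)
#define DEMO_ICONV_UTF8_NOT_AT_START  ((demo_iconv_t) 2)

/* Positive test: true only when the slot holds a real, open descriptor
   rather than one of the sentinels above.  */
#define DEMO_ICONV_DESCRIPTOR_OPEN_P(cd)  ((uintptr_t) (cd) > 2)

/* Record that the first character of a UTF-8 stream has been read.
   Slots holding a real descriptor are left untouched.  */
static void
demo_mark_started (demo_iconv_t *slot)
{
  assert (slot != NULL);
  if (*slot == DEMO_ICONV_UTF8_AT_START)
    *slot = DEMO_ICONV_UTF8_NOT_AT_START;
}
```

With the test phrased positively, call sites read as "close the
descriptor if one is open" instead of "unless the value is special".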

> @@ -1202,6 +1211,8 @@ get_utf8_codepoint (SCM port, scm_t_wchar *codepoint,
>      }
>    else if ((buf[0] & 0xf0) == 0xe0)
>      {
> +      scm_t_wchar code_pt;
> +
>        /* 3-byte form.  */
>        byte = scm_peek_byte_or_eof (port);
>        ASSERT_NOT_EOF (byte);

Call it "codepoint_or_bom" perhaps; otherwise *codepoint = code_pt is
too confusing.
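
To illustrate the naming, here is a simplified sketch of a 3-byte decode
that holds the value in "codepoint_or_bom" until it knows whether to
discard it.  It works on an in-memory buffer rather than a port, and the
function name, DEMO_UNICODE_BOM, and the at_start/discarded parameters
are hypothetical.  It also skips the overlong and surrogate checks a real
decoder needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_UNICODE_BOM 0xFEFFu   /* U+FEFF, encoded as EF BB BF in UTF-8 */

/* Decode one 3-byte UTF-8 sequence from BUF.  Returns the number of
   bytes consumed (3), or 0 if BUF does not hold a 3-byte sequence.
   If AT_START is nonzero and the value is a BOM, *DISCARDED is set to 1
   and *CODEPOINT is left untouched.  */
static size_t
demo_decode_3byte (const unsigned char *buf, size_t len, int at_start,
                   uint32_t *codepoint, int *discarded)
{
  uint32_t codepoint_or_bom;

  assert (buf != NULL && codepoint != NULL && discarded != NULL);

  if (len < 3
      || (buf[0] & 0xf0) != 0xe0
      || (buf[1] & 0xc0) != 0x80
      || (buf[2] & 0xc0) != 0x80)
    return 0;

  codepoint_or_bom = ((uint32_t) (buf[0] & 0x0f) << 12)
                     | ((uint32_t) (buf[1] & 0x3f) << 6)
                     | (uint32_t) (buf[2] & 0x3f);

  if (at_start && codepoint_or_bom == DEMO_UNICODE_BOM)
    {
      *discarded = 1;             /* BOM at stream start: drop it */
      return 3;
    }

  *discarded = 0;
  *codepoint = codepoint_or_bom;  /* only now is it known to be a codepoint */
  return 3;
}
```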

The patch looks good to me in general, but there is one problem, related
to seeks.  If you seek back to 0, the iconv descriptors are not reset.
Perhaps seeks should flush the iconv descriptors, if any, and reset the
UTF-8 state to "at start".  Thoughts?
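
A rough sketch of the seek behavior I have in mind, using a toy
in-memory port rather than the real port structure.  All names here are
hypothetical.  This version resets the state only when the seek lands on
offset 0, which is one possible reading of the suggestion; resetting on
every seek would be another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_utf8_state { DEMO_UTF8_AT_START, DEMO_UTF8_NOT_AT_START };

/* Toy port: a byte buffer, a read position, and the BOM state.  */
struct demo_port
{
  const unsigned char *data;
  size_t len;
  size_t pos;
  enum demo_utf8_state state;
};

static void
demo_port_init (struct demo_port *p, const unsigned char *data, size_t len)
{
  assert (p != NULL);
  p->data = data;
  p->len = len;
  p->pos = 0;
  p->state = DEMO_UTF8_AT_START;
}

/* Return the next byte, or -1 at end of input.  A UTF-8 BOM is skipped
   only if the port is still in the "at start" state.  */
static int
demo_port_read_byte (struct demo_port *p)
{
  static const unsigned char bom[3] = { 0xef, 0xbb, 0xbf };

  if (p->state == DEMO_UTF8_AT_START)
    {
      if (p->len - p->pos >= 3 && memcmp (p->data + p->pos, bom, 3) == 0)
        p->pos += 3;
      p->state = DEMO_UTF8_NOT_AT_START;
    }
  if (p->pos >= p->len)
    return -1;
  return p->data[p->pos++];
}

/* Seek to OFFSET (clamped to the buffer length).  Seeking back to 0
   puts the port back in the "at start" state so a BOM is discarded
   again on the next read.  */
static void
demo_port_seek (struct demo_port *p, size_t offset)
{
  p->pos = offset > p->len ? p->len : offset;
  p->state = (p->pos == 0) ? DEMO_UTF8_AT_START : DEMO_UTF8_NOT_AT_START;
}
```

Without the reset in demo_port_seek, a read after seeking to 0 would
return the first BOM byte (0xef) instead of the first payload byte.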

> From 1e4dde890b0a9a80b26f78d5c82b8ffac9e47689 Mon Sep 17 00:00:00 2001
> From: Mark H Weaver <address@hidden>
> Date: Wed, 30 Jan 2013 14:22:00 -0500
> Subject: [PATCH 3/3] Remove byte-order mark check from
>  scm_i_scan_for_encoding.

Looks good to me.

Thanks for working on this!

Andy
-- 
http://wingolog.org/
