Re: Parse inline scheme using per-expression port (issue 557330043 by ad
From: hanwenn
Subject: Re: Parse inline scheme using per-expression port (issue 557330043 by address@hidden)
Date: Fri, 07 Feb 2020 07:23:10 -0800
Reviewers: dak,
Message:
On 2020/02/07 15:14:52, dak wrote:
> Ok, that should probably have been stressed somewhat stronger before: have you
> checked the preexisting Guilev2 work in branches that Harm pointed you to?
No, I didn't.
> Date: Sun Sep 21 18:40:06 2014 +0200
Is there more six-year-old code that is still relevant but has not been
integrated into mainline?
> Source_file::init_port: Keep GUILEv2 from redecoding string input
>
> diff --git a/lily/source-file.cc b/lily/source-file.cc
> index 5a94927a7f..5ad9c4c6e8 100644
> --- a/lily/source-file.cc
> +++ b/lily/source-file.cc
> @@ -152,7 +152,11 @@ Source_file::init_port ()
> // we do our own utf8 encoding and verification in the parser, so we
> // use the no-conversion equivalent of latin1
> SCM str = scm_from_latin1_string (c_str ());
> - str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
> + scm_dynwind_begin ((scm_t_dynwind_flags)0);
> + // Why doesn't scm_set_port_encoding_x work here?
> + scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
> + str_port_ = scm_open_input_string (str);
> + scm_dynwind_end ();
> scm_set_port_filename_x (str_port_, ly_string2scm (name_));
> }
>
> Now it is a valid question why this isn't, GUILEV2-guarded, already in the main
> code base. Maybe we should integrate all of the already existing Guilev2 work
> into master with priority in order to avoid duplicate work.
Yes, that would be a great idea. Is there any other pending work that you
know of?
I think the patch you quote here is doing the wrong thing, though. If
you have Scheme code with UTF-8 encoded data embedded in it, that data
will be decoded as Latin-1, one character per byte.
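To illustrate the concern, here is a minimal standalone sketch. It is not LilyPond or GUILE code, and the function names are made up for this example. It compares how many characters a byte-per-character (Latin-1) decoder and a UTF-8 decoder see in the same bytes, and shows what the Latin-1 reading looks like once printed back as UTF-8. It assumes the input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: a Latin-1 decoder maps every byte to exactly one
// character, so the character count equals the byte count.
std::size_t count_chars_latin1 (const std::string &bytes)
{
  return bytes.size ();
}

// Hypothetical helper: a UTF-8 decoder sees one character per byte that is
// not a continuation byte (bit pattern 10xxxxxx). Assumes valid UTF-8 input.
std::size_t count_chars_utf8 (const std::string &bytes)
{
  std::size_t n = 0;
  for (unsigned char c : bytes)
    if ((c & 0xC0) != 0x80)
      ++n;
  return n;
}

// Hypothetical helper: decode the bytes as Latin-1, then re-encode the
// resulting code points (all below U+0100) as UTF-8. This is what the
// mis-decoded string looks like when it is written out again.
std::string latin1_decode_to_utf8 (const std::string &bytes)
{
  std::string out;
  for (unsigned char c : bytes)
    {
      if (c < 0x80)
        out += static_cast<char> (c);
      else
        {
          out += static_cast<char> (0xC0 | (c >> 6));
          out += static_cast<char> (0x80 | (c & 0x3F));
        }
    }
  return out;
}
```

For the two bytes 0xC3 0xA9 (the UTF-8 encoding of U+00E9), the Latin-1 reading yields two characters, U+00C3 and U+00A9, where the UTF-8 reading yields one.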
Description:
Parse inline scheme using per-expression port

Introduces a throw-away string port, Overlay_string_port, which makes
a port out of an arbitrary section of a C string. It does not own the
string, so there is no overhead in creating and discarding the ports
at parse time.

For GUILE v2, Overlay_string_port uses UTF-8 encoding, so UTF-8
encoded string constants within Scheme expressions are interpreted
as Unicode correctly.

This obviates the string port that Source_file carries along.

This commit fixes a problem with GUILE 2.2.6, where LilyPond
calculates offsets in the source file in bytes, while GUILE interprets
the source file as UTF-8 encoded Unicode. As a result, files with
non-ASCII characters before embedded Scheme break completely.
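
The offset mismatch can be sketched as follows. This is an illustration only, not code from the patch, and both helper names are hypothetical. One helper converts a byte offset in a UTF-8 string to a character offset; the other goes the opposite way, which shows where a reader that counts characters ends up when it is handed a byte offset. Both assume valid UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
static bool is_continuation (char c)
{
  return (static_cast<unsigned char> (c) & 0xC0) == 0x80;
}

// Hypothetical helper: number of characters that start before byte_offset.
std::size_t byte_to_char_offset (const std::string &utf8,
                                 std::size_t byte_offset)
{
  std::size_t chars = 0;
  for (std::size_t i = 0; i < byte_offset && i < utf8.size (); ++i)
    if (!is_continuation (utf8[i]))
      ++chars;
  return chars;
}

// Hypothetical helper: byte index reached after skipping char_offset
// characters from the start of the string.
std::size_t char_to_byte_offset (const std::string &utf8,
                                 std::size_t char_offset)
{
  std::size_t i = 0;
  while (i < utf8.size () && char_offset > 0)
    {
      ++i; // step past the lead byte of the current character
      while (i < utf8.size () && is_continuation (utf8[i]))
        ++i; // and past its continuation bytes
      --char_offset;
    }
  return i;
}
```

Take a source consisting of the bytes 0xC3 0xA9, a space, and then `#(f)`. The `#` sits at byte offset 3 but at character offset 2. A reader that treats 3 as a character offset skips one character too many and starts at `(` instead of `#`.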
Please review this at https://codereview.appspot.com/557330043/
Affected files (+206, -78 lines):
M lily/include/lily-parser.hh
A lily/include/overlay-string-port.hh
M lily/include/source-file.hh
A + lily/overlay-string-port.cc
M lily/parse-scm.cc
M lily/source-file.cc