From: Stefan Monnier
Subject: Re: Questionable code in handling of wordend in the regexp engine in regex-emacs.c
Date: Fri, 01 Mar 2019 08:41:31 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
> down", other times it "rounds it up" to a character position. I think
> it should be defined as rounding it down. It would be a relatively
> simple correction (at least, technically ;-).
When moving forward, rounding it up is more natural ;-)
> But I'm still a little worried about buf_bytepos_to_charpos. Perhaps it
> should state that the result is undefined when the bytepos is "invalid".
Yes, I think that's the intention. Even better would be to signal an
error (when built with --enable-checking).
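To make "rounding down", "rounding up" and "invalid bytepos" concrete,
here is a minimal sketch. It is not taken from the Emacs sources: the
function names are hypothetical, and it assumes a UTF-8-like encoding
in which continuation bytes have the form 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if BYTE can start a character, i.e. it
   is not a 10xxxxxx continuation byte (UTF-8-like encoding assumed).  */
static int
char_head_p (unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Nonzero if BYTEPOS falls on a character boundary of TEXT (LEN bytes).  */
static int
bytepos_valid_p (const unsigned char *text, ptrdiff_t len, ptrdiff_t bytepos)
{
  return bytepos >= 0 && bytepos <= len
         && (bytepos == len || char_head_p (text[bytepos]));
}

/* "Round down": move back to the start of the character containing BYTEPOS.  */
static ptrdiff_t
round_down (const unsigned char *text, ptrdiff_t bytepos)
{
  while (bytepos > 0 && !char_head_p (text[bytepos]))
    bytepos--;
  return bytepos;
}

/* "Round up": move forward to the start of the next character.  */
static ptrdiff_t
round_up (const unsigned char *text, ptrdiff_t len, ptrdiff_t bytepos)
{
  while (bytepos < len && !char_head_p (text[bytepos]))
    bytepos++;
  return bytepos;
}
```

A checking build could then do something like
`assert (bytepos_valid_p (text, len, bytepos))` before converting,
rather than silently rounding in either direction.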
> For that matter, how many charpos <-> bytepos functions are there in
> Emacs? Just this one?
I think so, yes.
>> Worse, in notwordbound we do:
>
>> ptrdiff_t offset = PTR_TO_OFFSET (d - 1);
>> ptrdiff_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
>> UPDATE_SYNTAX_TABLE (charpos);
>
>> which seems even more broken because `d` might point to the first byte
>> after the gap, so `d - 1` will point in the middle of the gap, so it's
>> simply an invalid argument to PTR_TO_OFFSET.
>
> I don't think this is right. Both `d' and `offset' are byte
> measurements, not character measurements, so it shouldn't matter whether
> the "- 1" is inside or outside the parens. However, it would be less
> confusing if they were both (?all) the same.
The difference between `d` and `offset` is just an offset, indeed, but
it can be 2 different offsets depending on whether `d` is before or
after the gap, so what happens when `d` is within the gap depends on how
the test for "before/after the gap" is implemented.
More specifically, when `d` is N bytes before the end of the gap, the
code could consider it as being N bytes before the beginning of the
second part, or being "gap-size - N" bytes after the end of the
first part.
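A toy illustration of the two interpretations, assuming the two text
parts and the gap sit contiguously in one array. The struct and both
functions are hypothetical and are not the actual definitions of
PTR_TO_OFFSET or POINTER_TO_OFFSET; they only show how the choice of
"before/after the gap" test changes the result for a pointer inside
the gap.

```c
#include <assert.h>
#include <stddef.h>

/* Toy gap buffer layout: [first part][gap][second part], all stored in
   one array starting at BASE.  All names here are hypothetical.  */
struct toy_buffer
{
  const char *base;     /* start of the storage */
  ptrdiff_t size1;      /* bytes of text before the gap */
  ptrdiff_t gap_size;   /* bytes in the gap */
  ptrdiff_t size2;      /* bytes of text after the gap */
};

/* Variant A: every pointer before the END of the gap is treated as
   relative to the first part.  */
static ptrdiff_t
offset_a (const struct toy_buffer *b, const char *p)
{
  const char *gap_end = b->base + b->size1 + b->gap_size;
  return p < gap_end ? p - b->base : p - b->base - b->gap_size;
}

/* Variant B: every pointer at or after the START of the gap is treated
   as relative to the second part.  */
static ptrdiff_t
offset_b (const struct toy_buffer *b, const char *p)
{
  const char *gap_start = b->base + b->size1;
  return p < gap_start ? p - b->base : p - b->base - b->gap_size;
}
```

With size1 = 10 and gap_size = 100, a pointer `d` at the first byte
after the gap maps to offset 10 under both variants. For `d - 1`
(N = 1), variant B yields 9, i.e. N bytes before the beginning of the
second part, while variant A yields 109, i.e. "gap-size - N" bytes after
the end of the first part.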
>> According to the definition of PTR_TO_OFFSET and POINTER_TO_OFFSET,
>> the result may be the same as if we did the decrement after the fact,
>> but it still looks fishy. WDYT?
>
> I think it is suboptimal to have both PTR_TO_OFFSET and
> POINTER_TO_OFFSET meaning different things in the same source file. ;-)
I'm so glad you're volunteering to clean this up.
Thank you, really.
> There are eight occurrences of SYNTAX_TABLE_BYTE_TO_CHAR in
> regex-emacs.c. I think I will check them all, amending them as in your
> patch.
> What do you say?
Thanks,
Stefan