bug-gnu-emacs
bug#45660: 28.0.50; Changed word/whitespace syntax


From: Eli Zaretskii
Subject: bug#45660: 28.0.50; Changed word/whitespace syntax
Date: Tue, 05 Jan 2021 20:45:13 +0200

> From: Juri Linkov <juri@linkov.net>
> Cc: 45660@debbugs.gnu.org
> Date: Tue, 05 Jan 2021 20:20:44 +0200
> 
> > Previously, many characters, including u+202F, had the punctuation
> > ('.') syntax.  I modified that to be closer to the Unicode
> > Character Database (UCD), and u+202F is not a punctuation character
> > according to the UCD.  It has the Zs general category, which means
> > "space separator", the same as SPC, NBSP, EN SPACE, and others.
> 
> So according to the Unicode standard it should have whitespace syntax?

Unicode doesn't have the concept of "syntax", it's our invention.  For
some syntactic categories, it makes sense to follow the corresponding
Unicode general category.  Two examples are "punctuation" and
"symbols".

The question whether to treat Zs as whitespace syntax is on the
table.  We previously treated many such characters as
"punctuation", which doesn't seem right to me.  That is why I removed
them from the "punctuation" syntax, and you got bitten by the result
(because the default syntax is "word-constituent").
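
For reference, the general categories mentioned above can be checked
outside Emacs.  The following is only a sketch using Python's standard
unicodedata module, not anything Emacs itself does; the helper name
general_category and the list of code points are chosen here for
illustration.

```python
import unicodedata

# Characters named in this thread: SPC, NBSP, EN SPACE, and U+202F
# (NARROW NO-BREAK SPACE).
CODEPOINTS = [0x0020, 0x00A0, 0x2002, 0x202F]


def general_category(codepoint):
    """Return the Unicode general category (e.g. "Zs") of a code point."""
    return unicodedata.category(chr(codepoint))


for cp in CODEPOINTS:
    # Print the code point, its Unicode name, and its general category.
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {general_category(cp)}")
```

All four should report the same "space separator" category, which is
the basis for giving them the same syntax.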

> Should the word characters separated by NO-BREAK SPACE be treated as one word?

That's a good question.  Do we currently treat them as such?  I don't
think so, because NBSP has the '.' syntax, i.e. "punctuation".

> If there is no reason to treat space characters as part of words, then all
> characters with the Zs general category could have the same whitespace syntax.

I tend to agree.  If no objections or new issues arise, I will do that
in a couple of days.
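
To get a feel for how many characters such a change would cover, here
is a rough sketch that lists every code point whose general category
is Zs.  It relies on the Unicode data bundled with the Python
interpreter, so the exact set may differ from the UCD version Emacs
uses; the function name space_separators is made up for this example.

```python
import sys
import unicodedata


def space_separators():
    """Return all code points whose general category is Zs."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
    ]


for cp in space_separators():
    # unicodedata.name accepts a default for unnamed characters.
    print(f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}")
```

Note that control characters such as TAB and newline are not in this
set (their category is Cc), so they would need to keep their syntax
from elsewhere.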

Thanks.
