bug-gnu-emacs

bug#39659: 27.0.60; inappropriate han script definition in char-script-table


From: ynyaaa
Subject: bug#39659: 27.0.60; inappropriate han script definition in char-script-table
Date: Wed, 19 Feb 2020 18:53:07 +0900

Eli Zaretskii <eliz@gnu.org> writes:

>> From: ynyaaa@gmail.com
>> Date: Tue, 18 Feb 2020 22:50:57 +0900
>> 
>> 'han' script is defined in char-script-table as:
>>      2E80-2FDF       han
>>      3200-9FFF       han
>>      F900-FAFF       han
>>      FE30-FE4F       han
>>      1F200-1F2FF     han
>>      20000-2A6DF     han
>>      2A700-2EBEF     han
>>      2F800-2FA1F     han
>> 
>> It is better to set values as:
>>      3200-33FF       cjk-misc
>>      4DC0-4DFF       cjk-misc
>>      FE30-FE4F       cjk-misc
>>      1F200-1F2FF     cjk-misc
>> 
>> If enclosed CJK Ideographs should be 'han' script,
>> enclosed Hanguls should be 'hangul' script,
>> enclosed Katakana should be 'kana' script,
>> and enclosed Numbers should be 'symbol' script.
>
> Please provide some rationale for the differences, just saying
> "better" and "should" doesn't explain why you think the changes are
> for the good.
>
> CC'ing Handa-san, who I hope will have some comments on this.
>
> Thanks.

Because they are not Han characters.
I think that composite characters (enclosed or squared forms built from
other characters) are not Han characters; they are symbols.
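For illustration, here is a minimal Python sketch of the lookup being discussed. The current `han` ranges and the proposed `cjk-misc` ranges are copied from the quoted report. A proposed range takes precedence over an older one, much as a later `set-char-table-range` call overwrites earlier values. Modelling char-script-table as a sorted list of intervals is an assumption made for the sketch; Emacs itself stores it as a char-table.

```python
from bisect import bisect_right

# Current 'han' entries in char-script-table, as quoted in the report.
CURRENT_HAN = [
    (0x2E80, 0x2FDF, "han"),
    (0x3200, 0x9FFF, "han"),
    (0xF900, 0xFAFF, "han"),
    (0xFE30, 0xFE4F, "han"),
    (0x1F200, 0x1F2FF, "han"),
    (0x20000, 0x2A6DF, "han"),
    (0x2A700, 0x2EBEF, "han"),
    (0x2F800, 0x2FA1F, "han"),
]

# Proposed overrides from the same report.
PROPOSED_OVERRIDES = [
    (0x3200, 0x33FF, "cjk-misc"),
    (0x4DC0, 0x4DFF, "cjk-misc"),
    (0xFE30, 0xFE4F, "cjk-misc"),
    (0x1F200, 0x1F2FF, "cjk-misc"),
]


def lookup(table, cp):
    """Script for cp in a sorted, non-overlapping range table, or None."""
    starts = [start for start, _, _ in table]
    i = bisect_right(starts, cp) - 1
    if i >= 0:
        start, end, script = table[i]
        if start <= cp <= end:
            return script
    return None


def script_of(cp, proposed=False):
    # With the proposal applied, the overriding ranges are consulted first.
    if proposed:
        script = lookup(PROPOSED_OVERRIDES, cp)
        if script is not None:
            return script
    return lookup(CURRENT_HAN, cp)


for cp in (0x3220, 0x4DC0, 0x4E00, 0xFE30, 0x1F210):
    print(f"U+{cp:04X}", script_of(cp), script_of(cp, proposed=True))
```

Only the listed blocks move to `cjk-misc`; ordinary ideographs such as U+4E00 remain `han`.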

For comparison, enclosed Latin letters are already treated as 'symbol' script:
        249C-24B5       PARENTHESIZED LATIN SMALL LETTER *
        24B6-24CF       CIRCLED LATIN CAPITAL LETTER *
        24D0-24E9       CIRCLED LATIN SMALL LETTER *
        1F110-1F129     PARENTHESIZED LATIN CAPITAL LETTER *
        1F130-1F149     SQUARED LATIN CAPITAL LETTER *
        1F150-1F169     NEGATIVE CIRCLED LATIN CAPITAL LETTER *
        1F170-1F189     NEGATIVE SQUARED LATIN CAPITAL LETTER *
        1F12A           TORTOISE SHELL BRACKETED LATIN CAPITAL LETTER S
        1F12B           CIRCLED ITALIC LATIN CAPITAL LETTER C
        1F12C           CIRCLED ITALIC LATIN CAPITAL LETTER R
        1F18A           CROSSED NEGATIVE SQUARED LATIN CAPITAL LETTER P
        1F1A5           SQUARED LATIN SMALL LETTER D
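The ranges above can be cross-checked against the Unicode character names. The following is a small Python sketch, assuming a Python whose `unicodedata` database is Unicode 9.0 or later (required for U+1F1A5). The helper name `name_mismatches` is hypothetical.

```python
import unicodedata

# Enclosed Latin ranges quoted above, all treated as 'symbol' by Emacs.
# Each range covers the 26 letters A..Z; Unicode names spell them in capitals.
LETTER_RANGES = [
    (0x249C, 0x24B5, "PARENTHESIZED LATIN SMALL LETTER"),
    (0x24B6, 0x24CF, "CIRCLED LATIN CAPITAL LETTER"),
    (0x24D0, 0x24E9, "CIRCLED LATIN SMALL LETTER"),
    (0x1F110, 0x1F129, "PARENTHESIZED LATIN CAPITAL LETTER"),
    (0x1F130, 0x1F149, "SQUARED LATIN CAPITAL LETTER"),
    (0x1F150, 0x1F169, "NEGATIVE CIRCLED LATIN CAPITAL LETTER"),
    (0x1F170, 0x1F189, "NEGATIVE SQUARED LATIN CAPITAL LETTER"),
]

# Single code points listed separately in the message.
SINGLE_LETTERS = {
    0x1F12A: "TORTOISE SHELL BRACKETED LATIN CAPITAL LETTER S",
    0x1F12B: "CIRCLED ITALIC LATIN CAPITAL LETTER C",
    0x1F12C: "CIRCLED ITALIC LATIN CAPITAL LETTER R",
    0x1F18A: "CROSSED NEGATIVE SQUARED LATIN CAPITAL LETTER P",
    0x1F1A5: "SQUARED LATIN SMALL LETTER D",
}


def name_mismatches():
    """Return the code points whose Unicode name differs from the list."""
    bad = []
    for first, last, prefix in LETTER_RANGES:
        for offset, cp in enumerate(range(first, last + 1)):
            expected = f"{prefix} {chr(ord('A') + offset)}"
            if unicodedata.name(chr(cp), "") != expected:
                bad.append(cp)
    for cp, expected in SINGLE_LETTERS.items():
        if unicodedata.name(chr(cp), "") != expected:
            bad.append(cp)
    return bad


print([hex(cp) for cp in name_mismatches()])
```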

If instead each composite character were given the script han, hangul or kana
according to the characters it contains, the script values would be as follows:

Code point      Script  Comment
3200-321E       hangul  enclosed hangul
321F            -       unassigned
3220-3247       han     enclosed han
3248-324F       symbol  enclosed number
3250            symbol  combined latin
3251-325F       symbol  enclosed number
3260-327E       hangul  enclosed hangul
327F            symbol  symbol
3280-32B0       han     enclosed han
32B1-32BF       symbol  enclosed number
32C0-32CB       han     square character with han
32CC-32CF       symbol  square character with latin
32D0-32FE       kana    enclosed kana
32FF            han     square character with han
3300-3357       kana    square character with kana
3358-3370       han     square character with han
3371-337A       symbol  square character with latin
337B-337F       han     square character with han
3380-33DF       symbol  square character with latin
33E0-33FE       han     square character with han
33FF            symbol  square character with latin

4DC0-4DFF       symbol  symbol

FE30-FE44       symbol  symbol for vertical
FE45-FE46       symbol  symbol
FE47-FE48       symbol  symbol for vertical
FE49-FE4F       symbol  symbol

1F200-1F202     kana    enclosed/square character with kana
...             -       unassigned
1F210-1F212     han     enclosed han
1F213           kana    enclosed kana
1F214-1F248     han     enclosed han
...             -       unassigned
1F250-1F251     han     enclosed han
...             -       unassigned
1F260-1F265     symbol  symbol
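The rule behind this table ("use the script of what the character contains, otherwise symbol") can be approximated mechanically. The minimal Python sketch below strips each enclosure with NFKD compatibility decomposition and then classifies the remaining components. It rests on several assumptions. The block ranges in `SCRIPT_RANGES` are a simplified, hypothetical subset. The han > kana > hangul tie-break is also an assumption. Neither reflects how Emacs actually builds char-script-table. For the sample code points below, the sketch reproduces the table.

```python
import unicodedata

# Simplified, hypothetical block ranges used to classify decomposed
# components; these are NOT Emacs's own char-script-table data.
SCRIPT_RANGES = [
    ("han", [(0x3400, 0x4DBF), (0x4E00, 0x9FFF), (0xF900, 0xFAFF),
             (0x20000, 0x2A6DF), (0x2A700, 0x2EBEF), (0x2F800, 0x2FA1F)]),
    ("kana", [(0x3040, 0x309F), (0x30A0, 0x30FF), (0x31F0, 0x31FF)]),
    ("hangul", [(0x1100, 0x11FF), (0x3130, 0x318F), (0xAC00, 0xD7AF)]),
]

# Assumed tie-break if a character contains more than one of these scripts.
PRIORITY = ("han", "kana", "hangul")


def base_script(ch):
    """Return 'han', 'kana' or 'hangul' if ch lies in one of the ranges, else None."""
    cp = ord(ch)
    for script, ranges in SCRIPT_RANGES:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return script
    return None


def composite_script(ch):
    """Script of an enclosed/squared character, derived from what it contains."""
    # NFKD removes the enclosure, e.g. U+3220 -> "(一)", U+32C0 -> "1月".
    # Characters without a decomposition (U+3248, U+4DC0, ...) stay unchanged.
    components = unicodedata.normalize("NFKD", ch)
    found = {base_script(c) for c in components}
    for script in PRIORITY:
        if script in found:
            return script
    return "symbol"


for cp in (0x3200, 0x3220, 0x3248, 0x32D0, 0x3380, 0x4DC0, 0x1F213):
    print(f"U+{cp:04X}", composite_script(chr(cp)))
```

Code points with no decomposition, such as the hexagrams at 4DC0-4DFF or U+327F, fall through to `symbol`. This matches the table's treatment of them.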




