bug#39659: 27.0.60; inappropriate han script definition in char-script-t
From: ynyaaa
Subject: bug#39659: 27.0.60; inappropriate han script definition in char-script-table
Date: Wed, 19 Feb 2020 18:53:07 +0900
Eli Zaretskii <eliz@gnu.org> writes:
>> From: ynyaaa@gmail.com
>> Date: Tue, 18 Feb 2020 22:50:57 +0900
>>
>> 'han' script is defined in char-script-table as:
>> 2E80-2FDF han
>> 3200-9FFF han
>> F900-FAFF han
>> FE30-FE4F han
>> 1F200-1F2FF han
>> 20000-2A6DF han
>> 2A700-2EBEF han
>> 2F800-2FA1F han
>>
>> It is better to set values as:
>> 3200-33FF cjk-misc
>> 4DC0-4DFF cjk-misc
>> FE30-FE4F cjk-misc
>> 1F200-1F2FF cjk-misc
>>
>> If enclosed CJK Ideographs should be 'han' script,
>> enclosed Hanguls should be 'hangul' script,
>> enclosed Katakana should be 'kana' script,
>> and enclosed Numbers should be 'symbol' script.
>
> Please provide some rationale for the differences, just saying
> "better" and "should" doesn't explain why you think the changes are
> for the good.
>
> CC'ing Handa-san, who I hope will have some comments on this.
>
> Thanks.
Because they are not han characters.
I think that combinatorial characters (enclosed or squared forms built
from other characters) are symbols rather than han characters.
Enclosed Latin letters, for example, are already treated as 'symbol' script:
249C-24B5 PARENTHESIZED LATIN SMALL LETTER *
24B6-24CF CIRCLED LATIN CAPITAL LETTER *
24D0-24E9 CIRCLED LATIN SMALL LETTER *
1F110-1F129 PARENTHESIZED LATIN CAPITAL LETTER *
1F130-1F149 SQUARED LATIN CAPITAL LETTER *
1F150-1F169 NEGATIVE CIRCLED LATIN CAPITAL LETTER *
1F170-1F189 NEGATIVE SQUARED LATIN CAPITAL LETTER *
1F12A TORTOISE SHELL BRACKETED LATIN CAPITAL LETTER S
1F12B CIRCLED ITALIC LATIN CAPITAL LETTER C
1F12C CIRCLED ITALIC LATIN CAPITAL LETTER R
1F18A CROSSED NEGATIVE SQUARED LATIN CAPITAL LETTER P
1F1A5 SQUARED LATIN SMALL LETTER D
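
The names in the list above can be checked against the Unicode character database. The following is a minimal Python sketch, used only for illustration and not part of Emacs. It assumes the interpreter's `unicodedata` module covers these code points, and the helper name `first_names` is hypothetical.

```python
import unicodedata

# First code point of each enclosed-Latin range listed above.
RANGE_STARTS = [0x249C, 0x24B6, 0x24D0, 0x1F110, 0x1F130, 0x1F150, 0x1F170]

def first_names(starts):
    """Return the Unicode name of the first character in each range."""
    return [unicodedata.name(chr(cp)) for cp in starts]

# Print each range start next to its character name.
for cp, name in zip(RANGE_STARTS, first_names(RANGE_STARTS)):
    print(f"{cp:04X} {name}")
```

Every name printed this way contains the word LATIN, yet none of these characters is given the 'latin' script.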
If the script of each combinatorial character were instead set to han,
hangul or kana according to the characters it contains, the script values
would be as follows:
CodePoint    Script  Comment
3200-321E    hangul  enclosed hangul
321F         -       unassigned
3220-3247    han     enclosed han
3248-324F    symbol  enclosed number
3250         symbol  combined latin
3251-325F    symbol  enclosed number
3260-327E    hangul  enclosed hangul
327F         symbol  symbol
3280-32B0    han     enclosed han
32B1-32BF    symbol  enclosed number
32C0-32CB    han     square character with han
32CC-32CF    symbol  square character with latin
32D0-32FE    kana    enclosed kana
32FF         han     square character with han
3300-3357    kana    square character with kana
3358-3370    han     square character with han
3371-337A    symbol  square character with latin
337B-337F    han     square character with han
3380-33DF    symbol  square character with latin
33E0-33FE    han     square character with han
33FF         symbol  square character with latin
4DC0-4DFF    symbol  symbol
FE30-FE44    symbol  symbol for vertical
FE45-FE46    symbol  symbol
FE47-FE48    symbol  symbol for vertical
FE49-FE4F    symbol  symbol
1F200-1F202  kana    enclosed/square character with kana
...          -       unassigned
1F210-1F212  han     enclosed han
1F213        kana    enclosed kana
1F214-1F248  han     enclosed han
...          -       unassigned
1F250-1F251  han     enclosed han
...          -       unassigned
1F260-1F265  symbol  symbol
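
A per-range assignment like the table above amounts to a sorted range lookup. The following is a minimal Python sketch of that idea, for illustration only. It is not how Emacs implements char-script-table; it uses only a few rows copied from the table, and the names `ROWS`, `STARTS` and `script_of` are hypothetical.

```python
from bisect import bisect_right

# A few rows copied from the table above: (first, last, script).
# Rows must be sorted by first code point and must not overlap.
ROWS = [
    (0x3200, 0x321E, "hangul"),
    (0x3220, 0x3247, "han"),
    (0x3248, 0x324F, "symbol"),
    (0x32D0, 0x32FE, "kana"),
    (0x4DC0, 0x4DFF, "symbol"),
    (0x1F210, 0x1F212, "han"),
    (0x1F213, 0x1F213, "kana"),
]
STARTS = [first for first, _, _ in ROWS]

def script_of(cp):
    """Return the script for code point cp, or None if no row covers it."""
    # Find the last row whose first code point is <= cp.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0 and cp <= ROWS[i][1]:
        return ROWS[i][2]
    return None
```

With these rows, `script_of(0x3200)` gives "hangul", while the unassigned code point 321F falls between two rows and gives `None`.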