bug-gnu-emacs

bug#50247: 27.2; wrong `word-wrap' for Chinese characters


From: ClaudeMonet
Subject: bug#50247: 27.2; wrong `word-wrap' for Chinese characters
Date: Sun, 29 Aug 2021 11:14:40 +0800


When word wrap is enabled with `toggle-word-wrap', lines that end with
Chinese characters and Chinese punctuation are not broken in the right
place. Typically, all the Chinese words in a sentence are run together
and treated by Emacs as one single WORD.

For example, "世界" is a word in Chinese, and "世界人民大团结万岁。" is
a complete sentence ending with a full-width period. Emacs treats the
whole sentence as a single word, and so wraps the line in the wrong
place.
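
To illustrate the difference, here is a minimal Python sketch (not Emacs
code). It assumes that the current wrapping only treats spaces as break
points, which is what the behaviour above suggests, and compares that
with a wrap that also allows a break between any two Chinese characters
while keeping closing punctuation attached to the preceding character.
The function names, the punctuation list, and the use of len() as the
display width (ignoring that Chinese characters are double-width) are
all assumptions made for this example.

```python
# Hypothetical list of full-width closing punctuation that should never
# start a line; a real implementation would need a fuller table.
CLOSING_PUNCTUATION = "。，、；：！？）」』”’"


def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def wrap_whitespace_only(text, width):
    """Greedy wrap that may break only at spaces."""
    lines, current = [], ""
    for token in text.split(" "):
        candidate = token if not current else current + " " + token
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = token
    if current:
        lines.append(current)
    return lines


def split_units(text):
    """Split text into unbreakable units; a break is allowed between units."""
    units = []
    for ch in text:
        if not units:
            units.append(ch)
        elif ch == " " or ch in CLOSING_PUNCTUATION:
            # Spaces and closing punctuation stay with the preceding unit.
            units[-1] += ch
        else:
            prev = units[-1][-1]
            if is_cjk(ch) or is_cjk(prev) or prev in CLOSING_PUNCTUATION or prev == " ":
                units.append(ch)  # a break is allowed before this character
            else:
                units[-1] += ch   # still inside a space-delimited word
    return units


def wrap_cjk_aware(text, width):
    """Greedy wrap that may also break between Chinese characters."""
    lines, current = [], ""
    for unit in split_units(text):
        if current and len((current + unit).rstrip()) > width:
            lines.append(current.rstrip())
            current = unit
        else:
            current += unit
    if current.rstrip():
        lines.append(current.rstrip())
    return lines


sentence = "世界人民大团结万岁。"
print(wrap_whitespace_only(sentence, 4))
print(wrap_cjk_aware(sentence, 4))
```

With a width of 4, the first call returns the whole sentence as one
over-long line, while the second returns "世界人民", "大团结万" and
"岁。", so the period is never pushed to the start of a line.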

By the way, I think this has long been a problem for Chinese users,
since we use full-width punctuation, whereas in English half-width
punctuation is the norm. Another point concerns the `forward-word' key
binding: English words are separated either by punctuation or by blank
characters (<space>, <tab>, etc.), but in Chinese the words in a
sentence are usually not separated by anything. I don't know what the
normal practice for word recognition is on modern operating systems
such as macOS and Windows; I guess there is a dictionary mechanism.
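
As a rough idea of what such a dictionary mechanism could look like,
here is a minimal Python sketch of forward maximum matching: at each
position it takes the longest dictionary entry that matches, and falls
back to a single character when nothing matches. This is only one
simple technique; I am not claiming it is what macOS, Windows, or any
particular tokenizer actually does. The function name, the toy
dictionary, and the max_word_len parameter are made up for this
example.

```python
# Toy dictionary, purely for illustration; a real one would hold many
# thousands of entries.
TOY_DICTIONARY = {"世界", "人民", "大团结", "万岁"}


def segment(sentence, dictionary, max_word_len=4):
    """Split a sentence into words by forward maximum matching."""
    words, i = [], 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink it one character
        # at a time until it is found in the dictionary.
        for length in range(min(max_word_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + length]
            # A single character is always accepted, so the loop advances
            # even on punctuation or unknown characters.
            if length == 1 or piece in dictionary:
                words.append(piece)
                i += length
                break
    return words


print(segment("世界人民大团结万岁。", TOY_DICTIONARY))
```

For the example sentence this yields "世界", "人民", "大团结", "万岁"
and "。", which are the boundaries where `forward-word' would be
expected to stop.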

A footnote: for tokenizing Chinese words, there is a Python tokenizer
called "jieba" in the NLP field, which would be a good reference if you
are going to address this issue. The GitHub link for "jieba" is:

        https://github.com/fxsjy/jieba

Thanks!


In GNU Emacs 27.2 (build 1, x86_64-apple-darwin18.7.0, NS appkit-1671.60 
Version 10.14.6 (Build 18G95))
of 2021-03-28 built on builder10-14.porkrind.org
Windowing system distributor 'Apple', version 10.3.2022
System Description:  macOS 11.5.2

Recent messages:
Wrote /Users/claude/.emacs.d/lisp/init-preload-local.el
Quit
Type "q" in help window to delete it.
C-c C-o is undefined
uncompressing simple.el.gz...done
Mark set
find-function-C-source: The C source file buffer.c is not available
Quit [2 times]

Mark set

Configured using:
'configure --with-ns '--enable-locallisppath=/Library/Application
Support/Emacs/${version}/site-lisp:/Library/Application
Support/Emacs/site-lisp' --with-modules'

Configured features:
NOTIFY KQUEUE ACL GNUTLS LIBXML2 ZLIB TOOLKIT_SCROLL_BARS NS MODULES
THREADS JSON PDUMPER GMP

Important settings:
  value of $LANG: en_CN.UTF-8
  locale-coding-system: utf-8

Major mode: Org

Minor modes in effect:
  default-text-scale-mode: t
  recentf-mode: t
  vertico-mode: t
  marginalia-mode: t
  company-quickhelp-mode: t
  company-quickhelp-local-mode: t
  winner-mode: t
  flycheck-color-mode-line-mode: t
  global-flycheck-mode: t
  flycheck-mode: t
  dimmer-mode: t
  global-anzu-mode: t
  anzu-mode: t
  global-company-mode: t
  company-mode: t
  diredfl-global-mode: t
  shell-dirtrack-mode: t
  savehist-mode: t
  electric-pair-mode: t
  delete-selection-mode: t
  global-auto-revert-mode: t
  global-so-long-mode: t
  mode-line-bell-mode: t
  beacon-mode: t
  show-paren-mode: t
  global-page-break-lines-mode: t
  page-break-lines-mode: t
  whole-line-or-region-global-mode: t
  whole-line-or-region-local-mode: t
  hes-mode: t
  which-key-mode: t
  global-whitespace-cleanup-mode: t
  whitespace-cleanup-mode: t
  global-diff-hl-mode: t
  diff-hl-mode: t
  projectile-rails-global-mode: t
  projectile-mode: t
  ipretty-mode: t
  auto-compile-on-load-mode: t
  auto-compile-on-save-mode: t
  immortal-scratch-mode: t
  desktop-save-mode: t
  ns-auto-titlebar-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  auto-fill-function: org-auto-fill-function
  visual-line-mode: t
  transient-mark-mode: t

Load-path shadows:
/Users/claude/.emacs.d/elpa-27.2/magit-20210822.529/magit-section-pkg hides 
/Users/claude/.emacs.d/elpa-27.2/magit-section-20210819.1119/magit-section-pkg
/Users/claude/.emacs.d/elpa-27.2/seq-2.22/seq hides 
/Applications/Emacs.app/Contents/Resources/lisp/emacs-lisp/seq

Features:
(shadow sort mail-extr emacsbug sendmail consult-vertico consult
bookmark ielm tabify view cl-print eieio-opt speedbar sb-image ezimage
dframe rainbow-mode help-fns radix-tree switch-window
switch-window-mvborder switch-window-asciiart quail executable cus-edit
cus-start cus-load sanityinc-tomorrow-bright-theme
color-theme-sanityinc-tomorrow default-text-scale recentf tree-widget
orderless vertico marginalia company-quickhelp pos-tip winner windswap
windmove vc-bzr vc-src vc-sccs vc-svn vc-cvs vc-rcs diff-hl-dired
elisp-slime-nav paredit aggressive-indent highlight-quoted
display-line-numbers display-fill-column-indicator rainbow-delimiters
symbol-overlay bug-reference goto-addr flycheck-color-mode-line
flycheck-package package-lint let-alist imenu finder flycheck dimmer
face-remap color anzu company-oddmuse company-keywords company-etags
etags fileloop company-gtags company-dabbrev-code company-dabbrev
company-files company-clang company-capf company-cmake company-semantic
company-bbdb company-php company-template ac-php-core popup xcscope
company-anaconda anaconda-mode xref project pythonic
company-nixos-options nixos-options company pcase disp-table vc-git
vc-darcs org-element avl-tree generator ol-eww eww mm-url url-queue
ol-rmail ol-mhe ol-irc ol-info ol-gnus nnir gnus-sum url url-proxy
url-privacy url-expand url-methods url-history mailcap shr url-cookie
url-domsuf url-util svg xml dom gnus-group gnus-undo gnus-start
gnus-cloud nnimap nnmail mail-source utf7 netrc nnoo gnus-spec gnus-int
gnus-range message rmc puny rfc822 mml mml-sec epa epg epg-config
mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils
mailheader gnus-win gnus nnheader gnus-util rmail rmail-loaddefs rfc2047
rfc2045 ietf-drums text-property-search mail-utils mm-util mail-prsvr
wid-edit ol-docview doc-view image-mode exif dired-x diredfl dired
dired-loaddefs ol-bibtex bibtex ol-bbdb ol-w3m ob-sqlite ob-sql ob-shell
ob-ruby ob-python python tramp-sh docker-tramp tramp-cache tramp
tramp-loaddefs trampver tramp-integration files-x tramp-compat shell
parse-time iso8601 ls-lisp ob-plantuml ob-octave ob-ledger ob-latex
ob-gnuplot ob-dot ob-ditaa ob-R org-clock org ob ob-tangle ob-ref ob-lob
ob-table ob-exp org-macro org-footnote org-src ob-comint org-pcomplete
pcomplete org-list org-faces org-entities time-date noutline outline
org-version ob-emacs-lisp ob-core ob-eval org-table ol org-keys
org-compat org-macs org-loaddefs format-spec find-func cal-menu calendar
cal-loaddefs savehist session elec-pair delsel autorevert filenotify
so-long mode-line-bell beacon paren page-break-lines
whole-line-or-region highlight-escape-sequences which-key diminish
whitespace-cleanup-mode whitespace diff-hl log-view pcvs-util vc-dir
ewoc vc vc-dispatcher diff-mode cl-extra help-mode projectile-rails rake
f dash s inflections inf-ruby ruby-mode smie autoinsert projectile
lisp-mnt grep compile comint ring ibuf-ext ibuffer ibuffer-loaddefs
thingatpt jka-compr ipretty advice auto-compile packed immortal-scratch
uptimes pp server init init-locales init-direnv init-ledger init-dash
init-folding init-misc init-common-lisp init-clojure-cider init-clojure
init-slime init-lisp init-paredit init-nix init-terraform init-docker
init-yaml init-toml init-rust init-nim init-j init-ocaml init-sql
init-rails init-ruby init-purescript init-elm init-haskell init-python
reformatter ansi-color init-http init-haml init-css init-html init-nxml
init-org init-php init-javascript easy-mmode init-erlang erlang-start
init-csv init-markdown init-textile init-crontab init-compile
init-projectile init-github init-git init-darcs init-vc init-whitespace
init-editing-utils init-mmm mmm-auto mmm-vars mmm-utils mmm-compat
init-sessions desktop frameset init-windows init-company
init-hippie-expand init-minibuffer init-recentf init-flycheck
init-ibuffer ibuf-macs init-uniquify init-grep init-isearch init-dired
init-gui-frames ns-auto-titlebar init-osx-keys init-themes init-xterm
init-frame-hooks init-preload-local init-exec-path exec-path-from-shell
init-elpa fullframe finder-inf rx edmacro kmacro slime-autoloads info
package easymenu browse-url url-handlers url-parse auth-source eieio
eieio-core cl-macs eieio-loaddefs password-cache json subr-x map
url-vars seq byte-opt gv bytecomp byte-compile cconv init-site-lisp
cl-seq cl-loaddefs cl-lib init-utils init-benchmarking derived
early-init tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/ns-win ns-win ucs-normalize mule-util
term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads kqueue cocoa ns multi-tty make-network-process emacs)

Memory information:
((conses 16 632053 354268)
(symbols 48 59409 246)
(strings 32 197826 53863)
(string-bytes 1 5927719)
(vectors 16 69807)
(vector-slots 8 1717944 390092)
(floats 8 911 2031)
(intervals 56 3152 3510)
(buffers 1000 32))




