lilypond-user
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Character encoding / poor man's letterspacing


From: Urs Liska
Subject: Re: Character encoding / poor man's letterspacing
Date: Tue, 12 Mar 2019 15:53:18 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1

Hi Alexander,

thank you for that pointer. This made my day!

Am 12.03.19 um 14:54 schrieb Alexander Kobel:
Hi,

On 12.03.19 10:43, Urs Liska wrote:
Am 12.03.19 um 01:14 schrieb Aaron Hill:
On 2019-03-11 3:40 pm, David Kastrup wrote:
Urs Liska <address@hidden> writes:
[...]

Also, I should have been clear before.  David's code should work for most cases.  I was just being pedantic that /./ would not work if the input has combining characters.  For instance, if you type U+0308 (Combining Diaeresis) after an 'a', you'll get an ä.  But the simple regex would not treat that as a single grapheme.  The result would be "T a ̈ s t".

I did understand it that way, and it would not be an issue in the project I'm working on. There it's just some umlauts.

given that Aaron, my undisputed hero of Lily-UTFxy-workarounds, is active in this thread: I'm surprised to see no mention of his wonderful example of such a workaround from

  https://lists.gnu.org/archive/html/lilypond-user/2018-10/msg00473.html

IIRC, the essentials of the approach is to encode stuff as UTF-32 (more or less brute force), and handle individual characters as chunks of 4 consecutive bytes / 0..255-integers in a list.
It's not the ultimate solution to all imaginable troubles with encodings, but should be good enough for almost every *practical* use case.

In his modified center-lyrics-ignoring-punctuation.ily from that thread, you'll find the two main utility functions as string->utf32 and utf32->string. I presume you could call string->utf32, slice in a '(0 0 0 32) after each 4 entries, convert back via utf32->string, et voilà.


For future reference here is my solution:

\version "2.19.50"  %% and higher

% Coded by Aaron Hill
%
% Referenced from
% https://lists.gnu.org/archive/html/lilypond-user/2019-03/msg00150.html
%
%% UTF8 workaround - see the following...
%% http://lists.gnu.org/archive/html/lilypond-user/2018-10/msg00468.html

#(define (utf8->utf32 lst)
  "Converts a list of UTF8-encoded characters into UTF32."
  (if (null? lst) '()
    (let ((ch (char->integer (car lst))))
      (cond
        ;; Characters 0x00-0x7F
        ((< ch #b10000000) (cons ch (utf8->utf32 (cdr lst))))
        ;; Characters 0x80-0x7FF
        ((eqv? (logand ch #b11100000) #b11000000)
          (cons (let ((ch2 (char->integer (cadr lst))))
              (logior (ash (logand ch #b11111) 6)
                      (logand ch2 #b111111)))
            (utf8->utf32 (cddr lst))))
        ;; Characters 0x800-0xFFFF
        ((eqv? (logand ch #b11110000) #b11100000)
          (cons (let ((ch2 (char->integer (cadr lst)))
                      (ch3 (char->integer (caddr lst))))
              (logior (ash (logand ch #b1111) 12)
                      (ash (logand ch2 #b111111) 6)
                      (logand ch3 #b111111)))
            (utf8->utf32 (cdddr lst))))
        ;; Characters 0x10000-0x10FFFF
        ((eqv? (logand ch #b111110000) #b11110000)
          (cons (let ((ch2 (char->integer (cadr lst)))
                      (ch3 (char->integer (caddr lst)))
                      (ch4 (char->integer (cadddr lst))))
              (logior (ash (logand ch #b111) 18)
                      (ash (logand ch2 #b111111) 12)
                      (ash (logand ch3 #b111111) 6)
                      (logand ch4 #b111111)))
            (utf8->utf32 (cddddr lst))))
        ;; Ignore orphaned continuation characters
        ((eqv? (logand ch #b11000000) #b10000000) (utf8->utf32 (cdr lst)))
        ;; Error on all else
        (else (error "Unexpected character:" ch))))))

#(define (utf32->utf8 lst)
  "Converts a list of UTF32-encoded characters into UTF8."
  (if (null? lst) '()
    (let ((ch (car lst)))
      (append (cond
          ;; Characters 0x00-0x7F
          ((< ch #x80) (list (integer->char ch)))
          ;; Characters 0x80-0x7FF
          ((< ch #x800) (list
            (integer->char (logior #b11000000 (logand (ash ch -6) #b11111)))
            (integer->char (logior #b10000000 (logand ch #b111111)))))
          ;; Characters 0x800-0xFFFF
          ((< ch #x10000) (list
            (integer->char (logior #b11100000 (logand (ash ch -12) #b1111)))
            (integer->char (logior #b10000000 (logand (ash ch -6) #b111111)))
            (integer->char (logior #b10000000 (logand ch #b111111)))))
          ;; Characters 0x10000-0x10FFFF
          (else (list
            (integer->char (logior #b11110000 (logand (ash ch -18) #b111)))
            (integer->char (logior #b10000000 (logand (ash ch -12) #b111111)))
            (integer->char (logior #b10000000 (logand (ash ch -6) #b111111)))
            (integer->char (logior #b10000000 (logand ch #b111111))))))
        (utf32->utf8 (cdr lst))))))

#(define (string->utf32 s) (utf8->utf32 (string->list s)))
#(define (utf32->string l) (list->string (utf32->utf8 l)))


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \letterspaced markup command
% simply inserting spaces between letters

#(define-markup-command
  (letterspaced layout props text)(string?)
  (let*
   (;; convert string to list of raw UTF-32 numbers
    (utf-chars (string->utf32 text))
    ;; Interleave spaces (32) between the characters
    (spaced-list
     (let ((result '()))
       (for-each
        (lambda (elt)
          (set! result (cons elt result))
          (set! result (cons 32 result)))
        utf-chars)
       ;; reverse the list (which has been reversed through the cons-es
       ;; and strip the surplus space that is now at the head of the list
       (reverse result)))
    (spaced-text (utf32->string spaced-list))
    )
   (interpret-markup layout props
     (markup (utf32->string spaced-list)))))


%%%%%%%%%%%
% Example

{
  s1 ^\markup { Foo \letterspaced "Hey łäſ" bar }
}

reply via email to

[Prev in Thread] Current Thread [Next in Thread]