Re: UTF-8 in MIDI Lyrics
From: karl
Subject: Re: UTF-8 in MIDI Lyrics
Date: Sat, 25 Feb 2017 17:44:32 +0100 (CET)
Sorry, my last mail had the wrong From header.
Joe Austin:
> > Am 24.02.2017 um 02:15 schrieb Joseph Austin:
> >> This raises another question. I'm working with MIDI files,
> >> and it's not clear how to encode UTF-8 text in MIDI.
> >> There must be some convention, but I haven't found an official RP for it.
...
> I don't have a program that displays MIDI files with lyrics, so I can't test
> it.
Timidity will show the lyrics.
I have a simple program that dumps the midi as text:
http://aspodata.se/git/musik/bin/midi.pl
$ midi.pl test.midi | grep lyric | head
['lyric', 0, 'Sta'],
['lyric', 768, 'bat '],
['lyric', 768, 'Ma'],
['lyric', 768, 'ter '],
['lyric', 384, 'do'],
['lyric', 768, 'lo'],
['lyric', 384, 'ro'],
['lyric', 768, 'sa '],
['lyric', 384, 'sa '],
['lyric', 384, 'jux'],
$
> It appears that, when generating a MIDI file, LilyPond currently
> just puts UTF8 chars in the text fields as if they were ASCII.
> According the base MIDI spec, this is illegal; only ASCII chars
> between 0 and 127 are allowed.
Your wording is too strong. complete_midi_96-1-3.pdf, p.137 (or [1]
p.10) clearly says "should", and adds that
"other characters codes
using the high-order bit may be used for interchange of files between
different programs on the same computer which supports an extended
character set. Programs on a computer which does not support
non-ASCII characters should ignore those characters."
[1] http://www.cdik.se/pdf/midiformat.pdf
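To illustrate the last sentence of that quote, here is a minimal Python sketch of what an ASCII-only reader could do with such text: drop every byte that has the high-order bit set. The function name strip_non_ascii is made up for this example; it is not taken from the spec or from any existing program.

```python
def strip_non_ascii(text_bytes: bytes) -> bytes:
    """Drop every byte with the high-order bit set (values 128..255).

    Hypothetical helper: one way to "ignore those characters".
    """
    return bytes(b for b in text_bytes if b < 0x80)


raw = "Aspö".encode("utf-8")   # the ö becomes the two bytes C3 B6
print(strip_non_ascii(raw))    # b'Asp'
```

Note that for UTF-8 this removes the whole multi-byte character, so the remaining text is still valid ASCII.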
Also, the last paragraph of rp17.pdf gives you the set of characters that
are "accepted for use" and says that "it is best to avoid the use of these
characters: \ [ ] { }".
And rp26 clearly states in section 5:
In addition, if a byte order mark which specifies UNICODE such as
'FF FE' or 'FE FF' exists, the character code SET should be treated
as UNICODE.
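A reader following that sentence could check the start of a text field for one of the two byte order marks it names. The sketch below is only an illustration: the function name is made up, and treating "UNICODE" as UTF-16 in the byte order the mark indicates is my assumption, not something rp26 spells out here.

```python
import codecs


def decode_with_utf16_bom(data: bytes):
    """Return the decoded text if data starts with FF FE or FE FF, else None.

    Assumption: a text field carrying one of these marks holds UTF-16.
    """
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return data[2:].decode("utf-16-be")
    return None                                # no mark: leave it to the caller


print(decode_with_utf16_bom(b"\xff\xfeM\x00a\x00"))   # Ma
print(decode_with_utf16_bom(b"Ma"))                   # None
```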
There is such a "byte order mark" for UTF-8 too, see [2]. By
extension, you just have to insert that BOM somewhere in the MIDI
file (I read "exists" as not restricted to the lyrics meta event;
preferably put it in track 0 at time 0) and it would be legal, according
to the recommendation, to use UTF-8 straight out of the box.
[2] http://www.unicode.org/faq/utf_bom.html#BOM
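As a rough sketch of that idea in Python: build a lyric meta event FF 05 <len> <text> whose text starts with the UTF-8 BOM (EF BB BF), and strip the BOM again when reading. The helper names are made up for this example, putting the BOM at the start of a lyric event is just one possible placement, and the delta time that precedes each event in a track is left out.

```python
import codecs


def encode_vlq(n: int) -> bytes:
    """Encode n as a MIDI variable-length quantity (7 bits per byte)."""
    out = [n & 0x7F]                    # last byte has the high bit clear
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)   # earlier bytes have the high bit set
        n >>= 7
    return bytes(reversed(out))


def lyric_event(text: str, with_bom: bool = False) -> bytes:
    """Build FF 05 <len> <text> with the text encoded as UTF-8 (no delta time)."""
    payload = text.encode("utf-8")
    if with_bom:
        payload = codecs.BOM_UTF8 + payload   # EF BB BF
    return b"\xff\x05" + encode_vlq(len(payload)) + payload


def decode_lyric_text(payload: bytes) -> str:
    """Decode UTF-8 text; 'utf-8-sig' drops a leading BOM if there is one."""
    return payload.decode("utf-8-sig")


event = lyric_event("Aspö", with_bom=True)
print(event.hex())                    # ff0508efbbbf417370c3b6
print(decode_lyric_text(event[3:]))   # Aspö
```

The length byte is 08 here because the BOM (3 bytes) and the UTF-8 text (5 bytes) are both counted.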
> However, MIDI RP-17 and RP-26 introduce additional encodings for
> the <text> portion of the lyric meta-event FF 05 <len> <text>.
You do extrapolate a little: rp17 tells you the "recommended" way to
specify end of word/line/paragraph, and gives you a list of characters
that should cause no compatibility problems.
> In particular, RP-26 specifies the "language" code address@hidden to
> include 8-bit chars > 127. It seems no code for "UTF8" has been
> officially defined, but a reasonable proposal might be language code:
> address@hidden
You don't need that, see above about the BOM. Also, it would be interesting
to see which programs actually support rp26. Since MIDI "standards"
are just recommendations, you have to know what works in the wild.
..
> So for LilyPond purposes, it would suffice to use a reversible
> encoding, that is, LilyPond would accept any MIDI file text format
> that LilyPond generates. The apparently existing UTF-8 default
> should work for that.
LilyPond doesn't read MIDI files; you can convert MIDI files to ly files,
which LilyPond can then read.
> But if we are going to use a "private standard", we might as well
> imitate the "official" standard and insert something like
> FF 05 07 { @ U T F 8 }
> And lobby AMEI/MMA to adopt an official UTF8 position.
Could be good, but why not just capitalize on the BOM and simply use
UTF-8?
Regards,
/Karl Hammar
-----------------------------------------------------------------------
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57