Date: Sat, 25 Feb 2017 17:34:54 +0100 (CET)
From: Karl Hammar <address@hidden>
To: Joseph Austin <address@hidden>

<snip>
And, RP-26 clearly states in section 5:

  In addition, if a byte order mark which specifies UNICODE such as
  'FF FE' or 'FE FF' exists, the character code SET should be treated as UNICODE.

There is such a "byte order mark" for utf8, see [2]. And then by extension, you just have to insert that BOM somewhere in the midi file (exists == not restricted to the lyrics meta event, preferably in track 0 at time 0) and it would be legal (according to the recommendation) to use utf8 straight out of the box.

[2] http://www.unicode.org/faq/utf_bom.html#BOM

<snip>
only ASCII chars between 0 and 127 are allowed.
Your wording is too strong. complete_midi_96-1-3.pdf, p.137 (or [1] p.10) clearly says "should", but
"other character codes using the high-order bit may be used for interchange of files between different programs on the same computer which supports an extended character set. Programs on a computer which does not support non-ASCII characters should ignore those characters."
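As a rough illustration of the "should ignore those characters" clause, a reader that only supports 7-bit ASCII could drop every byte with the high-order bit set. This is only a sketch of one possible reading; the function name `ascii_only` is made up for this example and is not from the specification.

```python
def ascii_only(text_bytes: bytes) -> str:
    """Keep 7-bit ASCII bytes and silently drop bytes with the high bit set.

    Hypothetical helper: one way a program without extended character
    support might "ignore" non-ASCII characters in a meta-event text field.
    """
    kept = bytes(b for b in text_bytes if b < 0x80)
    return kept.decode("ascii")


if __name__ == "__main__":
    # "café" encoded as UTF-8; the two bytes of "é" are dropped.
    print(ascii_only(b"caf\xc3\xa9"))
```

Note that this loses information rather than rejecting the file, which is the lenient behaviour the quoted text seems to ask for.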
I stand corrected. But if we are going to use a "private standard", we might as well imitate the "official" standard and insert something like FF 05 07 { @ U T F 8 } And lobby AMEI/MMA to adopt an official UTF8 position.
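For concreteness, here is a minimal sketch of how the bytes `FF 05 07 { @ U T F 8 }` could be assembled. The `{@UTF8}` tag is the private convention proposed above, not an adopted standard, and the helper names `vlq` and `meta_event` are invented for this example.

```python
def vlq(n: int) -> bytes:
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = [n & 0x7F]              # last byte: continuation bit clear
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)  # earlier bytes: continuation bit set
        n >>= 7
    return bytes(reversed(out))


def meta_event(meta_type: int, data: bytes) -> bytes:
    """Build FF <type> <length as VLQ> <data> (without the delta-time)."""
    return bytes([0xFF, meta_type]) + vlq(len(data)) + data


LYRIC = 0x05  # Lyric meta-event type

# Hypothetical private tag, as suggested in the message above.
utf8_tag = meta_event(LYRIC, b"{@UTF8}")

if __name__ == "__main__":
    print(utf8_tag.hex())
```

The tag is seven ASCII bytes, so the length field is the single byte 07, matching the hand-written form.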
Could be good, but why not just capitalize on the BOM and just use utf8?

Regards,
/Karl Hammar
OK, the UTF-8 BOM is 0x EF BB BF. But given that the MIDI file is not a "text file" but a binary file with text fields scattered throughout, normally embedded in various MIDI Meta-events, where should the BOM be placed?
Interpreting your suggestion, we could add a Lyric Meta-Event with the BOM as the text field to Track 0 Time 0. That should work for lyrics, but RP-26 indicates that lyrics "language encoding" should not extend to other types of text events. For other text events, it seems we would need to prefix every UTF-8 text field with the BOM.

--- Joe Austin
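To make the two options concrete, here is a small sketch: one function builds a Lyric Meta-Event at delta-time 0 whose text is only the BOM, and the other two prefix and strip the BOM on individual text fields. All function names are made up for this illustration, and the ASCII fallback in the decoder is an assumption, not something RP-26 prescribes.

```python
import codecs

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"


def bom_lyric_event() -> bytes:
    """Delta-time 0, then a Lyric meta-event (FF 05) containing only the BOM.

    The length fits in one byte because the BOM is three bytes long.
    """
    return b"\x00" + b"\xff\x05" + bytes([len(BOM)]) + BOM


def encode_text_field(text: str) -> bytes:
    """Prefix a UTF-8 text field with the BOM (the per-field option)."""
    return BOM + text.encode("utf-8")


def decode_text_field(data: bytes) -> str:
    """Decode as UTF-8 if the field starts with the BOM.

    Assumed fallback: without a BOM, keep only 7-bit ASCII bytes.
    """
    if data.startswith(BOM):
        return data[len(BOM):].decode("utf-8")
    return data.decode("ascii", errors="ignore")


if __name__ == "__main__":
    print(bom_lyric_event().hex())
    print(decode_text_field(encode_text_field("Åse")))
```

The per-field prefix costs three bytes per text event, but each field can then be decoded without looking elsewhere in the file.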