bug-texinfo

Re: New encodings in makeinfo


From: Sergey Poznyakoff
Subject: Re: New encodings in makeinfo
Date: Sun, 29 Jan 2006 21:19:39 EET

Patrice Dumas <address@hidden> wrote:

> For some there are letters with an accent or a symbol added, and it seems
> to me that the transliterated symbol should be the letter without accent
> or symbol:
> 
> 00C5: A better than AA 
> 00D8: O better than OE
> 00E5: a better than aa
> 00F8: o better than oe

Nope, all of these are the traditional replacements for 'å' (a-ring)
and 'ø' (o-slash) in the Nordic languages. Moreover, not so long ago they
were part of the standard orthography: the letter 'å' officially replaced
'aa' in Norwegian only in 1917 and in Danish in 1948. Even today, if
someone's computer does not allow them to use Nordic characters, they
will use these transliterations instead. The digraph 'aa' is also still
used in some traditional place names and family names.

A difference in spelling can mean a difference in meaning: in Norwegian,
'måtte' and 'maatte' (pronounced roughly "m-oh-te") both mean 'had to',
whereas 'matte' is quite a different word ('a mat').
 
> 00D0: D better than DH

Again, 'dh' is the usual representation of eth ('edh') wherever, for
some reason, the letter itself cannot be written.
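To make the difference concrete, here is a small Python sketch (not makeinfo's or Text::Unidecode's actual code) comparing a naive "drop the diacritic" approach, done via Unicode NFD decomposition, with a table of the traditional digraphs. The table is a hypothetical subset covering only the letters discussed above. Note that 'ø' and 'ð' have no canonical decomposition, so stripping marks leaves them untouched, while 'måtte' collapses into the unrelated word 'matte'.

```python
import unicodedata

# Hypothetical subset of a digraph table: traditional replacements only.
NORDIC_DIGRAPHS = {"Å": "AA", "å": "aa", "Ø": "OE", "ø": "oe", "Ð": "DH", "ð": "dh"}


def strip_marks(text: str) -> str:
    """Naive approach: decompose to NFD and drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def digraphs(text: str) -> str:
    """Table-driven approach: replace each listed letter by its digraph."""
    return "".join(NORDIC_DIGRAPHS.get(ch, ch) for ch in text)


for word in ["måtte", "smør", "borð"]:
    print(f"{word}: stripped={strip_marks(word)} digraphs={digraphs(word)}")
```

Stripping marks is therefore not even a complete fallback: 'ø' and 'ð' would still need explicit table entries.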

> Please tell if you have an argument
> against or for a given transliteration, such that I can report to
> the Text::Unidecode author if there are errors.

Before I begin, please note the principles I followed when choosing the
transliterations:

1. The transliteration is meant for use in file names, therefore any
   characters that have special meaning for the shell should be omitted.

2. The transliterated names should sound close to the original when
   read by a native speaker. Among other things, this means taking into
   account that most people who use non-Latin (or modified Latin) scripts
   have adopted their own traditional ways of transliteration since the
   advent of the Internet.

Therefore, the makeinfo transliterations need not coincide with the
ones from Text::Unidecode.
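Principle 1 can be checked mechanically. The Python sketch below assumes the POSIX portable filename character set (letters, digits, '.', '_' and '-') as the definition of "safe"; the table entries are illustrative values from this thread, and the dictionary layout is hypothetical, not makeinfo's actual data structure. An empty replacement (a dropped character) is treated as safe.

```python
import re

# POSIX portable filename character set: A-Z a-z 0-9 . _ -
PORTABLE = re.compile(r"[A-Za-z0-9._-]*")


def unsafe_entries(table: dict[int, str]) -> list[str]:
    """Return the code points (as hex) whose replacement is not portable."""
    return sorted(f"{cp:04X}" for cp, rep in table.items()
                  if not PORTABLE.fullmatch(rep))


# Illustrative entries: code point -> proposed replacement.
table = {0x00C5: "AA", 0x0416: "ZH", 0x042C: "X", 0x044C: "x"}
print(unsafe_entries(table))                   # []
print(unsafe_entries({**table, 0x044C: "'"}))  # ['044C']
```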
   
Now, here are my arguments:

> 00DE Th -> TH                (THORN)
> 0404 Ie -> IE
> 0407 Yi -> YI
> 0416 Zh -> ZH
> 0427 Ch -> CH
> 0428 Sh -> SH
> 0429 Shch -> SHCH

The ones I proposed for texinfo (those after the -> sign) are the
traditional ones for transliterating texts written in Cyrillic
scripts. Besides, I believe it is only logical to transliterate a capital
letter by capitals; otherwise all-caps words look rather strange, e.g.
PUSHCHA vs. PUShchA ('wild forest' in Ukrainian). The first spelling
looks more natural.

> 0415 Ie -> E
> 0435 ie -> e

Either is possible. Depending on its position in the word, this letter
can sound either as 'e' or as 'ie'. In any case, I'd prefer to spell 0415
as 'IE', for the reasons explained above.

> 0425 Kh -> H

Here you are right, 'KH' is more appropriate.

> 0426 Ts -> C

Again, 'C' is the usual representation I have found in most
transliterated Cyrillic texts. The spelling 'TS' is ambiguous and can
produce duplicates. For example, in Bulgarian:

swotvetstvie   - 'matching'. Here 'ts' is a cluster of two cyrillic
                 letters: 't' and 's'
        
cyalost        - 'entirety'. Here, 'c' represents the cyrillic letter 'TSE'. 

(Both spellings above are the usual transliterations of these words.)
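Since the transliterated names end up as file names, two different inputs mapping to the same output is exactly the problem. This Python sketch uses a hypothetical three-letter table to show that mapping 'ц' to 'ts' makes it collide with the two-letter cluster 'тс', while 'c' keeps them apart:

```python
from itertools import combinations


def transliterate(text: str, table: dict[str, str]) -> str:
    """Replace each character by its table entry; leave unknown ones as-is."""
    return "".join(table.get(ch, ch) for ch in text)


def collisions(samples: list[str], table: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of distinct inputs that transliterate to the same string."""
    return [(a, b) for a, b in combinations(samples, 2)
            if transliterate(a, table) == transliterate(b, table)]


# Hypothetical subsets: Cyrillic т, с, ц mapped to Latin letters.
TS_SCHEME = {"т": "t", "с": "s", "ц": "ts"}
C_SCHEME = {"т": "t", "с": "s", "ц": "c"}

samples = ["тс", "ц"]
print(collisions(samples, TS_SCHEME))  # [('тс', 'ц')]
print(collisions(samples, C_SCHEME))   # []
```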

> 042a  -> W
> 044a  -> w

Dropping this character is certainly not right. While in Russian this
character (called the 'hard sign') does not represent any sound and can
safely be omitted in transliteration, it does represent a vowel sound in
Bulgarian, and 'w' is the usual way to transliterate it whenever Bulgarian
text has to be written in the Latin script.

> 042c ' -> X
> 044c ' -> x

Although this character (called the 'soft sign') is often represented by
an apostrophe, the apostrophe is inconvenient in file names (see the
'principles' above), so I chose 'X' instead.
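The problem with the apostrophe shows up as soon as such a name is passed through a shell. Python's standard `shlex.quote`, which returns a string unchanged only when it contains no shell-special characters, illustrates it; the word 'соль' ('salt') is just an example chosen here:

```python
import shlex

# Two possible transliterations of Russian "соль" ('salt'):
# an apostrophe for the soft sign vs. the proposed 'x'.
for name in ["sol'", "solx"]:
    quoted = shlex.quote(name)
    status = "needs quoting" if quoted != name else "safe as-is"
    print(f"{name:6} {status}: {quoted}")
```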

> 042e Iu -> yu
> 042f Ia -> YA
> 044f ia -> ya

Both ways are OK with me, since native speakers tend to read them the
same way. However, I'd like to emphasize again that capitals should always
stay capitals (IU and IA, or YU and YA).

> 0433 gh -> g

The 'h' in the proposed transliteration is due to the fact that 'g' is
palatalized before front vowels in many European languages (e.g. French
'genou' or English 'generate'). However, this is not so in the East Slavic
languages, where this letter ('ghe') is used. The spelling 'g' is
traditional for native speakers, whereas 'gh' is not.
 
> 0445 kh -> h

See KH

> 0446 ts -> c

See TS

> 04d7 ie -> IO

The proposed 'ie' is quite wrong. In Russian (the only language that
uses this letter, as far as I know), it sounds like 'io' or 'yo' (a
diphthong). It is often written as a plain Cyrillic 'е', but that does
not change its pronunciation.
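The Unicode character data agrees with this reading: ё is encoded as U+0401/U+0451, named CYRILLIC CAPITAL/SMALL LETTER IO, and it decomposes canonically into 'е' plus a combining diaeresis, which is why naive accent stripping turns it into a plain 'е'. (U+04D7, the code point in the quoted line, is CYRILLIC SMALL LETTER IE WITH BREVE, a different letter.) A short Python check using the standard `unicodedata` module:

```python
import unicodedata

for cp in (0x0451, 0x04D7):
    ch = chr(cp)
    # List the code points of the canonical (NFD) decomposition.
    parts = [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]
    print(f"U+{cp:04X} {unicodedata.name(ch)} -> NFD {' '.join(parts)}")
```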

Regards,
Sergey
    



