On 29 July 2002, Jamie Norrish wrote:

The Sanskrit translation on http://www.columbia.edu/kermit/utf8.html has two characters in the wrong place. Instead of nopahinasti, you have nopihanista. Assuming the transliteration is correct, the correct character sequence is:

0928 0913 092A 0939 093F 0928 0938 0924 093F

093F is the short i character which is displayed before the consonant which it is pronounced after.

(There are several errors in the encoding above; corrected in Jamie's response to Siva further down.)
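
To make the note above concrete, here is a minimal Python sketch that prints the standard Unicode character name of each code point Jamie listed. The helper name `describe` is hypothetical and used only for illustration; the names themselves come from Python's built-in `unicodedata` module.

```python
import unicodedata

# Code points exactly as listed in the message above.
SEQUENCE = "0928 0913 092A 0939 093F 0928 0938 0924 093F"

def describe(hex_sequence):
    """Return (code point, Unicode name) pairs for space-separated hex values."""
    pairs = []
    for token in hex_sequence.split():
        char = chr(int(token, 16))
        pairs.append((f"U+{token}", unicodedata.name(char)))
    return pairs

for code, name in describe(SEQUENCE):
    print(code, name)
```

The listing shows that 0913 is the independent LETTER O rather than a dependent vowel sign, and that the sequence contains no VIRAMA between the letters SA and TA.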


On 9 Aug 2002, Siva Nataraja responded as follows:

Please note that, English not being my mother tongue, it is not easy for me to be clear; I hope you'll understand what I write...

I think that Jamie Norrish does not see on his screen what I actually wrote. This may happen with some browsers. Since devanāgarī is not read the way it is written (like Arabic, in a certain way), good software should decode devanāgarī as it must be read, by interpreting some sequences of letters that must be read backward or by replacing some combinations of letters with a single ligature. Most do not.

    Let us take the case of the letter [i], written ि before the letter it actually follows when the word is read aloud. For instance, [bi] appears as िब (i « ि » + b « ब »).

    However, Unicode is not well suited to the needs of devanāgarī (it does not possess any ligatures, for example). A good document should be encoded in the same order in which a devanāgarī text is written, and then decoded correctly by software (especially for ligatures), but normal browsers can't do this. So I wrote my document as correctly as I could, that is to say, I've indicated where ligatures should be (which is Unicode compliant) but I've put the letters as one must read them, i.e. not in devanāgarī's order: to read /nopahinasti/, I wrote nophinsti (in devanāgarī, you never write an [a]: every consonant is, by default, followed by a short [a], unless there is another vowel after or before) and not nopihnist. Jamie Norrish's browser must be more recent than mine, so my document is, for him but not for me, incorrect.

 

  1. Text in devanāgarī script without ligatures but with correctly placed vowels.

  2. Transliteration; italic letters are those followed by a virāma (indicated by « - »); note that the [i] vowel is written before the letter it follows in pronunciation.

  3. Text in devanāgarī script with ligatures and correctly placed vowels.

  4. Transcription indicating the pronunciation: consonants with neither a written vowel symbol nor a virāma are read with a following [a], here transcribed by a small a above the line.

    Explanations:

    Every consonant is pronounced with an [a] which is not written. If the consonant is pronounced with another vowel following it, this vowel is written, but some are placed above, others below, after or before the consonant :

    If the consonant must not be followed by an [a] and is the last letter before a punctuation mark (i.e. « । », a kind of comma, and « ॥ », a kind of full stop), it must be followed by a virāma « ् » placed below: अक  = aka but अक्  = ak.

    When one must write a cluster of consonants, such as kka, ksa or ktra, it is not possible to place the letters one after the other, for these consonants would each be pronounced with an a: अकक reads akaka, not akka. It is possible to use a virāma, but it's not the right way: अक्क = akka. That's what I did in the document I sent you, because Unicode does not possess the correct ligature symbols: kk should be written , akka  and akk . Ligatures are standalone symbols representing consonant clusters.

    Good software (I.E. 5.5 can, but I don't know how) could transform अक्क into .

 

<quote>
    Assuming the transliteration is correct, the correct character sequence is:

0928 0913 092A 0939 093F 0928 0938 0924 093F ;
<end of quote>

    It is not. This sequence is displayed as  by my browser, which I read na opahanisata (i), the last symbol, an i  ि  at the end of the word, being impossible to read. Without virāmas, there are too many letters a (which are underlined) in the word but no clusters, and the i  ि  are badly placed. Last point: the ओ symbol, which reads o, can only be used at the beginning of a sequence of letters. Inside a word, it must be written ो.

    This sequence could be correctly displayed only by very smart software, able to interpret these sequences of letters.

<quote>
093F is the short i character which is displayed before the consonant which it is pronounced after.
<end of quote>

    Yes, that's why it must be written before it.

    To sum up, I don't agree with Jamie Norrish: the sequence he proposes is not correct. Maybe his browser is able to transform it into correct devanāgarī orthography, I don't know. Anyway, what you should read on your web page is either:

(with interpolated ligatures), or:

(analytic orthography).

         Regards,

                 Vincent Ramos / Siva


On 24 Sep 2002 Jamie Norrish responded:

I would point anyone wishing for the definitive discussion on the matter of the order of the characters (as opposed to glyphs, which is where the confusion arises) to: http://www.unicode.org/unicode/uni2book/ch09.pdf, which is the chapter of the online edition of the Unicode Standard version 3.0 dealing with Devanagari (among other scripts). On page 12 of that PDF (page 220 in the printed version) is the following:

"Memory Representation and Rendering Order. The order for storage of plain text in Devanagari and all other Indic scripts generally follows phonetic order; that is, a CV syllable with a dependent vowel is always encoded as a consonant letter C followed by a vowel sign V in the memory representation....

"Because Devanagari and other Indic scripts have some dependent vowels that must be depicted to the left side of their consonant letter, the software that renders the Indic scripts must be able to reorder elements in mapping from the logical (character) store to the presentational (glyph) rendering. For example, if Cn denotes the nominal form of consonant C, and Vvs denotes a left-side dependent vowel sign form of vowel V, then a reordering of glyphs with respect to encoded characters occurs as just shown."

The example figure shows exactly the situation under discussion here, with the short "i" vowel which logically occurs after the consonant, but is displayed before it.
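
The reordering described in the quoted passage can be illustrated with a minimal Python sketch. It is not how any particular browser or rendering engine works. The names `is_consonant` and `to_visual_order` are hypothetical, the sketch handles only the vowel sign I (U+093F), and it assumes a cluster is a consonant optionally preceded by consonant-plus-virama pairs. The result is a display-order sequence for illustration only, not text that should be stored.

```python
I_SIGN = "\u093F"   # DEVANAGARI VOWEL SIGN I, drawn to the left of its consonant
VIRAMA = "\u094D"   # DEVANAGARI SIGN VIRAMA, suppresses the inherent [a]

def is_consonant(ch):
    # Devanagari consonant letters KA..HA occupy U+0915..U+0939.
    return "\u0915" <= ch <= "\u0939"

def to_visual_order(logical):
    """Naively move each vowel sign I in front of the cluster it follows."""
    out = []
    for ch in logical:
        if ch == I_SIGN and out and is_consonant(out[-1]):
            # Walk back over the cluster: consonant (virama consonant)*
            start = len(out) - 1
            while (start >= 2 and out[start - 1] == VIRAMA
                   and is_consonant(out[start - 2])):
                start -= 2
            out.insert(start, ch)
        else:
            out.append(ch)
    return "".join(out)

logical = "\u092C\u093F"  # ba + vowel sign i, i.e. "bi" in logical order
visual = to_visual_order(logical)
print([f"U+{ord(c):04X}" for c in visual])
```

Applied to the logical sequence ba + vowel sign i, the sketch puts the vowel sign first, which is the left-to-right order in which the glyphs are drawn.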

The sample phrase is properly encoded as:

&#2344;&#2379;&#2346;&#2361;&#2367;&#2344;&#2360;&#2381;&#2340;&#2367;

Certainly, in browsers which do not understand what to do with these character references, the glyph for the dependent vowel short "i" will be displayed in character order, which is to say after the glyph for the consonant it should properly be displayed before. However, that is a fault of the rendering, and not with the ordering of the characters.

That is, the characters are not written in the same order as a devanagari text is (hand-)written; rather they are written in logical order (which corresponds with the order of the sounds when the text is spoken), and it is up to the rendering agent to do any necessary reordering of glyphs for display.
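
As a small check of the logical order, the following Python sketch decodes the numeric character references given above and prints each code point with its Unicode name. It uses only the standard `html` and `unicodedata` modules; the variable names are illustrative.

```python
import html
import unicodedata

# The numeric character references exactly as given in the message.
REFERENCES = "&#2344;&#2379;&#2346;&#2361;&#2367;&#2344;&#2360;&#2381;&#2340;&#2367;"

# html.unescape turns each decimal reference into the character it denotes.
text = html.unescape(REFERENCES)

for ch in text:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

In the decoded sequence each vowel sign I (U+093F) comes after the consonant it is pronounced after, and a virama (U+094D) sits between SA and TA.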

(End)