{"id":493,"date":"2009-08-19T11:59:19","date_gmt":"2009-08-19T18:59:19","guid":{"rendered":"http:\/\/blogs.law.harvard.edu\/cqtwo\/?p=493"},"modified":"2009-10-03T17:36:16","modified_gmt":"2009-10-04T00:36:16","slug":"transliterating-sanskrit-and-pali","status":"publish","type":"post","link":"https:\/\/archive.blogs.harvard.edu\/cqtwo\/2009\/08\/19\/transliterating-sanskrit-and-pali\/","title":{"rendered":"Transliterating Sanskrit and Pali [updated]"},"content":{"rendered":"<p>Transliterating Sanskrit, and its derivatives such as Pali, remains an annoying problem.\u00a0 The problem isn&#8217;t with the language itself; Sanskrit&#8217;s wonderfully precise and clear about sounds and letters.\u00a0 Likewise, there&#8217;s no issue with scripts or alphabets.\u00a0 You might think that there is some mystical connection between the script that a language is written in and the language itself, but that&#8217;s really not the case.\u00a0 Sanskrit in India is written in Devanagari, but there&#8217;s no special reason to use Devanagari for Sanskrit instead of the Latin alphabet or any other.\u00a0 Plus, Sanskrit&#8217;s only been written in Devanagari for a comparatively short period of time.<\/p>\n<p><!--more--><\/p>\n<p>(Surprisingly, the alphabet was invented only once, and all alphabets are genetically related to each other, branches of this one root.\u00a0 Devanagari is linked to Latin letters via Brahmi and Aramaic.)<\/p>\n<p>But to write Sanskrit correctly, you need some Latin letters not used in English.\u00a0 This is a common-enough situation; think of accent marks, or the French and Portuguese cedilla &#8212; \u00e7 &#8212; or the Spanish enye &#8212; \u00f1 &#8212; or even Mot\u00f6rhead&#8217;s <a title=\"Heavy metal umlaut\" href=\"http:\/\/en.wikipedia.org\/wiki\/Heavy_metal_umlaut\">heavy metal umlaut<\/a>.\u00a0 So, for example, &#8220;Devanagari&#8221; ought to be written &#8220;Devan\u0101gar\u012b&#8221; and &#8220;Pali&#8221; should be 
&#8220;P\u0101\u1e37i.&#8221;\u00a0 The complete set of diacritics for Pali is: \u0101, \u012b, \u016b, \u1e41, \u1e47, \u00f1, \u1e6d, \u1e0d, \u1e45, \u1e37.<\/p>\n<p>There&#8217;s another, separate but related, issue about when to use these &#8216;extra&#8217; letters and marks, called diacritics: for native English readers, the argument goes, they are distracting and make the words harder to read.<\/p>\n<p>Specialists typically prefer to preserve diacritics, because losing them can change the meaning of the word in its original language.\u00a0 The question comes down to: &#8220;When do these foreign words become English words?&#8221;\u00a0 There&#8217;s an active debate going on now about this very topic on H-Buddhism, an academic Buddhist studies mailing list.\u00a0 Dictionaries are split on the issue, with some words preserving diacritics and others losing them: for more on this, see the list of <a title=\"Buddhist Terms Found in English Print Dictionaries\" href=\"http:\/\/www.h-net.org\/~buddhism\/buddhist_terms_english.html\"><em>Buddhist Terms Found in English Print Dictionaries<\/em><\/a> and Gerald Jackson&#8217;s <a title=\"Gerald Jackson on getting published\" href=\"http:\/\/gettingpublished.wordpress.com\/2009\/08\/27\/diacritics-ok\/\">series on fonts and diacritics<\/a> in academic publishing.<\/p>\n<p>The problem arises when you need to type diacritics in your friendly word processing application.\u00a0 This immediately leads to a technical conversation about Unicode and Unicode fonts, which is when things start to get hairy.<\/p>\n<p>The best starting point for Unicode issues is <a href=\"http:\/\/www.alanwood.net\/unicode\/\" target=\"_blank\">Alan Wood&#8217;s page<\/a>.\u00a0 It&#8217;s worth reading for the introduction, as an overview of digital transcription.\u00a0 More specifically, for the topic at hand, the Tibetan &amp; Himalayan Digital Library has <a 
href=\"http:\/\/thlib.org\/tools\/#wiki=\/access\/wiki\/site\/c06fa8cf-c49c-4ebc-007f-482de5382105\/windows%20unicode%20diacritic%20fonts.html\" target=\"_blank\">a good survey of Unicode fonts<\/a> for transliterating &#8220;Indo-Tibetan&#8221; languages.<\/p>\n<p>(By Indo-Tibetan they mean the Indian languages used in Buddhist studies, including Sanskrit, Pali, Gandhari, and so on, plus Tibetan.\u00a0 &#8220;Indo-Tibetan&#8221; isn&#8217;t a language family like Indo-European, but the term points to the very close relationship between Tibet and India.\u00a0 Buddhist Tibetan is a specialized language unreadable to a native Tibetan, optimized a thousand years ago to translate Buddhist Sanskrit into Tibetan.\u00a0 Smart people have been dealing with these issues for a long time.)<\/p>\n<p>They make the <a href=\"http:\/\/thlib.org\/tools\/#wiki=\/access\/wiki\/site\/c06fa8cf-c49c-4ebc-007f-482de5382105\/unicode%20diacritic%20fonts.html\" target=\"_blank\">point<\/a> that not all Unicode fonts contain the necessary characters, so simply choosing a Unicode font isn&#8217;t enough: &#8220;To properly display all the diacritic marks used in Indo-Tibetan studies, a Unicode font must contain the following character ranges:<\/p>\n<ul style=\"margin-left: 40px\">\n<li>Basic Latin: U+0000 \u2013 U+007F (<a rel=\"external\" href=\"http:\/\/www.unicode.org\/charts\/PDF\/U0000.pdf\" target=\"_blank\">View Unicode Chart<\/a>)<\/li>\n<li>Latin-1 Supplement: U+0080 \u2013 U+00FF (<a rel=\"external\" href=\"http:\/\/www.unicode.org\/charts\/PDF\/U0080.pdf\" target=\"_blank\">View Unicode Chart<\/a>)<\/li>\n<li>Latin Extended-A: U+0100 \u2013 U+017F (<a rel=\"external\" href=\"http:\/\/www.unicode.org\/charts\/PDF\/U0100.pdf\" target=\"_blank\">View Unicode Chart<\/a>)<\/li>\n<li>Latin Extended-B: U+0180 \u2013 U+024F (<a rel=\"external\" href=\"http:\/\/www.unicode.org\/charts\/PDF\/U0180.pdf\" target=\"_blank\">View Unicode Chart<\/a>)<\/li>\n<li>Latin Extended Additional: U+1E00 \u2013 
U+1EFF (<a rel=\"external\" href=\"http:\/\/www.unicode.org\/charts\/PDF\/U1E00.pdf\" target=\"_blank\">View Unicode Chart<\/a>)&#8221;<\/li>\n<\/ul>\n<div style=\"border: medium none;color: #000000;text-align: left;text-decoration: none\"><p>(<a href=\"http:\/\/thlib.org\/tools\/#wiki=\/access\/wiki\/site\/c06fa8cf-c49c-4ebc-007f-482de5382105\/unicode%20diacritic%20fonts.html%23ixzz0OXQV5xVF\" target=\"_blank\">More&#8230;<\/a>)<\/p>\n<p>For Pali, this is the Unicode set:<\/p><\/div>\n<table border=\"0\">\n<tbody>\n<tr>\n<th>character<\/th>\n<th>ASCII rendering<\/th>\n<th>character name<\/th>\n<th>Unicode code point<\/th>\n<th>key combination<\/th>\n<th>HTML entity<\/th>\n<\/tr>\n<tr>\n<td align=\"center\">\u0101<\/td>\n<td>aa<\/td>\n<td>a macron<\/td>\n<td>U+0101<\/td>\n<td>Alt+A<\/td>\n<td>&amp;#257;<\/td>\n<\/tr>\n<tr>\n<td align=\"center\">\u012b<\/td>\n<td>ii<\/td>\n<td>i macron<\/td>\n<td>U+012B<\/td>\n<td>Alt+I<\/td>\n<td>&amp;#299;<\/td>\n<\/tr>\n<tr>\n<td align=\"center\">\u016b<\/td>\n<td>uu<\/td>\n<td>u macron<\/td>\n<td>U+016B<\/td>\n<td>Alt+U<\/td>\n<td>&amp;#363;<\/td>\n<\/tr>\n<tr>\n<td align=\"center\">\u1e41<\/td>\n<td>.m<\/td>\n<td>m dot-over<\/td>\n<td>U+1E41<\/td>\n<td><\/td>\n<td>&amp;#7745;<\/td>\n<\/tr>\n<tr>\n<td align=\"center\">\u1e47<\/td>\n<td>.n<\/td>\n<td>n dot-under<\/td>\n<td>U+1E47<\/td>\n<td>Alt+N<\/td>\n<td>&amp;#7751;<\/td>\n<\/tr>\n<tr>\n<td align=\"center\">\u00f1<\/td>\n<td>~n<\/td>\n<td>n tilde<\/td>\n<td>U+00F1<\/td>\n<td>Alt+Ctrl+N<\/td>\n<td>&amp;ntilde;<\/td>\n<\/tr>\n<tr>\n<td align=\"center\">\u1e6d<\/td>\n<td>.t<\/td>\n<td>t dot-under<\/td>\n<td>U+1E6D<\/td>\n<td>Alt+T<\/td>\n<td>&amp;#7789;<\/td>\n<\/tr>\n<tr>\n<td align=\"center\">\u1e0d<\/td>\n<td>.d<\/td>\n<td>d dot-under<\/td>\n<td>U+1E0D<\/td>\n<td>Alt+D<\/td>\n<td>&amp;#7693;<\/td>\n<\/tr>\n<tr>\n<td align=\"center\">\u1e45<\/td>\n<td>&quot;n<\/td>\n<td>n dot-over<\/td>\n<td>U+1E45<\/td>\n<td>Ctrl+N<\/td>\n<td>&amp;#7749;<\/td>\n<\/tr>\n<tr>\n<td align=\"center\">\u1e37<\/td>\n<td>.l<\/td>\n<td>l 
dot-under<\/td>\n<td>U+1E37<\/td>\n<td>Alt+L<\/td>\n<td>&amp;#7735;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>From <em>Wikipedia<\/em>&#8217;s &#8220;<a href=\"http:\/\/en.wikipedia.org\/wiki\/Pali#Pali_transliteration_on_computers\" target=\"_blank\">Pali transliteration on computers<\/a>.&#8221;<\/p>\n<p>The Tibetan &amp; Himalayan Digital Library people also have a good <a href=\"http:\/\/thlib.org\/tools\/thl-diacritic-chart.php\" target=\"_blank\">chart<\/a> of relevant diacritics.<\/p>\n<p>(In the distant past, like five years ago, there were various gnarly workarounds, including the now-deprecated Times Norman \/ Normyn font.\u00a0 Nobumi Iyanaga has written a <a title=\"Convert Diacritics\" href=\"http:\/\/www.bekkoame.ne.jp\/~n-iyanag\/researchTools\/convert_word_diacritical_f.html\">useful library of scripts<\/a> to convert from Times Norman \/ Normyn to &#8216;good&#8217; Unicode.)<\/p>\n<p>So, what are the practical options for a good font for transliterating Sanskrit and Pali today?\u00a0 It seems to me that there are at least five good choices:<\/p>\n<p><strong>Times Ext Roman<\/strong><\/p>\n<p>The Tibetan &amp; Himalayan Digital Library people really like Times Ext Roman.\u00a0 But the only source for it is the <a rel=\"external\" href=\"http:\/\/www.bcca.org\/services\/fonts\/\" target=\"_blank\">Bah\u00e1&#8217;i Computer &amp; Communication Association<\/a>, and it&#8217;s not clear to me what license it&#8217;s published under, so I would be reluctant to recommend it even though I trust that it&#8217;s technically sound.<\/p>\n<p><strong>Gentium<\/strong><\/p>\n<p>If you can get past SIL&#8217;s Christian missionary agenda, they do outstanding linguistics work, and their <a href=\"http:\/\/scripts.sil.org\/cms\/scripts\/page.php?site_id=nrsi&amp;item_id=gentium\" target=\"_blank\">Gentium<\/a> font is well regarded, seems complete for the purposes of transliterating Sanskrit and Pali, is widely accepted, is under active development, and is licensed under a good, 
if idiosyncratic, open-source license.\u00a0 It&#8217;s a <a href=\"http:\/\/scripts.sil.org\/cms\/scripts\/page.php?site_id=nrsi&amp;item_id=Gentium_samples\" target=\"_blank\">nice-looking typeface<\/a>, in my opinion.<\/p>\n<p><strong>IndUni<\/strong><\/p>\n<p><a href=\"http:\/\/bombay.indology.info\/software\/fonts\/index.html\" target=\"_blank\">John Smith<\/a> has recently updated this <a href=\"http:\/\/bombay.indology.info\/software\/fonts\/induni\/index.html\" target=\"_blank\">font family<\/a>.\u00a0 It&#8217;s designed for exactly the topic under discussion, &#8220;the representation of Indian-language (and similar) material in Roman script using the Unicode character set.&#8221;\u00a0 But he&#8217;s just one, albeit committed, guy, and I don&#8217;t know what license he&#8217;s publishing these fonts under, so I worry about their long-term supportability.\u00a0 Still, it&#8217;s worth mentioning; to me it&#8217;s in the same category as Times Ext Roman.<\/p>\n<p><strong>TransIndic Transliterator<\/strong><\/p>\n<p>There&#8217;s also a commercial product, <a href=\"http:\/\/www.linguistsoftware.com\/tintuu.htm\" target=\"_blank\">TransIndic Transliterator in Unicode<\/a>, from Linguists Software, that seems to do the job, although I don&#8217;t know much about it.\u00a0 It costs $100 per typeface (they have Times, Palatino, Arial, etc.) 
or $250 for the whole thing.\u00a0 It&#8217;s commercially licensed.\u00a0 Paying for it has the advantage of having someone on the hook to help you with it, not a small thing.<\/p>\n<p><strong>Gandhari Unicode<\/strong><\/p>\n<p>This <a href=\"http:\/\/andrewglass.org\/gu.php\" target=\"_blank\">nice-looking<\/a> typeface was originally designed to transcribe the newly discovered Buddhist manuscripts from Afghanistan.\u00a0 (Gandhari is another Middle Indic Prakrit, like Pali.)\u00a0 Gandhari Unicode is under active development, which is good, and seems widely accepted.\u00a0 (<a href=\"http:\/\/www.ebmp.org\/p_dwnlds.php\" target=\"_blank\">Main page<\/a>, <a href=\"http:\/\/andrewglass.org\/download.php?fname=gu5-110_ttf&amp;extn=zip\" target=\"_blank\">download<\/a>.)<\/p>\n<p>The license status of Gandhari Unicode is a little troubling; it&#8217;s based on work licensed under the &#8220;<a href=\"http:\/\/www.artifex.com\/downloads\/doc\/Public.htm\" target=\"_blank\">Aladdin Free Public License<\/a>,&#8221; which, despite its name, isn&#8217;t a free license.\u00a0 The <a href=\"http:\/\/www.fsf.org\/licensing\/licenses\/\" target=\"_blank\">Free Software Foundation<\/a> considers it a non-free license.\u00a0 Other parts of Gandhari Unicode are GPL-derived, but it isn&#8217;t clear to me which license takes precedence.\u00a0 Note that the link in Andrew Glass&#8217;s documentation to the Aladdin license (at Wisconsin) is out of date.<\/p>\n<p>[26 August 2009 update: According to reliable reports, there are issues with Gandhari Unicode&#8217;s spacing when printed, especially in italics.]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Transliterating Sanskrit, and its derivatives such as Pali, remains an annoying problem.\u00a0 The problem isn&#8217;t with the language itself; Sanskrit&#8217;s wonderfully precise and clear about sounds and letters.\u00a0 Likewise, there&#8217;s no issue with scripts or alphabets.\u00a0 You might think that &hellip; <a 
href=\"https:\/\/archive.blogs.harvard.edu\/cqtwo\/2009\/08\/19\/transliterating-sanskrit-and-pali\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1116,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1421,140],"tags":[],"class_list":["post-493","post","type-post","status-publish","format-standard","hentry","category-central-asia","category-religion"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8jQA6-7X","_links":{"self":[{"href":"https:\/\/archive.blogs.harvard.edu\/cqtwo\/wp-json\/wp\/v2\/posts\/493","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive.blogs.harvard.edu\/cqtwo\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive.blogs.harvard.edu\/cqtwo\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/cqtwo\/wp-json\/wp\/v2\/users\/1116"}],"replies":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/cqtwo\/wp-json\/wp\/v2\/comments?post=493"}],"version-history":[{"count":8,"href":"https:\/\/archive.blogs.harvard.edu\/cqtwo\/wp-json\/wp\/v2\/posts\/493\/revisions"}],"predecessor-version":[{"id":563,"href":"https:\/\/archive.blogs.harvard.edu\/cqtwo\/wp-json\/wp\/v2\/posts\/493\/revisions\/563"}],"wp:attachment":[{"href":"http
s:\/\/archive.blogs.harvard.edu\/cqtwo\/wp-json\/wp\/v2\/media?parent=493"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/cqtwo\/wp-json\/wp\/v2\/categories?post=493"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/cqtwo\/wp-json\/wp\/v2\/tags?post=493"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}