Webpages of Tamil Electronic Library © K. Kalyanasundaram
Design of TSCII-encoding based webpages in Tamil: Guidelines
Tamil Script Code for Information Interchange (TSCII)
After nearly three years of discussions, the Internet Tamil community has agreed upon an encoding scheme for Tamil called "Tamil Script Code for Information Interchange (TSCII)". During Fall 1998, this standard was submitted to the Tamilnadu Special Advisory Committee for Tamil Computing for possible adoption as a glyph encoding standard for Tamil. An 8-bit bilingual glyph encoding scheme forms the basis of TSCII. Details of the proposed TSCII standard are available at the TSCII website.
A major goal of the proposed Script Code is to unify the mode of information interchange on the Internet (via email, WWW, PDF, etc.) by having all of us use the same font encoding scheme. This way Tamils worldwide can readily access Tamil pages without the need to download and install a separate Tamil font for each website.
Several TSCII-conformant font faces are available free for download from the TSCII website or from my Tamil Electronic Library website. Tamil fonts and tools (text editors, keyboard editors, file convertors, ...) are available free for use on all three of the commonly used computer platforms: Windows, Macintosh and Unix. TSCII-conformant font faces are readily identified by the suffix TSC after the font name, e.g. MylaiTSC, Sri-TSC, InaimathiTSC, MaduramTSC, TneriTSC, ... The time is ripe for all of us to start using the proposed Tamil standard TSCII on the World Wide Web. In this page, we would like to provide some guidelines for setting up webpages in Tamil and also give pointers to sample Tamil webpages based on TSCII.
Mode of presentation of the Tamil text in source/HTML file
The TSCII encoding scheme is an 8-bit bilingual one, with the standard lower-ASCII set at slots 0-127 and Tamil glyphs occupying the upper half (slots 128-255). HTML 3.x uses Latin-1 (ISO 8859-1) as its reference scheme. The HTML specifications suggest an "equivalent 7-bit representation of upper-ASCII characters". This can be either an "entity representation", where a-tilde is represented as &atilde;, or a "number representation" of the form &#xxx;, where xxx is the slot position of the character in question (e.g. &#200; for E-grave). Unless the "user-defined" encoding option is chosen at the very beginning of the HTML file preparation step, most web browsers systematically replace upper-ASCII characters with one of these two equivalent 7-bit forms. When such "equivalent 7-bit characters" are present in an HTML file, the browser assumes the encoding to be Latin-1.
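The two equivalent 7-bit forms can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the helper names (entity_form, numeric_form, to_seven_bit) are hypothetical, and the conversion shown is a guess at the kind of replacement an editing tool performs, not the behaviour of any particular browser.

```python
from html.entities import codepoint2name  # standard Latin-1 entity names


def entity_form(slot: int) -> str:
    """Named entity for a Latin-1 slot, e.g. 227 (a-tilde) -> '&atilde;'."""
    return f"&{codepoint2name[slot]};"


def numeric_form(slot: int) -> str:
    """Numeric reference built from the decimal slot position."""
    return f"&#{slot};"


def to_seven_bit(raw: bytes) -> str:
    """Replace every upper-ASCII byte (128-255) with its numeric reference.

    Assumption: this mimics what an editor does when no "user-defined"
    encoding has been chosen; lower-ASCII bytes pass through unchanged.
    """
    return "".join(chr(b) if b < 128 else numeric_form(b) for b in raw)


print(entity_form(0xE3))         # &atilde;
print(numeric_form(0xC8))        # &#200;
print(to_seven_bit(b"caf\xe9"))  # caf&#233;
```

Once a file contains such references, each one names a Latin-1 character, so the original 8-bit slot is no longer interpreted through the TSCII font.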
The Latin-1 scheme does not include any characters in rows 8 and 9 (slots 128-159). In order that
the grantha characters and Tamil numerals (present in slots 128-159) are displayed correctly, careful
attention needs to be paid to the preparation of HTML files for 8-bit fonts. Based on many successful
trials, the Internet Working Group for TSCII strongly recommends that the Tamil text be present
as raw 8-bit text. For this, files can be generated in HTML format using simple text
editors, or by ensuring that the "user-defined" option is chosen before entering the Tamil text in
the HTML file. Most web editors automatically insert descriptions (for example, META tags)
recording the encoding used at the time of the HTML file preparation (see below).
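As a rough sketch of the raw 8-bit recommendation, the Python below first confirms that Latin-1 assigns only control codes to slots 128-159, then assembles a page whose body bytes are written untouched. The function names are hypothetical, and the two body bytes are arbitrary placeholders from the upper range, not specific TSCII glyphs.

```python
import unicodedata


def latin1_gap_slots():
    """Slots in 128-255 that Latin-1 maps to control codes, not printable characters."""
    return [s for s in range(128, 256)
            if unicodedata.category(bytes([s]).decode("latin-1")) == "Cc"]


def build_raw_page(title: str, body: bytes) -> bytes:
    """Wrap already-encoded 8-bit body text in a minimal HTML shell.

    The body is concatenated as bytes, so no upper-range byte is
    re-encoded or turned into an entity along the way.
    """
    head = f"<html><head><title>{title}</title></head><body>\n".encode("ascii")
    return head + body + b"\n</body></html>\n"


def is_raw_8bit(page: bytes) -> bool:
    """True if the page has upper-range bytes and no '&#' numeric references."""
    return b"&#" not in page and any(b >= 128 for b in page)


placeholder_body = bytes([0x83, 0xAB])  # arbitrary upper-range bytes (assumption)
page = build_raw_page("sample", placeholder_body)
print(latin1_gap_slots() == list(range(128, 160)))  # True
print(is_raw_8bit(page))                            # True
```

Writing such a page to disk in binary mode (open(path, "wb")) keeps the bytes exactly as built, which is the property a simple text editor provides.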
Ways of forcing the web-browser to use TSCII-conformant fonts
There are two ways in which one can force the web browser at the client side to use locally
available TSCII-conformant Tamil font(s) to display Tamil pages in Tamil script: