Design of TSCII-encoding based webpages in Tamil: Guidelines

Tamil Script Code for Information Interchange (TSCII)

After nearly three years of discussions, the Internet Tamil community has agreed upon an encoding scheme for Tamil called "Tamil Script Code for Information Interchange (TSCII)". During Fall 1998, this standard was submitted to the Tamilnadu Special Advisory Committee for Tamil Computing for possible adoption as a glyph encoding standard for Tamil. An 8-bit bilingual glyph encoding scheme forms the basis of TSCII. Details of the proposed TSCII standard are available at the TSCII website.

A major goal of the proposed Script Code is to unify the mode of information interchange on the Internet (via email, WWW, PDF, etc.) by having all of us use the same font encoding scheme. This way, Tamils worldwide can readily access Tamil pages without needing to download and install a separate Tamil font for each website.

Several TSCII-conformant font faces are available for free download from the TSCII website or from my Tamil Electronic Library website. Tamil fonts and tools (text editors, keyboard editors, file converters, ...) are available free of charge for all three of the commonly used computer platforms: Windows, Macintosh and Unix. TSCII-conformant font faces are readily identified by the suffix TSC after the font name, e.g. MylaiTSC, Sri-TSC, InaimathiTSC, MaduramTSC, TneriTSC. The time is ripe for all of us to start using the proposed Tamil standard TSCII on the World Wide Web. In this page, we provide some guidelines for setting up webpages in Tamil and also give pointers to sample Tamil webpages based on TSCII.

Mode of presentation of the Tamil text in source/HTML file

The TSCII encoding scheme is an 8-bit bilingual one, with the standard lower-ASCII set at slots 0-127 and Tamil glyphs occupying the upper half (slots 128-255). HTML 3.x is based on Latin-1 (ISO 8859-1) as the reference scheme. The HTML specifications suggest an "equivalent 7-bit representation of upper-ASCII characters". This can be either an "entity representation", where a-tilde is written as &atilde;, or a "number representation" of the form &#xxx;, where xxx is the slot position of the character in question (e.g. &#200; for E-grave). Unless the "user-defined" encoding option is chosen at the very beginning of the HTML file preparation step, most web editors systematically replace upper-ASCII characters with one of these two equivalent 7-bit forms. When such "equivalent 7-bit characters" are present in an HTML file, the browser assumes the encoding to be Latin-1.
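
The two 7-bit forms described above can be checked with a short Python sketch. It uses only the standard `html` module. The helper name `to_numeric_reference` is hypothetical and introduced here purely for illustration.

```python
import html

# Two 7-bit ways of writing an upper-half Latin-1 character in HTML.
entity_form = "&atilde;"   # entity representation of a-tilde
numeric_form = "&#200;"    # number representation of E-grave

a_tilde = html.unescape(entity_form)
e_grave = html.unescape(numeric_form)

# Slot positions under Latin-1 (ISO 8859-1): one byte per character.
a_tilde_slot = a_tilde.encode("latin-1")[0]
e_grave_slot = e_grave.encode("latin-1")[0]


def to_numeric_reference(slot: int) -> str:
    """Return the &#xxx; form for an 8-bit slot position (hypothetical helper)."""
    if not 0 <= slot <= 255:
        raise ValueError("slot must be in the range 0-255")
    return f"&#{slot};"


print(a_tilde_slot, e_grave_slot)
print(to_numeric_reference(e_grave_slot))
```

Both forms resolve to a single Latin-1 slot, which is why a browser that meets them treats the page as Latin-1 rather than as a user-defined 8-bit encoding.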

The Latin-1 scheme does not include any printable characters in rows 8 and 9 (slots 128-159). For the grantha characters and Tamil numerals (present in slots 128-159) to be displayed correctly, careful attention needs to be paid to the preparation of HTML files for 8-bit fonts. Based on many successful trials, the Internet Working Group for TSCII strongly recommends that the Tamil text be present as raw 8-bit text. To achieve this, files can be generated in HTML format using simple text editors, or by ensuring that the "user-defined" option is chosen before the Tamil text is entered in the HTML file. Most web editors automatically insert a declaration of the encoding in use at the time the HTML file is prepared (see below).
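
As a rough illustration of why slots 128-159 need care, the Python sketch below checks how a Latin-1 reader classifies a byte and wraps raw 8-bit text in ASCII markup without altering it. The function names are hypothetical, and the sample bytes are arbitrary placeholders rather than real Tamil text.

```python
import unicodedata


def in_latin1_gap(slot: int) -> bool:
    """True for slots 128-159, where Latin-1 has no printable characters."""
    return 128 <= slot <= 159


def is_control_in_latin1(slot: int) -> bool:
    """True if a Latin-1 reader would see the byte at this slot as a control code."""
    char = bytes([slot]).decode("latin-1")
    return unicodedata.category(char) == "Cc"


def wrap_raw_text(raw_text: bytes) -> bytes:
    """Wrap 8-bit text in minimal ASCII markup, leaving its bytes untouched."""
    return b"<html><body><p>" + raw_text + b"</p></body></html>"


# Arbitrary placeholder bytes: one from the 128-159 range, two from above it.
sample = bytes([0x83, 0xAB, 0xC5])
page = wrap_raw_text(sample)

for slot in sample:
    print(slot, in_latin1_gap(slot), is_control_in_latin1(slot))
```

Working with `bytes` rather than `str` keeps the text as raw 8-bit data, so no step in between can substitute entity or numeric forms.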

Ways of forcing the web-browser to use a TSCII-conformant Tamil font face to display Tamil pages

There are two ways in which one can force the web browser on the client side to use locally available TSCII-conformant Tamil font(s) to display Tamil pages in Tamil script:

i) Invoke "x-user-defined" case for the encoding in the META header

The internationalisation part of the HTML standards proposes the use of a "character set" to display non-Roman language material. One of the near-term goals of the Internet Working Group for TSCII is to get Internet protocol standardisation agencies such as the IETF to accept the proposed encoding scheme TSCII as a "char-set" for Tamil, along the same lines as the specific character sets that exist for Russian, Korean, Japanese, Greek, etc. TSCII would then be one of the recognized character sets that can be invoked in HTML files.

Until then, an immediate option is to invoke "x-user-defined" as the char-set in the META header of the HTML file and have the end-user choose a TSCII-conformant Tamil font as the font to use for the "User-defined" encoding (using the browser's Preferences menu).
Implementation: For this to work properly, the following must be done:
The author of the HTML file must place a META header at the beginning of the HTML file that declares the charset as "x-user-defined". The header can also name the author, where you can write your name for the part marked "your name".
The end-user/client must open the "Fonts" section of the "Preferences" menu of the browser and select his/her preferred TSCII-conformant font face for the proportional and fixed-width (monospace) fonts. Please note that the TSCII fonts of Murasu Anjal are available to the system only when the Anjal software is running in the background. The self-standing MylaiTSC and the ArulMathiTSC that comes as part of the Anjal package are examples of fixed-width font faces that one can use to display text and to fill in on-line Web queries, where you provide responses by typing directly in the appropriate boxes.
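
The author-side step can be sketched in Python as follows. The exact header markup is not reproduced in the text, so the META lines below are an assumption based on the common HTTP-EQUIV Content-Type form, and the function names `meta_header` and `build_page` are hypothetical.

```python
import html


def meta_header(author: str) -> str:
    """Build META lines declaring the x-user-defined charset and the author.

    Assumed form: the source text does not show the original header markup.
    """
    safe_author = html.escape(author, quote=True)
    return (
        '<META HTTP-EQUIV="Content-Type" '
        'CONTENT="text/html; charset=x-user-defined">\n'
        f'<META NAME="Author" CONTENT="{safe_author}">'
    )


def build_page(author: str, raw_body: bytes) -> bytes:
    """Combine an ASCII head with raw 8-bit body text, leaving the body bytes as they are."""
    head = f"<HTML>\n<HEAD>\n{meta_header(author)}\n</HEAD>\n<BODY>\n"
    # The head must be pure ASCII; a non-ASCII author name raises UnicodeEncodeError.
    return head.encode("ascii") + raw_body + b"\n</BODY>\n</HTML>\n"


# Placeholder body bytes standing in for raw 8-bit TSCII text.
page = build_page("your name", bytes([0x83, 0xAB, 0xC5]))
print(page.decode("latin-1"))
```

Encoding only the head as ASCII and appending the body as bytes means the Tamil text is never re-encoded on its way into the file.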

ii) Invoke font face tags

HTML allows the use of the <FONT FACE="..."> tag to specify up to three font face(s) for the browser to use to display text in the webpage.
Implementation: Author of the HTML file: For Tamil, one can list any three of the freely available TSCII font faces in the FACE attribute. There are no restrictions on where the tag can be invoked, nor on the number of times it can be invoked, in the HTML file. For easy access by the majority of the community, it is preferable to invoke font faces that are readily and freely available on the Internet.
End-user/client: Not much to do, except to choose the "user-defined" case for the encoding option within the View menu of the browser.
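
A minimal Python sketch of the font face option is given below. The helper `font_face_tag` is hypothetical, and the three face names are simply taken from the TSC font names mentioned earlier on this page, not from a recommended list.

```python
def font_face_tag(faces: list[str], text: str) -> str:
    """Wrap text in a FONT tag that lists one to three font faces (hypothetical helper)."""
    if not 1 <= len(faces) <= 3:
        raise ValueError("specify between one and three font faces")
    face_list = ", ".join(faces)
    return f'<FONT FACE="{face_list}">{text}</FONT>'


# The browser is expected to try the faces in the order listed.
tag = font_face_tag(["MylaiTSC", "InaimathiTSC", "TneriTSC"], "text")
print(tag)
```

Listing more than one face gives the browser a fallback when the first font is not installed on the client machine.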

Advantage of Option (i): Each client can use his/her own TSCII font to view the Tamil page (with no need to download a font for each Tamil website).
Advantage of Option (ii): Gives the author of the HTML file finer control over the Web presentation, using font faces of personal preference/choice.
There is a growing trend on the Internet to avoid using <FONT> tags and to go for the character-set option instead.

Recommendations of Internet Working Group for TSCII

Based on successful trials on several of the commonly used web browsers on different operating systems, the Working Group recommends that Tamil webpages carry the Tamil text as raw 8-bit text, with META headers that specify "x-user-defined" as the charset.

Sample Web Pages in Tamil based on TSCII format

The following are sample collections of Tamil webpages based on the TSCII encoding scheme. You can view them in Tamil script using any TSCII-conformant font face.

