Webpages of Tamil Electronic Library © K. Kalyanasundaram



Tamil script Code for Information Interchange (TSCII) and
other 8-bit Coded Bilingual Character Sets of ISO /IEC /ECMA



Introduction


In character set standards, the slot positions (bit combinations) of the set are divided into zones with specific features and restrictions. The specifications for the characters of an 8-bit code are as follows:
  • C0 set containing control characters (also known as CL area):
    A set of up to 30 control characters represented by bit combinations 00/00 to 01/15, except 00/14 and 00/15 which shall not be used.
    The requirements for the C0 set are:
    - bit combinations 00/14 and 00/15 shall not be used;
    - the control character ESCAPE shall be represented by bit combination 01/11;
    - any control characters can be allocated to the other bit combinations.

  • Character ESCAPE
    Escape is a control character, represented by bit combination 01/11, used to form escape sequences.

  • Character SPACE
    A graphic character represented by bit combination 02/00, having a visual representation consisting of the absence of a graphic symbol. It causes the active position to be advanced by one character position.

  • G0 set (also known as GL area)
    The 94 bit combinations 02/01 to 07/14 are used to represent graphic characters. All graphic characters allocated to bit combinations in the range 02/01 to 07/14 are spacing characters, that is, they cause the active position to advance by one character position. The graphic characters allocated by this standard to these 94 bit combinations are those of the standard lower ASCII set.

  • Character DELETE
    A character represented by bit combination 07/15. DEL was originally used to erase or obliterate an erroneous or unwanted character in punched tape. DEL may be used for media-fill or time-fill. DEL characters may be inserted into, or removed from, a data stream without affecting the information content of that stream, but such action may affect the information layout and/or control of the equipment

  • C1 set (also known as CR area)
    The C1 set is available for up to 32 control characters in addition to those provided by the C0 set. It shall not include any of the control characters of the C0 set of ISO 6429.
    No specific control characters are allocated to bit combinations 08/00 to 08/13 and 09/00 to 09/15 by this standard.
    When the single-shift functions SS2 and SS3 are used, they shall be allocated to bit combinations 08/14 and 08/15 respectively; otherwise these bit combinations shall not be used. (Note: A C1 set comprising only SS2 and SS3, allocated to these bit combinations, has been registered as ISO-IR No. 105.)

  • G1 set (also known as GR area)
    A set of up to 96 graphic characters represented by bit combinations 10/00 to 15/15.
    The G1 set shall be either a 94-character set (bit combinations 10/01 to 15/14) or a 96-character set (bit combinations 10/00 to 15/15) of graphic characters. This set is available for graphic characters in addition to those provided by the G0 set.
    Either a unique graphic character shall be allocated for each bit combination or the bit combination shall be declared unused.
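The zone boundaries listed above can be sketched in a few lines of Python. This is only an illustration of the column/row notation (for example 01/11 for ESCAPE) and of the zone limits quoted in this section; the function names bit_combination and zone are hypothetical and do not come from any of the standards.

```python
def bit_combination(byte: int) -> str:
    """Format a byte value in column/row notation, e.g. 0x1B -> '01/11'."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    column, row = divmod(byte, 16)  # high nibble = column, low nibble = row
    return f"{column:02d}/{row:02d}"


def zone(byte: int) -> str:
    """Name the zone of the 8-bit code table that a byte value falls in."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    if byte <= 0x1F:   # 00/00 to 01/15
        return "C0"
    if byte == 0x20:   # 02/00
        return "SPACE"
    if byte <= 0x7E:   # 02/01 to 07/14
        return "G0"
    if byte == 0x7F:   # 07/15
        return "DELETE"
    if byte <= 0x9F:   # 08/00 to 09/15
        return "C1"
    return "G1"        # 10/00 to 15/15


# A few sample positions: ESCAPE, SPACE, 'A', DELETE, SS2, and both ends of G1.
for value in (0x1B, 0x20, 0x41, 0x7F, 0x8E, 0xA0, 0xFF):
    print(bit_combination(value), zone(value))
```

Keeping SPACE and DELETE as separate return values mirrors the way the list above treats them as single characters outside the 94-position G0 set.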


8-bit character set Standard References

Three ISO standards deal with 8-bit character sets:

ISO/IEC 2022:1994
Character code structure and extension techniques (fourth edition).
Description
This standard specifies a structure for 7-bit and 8-bit codes that is adopted by all such codes produced under the auspices of ISO/IEC JTC1/SC2. This is the subcommittee entrusted jointly by ISO and IEC with the development of character set coding matters. This standard also specifies means by which the correspondence between bit combinations and characters may be changed during a particular instance of information interchange. This is known as code extension. It makes use of control functions that are themselves represented by bit combinations within the original code. Cf: http://www.ewos.be/tg-cs/gis2022.htm
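Code extension is easiest to see in an encoding that actually uses it. The sketch below uses ISO-2022-JP, which is not discussed on this page and is chosen here only because Python ships a codec for it (iso2022_jp). It shows ESCAPE (bit combination 01/11) introducing escape sequences that change which characters the following 7-bit combinations represent.

```python
ESC = 0x1B  # bit combination 01/11

text = "ABC 日本 xyz"

# ISO-2022-JP stays within 7 bits and switches character sets with
# escape sequences, which is the code extension technique of ISO/IEC 2022.
encoded = text.encode("iso2022_jp")
print(encoded)

# Every byte is below 0x80; the switches are visible as ESC bytes.
escape_count = encoded.count(bytes([ESC]))
print("escape sequences:", escape_count)

# Decoding interprets the same escape sequences and restores the text.
decoded = encoded.decode("iso2022_jp")
print(decoded == text)
```

A Level-1 version of ISO/IEC 4873, as used by TSCII, needs none of this machinery, because its G0 and G1 sets are invoked permanently.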

ISO/IEC 4873:1991
ISO 8-bit code for information interchange - Structure and rules for implementation (third edition).
Description
This standard specifies a structure for 8-bit codes that builds on the general structure for such codes laid down in ISO/IEC 2022. In particular the content of the GL area of the code table is fully specified and the content of the GR area is restricted to be a character set that makes use of single-byte coding (and so contains at most 96 characters). The fixed content for the GL area is the set registered in the ISO 2375 Register as ISO-IR 6. This set is also the International Reference Version (IRV) of ISO/IEC 646:1991 and is more commonly known as the ASCII character set.
Cf: http://www.ewos.be/tg-cs/gis4873.htm

ECMA-43:1991 (identical to ISO/IEC 4873:1991) specifies three nested levels of implementation:

    Level-1 comprises the following facilities:
    - a C0 set;
    - the character SPACE represented by bit combination 02/00;
    - the G0 set;
    - the character DELETE represented by bit combination 07/15;
    - a C1 set; and
    - a G1 set

    At level-1, no shift functions shall be used and the G0 and G1 sets are assumed to be invoked permanently in columns 02 to 07 and 10 to 15, respectively.

    At Level-1, the C1 set and/or the G1 set may be empty if there is no requirement for control characters in addition to those provided by the C0 set and/or graphic characters in addition to those provided by the G0 set.

    At Level-1 a version shall not include a G2 or G3 set.

    (A G2 set consists of a 94- or 96-character set of graphic characters at bit combinations 10/00 to 15/15; the characters of G2 are invoked either by the single-shift function SS2 or by the locking-shift function LS2R.
    A G3 set consists of a 94- or 96-character set of graphic characters at bit combinations 10/00 to 15/15; the characters of G3 are invoked either by the single-shift function SS3 or by the locking-shift function LS3R.)
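    Level-2 versions of the 4873 standard add a G2 and/or G3 set to the Level-1 facilities, invoked through the single-shift functions SS2 and SS3; Level-3 versions additionally permit the locking-shift functions.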

ISO/IEC 10367:1991
Standardized coded graphic character sets for use in 8-bit codes (first edition).
Description:
ISO/IEC 10367 specifies a collection of coded graphic character sets suitable for use within the structure of an 8-bit code as laid down in ISO/IEC 4873. These sets are all suitable for use as any of the code elements G1, G2 and G3 in a version of ISO/IEC 4873 at any of its three levels of implementation. The G0 code element of ISO/IEC 4873 is prescribed by that standard but is repeated for information in ISO/IEC 10367.
ISO/IEC 10367 does not specify the sets C0 and C1 of control functions that may be used in a version of ISO/IEC 4873 that conforms to ISO/IEC 10367.
cf: http://www.ewos.be/tg-cs/gis10367.htm

Websites of Standardisation Agencies


TSCII as an 8-bit Coded Character Set

  • The 8-bit bilingual, glyph-encoding-based Tamil standard code TSCII, proposed by the Internet Working Group for Tamil Standard Code, meets all the requirements for registration under the International Organization for Standardization (ISO) standard ISO/IEC 4873:1991 Level 1 specifications, as indicated below.

    Note: The Vietnamese standard code VISCII and the Russian-language code KOI8-R are examples of officially recognized 8-bit character sets that are very similar in structure to TSCII.

    The ISO/IEC guidelines for 8-bit coded character set standards, ISO/IEC 4873:1991 and ECMA-43:1991, divide the entire block of 256 glyph slots into four segments: C0 (Control-0), G0 (Graphic-0), C1 (Control-1) and G1 (Graphic-1), with explicit specifications on what can be in each of these four segments. Figure 1 shows graphically the typical composition of an 8-bit coded character set.
    [Figure 1: block content of an 8-bit coded character set]


    A brief overview of 8-bit character sets that have graphic characters in the C1 block

    In the last decade, the ISO-8859-1 (aka Latin-1) character set has been the most popular and widely used character set, and it found a larger user base when HTML 3.0 chose Latin-1 as the default standard for HTML documents distributed on the Internet. The Latin-1 set does not have any graphic characters placed in the C1 block. Much of the software written for the English-speaking and European markets is based on Latin-1 as the standard. This has led many to believe that one cannot place graphic characters in the C1 block.

    The following are examples of 8-bit character sets of major computer manufacturers that have graphic characters in the C1 block.
    MS-DOS Code pages

    • CP437 (DOSLatinUS), once used by the IBM Personal Computer
    • CP852 (DOSLatin2) for European Languages
    • CP855 (DOSCyrillic) for Cyrillic
    MS-Windows Code Pages
    • CP1252 (WinLatin1, aka Windows-1252), the Microsoft character set for Windows OS; it supersedes CP437 used earlier by IBM PCs and is a superset of the 8859-1 scheme
    • CP1250 (WinLatin2)
    • CP1251 (WinCyrillic, aka Windows-1251)
    Apple
    • MacRoman encoding
    NeXT
    • NeXTSTEP
    Hewlett-Packard
    • HP-Roman8
    KOI8-R for Russian (Cyrillic) and VISCII for Vietnamese are recent examples of 8-bit character sets with graphic characters in the C1 block that have been recognized as international standards through RFCs.
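The claim that these character sets place graphic characters in the C1 block (slots 128 to 159) can be checked for those that have a codec in the Python standard library. The sketch below is an informal survey, not a conformance test: it counts the C1 slots that decode to something other than a Unicode control character. The codec list is limited to names known to exist in Python; NeXTSTEP and VISCII are left out because no standard codec is assumed for them, and the helper name graphic_count_in_c1 is hypothetical.

```python
import unicodedata

# Python codec names for some of the character sets listed above,
# plus Latin-1 as the reference set with an empty C1 block.
CODECS = [
    "latin_1", "cp437", "cp852", "cp855",
    "cp1252", "cp1250", "cp1251", "mac_roman", "koi8_r",
]


def graphic_count_in_c1(codec: str) -> int:
    """Count slots 0x80..0x9F that decode to a non-control character."""
    count = 0
    for byte in range(0x80, 0xA0):
        try:
            char = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # slot left undefined by this codec
        if unicodedata.category(char) != "Cc":  # Cc = control character
            count += 1
    return count


for name in CODECS:
    print(f"{name:10s} {graphic_count_in_c1(name):2d} of 32 C1 slots hold graphic characters")
```

Latin-1 reports zero because its slots 128 to 159 decode to the C1 control characters, while the DOS, Windows, Macintosh and KOI8-R sets report most or all of the 32 slots as graphic.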

    Russian /Cyrillic character code set KOI8-R

    URL for KOI8-R homepage: http://www.nagual.pp.ru/~ache/koi8/main.html

    The Russian section of the Internet (the relcom.* newsgroups) has been using KOI8-R as its character encoding for discussions in Cyrillic. In view of its wide popularity, Andrei Chernov et al. formalised it as an international standard by registering the KOI8-R character set through RFC 1489. This procedure led to the establishment of KOI8-R as the de facto standard on the Internet. KOI8-R was later also assigned the code page number 878 (CP878).
    The following gif shows the content of the 128-255 slot assignments of the 8-bit character set KOI8-R:

    [Image: KOI8-R code table, slots 128-255]
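The slot assignments of the upper half of KOI8-R (slots 128 to 255) can also be regenerated from the koi8_r codec in the Python standard library. The following is a minimal sketch that assumes the codec reflects RFC 1489; the function name upper_half_table is hypothetical.

```python
def upper_half_table(codec: str = "koi8_r") -> list:
    """Return 8 strings of 16 characters each, covering slots 128..255."""
    rows = []
    for column in range(0x8, 0x10):
        start = column * 16
        row_bytes = bytes(range(start, start + 16))
        rows.append(row_bytes.decode(codec))  # KOI8-R defines every slot
    return rows


# Print one row per column of the code table, labelled with its slot range.
for column, row in zip(range(0x8, 0x10), upper_half_table()):
    start = column * 16
    print(f"{start:3d}-{start + 15:3d}  {' '.join(row)}")
```

The first two rows (slots 128 to 159) are the C1 block, which KOI8-R fills with box-drawing and other graphic characters.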


    Vietnamese Character set VISCII

    URL for VISCII homepage: http://www.vietstd.org/vietstd/index.htm

    VISCII was developed in 1993 by the Vietnamese Standardization Working Group Viet-Std@Haydn.Stanford.EDU.
    VISCII became an international standard with registration of its character set through RFC 1456.

    The following gif shows the content of the 8-bit character set VISCII:

    [Image: VISCII code table]

