Understanding Code Pages

A code page is a mapping between a set of numeric values and the written symbols they represent. It is usually defined by the vendor of an operating system (platform). Transcoding data between the client and server means mapping each graphic character on one code page to the corresponding graphic character on the other code page.
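As a minimal sketch of that mapping, the following Python example transcodes a short string from a Windows-style code page to an EBCDIC one. The helper name `transcode` is hypothetical, and Python's built-in `cp1252` and `cp037` codecs are assumed here as stand-ins for the client and server code pages; the product's own transcoding is not shown.

```python
# Sketch: transcode bytes from one code page to another.
# Assumption: Python's "cp1252" and "cp037" codecs stand in for the
# client (Windows) and server (EBCDIC) code pages.

def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes using the source code page, then re-encode for the target."""
    text = data.decode(source)   # numeric values -> written symbols
    return text.encode(target)   # written symbols -> numeric values


client_bytes = "Café".encode("cp1252")
server_bytes = transcode(client_bytes, "cp1252", "cp037")

# Same four characters, different numeric values on each code page.
print(client_bytes.hex(" "))
print(server_bytes.hex(" "))
```

The characters are unchanged by the round trip; only the numeric values that represent them differ between the two code pages.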

Code pages have the following characteristics:

Character Sets

There are two important types of computer character sets:

  • Single-Byte Character Set (SBCS) code pages are 8-bit encodings that represent scripts such as Eastern and Western European alphabets, Greek, Cyrillic (Russian), Arabic, Hebrew, and Thai.
  • Double-Byte Character Set (DBCS) code pages use 16 bits to represent each written symbol. DBCS code pages are used for the East Asian (Chinese and Japanese) scripts, which have thousands of written symbols.
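The difference in size can be illustrated with a short Python sketch that encodes one character from several scripts and reports the byte count. The codec names are Python's, chosen as assumptions for illustration: `cp932` is used as a stand-in for a Japanese DBCS code page and is not one of the code pages listed in this topic.

```python
# Sketch: byte length of one written symbol under different code pages.
# Assumption: Python codec names; "cp932" stands in for a Japanese DBCS page.
samples = {
    "cp1252": "é",   # Western European
    "cp1253": "Ω",   # Greek
    "cp1251": "Ж",   # Cyrillic
    "cp932": "漢",   # Japanese kanji
}

byte_lengths = {codec: len(ch.encode(codec)) for codec, ch in samples.items()}

for codec, length in byte_lengths.items():
    print(f"{codec}: {samples[codec]!r} -> {length} byte(s)")
```

The single-byte code pages need one byte for each symbol, while the kanji needs two.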

Defining Scripts, Languages, and Code Pages

Scripts, languages, and code pages are closely related in text handling for computer systems. This topic explains how the three terms differ.

A script is a collection of symbols that represent textual information in a writing system. These symbols can include letters of the alphabet, the numerals 0–9, punctuation marks, and mathematical symbols. The major scripts of the world include:

  • Latin (Roman letters)
  • Greek
  • Cyrillic (Russian)
  • Japanese

Written languages use symbols from a script to transcribe the spoken language. Languages with strong linguistic or historical links often make use of the same script. For example, most European languages use the Latin script, as does English. The set of English characters is generally referred to as the ASCII character set.

However, other European languages have additional letters, referred to as national characters, which are not found in English. Examples of these national characters are the German umlauts (Ä/ä, Ö/ö, Ü/ü) and French accented characters (é, à, â).
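A small Python sketch can make the distinction concrete: national characters cannot be encoded as ASCII, but they can be encoded on a multi-language code page. The helper name `fits_ascii` is hypothetical, and Python's `cp1252` codec is assumed as the example code page.

```python
# Sketch: national characters fall outside ASCII but fit a Latin code page.
# Assumption: Python's "ascii" and "cp1252" codecs are used for illustration.

def fits_ascii(text: str) -> bool:
    """Return True if every character in text is in the ASCII character set."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


for word in ["Munich", "München", "déjà"]:
    print(word, fits_ascii(word), word.encode("cp1252").hex(" "))
```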

In a similar way, Japanese makes extensive use of kanji, the Japanese forms of Chinese characters. However, many Chinese characters are not part of written Japanese, and some Japanese kanji are not found in the Chinese written language.

A code page assigns numeric values to a set of written symbols. Historically, the first code pages were for a single country or language. Recently, code pages have been designed to handle many languages using the same script. Examples of multi-language code pages include the almost identical Microsoft Windows 1252 and UNIX ISO 8859-1 code pages. These pages support almost all North American, South American, and Western European languages that use the Latin script.
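To see what "almost identical" means in practice, the following Python sketch decodes every byte value under both code pages and lists the values that produce different results. It assumes Python's `cp1252` and `latin-1` codecs as stand-ins for Windows 1252 and ISO 8859-1; the helper name `differing_bytes` is hypothetical.

```python
# Sketch: find the byte values that two single-byte code pages map differently.
# Assumption: Python's "cp1252" and "latin-1" codecs represent
# Windows 1252 and ISO 8859-1.

def differing_bytes(page_a: str, page_b: str) -> list:
    """Return the byte values (0-255) that decode differently on the two pages."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" turns bytes that a page leaves undefined
        # into U+FFFD instead of raising an exception.
        if raw.decode(page_a, errors="replace") != raw.decode(page_b, errors="replace"):
            diffs.append(value)
    return diffs


diffs = differing_bytes("cp1252", "latin-1")
print(f"{len(diffs)} of 256 byte values differ")
print(f"range: 0x{diffs[0]:02X} to 0x{diffs[-1]:02X}")
```

Under these codecs the two pages differ only in the block from 0x80 to 0x9F, where Windows 1252 places printable characters such as the euro sign and ISO 8859-1 has control codes. All letters and national characters share the same values.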

In keeping with the trend toward designing code pages for multiple languages, Information Builders has developed its own code page 1252, which handles all major North American, South American, and Western European languages (except Greek) for Windows and UNIX. Code page 1252 is functionally equivalent to Microsoft Windows 1252 and UNIX ISO 8859-1 code pages.

Reference: Information Builders Key Reporting Server Code Pages

The following table describes the key code pages used by Information Builders products.

Language                              Windows          UNIX, Linux      z/OS (PDS and HFS Deployments), IBM i
------------------------------------  ---------------  ---------------  --------------------------------------
English                               1252 (default)   1252 (default)   37 (default)
Western European                      1252             1252             37 or a code page dependent on country
Central European (Polish and Czech)   1250             1250             870
Turkish                               1254             1254             1026
Lithuanian                            1257             1257             1112
Latvian                               1257             1257             1112
Estonian                              1257             1257             1112
Traditional Chinese                   10948            10948            937
Japanese                              942              10942/942        939/930
Hebrew                                1255             1255             424
Unicode                               65001            65001            65002

Note:

  • OS/390, z/OS, MVS, VM, and IBM i are all IBM operating systems and share the same family of EBCDIC code pages.
  • VM applies to an iWay Subserver for VM connected to a WebFOCUS Reporting Server.
  • OpenVMS applies to an iWay Subserver for OpenVMS connected to a WebFOCUS Reporting Server.
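
The table above can be sketched as a simple lookup in Python. This is an illustration only: the names `CODE_PAGES`, `PLATFORMS`, and `code_page_for` are hypothetical and not part of any product API, and the sketch assumes that the three values in each row correspond to Windows, UNIX/Linux, and z/OS/IBM i, in that order.

```python
# Sketch: the key code pages as a lookup table.
# Assumption: each row lists values for Windows, UNIX/Linux, and
# z/OS/IBM i in that order. All names here are hypothetical.
PLATFORMS = ("windows", "unix_linux", "zos_ibmi")

CODE_PAGES = {
    "English": ("1252", "1252", "37"),
    "Western European": ("1252", "1252", "37 or country-dependent"),
    "Central European": ("1250", "1250", "870"),
    "Turkish": ("1254", "1254", "1026"),
    "Lithuanian": ("1257", "1257", "1112"),
    "Latvian": ("1257", "1257", "1112"),
    "Estonian": ("1257", "1257", "1112"),
    "Traditional Chinese": ("10948", "10948", "937"),
    "Japanese": ("942", "10942/942", "939/930"),
    "Hebrew": ("1255", "1255", "424"),
    "Unicode": ("65001", "65001", "65002"),
}


def code_page_for(language: str, platform: str) -> str:
    """Look up the code page entry for a language on a platform group."""
    return CODE_PAGES[language][PLATFORMS.index(platform)]


print(code_page_for("Turkish", "zos_ibmi"))
print(code_page_for("Japanese", "unix_linux"))
```

Values are kept as strings because some table cells list more than one code page.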
