EBCDIC 930

CCSID 930 (sometimes known as CP930 or code page 930) is one of several Japanese EBCDIC code pages created by IBM for the representation of Japanese text. It is commonly used on IBM z/OS and IBM System i systems.

It encodes halfwidth Katakana, fullwidth Katakana, Hiragana and Kanji.

Technical detail

CCSID 930 uses a stateful EBCDIC encoding scheme: halfwidth Katakana are encoded in 1 byte and all other Japanese characters in 2 bytes. The single-byte portion is CCSID 290, also known as EBCDIK (Extended Binary Coded Decimal Interchange Kana). The double-byte portion is CCSID 300, which is shared with CCSID 939.[1][2] If only halfwidth Katakana mixed with Latin characters are used, which was standard practice until the 1980s, CCSID 930 can be treated as a pure 8-bit encoding. When other Japanese or fullwidth characters are used, it is a multibyte encoding in which the Shift-Out byte (0x0E) marks the start of a double-byte sequence and the Shift-In byte (0x0F) marks its end.
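The stateful scheme can be illustrated with a minimal Python sketch that splits a mixed byte stream into single-byte and double-byte runs. It assumes the usual EBCDIC assignment of Shift-Out to 0x0E and Shift-In to 0x0F, and that every stream starts in single-byte mode. The function name `split_segments` and the sample byte values are hypothetical placeholders, not real CCSID 290 or CCSID 300 code points. The sketch does no character lookup.

```python
SO = 0x0E  # Shift-Out: switch to double-byte mode (assumed EBCDIC value)
SI = 0x0F  # Shift-In: return to single-byte mode (assumed EBCDIC value)


def split_segments(data: bytes) -> list[tuple[str, bytes]]:
    """Split a mixed stream into ("sbcs", run) and ("dbcs", run) pieces.

    The shift bytes themselves are dropped from the output.
    """
    segments = []
    mode = "sbcs"  # assumption: a stream always starts in single-byte mode
    run = bytearray()
    for byte in data:
        if byte == SO and mode == "sbcs":
            new_mode = "dbcs"
        elif byte == SI and mode == "dbcs":
            new_mode = "sbcs"
        else:
            run.append(byte)  # ordinary data byte in the current mode
            continue
        if run:
            segments.append((mode, bytes(run)))
        run = bytearray()
        mode = new_mode
    if mode == "dbcs":
        raise ValueError("stream ends inside a double-byte run (missing Shift-In)")
    if run:
        segments.append((mode, bytes(run)))
    for kind, chunk in segments:
        if kind == "dbcs" and len(chunk) % 2:
            raise ValueError("double-byte run has an odd number of bytes")
    return segments


# Placeholder bytes: two single-byte codes, two double-byte codes, one single-byte code.
sample = bytes([0xC1, 0xC2, SO, 0x44, 0x81, 0x44, 0x82, SI, 0xC3])
segments = split_segments(sample)
```

Each "sbcs" run could then be decoded with a CCSID 290 table and each "dbcs" run with a CCSID 300 table, two bytes at a time.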

The most recent version of CCSID 930 (CCSID 1390) supports JIS X 0213.

It was invented by Alan Lloyd Jones at IBM Hursley Laboratories, UK.[citation needed]

Practical considerations

CCSID 930 and its encoding scheme have a number of idiosyncrasies that make working with it hard in practice (see also EBCDIC for the idiosyncrasies of EBCDIC in general):

  • Because of the Shift-Out and Shift-In codes, parsing a byte sequence from the middle is hard: interpreting a byte requires backing up until one of the shift bytes (or the start of the data) is encountered.
  • Although CCSID 930 allows halfwidth and fullwidth characters to be mixed in one text, many database schemas strictly distinguish between columns that contain only single-byte halfwidth Katakana and columns that contain only double-byte fullwidth characters. This is a convenience for software developers: it makes it easier to predict the text length that fits a given column size in bytes, and vice versa.
  • On the downside, consistency then requires that Latin text in a fullwidth column be entered as, or converted into, fullwidth alphabetic characters, so that it is encoded as double-byte characters. This matters when searching the database.
  • When database columns are implicitly defined as pure fullwidth text, the Shift-Out and Shift-In codes are often omitted, which strictly speaking results in an incorrect encoding. When the shift codes are missing, the data usually has to be converted to another character set, such as the more portable Unicode, using CCSID 290 or CCSID 300 directly instead of CCSID 930.
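Two of the points above can be sketched in Python under stated assumptions: Shift-Out is 0x0E, Shift-In is 0x0F, the stream starts in single-byte mode, and the shift bytes never occur as data bytes inside a character. The helper names `mode_at` and `wrap_dbcs` are hypothetical, and the sample bytes are placeholders rather than real code points.

```python
SO = 0x0E  # Shift-Out: start of a double-byte run (assumed EBCDIC value)
SI = 0x0F  # Shift-In: end of a double-byte run (assumed EBCDIC value)


def mode_at(data: bytes, offset: int) -> str:
    """Classify data[offset] by scanning backwards to the nearest shift byte.

    Returns "shift", "sbcs", "dbcs-lead" or "dbcs-trail".
    """
    if data[offset] in (SO, SI):
        return "shift"
    pos = offset - 1
    while pos >= 0 and data[pos] not in (SO, SI):
        pos -= 1
    if pos < 0 or data[pos] == SI:
        return "sbcs"  # no shift byte seen, or the last one ended a double-byte run
    # data[pos] is Shift-Out, so double-byte pairs start at pos + 1
    return "dbcs-lead" if (offset - pos - 1) % 2 == 0 else "dbcs-trail"


def wrap_dbcs(raw: bytes) -> bytes:
    """Add the omitted shift codes around a pure double-byte column value."""
    if len(raw) % 2:
        raise ValueError("double-byte data must have an even length")
    return bytes([SO]) + raw + bytes([SI])


sample = bytes([0xC1, 0xC2, SO, 0x44, 0x81, 0x44, 0x82, SI, 0xC3])
modes = [mode_at(sample, i) for i in range(len(sample))]
wrapped = wrap_dbcs(b"\x44\x81")
```

The backward scan costs time proportional to the distance to the previous shift byte, which is why random access into long mixed fields is awkward. Wrapping shift-less data as in `wrap_dbcs` is one alternative to converting it with CCSID 300 directly; which is appropriate depends on the converter in use.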

References

  1. ^ http://www.ibm.com/software/globalization/ccsid/ccsid930.jsp
  2. ^ http://www.ibm.com/software/globalization/ccsid/ccsid939.jsp

Wikimedia Foundation. 2010.
