Binary-to-text encoding

A binary-to-text encoding is an encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of ASCII-printable characters. These encodings are necessary for transmitting data when the channel or protocol only allows ASCII-printable characters, as with e-mail or Usenet. PGP documentation (RFC 2440) uses the term ASCII armor for binary-to-text encoding when referring to Radix-64.

Description

The ASCII text-encoding standard uses 128 unique values (0–127) to represent the alphabetic, numeric, and punctuation characters commonly used in English, plus a selection of 'control codes' which do not represent printable characters. For example, the capital letter "A" is ASCII character 65, the numeral "2" is ASCII 50, the character "}" is ASCII 125, and the control character "carriage return" is ASCII 13. Systems based on ASCII use seven bits to represent these values digitally.
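The values quoted above can be checked with a short Python sketch. It uses only the built-in ord() function; the variable names are illustrative.

```python
# Code points quoted in the text, looked up with Python's built-in ord().
examples = {"A": 65, "2": 50, "}": 125, "\r": 13}

# Map each character to the code point Python reports for it.
actual = {char: ord(char) for char in examples}

# Every ASCII value is below 2**7, so it fits in seven bits.
fits_in_seven_bits = all(code < 2**7 for code in actual.values())
```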

In contrast, most computers store data in memory organized in eight-bit bytes. Files that contain machine-executable code and non-textual data typically contain all 256 possible eight-bit byte values. Many computer programs came to rely on this distinction between seven-bit "text" and eight-bit "binary" data, and would not function properly if non-ASCII characters appeared in data that was expected to include only ASCII text. For example, if the value of the eighth bit is not preserved, the program might interpret a byte value above 127 as a flag telling it to perform some function.

It is often desirable, however, to be able to send non-textual data through text-based systems, such as when one attaches an image file to an e-mail message. To accomplish this, the eight-bit data is encoded into seven-bit ASCII characters (generally using only alphanumeric and punctuation characters, that is, the ASCII printable characters). Upon safe arrival at its destination, it is decoded back to its eight-bit form. This process is referred to as binary-to-text encoding. Many programs perform this conversion to allow for data transport, such as PGP and GNU Privacy Guard (GPG).
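The round trip described above can be illustrated with a minimal Python sketch. The function seven_bit_channel is a hypothetical stand-in for a transport that discards the eighth bit of every byte, and the payload is arbitrary; base64 from the standard library is used as one possible encoding.

```python
import base64


def seven_bit_channel(data: bytes) -> bytes:
    """Hypothetical channel that clears the eighth bit of every byte."""
    return bytes(b & 0x7F for b in data)


payload = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00, 0x80])  # arbitrary binary data

# Sent raw, every byte above 127 is damaged by the channel.
corrupted = seven_bit_channel(payload)

# Sent as base64 text, all bytes are already below 128, so nothing is lost.
encoded = base64.b64encode(payload)
received = seven_bit_channel(encoded)
decoded = base64.b64decode(received)
```

In this sketch, corrupted differs from payload, while decoded is identical to it.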

Encoding plain text

Although all binary-to-text encoding methods are useful for transmitting non-textual data through text-based systems, they are also used as a mechanism for encoding plain text. Some systems have a more limited character set they can handle; not only are they not 8-bit clean, some cannot even handle every printable ASCII character. Others have limits on the number of characters that may appear between line breaks. Still others add headers or trailers to the text. And a few poorly regarded but still-used protocols use in-band signaling, causing confusion if specific patterns appear in the message. The best-known is the string "From " (including the trailing space) at the beginning of a line, which is used to separate mail messages in the mbox file format.

By using a binary-to-text encoding on messages that are already plain text, then decoding on the other end, one can make such systems appear to be completely transparent. This is sometimes referred to as 'ASCII armoring'. For example, the ViewState component of ASP.NET uses base64 encoding to safely transmit text via HTTP POST.
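As a rough illustration of armoring plain text, the following Python sketch encodes a message containing a line that begins with "From ", the mbox separator pattern. The message is invented for the example, and base64 is only one of the encodings that could be used this way.

```python
import base64

# Invented message whose second line starts with the mbox separator "From ".
message = "Hello,\nFrom now on the meeting starts at nine.\n"

# encodebytes() wraps the base64 output into short lines.
armored = base64.encodebytes(message.encode("ascii")).decode("ascii")

# base64 output never contains a space, so no line can start with "From ".
has_from_line = any(line.startswith("From ") for line in armored.splitlines())

# Decoding on the other end recovers the original text exactly.
restored = base64.decodebytes(armored.encode("ascii")).decode("ascii")
```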

Encoding standards

The most commonly used binary-to-text encodings are:

* hexadecimal
* base64
* quoted-printable
* uuencoding
* yEnc
* Ascii85
* BinHex
* Percent encoding
* Radix-64

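The 94 codes from 33 to 126, those for which the C library function isgraph returns true, are the visible ASCII printable characters. The space (code 32) is also printable, but it is not counted among them.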

Some older formats that are uncommon today include BOO, BTOA, and USR encoding. A newer, unstandardized encoding method is basE91 (http://base91.sourceforge.net/), which produces the shortest plain ASCII output for compressed 8-bit binary input.

Most of these encodings generate text containing only a subset of all ASCII printable characters: for example, the base64 encoding generates text that contains only upper-case and lower-case letters (A–Z, a–z), numerals (0–9), and the "+", "/", and "=" symbols.
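The base64 character subset can be checked empirically. The sketch below, which uses Python's standard base64 module, encodes all 256 byte values and collects the characters that appear in the output; the variable names are illustrative.

```python
import base64
import string

# The 65 characters the text lists for base64: letters, digits, "+", "/", "=".
allowed = set(string.ascii_letters + string.digits + "+/=")

# Encode every possible byte value once and collect the characters used.
encoded = base64.b64encode(bytes(range(256))).decode("ascii")
used = set(encoded)

# True if the output stays inside the allowed subset.
only_allowed = used <= allowed
```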

Some of these encodings (quoted-printable and percent encoding) are based on a set of allowed characters and a single escape character. The allowed characters are left unchanged, while all other characters are converted into a string starting with the escape character. This kind of conversion leaves the resulting text almost readable, because letters and digits are among the allowed characters and therefore appear unchanged in the encoded text. These encodings produce the shortest plain ASCII output for input that is mostly printable ASCII.
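Both escape-based encodings are available in Python's standard library, which allows a brief sketch: quopri implements quoted-printable (escape character "=") and urllib.parse.quote implements percent encoding (escape character "%"). The sample string is invented for the example.

```python
import quopri
from urllib.parse import quote, unquote

sample = "café = 100%"  # invented sample: mostly ASCII, one non-ASCII letter

# Quoted-printable: allowed characters pass through, others become "=XX".
qp = quopri.encodestring(sample.encode("utf-8"))
qp_back = quopri.decodestring(qp).decode("utf-8")

# Percent encoding: allowed characters pass through, others become "%XX".
pct = quote(sample)
pct_back = unquote(pct)
```

In both outputs the letters "caf" and the digits "100" remain readable; only the accented letter and the reserved symbols are escaped.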

Some other encodings (base64, uuencoding) are based on mapping all possible sequences of six bits onto different printable characters. This is possible because there are more than 2^6 = 64 printable characters. A given sequence of bytes is translated by viewing it as a stream of bits, breaking this stream into chunks of six bits, and generating the sequence of corresponding characters. The encodings differ in the mapping between bit sequences and characters and in how the resulting text is formatted.
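The six-bit chunking can be sketched by hand in Python. The function six_bit_encode below is a hypothetical, deliberately simple implementation that uses the standard base64 alphabet and "=" padding; it is written for clarity rather than speed, and real code would call the base64 module instead.

```python
import base64

# Standard base64 alphabet: index 0-63 maps to one printable character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def six_bit_encode(data: bytes) -> str:
    # View the input as one long string of bits.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits so the length is a multiple of six.
    bits += "0" * (-len(bits) % 6)
    # Map each six-bit chunk to one character of the alphabet.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # base64 pads the text with "=" to a multiple of four characters.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)


# Compare against the standard library on a few inputs.
samples = [b"", b"M", b"Ma", b"Man", bytes(range(256))]
matches = all(
    six_bit_encode(s) == base64.b64encode(s).decode("ascii") for s in samples
)
```

Swapping ALPHABET for a different 64-character table, and changing the padding and line formatting, is essentially what distinguishes one six-bit encoding from another.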

Some encodings (the original version of BinHex and the recommended encoding for CipherSaber) use four bits instead of six, mapping all possible sequences of 4 bits onto the 16 standard hexadecimal digits. Using 4 bits per encoded character makes the output 50% longer than base64, but simplifies encoding and decoding: expanding each source byte independently into two encoded characters is simpler than base64's expansion of 3 source bytes into 4 encoded characters.
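The size difference is easy to check with a small Python sketch. The 48-byte input is arbitrary; its length is a multiple of 3 so that base64 needs no padding and the ratio comes out exact.

```python
import base64

data = bytes(range(48))  # arbitrary input, length is a multiple of 3

# Hexadecimal: each byte expands independently into two characters.
per_byte = [f"{b:02x}" for b in data]
hex_text = data.hex()

# base64: every 3 source bytes expand into 4 characters.
b64_text = base64.b64encode(data).decode("ascii")

# Ratio of hexadecimal length to base64 length.
ratio = len(hex_text) / len(b64_text)
```

Here the hexadecimal text is 96 characters and the base64 text is 64 characters, a ratio of 1.5.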


Wikimedia Foundation. 2010.
