# Shannon–Fano coding


In the field of data compression, Shannon-Fano coding is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured). It is suboptimal in the sense that it does not always achieve the lowest possible expected codeword length. The technique was proposed in Claude Elwood Shannon's "A Mathematical Theory of Communication", his 1948 article introducing the field of information theory, and so predates the optimal technique of Huffman coding. The method was attributed to Robert Fano, who later published it as a technical report. Shannon-Fano coding should not be confused with Shannon coding [http://ocw.mit.edu/NR/rdonlyres/Electrical-Engineering-and-Computer-Science/6-441Transmission-of-InformationSpring2003/02665B0D-F6AF-42BB-87D6-771EA9D1BF38/0/6441lecture6.pdf], the coding method used to prove Shannon's noiseless coding theorem, or with Shannon-Fano-Elias coding (also known as Elias coding) [http://ocw.mit.edu/NR/rdonlyres/Electrical-Engineering-and-Computer-Science/6-441Transmission-of-InformationSpring2003/A2E84D18-8C50-4876-9A8A-E2639DD86375/0/6441lecture7.pdf], the precursor to arithmetic coding.

In Shannon-Fano coding, the symbols are arranged in order from most probable to least probable, and then divided into two sets whose total probabilities are as close as possible to being equal. All symbols then have the first digits of their codes assigned; symbols in the first set receive "0" and symbols in the second set receive "1". As long as any sets with more than one member remain, the same process is repeated on those sets to determine successive digits of their codes. When a set has been reduced to one symbol, that symbol's code is complete and will not form the prefix of any other symbol's code.
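The prefix property mentioned above is what lets a decoder read codewords back without separators. The following is a minimal sketch of that idea, not part of the original description: the code table and the helper names `is_prefix_free` and `decode` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def is_prefix_free(codes: Dict[str, str]) -> bool:
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(codes.values())
    # In sorted order, a word that prefixes any later word also prefixes
    # its immediate successor, so checking neighbours is sufficient.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode(bits: str, codes: Dict[str, str]) -> List[str]:
    """Decode a bit string by emitting a symbol as soon as a codeword matches."""
    inverse = {word: sym for sym, word in codes.items()}
    out: List[str] = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:  # unambiguous because the code is prefix-free
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return out


# Hypothetical code table (assumption, for illustration only).
codes = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}
encoded = "".join(codes[s] for s in "ABACDE")
print(encoded)                 # 00010010110111
print(decode(encoded, codes))  # ['A', 'B', 'A', 'C', 'D', 'E']
```

Because each symbol ends up alone in its own set, the greedy match in `decode` can never stop on the wrong codeword.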

The algorithm works, and it produces fairly efficient variable-length encodings; when the two smaller sets produced by a partitioning are in fact of equal probability, the one bit of information used to distinguish them is used most efficiently. Unfortunately, Shannon-Fano does not always produce optimal prefix codes; the set of probabilities {0.35, 0.17, 0.17, 0.16, 0.15} is an example of one that will be assigned non-optimal codes by Shannon-Fano coding.
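The gap for this set of probabilities can be checked numerically. The sketch below is an illustration under stated assumptions, not a reference implementation: `huffman_lengths` is a hypothetical helper that computes Huffman codeword lengths with a heap, and the Shannon-Fano lengths are entered by hand after applying the splitting rule ({0.35, 0.17} against {0.17, 0.16, 0.15}, then 0.17 against {0.16, 0.15}).

```python
import heapq
from typing import List, Sequence


def huffman_lengths(probs: Sequence[float]) -> List[int]:
    """Return the codeword length of each symbol in a Huffman code."""
    lengths = [0] * len(probs)
    # Heap entries: (weight, unique tie-breaker, indices of symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1  # every symbol under the merged node gains one bit
        heapq.heappush(heap, (w1 + w2, counter, s1 + s2))
        counter += 1
    return lengths


def average_length(probs: Sequence[float], lengths: Sequence[int]) -> float:
    """Expected number of bits per symbol."""
    return sum(p * n for p, n in zip(probs, lengths))


probs = [0.35, 0.17, 0.17, 0.16, 0.15]
shannon_fano_lengths = [2, 2, 2, 3, 3]  # derived by hand (assumption, see above)
huffman = huffman_lengths(probs)

print(f"{average_length(probs, shannon_fano_lengths):.2f}")  # 2.31
print(huffman, f"{average_length(probs, huffman):.2f}")      # [1, 3, 3, 3, 3] 2.30
```

The difference is small (about 0.01 bit per symbol), but it shows that the top-down split can commit to a first cut that a bottom-up merge would avoid.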

For this reason, Shannon-Fano is almost never used; Huffman coding is almost as computationally simple and always produces optimal prefix codes. They are optimal, that is, under the constraint that each symbol is represented by a code formed of an integral number of bits. This constraint is often unneeded, since the codes will be packed end-to-end in long sequences. If we consider groups of codes at a time, symbol-by-symbol Huffman coding is only optimal if the probabilities of the symbols are independent and each is some power of a half, i.e., $\frac{1}{2^n}$. In most situations, arithmetic coding can produce greater overall compression than either Huffman or Shannon-Fano, since it can encode in fractional numbers of bits, which more closely approximate the actual information content of the symbol. However, arithmetic coding has not superseded Huffman the way that Huffman supersedes Shannon-Fano, both because arithmetic coding is more computationally expensive and because it is covered by multiple patents.
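To make "fractional numbers of bits" concrete, the sketch below computes the information content $-\log_2 p$ of individual symbols and the entropy of two distributions. The function names and the two example distributions are assumptions chosen for illustration; the second distribution is the one quoted earlier in the article.

```python
import math
from typing import Sequence


def information_content(p: float) -> float:
    """Information content of a symbol with probability p, in bits."""
    return -math.log2(p)


def entropy(probs: Sequence[float]) -> float:
    """Average information content per symbol, in bits."""
    return sum(p * information_content(p) for p in probs if p > 0)


dyadic = [0.5, 0.25, 0.125, 0.125]       # every probability is 1 / 2**n
skewed = [0.35, 0.17, 0.17, 0.16, 0.15]  # probabilities are not powers of a half

print([information_content(p) for p in dyadic])  # [1.0, 2.0, 3.0, 3.0]
print(entropy(dyadic))                           # 1.75
print(f"{information_content(0.35):.2f}")        # 1.51
print(f"{entropy(skewed):.2f}")                  # 2.23
```

In the dyadic case every symbol carries a whole number of bits, so a code with lengths 1, 2, 3, 3 meets the entropy exactly. In the second case the entropy is below the 2.30 bits per symbol (0.35·1 + 0.65·3) that a Huffman code with lengths 1, 3, 3, 3, 3 needs, and that gap is what a coder working with fractional bits can narrow.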

Shannon-Fano coding is used in the IMPLODE compression method, which is part of the ZIP file format. [http://www.pkware.com/documents/casestudies/APPNOTE.TXT APPNOTE.TXT - .ZIP File Format Specification] (PKWARE Inc, 2007-09-28; accessed 2008-01-06) states: "The Imploding algorithm is actually a combination of two distinct algorithms. The first algorithm compresses repeated byte sequences using a sliding dictionary. The second algorithm is used to compress the encoding of the sliding dictionary output, using multiple Shannon-Fano trees."

## The Shannon-Fano algorithm

A Shannon-Fano tree is built according to a specification designed to define an effective code table. The actual algorithm is simple:

1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol's relative frequency of occurrence is known.
2. Sort the list of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right.
3. Divide the list into two parts, with the total frequency count of the left part being as close to the total of the right part as possible.
4. The left part of the list is assigned the binary digit 0, and the right part is assigned the digit 1. This means that the codes for the symbols in the first part will all start with 0, and the codes in the second part will all start with 1.
5. Recursively apply steps 3 and 4 to each of the two parts, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree.
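The steps above can be sketched in a few lines of Python. This is one possible reading rather than a definitive implementation: the function name `shannon_fano`, the tie-breaking rule (the leftmost best cut wins), and the symbol names A to E are assumptions. The counts 15, 7, 6, 6, 5 are those used in the example calculation in this article.

```python
from typing import Dict, List, Tuple


def shannon_fano(freqs: Dict[str, float]) -> Dict[str, str]:
    """Build a prefix code by recursively halving the sorted symbol list."""
    # Step 2: sort from most to least frequent (ties keep their input order).
    items = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    codes = {sym: "" for sym, _ in items}

    def split(group: List[Tuple[str, float]]) -> None:
        if len(group) < 2:
            return  # a single symbol: its code is complete
        total = sum(w for _, w in group)
        # Step 3: find the cut whose left and right totals are closest.
        running, best_cut, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)  # |left total - right total|
            if diff < best_diff:             # assumption: leftmost best cut wins
                best_cut, best_diff = i, diff
        left, right = group[:best_cut], group[best_cut:]
        # Step 4: append 0 to the left part and 1 to the right part.
        for sym, _ in left:
            codes[sym] += "0"
        for sym, _ in right:
            codes[sym] += "1"
        # Step 5: repeat on both parts.
        split(left)
        split(right)

    split(items)
    return codes


counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}  # symbol names are placeholders
codes = shannon_fano(counts)
average = sum(counts[s] * len(codes[s]) for s in counts) / sum(counts.values())
print(codes)             # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
print(f"{average:.2f}")  # 2.28
```

Note that an alphabet with a single symbol receives the empty string as its code in this sketch; a practical encoder would need to special-case that input.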

## Example

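The example shows the construction of the Shannon-Fano code for a small alphabet. The five symbols which can be coded have the frequency counts 15, 7, 6, 6 and 5 (39 in total), resulting in an average code length of

$\frac{1\,\mathrm{Bit}\cdot 15 + 3\,\mathrm{Bit}\cdot(7+6+6+5)}{39\,\mathrm{Symbol}} \approx 2.23$ Bits per Symbol.

This figure corresponds to a 1-bit codeword for the most frequent symbol and 3-bit codewords for the other four. Splitting the sorted counts as evenly as possible, as step 3 of the algorithm prescribes (15 + 7 = 22 against 6 + 6 + 5 = 17), would instead give codeword lengths of 2, 2, 2, 3 and 3 bits, that is, $\frac{2\cdot(15+7+6) + 3\cdot(6+5)}{39} \approx 2.28$ bits per symbol.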

## See also

* Huffman coding
* Modified Huffman coding - used in fax machines
* Data compression

* [http://www.binaryessence.com/dct/en000041.htm Shannon–Fano at Binary Essence]

Wikimedia Foundation. 2010.
