- Redundancy (information theory)
information theoryis the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data. Data compressionis a way to reduce or eliminate unwanted redundancy, while checksums are a way of adding desired redundancy for purposes of error detectionwhen communicating over a noisy channel of limited capacity.
In describing the redundancy of raw data, recall that the rate of a source of information is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the most general case of a stochastic process, it is
the limit, as "n" goes to infinity, of the
joint entropyof the first "n" symbols divided by "n". It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a memoryless source is simply , since by definition there is no interdependence of the successive messages of a memoryless source.
The absolute rate of a language or source is simply
logarithmof the cardinalityof the message space, or alphabet. (This formula is sometimes called the Hartley function.) This is the maximum possible rate of information that can be transmitted with that alphabet. (The logarithm should be taken to a base appropriate for the unit of measurement in use.) The absolute rate is equal to the actual rate if the source is memoryless and has a uniform distribution.
The absolute redundancy can then be defined as
the difference between the absolute rate and the rate.
The quantity is called the relative redundancy and gives the maximum possible
data compression ratio, when expressed as the percentage by which a file size can be decreased. (When expressed as a ratio of original file size to compressed file size, the quantity gives the maximum compression ratio that can be achieved.) Complementary to the concept of relative redundancy is efficiency, defined as so that . A memoryless source with a uniform distribution has zero redundancy (and thus 100% efficiency), and cannot be compressed.
Other notions of redundancy
A measure of "redundancy" between two variables is the
mutual informationor a normalized variant. A measure of redundancy among many variables is given by the total correlation.
Redundancy of compressed data refers to the difference between the expected compressed data length of messages (or expected data rate ) and the entropy (or entropy rate ). (Here we assume the data is ergodic and
stationary, e.g., a memoryless source.) Although the rate difference can be arbitrarily small as increased, the actual difference , cannot, although it can be theoretically upper-bounded by 1 in the case of finite-entropy memoryless sources.
Source coding theorem
* Fazlollah M. Reza. "An Introduction to Information Theory". New York: McGraw-Hill 1961. New York: Dover 1994. ISBN 0-486-68210-2
* B. Schneier, "Applied Cryptography: Protocols, Algorithms, and Source Code in C". New York: John Wiley & Sons, Inc. 1996. ISBN 0-471-12845-7
Wikimedia Foundation. 2010.
Look at other dictionaries:
Information theory — Not to be confused with Information science. Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental… … Wikipedia
information theory — the mathematical theory concerned with the content, transmission, storage, and retrieval of information, usually in the form of messages or data, and esp. by means of computers. [1945 50] * * * ▪ mathematics Introduction a mathematical… … Universalium
information theory — Synonyms and related words: EDP, automatic electronics, autonetics, bionics, bit, channel, circuit analysis, communication engineering, communication explosion, communication technology, communication theory, communications, communications… … Moby Thesaurus
Entropy (information theory) — In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the information… … Wikipedia
History of information theory — The decisive event which established the discipline of information theory, and brought it to immediate worldwide attention, was the publication of Claude E. Shannon s classic paper A Mathematical Theory of Communication in the Bell System… … Wikipedia
List of information theory topics — This is a list of information theory topics, by Wikipedia page.*A Mathematical Theory of Communication *algorithmic information theory *arithmetic encoding *channel capacity *Communication Theory of Secrecy Systems *conditional entropy… … Wikipedia
Redundancy — may refer to: Redundancy (engineering) Redundancy (information theory) Redundancy (language) Redundancy (total quality management) Redundancy (user interfaces) Data redundancy Gene redundancy Logic redundancy Redundant acronym syndrome syndrome… … Wikipedia
redundancy — Synonyms and related words: EDP, abundance, amplitude, avalanche, battology, bedizenment, bit, channel, circumambages, circumbendibus, circumlocution, cloud of words, communication explosion, communication theory, copiousness, data retrieval,… … Moby Thesaurus
Redundancy theory of truth — According to the redundancy theory of truth, or the disquotational theory of truth, asserting that a statement is true is completely equivalent to asserting the statement itself. For example, asserting the sentence Snow is white is true is … Wikipedia
Theory of the firm — The theory of the firm consists of a number of economic theories that describe the nature of the firm, company, or corporation, including its existence, behavior, structure, and relationship to the market. Contents 1 Overview 2 Background … Wikipedia