LZWL is a syllable-based variant of the character-based
LZWL can work with syllables obtained by all algorithms of decomposition into syllables. This algorithm can be used for words too.
According Compact Oxford English Dictionary syllable is defined as: ‘A unit of pronunciation having one vowel sound, with or without surrounding consonants, and forming all or part of a word.’
As the decomposition to syllables is used in data compression, it is not necessary to decompose words into syllables always correctly.
Algorithm LZWL can work with syllables obtained by all algorithms of decomposition into syllables. This algorithm can be used for words too.
In the initialization step the dictionary is filled up with all characters from the alphabet. In each next step it is searched for the maximal string S, which is from the dictionary and matches the prefix of the still non-coded part of the input. The number of phrase S is sent to the output. A new phrase is added to the dictionary. This phrase is created by concatenation of string S and the character that follows S in file. The actual input position is moved forward by the length of S.Decoding has only one situation for solving. We can receive the number of phrase, which is not from the dictionary. In this case we can create that phrase by concatenation of the last added phrase with its first character.
The syllable-based version works over an alphabet of syllables. In the initialization step we add to the dictionary the empty syllable and small syllables from a database of frequent syllables. Finding string S and coding its number is similar to the character-based version, except that string S is a string of syllables. The number of phrase S is encoded to the output. The string S can be the empty syllable.
If S is the empty syllable, then we must get from the file one syllable called K and encode K by methods for coding new syllables. Syllable K is added to the dictionary. The position in the file is moved forward by the length of S. In the case when S is the empty syllable, the input position is moved forward by the length of K.
In adding a phrase to the dictionary there is a difference to the character-based version. The phrase from the next step will be called S1. If S and S1 are both non-empty syllables, then we add a new phrase to the dictionary. The new phrase is created by the concatenation of S1 with the first syllable of S. This solution has two advantages: The first is that strings are not created from syllables that appear only once. The second advantage is that we cannot receive in decoder number of phrase that is not from dictionary.
* [http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-176/paper5.pdf Detailed description]
Wikimedia Foundation. 2010.
Look at other dictionaries:
Kolmogorov complexity — In algorithmic information theory (a subfield of computer science), the Kolmogorov complexity of an object, such as a piece of text, is a measure of the computational resources needed to specify the object. It is named after Soviet Russian… … Wikipedia
Codec — This article is about encoding and decoding a digital data stream. For other uses, see Codec (disambiguation). Further information: List of codecs A codec is a device or computer program capable of encoding and/or decoding a digital data stream… … Wikipedia
Data compression — Source coding redirects here. For the term in computer programming, see Source code. In computer science and information theory, data compression, source coding or bit rate reduction is the process of encoding information using fewer bits than… … Wikipedia
Huffman coding — Huffman tree generated from the exact frequencies of the text this is an example of a huffman tree . The frequencies and codes of each character are below. Encoding the sentence with this code requires 135 bits, as opposed of 288 bits if 36… … Wikipedia
Information theory — Not to be confused with Information science. Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental… … Wikipedia
Lossless data compression — is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. The term lossless is in contrast to lossy data compression, which only allows an approximation of the original data to be… … Wikipedia
MPEG-1 — Moving Picture Experts Group Phase 1 (MPEG 1) Filename extension .mpg, .mpeg, .mp1, .mp2, .mp3, .m1v, .m1a, .m2a, .mpa, .mpv Internet media type audio/mpeg, video/mpeg Developed by ISO, IEC Type of format audio, vid … Wikipedia
Nyquist–Shannon sampling theorem — Fig.1: Hypothetical spectrum of a bandlimited signal as a function of frequency The Nyquist–Shannon sampling theorem, after Harry Nyquist and Claude Shannon, is a fundamental result in the field of information theory, in particular… … Wikipedia
A-law algorithm — Graph of μ law A law algorithms An A law algorithm is a standard companding algorithm, used in European digital communications systems to optimize, i.e., modify, the dynamic range of an analog signal for digitizing. It is similar to the μ law… … Wikipedia
Entropy encoding — In information theory an entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium. One of the main types of entropy coding creates and assigns a unique prefix free code to each… … Wikipedia