0.10 Data Compression Algorithms
The goal of data compression is to represent a some set of information
as space efficiently as possible. A data compression code is a
mapping between some set of source messages and a set of codewords. A source message does not have to be and is not usually
an entire string being compressed. Rather, it is the set of symbols
or strings into which the data being compressed is partitioned for
processing. These basic units may be single symbols from the source
string's alphabet, or they may be strings of such symbols. The
process of converting from a source stream into a coded (hopefully
compressed) message is called encoding while the inverse
operation is called decoding. A lossless encoding method
is one in which the process of decompression results in no loss of
original data whereas lossy encoding method is one in which the
original data cannot be fully recovered.
Codes can be of the types block-block or variable-variable. Codes of the block-block variety operate on
static, fixed-length codewords and source messages while their
counterparts operate on dynamic length codewords and source messages.
One example of a block-block type code is the ASCII code. It is
of the block-block variety because it maps fixed-length source
messages (characters) into fixed-length codewords (their ASCII
Because variable-variable type codes produce codewords that do not
have a fixed length, when processing the output of a variable-variable
code it is vital to be able to differentiate between codewords in the
stream. Fixed length codewords are easy to distinguish due to their
regular spacing pattern. However, we do not have this luxury when
dealing with variable-variable codes.
The sequence of codewords or source messages in a stream
is called an ensemble.
A coding function is called distinct if its mapping from source
messages to codewords is one-to-one. Such a code is called uniquely decodable if every codeword is recognizable even when
immersed in a stream of other codewords. A uniquely decodable code is
known as a prefix code if it has the property that no codeword
in the code is a
prefix of any other codeword.
Data compression schemes are said to be either static or dynamic (or adaptive). A static function is one in which the
mapping from the input source messages to the set of codewords is
fixed before the data compression begins. In such a system a given
message is always represented by the same codeword regardless of where
it appears in the ensemble. In contrast, a dynamic (or adaptive)
algorithm may change the mapping for a particular source message
during the compression process.