Comparison of Lossless Image Compression Techniques based on Context Modeling

Master's Thesis, 2014

79 Pages









1.1 Data compression
1.2 Basic concepts in image compression
1.2.1 Lossless and lossy cases
1.2.2 Measures of compression
1.2.3 Paradigm of compression
1.2.4 Arithmetic coding
1.3 Thesis objectives
1.4 Contents of thesis' chapters

2.1 Fundamentals
2.2 Context Tree
2.2.1 Structure of context tree
2.2.2 Static and semi-adaptive approaches
2.2.3 Construction of an initial context tree
2.2.4 Pruning of context tree

3.1 JPEG2000 standard
3.2.1 CALIC Frame Structure
3.2.2 Neighborhood pixels involved
i. Contexts Formation
ii. Error Feedback
3.2.6 Entropy coding of prediction errors: error sign flipping, error remapping, histogram tail truncation
A. Proposed modification
3.2.7 Continuous-tone mode results
A. Proposed modification
3.3 General Context Tree based on Intensity
3.3.1 Context template
3.3.2 Compression scheme
3.3.3 GCT-I Simulation results

4.1 Conclusions
4.2 Suggestion for Future Work





Table 3-1 Comparison results of the modified CALIC and the original.

Table 3-2 Test sets information

Table 3-3 Bit-rate of compression results on natural smooth images (bits per pixel).

Table 3-4 Bit-rate of compression results on medical images (bits per pixel) of modified CALIC in continuous-tone mode.

Table 3-5 The compression efficiency (bits per pixel).

Table 3-6 Modified CALIC Bit-rate of compression results on medical images (bits per pixel) after eliminating feedback mechanism

Table 3-7 GCT-I Bit-rate of compression results on medical images (bits per pixel).

Table 3-8 Comparison between the important properties of the lossless algorithms in consideration (JPEG2000, CALIC and GCT-I)


Figure 1-1 Principle compression scheme based on concept “universal modeling and coding”

Figure 2-1 Example of context template with size four

Figure 2-2 An example of context tree with depth of

Figure 2-3 Context template with size 4.

Figure 3-1 The JPEG2000 encoding process.

Figure 3-2 The DWT principle scheme.

Figure 3-3 DWT process for the Y component.

Figure 3-4 Template for context selection

Figure 3-5 The JPEG2000 decoding process.

Figure 3-6 CALIC Frame Structure

Figure 3-7 Causal template for adjacent pixels in prediction and modeling (Neighborhood pixels involved).

Figure 3-8 Convexity relationship of Î and neighborhood pixels

Figure 3-9 Two-stage adaptive prediction

Figure 3-10 Reduce conditional entropy by Sign Flipping.

Figure 3-11 Original Lena Image and its Histogram.

Figure 3-12 Prediction Error histogram of Different Predictors

(b) Zero-Holder Predictor of H

(c) MED Predictor of H

(d) GAP Predictor of H

Figure 3-13 Conditional GAP Error.

Figure 3-14 Feedback adjusted Version Error

Figure 3-15 Error Sign Flipping and Remapping.

Figure 3-16 Histogram Truncation Threshold

Figure 3-17 Feedback mechanism elimination of CALIC

Figure 3-18 The Overall modified CALIC system

Figure 3-19 Default location and order of the neighboring pixels with maximum depth

Figure 3-20 The principle scheme of the GCT-I algorithm

Figure 3-21 Intensity histogram of blood (medical image)

Figure 3-22 Intensity histogram of Lena (natural smooth image)

Appendix 1 (Original image test set. Set of natural smooth grayscale images)

Appendix 2 (Set of Medical images)




Most importantly, I am ever grateful to GOD, the major source of strength while I worked on my M.Sc. thesis research; it is under His grace that we live and learn.

I would like to acknowledge the support of all members of my family, my parents, my wife, and my daughter "Ratil", for their support throughout my life.

Also, I would like to express my sincere gratitude to my advisor, Prof. Dr. Amin Mohamed Nassar for his guidance and advice.

Last but not least, I would like to express my appreciation for the help offered by the staff of the Electronics and Electrical Communications Department and the library staff, Faculty of Engineering, Cairo University.


In this thesis, various methods for the lossless compression of image data are analyzed and discussed. The main focus of this work is on lossless compression algorithms based on context modeling using a tree structure.

We compare the CALIC and GCT-I algorithms to the JPEG2000 standard algorithm, which is taken as the reference for comparison.

This work includes research on how to modify the CALIC algorithm in continuous-tone mode by truncating the tails of the error histogram, which may improve CALIC's compression performance.

We also propose a modification to CALIC in binary mode that eliminates the error feedback mechanism. When a pixel to be encoded has a grey level different from all of its neighboring pixels, CALIC triggers an escape sequence that switches the algorithm from binary mode to continuous-tone mode; the pixel is then treated as if it lay in a continuous-tone region. This minor modification should improve CALIC's performance on binary images.

Finally, we discuss GCT-I on medical images and compare the results to the JPEG2000 standard.


1.1 Data compression

In computer science and information theory, data compression or source coding is the process of encoding information using fewer bits than an unencoded representation would use, through the use of specific encoding schemes. For example, any article could be encoded with fewer bits if one were to accept the convention that the word "compression" be encoded as "comp". One popular instance of compression that many computer users are familiar with is the ZIP file format, which, as well as providing compression, acts as an archiver, storing many files in a single output file.

As is the case with any form of communication, compressed data communication only works when both the sender and receiver of the information understand the encoding scheme. For example, this text makes sense only if the receiver understands that it is intended to be interpreted as characters representing the English language. Similarly, compressed data can only be understood if the decoding method is known by the receiver. Some compression algorithms exploit this property in order to encrypt data during the compression process so that decompression can only be achieved by an authorized party (e.g. through the use of a password).

Compression is useful because it helps reduce the consumption of expensive resources, such as disk space or transmission bandwidth. On the downside, compressed data must be decompressed to be viewed (or heard), and this extra processing may be detrimental to some applications. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed while it is being decompressed (decompressing the video in full before watching it is always possible, but this is inconvenient and requires storage space for the uncompressed video). The design of data compression schemes therefore involves trade-offs between various factors, including the degree of compression, the amount of distortion introduced (if using a lossy compression scheme), and the computational resources required to compress and decompress the data.

1.2 Basic concepts in image compression

1.2.1 Lossless and lossy cases

The task of compression is to encode the image data into a compact form, minimizing both the number of bits in the representation and the distortion caused by the compression.

Lossless image compression algorithms are generally used for images that are documents and when lossy compression is not applicable. Lossless algorithms are especially important for systems transmitting and archiving medical data, because lossy compression of medical images used for diagnostic purposes is, in many countries, forbidden by law.

In such a system, the images or volume slices are stored in memory, since mass storage turns out to be too slow; a fast lossless image compression algorithm could virtually increase the memory capacity, allowing larger sets of data to be processed.

An image may be defined as a rectangular array of pixels. The pixel of a grayscale image is a nonnegative integer interpreted as the intensity (brightness, luminosity) of the image. When image pixel intensities are in the range [0, 2N - 1], then we say that the image is of N bit depth, or that it is an N -bit image. Typical grayscale images are of bit depths from 8 to 16 bits.

Grayscale image compression algorithms are used as a basis for color image compression algorithms, and for algorithms compressing 2-dimensional data other than images that is characterized by a specific smoothness.

These algorithms are also used for volumetric 3-dimensional data. Sometimes such data, viewed as a set of 2-dimensional images, is compressed using regular image compression algorithms. Other possibilities include preprocessing volumetric data before compressing it as a set of 2-dimensional images, or using algorithms designed exclusively for volumetric data; the latter are usually derived from regular image compression algorithms.

We could use a universal algorithm to compress images, i.e., we could simply encode a sequence of image pixels extracted from an image in the raster scan order.

For a universal algorithm such a sequence is hard to compress. Universal algorithms are usually designed for alphabet sizes not exceeding 2^8 and do not directly exploit the following image data features: images are 2-dimensional data, intensities of neighboring pixels are highly correlated, and images contain noise added during the acquisition process; the latter feature makes dictionary compression algorithms perform worse than statistical ones for image data [1].

Modern grayscale image compression algorithms employ techniques used in universal statistical compression algorithms. However, prior to statistical modeling and entropy coding the image data is transformed to make it easier to compress.

The fundamental principle underlying all compression methods is the following idea:

If oft-recurring elements are represented by short codes and rarely recurring elements by long codes, then the block of data needs less memory than if all elements were represented by codes of identical length.

A compression algorithm is “lossless” (or reversible) if the decompressed image is identical to the original. Correspondingly, a compression method is “lossy” (or irreversible) if the reconstructed image is only an approximation of the original one [3].

Some loss of information can be acceptable for the following three reasons:

1. Significant loss can often be tolerated by the human visual system without interfering with perception of the scene content.
2. In most cases, digital input to the compression algorithm is itself an imperfect representation of the real-world scene. This is certainly true when the image sample values are quantized versions of real-valued quantities.
3. Lossless compression is usually incapable of achieving the high compression requirements of many storage and distribution applications.

The term lossy is used in an abstract sense and does not mean randomly lost pixels; rather, it means the loss of a quantity such as a frequency component, or perhaps the loss of noise. The fundamental question of lossy compression methods is where to lose information.

Nevertheless, lossless compression is often applied in medical applications, because in such images all information is highly significant and lossy compression is intolerable.

Lossless compression is also applied in cases where it is difficult to determine how to introduce an acceptable loss, which will increase compression.

In palettized color images, for example, a small error in the numeric sample value may have an intense effect upon the color representation.

Finally, lossless compression may be appropriate in applications where the image is to be extensively edited and recompressed so that the accumulation of errors from multiple lossy compression operations may become unacceptable.

In the definition of lossless and lossy compression, it is assumed that the original image is in digital form. Compression operates on digital images, but the source may be analog in the real world; therefore, a loss of image quality already takes place during digitization, when the picture is converted from analog to digital representation. For simplicity, the digitization phase is skipped in compression, and images are assumed to be in digital form.

1.2.2 Measures of compression

The compression methods are evaluated by two main criteria: compression efficiency and distortion. The most obvious measures of compression efficiency are “bit-rate” and “compression ratio”. For our purposes an image is a two-dimensional sequence of sample values:

x[n1, n2],   0 ≤ n1 < N1,   0 ≤ n2 < N2

having finite sizes N1 and N2 in the vertical and horizontal directions, respectively. The sample value x[n1, n2] of the source image is the intensity at location [n1, n2] and can take the following values:

0 ≤ x[n1, n2] ≤ 2^B − 1

for unsigned imagery, where B is the number of bits per sample.

The purpose of image compression is to represent the image with a string of binary digits, or “bits”, called the compressed “bit stream” and denoted C. The objective is to keep the length |C| as small as possible. In the absence of any compression, we require N1 · N2 · B bits to represent the image sample values. The compression ratio is defined by the following equation [18]:

compression ratio = (N1 · N2 · B) / |C|

Equivalently, we define the compressed bit-rate, expressed in bps (bits per sample), as follows:

bit-rate = |C| / (N1 · N2)   bps

Bit-rate is the most obvious measure of compression efficiency; it shows the average number of bits per stored pixel of the image.
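Both measures can be computed directly from the image dimensions, the bit depth, and the compressed size; a minimal sketch with illustrative numbers:

```python
# Compression ratio and bit-rate for an N1 x N2 image of bit depth B,
# compressed to a bit stream of |C| bits.
def compression_ratio(n1, n2, b, compressed_bits):
    """Raw size N1*N2*B divided by the compressed size |C|."""
    return (n1 * n2 * b) / compressed_bits

def bit_rate(n1, n2, compressed_bits):
    """Average number of bits spent per sample (bps)."""
    return compressed_bits / (n1 * n2)

# Example: a 512x512 8-bit image compressed to 1,048,576 bits (128 KiB).
print(compression_ratio(512, 512, 8, 1_048_576))  # 2.0
print(bit_rate(512, 512, 1_048_576))              # 4.0 bps
```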

For lossy compression, bit-rate is a more meaningful performance measure for image compression systems, since the least significant bits of high bit-depth imagery can often be discarded without significant visual distortion. The average number of bits spent representing each image sample is often more meaningful than the total size because it is independent of the precision with which the original samples were represented. If, however, the image is displayed or printed at a fixed physical size regardless of the number of samples, then the size of the whole bit-stream is the more meaningful measure.

Such a situation is typical for lossy compression; there the bit-rate is a meaningful measure only when N1 and N2 are proportional to the physical dimensions at which the image is to be printed or displayed.

Compression algorithms are also evaluated by a distortion measure, i.e., the compression error. The more distortion we allow, the smaller the compressed representation can be. The primary goal of lossy compression is to minimize the number of bits required to represent an image at an allowable level of distortion, so the measure of distortion is an important feature of lossy compression.

Formally, the distortion is calculated between the original image x[n1, n2] and the reconstructed image x̂[n1, n2]. The quantitative distortion of the reconstructed image is measured by the mean absolute error (MAE), mean square error (MSE), or peak-to-peak signal-to-noise ratio (PSNR) [1][18]:

MAE = (1 / (N1 · N2)) · Σ | x[n1, n2] − x̂[n1, n2] |

MSE = (1 / (N1 · N2)) · Σ ( x[n1, n2] − x̂[n1, n2] )²

PSNR = 10 · log10( (2^B − 1)² / MSE )   dB

where the sums run over all locations [n1, n2].

The PSNR is expressed in dB (decibels); good reconstructed images typically have PSNR values around 30 dB.
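The three distortion measures can be computed as follows; a minimal sketch (the tiny 2×2 images are illustrative):

```python
import numpy as np

def mae(x, y):
    """Mean absolute error between original x and reconstruction y."""
    return np.mean(np.abs(x.astype(float) - y.astype(float)))

def mse(x, y):
    """Mean square error between original x and reconstruction y."""
    return np.mean((x.astype(float) - y.astype(float)) ** 2)

def psnr(x, y, b=8):
    """Peak-to-peak signal-to-noise ratio in dB for B-bit imagery."""
    peak = 2 ** b - 1
    return 10 * np.log10(peak ** 2 / mse(x, y))

orig = np.array([[0, 255], [128, 64]], dtype=np.uint8)
rec  = np.array([[1, 254], [128, 66]], dtype=np.uint8)
print(mae(orig, rec))   # 1.0
print(mse(orig, rec))   # 1.5
print(round(psnr(orig, rec), 2))
```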

For the evaluation of a compression method the following measures are also applied: speed of compression, robustness against transmission errors, and memory requirements of the algorithm. To estimate the efficiency of the algorithms in this thesis we will use the bit-rate.

1.2.3 Paradigm of compression

Context modeling methods for data compression are based on the paradigm of compression with “universal modeling and coding”, proposed by Rissanen and Langdon in 1981 [2]. According to this approach, the compression process consists of two separate parts:


Modeling assigns probabilities to the symbols, and coding produces a bit sequence from these probabilities. This concept is illustrated in Figure 1-1.

Figure 1-1. Principle compression scheme based on the concept “universal modeling and coding”.

illustration not visible in this excerpt

The decompression scheme is symmetrical to the compression scheme illustrated in Figure 1-1; the same model is used for both coder and decoder.

For all compression methods there exists the following common principle: if oft-recurring elements are represented by short codes and rarely recurring elements by long codes, then the block of data needs less memory than if all elements were represented by codes of equal length. The exact connection between a symbol's probability and its code was first established in Shannon's “noiseless source coding theorem” [4].

The essence of this theorem is that an element s_i with probability p(s_i) is most advantageously represented by a code of length −log2 p(s_i) bits. If during the coding process the code lengths equal exactly −log2 p(s_i) bits, then the length of the coded bit stream is minimal over all possible compression methods. This quantity is written as:

I(s_i) = − log2 p(s_i)

and gives the information content of element s_i in the alphabet. Here the source alphabet is the set of all distinct elements occurring in the source image. The entropy rate of a random process provides a lower bound on the average number of bits that must be spent in coding each of its outcomes, and this bound may be approached arbitrarily closely as the complexity of the coding scheme is allowed to grow without bound.

If the probability distribution F = {p(s_i)} is fixed and the probabilities p(s_i) are independent, then the average code length is given by:

H = − Σ_{i=1..k} p(s_i) · log2 p(s_i)

where k is the number of elements (or symbols) in the alphabet. This value is also called the entropy of the probability distribution.
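The entropy of a distribution can be computed directly; a minimal sketch showing that a skewed alphabet compresses better than a uniform one:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum p_i * log2(p_i), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0  bits/symbol (uniform)
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol (skewed)
```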

In order to achieve a good compression rate, exact probability estimation is needed. The more accurately the probabilities of symbol occurrence are estimated, the more closely the code lengths correspond to the optimum, and the better the compression.

Since the model is responsible for estimating the probability of each symbol, statistical modeling is one of the most important tasks in data compression. It can be classified into the following three categories:

Static modeling;
Semi-adaptive modeling;
Adaptive (or dynamic) modeling.

Static modeling is the simplest case: the same model (code table) is applied to all input data ever to be coded. A code table with a predefined alphabet is constructed from a test set of data over this alphabet. Unfortunately, static modeling fails if the input data is not based on the same alphabet as the model. The advantage of static modeling is that no side information needs to be transmitted to the decoder, and compression can be done in one pass over the input data.

Semi-adaptive modeling analyzes the source data before coding. The probability distribution of symbols matches the source data stream because the code table is calculated after analyzing the input stream. The main feature of semi-adaptive modeling is the collection of statistical information from the source data; the encoding is then based on the semi-adaptive code table. The disadvantage of semi-adaptive modeling is that the constructed model must be stored in the compressed file.

An adaptive model [20] changes the symbol probabilities during the compression process in order to adapt to the statistics of the data. The compression process starts with an initial model, so no model needs to be transmitted. During the process, the model is adapted using the symbols that have already been transmitted; this is important because the decoder must be able to adapt the model in the same way during decompression. Since the decoder knows the previously transmitted symbols, it can adapt the model in the same way as the coder did.
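A minimal sketch of an order-0 adaptive model of this kind (the two-letter alphabet is illustrative):

```python
from collections import Counter

class AdaptiveModel:
    """Order-0 adaptive model: starts with a count of 1 for every symbol
    (so no side information needs to be transmitted) and updates its counts
    after each coded symbol, exactly as the decoder will."""
    def __init__(self, alphabet):
        self.counts = Counter({s: 1 for s in alphabet})
        self.total = len(alphabet)

    def prob(self, symbol):
        return self.counts[symbol] / self.total

    def update(self, symbol):
        self.counts[symbol] += 1
        self.total += 1

model = AdaptiveModel("ab")
print(model.prob("a"))   # 0.5 before any data is seen
model.update("a")
model.update("a")
print(model.prob("a"))   # 0.75 after coding "aa"
```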

The properties of the different modeling strategies are summarized as follows [3]:

Modeling strategy   Passes over data   Side information to decoder   Adaptation
Static              one                none (model fixed in advance) none
Semi-adaptive       two                code table must be stored     fitted to the whole input
Adaptive            one                none (initial model agreed)   updated symbol by symbol

1.2.4 Arithmetic coding

Arithmetic coding is a more modern coding method that usually outperforms Huffman coding. Huffman coding assigns each symbol a codeword of integral bit length, whereas arithmetic coding can treat the whole message as one unit.

A message is represented by a half-open interval [ a, b) where a and b are real numbers between 0 and 1. Initially, the interval is [0, 1). When the message becomes longer, the length of the interval shortens and the number of bits needed to represent the interval increases.

Basic idea:

Arithmetic coding contains the following steps:

1. The coding process begins with a “current interval” [L, H) initialized to [0, 1).

2. For each symbol of the source data stream, two steps are performed:
2.1 The current interval [L, H) is divided into subintervals, one for each possible alphabet symbol. The size of a symbol's subinterval is proportional to the estimated probability that this symbol will be the next symbol in the source data stream, according to the model.
2.2 The subinterval corresponding to the symbol that actually occurs next becomes the new current interval.

3. The final interval is coded with enough bits to distinguish it from all other possible intervals.
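The interval-narrowing steps above can be sketched as follows (the two-symbol model and its probabilities are illustrative):

```python
def narrow(interval, model, symbol):
    """Steps 2.1-2.2: split [L, H) proportionally to the model's symbol
    probabilities and keep the subinterval of the observed symbol."""
    low, high = interval
    width = high - low
    cum = 0.0
    for s, p in model:
        if s == symbol:
            return (low + cum * width, low + (cum + p) * width)
        cum += p
    raise ValueError("symbol not in model")

# A static model for a two-symbol alphabet.
model = [("a", 0.75), ("b", 0.25)]
interval = (0.0, 1.0)
for sym in "aab":          # encode the message "aab"
    interval = narrow(interval, model, sym)
print(interval)            # (0.421875, 0.5625)
```

The final interval shrinks with every symbol; any number inside it identifies the message, and likelier symbols shrink it less, yielding shorter codes.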

1.3 Thesis Objectives:

- Comparison of Lossless image compression techniques based on context modeling.
- Research on how to modify CALIC algorithm in both continuous-tone mode and binary mode.
- Discussing the GCT-I algorithm on medical images.

1.4 Contents of the thesis' Chapters:

- Chapter One: Introduction and basic concepts in image compression.
- Chapter Two: Fundamentals of context modeling.
- Chapter Three: Discussing the considered Lossless image compression techniques (JPEG2000, CALIC and GCT-I) and proposed modifications.
- Chapter Four: Conclusions and Suggestions for Future work.


2.1 Fundamentals

Context modeling is one variety of the modeling step. At present, compression methods based on context modeling, such as CALIC, are well known.

The problem is the estimation of the probability of symbol occurrence at each position of the source data stream. In the lossless case we can use only information known to both encoder and decoder, which means that the probability estimate for a symbol depends on the properties of the previously processed data. We define context modeling as the estimation of the probability of occurrence of a symbol (element, pixel, sample, or set of different objects) depending on the previous symbols, i.e., on a context. In practice, the term “context” denotes the collection of neighboring symbols surrounding the current symbol; this is the context in the broad sense. Left-side and right-side contexts exist in practice: the left-side context is the set of adjoining symbols on the left side, and correspondingly for the right-side context.

If the length of the context is finite, context modeling is called finite-context modeling. It is based on the observation that a symbol that has appeared recently will appear again in the near future: the more times it has appeared, the higher the probability that it appears again, so every time it is seen we increment its probability.

Let us denote the maximum length of a usable context as the order N of the context model. For example, in context modeling with order 3, the maximal-length context for the last symbol of the sequence “…milk…” is “mil”; the contexts here are the strings “mil”, “il”, “l”, and the empty string. All these contexts, with lengths from N down to 0, are called active contexts, i.e., the cumulative statistics of these contexts can be used to estimate the symbol. The entropy of an N-order context model is given by [4]:

H = − Σ_{j=1..N} p(c_j) · Σ_x p(x | c_j) · log2 p(x | c_j)        (2.1)

where p(x | c_j) is the probability of symbol x in context c_j, and N is the number of different contexts.
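The enumeration of active contexts can be sketched as a short function; the history string "mil" is the one from the example above:

```python
def active_contexts(history, order):
    """Return the active contexts for the next symbol: the suffixes of the
    already-seen history, from length `order` down to the empty string.
    Assumes order <= len(history)."""
    return [history[len(history) - k:] for k in range(order, -1, -1)]

print(active_contexts("mil", 3))  # ['mil', 'il', 'l', '']
```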

In the general case, a context model is generated for each finite-length context encountered in the source data stream. Each context model contains counters for all symbols that have occurred in the context corresponding to this model. The counter of symbol x in the current context is incremented after each occurrence of x in this context.

The main question of context modeling is the selection of the optimal context for the most precise estimation, and the aim of context modeling is to find such a context.

A context model has an order, as described above. Once we have different orders, the question of how to use them arises. Blending is one solution: it combines the probabilities of all orders into a single probability for every symbol of the alphabet [5]. The combination is done by adding together the probabilities of the same symbol under different contexts. In practice, however, probabilities of higher orders tend to be more reliable and thus achieve higher compression; blending reflects this by weighting higher orders, i.e., multiplying them to give them more importance when they are added together. Every context has a different weight, which we can choose beforehand or change while compressing.

Let us consider a context model of order N, and let p(x_i | o) be the probability of symbol x_i from alphabet A in the finite context model of order o. This probability is adaptive and will change during the compression process. The blended probabilities are given by the following formula [5]:

p(x_i) = Σ_{o=−1..N} w(o) · p(x_i | o)        (2.2)

where w(o) is the weight of the model with order o and

Σ_{o=−1..N} w(o) = 1        (2.3)

The order of a context model is the length of the corresponding context. In formulas (2.2) and (2.3) a context model of order −1 is also used; a context model of order −1 assumes equal probability for all alphabet symbols, and for context models of order 0 and −1 we have the same probability distribution over the alphabet symbols.

The probability estimate p(x_i | o) is defined by the following formula:

p(x_i | o) = f(x_i | o) / f(o)        (2.4)

where f(x_i | o) is the number of appearances of symbol x_i in the current context of order o, and f(o) is the overall number of appearances of that context in the processed data stream. For f(o) the following equation holds:

f(o) = Σ_i f(x_i | o)

Thus, simple blending can be constructed from the mechanism of choosing the probability estimate of the context by applying formula (2.4).

If the weight w(−1) > 0, successful compression is guaranteed for any symbol of the source data stream, because the existence of the context model with order o = −1 allows a nonzero probability estimate and hence a code of finite length.

Context modeling is called fully blended when the prediction is defined by the statistics of all usable orders in the context model, and partially blended otherwise. Blending gives good compression, but it is slow.
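The simple blending of formulas (2.2)-(2.4) can be sketched as follows; the counts and weights below are illustrative:

```python
def blended_prob(symbol, contexts, weights, alphabet_size):
    """Blend per-order estimates p(x|o) = f(x|o)/f(o) with weights w(o),
    formula (2.2). `contexts[o]` maps symbols to counts f(x|o) for order o;
    order -1 is the uniform model over the alphabet."""
    p = weights[-1] * (1.0 / alphabet_size)   # order -1: equal probabilities
    for o, counts in enumerate(contexts):
        total = sum(counts.values())          # f(o)
        if total:
            p += weights[o] * counts.get(symbol, 0) / total
    return p

# Counts observed under the order-0 and order-1 contexts; weights sum to 1
# and favor the higher order.
contexts = [{"a": 3, "b": 1}, {"a": 1, "b": 1}]
weights = {-1: 0.1, 0: 0.3, 1: 0.6}
print(blended_prob("a", contexts, weights, alphabet_size=2))  # ~0.575
```

Here the higher weight on order 1 lets the more specific (and usually more reliable) statistics dominate the blended estimate.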


Assistant Teacher Mohammad El-Ghoboushi (Author), 2014, Comparison of Lossless Image Compression Techniques based on Context Modeling, Munich, GRIN Verlag.

