In this thesis various methods for lossless compression of source image data are analyzed and discussed. The main focus in this work is lossless compression algorithms based on context modeling using tree structure.
We are going to compare CALIC, GCTI algorithms to the
JPEG2000 standard algorithm which will be considered the reference of comparison.
This work includes research on how to modify CALIC algorithm in continuoustone mode by truncating tails of the error histogram which may lead to improve CALIC compression performance.
Also, we are going to propose a modification to CALIC in binary
mode by eliminating error feedback mechanism. As when any pixel to be encoded has a different grey level than any of the neighboring pixels, CALIC triggers an escape sequence that switches the algorithm from binary mode to continuoustone mode. Which means in this case the pixel will be treated as if it was in continuoustone region. This minor modification should improve CALIC performance in binary images.
Finally, we are going to discuss the GCTI on medical images and compare results to the JPEG2000 standard.
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS.
LIST OF ABREVIATIONS
ACKNOWLEDGEMENTSxv
ABSTRACT
1. INTRODUCTION
1.1 Data compression
1.2 Basic concepts in image compression
1.2.1 Lossless and lossy cases
1.2.2 Measures of compression
1.2.3 Paradigm of compression
1.2.4 Arithmetic coding
1.3 Thesis objectives
1.4 Contents of thesis' chapters
2. CONTEXT MODELING
2.1 Fundamentals
2.2 Context Tree
2.2.1 Structure of context tree
2.2.2 Static and semiadaptive approaches
2.2.3 Construction of an initial context tree
2.2.4 Pruning of context tree
3. LOSSLESS IMAGE COMPRESSION TECHNIQUES
3.1 JPEG2000 standard
3.2 CALIC
3.2.1 CALIC Frame Structure
3.2.2 Neighborhood pixels involved
3.2.3 LINEAR PREDICTION  GAP
3.2.4 CODING CONTEXT
3.2.5 CONTEXT MODELING FOR ADAPTIVE ERROR FEEDBACK
i. Contexts Formation
ii. Error Feedback
3.2.6 ENTROPY CODING OF PREDICTION ERRORS
3.2.6.1 Error Sign Flipping
3.2.6.2 Remapping Errors.
3.2.6.3 Histogram Tail Truncation
A. Proposed modification
3.2.7 Continuoustone mode results
3.2.8 BINARY MODE
A. Proposed modification
3.3 General Context Tree based on Intensity
3.3.1 Context template
3.3.2 Compression scheme
3.3.3 GCTI Simulation results
3.4 COMPARATIVE ANALYSIS OF CONSIDERED TECHNIQUES
4. CONCLUSIONS AND SUGGESTION FOR FUTURE WORK
4.1 Conclusions
4.2 Suggestion for Future Work
REFERENCES
Appendix
Appendix
LIST OF TABLES
Table 31 Comparison results of the modified CALIC and the original.
Table 32 Test sets information
Table 33 Bitrate of compression results on natural smooth images (bits per pixel).
Table 34 Bitrate of compression results on medical images (bits per pixel) of modified CALIC in continuous tone mode ..
Table 35 The compression efficiency (bits per pixel)..
Table 36 Modified CALIC Bitrate of compression results on medical images (bits per pixel) after eliminating feedback mechanism
Table 37 GCTI Bitrate of compression results on medical images (bits per pixel).
Table 38 Comparison between the important properties of the lossless algorithms in consideration (JPEG2000, CALIC and GCTI)
LIST OF FIGURES
Figure 11 Principle compression scheme based on concept “universal modeling and coding”
Figure 21 Example of context template with size four
Figure 22 An example of context tree with depth of
Figure 23 Context template with size 4.
Figure 31 The JPEG2000 encoding process.
Figure 32 The DWT principle scheme.
Figure 33 DWT process for the Y component..
Figure 34 Template for context selection
Figure 35 The JPEG2000 decoding process.
Figure 36 CALIC Frame Structure
Figure 37 Causal template for adjacent pixels in prediction and modeling (Neighborhood pixels involved).
Figure 38 Convexity relationship of Î and neighborhood pixels
Figure 39 Twostage adaptive prediction
Figure 310 Reduce conditional entropy by Sign Flipping.
Figure 311 Original Lena Image and its Histogram..
Figure 312 Prediction Error histogram of Different Predictors
(b) ZeroHolder Predictor of H
(c) MED Predictor of H
(d) GAP Predictor of H
Figure 313 Conditional GAP Error.
Figure 314 Feedback adjusted Version Error
Figure 315 Error Sign Flipping and Remapping.
Figure 316 Histogram Truncation Threshold
Figure 317 Feedback mechanism elimination of CALIC
Figure 318 The Overall modified CALIC system
Figure 319 Default location and order of the neighboring pixels with maximum depth
Figure 320 The principle scheme of the GCTI algorithm
Figure 321 Intensity histogram of blood (medical image)
Figure 412 Intensity histogram of Lena (natural smooth image)
Appendix 1 (Original image test set. Set of natural smooth grayscale images)
Appendix 2 (Set of Medical images)
LIST OF SYMBOLS
illustration not visible in this excerpt
LIST OF ABBREVIATIONS
illustration not visible in this excerpt
ACKNOWLEDGMENTS
Most importantly, I am ever grateful to GOD the major source of strength when I worked on my MSc. thesis research and it is under His grace that we live and learn.
I would like to acknowledge the support of all members of my family, my parents, my wife and my daughter "Ratil" for their good support throughout my life.
Also, I would like to express my sincere gratitude to my advisor, Prof. Dr. Amin Mohamed Nassar for his guidance and advice.
Last but not least, I would like appreciate the help offered by the Electronics and Electrical Communications department staff and library staff, faculty of Engineering, Cairo University.
ABSTRACT
In this thesis various methods for lossless compression of source image data are analyzed and discussed. The main focus in this work is lossless compression algorithms based on context modeling using tree structure.
We are going to compare CALIC, GCTI algorithms to the JPEG2000 standard algorithm which will be considered the reference of comparison.
This work includes research on how to modify CALIC algorithm in continuoustone mode by truncating tails of the error histogram which may lead to improve CALIC compression performance.
Also, we are going to propose a modification to CALIC in binary mode by eliminating error feedback mechanism. As when any pixel to be encoded has a different grey level than any of the neighboring pixels, CALIC triggers an escape sequence that switches the algorithm from binary mode to continuoustone mode. Which means in this case the pixel will be treated as if it was in continuoustone region. This minor modification should improve CALIC performance in binary images.
Finally, we are going to discuss the GCTI on medical images and compare results to the JPEG2000 standard.
CHAPTER ONE INTRODUCTION
1.1 Data compression
In computer science and information theory, data compression or source coding is the process of encoding information using fewer bits than an unencoded representation would use through use of specific encoding schemes. For example, any article could be encoded with fewer bits if one were to accept the convention that the word "compression" be encoded as "comp". One popular instance of compression that many computer users are familiar with is the ZIP file format, which, as well as providing compression, acts as an archiver, storing many files in a single output file.
As is the case with any form of communication, compressed data communication only works when both the sender and receiver of the information understand the encoding scheme. For example, this text makes sense only if the receiver understands that it is intended to be interpreted as characters representing the English language. Similarly, compressed data can only be understood if the decoding method is known by the receiver. Some compression algorithms exploit this property in order to encrypt data during the compression process so that decompression can only be achieved by an authorized party (e.g. through the use of a password).
Compression is useful because it helps reduce the consumption of expensive resources, such as disk space or transmission bandwidth. On the downside, compressed data must be uncompressed to be viewed (or heard), and this extra processing may be detrimental to some applications. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it's being decompressed (you always have the option of decompressing the video in full before you watch it, but this is inconvenient and requires storage space to put the uncompressed video). The design of data compression schemes therefore involve tradeoffs between various factors, including the degree of compression, the amount of distortion introduced (if using a lossy compression scheme), and the computational resources required to compress and uncompress the data.
1.2 Basic concepts in image compression
1.2.1 Lossless and lossy cases
The assignment of compression is to code the image data into a compact form, minimizing both the number of bits in the representation, and the distortion caused by the compression.
Lossless image compression algorithms are generally used for images that are documents and when lossy compression is not applicable. Lossless algorithms are especially important for systems transmitting and archiving medical data, because lossy compression of medical images used for diagnostic purposes is, in many countries, forbidden by law.
In such a system, the images or volume slices are stored in the memory since mass storage turns out to be too slow here, the fast lossless image compression algorithm could virtually increase the memory capacity allowing processing of larger sets of data.
An image may be defined as a rectangular array of pixels. The pixel of a grayscale image is a nonnegative integer interpreted as the intensity (brightness, luminosity) of the image. When image pixel intensities are in the range [0, 2N  1], then we say that the image is of N bit depth, or that it is an N bit image. Typical grayscale images are of bit depths from 8 to 16 bits.
Grayscale image compression algorithms are used as a basis for color image compression algorithms and for algorithms compressing other than images 2dimensional data characterized by a specific smoothness.
These algorithms are also used for volumetric 3dimensional data. Sometimes such data, as a set of 2dimensional images, is compressed using regular image compression algorithms. Other possibilities include preprocessing volumetric data before compressing it as a set of 2dimensional images or using algorithms designed exclusively for volumetric data the latter are usually derived from regular image compression algorithms.
We could use a universal algorithm to compress images, i.e., we could simply encode a sequence of image pixels extracted from an image in the raster scan order.
For a universal algorithm such a sequence is hard to compress. Universal algorithms are usually designed for alphabet sizes not exceeding 2^{8} and do not exploit directly the following image data features: images are 2dimensional data, intensities of neighboring pixels are highly correlated, and the images contain noise added to the image during the acquisition process the latter feature makes dictionary compression algorithms perform worse than statistical ones for image data^{1}.
Modern grayscale image compression algorithms employ techniques used in universal statistical compression algorithms. However, prior to statistical modeling and entropy coding the image data is transformed to make it easier to compress.
The fundamental principle for all compression methods is following idea:
If represent oftrecurring elements as short codes and rarerecurring as long codes, then the block of data needs a smaller memory size than if all elements were represented by codes of identical length.
A compression algorithm is “lossless” (or reversible) if the decompressed image is identical with the original. Respectively, a compression method is “lossy” (or irreversible) if the reconstructed image is only an approximation of the original one^{3}.
Some loss of information can be acceptable for the following three reasons:
1. Significant loss can often be tolerated by the human visual system without interfering with perception of the scene content.
2. In most cases, digital input to the compression algorithm itself is an imperfect representation of the real world scene. This is certainly true when the image sample values are quantized version of the real valued quantities.
3. Lossless compression is usually incapable of achieving the high compression requirements of many storage and distribution applications.
The term lossy is used in an abstract sense, and does not mean random lost pixels, but instead means loss of a quantity such as a frequency component, or perhaps loss of noise. The fundamental question of lossy compression methods is where to lose information.
Nevertheless, the lossless compression is often applied in medical applications, because on such images all information has big significance and lossy compression here is intolerable.
Lossless compression is also applied in cases where it is difficult to determine how to introduce an acceptable loss, which will increase compression.
In palletized color images, for example, a small error in the numeric sample value may have an intense effect upon the color representation.
Finally, lossless compression may be appropriate in applications where the image is to be extensively edited and recompressed so that the accumulation of errors from multiple lossy compression operations may become unacceptable.
In the definition of lossless and lossy compression, it is assumed that the original image is in digital form. For compression digital images are used; but source may be in analog view in the real world, and therefore, the loss in image quality already takes place in digitalization of source images, when the picture is converted from analog to digital representation. For simplicity, in compression the digitalization phase is skipped, images are restored in digital form.
1.2.2 Measures of compression
The compression methods are evaluated by two main criteria:
Compression efficiency and distortion. The most obvious measures of compression efficiency are “bitrate” and “compression ratio”. For our purposes an image is a two dimensional sequence of sample values:
illustration not visible in this excerpt
Having finite size, N 1 and N 2, in vertical and horizontal directions respectively. The sample value x [ n 1 ,n 2 ] of source image is intensity of the location [ n 1 ,n 2 ] and can have the following values:
Abbildung in dieser Leseprobe nicht enthalten
For unsigned imagery, where B is the number of bits on each sample.
The purpose of image compression is image representation with a string of binary digits or “bits”, called the compressed “bit stream”, denoted as C. The objective is to keep the length C as small as possible. In the absence of any compression, we require N 1 N 2 B bits to represent the image sample values. Let us define the compression ratio as following equation^{18}:
illustration not visible in this excerpt
Equivalently, we define the compressed bitrate, expressed in bps (bits per sample), as follows:
illustration not visible in this excerpt
Bitrate is the most obvious measure of compression efficiency; it shows the average number of bits per stored pixel of the image.
For lossy compression bitrate is a more meaningful performance for image compression systems, since the least significant bits of high bitdepth imagery can often be excluded without significant visual distortion. The average number of bits spent in representing each image sample is often a more meaningful measure of compression performance, because it is independent of the precision with which original samples were represented. If the image is displayed or printed with physical size regardless the size of samples, then more meaningful measure in such case is the size of the bitstream.
Such situation is typical for lossy compression; the bitrate is a meaningful measure only when N 1 and N 2 are proportional to the physical dimensions with which the image is to be printed or displayed.
Compression algorithms are also estimated by distortion measure, i.e. compression error. The more distortion we allow, the smaller the compression representation can be. The primary goal of lossy compression is to minimize the number of bits required to represent an image with an allowable level of distortion. The measure of distortion is an important feature for lossy compression.
Formally, distortion is calculated between the original image, x x [ n 1 ,n 2 ], and the reconstructed image, x x [ n 1 ,n 2 ]. The quantitative distortion of the reconstructed image is measured by the
mean absolute error (MAE), mean square error (MSE), or peakto peak signal to noise ratio (PSNR)^{1} ^{18}:
illustration not visible in this excerpt
The PSNR is expressed in dB (decibels), good reconstructed images typically have value of 30dB.
For estimation of compression method the following measures are applied: speed of compression, robustness against transmission errors and memory requirements of the algorithm. For estimation efficiency of algorithms in this thesis we will use the value of bit rate.
1.2.3 Paradigm of compression
Compression methods of context modeling for data compression are based on a paradigm of compression with applying “universal modeling and coding” proposed by Rissanen and Langdon in 1981^{2}. According to the given approach the compression process consists of two separate parts:
modeling;
coding.
Modeling assigns probabilities to the symbols, and coding produces a bit sequence from these probabilities. This concept is illustrated in Figure 11.
Figure 11.Principle compression scheme based on concept “universal modeling and coding”.
illustration not visible in this excerpt
Decompression scheme is symmetrical to compression scheme illustrated in Figure 11. The same model for both coder and decoder is used.
For all compression methods exists following common principle: if representation of oftrecurring elements is short codes and representation of rarerecurring elements is long codes, then the block of data needs a smaller memory size than if all elements were represented by codes of equal length. Exact connection between the symbol probability and its code was first established in Shannon’s “noiseless source coding theorem”^{4}.
The essence of this theorem is that element s i with probability p (s i) is represented more advantageous by code with length  log 2 p (s i) bits. If during the coding process length of codes equals exactly  log 2 p (s i) bits, it means that length of coded bit stream is minimal for all possible compression methods. Such value is denoted as entropy:
illustration not visible in this excerpt
And means information content of an element s i in the alphabet. Here source alphabet is a set of all possible nonrecurrent elements from source image. The entropy rate of a random process provides a lower bound of the average number of bits that must be spent in coding each of its outcomes, and this bound may be approached arbitrary closely as the complexity of the coding scheme is allowed to grow without bound.
If the probability distribution F { p (s i)} is invariable and probabilities p (s i) are independent, then the average code length is given by:
illustration not visible in this excerpt
Where k is the number of elements (or symbols) in the alphabet. This value also means entropy of the probability distribution.
In order to achieve good compression rate, exact probability estimation is needed. The more accurately the probabilities of symbols occurrence are estimated, the more closely code length correspond to the optimal, and the better compression is.
Since the model is responsible for the probability estimation of each symbol, statistical modeling is one the most important tasks in data compression. It can be classified into the following three categories:
Static modeling;
Semiadaptive modeling;
Adaptive (or dynamic) modeling.
Static modeling is the simplest case. In the static modeling the same model (code table) is applied to all input data ever to be coded. Code table with predefined alphabet is constructed based on a test set of data used this alphabet. Unfortunately the static modeling fails if the input data does not base on the same alphabet as model. The advantage of static modeling, is that no side information is needed to transmit to the decoder, and that the compression can be done by onepass over the input data.
Semiadaptive modeling analyzes the source data before coding. Probability distribution for symbols coincides to source data stream because the code table is calculated after analyzing the input stream. The main feature for semiadaptive modeling is collection of statistical information from the source data, and the encoding is based on the semiadaptive code table. Disadvantage of semi adaptive modeling is that constructed model must be stored in the compressed file.
Adaptive model ^{20} changes the symbol probabilities during the compression process in order to adapt the statistics during the process. Initially the compression process starts with an initial model, so the model does not need to be transmitted. During the process, the model adapts by the symbols, which have been transmitted already. It is important that the model gets adapted only by the symbols, which have been transmitted already, since the decoder needs to be able to adapt the model in the same way later at the decompression process. Since the decoder knows the before transmitted symbols, it can adapt to the model in the same way than the coder did.
The properties of different modeling strategies are summarized as follows^{3}:
illustration not visible in this excerpt
1.2.4 Arithmetic coding
Arithmetic coding is a more modern coding method that usually outperforms Huffman coding. Huffman coding assigns each symbol a codeword which has an integral bit length. Arithmetic coding can treat the whole message as one unit.
A message is represented by a halfopen interval [ a, b) where a and b are real numbers between 0 and 1. Initially, the interval is [0, 1). When the message becomes longer, the length of the interval shortens and the number of bits needed to represent the interval increases.
Basic idea :
Arithmetic coding contains the following steps:
1. The coding process begins with a “current interval” [ L, H) initialized to [0,1)
2. For each symbol of the source data stream two steps are performed:
2.1 The current interval [ L, H) is divided into subintervals, one for each possible alphabet symbol. The symbol’s subinterval has size that is proportional to the estimated probability of the symbol will be the next symbol in the source data stream, according to the model.
2.2 The subinterval [ L, H) corresponding to the symbol that actually occurs next will become as the new current interval.
3. Final interval is coded by enough bits to distinguish it from all other possible intervals.
1.3 Thesis Objectives:
 Comparison of Lossless image compression techniques based on context modeling.
 Research on how to modify CALIC algorithm in both continuoustone mode and binary mode.
 Discussing the GCTI algorithm on medical images.
1.4 Contents of the thesis' Chapters:
 Chapter One: Introduction and basic concepts in image compression.
 Chapter Two: Fundamentals of context modeling.
 Chapter Three: Discussing the considered Lossless image compression techniques (JPEG2000, CALIC and GCTI) and proposed modifications.
 Chapter Four: Conclusions and Suggestions for Future work.
CHAPTER TWO CONTEXT MODELING
2.1 Fundamentals
Context modeling is one of the varieties of modeling steps. At present time the compression methods based on context modeling are known, like, CALIC.
The problem is probability estimation of symbols’ occurrence in each position of source data stream. For lossless compression case we can use only the information that is known both of encoder and decoder. It means that probability estimation of occurrence symbol depends on properties of data block previously processed. Let us denote the context modeling as an estimation of probability of occurrence symbol (element, pixel, sample or set of different objects) depending on previous symbols, or a context. In practice, term “context” is used as collection of neighboring symbols, which are surrounding current symbol. It is a context in the broad sense. Leftside and rightside contexts exist in practice, i.e. as leftside context is considered the set of adjoining symbols from left side, for rightside context respectively.
If the length of the context is finite, then context modeling is denoted as finitecontext modeling. Finite context modeling is based on the fact that a symbol that has appeared recently will appear in a near future, so the more times that it appears the higher is the probability that it appears again, so every time it has been seen, we increment its probability.
Let us denote the maximum length of usable context as order of context N. For example, during context modeling with order 3 for last symbol in sequence “…milk…” context with maximum length is “mil”; as contexts here are following strings: “mil”, “il”, “i” and empty string. All of these contexts with length from N to 0 denote as active contexts, i.e. for estimation of symbol can be used cumulative
statistics about these contexts. The entropy of an N order context model is given by^{4}:
illustration not visible in this excerpt
Where p (x  c) is the probability of symbol x in a context c, and N is the number of different contexts.
In general case, the context model for each context with finite length happened in source data stream is generated. Any context model contains counter mechanism of all symbols, which are happened in corresponding context to this model. The value of counter of symbol x in current context increases after each happened of symbol x in this context.
The main question of context modeling is selection of optimal context for more precise estimation. And aim of context modeling is to find such context.
Context model has an order, as was described above. Once we have different orders the question of how to use them arises. Blending is one of the solutions, such method means combining the probability of all the orders in a single probability for every symbol from alphabet^{5}. Such combining is done by adding together the probability of the same symbol under different contexts. However, in practice probabilities of higher orders tend to be more reliable and thus achieve higher compression, the way we reflect this in blending is by weighting higher orders that is multiplying them to give them more importance when they are added together. Every context has a different weight which we can choose beforehand, or change while compressing.
Let us consider context model with order N. Let us assume p (x i  o) is probability of symbol x i from alphabet A in the finite context model with order o. This probability is adaptively and will change during compression process. The blended probabilities are given by following formula^{5}:
illustration not visible in this excerpt
Where w (o) is weight of model with order o and
illustration not visible in this excerpt
As order of context model here is considered length of
corresponding context. In formulas (2.2) and (2.3) the order of
context model 1 is used, context model with order 1 assumes
equal probability for all alphabet symbols. And, for context models with order 0 and 1 we have the same probability distribution of alphabet symbols.
Probability estimation p (x i  o) is defined by following formula:
illustration not visible in this excerpt
Where f (x i  o) is number of appearance of symbol x i in current context with order o, f (o) is overall number of context appearance in the processed data stream. For f (o) exists following equation:
illustration not visible in this excerpt
So, the simple blending can be constructed from the mechanism of choosing the probability estimation of the context by applying formula (2.4).
If weight w ( 1) 0 it is guarantee successful compression for any symbol of source data stream, because existence of context model with order o 1 allows to obtain nonzero probability estimation and code with final length.
Recognize fully blended context modeling, when prediction is defined by statistics of all usable orders in context model, and partially blended otherwise. However using blending is slow, though it gives good compression.
[...]

Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X. 
Upload your own papers! Earn money and win an iPhone X.