Image Color Dimension Reduction. A comparative study of state-of-the-art methods

Textbook, 2016

64 Pages


1. Introduction

2. Color perception
2.1. Biological aspects of color perception
2.2. Physical aspects of color perception
2.3. Contextual aspects of color perception

3. The need for color dimension reduction
3.1. Color-to-dichromatic conversion
3.2. Human perception-driven decolorization methods
3.3. Color-to-grayscale conversion for intensity image processing
3.4. Recoloring images

4. Color formation during image production

5. Classical color-to-grayscale conversion methods
5.1. The weighted sum methods
5.2. Color-to-grayscale conversion based on color spaces
5.3. Color-to-grayscale conversion based on data analysis

6. The problem of color-to-grayscale conversion

7. Categories of state-of-the-art color-to-grayscale methods
7.1. Global color-to-grayscale conversion methods
7.2. Local color-to-grayscale conversion methods
7.3. Hybrid color-to-grayscale conversion methods

8. Evaluation procedure of color-to-grayscale methods
8.1. Image datasets
8.2. Objective performance parameters
8.3. Subjective evaluation

9. Comparison of state-of-the-art methods for color-to-grayscale conversion

10. Examples of application of color dimension reduction

11. Final remarks



Key words: Color dimension reduction, color-to-grayscale, decolorization, image recoloring, luminance, color perception.

1. Introduction

Color is a human sensation that depends on the brain’s response to a specific visual stimulus, and we experience color as a single attribute. Although color can be described precisely by measuring its spectral power distribution, this yields a large amount of highly redundant data. The reason for this redundancy is that the retina samples color using only three broad bands, roughly corresponding to red (R), green (G), and blue (B) light, through the responses of three types of light-sensitive cells (cones) termed L (long), M (medium), and S (short) according to the wavelength band they respond to. The signals from these sensors, together with those from the rods (sensitive to intensity only), are combined in the brain to provide several different sensations of color. Color production in digital RGB images is similar in some respects, such as the use of three sensors for R, G, and B light, but it lacks intensity-only sensors (i.e. rods) and an analyzer similar to the human brain that interprets the signals from those three sensors contextually and psychologically.

The above-mentioned characteristics of color production in digital images complicate the process of interpreting color images while neglecting or reducing their chromatic content, because this task becomes a problem of dimension reduction from the three-dimensional (3D) space of the RGB representation to a two-dimensional (2D) or one-dimensional (1D) space: color reduction and color-to-grayscale (C2G) conversion, respectively. For some color reduction procedures, the problem is similar to color quantization and to compression of gamut dimensionality. In any case, there is a loss of visual information and, most of the time, the conversion preserves luminance contrast without appropriate handling of the chrominance (hue and saturation) content. This is the case for the classical C2G image conversion methods in widespread use.

The most common use of color reduction is C2G image conversion, also named color removal or decolorization, whose methods can be grouped into two sets. In the first set, the goal is human perception of the grayscale image, including the problem of color conversion or recoloring of images for human observers with color-deficient vision. In the second set, the goal is 1D image processing, including artificial vision and the use of well-established procedures for grayscale image analysis such as segmentation, thresholding, texture segmentation, etc. Roughly speaking, the first set of C2G image conversion methods needs to generate a grayscale image perceptually equivalent to the original color image, where the perceived color difference between any pair of colors should be proportional to the perceived gray difference in the grayscale image, which is a very complex task. The second set of C2G image conversion procedures, on the other hand, also needs to generate grayscale images with global consistency; that is, when two pixels have the same color in the color image, they must have the same gray level in the grayscale image. Section 3 expands on this analysis.

This work has a purely academic intention: it is a comparative study of more than twenty state-of-the-art C2G methods developed during the first 15 years of the 21st century, according to recent publications. Additional methods published in this period are also mentioned, confirming that C2G conversion is an active area of research, mainly because existing solutions seldom ensure both quality and efficiency simultaneously. The work is structured in eleven parts. The second section expounds on the fundamentals of color attributes according to human perception. The third section comments on the need for color dimension reduction, using different situations in which it is an indispensable task. The fourth section is a brief review of color formation during digital image production. The fifth section is a critical analysis of the classical C2G methods developed in the last part of the 20th century, mainly those using the lightness or luminance of different color spaces. The sixth section enumerates the goals and constraints related to the C2G conversion problem. The seventh section describes the characteristics of the different categories of C2G methods. The eighth section explains the state of the art in evaluation procedures for C2G methods, describing image datasets and performance metrics. The ninth section presents the comparison of the state-of-the-art methods for C2G image conversion, including authors, year of publication, method name, description, database used for evaluation, type of evaluation with the parameters measured, and drawbacks. The tenth section shows some examples of the application of color dimension reduction. The last section offers some final remarks.

2. Color perception

As stated above, color is the visual sensory property corresponding in humans to the categories named red, yellow, green, blue, cyan, etc., formed when light (the visible part of the electromagnetic radiation) reaches the light sensors in the retina, producing a stimulus interpreted by the brain. Light is transmitted by photons (particles without mass) characterized by their energy E, which is proportional to a single parameter, the frequency f, the constant of proportionality being Planck’s constant h ≈ 6.626×10⁻³⁴ J·s; that is, E = hf. Light can also be characterized by its wavelength λ, which is related to the speed of light v in each medium by v = λf. Since the speed of light depends on the medium (according to its refractive index), the wavelength does too. The visible spectrum comprises frequencies from f ≈ 420 THz for red to f ≈ 750 THz for blue, corresponding approximately to wavelengths from λ ≈ 730 nm to λ ≈ 400 nm respectively, as can be seen in Figure 1.
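The two relations above, E = hf and v = λf, can be checked numerically. The following is a minimal sketch, assuming propagation in vacuum unless a refractive index is given; the function names and the `refraction_index` parameter are illustrative choices, not part of the original text.

```python
# Physical constants in SI units.
PLANCK_H = 6.62607015e-34           # Planck's constant, J*s
LIGHT_SPEED_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def frequency_from_wavelength(wavelength_m, refraction_index=1.0):
    """Return the frequency in Hz from v = lambda * f, with v = c / n.

    wavelength_m is the wavelength measured inside the medium whose
    refractive index is refraction_index (assumed 1.0, i.e. vacuum).
    """
    speed = LIGHT_SPEED_VACUUM / refraction_index
    return speed / wavelength_m


def photon_energy(frequency_hz):
    """Return the photon energy in joules from E = h * f."""
    return PLANCK_H * frequency_hz


if __name__ == "__main__":
    # Approximate limits of the visible interval quoted in the text.
    for nm in (730, 400):
        f = frequency_from_wavelength(nm * 1e-9)
        print(f"{nm} nm: f = {f / 1e12:.1f} THz, E = {photon_energy(f):.3e} J")
```

With these constants, 730 nm and 400 nm correspond to roughly 411 THz and 749 THz, in line with the approximate limits given above.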

illustration not visible in this excerpt

Fig. 1. Frequency f and wavelength λ for the visible interval of the electromagnetic radiation.

Color perception depends on some aspects which can be grouped as:

- biological, related to physiological response of the human visual system (eyes and brain),
- physical, related to light properties (brightness, hue, saturation, etc.), and
- contextual, related to the psychological response of the brain (psycho-sensorial, training, etc.).

2.1. Biological aspects of color perception

Biological aspects of color perception depend on the light sensitivity of two kinds of nerve cells:

- rods, sensitive to light intensity or luminance (the measured amount of light in units of candelas per square meter, cd/m²), with about 100 million rods in each eye, and

- cones, sensitive mainly to color (as can be seen in Fig. 2), contrast, and brightness, with about 6 million cones in each eye, which are of three types:

- S, approximately 2% of the cones, sensitive mainly to short wavelengths (perceived as blue-violet light),
- M, approximately 33% of the cones, sensitive mainly to middle wavelengths (perceived as green light), and
- L, approximately 65% of the cones, sensitive mainly to long wavelengths (perceived as yellow-green light).

illustration not visible in this excerpt

Fig. 2. Light sensitivity of the three types of cones.

2.2. Physical aspects of color perception

The physical perception of colors can be described by different basic properties or dimensions which have been defined by the CIE, the Commission Internationale de l’Eclairage (International Commission on Illumination), the international standardization organization devoted to establishing recommendations on all matters relating to the science and art of lighting [BVM07, Kin11]. The dimensions that are used depend on how we perceive the color. If the color is perceived as the property of a light source, it is called an aperture or self-luminous color; if it is perceived as a property of a surface, it is called an object color. For aperture colors, the dimensions are hue, saturation and brightness.

- Hue (H): the human sensation according to which an area appears to be similar to one of the perceived colors red, yellow, green and blue (i.e. the primary hues). It corresponds to the dominant wavelength of a spectral power distribution. Since black, white and gray are colors without any hue, they are named neutral or achromatic colors. Each hue can then be described as a mixture of two of these primary hues, such as yellow-red (orange) or blue-red (purple). Not all combinations of the four primary hues make sense, though: one cannot perceive a yellowish blue or a reddish green. This observation led to the so-called opponent-color theory.
- Saturation (S): the sensation according to which the perceived color of an area appears to be more or less chromatic, judged in proportion to its brightness. Saturation represents the purity of a perceived color, such as bright, pale or dull. It corresponds to the ratio between the energy of the dominant wavelength of a spectral power distribution and the sum of the energies of the other wavelengths.
- Brightness (B): the human sensation by which an area appears to emit more or less light. It thus corresponds to a sensation in terms of light, such as dark or luminous, and characterizes the luminous level of a color stimulus. The terms radiant intensity, illumination and brilliance, among others, are often employed in the literature to indicate the concept of brightness.

For object colors, the same definitions of hue and saturation are used. The intensity depends on both the incident light and the surface reflectance, which together define the term lightness. For a given illumination, a white object will always reflect more light than a colored object, simply because it reflects at all wavelengths. Thus, if x and y are Cartesian coordinates in the two-dimensional image plane, the lightness L(x, y), the reflectance R(x, y) and the illumination IL(x, y) are related by the simple equation,

$$L(x, y) = R(x, y) \cdot IL(x, y) \tag{1}$$

Because of this, a white object is used as the reference against which all other lightness values are measured, which leads to the following definition:

- Lightness (L): the sensation according to which a body seems to reflect (diffusely) or transmit a greater or smaller fraction of the incident light. The attribute lightness requires, next to the response to the incident light, the response to reference white.

Lightness is thus the attribute by which an object appears to reflect (or transmit) more or less of the incident light (related to intensity, luminosity, luster, or shimmer), whereas brightness is the attribute of a light source by which the emitted light is ordered from bright to dark (related to value, brilliance or radiance), being a photometric value with standardized visual units (cd/m²). In the CIE 1987 publication, lightness is defined as relative brightness. The terms luminance (L) and luminosity are employed in the literature to indicate the concept of both lightness and brightness.

According to (1), it is impossible to determine from the lightness of an image point whether it is part of a light surface that is shaded or a dark surface that is brightly illuminated. The pixel lightness is therefore inherently ambiguous, because infinitely many combinations of the two unknowns, R(x, y) and IL(x, y), can produce the same L(x, y).
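The ambiguity can be illustrated with a small numerical sketch, assuming the multiplicative reflectance-illumination model L(x, y) = R(x, y) · IL(x, y) with both factors normalized to the range 0 to 1. The function name and the sample values are hypothetical and chosen only for illustration.

```python
import math


def lightness(reflectance, illumination):
    """Lightness of one image point under the assumed model L = R * IL."""
    return reflectance * illumination


# A light surface (high reflectance) lying in shadow ...
shaded_light_surface = lightness(reflectance=0.8, illumination=0.25)

# ... and a dark surface (low reflectance) under full illumination.
bright_dark_surface = lightness(reflectance=0.2, illumination=1.0)

# Both points end up with the same lightness, so L alone cannot
# tell the two physical situations apart.
same = math.isclose(shaded_light_surface, bright_dark_surface)
print(shaded_light_surface, bright_dark_surface, same)
```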

Hue and saturation taken together are called chrominance or chromaticity, so a color may be characterized by its luminance and chromaticity. This is because the human eye separates what it sees into a luminance (grayscale) dimension in the rods and a chromatic dimension in the cones; the luminance dimension is the most basic to vision and to understanding the appearance of objects (lightness) and the intensity of light sources (brightness).

Moreover, the physical perception of colors is related to two more attributes: contrast and gamma. Contrast C is a relative concept that is highly dependent on context. Its most common definition is the difference in intensity between an object (foreground) and its background, so one can speak of intensity contrast and of color contrast. Contrast is also interpreted as the range of intensity values effectively used within a given image. This is the case for Michelson’s definition of contrast, also known as the relative lightness contrast for grayscale images,

$$C = \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}}$$

where L_max and L_min are the maximum and minimum luminance values.

Another definition of contrast is the absolute luminance difference (between the target and the background) divided by the background luminance, and for grayscale images with mono-modal histograms, contrast is also related to the statistical standard deviation of the pixel intensities in its histogram [LI12].
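The three notions of contrast mentioned above can be sketched in a few lines. This is an illustrative sketch only: the function names are assumptions, the luminance values are taken to be non-negative numbers, and the ratio of luminance difference to background luminance is labeled here with its usual name, Weber contrast, which the text itself does not use.

```python
from statistics import pstdev


def michelson_contrast(luminances):
    """(L_max - L_min) / (L_max + L_min) over all pixel luminances."""
    l_max, l_min = max(luminances), min(luminances)
    if l_max + l_min == 0:
        return 0.0  # completely black image: define the contrast as zero
    return (l_max - l_min) / (l_max + l_min)


def weber_contrast(target, background):
    """Absolute target/background luminance difference over the background."""
    return abs(target - background) / background


def rms_contrast(luminances):
    """Population standard deviation of the pixel luminances."""
    return pstdev(luminances)


pixels = [0.2, 0.5, 0.8]
print(michelson_contrast(pixels))
print(weber_contrast(target=0.6, background=0.4))
print(rms_contrast(pixels))
```

Note that Michelson contrast depends only on the two extreme values, whereas the standard deviation reflects the whole histogram, which is why the latter is mostly meaningful for unimodal histograms.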

According to You et al. [YBW16], while earlier research on color-to-grayscale conversion algorithms focused more on contrast in color space, current research considers spatial context. However, perceiving spatial context is rather complex. Thus, a perceptually consistent contrast model is desirable, as will be discussed in Section 6.

On the other hand, the gamma parameter, γ, is the exponent of the power law governing the intensity or luminance relationship between output (L_out) and input (L_in); that is, it controls the non-linear mapping between the original and the modified image luminance, as shown in Figure 3.

illustration not visible in this excerpt

Fig. 3. Non-linear gamma intensity mapping.

It is commonly preferable to use 1/γ instead of γ because it is more intuitive. Note that image intensities are assumed to be real values between 0 (black) and 1 (white); however, for 24 bits per pixel (bpp) RGB images, the gamma of each component can be modified independently by

illustration not visible in this excerpt
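As a hedged illustration of independent per-component gamma adjustment in a 24 bpp image, the sketch below assumes the common form out = 255 · (in / 255)^(1/γ) for each 8-bit component. This form is an assumption and may differ from the equation intended at this point of the text; the function names are hypothetical.

```python
def apply_gamma(value, gamma):
    """Map one 8-bit component (0..255) through 255 * (value / 255) ** (1 / gamma).

    The exponent 1 / gamma is an assumed convention, under which
    gamma > 1 brightens the mid-tones and gamma < 1 darkens them.
    """
    if not 0 <= value <= 255:
        raise ValueError("component must lie in 0..255")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    normalized = value / 255.0          # bring the component to 0..1
    return round(255.0 * normalized ** (1.0 / gamma))


def apply_gamma_rgb(pixel, gammas):
    """Apply a separate gamma to each of the R, G and B components."""
    return tuple(apply_gamma(c, g) for c, g in zip(pixel, gammas))


# Brighten red, leave green untouched, darken blue.
print(apply_gamma_rgb((64, 64, 64), (2.0, 1.0, 0.5)))
```

Black and white are fixed points of this mapping for any positive γ, so only the intermediate levels are redistributed.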

2.3. Contextual aspects of color perception

Humans often take for granted their ability to recognize objects under a wide range of viewing conditions [Kin11]. We succeed at this task despite the fact that different combinations of illumination and surface reflectance can produce identical images; under ordinary viewing conditions, an observer can select from multiple sources of information depending on the context. One such context effect is simultaneous contrast, as can be seen in Figure 4. In this figure, the two central disks are physically the same (in luminance and chrominance) but appear to be different. The difference in appearance is caused by the fact that each disk is seen in the context of a different annular surround.

illustration not visible in this excerpt

Fig. 4. Simultaneous contrast in color images: the central disks appear to differ in illumination and coloration, but they are physically the same.

Other examples of the contextual color perception are contrast sensitivity causing the Mach bands, contrast crispening, etc. A more detailed analysis of these manifestations of contextual color perception can be seen in [Ade00].

3. The need for color dimension reduction

Digital color images are commonly produced as a three-dimensional representation in the RGB color space, or in some other color space that is generally 3D too. However, for numerous applications, they must be dimensionally reduced. Color dimension reduction is usually performed from the 3D RGB space to a 2D (dichromatic) or 1D (grayscale) space to meet several requirements.

Grayscale printing is perhaps the most common of these dimension reductions: the vector representing the color of each pixel must be reduced to a scalar value that varies from black at the lowest intensity to white at the highest intensity. Color printers with excellent performance are now widely available; however, grayscale printers still abound, not only for economic reasons but also for artistic ones. Grayscale images have long been a favorite artistic choice among photographers around the globe, and “black-and-white” fax machines are still on the market. As this example shows, color-to-grayscale conversion is required in many single-channel image processing applications. The challenge is that every different color in the input image should become visible as a separate gray value in the grayscale image, so that the object of interest can be more easily identified or segmented from the background. Algorithms based on grayscale images have been widely employed with relatively high success and simplicity. Color-to-grayscale conversion can act as in “black-and-white” photography, when the photographer places a light filter in front of the camera lens, for example an orange filter to increase the gray contrast between sky and clouds in the “black-and-white” image. As previously mentioned, the need for C2G image conversion may arise from human perceptual purposes or from intensity-based digital image processing purposes. Sections 3.2 and 3.3 explain these needs in more detail.

A color dimension reduction, not as common as color-to-grayscale conversion, is a color-to-dichromatic conversion in which the color image is reduced to the combination of only a pair of primary colors. Section 3.1 will expound the basic aspects of this type of conversion.

Moreover, some recoloring procedures rely on one of the color dimension reduction methods, mainly those performing C2G conversion. Recoloring is widely employed to convert grayscale images to color, but also for color-to-color conversion. Recoloring can act as when a scene is illuminated by colored lights to enhance or modify its color content, or when a microscopy sample is stained to modify the chromaticity of some of its parts. It can also be used as a method for lightness contrast enhancement (LCE), color constancy (also named color normalization or standardization), and other color transformation procedures. Section 3.4 describes the fundamentals of the recoloring procedures. A more detailed explanation of color dimension reduction can be found in [Ras05].

3.1. Color-to-dichromatic conversion

The proliferation of color printers and display devices allows extensive use of color to convey rich information content, increasing the significance of color for effective visual communication. However, human observers with color-deficient vision may experience a problem similar to the loss of information during C2G image conversion, in that they may perceive distinct colors as indistinguishable, with a loss of image details [RGW05a, b]. Roughly 5%–8% of men and 0.8% of women have some kind of color deficiency [HTWW07]. In the works of Rasche et al., the same strategy that is used for converting color images to grayscale provides the method for recoloring images to deliver increased information content to such observers. This problem is similar to the usual color gamut mapping (CGM). Color gamut mapping remaps the colors of an image to colors that are reproducible on a specific device, e.g., a display or a printer, based on the fact that preserving the relative differences between colors is often more important than preserving their absolute values [LU12].

According to Rasche et al., deficiencies in color vision arise from differences in the pigmentation of optical photoreceptors. Observers with normal color vision are trichromats, with L, M, and S cones, but anomalous trichromatopia is a condition in which the pigment in one cone is not sufficiently distinct from the others. The viewer still has three distinct spectral sensitivities, but the separation is reduced. Dichromatopia results when the viewer has only two distinct pigments in the cones. For both dichromatopia and anomalous trichromatopia, there are three sub-classifications depending on which cone has the abnormal pigmentation. For dichromats, deficiencies in cones sensitive to long (L) wavelengths are referred to as protanopic, while deficiencies in those sensitive to medium (M) and short (S) wavelengths are referred to as deuteranopic and tritanopic, respectively. Protanopic and deuteranopic deficiencies, the most common forms of color-deficient vision, are characterized by difficulties in distinguishing between red and green tones. Tritanopic deficiencies are associated with confusion between blue and yellow tones.

Then, the main idea is that the perceived color difference between any pair of colors should be proportional to their perceived gray difference. The goal is to obtain a dichromatic version of the original full color image for color-deficient observers, tailored to the viewer’s visual characteristics, and focusing on the differences between pairs of colors. This problem can be also seen as a color quantization. The goal of color quantization is to preserve image detail while reducing the number of colors in the image from n to m, where usually n >> m. Color quantization algorithms have two components, palette selection and mapping. Clever palette selection is the major component and it is critical to the success of these algorithms. Mapping image colors to the selected palette is usually a simple, three-dimensional error diffusion.
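The mapping component of color quantization can be sketched as follows. This is a deliberately simplified illustration, not the method of any cited work: the palette is assumed to be given (palette selection is omitted), and the three-dimensional quantization error is carried only to the next pixel of the same row rather than spread over a two-dimensional neighborhood. All names are hypothetical.

```python
def nearest_palette_color(color, palette):
    """Return the palette entry closest to color in squared RGB distance."""
    return min(palette, key=lambda p: sum((c - q) ** 2 for c, q in zip(color, p)))


def quantize_row(row, palette):
    """Quantize one row of RGB pixels with a simple one-neighbor error diffusion.

    The error made on each pixel (a 3D vector) is added to the next pixel
    before that pixel is mapped to the palette.
    """
    output = []
    error = (0.0, 0.0, 0.0)
    for pixel in row:
        # Add the carried error and clamp each component to 0..255.
        wanted = tuple(min(255.0, max(0.0, c + e)) for c, e in zip(pixel, error))
        chosen = nearest_palette_color(wanted, palette)
        output.append(chosen)
        error = tuple(w - c for w, c in zip(wanted, chosen))
    return output


black, white = (0, 0, 0), (255, 255, 255)
mid_gray_row = [(128, 128, 128)] * 4
print(quantize_row(mid_gray_row, [black, white]))
```

With a two-color palette, a uniform mid-gray row is rendered as alternating white and black pixels, so the average intensity of the row is roughly preserved even though n colors were reduced to m = 2.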

Significant work has been done in simulating color-deficient vision [HTWW07, ATAK07, HCJW09]. Given the ability to simulate color-deficient vision, it may be possible to recolor images in such a way that confused detail is restored for color-deficient observers, generating images that are more comprehensible to them. For example, in the CIEL*a*b* color space, protanopic and deuteranopic viewers have an extremely small response in the a* direction, while tritanopic viewers have a narrow response in the b* direction [RGW05a]. Based on this, an RGB image can first be converted to grayscale taking into account chromatic differences together with luminance information; the L* component of the CIEL*a*b* color space is then replaced by the grayscale image as luminance information. Finally, a mapping back can be done to reconstruct the color image, enhancing the b* layer for the protanopic and deuteranopic cases, and applying the opposite mapping for the tritanopic case.

3.2. Human perception-driven decolorization methods

For human perception, the visual appearance of a color image should be robustly reproduced in the grayscale image. This means that the conversion should preserve feature discrimination and a reasonable color ordering, while respecting the original luminance of colors [KJDL09].

Luminance is the usual choice for the gray values because it is the achromatic response to a color stimulus, measuring how light a color appears to be when compared to an equally light white. Also, human vision itself relies on luminance more than on anything else. However, at constant illumination, the lightness of a color increases with its saturation; this phenomenon is known as the Helmholtz-Kohlrausch effect (H-K effect), and has been reported to have a strong hue dependency. This means that while luminance is the dominant contributor to lightness perception, the chromatic component also contributes, and this contribution varies according to both hue and luminance. The H-K effect also explains why, given two isoluminant colors, the more colorful sample appears lighter, and why a chromatic stimulus with the same luminance as a white reference stimulus will appear lighter than the reference [SLTM08].

From the perceptual point of view, visual cues are measurements that can be extracted from a color image by a perceiver. They indicate the state of some properties of the image that the perceiver is interested in perceiving. Different people may perceive different cues from the same color image. Some authors have been proposing different cues, and examples of this are the color spatial consistency (CSC), the image structure information (ISI), and the color channel perception priority (CCPP) [STCLC10]. For example, some recoloring methods have been proposed for helping color blind people better perceive visual cues in color images.

According to Song et al. [STCBY13], although many representative methods have been proposed for C2G conversion, not all of them consider the distribution of important visual information from the point of view of the human visual system (HVS). It has been widely acknowledged that the HVS is usually sensitive to the important visual cues in an image and neglects the unimportant ones. They consider the visual attention consistency between the color image and the converted grayscale one to be a significant aspect. Inspired by the Helmholtz principle, which states that “we immediately perceive whatever could not happen by chance”, they propose a color-to-gray algorithm that addresses this problem by defining the chance of happening (CoH) to measure the attention level of each pixel in a color image based on natural image statistics. The CoH of a pixel represents how likely this pixel is to appear at a specific location in an image. It is mainly determined by two groups of priors: color statistics-based priors (CSP) and spatial correlation-based priors (SCP).

As an example, Figure 5 shows the masterpiece “Impression Sunrise” by Claude Oscar Monet (France, 1840-1926), which has been widely used for validating C2G algorithms. The C2G conversion methods used in this figure will be explained throughout this work. As can be seen in Figure 5, the feeling of sunrise in the original color image can easily be lost in the conversion to grayscale. Using the rgb2gray function of Matlab, which does not consider color contrast, can lead to the loss of the sun's reflection on the water and to a diminished sun brightness. Using the L* component of the CIEL*a*b* color space not only affects the sun brightness and its reflection on the water, but additionally changes the feeling of the image to fog. Using the method of Lu et al. [LXJ12] strongly enhances contrast while allowing the brightness to change aggressively, changing the feeling of the image to moonlight. Using the ApparentGrayscale method of Smith et al. [SLTM08] changes the feeling of the image to haze, affecting the clouds and the sun's reflection on the sky. Finally, using the method of Kim et al. [KJDL09] causes a loss of the separation between water and sky. Generally speaking, a C2G conversion that does not consider visual appearance can lead to an unnatural result.

illustration not visible in this excerpt

Fig. 5. Images of Monet’s masterpiece “Impression Sunrise”: a) original, and grayscale versions using b) the rgb2gray function of Matlab, c) L* of the CIEL*a*b*, d) RTCP [LXJ12], e) ApparentGrayscale [SLTM08], f) Kim09 [KJDL09].
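The loss of chromatic detail with a fixed weighted sum can be seen in a small sketch. It assumes the luminance weights 0.2989, 0.5870 and 0.1140 documented for Matlab's rgb2gray function; the Python function below is an illustrative re-implementation for single 8-bit pixels, not Matlab code, and the sample colors are chosen only for the demonstration.

```python
# Luminance weights for R, G and B as documented for Matlab's rgb2gray.
WEIGHTS = (0.2989, 0.5870, 0.1140)


def weighted_sum_gray(pixel):
    """Convert one 8-bit RGB pixel to an 8-bit gray level by a weighted sum."""
    return round(sum(w * c for w, c in zip(WEIGHTS, pixel)))


pure_red = (255, 0, 0)
dark_green = (0, 130, 0)

# Two clearly different colors collapse onto the same gray level,
# so any edge between them disappears in the grayscale image.
print(weighted_sum_gray(pure_red), weighted_sum_gray(dark_green))
```

This is the situation of isoluminant regions: the mapping is consistent and cheap, but color contrast that is not accompanied by a luminance difference is lost.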

Also, human perception tends to firstly group the perceptually similar elements while looking at an image, according to the Gestalt principles. The Gestalt school of psychology [Kin11, WSL12, JFWM15] proposed many theories to explain how humans naturally perceive the world. The essence of Gestaltism is that “the whole is greater than the sum of its parts”. The Gestaltists describe a number of “principles” that appear to guide the organization of elements into perceptual groups. Then, Gestalt principles are rules of the organization of perceptual scenes. These theories attempt to describe how people organize visual elements into groups or unified wholes when certain principles are applied. These principles include similarity, continuation, closure, proximity, figure, etc. According to the Gestalt principles, for an image, human perception tends to group pixels with similar color or texture while looking at it. Then, a group of pixels with similar color can be identified as a region of an object. For example, Wu et al. [WSL12] propose a two-scale approach for converting color images into grayscale ones. The image is represented as perceptual groups according to Gestalt principles. The global gray tone of the resulting grayscale image is determined by the averaged color of each group as well as its surrounding.

Furthermore, objects are high-level information, and it is not an easy task to identify or recognize objects in an image. The above-mentioned aspects of C2G conversion for human perception thus describe the difficulty of this task and explain both the great proliferation of recently published decolorization methods and the complexity of such algorithms.

A more challenging task from the human perceptual point of view is video decolorization. Video decolorization can be assumed as an extension of the image C2G conversion, but it is more complex due to the temporal coherence that needs to be preserved between adjacent frames; that is, video decolorization adds another dimension to the problem of image decolorization, as temporal coherence needs to be guaranteed for the entire video sequence [AAB11, AAHB11, DHSML15].

3.3. Color-to-grayscale conversion for intensity image processing

Diverse image processing systems and algorithms have been developed to operate only on intensity images because of their relatively high success and simplicity. That is the case for grayscale morphological processing algorithms, most thresholding procedures, texture segmentation, feature extraction, stereo vision, the Canny operator for edge detection, and others in which, later in the processing, the algorithm considers a single numerical value instead of a 3D value. For example, in many computer vision applications, C2G conversion algorithms are required to preserve the salient features of the color image, such as its brightness, contrast and structure. The aim is to convert the color image to a grayscale image in order to reduce complexity without compromising important details, while the visual appearance may not be the main concern. In this approach, the resulting grayscale images are then used for the subsequent processing steps, as most currently available image processing techniques are designed for grayscale images. Additionally, color image mosaicing, a procedure which condenses the content of many smaller images (the tiles) into a single bigger image (the mosaic), is usually accomplished using the grayscale versions of the tiles. The process applies intensity-based registration to drive the warping of each tile. Once the mosaic is created, the colors are restored [TU07]. Mosaicing is a common procedure to increase the field of view (FOV) in many imaging techniques, such as microscopy or remote sensing.

Decolorization, an apparently simple problem which aims at converting color images into grayscale images while preserving the structures and contrasts of the original color images, has recently received great attention. Mapping 3D color information onto a 1D grayscale image while preserving the original appearance, contrast, and finest details is not a trivial task. There is no general solution for this problem, because most of the existing algorithms cannot perform well in certain circumstances and the resulting grayscale images have low contrast and interpretability.

The solution substantially depends on the specific information that must be preserved when reducing a multi-component image. To retain features discriminately in RGB to gray conversion, color differences in the input image should be reflected as much as possible onto the converted grayscale values. Then, an important feature of C2G conversion for intensity image processing is the need that when two pixels have the same color in the color image, they will have the same gray level in the grayscale image. This constraint is not met by all C2G conversion methods, mainly those termed as local mapping.

Another important characteristic required of grayscale images to be digitally processed is a continuous mapping that reduces image artifacts, such as false contours in homogeneous color regions. Unfortunately, some C2G conversion methods which incorporate chrominance information into the luminance layer, to increase the differentiation between isoluminant regions with different chromaticity, can cause those artifacts. However, the success of a C2G conversion method used as a preprocessing stage in an image analysis system depends on the subsequent steps. For example, [KC12] compared the performance of 13 grayscale conversion algorithms in an image recognition application and concluded that recognition performance is tangibly affected by the grayscale conversion step. Also, [GKD15] showed the importance of choosing the right C2G conversion method for image classification. Correspondingly, [VV15] investigated the performance of C2G applied to medical color images of skin burns, skin melanoma and eardrum membrane (tympanic membrane), and [PMSJ10] for detection of nuclei in diversely stained cytology images.

Sometimes the C2G conversion methods are designed ad-hoc for the specific subsequent steps [MZT06, DAA07, TU07, LP09, PP13, MMDBBK16]. Specifically, [MZT06] demonstrated that pixel classification-based color image segmentation in color space is equivalent to performing segmentation on grayscale image through thresholding. Based on this result, they developed a supervised learning-based two-step procedure for color cell image segmentation, where color image was first mapped to grayscale via a transform learned through supervised learning, thresholding was then performed on the grayscale image to segment objects out of background. Another of those ad-hoc methods was introduced by [WR14] using color clustering and multidimensional scaling that preserves region integrity, that is, it preserves edge structure without low-frequency distortion; then, edges can be extracted from the resulting grayscale images and the results will be close to those obtainable from the original color image. Respectively, [PP13] developed a C2G conversion method to convert color documents to grayscale taking as criterion that the grayscale document must have locally uniform background, well separated characters from the background, and reduced noise.

As an example, Figure 6-a) shows one of the images of the Color250 database. Suppose that the green leaves, the pink flower petals, the yellow anthers, the reddish filaments, or the yellowish butterfly must be segmented from this image using a grayscale thresholding procedure. Then, the results of different C2G conversion methods need to be analyzed, as seen in the same figure, which uses some of the state-of-the-art methods (described in section 9).

As it can be seen in Figure 6-b) the flower anther and filaments can be hardly differentiated and also the leaves have the same gray shade. Figure 6-c) shows a low contrast between petals and leaves. Figure 6-d) shows a low contrast between flower petals and the butterfly, and also between flower anther, filament, and leaves. Figure 6-e) can hardly differentiate between leaves and flower petals, neither between the flower anthers and the butterfly, showing the latter a loss of contrast between its yellowish and white regions. Figure 6-f) shows also a loss of contrast between yellowish and white regions of the butterfly and between it and the flower anthers, likewise between leaves and flower petals. This example illustrates one typical problem of C2G conversion for digital image analysis based on its gray intensities.

illustration not visible in this excerpt

Fig. 6. a) Original image from Color250 database and C2G conversion utilizing: b) CPDecolor [LXJ14], c) RTCP [LXJ12], d) Decolorize [GD07], e) ApparentGrayscale [SLTM08], and f) Color2Gray [GOTG05].

3.4. Recoloring images

Although its primary purpose is to decolorize images, this process can also be particularly appropriate for several other important goals, such as image enhancement or image recoloring.

The main goal of image enhancement is to emphasize the image features for display and analysis. There are different procedures to enhance image features, depending on the goal of the enhancement; two of them are image recoloring and lightness contrast enhancement. Recoloring is the action of changing or modifying image coloration for different purposes, mainly for color constancy, for segmentation under uneven illumination, or to obtain a dichromatic version of the original full color image for human observers with color-deficient vision. As observed by Gooch et al. [GOTG05], substituting the luminance component with the appropriate grayscale conversion and recoloring back can yield more pleasant color and grayscale images for printouts. This approach was also used by Ancuti et al. [AAB11].

Lightness contrast enhancement (LCE) is a common procedure in which the image is first converted to a luminance-chrominance color space and the enhancement is then performed on the luminance (lightness) component. Lightness contrast enhancement can be performed using different procedures, such as histogram equalization or gamma correction. However, color-to-grayscale conversion using state-of-the-art procedures is also a worthy choice for this purpose.

Color constancy, also named color normalization or standardization, leads to the increase of color fidelity of images taken under variations of different aspects, affecting the true color of known objects in the image. This is caused usually by violations of protocols for image production, variations in capture parameters, or inappropriate image enhancements.

Image recoloring has also been applied to recover the original colors of the C2G transformed images; that is, a color-to-gray mapping that is reversible [QB06]. Actually, the method of these authors converts color images to grayscale textures transforming first the RGB image to the YCbCr color space. Then, its luminance component (Y) is divided in four subbands, each representing different spatial frequency contents, making use of the discrete wavelet transform.

4. Color formation during image production

The tri-chromatic theory describes the way three separate lights, red, green and blue, can match most of the visible colors based on the eye’s use of three color sensitive sensors. This is the basis on which photography and printing operate, using three different colored lights or dyes to reproduce color in a scene. It is also the way that most computer color spaces operate, using three parameters to define a color.

Digital CCD (charge-coupled device) or CMOS (complementary metal-oxide-semiconductor) cameras used to acquire color photographs produce color images in the RGB color space (cube) shown in Figure 7 (left). The RGB color space is defined as a primary space. In the RGB cube, the origin point 0 corresponds to the black color (R = G = B = 0) whereas the reference white is defined by the additive mixture of equal quantities of the three colors (R = G = B = 1). The straight line joining the black and white corners is called the gray axis, the neutral color axis or the achromatic axis. Physically, the formation of color in the RGB color space is additive, as shown in Figure 7 (right); this means that each color is an additive contribution of the three lights. For this reason, the characteristics of the light and the material which constitutes the observed object determine physical properties of its color.

A primary color space depends on the choice of a set of primary colors. Ratios fixed for each primary are defined in order to reproduce a reference white (white point) by the additive mixing of the same amount of each primary. The choice of the primary depends on the device used, and the reference white is generally a CIE illuminant [Ade00, PV00].

illustration not visible in this excerpt

Fig. 7. RGB color space represented as a cube (left) and additive mixture of R, G, and B components in colors formation (right). Figures from Wikipedia.

Because the R, G, and B color components are not invariant to the acquisition system, a calibration step is necessary. The problem is that several different aspects influence color formation, and most of the time the acquisition conditions are not specified. For example, the CIE defined the CIE-RGB color space denoting each axis by R_C, G_C, and B_C (where C indicates ‘CIE’), setting the primary monochromatic red, green, and blue wavelengths at 700.0 nm, 546.1 nm, and 435.8 nm respectively, with the equal energy illuminant E as the reference white. A first problem is that light sources, such as incandescent lamps, do not emit those wavelengths with similar energy. The CIE-RGB color space can be considered as the reference color space because it defines a standard observer whose eye spectral response approximates the average eye spectral response of a representative set of human observers.

Common digital cameras have a single-chip sensor array with three interlaced color filters (Figure 8), named the color filter array (CFA). Thus, the photosensitive receptors laid sequentially on the same line of the sensor are fitted with red, green, and blue filters successively. So, the color information of each pixel in the image is obtained from receptors located at different places in its neighborhood. This technology causes a loss of resolution and chromatic artifacts (false colors). To estimate the color of each pixel in the image, an interpolation technique called demosaicking is required; consequently, the color of each pixel of the image is defined by three components. Generally, each of these three components (R, G, and B) is coded on 8 bits and can take 256 different unsigned integer values in the interval [0, 255]. A color is thus coded with 3 x 8 = 24 bits, and it is then possible to represent 2^24 = 16 777 216 colors by additive mixture, whereas the human visual system can distinguish nearly 350 000 colors. However, the human visual system is not uniform and, for specific intervals, it is necessary to code the color component on 12 bits in order to discriminate all the colors that the human visual system can perceive on these intervals.

illustration not visible in this excerpt

Figure 8. Color filters array (CFA) in a single chip sensor. Figure from Wikipedia.

This is the reason why one of the most elemental ways of C2G conversion is to retain the green component of the RGB image and neglect the red and blue.

The tri-stimulus values of a RGB color stimulus depend on its luminance. Two different color stimuli can be described by the same chromatic characteristics, but their tri-stimulus values can be different due to their luminance. In order to obtain color components which do not depend on the luminance, it is necessary to normalize their values.
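The normalization mentioned above can be sketched as follows. This is a minimal illustration, assuming the usual definition r = R / (R + G + B) (and likewise for g and b); the function name `normalize_rgb` and the guard value `eps` are hypothetical choices made for this sketch, not part of any cited method.

```python
import numpy as np

def normalize_rgb(rgb, eps=1e-12):
    """Return luminance-independent coordinates (r, g, b) with r + g + b = 1."""
    rgb = np.asarray(rgb, dtype=float)
    total = rgb.sum(axis=-1, keepdims=True)
    # Black pixels have no defined chromaticity; eps only avoids division by zero.
    return rgb / np.maximum(total, eps)

# The same color stimulus at two intensities (assumed linear RGB values in [0, 1]).
bright = normalize_rgb([0.8, 0.4, 0.2])
dark = normalize_rgb([0.4, 0.2, 0.1])
```

Both stimuli map to the same normalized coordinates, which is the sense in which the normalized components no longer depend on luminance.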

The RGB color spaces depend on the primaries used to match colors by additive mixture and on the reference white used; so, they are device-dependent color spaces. For example, in color image analysis applications where the images are acquired by digital cameras, the color of each pixel is defined by three numerical components which depend on the acquisition device used and on its setup to obtain a reference white. Thus, an image acquired under the same lighting and observation conditions by two different cameras gives rise to different colors if the primary colors associated with the two cameras are not the same [BVM07].

In this way, the RGB color space presents some major drawbacks:

- Because it is not possible to match all the colors by additive mixture with a real primary space, the tri-stimulus values and chromaticity coordinates can be negative.
- The tri-stimulus values depend on the luminance which is a linear transformation of the R, G, and B color components.
- Because RGB color spaces are device dependent, there is a multitude of RGB color spaces with different characteristics.

For these reasons the CIE defined the imaginary primary space named the CIE-XYZ color space [SL05], where the primary colors are imaginary (virtual or artificial) in order to overcome the problems of the primary spaces. In this space, the X, Y, and Z primaries are not physically realizable, but they have been defined so that all the color stimuli are expressed by positive tri-stimulus values and so that one of these primaries (Y) represents the luminance component. Because all the RGB color spaces can be converted to the XYZ color space by linear transforms, this space is a device-independent color space. It defines an ideal observer, the CIE 1931 standard colorimetric observer, and all the colorimetry applications are based on it. Chromaticity coordinates can be deduced from the XYZ color space in order to obtain a normalized XYZ color space denoted x, y, and z. Thus, the chromaticity coordinates x, y, and z are derived from the X, Y, and Z tri-stimulus values by equation (5) [SMI05].

$$x = \frac{X}{X + Y + Z}, \qquad y = \frac{Y}{X + Y + Z}, \qquad z = \frac{Z}{X + Y + Z} \tag{5}$$

As x + y + z = 1, z can be deduced from x and y. For the CIE illuminants A (light emitted by a black body radiator at the temperature of 2856 K, or incandescent lamp), C (average daylight), D65 (daylight at 6504 K), D50 (daylight at 5000 K), F2 (fluorescent lamp), and E (light of equal energy), Table 1 gives the x, y, and z chromaticity coordinates of these illuminants [Ras05, BVM07].

Table 1. x, y and z chromaticity coordinates of different illuminants and their correlated color temperatures T.

illustration not visible in this excerpt
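As a small worked check of the chromaticity normalization, the sketch below applies x = X / (X + Y + Z) (and likewise for y and z) to the white point of illuminant D65. The tri-stimulus values used (X = 0.95047, Y = 1, Z = 1.08883, for the CIE 1931 2° observer with Y normalized to 1) are commonly tabulated values assumed here; they are not taken from Table 1, which is not visible in this excerpt.

```python
def xyz_to_chromaticity(X, Y, Z):
    """Chromaticity coordinates (x, y, z) of a stimulus with tri-stimulus values X, Y, Z."""
    total = X + Y + Z
    return X / total, Y / total, Z / total

# Assumed D65 white point (CIE 1931 2-degree observer, Y normalized to 1).
x, y, z = xyz_to_chromaticity(0.95047, 1.0, 1.08883)
print(round(x, 4), round(y, 4), round(z, 4))
```

The result should be close to the familiar D65 chromaticity (x ≈ 0.3127, y ≈ 0.3290), and z follows from 1 - x - y.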

If RGB is the main color space for sensed colors in digital images, the LMS color space is the main color space for sensed colors in the human eyes. Thus, color deficiency viewers with the three types of dichromacy mentioned before can be simulated by converting the “normal” RGB color space into its dichromatic versions using the LMS color space [ATAK07]. For example, normal viewers sensed stimuli can be calculated by converting each pixel in the RGB color space to the LMS color space by,

illustration not visible in this excerpt

then, for protanopic viewers the L, M, and S sensed stimuli can be reduced to L_p, M_p, and S_p by equation (7),

illustration not visible in this excerpt

Finally, to show the simulated protanopic deficiency on an RGB monitor, the R_p, G_p, and B_p components can be calculated by equation (8).

Figure 9 shows a simulation performed by Anagnostopoulos et al. [ATAK07] of the colors perceived by a protanope (a person suffering from protanopia). Note that reddish tones are confused with black. Similar transformations can simulate the deuteranopic and tritanopic deficiencies.

illustration not visible in this excerpt

Fig. 9. Simulation of the colors perceived by a protanope. Images taken from [ATAK07].

5. Classical color-to-grayscale conversion methods

The notion of “classical” methods in this work refers to those methods for C2G conversion that were developed during the twentieth century and are still in use by some software or applications. Those methods include the weighted sum of the R, G, and B values, both in linear and nonlinear forms, methods using the luminance component of some other color spaces, and methods based on data analysis.

5.1. The weighted sum methods

The most common of the classical methods for C2G conversion is the first-order linear model implemented as a weighted sum of the R, G, and B intensities of each pixel, named here the “direct” method. In the direct method the lightness information is retained and the hue and saturation information is discarded, converting RGB values to grayscale values by [BB04], [BE04]:

$$\mathrm{Gray} = aR + bG + cB \tag{9}$$

Because, for an equal amount of color, the human eye is most sensitive to green, followed by red and finally by blue, the most usual selection of coefficients is the one established by ITU-R Recommendation BT.601, setting the weight values a = 0.2989360213, b = 0.5870430744, and c = 0.1140209042, as in the luminance signal of the television formats known as NTSC (National Television System Committee) utilized in America, PAL (Phase Alternating Line) utilized in part of Europe, and SECAM (Séquentiel Couleur à Mémoire) utilized in France and other parts of Europe. The above weighted sum is also used by the function rgb2gray of Matlab and by the Postscript standard. This direct C2G conversion may produce mediocre images for visual observation. Moreover, these grayscale images are not tailored for digital image processing purposes, such as classification, because the intention of NTSC is not to obtain discriminative images [GKD15]. Actually, the NTSC weighted sum is the luminance component of the YUV color space.

There are other choices of weights for equation (9), such as ITU-R Recommendation BT.709, used in the high definition television (HDTV) standard, where the contribution of the green component is more predominant than in BT.601, with a = 0.2126, b = 0.7152, and c = 0.0722, or the “averaging conversion”, in which the weights assigned to the R, G, and B components are all the same, with a = b = c = 1/3. Note that in all these cases the three coefficients are non-negative and sum to one, so that R = G = B = 1 yields Gray = 1; that is, a positivity constraint and an energy conservation constraint on the weights can be enforced so that the grayscale image stays within the range [0, 1]. The two constraints can be written as [LXJ12]

$$a + b + c = 1, \qquad a \geq 0,\; b \geq 0,\; c \geq 0 \tag{10}$$

These constraints also serve a second purpose: a neutral color would have the same intensity after C2G conversion.
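The direct method and its weight constraints can be sketched in a few lines. This is only an illustration under stated assumptions: the image is a floating-point array with values in [0, 1] and the channels in the last axis, and the names `WEIGHTS` and `rgb_to_gray` are hypothetical, chosen for this sketch.

```python
import numpy as np

# Weight triplets (a, b, c) quoted in the text.
WEIGHTS = {
    "bt601": (0.2989360213, 0.5870430744, 0.1140209042),
    "bt709": (0.2126, 0.7152, 0.0722),
    "average": (1 / 3, 1 / 3, 1 / 3),
}

def rgb_to_gray(image, weights):
    """Weighted sum Gray = a*R + b*G + c*B for an array of shape (..., 3)."""
    w = np.asarray(weights, dtype=float)
    # Positivity and energy conservation constraints on the weights.
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return np.asarray(image, dtype=float) @ w

neutral = np.array([[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]])
grays = {name: rgb_to_gray(neutral, w) for name, w in WEIGHTS.items()}
```

With any of the three weight sets, the neutral pixels keep their intensity (0.5 and 1.0), which is the second purpose of the constraints noted above.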

Moreover, for specific applications, the weights can be adapted ad-hoc. For example, in [LP09], for purposes of face detection, the authors set a = 0.85, b = 0.10, and c = 0.05 because of the higher content of the red component in human skin. This selection yields a stronger grayscale signal for faces while attenuating other signals which could be considered noise. In [MZT06], for cell segmentation in images of tumors, the authors found that the optimum coefficients for the dataset of background and nuclei pixels of the studied images, obtained by maximizing the Fisher ratio of class separability, were a = 0.219, b = 0.582, and c = 0.200. Note the similarity between these coefficients and the ones for the luminance in the YUV color space; however, this learning-based method learns the transform from images of the specific application domain in the context of machine-based pattern classification and is, therefore, expected to produce grayscale images leading to better segmentation results. Additionally, [LI12] calculates the weights adaptively for each image according to the contribution of each color in their Adaptive Color to Grayscale (ACGS) conversion algorithm.

It is also common to apply equation (9) in a nonlinear way, i.e. gamma-corrected; then

illustration not visible in this excerpt

This approach controls the produced luminance nonlinearly. For example, the CIE Illuminant C utilizes γ_R = γ_G = γ_B = 2.2 with the same coefficients as for the luminance of the YUV color space. However, the weighted sums of equations (9) and (11) share a persistent problem: colors with a small lightness difference but large differences in chrominance (hue and saturation) will be indistinguishable in the grayscale image.

Although the weighted sum methods (linear, averaging or gamma corrected) generally work well and are able to produce promising results of grayscale images, they will fail in certain cases, for example in images with single dominant color, low illumination color, and colors with similar luminance (but different hue). Single dominant color image refers to color image with the pixel distribution strongly biased to certain color components, while low illumination color image refers to color image with pixel distribution concentrated at low intensity levels. The resultant grayscale images produced have low luminance, contrast, and amount of details revealed, because the weight contribution that is assigned to R, G, and B components in these methods are fixed, regardless of the type of input color images used [LI11]. Moreover, as the distinction between two colors of similar luminance (but different hue) is lost, the contrast for important image features can disappear during C2G conversion.
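The loss of contrast between colors of similar luminance can be illustrated with a small constructed example. The two colors below are hypothetical, chosen for this sketch so that the BT.601 weighted sum gives them exactly the same gray level even though they are far apart in the RGB cube.

```python
import numpy as np

a, b, c = 0.2989360213, 0.5870430744, 0.1140209042  # BT.601 weights from the text
w = np.array([a, b, c])

red = np.array([1.0, 0.0, 0.0])
# A green whose intensity is scaled so that b * G equals a * 1.
green = np.array([0.0, a / b, 0.0])

gray_red = red @ w
gray_green = green @ w
color_distance = np.linalg.norm(red - green)  # Euclidean distance in RGB
```

Here `gray_red` and `gray_green` coincide, so an edge between a red and a green region of these colors would vanish in the grayscale image, although `color_distance` is larger than the side of the RGB cube.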

Qiu et al. [QFQ08] created a parameter vector u = [a, b, c] and defined a covariance matrix K of the red, green, and blue components of the input color image. Then, in order to retain the maximum amount of information from the color image F in the grayscale image J, the variance of J, given by uKu^T, is maximized. However, the resulting grayscale image may not be visually pleasing when only the variance of J is maximized. Similar results can be obtained using the Euclidean distance between the R, G, and B components [MMDBBK16].
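The variance maximization idea can be sketched as follows. This is not the procedure of [QFQ08] itself: the sketch assumes a unit-norm constraint on u, under which maximizing uKu^T reduces to taking the principal eigenvector of K (the first principal component). The function name `max_variance_weights` and the random test image are hypothetical.

```python
import numpy as np

def max_variance_weights(image):
    """Return the unit vector u maximizing u K u^T, and the covariance matrix K."""
    pixels = np.asarray(image, dtype=float).reshape(-1, 3)
    K = np.cov(pixels, rowvar=False)       # 3 x 3 covariance of R, G, B
    eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
    u = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    if u.sum() < 0:                        # fix the sign ambiguity of eigenvectors
        u = -u
    return u, K

# Hypothetical test image whose red channel carries most of the variation.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3)) * np.array([1.0, 0.5, 0.1])
u, K = max_variance_weights(image)
gray = image.reshape(-1, 3) @ u            # grayscale values J, before any rescaling
```

Note that the resulting weights need not be non-negative or sum to one, so the output generally has to be rescaled to a displayable range; this is one reason why such an image may not be visually pleasing.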

The success of RGB to grayscale conversion is mostly subjective. Figure 10 shows three synthetic color images in RGB space (second row) and their color coordinates in the RGB cube (upper row). To compare the ability of the direct procedure to convert from RGB to grayscale using these synthetic images, the histogram of the obtained grayscale image has been included (bottom row). The expectation is that if the five colors in each synthetic image are perceptually different, then in the histogram of the grayscale version the five gray levels must also be separated.

Perceptually, the grayscale version of image at the left column in Figure 10 shows a great similitude in gray levels between its gray background (the most populated pixels in the histogram) and the orange block at the upper-right corner, indicating a poor ability to differentiate between these two colors in the image. The color image at the central column in Figure 10 has very different coordinates in the RGB cube as compared with the image at the left column. Both, the grayscale version and its histogram show a great differentiation between the gray background and the color of the four blocks; nevertheless, the differentiation between the colors of the blocks at the upper row and between the ones at the lower row is poor, as can be seen both perceptually and in the histogram on the central column. The color image at the right column in Figure 10 also has different coordinates in the RGB cube compared with the other ones, moreover the background has no gray color, but its grayscale version reveals a behavior similar to the image at the left column in relation with the differentiation between background and the block at the upper-left corner.

illustration not visible in this excerpt

Fig. 10. RGB to grayscale direct converted using equation (9) with NTSC weights. From top to bottom: coordinates in the RGB cubes, original RGB images, grayscale images, and their histogram.

In images from the real world (non-synthetic), each color component expands within a volume in the RGB cube; additionally, images are contaminated with a small amount of noise, then, the grayscale histogram obtained by the color-to-grayscale conversion cannot show the disjoint lines profile obtained in Fig. 10 inhibiting the right conversion.

Despite the weakness of the direct conversion, it is the basis of some current state-of-the-art C2G conversion methods. For example, Song et al. [SBXY13] improved the rgb2gray function of Matlab. Instead of assigning fixed component weights for all images, they found a more flexible strategy, choosing component weights depending on the specific image to avoid indiscrimination in isoluminant regions, based on bilateral filtering of the original color components. Also, Lu et al. [LXJ12], in their real-time contrast preserving (RTCP) decolorization, discretize the solution space of a linear parametric model into 66 candidates and then identify the candidate achieving the smallest gradient-error-based energy value as the optimal solution; this is currently the fastest algorithm. Similarly, Song et al. [STCBY13], based on the rgb2gray function, researched multi-scale contrast preservation on the discretized candidate images by taking advantage of the (joint) bilateral filter. Likewise, Liu et al. [LLWL16] make use of the rgb2gray function of Matlab as the initialization of their advanced SPDecolor method, and [LLWD16] revisited it in their two-stage parametric subspace (TPS) method, motivated by the previous impressive performance, extending the parametric discrete searching technique from the linear model to a second-order multivariate polynomial model.
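The count of 66 candidates is consistent with sampling each weight in steps of 0.1 under the constraints of equation (10), which the sketch below enumerates. The step of 0.1 is an assumption about how the space is discretized, and the energy function used here is only a simple stand-in (the gap between gray and color differences over random pixel pairs), not the gradient error energy of [LXJ12]. All function names are hypothetical.

```python
import numpy as np

def candidate_weights(steps=10):
    """All (a, b, c) on a grid of 1/steps with a, b, c >= 0 and a + b + c = 1."""
    return [(i / steps, j / steps, (steps - i - j) / steps)
            for i in range(steps + 1)
            for j in range(steps + 1 - i)]

def contrast_error(pixels, w, pairs):
    """Stand-in energy: mean squared gap between gray and color differences."""
    diff = pixels[pairs[:, 0]] - pixels[pairs[:, 1]]
    color_diff = np.linalg.norm(diff, axis=1)
    gray_diff = np.abs(diff @ np.asarray(w))
    return float(np.mean((gray_diff - color_diff) ** 2))

def best_candidate(image, n_pairs=256, seed=0):
    """Exhaustive search over the discrete candidates for the lowest energy."""
    pixels = np.asarray(image, dtype=float).reshape(-1, 3)
    rng = np.random.default_rng(seed)
    pairs = rng.integers(0, len(pixels), size=(n_pairs, 2))
    return min(candidate_weights(),
               key=lambda w: contrast_error(pixels, w, pairs))

candidates = candidate_weights()
example = np.random.default_rng(1).random((16, 16, 3))  # hypothetical test image
chosen = best_candidate(example)
```

Because the search space is this small, an exhaustive evaluation of every candidate is cheap, which helps explain why a discrete search of this kind can run in real time.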

An alternative way to calculate image luminance is to use a Minkowski-norm based luminance measure [DCFB09]. In this approach, the Minkowski luminance image (L_p) is the image whose value at each pixel is given by the Minkowski norm of the color vector at that pixel,

illustration not visible in this excerpt

where f_i is the individual color component (K = 3). As p increases, the p-norm approaches the maximum over the channels. Figure 11 shows Monet’s painting “Impression sunrise”, with its L_1 and L_5 Minkowski luminance images in parts b) and c) respectively. Note that the sun tends to disappear when an L_1-type luminance is displayed, whereas the sun comes to the foreground using L_5-type luminance.

illustration not visible in this excerpt

Fig. 11. Minkowski-norm based luminance measure of Monet’s painting “Impression sunrise”. a) original RGB image, b) L_1-type luminance, and c) L_5-type luminance. Images taken from [DCFB09].

The Minkowski luminance image with p = 2; that is, the Euclidean luminance image, was reported by [PMSJ10] in the CIEL*a*b* color space, to obtain the grayscale version of stained cytology images.
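The behavior of the Minkowski luminance for different values of p can be sketched as follows. Since the defining equation is not visible in this excerpt, the sketch assumes the plain, unnormalized form (sum of f_i^p)^(1/p) with non-negative channel values; [DCFB09] may include a normalization factor. The function name `minkowski_luminance` is hypothetical.

```python
import numpy as np

def minkowski_luminance(image, p):
    """Per-pixel Minkowski p-norm over the last (channel) axis; assumes values >= 0."""
    f = np.asarray(image, dtype=float)
    return np.sum(f ** p, axis=-1) ** (1.0 / p)

pixel = np.array([[0.9, 0.3, 0.1]])       # one reddish pixel
l1 = minkowski_luminance(pixel, 1)        # sum of the channels
l2 = minkowski_luminance(pixel, 2)        # Euclidean luminance
l50 = minkowski_luminance(pixel, 50)      # close to the largest channel
```

For large p the value is dominated by the strongest channel, which is consistent with the observation that a saturated object such as the sun in Figure 11 stands out more with L_5 than with L_1.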

The analysis done using figures 10 and 11 reveals the main question of this study: how much different can the colors in the RGB source be in a grayscale image? Come to think about it, any contrast enhancement or contrast stretching technique, applied to the grayscale image, can rescue the loss of gray differences.

5.2. Color-to-grayscale conversion based on color spaces

A color space is a method by which we can specify, create and visualize colors, as shown for RGB, the main color space for sensed colors in digital images. Consequently, in a color space, each color is usually specified using three coordinates, or parameters. These parameters describe the position of the color within the color space being used. Translation from RGB to other color spaces is critical for color feature extraction and data redundancy reduction.

As a color can be expressed by its lightness and chromaticity, in color science literature there are many standard color spaces that serve to separate luminance information (grayscale image) from hue and saturation (chrominance information). These color spaces can be categorized in the family of luminance-chrominance spaces. Standard examples include: YUV, CIEL*a*b*, HSV, HSL, YIQ, YCbCr, etc. [SL05]. A propos, in many of computer vision applications, color spaces other than the RGB can be more appropriate to obtain reasonable results. However, the luminance obtained from each of these color spaces is different, then the RGB to grayscale transformation can be accomplished through them with different results. The components of a luminance-chrominance space are derived from the component of a RGB color space by linear or nonlinear transformation, which depends on the kind of space.

The L* component of CIEL*a*b* color space

One of the most utilized color spaces for C2G conversion is CIEL*a*b*, also known as CIELAB [CHD04, SL05, BVM07, CHRW09, PMSJ10, WSL12]. It was introduced in 1976 by the CIE as a device-independent color space designed to be approximately perceptually uniform, meaning that colors which are visually similar are close to each other in the color space (if the Euclidean distance is used).

The CIEL*a*b* space consists of a lightness component L*, a chromaticity component a* indicating where the color falls along the red-green axis, and a chromaticity component b* indicating where the color falls along the blue-yellow axis, as can be seen in Figure 12. Because of the red-green and blue-yellow oppositions, CIELAB is termed an antagonist (opponent) color space. All of the color information is in the a* and b* layers.

illustration not visible in this excerpt

Fig. 12. Spherical representation of the CIEL*a*b* color space. Figure adapted from [YBW16].

The transformation from RGB to CIEL*a*b* is nonlinear [BVM07] and is accomplished indirectly through the ideal XYZ color space. Then, the lightness component in the CIELAB color space models the nonlinear human eye response to a level of luminance. Before doing the transformation from RGB to XYZ, one needs to specify the coordinates of the three primary color stimuli and of the white point (the nominally white object-color stimulus) in the XYZ space. The primary color stimuli depend on the hardware characteristics of the image capture device (i.e. the camera). The white point is usually specified by the spectral radiant power of one of the CIE standard illuminants, such as D65 (daylight) or A (tungsten filament lamp), reflected into the observer’s eye by the perfect reflecting diffuser. If the illumination conditions used when acquiring the image is known, then the specification of the white point is simple. If the illumination conditions are unknown, a hypothesis can be made, or a technique for estimating the white point from the image can be used. Then, the transformation from an RGB cube completely filled with points does not fill the L*a*b* space, but produces a gamut of colors whose shape depends on the primaries and white point.

Since the lightness information is in the L* component, it can be taken as the grayscale image. This is shown in Figure 13 for the same three synthetic RGB images shown in Figure 10. When comparing the histograms in Figure 13 with the ones in Figure 10, a better color differentiation is perceptible using the L* component for these three synthetic images.

illustration not visible in this excerpt

Fig. 13. RGB to grayscale conversion of the three color images in Figure 10 using the L* layer of CIELAB color space. Grayscale image (second row) and histogram of the grayscale image (bottom row).
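The path from RGB to the L* component can be sketched as follows. As the text notes, the primaries and the white point must be specified first; this sketch assumes sRGB-encoded input in [0, 1] with the D65 white point, in which case the relative luminance Y is obtained with the BT.709 weights after linearization and Y of the white equals 1. The function name `srgb_to_lightness` is hypothetical, and images from other devices would need their own primaries and white point.

```python
import numpy as np

def srgb_to_lightness(rgb):
    """CIE L* (0 to 100) of sRGB values in [0, 1], via the Y component of XYZ."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB transfer function to obtain linear-light values.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Relative luminance Y (second row of the sRGB-to-XYZ matrix, D65 white).
    Y = linear @ np.array([0.2126, 0.7152, 0.0722])
    # CIELAB nonlinearity, with its linear segment near black.
    delta = 6.0 / 29.0
    f = np.where(Y > delta ** 3, np.cbrt(Y), Y / (3 * delta ** 2) + 4.0 / 29.0)
    return 116.0 * f - 16.0

samples = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]])
lightness = srgb_to_lightness(samples)
```

Black and white map to L* = 0 and L* = 100, while the sRGB mid-gray lands slightly above 53, reflecting the nonlinear response that L* is meant to model. Dividing L* by 100 gives a grayscale image in [0, 1].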

Zhao and Tamimi [ZT10] extended the use of the CIELAB color space by introducing a spectral approach that works in Fourier space. This global, image-dependent mapping procedure first computes the Fourier transforms of the lightness component (L*) and of the two chromatic components (a* and b*) of the image, respectively. The results, L, a, and b, are directly related to the spatial rates of intensity change at all spatial scales. Each frequency spectrum (i.e. magnitude) inherently reflects the contrast level at each corresponding scale, among the three components of the image. A luminance-mapping grayscale image can be recovered from an inverse Fourier transform of L.

Unfortunately, no perfectly uniform color space has been discovered yet, but CIELAB is one attempt at it. Other approximations to a uniform color space are CIECAM02 [MFHLLN02] and LAB2000HL [LU12], where the perceptual uniformity generally holds for small color differences. However, there are other perceptual luminance-chrominance spaces, composed of a luminance component and two chrominance components. This perceptual approach allows adequate communication between human and machine for describing colors. There are many such color spaces, like HSL, HSB, YUV, YIQ… For these spaces the equations that determine the luminance component are different, so the component has a different meaning in each.

The L component of the HSL color space

In image processing systems, it is often convenient to specify colors in a way that is compatible with the hardware used. The different variants of the RGB monitor model address that need. Although these systems are computationally practical, they are not useful for user specification and recognition of colors. The user cannot easily specify a desired color in the RGB model. On the other hand, perceptual features, such as perceived luminance, saturation and hue, correlate well with the human perception of color. Therefore, a color model in which these color attributes form the basis of the space is preferable from the users' point of view [PV00]. In the HSL color space (also named HLS, or HSI when the third component indicates intensity), hue indicates the color sensation of the light, in other words, whether the color is red, yellow, green, cyan, blue, magenta... This representation looks almost the same as the visible spectrum of light. Saturation indicates the degree to which the hue differs from a neutral gray. The values run from 0%, which is no color, to 100%, which is the fullest saturation of a given hue at a given percentage. Lightness is intuitively what its name indicates, according to the definition in section 2.2. Varying the lightness reduces the values of the primary colors while keeping them in the same ratio.

The HSL color space is a nonlinear deformation of the RGB color cube in Figure 7. One of its tridimensional representations is a sphere, as shown in Figure 14, where:

- Longitude α (east to west) rotation, changes in hue. H ∊ [0, 360°], where typically 0° is red, 60° yellow, 120° green, 180° cyan, 240° blue, and 300° magenta. Hue works circularly, so it can be represented on a circle (a hue of 360° looks the same again as a hue of 0°).

- Radial r (center to surface) increase, changes in saturation. S ∊ [0, 1], or sometimes S ∊ [0, 100%], where 0 is gray (gray has no saturation), and 1 the pure primary color according to the hue.

- Latitude θ (south to north) elevation, changes in lightness from black (south) to white (north).

The HSL color space owes its usefulness to two principal facts. First, the luminance component is decoupled from the chrominance information (hue and saturation). Second, the hue and saturation components are intimately related to the way in which humans perceive chrominance. Hence, these features make the HSL an ideal color model for image processing applications where the chrominance is of importance rather than the overall color perception (which is determined by both luminance and chrominance). One example of the usefulness of the HSL model is color image histogram equalization performed in the HSL space to avoid undesirable shifts in image hue [PV00].

Fig. 14. Spherical representation of the HSL color space. Figure from Wikipedia.

illustration not visible in this excerpt

Actually, HSL type color spaces are deformations of an RGB color cube. If you imagine the RGB cube tilted onto its black corner, then the line through the cube from black to white defines the lightness axis. The color is then defined as a position on a circular plane perpendicular to the lightness axis. Hue is the angle from a nominal point around the circle to the color, while saturation is the radius from the central lightness axis to the color. For this reason, the HSL family is also named the "desaturation" model. Desaturating an image works by forcing the saturation to zero. Basically, this takes a color and converts it to its least-saturated variant. A pixel can be desaturated by finding the midpoint between the maximum of (R, G, and B) and the minimum of (R, G, and B). The RGB to HSL conversion algorithms, originally defined as in [PV00] and used in segmentation procedures as in [KBK07], start by setting the variables H ∊ [0, 360°] and S, L, R, G, B ∊ [0, 1]. By the way, Gonzalez, Woods and Eddins [GWE04] named HSI luminance the result of L = ⅓(R + G + B), which is the averaging method of equation (9).
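
The two achromatic components mentioned in this paragraph can be compared with a short sketch. The function names are hypothetical and the RGB components are assumed to lie in [0, 1].

```python
def hsl_lightness(r, g, b):
    """Desaturation: midpoint between the largest and smallest RGB component."""
    return 0.5 * (max(r, g, b) + min(r, g, b))

def hsi_intensity(r, g, b):
    """Averaging: plain mean of the three RGB components."""
    return (r + g + b) / 3.0

# Pure red and pure yellow share the same HSL lightness but not the same average.
for name, rgb in [("red", (1.0, 0.0, 0.0)), ("yellow", (1.0, 1.0, 0.0)), ("gray", (0.5, 0.5, 0.5))]:
    print(name, hsl_lightness(*rgb), round(hsi_intensity(*rgb), 3))
```

For a gray pixel both measures coincide, while for saturated colors they can differ noticeably.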

Figure 15 shows changes in HSL lightness (L) and saturation (S) with different hue (H) values. It is evident that different hue and saturation values can have the same lightness (left part of the figure), so in an RGB-to-grayscale conversion using HSL lightness, different H and S values will produce similar gray shades.

illustration not visible in this excerpt

Fig. 15. Changes in HSL lightness (L) and saturation (S) with different hue (H) values.

Figure 16 shows the three RGB images of Figure 10 converted here to grayscale using the L component of the HSL color space with their respective histograms. The grayscale versions cannot show a satisfactory conversion from the perceptual point of view for the three images. Also, the associated grayscale histograms show an insufficient differentiation for the five colors in each image.

illustration not visible in this excerpt

Fig. 16. Grayscale conversion of the RGB images in Figure 10 using the L component of the HSL color space (second row) with their respective histograms (bottom row).

A variant of the HSL color space is the Improved HLS (IHLS) where saturation was normalized and proved to be independent of the luminance. IHLS color space forms a more independent subsystem which would give a more stable foundation for analysis of perception contrast and it has been widely applied, including C2G conversion [ZHL13].

The B component of the HSB color space

The HSB (Hue, Saturation, and Brightness) color space, also known as HSV (with Value), was created in 1978 as a generalized form of the HSL model, and is frequently used in digital image processing. This user-oriented color space is based on the intuitive appeal of the artist's tint, shade, and tone. As can be seen in Figure 17, the three constituent components are the hue H (with the same meaning as in the HSL color space), the saturation S of the color (similar to that in HSL), and the brightness of the color B (which ranges from 0 for black to 1 for the maximum brightness).

illustration not visible in this excerpt

Fig. 17. Cone representing the HSB color space. Figure from Wikipedia.

Briefly, hue, saturation, and value indicate the quality by which we distinguish one color family from another, a strong color from a weak one, and a light color from a dark one, respectively. The saturation is sometimes called the "purity" of the color. The lower the saturation, the more "grayness" is present and the more faded the color will appear.

The HSB color space is also a nonlinear transformation of the RGB color space and, like RGB, it is device dependent. On the other hand, whereas RGB is an additive color space, HSB encapsulates information about a color in terms that are more familiar to humans: What color is it? How vibrant is it? How light or dark is it? Figure 18 shows changes in HSB brightness (B) and saturation (S) with different hue (H) values, where it is also evident that different hue and saturation values can have the same brightness (left part of the figure), so in an RGB-to-grayscale conversion using HSB brightness, different H and S values will produce similar gray shades.

illustration not visible in this excerpt

Fig. 18. Changes in HSB brightness (B) and saturation (S) with different hue (H) values.

Table 2 shows the equivalence between some colors in the RGB, HSL, HSB, and CIEL*a*b* color spaces. In this table, it can be seen that when at least one RGB component has value one and at least one other has value zero, the numerical value of the lightness in the HSL color space is 50% of the B value in the HSB color space. This is because the brightness in the HSB space is given by the brightest RGB component, so all pure lights (R, G, or B) have a brightness of 100%. In contrast, the lightness in the HSL color space is L = ½(MAX + MIN) if MAX ≠ MIN, with MAX = max(R, G, B) and MIN = min(R, G, B). That is, the lightness in HSL always spans the entire range from black through the chosen hue to white, whereas in HSB, the B component only goes half that way, from black to the chosen hue.
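
The relation between HSL lightness and HSB brightness can be checked with the `colorsys` module of the Python standard library. This is only an illustrative sketch; the helper name is hypothetical and the colors are not taken from Table 2.

```python
import colorsys

def lightness_and_brightness(r, g, b):
    """Return (HSL lightness, HSB brightness) for RGB components in [0, 1]."""
    _, l, _ = colorsys.rgb_to_hls(r, g, b)   # note the H, L, S return order
    _, _, v = colorsys.rgb_to_hsv(r, g, b)   # V plays the role of B
    return l, v

for name, rgb in [("red", (1, 0, 0)), ("cyan", (0, 1, 1)), ("white", (1, 1, 1))]:
    print(name, lightness_and_brightness(*rgb))
```

Pure red and pure cyan both have maximum brightness but only half lightness, whereas white reaches the maximum in both spaces.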

Table 2. Comparison between some colors in the RGB, HSL, HSB, and CIEL*a*b* color spaces.

illustration not visible in this excerpt

Additionally, Table 2 shows that the meaning of the lightness L in the HSL color space is not the same as that of the lightness L* in the CIEL*a*b* color space. Both are obtained by nonlinear transformations of the RGB cube shown in Figure 7, but they correspond to different perceptions: only the L* component of the CIEL*a*b* color space models the nonlinear human eye response to a level of luminance. This difference produces noticeably different results in color-to-grayscale conversion procedures based on the lightness of these two color spaces.

Compare also the numerical value of the saturation in both spaces. While hue in HSL and HSB refers to the same attribute, their definitions of saturation differ slightly, because in HSB the saturation variation is more related to color purity (e.g. compare the three green shades in Table 2 when [R, G, B] = [0.5, 1.0, 0.5]). In HSL, the saturation component always goes from fully saturated color to the equivalent gray, whereas in HSB, with B at maximum, S goes from saturated color to white, which may be considered counterintuitive.
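
The different saturation definitions can be seen for the light green [R, G, B] = [0.5, 1.0, 0.5] mentioned above. The sketch below relies on the saturation formulas implemented by Python's `colorsys` module, which are assumed here to match the usual HSL and HSB (HSV) definitions.

```python
import colorsys

r, g, b = 0.5, 1.0, 0.5                       # light green
_, hsl_l, hsl_s = colorsys.rgb_to_hls(r, g, b)
_, hsb_s, hsb_b = colorsys.rgb_to_hsv(r, g, b)

# HSL treats this color as a fully saturated hue that has been lightened,
# while HSB treats it as a half-saturated hue at maximum brightness.
print("HSL: S =", hsl_s, "L =", hsl_l)
print("HSB: S =", hsb_s, "B =", hsb_b)
```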

However, both color spaces, HSL and HSB, have important advantages over other color spaces, such as [PV00]:

- Good compatibility with human intuition.
- Separability of chromatic values (hue and saturation) from achromatic values (luminance: L or B).
- The possibility of using one color feature, hue, only for color segmentation purposes. Many image segmentation approaches take advantage of this. Color segmentation is usually performed in one color feature (hue) instead of three, allowing the use of much faster algorithms.

However, hue-oriented color spaces have some significant drawbacks, such as:

- Singularities in the transform, e.g. undefined hue for achromatic points.
- Sensitivity to small deviations of RGB values near singular points.
- Numerical instability when operating on hue due to the angular nature of the feature. As the hue is defined on the unit circle, the values wrap around (H = H + 2nπ, n ∊ ℤ). For example, standard grayscale image analysis operators, specifically morphological operators, are influenced by it.
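
The wrap-around problem in the last item can be illustrated with two reddish hues on either side of 0°. The helpers below are a hypothetical sketch of one common remedy (working on the unit circle); they are not taken from [PV00].

```python
import math

def hue_difference(h1, h2):
    """Smallest absolute angular difference between two hues in degrees."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

def circular_mean(hues):
    """Mean hue in degrees, computed from unit vectors instead of raw angles."""
    s = sum(math.sin(math.radians(h)) for h in hues)
    c = sum(math.cos(math.radians(h)) for h in hues)
    return math.degrees(math.atan2(s, c)) % 360.0

hues = [350.0, 10.0]                       # two nearly identical reds
naive_mean = sum(hues) / len(hues)         # arithmetic mean lands on cyan (180 degrees)
print(naive_mean, hue_difference(circular_mean(hues), 0.0))
```

The arithmetic mean of the two angles points to the opposite side of the hue circle, while the circular mean stays next to red. Operators that compare or order raw hue values suffer from the same discontinuity.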

Figure 19 shows the three RGB images of Figure 10 converted here to grayscale using the brightness (B) component of the HSB color space (equivalent to the value V of the HSV color space) and their respective histograms. Perceptually, the three grayscale versions cannot differentiate well all the blocks of their original color images, even though some of the grayscale histograms do show intensity differentiation.

HSL and HSB are the most common color spaces used in image editing software, e.g. in the Microsoft Windows system, including Microsoft Paint.

illustration not visible in this excerpt

Fig. 19. Grayscale conversion of the RGB images in Figure 10 using the B component of the HSB color space (top row) with their respective histograms (second row).

The Y component of the YCbCr color space

The YCbCr color space, sometimes abbreviated to YCC, has been defined for use in digital video and digital photography in response to the increasing demands for digital algorithms. Y is the luminance component and is obtained as the weighted sum of R, G, and B values. Color information (chrominance) is stored as two color-difference components: Cb, the difference between the blue component and a reference value, and Cr, the difference between the red component and a reference value [QB06, LKS10, KK12, AM13, AMM13, HR13]. The luminance component Y has been defined to have a nominal range of [16/255, 235/255]; the blue-difference Cb and red-difference Cr chrominance are defined to have a nominal range of [16/255, 240/255].

In contrast to RGB, the YCbCr color space is luminance independent, giving better performance. However, the transformation from RGB into YCbCr depends on the CIE illuminant (A, C, D56, D60 …).

In digital image coding systems, YCbCr is a preferable color space [STCBY13], because of three factors:

1. YCbCr takes into account the properties of the human visual system (HVS). Studies show that human eyes are sensitive to luminance, but not so sensitive to chrominance. The YCbCr color space makes use of this fact to achieve a more efficient representation of images by separating the luminance and chrominance components.
2. YCbCr is used for digital encoding of color information in computing systems. Some other color spaces, like YUV, are traditionally used for analog encoding of color information in television systems.
3. The conversion between RGB and YCbCr is linear. Although L*a*b* and HSB are color spaces that also match the human visual system, the conversion between L*a*b*, HSB and RGB is nonlinear, so their conversions are more time consuming than that of YCbCr.

Actually, the linear transformation between RGB and YCbCr rotates the entire nominal RGB color cube and scales it to fit within a (larger) YCbCr color cube. However, there are some points within the YCbCr color cube that cannot be represented in the corresponding RGB domain (at least not within the nominal RGB range). This causes some difficulty in determining how to correctly interpret and display some YCbCr signals. Figure 20 shows three planes of the YCbCr cube for Y values of 0, 0.5, and 1.
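
The linear nature of the conversion, and the existence of YCbCr points without a valid RGB counterpart, can be sketched as follows. The coefficients are an assumption: they are the ITU-R BT.601 values, chosen because they reproduce the nominal ranges quoted above, and the function names are hypothetical.

```python
import numpy as np

# Assumption: ITU-R BT.601 coefficients for R, G, B in [0, 1]; results scaled by 1/255.
M = np.array([[ 65.481, 128.553,  24.966],
              [-37.797, -74.203, 112.000],
              [112.000, -93.786, -18.214]]) / 255.0
OFFSET = np.array([16.0, 128.0, 128.0]) / 255.0

def rgb_to_ycbcr(rgb):
    """Affine map from RGB to (Y, Cb, Cr); the first component is the grayscale value."""
    return M @ np.asarray(rgb, dtype=float) + OFFSET

def ycbcr_to_rgb(ycbcr):
    """Inverse map; the result may fall outside the nominal RGB range [0, 1]."""
    return np.linalg.solve(M, np.asarray(ycbcr, dtype=float) - OFFSET)

print(rgb_to_ycbcr([1.0, 1.0, 1.0]) * 255)                     # white
print(ycbcr_to_rgb(np.array([235.0, 240.0, 240.0]) / 255.0))   # a corner of the YCbCr cube
```

The corner of the YCbCr cube used in the last line lies inside the nominal YCbCr ranges, yet its RGB image has components above 1.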

illustration not visible in this excerpt

Fig. 20. Planes at constant luminance of the YCbCr color space: Y = 0 (left), Y = 0.5 (center), and Y = 1 (right). Figure from Wikipedia.

The result of converting to grayscale the three synthetic RGB images with similar color lightness shown in Figure 10, now using the Y component of the YCbCr color space, can be observed in Figure 21, where the resulting grayscale histograms are shown too. For these three images, the Y component of the YCbCr space gives grayscale versions that are no better than the L* layer of the CIELAB color space (Figure 13), but slightly better than the B component of the HSB color space (Figure 19).

illustration not visible in this excerpt

Fig. 21. Conversion of the three RGB images shown in Figure 10 to grayscale using the Y component of the YCbCr color space (second row), and their histograms (bottom row).

5.3. Color-to-grayscale conversion based on data analysis

As noted earlier, color-to-gray conversion can also be treated as a dimension reduction problem. Numerous approaches have been developed for solving the general problem of reducing an n-dimensional set of data to m dimensions where m < n. Therefore, unsupervised dimension reduction algorithms can be employed to transform the three-dimensional color space into a lower dimensional space.

As stated by Rasche et al. [RGW05a], the reduction of high-dimensional data to a lower dimension is a well-studied problem. A standard linear technique for this is principal component analysis (PCA), where a set of orthogonal vectors (the principal components) lying along the directions of maximal data variation is found [TB05, SL05, DAA07, TU07, LL13]. To reduce the dimensions of the data, a subset of the principal components forms a subspace onto which the original data is projected. While preserving color variation is necessary for pleasing grayscale images, it is not sufficient. Choosing the axis of maximum variation in the color image data does provide a high contrast grayscale image [SK13]. Even so, significant smaller details usually appear in the second principal component, which is lost with this technique. Further, it is difficult to incorporate constraints on luminance consistency. In this respect, Song et al. [STCBY13] comment that PCA constructs the covariance matrix based on the statistics of the R, G, and B values of pixels and projects color pixels onto the leading principal component. The method assumes observations are drawn from a single Gaussian and ignores the spatial distribution of color pixels. Although its “kernelization” can deal with the non-Gaussian distribution of color pixels, the computational cost is too high to be applicable in practice, not to mention the sensitivity of the kernel parameter to different images.
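
A basic PCA-based conversion of the kind discussed here can be sketched with NumPy. This is a minimal illustration and not the implementation of any of the cited works; the rescaling of the projection to [0, 1] and the sign convention are assumptions added to obtain a displayable image.

```python
import numpy as np

def pca_grayscale(image):
    """Project the pixels of an (H, W, 3) RGB image onto their first principal component."""
    pixels = image.reshape(-1, 3).astype(float)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)       # 3 x 3 covariance of the R, G, B values
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues returned in ascending order
    axis = eigvecs[:, -1]                      # direction of maximal variation
    if axis.sum() < 0:                         # the sign is arbitrary; prefer "brighter = larger"
        axis = -axis
    proj = centered @ axis
    span = proj.max() - proj.min()
    # Assumption: stretch the projection to [0, 1] so it can be shown as a gray image.
    gray = (proj - proj.min()) / span if span > 0 else np.zeros_like(proj)
    return gray.reshape(image.shape[:2])
```

Everything orthogonal to the chosen axis, including the second principal component, is discarded, and the pixel positions play no role, which matches the limitations described above.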

Grundland and Dodgson [GD07], in order to find the color axis that best represents the chromatic contrasts lost when the luminance component supplies the color to grayscale mapping, introduce a new dimensionality reduction strategy named predominant component analysis (PdCA). Unlike principal component analysis, which optimizes the variability of observations, predominant component analysis optimizes the differences between observations. The predominant chromatic axis aims to capture, with a single chromatic coordinate, the color contrast information that is lost in the luminance component. For the preservation of luminance polarity, its orientation should promote the positive correlation of chromatic contrasts with luminance contrasts. The direction of the predominant chromatic axis is optimal in the sense that it maximizes the covariance between chromatic contrasts and the weighted polarity of the luminance contrasts. They acknowledge that a single chromatic axis is an inherently limited tool, as it cannot be used to distinguish color contrasts that are perpendicular to it. However, for most color images, a single well-chosen chromatic axis suffices for effective contrast enhancement.

The analysis performed by Rasche et al. [RGW05a] also considers multi-dimensional scaling (MDS) as an alternative to factor analysis for detecting meaningful underlying dimensions in a set of multi-dimensional data points. According to these authors, it most often takes the form of minimizing a quadratic function of the set of all distances between pairs of points in the multidimensional data. Because the input involves all observed distances, the technique often does not scale well to large data sets. The techniques for the special case of reduction to a single dimension (1D scaling) in general do not scale well to large data sets either, and the addition of external constraints can be problematic. These authors also explain that another class of dimension reduction techniques attempts to find a nonlinear transformation of the data into the lower dimensional space, such as locally linear embedding (LLE) and ISOMAP [CHRW09]. According to these authors, the mentioned transforms work well with data that has some inherent (although perhaps complex) parameterization. They attempt to maintain local distances, corresponding to the original data lying along a higher dimensional manifold. As such, they generally require a “magic number” defining the size of the neighborhood of local distances to observe, and the results can be somewhat sensitive to the neighborhood size. Additionally, as they only work at a local level, it is not clear whether they can reproduce both contrast and detail when reducing the dimension of the colors. Moreover, it is also not clear that the colors from an arbitrary image lie along a single manifold.
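
The quadratic function minimized by MDS in the 1D case can be written down directly. The sketch below assumes the simplest unweighted form (sum of squared differences between color distances and gray distances); the exact objective used in [RGW05a] is not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def stress_1d(colors, gray):
    """Sum over all pixel pairs of (color distance - gray distance)^2."""
    colors = np.asarray(colors, dtype=float)   # shape (n, 3)
    gray = np.asarray(gray, dtype=float)       # shape (n,)
    color_dist = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=-1)
    gray_dist = np.abs(gray[:, None] - gray[None, :])
    upper = np.triu_indices(len(gray), k=1)    # count each unordered pair once
    return float(np.sum((color_dist[upper] - gray_dist[upper]) ** 2))
```

The two distance matrices have n × n entries for n pixels, which makes the scaling problem mentioned in the text concrete: even a small 256 × 256 image already involves more than two billion pixel pairs.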

6. The problem of color-to-grayscale conversion

As defined by Ma et al. [MZZW15], the goal of C2G conversion is to preserve as much visually meaningful information about the reference color images as possible, while simultaneously producing perceptually natural and pleasing grayscale images. The general goal for this dimension reduction task is to utilize the limited range of the grayscale to preserve as much as possible of the original features, salient structures, color contrast, and other important visual goals and logical constraints. Despite the great proliferation of C2G conversion methods, not all these algorithms are ready for the great variety of practical applications, because they encounter the following problems [STCLC10]: 1) visual cues are not well defined, so it is unclear how to preserve important cues in the transformed grayscale images; 2) some algorithms have an extremely high computation time; and 3) some require human-computer interaction to achieve a reasonable transformation.

Some authors [RGW05b, GD07, LKS10, KJDL09, HLS11, BCCCR12, HR13, WT14, JFWM15] agree that color-to-grayscale image conversion should meet some goals and constraints related to contrast preservation and luminance consistency. Contrast preservation means that colors which are readily distinguishable in the original image should be represented by gray values that are also readily distinguishable. On the other hand, shadows, highlights, and color gradients provide depth cues through luminance variations. Grayscale images in which these cues exhibit luminance reversals, e.g. lightened shadows or darkened highlights, can be disorienting for human perception. These effects can be ameliorated if luminance gradients within narrowly defined chrominance bands are maintained during the conversion to grayscale. Accordingly, C2G conversion should mainly meet the following visual goals and logical constraints.

Taking a consensus of the different authors, the visual goals for C2G conversion are the following five:

- Feature preservation. Features in the color image should remain discriminable in the grayscale image.
- Contrast magnitude. The magnitude of the grayscale contrasts should visibly reflect the magnitude of the color contrasts. This goal is also named as preservation of chromatic contrast by [LKS10] and [WT14].
- Contrast polarity. The positive or negative polarity of gray level change in the grayscale contrasts should visibly correspond to the polarity of luminance change in the color contrasts.
- Lightness fidelity. Color and grayscale images should have similar lightness stimuli.
- Dynamic range. The dynamic range of the gray levels in the grayscale image should visibly accord with the dynamic range of luminance values in the color image.

According to Ji et al. [JFWM15], color-to-gray conversion is a dimension-reduction problem which inevitably loses image information, even without considering the mapping consistency. Therefore, it is acceptable to sacrifice minor features in the conversion of some color images, adopting the goal “dominant feature preservation” instead of the more demanding goal “feature preservation”.

Color contrast between two pixels p and q on the color image can be expressed based on the Euclidean distance (L2 norm) in the CIEL*a*b* color space as [LXJ12],

illustration not visible in this excerpt

Equation (13) represents the color dissimilarity in the human vision system, also known as the visible color difference (VCD) according to the original name in the paper of Chen and Wang [CW04]. It is widely accepted that Euclidean distances in the CIEL*a*b* space approximate human perceptual dissimilarities, whereas Euclidean distances in the RGB color space do not correspond to the color differences actually perceived by a human observer [BVM07, WSL12, LXJ14, JFWM15, KKD16, YBW16].

One should be aware that it is meaningful to predict perceptual color differences in CIELAB space only for a certain range of small color differences (SCDs). Its own authors classify the VCD roughly into three different levels to reflect the degrees of color difference perceived by humans. The VCD is hardly perceptible when it is smaller than 3; it is perceptible but still tolerable when it is between 3 and 6; and it is usually easily perceptible or “visible” when it is larger than 6. Additionally, for VCD > 15, it makes little sense to rely on CIELAB distances to differentiate large color differences (LCDs). The HVS has major difficulties in differentiating VCD = 33 and 34. On the other hand, at the low extreme (VCD < 2.3), the HVS cannot perceive the color differences. As a result, when the VCD is lower than a just-noticeable difference (JND) level, the differences in VCD value do not have any perceptual meaning [MZZW15].
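
The Euclidean color contrast in CIEL*a*b* and the three rough perception levels quoted above can be sketched as follows. The function names and the example colors are hypothetical, and the handling of the boundary values 3 and 6 is an assumption, since the text does not say which level they belong to.

```python
import math

def cielab_distance(lab_p, lab_q):
    """Euclidean (L2) distance between two (L*, a*, b*) triples."""
    return math.dist(lab_p, lab_q)

def vcd_level(d):
    """Rough perception level using the thresholds 3 and 6 quoted in the text."""
    if d < 3:
        return "hardly perceptible"
    if d <= 6:                 # assumption: both boundary values count as tolerable
        return "perceptible but tolerable"
    return "easily perceptible"

p, q = (50.0, 0.0, 0.0), (50.0, 3.0, 4.0)   # hypothetical pixels with equal lightness
d = cielab_distance(p, q)
print(d, vcd_level(d))
```

The two example pixels have identical L* values, so a conversion based on lightness alone would merge them even though their color difference is perceptible.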

However, other distances between colors have been proposed. As an example, Kuk et al. [KAC11] propose a modification of equation (13), also using a metric over the CIEL*a*b* color space, denoted by

illustration not visible in this excerpt

where w_a and w_b are free parameters for adjusting the contributions of the a* and b* components to the distance, respectively. Note also that the cubic root denotes the L3 norm. Thus, equations (13) and (14) are the Minkowski distances between pixels p and q over the image components, with norms of order 2 and 3 respectively.
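
Both distances can be expressed by one Minkowski-type helper. Since equation (14) is not visible in this excerpt, the placement of the weights is an assumption: here w_a and w_b scale the a* and b* differences before the power is taken. The function name is hypothetical.

```python
def minkowski_lab(lab_p, lab_q, order=2, w_a=1.0, w_b=1.0):
    """Minkowski distance of the given order between two (L*, a*, b*) triples.

    Assumption: w_a and w_b multiply the chromatic differences before exponentiation.
    """
    dl = abs(lab_p[0] - lab_q[0])
    da = abs(w_a * (lab_p[1] - lab_q[1]))
    db = abs(w_b * (lab_p[2] - lab_q[2]))
    return (dl ** order + da ** order + db ** order) ** (1.0 / order)

p, q = (50.0, 0.0, 0.0), (50.0, 3.0, 4.0)   # hypothetical pixels
print(minkowski_lab(p, q, order=2))          # Euclidean case, as in equation (13)
print(minkowski_lab(p, q, order=3))          # L3 norm with unit weights
```

With order 2 and unit weights the helper reduces to the Euclidean distance, and setting a weight to zero removes the corresponding chromatic component from the contrast.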

Although the above metrics for color contrast have been used by many C2G conversion procedures, according to Wu et al. [WSL12], it is not very reasonable to compute the grayscale results by preserving the discriminability between individual pixels. The human visual system does not perceive the world as a collection of individual “pixels”. According to the Gestalt principle, it tends to group similar elements together instead of processing a large number of smaller stimuli.

Taking a consensus of the authors mentioned, the logical constraints for C2G conversion are the following four:

- Continuous mapping. The transformation from color to grayscale is a continuous function. This constraint reduces image artifacts, such as false contours in homogeneous image regions.
- Global consistency. When two pixels have the same color in the color image, they will have the same gray level in the grayscale image. This constraint assists in image interpretation by allowing the ordering of gray levels to induce a global ordering relation on image colors. This constraint has been named recently as mapping consistency by [HR13] and [JFWM15].
- Grayscale preservation. When a pixel in the color image is gray, it will have the same gray level in the grayscale image. This constraint assists in image interpretation by enforcing the usual relationship between gray level and luminance value.
- Luminance ordering. When a sequence of pixels of increasing luminance in the color image share the same hue and saturation, they will have increasing gray levels in the grayscale image. This constraint reduces image artifacts, such as local reversals of image polarity. It is also named local luminance consistency by [WT14] when it is related to luminance gradients.

Moreover, two important properties for human grayscale image perception are:

- Saturation ordering. When a sequence of pixels having the same luminance and hue in the color image has a monotonic sequence of saturation values, its sequence of gray levels in the grayscale image will be a concatenation of at most two monotonic sequences.
- Hue ordering. Also named color ordering, it means that when a sequence of pixels having the same luminance and saturation in the color image has a monotonic sequence of hue angles that lies on the same half of the color circle, its sequence of gray levels in the grayscale image will be a concatenation of at most two monotonic sequences. According to Kim et al. [KJDL09], color ordering is still an active research topic in color theory, with room for improvement of the existing schemes.

Hue ordering is an important property for color sequence preservation. Many visualization techniques use images containing meaningful color sequences [YLL15]. A sequence of colors can play an important role in visualization. For example, a weather map might be colored to show the forecast temperatures: red for hot, blue for cold, and other colors for intermediate temperatures. This technique of representing continuously varying values using a sequence of colors is used widely in medical imaging and other scientific applications. If such images are converted to grayscale, the sequence is often distorted, compromising the information in the image. Usually, the color sequences used in visualization are particularly difficult to decolorize. For many color sequences, conversion based on the luminance channel alone produces a grayscale sequence that has an order different to the original, and may also contain repeated gray levels. In fact, many color sequences are too complicated to be captured satisfactorily by any linear or simple nonlinear functions, and sequences decolorized using such functions can never be monotonic in brightness.
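
The loss of ordering can be reproduced with a small "hot to cold" sequence. The sketch below is only an illustration: the sequence of fully saturated hues is hypothetical, and the Rec. 601 weights are an assumption standing in for "the luminance channel".

```python
import colorsys

# Assumption: Rec. 601 luma weights represent the luminance-only conversion.
WEIGHTS = (0.299, 0.587, 0.114)

def luminance(rgb):
    """Weighted sum of the R, G, B components."""
    return sum(w * c for w, c in zip(WEIGHTS, rgb))

# Hypothetical color sequence: fully saturated hues from red (0 degrees) to blue (240 degrees).
hues = [0, 60, 120, 180, 240]
grays = [luminance(colorsys.hsv_to_rgb(h / 360.0, 1.0, 1.0)) for h in hues]
is_monotonic = grays == sorted(grays) or grays == sorted(grays, reverse=True)
print([round(g, 3) for g in grays], is_monotonic)
```

Yellow comes out lighter than both of its neighbors (red and green), so the gray levels no longer follow the order of the original sequence.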

Finally, two practical criteria might be considered:

- High efficiency [JFWM15] or low complexity [BCCCR12]. The color-to-gray transformation is relatively subjective and application related. To further interactively modify the conversion results, the algorithm should be fast enough for preferably real time interaction.
- Unsupervised tuning. The algorithm needs no user intervention, which avoids the variability of the final grayscale image caused by subjective criteria [BCCCR12]. This criterion is particularly important when the C2G conversion algorithm is a pre-processing stage of a digital image processing system or an image analysis scheme.

Computational cost and robustness to parameter tuning are two important issues [LLXWL15]. In the pioneering work of Gooch et al., O(N^4) operations are needed to optimize the target difference in grayscale results for an image of N × N pixels. To alleviate the computational difficulty, random sampling of pixel pairs was explored by Grundland and Dodgson [GD07] with a predominant component analysis to provide fast conversion results. Later, Lu et al. [LXJ12] discretized the solution space of a linear parametric model into 66 candidates and then selected the candidate achieving the highest energy value as the optimal solution, which is currently one of the fastest algorithms.

The solution to the above-mentioned problems is not unique, and in many cases state-of-the-art procedures cannot produce satisfactory results.
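
The figure of 66 candidates can be reproduced under one plausible reading of the discretization: the three channel weights are non-negative, sum to one, and are restricted to a grid with step 0.1. This reading is an assumption for illustration, and the energy function of [LXJ12] used to rank the candidates is not reproduced here.

```python
from itertools import product

def candidate_weights(step=10):
    """All (w_r, w_g, w_b) on a 1/step grid with non-negative entries summing to 1."""
    return [(i / step, j / step, (step - i - j) / step)
            for i, j in product(range(step + 1), repeat=2) if i + j <= step]

candidates = candidate_weights()
print(len(candidates))
```

Because the candidate set is fixed and small, the search cost no longer depends on the number of pixel pairs in the way the O(N^4) formulation does.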

7. Categories of state-of-the-art color-to-grayscale methods

Compared with the classical C2G methods explained in section five, state-of-the-art methods focus on retaining as many meaningful visual features and as much color contrast as possible, while trying to satisfy some of the other goals and constraints described in section six. However, these efforts come with a growing computational cost, which decreases the speed of the algorithms. Consequently, the resulting C2G methods are typically orders of magnitude slower than classical procedures. In attempting to satisfy those goals and constraints, C2G conversion methods usually adopt one of the following strategies: global, local, or hybrid mapping. Figure 22 shows roughly the appearance of the grayscale images obtained using global mapping (center part of the figure) and local mapping (right part).

illustration not visible in this excerpt

Fig. 22. RGB to grayscale conversion from original image (left) using global mapping (center) and local mapping (right). Figure adapted from [KJDL09].

7.1. Global color-to-grayscale conversion methods

As explained by Kim et al. [KJDL09], in a global mapping method, the same color-to-gray mapping is used for all pixels in the input. It consistently maps the same colors to the same grayscale values over an image, independently of their location, guaranteeing homogenous conversion of constant color regions as can be seen in Figure 22 (center). Recently, many global C2G conversion methods have been proposed [BB04, GOTG05, RGW05a, b, GD07, KJDL09, CHRW09, Sar10, LI12, LXJ12, STCBY13, ZHL13, HR13, WR14, JN15, MBT15, and LLWL16].

However, it is challenging to find a global mapping that also preserves local features at different locations. Indeed, one disadvantage of global mapping methods is that they may not preserve local features of the input color image in the output grayscale image [JLN14]. Examples of global mapping are the luminance of the YUV color space (NTSC format), also used by the rgb2gray function of Matlab (named here the direct method), and the luminance layer of most classical color spaces such as CIELAB, HSV, HSL, YCbCr, etc.

To preserve color contrast, the conversion can be formulated as a linear or non-linear optimization problem which preserves the original color contrast of the color image as much as possible.

Global mapping methods can be further divided into image-independent and image-dependent algorithms. Image-independent algorithms, such as the calculation of luminance, assume that the transformation from color to gray is related to the cone sensitivities of the human eye. Based on that, the luminance approach is defined as a weighted sum of the red, green and blue values of the image, without any measurement of the image content. The weights assigned to the red, green and blue components are derived from vision studies, which show that the eye is more sensitive to green than to red and blue. The luminance transformation is known to reduce the contrast between color regions, and this is perhaps the main drawback of RGB to grayscale conversion applied to digital analysis.
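The weighted sum and its main drawback can be illustrated with a minimal sketch. The weights used here are the Rec. 601 luma coefficients, which the paragraph above does not state and which are an assumption of this example. The function name is hypothetical.

```python
import numpy as np

# Rec. 601 luma weights (assumed here); green carries the largest weight.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def luminance_gray(rgb):
    """Image-independent global mapping: the same weighted sum for every pixel."""
    rgb = np.asarray(rgb, dtype=float)
    return rgb @ LUMA_WEIGHTS

# Two clearly different colors chosen so that their weighted sums coincide.
red = np.array([0.587, 0.0, 0.0])
green = np.array([0.0, 0.299, 0.0])
print(luminance_gray(red), luminance_gray(green))
```

The two colors receive the same gray value, so the contrast between a red region and a green region of this kind disappears in the converted image.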

To improve upon the performance of the image-independent methods, image-dependent methods incorporate statistical information about the image’s color, or multi-spectral information. Principal component analysis (PCA) is an example, treating the color information as vectors in an n-dimensional space [SL05, AD09, SK13]. It has been shown that PCA shares a common problem with the global averaging techniques: contrast between adjacent pixels in the result is always less than in the original. Moreover, PCA-based approaches require an optimization technique to mix the principal components [TB05, KOF08]. Other image-dependent global mapping algorithms are based on ISOMAP, a non-linear manifold learning technique [CHRW09].
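A basic PCA projection can be sketched as follows. This is only the plain first-component variant, not the method of any of the cited authors. The rescaling to [0, 1] and the sign convention are assumptions made for the example.

```python
import numpy as np

def pca_gray(rgb):
    """Project each pixel's color onto the first principal component of the image's colors."""
    rgb = np.asarray(rgb, dtype=float)
    pixels = rgb.reshape(-1, 3)
    centered = pixels - pixels.mean(axis=0)
    cov = centered.T @ centered / max(len(pixels) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    axis = eigvecs[:, -1]                    # direction of largest color variance
    if axis.sum() < 0:                       # the sign is arbitrary; fix it (assumed convention)
        axis = -axis
    proj = centered @ axis
    span = proj.max() - proj.min()
    gray = (proj - proj.min()) / span if span > 0 else np.zeros_like(proj)
    return gray.reshape(rgb.shape[:2])

# Two colors with equal Rec. 601 luminance are still separated by the projection.
pair = np.array([[[0.587, 0.0, 0.0], [0.0, 0.299, 0.0]]])
print(pca_gray(pair))
```

Because the projection axis is computed from the image itself, the same color can map to different gray values in different images, which is what makes the mapping image-dependent.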

Human-computer interaction (HCI) has been shown to further improve the performance of color-to-grayscale algorithms by allowing parameters to be tuned for a given image to obtain a visually reasonable transformation [STCLC10]. One of the first state-of-the-art global mapping methods with this property was proposed by Gooch et al. [GOTG05]: their interactive Color2Gray algorithm exposes three parameters, namely the difference in color chrominance, the contrast between colors, and the size of a pixel’s neighborhood. Users tune these parameters manually according to their preferred visual cues. The drawback is that it is difficult to find the optimal parameter combination that preserves the visual cues of a given image. Although the original version is costly, the algorithm has been improved by many later authors, and an advanced, faster version has been released as a Photoshop plugin, although it sometimes causes artifacts [KJDL09, KAC11, MBT15].

In 2007 Grundland and Dodgson [GD07] proposed Decolorize, an image-dependent global mapping algorithm that uses the YPQ color space, which is similar to YIQ, a color space formerly used by NTSC television. In fact, YIQ is almost the same as the YUV color space currently used by the NTSC television standard and other television systems. In YUV, the U and V components can be thought of as X and Y coordinates within the color space, whereas in YIQ, the I and Q components can be thought of as a second pair of axes on the same graph, rotated 33°; therefore I, Q and U, V represent different coordinate systems on the same plane.
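The relation between the two chroma coordinate systems can be checked numerically. The sketch below assumes the usual definitions U = 0.492 (B - Y) and V = 0.877 (R - Y) with the Rec. 601 luma Y, none of which are given in the text above. In this convention the I axis is the V axis rotated by 33° and the Q axis is the U axis rotated by 33°.

```python
import math

# (R, G, B) coefficients of each axis, built from Y = 0.299 R + 0.587 G + 0.114 B (assumed).
Y = (0.299, 0.587, 0.114)
U = tuple(0.492 * (b - y) for b, y in zip((0.0, 0.0, 1.0), Y))   # U = 0.492 (B - Y)
V = tuple(0.877 * (r - y) for r, y in zip((1.0, 0.0, 0.0), Y))   # V = 0.877 (R - Y)

def uv_to_iq(u, v, angle_deg=33.0):
    """Express a chroma point given in (U, V) coordinates in the rotated (I, Q) axes."""
    a = math.radians(angle_deg)
    i = -u * math.sin(a) + v * math.cos(a)
    q = u * math.cos(a) + v * math.sin(a)
    return i, q

# Transforming the U and V coefficient rows yields the I and Q coefficient rows.
I_row, Q_row = zip(*(uv_to_iq(u, v) for u, v in zip(U, V)))
print([round(c, 3) for c in I_row])
print([round(c, 3) for c in Q_row])
```

The printed rows can be compared with the commonly tabulated YIQ coefficients, with which they agree to about three decimal places.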

The Decolorize algorithm has become an established method both for contrast enhancement and for converting color to grayscale by dimensionality reduction. It uses a technique called predominant component analysis (PdCA). To decrease the computational cost of this analysis, the authors use local sampling by Gaussian pairing of pixels, which limits the number of color differences processed. The algorithm is designed to satisfy the following goals: contrast magnitude, contrast polarity, and dynamic range. It also satisfies the following constraints: continuous mapping, global consistency, grayscale preservation, and luminance ordering.

Three independent parameters control the algorithm: 1) contrast enhancement l, 2) scale selection s, and 3) noise suppression h. The authors have proposed image-independent default values for these three parameters.

The contrast enhancement parameter (0 ≤ l ≤ 1) indicates how much the image’s achromatic content can be changed to accommodate its chromatic contrasts. The typical value is l = 0.5 for a noticeable effect, or l = 0.3 for a subtle effect. This parameter controls the intensity of the contrast-enhancing effect, together with the resulting expansion of the dynamic range. The scale parameter (s > 0) is the radius in pixels of relevant color contrast features. The typical value is s = 25 for a 300 × 300 image. This parameter takes the image size into account, ensuring that the algorithm pays attention to image features at the correct spatial scale. The noise parameter (0 < h << ½) specifies the proportion of image pixels that are outliers, displaying aberrant color values. The typical value is h = 0.001 for a high quality photograph. This parameter lets the user compensate for image noise, ensuring that the algorithm can robustly determine the extent of the image’s dynamic range.

Figure 23 shows the effect of contrast enhancement applied to the three synthetic RGB images in Figure 10. Its second row shows the change in achromatic content to accommodate the chromatic contrast using l = 0.3, and its third row using l = 0.5. In this figure the scale and noise parameters are set to their typical values (s = 25 and h = 0.001).

Figure 24 shows in its upper row the grayscale versions of the images in Figure 23 with achromatic content adjusted using l = 0.3 and typical scale and noise parameters. The lower row shows the associated histograms. Comparing Figure 24 with the previously obtained cases (Figure 10, which shows the direct conversion; Figure 13, showing the L* component of the CIELAB color space; Figure 16, showing the L component of the HSL color space; Figure 19, for the B component of the HSB color space; and Figure 21, using the Y component of the YCbCr color space), it can be observed that for these three images the Decolorize algorithm yields both grayscale shades that better match the perceived colors and better gray level separation in the histogram. Despite the refinement included in this image-dependent global mapping procedure, some images exhibit low gray resolution.

Fig. 23. The Decolorize algorithm contrast enhancement effect applied to the three synthetic RGB images in Figure 10 using l = 0.3 (middle row) and l = 0.5 (bottom row), with typical scale and noise parameters (s = 25, h = 0.001).

Kuhn et al. [KOF08] showed that the local analysis of the Decolorize algorithm may not capture the differences between spatially distant colors and, as a result, may map clearly distinct colors to the same shade of gray. However, in 2008 Cadík [Cad08] perceptually evaluated several typical state-of-the-art RGB to grayscale conversion methods and found that the Decolorize conversion was overall among the best ranked approaches. As recognized by Ji et al. [JFWM15], although this evaluation is outdated and several new methods have been proposed since, it demonstrates that fast linear methods (such as Decolorize) perform well despite their simplicity. Benedetti et al. [BCCCR12] developed Multi-Image Decolorize (MID), an adaptation of the Decolorize algorithm for stereo and multi-view stereo matching purposes.

Fig. 24. Grayscale versions of RGB synthetic images in Figure 23 using the Decolorize algorithm with l = 0.3, with typical scale and noise parameters (s = 25, h = 0.001)

7.2. Local color-to-grayscale conversion methods

Local (also named spatial) methods are based on the assumption that the transformation from color to grayscale needs to be defined such that differences between pixels are preserved. Kim et al. [KJDL09] also explain that in a local mapping method the color-to-gray mapping of pixel values varies spatially, depending on the local distributions of colors. Although a local mapping has advantages in accurately preserving local features, constant color regions could be converted inhomogeneously if the mapping changes within the regions, as can be seen in Figure 22 (right). Some local mapping algorithms thus preserve local features effectively but may distort the appearance of constant color regions because they lack a continuous mapping. Moreover, some local C2G methods have no global consistency.

In 2004, Bala and Eschbach [BE04] proposed a color to grayscale transformation that locally preserves the distinction between adjacent colors (chrominance edges) by introducing high-frequency chrominance information into the luminance component. Working in CIELAB LCH (lightness, chrominance, hue angle), the method applies a spatial highpass filter to the chrominance component, weights the output with a luminance-dependent term, and adds the result to the lightness component. Since this pioneering work, other local mapping methods have been published [NK06, NCN07, KOF08, AD09, LI11, WSL12, ZLZZ14, LLXWL15, GKD15, NH15, LCYC15, YBW16, TT16].
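The three steps described for [BE04] (highpass filtering of chrominance, luminance-dependent weighting, addition to lightness) can be sketched as follows. This is an illustration of the idea only, not the authors' implementation: the box filter, the shape of the weighting term, and the parameter names `radius` and `gain` are all assumptions.

```python
import numpy as np

def box_blur(img, radius):
    """Mean filter with edge replication, built from shifted copies of the image."""
    padded = np.pad(img, radius, mode="edge")
    h, w = img.shape
    size = 2 * radius + 1
    acc = np.zeros_like(img, dtype=float)
    for dy in range(size):
        for dx in range(size):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / size**2

def chroma_edge_boost(lightness, chroma, radius=2, gain=1.0):
    """Add highpass-filtered chroma to lightness (both on a 0..100 CIELAB-like scale)."""
    L = np.asarray(lightness, dtype=float)
    C = np.asarray(chroma, dtype=float)
    highpass = C - box_blur(C, radius)                 # chrominance edges only
    weight = gain * (1.0 - np.abs(L - 50.0) / 50.0)    # hypothetical luminance-dependent term
    return np.clip(L + weight * highpass, 0.0, 100.0)

# An isoluminant image with a chroma step: the step becomes visible in the output.
L = np.full((5, 6), 50.0)
C = np.tile(np.array([0.0, 0.0, 0.0, 60.0, 60.0, 60.0]), (5, 1))
print(chroma_edge_boost(L, C)[0])
```

Where the chroma is constant the highpass term vanishes and the lightness is left unchanged, so the correction is confined to chrominance edges.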

Some C2G conversion procedures were originally global mapping methods, but later versions turned them into local mappings or into a combination of both, making them hybrid approaches. Examples are the methods of Lu et al. of 2012 and 2014 [LXJ12, LXJ14] and of Liu et al. of 2015 and 2016 [LLXWL15, LLWL16]. Another example is the default implementation of the global mapping of Rasche et al. [RGW05a, b], which is computationally expensive because it compares every pixel to every other pixel when minimizing the objective function. The authors suggested a faster implementation as an alternative by limiting the comparison to only a small spatial neighborhood for each pixel. However, this also turns the algorithm into a local contrast enhancement operator.

For video decolorization, Smith et al. [SLTM08] have shown that local mapping approaches are more suitable for this task. However, when the scene changes, their method may require user intervention to adjust the parameters. To maintain temporal coherence while providing adaptive conversion at each frame, Kim et al. [KJDL09] make the temporal variation of grayscale values similar to that of colors.

Lu et al. [LXJ12] indicate that local methods do not process all pixels of the color image in the same way and usually rely on the local chrominance edges for enhancement. One factor motivating the move towards spatial algorithms is that, for certain applications, preserving the luminance information per se might not produce the desired output. For example, an equiluminous image may easily have pixels with very different hue and saturation (as in Fig. 15 and 18), yet equating gray with luminance results in a flat uniform gray. On the other hand, as noted above, a local mapping can convert constant color regions inhomogeneously if the mapping changes within the regions. This inhomogeneity is observed as unwanted contours, because these methods place too much emphasis on reproducing perceived chromatic differences. That is, almost identical colors can be given too much grayscale difference, and thus an object with smoothly changing color can exhibit the contour effect [KAC11].

7.3. Hybrid color-to-grayscale conversion methods

Global mappings ensure that identical color values are mapped to identical gray values and are typically efficient and fast. In local approaches, the C2G mapping of pixel values varies spatially, depending on the local distributions of colors; they yield better results but are typically complex and computationally expensive [WT14]. Following this generalization, many authors group color-to-grayscale methods only as global or local mappings; however, some global mapping C2G conversion methods take local aspects into account, whereas some local mapping C2G conversion methods consider global consistency. This is the reason for classifying those methods as hybrid.

Hybrid methods attempt to preserve both local and global features simultaneously, and their goal is to raise the perceptual accuracy rather than exaggerate the discernment, as in [SLTM08, KAC11, AAB11, HLS11, LXJ14, WT14, JLN14, JFWM15, DHSML15]. These methods combine global mapping for color-to-grayscale conversion, which prevents inhomogeneous conversion of constant color region that could disturb the appearance of the image, with local differences to retain feature discriminability.

Perhaps the first application of this strategy was the ApparentGrayscale method of Smith et al. [SLTM08], which uses a two-step approach: first, it globally assigns gray values and determines color ordering; second, it locally enhances the grayscale to reproduce the original contrast. The authors incorporate the Helmholtz-Kohlrausch color appearance effect to predict differences between isoluminant colors. The multiscale local contrast enhancement reintroduces lost discontinuities only in regions that insufficiently represent the original chromatic contrast.

As Liu et al. [LLWL16] explicate in a recent work, hybrid methods often contain a multi-stage process or multiple constraints. For example, Jin et al. [JLN14] proposed a local variance maximization and brightness preservation algorithm for decolorization. In order to minimize the difference among the local transformations at nearby pixel locations, the total variation regularization was also employed in the decolorization process. The recently reported saliency-guided region-based optimization method of Du et al. [DHSML15] involves a two-stage parametric color-to-gray mapping function, which considered the joint global and local information.

In general, due to quantization strategies or prohibitive function optimization, the existing approaches fail to render the look of the original image and to preserve the finest details and the luminance consistency (shadows and highlights should not be reversed). Additionally, most of the existing approaches are computationally expensive. According to [AAB11] and [ZSM14], the aim of image decolorization is not to generate a perfect optical match, but rather to obtain a plausible image that maintains the overall appearance and, primarily, the contrast of the most salient regions. Moreover, Kim et al. [KJDL09] recognize that local methods might cause unpleasant halo artifacts, which is why global methods are preferred in recent research works.

Although global, local, and hybrid are the most common ways of grouping C2G conversion algorithms, another classification, according to Benedetti et al. [BCCCR12], splits the C2G conversion methods into two categories:

1. Functional methods. These conversions are image-independent local functions of every color; that is, for every pixel of the color image a grayscale value is computed using a function whose only parameters are the values of the corresponding color pixel.
2. Optimizing methods. These are more advanced techniques which take into account more complex features that can only be obtained by analyzing the image as a whole.

In the opinion of Tigora and Boiangiu [TB14], the above-mentioned classification closely resembles the common classification into global, local, and hybrid methods. The difference between these two points of view is that the latter goes into more detail. Functional conversions can be further divided into trivial, direct, and chrominance direct methods. Trivial methods either select a channel to represent the color (e.g. L* of CIEL*a*b*) or average the values of the components (e.g. the average conversion in section 5.1). Direct methods expand on the trivial ones, using weighted sums of the data components (see section 5.1). Chrominance direct methods aim to correct the results of direct conversion methods so that they better reflect human perception, as illustrated by the Helmholtz-Kohlrausch effect. These authors remark that optimizing methods vary greatly in their approaches, but they can also be classified into three groups. The first group consists of functional conversions followed by various optimizations based on the characteristics of the image. The second employs iterative energy minimization, whereas the last consists of various orthogonal solutions that do not fit within any of the previously enumerated categories. A different category altogether includes algorithms meant to encode a color image into a grayscale image in such a way that the color image can be recovered from the printed grayscale result.

8. Evaluation procedure of color-to-grayscale methods

The evaluation of a color dimension reduction method depends on the purpose of the final image. For example, if the reduced color image will be judged by human observers, the evaluation depends on whether the viewers have normal vision or are color-deficient. Alternatively, if the reduced color image will be digitally processed, generally as a grayscale image, the evaluation procedure might quantify the performance of the subsequent processing stages. Consequently, a general characterization of the evaluation procedure for color dimension reduction methods is almost impossible, and the procedures are generally designed ad hoc for the specific color dimension reduction purpose. However, for C2G conversion procedures, the authors of recent works have been converging on some substantial aspects of the evaluation procedure, mainly the image datasets used and some objective performance parameters.

Unfortunately, as recognized by [MZZW15], despite the wide usage of C2G conversion in real-world applications, little work has been dedicated to comparing the performance of C2G conversion algorithms. Moreover, there is no collectively accepted evaluation procedure specialized in color to grayscale conversion to validate the success of a decolorization procedure, and many authors propose their own evaluation method tailored to the algorithm they introduce. Due to the lack of a general framework for evaluating C2G transformation methods, until 2009 most researchers relied solely on a perceptual validation, showing their results on a few images [GOTG05, NCN07, GD07, SLTM08, KJDL09].

Subjective evaluation has been employed as the most straightforward image quality assessment (IQA) method. Until 2013, several authors relied exclusively on user experiments based on subjective evaluation with a small number of participants (generally between 6 and 17) [BB04, BE04, RGW05a, b, AAB11, STCBY13]. All of them focused on evaluating some desirable properties or advantages of their own methods.

One common difficulty in the evaluation of C2G procedures, remarked by Ma et al. [MZZW15], is that many state-of-the-art algorithms are parametric. Parameters in those algorithms can be roughly categorized into two types: one type controls the influence of the chromatic contrast on the luminance component [GOTG05], [KJDL09], [SLTM08]; the other enables certain flexibility of the implementation [GD07], [LXJ12]. These parameters are typically user-specified, but it is often a challenging task to find the best parameters without manually testing many options. The reason is that the performance of these C2G conversion algorithms is often highly parameter-sensitive, so that different parameter settings can lead to drastically different results.

8.1. Image datasets

In order to evaluate C2G conversion algorithms, different image datasets have been released containing both natural and synthetic images. In 2008, Cadík [Cad08] published an influential work with the earliest dataset for C2G subjective evaluation. It contains 24 highly saturated images of 390 × 390 pixels, whose contents vary from geometric patterns to real-world scenes. This dataset has been reported as a benchmark for the evaluation of C2G conversion procedures, but two drawbacks limit its usage: first, it contains too few images for an appropriate statistical analysis, and second, although it contains various types of images, most of them are synthetic and contain only a few colors, so they are too simple to demand high performance from a procedure. In contrast, scenes from the real world, especially outdoor environments, usually involve an abundance of colors and patterns.

Later, Lu et al. [LXJ14] published a new dataset named Color250, which contains 250 images: 200 photographic and 50 synthetic ones (digital charts, logos, and illustrations, which are ubiquitous in document printing). The 200 natural images were selected from the Berkeley Segmentation Dataset (BSDS) and the Saliency Detection Dataset (SD), taking the 100 most colorful images from each, based on the fact that decolorization methods generally perform similarly on colorless or grayish images. Although these datasets were designed for other purposes, the images they contain are natural, with foreground and background. Images from the BSDS have also been used by Liu et al. [LLXWL15]. The BSDS contains over 1000 human-segmented natural images and is used to evaluate segmentation algorithms. These images contain a variety of important structures such as large-scale edges, smooth areas, and fine textures.

In 2015 Günes et al. [GKD15] made use of the “Oliva-Torralba” (OT) dataset with 2688 images and the “Caltech-101” dataset with 9197 images, both commonly used in machine vision applications. The OT dataset contains color images classified into eight categories: 360 coast, 328 forest, 374 mountain, 410 open country, 260 highway, 308 inside-city, 356 tall-building, and 292 street images. The images in the OT dataset have a resolution of 256 × 256 pixels and were obtained from the Corel stock photo library. The Caltech-101 dataset consists of images classified into 101 different object categories. Categories such as accordion, cannon, butterfly, chair, camera, airplanes, and motorbikes comprise between 30 and 800 images each. The Caltech-101 images have different resolutions, but most of them are of medium resolution, from 250 to 400 pixels. According to the mentioned authors, not all images of these two datasets were used.

Recently, Du et al. [DHSML15] launched the Complex Scene De-colorization Dataset (CSDD) with 22 different images with abundant colors and patterns.

Also in a recent work, You et al. [YBW16] introduced the NeoColor dataset with 300 images specially designed for C2G evaluation. NeoColor improves on existing C2G datasets with high quality digital images of different sizes and great color complexity drawn from typical C2G scenarios, including natural scenes, commercial photographs, printing, books, magazines, masterpiece artworks, and computer-designed graphics.

8.2. Objective performance parameters

Classical full-reference approaches, such as mean squared error (MSE) and structural similarity (SSIM) index are not applicable in these circumstances, because the original color and the obtained grayscale images do not have the same dimension. Applying reduced-reference and no-reference measures is also conceptually inappropriate because the color image contains more information than the grayscale image.

Kuhn et al. [KOF08] introduced in 2008 an error metric to evaluate the quality of color to grayscale transformations, which is not perceptual. It measures whether the difference between any pair of colors (ci, cj) in the original color image has been mapped to the corresponding target difference in the grayscale image, using the root weighted mean square (RWMS) error computed in the CIEL*a*b* color space. This quantitative criterion was later used by [EKB13] to quantify colormap quality for structure-preserving color transformations between images, measuring the distortion of relative color distances between two images. This kind of color transformation is used in image processing problems such as gamut mapping, decolorization, and image optimization for color-blind people. Plotting the pixel-wise RWMS error as an image shows which pixels are affected the most by the color transformation. The average RWMS over the whole image is used as a single number representing the quality of the colormap. Later, in 2015, Ma et al. [MZZW15] reported that they implemented the exact version of RWMS, without the k-means algorithm for color quantization. According to these authors, the RWMS had not been tested on (or calibrated against) subjective data.

To evaluate the performance of C2G conversion algorithms, Lim and Isa [LI11] introduced in 2011 three measures computed on the resulting grayscale image: its mean intensity (MI) as an indicator of lightness, its standard deviation (SD) as an indicator of contrast, and its entropy (E) as an indicator of the amount of information, corresponding to the richness of details; all of them are calculated according to their classical definitions. A greater MI indicates a lighter grayscale image. Similarly, a greater SD indicates higher contrast, and a higher E indicates that more details are revealed by the resulting grayscale image.
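The three indicators can be computed in a few lines. The sketch assumes an 8-bit grayscale image, the population standard deviation, and Shannon entropy in bits over the gray-level histogram. These choices are assumptions about what "classical definitions" means here, and the function name is hypothetical.

```python
import numpy as np

def gray_statistics(gray, levels=256):
    """Return mean intensity, standard deviation and Shannon entropy (bits) of an 8-bit image."""
    gray = np.asarray(gray)
    hist = np.bincount(gray.astype(np.int64).ravel(), minlength=levels)
    p = hist / hist.sum()
    p = p[p > 0]                              # 0 * log(0) is taken as 0
    entropy = float(-(p * np.log2(p)).sum())
    return float(gray.mean()), float(gray.std()), entropy

flat = np.full((4, 4), 128, dtype=np.uint8)                 # one gray level only
checker = np.array([[0, 255], [255, 0]], dtype=np.uint8)    # two equally frequent levels
print(gray_statistics(flat))
print(gray_statistics(checker))
```

The flat image has zero contrast and zero entropy, whereas the two-level pattern has the same mean to within half a level but a large SD and one bit of entropy.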

In 2012 Lu et al. [LXJ12] introduced the color contrast preserving ratio (CCPR) to quantitatively evaluate C2G conversion algorithms in terms of contrast preservation. The CCPR is based on the observation that if the color contrast between two pixels p and q, Cp,q defined in equation (13), is smaller than a threshold t, it becomes nearly invisible to human vision. The task of contrast-preserving decolorization is therefore to maintain color change that is perceivable by humans; that is,

$$\mathrm{CCPR} = \frac{\#\{(p,q) \mid (p,q) \in W,\ |g_p - g_q| \ge t\}}{\|W\|}$$

where W is the set containing all neighboring pixel pairs with original color difference Cp,q ≥ t, g_p and g_q are the grayscale values of pixels p and q, ||W|| is the number of pixel pairs in W, and the numerator is the number of pixel pairs in W that are still distinctive after C2G conversion. Chen and Wang [CW04] suggested that a color difference Cp,q < 6 is generally imperceptible.
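A minimal sketch of the CCPR computation follows. Since equation (13) is not reproduced in this section, the color contrast is assumed here to be the Euclidean distance between CIELAB values. The use of horizontally and vertically adjacent pixels as "neighboring pairs", the value returned when W is empty, and the function names are also assumptions of the example.

```python
import numpy as np

def neighbor_differences(color, gray):
    """Color and gray differences of horizontally and vertically adjacent pixel pairs."""
    color = np.asarray(color, dtype=float)   # H x W x 3, e.g. CIELAB values (assumed)
    gray = np.asarray(gray, dtype=float)     # H x W, on the same scale as lightness
    c_parts, g_parts = [], []
    for axis in (0, 1):
        c_parts.append(np.linalg.norm(np.diff(color, axis=axis), axis=-1).ravel())
        g_parts.append(np.abs(np.diff(gray, axis=axis)).ravel())
    return np.concatenate(c_parts), np.concatenate(g_parts)

def ccpr(color, gray, t=6.0):
    """Fraction of visible color differences (>= t) that stay visible in the gray image."""
    c, g = neighbor_differences(color, gray)
    in_w = c >= t
    if not in_w.any():
        return 1.0                           # no visible contrast to lose (assumed convention)
    return float((g[in_w] >= t).mean())

# Two isoluminant pixels with opposite chroma: a luminance-only result loses the edge.
lab = np.array([[[50.0, 60.0, 0.0], [50.0, -60.0, 0.0]]])
print(ccpr(lab, lab[..., 0]))                  # lightness channel used as the gray image
print(ccpr(lab, np.array([[30.0, 70.0]])))     # a result that separates the two colors
```

The first call scores the lightness channel, which maps both pixels to the same gray, and the second scores a conversion that keeps them apart.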

In 2013 Song et al. [SBXY13] proposed a cost parameter to measure the contrast preservation quality of the grayscale conversion. To compute the cost parameter, given an input color image F and a grayscale conversion result J, a bilateral filtering on F is performed with itself and J as guidance images, respectively, to get FF and FJ. Ideally if all the details in the color image can be reproduced in the grayscale image, the bilateral filtered results FF and FJ should be identical. However, this will not be the case in reality, since the dimensionality reduction process probably will cause contrast loss for most images.

Lu et al. [LXJ14] introduced the color content fidelity ratio (CCFR) and the E-score to allow a more objective performance evaluation of C2G conversion procedures. The color content fidelity ratio is defined as

$$\mathrm{CCFR} = 1 - \frac{\#\{(p,q) \mid (p,q) \in Q,\ C_{p,q} \le t\}}{\|Q\|} \qquad (16)$$

where Q is the set containing the pixel pairs with |g_p − g_q| > t, corresponding to structures with the least contrast. According to these authors, if the original pixel difference is small, i.e., Cp,q ≤ t, the ratio subtracted in equation (16) measures the occurrence of unwanted “artifacts” in the result.

Additionally, the E-score jointly considers color contrast preservation (CCPR) and color content fidelity (CCFR). It is the harmonic mean of the two measures, similar to the F-measure in statistics, and is written as

$$E\text{-score} = \frac{2 \cdot \mathrm{CCPR} \cdot \mathrm{CCFR}}{\mathrm{CCPR} + \mathrm{CCFR}} \qquad (17)$$

As remarked by these authors, a higher CCPR or CCFR alone does not imply a better result; only their harmonic mean, i.e. the E-score, determines the final quality. However, although the E-score provided the most promising results up to 2014, it cannot make adequate quality predictions of C2G images [MZZW15].
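How the two ratios combine can be seen in a small sketch. As before, the color contrast is assumed to be a Euclidean distance between CIELAB values and the neighboring pairs are assumed to be horizontally and vertically adjacent pixels. The conventions for empty sets and the function names are assumptions of the example.

```python
import numpy as np

def neighbor_differences(color, gray):
    """Color and gray differences of horizontally and vertically adjacent pixel pairs."""
    color = np.asarray(color, dtype=float)
    gray = np.asarray(gray, dtype=float)
    c_parts, g_parts = [], []
    for axis in (0, 1):
        c_parts.append(np.linalg.norm(np.diff(color, axis=axis), axis=-1).ravel())
        g_parts.append(np.abs(np.diff(gray, axis=axis)).ravel())
    return np.concatenate(c_parts), np.concatenate(g_parts)

def ccpr(color, gray, t):
    c, g = neighbor_differences(color, gray)
    in_w = c >= t                             # pairs with visible color contrast
    return float((g[in_w] >= t).mean()) if in_w.any() else 1.0

def ccfr(color, gray, t):
    c, g = neighbor_differences(color, gray)
    in_q = g > t                              # pairs with visible gray contrast
    if not in_q.any():
        return 1.0                            # no gray structure, hence no artifacts (assumed)
    return 1.0 - float((c[in_q] <= t).mean())

def e_score(color, gray, t):
    p, f = ccpr(color, gray, t), ccfr(color, gray, t)
    return 0.0 if p + f == 0 else 2 * p * f / (p + f)

# Three pixels: the first two share a color, the third differs strongly.
lab = np.array([[[50.0, 0.0, 0.0], [50.0, 0.0, 0.0], [50.0, 60.0, 0.0]]])
faithful = np.array([[50.0, 50.0, 80.0]])    # keeps the real edge only
mixed = np.array([[50.0, 70.0, 100.0]])      # keeps the real edge but adds a false one
print(e_score(lab, faithful, 6.0), e_score(lab, mixed, 6.0))
```

The second conversion preserves all visible contrast, so its CCPR is perfect, but the false edge halves its CCFR and pulls the E-score down.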

Jin et al. [JLN14] proposed a measure called CSErr to quantitatively evaluate how well a local mapping preserves consistency. Assume that there are n different colors in a given color image. For the k-th color (k = 1, ..., n), they determine the corresponding locations in the color image and then count the number Num_k of different grayscale values found at those locations in the decolorized image. The measure CSErr is then defined as follows,

illustration not visible in this excerpt

For this measure of error, the lower the value is, the higher the consistency of the decolorized mapping is.

In order to enable an objective quantification of the performance of a C2G conversion method, Wu and Toet [WT14] adopt the normalized cross-correlation (NCC) between the resulting grayscale image and the R, G, and B color components of the original input image as a conversion quality metric,

illustration not visible in this excerpt

where I_i represents one of the three (R, G, or B) components of the color input image, I_g represents the grayscale output image, and x, y represent the image coordinates. The motivation for using this metric is the requirement that the color variations (features) in each of the individual color components should be optimally preserved in the resulting grayscale image. Since this metric has no direct relation to human visual perception, it may, in some cases, assign a quality ranking that is not strictly related to human judgment.
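A per-channel normalized cross-correlation can be sketched as follows. The exact normalization used in [WT14] is not reproduced in this section, so the zero-mean form below is an assumption, as are the function names and the value returned for a constant image.

```python
import numpy as np

def ncc(channel, gray):
    """Zero-mean normalized cross-correlation between one color channel and the gray image."""
    a = np.asarray(channel, dtype=float).ravel()
    b = np.asarray(gray, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0   # constant input: 0 (assumed)

def ncc_per_channel(rgb, gray):
    """Return the NCC of the gray image with the R, G and B components, in that order."""
    rgb = np.asarray(rgb, dtype=float)
    return tuple(ncc(rgb[..., i], gray) for i in range(3))

rgb = np.arange(24, dtype=float).reshape(2, 4, 3) / 23.0
print(ncc_per_channel(rgb, rgb[..., 0]))       # gray image identical to the R component
print(ncc_per_channel(rgb, 1.0 - rgb[..., 1])) # gray image is the inverted G component
```

In this form the value lies between -1 and 1, and an inverted gray image yields a strongly negative score, which illustrates why such a metric does not by itself track perceived quality.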

One of the most promising results was published recently by Ma et al. [MZZW15], in one of the first attempts to develop an objective quality model that automatically predicts the perceived quality of C2G converted images. Inspired by the philosophy of the structural similarity (SSIM) index, which assumes that human visual perception is highly adapted for extracting structural information from its viewing field, they proposed a C2G structural similarity (C2G-SSIM) index map, which evaluates the luminance, contrast, and structure similarities between the reference color image and the C2G converted image. To obtain a single score for the overall quality of the entire image, they introduce a Q-score, obtained by averaging the C2G-SSIM map, which complements the E-score (equation 17). According to these authors, the E-score emphasizes the preservation of high contrast, and its measurement of spatial contrast preservation is based on random sampling, which does not fully reflect the concept of visual perception preservation. The Q-score offers better quantization for color contrast measurement; however, scale-dependent contrast is not emphasized. The authors compared the C2G-SSIM index with the existing metrics RWMS, CCPR, CCFR and E-score on the Cadík and Color250 datasets, showing that their quality model outperforms those other objective metrics and is comparable to an average subject acting as observer in the subjective evaluation. An important characteristic of the C2G-SSIM index is that, for parametric C2G procedures, it provides a useful tool to automatically choose the best parameters without human intervention.

Another decolorization metric, named gradient correlation similarity (Gcs), was proposed by Liu et al. [LLXWL15]. Contrary to the conventional data-fidelity term consisting of gradient error-norm-based measures, the Gcs measure calculates the summation of the gradient correlation between each component of the original RGB color image and the transformed grayscale image. Computed between each RGB component of the input color image and the resulting grayscale image, the Gcs better reflects the degree to which feature discriminability and color ordering are preserved in C2G conversion, successfully reproducing the visual appearance of color images. The Gcs was used afterwards by [LLWD16] in their two-stage parametric subspace (TPS) conversion procedure. In the first stage, the Gcs measure is used to obtain an immediate grayed image. Then, the Gcs is applied again to select the optimal result from the immediate grayed image plus the second subspace-induced candidate images.

In 2016 Liu et al. [LLWL16] introduced the color contrast correlation preserving ratio (C3PR), defined as

illustration not visible in this excerpt

where Q is the set containing all neighboring pixel pairs whose original color difference is ≥ t. The ratio relates the number of pixel pairs in Q that are still distinctive after decolorization to the total number of pixel pairs in Q.

Note that the defined CCPR, CCFR, E -score, and C3PR only measure the quantitative performance of all the images in the dataset for each threshold level t. This common parameter ranges, according to the authors, between 1 and 40, and needs to be hand-picked by users. In order to better investigate the performance on each image for different K values of the threshold t, Liu et al. [LLWL16] suggest the average of CCPR (ACCPR) and the average of C3PR (AC3PR) for each image defined as follows:

illustration not visible in this excerpt
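Since the defining equations are not reproduced here, the verbal definition above can be illustrated with a minimal sketch. It assumes a plain Euclidean RGB distance as the color difference, a 4-neighborhood for the pixel pairs, a value of 1 when Q is empty, and an arithmetic mean over the thresholds 1 to 40 as the per-image average; all function names are hypothetical and none of these choices is claimed to match the exact formulation of [LLWL16].

```python
import math


def neighbor_pairs(height, width):
    """Yield coordinate pairs of horizontally and vertically adjacent pixels."""
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                yield (y, x), (y, x + 1)
            if y + 1 < height:
                yield (y, x), (y + 1, x)


def preserving_ratio(color, gray, t):
    """Fraction of pairs in Q (color difference >= t) still distinctive in gray."""
    height, width = len(color), len(color[0])
    in_q = kept = 0
    for (y0, x0), (y1, x1) in neighbor_pairs(height, width):
        # Assumption: Euclidean RGB distance stands in for a perceptual difference.
        if math.dist(color[y0][x0], color[y1][x1]) >= t:
            in_q += 1
            if abs(gray[y0][x0] - gray[y1][x1]) >= t:
                kept += 1
    # Assumed convention: an empty Q counts as fully preserved.
    return kept / in_q if in_q else 1.0


def average_preserving_ratio(color, gray, thresholds=range(1, 41)):
    """Arithmetic mean of the ratio over all threshold levels t."""
    values = [preserving_ratio(color, gray, t) for t in thresholds]
    return sum(values) / len(values)


# Hypothetical two-pixel image: pure red next to pure blue (0..255 scale).
color = [[(255, 0, 0), (0, 0, 255)]]
print(preserving_ratio(color, [[76, 29]], t=40))
print(average_preserving_ratio(color, [[76, 56]]))
```

Averaging over the thresholds removes the need to hand-pick a single t, which is the motivation the text gives for the per-image averages.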

However, Sowmya et al. [SGS16] showed that the converted grayscale images, in which the important details of the original color image were eliminated, can have greater values of NCC and CCFR; therefore, a new objective metric which measures the preservation of the finer details of the input color planes in the resultant gray plane should be devised as a future work.

Another approach, a perceptual valuation based on visual attention analysis and positioned in between objective and subjective evaluation, was introduced in 2010 by Yang et al. [YSBCJ10] under the name saliency map. It acts as a classifier and aims at preserving the attention area in the output grayscale image. Figure 25 shows an input natural color image and its saliency map, which is claimed to be a representation of the human attention distribution. The idea is to assess attention preservation by analyzing whether the C2G method converts the input color image into a grayscale representation that retains visually salient information, especially in the areas that attract the most attention. There are diverse algorithms to estimate the saliency map of an image, one of them being that proposed by Hou et al. [HHK12]. This algorithm predicts human fixation points very well and is computationally very fast [JLN14].

illustration not visible in this excerpt

Fig. 25. Color image with its saliency map (upper row) and the grayscale version with its saliency map (lower row). Figure adapted from [JLN14].

To improve the saliency maps introduced by [YSBCJ10], Ancuti et al. [AAB11] proposed the saliency weight map, which puts emphasis on maintaining the saliency map before and after the C2G conversion. Their algorithm computes three weight maps for each color component (R, G, and B): the saliency weight map, which reveals the degree of conspicuousness with respect to the neighborhood regions; the exposedness weight map, which estimates the degree to which a pixel is exposed; and the chromatic weight map, which controls the saturation contribution of the inputs in the decolorized image.

Afterward, Zhu et al. [ZHL13], based on filter theory, formulated a novel measure of channel-level distinction, called channel salience, to depict the filter level of three color stimuli. This salience metric guides a contrast adjustment process to enhance the perceived grayscale in the final output. In contrast with Ancuti et al. [AAB11], the method of Zhu et al. does not adopt the salience map but builds channel salience for contrast comparison across three channels (hue, luminance, and saturation in the IHLS color space), which could help to preserve the most salient information in the perceived grayscale.

8.3. Subjective evaluation

In 2008, Cadík [Cad08] provided an in-depth evaluation, conducting two subjective experiments with the participation of 119 human subjects to psychologically evaluate the performance of seven state-of-the-art C2G conversion algorithms. He obtained nearly 20,000 human responses (119 human subjects and 24 color images), which were used to evaluate the accuracy and preference of the conversions.

Cadík’s subjective methodology to evaluate C2G conversion methods became a benchmark against which subsequent objective quality assessment metrics are compared. The experiment asked the subjects to select the more favorable C2G image from a pair of images. Two subjective experiments were conducted: first, accuracy, in which the two C2G images were shown at the left and right sides with the corresponding reference color image in the middle; and second, preference, in which the subjects gave opinions without any reference. Based on the subjective scores, it is valuable to analyze the behavior of all subjects for each image set, which consists of a color image and its corresponding C2G images obtained by each conversion method. The comparison might be based on Spearman’s rank-order correlation coefficient (SRCC) and Kendall’s rank-order correlation coefficient (KRCC).

For each image set (original and grayscale versions obtained by each method), the rankings given to each image are averaged over all subjects. Considering these average ranking scores as the “ground truth”, the performance of each individual subject can be observed by computing SRCC and KRCC between their ranking scores and the “ground truth” for each image set. Furthermore, the overall performance of the subject can be evaluated by the average SRCC and KRCC values over all 24 image sets. The mean and standard deviation of SRCC and KRCC values for each individual subject in the accuracy test and in the preference test might be calculated and plotted. They can be used to observe whether there is agreement between different subjects on ranking the quality of C2G images. A low standard deviation of SRCC and KRCC denotes a high degree of agreement.
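The agreement analysis described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the rankings are invented for the example, SRCC is computed as the Pearson correlation of average ranks, KRCC is the simple tau-a variant without tie correction, and all function names are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def srcc(a, b):
    """Spearman rank-order correlation coefficient."""
    return pearson(ranks(a), ranks(b))


def krcc(a, b):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    score = 0
    for i, j in combinations(range(len(a)), 2):
        product = (a[i] - a[j]) * (b[i] - b[j])
        score += (product > 0) - (product < 0)
    n = len(a)
    return score / (n * (n - 1) / 2)


# Invented rankings of seven methods by three subjects for one image set.
subjects = [
    [1, 2, 3, 4, 5, 6, 7],
    [2, 1, 3, 4, 5, 7, 6],
    [1, 3, 2, 4, 6, 5, 7],
]
# "Ground truth": the ranking of each method averaged over all subjects.
ground_truth = [sum(column) / len(subjects) for column in zip(*subjects)]
for subject in subjects:
    print(srcc(subject, ground_truth), krcc(subject, ground_truth))
```

Repeating this over all image sets and taking the mean and standard deviation per subject gives the agreement figures the text refers to.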

However, the subjective evaluation method is time consuming, expensive, and, most importantly, cannot be incorporated into automated systems to monitor image quality and to optimize image processing algorithms. Even so, perceptual evaluation remains a valuable method for judging C2G conversion methods. Recently, Sowmya et al. [SGS16] included in their work the five score values for perceptual evaluation shown in Table 3.

Table 3. Description of visual perception score.

illustration not visible in this excerpt

To conclude the review of the evaluation of C2G procedures, it is worth mentioning the work of Benedetti et al. [BCCCR12], which explains that three approaches can be used to evaluate the correctness of different color-to-grayscale conversion algorithms:

- A perceptual evaluation, which is best suited for grayscale printer reproduction and other human-related tasks, as has been used by [Cad08, AAB11, LU12, JLN14].
- An information theory approach which could quantify the amount of information that is lost during the dimensionality reduction, performed by [LXJ12, LXJ14, WT14, JLN14, MZZW15, LLWL16, YBW16].
- An approach that is tailored to measure the results of the subsequent image processing algorithms, as in [BCCCR12, KC12, GKD15].

Despite the advent of new datasets and some objective metrics to evaluate the performance of C2G conversion algorithms, the use of images and perceptual procedures designed ad hoc for particular conversion methods persists, as in [TT16].

9. Comparison of state-of-the-art methods for color-to-grayscale conversion

The results of Cadík’s benchmark evaluation showed that the Decolorize [GD07] and ApparentGrayscale [SLTM08] conversions were overall the best ranked approaches. The first is a global, image-dependent mapping method, and the second is a hybrid method that uses a global mapping as its first stage. Neither outperformed the other. Notably, each of the seven evaluated C2G conversion methods was ranked the worst for at least one of the 24 images. In this broad evaluation, the lack of fully satisfactory results showed the absence of a suitable C2G conversion method that perceptually pleases all observers, and impelled the development of new methods. On the other hand, the local mapping approach of Bala and Eschbach [BE04] performed the worst. Additionally, Cadík’s study observed that the methods that optimize an objective function were classified as less perceptually accurate.

Since Cadík’s evaluation, other more successful methods have been published. Table 4 summarizes the 30 most successful methods for C2G conversion published during the first 15 years of the 21st century, according to the literature review. There are many other published methods, as can be seen in references [NK06, DAA07, KOF08, QFQ08, TSU08, AD09, DCFB09, LKS10, Sar10, STCLC10, ZT10, HLS11, LI11, LI12, HR13, LL13, PP13, SK13, TB14, WR14, XLZZ14, LCYC15, MBT15, LLWD16, MMDBBK16, SGS16, TT16], which are not included in Table 4. Users do not always have complete information about the characteristics, possibilities, and drawbacks of these methods. Moreover, the great diversity of methods and of their aims may be confusing to beginners.

Table 4. The 30 state-of-the-art global, local, and hybrid color to grayscale image conversion methods.

illustration not visible in this excerpt

10. Examples of application of color dimension reduction

First example: The most usual color dimension reduction is the color-to-grayscale conversion. A perceptual comparison of C2G conversion by different methods can be made using a synthetic image with isoluminant colors from the benchmark dataset published by Cadík [Cad08], as seen in Figure 26.

It is worth highlighting some distinctive aspects observed in Figure 26. The widely used rgb2gray function of Matlab (the Y component of the YUV color space, also used by some television standards such as NTSC) is unable to differentiate all the image structures. The brightness of the HSB color space can produce only a partial gray extent, ranging between bright-gray and middle-gray, whereas the lightness of the HSL color space produces a partial gray extent between middle-gray and dark-gray. The L* component of the CIEL*a*b* color space cannot differentiate the isoluminant colors, and the great difference between the L component of the HSL color space and the L* component of the CIEL*a*b* color space should be noted (see sec. 5.2). The local method of Bala and Eschbach, identified here as BalaE04, achieved the worst performance in Cadík’s evaluation; this figure shows that it is susceptible to colors with similar lightness and may render constant color regions inhomogeneously. The global mapping method Rasche05 produces a grayscale image whose gray ordering does not match very well the natural perception of colors in the original image. With the Color2Gray global mapping method of Gooch et al. [GOTG05], the gray ordering gives the impression of contradicting the colors’ luminance ordering. With the Decolorize global mapping of Grundland and Dodgson [GD07], there is a loss of contrast between blue and blue-gray regions, and distant regions with different colors are mapped to very similar gray levels. With the local mapping method Neuman07 [NCN07], the different shades of orange are mapped with very low contrast, and constant color regions are rendered inhomogeneously. With the ApparentGrayscale hybrid mapping method [SLTM08], the original color contrast is not well represented in the grayscale image, and some local artifacts are produced. 
With the Kim09 global mapping method [KJDL09], an acceptable grayscale image is obtained for this color image, but the influence of chromatic contrast on feature discriminability requires user intervention. In the global mapping method identified as Cui09 [CHRW09], the grayscale image shows salt-and-pepper noise (because the parameter k in the KNN search is not large enough). The hybrid mapping method Kuk11 [KAC11] produces a suitable conversion, but it also requires human intervention. The hybrid method termed Ancuti11 [AAB11] fails in adjacent blue and bluish regions because it does not reflect the true salient information of the image. The global mapping method identified as RTCP [LXJ12] produces grayscale artifacts in some regions, causing higher local contrasts than in the original image. With the global mapping method Zhu13 [ZHL13], there are artifacts caused by the adjustment of the contrast map, and some unwanted sharp edges can be observed; they arise because the salient channel is the hue channel, so pixel values are nearly piecewise constant. The hybrid method Wu14 [WT14] over-enhances the edges, with noticeable artifacts. The hybrid mapping method CPDecolor [LXJ14] usually cannot preserve the local contrast well, but the result for this image is pleasing. Similarly, the hybrid method Jin14 [JLN14] can cause artifacts in some smooth regions due to over-boosting of minor edges and needs user intervention, but for this image the C2G conversion is also pleasing. The local mapping method Nguyen15 [NH15], also known as the COC method, can preserve the color contrast in the grayscale image, but the hue orientation of the colors in the original image is reversed in the grayscale version; in fact, it is roughly the opposite of the orientation in the CPDecolor and Jin14 methods. In the hybrid method Ji15 [JFWM15], some local features in the color image cannot be preserved in the grayscale result.
The two methods of Liu et al. [LLXWL15] show good chromatic discrimination; however, the global mapping GcsDecolor1 causes an apparent over-fitting of contrast, whereas the local mapping version known as GcsDecolor2 performs similarly to the Ji15 method. Finally, in the upgraded version of these methods published in 2016 as a global mapping named SPDecolor [LLWL16], the gradient error correction of parameters is highly non-linear, influencing the final distribution of gray shades. Additional C2G conversions of this isoluminant synthetic image, using other methods, can be seen in Appendix 1.
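The failure of the fixed weighted sum on isoluminant colors can be reproduced numerically. The sketch below uses the 0.299/0.587/0.114 luma weights and the standard definitions of HSB brightness (maximum channel) and HSL lightness (mean of maximum and minimum channel); the two colors are hypothetical values constructed to share the same luma and are not taken from Cadík’s image.

```python
def luma(rgb):
    """Fixed weighted sum of R, G, B with the 0.299/0.587/0.114 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def hsb_brightness(rgb):
    """Brightness (value) of the HSB color space: the largest channel."""
    return max(rgb)


def hsl_lightness(rgb):
    """Lightness of the HSL color space: mean of largest and smallest channel."""
    return (max(rgb) + min(rgb)) / 2


# Hypothetical isoluminant pair (channels in 0..1): the green level is chosen
# so that both colors have exactly the same weighted sum.
red = (0.8, 0.0, 0.0)
green = (0.0, 0.8 * 0.299 / 0.587, 0.0)

for name, convert in [("luma", luma),
                      ("HSB brightness", hsb_brightness),
                      ("HSL lightness", hsl_lightness)]:
    print(name, convert(red), convert(green))
```

The weighted sum maps both colors to the same gray level, so the edge between them vanishes, while the other two components still separate them but compress pure colors into part of the gray range.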

illustration not visible in this excerpt

Fig. 26. Isoluminant synthetic image from Cadík’s dataset [Cad08] converted to grayscale utilizing different methods.

Second example: The simultaneous contrast introduced in section 2.3 is also known as simultaneous brightness contrast, simultaneous lightness contrast, or plain simultaneous contrast, because of the similarity between brightness and lightness under many circumstances [Kin11]. It is also well known that a lightness contrast enhancement in the RGB color space without chromatic distortion, that is, keeping both hue and saturation unchanged, is impractical [SMI05]. Unlike the more general concept of contrast enhancement, which involves both luminance and chrominance contrast enhancement, lightness contrast enhancement (LCE) deals only with the luminance component. For example, the Decolorize algorithm for C2G conversion [GD07] allows contrast enhancement. In general, contrast enhancement techniques increase the dynamic range of the whole image; they include linear mapping, histogram stretching and equalization, and gamma correction, all of which are commonly found in standard commercial image editing software.

One of the applications of color dimension reduction is lightness contrast enhancement for images with low lightness contrast [SMI05]. One approach is to obtain two versions of the original RGB color image: the first is its luminance-chrominance version (e.g. in the YCbCr color space), and the second is its grayscale version, obtained by a suitable procedure that represents the salient image features well. The key to this approach is to obtain a high-contrast grayscale version that preserves the visual cues. Finally, the luminance component (i.e. the Y component of the YCbCr color space) is replaced by the obtained grayscale image, and the result is transformed back to the RGB color space. This method is summarized in the diagram of Figure 27.

illustration not visible in this excerpt

Fig. 27. Diagram of the second example for image lightness contrast enhancement (LCE).
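The replacement of the Y component can be sketched per pixel as follows. This is a minimal illustration, assuming full-range BT.601 YCbCr with channels in [0, 1]; the function names are hypothetical, the grayscale input would come from whichever C2G method is chosen, and out-of-range results are simply clipped and counted.

```python
EPS = 1e-9  # tolerance so rounding noise is not counted as out of gamut


def rgb_to_ycbcr(rgb):
    """Full-range BT.601 luma and scaled color differences (channels in 0..1)."""
    r, g, b = rgb
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, (b - y) / 1.772, (r - y) / 1.402


def ycbcr_to_rgb(ycc):
    """Inverse of rgb_to_ycbcr."""
    y, cb, cr = ycc
    r = y + 1.402 * cr
    b = y + 1.772 * cb
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


def enhance_lightness(rgb_image, gray_image):
    """Replace Y by the given grayscale image, keep Cb and Cr, go back to RGB.

    Returns the new image and the number of pixels that had to be clipped.
    """
    result, clipped = [], 0
    for rgb_row, gray_row in zip(rgb_image, gray_image):
        row = []
        for rgb, gray in zip(rgb_row, gray_row):
            _, cb, cr = rgb_to_ycbcr(rgb)
            new_rgb = ycbcr_to_rgb((gray, cb, cr))
            if any(c < -EPS or c > 1.0 + EPS for c in new_rgb):
                clipped += 1
            row.append(tuple(min(1.0, max(0.0, c)) for c in new_rgb))
        result.append(row)
    return result, clipped


# Hypothetical one-pixel image: a dark red whose new gray level is brighter.
image = [[(0.8, 0.0, 0.0)]]
enhanced, n_clipped = enhance_lightness(image, [[0.6]])
print(enhanced, n_clipped)
```

With this particular color space, replacing Y while keeping Cb and Cr shifts all three RGB channels by the same amount, which makes it easy for saturated colors to leave the displayable range.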

This method is useful mainly for images containing different colors with similar lightness, since a suitable C2G conversion that increases the lightness contrast in the grayscale image will also increase the lightness contrast in the final color image. Some examples of LCE can be seen in Figure 28 for three images taken from the Cadík dataset and for images 104 and 150 of the Color250 dataset. The color images shown in Figure 28 have low lightness contrast, as can be seen in the second column, where the Y component of the YCbCr color space is shown. The low-contrast Y component is then replaced by the higher-contrast grayscale image obtained by the CPDecolor method [LXJ14] (third column), and the final YCbCr-to-RGB back conversion creates an image with increased lightness contrast. Obviously, the final result depends on the properties of the C2G conversion method used.

The newly created color images, when converted to gray using traditional methods, will then retain good gray contrast, as shown in [GOTG05]. However, while this is a useful idea, it is not without problems, as recognized in [Ras05]. As can be seen in Figure 28, simply adding the chromaticity to the recovered lightness will not always produce in-gamut colors. If the new colors are clipped, or otherwise gamut mapped, the process of going from optimized gray to optimized color is not invertible. The gray image resulting from a traditional grayscale transformation of the optimized color image may not match the optimized gray image. Some efforts have been made to solve this problem; for example, [Ras05], using the CIEL*a*b* color space, circumvents this problem by including a constraint that each gray value stays within the available L* range, typically [0, 100].

illustration not visible in this excerpt

Fig. 28. Examples of lightness contrast enhancement (LCE) utilizing the procedure shown in Figure 27. Columns: a) original RGB images, b) Y components of the YCbCr color space, c) grayscale versions of the original RGB images using the CPDecolor method of [LXJ14], and d) result of the LCE.

Third example: In outdoor scenarios, the light reflected from a surface is scattered in the atmosphere by different particles (e.g. water droplets, fog, smoke, dust, mist, fumes, or haze), which deflect the light from its original course of propagation before it reaches the camera. According to Fattal [Fat08], in long-distance photography or foggy scenes this process has a substantial effect on the image: contrasts are reduced and surface colors become faint. Such degraded photographs often lack visual vividness and appeal, and moreover they offer poor visibility of the scene contents. This effect may undermine the quality of aerial photography, as in the case of satellite imaging, which is used for many purposes including cartography and web mapping, land-use planning, archeology, and environmental studies. Image haze removal (or dehazing) is a challenging problem that has been introduced only recently; it groups, without distinction, image enhancement under the influence of the different types of particles mentioned above. Interest in this problem has been increasing, and diverse techniques have been applied to it [Fat08, He09, AAB11].

Assuming that, due to atmospheric absorption and scattering, distant surfaces or objects appear lighter and less colorful, the same strategy used for lightness contrast enhancement can be used for image dehazing. Although different strategies have been developed for image dehazing [Fat08, HST09], the work of Ancuti et al. [AAB11] introduced a comprehensive procedure that first identifies the hazed areas by computing the luminance difference between the initial image and a processed version. Then, by manipulating the contrast difference between the hazed and non-hazed regions, the algorithm is able to significantly reduce the degree of haze. A simplified version of this algorithm is applied to dehaze the images shown in Figure 29: the step of identifying the hazed areas is omitted, and lightness contrast enhancement is applied to the whole image. The C2G conversion uses the global mapping SPDecolor method [LLWL16], followed by an adaptive contrast enhancement based on the CLAHE (contrast-limited adaptive histogram equalization) algorithm [Rez04].

illustration not visible in this excerpt

Fig. 29. Image dehazing based on the enhancement of its grayscale version obtained by the SPDecolor of Liu et al. [LLWL16] and the contrast-limited adaptive histogram equalization (CLAHE) algorithm.
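The contrast-limiting idea behind CLAHE can be illustrated with a deliberately simplified sketch. The code below is not CLAHE itself: it applies a single clipped histogram equalization to the whole 8-bit grayscale image, without the tiling and interpolation that make the algorithm adaptive. The function name, the default clip limit, and the test image are assumptions made for the example.

```python
def clipped_equalize(gray, clip_limit=4.0, levels=256):
    """Global histogram equalization with a clipped histogram.

    Simplification of CLAHE: one histogram for the whole image, no tiles.
    `gray` is a list of rows of integers in 0..levels-1.
    """
    flat = [v for row in gray for v in row]
    n = len(flat)
    hist = [0.0] * levels
    for v in flat:
        hist[v] += 1

    # Clip each bin at clip_limit times the mean bin height and spread the
    # removed excess uniformly over all bins, so the total count stays n.
    limit = clip_limit * n / levels
    excess = sum(max(0.0, h - limit) for h in hist)
    hist = [min(h, limit) + excess / levels for h in hist]

    # The cumulative histogram becomes the gray-level lookup table.
    lut, total = [], 0.0
    for h in hist:
        total += h
        lut.append(min(levels - 1, round((levels - 1) * total / n)))
    return [[lut[v] for v in row] for row in gray]


# Hypothetical hazy patch: all gray levels squeezed into 120..127.
hazy = [[120, 121, 122, 123],
        [124, 125, 126, 127],
        [120, 121, 122, 123],
        [124, 125, 126, 127]]
for row in clipped_equalize(hazy):
    print(row)
```

A larger clip limit moves the result toward plain histogram equalization, while a very small one leaves the image almost unchanged; in the procedure described above, the equalized grayscale image would then take the place of the luminance component.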

Fourth example: Unlike people with normal color vision, people with color deficiency have difficulties discriminating certain color combinations and color differences. Images recolored by a color dimension reduction procedure can be used to highlight important visual details that would otherwise be unnoticeable to color-deficient viewers [RGW05a, b, HTWW07, KOF08]. For example, in the method proposed by Kuhn et al. [KOF08], the color gamut of each class of dichromacy can be represented by two half-planes in the LMS color space, which can be satisfactorily approximated by a single plane passing through the luminance axis. Thus, for each class of dichromacy, they mapped its color gamut to the approximately perceptually uniform CIEL*a*b* color space and used least squares to obtain a plane that contains the luminance axis and best represents the corresponding gamut. Figure 30 shows the original color image (left column), the image as perceived by a color-deficient viewer (center), and the image recolored by the method of these authors. The perceived image and the recolored image for protanopes (i.e. dogs and individuals without red cones) are shown in the first row, for deuteranopes (i.e. individuals with deficiencies distinguishing between red and green tones) in the second row, and for tritanopes (i.e. individuals without blue cones) in the third row.

illustration not visible in this excerpt

Fig. 30. Original image (left column), the image as perceived by a color-deficient viewer (center), and the image recolored by the method of Kuhn et al. The perceived image and the recolored image for protanopes are shown in the first row, for deuteranopes in the second row, and for tritanopes in the third row. Images taken from the supplemental results of [KOF08].

Another example of image recoloring can be seen in [YLL15], which uses the grayscale image as the index of an indexed image and defines a colormap with the intention of modifying the original color sequence. This procedure is known as pseudocoloring, and it is a common technique for adding color to grayscale images such as X-ray, MRI, scanning electron microscopy (SEM), and other imaging modalities in which color information does not exist. Pseudocoloring is classified as an “image enhancement” technique because it can be used to enhance the detectability of details within the image. As in the previous examples, the C2G conversion makes it possible to enhance the contrast and the salient features, now of the index image. Figure 31 shows the recoloring of one of the images of Cadík’s dataset based on an index image obtained as the grayscale version produced by CPDecolor [LXJ14], using Matlab’s jet and hot colormaps. This recoloring option is particularly useful for maps, diagrams, and other synthetic graphs.

illustration not visible in this excerpt

Fig. 31. Recoloring of one of the images of Cadík’s dataset based on an index image obtained as the grayscale version produced by CPDecolor [LXJ14], using Matlab’s jet and hot colormaps.
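Pseudocoloring amounts to a table lookup, as the following minimal sketch shows. The colormap used here is a hand-written black-red-yellow-white ramp that only resembles a hot-style colormap; it is an assumption for the example and not Matlab’s exact table, and the function names are hypothetical.

```python
def clamp01(value):
    """Limit a value to the range 0..1."""
    return min(1.0, max(0.0, value))


def hot_like(t):
    """Black -> red -> yellow -> white ramp for t in 0..1 (approximation only)."""
    return (clamp01(3 * t), clamp01(3 * t - 1), clamp01(3 * t - 2))


def build_colormap(ramp, size=256):
    """Sample a ramp function into a lookup table with `size` RGB entries."""
    return [ramp(i / (size - 1)) for i in range(size)]


def pseudocolor(index_image, colormap):
    """Use each gray level as an index into the colormap."""
    return [[colormap[v] for v in row] for row in index_image]


# Hypothetical 8-bit index image, e.g. the output of a C2G conversion.
index_image = [[0, 64, 128],
               [192, 255, 32]]
colormap = build_colormap(hot_like)
for row in pseudocolor(index_image, colormap):
    print(row)
```

Because the colors depend only on the gray level, any contrast gained in the C2G step carries over directly into the recolored image.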

Fifth example: Edge detection of image structures is a difficult task under uneven illumination. Some of the color-to-grayscale conversion methods in the state-of-the-art are able to enhance image salient features and over-enhance the edges of structures [GD07, SLTM08, KJDL09, WT14, LLWL16]. Dikbas et al. [DAA07] developed a color edge preserving grayscale conversion algorithm that helps detect color edges using only the luminance component. The algorithm calculates an approximation to the first principal component to form a new set of luminance coefficients instead of using the conventional luminance coefficients. This method can be directly applied to all existing grayscale edge detectors for color edge detection. Processing only one channel instead of three channels results in lower computational complexity compared to other color edge detectors. Also, Ancuti et al. [AAB11] make use of their hybrid C2G conversion algorithm for segmentation under different illuminations. Their results reach high consistency for segmenting flowers from their background under different illuminations as can be seen in Figure 32.

Fig. 32. Edge detection of image structures under different illuminants. Images taken from [AAB11].

illustration not visible in this excerpt
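The idea of deriving luminance coefficients from the first principal component can be sketched as follows. Note the assumptions: this example computes the principal direction with a plain power iteration rather than the approximation used by Dikbas et al. [DAA07], it starts the iteration from the conventional luma weights, and all names and pixel values are hypothetical.

```python
LUMA = (0.299, 0.587, 0.114)


def covariance(pixels):
    """3x3 covariance matrix of a list of RGB triples."""
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / n
             for j in range(3)] for i in range(3)]


def principal_weights(pixels, iterations=100):
    """Unit-length direction of largest color variance, by power iteration."""
    cov = covariance(pixels)
    v = list(LUMA)  # start from the conventional luminance coefficients
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # constant image or unlucky start: keep the last v
            break
        v = [x / norm for x in w]
    # Fix the sign so that the dominant coefficient is positive.
    dominant = max(range(3), key=lambda i: abs(v[i]))
    return [-x for x in v] if v[dominant] < 0 else v


def project(rgb, weights):
    """Single-channel value obtained with the given coefficients."""
    return sum(w * c for w, c in zip(weights, rgb))


# Hypothetical pixels forming a red-green edge with nearly equal luma.
pixels = [(1.0, 0.0, 0.5), (0.0, 1.0, 0.5)]
weights = principal_weights(pixels)
print(weights)
print(abs(project(pixels[0], weights) - project(pixels[1], weights)))
print(abs(project(pixels[0], LUMA) - project(pixels[1], LUMA)))
```

The resulting single channel can be passed to any grayscale edge detector; the coefficients may be negative, so the projected values need rescaling before being stored as an image.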

11. Final remarks

The problem of color-to-grayscale image conversion has been growing in interest in recent years. It is an open and active research topic whose aim is to exploit, as much as possible, the limited range of gray levels to represent the contrast of the input color image. Since C2G image conversion is fundamentally a dimension reduction process, it entails an inevitable loss of information. The basic problem is to reproduce the intent of the color original, its contrasts and salient features, while preserving the perceived magnitude and direction of its gradients. Several works recognize the complexity of the problem; nevertheless, numerous approaches for color-to-grayscale conversion have been proposed. Several of these methods are oriented to the visual perception of grayscale images in printed documents, artistic purposes, etc. The remaining methods focus on obtaining grayscale images that will be digitally processed for automatic analysis. The simplest and most widely used approach to convert color to grayscale images is based on neglecting the chrominance channels, e.g. taking a luminance channel as the grayscale representation of the original color image. This approach, generally based on one of the color spaces, is simple and computationally efficient, but it may fail for features with isoluminant colors. More advanced methods make use of a great diversity of strategies. Most of them treat the grayscale conversion as an optimization problem, trying to find an optimal color-to-grayscale mapping that preserves the contrast in fixed-size (or global) spatial neighborhoods, or among the colors appearing in a particular image. Solving the optimization problem is numerically slow, typically taking many seconds or minutes on a medium-size image.

The analysis of individual images reveals that no conversion produces universally good results for all the input images involved, nor for all the image analysis procedures. All of the methods have been shown to be effective in some respects. It is widely recognized that, in spite of the efforts to involve more complicated color models and computational models to solve the problem, all of the existing methods suffer from the same weakness, a lack of robustness: failure cases can easily be found for each of the methods, either missing major structures of the original color images or losing perceptual plausibility. This prevents all these methods from being practical for real-world applications.

A thought-provoking question was raised by Song et al. [SBXY13]: can we reach a robust solution using the simplest color model and the most straightforward computational model? They went back to the direct method of the rgb2gray Matlab function, based on the Y component of the YUV color space, but modified it to avoid failures in isoluminant regions by choosing channel weights that depend on the specific image. According to the authors, their method showed “good” results for each color image, among which the “best” one preferred by users can be selected by further involving perceptual contrast preferences.
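The general idea of image-dependent channel weights can be illustrated with a toy search. This sketch is not the procedure of Song et al.: it assumes a coarse grid of non-negative weights that sum to one and scores each candidate by the smallest gray difference between any two of the image’s representative colors. The names, the grid step, and the colors are all hypothetical.

```python
from itertools import combinations

LUMA = (0.299, 0.587, 0.114)


def to_gray(rgb, weights):
    """Weighted sum of the three channels."""
    return sum(w * c for w, c in zip(weights, rgb))


def weight_candidates(steps=10):
    """All (wr, wg, wb) on a grid with non-negative entries summing to 1."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            yield (i / steps, j / steps, (steps - i - j) / steps)


def worst_pair_contrast(colors, weights):
    """Smallest gray-level difference between any two distinct colors."""
    return min(abs(to_gray(a, weights) - to_gray(b, weights))
               for a, b in combinations(colors, 2))


def pick_weights(colors, steps=10):
    """Candidate whose worst-separated color pair is separated best."""
    return max(weight_candidates(steps),
               key=lambda w: worst_pair_contrast(colors, w))


# Hypothetical representative colors: an isoluminant red/green pair and a gray.
colors = [(0.8, 0.0, 0.0),
          (0.0, 0.8 * 0.299 / 0.587, 0.0),
          (0.5, 0.5, 0.5)]
best = pick_weights(colors)
print(best, worst_pair_contrast(colors, best))
print(LUMA, worst_pair_contrast(colors, LUMA))
```

With the fixed luma weights the red and green samples collapse to one gray level, whereas the selected weights keep all three colors apart; a practical method would also have to consider the ordering and perceptual plausibility of the resulting grays.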

Published works suggest that there still exist areas for improvement of current conversions, especially in the robustness over various inputs. Moreover, the desirable properties of the color to grayscale conversion may sometimes depend on the chosen subsequent pathway. In this regard, since the response of the HVS to color difference of complex visual stimuli remains an active research topic, more advanced and accurate estimates of color image contrast and structure may be further investigated.

Correspondingly, work on the problem of evaluating color-to-grayscale image conversion methods has been producing more accurate objective metrics and evaluation procedures, including the creation of suitable datasets, according to the specific purpose of the conversion method.

Until now, few works have discussed the robustness of color-to-gray conversion algorithms in noisy environments; only the two recent works of Liu et al. [LLXWL15, LLWL16] do so. These works added Gaussian noise with zero mean and diverse standard deviations, showing that as the noise level gradually increases, their C2G conversion methods degrade less than some previous algorithms. However, a more exhaustive analysis of the performance of state-of-the-art C2G conversion methods in noisy environments is still needed.

A second part of this work intends to implement a quantitative evaluation of the state-of-the-art C2G conversion methods presented here, using the newly defined objective performance metrics and the suggested datasets.


[AAB11] Ancuti C.O., Ancuti C., Bekaert P.: Enhancing by saliency guided decolorization. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2011), 257-264.

[AAHB11] Ancuti C.O., Ancuti C., Hermans C., Bekaert P.: Image and Video Decolorization by Fusion. Computer Vision -ACCV 2010, 6492 (2011), 79–92.

[Ade00] Adelson E.H.: Lightness perception and lightness illusions. The New Cognitive Neurosciences, 2nd ed., M. Gazzaniga, ed. Cambridge, MA: MIT Press, pp. 339-351, 2000.

[AD09] Alsam A., Drew M.S.: Fast Multispectral2Gray. Journal of Imaging Science and Technology (2009).

[AM13] Abbas N., Mohamad D.: Microscopic RGB color images enhancement for blood cells segmentation in YCbCr color space for K-means clustering. Journal of Theoretical and Applied Information Technology, 55, 1 (Sept. 2013), 117-125.

[AMM13] Abdul-Nasir A.S., Mashor M.Y., Mohamed Z.: Color image segmentation approach for detection of Malaria parasites using various color models and k-means clustering. WSEAS Transactions on Biology and Biomedicine, 10, 2, (April 2013), 41-55.

[ATAK07] Anagnostopoulos C.-N., Tsekouras G., Anagnostopoulos I., Kalloniatis C.: Intelligent modification for the daltonization process of digitized paintings. Proc. of the 5th Int. Conference on Computer Vision Systems (ICVS 2007).

[BB04] Bala R., Braun K.: Color-to-grayscale conversion to maintain discriminability. Proceedings of SPIE (2004), 196–202.

[BCCCR12] Benedetti L., Corsini M., Cignoni P., Callieri M., Scopigno R.: Color to gray conversions in the context of stereo matching algorithms: an analysis and comparison of current methods and an ad-hoc theoretically-motivated technique for image matching. Machine Vision and Applications, 23, 2 (2012), 327-348.

[BE04] Bala R., Eschbach R.: Spatial color-to-grayscale transformation preserving chrominance edge information. Proc. IS&T/SID’s 12th Color Imaging Conference (2004), 82–86.

[BVM07] Busin L., Vandenbroucke N., Macaire L.: Color spaces and image segmentation. Laboratoire LAGIS UMR CNRS, Université des Sciences et Technologies de Lille, Villeneuve d’Ascq, France (2007).

[Cad08] Cadík M.: Perceptual evaluation of color-to-grayscale image conversions. Computer Graphics Forum, 27, 7 (2008), 1745-1754.

[CHD04] Chen Y., Hao P., Dang A.: Optimal transform in perceptually uniform color space and its application in image coding. A. Campilho, M. Kamel (Eds.): ICIAR 2004, LNCS 3211 (2004), 269–276.

[CHRW09] Cui M., Hu J., Razdan A., Wonka P.: Color-to-gray conversion using ISOMAP. Vis. Comput, 26 (2009), 1349–1360.

[CW04] Chen H., Wang S.: The use of visible color difference in the quantitative evaluation of color image segmentation. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2004).

[DA07A] Dikbas S., Arici T., Altunbasak Y.: Chrominance edge preserving grayscale transformation with approximate first principal component for color edge detection. 2007 IEEE International Conference on Image Processing, 2, IEEE (2007).

[DCFB09] Drew M., Connah D., Finlayson G., Bloj M.: Improved colour to greyscale via integrability correction. Human Vision and Electronic Imaging XIV, edited by B.E. Rogowitz, T.N. Pappas, Proc. of SPIE-IS&T Electronic Imaging, SPIE Vol. 7240, 72401B, (2009).

[DHSML15] Du H., He S., Sheng B., Ma L., Lau R.W.: Saliency-guided color-to-gray conversion using region-based optimization. IEEE Transactions on Image Processing, 24, 1 (2015), 434-443.

[EKB13] Eynard D., Kovnatsky A., Bronstein M.M.: Structure-preserving color transformations using Laplacian commutativity. arXiv:1311.0119v1 (1 Nov 2013).

[Fat08] Fattal R.: Single image dehazing. Proceedings of Computer Graphics (SIGGRAPH-2008), 27, 3 (2008), 1–9.

[GOTG05] Gooch A. A., Olsen S. C., Tumblin J., Gooch B.: Color2gray: salience-preserving color removal. ACM Trans. Graph. 24, 3 (2005), 634–639.

[GD07] Grundland M., Dodgson N. A.: Decolorize: Fast, contrast enhancing, color to grayscale conversion. Pattern Recogn. 40, 11 (2007), 2891–2896.

[GKD15] Günes A., Kalkan H., Durmus E.: Optimizing the color-to-grayscale conversion for image classification. Signal, Image and Video Processing (SIViP) (Oct. 2015).

[GWE04] Gonzalez R.C., Woods R.E., Eddins S.L.: Digital Image Processing Using Matlab. Prentice Hall (2004).

[HCJW09] Huang J.-B., Chen C.-S., Jen T.-C., Wang S.-J.: Image recolorization for the colorblind. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009). IEEE (2009).

[HHK12] Hou X., Harel J., Koch C.: Image signature: Highlighting sparse salient regions, IEEE Trans. Pattern Anal. Machine Intell., 34 (2012), 194–201.

[HLS11] Hsin C., Le H.-N., Shin S.-J.: Color to grayscale transform preserving natural order of hues. 2011 International Conference on Electrical Engineering and Informatics, Bandung, Indonesia (17-19 July 2011).

[HR13] Haritha H., Reddy R.S.: Color conversion and watershed segmentation for RGB images. International Conference on Electrical and Electronics Engineering, (27th Jan 2013), 82-87.

[HST09] He K., Sun J., Tang X.: Single image haze removal using dark channel prior. IEEE CVPR, (2009).

[HTWW07] Huang J.-B., Tseng Y.-C., Wu S.-I., Wang S.-J.: Information preserving color transformation for protanopia and deuteranopia. IEEE Signal Processing Letters, 14, 10 (October 2007), 711-714.

[JFWM15] Ji Z., Fang M.E., Wang Y., Ma W.: Efficient decolorization preserving dominant distinctions. The Visual Computer (2015), 1-11.

[JLN14] Jin Z., Li F., Ng M.K.: A variational approach for image decolorization by variance maximization, SIAM J. Imaging Sci., 7, 2 (2014), 944-968.

[JN15] Jin Z., Ng M.K.: A contrast maximization method for color-to-grayscale conversion. Multidimensional Systems and Signal Processing, 26, 3 (2015), 869-877.

[KAC11] Kuk J.G., Ahn J.H., Cho N.I.: A color to grayscale conversion considering local and global contrast. Computer Vision – ACCV 2010. Springer (2011), 513-524.

[KC12] Kanan C., Cottrell G.W.: Color-to-grayscale: Does the method matter in image recognition? PLOS ONE, 7, 1 (Jan. 2012), 1–7.

[KBK07] Krikor L.Z., Baba S.E.I., Krikor M.Z.: Palette-based image segmentation using HSL space. Journal of Digital Information Management, 5, 1 (2007), 8-11.

[Kin11] Kingdom F.A.A.: Lightness, brightness and transparency: A quarter century of new ideas, captivating demonstrations and unrelenting controversy. Vision Research 51 (2011), 652–673.

[KJDL09] Kim Y., Jang C., Demouth J., Lee S.: Robust color-to-gray via nonlinear global mapping. ACM Transactions on Graphics (TOG), 28, 5 (2009).

[KK12] Kaur A., Kranthi B.V.: Comparison between YCbCr color space and CIELab color space for skin color segmentation. International Journal of Applied Information Systems (IJAIS), 3, 4 (July 2012), 30-33.

[KKD16] Kumar S.N., Kannan R.G., Dhivya R.: Objective quality assessment for color-to-gray images using FOM. International Research Journal of Engineering and Technology (IRJET), 3, 4 (Apr. 2016), 825-829.

[KOF08] Kuhn G.R., Oliveira M.M., Fernandes L.A.F.: An improved contrast enhancing approach for color-to-grayscale mappings. Visual Comput. 24 (2008), 505-514.

[LCYC15] Lee B., Choi J., Yun K., Choi J.Y.: Gradient preserving RGB-to-gray conversion using random forest. Image Processing (ICIP), 2015 IEEE International Conference on. IEEE, (2015).

[LI11] Lim W.H., Isa N.A.M.: Color to grayscale conversion based on neighborhood pixels effect approach for digital image. 7th International Conference on Electrical and Electronics Engineering (ELECO 2011), Bursa, Turkey (1-4 December 2011), 157-161.

[LI12] Lim W.H., Isa N.A.M.: A novel adaptive color to grayscale conversion algorithm for digital images. Scientific Research and Essays, 7, 30 (August, 2012), 2718-2730.

[LKS10] Lee T.-H., Kim B.-K., Song W.-J.: Converting color images to grayscale images by reducing dimensions. Optical Engineering 49, 5 (May 2010), 057006-1-7.

[LL13] Liu C.-W., Liu T.-L.: A sparse linear model for saliency-guided decolorization. 2013 IEEE International Conference on Image Processing. IEEE (2013).

[LLWD16] Lu H.-y., Liu Q.-g., Wang Y.-h., Deng X.-h.: A two-stage parametric subspace model for efficient contrast-preserving decolorization. Frontiers of Information Technology & Electronic Engineering (FITEE), (2016), 1-9.

[LLWL16] Liu Q., Liu P.X., Wang Y., Leung H.: Semi-parametric decolorization with Laplacian-based perceptual quality metric. IEEE Transactions on Circuits and Systems for Video Technology 99 (2016).

[LLXWL15] Liu Q., Liu P.X., Xie W., Wang Y., Liang D.: GcsDecolor: Gradient correlation similarity for efficient contrast preserving decolorization, IEEE Trans. Image Process., 24, 9 (2015), 2889-2904.

[LP09] Lu J., Plataniotis K.N.: On Conversion from Color to Gray-scale Images for Face Detection. 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, IEEE (2009), 114-119.

[LU12] Lissner I., Urban P.: Toward a unified color space for perception-based image processing. IEEE Trans. Image Process., 21, 3 (Mar 2012), 1153–1168.

[LXJ12] Lu C., Xu L., Jia J.: Real-time contrast preserving decolorization. ACM SIGGRAPH Asia Technical Briefs (2012).

[LXJ14] Lu C., Xu L., Jia J.: Contrast preserving decolorization with perception-based quality metrics, Int. J. Comput. Vis. (2014).

[MBT15] Meng M., Bao S., Tanaka G.: Color Removal Method Preserving Local Contrast. International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS) (November 9-12, 2015), 7-10.

[MFHLLN02] Moroney N., Fairchild M.D., Hunt R.W.G., Li C., Luo M.R., Newman T.: The CIECAM02 color appearance model. Proc. Soc. Imag. Sci. Technol. 10th Color Imag. Conf. (2002), 23–27.

[MMDBBK16] Mukherjee J., Maitra I.K., Dey K.N., Bandyopadhyay S.K., Bhattacharyya D., Kim T.-H.: Grayscale Conversion of Histopathological Slide Images as a Preprocessing Step for Image Segmentation. Int. Journal of Software Engineering and Its Applications, 10, 1 (2016), 15-26.

[MZT06] Mao K.Z., Zhao P., Tan P.-H.: Supervised Learning-Based Cell Image Segmentation for P53 Immunohistochemistry. IEEE Transactions on Biomedical Engineering, 53, 6 (June 2006), 1153-1163.

[MZZW15] Ma K., Zhao T., Zeng K., Wang Z.: Objective quality assessment for color-to-gray image conversion. IEEE Transactions on Image Processing, 24, 12 (December 2015), 4673-4685.

[NCN07] Neumann L., Cadík M., Nemcsics A.: An efficient perception-based adaptive color to gray transformation. Proceedings of Computational Aesthetics in Graphics, Visualization, and Imaging (Banff, Canada, 2007), 73–80.

[NH15] Nguyen C.T., Havlicek J.P.: Color to grayscale image conversion using modulation domain quadratic programming, Image Processing, IEEE International Conference on (ICIP) (2015), 4580-4584.

[NK06] Nikolaev D., Karpenko S.: Color-to-grayscale image transformation preserving the gradient structure. Proceedings 20th European Conference on Modelling and Simulation (ECMS), W. Borutzky, A. Orsoni, R. Zobel (Eds.) (2006).

[PMSJ10] Phansalkar N., More S., Sabale A., Joshi M.: Adaptive local thresholding for detection of nuclei in diversely stained cytology images. Communications and Signal Processing (ICCSP), 2011 International Conference on. IEEE, (2010), 218-220.

[PP13] Papamarkou I., Papamarkos N.: On gray-level conversion of color documents. Int. Conf. on Document Analysis and Recognition (ICDAR-2013) Washington, DC (Aug. 25-28, 2013).

[PV00] Plataniotis K.N., Venetsanopoulos A.N.: Color image processing and applications. Springer-Verlag (2000).

[QB06] de Queiroz R.L., Braun K.M.: Color to Gray and Back: Color Embedding Into Textured Gray Images. IEEE Transactions on Image Processing, 15, 6 (June 2006), 1464-1470.

[QFQ08] Qiu M., Finlayson G., Qiu G.: Contrast maximizing and brightness preserving color to grayscale image conversion. Proceedings of the 4th European Conference on Colour in Graphics, Imaging, and Vision, Curran, Norwich, UK (2008), 347–351.

[Rez04] Reza A.M.: Realization of the Contrast Limited Adaptive Histogram Equalization (CLAHE) for Real-Time Image Enhancement. Journal of VLSI Signal Processing, 38 (2004), 35–44.

[Ras05] Rasche K.: Re-coloring images for gamuts of lower dimension. PhD Thesis, Graduate School of Clemson University, 2005.

[RGW05a] Rasche K., Geist R., Westall J.: Detail preserving reproduction of color images for monochromats and dichromats. IEEE Comput. Graph. Appl. 25, 3 (2005).

[RGW05b] Rasche K., Geist R., Westall J.: Re-coloring images for gamuts of lower dimension. Computer Graphics Forum, 24 (September 2005), 423–432.

[Sar10] Saravanan C.: Color image to grayscale image conversion. Second International Conference on Computer Engineering and Applications (2010).

[SBXY13] Song Y., Bao L., Xu X., Yang Q.: Decolorization: Is rgb2gray() out? Proceedings of Computer Graphics (SIGGRAPH-2013) Asia, Hong Kong (Nov. 19-22, 2013).

[SGS16] Sowmya V., Govind D., Soman K.P.: Significance of incorporating chrominance information for effective color-to-grayscale image conversion. Signal, Image and Video Processing (SIViP) (2016), 1-8.

[SK13] Seo J.W., Kim S.D.: Novel PCA-based color-to-gray image conversion. ICIP (2013), 2279-2283.

[SL05] Shih P., Liu C.: Comparative assessment of content-based face image retrieval in different color spaces. International Journal of Pattern Recognition and Artificial Intelligence, 19, 7 (2005), 873-893.

[SLTM08] Smith K., Landes P.-E., Thollot J., Myszkowski K.: Apparent greyscale: A simple and fast conversion to perceptually accurate images and video. Comp. Graph. Forum, 27, 2 (Proc. Eurographics 2008), 193–200.

[SMI05] Subr K., Majumder A., Irani S.: Greedy algorithm for local contrast enhancement of images. Proceedings of the 13th International Conference on Image Analysis and Processing, (2005).

[STCBY13] Song M., Tao D., Chen C., Bu J., Yang Y.: Color-to-gray based on chance of happening preservation, Neurocomputing, 119 (2013), 222-231.

[STCLC10] Song M., Tao D., Chen C., Li X., Chen C.W.: Color to gray: Visual cue preservation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 32, 9 (2010), 1537–1552.

[TB05] Tzeng D.-Y., Berns R.S.: A review of principal component analysis and its applications to color technology. COLOR Research and Application, 30, 2 (April 2005), 84-98.

[TB14] Tigora A., Boiangiu C.A.: Image color reduction using iterative refinement. Int. Journal of Mathematical Models and Methods in Applied Sciences, 8 (2014), 203-207.

[TSU08] Tanaka G., Suetake N., Uchino E.: Fast Color Removal Method Considering Differences between Colors. Fourth Int. Workshop on Computational Intelligence & Applications, IEEE SMC, Hiroshima Chapter, Hiroshima University, Japan (Dec. 10 & 11, 2008).

[TT16] Tyagi R., Tomar G.S.: Transformation of Image from Color to Gray Scale Using Contrast among DPCM and LMS Method. International Journal of Signal Processing, Image Processing and Pattern Recognition, 9, 8 (2016), 11-24.

[TU07] Thevenaz P., Unser M.: User-friendly semiautomated assembly of accurate image mosaics in microscopy. Microscopy Research and Technique, 70 (2007), 135-146.

[VV15] Vertan C., Voicea G.: A comparison of decolorization methods performance for medical color images. The 5th IEEE International Conference on E-Health and Bioengineering - EHB 2015, Grigore T. Popa University of Medicine and Pharmacy, Iasi, Romania (November 19-21, 2015).

[WSL12] Wu J., Shen X., Liu L.: Interactive two-scale color-to-gray, Vis. Comput., 28 (2012), 723-731.

[WR14] Wu Z., Robinson J.: Edge-preserving colour-to-greyscale conversion. IET Image Processing, 8, 4 (2014), 252–260.

[WT14] Wu T., Toet A.: Color-to-grayscale conversion through weighted multiresolution channel fusion, Journal of Electronic Imaging, 23, 4 (Jul/Aug 2014), 043004-1-06.

[XLZZ14] Xiang Y., Long H., Zhang X., Zou B.: Image decolorizing by quantile-base distribution analysis. Journal of Information & Computational Science, 11, 1 (2014), 35-43.

[YBW16] You S., Barnes N., Walker J.: Perceptually consistent color-to-gray image conversion. arXiv preprint arXiv:1605.01843 (2016).

[YLL15] Yoo M.-J., Lee I.-K., Lee S.: Color Sequence Preserving Decolorization. EUROGRAPHICS, 34, 2 (2015), 373-383.

[YSBCJ10] Yang Y., Song M., Bu J., Chen C., Jin C.: Color to Gray: Attention Preservation. Fourth Pacific-Rim Symposium on Image and Video Technology (2010), 337-342.

[ZHL13] Zhu W., Hu R., Liu L.: Grey conversion via perceived-contrast. Vis. Comput., Springer (29 May 2013).

[ZSM14] Zhou M., Sheng B., Ma L.: Saliency preserving decolorization. 2014 IEEE International Conference on Multimedia and Expo (ICME). IEEE (2014), 1-6.

[ZT10] Zhao Y., Tamimi Z.: Spectral image decolorization. 6th International Symposium on Advances in Visual Computing (ISVC), Lecture Notes in Computer Science, Vol. 6454, Las Vegas, NV, USA (Nov. 29 – Dec. 1, 2010), 747-756.


Color-to-grayscale conversions of the isoluminant synthetic image shown in Figure 26, using other methods.

illustration not visible in this excerpt


[i] The asterisks (*) after L, a, and b are part of the full name, since these coordinates are derived from the L, a, and b of the earlier version of CIELAB.



Quote paper
Rubén Orozco-Morales (Author), 2016, Image Color Dimension Reduction. A comparative study of state-of-the-art methods, Munich, GRIN Verlag,
