Abstract—This paper presents the effects of pre-processed image training data on a convolutional neural network's (CNN's) training accuracy when the training dataset is small and insufficient. The goal of this research is to discover whether or not a convolutional neural network can perform better with a small image dataset that is pre-processed using color quantization through K-means clustering or histogram equalization.
Keywords—image pre-processing, K-means clustering, histogram equalization, convolutional neural networks, TensorFlow, image classification
I. Introduction
In the modern world, machine learning algorithms play a vital role in solving big problems in a smart and efficient way. These algorithms solve huge problems by analyzing sets of data. From stock market prediction1 to smartphone assistants2, the applications of machine learning algorithms are seen everywhere.
Machine learning algorithms depend on datasets to train their inner workings and produce reliable outputs. For example, a neural network, an old yet sophisticated learning algorithm, can be trained to classify pictures of cats and dogs by feeding labeled pictures of both into its input layer. In this case, the pictures of cats and dogs are the dataset.
Machine learning algorithms can recognize patterns in almost anything, given a sufficiently large dataset. They can solve medical problems that would take a human countless hours, solve large scientific equations, and even drive cars. However, although these machine learning algorithms are widely available and free to use, such as Google's TensorFlow3, good datasets are not as common. Without good datasets, these architectures are not capable of producing highly accurate predictions. Scientifically accurate datasets are hard to come by and expensive to collect. Therefore, an intuitive solution is needed that can achieve high accuracy on machine learning architectures without the need for a big dataset.
The aim of this paper is to increase the accuracy of an architecture, a convolutional neural network in the TensorFlow machine learning library, in a scenario where the training data is limited and small, by pre-processing the image data using color quantization (with the help of K-means clustering) and histogram equalization. It is expected that both pre-processing methods will yield higher prediction accuracy than training on the raw data.
K-means clustering based color quantization is expected to improve the CNN's accuracy because it reduces the dimensionality of the image, representing it with only 4, 8, or 16 colors instead of the thousands of distinct colors present in the original RGB image. Similarly, histogram equalization is expected to help because it increases the contrast of the image. As a result of this contrast stretching, artifacts and other useful features appear more prominently, so the CNN can pick them up more easily and train with increased accuracy.
The TensorFlow convolutional neural network used here is an image classifier known as Inception. Inception was trained on thousands of images over the course of multiple weeks on high-powered GPUs by engineers at Google. This research explores re-training4 the last layer of Inception with a dataset of High Resolution Fundus images obtained from the Department of Computer Science, Friedrich-Alexander-Universitat5, so that it can classify healthy fundi, fundi with diabetic retinopathy, and fundi with glaucoma. The dataset has three classes: class I contains healthy fundus images, class II diabetic retinopathy, and class III glaucoma. After re-training Inception with the raw images, the accuracy will be measured. The same measurements will then be taken using the pre-processed images as training datasets, and the results will be compared to observe the effects of pre-processed training data.
II. Theory
A. Color Quantization: Color quantization is a process by which a color image is represented with fewer colors than the original6. It is a lossy process, and the amount of loss depends on the number of colors in the new image.
Figure 1: Color quantization, before and after
B. K-means Clustering: An unsupervised learning algorithm by which a large set of data is represented as K clusters7. The K-means algorithm works in four simple steps; a minimal code sketch is given after Figure 2.
a. First, the algorithm places K points into the data space.
b. The algorithm assigns each data point to its nearest K point.
c. When all data points have been assigned, the algorithm recalculates the position of each of the K points as the mean (centroid) of the data points assigned to it.
d. Steps b and c are repeated until the K points stop moving.
Figure 2: Movement of K points during K-means clustering
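The following is a minimal numpy sketch of the four steps above. It is illustrative only: it seeds the K points with randomly chosen data points and does not handle empty clusters.

import numpy as np

def kmeans(data, k, iters=100):
    # a. Place K points (centroids) in the data space; here they are
    #    seeded with K randomly chosen data points.
    rng = np.random.default_rng(0)
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # b. Assign each data point to its nearest centroid.
        distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # c. Recompute each centroid as the mean of its assigned points
        #    (empty clusters are not handled, for brevity).
        new_centroids = np.array([data[labels == j].mean(axis=0) for j in range(k)])
        # d. Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels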
C. Convolutional Neural Networks: A convolutional neural network is a type of neural network with one or more convolutional layers8.
D. Fundus Imaging: A type of medical imaging that uses a fundus camera to capture color images of the interior surface of the eye9.
Figure 3: Fundus image of a healthy specimen
E. TensorFlow: TensorFlow is an open source library by Google that can perform numerical computations on data3. The TensorFlow library provides easy-to-use machine learning algorithms that are open to all.
F. Histogram Equalization: Histogram equalization is a process that redistributes the intensity values of an image so that they are spread evenly across the entire histogram, increasing the image's overall contrast.
Figure 4: Histogram equalization, before and after
III. Existing Research
Digital image processing has been used in the past to improve the accuracy of Optical Character Recognition11. In the field of medical imagery, image processing has been explored to segment background and noise in retinal images12. However, there has been no research on the correlation between image pre-processing of a small dataset and its effect on a convolutional neural network's training accuracy.
IV. Novelty of This Research
Gathering huge datasets to train machine learning algorithms is a time-consuming and expensive process. It is very challenging to design and test good machine learning architectures without good datasets. Good datasets, such as the high resolution fundus images discussed in this paper, are very hard to come by and take years to accumulate. This creates a problem where the quantity of data is not enough to train a reliable machine learning architecture. According to an article in Sensors Magazine, the lack of high quality training data is holding back major AI advances10. This research aims to address this issue: even a small amount of training data (60 images per class in the fundus dataset) can be pre-processed and then fed into a machine learning algorithm to produce more accurate predictions than training on the raw images alone.
V. Methodology
The research is split into four steps. The first step is to create augmented data from the original dataset of 15 images per class of the collected high resolution retinal fundus images. The second step is to pre-process the newly formed dataset, consisting of the augmented data and the original images, using the K-means clustering algorithm to reduce dimensionality. The third step is to take the same dataset and apply histogram equalization to it instead of K-means clustering. The final step is to use the datasets produced in steps one through three separately to re-train the Inception model of the TensorFlow library. In this step, TensorFlow returns a training accuracy for each dataset, and those accuracy values are used to estimate the effects of pre-processing.
A. Creating Augmented Data
The high resolution fundus image set collected from Friedrich-Alexander-Universitat5 has only 45 images, 15 per class: 15 images of healthy fundi, 15 of fundi with diabetic retinopathy, and 15 of fundi affected by glaucoma. The 15 images per class are used to create augmented data following the methods below; a minimal sketch of the augmentation code is given after the list.
1. First, each of the 15 images per class is rotated 90 degrees and added to the dataset.
Figure 5: Fundus image of a healthy specimen, rotated 90 degrees
2. Next, the images are rotated 180 degrees and added again.
Figure 6: Fundus image of a healthy specimen, rotated 180 degrees
3. Finally, they are rotated 270 degrees and added to the dataset.
Figure 7: Fundus image of a healthy specimen, rotated 270 degrees
4. The new dataset now contains 180 images in total, 60 per class: for each class, the 15 original images plus 15 rotated 90 degrees, 15 rotated 180 degrees, and 15 rotated 270 degrees. This dataset is named ‘raw’.
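The sketch below shows this rotation-based augmentation in Python using the Pillow library. The folder layout and file names are illustrative assumptions, not the paper's actual setup.

import os
from PIL import Image

def augment_with_rotations(class_dir):
    # Add 90/180/270 degree rotated copies of every image in class_dir.
    for name in os.listdir(class_dir):
        if not name.lower().endswith((".jpg", ".png")):
            continue
        stem, ext = os.path.splitext(name)
        image = Image.open(os.path.join(class_dir, name))
        for angle in (90, 180, 270):
            rotated = image.rotate(angle, expand=True)  # keep the full frame
            rotated.save(os.path.join(class_dir, f"{stem}_rot{angle}{ext}"))

# Hypothetical per-class folders of the 'raw' dataset.
for class_dir in ("raw/healthy", "raw/diabetic_retinopathy", "raw/glaucoma"):
    augment_with_rotations(class_dir)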
B. Applying K-Means Clustering
K-means clustering is applied to the newly formed ‘raw’ dataset. It is applied three times with three different K values to create three new datasets. The clustering was performed using MATLAB's built-in kmeans() function, set to run for at most 150 iterations. A rough Python equivalent is sketched after the list below.
1. First, K-means clustering is applied with K=4. The output images are represented with only 4 colors, and the output dataset is named ‘k_means_4’.
Figure 8: Fundus image of a healthy specimen represented with 4 colors
2. Next, K-means clustering is applied with K=8. The output images are represented with 8 colors, and the output dataset is named ‘k_means_8’.
Figure 9: Fundus image of a healthy specimen represented with 8 colors
3. Lastly, K-means clustering is applied with K=16. The output images are represented with 16 colors, and the output dataset is named ‘k_means_16’.
Figure 10: Fundus image of a healthy specimen represented with 16 colors
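The paper performed this step in MATLAB; the following is a rough Python equivalent using scikit-learn and Pillow, offered only as an assumed substitute. File paths are illustrative, and the output folders are assumed to exist.

import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def quantize_colors(in_path, out_path, k):
    # Flatten the image to one row of (R, G, B) values per pixel.
    image = np.asarray(Image.open(in_path).convert("RGB"), dtype=np.float64)
    pixels = image.reshape(-1, 3)
    # Cluster the pixel colors, capped at 150 iterations as in the paper.
    km = KMeans(n_clusters=k, max_iter=150, n_init=1).fit(pixels)
    # Replace each pixel with the centroid color of its cluster.
    quantized = km.cluster_centers_[km.labels_].reshape(image.shape)
    Image.fromarray(quantized.astype(np.uint8)).save(out_path)

for k in (4, 8, 16):
    quantize_colors("raw/healthy/01.jpg", f"k_means_{k}/healthy/01.jpg", k)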
C. Applying Histogram Equalization
The ‘raw’ dataset is now histogram equalized to generate two new training datasets; a sketch of both operations follows the list.
1. First, each ‘raw’ training image is converted to the HSV color model, and histogram equalization is performed on the V channel, which holds the intensity (luminosity) value of each pixel. The resulting dataset is named ‘hist_eq’.
Figure 11: Fundus image of a healthy specimen after histogram equalization on the luminance channel
2. Next, each ‘raw’ training image is kept in the RGB color model, and histogram equalization is performed on each of the R, G, and B channels separately. The output dataset is named ‘hist_eq_rgb’.
Figure 12: Fundus image of a healthy specimen after histogram equalization on the R, G, and B channels separately
D. Re-training the Convolutional Neural Network
In the final step, the last layer of Inception, TensorFlow's image recognition convolutional neural network, was re-trained using the datasets produced in steps A-C. Each model was trained for a maximum of 500 iterations, and each training run produced a different model with a different accuracy level. A hedged code sketch of this step is given after the list below.
1. The first training was done with the ‘raw’ dataset and the training accuracy returned by the network was noted.
2. The second training was done with the ‘k_means_4’ dataset and the training accuracy was once again noted.
3. The third training was done with the ‘k_means_8’ dataset and the accuracy level for training was noted again.
4. For the fourth training ‘k_means_16’ dataset was used and training accuracy was measured.
5. The fifth training was done using the ‘hist_eq’ dataset and the training accuracy was again measured.
6. The sixth and final training was done using the ‘hist_eq_rgb’ dataset and the training accuracy was measured for a final time.
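The paper followed TensorFlow's image retraining tutorial4, which re-trains Inception's final layer via a provided script. As a rough, modern stand-in (not the exact script the paper used), the Keras sketch below freezes a pre-trained Inception base and trains only a new final layer; the dataset paths, batch size, and epoch count are illustrative assumptions.

import tensorflow as tf

def retrain_last_layer(data_dir, num_classes=3):
    # One subfolder per class; labels are inferred from folder names.
    train = tf.keras.utils.image_dataset_from_directory(
        data_dir, image_size=(299, 299), batch_size=16)
    base = tf.keras.applications.InceptionV3(
        weights="imagenet", include_top=False, pooling="avg")
    base.trainable = False  # freeze everything except the new last layer
    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # scale to [-1, 1]
        base,
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # With 180 images and batch size 16, ~42 epochs roughly matches the
    # paper's cap of 500 training iterations.
    model.fit(train, epochs=42)
    return model

for name in ("raw", "k_means_4", "k_means_8",
             "k_means_16", "hist_eq", "hist_eq_rgb"):
    retrain_last_layer(name)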
VI. Results and Comparison
After training the network with the first, ‘raw’ dataset, the following accuracy plot was generated by TensorBoard, the TensorFlow data visualization toolbox.
Figure 13: ‘raw’ dataset training accuracy plot
The X axis shows the iteration count of the learning process, and the Y axis shows the accuracy level. The orange curve represents the training accuracy, while the blue curve represents the validation accuracy. After training with the ‘raw’ dataset was complete, TensorFlow returned a final test accuracy of 70.0%.
Next, after training the network with the ‘k_means_4’ dataset, the following accuracy plot was displayed on TensorBoard.
Figure 14: ‘k_means_4’ dataset training accuracy plot
At the end of the training process with the ‘k_means_4’ dataset, a final test accuracy of 64.3% was returned.
Training the network with the ‘k_means_8’ and ‘k_means_16’ datasets generated two more accuracy plots of the same form.
Final test accuracies of 72.7% and 71.4% were returned for these two trainings, respectively.
The network was then trained on the ‘hist_eq’ dataset and the following accuracy plot was generated on TensorBoard.
Figure 15: ‘hist_eq’ dataset training accuracy plot
The final accuracy returned after training with ‘hist_eq’ was 88.9%.
Figure 16: ‘hist_eq_rgb’ dataset training accuracy plot
The final accuracy returned after training with ‘hist_eq_rgb’ was 66.7%.
The final test accuracies for all datasets are compared below.

Dataset        Final test accuracy
raw            70.0%
k_means_4      64.3%
k_means_8      72.7%
k_means_16     71.4%
hist_eq        88.9%
hist_eq_rgb    66.7%

Figure 17: Comparison of accuracy levels for the different datasets
Looking at the comparison above, it is safe to say that histogram equalization stretching the luminance of the images had a more positive impact on the convolutional neural network's training accuracy than the other pre-processing methods used in this paper. The luminance-equalized training images produced an 18.9 percentage point accuracy boost over the raw images. On the other hand, histogram equalization of the individual R, G, and B channels and the K-means clustering based color quantization had little to no impact on the accuracy level, even though, in theory, the dimensionality reduction should have helped the network achieve better accuracy than the raw dataset.
VII. Conclusion and Future Implementations
The findings of this research show that it is possible to increase the accuracy level of a convolutional neural network by pre-processing the image dataset. Accuracy levels of around 90% can be reached through pre-processing such as histogram equalization of the luminance channel of the HSL model or the value channel of the HSV model, even when the dataset contains very few training examples. It can be theorized that K-means clustering did not offer a large shift in accuracy because clustering the raw images discarded a significant amount of image detail that the CNN might otherwise have used as features. In future work, it would be interesting to see the effects of applying histogram equalization on the luminance channel first and then K-means clustering to the training dataset. This combination might result in higher accuracy than histogram equalization alone.
References
1 Mims, C. (2010, June 10). AI That Picks Stocks Better Than The Pros. MIT Technology Review. Retrieved from https://www.technologyreview.com/s/419341/ai-that-picks-stocks-better-than-the-pros/
2 Pham, S. (2017, March 21). Samsung's new AI assistant will take on Siri and Alexa. CNN tech. Retrieved from http://money.cnn.com/2017/03/21/technology/samsung-bixby-ai-s8-note-7-apple-google/
3 An open-source software library for Machine Intelligence. (2017). In tensorflow.org. Retrieved August 20, 2017, from https://www.tensorflow.org/
4 How to Retrain Inception's Final Layer for New Categories. (2017). In tensorflow.org. Retrieved August 20, 2017, from https://www.tensorflow.org/tutorials/image_retraining
5 High-Resolution Fundus (HRF) Image Database. (2017). In www5.cs.fau.de. Retrieved August 20, 2017, from https://www5.cs.fau.de/research/data/fundus-images/
6 Color Quantization. In rosettacode.org. Retrieved August 20, 2017, from https://rosettacode.org/wiki/Color_quantization
7 A Tutorial On Clustering Algorithms. In home.deib.polimi.it. Retrieved August 20, 2017, from https://home.deib.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html
8 Convolutional Neural Network. In stanford.edu. Retrieved August 20, 2017, from http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/
9 Color Fundus Photography. In med.ubc.ca. Retrieved August 20, 2017, from http://ophthalmology.med.ubc.ca/patient-care/ophthalmic-photography/color-fundus-photography/
10 Dirjish, M. (2017, April 20). Lack Of High-Quality Training Data Impedes AI Advances. Retrieved from http://www.sensorsmag.com/embedded/lack-high-quality-training-data-impedes-ai-advances
11 Bieniecki, W., Grabowski, S., Rozenberg, W. Image Preprocessing for Improving OCR Accuracy. Retrieved from http://ieeexplore.ieee.org/abstract/document/4283429/