Feature Extraction and Different Classifiers Applied for Detection of Abnormalities in Computer Tomography (CT) Images

Scientific Study, 2014

34 Pages









This chapter introduces the basic concepts of Region of Interest (ROI) selection, feature extraction, and the classifiers applied to detect abnormalities in Computer Tomography (CT) images. ROI selection is, in general, part of the preprocessing procedure, which contributes to better classifier performance.


Computer-aided medical diagnosis is a continuously growing field of research. Image-based medical diagnosis techniques mostly rely on the proper extraction of features from the input images and their subsequent classification. One of the main difficulties in this field is the large amount of memory needed to store medical images and the computational time needed to process the data. The main objective of feature extraction is to extract features from the input automatically and to represent the input in a unique and compact form, as a single value or as a vector or matrix. Although many techniques are available for medical image processing and classification, analysis in the wavelet domain is among the most prominent. Many image processing applications use wavelet transforms, an advanced technique in signal and image analysis. The wavelet transform was introduced as an alternative to the short-time Fourier transform, which suffers from limitations in its time and frequency resolution.


Abnormality detection using classifiers is an active research area that has received considerable attention. It is a critical task in which great care is needed to support a reliable diagnosis. An input image may contain a great deal of information, and which parts are wanted or unwanted depends on the problem formulation. The problem in this project is to analyze classifier performance in terms of efficiency in detecting abnormalities in medical images. A classifier must detect the carcinogenesis efficiently, both in detection time and in detection performance. Two classifiers are selected here, namely Singular Value Decomposition (SVD) and Principal Component Analysis (PCA). Both are applied to a two-class classification procedure. The performance of these classifiers is analyzed using measures such as Sensitivity, Selectivity, Average Detection, Perfect Classification, Missed Classification, False Alarm, F-score and Quality Metrics. CT images of the brain and skull are used for the analysis. Two sets of 30 images are taken, containing both normal and abnormal cases. Figure.1 shows the general architecture of the image classifier system.

illustration not visible in this excerpt

Figure.1 General Architecture of the Image Classifier System


Selection of the ROI is an essential preprocessing stage in abnormality detection. In cancer detection, the efficiency of detection depends strongly on the ROI selection. This stage can improve the efficiency and performance of both the feature extraction and the classification stages, because only the required regions are passed on for further processing.


The main task of feature selection is generally to maximize the joint dependency with the target variable using a minimal number of variables. To obtain such a minimal subset, the redundancy among the selected variables must be reduced to a minimum. Feature selection is one of the most crucial steps in many pattern recognition and artificial intelligence problems. There are two general approaches to feature selection: filters and wrappers. Filter-type methods are essentially data pre-processing or data filtering methods. Features are selected based on intrinsic characteristics that determine their relevance or discriminant power with regard to the target classes.

In wrapper-type methods, feature selection is "wrapped" around a learning method: the usefulness of a feature is judged directly by the estimated accuracy of the learning method. One can often obtain a small set of non-redundant features that gives high prediction accuracy, because the characteristics of the features match well with the characteristics of the learning method. Wrapper methods typically require extensive computation to search for the best features. Wrapper-type methods are used in this project. Seven features are used: mean, variance, entropy, and the wavelet approximation coefficients at four levels.


Classification of the selected features is the next stage in the process. Here we use Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) for classification. SVD and PCA are common techniques for the analysis of multivariate data. PCA is a multivariate statistical technique frequently used in exploratory data analysis and for building predictive models.


Classifier performance is evaluated using measures such as Sensitivity, Selectivity, Average Detection, Perfect Classification, Missed Classification, False Alarm and F-score. Sensitivity and Specificity describe, respectively, the ability of the classifier to correctly identify abnormal inputs and its ability to correctly identify normal inputs. Perfect Classification gives the number of correctly classified data, and Missed Classification gives the number of wrongly classified data. Average Detection gives the average performance with respect to the correctly classified data. These measures help in evaluating the performance of the above-mentioned classifiers in cancer detection. Some classifiers require considerably more computation or memory than others. Some require a substantial number of training instances to give reliable results. Depending on the situation, the user may be willing to accept a lower level of predictive accuracy in order to reduce the run time, the memory requirements, or the number of training instances needed. A more difficult trade-off occurs when the classes are severely unbalanced. The evaluation measures quantify the performance of each classifier, so that a suitable classifier can be chosen for a specific problem.
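As an illustration of how such measures are computed, the sketch below derives sensitivity, specificity and F-score from the four counts of a two-class confusion matrix. It uses the standard textbook definitions, which are assumed here because the excerpt does not give the formulas; the function name and the example counts are hypothetical, and the remaining measures (Average Detection, Perfect Classification, Missed Classification, False Alarm) are not covered.

```python
def classification_measures(tp, fn, tn, fp):
    """Standard two-class measures from confusion-matrix counts.

    tp: abnormal images classified as abnormal
    fn: abnormal images classified as normal
    tn: normal images classified as normal
    fp: normal images classified as abnormal
    Assumes every denominator below is non-zero.
    """
    sensitivity = tp / (tp + fn)   # fraction of abnormal cases detected
    specificity = tn / (tn + fp)   # fraction of normal cases recognised
    precision = tp / (tp + fp)     # fraction of alarms that are correct
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
    }


# Hypothetical counts for a set of 30 images (15 abnormal, 15 normal)
measures = classification_measures(tp=12, fn=3, tn=13, fp=2)
```

With unbalanced classes, sensitivity and specificity are usually read together, since a classifier can score well on one simply by favouring the majority class.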


The thesis is organized as follows. Chapter 1 introduces the methodologies used in the project for the chosen problem. Selection of the Region of Interest is described in Chapter 2, and the features extracted from the selected regions of interest are discussed in Chapter 3. Abnormality detection is carried out by a classification procedure; classification using SVD and PCA is discussed in Chapter 4. The better the measured performance, the better the classification procedure. The Performance Measures and Quality Metrics used to assess classifier performance are compared and analyzed in Chapter 5. Chapter 6 concludes the project and outlines the future scope.



A region of interest (ROI) is a portion of an image that is to be filtered or otherwise operated on. An ROI is usually defined by creating a binary mask, which is a binary image of the same size as the image to be processed, with the pixels that define the ROI set to 1 and all other pixels set to 0.
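The mask idea can be sketched in a few lines. The example below is a minimal illustration that assumes a rectangular ROI and a toy image stored as nested lists; the function names and pixel values are hypothetical and are not taken from the study.

```python
def make_rect_mask(height, width, top, left, bottom, right):
    """Binary mask of the given size: 1 inside the rectangle, 0 elsewhere.

    The rectangle covers rows top..bottom-1 and columns left..right-1.
    """
    return [[1 if top <= r < bottom and left <= c < right else 0
             for c in range(width)]
            for r in range(height)]


def apply_mask(image, mask):
    """Keep the pixels where the mask is 1 and set all others to 0."""
    return [[pixel * m for pixel, m in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]


# Toy 4x4 "image" with hypothetical intensity values
image = [[10, 20, 30, 40],
         [50, 60, 70, 80],
         [90, 100, 110, 120],
         [130, 140, 150, 160]]

# ROI = left half of the image (all rows, first two columns)
mask = make_rect_mask(4, 4, top=0, left=0, bottom=4, right=2)
roi = apply_mask(image, mask)
```

Multiplying by the mask keeps the image size unchanged, so pixel coordinates in the ROI still refer to positions in the original image.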

illustration not visible in this excerpt

Figure.2 Brain image as input


Figure.2 shows an input image of the brain used in this analysis. As a preprocessing stage, this image is divided into four blocks. Proper masks are then assigned to produce the ROI, so that only the required region of the input image is taken for analysis. Among the four blocks, the first and third blocks, which constitute the left half of the image, are taken for further analysis.

illustration not visible in this excerpt

Figure.3 Block-1 of input image

illustration not visible in this excerpt

Figure.4 Block-2 of input image

illustration not visible in this excerpt

Figure.5 Block-3 of input image

illustration not visible in this excerpt

Figure.6 Block-4 of input image

illustration not visible in this excerpt

Figure.7 (a). Different ROIs of the input image.


The 1st and 3rd blocks constitute the left half of the input image. These ROIs are selected and subjected to a further two-level block division to form 16 blocks each. This increases the efficiency of classification, because feature extraction and image analysis can be carried out at a finer level of detail. The 32 blocks thus formed are used in the feature extraction stage.
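The block division described above can be sketched as a repeated quadrant split. The code below is a minimal illustration that assumes blocks are numbered 1 (top-left), 2 (top-right), 3 (bottom-left) and 4 (bottom-right), which is consistent with blocks 1 and 3 forming the left half; the 16x16 test image and the function names are hypothetical.

```python
def split_into_quadrants(block):
    """Split a 2-D list into four blocks, ordered
    top-left, top-right, bottom-left, bottom-right."""
    mid_row = len(block) // 2
    mid_col = len(block[0]) // 2
    top, bottom = block[:mid_row], block[mid_row:]
    return [
        [row[:mid_col] for row in top],
        [row[mid_col:] for row in top],
        [row[:mid_col] for row in bottom],
        [row[mid_col:] for row in bottom],
    ]


def subdivide(block, levels):
    """Apply the quadrant split `levels` times, giving 4**levels blocks."""
    blocks = [block]
    for _ in range(levels):
        blocks = [q for b in blocks for q in split_into_quadrants(b)]
    return blocks


# Hypothetical 16x16 image whose pixel value encodes its position
image = [[r * 16 + c for c in range(16)] for r in range(16)]

quadrants = split_into_quadrants(image)
left_half = [quadrants[0], quadrants[2]]  # blocks 1 and 3
feature_blocks = [b for roi in left_half for b in subdivide(roi, levels=2)]
```

Two levels of splitting turn each ROI into 16 blocks, so the two ROIs together give the 32 blocks that go to feature extraction.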


Selecting proper regions of interest and subdividing those regions improves the overall efficiency of feature extraction and classification. The more detailed the analysis, the more efficient the classification.



In pattern recognition and in image processing, feature extraction is a special form of dimensionality reduction. When the input data to an algorithm is too large to be processed and is suspected to be highly redundant (e.g. the same measurement in both feet and meters), the input data is transformed into a reduced representation, a set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction. If the extracted features are chosen carefully, the feature set is expected to capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.

As noted earlier, feature selection aims to maximize the joint dependency with the target variable while keeping the redundancy among the selected variables to a minimum. To maximize classification accuracy, several features are used in this project: Mean, Variance, Entropy, and the 1st, 2nd, 3rd and 4th level approximation coefficients of the wavelet decomposition. A brief discussion of these features follows:


Summarizing a data-based relative frequency distribution by measures of location and spread, such as the sample mean and sample variance, is a common approach in feature selection [12,16,23,24]. Likewise, the probability distribution of a random variable X can be summarized by similar measures of location and spread, the mean and variance parameters. For the location features of a joint distribution of X and Y, we simply use the means $\mu_X$ and $\mu_Y$ of the corresponding marginal distributions. Likewise, for the spread features we use $\sigma_X^2$ and $\sigma_Y^2$. For joint distributions, however, we can go further and explore another type of feature: the manner in which X and Y are interrelated or manifest dependence.

One way that X and Y can exhibit dependence is to “vary together”, i.e., the distribution p(x, y) might attach relatively high probability to pairs (x, y) for which the deviation of x above its mean, $x - \mu_X$, and the deviation of y above its mean, $y - \mu_Y$, are either both positive or both negative and relatively large in magnitude. Thus, for example, the information that a pair (x, y) had an x with positive deviation would suggest that, unless something unusual had occurred, the y of the given pair also had a positive deviation above its mean. A natural numerical measure which takes account of this type of information is the sum of terms

$$\sum_{x}\sum_{y} (x - \mu_X)(y - \mu_Y)\, p(x, y) \qquad (3.1)$$

For the kind of dependence just described, this sum would tend to be dominated by large positive terms. Another way that X and Y could exhibit dependence is to “vary oppositely,” in which case pairs (x, y) such that one of $x - \mu_X$ and $y - \mu_Y$ is positive and the other negative would receive relatively high probability [29, 30]. In this case the sum would tend to be dominated by negative terms.
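The sign behaviour of this sum can be checked on a small example. The sketch below assumes a discrete joint distribution stored as a dictionary from (x, y) pairs to probabilities; the two toy distributions and the function name are hypothetical.

```python
def dependence_sum(joint):
    """Sum of (x - mean_x) * (y - mean_y) * p(x, y) over all pairs.

    `joint` maps (x, y) pairs to probabilities that add up to 1.
    """
    mean_x = sum(x * p for (x, _), p in joint.items())  # marginal mean of X
    mean_y = sum(y * p for (_, y), p in joint.items())  # marginal mean of Y
    return sum((x - mean_x) * (y - mean_y) * p
               for (x, y), p in joint.items())


# Most probability on pairs where x and y are both low or both high
vary_together = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}

# Most probability on pairs where one is low and the other is high
vary_oppositely = {(0, 1): 0.4, (1, 0): 0.4, (0, 0): 0.1, (1, 1): 0.1}

together = dependence_sum(vary_together)
opposite = dependence_sum(vary_oppositely)
```

Here `together` comes out positive and `opposite` negative, matching the two cases described in the text.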

Image entropy is a quantity that describes the amount of information that must be coded by a compression algorithm. Low-entropy images, such as those containing large dark areas, have very little contrast and long runs of pixels with the same or similar digital number (DN) values. Consequently, they can be compressed to a relatively small size; a perfectly flat image has an entropy of zero. High-entropy images, such as a medical image with heavily scattered regions, show a great deal of contrast from one pixel to the next and consequently cannot be compressed as much as low-entropy images. In the standard Shannon form, the entropy of an image is

$$H = -\sum_{i} p_i \log_2 p_i \qquad (3.2)$$

where $p_i$ is the probability of occurrence of the $i$-th pixel value.
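A minimal sketch of the three statistical features for one block is given below. It makes two assumptions that the excerpt does not settle: the variance divides by the number of pixels (population form), and the entropy is the Shannon entropy of the block's gray-level histogram. The function name and the two toy blocks are hypothetical.

```python
import math
from collections import Counter


def block_features(block):
    """Return (mean, variance, entropy) for a 2-D block of pixel values."""
    pixels = [p for row in block for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    # Assumption: population variance (divide by n, not n - 1)
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Assumption: entropy of the gray-level histogram, in bits
    counts = Counter(pixels)
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return mean, variance, entropy


flat_block = [[7, 7], [7, 7]]           # perfectly flat: one gray level
varied_block = [[0, 64], [128, 255]]    # four distinct gray levels

flat_features = block_features(flat_block)
varied_features = block_features(varied_block)
```

The flat block has zero variance and zero entropy, while the block with four equally frequent gray levels has an entropy of 2 bits, which agrees with the description of low-entropy and high-entropy images above.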


For many images the low-frequency content is the most important part. It is what gives the signal its identity. In wavelet analysis, we often speak of approximations and details. The approximations are the high-scale, low-frequency components of the signal. The details are the low-scale, high-frequency components. The wavelet decomposition process can be iterated, with successive approximations being decomposed in turn, so that one signal is broken down into many lower resolution components. This is called the wavelet decomposition tree.

Since the analysis process is iterative, in theory it can be continued indefinitely. In reality, the decomposition can proceed only until the individual details consist of a single sample or pixel. In practice, a suitable number of levels is selected based on the nature of the signal, or on a suitable criterion such as entropy. The figure below shows the general decomposition tree of an input signal or image into approximation coefficients and detail coefficients.

illustration not visible in this excerpt

Figure.7 (b) Wavelet decomposition tree
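The iterated decomposition can be sketched for the approximation branch alone. The example below assumes the orthonormal Haar wavelet, which the excerpt does not name, and an image whose side lengths are divisible by 2 at every level; the function names and the 16x16 test image are hypothetical. How each approximation sub-band is reduced to a single feature value is not stated in the excerpt and is not shown here.

```python
def haar_approximation(image):
    """One level of 2-D Haar decomposition, approximation sub-band only.

    Each output coefficient is the sum of a 2x2 pixel group divided by 2
    (orthonormal Haar scaling). Detail sub-bands are not computed.
    """
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1]
              + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 2
             for c in range(cols)]
            for r in range(rows)]


def approximation_levels(image, levels):
    """Approximation coefficients at levels 1..levels, each computed
    from the approximation of the previous level."""
    results, current = [], image
    for _ in range(levels):
        current = haar_approximation(current)
        results.append(current)
    return results


# Hypothetical 16x16 image with a simple diagonal pattern
image = [[(r + c) % 16 for c in range(16)] for r in range(16)]
approximations = approximation_levels(image, levels=4)
sizes = [(len(a), len(a[0])) for a in approximations]
```

Each level halves both dimensions, so a 16x16 input reaches a single coefficient at the fourth level, which is the practical stopping point mentioned in the text.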

The features extracted for a CT image of the brain are shown in Table.3.1.

Table.3.1. Features extracted from a CT image of brain

illustration not visible in this excerpt


Feature extraction involves reducing the amount of resources required to describe a large set of data accurately. When analyzing complex data, one of the major problems stems from the number of variables involved. Analysis with a large number of variables generally requires a large amount of memory and computation power, or a classification algorithm that overfits the training sample and generalizes poorly to new samples. Feature extraction is a general term for methods of constructing combinations of the variables to get around these problems while still describing the data with sufficient accuracy. In this section, the features mean, variance, entropy and four levels of approximation coefficients are explained. These seven features are computed for each block of an image, which yields a 32 x 7 feature matrix. This matrix is used to classify the input image as normal or abnormal.


Quote paper
Sunil Kumar Prabhakar (Author), Harikumar Rajaguru (Author), Vinoth Kumar Bojan (Author), 2014, Feature Extraction and Different Classifiers Applied for Detection of Abnormalities in Computer Tomography (CT) Images, Munich, GRIN Verlag, https://www.grin.com/document/287937

