The tremendous growth of data driven by the Internet and electronic commerce poses serious challenges to research in pattern recognition. This data must be processed and analysed, and advances in data mining and knowledge discovery call for new approaches to reduce it.
This work proposes methods to reduce the computational requirements of nearest neighbor classifiers and examines remedies for their known shortcomings. In particular, it introduces a new method of reducing the data set size, both in the number of samples and in the number of features. The overall goal is therefore to reduce the time and space requirements of the nearest neighbor classifier without degrading its classification performance.
The nearest neighbor classifier is a popular non-parametric classifier that has been used in many fields since the 1950s. It is conceptually simple and shows good performance.
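As a minimal illustration of the classifier under discussion, a 1-NN decision rule can be sketched in a few lines of Python. This is not the thesis code; all names and the toy data are illustrative:

```python
import math

def nearest_neighbor_classify(train, query):
    """Classify `query` with the label of its closest training pattern.

    `train` is a list of (feature_vector, label) pairs; the distance
    measure is Euclidean. Minimal 1-NN sketch, not the thesis code.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(train, key=lambda p: dist(p[0], query))[1]

# Toy two-class problem: '-' patterns lower-left, '+' patterns upper-right.
train = [((1.0, 1.0), '-'), ((1.5, 2.0), '-'),
         ((4.0, 4.0), '+'), ((4.5, 3.0), '+')]
print(nearest_neighbor_classify(train, (4.2, 3.5)))  # → '+'
```

The sketch also makes the thesis's motivation concrete: classifying one query scans the entire training set, so both time and storage grow linearly with its size.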
Table of Contents
1. Introduction
2. Literature Survey
3. OCNN (Prototype Selection)
4. Prototype Selection and Feature Extraction
5. Prototype Selection and Feature Selection
6. Summary, Conclusions and Future Work
Research Objectives and Key Topics
The primary objective of this thesis is to reduce the computational requirements of the Nearest Neighbor Classifier (NNC) by addressing the challenges of high storage and processing time through the reduction of training set cardinality and data dimensionality.
- Nearest Neighbor Classification and its variants (k-NN).
- Curse of dimensionality and its impact on classifier error rates.
- Prototype selection strategies to minimize training set size.
- Feature reduction techniques, including extraction and selection methods.
- Integration of OCNN with feature reduction to achieve dual data optimization.
Excerpt from the Book
3.1 Proposed Methodology
The proposed method, OCNN, is based on the intuitive notion of retaining the patterns that are close to the decision boundary and ignoring the others. The patterns close to the decision boundary clearly play the most important role in classifying a query pattern. A decision boundary is the hypersurface that separates the feature space into regions, with the aim that each region contains patterns exhibiting similar behaviour. The true decision boundary is, by definition, unknown; the next best alternative is a piecewise linear decision boundary, shown in Figure 3.1. The decision boundary need not be linear. Figure 3.1 shows the decision boundary for a two-class problem.
The proposed method is explained with the example training set shown in Figure 3.1, which contains two classes: positive-class samples are marked '+' and negative-class samples are marked '-'. If a query pattern '*' lies to the left of the decision boundary, it is classified as the '-' class; otherwise it is classified as the '+' class. Thus the patterns near the decision boundary play the decisive role in classification, and the remaining patterns can be removed from the training set. This is the central theme of the proposed method. In the example of Figure 3.1, the two samples from the '+' class and the two samples from the '-' class that lie near the decision boundary are regarded as typical patterns compared to the other training patterns, because they play the crucial role in the classification process. Such patterns can be found using the Other Class Nearest Neighbors (OCNN) algorithm discussed in Section 3.2.
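One way to sketch the boundary-retention idea described above: keep every pattern that is the nearest other-class neighbor of some training pattern, since such patterns lie closest to the class boundary. Note that this is an illustrative interpretation of the intuition only; the exact OCNN algorithm is the one given in Section 3.2:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_boundary_prototypes(train):
    """Keep only patterns that are the nearest other-class neighbor of
    some training pattern — a sketch of the boundary-retention notion,
    not the exact OCNN procedure of Section 3.2."""
    keep = set()
    for x, cx in train:
        # index of x's nearest neighbor among patterns of other classes
        j = min((k for k, (_, ck) in enumerate(train) if ck != cx),
                key=lambda k: dist(train[k][0], x))
        keep.add(j)
    return [train[k] for k in sorted(keep)]

train = [((0.5, 0.5), '-'), ((1.0, 1.0), '-'), ((2.0, 2.0), '-'),
         ((3.0, 3.0), '+'), ((4.0, 4.0), '+'), ((4.5, 4.5), '+')]
prototypes = select_boundary_prototypes(train)
# prototypes → [((2.0, 2.0), '-'), ((3.0, 3.0), '+')], the two boundary patterns
```

In the toy set, only the innermost '-' and '+' patterns survive, mirroring the four boundary patterns retained in the Figure 3.1 example.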
Chapter Summaries
1. Introduction: This chapter provides an overview of Nearest Neighbor Classification (NNC) and the related fields of Machine Learning, Pattern Recognition, and Data Mining, while outlining the motivation and scope of the research.
2. Literature Survey: This chapter reviews existing literature on NNC, focusing on various prototype selection, prototype generation, and feature reduction techniques.
3. OCNN (Prototype Selection): This chapter introduces the proposed Other Class Nearest Neighbor (OCNN) prototype selection algorithm and presents its experimental validation against existing methods.
4. Prototype Selection and Feature Extraction: This chapter discusses the cascading of the OCNN method with Linear Discriminant Analysis (LDA) to achieve both cardinality and dimensionality reduction.
5. Prototype Selection and Feature Selection: This chapter details the combination of the OCNN method with the Sequential Forward Selection (SFS) algorithm to optimize the feature subset for classification.
6. Summary, Conclusions and Future Work: This chapter concludes the thesis by summarizing the findings and suggesting future research directions for simultaneous data reduction.
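The Sequential Forward Selection wrapper cascaded with OCNN in Chapter 5 can be sketched as a greedy search that adds, at each step, the single feature that most improves a wrapper criterion. The sketch below uses leave-one-out 1-NN accuracy as that criterion; the function and parameter names are illustrative assumptions, not the thesis implementation:

```python
import math

def loo_accuracy(data, feats):
    """Leave-one-out 1-NN accuracy using only the features in `feats`."""
    def dist(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in feats))
    hits = 0
    for i, (x, cx) in enumerate(data):
        j = min((k for k in range(len(data)) if k != i),
                key=lambda k: dist(data[k][0], x))
        hits += data[j][1] == cx
    return hits / len(data)

def sfs(data, n_features, target_size):
    """Greedy Sequential Forward Selection: starting from the empty set,
    repeatedly add the feature that yields the best criterion value."""
    selected = []
    while len(selected) < target_size:
        best = max((f for f in range(n_features) if f not in selected),
                   key=lambda f: loo_accuracy(data, selected + [f]))
        selected.append(best)
    return selected

# Toy data: feature 0 separates the classes; feature 1 is noise.
data = [((0.0, 5.0), 'a'), ((0.1, 0.0), 'a'), ((0.2, 5.0), 'a'),
        ((5.0, 0.1), 'b'), ((5.1, 5.0), 'b'), ((5.2, 0.0), 'b')]
print(sfs(data, n_features=2, target_size=1))  # → [0]
```

Being a wrapper, SFS re-evaluates the classifier for each candidate feature, which is exactly why pairing it with a reduced prototype set (as Chapter 5 does with OCNN) keeps the search affordable.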
Keywords
Nearest Neighbor Classifier, NNC, Prototype Selection, OCNN, Feature Extraction, Feature Selection, Dimensionality Reduction, Machine Learning, Data Mining, Pattern Recognition, LDA, SFS, Classification Accuracy, Pattern Synthesis, Computational Complexity
Frequently Asked Questions
What is the core focus of this research?
The research focuses on overcoming the high computational requirements of the Nearest Neighbor Classifier, specifically addressing the storage and time overheads associated with large training datasets.
What are the primary challenges of the Nearest Neighbor Classifier?
The primary challenges include high storage requirements, sensitivity to noise, and high computational time, especially when dealing with large datasets or high-dimensional features.
What is the main objective of the proposed OCNN method?
The objective of the Other Class Nearest Neighbor (OCNN) method is to perform prototype selection by retaining only the patterns that are close to the decision boundary, thus reducing the training set size without significant loss of accuracy.
What scientific methods are utilized in this thesis?
The work employs prototype selection (OCNN), feature extraction (LDA), and feature selection (SFS) techniques, evaluating them across various benchmark datasets from the UCI Machine Learning Repository.
How is the computational complexity improved?
Computational complexity is improved by reducing both the cardinality of the training set through OCNN and the dimensionality of the feature space through LDA or SFS, thereby decreasing the time required for classification.
What distinguishes OCNN from other algorithms like CNN?
Unlike the Condensed Nearest Neighbor (CNN) rule, which is order-dependent and requires multiple passes over the dataset, OCNN is designed to be order-independent and more efficient in terms of processing time.
What is the recommendation for combining these techniques?
The thesis concludes that applying feature extraction methods prior to prototype selection generally yields better results than the reverse order.
How does LDA contribute to the research findings?
Linear Discriminant Analysis (LDA) is used as a dimensionality reduction technique to project high-dimensional data into a lower-dimensional subspace while maximizing class separability.
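For the two-class case, the LDA projection direction has the closed form w = S_w^{-1}(m1 - m0), where S_w is the within-class scatter matrix and m0, m1 are the class means. The following pure-Python sketch (illustrative only, not the thesis implementation) computes it for 2-D patterns, where the 2x2 inverse can be written out directly:

```python
def fisher_direction(class0, class1):
    """Two-class Fisher LDA direction w = Sw^{-1}(m1 - m0) for 2-D
    patterns. Minimal sketch under illustrative naming, not thesis code."""
    def mean(pts):
        n = len(pts)
        return [sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n]
    m0, m1 = mean(class0), mean(class1)
    # within-class scatter Sw: sum over both classes of (x - m)(x - m)^T
    s = [[0.0, 0.0], [0.0, 0.0]]
    for pts, m in ((class0, m0), (class1, m1)):
        for x in pts:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    # solve Sw * w = (m1 - m0) via the explicit 2x2 inverse
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    b = [m1[0] - m0[0], m1[1] - m0[1]]
    return [(s[1][1] * b[0] - s[0][1] * b[1]) / det,
            (s[0][0] * b[1] - s[1][0] * b[0]) / det]

class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 5.0)]
w = fisher_direction(class0, class1)
```

Projecting each pattern onto w (the dot product w·x) maps the 2-D data to a single dimension in which the two classes remain well separated, which is the dimensionality-reduction role LDA plays in Chapter 4.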
- Cite this work
- Raja Kumar (Author), 2018, Reducing the Computational Requirements of the Nearest Neighbor Classifier, Munich, GRIN Verlag, https://www.grin.com/document/495106