In this project, we tackle three parts using the Python programming language and JupyterLab. The first part focuses on programming KNNs (K-nearest neighbors) and NB (Naive Bayes) from scratch. We then compare the results obtained by the two algorithms in a final evaluation to determine which one performs best.
In the second part, we use the sklearn library to compare the two algorithms on a larger dataset, specifically in four different settings: the influence of a reduced training set, the influence of a large training set, the absence of a teacher, and an unknown distribution.
In the third part, we compare the same algorithms for image classification on 10 different classes, using feature descriptors.
Table of Contents
Introduction
1. Programming of a discrimination method: KNNs (K-nearest neighbors) and Naïve Bayes (NB)
1.1 Data
1.2 Implementation of KNNs Algorithm
1.3 Implementation of NBs Algorithm
1.4 Experiment and Results
2. Comparison of the two methods (parametric and nonparametric)
2.1 Influence of the size of the training set: the case of reduced size
2.2 Influence of the size of the training set: the case of large size
2.3 In the case of the absence of a teacher
2.4 Distribution unknown
3. Approach based on Descriptors
3.1 Calculation of descriptors
3.2 Implementation of a classification system
Objectives and Topics
This report aims to evaluate and compare the performance of K-Nearest Neighbors (KNNs) and Naïve Bayes (NB) algorithms through various classification tasks using Python and the sklearn library.
- Implementation of KNNs and Naïve Bayes algorithms from scratch.
- Comparative analysis of parametric and non-parametric classification models.
- Evaluation of the influence of training set size and data distribution on algorithm robustness.
- Application of feature descriptors for real-world image classification.
- Performance assessment using confusion matrices and accuracy measures.
Excerpt from the Book
1.2 Implementation of KNNs Algorithm
For the realization of the KNNs algorithm, we start with X data points whose classes we know with certainty (the ground truth). On the other hand, we have N points whose classes we would like to predict. The first important step is to iterate over the N points, computing the distance between each of them and all the labeled points. In our case, we use the Euclidean distance, i.e., the square root of the sum of the squared differences between the coordinates.
After computing the distances, we retrieve the indices of the smallest ones: we sort the distances into a table while keeping track of their original indices. We then keep the first K entries of this table and assign the most frequent class among these K points. The process is summarized in figure (2).
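The steps above can be sketched as a minimal Python illustration (this is a sketch, not the report's actual code; the function and array names are assumptions):

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by the majority class among its k nearest labeled points."""
    predictions = []
    for q in X_query:
        # Euclidean distance from q to every labeled point:
        # square root of the sum of squared coordinate differences
        distances = np.sqrt(((X_train - q) ** 2).sum(axis=1))
        # Sort by distance, keeping the original indices, and take the first k
        nearest = np.argsort(distances)[:k]
        # Most frequent class among the k neighbors
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Toy example: two well-separated clusters
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.2, 0.1], [4.8, 5.2]]), k=3))
# → [0 1]
```

Sorting all distances with `argsort` and slicing the first k mirrors the "sort into a table, keep the first K values" description; a partial sort would be faster on large datasets but is not needed for illustration.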
Summary of Chapters
Introduction: Provides an overview of the project structure, including the implementation of classification algorithms and the testing of different data settings.
1. Programming of a discrimination method: KNNs (K-nearest neighbors) and Naïve Bayes (NB): Details the initial programming of both algorithms from scratch and the preliminary experimental results on data distribution.
2. Comparison of the two methods (parametric and nonparametric): Compares KNNs and NBs using the sklearn library across different scenarios, such as varying training set sizes and teacher-less environments.
3. Approach based on Descriptors: Describes the application of five unique feature descriptors for image classification tasks using the Wang database.
Keywords
K-Nearest Neighbors, Naïve Bayes, Machine Learning, Classification, Python, Sklearn, Image Classification, Feature Descriptors, Confusion Matrix, Accuracy, Training Set, Parametric Models, Non-parametric Models, Data Analysis, Algorithm Performance.
Frequently Asked Questions
What is the primary focus of this report?
The report focuses on understanding, implementing, and comparing K-Nearest Neighbors and Naïve Bayes algorithms through practical programming tasks.
Which programming language and tools are utilized?
The project utilizes Python and JupyterLab, along with the sklearn library for implementing and testing the machine learning models.
What is the main objective of the research?
The primary objective is to evaluate which classification algorithm performs better under various conditions, including different dataset sizes and distributions.
Which scientific method is applied for evaluation?
The report uses quantitative metrics such as error rates, accuracy, and confusion matrices to compare the performance of the chosen algorithms.
What topics are covered in the main section?
The main sections cover the manual implementation of algorithms, comparative analysis of training sets, and an approach to image classification using descriptors.
Which keywords best characterize this work?
Key terms include K-Nearest Neighbors, Naïve Bayes, Classification, Machine Learning, and Feature Descriptors.
Why is the choice of 'k' important for the KNNs algorithm?
The choice of 'k' determines how many nearest neighbors are considered for classification; the report identifies that there is an optimal range for 'k' to minimize error and maximize accuracy.
How does the size of the training set affect the algorithms?
The findings suggest that KNNs perform better with larger training bases, while Naïve Bayes shows more robustness in scenarios with smaller datasets or when labels are absent.
- Cite this work
- Marwan Al Omari (Author), 2021, Understanding of Algorithms. KNNs and Naive Bayes, München, GRIN Verlag, https://www.grin.com/document/1215098