This work addresses how loss of information in data mining can be prevented by imputing missing values in mixed-attribute datasets.
Missing value imputation is a procedure that replaces missing values with plausible values. It aims to provide estimates for missing values by reasoning from observed data, because missing values can introduce bias that affects the quality of learned patterns and the performance of classification. Missing data imputation methods are usually based only on complete instances, that is, instances without missing values, when estimating plausible values for the missing values in a dataset. However, the information within incomplete instances can also play an important role in missing value imputation.
Various techniques have been developed to deal with missing values in datasets with homogeneous attributes, that is, attributes that are either all continuous or all discrete. These algorithms cannot be applied to real datasets such as equipment maintenance datasets, industrial datasets and gene datasets, because these datasets contain both discrete and continuous attributes. To overcome this shortcoming, the imputation in this work is designed to handle both continuous and discrete data. Two consistent estimators are developed, one for discrete and one for continuous missing target values. A spherical kernel based iterative estimator, using a spherical kernel with an RBF kernel and a spherical kernel with a polynomial kernel, is then advocated to impute mixed-attribute datasets, thereby improving the interpolation and extrapolation abilities.
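To make the idea of kernel-based imputation on mixed attributes concrete, the following is a minimal sketch, not the estimator developed in this work. It assumes a product of an RBF kernel on the continuous attributes and a simple match/mismatch kernel on the discrete attributes; the function names and the parameters `gamma` and `lam` are hypothetical and chosen only for illustration. A continuous target is imputed as a kernel-weighted mean, a discrete target as a kernel-weighted vote.

```python
import math
from collections import defaultdict

def rbf(u, v, gamma=0.5):
    """RBF kernel on two equal-length lists of continuous values."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def discrete_kernel(u, v, lam=0.2):
    """Product kernel on discrete values: 1 for a match, lam for a mismatch."""
    weight = 1.0
    for a, b in zip(u, v):
        weight *= 1.0 if a == b else lam
    return weight

def mixed_weight(row, query, cont_idx, disc_idx):
    """Product of the continuous and discrete kernels (an assumed combination)."""
    kc = rbf([row[i] for i in cont_idx], [query[i] for i in cont_idx])
    kd = discrete_kernel([row[i] for i in disc_idx], [query[i] for i in disc_idx])
    return kc * kd

def impute_continuous(rows, targets, query, cont_idx, disc_idx):
    """Kernel-weighted mean of the observed continuous targets."""
    weights = [mixed_weight(r, query, cont_idx, disc_idx) for r in rows]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to the plain mean.
        return sum(targets) / len(targets)
    return sum(w * t for w, t in zip(weights, targets)) / total

def impute_discrete(rows, targets, query, cont_idx, disc_idx):
    """Kernel-weighted vote over the observed discrete targets."""
    votes = defaultdict(float)
    for r, t in zip(rows, targets):
        votes[t] += mixed_weight(r, query, cont_idx, disc_idx)
    return max(votes, key=votes.get)

# Toy data: column 0 is continuous, column 1 is discrete.
rows = [[1.0, "a"], [1.1, "a"], [5.0, "b"]]
query = [1.05, "a"]
estimate = impute_continuous(rows, [10.0, 12.0, 50.0], query, [0], [1])
label = impute_discrete(rows, ["x", "x", "y"], query, [0], [1])
```

In this toy example the two nearby rows dominate the weights, so the distant row with a mismatching discrete value contributes almost nothing to either estimate.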
The performance of this technique is compared against imputation with K-NN, a frequency estimator, an RBF kernel, a polynomial kernel and a mixed kernel, and is evaluated in terms of root mean square error (RMSE) and correlation coefficient. In these datasets, the missing values are imputed using higher-order kernel functions and the performance is evaluated.
The experimental results show that the spherical kernel with RBF kernel and the spherical kernel with polynomial kernel impute missing values better than the other techniques.
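The two evaluation measures can be sketched as follows. RMSE is computed between the true values and the imputed values. The text does not say which correlation coefficient is used, so the Pearson coefficient is assumed here; the function names are illustrative.

```python
import math

def rmse(true_vals, imputed_vals):
    """Root mean square error between true and imputed values."""
    n = len(true_vals)
    sq_err = sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals))
    return math.sqrt(sq_err / n)

def pearson(x, y):
    """Pearson correlation coefficient (assumed choice of correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

true_vals = [1.0, 2.0, 3.0]
imputed_vals = [1.0, 2.0, 5.0]
error = rmse(true_vals, imputed_vals)        # sqrt(4 / 3)
corr = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A lower RMSE and a correlation closer to 1 both indicate imputed values that track the true values more closely.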
Table of Contents
- CHAPTER 1: INTRODUCTION
- 1.1 Objective of the work
- 1.2 Introduction to data mining
- 1.3 Missing values
- 1.4 Missing value imputation
- 1.5 Model flow diagram
- 1.6 Organization of the report
- 1.7 Summary
- CHAPTER 2: LITERATURE REVIEW
- 2.1 Introduction
- 2.2 Literature review
- 2.2.1 Missing values
- 2.2.2 Missing value imputation
- 2.2.3 Kernel functions
- 2.3 Summary
- CHAPTER 3: DATASET DESCRIPTION
- 3.1 Introduction
- 3.2 Data set description
- 3.3 Summary
- CHAPTER 4: IMPUTATION TECHNIQUES
- 4.1 Introduction
- 4.2 K-Nearest neighbor imputation method
- 4.3 Experimental results for imputation done using K-NN
- 4.4 Frequency Estimation Method
- 4.5 Experimental results for frequency estimator
- 4.6 Kernel Functions
- 4.7 Imputation using RBF kernel
- 4.8 Experimental results for RBF kernel
- 4.9 Imputation using poly kernel
- 4.10 Experimental results for poly kernel
- 4.11 Summary
- CHAPTER 5: IMPUTATION USING MIXTURE OF KERNELS
- 5.1 Introduction
- 5.2 Interpolation and Extrapolation
- 5.3 Mixture of kernels
- 5.4 Experimental results for mixture of kernels
- 5.5 Imputation using spherical kernel with RBF kernel
- 5.6 Experimental results for imputation using spherical kernel and RBF kernel
- 5.7 Imputation using spherical kernel and poly kernel
- 5.8 Experimental results for spherical kernel and poly kernel
- 5.9 Summary
- CHAPTER 6: RESULTS AND DISCUSSION
- 6.1 Introduction
- 6.2 Performance evaluation
- 6.3 Experimental results and discussion
- 6.4 Discussion of results
- 6.5 Summary
Objectives and Key Themes
This work aims to utilize an estimator to impute missing values in mixed attribute datasets. This approach utilizes information present in both complete and incomplete instances, preventing information loss often incurred during data conversion. The effectiveness of this method is evaluated through extensive experiments and compared to traditional algorithms using metrics such as root mean square error and correlation coefficients.
- Missing value imputation in mixed attribute datasets.
- Preserving information during data imputation.
- Performance evaluation of imputation methods.
- Comparison to existing algorithms.
- Data mining concepts and techniques.
Chapter Summaries
- Chapter 1: Introduction: Introduces the objective of the work, which is to develop and evaluate an estimator for imputing missing values in mixed attribute datasets. This chapter also provides a brief overview of data mining concepts, missing values, and missing value imputation.
- Chapter 2: Literature Review: Reviews existing literature on missing values, missing value imputation techniques, and kernel functions. This chapter provides the theoretical foundation for the proposed imputation method.
- Chapter 3: Dataset Description: Presents the dataset used in the research, providing details about its characteristics and any relevant information about missing values.
- Chapter 4: Imputation Techniques: Discusses various imputation techniques, including K-Nearest Neighbor, frequency estimation, and kernel-based methods. It also presents experimental results for each technique.
- Chapter 5: Imputation Using Mixture of Kernels: Explores the use of a mixture of kernels for imputation. It explains the concept of interpolation and extrapolation, as well as the rationale for using a combination of kernels. Experimental results for different kernel combinations are presented.
- Chapter 6: Results and Discussion: Analyzes the results of the experiments conducted, comparing the performance of different imputation methods. It discusses the effectiveness of the proposed method and provides insights into the findings.
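The mixture of kernels mentioned in the Chapter 5 summary can be illustrated with a small sketch. It assumes one common way to form a mixture, a convex combination of a polynomial kernel and an RBF kernel with a mixing weight `rho`; the exact form used in the work is not given here, and all names and default parameters below are hypothetical. The intuition usually given is that the RBF kernel is local, which helps interpolation, while the polynomial kernel is global, which helps extrapolation.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Local kernel: decays quickly as x and y move apart."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, y, degree=2, coef0=1.0):
    """Global kernel: stays non-zero for distant points."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

def mixed_kernel(x, y, rho=0.5, gamma=0.5, degree=2, coef0=1.0):
    """Convex combination of the two kernels (assumed form, 0 <= rho <= 1)."""
    return (rho * poly_kernel(x, y, degree, coef0)
            + (1.0 - rho) * rbf_kernel(x, y, gamma))

near = mixed_kernel([1.0, 0.0], [1.0, 0.0])   # both kernels contribute
far = mixed_kernel([0.0, 0.0], [10.0, 0.0])   # RBF part is almost zero
```

With `rho = 0` the mixture reduces to the RBF kernel and with `rho = 1` to the polynomial kernel, so `rho` controls the balance between local and global behaviour.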
Keywords
Missing value imputation, mixed attribute datasets, kernel functions, data mining, performance evaluation, root mean square error, correlation coefficients, experimental results.
- Cite this text
- Aasha Ajith (Author), 2012, How Can a Loss of Information in Mixed Attribute Datasets be Prevented?, Munich, GRIN Verlag, https://www.grin.com/document/457847