How Can a Loss of Information in Mixed Attribute Datasets be Prevented?

On the Imputation of Missing Values in Mixed Attribute Datasets Using Higher Order Kernel Functions

Master's Thesis, 2012, 43 pages, Grade: 1.00

Author: Aasha Ajith

Computer Science - General

This work addresses how loss of information in data mining can be prevented by imputing missing values in mixed-attribute datasets.

Missing value imputation is a procedure that replaces missing values with plausible values. Conventional imputation methods rely only on complete instances, that is, instances without missing values, when estimating plausible values for the missing entries. However, the information within incomplete instances can also play an important role in missing value imputation. Imputation aims to provide estimates for missing values by reasoning from observed data, because missing values can introduce bias that degrades the quality of learned patterns and the performance of classifiers.

Various techniques have been developed to deal with missing values in datasets with homogeneous attributes, in which the independent attributes are either all continuous or all discrete. These algorithms cannot be applied to many real datasets, such as equipment maintenance, industrial and gene datasets, because those contain both discrete and continuous attributes. To overcome these shortcomings, this work imputes both continuous and discrete data. Two consistent estimators are developed, one for discrete and one for continuous missing target values. A spherical kernel based iterative estimator, using the spherical kernel combined with the RBF kernel and the spherical kernel combined with the polynomial kernel, is then advocated for imputing mixed-attribute datasets, with the aim of improving interpolation and extrapolation abilities.

The performance of this technique is compared against imputation with K-NN, the frequency estimator, the RBF kernel, the polynomial kernel and a mixed kernel, and is evaluated in terms of root mean square error (RMSE) and correlation coefficient. On each dataset, the missing values are imputed using higher order kernel functions and the resulting performance is evaluated.

The experimental results show that the spherical kernel with RBF and the spherical kernel with polynomial kernel impute missing values better than the other techniques.
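The kernel-based estimators described in the summary above can be illustrated with a minimal sketch. It is not the thesis's estimator: it is a single-pass (non-iterative) kernel-weighted average for a continuous target and a kernel-weighted vote for a discrete target. The product kernel, the bandwidth `h`, the mismatch weight `lam` and all function names are assumptions introduced for illustration.

```python
import math

def mixed_kernel(x, z, is_discrete, h=1.0, lam=0.2):
    """Product kernel over mixed attributes (illustrative assumption)."""
    w = 1.0
    for a, b, disc in zip(x, z, is_discrete):
        if disc:
            # Discrete attribute: full weight on a match, damped weight otherwise.
            w *= 1.0 if a == b else lam
        else:
            # Continuous attribute: Gaussian (RBF) weight with bandwidth h.
            w *= math.exp(-((a - b) ** 2) / (2.0 * h * h))
    return w

def impute_continuous(query, rows, targets, is_discrete, h=1.0, lam=0.2):
    """Kernel-weighted average of the observed continuous targets."""
    weights = [mixed_kernel(query, r, is_discrete, h, lam) for r in rows]
    total = sum(weights)
    if total == 0.0:
        return sum(targets) / len(targets)  # fall back to the plain mean
    return sum(w * t for w, t in zip(weights, targets)) / total

def impute_discrete(query, rows, targets, is_discrete, h=1.0, lam=0.2):
    """Kernel-weighted vote: the class with the largest total weight wins."""
    votes = {}
    for r, t in zip(rows, targets):
        votes[t] = votes.get(t, 0.0) + mixed_kernel(query, r, is_discrete, h, lam)
    return max(votes, key=votes.get)

# Toy data (made up): one continuous and one discrete independent attribute.
rows = [(1.0, "a"), (1.2, "a"), (5.0, "b"), (5.1, "b")]
is_discrete = (False, True)
cont_targets = [10.0, 11.0, 50.0, 52.0]
disc_targets = ["low", "low", "high", "high"]

estimate = impute_continuous((1.1, "a"), rows, cont_targets, is_discrete)
label = impute_discrete((1.1, "a"), rows, disc_targets, is_discrete)
```

Because both attribute types enter the weight directly, neither has to be converted into the other type before imputation, which is the source of information loss the work sets out to avoid.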

Excerpt


Table of Contents

CHAPTER 1: INTRODUCTION

1.1 Objective of the work

1.2 Introduction to data mining

1.3 Missing values

1.4 Missing value imputation

1.5 Model flow diagram

1.6 Organization of the report

1.7 Summary

CHAPTER 2: LITERATURE REVIEW

2.1 Introduction

2.2 Literature review

2.2.1 Missing values

2.2.2 Missing value imputation

2.2.3 Kernel functions

2.3 Summary

CHAPTER 3: DATASET DESCRIPTION

3.1 Introduction

3.2 Data set description

3.3 Summary

CHAPTER 4: IMPUTATION TECHNIQUES

4.1 Introduction

4.2 K-Nearest neighbor imputation method

4.3 Experimental results for imputation done using K-NN

4.4 Frequency Estimation Method

4.5 Experimental results for frequency estimator

4.6 Kernel Functions

4.7 Imputation using RBF kernel

4.8 Experimental results for rbf kernel

4.9 Imputation using poly kernel

4.10 Experimental results for poly kernel

4.11 Summary

CHAPTER 5: IMPUTATION USING MIXTURE OF KERNELS

5.1 Introduction

5.2 Interpolation and Extrapolation

5.3 Mixture of kernels

5.4 Experimental results for mixture of kernels

5.5 Imputation using spherical kernel with rbf kernel

5.6 Experimental results for imputation using spherical kernel and rbf kernel

5.7 Imputation using spherical kernel and poly kernel

5.8 Experimental results for spherical kernel and poly kernel

5.9 Summary

CHAPTER 6: RESULTS AND DISCUSSION

6.1 Introduction

6.2 Performance evaluation

6.3 Experimental results and discussion

6.4 Discussion of results

6.5 Summary

CHAPTER 7: CONCLUSION AND FUTURE WORK

7.1 Conclusion

7.2 Future work

Research Objectives and Focus Areas

The primary objective of this study is to develop and evaluate a mixture kernel-based iterative nonparametric estimator for imputing missing values in mixed-attribute datasets. By leveraging both complete and incomplete instances, the proposed approach aims to mitigate the information loss typically associated with converting continuous values to discrete ones during imputation. The performance is rigorously assessed using Root Mean Square Error (RMSE) and correlation coefficients across multiple standard datasets.

  • Imputation of missing values in mixed-attribute datasets
  • Application of kernel functions (RBF, Polynomial, Spherical) for data imputation
  • Development of mixture kernel strategies to improve interpolation and extrapolation
  • Performance benchmarking against K-Nearest Neighbor and frequency-based methods
  • Optimization of kernel parameters using grid search methods
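The listing names grid search for kernel parameters but gives no details, so the following is only a sketch of the general idea under stated assumptions: a one-dimensional RBF-weighted average, leave-one-out RMSE as the selection criterion, and a hand-picked candidate grid. All names and values are hypothetical.

```python
import math

def rbf_predict(x, xs, ys, h):
    """Kernel-weighted average of ys using a Gaussian kernel of bandwidth h."""
    ws = [math.exp(-((x - xi) ** 2) / (2.0 * h * h)) for xi in xs]
    total = sum(ws)
    if total == 0.0:
        return sum(ys) / len(ys)  # all weights underflowed: use the mean
    return sum(w * y for w, y in zip(ws, ys)) / total

def loo_rmse(xs, ys, h):
    """Leave-one-out RMSE: predict each point from all the others."""
    sq = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        sq += (rbf_predict(xs[i], rest_x, rest_y, h) - ys[i]) ** 2
    return math.sqrt(sq / len(xs))

def grid_search(xs, ys, grid):
    """Return (best_h, best_score) over the candidate bandwidths in grid."""
    best_score, best_h = min((loo_rmse(xs, ys, h), h) for h in grid)
    return best_h, best_score

# Toy data (made up) and a hypothetical candidate grid.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
grid = [0.1, 0.5, 1.0, 5.0]
best_h, best_score = grid_search(xs, ys, grid)
```

The same loop extends to several parameters, for example a polynomial degree or a mixing weight, by iterating over the Cartesian product of the candidate lists.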

Excerpt from the Book

1.1 Objective of the work

The main objective of this work is to use an estimator for imputing missing values in mixed-attribute datasets that utilises the information present in incomplete instances in addition to the complete instances. This approach prevents the loss of information that occurs when continuous values are converted into discrete values, or vice versa, for imputation.

The method is evaluated with extensive experiments and compared with some typical algorithms; performance is measured in terms of root mean square error and correlation coefficients.

This chapter begins with a brief introduction to data mining concepts, missing values and missing value imputation, and concludes with the organization of the report.

Summary of Chapters

CHAPTER 1: INTRODUCTION: Provides an overview of data mining, the challenge of missing values in mixed-attribute datasets, and outlines the objectives of the research.

CHAPTER 2: LITERATURE REVIEW: Examines previous studies regarding missing value imputation, data mining techniques, and the mathematical foundations of kernel functions.

CHAPTER 3: DATASET DESCRIPTION: Details the five publicly available datasets used for experimental validation and explains the procedure for simulating missing values.
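The excerpt does not describe how missing values are simulated. One common approach, shown here purely as an assumption, is to hide a fixed share of one column's values at random and keep the originals for later scoring. The function name, the `rate` parameter and the toy rows are hypothetical.

```python
import random

def mask_values(rows, column, rate, seed=0):
    """Return (masked_rows, truth) with a share `rate` of `column` set to None."""
    rng = random.Random(seed)               # fixed seed for repeatable runs
    n_missing = round(rate * len(rows))
    hidden = set(rng.sample(range(len(rows)), n_missing))
    masked, truth = [], {}
    for i, row in enumerate(rows):
        row = list(row)                     # copy so the input stays intact
        if i in hidden:
            truth[i] = row[column]          # remember the original value
            row[column] = None
        masked.append(row)
    return masked, truth

# Toy mixed-attribute rows (made up): continuous, discrete, continuous target.
rows = [
    [1.0, "a", 10.0],
    [1.2, "a", 11.0],
    [5.0, "b", 50.0],
    [5.1, "b", 52.0],
    [3.0, "a", 30.0],
]
masked, truth = mask_values(rows, column=2, rate=0.4)
```

Keeping `truth` separate makes it possible to compare imputed values with the hidden originals afterwards.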

CHAPTER 4: IMPUTATION TECHNIQUES: Discusses standard techniques like K-Nearest Neighbor and individual kernel methods (RBF, Polynomial) for filling in missing data.

CHAPTER 5: IMPUTATION USING MIXTURE OF KERNELS: Introduces hybrid kernel strategies, including the combination of spherical, RBF, and polynomial kernels to enhance model performance.

CHAPTER 6: RESULTS AND DISCUSSION: Compares the experimental performance of all proposed methods based on RMSE and correlation coefficients across the selected datasets.

CHAPTER 7: CONCLUSION AND FUTURE WORK: Summarizes the effectiveness of the proposed mixture kernel estimators and suggests future research directions in advanced kernel modeling.

Keywords

Data Mining, Missing Value Imputation, Mixed-Attribute Datasets, Kernel Functions, RBF Kernel, Polynomial Kernel, Spherical Kernel, Mixture of Kernels, Root Mean Square Error, Correlation Coefficient, K-Nearest Neighbor, Nonparametric Estimator, Predictive Modeling, Interpolation, Extrapolation

Frequently Asked Questions

What is the core focus of this research?

This research focuses on the challenge of imputing missing values specifically in mixed-attribute datasets, where independent attributes consist of both continuous and discrete types.

What are the primary methods explored?

The study investigates various techniques including K-Nearest Neighbors, frequency estimation, and several kernel-based methods such as RBF, polynomial, and spherical kernels.
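As a point of reference for the baseline named above, here is a minimal K-nearest-neighbor imputation sketch for a continuous target. The distance used (squared difference on continuous attributes, 0/1 mismatch on discrete ones) and the function names are assumptions; the excerpt does not state which distance the thesis uses.

```python
import math

def mixed_distance(x, z, is_discrete):
    """Distance over mixed attributes (assumed form)."""
    total = 0.0
    for a, b, disc in zip(x, z, is_discrete):
        if disc:
            total += 0.0 if a == b else 1.0   # mismatch counts as 1
        else:
            total += (a - b) ** 2             # continuous values assumed pre-scaled
    return math.sqrt(total)

def knn_impute(query, rows, targets, is_discrete, k=3):
    """Mean target value of the k rows closest to the query."""
    order = sorted(
        range(len(rows)),
        key=lambda i: mixed_distance(query, rows[i], is_discrete),
    )
    nearest = [targets[i] for i in order[:k]]
    return sum(nearest) / len(nearest)

# Toy data (made up).
knn_rows = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.9, "b"), (1.0, "b")]
knn_targets = [1.0, 2.0, 3.0, 9.0, 10.0]
knn_value = knn_impute((0.15, "a"), knn_rows, knn_targets, (False, True), k=2)
```

For a discrete target, the mean in the last step would be replaced by a majority vote among the k neighbors.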

What is the main objective of the proposed imputation method?

The objective is to minimize information loss by utilizing information from both complete and incomplete data instances, thereby creating a more accurate estimator for missing values.

How is the performance of the imputation techniques evaluated?

Performance is quantitatively assessed by measuring the Root Mean Square Error (RMSE) and the correlation coefficient between the original and imputed values.
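The two measures can be computed as follows. This is a plain sketch of the standard definitions of RMSE and the Pearson correlation coefficient; the function names and sample values are illustrative and not taken from the thesis.

```python
import math

def rmse(actual, imputed):
    """Root mean square error between original and imputed values."""
    n = len(actual)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, imputed)) / n)

def pearson(actual, imputed):
    """Pearson correlation coefficient between original and imputed values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_b = sum(imputed) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(actual, imputed))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_b = sum((b - mean_b) ** 2 for b in imputed)
    return cov / math.sqrt(var_a * var_b)

# Made-up values standing in for hidden originals and their imputations.
actual = [1.0, 2.0, 3.0, 4.0]
imputed = [1.1, 1.9, 3.2, 3.8]
error = rmse(actual, imputed)
corr = pearson(actual, imputed)
```

A lower RMSE and a correlation coefficient closer to 1 both indicate imputed values that track the originals more closely.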

What does the main part of the report cover?

The main part of the report covers the theoretical foundations, detailed descriptions of the datasets used, the implementation of various imputation techniques, and extensive experimental results.

Which keywords best describe this study?

Key terms include Data Mining, Missing Value Imputation, Kernel Functions, Mixed-Attribute Datasets, and Predictive Performance Evaluation.

Why are mixture kernels used instead of single kernels?

Mixture kernels are utilized because they combine the strengths of different kernels—specifically the interpolation ability of local kernels like RBF and the extrapolation capability of global kernels like polynomial.
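A mixture of this kind is often written as a convex combination of the two kernels. The sketch below assumes that form; the mixing weight `rho`, the parameter defaults and the function names are assumptions, since the excerpt does not give the formula used in the thesis.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Local kernel: decays quickly as the points move apart."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, z, degree=2, coef0=1.0):
    """Global kernel: distant points can still have a large value."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree

def mixture_kernel(x, z, rho=0.5, gamma=1.0, degree=2):
    """Assumed convex combination: rho * polynomial + (1 - rho) * RBF."""
    return rho * poly_kernel(x, z, degree) + (1.0 - rho) * rbf_kernel(x, z, gamma)

# Made-up points: one close to x, one far away.
x = (1.0, 0.0)
near = (1.0, 0.1)
far = (3.0, 0.0)
k_near = mixture_kernel(x, near)
k_far = mixture_kernel(x, far)
```

With `rho` between 0 and 1, the result is a weighted sum of two valid kernels and is therefore itself a valid kernel; `rho` controls how much global (polynomial) influence is blended into the local (RBF) behavior.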

How does the spherical kernel contribute to the results?

The spherical kernel, classified as a higher-order kernel, is shown to provide superior imputation accuracy, yielding lower RMSE values and higher correlation coefficients when mixed with other kernels.
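The excerpt does not state the spherical kernel's formula. One definition commonly listed under this name is 1 - 1.5 (d/θ) + 0.5 (d/θ)³ for distances d below a range θ, and 0 otherwise. The sketch below assumes that form and, as a further assumption, mixes it with the RBF kernel by a convex combination. All names and parameter defaults are hypothetical.

```python
import math

def euclidean(x, z):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))

def spherical_kernel(x, z, theta=1.0):
    """Spherical kernel with range theta (assumed form); zero at or beyond theta."""
    r = euclidean(x, z) / theta
    if r >= 1.0:
        return 0.0
    return 1.0 - 1.5 * r + 0.5 * r ** 3

def spherical_rbf_mix(x, z, rho=0.5, theta=1.0, gamma=1.0):
    """Assumed convex combination of the spherical and RBF kernels."""
    rbf = math.exp(-gamma * euclidean(x, z) ** 2)
    return rho * spherical_kernel(x, z, theta) + (1.0 - rho) * rbf

origin = (0.0, 0.0)
half_range = spherical_kernel(origin, (0.5, 0.0))   # distance is half of theta
out_of_range = spherical_kernel(origin, (2.0, 0.0)) # distance exceeds theta
mixed_at_zero = spherical_rbf_mix(origin, origin)
```

Unlike the RBF kernel, this form has compact support: points farther apart than `theta` contribute exactly nothing to the estimate.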

Details

Title: How Can a Loss of Information in Mixed Attribute Datasets be Prevented?
Subtitle: On the Imputation of Missing Values in Mixed Attribute Datasets Using Higher Order Kernel Functions
University: Avinashilingam University
Grade: 1.00
Author: Aasha Ajith
Year of publication: 2012
Pages: 43
Catalog number: V457847
ISBN (eBook): 9783668905337
ISBN (book): 9783668905344
Language: English
Keywords: loss information mixed attribute datasets prevented imputation missing values using higher order kernel functions
Product safety: GRIN Publishing GmbH
Cite this work: Aasha Ajith, 2012, How Can a Loss of Information in Mixed Attribute Datasets be Prevented?, Munich, GRIN Verlag, https://www.grin.com/document/457847