Data mining is one of the steps in discovering insights from large amounts of data, which may be stored in databases, data warehouses, or other information repositories. It now plays a significant role in providing the decision support that modern businesses rely on to increase profits. Various researchers have studied the benefits of data mining processes and their adoption by business organizations, but very few have discussed the success factors of decision support projects.

The research hypothesis is that a decision tree can improve classification accuracy by emphasizing the impact factor, or importance, of the attributes rather than information gain alone. Considering the impact factor, and not just accuracy, can be used to develop a new algorithm whose performance improves on that of existing algorithms. We propose a new algorithm that improves accuracy and contributes effectively to decision tree learning. The algorithm resolves the problem of class conflict: we introduce the impact factor and the classified impact factor to resolve conflict situations. We use a data mining technique to facilitate decision support with improved performance over existing counterparts. We also address a problem that has not been addressed before. The fusion of data mining and decision support can contribute to problem-solving by combining the vast knowledge hidden in data with knowledge received from experts.

We discuss a large body of work in the field of decision support and hierarchical multi-attribute decision models. Many algorithms are available for classifying the data in datasets, and most of them use information gain for classification. Several gaps remain. There is a need for an ideal algorithm for large datasets, for handling missing values, for removing the attribute bias toward choosing a random class when a conflict occurs, and for a decision support model that takes advantage of hierarchical multi-attribute classification algorithms.
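
Since information gain is the selection criterion that most of these algorithms share, a minimal sketch of the standard entropy-based definition used by ID3 may help. It is not taken from the book: the function names and the representation of rows as dictionaries are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on `attribute`.

    `rows` is a list of dicts mapping attribute names to values
    (a hypothetical layout chosen for this sketch).
    """
    total = len(labels)
    # Group the class labels by the value the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain, as ID3 does."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

An attribute that separates the classes perfectly has a gain equal to the entropy of the labels, while an attribute that leaves the class mix unchanged has a gain of zero.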

Excerpt

Chapter 1. Introduction to Data Mining and Decision Support

1.1 Introduction

1.2 The KDD Process

1.2.1 Developing an understanding of the application domain

1.2.2 Selecting and creating a data set

1.2.3 Pre-processing and cleansing

1.2.4 Data transformation

1.2.5 Choosing the appropriate Data Mining task

1.2.6 Choosing the Data Mining algorithm

1.2.7 Employing the Data Mining algorithm

1.2.8 Evaluation

1.2.9 Using the discovered knowledge

1.3 The Data Mining, a Step of the KDD Process

1.3.1 Database, Data Warehouse, or Other Information Repositories

1.3.2 Database or Data Warehouse Server

1.3.3 Knowledge Base

1.3.4 Data Mining Engine

1.3.5 Pattern Evaluation Module

1.3.6 Graphical User Interface

1.4 Data Mining Functionalities

1.4.1 Concept / Class Description: Characterization and Discrimination

1.4.2 Association Rule Mining

1.4.3 Classification and Prediction

1.4.4 Clustering

1.4.5 Outlier Analysis

1.4.6 Evolution Analysis

1.5 Common Uses of Data Mining

1.6 Decision Support

1.6.1 Basic Discipline

1.6.2 Decision Making

1.6.3 Classification of decision problems

1.6.4 Decision Support System

1.7 Contributions of This Thesis

Chapter 2. A Survey of Existing Work and Problem Definition

2.1 A Survey of existing work

2.1.1 Problem with integration of Data Mining and Decision Support

2.1.2 Evolution of Decision Support System (DSS)

2.1.3 A Survey of existing decision tree algorithm (Traditional)

2.1.4 A Survey of existing decision tree algorithm (Advanced)

2.2 Problems yet to be solved

2.3 Problem Definition

2.4 Research hypothesis, aims and objectives

2.5 Conceptual Research Framework

2.6 Conclusion

Chapter 3. Analysis of Data Mining Methods

3.1 Data Mining Methods

3.2 Discovery Method

3.3 Flat versus hierarchical classification

3.4 Basic Methods

3.5 Hierarchical classification

3.5.1 Why to choose hierarchies

3.5.2 Advantages of hierarchies

3.6 Machine Learning and Classification

3.6.1 Classification

3.6.1.1 Evaluation of classification methods

3.7 Classification Based on decision tree

3.8 Classification rules

3.9 The Pruning of Decision Tree

3.9.1 Types of Pruning Technique

3.9.1.1 Pre-Pruning

3.9.1.2 Post-Pruning

3.9.2 Fuzzy Decision Trees

3.10 Conclusion

Chapter 4. Decision tree Techniques and their formulation

4.1 Formulation of decision trees

4.2 Characteristics of Classification Trees

4.2.1 Tree Size

4.2.2 The hierarchical nature of decision trees

4.3 Basic concept and algorithm of Decision Tree

4.3.1 ID3

4.3.1.1 Attribute Selection

4.3.1.2 Information Gain

4.3.2 C4.5

4.3.3 CART

4.3.4 CHAID

4.3.5 QUEST

4.4 Advantages and Disadvantages of Decision Trees

4.5 Decision Tree Extensions

4.5.1 Oblivious Decision Trees

4.5.2 Fuzzy Decision Trees

4.6 Decision Trees Inducers for Large Datasets

4.7 Incremental Induction

4.8 Evaluation of Decision Tree Techniques

4.8.1 Generalization Error

4.8.1.1 Theoretical Estimation of Generalization Error

4.8.1.2 Empirical Estimation of Generalization Error

4.8.2 Confusion Matrix

4.8.3 Computational Complexity

4.8.4 Comprehensibility

4.9 Scalability to Large Datasets

4.9.1 Robustness

4.10 Conclusion

Chapter 5. The Development of New Algorithm for Decision Tree Learning

5.1 Proposed Improved ID3 Algorithm

5.2 Steps of Improved ID3 Algorithm

5.3 Pseudocode of Proposed Improved Algorithm

5.4 Experimental Example

5.5 Experiments on Datasets

5.6 Investigation and analysis based on performance parameters

5.6.1 Accuracy

5.6.2 Model Build Time

5.6.3 Predictor Error Measures

5.7 Empirical comparison and Investigation results

5.8 Conclusion

Chapter 6. Decision Support Framework and Related Work

6.1 Introduction

6.2 Proposed Decision Support Framework

6.3 Real world applications

6.3.1 Predicting Usage of Library Books

6.3.2 Intrusion Detection

6.3.3 Machine Learning

6.3.4 Diagnosis

6.3.5 Banking Sector

6.3.6 Credit Risk Analysis

6.4 Decision Tree Construction Using Weka

6.5 Weka Screen Shot

6.6 Conclusion

Chapter 7. Conclusions and Future Work

7.1 Summary and Contributions

7.2 Limitations and Future Work

7.3 Future Work

Research Objectives and Themes

This thesis aims to address the limitations of the traditional ID3 decision tree algorithm by developing an improved version that utilizes impact factors and classified impact factors to resolve conflicts when attributes have equal values but belong to different classes. The primary research goal is to propose an algorithm that enhances classification accuracy while maintaining computational efficiency for real-world decision support systems.

Data mining and knowledge discovery in large datasets.
Development of hierarchical multi-attribute decision models.
Improvement of the ID3 algorithm through attribute weight importance and impact factors.
Implementation of decision support frameworks for real-world applications.
Performance evaluation of decision tree algorithms using various benchmark datasets.

Excerpt from the Book

1.2.6. Choosing the Data Mining algorithm

With the overall strategy in place, we now settle on the tactics. This phase includes choosing the particular method to be used for searching for patterns (including multiple inducers). Consider, for instance, precision versus understandability: the former is better served by neural networks, while the latter is better served by decision trees. For every strategy of meta-learning there are several possible ways to accomplish it. Meta-learning focuses on explaining what causes a data mining algorithm to succeed or fail on a particular problem. This approach therefore attempts to identify the conditions under which a data mining algorithm is most appropriate. Each algorithm has parameters and learning tactics (such as ten-fold cross-validation or another division into training and testing sets).

Summary of Chapters

Chapter 1. Introduction to Data Mining and Decision Support: Provides an overview of the data mining process, the KDD workflow, and the integration of these techniques within decision support systems.

Chapter 2. A Survey of Existing Work and Problem Definition: Reviews literature on data mining and decision support, identifies research gaps, and formulates the core research hypothesis and objectives.

Chapter 3. Analysis of Data Mining Methods: Analyzes various data mining methods, focusing on the taxonomy, hierarchical vs. flat classification, and the importance of tree-based methods.

Chapter 4. Decision tree Techniques and their formulation: Details various decision tree algorithms, their mathematical formulations, and evaluation strategies for classification tasks.

Chapter 5. The Development of New Algorithm for Decision Tree Learning: Presents an improved ID3 algorithm that incorporates impact factors to handle attribute conflicts and validates its performance against traditional methods.

Chapter 6. Decision Support Framework and Related Work: Formulates a practical decision support framework and demonstrates its utility through real-world applications and Weka-based implementations.

Chapter 7. Conclusions and Future Work: Summarizes the thesis contributions, acknowledges limitations, and suggests potential directions for future research in hierarchical decision modeling.

Key Words

Data Mining, KDD Process, Decision Tree, ID3, C4.5, CART, Decision Support System, Classification, Information Gain, Impact Factor, Hierarchical Classification, Attribute Selection, Weka, Machine Learning, Pruning.

Frequently Asked Questions

What is the core focus of this research?

The research focuses on enhancing decision tree classification by introducing an improved ID3 algorithm that utilizes impact factors and classified impact factors to resolve conflicts in attribute data.

Which specific themes are covered in the work?

The work covers themes such as the KDD process, the development of hierarchical multi-attribute decision models, decision tree pruning, and the implementation of decision support frameworks for real-world scenarios.

What is the primary goal of the thesis?

The primary goal is to improve the accuracy of decision tree learning and to provide a robust framework for effective decision support in scenarios where traditional algorithms fail to handle conflicting class attributes.

Which methodology is employed in this research?

The research uses a mix of theoretical analysis of data mining methods and empirical evaluation, comparing the proposed improved ID3 algorithm against existing algorithms like C4.5 and CART using multiple real-world datasets.

What is discussed in the main chapters?

The main chapters discuss the foundational concepts of data mining, survey existing algorithms, analyze classification methods, detail the development of the improved ID3 algorithm, and formulate a new decision support framework.

What keywords characterize this research?

Keywords include Data Mining, Decision Tree, ID3, Decision Support System, Classification, Information Gain, and Hierarchical Classification.

How does the proposed algorithm improve upon the original ID3?

The proposed algorithm resolves the conflict of class selection when attributes have equal values by introducing an "impact factor" and "classified impact factor," allowing the algorithm to decide more effectively which class to adopt for maximum accuracy.
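
To make the conflict situation concrete, the sketch below only detects it: it finds records whose attribute values are identical but whose classes differ. It does not implement the impact factor or classified impact factor, whose definitions are not given in this excerpt. The function name and the tuple-based row layout are assumptions for illustration.

```python
from collections import defaultdict


def find_class_conflicts(rows, labels):
    """Return attribute-value combinations that map to more than one class.

    `rows` is a list of sequences of attribute values (hypothetical layout);
    `labels` holds the class of each row.
    """
    classes_by_values = defaultdict(set)
    for row, label in zip(rows, labels):
        classes_by_values[tuple(row)].add(label)
    # Keep only the combinations that were seen with conflicting classes.
    return {
        values: classes
        for values, classes in classes_by_values.items()
        if len(classes) > 1
    }
```

Each key in the result is a point where a tree that has used every attribute still cannot separate the classes, which is the situation the proposed algorithm is said to resolve.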

Why are impact factors used in the new algorithm?

Impact factors are introduced to balance classification decisions based on the importance of attributes rather than relying solely on information gain, which helps the algorithm handle ambiguous data scenarios more accurately.

How is the performance of the proposed algorithm validated?

The performance is validated using 10-fold cross-validation on six different real-world datasets, measuring parameters like accuracy, model build time, and mean absolute error in comparison with existing ID3, C4.5, and CART models.
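
The following is a minimal sketch of how k-fold cross-validated accuracy can be computed. It is not the authors' experimental setup. The majority-class predictor is a placeholder standing in for a real decision tree learner, the folds are contiguous and unshuffled for simplicity, and all function names are assumptions.

```python
from collections import Counter


def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def majority_class(labels):
    """Placeholder 'model': always predict the most frequent training class."""
    return Counter(labels).most_common(1)[0][0]


def cross_validated_accuracy(labels, k=10):
    """Fraction of samples predicted correctly when each fold is held out once."""
    correct = 0
    for train, test in k_fold_splits(len(labels), k):
        prediction = majority_class([labels[i] for i in train])
        correct += sum(1 for i in test if labels[i] == prediction)
    return correct / len(labels)
```

Every sample is used for testing exactly once, so the result is an estimate over the whole dataset. In practice the data would usually be shuffled or stratified before splitting.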


Summary of information

Title: Data Mining Multi-Attribute Decision System. Facilitating Decision Support Through Data Mining Technique by Hierarchical Multi-Attribute Decision Models
University: Symbiosis International University
Authors: Dr. Pankaj Pathak (Author), Dr. Parashu Ram Pal (Author)
Year of publication: 2020
Pages: 134
Catalog number: V950605
ISBN (ebook): 9783346292315
ISBN (Book): 9783346292322
Language: English
Keywords: data mining multi-attribute decision system facilitating support through technique hierarchical models
Product safety: GRIN Publishing GmbH

Cite this text: Dr. Pankaj Pathak (Author), Dr. Parashu Ram Pal (Author), 2020, Data Mining Multi-Attribute Decision System. Facilitating Decision Support Through Data Mining Technique by Hierarchical Multi-Attribute Decision Models, Munich, GRIN Verlag, https://www.grin.com/document/950605
