A Genetic Programming Approach to Classification Problems

Essay, 2013, 10 Pages, Grade: A+

Author: Hakan Uysal

Computer Science - Programming

Genetic Programming is a technique inspired by biological evolution in which computer programs solve problems automatically by evolving iteratively under the guidance of a fitness function. The advantage of this type of programming is that only the basics need to be defined.

As a result, it is a flexible solution for a broad range of domains. Classification has been one of the most compelling problems in machine learning. This paper compares a genetic programming classifier with conventional classification algorithms such as Naive Bayes, C4.5 decision tree, Random Forest, Support Vector Machines and k-Nearest Neighbour.

The experiments are run on several data sets with different sizes, feature sets and attribute properties. The time complexity of each classifier is also measured experimentally.


Table of Contents

1. Introduction

1.1. Classification

1.2. Decision Trees

1.3. Naive-Bayes

1.4. Random Forest

1.5. Support Vector Machines

1.6. K Nearest Neighbours

1.7. Genetic Algorithm

1.8. Genetic Programming

2. Experiment

2.1. Classification

2.2. Data Sets

2.3. Tools and Frameworks

2.4. Compared Algorithms

2.5. Genetic Programming Classifier

2.6. Evaluation

3. Results

3.1. Accuracy

3.1.1. Adult Data Set

3.1.2. Breast Cancer Wisconsin Data Set

3.1.3. Car Data Set

3.1.4. Iris Data Set

3.1.5. Contact Lenses Data Set

3.1.6. Soybean Data Set

3.1.7. Weather Nominal Data Set

3.2. Time Complexity Comparison (seconds)

4. Conclusion

5. Future Work

Research Objectives & Core Themes

The primary objective of this research is to evaluate the effectiveness of Genetic Programming (GP) as a classifier compared to traditional machine learning algorithms, specifically focusing on its ability to optimize decision trees for text classification tasks.

  • Comparison of GP-based classification with conventional algorithms like Naive Bayes, Random Forest, and SVM.
  • Analysis of performance metrics including AUC, precision, recall, and true positive rate across various data sets.
  • Investigation of time complexity for training and classification processes.
  • Evaluation of GP performance in relation to data set size, distribution, and attribute complexity.
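
As a rough illustration of the metrics listed above, the following Python sketch computes precision, recall (which is the same quantity as the true positive rate) and AUC for a binary problem. The function names and the toy labels and scores are assumptions made for illustration only; they are not taken from the paper, which relied on existing tools for its measurements.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall == true positive rate
    return precision, recall


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels, hard predictions and classifier scores.
toy_true = [1, 1, 0, 0]
toy_pred = [1, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]

toy_precision, toy_recall = precision_recall(toy_true, toy_pred)
toy_auc = auc(toy_true, toy_scores)
```

The pairwise form of AUC is quadratic in the number of examples, which is acceptable for a sketch but would normally be replaced by a sort-based computation on larger data sets.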

Excerpt from the Book

1. Introduction

The world is moving towards digitisation. Everything in human life becomes data. In parallel with improvements in storage and database systems, storing and accessing data has become easier and cheaper. However, having data does not mean having knowledge. Information must be extracted from the raw data. Once this is done, the picture becomes clearer. This is where data mining starts.

Data mining[1] is the exploration and analysis of large quantities of data. Its essential task is the extraction of interesting knowledge, such as patterns, rules or constraints, from large data sets.

Classification is the problem of identifying the categories of data. Text classification is one of its most distinctive forms. It labels input text on the basis of training data. Social media and internet usage have grown with the adoption of real-time communication and text-based information sharing. The growing amount of text data increases the importance of extracting knowledge from it. This leads the computer science community to rely more on text classification algorithms. The best-known algorithms of this kind are decision trees, Naive Bayes, Random Forest, Support Vector Machines and K Nearest Neighbours classification.

Summary of Chapters

1. Introduction: This chapter defines the context of data mining and classification, introducing Genetic Programming as a bio-inspired technique for solving classification problems.

2. Experiment: This section details the methodology used for the experiment, including descriptions of the data sets, tools (Weka, RapidMiner, Orange), and the specific parameters of the GP classifier.

3. Results: This chapter presents a comparative analysis of classification performance across seven distinct data sets and evaluates the time complexity of the tested algorithms.

4. Conclusion: The author summarizes the findings, highlighting the potential of GP while acknowledging its current limitations regarding parameter tuning and computational requirements.

5. Future Work: This chapter outlines potential research directions, such as hybridizing GP with neural networks and improving search space efficiency via parallelization.

Keywords

Genetic Programming, Machine Learning, Text Classification, Data Mining, Decision Trees, Accuracy, AUC, True Positive Rate, Time Complexity, Evolutionary Algorithm, Naive Bayes, Random Forest, Support Vector Machines, K-Nearest Neighbours, Fitness Function.

Frequently Asked Questions

What is the core focus of this research?

The paper explores the application of Genetic Programming (GP) to improve decision tree classification models and benchmarks its performance against standard machine learning algorithms.

What are the primary topics covered?

Key topics include bio-inspired search methods, data classification accuracy, performance benchmarking on various UCI repository data sets, and algorithmic time complexity.

What is the main objective of the paper?

The objective is to determine if Genetic Programming can act as a reliable and efficient classification method by optimizing the creation of decision trees.

What research methods were employed?

The study uses an experimental approach, applying 10-fold cross-validation on various data sets and comparing the results of different classification algorithms using tools like Weka and Python-based APIs.
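
The 10-fold cross-validation mentioned above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the paper's setup: the helper names (`k_fold_indices`, `cross_validate`, `train_majority`) are hypothetical, the folds are not stratified, and the majority-class baseline merely stands in for a real classifier.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    if n < k:
        raise ValueError("need at least k examples")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, train_fn, k=10, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # train_fn returns a callable that maps one example to a label.
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def train_majority(X_train, y_train):
    """Baseline stand-in for a classifier: always predict the most common label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda example: label


# Hypothetical data: 50 examples, 40 labelled "a" and 10 labelled "b".
features = [[i] for i in range(50)]
labels = ["a"] * 40 + ["b"] * 10
baseline_accuracy = cross_validate(features, labels, train_majority)
```

Any classifier can be plugged in through `train_fn`, provided it returns a callable that maps a single example to a predicted label.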

What does the main body discuss?

It provides a technical overview of each classifier, details the experimental setup, and presents an extensive analysis of the results through tables and performance charts.

Which keywords best describe this study?

Genetic Programming, Classification, Data Mining, and Machine Learning metrics such as AUC and Recall are the essential identifiers for this work.

How does GP handle the "compactness" challenge in decision trees?

GP uses a fitness function to iteratively evolve the population of trees, aiming to select more informative attributes close to the root to reduce size and increase classification speed.
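
One way to express this preference for compact trees is a fitness function that rewards accuracy and subtracts a small penalty per node. The sketch below is an assumption for illustration: the excerpt does not state the paper's actual fitness function, and the tree encoding, the `size_penalty` value and the toy weather-style rows are all hypothetical.

```python
# Tree encoding (an assumption): a leaf is a class label (str); an internal
# node is a tuple (attribute_name, {attribute_value: subtree}).

def classify(tree, row, default=None):
    """Walk from the root to a leaf, following the row's attribute values."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches.get(row.get(attribute), default)
    return tree


def tree_size(tree):
    """Count every node in the tree, leaves included."""
    if not isinstance(tree, tuple):
        return 1
    _, branches = tree
    return 1 + sum(tree_size(subtree) for subtree in branches.values())


def fitness(tree, rows, labels, size_penalty=0.01):
    """Accuracy minus a per-node penalty, so smaller trees win ties."""
    correct = sum(classify(tree, row) == label for row, label in zip(rows, labels))
    return correct / len(rows) - size_penalty * tree_size(tree)


# Hypothetical toy data in which only "outlook" is informative.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

by_outlook = ("outlook", {"sunny": "play", "rainy": "stay"})
compact_tree = by_outlook                                     # informative attribute at the root
bloated_tree = ("windy", {"no": by_outlook, "yes": by_outlook})  # uninformative root

compact_fitness = fitness(compact_tree, rows, labels)
bloated_fitness = fitness(bloated_tree, rows, labels)
```

Both trees classify the toy rows perfectly, but the tree that tests the informative attribute at the root has three nodes instead of seven and therefore scores higher, which is the selection pressure the answer above describes.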

Why did the GP classifier perform poorly on the Soybean data set?

The large number of class labels (19) and attributes (36) created a massive search space, making it difficult for the evolutionary process to identify optimal nodes during crossover and mutation.

What is the impact of elite population size on GP performance?

The study found that increasing the elite population size improves the True Positive rate but increases the computational time complexity by approximately 20%.
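
For readers unfamiliar with the term, the elite population is the set of best individuals copied unchanged into the next generation. The sketch below shows one generation step with elitism; it is a simplified assumption rather than the paper's implementation. It uses tournament selection and mutation only (crossover is omitted for brevity), and the integer individuals, `distance_fitness` and `jitter` are hypothetical stand-ins for decision trees and their operators.

```python
import random


def next_generation(population, fitness, mutate, elite_size, rng, tournament=3):
    """Build the next generation, carrying over the `elite_size` best unchanged."""
    ranked = sorted(population, key=fitness, reverse=True)
    new_population = ranked[:elite_size]  # elites survive without modification
    while len(new_population) < len(population):
        # Tournament selection: the fittest of a few random contenders breeds.
        contenders = rng.sample(population, tournament)
        parent = max(contenders, key=fitness)
        new_population.append(mutate(parent, rng))
    return new_population


def distance_fitness(individual):
    """Toy fitness: integers closer to 42 are fitter."""
    return -abs(individual - 42)


def jitter(individual, rng):
    """Toy mutation: move the integer by a small random amount."""
    return individual + rng.randint(-5, 5)


rng = random.Random(0)
population = [rng.randint(0, 100) for _ in range(10)]
for _ in range(20):
    population = next_generation(population, distance_fitness, jitter,
                                 elite_size=2, rng=rng)
best_individual = max(population, key=distance_fitness)
```

With at least one elite, the best fitness in the population can never decrease from one generation to the next, which is one plausible reason a larger elite helps the final classifier. This sketch does not attempt to reproduce the runtime effect reported in the study.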

Details

Title: A Genetic Programming Approach to Classification Problems
University: University College Dublin
Course: Natural Computing
Grade: A+
Author: Hakan Uysal
Year of publication: 2013
Pages: 10
Catalog number: V333781
ISBN (eBook): 9783656984368
ISBN (Book): 9783656984375
Language: English
Tags: classification genetic programming machine learning
Product safety: GRIN Publishing Ltd.
Cite this work: Hakan Uysal (Author), 2013, A Genetic Programming Approach to Classification Problems, Munich, GRIN Verlag, https://www.grin.com/document/333781