Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
Automated pattern matching, the ability of a program to compare known patterns and determine their degree of similarity, forms the basis in bioinformatics for automated sequence analysis, modeling of protein structures, locating homologous genes, data mining, and Internet search engines. Data mining relies on algorithmic pattern matching to locate patterns in online and local databases, using a variety of technologies, from simple keyword matching to rule-based expert systems and artificial neural networks.
In this dissertation, the basic problems related to pattern recognition and pattern matching for nucleotide and protein sequence alignment are discussed. The main techniques used to solve these problems are described, together with a comprehensive survey of the most influential algorithms proposed during the last decade.
Table of Contents
- Biological Prospective of Data mining
- Introduction
- Scope of Thesis
- Biology Primer
- Data Mining Operations for Knowledge Discovery Process
- Data Mining and Neural Network
- Introduction
- Neural Networks Overview
- Feed forward Neural Networks
- Time Delay Neural Networks
- Bi-Directional Neural Networks
- Recurrent Neural Networks
- Back-Propagation Through Time
- Constructive Neural-Network Learning Algorithms for Pattern Classification
- Introduction
- Preliminaries
- Sequence Alignment
- Introduction
- Sequence Description
- Sequence Alignment
- Fundamentals of Sequence Alignment
- Pair wise Sequence Alignment
- Local versus Global Alignment
- Multiple Sequence Alignment
- Alignment Algorithms
- Sequence Similarity Identification
- Introduction
- Existing Methods
- FASTA
- BLAST
- Dynamic Programming
- Fuzzy Logic in Pattern Recognition
- Introduction
- Unsupervised Clustering
- Fuzzy c-Means Algorithm
- Cluster Validity
- Membership-based Validity Measures
- Geometry-based Validity
- Knowledge-based Pattern Recognition
- Hybrid Pattern Recognition System
- Fuzzy Hidden Markov Models
- Hidden Markov Models
- Fuzzy Measures and Fuzzy Integrals
Objectives and Key Themes
This dissertation explores the application of data mining techniques to the analysis of biological sequences, specifically in the context of pattern recognition and sequence alignment. The primary objective is to examine the challenges of pattern recognition and pattern matching within nucleotide and protein sequences, and the solutions proposed for them.
- Data mining techniques for biological sequence analysis
- Pattern recognition and sequence alignment in bioinformatics
- Neural networks and their applications in sequence analysis
- Fuzzy logic and its role in pattern recognition
- Knowledge discovery and its integration with data mining processes
Chapter Summaries
Chapter 1: Biological Prospective of Data mining
This chapter provides a fundamental introduction to the biological context of data mining. It delves into the basic characteristics of living entities, emphasizing metabolism, growth, and reproduction. The chapter then explores the structure of proteins and DNA, highlighting the significance of mutations in DNA sequences. The chapter concludes by discussing data mining operations within the framework of a knowledge discovery process, emphasizing its iterative nature and role in hypothesis generation.
Chapter 2: Data Mining and Neural Network
This chapter explores the role of neural networks in data mining applications. It provides an overview of neural networks, including various types such as feed forward, time delay, bi-directional, and recurrent neural networks. The chapter also discusses back-propagation through time, a crucial technique for training recurrent networks. Finally, it delves into constructive neural-network learning algorithms designed for pattern classification, outlining their principles and significance.
Chapter 3: Sequence Alignment
This chapter focuses on the central concept of sequence alignment, which is crucial for understanding and comparing biological sequences. It defines sequence alignment and explores its fundamentals, including pair-wise alignment, local versus global alignment, and multiple sequence alignment. The chapter further examines different alignment algorithms, highlighting their strengths and limitations.
Chapter 4: Sequence Similarity Identification
This chapter explores various methods for identifying sequence similarity. It examines existing methods such as FASTA and BLAST, discussing their principles and applications. The chapter also delves into the application of dynamic programming for sequence alignment, providing a detailed explanation of its problem formulation and solution strategies.
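The dynamic programming formulation mentioned above can be illustrated with a short sketch. The summary does not say which variant the dissertation uses, so this example assumes the classic Needleman-Wunsch global alignment recurrence with a linear gap penalty. The function name and the scoring values (`match`, `mismatch`, `gap`) are illustrative choices, not taken from the source.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of sequences a and b.

    Assumed scheme (not from the source): +1 per match, -1 per
    mismatch, -2 per gap position, with a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Each table cell depends only on its three neighbours, so the table is filled in time proportional to the product of the two sequence lengths. Heuristic tools such as FASTA and BLAST trade this exhaustive search for speed.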
Chapter 5: Fuzzy Logic in Pattern Recognition
This chapter explores the use of fuzzy logic in pattern recognition, specifically in the context of unsupervised clustering. It introduces the fuzzy c-means algorithm and discusses its effectiveness in handling data with overlapping clusters. The chapter also explores cluster validity measures, emphasizing both membership-based and geometry-based approaches. Finally, it touches upon knowledge-based pattern recognition and hybrid systems, showcasing the integration of fuzzy logic with other techniques.
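As a rough illustration of how fuzzy c-means assigns graded memberships, the sketch below implements the standard alternating update of cluster centres and memberships in plain Python. It is a minimal version under stated assumptions: Euclidean distance, fuzzifier `m = 2`, random initial memberships, and a fixed iteration cap. The function name and parameters are hypothetical and do not come from the dissertation.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length tuples.

    Assumptions (not from the source): Euclidean distance, fuzzifier
    m > 1, random initial memberships drawn from a seeded generator.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))

        # Membership update: closer centres receive larger memberships.
        shift = 0.0
        for i in range(n):
            # Distances are clamped away from zero to avoid division by zero.
            dist = [
                max(sum((points[i][d] - ck[d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for ck in centers
            ]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                          for j in range(c))
                for k in range(c)
            ]
            shift = max(shift,
                        max(abs(new_row[k] - u[i][k]) for k in range(c)))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
    centers, memberships = fuzzy_c_means(data, 2)
    print(sorted(centers))
```

Unlike hard clustering, every point keeps a membership in every cluster, and each row of memberships sums to 1. This is what lets the method represent points that lie between overlapping clusters.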
Keywords
The key terms and focus topics of this dissertation are data mining, bioinformatics, sequence analysis, pattern recognition, sequence alignment, neural networks, fuzzy logic, knowledge discovery, and algorithm development.
- Cite this text
- Dr Binod Kumar (Author), 2006, Data Mining for Pattern Recognition and Pattern Matching in Bioinformatics, Munich, GRIN Verlag, https://www.grin.com/document/537265