Data mining, the extraction of hidden predictive information from large databases, is a powerful technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that were traditionally too time-consuming to resolve, and they scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Automated pattern matching, the ability of a program to compare known patterns and determine their degree of similarity, forms the basis in bioinformatics for automated sequence analysis, modeling of protein structures, locating homologous genes, data mining, and Internet search engines. Data mining relies on algorithmic pattern matching to locate patterns in online and local databases, using a variety of technologies, from simple keyword matching to rule-based expert systems and artificial neural networks.

In this dissertation, the basic problems related to pattern recognition and pattern matching for nucleotide and protein sequence alignment are discussed. The main techniques used to solve these problems are described, together with a comprehensive survey of the most influential algorithms proposed during the last decade.

Excerpt

Table of Contents

Biological Prospective of Data mining

Introduction
Scope of Thesis
Biology Primer
Data Mining Operations for Knowledge Discovery Process

Data Mining and Neural Network

Introduction
Neural Networks Overview
Feedforward Neural Networks
Time Delay Neural Networks
Bi-Directional Neural Networks
Recurrent Neural Networks
Back-Propagation Through Time
Constructive Neural-Network Learning Algorithms for Pattern Classification

Introduction
Preliminaries

Sequence Alignment

Introduction
Sequence Description
Sequence Alignment

Fundamentals of Sequence Alignment

Pairwise Sequence Alignment
Local versus Global Alignment
Multiple Sequence Alignment

Alignment Algorithms

Sequence Similarity Identification

Introduction
Existing Methods

FASTA
BLAST
Dynamic Programming

Fuzzy Logic in Pattern Recognition

Introduction
Unsupervised Clustering
Fuzzy c-Means Algorithm

Cluster Validity

Membership-based Validity Measures
Geometry-based Validity

Knowledge-based Pattern Recognition
Hybrid Pattern Recognition System
Fuzzy Hidden Markov Models

Hidden Markov Models
Fuzzy Measures and Fuzzy Integrals

Objectives and Key Themes

This dissertation explores the application of data mining techniques to the analysis of biological sequences, specifically in the context of pattern recognition and sequence alignment. The primary objective is to examine the challenges and solutions related to pattern recognition and pattern matching in nucleotide and protein sequences.

Data mining techniques for biological sequence analysis
Pattern recognition and sequence alignment in bioinformatics
Neural networks and their applications in sequence analysis
Fuzzy logic and its role in pattern recognition
Knowledge discovery and its integration with data mining processes

Chapter Summaries

Chapter 1: Biological Prospective of Data mining

This chapter provides a fundamental introduction to the biological context of data mining. It delves into the basic characteristics of living entities, emphasizing metabolism, growth, and reproduction. The chapter then explores the structure of proteins and DNA, highlighting the significance of mutations in DNA sequences. The chapter concludes by discussing data mining operations within the framework of a knowledge discovery process, emphasizing its iterative nature and role in hypothesis generation.

Chapter 2: Data Mining and Neural Network

This chapter explores the role of neural networks in data mining applications. It provides an overview of neural networks, including feedforward, time delay, bi-directional, and recurrent neural networks. The chapter also discusses back-propagation through time, a standard technique for training recurrent networks. Finally, it covers constructive neural-network learning algorithms designed for pattern classification, outlining their principles and significance.
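To make the recurrent case concrete, the following is a minimal sketch of the forward pass of a single-layer recurrent network over a nucleotide sequence. It is an illustration only, not the dissertation's model: the function names, the one-hot encoding, and the tanh update h_t = tanh(W_in x_t + W_rec h_{t-1} + b) are assumptions chosen for simplicity. Back-propagation through time would differentiate through exactly this loop, unrolled over the time steps.

```python
import math

ALPHABET = "ACGT"  # assumed nucleotide alphabet for the one-hot encoding


def one_hot(base):
    """Encode one nucleotide as a 4-element one-hot vector."""
    return [1.0 if base == symbol else 0.0 for symbol in ALPHABET]


def rnn_forward(inputs, w_in, w_rec, bias):
    """Return the hidden state after each time step.

    inputs: list of input vectors, one per time step
    w_in:   hidden x input weight matrix (list of rows)
    w_rec:  hidden x hidden recurrent weight matrix (list of rows)
    bias:   one bias value per hidden unit
    """
    n_hidden = len(bias)
    h = [0.0] * n_hidden  # initial hidden state is all zeros
    states = []
    for x in inputs:
        # The new state depends on the current input and the previous state.
        h = [
            math.tanh(
                sum(w_in[i][k] * x[k] for k in range(len(x)))
                + sum(w_rec[i][j] * h[j] for j in range(n_hidden))
                + bias[i]
            )
            for i in range(n_hidden)
        ]
        states.append(h)
    return states


# Hypothetical weights: one hidden unit that responds to 'A'.
states = rnn_forward([one_hot(b) for b in "AA"],
                     w_in=[[1.0, 0.0, 0.0, 0.0]],
                     w_rec=[[0.5]],
                     bias=[0.0])
```

Because of the recurrent weight, the second 'A' produces a larger activation than the first, even though the input is identical. This dependence on earlier inputs is what distinguishes a recurrent network from a feedforward one.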

Chapter 3: Sequence Alignment

This chapter focuses on the central concept of sequence alignment, which is crucial for understanding and comparing biological sequences. It defines sequence alignment and explores its fundamentals, including pairwise alignment, local versus global alignment, and multiple sequence alignment. The chapter further examines different alignment algorithms, highlighting their strengths and limitations.
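The difference between global and local alignment can be sketched with a small dynamic-programming scorer. This is a minimal illustration in the style of Needleman-Wunsch (global) and Smith-Waterman (local), not code from the dissertation. The function name and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions, and only the score is computed, not the alignment itself.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best pairwise alignment score with a linear gap penalty.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of substrings.
    """
    cols = len(b) + 1
    # prev is the previous row of the dynamic-programming table.
    # Globally, leading gaps are penalised; locally, a cell never drops below 0.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)   # restart the alignment here
                best = max(best, score)
            curr[j] = score
        prev = curr
    # Global score is the bottom-right cell; local score is the table maximum.
    return best if local else prev[-1]


global_score = alignment_score("AAAGATTACATTT", "CCGATTACAGG")
local_score = alignment_score("AAAGATTACATTT", "CCGATTACAGG", local=True)
```

Both sequences here share the core GATTACA but have unrelated flanks. The local score rewards the shared core alone, while the global score is dragged down by the mismatched ends, which is why local alignment is usually preferred for finding conserved regions inside otherwise dissimilar sequences.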

Chapter 4: Sequence Similarity Identification

This chapter explores various methods for identifying sequence similarity. It examines existing methods such as FASTA and BLAST, discussing their principles and applications. The chapter also delves into the application of dynamic programming for sequence alignment, providing a detailed explanation of its problem formulation and solution strategies.
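FASTA and BLAST both avoid a full dynamic-programming comparison against every database sequence by first looking for short exact word matches. The sketch below illustrates only that seeding idea under simplifying assumptions: the function names are hypothetical, the word length k=3 is arbitrary, and the extension, scoring-matrix, and statistical steps of the real tools are omitted.

```python
from collections import Counter, defaultdict


def kmer_index(seq, k):
    """Map every length-k word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a k-word matches exactly."""
    index = kmer_index(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


def best_diagonal(hits):
    """Return (offset, count) for the diagonal holding the most seed hits.

    Hits on the same diagonal (subject_pos - query_pos) are consistent with
    a single ungapped alignment, so a crowded diagonal marks a candidate region.
    """
    counts = Counter(s - q for q, s in hits)
    return counts.most_common(1)[0] if counts else None


hits = seed_hits("GATTACA", "CCGATTACAGG", k=3)
print(best_diagonal(hits))  # (2, 5): five word hits at offset 2
```

A candidate region found this way would then be refined, for example with a dynamic-programming alignment restricted to the neighbourhood of the best diagonal.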

Chapter 5: Fuzzy Logic in Pattern Recognition

This chapter explores the use of fuzzy logic in pattern recognition, specifically in the context of unsupervised clustering. It introduces the fuzzy c-means algorithm and discusses its effectiveness in handling data with overlapping clusters. The chapter also explores cluster validity measures, emphasizing both membership-based and geometry-based approaches. Finally, it touches upon knowledge-based pattern recognition and hybrid systems, showcasing the integration of fuzzy logic with other techniques.
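As an illustration of how fuzzy c-means assigns graded memberships, here is a minimal pure-Python sketch of the standard alternating update (memberships from distances, then centres from weighted means). It is not the dissertation's implementation: the function names, the fuzzifier m=2, the stopping tolerance, and the caller-supplied initial centres are assumptions. The partition coefficient is included as one common membership-based validity measure, which may or may not be the one the chapter uses.

```python
import math


def memberships(points, centers, m=2.0):
    """Membership u[i][j] of point i in cluster j; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if 0.0 in d:
            # Point lies exactly on a centre: full membership there.
            zeros = d.count(0.0)
            u.append([1.0 / zeros if x == 0.0 else 0.0 for x in d])
        else:
            u.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return u


def update_centers(points, u, m=2.0):
    """Each centre is the mean of all points weighted by membership ** m."""
    dims = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(tuple(
            sum(wi * p[d] for wi, p in zip(w, points)) / total
            for d in range(dims)))
    return centers


def fuzzy_c_means(points, centers, m=2.0, tol=1e-6, max_iter=100):
    """Alternate the two updates until the centres stop moving."""
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


def partition_coefficient(u):
    """Mean squared membership: 1.0 for a crisp partition, 1/c for a fully fuzzy one."""
    return sum(x * x for row in u for x in row) / len(u)


points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (4.9,)]
centers, u = fuzzy_c_means(points, centers=[(1.0,), (4.0,)])
```

For two well-separated groups like these, the memberships come out close to 0 or 1 and the partition coefficient is close to 1. With overlapping clusters, points between the centres receive intermediate memberships and the coefficient drops.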

Keywords

The key terms and focus topics of this dissertation revolve around data mining, bioinformatics, sequence analysis, pattern recognition, sequence alignment, neural networks, fuzzy logic, knowledge discovery, and algorithm development. These keywords encapsulate the research focus, themes, and core concepts explored within the work.


Details

Title: Data Mining for Pattern Recognition and Pattern Matching in Bioinformatics
Author: Dr Binod Kumar (Author)
Publication Year: 2006
Pages: 79
Catalog Number: V537265
ISBN (eBook): 9783346162083
Language: English
Tags: bioinformatics data matching mining pattern recognition
Product Safety: GRIN Publishing GmbH

Quote paper: Dr Binod Kumar (Author), 2006, Data Mining for Pattern Recognition and Pattern Matching in Bioinformatics, Munich, GRIN Verlag, https://www.grin.com/document/537265
