Extracting meaningful information from gene expression data poses a great challenge to computational researchers and biologists alike. Analysis of gene expression data makes it possible to determine behavioral patterns of genes, such as the nature of their interactions and the similarity of their behavior. If two different genes show similar expression patterns across samples, this suggests a common pattern of regulation or a relationship between their functions. These patterns have considerable significance in bioinformatics and clinical research, with applications such as drug discovery, treatment planning, accurate diagnosis, prognosis, and protein network analysis.

Data mining techniques are essential for identifying patterns in gene expression data. The major techniques applicable to this analysis include clustering, classification, and association rule mining. Clustering is an important technique for this purpose, but it has disadvantages: it is a global model that groups genes according to their behavior across all samples. Biclustering was introduced to overcome these problems; it is a local model that can capture patterns present in only a subset of samples. Discovering such local expression patterns is essential for identifying many genetic pathways that are not otherwise apparent. It is therefore necessary to move beyond the clustering paradigm and develop approaches capable of discovering local patterns in gene expression data.

Biclustering is a two-dimensional clustering problem in which genes and samples are grouped simultaneously. It has great potential for detecting marker genes associated with certain tissues or diseases. However, since the problem is NP-hard, there has been a lot of research on biclustering involving statistical and graph-theoretic approaches. The proposed Cuckoo Search (CS) method finds significant biclusters in large expression datasets. Experimental results are demonstrated on benchmark datasets. This work also determines the biological relevance of the biclusters in terms of function, using Gene Ontology.

Excerpt

1 INTRODUCTION

1.1 MICROARRAY TECHNOLOGY

1.2 MICROARRAY DATA CLUSTERING ANALYSIS

1.3 BICLUSTERING

1.3.1 Bicluster Types

1.4 MOTIVATION

1.5 PROBLEM STATEMENT

1.6 RESEARCH OBJECTIVE

1.7 ENCODING OF BICLUSTER

1.8 DATASETS USED

1.9 BIOLOGICAL VALIDATION OF BICLUSTERS

2 LITERATURE REVIEW

2.1 SYSTEMATIC BICLUSTERING ALGORITHMS

2.1.1 Divide and Conquer Approach

2.1.2 Greedy Iterative Search Approach

2.1.3 Biclusters Enumeration Approach

2.2 STOCHASTIC BICLUSTERING ALGORITHMS

2.2.1 Neighbourhood Search Approach

2.2.2 Evolutionary Computation Approach

3 BICLUSTERING GENE EXPRESSION DATA USING CUCKOO SEARCH

3.1 CUCKOO SEARCH

3.2 EXPERIMENT RESULTS ANALYSIS

3.2.1 Experimental Setup

3.2.2 Bicluster extraction for Yeast and Human Lymphoma Dataset

3.2.3 Biological Relevance

3.2.4 Biological Annotation for Yeast cell cycle using GOTermFinder Toolbox

3.3 SUMMARY

Objectives and Topics

The primary objective of this work is to develop a heuristic approach for identifying coherent biclusters within gene expression data, characterized by minimum Mean Square Residue (MSR) and maximum row variance, to overcome the limitations of traditional clustering methods.

Application of Cuckoo Search metaheuristic for biclustering
Minimization of MSR for improved coherence in biclusters
Utilization of high row variance to filter out trivial clusters
Biological validation using Gene Ontology enrichment analysis
Performance evaluation on Yeast and Human Lymphoma datasets
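
To make the two criteria named above concrete, the following minimal Python sketch computes them for a bicluster given as lists of row (gene) and column (sample) indices. It assumes the Cheng and Church definition of MSR and a row variance averaged over all cells of the bicluster. The function names and the toy matrix are illustrative assumptions and are not taken from the book.

```python
def _submatrix_stats(matrix, rows, cols):
    """Return the submatrix with its row means, column means and overall mean."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(sub), len(sub[0])
    row_mean = [sum(r) / n_c for r in sub]
    col_mean = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    all_mean = sum(row_mean) / n_r
    return sub, row_mean, col_mean, all_mean


def mean_squared_residue(matrix, rows, cols):
    """MSR: mean of (a_ij - rowmean_i - colmean_j + overall_mean)^2 over the bicluster."""
    sub, row_mean, col_mean, all_mean = _submatrix_stats(matrix, rows, cols)
    total = 0.0
    for i, row in enumerate(sub):
        for j, value in enumerate(row):
            residue = value - row_mean[i] - col_mean[j] + all_mean
            total += residue ** 2
    return total / (len(sub) * len(sub[0]))


def row_variance(matrix, rows, cols):
    """Mean squared deviation of each cell from its own row mean."""
    sub, row_mean, _, _ = _submatrix_stats(matrix, rows, cols)
    total = sum((value - row_mean[i]) ** 2
                for i, row in enumerate(sub) for value in row)
    return total / (len(sub) * len(sub[0]))


# Toy expression matrix (rows = genes, columns = samples), illustrative only.
# Genes 0-2 follow an additive pattern over samples 0-2; the rest is noise.
data = [
    [1, 2, 3, 9],
    [2, 3, 4, 0],
    [4, 5, 6, 7],
    [8, 1, 5, 2],
]

coherent_msr = mean_squared_residue(data, [0, 1, 2], [0, 1, 2])
coherent_var = row_variance(data, [0, 1, 2], [0, 1, 2])
whole_msr = mean_squared_residue(data, [0, 1, 2, 3], [0, 1, 2, 3])
```

A perfectly additive submatrix has an MSR of zero up to floating-point rounding, but so does a constant submatrix. The row variance term separates the two cases, which is why a high value is used to filter out trivial biclusters.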

Excerpt from the Book

3.1 CUCKOO SEARCH

Cuckoo Search (CS) is a metaheuristic optimization approach developed by Xin-She Yang & Suash Deb (2009), based on the brood parasitism of cuckoo species, which lay their eggs in the nests of other host birds. According to selfish gene theory (Dawkins 1989), this parasitic behavior increases the chance of survival of the cuckoo’s genes, since the cuckoo need not spend any energy rearing its young. The CS algorithm uses these behaviors to traverse the search space and find optimal solutions. A set of nests, each holding one egg, is placed at random locations in the search space, where each egg represents a candidate solution. A number of cuckoos are assigned to traverse the search space, recording the highest objective values among the candidate solutions encountered. The cuckoos use a search pattern called Levy flight, which is observed in real insects, fish and birds. When generating a new solution x(t+1) for a cuckoo i, a Levy flight is performed using the following equation (3.1)
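
The excerpt breaks off before equation (3.1), so the book's exact form of the update is not shown here. As a rough illustration only, the Python sketch below follows the commonly cited form from Yang & Deb, in which a new solution is the current one plus a step size alpha times a Levy-distributed step. Several parts are assumptions rather than the authors' method: Mantegna's algorithm for drawing the steps, the parameter values, the continuous search space, the sphere test function, and minimization of the objective (as with MSR) rather than recording the highest value. Encoding a bicluster as a candidate solution is not covered.

```python
import math
import random


def levy_step(beta, rng):
    """Draw one heavy-tailed step length with Mantegna's algorithm (assumed here)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def cuckoo_search(objective, dim, n_nests=15, pa=0.25, alpha=0.1,
                  beta=1.5, iters=200, bound=5.0, seed=0):
    """Minimize `objective` over R^dim; returns (best_solution, best_value)."""
    rng = random.Random(seed)

    def random_nest():
        return [rng.uniform(-bound, bound) for _ in range(dim)]

    nests = [random_nest() for _ in range(n_nests)]
    fitness = [objective(n) for n in nests]

    for _ in range(iters):
        for i in range(n_nests):
            # Levy flight: x(t+1) = x(t) + alpha * Levy step, per coordinate.
            candidate = [x + alpha * levy_step(beta, rng) for x in nests[i]]
            value = objective(candidate)
            # The new egg replaces a randomly chosen nest only if it is better.
            j = rng.randrange(n_nests)
            if value < fitness[j]:
                nests[j], fitness[j] = candidate, value

        # A fraction pa of the worst nests is abandoned and rebuilt at random.
        order = sorted(range(n_nests), key=fitness.__getitem__)
        for idx in order[n_nests - int(pa * n_nests):]:
            nests[idx] = random_nest()
            fitness[idx] = objective(nests[idx])

    best = min(range(n_nests), key=fitness.__getitem__)
    return nests[best], fitness[best]


def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = cuckoo_search(sphere, dim=2, seed=1)
```

Because only the worst nests are abandoned and a nest is overwritten only by a better candidate, the best value found never gets worse from one iteration to the next.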

Chapter Summary

1 INTRODUCTION: Discusses microarray technology, the challenges of clustering gene expression data, and the introduction of biclustering as a localized alternative.

2 LITERATURE REVIEW: Reviews existing systematic and stochastic biclustering algorithms, categorizing them by their search strategies and limitations.

3 BICLUSTERING GENE EXPRESSION DATA USING CUCKOO SEARCH: Proposes the Cuckoo Search algorithm to identify coherent biclusters and provides a detailed analysis of experimental results including biological validation.

Keywords

Biclustering, Gene Expression Data, Microarray, Cuckoo Search, Metaheuristic, Mean Square Residue, MSR, Row Variance, Coherence Index, Yeast Dataset, Lymphoma Dataset, Gene Ontology, Biological Validation, Optimization, Bioinformatics

Frequently Asked Questions

What is the core focus of this research?

This research focuses on the application of the Cuckoo Search metaheuristic algorithm to the problem of biclustering, specifically to find coherent patterns in high-dimensional gene expression data.

What are the central thematic fields?

The central fields include bioinformatics, data mining, metaheuristic optimization, and molecular biology, specifically targeting gene regulation patterns.

What is the primary research goal?

The primary goal is to derive a heuristic approach capable of identifying biclusters with minimum Mean Square Residue (MSR) and maximum row variance, ensuring biological relevance.

Which scientific methodology is utilized?

The work employs the Cuckoo Search (CS) algorithm, a nature-inspired metaheuristic based on the brood parasitism strategy of cuckoo species, using Levy flights for effective search space traversal.

What is covered in the main section of the paper?

The main section details the problem formulation, the design of the Cuckoo Search algorithm for biclustering, and extensive experimental validation using benchmark Yeast and Human Lymphoma datasets.

Which keywords characterize this work?

Key terms include Biclustering, Cuckoo Search, Gene Expression, Metaheuristic, Mean Square Residue (MSR), and Gene Ontology validation.

How is the biological relevance of the extracted biclusters verified?

The biological relevance is verified using the Gene Ontology (GO) project and the FuncAssociate tool to compute p-values, assessing the statistical enrichment of genes within identified clusters.
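
As a hedged illustration of what such a p-value measures, the Python sketch below computes a one-sided hypergeometric tail probability: the chance of seeing at least the observed number of annotated genes in a cluster if the cluster were drawn at random from the genome. Whether the tool named above computes exactly this quantity, and how it corrects for multiple testing, is not stated in this excerpt. The function name and the example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, cluster_size).

    population   -- total number of genes considered
    annotated    -- genes in the population carrying the GO term
    cluster_size -- number of genes in the bicluster
    hits         -- genes in the bicluster carrying the GO term
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 genes, 5 annotated, and a 5-gene cluster that
# contains all 5 annotated genes. Only one of the C(10, 5) draws does this.
p_small = enrichment_p_value(population=10, annotated=5, cluster_size=5, hits=5)
```

A small p-value indicates that the annotated genes are over-represented in the cluster relative to chance.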

Why is the Cuckoo Search algorithm used instead of traditional clustering?

Traditional clustering is often global and restrictive, forcing genes into single clusters. Cuckoo Search enables the discovery of local, potentially overlapping biclusters that capture significant, co-regulated genetic pathways.

Details

Title: A Nature Inspired Algorithm for Biclustering Microarray Data Analysis
College: Bannari Amman Institute of Technology
Grade: 1
Authors: B. Rengeswaran (Author), A.M. Natarajan (Author), K. Premalatha (Author)
Publication Year: 2015
Pages: 40
Catalog Number: V300977
ISBN (eBook): 9783668619524
ISBN (Book): 9783668619531
Language: English
Tags: biclustering expression data heuristic approach
Product Safety: GRIN Publishing GmbH

Quote paper: B. Rengeswaran (Author), A.M. Natarajan (Author), K. Premalatha (Author), 2015, A Nature Inspired Algorithm for Biclustering Microarray Data Analysis, Munich, GRIN Verlag, https://www.grin.com/document/300977
