A Nature Inspired Algorithm for Biclustering Microarray Data Analysis


Research Paper (undergraduate), 2015

40 Pages, Grade: 1


Excerpt

TABLE OF CONTENTS
CHAPTER NO.  TITLE
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
1  INTRODUCTION
1.1  MICROARRAY TECHNOLOGY
1.2  MICROARRAY DATA CLUSTERING ANALYSIS
1.3  BICLUSTERING
1.3.1  Bicluster Types
1.4  MOTIVATION
1.5  PROBLEM STATEMENT
1.6  RESEARCH OBJECTIVE
1.7  ENCODING OF BICLUSTER
1.8  DATASETS USED
1.9  BIOLOGICAL VALIDATION OF BICLUSTERS
2  LITERATURE REVIEW
2.1  SYSTEMATIC BICLUSTERING ALGORITHMS
2.1.1  Divide and Conquer Approach
2.1.2  Greedy Iterative Search Approach
2.1.3  Biclusters Enumeration Approach
2.2  STOCHASTIC BICLUSTERING ALGORITHMS
2.2.1  Neighbourhood Search Approach
2.2.2  Evolutionary Computation Approach
3  BICLUSTERING GENE EXPRESSION DATA USING CUCKOO SEARCH
3.1  CUCKOO SEARCH
3.2  EXPERIMENT RESULTS ANALYSIS
3.2.1  Experimental Setup
3.2.2  Bicluster extraction for Yeast and Human Lymphoma Dataset
3.2.3  Biological Relevance
3.2.4  Biological Annotation for Yeast cell cycle using GOTermFinder Toolbox
3.3  SUMMARY
REFERENCES

ABSTRACT
Extracting meaningful information from gene expression data poses a great challenge to
computational researchers and biologists alike. Analysis of gene expression data makes it possible
to determine behavioral patterns of genes, such as the nature of their interactions and the
similarity of their behavior. If two different genes show similar expression patterns across the
samples, this suggests a common pattern of regulation or a relationship between their functions.
These patterns have great significance and application in bioinformatics and clinical research,
including drug discovery, treatment planning, accurate diagnosis, prognosis, and protein network
analysis.
Data mining techniques are essential for identifying patterns in gene expression data. The
major techniques applicable to gene expression analysis include clustering, classification, and
association rule mining. Clustering is an important technique for this analysis, but it has some
disadvantages, and biclustering was introduced to overcome them. Clustering is a global model,
whereas biclustering is a local model. Discovering such local expression patterns is essential for
identifying many genetic pathways that are not apparent otherwise. It is therefore necessary to
move beyond the clustering paradigm towards approaches that are capable of discovering local
patterns in gene expression data.
Biclustering is a two-dimensional clustering problem in which genes and samples are grouped
simultaneously. It has great potential for detecting marker genes that are associated with certain
tissues or diseases. However, since the problem is NP-hard, there has been a lot of research in
biclustering involving statistical and graph-theoretic approaches. The proposed Cuckoo Search (CS)
method finds significant biclusters in large expression data. The experimental results are
demonstrated on benchmark datasets. This work also determines the biological relevance of the
biclusters in terms of function, using Gene Ontology.

LIST OF TABLES
TABLE NO.  TITLE
3.1  Parameter and its value
3.2  Experiment results for yeast cell expression data
3.3  Experiment results for human lymphoma expression data
3.4  Significant GO terms for three biclusters on yeast cell data

LIST OF FIGURES
FIGURE NO.  TITLE
1.1  Microarray Analysis
1.2  Gene expression matrix
1.3  Types of microarray clusters
1.4  Representation of vertex and its mapping to Bicluster
3.1  Fitness of the bicluster on Yeast cell-cycle data
3.2  Fitness of the bicluster on Lymphoma data
3.3  Gene expression profile of the largest bicluster on yeast cell-cycle data
3.4  Gene expression profile of the largest bicluster on Lymphoma data
3.5  Proportions of biclusters significantly enriched by GO annotations on yeast cell-cycle
3.6  Gene Ontology biological functions of yeast cell cycle data (with 20 genes)

CHAPTER 1
INTRODUCTION
Bioinformatics is an interdisciplinary subject involving fields as diverse as Biology,
Statistics, Computer Science, Mathematics, Physics and Information Technology. It deals with
different kinds of biological data, and microarray gene expression data is one of them. The
dimension and complexity of raw gene expression data create challenging data analysis and data
management problems. The fundamental goal of microarray gene expression data analysis is to
identify the behavioral patterns of genes. This chapter gives an overview of microarray technology
and biclustering, formulates the problem, and discusses the need for biological validation.
1.1 MICROARRAY TECHNOLOGY
Cells are the basic building blocks of every organism. There is a central core in the cell
called the nucleus. Inside the nucleus there is an important molecule known as Deoxyribonucleic
Acid (DNA). All living organisms contain DNA. A gene is a segment of DNA that contains the
formula for the chemical composition of one particular protein. Gene expression is the process of
transcribing a gene's DNA sequence into Messenger Ribonucleic Acid (mRNA) sequences, which are
later translated into proteins. Several microarray technologies have been developed to study
gene expression regulation. One of the most popular is based on oligonucleotide chips. The other
broadly used microarray technology is complementary DNA (cDNA) arrays. DNA microarray technology
is attracting tremendous interest in both the scientific community and industry because of its
ability to measure simultaneously the activities and interactions of thousands of genes
(Lockhart & Winzeler 2000). A summary of the whole microarray analysis process can be seen in
Figure 1.1.

Figure 1.1 Microarray Analysis
Molecular Biology research evolves through the development of the technologies used to carry
it out. Since it is not possible to study a large number of genes using traditional methods, DNA
microarrays enable researchers to analyze the expression of many genes quickly and efficiently in
a single reaction. A typical DNA microarray analysis involves a multistep procedure: fabrication
of microarrays by fixing properly designed oligonucleotides representing specific genes;
hybridization of cDNA populations onto the microarray; scanning of hybridization signals and
image analysis; transformation and normalization of data; and analysis of the data to identify
differentially expressed genes as well as sets of genes that are co-regulated.
The gene expression matrix is the processed data obtained after the normalization procedure.
Each row in the matrix corresponds to a particular gene, and each column corresponds either to an
experimental condition or to a specific time point at which the expression of the genes has been
measured. The expression levels of a gene across different experimental conditions are
collectively called the gene expression profile, and the expression levels of all genes under one
experimental condition are collectively called the sample expression profile. An expression
profile (of a gene or a sample) can be thought of as a vector and can be represented in vector
space. For example, the expression profile of a gene can be considered as a vector in
n-dimensional space, where n is the number of conditions, and the expression profile of a sample
can be considered as a vector in m-dimensional space, where m is the number of genes. The gene
expression matrix X with m genes across n conditions is an m x n matrix. Each element x_ij of this
matrix represents the expression level of gene i under condition j and is a real number. Usually,
it is the logarithm of the relative abundance of the mRNA of the gene under the specific
condition. Figure 1.2 shows the gene expression data matrix.
Figure 1.2 Gene expression data matrix © Mark Reimers/Exploratory Analysis
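The matrix view described above can be illustrated with a minimal Python sketch. It is only an
illustration under stated assumptions: the abundance values are made-up toy numbers, the base-2
logarithm is one common choice (the text only says "logarithm"), and the helper names
gene_profile and sample_profile are hypothetical and not taken from the paper.

```python
import math

# Hypothetical toy data: relative mRNA abundance for m = 3 genes (rows)
# under n = 4 conditions (columns). The values are invented for illustration.
relative_abundance = [
    [1.0, 2.0, 4.0, 8.0],
    [8.0, 4.0, 2.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]

# Expression matrix X: x_ij is the logarithm of the relative abundance of
# gene i under condition j. Base 2 is an assumption made for this sketch.
X = [[math.log2(value) for value in row] for row in relative_abundance]


def gene_profile(matrix, i):
    """Row i: expression of gene i across all n conditions (vector in n-dim space)."""
    return list(matrix[i])


def sample_profile(matrix, j):
    """Column j: expression of all m genes under condition j (vector in m-dim space)."""
    return [row[j] for row in matrix]


m, n = len(X), len(X[0])
print("matrix shape:", m, "x", n)
print("gene 0 profile:", gene_profile(X, 0))
print("sample 3 profile:", sample_profile(X, 3))
```

A gene profile here has length n and a sample profile has length m, which matches the
vector-space interpretation given in the text.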
1.2 MICROARRAY DATA CLUSTERING ANALYSIS
Data mining techniques are essential for identifying patterns in gene expression data. The
major data mining approaches that can be applied to the analysis of gene expression data include
association rule mining, classification, and clustering. Cluster analysis is an important
technique for partitioning objects that have many attributes (multi-dimensional data) into
meaningful disjoint sub-groups. The clustering process groups similar objects together into
clusters. The
Excerpt out of 40 pages

Details

Title: A Nature Inspired Algorithm for Biclustering Microarray Data Analysis
College: Bannari Amman Institute of Technology
Grade: 1
Authors: B. Rengeswaran, A.M. Natarajan, K. Premalatha
Year: 2015
Pages: 40
Catalog Number: V300977
ISBN (eBook): 9783668619524
ISBN (Book): 9783668619531
File size: 1237 KB
Language: English
Tags: biclustering, expression, data, heuristic, approach
Quote paper
Quote paper
B. Rengeswaran (Author), A.M. Natarajan (Author), K. Premalatha (Author), 2015, A Nature Inspired Algorithm for Biclustering Microarray Data Analysis, Munich, GRIN Verlag, https://www.grin.com/document/300977
