This survey aims to analyze a large number of existing methods to biclustering, and classify them in accordance with the methods used to perform the search.
DNA microarray aims at extracting useful information that can be applied in medical and biological studies. Clustering is one of the most utilized data mining technique to analyze the gene expression data. Generally, there are subsets of genes that behave similarily or under subsets of conditions. Therefore, biclustering is introduced to identify the subgroups of genes and subgroups of conditions by performing simultaneous clustering of both rows and columns of the gene expression matrix, rather than clustering these two dimensions separately.
DNA microarray technologies have made it feasible to monitor transcription levels of tens of thousands of genes in a single expriement. A typical DNA microarray experiment involves a multistep procedure: fabrication of microarrays by fixing properly designed oligonucleotides representing specific genes; hybridization of cDNA populations onto the microarray; scanning hybridization signals and image analysis; transformation and normalization of data; and analyzing data to identify differentially expressed genes as well as sets of genes that are co-regulated.
Table of Contents
1. INTRODUCTION
2. BICLUSTERING METHODS
2.1. Systematic Biclustering Algorithms
2.1.1 Divide and Conquer Approach
2.1.2 Greedy Iterative Search Approach
2.1.3 Biclusters Enumeration Approach
2.2. Stochastic Biclustering Algorithms
2.2.1 Neighbourhood Search Approach
2.2.2 Evolutionary Computation Approach
3. CONCLUSIONS
Objectives and Topics
This paper provides a comprehensive survey of existing biclustering algorithms used for analyzing gene expression data. The primary objective is to categorize and evaluate various computational methods, such as systematic and stochastic search approaches, to help researchers identify appropriate tools for extracting meaningful biological patterns from microarray datasets.
- Foundations of DNA microarray technology and gene expression matrices.
- Taxonomy of biclustering algorithms (systematic vs. stochastic approaches).
- Divide and conquer, greedy, and enumeration-based systematic techniques.
- Neighborhood search and evolutionary computation as stochastic optimization strategies.
- Challenges in the biological validation of biclustering results.
Excerpt from the Book
2.1.2 Greedy Iterative Search Approach
This approach builds a solution in a step-by-step way using a given quality measure. Decisions made at each step are based on information at hand without worrying about the effect these decisions may have in the future. Moreover, once a decision is taken, it becomes permanent and is never reconsidered. By applying this approach to the biclustering problem, at each iteration, submatrices of the data matrix are constructed by adding/removing a row/column to/from the current submatrix that maximizes/minimizes a certain function. This process reiterate until no other row/column can be added/removed to/from any submatrix.
Ben-Dor et al defined a bicluster as an Order-Preserving Sub-Matrix (OPSM) [7]. This way, a sub-matrix is said to be order preserving if there is a permutation of its columns under which the sequence of values in every row is strictly increasing. This algorithm can also be used to discover more than one bicluster in the same dataset, even when they are overlapped. However, in this model concerns only the order of values and thus makes the model quite restrictive.
Summary of Chapters
1. INTRODUCTION: Explains the necessity of microarray analysis and introduces biclustering as an alternative to traditional clustering for identifying localized gene expression patterns.
2. BICLUSTERING METHODS: Provides an extensive classification of biclustering algorithms, divided into systematic search approaches and stochastic metaheuristic methods.
2.1. Systematic Biclustering Algorithms: Details deterministic techniques, including divide and conquer, greedy iterative, and enumeration approaches for solving the biclustering problem.
2.1.1 Divide and Conquer Approach: Describes algorithms like Bimax and Hough Transform-based methods that recursively partition data matrices into coherent sub-problems.
2.1.2 Greedy Iterative Search Approach: Covers methods such as OPSM, ISA, and FLOC, which construct biclusters incrementally through iterative local optimization.
2.1.3 Biclusters Enumeration Approach: Examines graph-theoretic and tree-based methods like SAMBA, MicroCluster, and BiMine that attempt to identify optimal biclusters by searching exhaustive solution spaces.
2.2. Stochastic Biclustering Algorithms: Discusses probabilistic approaches that utilize metaheuristics to approximate optimal biclusters in reasonable time frames.
2.2.1 Neighbourhood Search Approach: Outlines local search methods such as the Cheng & Church algorithm and pattern-driven approaches that improve solutions through iterative changes.
2.2.2 Evolutionary Computation Approach: Analyzes population-based optimization methods like MOEA and hybrid evolutionary algorithms that utilize natural selection principles to extract biclusters.
3. CONCLUSIONS: Reflects on the lack of a universal "best" algorithm and highlights the ongoing challenge of biologically validating biclustering results.
Keywords
Biclustering, microarray, genes, expression data, systematic search, stochastic optimization, metaheuristics, gene regulation, submatrix, iterative search, evolutionary algorithms, data mining, local patterns, transcription modules, biological validation.
Frequently Asked Questions
What is the core focus of this research paper?
The paper is a survey focused on various biclustering algorithms designed to analyze high-dimensional gene expression data from DNA microarrays.
What are the primary thematic areas covered?
The main themes include the classification of biclustering techniques into systematic and stochastic categories, the specific algorithmic implementations for each, and the challenges regarding biological validation.
What is the ultimate goal of the work?
The goal is to provide a structured overview of existing methods to assist researchers in selecting the most suitable algorithm for their specific biological datasets and research questions.
Which scientific methods are primarily discussed?
The paper discusses systematic methods like divide-and-conquer and greedy search, as well as stochastic methods including neighborhood search and evolutionary computation.
What does the main body of the text address?
The main body details specific algorithms (e.g., OPSM, SAMBA, BiMine, FLOC, ISA) and explains their mechanisms for partitioning gene expression matrices to find locally co-regulated gene sets.
Which keywords characterize this paper?
Key terms include Biclustering, microarray, metaheuristics, gene expression, and data mining, reflecting the intersection of bioinformatics and computational optimization.
What is the difference between systematic and stochastic biclustering approaches?
Systematic approaches are typically deterministic and aim to find the optimal solution through structured partitioning or enumeration, whereas stochastic approaches utilize probabilistic metaheuristics to approximate solutions within a reasonable timeframe.
Why is biological validation mentioned as an open issue?
Biological validation is crucial because mathematical optimality does not always translate to biological relevance; the paper notes that there are currently no universal literature guidelines for ensuring that identified biclusters are biologically meaningful.
- Arbeit zitieren
- Rengeswaran Balamurugan (Autor:in), 2017, Biclustering Algorithms for Microarray Data, München, GRIN Verlag, https://www.grin.com/document/376903