
Visualising how correlation networks change over time

Title: Visualising how correlation networks change over time

Scientific Study, 2016, 9 Pages, Grade: 8.5/10

Author: Hendricus Bongers

Computer Science - Bioinformatics

This thesis researches and develops a tool for visualising gene expression data. The report describes and illustrates how correlations between genes change over a time course. The tool can be used widely in biology (e.g. toxicology, studying the cell cycle, studying the progression of diseases), and its 3D environment gives better insight into the gene expression data. The given data are first preprocessed, after which unexpressed genes are filtered out. The remaining genes are used to build a correlation matrix. This matrix is clustered into smaller matrices so that the algorithms can analyse the data more easily; an advantage of clustering is that both inter- and intra-cluster correlations can be visualised. After clustering, two filtering algorithms are applied to each cluster to retrieve the final correlation networks. Clustering and filtering are performed for every time point, and the data between time points are interpolated for a smooth simulation. All clusters are visualised in a 3D environment and made interactive using virtual reality. The goal of this thesis is to develop a tool that gives insight into how correlation networks change over a time course and to answer the following questions: Which simulation techniques work best? Which layout algorithm gives the best visual outcome? Does a 3D environment improve the readability of the data? Which filters are applicable to the data?
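The first steps of this pipeline, filtering out unexpressed genes and then building a correlation matrix for one time point, can be sketched as follows. This is a minimal illustration in Python with NumPy, not the thesis's code. The criterion for an "unexpressed" gene (mean expression below `min_mean`), the edge threshold `min_abs_rho` and the function names are hypothetical. Spearman correlation is computed here as the Pearson correlation of average ranks.

```python
import numpy as np

def average_ranks(row):
    """Rank one gene's values (1 = smallest); tied values share their average rank."""
    order = np.argsort(row, kind="mergesort")
    ranks = np.empty(len(row), dtype=float)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1                                   # extend over a run of ties
        ranks[order[i:j + 1]] = (i + j) / 2 + 1      # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

def correlation_network(expr, min_mean=1.0, min_abs_rho=0.8):
    """expr: (n_genes, n_samples) expression values for ONE time point (hypothetical layout)."""
    expr = np.asarray(expr, dtype=float)
    # 1. drop "unexpressed" genes (hypothetical criterion) and constant genes
    keep = np.flatnonzero((expr.mean(axis=1) >= min_mean) & (expr.std(axis=1) > 0))
    if len(keep) < 2:                                # a correlation needs two genes
        return keep, np.eye(len(keep)), []
    # 2. Spearman correlation = Pearson correlation of the ranks
    rho = np.corrcoef(np.array([average_ranks(expr[g]) for g in keep]))
    # 3. keep only strong correlations as edges, using the original gene indices
    edges = [(int(keep[a]), int(keep[b]), float(rho[a, b]))
             for a in range(len(keep)) for b in range(a + 1, len(keep))
             if abs(rho[a, b]) >= min_abs_rho]
    return keep, rho, edges

expr = np.array([
    [5.0, 6.0, 7.0, 8.0],   # gene 0
    [2.0, 4.0, 6.0, 9.0],   # gene 1: same ordering as gene 0
    [9.0, 7.0, 3.0, 2.0],   # gene 2: reversed ordering
    [0.1, 0.0, 0.2, 0.1],   # gene 3: mean 0.1, treated as unexpressed
])
keep, rho, edges = correlation_network(expr)
```

Running the same function once per time point yields one network per time point. The summary does not name the thesis's two cluster-level filtering algorithms, so the single correlation threshold here only stands in for them.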

Excerpt


Table of Contents

1 Introduction

1.1 Research questions

2 Methods

2.1 Preprocessing and filtering

2.2 Making the correlation network

2.3 Visualising

2.4 Simulating

3 Results

3.1 K-means

3.2 Filtering algorithms

3.3 Size of filtered network VS computing time

3.4 Visualising

4 Conclusion

5 Future Work

Research Objectives and Topics

The primary objective of this thesis is to research and develop a software tool capable of visualizing changes in gene expression correlation networks over a specific time course, utilizing 3D environments and virtual reality to improve data interpretability.

  • Data preprocessing and filtering techniques for gene expression datasets.
  • Application of k-means clustering to handle large-scale genomic data.
  • Implementation of correlation network construction using statistical algorithms.
  • Comparative analysis of layout algorithms and interpolation methods for simulation.
  • Enhancement of visual readability through 3D rendering and virtual reality integration.

Excerpt from the Thesis

2.3 Visualising

Visualising the data is an important aspect of the developed application. The data need to be visualised so that they are easy to read and the user can quickly extract information from the application. The graph layout of the correlation network can be calculated using one of the following methods.

Force directed layout

In a force-directed layout [2, 4], the positions of the nodes are calculated by continuously moving the nodes around as if forces were acting on them. In this graph drawing method, the forces are introduced by treating the edges between the nodes as springs. If two nodes have a high correlation, the spring between them contracts more and the two nodes end up closer to each other. Although laying out a graph can be a difficult problem, it is less of one with a force-directed layout: the layout is a physics simulation, so no special knowledge of graph theory (e.g. planarity) is required.

The forces in this method are mostly modelled as springs, where the springs represent the edges between the nodes. The attractive forces in these springs follow Hooke's law. While the nodes attract each other, there is also a repulsive force, which behaves like the force between charged particles according to Coulomb's law. The nodes keep moving under these attractive and repulsive forces until an equilibrium is reached.
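Written out for node $i$ at position $\mathbf{x}_i$, the two forces described above take the standard spring-electrical form below. The symbols $k$ (spring constant), $L_{ij}$ (rest length) and $c$ (repulsion constant) are illustrative, and the thesis does not state its exact parameters. One way to make correlated nodes sit closer together is a rest length that shrinks as $|\rho_{ij}|$ grows; that choice is an assumption.

```latex
\hat{\mathbf{u}}_{ij} = \frac{\mathbf{x}_i - \mathbf{x}_j}{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert}, \qquad
\mathbf{F}^{\mathrm{spring}}_{ij} = -k\,\bigl(\lVert \mathbf{x}_i - \mathbf{x}_j \rVert - L_{ij}\bigr)\,\hat{\mathbf{u}}_{ij}, \qquad
\mathbf{F}^{\mathrm{rep}}_{ij} = \frac{c}{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}}\,\hat{\mathbf{u}}_{ij}

\mathbf{F}_i = \sum_{(i,j) \in E} \mathbf{F}^{\mathrm{spring}}_{ij} + \sum_{j \neq i} \mathbf{F}^{\mathrm{rep}}_{ij},
\qquad \text{equilibrium: } \mathbf{F}_i = \mathbf{0} \ \text{for every node } i
```

A stretched spring ($\lVert \mathbf{x}_i - \mathbf{x}_j \rVert > L_{ij}$) pulls node $i$ towards node $j$, while the Coulomb term always pushes them apart and weakens with the squared distance.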

An advantage of the force-directed layout is that it is very intuitive. The spring-like movement makes it easy to see what happens, which is highly important to the user experience of the application. Another advantage is interactivity. The algorithm has no problem with an extra node being added: it simply calculates a new layout that includes the new node. The algorithm is also interactive because the drawing process can be broken up into stages, so the user can see what happens at each intermediate stage.
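The two interactive properties above, observable intermediate stages and cheap re-layout after adding a node, fit naturally into a generator that yields the positions after every iteration. The sketch below is a naive 3D spring-electrical layout in Python with NumPy. It assumes Hooke springs on the edges and Coulomb-like repulsion between all node pairs. The parameter names and values (`k`, `c`, `rest`, `step`, `max_move`, `tol`) and the correlation-dependent rest length are hypothetical and not taken from the thesis.

```python
import numpy as np

def force_directed_steps(pos, edges, k=0.1, c=0.05, rest=1.0,
                         step=0.1, max_move=0.2, max_iter=500, tol=1e-4):
    """Yield a copy of the node positions after every iteration.

    pos   -- (n, 3) start positions (the previous layout when warm-starting)
    edges -- list of (i, j, rho) tuples, rho being the correlation of genes i and j
    All numeric parameters are hypothetical tuning values.
    """
    pos = np.array(pos, dtype=float)
    for _ in range(max_iter):
        delta = pos[:, None, :] - pos[None, :, :]          # x_i - x_j, shape (n, n, 3)
        dist = np.maximum(np.linalg.norm(delta, axis=-1), 1e-9)
        np.fill_diagonal(dist, 1.0)                         # no self-interaction
        unit = delta / dist[..., None]                      # unit vectors pointing from j to i
        # Coulomb-like repulsion c / d^2 between every pair of nodes
        rep = c / dist ** 2
        np.fill_diagonal(rep, 0.0)
        force = (rep[..., None] * unit).sum(axis=1)
        # Hooke springs on the edges; stronger correlation -> shorter rest length (assumption)
        for i, j, rho in edges:
            length = rest * (1.1 - abs(rho))
            f = -k * (dist[i, j] - length) * unit[i, j]
            force[i] += f
            force[j] -= f
        # move each node along its net force, capping the displacement per step
        move = step * force
        norm = np.linalg.norm(move, axis=1, keepdims=True)
        move *= np.minimum(1.0, max_move / np.maximum(norm, 1e-12))
        pos += move
        yield pos.copy()
        if np.linalg.norm(force, axis=1).max() < tol:       # (near) equilibrium reached
            break

rng = np.random.default_rng(0)
edges = [(0, 1, 0.9), (1, 2, 0.8), (0, 2, 0.85)]
frames = list(force_directed_steps(rng.normal(size=(3, 3)), edges))
layout = frames[-1]            # every frame could be rendered as an intermediate stage

# Adding a gene: keep the current layout and only place the new node near the centre
edges.append((3, 0, 0.7))
start = np.vstack([layout, layout.mean(axis=0) + rng.normal(scale=0.5, size=3)])
layout = list(force_directed_steps(start, edges))[-1]
```

Warm-starting from the previous positions means existing nodes only move as much as the new node disturbs them, which keeps the animation readable. The all-pairs repulsion costs O(n²) per iteration, which is one reason small clusters matter.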

Summary of Chapters

1 Introduction: Provides an overview of computational biology and defines the specific research goal of visualizing dynamic correlation networks in genomic data.

2 Methods: Details the algorithmic approach, covering data preprocessing, k-means clustering, correlation network construction, visualization techniques, and simulation methods.

3 Results: Presents empirical experiments on the performance of k-means clustering, filtering algorithms, and visualization efficiency regarding computing time and data readability.

4 Conclusion: Summarizes the effectiveness of the developed methods and evaluates the impact of 3D environments and virtual reality on data insights.

5 Future Work: Suggests potential improvements including alternative clustering techniques, advanced graph layout algorithms, and software integration.

Keywords

Computational biology, Genomics, Gene expression, Correlation networks, k-means clustering, Data visualization, Force directed layout, Linear interpolation, Spline interpolation, Virtual reality, Bioinformatics, Data preprocessing, Network simulation, 3D modelling, Statistical modeling.

Frequently Asked Questions

What is the core focus of this research?

The thesis focuses on developing a tool to visualize how gene expression correlation networks change over time, specifically aimed at genomic datasets.

What are the primary themes covered in this work?

The main themes include data preprocessing, clustering algorithms, network construction, visualization techniques (2D vs 3D), and simulation models for temporal data.

What is the main research question?

The research asks how 3D environments and specific layout algorithms can best improve the readability and interpretability of complex, time-varying gene correlation data.

Which scientific methods are utilized?

The work utilizes k-means clustering, Spearman correlation calculations, force-directed graph drawing, and various interpolation techniques like spline and linear interpolation.
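For reference, the standard Spearman rank correlation between two genes measured at $n$ samples is given below. The excerpt does not show how the thesis handles tied values. The closed form is exact only without ties; with ties, the usual approach is the Pearson correlation of average ranks. The small example uses made-up values.

```latex
\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)}, \qquad
d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i)

\text{Example: } x = (1, 2, 3, 4),\; y = (1, 3, 2, 4) \;\Rightarrow\; d = (0, -1, 1, 0),\;
\sum d_i^{2} = 2,\; \rho_s = 1 - \frac{6 \cdot 2}{4 \cdot 15} = 0.8
```

Because it works on ranks, Spearman correlation captures any monotonic relation between two expression profiles, not only linear ones.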

What does the main body discuss?

The main body details the methodology for filtering and processing raw genomic data, the implementation of 3D force-directed visualizations, and results from experiments testing computing efficiency.

Which keywords characterize this work?

Key terms include Genomics, Bioinformatics, k-means clustering, Force directed layout, and Virtual Reality.

Why is the k-means algorithm used for clustering?

It is used because it is fast, easy to implement, and effectively divides large, arbitrary datasets into smaller, more manageable clusters to reduce computation time.
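A minimal k-means (Lloyd's algorithm) shows why the method is fast and simple. Each iteration performs one assignment pass and one averaging pass. The sketch is in Python with NumPy. Using each gene's row of the correlation matrix as its feature vector is an assumption about how the genes could be clustered, and `kmeans` is a hypothetical helper rather than the thesis's implementation.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # k random points as start
    labels = np.zeros(len(X), dtype=int)
    for it in range(n_iter):
        # assignment step: each point goes to its nearest centroid (squared Euclidean)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        new_labels = d2.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break                                  # assignments stable -> converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = X[labels == c]
            if len(members):                       # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels, centroids

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
clusters = [np.flatnonzero(labels == c) for c in range(2)]
# With a gene-by-gene correlation matrix rho, rho[np.ix_(idx, idx)] is the
# intra-cluster block of cluster idx and rho[np.ix_(a, b)] the inter-cluster block.
```

k-means needs the number of clusters `k` up front and depends on the random start, so results can vary between runs unless the seed is fixed.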

How does virtual reality enhance the analysis?

Virtual reality allows the user to navigate and stand within a 3D model of the data, providing a more intuitive sense of scale and structure than standard 2D desktop displays.

Why is the spline interpolation preferred over other simulation methods?

Spline interpolation is chosen because it offers higher precision in calculating intermediate time points compared to linear interpolation, while avoiding the overfitting risks associated with other polynomial methods.
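To illustrate the difference, the sketch below interpolates one value that changes between measured time points, such as a correlation coefficient or a node coordinate, with linear interpolation (`np.interp`) and with a natural cubic spline. The thesis does not state which spline variant it uses, so the natural boundary condition (zero second derivative at both ends) is an assumption. The helper `natural_cubic_spline` and the sample data are hypothetical.

```python
import numpy as np

def natural_cubic_spline(t, y):
    """Return a function evaluating the natural cubic spline through (t, y)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(t)
    h = np.diff(t)
    # Solve for the second derivatives M at the knots; natural ends: M_0 = M_{n-1} = 0
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)

    def evaluate(x):
        x = np.asarray(x, dtype=float)
        i = np.clip(np.searchsorted(t, x, side="right") - 1, 0, n - 2)  # interval index
        hi, a, b = h[i], t[i + 1] - x, x - t[i]
        return ((M[i] * a ** 3 + M[i + 1] * b ** 3) / (6 * hi)
                + (y[i] / hi - M[i] * hi / 6) * a
                + (y[i + 1] / hi - M[i + 1] * hi / 6) * b)
    return evaluate

times = [0, 1, 2, 3]                 # measured time points (hypothetical)
values = [0.0, 1.0, 0.0, 1.0]        # e.g. one node coordinate at those points (hypothetical)
frames = np.linspace(0, 3, 31)       # animation frames between the time points
smooth = natural_cubic_spline(times, values)(frames)
linear = np.interp(frames, times, values)
```

Both curves pass through the measured points. The linear version changes direction abruptly at each time point, whereas the cubic spline has a continuous first and second derivative, which gives smoother motion in the simulation without fitting a single high-degree polynomial through all points.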

How does the size of the cluster affect system performance?

The experiments show that larger clusters significantly increase computation time, making data filtering essential to maintain a feasible performance for the interactive visualization tool.
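A short count shows why cluster size dominates computation time. The number of gene pairs, and therefore of correlations to compute and (in a naive implementation) of repulsive forces per layout iteration, grows quadratically with the number of genes $n$. The figures below are an illustrative calculation, not measurements from the thesis. They assume $k$ equally sized clusters and count only intra-cluster pairs.

```latex
P(n) = \frac{n(n-1)}{2}, \qquad
P_{\text{clustered}} = k \cdot \frac{\frac{n}{k}\left(\frac{n}{k} - 1\right)}{2} \approx \frac{P(n)}{k}

n = 10\,000:\quad P(n) = 49\,995\,000; \qquad
k = 10:\quad P_{\text{clustered}} = 10 \cdot 499\,500 = 4\,995\,000
```

Splitting the genes into ten clusters therefore cuts the number of pairs per time point by roughly a factor of ten, and doubling a cluster's size roughly quadruples its cost.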


Details

Title
Visualising how correlation networks change over time
College
Maastricht University (Department of Data Science and Knowledge Engineering)
Grade
8.5/10
Author
Hendricus Bongers (Author)
Publication Year
2016
Pages
9
Catalog Number
V370173
ISBN (eBook)
9783668483972
Language
English
Tags
Bioinformatics visualising virtual reality Machine Learning Data Science
Product Safety
GRIN Publishing GmbH
Quote paper
Hendricus Bongers (Author), 2016, Visualising how correlation networks change over time, Munich, GRIN Verlag, https://www.grin.com/document/370173