Since 2013, generative neural networks have been used for tasks such as generating audio or image data. However, no publication has yet used their capabilities for de novo ligand and/or protein design. This work introduces a generative neural network, the PG-VUGAN (progressively growing variational U-NET generative adversarial network), which is intended to fill this knowledge gap.
The PG-VUGAN consumes a rich molecular image (RMI) of either the ligand or the pocket and generates its complementary counterpart. This paper demonstrates the approach in practice for de novo ligand design. The RMI is a new image-based format for molecular structures, designed specifically to be processed efficiently by convolutional neural networks. Its suitability is demonstrated by developing a state-of-the-art binding-affinity regressor. In summary, this work takes a first step towards artificially generated ligands and proteins via generative neural networks.
Protein-ligand interactions control cellular processes and are therefore essential for all living beings. Hence, generating complementary ligands for a protein structure or, vice versa, predicting complementary protein structures for ligands is a desirable goal of science. Possible use cases for de novo ligand and protein design can be found in all fields of biotechnology and range from drug discovery and personalized medicine to the creation of artificial enzymes.
Designing these molecules from scratch is challenging, and the technology for de novo design is still in its early stages. The reason is that existing tools rely on expert assumptions and on mathematical approximations that can simulate the real physical nature of these molecules only in part. Artificial neural networks promise to overcome these limitations.
Table of Contents
- Introduction
- Overview
- Basics
- Biological background and terms
- Proteins
- The key lock principle
- Drugs and receptors
- Intermolecular interactions
- Enzyme, theozyme and theosite
- Data formats for molecular structures
- 1D – Arrays
- Atom list / rich atom list
- SMILES
- Descriptors
- Fingerprints
- 2D-matrix
- Adjacency matrix
- Coulomb matrix
- Contact map and coevolutionary analysis
- Images of a visualization tool
- 3D-Matrix
- Voxel representations
- Rich voxel
- Wavelet
- GRID maps (3D - pharmacophore)
- Drug and protein design
- Drug design
- Structure based drug design
- Docking and virtual high throughput screening
- Scoring functions
- Assisted model building with energy refinement
- Incremental construction docking tools and FlexX
- Evolutionary algorithms and Autodock 4.2
- Shape-based docking
- Ligand based drug design
- Library search
- Quantitative-structure-activity relationships models
- De novo drug design via molecular modeling
- Incremental construction algorithms
- LUDI
- FlexNovo
- Evolutionary algorithms
- Protein design
- Directed evolution
- Rational design
- De novo protein design
- Rosetta Commons
- Rosetta (ab initio) structure prediction
- Rosetta Match
- RosettaDesign
- ScaffoldSelection
- Deep learning
- Recent architectural enhancements of deep models and new architectures
- Deep residual learning
- Inception modules & InceptResNet v2
- Attention modules
- Filter-generating network
- Squeeze-and-Excitation block
- Spatial transformer
- Residual attention module
- 3D convolutional neural networks
- Multi-view networks
- Graph convolutional networks
- Tree-LSTM
- LSTM - cell
- N-ary Tree-LSTM cell
- Generative neural networks
- Generative adversarial network
- Autoencoders
- Variational autoencoder
- Adversarial autoencoder
- VAEGAN
- Recent proceedings in generative neural networks.
- Common issues of training GANs and how to deal with them
- Mini batch discrimination
- Feature matching
- Historical averaging
- Noisy labels
- Semi-Supervised GAN
- Least squares GAN
- Wasserstein GAN
- WGAN with gradient penalty
- Recently as useful proven architectures
- U-NET
- Variational U-NET
- Patch networks
- Discovery GAN
- BicycleGAN
- StackGAN
- Progressively growing GAN
- Deep learning for drug discovery
- De novo drug design via deep learning
- SMILES variational autoencoder
- Wavelet autoencoder
- druGAN
- Feature regressors for molecular properties
- KDEEP: a 3D convolutional network
- SchNET: a graph convolutional network
- Dealing with dataset limitations
- Data augmentation
- Transfer learning
- Multitask learning
- De novo 3D ligand and protein design via deep learning
- Overview: applied de novo design
- Data preparation
- Datasets
- PDBbind
- QM9
- ZINC
- CelebA
- Test complexes: Ibuprofen, HIV-Integrase and 3-dehydroquinate dehydratase
- Pose normalization
- Explorative phase
- RAL based approaches
- SchNET variations
- VAE and double VAE(GAN) for protein-complexes with SchNET
- Ligand autoencoding with RAL based VAEs
- Strategies to tackle the sparsity problem
- Conclusion RAL based approach
- Rich molecular image based approaches
- Rich molecular image
- VAEGANs for ligands represented as rich molecular image
- PDBbind analysis and dataset reduction
- RMI based VAEGAN on the filtered dataset
- Conclusions for RMI based VAE and VAEGAN approaches
- VUNET and the VUGAN for de novo design
- VUNET
- VUGAN
- VUGAN trained on the RV format
- How to use the VUNET and VUGAN for de novo protein design
- Additional use-cases
- Summary and conclusion of the explorative phase
- Refinement phase
- PG-VUGAN for de novo design
- Loss functions, penalties, and output variations
- Improving the rich molecular image format
- Rich molecular image with atomic radius
- Min-max scaling
- RMI for complexes
- Ligand vs. complex PCA based pose normalization
- Comparing representations for complexes
- Designing a binding affinity regressor
- Convolutional architectures in comparison
- Multi-view networks
- Compensating the limitations of the PDBbind dataset
- Multi-task learning
- Network-based transfer learning
- Data augmentation
- Abridgement of the engagements towards increased binding affinity regression performance
- MV-DilSEption: a model with beneficial contributions
- Rethinking the PG-VUGAN method
- Architecture
- Reducing the output channels of the rich molecular image
- Transfer learning
- Image resizing
- Loss contributions
- Growing procedure
- Initiation criterion
- Layer fade-in
- Stabilizing the adversarial training
- Least-squares GAN
- Semi-supervised learning
- Mini batch discrimination
- Feature matching
- Activity penalty for the discriminators feature matching layer
- Using a latent feature regressor
- Training balancing
- Data balancing
- Loss normalization
- Generator / discriminator training ratio balancing
- The approach as pseudo code
- Result
- Summary and conclusion
- List of abbreviations
- List of figures
- List of tables
- Bibliography
- Supplementary material
- Quote paper
- Matthias Rieger (Author), 2019, Steps towards de Novo 3D Ligand and Protein Design via Deep Learning, Munich, GRIN Verlag, https://www.grin.com/document/926236