Novel structural element bridges de novo protein design challenge of functional protein-ligand interactions

Essay, 2021

11 Pages, Grade: 1,0




2.1 VdM generation leads to cluster formation
2.2 Cluster scores categorize vdMs
2.3 Rosetta modeling utilizes Monte Carlo sampling

3.1 Streptavidin-biotin complex demonstrates C scores and vdM generation
3.2 De novo design for apixaban challenges vdM method
3.3 COMBS algorithm gathers binding site specifics
3.4 Ligand-superimposed vdMs define the ligand’s binding pose
3.5 Rosetta designs protein backbone and performs ab initio folding
3.6 Mutant proteins reaffirm protein design






De novo protein design aims to create novel protein folds from scratch, without sequence homology to any known protein. Such proteins can introduce new ligand-protein interactions. Targeting small, polar, structurally complex ligands has remained a great challenge for protein engineers. Earlier methods relied on ligand-appended rotamers and often failed because of the lever-arm effect: minor uncertainties in the positions of the ligand and the protein backbone snowballed.

A van der Mer (vdM) is a structural unit consisting of an amino acid and an interacting chemical group, so that the coordinates of both are linked. This element allows a precise design that is directed towards the functional groups of the ligand. The Convergent Motifs for Binding Sites (COMBS) algorithm selects vdMs with favorable interactions. The Rosetta program is used for flexible backbone design and ab initio folding predictions. Six proteins were designed with this protocol to bind apixaban, and two of them bound the drug effectively. The method can potentially be applied to design further ligand-binding proteins and protein-protein interactions. For instance, these findings could drive forward the development of protein-based nanomaterials. Consequently, vdMs provide an innovative approach to de novo protein design.


Understanding human biology is directly related to unraveling the interactions of proteins that drive the pathways of the body. “Form follows function” is a key concept that correlates protein structure with its purpose. Accordingly, novel protein conformations must be designed to introduce new functionality.

Proteins self-assemble based on their primary sequence to form local secondary structures that interact as one tertiary unit. This final native conformation is thermodynamically the most stable and thus minimizes Gibbs free energy (Anfinsen, 1973). In energy landscape theory, the conformational energy of the protein decreases along a folding funnel with many local minima towards a global minimum, the folded state (Saven, 2014). Insights into protein assembly help to avoid molten globule states, in which structures do not achieve their native fold (Jensen, 2009).

The goal of de novo protein design is to discover one of multiple sequences that will lead to the desired 3D fold. The burial of amino acid side chains acts as the driving force in the folding process. However, the orientation of side chains is limited to a set of rotamers due to torsional restrictions (Korendovych and DeGrado, 2020). Structural databases like the Protein Data Bank (PDB) store information on possible rotamers. De novo protein design exploits this database to feed side chain repacking algorithms. It is this computational protein design that enables template-free design from scratch.

Previously, new protein functions were assigned to a structure only after it had been built. Now, simultaneous design of fold and function is possible (Peacock, 2020). To design a ligand-binding protein, a topology is first selected based on stability and the size of the ligand. Options include curved β-sheets (Marcos et al., 2017) as well as helical bundles, in which small substrates can bind internally (Korendovych and DeGrado, 2020).

Next, models are assembled from varying backbone fragments. Placement algorithms append the target ligand relative to a discrete set of rotamers. Potential energy functions based on van der Waals and electrostatic interactions, hydrogen bonds and solvation are constructed to rank the thermodynamic stability of the models (Lassila et al., 2006). Protein folds with the lowest energies are selected for further analysis. Nevertheless, this approach often results in idealized interaction geometries. As a result, the proteins are often unstable once synthesized.

DeGrado and Polizzi have created a new small structural unit, the van der Mer (vdM), to improve the accuracy of de novo protein design predictions. The name is derived from their analogy to rotamers and van der Waals forces that attract molecules (Extance, 2020). A vdM is a fragment that consists of a rotamer of an amino acid side chain and its backbone as well as the interacting chemical group of the ligand. VdMs link backbone coordinates with statistically preferred chemical group locations to construct realistic structures.

For the first time, this technique enables the specific targeting of complex, highly polar ligands. DeGrado and Polizzi validate their hypothesis by applying their findings to the blood-thinning drug apixaban. They build a helical bundle de novo to bind excess apixaban. In medicine, other potential applications include the design of scaffolds to bind immunogens for vaccines (Kuhlman and Bradley, 2019). The reliable construction of self-assembling protein cages could further advance the development of protocells and nanomaterials (Fletcher et al., 2013). Additionally, the findings help build new knowledge on the physical and structural constraints of protein folds.


2.1 VdM generation leads to cluster formation

VdMs are derived from the interaction of protein backbone fragments with the chemical groups (CGs) of the ligand. First, the ligand conformation is determined. Then the target CGs of the ligand are chosen. For each amino acid, a list of all side chain interactions with the target CG is generated from a screen of natural proteins in the Protein Data Bank (PDB). To narrow down the number of possible contacts, DeGrado and Polizzi considered hydrogen bonds only. The interacting amino acids and CGs are grouped by residue type, e.g. Asp/CONH2.

They performed two different superpositions of all vdMs. First, representative vdMs are generated by alignment of main chain atoms only. The process is depicted in Figure 1 (Polizzi and DeGrado, 2020). These representative vdMs can be projected onto the protein backbone in de novo protein design. To score the vdMs, they are superimposed a second time by main chain atoms and CG coordinates. Broader geometric clusters are assigned based on a 0.5 Å root mean square deviation (RMSD) cutoff. The researchers used these separate clusters to calculate C scores for representative vdMs. Side chains are not considered in this clustering.
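The grouping by an RMSD cutoff can be illustrated with a minimal sketch. It assumes that the coordinates are already superimposed and uses a simple greedy assignment to the first matching cluster; the function names and the toy coordinates are hypothetical, and the clustering procedure actually used by the authors may differ.

```python
import math

def rmsd(a, b):
    """Root mean square deviation between two equally sized lists of (x, y, z) points."""
    if len(a) != len(b):
        raise ValueError("coordinate sets must have the same length")
    squared = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(squared / len(a))

def greedy_cluster(items, cutoff=0.5):
    """Put each item into the first cluster whose founding member is within `cutoff` Å."""
    clusters = []
    for coords in items:
        for cluster in clusters:
            if rmsd(coords, cluster[0]) <= cutoff:
                cluster.append(coords)
                break
        else:
            # No existing cluster is close enough, so this item founds a new one
            clusters.append([coords])
    return clusters

# Hypothetical, already superimposed coordinates (Å) of two atoms per vdM
vdms = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],  # 0.1 Å RMSD from the first vdM
    [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],  # 3.0 Å RMSD from the first vdM
]
clusters = greedy_cluster(vdms, cutoff=0.5)
print([len(c) for c in clusters])  # [2, 1]
```

With the 0.5 Å cutoff, the first two toy vdMs fall into one cluster and the third forms its own. A tighter cutoff such as 0.1 Å would split the data into finer sub-clusters.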

[Figure not included in this excerpt]

Figure 1 – Generation of representative vdMs (Polizzi and DeGrado, 2020). A) All vdM cluster members are superimposed by their main chain atoms only. The amino acid Asp is depicted in green and the chemical group CONH2 in cyan, oxygen (red), nitrogen (blue) and hydrogen (white). Then vdMs are split into separate sub-clusters according to coordinates of side chains and chemical groups with 0.1 Å RMSD. The centroids of these representative vdMs are shown in B.

2.2 Cluster scores categorize vdMs

To rank the generated vdMs, DeGrado and Polizzi introduced the log-odds cluster score C, defined in Eq. (1). C compares the number of members in a cluster with the average number of members per cluster. Thus, it is a measure of the prevalence of the cluster in the PDB. Here, clusters aligned by main chain atoms and CGs are considered. Because those clusters are not defined by the position of the side chain, various rotamers belong to the same cluster and share a C value. In this way, the scientists predict preferred placements of the backbone relative to the CGs.

[Eq. (1) not included in this excerpt]
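From the verbal definition in this section, Eq. (1) plausibly has the following form. This is a reconstruction, not a copy of the original equation: the symbols N_cluster (number of vdMs in the cluster) and ⟨N⟩ (average number of vdMs per cluster) are notation chosen here, and the natural logarithm is an assumption.

```latex
C = \ln\!\left(\frac{N_{\text{cluster}}}{\langle N \rangle}\right) \tag{1}
```

Under this form, C > 0 marks a cluster that is more populated than average, C = 0 an average cluster, and C < 0 a rare geometry.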

2.3 Rosetta modeling utilizes Monte Carlo sampling

Rosetta is a software suite that predicts protein structures by searching for the lowest-energy conformation (Kaufmann et al., 2010). It simulates the folding process ab initio, that is, based on the sequence alone. Monte Carlo sampling performs random folding moves on the structure and evaluates each move by its change in energy. The algorithm frequently rejects unfavorable steps and replaces fragments. The process continues until a minimum is reached. After several repeats, it likely detects the global minimum, i.e. the native conformation (Kuhlman and Bradley, 2019).
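The accept/reject logic of Monte Carlo sampling can be sketched on a toy problem. The example below is not Rosetta code: it applies the Metropolis criterion to a hypothetical one-dimensional energy function with several local minima, and all names and parameter values are illustrative assumptions.

```python
import math
import random

def energy(x):
    """Toy one-dimensional energy landscape with several local minima (hypothetical)."""
    return 0.1 * x * x + math.sin(3.0 * x)

def metropolis(x0, steps=5000, step_size=0.5, kT=0.5, seed=0):
    """Random walk that always accepts downhill moves and sometimes accepts uphill ones."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)  # random trial move
        e_new = energy(x_new)
        # Metropolis criterion: uphill moves pass with Boltzmann probability
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = metropolis(x0=4.0)
print(f"lowest energy found: {best_e:.3f} at x = {best_x:.3f}")
```

Occasionally accepting uphill moves lets the walk escape local minima, which is why repeated runs have a better chance of locating the global minimum than a purely greedy descent.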


3.1 Streptavidin-biotin complex demonstrates C scores and vdM generation

Natural proteins presumably possess binding sites that are optimized to bind ligands tightly. In an experiment, the methods of vdM generation and C scoring were applied to reconstruct the binding site of the natural protein streptavidin, which interacts with biotin. First, vdMs were constructed based on the CGs of biotin. Here, backbone amide nitrogen, carbonyl and side chain carboxylate groups were targeted. For the backbone design, the native streptavidin backbone was used. After sampling the representative vdMs onto the backbone, those with maximal C scores were selected. Every generated model showed side chain interactions with C > 2. Additionally, several vdMs often bound the CGs collectively. Therefore, the scientists tried to generate models that involve cooperative binding of CGs in the following experiments.
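How C scores single out enriched clusters can be shown with a small numerical sketch. It assumes that C is the natural logarithm of the ratio described in Section 2.2; the cluster names and member counts are invented for illustration and are not data from the study.

```python
import math

def cluster_scores(cluster_sizes):
    """C score per cluster: log of the cluster size over the mean cluster size (assumed form)."""
    mean_size = sum(cluster_sizes.values()) / len(cluster_sizes)
    return {name: math.log(n / mean_size) for name, n in cluster_sizes.items()}

# Hypothetical member counts for four clusters of one vdM type
sizes = {"cluster_a": 400, "cluster_b": 50, "cluster_c": 30, "cluster_d": 20}

scores = cluster_scores(sizes)
best = max(scores, key=scores.get)                      # most prevalent geometry
enriched = [name for name, c in scores.items() if c > 0]

for name, c in scores.items():
    print(f"{name}: C = {c:+.2f}")
print("selected:", best)
```

With a mean of 125 members, only cluster_a is more populated than average (C = ln 3.2 ≈ 1.16), so it is the one a maximal-C selection would pick.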

3.2 De novo design for apixaban challenges vdM method

Next, DeGrado and Polizzi applied vdMs to de novo protein design. These experiments tested whether structure and function can be designed simultaneously with their method. The target was a small-molecule drug called apixaban. By binding to the protein factor Xa, apixaban inhibits it and prevents blood clotting. The tertiary structure of the novel protein needed to differ from that of factor Xa. Also, the fold had to bind the polar CGs of apixaban cooperatively. A de novo helical bundle was selected as the basis. Usually, this structure binds only metal ions and metalloporphyrins through coordinate bonding. However, modification of the tubular shape could result in high thermodynamic stability.

3.3 COMBS algorithm gathers binding site specifics

For the design process, DeGrado and Polizzi developed the search algorithm “Convergent Motifs for Binding Sites” (COMBS). The first steps aimed to generate the specific information necessary to design the binding site of a novel protein. First, CGs of the ligand were chosen as targets. The researchers focused exclusively on CGs that are polar, form hydrogen bonds and also occur as fragments of amino acid side chains or main chains. In the case of apixaban, they chose carboxamide and carbonyl groups. After the ligand structure was determined, representative vdMs were generated. Subsequently, the researchers selected a designable protein fold. For apixaban, that was a four-helix bundle. In the resulting ensemble of backbones, COMBS identified locations on the backbone that enable collective binding of the CGs.

To enumerate all possible positions, every single representative vdM was loaded onto these backbone positions concurrently. Based on the van der Waals radii of surrounding backbone atoms, vdMs that clashed with the backbone were removed. The coordinates of the remaining vdMs were stored in a nearest neighbor lookup table. When queried with the position of a CG, this table returned the complete list of vdMs that were able to place the CG at this position.
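The clash filter and the lookup table can be sketched as follows. This is a simplified illustration, not the COMBS implementation: it uses a single distance threshold instead of per-atom van der Waals radii and a grid-based spatial hash as the lookup structure, and every name, coordinate and threshold is a hypothetical choice.

```python
from collections import defaultdict
import math

def clashes(vdm, backbone_atoms, min_dist=3.0):
    """True if any vdM atom comes closer than `min_dist` Å to a backbone atom."""
    return any(math.dist(a, b) < min_dist for a in vdm["atoms"] for b in backbone_atoms)

def cell_of(point, cell_size):
    """Grid cell index of a point."""
    return tuple(math.floor(c / cell_size) for c in point)

def build_lookup(vdms, cell_size=1.0):
    """Map each grid cell to the vdMs whose chemical group (CG) lies in it."""
    table = defaultdict(list)
    for vdm in vdms:
        table[cell_of(vdm["cg"], cell_size)].append(vdm)
    return table

def query(table, point, radius=0.5, cell_size=1.0):
    """Ids of all stored vdMs whose CG lies within `radius` Å of `point`."""
    cx, cy, cz = cell_of(point, cell_size)
    reach = math.ceil(radius / cell_size)
    hits = []
    for dx in range(-reach, reach + 1):
        for dy in range(-reach, reach + 1):
            for dz in range(-reach, reach + 1):
                for vdm in table.get((cx + dx, cy + dy, cz + dz), ()):
                    if math.dist(vdm["cg"], point) <= radius:
                        hits.append(vdm["id"])
    return sorted(hits)

# Hypothetical backbone atom and vdMs (coordinates in Å)
backbone = [(0.0, 0.0, 0.0)]
vdms = [
    {"id": "Asp_1", "atoms": [(4.0, 0.0, 0.0)], "cg": (6.0, 0.0, 0.0)},
    {"id": "Ser_7", "atoms": [(1.0, 0.0, 0.0)], "cg": (6.2, 0.0, 0.0)},  # clashes
    {"id": "His_3", "atoms": [(0.0, 5.0, 0.0)], "cg": (6.3, 0.1, 0.0)},
    {"id": "Gln_2", "atoms": [(0.0, -5.0, 0.0)], "cg": (0.0, -8.0, 0.0)},
]

kept = [v for v in vdms if not clashes(v, backbone)]
table = build_lookup(kept)
print(query(table, (6.1, 0.0, 0.0)))  # ['Asp_1', 'His_3']
```

Ser_7 is discarded by the clash filter before the table is built, and Gln_2 is stored but lies far from the query point, so only two vdMs are returned for this CG position.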

3.4 Ligand-superimposed vdMs define the ligand’s binding pose

Former techniques of placing the ligand in the designed binding site suffered from the lever-arm effect. The ligand was appended to one rotamer in the binding site. Additional rotamers that would position the entire ligand in this defined region were selected for the binding site. However, if the interaction of ligand and protein varied slightly, the location of remote parts of the ligand would shift drastically. Relying on vdMs reduces the lever-arm effect because rotamers are chosen merely based on the location of the ligand’s CGs. Only the first positioning of the ligand is vulnerable to the deviation.
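The size of the lever-arm effect can be estimated with simple geometry. The sketch below assumes a rigid ligand that pivots about its anchoring contact and computes the chord length swept by an atom at a given distance; the 5° angular error and the distances are illustrative values, not measurements from the study.

```python
import math

def lever_arm_displacement(distance, angle_error_deg):
    """Shift (chord length) of a point at `distance` from the pivot for a given angular error."""
    return 2.0 * distance * math.sin(math.radians(angle_error_deg) / 2.0)

# A 5 degree error at the anchoring rotamer, evaluated at increasing distances
for r in (2.0, 5.0, 10.0):
    shift = lever_arm_displacement(r, 5.0)
    print(f"{r:4.1f} Å from the anchor: {shift:.2f} Å shift")
# Shifts: 0.17 Å, 0.44 Å and 0.87 Å
```

The displacement grows linearly with the distance from the anchor, so an error that is negligible next to the anchoring rotamer approaches a full ångström at the far end of a ligand of this size.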

The initial placement of the ligand is guided by a set of ligand-superimposed vdMs in which the CGs are aligned. vdMs whose superposition causes the ligand to clash with the vdM itself are removed. Afterwards, the remaining ligand-superimposed vdMs are loaded onto the backbone as described in 3.3. Any vdMs that do not place at least 60 % of the ligand in the protein’s interior are removed, because the burial of the polar groups would otherwise not be sufficient for binding apixaban. Furthermore, all vdMs whose ligands overlap with the backbone are deleted. All ligand-superimposed vdMs that are left over are considered as the first contact in a potential binding site.

This is where DeGrado and Polizzi refer back to the nearest neighbor lookup table (see 3.3). For each individual ligand-superimposed vdM, the coordinates of the remaining CGs are used as queries. Out of all the possible vdMs whose CGs overlap with those of the ligand, the most satisfactory positions are selected. The selection is based not only on maximizing the sum of C scores but also on the best ligand burial.
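The filtering and ranking steps of this section can be summarized in a short sketch. It assumes that clash flags and buried fractions have already been computed for each candidate, and it combines the two selection criteria by sorting on the summed C score first and on burial second. That ordering, like all names and numbers below, is an assumption made for illustration.

```python
def passes_filters(pose, min_buried=0.60):
    """Apply the three removal rules to one ligand-superimposed vdM."""
    if pose["ligand_clashes_with_vdm"] or pose["ligand_overlaps_backbone"]:
        return False
    return pose["buried_fraction"] >= min_buried

def rank_binding_sites(candidates):
    """Order candidates by summed C score, then by ligand burial (assumed tie-break)."""
    return sorted(
        candidates,
        key=lambda s: (sum(s["c_scores"]), s["buried_fraction"]),
        reverse=True,
    )

# Hypothetical candidate binding sites
poses = [
    {"id": "site_1", "buried_fraction": 0.82, "ligand_clashes_with_vdm": False,
     "ligand_overlaps_backbone": False, "c_scores": [2.1, 0.4, 1.0]},
    {"id": "site_2", "buried_fraction": 0.55, "ligand_clashes_with_vdm": False,
     "ligand_overlaps_backbone": False, "c_scores": [2.5, 1.5]},  # too exposed
    {"id": "site_3", "buried_fraction": 0.90, "ligand_clashes_with_vdm": False,
     "ligand_overlaps_backbone": True, "c_scores": [3.0]},        # overlaps backbone
    {"id": "site_4", "buried_fraction": 0.70, "ligand_clashes_with_vdm": False,
     "ligand_overlaps_backbone": False, "c_scores": [1.2, 0.9]},
]

survivors = [p for p in poses if passes_filters(p)]
ranked_ids = [p["id"] for p in rank_binding_sites(survivors)]
print(ranked_ids)  # ['site_1', 'site_4']
```

Note that site_2 has the highest summed C score of all candidates but is removed beforehand because less than 60 % of the ligand is buried, which mirrors the order of operations described above.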

3.5 Rosetta designs protein backbone and performs ab initio folding

For the design of the backbone, DeGrado and Polizzi ran the Rosetta algorithm. Interhelical loops are inserted during the design process. Meanwhile, all rotamers of representative vdMs that interact with the ligand are frozen in their location. The resulting structures are tested for explicit folding funnels with a single global minimum. This helps to ensure that the protein will adopt that specific fold only. Accordingly, the Monte Carlo protocol is applied to perform ab initio folding. Negative design of the surface residues is used to modify the Rosetta files. Here, a pattern of negative and positive charges is built to stabilize the structure.

Following this design protocol, DeGrado and Polizzi designed six potential proteins of varying characteristics de novo. However, one was predicted not to fold at all. In three others, they anticipated that the binding site would collapse and not provide enough room to bind apixaban. In the last two designs, the binding site was predicted to remain accessible. They called these proteins apixaban-binding helical bundle (ABLE) and longer apixaban-binding helical bundle (LABLE). Once the proteins were synthesized, the prediction was validated: only ABLE and LABLE bound apixaban tightly.

LABLE is 165 residues and ABLE is 125 residues long. Although the designs share roughly 22 % of their sequence, apixaban is oriented in the same way. They share main interactions with the ligand like a His/C=O vdM with C = 2.1. For further analysis the scientists tried to crystallize both proteins in a sparse matrix screen. Because LABLE failed to crystallize, they focused on ABLE.

Comparing the crystal structure of ABLE with the design model revealed that the structures were in close agreement. Both the superposition by main chain (Cα RMSD 0.7 Å) and the rotamers of core side chains agreed. Drug-free and drug-bound rotamers were almost identical. Additionally, ABLE is monomeric in solution and melts at temperatures greater than 95 °C. It is not only highly robust against heat but also binds apixaban 20-fold more tightly than it binds rivaroxaban, a comparable factor Xa inhibitor.

3.6 Mutant proteins reaffirm protein design

A comparison of the drug-free and drug-bound structures of ABLE showed that the protein’s entropy changed. Before the binding event, several residues that interact with apixaban displayed two alternate rotamers. Once the ligand bound, one rotamer was fixed, which reduced entropy. Nonetheless, the placement of the CG relative to the main chain was not altered. This was the case for His/C=O vdMs such as His[49], for example.

In the following experiments, DeGrado and Polizzi investigated whether structural consequences would arise when substituting three residues responsible for the main interactions with apixaban: His[49], Gln[14] and Thr[112]. In the H49A mutant protein, they substituted His[49] with Ala. Unliganded ABLE and H49A were similar (Cα RMSD 1.2 Å). Yet, without the side chain of His[49], the core was packed more loosely. They observed only the preferred rotamers in H49A. The substitution of Gln[14] with Ala yielded analogous results, with the mutant protein’s affinity for apixaban reduced approximately 3-fold. When Thr[112] was swapped for Ala, the affinity was not reduced, because the Thr[112] side chain did not bind to the carbonyl group of apixaban. Rather, it formed an interhelical H-bond to a backbone carbonyl group.
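A fold change in affinity can be translated into a free energy difference with the standard relation ΔΔG = RT ln(fold change). The sketch below applies it to the approximately 3-fold loss reported for the Gln[14] substitution and to the 20-fold difference between apixaban and rivaroxaban; the temperature of 298.15 K is an assumption, since the measurement conditions are not stated here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_fold_change(fold, temperature_k=298.15):
    """Free energy difference (kcal/mol) that corresponds to a fold change in affinity."""
    return R_KCAL * temperature_k * math.log(fold)

for label, fold in (("3-fold", 3.0), ("20-fold", 20.0)):
    print(f"{label}: {ddg_from_fold_change(fold):.2f} kcal/mol")
# 3-fold: 0.65 kcal/mol
# 20-fold: 1.77 kcal/mol
```

On this scale a 3-fold loss corresponds to well under 1 kcal/mol, which is less than the energy typically attributed to a single hydrogen bond, so the effect of the substitution on binding is modest.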

The researchers experimented with two additional residues that were inserted during the flexible backbone design, namely Tyr[6] and Tyr[46]. Tyr[6] was part of the vdM database with C = 0.4. Therefore, it was predicted to form a hydrogen bond with carboxamide. Analysis of ABLE confirmed the interaction. Substituting this side chain with Phe or Ala destabilized the protein the most out of all substitutions tested. In contrast, substituting Tyr[46] resulted in less severe destabilization. Its hydrogen bond was formed via an unanticipated water molecule. Moreover, Tyr[46]/C=O was not part of the vdM database.


DeGrado and Polizzi achieved the design of the ligand-binding protein ABLE de novo. Designing novel, functional protein units has failed in the past because models for protein-ligand interactions were insufficient (Tinberg and Khare, 2017). Through their work, the authors created a universal design protocol for de novo protein design utilizing vdMs. Beyond that, they gained new insights into how proteins fold and interact with ligands.

In their initial reconstruction of the streptavidin-biotin complex, DeGrado and Polizzi predicted C > 2 for all vdMs interacting with biotin. The score led to the conclusion that the vdMs selected bind the ligand sufficiently. As the natural structure confirms, C scores were a useful way to select vdMs for binding sites. Using vdMs for scoring protein-ligand interactions is an innovative approach that combines empirical data from the PDB with a quantitative measure, the C score. Simultaneously, possible rotamers are listed. Previously, only a completed ligand design could be evaluated (Schneider and Baringhaus, 2014). However, vdMs score interactions before the design is finished. This way, suitable interactions are selected early in the process.

Ligands bind to natural proteins by first being attracted to certain fragments within the fold. Therefore, it is crucial for binding affinity that these spaces are accessible to the ligand. DeGrado and Polizzi developed six potential proteins to bind apixaban. Performing ab initio folding on the models provided a proper prediction of which proteins will maintain an open binding site. This technique could benefit future de novo design applications by obviating the need to test every single model once synthesized.

After ABLE bound apixaban, its entropy decreased. Nonetheless, DeGrado and Polizzi argue that the loss in degrees of freedom is traded for a gain in enthalpically favorable interactions. In H49A, the mutant protein preferred rotamers that were already present before the binding event. His[49] provided tight packing that was now missing, and H49A bound apixaban less effectively. To sum up, the tight packing of core residues improved the binding of the ligand although it strained surrounding side chains.

Furthermore, the comparison of Tyr[6] and Tyr[46] mutant proteins shows that key vdMs with C > 0 stabilize the protein effectively. Interactions that are not part of the vdM database, e.g. Tyr[46], do not affect protein stability as severely when removed. This observation demonstrates that vdMs can be applied to rank protein-ligand interactions. Thus, keystone interactions can be detected with vdMs.

Because vdMs consider the coordinates of the backbone and CGs alone, the exact positioning of the side chain does not impact the design. Yet, the experiment on substituting Thr[112] from a Thr/C=O vdM exposed a flaw in the method: the side chain did not bind to apixaban but rather to a backbone carbonyl. The scientists believe that this is due to the use of a backbone-independent vdM library. Because the backbone was not considered when choosing the vdM, the unintended interaction occurred. For the future, they recommend backbone-dependent libraries to prevent unfavorable interactions of vdMs with the backbone.

VdMs and the associated COMBS algorithm represent a powerful new approach to de novo protein design. The method no longer requires matching of an entire ligand but instead focuses on targeting single CGs. The design of ABLE demonstrates that ligand-binding proteins for small polar molecules can be designed fully de novo. Beyond that, the novel proteins might exceed the functionality of natural proteins. ABLE endures heat better than most proteins while binding its ligand tightly and specifically. The method worked for apixaban and will probably apply to a variety of other protein engineering applications as well. For instance, vdMs might enable the design of novel protein-protein interfaces for medical applications. The design of novel immunogens that combine multiple antigens is just one potential application among many (Sesterhenn et al., 2020).


Anfinsen, C.B. (1973). Principles that Govern the Folding of Protein Chains. Science 181, 223-230.

Day, A.L., Greisen, P., Doyle, L., Schena, A., Stella, N., Johnsson, K., Baker, D., and Stoddard, B. (2018). Unintended specificity of an engineered ligand-binding protein facilitated by unpredicted plasticity of the protein fold. Protein Eng Des Sel 31, 375-387.

Extance, A. (2020). Viewpoint shift designs drug binding proteins from scratch (Chemistry World).

Fletcher, J.M., Harniman, R.L., Barnes, F.R.H., Boyle, A.L., Collins, A., Mantell, J., Sharp, T.H., Antognozzi, M., Booth, P.J., Linden, N., et al. (2013). Self-Assembling Cages from Coiled-Coil Peptide Modules. Science 340, 595-599.

Jancarik, J., and Kim, S.-H. (1991). Sparse matrix sampling: a screening method for crystallization of proteins. J Appl Crystallogr 24, 409-411.

Jensen, K.J. (2009). De Novo Design of Proteins. In Peptide and Protein Design for Biopharmaceutical Applications, K.J. Jensen, ed. (West Sussex, United Kingdom: John Wiley & Sons Ltd), pp. 207-248.

Kaufmann, K.W., Lemmon, G.H., DeLuca, S.L., Sheehan, J.H., and Meiler, J. (2010). Practically Useful: What the Rosetta Protein Modeling Suite Can Do for You. Biochemistry 49, 2987-2998.

Koepnick, B., Flatten, J., Husain, T., Ford, A., Silva, D.-A., Bick, M.J., Bauer, A., Liu, G., Ishida, Y., Boykov, A., et al. (2019). De novo protein design by citizen scientists. Nature 570, 390-394.

Korendovych, I.V., and DeGrado, W.F. (2020). De novo protein design, a retrospective. Q Rev Biophys 53, e3.

Kuhlman, B., and Bradley, P. (2019). Advances in protein structure prediction and design. Nat Rev Mol Cell Biol 20, 681-697.

Kuhlman, B., Dantas, G., Ireton, G.C., Varani, G., Stoddard, B.L., and Baker, D. (2003). Design of a Novel Globular Protein Fold with Atomic-Level Accuracy. Science 302, 1364-1368.

Lassila, J.K., Privett, H.K., Allen, B.D., and Mayo, S.L. (2006). Combinatorial methods for small-molecule placement in computational enzyme design. Proc Natl Acad Sci 103, 16710-16715.

Marcos, E., Basanta, B., Chidyausiku, T.M., Tang, Y., Oberdorfer, G., Liu, G., Swapna, G.V.T., Guan, R., Silva, D.-A., Dou, J., et al. (2017). Principles for designing proteins with cavities formed by curved β sheets. Science 355, 201-206.

Peacock, A. (2020). Can proteins be truly designed sans function? Science 369, 1166-1167.

Polizzi, N.F., and DeGrado, W.F. (2020). A defined structural unit enables de novo design of small-molecule–binding proteins. Science 369, 1227-1233.

Saven, J.G. (2014). De Novo Computational Protein Design. In De Novo Molecular Design, G. Schneider, ed. (Weinheim, Germany: Wiley-VCH Verlag GmbH), pp. 469-494.

Schneider, G., and Baringhaus, K.-H. (2014). De novo Design: From Models to Molecules. In De novo Molecular Design, G. Schneider, ed. (Weinheim, Germany: Wiley-VCH Verlag GmbH).

Sesterhenn, F., Yang, C., Bonet, J., Cramer, J.T., Wen, X., Wang, Y., Chiang, C.-I., Abriata, L.A., Kucharska, I., Castoro, G., et al. (2020). De novo protein design enables the precise induction of RSV-neutralizing antibodies. Science 368, eaay5051.

Settanni, G. (2015). Simulations and Experiments in Protein Folding. In Molecular Modeling of Proteins, A. Kukol, ed. (New York, U.S.A.: Humana Press), pp. 289-330.

Thomson, A.R., Wood, C.W., Burton, A.J., Bartlett, G.J., Sessions, R.B., Brady, R.L., and Woolfson, D.N. (2014). Computational design of water-soluble α-helical barrels. Science 346, 485-488.

Tinberg, C.E., and Khare, S.D. (2017). Computational Design of Ligand Binding Proteins. In Computational Protein Design, I. Samish, ed. (New York, U.S.A.: Humana Press), pp. 363-373.

Tinberg, C.E., Khare, S.D., Dou, J., Doyle, L., Nelson, J.W., Schena, A., Jankowski, W., Kalodimos, C.G., Johnsson, K., Stoddard, B.L., et al. (2013). Computational design of ligand-binding proteins with high affinity and selectivity. Nature 501, 212-216.




University of Heidelberg
Ausgewählte Themen der Molekularen Biotechnologie (unter Einbeziehung von Vortragstechniken und wissenschaftlichem Englisch)
Protein, Engineering, vdm, Protein Design, de novo, ligand, algorithm, binding, rosetta
Quote paper
Carina Bolz (Author), 2021, Novel structural element bridges de novo protein design challenge of functional protein-ligand interactions, Munich, GRIN Verlag,

