InterEvDock

A docking server to predict the structure of protein-protein interactions using evolutionary information.

Overview

This website is free and open to all users and there is no login requirement.
Two protein structures and their respective multiple sequence alignments are used to predict binding modes through a free docking procedure.

Why InterEvDock?


The structural modelling of protein-protein interactions is key to understanding how cell machineries assemble and cross-talk with each other. When homologous sequences are available for both protein partners, structures and multiple sequence alignments can be combined to identify binding interfaces. InterEvDock is a protein docking server that runs the InterEvScore potential, which was specifically designed to integrate evolutionary information into the docking process. The InterEvScore potential was developed for heteromeric protein interfaces and combines a residue-based multi-body statistical potential with evolutionary information derived from the multiple sequence alignments of each partner in the complex. In the InterEvDock server, the systematic docking search is performed using the FRODOCK program [1], and the resulting models are re-scored with InterEvScore [2] together with the SOAP_PP atom-based statistical potential [3], which was found to increase the confidence of the predictions.

Run InterEvDock


The InterEvDock service is integrated in the RPBS Mobyle Portal.

When using this service, please cite the following references:

Yu J, Vavrusa M, Andreani J, Rey J, Tufféry P, Guerois R.
InterEvDock: A docking server to predict the structure of protein-protein interactions using evolutionary information.
Nucleic Acids Res. 2016 Jul 8;44(W1):W542-9.
Andreani J, Faure G, Guerois R.
InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution.
Bioinformatics. 2013;29(14):1742-9.

Please also cite the FRODOCK program, which is used for the rigid-body docking step:

Garzon JI, López-Blanco JR, Pons C, Kovacs J, Abagyan R, Fernandez-Recio J, Chacon P.
FRODOCK: a new approach for fast rotational protein-protein docking.
Bioinformatics. 2009;25(19):2544-51.

If you use the results of SOAP_PP, please cite:

Dong GQ, Fan H, Schneidman-Duhovny D, Webb B, Sali A.
Optimized atomic statistical potentials: assessment of protein interfaces and loops.
Bioinformatics. 2013;29(24):3158-66.

If you use the evolutionary conservation results obtained with Rate4Site (mapped onto all visualized models in the PV applet and written into the b-factor field of the PDB files provided for all models in the results zip archive), please cite:

Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N.
Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues.
Bioinformatics. 2002; 18 Suppl 1:S71-77.

Latest news


  • December 10, 2015: Service opens.

InterEvDock design


InterEvDock takes as input the structures of two individual proteins to be docked (experimental or modelled structures).
The server runs several steps to propose the 10 most likely models for each score (INTEREVSCORE, SOAP_PP and FRODOCK), as well as 10 consensus models and the 5 most likely interface residues on each protein:
  • Extracts the sequences of both partners and automatically builds two multiple sequence alignments in which homologs from the same species appear in the same order. Both alignments are used in INTEREVSCORE scoring. Users can also submit their own co-alignments.
  • Performs an exhaustive rigid-body search using the FRODOCK algorithm [1].
  • Clusters all FRODOCK decoys using a ligand RMSD threshold of 4 Å and ranks the clusters by energy.
  • The best 10,000 FRODOCK clusters are scored by INTEREVSCORE [2] and SOAP_PP [3] potentials.
  • For each score, the top 1000 models are clustered using FCC [4] and the 10 best representative models of complexes are provided (ranked by score).
  • The 10 most likely models out of those 30 are selected by the InterEvDock consensus method, by grouping similar models well-ranked by different scoring functions.
  • Finally, a selection of 5 residues on each protein is proposed. Those are the residues most likely involved in the interface based on the best models, which can subsequently be used to implement constraints in flexible docking simulations or to guide mutagenesis for interface disruption.
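
The clustering step above (ligand RMSD threshold of 4 Å, ranking by energy) can be illustrated with a minimal greedy sketch. This is not the FRODOCK implementation: the function names, the input layout, the assumption that lower energy is better, and the assumption that all decoys share a common receptor frame are hypothetical choices made for illustration only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z)."""
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def greedy_cluster(decoys, threshold=4.0):
    """Greedy clustering of decoys given as (name, energy, ligand_coords).

    Assumptions (not taken from the server): lower energy is better, and
    ligand coordinates are expressed in a common receptor frame.
    Returns a list of (representative_name, member_names), best cluster first.
    """
    remaining = sorted(decoys, key=lambda d: d[1])  # best energy first
    clusters = []
    while remaining:
        rep = remaining[0]  # best remaining decoy becomes the representative
        members = [d for d in remaining if rmsd(rep[2], d[2]) <= threshold]
        member_names = {m[0] for m in members}
        clusters.append((rep[0], [m[0] for m in members]))
        remaining = [d for d in remaining if d[0] not in member_names]
    return clusters


decoys = [
    ("A", -10.0, [(0.0, 0.0, 0.0)]),
    ("B", -8.0, [(1.0, 0.0, 0.0)]),
    ("C", -9.0, [(10.0, 0.0, 0.0)]),
]
clusters = greedy_cluster(decoys)
```

In this toy input, decoys A and B lie within 4 Å of each other and are grouped under A, while C forms its own cluster.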
The docking server implements 3 major methods:
  • FRODOCK, a rigid-body docking method developed in Pablo Chacón's lab [1], which combines a search algorithm based on spherical harmonics with an energy-based scoring function including van der Waals, electrostatics and desolvation terms. On Weng's Benchmark v4 [5], the version running in InterEvDock generated successful models (ligand RMSD below 10 Å) among the top 2,000 models in up to 81.8% of cases, very close to the performance of ZDOCK 3.0.2 (84.1%) [6].

  • INTEREVSCORE, a scoring function developed in Guerois' lab [2], which combines a residue-based statistical potential including both two- and three-body terms with the scoring of interface contacts inferred from multiple sequence alignments. This way of integrating evolutionary information was found to be significantly superior to accounting only for conserved positions. The server version includes the mode in which InterEvScore uses evolutionary information only for residues belonging to apolar patches, as described in [2].
  • SOAP_PP, an atom-based statistical potential dedicated to protein-protein interactions and developed in Andrej Sali's lab [3]. It is derived from a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP), in which the reference state is replaced with data-driven ‘recovery’ functions. Using the relative orientation between two covalent bonds, instead of a simple distance between two atoms, helps capture orientation-dependent interactions such as hydrogen bonds.

What InterEvDock does not perform

InterEvDock is a method for generating complexes between two protein structures modelled as rigid-body subunits. Selected models may contain some clashes, which can be relieved by relaxing the models.

InterEvDock assumes that the interaction between the two submitted subunits has been experimentally validated. It is not designed to predict either the likelihood or the strength of the interaction.

Large conformational changes upon binding are generally not well predicted.

Features

  • Generation of multiple sequence alignments for co-evolved partners: The InterEvDock server generates multiple sequence alignments of the binding partners, so that homologs of the same species are aligned in the same order in both alignments.
  • Selection of most likely binding modes : Starting from two structures of interacting proteins, InterEvDock identifies a maximum of 10 candidate binding modes for each of the 3 complementary scores computed. It also offers a selection of the 10 best consensus models.
  • Graphical exploration of the complexes: The structures of the decoys can be explored with the PV applet (M. Biasini), a WebGL-based viewer for proteins and other macromolecular structures. Both the evolutionary conservation of each partner (calculated with Rate4Site [9]) and the consensus interface can be visualized as color gradients on the surface of both protein partners.
  • Selection of 5 residues most likely involved in the interface on each protein: The 5 residues on each protein partner most likely involved in the interface are displayed in a table, together with their rank. Those residues can subsequently be used to implement constraints in flexible docking simulations or to guide mutagenesis for interface disruption.
  • Coordinates of the complexes and alignments are available for further off-web exploration: The selected models of complexes are available in the PDB format. The multiple sequence alignments of each subunit can also be retrieved in fasta format.
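
One simple way to picture the selection of the 5 most likely interface residues is a frequency vote over the interfaces of the selected models. The sketch below is only an illustration under that assumption: the server's actual ranking scheme is not described here, and the function name and residue labels are hypothetical.

```python
from collections import Counter


def top_interface_residues(model_interfaces, n=5):
    """Rank residues by how many models place them at the interface.

    model_interfaces: one collection of residue labels per model
    (labels such as "A10" are illustrative, not a server convention).
    Returns up to n (residue, count) pairs, most frequent first.
    """
    counts = Counter()
    for residues in model_interfaces:
        counts.update(set(residues))  # count each residue once per model
    # Sort by descending count, then by label to get a reproducible order
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


models = [{"A10", "A12"}, {"A10", "A15"}, {"A10", "A12"}]
best = top_interface_residues(models, n=2)
```

Here residue A10 appears at the interface of all three toy models and is ranked first, followed by A12.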

Limitations

  • Subunit size: Size of each submitted subunit should lie in the 10-1000 amino acids range.

  • Multi-chain subunit: Each submitted subunit should contain a single chain ID. If a subunit comprises multiple chains, users should assign the same chain ID to all of them before submission. In this case, we recommend providing a merged multiple sequence alignment whose first sequence matches that of the merged subunit.

  • Design for proteins only: the server is currently not able to take into account nucleic acids or small molecules.
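
As a rough illustration of the first two limitations, the following sketch checks the subunit size and assigns a single chain ID to every coordinate record of a PDB file. It relies only on the fixed-column PDB layout (chain ID in column 22); the function names are hypothetical, and the sketch does not renumber residues, so residue numbers that overlap between the original chains may need to be fixed separately.

```python
def count_residues(pdb_lines):
    """Count residues by collecting the CA atoms of ATOM records."""
    seen = set()
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            # chain ID (column 22) plus residue number and insertion code
            seen.add((line[21], line[22:27]))
    return len(seen)


def size_ok(pdb_lines, minimum=10, maximum=1000):
    """Check the 10-1000 amino acid range stated in the limitations."""
    return minimum <= count_residues(pdb_lines) <= maximum


def merge_chains(pdb_lines, new_chain="A"):
    """Assign the same chain ID to all ATOM, HETATM and TER records.

    Residue numbers are left untouched (an intentional simplification).
    """
    merged = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM", "TER")) and len(line) > 21:
            line = line[:21] + new_chain + line[22:]
        merged.append(line)
    return merged
```

A typical use would be to read the file with `open(path).read().splitlines()`, call `size_ok` on the original lines, and write the output of `merge_chains` to a new file before submission.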

Usage

Input

    The simplest input consists of two fields specifying the 3D structures of the proteins to be docked.
  • Protein structures 1 and 2: The structure of each protein, in PDB format. Note that InterEvDock presently accepts proteins only; files containing nucleic acid chains, for instance, will be rejected. Also note that the PDB file is cleaned up automatically (non-canonical amino acids, etc.), but ligands are not removed.
  • Multiple Sequence Alignments: Two optional fields specify the multiple sequence alignments associated with each 3D structure. The first sequence in each alignment should perfectly match that of the corresponding PDB file (except for gaps, represented by '-'). The two alignments must contain sequences from the same set of species, appearing in exactly the same order. If the multiple sequence alignments are not provided, they are generated automatically by the InterEvDock server and provided in the results to facilitate potential re-submission.
  • Demonstration mode: Setting the Demonstration Mode option to "Yes" on the InterEvDock page loads a pre-configured test case for RAN GTPase (PDB:1QG4) in complex with NTF2 (PDB:1OUN). The results can be compared with the coordinates of the complex (PDB:1A2K).
    When the demonstration mode is switched on, data specified in the other input fields are ignored.
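
The alignment requirements listed above can be checked locally before submission. The sketch below is a minimal example, not a tool provided by the server: the function names are hypothetical, and in particular the convention that the species label is the last '|'-separated field of each FASTA header is an assumption made for illustration (pass your own `species_of` function if your headers differ).

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def species_from_header(header):
    """ASSUMPTION: the species label is the last '|'-separated header field."""
    return header.split("|")[-1]


def check_coalignments(msa1, msa2, seq1, seq2, species_of=species_from_header):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for label, msa, seq in (("1", msa1, seq1), ("2", msa2, seq2)):
        if not msa:
            problems.append(f"alignment {label} is empty")
        elif msa[0][1].replace("-", "").upper() != seq.upper():
            problems.append(
                f"first sequence of alignment {label} does not match the structure"
            )
    if len(msa1) != len(msa2):
        problems.append("alignments contain different numbers of sequences")
    else:
        for i, ((h1, _), (h2, _)) in enumerate(zip(msa1, msa2), start=1):
            if species_of(h1) != species_of(h2):
                problems.append(f"row {i}: species differ")
    return problems
```

Here `seq1` and `seq2` stand for the sequences extracted from the two PDB files; how they are extracted is left to the user.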

Results

  • Progress report
    ProgressReport
    This section will incrementally provide information about job progression and errors if any. A typical run should produce a report similar to the one shown above. Errors related to the input data specified are also reported in this field.
  • Note: InterEvDock runs take about 30 minutes if alignments are provided by the user and 1 to 2 hours otherwise, depending on the size of the proteins.
  • An interactive page for browsing the best complexes generated (see Visualization and post-processing below).
  • A zip archive containing the PDB files of the models, where models are indexed according to INTEREVSCORE, FRODOCK or SOAP_PP scores, as well as two tables reporting the 10 consensus models and the 5 most likely interface residues on each partner.
  • Two multiple sequence alignments generated and used for the INTEREVSCORE scoring.
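
Since the Rate4Site conservation results are written into the b-factor field of the PDB files in the results archive, they can be read back with a few lines of code. The sketch below assumes only the standard fixed-column PDB layout (b-factor in columns 61-66); the function name is hypothetical, and the sketch makes no assumption about how the values should be interpreted.

```python
def residue_bfactors(pdb_lines):
    """Map (chain ID, residue number, residue name) to the b-factor value.

    The value is taken from the first ATOM record seen for each residue,
    on the assumption that all atoms of a residue carry the same score.
    """
    scores = {}
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        key = (line[21], line[22:27].strip(), line[17:20])
        if key not in scores:
            scores[key] = float(line[60:66])  # b-factor field, columns 61-66
    return scores
```

For example, `residue_bfactors(open("model.pdb").read().splitlines())` would return one value per residue, where `model.pdb` is a placeholder for any model file extracted from the archive.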

Visualization and post-processing of resulting models

  • The best models can be explored using the PV applet.

  • The predicted interface residues can also be visualized on both proteins as a color gradient (from green to white, for high to low probability of being at the interface).

  • The evolutionary conservation of each partner (calculated with Rate4Site [9]) can be visualized on both partners as a color gradient, from red (more conserved) to white (more diverse) through yellow (mild conservation).

  • A PyMOL script provided in the results zip archive for each run automatically loads the 30 models from the results zip file and colors them by interface residue consensus and evolutionary conservation.
Examples

  • Example 1: Example from Weng's Benchmark 4
    Experimental Complex

    Example of the RAN-NTF2 complex (PDB:1A2K)
    taken from Weng's Benchmark v4.

    In this case from Weng's Benchmark v4, InterEvDock returned a model with an iRMSD of 2.7 Å with respect to the native structure of the complex (PDB:1A2K). This model was the representative structure of the third best cluster. In contrast, none of ZDOCK, ZRANK or FRODOCK alone ranked an Acceptable model among their top clusters.

  • Example 2: An example from CAPRI30 (target T72)

    In the CAPRI30 session, in which our group performed very well [7], CASP11 targets crystallizing as homo-oligomers were proposed as CAPRI targets. Using the same strategy as implemented in the InterEvDock pipeline (rigid-body docking followed by scoring with multiple scores including InterEvScore), we submitted the model with the lowest interface RMSD (3.5 Å) for target T72. For this target, template-based modelling led to a misleading assembly prediction (right figure), illustrating that, at low sequence identity, assembly modes can differ substantially between remote homologs.

    Experimental Complex

    X-ray structure of CASP11 target T0770 (4Q69), CAPRI30 target T72

    Best Model

    Structural model with the correct orientation, obtained by a free docking procedure using the InterEvDock pipeline and rated as Acceptable by CAPRI (interface RMSD of 3.5 Å)

    Template Based Model

    Incorrect model resulting from simple template-based modelling based on PDB 3MX3 (sequence identity 20%)

    Benchmark

    Dataset: Performance of InterEvScore was assessed on 85 complexes extracted from Weng's Benchmark 4.0 [5] for which the structures of the free proteins (unbound) are available and for which evolutionary information could be retrieved [2].
    A table of all 85 benchmark cases is available here.

    InterEvScore results: InterEvScore achieves significant improvement over traditional scoring functions on the 54 test cases from Weng's docking benchmark v4 with available coupled multiple sequence alignments and near-native decoys [2]. The use of evolution increased the quality of the prediction and was never detrimental in the discrimination of near-native interfaces, even though the number of sequences in the coupled alignments could be limited (between 10 and 100 species with an average of 35). In addition, we did not find the scoring improvement on inclusion of evolutionary data to be limited to certain categories of complexes (except for antibody-antigen complexes).

    InterEvDock results: The 85 test cases from Weng's docking benchmark v4 with available evolutionary information are enriched in medium and difficult cases. The InterEvDock server predicts an “Acceptable” or better solution in the consensus top 10 for 21 out of 43 cases in the rigid-body category (49%). The InterEvDock server also predicts residues making contacts at the interface of a complex, based on the analysis of all the interfaces of the top 10 decoys for all three scores (30 models). In up to 91% of the cases, at least one residue out of 10 was correctly predicted as present at the interface, providing very useful hints to guide mutagenesis experiments aimed at disrupting a complex of interest. Of note, there is almost no decrease in precision from the rigid-body to the difficult cases. Predictions of the InterEvDock server can thus also be used as a prior to constrain more thorough docking simulations that require flexibility to model the correct orientation between two binding partners. From that perspective, in 54% of the cases, at least one correct residue is predicted on both sides of the interface, with very similar performance for rigid-body and difficult targets.

    References

    [1] JI Garzon, JR López-Blanco, C Pons, J Kovacs, R Abagyan, J Fernandez-Recio, P Chacon.
    FRODOCK: a new approach for fast rotational protein-protein docking.
    Bioinformatics. 2009; 25(19):2544-51.
    [2] J Andreani, G Faure, R Guerois.
    InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution.
    Bioinformatics 2013; 29 (14):1742–1749.
    [3] GQ Dong, H Fan, D Schneidman-Duhovny, B Webb, A Sali.
    Optimized atomic statistical potentials: assessment of protein interfaces and loops.
    Bioinformatics. 2013; 29(24):3158-66.
    [4] JPGLM Rodrigues, M Trellet, C Schmitz, P Kastritis, E Karaca, ASJ Melquiond, AMJJ Bonvin.
    Clustering biomolecular complexes by residue contacts similarity.
    Proteins: Structure, Function, and Bioinformatics 2012;80(7):1810–1817.
    [5] H Hwang, T Vreven, J Janin, Z Weng.
    Protein-protein docking benchmark version 4.0.
    Proteins. 2010; 78(15):3111-4.
    [6] SY Huang.
    Exploring the potential of global protein-protein docking: an overview and critical assessment of current programs for automatic ab initio docking.
    Drug Discov Today. 2015; 20(8):969-77.
    [7] M Lensink, S Velankar, A Kryshtafovych, S Wodak.
    Table CAPRI30: Ranking by number of INTERFACES for which at least one 'Acceptable' solution was obtained.
    In: Presentation from the CAPRI30 Cancun meeting.
    [8] R Mendez, R Leplae, S Wodak.
    Assessment of blind predictions: current status of docking methods.
    Proteins Struct. Func. & Bioinfo. 2003; 52:51-67.
    [9] T Pupko, RE Bell, I Mayrose, F Glaser, N Ben-Tal.
    Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues.
    Bioinformatics. 2002; 18 Suppl 1:S71-77.