InterEvDock2

A docking server to predict the structure of protein-protein interactions using evolutionary information.

Run InterEvDock2

The InterEvDock2 service is integrated in the RPBS Mobyle Portal.

Overview

This website is free and there is no login requirement. Note that for homology modeling, using this service starting from sequences is allowed only for non-commercial users (non-profit/academic/government research groups). Commercial users are asked to submit their own models.

InterEvDock-ataglance

Two protein structures (or structural models generated on the fly from input sequences) and their respective multiple sequence alignments are used to predict binding modes through a free docking procedure.

Why InterEvDock2?

The structural modelling of protein-protein interactions is key in understanding how cell machineries assemble and cross-talk with each other. When homologous sequences are available for both protein partners, it is very useful to rely on structures and multiple sequence alignments to identify binding interfaces. InterEvDock2 is a server for protein docking that runs the InterEvScore potential, which is specifically designed to integrate evolutionary information in the docking process. The InterEvScore potential was developed for heteromeric protein interfaces and combines a residue-based multi-body statistical potential with evolutionary information derived from the multiple sequence alignments of each partner in the complex. In the InterEvDock2 server, the systematic docking search is performed using the FRODOCK2 program [1] and the resulting models are re-scored with InterEvScore [2] together with the SOAP_PP atom-based statistical potential [3], which was found to increase the confidence of the predictions.

InterEvDock2 is an update of InterEvDock [4] that can handle protein sequences as inputs, and not only protein 3D structures. When a sequence is provided by the user, a comparative modeling step based on an automatic template search protocol builds models for the individual protein partners prior to docking. If the user has biological information, such as a position that is known to be involved in the interface between the two protein partners, constraints can be specified for use in the docking procedure. This can be crucial to ensure that all available biologically relevant information is used for InterEvDock2 predictions. In addition, InterEvDock2 makes it possible to submit structures of oligomers as input to the free docking. Such an option is generally complicated in co-evolution analyses, since the joint MSAs have to be generated for every chain of an oligomer. This process is now fully automated in InterEvDock2.

When using this service, please cite the following references:

Yu J, Vavrusa M, Andreani J, Rey J, Tufféry P, Guerois R.
InterEvDock: A docking server to predict the structure of protein-protein interactions using evolutionary information.
Nucleic Acids Res. 2016 Jul 8;44(W1):W542-9.
Andreani J, Faure G, Guerois R.
InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution.
Bioinformatics. 2013 29(14):1742-9.

Please also cite the FRODOCK2 program, which is used for the rigid-body docking step:

Ramirez-Aportela E, Lopéz-Blanco JR, Chacon P.
FRODOCK 2.0: fast protein-protein docking server.
Bioinformatics. 2016;32(15):2386-8.

When using the results of SOAP_PP, please cite:

Dong GQ, Fan H, Schneidman-Duhovny D, Webb B, Sali A.
Optimized atomic statistical potentials: assessment of protein interfaces and loops.
Bioinformatics. 2013;29(24):3158-66.

When using the evolutionary conservation results obtained with Rate4Site (mapped onto all visualized models in the PV applet and written into the B-factor field of the PDB files provided for all models in the results zip archive), please cite:

Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N.
Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues.
Bioinformatics. 2002; 18 Suppl 1:S71-77.

When using the comparative modeling protocol based on RosettaCM (i.e. if your input consists of one or two sequences), please cite:

Song Y, DiMaio F, Wang RY-R, Kim D, Miles C, Brunette TJ, Thompson J, Baker D.
High resolution comparative modeling with RosettaCM.
Structure. 2013; 21(10):1735-42.

When using the automatic template search (i.e. if your input consists of one or two sequences and you did not specify a template), please cite:

Tatusova TA, Madden TL.
BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences.
FEMS Microbiol Lett. 1999;174(2):247-50.
Remmert M, Biegert A, Hauser A, Soding J.
HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment.
Nat Methods. 2011;9(2):173-5.
Soding J.
Protein homology detection by HMM-HMM comparison.
Bioinformatics. 2005; 21(7):951-60.

Latest news

  • November 1, 2017 : Service prototype opens for tests.
  • December 18, 2017 : Service opens.

InterEvDock2 design

InterEvDock2 takes as input either the structures of two protein partners to be docked (experimental or modelled structures, possibly multimeric), or their sequences.

Given the sequences, the server runs several steps to generate a model of the structures:

  • Attempts to identify a template structure using blastp [5] against the Protein Data Bank (PDB).
  • If blastp finds no template with at least 35% sequence identity and 50% coverage of the query sequence, builds a profile using HHblits [6] against the uniprot20 database and attempts to identify a template structure using HHsearch [7] against the PDB70 database.
  • If a template is identified by blastp, or by HHsearch with a probability higher than 95%, generates a 3D model using a fast comparative modeling protocol based on RosettaCM [8].
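
The template selection rule described in these steps can be sketched as follows. This is only an illustration of the thresholds quoted above, not the server's code: the `BlastHit` and `HHsearchHit` classes and the `choose_template` function are hypothetical names, and the search tools themselves are not called.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlastHit:
    template_id: str
    identity: float   # percent sequence identity with the query
    coverage: float   # percent of the query covered by the hit


@dataclass
class HHsearchHit:
    template_id: str
    probability: float  # HHsearch probability, in percent


def choose_template(blast_hit: Optional[BlastHit],
                    hh_hit: Optional[HHsearchHit]) -> Optional[str]:
    """Return the identifier of the template to use, or None if none qualifies."""
    # Step 1: accept the blastp hit if it reaches both thresholds (35% / 50%).
    if blast_hit and blast_hit.identity >= 35.0 and blast_hit.coverage >= 50.0:
        return blast_hit.template_id
    # Step 2: otherwise fall back to HHsearch, requiring a probability above 95%.
    if hh_hit and hh_hit.probability > 95.0:
        return hh_hit.template_id
    # No usable template: no model can be built from the sequence.
    return None
```

For instance, a blastp hit with 30% identity is rejected even at high coverage, and the decision then depends only on the HHsearch probability.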

Given the structures (either input by the user or modeled from input sequences), the server runs several steps to propose the 10 most likely models for each score (InterEvScore, SOAP_PP, FRODOCK scores), as well as 10 consensus models and the 5 most likely interface residues on each protein:

  • Extracts the sequences of both partners and automatically builds two multiple sequence alignments in which homologs of the same species appear in the same order. Both alignments are used in InterEvScore scoring. This process is fully automated, including for multimeric inputs. Users can also submit their own co-alignments.
  • Performs an exhaustive rigid-body search using the FRODOCK2 algorithm [1].
  • If constraints are provided by the user, filters FRODOCK2 decoys according to these constraints.
  • Clusters FRODOCK2 decoys using a ligand RMSD threshold of 4 Å and ranks them with respect to their energy.
  • The best 10,000 FRODOCK clusters are scored by InterEvScore [2] and SOAP_PP [3] potentials.
  • For each score, the top 1000 models are clustered using FCC [9] and the 10 best representative models of complexes are provided (ranked by score).
  • The 10 most likely models out of those 30 are selected by the InterEvDock consensus method, by grouping similar models well-ranked by different scoring functions.
  • Finally, a selection of 5 residues on each protein is proposed. Those are the residues most likely involved in the interface based on the best models, which can subsequently be used to implement constraints in further rigid-body or flexible docking simulations or to guide mutagenesis for interface disruption.
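
To make the clustering step with a 4 Å ligand RMSD threshold more concrete, here is a minimal sketch of a greedy clustering scheme. It is an assumption for illustration only: the exact algorithm used by FRODOCK2 is not described on this page, and the functions `ligand_rmsd` and `greedy_cluster` are hypothetical. Decoys are assumed to be already ranked best first, with ligand coordinates expressed in the same receptor frame.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def ligand_rmsd(a: Coords, b: Coords) -> float:
    """RMSD between two ligand poses, without any superposition."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(a, b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(a))


def greedy_cluster(poses: List[Coords], threshold: float = 4.0) -> List[List[int]]:
    """Cluster poses ranked best first; each cluster starts with its representative."""
    clusters: List[List[int]] = []
    for index, pose in enumerate(poses):
        for cluster in clusters:
            representative = poses[cluster[0]]
            if ligand_rmsd(pose, representative) <= threshold:
                cluster.append(index)  # close enough to an existing representative
                break
        else:
            clusters.append([index])   # this pose founds a new cluster
    return clusters
```

Because the input is ranked, the first member of each cluster is also its best-ranked member, which is a natural choice of representative model.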

The docking server implements 3 major methods:

  • FRODOCK2 (developed in Pablo Chacón's lab [1]), a rigid-body docking method which combines a search algorithm based on spherical harmonics, an energy-based scoring function including van der Waals, electrostatics and desolvation terms, and a complementary knowledge-based potential. FRODOCK2 has competitive success rates and efficiency when challenged on Weng's Benchmark v4 [10].

  • InterEvScore (developed in Guerois' lab [2]), a scoring function combining a residue-based statistical potential that includes both two- and three-body terms with the scoring of interface contacts inferred from multiple sequence alignments. This way of integrating evolutionary information was found significantly superior to solely accounting for conserved positions. The server version includes the mode in which InterEvScore uses evolutionary information only for residues belonging to apolar patches, as described in [2].

    InterEvScore-components
  • SOAP_PP (developed in Andrej Sali's lab [3]), an atom-based statistical potential dedicated to protein-protein interactions, derived from a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP) in which the reference state is replaced with data-driven ‘recovery’ functions. Using the relative orientation between two covalent bonds, instead of a simple distance between two atoms, helps capture orientation-dependent interactions such as hydrogen bonds.

What InterEvDock does not perform

InterEvDock is a method for generating a complex between two protein structures modelled as rigid-body subunits. Selected models may have some clashes that can be released upon relaxation of the models.

InterEvDock assumes that the interaction between the two submitted subunits has been experimentally validated. It is not designed to predict either the likelihood or the strength of the interaction.

Large conformational changes upon binding are generally not well predicted.

Features

  • User input can be a protein sequence for one or both partners. In this case, template search and comparative modeling are performed on the fly by the InterEvDock2 server before docking. Optionally, the user may provide a PDB template to use for comparative modelling.
  • User input can be a multimeric protein structure. The generation of the joint MSAs is fully automated as for monomeric structures.
  • Generation of multiple sequence alignments for co-evolved partners: The InterEvDock server generates multiple sequence alignments of the binding partners, so that homologs of the same species are aligned in the same order in both alignments.
  • User-defined constraints: in case the user has biological input such as a position that is known to be involved in the interface between the two protein partners, constraints can be specified for use in the docking procedure.
  • Selection of most likely binding modes: Starting from two structures (or structural models) of interacting proteins, InterEvDock identifies a maximum of 10 candidate binding modes for each of the 3 complementary scores computed. It also offers a selection of the 10 best consensus models.
  • Graphical exploration of the complexes: The structures of the decoys can be explored with the PV applet (M. Biasini), a WebGL-based viewer for proteins and other macromolecular structures. Both the evolutionary conservation of each partner (calculated with Rate4Site [11]) and the consensus interface can be visualized as color gradients on the surface of both protein partners.
  • Selection of 5 residues most likely involved in the interface on each protein: The 5 residues on each protein partner most likely involved in the interface are displayed in a table, together with their rank. Those residues can subsequently be used to implement constraints in flexible docking simulations or to guide mutagenesis for interface disruption.
  • Coordinates of the complexes and alignments are available for further off-web exploration: The selected models of complexes are available in the PDB format. The multiple sequence alignments of each subunit can also be retrieved in fasta format. A PyMOL script provided in the results zip archive for each run automatically loads the 30 models from the results zip file and colors them by interface residue consensus and evolutionary conservation.
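
The requirement that homologs of the same species appear in the same order in both alignments can be checked before submitting user-defined co-alignments. The sketch below is a hypothetical helper, not part of the server. In particular, it assumes that the species is encoded as the text after the last underscore of the first word of each FASTA header (as in UniProt-style entry names); adapt `species_tag` to your own header convention.

```python
from typing import Callable, List


def fasta_headers(text: str) -> List[str]:
    """Return the header lines of a FASTA alignment, without the leading '>'."""
    return [line[1:].strip() for line in text.splitlines() if line.startswith(">")]


def species_tag(header: str) -> str:
    """Assumed convention: species is the text after the last '_' of the first word."""
    words = header.split()
    return words[0].rsplit("_", 1)[-1] if words else ""


def check_coalignments(msa_a: str, msa_b: str,
                       tag: Callable[[str], str] = species_tag) -> List[str]:
    """Return a list of problems; an empty list means the two MSAs are consistent."""
    headers_a, headers_b = fasta_headers(msa_a), fasta_headers(msa_b)
    problems = []
    if len(headers_a) != len(headers_b):
        problems.append(
            f"different number of sequences: {len(headers_a)} vs {len(headers_b)}")
    # The first entry of each alignment is the query itself, so it is skipped.
    pairs = zip(headers_a[1:], headers_b[1:])
    for position, (ha, hb) in enumerate(pairs, start=2):
        if tag(ha) != tag(hb):
            problems.append(f"entry {position}: species {tag(ha)!r} vs {tag(hb)!r}")
    return problems
```

An empty result only shows that the species labels line up; it does not check that the first sequence of each alignment matches the submitted structure.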

Limitations

  • Subunit size: Size of each submitted subunit should lie in the 10-3000 amino acids range.
  • Design for proteins only: the server is currently not able to dock nucleic acids or small molecules. When nucleic acids or ligands are present in a protein chain, they will be kept only as steric objects.

Usage

Input

The simplest input consists of two fields to specify either the sequence or the 3D structure of each protein to be docked.

  • Protein A structure or sequence: Fill one of the two fields. Warning: if both fields are filled, only the protein structure will be taken into account.

    • Protein A structure: It corresponds to the structure of the protein. It must be in the PDB format. It can be pasted, uploaded or retrieved automatically by indicating the PDB identifier and optionally one or several chain identifiers (for instance: "1a2yA" for PDB identifier 1a2y and chain A). Note that InterEvDock2 will score docking models only on standard amino acids of proteins. Nucleic acid chains or ligands are integrated in the rigid-body docking but are not scored. Solvent is removed.
    • Protein A sequence: It corresponds to the sequence of the protein in case the user does not know its structure. It must be in the FASTA format. It can be pasted or uploaded. If no protein A structure is provided but a protein A sequence is provided, automatic template search and homology modeling will be performed by the server to build a structural model for protein A.
  • Protein B structure or sequence: Same as above for the second protein partner.

Advanced options

  • Protein A structure or sequence: as in the simple options (see above)
  • Protein B structure or sequence: as in the simple options (see above)
  • Impose co-alignments (optional): Two fields can be optionally filled to specify the multiple sequence alignments associated with each 3D structure. The first sequence in each alignment should match as closely as possible the sequence of the corresponding PDB file (except gaps represented by '-'). The two alignments must contain sequences from the same set of species, appearing in the exact same order. If the multiple sequence alignments are not provided, they will be automatically generated by the InterEvDock server and will be provided in the results to facilitate potential re-submission.
  • Impose constraints (optional): A list of constraints can be optionally specified. Each constraint should be either a position from protein A or protein B, or a pair of positions, one from protein A and one from protein B. Each constraint must contain at least one colon ":" used to specify which of the two partners the constraint applies to (before the colon for the first partner, after the colon for the second partner). For example:

    11A:

    if we want residue at position 11 in protein A to be at the interface;

    :20B

    if we want residue at position 20 in protein B to be at the interface;

    11A:20B

    to enforce a contact between position 11 in protein A and position 20 in protein B.

    Positions are numbered according to the PDB numbering if the input structure was provided and according to sequential numbering of the FASTA sequence if only the input sequence was provided. Several constraints can be specified, separated by whitespace (space, tabulation or newline). A distance can be optionally specified for each constraint, for instance

    11A::7

    for a distance of 7 Å on position 11A (note the two colon “:” separators) and

    11A:20B:7

    for a distance of 7 Å between 11A and 20B. The default distance is 8 Å for single residues and 11 Å for pairs. These constraints will be taken into account in the docking process to keep only models having this position or pair of positions at the complex interface. Constraints will be checked prior to applying and constraints involving residues not present or buried in the structural model will be excluded. Note that when several constraints are provided, they are considered cumulative ("AND" not "OR") i.e. docking models will be filtered to retain only solutions that verify all constraints.

  • Impose template for Protein A (optional): Two fields can optionally be filled if the user would like a specific template to be used in the modeling.

    • Protein A template structure: The template structure in PDB format. As for the input structure, it can be pasted, uploaded or retrieved from the PDB by indicating the PDB identifier and chain (for instance: "1a2yA" for PDB identifier 1a2y and chain A).
    • Protein A sequence template alignment (optional): The sequence-template alignment in FASTA format. The first sequence should correspond to the input sequence for protein A and the second sequence should correspond to the template sequence for the structure provided in the field immediately above. If a template is provided but no query-template alignment is provided, the server will automatically build a query-template alignment.
  • Impose sequence-template alignment for Protein B (optional): Same as above for the second protein partner.
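
The constraint syntax described above under "Impose constraints" can be illustrated with a small parser. This is a sketch written from the description on this page, not the server's own code: `Constraint` and `parse_constraints` are hypothetical names, and position labels such as "11A" are kept as opaque strings rather than validated against a structure.

```python
from dataclasses import dataclass
from typing import List, Optional

SINGLE_DEFAULT = 8.0   # default distance in Å for a single position
PAIR_DEFAULT = 11.0    # default distance in Å for a pair of positions


@dataclass
class Constraint:
    pos_a: Optional[str]   # position in protein A, or None
    pos_b: Optional[str]   # position in protein B, or None
    distance: float        # distance threshold in Å


def parse_constraints(text: str) -> List[Constraint]:
    """Parse whitespace-separated constraints such as '11A:', ':20B' or '11A:20B:7'."""
    constraints = []
    for token in text.split():
        fields = token.split(":")
        if len(fields) not in (2, 3):
            raise ValueError(f"expected one or two colons in {token!r}")
        pos_a = fields[0] or None   # text before the first colon: protein A
        pos_b = fields[1] or None   # text after the first colon: protein B
        if pos_a is None and pos_b is None:
            raise ValueError(f"no position given in {token!r}")
        is_pair = pos_a is not None and pos_b is not None
        if len(fields) == 3 and fields[2]:
            distance = float(fields[2])  # explicit distance after the second colon
        else:
            distance = PAIR_DEFAULT if is_pair else SINGLE_DEFAULT
        constraints.append(Constraint(pos_a, pos_b, distance))
    return constraints
```

Since multiple constraints are cumulative, a docking model would have to satisfy every element of the returned list to be retained.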

Demonstration mode

The demonstration mode is accessible from the InterEvDock2 page by setting the Demonstration Mode option to "Yes". InterEvDock2 then loads a pre-configured test case in which the complex between the STAS domain and the acyl carrier protein of the bacterium Hahella chejuensis is modeled. There are two versions of the Demonstration Mode, both starting from input sequences only: one without constraints, and one in which a constraint requires position 69 of the STAS domain to be involved in the interface of the resulting models. The results can be compared with the coordinates of the reference PDB complex between the same proteins in Escherichia coli (PDB:3NY7). When the demonstration mode is switched on, input data specified in the other fields are ignored.

Results

  • Progress report

    ProgressReport

    This section will incrementally provide information about job progression and errors if any. A typical run should produce a report similar to the one shown above. Errors related to the input data specified are also reported in this field.

  • Note: InterEvDock runs last about 30 minutes for medium-sized proteins (200-300 residues) if input structures and alignments are provided by the user. Runs including modeling from input sequences take about 35 to 55 minutes if alignments are provided. If only input sequences are provided and co-alignments need to be calculated, the run will last around 1 hour for medium-sized proteins. If constraints are provided, the run will be of similar or shorter duration.
  • An interactive page for browsing the best complexes generated (see Visualization and post-processing below).
  • A zip archive containing the PDB files of the models, where models are indexed according to InterEvScore, FRODOCK or SOAP_PP scores, as well as two tables reporting the 10 consensus models and the 5 most likely interface residues on each partner.
  • Two multiple sequence alignments generated and used for the InterEvScore scoring.

Visualization and post-processing of resulting models

The best models can be explored using the PV applet.

ModelsPDB

The predicted interface residues can also be visualized on both proteins as a color gradient (from green to white for high to low probability to be at the interface).

PredictedInterface

The evolutionary conservation of each partner (calculated with Rate4Site [11]) can be visualized on both partners as a color gradient, from red (more conserved) to white (more diverse) through yellow (mild conservation).

Conservation

A PyMOL script provided in the results zip archive for each run automatically loads the 30 models from the results zip file and colors them by interface residue consensus and evolutionary conservation.
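
Since the Rate4Site conservation results are written into the B-factor field of the PDB files in the results archive, they can also be read outside of PyMOL. The sketch below is one possible way to do so, assuming standard fixed-column PDB ATOM records; the function name `residue_bfactors` is hypothetical and the values are returned exactly as stored in the file, without interpretation.

```python
from typing import Dict, Tuple

ResidueKey = Tuple[str, int, str]  # (chain identifier, residue number, insertion code)


def residue_bfactors(pdb_text: str) -> Dict[ResidueKey, float]:
    """Map each residue to the B-factor value of its first ATOM record."""
    values: Dict[ResidueKey, float] = {}
    for line in pdb_text.splitlines():
        # Only protein ATOM records with a complete B-factor field are read.
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        chain = line[21]                 # column 22: chain identifier
        residue_number = int(line[22:26])  # columns 23-26: residue sequence number
        insertion_code = line[26].strip()  # column 27: insertion code
        bfactor = float(line[60:66])     # columns 61-66: temperature factor field
        # Keep the first value seen for each residue.
        values.setdefault((chain, residue_number, insertion_code), bfactor)
    return values
```

A typical use would be to read one model file from the unzipped archive with `open(path).read()` and pass the text to `residue_bfactors` to list per-residue values for each chain.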

Examples

Example 1 (demo mode): complex between the STAS domain and the acyl carrier protein (ACP) in bacterium Hahella chejuensis

The result page of this demo without constraint can be accessed here and the result page of the demo with constraints 69A can be accessed here.

Experimental Complex

Example of the best ranked InterEvDock2 model using constraint 69A for the H. chejuensis complex between the STAS domain (in shades of yellow and red) and the ACP protein (in shades of blue), compared to the reference E.coli complex (PDB 3NY7, in dark grey).

Starting from input sequences only, InterEvDock2 builds models of the two chains and performs the docking procedure. The results can be compared with the coordinates of the reference PDB complex between the same proteins in Escherichia coli (PDB:3NY7). The STAS domain of H. chejuensis has 43% sequence identity with the STAS domain of E. coli and the ACP has 79% sequence identity. When using no constraint, the H. chejuensis STAS/ACP model ranked as top 6 in the top 10 consensus with an iRMSD of 2.3 Å ("Acceptable" quality according to the CAPRI criteria) compared to reference complex 3NY7. When using a constraint on position 69A involved in the interface, the best model (top 1 in the top 10 consensus) has an iRMSD of 3.95 Å ("Acceptable" quality according to the CAPRI criteria).

Example 2: Example from Weng's Benchmark 4

Experimental Complex

Example of the RAN-NTF2 complex (PDB:1A2K) taken from Weng's Benchmark v4.

In this particular case of Weng's Benchmark database v4, InterEvDock returned a model with an iRMSD of 2.7 Å with respect to the native structure of the complex (PDB:1A2K). This model was the representative structure of the third best cluster. In contrast, neither ZDOCK, ZRANK nor FRODOCK alone ranked an "Acceptable" model among their top clusters.

Example 3: An example from CAPRI30 (target T72)

In the CAPRI30 session, in which our group performed very well [12], targets of CASP11 crystallizing as homo-oligomers were proposed as CAPRI targets. Using the same strategy as implemented in the InterEvDock pipeline (rigid-body docking followed by scoring with multiple scores including InterEvScore), we submitted the model with the lowest interface RMSD (3.5 Å) for target T72. In the case of target T72, template-based modelling led to a misleading assembly prediction (right figure). This illustrates that, at low sequence identity, assembly modes can differ substantially between remote homologs.

Experimental Complex

X-ray structure of CASP11 target T0770 (PDB:4Q69), CAPRI30 target T72

Best Model

Structural model generated with correct orientation rated as "Acceptable" by CAPRI with interface RMSD of 3.5 Å obtained following a free docking procedure using the InterEvDock pipeline

Template Based Model

Incorrect model resulting from simple template-based modelling based on PDB:3MX3 (sequence identity 20%)

Benchmark

InterEvScore results

InterEvScore achieves significant improvement over traditional scoring functions on the 54 test cases from Weng's docking benchmark v4 with available coupled multiple sequence alignments and near-native decoys [2]. The use of evolution increased the quality of the prediction and was never detrimental in the discrimination of near-native interfaces, even though the number of sequences in the coupled alignments could be limited (between 10 and 100 species with an average of 35). In addition, we did not find the scoring improvement on inclusion of evolutionary data to be limited to certain categories of complexes (except for antibody-antigen complexes).

InterEvDock2 benchmark dataset

Performance of InterEvDock2 was assessed on 790 complexes from the PPI4DOCK database [13] for which the structures of the free proteins (unbound) can be modeled and for which evolutionary information could be retrieved [2]. A table of all 790 benchmark cases is available here.

InterEvDock2 results

The InterEvDock2 server predicts an "Acceptable" or better solution in the consensus top10 for 218 out of 790 test cases (28%). As expected, this top10 success rate decreases with increasing difficulty of the docking cases: 38% for very easy targets, 28% for easy targets and 11% for hard targets according to PPI4DOCK difficulty classification.

The InterEvDock2 server also predicts residues making contacts at the interface of a complex based on the analysis of all the interfaces of the top10 decoys for all three scores (30 models). In 89% of the 790 test cases, at least one residue out of 10 was correctly predicted as present at the interface, providing very useful hints to guide mutagenesis experiments to disrupt a complex of interest. Of note, there is little decrease in precision from the very easy to the hard cases (success rate is 91% for very easy targets, 90% for easy targets and 86% for hard targets according to PPI4DOCK difficulty classification). Predictions of the InterEvDock2 server can thus also be used as a prior to constrain more thorough docking simulations requiring flexibility in order to model the correct orientation between two binding partners. From that perspective, in 52% of the cases, at least one correct residue is predicted on both sides of the interface (61% for very easy targets, 52% for easy targets and 38% for hard targets according to PPI4DOCK difficulty classification).
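
One simple way to derive a short list of likely interface residues from a set of models is to count how often each residue appears at the interface. The sketch below illustrates that idea only; it is an assumption, since the exact ranking scheme used by the server is not detailed on this page, and the function name `top_interface_residues` and the residue labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def top_interface_residues(interfaces: Iterable[Set[str]], n: int = 5) -> List[str]:
    """Rank residues of one partner by the number of models in which they are at the interface."""
    counts: Counter = Counter()
    for residues in interfaces:
        counts.update(residues)  # each model contributes at most once per residue
    # Sort by decreasing count, then by label so that ties are resolved reproducibly.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [residue for residue, _ in ranked[:n]]
```

With the 30 selected models, `interfaces` would hold 30 sets of residue labels for one partner, and the function would be called once per partner.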

References

[1] Ramirez-Aportela E, Lopéz-Blanco JR, Chacon P.
FRODOCK 2.0: fast protein-protein docking server.
Bioinformatics. 2016; 32(15):2386-8.
[2] Andreani J, Faure G, Guerois R.
InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution.
Bioinformatics. 2013; 29(14):1742-9.
[3] Dong GQ, Fan H, Schneidman-Duhovny D, Webb B, Sali A.
Optimized atomic statistical potentials: assessment of protein interfaces and loops.
Bioinformatics. 2013; 29(24):3158-66.
[4] Yu J, Vavrusa M, Andreani J, Rey J, Tufféry P, Guerois R.
InterEvDock: A docking server to predict the structure of protein-protein interactions using evolutionary information.
Nucleic Acids Res. 2016; 44(W1):W542-9.
[5] Tatusova TA, Madden TL.
BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences.
FEMS Microbiol Lett. 1999;174(2):247-50.
[6] Remmert M, Biegert A, Hauser A, Soding J.
HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment.
Nat Methods. 2011;9(2):173-5.
[7] Soding J.
Protein homology detection by HMM-HMM comparison.
Bioinformatics. 2005; 21(7):951-60.
[8] Song Y, DiMaio F, Wang RY-R, Kim D, Miles C, Brunette TJ, Thompson J, Baker D.
High resolution comparative modeling with RosettaCM.
Structure. 2013; 21(10):1735-42.
[9] Rodrigues JPGLM, Trellet M, Schmitz C, Kastritis P, Karaca E, Melquiond ASJ, Bonvin AMJJ.
Clustering biomolecular complexes by residue contacts similarity.
Proteins. 2012; 80(7):1810-7.
[10] Hwang H, Vreven T, Janin J, Weng Z.
Protein-protein docking benchmark version 4.0.
Proteins. 2010; 78(15):3111-4.
[11] Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N.
Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues.
Bioinformatics. 2002; 18 Suppl 1:S71-77.
[12] Lensink M, Velankar S, Kryshtafovych A, Wodak S.
Table CAPRI30: Ranking by number or INTERFACES for which at least one 'Acceptable' solution was obtained.
In: Presentation from the CAPRI30 Cancun meeting.
[13] Yu J, Guerois R.
PPI4DOCK: large scale assessment of the use of homology models in free docking over more than 1000 realistic targets.
Bioinformatics. 2016; 32(24):3760-3767.

Funding

This work was supported by the French Infrastructure for Integrated Structural Biology (FRISBI) [ANR-10-INSB-05-01]; ANR-IAB-2011-BIP:BIP [ANR-10-BINF-0003]; IFB [ANR-11-INBS-0013]; CHIPSET [ANR-15-CE11-0008-01]. PPI4DOCK benchmarking was done through granted access to the HPC resources of CCRT under the allocations 2015-7078, 2016-7078 and 2017-7078 by GENCI (Grand Equipement National de Calcul Intensif).