InterEvDock2

A docking server to predict the structure of protein-protein interactions using evolutionary information.

Run InterEvDock2

The InterEvDock2 service is integrated in the RPBS Mobyle Portal.

Overview

This website is free and requires no login. Note that for homology modeling, using this service starting from sequences is allowed only for non-commercial users (non-profit/academic/government research groups). Commercial users are asked to submit their own models.

InterEvDock-ataglance

Two protein structures (or structural models generated on the fly from input sequences) and their respective multiple sequence alignments are used to predict binding modes through a free docking procedure.

Why InterEvDock2?

The structural modelling of protein-protein interactions is key in understanding how cell machineries assemble and cross-talk with each other. When homologous sequences are available for both protein partners, it is very useful to rely on structures and multiple sequence alignments to identify binding interfaces. InterEvDock2 is a server for protein docking running the InterEvScore potential specifically designed to integrate evolutionary information in the docking process. The InterEvScore potential was developed for heteromeric protein interfaces and combines a residue-based multi-body statistical potential with evolutionary information derived from the multiple sequence alignments of each partner in the complex. In the InterEvDock2 server, the systematic docking search is performed using the FRODOCK2 program [1] and the resulting models are re-scored with InterEvScore [2] together with the SOAP_PP atom-based statistical potential [3] found to increase the confidence of the predictions.

InterEvDock2 is an update of InterEvDock [4] that can handle protein sequences as inputs, and not only protein 3D structures. When a sequence is provided by the user, a comparative modeling step based on an automatic template search protocol builds models for the individual protein partners prior to docking. If the user has biological information, such as a position known to be involved in the interface between the two protein partners, constraints can be specified for use in the docking procedure. This can be crucial to ensure that all available biologically relevant information is used for InterEvDock2 predictions. In addition, InterEvDock2 makes it possible to submit structures of oligomers as input to the free docking. Such an option is generally complicated in co-evolution analyses since the joint MSAs have to be generated for every chain of an oligomer. This process is now fully automated in InterEvDock2.

When using this service, please cite the following references:

Quignot C, Rey J, Yu J, Tufféry P, Guerois R, Andreani J.
InterEvDock2: an expanded server for protein docking using evolutionary and biological information from homology models and multimeric inputs.
Nucleic Acids Res. 2018 Jul 2;46(W1):W408-16.
Yu J, Vavrusa M, Andreani J, Rey J, Tufféry P, Guerois R.
InterEvDock: A docking server to predict the structure of protein-protein interactions using evolutionary information.
Nucleic Acids Res. 2016 Jul 8;44(W1):W542-9.
Andreani J, Faure G, Guerois R.
InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution.
Bioinformatics. 2013;29(14):1742-9.

Please also cite the FRODOCK2 program, which is used for the rigid-body docking step:

Ramírez-Aportela E, López-Blanco JR, Chacón P.
FRODOCK 2.0: fast protein-protein docking server.
Bioinformatics. 2016;32(15):2386-8.

Using the results of SOAP_PP, please cite:

Dong GQ, Fan H, Schneidman-Duhovny D, Webb B, Sali A.
Optimized atomic statistical potentials: assessment of protein interfaces and loops.
Bioinformatics. 2013;29(24):3158-66.

Using the evolutionary conservation results obtained using Rate4Site (mapped onto all visualized models in the PV applet and written into the b-factor field of the PDB files provided for all models in the results zip archive) please cite:

Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N.
Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues.
Bioinformatics. 2002; 18 Suppl 1:S71-77.

Using the comparative modeling protocol based on RosettaCM (i.e. if your input consists of one or two sequences), please cite:

Song Y, DiMaio F, Wang RY-R, Kim D, Miles C, Brunette TJ, Thompson J, Baker D.
High resolution comparative modeling with RosettaCM.
Structure. 2013;21(10):1735-42.

Using the automatic template search (i.e. if your input consists of one or two sequences and you did not specify a template), please cite:

Remmert M, Biegert A, Hauser A, Söding J.
HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment.
Nat Methods. 2011;9(2):173-5.
Soding J.
Protein homology detection by HMM-HMM comparison.
Bioinformatics. 2005; 21(7):951-60.

Latest news

  • April 5, 2018: Breakpoints and hot start mechanism implemented.
  • January 25, 2018: Better output.
  • December 18, 2017: Service opens.
  • November 1, 2017: Service prototype opens for tests.

InterEvDock2 design

InterEvDock2 takes as input either the structures of two protein partners to be docked (experimental or modelled structures, possibly multimeric), or their sequences.

Given the sequences, the server runs several steps to generate a model of the structures:

  • Use HHblits [6] against the uniprot20 database to build a profile, then attempt to identify a template structure using HHsearch [7] against the PDB70.
  • If a template could be identified by HHsearch with a probability higher than 95%, then a 3D model is generated using a fast comparative modeling protocol based on RosettaCM [8].

Given the structures (either input by the user or modeled from input sequences), the server runs several steps to propose the 10 most likely models for each score (InterEvScore, SOAP_PP and FRODOCK scores), as well as 10 consensus models and the 5 most likely interface residues on each protein:

  • Extracts the sequences of both partners and automatically builds two multiple sequence alignments ranking homologs of the same species in the same order. Both alignments are used in InterEvScore scoring. This process is fully automated, including for multimeric inputs. Users can also submit their own co-alignments.
  • Performs an exhaustive rigid-body search using the FRODOCK2 algorithm [1].
  • If constraints are provided by the user: Filter FRODOCK2 decoys according to user-defined constraints.
  • If joint MSAs are not provided for the two protein partners, a joint MSA is generated automatically by the server using a blastp [5] search against the UniProtKB database with thresholds of sequence identity > 30%, coverage > 75% and E-value < 10⁻⁴. Only the sequence with the best identity is kept per species. Pairs of sequences belonging to the same species are collected and redundant paired sequences with sequence identity higher than 90% with the query are removed. Sequences are re-aligned using MAFFT giving, in the end, a set of two MSAs containing exactly the same number of sequences in the same species order.
  • FRODOCK2 decoys are clustered using a ligand RMSD threshold of 4 Å and ranked with respect to their energy.
  • The best 10,000 FRODOCK clusters are scored by InterEvScore [2] and SOAP_PP [3] potentials.
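  • The top 10 models according to each of the three scores (InterEvScore, SOAP_PP and FRODOCK2) form a pool of 30 models. The 10 most likely models out of those 30 are selected by the InterEvDock consensus method, which groups similar models that are well ranked by different scoring functions.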
  • Finally, a selection of 5 residues on each protein is proposed. Those are the residues most likely involved in the interface based on the best models, which can subsequently be used to implement constraints in further rigid-body or flexible docking simulations or to guide mutagenesis for interface disruption. Users can focus on the top 1 or 2 predicted residues on each chain which should already provide relevant constraints.
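The species-pairing logic used to build joint MSAs can be illustrated with a minimal Python sketch. It is not the server's code: the `Hit` record, the function names and the reading of the redundancy rule (a pair is dropped when both homologs exceed 90% identity to their query) are assumptions made for illustration. Only the thresholds come from the description above.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one blastp hit (not an InterEvDock2 data type)."""
    species: str
    accession: str
    identity: float  # percent identity to the query
    coverage: float  # percent of the query covered
    evalue: float


def best_hit_per_species(hits, min_identity=30.0, min_coverage=75.0, max_evalue=1e-4):
    """Keep, for each species, the hit with the best identity that passes the thresholds."""
    best = {}
    for hit in hits:
        if hit.identity <= min_identity or hit.coverage <= min_coverage:
            continue
        if hit.evalue >= max_evalue:
            continue
        current = best.get(hit.species)
        if current is None or hit.identity > current.identity:
            best[hit.species] = hit
    return best


def pair_by_species(hits_a, hits_b, max_redundancy=90.0):
    """Pair the homologs of partners A and B found in the same species.

    Assumption: a pair is considered redundant with the query, and dropped,
    when both homologs are more than `max_redundancy` percent identical to it.
    """
    best_a = best_hit_per_species(hits_a)
    best_b = best_hit_per_species(hits_b)
    pairs = []
    # Sorting the shared species gives both alignments the same row order.
    for species in sorted(best_a.keys() & best_b.keys()):
        a, b = best_a[species], best_b[species]
        if a.identity > max_redundancy and b.identity > max_redundancy:
            continue
        pairs.append((a, b))
    return pairs
```

The paired sequences would then be re-aligned (the server uses MAFFT) so that row i of both alignments comes from the same species.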

The docking server implements 3 major methods:

  • FRODOCK2 (developed in Pablo Chacón's lab [1]), a rigid-body docking method which combines a search algorithm based on spherical harmonics, an energy-based scoring function including van der Waals, electrostatics and desolvation terms, and a complementary knowledge-based potential. FRODOCK2 has competitive success rates and efficiency when challenged on Weng's Benchmark v4 [9].

  • InterEvScore (developed in Guerois' lab [2]), a scoring function combining a residue-based statistical potential, including both two- and three-body terms, with the scoring of interface contacts inferred from multiple sequence alignments. This way of integrating evolutionary information was found significantly superior to solely accounting for conserved positions. The server version includes the mode in which InterEvScore uses evolutionary information only for residues belonging to apolar patches, as described in [2].

    InterEvScore-components
  • SOAP_PP (developed in Andrej Sali's lab [3]), an atom-based statistical potential dedicated to protein-protein interactions, derived from a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP) in which the reference state is replaced with data-driven ‘recovery’ functions. Using the relative orientation between two covalent bonds, instead of a simple distance between two atoms, helps capture orientation-dependent interactions such as hydrogen bonds.

What InterEvDock does not perform

InterEvDock is a method for generating complexes between two protein structures modelled as rigid-body subunits. Selected models may have some clashes that can be resolved upon relaxation of the models.

InterEvDock assumes that the interaction between the two submitted subunits has been experimentally validated. It is not designed to predict either the likelihood or the strength of the interaction.

Large conformational changes upon binding are generally not well predicted.

Features

  • User input can be a protein sequence for one or both partners. In this case template search and comparative modeling are performed on the fly by the InterEvDock2 server before docking. Optionally, the user may provide a PDB template to use for comparative modelling.
  • User input can be a multimeric protein structure. The generation of the joint MSAs is fully automated as for monomeric structures.
  • Generation of multiple sequence alignments for co-evolved partners: The InterEvDock server generates multiple sequence alignments of the binding partners, so that homologs of the same species are aligned in the same order in both alignments.
  • User input can be a query-template alignment for one or both partners, provided that the template header starts with the 4-letter PDB identifier of the template and the chain in question, separated by a "_" and directly followed by the code-word ":AUTOPDB", as InterEvDock2 would return it after the second breakpoint (i.e. "Breakpoint for Model Inspection prior to Docking").
  • User input can be the job identifier of a previous but non-expired job, in which case InterEvDock2 will run using the raw docking output and the structures used in that job. This is especially useful if the user wants to reconsider their constraints.
  • User-defined constraints: in case the user has biological input such as a position that is known to be involved in the interface between the two protein partners, constraints can be specified for use in the docking procedure.
  • Selection of most likely binding modes: Starting from two structures (or structural models) of interacting proteins, InterEvDock identifies a maximum of 10 candidate binding modes for each of the 3 complementary scores computed. It also offers a selection of the 10 best consensus models.
  • Graphical exploration of the complexes: The structures of the decoys can be explored with the PV applet (M. Biasini), a WebGL-based viewer for proteins and other macromolecular structures. Both the evolutionary conservation of each partner (calculated with Rate4Site [10]) and the consensus interface can be visualized as color gradients on the surface of both protein partners.
  • Selection of 5 residues most likely involved in the interface on each protein: The 5 residues on each protein partner most likely involved in the interface are displayed in a table, together with their rank. Those residues can subsequently be used to implement constraints in flexible docking simulations or to guide mutagenesis for interface disruption. Users can focus on the top 1 or 2 predicted residues on each chain which should already provide relevant constraints.
  • Coordinates of the complexes and alignments are available for further off-web exploration: The selected models of complexes are available in the PDB format. The multiple sequence alignments of each subunit can also be retrieved in fasta format. A PyMOL script provided in the results zip archive for each run automatically loads the 30 models from the results zip file and colors them by interface residue consensus and evolutionary conservation.
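Since the Rate4Site conservation values are written into the B-factor field of the model PDB files, they can be read back for off-web analysis. The sketch below is one way to do so in plain Python, relying only on the fixed column layout of PDB ATOM records. The function name is hypothetical, and the sketch makes no assumption about how the values should be interpreted or scaled.

```python
def residue_bfactors(pdb_text):
    """Map (chain, residue number, insertion code) to (residue name, B-factor value).

    Only ATOM records are read. The first atom of each residue is used, on the
    assumption that every atom of a residue carries the same per-residue value.
    """
    values = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        resname = line[17:20].strip()   # columns 18-20
        chain = line[21]                # column 22
        resnum = int(line[22:26])       # columns 23-26
        icode = line[26].strip()        # column 27
        bfactor = float(line[60:66])    # columns 61-66
        values.setdefault((chain, resnum, icode), (resname, bfactor))
    return values
```

For example, `residue_bfactors(open("model.pdb").read())` (with a hypothetical file name) returns one entry per residue, which can then be sorted or mapped onto a structure in any viewer.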

Limitations

  • Subunit size: Size of each submitted subunit should lie in the 10-3000 amino acids range.
  • Design for proteins only: the server is currently not able to dock nucleic acids or small molecules. When nucleic acids or ligands are present in a protein chain, they will be kept only as steric objects.
  • Browser compatibility: InterEvDock2 has been tested on various browsers under various operating systems. In general, Windows users should prefer Chrome or Opera over Edge, for which we observed unstable behavior depending on system configuration.

Usage

Input

Simple options

The simplest input consists of two fields to specify either the sequence or the 3D structure of each protein to be docked.

  • Protein A structure or sequence: Fill one of the two fields. Warning: if both fields are filled, only the protein structure will be taken into account.

    • Protein A structure: It corresponds to the structure of the protein. It must be in the PDB format. It can be pasted, uploaded or retrieved automatically by indicating the PDB identifier and optionally one or several chain identifiers (for instance: "1a2yA" for PDB identifier 1a2y and chain A). Note that InterEvDock2 will score docking models only on standard amino acids of proteins. Nucleic acid chains or ligands are integrated in the rigid-body docking but are not scored. Solvent is removed.
    • Protein A sequence: It corresponds to the sequence of the protein in case the user does not know its structure. It must be in the FASTA format. It can be pasted or uploaded. If no protein A structure is provided but a protein A sequence is provided, automatic template search and homology modeling will be performed by the server to build a structural model for protein A.
  • Protein B structure or sequence: Same as above for the second protein partner.

Advanced options

Workflow controls (optional)

These controls address two different points:

  • Starting from sequences, InterEvDock2 will attempt to perform automated homology modeling using HHsearch. However, docking performance is strongly dependent on the quality of the models. Consequently, InterEvDock2 will stop when it fails to identify a satisfactory template, since a poor template may result in models with strange geometries and aberrant linear regions. It is also possible that the automatic choice of the template is suboptimal. To provide some control over the modeling process, InterEvDock2 offers the possibility to stop before performing docking by defining breakpoints, so as to allow step-by-step modelling.
  • Another facility concerns the refinement of user-specified constraints. InterEvDock2 proposes a hot start mechanism to build on the results of a previous run.

Two controls are available:

ProtocolSelection
  • Breakpoints:

    • Breakpoint 1 (Template Selection prior to Modelling): Option used to stop the run and select the structural templates most suitable for each input sequence (partners A and B). In a second step, the docking simulation can be run indicating the selected template, following the protocol below.
    • Breakpoint 2 (Model Inspection prior to Docking): Option used to analyze the structures or the models generated just before docking in order to inspect the modelled structures in a 3D viewer and help define the constraints by clicking on the 3D structures, for instance. Once models are validated, users can go back to the InterEvDock2 form and submit their structures. Remember to switch breakpoint option to 'No' so that auto-mode is selected and no breakpoint activated.
  • Hot start: It is possible to re-run docking using different conditions such as different constraints or different co-alignments. When the identifier of a previous run (on the same two partners) is specified, the calculation is much faster since the docking step has already been calculated and only the scoring steps need to be re-run.

Impose constraints (optional)

A list of constraints can optionally be specified. Each constraint should be either a position from protein A or protein B, or a pair of positions, one from protein A and one from protein B. Each constraint must contain at least one colon ":" used to specify which of the two partners the constraint applies to (before the colon for the first partner, after the colon for the second partner). For example:

11A:

if we want residue at position 11 on chain A in protein A to be at the interface;

:20B

if we want residue at position 20 on chain B in protein B to be at the interface;

11A:20B

to enforce a contact between position 11 on chain A in protein A and position 20 on chain B in protein B.

Positions are numbered according to the PDB numbering if the input structure was provided and according to sequential numbering of the FASTA sequence if only the input sequence was provided. In the latter case, the chain name should be A by default. Several constraints can be specified, separated by whitespace (space, tabulation or newline). A distance can be optionally specified for each constraint, for instance

11A::7

for a distance of 7 Å on position 11A (note the two colon “:” separators) and

11A:20B:7

for a distance of 7 Å between 11A and 20B. The default distance is 8 Å for single residues and 11 Å for pairs. These constraints will be taken into account in the docking process to keep only models having this position or pair of positions at the complex interface. Constraints will be checked prior to applying and constraints involving residues not present or buried in the structural model will be excluded. Note that when several constraints are provided, they are considered cumulative ("AND" not "OR") i.e. docking models will be filtered to retain only solutions that verify all constraints.
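To check a constraint list before submission, the syntax described above can be parsed with a short Python sketch. It is an illustration, not the server's parser: the `Constraint` type and function names are hypothetical, and positions are assumed to be an integer followed by a single-letter chain identifier (insertion codes and other chain names are not handled).

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumption: a position is an integer followed by a one-letter chain name, e.g. "11A".
POSITION = re.compile(r"^(-?\d+)([A-Za-z])$")


class Constraint(NamedTuple):
    site_a: Optional[Tuple[int, str]]  # (residue number, chain) on protein A
    site_b: Optional[Tuple[int, str]]  # (residue number, chain) on protein B
    distance: float                    # in Å


def _parse_site(token):
    if not token:
        return None
    match = POSITION.match(token)
    if match is None:
        raise ValueError(f"unrecognized position: {token!r}")
    return int(match.group(1)), match.group(2)


def parse_constraint(text):
    """Parse one constraint such as '11A:', ':20B', '11A:20B', '11A::7' or '11A:20B:7'."""
    fields = text.split(":")
    if len(fields) not in (2, 3):
        raise ValueError(f"expected one or two colons in {text!r}")
    site_a = _parse_site(fields[0])
    site_b = _parse_site(fields[1])
    if site_a is None and site_b is None:
        raise ValueError(f"no position given in {text!r}")
    is_pair = site_a is not None and site_b is not None
    if len(fields) == 3 and fields[2]:
        distance = float(fields[2])
    else:
        # Default distances stated in the text: 8 Å for single residues, 11 Å for pairs.
        distance = 11.0 if is_pair else 8.0
    return Constraint(site_a, site_b, distance)


def parse_constraints(text):
    """Parse a whitespace-separated list of constraints (all must hold: 'AND')."""
    return [parse_constraint(token) for token in text.split()]
```

For instance, `parse_constraints("11A: :20B 11A:20B:7")` yields three constraints, the first two with the default distance of 8 Å and the last with a distance of 7 Å.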


To avoid round-trips outside the browser, the online NGL Viewer can be used to quickly identify residues.

Impose sequence-template alignment for Protein A (optional)

SequenceTemplate

Two fields can optionally be filled if the user would like a specific template to be used in the modeling.

  • Protein A sequence template alignment (optional): The sequence-template alignment in FASTA format. The first sequence should correspond to the input sequence for protein A and the second sequence should correspond to the sequence of the desired template structure. The coordinates of the template PDB should be provided in the field immediately below, except when the alignment comes from the result of a Template Selection step activated by Breakpoint 1. When using sequence-template alignments copied from the output of InterEvDock2 Breakpoint 1, it is not required to provide a template PDB since it will be automatically retrieved from the PDB code written in the tag (>PDB_chain:AUTOPDB;) in the header of the template.
  • Protein A template structure: The template structure in PDB format. As for the input structure, it can be pasted, uploaded or retrieved from the PDB by indicating the PDB identifier and chain (for instance: "1a2yA" for PDB identifier 1a2y and chain A). If a template is provided but no query-template alignment is provided, the server will automatically build a query-template alignment.
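As an illustration of the header convention mentioned above, the following Python sketch checks whether a template header carries the ":AUTOPDB" tag and extracts the PDB identifier and chain. The regular expression is an assumption based on the description (a 4-character PDB identifier, "_", a one-character chain, then ":AUTOPDB"), and the function name is hypothetical.

```python
import re

# Assumed pattern: optional ">", 4-character PDB code, "_", chain, then ":AUTOPDB".
AUTOPDB_HEADER = re.compile(r"^>?([0-9A-Za-z]{4})_([0-9A-Za-z]):AUTOPDB")


def parse_autopdb_header(header):
    """Return (pdb_id, chain) if the header carries the AUTOPDB tag, else None."""
    match = AUTOPDB_HEADER.match(header.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)
```

A header without the tag returns `None`, which in this sketch would mean that the template coordinates have to be supplied in the template structure field.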

Impose sequence-template alignment for Protein B (optional)

Same as above for the second protein partner.

Tune homology modeling parameters (optional)

Structural models are generated following the sequence-template alignment calculated by the server or provided by users in the dedicated frame.

ModelParameters

In standard conditions, insertions (loops) are modeled up to a length of 14 residues. Beyond this limit, residues are not modeled and a break in the chain is created. Long loops might be detrimental to the validity of the subsequent rigid-body docking, so users can reduce this threshold to limit the length of the modeled loops. Similarly, tails at the N-terminus and C-terminus are generally not modeled if they are not observed in the template PDB. Users can decide to model them up to a certain number of residues by changing the parameters. In the sequence-template alignment frame, N-ter and C-ter tail residues correspond to residues that are aligned with gaps at the extremities of the alignment.
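To anticipate which regions of a query would fall into these categories, a sequence-template alignment can be scanned as in the Python sketch below. It is only an interpretation of the rules stated above: the function name and return structure are hypothetical, an insertion is taken to be a run of query residues facing template gaps, and tails are the insertions located before the first or after the last aligned column.

```python
def classify_unaligned_regions(query_aln, template_aln, max_loop=14):
    """Classify query residues that face gaps in the template.

    Both arguments are aligned sequences of equal length using '-' for gaps.
    Returns the N-ter and C-ter tail lengths and a list of internal loops as
    (first query position (1-based), length, modeled) tuples, where `modeled`
    is True when the loop length does not exceed `max_loop`.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(query_aln, template_aln))
    aligned = [i for i, (q, t) in enumerate(columns) if q != "-" and t != "-"]
    if not aligned:
        raise ValueError("no aligned column between query and template")
    first, last = aligned[0], aligned[-1]

    n_tail = c_tail = 0
    loops = []
    query_pos = 0               # 1-based position in the ungapped query
    run_start, run_len = None, 0
    for i, (q, t) in enumerate(columns):
        if q != "-":
            query_pos += 1
        insertion = q != "-" and t == "-"
        if insertion and i < first:
            n_tail += 1
        elif insertion and i > last:
            c_tail += 1
        elif insertion:
            if run_len == 0:
                run_start = query_pos
            run_len += 1
        elif run_len:
            # Any other column type closes the current internal insertion.
            loops.append((run_start, run_len, run_len <= max_loop))
            run_len = 0
    return {"n_tail": n_tail, "c_tail": c_tail, "loops": loops}
```

Loops flagged with `modeled=False` correspond to the case where a chain break would be expected under the default threshold of 14 residues.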

Impose co-alignments (optional)

Two fields can optionally be filled to specify the multiple sequence alignments associated with each 3D structure. The first sequence in each alignment should match as closely as possible the sequence of the corresponding PDB file (except for gaps, represented by '-'). The two alignments must contain sequences from the same set of species, appearing in the exact same order. If the multiple sequence alignments are not provided, they will be automatically generated by the InterEvDock server and will be provided in the results to facilitate potential re-submission.
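Before submitting custom co-alignments, their basic consistency can be checked offline. The Python sketch below is a possible pre-submission check, not part of the server. Function names are hypothetical, and because no header convention is specified here, the species comparison relies on a user-supplied `species_of` function that extracts a species label from a FASTA header.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_coalignments(msa_a, msa_b, species_of=None):
    """Return a list of problems found in a pair of co-alignments (empty if none).

    `species_of` is an optional, user-supplied function mapping a header to a
    species label; without it, only sequence counts and row lengths are checked.
    """
    problems = []
    for name, msa in (("A", msa_a), ("B", msa_b)):
        if not msa:
            problems.append(f"alignment {name} is empty")
        elif len({len(seq) for _, seq in msa}) != 1:
            problems.append(f"alignment {name} has rows of different lengths")
    if len(msa_a) != len(msa_b):
        problems.append("the two alignments contain different numbers of sequences")
    elif species_of is not None:
        for row, ((head_a, _), (head_b, _)) in enumerate(zip(msa_a, msa_b), start=1):
            if species_of(head_a) != species_of(head_b):
                problems.append(f"row {row}: species differ between the two alignments")
    return problems
```

With headers such as `>id|SPECIES` (a made-up convention), one could pass `species_of=lambda h: h.split("|")[1]`.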

Demonstration mode

The demonstration mode is accessible from the InterEvDock2 page by setting the Demonstration Mode option to "Yes". InterEvDock2 then loads a pre-configured test case in which the complex between the STAS domain and the acyl carrier protein of the bacterium Hahella chejuensis is modeled.
There are four versions of the Demonstration Mode, all starting from input sequences only.

  • The first two demonstrate the use of breakpoints after template search and after modeling.
  • The third illustrates a full run with automated modeling and docking without constraint and the last one includes a constraint so that position 69 in the STAS domain has to be at the interface of the resulting models.

The results can be compared with the coordinates of the reference PDB complex between the same proteins in Escherichia coli (PDB:3NY7). By switching on the demonstration mode, other input data specified in the other fields will be ignored.

Results

  • Progress report

    ProgressReport

    This section will incrementally provide information about job progression and errors if any. A typical run should produce a report similar to the one shown above. Errors related to the input data specified are also reported in this field.

  • Note: InterEvDock runs last about 30 minutes for medium-sized proteins (200-300 residues) if input structures and alignments are provided by the user. Runs including modeling from input sequences take about 35 to 55 minutes if alignments are provided. If only input sequences are provided and co-alignments need to be calculated, the run will last around 1 hour for medium-sized proteins. If constraints are provided, the run will be of similar or shorter duration.
  • An interactive page allowing to browse the best complexes generated (see below Visualization and post-processing).
  • A zip archive containing the PDB files of the models, where models are indexed according to InterEvScore, FRODOCK or SOAP_PP scores, as well as two tables reporting the 10 consensus models and the 5 most likely interface residues on each partner.
  • Two multiple sequence alignments generated and used for the InterEvScore scoring.

Visualization and post-processing of identified templates (break after template identification)

The results related to template identification are divided into four sections.

  1. An overview of the coverage by each template identified. Clicking on the identifier will position the cursor on the corresponding alignment.
    Templates-overview1
  2. A summary of the score for each template. Clicking on the identifier will open the corresponding PDB page.
    Templates-overview2
  3. The alignments that can be re-used by InterEvDock to drive modeling when specifying a template.
    Templates-overview3
  4. The initial HHsearch alignments.
    Templates-overview4

Visualization of generated homology models (break after modeling)

PDB files of the models generated for protein A and/or protein B are presented in the following form:

HomologyModel

The structure can be visualized interactively by clicking on the PV or NGL button located at the bottom of the window. With PV, residue names and numbers are displayed in the selected text field as the mouse cursor moves over the structure, which can be used to define constraints for the docking. With NGL, activating the label item triggers the display of residue labels.

PV
NGL

Visualization and post-processing of resulting models

The best models can be explored using the PV applet.

ModelsPDB

The predicted interface residues can also be visualized on both proteins as a color gradient (from green to white for high to low probability to be at the interface).

PredictedInterface

The evolutionary conservation of each partner (calculated with Rate4Site [10]) can be visualized on both partners as a color gradient, from red (more conserved) to white (more diverse) through yellow (mild conservation).

Conservation

A PyMOL script provided in the results zip archive for each run automatically loads the 30 models from the results zip file and colors them by interface residue consensus and evolutionary conservation.

InterEvScore

The InterEvScore, SOAP_PP and FRODOCK scores are reported for the 10 best complexes.

Visualization of constraints satisfaction

Constraint-satisfaction

If constraints were specified in input, information about each constraint satisfaction is presented as a table.

Examples

Example 1 (demo mode): complex between the STAS domain and the acyl carrier protein (ACP) in bacterium Hahella chejuensis

The result page of this demo executed with a breakpoint after template search can be accessed here and the result page of the demo executed with a breakpoint after modeling can be accessed here.

The result page of this demo without constraint can be accessed here and the result page of the demo with constraint 69A can be accessed here.

Experimental Complex

Example of the best ranked InterEvDock2 model using constraint 69A for the H. chejuensis complex between the STAS domain (in shades of yellow and red) and the ACP protein (in shades of blue), compared to the reference E. coli complex (PDB 3NY7, in dark grey).

Starting from input sequences only, InterEvDock2 builds models of the two chains and performs the docking procedure. The results can be compared with the coordinates of the reference PDB complex between the same proteins in Escherichia coli (PDB:3NY7). The STAS domain of H. chejuensis has 43% sequence identity with the STAS domain of E. coli and the ACP has 79% sequence identity. When using no constraint, the H. chejuensis STAS/ACP model ranked as top 6 in the top 10 consensus with an iRMSD of 2.3 Å ("Acceptable" quality according to the CAPRI criteria) compared to reference complex 3NY7. When using a constraint on position 69A involved in the interface, the best model (top 1 in the top 10 consensus) has an iRMSD of 3.95 Å ("Acceptable" quality according to the CAPRI criteria).

Example 2: an example from CAPRI30 (target T72)

In the CAPRI30 session, in which our group performed very well [11], targets of CASP11 crystallizing as homo-oligomers were proposed as CAPRI targets. Using the same strategy as implemented in the InterEvDock pipeline (rigid-body docking followed by scoring with multiple scores including InterEvScore), we submitted the model with the lowest interface RMSD (3.5 Å) for target T72. For target T72, template-based modelling led to a misleading assembly prediction (right figure), illustrating that, at low sequence identity, assembly modes can differ substantially between remote homologs.

Experimental Complex

X-ray structure of CASP11 target T0770 (4Q69), CAPRI30 target T72

Best Model

Structural model generated with correct orientation rated as "Acceptable" by CAPRI with interface RMSD of 3.5 Å obtained following a free docking procedure using the InterEvDock pipeline

Template Based Model

Incorrect model resulting from simple template-based modelling based on PDB 3MX3 (sequence identity 20%)

Benchmark

InterEvScore results

InterEvScore achieves significant improvement over traditional scoring functions on the 54 test cases from Weng's docking benchmark v4 with available coupled multiple sequence alignments and near-native decoys [2]. The use of evolution increased the quality of the prediction and was never detrimental in the discrimination of near-native interfaces, even though the number of sequences in the coupled alignments could be limited (between 10 and 100 species with an average of 35). In addition, we did not find the scoring improvement on inclusion of evolutionary data to be limited to certain categories of complexes (except for antibody-antigen complexes).

InterEvDock2 benchmark dataset

Performance of InterEvDock2 was assessed on 812 complexes from the PPI4DOCK database [12] for which the structures of the free proteins (unbound) can be modeled and for which evolutionary information could be retrieved [2]. A table of all 812 benchmark cases is available here.

InterEvDock2 results

The InterEvDock2 server predicts an "Acceptable" or better solution in the consensus top10 for 239 out of 812 test cases (29%). As expected, this top10 success rate decreases with increasing difficulty of the docking cases: 43% for very easy targets, 30% for easy targets, 11% for hard targets and 5% for very hard targets according to PPI4DOCK difficulty classification.

The InterEvDock2 server also predicts residues making contacts at the interface of a complex based on the analysis of all the interfaces of the top10 decoys for all three scores (30 models). In 91% of the 812 test cases, at least one residue out of 10 was correctly predicted as present at the interface, providing very useful hints to guide mutagenesis experiments to disrupt a complex of interest. Of note, there is little decrease in precision from the rigid-body to the difficult cases (success rate is 92% for very easy targets, 90% for easy targets, 87% for hard targets and 100% for very hard targets according to PPI4DOCK difficulty classification). Predictions of the InterEvDock2 server can thus also be used as a prior to constrain more thorough docking simulations requiring flexibility in order to model the correct orientation between two binding partners. In that perspective, in 51% of the cases, at least one correct residue is predicted on both sides of the interface (59% for very easy targets, 53% for easy targets, 33% for hard targets and 41% for very hard targets according to PPI4DOCK difficulty classification). When considering only the top 1 predicted residue on each chain, at least one of the two predicted residues is correct in 75% of the cases and both are correct in 34% of the cases, highlighting the practical value of InterEvDock2 residue prediction. All those results are significantly higher than a reference interval given by random selection of residues on the surface of the protein.

References

[1] Ramírez-Aportela E, López-Blanco JR, Chacón P.
FRODOCK 2.0: fast protein-protein docking server.
Bioinformatics. 2016;32(15):2386-8.
[2] Andreani J, Faure G, Guerois R.
InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution.
Bioinformatics. 2013;29(14):1742-9.
[3] Dong GQ, Fan H, Schneidman-Duhovny D, Webb B, Sali A.
Optimized atomic statistical potentials: assessment of protein interfaces and loops.
Bioinformatics. 2013;29(24):3158-66.
[4] Yu J, Vavrusa M, Andreani J, Rey J, Tufféry P, Guerois R.
InterEvDock: A docking server to predict the structure of protein-protein interactions using evolutionary information.
Nucleic Acids Res. 2016;44(W1):W542-9.
[5] Tatusova TA, Madden TL.
BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences.
FEMS Microbiol Lett. 1999;174(2):247-50.
[6] Remmert M, Biegert A, Hauser A, Söding J.
HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment.
Nat Methods. 2011;9(2):173-5.
[7] Söding J.
Protein homology detection by HMM-HMM comparison.
Bioinformatics. 2005;21(7):951-60.
[8] Song Y, DiMaio F, Wang RY-R, Kim D, Miles C, Brunette TJ, Thompson J, Baker D.
High resolution comparative modeling with RosettaCM.
Structure. 2013;21(10):1735-42.
[9] Hwang H, Vreven T, Janin J, Weng Z.
Protein-protein docking benchmark version 4.0.
Proteins. 2010;78(15):3111-4.
[10] Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N.
Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues.
Bioinformatics. 2002;18 Suppl 1:S71-77.
[11] Lensink M, Velankar S, Kryshtafovych A, Wodak S.
Table CAPRI30: Ranking by number or INTERFACES for which at least one 'Acceptable' solution was obtained.
In: Presentation from the CAPRI30 Cancun meeting.
[12] Yu J, Guerois R.
PPI4DOCK: large scale assessment of the use of homology models in free docking over more than 1000 realistic targets.
Bioinformatics. 2016;32(24):3760-3767.

Funding

This work was supported by the French Infrastructure for Integrated Structural Biology (FRISBI) [ANR-10-INSB-05-01]; ANR-IAB-2011-BIP:BIP [ANR-10-BINF-0003]; IFB [ANR-11-INBS-0013]; CHIPSET [ANR-15-CE11-0008-01]. PPI4DOCK benchmarking was done through granted access to the HPC resources of CCRT under the allocations 2015-7078, 2016-7078 and 2017-7078 by GENCI (Grand Equipement National de Calcul Intensif).