InterEvDock3

A docking server to predict the structure of protein-protein interactions using evolutionary information.

Run InterEvDock3

The InterEvDock3 service is integrated in the RPBS Mobyle Portal.

This website is free and there is no login requirement. Note that the option for rescoring decoys with the Rosetta interface score is allowed only for non-commercial users (non-profit/academic/government research groups). Commercial users are asked not to activate this option.

InterEvDock-ataglance

Two protein partners and their respective multiple sequence alignments are used to predict binding modes through a free docking procedure. Each partner can be a (set of) protein sequence(s) or a mono- or oligomeric structure. Comparative modeling will be performed if a (set of) sequence(s) is provided for one or both partner(s).

Watch the tutorial video to learn how to use InterEvDock3

Overview

Why InterEvDock3?

The structural modeling of protein-protein interactions is key in understanding how cell machineries assemble and cross-talk with each other. When homologous sequences are available for both protein partners, it is very useful to rely on structures and multiple sequence alignments to identify binding interfaces. InterEvDock3 is a fully automated protein docking server that can start from input sequences or structures and take user constraints. If protein sequences are provided, template search and homology modeling (for monomers and assemblies) are performed on-the-fly. InterEvDock3 returns the most interesting docking models and suggestions of residues to target for mutagenesis studies.

InterEvDock3 runs a hybrid solution for template-based or free docking simulations under evolutionary constraints, thus ensuring the most efficient available approach is used at all steps of the structural prediction of protein assemblies. Joint alignments to exploit coevolution of the two binding partners are built by the server in a fully automated manner and integrated into model selection at an atomic detail through explicitly modeled homologs. In the InterEvDock3 server, the systematic docking search is performed using the FRODOCK2 programme [1] and the resulting models are re-scored with InterEvScore [2] together with the SOAP-PP atom-based statistical potential [3] and optionally also Rosetta's Interface Score [4]. The use of evolutionary information at atomic level was found to increase the confidence of free docking predictions [5].

InterEvDock3 is an update of InterEvDock [6] and InterEvDock2 [7]. Compared to its predecessors, InterEvDock3 features the following major upgrades:

  • MODE 1: InterEvDock3 handles template-based modeling of multiple subunits thanks to the procedure developed in [8]. As a consequence, when a set of sequences is provided (for one or both partners), the best available template will be used to build a multimeric model. When two sets of sequences are provided, the two multimeric models are then passed to the free docking pipeline.
  • MODE 2: InterEvDock3 supports integration of covariation constraints, e.g. from deep-learning-assisted predictions such as those provided by the ComplexContact server.
  • MODE 3: InterEvDock3 integrates co-evolution-guided scoring at atomic-level instead of residue-level [5], which provides a large boost in free docking success rates.

How to cite this service

When using this service, please cite:

InterEvDock3: A combined template-based and free docking server with increased performance through explicit modelling of complex homologs and integration of covariation-based contact maps. Submitted.
Quignot C, Postic G, Bret H, Rey J, Granger P, Murail S, Chacón P, Andreani J*, Tufféry P* and Guerois R*


Please also consider citing (for comparative modeling of complexes) [8] and/or (for free docking) [5].

If using free docking results, please also cite the FRODOCK programme which is used for the rigid-body docking step [1]. When using the results of SOAP-PP, please cite [3]. When using the results of Rosetta interface score, please cite [4]. When using the evolutionary conservation results obtained using Rate4Site (mapped onto all visualized models and written into the B-factor field of the PDB files provided for all models in the results zip archive), please cite [9].

When using comparative modeling results for complexes, please cite [8]. When using the automatic template search, please cite [10] and [11]. Since the comparative modeling protocol is based on OSCAR-star, please also cite [12].

Latest news

  • November 2020: Service prototype opens for tests.
  • December 10, 2020: Service opens.

What InterEvDock3 does not perform

InterEvDock3 assumes that the interaction between the submitted proteins has been experimentally validated. It is not designed to predict either the likelihood or the strength of the interaction.

When free docking is performed, the two partners are modeled as rigid-body subunits. Selected models may contain some clashes that can be relieved upon relaxation of the models.

In the free docking procedure, large conformational changes upon binding are generally not well predicted.

Usage

Simple input options

The simplest input consists of two fields to specify either the sequence(s) or the 3D structures of the partners to be docked. Depending on the input types, different pipelines will be activated in InterEvDock3 (cf. template-based modeling and free docking sections).

input_struct_seq

  • Partner A structure or sequence(s): Fill in one of the two fields.

    • Partner A structure corresponds to the structure of the protein. It must be in the PDB format. It can be pasted, uploaded or retrieved automatically by indicating the PDB identifier and optionally one or several chain identifiers (e.g. "1a2yAB" for PDB identifier 1a2y and chains A and B). Note that InterEvDock3 will score docking models only on standard amino acids of proteins. Nucleic acid chains or ligands are integrated in the rigid-body docking but are not scored. Solvent is removed. Submitting structures to InterEvDock3 will trigger free docking of the given inputs (two structures or a structure and a sequence) cf. free docking with explicit interolog scoring or covariation constraints detailed below.

    • Partner A sequence(s) corresponds to the sequence (or set of sequences) of the partner in case the user does not know its(their) structure. It must be in the FASTA format. It can be pasted or uploaded. If no partner A structure is provided but a partner A set of sequences is provided, automatic template search and homology modeling will be performed by the server to build a (possibly multimeric) structural model for partner A (see the template-based docking section below).

  • Partner B structure or sequence(s): Same as above for the second protein partner.

  • Hot restart (optional)

    It is possible to re-run docking under different conditions, such as different templates (template-based docking, MODE 1), constraints, different co-alignments (MODE 3) or different scoring options, including covariation constraints (MODE 2). When the identifier of a previous run (on the same two partners) is specified, the calculation is much faster, since the docking step has already been calculated and only the scoring steps need to be re-run. You can also use the hot restart feature to only minimize your output structures by setting the minimization field to "Yes" in the advanced options and keeping the exact same post-processing options as in the previous job.

    input_hotrestart
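The compact structure identifier accepted in the Partner A and Partner B structure fields (e.g. "1a2yAB") can be split into its PDB identifier and chain identifiers as sketched below. This is only an illustration of the notation, not the server's own code: the function name `parse_structure_id` is hypothetical, and the sketch assumes a four-character PDB identifier followed by single-character chain identifiers.

```python
def parse_structure_id(identifier):
    """Split e.g. '1a2yAB' into ('1a2y', ['A', 'B']).

    Assumption: the first four characters are the PDB identifier and
    every remaining character is a one-letter chain identifier.
    """
    identifier = identifier.strip()
    if len(identifier) < 4:
        raise ValueError("a PDB identifier has four characters")
    pdb_id = identifier[:4].lower()
    chains = list(identifier[4:])  # empty list: no chain selected
    return pdb_id, chains


if __name__ == "__main__":
    print(parse_structure_id("1a2yAB"))
```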

MODE 1: Template-based modeling/docking

If a sequence or a set of sequences is given as input for one or both partners, InterEvDock3 proceeds with its template-based modeling pipeline for the respective partners. The template is automatically chosen according to sequence identity, coverage and resolution of the potential template structures. At the end of the job, InterEvDock3 outputs a table of alternative templates that the user may want to try out. Templates are grouped by the set of input sequences they cover (1st column) and sorted from the maximum to the minimum number of covered sequences (6th column). Each group has a representative template "t001" with the best id/coverage/resolution values (columns 4, 5 and 8 respectively). The 7th column specifies which sequences match which chains in the template as well as their individual pairwise sequence identities.

template_table

If the user wants to choose a different template from this list, they can do so with the hot restart function of InterEvDock3, by specifying the previous job identifier and the new template's 4-letter code in the corresponding boxes.

input_pdb_code

MODE 2: Free docking guided by covariation constraints

This mode is advised only when large multiple sequence co-alignments can be generated for the interacting partners. If the user provides two input structures and a covariation map calculated with a program such as ComplexContact or trRosetta adapted to run on joint co-alignments, InterEvDock3 proceeds with free docking of the two partner structures guided by the covariation information found in the given contact map. The format of the map and the different options available for the user to adjust are described here:

  • Impose covariation or deep learning map.

    The user should input a contact_map as obtained for instance from a covariation-based method. First lines are the two sequences used to generate the covariation map in fasta format. Afterwards, each line is a coupling between residues as indexed in the sequences above with format

    > sequence A
    XXXXXX
    > sequence B
    YYYYYY
    ires1 ires2 coupling_val 
    ...

    where ires1 and ires2 are the indices of the residues in the fasta sequences and coupling_val is the confidence index predicted by the covariation analysis method.

    Example:

    example of input

  • Of note, the server will align the fasta sequences with the pdb sequences. Users don't have to worry about the delimitations of their structures with respect to the sequences. Correspondence between the indexes will be automatically calculated.

  • Modify the distance threshold to count contacts.

    The user can change the distance used to define whether a contact is satisfied in the model. By default this value is set to 8 Å and is measured between the heavy atoms of the two residues in a pair.

  • Modify grouping contacts parameters.

    Contacts between residues close in sequence can be grouped together to prevent too many redundant contacts from biasing the selection of the correct model. The window used to group contacts together typically has a size of 2, meaning two residues downstream and two upstream with respect to the resid1 and resid2 indicated in the map.

  • Modify the number of decoys evaluated.

    The user can change the number of decoys used to score the covariation-derived contact map. Generally, a first fast run with a small number of decoys can be tested (this typically takes 30 min). If a more thorough sampling is required to increase the number of contacts likely to be taken into account in the models, the user can score all the decoys and the run might take several hours (typically about 3 hrs for large proteins).

    example of input
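The contact map format described above (two FASTA records followed by one coupling per line) can be read with a few lines of Python. This is a minimal sketch, not the server's parser: the function name `parse_contact_map` is hypothetical, sequences are allowed to span several lines, and any columns after the third on a coupling line are ignored. Residue indices are returned exactly as written in the file.

```python
def parse_contact_map(text):
    """Return ([(header, sequence), (header, sequence)], couplings).

    couplings is a list of (ires1, ires2, coupling_val) tuples.
    """
    sequences = []   # [header, sequence] pairs, in file order
    couplings = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            sequences.append([line[1:].strip(), ""])
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            # Coupling line: two residue indices and a confidence value.
            couplings.append((int(fields[0]), int(fields[1]), float(fields[2])))
        elif sequences and not couplings:
            # Sequence line belonging to the most recent FASTA header.
            sequences[-1][1] += line
        else:
            raise ValueError(f"unrecognised line: {raw!r}")
    if len(sequences) != 2:
        raise ValueError("expected exactly two FASTA records")
    return [tuple(record) for record in sequences], couplings


if __name__ == "__main__":
    example = "> sequence A\nMKTAYI\n> sequence B\nGSHM\n3 2 0.84\n5 4 0.76\n"
    print(parse_contact_map(example))
```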

At the end of the job, after downloading the results.zip archive, the user can open the results in PyMOL by double-clicking on the 'start_analysis_cmap.pml' script.

example of input

PyMOL will open and load the models as shown below.

example of input

Satisfied contacts can be precisely analysed for every model and are indexed by their c1, c2, ..., cN names. Contact names have 5 elements separated with "_" symbols, corresponding to contact name, index of the line in the input contact map file, probability of the contact itself, probability of the reference contact in the group of redundancy which is used in the scoring, and model name (e.g. c1_1_0.76_0.84_Complex_CMAPnb1). Contacts can be displayed individually by toggling on and off their line in the panel.
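When post-processing many models outside PyMOL, the five elements of a contact name can be recovered programmatically. The sketch below is an illustration only; `ContactLabel` and `parse_contact_label` are hypothetical names. Because the model name itself may contain "_" (as in Complex_CMAPnb1), the split is limited to the first four separators.

```python
from typing import NamedTuple


class ContactLabel(NamedTuple):
    name: str                 # contact name, e.g. "c1"
    map_line: int             # index of the line in the input contact map
    probability: float        # probability of the contact itself
    group_probability: float  # probability of the group's reference contact
    model: str                # model name, may contain underscores


def parse_contact_label(label):
    # maxsplit=4 keeps underscores inside the model name intact.
    name, line_idx, prob, ref_prob, model = label.split("_", 4)
    return ContactLabel(name, int(line_idx), float(prob), float(ref_prob), model)


if __name__ == "__main__":
    print(parse_contact_label("c1_1_0.76_0.84_Complex_CMAPnb1"))
```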


MODE 3: Free docking with explicit interolog scoring

This mode can be run even with shallow multiple sequence co-alignments generated for the interacting partners (> 10 sequences). When two inputs are given (sequence or structure input for each partner), InterEvDock3 proceeds with free docking of the two partner structures. Joint multiple sequence alignments are automatically generated if not given. Several options are available to the user; the newest ones relate to the newly implemented explicit interolog scoring described below. The user can also specify docking constraints if structures are given, and use their own multiple sequence alignments.

  • Impose co-alignments (optional)
    Two fields can be optionally filled in to specify the multiple sequence alignments associated with each 3D structure. The first sequence in each alignment should match as closely as possible the sequence of the corresponding PDB file (except gaps represented by '-'). The two alignments must contain sequences from the same set of species, appearing in the exact same order. If the multiple sequence alignments are not provided, they will be automatically generated by the InterEvDock3 server and will be provided in the results to facilitate potential re-submission.

  • Impose constraints (optional)

    A list of constraints can be optionally specified. Each constraint should contain three fields (filled or void) separated by colons (":"). The first field specifies a position on partner A, the second a position on partner B and the third the distance between both residues (constraint pair) or between the residue and any heavy atom on the opposite partner (single constraint). For example, use

    11A:20B:7

    if we want residue 11 on chain A in partner A to be at the interface as well as residue 20 on chain B in partner B with a maximum distance of 7 Å between the two. In order to specify a single constraint, you can just leave out the first or the second field (e.g. 11A::7 if you want residue 11 on chain A in Partner A at the interface with a maximum distance of 7 Å with any heavy atom on Partner B).

    Positions are numbered according to the PDB numbering. Several constraints can be specified, separated by whitespace (space, tabulation or newline). A distance can be optionally specified for each constraint. The default distance is 8 Å for single residues and 11 Å for pairs. These constraints will be taken into account in the docking process to keep only models having this position or pair of positions at the complex interface. Constraints will be checked prior to applying and constraints involving residues not present or buried in the structural model will be excluded. Note that when several constraints are provided, they are considered cumulative ("AND" not "OR") i.e. docking models will be filtered to retain only solutions that verify all constraints.

    In order to avoid round-trips outside the browser, it is possible to use the on-line NGL Viewer to quickly identify residues.


  • Run explicit interolog modeling and scoring (Yes by default).
    This option is on by default since running explicit interolog modeling and scoring improves success rate of the free docking procedure (see section Benchmark), but can be deactivated to recover results close to the InterEvDock2 server results.

  • Number of decoys to score by explicit interolog modeling and scoring (10,000 decoys and 40 explicit homologs by default).
    This option enables tuning of the number of top Frodock decoys considered in the free docking pipeline and the number of explicit interologs used for rescoring. By default, the top 10,000 Frodock decoys are considered together with up to 40 explicit interologs. This can be changed to the top 1,000 Frodock decoys and 10 explicit interologs: this decreases the runtime at the cost of slightly reduced success rate of the free docking consensus.

  • Use Rosetta interface scoring as supplementary score (No by default).
    This option enables rescoring models with the Rosetta interface score and integrating this score in the final consensus of the free docking procedure. This option is off by default since it is quite time-consuming and should only be used by non-commercial users due to licensing conditions; however, it improves the free docking success rate (see section Benchmark).

  • input_explicit_scoring_options

  • Final minimization (optional, No by default)

    If the minimization option is set to "Yes", structures at the end of a docking round are minimized using the Gromacs_py library [14]. Please take into account that minimization can take 10 to 20 minutes or more depending on protein size. We recommend that you run minimization in a second step, using the hot restart feature of InterEvDock3 to minimize all models of a previous job once you are satisfied with the results.

    input_minimisation_option
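The constraint syntax described under "Impose constraints" (three colon-separated fields, whitespace between constraints, default distances of 8 Å for single residues and 11 Å for pairs) can be checked before submission with a small parser. This is a sketch under stated assumptions, not the server's implementation: the names `Constraint` and `parse_constraints` are hypothetical, chain identifiers are assumed to be single letters, and insertion codes are not handled.

```python
import re
from typing import NamedTuple, Optional, Tuple

Position = Tuple[int, str]  # (PDB residue number, chain identifier)


class Constraint(NamedTuple):
    partner_a: Optional[Position]  # None for a single constraint on partner B
    partner_b: Optional[Position]  # None for a single constraint on partner A
    max_distance: float            # in Å


# Assumption: residue number followed by a one-letter chain identifier.
_POSITION = re.compile(r"(-?\d+)([A-Za-z])")


def _parse_position(field):
    if not field:
        return None
    match = _POSITION.fullmatch(field)
    if match is None:
        raise ValueError(f"bad position: {field!r}")
    return int(match.group(1)), match.group(2)


def parse_constraints(text):
    constraints = []
    for token in text.split():  # space, tabulation or newline
        fields = token.split(":")
        if len(fields) != 3:
            raise ValueError(f"expected three ':'-separated fields: {token!r}")
        pos_a = _parse_position(fields[0])
        pos_b = _parse_position(fields[1])
        if pos_a is None and pos_b is None:
            raise ValueError(f"no position given: {token!r}")
        is_pair = pos_a is not None and pos_b is not None
        default = 11.0 if is_pair else 8.0  # defaults stated in the text
        distance = float(fields[2]) if fields[2] else default
        constraints.append(Constraint(pos_a, pos_b, distance))
    return constraints


if __name__ == "__main__":
    for constraint in parse_constraints("11A:20B:7 11A::7 :20B:"):
        print(constraint)
```

Since constraints are cumulative ("AND"), a model would be kept only if every parsed constraint is satisfied.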


Results

InterEvDock3 runs last from 15 minutes to over 1 hour depending on the size of the input partners and the server load. The user can follow the progress of the job in the Progress report field of the job page. Errors related to the input data are also reported in this field.

ProgressReport

When the job is finished, an interactive page allows the user to browse the best generated complexes (see below Visualization and post-processing). A zip archive is also provided. It contains the PDB files of the predicted models and additional information such as individual and consensus scores, consensus interface residue probabilities, the multiple sequence alignments used in scoring and, when relevant, detailed template lists, together with PyMOL visualization scripts for local analyses.

Visualization and post-processing of complex models

The best models can be explored using the NGL applet.

ModelsPDB

The predicted interface residues outputted after the free docking protocol can also be visualized on both proteins as a color gradient (from green to white for high to low probability to be at the interface).

PredictedInterface

The evolutionary conservation of each partner (calculated with Rate4Site [9] in the free docking protocol) can be visualized on both partners as a color gradient, from red (more conserved) to white (more diverse) through beige (mild conservation).

Conservation
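Because the Rate4Site conservation values are also written into the B-factor field of the PDB files in the results zip archive, they can be read back for custom analyses. The sketch below relies only on the fixed-column layout of PDB ATOM records; the function name `residue_bfactors` is hypothetical, and it assumes that all atoms of a residue carry the same value, so the first atom encountered is used.

```python
def residue_bfactors(pdb_text):
    """Map (chain, residue number) to the B-factor field value."""
    values = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        chain = line[21]               # column 22
        resnum = int(line[22:26])      # columns 23-26
        bfactor = float(line[60:66])   # columns 61-66
        # Keep the first atom seen for each residue.
        values.setdefault((chain, resnum), bfactor)
    return values


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for key, value in sorted(residue_bfactors(handle.read()).items()):
            print(key, value)
```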

A PyMOL script provided in the results zip archive for each free docking run automatically loads the 30-50 models (according to selected options) from the results zip file and colors them by interface residue consensus and evolutionary conservation.

In the free docking mode, all calculated scores are reported in a table for the 10 best complexes.

InterEvScore

Visualization of constraints satisfaction

Constraint-satisfaction

If constraints were specified in input, information about each constraint satisfaction is presented as a table.

Demonstration mode

The demonstration mode is accessible from the InterEvDock3 page by selecting an option from the Demonstration Mode dropdown menu. Selecting an option loads a pre-configured test case.

example of demo modes

There are three demonstration mode cases:

  • Demo 1 illustrates the template-based docking feature in InterEvDock3.
  • Demo 2 illustrates the free docking feature in InterEvDock3 guided by atomic-level interolog scoring.
  • Demo 3 illustrates the free docking feature in InterEvDock3 guided by information from contact maps predicted from coevolution and deep learning.

Details on the performance of these demonstration examples are provided below.

Limitations

  • Size limits: The size of each submitted partner structure should lie in the 10-6000 amino acid range. For comparative modeling from sequences, a limit of 6000 residues or 50000 atoms is enforced. For docking shorter peptides, better-suited services such as pepATTRACT can be used.
  • Comparative modeling limits: a maximum of 70 loops can be reconstructed, each composed of at most 50 amino acids.
  • Proteins only: the server is currently not able to dock nucleic acids or small molecules. In the free docking procedure, when nucleic acids or ligands are present in an input protein structure, they will be kept only as steric objects.
  • InterEvDock3 has been tested on various browsers under various operating systems. For Windows users, please prefer Chrome or Opera rather than Edge for which we observed an unstable behavior depending on system configuration.

Demos & examples

Demo MODE 1: comparative modeling

The result page of this demo can be accessed here.

Starting from input sequences only given in the Partner A input box, InterEvDock3 builds a model of the Lachancea thermotolerans SHU1-SHU2 complex through a template-based docking procedure. The automatic template search found the Saccharomyces cerevisiae SHU1-SHU2 interolog (5XYN) with 28% and 36% sequence identity and a total coverage of 98.3% with L. thermotolerans. The resulting output model has a 1.86 Å RMSD ("Medium" quality according to the CAPRI criteria) with its template structure.

Experimental Complex

Example of template-based modeling by InterEvDock3. Output of L. thermotolerans SHU1-SHU2 complex (in orange and blue cartoon) modeled using an interolog structure from S. cerevisiae (5XYN chains C and D in gray cartoon).

Demo MODE 2: use of couplings calculated by sequence covariation-based methods

The result page of this demo can be accessed here.

InterEvDock3 uses free docking between two binding partners to search for the best assemblies matching the interfacial contacts predicted by covariation-based methods such as ComplexContact or CCMpred. These methods generate contact maps based on the predicted evolutionary couplings but do not embed any docking method to generate compatible structures. The trRosetta program can also be used to generate these maps by adapting the trRosetta package to run on two joint co-alignments. Here, we illustrate how the interaction between the MutS homodimer and the MutL homodimer from Escherichia coli can be predicted using their structures as input together with the interfacial contacts predicted by ComplexContact, which combines co-evolution and deep learning techniques.

Comparison Model Exp MutSMuL

Example of free docking exploiting the contact map predicted by co-evolution and deep learning techniques (using ComplexContact). Output of E. coli MutS (red/yellow) and MutL (blue/green) homodimers free docking as displayed in the server. The presented model was ranked #1 ("Medium" according to the CAPRI criteria).

Comparison Model Exp MutSMuL

A PyMOL script "start_analysis_cmap.pml" can be downloaded from the results.zip. Double clicking on the script opens the output representing E. coli MutS and MutL free docking models. The contacts respected in the contact map are represented as red sticks (see below for interpretation details). Here, the best model #1 is compared to the reference crystal structure (5AKB in gray cartoon).

Demo MODE 3: free docking on an example from the PPI4DOCK benchmark

On the PPI4DOCK benchmark, our updated approach using evolutionary information at the atomic level provides a general performance boost (see the Benchmark section). The new scoring scheme is based on a 3-way consensus using atomic-level homology scoring. Here, we illustrate this with two input structures and associated joint multiple sequence alignments for case 1c4z_AD from the PPI4DOCK benchmark.

The result page of this demo can be accessed here. Results using the same input but turning OFF explicit atom-based evolutionary scoring can be accessed here.

Best Model

Best model returned by InterEvDock3. Color darkness reflects conservation of the model in shades of yellow and red for chain A and blue for chain D. This model was returned as top 3 by the consensus and top 6 by atomic-level InterEvScore (IESh) and has an iRMSD of 1.51 Å and DockQ of 0.71 ("Medium" quality according to the CAPRI criteria) with the reference crystal structure (1c4z in gray).

Best Model

Best model returned in the top 10 consensus by InterEvDock3 when not using atomic-level homology integrated into scoring (similar to InterEvDock2). This model was returned as top 3 by the consensus and top 4 by SOAP-PP and has an iRMSD of 16.78 Å and DockQ of 0.04 ("Incorrect" according to the CAPRI criteria) with the reference.

Other examples

  • 6-subunit human inner kinetochore: A challenging low-identity template-based docking example. InterEvDock3 managed to predict a complex with all six subunits despite sequence identities between human and yeast ranging from 9 to 13% for all subunits.
  • ComM helicase competence protein in H. pylori: An example to illustrate the combination of template-based modeling and free docking guided by a covariation-derived contact map. ComM is composed of two domains belonging to well-known superfamilies but for which a full-length structural template including both domains is lacking. With InterEvDock3, we were able to model the hexameric assemblies for each domain separately (~20% sequence identity) and propose a decent full structural model thanks to free docking of these models guided by the evolutionary signal contained in contact maps calculated by the trRosetta server.
  • SYCE2-TEX12 (T163 in CAPRI): CAPRI target T163 is a complex between 2 homodimers. The SYCE2 homodimer could be modeled with template-based docking in InterEvDock3 using 6H3A (a template structure that was available at the time of the CAPRI challenge). The model was docked against a single helix extracted from the TEX12 homodimer structure (PDB:6HK9) and guided by covariation information calculated with RaptorX. The best model satisfies the largest number of inter-molecular contacts between SYCE2 and TEX12, taking into account the ambiguities due to the symmetrical arrangement of the SYCE anti-parallel dimer. Another run, using as input of the free docking the structures of the SYCE2 and TEX12 subunits as crystallized in the complex 6R17, can also be analyzed. With this docking using bound subunits, 56% of the Top50 most probable predicted contacts were satisfied in the #1 model, compared with 41% in the #1 model of the docking using modeled subunits as described above.
  • 6-subunit COMPASS complex in S. pombe: Template-based docking of the COMPASS complex based on the K. lactis interolog (6BX3), with some subunits sharing less than 20% sequence identity.

InterEvDock3 design

InterEvDock3 takes as input, for each of the two protein partners to be docked, either a structure (experimental or modeled, possibly multimeric) or a set of sequences. If a single set of sequences is provided as Partner A, then comparative modeling will be performed to build a structural model for this set of sequences.

Given a (set of) sequence(s), for a given partner, the server runs several steps to generate a structural model:

  • Use HHblits [10] against the uniprot20 database to build a profile and attempt to identify template structure using HHsearch [11] against the PDB70.
  • Identify templates by HHsearch with a probability higher than 95%.
  • The procedure described in [8] is then used to identify the most suitable template for comparative modeling. This template is identified as the structure containing the largest number of homologs to input sequences. If there are several such templates, we pick the one with highest sequence identity and coverage.
  • Comparative modeling is then performed in three steps:
    • first, a basic threaded model is built using OSCAR-star [12],
    • then insertions are modeled using the DaReUS-Loop protocol [13],
    • finally the model is minimized using the Gromacs_py library [14].
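The template selection rule stated in the steps above (largest number of input sequences covered, then highest sequence identity and coverage) amounts to a multi-key sort. The sketch below is an illustration rather than the procedure of [8]: the `Template` record and its field names are hypothetical, the relative priority of identity over coverage is an assumption, and resolution is used only as a last tie-breaker.

```python
from dataclasses import dataclass


@dataclass
class Template:
    pdb_code: str
    n_sequences_covered: int  # input sequences with a homolog in the template
    identity: float           # percent sequence identity
    coverage: float           # percent coverage
    resolution: float         # in Å, lower is better


def rank_templates(templates):
    """Sort templates from most to least suitable."""
    return sorted(
        templates,
        key=lambda t: (-t.n_sequences_covered, -t.identity, -t.coverage, t.resolution),
    )


if __name__ == "__main__":
    # Made-up candidates with placeholder codes, for illustration only.
    candidates = [
        Template("aaaa", 1, 90.0, 99.0, 1.5),
        Template("bbbb", 2, 30.0, 95.0, 2.8),
        Template("cccc", 2, 36.0, 98.0, 3.1),
    ]
    print([t.pdb_code for t in rank_templates(candidates)])
```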

Given the structures (either input by the user or modeled from input sequences), the server runs several steps to propose a selection of 10 most likely models for each score as well as 10 consensus models and 5 most likely interface residues on each protein:

  • Extracts the sequences of both partners and automatically builds joint multiple sequence alignments ranking homologs of the same species in the same order. Both alignments are used in the scoring process. This process is fully automated, including for multimeric inputs: first, blastp [15] is used to search against the UniProtKB database with thresholds of sequence identity > 30%, coverage > 75% and E-value < 10^-4. Only the sequence with the best identity is kept for each species. Pairs of sequences belonging to the same species are collected and redundant paired sequences with sequence identity higher than 90% with the query are removed. Sequences are re-aligned using MAFFT giving, in the end, a set of two MSAs containing exactly the same number of sequences in the same species order. Users can alternatively submit their own co-alignments.
  • Performs an exhaustive rigid-body search using the FRODOCK2 algorithm [1].
  • If constraints are provided by the user: Filter FRODOCK2 decoys according to user-defined constraints.
  • FRODOCK2 decoys are clustered using a ligand RMSD threshold of 4 Å and ranked with respect to their energy.
  • The best 10,000 FRODOCK clusters are then rescored using homology-enriched versions of the InterEvScore [2] and SOAP-PP [3] statistical potentials. For this purpose, a subselection of maximum 40 sequences from the co-alignments is used to build explicit homologous complex models that are further rescored. Several options are available at this stage:
    • deactivate homology scoring based on explicit homologs (this reverts to an InterEvDock2-like scoring scheme based on InterEvScore and SOAP-PP)
    • use the top 1000 FRODOCK decoys and only 10 explicitly modeled homologs (this saves time but has a slightly decreased free docking success rate)
    • use the Rosetta interface score as an additional scoring function (this increases runtime by a large amount, while improving slightly the free docking success rate)
  • The 10 most likely models are selected by the InterEvDock consensus method, by grouping similar models well-ranked by different scoring functions.
  • Finally, a selection of 5 residues on each protein is proposed. Those are the residues most likely involved in the interface based on the best models, which can subsequently be used to implement constraints in further rigid-body or flexible docking simulations or to guide mutagenesis for interface disruption. Users can focus on the top 1 or 2 predicted residues on each chain which should already provide relevant constraints.
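The first step of the list above, building joint alignments in which homologs of the same species appear in the same order, can be sketched as a filter followed by a pairing on species. This is a simplified illustration, not the server's code: the `Hit` record and function names are hypothetical, the hits are assumed to have been parsed from a blastp search beforehand, and the removal of pairs more than 90% identical to the query and the MAFFT re-alignment are left out.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    species: str
    seq_id: str
    identity: float  # percent identity to the query
    coverage: float  # percent coverage of the query
    evalue: float


def best_hit_per_species(hits, min_identity=30.0, min_coverage=75.0, max_evalue=1e-4):
    """Apply the stated thresholds, then keep the most identical hit per species."""
    best = {}
    for hit in hits:
        if (hit.identity <= min_identity or hit.coverage <= min_coverage
                or hit.evalue >= max_evalue):
            continue
        current = best.get(hit.species)
        if current is None or hit.identity > current.identity:
            best[hit.species] = hit
    return best


def pair_by_species(hits_a, hits_b):
    """Return (hit_a, hit_b) pairs for species found for both partners."""
    best_a = best_hit_per_species(hits_a)
    best_b = best_hit_per_species(hits_b)
    shared = sorted(set(best_a) & set(best_b))  # one common species order
    return [(best_a[s], best_b[s]) for s in shared]


if __name__ == "__main__":
    # Made-up hits, for illustration only.
    a = [Hit("species_1", "a1", 45, 90, 1e-30), Hit("species_1", "a2", 60, 80, 1e-20)]
    b = [Hit("species_1", "b1", 50, 76, 1e-15), Hit("species_2", "b2", 55, 90, 1e-25)]
    print([(x.seq_id, y.seq_id) for x, y in pair_by_species(a, b)])
```

Writing both alignments from the same list of pairs guarantees that the two MSAs contain the same number of sequences in the same species order.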

If covariation-based pairwise constraints are provided by the user (e.g. derived from deep-learning-assisted predictors such as ComplexContact), the scoring process is bypassed and replaced by the counting of the predicted contacts satisfied in every docking model. The server runs several steps to match the residue indexes between the contact map (derived from sequence) and the input structures, then counts contacts using several options that are described below:

  • Aligns and maps the sequences of the input structures with the sequences used to generate the covariation-based contact map (the latter sequences can be provided on top of the list of residue-residue contacts). This step is essential since structures can include gaps or correspond to a single domain of a large protein. Covariation contact maps are generally generated using full-length sequences to maximise the number of homologs which can be retrieved. It is essential that the amino acids in the contact map are properly assigned to their corresponding residues in the structures of both partners.
  • Performs an exhaustive rigid-body search using the FRODOCK2 algorithm [1].
  • From the list of predicted contacts provided in the input contact map, the server scores every model by analysing residues in contact (defined as two residues having an atom pair at less than 8 Å). Either the number of satisfied contacts (CMAPnb) or the sum of the predicted coupling intensities (third column of the input table) is calculated (CMAPscore).
  • Residues close in sequence can give rise to a redundant covariation signal in the predicted contact map, with little impact on model reliability. To reduce the weight of these redundant contacts and increase the importance of the others, contacts can be grouped by sets of redundant contacts (indexes of the groups are indicated in column 4 of the input list of contacts). The value assigned to the group is indicated in column 5 of the list of contacts (highest coupling value by default).
  • Input structures may correspond to a homomeric assembly of the same subunit, as in the example of free docking between the MutS homodimer and the MutL homodimer (see demo 3). In that case, contacts listed in the covariation contact map are ambiguous since we do not know a priori which homomeric subunit is involved in the signal. For those common cases of homomeric assemblies, InterEvDock3 was specifically adapted so that satisfaction of contacts in the models accounts for this ambiguity. As a result, many more constraints can be taken into account, which significantly improves the reliability and the proper ranking of the models.
  • The 10 models with highest number of satisfied contacts (CMAPnb) and the 10 models with highest score (CMAPscore) are selected by the InterEvDock3 server and their structures are displayed in the result page ranked by their scores which are displayed in the table below.
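The contact counting described above can be sketched as follows. This is only an illustration under stated assumptions: the five-column tuple layout mirrors the input table, residue indices are taken as already mapped onto the structures, each redundancy group is counted once using its column 5 value, and the ambiguity of homomeric subunits is not handled. The function names are hypothetical and are not part of the server.

```python
import math

CONTACT_CUTOFF = 8.0  # Å, the contact definition given in the text


def in_contact(atoms_a, atoms_b, cutoff=CONTACT_CUTOFF):
    """True if any atom pair between two residues is closer than cutoff."""
    return any(math.dist(a, b) < cutoff for a in atoms_a for b in atoms_b)


def score_model(contacts, residues_a, residues_b):
    """Return (CMAPnb, CMAPscore) for one docking model.

    contacts: list of (res_a, res_b, coupling, group, group_value) tuples;
              group and group_value may be None for ungrouped contacts.
    residues_a / residues_b: dict mapping residue index -> list of (x, y, z).
    Counting each group once with its group value is an assumption here.
    """
    cmap_nb = 0
    cmap_score = 0.0
    seen_groups = set()
    for res_a, res_b, coupling, group, group_value in contacts:
        if res_a not in residues_a or res_b not in residues_b:
            continue  # residue absent from the structure (e.g. a gap)
        if not in_contact(residues_a[res_a], residues_b[res_b]):
            continue  # predicted contact not satisfied in this model
        if group is not None:
            if group in seen_groups:
                continue  # redundant contact of an already counted group
            seen_groups.add(group)
            coupling = group_value
        cmap_nb += 1
        cmap_score += coupling
    return cmap_nb, cmap_score


# Toy model: one atom per residue, coordinates in Å
residues_a = {10: [(0, 0, 0)], 11: [(1, 0, 0)], 50: [(12, 0, 0)]}
residues_b = {5: [(5, 0, 0)], 90: [(100, 0, 0)]}
contacts = [
    (10, 5, 0.9, 1, 0.9),      # satisfied, first member of group 1
    (11, 5, 0.7, 1, 0.9),      # satisfied but redundant with group 1
    (50, 90, 0.8, None, None), # not satisfied (88 Å apart)
    (50, 5, 0.4, None, None),  # satisfied (7 Å apart)
]
nb, score = score_model(contacts, residues_a, residues_b)
print(nb, round(score, 2))
```

Sorting all docking models by the first or the second returned value would give the two rankings (CMAPnb and CMAPscore) from which the top 10 models are drawn.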

Features

  • User input can be a (set of) protein sequence(s) for one or both partners. In this case, template search for all the subunits and comparative modeling are performed on the fly by the InterEvDock3 server before free docking.
  • In case a (set of) sequence(s) is provided only for Partner A, only template search and comparative modeling will be performed.
  • User input can be a mono- or multimeric protein structure. The generation of the joint MSAs is fully automated as for monomeric structures.
  • Generation of multiple sequence alignments for co-evolved partners: The InterEvDock3 server automatically generates multiple sequence alignments of the binding partners, so that homologs of the same species are aligned in the same order in both alignments.
  • User input can be the job identifier of a previous but non-expired job, in which case InterEvDock3 will run using the raw docking output and the structures used in that previous job. This is especially useful if the user wants to reconsider their constraints or to try out different free docking options (e.g. Rosetta interface rescoring).
  • User-defined constraints: in case the user has biological input such as a position that is known to be involved in the interface between the two protein partners, or a distance constraint between a pair of positions, constraints can be specified for use in the free docking procedure.
  • Constraints from covariation-based methods, e.g. from deep-learning-assisted predictions. The output of these methods is generally presented as a three-column file ranking the inter-residue contacts by their co-evolution scores. InterEvDock3 can interpret these contact map files to extract the structural models that best satisfy the information they contain. Users may typically input the top 100 contacts with the highest co-evolution scores.
  • Selection of most likely binding modes: Starting from two structures (or structural models) of interacting proteins, InterEvDock identifies a maximum of 10 candidate binding modes for each of the 3 complementary scores computed. It also offers a selection of the 10 best consensus models.
  • Graphical exploration of the complexes: The structures of the complex models can be explored thanks to the NGL applet (A.S. Rose & P.W. Hildebrand), a WebGL-based viewer for proteins and other macromolecular structures. Both the evolutionary conservation of each partner (calculated with Rate4Site [9]) and the consensus interface can be visualized as color gradients on the surface of both protein partners.
  • Selection of 5 residues most likely involved in the interface on each protein: The 5 residues on each protein partner most likely involved in the interface are displayed in a table, together with their rank. Those residues can subsequently be used to implement constraints in flexible docking simulations or to guide mutagenesis for interface disruption. Users can focus on the top 1 or 2 predicted residues on each chain which should already provide relevant constraints.
  • Coordinates of the complexes and alignments are available for further off-web exploration: The selected models of complexes are available in the PDB format. The multiple sequence alignments of each subunit can also be retrieved in fasta format. A PyMOL script provided in the results zip archive for each run automatically loads the 30 models from the results zip file and colors them by interface residue consensus and evolutionary conservation.
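The joint alignment feature listed above (homologs of the same species aligned in the same order in both alignments) can be pictured with a small sketch. It assumes each alignment is a list of `(species, aligned_sequence)` pairs and keeps the first homolog found per species; how the server actually selects homologs within a species is not described here, so the function name and this selection rule are hypothetical.

```python
def pair_by_species(msa_a, msa_b):
    """Build jointly ordered alignments from two MSAs.

    msa_a, msa_b: lists of (species, aligned_sequence), query first.
    Only species present in both alignments are kept, and only the first
    homolog per species is retained (an assumption of this sketch).
    """
    first_b = {}
    for species, seq in msa_b:
        first_b.setdefault(species, seq)  # keep first occurrence only

    joint_a, joint_b = [], []
    used = set()
    for species, seq in msa_a:
        if species in first_b and species not in used:
            used.add(species)
            joint_a.append((species, seq))
            joint_b.append((species, first_b[species]))
    return joint_a, joint_b


# Toy alignments for partners A and B
msa_a = [
    ("H. sapiens", "MKT-A"),
    ("M. musculus", "MKTLA"),
    ("S. cerevisiae", "MRT-A"),
    ("M. musculus", "MKSLA"),
]
msa_b = [
    ("H. sapiens", "GSHL"),
    ("S. cerevisiae", "GTHL"),
    ("D. rerio", "GSHV"),
]
joint_a, joint_b = pair_by_species(msa_a, msa_b)
print([s for s, _ in joint_a])  # ['H. sapiens', 'S. cerevisiae']
```

Row i of both returned alignments then refers to the same species, which is the property needed to read co-evolution across the two partners.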

Benchmark

Scoring enriched with atomic-detail homology

Homology is brought down to the atomic level thanks to a basic and very conservative comparative modeling of homologous sequence pairs in the provided or generated coMSAs using OSCAR-star [12]. This new representation of evolutionary information is directly compatible with scores such as SOAP-PP and Rosetta's Interface Score and proved to significantly improve the success rates of individual scores and of consensuses of these scores [5].

InterEvDock3 benchmark dataset

Performance of InterEvDock3 was assessed on 812 complexes from the PPI4DOCK database [16] for which the structures of the free proteins (unbound) can be modeled and for which evolutionary information could be retrieved [2]. A table of all 812 benchmark cases is available here.

InterEvDock3 results

The InterEvDock3 server predicts an "Acceptable" or better solution in the consensus top10 for 268 out of 812 test cases in default mode (IED3-atom default, 33%) and for 284 out of 812 in slow mode (IED3-atom slow, 35%). As expected, this top10 success rate decreases with increasing difficulty of the docking cases: 45.4% for very easy targets, 34.5% for easy targets, 12.7% for hard targets and 9.1% for very hard targets according to PPI4DOCK difficulty classification (51.7%, 36.1%, 11% and 4.5% respectively).
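As a quick check of the overall percentages quoted above, the success rates can be recomputed from the reported counts. The helper name below is hypothetical; the counts come from the text.

```python
TOTAL_CASES = 812  # benchmark size reported in the text


def success_rate(successes, total=TOTAL_CASES):
    """Percentage of benchmark cases with an Acceptable-or-better top10 model."""
    return 100.0 * successes / total


for mode, successes in [("default", 268), ("slow", 284)]:
    print(f"{mode}: {successes}/{TOTAL_CASES} = {success_rate(successes):.1f}%")
# default: 268/812 = 33.0%
# slow: 284/812 = 35.0%
```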

The InterEvDock3 server also predicts residues making contacts at the interface of a complex based on the analysis of all the interfaces of the top 10 decoys for all scores composing the consensus (30-50 models depending on the options). In 90.8% of the 812 test cases, at least one residue out of 10 was correctly predicted as present at the interface, providing very useful hints to guide mutagenesis experiments to disrupt a complex of interest. Of note, there is little decrease in precision from the rigid-body to the difficult cases (success rate is 92.5% for very easy targets, 91.8% for easy targets, 84.7% for hard targets and 86.4% for very hard targets according to PPI4DOCK difficulty classification). Predictions of the InterEvDock3 server can thus also be used as a prior to constrain more thorough docking simulations requiring flexibility in order to model the correct orientation between two binding partners. In that perspective, in 53.4% of the cases, at least one correct residue is predicted on both sides of the interface (61.5% for very easy targets, 54.6% for easy targets, 37.3% for hard targets and 50.0% for very hard targets according to PPI4DOCK difficulty classification). When considering only the top 1 predicted residue on each chain, at least one of the two predicted residues is correct in 77.5% of the cases and both are correct in 37.1% of the cases, highlighting the practical value of InterEvDock3 residue prediction. All those results are significantly higher than a reference interval given by random selection of residues on the surface of the protein.

References

[1] Ramirez-Aportela E, López-Blanco JR, Chacón P.
FRODOCK 2.0: fast protein-protein docking server.
Bioinformatics. 2016; 32(15):2386-8.
[2] J Andreani, G Faure, R Guerois.
InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution.
Bioinformatics 2013; 29 (14):1742–1749.
[3] GQ Dong, H Fan, D Schneidman-Duhovny, B Webb, A Sali.
Optimized atomic statistical potentials: assessment of protein interfaces and loops.
Bioinformatics. 2013; 29(24):3158-66.
[4] Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, Baker D.
Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations.
J Mol Biol. 2003;331:281-99.
[5] Quignot C, Granger P, Chacón P, Guerois R, Andreani J.
Atomic-level evolutionary information improves protein-protein interface scoring.
bioRxiv 2020.10.26.355073.
[6] Yu J, Vavrusa M, Andreani J, Rey J, Tufféry P, Guerois R.
InterEvDock: A docking server to predict the structure of protein-protein interactions using evolutionary information.
Nucleic Acids Res. 2016 Jul 8;44(W1):W542-9.
[7] Quignot C, Rey J, Yu J, Tufféry P, Guerois R, Andreani J.
InterEvDock2: an expanded server for protein docking using evolutionary and biological information from homology models and multimeric inputs.
Nucleic Acids Res. 2018 Jul 2;46(W1):W408-16.
[8] Postic G, Marcoux J, Reys V, Andreani J, Vandenbrouck Y, Bousquet MP, Mouton-Barbosa E, Cianfériani S, Burlet-Schiltz O, Guerois R, Labesse G, Tufféry P.
Probing Protein Interaction Networks by Combining MS-Based Proteomics and Structural Data Integration.
J. Proteome Res. 2020 Apr 27;19(7):2807-2820.
[9] T Pupko, RE Bell, I Mayrose, F Glaser, N Ben-Tal.
Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues.
Bioinformatics. 2002; 18 Suppl 1:S71-77.
[10] Remmert M, Biegert A, Hauser A, Soding J.
HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment.
Nat Methods. 2011;9(2):173-5.
[11] Soding J.
Protein homology detection by HMM-HMM comparison.
Bioinformatics. 2005; 21(7):951-60.
[12] Liang S, Zheng D, Zhang C, Standley DM.
Fast and accurate prediction of protein side-chain conformations.
Bioinformatics. 2011 Oct 15;27(20):2913-4.
[13] Karami Y, Rey J, Postic G, Murail S, Tufféry P, de Vries SJ.
DaReUS-Loop: a web server to model multiple loops in homology models.
Nucleic Acids Res. 2019 Jul 2;47(W1):W423-8.
[14] Murail S.
Gromacs_py
Github repository
[15] Tatusova TA, Madden TL.
BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences.
FEMS Microbiol Lett. 1999;174(2):247-50.
[16] Yu J, Guerois R.
PPI4DOCK: large scale assessment of the use of homology models in free docking over more than 1000 realistic targets.
Bioinformatics. 2016; 32(24):3760-3767.

Funding

This work was supported by the French Infrastructure for Integrated Structural Biology (FRISBI) [ANR-10-INSB-05-01], the Agence Nationale de la Recherche through grants CHIPSET [ANR-15-CE11-0008-01] and ESPRINet [ANR-18-CE45-0005-01], the French Institute for Bioinformatics (IFB) [ANR-14-2011-IFB], the IdEx Université de Paris [ANR-18-IDEX-0001], the IDEX Paris-Saclay [IDI 2017], the MINECO [BFU2016-76220-P], the AEI/FEDER and UE [PID2019-109041GB-C21].