Proteo3Dnet

A web server for the integration of structural information with interactomics data

Overview

The vast majority of cellular functions are carried out by molecular machines, called protein complexes, that are held together by non-covalent protein–protein interactions (PPIs). Such interactions can be identified in two ways. Genetic techniques include the yeast two-hybrid system, the MAmmalian Protein–Protein Interaction Trap (MAPPIT) and protein complementation assays. Biochemical techniques use enrichment strategies to capture a tagged protein (called the bait) under conditions expected to preserve macromolecular associations, which are then analysed by mass spectrometry (MS). Most of the time, interactome studies stop at listing the proteins interacting with the bait, without confronting the identified sequences with external information. Useful sources of such information include:
the BioGRID database, which stores protein interactions obtained from various sources;
the experimentally resolved 3D structures available in the Protein Data Bank;
knowledge of more transient interactions involving Short Linear Motifs (SLiMs), organized in resources such as ELM.
Structural and evolutionary information provides a powerful analysis framework for biologists, e.g. for interpreting patients' mutations that interfere with assemblies, designing directed mutagenesis and functional dissection experiments, or running virtual screening. This stresses the need for integrative pipelines that produce a structured overview of the detected interactions.

To enhance the analysis of protein–protein interaction networks, the Proteo3Dnet pipeline aims to enrich raw interactome data with structural information. HHsearch [2] is used to detect, by homology, complexes that involve proteins of the input list and have 3D structures available in the Protein Data Bank (PDB) [1]. Annotations of homomultimeric complexes, as well as interaction data from the BioGRID [3] and eukaryotic linear motif (ELM) [4] resources, are also integrated into the analysis.

As a result, the input collection of sequences is organized in terms of 3D interactions. These include hetero- and homo-complexes, as well as obligate or transient complexes. An interaction graph is built, which can be interactively explored or downloaded. This graph distinguishes the nodes corresponding to the input data from those identified in 3D complexes or in the BioGRID database but absent from the input (i.e. possibly missed by the experiment), providing the means to revisit the interactome data. Further analysis of the complexes can help in understanding the interactions between the input proteins. Interactive online exploration of the results is made possible by cytoscape.js [5], MolArt [6] and the ebi-interaction viewer.


The service is accessible through the RPBS Web Portal.


Link to pre-calculated results

Pre-calculated results, in Normal mode, for the preset data (Pragmin interactome; see below) can be accessed here.

Features

Integrated data

  • Identification of candidate complexes of known structure from the PDB by grouping input sequences.
    • Candidate 3D template structures for each input sequence, based on remote homology detected with HHsearch [2], followed by thorough HHsearch cluster expansion/analysis.
    • Alternative fast search for templates based on the SWISS-MODEL Repository (SMR) [8].
    • Complex identification by mapping the templates to chains of 3D structures corresponding to complexes.
    • Candidate homo-multimer retrieval based on PDB annotations.
  • Identification of interactions from other sources.
    • Search for transient interactions involving Short Linear Motifs (SLiMs) based on ELM.
    • Integration of BioGRID (validated interactions).
    • Integration of IntAct.
  • Interactive display/exploration of the inferred graph of interactions.
  • Interactive display of the structure of the template complexes.


  • Flowchart of the Proteo3Dnet integrative interactomics pipeline

PPI analysis

The data submitted by users may include identified PPIs that do not occur physiologically (false positives, FP). The data may also miss genuine associations that were not detected (false negatives, FN).
After the search for 3D structures, each protein from the input list is associated with one, several, or no PDB entries. When several proteins from the submitted list map to distinct chains of the same PDB entry (3D complex), they are considered true partners (true positives, TP). In this case, if the PDB entry contains one or several chains that have no equivalent in the input list, these chains are considered potentially undetected partners (FN).
Conversely, two input proteins may share a common partner while seeming mutually exclusive. This may help identify FP: typically, subunits that are alternatively part of a complex but do not interact with each other.
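The shared-entry logic above can be sketched in a few lines of Python. This is a minimal illustration, not Proteo3Dnet's code. The input layout and the function name `classify_partners` are hypothetical:
- `mapping` links each input protein to the (PDB entry, chain) pairs it was matched to.
- `entities` lists the distinct chains of each entry.
Identical copies of a subunit are assumed to share one chain label.

```python
from itertools import combinations


def classify_partners(mapping, entities):
    """Sketch of the shared-PDB-entry logic (hypothetical data layout).

    mapping:  {input_protein: {(pdb_id, chain_label), ...}} from the homology search
    entities: {pdb_id: {chain_label, ...}} distinct chains of each PDB entry
    Returns (true_positive_pairs, undetected_chains_per_entry).
    """
    # Invert the mapping: pdb_id -> chain_label -> input proteins mapped to it.
    by_entry = {}
    for protein, hits in mapping.items():
        for pdb_id, chain in hits:
            by_entry.setdefault(pdb_id, {}).setdefault(chain, set()).add(protein)

    tp, fn = set(), {}
    for pdb_id, chain_map in by_entry.items():
        if len(chain_map) < 2:
            continue  # input proteins cover a single chain: no hetero-complex evidence
        # Proteins mapped to *distinct* chains of the same entry are paired (TP).
        for (_, ps1), (_, ps2) in combinations(sorted(chain_map.items()), 2):
            tp.update(tuple(sorted((p, q))) for p in ps1 for q in ps2 if p != q)
        # Chains with no input protein are candidate undetected partners (FN).
        missing = entities[pdb_id] - set(chain_map)
        if missing:
            fn[pdb_id] = sorted(missing)
    return tp, fn
```

Proteins matched to the same chain (e.g. two paralogs) are deliberately not paired. This is the mutually exclusive situation described above as a possible hint of FP.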

Thanks to the search for remote homologies, Proteo3Dnet connects two input candidates when their respective homologs are found interacting within the same PDB structure. In such cases, the interaction is not fully confirmed, as the inference relies on the conservation of the interaction throughout evolution. This is why the sequence identity (%) is reported in the results produced by Proteo3Dnet (graph and tables).
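The sketch below illustrates one way an identity value could be attached to such an inferred edge. For each PDB entry in which the homologs of two input proteins occupy distinct chains, it keeps the mean of the two chain-level identities. The averaging rule, the data layout and the name `pair_identities` are assumptions. The documentation only states that a sequence identity is reported.

```python
def pair_identities(hits_p, hits_q):
    """Return {pdb_id: identity} for every PDB entry supporting an edge p-q.

    hits_p, hits_q: {pdb_id: (chain_label, percent_identity)} for two input
    proteins (hypothetical layout). Assumption: the identity of the edge in one
    structure is the mean of the two chain-level identities.
    """
    support = {}
    for pdb_id in hits_p.keys() & hits_q.keys():
        (chain_p, id_p), (chain_q, id_q) = hits_p[pdb_id], hits_q[pdb_id]
        if chain_p != chain_q:  # both homologs must occupy distinct chains
            support[pdb_id] = (id_p + id_q) / 2
    return support
```

The best supporting structure is then `max(support, key=support.get)`. A low value signals an interaction inferred from distant homologs, which deserves more caution.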

Browser compatibility

OS        Version     Chrome   Firefox   Microsoft Edge    Safari
Linux     Ubuntu 18   79.0     71.0      n/a               n/a
MacOS     Mojave      79.0     72.0      n/a               12.0 ⚠
Windows   10          79.0     72.0      44.18362.449.0    n/a

For Safari 12.0 on MacOS Mojave, the graph visualization freezes when opening a MolArt window.


Proteo3Dnet usage

Note: Selecting Yes to test the service with preset data fills the input fields with the protein sequences of the Pragmin interactome (61 sequences), as in [7]. For this dataset, the execution time is typically 10-15 min, but it can increase depending on server load.
Note that the execution time depends on the number of sequences and can therefore increase substantially for larger datasets.

Limitations

If more than 400 proteins are submitted, the graph representation is not displayed on the results page. Users have to download it (as a compressed archive) for local viewing.

Input

The web server requires several inputs:

  1. Query: The list of candidate protein partners identified by proteomics experiments. It can consist of either a list of UniProt IDs (one per line) or a series of valid UniProt sequences in FASTA format. In the FASTA format, the header must give the UniProt identifier between '|' separators, for instance:
    >sp|Q96HS1|PGAM5_HUMAN Serine/threonine-protein phosphatase PGAM5, mitochondrial OS=Homo sapiens OX=9606 GN=PGAM5 PE=1 SV=2
    MAFRQALQLAACGLAGGSAAVLFSAVAVGKPRAGGDAEPRPAEPPAWAGGARPGPGVWDP
    NWDRREPLSLINVRKRNVESGEEELASKLDHYKAKATRHIFLIRHSQYHVDGSLEKDRTL
    TPLGREQAELTGLRLASLGLKFNKIVHSSMTRAIETTDIISRHLPGVCKVSTDLLREGAP
    IEPDPPVSHWKPEAVQYYEDGARIEAAFRNYIHRADARQEEDSYEIFICHANVIRYIVCR
    ALQFPPEGWLRLSLNNGSITHLVIRPNGRVALRTLGDTGFMPPDKITRS
    
    where the UniProt identifier is Q96HS1 (the '|' separators are required). Of note, mixing input from different species is not allowed. When several species are detected, only the data of one species are processed. This is either the most represented species or the organism specified in the advanced options (see below).
  2. Mode: In the Normal mode, the homology search is performed using HHsearch. Many protein models are already available in the SWISS-MODEL Repository. Selecting the Fast mode instead identifies SWISS-MODEL Repository models from the UniProt IDs, which is much faster. The Fast mode can be useful to get a quick look at the data. Note, however, that the SWISS-MODEL Repository usually contains only models built at a rather high sequence identity. Significantly better results are therefore expected from the Normal mode, in which more remote homology links can be detected.
  3. Label: It is used as a prefix of the result files, for user convenience.
  4. Organism (Optional): By specifying the name of a species in this field, only the input data belonging to that species will be considered.
  5. Fold change (Optional): The value of the fold change (number of protein copies detected in the sample) for each candidate protein. It is used to modulate the size of the nodes in the graph display of the results [5].
Note: The list of proteins submitted as input may originate from any type of source. Proteomics experiments are the most common, but not the only one (e.g. a list of candidate partners established by analyzing the literature).
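A small parser can illustrate the header convention and the species rule. This sketch assumes standard UniProtKB FASTA headers (`>sp|ACCESSION|ENTRY_NAME ... OS=species OX=taxid ...`) and reads the species from the `OS=` field. The function names are hypothetical. Plain lists of UniProt IDs carry no species information and are not handled here.

```python
import re
from collections import Counter

HEADER = re.compile(r"^>(?:sp|tr)\|([^|]+)\|\S*")      # accession between '|' separators
SPECIES = re.compile(r"\bOS=(.+?)(?= [A-Z]{2}=|$)")    # organism name up to the next TAG=


def parse_fasta(text):
    """Return a list of (accession, species, sequence) from UniProt-style FASTA text."""
    records, acc, species, seq = [], None, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if acc is not None:
                records.append((acc, species, "".join(seq)))
            m = HEADER.match(line)
            if m is None:
                raise ValueError(f"no '|'-delimited UniProt identifier in: {line}")
            acc = m.group(1)
            s = SPECIES.search(line)
            species = s.group(1) if s else None
            seq = []
        elif line:
            seq.append(line)
    if acc is not None:
        records.append((acc, species, "".join(seq)))
    return records


def filter_species(records, organism=None):
    """Keep a single species: the requested organism, else the most represented one."""
    if organism is None:
        counts = Counter(sp for _, sp, _ in records if sp)
        if not counts:
            return records
        organism = counts.most_common(1)[0][0]
    return [r for r in records if r[1] == organism]
```

Applied to the PGAM5 example above, `parse_fasta` yields the accession `Q96HS1` and the species `Homo sapiens`.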

Results: example of the Pragmin dataset.

Pre-calculated results for the Pragmin dataset can be accessed here.
  • Progress report

    This section will incrementally provide information about job progression and errors if any.

    A typical run should produce a progress report similar to the following:
    Progress

    Slight variations can occur depending on the version of the service.
    Errors related to the input data specified are also reported in this field.

  • Protein-protein interactions: visualization

    The results produced by the pipeline can be visualized as a graph using Cytoscape.js. Nodes are of three types: input proteins, undetected partners, and BioGRID partners, the latter resulting from the integration of BioGRID data. Buttons above the graph viewer allow users to select nodes and edges depending on their attributes. Details on how to handle the graph are provided in the appendix below.

    GraphViewer
  • Links to external resources for graph nodes:

    GraphViewer
    Right-clicking on any node of the graph offers two ways to get further information. The first opens the UniProt page of the protein. The second opens the protein in MolArt [6], which displays the 3D structure of the interaction partner. The protein chain can thus be visualized in the context of the oligomeric assembly. The MolArt viewer also provides various annotations of the protein sequence, depending on the available data.
    For instance, right-clicking on the AAPK1 node in the result page of the preset data and choosing MolArt redirects to the following page (for more information, please refer to the MolArt [6] documentation).
    MolArt viewer
  • Protein-protein interactions: tables

    The results of the search for protein-protein interactions are summarized in several tables:


    The homology_detection table summarizes the results of the homology search.

    HHsummary
    For each input sequence, it reports the hetero-complexes in which the protein was detected to participate. It also gives information about the best template identified by HHsearch (PDB identifier, organism, sequence identity, coverage). For instance, AAKB1 (Q9Y478) is found to belong to 3 hetero-complexes (c002, c004 and c011).


    A second table summarizes the results of the search for protein hetero-complexes.

    HeteroComplexes
    It summarizes the 3D complexes of experimentally determined structure identified as involving the input data. Complexes are denoted c001, c002, etc.
    The table columns describe, from left to right:
    the complex unique identifier;
    the maximal average sequence identity between the template chains and the corresponding input proteins, over all equivalent template complexes of known structure;
    the number of input proteins mapped to the complex structure;
    the number of PDB structures corresponding to similar complexes;
    the maximum completeness of the input/template alignment;
    the number of chains of the template complex that have no identified equivalent in the input data;
    the parent and child complex identifiers, for entries corresponding to smaller complexes.
    For instance, for c001, 4 out of 8 chains seem to have no equivalent in the input data. One also sees that c002 (3 chains) seems to correspond to an assembly encompassing that of c004 (2 chains).
    Clicking on the complex unique identifier gives access to an ancillary table describing the mapping between the complex structures and the input proteins. For instance, clicking on complex c002 displays the mapping of the input to all chains of all 25 structures identified with the same assembly, belonging to different species. Interestingly, the best sequence identity is observed for the structures resolved for rat (98.7%), rather than for H. sapiens (95.3%). Looking in more detail, this difference in fact comes from AAPK1 (86%), as the human structure contains the AAPK2 paralog instead of AAPK1.
    Complex c001
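Two columns of the hetero-complex table lend themselves to a short sketch: the maximal average identity over equivalent structures, and the parent/child relation between complexes. The code makes two hypothetical assumptions:
- A complex is a child of another when its set of mapped input proteins is a strict subset of the other's.
- The average is taken over the chains mapped to input proteins.
The exact rules used by Proteo3Dnet are not documented here, and the function names are hypothetical.

```python
from statistics import mean


def max_avg_identity(structures):
    """Best mean chain identity over equivalent template structures.

    structures: {pdb_id: {chain: percent identity to the mapped input protein}};
    chains without an input equivalent are left out (hypothetical layout).
    """
    return max(mean(chains.values()) for chains in structures.values())


def nest_complexes(complexes):
    """Parent/child relations between complexes (hypothetical rule).

    complexes: {complex_id: frozenset of input proteins mapped to it}.
    A complex is a child of another when its protein set is a strict subset.
    """
    rel = {cid: {"parents": [], "children": []} for cid in complexes}
    for a, proteins_a in complexes.items():
        for b, proteins_b in complexes.items():
            if proteins_b < proteins_a:  # b is a smaller assembly contained in a
                rel[a]["children"].append(b)
                rel[b]["parents"].append(a)
    return rel
```

With this rule, a 2-protein complex whose members all belong to a 3-protein complex is reported as its child, as in the c002/c004 example above.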


    A third table summarizes the results of the homo-oligomeric state of each protein.

    HomoMultimers
    It reports, for each input protein, the oligomeric state of the biological unit of the template structure, as annotated in the PDB. Authors' assignments are preferred over PISA predictions. For instance, a dimeric state is proposed for PLK1_HUMAN (P53350), which is not detected as belonging to a hetero-complex (see above).
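The preference for author assignments can be expressed as a simple ranking. This sketch assumes the assemblies are read from the `_pdbx_struct_assembly` category of a PDB mmCIF file. In that category, the `details` field takes values such as `author_defined_assembly` or `software_defined_assembly`, and `oligomeric_count` gives the number of chains. The function name is hypothetical.

```python
def preferred_oligomeric_count(assemblies):
    """Pick the oligomeric count to report for one template structure.

    assemblies: list of (details, oligomeric_count) pairs, e.g. taken from the
    mmCIF _pdbx_struct_assembly category. Author-defined assemblies are ranked
    before software (e.g. PISA) predictions; returns None when none is listed.
    """
    # False sorts before True, so entries mentioning "author" come first;
    # sorted() is stable, so the original order is kept among equals.
    ranked = sorted(assemblies, key=lambda a: "author" not in a[0])
    return ranked[0][1] if ranked else None
```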


    Confronting 3D results with the literature.
    Finally, UniProt binary interactions involving the input proteins are also presented, although they are not directly related to the 3D search. For each input protein, the ebi-interaction viewer provides an overview of its interactions with other input proteins annotated in the IntAct database.

    Binary interactions
    • Select an entry (a protein of the input) to see the interactions described in IntAct.
    • Additional filters are available for subcellular localization and diseases.
    • The table summarizes the interactions between all interaction partners of the entry. Pairwise interactions described in the IntAct database are depicted using blue spots. Color intensity is a function of the number of experiments reporting the interaction. More information can be accessed by clicking on the spots.


Appendix: graph visualization

Details on the features implemented for the Proteo3Dnet graph viewer are presented in this section.

  • Color code

    Nodes
    There are three types of nodes: those corresponding to input proteins are blue; those added by the structure-based analysis are gray; and those added by the BioGRID analysis are green. When the “Fold change” (FC) advanced option is used, input nodes with a negative log(FC) are red, while the others remain blue.

    Edges
    The Proteo3Dnet pipeline represents interactions between proteins with 4 types of edges:
    (i) Edges produced by the structure-based analysis and connecting input proteins. These are thick and partially opaque. Their color (blue, green, yellow, red or black) reflects the evolutionary distance between the two input proteins and their homologs found interacting within an experimental structure of the PDB. The corresponding sequence identity thresholds are ≥95%, ≥80%, ≥50%, ≥30% and ≥0%, respectively.
    (ii) Edges produced by the structure-based analysis and connecting input proteins and undetected partners. Those are thin and black.
    (iii) Edges produced by the ELM/BioGRID analysis and connecting input proteins. Those are thin and gray.
    (iv) Edges produced by the ELM/BioGRID analysis and connecting input proteins and additional partners from BioGRID. Those are thin and green.
    When selected, nodes and edges become magenta (except structure-based edges, which keep their color).
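The color rules above can be summarized as two small lookup functions. The thresholds and colors are those of the legend. The origin labels (`"input"`, `"structure"`, `"biogrid"`) and function names are hypothetical. Selection highlighting (magenta) is not modelled.

```python
# (minimal sequence identity in %, color) for structure-based edges between input proteins
EDGE_COLORS = [(95, "blue"), (80, "green"), (50, "yellow"), (30, "red"), (0, "black")]


def structural_edge_color(identity):
    """Color of a structure-based edge connecting two input proteins."""
    for threshold, color in EDGE_COLORS:
        if identity >= threshold:
            return color
    raise ValueError("sequence identity must be >= 0")


def node_color(origin, log_fc=None):
    """Node color from its origin; input nodes turn red when log(FC) < 0."""
    if origin == "input":
        return "red" if log_fc is not None and log_fc < 0 else "blue"
    return {"structure": "gray", "biogrid": "green"}[origin]
```

For instance, an edge supported at 86% identity falls in the ≥80% class and is drawn green.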

    Graph viewer
  • Selection

    Mouse
    Users can select nodes and edges by directly left-clicking on them.
    Holding the SHIFT key allows selecting more than one item, either (i) by successive left clicks on multiple items or (ii) by dragging a selection area.
    To unselect all items, simply click in a void area of the viewer.

    The use of the right click is described below.
    Middle-clicking has no effect.


    Viewer inputs
    Users can also interact with the graph representation through the different inputs implemented in the viewer.

    Numbers refer to the screenshot above

    #1: Typing one or several protein names here (e.g. SIAH1, TERF2) and then clicking on the “Select” button selects the corresponding nodes. This is helpful to find a particular protein when the nodes are numerous.

    #2: Clicking on one of the elements of this list selects the corresponding complex (named “c001”, “c002”, etc.) identified by Proteo3Dnet. Clicking on “All” selects all proteins found in 3D complexes. This is helpful to see at a glance the proportion of input proteins covered by the structure-based analysis.

    #3, #4, #5: Checking one or more of these three boxes selects proteins depending on their origin: the input dataset, the structure-based analysis, or the ELM/BioGRID analysis. This is helpful to identify the proteins submitted by the user, especially when they are numerous.

    Once an item is selected, the selection can be extended.

    #6: “Neighbors” are nodes that are directly connected to the selected node(s). Multiple consecutive clicks on this button will select nodes with increasing degrees of separation. This is helpful to observe indirect interactions.

    #7: The “Invert” button simultaneously (i) selects the unselected nodes and (ii) unselects the selected nodes. This is mainly helpful in combination with the “Action” buttons presented below.

    #8: For a selected edge, the two connected nodes can be selected as well, by clicking on the “Connected nodes” button. This is helpful for visually delineating PPIs within dense graphs.

    These functions are not mutually exclusive: users can combine them for creating custom selections.
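The selection extensions behave like simple set operations on the graph. The sketch below reproduces their semantics in plain Python on an adjacency dictionary. It is not the viewer's code, which runs in Cytoscape.js, whose collections offer comparable methods such as `closedNeighborhood()`. The function names are hypothetical.

```python
def neighbors(adj, selected):
    """One click on 'Neighbors': add every node directly connected to the selection."""
    return set(selected) | {n for s in selected for n in adj.get(s, ())}


def invert(all_nodes, selected):
    """'Invert': selected nodes become unselected and unselected ones selected."""
    return set(all_nodes) - set(selected)


def connected_nodes(edge):
    """'Connected nodes': the two endpoints of a selected edge."""
    return set(edge)
```

Calling `neighbors` repeatedly on its own output mimics consecutive clicks: each call adds one more degree of separation.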

    The properties of the current selection of nodes are shown in two windows.

    #9: Here, the number and names of the selected proteins are displayed, separated by semicolons. This text area can be expanded and its content copied and pasted. Once saved this way, a custom list of nodes can later be selected using input #1.

    #10: If a node belongs to one or several complexes identified by the structural analysis, its selection will trigger the display of the complex names in this window. This is useful to observe the overlap between local PPI networks.

  • Action

    Several modifications can be applied to the selected items.

    #11: The “Focus” button will increase or decrease the zoom, so that the whole selection is visible within the viewer. This is particularly helpful, in combination with input #1, to browse through large graphs.

    Note: the zoom can be otherwise modified, either by scrolling or by using the cursor at the top left of the viewer.

    #12, #13, #14: Selected items can be hidden to ease the visualization of dense graphs. When a connected node is hidden, its partner(s) display a black border. Such black-bordered nodes can be selected, and clicking on the “Show” button then reveals their hidden connections again. Combined with the aforementioned “Invert” button, hiding nodes is one way to highlight specific nodes in a graph; the other way is presented below. Successive hide/show manipulations can alter the representation in a seemingly irreversible manner. To restore the initial display, users can click on the “Reset” button.

    #15, #16: Highlighting is enabled by clicking on the “Activate” button. All non-selected items then have their opacity decreased. However, unlike with the Invert+Hide combination of buttons, the non-highlighted nodes and edges remain selectable. This highlighting can be removed with the “Cancel” button.
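The hide/show behavior can be modelled in the same hedged way. Visible nodes with at least one hidden neighbor receive a black border, and “Show” on such nodes reveals their hidden neighbors. Names and data layout are hypothetical. Edge hiding and “Reset”, which simply restores the initial state, are omitted.

```python
def bordered_nodes(adj, hidden):
    """Visible nodes with at least one hidden neighbor get a black border."""
    return {n for n, nbrs in adj.items() if n not in hidden and nbrs & hidden}


def show(adj, hidden, selected):
    """'Show' on selected bordered nodes: return the set of nodes still hidden."""
    revealed = {h for s in selected for h in adj.get(s, set()) & hidden}
    return hidden - revealed
```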

  • Right click

    Nodes and edges (whether they are selected or not) can be right-clicked. This will trigger a tooltip, the content of which varies depending on the type of item.

    Nodes
    The tooltip appearing on every right-clicked node contains two hyperlinks. The first opens the corresponding UniProt page in a new tab. The second opens the MolArt protein viewer, which displays the 3D structure of the protein as well as some linear annotations.

    Edges
    Right-clicking on edges established by the structural analysis gives the PDB IDs of the protein structures from which this interaction was inferred. Each PDB ID comes with a sequence identity (in %). This value measures the evolutionary distance between the two input proteins and the two PDB chains found for them (with our procedure for detecting distant homologies). The structural analysis also establishes edges with additional nodes (labeled “undetected”). In this case, the list of PDB IDs is displayed without the sequence identity.
    The second type of edges consists of those found by the ELM/BioGRID analysis between input nodes. In this case, a right-click on the edge displays hyperlinks to the ELM motif found on one node and the Pfam domain found on the other.
    Finally, edges between input proteins and additional BioGRID partners simply display a link to the BioGRID database.

Project history

  • 2019, December: New version of Proteo3Dnet with an improved presentation of the ELM results.
  • 2019, November: Enhanced version of Proteo3Dnet as a web server (binary interactions, more concise presentation of the results).
  • 2019, November: MS2MODELS presentation at the SFCi meeting.
  • 2019, November: MS2MODELS presentation at the MASIM workshop.
  • 2019, October: Implementation of an interface to ELM.
  • 2019, October: Many bug fixes related to the management of HHsearch clusters. Extensive tests on several datasets: Pragmin, proteasome and p73 MS interactome data.
  • 2019, July: Proteo3Dnet as a web server in the RPBS Mobyle portal, embedding cytoscape.js and MolArt.
  • 2019, April: MS2MODELS presentation at the GGMM congress.
  • 2018, November: First implementation of full Proteo3Dnet pipeline.
  • 2018, July: Project start.
  • 2018, April: MS2MODELS project accepted.
  • 2017, November: IFB call for pilot projects. Application of the MS2MODELS project to enrich MS interactome data in the light of available 3D structures.

References

[1] Rose, P. W., Prlić, A., Altunkaya, A., Bi, C., Bradley, A. R., Christie, C. H., ... & Green, R. K.
The RCSB protein data bank: integrative view of protein, gene and 3D structural information.
Nucleic Acids Research 2016; gkw1000.
[2] Söding, J.
Protein homology detection by HMM-HMM comparison.
Bioinformatics 2004; 21(7), 951-960.
[3] Oughtred, R., Stark, C., Breitkreutz, B. J., Rust, J., Boucher, L., Chang, C., ... & Zhang, F.
The BioGRID interaction database: 2019 update.
Nucleic Acids Research 2019; 47(D1), D529-D541.
[4] Gouw, M., Michael, S., Sámano-Sánchez, H., Kumar, M., Zeke, A., Lang, B., ... & Diella, F.
The eukaryotic linear motif resource - 2018 update.
Nucleic Acids Research 2017; 46(D1), D428-D434.
[5] Franz, M., Lopes, C. T., Huck, G., Dong, Y., Sumer, O., & Bader, G. D.
Cytoscape.js: a graph theory library for visualisation and analysis.
Bioinformatics 2015; 32(2), 309-311.
[6] Hoksza, D., Gawron, P., Ostaszewski, M., & Schneider, R.
MolArt: a molecular structure annotation and visualization tool.
Bioinformatics 2018; 34(23), 4127-4128.
[7] Lecointre C, Simon V, Kerneur C, Allemand F, Fournet A, Montarras I, Pons JL, Gelin M, Brignatz C, Urbach S, Labesse G, Roche S.
Dimerization of the Pragmin Pseudo-Kinase Regulates Protein Tyrosine Phosphorylation.
Structure 2018; 26(4), 545-554.
[8] Bienert S, Waterhouse A, de Beer TA, Tauriello G, Studer G, Bordoli L, Schwede T.
The SWISS-MODEL Repository - new features and functionality.
Nucleic Acids Research 2017; 45(D1), D313-D319.