PEP-FOLD 3

De novo peptide structure prediction.

Overview

PEP-FOLD is a de novo approach aimed at predicting peptide structures from amino acid sequences. This method, based on structural alphabet SA letters to describe the conformations of four consecutive residues, couples the predicted series of SA letters to a greedy algorithm and a coarse-grained force field.

What's new

  1. PEP-FOLD3 latest evolution comes with a new 3D generation engine , based on a new Hidden Markov Model sub-optimal conformation sampling approach, faster by one order of magnitude than the previous greedy strategy, while not affecting performance. This makes possible the on-line generation of models for peptides from 5 to 50 amino acids in a few minutes.
  2. PEP-FOLD3 offers new possibilities to refine pre-existing models and/or to generate decoys keeping rigid regions of the structure.
  3. PEP-FOLD3 also comes with the possibility to generate candidate conformations of peptide-protein complexes , by folding peptides on a user specified patch of a protein. The peptide is fully flexible and the protein is kept rigid. Note that PEP-FOLD coarse grained representation implies that candidate conformations undergo further refinement (not in the scope of the server) to reach high accuracy modelling.

Still active

PEP-FOLD former version available here is based on the greedy strategy can perform 3D modeling for linear peptides up to 36 amino acids, and allows user specified constraints such as disulfide bonds and inter-residue proximities.

1w4g model and experimental conformations

1w4g PEP-FOLD best model (cyan) and experimental (green) conformations. Rigid core BCscore: 0.97 (1 is perfect, 0 is random, -1 is mirror).

2ysh model and experimental conformations

2ysh PEP-FOLD best model (cyan) and native (green) conformations. Rigid core BCscore: 0.89.

1AWR model and experimental conformations

1AWR PEP-FOLD best complex (magenta) and experimental (green/cyan) generated. Peptide RMSd is of 1.34 A.

Note: results vary depending on the runs, and best complex and lowest energy complex can differ.

Access the service through the RPBS Mobyle portal:

When using this services, please cite the following reference:

Lamiable A, Thévenet P, Rey J, Vavrusa M, Derreumaux P, Tufféry P.
PEP-FOLD3: faster de novo structure prediction for linear peptides in solution and in complex.
Nucleic Acids Res. 2016 Jul 8;44(W1):W449-54.
Shen Y, Maupetit J, Derreumaux P, Tufféry P.
Improved PEP-FOLD approach for peptide and miniprotein structure prediction
J. Chem. Theor. Comput. 2014; 10:4745-4758
Thévenet P, Shen Y, Maupetit J, Guyon F, Derreumaux P, Tufféry P.
PEP-FOLD: an updated de novo structure prediction server for both linear and disulfide bonded cyclic peptides.
Nucleic Acids Res. 2012. 40, W288-293.

New Features

  • Peptide structure prediction

    Starting from a single amino acid sequence from 5 to 50 standard amino acids, PEP-FOLD3 runs series of 100 simulations. Each simulation samples a different region of the conformational space. It returns an archive of all the models generated, the detail of the clusters and the best conformation of the 5 best clusters. Note the faster simulation engine makes possible to run PEP-FOLD3 on only one 8 cores machinesn usually still returning models in less than half an hour (this may vary depending on server load).

    Warning: PEP-FOLD prediction and folding is presently assumed for neutral pH.
  • Biased modeling

    The PEP-FOLD 3 service accepts information to bias the 3D modeling for user specified regions. User input can impose that some pre-defined local conformations are preserved during modeling, either by providing a former 3D model and indicating which regions are to be propagated to the simulations, or by indicating a list residues for which the PEP-FOLD internal optimal prediction can be considered as accepted. This makes possible iterative modeling in the aim to refine former models, or to geenrate decoy conformations sampling different topologies for instance.

  • Protein-peptide complex modeling

    PEP-FOLD3 accepts to fold peptides in the viccinity of a protein receptor, starting from the fuzzy definition of a patch of interaction defined by the user. It is important to understand that the models generated for protein-peptide complexes are generated using a coarse grained representation, which most often must be supplemented by further refinement steps that are not presently included in the PEP-FOLD3 server. During the modeling of protein-peptide interactions, several starting points are generated close to the protein from the user specified patch, then the peptide is grown one residue by one residue in the presence of the protein. Once complete, pepetide structure is refined using a Monte Carlo procedure. The peptide is fully flexible, the protein coordinates are unaffected (rigid).

Limitations

  • Amino acid sequence size

    Presently, PEP-FOLD prediction is limited to amino acid sequence between 5 and 50 residues, although PEP-FOLD has been tested off-line for sizes up to 80 amino acids. For sizes more than 50 amino acids, users can contact the authors. For peptides shorter than 5 amino acids that are usually unstructured, conformational sampling approaches based on molecular dynamics simulations or approaches developped for small compounds should be preferred.

  • Amino acid types

    In date of january 2016, PEP-FOLD 3 only accepts the 20 usual amino acids. It will not process peptides with D-amino-acids, or unusual L- amino acids.

  • Peptide properties

    Presently, PEP-FOLD3 modeling engine, while able to deal with disulfide bonds, has not shown so far able to perform more efficiently that PEP-FOLD2. PEP-FOLD2 should be used for such peptides. Neither PEP-FOLD2 nor PEP-FOLD3 are able to process circular peptides cyclized by other means than disulfide bonds.

Usage

Input

  • Input sequence

    This field is to specify the amino acid sequence of the peptide. The input sequence file must be in FASTA format. The query peptide sequence must contain a string of only the 20 standard amino acids in uppercase, using the 1 letter code (see the pre-configured test example). The size of the input sequence can be as long as 50 amino acids.

Input options

  • Run label

    A label for the files generated. It MUST be a single word (no spaces, no special characters). It will be used to generate the name of the models available for download.

  • Sort models by

    Once generated, models are clustered to identify groups of similar models. The clusters are then sorted using either the sOPEP energy value, or Apollo [6] predicted TMscore (tm). For peptides up to 36 residues, using sOPEP as a key to sort the clusters will often result in proposing native or near native conformations in the top 5 ranks. For larger sizes, it is preferable to consider the Apollo predicted TM score.

    Note: For complexes, sorting by sOPEP will be used in any case (Apollo is designed for single chains).

Biased modeling

This section correspond to optional additional input to bias model generation.

  • Reference structure

    Input reference structure file must be in PDB format. This file must only contains a single model, with no hetero-residues and have the exact same length (in amino acids) as the query sequence.

  • Mask

    The mask specifies which residues in the sequence will have conformational variability. The input sequence must be in FASTA format. It MUST corresponds to the exact same amino acid sequence as that input, but is sensitive to the difference between uppercase and lowercase. The local conformation of residues written using lowercase will not vary, while that of the residues written in uppercase will.

    It is important to understand that conformational constraints are expressed in the terms of the structural alphabet (SA) used by PEP-FOLD3 (see the Concepts section for more information). Since the SA considers a peptide as a series of 4 amino acids overlapping by 3, a peptide sequence of L amino acids will be modeled using series of L-3 SA states. Thus, the first 4 amino acids of the sequence correspond to the first fragment and MUST have the same case. The second SA state makes possible to build the 5th amino acid in the structure, etc.

    If a reference structure is specified, it will be decoded as a series of SA letters using the Viterbi algorithm. The SA letters associated with lower case residues will be used for 3D modeling.

    If no reference structure is specified, an optimal series of SA letters will be predicted, and the SA letters associated with lower case residues will be used for 3D modeling.

    Warning: Specifying such constraints can sometimes be misleading due to the overlapping of 3 amino acids between consecutive fragments. Please consider carefully the use of lowercase.

Protein receptor

This section correspond to optional additional input to generate protein-peptide complexes.

  • Receptor structure

    The structure of the receptor must be in PDB format. This file must only contains a single model, with no hetero-residues. Presently, it should contain ONLY ONE CHAIN. Peptide folding onto multimers is not presently supported.

  • Residues defining the interaction patch

    Presently, PEP-FOLD requires the definition of a foreseen patch on receptor surface on which the peptide will bind. The patch is defined as a series of residues of the receptor structure. Given these residues, PEP-FOLD3 will generate candidate starting coordinates for the peptide modeling. Peptide modeling will associate one of these coordinates with one amino acid of the peptide in a stochastic manner. In addition, initial orientation will also be drawn randomly. This results in a very fuzzy definition of the expected interaction patch.

    Residues defining the patch must be specified ONE per line. In the PDB format, one amino acid can be unambiguously identified only by specifying the chain label, the residue number, the residue name and the insertion code.

    Here, we use the underscore as a delimiter between these fields, using the format:

    _CHAIN-ID_RESIDUE-3-LETTERS-CODE _RESIDUE-NUMBER_INSERTION-CODE_

    Example:

    _A_THR_1__,_A_THR_2__,_A_ILE_7__.
    Note: Blank chain Ids and blank insertion codes are accepted (and result in double _), and residue numbers are those of the PDB file. An error will be raised in any of the residues specified has no equivalent residue in the PDB file.

Demonstration mode

  • Pre-configured test

    Tests are configured for three different scenari. When selecting one of these, all input data specified in the other input fields will be ignored. Note that the demos are presently tuned to demonstrate PEP-FOLD 3 features only and are not intended for production. Hence they generate a limited number of models.

    • 3D modeling: This will launch PEP-FOLD3 modeling for the 40 amino acid sequence of the FAF-1 UBA Domain (PDB entry: 3e21). (MDREMILADFQACTGIENIDEAITLLEQNNWDLVAAINGV)
    • Biased modeling: This will launch PEP-FOLD3 modeling for the 40 amino acid sequence of the FAF-1 UBA Domain (PDB entry: 3e21), but limit conformational sampling to loop regions, applying the mask: MDREMILadfqaCTGIENIDEaitlleqNNWDLVaaingv on the experimental structure 3e21.
    • Complex modeling: This will launch the modeling the of the peptide of sequence HAGPIA in complex with the APO conformation of CYPA (PDB entry of the complex: 1awr).

Results

PEP-FOLD main output consists in models. On-line interactive visualisation and model selection facilities are however proposed.

  • Progress report

    ProgressReport

    This section will incrementally provide information about job progression and errors if any. A typical run should produce a report similar to that. Errors related to the input data specified are now also reported in this section.

  • Model visualization

    PEPFOLD3-PV

    PEP-FOLD3 on-line interactive visualization of the models generated is based on the PV javascript protein viewer . Different representations as well as colouring schemes can be selected. A menu makes possible to select a model among the 10 best models (representatives of the 10 best clusters).

  • Clustering report

    ClusterReport

    This report is primarily a table file that describes the clusters. For each model generated, up to 10 numbers are reported: avg, gdt, max, q, tm are the scores predicted by Apollo [6], where gdt is the predicted GDT_TS, q is the predicted Qmean score and tm is the predicted TMscore. sOPEP is the coarse grained energy of PEP-FOLD. If a reference file has been specified in input, additional columns correspond to the TMscore and the cRMSd obtained by aligning respectively the model onto the reference using TMalign or a rigid fit procedure minimizing the alpha carbon RMS deviation. The CGCNS and nSS are related to user constraints. CGCNS stands for a Coarse Grained CoNstraint Satisfaction, between 0 and 100, where 100 means all the constraints satisfied. nSS stands for the number of disulfide bond obtained at the all atom representation level (N/A for not appliable).

    Additionally, this table gives access to model download at different levels: The archive of all the models (top of table), archives of all models in a cluster - if the cluster contains several models, and each individual models.

    The cluster ranks are defined according to their scores (sOPEP or tm) - see the input option "Sort models by". The cluster representatives correspond to the models of the clusters having the best scores, i.e. with the lowest sOPEP energy (resp. highest tm value). They are denoted as "modelx", where x is the rank of the cluster according to the sort key. When a cluster has several models, it is in turn sorted according to the sort key. The first model in the table is the representative, denoted on the form "modelx" and the following models are numbered using the "modelx.y" convention where x is the rank of the cluster and y the rank of the model in the cluster.

    Note: cRMSd* (alpha carbons Root Mean Square deviation), GDT_TS* (Global Distance Test Total Score, for more information, see: predictioncenter.org), sOPEP energy (see concepts section), TM score* (for more information, see Zhang server) and a single character indicating if it is the cluster centroid (*) or not (-).
    * compared to the reference structure if provided.
  • Model 1 to 5

    Model

    Representatives o the 5 best clusters (according to tm - see Clustering report) predicted structure are provided in PDB format. You can either save the file onto your computer, or view it using JMol. In the Mobyle environment, PDB files can also be piped to other analyses such as the identification of secondary structures using stride or p-sea. For this, select the appropriate method beside the "further analysis" button, then launch it by clicking on "further analysis".

  • Local Structure prediction profile

    Model

    It corresponds to a graphical representation of the probabilities of each Structural Alphabet (SA) - see the Concepts - letter (vertical axis) at each position of the sequence (horizontal axis). Note that SA letters correspond to fragments of 4 residue length. The profile is presented using the following color code: red: helical, green: extended, blue: coil.

  • Models archive

    archive

    This archive contains all the models generated. It is in the unix tar format compressed using gzip.

    Once saved on your computer, enter for instance (unix) tar xzf AllModels.tgz to inflate the archive.

Examples, sample tests

As a simple test, you can either choose to:

  • Copy, paste

    Copy, paste the following sequence to the "Peptide amino acid sequence" field:

    >1egs_A mol:protein length:9 GROES
    TKSAGGIVL

    or use Mobyle facilities :

    MobyleForm
  • Fill the input data

    1. click "DB" radio button
    2. select pdbaa database
    3. write your sequence identifier (here: 1egs_A)
    4. add this sequence to the form field.
  • Fill the options

    1. click "DB" radio button
    2. select PDB database
    3. write your sequence identifier (here: 1egs_A)
    4. add this PDB file to the form field.
  • Warning: In both cases, the sequence cannot contain anything else that the 20 standard amino acids. "X" should be either discarded or replaced using the "edit" facility (top right of the input area).
  • Run

    Launch PEP-FOLD prediction, by clicking "Run" at the top of the page.

    Note: For non-registred users, a captcha will ask you to type the text from the image before submitting your job. Once your job has been submitted, you can check your results availability by clicking the "update job status" button.

Concepts

  • Structural alphabet

    PEP-FOLD is based on the concept of structural alphabet [1], i.e. an ensemble of elementary prototype conformations able to describe the whole diversity of protein structures.

  • Greedy algorithm

    HMM-SA letters are assembled by an enhanced greedy algorithm described in [2]. In PEP-FOLD3, this algorithm has been superseeded by HMM sub optimal sampling algorithm such as the forward backtrack algorithm or a taboo sampling algorithm [7].

  • Coarse grained force field

    The OPEP potential helps us to limit the roughness of the peptides energetic landscape, by simplifying side chains representation by a single bead. OPEP v3 parameters are optimized by a genetic algorithm procedure using a large ensemble of protein decoys [3]. OPEP is the objective function that drive the greedy algorithm during the rebuilding process.

    OPEP (Optimized Potential for Efficient structure Prediction) version 3 is expressed as a sum of local, nonbonded and hydrogen-bond (H-bond) terms:

    OPEPv3 Equation

    The local potentials are expressed by:
    OPEPv3 Equation

    The term Elocal contains force constants associated with changes in bond lengths and bond angles of all particles as well as force constants related to changes in improper torsions of the side-chains and the peptide bonds.

    The nonbonded potentials are expressed by:

    OPEPv3 Equation

    with 1, 4 the 1-4 interactions along each torsional degree of freedom, M ′ the N, C’, O and H main chain atoms, and Sc the side-chain. As seen, we separate short-range from long-range (j > i+4) interactions, and the C alpha atom from the other main chain atoms. For more details on the EVdW potential, see ref [6,5].

    The hydrogen-bonding potential (EH−bond) consists of two-body (EHB1) and four-body (EHB2) terms. Two-body H-bonds are defined by:

    OPEPv3 Equation

    Four-body effects, which represent cooperative energies between hydrogen bonds ij and kl, are defined by:

    OPEPv3 Equation
    Note: All details about the OPEP force field used in PEP-FOLD are available in refs [2,3] and further references included.

Validation

Validation tests for large peptides have been performed on a collection of 56 peptides of size between 25 and 50 amino acids. The results using the former greedy version of the algorithm show PEP-FOLD is able to identify near native or native conformations for 53 among 56 of these peptides, slightly outperforming state of the art de novo approach such as Rosetta [8]. Results using the PEP-FOLD3 engine show identical performance, for slighlty enhanced model quality [7].

Statistics

  • PEP-FOLD 2013 runs: 27898
  • PEP-FOLD 2014 runs: 29931
  • PEP-FOLD 2015 runs: 16651 since migration to new cluster architecture (July 1st 2015)

History

  • 2016, Mar 04: Added residue highlighting in model visualization.
  • 2016, Mar 03: Small improvement for the local structure prediction (faster, performance unaffected).
  • 2016, jan 11: Small correction for clustering of peptides when complexed with a receptor.
  • 2015, dec 16: PEP-FOLD3 server enters production.
  • 2015, sept 27: PEP-FOLD3 server enters test phase.
  • 2015: Improved engine based on Hidden Markov Model forward-backtrack and taboo sampling sub-optimal sampling approaches. Results show no loss in performance compared with previous greedy engine.
  • 2014: First announcement of new, faster engine at Bioinformatics 2014, Angers, France. Best presentation award.
  • 2013: New engine design. first implementation.

Known problems and answers

PEP-FOLD has been tested successfully using various OS / browser combinations, including:

  • Firefox and Chrome under Linux,
  • Firefox, Chrome, under Windows 7,and under Windows XP
  • Safari, Firefox and Chrome under MacOS X (lion, snow leopard, up to Yosemite).

There is presently very few feedback reporting problems with PEP-FOLD:

PV will not launch properly

In most cases, it is related to a problem of support of WebGL. Please check your browser supports it.

PV will not display the Cter residue

This issue of PV occurs only using the ribbon mode and has been reported to the PV developers.

Results will not display on PEP-FOLD termination

Mobyle upload button will not work. This spurious behavior has been observed repeatedly. If you encounter such behavior, just perform a full refresh of the result page (press Ctrl+R or press Enter in the url specification field of the browser, or press F5).

References

[1] Camproux AC, Gautier R, Tuffery P.
A hidden markov model derived structural alphabet for proteins.
J Mol Biol. 2004 Jun 4;339(3):591-605.
[2] Maupetit J, Derreumaux P, Tuffery P.
A fast and accurate method for large-scale de novo peptide structure prediction.
J Comput Chem. 2010 Mar;31(4):726-38.
[3] Maupetit J, Tuffery P, Derreumaux P.
A coarse-grained protein force field for folding and structure prediction.
Proteins. 2007 Nov 1;69(2):394-408.
[4] Kaur H, Garg A, Raghava GP.
PEPstr: a de novo method for tertiary structure prediction of small bioactive peptides.
Protein Pept Lett. 2007;14(7):626-31.
[5] Wang Z, Eickholt J, Cheng J.
APOLLO: a quality assessment service for single and multiple protein models.
Bioinformatics. 2011 27(12):1715-6.
[6] Beaufays J, Lins L, Thomas A, Brasseur R.
In silico predictions of 3D structures of linear and cyclic peptides with natural and non-proteinogenic residues.
J. Pept. Sci. 2011; epub ahead of print.
[7] Lamiable A, Thevenet P, Tufféry P
A critical assessment of HMM taboo sampling strategies applied to the generation of peptide 3D models.
in preparation.
[8] Shen Y, Maupetit J, Derreumaux P, Tufféry P.
Improved PEP-FOLD approach for peptide and miniprotein structure prediction
J. Chem. Theor. Comput. 2014; 10:4745-4758