PEP-FOLD is a de novo approach aimed at predicting peptide structures from amino acid sequences. This method, based on structural alphabet (SA) letters that describe the conformations of four consecutive residues, couples the predicted series of SA letters to a greedy algorithm and a coarse-grained force field.
Access the PEP-FOLD server at the RPBS Mobyle Portal or at the PEP-FOLD standalone server.
[ Restricted to peptide sizes from 9 to 25 residues ]

Please cite the following reference:
Maupetit J, Derreumaux P, Tufféry P.
PEP-FOLD: an online resource for de novo peptide structure prediction.
Nucleic Acids Res. 2009. doi:10.1093/nar/gkp323
History
  • 2009, Sep 17 - Bugfix: in some cases, secondary structure prediction constraints were not effective.
Features
  • Peptide structure prediction: Starting from your amino acid sequence, PEP-FOLD runs 50 greedy simulations and returns the centroids of the most populated clusters plus the lowest energy conformation.
  • Prediction constraints: The server allows you to include other prediction methods as constraints for the HMM-SA letter prediction. For now, PSIPRED is implemented and fully functional.
  • Reference structure: The PEP-FOLD server allows you to upload a reference structure in order to compare PEP-FOLD models with it (see Usage).
  • Fast folding: Once your job is running (no pending status), PEP-FOLD prediction takes about 40 minutes for a 20-residue peptide sequence.
Limitations
  • Amino acid sequence size: For now, PEP-FOLD prediction is limited to amino acid sequences of 9 to 25 residues. This limit may change depending on user requests.
  • Peptide properties: Currently, PEP-FOLD cannot handle linear peptides containing D-amino acids or disulfide bridges, nor cyclic peptides. These features are on the roadmap for the next PEP-FOLD release.
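
The size limit above, together with the input rules given under Usage (FASTA format, the 20 standard amino acids in uppercase), can be checked locally before submitting a job. The following is only a minimal sketch, not part of PEP-FOLD: the function names are hypothetical, and the restriction to a single FASTA record is an assumption made for this illustration.

```python
# Hypothetical pre-submission check; not part of the PEP-FOLD server.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH, MAX_LENGTH = 9, 25  # size limits stated on this page


def read_single_fasta(text):
    """Return (header, sequence) from FASTA text holding one record.

    Accepting exactly one record is an assumption of this sketch.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("expected exactly one FASTA record")
    return lines[0][1:], "".join(lines[1:])


def check_sequence(sequence):
    """Return a list of problems; an empty list means the sequence passes."""
    problems = []
    if not MIN_LENGTH <= len(sequence) <= MAX_LENGTH:
        problems.append(
            f"length {len(sequence)} is outside {MIN_LENGTH}-{MAX_LENGTH} residues"
        )
    unexpected = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if unexpected:
        problems.append("non-standard or lowercase symbols: " + "".join(unexpected))
    return problems


if __name__ == "__main__":
    # Sample sequence given in the Examples section of this page.
    header, sequence = read_single_fasta(
        ">1egs_A mol:protein length:9 GROES\nTKSAGGIVL\n"
    )
    print(header, check_sequence(sequence))
```

A sequence check of this kind cannot detect D-amino acids, disulfide bridges or cyclization, since none of these is encoded in a plain FASTA sequence.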
Usage
  • Input sequence: The input sequence file must be in FASTA format. If another format is provided, it will be automatically converted by squizz (Mobyle server only). The query peptide sequence must contain only the 20 standard amino acids, in uppercase.
  • Input reference structure: The input reference structure file must be in PDB format. This file must contain only a single model, with no hetero-residues, and have the same length (in amino acids) as the query sequence.
  • Prediction constraints: To combine the HMM-SA letter prediction with the PSIPRED prediction, simply check the corresponding option.
  • Results: PEP-FOLD produces an informative report to guide the user through the selection of the best model.
    • Energy plot
      This file reports, for each cluster (x axis), the energy of its centroid (y axis) as calculated by sOPEP. The size of each centroid circle is proportional to the cluster population (reported in the center of the circle). The lowest energy solution is located by the dashed lines.
    • Clustering report
      This report is a plain text file with 8 columns: the cluster number, the number of the individual model within the cluster, the model file name, its cRMSd* (root mean square deviation over alpha carbons), GDT_TS* (Global Distance Test Total Score; for more information see predictioncenter.org), sOPEP energy (see the Concepts section), TM score* (for more information see the Zhang server) and a single character indicating whether the model is the cluster centroid (*) or not (-).
      * compared to the reference structure if provided.
    • Cluster centroid PDB files
      All cluster centroids and the lowest energy predicted structure (in PDB format) are provided in a single tarball archive. Windows users can open this kind of archive with 7-Zip (GNU LGPL) or WinRAR.
    • Lowest energy solution PDB file
      The lowest energy solution (marked by dashed lines in the energy plot) is reported separately so that you can pipe this PDB file to other resources of the RPBS Mobyle Portal, for example iSuperpose to superpose the solution on your own reference, or stride to evaluate its secondary structure. This feature is not available in the standalone server.
    • Model snapshots
      PEP-FOLD output shows a snapshot of the lowest energy solution and of all low-energy clusters, superposed either on the lowest energy conformation or on a reference structure if provided. This gives an idea of the conformational diversity of your PEP-FOLD results. Structure snapshots are rendered with the PyMOL software.
    • Predictions viewer (standalone server)
      The standalone server lets you visualize the most representative conformation of each cluster and the lowest energy solution, independently or together, via a Jmol applet.

      This feature is also available on the Mobyle server, through its pipelining facility.
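
The clustering report described above lends itself to simple scripting, for instance to list the cluster centroids or to find the lowest energy model. The sketch below is an illustration only. It assumes that the 8 columns are whitespace-separated, appear in the order listed above, and that file names contain no spaces. The exact file layout is not specified on this page, so adapt the parser to the report you actually receive. All names in the code are hypothetical.

```python
from typing import NamedTuple, Optional


class ModelRecord(NamedTuple):
    cluster: int
    member: int
    model_file: str
    crmsd: Optional[float]     # only meaningful if a reference was provided
    gdt_ts: Optional[float]    # only meaningful if a reference was provided
    sopep_energy: float
    tm_score: Optional[float]  # only meaningful if a reference was provided
    is_centroid: bool


def _optional_float(token):
    """Convert a token to float, or return None if it is not numeric."""
    try:
        return float(token)
    except ValueError:
        return None


def parse_clustering_report(text):
    """Parse report text into ModelRecord objects.

    Assumption: 8 whitespace-separated columns per line, in the order
    described on this page. Lines that do not fit are skipped.
    """
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue
        try:
            records.append(ModelRecord(
                cluster=int(fields[0]),
                member=int(fields[1]),
                model_file=fields[2],
                crmsd=_optional_float(fields[3]),
                gdt_ts=_optional_float(fields[4]),
                sopep_energy=float(fields[5]),
                tm_score=_optional_float(fields[6]),
                is_centroid=(fields[7] == "*"),
            ))
        except ValueError:
            continue  # for example a header line
    return records


def centroids(records):
    """Return the records flagged as cluster centroids."""
    return [record for record in records if record.is_centroid]


def lowest_energy(records):
    """Return the record with the lowest sOPEP energy."""
    return min(records, key=lambda record: record.sopep_energy)
```

With the parsed records in hand, `lowest_energy(records).model_file` gives the file name to look for in the tarball archive, under the layout assumptions stated above.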
Examples, sample tests
  • A Mobyle video tutorial is available [WMV format (14 Mb)].
  • As a simple test, you can either:
    • Copy and paste the following sequence into the "Peptide amino acid sequence" field:
      >1egs_A mol:protein length:9 GROES
      TKSAGGIVL
    • or use the Mobyle facilities:
      • Fill in the input data: 1. click the "DB" radio button; 2. select the pdbaa database; 3. enter your sequence identifier (here: 1egs_A); 4. add this sequence to the form field.
      • Fill in the options: 1. click the "DB" radio button; 2. select the PDB database; 3. enter your sequence identifier (here: 1egs_A); 4. add this PDB file to the form field.
  • In both cases, launch the PEP-FOLD prediction by clicking "Run" at the top of the page.

    Nota bene: For non-registered users, a captcha will ask you to type the text from the image before submitting your job. Once your job has been submitted, you can check whether your results are available by clicking the "update job status" button.
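
Before uploading a reference structure such as the one used in the example above, the requirements listed under Usage (a single model, no hetero-residues, same length as the query sequence) can be checked locally. The following is a minimal sketch under stated assumptions, not the server's own validation. It reads the fixed-column PDB format directly, treats any HETATM record as a hetero-residue, and counts residues as distinct (chain, residue number, insertion code) combinations among ATOM records. The function name is hypothetical.

```python
def check_reference_pdb(pdb_text, query_length):
    """Return a list of problems; an empty list means the file passes.

    Assumptions of this sketch: any HETATM record counts as a
    hetero-residue, and a file without MODEL records holds one model.
    """
    model_count = 0
    hetero_count = 0
    residues = set()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            model_count += 1
        elif record == "HETATM":
            hetero_count += 1
        elif record == "ATOM":
            # Chain ID (column 22), residue number (columns 23-26)
            # and insertion code (column 27) of the PDB format.
            residues.add((line[21:22], line[22:26], line[26:27]))

    problems = []
    if model_count > 1:
        problems.append(
            f"found {model_count} MODEL records; expected a single model"
        )
    if hetero_count:
        problems.append(
            f"found {hetero_count} HETATM records (hetero-residues are not accepted)"
        )
    if len(residues) != query_length:
        problems.append(
            f"structure has {len(residues)} residues; query has {query_length}"
        )
    return problems
```

A typical call would be `check_reference_pdb(open("reference.pdb").read(), len(sequence))`, where `reference.pdb` is a placeholder file name. Note that this sketch counts residues over all chains, so a multi-chain file will usually fail the length check.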
Concepts
  • Structural alphabet: PEP-FOLD is based on the concept of a structural alphabet [1, 2], i.e. an ensemble of elementary prototype conformations able to describe the whole diversity of protein structures.
  • Greedy algorithm: HMM-SA letters are assembled by an enhanced greedy algorithm described in [3, 4, 5].
  • Coarse-grained force field: The OPEP potential limits the roughness of the peptide energy landscape by representing each side chain as a single bead. OPEP v3 parameters were optimized by a genetic algorithm procedure using a large ensemble of protein decoys [6]. OPEP is the objective function that drives the greedy algorithm during the rebuilding process.

    OPEP (Optimized Potential for Efficient structure Prediction) version 3 is expressed as a sum of local, nonbonded and hydrogen-bond (H-bond) terms:
    $$E = E_{\text{local}} + E_{\text{nonbonded}} + E_{\text{H-bond}}$$

    The local potentials are expressed by:
    [Equation: OPEP v3 local potentials]
    The term Elocal contains force constants associated with changes in bond lengths and bond angles of all particles as well as force constants related to changes in improper torsions of the side-chains and the peptide bonds.

    The nonbonded potentials are expressed by:
    [Equation: OPEP v3 nonbonded potentials]
    where 1,4 denotes the 1-4 interactions along each torsional degree of freedom, M′ the N, C′, O and H main-chain atoms, and Sc the side chain. As seen, we separate short-range from long-range (j > i+4) interactions, and the C alpha atom from the other main-chain atoms. For more details on the EVdW potential, see refs [5, 6].

    The hydrogen-bonding potential (EH−bond) consists of two-body (EHB1) and four-body (EHB2) terms. Two-body H-bonds are defined by:
    [Equation: two-body hydrogen-bond term EHB1]
    Four-body effects, which represent cooperative energies between hydrogen bonds ij and kl, are defined by:
    [Equation: four-body hydrogen-bond term EHB2]

    Full details of the OPEP force field used in PEP-FOLD are available in refs [5, 6].
Validation
Validation tests have been performed on two different peptide sets:
  • PepStr set: This test set includes 42 linear bioactive peptides, free of any disulfide bridge, characterized by NMR spectroscopy in both aqueous and non-aqueous solutions [7].
  • PEP-FOLD set: To complement the PepStr set, we designed our own test set, made up of two subsets of PDB structures solved in aqueous solution and selected for their diversity in size and topology.
    • Short peptides PDB codes (10 targets from 10 to 23 aa) : 1dep, 1k43, 1le1, 1le3, 1pei, 1uao, 1wbr, 1wz4, 2evq and the beta hairpin fragment of 2gb1.
    • Long peptides PDB codes (14 targets from 27 to 49 aa) : 1abz, 1aie, 1bbl, 1bdd, 1e0l, 1e0n, 1f4i, 1fsd, 1i6c, 1kjk, 1psv, 1vii, 1vpu and 2p81.
References
[1] Camproux AC, Tuffery P, Chevrolat JP, Boisvieux JF, Hazout S.
Hidden Markov model approach for identifying the modular framework of the protein backbone.
Protein Eng. 1999 Dec;12(12):1063-73.
[2] Camproux AC, Gautier R, Tuffery P.
A hidden markov model derived structural alphabet for proteins.
J Mol Biol. 2004 Jun 4;339(3):591-605.
[3] Tuffery P, Guyon F, Derreumaux P.
Improved greedy algorithm for protein structure reconstruction.
J Comput Chem. 2005 Apr 15;26(5):506-13.
[5] Maupetit J, Derreumaux P, Tuffery P.
A fast and accurate method for large-scale de novo peptide structure prediction.
J Comput Chem. 2009. In press.
[6] Maupetit J, Tuffery P, Derreumaux P.
A coarse-grained protein force field for folding and structure prediction.
Proteins. 2007 Nov 1;69(2):394-408.
[7] Kaur H, Garg A, Raghava GP.
PEPstr: a de novo method for tertiary structure prediction of small bioactive peptides.
Protein Pept Lett. 2007;14(7):626-31.
Last-Update: 2008/12/18