PEP-FOLD is a de novo approach aimed at predicting
peptide structures from amino acid sequences. This method,
based on structural alphabet SA letters to describe the
conformations of four consecutive residues, couples the
predicted series of SA letters to a greedy algorithm and a
coarse-grained force field.
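As a rough illustration of what "four consecutive residues" means in practice, the sketch below enumerates every four-residue window of a peptide. It assumes that windows slide one residue at a time, which this page does not state, and the function name `four_residue_windows` is hypothetical. It does not predict SA letters; it only shows the fragments to which letters would be assigned.

```python
def four_residue_windows(sequence: str) -> list[str]:
    """Return every run of four consecutive residues, in sequence order.

    Assumption: consecutive windows overlap and are shifted by one residue.
    """
    width = 4  # fragment length stated in the text
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


# Example with the 9-residue test peptide used elsewhere on this page.
windows = four_residue_windows("TKSAGGIVL")
print(len(windows), windows)
```

Under this assumption, a peptide of N residues yields N - 3 windows.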
Access the PEP-FOLD server at the RPBS Mobyle Portal or the PEP-FOLD standalone server.
Please cite the following reference:
PEP-FOLD: an online resource for de novo peptide structure prediction.
Nucleic Acids Res. 2009. doi:10.1093/nar/gkp323
[ Restricted to peptide sizes from 9 to
25 residues ]
History
- 2009, Sep 17 - Bugfix: in some cases, secondary structure prediction constraints were not effective.
Features
- Peptide structure prediction: Starting from your amino acid sequence, PEP-FOLD runs 50 greedy simulations and returns the centroids of the most populated clusters plus the lowest energy conformation.
- Prediction constraints: The server allows you to include other prediction methods as constraints for the HMM-SA letter prediction. For now, PSIPRED is implemented and fully functional.
- Reference structure: The PEP-FOLD server allows you to upload a reference structure in order to compare PEP-FOLD models with it (see Usage).
- Fast folding: Once your job is running (i.e. no longer pending), a PEP-FOLD prediction takes about 40 minutes for a 20-residue peptide sequence.
Limitations
- Amino acid sequence size: For now, PEP-FOLD prediction is limited to amino acid sequences of 9 to 25 residues. This limit could evolve rapidly depending on user requests.
- Peptide properties: Currently, PEP-FOLD cannot handle linear peptides containing D-amino acids or disulfide bridges, nor cyclic peptides. These aspects are on the roadmap for the next PEP-FOLD release.
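The size limit above can be checked locally before submitting a job. The following is a minimal sketch, not part of PEP-FOLD: it reads a single FASTA record and reports violations of the 9 to 25 residue limit and of the requirement that the sequence use only the 20 standard amino acids in uppercase. The names `read_single_fasta` and `check_sequence` are hypothetical.

```python
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH, MAX_LENGTH = 9, 25  # limits stated on this page


def read_single_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) from a FASTA string holding one record."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("expected exactly one FASTA record")
    return lines[0][1:], "".join(lines[1:])


def check_sequence(seq: str) -> list[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
        problems.append(
            f"length {len(seq)} is outside {MIN_LENGTH}-{MAX_LENGTH} residues"
        )
    invalid = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if invalid:
        problems.append("non-standard or lowercase symbols: " + "".join(invalid))
    return problems


header, seq = read_single_fasta(">1egs_A mol:protein length:9 GROES\nTKSAGGIVL\n")
print(header, seq, check_sequence(seq))
```

A check of this kind cannot detect D-amino acids, disulfide bridges or cyclization, since none of these is encoded in a plain one-letter sequence.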
Usage
- Input sequence: The input sequence file must be in FASTA format. If another format is provided, it will be automatically converted by squizz (Mobyle server only). The query peptide sequence must contain only the 20 standard amino acids, in uppercase.
- Input reference structure: The input reference structure file must be in PDB format. This file must contain a single model with no hetero-residues, and must have the same length (in amino acids) as the query sequence.
- Prediction constraints: To combine the HMM-SA letter prediction with the PSIPRED prediction, simply check the corresponding option.
-
Results: PEP-FOLD produces
an informative report to guide the user through the
selection of the best model.
-
Energy plot
This file reports for each cluster (x axis), the
energy of its centroid (y axis) calculated by
sOPEP. The size of the centroid circle is
proportional to the cluster population (reported in
the center of the circle). The lowest energy
solution can be located by the dashed lines.
-
Clustering report
This report is a simple text file containing 8 columns: the cluster number, the number of the individual model within the cluster, the model file name, its cRMSd* (alpha-carbon root mean square deviation), GDT_TS* (Global Distance Test Total Score; for more information see predictioncenter.org), sOPEP energy (see the Concepts section), TM-score* (for more information see the Zhang server), and a single character indicating whether the model is the cluster centroid (*) or not (-).
* Compared to the reference structure, if provided.
-
Cluster centroid PDB files
All cluster centroids and the lowest energy predicted structure (in PDB format) are provided in a single tarball archive. Windows users can open this kind of archive with 7-Zip (GNU LGPL) or WinRAR.
-
Lowest energy solution PDB file
The lowest energy solution (marked by dashed lines in the energy plot) is reported separately so that you can pipe this PDB file to other resources of the RPBS Mobyle Portal, such as iSuperpose to superpose this solution on your own reference, or stride to evaluate its secondary structure. This feature is not available in the standalone server.
-
Models snapshots
PEP-FOLD output shows a snapshot of the lowest
energy solution and of all clusters of low energy
superposed either on the lowest energy conformation
or a reference structure if provided. This gives an
idea of the conformational diversity of your
PEP-FOLD results. Structure snapshots are generated
with the PyMOL software.
-
Predictions viewer (standalone server)
The standalone server allows you to visualize the most representative conformation of each cluster and the lowest energy solution, independently or together, via a Jmol applet.
This feature is also available on the Mobyle server, through its pipelining facility.
-
Examples, sample tests
-
A Mobyle video tutorial is available [WMV format (14 Mb)].
- As a simple test, you can either:
-
Copy and paste the following
sequence into the "Peptide amino acid sequence" field:
>1egs_A mol:protein length:9 GROES
TKSAGGIVL
- or use the Mobyle facilities:
-
Fill the input data: 1. click the "DB" radio button; 2. select the pdbaa database; 3. enter your sequence identifier (here: 1egs_A); 4. add this sequence to the form field.
- Fill the options: 1. click the "DB" radio button; 2. select the PDB database; 3. enter your sequence identifier (here: 1egs_A); 4. add this PDB file to the form field.
- In both cases:
-
Run: launch the PEP-FOLD prediction by clicking "Run" at the top of the page.
Nota bene: for non-registered users, a captcha will ask you to type the text from the image before submitting your job. Once your job has been submitted, you can check whether your results are available by clicking the "update job status" button.
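The clustering report described under Results lends itself to simple post-processing. The sketch below is a hypothetical reader, not a PEP-FOLD tool: it assumes the 8 columns are separated by whitespace, appear in the order listed above, and that the file has no header line. The sample rows are invented for illustration only and are not PEP-FOLD output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportRow:
    cluster: int
    member: int
    model_file: str
    crmsd: Optional[float]     # None when no numeric value is present
    gdt_ts: Optional[float]
    sopep_energy: float
    tm_score: Optional[float]
    is_centroid: bool


def _optional_float(token: str) -> Optional[float]:
    """Parse a float, returning None for non-numeric placeholders."""
    try:
        return float(token)
    except ValueError:
        return None


def parse_report(text: str) -> list[ReportRow]:
    """Parse report lines; lines that do not fit the assumed layout are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue
        cluster, member, name, crmsd, gdt, energy, tm, flag = fields
        try:
            row = ReportRow(int(cluster), int(member), name,
                            _optional_float(crmsd), _optional_float(gdt),
                            float(energy), _optional_float(tm), flag == "*")
        except ValueError:
            continue  # cluster, member or energy was not numeric
        rows.append(row)
    return rows


# Invented sample rows, for illustration only.
SAMPLE = """
1 1 model1.pdb 2.10 60.5 -12.3 0.41 *
1 2 model2.pdb 2.80 55.0 -11.0 0.38 -
2 1 model3.pdb 3.50 48.2 -13.1 0.33 *
"""

rows = parse_report(SAMPLE)
centroids = [r.model_file for r in rows if r.is_centroid]
lowest = min(rows, key=lambda r: r.sopep_energy)
print(centroids, lowest.model_file)
```

Selecting the row with the minimum sOPEP energy mirrors the lowest energy solution highlighted in the energy plot, while the centroid flag identifies the representative model of each cluster.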
Concepts
- Structural alphabet: PEP-FOLD is based on the concept of structural alphabet [1, 2], i.e. an ensemble of elementary prototype conformations able to describe the whole diversity of protein structures.
- Greedy algorithm: HMM-SA letters are assembled by an enhanced greedy algorithm described in [3, 4, 5].
-
Coarse-grained force field: The
OPEP potential helps to limit the roughness of the
peptide energy landscape by representing each side chain
as a single bead. OPEP v3 parameters were
optimized by a genetic algorithm procedure using a large
ensemble of protein decoys [6]. OPEP
is the objective function that drives the greedy
algorithm during the rebuilding process.
OPEP (Optimized Potential for Efficient structure Prediction) version 3 is expressed as a sum of local, nonbonded and hydrogen-bond (H-bond) terms:
The local potentials are expressed by:
The term Elocal contains force constants associated with changes in bond lengths and bond angles of all particles as well as force constants related to changes in improper torsions of the side-chains and the peptide bonds.
The nonbonded potentials are expressed by:
where 1,4 denotes the 1-4 interactions along each torsional degree of freedom, M′ the N, C′, O and H main-chain atoms, and Sc the side chain. Short-range interactions are separated from long-range (j > i+4) ones, and the Cα atom from the other main-chain atoms. For more details on the EVdW potential, see refs [5, 6].
The hydrogen-bonding potential (EH−bond) consists of two-body (EHB1) and four-body (EHB2) terms. Two-body H-bonds are defined by:
Four-body effects, which represent cooperative energies between hydrogen bonds ij and kl, are defined by:
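The top-level structure of the potential, as described in the text above, can be summarised as follows. This is only a sketch of the decomposition stated in words on this page: the symbol E_nonbonded is a label chosen here, any weighting factors are omitted, and the detailed form of each term is not reproduced (see [6]).

```latex
\begin{align}
E_{\text{OPEP}}   &= E_{\text{local}} + E_{\text{nonbonded}} + E_{\text{H-bond}} \\
E_{\text{H-bond}} &= E_{\text{HB1}} + E_{\text{HB2}}
\end{align}
```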
Validation
Validation tests have been performed on two different
peptide sets:
- PepStr set: This test set includes 42 linear bioactive peptides, free of any disulfide bridge, characterized by NMR spectroscopy in both aqueous and non-aqueous solutions [7].
-
PEP-FOLD set: To complement the
PepStr set, we designed our own test set, consisting of
two subsets of PDB structures solved in aqueous
solution and selected for the diversity of their sizes
and topologies.
- Short peptides, PDB codes (10 targets, from 10 to 23 aa): 1dep, 1k43, 1le1, 1le3, 1pei, 1uao, 1wbr, 1wz4, 2evq and the beta hairpin fragment of 2gb1.
- Long peptides, PDB codes (14 targets, from 27 to 49 aa): 1abz, 1aie, 1bbl, 1bdd, 1e0l, 1e0n, 1f4i, 1fsd, 1i6c, 1kjk, 1psv, 1vii, 1vpu and 2p81.
References
[1]
Hidden Markov model approach for identifying the modular framework of the protein backbone.
Protein Eng. 1999 Dec;12(12):1063-73.
[2]
A hidden markov model derived structural alphabet for proteins.
J Mol Biol. 2004 Jun 4;339(3):591-605.
[3]
Improved greedy algorithm for protein structure reconstruction.
J Comput Chem. 2005 Apr 15;26(5):506-13.
[4]
Dependency between consecutive local conformations helps assemble protein structures from secondary structures using Go potential and greedy algorithm.
Proteins. 2005 Dec 1;61(4):732-40.
[5]
A fast and accurate method for large-scale de novo peptide structure prediction.
J Comput Chem. 2009. In press.
[6]
A coarse-grained protein force field for folding and structure prediction.
Proteins. 2007 Nov 1;69(2):394-408.
[7]
PEPstr: a de novo method for tertiary structure prediction of small bioactive peptides.
Protein Pept Lett. 2007;14(7):626-31.