Access the service through the RPBS Mobyle portal:

This website is free and open to all users, and there is no login requirement.

Pre-calculated results for (E)15, extremities blocked (see Batys et al., J. Phys. Chem. B, 2020 for experimental results).

• (E)15 at pH 7. Demo

• (E)15 at pH 3. Demo

When using this service, please cite the following references:

Rey J, Murail S, de Vries S, Derreumaux P, Tufféry P.
PEP-FOLD4: a pH-dependent force field for peptide structure prediction in aqueous solution.
Nucleic Acids Res., 2023, Web server issue.
Tufféry P, Derreumaux P.
A refined pH-dependent coarse-grained model for peptide structure prediction in aqueous solution.
Frontiers in Bioinformatics, 2023 3.
Binette V, Mousseau N, Tufféry P.
A Generalized Attraction-Repulsion Potential and Revisited Fragment Library Improves PEP-FOLD Peptide Structure Prediction.
J. Chem. Theory Comput. 2022 Apr 12;18(4):2720-2736.
Lamiable A, Thévenet P, Rey J, Vavrusa M, Derreumaux P, Tufféry P.
PEP-FOLD3: faster de novo structure prediction for linear peptides in solution and in complex.
Nucleic Acids Res. 2016 Jul 8;44(W1):W449-54.

Overview

PEP-FOLD is a de novo approach aimed at predicting peptide structures from amino acid sequences. This method, based on structural alphabet SA letters to describe the conformations of four consecutive residues, couples the predicted series of SA letters to a greedy algorithm and a coarse-grained force field.

What's new

The latest evolution, PEP-FOLD4, comes with a new version of the force field (sOPEP2) that uses a Mie representation instead of the former Van der Waals representation for non-bonded interactions [6].
PEP-FOLD4 also includes, for the first time, an energy term based on the Debye-Hueckel formalism to model the dependence on pH and ionic strength, and extremity blocking [7].
PEP-FOLD4 can be used for peptides of 5 to 40 amino acids.

Still active

The PEP-FOLD3 version is available here. Based on sOPEP1, it uses a Hidden Markov Model sub-optimal conformation sampling approach that is faster by one order of magnitude than the previous greedy strategy, without affecting performance. This makes possible the on-line generation of models for peptides of 5 to 50 amino acids in a few minutes. Consider this version to refine pre-existing models and/or to generate decoys while keeping regions of the structure rigid, or to generate candidate conformations of peptide-protein complexes by folding peptides on a user-specified patch of a protein.

The PEP-FOLD2 version, available here, is based on the greedy strategy. It can perform 3D modeling for linear peptides of up to 36 amino acids, and allows user-specified constraints such as disulfide bonds and inter-residue proximities.

New Features

Improved force field for peptide structure prediction

PEP-FOLD4 relies on the sOPEP2 force field, a major evolution of sOPEP, to model pH-independent non-bonded interactions (see [6]).
pH and ionic strength dependent modeling (see [7])

Limitations

Amino acid sequence size

Presently, PEP-FOLD4 prediction is limited to amino acid sequences of 5 to 40 residues. For sequences longer than 40 amino acids at neutral pH, users should consider AlphaFold/ColabFold or related methods. Peptides shorter than 5 amino acids are usually unstructured; for these, conformational sampling approaches based on molecular dynamics simulations, or approaches developed for small compounds, should be preferred.
Amino acid types

As of December 2023, PEP-FOLD4 only accepts the 20 standard amino acids. It will not process peptides with D-amino acids or non-standard L-amino acids.

Usage

Input

Input sequence

This field specifies the amino acid sequence of the peptide. The input sequence file must be in FASTA format. The query peptide sequence must consist only of the 20 standard amino acids, in uppercase, using the one-letter code (see the pre-configured test example). The sequence length must be between 5 and 40 amino acids (see Limitations).
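
The constraints above can be checked locally before submitting a job. The following is a minimal sketch, not part of the server: the function names are hypothetical, and the 5 to 40 residue bounds are taken from the Limitations section.

```python
# Hypothetical pre-submission check; not the validation code used by the server.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes
MIN_LEN, MAX_LEN = 5, 40                   # bounds stated under Limitations


def read_single_fasta(text):
    """Return (header, sequence) from a FASTA string holding one record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("only one sequence is expected")
    return lines[0][1:], "".join(lines[1:])


def check_sequence(seq):
    """Return a list of problems; an empty list means the sequence looks acceptable."""
    problems = []
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        problems.append(f"length {len(seq)} is outside {MIN_LEN}-{MAX_LEN}")
    bad = sorted(set(seq) - STANDARD_AA)  # lowercase letters and 'X' end up here
    if bad:
        problems.append("unsupported characters: " + "".join(bad))
    return problems
```

For instance, `check_sequence("TKSAGGIVL")` returns an empty list, while a sequence containing "X" or lowercase letters is reported.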

Input options

Run label

A label for the files generated. It MUST be a single word (no spaces, no special characters). It will be used to generate the name of the models available for download.
Generator

Different suboptimal sampling algorithms can be used instead of the default forward-backtrack algorithm (FBT) (see [4]). Forward-backtrack should be considered for small sequences (up to 12 amino acids). For larger sequences, taboo sampling (TS) should be considered, increasing the size of the taboo fragments as the sequence gets longer. TS3 (resp. TS4, TS5) stands for the non-repetition of motifs of 3 (resp. 4, 5) consecutive SA letters (see [4]).
Number of models

For short peptides, the generation of 100 models is generally enough to reach native or near native conformations. For larger peptides (more than 20 amino acids), generating 200 models is preferable.
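
The guidance on the generator and on the number of models can be summarised as a small helper. This is only a sketch of the rules of thumb given above: the function name and the returned keys are hypothetical and are not the portal's form field names. No taboo fragment size is chosen, because no length cut-offs for TS3, TS4 or TS5 are given here.

```python
def suggest_sampling(sequence):
    """Suggest a generator family and a number of models from the sequence length."""
    n = len(sequence)
    # Forward-backtrack for small sequences (up to 12 residues), taboo sampling otherwise.
    generator = "FBT" if n <= 12 else "TS"
    # 100 models for short peptides, 200 for peptides of more than 20 residues.
    n_models = 200 if n > 20 else 100
    return {"generator": generator, "n_models": n_models}
```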
Monte Carlo steps

Monte Carlo can be disabled by switching the number of steps to 0.
Monte Carlo temperature

Former versions of PEP-FOLD used 700 K as best option. The sOPEP2 version makes it possible to lower to 370 K by default.
Pseudorandom seed

PEP-FOLD4 is fully reproducible for identical parameters. It is possible to generate different models by modifying the seed. Negative values redirect to the machine time.
Probabilities from a previous run

This field is used to reload the probabilities resulting from a previous run with the same sequence. If filled, these probabilities are used directly, which bypasses the local prediction step and saves calculation time (around 40%). Note: supplying probabilities that do not result from a previous run is strongly discouraged, since it is most likely to produce spurious models.
Use Debye-Huckel

By default, the force field in use is that described in [6]. Switching Debye-Hueckel on gives access to the pH and ionic strength parameters.

Warning: the pH, ionic strength and extremity blocking parameters are only active when this option is set to "Yes".
pH

This is to specify the pH. It is only effective if the "Use Debye-Huckel" switch is turned to "Yes".
Ionic strength

The ionic strength, in mM. The default value is 1. It is only effective if the "Use Debye-Huckel" switch is turned to "Yes".
Blocking extremities

By default, the extremities are charged. It is possible to neutralize them by specifying which of the Nter (acetyl), Cter (N-methyl) or both extremities should be neutralized. This is only effective if the "Use Debye-Huckel" switch is turned to "Yes".
Non standard pKa values

By default, the pKa values are set to standard values in aqueous solution. This is an approximation, since pKa values may vary with the amino acid composition and the conformation. It is possible to assign a specific pKa value to each residue in the sequence. One assignment is given per line, in the form: residue position in the sequence (numbered from 1), then the pKa value. For instance, for a sequence like AAEVVKI, an input line of 3 5.6 would assign a pKa value of 5.6 to the glutamate at position 3 in the sequence. The lysine at position 6 would keep its default pKa.
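
The line format described above can be parsed as follows. This is a minimal sketch with hypothetical function names. The second function uses the textbook Henderson-Hasselbalch relation only to illustrate why a pKa shift matters at a given pH; how PEP-FOLD4 actually derives charges is described in [7], not here.

```python
def parse_pka_lines(text, sequence):
    """Parse 'position pKa' lines (positions numbered from 1) into a dict."""
    overrides = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"expected 'position pKa', got: {line!r}")
        position, pka = int(fields[0]), float(fields[1])
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position {position} is outside the sequence")
        overrides[position] = pka
    return overrides


def deprotonated_fraction(ph, pka):
    """Henderson-Hasselbalch fraction of the deprotonated form (illustration only)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))
```

With the example above, `parse_pka_lines("3 5.6", "AAEVVKI")` maps position 3 to 5.6. Under the Henderson-Hasselbalch assumption, an acidic group with that pKa is mostly deprotonated at pH 7 and mostly protonated at pH 3.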

Demonstration mode

Pre-configured test

When the load button is selected, the sequence of the tau fragment is loaded into the input sequence field and the Debye-Hueckel switch is turned on.

Results

The main output of PEP-FOLD consists of models. On-line interactive visualisation and model selection facilities are also provided.

Progress report

This section will incrementally provide information about job progression and errors if any. A typical run should produce a report similar to that. Errors related to the input data specified are now also reported in this section. For instance the report below explains a discrepancy between the sequence and the profile, probably due to an incorrect use of the "Probabilities from a previous run" field.
Model visualization

PEP-FOLD4 on-line interactive visualization of the generated models is based on the NGL JavaScript viewer. Different representations as well as colouring schemes can be selected. A menu makes it possible to select a model among the 10 best models (representatives of the 10 best clusters).
- Mouse control: left click: rotation, right click: translation, middle click: clipping, wheel: zoom
- Model: Use this slider to navigate through the five best models (note that there can be fewer if there are fewer than 5 clusters).
- Representations:
  - Cartoon: The cartoon representation of the backbone.
  - Licorice: The representation of the bonds of both backbone and side chains.
  - Surface: The representation of polypeptide surface, with different possibilities: Van der Waals, Molecular surface and solvent accessible surface, as proposed by NGL. The opacity of the surface can be adjusted with a slider.
  The representations can be turned on/off independently. Each can adopt a different color scheme.
- Color schemes:
  - "Rainbow": The color changes from the Nter to the Cter residue.
  - "Atom type": The color depends on the atom type (red: oxygen, blue: nitrogen, gray: carbon).
  - "Residue type": The color depends on the class of the amino acid (positively charged, negatively charged, polar, apolar, aromatic, GLY and PRO).
  - "Secondary structure": Helices are in red, strands in green.
  - "Hydrophobicity": Residues are coloured depending on the hydrophobicity, as defined by NGL.
  - "Electrostatic charge": Atoms are coloured depending on their charge. Here, this mode is diverted in order to obtain a dark cartoon.
Clustering report

This report is primarily a table file that describes the clusters. For each model generated, the sOPEP energy of the model is reported.

Additionally, this table gives access to model download at different levels: The archive of all the models (top of table), archives of all models in a cluster - if the cluster contains several models, and each individual model.

The cluster ranks are defined according to their scores (sOPEP). The cluster representatives correspond to the models of the clusters having the best scores, i.e. with the lowest sOPEP energy. They are denoted as "modelx", where x is the rank of the cluster according to the sort key. When a cluster has several models, it is in turn sorted according to the sort key. The first model in the table is the representative, denoted on the form "modelx" and the following models are numbered using the "modelx.y" convention where x is the rank of the cluster and y the rank of the model in the cluster.
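
The naming convention can be illustrated with a short sketch. It assumes that each cluster is given as a list of sOPEP energies and that y is the rank of the model inside its cluster, so that the second model of cluster x is "modelx.2". The function name is hypothetical and this is not the server's clustering code.

```python
def name_models(clusters):
    """Return (name, energy) pairs following the modelx / modelx.y convention.

    `clusters` is a list of lists of sOPEP energies, one inner list per cluster.
    """
    # Sort models inside each cluster, then rank clusters by their best (lowest) energy.
    ranked = sorted((sorted(c) for c in clusters if c), key=lambda c: c[0])
    names = []
    for x, energies in enumerate(ranked, start=1):
        for y, energy in enumerate(energies, start=1):
            # The representative (lowest energy) is "modelx"; others are "modelx.y".
            label = f"model{x}" if y == 1 else f"model{x}.{y}"
            names.append((label, energy))
    return names
```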
Model 1 to 5

The representatives of the 5 best clusters are provided in PDB format. You can either save the files onto your computer or view them using NGL. In the Mobyle environment, PDB files can also be piped to other analyses, such as the identification of secondary structures using stride or p-sea. For this, select the appropriate method beside the "further analysis" button, then launch it by clicking on "further analysis".
Local Structure prediction profile

This is a graphical representation of the probabilities of each Structural Alphabet (SA) letter (vertical axis; see Concepts) at each position of the sequence (horizontal axis). Note that SA letters correspond to fragments of 4 residues. The profile is presented using the following color code: red: helical, green: extended, blue: coil.
Models archive

This archive contains all the models generated. It is in the unix tar format compressed using gzip.

Once saved on your computer, enter for instance (unix) tar xzf AllModels.tgz to inflate the archive.
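
On systems without the tar command, the archive content can be inspected with the Python standard library. This is a minimal sketch: the function name is hypothetical, and the file name AllModels.tgz is the one used in the command above.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the names of the regular files stored in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]
```

For instance, `list_archive_members("AllModels.tgz")` lists the model files without extracting them.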

Examples, sample tests

As a simple test, you can either choose to:

Copy, paste

Copy and paste the following sequence into the "Peptide amino acid sequence" field:
```
>1egs_A mol:protein length:9 GROES
TKSAGGIVL
```
or use the Mobyle facilities:
Fill the input data
1. click "DB" radio button
2. select pdbaa database
3. write your sequence identifier (here: 1egs_A)
4. add this sequence to the form field.
Fill the options
1. click "DB" radio button
2. select PDB database
3. write your sequence identifier (here: 1egs_A)
4. add this PDB file to the form field.

Warning: In both cases, the sequence cannot contain anything other than the 20 standard amino acids. "X" should be either discarded or replaced using the "edit" facility (top right of the input area).

Run

Launch PEP-FOLD prediction, by clicking "Run" at the top of the page.

Note: For non-registered users, a captcha will ask you to type the text from the image before submitting your job. Once your job has been submitted, you can check whether your results are available by clicking the "update job status" button.

Concepts

Structural alphabet

PEP-FOLD is based on the concept of structural alphabet [1], i.e. an ensemble of elementary prototype conformations able to describe the whole diversity of protein structures.
Greedy algorithm

HMM-SA letters are assembled by an enhanced greedy algorithm described in [2]. In PEP-FOLD3 and PEP-FOLD4, this algorithm has been superseded by HMM sub-optimal sampling algorithms such as the forward-backtrack algorithm or a taboo sampling algorithm [4].
Coarse grained force field

The sOPEP potential limits the roughness of the peptide energy landscape by representing each side chain as a single bead. The sOPEP2 parameters (without the Debye-Hueckel formalism) were optimized with a swarm optimizer on a large ensemble of protein decoys [6]. sOPEP2 is the objective function that drives the model building process.

sOPEP (Optimized Potential for Efficient structure Prediction) is expressed as a sum of local, nonbonded and hydrogen-bond (H-bond) terms:

The local potentials are expressed by:

The term Elocal contains force constants associated with changes in bond lengths and bond angles of all particles as well as force constants related to changes in improper torsions of the side-chains and the peptide bonds.

The nonbonded potentials are expressed by:

where EMie stands for the use of a Mie formulation instead of the Van der Waals one, and EDH stands for the use of the Debye-Hueckel formalism for interactions between charges.

The Mie potential is expressed as:

where εij is the potential depth and r0ij is the position of the potential minimum for interactions between atomic types i and j, and rij is the actual distance between the beads. More information is available in [6].
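
As an illustration of a potential defined by a depth and a minimum position, the sketch below implements the general textbook Mie form written in terms of the position of the minimum. It is an assumption that this parametrization matches sOPEP2. The exponents and per-pair parameters actually used are given in [6] and are not reproduced here, and the default exponents (12, 6) are chosen only because they recover the Lennard-Jones form.

```python
def mie_energy(r, epsilon, r0, n=12, m=6):
    """General Mie potential with depth `epsilon` and minimum at `r0`.

    E(r) = epsilon * [ m/(n-m) * (r0/r)**n - n/(n-m) * (r0/r)**m ],  with n > m > 0.
    At r = r0 the energy equals -epsilon.
    """
    if not n > m > 0:
        raise ValueError("exponents must satisfy n > m > 0")
    ratio = r0 / r
    return epsilon * (m / (n - m) * ratio ** n - n / (n - m) * ratio ** m)
```

With n = 12 and m = 6 this reduces to epsilon * [(r0/r)^12 - 2 (r0/r)^6]; other exponent pairs change the steepness of the repulsive and attractive branches while keeping the minimum at r0.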

The Debye-Hueckel contribution is expressed as:

where qi and qj are the charges of particles i and j, with j > i+1. rij is the distance between the particles, lDH is the Debye length, which depends on the ionic strength of the solvent, and ε is the dielectric constant, which depends on the distance between the charges. It is evaluated as:
More information is available in [7].
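
To give a feeling for how the ionic strength parameter acts on charge-charge interactions, the sketch below computes the Debye length with the standard textbook expression for a 1:1 electrolyte. The temperature (298.15 K) and the relative permittivity of water (78.4) are assumptions made for illustration; the distance-dependent dielectric and the exact expressions used by PEP-FOLD4 are those of [7] and are not reproduced here.

```python
import math

EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
K_B = 1.380649e-23             # Boltzmann constant, J/K
N_A = 6.02214076e23            # Avogadro constant, 1/mol
E_CHARGE = 1.602176634e-19     # elementary charge, C


def debye_length_nm(ionic_strength_mM, temperature_K=298.15, relative_permittivity=78.4):
    """Debye length (nm): sqrt(eps_r * eps_0 * k_B * T / (2 * N_A * e^2 * I))."""
    if ionic_strength_mM <= 0:
        raise ValueError("ionic strength must be positive")
    ionic_strength = ionic_strength_mM  # 1 mM is exactly 1 mol/m^3
    numerator = relative_permittivity * EPSILON_0 * K_B * temperature_K
    denominator = 2.0 * N_A * E_CHARGE ** 2 * ionic_strength
    return math.sqrt(numerator / denominator) * 1e9


def screening_factor(distance_nm, ionic_strength_mM):
    """Exponential damping exp(-r / l_DH) applied to a Coulomb-like interaction."""
    return math.exp(-distance_nm / debye_length_nm(ionic_strength_mM))
```

Under these assumptions, an ionic strength of 1 mM gives a Debye length of roughly 9.6 nm and 100 mM roughly 0.96 nm, so raising the ionic strength screens interactions between charges over much shorter distances.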

The hydrogen-bonding potential (EH−bond) consists of two-body (EHB1) and four-body (EHB2) terms. Two-body H-bonds are deﬁned by:

Four-body eﬀects, which represent cooperative energies between hydrogen bonds ij and kl, are deﬁned by:

Note: All details about the sOPEP force field used in PEP-FOLD are available in ref. [7] and references therein.

Validation

Validation tests using sOPEP2 are presented in [6] and [7].

History

2023, February 23: PEP-FOLD4 documentation update.
2023, February 1: PEP-FOLD4 migrates to mobyle2 portal (backed by newer cluster).
2023, February 1: PEP-FOLD4 accepts user filled probabilities.
2023, January 25: Changing PV for NGL.
2022, December 11: Adding the possibility for user specified pKa values for specific amino acids.
2022, December 1: Adding the possibility to block extremities.
2022, September 21: First PEP-FOLD4 server implementation (test).
2022, April 1: Considering using the Debye-Hueckel formalism to tune interactions for charged residues depending on pH and ionic strength.
2021, March 1: Effective sOPEP2 force field.
2018, January 11: Considering the opportunity to use a Mie formalism for sOPEPv2.

Known problems and answers

PEP-FOLD has been tested successfully using various OS / browser combinations, including:

Firefox and Chrome under Linux,
Firefox and Chrome under Windows 7 and Windows XP,
Safari, Firefox and Chrome under MacOS X (Lion, Snow Leopard, up to Yosemite).

There is presently very little feedback reporting problems with PEP-FOLD:

Results will not display on PEP-FOLD termination

Mobyle upload button will not work. This spurious behavior has been observed repeatedly. If you encounter it, just perform a full refresh of the result page (press Ctrl+R or F5, or press Enter in the URL field of the browser).

Terms of use

As PEP-FOLD4 is a web service available on the RPBS Mobyle portal, it is subject to its terms of use.

Docker image access

Instructions to generate Docker images allowing to run PEP-FOLD4 on your local machine can be found here.

Please note that the access and usage of PEP-FOLD4 through Docker images is subject to this Licence. Key terms of this licence are:

License Type: Non-exclusive, non-transferable, worldwide noncommercial license.
Permitted Use: Internal research purposes only (not for collaborative or commercial use).
Prohibited Actions:
- No commercial use.
- No modification, adaptation, translation, or reverse engineering (except for interoperability as allowed by law).
- No distribution, transfer, or resale of the software.
Installation & Liability: The user installs and uses the software at their own risk; no warranties are provided.
Intellectual Property: The software remains the exclusive property of Université Paris Cité.
Citations: Users must credit Université Paris Cité and Inserm in any publications based on the software.
Commercial Use: Requires a separate license request (contact: partenariat.recherche.drive@u-paris.fr).
Legal Jurisdiction: Governed by French law; disputes settled in Paris courts.

References

[1] Camproux AC, Gautier R, Tuffery P.
A hidden Markov model derived structural alphabet for proteins.
J Mol Biol. 2004 Jun 4;339(3):591-605.
[2] Maupetit J, Derreumaux P, Tuffery P.
A fast and accurate method for large-scale de novo peptide structure prediction.
J Comput Chem. 2010 Mar;31(4):726-38.
[3] Maupetit J, Tuffery P, Derreumaux P.
A coarse-grained protein force field for folding and structure prediction.
Proteins. 2007 Nov 1;69(2):394-408.
[4] Lamiable A, Thevenet P, Tufféry P.
A critical assessment of hidden Markov model sub-optimal sampling strategies applied to the generation of peptide 3D models.
J Comput Chem. 2016 Aug 5;37(21):2006-16. doi: 10.1002/jcc.24422.
[5] Shen Y, Maupetit J, Derreumaux P, Tufféry P.
Improved PEP-FOLD approach for peptide and miniprotein structure prediction.
J. Chem. Theor. Comput. 2014; 10:4745-4758.
[6] Binette V, Mousseau N, Tufféry P.
A Generalized Attraction-Repulsion Potential and Revisited Fragment Library Improves PEP-FOLD Peptide Structure Prediction.
J. Chem. Theory Comput. 2022 Apr 12;18(4):2720-2736.
[7] Tufféry P, Derreumaux P.
A refined pH-dependent coarse-grained model for peptide structure prediction in aqueous solution.
Frontiers in Bioinformatics, 2023 3.
