PEP-Cyclizer

A large scale structure mining approach to assist the design of peptide head-to-tail cylization.

Overview

PEP-Cyclizer is a tool to assist the design of head-to-tail peptide cyclization, a well known strategy to enhance peptide resistance to enzymatic degradation and thus peptide bioavailability.

PEP-Cyclizer adresses two complementary features.

  1. The search for candidate sequences compatible with the cyclization of the peptide, a facility to assist medicinal chemists.
  2. The generation of 3D models of a cyclic peptide starting from the 3D structure of the un-cylized peptide and the sequence of the cyclized peptide, a preliminary step for further peptide conformational stability analysis, or peptide-receptor docking.
Information is inferred based on a Protein Data Bank (PDB)[1] mining strategy similar to that of developped recently [2] to quickly identify protein fragments matching the closure condition, where the closure is defined between the C-terminus and the N-terminus of the un-cyclized peptide.

Access the service through the RPBS Mobyle portal:

When using this service, please cite the following reference:

Karami Y, Rey J, Murail S, Giribaldi J, de Vries S, Tufféry P.
PEP-Cyclizer: a service to assist peptide cyclization through a 3D structure mining approach. in preparation

Limitations

  • Un-cylized peptide size

    Although cyclic peptides as short as 4 amino acids exist, PEP-Cyclizer paradigma is more oriented towards the cyclization of bioactive peptide formerly characterized, and consequently often peptides of larger sizes. In practice, PEP-Cyclizer will not start from un-cyclized peptides of size less than 8 amino-acids, due to the requirement to perform the PDB mining (see farther).

  • Amino acid types

    PEP-Cyclizer only accepts un-cyclized peptides made of the 20 usual amino acids. It will not process un-cylized peptides with D- amino-acids, or unusual L- amino acids.

    When guessing for candidate sequences, it will only propose sequences made of the 20 usual L- amino acids.

  • Peptide properties

    It is possible to undergo the head-to-tail cyclization of peptides containing disulfde bonds. However PEP-Cylizer may not account correctly for other kinds of peptide internal cyclizations prior to head-to tail cyclization. For instance, the design of bicyclic peptides is out of the scope of PEP-Cyclizer.

Usage

PEP-Cyclizer requires the 3D structure of a peptide - the template, and will provide information on the linker, i.e. the additional amino-acids required to perform the head-to-tail cyclization. Typical runs require on the order of 20 mn, depending on server load.

Input

  • Un-cyclized peptide structure

    This field is to specify the coordinates of the template peptide. The input file must be in PDB format. When multiple models are provided, only the first one will be considered. It must contain only the 20 standard amino acids. The size of the input template can be as long as 50 amino acids, but must be longer than 8 amino acids. Note that residue numbers will be changed for the ouptut, the first amino-acid will assigned number 1.

  • Linker sequence

    This field is to specify the sequence of a candidate linker. Its size may vary between 2 and 10 amino acids. Its sequence must be provided using the 1 letter amino acid code. Filling this field, PEP-Cyclizer will generate 3D models for the head-to-tail cyclized peptide.

  • Linker size

    This field is to specify the size of a candidate linker. Its size may vary between 2 and 10 amino acids. Filling this field (default), PEP-Cyclizer will infer information from the PDB fragments of the specified linker size matching the head-to-tail closure condition.

  • Constraints for amino acid types at each position

    It is often undesirable to perform head-to-tail cyclization with amino acids likely to interfere with peptide bioactivity. For this reason, only a limited subset of amino acids is most often used. This field makes possible to specify constraints on the amino acid types at each position. Information for each position in the linker sequence MUST be provided. the ':' symbol is used as a delimiter.
    1:F:G:P indicates that at linker position 1, only PHE, GLY and PRO are preferred.
    1: indicates that at linker position 1, no preference is specified, meaning all 20 standard amino acids are allowed.
    By default the form is filled using 1:A:G, i.e. indicating a preference for ALA and GLY.

Demonstration mode

  • Pre-configured test

    Tests are configured for two different scenari. When selecting one of these, all input data specified in the other input fields will be ignored.

    • Linker sequence guess: This will launch PEP-Cyclizer for the alpha-conotoxin AuIB (PDB entry: 1mxn), and for a linker size of 6
    • Linker conformation guess: This will launch PEP-Cyclizer 3D modeling for the alpha-conotoxin MII (PDB entry: 1m2c), using as a linker sequence GGAAGG, which corresponds to the experimental structure 2ajw.

Results

PEP-FOLD main output consists in models. On-line interactive visualisation and model selection facilities are however proposed.

  • Progress report

    ProgressReport

    This section will incrementally provide information about job progression and errors if any. A typical run should produce a report similar to that. Errors related to the input data specified are now also reported in this section.

  • Model visualization

    PEPFOLD3-PV

    PEP-Cyclizer on-line interactive visualization of the models generated is based on the NGL . Different representations can be selected. A menu makes it possible to select a model among the 30 best models (representatives of the 30 best clusters), or to visualize them together.

  • Sequence Logo

    ClusterReport

    This report is a display of the variability of the amino acids observed over the candidate sequences. Red amino acids correspond to the amino acids. We remind that, due to the weak probability to find sequences matching the sequence constraints at all positions, we select, among the candidates matching the closure geometry constraints, sequences that match at least 50% constraints, which explains the occurrence of other amino acids at each position (in gray).

  • Proposed linkers

    ClusterReport

    This section presents sequences drawn using the forward backtrack algorithm on the model fit over the data. Each sequence is associated with a score that corresponds to -log(L), where L is the likelihood of the sequence. minimal score correspond to the most probable sequences.

  • Top sequences

    ClusterReport

    This section reports the sequence corresponding to the best candidates identified based on their geometry. For each, we indicate the PDB entry, the chain, and the residue numbers corresponding to the fragment, BCscore, RMSD, loop sequence.

  • Models archive

    archive

    This archive contains the 30 best predicted structures according to geometrical and sequence criteria are provided in PDB format. It is in the unix tar format compressed using gzip.

    Once saved on your computer, enter for instance (unix) tar xzf AllModels.tgz to inflate the archive.

Examples, sample tests

Example 1 (demo mode): Linker sequence guess for the alpha-conotoxin AuIB (PDB entry: 1mxn), and for a linker size of 6

The result page of this demo can be accessed here.

Example of the sequence logo generated for the un-cyclized peptide 1mxn. Experimentalists have reported a successful cyclization using a linker of 6 amino acids and of sequence GGAAGG, which is proposed by PEP-Cyclizer at rank 15 over 35.

1mxn-sequence-guess 1mxn-sequence-guess

Example 2 (demo mode): Cyclization of the alpha-conotoxin MII (PDB entry: 1m2c), using as a linker sequence GGAAGG

The result page of this demo can be accessed here.

Example of the 3D modelling of the head-to-tail cyclic conformation of the alpha-conotoxin MII (PDB: 2ajw) starting from the un-cyclized structure (PDB: 1m2c) and a linker of sequence GGAAGG. Left: all models, as seen in the result page on the web. Right: Model 2 has an alpha carbon RMSD of 1.7 Angstroms (cyan: experimental un-cyclized conformation (PDB: 1m2c model 1); green: experimental conformation of the cyclic peptide (PDB: 2ajw); magenta: PEP-Cyclizer model 2; chain breaks correspond to the head-to-tail cyclization).

1m2c-30models 1m2c-model2

Concepts

  • From loop modeling to peptide cyclization

    Peptide head-to-tail cyclization is assimilated to loop modeling where the loop consists of the linker between the C-terminus and the N-terminus of the un-cyclized peptide, considering flanks of 4 amino acids. The first linker residue corresponds to the residue next to the C-ter, etc.

  • PDB mining

    PEP-Cyclizer is based on the identification of candidate fragments matching the geometrical conditions required for peptide head-to-tail cyclization. Such fragments are identified by mining the complete collection of protein structures of the PDB, using a protocol similar to that developped for protein loops [2], i.e. an approach searching for candidate fragments matching satisfactory geometrical contraints on the loop flanks [3][4]. Here, we search for fragments of size that of the linker, supplemented by flanks of 4 amino acids on both C-ter and N-ter directions (i.e. adding 8 amino acids), and we sort the candidates according to the RMSd calculated over the flanks only (8 amino acids - flank RMSD). We found that an acceptable RMSD cut-off is of 2 Angstroms. Candidates are sorted according to their flank RMSD, and a subset is accepted for further steps.

  • Filters on linker Sequence

    The candidate fragments are then filtered according constraints set by the user in terms of the linker sequence. Given explicitly the linker sequence, we select the fragments with a positive BLOSUM62 score with the linker sequence. Given constraints on amino acids types per position, since occurrences of sequences matching all constraints are unlikely, we select the fragments that have at least 50% of positions matching the constraints and having a positive BLOSUM62 score overall.

  • 3D model generation

    Given the candidates, 3D models of the cyclized peptide are built by superimposing the candidate linkers on the flanks, and repositioning the side chains using oscar-star [5]. Local geometry refinement is performed by minimizing the conformation of cyclized peptide using Gromacs [6].

  • Sequence logo

    The sequences of the accepted candidates are piled-up to generate a sequence logo. A Hidden Markov Model is fit on top of these sequences and we use the forward-backtrack algorithm to sample the sequences the most probable given the sequence constraints, and observed transitions between consecutive positions. Since both amino-acid frequencies and transitions are estimated from small ensembles, we make use pseudo-counts to un-bias them towards a uniform distribution.

Validation

Due to the weak number of cases where both the structures of un-cyclized and cyclized peptides are available, we have validated the approach on peptides of the cybase [7](66 entries with structure reduced to 36 after removing entries withou structures in the PDB, structures with gaps, etc), by mimicking the cyclization of un-cyclized peptides generated removing amino acids at N- and C-termini of the cyclic peptides. Considering the 10 (resp. the 30) best models, PEP-Cyclizer was able to return a model approximating the structure of the cyclized peptide at 1.84 (resp. 1.78) angstroms on average.

Extra validation could be be performed on a collection of 5 peptides for which the structures of both the un-cylized and cyclized peptides are available in the PDB, and 3 peptides for which the un-cylized conformation is available and information about successful linkers exist:

un-cyclized    cyclized   linker size  
1m2c 2ajw 6
1m2c 2ak0 7
2h8s 4ttl 6
2ew4 2j15 2
1ixt 2mso 3
1mxn n/a 4, 5, 6, 7
2jut n/a 2,3
2c9t n/a 3, 4, 5, 6, 7 (6 most stable)
PEP-Cyclizer was found able to return useful information in all cases, and more particularly the experimental sequence was classified as likely in all cases. Note that one does not know a this time if the other sequences proposed by PEP-Cyclizer are viable or not.

History

  • 2018, Dec 6: PEP-Cyclizer server enters production.
  • 2018, Nov 2: PEP-Cyclizer server enters test phase.
  • 2018, Aug: First implementation of the service.
  • 2018, May: Solved issue of non using Modeller as in original DaReUS-Loop, and replacing by Gromacs minimization to generate 3D models.
  • 2018, Jan: Definition of objectives, elaborating over DaReUS-Loop.

Browser compatibility

PEP-Cyclizer has been tested successfully using various OS / browser combinations, including:

  • Firefox and Chrome under Linux,
  • Firefox, Chrome, under Windows 7,and under Windows XP. Chrome, firefox and edge under Windows 10.
  • Safari, Firefox and Chrome under MacOS X (lion, snow leopard, up to el capitan).
Note using firefox, hardware acceleration for 3D was found to trigger troubles in some cases.

References

[1] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat, Weissig H, Shindyalov IN, Bourne PE.
The Protein Data Bank
Nucleic Acids Research, 2000; 28: 235-242.
[2]Karami Y, Guyon F, De Vries S, Tufféry P.
DaReUS-Loop: accurate loop modeling using fragments from remote or unrelated proteins.
Sci Rep. 2018; Sep 12;8(1):13673. doi: 10.1038/s41598-018-32079-w.
[3] Guyon F, Tufféry P.
Fast protein fragment similarity scoring using a Binet-Cauchy kernel.
Bioinformatics. 2014 Mar 15;30(6):784-91. doi: 10.1093/bioinformatics/btt618. Epub 2013 Oct 27.
[4] Guyon F, Martz F, Vavrusa M, Bécot J, Rey J, Tufféry P.
BCSearch: fast structural fragment mining over large collections of protein structures.
Nucleic Acids Res. 2015 Jul 1;43(W1):W378-82.
[5] Liang S, Zheng D, Zhang C, Standley DM.
Fast and accurate prediction of protein side-chain conformations.
Bioinformatics. 2011 Oct 15;27(20):2913-4.
[6] Van Der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE, Berendsen HJ.
GROMACS: fast, flexible, and free.
J Comput Chem. 2005 Dec;26(16):1701-18.
[7] Wang CK, Kaas Q, Chiche L, Craik DJ.
CyBase: a database of cyclic protein sequences and structures, with applications in protein discovery and engineering.
Nucleic Acids Res. 2008; Jan;36(Database issue):D206-10.