Hidden Markov model approach for identifying the modular framework of the protein backbone. Camproux A.C., Tuffery P., Chevrolat J.P., Boisvieux J.F., Hazout S. Protein Eng. 12(12):1063-73, 1999a.

A hidden Markov model (HMM) was used to identify recurrent short 3D structural building blocks (SBBs) describing protein backbones, independently of any a priori knowledge. Polypeptide chains are decomposed into a series of short segments defined by their inter-alpha-carbon distances. The model takes into account the sequential order of the observed segments and assumes that each one corresponds to one of several possible SBBs. Fitting the model to a database of non-redundant proteins allowed us to decode proteins in terms of 12 distinct SBBs with different roles in protein structure. Some SBBs correspond to classical regular secondary structures. Others correspond to a significant subdivision of their bounding regions, previously considered to be a single pattern. The major contribution of the HMM is that it implicitly takes into account the sequential connections between SBBs and thus describes the most probable pathways by which the blocks are connected to form the framework of protein structures. The SBB code was validated by extracting SBB series that recur in the recoded proteins and examining their structural similarities. Preliminary results on the sequence specificity of SBBs suggest promising perspectives for the prediction of SBBs, or series of SBBs, from protein sequences.
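
As an illustration of how an HMM can decode a chain of segments into a series of states, the following is a minimal sketch of Viterbi decoding, the standard way to find the most probable state path. The abstract does not say which decoding procedure was used, so the function name, the two-state toy model and all probabilities here are assumptions chosen only for illustration.

```python
import math

def viterbi(log_start, log_trans, log_emit):
    """Return the most probable state path of an HMM (illustrative sketch).

    log_start[k]    : log P(first state = k)
    log_trans[j][k] : log P(next state = k | current state = j)
    log_emit[t][k]  : log P(observation t | state k)
    """
    n_states = len(log_start)
    score = [log_start[k] + log_emit[0][k] for k in range(n_states)]
    back = []  # back[t-1][k] = best predecessor of state k at step t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for k in range(n_states):
            best_j = max(range(n_states),
                         key=lambda j: score[j] + log_trans[j][k])
            pointers.append(best_j)
            new_score.append(score[best_j] + log_trans[best_j][k]
                             + log_emit[t][k])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def logs(values):
    return [math.log(v) for v in values]

# Hypothetical toy model: two "sticky" states, four observed segments.
toy_start = logs([0.5, 0.5])
toy_trans = [logs([0.9, 0.1]), logs([0.1, 0.9])]
toy_emit = [logs([0.9, 0.1]), logs([0.9, 0.1]),
            logs([0.4, 0.6]), logs([0.9, 0.1])]
toy_path = viterbi(toy_start, toy_trans, toy_emit)
```

In this toy case the third segment, taken alone, slightly favours state 1, but the decoded path stays in state 0 throughout because switching states twice is costly. This is the kind of sequential dependence the abstract refers to.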

Using short structural building blocks defined by a Hidden Markov Model for analysing patterns between regular secondary structures. Camproux A.C., Tuffery P., Buffat L., Andre C., Boisvieux J.F. and Hazout S. 12: 1063-1073, 1999b.

A hidden Markov model was used to identify recurrent short 3D structural building blocks (SSBBs) describing protein backbones. Polypeptide chains are decomposed into successive short segments defined by their inter-Calpha distances. Fitting the model to a database of nonredundant proteins allowed us to identify 12 distinct SSBBs and to describe the preferred pathways by which SSBBs are assembled to form the 3D structure of proteins. Protein backbones were labelled in terms of these SSBBs. The SSBB preferences observed on fragments between different regular secondary structures suggest a predominant dependency on the following regular structure rather than on the preceding one. Extraction of repeated SSBB series between regular secondary structures shows some structural specificities within different connection types. These results confirm that SSBBs can be used as building blocks in the analysis of protein structures and can yield new insights into the structures of coils as a function of the types of flanking secondary structures.
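
The decomposition into segments described by inter-Calpha distances can be sketched as follows. The abstract does not specify the segment length or which distances are used, so this example assumes four-residue overlapping windows and the three distances between non-adjacent Calpha atoms; the function name and the coordinates are hypothetical.

```python
import math

def segment_descriptors(ca_coords):
    """Describe each overlapping 4-residue window by three Calpha distances.

    ca_coords is a list of (x, y, z) Calpha positions in chain order.
    Assumption: windows of four residues; distances (1,3), (2,4) and (1,4).
    """
    descriptors = []
    for i in range(len(ca_coords) - 3):
        a, b, c, d = ca_coords[i:i + 4]
        descriptors.append((math.dist(a, c),   # residues i and i+2
                            math.dist(b, d),   # residues i+1 and i+3
                            math.dist(a, d)))  # residues i and i+3
    return descriptors

# Hypothetical fully extended chain: Calpha atoms 3.8 Å apart on a line.
toy_chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
toy_descriptors = segment_descriptors(toy_chain)
```

A chain of n residues yields n - 3 overlapping segments, and each descriptor is independent of the orientation of the protein in space because only distances are used.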

Exploring the use of a structural alphabet for a structural prediction of protein loops. Camproux A.C., de Brevern A.G., Hazout S., and Tuffery P. Theoretical Chemistry Accounts, 106, 28-35. 2001.

The prediction of loop conformations is one of the challenging problems of homology modeling, due to the large sequence variability associated with these parts of protein structures. In the present study, we introduce a search procedure that operates in a structural alphabet space deduced from a hidden Markov model to simplify the structural information. It uses a Bayesian criterion to predict, from the amino acid sequence of a loop region, its corresponding word in the structural alphabet space. Results show that our approach ranks 30% of the target words with the best score and 50% within the 5 best scores. Interestingly, our approach can also decide whether to accept or reject a prediction. This allows us to rank 57% of the target words with the best score and 67% within the 5 best scores, accepting 16% of learned words and rejecting 93% of unknown words.
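
To make the idea of ranking structural words from a sequence concrete, here is a minimal sketch of one possible Bayesian scoring scheme: the posterior of each word is taken as proportional to its prior times position-specific amino acid probabilities. The abstract only mentions "a Bayesian criterion", so the independence assumption, the function name, the word labels and every probability below are hypothetical and are not the authors' method.

```python
import math

def rank_words(sequence, priors, aa_probs, floor=1e-6):
    """Rank structural words by log posterior given a loop sequence.

    priors[word]           : P(word)
    aa_probs[word][pos][aa]: P(amino acid aa at position pos | word)
    Assumption: positions are treated as independent given the word;
    unseen amino acids receive a small floor probability.
    """
    scores = {}
    for word, prior in priors.items():
        log_post = math.log(prior)
        for pos, aa in enumerate(sequence):
            log_post += math.log(aa_probs[word][pos].get(aa, floor))
        scores[word] = log_post
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical two-letter words and two-residue loop sequence.
toy_priors = {"AB": 0.5, "CD": 0.5}
toy_aa_probs = {
    "AB": [{"G": 0.6, "P": 0.1}, {"G": 0.1, "P": 0.7}],
    "CD": [{"G": 0.2, "P": 0.2}, {"G": 0.3, "P": 0.2}],
}
toy_ranking = rank_words("GP", toy_priors, toy_aa_probs)
```

The position of the true word in such a ranking is what statements like "within the 5 best scores" measure. An acceptance rule could, for example, threshold the best score, but the abstract does not describe how acceptance is decided.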

Structural alphabet: A review. de Brevern A.G., Camproux A.C., Hazout S., Etchebest C., and Tuffery P. Recent Adv. In Prot. Eng., 1: 319-331, 2001.

The considerable growth of the protein structure database makes it possible to move beyond the classical secondary structure description of proteins. Although still confronted with numerous problems, the definition of structural alphabets is an emerging concept in the field of protein structure analysis. It is an attempt to objectively classify the whole set of conformations occurring in protein structures, described as small overlapping fragments. It is expected to lead to a better understanding of protein architecture and to open new opportunities for protein structure prediction.

A Hidden Markov Model derived structural alphabet for proteins. Camproux A.C., Gautier R., Tuffery P. J. Mol. Biol., 339:591-605, 2004.

Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a Hidden Markov Model (HMM) that discretizes protein backbone conformation as series of overlapping fragments (states) of 4-residue length. This approach learns simultaneously the geometry of the states and their transitions. Using a statistical criterion, we obtain an optimal systematic decomposition of the conformational variability of the protein peptide chain into 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well with previous knowledge of protein architecture organisation and seems able to capture some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1 Å RMSd at a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture, in particular for protein structure prediction.
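
For readers unfamiliar with the RMSd measure quoted above, the following minimal sketch computes the root-mean-square deviation between two sets of matched atom positions. It assumes the two structures have already been superposed; the optimal superposition step that normally precedes an RMSd calculation is deliberately left out, and the function name and coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched (x, y, z) positions.

    Assumption: both coordinate sets are already superposed and listed
    in the same atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Hypothetical example: every atom displaced by exactly 1 Å along z.
toy_native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_rebuilt = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(toy_native, toy_rebuilt)
```

An RMSd of about 1.1 Å therefore means that, after superposition, reconstructed atom positions deviate from the experimental ones by roughly that distance in the root-mean-square sense.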