Guidelines to
HYDROPHOBIC CLUSTER ANALYSIS (HCA)
 

The HCA method is based on a two-dimensional plot, called the HCA plot, whose principles are illustrated in Figure 1.
The two-dimensional plot is built on an alpha-helical pitch (3.6 residues per turn) with a connectivity distance of 4 (the number of residues separating two different clusters), a combination that has been shown to offer the best correspondence between clusters and regular secondary structures (Refs. 1 and 2). Examining the HCA plot of a protein sequence makes it easy to distinguish globular regions from non-globular ones and, within globular regions, to identify secondary structures. This 2D signature is much more conserved than the 1D sequence and can be enriched by comparing families of highly divergent sequences. It allows significant similarities, at both the structural and functional levels, to be detected above background noise even at low levels of sequence identity.
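
As a rough illustration of how a connectivity distance of 4 partitions a sequence into clusters, the following Python sketch groups hydrophobic residues along the 1D sequence. It is a simplified stand-in for the HCA program, not its actual implementation: the hydrophobic alphabet (V, I, L, F, M, Y, W) is an assumption taken from the HCA literature rather than from this page, the function name is hypothetical, and special cases such as proline are ignored.

```python
# Assumed hydrophobic alphabet; this page does not list it explicitly.
HYDROPHOBIC = frozenset("VILFMYW")
# Connectivity distance quoted in the text above.
CONNECTIVITY = 4


def hydrophobic_clusters(seq, connectivity=CONNECTIVITY, hydrophobic=HYDROPHOBIC):
    """Return (start, end) pairs (0-based, inclusive) of hydrophobic clusters.

    Two consecutive hydrophobic residues fall into different clusters when
    at least `connectivity` non-hydrophobic residues lie between them.
    """
    clusters = []
    start = last = None
    for i, aa in enumerate(seq.upper()):
        if aa not in hydrophobic:
            continue
        if start is None:
            # First hydrophobic residue opens the first cluster.
            start = last = i
        elif i - last - 1 >= connectivity:
            # Gap is long enough: close the current cluster, open a new one.
            clusters.append((start, last))
            start = last = i
        else:
            # Gap is short: extend the current cluster.
            last = i
    if start is not None:
        clusters.append((start, last))
    return clusters


if __name__ == "__main__":
    print(hydrophobic_clusters("AAVLAAAAILKK"))
```

Sequence stretches that fall outside every returned interval are the candidates for loops or non-globular regions in this simplified view.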

An example of the detection of a duplicated domain in a protein sequence
Two examples of clusters often associated with beta-strands and alpha-helices

The HCA program is accessible online at mobyle@RPBS.
 
 
Figure 1 (adapted from Figure 1 of Ref. 1)
Illustration of the principles of the HCA diagram
The linear protein sequence (1D) (here the human alpha1 antitrypsin, SW identifier : ) is shown at the top of the figure with hydrophobic amino acids coloured. This sequence is written on an alpha helix displayed along a cylinder. The cylinder is then cut parallel to its axis and unrolled into a two-dimensional diagram (2D). This diagram is compacted and duplicated in order to restore the full environment of each amino acid. Hydrophobic amino acids are not distributed randomly but form clusters. The positions of these clusters have been shown to correspond to the positions of regular secondary structures (alpha helices and beta strands) (Ref. 2). This is strikingly illustrated by the corresponding experimental structure (3D). The shape of a cluster is generally indicative of the type of secondary structure (vertical clusters are often associated with beta strands, whereas horizontal ones often correspond to alpha helices).
A detailed list of the percentages of alpha, beta and coil structures associated with each cluster (as deduced from experimental structures) is in preparation. Conversely, sequence stretches between clusters mainly correspond to loops. The 2D structure of a protein sequence can therefore be easily deduced from examination of the HCA plot.
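
The unrolling step described in the caption can be sketched in Python as follows. This is only an illustration of the geometry, not the drawing code of the HCA program: at 3.6 residues per turn each residue advances 100 degrees around the cylinder, and the function name, the coordinate units (fractions of a turn) and the `copies` parameter are assumptions made for the example. The compaction of the diagram is not modelled.

```python
# 360 degrees / 3.6 residues per turn = 100 degrees per residue.
DEGREES_PER_RESIDUE = 100


def helical_net(seq, copies=2):
    """Place each residue on an unrolled alpha-helical cylinder.

    Returns a list of (residue, index, around, along) tuples, where
    `around` is the position across the cut cylinder (in turns, shifted by
    one full turn for each duplicated copy) and `along` is the position
    along the helix axis (also in turns).
    """
    points = []
    for i, aa in enumerate(seq):
        angle = i * DEGREES_PER_RESIDUE
        around = (angle % 360) / 360  # fraction of the circumference
        along = angle / 360           # number of turns climbed so far
        for copy in range(copies):
            # Duplicating the net restores neighbours lost at the cut edge.
            points.append((aa, i, around + copy, along))
    return points


if __name__ == "__main__":
    for point in helical_net("MPSSVSWGILLL"):
        print(point)
```

Plotting `around` against `along` and colouring the hydrophobic residues gives a picture comparable in spirit to the 2D panel of Figure 1, in which residues at positions i and i+18 sit directly above one another.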

More details are available in:
1. Deciphering protein sequence information through hydrophobic cluster analysis. Current
status and perspectives. Callebaut I, Labesse G, Durand P, Poupon A, Canard L,
Chomilier J, Henrissat B, Mornon JP. Cell Mol Life Sci (1997) 53: 621-645

2. Detection of secondary structure elements in proteins by hydrophobic cluster analysis.
Woodcock S, Mornon JP, Henrissat B. Protein Eng (1992) 5 (7): 629-635

3. Hydrophobic cluster analysis: procedures to derive structural and functional
information from 2-D-representation of protein sequences. Lemesle-Varloot L, Henrissat
B, Gaboriaud C, Bissery V, Morgat A, Mornon JP. Biochimie (1990) 72 (8): 555-574

4. Hydrophobic cluster analysis: an efficient new way to compare and analyse amino acid
sequences. Gaboriaud C, Bissery V, Benchetrit T, Mornon JP. FEBS Lett (1987) 224
(1): 149-155