Overview

What is HHalign-KBest?

HHalign-KBest helps model the structure of protein domains when the sequence identity between a query and a template protein is low (<35%), using automatically optimized alignments and structural models. Rather than returning only the optimal alignment, which may contain small to large errors, it can generate k suboptimal alignments (e.g. the top-k scoring ones).

This server runs HHalign-KBest (k=500) to generate the suboptimal alignments, builds the corresponding models (using MODELLER), then evaluates and ranks them (using QMEAN Z-scores). It is expected to automatically provide better alignments and more precise structural models. See the FAQ for more information.

Since November 2014, basic automated template identification using HHblits and the PDB subset filtered at 70% sequence identity is possible. Only the best template identified by HHblits is considered, regardless of sequence coverage. When the sequence identity between a query and a template protein exceeds 35%, exploring the suboptimal alignments is generally not useful, so the automatic modeling server keeps only the best alignment (k=1). You can override this limitation by explicitly specifying a template, in which case HHalign-KBest (k=500) will be used. You can also identify a structural template using the HHpred/HHsearch server or the Mobyle instance of HHblits for remote homology detection, or choose one from your own expert knowledge.

Run HHalign-Kbest

The HHalign-Kbest service is integrated in the RPBS Mobyle portal.

Download HHalign-Kbest

A standalone version of the HHalign-Kbest package can be downloaded here.

History

2015, Jul 22nd Fix for large sequences which would cause crashes.
2014, Nov 13th Service release on the Mobyle portal.

Frequently asked questions

How does the pipeline of the HHalign-KBest server work?

The input for the query and template protein can be either:

  • a single sequence (FASTA format), or
  • a multiple sequence alignment (FASTA or CLUSTAL format).

If the input is a single sequence, a standard HHblits search is used to identify a candidate template in a PDB subset filtered at 70% sequence identity. If a template PDB file is provided as input, that template is used instead.

Given a template, the full protocol is as follows. First, 500 suboptimal alignments between the query and the template are generated using the HHalign-KBest algorithm, which was developed based on the HHsuite 2.0.16 package (Söding, 2005). If the “models” mode is selected by providing a valid MODELLER license key, the server then generates several models for every suboptimal alignment and identifies the 5 best suboptimal alignments from the evaluation of their corresponding models. This modeling protocol consists of two steps:

  1. 20 models are first generated by MODELLER 9.12 (Eswar et al., 2006) for each (sub)optimal alignment, using the standard automodel script without any refinement, and the top 32 alignments are selected based on the average QMEAN4 Z-score (Benkert et al., 2011);
  2. a second run generates 50 models for each of the top 32 alignments, providing the final top-5 alignments and the best model for each of them.
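The two-step selection above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the server's actual code: `score_model` is a hypothetical callback standing in for "build one model with MODELLER and return its QMEAN4 Z-score", alignments are represented by plain identifiers, a higher Z-score is assumed to be better, and the second step is assumed to rank by the same average score as the first.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback: builds ONE model for the given alignment and returns
# its QMEAN4 Z-score (on the server, MODELLER and QMEAN play this role).
ModelScorer = Callable[[str], float]
Ranked = Tuple[str, float, float]  # (alignment id, average score, best score)


def rank_alignments(alignments: Sequence[str], n_models: int,
                    score_model: ModelScorer) -> List[Ranked]:
    """Build n_models per alignment and sort alignments by average score."""
    ranked = []
    for aln in alignments:
        scores = [score_model(aln) for _ in range(n_models)]
        ranked.append((aln, mean(scores), max(scores)))
    # Assumption: a higher Z-score means a better model.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def two_stage_selection(alignments: Sequence[str], score_model: ModelScorer,
                        first_models: int = 20, shortlist_size: int = 32,
                        second_models: int = 50,
                        final_size: int = 5) -> List[Ranked]:
    """Step 1: few models for all alignments. Step 2: more for the shortlist."""
    stage1 = rank_alignments(alignments, first_models, score_model)
    shortlist = [aln for aln, _, _ in stage1[:shortlist_size]]
    stage2 = rank_alignments(shortlist, second_models, score_model)
    return stage2[:final_size]
```

The point of the two stages is cost: a cheap first pass over all alignments narrows the field so that the more expensive sampling is spent only on the most promising ones.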

If the input is a single sequence and the template identified by HHblits has a sequence identity of more than 35%, only 1 alignment is considered.
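The rule for the number of alignments can be written as a small function. This is a sketch of the behaviour described on this page, not code taken from the server; the function and constant names are hypothetical.

```python
SERVER_K = 500          # suboptimal alignments explored by the server
IDENTITY_CUTOFF = 35.0  # percent sequence identity


def choose_k(identity_percent: float, template_given_by_user: bool) -> int:
    """Return how many alignments are considered for a query/template pair."""
    if template_given_by_user:
        # An explicitly specified template always gets the full exploration.
        return SERVER_K
    # Automatically identified template: above the cutoff, keep only the best.
    return 1 if identity_percent > IDENTITY_CUTOFF else SERVER_K
```

For example, `choose_k(48.0, False)` gives 1, while `choose_k(48.0, True)` gives 500.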

How long does it take to run HHalign-KBest?

If your input is a single sequence, it takes several minutes to generate a multiple sequence alignment. Generating the suboptimal alignments then takes only a few seconds. The most time-consuming step is the modeling and the subsequent evaluation of the structural models when the “models” mode is selected. On this server, the execution time ranges from 30 minutes to over 1 hour, depending on the server load and on the query and template lengths. Note: Server update scheduled for early 2015 should reduce execution time significantly.
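A quick count shows why modeling dominates the run time. The sketch below only multiplies the numbers quoted in the pipeline description (500 alignments, 20 then 50 models, a shortlist of 32); the function name is hypothetical.

```python
def models_built(k=500, first_models=20, shortlist=32, second_models=50):
    """Count the models built in each step of the two-step protocol."""
    stage1 = k * first_models                    # every alignment, first pass
    stage2 = min(k, shortlist) * second_models   # shortlisted alignments only
    return stage1, stage2, stage1 + stage2


print(models_built())  # (10000, 1600, 11600)
```

Each of these models also has to be scored, so the run involves more than eleven thousand build-and-evaluate cycles.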

What are the results returned by the HHalign-KBest server and what are they for?

The results consist of 3 parts when using the "models" mode and of only the first 2 parts when using the "alignments" mode:

  1. The first part is the trace of your input files. If you did not input a template, information about candidate templates is returned. If you input a template through the PDB option, the template PDB file is also returned (only for the given chain, renamed as chain A) together with the FASTA sequence of the PDB protein;
  2. the second part is the suboptimal alignments in hhr and fasta formats;
  3. the third and last component is the part dedicated to structural modeling. It contains a ranking summary of top-5 alignments and models, and their fasta-format alignment and pdb-format model files. Up to five models are presented on the html result page. An archive containing the models corresponding to up to the 10 best alignments is also returned.

If you are a structural biologist, you may compare the top-5 models and select the best one(s) as your final model(s). You may also refine these models with another modeling strategy or tool, or visualize all the suboptimal alignments (for example using Jalview) to check ambiguously aligned regions and correct errors according to your knowledge.
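Ambiguously aligned regions can also be located programmatically. The sketch below assumes that each suboptimal alignment has already been read into a pair of gapped strings (query row, template row) with "-" as the gap character. Reading the server's FASTA files is not shown, and all names are hypothetical. A query residue is reported as ambiguous when it is paired with different template positions, or with a gap, across the alignments.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

GAP = "-"


def query_partners(query_row: str, template_row: str) -> Dict[int, Optional[int]]:
    """Map each query residue (0-based) to its template residue, or None for a gap."""
    if len(query_row) != len(template_row):
        raise ValueError("alignment rows must have the same length")
    partners: Dict[int, Optional[int]] = {}
    q_idx = t_idx = 0
    for q_char, t_char in zip(query_row, template_row):
        if q_char != GAP:
            partners[q_idx] = t_idx if t_char != GAP else None
            q_idx += 1
        if t_char != GAP:
            t_idx += 1
    return partners


def ambiguous_query_positions(alignments: Iterable[Tuple[str, str]]) -> List[int]:
    """Return query positions whose partner differs between alignments."""
    seen = defaultdict(set)
    for query_row, template_row in alignments:
        for q_idx, partner in query_partners(query_row, template_row).items():
            seen[q_idx].add(partner)
    return sorted(q for q, partners in seen.items() if len(partners) > 1)


example = [("ACDEF", "ACDEF"), ("ACDEF-", "ACD-EF")]
print(ambiguous_query_positions(example))  # [3, 4]
```

Positions reported this way are natural places to inspect in Jalview before choosing or editing an alignment.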

Is there a possibility to change the parameters or even a part of the pipeline?

On the server, it is not presently possible to change parameters such as the number of suboptimal alignments (the k in "k-best", k=500 on the server) or the number of models generated for the evaluation, let alone the modeling or evaluation method. You may change the standard options by downloading the standalone version of the HHalign-KBest program. Depending on your computational power, you may then include more advanced refinement and a larger number of models for every alignment. You may also combine the top-5 alignments to explore better alignments further.

How to install HHalign-KBest locally?

Download the package here. Follow the README file for installation and usage. This package can be used to align 2 HMM profiles (hhm format). See the HHsuite user guide for searching a template, creating an hhm file, etc. If you want to evaluate suboptimal alignments locally by comparative modeling, you can download and install your preferred applications (MODELLER and QMEAN, for example).

References

If you find this server useful for structural model optimisation, please cite:

Yu J, Picord G, Tuffery P, Guerois R.
HHalign-KBest: Exploring sub-optimal alignments for remote homology comparative modeling.
Bioinformatics, 2015 Dec 1;31(23):3850-2.

The HHalign-Kbest algorithm was developed based on the original HHsearch algorithm:

Söding J.
Protein homology detection by HMM-HMM comparison.
Bioinformatics. 2005 Apr 1;21(7):951-60.

Profiles calculated on the server are built using the PSI-BLAST algorithm on the nr database filtered to remove sequences with identities above 70%:

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ.
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Res. 1997 Sep 1;25(17):3389-402.

Structural models are generated using the MODELLER 9.12 release:

Eswar N, Marti-Renom MA, Webb B, Madhusudhan MS, Eramian D, Shen M, Pieper U, Sali A.
Comparative Protein Structure Modeling With MODELLER.
Curr Protoc Bioinformatics. 2006 Oct;Chapter 5:Unit 5.6.

Evaluation of the models and subsequent selection of the most likely sub-optimal alignment uses the QMEAN Z-score:

Benkert P, Tosatto SCE and Schomburg D.
QMEAN: A comprehensive scoring function for model quality assessment.
Proteins. 2008 Apr;71(1):261-77.
Benkert P, Biasini M and Schwede T.
Toward the estimation of the absolute quality of individual protein structure models.
Bioinformatics. 2011 Feb 1;27(3):343-50.

HHblits is used to search for the best template if the template is not provided by the user:

Remmert M, Biegert A, Hauser A, Söding J.
HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment.
Nature Methods. 2012;9(2):173-5.