Help on the yakusa result page

The left column contains a typical yakusa output; the right column gives some explanations.

Parameters of yakusa used for this run. All of them can be changed by the user, although some of them cannot be changed via the web interface.
Query: ../d1i4ja_.alpha 
  • Query file path and name
Database: ./bank_1249_domains.alpha 
  • Database file path and name
Output: essai
  • Output file path and name
Kept structures 50
  • The 50 best ranked structures are kept
Seed length: 4, delta_max: 3, DELTA_max: 7, # groups: 36
  • Seed length: see "local degeneracy" in Yakusa overview.
  • DELTA_max: see "global degeneracy" in Yakusa overview.
  • # groups: number of angle groups. We usually cluster α angles into classes over a 10° mesh, so there are 36 classes of α angles.
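The 10° mesh can be illustrated with a minimal sketch. The function name `alpha_class` and the choice to wrap angles into [0°, 360°) before binning are assumptions made for illustration; the help page does not say how yakusa numbers its classes.

```python
def alpha_class(angle_deg: float, n_groups: int = 36) -> int:
    """Map an alpha angle in degrees to a class index in [0, n_groups)."""
    width = 360.0 / n_groups        # 10 degrees when n_groups is 36
    wrapped = angle_deg % 360.0     # Python's % maps -170.0 to 190.0
    # The final modulo guards against a wrapped value that rounds to 360.0.
    return int(wrapped // width) % n_groups


for angle in (0.0, 9.99, 10.0, -170.0, 359.9):
    print(angle, alpha_class(angle))
```

With 36 groups, two angles fall in the same class when they lie in the same 10° bin.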
Minimum SHSP length: 6, 
  • Minimal length of final SHSPs
Average difference between 2 angles: 30, 

Maximum difference between 2 angles: 60
  • For the extension step of seeds to SHSPs, a score is computed: S = Σ_i (T − Δα_i), summed over the k aligned angle pairs, with Δα_i being the angular difference between the two angles α_{q+i} and α_{b+i}, k the seed length, and T a constant (this average difference). We observed that the average angular difference between two randomly chosen angles over all the PDB structures is about 42°. By default T is set to 30°, so that two angles taken at random get a negative score on average. This property is needed for the seed extension step.
    However, too large a difference between two angles cannot be accepted: the maximal acceptable difference between two angles during this step is 60°.
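The scoring described above can be sketched as follows. This is a minimal illustration, assuming that each aligned angle pair contributes T minus its angular difference and that a pair exceeding the maximum difference simply invalidates the fragment. The function names are hypothetical, and the way yakusa actually decides where to stop an extension is not described here. The assumption is consistent with the example output further down, where an identical 78-residue SHSP scores 2340 = 78 × 30.

```python
def angular_difference(a: float, b: float) -> float:
    """Smallest difference between two angles in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def fragment_score(query_angles, bank_angles, t=30.0, max_diff=60.0):
    """Sum of (t - difference) over aligned angles.

    Returns None when any pair differs by more than max_diff
    (assumed handling of the maximum-difference rule).
    """
    score = 0.0
    for a, b in zip(query_angles, bank_angles):
        diff = angular_difference(a, b)
        if diff > max_diff:
            return None
        score += t - diff
    return score


print(fragment_score([50.0] * 78, [50.0] * 78))  # identical fragments
print(fragment_score([0.0], [42.0]))             # a "random" pair scores below zero
```

Because a typical random pair differs by about 42°, its contribution of 30 − 42 is negative, so an extension over unrelated regions loses score.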
Distance between 2 seeds: 10
  • After the "Searching seeds" step, we first gather the seeds found if they overlap or are not too distant from each other; for this run, the maximal distance between two seeds is 10 residues.
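The gathering step can be pictured as interval merging. The sketch below is a simplification under stated assumptions: seeds are reduced to (start, end) residue intervals on a single structure, and the distance between two seeds is taken as the start of one minus the end of the previous one. Real seeds pair a query fragment with a database fragment, and the exact distance definition used by yakusa is not given here. The name `merge_seeds` is hypothetical.

```python
def merge_seeds(seeds, max_distance=10):
    """Merge (start, end) intervals that overlap or lie within max_distance residues."""
    merged = []
    for start, end in sorted(seeds):
        if merged and start - merged[-1][1] <= max_distance:
            # Overlapping or close enough: extend the previous group.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_seeds([(1, 4), (3, 8), (15, 18), (40, 43)]))
```

In this example the first three seeds end up in one group, while the last one is more than 10 residues away and stays separate.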
Helix are not hidden
  • As an option, a filter can be applied to canonical α helices. It hides the middle of α helices but keeps their ends. Helices are hidden during the seed search, but not in the extension step. Therefore, α helices are still found in the end, but because they would otherwise generate many seeds in the first steps, the scan is faster.
No score printed
  • Not a really useful output unless you want to do statistics on yakusa scores; it prints all possible scores.
No structure for Gok printed
  • Output file to see results and structures with midas and gok (only on irix/sgi). Contact the authors if you want to try it.
No structure for Molscript printed
  • Output file for Molscript. It generates an image of protein structures, but you need the script LanceMolscript.csh and Molscript.
No structure for Rasmol printed
  • You can ask for output files readable by Rasmol. For each query/database structure pair, this generates a .pdb file (with the two structures) and a .spt file (script file for Rasmol). To use them, open a .pdb file in Rasmol and execute the corresponding script in it (type script followed by the name of the .spt file).
No structure for R printed
  • useless
Classification mode: m with MTD probabilty of spatially compatible SHSPs 
(threshold for RMS : 15)

No independant or conditional probabilities (2-uples) computed

  • Several scores can be used to rank found structures, see Yakusa scores
Reading MTD model in ../Yak_DB/MLearn_bank_non_redondante_from_rand
  • Input file for MTD model (not necessary, there is a default model).
Computed Statistics: Some useful numbers
STAT Number of proteins read in database: 1249

STAT 1220 proteins have common pattern(s) with query structure
  • Number of proteins in the database given, and number of proteins sharing SHSPs with query
with a mean of  2.7 common patterns and  

8.6 total common patterns (not all compatible)
  • Average number of SHSPs, compatible and not compatible (some of them are overlapping)
STAT 0 proteins were ignored.

0 proteins cannot be read,

0 cannot be encoded,

0 have problem when searching for seeds

29 proteins have no common patterns with query structure
  • Some database structures may be ignored
    • because of reading problems (too many fragments...)
    • because of the "discretisation"
    • because there were too many seeds
  • And some of the structures have no SHSP in common with the query
STAT Proteins mean score: 13.3537, standard deviation 5.72275

Quartiles (only for kept structure): 131.30 29.60 25.55 23.83 0.00,

inter quartile distance: -5.77
  • Mean and standard deviation of the ranking score, quartiles and interquartile distance
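The negative interquartile distance in this example can be reproduced from the printed values. The sketch below assumes that the five values are listed from the best score to the worst, and that the distance is the fourth printed value minus the second; this reading matches the output above but is an inference, not a documented definition.

```python
# Values as printed on the "Quartiles" line, best score first.
quartiles = [131.30, 29.60, 25.55, 23.83, 0.00]

second_value, fourth_value = quartiles[1], quartiles[3]
inter_quartile = fourth_value - second_value

print(f"{inter_quartile:.2f}")
```

Under this assumption the sign is only a consequence of the descending order; the spread itself is 5.77.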
Query length: 110 residues Query first res: 1 Query last res: 113

Description query : HEADER 0000 SCOP/ASTRAL domain d1i4ja_ [76734]
  • Some information about the query structure

Results of the scan for the best ranked proteins.
The same information is given for each protein.

Protein rank: 1 score: 131.30 Z-score: 20.61 name: d1i4ja_.pdbi : HEADER ... 
  • rank: protein rank for this scan of the database
  • score: ranking score, based upon MTD, computed for "spatially compatible" SHSPs or for all SHSPs found (input option)
  • Z-score: the Z-score of the ranking score (computed in the usual manner over the scores of all protein structures in the scanned database). Usually, one can see a drop in this Z-score between significant structural matches and random ones. At first glance, matches giving Z-scores above 6 or 7 are likely to be significant.
  • PDBsum/RCSB: link to PDBSum and Protein Databank
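As a worked check of the Z-score, the sketch below applies the usual definition, (score − mean) / standard deviation, to the numbers of this run: the score 131.30 of the first ranked protein and the mean and standard deviation from the STAT line. The function name `z_score` is chosen for illustration.

```python
def z_score(score: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations between a score and the database mean."""
    return (score - mean) / std_dev


z = z_score(131.30, mean=13.3537, std_dev=5.72275)
print(f"{z:.2f}")
```

The result agrees with the Z-score printed for rank 1 in the example output.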
Length: 110 residues, 104/110 residus aligned ( 94.5%)
  • Length: length of the database structure found.
  • 104: number of aligned residues
  • 94.5%: percentage of aligned residues (94.5 = 104/110*100)
Path: /abiusers/people/mathilde/Boulot/ASTRAL_FROSTPDB_domain/d1i4ja_.pdbi
  • Database structure PDB file path (useful for rasmol or gok/midas)
SHSPs: 2 SHSPs found
  • Number of SHSPs found
Id     pos_query  chain   pos_bank  chain   score  shift  length  RMSD  MTD_proba     RGBcolor
[ 1]   2 - 79     ( )     2 - 79    (A)     2340   0      78      0.0   1.989308E-94  #AAFFFF
[ 2]   86 - 111   ( )     86 - 111  (A)     780    0      26      0.0   2.499245E-38  #BBBBEE
  • Id: SHSP id
  • pos_query: first and last residue pdb number of SHSP in the query structure
  • chain: chain in the query structure (main chain if blank)
  • pos_bank: first and last residue pdb number of SHSP in the database structure
  • chain: chain in the database structure (main chain if blank)
  • score: score computed for SHSP, based on α angle differences
  • shift: SHSP shift between database and query, i.e. [database structure residue index] minus [query structure residue index]
  • length: SHSP length
  • RMSD: SHSP RMS, i.e. RMS between database and query fragment of SHSP
  • MTD_proba: SHSP probability (MTD based)
  • RGBcolor: SHSP color (for use in rasmol and molscript)
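For readers who want to post-process these rows, here is a minimal parsing sketch. The regular expression, the `Shsp` record and the function name `parse_shsp_row` are assumptions based only on the two example rows shown above; other runs may print rows that this pattern does not cover.

```python
import re
from typing import NamedTuple, Optional


class Shsp(NamedTuple):
    shsp_id: int
    query_start: int
    query_end: int
    query_chain: str
    bank_start: int
    bank_end: int
    bank_chain: str
    score: float
    shift: int
    length: int
    rmsd: float
    mtd_proba: float
    rgb_color: str


_ROW = re.compile(
    r"\[\s*(\d+)\]\s*"                                # [ 1]
    r"(-?\d+)\s+-\s+(-?\d+)\s+\((.?)\)\s+"            # 2 - 79 ( )
    r"(-?\d+)\s+-\s+(-?\d+)\s+\((.?)\)\s+"            # 2 - 79 (A)
    r"(-?\d+(?:\.\d+)?)\s+(-?\d+)\s+(\d+)\s+"         # score shift length
    r"(\d+(?:\.\d+)?)\s+(\S+)\s+(#[0-9A-Fa-f]{6})"    # RMSD MTD_proba RGBcolor
)


def parse_shsp_row(line: str) -> Optional[Shsp]:
    """Parse one SHSP row; return None if the line does not look like one."""
    m = _ROW.search(line)
    if m is None:
        return None
    g = m.groups()
    return Shsp(int(g[0]), int(g[1]), int(g[2]), g[3].strip(),
                int(g[4]), int(g[5]), g[6].strip(),
                float(g[7]), int(g[8]), int(g[9]),
                float(g[10]), float(g[11]), g[12])


row = parse_shsp_row("[ 1] 2 - 79 ( ) 2 - 79 (A) 2340 0 78 0.0 1.989308E-94 #AAFFFF")
print(row)
```

A blank chain is returned as an empty string, and for this row the shift equals the database start minus the query start, as described in the field list.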
Group of SHSPs spatially compatible: 1  2 
(score 131.30, 94.55% residues aligned)
  • This line gives the group of "spatially compatible" SHSPs, which is found by computing cross RMS values. The score is the overall score of the compatible SHSPs, i.e. minus the sum of the base-10 logarithms of the compatible SHSP MTD probabilities (here 93.70 + 37.60 = 131.30).
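The group score of this example can be checked numerically. The sketch below assumes that the score is the negated sum of base-10 logarithms of the MTD probabilities, which is the reading that reproduces the printed value; the function name `group_score` is hypothetical.

```python
import math


def group_score(mtd_probabilities):
    """Negated sum of log10 of the MTD probabilities of compatible SHSPs."""
    return -sum(math.log10(p) for p in mtd_probabilities)


# MTD_proba values of SHSPs 1 and 2 from the table above.
print(f"{group_score([1.989308e-94, 2.499245e-38]):.2f}")
```

Smaller probabilities give larger scores, so a group of long, unlikely-by-chance SHSPs ranks higher.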
Cross RMSD of the SHSP
                | num HSSP for RMSD:
num HSSP for    |
Rotation matrix |  1  |  2  | avg |
---------------------------------------------
       1        | 0.0 | 0.0 | 0.0 |
       2        | 0.0 | 0.0 | 0.0 |
SHSPs amino acid sequences

SHSP 1
query    EAKAIARYVRISPRKVRLVVDLIRGKSLEEARNILRYTNK
         ||||||||||||||||||||||||||||||||||||||||
database EAKAIARYVRISPRKVRLVVDLIRGKSLEEARNILRYTNK

query    RGAYFVAKVLESAAANAVNNHDALEDRLYVKAAYVDEG
         ||||||||||||||||||||||||||||||||||||||
database RGAYFVAKVLESAAANAVNNHDALEDRLYVKAAYVDEG

SHSP 2
query    LPRARGRADIIKKRTSHITVILGEKH
         ||||||||||||||||||||||||||
database LPRARGRADIIKKRTSHITVILGEKH
  • Alignment of the sequences of query and database fragments of one SHSP:
    • Vertical lines ('|') mark identical residues (i.e. pairs whose BLOSUM62 score is >= 4),
    • Colons (':') mark residue pairs with a BLOSUM62 score >= 2,
    • Points ('.') mark residue pairs with a BLOSUM62 score >= 1
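The marker rules above can be sketched as follows. To stay self-contained, the example uses a small hand-copied excerpt of BLOSUM62 rather than the full matrix; the names `BLOSUM62_EXCERPT`, `marker` and `match_line` are chosen for illustration, and a real use would need the complete 20 × 20 matrix.

```python
# Excerpt of BLOSUM62 (the matrix is symmetric); not the full matrix.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("K", "K"): 5, ("I", "V"): 3, ("D", "E"): 2,
    ("K", "R"): 2, ("A", "S"): 1, ("A", "G"): 0, ("D", "L"): -4,
}


def pair_score(a: str, b: str, matrix) -> int:
    """Look up a residue pair in either order; raises KeyError if absent."""
    try:
        return matrix[(a, b)]
    except KeyError:
        return matrix[(b, a)]


def marker(score: int) -> str:
    """Map a substitution score to the marker printed between the sequences."""
    if score >= 4:
        return "|"
    if score >= 2:
        return ":"
    if score >= 1:
        return "."
    return " "


def match_line(query: str, database: str, matrix) -> str:
    return "".join(marker(pair_score(q, d, matrix))
                   for q, d in zip(query, database))


query, database = "AKIDAA", "AKVESG"
print(query)
print(match_line(query, database, BLOSUM62_EXCERPT))
print(database)
```

Pairs scoring 0 or less get a blank, which is why a line of unbroken '|' characters, as in the example above, indicates identical fragments.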