Computational protocol: Structure-Based Phylogenetic Analysis of the Lipocalin Superfamily

Protocol publication

[…] In the Structural Classification of Proteins (SCOP) database [], lipocalin structures correspond to the "lipocalin fold", which is categorized under the all-β class of proteins. In SCOP (version 1.75), the lipocalin fold contains a single superfamily, "Lipocalins", comprising 9 families. Crystal structures are available for 378 lipocalin domains in this superfamily, in both ligand-bound and ligand-free forms. Lipocalins generally bind different types of ligands, and ligand binding leads to structural changes. Because the aim of the current study is to analyze intrinsic structural characteristics without the influence of external factors, only the 39 of the 378 protein domains that are available in ligand-free form were considered. These 39 structures span 9 protein domain families in the lipocalin superfamily. lists these 39 domains with their SCOP codes, organism names and SCOP family names.

Pairwise structural comparison was performed for all possible pairs in the dataset of 39 lipocalin domains, giving 39 × 38 / 2 = 741 structural comparisons. The comparisons were carried out using the DaliLite pairwise alignment tool []. The structural alignments corresponding to these 741 pairs are provided in . The sequence identities corresponding to these alignments were calculated, and the frequencies of occurrence of pairs in various sequence identity ranges are shown in . Over 80% of lipocalin domain pairs in the current dataset fall in the sequence identity range of 1–20%. Because sequence similarity among the homologues is this low, sequence-based phylogeny studies may be less effective than 3-D structure-based evolutionary analysis.

[...] From the pairwise structural alignments obtained using DaliLite, a structural dissimilarity matrix was computed in order to generate a phylogenetic tree.
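
The pair enumeration and the sequence-identity tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the domain identifiers are placeholders rather than the real SCOP codes, the helper names `all_pairs`, `identity_bin` and `identity_histogram` are hypothetical, and the 20-point bin width is assumed because the source does not state the ranges it used.

```python
from collections import Counter
from itertools import combinations


def all_pairs(domain_ids):
    """Return every unordered pair of distinct domains (n * (n - 1) / 2 pairs)."""
    return list(combinations(domain_ids, 2))


def identity_bin(percent_identity, width=20):
    """Label the identity range a value falls in.

    The bin width is an assumption, not taken from the source.
    A value of exactly 100 is placed in the top bin.
    """
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must lie between 0 and 100")
    lower = min(int(percent_identity // width) * width, 100 - width)
    return f"{lower}-{lower + width}%"


def identity_histogram(identities, width=20):
    """Count how many pairs fall in each identity range."""
    return Counter(identity_bin(value, width) for value in identities)


# Placeholder identifiers standing in for the 39 ligand-free domains.
placeholder_ids = [f"domain_{i:02d}" for i in range(1, 40)]
pairs = all_pairs(placeholder_ids)
```

With 39 identifiers, `len(pairs)` is 741, matching the number of comparisons reported. In practice the identities passed to `identity_histogram` would come from the pairwise alignments themselves.
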
A measure referred to as the structural dissimilarity metric (SDM) was calculated for every pair using the formula [] given below:

$$\mathrm{SDM} = -100 \cdot \log\left(w_1 \cdot \mathrm{PFTE} + w_2 \cdot \mathrm{SRMS}\right)$$

where

$$w_1 = \frac{(1 - \mathrm{PFTE}) + (1 - \mathrm{SRMS})}{2}, \qquad w_2 = \frac{\mathrm{PFTE} + \mathrm{SRMS}}{2}$$

$$\mathrm{PFTE} = \frac{\text{number of topologically equivalent residues}}{\text{length of the smaller protein}}, \qquad \mathrm{SRMS} = 1 - \frac{\mathrm{RMSD}}{3.0}$$

RMSD is the root mean square deviation of topologically equivalent Cα atoms in Å. Two Cα atoms, one from each structure, are treated as topologically equivalent if they lie within a cut-off distance of 3 Å of each other after optimal superposition. The SDM value measures the structural dissimilarity of a given pair of structures: it is 0 when PFTE = 1 and RMSD = 0, and it grows as the structures diverge.

Using this formula, SDM values were calculated for all 741 domain pairs. From these values, a 39 × 39 structural dissimilarity matrix was generated. This matrix was used to build the structure-based phylogenetic tree with Kitsch, a distance-based program from the PHYLIP (version 3.573c) suite of programs []. Kitsch uses the Fitch-Margoliash and least-squares methods under the assumption of an evolutionary clock. In addition, for the 741 pairs, distance matrices were generated on the basis of the structure-based sequence alignments []. For the 39 domains, the traditional sequence-based phylogenetic tree was also obtained using ClustalW2 (version 2.1) with the Neighbour-joining tree method [–]. The sequence alignments obtained using ClustalW2 are provided in . […]
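
As an illustration of the SDM calculation and of assembling the square matrix that PHYLIP distance programs read, the following Python sketch may help. It is not the authors' code. The function names `sdm` and `phylip_square_matrix` are hypothetical, the natural logarithm is an assumption (the source does not state the base), and the inputs (number of equivalent residues, length of the smaller protein, RMSD) are assumed to have been extracted from the pairwise alignments beforehand.

```python
import math


def sdm(n_equivalent, len_smaller, rmsd, rmsd_cutoff=3.0):
    """Structural dissimilarity metric for one pair of structures.

    Assumption: natural logarithm; the source does not give the base.
    """
    pfte = n_equivalent / len_smaller          # fraction of equivalent residues
    srms = 1.0 - rmsd / rmsd_cutoff            # scaled RMSD term
    w1 = ((1.0 - pfte) + (1.0 - srms)) / 2.0
    w2 = (pfte + srms) / 2.0
    score = w1 * pfte + w2 * srms
    if score <= 0.0:
        raise ValueError("weighted similarity must be positive to take a log")
    return -100.0 * math.log(score)


def phylip_square_matrix(names, dist):
    """Format pairwise distances as a square PHYLIP distance matrix.

    `dist` maps frozenset({a, b}) to the distance between a and b.
    Names are truncated or padded to the 10 characters PHYLIP expects.
    """
    lines = [f"{len(names):5d}"]
    for a in names:
        row = [0.0 if a == b else dist[frozenset((a, b))] for b in names]
        label = a[:10].ljust(10)
        lines.append(label + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines) + "\n"
```

For example, `sdm(100, 100, 0.0)` evaluates to zero for two identical structures, and the string returned by `phylip_square_matrix` could be written to a file and supplied to a distance-based tree program. The exact invocation of Kitsch is not shown here because the source does not describe it.
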

Pipeline specifications