Computational protocol: Protein coalitions in a core mammalian biochemical network linked by rapidly evolving proteins

[…] Gene trees obtained from Ensembl were screened visually for duplication events and used to construct a duplication vector for each gene. Each vector consisted of the number of duplications falling at nodes labelled with particular times and taxon names in Ensembl (e.g. Clupeocephala, 320 MYA). Vectors were extended only to the point at which a duplication created a new gene family, to avoid confounding by selective forces acting on proteins with different functions, and non-vertebrate duplications were excluded. Vectors were subjected to hierarchical clustering using the R dist and hclust functions with Euclidean distances. To identify groups whose degree of clustering differed significantly from average, Euclidean distances between subpathway members, and between proteins classified into particular phylogenetic categories, were compared with the distances for all other pairwise comparisons using the Mann-Whitney test. [...] Lineages showing evidence of positive selection were identified from the Selectome database and confirmed using the codeml program in the PAML (Phylogenetic Analysis by Maximum Likelihood) package. Disease associations of GSIS genes were determined by reference to the OMIM (Online Mendelian Inheritance in Man) database. Statistical analysis was carried out using the R environment. […]
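The clustering and distance comparison described above can be illustrated with a minimal sketch. It is written in Python using only the standard library, not in the R environment the protocol used, and it is not the authors' code. The gene names, vector values and group membership are invented for illustration. The linkage rule is an assumption: the protocol does not state one, so the sketch uses complete linkage, which is the default of R's hclust. The Mann-Whitney p-value uses a plain normal approximation without tie or continuity correction, so it will not match the output of a full implementation exactly.

```python
import math
from itertools import combinations

# Hypothetical duplication vectors: number of duplications at successive
# dated nodes of each gene tree (values invented for illustration).
vectors = {
    "geneA": [0, 1, 0, 2],
    "geneB": [0, 1, 0, 2],
    "geneC": [0, 1, 1, 2],
    "geneD": [3, 0, 2, 0],
    "geneE": [3, 0, 1, 0],
}

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_distances(vecs):
    """Map each unordered gene pair to its Euclidean distance."""
    return {frozenset((g, h)): euclidean(vecs[g], vecs[h])
            for g, h in combinations(sorted(vecs), 2)}

def complete_linkage(vecs, dist):
    """Agglomerative clustering; returns merges as (cluster, cluster, height)."""
    clusters = [(g,) for g in sorted(vecs)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Complete linkage: cluster distance is the largest member distance.
            h = max(dist[frozenset((x, y))] for x in a for y in b)
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, h))
    return merges

def mann_whitney_u(xs, ys):
    """U statistic for xs (midranks for ties) and an approximate two-sided p."""
    pooled = sorted(xs + ys)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(xs), len(ys)
    u = sum(ranks[x] for x in xs) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs(u - mean) / sd / math.sqrt(2))
    return u, p

dist = pairwise_distances(vectors)
merges = complete_linkage(vectors, dist)

# Hypothetical subpathway: compare its internal distances with all other pairs.
group = {"geneA", "geneB", "geneC"}
within = [d for pair, d in dist.items() if pair <= group]
others = [d for pair, d in dist.items() if not pair <= group]
u_stat, p_value = mann_whitney_u(within, others)

for a, b, h in merges:
    print(a, b, round(h, 3))
print("U =", u_stat, "approximate p =", round(p_value, 4))
```

A U statistic well below its expected value under the null hypothesis (half the product of the two sample sizes) indicates that the distances within the group tend to be smaller than the remaining distances, that is, the group's members cluster more tightly than average. With samples as small as those in this toy example the normal approximation is crude, and an exact test would be preferable.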

Pipeline specifications

Software tools: Hclust, PAML
Databases: Selectome
Application: Phylogenetics
Organisms: Mus musculus, Homo sapiens
Chemicals: Adenosine Triphosphate, Glucose, Glutathione, Pyruvic Acid