Computational protocol: In silico analysis of the cyclophilin repertoire of apicomplexan parasites

Protocol publication

[…] Initially, putative apicomplexan Cyps were identified using the BLASTp and tBLASTn algorithms to search the GenBank® protein and nucleic acid databases as well as PlasmoDB, ToxoDB, CryptoDB, and the Theileria parva genome database of TIGR. S. pombe Cyp1 and Cyp2 were used as query sequences; these Cyps were chosen because they are not closely related. If a Cyp subfamily member was not identified in one of the apicomplexan organisms, a Cyp of the same subfamily from a closely related apicomplexan parasite was used as a query to search protein, cDNA, EST, and genome databases. This approach was intended to ensure that no Cyps were missed in any of the taxa. To rule out that a complete subfamily had been overlooked, BLAST analyses were also performed using the complete T. gondii Cyp repertoire as queries. However, no additional Cyp sequences could be identified.

In contrast to the conventional nomenclature for many Cyps, molecular mass suffixes in the names were given with one decimal place, since identical names would otherwise have resulted in a few cases. It was decided not to use suffix letters, to avoid possible confusion with mammalian Cyps. For instance, a Cyp19A might have been confused with human CypA/PPIA. In addition, all molecular mass suffixes used were derived from the predicted sequences of the unprocessed proteins. Although this can currently be only a provisional nomenclature, consecutive naming with numbers or letters would result in different names for orthologous Cyps and identical names for unrelated Cyps of different apicomplexans. A more function-based nomenclature of apicomplexan Cyps should be introduced later, once all Cyps of at least one apicomplexan genome have been verified experimentally. For human and S. pombe Cyps, names according to the entries in the ENSEMBL database were used.

[...] Homologous putative protein sequences were aligned using ClustalW2.
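The naming scheme described above, a molecular mass suffix with one decimal place derived from the predicted unprocessed protein, can be illustrated with a minimal Python sketch. The protocol does not state which tool was used to predict the masses, so the use of average residue masses here is an assumption, and the species prefix `Xx` and the poly-glycine test sequences are purely hypothetical placeholders.

```python
# Average residue masses in Da (free amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def average_mass_da(sequence):
    """Average molecular mass (Da) of an unprocessed protein sequence.

    Raises KeyError for non-standard residue codes (e.g. X, B, Z).
    """
    seq = sequence.strip().upper().rstrip("*")  # drop a trailing stop symbol
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def cyp_name(prefix, sequence):
    """Build a name such as '<prefix>Cyp17.1' with the mass in kDa, one decimal.

    The '<prefix>Cyp<mass>' layout is an assumption made for illustration.
    """
    kda = average_mass_da(sequence) / 1000.0
    return f"{prefix}Cyp{kda:.1f}"


# Two hypothetical proteins whose masses both round to 17 kDa:
# an integer suffix would collide, one decimal place keeps them apart.
name_a = cyp_name("Xx", "G" * 300)
name_b = cyp_name("Xx", "G" * 305)
print(name_a, name_b)
```

With an integer suffix both placeholder proteins would be called `XxCyp17`; the extra decimal place is what keeps the two names distinct.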
Maximum likelihood phylogenetic trees were then calculated with PhyML using the approximate likelihood ratio test option and the JTT model for amino acid substitution. The program was set to estimate the proportion of invariable sites and the gamma distribution parameter, while the number of substitution rate categories was set to four. The input tree was built using the BIONJ algorithm implemented in PhyML. The resulting trees in Newick format were visualized and processed using MEGA4. [...] For identification of protein domains, CD-BLAST and InterProScan were used. Moreover, protein sequences were scanned for subcellular localization signals with PSORT, SignalP, PATS, PlasMit, and Mitoprot. […]
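The PhyML settings described above could be expressed as a command line roughly as follows. This is a sketch that assumes the option names of the PhyML 3.x command-line interface; the protocol does not state the PhyML version, the exact invocation, or which aLRT variant was reported, so the executable name `phyml`, the alignment file name `cyps.phy`, and the choice of `-b -1` are assumptions. The command is only assembled and printed, not executed.

```python
import shlex


def phyml_command(alignment_path, alrt_mode=-1):
    """Assemble a PhyML 3.x style argument list for the settings in the text.

    alrt_mode is passed to -b; negative values select an approximate
    likelihood ratio test (-1 reports the raw aLRT statistics). Which
    variant the authors used is not stated, so -1 is only a placeholder.
    """
    return [
        "phyml",               # assumed executable name
        "-i", alignment_path,  # alignment in PHYLIP format
        "-d", "aa",            # amino acid data
        "-m", "JTT",           # JTT substitution model
        "-b", str(alrt_mode),  # aLRT branch support instead of bootstrap
        "-v", "e",             # estimate the proportion of invariable sites
        "-a", "e",             # estimate the gamma shape parameter
        "-c", "4",             # four substitution rate categories
        # No -u option: PhyML then builds its own BIONJ starting tree.
    ]


cmd = phyml_command("cyps.phy")  # hypothetical alignment file
print(shlex.join(cmd))
```

Keeping the arguments as a list makes it straightforward to pass them to `subprocess.run` later without shell quoting problems.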

Pipeline specifications

Software tools: BLASTP, TBLASTN, Clustal W, PhyML, MEGA, PSORT, SignalP, MITOPROT
Databases: PlasmoDB, CryptoDB, ToxoDB
Applications: Phylogenetics, Protein sequence analysis, Amino acid sequence alignment
Organisms: Cryptosporidium hominis, Toxoplasma gondii, Plasmodium falciparum, Theileria annulata, Theileria parva
Diseases: Babesiosis
Chemicals: Cyclosporine