Computational protocol: Multiple gains of spliceosomal introns in a superfamily of vertebrate protease inhibitor genes

Protocol publication

[…] Genomic data for serpins from human, chicken, X. tropicalis, D. rerio, G. aculeatus, O. latipes, T. rubripes and T. nigroviridis were extracted from the Ensembl genome browser, release 51, or, in the case of P. marinus, from PreEnsembl. Sequences from the B. floridae genome were gathered from the JGI genome browser. EST and cDNA data mining included BLAST searches in the NCBI trace archive and in the UCSC genome browser. Some gene models were refined using EST data (Additional file ).

All intron positions predicted by gene models were examined visually and, where necessary, corrected and amended manually. Whenever cDNA or EST sequences were available, intron positions were checked with GENEWISE. Protein sequences were aligned with CLUSTALW, followed by some manual improvements. Intron positions were projected onto the sequence of mature human α1-antitrypsin as described. All intron locations refer to this reference protein, unless stated otherwise. Only introns mapping to the conserved serpin scaffold (i.e. positions 33 to 394 of human α1-antitrypsin) were considered.

Sequences of non-canonical introns were searched for repetitive elements with the RepeatMasker package (version 3.2.6) and with RepBase Censor, using default settings.

Phylogenetic analysis was performed using the Neighbor-Joining method as implemented in MEGA4. All positions containing gaps or missing data were eliminated from the dataset (complete deletion option), leaving a total of 340 positions in the final dataset. […]
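The projection of intron positions onto the reference protein, the restriction to the serpin scaffold and the complete deletion of gapped columns can be illustrated with a minimal Python sketch. It is not the authors' procedure, which the excerpt only cites. The function names, the `ref_offset` parameter, the toy sequences and the choice of `-` and `?` as gap and missing-data symbols are all assumptions made for illustration. Only the scaffold limits (33 to 394) come from the text.

```python
from typing import List, Optional

GAP = "-"
MISSING = "?"  # assumed missing-data symbol
SCAFFOLD_START, SCAFFOLD_END = 33, 394  # scaffold limits quoted in the text


def column_of_residue(aligned: str, residue: int) -> int:
    """Return the 0-based alignment column of the residue-th (1-based) non-gap character."""
    count = 0
    for col, ch in enumerate(aligned):
        if ch != GAP:
            count += 1
            if count == residue:
                return col
    raise ValueError(f"sequence has fewer than {residue} residues")


def residue_at_column(aligned: str, col: int, offset: int = 0) -> Optional[int]:
    """Return the residue number at an alignment column, or None if the column is a gap."""
    if aligned[col] == GAP:
        return None
    return offset + sum(1 for ch in aligned[: col + 1] if ch != GAP)


def project_intron(query_aln: str, ref_aln: str, query_residue: int,
                   ref_offset: int = 0) -> Optional[int]:
    """Map a query residue (e.g. the one adjacent to an intron) to reference numbering.

    ref_offset is a hypothetical shift for cases where the aligned reference
    does not start at residue 1 of the mature protein.
    """
    col = column_of_residue(query_aln, query_residue)
    return residue_at_column(ref_aln, col, ref_offset)


def in_scaffold(position: Optional[int]) -> bool:
    """True if a projected position lies within the conserved scaffold."""
    return position is not None and SCAFFOLD_START <= position <= SCAFFOLD_END


def complete_deletion(alignment: List[str]) -> List[str]:
    """Drop every column in which any sequence has a gap or missing-data symbol."""
    n_cols = len(alignment[0])
    keep = [c for c in range(n_cols)
            if all(seq[c] not in (GAP, MISSING) for seq in alignment)]
    return ["".join(seq[c] for c in keep) for seq in alignment]


if __name__ == "__main__":
    ref, query = "MK-LVA", "MKQL-A"  # toy aligned sequences, not real serpins
    print(project_intron(query, ref, 4))            # position in reference numbering
    print(project_intron(query, ref, 3))            # falls in a reference gap
    print(complete_deletion([ref, query, "MKQL?A"]))
```

A query position that aligns to a gap in the reference returns `None` and is therefore excluded by `in_scaffold`. How such insertions were actually handled in the study is not stated in the excerpt.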

Pipeline specifications

Software tools: GeneWise, Clustal W, RepeatMasker, CENSOR, MEGA
Databases: Repbase
Applications: Phylogenetics, Genome data visualization
Organisms: Dipturus trachyderma