Computational protocol: Salamander Hox clusters contain repetitive DNA and expanded non-coding regions: a typical Hox structure for non-mammalian tetrapod vertebrates?

[…] Sequences were screened to trim vector and adapter sequences and to remove contaminating Escherichia coli sequences. After an initial assembly using GS De Novo Assembler (454 Life Sciences, Branford, USA), contigs and singletons were assembled further using DNASTAR SeqMan (DNASTAR, Inc., Madison, USA). Contiguous sequences of assembled BACs were searched (blastn) against salamander expressed sequence tag (EST) contigs at Sal-Site [], and non-redundant nucleotide and protein databases at NCBI (blastx and tblastp) [] were used to identify and annotate gene regions.

For multispecies comparisons, genomic sequences for H. sapiens (GRCh37.10) and M. musculus (GRCm38.1) were obtained from NCBI, Anolis carolinensis (AnoCar 2.0) and D. rerio (Zv9) from Ensembl [], and X. tropicalis (build 7.1) from Xenbase []. Sequences were aligned using MultiPipMaker [].

Annotated repeats were identified by searching the re-assembled BAC clones against all repeats deposited in RepBase []. Newt-specific repeats were identified using MultiPipMaker [] by aligning re-assembled BAC clones against each other and by performing self-self alignments of each BAC. The “search both strands” and “high sensitivity” options were used in MultiPipMaker to identify significantly similar non-coding sequences located at different positions either within or between BACs. The terminal base-pair positions of these alignments were recorded to denote the positions of repetitive sequences within BACs. If two repeats occurred within 50 bp of each other, they were compiled as a single repetitive sequence whose span was defined by the outermost terminal positions. The base-pair coordinates of the newt-specific repetitive sequences were combined with those of the RepBase repetitive sequences to generate an underlay file (Additional file : Table S1), which was used to create maps of repetitive elements for the HoxD and HoxC genomic regions. […]
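The protocol does not say how BLAST hits were turned into gene annotations. As one hedged illustration, the sketch below assumes hits were exported in BLAST+ tabular format (`-outfmt 6`, default 12 columns) and keeps the highest-bitscore hit per BAC contig. The `best_hits` function, the e-value cut-off, and the contig/EST identifiers in the toy input are hypothetical, not taken from the study.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text, max_evalue=1e-10):
    """Return the highest-bitscore hit per query among hits with evalue <= max_evalue.

    max_evalue is a hypothetical cut-off chosen only for illustration.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_FIELDS, row))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if evalue > max_evalue:
            continue
        query = hit["qseqid"]
        if query not in best or bitscore > best[query]["bitscore"]:
            best[query] = {
                "sseqid": hit["sseqid"],
                "qstart": int(hit["qstart"]),
                "qend": int(hit["qend"]),
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best


# Toy input with made-up identifiers (not data from the study).
example = (
    "bac1_contig3\tEST_0001\t98.5\t400\t6\t0\t1201\t1600\t1\t400\t1e-180\t700\n"
    "bac1_contig3\tEST_0002\t90.0\t200\t20\t0\t1300\t1499\t5\t204\t1e-50\t250\n"
    "bac1_contig7\tEST_0010\t85.0\t60\t9\t0\t10\t69\t1\t60\t0.5\t40\n"
)
print(best_hits(example)["bac1_contig3"]["sseqid"])  # → EST_0001
```

In this toy input `bac1_contig7` is dropped because its e-value exceeds the cut-off, and `EST_0001` wins for `bac1_contig3` on bitscore.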
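The 50 bp rule for compiling newt-specific repeats amounts to an interval merge. The minimal Python sketch below assumes each alignment contributes one (start, end) pair of terminal positions and reads "within 50 bp" as a gap of at most 50 bp. The names `merge_repeats` and `combine_tracks` and the track labels are hypothetical. Writing the rows out in MultiPipMaker's actual underlay syntax is not shown and should be checked against the MultiPipMaker documentation.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]


def merge_repeats(hits: Iterable[Interval], max_gap: int = 50) -> List[Interval]:
    """Merge repeat intervals separated by at most max_gap bp.

    Start/end may arrive in either order (e.g. reverse-strand hits),
    so each pair is normalised to (min, max) before sorting.
    """
    spans = sorted((min(a, b), max(a, b)) for a, b in hits)
    merged: List[Interval] = []
    for start, end in spans:
        if merged and start - merged[-1][1] <= max_gap:
            # Overlapping or within max_gap of the previous span: extend it so
            # the outermost terminal positions define the merged repeat.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def combine_tracks(newt: List[Interval],
                   repbase: List[Interval]) -> List[Tuple[int, int, str]]:
    """Combine newt-specific and RepBase intervals into one sorted, labelled list."""
    rows = [(s, e, "newt_specific") for s, e in newt]
    rows += [(s, e, "repbase") for s, e in repbase]
    return sorted(rows)


# Toy coordinates, made up for illustration.
merged = merge_repeats([(1000, 1200), (1240, 1500), (3400, 3000), (5000, 5100)])
print(merged)  # → [(1000, 1500), (3000, 3400), (5000, 5100)]
```

The RepBase intervals are kept as a separately labelled track rather than merged with the newt-specific ones, so the two repeat classes stay distinguishable on the map.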

Pipeline specifications

Software tools: Newbler, BLASTN, BLASTX, MultiPipMaker
Databases: Xenbase, Repbase
Application: Miscellaneous
Organisms: Mus musculus, Homo sapiens