Computational protocol: Selective Constraint on Noncoding Regions of Hominid Genomes

Protocol publication

[…] We obtained a list of human–mouse–dog orthologs via EnsMart (http://www.ensembl.org) and selected the trios that were unique reciprocal best hits []. We then used the Ensembl Perl API to identify those genes whose 5′ upstream regions contain no other Ensembl gene within 10 kb. For each ortholog trio meeting this criterion in all three species, we downloaded 10 kb of upstream sequence from the human [,], mouse [], and dog (The Broad Institute, Cambridge, Massachusetts, United States, and Agencourt Bioscience, Beverly, Massachusetts, United States) genomes via Ensembl. All sequences were premasked for repetitive sequence using RepeatMasker (http://www.repeatmasker.org). For the same set of genes, we also obtained the UCSC human–chimpanzee and mouse–rat blastz alignments via the Ensembl Perl API. We obtained a human–chimpanzee alignment for 5,547 ortholog trios and a mouse–rat alignment for 5,434 trios. We repeated this process to obtain a set of human–mouse–chicken orthologs and downloaded the corresponding chicken genome sequence from Ensembl []. There were 3,223 human–mouse–chicken ortholog trios with a human–chimpanzee alignment. […]
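The two gene-selection filters described above (unique reciprocal best-hit trios, and no other gene within 10 kb upstream) can be sketched as follows. This is a minimal illustration, not the authors' code, which relied on EnsMart and the Ensembl Perl API. It assumes best-hit tables are already available as plain dictionaries (one best hit per query gene), that gene coordinates are 1-based and inclusive with Ensembl-style strand values of 1 or -1, and that "no other gene within 10 kb" means no other annotated gene overlaps the 10 kb window immediately 5′ of the gene. The names `Gene`, `reciprocal_pairs`, `rbh_trios`, `upstream_window` and `upstream_is_clear` are hypothetical.

```python
from dataclasses import dataclass

UPSTREAM_BP = 10_000  # 10 kb upstream window, as in the protocol


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: int  # 1 or -1, following Ensembl's convention


def reciprocal_pairs(best_ab: dict[str, str], best_ba: dict[str, str]) -> dict[str, str]:
    """Keep a -> b only if b's best hit is a; the result is one-to-one."""
    return {a: b for a, b in best_ab.items() if best_ba.get(b) == a}


def rbh_trios(best_hm, best_mh, best_hd, best_dh, best_md, best_dm):
    """Human-mouse-dog trios in which all three pairs are reciprocal best hits."""
    hm = reciprocal_pairs(best_hm, best_mh)
    hd = reciprocal_pairs(best_hd, best_dh)
    md = reciprocal_pairs(best_md, best_dm)
    trios = []
    for h, m in hm.items():
        d = hd.get(h)
        # The mouse and dog genes must also be each other's reciprocal best hit.
        if d is not None and md.get(m) == d:
            trios.append((h, m, d))
    return sorted(trios)


def upstream_window(gene: Gene, length: int = UPSTREAM_BP) -> tuple[int, int]:
    """5' upstream interval of `length` bp; plus-strand windows are clipped at 1."""
    if gene.strand == 1:
        return max(1, gene.start - length), gene.start - 1
    # Minus strand: upstream lies beyond the gene's end (chromosome end not checked).
    return gene.end + 1, gene.end + length


def upstream_is_clear(gene: Gene, annotated: list[Gene], length: int = UPSTREAM_BP) -> bool:
    """True if no other annotated gene on the same chromosome overlaps the window."""
    w_start, w_end = upstream_window(gene, length)
    for other in annotated:
        if other.gene_id == gene.gene_id or other.chrom != gene.chrom:
            continue
        if other.start <= w_end and w_start <= other.end:  # closed-interval overlap
            return False
    return True


# Usage with made-up coordinates: g3's upstream window contains g4,
# and g4's upstream window contains g3, so only g1 passes.
genes = [
    Gene("g1", "1", 20_000, 25_000, 1),
    Gene("g3", "1", 30_000, 32_000, -1),
    Gene("g4", "1", 40_000, 41_000, 1),
]
print([g.gene_id for g in genes if upstream_is_clear(g, genes)])  # ['g1']
```

Under this sketch, a trio returned by `rbh_trios` would be kept only if `upstream_is_clear` holds for its human, mouse and dog members, each checked against that species' own gene annotation. Requiring the mouse–dog pair to be reciprocal as well is one reasonable reading of "unique reciprocal best hits"; the source does not spell out the exact criterion.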

Pipeline specifications

Software tools: RepeatMasker, LASTZ
Databases: EnsMart
Application: Nucleotide sequence alignment
Organisms: Homo sapiens, Pan troglodytes