Computational protocol: De Novo Genes Arise at a Slow but Steady Rate along the Primate Lineage and Have Been Subject to Incomplete Lineage Sorting

[…] We performed a BLASTp search of the human proteins against the merged primate protein data set using an e-value threshold of 1 × 10⁻⁴. These results formed the basis for the list of initial candidate genes.

We used tBLASTn to search the protein sequences of interest against the genomes of up to five outgroup species (chimpanzee, gorilla, orangutan, gibbon, and macaque). Candidate human-specific de novo genes were searched against all five, candidate human + chimpanzee genes against the other four, and so on. We only considered cases with tBLASTn hits with sequence identity (SI) >60% and coverage >0.4, where coverage is the length of the hit divided by the length of the human protein.

We excluded cases where we could not detect the orthologous sequence in the outgroup genomes, except that we retained cases where the orthologous DNA was unidentifiable in only one of gibbon or macaque. We discarded cases where more than one possible homologous sequence was found in one or more outgroups, or where the human protein had highly similar copies (SI > 90%) in the human genome itself. […]
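The tBLASTn filtering rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the `Hit` record, the function names, and the assumption that each passing hit represents one distinct candidate locus are all hypothetical. It also reads the gibbon/macaque exception as "exactly one outgroup is missing, and that outgroup is gibbon or macaque", and it does not cover the separate check for highly similar copies (SI > 90%) within the human genome.

```python
from dataclasses import dataclass

MIN_IDENTITY = 60.0  # percent; the protocol requires SI > 60%
MIN_COVERAGE = 0.4   # hit length / human protein length must exceed 0.4
OPTIONAL_OUTGROUPS = {"gibbon", "macaque"}  # at most one of these may be missing


@dataclass
class Hit:
    """Hypothetical record for one tBLASTn hit of a human protein."""
    outgroup: str      # outgroup species name, e.g. "chimpanzee"
    identity: float    # percent sequence identity of the hit
    hit_length: int    # length of the hit, in amino acids
    query_length: int  # length of the human protein, in amino acids


def passes_thresholds(hit):
    """Return True if the hit exceeds both the identity and coverage cutoffs."""
    coverage = hit.hit_length / hit.query_length
    return hit.identity > MIN_IDENTITY and coverage > MIN_COVERAGE


def keep_candidate(hits, outgroups):
    """Decide whether a candidate survives the outgroup filters.

    Assumes each passing hit is a distinct possible homologous sequence,
    which simplifies away the merging of multiple hits from one locus.
    """
    counts = {name: 0 for name in outgroups}
    for hit in hits:
        if hit.outgroup in counts and passes_thresholds(hit):
            counts[hit.outgroup] += 1

    # Discard if any outgroup has more than one possible homologous sequence.
    if any(n > 1 for n in counts.values()):
        return False

    # Retain if every outgroup has a hit, or if the single missing
    # outgroup is gibbon or macaque.
    missing = [name for name, n in counts.items() if n == 0]
    if not missing:
        return True
    return len(missing) == 1 and missing[0] in OPTIONAL_OUTGROUPS
```

For a candidate human-specific gene, `outgroups` would list all five species; for a candidate human + chimpanzee gene, it would list the remaining four.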

Pipeline specifications

Software tools: BLASTP, TBLASTN
Application: Amino acid sequence alignment
Organisms: Gorilla gorilla, Pan troglodytes, Homo sapiens