[…] Source data are retrieved daily from primary public servers. Integr8 and Genome Reviews are the source of genome data, including curated gene sets, annotation and cross-references to UniProtKB, InterPro, Gene Ontology and the Protein Data Bank. GenBank and RefSeq are the source of NCBI cross-references (RefSeq accession, GeneID and GI number). The OMA database provides orthology predictions for pairs of genes. Pre-computed gene predictions from the Glimmer, GeneMark, GeneMarkHMM and Prodigal packages are provided by the NCBI, and predictions by the EasyGene method are downloaded from the EasyGene web site. Genome Reviews data are used as the reference because they incorporate substantial automatic and manual annotation from the gold-standard UniProtKB knowledgebase. Cross-references from GenBank and RefSeq genes are merged into Genome Reviews records based on the position of the 3′-end of the genes. Matching on the 3′-end allows correct mapping not only of genes for which no cross-references exist between the databases, but also of genes whose 5′-end (start site) may have been changed by UniProtKB curators. […]
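
The 3′-end matching described above can be illustrated with a minimal sketch. It is not the pipeline's actual code: the `Gene` record, the `three_prime_key` and `merge_xrefs` functions, the 1-based inclusive coordinate convention and the example identifiers are all assumptions made for illustration. The idea is that the 3′-end is the rightmost coordinate of a gene on the forward strand and the leftmost coordinate on the reverse strand, so two records describing the same gene share that position even when their start sites differ.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """Hypothetical gene record; coordinates are assumed 1-based and inclusive."""
    replicon: str          # chromosome or plasmid identifier
    start: int             # leftmost coordinate on the replicon
    end: int               # rightmost coordinate on the replicon
    strand: str            # "+" or "-"
    xrefs: dict = field(default_factory=dict)


def three_prime_key(gene):
    """Return (replicon, strand, 3'-end position) for a gene.

    On the forward strand the 3'-end is the rightmost coordinate;
    on the reverse strand it is the leftmost coordinate.
    """
    position = gene.end if gene.strand == "+" else gene.start
    return (gene.replicon, gene.strand, position)


def merge_xrefs(reference_genes, other_genes):
    """Copy cross-references from other_genes into reference_genes.

    Genes are paired by their 3'-end only, so a differing 5'-end
    (start site) does not prevent a match. Existing reference
    cross-references are never overwritten. Returns the genes from
    other_genes that had no partner in the reference set.
    """
    index = {three_prime_key(g): g for g in reference_genes}
    unmatched = []
    for gene in other_genes:
        target = index.get(three_prime_key(gene))
        if target is None:
            unmatched.append(gene)
            continue
        for name, value in gene.xrefs.items():
            target.xrefs.setdefault(name, value)
    return unmatched


# Illustrative data only; the identifiers below are made up.
reference = [
    Gene("chr", 100, 400, "+", {"UniProtKB": "P00001"}),
    Gene("chr", 500, 800, "-"),
]
ncbi = [
    Gene("chr", 130, 400, "+", {"GeneID": "111"}),   # different start, same 3'-end
    Gene("chr", 500, 830, "-", {"GeneID": "222"}),   # reverse strand, same 3'-end
    Gene("chr", 900, 1000, "+", {"GeneID": "333"}),  # no reference partner
]
leftover = merge_xrefs(reference, ncbi)
```

In this sketch the first two NCBI records attach to their reference genes despite differing start coordinates, while the third is returned as unmatched. A real pipeline would also need to handle several reference genes sharing one 3′-end, which the dictionary index here does not address.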

Pipeline specifications

Software tools: GeneID, Glimmer, GeneMark, EasyGene
Databases: UniProtKB
Application: Genome annotation