Genome annotation is available from many sources, including publications describing the sequencing and annotation of whole genomes or individual chromosomes, and whole-genome annotation computed by multiple bioinformatics groups. Ensembl and the National Center for Biotechnology Information (NCBI) independently developed computational processes to annotate vertebrate genomes (Kitts 2002; Potter et al. 2004). Both pipelines predict genes, transcripts, and proteins by interpreting the output of gene prediction programs, transcript alignments, and protein alignments. In addition, manual annotation is provided by the Havana group at the Wellcome Trust Sanger Institute (WTSI) and the Reference Sequence (RefSeq) group at NCBI.
Provides data mining resources, an enrichment analysis tool and web service APIs. H-InvDB is a comprehensive annotation resource for human genes and transcripts. This database includes gene structures, alternative splicing variants, non-coding functional RNAs, protein functions, functional domains, sub-cellular localizations, metabolic pathways, protein 3D structures, genetic polymorphisms (SNPs, indels and microsatellite repeats), relations with diseases, gene expression profiles, molecular evolutionary features, protein-protein interactions (PPIs) and gene families/groups.
Gathers genome-wide information about macaque genes. RhesusBase includes more than 170 million annotation records from about 1,760 next-generation sequencing (NGS) data sets. Searches can be made by ID, location or sequence. The database also includes two tools: Molecular Evolution Gateway, which contains multiple NGS-oriented genomic interfaces to enhance data visualization and analysis and to facilitate comparative studies, and PopGateway, which enables in-depth visualization of the database.
Tracks identical protein annotations on the reference mouse and human genomes with a stable identifier. CCDS is a resource that supports consistent, comprehensive annotation of the protein-coding content of the human and mouse genomes. It is built by consensus; each member of the collaboration contributes annotation, quality assessments, and curation. These data sets can be accessed from several public resources.
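The idea of tracking annotations that are identical across contributing groups can be illustrated with a minimal Python sketch. It is not the CCDS pipeline: the function name, the record layout (chromosome, strand, exon coordinate pairs) and the toy records are all hypothetical, and exact equality of coding-region structure is only an assumed stand-in for the collaboration's own matching and quality criteria.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (chromosome, strand, ((start, end), ...)).
CDS = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def consensus_cds(set_a: Dict[str, CDS],
                  set_b: Dict[str, CDS]) -> List[Tuple[str, str, CDS]]:
    """Pair annotations from two sources whose coding regions match exactly."""
    # Index the second set by coding-region structure for direct lookup.
    by_structure: Dict[CDS, List[str]] = {}
    for name, cds in set_b.items():
        by_structure.setdefault(cds, []).append(name)

    matches = []
    for name_a, cds in sorted(set_a.items()):
        for name_b in sorted(by_structure.get(cds, [])):
            matches.append((name_a, name_b, cds))
    return matches


# Toy, made-up records from two hypothetical annotation groups.
group_a = {
    "A_tx1": ("chr1", "+", ((100, 200), (300, 400))),
    "A_tx2": ("chr2", "-", ((50, 150),)),
}
group_b = {
    "B_tx9": ("chr1", "+", ((100, 200), (300, 400))),
    "B_tx7": ("chr2", "-", ((50, 160),)),  # differs by one exon boundary
}

shared = consensus_cds(group_a, group_b)
for name_a, name_b, cds in shared:
    print(name_a, name_b, cds)
```

In this toy input only the first pair agrees on every exon boundary, so only that pair would be a candidate for a shared identifier; the second pair differs at a single coordinate and is left out.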
Provides public archival, retrieval and analytical services for biological information. DDBJ furnishes an analytical environment for domestic researchers to examine large-scale biology data. It offers access to a large collection of databases covering the archiving of sequences with functional annotation and molecular abundance. This platform allows data integration and sharing in collaboration with the Database Center for Life Science (DBCLS) in Japan.
Provides publicly available nucleotide sequences for formally described species. GenBank is a comprehensive public database of nucleotide sequences. It also supports bibliographic and biological annotations. The sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun and environmental sampling projects.
Provides an integrated biomedical knowledgebase of human biological data. The GeneCards Suite amalgamates information from more than 150 selected sources related to genes, proteins, pathways, variants and diseases, and the connections among them. By highlighting associations between genes and phenotypes, the knowledgebase empowers the suite’s NGS analysis tools VarElect and TGex. GeneHancer is a database of regulatory elements (enhancers and promoters) and their target genes, facilitating phenotype interpretation of non-coding variants in WGS analysis in VarElect and TGex.