Computational protocol: The repertoire of G-protein-coupled receptors in Xenopus tropicalis

Protocol publication

[…] The complete proteome of X. tropicalis was obtained from the JGI Xenopus tropicalis v4.1 database. Protein sequences shorter than 250 amino acids were removed because they are too short to hold seven transmembrane (TM) regions. TM regions were predicted using the HMMTOP, TMHMM and SOSUI programs with default settings. For each program, sequences with six to eight predicted TM domains were retrieved into Temporary file 1.
In this temporary file, some GPCR sequences were still likely to be missed, because for a few GPCRs the programs may predict fewer than six or more than eight TM domains (when we tested on a dataset of 1426 human GPCRs obtained from GPCRDB, 99% of the receptors had between six and eight predicted TM domains). To detect these potentially missing GPCRs, additional BLASTP searches were conducted. GPCR amino acid sequences were downloaded from GPCRDB and included the A-F GPCR classes, the putative families, the non-classified and the non-GPCR families. Using the sequences from GPCRDB as queries, we performed a BLASTP search against the complete proteome of X. tropicalis with an E-value below 1e-20. For each query sequence, the top 20 hits longer than 250 amino acids were extracted into Temporary file 2. The two temporary files were merged into one database.
CD-HIT was run at 90% sequence identity to remove polymorphisms, splice variants, pseudogenes and duplicates from the database. CDD v2.14 (E-value = 1e-4) and Pfam 22.0 (E-value = 0.01) were used for a GPCR conserved domain search. The remaining sequences in the database were searched with BLASTP against the NCBI non-redundant database. The GPCRs were named according to the BLASTP best hit if at least four of the five best hits were in the same family. Throughout this article the GPCR families are written in italics. [...] CDD v2.14 (E-value = 1e-4) and Pfam 22.0 (E-value = 0.01) were used to identify the borders of the transmembrane region and remove the N- and C-termini.
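
The candidate-selection and naming rules described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the dictionary layout of the predictor output and the family labels are hypothetical. It also assumes that a sequence is kept when any one of the three predictors reports six to eight TM domains, and that fewer than five available hits means no name is assigned; the text does not state either point explicitly.

```python
from collections import Counter

MIN_LENGTH = 250        # from the text: shorter sequences are removed
TM_RANGE = range(6, 9)  # six to eight predicted TM domains


def is_candidate(seq_length, tm_counts):
    """Return True if a sequence passes the length and TM-count filters.

    tm_counts maps a predictor name to its predicted number of TM domains.
    Assumption: one predictor reporting 6-8 TM domains is enough.
    """
    if seq_length < MIN_LENGTH:
        return False
    return any(n in TM_RANGE for n in tm_counts.values())


def name_from_best_hits(hit_families, top_n=5, min_agree=4):
    """Return a family name if at least min_agree of the top_n hits agree.

    hit_families lists the family of each BLASTP hit, best hit first.
    Assumption: with fewer than top_n hits, no name is assigned (None).
    """
    top = hit_families[:top_n]
    if len(top) < top_n:
        return None
    family, count = Counter(top).most_common(1)[0]
    return family if count >= min_agree else None


# Usage with made-up values
print(is_candidate(312, {"HMMTOP": 7, "TMHMM": 5, "SOSUI": 9}))
print(name_from_best_hits(["Rhodopsin"] * 4 + ["Secretin"]))
```

Under this rule a name is returned only when a single family dominates the five best hits, so sequences with mixed hits stay unnamed.
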
The GPCR sequences were aligned using ClustalX 2.0 with default alignment parameters. Neighbour-joining trees were constructed with MEGA4 using the Poisson correction model with 1000 bootstrap replicates. Alignment sites containing gaps were excluded from the phylogenetic reconstruction (the complete-deletion option). In the phylogenetic tree, each X. tropicalis receptor gene is labelled with its sequence number as a suffix (or with the sequence number alone). When referring to other species, a one-letter symbol was used as a prefix (e.g., hGLP1R, zV1R1, etc.). [...] The tissue expression distribution of X. tropicalis GPCRs was investigated based on expressed sequence tag (EST) data. The X. tropicalis GPCR sequences were used as queries in a TBLASTN search against the Xenbase 2.3 database with an E-value cutoff of 1e-15. […]
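
As a rough illustration of the distance measure behind the neighbour-joining trees, the sketch below applies complete deletion to a toy alignment and then computes the Poisson-corrected distance d = -ln(1 - p), where p is the proportion of differing sites. This is not MEGA4's implementation. The function names and the example sequences are hypothetical, and only "-" is treated as a gap character.

```python
import math


def complete_deletion(aligned):
    """Drop every alignment column in which any sequence has a gap ('-')."""
    keep = [i for i in range(len(aligned[0]))
            if all(seq[i] != "-" for seq in aligned)]
    return ["".join(seq[i] for i in keep) for seq in aligned]


def poisson_distance(a, b):
    """Poisson-corrected distance between two gap-free aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)


# Toy alignment (made-up sequences); columns 4 and 6 contain a gap
alignment = ["MKT-LV", "MRTALV", "MKTAL-"]
trimmed = complete_deletion(alignment)
print(trimmed)
print(poisson_distance(trimmed[0], trimmed[1]))
```

Complete deletion removes a column for all sequences at once, so every pairwise distance is computed over the same set of sites.
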

Pipeline specifications

Software tools: Clustal W, MEGA, TBLASTN
Databases: Xenbase, Pfam
Applications: Phylogenetics, Amino acid sequence alignment
Organisms: Xenopus tropicalis, Homo sapiens
Chemicals: Rhodopsin, Secretin