Computational protocol: Genome-wide computational identification of WG/GW Argonaute-binding proteins in Arabidopsis

Protocol publication

[…] The initial sequence dataset contained a manually selected collection of 26 proteins with WG/GW motifs from various plants: NRPE1 sequences from Arabidopsis (GenBank accession NP_181532), grape (XP_002265533), nightshades (AAY89359.1), spinach (AAX12374), tomato (AAY89359), rice (EEE56320, misannotated as a PolII subunit), Physcomitrella patens (XP_001766256), poplar (XP_002303926) and corn (identified by TBLASTN on the genomic sequence); Arabidopsis SPT5-like (NP_196049) and GTB1 (NP_176723); and their orthologs in other plant species. The sequences were identified in public databases using a PSI-BLAST-based approach and pairwise reciprocal best-hit analyses.

The scoring matrix was calculated by compositional analysis of this sequence dataset and subsequently used to detect domain boundaries in novel proteins. The scoring table contains one value per amino acid, reflecting the compositional difference between the domain and the whole protein. For each residue present in the manually identified domains, the value was calculated as D_i = 2 × log2[(N_id/N_d)/(N_ip/N_p)], where i is an amino acid present in the domain sequence; N_ip is the number of occurrences of amino acid i in the whole protein; N_p is the number of amino-acid residues in the protein; N_id is the number of occurrences of amino acid i in the domain; and N_d is the number of amino-acid residues in the domain. The values, expressed in half-bits, were rounded to three decimal places. If an amino acid was not present in the domain, its value in the table was set to zero, which ensured there was no effect on domain extension and dos scores. […]
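The scoring formula can be illustrated with a minimal Python sketch. It handles a single protein/domain pair; how counts were pooled across the 26 proteins is not stated in this excerpt, so that step is left out. The function names `composition_scores` and `segment_score`, the toy sequences, and the assumption that a segment score is the plain sum of per-residue values are all hypothetical and are not taken from the protocol.

```python
import math
from collections import Counter


def composition_scores(protein: str, domain: str) -> dict:
    """Per-residue scores D_i = 2 * log2[(N_id / N_d) / (N_ip / N_p)] in half-bits.

    Residues absent from the domain get 0.0, as described in the text.
    """
    if not domain or domain not in protein:
        raise ValueError("domain must be a non-empty substring of protein")
    n_p, n_d = len(protein), len(domain)
    protein_counts = Counter(protein)   # N_ip for every residue i
    domain_counts = Counter(domain)     # N_id for every residue i
    scores = {}
    for residue, n_ip in protein_counts.items():
        n_id = domain_counts.get(residue, 0)
        if n_id == 0:
            scores[residue] = 0.0       # neutral value: residue not in domain
        else:
            ratio = (n_id / n_d) / (n_ip / n_p)
            scores[residue] = round(2 * math.log2(ratio), 3)
    return scores


def segment_score(segment: str, scores: dict) -> float:
    """Assumed additive score: sum of per-residue values (illustration only)."""
    return sum(scores.get(residue, 0.0) for residue in segment)


# Toy example (invented sequences, not real proteins).
protein = "AASSSSSSWGSWGWGS"   # 16 residues
domain = "WGSWGWGS"            # last 8 residues, enriched in W and G
scores = composition_scores(protein, domain)
print(scores)                        # {'A': 0.0, 'S': -2.0, 'W': 2.0, 'G': 2.0}
print(segment_score("AWG", scores))  # 4.0
```

In this toy case W and G are twice as frequent in the domain as in the whole protein (+2 half-bits), S is half as frequent (-2 half-bits), and A, which is absent from the domain, contributes zero to any summed score.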

Pipeline specifications

Software tools: TBLASTN, BLASTP
Application: Amino acid sequence alignment
Organisms: Arabidopsis thaliana, Homo sapiens
Diseases: Granulomatosis with Polyangiitis