Computational protocol: A Genome Wide Association Study of arabinoxylan content in 2-row spring barley grain

Protocol publication

[…] Different tools were employed to find annotations for unknown genes within the associated intervals. For genes under the associations that had NCBI accession numbers, the nucleotide sequences were downloaded from the NCBI database and BLASTed against the barley genome MLOC loci. The annotation of these MLOCs was established by a combination of Pfam analysis and orthology to other well-annotated cereal genomes: Brachypodium distachyon, Sorghum bicolor and rice (Oryza sativa). MLOC numbers were used to search the morexGenes barley RNA-seq database to identify potential Arabidopsis and/or rice orthologs and to download the transcript profiles of the candidate genes across eight developmental stages. Other tools used included the Pfam domain search. The CAZy database was used as a reference for the potential glycosyltransferases (GT) and glycoside hydrolases (GH). […]

Amino acid sequences of barley, rice, sorghum and Arabidopsis glycosyltransferases were obtained from the Ensembl Plants database using a Pfam domain search. For GT43, the conserved PF03360 domain was used [], and the protein sequence of TaGT43-4 described in Zeng et al. (ADK56174) [] was included in the phylogenetic analysis along with Arabidopsis IRX14 (AT5G67230), IRX14-L (AT4G36890), IRX9 (AT2G37090) and IRX9-L (AT1G27600). The MLOC_72459 sequence from the […] was used in a blastn search against the barley nucleotide sequences available in the NCBI database to obtain a full-length gene sequence. The amino acid sequence of this gene was aligned with other sequences from rice, sorghum, Arabidopsis, barley and wheat. All sequences were aligned with the MUSCLE alignment tool in the Geneious software package, version 8.1.3 [], and gaps were deleted from the alignment. A phylogenetic tree was then built from the alignment with the RAxML [] tool available in the same software package, using the GAMMA GTR protein model and 1000 bootstrap replicates.
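The mapping of NCBI nucleotide sequences onto MLOC loci can be sketched as a best-hit selection over BLAST+ tabular output (`-outfmt 6`, whose twelve default columns are listed in the code). This is a minimal illustration, not the authors' procedure: the function name, the assumption that the top-bitscore hit defines the MLOC assignment, and the e-value cutoff of 1e-10 are hypothetical choices that the protocol does not state.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text, max_evalue=1e-10):
    """Keep the highest-bitscore subject per query that passes an e-value cutoff.

    Returns {query_id: (subject_id, evalue, bitscore)}. The 1e-10 cutoff is an
    illustrative assumption, not a value reported in the protocol.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        rec = dict(zip(OUTFMT6_COLUMNS, row))
        evalue = float(rec["evalue"])
        bitscore = float(rec["bitscore"])
        if evalue > max_evalue:
            continue  # hit too weak under the assumed cutoff
        query = rec["qseqid"]
        if query not in best or bitscore > best[query][2]:
            best[query] = (rec["sseqid"], evalue, bitscore)
    return best
```

Ranking by bitscore rather than e-value avoids ties among the many hits that BLAST reports with an e-value of exactly 0.0.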
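The statement that gaps were deleted from the MUSCLE alignment admits more than one reading. A common one is that every alignment column containing a gap in any sequence is removed before tree building. The sketch below assumes that reading and an in-memory alignment stored as a dict mapping sequence IDs to equal-length aligned strings; the function name and the toy sequences are hypothetical.

```python
def strip_gap_columns(alignment, gap_chars="-"):
    """Remove every column in which at least one sequence has a gap character.

    `alignment` maps sequence IDs to aligned sequences of equal length.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    # Column indices where no sequence carries a gap.
    keep = [i for i in range(width) if all(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


if __name__ == "__main__":
    toy = {"seq1": "MK-LV", "seq2": "MKQ-V", "seq3": "M-QLV"}  # toy input, not study data
    print(strip_gap_columns(toy))  # {'seq1': 'MV', 'seq2': 'MV', 'seq3': 'MV'}
```

If only columns made up entirely of gaps were meant, the `all(...)` condition would become `any(s[i] not in gap_chars for s in seqs)`.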
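Outside Geneious, a roughly equivalent standalone RAxML run can be assembled as a command line. The flags below are standard `raxmlHPC` options (`-f a` for rapid bootstrapping plus a best-scoring ML search, `-m` for the model, `-#` for the number of replicates, `-p`/`-x` for random seeds, `-s` for the alignment, `-n` for the run name). The seed value, executable name and file names are assumptions, since the protocol ran RAxML through the Geneious plugin and does not report them. `PROTGAMMAGTR` is taken here to be RAxML's name for the GAMMA GTR protein model.

```python
import shlex


def raxml_command(alignment_path, run_name, model="PROTGAMMAGTR",
                  bootstraps=1000, seed=12345, executable="raxmlHPC"):
    """Build an argument list for a rapid-bootstrap plus ML-search RAxML run.

    The seed and executable defaults are illustrative assumptions.
    """
    return [
        executable,
        "-f", "a",               # rapid bootstrap analysis and best-scoring ML tree in one run
        "-m", model,             # GAMMA rate heterogeneity with the GTR protein matrix
        "-p", str(seed),         # seed for the parsimony starting trees
        "-x", str(seed),         # seed that switches on rapid bootstrapping
        "-#", str(bootstraps),   # number of bootstrap replicates
        "-s", alignment_path,    # input alignment, e.g. PHYLIP
        "-n", run_name,          # suffix appended to the output file names
    ]


if __name__ == "__main__":
    # Hypothetical file and run names; the command is printed, not executed.
    cmd = raxml_command("gt43_trimmed.phy", "GT43_tree")
    print(shlex.join(cmd))
```

On a machine with RAxML installed, passing the list to `subprocess.run(cmd, check=True)` would execute it.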
For DUF579, the Pfam domain PF04669 was used to search for members of this family in the selected species. […]
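Both Pfam-based searches in this protocol (PF03360 for GT43 and PF04669 for DUF579) amount to filtering a protein table by domain accession. The sketch below assumes a tab-separated export, for example from Ensembl Plants BioMart, with one row per protein-domain pair and columns named `Protein stable ID` and `Pfam ID`. Those column headers and the function name are assumptions; adjust them to match the actual export.

```python
import csv
import io
from collections import defaultdict

# Pfam accessions used in the protocol, mapped to the family labels used in the text.
TARGET_DOMAINS = {"PF03360": "GT43", "PF04669": "DUF579"}


def proteins_by_domain(tsv_text, targets=TARGET_DOMAINS):
    """Group protein IDs by target family, assuming one protein-domain pair per row."""
    found = defaultdict(set)  # a set removes duplicate rows for the same protein
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        pfam = (row.get("Pfam ID") or "").strip()
        if pfam in targets:
            found[targets[pfam]].add(row["Protein stable ID"])
    return {family: sorted(ids) for family, ids in found.items()}
```

Proteins carrying no target domain, or no domain annotation at all, are simply skipped. The resulting ID lists could then be used to fetch the sequences for alignment.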

Pipeline specifications

Software tools: BioMart, BLASTN, MUSCLE, Geneious, RAxML
Databases: CAZy, Pfam
Applications: Genome annotation, Phylogenetics, RNA-seq analysis, Nucleotide sequence alignment
Organisms: Hordeum vulgare, Triticum aestivum
Chemicals: Cholesterol, Glucose