Computational protocol: Genome-wide analysis of MATE transporters and expression patterns of a subgroup of MATE genes in response to aluminum toxicity in soybean

Protocol publication

[…] The full protein sequences of 117 soybean MATE transporters (Additional file: Table S1 and Additional file) and 35 previously reported MATE proteins from other plant species (Additional file: Table S4) were aligned with ClustalW in MEGA 6.0. An unrooted phylogenetic tree was then constructed in MEGA 6.0 using the Maximum Likelihood (ML) method with 1000 bootstrap replicates, the Equal Input amino acid substitution model with uniform rates among sites, and partial deletion (95 % site coverage cutoff) for gaps and missing data. Gene structure analysis was performed using the Gene Structure Display Server (GSDS) with default settings. Motifs in MATE proteins were identified statistically using the online tool Multiple EM for Motif Elicitation (MEME) with default settings: motif width between 6 and 50 (inclusive) and a site distribution of zero or one occurrence (of a contributing motif site) per sequence. The maximum number of motifs was set to 12. [...] The chromosomal locations of GmMATE genes were illustrated with MapChart. Segmental and tandem duplication events in the soybean MATE family were identified using the Multiple Collinearity Scan toolkit (MCScan) from the Plant Genome Duplication Database with default settings: BLASTP was used to search for potential anchors (E < 1e-5, top 5 matches) between every possible homologous pair, and these pairs were used as the input for MCScan. Syntenic blocks were identified using an E-value ≤ 1e-10 as the significance cutoff. Tandem duplicates were defined as homologous genes on a single chromosome with fewer than ten intervening gene loci and >50 % similarity at the protein level. [...] All supporting datasets of this article are included as additional files and were deposited in LabArchives (doi: 10.6070/H47M05ZF). Phylogenetic datasets have been deposited in TreeBASE and are accessible via the URL: […]
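The two duplication criteria quoted above (BLASTP anchors with E < 1e-5 and at most the top 5 matches per query; tandem duplicates with fewer than ten intervening gene loci and >50 % protein similarity on one chromosome) can be illustrated with a minimal Python sketch. This is not the MCScan implementation or the authors' code. The function names, the input layouts, the gene identifiers, and the choice to discard self-hits are all assumptions made for illustration.

```python
from collections import defaultdict

MAX_EVALUE = 1e-5        # anchor cutoff quoted in the protocol (E < 1e-5)
TOP_N = 5                # "top 5 matches" per query
MAX_INTERVENING = 10     # tandem: fewer than ten gene loci in between
MIN_SIMILARITY = 50.0    # tandem: >50 % similarity at the protein level


def filter_anchors(hits, max_evalue=MAX_EVALUE, top_n=TOP_N):
    """Keep the best `top_n` hits per query with E-value below `max_evalue`.

    `hits` is an iterable of (query, subject, evalue) tuples (hypothetical
    layout). Self-hits are discarded, which is an assumption of this sketch.
    """
    by_query = defaultdict(list)
    for query, subject, evalue in hits:
        if query != subject and evalue < max_evalue:
            by_query[query].append((evalue, subject))
    # Sort by E-value (smallest first) and truncate to the top_n subjects.
    return {q: [s for _, s in sorted(v)[:top_n]] for q, v in by_query.items()}


def tandem_pairs(positions, similarity):
    """Return gene pairs that satisfy the tandem-duplication definition.

    `positions` maps gene -> (chromosome, rank), where rank is the index of
    the locus along the chromosome counting every annotated gene.
    `similarity` maps (gene_a, gene_b) -> percent protein similarity.
    Both layouts are hypothetical.
    """
    result = []
    for (a, b), percent in similarity.items():
        if a == b or a not in positions or b not in positions:
            continue
        chrom_a, rank_a = positions[a]
        chrom_b, rank_b = positions[b]
        intervening = abs(rank_a - rank_b) - 1  # loci strictly between a and b
        if (chrom_a == chrom_b
                and intervening < MAX_INTERVENING
                and percent > MIN_SIMILARITY):
            result.append(tuple(sorted((a, b))))
    return sorted(result)


if __name__ == "__main__":
    # Made-up gene identifiers and values, for illustration only.
    positions = {"geneA": ("Chr01", 10), "geneB": ("Chr01", 12),
                 "geneC": ("Chr01", 40), "geneD": ("Chr02", 11)}
    similarity = {("geneA", "geneB"): 82.0, ("geneA", "geneC"): 90.0,
                  ("geneA", "geneD"): 95.0, ("geneB", "geneC"): 40.0}
    print(tandem_pairs(positions, similarity))
```

Counting intervening loci by rank rather than by base-pair distance follows the wording of the definition, which is stated in gene loci; a real analysis would derive the ranks from the genome annotation.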

Pipeline specifications

Software tools: Clustal W, MEGA, GSDS, MEME, MEME Suite, JoinMap, MCScan, BLASTP, LabArchives, PhyloWS
Databases: TreeBASE
Applications: Miscellaneous, Genome annotation, Phylogenetics
Organisms: Glycine max, Ilex paraguariensis
Diseases: Drug-Related Side Effects and Adverse Reactions
Chemicals: Aluminum, Flavonoids, Iron