Computational protocol: Evolutionary relationships and expression analysis of EUL domain proteins in rice (Oryza sativa)

Protocol publication

[…] Protein sequences of the reference members of the different lectin families [Agaricus bisporus agglutinin (ABA), Q00022.3; Amaranthus caudatus agglutinin (amaranthin), AAL05954.1; Robinia pseudoacacia chitinase-related agglutinin (CRA), ABL98074.1; Nostoc ellipsosporum agglutinin (cyanovirin), P81180.2; Euonymus europaeus agglutinin (EUL), ABW73993.1; Galanthus nivalis agglutinin (GNA), P30617.1; Hevea brasiliensis agglutinin (hevein), ABW34946.1; Artocarpus integer agglutinin (JRL), AAA32680.1; Glycine max agglutinin (legume lectin), P05046.1; Brassica juncea LysM domain (LysM), BAN83772.1; Nicotiana tabacum agglutinin (nictaba), AAK84134.1; Ricinus communis agglutinin lectin chain (ricin-B), 2AAI_B] were used to perform BLAST searches against the Oryza sativa subsp. japonica genome (RGAP release 7) available from NCBI (https://blast.ncbi.nlm.nih.gov), MSU (Kawahara et al.) and Phytozome (https://phytozome.jgi.doe.gov), as described previously by Van Holle and Van Damme. Top hits were used for a consecutive BLAST search. In addition, the MSU database (Kawahara et al.) was searched using the Pfam domain identifiers of the different lectin domains [ABA: PF07367 (fungal fruit body lectin); amaranthin: PF07468 (agglutinin domain); CRA: PF00704 (glyco-hydro 18); cyanovirin: PF08881 (CVNH); EUL: PF14200 (ricin-lectin 2); GNA: PF01453 (B-lectin); hevein: PF00187 (chitin bind 1); JRL: PF01419 (jacalin); legume lectin: PF00139 (lectin legB); LysM: PF01476 (LysM domain); nictaba: PF14299 (PP2); ricin-B: PF00652 (ricin-B lectin)]. Protein sequences were downloaded from MSU (Kawahara et al.) and screened for the presence of conserved protein domains using InterProScan 5 (Mitchell et al.). The program was downloaded (https://www.ebi.ac.uk/interpro/download/) and installed locally.
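The domain screen can be illustrated as a filter over InterProScan output. The following is a minimal sketch, not the authors' script: it assumes the tab-separated output format of InterProScan 5 (protein accession in column 1, analysis name in column 4, signature accession in column 5), and the function name and protein identifiers are hypothetical. The Pfam accessions are the ones listed in the protocol.

```python
import csv
import io

# Pfam accessions listed in the protocol, keyed to their lectin family.
LECTIN_PFAM = {
    "PF07367": "ABA", "PF07468": "amaranthin", "PF00704": "CRA",
    "PF08881": "cyanovirin", "PF14200": "EUL", "PF01453": "GNA",
    "PF00187": "hevein", "PF01419": "JRL", "PF00139": "legume lectin",
    "PF01476": "LysM", "PF14299": "nictaba", "PF00652": "ricin-B",
}


def lectin_hits(tsv_handle):
    """Return {protein ID: set of lectin families} from InterProScan TSV rows.

    Assumed column layout: 0 = protein accession, 3 = analysis,
    4 = signature accession. Proteins without a lectin Pfam match are absent.
    """
    hits = {}
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        protein, analysis, signature = row[0], row[3], row[4]
        if analysis == "Pfam" and signature in LECTIN_PFAM:
            hits.setdefault(protein, set()).add(LECTIN_PFAM[signature])
    return hits


if __name__ == "__main__":
    # Hypothetical two-protein example; only prot1 carries a lectin domain.
    demo = io.StringIO(
        "prot1\tmd5a\t300\tPfam\tPF14200\tdesc\t10\t150\t1e-20\tT\t01-01-2020\n"
        "prot2\tmd5b\t200\tPfam\tPF00069\tdesc\t5\t180\t1e-30\tT\t01-01-2020\n"
    )
    print(lectin_hits(demo))
```

Keeping only the proteins that appear in the returned dictionary corresponds to retaining sequences with at least one recognized lectin domain.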
Indica lectins were identified by BLAST searches with the lectin domains of the japonica hits against the indica rice genome (ASM465v1) available from EnsemblPlants (http://plants.ensembl.org). As for the japonica sequences, these protein sequences were analyzed for the presence of conserved protein domains using InterProScan 5 (Mitchell et al.). Only sequences with at least one lectin domain were retained. The protein sequences of the lectins were analyzed for the presence of signal peptides using SignalP 4.1 (Petersen et al.) and for the presence of transmembrane domains using TMHMM 2.0 (Krogh et al.).

[...] The putative lectin genes were mapped on the different chromosomes using the MapChart software (Voorrips). The transcription start sites of the putative lectin genes were retrieved from the MSU database (Kawahara et al.) and used for the construction of the map.

Gene expansion through segmental duplication or tandem duplication was analyzed for the japonica lectin sequences. Tandem duplications were assigned if (1) both genes belong to the same lectin family, (2) there are no more than ten intervening genes and (3) they reside on the same chromosome. Segmental duplications were identified using the Plant Genome Duplication Database (PGDD) (Lee et al.). Collinear blocks within the O. sativa subsp. japonica genome were determined by McScan v8 (http://chibba.agtec.uga.edu/duplication/index/files); the output data were downloaded and searched for the presence of lectin genes. Duplicated genes with a Ks (synonymous substitution) value higher than 1.0 were omitted.

[...] The genomic structure of the OsEUL genes was analyzed in GenomeView (Abeel et al.) using the japonica RGAP release 7 sequence. A visual representation of the genomic structure of the EUL genes was made in IBS 1.0.1 (http://ibs.biocuckoo.org/).
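The tandem-duplication criteria given above (same lectin family, same chromosome, no more than ten intervening genes) and the Ks cut-off for segmental duplicates can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Gene record, the function names and the idea of storing each gene's rank among all annotated genes on its chromosome are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chromosome: str
    index: int    # assumed: rank of the gene among all genes on its chromosome
    family: str   # lectin family, e.g. "EUL"


def tandem_pairs(lectins, max_intervening=10):
    """Return pairs of lectin genes that satisfy the three tandem criteria."""
    pairs = []
    for a, b in combinations(lectins, 2):
        if a.family != b.family or a.chromosome != b.chromosome:
            continue
        # Genes at ranks i and j have |i - j| - 1 genes between them.
        intervening = abs(a.index - b.index) - 1
        if intervening <= max_intervening:
            pairs.append((a.gene_id, b.gene_id))
    return pairs


def filter_by_ks(duplicates, max_ks=1.0):
    """Keep (gene_a, gene_b, ks) tuples whose Ks does not exceed max_ks."""
    return [d for d in duplicates if d[2] <= max_ks]


if __name__ == "__main__":
    # Hypothetical genes: g1/g2 have exactly ten intervening genes.
    genes = [
        Gene("g1", "Chr1", 5, "EUL"),
        Gene("g2", "Chr1", 16, "EUL"),
        Gene("g3", "Chr1", 30, "EUL"),
        Gene("g4", "Chr2", 6, "EUL"),
    ]
    print(tandem_pairs(genes))
    print(filter_by_ks([("g1", "g4", 0.8), ("g2", "g3", 1.4)]))
```

Comparing all pairs is quadratic in the number of lectin genes, which is acceptable for a gene family of a few hundred members.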
Sequences of the intron boundaries were analyzed with WebLogo (http://weblogo.berkeley.edu/logo.cgi).

Single nucleotide polymorphisms (SNPs) in the EUL genes between different rice subspecies were identified using the SNP-Seek database containing SNP genotyping data (called against the Nipponbare reference Os-Nipponbare-Reference-IRGSP-1.0) from the 3000 Rice Genomes Project (Alexandrov et al.). The frequency of the Nipponbare reference allele in the different rice groups (in %) was represented in a heat map using the BAR HeatMapper Plus Tool (http://bar.utoronto.ca/ntools/cgi-bin/ntools_heatmapper_plus.cgi). Based on these data, a dendrogram was constructed using DendroUPGMA (http://genomes.urv.cat/UPGMA/). For the generation of the dendrogram, the RMSD (Root Mean Square Deviation) distance coefficient was used to compare sets of variables.

[...] Cis-regulatory elements were obtained from AGRIS (Palaniswamy et al.), PLACE (Higo et al.), AthaMap (Steffens et al.), CisBP (Weirauch et al.), JASPAR (Mathelier et al.) and TRANSFAC (Matys et al.). These elements were mapped to the 2 kb upstream promoter regions of all genes for Oryza sativa MSU RGAP 7 (Kawahara et al.) using Cluster-Buster with the -c option set to 0 (Frith et al.).

To reduce the high false positive rates associated with inferring regulatory interactions from simple motif mapping, two approaches were used. The first approach consisted of filtering motif matches using cross-species sequence conservation or open chromatin regions. To this end, motif matches were filtered using all 4 sets of conserved non-coding sequences from De Witte et al. (http://bioinformatics.intec.ugent.be/blsspeller/) and DNaseI-hypersensitive sites downloaded from Zhang et al.
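Restricting motif matches to functional regions amounts to an interval-overlap test. The sketch below is an assumption-laden illustration rather than the pipeline itself: it takes half-open (start, end) coordinates that are already on the same chromosome, measures the overlap as a fraction of the motif length (the way the -f option of bedtools intersect is defined relative to the query feature), and leaves the threshold as the parameter min_frac. Function names and coordinates are hypothetical.

```python
def overlap_fraction(motif, region):
    """Fraction of the motif interval covered by a single region.

    Both arguments are half-open (start, end) tuples on the same chromosome.
    """
    m_start, m_end = motif
    r_start, r_end = region
    covered = max(0, min(m_end, r_end) - max(m_start, r_start))
    return covered / (m_end - m_start)


def filter_motifs(motifs, regions, min_frac):
    """Keep motifs covered to at least min_frac by any one functional region."""
    return [
        m for m in motifs
        if any(overlap_fraction(m, r) >= min_frac for r in regions)
    ]


if __name__ == "__main__":
    # Hypothetical coordinates: one conserved block and one open-chromatin site.
    regions = [(105, 200), (500, 520)]
    motifs = [(100, 110), (96, 106), (510, 518), (300, 310)]
    print(filter_motifs(motifs, regions, min_frac=0.5))
```

A linear scan over all regions is enough for illustration; on genome-scale data a sorted sweep or an interval tree, as used by dedicated tools, would be the practical choice.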
Only the motif matches that overlap by at least 50% with the above functional regions (open chromatin or conserved non-coding sequences) were retained (using intersectBed -f 0.5 from BEDTools; Quinlan and Hall).

For the second approach, motif enrichment analysis was used. Motif mapping information was combined with a set of up to 200 co-expressed genes (ranked according to their Pearson correlation coefficient) from the TENOR database (Kawahara et al.). With the help of the RAP-DB ID converter tool (http://rapdb.dna.affrc.go.jp/tools/converter), the RAP-DB identifiers were converted to MSU locus numbers and used for the subsequent motif enrichment analysis. During the conversion, some of the annotated RAP-DB genes were not identified in the MSU database, which reduced the number of co-expressed genes to approximately 150 for each OsEUL gene. Motif enrichment in these co-expressed genes was determined using the hypergeometric test with false discovery rate correction. Only significantly (q-value < 0.05) enriched motifs also present in the OsEUL promoters were retained as enriched motifs. The GO enrichment analysis on the different OsEUL regulons was performed using the GO enrichment tool in the PLAZA workbench of PLAZA 3.0 Monocots (Proost et al.).

[...] Sequences corresponding to the different EUL domains of the OsEULs were extracted and aligned with MUSCLE using the default settings (Edgar). Based on this alignment, a maximum likelihood phylogenetic tree was built using RAxML v8.2.4. RAxML used the GTRGAMMA model with automated determination of the best amino acid substitution model (i.e. the model with the highest likelihood score on the starting tree), random number seed and distinct starting trees. The number of bootstrap iterations used to assess the robustness of the generated trees was determined automatically by RAxML.
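The motif enrichment step described above (a hypergeometric test followed by false discovery rate correction) can be sketched with the standard library alone. The protocol does not name the FDR procedure, so the Benjamini-Hochberg method used here is an assumption, as are the function names and the example counts.

```python
from math import comb


def enrichment_pvalue(hits_in_set, set_size, hits_in_genome, genome_size):
    """Upper-tail hypergeometric probability P(X >= hits_in_set).

    X is the number of motif-containing genes among set_size genes drawn
    without replacement from genome_size genes, of which hits_in_genome
    carry the motif.
    """
    total = comb(genome_size, set_size)
    upper = min(set_size, hits_in_genome)
    tail = sum(
        comb(hits_in_genome, i) * comb(genome_size - hits_in_genome, set_size - i)
        for i in range(hits_in_set, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the order of the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    # Hypothetical counts: (motif genes in set, set size, motif genes in genome, genome size).
    tests = [(30, 150, 2000, 40000), (8, 150, 2000, 40000), (12, 150, 3000, 40000)]
    pvals = [enrichment_pvalue(*t) for t in tests]
    for p, q in zip(pvals, benjamini_hochberg(pvals)):
        print(f"p = {p:.3g}, q = {q:.3g}, enriched = {q < 0.05}")
```

Exact integer binomial coefficients avoid the underflow that a naive floating-point product would hit at genome scale, at the cost of some speed.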
The phylogenetic tree was displayed with FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree).

To extend the phylogenetic analysis to other species, putative EUL sequences were identified by BLAST searches with the lectin domains of the japonica hits against the genomes of 8 monocot and 7 dicot species available from EnsemblPlants (http://plants.ensembl.org). The monocots included are Aegilops tauschii (ASM34733v1), Brachypodium distachyon (v1.0), Hordeum vulgare (barley) (ASM32608v1), Leersia perrieri (Lperr_V1.4), Musa acuminata (banana) (MA1), Setaria italica (JGIv2.0), Sorghum bicolor (Sorbi1) and Zea mays (corn) (AGPv4). The dicots included are Arabidopsis thaliana (TAIR10), Glycine max (soybean) (V1.0), Medicago truncatula (MedtrA17_4.0), Populus trichocarpa (poplar) (JGI2.0), Prunus persica (peach) (Prupe1_0), Theobroma cacao (cacao) (Theobroma_cacao_20110822) and Vitis vinifera (grape) (IGGP_12x). The putative EUL lectin sequence from Cucumis sativus (cucumber) was provided by Dang and Van Damme. These protein sequences were analyzed for the presence of conserved protein domains using InterProScan 5 (Mitchell et al.). Proteins with an annotated EUL domain were retained for further analysis. Sequences with a truncated EUL domain were omitted from the analysis.

The EUL domain of all putative EUL sequences from the monocot and dicot species under study was extracted and aligned with MUSCLE using the default settings (Edgar). A logo of the EUL domain was generated with WebLogo 3 (http://weblogo.threeplusone.com/create.cgi). From the aligned sequences, a maximum likelihood phylogenetic tree was built with RAxML v8.2.4 as described above. The FigTree v1.4.2 software was used to visualize and edit the phylogenetic tree. Reconciliation of the phylogenetic tree with the species tree was performed in Notung 2.9 (Stolzer et al.).
The species tree, containing all species from which EUL domains were sampled, was constructed with the NCBI Taxonomy Common Tree tool (https://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi). […]

Pipeline specifications

Software tools: InterProScan, SignalP, TMHMM
Databases: Pfam
Application: Membrane protein analysis
Organisms: Oryza sativa, Oryza sativa Japonica Group