Computational protocol: Genome-Wide Survey Reveals Transcriptional Differences Underlying the Contrasting Trichome Phenotypes of Two Sister Desert Poplars

[…] Candidate genes related to trichome formation in P. euphratica [], P. pruinosa () and P. trichocarpa [] were identified using the BLASTP program (E value < 10^-50) [] and HMMER software [] with similarity over 90%. Trichome-formation genes from Arabidopsis, downloaded from the Arabidopsis genome website (TAIR 9.0), were used as query sequences. Phylogenetic trees were constructed from the genes identified as being involved in trichome formation in poplars and Arabidopsis in order to determine the relationships between these genes. Candidate genes in P. trichocarpa, which has sparse trichome density, were used as a reference to explore gene expansion and loss in poplar. [...]

Candidate genes from the four species were aligned with MUSCLE [] and then adjusted manually before construction of phylogenetic trees. The best-fitting evolutionary models were predicted using ProtTest 3.0 []. Gene trees for the different transcription factors were estimated by a maximum likelihood (ML) approach in RAxML [], using the best-fitting models and 1000 replicates, and constructed using MEGA 6 []. Chromosomal distribution analysis was performed using the method described in Ma et al. []. [...]

Pairwise alignments of the nucleotide sequences of homologous genes were performed using the Probabilistic Alignment Kit (PRANK) software package []. The nonsynonymous substitution (dN or Ka) and synonymous substitution (dS or Ks) values for homologous genes were estimated with the YN00 program in Phylogenetic Analysis Using Maximum Likelihood (PAML) []. Synonymous substitution rates (Ks) of homologous genes are expected to be similar over time and can therefore serve as a proxy for time when estimating the dates at which segmental duplication events occurred.
The Ks value was calculated for each gene pair and used to estimate the approximate date of each duplication event as T = Ks/(2λ), assuming a clock-like rate of synonymous substitution (λ) of 9.1 × 10^-9 substitutions per synonymous site per year for Populus []. [...]

Raw reads were cleaned in two steps. First, exact duplicates obtained from both sequencing directions were removed. Second, adapter sequences were removed, together with reads in which unknown base calls (N) represented more than 5% of all bases, low-complexity reads, and reads with a high proportion of low-quality bases (>45% of bases with a quality score ≤7). To determine levels of gene expression, Bowtie 2 [] was used to align RNA-Seq reads to each poplar genome. Transcript abundances were calculated using eXpress [], which outputs read counts and the number of fragments per kilobase of exon per million fragments mapped (FPKM) []. Average FPKM values were calculated from three biological replicates. The FPKM values for genes related to trichome production were log2-transformed and used to generate heat maps with the pheatmap package in R. […]
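
As an illustration of the dating formula given above, T = Ks/(2λ), the following is a minimal Python sketch rather than the authors' own script. The rate constant is the value quoted in the text; the function names and the example Ks values are hypothetical and chosen only for demonstration.

```python
# Rate quoted in the text for Populus:
# synonymous substitutions per synonymous site per year.
LAMBDA_POPULUS = 9.1e-9


def duplication_time_years(ks, rate=LAMBDA_POPULUS):
    """Approximate age of a duplication event, T = Ks / (2 * lambda), in years."""
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate)


def duplication_time_mya(ks, rate=LAMBDA_POPULUS):
    """Same estimate, expressed in millions of years ago (Mya)."""
    return duplication_time_years(ks, rate) / 1e6


if __name__ == "__main__":
    # Hypothetical Ks values for two gene pairs (not taken from the study).
    example_ks = {"pair_A": 0.05, "pair_B": 0.25}
    for name, ks in example_ks.items():
        print(f"{name}: Ks = {ks:.2f} -> about {duplication_time_mya(ks):.2f} Mya")
```

The factor of 2 reflects that both copies accumulate synonymous substitutions independently after the duplication, so the observed Ks covers twice the elapsed time.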

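The two numeric read-cleaning thresholds described above (more than 5% unknown bases, more than 45% of bases with a quality score ≤7) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the pipeline actually used: it assumes FASTQ quality strings with a Phred offset of 33, which the text does not specify, and it does not cover duplicate, adapter, or low-complexity removal. The function name and parameters are hypothetical.

```python
def passes_quality_filter(seq, qual,
                          max_n_frac=0.05,      # reads with > 5% N are removed
                          low_q=7,              # a base is "low quality" if Q <= 7
                          max_low_q_frac=0.45,  # reads with > 45% low-quality bases are removed
                          phred_offset=33):     # ASSUMPTION: Phred+33 encoding
    """Return True if a read survives the N-content and low-quality filters."""
    if not seq or len(seq) != len(qual):
        return False  # empty or malformed record

    # Fraction of unknown base calls.
    n_frac = seq.upper().count("N") / len(seq)
    if n_frac > max_n_frac:
        return False

    # Fraction of bases whose decoded quality score is at or below the cutoff.
    low_quality = sum(1 for ch in qual if ord(ch) - phred_offset <= low_q)
    return low_quality / len(qual) <= max_low_q_frac


if __name__ == "__main__":
    # Hypothetical 20-base reads: "I" encodes Q40 and "#" encodes Q2 under Phred+33.
    good = ("ACGT" * 5, "I" * 20)
    noisy = ("ACGT" * 5, "#" * 10 + "I" * 10)
    print(passes_quality_filter(*good), passes_quality_filter(*noisy))
```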

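The expression summary described above (mean FPKM over three biological replicates, followed by a log2 transformation) can be sketched as below. This is a hedged Python illustration, not the authors' R workflow. The pseudocount added before taking the logarithm is an assumption made here to avoid log2(0); the text does not say whether or how zero values were handled. Gene identifiers and FPKM values are hypothetical.

```python
import math
from statistics import mean


def log2_mean_fpkm(replicate_fpkm, pseudocount=1.0):
    """Average FPKM across replicates per gene, then log2-transform.

    replicate_fpkm: dict mapping gene id -> list of FPKM values (one per replicate).
    pseudocount: ASSUMPTION, added before the log so that zero FPKM stays finite.
    """
    result = {}
    for gene, values in replicate_fpkm.items():
        if not values:
            raise ValueError(f"no FPKM values for {gene}")
        result[gene] = math.log2(mean(values) + pseudocount)
    return result


if __name__ == "__main__":
    # Hypothetical FPKM values from three biological replicates.
    fpkm = {
        "gene_1": [12.4, 10.9, 11.7],
        "gene_2": [0.0, 0.0, 0.0],
    }
    for gene, value in log2_mean_fpkm(fpkm).items():
        print(f"{gene}\t{value:.3f}")
```

A table of such values, with genes as rows and samples or species as columns, is the kind of matrix a heat-map function expects as input.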