Computational protocol: Genome-Wide Discovery and Analysis of Phased Small Interfering RNAs in Chinese Sacred Lotus

Protocol publication

[…] The unique sequences in the small RNA libraries were mapped to the genome and cDNA sequences of Chinese sacred lotus with SOAP2. A self-written program was used to scan the genome and cDNA sequences using a window of 210 nt or 240 nt (ten consecutive 21 nt or 24 nt phases, respectively). A two-nucleotide positive offset was used to calculate the positions of siRNAs on the anti-sense strand because of the two-nucleotide overhang at the 3′ end of the siRNA duplex.

A P-value was then calculated for each window using a modified version of a previously published method (Equation 1 of the source publication). The calculation uses three quantities: the number of unique 21 nt (or 24 nt) sRNAs mapped within the window, the number of phased unique 21 nt (or 24 nt) sRNAs within the window, and the number of phases. As in previous work, the number of phases was set to 10 in this study.

A phase score was also calculated for each position of the genome and cDNA sequences using a previously published method (Equation 2 of the source publication). The score was computed for a window starting at a position only when the window contained more than three phased unique sRNAs. It depends on the number of phased reads at the ith phase from the position, the number of non-phased reads at the ith phase from the position, the number of phases in the window, and the number of unique phased siRNAs in the window. The number of phases was again 10.

Each window with a P-value below 0.05 was extended by 100 base pairs at both its 5′ and 3′ ends, and overlapping windows were then merged. The P-values of the merged windows were used to calculate false positive rates using a previously published method. Merged windows with a maximal phase score above a pre-determined threshold and a multiple-testing-corrected P-value below 0.05 were reported as PHAS loci. Each predicted PHAS locus was named after its chromosome (or scaffold) followed by a serial number unique within that chromosome. Neighboring PHAS loci were grouped into a PHAS cluster if the distance between them was less than 2,000 base pairs.
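The window-level calculations can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' self-written program: the equations themselves are not reproduced in this excerpt, so the hypergeometric tail in `phasing_pvalue` and the log-ratio form in `phase_score` are assumed forms of the kind used in earlier phasing studies. All function names, the symbols `n`, `k`, and `m`, and the convention that a read's position is its leftmost mapped coordinate are hypothetical.

```python
from math import comb, log


def sense_equivalent(start, strand):
    """Fold an antisense read onto sense coordinates with the +2 nt offset.

    Assumption: `start` is the leftmost coordinate of the mapped read.
    """
    return start + 2 if strand == "-" else start


def is_phased(start, strand, window_start, phase_len=21):
    """True if the read is in register with the first position of the window."""
    return (sense_equivalent(start, strand) - window_start) % phase_len == 0


def phasing_pvalue(n, k, m=10, phase_len=21):
    """Hypergeometric upper tail (assumed form, not confirmed by the text).

    n: distinct sRNA positions in the window after folding both strands
    k: how many of those positions fall on one of the m phased positions
    m: number of phases in the window (10 in the protocol)
    """
    total = phase_len * m  # 210 positions for 21 nt, 240 for 24 nt
    upper = min(m, n)
    if not (0 <= k <= upper and n <= total):
        raise ValueError("expected 0 <= k <= min(m, n) and n <= phase_len * m")
    tail = sum(comb(total - m, n - x) * comb(m, x) for x in range(k, upper + 1))
    return tail / comb(total, n)


def phase_score(phased_reads, nonphased_reads, k):
    """Assumed log-ratio phase score; defined only when k > 3.

    phased_reads / nonphased_reads: read counts per phase (one entry per phase)
    k: number of unique phased siRNAs in the window
    """
    if k <= 3:
        return None
    ratio = sum(phased_reads) / (1 + sum(nonphased_reads))
    return (k - 2) * log(1 + 10 * ratio)


def merge_significant(windows, flank=100, alpha=0.05):
    """Extend windows with P < alpha by `flank` bp on each side, then merge overlaps.

    windows: iterable of (start, end, pvalue) on a single sequence.
    """
    extended = sorted(
        (max(1, s - flank), e + flank) for s, e, p in windows if p < alpha
    )
    merged = []
    for s, e in extended:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)  # overlaps previous interval
        else:
            merged.append([s, e])
    return [tuple(iv) for iv in merged]
```

For example, `merge_significant([(1000, 1209, 0.01), (1300, 1509, 0.02)])` extends both windows by 100 bp and joins them into one interval, because the extended windows overlap. The multiple-testing correction and the phase-score threshold are left out because the text does not specify them.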
The phased siRNAs of the predicted PHAS loci were reported as phasiRNAs. The phasiRNAs of a PHAS locus were named by adding siR and a serial number to the name of the PHAS locus.

The miRNA binding sites on PHAS loci and the targets of the predicted phasiRNAs were predicted with the HitSensor algorithm. For 21 nt/22 nt miRNAs and phasiRNAs, targets with fewer than 4 mismatches were kept for analysis. For 24 nt miRNAs and phasiRNAs, targets with fewer than 6 mismatches were kept for analysis.

We combined the published gene annotation of Chinese sacred lotus with the alignment results of the predicted PHAS sequences against the NCBI Nucleotide Collection (nr/nt) database and the TIGR Repeat database.

The phylogenetic trees of the predicted TAS3 and TAS4 loci and their derived tasiRNAs were constructed with the bootstrap Neighbor-Joining algorithm implemented in ClustalX (version 2.1) and visualized with TreeView. […]
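The length-dependent mismatch cutoffs and the phasiRNA naming rule can be sketched as below. This is only an illustration: it does not call HitSensor, the function names are hypothetical, and the exact name format (direct concatenation of the locus name, `siR`, and the serial number) is an assumption, since the text does not give a separator. Lengths other than 21, 22, and 24 nt are rejected here because no cutoff is stated for them.

```python
def mismatch_limit(srna_len):
    """Exclusive upper bound on mismatches for a small RNA of this length.

    Returns None for lengths the protocol does not mention.
    """
    if srna_len in (21, 22):
        return 4
    if srna_len == 24:
        return 6
    return None


def keep_target(srna_len, mismatches):
    """True if a predicted target passes the length-specific mismatch cutoff."""
    limit = mismatch_limit(srna_len)
    return limit is not None and mismatches < limit


def phasirna_name(phas_name, serial):
    """Build a phasiRNA name from its PHAS locus name (format is assumed)."""
    return f"{phas_name}siR{serial}"
```

With these cutoffs, a 21 nt phasiRNA target with 3 mismatches is kept and one with 4 is dropped, while a 24 nt target is kept with up to 5 mismatches.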

Pipeline specifications

Software tools: Clustal W, TreeViewX
Application: Phylogenetics
Organisms: Nelumbo nucifera