[…] of Arabidopsis thaliana, Oryza sativa, and Sorghum bicolor using default settings. Sequences with similarity higher than 70% to known proteins were considered candidate novel genes.

We filtered low-quality reads before mapping, including reads: (1) with more than 5% Ns or As; (2) with 20 bp or more having quality lower than 7; (3) contaminated by the adaptor (i.e., more than 10 bp aligned to the adaptor when allowing fewer than 3 bp mismatches); (4) with the first and second ends overlapping (10 bp aligned when allowing 10% mismatches); (5) with both ends identical to those of other reads, suggesting PCR duplication. The filtered reads were then aligned to the B73 reference genome sequence using SOAPaligner (version 2.18). We set the mapping parameter to allow at most two mismatches within one read. The sequencing depth and coverage were then calculated from the alignment result. To detect small InDels, we conducted a further round of mapping with the parameter ‘-g 5’, which allows gaps of up to 5 bp.

To detect SNPs, we used SOAPsnp (version 1.03) with the following criteria: (1) the quality score given by SOAPsnp (which takes sequencing quality and ambiguity into consideration) should be greater than 20; (2) the sequencing depth should be higher than 2; (3) SNPs within 5 bp of nearby SNPs were considered as other variations; (4) SNPs with copy number greater than 1.5. SNPs meeting these filtering criteria were considered to be supported by high-quality data; the criteria were also intended to minimize possible errors caused by repeated sequences.

For short InDels, the ‘gaps allowed’ mapping result above was subjected to InDel detection using SOAPindel, and gaps supported by more than one third of the mapped reads were retained as qualified InDels.
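The SNP criteria (1) to (3) listed above can be illustrated with a minimal Python sketch. It is not the SOAPsnp implementation: the `SnpCall` record, the `filter_snps` function, and their field names are hypothetical, "within 5 bp" is assumed to mean a distance of at most 5 bp between positions, and the proximity check is assumed to run on calls that already passed the quality and depth thresholds. The copy-number criterion (4) is left out because it would require per-site copy-number estimates that this sketch does not model.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical record for one candidate SNP (not a SOAPsnp data type)."""
    chrom: str
    pos: int      # 1-based position on the chromosome
    quality: int  # consensus quality score reported by the caller
    depth: int    # number of reads covering the site


def filter_snps(
    calls: Iterable[SnpCall],
    min_quality: int = 20,
    min_depth: int = 2,
    cluster_window: int = 5,
) -> Tuple[List[SnpCall], List[SnpCall]]:
    """Split candidate SNPs into (kept, clustered).

    A call passes if quality > min_quality and depth > min_depth.
    Passing calls with another passing call on the same chromosome at
    most `cluster_window` bp away are set aside as 'clustered'
    (assumed reading of "considered as other variations").
    """
    passed = sorted(
        (c for c in calls if c.quality > min_quality and c.depth > min_depth),
        key=lambda c: (c.chrom, c.pos),
    )
    kept: List[SnpCall] = []
    clustered: List[SnpCall] = []
    for i, call in enumerate(passed):
        # Only the immediate neighbours need checking because `passed` is sorted.
        prev_close = (
            i > 0
            and passed[i - 1].chrom == call.chrom
            and call.pos - passed[i - 1].pos <= cluster_window
        )
        next_close = (
            i + 1 < len(passed)
            and passed[i + 1].chrom == call.chrom
            and passed[i + 1].pos - call.pos <= cluster_window
        )
        (clustered if prev_close or next_close else kept).append(call)
    return kept, clustered
```

Returning the clustered calls separately, rather than discarding them, mirrors the wording that such sites were treated as other variations instead of being dropped outright.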
In this study, InDels are defined as insertions or deletions of 5 bp or less in length, while insertions or deletions of more than 5 bp are classified as SVs.

The identified SNPs and InDels were classified according to their occurrence in coding regions, non-coding regions, start codons, stop codons, or splice sites. SNPs in coding sequences were annotated as synonymous or non-synonymous. GO and PFAM annotation of genes was also conducted.

Structural variations were detected using SOAPsv (version 1.02), covering InDels (more than 5 bp in length), replication, reversion, transposition, and other variations. The minimum read depth was set to 3.

We first calculated the distances between adjacent SNPs and plotted the distribution of these interval distances. Genes located between two adjacent SNPs were extracted as candidate genes without SNPs. Then, to further verify that there were no SNPs or other variations within those genes, we filtered out the genes located in, or overlapping with, repeated elements (according to the annotation information), as well as genes that were not sequenced (no reads mapped to the gene region).

We obtained the maize HapMap SNPs (in B73 RefGen_v1 coordinat […]
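The first step of the search for genes without SNPs, computing distances between adjacent SNPs and extracting genes that lie entirely between two adjacent SNPs, might be sketched as follows. This is an illustrative sketch for a single chromosome, not the authors' pipeline: the function names and the `(name, start, end)` gene tuples with 1-based inclusive coordinates are assumptions, and the later filters on repeated elements and unsequenced genes are not modeled.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Gene = Tuple[str, int, int]  # (name, start, end), 1-based inclusive (assumed layout)


def adjacent_snp_distances(snp_positions: Sequence[int]) -> List[int]:
    """Distances between each pair of neighbouring SNPs on one chromosome."""
    ordered = sorted(snp_positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def genes_between_adjacent_snps(
    genes: Sequence[Gene], snp_positions: Sequence[int]
) -> List[str]:
    """Names of genes lying strictly between two adjacent SNPs.

    A gene qualifies when there is a SNP upstream of its start, a SNP
    downstream of its end, and no SNP inside [start, end].
    """
    ordered = sorted(snp_positions)
    candidates: List[str] = []
    for name, start, end in genes:
        # i = number of SNPs strictly before the gene start, so
        # ordered[i] (if present) is the first SNP at or after the start.
        i = bisect_left(ordered, start)
        has_left_snp = i > 0
        has_right_snp = i < len(ordered) and ordered[i] > end
        if has_left_snp and has_right_snp:
            candidates.append(name)
    return candidates
```

For example, with SNPs at positions 100, 500, 520, and 2000, a gene spanning 150 to 450 is returned as a candidate, whereas a gene spanning 480 to 510 is not, because the SNP at position 500 falls inside it.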

Software tools SOAPaligner, SOAPsnp, SOAPindel, SOAPsv
Organisms Zea mays