Pipeline publication

[…] gainst 93,691 ESTs showed that 99.11% of them were aligned, with 97.40% of the ESTs having at least 90% of their length covered by the alignments. Of 20,847 BAC-end read pairs, 94.92% of the reads were paired in the same scaffold, with a mean insert length of approximately 100 kb, and 97.87% were paired in the same pseudo-chromosome. RNA-seq reads from six tissues (leaf, flower, embryo, stem, root and seed coat) were aligned against the assembled sequence; around 94.7% of the read pairs aligned in the embryo sample and around 96% in the remaining five samples. The ESTs and BAC-end sequences thus corroborated the high assembly quality indicated by CEGMA and BUSCO. Five whole BAC sequences (approximately 100 kb in length) were also completely covered by the scaffolds, with only minor indels. One of the BAC sequences included 12.6 kb of the Tpn1 family transposon, TpnA2 (see below), suggesting that high-copy-number, relatively long repetitive elements were successfully resolved. The SOAPdenovo assembly also covered the five BAC sequences, but with large indels and more mismatches, indicating that per-base resolution was better in the assembly using PacBio reads. Telomeric repeats, centromeric repeats, and rDNA arrays were identified to further analyse the contiguity of the assembly. Thirty scaffolds carrying tandem arrays of the telomeric repeat unit (AAACCCT), ranging from 47.1 to 4,613.9 repeat units, were identified; 13 of these were completely covered by the tandem repeats and could not be incorporated into the linkage maps. Pseudo-chromosomes 2, 6, 8 and 14 were found to have telomeric repeats at both ends, while pseudo-chromosomes 3, 4, 5, 9, 10, […]
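
As an illustration of the telomeric repeat counts quoted above, the following is a minimal sketch of how tandem arrays of the AAACCCT unit could be located in a scaffold and expressed as a fractional number of repeat units. The excerpt does not say which tool the authors used, so this is not their method: the function names (`reverse_complement`, `telomeric_runs`) and the `min_units` threshold are hypothetical, and only perfect, mismatch-free repeats on either strand are detected, which real sequence data would rarely satisfy.

```python
import re

TELOMERE_UNIT = "AAACCCT"  # repeat unit named in the text


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def telomeric_runs(seq, unit=TELOMERE_UNIT, min_units=2):
    """Find perfect tandem runs of `unit` on either strand.

    Returns a sorted list of (start, end, copies) tuples, where `copies`
    is the run length divided by the unit length, so partial units at the
    edges of a run give fractional values.
    Assumption: no mismatches or indels are tolerated inside a run.
    """
    seq = seq.upper()
    runs = []
    for motif in (unit, reverse_complement(unit)):
        pattern = re.compile("(?:%s){%d,}" % (motif, min_units))
        for match in pattern.finditer(seq):
            start, end = match.span()
            # Extend left over a partial unit (a suffix of the motif).
            k = 0
            while (k < len(motif) - 1 and start - k - 1 >= 0
                   and seq[start - k - 1] == motif[-(k + 1)]):
                k += 1
            start -= k
            # Extend right over a partial unit (a prefix of the motif).
            j = 0
            while (j < len(motif) - 1 and end + j < len(seq)
                   and seq[end + j] == motif[j]):
                j += 1
            end += j
            runs.append((start, end, (end - start) / len(motif)))
    return sorted(runs)


if __name__ == "__main__":
    # Toy scaffold: a partial unit, three full units, another partial unit.
    toy = "GG" + "CCCT" + "AAACCCT" * 3 + "AAA" + "GG"
    for start, end, copies in telomeric_runs(toy):
        print(start, end, copies)
```

A scaffold whose single run spans its whole length (`start == 0` and `end == len(seq)`) would correspond to the case described in the text of a scaffold completely covered by the tandem repeats.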

Pipeline specifications

Software tools: FigTree, MCScanX, PAML
Organisms: Ipomoea nil