Structure and Origin of the White Cap Locus and Its Role in Evolution of Grain Color in Maize

[…] A custom BAC library was constructed from genomic DNA of the homozygous Wc reference stock and screened with a Ccd1 cDNA probe by Bio S&T (Montreal, Canada). Two positive clones (19F and H10) were identified and characterized by restriction mapping and partial sequencing of selected subcloned restriction fragments. Of the two clones, H10 extended farthest into the Wc locus and was selected for complete sequencing and assembly using a combination of circular-consensus and long linear reads from a Pacific Biosciences (PacBio) instrument. Trimmed circular-consensus reads (>5000 bp) were assembled into contigs using CAP3, and long linear reads were assembled using CANU. The contigs were then evaluated and assembled manually to obtain an assembly (GenBank accession KX760165) consistent with the BAC restriction map, BAC-end sequences, whole-genome shotgun (WGS) analysis, and subclone sequences. […]
Frequencies of 22-mers in the B73 reference genome were profiled using JELLYFISH. The resulting database was then queried with 22-mers from the 39,424 genes in the maize filtered gene set to identify the subset of genic 22-mers that were single-copy in the B73 genome. The frequencies of the resulting 124 million single-copy genic 22-mers were in turn profiled in WGS data, obtained from the Sequence Read Archive, for each of 102 maize and teosinte accessions in the HapMap2 collection. Gene copy number in each genome was then estimated by dividing the average frequency of single-copy 22-mers from Ccd1r by the average frequency of all 124 million single-copy genic 22-mers in the WGS data for that accession. The estimated effective sequence coverage of each genome is listed in Supplemental Material, Table S1. […]
The WGS data from HapMap2 genomes were searched for sequence reads containing diagnostic features of the Wc locus and Ccd1r alleles using the grep (global regular expression print) utility.
Simple text searches were made in both orientations (forward and reverse complement) using 18–22-nt sequences unique to transposon insertion sites and other characteristic features of Wc or Ccd1r alleles. Reads identified by these searches were then validated by full-length blastn alignment to the Wc BAC assembly and the B73 reference genome sequence. […]
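The orientation-aware read search can be sketched in Python. This is a minimal illustration, not the authors' pipeline: plain substring matching stands in for a fixed-string grep search, the helper names are hypothetical, and the motif is a made-up placeholder rather than an actual Wc or Ccd1r diagnostic sequence. Each motif is matched both as given and as its reverse complement, so a read sequenced from either strand is found.

```python
"""Sketch: find reads carrying a diagnostic sequence in either orientation.

Substring matching mimics a fixed-string grep. Motif sequences are
hypothetical placeholders, not real Wc or Ccd1r junction sequences.
"""

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_diagnostic_reads(reads, motifs):
    """Map each motif name to the indices of reads containing it on either strand.

    reads  : iterable of read sequences (str)
    motifs : dict of name -> diagnostic sequence (18-22 nt in the text)
    """
    # Precompute forward and reverse-complement patterns ("both orientations").
    patterns = {
        name: (m.upper(), reverse_complement(m.upper()))
        for name, m in motifs.items()
    }
    hits = {name: [] for name in motifs}
    for i, read in enumerate(reads):
        read = read.upper()
        for name, (fwd, rev) in patterns.items():
            if fwd in read or rev in read:
                hits[name].append(i)
    return hits


if __name__ == "__main__":
    motifs = {"junction_A": "ACGTACGTTTGACCAGTAGC"}  # hypothetical 20-nt motif
    reads = [
        "TTT" + motifs["junction_A"] + "GGG",                    # forward strand
        "AA" + reverse_complement(motifs["junction_A"]) + "CC",  # reverse strand
        "ACGTACGTACGT",                                          # no match
    ]
    print(find_diagnostic_reads(reads, motifs))  # {'junction_A': [0, 1]}
```

A short exact match can also occur in unrelated or related sequences, which is why reads flagged this way are treated only as candidates until confirmed by full-length alignment.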

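The 22-mer copy-number estimate described earlier in this section reduces to a ratio of mean k-mer frequencies. The toy, in-memory Python sketch below illustrates that arithmetic only; it is not the authors' JELLYFISH-based workflow. It assumes canonical (strand-independent) k-mer counting, which the text does not state, and all function names are hypothetical. In this sketch the denominator, the mean read frequency of all single-copy genic k-mers, acts as a per-accession coverage proxy, so a target present at the same copy number as typical single-copy genes scores near 1.

```python
"""Toy sketch of k-mer based copy-number estimation (hypothetical helpers)."""
from collections import Counter
from statistics import mean

K = 22  # k-mer length used in the text
_ACGT = frozenset("ACGT")
_COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Smaller of a k-mer and its reverse complement (assumed canonical counting)."""
    return min(kmer, kmer.translate(_COMP)[::-1])


def iter_kmers(seq: str, k: int = K):
    """Yield canonical k-mers of seq, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if _ACGT.issuperset(kmer):
            yield canonical(kmer)


def count_kmers(seqs, k: int = K, keep=None) -> Counter:
    """Count canonical k-mers over seqs; if keep is given, count only k-mers in keep."""
    counts = Counter()
    for s in seqs:
        for km in iter_kmers(s, k):
            if keep is None or km in keep:
                counts[km] += 1
    return counts


def single_copy_genic_kmers(genome_seqs, gene_seqs, k: int = K) -> set:
    """Genic k-mers that occur exactly once in the reference genome."""
    genome_counts = count_kmers(genome_seqs, k)
    return {km for g in gene_seqs for km in iter_kmers(g, k) if genome_counts[km] == 1}


def estimate_copy_number(target_seq, reference_set, reads, k: int = K) -> float:
    """Mean read frequency of the target's single-copy k-mers divided by the
    mean read frequency of all single-copy genic k-mers (coverage proxy)."""
    target = {km for km in iter_kmers(target_seq, k) if km in reference_set}
    if not target:
        raise ValueError("target has no single-copy genic k-mers")
    read_counts = count_kmers(reads, k, keep=reference_set)
    coverage = mean(read_counts[km] for km in reference_set)
    if coverage == 0:
        raise ValueError("no single-copy genic k-mers found in reads")
    return mean(read_counts[km] for km in target) / coverage


if __name__ == "__main__":
    # Toy data with k = 4; each "gene" is a single 4-mer.
    genome = ["AACG", "ATGC", "CCAG"]
    ref = single_copy_genic_kmers(genome, genome, k=4)
    # "CGTT" is the reverse complement of AACG and counts toward the same k-mer,
    # giving AACG 4 reads vs 2 for the others. The mean over all genic k-mers
    # includes the target, so the ratio is 4 / (8/3) = 1.5.
    reads = ["AACG", "AACG", "AACG", "CGTT", "ATGC", "ATGC", "CCAG", "CCAG"]
    print(round(estimate_copy_number("AACG", ref, reads, k=4), 3))  # 1.5
```

With 124 million reference k-mers, the target's own k-mers contribute negligibly to the denominator; in the toy example they do not, which is why the ratio is 1.5 rather than 2.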
