Computational protocol: Base Composition and Translational Selection are Insufficient to Explain Codon Usage Bias in Plant Viruses

[…] Reference sequences for each viral species were collected from GenBank on June 12, 2012. All sequences were formatted for analysis with ReadSeq, and CAICal was used to determine the overall and site-specific base composition of each sequence. Observed third-position nucleotide counts were averaged for each species. Expected third-position nucleotide counts were computed for each gene/species combination analyzed, based on the genomic nucleotide frequencies of the species’ reference genome and the length of the ORF in that genome. Chi-square tests with three degrees of freedom (one fewer than the four nucleotide categories) were used to evaluate the differences between the observed and expected counts (MS Excel). In total, sixty-seven chi-square tests were carried out: one on the CP gene of each potyvirus, luteovirus, and geminivirus examined, and one on the Rep gene of each geminivirus. […]
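The goodness-of-fit test described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Excel workflow. It assumes that the expected count for each base is the genomic frequency multiplied by the number of codons in the ORF (ORF length divided by three), which is one plausible reading of the description. The function names and all numbers in the usage example are hypothetical.

```python
import math

BASES = "ACGT"

def expected_third_position_counts(genomic_freqs, orf_length_nt):
    """Expected third-position counts for one ORF.

    Assumption: expected count = genomic base frequency * number of codons,
    where the number of codons is the ORF length in nucleotides divided by 3.
    """
    n_codons = orf_length_nt // 3
    return {b: genomic_freqs[b] * n_codons for b in BASES}

def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic summed over the four bases."""
    return sum((observed[b] - expected[b]) ** 2 / expected[b] for b in BASES)

def chi_square_p_value_df3(stat):
    """Upper-tail p-value for a chi-square statistic with 3 degrees of freedom.

    Uses the closed form of the survival function for df = 3:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return (math.erfc(math.sqrt(stat / 2))
            + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))

# Hypothetical inputs, for illustration only (not data from the study).
genomic_freqs = {"A": 0.30, "C": 0.20, "G": 0.25, "T": 0.25}
observed = {"A": 100, "C": 50, "G": 70, "T": 80}  # averaged third-position counts

expected = expected_third_position_counts(genomic_freqs, orf_length_nt=900)
stat = chi_square_statistic(observed, expected)
p_value = chi_square_p_value_df3(stat)
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

With four nucleotide categories the test has three degrees of freedom, matching the protocol. Because the observed counts are averaged across sequences whose ORF lengths may differ from the reference, their total need not equal the expected total exactly; the sketch does not rescale them.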

