Computational protocol: The 3-Base Periodicity and Codon Usage of Coding Sequences Are Correlated with Gene Expression at the Level of Transcription Elongation

[…] Three principal classes of random sequence models were generated:

RandNuc: random nucleotide sequences generated using a 0-order Markov model with a specified probability for each nucleotide.
RandCod: random codon sequences generated using a 0-order Markov model with a specified probability for each codon.
ShufCod: random sequences produced by randomly shuffling codons within each coding sequence deprived of its first and last nucleotide triplets.

RandNuc and RandCod sequences were generated by the software GenRGenS. ShufCod sequences were generated using the Fisher-Yates algorithm. For simulation procedures, nucleotide and codon frequencies from native CDSs were computed without considering their first and last codons. As an example, each simulation related to the coding sequences of M. tuberculosis produced 5000 RandNuc or RandCod sequences of 2866752 nucleotides, which is the total length of CDSs used in this work. Before calculating DXY frequency distributions, each of the 5000 simulated sequences was fragmented into 2388 parts to reproduce the number and length of real CDSs. In total, 5000 sets of ShufCod sequences were generated by 5000 random permutations of codons within each coding sequence. [...]

The statistical analysis was performed using standard parametric and non-parametric tests included in the Statistica package (version 8.0, Statsoft, Inc.). The statistical significance of the difference between the PiCUF mean value of native and shuffled CDSs was evaluated by simulating the distribution of the mean of 5000 sets of shuffled CDS. The normality of the distribution was tested using the Shapiro-Wilk normality test and normal probability plot. […]
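
The three random sequence models and the fragmentation step described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the protocol used GenRGenS for RandNuc and RandCod, whereas this sketch substitutes Python's standard `random` module for the 0-order Markov sampling. All function names (`split_codons`, `count_symbols`, `rand_nuc`, `rand_cod`, `shuf_cod`, `fragment`) are hypothetical, and the sketch assumes each CDS is an in-frame string whose length is a multiple of three.

```python
import random
from collections import Counter


def split_codons(cds):
    """Split a CDS into codons and drop its first and last triplets."""
    usable = len(cds) - len(cds) % 3          # ignore any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return codons[1:-1]


def count_symbols(cds_list):
    """Count nucleotides and codons over all CDSs, excluding the first
    and last codon of each CDS (assumed reading of the protocol)."""
    nuc_counts, codon_counts = Counter(), Counter()
    for cds in cds_list:
        for codon in split_codons(cds):
            codon_counts[codon] += 1
            nuc_counts.update(codon)
    return nuc_counts, codon_counts


def rand_nuc(nuc_counts, length, rng):
    """RandNuc: 0-order Markov model, i.e. independent draws of nucleotides."""
    symbols = list(nuc_counts)
    weights = [nuc_counts[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=length))


def rand_cod(codon_counts, n_codons, rng):
    """RandCod: 0-order Markov model, i.e. independent draws of codons."""
    codons = list(codon_counts)
    weights = [codon_counts[c] for c in codons]
    return "".join(rng.choices(codons, weights=weights, k=n_codons))


def shuf_cod(cds, rng):
    """ShufCod: Fisher-Yates shuffle of the codons of one CDS,
    after removing its first and last triplets."""
    codons = split_codons(cds)
    for i in range(len(codons) - 1, 0, -1):
        j = rng.randint(0, i)                 # inclusive on both ends
        codons[i], codons[j] = codons[j], codons[i]
    return "".join(codons)


def fragment(sequence, lengths):
    """Cut one long simulated sequence into consecutive parts whose
    lengths match those of the real CDSs."""
    parts, start = [], 0
    for n in lengths:
        parts.append(sequence[start:start + n])
        start += n
    return parts
```

Under these assumptions, one RandNuc replicate for a genome would be `fragment(rand_nuc(nuc_counts, total_length, rng), cds_lengths)`, repeated 5000 times with a seeded `random.Random` instance, and one ShufCod set would be `[shuf_cod(cds, rng) for cds in cds_list]`. Passing raw counts as weights is equivalent to passing frequencies, since `random.choices` normalises the weights internally.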

Software tools GenRGenS, Statistica
Applications Synthetic biology, Miscellaneous
Organisms Saccharomyces cerevisiae