Pipeline publication

[…] in more than one line were removed from all indel lists. To detect possible false negative replicate SNPs originating in mutational hotspots, we further grouped the replicate SNP positions by the number of individuals carrying the SNP. For each group, we calculated the percentage of each substitution type and performed a signal-to-noise analysis, using the G:C to A:T substitution percentage as the signal. Based on this analysis, we then created a separate catalog of tentative false negative G:C to A:T SNPs. Lists of these positions, and the counts of observed SNPs and indels in the 586 lines, are provided as supplemental information.

We used the SnpEff program (version 3.1) to predict and classify the effects of the detected SNPs and indels on sorghum gene function. This required creating a custom SnpEff database and entry for the sorghum genome reference assembly (version 2.1) with the SnpEff build command and the parameters “-gff3 -v”. Predicted effects of the detected SNPs and indels were computed with the SnpEff eff command using the option “-c”. We retained only the subset of variants predicted to have a medium or high impact on the encoded protein sequence. Finally, the genome-wide distribution of indel lengths was estimated with the vcftools program (version 0.1.14) using the option “--hist-indel-len”.

We used the predicted protein sequences of Sorghum bicolor BTx623 (version 2.1), maize B73 (annotation version 5b.60), and Arabidopsis thaliana (TAIR release 10) from Phytozome (version 9.1) to annotate the SNP-encoding genes.
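The grouping of replicate SNP positions by the number of individuals carrying them, followed by the per-group substitution percentages and signal-to-noise step described earlier, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the source does not define the noise term, so the sketch assumes noise is the mean percentage of the other five substitution classes. The functions `substitution_class`, `percentages_by_group`, and `signal_to_noise`, the `(ref, alt, n_individuals)` input format, and the toy records are all hypothetical.

```python
from collections import Counter, defaultdict

# Watson-Crick complements, used to fold each single-base change onto one
# strand-independent class (e.g. G>A and C>T are both "G:C>A:T").
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# The six strand-independent substitution classes.
CLASSES = ["G:C>A:T", "G:C>T:A", "G:C>C:G", "A:T>G:C", "A:T>T:A", "A:T>C:G"]


def substitution_class(ref: str, alt: str) -> str:
    """Return the strand-independent class of a ref>alt single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Express every change relative to the G (G:C pairs) or A (A:T pairs) strand.
    if ref in ("C", "T"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def percentages_by_group(snps):
    """Group (ref, alt, n_individuals) records by n_individuals and return the
    percentage of each substitution class within every group."""
    groups = defaultdict(Counter)
    for ref, alt, n_individuals in snps:
        groups[n_individuals][substitution_class(ref, alt)] += 1
    return {
        n: {cls: 100.0 * counts[cls] / sum(counts.values()) for cls in CLASSES}
        for n, counts in groups.items()
    }


def signal_to_noise(percentages):
    """ASSUMED metric: G:C>A:T percentage divided by the mean percentage of the
    other five classes (the source does not define the noise term)."""
    signal = percentages["G:C>A:T"]
    others = [percentages[c] for c in CLASSES if c != "G:C>A:T"]
    noise = sum(others) / len(others)
    return float("inf") if noise == 0 else signal / noise


# Toy records: (reference base, alternate base, number of individuals with the SNP).
snps = [("G", "A", 2), ("C", "T", 2), ("A", "G", 2),
        ("G", "A", 5), ("C", "T", 5), ("G", "A", 5), ("T", "C", 5)]

# One line per group: group size, G:C>A:T percentage, signal-to-noise ratio.
for n, pct in sorted(percentages_by_group(snps).items()):
    print(n, round(pct["G:C>A:T"], 1), round(signal_to_noise(pct), 1))
```

Folding complementary changes together matters here because EMS predominantly produces G:C to A:T transitions, which appear as either G>A or C>T depending on which strand the reference reports; groups whose ratio stays high are the ones a catalog of tentative false negatives would draw from.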
We created protein BLAST search databases for the maize and Arabidopsis sequences using the makeblastdb command from the BLAST suite of programs (version 2.2.30+) with the nondefault parameters “-input_type fasta -dbtype prot”. We aligned each sorghum protein sequence to the maize and Arabidopsis protein sequences using the blastp (protein BLAST) command with the nondefault parameters “-evalue 1E-05 -num_threads 16 -max_target_seqs 5 -outfmt 6 -seg yes”. The gene function description files for sorghum (version 2.1), maize B73 (version 5b.60), and Arabidopsis (TAIR version 10) were obtained from Phytozome, and the annotations for each affected sorghum locus were appended as additional columns to the VCFs, along with the gene identifiers and annotations of the best maize and best Arabidopsis BLAST hits. The top maize and Arabidopsis hits were appended only when their BLAST e-values were below a threshold of 1E-30.

We converted the formatted VCFs containing the nonreplicate and likely EMS-induced homozygous SN […]
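The best-hit selection and e-value cutoff applied to the blastp results can be sketched in Python over BLAST+ tabular output (`-outfmt 6`, whose 12 default columns end with `evalue` and `bitscore`). This is a sketch under assumptions, not the authors' script: the source does not say how the best hit was chosen among the up to five targets per query, so it assumes the lowest e-value wins with ties broken by the higher bit score, and the default `max_evalue=1e-30` is our reading of the stated cutoff. `best_hits`, `annotate`, and all sequence IDs below are hypothetical.

```python
import csv

# Column order of BLAST+ tabular output (-outfmt 6) with its default fields.
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, max_evalue=1e-30):
    """Return {query_id: (subject_id, evalue)} with one best hit per query,
    keeping only hits whose e-value is strictly below max_evalue."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        evalue, bitscore = float(rec["evalue"]), float(rec["bitscore"])
        if evalue >= max_evalue:
            continue
        # ASSUMED ranking: lowest e-value first, ties broken by highest bit score.
        rank = (evalue, -bitscore)
        query = rec["qseqid"]
        if query not in best or rank < best[query][0]:
            best[query] = (rank, rec["sseqid"], evalue)
    return {q: (subject, evalue) for q, (_, subject, evalue) in best.items()}


def annotate(query_id, hits, descriptions):
    """Hypothetical helper: the best-hit ID and its description, to be appended
    as extra VCF columns, or empty strings when no hit passed the cutoff."""
    if query_id not in hits:
        return ("", "")
    subject = hits[query_id][0]
    return (subject, descriptions.get(subject, ""))


# Toy blastp output lines (hypothetical IDs and scores).
blast_out = [
    "sorghum_prot_1\tmaize_prot_A\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-50\t200",
    "sorghum_prot_1\tmaize_prot_B\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-40\t150",
    "sorghum_prot_2\tmaize_prot_C\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-10\t50",
]
hits = best_hits(blast_out)
descriptions = {"maize_prot_A": "hypothetical description"}
print(annotate("sorghum_prot_1", hits, descriptions))
print(annotate("sorghum_prot_2", hits, descriptions))
```

In this toy input, `sorghum_prot_2` gets empty annotation columns because its only hit (1e-10) passes the blastp `-evalue 1E-05` search cutoff but not the stricter threshold used for appending annotations.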

Pipeline specifications

Software tools: SnpEff, VCFtools, BLASTP
Databases: Phytozome
Organisms: Sorghum bicolor
Diseases: Ataxia Telangiectasia; Dyskinesias; Brain Diseases; Telangiectasis; DNA Repair-Deficiency Disorders; Metabolism, Inborn Errors
Chemicals: Hydrocarbons, Acyclic