PyRAD specifications


Unique identifier: OMICS_03724
Name: PyRAD
Software type: Package/Module
Interface: Command line interface
Restrictions to use: None
Operating system: Unix/Linux
Programming languages: Python
Computer skills: Advanced
Stability: Stable
Maintained: Yes



  Deren A. R. Eaton

We demultiplexed and processed Illumina reads using PyRAD v2.16.1. Sequences were clustered at 90% similarity within samples using USEARCH v7.0.1090 and aligned with MUSCLE v3.8.31. Error rate and heterozygosity were jointly estimated.

The 'holobiont' libraries were clustered using PyRAD v3.0.63 with the following parameters: (6) restriction overhang = GATC, (8) mindepth = 6, (9) NQual = 4, (10) clustering threshold = 0.85, (11) datatype = GBS, (12) MinCov = 11, (13) MaxSH = 3.

Reads with an average Phred score higher than 20 in the first 80 cycles were kept for analysis. The filtered reads were separated by sample based on corresponding barcode using PyRAD_v.2.01. Reads for each individual sample were then mapped to the draft genome of A. chinensis 'Hongyang' with Bowtie2-2.2.4 in "end-to-end very sensitive" mode.
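The quality filter described in this excerpt can be sketched as follows. This is an illustrative sketch only, not code from PyRAD or from the cited study; the function names are hypothetical, and the Phred+33 quality encoding is an assumption.

```python
def mean_phred(quality, cycles=80, offset=33):
    """Mean Phred score over the first `cycles` characters of a FASTQ quality string.

    Assumes Phred+33 encoding (offset=33); this is an assumption, not stated in the excerpt.
    """
    window = quality[:cycles]
    if not window:
        return 0.0
    return sum(ord(ch) - offset for ch in window) / len(window)


def keep_read(quality, min_mean=20, cycles=80, offset=33):
    """Return True when the mean Phred score in the window is strictly above min_mean."""
    return mean_phred(quality, cycles, offset) > min_mean
```

Only the first 80 cycles contribute to the mean, so low-quality bases later in a read do not cause it to be discarded.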

We performed a phylogenetic analysis to demonstrate the utility of RAD sequences for determining relationships amongst taxa. PyRAD was used to identify and homologize RAD sequences with 10x or higher coverage present in three or more taxa, which produced 38,889 bp. A maximum likelihood analysis was performed.
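The coverage and taxon-sharing criteria in this excerpt (10x or higher coverage, present in three or more taxa) can be illustrated with a small filter. This is a sketch under assumptions: the input layout (a dictionary of per-taxon read depths for each locus) and the function name are hypothetical and do not reflect PyRAD's internal data structures.

```python
def shared_loci(depths, min_depth=10, min_taxa=3):
    """Return loci recovered at >= min_depth reads in at least min_taxa taxa.

    depths: {locus_id: {taxon: read_depth}} (hypothetical layout for illustration).
    """
    kept = []
    for locus, per_taxon in depths.items():
        # Count the taxa whose depth at this locus meets the coverage cutoff.
        n_taxa = sum(1 for depth in per_taxon.values() if depth >= min_depth)
        if n_taxa >= min_taxa:
            kept.append(locus)
    return kept
```

A locus sequenced in three taxa is still dropped if any one of them falls below the depth cutoff, because that taxon no longer counts toward the minimum.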

Several software packages for assembling RAD data support paired-end (PE) reads, including Stacks, Rainbow, PyRAD and dDocent. However, many of these tools cannot directly and fully support RPE datasets with staggered PE reads.

As the previous phylogeny for the group was generated using mtDNA markers, the RAD markers were used to construct a nuclear-based phylogeny using PyRAD. PyRAD filters out potentially paralogous sequences by identifying contigs with more than a set number of heterozygous sites (default = 5) and with a heterozygous site shared between a set.
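The first paralog criterion mentioned in this excerpt, a cap on heterozygous sites per contig, can be sketched as below. The sketch assumes heterozygous sites are written as two-base IUPAC ambiguity codes in a consensus sequence; the function names are hypothetical and this is not PyRAD's implementation. The second criterion (heterozygous sites shared across samples) is not covered here.

```python
# Two-base IUPAC ambiguity codes, taken here to mark heterozygous sites (assumption).
HET_CODES = set("RYSWKM")


def count_het_sites(consensus):
    """Count positions in a consensus sequence that carry a two-base ambiguity code."""
    return sum(1 for base in consensus.upper() if base in HET_CODES)


def passes_het_filter(consensus, max_het=5):
    """Return True when the contig has no more than max_het heterozygous sites."""
    return count_het_sites(consensus) <= max_het
```

The default of 5 follows the value quoted in the excerpt; contigs above that cap would be treated as potential paralogs.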

Given that one of the main aims of this study is species delimitation within the B. tabaci cryptic complex, we used PyRAD, an additional pipeline developed specifically for RADseq data with a focus on introgression and phylogenetic inference. The advantage of this pipeline is that it takes into account the insertions.

We then employed the resulting contigs of variable length (i.e. between 96 and 180 bp) to build a catalogue of loci using PyRAD v.3.0.63. This software package allows for indels (expected in the phylogenetic framework of our study) and for unequal length of input sequences (resulting from overlapping read pairs).

After de novo assembly and quality filtering, including minimum coverage cutoffs and discarding populations represented by single individuals with PyRAD, the number of samples recovered was 84 for Baetis cf. sabahensis, 62 for Bungona sp., 91 for Liebebiella cf. vera, and 62 for Thalerosphyrus sinuosus.

PyRAD institution(s)
Committee on Evolutionary Biology, University of Chicago, IL, USA; Botany Department, Field Museum of Natural History, IL, USA
PyRAD funding source(s)
This research was supported by a National Science Foundation Dissertation Improvement Grant (DEB-1110598).

