Computational protocol: Identification of novel candidate genes for 46,XY disorders of sex development (DSD) using a C57BL/6J-YPOS mouse model

Protocol publication

[…] DNA was isolated from peripheral blood using the Gentra Puregene Blood Kit (Qiagen, USA) or from saliva collected using ORAgene ORG-500 (DNA Genotek, Canada). Library preparation and exome capture were performed for each sample following the manufacturers’ protocols for the SureSelect All Exon 50 Mb capture kit (Agilent Technologies) and Nextera Rapid Capture (Illumina, USA). Sequencing was performed on an Illumina HiSeq2500 as 50-100 bp paired-end runs at the UCLA Clinical Genomics Center.

The sequence reads (FASTQ files) were aligned to the human reference genome (GRCh37/hg19, Feb. 2009 assembly) using BWA (Burrows-Wheeler Alignment tool) and Novoalign. The output BAM files were sorted and merged, and PCR duplicates were removed using Picard. INDEL (insertion and deletion) realignment and recalibration were performed using the Genome Analysis Toolkit (GATK). Both single-nucleotide variants (SNVs) and small INDELs were called within the Ensembl coding exonic intervals ± 2 bp using GATK’s Unified Genotyper, then recalibrated and filtered using GATK variant-quality score recalibration and variant filtration tools. All high-quality variants were annotated using SNP & Variation Suite and VarSeq, variant filtration and annotation software (Golden Helix, USA). All variants were filtered to a minor allele frequency (MAF) of < 1% and intersected with the DSD gene list to identify mutations in known DSD genes. The list comprises a primary gene list of well-annotated genes involved in sex determination and differentiation, as well as a secondary list of genes that are more loosely associated with sex development, e.g., genes whose OMIM (Online Mendelian Inheritance in Man) description contains sex development keywords.

The variants identified by exome sequencing were classified as causative or likely causative following the recommendations of the American College of Medical Genetics and Genomics.
All other variants with a minor allele frequency below 1% were classified as variants of unknown significance (VUS). To assess previously unreported missense variants, we used two in silico algorithms, SIFT and PolyPhen, to predict the pathogenicity of a missense variant based on conservation of the amino acid across species, the physical characteristics of the altered amino acid, and the possible impact on protein structure and function. All variants with low quality scores were validated by Sanger sequencing.

[...] RNA from each sample was submitted to the UCLA Neuroscience Genomic Core (UNGC) for library preparation and sequencing. Library preparation was performed using the TruSeq Stranded Total RNA kit (Illumina) with poly-A selection following the manufacturer’s guidelines. Sequencing was performed on a HiSeq 2500 (Illumina) as a 69 bp paired-end run on a rapid flowcell capable of generating 150 M reads per lane. Four samples were multiplexed and sequenced over two rapid lanes, with each sample receiving approximately 75 million reads at a mapping rate of > 85%.

The generated sequencing reads were aligned to the mouse genome, version mm10, with STAR. Transcript abundance was assessed by Cufflinks (v2.1.1), using a GTF file based on Ensembl mouse NCBI37. Differential expression analysis was based on fold change differences greater than 1.5 between the groups being compared. Differentially expressed genes were split into two categories: underexpressed and overexpressed in B6-YPOS males. Both categories were separately subjected to pathway enrichment analysis using the Gene Ontology Consortium resource.

To analyze the RNA from AmhCre Sox9flox/flox XY gonads and wild-type XY and XX gonads, libraries were generated using the NuGEN Mondrian Technology and SPIA amplification methodology, and the data were processed and aligned to the mouse genome (Ensembl version 38.77) as described by Rahmoun et al.
To eliminate composition biases, the trimmed mean of M values (TMM) method was used for normalization between the samples. An adjusted P value threshold of 0.05 was used to determine which genes were differentially expressed between XY and AmhCre Sox9flox/flox XY (Sox9 KO) gonads. Graphs of gene expression were made using GraphPad Prism.

[...] Reverse transcription of RNA to cDNA was performed using the Tetro cDNA Synthesis Kit (Bioline, UK) following the manufacturer’s protocol. The primer sequences used are detailed in Additional file: Table S1. Primers were designed using AutoPrime software and spanned exon-exon junctions for optimal RNA quantification. cDNA was quantified using Qubit HS (Invitrogen) for double-stranded DNA, and a total of 3 ng of cDNA was used per sample for amplification. qPCR was carried out in duplicate using the SensiFAST™ SYBR No-ROX Kit (Bioline, UK) on a DNA Engine Opticon® 2 real-time PCR detection system (Bio-Rad, USA). Reaction conditions were as follows: 95 °C for 10 min, then 40 cycles of 95 °C for 15 s, 60–64 °C (see Additional file: Table S1) for 10 s, and 72 °C for 15 s. Data were analyzed with Opticon Monitor Software (Bio-Rad). Standard curves were generated from a mix of cDNA of all tested samples with five serial 1:4 dilutions. Average cycle threshold (Ct) values for each gene/sample were determined based on two replicates. Complementary DNA amounts were estimated from the Ct values using the linear equation y = mx + b (where y is the Ct value, m is the slope, x is the cDNA amount, and b is the intercept). […]
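The standard-curve estimate described above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis (which was carried out in Opticon Monitor Software). It assumes that x in y = mx + b is the log10 of the relative cDNA amount, as is conventional for qPCR standard curves; the protocol itself only says "cDNA amount". The function names, the inclusion of an undiluted standard, and all Ct values are hypothetical.

```python
import math


def fit_standard_curve(amounts, ct_values):
    """Least-squares fit of Ct = m * log10(amount) + b; returns (m, b)."""
    xs = [math.log10(a) for a in amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def estimate_amount(ct_replicates, m, b):
    """Average the replicate Ct values, then invert the curve: x = (y - b) / m."""
    mean_ct = sum(ct_replicates) / len(ct_replicates)
    return 10 ** ((mean_ct - b) / m)


# Hypothetical standards: pooled cDNA (relative amount 1.0) and five 1:4 dilutions.
standard_amounts = [4.0 ** -i for i in range(6)]
# Made-up Ct values for illustration only (2 cycles per 4-fold dilution).
standard_cts = [18.0, 20.0, 22.0, 24.0, 26.0, 28.0]

m, b = fit_standard_curve(standard_amounts, standard_cts)

# Two technical replicates for one hypothetical gene/sample.
sample_amount = estimate_amount([21.9, 22.1], m, b)
print(f"slope={m:.3f} intercept={b:.3f} relative amount={sample_amount:.4f}")
```

The estimated amount is relative to the pooled standard, so it is meaningful only for comparing samples run against the same curve.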

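The 1.5-fold criterion used for the B6-YPOS RNA-seq comparison can be sketched in Python as follows. This is a hedged illustration only. It assumes that abundances are FPKM-like values per gene, that "underexpressed" means a ratio below 1/1.5, and that a small pseudocount is added to avoid division by zero; none of these details are stated in the protocol. The gene names, abundance values, and function name are hypothetical.

```python
def classify_genes(reference, ypos, threshold=1.5, pseudocount=0.01):
    """Split genes into over- and underexpressed in B6-YPOS by fold change.

    reference, ypos: dicts mapping gene name -> abundance (e.g. FPKM).
    The pseudocount and the symmetric 1/threshold cutoff are assumptions.
    """
    over, under = [], []
    for gene in sorted(reference.keys() & ypos.keys()):
        ratio = (ypos[gene] + pseudocount) / (reference[gene] + pseudocount)
        if ratio > threshold:
            over.append(gene)
        elif ratio < 1.0 / threshold:
            under.append(gene)
    return over, under


# Made-up abundance values for illustration only.
reference = {"GeneA": 10.0, "GeneB": 20.0, "GeneC": 5.0, "GeneD": 8.0}
ypos = {"GeneA": 30.0, "GeneB": 5.0, "GeneC": 5.5, "GeneD": 8.0}

over, under = classify_genes(reference, ypos)
print("overexpressed in B6-YPOS:", over)
print("underexpressed in B6-YPOS:", under)
```

Each of the two resulting lists would then be submitted separately to pathway enrichment analysis, as the protocol describes.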
Pipeline specifications

Software tools: STAR, Cufflinks, AutoPrime
Applications: RNA-seq analysis, qPCR
Organisms: Mus musculus, Homo sapiens
Diseases: Hypospadias, Disorders of Sex Development, Urogenital Abnormalities, Genetic Diseases, Inborn