Computational protocol: Identification of genes for engineering the male germline of Aedes aegypti and Ceratitis capitata

Protocol publication

[…] Candidate testis-specifically expressed genes were identified from the total set of predicted genes by running a custom Python script on the output of the standard TopHat-Cufflinks-Cuffdiff RNA-seq analysis pipeline, and applying various filtering steps (described below) to maximise sensitivity whilst removing unsuitable genes and minimising false positives.

An expression level of 10 FPKM (fragments per kilobase of exon per million fragments mapped) in the early sample for Ae. aegypti and the early spermatocytes sample for C. capitata was chosen as a threshold for candidates. A threshold was set as predicted genes with low expression are more likely to be false positives, and also regulatory elements associated with relatively strong expression are desired for use in synthetic constructs; 10 FPKM is the boundary between low and moderate expression for D. melanogaster RNA-seq data on FlyBase []. The threshold for expression in samples other than testis (gonadectomised male, ovary and female) was not set at zero, to allow for some noise in the data, but rather at 1 FPKM, based on quantification of the known testis-specifically expressed genes can, comr, nht and Taf12L in D. melanogaster (data not shown).

Many potential candidates appeared to be short non-coding RNAs. Quantification of short non-coding RNAs is likely to be inaccurate in a protocol using polyA selection. Therefore the only genes taken forward for further analysis were those that either coincided with a locus already annotated as a protein-coding gene, or novel predicted genes that were over 1 Kb in length.

After application of the filtering steps above, predicted testis-specifically expressed genes with higher expression in early spermatogenesis than in late spermatogenesis were identified. For Ae. aegypti, 57 candidate early genes were identified, out of a total of 388 predicted testis-specifically expressed genes with expression above 10 FPKM in the early sample. For C. capitata, 68 candidate early genes were identified, out of a total of 667 predicted testis-specifically expressed genes with expression above 10 FPKM in early spermatocytes.

For each species, the top ten candidates in order of expression level in the earliest testis sample were taken forward for experimental testing. Genes encoding proteins associated with transposable elements were excluded, as there are likely to be multiple copies of these in the genome, and it would be difficult to design PCR primers that would target only one. For Ae. aegypti, one additional candidate was also taken forward, as a homologue of the gene was identified as a candidate in C. capitata; candidates that are conserved between species may simplify construct generation in different species. Lists of the candidate genes tested, and the annotated loci that they correspond to, if any, can be seen in Additional file . [...]

To determine whether any of the candidates we identified were conserved between species, tBLASTx searches were performed, using the candidate sequences from one species as queries and all transcripts predicted by Cufflinks from the other species as a database. A D. melanogaster dataset was also used as a database, to provide further confidence in conservation, and also because more supporting information is available on D. melanogaster genes.

These BLAST searches revealed one set of homologous testis-specifically expressed candidates and one set of homologous testis-specifically spliced candidates, with conservation between all three species in each case. The Ae. aegypti testis-specifically expressed candidate corresponding to the annotated locus AAEL009267 and the C. capitata testis-specifically expressed candidate corresponding to the annotated locus LOC101459316 are homologous, and both show homology to a D. melanogaster gene, CG7691, that was also identified as testis-specifically expressed, with higher expression early in spermatogenesis.
AAEL009267 is annotated as a hypothetical protein, while LOC101459316 and CG7691 are predicted zinc finger proteins. The expression timing of AAEL009267 could not be confirmed due to a failed qRT-PCR, but higher expression of LOC101459316 in early spermatogenesis was confirmed. The Ae. aegypti testis-specifically spliced candidate corresponding to the annotated locus AAEL008110 (centrosomin) and the C. capitata testis-specifically spliced candidate corresponding to the annotated locus LOC101449153 (centrosomin-like) are homologous, and both show homology to the D. melanogaster gene for centrosomin, which is involved in centrosome assembly and is known to have a role in spermatogenesis and display testis-specific splicing in this species []. However, it should be noted that qRT-PCR results suggested low abundance in testis of the predicted testis-specific splice form compared to other splice forms for both AAEL008110 and LOC101449153, and thus they may not be suitable for use in synthetic constructs for the reasons discussed above. […]
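
The filtering steps described in the excerpt (a 10 FPKM minimum in the earliest testis sample, a 1 FPKM ceiling in non-testis samples, the protein-coding or over-1-Kb requirement, early-over-late expression, exclusion of transposable-element genes, and ranking by early expression) could be sketched roughly as follows. This is not the authors' script, which is not reproduced here. The `Gene` record, the field names, the sample labels and the function names are hypothetical. Treating the 1 FPKM ceiling as inclusive, and excluding transposon-related genes before ranking rather than after, are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

EARLY_MIN_FPKM = 10.0       # candidates need expression above this in the earliest testis sample
NON_TESTIS_MAX_FPKM = 1.0   # ceiling in non-testis samples (inclusive here: an assumption)
MIN_NOVEL_LENGTH_BP = 1000  # novel predicted genes must be over 1 Kb


@dataclass
class Gene:
    """Hypothetical per-gene record assembled from the pipeline output."""
    gene_id: str
    fpkm: Dict[str, float]                 # sample label -> FPKM
    length_bp: int
    annotated_locus: Optional[str] = None  # protein-coding locus it coincides with, if any
    transposon_related: bool = False


def is_testis_specific(gene: Gene, early: str, non_testis: List[str]) -> bool:
    """Apply the expression thresholds and the short non-coding RNA filter."""
    if gene.fpkm.get(early, 0.0) <= EARLY_MIN_FPKM:
        return False
    if any(gene.fpkm.get(s, 0.0) > NON_TESTIS_MAX_FPKM for s in non_testis):
        return False
    # Keep annotated protein-coding loci, or novel genes longer than 1 Kb.
    return gene.annotated_locus is not None or gene.length_bp > MIN_NOVEL_LENGTH_BP


def select_candidates(genes: List[Gene], early: str, late: str,
                      non_testis: List[str], top_n: int = 10) -> List[Gene]:
    """Return the top_n early-biased, testis-specific genes by early FPKM."""
    specific = [g for g in genes if is_testis_specific(g, early, non_testis)]
    early_biased = [g for g in specific
                    if g.fpkm[early] > g.fpkm.get(late, 0.0)]
    # Assumption: transposon-related genes are dropped before ranking.
    usable = [g for g in early_biased if not g.transposon_related]
    usable.sort(key=lambda g: g.fpkm[early], reverse=True)
    return usable[:top_n]
```

A call such as `select_candidates(genes, "testis_early", "testis_late", ["male_carcass", "ovary", "female"])` would return at most ten records; the sample labels in that call are placeholders rather than names used in the study.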

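The cross-species comparison described in the excerpt (candidates from one species queried with tBLASTx against the other species' transcripts and against a D. melanogaster dataset) could be post-processed along the following lines. This is a minimal sketch: it assumes the searches were saved in BLAST+ 12-column tabular format (`-outfmt 6`), in which the query ID, subject ID, e-value and bit score are columns 1, 2, 11 and 12. The e-value cutoff of 1e-5 is an illustrative assumption, since the source does not state one, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, str]:
    """Map each query ID to its highest-bit-score subject at or below max_evalue.

    Expects BLAST+ tabular lines (-outfmt 6); blank and '#' lines are skipped.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def conserved_in_both(hits_other_species: Dict[str, str],
                      hits_dmel: Dict[str, str]) -> List[str]:
    """Queries with an accepted hit in both the other species and D. melanogaster."""
    return sorted(set(hits_other_species) & set(hits_dmel))
```

Keeping only the best hit per query is a simplification. A reciprocal search, with the roles of query and database swapped, would give stronger evidence of homology than a one-directional hit.
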
Pipeline specifications

Software tools: TopHat, Cufflinks, RAP, TBLASTX
Databases: FlyBase
Applications: RNA-seq analysis, Nucleotide sequence alignment
Organisms: Homo sapiens, Aedes aegypti, Ceratitis capitata, Drosophila melanogaster
Diseases: Nervous System Diseases