Computational protocol: Influence of Wolbachia on host gene expression in an obligatory symbiosis

Protocol publication

[…] All clones from the libraries were sequenced using the Sanger method (Genoscope, Evry, France) and have been deposited in the GenBank database (Normalized library: FQ829929 to FQ844492; OS: FQ848737 to FQ857191; OA1: FQ844493 to FQ848736; OA2: FQ790408 to FQ793875 and FQ859091 to FQ859175; SSH2-C: FQ828348 to FQ829118; SSH2-NC: FQ829119 to FQ829928; SSH2-A: JK217526 to JK217700 and JK217743 to JK217748; SSH2-S: JK217375 to JK217525 and JK217729 to JK217742; SSH1-S: JK217749 to JK217767; SSH1-A: JK217701 to JK217728). A general overview of the Expressed Sequence Tag (EST) data processing is given in Figure . Raw sequences and trace files were processed with Phred software to eliminate low-quality bases (score < 20). Sequence trimming, which includes removal of polyA tails, vectors, and adapters, was performed with Cross_match. Chimeric sequences were computationally digested into independent ESTs.

Clustering and assembly of the ESTs were performed with TGICL to obtain putative unique transcripts (unigenes) composed of contiguous ESTs (contigs) and unique ESTs (singletons). To do this, a pairwise comparison was first performed using a modified version of megablast (minimum similarity 94%). Clustering was done with tclust, which proceeds by a transitive approach (minimum overlap: 60 bp, at most 20 bp from the end of the sequence).
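The transitive clustering step described above can be illustrated with a minimal Python sketch. It is not tclust's implementation: the function name `cluster_transitively`, the tuple layout of the pairwise hits, and the example EST identifiers are all hypothetical. Only the three thresholds (94% similarity, 60 bp overlap, 20 bp from the sequence end) come from the protocol. The idea is that if EST A overlaps B and B overlaps C, all three end up in the same cluster even when A and C share no direct hit.

```python
from collections import defaultdict

# Thresholds taken from the protocol text.
MIN_IDENTITY = 94.0  # percent similarity of the pairwise alignment
MIN_OVERLAP = 60     # minimum overlap length, in bp
MAX_OVERHANG = 20    # maximum distance of the overlap from a sequence end, in bp


def cluster_transitively(est_ids, hits):
    """Group ESTs by single-linkage (transitive) clustering.

    `hits` is a hypothetical list of tuples:
    (est_a, est_b, percent_identity, overlap_bp, overhang_bp).
    """
    parent = {est: est for est in est_ids}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for est_a, est_b, identity, overlap, overhang in hits:
        if identity >= MIN_IDENTITY and overlap >= MIN_OVERLAP and overhang <= MAX_OVERHANG:
            root_a, root_b = find(est_a), find(est_b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    groups = defaultdict(list)
    for est in est_ids:
        groups[find(est)].append(est)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in groups.values())


# Hypothetical example data: est4 stays a singleton because its overlap is too short.
example_ests = ["est1", "est2", "est3", "est4"]
example_hits = [
    ("est1", "est2", 97.0, 120, 5),
    ("est2", "est3", 95.5, 80, 10),
    ("est3", "est4", 99.0, 40, 0),
]
example_clusters = cluster_transitively(example_ests, example_hits)
```

Each multi-member cluster would then be passed to an assembler to build contigs, while single-member clusters correspond to singletons.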
Assembly was done with CAP3 (minimum similarity 94%).

To detect unigene similarities with other species, several BLAST searches (with permissive e-value cut-offs) were performed against the following databases: NCBI nr (blastx; release: 1 March 2011; e-value < 5, HSP length > 33 aa), RefSeq genomic database (blastn, e-value < 10), UniGene division Arthropods (tblastx, #8 Aedes aegypti, #37 Anopheles gambiae, #3 Apis mellifera, #3 Bombyx mori, #53 Drosophila melanogaster, #9 Tribolium castaneum; e-value < 5), Nasonia vitripennis Nvit OGS_v1.0 (CDS predicted by Gnomon (NCBI)) and Wolbachia sequences from GenBank (blastn; release 164; e-value < 1e-20).

Gene Ontology annotation was carried out using Blast2GO software. During the first step (mapping), a pool of candidate GO terms was obtained for each unigene by retrieving GO terms associated with the hits obtained after a blastx search against NCBI nr. During the second step (annotation), reliable GO terms were selected from the pool of candidate GO terms by applying the Score Function (SF) of Blast2GO with permissive annotation parameters (EC_weight=1, e-value_filter=0.1, GO_weight=5, HSP/hit coverage cut-off=0%). In the third step of the annotation procedure, the pool of GO terms selected during the annotation step was merged with GO terms associated with InterPro domains (InterPro predictions based on the longest ORF). Finally, the Annex augmentation step was run to modulate the annotation by adding GO terms derived from implicit relationships between GO terms.

To extract the biological processes and molecular functions statistically over-represented in aposymbiotic libraries, we performed a hypergeometric test between GO terms from the aposymbiotic libraries (OA1 and OA2) and those from the OS library, which corresponds to natural physiological conditions. The p-values were then adjusted using Bonferroni's correction.
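The over-representation test and the correction described above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the authors' code: the source does not specify how the test population was defined, so the sketch assumes the population is the pooled set of annotated unigenes from both libraries. The function names and the example p-values are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement from a
    population of size `population` containing `successes` marked items."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, draws)


def go_term_pvalue(count_test, size_test, count_ref, size_ref):
    """Assumed setup: the population is the test library (e.g. OA1 or OA2)
    pooled with the reference library (OS); the draw is the test library."""
    return hypergeom_upper_tail(
        k=count_test,
        population=size_test + size_ref,
        successes=count_test + count_ref,
        draws=size_test,
    )


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return {term: min(1.0, p * m) for term, p in pvalues.items()}


# Hypothetical p-values for two placeholder GO terms.
adjusted = bonferroni({"GO:term_a": 0.01, "GO:term_b": 0.6})
```

Bonferroni's correction is conservative: with many GO terms tested, only strongly over-represented terms remain significant after adjustment.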
To perform a functional enrichment analysis of the unigenes extracted from the SSH, we used the FatiGO web tool on the OS library. For the GO analysis, levels 3 and 6 were chosen to describe biological processes, and level 4 was chosen to describe molecular functions. […]

Pipeline specifications

Software tools: TGICL, BLASTN, BLASTX, TBLASTX, Blast2GO, Babelomics
Applications: Transcription analysis, Nucleotide sequence alignment
Diseases: Substance-Related Disorders