Computational protocol: Expression patterns of transcribed human endogenous retrovirus HERV-K(HML-2) loci in human tissues and the need for a HERV Transcriptome Project


Protocol publication

[…] RNA was isolated using TRIzol (Invitrogen) following the manufacturer's recommendations and subsequently treated with DNase I (Roche) to remove residual traces of genomic DNA. Complete removal of DNA was verified by an Alu element-specific PCR []. DNA-free RNA was then reverse-transcribed into cDNA using the Omniscript RT Kit (Qiagen) and random hexanucleotide primers, following the manufacturer's recommendations. Control reactions without reverse transcriptase were included for each cDNA preparation. Subsequent PCR used the following HERV-K(HML-2)-specific primers to generate gag- and env-derived products: gag_plus (5'-GGCCATCAGAGTCTAAACCACG-3'; nt 1626–1647 in HERV-K(HML-2.HOM), GenBank accession number AF074086); gag_minus (5'-GCAGCCCTATTTCTTCGGACC-3'; nt 2242–2262); env8146F (5'-AATGAGTCTGAGCATCACTGGG-3'; nt 8146–8167); and env8665R (5'-CCATTCAACTCTGAGTGGACACAG-3'; nt 8665–8688). The 50 μl PCR mix contained 0.5 μM of each primer, 1.5 mM MgCl2, 0.2 mM dNTPs, 2.5 units Taq polymerase (Invitrogen) and 1× PCR buffer. Cycling conditions were: initial denaturation at 94°C for 5 min; 40 cycles of 94°C for 1 min, 57°C for 45 s and 72°C for 1 min; final elongation at 72°C for 10 min. Generation of cDNA from brain tissue of healthy controls and of individuals with bipolar disorder or schizophrenia was described previously []. RT-PCR products were purified using spin columns (PeqLab), ligated into the pGEM-T Easy vector (Promega) and transformed into DH5α or TOP10F' bacterial cells. Insert-containing clones were identified by standard colony PCR using the primer combinations above. Plasmid DNA of positive clones was isolated with a standard column procedure (QIAprep Spin Miniprep Kit; Qiagen). Cloned cDNAs were sequenced with vector-specific sequencing primers on an Applied Biosystems 3730 capillary sequencer (Institut für Immunologie und Genetik, Kaiserslautern, Germany).
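As a quick consistency check on the primer list, the stated coordinates can be compared with the primer lengths, and the expected product sizes on the HERV-K(HML-2.HOM) reference can be derived from them. The Python sketch below assumes 1-based, inclusive coordinates and that each reverse primer's coordinates describe its binding site on the reference plus strand; the function names are illustrative only. Products amplified from individual loci may differ from these sizes wherever a locus carries insertions or deletions relative to HML-2.HOM.

```python
# Primer data transcribed from the protocol text; coordinates refer to
# HERV-K(HML-2.HOM), GenBank AF074086 (assumed 1-based, inclusive).
PRIMERS = {
    "gag_plus": ("GGCCATCAGAGTCTAAACCACG", 1626, 1647),
    "gag_minus": ("GCAGCCCTATTTCTTCGGACC", 2242, 2262),
    "env8146F": ("AATGAGTCTGAGCATCACTGGG", 8146, 8167),
    "env8665R": ("CCATTCAACTCTGAGTGGACACAG", 8665, 8688),
}


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


def check_primer(name: str) -> bool:
    """True if the primer length matches its stated reference coordinates."""
    seq, start, end = PRIMERS[name]
    return len(seq) == span_length(start, end)


def expected_product(fwd: str, rev: str) -> int:
    """Expected amplicon size on the reference: from the 5' end of the forward
    primer to the 5' end of the reverse primer (its highest coordinate)."""
    _, fwd_start, _ = PRIMERS[fwd]
    _, _, rev_end = PRIMERS[rev]
    return span_length(fwd_start, rev_end)


if __name__ == "__main__":
    for name, (seq, _, _) in PRIMERS.items():
        print(name, len(seq), check_primer(name))
    print("gag product:", expected_product("gag_plus", "gag_minus"))  # 637
    print("env product:", expected_product("env8146F", "env8665R"))  # 543
```

Run as a script, it reports primer lengths of 22, 21, 22 and 24 nt (each matching its coordinate span) and expected reference products of 637 bp for gag and 543 bp for env.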
Sequence quality was assessed with CodonCode Aligner (CodonCode Corporation, Dedham, MA, USA) and FinchTV (Geospiza Inc., Seattle, WA, USA), and sequences were corrected where necessary. Poor-quality reads were excluded. [...] Sequences of HML-2 proviral loci, including remnants of proviral loci, were collected from the human genome sequence (Human Genome Browser, March 2006 version []) by BLAT searches [] using the HERV-K(HML-2.HOM) sequence [] as the query. Matching sequence portions plus flanking sequences were retrieved and aligned using DIALIGN [] and MAFFT [], and the alignment was optimized manually in Se-Al. Gag and env subregions were extracted from the alignment and used as reference sequences for further analysis. Locus-Assigner is an in-house Biopython script that assigns experimental HERV cDNA sequences to HERV reference sequences; it is available from the authors on request. Assignment of cDNA sequences to specific proviruses relied on private nucleotide differences between the various HML-2 sequences (Fig. ). Here, "private" denotes one or several nucleotides that are characteristic of one HML-2 locus when compared with all other loci. Ideally, a cDNA is identical to the proviral sequence that produced the original transcript but differs from all other proviral loci. Locus-Assigner takes as input an experimental cDNA sequence file and a reference sequence file, generates all possible pairs of experimental cDNA and reference sequences, and saves each pair as an individual FASTA file. It then builds a pairwise alignment for each cDNA/reference pair using CLUSTAL W and saves each alignment as a separate file. For each of these files, the total number of nucleotide differences between the aligned sequences is calculated. Mismatches caused by leading and trailing (end) gaps were ignored, because some cDNA and reference sequences lacked portions at the 5' or 3' end.
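The difference count and the notion of private nucleotides can be sketched in Python as follows. This is not the authors' Locus-Assigner code (available only on request); it assumes each pairwise alignment is already available as two equal-length gapped strings (for example, parsed from a CLUSTAL W `.aln` file), and it counts internal gaps as differences, which the text does not specify. The helper `private_sites` shows one simple reading of "private" over a multiple alignment of reference loci; its rules (gaps ignored, at least one other locus present at the column) are assumptions, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

GAP = "-"


def _core_bounds(seq: str) -> Tuple[int, int]:
    """Start (inclusive) and end (exclusive) of seq without leading/trailing gaps."""
    start = len(seq) - len(seq.lstrip(GAP))
    end = len(seq.rstrip(GAP))
    return start, end


def count_differences(aln_a: str, aln_b: str) -> int:
    """Count differing columns of a gapped pairwise alignment.

    Columns lying in a leading or trailing gap run of either sequence (end gaps)
    are ignored; internal gaps count as differences (an assumption).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    a_start, a_end = _core_bounds(aln_a)
    b_start, b_end = _core_bounds(aln_b)
    start, end = max(a_start, b_start), min(a_end, b_end)
    return sum(1 for i in range(start, end) if aln_a[i].upper() != aln_b[i].upper())


def private_sites(alignment: Dict[str, str]) -> Dict[str, List[int]]:
    """For each locus, list 0-based alignment columns where it carries a
    nucleotide that no other locus has at that column (gaps are ignored)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    result: Dict[str, List[int]] = {name: [] for name in names}
    for col in range(length):
        column = {name: alignment[name][col].upper() for name in names}
        for name, base in column.items():
            if base == GAP:
                continue
            others = {b for n, b in column.items() if n != name and b != GAP}
            # Require at least one other locus with sequence at this column.
            if others and base not in others:
                result[name].append(col)
    return result
```

For example, `count_differences("--ACGTAC-T", "TTACGAACGT")` ignores the two leading end-gap columns and counts one substitution plus one internal gap, giving 2. In practice, Biopython's `Bio.AlignIO.read(handle, "clustal")` can parse CLUSTAL W output into aligned records whose sequences could be passed to such a function.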
For each experimental sequence, Locus-Assigner then creates a tab-separated file summarizing the nucleotide differences detected relative to every reference sequence. Sorting these counts with appropriate software reveals, for each cDNA sequence, the most closely related reference locus, that is, the provirus that most likely generated the original transcript. The Locus-Assigner script was validated with a number of test sequences that were run against the HML-2 reference sequence dataset. Test sequences were derived either from genomic HML-2 sequences or from cDNA sequences with known numbers of nucleotide differences and 5', 3' and/or internal gaps. Locus-Assigner results were further verified by comparing them with BLAT results for the same test sequences at the UCSC Human Genome Browser, by visual inspection of alignments, and by checking several of the cDNA/HML-2 locus assignments made by Locus-Assigner. cDNA sequences with 18 or more mismatches to their best-matching reference sequence were excluded from further analysis, because they most likely represent recombinant cDNAs derived from different proviral transcripts that arose ex vivo during cDNA generation []. […]
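The summary and ranking step can be sketched as below, again as a hypothetical illustration rather than the published script. It assumes the per-reference difference counts for one cDNA are already collected in a dictionary, writes them as tab-separated rows sorted from fewest to most differences, and applies the stated exclusion rule (18 or more mismatches to the best match). How ties between equally close loci were resolved is not stated in the text; this sketch simply reports all tied loci. Locus names such as `locus_A` are placeholders.

```python
import csv
import io
from typing import Dict, List, Tuple

# Exclusion rule from the protocol: >= 18 mismatches to the best match.
EXCLUSION_THRESHOLD = 18


def rank_loci(diffs: Dict[str, int]) -> List[Tuple[str, int]]:
    """Reference loci sorted by nucleotide differences, fewest first (ties by name)."""
    return sorted(diffs.items(), key=lambda item: (item[1], item[0]))


def best_loci(diffs: Dict[str, int]) -> List[str]:
    """Loci tied for the fewest differences; empty if the cDNA is excluded
    (likely ex vivo recombinant) or no references were compared."""
    ranked = rank_loci(diffs)
    if not ranked or ranked[0][1] >= EXCLUSION_THRESHOLD:
        return []
    best = ranked[0][1]
    return [locus for locus, n in ranked if n == best]


def summary_tsv(cdna_id: str, diffs: Dict[str, int]) -> str:
    """Tab-separated summary of one cDNA versus all references, sorted by differences."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["cdna", "reference_locus", "nt_differences"])
    for locus, n in rank_loci(diffs):
        writer.writerow([cdna_id, locus, n])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical counts for one cDNA against three placeholder loci.
    counts = {"locus_B": 9, "locus_A": 1, "locus_C": 23}
    print(summary_tsv("cDNA_01", counts), end="")
    print("assigned to:", best_loci(counts))
```

With the hypothetical counts above, the rows are printed in the order locus_A (1), locus_B (9), locus_C (23), and the cDNA is assigned to `['locus_A']`; a cDNA whose best match had 18 or more differences would return an empty list and be excluded.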

Pipeline specifications

Software tools: CodonCode Aligner, FinchTV, BLAT, DIALIGN, MAFFT, Se-Al, CLUSTAL W
Applications: Sanger sequencing, Nucleotide sequence alignment, Genome data visualization
Organisms: Homo sapiens
Diseases: Neoplasms; Neoplasms, Germ Cell and Embryonal