Computational protocol: Germline viral “fossils” guide in silico reconstruction of a mid-Cenozoic era marsupial adeno-associated virus

Protocol publication

[…] For linker-mediated “genome walking” analysis, genomic DNA was extracted from kangaroo tissue samples (liver, muscle, and brain) using a FastPrep FP120 tissue homogeniser (ThermoSavant) with Lysing Matrix tubes (MP Biochemicals) according to the manufacturer’s instructions. Briefly, 50–75 mg of tissue was placed in a 2-ml tube containing MP Lysing Matrix A. To each tube, 360 μl of ATL buffer with proteinase K (DNeasy Blood & Tissue Kit; Qiagen) was added. Tissue was homogenised for 40 seconds at setting 5, followed by centrifugation at 10,000 × g for 2 minutes to collect fluid. Samples were incubated at 56 °C for 1 hour, and then centrifuged at 10,000 × g for 1 minute. Following centrifugation, 200 μl of the tissue homogenate was applied to a DNeasy column (Qiagen) and processed following the manufacturer’s instructions for animal tissue. To obtain endogenous AAV sequences, approximately 0.4 μg of genomic liver DNA was subjected to PCR amplification with Platinum Taq PCR SuperMix (Invitrogen) and combinations of previously reported primer pairs recognising conserved regions of the AAV genome (primers SIG+ and SIG− and primers AA55 and AA56). The thermal cycling conditions were 94 °C for 5 minutes followed by 35 cycles of 94 °C for 30 seconds, 55 °C for 30 seconds, and 72 °C for 30 seconds, with a final 5-minute extension at 72 °C. A positive PCR result was obtained from liver DNA using the following primer pair: “AA55” 5′-GTGCCCTTCTACGGCTGCGTCAACTGGACCAATGAGAACTTTCC-3′ and “SIG−” 5′-GAATCCCCAGTTGTTGTTGATGAGTC-3′. Upon identification of an endogenous AAV “anchor sequence”, linker-mediated “genome walking” was performed using the GenomeWalker Universal Kit (Clontech) according to the manufacturer’s instructions. Briefly, kangaroo liver DNA (2.5 μg) was digested overnight in separate 100-μl reactions containing individual restriction endonucleases (e.g., DraI, NruI, ScaI, or StuI) to yield blunt-ended genomic DNA fragments.
The restriction enzyme digest was then heat inactivated at 70 °C for 10 minutes, and the genomic DNA fragments were partially purified using a PCR Kleen Spin column (BioRad). A GenomeWalker kit-provided adapter was ligated to enzyme-digested genomic DNA fragments overnight at 16 °C using T4 DNA ligase. Following heat inactivation (70 °C for 10 minutes), residual adapters were removed by passage over a PCR Kleen Spin column (BioRad) according to the manufacturer’s instructions. Following nested PCR using unique sequence/adapter-specific primer pairs, amplified PCR products were “TA-cloned” into pCR4-TOPO (Invitrogen). Following bacterial transformation and antibiotic marker selection, individual colonies were expanded in small-scale liquid culture, and plasmid DNA was isolated for sequencing using a QIAprep Spin Kit (Qiagen).

Genomic DNA for amplification across the mAAV-EVE1 locus was extracted from liver tissue samples using either a Gentra Puregene Tissue Kit (Qiagen) or a Blood & Cell Culture DNA Kit with Genomic-tip 100/G (Qiagen), in each case following the manufacturer’s protocols for DNA extraction from tissues. Amplification of the locus containing mAAV-EVE1 was initially achieved using a forward primer (AAV-EVE_flank_up, targeting the upstream flanking region: 5′-GATGTTTACAGATTAGTRTTKYATCATCAGTGCTATTTYCYCWCAAWRARRATYCC-3′) containing multiple degenerate positions to accommodate phylogenetically diverse marsupials, and a reverse primer (AAV-EVE_flank_dwn, targeting the downstream flanking region: 5′-AGGGAGAGTACCTATTATCTTAATTACTGTCAGACC-3′). The forward primer includes a 5′ non-homologous tail to facilitate reamplification. These primers amplified the locus (irrespective of its mAAV-EVE1 occupancy) from all sampled marsupials. Later, some macropodiform mAAV-EVE1 loci were amplified using a forward primer without degenerate positions (Macr(−335)flank_up: 5′-CCTGGAATTTGTGGGTGGAAACAATGATCC-3′), specifically targeted to Macropodiformes.
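The degenerate forward primer described above can be inspected programmatically. The following is a minimal sketch, assuming the standard IUPAC nucleotide ambiguity codes; the function names are illustrative and are not part of the published protocol. It lists the degenerate positions in AAV-EVE_flank_up and counts how many distinct oligonucleotides the primer pool represents.

```python
from math import prod

# Number of concrete bases each IUPAC nucleotide code can stand for.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

# Forward primer sequence as printed in the protocol text.
AAV_EVE_FLANK_UP = "GATGTTTACAGATTAGTRTTKYATCATCAGTGCTATTTYCYCWCAAWRARRATYCC"


def degenerate_positions(primer):
    """Return (1-based position, code) for every ambiguous base."""
    return [(i + 1, base) for i, base in enumerate(primer.upper())
            if IUPAC_CHOICES[base] > 1]


def degeneracy(primer):
    """Return the number of distinct sequences encoded by the primer."""
    return prod(IUPAC_CHOICES[base] for base in primer.upper())


if __name__ == "__main__":
    print(degenerate_positions(AAV_EVE_FLANK_UP))
    print(degeneracy(AAV_EVE_FLANK_UP))
```

For the primer as printed, every ambiguity code is twofold (R, Y, K, or W), so the pool size is a power of two: 11 degenerate positions give 2^11 = 2048 variants.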
Amplifications were carried out using the Expand Long Template PCR System (Roche) or a LongRange PCR Kit (Qiagen) according to the manufacturers’ instructions. Amplicons were gel-extracted using a Wizard SV Gel and PCR Clean-Up System (Promega) and cloned using the TOPO TA Cloning Kit for Sequencing, the Zero Blunt TOPO PCR Cloning Kit for Sequencing (Life Technologies; both in conjunction with One Shot TOP10 Chemically Competent E. coli cells), or the pGEM-T Easy Vector System I (Promega; in conjunction with XL10-Gold Ultracompetent Cells [Agilent]). Sanger sequencing of cloned inserts utilised primers directed against the cloning vectors, as well as internal, amplicon-specific primers, using an AB 3730xl instrument (Australian Genome Research Facility). A portion of the cloned mAAV-EVE1 sequences and “empty loci” were amplified using Platinum PCR SuperMix High Fidelity (Invitrogen) in conjunction with primers AAV-EVE_flank_up and AAV-EVE_flank_dwn. Amplified sequences were cloned by direct addition of a portion of the final PCR to TOPO-activated pCR4-TOPO (Invitrogen) without prior gel purification. Cloned fragments were transformed into bacterial strain DH10B (Invitrogen) by electroporation following desalting in a BioRad PCR Kleen Spin column according to the manufacturer’s instructions. The cloned amplicons were sequenced at the U.S. Food and Drug Administration (FDA) Bethesda campus core facility, or by a commercial vendor. [...]

A maximum likelihood algorithm, as implemented in MEGA6.06, was used to infer ancestral mAAV-EVE1 nucleotide sequences from a multiple sequence alignment of sixteen macropodoid mAAV-EVE1 loci (M. robustus, M. rufus, M. parma, M. giganteus, M. eugenii, M. rufogriseus, O. unguifera, S. brachyurus, D. goodfellowi, D. matschiei, P. lateralis, T. stigmatica, L. fasciatus, A. rufescens, P. tridactylus, and H. moschatus), with the inclusion of “empty” mAAV-EVE1 loci from related non-macropodoid marsupials (P. breviceps, P. peregrinus, S.
maculatus, T. vulpecula, L. latifrons, P. cinereus, D. marsupialis, and M. domestica) serving as an outgroup. The relevant sequences determined as part of this study have been assigned GenBank accession numbers KX239848–KX239872. Briefly, the twenty-four-member dataset was aligned using the MUSCLE multiple sequence alignment algorithm with default settings (gap open penalty = −400; gap extend penalty = 0; clustering method (all iterations) = UPGMB; minimum diagonal length (lambda) = 24). An isolated alignment of eight outgroup nucleotides (IUPAC nucleotide code: KGRTHACY) extant within the “empty locus” sequences (most likely representing nucleotides lost from the stem-macropodoid locus during the exogenous AAV integration event) was removed from the alignment. A short heterogeneous region of predominantly reiterated guanosine residues occurring within the 5′ portion of the mAAV-EVE rep gene was manually aligned. The most appropriate nucleotide substitution model was determined using the “Find Best DNA/Protein Models” function in MEGA, which evaluates the maximum likelihood fit of twenty-four evolutionary models to the data. For the mAAV-EVE1 dataset, the Tamura 3-parameter model with heterogeneity of substitution rates among sites modelled via a discrete Gamma distribution with five rate categories (i.e., T92 + G) gave the lowest Bayesian Information Criterion score (33153.754), and was chosen as the best nucleotide substitution model for further analysis. An mAAV-EVE1 evolutionary tree was constructed in MEGA using the maximum likelihood (ML) method (substitution model = T92 + G; gaps/missing data treatment = use all sites; ML heuristic method = nearest-neighbor interchange; initial tree for ML inference was generated automatically by maximum parsimony analysis). Most probable ancestral sequences at each node of the ML tree were exported as a “Detailed Text Export” file from the MEGA6 Tree Explorer module.
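The model-selection step above ranks candidate substitution models by their Bayesian Information Criterion (BIC) and keeps the lowest score. A minimal sketch of that ranking follows, using the textbook definition BIC = −2 ln L + k ln n. The log-likelihoods, parameter counts, and site count below are invented toy values for illustration only; they are not the values from this study, and MEGA's own parameter and sample-size conventions may differ from this simplified form.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Textbook BIC: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def best_model(fits, n_sites):
    """Return (name of lowest-BIC model, dict of all BIC scores).

    `fits` maps a model name to (log-likelihood, number of free parameters).
    """
    scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fits for a toy alignment (NOT data from the protocol).
TOY_FITS = {
    "JC": (-1050.0, 5),
    "T92": (-1020.0, 7),
    "T92+G": (-1000.0, 8),
    "GTR+G": (-998.0, 14),
}
TOY_N_SITES = 500

if __name__ == "__main__":
    winner, scores = best_model(TOY_FITS, TOY_N_SITES)
    for name, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name:6s} BIC = {score:.2f}")
    print("selected:", winner)
```

The toy numbers illustrate the trade-off the criterion encodes: the most parameter-rich model has the best likelihood, but its extra parameters are penalised, so an intermediate model wins.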
The most probable ancestral sequences were extracted from the Detailed Text Export file using the command-line utility program, ExtAncSeqMEGA.exe. Due to genomic sequence deletions within mAAV-EVE1 loci occurring within the macropodoid basal taxon, H. moschatus, as well as the potoroids (A. rufescens and P. tridactylus), the full-length inferred mAAV-EVE1 sequence used for ancestral AAV modelling was derived from node 39 of the mAAV-EVE1 evolutionary tree, occurring at the split between Lagostrophus fasciatus and the remainder of the Macropodidae at approximately 13.8 MYA. Reading frames encoding the rep and cap genes within the inferred ancestral sequence were identified by a BLAST search of translated nucleotide databases (tblastx) for significant homology to extant AAV proteins using an ancestral mAAV-EVE1 query sequence. Using homology among the translated mAAV-EVE1 ORFs and extant dependoparvovirus protein sequence alignments as a guide, the raw mAAV-EVE1 ancestral nucleotide sequence was manually edited for frameshifts, nonsense codons, and indels. In all but one instance, frameshifts within the “raw” inferred mAAV-EVE1 nucleotide sequence could be resolved by correction based upon a non-frameshifted member of the dataset. A frameshift occurring within mAAV-EVE1 rep codon 242 was corrected by arbitrary insertion of a dinucleotide sequence (TT). At various positions, the inferred mAAV-EVE1 ancestral sequence (Node 39 sequence) was manually edited to give precedence to nucleotides encoding amino acid residues among one or more mAAV-EVE1 sequences homologous with highly conserved extant AAV protein residues. [...]

Protein and nucleotide sequences were downloaded from the NCBI and ENSEMBL websites. Similarity searches were performed using the non-redundant protein sequence database at the NCBI and the BLAST program. Multiple nucleotide and protein sequence alignments were constructed using the MUSCLE program and then adjusted manually.
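The manual editing for frameshifts and nonsense codons described above can be supported by a simple in-frame stop-codon scan. The sketch below is an illustration under stated assumptions: the sequences are short invented toy examples (not mAAV-EVE1 data), the helper names are hypothetical, and the dinucleotide insertion only mimics the kind of arbitrary "TT" repair mentioned in the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq, frame=0):
    """Return 0-based nucleotide offsets of stop codons in the given frame."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


def insert_bases(seq, position, bases):
    """Insert `bases` before the 0-based `position` in `seq`."""
    return seq[:position] + bases + seq[position:]


def has_single_terminal_stop(seq):
    """True if the length is a multiple of 3 and the only stop is the last codon."""
    return len(seq) % 3 == 0 and in_frame_stops(seq) == [len(seq) - 3]


# Toy reading frame that has lost two bases after its second codon, so the
# downstream codons are read out of frame and hit a premature TGA.
TOY_RAW = "ATGGCCACCTGAAGGTTAA"

# Restore the frame with an arbitrary dinucleotide insertion after codon 2.
TOY_REPAIRED = insert_bases(TOY_RAW, 6, "TT")

if __name__ == "__main__":
    print(in_frame_stops(TOY_RAW))        # premature stop inside the frame
    print(in_frame_stops(TOY_REPAIRED))   # only the terminal stop remains
    print(has_single_terminal_stop(TOY_REPAIRED))
```

An arbitrary insertion restores the reading frame but not necessarily the original codon, which is why the protocol prefers correction from a non-frameshifted member of the dataset wherever one exists.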
For data analysis, phylogenetic trees based on multiple alignments were constructed using the maximum-likelihood, neighbor-joining, minimum-evolution and maximum-parsimony methods as implemented in MEGA and FastTree 2. An optimal model of substitutions for phylogenetic reconstructions was chosen using the MEGA5 program. Viewing the mAAV-EVE1 locus as a macropodoid gene, a maximum likelihood phylogenetic tree was constructed using mAAV-EVE1 sequence data from sixteen macropodoid species and compared to a maximum likelihood tree of the same species constructed using alignment of a set of seven concatenated exonic gene segments (ApoB, BRCA1, IRBP, Rag1, vWF, Protamine P1 and omega globin genes) available in GenBank.

Phylogenetic analyses were carried out on the concatenated data set using maximum likelihood methodologies implemented in RAxML v7.2.8 and PAUP* 4.0b10 and Bayesian inference as implemented in MrBayes 3.2 with the data treated either as a single unpartitioned block or partitioned by gene (and mAAV-EVE1). This approach allowed each gene partition to have its own model of sequence evolution as determined by the Akaike Information Criterion in jModelTest. The GTR + Γ + I model was chosen for the unpartitioned data. Node support was estimated by 1,000 bootstrap pseudo-replications for RAxML and PAUP*. Bayesian analyses utilised random starting trees and two simultaneous runs of four Markov chains (one cold and three heated using default heating values) applied for 5 × 10⁶ generations with sampling every 1,000th generation. The first 1.25 × 10⁶ generations were discarded from each run as burn-in. The remaining trees were used to construct a majority-rule consensus with posterior probabilities >0.95 deemed as strong support. We evaluated the fit of our data to alternative phylogenetic relationships using the SH and KH tests implemented in PAUP* 4.0b10. [...]
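The Bayesian sampling settings above (two runs, 5 × 10⁶ generations, sampling every 1,000th generation, first 1.25 × 10⁶ generations discarded) imply a fixed number of post-burn-in trees. The following sketch works through that arithmetic. It assumes samples fall on exact multiples of the sampling interval and that everything up to and including the burn-in boundary is discarded; the exact handling of the generation-0 sample and the boundary sample depends on the software, so treat the counts as approximate by one per run.

```python
GENERATIONS = 5_000_000
SAMPLE_EVERY = 1_000
BURNIN_GENERATIONS = 1_250_000
RUNS = 2


def retained_samples_per_run(generations, sample_every, burnin_generations):
    """Count sampled generations strictly after the burn-in boundary."""
    return (generations - burnin_generations) // sample_every


def min_trees_for_support(pooled_trees, percent=95):
    """Smallest tree count whose frequency is strictly above `percent` %."""
    return pooled_trees * percent // 100 + 1


per_run = retained_samples_per_run(GENERATIONS, SAMPLE_EVERY, BURNIN_GENERATIONS)
pooled = per_run * RUNS
burnin_fraction = BURNIN_GENERATIONS / GENERATIONS

if __name__ == "__main__":
    print("retained per run:", per_run)
    print("pooled over runs:", pooled)
    print("burn-in fraction:", burnin_fraction)
    print("trees needed for posterior > 0.95:", min_trees_for_support(pooled))
```

Under these assumptions the burn-in is 25% of each run, and a clade must appear in more than 95% of the pooled post-burn-in trees to meet the stated threshold for strong support.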
The inferred mAAV-EVE1 VP3 sequence was used to generate a 3D structure model with the AAV8 VP3 structure coordinates (RCSB PDB accession No. 2QA0) supplied as a reference template to the SWISS-MODEL online 3D modelling server. A comparison of the mAAV-EVE1 model to the AAV4 VP3 structure (RCSB PDB accession No. 2G8G) to identify variable regions (VRs) was conducted using the secondary structure matching (SSM) subroutine within PDBeFold. VP3 VRs were defined as stretches of two or more sequential Cα positions that are >1 Å apart in the superposed structures (as previously defined in Govindasamy et al.). The structures were visualised in the COOT program for further comparison of the VRs between mAAV-EVE1, AAV2, AAV4, and AAV8. To enable description of the assembled mAAV-EVE1 capsid, the VP3 monomer model was used to generate a 60mer by icosahedral matrix multiplication in the VIPERdb online server. The VP3 and 60mer coordinates were used to generate secondary structure and capsid surface images, respectively, using the PyMOL program. […]
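The VR criterion above (two or more sequential Cα positions more than 1 Å apart) amounts to finding runs in a list of per-residue distances. A minimal sketch follows, assuming the Cα–Cα distances between equivalent residues have already been obtained from a structural superposition; the distance values are invented for illustration and the function name is hypothetical.

```python
def variable_regions(distances, cutoff=1.0, min_length=2):
    """Return (start, end) index pairs, inclusive and 0-based, for every run
    of at least `min_length` consecutive distances greater than `cutoff`."""
    regions = []
    start = None
    for i, distance in enumerate(distances):
        if distance > cutoff:
            if start is None:
                start = i  # a new candidate run begins here
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the list.
    if start is not None and len(distances) - start >= min_length:
        regions.append((start, len(distances) - 1))
    return regions


# Hypothetical per-residue Calpha-Calpha distances in angstroms (toy data).
TOY_DISTANCES = [0.3, 1.4, 0.5, 1.2, 1.8, 2.5, 0.4, 1.1, 1.3]

if __name__ == "__main__":
    print(variable_regions(TOY_DISTANCES))
```

In the toy data, the isolated deviation at index 1 is ignored because it is a single position, while the runs at indices 3 to 5 and 7 to 8 qualify as variable regions.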

Pipeline specifications