Update of the Anopheles gambiae PEST genome assembly
An update on the Anopheles gambiae PEST genome assembly places about 33% of previously unmapped sequences on the chromosomes.
Background: The genome of Anopheles gambiae, the major vector of malaria, was sequenced and assembled in 2002. This initial genome assembly and analysis made available to the scientific community was complicated by the presence of assembly issues, such as scaffolds with no chromosomal location, no sequence data for the Y chromosome, haplotype polymorphisms resulting in two different genome assemblies in limited regions and contaminating bacterial DNA.
Results: Polytene chromosome in situ hybridization with cDNA clones was used to place 15 unmapped scaffolds (sizes totaling 5.34 Mbp) in the pericentromeric regions of the chromosomes and to orient a further 9 scaffolds. Additional analysis by in situ hybridization of bacterial artificial chromosome (BAC) clones placed 1.32 Mbp (5 scaffolds) in the physical gaps between scaffolds on euchromatic parts of the chromosomes. The Y chromosome sequence information (0.18 Mbp) remains highly incomplete and fragmented among 55 short scaffolds. Analysis of BAC end sequences showed that 22 inter-scaffold gaps were spanned by BAC clones. Unmapped scaffolds were also aligned to the chromosome assemblies in silico, identifying regions totaling 8.18 Mbp (144 scaffolds) that are probably represented in the genome project by two alternative assemblies. An additional 3.53 Mbp of alternative assembly was identified within mapped scaffolds. Scaffolds comprising 1.97 Mbp (679 small scaffolds) were identified as probably derived from contaminating bacterial DNA. In total, about 33% of previously unmapped sequences were placed on the chromosomes.
Conclusion: This study has used new approaches to improve the physical map and assembly of the A. gambiae genome.
Genome sequence of the date palm Phoenix dactylifera L
The date palm is one of the most economically important plants of the palm family. Here, the authors present a high-quality genome assembly of the date palm Phoenix dactylifera, and reveal insights into the unique sugar metabolism underlying fruit ripening.
Date palm (Phoenix dactylifera L.) is a cultivated woody plant species with agricultural and economic importance. Here we report a genome assembly for an elite variety (Khalas), which is 605.4 Mb in size and covers >90% of the genome (~671 Mb) and >96% of its genes (~41,660 genes). Genomic sequence analysis demonstrates that P. dactylifera experienced a clear genome-wide duplication after either ancient whole genome duplications or massive segmental duplications. Genetic diversity analysis indicates that its stress resistance and sugar metabolism-related genes tend to be enriched in the chromosomal regions where the density of single-nucleotide polymorphisms is relatively low. Using transcriptomic data, we also illustrate the date palm’s unique sugar metabolism that underlies fruit development and ripening. Our large-scale genomic and transcriptomic data pave the way for further genomic studies not only on P. dactylifera but also other Arecaceae plants.
Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence
Background: Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation.
Results: Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised.
Conclusions: Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes.
Complete genome sequence of Mesorhizobium ciceri bv. biserrulae type strain (WSM1271T)
Mesorhizobium ciceri bv. biserrulae strain WSM1271T was isolated from root nodules of the pasture legume Biserrula pelecinus growing in the Mediterranean basin. Previous studies have shown this aerobic, motile, Gram-negative, non-spore-forming rod preferably nodulates B. pelecinus – a legume with many beneficial agronomic attributes for sustainable agriculture in Australia. We describe the genome of Mesorhizobium ciceri bv. biserrulae strain WSM1271T consisting of a 6,264,489 bp chromosome and a 425,539 bp plasmid that together encode 6,470 protein-coding genes and 61 RNA-only encoding genes.
Complete Genome Sequences of a Mycobacterium smegmatis Laboratory Strain (MC2 155) and Isoniazid Resistant (4XR1/R2) Mutant Strains
We report the whole genome sequences of a Mycobacterium smegmatis laboratory wild-type strain (MC2 155) and mutants (4XR1, 4XR2) resistant to isoniazid. Compared to Mycobacterium smegmatis MC2 155 (NC_008596), a widely used strain in laboratory experiments, the MC2 155, 4XR1, and 4XR2 strains are 60, 128 and 93 bp longer, respectively.
Transcriptional response of the model planctomycete Rhodopirellula baltica SH1T to changing environmental conditions
Background: The marine model organism Rhodopirellula baltica SH1T was the first Planctomycete to have its genome completely sequenced. The genome analysis predicted a complex lifestyle and a variety of genetic opportunities to adapt to the marine environment. Its adaptation to environmental stressors was studied by transcriptional profiling using a whole genome microarray.
Results: Stress responses to salinity and temperature shifts were monitored in time series experiments. Chemostat cultures grown in mineral medium at 28°C were compared to cultures that were shifted to either elevated (37°C) or reduced (6°C) temperatures as well as high salinity (59.5‰) and observed over 300 min. Heat shock showed the induction of several known chaperone genes. Cold shock altered the expression of genes in lipid metabolism and stress proteins. High salinity resulted in the modulation of genes coding for compatible solutes, ion transporters and morphology. In summary, over 3000 of the 7325 genes were affected by temperature and/or salinity changes.
Conclusion: Transcriptional profiling confirmed that R. baltica is highly responsive to its environment. The distinct responses identified here have provided new insights into the complex adaptation machinery of this environmentally relevant marine bacterium. Our transcriptome study and previous proteome data suggest a set of genes of unknown functions that are most probably involved in the global stress response. This work lays the foundation for further bioinformatic and genetic studies which will lead to a comprehensive understanding of the biology of a marine Planctomycete.
Complete genome sequence of Rhodospirillum rubrum type strain (S1T)
Rhodospirillum rubrum (Esmarch 1887) Molisch 1907 is the type species of the genus Rhodospirillum, which is the type genus of the family Rhodospirillaceae in the class Alphaproteobacteria. The species is of special interest because it is an anoxygenic phototroph that produces extracellular elemental sulfur (instead of oxygen) while harvesting light. It contains one of the simplest photosynthetic systems currently known, lacking light harvesting complex 2. Strain S1T can grow on carbon monoxide as sole energy source. With currently over 1,750 PubMed entries, R. rubrum is one of the most intensively studied microbial species, in particular for physiological and genetic studies. Next to R. centenum strain SW, the genome sequence of strain S1T is only the second genome of a member of the genus Rhodospirillum to be published, but the first type strain genome from the genus. The 4,352,825 bp long chromosome and 53,732 bp plasmid with a total of 3,850 protein-coding and 83 RNA genes were sequenced as part of the DOE Joint Genome Institute Program DOEM 2002.
Complete genome sequence of the Medicago microsymbiont Ensifer (Sinorhizobium) medicae strain WSM419
Ensifer (Sinorhizobium) medicae is an effective nitrogen-fixing microsymbiont of a diverse range of annual Medicago (medic) species. Strain WSM419 is an aerobic, motile, non-spore-forming, Gram-negative rod isolated from an M. murex root nodule collected in Sardinia, Italy in 1981. WSM419 was manufactured commercially in Australia as an inoculant for annual medics during 1985 to 1993 due to its nitrogen fixation, saprophytic competence and acid tolerance properties. Here we describe the basic features of this organism, together with the complete genome sequence and annotation. This is the first report of a complete genome sequence for a microsymbiont of the group of annual medic species adapted to acid soils. We reveal that its genome size is 6,817,576 bp encoding 6,518 protein-coding genes and 81 RNA-only encoding genes. The genome contains a chromosome of size 3,781,904 bp and 3 plasmids of size 1,570,951 bp, 1,245,408 bp and 219,313 bp. The smallest plasmid is a feature unique to this medic microsymbiont.
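The replicon sizes quoted in the Ensifer medicae WSM419 abstract above can be checked against the stated genome size with simple arithmetic. The following is a minimal sketch only: the figures are copied from the abstract, while the dictionary keys and variable names are illustrative labels, not identifiers used by the authors.

```python
# Replicon sizes in bp, as quoted in the abstract.
# The key names are illustrative labels, not the authors' replicon names.
replicons = {
    "chromosome": 3_781_904,
    "plasmid_1": 1_570_951,
    "plasmid_2": 1_245_408,
    "plasmid_3": 219_313,
}
reported_total = 6_817_576  # genome size stated in the abstract

total = sum(replicons.values())
print(f"sum of replicons: {total:,} bp")
print("matches reported genome size:", total == reported_total)

# Share of the genome carried by each replicon.
for name, size in replicons.items():
    print(f"{name}: {100 * size / total:.1f}% of the genome")
```

The same kind of check can be applied to any of the other genome abstracts in this listing that report per-replicon sizes.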
Complete genome sequence of Enterococcus faecium strain TX16 and comparative genomic analysis of Enterococcus faecium genomes
Background: Enterococci are among the leading causes of hospital-acquired infections in the United States and Europe, with Enterococcus faecalis and Enterococcus faecium being the two most common species isolated from enterococcal infections. In the last decade, the proportion of enterococcal infections caused by E. faecium has steadily increased compared to other Enterococcus species. Although the underlying mechanism for the gradual replacement of E. faecalis by E. faecium in the hospital environment is not yet understood, many studies using genotyping and phylogenetic analysis have shown the emergence of a globally dispersed polyclonal subcluster of E. faecium strains in clinical environments. Systematic study of the molecular epidemiology and pathogenesis of E. faecium has been hindered by the lack of closed, complete E. faecium genomes that can be used as references.
Results: In this study, we report the complete genome sequence of the E. faecium strain TX16, also known as DO, which belongs to multilocus sequence type (ST) 18, and was the first E. faecium strain ever sequenced. Whole genome comparison of the TX16 genome with 21 E. faecium draft genomes confirmed that most clinical, outbreak, and hospital-associated (HA) strains (including STs 16, 17, 18, and 78), in addition to strains of non-hospital origin, group in the same clade (referred to as the HA clade) and are evolutionally considerably more closely related to each other by phylogenetic and gene content similarity analyses than to isolates in the community-associated (CA) clade with approximately a 3–4% average nucleotide sequence difference between the two clades at the core genome level. Our study also revealed that many genomic loci in the TX16 genome are unique to the HA clade. 380 ORFs in TX16 are HA-clade specific and antibiotic resistance genes are enriched in HA-clade strains. Mobile elements such as IS16 and transposons were also found almost exclusively in HA strains, as previously reported.
Conclusions: Our findings along with other studies show that HA clonal lineages harbor specific genetic elements as well as sequence differences in the core genome which may confer selection advantages over the more heterogeneous CA E. faecium isolates. Which of these differences are important for the success of specific E. faecium lineages in the hospital environment remains to be determined.
Discriminating the reaction types of plant type III polyketide synthases
Motivation: Functional prediction of paralogs is challenging in bioinformatics because of rapid functional diversification after gene duplication events combined with parallel acquisitions of similar functions by different paralogs. Plant type III polyketide synthases (PKSs), producing various secondary metabolites, represent a paralogous family that has undergone gene duplication and functional alteration. Currently, there is no computational method available for the functional prediction of type III PKSs.
Results: We developed a plant type III PKS reaction predictor, pPAP, based on the recently proposed classification of type III PKSs. pPAP combines two kinds of similarity measures: one calculated by profile hidden Markov models (pHMMs) built from functionally and structurally important partial sequence regions, and the other based on mutual information between residue positions. pPAP targets PKSs acting on ring-type starter substrates, and classifies their functions into four reaction types. The pHMM approach discriminated two reaction types with high accuracy (97.5%, 39/40), but its accuracy decreased when discriminating three reaction types (87.8%, 43/49). When combined with a correlation-based approach, all 49 PKSs were correctly discriminated, and pPAP was still highly accurate (91.4%, 64/70) even after adding other reaction types. These results suggest pPAP, which is based on linear discriminant analyses of similarity measures, is effective for plant type III PKS function prediction.
Availability and Implementation: pPAP is freely available at ftp://ftp.genome.jp/pub/tools/ppap/
Supplementary information: Supplementary data are available at Bioinformatics online.
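The accuracy figures quoted in the pPAP abstract above follow directly from the correct/total counts given in parentheses. As a minimal sketch (the labels and the helper function name are illustrative and not part of pPAP itself), the percentages can be recomputed like this:

```python
# (correct, total) pairs as quoted in the abstract; labels are illustrative.
results = {
    "pHMM, two reaction types": (39, 40),
    "pHMM, three reaction types": (43, 49),
    "pPAP, other reaction types added": (64, 70),
}


def accuracy_percent(correct: int, total: int) -> float:
    """Return classification accuracy as a percentage, rounded to one decimal."""
    return round(100 * correct / total, 1)


for label, (correct, total) in results.items():
    print(f"{label}: {accuracy_percent(correct, total)}% ({correct}/{total})")
```

Rounding to one decimal place reproduces the 97.5%, 87.8% and 91.4% reported in the abstract.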
Hierarchical probabilistic models for multiple gene/variant associations based on next generation sequencing data
Motivation: The identification of genetic variants influencing gene expression (known as expression quantitative trait loci or eQTLs) is important in unravelling the genetic basis of complex traits. Detecting multiple eQTLs simultaneously in a population based on paired DNA-seq and RNA-seq assays employs two competing types of models: models which rely on appropriate transformations of RNA-seq data (and are powered by a mature mathematical theory), or count-based models, which represent digital gene expression explicitly, thus rendering such transformations unnecessary. The latter constitutes an immensely popular methodology, which is however plagued by mathematical intractability.
Results: We develop tractable count-based models, which are amenable to efficient estimation through the introduction of latent variables and the appropriate application of recent statistical theory in a sparse Bayesian modelling framework. Furthermore, we examine several transformation methods for RNA-seq read counts and we introduce arcsin, logit and Laplace smoothing as preprocessing steps for transformation-based models. Using natural and carefully simulated data from the 1000 Genomes and gEUVADIS projects, we benchmark both approaches under a variety of scenarios, including the presence of noise and violation of basic model assumptions. We demonstrate that an arcsin transformation of Laplace-smoothed data is at least as good as state-of-the-art models, particularly at small samples. Furthermore, we show that an over-dispersed Poisson model is comparable to the celebrated Negative Binomial, but much easier to estimate. These results provide strong support for transformation-based versus count-based (particularly Negative-Binomial-based) models for eQTL mapping.
Availability and implementation: All methods are implemented in the free software eQTLseq: https://github.com/dvav/eQTLseq
Supplementary information: Supplementary data are available at Bioinformatics online.
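To make the phrase "arcsin transformation of Laplace-smoothed data" in the eQTL abstract above more concrete, here is a minimal sketch of one common reading of it. The abstract does not define the exact formulas, so everything here is an assumption: add-one (Laplace) smoothing of per-sample read counts into proportions, followed by the classical arcsine square-root transform. The function names and example counts are hypothetical and this is not the eQTLseq implementation.

```python
import math


def laplace_smooth(counts, alpha=1.0):
    """Convert raw read counts to proportions using add-alpha (Laplace) smoothing.

    Assumption: smoothing is applied within one sample across genes.
    """
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def arcsin_transform(proportions):
    """Arcsine square-root transform, a classical variance stabiliser for proportions."""
    return [math.asin(math.sqrt(p)) for p in proportions]


# Hypothetical read counts for five genes in a single sample.
counts = [0, 3, 12, 150, 2000]
smoothed = laplace_smooth(counts)
transformed = arcsin_transform(smoothed)

for c, p, t in zip(counts, smoothed, transformed):
    print(f"count={c:5d}  smoothed={p:.5f}  arcsin_sqrt={t:.4f}")
```

Under these assumptions, smoothing keeps zero counts away from a proportion of exactly zero, so every gene receives a strictly positive transformed value and the ordering of the counts is preserved.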
IMG 4 version of the integrated microbial genomes comparative analysis system
The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).
Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper
Abstract Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We, therefore, developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology (GO) predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false positive assignments by 11%, and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared with InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. EggNOG-mapper predictions scored within the top-5 methods in the three GO categories using the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, yielding better performance than InterProScan. eggNOG-mapper runs ∼15× faster than BLAST and at least 2.5× faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.
Discovery of physiological and cancer-related regulators of 3′ UTR processing with KAPAC
3′ untranslated region (3′ UTR) length is regulated in relation to cellular state. To uncover key regulators of poly(A) site use in specific conditions, we have developed PAQR, a method for quantifying poly(A) site use from RNA sequencing data, and KAPAC, an approach that infers activities of oligomeric sequence motifs on poly(A) site choice. Application of PAQR and KAPAC to RNA sequencing data from normal and tumor tissue samples uncovers motifs that can explain changes in cleavage and polyadenylation in specific cancers. In particular, our analysis points to polypyrimidine tract binding protein 1 as a regulator of poly(A) site choice in glioblastoma. Electronic supplementary material The online version of this article (10.1186/s13059-018-1415-3) contains supplementary material, which is available to authorized users.
A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa
Author summary scRNA-seq enables detailed profiling of heterogeneous cell populations and can be used to reveal lineage relationships or discover new cell types. In the literature, there has been little effort directed towards developing computational methods for cross-population transcriptome analysis of multiple single-cell populations. The cross-cell-population clustering problem is different from the traditional clustering problem because single-cell populations can be collected from different patients, different samples of a tissue, or different experimental replicates. The accompanying biological and technical variation tends to dominate the signals for clustering the pooled single cells from the multiple populations. In this work, we have developed a multitask clustering method to address the cross-population clustering problem. The method simultaneously clusters each individual cell population and controls variance among the cell-type cluster centers within each cell population and across the cell populations. We demonstrate that our multitask clustering method significantly improves clustering accuracy and marker discovery in three public scRNA-seq datasets and also apply the method to an in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) dataset. Our results make it evident that multitask clustering is a promising new approach for cross-population analysis of scRNA-seq data.
Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Since scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases compared to bulk RNA-seq experiments, the limited number of sampled cells combined with the experimental biases and other dataset specific variations presents a challenge to cross-dataset analysis and discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns such that the scRNA-seq data are better integrated than typical pooled analyses which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code available at https://github.com/kuanglab/scVDMC.
Rapid de novo assembly of the European eel genome from nanopore sequencing reads
We have sequenced the genome of the endangered European eel using the MinION by Oxford Nanopore, and assembled these data using a novel algorithm specifically designed for large eukaryotic genomes. For this 860 Mbp genome, the entire computational process takes two days on a single CPU. The resulting genome assembly significantly improves on a previous draft based on short reads only, both in terms of contiguity (N50 1.2 Mbp) and structural quality. This combination of affordable nanopore sequencing and lightweight assembly promises to make high-quality genomic resources accessible for many non-model plants and animals.
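The contiguity figure quoted in this abstract is an N50: the length of the contig or scaffold at which, going from longest to shortest, the running total first reaches half of the assembly size. The sketch below shows that calculation on a toy list of lengths; the function name `n50` and the toy values are illustrative and are not taken from the eel assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly size defines the N50.
        if running >= half:
            return length
    raise ValueError("no lengths supplied")


toy_contigs = [80, 70, 50, 40, 30, 20, 10]   # total 300, half 150
toy_n50 = n50(toy_contigs)                    # 80 + 70 reaches 150
```

A higher N50 for the same total assembly size means that more of the sequence sits in long, unbroken pieces.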
Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia
Author summary Understanding the extent to which species or diverged populations hybridize in nature is crucially important if we are to understand the speciation process. Accordingly, numerous research groups have developed methodology for finding the genetic evidence of such introgression. In this report we develop a supervised machine learning approach for uncovering loci which have introgressed across species boundaries. We show that our method, FILET, has greater accuracy and power than competing methods in discovering introgression, and in addition can detect the directionality associated with the gene flow between species. Using whole genome sequences from Drosophila simulans and Drosophila sechellia we show that FILET discovers quite extensive introgression between these species that has occurred mostly from D. simulans to D. sechellia. Our work highlights the complex process of speciation even within a well-studied system and points to the growing importance of supervised machine learning in population genetics.
Hybridization and gene flow between species appear to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus it is crucial to develop the statistical machinery required to uncover which genomic regions have recently acquired haplotypes via introgression from a sister population. We developed a novel machine learning framework, called FILET (Finding Introgressed Loci via Extra-Trees), capable of revealing genomic introgression with far greater power than competing methods. FILET works by combining information from a number of population genetic summary statistics, including several new statistics that we introduce, that capture patterns of variation across two populations. We show that FILET is able to identify loci that have experienced gene flow between related species with high accuracy, and in most situations can correctly infer which population was the donor and which was the recipient. Here we describe a data set of outbred diploid Drosophila sechellia genomes, and combine them with data from D. simulans to examine recent introgression between these species using FILET. Although we find that these populations may have split more recently than previously appreciated, FILET confirms that there has indeed been appreciable recent introgression (some of which might have been adaptive) between these species, and reveals that this gene flow is primarily in the direction of D. simulans to D. sechellia.
Accurate identification of RNA editing sites from primitive sequence with deep neural networks
RNA editing is a post-transcriptional RNA sequence alteration. Current methods have identified editing sites and facilitated research but require sufficient genomic annotations and prior-knowledge-based filtering steps, resulting in a cumbersome, time-consuming identification process. Moreover, these methods have limited generalizability and applicability in species with insufficient genomic annotations or in conditions of limited prior knowledge. We developed DeepRed, a deep learning-based method that identifies RNA editing from primitive RNA sequences without prior-knowledge-based filtering steps or genomic annotations. DeepRed achieved 98.1% and 97.9% area under the curve (AUC) in training and test sets, respectively. We further validated DeepRed using experimentally verified U87 cell RNA-seq data, achieving 97.9% positive predictive value (PPV). We demonstrated that DeepRed offers better prediction accuracy and computational efficiency than current methods with large-scale, mass RNA-seq data. We used DeepRed to assess the impact of multiple factors on editing identification with RNA-seq data from the Association of Biomolecular Resource Facilities and Sequencing Quality Control projects. We explored developmental RNA editing pattern changes during human early embryogenesis and evolutionary patterns in Drosophila species and the primate lineage using DeepRed. Our work illustrates DeepRed’s state-of-the-art performance; it may decipher the hidden principles behind RNA editing, making editing detection convenient and effective.
Improvement of the Oryza sativa Nipponbare reference genome using next-generation sequence and optical map data
Background Rice research has been enabled by access to the high quality reference genome sequence generated in 2005 by the International Rice Genome Sequencing Project (IRGSP). To further facilitate genomic-enabled research, we have updated and validated the genome assembly and sequence for the Nipponbare cultivar of Oryza sativa (japonica group). Results The Nipponbare genome assembly was updated by revising and validating the minimal tiling path of clones with the optical map for rice. Sequencing errors in the revised genome assembly were identified by re-sequencing the genome of two different Nipponbare individuals using the Illumina Genome Analyzer II/IIx platform. A total of 4,886 sequencing errors were identified in 321 Mb of the assembled genome indicating an error rate in the original IRGSP assembly of only 0.15 per 10,000 nucleotides. A small number (five) of insertions/deletions were identified using longer reads generated using the Roche 454 pyrosequencing platform. As the re-sequencing data were generated from two different individuals, we were able to identify a number of allelic differences between the original individual used in the IRGSP effort and the two individuals used in the re-sequencing effort. The revised assembly, termed Os-Nipponbare-Reference-IRGSP-1.0, is now being used in updated releases of the Rice Annotation Project and the Michigan State University Rice Genome Annotation Project, thereby providing a unified set of pseudomolecules for the rice community. Conclusions A revised, error-corrected, and validated assembly of the Nipponbare cultivar of rice was generated using optical map data, re-sequencing data, and manual curation that will facilitate on-going and future research in rice. Detection of polymorphisms between three different Nipponbare individuals highlights that allelic differences between individuals should be considered in diversity studies. Electronic supplementary material The online version of this article (doi:10.1186/1939-8433-6-4) contains supplementary material, which is available to authorized users.
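The error rate quoted in this abstract follows directly from the two figures given with it. The short calculation below reproduces it, assuming that 321 Mb means 321 million bases; the variable names are illustrative.

```python
errors = 4886                    # sequencing errors reported in the abstract
assembled_bases = 321_000_000    # 321 Mb, taken as 321 million bases

# Errors per base, rescaled to errors per 10,000 nucleotides.
errors_per_10k = errors / assembled_bases * 10_000
rounded_rate = round(errors_per_10k, 2)
```

The result rounds to 0.15 errors per 10,000 nucleotides, matching the stated rate.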
Myosin repertoire expansion coincides with eukaryotic diversification in the Mesoproterozoic era
Background The last eukaryotic common ancestor already had an amazingly complex cell possessing genomic and cellular features such as spliceosomal introns, mitochondria, cilia-dependent motility, and a cytoskeleton together with several intracellular transport systems. In contrast to the microtubule-based dyneins and kinesins, the actin-filament associated myosins are considerably divergent in extant eukaryotes and a unifying picture of their evolution has not yet emerged. Results Here, we manually assembled and annotated 7852 myosins from 929 eukaryotes providing an unprecedented dense sequence and taxonomic sampling. For classification we complemented phylogenetic analyses with gene structure comparisons resulting in 79 distinct myosin classes. The intron pattern analysis and the taxonomic distribution of the classes suggest two myosins in the last eukaryotic common ancestor, a class-1 prototype and another myosin, which is most likely the ancestor of all other myosin classes. The sparse distribution of class-2 and class-4 myosins outside their major lineages contradicts their presence in the last eukaryotic common ancestor but instead strongly suggests early eukaryote-eukaryote horizontal gene transfer. Conclusions By correlating the evolution of myosin diversity with the history of Earth we found that myosin innovation occurred in independent major “burst” events in the major eukaryotic lineages. Most myosin inventions happened in the Mesoproterozoic era. In the late Neoproterozoic era, a process of extensive independent myosin loss began simultaneously with further eukaryotic diversification. Since the Cambrian explosion, myosin repertoire expansion is driven by lineage- and species-specific gene and genome duplications leading to subfunctionalization and fine-tuning of myosin functions. Electronic supplementary material The online version of this article (10.1186/s12862-017-1056-2) contains supplementary material, which is available to authorized users. 
A massively parallel strategy for STR marker development, capture, and genotyping
Abstract Short tandem repeat (STR) variants are highly polymorphic markers that facilitate powerful population genetic analyses. STRs are especially valuable in conservation and ecological genetic research, yielding detailed information on population structure and short-term demographic fluctuations. Massively parallel sequencing has not previously been leveraged for scalable, efficient STR recovery. Here, we present a pipeline for developing STR markers directly from high-throughput shotgun sequencing data without a reference genome, and an approach for highly parallel target STR recovery. We employed our approach to capture a panel of 5000 STRs from a test group of diademed sifakas (Propithecus diadema, n = 3), endangered Malagasy rainforest lemurs, and we report extremely efficient recovery of targeted loci: 97.3–99.6% of STRs characterized with ≥10x non-redundant sequence coverage. We then tested our STR capture strategy on P. diadema fecal DNA, and report robust initial results and suggestions for future implementations. In addition to STR targets, this approach also generates large, genome-wide single nucleotide polymorphism (SNP) panels from flanking regions. Our method provides a cost-effective and scalable solution for rapid recovery of large STR and SNP datasets in any species without needing a reference genome, and can be used even with suboptimal DNA more easily acquired in conservation and ecological studies.
Transmembrane protein topology prediction using support vector machines
Background Alpha-helical transmembrane (TM) proteins are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins. However, due to the experimental difficulties involved in obtaining high quality crystals, this class of protein is severely under-represented in structural databases. In the absence of structural data, sequence-based prediction methods allow TM protein topology to be investigated. Results We present a support vector machine-based (SVM) TM protein topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of 131 sequences with known crystal structures. The method achieves topology prediction accuracy of 89%, while signal peptides and re-entrant helices are predicted with 93% and 44% accuracy respectively. An additional SVM trained to discriminate between globular and TM proteins detected zero false positives, with a low false negative rate of 0.4%. We present the results of applying these tools to a number of complete genomes. Source code, data sets and a web server are freely available from . Conclusion The high accuracy of TM topology prediction, which includes detection of both signal peptides and re-entrant helices, combined with the ability to effectively discriminate between TM and globular proteins, makes this method ideally suited to whole genome annotation of alpha-helical transmembrane proteins.
A Scaled Framework for CRISPR Editing of Human Pluripotent Stem Cells to Study Psychiatric Disease
Summary Scaling of CRISPR-Cas9 technology in human pluripotent stem cells (hPSCs) represents an important step for modeling complex disease and developing drug screens in human cells. However, variables affecting the scaling efficiency of gene editing in hPSCs remain poorly understood. Here, we report a standardized CRISPR-Cas9 approach, with robust benchmarking at each step, to successfully target and genotype a set of psychiatric disease-implicated genes in hPSCs and provide a resource of edited hPSC lines for six of these genes. We found that transcriptional state and nucleosome positioning around targeted loci were not correlated with editing efficiency. However, editing frequencies varied between different hPSC lines and correlated with genomic stability, underscoring the need for careful cell line selection and unbiased assessments of genomic integrity. Together, our step-by-step quantification and in-depth analyses provide an experimental roadmap for scaling Cas9-mediated editing in hPSCs to study psychiatric disease, with broader applicability for other polygenic diseases.
Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison
The pragmatic species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). While enabling the taxonomist, in principle, to obtain an estimate of the overall similarity between the genomes of two strains, this technique is tedious and error-prone and cannot be used to incrementally build up a comparative database. Recent technological progress in the area of genome sequencing calls for bioinformatics methods to replace the wet-lab DDH by in-silico genome-to-genome comparison. Here we investigate state-of-the-art methods for inferring whole-genome distances in their ability to mimic DDH. Algorithms to efficiently determine high-scoring segment pairs or maximally unique matches perform well as a basis of inferring intergenomic distances. The examined distance functions, which are able to cope with heavily reduced genomes and repetitive sequence regions, outperform previously described ones regarding the correlation with and error ratios in emulating DDH. Simulation of incompletely sequenced genomes indicates that some distance formulas are very robust against missing fractions of genomic information. Digitally derived genome-to-genome distances show a better correlation with 16S rRNA gene sequence distances than DDH values. The future perspectives of genome-informed taxonomy are discussed, and the investigated methods are made available as a web service for genome-based species delineation.
Characterization of Genome-Wide Variation in Four-Row Wax, a Waxy Maize Landrace with a Reduced Kernel Row Phenotype
In southwest China, some maize landraces have long been isolated geographically, and have phenotypes that differ from those of widely grown cultivars. These landraces may harbor rich genetic variation responsible for those phenotypes. Four-row Wax is one such landrace, with four rows of kernels on the cob. We resequenced the genome of Four-row Wax, obtaining 50.46 Gb sequence at 21.87× coverage, then identified and characterized 3,252,194 SNPs, 213,181 short InDels (1–5 bp) and 39,631 structural variations (greater than 5 bp). Of those, 312,511 (9.6%) SNPs were novel compared to the most detailed haplotype map (HapMap) SNP database of maize. Characterization of variations in reported kernel row number (KRN) related genes and KRN QTL regions revealed potential causal mutations in fea2, td1, kn1, and te1. Genome-wide comparisons revealed abundant genetic variations in Four-row Wax, which may be associated with environmental adaptation. The sequence and SNP variations described here enrich genetic resources of maize, and provide guidance into study of seed numbers for crop yield improvement.
CNV discovery for milk composition traits in dairy cattle using whole genome resequencing
Background Copy number variations (CNVs) are important and widely distributed in the genome. CNV detection opens a new avenue for exploring genes associated with complex traits in humans, animals and plants. Herein, we present a genome-wide assessment of CNVs that are potentially associated with milk composition traits in dairy cattle. Results In this study, CNVs were detected based on whole genome re-sequencing data of eight Holstein bulls from four half- and/or full-sib families, with extremely high and low estimated breeding values (EBVs) of milk protein percentage and fat percentage. The range of coverage depth per individual was 8.2–11.9×. Using CNVnator, we identified a total of 14,821 CNVs, including 5025 duplications and 9796 deletions. Among them, 487 differential CNV regions (CNVRs) comprising ~8.23 Mb of the cattle genome were observed between the high and low groups. Annotation of these differential CNVRs was performed based on the cattle genome reference assembly (UMD3.1), and a total of 235 functional genes were found within the CNVRs. By Gene Ontology and KEGG pathway analyses, we found that genes were significantly enriched for specific biological functions related to protein and lipid metabolism, insulin/IGF pathway-protein kinase B signaling cascade, prolactin signaling pathway and AMPK signaling pathways. These genes included INS, IGF2, FOXO3, TH, SCD5, GALNT18, GALNT16, ART3, SNCA and WNT7A, implying their potential association with milk protein and fat traits. In addition, 95 CNVRs overlapped with 75 known QTLs that are associated with milk protein and fat traits of dairy cattle (Cattle QTLdb). Conclusions In conclusion, based on NGS of 8 Holstein bulls with extremely high and low EBVs for milk PP and FP, we identified a total of 14,821 CNVs, 487 differential CNVRs between groups, and 10 genes, which were suggested as promising candidate genes for milk protein and fat traits. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3636-3) contains supplementary material, which is available to authorized users.
Compound heterozygous SLC19A3 mutations further refine the critical promoter region for biotin-thiamine-responsive basal ganglia disease
Mutations in the gene SLC19A3 result in thiamine metabolism dysfunction syndrome 2, also known as biotin-thiamine-responsive basal ganglia disease (BTBGD). This neurometabolic disease typically presents in early childhood with progressive neurodegeneration, including confusion, seizures, and dysphagia, advancing to coma and death. Treatment is possible via supplement of biotin and/or thiamine, with early treatment resulting in significant lifelong improvements. Here we report two siblings who received a refined diagnosis of BTBGD following whole-genome sequencing. Both children inherited compound heterozygous mutations from unaffected parents; a missense single-nucleotide variant (p.G23V) in the first transmembrane domain of the protein, and a 4808-bp deletion in exon 1 encompassing the 5′ UTR and minimal promoter region. This deletion is the smallest promoter deletion reported to date, further defining the minimal promoter region of SLC19A3. Unfortunately, one of the siblings died prior to diagnosis, but the other is showing significant improvement after commencement of therapy. This case demonstrates the power of whole-genome sequencing for the identification of structural variants and subsequent diagnosis of rare neurodevelopmental disorders.
First‐generation HapMap in Cajanus spp. reveals untapped variations in parental lines of mapping populations
Summary Whole genome re‐sequencing (WGRS) was conducted on a panel of 20 Cajanus spp. accessions (crossing parentals of recombinant inbred lines, introgression lines, multiparent advanced generation intercross and nested association mapping population) comprising two wild species and 18 cultivated species accessions. A total of 791.77 million paired‐end reads were generated with an effective mapping depth of ~12X per accession. Analysis of WGRS data provided 5 465 676 genome‐wide variations including 4 686 422 SNPs and 779 254 InDels across the accessions. Large structural variations in the form of copy number variations (2598) and presence and absence variations (970) were also identified. Additionally, 2 630 904 accession‐specific variations comprising 2 278 571 SNPs (86.6%), 166 243 deletions (6.3%) and 186 090 insertions (7.1%) were also reported. The polymorphic sites identified in this study provide the first‐generation HapMap in Cajanus spp., which will be useful in mapping the genomic regions responsible for important traits.
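The counts quoted in this summary can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published figures; the variable and function names are illustrative and do not come from the study.

```python
# Counts quoted in the Cajanus spp. summary above.
total_variations = 5_465_676
snps = 4_686_422
indels = 779_254

accession_specific = 2_630_904
specific = {"SNPs": 2_278_571, "deletions": 166_243, "insertions": 186_090}


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


# Share of each class among the accession-specific variations.
shares = {name: percent(count, accession_specific) for name, count in specific.items()}

# The component counts should add up to the reported totals.
totals_agree = (snps + indels == total_variations
                and sum(specific.values()) == accession_specific)
```

Both totals add up, and the computed shares reproduce the percentages given in the summary.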
Genome Sequence of a Marbled Eel Polyoma-Like Virus in Taiwan
ABSTRACT We report here the complete genome sequence of a virus isolated from a diseased marbled eel (Anguilla marmorata) in Taiwan. The virus has been characterized as being related to Japanese eel endothelial cell-infecting virus (JEECV), with a large T-antigen-like protein. The sequence of the marbled eel virus displays low homology to the JEECV.
Development of a comparative genomic fingerprinting assay for rapid and high-resolution genotyping of Arcobacter butzleri
Background Molecular typing methods are critical for epidemiological investigations, facilitating disease outbreak detection and source identification. Study of the epidemiology of the emerging human pathogen Arcobacter butzleri is currently hampered by the lack of a subtyping method that is easily deployable in the context of routine epidemiological surveillance. In this study we describe a comparative genomic fingerprinting (CGF) method for high-resolution and high-throughput subtyping of A. butzleri. Comparative analysis of the genome sequences of eleven A. butzleri strains, including eight strains newly sequenced as part of this project, was employed to identify accessory genes suitable for generating unique genetic fingerprints for high-resolution subtyping based on gene presence or absence within a strain. Results A set of eighty-three accessory genes was used to examine the population structure of a dataset comprised of isolates from various sources, including human and non-human animals, sewage, and river water (n=156). A streamlined assay (CGF40) based on a subset of 40 genes was subsequently developed through marker optimization. High levels of profile diversity (121 distinct profiles) were observed among the 156 isolates in the dataset, and the high Simpson’s Index of Diversity (ID > 0.969) indicates that the CGF40 assay possesses high discriminatory power. At the same time, our observation that 115 isolates in this dataset could be assigned to 29 clades with a profile similarity of 90% or greater indicates that the method can be used to identify clades comprised of genetically similar isolates. Conclusions The CGF40 assay described herein combines high resolution and repeatability with high throughput for the rapid characterization of A. butzleri strains. This assay will facilitate the study of the population structure and epidemiology of A. butzleri. Electronic supplementary material The online version of this article (doi:10.1186/s12866-015-0426-4) contains supplementary material, which is available to authorized users.
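The discriminatory power mentioned in the abstract above is expressed as Simpson's Index of Diversity. The abstract does not say which formulation was used; the sketch below assumes the form commonly applied to typing schemes, ID = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), where n_j is the number of isolates sharing profile j and N is the total number of isolates. The function name and the toy profile labels are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def simpsons_index_of_diversity(profiles: Sequence[Hashable]) -> float:
    """Probability that two isolates drawn without replacement have different profiles.

    Assumed formulation: ID = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)).
    """
    n_total = len(profiles)
    if n_total < 2:
        raise ValueError("at least two isolates are needed")
    counts = Counter(profiles)
    same_pairs = sum(n * (n - 1) for n in counts.values())
    return 1 - same_pairs / (n_total * (n_total - 1))


# Toy example (hypothetical fingerprints): four isolates, two of which share a profile.
toy_profiles = ["P1", "P1", "P2", "P3"]
toy_id = simpsons_index_of_diversity(toy_profiles)
```

An index of 1 means every isolate has a distinct profile, and 0 means all isolates share one profile, so values close to 1 indicate high discriminatory power.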
Draft Genome Sequence of Acinetobacter johnsonii MB44, Exhibiting Nematicidal Activity against Caenorhabditis elegans
Acinetobacter johnsonii MB44 was isolated from a frost-plant-tissue sample, which showed noteworthy nematicidal activity against the model organism Caenorhabditis elegans. Here, we report the 3.4 Mb draft genome of A. johnsonii MB44, which will help in understanding the molecular mechanism of its ability to infect nematodes.
Six novel Y chromosome genes in Anopheles mosquitoes discovered by independently sequencing males and females
Background Y chromosomes are responsible for the initiation of male development, male fertility, and other male-related functions in diverse species. However, Y genes are rarely characterized outside a few model species due to the arduous nature of studying the repeat-rich Y. Results The chromosome quotient (CQ) is a novel approach to systematically discover Y chromosome genes. In the CQ method, genomic DNA from males and females is sequenced independently and aligned to candidate reference sequences. The female to male ratio of the number of alignments to a reference sequence, a parameter called the chromosome quotient (CQ), is used to determine whether the sequence is Y-linked. Using the CQ method, we successfully identified known Y sequences from Homo sapiens and Drosophila melanogaster. The CQ method facilitated the discovery of Y chromosome sequences from the malaria mosquitoes Anopheles stephensi and An. gambiae. Comparisons to transcriptome sequence data with blastn led to the discovery of six Anopheles Y genes, three from each species. All six genes are expressed in the early embryo. Two of the three An. stephensi Y genes were recently acquired from the autosomes or the X. Although An. stephensi and An. gambiae belong to the same subgenus, we found no evidence of Y genes shared between the species. Conclusions The CQ method can reliably identify Y chromosome sequences using the ratio of alignments from male and female sequence data. The CQ method is widely applicable to species with fragmented genome assemblies produced from next-generation sequencing data. Analysis of the six Y genes characterized in this study indicates rapid Y chromosome evolution between An. stephensi and An. gambiae. The Anopheles Y genes discovered by the CQ method provide unique markers for population and phylogenetic analysis, and opportunities for novel mosquito control measures through the manipulation of sexual dimorphism and fertility. 
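The chromosome quotient described in the abstract above is the female-to-male ratio of alignment counts for a candidate reference sequence. The sketch below illustrates that ratio. The cutoff of 0.2, the handling of sequences with no male alignments, and all function names are assumptions made for illustration; the abstract does not state the thresholds or filters actually used.

```python
from typing import Optional


def chromosome_quotient(female_alignments: int, male_alignments: int) -> Optional[float]:
    """Female-to-male ratio of alignment counts for one reference sequence.

    Returns None when there are no male alignments, because the ratio is
    undefined and the sequence cannot be assessed (assumed handling).
    """
    if female_alignments < 0 or male_alignments < 0:
        raise ValueError("alignment counts must be non-negative")
    if male_alignments == 0:
        return None
    return female_alignments / male_alignments


def is_y_linked_candidate(cq: Optional[float], max_cq: float = 0.2) -> bool:
    """Flag a sequence as a Y-linked candidate when its CQ is at or below max_cq.

    max_cq = 0.2 is a hypothetical cutoff chosen for this example only.
    """
    return cq is not None and cq <= max_cq


# Hypothetical counts: a sequence hit almost only by male reads.
cq_male_biased = chromosome_quotient(female_alignments=2, male_alignments=100)
# Hypothetical counts: a sequence hit about equally by both sexes.
cq_balanced = chromosome_quotient(female_alignments=95, male_alignments=100)
```

A Y-linked sequence should attract few or no alignments from female reads, so its quotient is expected to be near zero, whereas autosomal sequences are expected to give a quotient near one when male and female sequencing depths are comparable.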
QTL Mapping for Pest and Disease Resistance in Cassava and Coincidence of Some QTL with Introgression Regions Derived from Manihot glaziovii
Genetic mapping of quantitative trait loci (QTL) for resistance to cassava brown streak disease (CBSD), cassava mosaic disease (CMD), and cassava green mite (CGM) was performed using an F1 cross developed between the Tanzanian landrace, Kiroba, and a breeding line, AR37-80. The population was evaluated for two consecutive years in two sites in Tanzania. A genetic linkage map was derived from 106 F1 progeny and 1,974 SNP markers and spanned 18 chromosomes covering a distance of 1,698 cM. Fifteen significant QTL were identified; two were associated with CBSD root necrosis only, and were detected on chromosomes V and XII, while seven were associated with CBSD foliar symptoms only and were detected on chromosomes IV, VI, XVII, and XVIII. QTL on chromosomes XI and XV were associated with both CBSD foliar and root necrosis symptoms. Two QTL were found to be associated with CMD and were detected on chromosomes XII and XIV, while two were associated with CGM and were identified on chromosomes V and X. There are large Manihot glaziovii introgression regions in Kiroba on chromosomes I, XVII, and XVIII. The introgression segments on chromosomes XVII and XVIII overlap with QTL associated with CBSD foliar symptoms. The introgression region on chromosome I is of a different haplotype to the characteristic “Amani haplotype” found in the landrace Namikonga and others, and unlike some other genotypes, Kiroba does not have a large introgression block on chromosome IV. Kiroba is closely related to a sampled Tanzanian “tree cassava.” This supports the observation that some of the QTL associated with CBSD resistance in Kiroba are different to those observed in another variety, Namikonga.
Transcriptome Analysis of Silver Carp (Hypophthalmichthys molitrix) by Paired End RNA Sequencing
The silver carp (Hypophthalmichthys molitrix) is among the most intensively pond-cultured fish species and is used in the wild to counteract water bloom in China. However, little genomic information is available for this species, especially regarding its ability to grow rapidly in water, even water contaminated with high concentrations of poisonous microcystin.
In this study, we performed de novo transcriptome assembly and analysis of the 17.10 million short-read sequences produced by the Illumina paired-end sequencing technology. Using an improved multiple k-mer contig assembly method coupled with further scaffolding, 85 759 sequences were obtained. There were 23 044 sequences annotated with 3423 gene ontology terms for 104 196 term occurrences and the three corresponding organizing principles. A total of 38 200 assembled sequences were involved in 218 predicted Kyoto Encyclopedia of Genes and Genomes metabolic pathways. We also recovered 41 of 44 genes involved in the biosynthesis of glutathione. Of these, five genes were identified as having experienced positive selection between silver carp and zebrafish, as determined by the likelihood ratio test. This report is the first annotated review of the silver carp transcriptome. These data will be of interest to researchers investigating the evolution and biological processes of the silver carp. This work also provides an archive for future studies of recent speciation and evolution of Cyprinidae fishes and can be used in comparative studies of other fishes.
Draft Genome Sequence of Sphingobium quisquiliarum Strain P25T, a Novel Hexachlorocyclohexane (HCH) Degrading Bacterium Isolated from an HCH Dumpsite
Here, we report the draft genome sequence (4.2 Mb) of Sphingobium quisquiliarum strain P25T, a natural lin (genes involved in degradation of hexachlorocyclohexane [HCH] isomers) variant genotype, isolated from a heavily contaminated (450 mg HCH/g of soil) HCH dumpsite.
Comparison of Mycoplasma pneumoniae Genome Sequences from Strains Isolated from Symptomatic and Asymptomatic Patients
Mycoplasma pneumoniae is a common cause of respiratory tract infections (RTIs) in children. We recently demonstrated that this bacterium can be carried asymptomatically in the respiratory tract of children. To identify potential genetic differences between M. pneumoniae strains that are carried asymptomatically and those that cause symptomatic infections, we performed whole-genome sequence analysis of 20 M. pneumoniae strains.
The analyzed strains included 3 reference strains, 3 strains isolated from asymptomatic children, 13 strains isolated from clinically well-defined patients suffering from an upper (n = 4) or lower (n = 9) RTI, and one strain isolated from a follow-up patient who recently recovered from an RTI. The obtained sequences were each compared to the sequences of the reference strains. To find differences between strains isolated from asymptomatic and symptomatic individuals, a variant comparison was performed between the different groups of strains. Irrespective of the group (asymptomatic vs. symptomatic) from which the strains originated, subtype 1 and subtype 2 strains formed separate clusters. We could not identify a specific genotype associated with M. pneumoniae virulence. However, we found marked genetic differences between clinical isolates and the reference strains, which indicated that the latter strains may not be regarded as appropriate representatives of circulating M. pneumoniae strains.
High resolution profiling of the gut microbiome reveals the extent of Clostridium difficile burden
Microbiome profiling through 16S rRNA gene sequence analysis has proven to be a useful research tool in the study of C. difficile infection (CDI); however, CDI microbiome studies typically report results at the genus level or higher, thus precluding identification of this pathogen relative to other members of the gut microbiota. Accurate identification of C. difficile relative to the overall gut microbiome may be useful in assessments of colonization in research studies or as a prognostic indicator for patients with CDI. To investigate the burden of C. difficile at the species level relative to the overall gut microbiome, we applied a high-resolution method for 16S rRNA sequence assignment to previously published gut microbiome studies of CDI and other patient populations. We identified C. difficile in 131 of 156 index cases of CDI (average abundance 1.78%), and 18 of 211 healthy controls (average abundance 0.008%). We further detected substantial levels of C. difficile in a subset of infants that persisted over the first two to 12 months of life. Correlation analysis of C. difficile burden compared to other detected species demonstrated consistent negative associations with C. scindens and multiple Blautia species. These analyses contribute insight into the relative burden of C. difficile in the gut microbiome for multiple patient populations, and indicate that high-resolution 16S rRNA gene sequence analysis may prove useful in the development and evaluation of new therapies for CDI.
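The detection counts quoted in this abstract can be turned into per-group detection rates. The following is only a minimal arithmetic sketch in Python; the function name `detection_rate` is illustrative and does not come from the study.

```python
def detection_rate(positive: int, total: int) -> float:
    """Return the percentage of samples in which the species was detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * positive / total


# Counts taken from the abstract: samples with C. difficile detected / group size.
cdi_rate = detection_rate(131, 156)      # index cases of CDI
control_rate = detection_rate(18, 211)   # healthy controls

print(f"CDI index cases:  {cdi_rate:.1f}%")
print(f"Healthy controls: {control_rate:.1f}%")
```

These rates describe how often the species was detected at all; they are separate from the average relative abundances (1.78% and 0.008%) reported in the abstract.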
Allele Frequencies of Variants in Ultra Conserved Elements Identify Selective Pressure on Transcription Factor Binding
Ultra-conserved genes or elements (UCGs/UCEs) in the human genome are extreme examples of conservation. We characterized natural variations in 2884 UCEs and UCGs in two distinct populations, Singaporean Chinese (n = 280) and Italian (n = 501), by using a pooled-sample, targeted-capture sequencing approach. We identify, with high confidence, an abundance of rare SNVs (MAF<0.5%) in these regions, of which 75% are not present in dbSNP137. UCE association studies for complex human traits can use this information to model expected background variation and thus the necessary power for association studies.
By combining our data with 1000 Genomes Project data, we show in three independent datasets that prevalent UCE variants (MAF>5%) are more often found in relatively less-conserved nucleotides within UCEs, compared to rare variants. Moreover, prevalent variants are less likely to overlap transcription factor binding sites. Using SNPfold, we found no significant influence of RNA secondary structure on UCE conservation. Altogether, these results suggest UCEs are not under selective pressure as a stretch of DNA but are under differential evolutionary pressure at the single-nucleotide level.
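The abstract uses two minor allele frequency (MAF) cut-offs: rare variants have MAF below 0.5% and prevalent variants have MAF above 5%. A minimal Python sketch of that binning follows. The function name `classify_variant` and the label "intermediate" for frequencies between the two cut-offs are assumptions made here for illustration, not terms from the paper.

```python
RARE_MAX = 0.005       # MAF < 0.5% is "rare" (cut-off stated in the abstract)
PREVALENT_MIN = 0.05   # MAF > 5% is "prevalent" (cut-off stated in the abstract)


def classify_variant(maf: float) -> str:
    """Bin a variant by minor allele frequency, given as a fraction in [0, 0.5]."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must be a fraction between 0 and 0.5")
    if maf < RARE_MAX:
        return "rare"
    if maf > PREVALENT_MIN:
        return "prevalent"
    # Hypothetical label: the abstract does not name this middle range.
    return "intermediate"


for maf in (0.001, 0.01, 0.2):
    print(maf, classify_variant(maf))
```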
Cell Cycle Control of Bivalent Epigenetic Domains Regulates the Exit from Pluripotency
Highlights • Bivalent domains are unstable, dynamic, and cell-cycle regulated • CDK2 phosphorylates MLL2 and establishes bivalent domains in G1 • Chromosome remodeling in G1 is required for the “poised” pluripotent state
Summary Here we show that bivalent domains and chromosome architecture for bivalent genes are dynamically regulated during the cell cycle in human pluripotent cells. Central to this is the transient increase in H3K4-trimethylation at developmental genes during G1, thereby creating a “window of opportunity” for cell-fate specification. This mechanism is controlled by CDK2-dependent phosphorylation of the MLL2 (KMT2B) histone methyl-transferase, which facilitates its recruitment to developmental genes in G1.
MLL2 binding is required for changes in chromosome architecture around developmental genes and establishes promoter-enhancer looping interactions in a cell-cycle-dependent manner. These cell-cycle-regulated loops are shown to be essential for activation of bivalent genes and pluripotency exit. These findings demonstrate that bivalent domains are established to control the cell-cycle-dependent activation of developmental genes so that differentiation initiates from the G1 phase.
First Draft Genome Sequences of Two Bartonella tribocorum Strains from Laos and Cambodia
ABSTRACT Bartonella tribocorum is a Gram-negative bacterium known to infect animals, and rodents in particular, throughout the world. In this report, we present the draft genome sequences of two strains of B. tribocorum isolated from the blood of a rodent in Laos and a shrew in Cambodia.
A PRDX1 mutant allele causes a MMACHC secondary epimutation in cblC patients
To date, epimutations reported in man have been somatic and erased in germlines. Here, we identify a cause of the autosomal recessive cblC class of inborn errors of vitamin B12 metabolism that we name “epi-cblC”. The subjects are compound heterozygotes for a genetic mutation and for a promoter epimutation, detected in blood, fibroblasts, and sperm, at the MMACHC locus; 5-azacytidine restores the expression of MMACHC in fibroblasts. MMACHC is flanked by CCDC163P and PRDX1, which are in the opposite orientation. The epimutation is present in three generations and results from PRDX1 mutations that force antisense transcription of MMACHC thereby possibly generating a H3K36me3 mark. The silencing of PRDX1 transcription leads to partial hypomethylation of the epiallele and restores the expression of MMACHC.
This example of epi-cblC demonstrates the need to search for compound epigenetic-genetic heterozygosity in patients with typical disease manifestation and genetic heterozygosity in disease-causing genes located in other gene trios.
Exome sequencing in undiagnosed inherited and sporadic ataxias
Inherited ataxias are difficult to diagnose genetically. Pyle et al. use whole-exome sequencing to provide a likely molecular diagnosis in 14 of 22 families with ataxia. The approach reveals de novo mutations, broadens the phenotype of other disease genes, and is equally effective in young and older-onset patients.
Inherited ataxias are clinically and genetically heterogeneous, and a molecular diagnosis is not possible in most patients. Having excluded common sporadic, inherited and metabolic causes, we used an unbiased whole exome sequencing approach in 35 affected individuals, from 22 randomly selected families of white European descent. We defined the likely molecular diagnosis in 14 of 22 families (64%).
This revealed de novo dominant mutations, validated disease genes previously described in isolated families, and broadened the clinical phenotype of known disease genes. The diagnostic yield was the same in both young and older-onset patients, including sporadic cases. We have demonstrated the impact of exome sequencing in a group of patients notoriously difficult to diagnose genetically. This has important implications for genetic counselling and diagnostic service provision.
Splicing mutation analysis reveals previously unrecognized pathways in lymph node invasive breast cancer
Somatic mutations reported in large-scale breast cancer (BC) sequencing studies primarily consist of protein coding mutations. mRNA splicing mutation analyses have been limited in scope, despite their prevalence in Mendelian genetic disorders. We predicted splicing mutations in 442 BC tumour and matched normal exomes from The Cancer Genome Atlas Consortium (TCGA). These splicing defects were validated by abnormal expression changes in these tumours. Of the 5,206 putative mutations identified, exon skipping, leaky or cryptic splicing was confirmed for 988 variants. Pathway enrichment analysis of the mutated genes revealed mutations in 9 NCAM1-related pathways, which were significantly increased in samples with evidence of lymph node metastasis, but not in lymph node-negative tumours.
We suggest that comprehensive reporting of DNA sequencing data should include non-trivial splicing analyses to avoid missing clinically-significant deleterious splicing mutations, which may reveal novel mutated pathways present in genetic disorders.
Reducing the exome search space for Mendelian diseases using genetic linkage analysis of exome genotypes
Many exome sequencing studies of Mendelian disorders fail to optimally exploit family information. Classical genetic linkage analysis is an effective method for eliminating a large fraction of the candidate causal variants discovered, even in small families that lack a unique linkage peak. We demonstrate that accurate genetic linkage mapping can be performed using SNP genotypes extracted from exome data, removing the need for separate array-based genotyping. We provide software to facilitate such analyses.
Prescreening whole exome sequencing results from patients with retinal degeneration for variants in genes associated with retinal degeneration
Background Accurate clinical diagnosis and prognosis of retinal degeneration can be aided by the identification of the disease-causing genetic variant. It can confirm the clinical diagnosis as well as inform the clinician of the risk for potential involvement of other organs such as kidneys. It also aids in genetic counseling for affected individuals who want to have a child. Finally, knowledge of disease-causing variants informs laboratory investigators involved in translational research. With the advent of next-generation sequencing, identifying pathogenic mutations is becoming easier, especially the identification of novel pathogenic variants. Methods We used whole exome sequencing on a cohort of 69 patients with various forms of retinal degeneration and in whom screens for previously identified disease-causing variants had been inconclusive. All potential pathogenic variants were verified by Sanger sequencing and, when possible, segregation analysis of immediate relatives. Potential variants were identified by using a semi-masked approach in which rare variants in candidate genes were identified without knowledge of the clinical diagnosis (beyond “retinal degeneration”) or inheritance pattern. After the initial list of genes was prioritized, genetic diagnosis and inheritance pattern were taken into account. Results We identified the likely pathogenic variants in 64% of the subjects. Seven percent had a single heterozygous mutation identified that would cause recessive disease and 13% had no obviously pathogenic variants and no family members available to perform segregation analysis. Eleven subjects are good candidates for novel gene discovery. Two de novo mutations were identified that resulted in dominant retinal degeneration. Conclusion Whole exome sequencing allows for thorough genetic analysis of candidate genes as well as novel gene discovery. 
It allows for an unbiased analysis of genetic variants to reduce the chance that the pathogenic mutation will be missed due to incomplete or inaccurate family history or analysis at the early stage of a syndromic form of retinal degeneration.
Organoids model distinct Vitamin E effects at different stages of prostate cancer evolution
Vitamin E increased prostate cancer risk in the Selenium and Vitamin E Cancer Prevention Trial (SELECT) through unknown mechanisms while Selenium showed no efficacy. We determined the effects of the SELECT supplements on benign (primary), premalignant (RWPE-1) and malignant (LNCaP) prostate epithelial organoids. While the supplements decreased proliferation and induced cell death in cancer organoids, they had no effect on the benign organoids.
In contrast, Vitamin E enhanced cell proliferation and survival in the premalignant organoids in a manner that recapitulated the SELECT results. Indeed, while Vitamin E induced a pro-proliferative gene expression signature, Selenium alone or combined with Vitamin E produced an anti-proliferative signature. The premalignant organoids also displayed significant downregulation of glucose transporter and glycolytic gene expression pointing to metabolic alterations. Detached RWPE-1 cells had low ATP levels due to diminished glucose uptake and glycolysis which was rescued by Vitamin E through the activation of fatty acid oxidation (FAO). FAO inhibition abrogated the ATP rescue, diminished survival of the inner matrix detached cells, restoring the normal hollow lumen morphology in Vitamin E treated organoids. Organoid models therefore clarify the paradoxical findings from SELECT and demonstrate that Vitamin E promotes tumorigenesis in the early stages of prostate cancer evolution.
Genotype and clinical course in 2 Chinese Han siblings with Wilson disease presenting with isolated disabling premature osteoarthritis
Supplemental Digital Content is available in the text. Rationale: Premature osteoarthritis (POA) is a rare condition in Wilson disease (WD). In particular, when POA is the only complaint of a WD patient for a long time, misdiagnosis or missed diagnosis, and consequently treatment delay, may occur. Patient concerns and diagnosis: Two Chinese Han siblings were diagnosed with WD by corneal K-F rings, laboratory tests, and mutation analysis. They presented with isolated POA during the first 2 decades or more of their disease course, and their diagnosis was missed throughout that time. The older affected sib became disabled due to his severe osteoarthritis when he was as young as 38 years old. Two compound heterozygous pathogenic variants, c.2790_2792del and c.2621C>T, were revealed in the ATP7B gene through targeted next-generation sequencing (NGS). Lessons: Adolescent-onset POA could be the only complaint of an individual with WD for at least 2 decades. Long delay in the treatment of WD's POA could lead to disability in early adulthood. Detailed physical examination, special biochemical tests, and genotyping through targeted NGS should greatly reduce diagnostic delay in atypical WD patients with an isolated POA phenotype.
Whole Exome Sequencing Reveals Homozygous Mutations in RAI1, OTOF, and SLC26A4 Genes Associated with Nonsyndromic Hearing Loss in Altaian Families (South Siberia)
Hearing loss (HL) is one of the most common sensorineural disorders and several dozen genes contribute to its pathogenesis. Establishing a genetic diagnosis of HL is of great importance for clinical evaluation of deaf patients and for estimating recurrence risks for their families. Efforts to identify genes responsible for HL have been challenged by high genetic heterogeneity and different ethnic-specific prevalence of inherited deafness. Here we present the utility of whole exome sequencing (WES) for identifying candidate causal variants for previously unexplained nonsyndromic HL of seven patients from four unrelated Altaian families (the Altai Republic, South Siberia). The WES analysis revealed homozygous missense mutations in three genes associated with HL. Mutation c.2168A>G (SLC26A4) was found in one family, a novel mutation c.1111G>C (OTOF) was revealed in another family, and mutation c.5254G>A (RAI1) was found in two families. Sanger sequencing was applied for screening of identified variants in an ethnically diverse cohort of other patients with HL (n = 116) and in Altaian controls (n = 120). Identified variants were found only in patients of Altaian ethnicity (n = 93). Several lines of evidence support the association of homozygosity for the discovered variants c.5254G>A (RAI1), c.1111G>C (OTOF), and c.2168A>G (SLC26A4) with HL in Altaian patients. Local prevalence of the identified variants implies a possible founder effect in a significant number of HL cases in the indigenous population of the Altai region. Notably, this is the first reported instance of patients with an RAI1 missense mutation whose HL is not accompanied by specific traits typical for Smith-Magenis syndrome. The presumed association of RAI1 gene variant c.5254G>A with isolated HL needs to be confirmed by further experimental studies.
Two mouse models reveal an actionable PARP1 dependence in aggressive chronic lymphocytic leukemia
Chronic lymphocytic leukemia (CLL) remains an incurable disease. Two recurrent cytogenetic aberrations, namely del(17p), affecting TP53, and del(11q), affecting ATM, are associated with resistance to genotoxic chemotherapy (del17p) and poor outcome (del11q and del17p). Both del(17p) and del(11q) are also associated with inferior outcomes with novel targeted agents, such as the BTK inhibitor ibrutinib. Thus, even in the era of targeted therapies, CLL with alterations in the ATM/p53 pathway remains a clinical challenge. Here we generated two mouse models of Atm- and Trp53-deficient CLL. These animals display a significantly earlier disease onset and reduced overall survival, compared to controls. We employed these models in conjunction with transcriptome analyses following cyclophosphamide treatment to reveal that Atm deficiency is associated with an exquisite and genotype-specific sensitivity to PARP inhibition. Thus, we generate two aggressive CLL models and provide a preclinical rationale for the use of PARP inhibitors in ATM-affected human CLL.
Extensive RNA editing and splicing increase immune self representation diversity in medullary thymic epithelial cells
Background In order to become functionally competent but harmless mediators of the immune system, T cells undergo a strict educational program in the thymus, where they learn to discriminate between self and non-self. This educational program is, to a large extent, mediated by medullary thymic epithelial cells that have a unique capacity to express, and subsequently present, a large fraction of body antigens. While the scope of promiscuously expressed genes by medullary thymic epithelial cells is well-established, relatively little is known about the expression of variants that are generated by co-transcriptional and post-transcriptional processes. Results Our study reveals that in comparison to other cell types, medullary thymic epithelial cells display significantly higher levels of alternative splicing, as well as A-to-I and C-to-U RNA editing, which thereby further expand the diversity of their self-antigen repertoire. Interestingly, Aire, the key mediator of promiscuous gene expression in these cells, plays a limited role in the regulation of these transcriptional processes. Conclusions Our results highlight RNA processing as another layer by which the immune system assures a comprehensive self-representation in the thymus which is required for the establishment of self-tolerance and prevention of autoimmunity. Electronic supplementary material The online version of this article (doi:10.1186/s13059-016-1079-9) contains supplementary material, which is available to authorized users.
A novel compound heterozygous variant of the SLC12A3 gene in Gitelman syndrome pedigree
Background Gitelman syndrome (GS) is an autosomal recessive disorder caused by genic mutations of SLC12A3 (Solute carrier family 12 member 3), which encodes the Na-Cl cotransporter (NCC), and presents with characteristic metabolic abnormalities, including hypokalemia, metabolic alkalosis, hypomagnesemia, and hypocalciuria. In this study, we report a case of a GS pedigree, including analysis of GS-associated gene mutations. Methods We performed next-generation sequencing analysis and Sanger sequencing to explore the SLC12A3 mutations in a GS pedigree that included a 35-year-old female patient with GS and five family members within three generations. Furthermore, we summarized their clinical manifestations and analyzed laboratory parameters related to GS. Results The female proband (the patient with GS) presented with intermittent fatigue and transient periods of tetany, along with significant hypokalemia, hypomagnesemia, and hypocalciuria. All other members of the pedigree had normal laboratory results without obvious GS-related symptoms. Genetic analysis of the SLC12A3 gene identified two novel missense mutations (c.1919A > G, p.N640S in exon 15; c.2522A > G, p.D841G in exon 21) in the patient with GS. Moreover, we demonstrated that her mother, younger maternal uncle, and cousin were carriers of one mutation (c.1919A > G), and her father was the carrier of the other (c.2522A > G). Conclusion This is the first report of these two novel pathogenic variants of SLC12A3 and their contribution to GS. Further functional studies are particularly warranted to explore the underlying molecular mechanisms. Electronic supplementary material The online version of this article (10.1186/s12881-018-0527-7) contains supplementary material, which is available to authorized users. 
Two novel colorectal cancer risk loci in the region on chromosome 9q22.32
Highly penetrant cancer syndromes account for less than 5% of all cases with familial colorectal cancer (CRC), and other genetic factors explain the majority of the genetic contribution to CRC. A CRC susceptibility locus on chromosome 9q has been suggested. In this study, families in which risk of CRC was linked to the region were used to search for predisposing mutations in all genes in the region. No disease-causing mutation was found. Next, haplotype association studies were performed in the region, comparing Swedish CRC cases (2664) and controls (4782). Two overlapping haplotypes were suggested. One 10-SNP haplotype was indicated in familial CRC (OR 1.4, p = 0.00005) and one 25-SNP haplotype was indicated in sporadic CRC (OR 2.2, p = 0.0000012). The allele frequencies of the 10-SNP and the 25-SNP haplotypes were 13.7% and 2.5% respectively and both included one RNA, RP11-332M4.1 and RP11-l80l4.2, in the non-overlapping regions. The sporadic 25-SNP haplotype could not be studied further, but the familial 10-SNP haplotype was analyzed in 61 additional CRC families, and 6 of them were informative for all markers and had the risk haplotype. Targeted sequencing of the 10-SNP region in the linked families identified one variant in RP11-332M4.1, which may confer the increased CRC risk on this haplotype. Our results support the presence of two loci at 9q22.32, each with one RNA as the putative cause of increased CRC risk. These RNAs could exert their effect through the same, or different, genes/pathways, possibly through the regulation of neighboring genes, such as PTCH1, FANCC, DKFZP434H0512, ERCC6L2 or the processed transcript LINC00046.
Rational management approach to pure red cell aplasia
Pure red cell aplasia is an orphan disease, and as such lacks rationally established standard therapies. Most cases are idiopathic; a subset is antibody-mediated. There is overlap between idiopathic cases and those with T-cell large granular lymphocytic leukemia, hypogammaglobulinemia, and low-grade lymphomas. In each of the aforementioned, the pathogenetic mechanisms may involve autoreactive cytotoxic responses. We selected 62 uniformly diagnosed pure red cell aplasia patients, 52% of whom were idiopathic while the others involved large granular lymphocytic leukemia, thymoma, and B-cell dyscrasia, and analyzed their pathophysiologic features and responsiveness to rationally applied first-line and salvage therapies in order to propose diagnostic and therapeutic algorithms that may be helpful in guiding the management of prospective patients. T-cell-mediated responses ranged along a continuum from polyclonal to monoclonal (as seen in large granular lymphocytic leukemia). During a median observation period of 40 months, patients received a median of two different therapies to achieve remission. Frequently used therapy included calcineurin inhibitors with a steroid taper, yielding a first-line overall response rate of 76% (53/70). Oral cyclophosphamide showed activity, albeit lower than that produced by cyclosporine. Intravenous immunoglobulins were effective both in parvovirus patients and in hypogammaglobulinemia cases. In salvage settings, alemtuzumab is active, particularly in large granular lymphocytic leukemia-associated cases. Other potentially useful salvage options include rituximab, anti-thymocyte globulin and bortezomib. The workup of acquired pure red cell aplasia should include investigations of common pathological associations.
Most effective therapies are directed against T-cell-mediated immunity, and therapeutic choices need to account for associated conditions that may help in choosing alternative salvage agents, such as intravenous immunoglobulin, alemtuzumab and bortezomib.
Identification of a de novo DYNC1H1 mutation via WES according to published guidelines
De novo mutations that contribute to rare Mendelian diseases, including neurological disorders, have been recently identified. Whole-exome sequencing (WES) has become a powerful tool for the identification of inherited and de novo mutations in Mendelian diseases. Two important guidelines were recently published regarding the investigation of causality of sequence variants in human disease and the interpretation of novel variants identified in human genome sequences. In this study, a family with supposed movement disorders was sequenced via WES (including the proband and her unaffected parents), and a standard investigation and interpretation of the identified variants was performed according to the published guidelines. We identified a novel de novo mutation (c.2327C > T, p.P776L) in the DYNC1H1 gene and confirmed that it was the causal variant. The phenotype of the affected twins included delayed motor milestones, pes cavus, lower limb weakness and atrophy, and a waddling gait. Electromyographic (EMG) recordings revealed typical signs of chronic denervation. Our study demonstrates the power of WES to discover de novo mutations associated with a neurological disease on the whole-exome scale, and shows that following guidelines for conducting WES studies and interpreting identified variants is a preferable option for exploring the pathogenesis of rare neurological disorders.
Identification of cell type specific mutations in nodal T cell lymphomas
Recent genetic analysis has identified frequent mutations in ten-eleven translocation 2 (TET2), DNA methyltransferase 3A (DNMT3A), isocitrate dehydrogenase 2 (IDH2) and ras homolog family member A (RHOA) in nodal T-cell lymphomas, including angioimmunoblastic T-cell lymphoma and peripheral T-cell lymphoma, not otherwise specified. We examined the distribution of mutations in these subtypes of mature T-/natural killer cell neoplasms to determine their clonal architecture. Targeted sequencing was performed for 71 genes in tumor-derived DNA of 87 cases. The mutations were then analyzed in a programmed death-1 (PD1)-positive population enriched with tumor cells and CD20-positive B cells purified by laser microdissection from 19 cases. TET2 and DNMT3A mutations were identified in both the PD1+ cells and the CD20+ cells in 15/16 and 4/7 cases, respectively. All the RHOA and IDH2 mutations were confined to the PD1+ cells, indicating that some mutations, including RHOA and IDH2 mutations, are specific events in tumor cells. Notably, we found that all NOTCH1 mutations were detected only in the CD20+ cells. In conclusion, we identified both B- as well as T-cell-specific mutations, and mutations common to both T and B cells. These findings indicate the expansion of a clone after multistep and multilineal acquisition of gene mutations.
X linked primary ciliary dyskinesia due to mutations in the cytoplasmic axonemal dynein assembly factor PIH1D3
Primary ciliary dyskinesia (PCD) is a genetically heterogeneous disease resulting in reduced mucus clearance and impaired lung function. Here, the authors show that mutations in PIH1D3 are responsible for an X-linked form of PCD, affecting assembly of a subset of inner arm dyneins. By moving essential body fluids and molecules, motile cilia and flagella govern respiratory mucociliary clearance, laterality determination and the transport of gametes and cerebrospinal fluid. Primary ciliary dyskinesia (PCD) is an autosomal recessive disorder frequently caused by non-assembly of dynein arm motors into cilia and flagella axonemes. Before their import into cilia and flagella, multi-subunit axonemal dynein arms are thought to be stabilized and pre-assembled in the cytoplasm through a DNAAF2–DNAAF4–HSP90 complex akin to the HSP90 co-chaperone R2TP complex. Here, we demonstrate that large genomic deletions as well as point mutations involving PIH1D3 are responsible for an X-linked form of PCD causing disruption of early axonemal dynein assembly. We propose that PIH1D3, a protein that emerges as a new player of the cytoplasmic pre-assembly pathway, is part of a complementary conserved R2TP-like HSP90 co-chaperone complex, the loss of which affects assembly of a subset of inner arm dyneins.
Genetic and epigenetic methylation defects and implication of the ERMN gene in autism spectrum disorders
Autism spectrum disorders (ASD) are highly heritable and genetically complex conditions. Although highly penetrant mutations in multiple genes have been identified, they account for the etiology of <1/3 of cases. There is also strong evidence for environmental contribution to ASD, which can be mediated by still poorly explored epigenetic modifications. We searched for methylation changes on blood DNA of 53 male ASD patients and 757 healthy controls using a methylomic array (450K Illumina), correlated the variants with transcriptional alterations in blood RNAseq data, and performed a case–control association study of the relevant findings in a larger cohort (394 cases and 500 controls). We found 700 differentially methylated CpGs, most of them hypomethylated in the ASD group (83.9%), with cis-acting expression changes at 7.6% of locations. Relevant findings included: (1) hypomethylation caused by rare genetic variants (meSNVs) at six loci (ERMN, USP24, METTL21C, PDE10A, STX16 and DBT) significantly associated with ASD (q-value <0.05); and (2) clustered epimutations associated with transcriptional changes in single ASD patients (n=4). All meSNVs and clustered epimutations were inherited from unaffected parents. Resequencing of the top candidate genes also revealed a significant load of deleterious mutations affecting ERMN in ASD compared with controls. Our data indicate that inherited methylation alterations detectable in blood DNA, due to either genetic or epigenetic defects, can affect gene expression and contribute to ASD susceptibility most likely in an additive manner, and implicate ERMN as a novel ASD gene.
Mutations in histone modulators are associated with prolonged survival during azacitidine therapy
Early therapeutic decision-making is crucial in patients with higher-risk MDS. We evaluated the impact of clinical parameters and mutational profiles in 134 consecutive patients treated with azacitidine using a combined cohort from Karolinska University Hospital (n=89) and from King's College Hospital, London (n=45). While neither clinical parameters nor mutations had a significant impact on response rate, both karyotype and mutational profile were strongly associated with survival from the start of treatment. IPSS high-risk cytogenetics negatively impacted overall survival (median 20 vs 10 months; p<0.001), whereas mutations in histone modulators (ASXL1, EZH2) were associated with prolonged survival (22 vs 12 months, p=0.01). This positive association was present in both cohorts and remained highly significant in the multivariate Cox model. Importantly, patients with mutations in histone modulators lacking high-risk cytogenetics showed a survival of 29 months compared to only 10 months in patients with the opposite pattern. While TP53 was negatively associated with survival, neither RUNX1 mutations nor the number of mutations appeared to influence survival in this cohort. We propose a model combining histone modulator mutational screening with cytogenetics in the clinical decision-making process for higher-risk MDS patients eligible for treatment with azacitidine.
STUB1 mutations in autosomal recessive ataxias – evidence for mutation specific clinical heterogeneity
Background A subset of hereditary cerebellar ataxias is inherited as autosomal recessive traits (ARCAs). Classification of recessive ataxias due to phenotypic differences in the cerebellum and cerebellar structures is constantly evolving due to newly identified disease genes. Recently, reports have linked mutations in genes involved in ubiquitination (RNF216, OTUD4, STUB1) to ARCA with hypogonadism. Methods and results With a combination of homozygosity mapping and exome sequencing, we identified three mutations in STUB1 in two families with ARCA and cognitive impairment: a homozygous missense variant (c.194A>G, p.Asn65Ser) that segregated in three affected siblings, and a missense change (c.82G>A, p.Glu28Lys) which was inherited in trans with a nonsense mutation (c.430A>T, p.Lys144Ter) in another patient. STUB1 encodes CHIP (C-terminus of Heat shock protein 70 – Interacting Protein), a dual function protein with a role in ubiquitination as a co-chaperone with heat shock proteins, and as an E3 ligase. We show that the p.Asn65Ser substitution impairs CHIP’s ability to ubiquitinate HSC70 in vitro, despite being able to self-ubiquitinate. These results are consistent with previous studies highlighting this as a critical residue for the interaction between CHIP and its co-chaperones. Furthermore, we show that the levels of CHIP are strongly reduced in vivo in patients’ fibroblasts compared to controls. Conclusions These results suggest that STUB1 mutations might cause disease by impacting not only the E3 ligase function, but also its protein interaction properties and protein amount. Whether the clinical heterogeneity seen in STUB1 ARCA can be related to the location of the mutations remains to be understood. Interestingly, all siblings with the p.Asn65Ser substitution showed a marked appearance of accelerated aging not previously described in STUB1 related ARCA; none displayed hormonal aberrations or clinical hypogonadism, while some affected family members had diabetes, alopecia, uveitis and ulcerative colitis, further refining the spectrum of STUB1 related disease. Electronic supplementary material The online version of this article (doi:10.1186/s13023-014-0146-0) contains supplementary material, which is available to authorized users.
Rare Variants in NOD1 Associated with Carotid Bifurcation Intima Media Thickness in Dominican Republic Families
Cardiovascular disorders including ischemic stroke (IS) and myocardial infarction (MI) are heritable; however, few replicated loci have been identified. One strategy to identify loci influencing these complex disorders is to study subclinical phenotypes, such as carotid bifurcation intima-media thickness (bIMT). We have previously shown bIMT to be heritable and found evidence for linkage and association with common variants on chromosome 7p for bIMT. In this study, we aimed to characterize contributions of rare variants (RVs) in 7p to bIMT. To achieve this aim, we sequenced the 1 LOD unit down region on 7p in nine extended families from the Dominican Republic (DR) with strong evidence for linkage to bIMT. We then performed the family-based sequence kernel association test (famSKAT) on genes within the 7p region. Analyses were restricted to single nucleotide variants (SNVs) with population based minor allele frequency (MAF) <5%. We first analyzed all exonic RVs and then the subset of only non-synonymous RVs. There were 68 genes in our analyses. Nucleotide-binding oligomerization domain (NOD1) was the most significantly associated gene when analyzing exonic RVs (famSKAT p = 9.2×10⁻⁴; number of SNVs = 14). We achieved suggestive replication of NOD1 in an independent sample of twelve extended families from the DR (p = 0.055). Our study provides suggestive statistical evidence for a role of rare variants in NOD1 in bIMT. Studies in mice have shown Nod1 to play a role in heart function and atherosclerosis, providing biologic plausibility for a role in bIMT thus making NOD1 an excellent bIMT candidate.
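One way to put the reported NOD1 p-value in context is to compare it with a multiple-testing threshold over the 68 genes analysed. The sketch below assumes a Bonferroni correction at a family-wise alpha of 0.05 for a single analysis; the abstract does not say which criterion the authors applied, so both the correction and the alpha are assumptions, and the function names are illustrative only.

```python
# Numbers taken from the abstract above.
N_GENES_TESTED = 68      # genes in the 7p region analysed with famSKAT
P_NOD1_EXONIC = 9.2e-4   # famSKAT p-value for NOD1, all exonic rare variants

# Assumption (not stated in the abstract): family-wise alpha of 0.05 with a
# Bonferroni correction over the genes tested in a single analysis.
ALPHA = 0.05


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def passes_bonferroni(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value is below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


if __name__ == "__main__":
    threshold = bonferroni_threshold(ALPHA, N_GENES_TESTED)
    print(f"Bonferroni threshold for {N_GENES_TESTED} genes: {threshold:.2e}")
    print(f"NOD1 p-value: {P_NOD1_EXONIC:.2e}")
    print("Below threshold:",
          passes_bonferroni(P_NOD1_EXONIC, ALPHA, N_GENES_TESTED))
```

Under these assumptions the per-gene threshold is roughly 7.4×10⁻⁴, so the reported p-value sits just above it, which is in line with the abstract describing the evidence as suggestive.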
Glioblastoma adaptation traced through decline of an IDH1 clonal driver and macro evolution of a double minute chromosome
In a glioblastoma tumour with multi-region sequencing before and after recurrence, we find an IDH1 mutation that is clonal in the primary but lost at recurrence. We also describe the evolution of a double-minute chromosome encoding regulators of the PI3K signalling axis that dominates at recurrence, emphasizing the challenges of an evolving and dynamic oncogenic landscape for precision medicine. Background Glioblastoma (GBM) is the most common malignant brain cancer occurring in adults, and is associated with dismal outcome and few therapeutic options. GBM has been shown to predominantly disrupt three core pathways through somatic aberrations, rendering it ideal for precision medicine approaches. Methods We describe a 35-year-old female patient with recurrent GBM following surgical removal of the primary tumour, adjuvant treatment with temozolomide and a 3-year disease-free period. Rapid whole-genome sequencing (WGS) of three separate tumour regions at recurrence was carried out and interpreted relative to WGS of two regions of the primary tumour. Results We found extensive mutational and copy-number heterogeneity within the primary tumour. We identified a TP53 mutation and two focal amplifications involving PDGFRA, KIT and CDK4, on chromosomes 4 and 12. A clonal IDH1 R132H mutation in the primary, a known GBM driver event, was detectable at only very low frequency in the recurrent tumour. After sub-clonal diversification, evidence was found for a whole-genome doubling event and a translocation between the amplified regions of PDGFRA, KIT and CDK4, encoded within a double-minute chromosome also incorporating miR26a-2. The WGS analysis uncovered progressive evolution of the double-minute chromosome converging on the KIT/PDGFRA/PI3K/mTOR axis, superseding the IDH1 mutation in dominance in a mutually exclusive manner at recurrence; consequently, the patient was treated with imatinib. Despite rapid sequencing and cancer genome-guided therapy against amplified oncogenes, the disease progressed, and the patient died shortly after. Conclusion This case sheds light on the dynamic evolution of a GBM tumour, defining the origins of the lethal sub-clone, the macro-evolutionary genomic events dominating the disease at recurrence and the loss of a clonal driver. Even in the era of rapid WGS analysis, cases such as this illustrate the significant hurdles for precision medicine success.
Genetic Diagnosis of Charcot Marie Tooth Disease in a Population by Next Generation Sequencing
Charcot-Marie-Tooth (CMT) disease is the most prevalent inherited neuropathy. Today more than 40 CMT genes have been identified. Diagnosing heterogeneous diseases by conventional Sanger sequencing is time consuming and expensive. Thus, more efficient and less costly methods are needed in clinical diagnostics. We included a population based sample of 81 CMT families. Gene mutations had previously been identified in 22 families; the remaining 59 families were analysed by next-generation sequencing. Thirty-two CMT genes and 19 genes causing other inherited neuropathies were included in a custom panel. Variants were classified into five pathogenicity classes by genotype-phenotype correlations and bioinformatics tools. Gene mutations, classified certainly or likely pathogenic, were identified in 37 (46%) of the 81 families. Point mutations in known CMT genes were identified in 21 families (26%), whereas four families (5%) had point mutations in other neuropathy genes, ARHGEF10, POLG, SETX, and SOD1. Eleven families (14%) carried the PMP22 duplication and one family carried a MPZ duplication (1%). Most mutations were identified not only in known CMT genes but also in other neuropathy genes, emphasising that genetic analysis should not be restricted to CMT genes only. Next-generation sequencing is a cost-effective tool in diagnosis of CMT improving diagnostic precision and time efficiency.
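The diagnostic yield figures in the abstract above can be cross-checked with a few lines of arithmetic. This is a minimal sketch: the category labels and function names are illustrative and not taken from the paper, while the family counts are the ones quoted in the abstract.

```python
# Family counts as reported in the abstract above.
TOTAL_FAMILIES = 81

# Illustrative category labels; the counts come from the abstract.
FAMILIES_BY_FINDING = {
    "point mutation in a known CMT gene": 21,
    "point mutation in another neuropathy gene": 4,
    "PMP22 duplication": 11,
    "MPZ duplication": 1,
}


def percent_of_cohort(count: int, total: int = TOTAL_FAMILIES) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


def families_with_diagnosis(findings: dict) -> int:
    """Total number of families with a certainly or likely pathogenic finding."""
    return sum(findings.values())


if __name__ == "__main__":
    for label, count in FAMILIES_BY_FINDING.items():
        print(f"{label}: {count} families ({percent_of_cohort(count)}%)")
    solved = families_with_diagnosis(FAMILIES_BY_FINDING)
    print(f"overall yield: {solved}/{TOTAL_FAMILIES} "
          f"({percent_of_cohort(solved)}%)")
```

The four categories add up to the 37 families reported as solved, and each rounded percentage agrees with the value given in the abstract.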
Germline whole exome sequencing and large scale replication identifies FANCM as a likely high grade serous ovarian cancer susceptibility gene
We analyzed whole exome sequencing data in germline DNA from 412 high grade serous ovarian cancer (HGSOC) cases from The Cancer Genome Atlas Project and identified 5,517 genes harboring a predicted deleterious germline coding mutation in at least one HGSOC case. Gene-set enrichment analysis showed enrichment for genes involved in DNA repair (p = 1.8×10⁻³). Twelve DNA repair genes (APEX1, APLF, ATX, EME1, FANCL, FANCM, MAD2L2, PARP2, PARP3, POLN, RAD54L and SMUG1) were prioritized for targeted sequencing in up to 3,107 HGSOC cases, 1,491 cases of other epithelial ovarian cancer (EOC) subtypes and 3,368 unaffected controls of European origin. We estimated mutation prevalence for each gene and tested for associations with disease risk. Mutations were identified in both cases and controls in all genes except MAD2L2, where we found no evidence of mutations in controls. In FANCM we observed a higher mutation frequency in HGSOC cases compared to controls (29/3,107 cases, 0.96 percent; 13/3,368 controls, 0.38 percent; P=0.008) with little evidence for association with other subtypes (6/1,491, 0.40 percent; P=0.82). The relative risk of HGSOC associated with deleterious FANCM mutations was estimated to be 2.5 (95% CI 1.3 – 5.0; P=0.006). In summary, whole exome sequencing of EOC cases with large-scale replication in case-control studies has identified FANCM as a likely novel susceptibility gene for HGSOC, with mutations associated with a moderate increase in risk. These data may have clinical implications for risk prediction and prevention approaches for high-grade serous ovarian cancer in the future and a significant impact on reducing disease mortality.
The clinical features, outcomes and genetic characteristics of hypertrophic cardiomyopathy patients with severe right ventricular hypertrophy
Introduction Severe right ventricular hypertrophy (SRVH) is a rare phenotype in hypertrophic cardiomyopathy (HCM) for which limited information is available. This study was undertaken to investigate the clinical, prognostic and genetic characteristics of HCM patients with SRVH. Methods HCM with SRVH was defined as HCM with a maximum right ventricular wall thickness ≥10 mm. Whole-genome sequencing (WGS) was performed in HCM patients with SRVH. Multivariate Cox proportional hazards regression models were used to identify risk factors for cardiac death and events in HCM with SRVH. Patients with apical hypertrophic cardiomyopathy (ApHCM) were selected as a comparison group. The clinical features and outcomes of 34 HCM patients with SRVH and 273 ApHCM patients were compared. Results Compared with the ApHCM group, the HCM with SRVH group included younger patients and a higher proportion of female patients and also displayed higher cardiovascular morbidity and mortality. The multivariate Cox proportional hazards regression models identified 2 independent predictors of cardiovascular death in HCM patients with SRVH, a New York Heart Association class ≥III (hazard ratio [HR] = 8.7, 95% confidence interval (CI): 1.43-52.87, p = 0.019) and an age at the time of HCM diagnosis ≤18 (HR = 5.5, 95% CI: 1.24-28.36, p = 0.026). Among the 11 HCM patients with SRVH who underwent WGS, 10 (90.9%) were identified as carriers of at least one specific sarcomere gene mutation. MYH7 and TTN mutations were the most common sarcomere mutations noted in this study. Two or more HCM-related gene mutations were observed in 9 (82%) patients, and mutations in either other cardiomyopathy-related genes or ion-channel disease-related genes were found in 8 (73%) patients. Conclusions HCM patients with SRVH were characterized by poor clinical outcomes and the presentation of multiple gene mutations. 
Identification of a Comprehensive Spectrum of Genetic Factors for Hereditary Breast Cancer in a Chinese Population by Next Generation Sequencing
The genetic etiology of hereditary breast cancer has not been fully elucidated. Although germline mutations of high-penetrance genes such as BRCA1/2 are implicated in development of hereditary breast cancers, at least half of all breast cancer families are not linked to these genes. To identify a comprehensive spectrum of genetic factors for hereditary breast cancer in a Chinese population, we performed an analysis of germline mutations in 2,165 coding exons of 152 genes associated with hereditary cancer using next-generation sequencing (NGS) in 99 breast cancer patients from families of cancer patients regardless of cancer types. Forty-two deleterious germline mutations were identified in 21 genes of 34 patients, including 18 (18.2%) BRCA1 or BRCA2 mutations, 3 (3%) TP53 mutations, 5 (5.1%) DNA mismatch repair gene mutations, 1 (1%) CDH1 mutation, 6 (6.1%) Fanconi anemia pathway gene mutations, and 9 (9.1%) mutations in other genes. Of seven patients who carried mutations in more than one gene, 4 were BRCA1/2 mutation carriers, and their average onset age was much younger than patients with only BRCA1/2 mutations. Almost all identified high-penetrance gene mutations in those families fulfill the typical phenotypes of hereditary cancer syndromes listed in the National Comprehensive Cancer Network (NCCN) guidelines, except two TP53 and three mismatch repair gene mutations. Furthermore, functional studies of MSH3 germline mutations confirmed the association between MSH3 mutation and tumorigenesis, and segregation analysis suggested antagonism between BRCA1 and MSH3. We also identified a lot of low-penetrance gene mutations. Although the clinical significance of those newly identified low-penetrance gene mutations has not been fully appreciated yet, these new findings do provide valuable epidemiological information for the future studies. 
Together, these findings highlight the importance of genetic testing based on NCCN guidelines and suggest that multi-gene analysis using NGS may be a supplement to traditional genetic counseling.
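The abstract above gives each mutation group as a count with a percentage but does not say which denominator the percentages use. The sketch below tests three candidate denominators taken from the abstract (99 patients, 34 carriers, 42 mutations) against the reported values. The function and variable names are illustrative, and the reported 3% and 1% are entered as 3.0 and 1.0 for the comparison.

```python
# (count, reported percentage) for each group, as quoted in the abstract.
REPORTED = {
    "BRCA1 or BRCA2": (18, 18.2),
    "TP53": (3, 3.0),
    "DNA mismatch repair genes": (5, 5.1),
    "CDH1": (1, 1.0),
    "Fanconi anemia pathway genes": (6, 6.1),
    "other genes": (9, 9.1),
}

# Candidate denominators, all taken from the abstract.
CANDIDATE_DENOMINATORS = {
    "patients sequenced": 99,
    "patients carrying a mutation": 34,
    "deleterious mutations": 42,
}


def total_mutations(reported: dict) -> int:
    """Sum of the per-group mutation counts."""
    return sum(count for count, _ in reported.values())


def reproduces_percentages(denominator: int, reported: dict) -> bool:
    """True if every reported percentage equals count/denominator to 1 decimal."""
    return all(
        round(100 * count / denominator, 1) == pct
        for count, pct in reported.values()
    )


if __name__ == "__main__":
    print("sum of group counts:", total_mutations(REPORTED))
    for label, denominator in CANDIDATE_DENOMINATORS.items():
        ok = reproduces_percentages(denominator, REPORTED)
        print(f"{label} (n={denominator}): matches reported values = {ok}")
```

Only the 99 sequenced patients reproduce every reported percentage, so the figures read as mutations per patient sequenced; the group counts also sum to the 42 deleterious mutations stated in the abstract.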
Setdb1 Is Required for Myogenic Differentiation of C2C12 Myoblast Cells via Maintenance of MyoD Expression
Setdb1, an H3-K9 specific histone methyltransferase, is associated with transcriptional silencing of euchromatic genes through chromatin modification. Functions of Setdb1 during development have been extensively studied in embryonic and mesenchymal stem cells as well as neurogenic progenitor cells, but the role of Setdb1 in myogenic differentiation remains unknown. In this study, we report that Setdb1 is required for the myogenic potential of C2C12 myoblast cells through maintaining the expression of MyoD and muscle-specific genes. We find that reduced Setdb1 expression in C2C12 myoblast cells severely delayed their differentiation, whereas exogenous Setdb1 expression had little effect. Gene expression profiling analysis using oligonucleotide microarray and RNA-Seq technologies demonstrated that depletion of Setdb1 results in downregulation of MyoD as well as the components of muscle fiber in proliferating C2C12 cells. In addition, exogenous expression of MyoD reversed transcriptional repression of a MyoD promoter-driven luciferase reporter by Setdb1 shRNA and rescued myogenic differentiation of C2C12 myoblast cells depleted of endogenous Setdb1. Taken together, these results provide new insights into how levels of key myogenic regulators are maintained prior to induction of differentiation.
Genetic analyses in a bonobo (Pan paniscus) with arrhythmogenic right ventricular cardiomyopathy
Arrhythmogenic right ventricular cardiomyopathy (ARVC) is a disorder that may lead to sudden death and can affect humans and other primates. In 2012, the alpha male bonobo of the Milwaukee County Zoo died suddenly and histologic evaluation found features of ARVC. This study sought to discover a possible genetic cause for ARVC in this individual. We sequenced our subject’s DNA to search for deleterious variants in genes involved in cardiovascular disorders. Variants found were annotated according to the human genome, following the currently available classification used for human diseases. Sequencing of DNA from an unrelated unaffected bonobo was also used for prediction of pathogenicity. Twenty-four variants of uncertain clinical significance (VUSs) but no pathogenic variants were found in the proband studied. Further familial, functional, and bonobo population studies are needed to determine whether any of the VUSs, or a combination of them, may be associated with the clinical findings. Future establishment of genotype-phenotype relationships will be beneficial for the appropriate care of the captive zoo bonobo population worldwide as well as for conservation of the bonobo species in its native habitat.
Germline mutations in ETV6 are associated with thrombocytopenia, red cell macrocytosis and predisposition to lymphoblastic leukemia
Some familial platelet disorders are associated with predisposition to leukemia, myelodysplastic syndrome (MDS) or dyserythropoietic anemia.1,2 We identified a family with autosomal dominant thrombocytopenia, high erythrocyte mean corpuscular volume (MCV) and two occurrences of B-cell precursor acute lymphoblastic leukemia (ALL). Whole exome sequencing identified a heterozygous single nucleotide change in ETV6 (Ets Variant Gene 6), c.641C>T, encoding a p.Pro214Leu substitution in the central domain, segregating with thrombocytopenia and elevated MCV. A screen of 23 families with a similar phenotype found two with ETV6 mutations. One family had the p.Pro214Leu mutation and one individual with ALL. The other family had a c.1252A>G transition producing a p.Arg418Gly substitution in the DNA binding domain, with alternative splicing and exon-skipping. Functional characterization of these mutations showed aberrant cellular localization of mutant and endogenous ETV6, decreased transcriptional repression and altered megakaryocyte maturation. Our findings underscore a key role for ETV6 in platelet formation and leukemia predisposition.
BACH2 immunodeficiency illustrates an association between super enhancers and haploinsufficiency
Transcriptional programs guiding lymphocyte differentiation depend on precise expression and timing of transcription factors (TFs). BACH2 is a TF essential for T- and B-lymphocytes and is associated with an archetypal super-enhancer (SE). Single nucleotide variants in the BACH2 locus associate with multiple autoimmune diseases but BACH2 mutations causing Mendelian monogenic primary immunodeficiency have not previously been identified. We describe a syndrome of BACH2-related immunodeficiency and autoimmunity (BRIDA) resulting from BACH2 haploinsufficiency. Patients had lymphocyte maturation defects, causing immunoglobulin deficiency and intestinal inflammation. The mutations disrupted protein stability by interfering with homodimerization or by causing aggregation. Analogous lymphocyte defects existed in Bach2 heterozygous mice. More generally, we found that genes causing monogenic haploinsufficient diseases are substantially enriched for TFs and SE-architecture. These observations show a new feature of SE-architecture in Mendelian diseases of immunity, that heterozygous mutations in SE-regulated genes identified on whole exome/genome sequencing may have greater significance than recognized.
Tumor associated copy number changes in the circulation of patients with prostate cancer identified through whole genome sequencing
Background Patients with prostate cancer may present with metastatic or recurrent disease despite initial curative treatment. The propensity of metastatic prostate cancer to spread to the bone has limited repeated sampling of tumor deposits. Hence, considerably less is understood about this lethal metastatic disease, as it is not commonly studied. Here we explored whole-genome sequencing of plasma DNA to scan the tumor genomes of these patients non-invasively. Methods We wanted to make whole-genome analysis from plasma DNA amenable to routine clinical applications and developed an approach based on a benchtop high-throughput platform, Illumina's MiSeq instrument. We performed whole-genome sequencing from plasma at a shallow sequencing depth to establish a genome-wide copy number profile of the tumor at low cost within 2 days. In parallel, we sequenced at high coverage a panel of 55 high-interest genes and 38 introns with frequent fusion breakpoints, such as the TMPRSS2-ERG fusion. After intensive testing of our approach with samples from 25 individuals without cancer, we analyzed 13 plasma samples derived from five patients with castration-resistant prostate cancer (CRPC) and four patients with castration-sensitive prostate cancer (CSPC). Results The genome-wide profiling in the plasma of our patients revealed multiple copy number aberrations including those previously reported in prostate tumors, such as losses in 8p and gains in 8q. High-level copy number gains in the AR locus were observed in patients with CRPC but not in those with CSPC. We identified the 3-Mbp deletion on chromosome 21 associated with the TMPRSS2-ERG rearrangement and found corresponding fusion plasma fragments in these cases. In an index case, multiregional sequencing of the primary tumor identified different copy number changes in each sector, suggesting multifocal disease. Our plasma analyses of this index case, performed 13 years after resection of the primary tumor, revealed novel chromosomal rearrangements that were stable in serial plasma analyses over a 9-month period, consistent with the presence of one metastatic clone. Conclusions The genomic landscape of prostate cancer can be established by non-invasive means from plasma DNA. Our approach provides specific genomic signatures within 2 days and may therefore serve as a 'liquid biopsy'.
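The abstract does not spell out how the copy number profile is computed. A common way to derive one from shallow sequencing is to count reads in fixed genomic bins, normalize by library size, and compare each bin with a cancer-free control on a log2 scale. The sketch below illustrates that general idea only; the function names, the pseudocount, and the 0.2 threshold are assumptions made for illustration, not values from the study.

```python
import math

def log2_ratios(sample_counts, control_counts, pseudocount=0.5):
    """Per-bin log2 ratio of library-size-normalized read counts.

    sample_counts / control_counts: read counts per genomic bin, same bin order.
    pseudocount: hypothetical smoothing value to avoid log of zero.
    """
    sample_total = sum(sample_counts)
    control_total = sum(control_counts)
    ratios = []
    for s, c in zip(sample_counts, control_counts):
        s_frac = (s + pseudocount) / sample_total
        c_frac = (c + pseudocount) / control_total
        ratios.append(math.log2(s_frac / c_frac))
    return ratios

def call_bins(ratios, threshold=0.2):
    """Label each bin as gain, loss, or neutral (threshold is an assumed cutoff)."""
    return ["gain" if r > threshold else "loss" if r < -threshold else "neutral"
            for r in ratios]

# Toy data: four bins, the third over-represented and the fourth depleted in plasma.
plasma = [100, 100, 150, 50]
control = [100, 100, 100, 100]
calls = call_bins(log2_ratios(plasma, control))
print(calls)
```

Real pipelines typically add GC-content correction and segmentation across neighboring bins, both omitted here for brevity.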
Integrative Variation Analysis Reveals that a Complex Genotype May Specify Phenotype in Siblings with Syndromic Autism Spectrum Disorder
It has been proposed that copy number variations (CNVs) are associated with increased risk of autism spectrum disorder (ASD) and, in conjunction with other genetic changes, contribute to the heterogeneity of ASD phenotypes. Array comparative genomic hybridization (aCGH) and exome sequencing, together with systems genetics and network analyses, are being used as tools for the study of complex disorders of unknown etiology, especially those characterized by significant genetic and phenotypic heterogeneity. Therefore, to characterize the complex genotype-phenotype relationship, we performed aCGH and sequenced the exomes of two affected siblings with ASD symptoms, dysmorphic features, and intellectual disability, searching for de novo CNVs, as well as for de novo and rare inherited point variations, namely single nucleotide variants (SNVs) or small insertions and deletions (indels), with probable functional impacts. With aCGH, we identified, in both siblings, a duplication in the 4p16.3 region and a deletion at 8p23.3, inherited from a paternal balanced translocation, t(4;8)(p16;p23). Exome variant analysis found a total of 316 variants, of which 102 were shared by both siblings, 128 were in the male sibling exome data, and 86 were in the female exome data. Our integrative network analysis showed that the siblings’ shared translocation could explain their similar syndromic phenotype, including overgrowth, macrocephaly, and intellectual disability. However, the exome data add genes to those already connected through the translocation; these genes are important to the robustness of the network and contribute to the understanding of the broader spectrum of psychiatric symptoms. This study shows the importance of using an integrative approach to explore genotype-phenotype variability.
A survey on cellular RNA editing activity in response to Candida albicans infections
Background Adenosine-to-Inosine (A-to-I) RNA editing is catalyzed by the adenosine deaminase acting on RNA (ADAR) family of enzymes, which induces alterations in mRNA sequence. It has been shown that A-to-I RNA editing events are of significance in the cell’s innate immunity and cellular response to viral infections. However, whether RNA editing plays a role in the cellular response to microorganism/fungal infection has not been determined. Candida albicans, one of the most prevalent human pathogenic fungi, usually acts as a commensal on skin and superficial mucosa, but has been found to cause candidiasis in immunosuppressed patients. Previously, we revealed the up-regulation of A-to-I RNA editing activity in response to different types of influenza virus infections. The current work is designed to study the effect of microorganism/fungal infection on the activity of A-to-I RNA editing in infected hosts. Results We first detected and characterized the A-to-I RNA editing events in oral epithelial cells (OKF6) and primary human umbilical vein endothelial cells (HUVEC), under normal growth conditions or with C. albicans infection. In normal OKF6 and HUVEC cells, 89,648 and 60,872 A-to-I editing sites were detected, respectively. They were validated against the RNA editing databases DARNED, RADAR, and REDIportal with 50, 80, and 80% success rates, respectively. Over 95% of editing sites were detected in Alu regions; of the remaining sites in non-repetitive regions, the majority were located in introns and UTRs. The distributions of A-to-I editing activity and editing depth were analyzed during the course of C. albicans infection. While the normalized editing levels of common editing sites exhibited a significant increase, especially in Alu regions, no significant change in the expression of ADAR1 or ADAR2 was observed. Second, we performed further analysis on data from an in vivo mouse study with C. albicans infection. In mouse tongue and kidney tissues, 1,133 and 955 A-to-I editing sites were identified, respectively. The number of A-to-I editing events was much smaller than in human epithelial or endothelial cells, due to the lack of Alu elements in the mouse genome. Furthermore, during the course of C. albicans infection we observed a stable level of A-to-I editing activity in 131 and 190 common editing sites in the mouse tongue and kidney tissues, and found no significant change in ADAR1 or ADAR2 expression (with the exception of ADAR2 displaying a significant increase at 12 h after infection in mouse kidney tissue before returning to normal). Conclusions This work represents the first comprehensive analysis of the A-to-I RNA editome in human epithelial and endothelial cells. C. albicans infection of human epithelial and endothelial cells led to the up-regulation of A-to-I editing activities, through a mechanism different from that of viral infections in human hosts. However, the in vivo mouse model with C. albicans infection did not show significant changes in A-to-I editing activities in tongue and kidney tissues. The different results in the mouse model were likely due to the presence of more complex in vivo environments, e.g. circulation and mixed cell types. Electronic supplementary material The online version of this article (10.1186/s12864-017-4374-2) contains supplementary material, which is available to authorized users.
Unraveling Mycobacterium tuberculosis genomic diversity and evolution in Lisbon, Portugal, a highly drug resistant setting
Background Multidrug- (MDR) and extensively drug resistant (XDR) tuberculosis (TB) presents a challenge to disease control and elimination goals. In Lisbon, Portugal, specific and successful XDR-TB strains have been found in circulation for almost two decades. Results In the present study we have genotyped and sequenced the genomes of 56 Mycobacterium tuberculosis isolates recovered mostly from Lisbon. The genotyping data revealed three major clusters associated with MDR-TB, two of which are associated with XDR-TB. The genomic data helped to elucidate the phylogenetic positioning of circulating MDR-TB strains, showing a high predominance of a single SNP cluster group 5. Furthermore, a genome-wide phylogeny analysis from these strains, together with 19 publicly available genomes of Mycobacterium tuberculosis clinical isolates, revealed two major clades responsible for M/XDR-TB in the region: Lisboa3 and Q1 (LAM). The data presented by this study yielded insights on microevolution and identification of novel compensatory mutations associated with rifampicin resistance in rpoB and rpoC. The screening for other structural variations revealed putative clade-defining variants. One deletion in PPE41, found among Lisboa3 isolates, is proposed to contribute to immune evasion and to act as a selective advantage. Insertion sequence (IS) mapping has also demonstrated the role of IS6110 as a major driver in mycobacterial evolution by affecting gene integrity and regulation. Conclusions Globally, this study contributes novel genome-wide phylogenetic data and has led to the identification of new genomic variants that support the notion of a growing genomic diversity facing both setting and host adaptation. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-991) contains supplementary material, which is available to authorized users.
Whole Genome Sequence Accuracy Is Improved by Replication in a Population of Mutagenized Sorghum
The accurate detection of induced mutations is critical for both forward and reverse genetics studies. Experimental chemical mutagenesis induces relatively few single base changes per individual. In a complex eukaryotic genome, false positive detection of mutations can occur at or above this mutagenesis rate. We demonstrate here, using a population of ethyl methanesulfonate (EMS)-treated Sorghum bicolor BTx623 individuals, that using replication to detect false positive induced variants in next-generation sequencing (NGS) data permits higher throughput variant detection with greater accuracy. We used a lower sequence coverage depth (average of 7×) from 586 independently mutagenized individuals and detected 5,399,493 homozygous single nucleotide polymorphisms (SNPs). Of these, 76% originated from only 57,872 genomic positions prone to false positive variant calling. These positions are characterized by high copy number paralogs where the error-prone SNP positions are at copies containing a variant at the SNP position. The ability of short stretches of homology to generate these error-prone positions suggests that incompletely assembled or poorly mapped repeated sequences are one driver of these error-prone positions. Removal of these false positives left 1,275,872 homozygous and 477,531 heterozygous EMS-induced SNPs, which, congruent with the mutagenic mechanism of EMS, were >98% G:C to A:T transitions. Through this analysis, we generated a collection of sequence-indexed mutants of sorghum. This collection contains 4035 high-impact homozygous mutations in 3637 genes and 56,514 homozygous missense mutations in 23,227 genes. Each line contains, on average, 2177 annotated homozygous SNPs per genome, including seven likely gene knockouts and 96 missense mutations. The number of mutations in a transcript was linearly correlated with the transcript length and also the G+C count, but not with the GC/AT ratio. Analysis of the detected mutagenized positions identified CG-rich patches, and flanking sequences strongly influenced EMS-induced mutation rates. This method for detecting false positive induced mutations is generally applicable to any organism, is independent of the choice of in silico variant-calling algorithm, and is most valuable when the true mutation rate is likely to be low, such as in laboratory-induced mutations or somatic mutation detection in medicine.
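The replication idea can be sketched in a few lines: because each individual was mutagenized independently, a true EMS-induced change should rarely recur at the same position in other individuals, so calls shared across many lines are suspect. The following is a minimal illustration under that assumption and is not the authors' pipeline; the function name, the `(chrom, pos, alt)` site key, and the `max_carriers` cutoff are hypothetical.

```python
from collections import Counter

def split_by_recurrence(calls_by_individual, max_carriers=1):
    """Separate calls into likely induced variants and error-prone sites.

    calls_by_individual: dict mapping an individual ID to a set of
        (chrom, pos, alt) tuples called in that individual.
    max_carriers: assumed cutoff; a site called in more individuals than
        this is treated as prone to false positive calling.
    """
    carriers = Counter()
    for calls in calls_by_individual.values():
        carriers.update(set(calls))  # count each site once per individual

    error_prone = {site for site, n in carriers.items() if n > max_carriers}
    kept = {ind: set(calls) - error_prone
            for ind, calls in calls_by_individual.items()}
    return kept, error_prone

# Toy data: one site recurs in all three independently mutagenized lines.
calls = {
    "line_A": {("Chr01", 1200, "A"), ("Chr02", 500, "T")},
    "line_B": {("Chr01", 1200, "A"), ("Chr03", 42, "A")},
    "line_C": {("Chr01", 1200, "A")},
}
kept, error_prone = split_by_recurrence(calls)
print(sorted(error_prone))
print({ind: sorted(sites) for ind, sites in kept.items()})
```

Because the filter only compares call sets across individuals, it works with the output of any variant caller, in line with the abstract's claim that the approach is independent of the calling algorithm.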
Efficiency of olaparib in colorectal cancer patients with an alteration of the homologous repair protein
Precision medicine is defined by the administration of drugs based on the tumor’s particular genetic characteristics. It is developing quickly in the field of cancer therapy. For example, KRAS, NRAS and BRAF genetic testing has demonstrated its value for precision medicine in colorectal cancer (CRC). Beyond these well-known mutations, the value of performing broader genetic testing in this pathology is unknown. Recent reports have shown that using the poly ADP ribose polymerase (PARP) inhibitor olaparib in patients with homologous repair enzyme deficiency gave positive clinical results in breast, ovarian and prostate cancers. We report here the cases of 2 patients with multi-treated metastatic CRC who underwent somatic and constitutional exome analyses. The analyses revealed a loss-of-function mutation in a homologous repair enzyme resulting in the loss of heterozygosity for both patients (CHEK2 for the first patient and RAD51C for the second one). Both patients were treated with off-label olaparib. While the first patient showed clinical benefit, a reduction of the carcinoembryonic antigen tumor marker and a radiologic response, the second patient quickly presented with tumor progression. Additional genetic analyses revealed a frameshift truncating mutation of the TP53BP1 gene in the patient who progressed. Interestingly, deficiency in TP53BP1 was previously described to confer resistance to olaparib in mouse breast cancer models. Our findings suggest that exome analysis may be a helpful tool to highlight targetable mutations in CRC and that olaparib may be effective in patients with a homologous repair deficiency.
Copy number alterations detected by whole-exome and whole-genome sequencing of esophageal adenocarcinoma
Background: Esophageal adenocarcinoma (EA) is among the leading causes of cancer mortality, especially in developed countries. A high level of somatic copy number alterations (CNAs) accumulates over the decades in the progression from Barrett’s esophagus, the precursor lesion, to EA. Accurate identification of somatic CNAs is essential to understand cancer development. Many studies have been conducted for the detection of CNA in EA using microarrays. Next-generation sequencing (NGS) technologies are believed to have advantages in sensitivity and accuracy to detect CNA, yet no NGS-based CNA detection in EA has been reported. Results: In this study, we analyzed whole-exome (WES) and whole-genome sequencing (WGS) data for detecting CNA from a published large-scale genomic study of EA. Two specific comparisons were conducted. First, the recurrent CNAs based on WGS and WES data from 145 EA samples were compared to those found in five previous microarray-based studies. We found that the majority of the previously identified regions were also detected in this study. Interestingly, some novel amplifications and deletions were discovered using the NGS data. In particular, SKI and PRKCZ, detected in a deletion region, are involved in the transforming growth factor-β pathway, suggesting the potential utility of novel biomarkers for EA. Second, we compared CNAs detected in WGS and WES data from the same 15 EA samples. No large-scale CNA was identified statistically more frequently by WES or WGS, while more focal-scale CNAs were detected by WGS than by WES. Conclusions: Our results suggest that NGS can replace microarrays to detect CNA in EA. WGS is superior to WES in that it can offer finer resolution for the detection, though if the interest is in recurrent CNAs, WES can be preferable to WGS for its cost-effectiveness. Electronic supplementary material: The online version of this article (doi:10.1186/s40246-015-0044-0) contains supplementary material, which is available to authorized users.
Reconstructing the Population Genetic History of the Caribbean
Author Summary: Latinos are often regarded as a single heterogeneous group, whose complex variation is not fully appreciated in several social, demographic, and biomedical contexts. By making use of genomic data, we characterize ancestral components of Caribbean populations on a sub-continental level and unveil fine-scale patterns of population structure distinguishing insular from mainland Caribbean populations as well as from other Hispanic/Latino groups. We provide genetic evidence for an inland South American origin of the Native American component in island populations and for extensive pre-Columbian gene flow across the Caribbean basin. The Caribbean-derived European component shows significant differentiation from parental Iberian populations, presumably as a result of founder effects during the colonization of the New World. Based on demographic models, we reconstruct the complex population history of the Caribbean since the onset of continental admixture. We find that insular populations are best modeled as mixtures absorbing two pulses of African migrants, coinciding with the early and maximum activity stages of the transatlantic slave trade. These two pulses appear to have originated in different regions within West Africa, imprinting two distinguishable signatures on present-day Afro-Caribbean genomes and shedding light on the genetic impact of the slave trade in the Caribbean.
The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, we investigate the population genetic history of this region by characterizing patterns of genome-wide variation among 330 individuals from three of the Greater Antilles (Cuba, Puerto Rico, Hispaniola), two mainland (Honduras, Colombia), and three Native South American (Yukpa, Bari, and Warao) populations. We combine these data with a unique database of genomic variation in over 3,000 individuals from diverse European, African, and Native American populations. We use local ancestry inference and tract length distributions to test different demographic scenarios for the pre- and post-colonial history of the region. We develop a novel ancestry-specific PCA (ASPCA) method to reconstruct the sub-continental origin of Native American, European, and African haplotypes from admixed genomes. We find that the most likely source of the indigenous ancestry in Caribbean islanders is a Native South American component shared among inland Amazonian tribes, Central America, and the Yucatan peninsula, suggesting extensive gene flow across the Caribbean in pre-Columbian times. We find evidence of two pulses of African migration. The first pulse, which today is reflected by shorter, older ancestry tracts, consists of a genetic component more similar to coastal West African regions involved in early stages of the trans-Atlantic slave trade. The second pulse, reflected by longer, younger tracts, is more similar to present-day West-Central African populations, supporting historical records of later transatlantic deportation. Surprisingly, we also identify a Latino-specific European component that has significantly diverged from its parental Iberian source populations, presumably as a result of small European founder population size. We demonstrate that the ancestral components in admixed genomes can be traced back to distinct sub-continental source populations with far greater resolution than previously thought, even when limited pre-Columbian Caribbean haplotypes have survived.
An essential domain of an early diverged RNA polymerase II functions to accurately decode a primitive chromatin landscape
Abstract: A unique feature of RNA polymerase II (RNA pol II) is its long C-terminal extension, called the carboxy-terminal domain (CTD). The well-studied eukaryotes possess a tandemly repeated 7-amino-acid sequence, called the canonical CTD, which orchestrates various steps in mRNA synthesis. Many eukaryotes possess a CTD devoid of repeats, appropriately called a non-canonical CTD, which performs completely unknown functions. Trypanosoma brucei, the etiologic agent of African Sleeping Sickness, deploys an RNA pol II that contains a non-canonical CTD to accomplish an unusual transcriptional program; all protein-coding genes are transcribed as part of a polygenic precursor mRNA (pre-mRNA) that is initiated within a several-kilobase-long region, called the transcription start site (TSS), which is upstream of the first protein-coding gene in the polygenic array. In this report, we show that the non-canonical CTD of T. brucei RNA pol II is important for normal protein-coding gene expression, likely directing RNA pol II to the TSSs within the genome. Our work reveals the presence of a primordial CTD code within eukarya and indicates that proper recognition of the chromatin landscape is a central function of this RNA pol II-distinguishing domain.
Identification of Epigenetic Biomarkers of Lung Adenocarcinoma through Multi-Omics Data Analysis
Epigenetic mechanisms such as DNA methylation or histone modifications are essential for the regulation of gene expression and development of tissues. Alteration of epigenetic modifications can be used as an epigenetic biomarker for diagnosis and as promising targets for epigenetic therapy. A recent study explored cancer-cell-specific epigenetic biomarkers by examining different types of epigenetic modifications simultaneously. However, it was based on microarrays and reported biomarkers that were also present in normal cells at a low frequency. Here, we first analyzed multi-omics data (including ChIP-Seq data of six types of histone modifications: H3K27ac, H3K4me1, H3K9me3, H3K36me3, H3K27me3, and H3K4me3) obtained from 26 lung adenocarcinoma cell lines and a normal cell line. We identified six genes with both H3K27ac and H3K4me3 histone modifications in their promoter regions, which were not present in the normal cell line, but present in ≥85% (22 out of 26) and ≤96% (25 out of 26) of the lung adenocarcinoma cell lines. Of these genes, NUP210 (encoding a main component of the nuclear pore complex) was the only gene in which the two modifications were not detected in another normal cell line. RNA-Seq analysis revealed that NUP210 was aberrantly overexpressed among the 26 lung adenocarcinoma cell lines, although the frequency of NUP210 overexpression was lower (19.3%) in 57 lung adenocarcinoma tissue samples studied and stored in another database. This study provides a basis to discover epigenetic biomarkers highly specific to a certain cancer, based on multi-omics data at the cell population level.
Genome-wide RNA-seq and ChIP-seq reveal Linc-YY1 function in regulating YY1/PRC2 activity during skeletal myogenesis
Little is known about how lincRNAs are involved in skeletal myogenesis. Here we describe the discovery and functional annotation of Linc-YY1, a novel lincRNA originating from the promoter of the transcription factor (TF) Yin Yang 1 (YY1). Starting from whole transcriptome shotgun sequencing (a.k.a. RNA-seq) data from muscle C2C12 cells, a series of bioinformatics analyses was applied towards the identification of hundreds of high-confidence novel lincRNAs. Genome-wide approaches were then employed to demonstrate that Linc-YY1 functions to promote myogenesis through associating with YY1 and regulating YY1/PRC2 transcriptional activity in trans. Here we describe the details of the ChIP-seq and RNA-seq experiments and the data analysis procedures associated with the study published by Zhou and colleagues in Nature Communications in 2015 [1]. The data were deposited in NCBI's Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) with accession number GSE74049.
R-loops induce repressive chromatin marks over mammalian gene terminators
The formation of R-loops is a natural consequence of the transcription process, caused by invasion of the DNA duplex by nascent transcripts. These structures have been considered rare transcriptional by-products with potentially harmful effects on genome integrity, due to the fragility of the displaced DNA coding strand [1]. However, R-loops may also possess beneficial effects, as their widespread formation has been detected over CpG island promoters in human genes [2,3]. Furthermore, we have previously shown that R-loops are particularly enriched over G-rich terminator elements. These facilitate RNA polymerase II (Pol II) pausing prior to efficient termination [4]. Here we reveal an unanticipated link between R-loops and RNA interference (RNAi)-dependent H3K9me2 formation over pause site termination regions of mammalian protein coding genes. We show that R-loops induce antisense transcription over these pause elements, which in turn leads to the generation of double-strand RNA (dsRNA) and recruitment of Dicer, Ago1, Ago2, and G9a histone lysine methyltransferase (HKMT). Consequently, an H3K9me2 repressive mark is formed and Heterochromatin Protein 1γ (HP1γ) is recruited, which reinforces Pol II pausing prior to efficient transcriptional termination. We predict that R-loops promote a chromatin architecture that defines the termination region for a substantial subset of mammalian genes.
Effects of DNA Methylation and Chromatin State on Rates of Molecular Evolution in Insects
Epigenetic information is widely appreciated for its role in gene regulation in eukaryotic organisms. However, epigenetic information can also influence genome evolution. Here, we investigate the effects of epigenetic information on gene sequence evolution in two disparate insects: the fly Drosophila melanogaster, which lacks substantial DNA methylation, and the ant Camponotus floridanus, which possesses a functional DNA methylation system. We found that DNA methylation was positively correlated with the synonymous substitution rate in C. floridanus, suggesting a key effect of DNA methylation on patterns of gene evolution. However, our data suggest the link between DNA methylation and elevated rates of synonymous substitution was explained, in large part, by the targeting of DNA methylation to genes with signatures of transcriptionally active chromatin, rather than the mutational effect of DNA methylation itself. This phenomenon may be explained by an elevated mutation rate for genes residing in transcriptionally active chromatin, or by increased structural constraints on genes in inactive chromatin. This result highlights the importance of chromatin structure as the primary epigenetic driver of genome evolution in insects. Overall, our study demonstrates how different epigenetic systems contribute to variation in the rates of coding sequence evolution.
BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis
Summary: Enhancers, critical determinants of cellular identity, are commonly identified by correlative chromatin marks and gain-of-function potential, though only loss-of-function studies can demonstrate their requirement in the native genomic context. Previously we identified an erythroid enhancer of BCL11A, subject to common genetic variation associated with fetal hemoglobin (HbF) level, whose mouse ortholog is necessary for erythroid BCL11A expression. Here we develop pooled CRISPR-Cas9 guide RNA libraries to perform in situ saturating mutagenesis of the human and mouse enhancers. This approach reveals critical minimal features and discrete vulnerabilities of these enhancers. Despite conserved function of the composite enhancers, their architecture diverges. The crucial human sequences appear primate-specific. Through editing of primary human progenitors and mouse transgenesis, we validate the BCL11A erythroid enhancer as a target for HbF reinduction. The detailed enhancer map will inform therapeutic genome editing. The screening approach described here is generally applicable to functional interrogation of noncoding genomic elements.
EIN2-dependent regulation of acetylation of histone H3K14 and non-canonical histone H3K23 in ethylene signalling
The translocation of the C-terminal domain of EIN2 to the nucleus is essential for induction of gene expression in response to the plant hormone ethylene. Here, Zhang et al. show that EIN2 is required for ethylene-inducible elevation of histone acetylation marks associated with transcriptional activation.
Ethylene gas is essential for many developmental processes and stress responses in plants. EIN2 plays a key role in ethylene signalling but its function remains enigmatic. Here, we show that ethylene specifically elevates acetylation of histone H3K14 and the non-canonical acetylation of H3K23 in etiolated seedlings. The up-regulation of these two histone marks positively correlates with ethylene-regulated transcription activation, and the elevation requires EIN2. Both EIN2 and EIN3 interact with a SANT domain protein named EIN2 nuclear associated protein 1 (ENAP1), overexpression of which results in elevation of histone acetylation and enhanced ethylene-inducible gene expression in an EIN2-dependent manner. On the basis of these findings we propose a model where, in the presence of ethylene, the EIN2 C terminus contributes to downstream signalling via the elevation of acetylation at H3K14 and H3K23. ENAP1 may potentially mediate ethylene-induced histone acetylation via its interactions with EIN2 C terminus.
Dataset of TWIST1 regulated genes in the cranial mesoderm and a transcriptome comparison of cranial mesoderm and cranial neural crest
This article contains data related to the research article entitled “Transcriptional targets of TWIST1 in the cranial mesoderm regulate cell-matrix interactions and mesenchyme maintenance” by Bildsoe et al. (2016) [1]. The data presented here are derived from: (1) a microarray-based comparison of sorted cranial mesoderm (CM) and cranial neural crest (CNC) cells from E9.5 mouse embryos; (2) comparisons of transcription profiles of head tissues from mouse embryos with a CM-specific loss-of-function of Twist1 and control mouse embryos collected at E8.5 and E9.5; (3) ChIP-seq using a TWIST1-specific monoclonal antibody with chromatin extracts from TWIST1-expressing MDCK cells, a model for a TWIST1-dependent mesenchymal state.
Temporal dynamics of gene expression and histone marks at the Arabidopsis shoot meristem during flowering
When plants flower, the shoot apical meristem switches fate to produce floral organs instead of leaves. Here You et al. perform tissue-specific epigenome profiling and show that during this transition changes in histone methylation are correlated with transcriptional responses in the meristem.
Plants can produce organs throughout their entire life from pluripotent stem cells located at their growing tip, the shoot apical meristem (SAM). At the time of flowering, the SAM of Arabidopsis thaliana switches fate and starts producing flowers instead of leaves. Correct timing of flowering in part determines reproductive success, and is therefore under environmental and endogenous control. How epigenetic regulation contributes to the floral transition has eluded analysis so far, mostly because of the poor accessibility of the SAM. Here we report the temporal dynamics of the chromatin modifications H3K4me3 and H3K27me3 and their correlation with transcriptional changes at the SAM in response to photoperiod-induced flowering. Emphasizing the importance of tissue-specific epigenomic analyses we detect enrichments of chromatin states in the SAM that were not apparent in whole seedlings. Furthermore, our results suggest that regulation of translation might be involved in adjusting meristem function during the induction of flowering.
Evaluation of Candidate Stromal Epithelial Cross Talk Genes Identifies Association between Risk of Serous Ovarian Cancer and TERT, a Cancer Susceptibility “Hot Spot”
Author Summary In this article, we report the findings from a large-scale analysis of common variation in genes that are expressed as a consequence of interactions between ovarian cancer cells and their host micro-environment that could influence serous ovarian cancer risk. We evaluated 1,302 common variants within or near 173 genes in two large case-control studies from the Ovarian Cancer Association Consortium (OCAC) and selected three variants for further evaluation in sixteen OCAC studies and an additional 18 for evaluation in five OCAC studies. We observed a significantly increased risk of serous ovarian cancer associated with a variant in the telomerase reverse transcriptase (TERT) gene. Although TERT variants have not been previously shown to contribute to ovarian cancer risk, several studies have recently reported associations between TERT variants and other forms of cancer, including gliomas, lung cancer, adenocarcinoma, basal cell carcinoma, prostate cancer, and multiple other cancers. TERT encodes a protein that is essential for the replication and maintenance of chromosomal integrity during cell division. In cancer cells, TERT has been linked to genomic instability and tumour cell proliferation. Further studies are necessary to confirm our findings and to investigate the mechanisms for the observed association.
We hypothesized that variants in genes expressed as a consequence of interactions between ovarian cancer cells and the host micro-environment could contribute to cancer susceptibility. We therefore used a two-stage approach to evaluate common single nucleotide polymorphisms (SNPs) in 173 genes involved in stromal epithelial interactions in the Ovarian Cancer Association Consortium (OCAC). In the discovery stage, cases with epithelial ovarian cancer (n = 675) and controls (n = 1,162) were genotyped at 1,536 SNPs using an Illumina GoldenGate assay. Based on Positive Predictive Value estimates, three SNPs (PODXL rs1013368, ITGA6 rs13027811, and MMP3 rs522616) were selected for replication using TaqMan genotyping in up to 3,059 serous invasive cases and 8,905 controls from 16 OCAC case-control studies. An additional 18 SNPs with per-allele P < 0.05 in the discovery stage were selected for replication in a subset of five OCAC studies (n = 1,233 serous invasive cases; n = 3,364 controls). The discovery stage associations in PODXL, ITGA6, and MMP3 were attenuated in the larger replication set (adj. per-allele P ≥ 0.5). However, genotypes at TERT rs7726159 were associated with ovarian cancer risk in the smaller, five-study replication study (per-allele P = 0.03). Combined analysis of the discovery and replication sets for this TERT SNP showed an increased risk of serous ovarian cancer among non-Hispanic whites [adj. per-allele OR 1.14 (1.04–1.24), p = 0.003]. Our study adds to the growing evidence that, like the 8q24 locus, the telomerase reverse transcriptase locus at 5p15.33 is a general cancer susceptibility locus.
A Genome Wide Linkage Study for Chronic Obstructive Pulmonary Disease in a Dutch Genetic Isolate Identifies Novel Rare Candidate Variants
Chronic obstructive pulmonary disease (COPD) is a complex and heritable disease, associated with multiple genetic variants. Specific familial types of COPD may be explained by rare variants, which have not been widely studied. We aimed to discover rare genetic variants underlying COPD through a genome-wide linkage scan. Affected-only analysis was performed using the 6K Illumina Linkage IV Panel in 142 cases clustered in 27 families from a genetic isolate, the Erasmus Rucphen Family (ERF) study. Potential causal variants were identified by searching for shared rare variants in the exome-sequence data of the affected members of the families contributing most to the linkage peak. The identified rare variants were then tested for association with COPD in a large meta-analysis of several cohorts. Significant evidence for linkage was observed on chromosomes 15q14–15q25 [logarithm of the odds (LOD) score = 5.52], 11p15.4–11q14.1 (LOD = 3.71) and 5q14.3–5q33.2 (LOD = 3.49). In the chromosome 15 peak, which harbors the known COPD locus for nicotinic receptors, and in the chromosome 5 peak we could not identify shared variants. In the chromosome 11 locus, we identified four rare (minor allele frequency (MAF) <0.02), predicted pathogenic, missense variants. These were shared among the affected family members. The identified variants localize to genes including neuroblast differentiation-associated protein (AHNAK), previously associated with blood biomarkers in COPD, phospholipase C Beta 3 (PLCB3), shown to increase airway hyper-responsiveness, solute carrier family 22-A11 (SLC22A11), involved in amino acid metabolism and ion transport, and metallothionein-like protein 5 (MTL5), involved in nicotinate and nicotinamide metabolism. Association of SLC22A11 and MTL5 variants was confirmed in the meta-analysis of 9,888 cases and 27,060 controls. In conclusion, we have identified novel rare variants in plausible genes related to COPD. Further studies utilizing large sample whole-genome sequencing should further confirm the associations at chromosome 11 and investigate the chromosome 15 and 5 linked regions.
Integrated miRNA mRNA analysis reveals regulatory pathways underlying the curly fleece trait in Chinese tan sheep
Background Tan sheep is an indigenous Chinese breed well known for its beautiful curly fleece. One prominent characteristic of this breed is that the degree of curliness differs markedly between lambs and adults, but the molecular mechanisms regulating the shift are still not well understood. In this study, we identified 49 differentially expressed (DE) microRNAs (miRNAs) between Tan sheep at the two stages through miRNA-seq, and combined the data with that in our earlier Suppression Subtractive Hybridization cDNA (SSH) library study to elucidate the mechanisms underlying curly fleece formation. Results Thirty-six potential miRNA-mRNA target pairs were identified using computational methods, including 25 DE miRNAs and 10 DE genes involved in the MAPK signaling pathway, steroid biosynthesis and metabolic pathways. With the differential expressions between lambs and adults confirmed by qRT-PCR, some miRNAs were already annotated in the genome, but some were novel miRNAs. Inhibition of KRT83 expression by miR-432 was confirmed by both gene knockdown with siRNA and overexpression, which was consistent with the miRNA and target prediction results. Conclusion Our study represents a comprehensive analysis of mRNA and miRNA in Tan sheep and offers detailed insight into the development of curly fleece as well as the potential mechanisms controlling curly hair formation in humans. Electronic supplementary material The online version of this article (10.1186/s12864-018-4736-4) contains supplementary material, which is available to authorized users.
Enrichment and verification of differentially expressed miRNAs in bursa of Fabricius in two breeds of duck
Objective The bursa of Fabricius (BF) is a central humoral immune organ belonging specifically to avians. Recent studies had suggested that miRNAs were active regulators involved in the immune processes. This study aimed to investigate the possible differences of the BF at the miRNA level between two genetically disparate duck breeds. Methods Using Illumina next-generation sequencing, the miRNA libraries of ducks were established. Results The results showed that there were 66 differentially expressed miRNAs and 28 novel miRNAs in bursa. A set of abundant miRNAs (i.e., let-7, miR-146a-5p, miR-21-5p, miR-17~92) which are involved in immunity and disease were detected and the predicted target genes of the novel miRNAs were associated with duck high anti-adversity ability. By gene ontology analysis and KEGG pathway enrichment, the targets of differentially expressed miRNAs were mainly involved in immunity and disease, supporting that there were differences in the BF immune functions between the two duck breeds. In addition, the metabolic pathway had the maximum enriched target genes and some enriched pathways were related to cell cycle, protein synthesis, cell proliferation and apoptosis. It indicated that the difference of metabolism may be one of the reasons leading to the immune difference between the BF of two duck breeds. Conclusion This data lists the main differences in the BF at the miRNA level between two genetically disparate duck breeds and lays a foundation to carry out molecular assisted breeding of poultry in the future.
Structural polymorphism in the promoter of pfmrp2 confers Plasmodium falciparum tolerance to quinoline drugs
Drug resistance in Plasmodium falciparum remains a challenge for the malaria eradication programmes around the world. With the emergence of artemisinin resistance, the efficacy of the partner drugs in the artemisinin combination therapies (ACT) that include quinoline-based drugs is becoming critical. So far only a few resistance markers have been identified, of which only two transmembrane transporters, namely PfMDR1 (an ATP-binding cassette transporter) and PfCRT (a drug-metabolite transporter), have been experimentally verified. Another P. falciparum transporter, the ATP-binding cassette containing multidrug resistance-associated protein (PfMRP2), represents an additional possible factor of drug resistance in P. falciparum. In this study, we identified a parasite clone that is derived from the 3D7 P. falciparum strain and shows increased resistance to chloroquine, mefloquine and quinine through the trophozoite and schizont stages. We demonstrate that the resistance phenotype is caused by a 4.1 kb deletion in the 5′ upstream region of the pfmrp2 gene that leads to an alteration in the pfmrp2 transcription and thus increased level of PfMRP2 protein. These results also suggest the importance of putative promoter elements in regulation of gene expression during the P. falciparum intra-erythrocytic developmental cycle and the potential of genetic polymorphisms within these regions to underlie drug resistance.
Rare variants in CFI, C3 and C9 are associated with high risk of advanced age related macular degeneration
To define the role of rare variants in advanced age-related macular degeneration (AMD) risk, we sequenced the exons of 681 genes within AMD-associated loci and pathways in 2,493 cases and controls. We first tested each gene for increased or decreased burden of rare variants in cases compared to controls. We found that 7.8% of AMD cases compared to 2.3% of controls are carriers of rare missense CFI variants (OR=3.6, p=2×10⁻⁸). There was a predominance of dysfunctional variants in cases compared to controls. We then tested individual variants for association to disease. We observed significant association with rare missense alleles outside CFI. Genotyping in 5,115 independent samples confirmed associations to AMD with a K155Q allele in C3 (replication p=3.5×10⁻⁵, OR=2.8; joint p=5.2×10⁻⁹, OR=3.8) and a P167S allele in C9 (replication p=2.4×10⁻⁵, OR=2.2; joint p=6.5×10⁻⁷, OR=2.2). Finally, we show that the 155Q allele in C3 results in resistance to proteolytic inactivation by CFH and CFI. These results implicate loss of C3 protein regulation and excessive alternative complement activation in AMD pathogenesis, thus informing both the direction of effect and mechanistic underpinnings of this disorder.
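As a rough check on the CFI burden result, the carrier frequencies quoted in the abstract (7.8% of cases, 2.3% of controls) can be turned into a crude odds ratio. The sketch below is illustrative only: the function name `odds_ratio` is hypothetical, the calculation is unadjusted, and the published estimate may come from a model with covariates, so agreement to one decimal place is all that should be expected.

```python
def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Crude (unadjusted) odds ratio from carrier proportions.

    p_cases and p_controls are the fractions of carriers among
    cases and controls, each strictly between 0 and 1.
    """
    odds_cases = p_cases / (1.0 - p_cases)
    odds_controls = p_controls / (1.0 - p_controls)
    return odds_cases / odds_controls


# Carrier frequencies as quoted in the abstract (assumed exact here).
crude_or = odds_ratio(0.078, 0.023)
print(f"crude OR = {crude_or:.1f}")  # prints "crude OR = 3.6"
```

The crude value rounds to the reported OR of 3.6, which suggests the quoted percentages and odds ratio are mutually consistent.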
Lumbosacral stenosis in Labrador retriever military working dogs – an exomic exploratory study
Background Canine lumbosacral stenosis is defined as narrowing of the caudal lumbar and/or sacral vertebral canal. A risk factor for neurologic problems in many large sized breeds, lumbosacral stenosis can also cause early retirement in Labrador retriever military working dogs. Though vital for conservative management of the condition, early detection is complicated by the ambiguous nature of clinical signs of lumbosacral stenosis in stoic and high-drive Labrador retriever military working dogs. Though clinical diagnoses of lumbosacral stenosis using CT imaging are standard, they are usually not performed unless dogs present with clinical symptoms. Understanding the underlying genomic mechanisms would be beneficial in developing early detection methods for lumbosacral stenosis, which could prevent premature retirement in working dogs. The exomes of 8 young Labrador retriever military working dogs (4 affected and 4 unaffected by lumbosacral stenosis, phenotypically selected by CT image analyses from 40 dogs with no reported clinical signs of the condition) were sequenced to identify and annotate exonic variants between dogs negative and positive for lumbosacral stenosis. Results Two hundred and fifty-two variants were detected to be homozygous for the wild allele and either homozygous or heterozygous for the variant allele. Seventeen non-disruptive variants were detected that could affect protein effectiveness in 7 annotated (SCN1B, RGS9BP, ASXL3, TTR, LRRC16B, PTPRO, ZBBX) and 3 predicted genes (EEF1A1, DNAJA1, ZFX). No exonic variants were detected in any of the canine orthologues for human lumbar spinal stenosis candidate genes. Conclusions The TTR (transthyretin) gene could be a possible candidate for lumbosacral stenosis in Labrador retrievers based on previous human studies that have reported an association between human lumbar spinal stenosis and transthyretin protein amyloidosis. Other genes identified with exonic variants in this study but with no known published association with lumbosacral stenosis and/or lumbar spinal stenosis could also be candidate genes for future canine lumbosacral stenosis studies but their roles remain currently unknown. Human lumbar spinal stenosis candidate genes also cannot be ruled out as lumbosacral stenosis candidate genes. More definitive genetic investigations of this condition are needed before any genetic test for lumbosacral stenosis in Labrador retrievers can be developed. Electronic supplementary material The online version of this article (10.1186/s40575-017-0052-6) contains supplementary material, which is available to authorized users.
An autoinflammatory neurological disease due to interleukin 6 hypersecretion
Autoinflammatory diseases are rare illnesses characterized by apparently unprovoked inflammation without high-titer auto-antibodies or antigen-specific T cells. They may cause neurological manifestations, such as meningitis and hearing loss, but they are also characterized by non-neurological manifestations. In this work we studied a 30-year-old man who had a chronic disease characterized by meningitis, progressive hearing loss, persistently raised inflammatory markers and diffuse leukoencephalopathy on brain MRI. He also suffered from chronic recurrent osteomyelitis of the mandible. The hypothesis of an autoinflammatory disease prompted us to test for the presence of mutations in interleukin-1−pathway genes and to investigate the function of this pathway in the mononuclear cells obtained from the patient. Search for mutations in genes associated with interleukin-1−pathway demonstrated a novel NLRP3 (CIAS1) mutation (p.I288M) and a previously described MEFV mutation (p.R761H), but their combination was found to be non-pathogenic. On the other hand, we uncovered a selective interleukin-6 hypersecretion within the central nervous system as the likely pathogenic mechanism. This is also supported by the response to the anti-interleukin-6−receptor monoclonal antibody tocilizumab, but not to the recombinant interleukin-1−receptor antagonist anakinra. Exome sequencing failed to identify mutations in other genes known to be involved in autoinflammatory diseases. We propose that the disease described in this patient might be a prototype of a novel category of autoinflammatory diseases characterized by prominent neurological involvement.
Somatic PRDM2 c.4467delA mutations in colorectal cancers control histone methylation and tumor growth
The chromatin modifier PRDM2/RIZ1 is inactivated by mutation in several forms of cancer and is a putative tumor suppressor gene. Frameshift mutations in the C-terminal region of PRDM2, affecting (A)8 or (A)9 repeats within exon 8, are found in one third of colorectal cancers with microsatellite instability, but the contribution of these mutations to colorectal tumorigenesis is unknown. To model somatic mutations in microsatellite unstable tumors, we devised a general approach to perform genome editing while stabilizing the mutated nucleotide repeat. We then engineered isogenic cell systems where the PRDM2 c.4467delA mutation in human HCT116 colorectal cancer cells was corrected to wild-type by genome editing. Restored PRDM2 increased global histone 3 lysine 9 dimethylation and reduced migration, anchorage-independent growth and tumor growth in vivo. Gene set enrichment analysis revealed regulation of several hallmark cancer pathways, particularly of epithelial-to-mesenchymal transition (EMT), with VIM being the most significantly regulated gene. These observations provide direct evidence that PRDM2 c.4467delA is a driver mutation in colorectal cancer and confirms PRDM2 as a cancer gene, pointing to regulation of EMT as a central aspect of its tumor suppressive action.
Patterns of Population Variation in Two Paleopolyploid Eudicot Lineages Suggest That Dosage Based Selection on Homeologs Is Long Lived
Genes that are inherently subject to strong selective constraints tend to be overretained in duplicate after polyploidy. They also continue to experience similar, but somewhat relaxed, constraints after that polyploidy event. We sought to assess for how long the influence of polyploidy is felt on these genes’ selective pressures. We analyzed two nested polyploidy events in Brassicaceae: the At-α genome duplication that is the most recent polyploidy in the model plant Arabidopsis thaliana and a more recent hexaploidy shared by the genus Brassica and its relatives. By comparing the strength and direction of the natural selection acting at the population and at the species level, we find evidence for continued intensified purifying selection acting on retained duplicates from both polyploidies even down to the present. The constraint observed in preferentially retained genes is not a result of the polyploidy event: the orthologs of such genes experience even stronger constraint in nonpolyploid outgroup genomes. In both the Arabidopsis and Brassica lineages, we further find evidence for segregating mildly deleterious variants, confirming that the population-level data uncover patterns not visible with between-species comparisons. Using the A. thaliana metabolic network, we also explored whether network position was correlated with the measured selective constraint. At both the population and species level, nodes/genes tended to show similar constraints to their neighbors. Our results paint a picture of the long-lived effects of polyploidy on plant genomes, suggesting that even yesterday’s polyploids still have distinct evolutionary trajectories.
A nonsense mutation in PRNP associated with clinical Alzheimer's disease
Here, we describe a nonsense haplotype in PRNP associated with clinical Alzheimer's disease. The patient presented an early-onset of cognitive decline with memory loss as the primary cognitive problem. Whole-exome sequencing revealed a nonsense mutation in PRNP (NM_000311, c.C478T; p.Q160*; rs80356711) associated with homozygosity for the V allele at position 129 of the protein, further highlighting how very similar genotypes in PRNP result in strikingly different phenotypes.
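As a reading aid for the variant notation above, the following minimal sketch checks that the coding-level description (c.C478T) and the protein-level description (p.Q160*) are mutually consistent. It is an illustration only, not the authors' analysis: the function names are hypothetical, and because the abstract does not state which glutamine codon PRNP uses at position 160, both standard glutamine codons (CAA and CAG) are tested.

```python
# Stop codons and glutamine (Q) codons of the standard genetic code, DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}
GLN_CODONS = ("CAA", "CAG")


def codon_of(cds_pos):
    """Map a 1-based coding-sequence position to (codon number, position in codon 1..3)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def substitute(codon, pos_in_codon, new_base):
    """Return the codon with the base at the 1-based position replaced."""
    return codon[:pos_in_codon - 1] + new_base + codon[pos_in_codon:]


# c.478 falls on the first base of codon 160.
codon_number, pos_in_codon = codon_of(478)

# Assumption: the reference codon is one of the two glutamine codons; try both.
mutated = [substitute(codon, pos_in_codon, "T") for codon in GLN_CODONS]
creates_stop = all(codon in STOP_CODONS for codon in mutated)

print(codon_number, pos_in_codon, mutated, creates_stop)
```

Whichever glutamine codon is present, a C-to-T change at its first base yields a stop codon, which is what the p.Q160* notation expresses.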
Whole genome sequencing and SNV genotyping of ‘Nebbiolo’ (Vitis vinifera L.) clones
‘Nebbiolo’ (Vitis vinifera) is among the most ancient and prestigious wine grape varieties characterised by a wide genetic variability exhibited by a high number of clones (vegetatively propagated lines of selected mother plants). However, limited information is available for this cultivar at the molecular and genomic levels. The whole-genomes of three ‘Nebbiolo’ clones (CVT 71, CVT 185 and CVT 423) were re-sequenced and a de novo transcriptome assembly was produced. Important remarks about the genetic peculiarities of ‘Nebbiolo’ and its intra-varietal variability useful for clonal identification were reported. In particular, several varietal transcripts identified for the first time in ‘Nebbiolo’ were disease resistance genes and single-nucleotide variants (SNVs) identified in ‘Nebbiolo’, but not in other cultivars, were associated with genes involved in the stress response. Ten newly discovered SNVs were successfully employed to identify some periclinal chimeras and to classify 98 ‘Nebbiolo’ clones in seven main genotypes, which resulted to be linked to the geographical origin of accessions. In addition, for the first time it was possible to discriminate some ‘Nebbiolo’ clones from the others.
A novel homozygous truncating GNAT1 mutation implicated in retinal degeneration
Background The GNAT1 gene encodes the α subunit of the rod transducin protein, a key element in the rod phototransduction cascade. Variants in GNAT1 have been implicated in stationary night-blindness in the past, but unlike other proteins in the same pathway, it has not previously been implicated in retinitis pigmentosa. Methods A panel of 182 retinopathy-associated genes was sequenced to locate disease-causing mutations in patients with inherited retinopathies. Results Sequencing revealed a novel homozygous truncating mutation in the GNAT1 gene in a patient with significant pigmentary disturbance and constriction of visual fields, a presentation consistent with retinitis pigmentosa. This is the first report of a patient homozygous for a complete loss-of-function GNAT1 mutation. The clinical data from this patient provide definitive evidence of retinitis pigmentosa with late onset in addition to the lifelong night-blindness that would be expected from a lack of transducin function. Conclusion These data suggest that some truncating GNAT1 variants can indeed cause a recessive, mild, late-onset retinal degeneration in human beings rather than just stationary night-blindness as reported previously, with notable similarities to the phenotype of the Gnat1 knockout mouse.
AR-13, a Celecoxib Derivative, Directly Kills Francisella In Vitro and Aids Clearance and Mouse Survival In Vivo
Francisella tularensis (F. tularensis) is the causative agent of tularemia and is classified as a Tier 1 select agent. No licensed vaccine is currently available in the United States and treatment of tularemia is confined to few antibiotics. In this study, we demonstrate that AR-13, a derivative of the cyclooxygenase-2 inhibitor celecoxib, exhibits direct in vitro bactericidal killing activity against Francisella including a type A strain of F. tularensis (SchuS4) and the live vaccine strain (LVS), as well as toward the intracellular proliferation of LVS in macrophages, without causing significant host cell toxicity. Identification of an AR-13-resistant isolate indicates that this compound has an intracellular target(s) and that efflux pumps can mediate AR-13 resistance. In the mouse model of tularemia, AR-13 treatment protected 50% of the mice from lethal LVS infection and prolonged survival time from a lethal dose of F. tularensis SchuS4. Combination of AR-13 with a sub-optimal dose of gentamicin protected 60% of F. tularensis SchuS4-infected mice from death. Taken together, these data support the translational potential of AR-13 as a lead compound for the further development of new anti-Francisella agents.