Unlock your biological data

Update of the Anopheles gambiae PEST genome assembly
An update on the Anopheles gambiae PEST genome assembly places about 33% of previously unmapped sequences on the chromosomes.
The genome of the platyfish, Xiphophorus maculatus, provides insights into evolutionary adaptation and several complex traits
Several attributes intuitively considered to be typical mammalian features, such as complex behavior, live birth, and malignant diseases like cancer, also appeared several times independently in so-called “lower” vertebrates. The genetic mechanisms underlying the evolution of these elaborate traits are poorly understood. The platyfish, Xiphophorus maculatus, offers a unique model to better understand the molecular biology of such traits. Herein we detail sequencing of the platyfish genome. Integrating genome assembly with extensive genetic maps uncovered that fish, in contrast to mammals, exhibit an unexpected evolutionary stability of chromosomes. Genes associated with viviparity show signatures of positive selection, identifying new putative functional domains and rare cases of parallel evolution. We also discovered that genes implicated in cognition possess an unexpectedly high rate of duplicate gene retention after the teleost genome duplication, suggesting a hypothesis for the evolution of the great behavioral complexity in fish, which exceeds that in amphibians and reptiles.
The Genome and Linkage Map of the Northern Pike (Esox lucius): Conserved Synteny Revealed between the Salmonid Sister Group and the Neoteleostei
The northern pike is the most frequently studied member of the Esociformes, the closest order to the diverse and economically important Salmoniformes. The ancestor of all salmonids purportedly experienced a whole-genome duplication (WGD) event, making salmonid species ideal for studying the early impacts of genome duplication while complicating their use in wider analyses of teleost evolution. Studies suggest that the Esociformes diverged from the salmonid lineage prior to the WGD, supporting the use of northern pike as a pre-duplication outgroup. Here we present the first genome assembly, reference transcriptome and linkage map for northern pike, and evaluate the suitability of this species to provide a representative pre-duplication genome for future studies of salmonid and teleost evolution. The northern pike genome sequence is composed of 94,267 contigs (N50 = 16,909 bp) contained in 5,688 scaffolds (N50 = 700,535 bp); the total scaffolded genome size is 878 million bases. Multiple lines of evidence suggest that over 96% of the protein-coding genome is present in the genome assembly. The reference transcriptome was constructed from 13 tissues and contains 38,696 transcripts, which are accompanied by normalized expression data in all tissues. Gene-prediction analysis produced a total of 19,601 northern pike-specific gene models. The first-generation linkage map identifies 25 linkage groups, in agreement with northern pike's diploid karyotype of 2N = 50, and facilitates the placement of 46% of assembled bases onto linkage groups. Analyses reveal a high degree of conserved synteny between northern pike and other model teleost genomes. While conservation of gene order is limited to smaller syntenic blocks, the wider conservation of genome organization implies the northern pike exhibits a suitable approximation of a non-duplicated Protacanthopterygiian genome. This dataset will facilitate future studies of esocid biology and empower ongoing examinations of the Atlantic salmon and rainbow trout genomes by facilitating their comparison with other major teleost groups.
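The contig and scaffold N50 values quoted in this abstract follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly size. The sketch below is a minimal Python illustration of that definition; the function name `n50` and the toy lengths are hypothetical and are not taken from the pike assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in bp), for illustration only.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)
```

Applied to the real contig or scaffold lengths of an assembly, the same function would give the kind of figure reported above; a larger N50 indicates a more contiguous assembly.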
Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
Background: Clostridium botulinum and related clostridial species express extremely potent neurotoxins known as botulinum neurotoxins (BoNTs) that cause long-lasting, potentially fatal intoxications in humans and other mammals. The amino acid variation within the BoNT is used to categorize the species into seven immunologically distinct BoNT serotypes (A–G) which are further divided into subtypes. The BoNTs are located within two generally conserved gene arrangements known as botulinum progenitor complexes which encode toxin-associated proteins involved in toxin stability and expression. Methodology/Principal Findings: Because serotype A and B strains are responsible for the vast majority of human botulism cases worldwide, the location, arrangement and sequences of genes from eight different toxin complexes representing four different BoNT/A subtypes (BoNT/A1-Ba4) and one BoNT/B1 strain were examined. The bivalent Ba4 strain contained both the BoNT/A4 and BoNT/bvB toxin clusters. The arrangements of the BoNT/A3 and BoNT/A4 subtypes differed from the BoNT/A1 strains and were similar to those of BoNT/A2. However, unlike the BoNT/A2 subtype, the toxin complex genes of BoNT/A3 and BoNT/A4 were found within large plasmids and not within the chromosome. In the Ba4 strain, both BoNT toxin clusters (A4 and bivalent B) were located within the same 270 kb plasmid, separated by 97 kb. Complete genomic sequencing of the BoNT/B1 strain also revealed that its toxin complex genes were located within a 149 kb plasmid and the BoNT/A3 complex is within a 267 kb plasmid. Conclusions/Significance: Despite their size differences and the BoNT genes they contain, the three plasmids containing these toxin cluster genes share significant sequence identity. The presence of partial insertion sequence (IS) elements, evidence of recombination/gene duplication events, and the discovery of the BoNT/A3, BoNT/Ba4 and BoNT/B1 toxin complex genes within plasmids illustrate the different mechanisms by which these genes move among diverse genetic backgrounds of C. botulinum.
Complete genome sequence of Enterococcus faecium strain TX16 and comparative genomic analysis of Enterococcus faecium genomes
Background: Enterococci are among the leading causes of hospital-acquired infections in the United States and Europe, with Enterococcus faecalis and Enterococcus faecium being the two most common species isolated from enterococcal infections. In the last decade, the proportion of enterococcal infections caused by E. faecium has steadily increased compared to other Enterococcus species. Although the underlying mechanism for the gradual replacement of E. faecalis by E. faecium in the hospital environment is not yet understood, many studies using genotyping and phylogenetic analysis have shown the emergence of a globally dispersed polyclonal subcluster of E. faecium strains in clinical environments. Systematic study of the molecular epidemiology and pathogenesis of E. faecium has been hindered by the lack of closed, complete E. faecium genomes that can be used as references. Results: In this study, we report the complete genome sequence of the E. faecium strain TX16, also known as DO, which belongs to multilocus sequence type (ST) 18, and was the first E. faecium strain ever sequenced. Whole genome comparison of the TX16 genome with 21 E. faecium draft genomes confirmed that most clinical, outbreak, and hospital-associated (HA) strains (including STs 16, 17, 18, and 78), in addition to strains of non-hospital origin, group in the same clade (referred to as the HA clade). By phylogenetic and gene content similarity analyses, these strains are considerably more closely related to each other than to isolates in the community-associated (CA) clade, with approximately a 3–4% average nucleotide sequence difference between the two clades at the core genome level. Our study also revealed that many genomic loci in the TX16 genome are unique to the HA clade. 380 ORFs in TX16 are HA-clade specific and antibiotic resistance genes are enriched in HA-clade strains. Mobile elements such as IS16 and transposons were also found almost exclusively in HA strains, as previously reported. Conclusions: Our findings along with other studies show that HA clonal lineages harbor specific genetic elements as well as sequence differences in the core genome which may confer selection advantages over the more heterogeneous CA E. faecium isolates. Which of these differences are important for the success of specific E. faecium lineages in the hospital environment remains to be determined.
Characterization of Genome-Wide Variation in Four-Row Wax, a Waxy Maize Landrace with a Reduced Kernel Row Phenotype
In southwest China, some maize landraces have long been isolated geographically, and have phenotypes that differ from those of widely grown cultivars. These landraces may harbor rich genetic variation responsible for those phenotypes. Four-row Wax is one such landrace, with four rows of kernels on the cob. We resequenced the genome of Four-row Wax, obtaining 50.46 Gb sequence at 21.87× coverage, then identified and characterized 3,252,194 SNPs, 213,181 short InDels (1–5 bp) and 39,631 structural variations (greater than 5 bp). Of those, 312,511 (9.6%) SNPs were novel compared to the most detailed haplotype map (HapMap) SNP database of maize. Characterization of variations in reported kernel row number (KRN) related genes and KRN QTL regions revealed potential causal mutations in fea2, td1, kn1, and te1. Genome-wide comparisons revealed abundant genetic variations in Four-row Wax, which may be associated with environmental adaptation. The sequence and SNP variations described here enrich genetic resources of maize, and provide guidance into study of seed numbers for crop yield improvement.
CNV discovery for milk composition traits in dairy cattle using whole genome resequencing
Background: Copy number variations (CNVs) are important and widely distributed in the genome. CNV detection opens a new avenue for exploring genes associated with complex traits in humans, animals and plants. Herein, we present a genome-wide assessment of CNVs that are potentially associated with milk composition traits in dairy cattle. Results: In this study, CNVs were detected based on whole genome re-sequencing data of eight Holstein bulls from four half- and/or full-sib families, with extremely high and low estimated breeding values (EBVs) of milk protein percentage and fat percentage. The range of coverage depth per individual was 8.2–11.9×. Using CNVnator, we identified a total of 14,821 CNVs, including 5025 duplications and 9796 deletions. Among them, 487 differential CNV regions (CNVRs) comprising ~8.23 Mb of the cattle genome were observed between the high and low groups. Annotation of these differential CNVRs was performed based on the cattle genome reference assembly (UMD3.1) and a total of 235 functional genes were found within the CNVRs. By Gene Ontology and KEGG pathway analyses, we found that genes were significantly enriched for specific biological functions related to protein and lipid metabolism, insulin/IGF pathway-protein kinase B signaling cascade, prolactin signaling pathway and AMPK signaling pathways. These genes included INS, IGF2, FOXO3, TH, SCD5, GALNT18, GALNT16, ART3, SNCA and WNT7A, implying their potential association with milk protein and fat traits. In addition, 95 CNVRs overlapped with 75 known QTLs that are associated with milk protein and fat traits of dairy cattle (Cattle QTLdb). Conclusions: In conclusion, based on NGS of 8 Holstein bulls with extremely high and low EBVs for milk PP and FP, we identified a total of 14,821 CNVs, 487 differential CNVRs between groups, and 10 genes, which were suggested as promising candidate genes for milk protein and fat traits. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3636-3) contains supplementary material, which is available to authorized users.
Compound heterozygous SLC19A3 mutations further refine the critical promoter region for biotin-thiamine-responsive basal ganglia disease
Mutations in the gene SLC19A3 result in thiamine metabolism dysfunction syndrome 2, also known as biotin-thiamine-responsive basal ganglia disease (BTBGD). This neurometabolic disease typically presents in early childhood with progressive neurodegeneration, including confusion, seizures, and dysphagia, advancing to coma and death. Treatment is possible via supplement of biotin and/or thiamine, with early treatment resulting in significant lifelong improvements. Here we report two siblings who received a refined diagnosis of BTBGD following whole-genome sequencing. Both children inherited compound heterozygous mutations from unaffected parents; a missense single-nucleotide variant (p.G23V) in the first transmembrane domain of the protein, and a 4808-bp deletion in exon 1 encompassing the 5′ UTR and minimal promoter region. This deletion is the smallest promoter deletion reported to date, further defining the minimal promoter region of SLC19A3. Unfortunately, one of the siblings died prior to diagnosis, but the other is showing significant improvement after commencement of therapy. This case demonstrates the power of whole-genome sequencing for the identification of structural variants and subsequent diagnosis of rare neurodevelopmental disorders.
First‐generation HapMap in Cajanus spp. reveals untapped variations in parental lines of mapping populations
Summary: Whole genome re‐sequencing (WGRS) was conducted on a panel of 20 Cajanus spp. accessions (crossing parentals of recombinant inbred lines, introgression lines, multiparent advanced generation intercross and nested association mapping population) comprising two wild species and 18 cultivated species accessions. A total of 791.77 million paired‐end reads were generated with an effective mapping depth of ~12X per accession. Analysis of WGRS data provided 5 465 676 genome‐wide variations including 4 686 422 SNPs and 779 254 InDels across the accessions. Large structural variations in the form of copy number variations (2598) and presence and absence variations (970) were also identified. Additionally, 2 630 904 accession‐specific variations comprising 2 278 571 SNPs (86.6%), 166 243 deletions (6.3%) and 186 090 insertions (7.1%) were also reported. Identified polymorphic sites in this study provide the first‐generation HapMap in Cajanus spp. which will be useful in mapping the genomic regions responsible for important traits.
Characterization of Three New Insect-Specific Flaviviruses: Their Relationship to the Mosquito-Borne Flavivirus Pathogens
Abstract. Three novel insect-specific flaviviruses, isolated from mosquitoes collected in Peru, Malaysia (Sarawak), and the United States, are characterized. The new viruses, designated La Tina, Kampung Karu, and Long Pine Key, respectively, are antigenically and phylogenetically more similar to the mosquito-borne flavivirus pathogens, than to the classical insect-specific viruses like cell fusing agent and Culex flavivirus. The potential implications of this relationship and the possible uses of these and other arbovirus-related insect-specific flaviviruses are reviewed.
Genome Sequence of a Marbled Eel Polyoma-Like Virus in Taiwan
ABSTRACT We report here the complete genome sequence of a virus isolated from a diseased marbled eel (Anguilla marmorata) in Taiwan. The virus has been characterized as being related to Japanese eel endothelial cell-infecting virus (JEECV), with a large T-antigen-like protein. The sequence of the marbled eel virus displays low homology to the JEECV.
Combining next-generation sequencing and online databases for microsatellite development in non-model organisms
Next-generation sequencing (NGS) is revolutionising marker development and the rapidly increasing amount of transcriptomes published across a wide variety of taxa is providing valuable sequence databases for the identification of genetic markers without the need to generate new sequences. Microsatellites are still the most important source of polymorphic markers in ecology and evolution. Motivated by our long-term interest in the adaptive radiation of a non-model species complex of whitefishes (Coregonus spp.), in this study, we focus on microsatellite characterisation and multiplex optimisation using transcriptome sequences generated by Illumina® and Roche-454, as well as online databases of Expressed Sequence Tags (EST) for the study of whitefish evolution and demographic history. We identified and optimised 40 polymorphic loci in multiplex PCR reactions and validated the robustness of our analyses by testing several population genetics and phylogeographic predictions using 494 fish from five lakes and 2 distinct ecotypes.
Six novel Y chromosome genes in Anopheles mosquitoes discovered by independently sequencing males and females
Background: Y chromosomes are responsible for the initiation of male development, male fertility, and other male-related functions in diverse species. However, Y genes are rarely characterized outside a few model species due to the arduous nature of studying the repeat-rich Y. Results: The chromosome quotient (CQ) is a novel approach to systematically discover Y chromosome genes. In the CQ method, genomic DNA from males and females is sequenced independently and aligned to candidate reference sequences. The female to male ratio of the number of alignments to a reference sequence, a parameter called the chromosome quotient (CQ), is used to determine whether the sequence is Y-linked. Using the CQ method, we successfully identified known Y sequences from Homo sapiens and Drosophila melanogaster. The CQ method facilitated the discovery of Y chromosome sequences from the malaria mosquitoes Anopheles stephensi and An. gambiae. Comparisons to transcriptome sequence data with blastn led to the discovery of six Anopheles Y genes, three from each species. All six genes are expressed in the early embryo. Two of the three An. stephensi Y genes were recently acquired from the autosomes or the X. Although An. stephensi and An. gambiae belong to the same subgenus, we found no evidence of Y genes shared between the species. Conclusions: The CQ method can reliably identify Y chromosome sequences using the ratio of alignments from male and female sequence data. The CQ method is widely applicable to species with fragmented genome assemblies produced from next-generation sequencing data. Analysis of the six Y genes characterized in this study indicates rapid Y chromosome evolution between An. stephensi and An. gambiae. The Anopheles Y genes discovered by the CQ method provide unique markers for population and phylogenetic analysis, and opportunities for novel mosquito control measures through the manipulation of sexual dimorphism and fertility.
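The CQ calculation described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names, the `max_cq` and `min_male` cutoffs, and the example counts are all hypothetical, and the sketch assumes the male and female libraries were sequenced to comparable depth (no normalisation is applied).

```python
from typing import Dict, List, Optional, Tuple


def chromosome_quotient(female: int, male: int) -> Optional[float]:
    """Return the female-to-male ratio of alignment counts for one sequence.

    Returns None when there are no male alignments, since the ratio is undefined.
    """
    if male == 0:
        return None
    return female / male


def candidate_y_sequences(
    counts: Dict[str, Tuple[int, int]],
    max_cq: float = 0.2,   # hypothetical cutoff, not taken from the paper
    min_male: int = 20,    # hypothetical minimum number of male alignments
) -> List[str]:
    """List sequences whose CQ is at or below max_cq with enough male alignments."""
    hits = []
    for name, (female, male) in counts.items():
        if male < min_male:
            continue  # too few male alignments to trust the ratio
        cq = chromosome_quotient(female, male)
        if cq is not None and cq <= max_cq:
            hits.append(name)
    return sorted(hits)


# Made-up alignment counts: (female, male) per reference sequence.
example_counts = {
    "contig_A": (0, 50),   # male-only alignments, CQ = 0.0
    "contig_B": (48, 52),  # similar in both sexes, CQ is about 0.92
    "contig_C": (2, 5),    # low ratio but too few male alignments
}
print(candidate_y_sequences(example_counts))  # ['contig_A']
```

Under the equal-depth assumption, a sequence found only on the Y should attract few or no female alignments, so its CQ sits near zero, while autosomal sequences give values near one and X-linked sequences near two.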
QTL Mapping for Pest and Disease Resistance in Cassava and Coincidence of Some QTL with Introgression Regions Derived from Manihot glaziovii
Genetic mapping of quantitative trait loci (QTL) for resistance to cassava brown streak disease (CBSD), cassava mosaic disease (CMD), and cassava green mite (CGM) was performed using an F1 cross developed between the Tanzanian landrace, Kiroba, and a breeding line, AR37-80. The population was evaluated for two consecutive years in two sites in Tanzania. A genetic linkage map was derived from 106 F1 progeny and 1,974 SNP markers and spanned 18 chromosomes covering a distance of 1,698 cM. Fifteen significant QTL were identified; two are associated with CBSD root necrosis only, and were detected on chromosomes V and XII, while seven were associated with CBSD foliar symptoms only and were detected on chromosomes IV, VI, XVII, and XVIII. QTL on chromosomes 11 and 15 were associated with both CBSD foliar and root necrosis symptoms. Two QTL were found to be associated with CMD and were detected on chromosomes XII and XIV, while two were associated with CGM and were identified on chromosomes V and X. There are large Manihot glaziovii introgression regions in Kiroba on chromosomes I, XVII, and XVIII. The introgression segments on chromosomes XVII and XVIII overlap with QTL associated with CBSD foliar symptoms. The introgression region on chromosome I is of a different haplotype to the characteristic “Amani haplotype” found in the landrace Namikonga and others, and unlike some other genotypes, Kiroba does not have a large introgression block on chromosome IV. Kiroba is closely related to a sampled Tanzanian “tree cassava.” This supports the observation that some of the QTL associated with CBSD resistance in Kiroba are different to those observed in another variety, Namikonga.
Draft Genome Sequence of Sphingobium quisquiliarum Strain P25T, a Novel Hexachlorocyclohexane (HCH)-Degrading Bacterium Isolated from an HCH Dumpsite
Here, we report the draft genome sequence (4.2 Mb) of Sphingobium quisquiliarum strain P25T, a natural lin (genes involved in degradation of hexachlorocyclohexane [HCH] isomers) variant genotype, isolated from a heavily contaminated (450 mg HCH/g of soil) HCH dumpsite.
Genetic variation and metabolic pathway intricacy govern the active compound content and quality of the Chinese medicinal plant Lonicera japonica thunb
Background: Traditional Chinese medicine has used various herbs to treat various diseases for thousands of years, and it is now time to assess the characteristics and effectiveness of these medicinal plants with modern genetic and molecular tools. The herb Flos Lonicerae Japonicae (FLJ or Lonicera japonica Thunb.) is used as an anti-inflammatory agent, but the chemical quality of FLJ and its medicinal efficacy have not been consistent. Here, we analyzed the transcriptomes and metabolic pathways to evaluate the active medicinal compounds in FLJ and hope that this approach can be used for a variety of medicinal herbs in the future. Results: We assess transcriptomic differences between FLJ and L. japonica Thunb. var. chinensis (Watts) (rFLJ), which may explain the variable medicinal effects. We acquired transcriptomic data (over 100 million reads) from the two herbs, using the RNA-seq method and the Illumina GAII platform. The transcriptomic profiles contain over 6,000 expressed sequence tags (ESTs) for each of the three flower development stages from FLJ, as well as a comparable number of ESTs from the rFLJ flower bud. To elucidate enzymatic divergence on biosynthetic pathways between the two varieties, we correlated genes and their expression profiles to known metabolic activities involving the relevant active compounds, including phenolic acids, flavonoids, terpenoids, and fatty acids. We also analyzed the diversification of genes that process the active compounds to distinguish orthologs and paralogs together with the pathways concerning biosynthesis of phenolic acid and its connections with other related pathways. Conclusions: Our study provides both an initial description of gene expression profiles in flowers of FLJ and its counterfeit rFLJ and the enzyme pool that can be used to evaluate FLJ quality. Detailed molecular-level analyses allow us to decipher the relationship between metabolic pathways involved in processing active medicinal compounds and gene expression of their processing enzymes. Our evolutionary analysis revealed specific functional divergence of orthologs and paralogs, which leads to variation in gene functions that govern the profile of active compounds.
High-resolution profiling of the gut microbiome reveals the extent of Clostridium difficile burden
Microbiome profiling through 16S rRNA gene sequence analysis has proven to be a useful research tool in the study of C. difficile infection (CDI); however, CDI microbiome studies typically report results at the genus level or higher, thus precluding identification of this pathogen relative to other members of the gut microbiota. Accurate identification of C. difficile relative to the overall gut microbiome may be useful in assessments of colonization in research studies or as a prognostic indicator for patients with CDI. To investigate the burden of C. difficile at the species level relative to the overall gut microbiome, we applied a high-resolution method for 16S rRNA sequence assignment to previously published gut microbiome studies of CDI and other patient populations. We identified C. difficile in 131 of 156 index cases of CDI (average abundance 1.78%), and 18 of 211 healthy controls (average abundance 0.008%). We further detected substantial levels of C. difficile in a subset of infants that persisted over the first two to 12 months of life. Correlation analysis of C. difficile burden compared to other detected species demonstrated consistent negative associations with C. scindens and multiple Blautia species. These analyses contribute insight into the relative burden of C. difficile in the gut microbiome for multiple patient populations, and indicate that high-resolution 16S rRNA gene sequence analysis may prove useful in the development and evaluation of new therapies for CDI.
Allele Frequencies of Variants in Ultra Conserved Elements Identify Selective Pressure on Transcription Factor Binding
Ultra-conserved genes or elements (UCGs/UCEs) in the human genome are extreme examples of conservation. We characterized natural variations in 2884 UCEs and UCGs in two distinct populations, Singaporean Chinese (n = 280) and Italian (n = 501), by using a pooled sample, targeted capture, sequencing approach. We identify, with high confidence, in these regions an abundance of rare SNVs (MAF<0.5%), of which 75% are not present in dbSNP137. UCE association studies for complex human traits can use this information to model expected background variation and thus the necessary power for association studies. By combining our data with 1000 Genome Project data, we show in three independent datasets that prevalent UCE variants (MAF>5%) are more often found in relatively less-conserved nucleotides within UCEs, compared to rare variants. Moreover, prevalent variants are less likely to overlap transcription factor binding sites. Using SNPfold we found no significant influence of RNA secondary structure on UCE conservation. Altogether, these results suggest UCEs are not under selective pressure as a stretch of DNA but are under differential evolutionary pressure at the single-nucleotide level.
A PRDX1 mutant allele causes a MMACHC secondary epimutation in cblC patients
To date, epimutations reported in man have been somatic and erased in germlines. Here, we identify a cause of the autosomal recessive cblC class of inborn errors of vitamin B12 metabolism that we name “epi-cblC”. The subjects are compound heterozygotes for a genetic mutation and for a promoter epimutation, detected in blood, fibroblasts, and sperm, at the MMACHC locus; 5-azacytidine restores the expression of MMACHC in fibroblasts. MMACHC is flanked by CCDC163P and PRDX1, which are in the opposite orientation. The epimutation is present in three generations and results from PRDX1 mutations that force antisense transcription of MMACHC thereby possibly generating a H3K36me3 mark. The silencing of PRDX1 transcription leads to partial hypomethylation of the epiallele and restores the expression of MMACHC. This example of epi-cblC demonstrates the need to search for compound epigenetic-genetic heterozygosity in patients with typical disease manifestation and genetic heterozygosity in disease-causing genes located in other gene trios.
Exome sequencing in undiagnosed inherited and sporadic ataxias
Inherited ataxias are difficult to diagnose genetically. Pyle et al. use whole-exome sequencing to provide a likely molecular diagnosis in 14 of 22 families with ataxia. The approach reveals de novo mutations, broadens the phenotype of other disease genes, and is equally effective in young and older-onset patients.
Splicing mutation analysis reveals previously unrecognized pathways in lymph node-invasive breast cancer
Somatic mutations reported in large-scale breast cancer (BC) sequencing studies primarily consist of protein coding mutations. mRNA splicing mutation analyses have been limited in scope, despite their prevalence in Mendelian genetic disorders. We predicted splicing mutations in 442 BC tumour and matched normal exomes from The Cancer Genome Atlas Consortium (TCGA). These splicing defects were validated by abnormal expression changes in these tumours. Of the 5,206 putative mutations identified, exon skipping, leaky or cryptic splicing was confirmed for 988 variants. Pathway enrichment analysis of the mutated genes revealed mutations in 9 NCAM1-related pathways, which were significantly increased in samples with evidence of lymph node metastasis, but not in lymph node-negative tumours. We suggest that comprehensive reporting of DNA sequencing data should include non-trivial splicing analyses to avoid missing clinically-significant deleterious splicing mutations, which may reveal novel mutated pathways present in genetic disorders.
Reducing the exome search space for Mendelian diseases using genetic linkage analysis of exome genotypes
Many exome sequencing studies of Mendelian disorders fail to optimally exploit family information. Classical genetic linkage analysis is an effective method for eliminating a large fraction of the candidate causal variants discovered, even in small families that lack a unique linkage peak. We demonstrate that accurate genetic linkage mapping can be performed using SNP genotypes extracted from exome data, removing the need for separate array-based genotyping. We provide software to facilitate such analyses.
Organoids model distinct Vitamin E effects at different stages of prostate cancer evolution
Vitamin E increased prostate cancer risk in the Selenium and Vitamin E Cancer Prevention Trial (SELECT) through unknown mechanisms while Selenium showed no efficacy. We determined the effects of the SELECT supplements on benign (primary), premalignant (RWPE-1) and malignant (LNCaP) prostate epithelial organoids. While the supplements decreased proliferation and induced cell death in cancer organoids, they had no effect on the benign organoids. In contrast, Vitamin E enhanced cell proliferation and survival in the premalignant organoids in a manner that recapitulated the SELECT results. Indeed, while Vitamin E induced a pro-proliferative gene expression signature, Selenium alone or combined with Vitamin E produced an anti-proliferative signature. The premalignant organoids also displayed significant downregulation of glucose transporter and glycolytic gene expression, pointing to metabolic alterations. Detached RWPE-1 cells had low ATP levels due to diminished glucose uptake and glycolysis, which was rescued by Vitamin E through the activation of fatty acid oxidation (FAO). FAO inhibition abrogated the ATP rescue and diminished survival of the inner matrix-detached cells, restoring the normal hollow lumen morphology in Vitamin E-treated organoids. Organoid models therefore clarify the paradoxical findings from SELECT and demonstrate that Vitamin E promotes tumorigenesis in the early stages of prostate cancer evolution.
Genotype and clinical course in 2 Chinese Han siblings with Wilson disease presenting with isolated disabling premature osteoarthritis
Supplemental Digital Content is available in the text.
Whole Exome Sequencing Reveals Homozygous Mutations in RAI1, OTOF, and SLC26A4 Genes Associated with Nonsyndromic Hearing Loss in Altaian Families (South Siberia)
Hearing loss (HL) is one of the most common sensorineural disorders and several dozen genes contribute to its pathogenesis. Establishing a genetic diagnosis of HL is of great importance for clinical evaluation of deaf patients and for estimating recurrence risks for their families. Efforts to identify genes responsible for HL have been challenged by high genetic heterogeneity and different ethnic-specific prevalence of inherited deafness. Here we present the utility of whole exome sequencing (WES) for identifying candidate causal variants for previously unexplained nonsyndromic HL of seven patients from four unrelated Altaian families (the Altai Republic, South Siberia). The WES analysis revealed homozygous missense mutations in three genes associated with HL. Mutation c.2168A>G (SLC26A4) was found in one family, a novel mutation c.1111G>C (OTOF) was revealed in another family, and mutation c.5254G>A (RAI1) was found in two families. Sanger sequencing was applied for screening of identified variants in an ethnically diverse cohort of other patients with HL (n = 116) and in Altaian controls (n = 120). Identified variants were found only in patients of Altaian ethnicity (n = 93). Several lines of evidence support the association of homozygosity for discovered variants c.5254G>A (RAI1), c.1111C>G (OTOF), and c.2168A>G (SLC26A4) with HL in Altaian patients. Local prevalence of identified variants implies a possible founder effect in a significant number of HL cases in the indigenous population of the Altai region. Notably, this is the first reported instance of patients with an RAI1 missense mutation whose HL is not accompanied by specific traits typical for Smith-Magenis syndrome. The presumed association of RAI1 gene variant c.5254G>A with isolated HL needs to be confirmed by further experimental studies.
Two mouse models reveal an actionable PARP1 dependence in aggressive chronic lymphocytic leukemia
Chronic lymphocytic leukemia (CLL) remains an incurable disease. Two recurrent cytogenetic aberrations, namely del(17p), affecting TP53, and del(11q), affecting ATM, are associated with resistance against genotoxic chemotherapy (del17p) and poor outcome (del11q and del17p). Both del(17p) and del(11q) are also associated with inferior outcome to the novel targeted agents, such as the BTK inhibitor ibrutinib. Thus, even in the era of targeted therapies, CLL with alterations in the ATM/p53 pathway remains a clinical challenge. Here we generated two mouse models of Atm- and Trp53-deficient CLL. These animals display a significantly earlier disease onset and reduced overall survival, compared to controls. We employed these models in conjunction with transcriptome analyses following cyclophosphamide treatment to reveal that Atm deficiency is associated with an exquisite and genotype-specific sensitivity against PARP inhibition. Thus, we generate two aggressive CLL models and provide a preclinical rationale for the use of PARP inhibitors in ATM-affected human CLL.
Whole Exome Sequencing Identifies a Troponin T Mutation Hot Spot in Familial Dilated Cardiomyopathy
Dilated cardiomyopathy (DCM) commonly causes heart failure and shows extensive genetic heterogeneity that may be amenable to newly developed next-generation DNA sequencing of the exome. In this study we report the successful use of exome sequencing to identify a pathogenic variant in the TNNT2 gene using segregation analysis in a large DCM family. Exome sequencing was performed on three distant relatives from a large family with a clear DCM phenotype. Missense, nonsense, and splice variants were analyzed for segregation among the three affected family members and confirmed in other relatives by direct sequencing. A c.517C>T (Arg173Trp) TNNT2 variant segregated with all affected family members and was also detected in one additional DCM family in our registry. The inclusion of segregation analysis using distant family members markedly improved the bioinformatics filtering process by removing from consideration variants that were not shared by all affected subjects. Haplotype analysis confirmed that the variant found in both DCM families was located on two distinct haplotypes, supporting the notion of independent mutational events in each family. In conclusion, an exome sequencing strategy that includes segregation analysis using distant affected relatives within a family represents a viable diagnostic strategy in a genetically heterogeneous disease like DCM.
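The segregation filter described in this abstract (discarding variants that are not shared by all affected relatives) can be illustrated with a set intersection. This is only a minimal sketch: the function name `shared_by_all_affected`, the tuple representation of a variant, and every coordinate below are hypothetical placeholders, not data or code from the study.

```python
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, position, ref, alt).
# All values used below are invented placeholders for illustration.
Variant = Tuple[str, int, str, str]


def shared_by_all_affected(calls: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Return the variants present in every affected relative's call set."""
    if not calls:
        return set()
    call_sets = iter(calls.values())
    shared = set(next(call_sets))  # copy so the input is not mutated
    for variants in call_sets:
        shared &= variants  # keep only variants seen in this relative too
    return shared


# Hypothetical call sets for three distant affected relatives.
calls = {
    "relative_A": {("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A"), ("chr3", 3000, "A", "G")},
    "relative_B": {("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A")},
    "relative_C": {("chr1", 1000, "C", "T"), ("chr4", 4000, "T", "C")},
}
candidates = shared_by_all_affected(calls)
```

Distant relatives share less of their genome by descent than close relatives do, so fewer incidental variants survive the intersection, which is consistent with the improved filtering the abstract reports.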
Extensive RNA editing and splicing increase immune self-representation diversity in medullary thymic epithelial cells
Background In order to become functionally competent but harmless mediators of the immune system, T cells undergo a strict educational program in the thymus, where they learn to discriminate between self and non-self. This educational program is, to a large extent, mediated by medullary thymic epithelial cells that have a unique capacity to express, and subsequently present, a large fraction of body antigens. While the scope of promiscuously expressed genes by medullary thymic epithelial cells is well-established, relatively little is known about the expression of variants that are generated by co-transcriptional and post-transcriptional processes. Results Our study reveals that in comparison to other cell types, medullary thymic epithelial cells display significantly higher levels of alternative splicing, as well as A-to-I and C-to-U RNA editing, which thereby further expand the diversity of their self-antigen repertoire. Interestingly, Aire, the key mediator of promiscuous gene expression in these cells, plays a limited role in the regulation of these transcriptional processes. Conclusions Our results highlight RNA processing as another layer by which the immune system assures a comprehensive self-representation in the thymus which is required for the establishment of self-tolerance and prevention of autoimmunity. Electronic supplementary material The online version of this article (doi:10.1186/s13059-016-1079-9) contains supplementary material, which is available to authorized users.
A novel compound heterozygous variant of the SLC12A3 gene in Gitelman syndrome pedigree
Background Gitelman syndrome (GS) is an autosomal recessive disorder caused by genic mutations of SLC12A3 (Solute carrier family 12 member 3), which encodes the Na-Cl cotransporter (NCC), and presents with characteristic metabolic abnormalities, including hypokalemia, metabolic alkalosis, hypomagnesemia, and hypocalciuria. In this study, we report a case of a GS pedigree, including analysis of GS-associated gene mutations. Methods We performed next-generation sequencing analysis and Sanger sequencing to explore the SLC12A3 mutations in a GS pedigree that included a 35-year-old female patient with GS and five family members within three generations. Furthermore, we summarized their clinical manifestations and analyzed laboratory parameters related to GS. Results The female proband (the patient with GS) presented with intermittent fatigue and transient periods of tetany, along with significant hypokalemia, hypomagnesemia, and hypocalciuria. All other members of the pedigree had normal laboratory results without obvious GS-related symptoms. Genetic analysis of the SLC12A3 gene identified two novel missense mutations (c.1919A > G, p.N640S in exon 15; c.2522A > G, p.D841G in exon 21) in the patient with GS. Moreover, we demonstrated that her mother, younger maternal uncle, and cousin were carriers of one mutation (c.1919A > G), and her father was the carrier of the other (c.2522A > G). Conclusion This is the first report of these two novel pathogenic variants of SLC12A3 and their contribution to GS. Further functional studies are particularly warranted to explore the underlying molecular mechanisms. Electronic supplementary material The online version of this article (10.1186/s12881-018-0527-7) contains supplementary material, which is available to authorized users. 
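The compound heterozygous pattern reported here (the proband carries two SLC12A3 variants, one also found in her mother and the other in her father) can be expressed as a simple check. The sketch below is illustrative only: the function name `is_compound_het` is hypothetical, and it assumes each person's variants in the gene are given as a plain set of HGVS-style strings, with the carrier assignments taken from the abstract.

```python
from typing import Set


def is_compound_het(proband: Set[str], mother: Set[str], father: Set[str]) -> bool:
    """True if the proband has at least one variant inherited only from the
    mother and at least one inherited only from the father (variants in trans)."""
    maternal_only = {v for v in proband if v in mother and v not in father}
    paternal_only = {v for v in proband if v in father and v not in mother}
    return bool(maternal_only) and bool(paternal_only)


# Carrier status as described in the abstract.
proband = {"c.1919A>G", "c.2522A>G"}
mother = {"c.1919A>G"}
father = {"c.2522A>G"}

result = is_compound_het(proband, mother, father)
```

If both variants had come from the same parent they would sit on one allele, and the check would return False, which is why parental testing matters for a recessive diagnosis.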
Two novel colorectal cancer risk loci in the region on chromosome 9q22.32
Highly penetrant cancer syndromes account for less than 5% of all cases with familial colorectal cancer (CRC), and other genetic factors explain the majority of the genetic contribution to CRC. A CRC susceptibility locus on chromosome 9q has been suggested. In this study, families where risk of CRC was linked to the region were used to search for predisposing mutations in all genes in the region. No disease-causing mutation was found. Next, haplotype association studies were performed in the region, comparing Swedish CRC cases (2664) and controls (4782). Two overlapping haplotypes were suggested. One 10-SNP haplotype was indicated in familial CRC (OR 1.4, p = 0.00005) and one 25-SNP haplotype was indicated in sporadic CRC (OR 2.2, p = 0.0000012). The allele frequencies of the 10-SNP and the 25-SNP haplotypes were 13.7% and 2.5% respectively and both included one RNA, RP11-332M4.1 and RP11-180I4.2, in the non-overlapping regions. The sporadic 25-SNP haplotype could not be studied further, but the familial 10-SNP haplotype was analyzed in 61 additional CRC families, and 6 of them were informative for all markers and had the risk haplotype. Targeted sequencing of the 10-SNP region in the linked families identified one variant in RP11-332M4.1, suggested to confer the increased CRC risk on this haplotype. Our results support the presence of two loci at 9q22.32, each with one RNA as the putative cause of increased CRC risk. These RNAs could exert their effect through the same, or different, genes/pathways, possibly through the regulation of neighboring genes, such as PTCH1, FANCC, DKFZP434H0512, ERCC6L2 or the processed transcript LINC00046.
Rational management approach to pure red cell aplasia
Pure red cell aplasia is an orphan disease, and as such lacks rationally established standard therapies. Most cases are idiopathic; a subset is antibody-mediated. There is overlap between idiopathic cases and those with T-cell large granular lymphocytic leukemia, hypogammaglobulinemia, and low-grade lymphomas. In each of the aforementioned, the pathogenetic mechanisms may involve autoreactive cytotoxic responses. We selected 62 uniformly diagnosed pure red cell aplasia patients and analyzed their pathophysiologic features and responsiveness to rationally applied first-line and salvage therapies in order to propose diagnostic and therapeutic algorithms that may be helpful in guiding the management of prospective patients. Of the selected patients, 52% were idiopathic, while the others involved large granular lymphocytic leukemia, thymoma, and B-cell dyscrasia. T-cell-mediated responses ranged along a continuum from polyclonal to monoclonal (as seen in large granular lymphocytic leukemia). During a median observation period of 40 months, patients received a median of two different therapies to achieve remission. Frequently used therapy included calcineurin inhibitors with a steroid taper yielding a first-line overall response rate of 76% (53/70). Oral cyclophosphamide showed activity, albeit lower than that produced by cyclosporine. Intravenous immunoglobulins were effective both in parvovirus patients and in hypogammaglobulinemia cases. In salvage settings, alemtuzumab is active, particularly in large granular lymphocytic leukemia-associated cases. Other potentially useful salvage options include rituximab, anti-thymocyte globulin and bortezomib. The workup of acquired pure red cell aplasia should include investigations of common pathological associations. Most effective therapies are directed against T-cell-mediated immunity, and therapeutic choices need to account for associated conditions that may help in choosing alternative salvage agents, such as intravenous immunoglobulin, alemtuzumab and bortezomib.
Identification of a de novo DYNC1H1 mutation via WES according to published guidelines
De novo mutations that contribute to rare Mendelian diseases, including neurological disorders, have been recently identified. Whole-exome sequencing (WES) has become a powerful tool for the identification of inherited and de novo mutations in Mendelian diseases. Two important guidelines were recently published regarding the investigation of causality of sequence variants in human disease and the interpretation of novel variants identified in human genome sequences. In this study, a family with supposed movement disorders was sequenced via WES (including the proband and her unaffected parents), and a standard investigation and interpretation of the identified variants was performed according to the published guidelines. We identified a novel de novo mutation (c.2327C > T, p.P776L) in the DYNC1H1 gene and confirmed that it was the causal variant. The phenotype of the affected twins included delayed motor milestones, pes cavus, lower limb weakness and atrophy, and a waddling gait. Electromyographic (EMG) recordings revealed typical signs of chronic denervation. Our study demonstrates the power of WES to discover de novo mutations associated with a neurological disease on the whole exome scale, and following guidelines for conducting WES studies and interpreting identified variants is a preferable approach for exploring the pathogenesis of rare neurological disorders.
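The trio design mentioned in this abstract (proband plus unaffected parents) lends itself to a simple first-pass filter: a candidate de novo variant is one called in the proband but in neither parent. The sketch below only illustrates that set logic. The function name `de_novo_candidates` and all coordinates are hypothetical, and the additional quality and coverage checks that a real analysis would need in the parents are not modelled here.

```python
from typing import Set, Tuple

# A variant is keyed by (chromosome, position, ref, alt).
# All values used below are invented placeholders for illustration.
Variant = Tuple[str, int, str, str]


def de_novo_candidates(
    proband: Set[Variant], mother: Set[Variant], father: Set[Variant]
) -> Set[Variant]:
    """Return variants seen in the proband but absent from both parents."""
    return proband - mother - father


# Hypothetical call sets for one trio.
proband_calls = {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"), ("chr5", 500, "A", "C")}
mother_calls = {("chr2", 200, "G", "A")}
father_calls = {("chr5", 500, "A", "C"), ("chr7", 700, "T", "G")}

candidates = de_novo_candidates(proband_calls, mother_calls, father_calls)
```

Candidates that survive this filter would still need independent confirmation and interpretation, which is the role of the published guidelines the abstract refers to.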
X-linked primary ciliary dyskinesia due to mutations in the cytoplasmic axonemal dynein assembly factor PIH1D3
Primary ciliary dyskinesia (PCD) is a genetically heterogeneous disease resulting in reduced mucus clearance and impaired lung function. Here, the authors show that mutations in PIH1D3 are responsible for an X-linked form of PCD, affecting assembly of a subset of inner arm dyneins.
Genetic and epigenetic methylation defects and implication of the ERMN gene in autism spectrum disorders
Autism spectrum disorders (ASD) are highly heritable and genetically complex conditions. Although highly penetrant mutations in multiple genes have been identified, they account for the etiology of <1/3 of cases. There is also strong evidence for environmental contribution to ASD, which can be mediated by still poorly explored epigenetic modifications. We searched for methylation changes on blood DNA of 53 male ASD patients and 757 healthy controls using a methylomic array (450K Illumina), correlated the variants with transcriptional alterations in blood RNAseq data, and performed a case–control association study of the relevant findings in a larger cohort (394 cases and 500 controls). We found 700 differentially methylated CpGs, most of them hypomethylated in the ASD group (83.9%), with cis-acting expression changes at 7.6% of locations. Relevant findings included: (1) hypomethylation caused by rare genetic variants (meSNVs) at six loci (ERMN, USP24, METTL21C, PDE10A, STX16 and DBT) significantly associated with ASD (q-value <0.05); and (2) clustered epimutations associated with transcriptional changes in single ASD patients (n=4). All meSNVs and clustered epimutations were inherited from unaffected parents. Resequencing of the top candidate genes also revealed a significant load of deleterious mutations affecting ERMN in ASD compared with controls. Our data indicate that inherited methylation alterations detectable in blood DNA, due to either genetic or epigenetic defects, can affect gene expression and contribute to ASD susceptibility most likely in an additive manner, and implicate ERMN as a novel ASD gene.
Targeted sequencing identifies a novel SH2D1A pathogenic variant in a Chinese family: Carrier screening and prenatal genetic testing
X-linked lymphoproliferative disease type 1 (XLP1) is a rare primary immunodeficiency characterized by a clinical triad consisting of severe EBV-induced hemophagocytic lymphohistiocytosis, B-cell lymphoma, and dysgammaglobulinemia. Mutations in the SH2D1A gene have been identified as the cause of XLP1. In this study, a pregnant woman with a recurrent history of bearing children with immunodeficiency was screened for a pathogenic variant because the proband sample was unavailable. We aimed to clarify the genetic diagnosis and provide prenatal testing for the family. A next-generation sequencing (NGS)-based multigene panel was used in carrier screening of the pregnant woman. Variants of immunodeficiency-related genes were analyzed and prioritized. The candidate variant was verified by Sanger sequencing. The possible influence of the identified variant was evaluated through an RNA assay. Amniocentesis, karyotyping, and Sanger sequencing were performed for prenatal testing. We identified a novel de novo frameshift SH2D1A pathogenic variant (c.251_255delTTTCA) in the pregnant carrier. The peripheral blood RNA assay indicated that the mutant transcript could escape nonsense-mediated mRNA decay (NMD) and might encode a C-terminal truncated protein. Knowledge of the variant led to successful prenatal diagnosis of the fetus. In conclusion, our study clarified the genetic diagnosis and altered disease prevention for a pregnant carrier of XLP1.
Glioblastoma adaptation traced through decline of an IDH1 clonal driver and macro-evolution of a double-minute chromosome
In a glioblastoma tumour with multi-region sequencing before and after recurrence, we find an IDH1 mutation that is clonal in the primary but lost at recurrence. We also describe the evolution of a double-minute chromosome encoding regulators of the PI3K signalling axis that dominates at recurrence, emphasizing the challenges of an evolving and dynamic oncogenic landscape for precision medicine.
The clinical features, outcomes and genetic characteristics of hypertrophic cardiomyopathy patients with severe right ventricular hypertrophy
Introduction Severe right ventricular hypertrophy (SRVH) is a rare phenotype in hypertrophic cardiomyopathy (HCM) for which limited information is available. This study was undertaken to investigate the clinical, prognostic and genetic characteristics of HCM patients with SRVH. Methods HCM with SRVH was defined as HCM with a maximum right ventricular wall thickness ≥10 mm. Whole-genome sequencing (WGS) was performed in HCM patients with SRVH. Multivariate Cox proportional hazards regression models were used to identify risk factors for cardiac death and events in HCM with SRVH. Patients with apical hypertrophic cardiomyopathy (ApHCM) were selected as a comparison group. The clinical features and outcomes of 34 HCM patients with SRVH and 273 ApHCM patients were compared. Results Compared with the ApHCM group, the HCM with SRVH group included younger patients and a higher proportion of female patients and also displayed higher cardiovascular morbidity and mortality. The multivariate Cox proportional hazards regression models identified 2 independent predictors of cardiovascular death in HCM patients with SRVH, a New York Heart Association class ≥III (hazard ratio [HR] = 8.7, 95% confidence interval (CI): 1.43-52.87, p = 0.019) and an age at the time of HCM diagnosis ≤18 (HR = 5.5, 95% CI: 1.24-28.36, p = 0.026). Among the 11 HCM patients with SRVH who underwent WGS, 10 (90.9%) were identified as carriers of at least one specific sarcomere gene mutation. MYH7 and TTN mutations were the most common sarcomere mutations noted in this study. Two or more HCM-related gene mutations were observed in 9 (82%) patients, and mutations in either other cardiomyopathy-related genes or ion-channel disease-related genes were found in 8 (73%) patients. Conclusions HCM patients with SRVH were characterized by poor clinical outcomes and the presentation of multiple gene mutations. 
Identification of a Comprehensive Spectrum of Genetic Factors for Hereditary Breast Cancer in a Chinese Population by Next-Generation Sequencing
The genetic etiology of hereditary breast cancer has not been fully elucidated. Although germline mutations of high-penetrance genes such as BRCA1/2 are implicated in development of hereditary breast cancers, at least half of all breast cancer families are not linked to these genes. To identify a comprehensive spectrum of genetic factors for hereditary breast cancer in a Chinese population, we performed an analysis of germline mutations in 2,165 coding exons of 152 genes associated with hereditary cancer using next-generation sequencing (NGS) in 99 breast cancer patients from families of cancer patients regardless of cancer types. Forty-two deleterious germline mutations were identified in 21 genes of 34 patients, including 18 (18.2%) BRCA1 or BRCA2 mutations, 3 (3%) TP53 mutations, 5 (5.1%) DNA mismatch repair gene mutations, 1 (1%) CDH1 mutation, 6 (6.1%) Fanconi anemia pathway gene mutations, and 9 (9.1%) mutations in other genes. Of seven patients who carried mutations in more than one gene, 4 were BRCA1/2 mutation carriers, and their average onset age was much younger than patients with only BRCA1/2 mutations. Almost all identified high-penetrance gene mutations in those families fulfill the typical phenotypes of hereditary cancer syndromes listed in the National Comprehensive Cancer Network (NCCN) guidelines, except two TP53 and three mismatch repair gene mutations. Furthermore, functional studies of MSH3 germline mutations confirmed the association between MSH3 mutation and tumorigenesis, and segregation analysis suggested antagonism between BRCA1 and MSH3. We also identified a lot of low-penetrance gene mutations. Although the clinical significance of those newly identified low-penetrance gene mutations has not been fully appreciated yet, these new findings do provide valuable epidemiological information for the future studies. 
Together, these findings highlight the importance of genetic testing based on NCCN guidelines and suggest that multi-gene analysis using NGS may supplement traditional genetic counseling.
New perspective in diagnostics of mitochondrial disorders: two years’ experience with whole-exome sequencing at a national paediatric centre
Background Whole-exome sequencing (WES) has led to an exponential increase in identification of causative variants in mitochondrial disorders (MD). Methods We performed WES in 113 MD suspected patients from Polish paediatric reference centre, in whom routine testing failed to identify a molecular defect. WES was performed using TruSeqExome enrichment, followed by variant prioritization, validation by Sanger sequencing, and segregation with the disease phenotype in the family. Results Likely causative mutations were identified in 67 (59.3 %) patients; these included variants in mtDNA (6 patients) and nDNA: X-linked (9 patients), autosomal dominant (5 patients), and autosomal recessive (47 patients, 11 homozygotes). Novel variants accounted for 50.5 % (50/99) of all detected changes. In 47 patients, changes in 31 MD-related genes (ACAD9, ADCK3, AIFM1, CLPB, COX10, DLD, EARS2, FBXL4, MTATP6, MTFMT, MTND1, MTND3, MTND5, NAXE, NDUFS6, NDUFS7, NDUFV1, OPA1, PARS2, PC, PDHA1, POLG, RARS2, RRM2B, SCO2, SERAC1, SLC19A3, SLC25A12, TAZ, TMEM126B, VARS2) were identified. The ACAD9, CLPB, FBXL4, PDHA1 genes recurred more than twice suggesting higher general/ethnic prevalence. In 19 cases, variants in 18 non-MD related genes (ADAR, CACNA1A, CDKL5, CLN3, CPS1, DMD, DYSF, GBE1, GFAP, HSD17B4, MECP2, MYBPC3, PEX5, PGAP2, PIGN, PRF1, SBDS, SCN2A) were found. The percentage of positive WES results rose gradually with increasing probability of MD according to the Mitochondrial Disease Criteria (MDC) scale (from 36 to 90 % for low and high probability, respectively). The percentage of detected MD-related genes compared with non MD-related genes also grew with the increasing MD likelihood (from 20 to 97 %). Molecular diagnosis was established in 30/47 (63.8 %) neonates and in 17/28 (60.7 %) patients with basal ganglia involvement. Mutations in CLPB, SERAC1, TAZ genes were identified in neonates with 3-methylglutaconic aciduria (3-MGA) as a discriminative feature. 
A new MD-related candidate gene (NDUFB8) is under verification. Conclusions We suggest WES rather than targeted NGS as the method of choice in diagnostics of MD in children, including neonates with 3-MGA who died without determination of disease cause and with limited availability of laboratory data. There is a strong correlation between the degree of MD diagnosis by WES and MD likelihood expressed by the MDC scale. Electronic supplementary material The online version of this article (doi:10.1186/s12967-016-0930-9) contains supplementary material, which is available to authorized users.
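The percentages quoted in this abstract can be re-derived from the counts it reports. The following is a minimal sketch of that arithmetic; the helper `pct` and the variable names are illustrative and not part of the study.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)

# Counts as quoted in the abstract.
diagnosed, cohort = 67, 113
by_inheritance = {
    "mtDNA": 6,
    "X-linked": 9,
    "autosomal dominant": 5,
    "autosomal recessive": 47,
}

overall_yield = pct(diagnosed, cohort)    # share of patients with a likely causative mutation
novel_share = pct(50, 99)                 # share of novel variants among detected changes
neonate_yield = pct(30, 47)               # molecular diagnosis in neonates
basal_ganglia_yield = pct(17, 28)         # molecular diagnosis with basal ganglia involvement
inheritance_total = sum(by_inheritance.values())

print(overall_yield, novel_share, neonate_yield, basal_ganglia_yield, inheritance_total)
```

The inheritance categories sum to the number of diagnosed patients, which is a quick consistency check on the reported breakdown.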
Setdb1 Is Required for Myogenic Differentiation of C2C12 Myoblast Cells via Maintenance of MyoD Expression
Setdb1, an H3-K9 specific histone methyltransferase, is associated with transcriptional silencing of euchromatic genes through chromatin modification. Functions of Setdb1 during development have been extensively studied in embryonic and mesenchymal stem cells as well as neurogenic progenitor cells. But the role of Setdb1 in myogenic differentiation remains unknown. In this study, we report that Setdb1 is required for myogenic potential of C2C12 myoblast cells through maintaining the expression of MyoD and muscle-specific genes. We find that reduced Setdb1 expression in C2C12 myoblast cells severely delayed their differentiation, whereas exogenous Setdb1 expression had little effect. Gene expression profiling analysis using oligonucleotide microarray and RNA-Seq technologies demonstrated that depletion of Setdb1 results in downregulation of MyoD as well as the components of muscle fiber in proliferating C2C12 cells. In addition, exogenous expression of MyoD reversed transcriptional repression of MyoD promoter-driven luciferase reporter by Setdb1 shRNA and rescued myogenic differentiation of C2C12 myoblast cells depleted of endogenous Setdb1. Taken together, these results provide new insights into how levels of key myogenic regulators are maintained prior to induction of differentiation.
Genetic analyses in a bonobo (Pan paniscus) with arrhythmogenic right ventricular cardiomyopathy
Arrhythmogenic right ventricular cardiomyopathy (ARVC) is a disorder that may lead to sudden death and can affect humans and other primates. In 2012, the alpha male bonobo of the Milwaukee County Zoo died suddenly and histologic evaluation found features of ARVC. This study sought to discover a possible genetic cause for ARVC in this individual. We sequenced our subject’s DNA to search for deleterious variants in genes involved in cardiovascular disorders. Variants found were annotated according to the human genome, following currently available classification used for human diseases. Sequencing from the DNA of an unrelated unaffected bonobo was also used for prediction of pathogenicity. Twenty-four variants of uncertain clinical significance (VUSs) but no pathogenic variants were found in the proband studied. Further familial, functional, and bonobo population studies are needed to determine if any of the VUSs or a combination of the VUSs found may be associated with the clinical findings. Future genotype-phenotype establishment will be beneficial for the appropriate care of the captive zoo bonobo population worldwide as well as conservation of the bonobo species in its native habitat.
Germline mutations in ETV6 are associated with thrombocytopenia, red cell macrocytosis and predisposition to lymphoblastic leukemia
Some familial platelet disorders are associated with predisposition to leukemia, myelodysplastic syndrome (MDS) or dyserythropoietic anemia.1,2 We identified a family with autosomal dominant thrombocytopenia, high erythrocyte mean corpuscular volume (MCV) and two occurrences of B-cell precursor acute lymphoblastic leukemia (ALL). Whole exome sequencing identified a heterozygous single nucleotide change in ETV6 (Ets Variant Gene 6), c.641C>T, encoding a p.Pro214Leu substitution in the central domain, segregating with thrombocytopenia and elevated MCV. A screen of 23 families with similar phenotype found two with ETV6 mutations. One family had the p.Pro214Leu mutation and one individual with ALL. The other family had a c.1252A>G transition producing a p.Arg418Gly substitution in the DNA binding domain, with alternative splicing and exon-skipping. Functional characterization of these mutations showed aberrant cellular localization of mutant and endogenous ETV6, decreased transcriptional repression and altered megakaryocyte maturation. Our findings underscore a key role for ETV6 in platelet formation and leukemia predisposition.
BACH2 immunodeficiency illustrates an association between super-enhancers and haploinsufficiency
Transcriptional programs guiding lymphocyte differentiation depend on precise expression and timing of transcription factors (TFs). BACH2 is a TF essential for T- and B-lymphocytes and is associated with an archetypal super-enhancer (SE). Single nucleotide variants in the BACH2 locus associate with multiple autoimmune diseases but BACH2 mutations causing Mendelian monogenic primary immunodeficiency have not previously been identified. We describe a syndrome of BACH2-related immunodeficiency and autoimmunity (BRIDA) resulting from BACH2 haploinsufficiency. Patients had lymphocyte maturation defects, causing immunoglobulin deficiency and intestinal inflammation. The mutations disrupted protein stability by interfering with homodimerization or by causing aggregation. Analogous lymphocyte defects existed in Bach2 heterozygous mice. More generally, we found that genes causing monogenic haploinsufficient diseases are substantially enriched for TFs and SE-architecture. These observations show a new feature of SE-architecture in Mendelian diseases of immunity, that heterozygous mutations in SE-regulated genes identified on whole exome/genome sequencing may have greater significance than recognized.
Tumor-associated copy number changes in the circulation of patients with prostate cancer identified through whole-genome sequencing
Background Patients with prostate cancer may present with metastatic or recurrent disease despite initial curative treatment. The propensity of metastatic prostate cancer to spread to the bone has limited repeated sampling of tumor deposits. Hence, considerably less is understood about this lethal metastatic disease, as it is not commonly studied. Here we explored whole-genome sequencing of plasma DNA to scan the tumor genomes of these patients non-invasively. Methods We wanted to make whole-genome analysis from plasma DNA amenable to clinical routine applications and developed an approach based on a benchtop high-throughput platform, that is, Illumina's MiSeq instrument. We performed whole-genome sequencing from plasma at a shallow sequencing depth to establish a genome-wide copy number profile of the tumor at low costs within 2 days. In parallel, we sequenced a panel of 55 high-interest genes and 38 introns with frequent fusion breakpoints such as the TMPRSS2-ERG fusion with high coverage. After intensive testing of our approach with samples from 25 individuals without cancer we analyzed 13 plasma samples derived from five patients with castration resistant (CRPC) and four patients with castration sensitive prostate cancer (CSPC). Results The genome-wide profiling in the plasma of our patients revealed multiple copy number aberrations including those previously reported in prostate tumors, such as losses in 8p and gains in 8q. High-level copy number gains in the AR locus were observed in patients with CRPC but not with CSPC disease. We identified the TMPRSS2-ERG rearrangement associated 3-Mbp deletion on chromosome 21 and found corresponding fusion plasma fragments in these cases. In an index case multiregional sequencing of the primary tumor identified different copy number changes in each sector, suggesting multifocal disease. Our plasma analyses of this index case, performed 13 years after resection of the primary tumor, revealed novel chromosomal rearrangements, which were stable in serial plasma analyses over a 9-month period, which is consistent with the presence of one metastatic clone. Conclusions The genomic landscape of prostate cancer can be established by non-invasive means from plasma DNA. Our approach provides specific genomic signatures within 2 days which may therefore serve as 'liquid biopsy'.
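The abstract does not describe how the copy number profile is computed from shallow sequencing. One common approach is a read-depth comparison against controls, sketched below in generic form. It is an illustration only, not the authors' pipeline: the function names, the toy bin counts and the 0.2 log2 threshold are all assumptions made for the example.

```python
import math
from statistics import median

def log2_ratios(sample_counts, control_counts_list):
    """Per-bin log2 ratio of a plasma sample against a panel of controls.

    Counts are converted to fractions of each library's total so that
    differences in sequencing depth cancel out. Assumes every bin has a
    non-zero count in the sample and in the control median.
    """
    def to_fractions(counts):
        total = sum(counts)
        return [c / total for c in counts]

    sample = to_fractions(sample_counts)
    controls = [to_fractions(c) for c in control_counts_list]
    ratios = []
    for i, s in enumerate(sample):
        reference = median(ctrl[i] for ctrl in controls)
        if s <= 0 or reference <= 0:
            raise ValueError(f"bin {i} has a zero count; filter it out first")
        ratios.append(math.log2(s / reference))
    return ratios

def call_bins(ratios, threshold=0.2):
    """Label each bin; the threshold is a hypothetical cut-off."""
    return ["gain" if r > threshold else "loss" if r < -threshold else "neutral"
            for r in ratios]

# Toy data: four genomic bins, two controls without cancer, one plasma sample.
controls = [[100, 100, 100, 100], [200, 200, 200, 200]]
plasma = [100, 150, 50, 100]
calls = call_bins(log2_ratios(plasma, controls))
print(calls)
```

In practice, bins are usually also corrected for GC content and mappability before calling, which this sketch omits.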
A survey on cellular RNA editing activity in response to Candida albicans infections
Background Adenosine-to-Inosine (A-to-I) RNA editing is catalyzed by the adenosine deaminase acting on RNA (ADAR) family of enzymes, which induces alterations in mRNA sequence. It has been shown that A-to-I RNA editing events are of significance in the cell’s innate immunity and cellular response to viral infections. However, whether RNA editing plays a role in cellular response to microorganism/fungi infection has not been determined. Candida albicans, one of the most prevalent human pathogenic fungi, usually acts as a commensal on skin and superficial mucosa, but has been found to cause candidiasis in immunosuppressed patients. Previously, we have revealed the up-regulation of A-to-I RNA editing activity in response to different types of influenza virus infections. The current work is designed to study the effect of microorganism/fungi infection on the activity of A-to-I RNA editing in infected hosts. Results We first detected and characterized the A-to-I RNA editing events in oral epithelial cells (OKF6) and primary human umbilical vein endothelial cells (HUVEC), under normal growth condition or with C. albicans infection. A total of 89,648 and 60,872 A-to-I editing sites were detected in normal OKF6 and HUVEC cells, respectively. They were validated against the RNA editing databases, DARNED, RADAR, and REDIportal with 50, 80, and 80% success rates, respectively. While over 95% of editing sites were detected in Alu regions, among the rest of the editing sites in non-repetitive regions, the majority was located in introns and UTRs. The distributions of A-to-I editing activity and editing depth were analyzed during the course of C. albicans infection. While the normalized editing levels of common editing sites exhibited a significant increase, especially in Alu regions, no significant change in the expression of ADAR1 or ADAR2 was observed. Second, we performed further analysis on data from an in vivo mouse study with C. albicans infection. A total of 1,133 and 955 A-to-I editing sites were identified in mouse tongue and kidney tissues, respectively. The number of A-to-I editing events was much smaller than in human epithelial or endothelial cells, due to the lack of Alu elements in the mouse genome. Furthermore, during the course of C. albicans infection we observed stable levels of A-to-I editing activity in 131 and 190 common editing sites in the mouse tongue and kidney tissues, and found no significant change in ADAR1 or ADAR2 expression (with the exception of ADAR2 displaying a significant increase at 12 h after infection in mouse kidney tissue before returning to normal). Conclusions This work represents the first comprehensive analysis of A-to-I RNA editome in human epithelial and endothelial cells. C. albicans infection of human epithelial and endothelial cells led to the up-regulation of A-to-I editing activities, through a mechanism different from that of viral infections in human hosts. However, the in vivo mouse model with C. albicans infection did not show significant changes in A-to-I editing activities in tongue and kidney tissues. The different results in the mouse model were likely due to the presence of more complex in vivo environments, e.g. circulation and mixed cell types. Electronic supplementary material The online version of this article (10.1186/s12864-017-4374-2) contains supplementary material, which is available to authorized users.
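Because inosine is read as guanosine during sequencing, the editing level at a site is commonly expressed as the fraction of reads showing G at a reference A. The sketch below compares mean editing levels at sites shared by two conditions. It is a hedged illustration, not the authors' method: the site identifiers, read counts and the `min_depth` filter are invented for the example, and the abstract does not specify how its levels were normalized.

```python
def editing_level(a_reads, g_reads):
    """Fraction of reads supporting the edited base (G) at a reference A site."""
    depth = a_reads + g_reads
    return g_reads / depth if depth else None

def compare_common_sites(control, infected, min_depth=10):
    """Mean editing level in each condition over sites covered in both.

    `control` and `infected` map a site identifier to (A reads, G reads).
    `min_depth` is a hypothetical coverage filter.
    """
    common = [
        site for site in control
        if site in infected
        and sum(control[site]) >= min_depth
        and sum(infected[site]) >= min_depth
    ]
    if not common:
        return None
    mean_control = sum(editing_level(*control[s]) for s in common) / len(common)
    mean_infected = sum(editing_level(*infected[s]) for s in common) / len(common)
    return len(common), mean_control, mean_infected

# Toy counts: (A reads, G reads) per site.
control = {"chr1:100": (90, 10), "chr1:200": (80, 20), "chr2:50": (5, 1)}
infected = {"chr1:100": (80, 20), "chr1:200": (60, 40), "chr3:7": (50, 50)}

n_sites, level_control, level_infected = compare_common_sites(control, infected)
print(n_sites, round(level_control, 3), round(level_infected, 3))
```

Restricting the comparison to common sites keeps differences in coverage between samples from being mistaken for changes in editing activity.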
Feralisation targets different genomic loci to domestication in the chicken
Feralisation or the reintroduction of domestic animals into the wild population requires adaptation to survive. Here, the authors analyse the genetics of domestic chickens released into the wild and identify genomic regions that are selected for during feralisation.
Splicing Variants of SERPINA1 Gene in Ovine Milk: Characterization of cDNA and Identification of Polymorphisms
The serine protease inhibitor, clade A, member 1 (SERPINA1) is the gene for a protein called alpha-1-antitrypsin (AAT), which is a member of the serine protease inhibitor (serpin) superfamily of proteins. By conformational change, serpins control several chemical reactions inhibiting the activity of proteases. AAT is the most abundant endogenous serpin in blood circulation and it is present in relatively high concentration in human milk as well as in bovine and porcine colostrum. Here we report for the first time the molecular characterization and sequence variability of the ovine SERPINA1 cDNA and gene. cDNAs from mammary gland and from milk were PCR amplified, and three different transcripts (1437, 1166 and 521bp) of the SERPINA1 gene were identified. We amplified and sequenced different regions of the gene (5’ UTR, from exon 2 to exon 5 and 3’ UTR), and we found that the exon-intron structure of the gene is similar to that of human and bovine. We detected a total of 97 SNPs in cDNAs and gene sequences from 10 sheep of three different breeds. In adult sheep tissues a SERPINA1 gene expression analysis indicated a differential expression of the three different transcripts. The finding reported in this paper will aid further studies on possible involvement of the SERPINA1 gene in different physiological states and its possible association with production traits.
Whole-Genome Sequence Accuracy Is Improved by Replication in a Population of Mutagenized Sorghum
The accurate detection of induced mutations is critical for both forward and reverse genetics studies. Experimental chemical mutagenesis induces relatively few single base changes per individual. In a complex eukaryotic genome, false positive detection of mutations can occur at or above this mutagenesis rate. We demonstrate here, using a population of ethyl methanesulfonate (EMS)-treated Sorghum bicolor BTx623 individuals, that using replication to detect false positive-induced variants in next-generation sequencing (NGS) data permits higher throughput variant detection with greater accuracy. We used a lower sequence coverage depth (average of 7×) from 586 independently mutagenized individuals and detected 5,399,493 homozygous single nucleotide polymorphisms (SNPs). Of these, 76% originated from only 57,872 genomic positions prone to false positive variant calling. These positions are characterized by high copy number paralogs where the error-prone SNP positions are at copies containing a variant at the SNP position. The ability of short stretches of homology to generate these error-prone positions suggests that incompletely assembled or poorly mapped repeated sequences are one driver of these error-prone positions. Removal of these false positives left 1,275,872 homozygous and 477,531 heterozygous EMS-induced SNPs, which, congruent with the mutagenic mechanism of EMS, were >98% G:C to A:T transitions. Through this analysis, we generated a collection of sequence indexed mutants of sorghum. This collection contains 4035 high-impact homozygous mutations in 3637 genes and 56,514 homozygous missense mutations in 23,227 genes. Each line contains, on average, 2177 annotated homozygous SNPs per genome, including seven likely gene knockouts and 96 missense mutations. The number of mutations in a transcript was linearly correlated with the transcript length and also the G+C count, but not with the GC/AT ratio. Analysis of the detected mutagenized positions identified CG-rich patches, and flanking sequences strongly influenced EMS-induced mutation rates. This method for detecting false positive-induced mutations is generally applicable to any organism, is independent of the choice of in silico variant-calling algorithm, and is most valuable when the true mutation rate is likely to be low, such as in laboratory-induced mutations or somatic mutation detection in medicine.
Efficiency of olaparib in colorectal cancer patients with an alteration of the homologous repair protein
Precision medicine is defined by the administration of drugs based on the tumor’s particular genetic characteristics. It is developing quickly in the field of cancer therapy. For example, KRAS, NRAS and BRAF genetic testing demonstrates its efficiency for precision medicine in colorectal cancer (CRC). Besides for these well-known mutations, the purpose of performing larger genetic testing in this pathology is unknown. Recent reports have shown that using the poly ADP ribose polymerase (PARP) inhibitor olaparib in patients with homologous repair enzyme deficiency gave positive clinical results in breast, ovarian and prostate cancers. We have reported here the cases of 2 patients with multi-treated metastatic CRC who underwent somatic and constitutional exome analyses. The analyses revealed a loss of function mutation in a homologous repair enzyme resulting in the loss of heterozygosity for both patients (Check2 for the first patient and RAD51C for the second one). Both patients were treated with off-label usage of olaparib. While the first patient showed clinical benefit, reduction of carcinoembryonic antigen tumor marker and radiologic response, the second patient quickly presented a progression of the tumor. Additional genetic analyses revealed a frameshift truncating mutation of the TP53BP1 gene in the patient who progressed. Interestingly, deficiency in TP53BP1 was previously described to confer resistance to olaparib in mice breast cancer models. Our findings suggest that exome analysis may be a helpful tool to highlight targetable mutations in CRC and that olaparib may be efficient in patients with a homologous repair deficiency.
Investigating the relationship between UMODL1 gene polymorphisms and high myopia: a case–control study in Chinese
Background The UMODL1 gene was found to be associated with high myopia in Japanese. This study aimed to investigate this gene for association with high myopia in Chinese. Methods Two groups of unrelated Han Chinese from Hong Kong were recruited using the same criteria: Sample Set 1 comprising 356 controls (spherical equivalent, SE, within ±1 diopter or D) and 356 cases (SE ≤ −8D), and Sample Set 2 comprising 394 controls and 526 cases. Fifty-nine tag single nucleotide polymorphisms (SNPs) were selected and genotyped for Sample Set 1. Four SNPs were followed up with Sample Set 2. Both single-marker and haplotype analyses were performed with cases defined by different SE thresholds. Secondary phenotypes were also analyzed for association with genotypes. Results Data filtering left 57 SNPs for analysis. Single-marker analysis did not reveal any significant differences between cases and controls in the initial study. However, haplotype GCT for markers rs220168-rs220170-rs11911271 showed marginal significance (empirical P = 0.076; SE ≤ −12D for cases), but could not be replicated in the follow-up study. In contrast, non-synonymous SNP rs3819142 was associated with high myopia (SE ≤ −10D) in the follow-up study, but could not be confirmed using Sample Set 1. The SNP rs2839471, positive in the original Japanese study, gave negative results in all our analyses. Exploratory analysis of secondary phenotypes indicated that allele C of rs220120 was associated with anterior chamber depth (adjusted P = 0.0460). Conclusions Common UMODL1 polymorphisms were unlikely to be important in the genetic susceptibility to high myopia in Han Chinese.
An essential domain of an early-diverged RNA polymerase II functions to accurately decode a primitive chromatin landscape
Abstract A unique feature of RNA polymerase II (RNA pol II) is its long C-terminal extension, called the carboxy-terminal domain (CTD). The well-studied eukaryotes possess a tandemly repeated 7-amino-acid sequence, called the canonical CTD, which orchestrates various steps in mRNA synthesis. Many eukaryotes possess a CTD devoid of repeats, appropriately called a non-canonical CTD, which performs completely unknown functions. Trypanosoma brucei, the etiologic agent of African Sleeping Sickness, deploys an RNA pol II that contains a non-canonical CTD to accomplish an unusual transcriptional program; all protein-coding genes are transcribed as part of a polygenic precursor mRNA (pre-mRNA) that is initiated within a several-kilobase-long region, called the transcription start site (TSS), which is upstream of the first protein-coding gene in the polygenic array. In this report, we show that the non-canonical CTD of T. brucei RNA pol II is important for normal protein-coding gene expression, likely directing RNA pol II to the TSSs within the genome. Our work reveals the presence of a primordial CTD code within eukarya and indicates that proper recognition of the chromatin landscape is a central function of this RNA pol II-distinguishing domain.
Identification of Epigenetic Biomarkers of Lung Adenocarcinoma through Multi-Omics Data Analysis
Epigenetic mechanisms such as DNA methylation or histone modifications are essential for the regulation of gene expression and development of tissues. Alteration of epigenetic modifications can be used as an epigenetic biomarker for diagnosis and as promising targets for epigenetic therapy. A recent study explored cancer-cell specific epigenetic biomarkers by examining different types of epigenetic modifications simultaneously. However, it was based on microarrays and reported biomarkers that were also present in normal cells at a low frequency. Here, we first analyzed multi-omics data (including ChIP-Seq data of six types of histone modifications: H3K27ac, H3K4me1, H3K9me3, H3K36me3, H3K27me3, and H3K4me3) obtained from 26 lung adenocarcinoma cell lines and a normal cell line. We identified six genes with both H3K27ac and H3K4me3 histone modifications in their promoter regions, which were not present in the normal cell line, but present in ≥85% (22 out of 26) and ≤96% (25 out of 26) of the lung adenocarcinoma cell lines. Of these genes, NUP210 (encoding a main component of the nuclear pore complex) was the only gene in which the two modifications were not detected in another normal cell line. RNA-Seq analysis revealed that NUP210 was aberrantly overexpressed among the 26 lung adenocarcinoma cell lines, although the frequency of NUP210 overexpression was lower (19.3%) in 57 lung adenocarcinoma tissue samples studied and stored in another database. This study provides a basis to discover epigenetic biomarkers highly specific to a certain cancer, based on multi-omics data at the cell population level.
R-loops induce repressive chromatin marks over mammalian gene terminators
The formation of R-loops is a natural consequence of the transcription process, caused by invasion of the DNA duplex by nascent transcripts. These structures have been considered rare transcriptional by-products with potential harmful effects on genome integrity, due to the fragility of the displaced DNA coding strand1. However R-loops may also possess beneficial effects as their widespread formation has been detected over CpG island promoters in human genes2,3. Furthermore we have previously shown that R-loops are particularly enriched over G-rich terminator elements. These facilitate RNA polymerase II (Pol II) pausing prior to efficient termination4. Here we reveal an unanticipated link between R-loops and RNA interference (RNAi)-dependent H3K9me2 formation over pause site termination regions of mammalian protein coding genes. We show that R-loops induce antisense transcription over these pause elements which in turn lead to the generation of double-strand RNA (dsRNA) and recruitment of Dicer, Ago1, Ago2, and G9a histone lysine methyltransferase (HKMT). Consequently an H3K9me2 repressive mark is formed and Heterochromatin Protein 1γ (HP1γ) is recruited, that reinforces Pol II pausing prior to efficient transcriptional termination. We predict that R-loops promote a chromatin architecture that defines the termination region for a substantial subset of mammalian genes.
BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis
Summary Enhancers, critical determinants of cellular identity, are commonly identified by correlative chromatin marks and gain-of-function potential, though only loss-of-function studies can demonstrate their requirement in the native genomic context. Previously we identified an erythroid enhancer of BCL11A, subject to common genetic variation associated with fetal hemoglobin (HbF) level, whose mouse ortholog is necessary for erythroid BCL11A expression. Here we develop pooled CRISPR-Cas9 guide RNA libraries to perform in situ saturating mutagenesis of the human and mouse enhancers. This approach reveals critical minimal features and discrete vulnerabilities of these enhancers. Despite conserved function of the composite enhancers, their architecture diverges. The crucial human sequences appear primate-specific. Through editing of primary human progenitors and mouse transgenesis, we validate the BCL11A erythroid enhancer as a target for HbF reinduction. The detailed enhancer map will inform therapeutic genome editing. The screening approach described here is generally applicable to functional interrogation of noncoding genomic elements.
Inducible super-enhancers are organized based on canonical signal-specific transcription factor binding elements
Abstract Super-enhancers are established through the interactions of several enhancers and a large number of proteins, including transcription factors and co-regulators; however, the formation of these interactions is poorly understood. By re-analysing previously published estrogen receptor alpha (ERα) ChIP-seq data sets derived from the MCF-7 cell line, we observed that in the absence of stimulation, future super-enhancers are represented by one or a few transcription factor binding event(s) and these extraordinary enhancers possess a response element largely specific to the ERα dimer. Upon hormonal stimulation, these primary binding sites are surrounded by a large amount of ERα and the critical components of active enhancers, such as P300 and MED1, and together with neighbouring sites bound by newly recruited ERα, they generate the functional super-enhancers. To further validate the role of canonical elements in super-enhancer formation, we investigated some additional signal-dependent transcription factors, confirming that certain, distinguished binding elements have a general organizer function. These results suggest that certain signal-specific transcription factors guide super-enhancer formation upon binding to strong response elements. These findings may reshape the current understanding of how these regulatory units assemble, highlighting the involvement of DNA elements instead of protein–protein interactions.
EIN2-dependent regulation of acetylation of histone H3K14 and non-canonical histone H3K23 in ethylene signalling
The translocation of the C-terminal domain of EIN2 to the nucleus is essential for induction of gene expression in response to the plant hormone ethylene. Here, Zhang et al. show that EIN2 is required for ethylene-inducible elevation of histone acetylation marks associated with transcriptional activation.
Temporal dynamics of gene expression and histone marks at the Arabidopsis shoot meristem during flowering
When plants flower, the shoot apical meristem switches fate to produce floral organs instead of leaves. Here You et al. perform tissue-specific epigenome profiling and show that during this transition changes in histone methylation are correlated with transcriptional responses in the meristem.
A Genome-Wide Linkage Study for Chronic Obstructive Pulmonary Disease in a Dutch Genetic Isolate Identifies Novel Rare Candidate Variants
Chronic obstructive pulmonary disease (COPD) is a complex and heritable disease, associated with multiple genetic variants. Specific familial types of COPD may be explained by rare variants, which have not been widely studied. We aimed to discover rare genetic variants underlying COPD through a genome-wide linkage scan. Affected-only analysis was performed using the 6K Illumina Linkage IV Panel in 142 cases clustered in 27 families from a genetic isolate, the Erasmus Rucphen Family (ERF) study. Potential causal variants were identified by searching for shared rare variants in the exome-sequence data of the affected members of the families contributing most to the linkage peak. The identified rare variants were then tested for association with COPD in a large meta-analysis of several cohorts. Significant evidence for linkage was observed on chromosomes 15q14–15q25 [logarithm of the odds (LOD) score = 5.52], 11p15.4–11q14.1 (LOD = 3.71) and 5q14.3–5q33.2 (LOD = 3.49). In the chromosome 15 peak, that harbors the known COPD locus for nicotinic receptors, and in the chromosome 5 peak we could not identify shared variants. In the chromosome 11 locus, we identified four rare (minor allele frequency (MAF) <0.02), predicted pathogenic, missense variants. These were shared among the affected family members. The identified variants localize to genes including neuroblast differentiation-associated protein (AHNAK), previously associated with blood biomarkers in COPD, phospholipase C Beta 3 (PLCB3), shown to increase airway hyper-responsiveness, solute carrier family 22-A11 (SLC22A11), involved in amino acid metabolism and ion transport, and metallothionein-like protein 5 (MTL5), involved in nicotinate and nicotinamide metabolism. Association of SLC22A11 and MTL5 variants were confirmed in the meta-analysis of 9,888 cases and 27,060 controls. In conclusion, we have identified novel rare variants in plausible genes related to COPD. 
Further studies utilizing large sample whole-genome sequencing should further confirm the associations at chromosome 11 and investigate the chromosome 15 and 5 linked regions.
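To put the LOD scores quoted above in perspective, a LOD score is by definition the base-10 logarithm of the likelihood ratio in favour of linkage, so it can be converted back to odds as 10 raised to the LOD. The sketch below does only that conversion for the three reported peaks. The helper name `lod_to_odds` and the dictionary layout are illustrative assumptions, and nothing here reproduces the study's actual linkage analysis.

```python
# LOD scores quoted in the abstract above, keyed by linked region.
lod_scores = {
    "15q14-15q25": 5.52,
    "11p15.4-11q14.1": 3.71,
    "5q14.3-5q33.2": 3.49,
}


def lod_to_odds(lod: float) -> float:
    """Convert a LOD score to odds in favour of linkage (10 ** LOD)."""
    return 10.0 ** lod


# Hypothetical container for the converted values.
odds = {region: lod_to_odds(lod) for region, lod in lod_scores.items()}

for region, value in odds.items():
    lod = lod_scores[region]
    print(f"{region}: LOD {lod:.2f} -> odds of roughly {value:,.0f} to 1")
```

Read this way, all three peaks exceed the conventional LOD threshold of 3, which corresponds to odds of 1000 to 1.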
Integrated miRNA-mRNA analysis reveals regulatory pathways underlying the curly fleece trait in Chinese tan sheep
Background Tan sheep is an indigenous Chinese breed well known for its beautiful curly fleece. One prominent breed characteristic of this sheep breed is that the degree of curliness differs markedly between lambs and adults, but the molecular mechanisms regulating the shift are still not well understood.
In this study, we identified 49 differentially expressed (DE) microRNAs (miRNAs) between Tan sheep at the two stages through miRNA-seq, and combined the data with those from our earlier Suppression Subtractive Hybridization cDNA (SSH) library study to elucidate the mechanisms underlying curly fleece formation. Results Thirty-six potential miRNA-mRNA target pairs were identified using computational methods, including 25 DE miRNAs and 10 DE genes involved in the MAPK signaling pathway, steroid biosynthesis and metabolic pathways. Differential expression between lambs and adults was confirmed by qRT-PCR; some of the miRNAs were already annotated in the genome, while others were novel. Inhibition of KRT83 expression by miR-432 was confirmed by both gene knockdown with siRNA and overexpression, which was consistent with the miRNA target prediction results. Conclusion Our study represents a comprehensive analysis of mRNA and miRNA in Tan sheep and offers detailed insight into the development of curly fleece as well as the potential mechanisms controlling curly hair formation in humans. Electronic supplementary material The online version of this article (10.1186/s12864-018-4736-4) contains supplementary material, which is available to authorized users.
Enrichment and verification of differentially expressed miRNAs in bursa of Fabricius in two breeds of duck
Objective The bursa of Fabricius (BF) is a central humoral immune organ belonging specifically to avians. Recent studies had suggested that miRNAs were active regulators involved in the immune processes. This study was to investigate the possible differences of the BF at miRNA level between two genetically disparate duck breeds. Methods Using Illumina next-generation sequencing, the miRNAs libraries of ducks were established.
Results There were 66 differentially expressed miRNAs and 28 novel miRNAs in the bursa. A set of abundant miRNAs (i.e., let-7, miR-146a-5p, miR-21-5p, miR-17~92) that are involved in immunity and disease was detected, and the predicted target genes of the novel miRNAs were associated with the ducks' high anti-adversity ability. By gene ontology and KEGG pathway enrichment analysis, the targets of the differentially expressed miRNAs were mainly involved in immunity and disease, supporting differences in BF immune function between the two duck breeds. In addition, the metabolic pathway had the most enriched target genes, and some enriched pathways were related to cell cycle, protein synthesis, cell proliferation and apoptosis. This indicated that differences in metabolism may be one of the reasons for the immune difference between the BF of the two duck breeds. Conclusion These data list the main differences in the BF at the miRNA level between two genetically disparate duck breeds and lay a foundation for molecular-assisted breeding of poultry in the future.
Structural polymorphism in the promoter of pfmrp2 confers Plasmodium falciparum tolerance to quinoline drugs
Drug resistance in Plasmodium falciparum remains a challenge for the malaria eradication programmes around the world. With the emergence of artemisinin resistance, the efficacy of the partner drugs in the artemisinin combination therapies (ACT) that include quinoline-based drugs is becoming critical. So far only few resistance markers have been identified from which only two transmembrane transporters namely PfMDR1 (an ATP-binding cassette transporter) and PfCRT (a drug-metabolite transporter) have been experimentally verified. Another P. falciparum transporter, the ATP-binding cassette containing multidrug resistance-associated protein (PfMRP2) represents an additional possible factor of drug resistance in P. falciparum. In this study, we identified a parasite clone that is derived from the 3D7 P. falciparum strain and shows increased resistance to chloroquine, mefloquine and quinine through the trophozoite and schizont stages. We demonstrate that the resistance phenotype is caused by a 4.1 kb deletion in the 5′ upstream region of the pfmrp2 gene that leads to an alteration in the pfmrp2 transcription and thus increased level of PfMRP2 protein. These results also suggest the importance of putative promoter elements in regulation of gene expression during the P. falciparum intra-erythrocytic developmental cycle and the potential of genetic polymorphisms within these regions to underlie drug resistance.
Rare variants in CFI, C3 and C9 are associated with high risk of advanced age-related macular degeneration
To define the role of rare variants in advanced age-related macular degeneration (AMD) risk, we sequenced the exons of 681 genes within AMD-associated loci and pathways in 2,493 cases and controls. We first tested each gene for increased or decreased burden of rare variants in cases compared to controls. We found that 7.8% of AMD cases compared to 2.3% of controls are carriers of rare missense CFI variants (OR=3.6, p=2×10⁻⁸). There was a predominance of dysfunctional variants in cases compared to controls. We then tested individual variants for association to disease. We observed significant association with rare missense alleles outside CFI.
Genotyping in 5,115 independent samples confirmed associations to AMD with a K155Q allele in C3 (replication p=3.5×10⁻⁵, OR=2.8; joint p=5.2×10⁻⁹, OR=3.8) and a P167S allele in C9 (replication p=2.4×10⁻⁵, OR=2.2; joint p=6.5×10⁻⁷, OR=2.2). Finally, we show that the 155Q allele in C3 results in resistance to proteolytic inactivation by CFH and CFI. These results implicate loss of C3 protein regulation and excessive alternative complement activation in AMD pathogenesis, thus informing both the direction of effect and mechanistic underpinnings of this disorder.
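As a rough plausibility check on the CFI carrier frequencies quoted above (7.8% of cases versus 2.3% of controls), the crude odds ratio can be recomputed from the two percentages alone. This is a minimal sketch that assumes a simple unadjusted comparison and uses only the rounded figures given in the abstract. The function name `crude_odds_ratio` is a hypothetical helper, and the study's own estimate may come from a different, adjusted test.

```python
def crude_odds_ratio(p_cases: float, p_controls: float) -> float:
    """Unadjusted odds ratio from carrier proportions in cases and controls."""
    odds_cases = p_cases / (1.0 - p_cases)
    odds_controls = p_controls / (1.0 - p_controls)
    return odds_cases / odds_controls


# Rounded carrier frequencies as quoted in the abstract.
or_cfi = crude_odds_ratio(p_cases=0.078, p_controls=0.023)
print(f"crude OR for rare missense CFI carriers: {or_cfi:.1f}")
```

The result rounds to 3.6, which agrees with the odds ratio reported in the abstract.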
In depth annotation of the Anopheles gambiae mosquito midgut transcriptome
Background Genome sequencing of Anopheles gambiae was completed more than ten years ago and has accelerated research on malaria transmission. However, annotation needs to be refined and verified experimentally, as most predicted transcripts have been identified by comparative analysis with genomes from other species. The mosquito midgut, the first organ to interact with Plasmodium parasites, mounts effective antiplasmodial responses that limit parasite survival and disease transmission. High-throughput Illumina sequencing of the midgut transcriptome was used to identify new genes and transcripts, contributing to the refinement of An. gambiae genome annotation. Results We sequenced ~223 million reads from An. gambiae midgut cDNA libraries generated from susceptible (G3) and refractory (L35) mosquito strains. Mosquitoes were infected with either Plasmodium berghei or Plasmodium falciparum, and midguts were collected after the first or second Plasmodium infection. In total, 22,889 unique midgut transcript models were generated from both An. gambiae strain sequences combined, and 76% are potentially novel.
Of these novel transcripts, 49.5% aligned with annotated genes and appear to be isoforms or pre-mRNAs of reference transcripts, while 50.5% mapped to regions between annotated genes and represent novel intergenic transcripts (NITs). Predicted models were validated for midgut expression using qRT-PCR and microarray analysis, and novel isoforms were confirmed by sequencing predicted intron-exon boundaries. Coding potential analysis revealed that 43% of total midgut transcripts appear to be long non-coding RNA (lncRNA), and functional annotation of NITs showed that 68% had no homology to current databases from other species. Reads were also analyzed using de novo assembly and predicted transcripts compared with genome mapping-based models. Finally, variant analysis of G3 and L35 midgut transcripts detected 160,742 variants with respect to the An. gambiae PEST genome, and 74% were new variants. Intergenic transcripts had a higher frequency of variation compared with non-intergenic transcripts. Conclusion This in-depth Illumina sequencing and assembly of the An. gambiae midgut transcriptome doubled the number of known transcripts and tripled the number of variants known in this mosquito species. It also revealed existence of a large number of lncRNA and opens new possibilities for investigating the biological function of many newly discovered transcripts. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-636) contains supplementary material, which is available to authorized users.
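The percentages in this abstract can be turned into approximate transcript counts, which makes the relative sizes of the categories easier to compare. The sketch below is only arithmetic on the rounded figures quoted above. The variable names and the `counts` dictionary are illustrative assumptions, and the resulting numbers are estimates, not values reported by the authors.

```python
total_models = 22_889        # unique midgut transcript models (from the abstract)
novel_fraction = 0.76        # share described as potentially novel
isoform_fraction = 0.495     # novel transcripts aligning with annotated genes
intergenic_fraction = 0.505  # novel intergenic transcripts (NITs)
lncrna_fraction = 0.43       # share of all models that appear to be lncRNA

novel = total_models * novel_fraction

# Approximate counts, rounded to whole transcripts.
counts = {
    "novel": round(novel),
    "novel_isoforms_or_pre_mrnas": round(novel * isoform_fraction),
    "novel_intergenic": round(novel * intergenic_fraction),
    "putative_lncrna": round(total_models * lncrna_fraction),
}

for label, n in counts.items():
    print(f"{label}: about {n:,}")
```

Because the inputs are rounded percentages, each count should be read as accurate to within a few tens of transcripts at best.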
Lumbosacral stenosis in Labrador retriever military working dogs – an exomic exploratory study
Background Canine lumbosacral stenosis is defined as narrowing of the caudal lumbar and/or sacral vertebral canal. A risk factor for neurologic problems in many large sized breeds, lumbosacral stenosis can also cause early retirement in Labrador retriever military working dogs. Though vital for conservative management of the condition, early detection is complicated by the ambiguous nature of clinical signs of lumbosacral stenosis in stoic and high-drive Labrador retriever military working dogs. Though clinical diagnoses of lumbosacral stenosis using CT imaging are standard, they are usually not performed unless dogs present with clinical symptoms. Understanding the underlying genomic mechanisms would be beneficial in developing early detection methods for lumbosacral stenosis, which could prevent premature retirement in working dogs. The exomes of 8 young Labrador retriever military working dogs (4 affected and 4 unaffected by lumbosacral stenosis, phenotypically selected by CT image analyses from 40 dogs with no reported clinical signs of the condition) were sequenced to identify and annotate exonic variants between dogs negative and positive for lumbosacral stenosis. Results Two-hundred and fifty-two variants were detected to be homozygous for the wild allele and either homozygous or heterozygous for the variant allele. Seventeen non-disruptive variants were detected that could affect protein effectiveness in 7 annotated (SCN1B, RGS9BP, ASXL3, TTR, LRRC16B, PTPRO, ZBBX) and 3 predicted genes (EEF1A1, DNAJA1, ZFX). No exonic variants were detected in any of the canine orthologues for human lumbar spinal stenosis candidate genes. Conclusions TTR (transthyretin) gene could be a possible candidate for lumbosacral stenosis in Labrador retrievers based on previous human studies that have reported an association between human lumbar spinal stenosis and transthyretin protein amyloidosis. 
Other genes identified with exonic variants in this study but with no known published association with lumbosacral stenosis and/or lumbar spinal stenosis could also be candidate genes for future canine lumbosacral stenosis studies but their roles remain currently unknown. Human lumbar spinal stenosis candidate genes also cannot be ruled out as lumbosacral stenosis candidate genes. More definitive genetic investigations of this condition are needed before any genetic test for lumbosacral stenosis in Labrador retriever can be developed. Electronic supplementary material The online version of this article (10.1186/s40575-017-0052-6) contains supplementary material, which is available to authorized users.
An autoinflammatory neurological disease due to interleukin 6 hypersecretion
Autoinflammatory diseases are rare illnesses characterized by apparently unprovoked inflammation without high-titer auto-antibodies or antigen-specific T cells. They may cause neurological manifestations, such as meningitis and hearing loss, but they are also characterized by non-neurological manifestations.
In this work we studied a 30-year-old man who had a chronic disease characterized by meningitis, progressive hearing loss, persistently raised inflammatory markers and diffuse leukoencephalopathy on brain MRI. He also suffered from chronic recurrent osteomyelitis of the mandible. The hypothesis of an autoinflammatory disease prompted us to test for the presence of mutations in interleukin-1−pathway genes and to investigate the function of this pathway in the mononuclear cells obtained from the patient. Search for mutations in genes associated with interleukin-1−pathway demonstrated a novel NLRP3 (CIAS1) mutation (p.I288M) and a previously described MEFV mutation (p.R761H), but their combination was found to be non-pathogenic. On the other hand, we uncovered a selective interleukin-6 hypersecretion within the central nervous system as the likely pathogenic mechanism. This is also supported by the response to the anti-interleukin-6−receptor monoclonal antibody tocilizumab, but not to the recombinant interleukin-1−receptor antagonist anakinra. Exome sequencing failed to identify mutations in other genes known to be involved in autoinflammatory diseases. We propose that the disease described in this patient might be a prototype of a novel category of autoinflammatory diseases characterized by prominent neurological involvement.
Somatic PRDM2 c.4467delA mutations in colorectal cancers control histone methylation and tumor growth
The chromatin modifier PRDM2/RIZ1 is inactivated by mutation in several forms of cancer and is a putative tumor suppressor gene. Frameshift mutations in the C-terminal region of PRDM2, affecting (A)8 or (A)9 repeats within exon 8, are found in one third of colorectal cancers with microsatellite instability, but the contribution of these mutations to colorectal tumorigenesis is unknown. To model somatic mutations in microsatellite unstable tumors, we devised a general approach to perform genome editing while stabilizing the mutated nucleotide repeat.
We then engineered isogenic cell systems where the PRDM2 c.4467delA mutation in human HCT116 colorectal cancer cells was corrected to wild-type by genome editing. Restored PRDM2 increased global histone 3 lysine 9 dimethylation and reduced migration, anchorage-independent growth and tumor growth in vivo. Gene set enrichment analysis revealed regulation of several hallmark cancer pathways, particularly of epithelial-to-mesenchymal transition (EMT), with VIM being the most significantly regulated gene. These observations provide direct evidence that PRDM2 c.4467delA is a driver mutation in colorectal cancer and confirm PRDM2 as a cancer gene, pointing to regulation of EMT as a central aspect of its tumor suppressive action.
Patterns of Population Variation in Two Paleopolyploid Eudicot Lineages Suggest That Dosage-Based Selection on Homeologs Is Long-Lived
Abstract Genes that are inherently subject to strong selective constraints tend to be overretained in duplicate after polyploidy. They also continue to experience similar, but somewhat relaxed, constraints after that polyploidy event.
We sought to assess for how long the influence of polyploidy is felt on these genes’ selective pressures. We analyzed two nested polyploidy events in Brassicaceae: the At-α genome duplication that is the most recent polyploidy in the model plant Arabidopsis thaliana and a more recent hexaploidy shared by the genus Brassica and its relatives. By comparing the strength and direction of the natural selection acting at the population and at the species level, we find evidence for continued intensified purifying selection acting on retained duplicates from both polyploidies even down to the present. The constraint observed in preferentially retained genes is not a result of the polyploidy event: the orthologs of such genes experience even stronger constraint in nonpolyploid outgroup genomes. In both the Arabidopsis and Brassica lineages, we further find evidence for segregating mildly deleterious variants, confirming that the population-level data uncover patterns not visible with between-species comparisons. Using the A. thaliana metabolic network, we also explored whether network position was correlated with the measured selective constraint. At both the population and species level, nodes/genes tended to show similar constraints to their neighbors. Our results paint a picture of the long-lived effects of polyploidy on plant genomes, suggesting that even yesterday’s polyploids still have distinct evolutionary trajectories.
Whole-genome sequencing and SNV genotyping of ‘Nebbiolo’ (Vitis vinifera L.) clones
‘Nebbiolo’ (Vitis vinifera) is among the most ancient and prestigious wine grape varieties characterised by a wide genetic variability exhibited by a high number of clones (vegetatively propagated lines of selected mother plants). However, limited information is available for this cultivar at the molecular and genomic levels. The whole genomes of three ‘Nebbiolo’ clones (CVT 71, CVT 185 and CVT 423) were re-sequenced and a de novo transcriptome assembly was produced. Important remarks about the genetic peculiarities of ‘Nebbiolo’ and its intra-varietal variability useful for clonal identification were reported. In particular, several varietal transcripts identified for the first time in ‘Nebbiolo’ were disease resistance genes, and single-nucleotide variants (SNVs) identified in ‘Nebbiolo’, but not in other cultivars, were associated with genes involved in the stress response. Ten newly discovered SNVs were successfully employed to identify some periclinal chimeras and to classify 98 ‘Nebbiolo’ clones in seven main genotypes, which proved to be linked to the geographical origin of accessions. In addition, for the first time it was possible to discriminate some ‘Nebbiolo’ clones from the others.
A novel homozygous truncating GNAT1 mutation implicated in retinal degeneration
Background The GNAT1 gene encodes the α subunit of the rod transducin protein, a key element in the rod phototransduction cascade. Variants in GNAT1 have been implicated in stationary night-blindness in the past, but unlike other proteins in the same pathway, it has not previously been implicated in retinitis pigmentosa. Methods A panel of 182 retinopathy-associated genes was sequenced to locate disease-causing mutations in patients with inherited retinopathies. Results Sequencing revealed a novel homozygous truncating mutation in the GNAT1 gene in a patient with significant pigmentary disturbance and constriction of visual fields, a presentation consistent with retinitis pigmentosa. This is the first report of a patient homozygous for a complete loss-of-function GNAT1 mutation. The clinical data from this patient provide definitive evidence of retinitis pigmentosa with late onset in addition to the lifelong night-blindness that would be expected from a lack of transducin function. Conclusion These data suggest that some truncating GNAT1 variants can indeed cause a recessive, mild, late-onset retinal degeneration in human beings rather than just stationary night-blindness as reported previously, with notable similarities to the phenotype of the Gnat1 knockout mouse.
AR-13, a Celecoxib Derivative, Directly Kills Francisella In Vitro and Aids Clearance and Mouse Survival In Vivo
Francisella tularensis (F. tularensis) is the causative agent of tularemia and is classified as a Tier 1 select agent. No licensed vaccine is currently available in the United States and treatment of tularemia is confined to few antibiotics. In this study, we demonstrate that AR-13, a derivative of the cyclooxygenase-2 inhibitor celecoxib, exhibits direct in vitro bactericidal killing activity against Francisella including a type A strain of F. tularensis (SchuS4) and the live vaccine strain (LVS), as well as toward the intracellular proliferation of LVS in macrophages, without causing significant host cell toxicity. Identification of an AR-13-resistant isolate indicates that this compound has an intracellular target(s) and that efflux pumps can mediate AR-13 resistance. In the mouse model of tularemia, AR-13 treatment protected 50% of the mice from lethal LVS infection and prolonged survival time from a lethal dose of F. tularensis SchuS4. Combination of AR-13 with a sub-optimal dose of gentamicin protected 60% of F. tularensis SchuS4-infected mice from death. Taken together, these data support the translational potential of AR-13 as a lead compound for the further development of new anti-Francisella agents.
Comparative analysis of a sex chromosome from the blackchin tilapia, Sarotherodon melanotheron
Background Inversions and other structural polymorphisms often reduce the rate of recombination between sex chromosomes, making it impossible to fine map sex-determination loci using traditional genetic mapping techniques. Here we compare distantly related species of tilapia that each segregate an XY system of sex-determination on linkage group 1. We use whole genome sequencing to identify shared sex-patterned polymorphisms, which are candidates for the ancestral sex-determination mutation. Results We found that Sarotherodon melanotheron segregates an XY system on LG1 in the same region identified in Oreochromis niloticus. Both species have higher densities of sex-patterned SNPs, as well as elevated number of ancestral copy number variants in this region when compared to the rest of the genome, but the pattern of differentiation along LG1 differs between species. The number of sex-patterned SNPs shared by the two species is small, but larger than expected by chance, suggesting that a novel Y-chromosome arose just before the divergence of the two species. We identified a shared sex-patterned SNP that alters a Gata4 binding site near Wilms tumor protein that might be responsible for sex-determination. Conclusions Shared sex-patterned SNPs, insertions and deletions suggest an ancestral sex-determination system that is common to both S. melanotheron and O. niloticus. Functional analyses are needed to evaluate shared SNPs near candidate genes that might play a role in sex-determination of these species. Interspecific variation in the sex chromosomes of tilapia species provides an excellent model system for understanding the evolution of vertebrate sex chromosomes. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3163-7) contains supplementary material, which is available to authorized users. 
MME mutation in dominant spinocerebellar ataxia with neuropathy (SCA43)
Objective: To identify the causative gene mutation in a 5-generation Belgian family with dominantly inherited spinocerebellar ataxia and polyneuropathy, in which known genetic etiologies had been excluded. Methods: We collected DNA samples of 28 family members, including 7 living affected individuals, whose clinical records were reviewed by a neurologist experienced in ataxia. We combined linkage data of 21 family members with whole exome sequencing in 2 affected individuals to identify shared heterozygous variants mapping to potentially linked regions. Variants were screened for rarity and for predicted damaging effect. A candidate mutation was confirmed by Sanger sequencing and tested for cosegregation with the disease. Results: Affected individuals presented with late-onset sensorimotor axonal polyneuropathy; all but one also had cerebellar ataxia. We identified a variant in the MME gene, p.C143Y, that was absent from control databases, cosegregated with the phenotype, and was predicted to have a strong damaging effect on the encoded protein by all algorithms we used. Conclusions: MME encodes neprilysin (NEP), a zinc-dependent metalloprotease expressed in most tissues, including the central and peripheral nervous systems. The mutated cysteine 143 forms a disulfide bridge, which is 100% conserved in NEP and in similar enzymes. The recent identification of recessive MME mutations in 10 unrelated individuals from Japan with axonal polyneuropathy further supports the causality of the mutation, despite the dominant mode of inheritance and the presence of cerebellar involvement in our study family. Functional studies are needed to identify the mechanisms underlying these differences.
Rare variants of small effect size in neuronal excitability genes influence clinical outcome in Japanese cases of SCN1A truncation-positive Dravet syndrome
Dravet syndrome (DS) is a rare, devastating form of childhood epilepsy that is often associated with mutations in the voltage-gated sodium channel gene, SCN1A. There is considerable variability in expressivity within families, as well as among individuals carrying the same primary mutation, suggesting that clinical outcome is modulated by variants at other genes. To identify modifier gene variants that contribute to clinical outcome, we sequenced the exomes of 22 individuals at both ends of a phenotype distribution (i.e., mild and severe cognitive condition). We controlled for variation associated with different mutation types by limiting inclusion to individuals with a de novo truncation mutation resulting in SCN1A haploinsufficiency. We performed tests aimed at identifying 1) single common variants that are enriched in either phenotypic group, 2) sets of common or rare variants aggregated in and around genes associated with clinical outcome, and 3) rare variants in 237 candidate genes associated with neuronal excitability. While our power to identify enrichment of a common variant in either phenotypic group is limited as a result of the rarity of mild phenotypes in individuals with SCN1A truncation variants, our top candidates did not map to functional regions of genes, or in genes that are known to be associated with neurological pathways. In contrast, we found a statistically-significant excess of rare variants predicted to be damaging and of small effect size in genes associated with neuronal excitability in severely affected individuals. A KCNQ2 variant previously associated with benign neonatal seizures is present in 3 of 12 individuals in the severe category. To compare our results with the healthy population, we performed a similar analysis on whole exome sequencing data from 70 Japanese individuals in the 1000 genomes project. Interestingly, the frequency of rare damaging variants in the same set of neuronal excitability genes in healthy individuals is nearly as high as in severely affected individuals. Rather than a single common gene/variant modifying clinical outcome in SCN1A-related epilepsies, our results point to the cumulative effect of rare variants with little to no measurable phenotypic effect (i.e., typical genetic background) unless present in combination with a disease-causing truncation mutation in SCN1A.
Mycobacterium leprae genomes from naturally infected nonhuman primates
Author summary Mycobacterium leprae, which causes leprosy in humans, also infects nine-banded armadillos, red squirrels, and nonhuman primates. Genomic data for M. leprae strains from wild armadillos and red squirrels show that humans were responsible for the original introduction of M. leprae to these species. It is not known whether naturally occurring leprosy among nonhuman primates is due to incidental infections from humans or whether nonhuman primates can serve as a host for M. leprae. To this end, we sequenced complete genomes of M. leprae strains from three naturally infected nonhuman primates. Our results suggest that M. leprae strains can be transmitted from humans to nonhuman primates as well as between nonhuman primate species, and thus, other primates might serve as a host for M. leprae in the wild. We also assessed whether wild ring-tailed lemurs from Madagascar and chimpanzees from Uganda showed presence of M. leprae infection. Although these populations tested negative for M. leprae infection, further research on the prevalence of M. leprae in other wild nonhuman primate populations, especially in leprosy-endemic regions, is warranted.
De novo derivation of proteomes from transcriptomes for transcript and protein identification
Identification of proteins by tandem mass spectrometry requires a database of the proteins that could be in the sample. This is available for model species (e.g. humans) but not for non-model species. Ideally, for a non-model species the sequencing of expressed mRNA would generate a protein database for mass spectrometry based identification, allowing detection of genes and proteins using high throughput sequencing and protein identification technologies. Here we use human cells infected with human adenovirus as a complex and dynamic model to demonstrate this approach is robust. Our Proteomics Informed by Transcriptomics technique identifies >99% of over 3700 distinct proteins identified using traditional analysis reliant on comprehensive human and adenovirus protein lists. This facilitates high throughput acquisition of direct evidence for transcripts and proteins in non-model species. Critically, we show this approach can also be used to highlight genes and proteins undergoing dynamic changes in post transcriptional protein stability.
Complete human CD1a deficiency on Langerhans cells due to a rare point mutation in the coding sequence
Morgagnian cataract resulting from a naturally occurring nonsense mutation elucidates a role of CPAMD8 in mammalian lens development
To investigate the genetic basis of hereditary lens opacities we analyzed 31 cases of bilateral congenital cataract in Red Holstein Friesian cattle. A genome-wide association study revealed a significant association on bovine chromosome 7 at positions 6,166,179 and 12,429,691. Whole genome re-sequencing of one case and four relatives showed a nonsense mutation (g.5995966C>T) in the PZP-like, alpha-2-macroglobulin domain containing 8 (CPAMD8) gene leading to a premature stop codon (CPAMD8 p.Gln74*) associated with cataract development in cattle. With immunohistochemistry we confirmed a physiological expression of CPAMD8 in the ciliary body epithelium of the eye in unaffected cattle, while the protein was not detectable in the ciliary body of cattle with cataracts. RNA expression of CPAMD8 was detected in healthy adult, fetal and cataractous lenses.
Energy metabolism and whole-exome sequencing-based analysis of Sasang constitution: a pilot study
Background Traditional Korean Sasang constitutional (SC) medicine categorizes individuals into four constitutional types [Tae-eum (TE), So-eum (SE), Tae-yang (TY), or So-yang (SY)] based on biological and physiological characteristics. As these characteristics are closely related to the bioenergetics of the human body, we assessed the correlation between SC type and energy metabolism features. Methods Forty healthy, young (22.3 ± 1.4 years) males volunteered to participate in this study. Participants answered an SC questionnaire, and their face shape, voice tone, and body shape were assessed using an SC analysis tool. Thirty-one participants (10 TE, 10 SE, 3 TY, and 8 SY) were selected for further analysis. Collected blood samples were subjected to blood composition analysis, mitochondrial function analysis, and whole-exome sequencing. Results The SY type showed significantly lower total cholesterol and high-density lipoprotein cholesterol levels than the SE type. Cellular and mitochondrial adenosine triphosphate (ATP) levels were similar across types. All types showed similar basal mitochondrial oxygen consumption rates, whereas the TE type showed a significantly lower ATP-linked oxygen consumption rate than the other types. Whole-exome sequencing identified several gene variants that were exclusively detected in particular SC types, including 19 for SE, seven for SY, 11 for TE, and six for TY. Conclusion SC type-specific differences in mitochondrial function and gene mutations were detected in a small group of healthy, young Korean males. These results are expected to greatly improve the accurate screening and utilization of SC medicine.
Whole-genome re-sequencing of two Italian tomato landraces reveals sequence variations in genes associated with stress tolerance, fruit quality and long shelf-life traits
Abstract Tomato is a high value crop and the primary model for fleshy fruit development and ripening. Breeding priorities include increased fruit quality, shelf life and tolerance to stresses. To contribute towards this goal, we re-sequenced the genomes of Corbarino (COR) and Lucariello (LUC) landraces, which both possess the traits of plant adaptation to water deficit, prolonged fruit shelf-life and good fruit quality. Through the newly developed pipeline Reconstructor, we generated the genome sequences of COR and LUC using datasets of 65.8 M and 56.4 M of 30–150 bp paired-end reads, respectively. New contigs including reads that could not be mapped to the tomato reference genome were assembled, and a total of 43,054 and 44,579 gene loci were annotated in COR and LUC. Both genomes showed novel regions with similarity to Solanum pimpinellifolium and Solanum pennellii. In addition to small deletions and insertions, 2,000 and 1,700 single nucleotide polymorphisms (SNPs) could exert potentially disruptive effects on 1,371 and 1,201 genes in COR and LUC, respectively. A detailed survey of the SNPs occurring in fruit quality, shelf life and stress tolerance-related genes identified several candidates of potential relevance. Variations in ethylene response components may concur in determining peculiar phenotypes of COR and LUC.
The Characteristics of Heterozygous Protein Truncating Variants in the Human Genome
Author Summary Genome sequencing provides evidence for large numbers of putative protein truncating variants in humans. Most truncating variants are only observed in few individuals but are collectively prevalent and widely distributed across the coding genome. Most of the truncating variants are so rare that they are only observed in heterozygosis. The current study identifies 10% of genes where heterozygous truncations are not observed and describes their biological characteristics. In addition, for genes where rare truncations are observed, we argue that these are an unexplored source of diversity of phenotypic traits and diseases.
Inferred father-to-son transmission of herpes simplex virus results in near-perfect preservation of viral genome identity and in vivo phenotypes
High-throughput sequencing has provided an unprecedented view of the circulating diversity of all classes of human herpesviruses. For herpes simplex virus 1 (HSV-1), we and others have previously published data demonstrating sequence diversity between hosts. However, the extent of variation during transmission events, or in one host over years of chronic infection, remains unknown. Here we present an initial example of full characterization of viruses isolated from a father-to-son transmission event. The likely occasion of transmission occurred 17 years before the strains were isolated, enabling a first view of the degree of virus conservation after decades of recurrences, including transmission and adaptation to a new host. We have characterized the pathogenicity of these strains in a mouse ocular model of infection, and sequenced the full viral genomes. Surprisingly, we find that these two viruses have preserved their phenotype and genotype nearly perfectly during inferred transmission from father to son, and during nearly two decades of episodes of recurrent disease in each human host. Given the close genetic relationship of these two hosts, it remains to be seen whether or not this conservation of sequence will occur during non-familial transmission events.
Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon
Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, ‘SCNU1154’, ‘Edisto47’, ‘MR-1’, and ‘PMR5’. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.
Extensive local adaptation within the chemosensory system following Drosophila melanogaster's global expansion
Fruit flies gain valuable information about their environment by sensing chemicals. Here, Arguello et al. show strong signals of recent selection on the chemosensory system of the fruit fly Drosophila melanogaster, consistent with the adaptation of populations to their local chemical environment.
Whole-Genome Sequencing and Comparative Genome Analysis of Bacillus subtilis Strains Isolated from Non-Salted Fermented Soybean Foods
Bacillus subtilis is the main component in the fermentation of soybeans. To investigate the genetics of the soybean-fermenting B. subtilis strains and their relationship with the productivity of extracellular poly-γ-glutamic acid (γPGA), we sequenced the whole genomes of eight B. subtilis strains isolated from non-salted fermented soybean foods in Southeast Asia. Assembled nucleotide sequences were compared with those of a natto (fermented soybean food) starter strain B. subtilis BEST195 and the laboratory standard strain B. subtilis 168 that is incapable of γPGA production. Detected variants were investigated in terms of insertion sequences, biotin synthesis, production of subtilisin NAT, and regulatory genes for γPGA synthesis, which were related to the fermentation process. Comparing genome sequences, we found that the strains that produce γPGA have a deletion in a protein that constitutes the flagellar basal body, and this deletion was not found in the non-producing strains. We further identified diversity in variants of the bio operon, which is responsible for the biotin auxotrophism of the natto starter strains. Phylogenetic analysis using multilocus sequence typing revealed that the B. subtilis strains isolated from the non-salted fermented soybeans were not clustered together, while the natto-fermenting strains were tightly clustered; this analysis also suggested that the strain isolated from “Tua Nao” of Thailand traces a different evolutionary process from other strains.
Drosophila Adaptation to Viral Infection through Defensive Symbiont Evolution
Author Summary Animals live in close association with microbial partners that can shape many aspects of their lives. For instance, several insects carry bacteria that defend them against parasites and infectious diseases. The intracellular bacterium Wolbachia protects the fruit fly Drosophila melanogaster against viral infection. Natural populations of Drosophila carry different variants of Wolbachia, which differ from one another in the strength of this protection. Here we show that a population of Drosophila infected with viruses during several generations adapts to this challenge through turnover in Wolbachia composition. The Wolbachia variants that give higher protection to viruses, by increasing fly survival and fecundity upon infection, are strongly selected. This work demonstrates that the interaction of an animal with a pathogen can shape its associated microbial populations. We show that adaptation to pathogens can be achieved not only through selection of resistance on the host proper but also through the evolutionary shaping of its microbial community.
The Pattern and Distribution of Induced Mutations in J. curcas Using Reduced Representation Sequencing
Mutagenesis in combination with Genotyping by Sequencing (GBS) is a powerful tool for introducing variation, studying gene function and identifying causal mutations underlying phenotypes of interest in crop plant genomes. About 400 million paired-end reads were obtained from 82 ethyl methanesulfonate (EMS) induced mutants and 14 wild-type accessions of Jatropha curcas for the detection of Single Nucleotide Polymorphisms (SNPs) and Insertion/Deletions (InDels) by two different approaches (nGBS and ddGBS) on an Illumina HiSeq 2000 sequencer. Using bioinformatics analyses, 1,452 induced SNPs and InDels were identified in coding regions, which were distributed across 995 genes. The predominantly observed mutations were G/C to A/T transitions (64%), while transversions were observed at a lower frequency (36%). Regarding the effect of mutations on gene function, 18% of the mutations were located in intergenic regions. In fact, mutants with the highest number of heterozygous SNPs were found in samples treated with 0.8% EMS for 3 h. Reconstruction of the metabolic pathways showed that in total 16 SNPs were located in six KEGG pathways by nGBS and two pathways by ddGBS. The most highly represented pathways were ether-lipid metabolism and glycerophospholipid metabolism, followed by starch and sucrose metabolism by nGBS and triterpenoid biosynthesis as well as steroid biosynthesis by ddGBS. Furthermore, high genome methylation was observed in J. curcas, which might help to understand the plasticity of the Jatropha genome in response to environmental factors. Finally, the results showed that continuous vegetative propagation of tissue is a fast, efficient and accurate method to dissolve chimeras, especially for long-lived plants like J. curcas. The data obtained showed that allelic variations in genes that control important traits could be identified in mutant populations using nGBS and ddGBS, together with in silico gene function prediction. However, the handling of GBS data is more challenging than the traditional TILLING strategy in mutated plants, since the Jatropha genome sequence is incomplete, which makes alignment and variant analysis of target sequence reads difficult to perform and interpret. Therefore, providing a complete, high-quality Jatropha reference genome sequence should be a priority for any breeding program.
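The transition/transversion split reported in the abstract above can be illustrated with a minimal sketch. It assumes SNPs are available as simple (reference base, alternate base) pairs; the function names `classify_snp` and `substitution_fractions` are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter

# Purines and pyrimidines; a substitution within one class is a transition,
# a substitution between the two classes is a transversion.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_snp(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def substitution_fractions(snps):
    """Fraction of transitions and transversions in an iterable of (ref, alt) pairs."""
    counts = Counter(classify_snp(ref, alt) for ref, alt in snps)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no SNPs given")
    return {kind: counts[kind] / total for kind in ("transition", "transversion")}


if __name__ == "__main__":
    # Hypothetical example calls, not data from the study.
    example = [("G", "A"), ("C", "T"), ("A", "C"), ("G", "C")]
    print(substitution_fractions(example))
```

G>A and C>T changes, the G/C to A/T transitions the abstract describes as predominant, both fall in the transition class under this rule.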
A Novel Mutation in the Transglutaminase-1 Gene in an Autosomal Recessive Congenital Ichthyosis Patient
Structure-function implication on a novel homozygous Trp250/Gly mutation of transglutaminase-1 (TGM1) observed in a patient of autosomal recessive congenital ichthyosis is invoked from a bioinformatics analysis. Structural consequences of this mutation are hypothesized in comparison to homologous enzyme human factor XIIIA accepted as valid in similar structural analysis and are projected as guidelines for future studies at an experimental level on TGM1 thus mutated.
Mutational landscapes of tongue carcinoma reveal recurrent mutations in genes of therapeutic and prognostic relevance
Background Carcinoma of the oral tongue (OTSCC) is the most common malignancy of the oral cavity, characterized by frequent recurrence and poor survival. The last three decades have witnessed a change in the OTSCC epidemiological profile, with increasing incidence in younger patients, females and never-smokers. Here, we sought to characterize the OTSCC genomic landscape and to determine factors that may delineate the genetic basis of this disease, inform prognosis and identify targets for therapeutic intervention. Methods Seventy-eight cases were subjected to whole-exome (n = 18) and targeted deep sequencing (n = 60). Results While the most common mutation was in TP53, the OTSCC genetic landscape differed from previously described cohorts of patients with head and neck tumors: OTSCCs demonstrated frequent mutations in DST and RNF213, while alterations in CDKN2A and NOTCH1 were significantly less frequent. Despite a lack of previously reported NOTCH1 mutations, integrated analysis showed enrichments of alterations affecting Notch signaling in OTSCC. Importantly, these Notch pathway alterations were prognostic on multivariate analyses. A high proportion of OTSCCs also presented with alterations in drug targetable and chromatin remodeling genes. Patients harboring mutations in actionable pathways were more likely to succumb to recurrent disease compared with those who did not, suggesting that the former should be considered for treatment with targeted compounds in future trials. Conclusions Our study defines the Asian OTSCC mutational landscape, highlighting the key role of Notch signaling in oral tongue tumorigenesis. We also observed somatic mutations in multiple therapeutically relevant genes, which may represent candidate drug targets in this highly lethal tumor type. Electronic supplementary material The online version of this article (doi:10.1186/s13073-015-0219-2) contains supplementary material, which is available to authorized users.
Increased Missense Mutation Burden of Fatty Acid Metabolism Related Genes in Nunavik Inuit Population
Background Nunavik Inuit (northern Quebec, Canada) reside along the arctic coastline where for generations their daily energy intake has mainly been derived from animal fat. Given this particular diet it has been hypothesized that natural selection would lead to population specific allele frequency differences and unique variants in genes related to fatty acid metabolism. A group of genes, namely CPT1A, CPT1B, CPT1C, CPT2, CRAT and CROT, encode three carnitine acyltransferases that are important for the oxidation of fatty acids, a critical step in their metabolism. Methods Exome sequencing and SNP array genotyping were used to examine the genetic variations in the six genes encoding the carnitine acyltransferases in 113 Nunavik Inuit individuals. Results Altogether ten missense variants were found in genes CPT1A, CPT1B, CPT1C, CPT2 and CRAT, including three novel variants and one Inuit specific variant CPT1A p.P479L (rs80356779). The latter has the highest frequency (0.955) compared to other Inuit populations. We found that by comparison to Asians or Europeans, the Nunavik Inuit have an increased mutation burden in CPT1A, CPT2 and CRAT; there is also a high level of population differentiation based on carnitine acyltransferase gene variations between Nunavik Inuit and Asians. Conclusion The increased number and frequency of deleterious variants in these fatty acid metabolism genes in Nunavik Inuit may be the result of genetic adaptation to their diet and/or the extremely cold climate. In addition, the identification of these variants may help to understand some of the specific health risks of Nunavik Inuit.
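A variant frequency such as the one quoted above is an allele frequency over a diploid sample. The sketch below shows the standard calculation from genotype counts; the function name `alt_allele_frequency` is hypothetical, and the counts in the example are invented for illustration, not taken from the study.

```python
def alt_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Frequency of the alternate allele in a diploid sample.

    Each individual carries two alleles: heterozygotes contribute one
    alternate allele and alternate homozygotes contribute two.
    """
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals <= 0:
        raise ValueError("at least one genotyped individual is required")
    if min(n_hom_ref, n_het, n_hom_alt) < 0:
        raise ValueError("genotype counts must be non-negative")
    return (n_het + 2 * n_hom_alt) / (2 * n_individuals)


if __name__ == "__main__":
    # Hypothetical genotype counts for illustration only.
    print(alt_allele_frequency(n_hom_ref=10, n_het=20, n_hom_alt=70))
```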
Accurate Breakpoint Mapping in Apparently Balanced Translocation Families with Discordant Phenotypes Using Whole Genome Mate-Pair Sequencing
Familial apparently balanced translocations (ABTs) segregating with discordant phenotypes are extremely challenging for interpretation and counseling due to the scarcity of publications and lack of routine techniques for quick investigation. Recently, next generation sequencing has emerged as an efficacious methodology for precise detection of translocation breakpoints. However, studies so far have mainly focused on de novo translocations. The present study focuses specifically on familial cases in order to shed some light on this diagnostic dilemma. Whole-genome mate-pair sequencing (WG-MPS) was applied to map the breakpoints in nine two-way ABT carriers from four families. Translocation breakpoints and patient-specific structural variants were validated by Sanger sequencing and quantitative real-time PCR, respectively. Identical sequencing patterns and breakpoints were identified in affected and non-affected members carrying the same translocations. PTCD1, ATP5J2-PTCD1, CADPS2, and STPG1 were disrupted by the translocations in three families, rendering them initially as possible disease candidate genes. However, subsequent mutation screening and structural variant analysis did not reveal any pathogenic mutations or unique variants in the affected individuals that could explain the phenotypic differences between carriers of the same translocations. In conclusion, we suggest that NGS-based methods, such as WG-MPS, can be successfully used for detailed mapping of translocation breakpoints, which can also be used in routine clinical investigation of ABT cases. Unlike de novo translocations, no associations were determined here between familial two-way ABTs and the phenotype of the affected members, in which the presence of cryptic imbalances and complex chromosomal rearrangements has been excluded. Future whole-exome or whole-genome sequencing will potentially reveal unidentified mutations in the patients underlying the discordant phenotypes within each family. In addition, larger studies are needed to determine the exact percentage for phenotypic risk in families with ABTs.
The rise and fall of the Phytophthora infestans lineage that triggered the Irish potato famine
eLife digest Few crop failures have been as devastating as those caused by potato late blight in the 1840s. This disease is caused by a filamentous microbe called Phytophthora infestans, which spread from North America to Europe in 1845, leading to the Great Famine in Ireland and to severe crop losses in the rest of Europe. Phytophthora is thought to have originated in the Toluca valley of Mexico, where many different strains evolve alongside wild potato relatives, but the exact strain that caused the Great Famine, and how it is related to modern strains of the pathogen, has remained a mystery. Yoshida et al. have used a technique called ‘shotgun’ sequencing to map the genomes of 11 historical strains of P. infestans and 15 modern strains. The historical strains were extracted from the leaves of potato and tomato plants that were collected in North America and Europe, including Ireland and Great Britain, from 1845 onwards and stored in herbaria for future research. By comparing the genomes of the historical and modern samples, Yoshida et al. found that the historical strains all belonged to a single lineage that shows very little genetic diversity. It had previously been proposed either that this lineage was the same as US-1, which was the dominant strain of potato blight in the world until the end of the 1970s, or that it was more closely related to modern strains than to US-1. Yoshida et al. now rule out both of these possibilities and show that the lineage that caused the Great Famine, which they call HERB-1, is clearly distinct from US-1, although the two are closely related. They conclude that both HERB-1 and US-1 might have dispersed from a common ancestor that existed outside of Mexico in the early 1800s. Why US-1 later replaced HERB-1 as the dominant strain in the world is an important question for future studies.
DOI: http://dx.doi.org/10.7554/eLife.00731.002
Comparative profiling of differentially expressed microRNAs between the follicular and luteal phases ovaries of goats
To explore whether post-transcriptional regulation of the follicular phase (Fols) to luteal phase (Luts) transition occurs in the ovaries of Anhuai goats, the differentially expressed microRNAs (miRNAs) of ovaries in the Fols and Luts were analyzed using Solexa sequencing. In total, 320 known miRNAs were co-expressed in the two phases, and 339 and 353 known miRNAs were expressed in the ovary in the Fols and Luts, respectively. In addition, 45 novel miRNAs were co-expressed in the two phases, and 70 and 94 novel miRNAs were expressed in the ovary in the Fols and Luts, respectively. Let-7f was the most highly expressed known miRNA that differed significantly between the two phases, and mir-159 was the most highly expressed novel miRNA that differed significantly between them; these miRNAs may participate in the follicular-luteal transition of Anhuai goats. GO annotation and KEGG pathway analysis were applied to analyze the target genes of differentially expressed miRNAs detected in the two phases. The results will help to further understand the role of miRNAs in the regulation of follicular to luteal transition in goat ovaries. Electronic supplementary material The online version of this article (doi:10.1186/s40064-016-2902-1) contains supplementary material, which is available to authorized users.
Genomic Features of the Damselfly Calopteryx splendens Representing a Sister Clade to Most Insect Orders
Insects comprise the most diverse and successful animal group with over one million described species that are found in almost every terrestrial and limnic habitat, with many being used as important models in genetics, ecology, and evolutionary research. Genome sequencing projects have greatly expanded the sampling of species from many insect orders, but genomic resources for species of certain insect lineages have remained relatively limited to date. To address this paucity, we sequenced the genome of the banded demoiselle, Calopteryx splendens, a damselfly (Odonata: Zygoptera) belonging to Palaeoptera, the clade containing the first winged insects. The 1.6 Gbp C. splendens draft genome assembly is one of the largest insect genomes sequenced to date and encodes a predicted set of 22,523 protein-coding genes. Comparative genomic analyses with other sequenced insects identified a relatively small repertoire of C. splendens detoxification genes, which could explain its previously noted sensitivity to habitat pollution. Intriguingly, this repertoire includes a cytochrome P450 gene not previously described in any insect genome. The C. splendens immune gene repertoire appears relatively complete and features several genes encoding novel multi-domain peptidoglycan recognition proteins. Analysis of chemosensory genes revealed the presence of both gustatory and ionotropic receptors, as well as the insect odorant receptor coreceptor gene (OrCo) and at least four partner odorant receptors (ORs). This represents the oldest known instance of a complete OrCo/OR system in insects, and provides the molecular underpinning for odonate olfaction. The C. splendens genome improves the sampling of insect lineages that diverged before the radiation of Holometabola and offers new opportunities for molecular-level evolutionary, ecological, and behavioral studies. 
Whole transcriptome RNA-Seq analysis reveals extensive cell type-specific compartmentalization in Volvox carteri
Background One of evolution’s most important achievements is the development and radiation of multicellular organisms with different types of cells. Complex multicellularity has evolved several times in eukaryotes; yet, in most lineages, an investigation of its molecular background is considerably challenging since the transition occurred too far in the past and, in addition, these lineages evolved a large number of cell types. However, for volvocine green algae, such as Volvox carteri, multicellularity is a relatively recent innovation. Furthermore, V. carteri shows a complete division of labor between only two cell types – small, flagellated somatic cells and large, immotile reproductive cells. Thus, V. carteri provides a unique opportunity to study multicellularity and cellular differentiation at the molecular level. Results This study provides a whole transcriptome RNA-Seq analysis of separated cell types of the multicellular green alga V. carteri f. nagariensis to reveal cell type-specific components and functions. To this end, 246 million quality filtered reads were mapped to the genome and valid expression data were obtained for 93% of the 14,247 gene loci. In the subsequent search for protein domains with assigned molecular function, we identified 9435 previously classified domains in 44% of all gene loci. Furthermore, in 43% of all gene loci we identified 15,254 domains that are involved in biological processes. All identified domains were investigated regarding cell type-specific expression. Moreover, we provide further insight into the expression pattern of previously described gene families (e.g., pherophorin, extracellular matrix metalloprotease, and VARL families). Our results demonstrate an extensive compartmentalization of the transcriptome between cell types: More than half of all genes show a clear difference in expression between somatic and reproductive cells. 
Conclusions This study constitutes the first transcriptome-wide RNA-Seq analysis of separated cell types of V. carteri focusing on gene expression. The high degree of differential expression indicates a strong differentiation of cell types despite the fact that V. carteri diverged relatively recently from its unicellular relatives. Our expression dataset and the bioinformatic analyses provide the opportunity to further investigate and understand the mechanisms of cell type-specific expression and its transcriptional regulation. Electronic supplementary material The online version of this article (doi:10.1186/s12915-017-0450-y) contains supplementary material, which is available to authorized users.
Genome-wide association study identifies multiple susceptibility loci for diffuse large B-cell lymphoma
Exploring the miRNA-mRNA Regulatory Network in Clear Cell Renal Cell Carcinomas by Next-Generation Sequencing Expression Profiles
Altered microRNA (miRNA) expression is a hallmark of many cancer types. The combined analysis of miRNA and messenger RNA (mRNA) expression profiles is crucial to identifying links between deregulated miRNAs and oncogenic pathways. Therefore, we investigated the small non-coding (snc) transcriptomes of nine clear cell renal cell carcinomas (ccRCCs) and adjacent normal tissues for alterations in miRNA expression using a publicly available small RNA-Sequencing (sRNA-Seq) raw dataset. We constructed a network of deregulated miRNAs and a set of differentially expressed genes, publicly available from an independent study, to determine in silico which miRNAs contribute to clear cell renal cell carcinogenesis. From a total of 1,672 sncRNAs, 61 were differentially expressed across all ccRCC tissue samples. Several have known implications in ccRCC development, such as the upregulated miR-21-5p and miR-142-5p and the downregulated miR-106a-5p, miR-135a-5p, and miR-206. Additionally, novel promising candidates such as miR-3065, which targets NRP2 and FLT1 among others, were detected in this study. Interaction network analysis revealed pivotal roles for miR-106a-5p, whose loss might contribute to the upregulation of 49 target mRNAs, miR-135a-5p (32 targets), miR-206 (28 targets), miR-363-3p (22 targets), and miR-216b (13 targets). Among these targets are the angiogenesis-, metastasis-, and motility-promoting oncogenes c-MET, VEGFA, NRP2, and FLT1, the latter two coding for VEGFA receptors.
Characterization and genomic analysis of kraft lignin biodegradation by the beta-proteobacterium Cupriavidus basilensis B-8
Background Lignin materials are abundant and among the most important potential sources for biofuel production. Development of an efficient lignin degradation process has considerable potential for the production of a variety of chemicals, including bioethanol. However, lignin degradation using current methods is inefficient. Given their immense environmental adaptability and biochemical versatility, bacteria could be used as a valuable tool for the rapid degradation of lignin. Kraft lignin (KL) is a polymer by-product of the pulp and paper industry resulting from alkaline sulfide treatment of lignocellulose, and it has been widely used for lignin-related studies. Results Beta-proteobacterium Cupriavidus basilensis B-8 isolated from erosive bamboo slips displayed substantial KL degradation capability. With initial concentrations of 0.5–6 g L⁻¹, at least 31.3% KL could be degraded in 7 days. The maximum degradation rate was 44.4% at the initial concentration of 2 g L⁻¹. The optimum pH and temperature for KL degradation were 7.0 and 30°C, respectively. Manganese peroxidase (MnP) and laccase (Lac) demonstrated their greatest level of activity, 1685.3 U L⁻¹ and 815.6 U L⁻¹, at the third and fourth days, respectively. Many small molecule intermediates were formed during the process of KL degradation, as determined using GC-MS analysis. In order to perform metabolic reconstruction of lignin degradation in this bacterium, a draft genome sequence for C. basilensis B-8 was generated. Genomic analysis focused on the catabolic potential of this bacterium against several lignin-derived compounds. These analyses together with sequence comparisons predicted the existence of three major metabolic pathways: β-ketoadipate, phenol degradation, and gentisate pathways. Conclusion These results confirmed the capability of C. basilensis B-8 to promote KL degradation. Whole genomic sequencing and systematic analysis of the C. basilensis B-8 genome identified degradation steps and intermediates from this bacterial-mediated KL degradation method. Our findings provide a theoretical basis for research into the mechanisms of lignin degradation as well as a practical basis for biofuel production using lignin materials.
Complete Genome Sequence of the Myoviral Bacteriophage YS35, Which Causes the Lysis of a Multidrug-Resistant Pseudomonas aeruginosa Strain
ABSTRACT The lytic bacteriophage YS35, which is capable of lysing multidrug-resistant Pseudomonas aeruginosa strains, was isolated from a sewage sample. Here, we describe the complete genome sequence of this myoviral bacteriophage, which contains 93,296 bp of double-stranded DNA and has a 49.4% G+C content.
The Adaptation of Temperate Bacteriophages to Their Host Genomes
Rapid turnover of mobile elements drives the plasticity of bacterial genomes. Integrated bacteriophages (prophages) encode host-adaptive traits and represent a sizable fraction of bacterial chromosomes. We hypothesized that natural selection shapes prophage integration patterns relative to the host genome organization. We tested this idea by detecting and studying 500 prophages of 69 strains of Escherichia and Salmonella. Phage integrases often target not only conserved genes but also intergenic positions, suggesting purifying selection for integration sites. Furthermore, most integration hotspots are conserved between the two host genera. Integration sites also seem to be selected at the large chromosomal scale, as they are nonrandomly organized in terms of the origin–terminus axis and the macrodomain structure. The genes of lambdoid prophages are systematically co-oriented with the bacterial replication fork and display the host's high frequency of polarized FtsK-orienting polar sequence motifs required for chromosome segregation. matS motifs are strongly avoided by prophages, suggesting counter selection of motifs disrupting macrodomains. These results show how natural selection for seamless integration of prophages in the chromosome shapes the evolution of the bacterium and the phage. First, integration sites are highly conserved for many millions of years, favoring lysogeny over the lytic cycle for temperate phages. Second, the global distribution of prophages is intimately associated with the chromosome structure and the patterns of gene expression. Third, the phage endures selection for DNA motifs that pertain exclusively to the biology of the prophage in the bacterial chromosome. Understanding prophage genetic adaptation sheds new light on the coexistence of horizontal transfer and organized bacterial genomes.
Convergent Evolution of Chicken Z and Human X Chromosomes by Expansion and Gene Acquisition
In birds, as in mammals, one pair of chromosomes differs between the sexes. In birds, males are ZZ and females ZW. In mammals, males are XY and females XX. Like the mammalian XY pair, the avian ZW pair is believed to have evolved from autosomes, with most change occurring in the chromosomes found in only one sex, the W and Y chromosomes [1–5]. By contrast, the sex chromosomes found in both sexes, the Z and X chromosomes, are assumed to have diverged little from their autosomal progenitors [2]. Here we report findings that overturn this assumption for both the chicken Z and human X chromosomes. The chicken Z chromosome, which we sequenced essentially to completion, is less gene-dense than chicken autosomes but contains a massive tandem array containing hundreds of duplicated genes expressed in testes. A comprehensive comparison of the chicken Z chromosome to the finished sequence of the human X chromosome demonstrates that each evolved independently from different portions of the ancestral genome. Despite this independence, the chicken Z and human X chromosomes share features that distinguish them from autosomes: the acquisition and amplification of testis-expressed genes, as well as a low gene density resulting from an expansion of intergenic regions. These features were not present on the autosomes from which the Z and X chromosomes originated but were instead acquired during the evolution of the Z and X as sex chromosomes. We conclude that the avian Z and mammalian X chromosomes followed convergent evolutionary trajectories, despite their evolving with opposite (female vs. male) systems of heterogamety. More broadly, in birds and mammals, sex chromosome evolution involved not only gene loss in sex-specific chromosomes, but also marked expansion and gene acquisition in sex chromosomes common to males and females.