Variant discovery in the sheep milk transcriptome using RNA sequencing
Background The identification of genetic variation underlying desired phenotypes is one of the main challenges of current livestock genetic research. High-throughput transcriptome sequencing (RNA-Seq) offers new opportunities for the detection of transcriptome variants (SNPs and short indels) in different tissues and species. In this study, we used RNA-Seq on Milk Sheep Somatic Cells (MSCs) with the goal of characterizing the genetic variation within the coding regions of the milk transcriptome in Churra and Assaf sheep, two common dairy sheep breeds farmed in Spain. Results A total of 216,637 variants were detected in the MSCs transcriptome of the eight ewes analyzed. Among them, a total of 57,795 variants were detected in the regions harboring Quantitative Trait Loci (QTL) for milk yield, protein percentage and fat percentage, of which 21.44% were novel variants. Among the total variants detected, 561 (2.52%) and 1,649 (7.42%) were predicted to produce high or moderate impact changes in the corresponding transcriptional unit, respectively. In the functional enrichment analysis of the genes positioned within selected QTL regions harboring novel relevant functional variants (high and moderate impact), the KEGG pathway with the highest enrichment was “protein processing in endoplasmic reticulum”. Additionally, a total of 504 and 1,063 variants were identified in the genes encoding principal milk proteins and molecules involved in the lipid metabolism, respectively. Of these variants, 20 mutations were found to have putative relevant effects on the encoded proteins. Conclusions We present herein the first transcriptomic approach aimed at identifying genetic variants of the genes expressed in the lactating mammary gland of sheep. Through the transcriptome analysis of variability within regions harboring QTL for milk yield, protein percentage and fat percentage, we have found several pathways and genes that harbor mutations that could affect dairy production traits. 
Moreover, remarkable variants were also found in candidate genes coding for major milk proteins and proteins related to milk fat metabolism. Several of the SNPs found in this study could be included as suitable markers in genotyping platforms or custom SNP arrays to perform association analyses in commercial populations and apply genomic selection protocols in the dairy production industry. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3581-1) contains supplementary material, which is available to authorized users.
Comparative analysis of a sex chromosome from the blackchin tilapia, Sarotherodon melanotheron
Background Inversions and other structural polymorphisms often reduce the rate of recombination between sex chromosomes, making it impossible to fine map sex-determination loci using traditional genetic mapping techniques. Here we compare distantly related species of tilapia that each segregate an XY system of sex-determination on linkage group 1. We use whole genome sequencing to identify shared sex-patterned polymorphisms, which are candidates for the ancestral sex-determination mutation. Results We found that Sarotherodon melanotheron segregates an XY system on LG1 in the same region identified in Oreochromis niloticus. Both species have higher densities of sex-patterned SNPs, as well as an elevated number of ancestral copy number variants, in this region when compared to the rest of the genome, but the pattern of differentiation along LG1 differs between species. The number of sex-patterned SNPs shared by the two species is small, but larger than expected by chance, suggesting that a novel Y-chromosome arose just before the divergence of the two species. We identified a shared sex-patterned SNP that alters a Gata4 binding site near Wilms tumor protein that might be responsible for sex-determination. Conclusions Shared sex-patterned SNPs, insertions and deletions suggest an ancestral sex-determination system that is common to both S. melanotheron and O. niloticus. Functional analyses are needed to evaluate shared SNPs near candidate genes that might play a role in sex-determination of these species. Interspecific variation in the sex chromosomes of tilapia species provides an excellent model system for understanding the evolution of vertebrate sex chromosomes. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3163-7) contains supplementary material, which is available to authorized users.
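The idea of a "sex-patterned" SNP under an XY system can be illustrated with a minimal sketch. It assumes individual diploid genotype calls are available for each sex, which the abstract does not state; the function names and the genotype representation are hypothetical and are not taken from the study's pipeline.

```python
def is_xy_patterned(male_gts, female_gts):
    """Return True if a site looks XY-patterned.

    Assumption (illustrative only): each genotype is a pair of allele
    strings, e.g. ("A", "G"). Under an XY system we expect every male to
    be heterozygous and every female to be homozygous for one shared allele.
    """
    if not male_gts or not female_gts:
        return False
    males_het = all(a != b for a, b in male_gts)
    female_alleles = {allele for gt in female_gts for allele in gt}
    if not males_het or len(female_alleles) != 1:
        return False
    x_allele = next(iter(female_alleles))
    # Every male must also carry the allele fixed in females (the X allele).
    return all(x_allele in gt for gt in male_gts)


def shared_sex_patterned(sites_a, sites_b):
    """Sites, given as (linkage_group, position), patterned in both species."""
    return sorted(set(sites_a) & set(sites_b))
```

Intersecting the per-species site lists, as in `shared_sex_patterned`, is one simple way to arrive at candidates for an ancestral mutation; it presumes that coordinates from both species refer to the same reference assembly.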
MME mutation in dominant spinocerebellar ataxia with neuropathy (SCA43)
Objective: To identify the causative gene mutation in a 5-generation Belgian family with dominantly inherited spinocerebellar ataxia and polyneuropathy, in which known genetic etiologies had been excluded. Methods: We collected DNA samples of 28 family members, including 7 living affected individuals, whose clinical records were reviewed by a neurologist experienced in ataxia. We combined linkage data of 21 family members with whole exome sequencing in 2 affected individuals to identify shared heterozygous variants mapping to potentially linked regions. Variants were screened for rarity and for predicted damaging effect. A candidate mutation was confirmed by Sanger sequencing and tested for cosegregation with the disease. Results: Affected individuals presented with late-onset sensorimotor axonal polyneuropathy; all but one also had cerebellar ataxia. We identified a variant in the MME gene, p.C143Y, that was absent from control databases, cosegregated with the phenotype, and was predicted to have a strong damaging effect on the encoded protein by all algorithms we used. Conclusions: MME encodes neprilysin (NEP), a zinc-dependent metalloprotease expressed in most tissues, including the central and peripheral nervous systems. The mutated cysteine 143 forms a disulfide bridge, which is 100% conserved in NEP and in similar enzymes. The recent identification of recessive MME mutations in 10 unrelated individuals from Japan with axonal polyneuropathy further supports the causality of the mutation, despite the dominant mode of inheritance and the presence of cerebellar involvement in our study family. Functional studies are needed to identify the mechanisms underlying these differences.
Rare variants of small effect size in neuronal excitability genes influence clinical outcome in Japanese cases of SCN1A truncation positive Dravet syndrome
Dravet syndrome (DS) is a rare, devastating form of childhood epilepsy that is often associated with mutations in the voltage-gated sodium channel gene, SCN1A. There is considerable variability in expressivity within families, as well as among individuals carrying the same primary mutation, suggesting that clinical outcome is modulated by variants at other genes. To identify modifier gene variants that contribute to clinical outcome, we sequenced the exomes of 22 individuals at both ends of a phenotype distribution (i.e., mild and severe cognitive condition). We controlled for variation associated with different mutation types by limiting inclusion to individuals with a de novo truncation mutation resulting in SCN1A haploinsufficiency. We performed tests aimed at identifying 1) single common variants that are enriched in either phenotypic group, 2) sets of common or rare variants aggregated in and around genes associated with clinical outcome, and 3) rare variants in 237 candidate genes associated with neuronal excitability. While our power to identify enrichment of a common variant in either phenotypic group is limited as a result of the rarity of mild phenotypes in individuals with SCN1A truncation variants, our top candidates did not map to functional regions of genes, or in genes that are known to be associated with neurological pathways. In contrast, we found a statistically-significant excess of rare variants predicted to be damaging and of small effect size in genes associated with neuronal excitability in severely affected individuals. A KCNQ2 variant previously associated with benign neonatal seizures is present in 3 of 12 individuals in the severe category. To compare our results with the healthy population, we performed a similar analysis on whole exome sequencing data from 70 Japanese individuals in the 1000 genomes project. 
Interestingly, the frequency of rare damaging variants in the same set of neuronal excitability genes in healthy individuals is nearly as high as in severely affected individuals. Rather than a single common gene/variant modifying clinical outcome in SCN1A-related epilepsies, our results point to the cumulative effect of rare variants with little to no measurable phenotypic effect (i.e., typical genetic background) unless present in combination with a disease-causing truncation mutation in SCN1A.
A de novo missense mutation of FGFR2 causes facial dysplasia syndrome in Holstein cattle
Background Surveillance for bovine genetic diseases in Denmark identified a hitherto unreported congenital syndrome occurring among progeny of a Holstein sire used for artificial breeding. A genetic aetiology due to a dominant inheritance with incomplete penetrance or a mosaic germline mutation was suspected as all recorded cases were progeny of the same sire. Detailed investigations were performed to characterize the syndrome and to reveal its cause. Results Seven malformed calves were submitted for examination. All cases shared a common morphology with the most striking lesions being severe facial dysplasia and complete prolapse of the eyes. Consequently the syndrome was named facial dysplasia syndrome (FDS). Furthermore, extensive brain malformations, including microencephaly, hydrocephalus, lobation of the cerebral hemispheres and compression of the brain were present. Subsequent data analysis of progeny of the sire revealed that around 0.5% of his offspring suffered from FDS. High density single nucleotide polymorphism (SNP) genotyping data of the seven cases and their parents were used to map the defect in the bovine genome. Significant genetic linkage was obtained for three regions, including chromosome 26 where whole genome sequencing of a case-parent trio revealed two de novo variants perfectly associated with the disease: an intronic SNP in the DMBT1 gene and a single non-synonymous variant in the FGFR2 gene. This FGFR2 missense variant (c.927G>T) affects a gene encoding a member of the fibroblast growth factor receptor family, where amino acid sequence is highly conserved between members and across species. It is predicted to change an evolutionarily conserved tryptophan into a cysteine residue (p.Trp309Cys). Both variant alleles were proven to result from de novo mutation events in the germline of the sire. Conclusions FDS is a novel genetic disorder of Holstein cattle. 
Mutations in the human FGFR2 gene are associated with various dominant inherited craniofacial dysostosis syndromes. Given the phenotypic similarities in FDS affected calves, the genetic mapping and absence of further high impact variants in the critical genome regions, it is highly likely that the missense mutation in the FGFR2 gene caused the FDS phenotype in a dominant mode of inheritance. Electronic supplementary material The online version of this article (doi:10.1186/s12863-017-0541-3) contains supplementary material, which is available to authorized users.
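The relationship between the reported coding change c.927G>T and the protein change p.Trp309Cys can be checked with a little codon arithmetic. The sketch below assumes the standard genetic code and that position 927 is counted from the first base of the start codon; the helper names are hypothetical and only a small subset of the codon table is included.

```python
# Subset of the standard genetic code, enough for this check.
CODONS = {"TGG": "Trp", "TGT": "Cys", "TGC": "Cys", "TGA": "Stop"}


def codon_of(cds_pos):
    """Map a 1-based coding position to (codon number, position in codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def substitute(codon, pos_in_codon, ref, alt):
    """Apply a single-base substitution after checking the reference base."""
    if codon[pos_in_codon - 1] != ref:
        raise ValueError("reference base does not match codon")
    return codon[: pos_in_codon - 1] + alt + codon[pos_in_codon:]


codon_number, pos_in_codon = codon_of(927)      # third base of codon 309
# TGG is the only tryptophan codon in the standard code.
mutant = substitute("TGG", pos_in_codon, "G", "T")
print(codon_number, CODONS["TGG"], "->", CODONS[mutant])
```

Because tryptophan has a single codon, a G>T change at its third base can only yield TGT, which encodes cysteine, in agreement with the reported residue change.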
Mutations in the Plasmodium falciparum chloroquine resistance transporter, PfCRT, enlarge the parasite’s food vacuole and alter drug sensitivities
Mutations in the Plasmodium falciparum chloroquine resistance transporter, PfCRT, are the major determinant of chloroquine resistance in this lethal human malaria parasite. Here, we describe P. falciparum lines subjected to selection by amantadine or blasticidin that carry PfCRT mutations (C101F or L272F), causing the development of enlarged food vacuoles. These parasites also have increased sensitivity to chloroquine and some other quinoline antimalarials, but exhibit no or minimal change in sensitivity to artemisinins, when compared with parental strains. A transgenic parasite line expressing the L272F variant of PfCRT confirmed this increased chloroquine sensitivity and enlarged food vacuole phenotype. Furthermore, the introduction of the C101F or L272F mutation into a chloroquine-resistant variant of PfCRT reduced the ability of this protein to transport chloroquine by approximately 93 and 82%, respectively, when expressed in Xenopus oocytes. These data provide, at least in part, a mechanistic explanation for the increased sensitivity of the mutant parasite lines to chloroquine. Taken together, these findings provide new insights into PfCRT function and PfCRT-mediated drug resistance, as well as the food vacuole, which is an important target of many antimalarial drugs.
Mycobacterium leprae genomes from naturally infected nonhuman primates
Author summary Mycobacterium leprae, which causes leprosy in humans, also infects nine-banded armadillos, red squirrels, and nonhuman primates. Genomic data for M. leprae strains from wild armadillos and red squirrels show that humans were responsible for the original introduction of M. leprae to these species. It is not known whether naturally occurring leprosy among nonhuman primates is due to incidental infections from humans or whether nonhuman primates can serve as a host for M. leprae. To this end, we sequenced complete genomes of M. leprae strains from three naturally infected nonhuman primates. Our results suggest that M. leprae strains can be transmitted from humans to nonhuman primates as well as between nonhuman primate species, and thus, other primates might serve as a host for M. leprae in the wild. We also assessed whether wild ring-tailed lemurs from Madagascar and chimpanzees from Uganda showed presence of M. leprae infection. Although these populations tested negative for M. leprae infection, further research on the prevalence of M. leprae in other wild nonhuman primate populations, especially in leprosy-endemic regions, is warranted.
Leprosy is caused by the bacterial pathogens Mycobacterium leprae and Mycobacterium lepromatosis. Apart from humans, animals such as nine-banded armadillos in the Americas and red squirrels in the British Isles are naturally infected with M. leprae. Natural leprosy has also been reported in certain nonhuman primates, but it is not known whether these occurrences are due to incidental infections by human M. leprae strains or by M. leprae strains specific to nonhuman primates. In this study, complete M. leprae genomes from three naturally infected nonhuman primates (a chimpanzee from Sierra Leone, a sooty mangabey from West Africa, and a cynomolgus macaque from The Philippines) were sequenced. Phylogenetic analyses showed that the cynomolgus macaque M. leprae strain is most closely related to a human M. leprae strain from New Caledonia, whereas the chimpanzee and sooty mangabey M. leprae strains belong to a human M. leprae lineage commonly found in West Africa. Additionally, samples from ring-tailed lemurs from the Bezà Mahafaly Special Reserve, Madagascar, and chimpanzees from Ngogo, Kibale National Park, Uganda, were screened using quantitative PCR assays, to assess the prevalence of M. leprae in wild nonhuman primates. However, these samples did not show evidence of M. leprae infection. Overall, this study adds genomic data for nonhuman primate M. leprae strains to the existing M. leprae literature and finds that this pathogen can be transmitted from humans to nonhuman primates as well as between nonhuman primate species. While the prevalence of natural leprosy in nonhuman primates is likely low, nevertheless, future studies should continue to explore the prevalence of leprosy-causing pathogens in the wild.
RNA Seq analysis identifies genes associated with differential reproductive success under drought stress in accessions of wild barley Hordeum spontaneum
Background The evolutionary basis of reproductive success in different environments is of major interest in the study of plant adaptation. Since the reproductive stage is particularly sensitive to drought, genes affecting reproductive success during this stage are key players in the evolution of adaptive mechanisms. We used an ecological genomics approach to investigate the reproductive response of drought-tolerant and sensitive wild barley accessions originating from different habitats in the Levant. Results We sequenced mRNA extracted from spikelets at the flowering stage in drought-treated and control plants. The barley genome was used for a reference-guided assembly and differential expression analysis. Our approach enabled the detection of biological processes affecting grain production under drought stress. We detected novel candidate genes and differentially expressed alleles associated with drought tolerance. Drought associated genes were shown to be more conserved than non-associated genes, and drought-tolerance genes were found to evolve more rapidly than other drought associated genes. Conclusions We show that reproductive success under drought stress is not a habitat-specific trait but a shared physiological adaptation that appeared to evolve recently in the evolutionary history of wild barley. Exploring the genomic basis of reproductive success under stress in crop wild progenitors is expected to have considerable ecological and economic applications. Electronic supplementary material The online version of this article (doi:10.1186/s12870-015-0528-z) contains supplementary material, which is available to authorized users.
De novo derivation of proteomes from transcriptomes for transcript and protein identification
Identification of proteins by tandem mass spectrometry requires a database of the proteins that could be in the sample. This is available for model species (e.g. humans) but not for non-model species. Ideally, for a non-model species the sequencing of expressed mRNA would generate a protein database for mass spectrometry-based identification, allowing detection of genes and proteins using high-throughput sequencing and protein identification technologies. Here we use human cells infected with human adenovirus as a complex and dynamic model to demonstrate that this approach is robust. Our Proteomics Informed by Transcriptomics technique identifies >99% of over 3700 distinct proteins identified using traditional analysis reliant on comprehensive human and adenovirus protein lists. This facilitates high-throughput acquisition of direct evidence for transcripts and proteins in non-model species. Critically, we show this approach can also be used to highlight genes and proteins undergoing dynamic changes in post-transcriptional protein stability.
Complete human CD1a deficiency on Langerhans cells due to a rare point mutation in the coding sequence
New Genetic Loci Associated with Preharvest Sprouting and Its Evaluation Based on the Model Equation in Rice
Preharvest sprouting (PHS) in rice panicles is an important quantitative trait that causes both yield losses and the deterioration of grain quality under unpredictable moisture conditions at the ripening stage. However, the molecular mechanism underlying PHS has not yet been elucidated. Here, we explored the genetic loci associated with PHS in rice and formulated a model regression equation for rapid screening in breeding programs. After re-sequencing 21 representative accessions for PHS and performing enrichment analysis, we found that approximately 20,000 SNPs revealed distinct allelic distributions between PHS-resistant and PHS-susceptible accessions. Of these, 39 candidate SNP loci were selected, including previously reported QTLs. We analyzed the genotypes of 144 rice accessions to determine the association between PHS and the 39 candidate SNP loci, 10 of which were identified as significantly affecting PHS based on allele type. Based on the allele types of the SNP loci, we constructed a regression equation for evaluating PHS, with an R² value of 0.401 in japonica rice. We validated this equation using additional accessions, which exhibited a significant R² value of 0.430 between the predicted values and actual measurements. The newly detected SNP loci and the model equation could facilitate marker-assisted selection to predict PHS in rice germplasm and breeding lines.
Morgagnian cataract resulting from a naturally occurring nonsense mutation elucidates a role of CPAMD8 in mammalian lens development
To investigate the genetic basis of hereditary lens opacities we analyzed 31 cases of bilateral congenital cataract in Red Holstein Friesian cattle. A genome-wide association study revealed a significant association on bovine chromosome 7 at positions 6,166,179 and 12,429,691. Whole genome re-sequencing of one case and four relatives showed a nonsense mutation (g.5995966C>T) in the PZP-like, alpha-2-macroglobulin domain containing 8 (CPAMD8) gene leading to a premature stop codon (CPAMD8 p.Gln74*) associated with cataract development in cattle. With immunohistochemistry we confirmed a physiological expression of CPAMD8 in the ciliary body epithelium of the eye in unaffected cattle, while the protein was not detectable in the ciliary body of cattle with cataracts. RNA expression of CPAMD8 was detected in healthy adult, fetal and cataractous lenses.
Whole genome re-sequencing of two Italian tomato landraces reveals sequence variations in genes associated with stress tolerance, fruit quality and long shelf life traits
Abstract Tomato is a high-value crop and the primary model for fleshy fruit development and ripening. Breeding priorities include increased fruit quality, shelf life and tolerance to stresses. To contribute towards this goal, we re-sequenced the genomes of the Corbarino (COR) and Lucariello (LUC) landraces, which both possess the traits of plant adaptation to water deficit, prolonged fruit shelf life and good fruit quality. Through the newly developed pipeline Reconstructor, we generated the genome sequences of COR and LUC using datasets of 65.8 M and 56.4 M paired-end reads of 30–150 bp, respectively. New contigs including reads that could not be mapped to the tomato reference genome were assembled, and a total of 43,054 and 44,579 gene loci were annotated in COR and LUC, respectively. Both genomes showed novel regions with similarity to Solanum pimpinellifolium and Solanum pennellii. In addition to small deletions and insertions, 2,000 and 1,700 single nucleotide polymorphisms (SNPs) could exert potentially disruptive effects on 1,371 and 1,201 genes in COR and LUC, respectively. A detailed survey of the SNPs occurring in genes related to fruit quality, shelf life and stress tolerance identified several candidates of potential relevance. Variations in ethylene response components may contribute to the distinctive phenotypes of COR and LUC.
The Characteristics of Heterozygous Protein Truncating Variants in the Human Genome
Author Summary Genome sequencing provides evidence for large numbers of putative protein truncating variants in humans. Most truncating variants are observed in only a few individuals but are collectively prevalent and widely distributed across the coding genome. Most of the truncating variants are so rare that they are only observed in the heterozygous state. The current study identifies 10% of genes where heterozygous truncations are not observed and describes their biological characteristics. In addition, for genes where rare truncations are observed, we argue that these are an unexplored source of diversity of phenotypic traits and diseases.
Sequencing projects have identified large numbers of rare stop-gain and frameshift variants in the human genome. As most of these are observed in the heterozygous state, they test a gene's tolerance to haploinsufficiency and dominant loss of function. We analyzed the distribution of truncating variants across 16,260 autosomal protein coding genes in 11,546 individuals. We observed 39,893 truncating variants affecting 12,062 genes, which significantly differed from an expectation of 12,916 genes under a model of neutral de novo mutation (p < 10⁻⁴). Extrapolating this to increasing numbers of sequenced individuals, we estimate that 10.8% of human genes do not tolerate heterozygous truncating variants. An additional 10 to 15% of truncated genes may be rescued by incomplete penetrance or compensatory mutations, or because the truncating variants are of limited functional impact. The study of protein truncating variants delineates the essential genome and, more generally, identifies rare heterozygous variants as an unexplored source of diversity of phenotypic traits and diseases.
Inferred father-to-son transmission of herpes simplex virus results in near-perfect preservation of viral genome identity and in vivo phenotypes
High-throughput sequencing has provided an unprecedented view of the circulating diversity of all classes of human herpesviruses. For herpes simplex virus 1 (HSV-1), we and others have previously published data demonstrating sequence diversity between hosts. However, the extent of variation during transmission events, or in one host over years of chronic infection, remains unknown. Here we present an initial example of full characterization of viruses isolated from a father-to-son transmission event. The likely occasion of transmission occurred 17 years before the strains were isolated, enabling a first view of the degree of virus conservation after decades of recurrences, including transmission and adaptation to a new host. We have characterized the pathogenicity of these strains in a mouse ocular model of infection, and sequenced the full viral genomes. Surprisingly, we find that these two viruses have preserved their phenotype and genotype nearly perfectly during inferred transmission from father to son, and during nearly two decades of episodes of recurrent disease in each human host. Given the close genetic relationship of these two hosts, it remains to be seen whether or not this conservation of sequence will occur during non-familial transmission events.
Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon
Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other crops of the Cucurbitaceae family in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, ‘SCNU1154’, ‘Edisto47’, ‘MR-1’, and ‘PMR5’. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them, 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.
Extensive local adaptation within the chemosensory system following Drosophila melanogaster's global expansion
Fruit flies gain valuable information about their environment by sensing chemicals. Here, Arguello et al. show strong signals of recent selection on the chemosensory system of the fruit fly Drosophila melanogaster, consistent with the adaptation of populations to their local chemical environment.
How organisms adapt to new environments is of fundamental biological interest, but poorly understood at the genetic level. Chemosensory systems provide attractive models to address this problem, because they lie between external environmental signals and internal physiological responses. To investigate how selection has shaped the well-characterized chemosensory system of Drosophila melanogaster, we have analysed genome-wide data from five diverse populations. By couching population genomic analyses of chemosensory protein families within parallel analyses of other large families, we demonstrate that chemosensory proteins are not outliers for adaptive divergence between species. However, chemosensory families often display the strongest genome-wide signals of recent selection within D. melanogaster. We show that recent adaptation has operated almost exclusively on standing variation, and that patterns of adaptive mutations predict diverse effects on protein function. Finally, we provide evidence that chemosensory proteins have experienced relaxed constraint, and argue that this has been important for their rapid adaptation over short timescales.
Whole Genome Sequencing and Comparative Genome Analysis of Bacillus subtilis Strains Isolated from Non-Salted Fermented Soybean Foods
Bacillus subtilis is the main component in the fermentation of soybeans. To investigate the genetics of the soybean-fermenting B. subtilis strains and their relationship with the productivity of extracellular poly-γ-glutamic acid (γPGA), we sequenced the whole genomes of eight B. subtilis strains isolated from non-salted fermented soybean foods in Southeast Asia. Assembled nucleotide sequences were compared with those of a natto (fermented soybean food) starter strain, B. subtilis BEST195, and the laboratory standard strain B. subtilis 168, which is incapable of γPGA production. Detected variants were investigated in terms of insertion sequences, biotin synthesis, production of subtilisin NAT, and regulatory genes for γPGA synthesis, all of which are related to the fermentation process. Comparing genome sequences, we found that the strains that produce γPGA have a deletion in a protein that constitutes the flagellar basal body, and this deletion was not found in the non-producing strains. We further identified diversity in variants of the bio operon, which is responsible for the biotin auxotrophism of the natto starter strains. Phylogenetic analysis using multilocus sequence typing revealed that the B. subtilis strains isolated from the non-salted fermented soybeans were not clustered together, while the natto-fermenting strains were tightly clustered; this analysis also suggested that the strain isolated from “Tua Nao” of Thailand traces a different evolutionary process from other strains.
Drosophila Adaptation to Viral Infection through Defensive Symbiont Evolution
Author Summary Animals live in close association with microbial partners that can shape many aspects of their lives. For instance, several insects carry bacteria that defend them against parasites and infectious diseases. The intracellular bacterium Wolbachia protects the fruit fly Drosophila melanogaster against viral infection. Natural populations of Drosophila carry different variants of Wolbachia, which differ from one another in the strength of this protection. Here we show that a population of Drosophila infected with viruses over several generations adapts to this challenge through turnover in Wolbachia composition. The Wolbachia variants that give higher protection against viruses, by increasing fly survival and fecundity upon infection, are strongly selected. This work demonstrates that the interaction of an animal with a pathogen can shape its associated microbial populations. We show that adaptation to pathogens can be achieved not only through selection of resistance on the host proper but also through the evolutionary shaping of its microbial community.
Microbial symbionts can modulate host interactions with biotic and abiotic factors. Such interactions may affect the evolutionary trajectories of both host and symbiont. Wolbachia protects Drosophila melanogaster against several viral infections and the strength of the protection varies between variants of this endosymbiont. Since Wolbachia is maternally transmitted, its fitness depends on the fitness of its host. Therefore, Wolbachia populations may be under selection when Drosophila is subjected to viral infection. Here we show that in D. melanogaster populations selected for increased survival upon infection with Drosophila C virus there is a strong selection coefficient for specific Wolbachia variants, leading to their fixation. Flies carrying these selected Wolbachia variants have higher survival and fertility upon viral infection when compared to flies with the other variants. These findings demonstrate how the interaction of a host with pathogens shapes the genetic composition of symbiont populations. Furthermore, host adaptation can result from the evolution of its symbionts, with host and symbiont functioning as a single evolutionary unit.
The Pattern and Distribution of Induced Mutations in J. curcas Using Reduced Representation Sequencing
Mutagenesis in combination with Genotyping by Sequencing (GBS) is a powerful tool for introducing variation, studying gene function and identifying causal mutations underlying phenotypes of interest in crop plant genomes. About 400 million paired-end reads were obtained from 82 ethylmethane sulfonate (EMS) induced mutants and 14 wild-type accessions of Jatropha curcas for the detection of Single Nucleotide Polymorphisms (SNPs) and Insertion/Deletions (InDels) by two different approaches (nGBS and ddGBS) on an Illumina HiSeq 2000 sequencer. Using bioinformatics analyses, 1,452 induced SNPs and InDels were identified in coding regions, which were distributed across 995 genes. The predominantly observed mutations were G/C to A/T transitions (64%), while transversions were observed at a lower frequency (36%). Regarding the effect of mutations on gene function, 18% of the mutations were located in intergenic regions. In fact, mutants with the highest number of heterozygous SNPs were found in samples treated with 0.8% EMS for 3 h. Reconstruction of the metabolic pathways showed that in total 16 SNPs were located in six KEGG pathways by nGBS and two pathways by ddGBS. The most highly represented pathways were ether-lipid metabolism and glycerophospholipid metabolism, followed by starch and sucrose metabolism by nGBS and triterpenoid biosynthesis as well as steroid biosynthesis by ddGBS. Furthermore, high genome methylation was observed in J. curcas, which might help to understand the plasticity of the Jatropha genome in response to environmental factors. At last, the results showed that continuously vegetatively propagated tissue is a fast, efficient and accurate method to dissolve chimeras, especially for long-lived plants like J. curcas. Obtained data showed that allelic variations and in silico analyses of gene functions (gene function prediction), which control important traits, could be identified in mutant populations using nGBS and ddGBS. 
However, the handling of GBS data is more difficult and more challenging than the traditional TILLING strategy in mutated plants, since the Jatropha genome sequence is incomplete, which makes alignment and variant analysis of target sequence reads challenging to perform and interpret. Therefore, providing a complete Jatropha reference genome sequence with high quality should be a priority for any breeding program.
A Novel Mutation in the Transglutaminase 1 Gene in an Autosomal Recessive Congenital Ichthyosis Patient
Structure-function implication on a novel homozygous Trp250/Gly mutation of transglutaminase-1 (TGM1) observed in a patient of autosomal recessive congenital ichthyosis is invoked from a bioinformatics analysis. Structural consequences of this mutation are hypothesized in comparison to homologous enzyme human factor XIIIA accepted as valid in similar structural analysis and are projected as guidelines for future studies at an experimental level on TGM1 thus mutated.
Whole Genome Sequencing Revealed Mutations in Two Independent Genes as the Underlying Cause of Retinal Degeneration in an Ashkenazi Jewish Pedigree
Retinitis pigmentosa (RP) causes progressive photoreceptor loss resulting from mutations in over 80 genes. This study identified the genetic cause of RP in three members of a non-consanguineous pedigree. Detailed ophthalmic evaluation was performed in the three affected family members. Whole exome sequencing (WES) and whole genome sequencing (WGS) were performed in the three affected and the two unaffected family members and variants were filtered to detect rare, potentially deleterious variants segregating with disease. WES and WGS did not identify potentially pathogenic variants shared by all three affected members. However, WES identified a previously reported homozygous nonsense mutation in KIZ (c.226C>T, p.Arg76*) in two affected sisters, but not in their affected second cousin. WGS revealed a novel 1.135 kb homozygous deletion in a retina transcript of C21orf2 and a novel 30.651 kb heterozygous deletion in CACNA2D4 in the affected second cousin. The sisters with the KIZ mutation carried no copies of the C21orf2 or CACNA2D4 deletions, while the second cousin with the C21orf2 and CACNA2D4 deletions carried no copies of the KIZ mutation. This study identified two independent, homozygous mutations in genes previously reported in autosomal recessive RP in a non-consanguineous family, and demonstrated the value of WGS when WES fails to identify likely disease-causing mutations.
Mutational landscapes of tongue carcinoma reveal recurrent mutations in genes of therapeutic and prognostic relevance
Background Carcinoma of the oral tongue (OTSCC) is the most common malignancy of the oral cavity, characterized by frequent recurrence and poor survival. The last three decades have witnessed a change in the OTSCC epidemiological profile, with increasing incidence in younger patients, females and never-smokers. Here, we sought to characterize the OTSCC genomic landscape and to determine factors that may delineate the genetic basis of this disease, inform prognosis and identify targets for therapeutic intervention. Methods Seventy-eight cases were subjected to whole-exome (n = 18) and targeted deep sequencing (n = 60). Results While the most common mutation was in TP53, the OTSCC genetic landscape differed from previously described cohorts of patients with head and neck tumors: OTSCCs demonstrated frequent mutations in DST and RNF213, while alterations in CDKN2A and NOTCH1 were significantly less frequent. Despite a lack of previously reported NOTCH1 mutations, integrated analysis showed enrichments of alterations affecting Notch signaling in OTSCC. Importantly, these Notch pathway alterations were prognostic on multivariate analyses. A high proportion of OTSCCs also presented with alterations in drug-targetable and chromatin remodeling genes. Patients harboring mutations in actionable pathways were more likely to succumb to recurrent disease compared with those who did not, suggesting that the former should be considered for treatment with targeted compounds in future trials. Conclusions Our study defines the Asian OTSCC mutational landscape, highlighting the key role of Notch signaling in oral tongue tumorigenesis. We also observed somatic mutations in multiple therapeutically relevant genes, which may represent candidate drug targets in this highly lethal tumor type. Electronic supplementary material The online version of this article (doi:10.1186/s13073-015-0219-2) contains supplementary material, which is available to authorized users.
Increased Missense Mutation Burden of Fatty Acid Metabolism Related Genes in Nunavik Inuit Population
Background Nunavik Inuit (northern Quebec, Canada) reside along the arctic coastline where for generations their daily energy intake has mainly been derived from animal fat. Given this particular diet it has been hypothesized that natural selection would lead to population specific allele frequency differences and unique variants in genes related to fatty acid metabolism. A group of genes, namely CPT1A, CPT1B, CPT1C, CPT2, CRAT and CROT, encode for three carnitine acyltransferases that are important for the oxidation of fatty acids, a critical step in their metabolism. Methods Exome sequencing and SNP array genotyping were used to examine the genetic variations in the six genes encoding for the carnitine acyltransferases in 113 Nunavik Inuit individuals. Results Altogether ten missense variants were found in genes CPT1A, CPT1B, CPT1C, CPT2 and CRAT, including three novel variants and one Inuit specific variant CPT1A p.P479L (rs80356779). The latter has the highest frequency (0.955) compared to other Inuit populations. We found that by comparison to Asians or Europeans, the Nunavik Inuit have an increased mutation burden in CPT1A, CPT2 and CRAT; there is also a high level of population differentiation based on carnitine acyltransferase gene variations between Nunavik Inuit and Asians. Conclusion The increased number and frequency of deleterious variants in these fatty acid metabolism genes in Nunavik Inuit may be the result of genetic adaptation to their diet and/or the extremely cold climate. In addition, the identification of these variants may help to understand some of the specific health risks of Nunavik Inuit.
Fanconi anemia gene variants in therapy related myeloid neoplasms
Accurate Breakpoint Mapping in Apparently Balanced Translocation Families with Discordant Phenotypes Using Whole Genome Mate Pair Sequencing
Familial apparently balanced translocations (ABTs) segregating with discordant phenotypes are extremely challenging for interpretation and counseling due to the scarcity of publications and lack of routine techniques for quick investigation. Recently, next generation sequencing has emerged as an efficacious methodology for precise detection of translocation breakpoints. However, studies so far have mainly focused on de novo translocations. The present study focuses specifically on familial cases in order to shed some light on this diagnostic dilemma. Whole-genome mate-pair sequencing (WG-MPS) was applied to map the breakpoints in nine two-way ABT carriers from four families. Translocation breakpoints and patient-specific structural variants were validated by Sanger sequencing and quantitative real-time PCR, respectively. Identical sequencing patterns and breakpoints were identified in affected and non-affected members carrying the same translocations. PTCD1, ATP5J2-PTCD1, CADPS2, and STPG1 were disrupted by the translocations in three families, rendering them initially as possible disease candidate genes. However, subsequent mutation screening and structural variant analysis did not reveal any pathogenic mutations or unique variants in the affected individuals that could explain the phenotypic differences between carriers of the same translocations. In conclusion, we suggest that NGS-based methods, such as WG-MPS, can be successfully used for detailed mapping of translocation breakpoints, which can also be used in routine clinical investigation of ABT cases. Unlike de novo translocations, no associations were determined here between familial two-way ABTs and the phenotype of the affected members, in which the presence of cryptic imbalances and complex chromosomal rearrangements has been excluded. Future whole-exome or whole-genome sequencing will potentially reveal unidentified mutations in the patients underlying the discordant phenotypes within each family.
In addition, larger studies are needed to determine the exact percentage for phenotypic risk in families with ABTs.
The rise and fall of the Phytophthora infestans lineage that triggered the Irish potato famine
eLife digest Few crop failures have been as devastating as those caused by potato late blight in the 1840s. This disease is caused by a filamentous microbe called Phytophthora infestans, which spread from North America to Europe in 1845, leading to the Great Famine in Ireland and to severe crop losses in the rest of Europe. Phytophthora is thought to have originated in the Toluca valley of Mexico, where many different strains evolve alongside wild potato relatives, but the exact strain that caused the Great Famine, and how it is related to modern strains of the pathogen, has remained a mystery. Yoshida et al. have used a technique called ‘shotgun’ sequencing to map the genomes of 11 historical strains of P. infestans and 15 modern strains. The historical strains were extracted from the leaves of potato and tomato plants that were collected in North America and Europe, including Ireland and Great Britain, from 1845 onwards and stored in herbaria for future research. By comparing the genomes of the historical and modern samples, Yoshida et al. found that the historical strains all belonged to a single lineage that shows very little genetic diversity. Previously it has been proposed that this lineage was the same as US-1, which was the dominant strain of potato blight in the world until the end of the 1970s, or that it was more closely related to modern strains than to US-1. Yoshida et al. now rule out both of these possibilities and show that the lineage that caused the Great Famine, which they call HERB-1, is clearly distinct from US-1, although they are closely related, and they conclude that both HERB-1 and US-1 might have dispersed from a common ancestor that existed outside of Mexico in the early 1800s. Why US-1 later replaced HERB-1 as the dominant strain in the world is an important question for future studies.
DOI: http://dx.doi.org/10.7554/eLife.00731.002
Phytophthora infestans, the cause of potato late blight, is infamous for having triggered the Irish Great Famine in the 1840s. Until the late 1970s, P. infestans diversity outside of its Mexican center of origin was low, and one scenario held that a single strain, US-1, had dominated the global population for 150 years; this was later challenged based on DNA analysis of historical herbarium specimens. We have compared the genomes of 11 herbarium and 15 modern strains. We conclude that the 19th century epidemic was caused by a unique genotype, HERB-1, that persisted for over 50 years. HERB-1 is distinct from all examined modern strains, but it is a close relative of US-1, which replaced it outside of Mexico in the 20th century. We propose that HERB-1 and US-1 emerged from a metapopulation that was established in the early 1800s outside of the species' center of diversity. DOI: http://dx.doi.org/10.7554/eLife.00731.001
Comparative profiling of differentially expressed microRNAs between the follicular and luteal phases ovaries of goats
To explore whether post-transcriptional regulation of the follicular phase (Fols) to luteal phase (Luts) transition occurs in the ovaries of Anhuai goats, the differentially expressed microRNAs (miRNAs) of ovaries in the Fols and Luts were analyzed using Solexa sequencing. In total, 320 known miRNAs were co-expressed in the two phases, and 339 and 353 known miRNAs were expressed in the ovary in the Fols and Luts, respectively. In addition, 45 novel miRNAs were co-expressed in the two phases, and 70 and 94 novel miRNAs were expressed in the ovary in the Fols and Luts, respectively. Let-7f was the most highly expressed of the significantly different known miRNAs in the two phases, and mir-159 was the most highly expressed of the significantly different novel miRNAs; both may participate in the follicular-luteal transition of Anhuai goats. GO annotation and KEGG pathway analysis were applied to analyze the target genes of differentially expressed miRNAs detected in the two phases. The results will help to further understand the role of miRNAs in the regulation of follicular to luteal transition in goat ovaries. Electronic supplementary material The online version of this article (doi:10.1186/s40064-016-2902-1) contains supplementary material, which is available to authorized users.
Genomic Features of the Damselfly Calopteryx splendens Representing a Sister Clade to Most Insect Orders
Insects comprise the most diverse and successful animal group with over one million described species that are found in almost every terrestrial and limnic habitat, with many being used as important models in genetics, ecology, and evolutionary research. Genome sequencing projects have greatly expanded the sampling of species from many insect orders, but genomic resources for species of certain insect lineages have remained relatively limited to date. To address this paucity, we sequenced the genome of the banded demoiselle, Calopteryx splendens, a damselfly (Odonata: Zygoptera) belonging to Palaeoptera, the clade containing the first winged insects. The 1.6 Gbp C. splendens draft genome assembly is one of the largest insect genomes sequenced to date and encodes a predicted set of 22,523 protein-coding genes. Comparative genomic analyses with other sequenced insects identified a relatively small repertoire of C. splendens detoxification genes, which could explain its previously noted sensitivity to habitat pollution. Intriguingly, this repertoire includes a cytochrome P450 gene not previously described in any insect genome. The C. splendens immune gene repertoire appears relatively complete and features several genes encoding novel multi-domain peptidoglycan recognition proteins. Analysis of chemosensory genes revealed the presence of both gustatory and ionotropic receptors, as well as the insect odorant receptor coreceptor gene (OrCo) and at least four partner odorant receptors (ORs). This represents the oldest known instance of a complete OrCo/OR system in insects, and provides the molecular underpinning for odonate olfaction. The C. splendens genome improves the sampling of insect lineages that diverged before the radiation of Holometabola and offers new opportunities for molecular-level evolutionary, ecological, and behavioral studies. 
Whole transcriptome RNA Seq analysis reveals extensive cell type specific compartmentalization in Volvox carteri
Background One of evolution’s most important achievements is the development and radiation of multicellular organisms with different types of cells. Complex multicellularity has evolved several times in eukaryotes; yet, in most lineages, an investigation of its molecular background is considerably challenging since the transition occurred too far in the past and, in addition, these lineages evolved a large number of cell types. However, for volvocine green algae, such as Volvox carteri, multicellularity is a relatively recent innovation. Furthermore, V. carteri shows a complete division of labor between only two cell types – small, flagellated somatic cells and large, immotile reproductive cells. Thus, V. carteri provides a unique opportunity to study multicellularity and cellular differentiation at the molecular level. Results This study provides a whole transcriptome RNA-Seq analysis of separated cell types of the multicellular green alga V. carteri f. nagariensis to reveal cell type-specific components and functions. To this end, 246 million quality filtered reads were mapped to the genome and valid expression data were obtained for 93% of the 14,247 gene loci. In the subsequent search for protein domains with assigned molecular function, we identified 9435 previously classified domains in 44% of all gene loci. Furthermore, in 43% of all gene loci we identified 15,254 domains that are involved in biological processes. All identified domains were investigated regarding cell type-specific expression. Moreover, we provide further insight into the expression pattern of previously described gene families (e.g., pherophorin, extracellular matrix metalloprotease, and VARL families). Our results demonstrate an extensive compartmentalization of the transcriptome between cell types: More than half of all genes show a clear difference in expression between somatic and reproductive cells. 
Conclusions This study constitutes the first transcriptome-wide RNA-Seq analysis of separated cell types of V. carteri focusing on gene expression. The high degree of differential expression indicates a strong differentiation of cell types despite the fact that V. carteri diverged relatively recently from its unicellular relatives. Our expression dataset and the bioinformatic analyses provide the opportunity to further investigate and understand the mechanisms of cell type-specific expression and its transcriptional regulation. Electronic supplementary material The online version of this article (doi:10.1186/s12915-017-0450-y) contains supplementary material, which is available to authorized users.
Genome-wide association study identifies multiple susceptibility loci for diffuse large B-cell lymphoma
Characterization and genomic analysis of kraft lignin biodegradation by the beta-proteobacterium Cupriavidus basilensis B-8
Background Lignin materials are abundant and among the most important potential sources for biofuel production. Development of an efficient lignin degradation process has considerable potential for the production of a variety of chemicals, including bioethanol. However, lignin degradation using current methods is inefficient. Given their immense environmental adaptability and biochemical versatility, bacteria could be used as a valuable tool for the rapid degradation of lignin. Kraft lignin (KL) is a polymer by-product of the pulp and paper industry resulting from alkaline sulfide treatment of lignocellulose, and it has been widely used for lignin-related studies. Results Beta-proteobacterium Cupriavidus basilensis B-8 isolated from erosive bamboo slips displayed substantial KL degradation capability. With initial concentrations of 0.5–6 g L⁻¹, at least 31.3% KL could be degraded in 7 days. The maximum degradation rate was 44.4% at the initial concentration of 2 g L⁻¹. The optimum pH and temperature for KL degradation were 7.0 and 30°C, respectively. Manganese peroxidase (MnP) and laccase (Lac) demonstrated their greatest level of activity, 1685.3 U L⁻¹ and 815.6 U L⁻¹, at the third and fourth days, respectively. Many small molecule intermediates were formed during the process of KL degradation, as determined using GC-MS analysis. In order to perform metabolic reconstruction of lignin degradation in this bacterium, a draft genome sequence for C. basilensis B-8 was generated. Genomic analysis focused on the catabolic potential of this bacterium against several lignin-derived compounds. 
These analyses together with sequence comparisons predicted the existence of three major metabolic pathways: β-ketoadipate, phenol degradation, and gentisate pathways. Conclusion These results confirmed the capability of C. basilensis B-8 to promote KL degradation. Whole genomic sequencing and systematic analysis of the C. basilensis B-8 genome identified degradation steps and intermediates from this bacterial-mediated KL degradation method. Our findings provide a theoretical basis for research into the mechanisms of lignin degradation as well as a practical basis for biofuel production using lignin materials.
Complete Genome Sequence of the Myoviral Bacteriophage YS35, Which Causes the Lysis of a Multidrug-Resistant Pseudomonas aeruginosa Strain
ABSTRACT The lytic bacteriophage YS35, which is capable of lysing multidrug-resistant Pseudomonas aeruginosa strains, was isolated from a sewage sample. Here, we describe the complete genome sequence of this myoviral bacteriophage, which contains 93,296 bp of double-stranded DNA and has a 49.4% G+C content.
Complete mitogenome sequences of four flatfishes (Pleuronectiformes) reveal a novel gene arrangement of L-strand coding genes
Background Few mitochondrial gene rearrangements are found in vertebrates and large-scale changes in these genomes occur even less frequently. It is difficult, therefore, to propose a mechanism to account for observed changes in mitogenome structure. Mitochondrial gene rearrangements are usually explained by the recombination model or tandem duplication and random loss model. 
Results In this study, the complete mitochondrial genomes of four flatfishes, Crossorhombus azureus (blue flounder), Grammatobothus krempfi, Pleuronichthys cornutus, and Platichthys stellatus, were determined. A striking finding is that eight genes in the C. azureus mitogenome are located in a novel position, differing from that of available vertebrate mitogenomes. Specifically, the ND6 and seven tRNA genes (the Q, A, C, Y, S1, E, P genes) encoded by the L-strand have been translocated to a position between tRNA-T and tRNA-F, though the original order of the genes is maintained. Conclusions These special features are used to suggest a mechanism for C. azureus mitogenome rearrangement. First, a dimeric molecule was formed by two monomers linked head-to-tail; then one of the two sets of promoters lost function, and the genes controlled by the disabled promoters became pseudogenes or non-coding sequences, or were even lost from the genome. This study provides a new gene-rearrangement model that accounts for the events of gene rearrangement in a vertebrate mitogenome.
The Adaptation of Temperate Bacteriophages to Their Host Genomes
Rapid turnover of mobile elements drives the plasticity of bacterial genomes. 
Integrated bacteriophages (prophages) encode host-adaptive traits and represent a sizable fraction of bacterial chromosomes. We hypothesized that natural selection shapes prophage integration patterns relative to the host genome organization. We tested this idea by detecting and studying 500 prophages of 69 strains of Escherichia and Salmonella. Phage integrases often target not only conserved genes but also intergenic positions, suggesting purifying selection for integration sites. Furthermore, most integration hotspots are conserved between the two host genera. Integration sites also seem to be selected at the large chromosomal scale, as they are nonrandomly organized in terms of the origin–terminus axis and the macrodomain structure. The genes of lambdoid prophages are systematically co-oriented with the bacterial replication fork and display the host's high frequency of polarized FtsK-orienting polar sequence motifs required for chromosome segregation. matS motifs are strongly avoided by prophages, suggesting counter selection of motifs disrupting macrodomains. These results show how natural selection for seamless integration of prophages in the chromosome shapes the evolution of the bacterium and the phage. First, integration sites are highly conserved for many millions of years, favoring lysogeny over the lytic cycle for temperate phages. Second, the global distribution of prophages is intimately associated with the chromosome structure and the patterns of gene expression. Third, the phage endures selection for DNA motifs that pertain exclusively to the biology of the prophage in the bacterial chromosome. Understanding prophage genetic adaptation sheds new light on the coexistence of horizontal transfer and organized bacterial genomes.
Convergent Evolution of Chicken Z and Human X Chromosomes by Expansion and Gene Acquisition
In birds, as in mammals, one pair of chromosomes differs between the sexes. In birds, males are ZZ and females ZW. In mammals, males are XY and females XX. 
Like the mammalian XY pair, the avian ZW pair is believed to have evolved from autosomes, with most change occurring in the chromosomes found in only one sex – the W and Y chromosomes [1–5]. By contrast, the sex chromosomes found in both sexes – the Z and X chromosomes – are assumed to have diverged little from their autosomal progenitors [2]. Here we report findings that overturn this assumption for both the chicken Z and human X chromosomes. The chicken Z chromosome, which we sequenced essentially to completion, is less gene-dense than chicken autosomes but contains a massive tandem array containing hundreds of duplicated genes expressed in testes. A comprehensive comparison of the chicken Z chromosome to the finished sequence of the human X chromosome demonstrates that each evolved independently from different portions of the ancestral genome. Despite this independence, the chicken Z and human X chromosomes share features that distinguish them from autosomes: the acquisition and amplification of testis-expressed genes, as well as a low gene density resulting from an expansion of intergenic regions. These features were not present on the autosomes from which the Z and X chromosomes originated but were instead acquired during the evolution of the Z and X as sex chromosomes. We conclude that the avian Z and mammalian X chromosomes followed convergent evolutionary trajectories, despite their evolving with opposite (female vs. male) systems of heterogamety. More broadly, in birds and mammals, sex chromosome evolution involved not only gene loss in sex-specific chromosomes, but also marked expansion and gene acquisition in sex chromosomes common to males and females.
A marine inducible prophage vB_CibM-P1 isolated from the aerobic anoxygenic phototrophic bacterium Citromicrobium bathyomarinum JL354
A prophage vB_CibM-P1 was induced by mitomycin C from the epipelagic strain Citromicrobium bathyomarinum JL354, a member of the alpha-IV subcluster of marine aerobic anoxygenic phototrophic bacteria (AAPB). The induced bacteriophage vB_CibM-P1 had Myoviridae-like morphology and polyhedral heads (capsid approximately 60–100 nm) with tail fibers. The vB_CibM-P1 genome is ~38 kb in size, with 66.0% GC content. The genome contains 58 proposed open reading frames that are involved in integration, DNA packaging, morphogenesis and bacterial lysis. vB_CibM-P1 is a temperate phage that can be directly induced in hosts. In response to mitomycin C induction, virus-like particles can increase to 7 × 10⁹ per ml, while host cells decrease by an order of magnitude. The vB_CibM-P1 bacteriophage is the first inducible prophage from AAPB.
Complete Genome Sequence of Enterococcus Bacteriophage EFLK1
We previously isolated EFDG1, a lytic phage against enterococci for therapeutic use. Nevertheless, EFDG1-resistant bacterial strains (EFDG1r) have evolved. EFLK1, a new highly effective phage against EFDG1r strains, was isolated in this study. The genome of EFLK1 was fully sequenced, analyzed, and deposited in GenBank.
Clear Genetic Distinctiveness between Human- and Pig-Derived Trichuris Based on Analyses of Mitochondrial Datasets
Author Summary Trichuriasis is a neglected tropical disease (NTD) caused by parasitic nematodes of the genus Trichuris (Nematoda), causing significant human and animal health problems as well as considerable socio-economic consequences world-wide. Although Trichuris species are considered to be relatively host specific, there has been significant controversy as to whether Trichuris infecting humans (recognized as T. trichiura) is a distinct species from that found in pigs (recognized as T. suis), or not. In the present study, we sequenced, annotated and compared the complete mitochondrial genomes of Trichuris from these two hosts and undertook a phylogenetic analysis of the mitochondrial datasets. This analysis showed clear genetic distinctiveness and strong statistical support for the hypothesis that T. trichiura and T. 
suis are separate species, consistent with previous studies using nuclear ribosomal DNA sequence data. Future studies could explore, using mitochondrial genetic markers defined in the present study, cross-transmission of Trichuris between pigs and humans in endemic regions, and the population genetics of T. trichiura and T. suis.
The whipworm, Trichuris trichiura, causes trichuriasis in ∼600 million people worldwide, mainly in developing countries. Whipworms also infect other animal hosts, including pigs (T. suis), dogs (T. vulpis) and non-human primates, and cause disease in these hosts, which is similar to trichuriasis of humans. Although Trichuris species are considered to be host specific, there has been considerable controversy, over the years, as to whether T. trichiura and T. suis are the same or distinct species. 
Here, we characterised the entire mitochondrial genomes of human-derived Trichuris and pig-derived Trichuris, compared them and then tested the hypothesis that the parasites from these two host species are genetically distinct in a phylogenetic analysis of the sequence data. Taken together, the findings support the proposal that T. trichiura and T. suis are separate species, consistent with previous data for nuclear ribosomal DNA. Using molecular analytical tools, employing genetic markers defined herein, future work should conduct large-scale studies to establish whether T. trichiura is found in pigs and T. suis in humans in endemic regions.
Draft Genome Sequence of Zymomonas mobilis ZM481 (ATCC 31823)
Zymomonas mobilis ZM481 (ATCC 31823) is an ethanol-tolerant strain that can produce the highest level of ethanol in Z. mobilis from glucose in the shortest time. Here, we report a draft genome sequence of ZM481, which can help us understand the genes related to the ethanol tolerance of this strain.
Intraspecific rearrangement of mitochondrial genome suggests the prevalence of the tandem duplication random loss (TDLR) mechanism in Quasipaa boulengeri
Background Tandem duplication followed by random loss (TDRL) is the most frequently invoked model to explain the diversity of gene rearrangements in metazoan mitogenomes. The initial stages of gene rearrangement are difficult to observe in nature, which limits our understanding of incipient duplication events and the subsequent process of random loss. Intraspecific gene reorganizations may represent intermediate states, and if so they potentially shed light on the evolutionary dynamics of TDRL. 
Results Nucleotide sequences in a hotspot of gene-rearrangement in 28 populations of a single species of frog, Quasipaa boulengeri, provide such predicted intermediate states. Gene order and phylogenetic analyses support a single tandem duplication event and a step-by-step process of random loss. Intraspecific gene rearrangements are not commonly found through comparison of all mitochondrial DNA records of amphibians and squamate reptiles in GenBank. Conclusions The intraspecific variation in Q. boulengeri provides insights into the rate of partial duplications and deletions within a mitogenome, and reveals that fixation and gene-distribution in mitogenomic reorganization is likely non-adaptive. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3309-7) contains supplementary material, which is available to authorized users.
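The TDRL model named in this abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis; it is a minimal toy version, assuming a gene order is a list of unique gene names, that one contiguous segment is duplicated in tandem, and that each duplicated gene then loses one of its two copies with equal probability. The function name `tdrl`, its parameters, and the gene labels are hypothetical and chosen only for illustration.

```python
import random


def tdrl(gene_order, start, end, rng):
    """One round of tandem duplication followed by random loss (toy model).

    gene_order: list of unique gene names.
    start, end: half-open slice [start, end) that is duplicated in tandem.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    segment = gene_order[start:end]
    n = len(segment)
    # Tandem duplication: the segment now appears twice, back to back.
    duplicated = gene_order[:start] + segment + segment + gene_order[end:]
    # Random loss: for each duplicated gene, decide which copy survives.
    keep_first = {gene: rng.random() < 0.5 for gene in segment}
    result = []
    for i, gene in enumerate(duplicated):
        if start <= i < start + n:
            # Position lies in the first copy of the segment.
            if keep_first[gene]:
                result.append(gene)
        elif start + n <= i < start + 2 * n:
            # Position lies in the second copy of the segment.
            if not keep_first[gene]:
                result.append(gene)
        else:
            # Genes outside the duplicated region are untouched.
            result.append(gene)
    return result


# Hypothetical gene order, for illustration only.
order = ["A", "B", "C", "D", "E"]
rearranged = tdrl(order, 1, 4, random.Random(0))
print(rearranged)
```

Every gene is retained exactly once, but genes kept from the second copy end up downstream of genes kept from the first copy, which is how the model produces a new gene order. The intermediate list `duplicated` corresponds to the kind of partially duplicated state that the abstract describes as a predicted intermediate.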
Draft Genome Sequence of Polaromonas glacialis Strain R3-9, a Psychrotolerant Bacterium Isolated from Arctic Glacial Foreland
Here we report the draft genome sequence of the psychrotolerant Polaromonas glacialis strain R3-9, isolated from Midtre Lovénbreen glacial foreland near Ny-Alesund, Svalbard Archipelago, Norway.
A Deeply Branching Thermophilic Bacterium with an Ancient Acetyl-CoA Pathway Dominates a Subsurface Ecosystem
A nearly complete genome sequence of Candidatus ‘Acetothermum autotrophicum’, a presently uncultivated bacterium in candidate division OP1, was revealed by metagenomic analysis of a subsurface thermophilic microbial mat community. Phylogenetic analysis based on the concatenated sequences of proteins common among 367 prokaryotes suggests that Ca. ‘A. autotrophicum’ is one of the earliest diverging bacterial lineages. It possesses a folate-dependent Wood-Ljungdahl (acetyl-CoA) pathway of CO2 fixation, is predicted to have an acetogenic lifestyle, and possesses the newly discovered archaeal-autotrophic type of bifunctional fructose 1,6-bisphosphate aldolase/phosphatase. 
A phylogenetic analysis of the core gene cluster of the acethyl-CoA pathway, shared by acetogens, methanogens, some sulfur- and iron-reducers and dechlorinators, supports the hypothesis that the core gene cluster of Ca. ‘A. autotrophicum’ is a particularly ancient bacterial pathway. The habitat, physiology and phylogenetic position of Ca. ‘A. autotrophicum’ support the view that the first bacterial and archaeal lineages were H2-dependent acetogens and methanogenes living in hydrothermal environments.
Draft Genome Sequence of Root Associated Sugarcane Growth Promoting Microbispora sp. Strain GKU 823
The endophytic plant growth-promoting Microbispora sp. strain GKU 823 was isolated from the roots of sugarcane cultivated in Thailand. It has an estimated 9.4-Mbp genome and a G+C content of 71.3%. The genome sequence reveals several genes associated with plant growth-promoting traits and extensive specialized metabolite biosynthesis.
Comparative Genomics Analysis of Two Different Virulent Bovine Pasteurella multocida Isolates
Pasteurella multocida capsular type A isolates can cause pneumonia and bovine respiratory disease (BRD). In this study, comparative genomics analysis was carried out to identify the virulence genes in two P. multocida capsular type A isolates of different virulence (highly virulent PmCQ2 and low-virulence PmCQ6). The draft genome sequence of PmCQ2 is 2.32 Mbp and contains 2,002 protein-coding genes, 9 insertion sequence (IS) elements, and 1 prophage region. The draft genome sequence of PmCQ6 is 2.29 Mbp and contains 1,970 protein-coding genes, 2 IS elements, and 3 prophage regions. The genome alignment analysis revealed that the genome similarity between PmCQ2 and PmCQ6 is 99% with high colinearity. To identify the candidate genes responsible for virulence, the PmCQ2 and PmCQ6 genomes were compared with the published genomes of the highly virulent Pm36950 and PmHN06 and the avirulent Pm3480 and Pm70 (capsular type F). Five genes and two insertion sequences were identified in the highly virulent strains but not in the low-virulence or avirulent strains. These results indicate that these genes or insertion sequences might be responsible for the virulence of P. multocida, providing prospective candidates for further studies on the pathogenesis and the host-pathogen interactions of P. multocida.
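The presence/absence comparison described above can be illustrated with plain set operations. This is only a minimal sketch, not the authors' pipeline: the function name and all gene identifiers are hypothetical placeholders, and it assumes each genome has already been reduced to a set of gene or IS-element identifiers that are comparable across strains.

```python
def candidate_virulence_genes(high_virulent, other_strains):
    """Return elements shared by every high-virulent genome and absent
    from every low-virulent or avirulent genome.

    Both arguments map a strain name to a set of gene/IS identifiers.
    """
    if not high_virulent:
        raise ValueError("at least one high-virulent genome is required")
    # Elements present in all high-virulent strains.
    shared_high = set.intersection(*(set(g) for g in high_virulent.values()))
    # Elements seen in any low-virulent or avirulent strain.
    seen_elsewhere = set().union(*other_strains.values())
    return shared_high - seen_elsewhere


# Placeholder gene contents; only the strain names come from the abstract.
high = {
    "PmCQ2": {"geneA", "geneB", "geneC", "IS_x"},
    "Pm36950": {"geneA", "geneB", "IS_x", "geneD"},
    "PmHN06": {"geneA", "geneB", "IS_x"},
}
others = {
    "PmCQ6": {"geneB", "geneC"},
    "Pm3480": {"geneB"},
    "Pm70": {"geneB", "geneD"},
}
print(sorted(candidate_virulence_genes(high, others)))
```

In practice the hard part is deciding when two genes in different genomes count as "the same" (ortholog clustering), which this sketch takes as given.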
RNA Seq analyses reveal the order of tRNA processing events and the maturation of C/D box and CRISPR RNAs in the hyperthermophile Methanopyrus kandleri
The methanogenic archaeon Methanopyrus kandleri grows near the upper temperature limit for life. Genome analyses revealed strategies to adapt to these harsh conditions and elucidated a unique transfer RNA (tRNA) C-to-U editing mechanism at base 8 for 30 different tRNA species. Here, RNA-Seq deep sequencing methodology was combined with computational analyses to characterize the small RNome of this hyperthermophilic organism and to obtain insights into the RNA metabolism at extreme temperatures. A total of 132 small RNAs were identified that guide RNA modifications, which are expected to stabilize structured RNA molecules. The C/D box guide RNAs were shown to exist as circular RNA molecules. In addition, clustered regularly interspaced short palindromic repeats (CRISPR) RNA processing and potential regulatory RNAs were identified. Finally, the identification of tRNA precursors before and after the unique C8-to-U8 editing activity enabled the determination of the order of tRNA processing events, with termini truncation preceding intron removal. This order of tRNA maturation follows the compartmentalized tRNA processing order found in eukaryotes and suggests its conservation during evolution.
Whole genome analysis of a livestock associated methicillin resistant Staphylococcus aureus ST398 isolate from a case of human endocarditis
Background Recently, a new livestock-associated methicillin-resistant Staphylococcus aureus (MRSA) Sequence Type 398 (ST398) isolate has emerged worldwide. Although there have been reports of invasive disease in humans, MRSA ST398 colonization is much more common in livestock and demonstrates especially high prevalence rates in pigs and calves. The aim of this study was to compare the genome sequence of an ST398 MRSA isolate with other S. aureus genomes in order to identify genetic traits that may explain the success of this particular lineage. Therefore, we determined the whole genome sequence of S0385, an MRSA ST398 isolate from a human case of endocarditis. Results The entire genome sequence of S0385 demonstrated considerable accessory genome content differences relative to other S. aureus genomes. Several mobile genetic elements that confer antibiotic resistance were identified, including a novel composite of a type V (5C2&5) Staphylococcal Chromosome Cassette mec (SCCmec) with distinct joining (J) regions. The presence of multiple integrative conjugative elements, combined with the absence of a type I restriction and modification system on one of the two νSa islands, could enhance horizontal gene transfer in this strain. The ST398 MRSA isolate carries a unique pathogenicity island which encodes homologues of two excreted virulence factors: staphylococcal complement inhibitor (SCIN) and von Willebrand factor-binding protein (vWbp). However, several virulence factors such as enterotoxins and phage-encoded toxins, including Panton-Valentine leukocidin (PVL), were not identified in this isolate. Conclusions Until now MRSA ST398 isolates did not cause frequent invasive disease in humans, which may be due to the absence of several common virulence factors. However, the proposed enhanced ability of these isolates to acquire mobile elements may lead to the rapid acquisition of determinants which contribute to virulence in human infections.
Complete Genome Sequence of Martelella endophytica YC6887, Which Has Antifungal Activity Associated with a Halophyte
Martelella endophytica YC6887, which produces antifungal compounds against fungal and oomycete pathogens, was isolated from the root of a halophyte, Rosa rugosa, collected at a tidal flat in South Korea. Its full-genome sequence shows that it is a circular DNA, without a plasmid, of about 4.8 Mb in size.
A Framework for Assessing the Concordance of Molecular Typing Methods and the True Strain Phylogeny of Campylobacter jejuni and C. coli Using Draft Genome Sequence Data
Tracking of sources of sporadic cases of campylobacteriosis remains challenging, as commonly used molecular typing methods have limited ability to unambiguously link genetically related strains. Genomics has become increasingly prominent in the public health response to enteric pathogens as methods enable characterization of pathogens at an unprecedented level of resolution. However, the cost of sequencing and the expertise required for bioinformatic analyses remain prohibitive, and these comprehensive analyses are limited to a few priority strains. Although several molecular typing methods are currently widely used for epidemiological analysis of campylobacters, it is not clear how accurately these methods reflect true strain relationships. To address this, we have developed a framework and associated computational tools to rapidly analyze draft genome sequence data for the assessment of molecular typing methods against a “gold standard” based on the phylogenetic analysis of highly conserved core (HCC) genes with high sequence quality. We analyzed 104 publicly available whole genome sequences (WGS) of C. jejuni and C. coli. In addition to in silico determination of multi-locus sequence typing (MLST), flaA, and porA type, as well as comparative genomic fingerprinting (CGF) type, we inferred a “reference” phylogeny based on 389 HCC genes. Molecular typing data were compared to the reference phylogeny for concordance using the adjusted Wallace coefficient (AWC) with confidence intervals. Although MLST targets the sequence variability in core genes and CGF targets insertions/deletions of accessory genes, both methods are based on multi-locus analysis and provided better estimates of true phylogeny than methods based on single loci (porA, flaA). A more comprehensive WGS dataset including additional genetically related strains, both epidemiologically linked and unlinked, will be necessary to more comprehensively assess the performance of subtyping methods for outbreak investigations and surveillance activities. Analyses of the strengths and weaknesses of widely used typing methodologies in inferring true strain relationships will provide guidance in the interpretation of these data for epidemiological purposes.
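To make the concordance measure concrete, the sketch below computes the Wallace coefficient and its chance-adjusted form for two partitions of the same strains. It is not the authors' tool: it assumes the usual definition in which the expected Wallace value under independence equals one minus Simpson's index of diversity of the second method, it omits the confidence intervals mentioned above, and the type labels in the example are made up.

```python
from collections import Counter


def pairs(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) // 2


def wallace(a_labels, b_labels):
    """W(A->B): probability that two strains sharing a type under
    method A also share a type under method B."""
    if len(a_labels) != len(b_labels):
        raise ValueError("both methods must type the same strains")
    together_a = sum(pairs(c) for c in Counter(a_labels).values())
    if together_a == 0:
        raise ValueError("method A never groups two strains together")
    together_both = sum(
        pairs(c) for c in Counter(zip(a_labels, b_labels)).values()
    )
    return together_both / together_a


def adjusted_wallace(a_labels, b_labels):
    """Wallace coefficient corrected for agreement expected by chance."""
    n = len(b_labels)
    # Expected W under independence: chance that a random pair of
    # strains shares a type under B (1 - Simpson's diversity of B).
    expected = sum(pairs(c) for c in Counter(b_labels).values()) / pairs(n)
    if expected == 1:
        raise ValueError("method B puts all strains in one type")
    return (wallace(a_labels, b_labels) - expected) / (1 - expected)


# Hypothetical labels: a typing method versus clades cut from a reference tree.
typing = ["ST21", "ST21", "ST21", "ST45", "ST45", "ST48"]
clades = ["c1", "c1", "c1", "c2", "c2", "c2"]
print(adjusted_wallace(typing, clades), adjusted_wallace(clades, typing))
```

The coefficient is directional: here the typing method fully predicts the clade, while the clade only partly predicts the type, so the two calls return different values.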
The Dynamic Regulatory Genome of Capsaspora and the Origin of Animal Multicellularity
Summary The unicellular ancestor of animals had a complex repertoire of genes linked to multicellular processes. This suggests that changes in the regulatory genome, rather than in gene innovation, were key to the origin of animals. Here, we carry out multiple functional genomic assays in Capsaspora owczarzaki, the unicellular relative of animals with the largest known gene repertoire for transcriptional regulation. We show that changing chromatin states, differential lincRNA expression, and dynamic cis-regulatory sites are associated with life cycle transitions in Capsaspora. Moreover, we demonstrate conservation of animal developmental transcription-factor networks and extensive network interconnection in this premetazoan organism. In contrast, however, Capsaspora lacks animal promoter types, and its regulatory sites are small, proximal, and lack signatures of animal enhancers. Overall, our results indicate that the emergence of animal multicellularity was linked to a major shift in genome cis-regulatory complexity, most notably the appearance of distal enhancer regulation.
Histone demethylase Lsd1 represses hematopoietic stem and progenitor cell signatures during blood cell maturation
eLife digest Our blood contains many different types of cells. Red blood cells carry oxygen around the body, whereas white blood cells are a key part of our immune system. All these different types of blood cells are derived from special cells in our bone marrow called hematopoietic stem cells. The type of blood cell that the stem cell becomes depends on the genes that are expressed as proteins in that stem cell. Gene expression can be controlled in a number of ways, including epigenetic processes that influence the expression of genes without altering the underlying sequence of bases in the DNA. For example, DNA is wrapped around histone proteins and the addition of a methyl group to these proteins, a process known as histone methylation, can increase the expression of a gene, whereas the removal of a methyl group (demethylation) can repress gene expression. Lysine-specific demethylase 1 (Lsd1) is an enzyme that is known to mediate the demethylation of lysine amino acids on histone proteins. The role of Lsd1 in embryonic stem cells has been widely studied, and deletion of the gene that codes for Lsd1 is known to result in the death of mouse embryos. However, very little is known about its roles in the later stages of mammalian development. Here, Kerenyi et al. use new genetic tools to knock out the gene for Lsd1 at different stages of development in order to examine its impact on the formation of new blood cells. They find that Lsd1 is required for the successful differentiation of hematopoietic stem cells into different types of blood cells, and that knocking out Lsd1 results in a severe loss of white and red blood cells. Moreover, they show that the lack of Lsd1 causes problems during both the early and later stages of development. Kerenyi et al. go on to demonstrate that Lsd1 regulates the activity of promoters and enhancers of various genes associated with hematopoietic stem cells. They also show that knocking out the Lsd1 gene results in impaired silencing of these genes, and that the incomplete expression of these genes is not compatible with the maturation of blood cells. Lsd1 has recently been proposed as the potential target for the treatment of leukemia and other blood disorders. However, the fact that a loss of Lsd1 function has adverse effects during both the early and later stages of blood cell development suggests that research into drugs that target Lsd1 should not begin until a suitable time window for the administration of such drugs can be identified. DOI: http://dx.doi.org/10.7554/eLife.00633.002
Here, we describe that lysine-specific demethylase 1 (Lsd1/KDM1a), which demethylates histone H3 on Lys4 or Lys9 (H3K4/K9), is an indispensable epigenetic governor of hematopoietic differentiation. Integrative genomic analysis, combining global occupancy of Lsd1, genome-wide analysis of its substrates H3K4 monomethylation and dimethylation, and gene expression profiling, reveals that Lsd1 represses hematopoietic stem and progenitor cell (HSPC) gene expression programs during hematopoietic differentiation. We found that Lsd1 acts at transcription start sites, as well as enhancer regions. Loss of Lsd1 was associated with increased H3K4me1 and H3K4me2 methylation on HSPC genes and gene derepression. Failure to fully silence HSPC genes compromised differentiation of hematopoietic stem cells as well as mature blood cell lineages. Collectively, our data indicate that Lsd1-mediated concurrent repression of enhancer and promoter activity of stem and progenitor cell genes is a pivotal epigenetic mechanism required for proper hematopoietic maturation. DOI: http://dx.doi.org/10.7554/eLife.00633.001
Correction of the auditory phenotype in C57BL/6N mice via CRISPR/Cas9 mediated homology directed repair
Background Nuclease-based technologies have been developed that enable targeting of specific DNA sequences directly in the zygote. These approaches provide an opportunity to modify the genomes of inbred mice, and allow the removal of strain-specific mutations that confound phenotypic assessment. One such mutation is the Cdh23ahl allele, present in several commonly used inbred mouse strains, which predisposes to age-related progressive hearing loss. Results We have used targeted CRISPR/Cas9-mediated homology directed repair (HDR) to correct the Cdh23ahl allele directly in C57BL/6NTac zygotes. Employing offset-nicking Cas9 (D10A) nickase with paired RNA guides and a single-stranded oligonucleotide donor template we show that allele repair was successfully achieved. To investigate potential Cas9-mediated ‘off-target’ mutations in our corrected mouse, we undertook whole-genome sequencing and assessed the ‘off-target’ sites predicted for the guide RNAs (≤4 nucleotide mis-matches). No induced sequence changes were identified at any of these sites. Correction of the progressive hearing loss phenotype was demonstrated using auditory-evoked brainstem response testing of mice at 24 and 36 weeks of age, and rescue of the progressive loss of sensory hair cell stereocilia bundles was confirmed using scanning electron microscopy of dissected cochleae from 36-week-old mice. Conclusions CRISPR/Cas9-mediated HDR has been successfully utilised to efficiently correct the Cdh23ahl allele in C57BL/6NTac mice, and rescue the associated auditory phenotype. The corrected mice described in this report will allow age-related auditory phenotyping studies to be undertaken using C57BL/6NTac-derived models, such as those generated by the International Mouse Phenotyping Consortium (IMPC) programme. Electronic supplementary material The online version of this article (doi:10.1186/s13073-016-0273-4) contains supplementary material, which is available to authorized users. 
Genome Wide Estimates of Mutation Rates and Spectrum in Schizosaccharomyces pombe Indicate CpG Sites are Highly Mutagenic Despite the Absence of DNA Methylation
We accumulated mutations for 1952 generations in 79 initially identical, haploid lines of the fission yeast Schizosaccharomyces pombe, and then performed whole-genome sequencing to determine the mutation rates and spectrum. We captured 696 spontaneous mutations across the 79 mutation accumulation (MA) lines. We compared the mutation spectrum and rate to a recently published equivalent experiment on the same species, and to another model ascomycetous yeast, the budding yeast Saccharomyces cerevisiae. While the two species are approximately 600 million years diverged from each other, they share similar life histories, genome size and genomic G/C content. We found that Sc. pombe and S. cerevisiae have similar mutation rates, but Sc. pombe exhibits a stronger insertion bias. Intriguingly, we observed an increased mutation rate at cytosine nucleotides, specifically CpG nucleotides, which is also seen in S. cerevisiae. However, the absence of methylation in Sc. pombe and the pattern of mutation at these sites, primarily C → A as opposed to C → T, strongly suggest that the increased mutation rate is not caused by deamination of methylated cytosines. This result implies that the high mutability of CpG dinucleotides in other species may be caused in part by a methylation-independent mechanism. Many of our findings mirror those seen in the recent study, despite the use of different passaging conditions, indicating that MA is a reliable method for estimating mutation rates and spectra.
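The headline counts above are enough for a back-of-the-envelope rate estimate. The sketch below simply divides the pooled mutation count by the total number of line-generations; it is not the authors' estimator. The per-site version needs the number of callable sites per genome, which the abstract does not give, so the 12.6 Mb figure used here is an assumed round value for the Sc. pombe genome and the result lumps all mutation types together.

```python
def mutations_per_genome_per_generation(n_mutations, n_lines, generations):
    """Pooled mutation count divided by total line-generations."""
    if n_lines <= 0 or generations <= 0:
        raise ValueError("n_lines and generations must be positive")
    return n_mutations / (n_lines * generations)


def mutations_per_site_per_generation(n_mutations, n_lines, generations,
                                      callable_sites):
    """Per-genome rate scaled by the number of sites that could be called."""
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    per_genome = mutations_per_genome_per_generation(
        n_mutations, n_lines, generations
    )
    return per_genome / callable_sites


# Counts from the abstract: 696 mutations, 79 MA lines, 1952 generations.
per_genome = mutations_per_genome_per_generation(696, 79, 1952)
# ASSUMPTION: about 12.6 million callable sites (not stated in the abstract).
per_site = mutations_per_site_per_generation(696, 79, 1952, 12.6e6)
print(f"{per_genome:.5f} per genome per generation")
print(f"{per_site:.2e} per site per generation (assumed genome size)")
```

The per-genome figure comes out at roughly 0.0045 mutations per generation; the per-site figure depends directly on the assumed number of callable sites and should be read only as an order-of-magnitude illustration.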
Impact of the G84E variant on HOXB13 gene and protein expression in formalin fixed, paraffin embedded prostate tumours
The HOXB13 G84E variant is associated with risk of prostate cancer (PCa), however the role this variant plays in PCa development is unknown. This study examined 751 cases, 450 relatives and 355 controls to determine the contribution of this variant to PCa risk in Tasmania and investigated HOXB13 gene and protein expression in tumours from nine G84E heterozygote variant and 13 wild-type carriers. Quantitative PCR and immunohistochemistry showed that HOXB13 gene and protein expression did not differ between tumour samples from variant and wild-type carriers. Allele-specific transcription revealed that two of seven G84E carriers transcribed both the variant and wild-type allele, while five carriers transcribed the wild-type allele. Methylation of surrounding CpG sites was lower in the variant compared to the wild-type allele, however overall methylation across the region was very low. Notably, tumour characteristics were less aggressive in the two variant carriers that transcribed the variant allele compared to the five that did not. This study has shown that HOXB13 expression does not differ between tumour tissue of G84E variant carriers and non-carriers. Intriguingly, the G84E variant allele was rarely transcribed in carriers, suggesting that HOXB13 expression may be driven by the wild-type allele in the majority of carriers.
Frequent variations in cancer related genes may play prognostic role in treatment of patients with chronic myeloid leukemia
Background Genome variability of the host genome and cancer cells plays a critical role in the diversity of response to existing therapies and overall success in treating oncological diseases. In chronic myeloid leukemia (CML), targeted therapy with tyrosine kinase inhibitors demonstrates high efficacy in most patients. However, about 15 % of patients demonstrate primary resistance to standard therapy. Whole exome sequencing is a good tool for an unbiased search for genetic variations important for prognosis of survival and therapy efficacy in many cancers. We apply this approach to CML patients with optimal response and failure of tyrosine kinase inhibitor therapy. Results We analyzed exome variations between optimal responders and failures and found 7 variants in cancer-related genes with different genotypes in the two groups of patients. Five of them were found in optimal responders: rs11579366, rs1990236, rs176037, rs10653661, rs3803264 and two in failures: rs3099950, rs9471966. These variants were found in genes associated with cancers (ANKRD35, DNAH9, MAGEC1, TOX3) or participating in cancer-related signaling pathways (THSD1, MORN2, PTCRA). Conclusion We found gene variants which may become early predictors of the therapy outcome and allow development of new early prognostic tests for estimation of therapy efficacy in CML patients. Normal genetic variation may influence therapy efficacy during targeted treatment of cancers. Electronic supplementary material The online version of this article (doi:10.1186/s12863-015-0308-7) contains supplementary material, which is available to authorized users.
A systematic approach to the reporting of medically relevant findings from whole genome sequencing
Background The MedSeq Project is a randomized clinical trial developing approaches to assess the impact of integrating genome sequencing into clinical medicine. To facilitate the return of results of potential medical relevance to physicians and patients participating in the MedSeq Project, we sought to develop a reporting approach for the effective communication of such findings. Methods Genome sequencing was performed on the Illumina HiSeq platform. Variants were filtered, interpreted, and validated according to methods developed by the Laboratory for Molecular Medicine and consistent with current professional guidelines. The GeneInsight software suite, which is integrated with the Partners HealthCare electronic health record, was used for variant curation, report drafting, and delivery. Results We developed a concise 5–6 page Genome Report (GR) featuring a single-page summary of results of potential medical relevance with additional pages containing structured variant, gene, and disease information along with supporting evidence for reported variants and brief descriptions of associated diseases and clinical implications. The GR is formatted to provide a succinct summary of genomic findings, enabling physicians to take appropriate steps for disease diagnosis, prevention, and management in their patients. Conclusions Our experience highlights important considerations for the reporting of results of potential medical relevance and provides a framework for interpretation and reporting practices in clinical genome sequencing. Electronic supplementary material The online version of this article (doi:10.1186/s12881-014-0134-1) contains supplementary material, which is available to authorized users.
A 26-hour system of highly sensitive whole genome sequencing for emergency management of genetic diseases
While the cost of whole genome sequencing (WGS) is approaching the realm of routine medical tests, it remains too tardy to help guide the management of many acute medical conditions. Rapid WGS is imperative in light of growing evidence of its utility in acute care, such as in diagnosis of genetic diseases in very ill infants, and genotype-guided choice of chemotherapy at cancer relapse. In such situations, delayed, empiric, or phenotype-based clinical decisions may meet with substantial morbidity or mortality. We previously described a rapid WGS method, STATseq, with a sensitivity of >96 % for nucleotide variants that allowed a provisional diagnosis of a genetic disease in 50 h. Here improvements in sequencing run time, read alignment, and variant calling are described that enable 26-h time to provisional molecular diagnosis with >99.5 % sensitivity and specificity of genotypes. STATseq appears to be an appropriate strategy for acutely ill patients with potentially actionable genetic diseases. Electronic supplementary material The online version of this article (doi:10.1186/s13073-015-0221-8) contains supplementary material, which is available to authorized users.
Genetic alterations analysis in prognostic stratified groups identified TP53 and ARID1A as poor clinical performance markers in intrahepatic cholangiocarcinoma
The incidence and mortality rates of intrahepatic cholangiocarcinoma have been rising worldwide. Few patients present with early-stage disease that is amenable to curative surgery, and after resection high recurrence rates persist. To identify new independent markers related to aggressive behaviour, two groups of patients were selected and divided according to prognostic performance. All patients alive at 36 months were included in the good prognostic performers, while all patients who died of the disease within 36 months were included in the poor prognostic performers. Using high-coverage targeted sequencing we analysed the principal genetic alterations in the two groups and compared the results to clinical data. In the 33 cases included in the poor prognosis group, TP53 was the most frequently mutated gene (p = 0.011) and its mutations were exclusively present in these cases. Similarly, ARID1A mutations were exclusive to this group (p = 0.024). TP53 and ARID1A mutations are mutually exclusive in this study. Statistical analysis showed mutations in the TP53 and ARID1A genes and amplification of the MET gene to be independent predictors of poor prognosis (TP53, p = 0.0031; ARID1A, p = 0.0007; MET, p = 0.0003 in Cox analysis). LOH in PTEN was also identified as a marker of disease recurrence (p = 0.04) in univariate analysis. This work improves our understanding of aggressiveness related to this tumour type and has identified novel prognostic markers of clinical outcome.
DNA and RNA sequence based GWAS highlights membrane transport genes as key modulators of milk lactose content
Background Lactose provides an easily-digested energy source for neonates, and is the primary carbohydrate in milk in most species. Bovine lactose is also a key component of many human food products. However, compared to analyses of other milk components, the genetic control of lactose has been little studied. Here we present the first GWAS focussed on analysis of milk lactose traits. Results Using a discovery population of 12,000 taurine dairy cattle, we detail 27 QTL for lactose concentration and yield, and subsequently validate the effects of 26 of these loci in a distinct population of 18,000 cows. We next present data implicating causative genes and variants for these QTL. Fine mapping of these regions using imputed, whole genome sequence-resolution genotypes reveals protein-coding candidate causative variants affecting the ABCG2, DGAT1, STAT5B, KCNH4, NPFFR2 and RNF214 genes. Eleven of the remaining QTL appear to be driven by regulatory effects, suggested by the presence of co-locating, co-segregating eQTL discovered using mammary RNA sequence data from a population of 357 lactating cows. Pathway analysis of genes representing all lactose-associated loci shows significant enrichment of genes located in the endoplasmic reticulum, with functions related to ion channel activity mediated through the LRRC8C, P2RX4, KCNJ2 and ANKH genes. A number of the validated QTL are also found to be associated with additional milk volume, fat and protein phenotypes. Conclusions Overall, these findings highlight novel candidate genes and variants involved in milk lactose regulation, whose impacts on membrane transport mechanisms reinforce the key osmo-regulatory roles of lactose in milk. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-4320-3) contains supplementary material, which is available to authorized users. 
Comparative Studies of Copy Number Variation Detection Methods for Next Generation Sequencing Technologies
Copy number variation (CNV) has played an important role in studies of susceptibility or resistance to complex diseases. Traditional methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution of genomic regions. Following the emergence of next generation sequencing (NGS) technologies, CNV detection methods based on the short read data have recently been developed. However, due to the relatively young age of the procedures, their performance is not fully understood. To help investigators choose suitable methods to detect CNVs, comparative studies are needed. We compared six publicly available CNV detection methods: CNV-seq, FREEC, readDepth, CNVnator, SegSeq and event-wise testing (EWT). They are evaluated both on simulated and real data with different experiment settings. The receiver operating characteristic (ROC) curve is employed to demonstrate the detection performance in terms of sensitivity and specificity, box plot is employed to compare their performances in terms of breakpoint and copy number estimation, Venn diagram is employed to show the consistency among these methods, and F-score is employed to show the overlapping quality of detected CNVs. The computational demands are also studied. The results of our work provide a comprehensive evaluation on the performances of the selected CNV detection methods, which will help biological investigators choose the best possible method.
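The evaluation metrics named in this abstract (sensitivity, specificity and F-score) can be illustrated with their standard textbook definitions. The sketch below is a minimal example, assuming detected CNVs have already been reduced to true/false positive and negative counts. The function names and the counts are hypothetical, and the study's own overlap-based F-score definition is not given here and may differ.

```python
def sensitivity(tp, fn):
    """Fraction of true CNVs that were detected (also called recall)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-CNV regions correctly left uncalled."""
    return tn / (tn + fp)


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall (F1); assumes nonzero denominators."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one detection method on a simulated dataset.
tp, fp, fn, tn = 80, 20, 20, 880
print(sensitivity(tp, fn), specificity(tn, fp), f_score(tp, fp, fn))
```

A ROC curve is then obtained by recomputing sensitivity and 1 − specificity while sweeping a method's detection threshold.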
Protein and small non-coding RNA-enriched extracellular vesicles are released by the pathogenic blood fluke Schistosoma mansoni
Background Penetration of skin, migration through tissues and establishment of long-lived intravascular partners require Schistosoma parasites to successfully manipulate definitive host defences. While previous studies of larval schistosomula have postulated a function for excreted/secreted (E/S) products in initiating these host-modulatory events, the role of extracellular vesicles (EVs) has yet to be considered. Here, using preparatory ultracentrifugation as well as methodologies to globally analyse both proteins and small non-coding RNAs (sncRNAs), we conducted the first characterization of Schistosoma mansoni schistosomula EVs and their potential host-regulatory cargos. Results Transmission electron microscopy analysis of EVs isolated from schistosomula in vitro cultures revealed the presence of numerous, 30–100 nm sized exosome-like vesicles. Proteomic analysis of these vesicles revealed a core set of 109 proteins, including homologs to those previously found enriched in other eukaryotic EVs, as well as hypothetical proteins of high abundance and currently unknown function. Characterization of E/S sncRNAs found within and outside of schistosomula EVs additionally identified the presence of potential gene-regulatory miRNAs (35 known and 170 potentially novel miRNAs) and tRNA-derived small RNAs (tsRNAs; nineteen 5′ tsRNAs and fourteen 3′ tsRNAs). Conclusions The identification of S. mansoni EVs and the combinatorial protein/sncRNA characterization of their cargo signifies that an important new participant in the complex biology underpinning schistosome/host interactions has now been discovered. Further work defining the role of these schistosomula EVs and the function/stability of intra- and extra-vesicular sncRNA components presents tremendous opportunities for developing novel schistosomiasis diagnostics or interventions. 
Multi-platform microRNA profiling of hepatoblastoma patients using formalin fixed paraffin embedded archival samples
Background Formalin fixed paraffin embedded (FFPE) samples are a valuable resource in cancer research and have the potential to be extensively used. However, they are often underused because of degradation and chemical modifications occurring in the RNA that can present obstacles in downstream analysis. In routine medical care, FFPE material is examined and archived, therefore clinical collections of many types of cancers exist. It is beneficial to assess and record the quality of data that can be obtained from this type of material. The current study investigated three independent platforms and their ability to profile microRNAs (miRNAs) within FFPE samples from hepatoblastoma (HB) patients. Findings Here we present three types of datasets consisting of miRNA profiles for 13 HB patients with different tumour types and molecular variations. The three platforms that were used to generate these data are: next-generation sequencing (Illumina MiSeq), microarray (Affymetrix® GeneChip® miRNA 3.0 array) and NanoString (nCounter, Human v2 miRNA Assay). The mature miRNAs identified are based on miRBase version 17 and 18. Conclusions These datasets provide a global landscape of miRNA expression for a rare childhood cancer that has not previously been well characterised. These data could serve as a resource for future studies aiming to make comparisons of HB miRNA profiles and to document aberrant miRNA expression in this type of cancer. Electronic supplementary material The online version of this article (doi:10.1186/s13742-015-0099-9) contains supplementary material, which is available to authorized users.
SLC35E3 identified as a target of novel-m1061-5p via microRNA profiling of patients with cardiovascular disease
MicroRNAs (miRNA) are considered to be potential therapeutic targets for the treatment of various cardiovascular diseases (CVDs). To understand the underlying mechanism of miRNAs and target genes associated with CVD, deep sequencing of blood samples from three patients with CVD and three controls was performed using the Illumina HiSeq 2000 system. The results of the present study revealed that 65 abnormal hsa-miRNAs targeted 2,784 putative genes in patients with CVD; 59 upregulated miRNAs targeted 2,401 genes and six downregulated miRNAs targeted 383 genes. In addition, a total of 49 Gene Ontology (GO) biological processes were enriched, and the target genes of downregulated miRNAs were enriched in 12 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Most of these pathways are responsible for lipid and glycan metabolism. In particular, three downregulated miRNAs, hsa-miR-1268b, hsa-miR-1273d, hsa-miR-3187-5p, were involved in α-linolenic acid metabolism. The target genes of upregulated miRNAs were enriched in 15 KEGG pathways, mainly in the ‘neurodegenerative diseases and cancers’ class. In the present study five novel upregulated miRNAs, including m0499-5p, m0970-5p, m1042-5p, m1061-5p and m1953-5p, and a downregulated miRNA, novel-m1627-5p, were identified in patients with CVD. Novel-m1627-5p was demonstrated to target 146 human genes. Additionally, Novel-m1061-5p targeted four genes, including fumarylacetoacetate hydrolase domain containing 2A, potassium voltage-gated channel, Shaw-related subfamily, member 4, coiled-coil domain containing 85C and solute carrier family 35 member E3 (SLC35E3). The GO term, ‘carbohydrate derivative transport involving in biological process’, was associated with SLC35E3. Novel-m1061-5p in patients with CVD may repress the expression levels of SLC35E3, a member of the nucleoside sugar transporter subfamily E, which is known to cause defective glycol-conjugation in the Golgi complex and/or the endoplasmic reticulum. Further investigation is required to understand the underlying mechanisms of the novel miRNAs. Novel-m1061-5p may serve as a marker for prognosis or a potential target for the treatment of CVD.
De novo Assembly of the Grass Carp Ctenopharyngodon idella Transcriptome to Identify miRNA Targets Associated with Motile Aeromonad Septicemia
Background De novo transcriptome sequencing is a robust method of predicting miRNA target genes, especially for organisms without reference genomes. Differentially expressed miRNAs had been identified previously in kidney samples collected from susceptible and resistant grass carp (Ctenopharyngodon idella) affected by Aeromonas hydrophila. Target identification for these differentially expressed miRNAs poses a major challenge in this non-model organism. Results Two cDNA libraries constructed from mRNAs of susceptible and resistant C. idella were sequenced by Illumina HiSeq 2000 technology. A total of more than 100 million reads were generated and de novo assembled into 199,593 transcripts, which were further extensively annotated by comparing their sequences to different protein databases. Biochemical pathways were predicted from these transcript sequences. A BLASTx analysis against a non-redundant protein database revealed that 61,373 unigenes coded for 28,311 annotated proteins. Comparison of the two cDNA libraries from susceptible and resistant samples showed that 721 unigenes were expressed at significantly different levels; 475 were significantly up-regulated and 246 were significantly down-regulated in the susceptible group (SG) samples compared to the resistant group (RG) samples. The computational prediction of miRNA targets from these differentially expressed genes identified 188 unigenes as the targets of 5 conserved and 4 putative novel miRNA families. Conclusion This study demonstrates the feasibility of identifying miRNA targets by transcriptome analysis. The transcriptome assembly data represent a substantial increase in the genomic resources available for C. idella and will provide insights into the gene expression profile analysis and the miRNA function annotations in further studies.
De novo assembly of the blunt snout bream (Megalobrama amblycephala) gill transcriptome to identify ammonia exposure associated microRNAs and their targets
De novo transcriptome sequencing is a robust method for microRNA (miRNA) target gene prediction, especially for organisms without reference genomes. Following exposure of Megalobrama amblycephala to ammonia (0.1 or 20 mg L−1), two cDNA libraries were constructed from the fish gills and sequenced using Illumina HiSeq 2000. Over 90 million reads were generated and de novo assembled into 46,615 unigenes, which were then extensively annotated by comparing to different protein databases, followed by biochemical pathway prediction. The expression of 2666 unigenes significantly differed; 1961 were up-regulated, while 975 were down-regulated. Among these, 250 unigenes were identified as the targets for 10 conserved and 4 putative novel miRNA families by miRNA target computational prediction. We examined expression of ssa-miRNA-21 and its target genes by real-time quantitative PCR and found agreement with the sequencing data. This study demonstrates the feasibility of identifying miRNA targets by transcriptome analysis. The transcriptome assembly data represent a substantial increase in the genomic resources available for Megalobrama amblycephala and will be useful for gene expression profile analysis and miRNA functional annotation.
Natural selection in a population of Drosophila melanogaster explained by changes in gene expression caused by sequence variation in core promoter regions
Background Understanding the evolutionary forces that influence variation in gene regulatory regions in natural populations is an important challenge for evolutionary biology because natural selection for such variations could promote adaptive phenotypic evolution. Recently, whole-genome sequence analyses have identified regulatory regions subject to natural selection. However, these studies could not identify the relationship between sequence variation in the detected regions and change in gene expression levels. We analyzed sequence variations in core promoter regions, which are critical regions for gene regulation in higher eukaryotes, in a natural population of Drosophila melanogaster, and identified core promoter sequence variations associated with differences in gene expression levels subjected to natural selection. Results Among the core promoter regions whose sequence variation could change transcription factor binding sites and explain differences in expression levels, three core promoter regions were detected as candidates associated with purifying selection or selective sweep and seven as candidates associated with balancing selection, excluding the possibility of linkage between these regions and core promoter regions. CHKov1, which confers resistance to the sigma virus and related insecticides, was identified as having a core promoter region that has been subject to a selective sweep, although it could not be ruled out that selection for variation in the core promoter region was due to linked single nucleotide polymorphisms in the regulatory region outside the core promoter region. Nucleotide changes in the core promoter region of CHKov1 caused the loss of two basal transcription factor binding sites and acquisition of one transcription factor binding site, resulting in decreased gene expression levels. 
Of nine core promoter regions associated with balancing selection, brat and CG9044 are associated with neuromuscular junction development, and Nmda1 is associated with learning, behavioral plasticity, and memory. Diversity of neural and behavioral traits may have been maintained by balancing selection. Conclusions Our results revealed the evolutionary process occurring by natural selection for differences in gene expression levels caused by sequence variation in core promoter regions in a natural population. The sequences of core promoter regions were diverse even within the population, possibly providing a source for natural selection. Electronic supplementary material The online version of this article (doi:10.1186/s12862-016-0606-3) contains supplementary material, which is available to authorized users.
Challenges and disparities in the application of personalized genomic medicine to populations with African ancestry
Personalized medicine requires accurate and ethnicity-optimized reference genome panels. Here, the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) evaluates typical variant filters and existing genome databases against newly sequenced African-ancestry populations. To characterize the extent and impact of ancestry-related biases in precision genomic medicine, we use 642 whole-genome sequences from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) project to evaluate typical filters and databases. We find significant correlations between estimated African ancestry proportions and the number of variants per individual in all variant classification sets but one. The source of these correlations is highlighted in more detail by looking at the interaction between filtering criteria and the ClinVar and Human Gene Mutation databases. ClinVar's correlation, representing African ancestry-related bias, has changed over time amidst monthly updates, with the most extreme switch happening between March and April of 2014 (r=0.733 to r=−0.683). We identify 68 SNPs as the major drivers of this change in correlation. As long as ancestry-related bias when using these clinical databases is minimally recognized, the genetics community will face challenges with implementation, interpretation and cost-effectiveness when treating minority populations.
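The correlation reported above (r between estimated African ancestry proportion and per-individual variant count) is a Pearson coefficient. The following is a minimal sketch of that calculation, not the CAAPA pipeline; the function name `pearson_r` and the toy values are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: ancestry proportion and number of flagged variants
# per individual. These are NOT values from the study.
ancestry = [0.20, 0.45, 0.60, 0.80, 0.95]
flagged_variants = [11, 15, 14, 21, 24]
r = pearson_r(ancestry, flagged_variants)
```

A sign change in r between two database releases, as described for ClinVar, would show up here as `pearson_r` returning a positive value for one release's counts and a negative value for the next.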
Population genomics of the eastern cottonwood (Populus deltoides)
Abstract Despite its economic importance as a bioenergy crop and key role in riparian ecosystems, little is known about genetic diversity and adaptation of the eastern cottonwood (Populus deltoides). Here, we report the first population genomics study for this species, conducted on a sample of 425 unrelated individuals collected in 13 states of the southeastern United States. The trees were genotyped by targeted resequencing of 18,153 genes and 23,835 intergenic regions, followed by the identification of single nucleotide polymorphisms (SNPs). This natural P. deltoides population showed low levels of subpopulation differentiation (FST = 0.022–0.106), high genetic diversity (θW = 0.00100, π = 0.00170), a large effective population size (Ne ≈ 32,900), and low to moderate levels of linkage disequilibrium. Additionally, genome-wide scans for selection (Tajima's D), subpopulation differentiation (XTX), and environmental association analyses with eleven climate variables carried out with two different methods (LFMM and BAYENV2) identified genes putatively involved in local adaptation. Interestingly, many of these genes were also identified as adaptation candidates in another poplar species, Populus trichocarpa, indicating possible convergent evolution. This study constitutes the first assessment of genetic diversity and local adaptation in P. deltoides throughout the southern part of its range, information we expect to be of use to guide management and breeding strategies for this species in the future, especially in the face of climate change.
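The diversity statistics named in this abstract (Watterson's θW, nucleotide diversity π, and Tajima's D) follow standard textbook formulas. The sketch below computes them from a small aligned sample; it is an illustration under stated assumptions, not the authors' workflow. The function name `diversity_stats` and the toy alignment are hypothetical, and the values are per sequence (divide π and θW by the alignment length to get per-site values comparable to those quoted).

```python
import math
from collections import Counter


def diversity_stats(seqs):
    """Return S, pi, Watterson's theta and Tajima's D for aligned sequences.

    pi and theta_w are per-sequence values. tajima_d is None when it is
    undefined here (no segregating sites, or fewer than 4 sequences).
    """
    n = len(seqs)
    length = len(seqs[0]) if n else 0
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need >= 2 aligned sequences of equal length")

    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    diff_pairs = 0.0
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # Number of sequence pairs that differ at this site.
            diff_pairs += (n * n - sum(c * c for c in counts.values())) / 2

    pi = diff_pairs / n_pairs            # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    theta_w = seg_sites / a1             # Watterson's estimator

    tajima_d = None
    if seg_sites > 0 and n >= 4:
        # Variance constants from Tajima (1989).
        a2 = sum(1 / i ** 2 for i in range(1, n))
        b1 = (n + 1) / (3 * (n - 1))
        b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
        c1 = b1 - 1 / a1
        c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
        e1 = c1 / a1
        e2 = c2 / (a1 ** 2 + a2)
        variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
        tajima_d = (pi - theta_w) / math.sqrt(variance)

    return {"S": seg_sites, "pi": pi, "theta_w": theta_w, "tajima_d": tajima_d}


# Hypothetical toy alignment of four haplotypes (not data from the study).
toy = ["ACGT", "ACGA", "ACTA", "TCTA"]
stats = diversity_stats(toy)
```

When π exceeds θW, as in the abstract's reported values, Tajima's D is positive; a negative D would instead indicate an excess of rare variants.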
Genome wide analysis of genetic susceptibility to language impairment in an isolated Chilean population
Specific language impairment (SLI) is an unexpected deficit in the acquisition of language skills and affects between 5 and 8% of pre-school children. Despite its prevalence and high heritability, our understanding of the aetiology of this disorder is only emerging. In this paper, we apply genome-wide techniques to investigate an isolated Chilean population who exhibit an increased frequency of SLI. Loss of heterozygosity (LOH) mapping and parametric and non-parametric linkage analyses indicate that complex genetic factors are likely to underlie susceptibility to SLI in this population. Across all analyses performed, the most consistently implicated locus was on chromosome 7q. This locus achieved highly significant linkage under all three non-parametric models (max NPL=6.73, P=4.0 × 10−11). In addition, it yielded a HLOD of 1.24 in the recessive parametric linkage analyses and contained a segment that was homozygous in two affected individuals. Further, investigation of this region identified a two-SNP haplotype that occurs at an increased frequency in language-impaired individuals (P=0.008). We hypothesise that the linkage regions identified here, in particular that on chromosome 7, may contain variants that underlie the high prevalence of SLI observed in this isolated population and may be of relevance to other populations affected by language impairments.
Epistatic interactions between at least three loci determine the “rat tail” phenotype in cattle
Background The “rat-tail” syndrome (RTS) is an inherited hypotrichosis in cattle, which is exclusively expressed in diluted coloured hair. The affected animals also suffer from disturbed thermoregulation, which impairs their health and growth performance. Phenotypic features that are similar to RTS are observed in dogs with black hair follicle dysplasia. Results We used a resource cross population between German Holstein and Charolais cattle breeds to prove that epistatic interactions between at least three independent genetic loci are required for the expression of the RTS phenotype. In this population, the RTS is exclusively expressed in animals with a eumelanic background that is due to the dominant ED allele at the melanocortin 1 receptor gene located on Bos taurus autosome (BTA) 18. In addition, only the individuals that are heterozygous at the dilution locus on BTA5 that corresponds to the premelanosome protein or silver gene variant c.64G>A were classified as displaying a RTS phenotype. Linkage and whole-genome association analyses using different models and different pedigrees allowed us to map a third locus (hereafter referred to as the RTS locus) that is essential for the expression of the RTS phenotype to the chromosomal region between 14 and 22 Mb on BTA5. Our findings clearly demonstrate that the RTS and dilution loci are distinct loci on BTA5. Conclusions Our study provides evidence that the RTS locus has effects on hair conformation and coat colour dilution and that the effect on coat colour dilution is clearly independent from that of the dilution locus. Finally, our results excluded several other loci that were previously reported to be associated with or to underlie hair conformation or pigmentation traits as the causal mutations of RTS and also several major functional candidate genes that are associated with hypotrichosis in humans. 
Our finding on the identification of a three-locus interaction that underlies RTS provides a prime example of epistatic interaction between several independent loci that is required for the expression of a distinct phenotype. Electronic supplementary material The online version of this article (doi:10.1186/s12711-016-0199-8) contains supplementary material, which is available to authorized users.
A whole genome association study of mother to child transmission of HIV in Malawi
Background More than 300,000 children are newly infected with HIV each year, predominantly through mother-to-child transmission (HIV MTCT). Identification of host genetic traits associated with transmission may more clearly explain the mechanisms of HIV MTCT and further the development of a vaccine to protect infants from infection. Associations between transmission and a selection of genes or single nucleotide polymorphisms (SNPs) may give an incomplete picture of HIV MTCT etiology. Thus, this study employed a genome-wide association approach to identify novel variants associated with HIV MTCT. Methods We conducted a nested case-control study of HIV MTCT using infants of HIV(+) mothers, drawn from a cohort study of malaria and HIV in pregnancy in Blantyre, Malawi. Whole genome scans (650,000 SNPs genotyped using Illumina genotyping assays) were obtained for each infant. Logistic regression was used to evaluate the association between each SNP and HIV MTCT. Results Genotype results were available for 100 HIV(+) infants (at birth, 6, or 12 weeks) and 126 HIV(-) infants (at birth, 6, and 12 weeks). We identified 9 SNPs within 6 genes with a P-value < 5 × 10−5 associated with the risk of transmission, in analyses either unadjusted or adjusted for maternal HIV viral load. Carriers of the rs8069770 variant allele had a lower risk of HIV MTCT (odds ratio = 0.27, 95% confidence interval = 0.14, 0.51), where rs8069770 is located within HS3ST3A1, a gene involved in heparan sulfate biosynthesis. Interesting associations for SNPs located within or near genes involved in pregnancy and development, innate immunological response, or HIV protein interactions were also observed. Conclusions This study used a genome-wide approach to identify novel variants associated with the risk of HIV MTCT in order to gain new insights into HIV MTCT etiology. Replication of this work using a larger sample size will help us to differentiate true positive findings.
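An odds ratio with a 95% confidence interval, as quoted for rs8069770, can be illustrated from a 2×2 carrier-by-outcome table. The study itself used logistic regression; the sketch below swaps in the simpler unadjusted Wald interval on the log odds ratio. The function name `odds_ratio_ci` and the counts are hypothetical and are not taken from the paper.

```python
import math


def odds_ratio_ci(exposed_cases, exposed_controls,
                  unexposed_cases, unexposed_controls, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    z=1.96 gives an approximate 95% interval. All four counts must be > 0.
    """
    cells = (exposed_cases, exposed_controls,
             unexposed_cases, unexposed_controls)
    if any(c <= 0 for c in cells):
        raise ValueError("all four cell counts must be positive")
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: variant-allele carriers vs non-carriers among
# infected (cases) and uninfected (controls) infants.
or_value, ci_low, ci_high = odds_ratio_ci(10, 90, 30, 70)
```

An interval that lies entirely below 1, like the one reported in the abstract (0.14 to 0.51), indicates that carriers had lower odds of transmission than non-carriers at the chosen confidence level.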
A Novel Splice Site Mutation in ALS2 Establishes the Diagnosis of Juvenile Amyotrophic Lateral Sclerosis in a Family with Early Onset Anarthria and Generalized Dystonias
The diagnosis of childhood neurological disorders remains challenging given the overlapping clinical presentation across subgroups and heterogeneous presentation within subgroups. To determine the underlying genetic cause of a severe neurological disorder in a large consanguineous Pakistani family presenting with severe scoliosis, anarthria and progressive neuromuscular degeneration, we performed genome-wide homozygosity mapping accompanied by whole-exome sequencing in two affected first cousins and their unaffected parents to find the causative mutation. We identified a novel homozygous splice-site mutation (c.3512+1G>A) in the ALS2 gene (NM_020919.3) encoding alsin that segregated with the disease in this family. Homozygous loss-of-function mutations in ALS2 are known to cause juvenile-onset amyotrophic lateral sclerosis (ALS), one of the many neurological conditions having overlapping symptoms with many neurological phenotypes. RT-PCR validation revealed that the mutation resulted in exon-skipping as well as the use of an alternative donor splice, both of which are predicted to cause loss-of-function of the resulting proteins. By examining 216 known neurological disease genes in our exome sequencing data, we also identified 9 other rare nonsynonymous mutations in these genes, some of which lie in highly conserved regions. Sequencing of a single proband might have led to mis-identification of some of these as the causative variant. Our findings established a firm diagnosis of juvenile ALS in this family, thus demonstrating the use of whole exome sequencing combined with linkage analysis in families as a powerful tool for establishing a quick and precise genetic diagnosis of complex neurological phenotypes.
Analysis of exome sequence in 604 trios for recessive genotypes in schizophrenia
Genetic associations involving both rare and common alleles have been reported for schizophrenia but there have been no systematic scans for rare recessive genotypes using fully phased trio data. Here, we use exome sequencing in 604 schizophrenia proband–parent trios to investigate the role of recessive (homozygous or compound heterozygous) nonsynonymous genotypes in the disorder. The burden of recessive genotypes was not significantly increased in probands at either a genome-wide level or in any individual gene after adjustment for multiple testing. At a system level, probands had an excess of nonsynonymous compound heterozygous genotypes (minor allele frequency, MAF ⩽1%) in voltage-gated sodium channels (VGSCs; eight in probands and none in parents, P = 1.5 × 10⁻⁴). Previous findings of multiple de novo loss-of-function mutations in this gene family, particularly SCN2A, in autism and intellectual disability provide biological and genetic plausibility for this finding. Pointing further to the involvement of VGSCs in schizophrenia, we found that these genes were enriched for nonsynonymous mutations (MAF ⩽0.1%) in cases genotyped using an exome array (5585 schizophrenia cases and 8103 controls), and that in the trios data, synaptic proteins interacting with VGSCs were also enriched for both compound heterozygosity (P = 0.018) and de novo mutations (P = 0.04). However, we were unable to replicate the specific association with compound heterozygosity at VGSCs in an independent sample of Taiwanese schizophrenia trios (N = 614). We conclude that recessive genotypes do not appear to make a substantial contribution to schizophrenia at a genome-wide level. Although multiple lines of evidence, including several from this study, suggest that rare mutations in VGSCs contribute to the disorder, in the absence of replication of the original findings regarding compound heterozygosity, this conclusion requires evaluation in a larger sample of trios.
Analysis of five chronic inflammatory diseases identifies 27 new associations and highlights disease specific patterns at shared loci
We simultaneously investigated the genetic landscape of ankylosing spondylitis, Crohn's disease, psoriasis, primary sclerosing cholangitis and ulcerative colitis to investigate pleiotropy and the relationship between these clinically related diseases. Using high-density genotype data from more than 86,000 individuals of European ancestry we identified 244 independent multi-disease signals including 27 novel genome-wide significant susceptibility loci and 3 unreported shared risk loci. Complex pleiotropy was supported when contrasting multi-disease signals with expression data sets from human, rat and mouse, and epigenetic and expressed enhancer profiles. The comorbidities among the five immune diseases were best explained by biological pleiotropy rather than heterogeneity (a subgroup of cases that is genetically identical to another disease, possibly due to diagnostic misclassification, molecular subtypes, or excessive comorbidity). In particular, the strong comorbidity between primary sclerosing cholangitis and inflammatory bowel disease is likely the result of a unique disease, which is genetically distinct from classical inflammatory bowel disease phenotypes.
Genetic variants in CETP increase risk of intracerebral hemorrhage
Objective In observational epidemiologic studies, higher plasma high‐density lipoprotein cholesterol (HDL‐C) has been associated with increased risk of intracerebral hemorrhage (ICH). DNA sequence variants that decrease cholesteryl ester transfer protein (CETP) gene activity increase plasma HDL‐C; as such, medicines that inhibit CETP and raise HDL‐C are in clinical development. Here, we test the hypothesis that CETP DNA sequence variants associated with higher HDL‐C also increase risk for ICH. Methods We performed 2 candidate‐gene analyses of CETP. First, we tested individual CETP variants in a discovery cohort of 1,149 ICH cases and 1,238 controls from 3 studies, followed by replication in 1,625 cases and 1,845 controls from 5 studies. Second, we constructed a genetic risk score comprised of 7 independent variants at the CETP locus and tested this score for association with HDL‐C as well as ICH risk. Results Twelve variants within CETP demonstrated nominal association with ICH, with the strongest association at the rs173539 locus (odds ratio [OR] = 1.25, standard error [SE] = 0.06, p = 6.0 × 10⁻⁴) with no heterogeneity across studies (I² = 0%). This association was replicated in patients of European ancestry (p = 0.03). A genetic score of CETP variants found to increase HDL‐C by ∼2.85 mg/dl in the Global Lipids Genetics Consortium was strongly associated with ICH risk (OR = 1.86, SE = 0.13, p = 1.39 × 10⁻⁶). Interpretation Genetic variants in CETP associated with increased HDL‐C raise the risk of ICH. Given ongoing therapeutic development in CETP inhibition and other HDL‐raising strategies, further exploration of potential adverse cerebrovascular outcomes may be warranted. Ann Neurol 2016;80:730–740
Comparison of Genome Wide Variation between Malawians and African Ancestry HapMap Populations
Understanding genetic variation between populations is important because it affects the portability of human genome wide analytical methods. We compared genetic variation and substructure between Malawians and other African and non-African HapMap populations. Allele frequencies and adjacent linkage disequilibrium (LD) were measured for 617,715 single nucleotide polymorphisms (SNPs) across subject genomes. Allele frequencies in the Malawian population (N = 226) were highly correlated with allele frequencies in HapMap populations of African Ancestry (AFA, N = 376), namely Yoruban in Ibadan, Nigeria (Spearman’s r² = 0.97), Luhya in Webuye, Kenya (r² = 0.97), African Americans in the southwest United States (r² = 0.94), and Maasai in Kinyawa, Kenya (r² = 0.91). This correlation was much lower between Malawians and other ancestry populations (r² < 0.52). LD correlations between Malawians and HapMap populations were strongest for the populations of African ancestry (AFA r² > 0.82, other ancestries r² < 0.57). Principal components analyses revealed little population substructure within our Malawi sample but provided clear distinction between Malawians, AFA populations, and two European populations. Five SNPs within the lactase gene (LCT) had substantially different allele frequencies between the Malawi population and Maasai in Kinyawa, Kenya (rs3769013, rs730005, rs3769012, rs2304370; p values < 1 × 10⁻³³).
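The allele-frequency comparison reported above can be illustrated with a minimal Python sketch that computes a squared Spearman rank correlation between two vectors of per-SNP allele frequencies. The function names and the toy frequencies are hypothetical and are not taken from the study; the abstract does not state which software or filtering steps were used.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_r2(freqs_a, freqs_b):
    """Squared Spearman correlation: Pearson correlation of the ranks, squared."""
    r = pearson(average_ranks(freqs_a), average_ranks(freqs_b))
    return r * r


# Toy allele frequencies for the same five SNPs in two populations (made up).
population_a = [0.10, 0.25, 0.40, 0.55, 0.90]
population_b = [0.12, 0.20, 0.45, 0.50, 0.85]
print(spearman_r2(population_a, population_b))
```

Because the statistic works on ranks, it measures whether SNPs are ordered similarly by frequency in both populations rather than whether the frequencies themselves are equal.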
RAD sequencing reveals genomewide divergence between independent invasions of the European green crab (Carcinus maenas) in the Northwest Atlantic
Genomic studies of invasive species can reveal both invasive pathways and functional differences underpinning patterns of colonization success. The European green crab (Carcinus maenas) was initially introduced to eastern North America nearly 200 years ago where it expanded northwards to eastern Nova Scotia. A subsequent invasion to Nova Scotia from a northern European source allowed further range expansion, providing a unique opportunity to study the invasion genomics of a species with multiple invasions. Here, we use restriction‐site‐associated DNA sequencing‐derived SNPs to explore fine‐scale genomewide differentiation between these two invasions. We identified 9137 loci from green crab sampled from 11 locations along eastern North America and compared spatial variation to mitochondrial COI sequence variation used previously to characterize these invasions. Overall spatial divergence among invasions was high (pairwise FST ~0.001 to 0.15) and spread across many loci, with a mean FST ~0.052 and 52% of loci examined characterized by FST values >0.05. The majority of the most divergent loci (i.e., outliers, ~1.2%) displayed latitudinal clines in allele frequency highlighting extensive genomic divergence among the invasions. Discriminant analysis of principal components (both neutral and outlier loci) clearly resolved the two invasions spatially and was highly correlated with mitochondrial divergence. Our results reveal extensive cryptic intraspecific genomic diversity associated with differing patterns of colonization success and demonstrate clear utility for genomic approaches to delineating the distribution and colonization success of aquatic invasive species.
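To make the per-locus FST summary above concrete, the following Python sketch computes FST for biallelic loci from allele frequencies in two populations, then reports the mean and the share of loci above 0.05. It assumes the simple heterozygosity-based form (HT − HS)/HT with equal population weights; the abstract does not say which estimator the authors used, and the function names and toy frequencies are hypothetical.

```python
def locus_fst(p1, p2):
    """FST for one biallelic locus from its allele frequency in two populations.

    Assumes FST = (HT - HS) / HT with equal weights for both populations.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled populations
    if h_total == 0:
        return 0.0  # locus is fixed for the same allele in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return (h_total - h_within) / h_total


def share_above(fst_values, threshold=0.05):
    """Fraction of loci whose FST exceeds the threshold."""
    return sum(1 for f in fst_values if f > threshold) / len(fst_values)


# Toy allele frequencies (population 1, population 2) for four loci (made up).
toy_loci = [(0.50, 0.50), (0.40, 0.60), (0.10, 0.60), (0.00, 1.00)]
fst_values = [locus_fst(p1, p2) for p1, p2 in toy_loci]
mean_fst = sum(fst_values) / len(fst_values)
print(mean_fst, share_above(fst_values))
```

Real RAD-seq analyses typically use estimators that correct for sample size, so this sketch is only meant to show what the reported quantities summarize.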
Genetic Risk Score of NOS Gene Variants Associated with Myocardial Infarction Correlates with Coronary Incidence across Europe
Coronary artery disease (CAD) mortality and morbidity are present in the European continent in a four-fold gradient across populations, from the South (Spain and France) with the lowest CAD mortality, towards the North (Finland and UK). This observed gradient has not been fully explained by classical or single genetic risk factors, resulting in some cases in the so-called Southern European or Mediterranean paradox. Here we approached population genetic risk estimates using genetic risk scores (GRS) constructed with single nucleotide polymorphisms (SNPs) from nitric oxide synthase (NOS) genes. These SNPs appeared to be associated with myocardial infarction (MI) in 2165 cases and 2153 controls. The GRSs were computed in 34 general European populations. Although the contribution of these GRS was lower than 1% between cases and controls, the mean GRS per population was positively correlated with coronary incidence explaining 65–85% of the variation among populations (67% in women and 86% in men). This large contribution to CAD incidence variation among populations might be a result of collinearity with several other common genetic and environmental factors. These results are not consistent with the cardiovascular Mediterranean paradox for genetics and support a CAD genetic architecture mainly based on combinations of common genetic polymorphisms. Population genetic risk scores are a promising approach in public health interventions to develop lifestyle programs and prevent intermediate risk factors in certain subpopulations with specific genetic predisposition.
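The population-level approach described above can be sketched in Python as follows: a mean GRS per population is derived from risk-allele frequencies and then correlated with incidence across populations. This is a minimal illustration under stated assumptions (an additive score, an expected risk-allele count of 2p per SNP, optional per-SNP weights); the abstract does not specify the scoring scheme, and all names and numbers below are hypothetical.

```python
def population_mean_grs(risk_allele_freqs, weights=None):
    """Expected additive GRS of a random individual in a population.

    Each diploid individual carries on average 2 * p copies of a risk allele
    with frequency p; weights default to 1 (an unweighted allele count).
    """
    if weights is None:
        weights = [1.0] * len(risk_allele_freqs)
    return sum(2 * p * w for p, w in zip(risk_allele_freqs, weights))


def r_squared(x, y):
    """Squared Pearson correlation: share of variance in y explained by x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)


# Toy data: risk-allele frequencies at three SNPs and an incidence value
# for four populations (all numbers made up for illustration).
populations = {
    "pop_a": ([0.20, 0.30, 0.10], 150.0),
    "pop_b": ([0.25, 0.35, 0.15], 210.0),
    "pop_c": ([0.30, 0.40, 0.20], 260.0),
    "pop_d": ([0.35, 0.45, 0.30], 390.0),
}
mean_scores = [population_mean_grs(freqs) for freqs, _ in populations.values()]
incidence = [inc for _, inc in populations.values()]
print(r_squared(mean_scores, incidence))
```

A high r² between population means does not imply a large effect within individuals, which matches the abstract's contrast between the under-1% contribution among cases and controls and the much larger share of variation among populations.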
Genetic Ancestry of Hadza and Sandawe Peoples Reveals Ancient Population Structure in Africa
The Hadza and Sandawe populations in present-day Tanzania speak languages containing click sounds and are therefore thought to be distantly related to southern African Khoisan languages. We analyzed genome-wide genotype data for individuals sampled from the Hadza and Sandawe populations in the context of a global data set of 3,528 individuals from 163 ethno-linguistic groups. We found that Hadza and Sandawe individuals share ancestry distinct from and most closely related to Omotic ancestry; share Khoisan ancestry with populations such as ≠Khomani, Karretjie, and Ju/’hoansi in southern Africa; share Niger-Congo ancestry with populations such as Yoruba from Nigeria and Luhya from Kenya, consistent with migration associated with the Bantu Expansion; and share Cushitic ancestry with Somali, multiple Ethiopian populations, the Maasai population in Kenya, and the Nama population in Namibia. We detected evidence for low levels of Arabian, Nilo-Saharan, and Pygmy ancestries in a minority of individuals. Our results indicate that west Eurasian ancestry in eastern Africa is more precisely the Arabian parent of Cushitic ancestry. Relative to the Out-of-Africa migrations, Hadza ancestry emerged early whereas Sandawe ancestry emerged late.
Genome Wide Meta Analysis of Sciatica in Finnish Population
Sciatica or the sciatic syndrome is a common and often disabling low back disorder in the working-age population. It has a relatively high heritability but poorly understood molecular mechanisms. The Finnish population is a genetic isolate where a small founder population and bottleneck events have led to enrichment of certain rare and low frequency variants. We performed here the first genome-wide association study (GWAS) and meta-analysis of sciatica. The meta-analysis was conducted across two GWAS covering 291 Finnish sciatica cases and 3671 controls genotyped and imputed at 7.7 million autosomal variants. The most promising loci (p < 1 × 10⁻⁶) were replicated in 776 Finnish sciatica patients and 18,489 controls. We identified five intragenic variants, with relatively low frequencies, at two novel loci associated with sciatica at genome-wide significance. These included chr9:14344410:I (rs71321981) at 9p22.3 (NFIB gene; p = 1.30 × 10⁻⁸, MAF = 0.08) and four variants at 15q21.2: rs145901849, rs80035109, rs190200374 and rs117458827 (MYO5A; p = 1.34 × 10⁻⁸, MAF = 0.06; p = 2.32 × 10⁻⁸, MAF = 0.07; p = 3.85 × 10⁻⁸, MAF = 0.06; p = 4.78 × 10⁻⁸, MAF = 0.07, respectively). The most significant association in the meta-analysis, a single base insertion rs71321981 within the regulatory region of the transcription factor NFIB, replicated in an independent Finnish population sample (p = 0.04). Despite identifying 15q21.2 as a promising locus, we were not able to replicate it. The locus was differentiated between populations: the lead variants within 15q21.2 were more frequent in Finland (6–7%) than in other European populations (1–2%). Imputation accuracies of the three significantly associated variants (chr9:14344410:I, rs190200374, and rs80035109) were validated by genotyping. In summary, our results suggest a novel locus, 9p22.3 (NFIB), which may be involved in susceptibility to sciatica. In addition, another locus, 15q21.2, emerged as a promising one, but failed to replicate.
Dense fine mapping study identifies new susceptibility loci for primary biliary cirrhosis
We genotyped 2,861 cases from the UK PBC consortium and 8,514 UK population controls across 196,524 variants within 186 known autoimmune risk loci. We identified three loci newly associated with primary biliary cirrhosis (PBC) (with P < 5 × 10⁻⁸), increasing the number of known susceptibility loci to 25. The most associated variant at 19p12 is a low-frequency non-synonymous SNP in TYK2, further implicating JAK/STAT and cytokine signalling in disease pathogenesis. A further five loci contained non-synonymous variants in high linkage disequilibrium (LD) (r² > 0.8) with the most associated variant at the locus. We found multiple independent common, low-frequency and rare variant association signals at five loci. Of the 26 independent non-HLA signals tagged on Immunochip, 15 have SNPs in B-lymphoblastoid open-chromatin regions in high LD (r² > 0.8) with the most associated variant. This study demonstrates how dense fine-mapping arrays coupled with functional genomic data can be utilized to identify candidate causal variants for functional follow-up.
Whole genome sequence analysis reveals differences in population management and selection of European low input pig breeds
Background A major concern in conservation genetics is to maintain the genetic diversity of populations. Genetic variation in livestock species is threatened by the progressive marginalisation of local breeds in benefit of high-output pigs worldwide. We used high-density SNP and re-sequencing data to assess genetic diversity of local pig breeds from Europe. In addition, we re-sequenced pigs from commercial breeds to identify potential candidate mutations responsible for phenotypic divergence among these groups of breeds. Results Our results point out some local breeds with low genetic diversity, whose genome shows a high proportion of regions of homozygosis (>50%) and that harbour a large number of potentially damaging mutations. We also observed a high correlation between genetic diversity estimates using high-density SNP data and Next Generation Sequencing data (r = 0.96 at individual level). The study of non-synonymous SNPs that were fixed in commercial breeds and also in any local breed, but with different allele, revealed 99 non-synonymous SNPs affecting 65 genes. Candidate mutations that may underlie differences in the adaptation to the environment were exemplified by the genes AZGP1 and TAS2R40. We also observed that highly productive breeds may have lost advantageous genotypes within genes involve in immune response – e.g. IL12RB2 and STAB1–, probably as a result of strong artificial in the intensive production systems in pig. Conclusions The high correlation between genetic diversity computed with the 60K SNP and whole genome re-sequence data indicates that the Porcine 60K SNP Beadchip provides reliable estimates of genomic diversity in European pig populations despite the expected bias. Moreover, this analysis gave insights for strategies to the genetic characterization of local breeds. 
The comparison between re-sequenced local pigs and re-sequenced commercial pigs made it possible to report candidate mutations to be responsible for phenotypic divergence among those groups of breeds. This study highlights the importance of low input breeds as a valuable genetic reservoir for the pig production industry. However, the high levels of ROHs, inbreeding and potentially damaging mutations emphasize the importance of the genetic characterization of local breeds to preserve their genomic variability. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-601) contains supplementary material, which is available to authorized users.
Genome Wide Association Study for Type 2 Diabetes in Indians Identifies a New Susceptibility Locus at 2q21
Indians undergoing socioeconomic and lifestyle transitions will be maximally affected by the epidemic of type 2 diabetes (T2D). We conducted a two-stage genome-wide association study of T2D in 12,535 Indians, a less explored but high-risk group. We identified a new type 2 diabetes–associated locus at 2q21, with the lead signal being rs6723108 (odds ratio 1.31; P = 3.32 × 10^−9). Imputation analysis refined the signal to rs998451 (odds ratio 1.56; P = 6.3 × 10^−12) within TMEM163, which encodes a probable vesicular transporter in nerve terminals. TMEM163 variants also showed association with decreased fasting plasma insulin and homeostatic model assessment of insulin resistance, indicating a plausible effect through impaired insulin secretion. The 2q21 region also harbors RAB3GAP1 and ACMSD, which are involved in neurologic disorders. Forty-nine of 56 previously reported signals showed consistency in direction with similar effect sizes in Indians and previous studies, and 25 of them were also associated (P < 0.05). Known loci and the newly identified 2q21 locus altogether explained 7.65% of the variance in the risk of T2D in Indians. Our study suggests that common susceptibility variants for T2D are largely the same across populations, but also reveals a population-specific locus and provides further insights into genetic architecture and etiology of T2D.
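The directional consistency reported above (49 of 56 previously reported signals) can be put in perspective with a one-sided sign test. The sketch below is an illustration only and is not an analysis described in the abstract; it assumes that, under the null hypothesis, each signal is equally likely to point in either direction and that the signals are independent. The function name is hypothetical.

```python
from math import comb


def sign_test_upper_tail(k: int, n: int) -> float:
    """Return P(X >= k) for X ~ Binomial(n, 0.5) (one-sided sign test)."""
    # Count the outcomes with at least k concordant signals, then divide
    # by the total number of equally likely sign patterns (2 ** n).
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return favourable / 2 ** n


# 49 of 56 known signals agreed in direction with earlier studies.
p_concordance = sign_test_upper_tail(49, 56)
print(f"One-sided sign-test P for 49/56 concordant signals: {p_concordance:.2e}")
```

Under these assumptions, agreement this strong would be very unlikely to arise by chance, which is in line with the abstract's conclusion that common susceptibility variants are largely shared across populations.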
The genetic architecture of pediatric cognitive abilities in the Philadelphia Neurodevelopmental Cohort
The objective of this analysis was to examine the genetic architecture of diverse cognitive abilities in children and adolescents, including the magnitude of common genetic effects and patterns of shared and unique genetic influences. Subjects included 3,689 members of the Philadelphia Neurodevelopmental Cohort, a general population sample of ages 8-21 years who completed an extensive battery of cognitive tests. We used genome-wide complex trait analysis (GCTA) to estimate the SNP-based heritability of each domain, as well as the genetic correlation between all domains that showed significant genetic influence. Several of the individual domains suggested strong influence of common genetic variants (e.g. reading ability, h2g=0.43, p=4e-06; emotion identification, h2g=0.36, p=1e-05; verbal memory, h2g=0.24, p=0.005). The genetic correlations highlighted trait domains that are candidates for joint interrogation in future genetic studies (e.g. language reasoning and spatial reasoning, r(g)=0.72, p=0.007). These results can be used to structure future genetic and neuropsychiatric investigations of diverse cognitive abilities.
Variants near CHRNA3/5 and APOE have age and sex related effects on human lifespan
Understanding the genetic influences on human aging requires a large number of subjects for a study of sufficient power. Here, Jim Wilson and colleagues use information on parental ages at death to show that common variants near the genes for apolipoprotein E and nicotinic acetylcholine receptor subunit alpha 5 are associated with longer lifespan. Lifespan is a trait of enormous personal interest. Research into the biological basis of human lifespan, however, is hampered by the long time to death. Using a novel approach of regressing 272,081 parental lifespans beyond age 40 years on participant genotype in a new large data set (UK Biobank), we here show that common variants near the apolipoprotein E and nicotinic acetylcholine receptor subunit alpha 5 genes are associated with lifespan. The effects are strongly sex and age dependent, with APOE ɛ4 differentially influencing maternal lifespan (P = 4.2 × 10^−15, effect −1.24 years of maternal life per imputed risk allele in parent; sex difference, P = 0.011), and a locus near CHRNA3/5 differentially affecting paternal lifespan (P = 4.8 × 10^−11, effect −0.86 years per allele; sex difference, P = 0.075). Rare homozygous carriers of the risk alleles at both loci are predicted to have 3.3–3.7 years shorter lives.
Genomewide meta‐analysis identifies loci associated with IGF‐I and IGFBP‐3 levels with impact on age‐related traits
Summary The growth hormone/insulin‐like growth factor (IGF) axis can be manipulated in animal models to promote longevity, and IGF‐related proteins including IGF‐I and IGF‐binding protein‐3 (IGFBP‐3) have also been implicated in risk of human diseases including cardiovascular diseases, diabetes, and cancer. Through genomewide association study of up to 30,884 adults of European ancestry from 21 studies, we confirmed and extended the list of previously identified loci associated with circulating IGF‐I and IGFBP‐3 concentrations (IGF1, IGFBP3, GCKR, TNS3, GHSR, FOXO3, ASXL2, NUBP2/IGFALS, SORCS2, and CELSR2). Significant sex interactions, which were characterized by different genotype–phenotype associations between men and women, were found only for associations of IGFBP‐3 concentrations with SNPs at the loci IGFBP3 and SORCS2. Analyses of SNPs, gene expression, and protein levels suggested that interplay between IGFBP3 and genes within the NUBP2 locus (IGFALS and HAGH) may affect circulating IGF‐I and IGFBP‐3 concentrations. The IGF‐I‐decreasing allele of SNP rs934073, which is an eQTL of ASXL2, was associated with lower adiposity and higher likelihood of survival beyond 90 years. The known longevity‐associated variant rs2153960 (FOXO3) was observed to be a genomewide significant SNP for IGF‐I concentrations. Bioinformatics analysis suggested enrichment of putative regulatory elements among these IGF‐I‐ and IGFBP‐3‐associated loci, particularly of rs646776 at CELSR2. In conclusion, this study identified several loci associated with circulating IGF‐I and IGFBP‐3 concentrations and provides clues to the potential role of the IGF axis in mediating effects of known (FOXO3) and novel (ASXL2) longevity‐associated loci.
Genetic variant rs17225178 in the ARNT2 gene is associated with Asperger Syndrome
Background Autism Spectrum Conditions (ASC) are neurodevelopmental conditions characterized by difficulties in communication and social interaction, alongside unusually repetitive behaviours and narrow interests. Asperger Syndrome (AS) is one subgroup of ASC and differs from classic autism in that in AS there is no language or general cognitive delay. Genetic, epigenetic and environmental factors are implicated in ASC, and genes involved in neural connectivity and neurodevelopment are good candidates for studying the susceptibility to ASC. The aryl-hydrocarbon receptor nuclear translocator 2 (ARNT2) gene encodes a transcription factor involved in neurodevelopmental processes, neuronal connectivity and cellular responses to hypoxia. A mutation in this gene has been identified in individuals with ASC, and single nucleotide polymorphisms (SNPs) have been nominally associated with AS and autistic traits in previous studies. Methods In this study, we tested 34 SNPs in ARNT2 for association with AS in 118 cases and 412 controls of Caucasian origin. P values were adjusted for multiple comparisons, and linkage disequilibrium (LD) among the SNPs analysed was calculated in our sample. Finally, SNP annotation allowed functional and structural analyses of the genetic variants in ARNT2. We tested the replicability of our result using the genome-wide association studies (GWAS) database of the Psychiatric Genomics Consortium (PGC). Results We report a statistically significant association of rs17225178 with AS. This SNP modifies transcription factor binding sites and regions that regulate the chromatin state in neural cell lines. It is also included in an LD block in our sample, alongside other genetic variants that alter chromatin regulatory regions in neural cells. Conclusions These findings demonstrate that rs17225178 in the ARNT2 gene is associated with AS and support previous studies that pointed to an involvement of this gene in the predisposition to ASC.
Genetic Variants on Chromosome 1p13.3 Are Associated with Non ST Elevation Myocardial Infarction and the Expression of DRAM2 in the Finnish Population
Myocardial infarction (MI) is divided into either ST elevation MI (STEMI) or non-ST elevation MI (NSTEMI), differing in a number of clinical characteristics. We sought to identify genetic variants conferring risk to NSTEMI or STEMI by conducting a genome-wide association study (GWAS) of MI stratified into NSTEMI and STEMI in a consecutive sample of 1,579 acute MI cases with 1,576 controls. Subsequently, we followed the results in an independent population-based sample of 562 cases and 566 controls, a partially independent prospective cohort (N = 16,627 with 163 incident NSTEMI cases), and examined the effect of disease-associated variants on gene expression in 513 healthy participants. Genetic variants on chromosome 1p13.3 near the damage-regulated autophagy modulator 2 gene DRAM2 were associated with NSTEMI (rs656843; odds ratio 1.57, P = 3.11 × 10^−10) in the case-control analysis, with a consistent but not statistically significant effect in the prospective cohort (rs656843; hazard ratio 1.13, P = 0.43). These variants were not associated with STEMI (rs656843; odds ratio 1.11, P = 0.20; hazard ratio 0.97, P = 0.87), appearing to have a pronounced effect on NSTEMI risk. A majority of the variants at 1p13.3 associated with NSTEMI were also associated with the expression level of DRAM2 in blood leukocytes of healthy controls (top-ranked variant rs325927, P = 1.50 × 10^−12). The results suggest that genetic factors may in part influence whether coronary artery disease results in NSTEMI rather than STEMI.
Genomic population structure of freshwater‐resident and anadromous ide (Leuciscus idus) in north‐western Europe
Abstract Climate change experts largely agree that future climate change and associated rises in oceanic water levels over the upcoming decades will affect marine salinity levels. The subsequent effects on fish communities in estuarine ecosystems, however, are less clear. One species that is likely to become increasingly affected by changes in salinity is the ide (Leuciscus idus). The ide is a stenohaline freshwater fish that primarily inhabits rivers, with frequent anadromous behavior when sea salinity does not exceed 15‰. Unlike most other anadromous Baltic Sea fish species, the ide has yet to be subjected to large‐scale stocking programs, and thus provides an excellent opportunity for studying the natural population structure across the current salinity gradient in the Danish Belts. To explore this, we used Genotyping‐by‐Sequencing to determine genomic population structure of both freshwater resident and anadromous ide populations in the western Baltic Sea region, and relate the results to the current salinity gradient and the demographic history of ide in the region. The sample sites separate into four clusters, with all anadromous populations in one cluster and the freshwater resident populations in the remaining three. Results demonstrate a high level of differentiation between sites hosting freshwater resident populations, but little differentiation among anadromous populations. Thus ide exhibit the genomic population structure of both a typical freshwater species and a typical anadromous species. In addition to providing a first insight into the population structure of north‐western European ide, our data also (1) provide indications of a single illegal introduction by man; (2) suggest limited genetic effects of heavy pollution in the past; and (3) indicate possible historical anadromous behavior in a now isolated freshwater population.
A genome wide association study of posttraumatic stress disorder identifies the retinoid related orphan receptor alpha (RORA) gene as a significant risk locus
We describe the results of the first genome-wide association study of PTSD performed using trauma-exposed white non-Hispanic participants from a cohort of veterans and their intimate partners (295 cases and 196 controls). Several SNPs yielded evidence of association. One SNP (rs8042149), located in the retinoid-related orphan receptor alpha gene (RORA), reached genome-wide significance. Nominally significant associations were observed for other RORA SNPs in two African American replication samples—one from the veteran cohort (43 cases and 41 controls) and another independent cohort (100 cases and 421 controls). However, only the associated SNP from the veteran African American replication sample survived gene-level multiple testing correction. RORA has been implicated in prior GWAS studies of psychiatric disorders and is known to play an important role in neuroprotection and other behaviorally-relevant processes. This study represents an important step towards identifying the genetic underpinnings of PTSD.
A Pooled Genome Wide Association Study of Asperger Syndrome
Asperger Syndrome (AS) is a neurodevelopmental condition characterized by impairments in social interaction and communication, alongside the presence of unusually repetitive, restricted interests and stereotyped behaviour. Individuals with AS have no delay in cognitive and language development. It is a subset of Autism Spectrum Conditions (ASC), which are highly heritable and have a population prevalence of approximately 1%. Few studies have investigated the genetic basis of AS. To address this gap in the literature, we performed a genome-wide pooled DNA association study to identify candidate loci in 612 individuals (294 cases and 318 controls) of Caucasian ancestry, using the Affymetrix GeneChip Human Mapping version 6.0 array. We identified 11 SNPs that had a p-value below 1 × 10^−5. These SNPs were independently genotyped in the same sample. Three of the SNPs (rs1268055, rs7785891 and rs2782448) were nominally significant, though none remained significant after Bonferroni correction. Two of our top three SNPs (rs7785891 and rs2782448) lie in loci previously implicated in ASC. However, investigation of the three SNPs in the ASC genome-wide association dataset from the Psychiatric Genomics Consortium indicated that these three SNPs were not significantly associated with ASC. The effect sizes of the variants were modest, indicating that our study was not sufficiently powered to identify causal variants with precision.
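The contrast drawn above between nominal significance and significance after Bonferroni correction can be sketched as follows. This is a minimal illustration, not the authors' analysis: the abstract does not report the individual p-values, so the values below are hypothetical, and a family of 11 tests (one per follow-up SNP) at alpha = 0.05 is an assumption.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the Bonferroni threshold and a per-test pass/fail list."""
    # Bonferroni divides the family-wise error rate by the number of tests.
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]


# Hypothetical p-values for the 11 individually genotyped SNPs
# (illustrative only; not taken from the study).
p_values = [0.012, 0.031, 0.044, 0.08, 0.15, 0.22, 0.35, 0.41, 0.58, 0.73, 0.90]

threshold, passes = bonferroni(p_values)
nominal = [p < 0.05 for p in p_values]

print(f"Bonferroni threshold for {len(p_values)} tests: {threshold:.5f}")
print(f"Nominally significant (p < 0.05): {sum(nominal)}")
print(f"Significant after correction: {sum(passes)}")
```

With these made-up values, three SNPs fall below 0.05, but none falls below the corrected threshold of 0.05/11, which mirrors the pattern the abstract describes.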
Common variation near ROBO2 is associated with expressive vocabulary in infancy
The genetic basis of expressive vocabulary in children around 2 years old is poorly understood. Here, the authors show that a genetic variant near the ROBO2 gene is associated with early language acquisition in the general population and highlight a potential genetic link between language-related common genetic variation and a linkage region for dyslexia, speech-sound disorder and reading. Twin studies suggest that expressive vocabulary at ~24 months is modestly heritable. However, the genes influencing this early linguistic phenotype are unknown. Here we conduct a genome-wide screen and follow-up study of expressive vocabulary in toddlers of European descent from up to four studies of the EArly Genetics and Lifecourse Epidemiology consortium, analysing an early (15–18 months, ‘one-word stage’, NTotal=8,889) and a later (24–30 months, ‘two-word stage’, NTotal=10,819) phase of language acquisition. For the early phase, one single-nucleotide polymorphism (rs7642482) at 3p12.3 near ROBO2, encoding a conserved axon-binding receptor, reaches the genome-wide significance level (P = 1.3 × 10^−8) in the combined sample. This association links language-related common genetic variation in the general population to a potential autism susceptibility locus and a linkage region for dyslexia, speech-sound disorder and reading. The contribution of common genetic influences is, although modest, supported by genome-wide complex trait analysis (meta-GCTA h² at 15–18 months = 0.13, meta-GCTA h² at 24–30 months = 0.14) and in concordance with additional twin analysis (5,733 pairs of European descent, h² at 24 months = 0.20).
Expression Quantitative Trait Loci Information Improves Predictive Modeling of Disease Relevance of Non Coding Genetic Variation
Disease-associated loci identified through genome-wide association studies (GWAS) frequently localize to non-coding sequence. We and others have demonstrated strong enrichment of such single nucleotide polymorphisms (SNPs) for expression quantitative trait loci (eQTLs), supporting an important role for regulatory genetic variation in complex disease pathogenesis. Herein we describe our initial efforts to develop a predictive model of disease-associated variants leveraging eQTL information. We first catalogued cis-acting eQTLs (SNPs within 100 kb of target gene transcripts) by meta-analyzing four studies of three blood-derived tissues (n = 586). At a false discovery rate < 5%, we mapped eQTLs for 6,535 genes; these were enriched for disease-associated genes (P < 10^-4), particularly those related to immune diseases and metabolic traits. Based on eQTL information and other variant annotations (distance from target gene transcript, minor allele frequency, and chromatin state), we created multivariate logistic regression models to predict SNP membership in reported GWAS. The complete model revealed independent contributions of specific annotations as strong predictors, including evidence for an eQTL (odds ratio (OR) = 1.2–2.0, P < 10^-11) and the chromatin states of active promoters, different classes of strong or weak enhancers, or transcriptionally active regions (OR = 1.5–2.3, P < 10^-11). This complete prediction model including eQTL association information ultimately allowed for better discrimination of SNPs with higher probabilities of GWAS membership (6.3–10.0%, compared to 3.5% for a random SNP) than the other two models excluding eQTL information. This eQTL-based prediction model of disease relevance can help systematically prioritize non-coding GWAS SNPs for further functional characterization.
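To make the odds ratios above more concrete, the sketch below shows how an odds ratio from a logistic regression shifts a baseline probability. It is a generic illustration, not the authors' model: the function name is hypothetical, and applying a single OR to the 3.5% random-SNP baseline is an assumption made only for the example.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Probability after multiplying the baseline odds by an odds ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Baseline of 3.5% (random SNP, from the abstract) combined with an
# illustrative OR of 2.0, the upper end of the reported eQTL range.
shifted = apply_odds_ratio(0.035, 2.0)
```

Because the baseline is small, doubling the odds roughly doubles the probability. The full model in the abstract combines several annotations, so this single-factor calculation is not expected to reproduce the reported 6.3–10.0% range.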
Clinical and Molecular Heterogeneity of RTEL1 Deficiency
Typical features of dyskeratosis congenita (DC) resulting from excessive telomere shortening include bone marrow failure (BMF), mucosal fragility, and pulmonary or liver fibrosis. In more severe cases, immune deficiency and recurring infections can add to disease severity. RTEL1 deficiency has recently been described as a major genetic etiology, but the molecular basis and clinical consequences of RTEL1-associated DC are incompletely characterized. We report our observations in a cohort of six patients: five with novel biallelic RTEL1 mutations p.Trp456Cys, p.Ile425Thr, p.Cys1244ProfsX17, p.Pro884_Gln885ins53X13, and one with the novel heterozygous mutation p.Val796AlafsX4. The most unifying features were hypocellular BMF in 6/6 and B-/NK-cell lymphopenia in 5/6 patients. In addition, three patients with homozygous mutations p.Trp456Cys or p.Ile425Thr also suffered from immunodeficiency, cerebellar hypoplasia, and enteropathy, consistent with Hoyeraal-Hreidarsson syndrome. Chromosomal breakage resembling a homologous recombination defect was detected in patient-derived fibroblasts but not in the hematopoietic compartment. Notably, in both cellular compartments, differential expression of 1243aa and 1219/1300aa RTEL1 isoforms was observed. In fibroblasts, response to ionizing irradiation and non-homologous end joining were not impaired. Telomeric circles did not accumulate in patient-derived primary cells and lymphoblastoid cell lines, implying alternative pathomechanisms for telomeric loss. Overall, RTEL1-deficient cells exhibited a phenotype of replicative exhaustion, spontaneous apoptosis and senescence. Specifically, CD34+ cells failed to expand in vitro, B-cell development was compromised, and T-cells did not proliferate in long-term culture. Finally, we report on the natural history and outcome of our patients. While two patients died from infections, hematopoietic stem cell transplantation (HSCT) resulted in sustained engraftment in two patients. Whether chemotherapy negatively affects the course and onset of other DC-related symptoms remains open at present. Early-onset lung disease occurred in one of our patients after HSCT. In conclusion, RTEL1 deficiency can show a heterogeneous clinical picture ranging from mild hypocellular BMF with B/NK cell lymphopenia to early-onset, very severe, and rapidly progressing cellular deficiency.
Susceptibility Loci Associated with Specific and Shared Subtypes of Lymphoid Malignancies
Author Summary B-cell lymphomas comprise several diseases representing aberrant proliferations of immune cells at various stages of maturation. It might be expected that dissimilar subtypes of lymphoma will have different etiologic and pathogenic mechanisms, reflecting the distinct histologic and clinical characteristics of these diseases. This study aims to define both shared as well as specific genetic risk factors for lymphoma. Utilizing a genome-wide approach, we discovered novel locations in the genome associated with risk for lymphoid malignancies. Common variants in these regions, on chromosome 11q12.1 and 6p23, were each associated with a modest modification of risk for lymphoma. These regions harbor several genes of biological importance in lymphoid maturation and function. We also further characterized the HLA region at 6p21.32, previously associated with lymphoma risk and thought to be important in immune function. Some of the associated SNP markers were specific for one common subtype of lymphoma, e.g. follicular lymphoma. However, others were associated with combined subsets of disease, suggesting that there are both shared and subtype-specific associations between common genetic variants and human lymphoid cancer. Secondary analyses showed that the two novel regions harbor candidates that are biologically relevant and that regulate cell development and hematopoiesis.
The genetics of lymphoma susceptibility reflect the marked heterogeneity of diseases that comprise this broad phenotype. However, multiple subtypes of lymphoma are observed in some families, suggesting shared pathways of genetic predisposition to these pathologically distinct entities. Using a two-stage GWAS, we tested 530,583 SNPs in 944 cases of lymphoma, including 282 familial cases, and 4,044 public shared controls, followed by genotyping of 50 SNPs in 1,245 cases and 2,596 controls. A novel region on 11q12.1 showed association with combined lymphoma (LYM) subtypes. SNPs in this region included rs12289961 near LPXN (P_LYM = 3.89 × 10^-8, OR = 1.29) and rs948562 (P_LYM = 5.85 × 10^-7, OR = 1.29). A SNP in a novel non-HLA region on 6p23 (rs707824, P_NHL = 5.72 × 10^-7) was suggestive of an association conferring susceptibility to lymphoma. Four SNPs, all in a previously reported HLA region, 6p21.32, showed genome-wide significant associations with follicular lymphoma (FL). The most significant association with follicular lymphoma was for rs4530903 (P_FL = 2.69 × 10^-12, OR = 1.93). Three novel SNPs near the HLA locus, rs9268853, rs2647046, and rs2621416, demonstrated additional variation contributing toward genetic susceptibility to FL associated with this region. Genes implicated by GWAS were also found to be cis-eQTLs in lymphoblastoid cell lines; candidate genes in these regions have been implicated in hematopoiesis and immune function. These results, showing novel susceptibility regions and allelic heterogeneity, point to the existence of pathways of susceptibility to both shared as well as specific subtypes of lymphoid malignancy.
Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels
Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest-growing minority group in the US. However, genotype imputation resources for Latinos are rather limited compared to individuals of European ancestry at present, largely because of the lack of good reference data. One choice of reference panel for Latinos is one derived from the population of Mexican individuals in Los Angeles contained in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR + CEU + YRI reference panel provides the highest imputation accuracy for Latinos, and that also including Asian samples in the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analysis in Latinos.
A GWAS Study on Liver Function Test Using eMERGE Network Participants
Introduction Liver enzyme levels and total serum bilirubin are under genetic control, and in recent years genome-wide population-based association studies have identified different susceptibility loci for these traits. We conducted a genome-wide association study in European ancestry participants from the Electronic Medical Records and Genomics (eMERGE) Network dataset of patient medical records with available genotyping data in order to identify genetic contributors to variability in serum bilirubin levels and other liver function tests and to compare the effects between adult and pediatric populations. Methods The process of whole genome imputation of eMERGE samples with standard quality control measures has been described previously. After removing missing data and outliers based on principal components (PC) analyses, 3294 samples of European ancestry were used for the GWAS. The association between each single nucleotide polymorphism (SNP) and total serum bilirubin and other liver function tests was tested using linear regression, adjusting for age, gender, site, platform and ancestry principal components. Results Consistent with previous results, a strong association signal was detected for the UGT1A gene cluster (best SNP rs887829, beta = 0.15, p = 1.30 × 10^-118) for total serum bilirubin level. Indeed, in this region more than 176 SNPs (or indels) had p < 10^-8, spanning 150 kb on the long arm of chromosome 2 (2q37.1). In addition, we found an effect of similar magnitude in a pediatric group (p = 8.26 × 10^-47, beta = 0.17). Further imputation using sequencing data as a reference panel revealed association of other markers including known TA7 repeat indels (rs8175347) (p = 9.78 × 10^-117) and rs111741722 (p = 5.41 × 10^-119), which were in proxy (r² = 0.99) with rs887829. Among rare variants, two Asian subjects homozygous for coding SNP rs4148323 (G71R) were identified. Additional known effects for total serum bilirubin were also confirmed, including organic anion transporters SLCO1B1-SLCO1B3, TDRP and ZMYND8 at FDR < 0.05, with no gene-gene interaction effects. Phenome-wide association studies (PheWAS) suggest a protective effect of the TA7 repeat against cerebrovascular disease in an adult cohort (OR = 0.75, p = 0.0008). Among other liver function tests, we also confirmed the previous effect of the ABO blood group locus for variation in serum alkaline phosphatase (rs579459, p = 9.44 × 10^-15). Conclusions Taken together, our data present interesting findings with strong confirmation of previous effects by simply using the eMERGE electronic health record phenotyping. In addition, our findings indicate that, similar to the adult population, UGT1A1 is the main locus responsible for normal variation of serum bilirubin in pediatric populations.
Genetic evidence for an origin of the Armenians from Bronze Age mixing of multiple populations
The Armenians are a culturally isolated population who historically inhabited a region in the Near East bounded by the Mediterranean and Black seas and the Caucasus, but remain under-represented in genetic studies and have a complex history including a major geographic displacement during World War I. Here, we analyse genome-wide variation in 173 Armenians and compare them with 78 other worldwide populations. We find that Armenians form a distinctive cluster linking the Near East, Europe, and the Caucasus. We show that Armenian diversity can be explained by several mixtures of Eurasian populations that occurred between ~3000 and ~2000 BCE, a period characterized by major population migrations after the domestication of the horse, appearance of chariots, and the rise of advanced civilizations in the Near East. However, genetic signals of population mixture cease after ~1200 BCE when Bronze Age civilizations in the Eastern Mediterranean world suddenly and violently collapsed. Armenians have since remained isolated and genetic structure within the population developed ~500 years ago when Armenia was divided between the Ottomans and the Safavid Empire in Iran. Finally, we show that Armenians have higher genetic affinity to Neolithic Europeans than other present-day Near Easterners, and that 29% of Armenian ancestry may originate from an ancestral population that is best represented by Neolithic Europeans.
Autosomal recessive retinitis pigmentosa with homozygous rhodopsin mutation E150K and non coding cis regulatory variants in CRX binding regions of SAMD7
The aim of this study was to unravel the molecular pathogenesis of an unusual retinitis pigmentosa (RP) phenotype observed in a Turkish consanguineous family. Homozygosity mapping revealed two candidate genes, SAMD7 and RHO. A homozygous RHO mutation c.448G>A, p.E150K was found in two affected siblings, while no coding SAMD7 mutations were identified. Interestingly, four non-coding homozygous variants were found in two SAMD7 genomic regions relevant for binding of the retinal transcription factor CRX (CRX-bound regions, CBRs) in these affected siblings. Three variants are located in a promoter CBR termed CBR1, while the fourth is located more downstream in CBR2. Transcriptional activity of these variants was assessed by luciferase assays and electroporation of mouse retinal explants with reporter constructs of wild-type and variant SAMD7 CBRs. The combined CBR2/CBR1 variant construct showed significantly decreased SAMD7 reporter activity compared to the wild-type sequence, suggesting a cis-regulatory effect on SAMD7 expression. As Samd7 is a recently identified Crx-regulated transcriptional repressor in retina, we hypothesize that these SAMD7 variants might contribute to the retinal phenotype observed here, characterized by unusual, recognizable pigment deposits, differing from the classic spicular intraretinal pigmentation observed in other individuals homozygous for p.E150K, and typically associated with RP in general.
Gene by Environment Investigation of Incident Lung Cancer Risk in African Americans
Highlights • Gene-by-environment (e.g., cigarettes/day, CPD) interactions for lung cancer are understudied in non-European ancestry populations. • We analyzed interactions between nominal smoking quantity SNPs (n = 7156 discovery sample) and CPD and risk of lung cancer (n = 1078 cases, n = 822 controls). • Six SNPs were effect modifiers of CPD for lung cancer, suggesting that the allelic dose effect is most pronounced in light smokers.
Lung cancer is the leading cause of cancer death, disproportionately affecting African-Americans. Prior studies have reported specific genetic markers linked to both smoking quantity and risk of lung cancer in multiple ethnic/racial groups. Investigators analyzed associations between 28 polymorphisms and average cigarettes smoked per day (CPD) in 7156 African-American females and examined interactions between the top polymorphisms and CPD in a cohort of African-American males and females (1078 lung cancer cases and 822 healthy control patients). The results suggested that six polymorphisms within one genomic region increased lung cancer risk in African-Americans, which was most pronounced in light smokers.
Background Genome-wide association studies have identified polymorphisms linked to both smoking exposure and risk of lung cancer. The degree to which lung cancer risk is driven by increased smoking, genetics, or gene–environment interactions is not well understood. Methods We analyzed associations between 28 single nucleotide polymorphisms (SNPs) previously associated with smoking quantity and lung cancer in 7156 African-American females in the Women's Health Initiative (WHI), then analyzed main effects of top nominally significant SNPs and interactions between SNPs, cigarettes per day (CPD) and pack-years for lung cancer in an independent, multi-center case–control study of African-American females and males (1078 lung cancer cases and 822 controls). Findings Nine nominally significant SNPs for CPD in WHI were associated with incident lung cancer (corrected p-values from 0.027 to 6.09 × 10⁻⁵). CPD was found to be a nominally significant effect modifier between SNP and lung cancer for six SNPs, including CHRNA5 rs2036527[A] (beta for SNP × CPD = −0.017, p = 0.0061, corrected p = 0.054), which was associated with CPD in a previous genome-wide meta-analysis of African-Americans. Interpretation These results suggest that chromosome 15q25.1 variants are robustly associated with CPD and lung cancer in African-Americans and that the allelic dose effect of these polymorphisms on lung cancer risk is most pronounced in lighter smokers.
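The link between a negative SNP × CPD coefficient and an allelic effect that is "most pronounced in lighter smokers" can be illustrated with a small sketch. It assumes the interaction was estimated as a product term in a logistic regression, which the abstract does not state explicitly. Under that assumption the per-allele log-odds at a given CPD is the SNP main effect plus the interaction coefficient times CPD. Only the interaction value (−0.017) comes from the abstract; the main-effect value and the function name are hypothetical placeholders chosen for illustration, not reported estimates.

```python
import math

# Interaction coefficient for rs2036527[A] x CPD, as quoted in the abstract.
BETA_INTERACTION = -0.017

# HYPOTHETICAL main effect (log-odds per allele at CPD = 0).
# The abstract does not report this value; it is a placeholder only.
BETA_SNP_HYPOTHETICAL = 0.5


def per_allele_odds_ratio(beta_snp: float, beta_interaction: float, cpd: float) -> float:
    """Per-allele odds ratio at a given smoking quantity.

    Assumes a logistic model with a product term:
        logit(p) = b0 + b_snp*G + b_cpd*CPD + b_int*G*CPD
    so the log-odds change per extra allele is b_snp + b_int*CPD.
    """
    return math.exp(beta_snp + beta_interaction * cpd)


# With a negative interaction term, the per-allele odds ratio
# shrinks as cigarettes per day increase.
for cpd in (5, 20, 40):
    odds_ratio = per_allele_odds_ratio(BETA_SNP_HYPOTHETICAL, BETA_INTERACTION, cpd)
    print(f"CPD={cpd:>2}: per-allele OR = {odds_ratio:.2f}")
```

Whatever the true main effect is, a negative interaction coefficient makes the per-allele odds ratio a decreasing function of CPD in this model, which is one way to read the abstract's interpretation.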