Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes
Summary: Comparative genomics remains a pivotal strategy to study the evolution of gene organization, and this primacy is reinforced by the growing number of full genome sequences available in public repositories. Despite this growth, bioinformatic tools available to visualize and compare genomes and to infer evolutionary events remain restricted to two or three genomes at a time, thus limiting the breadth and the nature of the questions that can be investigated. Here we present Genomicus, a new synteny browser that can represent and compare unlimited numbers of genomes in a broad phylogenetic view. In addition, Genomicus includes reconstructed ancestral gene organization, thus greatly facilitating the interpretation of the data. Availability: Genomicus is freely available for online use at while data can be downloaded at Contact: [email protected]
Efficient search, mapping, and optimization of multi protein genetic systems in diverse bacteria
Developing predictive models of multi-protein genetic systems to understand and optimize their behavior remains a combinatorial challenge, particularly when measurement throughput is limited. We developed a computational approach to build predictive models and identify optimal sequences and expression levels, while circumventing combinatorial explosion. Maximally informative genetic system variants were first designed by the RBS Library Calculator, an algorithm to design sequences for efficiently searching a multi-protein expression space across a > 10,000-fold range with tailored search parameters and well-predicted translation rates. We validated the algorithm's predictions by characterizing 646 genetic system variants, encoded in plasmids and genomes, expressed in six gram-positive and gram-negative bacterial hosts. We then combined the search algorithm with system-level kinetic modeling, requiring the construction and characterization of 73 variants to build a sequence-expression-activity map (SEAMAP) for a biosynthesis pathway. Using model predictions, we designed and characterized 47 additional pathway variants to navigate its activity space, find optimal expression regions with desired activity response curves, and relieve rate-limiting steps in metabolism. Creating sequence-expression-activity maps accelerates the optimization of many protein systems and allows previous measurements to quantitatively inform future designs.
Transcriptional Profiling of Bacillus anthracis Sterne (34F2) during Iron Starvation
Lack of available iron is one of many environmental challenges that a bacterium encounters during infection and adaptation to iron starvation is important for the pathogen to efficiently replicate within the host. Here we define the transcriptional response of B. anthracis Sterne (34F2) to iron depleted conditions. Genome-wide transcript analysis showed that B. anthracis undergoes considerable changes in gene expression during growth in iron-depleted media, including the regulation of known and candidate virulence factors. Two genes encoding putative internalin proteins were chosen for further study. Deletion of either gene (GBAA0552 or GBAA1340) resulted in attenuation in a murine model of infection. This attenuation was amplified in a double mutant strain. These data define the transcriptional changes induced during growth in low iron conditions and illustrate the potential of this dataset in the identification of putative virulence determinants for future study.
Re annotation and re analysis of the Campylobacter jejuni NCTC11168 genome sequence
Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation. Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised. Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes.
Genome dynamics and diversity of Shigella species, the etiologic agents of bacillary dysentery
The Shigella bacteria cause bacillary dysentery, which remains a significant threat to public health. The genus status and species classification appear no longer valid, as compelling evidence indicates that Shigella, as well as enteroinvasive Escherichia coli, are derived from multiple origins of E.coli and form a single pathovar. Nevertheless, Shigella dysenteriae serotype 1 causes deadly epidemics but Shigella boydii is restricted to the Indian subcontinent, while Shigella flexneri and Shigella sonnei are prevalent in developing and developed countries respectively. To begin to explain these distinctive epidemiological and pathological features at the genome level, we have carried out comparative genomics on four representative strains. Each of the Shigella genomes includes a virulence plasmid that encodes conserved primary virulence determinants. The Shigella chromosomes share most of their genes with that of E.coli K12 strain MG1655, but each has over 200 pseudogenes, 300∼700 copies of insertion sequence (IS) elements, and numerous deletions, insertions, translocations and inversions. There is extensive diversity of putative virulence genes, mostly acquired via bacteriophage-mediated lateral gene transfer. Hence, via convergent evolution involving gain and loss of functions, through bacteriophage-mediated gene acquisition, IS-mediated DNA rearrangements and formation of pseudogenes, the Shigella spp. became highly specific human pathogens with variable epidemiological and pathological features.
CNV discovery for milk composition traits in dairy cattle using whole genome resequencing
Copy number variations (CNVs) are important and widely distributed in the genome. CNV detection opens a new avenue for exploring genes associated with complex traits in humans, animals and plants. Herein, we present a genome-wide assessment of CNVs that are potentially associated with milk composition traits in dairy cattle. In this study, CNVs were detected based on whole genome re-sequencing data of eight Holstein bulls from four half- and/or full-sib families, with extremely high and low estimated breeding values (EBVs) of milk protein percentage and fat percentage. The range of coverage depth per individual was 8.2–11.9×. Using CNVnator, we identified a total of 14,821 CNVs, including 5025 duplications and 9796 deletions. Among them, 487 differential CNV regions (CNVRs) comprising ~8.23 Mb of the cattle genome were observed between the high and low groups. Annotation of these differential CNVRs was performed based on the cattle genome reference assembly (UMD3.1), and a total of 235 functional genes were found within the CNVRs. By Gene Ontology and KEGG pathway analyses, we found that genes were significantly enriched for specific biological functions related to protein and lipid metabolism, insulin/IGF pathway-protein kinase B signaling cascade, prolactin signaling pathway and AMPK signaling pathways. These genes included INS, IGF2, FOXO3, TH, SCD5, GALNT18, GALNT16, ART3, SNCA and WNT7A, implying their potential association with milk protein and fat traits. In addition, 95 CNVRs overlapped with 75 known QTLs that are associated with milk protein and fat traits of dairy cattle (Cattle QTLdb). In conclusion, based on NGS of 8 Holstein bulls with extremely high and low EBVs for milk PP and FP, we identified a total of 14,821 CNVs, 487 differential CNVRs between groups, and 10 genes, which were suggested as promising candidate genes for milk protein and fat traits.
The online version of this article (doi:10.1186/s12864-017-3636-3) contains supplementary material, which is available to authorized users.
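The overlap between CNVRs and known QTLs reported in the abstract above is, at its core, an interval-intersection test. The sketch below illustrates that test only; it is not the authors' pipeline. The `Region` type, the function names, the coordinates, and the choice of closed (inclusive) intervals are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """A genomic interval; coordinates are treated as inclusive (assumption)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    """True when two regions lie on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def regions_overlapping(cnvrs: List[Region], qtls: List[Region]) -> List[Region]:
    """Return the CNVRs that overlap at least one QTL interval."""
    return [c for c in cnvrs if any(overlaps(c, q) for q in qtls)]


# Hypothetical toy coordinates, not taken from the study.
toy_cnvrs = [Region("chr1", 100, 200), Region("chr1", 500, 600), Region("chr2", 100, 200)]
toy_qtls = [Region("chr1", 150, 300), Region("chr2", 300, 400)]
toy_hits = regions_overlapping(toy_cnvrs, toy_qtls)
```

The nested scan is quadratic in the number of regions, which is acceptable for a few hundred CNVRs; a sorted sweep or an interval tree would be the usual choice for larger sets.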
First‐generation HapMap in Cajanus spp. reveals untapped variations in parental lines of mapping populations
Whole genome re‐sequencing (WGRS) was conducted on a panel of 20 Cajanus spp. accessions (parental lines of recombinant inbred lines, introgression lines, multiparent advanced generation intercross and nested association mapping populations) comprising two wild species and 18 cultivated species accessions. A total of 791.77 million paired‐end reads were generated with an effective mapping depth of ~12X per accession. Analysis of WGRS data provided 5 465 676 genome‐wide variations including 4 686 422 SNPs and 779 254 InDels across the accessions. Large structural variations in the form of copy number variations (2598) and presence and absence variations (970) were also identified. Additionally, 2 630 904 accession‐specific variations comprising 2 278 571 SNPs (86.6%), 166 243 deletions (6.3%) and 186 090 insertions (7.1%) were also reported. Identified polymorphic sites in this study provide the first‐generation HapMap in Cajanus spp. which will be useful in mapping the genomic regions responsible for important traits.
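The variant counts in the abstract above can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the figures quoted in the abstract; the variable and function names are illustrative.

```python
# Counts as quoted in the abstract.
reported_genome_wide_total = 5_465_676
genome_wide = {"SNPs": 4_686_422, "InDels": 779_254}

reported_accession_total = 2_630_904
accession_specific = {
    "SNPs": 2_278_571,
    "deletions": 166_243,
    "insertions": 186_090,
}


def percentage_shares(parts):
    """Share of each category in the category total, as a percentage rounded to one decimal."""
    total = sum(parts.values())
    return {name: round(100 * count / total, 1) for name, count in parts.items()}


genome_wide_sum = sum(genome_wide.values())
accession_sum = sum(accession_specific.values())
accession_shares = percentage_shares(accession_specific)

for name, share in accession_shares.items():
    print(f"{name}: {share}%")
```

Both category sums reproduce the reported totals, and the rounded shares match the percentages given in the abstract.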
Characterization of Three New Insect Specific Flaviviruses: Their Relationship to the Mosquito Borne Flavivirus Pathogens
Three novel insect-specific flaviviruses, isolated from mosquitoes collected in Peru, Malaysia (Sarawak), and the United States, are characterized. The new viruses, designated La Tina, Kampung Karu, and Long Pine Key, respectively, are antigenically and phylogenetically more similar to the mosquito-borne flavivirus pathogens, than to the classical insect-specific viruses like cell fusing agent and Culex flavivirus. The potential implications of this relationship and the possible uses of these and other arbovirus-related insect-specific flaviviruses are reviewed.
Development of a comparative genomic fingerprinting assay for rapid and high resolution genotyping of Arcobacter butzleri
Molecular typing methods are critical for epidemiological investigations, facilitating disease outbreak detection and source identification. Study of the epidemiology of the emerging human pathogen Arcobacter butzleri is currently hampered by the lack of a subtyping method that is easily deployable in the context of routine epidemiological surveillance. In this study we describe a comparative genomic fingerprinting (CGF) method for high-resolution and high-throughput subtyping of A. butzleri. Comparative analysis of the genome sequences of eleven A. butzleri strains, including eight strains newly sequenced as part of this project, was employed to identify accessory genes suitable for generating unique genetic fingerprints for high-resolution subtyping based on gene presence or absence within a strain. A set of eighty-three accessory genes was used to examine the population structure of a dataset comprised of isolates from various sources, including human and non-human animals, sewage, and river water (n=156). A streamlined assay (CGF40) based on a subset of 40 genes was subsequently developed through marker optimization. High levels of profile diversity (121 distinct profiles) were observed among the 156 isolates in the dataset, and the high Simpson’s Index of Diversity (ID > 0.969) indicates that the CGF40 assay possesses high discriminatory power. At the same time, our observation that 115 isolates in this dataset could be assigned to 29 clades with a profile similarity of 90% or greater indicates that the method can be used to identify clades comprised of genetically similar isolates. The CGF40 assay described herein combines high resolution and repeatability with high throughput for the rapid characterization of A. butzleri strains. This assay will facilitate the study of the population structure and epidemiology of A. butzleri.
The online version of this article (doi:10.1186/s12866-015-0426-4) contains supplementary material, which is available to authorized users.
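Simpson's Index of Diversity, cited in the abstract above as the measure of discriminatory power, is commonly computed for typing methods as ID = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates with profile j and N is the total number of isolates. The sketch below assumes that formulation; the abstract does not state which variant the authors used. The function name and the toy profile labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def simpsons_index_of_diversity(profiles: Iterable[Hashable]) -> float:
    """Probability that two isolates drawn without replacement have different profiles."""
    counts = Counter(profiles)
    n_total = sum(counts.values())
    if n_total < 2:
        raise ValueError("at least two isolates are required")
    # Number of ordered pairs of isolates that share a profile.
    same_profile_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_profile_pairs / (n_total * (n_total - 1))


# Hypothetical toy data: six isolates assigned to three profiles.
toy_profiles = ["A", "A", "B", "C", "C", "C"]
toy_id = simpsons_index_of_diversity(toy_profiles)
print(f"Simpson's ID for the toy data: {toy_id:.3f}")
```

A value of 1 means every isolate has a distinct profile and a value of 0 means all isolates share one profile, so a value above 0.969 indicates that two randomly chosen isolates almost always receive different fingerprints.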
Draft Genome Sequence of Sphingobium quisquiliarum Strain P25T, a Novel Hexachlorocyclohexane (HCH) Degrading Bacterium Isolated from an HCH Dumpsite
Here, we report the draft genome sequence (4.2 Mb) of Sphingobium quisquiliarum strain P25T, a natural lin (genes involved in degradation of hexachlorocyclohexane [HCH] isomers) variant genotype, isolated from a heavily contaminated (450 mg HCH/g of soil) HCH dumpsite.
Cell Cycle Control of Bivalent Epigenetic Domains Regulates the Exit from Pluripotency
Here we show that bivalent domains and chromosome architecture for bivalent genes are dynamically regulated during the cell cycle in human pluripotent cells. Central to this is the transient increase in H3K4-trimethylation at developmental genes during G1, thereby creating a “window of opportunity” for cell-fate specification. This mechanism is controlled by CDK2-dependent phosphorylation of the MLL2 (KMT2B) histone methyl-transferase, which facilitates its recruitment to developmental genes in G1. MLL2 binding is required for changes in chromosome architecture around developmental genes and establishes promoter-enhancer looping interactions in a cell-cycle-dependent manner. These cell-cycle-regulated loops are shown to be essential for activation of bivalent genes and pluripotency exit. These findings demonstrate that bivalent domains are established to control the cell-cycle-dependent activation of developmental genes so that differentiation initiates from the G1 phase. • Bivalent domains are unstable, dynamic, and cell-cycle regulated • CDK2 phosphorylates MLL2 and establishes bivalent domains in G1 • Chromosome remodeling in G1 is required for the “poised” pluripotent state. In this report, Dalton and colleagues show that developmental genes are primed for activation in G1 phase of the cell cycle by a mechanism requiring convergence of the cell-cycle machinery with cell signaling pathways. This priming mechanism involves the establishment of bivalent epigenetic domains and dynamic changes in chromosome architecture around developmental genes.
First Draft Genome Sequences of Two Bartonella tribocorum Strains from Laos and Cambodia
Bartonella tribocorum is a Gram-negative bacterium known to infect animals, and rodents in particular, throughout the world. In this report, we present the draft genome sequences of two strains of B. tribocorum isolated from the blood of a rodent in Laos and a shrew in Cambodia.
Exome sequencing in undiagnosed inherited and sporadic ataxias
Inherited ataxias are difficult to diagnose genetically. Pyle et al. use whole-exome sequencing to provide a likely molecular diagnosis in 14 of 22 families with ataxia. The approach reveals de novo mutations, broadens the phenotype of other disease genes, and is equally effective in young and older-onset patients. Inherited ataxias are clinically and genetically heterogeneous, and a molecular diagnosis is not possible in most patients. Having excluded common sporadic, inherited and metabolic causes, we used an unbiased whole exome sequencing approach in 35 affected individuals, from 22 randomly selected families of white European descent. We defined the likely molecular diagnosis in 14 of 22 families (64%). This revealed de novo dominant mutations, validated disease genes previously described in isolated families, and broadened the clinical phenotype of known disease genes. The diagnostic yield was the same in both young and older-onset patients, including sporadic cases. We have demonstrated the impact of exome sequencing in a group of patients notoriously difficult to diagnose genetically. This has important implications for genetic counselling and diagnostic service provision.
Genotype and clinical course in 2 Chinese Han siblings with Wilson disease presenting with isolated disabling premature osteoarthritis
Supplemental Digital Content is available in the text. Premature osteoarthritis (POA) is a rare condition in Wilson disease (WD). In particular, when POA is the only complaint of a WD patient for a long time, misdiagnosis or missed diagnosis can occur and delay treatment. Two Chinese Han siblings were diagnosed with WD on the basis of corneal K-F rings, laboratory tests, and mutation analysis. They presented with isolated POA during the first 2 decades or more of their disease course, and the diagnosis was missed throughout that time. The older affected sib became disabled due to his severe osteoarthritis when he was as young as 38 years old. Two compound heterozygous pathogenic variants c.2790_2792del and c.2621C>T were revealed in the ATP7B gene through targeted next-generation sequencing (NGS). Adolescent-onset POA could be the only complaint of an individual with WD for at least 2 decades. Long delay in the treatment of WD's POA could lead to disability in early adulthood. Detailed physical examination, special biochemical tests, and genotyping through targeted NGS should greatly reduce diagnosis delay in atypical WD patients with an isolated POA phenotype.
Whole Exome Sequencing Reveals Homozygous Mutations in RAI1, OTOF, and SLC26A4 Genes Associated with Nonsyndromic Hearing Loss in Altaian Families (South Siberia)
Hearing loss (HL) is one of the most common sensorineural disorders and several dozen genes contribute to its pathogenesis. Establishing a genetic diagnosis of HL is of great importance for clinical evaluation of deaf patients and for estimating recurrence risks for their families. Efforts to identify genes responsible for HL have been challenged by high genetic heterogeneity and different ethnic-specific prevalence of inherited deafness. Here we present the utility of whole exome sequencing (WES) for identifying candidate causal variants for previously unexplained nonsyndromic HL of seven patients from four unrelated Altaian families (the Altai Republic, South Siberia). The WES analysis revealed homozygous missense mutations in three genes associated with HL. Mutation c.2168A>G (SLC26A4) was found in one family, a novel mutation c.1111G>C (OTOF) was revealed in another family, and mutation c.5254G>A (RAI1) was found in two families. Sanger sequencing was applied for screening of identified variants in an ethnically diverse cohort of other patients with HL (n = 116) and in Altaian controls (n = 120). Identified variants were found only in patients of Altaian ethnicity (n = 93). Several lines of evidence support the association of homozygosity for discovered variants c.5254G>A (RAI1), c.1111G>C (OTOF), and c.2168A>G (SLC26A4) with HL in Altaian patients. Local prevalence of identified variants implies a possible founder effect in a significant number of HL cases in the indigenous population of the Altai region. Notably, this is the first reported instance of patients with a RAI1 missense mutation whose HL is not accompanied by specific traits typical for Smith-Magenis syndrome. The presumed association of RAI1 gene variant c.5254G>A with isolated HL needs to be proved by further experimental studies.
ANO10 mutations cause ataxia and coenzyme Q10 deficiency
Inherited ataxias are heterogeneous disorders affecting both children and adults, with over 40 different causative genes, making molecular genetic diagnosis challenging. Although recent advances in next-generation sequencing have significantly improved mutation detection, few treatments exist for patients with inherited ataxia. In two patients with adult-onset cerebellar ataxia and coenzyme Q10 (CoQ10) deficiency in muscle, whole exome sequencing revealed mutations in ANO10, which encodes anoctamin 10, a member of a family of putative calcium-activated chloride channels, and the causative gene for autosomal recessive spinocerebellar ataxia-10 (SCAR10). Both patients presented with slowly progressive ataxia and dysarthria leading to severe disability in the sixth decade. Epilepsy and learning difficulties were also present in one patient, while retinal degeneration and cataract were present in the other. The detection of mutations in ANO10 in our patients indicates that ANO10 defects cause secondary low CoQ10 and that SCAR10 patients may benefit from CoQ10 supplementation. The online version of this article (doi:10.1007/s00415-014-7476-7) contains supplementary material, which is available to authorized users.
Identification of a de novo DYNC1H1 mutation via WES according to published guidelines
De novo mutations that contribute to rare Mendelian diseases, including neurological disorders, have been recently identified. Whole-exome sequencing (WES) has become a powerful tool for the identification of inherited and de novo mutations in Mendelian diseases. Two important guidelines were recently published regarding the investigation of causality of sequence variants in human disease and the interpretation of novel variants identified in human genome sequences. In this study, a family with supposed movement disorders was sequenced via WES (including the proband and her unaffected parents), and a standard investigation and interpretation of the identified variants was performed according to the published guidelines. We identified a novel de novo mutation (c.2327C > T, p.P776L) in the DYNC1H1 gene and confirmed that it was the causal variant. The phenotype of the affected twins included delayed motor milestones, pes cavus, lower limb weakness and atrophy, and a waddling gait. Electromyographic (EMG) recordings revealed typical signs of chronic denervation. Our study demonstrates the power of WES to discover de novo mutations associated with a neurological disease on the whole-exome scale, and shows that following guidelines for conducting WES studies and interpreting identified variants is a preferable approach for exploring the pathogenesis of rare neurological disorders.
Identification of cell type specific mutations in nodal T cell lymphomas
Recent genetic analysis has identified frequent mutations in ten-eleven translocation 2 (TET2), DNA methyltransferase 3A (DNMT3A), isocitrate dehydrogenase 2 (IDH2) and ras homolog family member A (RHOA) in nodal T-cell lymphomas, including angioimmunoblastic T-cell lymphoma and peripheral T-cell lymphoma, not otherwise specified. We examined the distribution of mutations in these subtypes of mature T-/natural killer cell neoplasms to determine their clonal architecture. Targeted sequencing was performed for 71 genes in tumor-derived DNA of 87 cases. The mutations were then analyzed in a programmed death-1 (PD1)-positive population enriched with tumor cells and CD20-positive B cells purified by laser microdissection from 19 cases. TET2 and DNMT3A mutations were identified in both the PD1+ cells and the CD20+ cells in 15/16 and 4/7 cases, respectively. All the RHOA and IDH2 mutations were confined to the PD1+ cells, indicating that some mutations, including RHOA and IDH2 mutations, are specific events in tumor cells. Notably, we found that all NOTCH1 mutations were detected only in the CD20+ cells. In conclusion, we identified both B- as well as T-cell-specific mutations, and mutations common to both T and B cells. These findings indicate the expansion of a clone after multistep and multilineal acquisition of gene mutations.
Mutations in histone modulators are associated with prolonged survival during azacitidine therapy
Early therapeutic decision-making is crucial in patients with higher-risk MDS. We evaluated the impact of clinical parameters and mutational profiles in 134 consecutive patients treated with azacitidine using a combined cohort from Karolinska University Hospital (n=89) and from King's College Hospital, London (n=45). While neither clinical parameters nor mutations had a significant impact on response rate, both karyotype and mutational profile were strongly associated with survival from the start of treatment. IPSS high-risk cytogenetics negatively impacted overall survival (median 20 vs 10 months; p<0.001), whereas mutations in histone modulators (ASXL1, EZH2) were associated with prolonged survival (22 vs 12 months, p=0.01). This positive association was present in both cohorts and remained highly significant in the multivariate Cox model. Importantly, patients with mutations in histone modulators lacking high-risk cytogenetics showed a survival of 29 months compared to only 10 months in patients with the opposite pattern. While TP53 was negatively associated with survival, neither RUNX1 mutations nor the number of mutations appeared to influence survival in this cohort. We propose a model combining histone modulator mutational screening with cytogenetics in the clinical decision-making process for higher-risk MDS patients eligible for treatment with azacitidine.
STUB1 mutations in autosomal recessive ataxias – evidence for mutation specific clinical heterogeneity
A subset of hereditary cerebellar ataxias is inherited as autosomal recessive traits (ARCAs). Classification of recessive ataxias due to phenotypic differences in the cerebellum and cerebellar structures is constantly evolving due to newly identified disease genes. Recently, reports have linked mutations in genes involved in ubiquitination (RNF216, OTUD4, STUB1) to ARCA with hypogonadism. With a combination of homozygosity mapping and exome sequencing, we identified three mutations in STUB1 in two families with ARCA and cognitive impairment; a homozygous missense variant (c.194A > G, p.Asn65Ser) that segregated in three affected siblings, and a missense change (c.82G > A, p.Glu28Lys) which was inherited in trans with a nonsense mutation (c.430A > T, p.Lys144Ter) in another patient. STUB1 encodes CHIP (C-terminus of Heat shock protein 70 – Interacting Protein), a dual function protein with a role in ubiquitination as a co-chaperone with heat shock proteins, and as an E3 ligase. We show that the p.Asn65Ser substitution impairs CHIP’s ability to ubiquitinate HSC70 in vitro, despite being able to self-ubiquitinate. These results are consistent with previous studies highlighting this as a critical residue for the interaction between CHIP and its co-chaperones. Furthermore, we show that the levels of CHIP are strongly reduced in vivo in patients’ fibroblasts compared to controls. These results suggest that STUB1 mutations might cause disease by impacting not only the E3 ligase function, but also its protein interaction properties and protein amount.
Whether the clinical heterogeneity seen in STUB1 ARCA can be related to the location of the mutations remains to be understood, but interestingly, all siblings with the p.Asn65Ser substitution showed a marked appearance of accelerated aging not previously described in STUB1 related ARCA, none display hormonal aberrations/clinical hypogonadism while some affected family members had diabetes, alopecia, uveitis and ulcerative colitis, further refining the spectrum of STUB1 related disease. The online version of this article (doi:10.1186/s13023-014-0146-0) contains supplementary material, which is available to authorized users.
Glioblastoma adaptation traced through decline of an IDH1 clonal driver and macro evolution of a double minute chromosome
In a glioblastoma tumour with multi-region sequencing before and after recurrence, we find an IDH1 mutation that is clonal in the primary but lost at recurrence. We also describe the evolution of a double-minute chromosome encoding regulators of the PI3K signalling axis that dominates at recurrence, emphasizing the challenges of an evolving and dynamic oncogenic landscape for precision medicine. Glioblastoma (GBM) is the most common malignant brain cancer occurring in adults, and is associated with dismal outcome and few therapeutic options. GBM has been shown to predominantly disrupt three core pathways through somatic aberrations, rendering it ideal for precision medicine approaches. We describe a 35-year-old female patient with recurrent GBM following surgical removal of the primary tumour, adjuvant treatment with temozolomide and a 3-year disease-free period. Rapid whole-genome sequencing (WGS) of three separate tumour regions at recurrence was carried out and interpreted relative to WGS of two regions of the primary tumour. We found extensive mutational and copy-number heterogeneity within the primary tumour. We identified a TP53 mutation and two focal amplifications involving PDGFRA, KIT and CDK4, on chromosomes 4 and 12. A clonal IDH1 R132H mutation in the primary, a known GBM driver event, was detectable at only very low frequency in the recurrent tumour. After sub-clonal diversification, evidence was found for a whole-genome doubling event and a translocation between the amplified regions of PDGFRA, KIT and CDK4, encoded within a double-minute chromosome also incorporating miR26a-2. The WGS analysis uncovered progressive evolution of the double-minute chromosome converging on the KIT/PDGFRA/PI3K/mTOR axis, superseding the IDH1 mutation in dominance in a mutually exclusive manner at recurrence; consequently, the patient was treated with imatinib.
Despite rapid sequencing and cancer genome-guided therapy against amplified oncogenes, the disease progressed, and the patient died shortly after. This case sheds light on the dynamic evolution of a GBM tumour, defining the origins of the lethal sub-clone, the macro-evolutionary genomic events dominating the disease at recurrence and the loss of a clonal driver. Even in the era of rapid WGS analysis, cases such as this illustrate the significant hurdles for precision medicine success.
The clinical features, outcomes and genetic characteristics of hypertrophic cardiomyopathy patients with severe right ventricular hypertrophy
Severe right ventricular hypertrophy (SRVH) is a rare phenotype in hypertrophic cardiomyopathy (HCM) for which limited information is available. This study was undertaken to investigate the clinical, prognostic and genetic characteristics of HCM patients with SRVH. HCM with SRVH was defined as HCM with a maximum right ventricular wall thickness ≥10 mm. Whole-genome sequencing (WGS) was performed in HCM patients with SRVH. Multivariate Cox proportional hazards regression models were used to identify risk factors for cardiac death and events in HCM with SRVH. Patients with apical hypertrophic cardiomyopathy (ApHCM) were selected as a comparison group. The clinical features and outcomes of 34 HCM patients with SRVH and 273 ApHCM patients were compared. Compared with the ApHCM group, the HCM with SRVH group included younger patients and a higher proportion of female patients and also displayed higher cardiovascular morbidity and mortality. The multivariate Cox proportional hazards regression models identified 2 independent predictors of cardiovascular death in HCM patients with SRVH, a New York Heart Association class ≥III (hazard ratio [HR] = 8.7, 95% confidence interval (CI): 1.43-52.87, p = 0.019) and an age at the time of HCM diagnosis ≤18 (HR = 5.5, 95% CI: 1.24-28.36, p = 0.026). Among the 11 HCM patients with SRVH who underwent WGS, 10 (90.9%) were identified as carriers of at least one specific sarcomere gene mutation. MYH7 and TTN mutations were the most common sarcomere mutations noted in this study. Two or more HCM-related gene mutations were observed in 9 (82%) patients, and mutations in either other cardiomyopathy-related genes or ion-channel disease-related genes were found in 8 (73%) patients. HCM patients with SRVH were characterized by poor clinical outcomes and the presentation of multiple gene mutations.
Isolated inclusion body myopathy caused by a multisystem proteinopathy–linked hnRNPA1 mutation
We aimed to identify the genetic cause of isolated inclusion body myopathy (IBM) with autosomal dominant inheritance in 2 families. Genetic investigations were performed using whole-exome and Sanger sequencing of the heterogeneous nuclear ribonucleoprotein A1 gene (hnRNPA1). The clinical and pathologic features of patients in the 2 families were evaluated with neurologic examinations, muscle imaging, and muscle biopsy. We identified a missense p.D314N mutation in hnRNPA1, which is also known to cause familial amyotrophic lateral sclerosis, in 2 families with IBM. The affected individuals developed muscle weakness in their 40s, which slowly progressed toward a limb-girdle pattern. Further evaluation of the affected individuals revealed no apparent motor neuron dysfunction, cognitive impairment, or bone abnormality. The muscle pathology was compatible with IBM, lacking apparent neurogenic change and inflammation. Multiple immunohistochemical analyses revealed the cytoplasmic aggregation of hnRNPA1 in close association with autophagosomes and myonuclei. Furthermore, the aberrant accumulation was characterized by coaggregation with ubiquitin, sequestosome-1/p62, valosin-containing protein/p97, and a variety of RNA-binding proteins (RBPs). The present study expands the clinical phenotype of hnRNPA1-linked multisystem proteinopathy. Mutations in hnRNPA1, and possibly hnRNPA2B1, may be responsible for isolated IBM with a pure muscular phenotype. Although the mechanisms underlying the selective skeletal muscle involvement remain to be elucidated, the immunohistochemical results suggest a broad sequestration of RBPs by the mutated hnRNPA1.
New perspective in diagnostics of mitochondrial disorders: two years’ experience with whole exome sequencing at a national paediatric centre
Whole-exome sequencing (WES) has led to an exponential increase in identification of causative variants in mitochondrial disorders (MD). We performed WES in 113 patients with suspected MD from a Polish paediatric reference centre, in whom routine testing failed to identify a molecular defect. WES was performed using TruSeqExome enrichment, followed by variant prioritization, validation by Sanger sequencing, and segregation with the disease phenotype in the family. Likely causative mutations were identified in 67 (59.3 %) patients; these included variants in mtDNA (6 patients) and nDNA: X-linked (9 patients), autosomal dominant (5 patients), and autosomal recessive (47 patients, 11 homozygotes). Novel variants accounted for 50.5 % (50/99) of all detected changes. In 47 patients, changes in 31 MD-related genes (ACAD9, ADCK3, AIFM1, CLPB, COX10, DLD, EARS2, FBXL4, MTATP6, MTFMT, MTND1, MTND3, MTND5, NAXE, NDUFS6, NDUFS7, NDUFV1, OPA1, PARS2, PC, PDHA1, POLG, RARS2, RRM2B, SCO2, SERAC1, SLC19A3, SLC25A12, TAZ, TMEM126B, VARS2) were identified. The ACAD9, CLPB, FBXL4 and PDHA1 genes recurred more than twice, suggesting higher general/ethnic prevalence. In 19 cases, variants in 18 non-MD-related genes (ADAR, CACNA1A, CDKL5, CLN3, CPS1, DMD, DYSF, GBE1, GFAP, HSD17B4, MECP2, MYBPC3, PEX5, PGAP2, PIGN, PRF1, SBDS, SCN2A) were found. The percentage of positive WES results rose gradually with increasing probability of MD according to the Mitochondrial Disease Criteria (MDC) scale (from 36 to 90 % for low and high probability, respectively). The percentage of detected MD-related genes compared with non-MD-related genes also grew with increasing MD likelihood (from 20 to 97 %). Molecular diagnosis was established in 30/47 (63.8 %) neonates and in 17/28 (60.7 %) patients with basal ganglia involvement. Mutations in the CLPB, SERAC1 and TAZ genes were identified in neonates with 3-methylglutaconic aciduria (3-MGA) as a discriminative feature.
A new MD-related candidate gene (NDUFB8) is under verification. We suggest WES rather than targeted NGS as the method of choice in diagnostics of MD in children, including neonates with 3-MGA who died without determination of disease cause and with limited availability of laboratory data. There is a strong correlation between the degree of MD diagnosis by WES and MD likelihood expressed by the MDC scale. The online version of this article (doi:10.1186/s12967-016-0930-9) contains supplementary material, which is available to authorized users.
Germline mutations in ETV6 are associated with thrombocytopenia, red cell macrocytosis and predisposition to lymphoblastic leukemia
Some familial platelet disorders are associated with predisposition to leukemia, myelodysplastic syndrome (MDS) or dyserythropoietic anemia [1,2]. We identified a family with autosomal dominant thrombocytopenia, high erythrocyte mean corpuscular volume (MCV) and two occurrences of B-cell precursor acute lymphoblastic leukemia (ALL). Whole exome sequencing identified a heterozygous single nucleotide change in ETV6 (Ets Variant Gene 6), c.641C>T, encoding a p.Pro214Leu substitution in the central domain, segregating with thrombocytopenia and elevated MCV. A screen of 23 families with a similar phenotype found two with ETV6 mutations. One family had the p.Pro214Leu mutation and one individual with ALL. The other family had a c.1252A>G transition producing a p.Arg418Gly substitution in the DNA binding domain, with alternative splicing and exon-skipping. Functional characterization of these mutations showed aberrant cellular localization of mutant and endogenous ETV6, decreased transcriptional repression and altered megakaryocyte maturation. Our findings underscore a key role for ETV6 in platelet formation and leukemia predisposition.
Behr’s Syndrome is Typically Associated with Disturbed Mitochondrial Translation and Mutations in the C12orf65 Gene
Behr’s syndrome is a classical phenotypic description of childhood-onset optic atrophy combined with various neurological symptoms, including ophthalmoparesis, nystagmus, spastic paraparesis, ataxia, peripheral neuropathy and learning difficulties. Here we describe 4 patients with the classical Behr’s syndrome phenotype from 3 unrelated families who carry homozygous nonsense mutations in the C12orf65 gene encoding a protein involved in mitochondrial translation. Whole exome sequencing was performed in genomic DNA and oxygen consumption was measured in patient cell lines. We detected 2 different homozygous C12orf65 nonsense mutations in 4 patients with a homogeneous clinical presentation matching the historical description of Behr’s syndrome. The first symptom in all patients was childhood-onset optic atrophy, followed by spastic paraparesis, distal weakness, motor neuropathy and ophthalmoparesis. We think that C12orf65 mutations are more frequent than previously suggested and screening of this gene should be considered not only in patients with mitochondrial respiratory chain deficiencies, but also in inherited peripheral neuropathies, spastic paraplegias and ataxias, especially with pre-existing optic atrophy.
A survey on cellular RNA editing activity in response to Candida albicans infections
Adenosine-to-Inosine (A-to-I) RNA editing is catalyzed by the adenosine deaminase acting on RNA (ADAR) family of enzymes, which induces alterations in mRNA sequence. It has been shown that A-to-I RNA editing events are of significance in the cell’s innate immunity and cellular response to viral infections. However, whether RNA editing plays a role in cellular response to microorganism/fungi infection has not been determined. Candida albicans, one of the most prevalent human pathogenic fungi, usually acts as a commensal on skin and superficial mucosa, but has been found to cause candidiasis in immunosuppressed patients. Previously, we revealed the up-regulation of A-to-I RNA editing activity in response to different types of influenza virus infections. The current work is designed to study the effect of microorganism/fungi infection on the activity of A-to-I RNA editing in infected hosts. We first detected and characterized the A-to-I RNA editing events in oral epithelial cells (OKF6) and primary human umbilical vein endothelial cells (HUVEC), under normal growth conditions or with C. albicans infection. In total, 89,648 and 60,872 A-to-I editing sites were detected in normal OKF6 and HUVEC cells, respectively. They were validated against the RNA editing databases DARNED, RADAR, and REDIportal with 50, 80, and 80% success rates, respectively. While over 95% of editing sites were detected in Alu regions, among the rest of the editing sites in non-repetitive regions, the majority were located in introns and UTRs. The distributions of A-to-I editing activity and editing depth were analyzed during the course of C. albicans infection. While the normalized editing levels of common editing sites exhibited a significant increase, especially in Alu regions, no significant change in the expression of ADAR1 or ADAR2 was observed. Second, we performed further analysis on data from an in vivo mouse study with C. albicans infection.
In total, 1,133 and 955 A-to-I editing sites were identified in mouse tongue and kidney tissues, respectively. The number of A-to-I editing events was much smaller than in human epithelial or endothelial cells, due to the lack of Alu elements in the mouse genome. Furthermore, during the course of C. albicans infection we observed a stable level of A-to-I editing activity in 131 and 190 common editing sites in the mouse tongue and kidney tissues, and found no significant change in ADAR1 or ADAR2 expression (with the exception of ADAR2 displaying a significant increase at 12 h after infection in mouse kidney tissue before returning to normal). This work represents the first comprehensive analysis of the A-to-I RNA editome in human epithelial and endothelial cells. C. albicans infection of human epithelial and endothelial cells led to the up-regulation of A-to-I editing activities, through a mechanism different from that of viral infections in human hosts. However, the in vivo mouse model with C. albicans infection did not show significant changes in A-to-I editing activities in tongue and kidney tissues. The different results in the mouse model were likely due to the presence of more complex in vivo environments, e.g. circulation and mixed cell types. The online version of this article (10.1186/s12864-017-4374-2) contains supplementary material, which is available to authorized users.
Congenital myasthenic syndromes due to mutations in ALG2 and ALG14
Congenital myasthenic syndromes are a heterogeneous group of inherited disorders that arise from impaired signal transmission at the neuromuscular synapse. They are characterized by fatigable muscle weakness. We performed linkage analysis, whole-exome and whole-genome sequencing to determine the underlying defect in patients with an inherited limb-girdle pattern of myasthenic weakness. We identify ALG14 and ALG2 as novel genes in which mutations cause a congenital myasthenic syndrome. Through analogy with yeast, ALG14 is thought to form a multiglycosyltransferase complex with ALG13 and DPAGT1 that catalyses the first two committed steps of asparagine-linked protein glycosylation. We show that ALG14 is concentrated at the muscle motor endplates and small interfering RNA silencing of ALG14 results in reduced cell-surface expression of muscle acetylcholine receptor expressed in human embryonic kidney 293 cells. ALG2 is an alpha-1,3-mannosyltransferase that also catalyses early steps in the asparagine-linked glycosylation pathway. Mutations were identified in two kinships, with mutation ALG2p.Val68Gly found to severely reduce ALG2 expression both in patient muscle, and in cell cultures. Identification of DPAGT1, ALG14 and ALG2 mutations as a cause of congenital myasthenic syndrome underscores the importance of asparagine-linked protein glycosylation for proper functioning of the neuromuscular junction. These syndromes form part of the wider spectrum of congenital disorders of glycosylation caused by impaired asparagine-linked glycosylation. It is likely that further genes encoding components of this pathway will be associated with congenital myasthenic syndromes or impaired neuromuscular transmission as part of a more severe multisystem disorder. Our findings suggest that treatment with cholinesterase inhibitors may improve muscle function in many of the congenital disorders of glycosylation.
Splicing Variants of SERPINA1 Gene in Ovine Milk: Characterization of cDNA and Identification of Polymorphisms
The serine protease inhibitor, clade A, member 1 (SERPINA1) is the gene for a protein called alpha-1-antitrypsin (AAT), which is a member of the serine protease inhibitor (serpin) superfamily of proteins. By conformational change, serpins control several chemical reactions by inhibiting the activity of proteases. AAT is the most abundant endogenous serpin in blood circulation and it is present in relatively high concentration in human milk as well as in bovine and porcine colostrum. Here we report for the first time the molecular characterization and sequence variability of the ovine SERPINA1 cDNA and gene. cDNAs from mammary gland and from milk were PCR amplified, and three different transcripts (1437, 1166 and 521 bp) of the SERPINA1 gene were identified. We amplified and sequenced different regions of the gene (5’ UTR, from exon 2 to exon 5 and 3’ UTR), and we found that the exon-intron structure of the gene is similar to that of the human and bovine genes. We detected a total of 97 SNPs in cDNAs and gene sequences from 10 sheep of three different breeds. In adult sheep tissues, a SERPINA1 gene expression analysis indicated differential expression of the three transcripts. The findings reported in this paper will aid further studies on the possible involvement of the SERPINA1 gene in different physiological states and its possible association with production traits.
Copy number alterations detected by whole exome and whole genome sequencing of esophageal adenocarcinoma
Esophageal adenocarcinoma (EA) is among the leading causes of cancer mortality, especially in developed countries. A high level of somatic copy number alterations (CNAs) accumulates over the decades in the progression from Barrett’s esophagus, the precursor lesion, to EA. Accurate identification of somatic CNAs is essential to understand cancer development. Many studies have been conducted for the detection of CNA in EA using microarrays. Next-generation sequencing (NGS) technologies are believed to have advantages in sensitivity and accuracy to detect CNA, yet no NGS-based CNA detection in EA has been reported. In this study, we analyzed whole-exome (WES) and whole-genome sequencing (WGS) data for detecting CNA from a published large-scale genomic study of EA. Two specific comparisons were conducted. First, the recurrent CNAs based on WGS and WES data from 145 EA samples were compared to those found in five previous microarray-based studies. We found that the majority of the previously identified regions were also detected in this study. Interestingly, some novel amplifications and deletions were discovered using the NGS data. In particular, SKI and PRKCZ detected in a deletion region are involved in transforming growth factor-β pathway, suggesting the potential utility of novel biomarkers for EA. Second, we compared CNAs detected in WGS and WES data from the same 15 EA samples. No large-scale CNA was identified statistically more frequently by WES or WGS, while more focal-scale CNAs were detected by WGS than by WES. Our results suggest that NGS can replace microarrays to detect CNA in EA. WGS is superior to WES in that it can offer finer resolution for the detection, though if the interest is on recurrent CNAs, WES can be preferable to WGS for its cost-effectiveness. The online version of this article (doi:10.1186/s40246-015-0044-0) contains supplementary material, which is available to authorized users.
Reconstructing the Population Genetic History of the Caribbean
The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, we investigate the population genetic history of this region by characterizing patterns of genome-wide variation among 330 individuals from three of the Greater Antilles (Cuba, Puerto Rico, Hispaniola), two mainland (Honduras, Colombia), and three Native South American (Yukpa, Bari, and Warao) populations. We combine these data with a unique database of genomic variation in over 3,000 individuals from diverse European, African, and Native American populations. We use local ancestry inference and tract length distributions to test different demographic scenarios for the pre- and post-colonial history of the region. We develop a novel ancestry-specific PCA (ASPCA) method to reconstruct the sub-continental origin of Native American, European, and African haplotypes from admixed genomes. We find that the most likely source of the indigenous ancestry in Caribbean islanders is a Native South American component shared among inland Amazonian tribes, Central America, and the Yucatan peninsula, suggesting extensive gene flow across the Caribbean in pre-Columbian times. We find evidence of two pulses of African migration. The first pulse—which today is reflected by shorter, older ancestry tracts—consists of a genetic component more similar to coastal West African regions involved in early stages of the trans-Atlantic slave trade. The second pulse—reflected by longer, younger tracts—is more similar to present-day West-Central African populations, supporting historical records of later transatlantic deportation. Surprisingly, we also identify a Latino-specific European component that has significantly diverged from its parental Iberian source populations, presumably as a result of small European founder population size. 
We demonstrate that the ancestral components in admixed genomes can be traced back to distinct sub-continental source populations with far greater resolution than previously thought, even when limited pre-Columbian Caribbean haplotypes have survived. Latinos are often regarded as a single heterogeneous group, whose complex variation is not fully appreciated in several social, demographic, and biomedical contexts. By making use of genomic data, we characterize ancestral components of Caribbean populations on a sub-continental level and unveil fine-scale patterns of population structure distinguishing insular from mainland Caribbean populations as well as from other Hispanic/Latino groups. We provide genetic evidence for an inland South American origin of the Native American component in island populations and for extensive pre-Columbian gene flow across the Caribbean basin. The Caribbean-derived European component shows significant differentiation from parental Iberian populations, presumably as a result of founder effects during the colonization of the New World. Based on demographic models, we reconstruct the complex population history of the Caribbean since the onset of continental admixture. We find that insular populations are best modeled as mixtures absorbing two pulses of African migrants, coinciding with the early and maximum activity stages of the transatlantic slave trade. These two pulses appear to have originated in different regions within West Africa, imprinting two distinguishable signatures on present-day Afro-Caribbean genomes and shedding light on the genetic impact of the slave trade in the Caribbean.
Identification of selective sweeps reveals divergent selection between Chinese Holstein and Simmental cattle populations
The identification of signals left by recent positive selection provides a feasible approach for targeting genomic variants that underlie complex traits and fitness. A better understanding of the selection mechanisms that occurred during the evolution of species can also be gained. In this study, we simultaneously detected the genome-wide footprints of recent positive selection that occurred within and between Chinese Holstein and Simmental populations, which have been subjected to artificial selection for distinct purposes. We conducted analyses using various complementary approaches, including LRH, XP-EHH and FST, based on the Illumina 770K high-density single nucleotide polymorphism (SNP) array, to enable more comprehensive detection. We successfully constructed profiles of selective signals in both cattle populations. To further annotate these regions, we identified a set of novel functional genes related to growth, reproduction, immune response and milk production. There were no overlapping candidate windows between the two breeds. Finally, we investigated the distribution of SNPs that had low FST values across five distinct functional regions in the genome. In the low-minor allele frequency bin, we found a higher proportion of low-FST SNPs in the exons of the bovine genome, which indicates strong purifying selection of the exons. The selection signatures identified in these two populations demonstrated positive selection pressure on a set of important genes with potential functions that are involved in many biological processes. We also demonstrated that in the bovine genome, exons were under strong purifying selection. Our findings provide insight into the mechanisms of artificial selection and will facilitate follow-up functional studies of potential candidate genes that are related to various economically important traits in cattle. 
The online version of this article (doi:10.1186/s12711-016-0254-5) contains supplementary material, which is available to authorized users.
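As a rough illustration of the FST statistic used above, the following sketch computes Wright's fixation index for a single biallelic SNP from the allele frequencies in two populations. It assumes equal population weights and ignores sample-size corrections, so it is a textbook definition and not necessarily the estimator used in the study. The function name `wright_fst` and the example frequencies are hypothetical.

```python
def wright_fst(p1, p2):
    """Wright's F_ST for one biallelic SNP in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S is the mean within-population heterozygosity.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        # The SNP is monomorphic across both populations: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical frequencies, chosen only to show the range of the statistic.
print(wright_fst(0.5, 0.5))  # identical frequencies: no differentiation
print(wright_fst(0.2, 0.8))  # strongly differentiated SNP
print(wright_fst(1.0, 0.0))  # alternative alleles fixed in each population
```

Values near 0 indicate similar allele frequencies in the two breeds, which is why an excess of low-FST SNPs in exons is read as a signature of purifying selection. Values near 1 indicate divergence of the kind expected under breed-specific positive selection.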
Investigating the relationship between UMODL1 gene polymorphisms and high myopia: a case–control study in Chinese
The UMODL1 gene was found to be associated with high myopia in Japanese. This study aimed to investigate this gene for association with high myopia in Chinese. Two groups of unrelated Han Chinese from Hong Kong were recruited using the same criteria: Sample Set 1 comprising 356 controls (spherical equivalent, SE, within ±1 diopter or D) and 356 cases (SE ≤ −8D), and Sample Set 2 comprising 394 controls and 526 cases. Fifty-nine tag single nucleotide polymorphisms (SNPs) were selected and genotyped for Sample Set 1. Four SNPs were followed up with Sample Set 2. Both single-marker and haplotype analyses were performed with cases defined by different SE thresholds. Secondary phenotypes were also analyzed for association with genotypes. Data filtering left 57 SNPs for analysis. Single-marker analysis did not reveal any significant differences between cases and controls in the initial study. However, haplotype GCT for markers rs220168-rs220170-rs11911271 showed marginal significance (empirical P = 0.076; SE ≤ −12D for cases), but could not be replicated in the follow-up study. In contrast, non-synonymous SNP rs3819142 was associated with high myopia (SE ≤ −10D) in the follow-up study, but could not be confirmed using Sample Set 1. The SNP rs2839471, positive in the original Japanese study, gave negative results in all our analyses. Exploratory analysis of secondary phenotypes indicated that allele C of rs220120 was associated with anterior chamber depth (adjusted P = 0.0460). Common UMODL1 polymorphisms were unlikely to be important in the genetic susceptibility to high myopia in Han Chinese.
EIN2 dependent regulation of acetylation of histone H3K14 and non canonical histone H3K23 in ethylene signalling
Ethylene gas is essential for many developmental processes and stress responses in plants. EIN2 plays a key role in ethylene signalling but its function remains enigmatic. Here, we show that ethylene specifically elevates acetylation of histone H3K14 and the non-canonical acetylation of H3K23 in etiolated seedlings. The up-regulation of these two histone marks positively correlates with ethylene-regulated transcription activation, and the elevation requires EIN2. Both EIN2 and EIN3 interact with a SANT domain protein named EIN2 nuclear associated protein 1 (ENAP1), overexpression of which results in elevation of histone acetylation and enhanced ethylene-inducible gene expression in an EIN2-dependent manner. On the basis of these findings we propose a model where, in the presence of ethylene, the EIN2 C terminus contributes to downstream signalling via the elevation of acetylation at H3K14 and H3K23. ENAP1 may potentially mediate ethylene-induced histone acetylation via its interactions with EIN2 C terminus. The translocation of the C-terminal domain of EIN2 to the nucleus is essential for induction of gene expression in response to the plant hormone ethylene. Here, Zhang et al. show that EIN2 is required for ethylene-inducible elevation of histone acetylation marks associated with transcriptional activation.
Dataset of TWIST1 regulated genes in the cranial mesoderm and a transcriptome comparison of cranial mesoderm and cranial neural crest
This article contains data related to the research article entitled “Transcriptional targets of TWIST1 in the cranial mesoderm regulate cell-matrix interactions and mesenchyme maintenance” by Bildsoe et al. (2016) [1]. The data presented here are derived from: (1) a microarray-based comparison of sorted cranial mesoderm (CM) and cranial neural crest (CNC) cells from E9.5 mouse embryos; (2) comparisons of transcription profiles of head tissues from mouse embryos with a CM-specific loss-of-function of Twist1 and control mouse embryos collected at E8.5 and E9.5; (3) ChIP-seq using a TWIST1-specific monoclonal antibody with chromatin extracts from TWIST1-expressing MDCK cells, a model for a TWIST1-dependent mesenchymal state.
A Genome Wide Linkage Study for Chronic Obstructive Pulmonary Disease in a Dutch Genetic Isolate Identifies Novel Rare Candidate Variants
Chronic obstructive pulmonary disease (COPD) is a complex and heritable disease, associated with multiple genetic variants. Specific familial types of COPD may be explained by rare variants, which have not been widely studied. We aimed to discover rare genetic variants underlying COPD through a genome-wide linkage scan. Affected-only analysis was performed using the 6K Illumina Linkage IV Panel in 142 cases clustered in 27 families from a genetic isolate, the Erasmus Rucphen Family (ERF) study. Potential causal variants were identified by searching for shared rare variants in the exome-sequence data of the affected members of the families contributing most to the linkage peak. The identified rare variants were then tested for association with COPD in a large meta-analysis of several cohorts. Significant evidence for linkage was observed on chromosomes 15q14–15q25 [logarithm of the odds (LOD) score = 5.52], 11p15.4–11q14.1 (LOD = 3.71) and 5q14.3–5q33.2 (LOD = 3.49). In the chromosome 15 peak, which harbors the known COPD locus for nicotinic receptors, and in the chromosome 5 peak we could not identify shared variants. In the chromosome 11 locus, we identified four rare (minor allele frequency (MAF) <0.02), predicted pathogenic, missense variants. These were shared among the affected family members. The identified variants localize to genes including neuroblast differentiation-associated protein (AHNAK), previously associated with blood biomarkers in COPD, phospholipase C Beta 3 (PLCB3), shown to increase airway hyper-responsiveness, solute carrier family 22-A11 (SLC22A11), involved in amino acid metabolism and ion transport, and metallothionein-like protein 5 (MTL5), involved in nicotinate and nicotinamide metabolism. Association of SLC22A11 and MTL5 variants was confirmed in the meta-analysis of 9,888 cases and 27,060 controls. In conclusion, we have identified novel rare variants in plausible genes related to COPD.
Further studies utilizing large sample whole-genome sequencing should further confirm the associations at chromosome 11 and investigate the chromosome 15 and 5 linked regions.
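To make the LOD scores above easier to interpret, here is a minimal sketch of the textbook two-point LOD score for phase-known meioses. It is not the affected-only analysis used in the study, which the abstract does not describe in computational detail. The function name `lod_score` and the counts in the example are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known meioses.

    LOD(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**r * (1 - theta)**(n - r) for r recombinants in n meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    likelihood = theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)
    null_likelihood = 0.5 ** meioses  # free recombination, i.e. no linkage
    if likelihood == 0.0:
        # theta = 0 is impossible once a recombinant has been observed.
        return float("-inf")
    return math.log10(likelihood / null_likelihood)


# Hypothetical example: 10 informative meioses, 1 recombinant.
for theta in (0.0, 0.05, 0.1, 0.2, 0.5):
    print(theta, lod_score(1, 10, theta))
```

A LOD score of 3 corresponds to odds of 1000:1 in favour of linkage and is the conventional threshold for significant linkage, which puts the reported peaks of 5.52, 3.71 and 3.49 in context.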
Lumbosacral stenosis in Labrador retriever military working dogs – an exomic exploratory study
Canine lumbosacral stenosis is defined as narrowing of the caudal lumbar and/or sacral vertebral canal. A risk factor for neurologic problems in many large sized breeds, lumbosacral stenosis can also cause early retirement in Labrador retriever military working dogs. Though vital for conservative management of the condition, early detection is complicated by the ambiguous nature of clinical signs of lumbosacral stenosis in stoic and high-drive Labrador retriever military working dogs. Though clinical diagnoses of lumbosacral stenosis using CT imaging are standard, they are usually not performed unless dogs present with clinical symptoms. Understanding the underlying genomic mechanisms would be beneficial in developing early detection methods for lumbosacral stenosis, which could prevent premature retirement in working dogs. The exomes of 8 young Labrador retriever military working dogs (4 affected and 4 unaffected by lumbosacral stenosis, phenotypically selected by CT image analyses from 40 dogs with no reported clinical signs of the condition) were sequenced to identify and annotate exonic variants between dogs negative and positive for lumbosacral stenosis. Two-hundred and fifty-two variants were detected to be homozygous for the wild allele and either homozygous or heterozygous for the variant allele. Seventeen non-disruptive variants were detected that could affect protein effectiveness in 7 annotated (SCN1B, RGS9BP, ASXL3, TTR, LRRC16B, PTPRO, ZBBX) and 3 predicted genes (EEF1A1, DNAJA1, ZFX). No exonic variants were detected in any of the canine orthologues for human lumbar spinal stenosis candidate genes. TTR (transthyretin) gene could be a possible candidate for lumbosacral stenosis in Labrador retrievers based on previous human studies that have reported an association between human lumbar spinal stenosis and transthyretin protein amyloidosis. 
Other genes identified with exonic variants in this study but with no known published association with lumbosacral stenosis and/or lumbar spinal stenosis could also be candidate genes for future canine lumbosacral stenosis studies but their roles remain currently unknown. Human lumbar spinal stenosis candidate genes also cannot be ruled out as lumbosacral stenosis candidate genes. More definitive genetic investigations of this condition are needed before any genetic test for lumbosacral stenosis in Labrador retriever can be developed. The online version of this article (10.1186/s40575-017-0052-6) contains supplementary material, which is available to authorized users.
A nonsense mutation in PRNP associated with clinical Alzheimer's disease
Here, we describe a nonsense haplotype in PRNP associated with clinical Alzheimer's disease. The patient presented an early-onset of cognitive decline with memory loss as the primary cognitive problem. Whole-exome sequencing revealed a nonsense mutation in PRNP (NM_000311, c.C478T; p.Q160*; rs80356711) associated with homozygosity for the V allele at position 129 of the protein, further highlighting how very similar genotypes in PRNP result in strikingly different phenotypes.
AR 13, a Celecoxib Derivative, Directly Kills Francisella In Vitro and Aids Clearance and Mouse Survival In Vivo
Francisella tularensis (F. tularensis) is the causative agent of tularemia and is classified as a Tier 1 select agent. No licensed vaccine is currently available in the United States and treatment of tularemia is confined to few antibiotics. In this study, we demonstrate that AR-13, a derivative of the cyclooxygenase-2 inhibitor celecoxib, exhibits direct in vitro bactericidal killing activity against Francisella including a type A strain of F. tularensis (SchuS4) and the live vaccine strain (LVS), as well as toward the intracellular proliferation of LVS in macrophages, without causing significant host cell toxicity. Identification of an AR-13-resistant isolate indicates that this compound has an intracellular target(s) and that efflux pumps can mediate AR-13 resistance. In the mouse model of tularemia, AR-13 treatment protected 50% of the mice from lethal LVS infection and prolonged survival time from a lethal dose of F. tularensis SchuS4. Combination of AR-13 with a sub-optimal dose of gentamicin protected 60% of F. tularensis SchuS4-infected mice from death. Taken together, these data support the translational potential of AR-13 as a lead compound for the further development of new anti-Francisella agents.
Complimentary mechanisms of dual checkpoint blockade expand unique T-cell repertoires and activate adaptive anti-tumor immunity in triple-negative breast tumors
Triple-negative breast cancer (TNBC) is an aggressive and molecularly diverse breast cancer subtype typified by the presence of p53 mutations (∼80%), elevated immune gene signatures and neoantigen expression, as well as the presence of tumor infiltrating lymphocytes (TILs). As these factors are hypothesized to be strong immunologic prerequisites for the use of immune checkpoint blockade (ICB) antibodies, multiple clinical trials testing single ICBs have advanced to Phase III, with early indications of heterogeneous response rates of <20% to anti-PD1 and anti-PDL1 ICB. While promising, these modest response rates highlight the need for mechanistic studies to understand how different ICBs function, how their combination impacts functionality and efficacy, and what immunologic parameters predict efficacy of different ICB regimens in TNBC. To address these issues, we tested anti-PD1 and anti-CTLA4 in multiple models of TNBC and found that their combination profoundly enhanced the efficacy of either treatment alone. We demonstrate that this efficacy is due to anti-CTLA4-driven expansion of an individually unique T-cell receptor (TCR) repertoire whose functionality is enhanced by both intratumoral Treg suppression and anti-PD1 blockade of tumor expressed PDL1. Notably, the individuality of the TCR repertoire was observed regardless of whether the tumor cells expressed a nonself antigen (ovalbumin) or if tumor-specific transgenic T-cells were transferred prior to sequencing. However, responsiveness was strongly correlated with systemic measures of tumor-specific T-cell and B-cell responses, which along with systemic assessment of TCR expansion, may serve as the most useful predictors for clinical responsiveness in future clinical trials of TNBC utilizing anti-PD1/anti-CTLA4 ICB.
Rare variants of small effect size in neuronal excitability genes influence clinical outcome in Japanese cases of SCN1A truncation-positive Dravet syndrome
Dravet syndrome (DS) is a rare, devastating form of childhood epilepsy that is often associated with mutations in the voltage-gated sodium channel gene, SCN1A. There is considerable variability in expressivity within families, as well as among individuals carrying the same primary mutation, suggesting that clinical outcome is modulated by variants at other genes. To identify modifier gene variants that contribute to clinical outcome, we sequenced the exomes of 22 individuals at both ends of a phenotype distribution (i.e., mild and severe cognitive condition). We controlled for variation associated with different mutation types by limiting inclusion to individuals with a de novo truncation mutation resulting in SCN1A haploinsufficiency. We performed tests aimed at identifying 1) single common variants that are enriched in either phenotypic group, 2) sets of common or rare variants aggregated in and around genes associated with clinical outcome, and 3) rare variants in 237 candidate genes associated with neuronal excitability. While our power to identify enrichment of a common variant in either phenotypic group is limited as a result of the rarity of mild phenotypes in individuals with SCN1A truncation variants, our top candidates did not map to functional regions of genes, or in genes that are known to be associated with neurological pathways. In contrast, we found a statistically-significant excess of rare variants predicted to be damaging and of small effect size in genes associated with neuronal excitability in severely affected individuals. A KCNQ2 variant previously associated with benign neonatal seizures is present in 3 of 12 individuals in the severe category. To compare our results with the healthy population, we performed a similar analysis on whole exome sequencing data from 70 Japanese individuals in the 1000 genomes project. 
Interestingly, the frequency of rare damaging variants in the same set of neuronal excitability genes in healthy individuals is nearly as high as in severely affected individuals. Rather than a single common gene/variant modifying clinical outcome in SCN1A-related epilepsies, our results point to the cumulative effect of rare variants with little to no measurable phenotypic effect (i.e., typical genetic background) unless present in combination with a disease-causing truncation mutation in SCN1A.
De novo derivation of proteomes from transcriptomes for transcript and protein identification
Identification of proteins by tandem mass spectrometry requires a database of the proteins that could be in the sample. This is available for model species (e.g. humans) but not for non-model species. Ideally, for a non-model species the sequencing of expressed mRNA would generate a protein database for mass spectrometry based identification, allowing detection of genes and proteins using high throughput sequencing and protein identification technologies. Here we use human cells infected with human adenovirus as a complex and dynamic model to demonstrate this approach is robust. Our Proteomics Informed by Transcriptomics technique identifies >99% of over 3700 distinct proteins identified using traditional analysis reliant on comprehensive human and adenovirus protein lists. This facilitates high throughput acquisition of direct evidence for transcripts and proteins in non-model species. Critically, we show this approach can also be used to highlight genes and proteins undergoing dynamic changes in post transcriptional protein stability.
New Genetic Loci Associated with Preharvest Sprouting and Its Evaluation Based on the Model Equation in Rice
Preharvest sprouting (PHS) in rice panicles is an important quantitative trait that causes both yield losses and the deterioration of grain quality under unpredictable moisture conditions at the ripening stage. However, the molecular mechanism underlying PHS has not yet been elucidated. Here, we explored the genetic loci associated with PHS in rice and formulated a model regression equation for rapid screening for use in breeding programs. After re-sequencing 21 representative accessions for PHS and performing enrichment analysis, we found that approximately 20,000 SNPs revealed distinct allelic distributions between PHS resistant and susceptible accessions. Of these, 39 candidate SNP loci were selected, including previously reported QTLs. We analyzed the genotypes of 144 rice accessions to determine the association between PHS and the 39 candidate SNP loci, 10 of which were identified as significantly affecting PHS based on allele type. Based on the allele types of the SNP loci, we constructed a regression equation for evaluating PHS, accounting for an R² value of 0.401 in japonica rice. We validated this equation using additional accessions, which exhibited a significant R² value of 0.430 between the predicted values and actual measurements. The newly detected SNP loci and the model equation could facilitate marker-assisted selection to predict PHS in rice germplasm and breeding lines.
Differential Natural Selection of Human Zinc Transporter Genes between African and Non-African Populations
Zinc transporters play important roles in all eukaryotes by maintaining the rational zinc concentration in cells. However, the diversity of zinc transporter genes (ZTGs) remains poorly studied. Here, we investigated the genetic diversity of 24 human ZTGs based on the 1000 Genomes data. Some ZTGs show small population differences, such as SLC30A6 with a weighted-average FST (WA-FST = 0.015), while other ZTGs exhibit considerably large population differences, such as SLC30A9 (WA-FST = 0.284). Overall, ZTGs harbor many more highly population-differentiated variants compared with random genes. Intriguingly, we found that SLC30A9 was under natural selection in both East Asians (EAS) and Africans (AFR) but in different directions. Notably, a non-synonymous variant (rs1047626) in SLC30A9 is almost fixed with 96.4% A in EAS and 92% G in AFR, respectively. Consequently, there are two different functional haplotypes exhibiting dominant abundance in AFR and EAS, respectively. Furthermore, a strong correlation was observed between the haplotype frequencies of SLC30A9 and distributions of zinc contents in soils or crops. We speculate that the genetic differentiation of ZTGs could directly contribute to population heterogeneity in zinc transporting capabilities and local adaptations of human populations in regard to the local zinc state or diets, which have both evolutionary and medical implications.
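As a rough illustration of how strongly rs1047626 is differentiated, the allele frequencies quoted in the abstract can be plugged into a simple two-population F_ST estimate. The sketch below uses Wright's F_ST = (H_T − H_S)/H_T for a single biallelic site with equally weighted populations. It is not the weighted-average F_ST procedure of the study; the function name is hypothetical, and the 8% A frequency in AFR is an assumption that follows only if the site is strictly biallelic (A/G).

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic site and two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if both populations were pooled
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within each population
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_t == 0.0:
        return 0.0  # site is monomorphic overall; report no differentiation
    return (h_t - h_s) / h_t


# A-allele frequencies: 96.4% in EAS (from the abstract) and
# 8% in AFR (assumed from "92% G" under a biallelic A/G site).
fst_rs1047626 = wright_fst(0.964, 0.08)
print(f"single-site F_ST for rs1047626: {fst_rs1047626:.3f}")
```

A single-site value like this is expected to be much higher than a gene-wide weighted average such as the WA-FST reported for SLC30A9, because the average also covers many weakly differentiated variants.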
Shared ancestral susceptibility to colorectal cancer and other nutrition-related diseases
The majority of non-syndromic colorectal cancers (CRCs) can be described as a complex disease. A two-stage case–control study on CRC susceptibility was conducted to assess the influence of the ancestral alleles in the polymorphisms previously associated with nutrition-related complex diseases. In stage I, 28 single nucleotide polymorphisms (SNPs) were genotyped in a hospital-based Czech population (1025 CRC cases, 787 controls) using an allele-specific PCR-based genotyping system (KASPar®). In stage II, replication was carried out for the five SNPs with the lowest p values. The replication set consisted of 1798 CRC cases and 1810 controls from a population-based German study (DACHS). Odds ratios (ORs) and 95% confidence intervals (CIs) for associations between genotypes and CRC risk were estimated using logistic regression. To identify signatures of selection, Fay-Wu’s H and Integrated Haplotype Score (iHS) were estimated. In the Czech population, carriers of the ancestral alleles of AGT rs699 and CYP3A7 rs10211 showed an increased risk of CRC (OR 1.26 and 1.38, respectively; two-sided p≤0.05), whereas carriers of the ancestral allele of ENPP1 rs1044498 had a decreased risk (OR 0.79; p≤0.05). For rs1044498, the strongest association was detected in the Czech male subpopulation (OR 0.61; p=0.0015). The associations were not replicated in the German population. Signatures of selection were found for all three analyzed genes. Our study showed evidence of association for the ancestral alleles of polymorphisms in AGT and CYP3A7 and for the derived allele of a polymorphism in ENPP1 with an increased risk of CRC in Czechs, but not in Germans. The ancestral alleles of these SNPs have previously been associated with nutrition-related diseases hypertension (AGT and CYP3A7) and insulin resistance (ENPP1). Future studies may shed light on the complex genetic and environmental interactions between different types of nutrition-related diseases.
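The odds ratios and 95% confidence intervals mentioned above come from logistic regression. For a single binary predictor with no covariates, the same quantities can be computed directly from a 2×2 table, which the following sketch does using the standard log-odds (Woolf) interval. The function name and the carrier counts are hypothetical; only the group sizes (1025 cases, 787 controls) are taken from the abstract.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a: carrier cases      b: non-carrier cases
    c: carrier controls   d: non-carrier controls
    z: normal quantile (1.96 gives a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical carrier counts, chosen only so the row totals match
# the stage I group sizes (600 + 425 = 1025 cases, 420 + 367 = 787 controls).
or_value, ci_low, ci_high = odds_ratio_ci(600, 425, 420, 367)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1 corresponds to a two-sided p value below 0.05 for this unadjusted comparison; a covariate-adjusted logistic model can give different estimates.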
Accurate Breakpoint Mapping in Apparently Balanced Translocation Families with Discordant Phenotypes Using Whole Genome Mate Pair Sequencing
Familial apparently balanced translocations (ABTs) segregating with discordant phenotypes are extremely challenging for interpretation and counseling due to the scarcity of publications and lack of routine techniques for quick investigation. Recently, next generation sequencing has emerged as an efficacious methodology for precise detection of translocation breakpoints. However, studies so far have mainly focused on de novo translocations. The present study focuses specifically on familial cases in order to shed some light on this diagnostic dilemma. Whole-genome mate-pair sequencing (WG-MPS) was applied to map the breakpoints in nine two-way ABT carriers from four families. Translocation breakpoints and patient-specific structural variants were validated by Sanger sequencing and quantitative Real Time PCR, respectively. Identical sequencing patterns and breakpoints were identified in affected and non-affected members carrying the same translocations. PTCD1, ATP5J2-PTCD1, CADPS2, and STPG1 were disrupted by the translocations in three families, rendering them initially as possible disease candidate genes. However, subsequent mutation screening and structural variant analysis did not reveal any pathogenic mutations or unique variants in the affected individuals that could explain the phenotypic differences between carriers of the same translocations. In conclusion, we suggest that NGS-based methods, such as WG-MPS, can be successfully used for detailed mapping of translocation breakpoints, which can also be used in routine clinical investigation of ABT cases. Unlike de novo translocations, no associations were determined here between familial two-way ABTs and the phenotype of the affected members, in which the presence of cryptic imbalances and complex chromosomal rearrangements has been excluded. Future whole-exome or whole-genome sequencing will potentially reveal unidentified mutations in the patients underlying the discordant phenotypes within each family. 
In addition, larger studies are needed to determine the exact percentage for phenotypic risk in families with ABTs.
A Multi-Breed Genome-Wide Association Analysis for Canine Hypothyroidism Identifies a Shared Major Risk Locus on CFA12
Hypothyroidism is a complex clinical condition found in both humans and dogs, thought to be caused by a combination of genetic and environmental factors. In this study we present a multi-breed analysis of predisposing genetic risk factors for hypothyroidism in dogs using three high-risk breeds: the Gordon Setter, Hovawart and the Rhodesian Ridgeback. Using a genome-wide association approach and meta-analysis, we identified a major hypothyroidism risk locus shared by these breeds on chromosome 12 (p = 2.1 × 10⁻¹¹). Further characterisation of the candidate region revealed a shared ~167 kb risk haplotype (4,915,018–5,081,823 bp), tagged by two SNPs in almost complete linkage disequilibrium. This breed-shared risk haplotype includes three genes (LHFPL5, SRPK1 and SLC26A8) and does not extend to the dog leukocyte antigen (DLA) class II gene cluster located in the vicinity. These three genes have not been identified as candidate genes for hypothyroid disease previously, but have functions that could potentially contribute to the development of the disease. Our results implicate the potential involvement of novel genes and pathways for the development of canine hypothyroidism, raising new possibilities for screening, breeding programmes and treatments in dogs. This study may also contribute to our understanding of the genetic etiology of human hypothyroid disease, which is one of the most common endocrine disorders in humans.
Draft Genome Sequence of Saccharomycopsis fermentans CBS 7830, a Predacious Yeast Belonging to the Saccharomycetales
Saccharomycopsis fermentans is an ascomycetous necrotrophic fungal pathogen that penetrates and kills fungal prey cells via targeted penetration pegs. Here, we report the draft genome sequence and scaffold assembly of this mycoparasite.
Complete mitogenome sequences of four flatfishes (Pleuronectiformes) reveal a novel gene arrangement of L-strand coding genes
Few mitochondrial gene rearrangements are found in vertebrates and large-scale changes in these genomes occur even less frequently. It is difficult, therefore, to propose a mechanism to account for observed changes in mitogenome structure. Mitochondrial gene rearrangements are usually explained by the recombination model or tandem duplication and random loss model. In this study, the complete mitochondrial genomes of four flatfishes, Crossorhombus azureus (blue flounder), Grammatobothus krempfi, Pleuronichthys cornutus, and Platichthys stellatus were determined. A striking finding is that eight genes in the C. azureus mitogenome are located in a novel position, differing from that of available vertebrate mitogenomes. Specifically, the ND6 and seven tRNA genes (the Q, A, C, Y, S1, E, P genes) encoded by the L-strand have been translocated to a position between tRNA-T and tRNA-F though the original order of the genes is maintained. These special features are used to suggest a mechanism for C. azureus mitogenome rearrangement. First, a dimeric molecule was formed by two monomers linked head-to-tail, then one of the two sets of promoters lost function and the genes controlled by the disabled promoters became pseudogenes, non-coding sequences, and even were lost from the genome. This study provides a new gene-rearrangement model that accounts for the events of gene-rearrangement in a vertebrate mitogenome.
The Genome Sequence of Rickettsia felis Identifies the First Putative Conjugative Plasmid in an Obligate Intracellular Parasite
We sequenced the genome of Rickettsia felis, a flea-associated obligate intracellular α-proteobacterium causing spotted fever in humans. Besides a circular chromosome of 1,485,148 bp, R. felis exhibits the first putative conjugative plasmid identified among obligate intracellular bacteria. This plasmid is found in a short (39,263 bp) and a long (62,829 bp) form. R. felis contrasts with previously sequenced Rickettsia in terms of many other features, including a number of transposases, several chromosomal toxin–antitoxin genes, many more spoT genes, and a very large number of ankyrin- and tetratricopeptide-motif-containing genes. Host-invasion-related genes for patatin and RickA were found. Several phenotypes predicted from genome analysis were experimentally tested: conjugative pili and mating were observed, as well as β-lactamase activity, actin-polymerization-driven mobility, and hemolytic properties. Our study demonstrates that complete genome sequencing is the fastest approach to reveal phenotypic characters of recently cultured obligate intracellular bacteria. Rickettsia felis is an obligate intracellular bacterium that lives in fleas and causes spotted fever in humans. Its genome sequence provides the first evidence that such bacteria can undergo conjugation.
Draft Genome Sequence of Saccharomycopsis fodiens CBS 8332, a Necrotrophic Mycoparasite with Biocontrol Potential
Saccharomycopsis fodiens is an ascomycetous necrotrophic mycoparasite. Predator-prey interaction leads to killing of the host cell by a penetration peg and utilization of cell content by the predator. Here, we report the 14.9-Mb S. fodiens draft genome sequence assembled into 9 large scaffolds and 13 minor scaffolds (<20 kb).
A novel ammonia-oxidizing archaeon from wastewater treatment plant: Its enrichment, physiological and genomic characteristics
Ammonia-oxidizing archaea (AOA) were recently found to participate in the ammonia removal processes in wastewater treatment plants (WWTPs), similar to their bacterial counterparts. However, due to the lack of cultivated AOA strains from WWTPs, their functions and contributions in these systems remain unclear. Here we report a novel AOA strain SAT1 enriched from activated sludge, with its physiological and genomic characteristics investigated. The maximal 16S rRNA gene similarity between SAT1 and other reported AOA strains is 96% (with “Ca. Nitrosotenuis chungbukensis”), and it is affiliated with Wastewater Cluster B (WWC-B) based on amoA gene phylogeny, a cluster within group I.1a and specific for activated sludge. Our strain is autotrophic, mesophilic (25 °C–33 °C) and neutrophilic (pH 5.0–7.0). Its genome size is 1.62 Mb, with a large fragment inversion (accounting for 68% of the genome size) inside. The strain could not utilize urea due to truncation of the urea transporter gene. The lack of the pathways to synthesize usual compatible solutes makes it intolerant to high salinity (>0.03%), but it could adapt to low salinity (0.005%) environments. This adaptation, together with possibly enhanced cell-biofilm attachment ability, makes it suitable for the WWTP environment. We propose the name “Candidatus Nitrosotenuis cloacae” for the strain SAT1.
A marine inducible prophage vB_CibM-P1 isolated from the aerobic anoxygenic phototrophic bacterium Citromicrobium bathyomarinum JL354
A prophage vB_CibM-P1 was induced by mitomycin C from the epipelagic strain Citromicrobium bathyomarinum JL354, a member of the alpha-IV subcluster of marine aerobic anoxygenic phototrophic bacteria (AAPB). The induced bacteriophage vB_CibM-P1 had Myoviridae-like morphology and polyhedral heads (capsid approximately 60–100 nm) with tail fibers. The vB_CibM-P1 genome is ~38 kb in size, with 66.0% GC content. The genome contains 58 proposed open reading frames that are involved in integration, DNA packaging, morphogenesis and bacterial lysis. vB_CibM-P1 is a temperate phage that can be directly induced in hosts. In response to mitomycin C induction, virus-like particles can increase to 7 × 10⁹ per ml, while host cells decrease by an order of magnitude. The vB_CibM-P1 bacteriophage is the first inducible prophage from AAPB.
Complete Genome Sequence of Enterococcus Bacteriophage EFLK1
We previously isolated EFDG1, a lytic phage against enterococci for therapeutic use. Nevertheless, EFDG1-resistant bacterial strains (EFDG1r) have evolved. EFLK1, a new highly effective phage against EFDG1r strains, was isolated in this study. The genome of EFLK1 was fully sequenced, analyzed, and deposited in GenBank.
Clear Genetic Distinctiveness between Human- and Pig-Derived Trichuris Based on Analyses of Mitochondrial Datasets
The whipworm, Trichuris trichiura, causes trichuriasis in ∼600 million people worldwide, mainly in developing countries. Whipworms also infect other animal hosts, including pigs (T. suis), dogs (T. vulpis) and non-human primates, and cause disease in these hosts, which is similar to trichuriasis of humans. Although Trichuris species are considered to be host specific, there has been considerable controversy, over the years, as to whether T. trichiura and T. suis are the same or distinct species. Here, we characterised the entire mitochondrial genomes of human-derived Trichuris and pig-derived Trichuris, compared them and then tested the hypothesis that the parasites from these two host species are genetically distinct in a phylogenetic analysis of the sequence data. Taken together, the findings support the proposal that T. trichiura and T. suis are separate species, consistent with previous data for nuclear ribosomal DNA. Using molecular analytical tools, employing genetic markers defined herein, future work should conduct large-scale studies to establish whether T. trichiura is found in pigs and T. suis in humans in endemic regions. Trichuriasis is a neglected tropical disease (NTD) caused by parasitic nematodes of the genus Trichuris (Nematoda), causing significant human and animal health problems as well as considerable socio-economic consequences world-wide. Although Trichuris species are considered to be relatively host specific, there has been significant controversy as to whether Trichuris infecting humans (recognized as T. trichiura) is a distinct species from that found in pigs (recognized as T. suis), or not. In the present study, we sequenced, annotated and compared the complete mitochondrial genomes of Trichuris from these two hosts and undertook a phylogenetic analysis of the mitochondrial datasets. This analysis showed clear genetic distinctiveness and strong statistical support for the hypothesis that T. trichiura and T. suis are separate species, consistent with previous studies using nuclear ribosomal DNA sequence data. Future studies could explore, using mitochondrial genetic markers defined in the present study, cross-transmission of Trichuris between pigs and humans in endemic regions, and the population genetics of T. trichiura and T. suis.
Draft Genome Sequence of Zymomonas mobilis ZM481 (ATCC 31823)
Zymomonas mobilis ZM481 (ATCC 31823) is an ethanol-tolerant strain that can produce the highest level of ethanol in Z. mobilis from glucose in the shortest time. Here, we report a draft genome sequence of ZM481, which can help us understand the genes related to the ethanol tolerance of this strain.
Intraspecific rearrangement of mitochondrial genome suggests the prevalence of the tandem duplication-random loss (TDRL) mechanism in Quasipaa boulengeri
Tandem duplication followed by random loss (TDRL) is the most frequently invoked model to explain the diversity of gene rearrangements in metazoan mitogenomes. The initial stages of gene rearrangement are difficult to observe in nature, which limits our understanding of incipient duplication events and the subsequent process of random loss. Intraspecific gene reorganizations may represent intermediate states, and if so they potentially shed light on the evolutionary dynamics of TDRL. Nucleotide sequences in a hotspot of gene-rearrangement in 28 populations of a single species of frog, Quasipaa boulengeri, provide such predicted intermediate states. Gene order and phylogenetic analyses support a single tandem duplication event and a step-by-step process of random loss. Intraspecific gene rearrangements are not commonly found through comparison of all mitochondrial DNA records of amphibians and squamate reptiles in GenBank. The intraspecific variation in Q. boulengeri provides insights into the rate of partial duplications and deletions within a mitogenome, and reveals that fixation and gene-distribution in mitogenomic reorganization is likely non-adaptive. The online version of this article (doi:10.1186/s12864-016-3309-7) contains supplementary material, which is available to authorized users.
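The TDRL model described above can be made concrete with a small sketch: a contiguous block of genes is duplicated in tandem, and then exactly one copy of each duplicated gene is lost. The code below is a toy illustration under that textbook definition, not the analysis used in the study; the function name and the gene labels are hypothetical.

```python
def tdrl_step(order, start, end, keep_first):
    """Apply one tandem-duplication/random-loss step to a gene order.

    order:      list of gene names in their current order
    start, end: bounds of the duplicated block, as the slice order[start:end]
    keep_first: genes of the block that survive in the FIRST copy;
                every other gene of the block survives in the SECOND copy
    """
    block = order[start:end]
    unknown = set(keep_first) - set(block)
    if unknown:
        raise ValueError(f"genes not in duplicated block: {sorted(unknown)}")
    # After duplication the region reads block + block. Losing one copy of
    # each gene leaves the kept genes of copy 1 followed by those of copy 2,
    # each subset still in its original relative order.
    first_copy = [g for g in block if g in keep_first]
    second_copy = [g for g in block if g not in keep_first]
    return order[:start] + first_copy + second_copy + order[end:]


# Hypothetical five-gene order; the block B-C-D is duplicated,
# B and D are kept from the first copy and C from the second.
ancestral = ["A", "B", "C", "D", "E"]
derived = tdrl_step(ancestral, 1, 4, {"B", "D"})
print(derived)  # ['A', 'B', 'D', 'C', 'E']
```

Intermediate states of the kind the abstract describes would correspond to orders in which some duplicated genes or pseudogene remnants are still present in both copies, which this sketch deliberately skips over.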
Draft Genome Sequence of Polaromonas glacialis Strain R3-9, a Psychrotolerant Bacterium Isolated from Arctic Glacial Foreland
Here we report the draft genome sequence of the psychrotolerant Polaromonas glacialis strain R3-9, isolated from Midtre Lovénbreen glacial foreland near Ny-Ålesund, Svalbard Archipelago, Norway.
Rapid Evolution of the Mitochondrial Genome in Chalcidoid Wasps (Hymenoptera: Chalcidoidea) Driven by Parasitic Lifestyles
Among the Chalcidoids, hymenopteran parasitic wasps that have diversified lifestyles, a partial mitochondrial genome has been reported only from Nasonia. This genome had many unusual features, especially a dramatic reorganization and a high rate of evolution. Comparisons based on more mitochondrial genomic data from the same superfamily were required to reveal whether these unusual features are peculiar to Nasonia or not. In the present study, we sequenced the nearly complete mitochondrial genomes from the species Philotrypesis pilosa and Philotrypesis sp., both of which were associated with Ficus hispida. The acquired data included all of the protein-coding genes, rRNAs, and most of the tRNAs, and in P. pilosa the control region. High levels of nucleotide divergence separated the two species. A comparison of all available hymenopteran mitochondrial genomes (including a submitted partial genome from Ceratosolen solmsi) revealed that the Chalcidoids had dramatic mitochondrial gene rearrangements, involving not only the tRNAs but also several protein-coding genes. The AT-rich control region was translocated and inverted in Philotrypesis. The mitochondrial genomes also exhibited rapid rates of evolution involving elevated nonsynonymous mutations.
Complete Genome Sequence of the Streptococcus suis Temperate Bacteriophage ϕNJ2
Streptococcus suis is an important cause of meningitis, arthritis, and sudden death in young piglets and of meningitis in humans. A novel temperate S. suis-specific bacteriophage (ϕNJ2) was identified. The phage was induced from the S. suis strain NJ2 by using mitomycin C, and the whole genome sequence was determined. The ϕNJ2 genome is 37,282 bp in length and contains 56 open reading frames (ORFs). While 31 ORFs (55%) encoded hypothetical proteins, other ORFs were predicted to be functional, clearly indicating the novelty of ϕNJ2.
A Framework for Assessing the Concordance of Molecular Typing Methods and the True Strain Phylogeny of Campylobacter jejuni and C. coli Using Draft Genome Sequence Data
Tracking of sources of sporadic cases of campylobacteriosis remains challenging, as commonly used molecular typing methods have limited ability to unambiguously link genetically related strains. Genomics has become increasingly prominent in the public health response to enteric pathogens as methods enable characterization of pathogens at an unprecedented level of resolution. However, the cost of sequencing and expertise required for bioinformatic analyses remains prohibitive, and these comprehensive analyses are limited to a few priority strains. Although several molecular typing methods are currently widely used for epidemiological analysis of campylobacters, it is not clear how accurately these methods reflect true strain relationships. To address this, we have developed a framework and associated computational tools to rapidly analyze draft genome sequence data for the assessment of molecular typing methods against a “gold standard” based on the phylogenetic analysis of highly conserved core (HCC) genes with high sequence quality. We analyzed 104 publicly available whole genome sequences (WGS) of C. jejuni and C. coli. In addition to in silico determination of multi-locus sequence typing (MLST), flaA, and porA type, as well as comparative genomic fingerprinting (CGF) type, we inferred a “reference” phylogeny based on 389 HCC genes. Molecular typing data were compared to the reference phylogeny for concordance using the adjusted Wallace coefficient (AWC) with confidence intervals. Although MLST targets the sequence variability in core genes and CGF targets insertions/deletions of accessory genes, both methods are based on multi-locus analysis and provided better estimates of true phylogeny than methods based on single loci (porA, flaA). 
A more comprehensive WGS dataset including additional genetically related strains, both epidemiologically linked and unlinked, will be necessary to more comprehensively assess the performance of subtyping methods for outbreak investigations and surveillance activities. Analyses of the strengths and weaknesses of widely used typing methodologies in inferring true strain relationships will provide guidance in the interpretation of this data for epidemiological purposes.
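To make the concordance measure more tangible, the sketch below computes an adjusted Wallace coefficient between two partitions of the same strains. It assumes the commonly used definition: the Wallace coefficient W(A→B) is the fraction of strain pairs grouped together by method A that are also grouped together by method B, and the adjustment subtracts the agreement expected by chance, taken as the probability that two random strains share a type under B. Confidence intervals are omitted, and the function name and the toy type labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def adjusted_wallace(a_labels, b_labels):
    """Adjusted Wallace coefficient AW(A -> B) for two typings of the same strains."""
    n = len(a_labels)
    if n != len(b_labels) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")

    same_a = 0     # pairs placed together by method A
    same_both = 0  # pairs placed together by both methods
    for i, j in combinations(range(n), 2):
        if a_labels[i] == a_labels[j]:
            same_a += 1
            if b_labels[i] == b_labels[j]:
                same_both += 1
    if same_a == 0:
        raise ValueError("method A groups no two strains together")

    wallace = same_both / same_a
    # Chance agreement: probability that a random pair shares a type under B
    total_pairs = n * (n - 1) / 2
    expected = sum(c * (c - 1) / 2 for c in Counter(b_labels).values()) / total_pairs
    if expected == 1.0:
        raise ValueError("method B assigns every strain to a single type")
    return (wallace - expected) / (1.0 - expected)


# Toy data: six strains typed by a fine-grained and a coarse method.
fine = ["ST1", "ST1", "ST2", "ST2", "ST3", "ST3"]
coarse = ["x", "x", "x", "x", "y", "y"]
aw_fine_to_coarse = adjusted_wallace(fine, coarse)
aw_coarse_to_fine = adjusted_wallace(coarse, fine)
print(aw_fine_to_coarse, aw_coarse_to_fine)
```

The coefficient is directional: in the toy data the fine-grained typing fully predicts the coarse one, while the reverse direction scores much lower, which is the kind of asymmetry that matters when comparing a typing method against a reference phylogeny.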
The Dynamic Regulatory Genome of Capsaspora and the Origin of Animal Multicellularity
The unicellular ancestor of animals had a complex repertoire of genes linked to multicellular processes. This suggests that changes in the regulatory genome, rather than in gene innovation, were key to the origin of animals. Here, we carry out multiple functional genomic assays in Capsaspora owczarzaki, the unicellular relative of animals with the largest known gene repertoire for transcriptional regulation. We show that changing chromatin states, differential lincRNA expression, and dynamic cis-regulatory sites are associated with life cycle transitions in Capsaspora. Moreover, we demonstrate conservation of animal developmental transcription-factor networks and extensive network interconnection in this premetazoan organism. In contrast, however, Capsaspora lacks animal promoter types, and its regulatory sites are small, proximal, and lack signatures of animal enhancers. Overall, our results indicate that the emergence of animal multicellularity was linked to a major shift in genome cis-regulatory complexity, most notably the appearance of distal enhancer regulation. • Dynamic chromatin states and cis-regulatory sites in a unicellular context • Elaborate lincRNA regulation associated with a unicellular life cycle • Premetazoan origin of core metazoan developmental transcription-factor networks • Distal enhancer elements are a metazoan innovation Analysis of the regulatory genome in one of our closest unicellular relatives suggests that the appearance of developmental promoters and distal enhancer elements, rather than of gene innovations, may have been the critical events underlying the origin of multicellular organisms.
Correction of the auditory phenotype in C57BL/6N mice via CRISPR/Cas9 mediated homology directed repair
Nuclease-based technologies have been developed that enable targeting of specific DNA sequences directly in the zygote. These approaches provide an opportunity to modify the genomes of inbred mice, and allow the removal of strain-specific mutations that confound phenotypic assessment. One such mutation is the Cdh23ahl allele, present in several commonly used inbred mouse strains, which predisposes to age-related progressive hearing loss. We have used targeted CRISPR/Cas9-mediated homology directed repair (HDR) to correct the Cdh23ahl allele directly in C57BL/6NTac zygotes. Employing offset-nicking Cas9 (D10A) nickase with paired RNA guides and a single-stranded oligonucleotide donor template we show that allele repair was successfully achieved. To investigate potential Cas9-mediated ‘off-target’ mutations in our corrected mouse, we undertook whole-genome sequencing and assessed the ‘off-target’ sites predicted for the guide RNAs (≤4 nucleotide mis-matches). No induced sequence changes were identified at any of these sites. Correction of the progressive hearing loss phenotype was demonstrated using auditory-evoked brainstem response testing of mice at 24 and 36 weeks of age, and rescue of the progressive loss of sensory hair cell stereocilia bundles was confirmed using scanning electron microscopy of dissected cochleae from 36-week-old mice. CRISPR/Cas9-mediated HDR has been successfully utilised to efficiently correct the Cdh23ahl allele in C57BL/6NTac mice, and rescue the associated auditory phenotype. The corrected mice described in this report will allow age-related auditory phenotyping studies to be undertaken using C57BL/6NTac-derived models, such as those generated by the International Mouse Phenotyping Consortium (IMPC) programme. The online version of this article (doi:10.1186/s13073-016-0273-4) contains supplementary material, which is available to authorized users.
Patterns of Genome Wide Variation in Glossina fuscipes fuscipes Tsetse Flies from Uganda
The tsetse fly Glossina fuscipes fuscipes (Gff) is the insect vector of the two forms of Human African Trypanosomiasis (HAT) that exist in Uganda. Understanding Gff population dynamics and the underlying genetics of epidemiologically relevant phenotypes is key to reducing disease transmission. Using ddRAD sequence technology, complemented with whole-genome sequencing, we developed a panel of ∼73,000 single-nucleotide polymorphisms (SNPs) distributed across the Gff genome that can be used for population genomics and to perform genome-wide-association studies. We used these markers to estimate genomic patterns of linkage disequilibrium (LD) in Gff, and used the information, in combination with outlier-locus detection tests, to identify candidate regions of the genome under selection. LD in individual populations decays to half of its maximum value (r²max/2) between 1359 and 2429 bp. The overall LD estimated for the species reaches r²max/2 at 708 bp, an order of magnitude slower than in Drosophila. Using 53 infected (Trypanosoma spp.) and uninfected flies from four genetically distinct Ugandan populations adapted to different environmental conditions, we were able to identify SNPs associated with the infection status of the fly and local environmental adaptation. The extent of LD in Gff likely facilitated the detection of loci under selection, despite the small sample size. Furthermore, it is probable that LD in the regions identified is much higher than the average genomic LD due to strong selection. Our results show that even modest sample sizes can reveal significant genetic associations in this species, which has implications for future studies given the difficulties of collecting field specimens with contrasting phenotypes for association analysis.
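As a rough illustration of how a half-decay distance such as r²max/2 could be read off an LD decay curve, the sketch below interpolates linearly between binned mean r² values. The function name and the toy numbers are hypothetical and are not taken from the study, which may have fitted a decay model instead.

```python
def half_decay_distance(distances_bp, mean_r2):
    """Return the distance (bp) at which r2 first falls to half of its maximum.

    distances_bp: increasing bin midpoints in base pairs.
    mean_r2: mean r2 per bin (same length as distances_bp).
    Linear interpolation is used between the two bins that bracket the target.
    Returns None if the curve never reaches half of its maximum.
    """
    if len(distances_bp) != len(mean_r2) or not distances_bp:
        raise ValueError("inputs must be non-empty and of equal length")
    r2_max = max(mean_r2)
    target = r2_max / 2.0
    start = mean_r2.index(r2_max)
    for i in range(start + 1, len(mean_r2)):
        if mean_r2[i] <= target:
            x0, x1 = distances_bp[i - 1], distances_bp[i]
            y0, y1 = mean_r2[i - 1], mean_r2[i]
            if y0 == y1:  # flat segment exactly at the target
                return float(x1)
            # Linear interpolation for the crossing point.
            return x0 + (y0 - target) * (x1 - x0) / (y0 - y1)
    return None


# Toy decay curve (hypothetical values, for illustration only).
toy_distances = [100, 500, 1000, 2000, 4000]
toy_r2 = [0.40, 0.30, 0.22, 0.18, 0.10]
print(half_decay_distance(toy_distances, toy_r2))
```

A shorter half-decay distance means LD breaks down faster, so more markers are needed to tag a given region.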
Genome Wide Estimates of Mutation Rates and Spectrum in Schizosaccharomyces pombe Indicate CpG Sites are Highly Mutagenic Despite the Absence of DNA Methylation
We accumulated mutations for 1952 generations in 79 initially identical, haploid lines of the fission yeast Schizosaccharomyces pombe, and then performed whole-genome sequencing to determine the mutation rates and spectrum. We captured 696 spontaneous mutations across the 79 mutation accumulation (MA) lines. We compared the mutation spectrum and rate to a recently published equivalent experiment on the same species, and to another model ascomycetous yeast, the budding yeast Saccharomyces cerevisiae. While the two species are approximately 600 million years diverged from each other, they share similar life histories, genome size and genomic G/C content. We found that Sc. pombe and S. cerevisiae have similar mutation rates, but Sc. pombe exhibits a stronger insertion bias. Intriguingly, we observed an increased mutation rate at cytosine nucleotides, specifically CpG nucleotides, which is also seen in S. cerevisiae. However, the absence of methylation in Sc. pombe and the pattern of mutation at these sites, primarily C → A as opposed to C → T, strongly suggest that the increased mutation rate is not caused by deamination of methylated cytosines. This result implies that the high mutability of CpG dinucleotides in other species may be caused in part by a methylation-independent mechanism. Many of our findings mirror those seen in the recent study, despite the use of different passaging conditions, indicating that MA is a reliable method for estimating mutation rates and spectra.
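The counts in this abstract are enough for a back-of-the-envelope rate. The sketch below pools all 696 mutations regardless of type and divides by the total number of line-generations; the per-site version needs the number of callable sites, which the abstract does not give, so `callable_sites` is a hypothetical parameter left to the reader.

```python
def rate_per_genome_per_generation(n_mutations, n_lines, n_generations):
    """Mutations per genome per generation, pooled over all MA lines."""
    return n_mutations / (n_lines * n_generations)


def rate_per_site_per_generation(n_mutations, n_lines, n_generations, callable_sites):
    """Mutations per site per generation.

    callable_sites is an assumption supplied by the caller: the number of
    genomic positions at which a mutation could have been detected.
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    return n_mutations / (n_lines * n_generations * callable_sites)


# Figures from the abstract: 696 mutations, 79 lines, 1952 generations.
per_genome = rate_per_genome_per_generation(696, 79, 1952)
print(f"{per_genome:.5f} mutations per genome per generation")
```

Published rates are usually reported separately for base substitutions and indels, so this pooled figure is only an order-of-magnitude check.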
Frequent variations in cancer related genes may play prognostic role in treatment of patients with chronic myeloid leukemia
Variability in the host genome and in cancer cells plays a critical role in the diversity of responses to existing therapies and in the overall success of treating oncological diseases. In chronic myeloid leukemia (CML), targeted therapy with tyrosine kinase inhibitors is highly effective in most patients. However, about 15% of patients show primary resistance to standard therapy. Whole exome sequencing is a good tool for an unbiased search for genetic variations important for prognosis of survival and therapy efficacy in many cancers. We applied this approach to CML patients with optimal response to, or failure of, tyrosine kinase inhibitor therapy. We analyzed exome variations between optimal responders and failures and found 7 variants in cancer-related genes with different genotypes in the two groups of patients. Five of them were found in optimal responders (rs11579366, rs1990236, rs176037, rs10653661, rs3803264) and two in failures (rs3099950, rs9471966). These variants were found in genes associated with cancers (ANKRD35, DNAH9, MAGEC1, TOX3) or participating in cancer-related signaling pathways (THSD1, MORN2, PTCRA). We found gene variants that may become early predictors of therapy outcome and allow the development of new early prognostic tests for estimating therapy efficacy in CML patients. Normal genetic variation may influence therapy efficacy during targeted treatment of cancers. The online version of this article (doi:10.1186/s12863-015-0308-7) contains supplementary material, which is available to authorized users.
A systematic approach to the reporting of medically relevant findings from whole genome sequencing
The MedSeq Project is a randomized clinical trial developing approaches to assess the impact of integrating genome sequencing into clinical medicine. To facilitate the return of results of potential medical relevance to physicians and patients participating in the MedSeq Project, we sought to develop a reporting approach for the effective communication of such findings. Genome sequencing was performed on the Illumina HiSeq platform. Variants were filtered, interpreted, and validated according to methods developed by the Laboratory for Molecular Medicine and consistent with current professional guidelines. The GeneInsight software suite, which is integrated with the Partners HealthCare electronic health record, was used for variant curation, report drafting, and delivery. We developed a concise 5–6 page Genome Report (GR) featuring a single-page summary of results of potential medical relevance with additional pages containing structured variant, gene, and disease information along with supporting evidence for reported variants and brief descriptions of associated diseases and clinical implications. The GR is formatted to provide a succinct summary of genomic findings, enabling physicians to take appropriate steps for disease diagnosis, prevention, and management in their patients. Our experience highlights important considerations for the reporting of results of potential medical relevance and provides a framework for interpretation and reporting practices in clinical genome sequencing. The online version of this article (doi:10.1186/s12881-014-0134-1) contains supplementary material, which is available to authorized users.
Evaluating the breast cancer predisposition role of rare variants in genes associated with low penetrance breast cancer risk SNPs
Genome-wide association studies (GWASs) have identified numerous single-nucleotide polymorphisms (SNPs) associated with small increases in breast cancer risk. Studies to date suggest that some SNPs alter the expression of the associated genes, which potentially mediates risk modification. On this basis, we hypothesised that some of these genes may be enriched for rare coding variants associated with a higher breast cancer risk. The coding regions and exon-intron boundaries of 56 genes that have either been proposed by GWASs to be the regulatory targets of the SNPs and/or located < 500 kb from the risk SNPs were sequenced in index cases from 1043 familial breast cancer families that previously had negative test results for BRCA1 and BRCA2 mutations and 944 population-matched cancer-free control participants from an Australian population. Rare (minor allele frequency ≤ 0.001 in the Exome Aggregation Consortium and Exome Variant Server databases) loss-of-function (LoF) and missense variants were studied. LoF variants were rare in both the cases and control participants across all the candidate genes, with only 38 different LoF variants observed in a total of 39 carriers. For the majority of genes (n = 36), no LoF variants were detected in either the case or control cohorts. No individual gene showed a significant excess of LoF or missense variants in the cases compared with control participants. Among all candidate genes as a group, the total number of carriers with LoF variants was higher in the cases than in the control participants (26 cases and 13 control participants), as was the total number of carriers with missense variants (406 versus 353), but neither reached statistical significance (p = 0.077 and p = 0.512, respectively). 
The genes contributing most of the excess of LoF variants in the cases included TET2, NRIP1, RAD51B and SNX32 (12 cases versus 2 control participants), whereas ZNF283 and CASP8 contributed largely to the excess of missense variants (25 cases versus 8 control participants). Our data suggest that rare LoF and missense variants in genes associated with low-penetrance breast cancer risk SNPs may contribute some additional risk, but as a group these genes are unlikely to be major contributors to breast cancer heritability. The online version of this article (doi:10.1186/s13058-017-0929-z) contains supplementary material, which is available to authorized users.
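The group-level comparison of LoF carriers (26 of 1043 cases versus 13 of 944 controls) can be sketched as a 2×2 table analysis. The abstract does not name the test behind p = 0.077, so the two-sided Fisher exact test below is an assumption chosen for illustration, and the function names are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]] (rows: cases, controls)."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Carriers and non-carriers taken from the abstract.
cases_carriers, cases_total = 26, 1043
controls_carriers, controls_total = 13, 944
table = (
    cases_carriers, cases_total - cases_carriers,
    controls_carriers, controls_total - controls_carriers,
)
print("odds ratio:", round(odds_ratio(*table), 2))
print("two-sided Fisher p:", fisher_exact_two_sided(*table))
```

Python integers are arbitrary precision, so the binomial coefficients stay exact until the final division.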
Associations of epithelial sodium channel genes with blood pressure: the GenSalt study
To investigate associations of the SCNN1A, SCNN1G and SCNN1B genes with blood pressure (BP) in the Han Chinese population, the current analysis included 2,880 participants who had not used antihypertensive medication in the month prior to the baseline survey. Forty-four tag-SNPs in epithelial sodium channel (ENaC) genes were selected and genotyped, and nine BP measurements were obtained during a 3-day examination. In single-marker analyses, we identified significant associations of SCNN1A marker rs13306613 with diastolic BP (DBP) and SCNN1B marker rs12447134 with systolic BP (SBP) under a codominant model after Bonferroni correction (P = 2.82 × 10⁻⁵ and 4.63 × 10⁻⁴, respectively). In addition, 5 SNPs in SCNN1G and 4 SNPs in SCNN1B achieved nominal significance for SBP, DBP or mean arterial pressure (MAP) under the additive model. For example, the minor C allele of rs5735 in the SCNN1G gene was associated with decreased SBP, DBP and MAP (P = 0.016, 5.41 × 10⁻³, and 4.36 × 10⁻³, respectively). Gene-based results showed significant associations of SCNN1G and SCNN1B with BP levels. This study suggests that ENaC genes play important roles in BP regulation in the Han Chinese population. Future studies are warranted to replicate these findings, and functional studies are needed to identify true causal variants in ENaC genes.
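A minimal sketch of a Bonferroni check for the two reported single-marker P values follows. It assumes a family-wise alpha of 0.05 and that the correction covers the 44 tag-SNPs only; the abstract does not state the exact number of tests the authors corrected for, so both values are assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def passes_bonferroni(p_values, alpha, n_tests):
    """Map each label to True if its p-value is below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return {label: p < threshold for label, p in p_values.items()}


# P values as reported in the abstract; alpha and n_tests are assumptions.
reported = {"rs13306613 (DBP)": 2.82e-5, "rs12447134 (SBP)": 4.63e-4}
print(bonferroni_threshold(0.05, 44))
print(passes_bonferroni(reported, 0.05, 44))
```

If the correction also spanned several phenotypes or genetic models, `n_tests` would be larger and the threshold correspondingly stricter.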
Common Variation in the Fat Mass and Obesity Associated (FTO) Gene Confers Risk of Obesity and Modulates BMI in the Chinese Population
OBJECTIVE— Genetic variants in the fat mass and obesity-associated (FTO) gene have been linked with obesity and type 2 diabetes in European populations. We aimed to test the role of FTO genetic variants in obesity and type 2 diabetes in the Chinese population. RESEARCH DESIGN AND METHODS— We genotyped 19 single-nucleotide polymorphisms (SNPs) spanning from the 3′ end of the neighboring RPGRIP1L gene to the 5′ flanking region of the FTO gene. We analyzed their associations with obesity (638 case and 1,610 control subjects), type 2 diabetes (759 case and 784 control subjects), and obesity-related traits in nondiabetic subjects. RESULTS— Among the 19 SNPs, the rs9939609 A allele was strongly associated with obesity (P = 7.0 × 10⁻⁴) and BMI (P = 0.0024) in the Chinese population. The odds ratio for obesity was 2.60 (95% CI 1.24–5.46) (P = 0.011) for the AA genotype and 1.32 (1.05–1.66) (P = 0.018) for the AT genotype compared with the TT genotype. Each additional copy of the rs9939609 A allele was associated with a BMI increase of ∼0.37 kg/m². The rs9939609 A allele was substantially less common in the Chinese population than in the European population (12.6 vs. 45%). We did not find significant associations of the 19 SNPs with type 2 diabetes or other obesity-related traits. CONCLUSIONS— Genetic variation in the FTO gene is strongly associated with obesity and BMI in the Chinese population. The risk variant is less common in the Chinese population, but its effect size on BMI is comparable with that in the European population.
Large scale genomics unveil polygenic architecture of human cortical surface area
Little is known about how genetic variation contributes to neuroanatomical variability, and whether particular genomic regions comprising genes or evolutionarily conserved elements are enriched for effects that influence brain morphology. Here, we examine brain imaging and single-nucleotide polymorphisms (SNPs) data from ∼2,700 individuals. We show that a substantial proportion of variation in cortical surface area is explained by additive effects of SNPs dispersed throughout the genome, with a larger heritable effect for visual and auditory sensory and insular cortices (h² ∼ 0.45). Genome-wide SNPs collectively account for, on average, about half of twin heritability across cortical regions (N = 466 twins). We find enriched genetic effects in or near genes. We also observe that SNPs in evolutionarily more conserved regions contributed significantly to the heritability of cortical surface area, particularly, for medial and temporal cortical regions. SNPs in less conserved regions contributed more to occipital and dorsolateral prefrontal cortices. How genetic variation contributes to brain morphology is still poorly understood. Here Chen et al. combine brain imaging with single-nucleotide polymorphism data to discover that a substantial degree of cortical variation is derived from underlying genetic differences.
RAD sequencing reveals genomewide divergence between independent invasions of the European green crab (Carcinus maenas) in the Northwest Atlantic
Genomic studies of invasive species can reveal both invasive pathways and functional differences underpinning patterns of colonization success. The European green crab (Carcinus maenas) was initially introduced to eastern North America nearly 200 years ago where it expanded northwards to eastern Nova Scotia. A subsequent invasion to Nova Scotia from a northern European source allowed further range expansion, providing a unique opportunity to study the invasion genomics of a species with multiple invasions. Here, we use restriction‐site‐associated DNA sequencing‐derived SNPs to explore fine‐scale genomewide differentiation between these two invasions. We identified 9137 loci from green crab sampled from 11 locations along eastern North America and compared spatial variation to mitochondrial COI sequence variation used previously to characterize these invasions. Overall spatial divergence among invasions was high (pairwise FST ~0.001 to 0.15) and spread across many loci, with a mean FST ~0.052 and 52% of loci examined characterized by FST values >0.05. The majority of the most divergent loci (i.e., outliers, ~1.2%) displayed latitudinal clines in allele frequency, highlighting extensive genomic divergence among the invasions. Discriminant analysis of principal components (both neutral and outlier loci) clearly resolved the two invasions spatially and was highly correlated with mitochondrial divergence. Our results reveal extensive cryptic intraspecific genomic diversity associated with differing patterns of colonization success and demonstrate the clear utility of genomic approaches for delineating the distribution and colonization success of aquatic invasive species.
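For readers unfamiliar with FST, the sketch below computes a simple heterozygosity-based FST, (HT − HS)/HT, for one biallelic locus in two equally weighted populations, and the share of loci above a cutoff. This is a textbook simplification with hypothetical function names and toy frequencies; the abstract does not say which estimator the authors used.

```python
def simple_fst(p1, p2):
    """FST = (HT - HS) / HT for one biallelic locus and two populations.

    p1, p2: frequency of the same allele in each population.
    Populations are weighted equally; no sample-size correction is applied.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    # Mean expected heterozygosity within populations.
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:  # locus is monomorphic overall
        return 0.0
    return (ht - hs) / ht


def fraction_above(values, cutoff):
    """Share of values strictly greater than cutoff."""
    return sum(v > cutoff for v in values) / len(values)


# Toy allele frequencies for three loci (hypothetical, for illustration only).
toy_loci = [(0.50, 0.50), (0.40, 0.60), (0.10, 0.90)]
fst_values = [simple_fst(a, b) for a, b in toy_loci]
print(fst_values, fraction_above(fst_values, 0.05))
```

Identical frequencies give FST = 0 and fixed differences give FST = 1, which is the scale on which the reported mean of ~0.052 should be read.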
Genetic Risk Score of NOS Gene Variants Associated with Myocardial Infarction Correlates with Coronary Incidence across Europe
Coronary artery disease (CAD) mortality and morbidity is present in the European continent in a four-fold gradient across populations, from the South (Spain and France) with the lowest CAD mortality, towards the North (Finland and UK). This observed gradient has not been fully explained by classical or single genetic risk factors, resulting in some cases in the so-called Southern European or Mediterranean paradox. Here we approached population genetic risk estimates using genetic risk scores (GRS) constructed with single nucleotide polymorphisms (SNPs) from nitric oxide synthase (NOS) genes. These SNPs appeared to be associated with myocardial infarction (MI) in 2165 cases and 2153 controls. The GRSs were computed in 34 general European populations. Although the contribution of these GRS was lower than 1% between cases and controls, the mean GRS per population was positively correlated with coronary incidence explaining 65–85% of the variation among populations (67% in women and 86% in men). This large contribution to CAD incidence variation among populations might be a result of collinearity with several other common genetic and environmental factors. These results are not consistent with the cardiovascular Mediterranean paradox for genetics and support a CAD genetic architecture mainly based on combinations of common genetic polymorphisms. Population genetic risk scores are a promising approach in public health interventions to develop lifestyle programs and prevent intermediate risk factors in certain subpopulations with specific genetic predisposition.
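The general idea of a genetic risk score and its population-level correlation with incidence can be sketched as follows. This is a generic illustration under stated assumptions: the weights, allele frequencies and incidence figures are hypothetical toy values, and the abstract does not specify how the authors weighted SNPs.

```python
from math import sqrt


def genetic_risk_score(risk_allele_counts, weights=None):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP) for one individual.

    With weights=None every SNP counts equally (an unweighted score).
    """
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    if len(weights) != len(risk_allele_counts):
        raise ValueError("weights and counts must have the same length")
    return sum(w * c for w, c in zip(weights, risk_allele_counts))


def population_mean_grs(risk_allele_freqs, weights=None):
    """Expected GRS in a population: a diploid carries 2 * freq copies on average."""
    return genetic_risk_score([2 * f for f in risk_allele_freqs], weights)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Toy data: risk-allele frequencies at three SNPs in four populations,
# and a made-up incidence figure per population (illustration only).
toy_freqs = [[0.20, 0.30, 0.10], [0.25, 0.35, 0.15], [0.30, 0.40, 0.20], [0.40, 0.45, 0.30]]
toy_incidence = [110.0, 150.0, 170.0, 260.0]
mean_scores = [population_mean_grs(f) for f in toy_freqs]
r = pearson_r(mean_scores, toy_incidence)
print(f"r = {r:.3f}, variance explained = {r * r:.1%}")
```

The squared correlation is the "variation explained" quantity quoted in the abstract; a high value at the population level does not imply a large effect between individual cases and controls.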
Genetic Ancestry of Hadza and Sandawe Peoples Reveals Ancient Population Structure in Africa
The Hadza and Sandawe populations in present-day Tanzania speak languages containing click sounds and are therefore thought to be distantly related to southern African Khoisan languages. We analyzed genome-wide genotype data for individuals sampled from the Hadza and Sandawe populations in the context of a global data set of 3,528 individuals from 163 ethno-linguistic groups. We found that Hadza and Sandawe individuals share ancestry distinct from and most closely related to Omotic ancestry; share Khoisan ancestry with populations such as ≠Khomani, Karretjie, and Ju/’hoansi in southern Africa; share Niger-Congo ancestry with populations such as Yoruba from Nigeria and Luhya from Kenya, consistent with migration associated with the Bantu Expansion; and share Cushitic ancestry with Somali, multiple Ethiopian populations, the Maasai population in Kenya, and the Nama population in Namibia. We detected evidence for low levels of Arabian, Nilo-Saharan, and Pygmy ancestries in a minority of individuals. Our results indicate that west Eurasian ancestry in eastern Africa is more precisely the Arabian parent of Cushitic ancestry. Relative to the Out-of-Africa migrations, Hadza ancestry emerged early whereas Sandawe ancestry emerged late.
Genetic origins of the Minoans and Mycenaeans
The origins of the Bronze Age Minoan and Mycenaean cultures have puzzled archaeologists for more than a century. We assembled genome-wide data from nineteen ancient individuals, including Minoans from Crete, Mycenaeans from mainland Greece, and their eastern neighbours from southwestern Anatolia. We show that Minoans and Mycenaeans were genetically similar, having at least three quarters of their ancestry from the first Neolithic farmers of western Anatolia and the Aegean [1,2], and most of the remainder from ancient populations like those of the Caucasus [3] and Iran [4,5]. However, the Mycenaeans differed from Minoans in deriving additional ancestry from an ultimate source related to the hunter-gatherers of eastern Europe and Siberia [6–8], introduced via a proximal source related to the inhabitants of either the Eurasian steppe [1,6,9] or Armenia [4,9]. Modern Greeks resemble the Mycenaeans, but with some additional dilution of the early Neolithic ancestry. Our results support the idea of continuity but not isolation in the history of populations of the Aegean, before and after the time of its earliest civilizations.
The renal urate transporter SLC17A1 locus: confirmation of association with gout
Two major gout-causing genes have been identified, the urate transport genes SLC2A9 and ABCG2. Variation within the SLC17A1 locus, which encodes sodium-dependent phosphate transporter 1, a renal transporter of uric acid, has also been associated with serum urate concentration. However, evidence for association with gout is equivocal. We investigated the association of the SLC17A1 locus with gout in New Zealand sample sets. Five variants (rs1165196, rs1183201, rs9358890, rs3799344, rs12664474) were genotyped across a New Zealand sample set totaling 971 cases and 1,742 controls. Cases were ascertained according to American Rheumatism Association criteria. Two population groups were studied: Caucasian and Polynesian. At rs1183201 (SLC17A1), evidence for association with gout was observed in both the Caucasian (odds ratio (OR) = 0.67, P = 3.0 × 10⁻⁶) and Polynesian (OR = 0.74, P = 3.0 × 10⁻³) groups. Meta-analysis confirmed association of rs1183201 with gout at a genome-wide level of significance (OR = 0.70, P = 3.0 × 10⁻⁸). Haplotype analysis suggested the presence of a common protective haplotype. We confirm the SLC17A1 locus as the third associated with gout at a genome-wide level of significance.
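To show how two group-level odds ratios can combine into a single pooled estimate, the sketch below runs a fixed-effect inverse-variance meta-analysis on the log odds ratio scale. It rests on two assumptions not confirmed by the abstract: that the reported P values are two-sided Wald tests (used here to back-calculate standard errors), and that the authors' meta-analysis was of the fixed-effect type. Function names are hypothetical.

```python
from math import erfc, exp, log, sqrt
from statistics import NormalDist


def se_from_or_and_p(odds_ratio, p_two_sided):
    """Back-calculate the standard error of log(OR) from a two-sided Wald p-value."""
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    return abs(log(odds_ratio)) / z


def fixed_effect_meta(studies):
    """Inverse-variance pooling of log odds ratios.

    studies: list of (odds_ratio, standard_error_of_log_or) tuples.
    Returns (pooled_or, pooled_se_of_log_or, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    pooled_log_or = sum(w * log(or_) for w, (or_, _) in zip(weights, studies)) / total
    pooled_se = sqrt(1.0 / total)
    z = pooled_log_or / pooled_se
    p = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return exp(pooled_log_or), pooled_se, p


# Odds ratios and P values as reported for rs1183201 in the two groups.
groups = [(0.67, 3.0e-6), (0.74, 3.0e-3)]
studies = [(or_, se_from_or_and_p(or_, p)) for or_, p in groups]
pooled_or, pooled_se, pooled_p = fixed_effect_meta(studies)
print(f"pooled OR = {pooled_or:.2f}, p = {pooled_p:.1e}")
```

Because the inputs are rounded summary statistics, the pooled values should only be expected to land near the reported meta-analysis result, not to reproduce it exactly.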
The genetic architecture of pediatric cognitive abilities in the Philadelphia Neurodevelopmental Cohort
The objective of this analysis was to examine the genetic architecture of diverse cognitive abilities in children and adolescents, including the magnitude of common genetic effects and patterns of shared and unique genetic influences. Subjects included 3,689 members of the Philadelphia Neurodevelopmental Cohort, a general population sample of ages 8-21 years who completed an extensive battery of cognitive tests. We used genome-wide complex trait analysis (GCTA) to estimate the SNP-based heritability of each domain, as well as the genetic correlation between all domains that showed significant genetic influence. Several of the individual domains suggested strong influence of common genetic variants (e.g. reading ability, h²g = 0.43, p = 4e-06; emotion identification, h²g = 0.36, p = 1e-05; verbal memory, h²g = 0.24, p = 0.005). The genetic correlations highlighted trait domains that are candidates for joint interrogation in future genetic studies (e.g. language reasoning and spatial reasoning, r(g) = 0.72, p = 0.007). These results can be used to structure future genetic and neuropsychiatric investigations of diverse cognitive abilities.
Variants near CHRNA3/5 and APOE have age and sex related effects on human lifespan
Lifespan is a trait of enormous personal interest. Research into the biological basis of human lifespan, however, is hampered by the long time to death. Using a novel approach of regressing (272,081) parental lifespans beyond age 40 years on participant genotype in a new large data set (UK Biobank), we here show that common variants near the apolipoprotein E and nicotinic acetylcholine receptor subunit alpha 5 genes are associated with lifespan. The effects are strongly sex and age dependent, with APOE ɛ4 differentially influencing maternal lifespan (P = 4.2 × 10⁻¹⁵, effect −1.24 years of maternal life per imputed risk allele in parent; sex difference, P = 0.011), and a locus near CHRNA3/5 differentially affecting paternal lifespan (P = 4.8 × 10⁻¹¹, effect −0.86 years per allele; sex difference P = 0.075). Rare homozygous carriers of the risk alleles at both loci are predicted to have 3.3–3.7 years shorter lives. Understanding the genetic influences on human aging requires a large number of subjects for a study of sufficient power. Here, Jim Wilson and colleagues use information on parental ages at death to show that common variants near the genes for apolipoprotein E and nicotinic acetylcholine receptor subunit alpha 5 are associated with longer lifespan.
Genomewide meta‐analysis identifies loci associated with IGF‐I and IGFBP‐3 levels with impact on age‐related traits
The growth hormone/insulin‐like growth factor (IGF) axis can be manipulated in animal models to promote longevity, and IGF‐related proteins including IGF‐I and IGF‐binding protein‐3 (IGFBP‐3) have also been implicated in risk of human diseases including cardiovascular diseases, diabetes, and cancer. Through genomewide association study of up to 30,884 adults of European ancestry from 21 studies, we confirmed and extended the list of previously identified loci associated with circulating IGF‐I and IGFBP‐3 concentrations (IGF1, IGFBP3, GCKR, TNS3, GHSR, FOXO3, ASXL2, NUBP2/IGFALS, SORCS2, and CELSR2). Significant sex interactions, which were characterized by different genotype–phenotype associations between men and women, were found only for associations of IGFBP‐3 concentrations with SNPs at the loci IGFBP3 and SORCS2. Analyses of SNPs, gene expression, and protein levels suggested that interplay between IGFBP3 and genes within the NUBP2 locus (IGFALS and HAGH) may affect circulating IGF‐I and IGFBP‐3 concentrations. The IGF‐I‐decreasing allele of SNP rs934073, which is an eQTL of ASXL2, was associated with lower adiposity and higher likelihood of survival beyond 90 years. The known longevity‐associated variant rs2153960 (FOXO3) was observed to be a genomewide significant SNP for IGF‐I concentrations. Bioinformatics analysis suggested enrichment of putative regulatory elements among these IGF‐I‐ and IGFBP‐3‐associated loci, particularly of rs646776 at CELSR2. In conclusion, this study identified several loci associated with circulating IGF‐I and IGFBP‐3 concentrations and provides clues to the potential role of the IGF axis in mediating effects of known (FOXO3) and novel (ASXL2) longevity‐associated loci.
Genetic variant rs17225178 in the ARNT2 gene is associated with Asperger Syndrome
Autism Spectrum Conditions (ASC) are neurodevelopmental conditions characterized by difficulties in communication and social interaction, alongside unusually repetitive behaviours and narrow interests. Asperger Syndrome (AS) is one subgroup of ASC and differs from classic autism in that in AS there is no language or general cognitive delay. Genetic, epigenetic and environmental factors are implicated in ASC and genes involved in neural connectivity and neurodevelopment are good candidates for studying the susceptibility to ASC. The aryl-hydrocarbon receptor nuclear translocator 2 (ARNT2) gene encodes a transcription factor involved in neurodevelopmental processes, neuronal connectivity and cellular responses to hypoxia. A mutation in this gene has been identified in individuals with ASC and single nucleotide polymorphisms (SNPs) have been nominally associated with AS and autistic traits in previous studies. In this study, we tested 34 SNPs in ARNT2 for association with AS in 118 cases and 412 controls of Caucasian origin. P values were adjusted for multiple comparisons, and linkage disequilibrium (LD) among the SNPs analysed was calculated in our sample. Finally, SNP annotation allowed functional and structural analyses of the genetic variants in ARNT2. We tested the replicability of our result using the genome-wide association studies (GWAS) database of the Psychiatric Genomics Consortium (PGC). We report statistically significant association of rs17225178 with AS. This SNP modifies transcription factor binding sites and regions that regulate the chromatin state in neural cell lines. It is also included in a LD block in our sample, alongside other genetic variants that alter chromatin regulatory regions in neural cells. These findings demonstrate that rs17225178 in the ARNT2 gene is associated with AS and support previous studies that pointed out an involvement of this gene in the predisposition to ASC.
Genetic Variants on Chromosome 1p13.3 Are Associated with Non ST Elevation Myocardial Infarction and the Expression of DRAM2 in the Finnish Population
Myocardial infarction (MI) is divided into either ST elevation MI (STEMI) or non-ST elevation MI (NSTEMI), differing in a number of clinical characteristics. We sought to identify genetic variants conferring risk to NSTEMI or STEMI by conducting a genome-wide association study (GWAS) of MI stratified into NSTEMI and STEMI in a consecutive sample of 1,579 acute MI cases with 1,576 controls. Subsequently, we followed the results in an independent population-based sample of 562 cases and 566 controls, a partially independent prospective cohort (N = 16,627 with 163 incident NSTEMI cases), and examined the effect of disease-associated variants on gene expression in 513 healthy participants. Genetic variants on chromosome 1p13.3 near the damage-regulated autophagy modulator 2 gene DRAM2 associated with NSTEMI (rs656843; odds ratio 1.57, P = 3.11 × 10⁻¹⁰) in the case-control analysis with a consistent but not statistically significant effect in the prospective cohort (rs656843; hazard ratio 1.13, P = 0.43). These variants were not associated with STEMI (rs656843; odds ratio, 1.11, P = 0.20; hazard ratio 0.97, P = 0.87), appearing to have a pronounced effect on NSTEMI risk. A majority of the variants at 1p13.3 associated with NSTEMI were also associated with the expression level of DRAM2 in blood leukocytes of healthy controls (top-ranked variant rs325927, P = 1.50 × 10⁻¹²). The results suggest that genetic factors may in part influence whether coronary artery disease results in NSTEMI rather than STEMI.
Genomic population structure of freshwater‐resident and anadromous ide (Leuciscus idus) in north‐western Europe
Climate change experts largely agree that future climate change and associated rises in oceanic water levels over the upcoming decades, will affect marine salinity levels. The subsequent effects on fish communities in estuarine ecosystems however, are less clear. One species that is likely to become increasingly affected by changes in salinity is the ide (Leuciscus idus). The ide is a stenohaline freshwater fish that primarily inhabits rivers, with frequent anadromous behavior when sea salinity does not exceed 15%. Unlike most other anadromous Baltic Sea fish species, the ide has yet to be subjected to large‐scale stocking programs, and thus provides an excellent opportunity for studying the natural population structure across the current salinity gradient in the Danish Belts. To explore this, we used Genotyping‐by‐Sequencing to determine genomic population structure of both freshwater resident and anadromous ide populations in the western Baltic Sea region, and relate the results to the current salinity gradient and the demographic history of ide in the region. The sample sites separate into four clusters, with all anadromous populations in one cluster and the freshwater resident populations in the remaining three. Results demonstrate high level of differentiation between sites hosting freshwater resident populations, but little differentiation among anadromous populations. Thus ide exhibit the genomic population structure of both a typical freshwater species, and a typical anadromous species. In addition to providing a first insight into the population structure of north‐western European ide, our data also (1) provide indications of a single illegal introduction by man; (2) suggest limited genetic effects of heavy pollution in the past; and (3) indicate possible historical anadromous behavior in a now isolated freshwater population.
Common variation near ROBO2 is associated with expressive vocabulary in infancy
Twin studies suggest that expressive vocabulary at ~24 months is modestly heritable. However, the genes influencing this early linguistic phenotype are unknown. Here we conduct a genome-wide screen and follow-up study of expressive vocabulary in toddlers of European descent from up to four studies of the EArly Genetics and Lifecourse Epidemiology consortium, analysing an early (15–18 months, ‘one-word stage’, NTotal=8,889) and a later (24–30 months, ‘two-word stage’, NTotal=10,819) phase of language acquisition. For the early phase, one single-nucleotide polymorphism (rs7642482) at 3p12.3 near ROBO2, encoding a conserved axon-binding receptor, reaches the genome-wide significance level (P=1.3 × 10−8) in the combined sample. This association links language-related common genetic variation in the general population to a potential autism susceptibility locus and a linkage region for dyslexia, speech-sound disorder and reading. The contribution of common genetic influences is, although modest, supported by genome-wide complex trait analysis (meta-GCTA h² at 15–18 months = 0.13, meta-GCTA h² at 24–30 months = 0.14) and in concordance with additional twin analysis (5,733 pairs of European descent, h² at 24 months = 0.20). The genetic basis of expressive vocabulary in children around 2 years old is poorly understood. Here, the authors show that a genetic variant near the ROBO2 gene is associated with early language acquisition in the general population and highlight a potential genetic link between language-related common genetic variation and a linkage region for dyslexia, speech-sound disorder and reading.
HOXB13 is a susceptibility gene for prostate cancer: results from the International Consortium for Prostate Cancer Genetics (ICPCG)
Prostate cancer has a strong familial component but uncovering the molecular basis for inherited susceptibility for this disease has been challenging. Recently, a rare, recurrent mutation (G84E) in HOXB13 was reported to be associated with prostate cancer risk. Confirmation and characterization of this finding is necessary to potentially translate this information to the clinic. To examine this finding in a large international sample of prostate cancer families, we genotyped this mutation and 14 other SNPs in or flanking HOXB13 in 2,443 prostate cancer families recruited by the International Consortium for Prostate Cancer Genetics (ICPCG). At least one mutation carrier was found in 112 prostate cancer families (4.6 %), all of European descent. Within carrier families, the G84E mutation was more common in men with a diagnosis of prostate cancer (194 of 382, 51 %) than those without (42 of 137, 30 %), P = 9.9 × 10−8 [odds ratio 4.42 (95 % confidence interval 2.56–7.64)]. A family-based association test found G84E to be significantly over-transmitted from parents to affected offspring (P = 6.5 × 10−6). Analysis of markers flanking the G84E mutation indicates that it resides in the same haplotype in 95 % of carriers, consistent with a founder effect. Clinical characteristics of cancers in mutation carriers included features of high-risk disease. These findings demonstrate that the HOXB13 G84E mutation is present in ~5 % of prostate cancer families, predominantly of European descent, and confirm its association with prostate cancer risk. While future studies are needed to more fully define the clinical utility of this observation, this allele and others like it could form the basis for early, targeted screening of men at elevated risk for this common, clinically heterogeneous cancer. The online version of this article (doi:10.1007/s00439-012-1229-4) contains supplementary material, which is available to authorized users.
A GWAS Study on Liver Function Test Using eMERGE Network Participants
Liver enzyme levels and total serum bilirubin are under genetic control, and in recent years genome-wide population-based association studies have identified different susceptibility loci for these traits. We conducted a genome-wide association study in European ancestry participants from the Electronic Medical Records and Genomics (eMERGE) Network dataset of patient medical records with available genotyping data in order to identify genetic contributors to variability in serum bilirubin levels and other liver function tests and to compare the effects between adult and pediatric populations. The process of whole genome imputation of eMERGE samples with standard quality control measures has been described previously. After removing missing data and outliers based on principal components (PC) analyses, 3294 samples of European ancestry were used for the GWAS. The association between each single nucleotide polymorphism (SNP) and total serum bilirubin and other liver function tests was tested using linear regression, adjusting for age, gender, site, platform and ancestry principal components (PC). Consistent with previous results, a strong association signal was detected for the UGT1A gene cluster (best SNP rs887829, beta = 0.15, p = 1.30 × 10−118) for total serum bilirubin level. Indeed, in this region more than 176 SNPs (or indels) had p < 10−8, spanning 150 kb on the long arm of chromosome 2 (2q37.1). In addition, we found an effect of similar magnitude in a pediatric group (p = 8.26 × 10−47, beta = 0.17). Further imputation using sequencing data as a reference panel revealed association of other markers including known TA7 repeat indels (rs8175347) (p = 9.78 × 10−117) and rs111741722 (p = 5.41 × 10−119), which were in proxy (r² = 0.99) with rs887829. Among rare variants, two Asian subjects homozygous for coding SNP rs4148323 (G71R) were identified.
Additional known effects for total serum bilirubin were also confirmed including organic anion transporters SLCO1B1-SLCO1B3, TDRP and ZMYND8 at FDR<0.05 with no gene-gene interaction effects. Phenome-wide association studies (PheWAS) suggest a protective effect of TA7 repeat against cerebrovascular disease in an adult cohort (OR = 0.75, p = 0.0008). Among other liver function tests, we also confirmed the previous effect of the ABO blood group locus for variation in serum alkaline phosphatase (rs579459, p = 9.44x10-15). Taken together, our data present interesting findings with strong confirmation of previous effects by simply using the eMERGE electronic health record phenotyping. In addition, our findings indicate that similar to the adult population, the UGT1A1 is the main locus responsible for normal variation of serum bilirubin in pediatric populations.
Evaluation of Genetic Association of the INK4 Locus with Primary Open Angle Glaucoma in East Indian Population
The INK4 locus at chromosome 9p21 has been reported to be associated with primary open angle glaucoma (POAG) and its subtypes, along with the associated optic disc parameters, across populations of European, Japanese and African ancestries. The locus encodes three tumor suppressor genes, namely CDKN2A, ARF and CDKN2B, and a long non-coding RNA, CDKN2B-AS1 (also known as ANRIL). Here, we report an association study of 34 SNPs from the INK4 locus with POAG in a population of Indo-European ancestry from the eastern part of India (350 patients and 354 controls). With 81% power to detect genetic association, we observed only nominal association of rs1011970 (uncorrected p = 0.048) with POAG and of rs10120688 (uncorrected p = 0.048) in patients without a high intra-ocular pressure (IOP<21 mm of Hg) compared to controls. This study, in contrast to the previous reports, suggests a lack of significant genetic association of the INK4 locus with POAG in the East Indian population, a finding that needs to be replicated in larger studies in diverse world populations.
Genomic diversity and differentiation of a managed island wild boar population
The evolution of island populations in natural systems is driven by local adaptation and genetic drift. However, evolutionary pathways may be altered by humans in several ways. The wild boar (WB) (Sus scrofa) is an iconic game species occurring in several islands, where it has been strongly managed since prehistoric times. We examined genomic diversity at 49 803 single-nucleotide polymorphisms in 99 Sardinian WBs and compared them with 196 wild specimens from mainland Europe and 105 domestic pigs (DP; 11 breeds). High levels of genetic variation were observed in Sardinia (80.9% of the total number of polymorphisms), which can be only in part attributed to recent genetic introgression. Both Principal Component Analysis and a Bayesian clustering approach revealed that the Sardinian WB population is highly differentiated from the other European populations (FST=0.126–0.138), and from DP (FST=0.169). These findings were mostly unaffected by an uneven sample size, although clustering results in reference populations changed when the number of individuals was standardized. The pattern and distribution of runs of homozygosity (ROHs) in Sardinian WB are consistent with a past expansion following a bottleneck (small ROHs) and recent population substructuring (highly homozygous individuals). The observed effect of a non-random selection of Sardinian individuals on diversity, FST and ROH estimates stresses the importance of sampling design in the study of structured or introgressed populations. Our results support the heterogeneity and distinctiveness of the Sardinian population and prompt further investigations on its origins and conservation status.
Drug Metabolizing Enzyme and Transporter Gene Variation, Nicotine Metabolism, Prospective Abstinence, and Cigarette Consumption
The Nicotine Metabolite Ratio (NMR, ratio of trans-3’-hydroxycotinine and cotinine) has previously been associated with CYP2A6 activity, response to smoking cessation treatments, and cigarette consumption. We searched for drug metabolizing enzyme and transporter (DMET) gene variation associated with the NMR and prospective abstinence in 2,946 participants of laboratory studies of nicotine metabolism and of clinical trials of smoking cessation therapies. Stage I was a meta-analysis of the association of 507 common single nucleotide polymorphisms (SNPs) at 173 DMET genes with the NMR in 449 participants of two laboratory studies. Nominally significant associations were identified in ten genes after adjustment for intragenic SNPs; CYP2A6 and two CYP2A6 SNPs attained experiment-wide significance adjusted for correlated SNPs (CYP2A6 P ACT=4.1E-7, rs4803381 P ACT=4.5E-5, rs1137115 P ACT=1.2E-3). Stage II was mega-regression analyses of 10 DMET SNPs with pretreatment NMR and prospective abstinence in up to 2,497 participants from eight trials. rs4803381 and rs1137115 SNPs were associated with pretreatment NMR at genome-wide significance. In post-hoc analyses of CYP2A6 SNPs, we observed nominally significant association with: abstinence in one pharmacotherapy arm; cigarette consumption among all trial participants; and lung cancer in four case:control studies. CYP2A6 minor alleles were associated with reduced NMR, cigarettes per day (CPD), and lung cancer risk. We confirmed the major role that CYP2A6 plays in nicotine metabolism, and made novel findings with respect to genome-wide significance and associations with CPD, abstinence and lung cancer risk. Additional multivariate analyses with patient variables and genetic modeling will improve prediction of nicotine metabolism, disease risk and smoking cessation treatment prognosis.
Association between Genetic Variants in DNA Double Strand Break Repair Pathways and Risk of Radiation Therapy Induced Pneumonitis and Esophagitis in Non Small Cell Lung Cancer
Radiation therapy (RT)-induced pneumonitis and esophagitis are commonly developed side effects in non-small cell lung cancer (NSCLC) patients treated with definitive RT. Identifying patients who are at increased risk for these toxicities would help to maximize treatment efficacy while minimizing toxicities. Here, we systematically investigated single nucleotide polymorphisms (SNPs) within double-strand break (DSB) repair pathway as potential predictive markers for radiation-induced esophagitis and pneumonitis. We genotyped 440 SNPs from 45 genes in DSB repair pathways in 250 stage I–III NSCLC patients who received definitive radiation or chemoradiation therapy, followed by internal validation in 170 additional patients. We found that 11 SNPs for esophagitis and 8 SNPs for pneumonitis showed consistent effects between discovery and validation populations (same direction of OR and reached significance in meta-analysis). Among them, rs7165790 in the BLM gene was significantly associated with decreased risk of esophagitis in both discovery (OR = 0.59, 95% CI: 0.37–0.97, p = 0.037) and validation subgroups (OR = 0.45, 95% CI: 0.22–0.94, p = 0.032). A strong cumulative effect was observed for the top SNPs, and gene-based tests revealed 12 genes significantly associated with esophagitis or pneumonitis. Our results support the notion that genetic variations within DSB repair pathway could influence the risk of developing toxicities following definitive RT in NSCLC.
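As an illustration of how the discovery and validation estimates for rs7165790 could be pooled, the following is a minimal sketch of a fixed-effect, inverse-variance meta-analysis. The abstract does not state which meta-analysis model was used, so the fixed-effect choice, the recovery of standard errors from the reported 95% confidence intervals, and all function names here are assumptions made for illustration only.

```python
import math

def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Return log(OR) and its standard error, recovered from a 95% CI.

    Assumes the CI is symmetric on the log scale (Wald-type interval).
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se

def fixed_effect_meta(estimates, z=1.96):
    """Inverse-variance pooling of (log_or, se) pairs.

    Returns the pooled OR with its lower and upper 95% confidence limits.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return (
        math.exp(pooled),
        math.exp(pooled - z * pooled_se),
        math.exp(pooled + z * pooled_se),
    )

# Odds ratios and confidence intervals as reported in the abstract.
discovery = log_or_and_se(0.59, 0.37, 0.97)
validation = log_or_and_se(0.45, 0.22, 0.94)

pooled_or, pooled_low, pooled_high = fixed_effect_meta([discovery, validation])
print(f"pooled OR = {pooled_or:.2f} (95% CI {pooled_low:.2f}-{pooled_high:.2f})")
```

Under these assumptions the pooled odds ratio stays below 1 with a confidence interval that excludes 1, in line with the protective direction reported in both subgroups; the exact figures in the paper may differ if another model was used.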
Trans ethnic study design approaches for fine mapping
Studies that traverse ancestrally diverse populations may increase power to detect novel loci and improve fine-mapping resolution of causal variants by leveraging linkage disequilibrium differences between ethnic groups. The inclusion of African ancestry samples may yield further improvements because of low linkage disequilibrium and high genetic heterogeneity. We investigate the fine-mapping resolution of trans-ethnic fixed-effects meta-analysis for five type II diabetes loci, under various settings of ancestral composition (European, East Asian, African), allelic heterogeneity, and causal variant minor allele frequency. In particular, three settings of ancestral composition were compared: (1) single ancestry (European), (2) moderate ancestral diversity (European and East Asian), and (3) high ancestral diversity (European, East Asian, and African). Our simulations suggest that the European/Asian and European ancestry-only meta-analyses consistently attain similar fine-mapping resolution. The inclusion of African ancestry samples in the meta-analysis leads to a marked improvement in fine-mapping resolution.
Meta analysis of genome wide association studies identifies common susceptibility polymorphisms for colorectal and endometrial cancer near SH2B3 and TSHZ1
High-risk mutations in several genes predispose to both colorectal cancer (CRC) and endometrial cancer (EC). We therefore hypothesised that some lower-risk genetic variants might also predispose to both CRC and EC. Using CRC and EC genome-wide association series, totalling 13,265 cancer cases and 40,245 controls, we found that the protective allele [G] at one previously-identified CRC polymorphism, rs2736100 near TERT, was associated with EC risk (odds ratio (OR) = 1.08, P = 0.000167); this polymorphism influences the risk of several other cancers. A further CRC polymorphism near TERC also showed evidence of association with EC (OR = 0.92; P = 0.03). Overall, however, there was no good evidence that the set of CRC polymorphisms was associated with EC risk, and neither of two previously-reported EC polymorphisms was associated with CRC risk. A combined analysis revealed one genome-wide significant polymorphism, rs3184504, on chromosome 12q24 (OR = 1.10, P = 7.23 × 10−9) with shared effects on CRC and EC risk. This polymorphism, a missense variant in the gene SH2B3, is also associated with haematological and autoimmune disorders, suggesting that it influences cancer risk through the immune response. Another polymorphism, rs12970291 near gene TSHZ1, was associated with both CRC and EC (OR = 1.26, P = 4.82 × 10−8), with the alleles showing opposite effects on the risks of the two cancers.
A Novel, Functional and Replicable Risk Gene Region for Alcohol Dependence Identified by Genome Wide Association Study
Several genome-wide association studies (GWASs) reported tens of risk genes for alcohol dependence, but most of them have not been replicated or confirmed by functional studies. The present study used a GWAS to search for novel, functional and replicable risk gene regions for alcohol dependence. Associations of all top-ranked SNPs identified in a discovery sample of 681 African-American (AA) cases with alcohol dependence and 508 AA controls were retested in a primary replication sample of 1,409 European-American (EA) cases and 1,518 EA controls. The replicable associations were then subjected to secondary replication in a sample of 6,438 Australian family subjects. A functional expression quantitative trait locus (eQTL) analysis of these replicable risk SNPs was followed-up in order to explore their cis-acting regulatory effects on gene expression. We found that within a 90 Mb region around PHF3-PTP4A1 locus in AAs, a linkage disequilibrium (LD) block in PHF3-PTP4A1 formed the only peak associated with alcohol dependence at p<10−4. Within this block, 30 SNPs associated with alcohol dependence in AAs (1.6×10−5≤p≤0.050) were replicated in EAs (1.3×10−3≤p≤0.038), and 18 of them were also replicated in Australians (1.8×10−3≤p≤0.048). Most of these risk SNPs had strong cis-acting regulatory effects on PHF3-PTP4A1 mRNA expression across three HapMap samples. The distributions of −log(p) values for association and functional signals throughout this LD block were highly consistent across AAs, EAs, Australians and three HapMap samples. We conclude that the PHF3-PTP4A1 region appears to harbor a causal locus for alcohol dependence, and proteins encoded by PHF3 and/or PTP4A1 might play a functional role in the disorder.
Joint Testing of Genotypic and Gene Environment Interaction Identified Novel Association for BMP4 with Non Syndromic CL/P in an Asian Population Using Data from an International Cleft Consortium
Non-syndromic cleft lip with or without cleft palate (NSCL/P) is a common disorder with complex etiology. The Bone Morphogenetic Protein 4 gene (BMP4) has been considered a prime candidate gene with evidence accumulated from animal experimental studies, human linkage studies, as well as candidate gene association studies. The aim of the current study is to test for linkage and association between BMP4 and NSCL/P that could be missed in genome-wide association studies (GWAS) when genotypic (G) main effects alone were considered. We performed the analysis considering G and interactions with multiple maternal environmental exposures using additive conditional logistic regression models in 895 Asian and 681 European complete NSCL/P trios. Single nucleotide polymorphisms (SNPs) that passed the quality control criteria among 122 genotyped and 25 imputed single nucleotide variants in and around the gene were used in analysis. Selected maternal environmental exposures during 3 months prior to and through the first trimester of pregnancy included any personal tobacco smoking, any environmental tobacco smoke in home, work place or any nearby places, any alcohol consumption and any use of multivitamin supplements. A novel significant association held for rs7156227 among Asian NSCL/P and non-syndromic cleft lip and palate (NSCLP) trios after Bonferroni correction which was not seen when G main effects alone were considered in either allelic or genotypic transmission disequilibrium tests. Odds ratios for carrying one copy of the minor allele without maternal exposure to any of the four environmental exposures were 0.58 (95%CI = 0.44, 0.75) and 0.54 (95%CI = 0.40, 0.73) for Asian NSCL/P and NSCLP trios, respectively. The Bonferroni P values corrected for the total number of 117 tested SNPs were 0.0051 (asymptotic P = 4.39 × 10−5) and 0.0065 (asymptotic P = 5.54 × 10−5), respectively.
In European trios, no significant association was seen for any SNPs after Bonferroni corrections for the total number of 120 tested SNPs. Our findings add evidence from GWAS to support the role of BMP4 in susceptibility to NSCL/P originally identified in linkage and candidate gene association studies.
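The Bonferroni-corrected P values quoted for the Asian trios follow directly from multiplying each asymptotic P value by the number of tested SNPs (117). A minimal sketch of that arithmetic, with a hypothetical helper name, is:

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value: p times the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)

N_TESTED_SNPS = 117  # number of SNPs tested in the Asian trios, per the abstract

# Asymptotic P values reported for rs7156227 in NSCL/P and NSCLP trios.
asymptotic_p = {"NSCL/P": 4.39e-5, "NSCLP": 5.54e-5}

corrected = {group: bonferroni(p, N_TESTED_SNPS) for group, p in asymptotic_p.items()}
for group, p in corrected.items():
    print(f"{group}: corrected P = {p:.4f}")
```

Both corrected values remain below 0.05, which is why the association is described as significant after Bonferroni correction.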
Functional characterization of a multi cancer risk locus on chr5p15.33 reveals regulation of TERT by ZNF148
Genome wide association studies (GWAS) have mapped multiple independent cancer susceptibility loci to chr5p15.33. Here, we show that fine-mapping of pancreatic and testicular cancer GWAS within one of these loci (Region 2 in CLPTM1L) focuses the signal to nine highly correlated SNPs. Of these, rs36115365-C associated with increased pancreatic and testicular but decreased lung cancer and melanoma risk, and exhibited preferred protein-binding and enhanced regulatory activity. Transcriptional gene silencing of this regulatory element repressed TERT expression in an allele-specific manner. Proteomic analysis identifies allele-preferred binding of Zinc finger protein 148 (ZNF148) to rs36115365-C, further supported by binding of purified recombinant ZNF148. Knockdown of ZNF148 results in reduced TERT expression, telomerase activity and telomere length. Our results indicate that the association with chr5p15.33-Region 2 may be explained by rs36115365, a variant influencing TERT expression via ZNF148 in a manner consistent with elevated TERT in carriers of the C allele. Genetic variants at multiple loci of chr5p15.33 have been associated with susceptibility to numerous cancers. Here the authors show that the association of one of these loci may be explained by a variant, rs36115365, influencing telomerase reverse transcriptase (TERT) expression via ZNF148.
RNA Seq and ChIP Seq reveals SQSTM1/p62 as a key mediator of JunB suppression of NF κB dependent inflammation
Mice with epidermal deletion of the JunB transcription factor displayed a psoriasis-like inflammation. The relevance of these findings to humans and the mechanisms mediating JunB function are not fully understood. Here, we demonstrate that impaired JunB function via gene silencing or overexpression of a dominant negative mutant increased human keratinocyte cell proliferation but decreased cell barrier function. RNA-seq revealed over 500 genes affected by JunB loss-of-function, including upregulation of an array of proinflammatory molecules relevant to psoriasis. Among these were TNFα, CCL2, CXCL10, IL6R and SQSTM1, an adaptor protein involved in NF-κB activation. ChIP-Seq and gene reporter analyses showed that JunB directly suppressed SQSTM1 through binding to a consensus AP-1 cis-element located around 2 kb upstream of the SQSTM1 transcription start site. Similar to JunB loss-of-function, SQSTM1-overexpression induced TNFα, CCL2 and CXCL10. Conversely, NF-κB-inhibition genetically with a mutant IκBα or pharmacologically with PDTC prevented cytokine, but not IL6R, induction by JunB-deficiency. Taken together, our findings indicate that JunB controls epidermal growth, barrier formation and proinflammatory responses through direct and indirect mechanisms, pinpointing SQSTM1 as a key mediator of JunB-suppression of NF-κB-dependent inflammation.
Evolution and Phylogeny of Large DNA Viruses, Mimiviridae and Phycodnaviridae Including Newly Characterized Heterosigma akashiwo Virus
Nucleocytoplasmic DNA viruses are a large group of viruses that harbor double-stranded DNA genomes with sizes of several 100 kbp, challenging the traditional concept of viruses as small, simple ‘organisms at the edge of life.’ The most intriguing questions about them may be their origin and evolution, which have yielded the variety we see today. Specifically, the phyletic relationship between two giant dsDNA virus families that are presumed to be close, Mimiviridae, which infect Acanthamoeba, and Phycodnaviridae, which infect algae, is still obscure and needs to be clarified by in-depth analysis. Here, we studied Mimiviridae–Phycodnaviridae phylogeny including the newly identified Heterosigma akashiwo virus strain HaV53. Gene-to-gene comparison of HaV53 with other giant dsDNA viruses showed that only a small proportion of HaV53 genes show similarities with the others, revealing its uniqueness among Phycodnaviridae. Phylogenetic/genomic analysis of Phycodnaviridae including HaV53 revealed that the family can be classified into four distinctive subfamilies, namely, Megaviridae (Mimivirus-like), Chlorovirus-type, and Coccolitho/Phaeovirus-type groups, and HaV53 independent of the other three groups. Several orthologs found in specific subfamilies while absent from the others were identified, providing potential family marker genes. Finally, reconstruction of the evolutionary history of Phycodnaviridae and Mimiviridae revealed that these viruses are descended from a common ancestor with a small set of genes and reached their current diversity by differentially acquiring gene sets during the course of evolution. Our study illustrates the phylogeny and evolution of Mimiviridae–Phycodnaviridae and proposes classifications that better represent phyletic relationships among the family members.
Identifying Source Populations and Genetic Structure for Savannah Elephants in Human Dominated Landscapes and Protected Areas in the Kenya Tanzania Borderlands
We investigated the genetic metapopulation structure of elephants across the trans Rift Valley region of Kenya and Tanzania, one of the remaining strongholds for savannah elephants (Loxodonta africana) in East Africa, using microsatellite and mitochondrial DNA (mtDNA) markers. We then examined this population structure to determine the source population for a recent colonization event of savannah elephants on community-owned land within the trans rift valley region. Four of the five sampled populations showed significant genetic differentiation (p<0.05) as measured with both mtDNA haplotypes and microsatellites. Only the samples from the adjacent Maasai Mara and Serengeti ecosystems showed no significant differentiation. A phylogenetic neighbour-joining tree constructed from mtDNA haplotypes detected four clades. Clade four corresponds to the F clade of previous mtDNA studies, which was reported to have originated in forest elephants (Loxodonta cyclotis) but to also be present in some savannah elephant populations. The split between clade four and the other three clades corresponded strongly to the geographic distribution of mtDNA haplotypes across the rift valley in the study area. Clade four was the dominant clade detected on the west side of the rift valley with rare occurrences on the east side. Finally, the strong patterns of population differentiation clearly indicated that the recent colonists to the community-owned land in Kenya came from the west side of the rift valley. Our results indicate strong female philopatry within the isolated populations of the trans rift valley region, with gene flow primarily mediated via male movements. The recent colonization event from Maasai Mara or Serengeti suggests there is hope for maintaining connectivity and population viability outside formal protected areas in the region.
Ancient Recombination Events between Human Herpes Simplex Viruses
Herpes simplex viruses 1 and 2 (HSV-1 and HSV-2) are seen as close relatives but also unambiguously considered as evolutionarily independent units. Here, we sequenced the genomes of 18 HSV-2 isolates characterized by divergent UL30 gene sequences to further elucidate the evolutionary history of this virus. Surprisingly, genome-wide recombination analyses showed that all HSV-2 genomes sequenced to date contain HSV-1 fragments. Using phylogenomic analyses, we could also show that two main HSV-2 lineages exist. One lineage is mostly restricted to sub-Saharan Africa whereas the other has reached a global distribution. Interestingly, only the worldwide lineage is characterized by ancient recombination events with HSV-1. Our findings highlight the complexity of HSV-2 evolution, a virus of putative zoonotic origin which later recombined with its human-adapted relative. They also suggest that coinfections with HSV-1 and 2 may have genomic and potentially functional consequences and should therefore be monitored more closely.
Pestivirus infection in cattle dairy farms: E2 glycoprotein ELISA reveals the presence of bovine viral diarrhea virus type 2 in northwestern Italy
Bovine viral diarrhea virus (BVDV) types 1 and 2 are members of the Pestivirus genus of the Flaviviridae family. This genus also includes the HoBi-like virus, tentatively classified as BVDV type 3. BVDV-1 is widely distributed in Italy despite the extensive use of BVDV-1-based vaccines, while BVDV-2 and HoBi-like Pestivirus have been detected occasionally. Monitoring the occurrence of sporadic or atypical pestiviruses is a useful approach to evaluate the need for additional vaccine strains that can be used in BVDV control programs. In this study we developed a multiwell antibody ELISA based on the recombinant E2 protein of the three bovine pestiviruses. We evaluated the assay’s applicability for surveillance purposes using pooled milk samples, each prepared from a maximum of 35 lactating cows and collected from 176 dairy herds. As expected, the majority of the pooled samples reacted to a greater extent against the BVDV-1 E2 antigen. All three milk pools from a single farm reacted to the BVDV-2 antigen, however. Further analysis using spot tests, antigen detection, and sequence analysis of the 5′-UTR region confirmed the presence of five persistently infected calves carrying a BVDV-2a strain. This study highlights for the first time that sporadic circulation of BVDV-2 can be predicted by immunoenzymatic methods in the absence of specific vaccination. The online version of this article (10.1186/s12917-017-1305-z) contains supplementary material, which is available to authorized users.
A new species of blunt headed vine snake (Colubridae, Imantodes) from the Chocó region of Ecuador
We describe a new species of Imantodes from the Chocó region of northwestern Ecuador. The new species differs most significantly from all other congeners in lacking a loreal scale. We analyze the phylogenetic relationships among species of Imantodes based on two mitochondrial genes, and postulate that the new species and Imantodes lentiferus are sister taxa. A key to the species of Imantodes from Ecuador is presented.