CNV identification software tools | Genomic array data analysis
Copy number variants (CNVs) are a major source of variation among individuals and populations. Array-based comparative genomic hybridisation (aCGH) is a powerful method used to detect and compare the copy numbers of DNA sequences at high resolution along the genome.
A fully open-source set of tools to detect and report SNP genotypes, common Copy-Number Polymorphisms (CNPs), and novel, rare, or de novo CNVs in samples processed with the Affymetrix platform. While most of the components of the suite can be run individually (for instance, to only do SNP genotyping), the Birdsuite is especially intended for integrated analysis of SNPs and CNVs.
Detects copy number variations (CNVs) with high resolution. PennCNV is an integrated hidden Markov model (HMM) method that incorporates the population allele frequency for each single nucleotide polymorphism (SNP) and the distance between adjacent SNPs. This application was developed specifically for data generated on the Illumina Infinium platform, but it can be extended to other similar SNP genotyping platforms.
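As a rough illustration of how an HMM can use the distance between adjacent SNPs, the sketch below runs Viterbi decoding over three copy-number states, with a state-change probability that grows with the gap between markers. It is not PennCNV's actual model: the state means, noise level, transition formula, and function names are all assumptions chosen for illustration, and the population allele frequency term is left out.

```python
import math

STATES = ("loss", "normal", "gain")
# Assumed mean log R ratio per state and a shared noise level (illustrative values).
MEANS = {"loss": -0.6, "normal": 0.0, "gain": 0.4}
SIGMA = 0.15
PRIOR = {"loss": 0.01, "normal": 0.98, "gain": 0.01}


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def log_trans(prev, cur, dist, p_max=0.01, scale=100_000.0):
    """Log transition probability; state changes become likelier as dist (bp) grows."""
    change = max(p_max * (1.0 - math.exp(-dist / scale)), 1e-12)
    if prev == cur:
        return math.log(1.0 - change)
    return math.log(change / (len(STATES) - 1))


def viterbi(lrr, positions):
    """Return the most probable state path for intensities at sorted positions."""
    score = {s: math.log(PRIOR[s]) + log_gauss(lrr[0], MEANS[s], SIGMA) for s in STATES}
    back = []
    for i in range(1, len(lrr)):
        dist = positions[i] - positions[i - 1]
        new_score, pointers = {}, {}
        for cur in STATES:
            best = max(STATES, key=lambda p: score[p] + log_trans(p, cur, dist))
            new_score[cur] = (score[best] + log_trans(best, cur, dist)
                              + log_gauss(lrr[i], MEANS[cur], SIGMA))
            pointers[cur] = best
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    lrr = [0.0, 0.02, -0.6, -0.65, -0.55, 0.01, 0.0]
    pos = [1000, 6000, 11000, 16000, 21000, 26000, 31000]
    print(viterbi(lrr, pos))
```

Tying the change probability to marker spacing means that two distant SNPs are allowed to disagree more easily than two neighbouring ones, which is the intuition behind including inter-SNP distance in the model.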
Identifies copy number variations (CNVs) from raw copy number data. COKGEN is a configurable platform for CNV identification that allows users to: (1) adjust the parameters of its default formulation to tune the behavior of the method to the target application; and (2) specify their own target objective functions and tune parameters to emphasize the relative importance of different objective criteria. The software has been tested on Affymetrix 6.0 array data from 270 HapMap individuals.
A tool for accurate and reliable high-throughput detection of copy number variation in the human genome. The CNVfinder algorithm was trained using a series of replicate hybridizations of varying quality and using independently verified CNVs to maximize the number of calls while keeping false positives below 5%. Importantly, CNVfinder made more consistent calls across arrays with different ratio variance than SWarray.
A package for array-based copy number variation (CNV) analysis designed to control the false discovery rate (FDR) while ensuring high sensitivity. To control the FDR, the authors propose a probabilistic latent variable model, cn.FARMS, which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples, from which the posterior can deviate only when the data contain strong and consistent signals. In experiments, cn.FARMS outperformed its competitors with respect to both FDR and sensitivity, i.e. it produced fewer false positives while detecting more true CNVs. The reduced FDR increases the discovery power of studies and keeps researchers from being misled by spurious correlations between CNVs and diseases.
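The idea of scoring a region by the information gain of the posterior over the prior can be illustrated with a Kullback-Leibler divergence between two discrete distributions over copy-number states. This is only a generic sketch: the three-state layout, the probability values, and the function name are assumptions, and cn.FARMS itself works with a latent variable model rather than this toy table.

```python
import math


def information_gain(posterior, prior):
    """KL divergence D(posterior || prior) in nats for discrete distributions."""
    return sum(q * math.log(q / p) for q, p in zip(posterior, prior) if q > 0)


if __name__ == "__main__":
    # Assumed prior over copy numbers (1, 2, 3), concentrated on the null of copy number 2.
    prior = [0.01, 0.98, 0.01]
    weak = [0.02, 0.96, 0.02]    # posterior barely moved by noisy data
    strong = [0.90, 0.09, 0.01]  # posterior moved by a consistent deletion signal
    print(information_gain(weak, prior))
    print(information_gain(strong, prior))
```

A posterior that stays close to the null prior yields a gain near zero, so only regions with strong, consistent evidence stand out when the gain is thresholded.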
A method that can infer genotypes in case-control data sets for deletion CNVs, or SNPs with an extra, untyped allele at a high-resolution single SNP level. By accounting for linkage disequilibrium (LD), as well as intensity data, calling accuracy is improved. TriTyper uses raw intensity data from the Illumina genotyping platform to identify SNPs with an extra untyped, but common allele.
A comprehensive analysis platform for the processing, analysis and visualization of structural variation based on sequencing data or genomic microarrays, enabling the rapid identification of disease loci or genes. Vivar lets users scale their analysis with their workload over multiple (cloud) servers, has user access control to keep data safe but still easy to share, and is easily expandable as analysis techniques advance.
A package for the automatic detection of breakpoints from array CGH profiles, and the assignment of a status to each chromosomal region. The breakpoint detection step of GLAD is based on the Adaptive Weights Smoothing (AWS) procedure: the algorithm detects 97, 100 and 94% of breakpoints in simulated data, karyotyping results and manually analyzed profiles, respectively. The percentage of correctly assigned statuses ranges from 98.9 to 99.8% for simulated data and is 100% for karyotyping results.
A fast and accurate algorithm for assigning single nucleotide polymorphism (SNP) genotypes to microarray data from the Illumina BeadArray technology. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across multiple individuals to improve the calling. The method can accommodate variations in hybridization intensities which result in dramatic shifts of the position of the genotype clouds by identifying the optimal coordinates to initialize the algorithm. By incorporating the process of perturbation analysis, we can obtain a quality metric measuring the stability of the assigned genotype calls.
Implements a multilevel model adjusting for batch effects and providing allele-specific estimates of copy number. The CRLMM algorithm estimates genotypes through a hierarchical model for the log ratios of A:B intensities that accounts for the dependency on intensity strength, batch effects, and the uncertainty of parameters estimated from the training step. For each platform design supported by CRLMM, we provide one annotation package that contains parameters estimated from the training data for every SNP-genotype combination.
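The two quantities the model works with, the log ratio of A:B intensities and the overall intensity strength, can be computed per SNP as in the minimal sketch below. The names `m` and `s`, the base-2 logarithm, and the definition of strength as the average log intensity are assumptions made for illustration; the hierarchical model, batch adjustment, and trained parameters are not reproduced here.

```python
import math


def log_ratio_and_strength(a, b):
    """Return (m, s): log2(A/B) and the mean of log2(A) and log2(B).

    a and b are positive allele A and allele B intensities for one SNP.
    """
    log_a, log_b = math.log2(a), math.log2(b)
    m = log_a - log_b        # contrast between the two alleles
    s = (log_a + log_b) / 2  # overall signal strength
    return m, s


if __name__ == "__main__":
    # A strongly A-biased signal gives a positive log ratio.
    print(log_ratio_and_strength(1024, 256))
    # Balanced intensities give a log ratio of zero.
    print(log_ratio_and_strength(500, 500))
```

Keeping the strength alongside the log ratio matters because the spread of the log ratio typically depends on how bright the signal is, which is the dependency the description refers to.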
A web-based tool that applies a number of popular algorithms to a single array CGH profile entered by the user. CGHweb generates a heatmap panel of the segmented profiles for each method as well as a consensus profile. The clickable heatmap can be moved along the chromosome and zoomed in or out. It also displays the time that each algorithm took and provides numerical values of the segmented profiles for download. The web interface calls algorithms written in the statistical language R.
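One simple way to combine several segmented profiles into a consensus is to take the per-probe median across methods, as sketched below. The source does not say how CGHweb builds its consensus profile, so the median rule, the method labels, and the function name are assumptions for illustration only.

```python
from statistics import median


def consensus_profile(segmented):
    """Per-probe median across methods.

    segmented maps a method label to its list of segmented log2 ratios,
    one value per probe, all in the same probe order.
    """
    profiles = list(segmented.values())
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("all profiles must cover the same probes")
    return [median(values) for values in zip(*profiles)]


if __name__ == "__main__":
    segmented = {
        "method_a": [0.0, 0.0, 0.5, 0.5],
        "method_b": [0.0, 0.4, 0.6, 0.6],
        "method_c": [0.1, 0.0, 0.5, 0.0],
    }
    print(consensus_profile(segmented))
```

A median is less sensitive than a mean to one method placing a breakpoint a probe too early or too late, which is why it is used in this sketch.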
A genotype-calling algorithm for Illumina arrays that uses both SNP-wise and sample-wise calling to more accurately ascertain genotypes at rare, low-frequency and common variants, even when genotype intensity clouds are shifted from their expected positions. optiCall works by first taking a random subset of intensity measures, both within and across samples. The subset is used to define regions of high probability for the three genotype classes. Genotypes are then called on a per-SNP basis, with all samples overlaid onto the probability regions, which are incorporated as a data-derived prior during clustering. In this way common variants are seen as three clouds in a per-SNP view, and rare variants are called based on the intensity region in which they fall.
Fits a non-homogeneous hidden Markov model to the aCGH data using Markov chain Monte Carlo with Reversible Jump, and returns the probability that each probe is gained or lost. Using these probabilities, recurrent regions (over sets of individuals) of copy number alteration can be found.
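To show how per-probe probabilities might be turned into recurrent regions, the sketch below averages the probability of alteration across individuals and reports runs of consecutive probes above a threshold. This is a simplified stand-in, not the package's own procedure: the averaging rule, the threshold, the minimum run length, and the function name are all assumptions.

```python
def recurrent_regions(prob_by_sample, threshold=0.8, min_probes=2):
    """Return (first_probe, last_probe) index pairs for recurrent alterations.

    prob_by_sample holds one list per individual with the probability that
    each probe is altered, all lists in the same probe order.
    threshold must be greater than zero.
    """
    n_samples = len(prob_by_sample)
    n_probes = len(prob_by_sample[0])
    mean_prob = [sum(s[i] for s in prob_by_sample) / n_samples for i in range(n_probes)]

    regions, start = [], None
    # The trailing 0.0 is a sentinel that closes a run reaching the last probe.
    for i, p in enumerate(mean_prob + [0.0]):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            if i - start >= min_probes:
                regions.append((start, i - 1))
            start = None
    return regions


if __name__ == "__main__":
    probs = [
        [0.1, 0.9, 0.95, 0.9, 0.2, 0.9],
        [0.2, 0.85, 0.9, 0.8, 0.1, 0.1],
    ]
    print(recurrent_regions(probs))
```

In the usage example the last probe is altered with high probability in only one individual, so its mean falls below the threshold and it is not reported as recurrent.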