Computational protocol: Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics

Protocol publication

[…] We have employed seven benchmarking datasets (refer to Table ) to assess the performance of eight amino acid mutation impact prediction methods: GERP++, fitCons, SIFT, PolyPhen, CADD, Condel, REVEL and fathmm. These datasets contain variants classified as deleterious (likely significant effect on protein function) or benign (unlikely significant effect on protein function). We have used a variety of datasets that can be broadly categorised into two classes: (1) variants sourced from disease variation catalogues and (2) variants sourced from molecular functional analysis experiments. Figure  depicts the overlaps between parent datasets used in this study. […] For consistency, Ensembl Variant Effect Predictor (VEP) [] was employed to convert all variant datasets into variant call format (VCF), using their HGVS amino acid mutation notations as inputs. The resultant VCF files were then annotated using VEP and SnpEff []. Condel and fathmm scores were annotated using the VEP custom annotation tools based on precalculated scores available from FannsDB (http://bg.upf.edu/fannsdb). Similarly, GERP++ and fitCons conservation scores were annotated using custom BED files. CADD scores were annotated using CADD v1.2 (http://cadd.gs.washington.edu/download). […]
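Once every variant carries a predictor score and a deleterious/benign label, each tool can be scored against each dataset. The excerpt above does not state which performance metric was used, so the following is only a minimal sketch that assumes area under the ROC curve (AUC) as the measure. The function name `roc_auc`, the `higher_is_deleterious` flag and the toy values are illustrative assumptions, not part of the published pipeline. The flag matters because score direction differs between tools: SIFT, for example, assigns lower scores to more damaging substitutions.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int],
            higher_is_deleterious: bool = True) -> float:
    """AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = deleterious, 0 = benign (hypothetical encoding).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Flip the sign for tools where a lower score means more damaging.
    oriented = [s if higher_is_deleterious else -s for s in scores]
    order = sorted(range(len(oriented)), key=lambda i: oriented[i])

    # Assign 1-based ranks, averaging the ranks of tied scores.
    ranks = [0.0] * len(oriented)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and oriented[order[j + 1]] == oriented[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one deleterious and one benign variant")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy values invented for illustration; they are not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
toy_higher = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]        # higher = more damaging
toy_lower = [0.01, 0.03, 0.20, 0.04, 0.60, 0.90]   # lower = more damaging (SIFT-like)

auc_higher = roc_auc(toy_higher, labels)
auc_lower = roc_auc(toy_lower, labels, higher_is_deleterious=False)
print(auc_higher, auc_lower)
```

Orienting all scores in the same direction before ranking keeps the AUC values of different tools comparable. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of deleterious from benign variants.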

Pipeline specifications

Software tools: GERP, SIFT, PolyPhen, CADD, Condel, REVEL, FATHMM, VEP, SnpEff
Application: WGS analysis
Organisms: Homo sapiens
Chemicals: Amino Acids