Identifies sex-specific marker sequences. SEXCMD allows users to (i) design sex-specific marker sequences, (ii) train on a known dataset, and (iii) select optimal sex marker sequences. The pipeline can be applied to genome and transcriptome sequencing data without aligning next-generation sequencing (NGS) reads to a reference genome; instead, it uses tens of sex-specific marker sequences from syntenic regions of the sex chromosomes.
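The alignment-free idea can be illustrated with a small sketch. It assumes exact k-mer matching between reads and X/Y marker sequences, followed by a simple threshold on the share of Y-matching reads. SEXCMD's actual read-matching step and decision rule may differ. The function names, the k-mer size, the `min_y_fraction` threshold and the toy sequences are all hypothetical.

```python
"""Hypothetical alignment-free marker counting; not SEXCMD's actual implementation."""
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Complement table for uppercase DNA; other characters (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(markers: dict[str, list[str]], k: int) -> dict[str, set[str]]:
    """Map each label (e.g. 'X', 'Y') to the k-mers of its markers on both strands."""
    index: dict[str, set[str]] = {}
    for label, seqs in markers.items():
        index[label] = set()
        for seq in seqs:
            index[label] |= kmers(seq, k) | kmers(revcomp(seq), k)
    return index


def count_hits(reads: Iterable[str], index: dict[str, set[str]], k: int) -> Counter:
    """Count reads sharing at least one k-mer with each label's markers (no alignment)."""
    hits: Counter = Counter()
    for read in reads:
        read_kmers = kmers(read.upper(), k)
        for label, marker_kmers in index.items():
            if read_kmers & marker_kmers:
                hits[label] += 1
    return hits


def call_sex(hits: Counter, min_y_fraction: float = 0.1) -> str:
    """Hypothetical rule: 'male' if Y-marker reads reach min_y_fraction of all marker reads."""
    total = hits["X"] + hits["Y"]
    if total == 0:
        return "undetermined"
    return "male" if hits["Y"] / total >= min_y_fraction else "female"


if __name__ == "__main__":
    # Toy 10-bp "markers" and reads, made up for illustration only.
    k = 5
    index = build_index({"X": ["ACGTACGTTG"], "Y": ["TTGCAGGCAT"]}, k)
    reads = ["GGACGTACGTTGAA", "CCTTGCAGGCATCC", "AAAAAAAAAAAA"]
    hits = count_hits(reads, index, k)
    print(dict(hits), call_sex(hits))
```

Both strands are indexed because sequencing reads can originate from either strand. Real markers drawn from syntenic X/Y regions are similar to each other, so in practice only k-mers that actually differ between the X and Y copies would be informative.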
Explores and analyzes genetic data. Hail can generate variant annotations such as call rate, Hardy-Weinberg equilibrium p-value, and population-specific allele count, as well as sample annotations such as mean depth, imputed sex, and transition/transversion (Ti/Tv) ratio. It can find Mendelian violations in trios, analyze genetic similarity between samples via the genetic relatedness matrix (GRM) and identity-by-descent (IBD) matrix, and compute sample scores and variant loadings using principal component analysis (PCA). The software performs association analyses with phenotypes and covariates using linear and logistic regression.
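To make a few of these annotations concrete, the sketch below computes call rate, Ti/Tv ratio and a trio Mendelian-consistency check in plain Python. This is not Hail's API; it only pins down what the numbers mean. It assumes biallelic sites with genotypes coded as alternate-allele counts (0/1/2, `None` for a missing call), and the Mendelian check covers autosomes only, since hemizygous X calls in males need different rules. All names are hypothetical.

```python
"""Plain-Python definitions of a few QC statistics (illustrative only, not Hail's API)."""
from __future__ import annotations

from typing import Optional, Sequence

Genotype = Optional[int]  # alternate-allele count 0/1/2 at a biallelic site; None = missing

# Transitions swap purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
_TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of non-missing calls: across samples for a variant, or across variants for a sample."""
    return sum(g is not None for g in calls) / len(calls) if calls else 0.0


def is_transition(ref: str, alt: str) -> bool:
    """True for an A<->G or C<->T single-nucleotide substitution."""
    return frozenset((ref.upper(), alt.upper())) in _TRANSITIONS


def ti_tv_ratio(snvs: Sequence[tuple[str, str]]) -> float:
    """Transitions / transversions over (ref, alt) SNV pairs; assumes ref != alt in every pair."""
    ti = sum(is_transition(ref, alt) for ref, alt in snvs)
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


def mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """Autosomal check: can the child's genotype arise from one allele from each parent?"""
    # Alt alleles each parent can transmit: 0 -> {0}, 1 -> {0, 1}, 2 -> {1}.
    transmissible = {0: {0}, 1: {0, 1}, 2: {1}}
    possible = {f + m for f in transmissible[father] for m in transmissible[mother]}
    return child in possible
```

For example, two homozygous-reference parents with a heterozygous child (`mendelian_consistent(0, 0, 1)`) is a violation. At scale, Hail's `hl.variant_qc`, `hl.sample_qc` and `hl.mendel_errors` compute annotations of this kind over a MatrixTable.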
Accepts called genotype data and jointly considers information on the X and Y chromosomes. seXY is a logistic regression model trained on both X chromosome heterozygosity and Y chromosome missingness; in cross-validation it consistently achieved >99.5% sex inference accuracy for 889 males and 5,361 females enrolled in prostate cancer and ovarian cancer genome-wide association studies (GWAS). Compared with PLINK, one of the most popular tools for sex inference in GWAS, which assesses only X chromosome heterozygosity, seXY achieved marginally better male classification and 3% more accurate female classification.
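The two-feature model can be sketched as a generic logistic regression fitted by batch gradient descent. Males are expected to show low X heterozygosity (one X copy) and low Y missingness (Y SNPs are called). Females are expected to show higher X heterozygosity and high Y missingness. seXY's actual feature definitions, training procedure and coefficients are not described here. The function names, learning rate, epoch count and the toy feature values below are hypothetical and were made up for illustration.

```python
"""Two-feature logistic regression for sex inference in the spirit of seXY (hypothetical sketch)."""
from __future__ import annotations

import math
from typing import Optional, Sequence

Genotype = Optional[int]  # alternate-allele count 0/1/2; None = missing call


def x_heterozygosity(x_calls: Sequence[Genotype]) -> float:
    """Fraction of called X SNPs that are heterozygous (pseudo-autosomal SNPs assumed excluded)."""
    called = [g for g in x_calls if g is not None]
    return sum(g == 1 for g in called) / len(called) if called else 0.0


def y_missingness(y_calls: Sequence[Genotype]) -> float:
    """Fraction of Y SNP calls that are missing."""
    return sum(g is None for g in y_calls) / len(y_calls) if y_calls else 1.0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 5000) -> tuple[list[float], float]:
    """Minimize mean log-loss by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_male_prob(w: Sequence[float], b: float, features: Sequence[float]) -> float:
    """P(male) for features = (x_heterozygosity, y_missingness)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


# Made-up toy training set: (X heterozygosity, Y missingness); label 1 = male, 0 = female.
TOY_FEATURES = [(0.01, 0.02), (0.02, 0.01), (0.03, 0.05),
                (0.30, 0.95), (0.25, 0.90), (0.35, 0.98)]
TOY_LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = fit_logistic(TOY_FEATURES, TOY_LABELS)
    print(w, b, predict_male_prob(w, b, (0.02, 0.03)), predict_male_prob(w, b, (0.28, 0.93)))
```

Using both features lets the model fall back on Y missingness when X heterozygosity alone is ambiguous. An X-heterozygosity-only check such as PLINK's cannot make use of that signal.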