Microsatellite identification software tools | Whole-genome sequencing data analysis
Short tandem repeats (STRs), or microsatellites, are a class of repeat sequences whose repeat units of up to 6 bp lie directly adjacent to each other. STRs are generally more polymorphic than other kinds of variation, such as sequence copy number variation and single-nucleotide polymorphisms. The length variability of STRs is associated with phenotypic variation in many species and with a number of human disorders, which are commonly caused by repeat expansion. Analysing the variation of STRs, particularly long STRs, is an important step towards understanding their variability across individuals and the mechanisms that lead to their instability.
Screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program.
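The masking step described above can be illustrated with a minimal Python sketch. It is not the program's own code: the function names and the 0-based, half-open interval convention are assumptions made for this example, and the repeat annotations are simply passed in rather than detected.

```python
def mask_repeats(sequence, repeat_intervals, mask_char="N"):
    """Replace every annotated repeat interval with mask_char.

    repeat_intervals holds 0-based, half-open (start, end) pairs;
    this coordinate convention is an assumption of the sketch.
    """
    masked = list(sequence)
    for start, end in repeat_intervals:
        # Clamp to the sequence bounds so stray annotations cannot fail.
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


def masked_fraction(masked_sequence, mask_char="N"):
    """Fraction of positions that carry the mask character."""
    if not masked_sequence:
        return 0.0
    return masked_sequence.count(mask_char) / len(masked_sequence)


masked = mask_repeats("ACGTACGTAC", [(2, 5)])
print(masked, masked_fraction(masked))
```

Replacing repeats with Ns keeps sequence coordinates unchanged, so downstream tools can still report positions relative to the original query.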
Detects perfect microsatellites and compound microsatellites in nucleotide sequences. MISA can predict perfect compound microsatellites that contain multiple occurrences of more than one simple sequence motif. The software is based on two Perl scripts that serve as interface modules for the program-to-program data interchange needed to design primers flanking the microsatellite loci. It can also retrieve sequences from the NCBI database when the corresponding accession numbers are given as input.
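One way to picture compound microsatellite detection is as a grouping step over already-detected perfect repeats. The Python sketch below assumes that repeats lying within a maximum gap of each other belong to the same locus. The function names, the `max_gap` value, and the interval convention are hypothetical choices for this illustration, not MISA's actual implementation or settings.

```python
def group_neighbouring_ssrs(ssrs, max_gap=100):
    """Group perfect SSRs separated by at most max_gap bases.

    ssrs: list of (start, end, motif) tuples, 0-based and half-open.
    max_gap is an illustrative threshold, not a documented default.
    """
    groups = []
    for ssr in sorted(ssrs):
        # Gap between this SSR's start and the previous SSR's end.
        if groups and ssr[0] - groups[-1][-1][1] <= max_gap:
            groups[-1].append(ssr)
        else:
            groups.append([ssr])
    return groups


def describe(group):
    """Return (kind, start, end, structure) for one group of SSRs.

    A group is called compound when it holds more than one distinct
    motif; every other group is labelled simple in this sketch.
    """
    motifs = {motif for _, _, motif in group}
    kind = "compound" if len(motifs) > 1 else "simple"
    structure = "".join(
        "(%s)%d" % (motif, (end - start) // len(motif))
        for start, end, motif in group
    )
    return kind, group[0][0], group[-1][1], structure


ssrs = [(0, 10, "AC"), (12, 24, "GAT"), (500, 512, "CT")]
for group in group_neighbouring_ssrs(ssrs):
    print(describe(group))
```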
Detects microsatellite arrays, designs primers, and tags primers using an automated routine. msatcommander locates microsatellite arrays within user-selected repeat classes by matching the corresponding regular expression pattern within each DNA sequence. It uses the alphabetical, non-complementary designation of each repeat motif when searching for repeat sequences. The tool considers a primer pair only when both primers lie at least 10 bp from the start and stop positions of the detected array.
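The 10-bp distance rule for primer pairs can be written as a small check. This Python sketch only illustrates the rule as stated above. The function name, argument names, and coordinate convention (0-based, half-open) are assumptions and do not come from msatcommander itself.

```python
def primer_pair_is_acceptable(array_start, array_end,
                              left_primer_end, right_primer_start,
                              min_distance=10):
    """Check that both primers keep their distance from the array.

    The array spans [array_start, array_end). The left primer ends at
    left_primer_end (exclusive) and the right primer begins at
    right_primer_start. All names here are hypothetical.
    """
    left_gap = array_start - left_primer_end
    right_gap = right_primer_start - array_end
    return left_gap >= min_distance and right_gap >= min_distance


# Array at 100..130, primers 15 bp away on each side.
print(primer_pair_is_acceptable(100, 130, 85, 145))
# Left primer ends only 5 bp before the array.
print(primer_pair_is_acceptable(100, 130, 95, 145))
```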
Finds all perfect simple sequence repeats (SSRs) in a given sequence. SSRIT is available as a web app and as a standalone version. The search routine can identify SSRs in different types of genomic DNA sequences, ranging in size from several hundred nucleotides (BAC-end reads) to long contigs of up to 1 Mb assembled from fully sequenced bacterial artificial chromosome (BAC) and P1-derived artificial chromosome (PAC) clones. It requires the input sequence in FASTA format.
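A search for perfect SSRs in FASTA input can be sketched with a back-referencing regular expression. The Python example below is a simplified illustration, not SSRIT's code. The function names and the default motif sizes and repeat threshold are assumptions chosen for the example.

```python
import re


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_perfect_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=3):
    """Return sorted (start, end, motif, repeat_count) tuples.

    Coordinates are 0-based and half-open. The default thresholds
    are illustrative assumptions only.
    """
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # One motif of the given size, followed by further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):  # skip e.g. ACAC reported as a 4-mer
                count = (match.end() - match.start()) // unit
                hits.append((match.start(), match.end(), motif, count))
    return sorted(hits)


for name, seq in parse_fasta(">s1\nTTACAC\nACACGG\n"):
    print(name, find_perfect_ssrs(seq))
```

Filtering out non-primitive motifs avoids reporting the same repeat several times under longer unit sizes.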
Automates the genotyping of microsatellite repeats in Huntington disease (HD) data. ScaleHD is a pipeline designed for large-scale automated genotyping of HTT CAG/CCG repeats from parallel sequencing data. It performs quality control, sequence alignment and genotyping on all file pairs presented by the user as input. The pipeline consists of three main stages: sequence quality control (SeqQC), sequence alignment (SeqALN) and automated genotyping (GType).
Permits users to automatically discover structural variations (SVs). Tardis is a toolkit that integrates read pair, read depth, and split read (using soft clipped mappings) sequence signatures to discover several types of SV, while resolving ambiguities among different putative SVs. This application is suitable for cloud use as the memory footprint is low. It is also capable of characterizing deletions, small novel insertions, tandem duplications, inversions, and mobile element retrotransposition.
Performs short tandem repeat (STR) profiling in whole-genome sequencing data sets. lobSTR is an algorithm that consists of three steps: it (1) scans genomic libraries, flags informative reads that fully encompass STR loci, and characterizes their STR sequence, (2) uses a divide-and-conquer strategy that anchors the nonrepetitive flanking regions of STR reads to the genome to reveal the STR position and length, and (3) allelotypes the STRs.
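The idea of anchoring flanking regions and then calling alleles can be illustrated with a much simplified Python sketch. It is not lobSTR's method: exact string matching of known flanks stands in for alignment to the genome, and a plain read-count rule stands in for allelotyping. The function names and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter


def str_length_from_read(read, left_flank, right_flank):
    """Return the number of bases between the two flanks.

    Returns None when the read does not fully span the locus.
    Exact matching is a stand-in for real flank alignment.
    """
    left = read.find(left_flank)
    if left == -1:
        return None
    str_start = left + len(left_flank)
    right = read.find(right_flank, str_start)
    if right == -1:
        return None
    return right - str_start


def call_alleles(reads, left_flank, right_flank, motif_len, min_fraction=0.2):
    """Call two alleles (in repeat units) from spanning reads.

    The second most frequent length is accepted as a second allele
    only if it is supported by at least min_fraction of spanning
    reads; otherwise the locus is called homozygous.
    """
    lengths = [
        n for n in (str_length_from_read(r, left_flank, right_flank) for r in reads)
        if n is not None
    ]
    if not lengths:
        return None
    top = Counter(lengths).most_common(2)
    alleles = [top[0][0]]
    if len(top) > 1 and top[1][1] / len(lengths) >= min_fraction:
        alleles.append(top[1][0])
    else:
        alleles.append(top[0][0])
    return tuple(sorted(a / motif_len for a in alleles))


left, right = "GGTCA", "TTGCC"
reads = (
    [left + "CAG" * 4 + right] * 3
    + [left + "CAG" * 6 + right] * 2
    + [left + "CAG" * 5]  # does not reach the right flank, so it is ignored
)
print(call_alleles(reads, left, right, motif_len=3))
```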