SeqClean statistics


Citations per year

Number of citations per year for the bioinformatics software tool SeqClean

Tool usage distribution map

This map represents all the scientific publications referring to SeqClean per scientific context

Associated diseases

This word cloud represents SeqClean usage per disease context



SeqClean specifications


Unique identifier: OMICS_18156
Name: SeqClean
Software type: Package/Module
Interface: Command line interface
Restrictions to use: None
Input data: A DNA sequence file.
Input format: FASTA
Operating system: Unix/Linux
Programming languages: Perl
Computer skills: Advanced
Stability: Stable
Requirements: blastall, megablast
Maintained: Yes




No version available



  • Geo Pertea
  • PeiIng Hwang
  • Yi-An Chen
  • Chang-Chun Lin
  • Chin-Di Wang

Publication for SeqClean

SeqClean citations


Transcriptome analysis in whole blood reveals increased microbial diversity in schizophrenia

PMCID: 5943399
PMID: 29743478
DOI: 10.1038/s41398-018-0107-9

[…] number of reads may affect microbial profiling, we performed normalization by sub-sampling to 100,000 reads for each sample. Next, we filtered out low-quality and low-complexity reads using FASTX and SEQCLEAN (see urls). Finally, the remaining reads were realigned to the human references using the Megablast aligner in order to exclude any potentially human reads. The remaining reads were used as c […]
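The sub-sampling step described in this excerpt can be illustrated with a minimal sketch. It assumes FASTA input and uses reservoir sampling so that a fixed number of reads is drawn uniformly in a single pass; the function names `read_fasta` and `subsample` and the `seed` parameter are hypothetical, and this is not the procedure the cited authors used.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subsample(records: Iterable[Record], n: int, seed: int = 0) -> List[Record]:
    """Reservoir sampling: keep n records uniformly at random in one pass."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir
```

For the normalization in the excerpt, `n` would be 100,000; a sample with fewer reads than `n` is returned unchanged.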


ROP: dumpster diving in RNA sequencing to find the source of 1 trillion reads across diverse adult human tissues

Genome Biol
PMCID: 5857127
PMID: 29548336
DOI: 10.1186/s13059-018-1403-7

[…] quality < 30 in at least 75% of their base pairs, were identified by FASTX (v 0.0.13). Low-complexity reads, defined as reads with sequences of consecutive repetitive nucleotides, were identified by SEQCLEAN. As a part of the quality control, we also excluded unmapped reads mapped onto the rDNA repeat sequence (HSU13369 Human ribosomal DNA complete repeating unit) (BLAST+ 2.2.30). Starting from n […]
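As a rough illustration of "consecutive repetitive nucleotides", the sketch below flags a read when its longest single-nucleotide run covers a large share of its length. The function names and the `max_run_fraction` threshold of 0.5 are assumptions made for this example; the excerpt does not state a threshold, and this is not presented as SeqClean's own low-complexity method.

```python
from itertools import groupby


def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated nucleotide (case-insensitive)."""
    return max((len(list(group)) for _, group in groupby(seq.upper())), default=0)


def is_low_complexity(seq: str, max_run_fraction: float = 0.5) -> bool:
    """Flag a read whose longest homopolymer run reaches the given fraction.

    The threshold is a hypothetical illustration value, not one taken
    from the cited paper. Empty reads are treated as low complexity.
    """
    if not seq:
        return True
    return longest_run(seq) / len(seq) >= max_run_fraction
```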


Functional Characteristics of the Flying Squirrel's Cecal Microbiota under a Leaf Based Diet, Based on Multiple Meta Omic Profiling

Front Microbiol
PMCID: 5758534
PMID: 29354108
DOI: 10.3389/fmicb.2017.02622

[…] ed to remove exact duplicates (possible artifacts from emulsion PCR) using CD-hits (Li and Godzik), and low-quality parts (Phred quality < 25, N content > 3%, sequences < 100 bp) were trimmed using SeqClean. After quality control, a total of 569,349 metagenome reads and 483,241 metatranscriptome reads (both with average length > 300 bp) were used for bioinformat […]
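Two of the thresholds quoted in this excerpt (N content > 3%, sequences < 100 bp) can be expressed as a simple pass/fail check on a sequence string. This is a sketch under stated assumptions: the function names are hypothetical, whole sequences are rejected rather than trimmed, and the Phred criterion is left out because FASTA input carries no quality scores.

```python
def n_fraction(seq: str) -> float:
    """Fraction of ambiguous 'N' bases in a sequence (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def passes_filters(seq: str, min_len: int = 100, max_n_fraction: float = 0.03) -> bool:
    """Keep sequences of at least min_len bases with at most max_n_fraction N content.

    Defaults mirror the thresholds quoted in the excerpt; rejecting the
    whole sequence (instead of trimming it) is a simplification.
    """
    return len(seq) >= min_len and n_fraction(seq) <= max_n_fraction
```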


Histological, hormonal and transcriptomic reveal the changes upon gibberellin induced parthenocarpy in pear fruit

PMCID: 5798812
PMID: 29423231
DOI: 10.1038/s41438-017-0012-z

[…] he remaining high quality reads were filtered for short reads below 50 bp. Adaptors were first trimmed, and then reads were further assembled by GS de novo assembler (v2.6). Singletons cleaning using Seqclean and lucy with a parameter of minimum length 100 bp. Illumina hiseq reads produced in paired-end formats (101 bp) were also assembled using the Trinity software package. Reads were filtered and […]


Phylogeny and putative virulence gene analysis of Bartonella bovis

PMCID: 5938196
PMID: 29311425
DOI: 10.1292/jvms.17-0448

[…] ed for downstream analysis. The assembled contigs were subjected to left over adapters and contaminant filter against NCBI univec database via seq clean (default parameter). The whole genome sequencing project has been submitted to the GenBank database (accession no: MWVG00000000). […]


An integrated and comparative approach towards identification, characterization and functional annotation of candidate genes for drought tolerance in sorghum (Sorghum bicolor (L.) Moench)

BMC Genet
PMCID: 5741957
PMID: 29273003
DOI: 10.1186/s12863-017-0584-5

[…] ne annotations against EGMs (Fig. and Additional file : Table S6). A total of 209,835 TIGR transcripts DRESTs and 10,619 UniGene clusters (Additional file : Table S1) were cleaned by a program called SeqClean and then aligned to the sorghum genome using the PASA pipeline. The main input parameter for SeqClean was a transcript fasta file, but with the vector sequence database, the cleaning process […]

SeqClean institution(s)
Bioinformatics Core Laboratory, Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan; Lab of Mathematics in Biology, Institute of Statistical Sciences, Academia Sinica, Taipei, Taiwan
