CD-HIT pipeline

CD-HIT specifications

Information


Unique identifier: OMICS_05157
Name: CD-HIT
Alternative name: CD-HIT-EST
Software type: Package/Module
Interface: Command line interface
Restrictions to use: None
Operating system: Unix/Linux
License: GNU General Public License version 2.0
Computer skills: Advanced
Stability: Stable
Maintained: Yes

Information


Unique identifier: OMICS_05157
Name: CD-HIT
Alternative name: CD-HIT-EST
Interface: Web user interface
Restrictions to use: None
Computer skills: Basic
Stability: Stable
Maintained: Yes

Publications for CD-HIT

CD-HIT in pipelines (142)
2018
PMCID: 5759245
PMID: 29310597
DOI: 10.1186/s12864-017-4379-x

[…] analyses we performed comparing the orthogroup distribution size of the current salmon assembly (at all four filtering steps: unfiltered, after TransDecoder single-best ORF prediction, after CD-HIT clustering at 100% identity and after Trinity full-length transcript analysis (e.g. final version)) against the NCBI Atlantic salmon RefSeq proteins. Given the high quality of the recently […]

2018
PMCID: 5769438
PMID: 29335005
DOI: 10.1186/s40168-017-0387-y

[…] ResFinder [29], and BacMet [30]. These antibiotic resistance databases were combined within a non-redundant set. Proteins were clustered in protein families by homology, using CD-HIT with parameters of 80% identity and 80% coverage. Each protein family was aligned by MUSCLE v. 3.7 [31] with default parameters and a hidden Markov model (HMM) was built for each family […]
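
As an illustration of the parameters quoted in this excerpt, the following minimal Python sketch assembles a CD-HIT command line for 80% identity and 80% coverage. It only builds the argument list and does not run the program. The file names are hypothetical, and the excerpt does not say which coverage option the authors used, so `-aS` (coverage of the shorter sequence) is an assumption here; `-aL` would be the counterpart for the longer sequence. The word-size choice follows the ranges suggested for protein input in the CD-HIT user guide.

```python
import shlex


def build_cd_hit_command(fasta_in, out_prefix, identity=0.8, coverage=0.8):
    """Return an argv list for a protein CD-HIT run (nothing is executed)."""
    if not 0.4 <= identity <= 1.0:
        raise ValueError("protein identity threshold should lie in [0.4, 1.0]")
    # Word size (-n) ranges suggested in the CD-HIT user guide for proteins.
    if identity >= 0.7:
        word_size = 5
    elif identity >= 0.6:
        word_size = 4
    elif identity >= 0.5:
        word_size = 3
    else:
        word_size = 2
    return [
        "cd-hit",
        "-i", fasta_in,        # input FASTA (hypothetical file name)
        "-o", out_prefix,      # output prefix; CD-HIT also writes <prefix>.clstr
        "-c", str(identity),   # sequence identity threshold
        "-n", str(word_size),  # word size matched to the threshold
        "-aS", str(coverage),  # ASSUMPTION: coverage applied to the shorter sequence
    ]


cmd = build_cd_hit_command("resistance_proteins.faa", "families_c80")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where CD-HIT is installed.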

2018
PMCID: 5790007
PMID: 29382867
DOI: 10.1038/s41598-018-20135-4

[…] was >70% of sequence identity on >10% of the sequence length. Any sequence below these thresholds was considered a novel gene and added to the list. The final list of genes was filtered using CD-HIT [39] with a sequence identity cut-off of 90% nucleotide identity. A total of 4,798 unique genes were discovered and their prevalence was examined in all genomes from this study. Functional […]
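
A count of unique genes like the one in this excerpt corresponds to the number of clusters CD-HIT reports. The sketch below shows one possible way to read a `.clstr` file, in which each cluster starts with a `>Cluster N` line and the representative sequence is marked with `*`. The sequence names and lengths in the sample are invented for illustration and do not come from the cited study.

```python
import re

# A member line looks like: "1<TAB>1187nt, >gene_B... at +/93.51%"
# The representative of a cluster ends with "*" instead of "at ...".
MEMBER = re.compile(r">(.+?)\.\.\.\s+(\*|at .+)$")


def parse_clstr(lines):
    """Map each cluster name to its representative and list of member IDs."""
    clusters = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]
            clusters[current] = {"representative": None, "members": []}
            continue
        match = MEMBER.search(line)
        if match is None or current is None:
            raise ValueError(f"unrecognised line: {line!r}")
        seq_id, status = match.groups()
        clusters[current]["members"].append(seq_id)
        if status == "*":
            clusters[current]["representative"] = seq_id
    return clusters


# Hypothetical sample in the .clstr layout (not real data).
example = """\
>Cluster 0
0\t1203nt, >gene_A... *
1\t1187nt, >gene_B... at +/93.51%
>Cluster 1
0\t845nt, >gene_C... *
""".splitlines()

clusters = parse_clstr(example)
print(len(clusters), "clusters")
for name, info in clusters.items():
    print(name, info["representative"], len(info["members"]))
```

With a real file, `parse_clstr(open(path))` would work the same way, since the function accepts any iterable of lines.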

2018
PMCID: 5851622
PMID: 29538447
DOI: 10.1371/journal.pone.0194335

[…] protein (http://www.ncbi.nlm.nih.gov/protein) sequence database using keyword search “eif2ak” and sequences are manually curated. Identical sequences in the retrieved sequences were removed using CD-HIT software [34] with default tuple size and a cutting threshold of sequence identity of 95%. A multiple sequence alignment (MSA) is obtained based on final dataset of eif2ak sequences trimmed […]

2018
PMCID: 5889037
PMID: 29365169
DOI: 10.1093/molbev/msy010

[…] define unique monomers without shifting sequences to occupy similar registers, we took all tandem repeats corresponding to the major peak and subjected them to local alignment-based clustering using CD-HIT-EST (Li and Godzik 2006) with the following parameters: -c 0.8 -bak 1 -m 0 -d 0 -n 4 -g 0 -a 43. For each species, CD-HIT-EST-reported consensus sequences for clusters containing at least 1% […]

CD-HIT institution(s)
Center for Research in Biological Systems, University of California San Diego, La Jolla, CA, USA
