Identifies artefactual polyadenylation sites that arise from internal priming at homopolymeric stretches of adenines. cleanUpdTSeq classifies 3’ ends derived from oligo-dT-based sequencing as true or false (internally primed). It is highly accurate, outperforms previous heuristic filters, and facilitates the identification of novel polyadenylation sites. The naïve Bayes classifier recalled 92.2% of true negatives and 93.8% of true positives, and only 3.2% of its predicted positives were incorrect.
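To make the classification idea concrete, the following is a minimal sketch of a naïve Bayes classifier over short sequences. It is not the cleanUpdTSeq implementation: the per-position nucleotide features, the six-base window, the toy training sequences, the labels "true" and "internal", and the function names `train` and `classify` are all assumptions chosen for illustration.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"

def train(examples, alpha=1.0):
    """Count nucleotides per position and class.

    examples: list of (sequence, label) pairs; all sequences share one length.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    class_counts = Counter(label for _, label in examples)
    length = len(examples[0][0])
    position_counts = {
        label: [Counter() for _ in range(length)] for label in class_counts
    }
    for seq, label in examples:
        for i, nt in enumerate(seq):
            position_counts[label][i][nt] += 1
    return class_counts, position_counts, alpha

def classify(model, seq):
    """Return the label with the highest posterior log-probability."""
    class_counts, position_counts, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        log_p = math.log(n / total)  # class prior
        for i, nt in enumerate(seq):
            count = position_counts[label][i][nt]
            # Smoothed per-position likelihood, treated as independent (the "naive" step)
            log_p += math.log((count + alpha) / (n + alpha * len(NUCLEOTIDES)))
        scores[label] = log_p
    return max(scores, key=scores.get)

# Hypothetical toy data: A-rich windows stand in for internally primed sites.
toy_examples = [
    ("GTCTGC", "true"),
    ("TGTGTC", "true"),
    ("CTGGTT", "true"),
    ("AAAAAA", "internal"),
    ("AAGAAA", "internal"),
    ("AAAAGA", "internal"),
]
model = train(toy_examples)
```

With this toy model, `classify(model, "AAAAAG")` selects the "internal" label because adenines dominate the window, while a GT-rich window such as "GTGTTC" is labelled "true". A real classifier would be trained on many annotated sites and a richer feature set.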
Predicts potential PAS-strong, PAS-weak and PAS-less cleavage/poly(A) sites in human sequences using a linear discriminant function (LDF) that combines characteristics describing functional motifs (polyadenylation signal [PAS], cleavage site [CS] motif and GU/U motif) with the oligonucleotide composition upstream and/or downstream of these sites. In tests, POLYAR predicts PAS-strong poly(A) sites with high accuracy; its efficiency in finding PAS-weak and PAS-less poly(A) sites is lower, but comparable to that of other available programs.
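A linear discriminant function of this kind can be sketched as a weighted sum of feature scores compared against a threshold. The feature names, weights and threshold below are hypothetical placeholders, not POLYAR's actual parameters, which are not given here.

```python
def ldf_score(features, weights, bias=0.0):
    """Weighted sum of feature values; feature names must match the weight keys."""
    return bias + sum(weights[name] * value for name, value in features.items())

def is_predicted_site(features, weights, threshold, bias=0.0):
    """Call a candidate position a poly(A) site if its LDF score exceeds the threshold."""
    return ldf_score(features, weights, bias) > threshold

# Hypothetical weights for three motif scores (illustrative values only).
example_weights = {"pas": 2.0, "cs": 1.0, "gu_u": 0.5}

# Hypothetical feature scores for one candidate site.
example_features = {"pas": 1.0, "cs": 0.5, "gu_u": 1.0}
```

For the example values, the score is 2.0 × 1.0 + 1.0 × 0.5 + 0.5 × 1.0 = 3.0, so the candidate passes a threshold of 2.0. In practice the weights would be fitted on known sites, and separate functions or thresholds could be used for PAS-strong, PAS-weak and PAS-less classes.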
