Protocol publication

[…] all samples, their usage in the samples with insufficient evidence is calculated as for all other sites, but setting the usage to 0 in cases in which the upstream coverage in the specific sample was lower than the downstream coverage. The resulting values are taken as raw estimates of the usage of individual poly(A) sites, and from these the usage relative to the total over all poly(A) sites in a given terminal exon is obtained.

To obtain library-size-normalized expression counts, the raw expression values from all quantified sites of a given sample are summed. Each raw count is divided by the summed counts (i.e., the library size) and multiplied by 10^6, resulting in expression estimates in reads per million (RPM).

PAQR is composed of three modules: (1) a script to infer transcript integrity values based on the method described in a previous study (the script builds on the published software distributed as part of the Python RSeQC package, version 2.6.4); (2) a script to create the coverage profiles for all considered terminal exons (this script relies on the HTSeq package, version 0.6.1); and (3) a script to obtain the relative usage, together with the estimated expression, of poly(A) sites with sufficient evidence of usage.

All scripts, intermediate steps, and analyses of the TCGA data sets were executed as workflows created with snakemake version 3.13.0.

KAPAC, standing for k-mer activity on polyadenylation site choice, aims to identify k-mers that can explain the change in PAS usage observed across samples. For this, we model the relative change in PAS usage within terminal exons (with respect to the mean across samples) as a linear function of the occurrence of a specific k-mer and the unknown “activity” of this k-mer. Note that by modeli […]
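The relative-usage and library-size normalization steps described in this excerpt can be illustrated with a minimal Python sketch. It is not PAQR's actual code: the function names, the dictionary-based data layout, and the site and exon identifiers are assumptions made for the example, and the handling of an all-zero sample or exon (returning 0.0) is likewise an assumed convention.

```python
from collections import defaultdict


def reads_per_million(raw_counts):
    """Scale the raw per-site counts of one sample to reads per million (RPM).

    raw_counts maps a (hypothetical) poly(A) site ID to its raw expression
    value. The library size is the sum over all quantified sites.
    """
    library_size = sum(raw_counts.values())
    if library_size == 0:
        # Assumed convention: an empty sample yields zero everywhere.
        return {site: 0.0 for site in raw_counts}
    return {site: count * 1e6 / library_size
            for site, count in raw_counts.items()}


def relative_usage(raw_usage, site_to_exon):
    """Express each site's raw usage as a fraction of its terminal exon total.

    site_to_exon maps each site ID to the (hypothetical) ID of the terminal
    exon that contains it.
    """
    exon_totals = defaultdict(float)
    for site, value in raw_usage.items():
        exon_totals[site_to_exon[site]] += value

    result = {}
    for site, value in raw_usage.items():
        total = exon_totals[site_to_exon[site]]
        # Assumed convention: an exon with no usage at all yields 0.0.
        result[site] = value / total if total > 0 else 0.0
    return result


# Toy data: three sites, two of which share a terminal exon.
raw = {"siteA": 30, "siteB": 10, "siteC": 60}
exon_of = {"siteA": "exon1", "siteB": "exon1", "siteC": "exon2"}

rpm = reads_per_million(raw)
usage = relative_usage(raw, exon_of)
```

In this toy sample the RPM values sum to 10^6 by construction, and the relative usages of the sites within each terminal exon sum to 1.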

Pipeline specifications

Software tools: PAQR, RSeQC, HTSeq