Computational protocol: Identification of Human Proteins That Modify Misfolding and Proteotoxicity of Pathogenic Ataxin-1

[…] The EASE program was used to identify over-represented functional categories (Gene Ontology annotation and KEGG pathways) among the 21 identified YFP-ATXN1Q82NT toxicity modifiers. The modifier genes were compared to the human genome, and a Fisher's exact test was performed to identify over-represented functional categories. Only categories with adjusted p-values < 0.05 (Bonferroni correction) were considered statistically significant. Coiled-coil (CC) domains in the amino acid sequences of modifier proteins were predicted with the COILS program. Only high-probability CC sequences (0.8–1) were considered for further investigation. We used Chi-square tests to compare the number of YFP-ATXN1Q82NT toxicity modifiers with CC domains to the human proteome (73,427 protein sequences in the SwissProt database, 20.3%).

To identify glutamine-rich (Q-rich) or polyglutamine (polyQ) proteins, a window of 50 amino acids was passed over each protein sequence. For each window position, the fraction of Qs among all residues and the number of consecutive Qs were counted. A protein was considered Q-rich when it contained >50% Qs in a stretch of at least 10 amino acids; by comparison, a protein was defined as a polyQ protein when it contained a stretch of >10 consecutive Qs.

Sequence homologues of human Pum1 and MED15 were selected using the NCBI BLAST sequence search program. For multiple sequence alignments we used the EBI's MUSCLE web tool. The local glutamine composition of the aligned sequences was determined by shifting a window of 50 amino acids over all sequences and counting, for each window position, the fraction of Qs among all residues (gaps were not considered). Additionally, the fraction of Qs located in an uninterrupted polyQ stretch (>5 Qs) was counted for each protein. This information was used to compare the Q and polyQ content of homologous proteins. […]
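The window-based glutamine analysis described above can be illustrated with a minimal Python sketch. It is not the authors' code. All function names and parameter defaults are hypothetical, and several choices are assumptions: a Q-rich stretch is read as any stretch of 10 to 50 residues (that is, one fitting inside a 50-residue window) with more than 50% Qs, the window over aligned sequences runs over alignment columns with gap characters (`-`) excluded from the residue count, and gaps are removed before polyQ runs are measured.

```python
from itertools import groupby


def _runs(seq: str):
    """Yield (residue, run_length) for each run of identical characters."""
    for res, grp in groupby(seq):
        yield res, sum(1 for _ in grp)


def longest_q_run(seq: str) -> int:
    """Length of the longest uninterrupted run of Q in seq."""
    return max((n for res, n in _runs(seq.upper()) if res == "Q"), default=0)


def is_polyq(seq: str, min_consecutive: int = 10) -> bool:
    """polyQ protein: a stretch of more than `min_consecutive` consecutive Qs."""
    return longest_q_run(seq) > min_consecutive


def is_q_rich(seq: str, min_len: int = 10, max_len: int = 50,
              threshold: float = 0.5) -> bool:
    """Q-rich protein: some stretch of min_len..max_len residues with a
    Q fraction above threshold (assumed reading of the criterion)."""
    seq = seq.upper()
    prefix = [0]  # prefix[i] = number of Qs in seq[:i]
    for aa in seq:
        prefix.append(prefix[-1] + (aa == "Q"))
    n = len(seq)
    for start in range(n - min_len + 1):
        for end in range(start + min_len, min(start + max_len, n) + 1):
            if prefix[end] - prefix[start] > threshold * (end - start):
                return True
    return False


def q_fraction_profile(aligned_seq: str, window: int = 50) -> list[float]:
    """Q fraction for each window position along an (optionally gapped)
    sequence; gaps are excluded from both numerator and denominator."""
    seq = aligned_seq.upper()
    profile = []
    for start in range(max(len(seq) - window + 1, 1)):
        residues = [aa for aa in seq[start:start + window] if aa != "-"]
        profile.append(residues.count("Q") / len(residues) if residues else 0.0)
    return profile


def polyq_fraction(seq: str, min_run: int = 6) -> float:
    """Fraction of residues lying in uninterrupted Q runs of at least
    min_run (i.e. more than 5 Qs); gaps are removed first (assumption)."""
    residues = seq.upper().replace("-", "")
    if not residues:
        return 0.0
    in_runs = sum(n for res, n in _runs(residues) if res == "Q" and n >= min_run)
    return in_runs / len(residues)
```

Applied to each row of a MUSCLE alignment, `q_fraction_profile` would give per-position profiles that can be compared across homologues, while `polyq_fraction` gives one summary value per protein. The prefix-sum scan in `is_q_rich` checks every stretch length rather than a single fixed one, because a stretch longer than 10 residues can exceed 50% Qs even when no 10-residue window inside it does.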

