A large proportion of proteins contain repetitive motifs, which arise from internal duplications and frequently correspond to structural and functional units of proteins. Many of these repeats can be identified in protein sequences using a variety of approaches. Several repeat families have been studied because of their relevance to areas such as health, neurodevelopment and protein engineering, to name just a few. An open question regarding repeat proteins is whether other common repeat structures exist that have so far gone undetected. After all, the most common way to detect repeat families has been to first annotate the sequence family manually and only afterwards recognize its structural repetitiveness by visual inspection. Such an approach does not scale to the entire Protein Data Bank (PDB), especially given the many uncharacterized protein structures deposited by the major structural genomics consortia. A systematic description of repeat structures therefore requires automated methods that detect repeats directly in protein structures.

(Di Domenico et al., 2017) RepeatsDB: a database of tandem repeat protein structures. Nucleic Acids Res.

