TY - JOUR
T1 - Identification of repetitive units in protein structures with ReUPred
AU - Hirsh, Layla
AU - Piovesan, Damiano
AU - Paladin, Lisanna
AU - Tosatto, Silvio C.E.
N1 - Publisher Copyright: © 2016, Springer-Verlag Wien.
PY - 2016/6/1
Y1 - 2016/6/1
AB - Over the last decade, numerous studies have demonstrated the fundamental importance of tandem repeat (TR) proteins in many biological processes. A plethora of new repeat structures have also been solved. The recently published RepeatsDB provides information on TR proteins. However, a detailed structural characterization of repetitive elements is largely missing, as repeat unit annotation is manually curated and currently covers only 3 % of the bona fide TR proteins. Repeat Protein Unit Predictor (ReUPred) is a novel method for the fast automatic prediction of repeat units and repeat classification using an extensive Structure Repeat Unit Library (SRUL) derived from RepeatsDB. ReUPred uses an iterative structural search against the SRUL to find repetitive units. On a test set of solenoid proteins, ReUPred is able to correctly detect 92 % of the proteins. Unlike previous methods, it is also able to correctly classify solenoid repeats in 89 % of cases. It also outperforms two recent state-of-the-art methods for the repeat unit identification problem. The accurate prediction of repeat units increases the number of annotated repeat units by an order of magnitude compared to the sequence-based Pfam classification. ReUPred is implemented in Python for Linux and freely available from the URL: http://protein.bio.unipd.it/reupred/.
KW - Protein classification
KW - Repeat protein
KW - Structure prediction
UR - http://www.scopus.com/inward/record.url?scp=84959159515&partnerID=8YFLogxK
U2 - 10.1007/s00726-016-2187-2
DO - 10.1007/s00726-016-2187-2
M3 - Article
C2 - 26898549
AN - SCOPUS:84959159515
SN - 0939-4451
VL - 48
SP - 1391
EP - 1400
JO - Amino Acids
JF - Amino Acids
IS - 6
ER -