Automated domain annotation is an important tool for structural informatics. These pipelines typically involve searching query sequences against hidden Markov model (HMM) profiles, yielding matches to profiles for various domains.… Click to show full abstract
Automated domain annotation is an important tool for structural informatics. These pipelines typically involve searching query sequences against hidden Markov model (HMM) profiles, yielding matches to profiles for various domains. However, domain annotation can be ambiguous or inaccurate when proteins contain domains with non‐contiguous residue ranges, and especially when insertional domains are hosted within them. Here, we present DomainMapper, an algorithm that accurately assigns a unique domain structure annotation to a query sequence, including those with complex topologies. We validate our domain assignments using the AlphaFold database and confirm that non‐contiguity is pervasive (10.74% of all domains in yeast and 4.52% in human). Using this resource, we find that certain folds have strong propensities to be non‐contiguous or insertional across the Tree of Life. DomainMapper is freely available and can be ran as a single command‐line function.
               
Click one of the above tabs to view related content.