Domain Identification Methods

A domain maintains the overall structural characteristics of the whole protein (compactness, hydrophobic core, autonomous folding) and, in most cases, performs its function even after being removed from the context of the protein. Decomposing proteins into simpler, modular building blocks makes it clear that proteins result from the combinatorial rearrangement of domains, and it shifts the focus of structural and functional analysis from proteins to domains. The unambiguous identification of biologically relevant domains is therefore a key issue for any proteomics strategy that takes the intrinsically modular nature of proteins into account.

When the three-dimensional structure of a protein is available, domains can often be spotted by eye fairly easily. Manual inspection, however, has two major drawbacks: (1) experts do not always fully agree on domain definitions, even after careful cross-inspection, so domain boundaries remain partly subjective; (2) human processing of structures has quickly become impractical, as the amount of data from crystallographic studies has been growing at a considerable pace for years. Automatic methods, in contrast, produce uniform and reproducible definitions and can cope with the increasing volume of available data without much difficulty. To split proteins into substructures automatically, objective criteria are required. The most widely used criterion is compactness: from the definition of domains as small, compact structural units, it follows that residue-residue contacts within a domain should be more numerous than contacts with the rest of the protein. Since compactness alone is not enough for reliable domain assignment, other criteria are also evaluated, such as the integrity of secondary structures, domain size and domain fragmentation.
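The compactness criterion can be illustrated with a minimal sketch. It assumes one coordinate per residue (for example the C-alpha atom), an 8 Å contact cutoff, and a single cut that splits the chain into two contiguous segments; the function names, the cutoff, the minimum segment size and the normalisation by segment sizes are all illustrative assumptions, not the procedure of any particular published algorithm.

```python
import math

def contact_counts(coords, boundary, cutoff=8.0):
    """Count residue-residue contacts for a chain cut at `boundary`.

    coords   : list of (x, y, z) tuples, one per residue (assumed C-alpha).
    boundary : residues [0, boundary) form segment A, the rest segment B.
    cutoff   : distance in angstroms below which two residues are in
               contact (8.0 is an assumed, illustrative value).
    Returns (intra, inter): contacts within a segment and across the cut.
    """
    intra = inter = 0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                # Same side of the cut means an intra-segment contact.
                if (i < boundary) == (j < boundary):
                    intra += 1
                else:
                    inter += 1
    return intra, inter

def best_single_cut(coords, min_size=2, cutoff=8.0):
    """Scan all cut positions and return (boundary, score) for the cut
    with the fewest inter-segment contacts per possible residue pair.

    min_size is a hypothetical lower bound on segment length, standing
    in for the "domain size" criterion. Returns None if the chain is
    too short to be split.
    """
    n = len(coords)
    best = None
    for b in range(min_size, n - min_size + 1):
        _, inter = contact_counts(coords, b, cutoff)
        score = inter / (b * (n - b))  # normalise by segment sizes
        if best is None or score < best[1]:
            best = (b, score)
    return best

# Toy example: two well-separated clusters of five residues each.
toy = [(float(i), 0.0, 0.0) for i in range(5)] + \
      [(100.0 + i, 0.0, 0.0) for i in range(5)]
print(best_single_cut(toy))
```

A real method would also have to handle domains built from non-contiguous segments and check that a cut does not break secondary structure elements, which this single-cut sketch ignores.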

As with any computational, automatic approach, doubts arise about the performance and accuracy of such methods. A thorough comparison of four domain assignment algorithms was performed by Veretnik et al. (2004) and Holland et al. (2006). The authors concluded that each method has its own strengths and weaknesses, and that the methods partially complement each other. None of them matches the performance of expert-based assignment, although good results can still be achieved in many cases, and significant improvements may come from a meta-method that merges several methods, combining their benefits while limiting the defects each shows when used on its own. Finally, the very concept of a domain may need some flexibility to suit the context at hand: if we focus on evolution, what matters is the part of the domain that is conserved across different proteins and species; if we are more interested in structural features, we define a domain simply as a compact folding subunit; from a functional point of view, we may regard as a domain only the part of the protein involved in performing the function (e.g., binding, in the case of interaction domains). Each definition is correct in its own context, though the definitions may not always yield coinciding domain boundaries.
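The meta-method idea mentioned above can be sketched as a simple boundary vote. The sketch assumes each method reports domain boundaries as residue positions, and it accepts a boundary only when enough distinct methods place one within a small tolerance of each other. The function name, the tolerance of 10 residues and the two-vote threshold are hypothetical choices for illustration; this is not the procedure used in the cited comparisons.

```python
def consensus_boundaries(predictions, tolerance=10, min_votes=2):
    """Merge domain boundaries predicted by several methods.

    predictions : list of lists; predictions[m] holds the boundary
                  positions (residue numbers) reported by method m.
    tolerance   : boundaries closer than this to the previous pooled
                  boundary join the same cluster (assumed value).
    min_votes   : number of distinct methods that must support a
                  cluster for it to be kept (assumed value).
    Returns the median position of each supported cluster, in order.
    """
    # Pool every boundary, remembering which method proposed it.
    pooled = sorted(
        (pos, m) for m, bounds in enumerate(predictions) for pos in bounds
    )

    # Group neighbouring boundaries into clusters by chaining.
    clusters = []
    for pos, m in pooled:
        if clusters and pos - clusters[-1][-1][0] <= tolerance:
            clusters[-1].append((pos, m))
        else:
            clusters.append([(pos, m)])

    # Keep clusters backed by enough distinct methods.
    consensus = []
    for cluster in clusters:
        methods = {m for _, m in cluster}
        if len(methods) >= min_votes:
            positions = sorted(p for p, _ in cluster)
            consensus.append(positions[len(positions) // 2])
    return consensus

# Three hypothetical methods; only the boundary near residue 100
# is supported by more than one of them.
print(consensus_boundaries([[100, 250], [105], [98, 400]]))
```

Because the vote only counts agreement, it cannot recover a boundary that a single method gets right and the others miss; weighting methods by their known strengths would be a natural refinement.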
