Methods to Detect the Specificity of Interaction Modules

In trying to unveil the rules governing the interaction between protein modules and peptides, one would like to explore the sequence space as exhaustively as possible, ideally testing the binding of all representatives of each domain family occurring in the proteome of a living cell to all possible amino acid sequences of reasonable length. However, this brute-force strategy is beyond the reach of current techniques. Fortunately, the impossibility of achieving perfect generality and completeness in the search for potential ligands may be partially overcome by taking into account some a priori knowledge about the binding determinants for any given domain. The sequence characteristics of binding peptides can be determined by capitalizing on the characterization of a large number of target peptides and on the detailed structural information contained in high-resolution three-dimensional structures of interaction modules in complex with their ligand peptides, and can then be confirmed by mutational analysis.
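To give a rough sense of scale for the brute-force search, the number of distinct peptides of length n built from the 20 standard amino acids is 20^n. The lengths used below (10 and 15 residues) are illustrative assumptions for a "reasonable length," not figures taken from the cited work:

```latex
\[
  20^{10} = 1.024 \times 10^{13},
  \qquad
  20^{15} \approx 3.3 \times 10^{19}.
\]
```

Each of these peptides would, in principle, have to be tested against every domain of every family.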

This information defines a biased sequence space where a few pre-determined positions contain specific amino acids, while the others are allowed to vary in a combinatorial fashion. The new search space is thus reduced in size and can be explored experimentally with currently available techniques. This approach can be implemented by constructing an "oriented peptide library," a biased collection of peptides of degenerate sequence with fixed amino acids in the "orienting" positions (e.g., for SH2 domains the orienting amino acid would be the phospho-tyrosine residue, whereas for SH3 domains it would be the two prolines in the PxxP motif) (Yaffe and Cantley 2000). The peptide mixture is then incubated with the domain of interest. Subsequent sequencing of the adsorbed peptides reveals the positions that show enrichment for any particular amino acid. Although very powerful in theory, this approach has not been widely employed, mainly because of the technical expertise required to perform complicated peptide biochemistry (Santonico et al. 2005).
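The read-out of an oriented peptide library experiment is, in essence, a per-position comparison between the amino acid composition of the bound pool and that of the starting mixture. The following is a minimal sketch of that comparison, under the simplifying assumption that both pools are available as lists of equal-length peptide strings (pool sequencing in practice yields per-position amino acid yields rather than individual sequences). The function name `position_enrichment`, the pseudocount, and the toy peptides are hypothetical and serve only as illustration.

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def position_enrichment(bound, library, pseudocount=1.0):
    """For each position, map every amino acid to the ratio between its
    (pseudocount-smoothed) frequency in the bound pool and in the library."""
    length = len(bound[0])
    enrichment = []
    for pos in range(length):
        bound_counts = Counter(p[pos] for p in bound)
        lib_counts = Counter(p[pos] for p in library)
        ratios = {}
        for aa in ALPHABET:
            f_bound = (bound_counts[aa] + pseudocount) / (
                len(bound) + pseudocount * len(ALPHABET))
            f_lib = (lib_counts[aa] + pseudocount) / (
                len(library) + pseudocount * len(ALPHABET))
            ratios[aa] = f_bound / f_lib
        enrichment.append(ratios)
    return enrichment


# Toy data (hypothetical): prolines fixed at positions 2 and 5 (xxPxxPx).
library = ["AKPLGPS", "RDPEVPT", "GSPAKPL", "LTPRDPE"]
bound = ["ARPLAPS", "GRPLKPT", "SRPLDPL", "TRPLEPV"]

enr = position_enrichment(bound, library)
best_at_1 = max(enr[1], key=enr[1].get)  # most enriched residue at position 1
print(best_at_1, round(enr[1][best_at_1], 2))
```

In this toy example the fixed proline positions show a ratio of 1 (no enrichment), whereas a selected residue stands out at a degenerate position.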

An alternative approach is based on phage display (Scott and Smith 1990): a library containing a large number (in the range of 10^9-10^10) of short (10-15 amino acids) random peptides is displayed on bacteriophage capsids and is panned against a domain used as bait. The peptides having affinity for the domain can be purified, and their sequence can be easily obtained by sequencing the DNA of the capsid gene. It is then possible to derive a consensus sequence by aligning a reasonable number of interacting peptides. Although this technique has been successfully employed to profile several families of interaction modules (Sparks et al. 1994; Rickles et al. 1994; Vaccaro et al. 2001; Cestra et al. 1999; Paoluzi et al. 1998; Romano et al. 1999; Dente et al. 1997; Freund et al. 2003), it has some limitations: first, whereas it is fairly easy to identify which residues are highly conserved among ligands, it is often problematic to uncover statistical correlations in less conserved positions; second, some peptides are actually able to bind to the domain even if they do not perfectly match the consensus. When the derived consensus is used to infer physiological partners in the proteome, these two problems inevitably give rise to false positives and false negatives.
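The step of deriving a consensus from aligned phage-selected peptides can be sketched as a column-by-column majority vote. This is only an illustrative simplification: it assumes the peptides are already aligned and of equal length, and the function name `consensus`, the 60% threshold, and the example peptides are hypothetical.

```python
from collections import Counter


def consensus(aligned_peptides, min_fraction=0.6):
    """Return a consensus string for pre-aligned, equal-length peptides.

    A column contributes its most common residue when that residue occurs
    in at least `min_fraction` of the peptides, and 'x' otherwise."""
    n = len(aligned_peptides)
    out = []
    for column in zip(*aligned_peptides):
        residue, count = Counter(column).most_common(1)[0]
        out.append(residue if count / n >= min_fraction else "x")
    return "".join(out)


# Toy selected peptides (hypothetical), loosely resembling proline-rich ligands.
selected = ["RALPPLP", "RQLPVAP", "RPLPSLP", "KTLPELP", "RSVPPLP"]
print(consensus(selected))
```

A column-wise summary of this kind treats each position independently, which is exactly why correlations between weakly conserved positions are lost, and a peptide such as `KTLPELP` binds in this toy set without matching the resulting consensus at every position.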

A third approach, called SPOT synthesis (Frank 2002), successfully addresses the limitations of phage display. The technique is based on the chemical synthesis of a large number of peptides on a cellulose membrane or a glass slide in an array format: the binding of the domain to each spot is detected by a fluorescent probe, whose intensity is measured by a laser scanner. Since the binding of each and every peptide in the collection is tested independently and is semi-quantitatively described by a figure correlating with the dissociation constant, one can readily tell which peptides bind and which do not, eliminating the inference step that represents the major drawback of the previously described techniques. However, the number of peptides that can be spotted together on a chip with current facilities is on the order of 10^4, limiting the applicability of this approach to a rather restricted search space.

Recently, the complementarity of the phage display and SPOT synthesis approaches has been exploited in the design of a two-step strategy, named WISE (Whole Interactome Scanning Experiment) (Landgraf et al. 2004), that aims at combining the strengths of the two methods into a single general-purpose procedure (see Fig. 3). The first step involves the definition of a "strict" consensus sequence by panning a phage-displayed peptide library against the selected domain. Then, a "relaxed" consensus is obtained with the aid of computational tools, and it is used to select between 5,000 and 10,000 peptides from all the protein sequences in the proteome that match the consensus. Finally, the peptides are synthesized on a chip and are probed with the domain of interest. Because the selection step narrows the candidates to a number that fits on a chip, it becomes possible to test all potential domain targets in a proteome.
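The proteome-scanning step can be illustrated with a simple pattern search. The sketch below is not the computational procedure used in WISE, which the text does not detail; it assumes, for illustration only, that a consensus can be written as a regular expression. The function name `scan_proteome`, the two patterns, and the toy proteome are all hypothetical.

```python
import re


def scan_proteome(proteome, pattern):
    """Yield (protein_id, start, peptide) for every match of `pattern`,
    including overlapping matches, in a dict of protein sequences."""
    # A lookahead with a capture group reports overlapping occurrences.
    regex = re.compile(f"(?=({pattern}))")
    for protein_id, sequence in proteome.items():
        for match in regex.finditer(sequence):
            yield protein_id, match.start(1), match.group(1)


# Toy proteome and consensus patterns (hypothetical).
proteome = {
    "protA": "MSTRPLPPLPAK",
    "protB": "MKAAPGSPEE",
    "protC": "MDDEEG",
}
strict = "R.LP..P"       # as might be derived directly from selected peptides
relaxed = "[RK]..P..P"   # tolerates variation at less conserved positions

strict_hits = list(scan_proteome(proteome, strict))
relaxed_hits = list(scan_proteome(proteome, relaxed))
print(len(strict_hits), len(relaxed_hits))
```

Relaxing the pattern enlarges the candidate set, trading specificity for coverage; the candidates are then tested individually on the array rather than being accepted on the strength of the pattern match alone.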

Fig. 3 Overview of the WISE strategy

Though the task of discovering new binding motifs cannot be satisfactorily addressed without the fundamental contribution of experimental approaches, it would be a mistake to underestimate the valuable support computational methods can provide. In fact, in silico predictions may be employed to drive wet lab experiments by formulating rationally founded initial guesses, thus speeding up research (Yaffe 2006). As an example of this, Neduva et al. (2005) developed a smart algorithm to detect linear motifs potentially mediating protein interactions. They start from the assumption that proteins sharing a common interaction partner must also share a feature mediating binding, either a domain or a linear motif: for this reason, the analysis focuses on groups of proteins sharing a common interaction partner. Since linear motifs often lie in low-complexity regions of the protein, all globular domains, coiled coils, trans-membrane regions, collagen regions and signal peptides are removed from the protein sequences in the set. If homologous segments are detected when comparing the sequences with each other, only one representative per region is left, to avoid any bias. Next, all three- to eight-residue motifs present in the remaining sequence space are found and are scored by their level of over-representation, measured as the binomial probability of observing the motif N times or more in a random set of similar sequences (where N is the number of times the motif occurs in the sequence set). Information from closely related species is also taken into account by looking at conservation of the motifs in orthologous proteins, and is combined into a final likelihood score.
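The over-representation score described above can be sketched as a binomial tail probability. The code below is an illustration under several stated assumptions and is not the published implementation: it considers exact k-mers of a single length only, counts a motif at most once per sequence, and estimates the chance of a motif occurring in a random sequence from a separate background set with add-one smoothing. The domain filtering, homology removal, and conservation steps are omitted, and all names and toy sequences are hypothetical.

```python
from collections import Counter
from math import comb


def binomial_tail(n_obs, n_total, p):
    """P(X >= n_obs) for X ~ Binomial(n_total, p)."""
    return sum(
        comb(n_total, k) * p**k * (1 - p) ** (n_total - k)
        for k in range(n_obs, n_total + 1)
    )


def count_kmers(sequences, k):
    """Count in how many sequences each k-mer occurs (once per sequence)."""
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def score_motifs(sequences, background, k):
    """Map each k-mer found in `sequences` to its binomial tail probability;
    smaller values indicate stronger over-representation."""
    observed = count_kmers(sequences, k)
    expected = count_kmers(background, k)
    scores = {}
    for motif, n_obs in observed.items():
        # Assumed estimate of the per-sequence chance of seeing the motif.
        p = (expected[motif] + 1) / (len(background) + 2)
        scores[motif] = binomial_tail(n_obs, len(sequences), p)
    return scores


# Toy data (hypothetical): three of four partners share the motif PTLP.
partners = ["AAPTLPGS", "GGPTLPKK", "SSEPTLPA", "QQNNDDEE"]
background = ["AGSKQNDE", "GASKEDQN", "SGAKNQED", "KQNDEAGS", "EDNQKSAG"]

scores = score_motifs(partners, background, k=4)
best = min(scores, key=scores.get)
print(best, scores[best])
```

Running the same scoring for each length from three to eight and ranking the motifs by this probability would approximate the enumeration step described in the text.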

The algorithm was applied to genome-scale protein interaction datasets and produced tens of candidate linear motifs in yeast, fly, nematode and human. Three of the novel predictions were also confirmed experimentally via fluorescence anisotropy (Neduva et al. 2005).
