Introduction

Protein sequences are usually determined by sequencing the corresponding cDNA. Although this approach is efficient, it is unable to account for posttranslational covalent modifications such as the oxidation of cysteines forming disulfide bridges. In the era of genome-wide sequencing projects the insufficiency of this type of information is becoming more apparent. Statistics on recently sequenced organisms indicate that the function of 22-60% of putative reading frames is unknown.1,2 To reveal the full covalent structure of a protein, either the expressed or extracted protein must be analyzed experimentally or its structure needs to be determined. Both of these approaches are time-consuming.3-5 The sequence databases are approximately 100-fold larger than the three-dimensional protein databases and the gap is growing rapidly.

1 M. A. Marti-Renom, A. Stuart, A. Fiser, R. Sánchez, F. Melo, and A. Sali, Annu. Rev. Biophys. Biomol. Struct. 29, 291 (2000).

2 J. Cedano, P. Aloy, J. A. Pérez-Pons, and E. Querol, J. Mol. Biol. 266, 594 (1997).

The oxidation state of cysteine plays an important role in protein structure and function. In its thiol form, cysteine is the most reactive amino acid under physiological conditions, and is often used for adding fluorescent groups and spin labels.6 In oxidized forms, cysteines form disulfide bonds, which are the primary covalent cross-links found in proteins and which stabilize the native conformation of a protein. Thus accurate predictions of the oxidation state of cysteines would have numerous applications, for example, in engineering stabilizing cystines or reactive thiol groups,7-11 in locating key reactive thiol groups in enzymatic reactions,12 and in determining topologies to aid in three-dimensional structure predictions.13

In the early 1990s two methods predicting disulfide bond-forming cysteine residues were published.14,15 Both methods used sequence information hidden in the specific sequence environment of cysteines and half-cystines. One method employed a neural network (NN) to recognize disulfide-bonded cysteine,14 whereas the other method performed a statistical analysis of the amino acid frequencies in the sequence environment of cysteine.15 The NN method, tested on an independent data set, achieved 81% accuracy, whereas the statistical method performed at 71% prediction accuracy as tested by a jack-knife procedure.

An important development in sequence-based prediction methods was the incorporation of evolutionary information (see, e.g., Fariselli et al.16). This important factor is exploited by two methods to predict the bonding state of cysteine residues. The method of Fariselli et al. uses a neural network approach, as does the method of Muskal et al.,14 but incorporates evolutionary information by feeding the neural network multiple sequence alignments.16 This method has achieved a slightly higher accuracy than either of the two methods published previously. They also illustrate that the achieved higher accuracy is primarily due to the incorporation of evolutionary information and is only partly due to the fact that the databases have increased more than 10-fold during the last decade. The importance of evolutionary information is illustrated in our more recent work,17 which shows that a relatively simple conservation analysis of multiple sequence alignments can make a sensitive distinction between chemically different cysteine residues, achieving a prediction accuracy above 82%. Another conclusion of the analysis is that the natural borderline between differently conserved cysteines lies in between the different oxidation states rather than between cysteines forming or not forming disulfide bridges. In other words, oxidized cysteines such as those bound to cofactors or to ligands or cysteines participating in binding sites are as conserved as cysteines participating in disulfide bonds. From the viewpoint of conservation it does not seem to matter whether certain cysteine residues are important for structural or for functional reasons.

3 A. Kremser and I. Rasched, Biochemistry 33, 13954 (1994).

4 H. R. Morris and P. Pucci, Biochem. Biophys. Res. Commun. 126, 1122 (1985).

5 T. W. Tannhauser, Y. Konishi, and H. A. Scheraga, Anal. Biochem. 138, 181 (1984).

6 T. E. Creighton, "Proteins: Structures and Molecular Properties," 2nd Ed., p. 162. W. H. Freeman, New York, 1993.

7 M. Matsumura and B. W. Matthews, Methods Enzymol. 202, 336 (1991).

8 J. Clarke and A. R. Fersht, Biochemistry 32, 4322 (1993).

9 N. E. Zhou, C. M. Kay, and R. S. Hodges, Biochemistry 32, 3178 (1993).

10 J. Eder and M. Wilmanns, Biochemistry 31, 4437 (1992).

11 A. Yokota, K. Izutani, M. Takai, Y. Kubo, Y. Noda, Y. Koumoto, H. Tachibana, and S. Segawa, J. Mol. Biol. 295, 1275 (2000).

12 V. B. Ritov, R. Goldman, D. A. Stoyanovsky, E. V. Menshikova, and V. E. Kagan, Arch. Biochem. Biophys. 321, 140 (1995).

13 I. Simon, L. Glasser, and H. A. Scheraga, Proc. Natl. Acad. Sci. U.S.A. 88, 3661 (1991).

14 S. M. Muskal, S. R. Holbrook, and S. H. Kim, Protein Eng. 3, 667 (1990).

15 A. Fiser, M. Cserzo, E. Tüdös, and I. Simon, FEBS Lett. 302, 117 (1992).

16 P. Fariselli, P. Riccobelli, and R. Casadio, Proteins 36, 340 (1999).

Here we briefly review the differences between oxidized and reduced cysteines on the surface and in the interior of proteins, the differences in the surrounding microenvironments, and the occurrence of different oxidation states of cysteine in secondary structural elements. We also analyze the types of secondary structural elements linked by a disulfide bridge, the most dominant form of the oxidized state, and the correlation between the cellular location of a protein and the oxidation state of its cysteines. Next, a method is explained for predicting the covalent state of cysteine by analyzing residue conservation of multiple sequence alignments. The results demonstrate that the analysis of multiple sequence alignments is an efficient tool for distinguishing oxidized cysteines from those with reactive sulfhydryl groups.
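To make the idea of conservation analysis concrete, the following is a minimal sketch and not the method described in this chapter. It assumes a multiple sequence alignment supplied as equal-length strings with "-" for gaps, takes the first sequence as the query, and scores each query cysteine by the fraction of aligned sequences that also carry cysteine in that column. The function names and the 0.9 cutoff are hypothetical choices made only for illustration.

```python
from typing import Dict, List


def cysteine_conservation(alignment: List[str]) -> Dict[int, float]:
    """Score each cysteine of the query (first) sequence.

    Returns a mapping from the 0-based position in the ungapped query
    to the fraction of aligned sequences with cysteine in that column.
    Gaps count as non-cysteine (an assumption of this sketch).
    """
    if not alignment:
        raise ValueError("alignment is empty")
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    query = alignment[0].upper()
    scores: Dict[int, float] = {}
    query_pos = 0  # position in the query with gaps removed
    for col, residue in enumerate(query):
        if residue == "-":
            continue  # gap in the query: no residue to number
        if residue == "C":
            column = [row[col].upper() for row in alignment]
            scores[query_pos] = column.count("C") / len(column)
        query_pos += 1
    return scores


def predict_oxidized(scores: Dict[int, float],
                     threshold: float = 0.9) -> Dict[int, bool]:
    """Label a cysteine as oxidized when its conservation reaches the
    threshold. The default cutoff is hypothetical, not a published value."""
    return {pos: frac >= threshold for pos, frac in scores.items()}


if __name__ == "__main__":
    toy_alignment = ["ACDC-K", "ACDSGK", "GCECGK", "ACD-GK"]
    conservation = cysteine_conservation(toy_alignment)
    print(conservation)
    print(predict_oxidized(conservation))
```

In this toy alignment the first query cysteine is present in every sequence, whereas the second is replaced or missing in half of them, so only the first would be labeled as oxidized under the assumed cutoff. A practical analysis would also need sequence weighting and a calibrated threshold, neither of which is attempted here.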
