Prediction Method

The conservation analyses of the multiple sequence alignments suggest a simple and efficient prediction approach. As discussed, the covalent state of a cysteine is determined almost exclusively by the location of the protein. In our representative set there is hardly a single protein in which oxidized and reduced cysteines occur together, except when a cysteine is covalently bonded to heteroatoms, prosthetic groups, or other amino acids in active sites, etc., so that the cysteine is also oxidized. Therefore, we use the following criterion: if the larger fraction of the predicted cysteines belongs to one oxidation state (cysteines with the higher conservation score being assigned to the oxidized group, and those with the lower conservation score to the reduced group), then the other cysteines in the same molecule are assumed to be in the same oxidation state.

36 M. Matsumura and B. W. Matthews, Science 243, 792 (1989).
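The majority criterion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `predict_by_majority`, the explicit `threshold` argument, and the convention that a score exactly at the threshold counts as oxidized are not taken from the original method description.

```python
from typing import List, Optional


def predict_by_majority(scores: List[float], threshold: float) -> Optional[str]:
    """Assign one oxidation state to every cysteine of a protein.

    scores    -- relative conservation score of each cysteine (hypothetical input)
    threshold -- score separating the reduced (below) from the oxidized group
    Returns "oxidized", "reduced", or None when the two groups are equal in size.
    """
    # Assumption: a score exactly at the threshold is counted as oxidized.
    n_reduced = sum(1 for s in scores if s < threshold)
    n_oxidized = len(scores) - n_reduced
    if n_oxidized > n_reduced:
        return "oxidized"
    if n_reduced > n_oxidized:
        return "reduced"
    return None  # tie: a separate rule is needed
```

A tie returns `None` rather than a state, so the caller has to apply a tie-breaking rule explicitly.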

If the numbers n of predicted reduced and predicted oxidized cysteines in a protein are equal, the relative conservation score must be taken into account. The relative conservation scores are averaged separately over the predicted bonded cysteines (half-cystines) and over the predicted free cysteines, the logarithm of each average is taken, and the absolute values are compared. Mathematically, if

$$\left|\ln \langle C_r \rangle_{\mathrm{ox}}\right| > \left|\ln \langle C_r \rangle_{\mathrm{red}}\right|$$

where $\langle C_r \rangle_{\mathrm{ox}}$ and $\langle C_r \rangle_{\mathrm{red}}$ are the average relative conservation scores of the predicted oxidized and predicted reduced cysteines, respectively, then the cysteines are predicted to be oxidized; otherwise they are reduced. The overall mean of the relative conservation score is 1.27. This normalized score is not sensitive to the number of sequences in the alignment or to their similarity.
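The tie-breaking rule described above (compare the absolute values of the logarithms of the two group averages) might look as follows. This is a sketch, not the authors' implementation: the function name `break_tie` is hypothetical, and the direction of the comparison (the oxidized group wins when its absolute log-average is the larger one) is an assumption read from the surrounding prose.

```python
import math
from typing import List


def break_tie(ox_scores: List[float], red_scores: List[float]) -> str:
    """Decide the state of a protein with equally many predicted
    oxidized and reduced cysteines.

    ox_scores  -- relative conservation scores of the predicted half-cystines
    red_scores -- relative conservation scores of the predicted free cysteines
    Both lists must be non-empty and have positive averages,
    otherwise math.log raises ValueError.
    """
    mean_ox = sum(ox_scores) / len(ox_scores)
    mean_red = sum(red_scores) / len(red_scores)
    # Assumed direction: the group whose average lies further from 1
    # on a logarithmic scale determines the prediction.
    if abs(math.log(mean_ox)) > abs(math.log(mean_red)):
        return "oxidized"
    return "reduced"
```

For example, averages of 1.9 (oxidized group) and 0.85 (reduced group) give absolute logarithms of about 0.64 and 0.16, so the protein would be called oxidized under this reading.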

The efficiency of the prediction was tested by the jack-knife procedure. In each step one alignment was removed, and the average relative conservation scores of half-cystines and of cysteines were calculated from the remaining alignments. The mean of these two averages gave the threshold: if the larger fraction of the cysteines in the tested protein fell below the threshold, every cysteine was predicted to be in the reduced state; otherwise every cysteine was considered to be oxidized. When the numbers of predicted oxidized and reduced cysteines were equal, we considered the absolute average deviation from the threshold determined in that step from the remaining part of the set, as described above. In this way it is possible to define the covalent state of 75.8% of cysteines (119 good vs 38 bad predictions) and 89.8% of half-cystines (141 good vs 16 bad predictions) in a jack-knife test (overall average, 82.8%). If we used a constant threshold (1.27) throughout the test, the overall efficiency rose above 84%.
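The jack-knife test can be illustrated with a short leave-one-out loop. This is a minimal sketch under several assumptions: each protein is represented as a list of hypothetical `(score, is_oxidized)` pairs, every protein has at least one cysteine, the remaining set always contains both states, and the tie rule is read as "the group whose average deviates more from the threshold wins".

```python
from statistics import mean
from typing import List, Tuple

# One protein = list of (relative conservation score, is_oxidized) pairs.
Protein = List[Tuple[float, bool]]


def loo_threshold(train: List[Protein]) -> float:
    """Midpoint between the average scores of half-cystines and free cysteines."""
    ox = [s for p in train for s, is_ox in p if is_ox]
    red = [s for p in train for s, is_ox in p if not is_ox]
    return (mean(ox) + mean(red)) / 2.0


def predict_protein(scores: List[float], threshold: float) -> bool:
    """Return True if all cysteines of the protein are predicted oxidized."""
    above = [s for s in scores if s >= threshold]
    below = [s for s in scores if s < threshold]
    if len(above) != len(below):
        return len(above) > len(below)
    # Tie (assumed reading): compare absolute average deviations from the threshold.
    return abs(mean(above) - threshold) > abs(mean(below) - threshold)


def jackknife(dataset: List[Protein]) -> float:
    """Fraction of cysteines whose state is predicted correctly."""
    correct = total = 0
    for i, held_out in enumerate(dataset):
        train = dataset[:i] + dataset[i + 1:]  # leave one protein out
        threshold = loo_threshold(train)
        predicted_ox = predict_protein([s for s, _ in held_out], threshold)
        for _, is_ox in held_out:
            correct += int(predicted_ox == is_ox)
            total += 1
    return correct / total


# Invented toy data, for illustration only.
toy = [
    [(1.9, True), (1.7, True)],
    [(2.1, True), (1.8, True), (0.9, True)],
    [(0.7, False), (0.8, False)],
    [(0.6, False), (1.6, False)],
]
accuracy = jackknife(toy)
```

Replacing `loo_threshold(train)` with a fixed value would correspond to the constant-threshold variant mentioned in the text.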

Three types of mispredictions can occur. First, some free cysteines are strongly conserved for functional reasons, as in the case of the bilin-binding protein (1BBPA), plastocyanin (1PCY), and ferredoxin (3FXC). If we exclude from the prediction those free cysteines that are conserved for functional reasons, the prediction accuracy for cysteines increases by nearly 8%, which means an overall increase in prediction accuracy of 4%. However, in this case 17% of the cysteine residues are excluded from the prediction. Second, the method fails if both forms of cysteine occur in the same protein, for example, oxidoreductase (2SODB). Third, misprediction can occur when the normalized conservation of the cysteines is near the prediction threshold, as in cytochrome c (155C), serine protease (2ALP), and azurin (2AZAA). If the Cr values of cysteines and cystines are close to one another (near the mean), this usually results from an uninformative alignment: one that either contains too few sequences or contains sequences so similar that Cr does not vary enough along the sequence.
