
Figure 51. Non-superimposable objects. Mirror images of gloves and corkscrews. Mirror planes are indicated as gray lines.

The same would hold for two corkscrews that are identical except for having right or left-handed threads, or two cars that have the steering wheel on the left-side looking forward or the right-side. You would be very surprised to enter your car and find the steering wheel on the side opposite to the usual one, and you would know that the two cars are different.

A hypothetical molecule having four different groups on carbon is shown in both possible forms in Figure 52.

Figure 52. Two molecules having four different groups attached to the central carbon atom which differ only in the spatial arrangement of those groups. The two molecules, A and A', are mirror images of each other.

The two possible mirror image forms of the α-amino acid alanine are shown in Figure 53.


Figure 53. Enantiomers of the α-amino acid alanine.

The two structures A and A' in Figure 52 are symmetrical about a mirror plane, but they cannot be superimposed and, therefore, must be different (see Figure 54). The same is true for the α-amino acids shown in Figure 53. These amino acids are called (S)- and (R)-alanine to distinguish them; they are isomers that have different biological and biochemical properties. One, the (S)-isomer, is naturally occurring and is one of the twenty amino acids coded for by DNA and used as a building block for proteins. The other isomer, (R), is a different compound that cannot be substituted for natural (S)-alanine in living systems.






Figure 54. The two molecular structures on the left (A and A) are superimposable, whereas the two structures on the right (A and A') are not.

Any two mirror-image isomers of different "handedness", or chirality, are generally called "enantiomers". Such molecules do not have a plane or point of symmetry. Many important therapeutic agents and naturally occurring compounds possess chirality. The chirality can involve one or more sp³ centers of chirality.
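The idea that mirror images of a carbon bearing four different groups cannot be superimposed can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the coordinates are idealized tetrahedral positions chosen for convenience, the group names W, X, Y and Z are hypothetical, and the signed volume used here is a simple geometric indicator of handedness, not the formal rule set used to assign (R) and (S) labels.

```python
# Signed volume of three substituents around a central atom.
# Reflection through a mirror plane flips the sign, which is one way
# to see that A and A' (Figure 52) have opposite handedness.

def subtract(p, q):
    """Component-wise difference p - q of two 3D points."""
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def signed_volume(center, a, b, c):
    """Scalar triple product (a-center) . ((b-center) x (c-center))."""
    u, v, w = subtract(a, center), subtract(b, center), subtract(c, center)
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]

def mirror_x(p):
    """Reflect a point through the plane x = 0."""
    return (-p[0], p[1], p[2])

# Hypothetical, idealized tetrahedral positions for four different groups.
center = (0, 0, 0)
groups = {"W": (1, 1, 1), "X": (1, -1, -1), "Y": (-1, 1, -1), "Z": (-1, -1, 1)}

mirrored = {name: mirror_x(pos) for name, pos in groups.items()}

volume_a = signed_volume(center, groups["W"], groups["X"], groups["Y"])
volume_a_prime = signed_volume(
    mirror_x(center), mirrored["W"], mirrored["X"], mirrored["Y"]
)

print("A :", volume_a)
print("A':", volume_a_prime)
```

Because rotations and translations never change the sign of this quantity, no amount of turning A in space can make it coincide with A'.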

Conformations of Molecules

The geometry of the ring in cyclohexane is unlike that of planar benzene. Planar cyclohexane would involve the vertex angles of the regular hexagon of 120° rather than the preferred tetrahedral angle of 109.5°.


Figure 55. The chair (X) and boat (Y) conformations of cyclohexane. The chair conformation is significantly more stable than the boat conformation at room temperature.

The distortion from 109.5° to 120° is so destabilizing that the ring distorts to a non-planar, puckered form with vertex angles of 109.5° (structures X and Y, Figure 55).

Structure X is called a chair form (chair conformation) to distinguish it from another nonplanar structure (Y) that also has 109.5° vertex angles, but is less stable (by about 6 kcal/mol). This energy difference is so large that more than 99.99% of cyclohexane molecules have a chair form at room temperature. The structures X and Y are termed conformers to indicate that the molecules have different shapes.
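The link between the energy difference of about 6 kcal/mol and the figure of more than 99.99% can be checked with a Boltzmann estimate. This is a minimal sketch that assumes a simple two-state model (chair and boat only), a temperature of 298.15 K, and no entropy or degeneracy corrections; the function name and defaults are illustrative choices.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3

def chair_fraction(delta_e_kcal=6.0, temperature_k=298.15):
    """Fraction of molecules in the lower-energy (chair) state.

    Two-state Boltzmann model: the boat/chair population ratio is
    exp(-delta_E / (R * T)).
    """
    boat_over_chair = math.exp(-delta_e_kcal / (R_KCAL * temperature_k))
    return 1.0 / (1.0 + boat_over_chair)

fraction = chair_fraction()
print(f"Chair population at room temperature: {fraction:.4%}")
```

Under these assumptions the chair population comes out at about 99.996%, consistent with the statement that more than 99.99% of the molecules adopt the chair form.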

The Importance of Molecular Shape

The shape and size of a molecule are very important in determining its affinity for a biomolecular target. In general, the binding between the two increases with increasing area of van der Waals contact attraction and also with the degree of hydrogen bonding. The better the molecular fit, the stronger is the affinity. It is also true that a therapeutic agent in aqueous solution will be stabilized by hydrogen bonding to water and dipolar solvation. Thus, there is a trade-off for therapeutic agents: they must be sufficiently well solvated to be soluble in water, but not so strongly solvated that they cannot be pulled from solution by the target biomolecule.

Noncyclic organic molecules tend to be quite flexible because of the low energy barrier to rotation about single bonds. For this reason, most therapeutically useful structures possess cyclic subunits with a modest number of preferred conformations, or even just one. The conformation of prednisone, a very important anti-inflammatory and immunosuppressive drug, is shown in Figure 56, which also shows the preferred conformation of glucose.

Figure 56. The preferred conformations of glucose and prednisone.

The polycyclic framework of prednisone is quite rigid and gives the molecule a characteristic shape. At the same time, a number of polar functional groups are positioned at specific sites in space so that they can bind optimally to the target biomolecule.



There are many small non-medicinal compounds encountered in everyday life that are required by the body. A sampling of these is shown on this and the following page in the space-filling format.

Color key for the space-filling models: carbon, hydrogen, phosphorus, nitrogen, sulfur and cobalt.

Molecules shown: nitrogen monoxide, nitrogen dioxide, dinitrogen monoxide, sulfur dioxide, citric acid, ethylene oxide, hydrogen peroxide, acetic acid, menthol, nicotine, nitroglycerin, serotonin, vitamin A, vitamin C, testosterone, vitamin D, cytosine, thymine and cholesterol.




Serum Albumin



Proteins are very large molecules formed by the joining of many α-amino acid subunits through amide linkages. A subsection of a protein chain can be represented in a general and highly simplified way by a two-dimensional line drawing (Figure 1).


Figure 1. Portion of a protein/polypeptide chain.

Proteins are essential to life and perform innumerable critical functions. For instance, they serve as catalysts for the formation of the molecules of life (biosynthesis), for the disassembly and recycling of such molecules and for energy production. They act as signaling molecules to transfer information and as regulators of cell function or division. They are key structural components in all living organisms.

Proteins are generally large molecules of sizes ranging from a hundred to several thousand α-amino acid subunits. Smaller collections of linked α-amino acids (10-100) are called peptides. Proteins are biosynthesized from twenty different α-amino acid building blocks by an elaborate and rigorously controlled process that is guided and directed by the genetic material deoxyribonucleic acid (DNA), its offspring ribonucleic acid (RNA) and other proteins. The exact sequence of the various α-amino acids along the protein chain is determined by the sequences of purine and pyrimidine bases in DNA and the genetic code. An almost infinite number of different protein molecules are possible from the 20 different amino acids by variations of α-amino acid sequence and protein size.

The Proteinogenic Amino Acids

The twenty α-amino acids that are the building blocks for proteins differ in the carbon group attached to the α-carbon (Figure 2). They all have the configuration at the α-carbon atom which is shown in Figure 2, and not the mirror image arrangement.

One group of α-amino acids consists of those (beyond glycine) in which R contains tetracoordinate carbons and no functional groups. These α-amino acids are shown in Figure 3.

Figure 2. General structure of α-amino acids (panels: α-amino acid and dipolar ion). These compounds exist predominantly in the form of dipolar ions.

Figure 3. α-Amino acids with hydrophobic (nonpolar) side chains. Structures shown include glycine (Gly), alanine (Ala), valine (Val) and proline (Pro).

Four of the 20 fundamental proteinogenic α-amino acids possess either a benzenoid or heterobenzenoid ring as part of the variable group R attached to the α-carbon (Figure 2). The structures of these α-amino acids are displayed in Figure 4. Each of these α-amino acids plays a unique role in living systems. For example, the hydroxyl group on the benzenoid ring of tyrosine can be transformed into a phosphate ester, and if this happens, the three-dimensional shape (conformation) of a protein may change drastically. This phenomenon is one of the biochemical ways in which cells in living systems receive information from one another and from their environment. In addition, the five-membered imidazole ring of histidine has basic properties because of the nitrogen atoms in it and is extremely important in the catalysis of biochemical reactions by enzymes, a key family of proteins.

Figure 4. Amino acids with aromatic side chains: phenylalanine (Phe), tryptophan (Trp), tyrosine (Tyr) and histidine (His).

The rings of phenylalanine and tryptophan are nonpolar and oily (hydrophobic). They serve several critical functions. Because of their nonpolar nature, size and shape they can help stabilize a particular protein conformation by van der Waals attraction with themselves, each other or other nonpolar side chains. Such an interaction is shown in Figure 5.

Figure 5. An example of attractive π-π interaction between the aromatic rings of phenylalanine and tryptophan. The phenylalanine (Phe) and tryptophan (Trp) side chains are labeled.

Hydrophobic groups such as those in Phe, Trp, Leu and Ile tend to avoid the interface of proteins with water, i.e., the outside of a folded protein structure, and to be buried in the interior, where there is little or no water and where they pack close to one another.

Four of the proteinogenic amino acids contain hydroxyl or dicoordinate sulfur functional groups, as displayed in Figure 6. The hydroxyl groups of serine and threonine can be phosphorylated, as described above for tyrosine, resulting in a change of protein shape, properties and function. The remaining six α-amino acids are displayed in Figure 7.

Figure 6. Amino acids containing hydroxyl or dicoordinate sulfur functional groups: serine (Ser), cysteine (Cys), threonine (Thr) and methionine (Met).

The α-amino acids in Figure 7 all contain polar side chains and play a very critical role in enzymes and biochemical catalysis, and also in determining the three-dimensional structure of proteins. Since the side chain of arginine is generally positively charged, it is capable of donating a proton in hydrogen bonding or catalysis. The amino group of lysine is also uniquely reactive in a chemical sense and crucial to protein function.

Figure 7. Hydrophilic amino acids with polar side chains. Structures shown include arginine (Arg), aspartic acid (Asp), glutamic acid (Glu), asparagine (Asn) and glutamine (Gln).
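The four groupings of the twenty proteinogenic amino acids described above (Figures 3, 4, 6 and 7) can be written down as a small lookup table. This is a sketch for illustration only: the group names are hypothetical labels, and the placement of Leu, Ile and Lys is filled in from the text's mentions of them and standard biochemistry rather than from the figure labels.

```python
# Grouping of the 20 proteinogenic amino acids, following Figures 3, 4, 6 and 7.
# Group names are illustrative; Leu, Ile and Lys are assumed members
# (they are named in the text but their figure labels are not reproduced).
GROUPS = {
    "nonpolar": ["Gly", "Ala", "Val", "Leu", "Ile", "Pro"],       # Figure 3
    "aromatic": ["Phe", "Trp", "Tyr", "His"],                     # Figure 4
    "hydroxyl_or_sulfur": ["Ser", "Cys", "Thr", "Met"],           # Figure 6
    "polar": ["Arg", "Lys", "Asp", "Glu", "Asn", "Gln"],          # Figure 7
}

# Reverse lookup: three-letter code -> group name.
GROUP_OF = {
    residue: group for group, members in GROUPS.items() for residue in members
}

def composition(residues):
    """Count how many residues of a peptide fall into each group."""
    counts = {group: 0 for group in GROUPS}
    for residue in residues:
        counts[GROUP_OF[residue]] += 1
    return counts

example = ["Ala", "Gly", "Phe"]
print(len(GROUP_OF), "amino acids in", len(GROUPS), "groups")
print(composition(example))
```

A table like this makes it easy to check that the four groups account for all twenty building blocks, and to summarize the side-chain character of a short peptide.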

The Simplest Peptide

The dipeptide glycylglycine consists of two glycine units connected through an amide (peptide) bond. The amide functional group involves electron delocalization over three atoms as shown in Figure 8 (see also page 15). The bonds to the amide subunit lie in one plane, as imposed by the π-bond between C and N, and the C-N bond distance is shortened (from 1.46 Å to 1.32 Å).



Figure 8. Glycylglycine and the structure of the peptide bond. The atoms that are part of the planar amide subunit are indicated with red arrows in the ball-and-stick models. Labels in the figure: glycylglycine; side view; 1.24 Å; (no rotation); (free rotation).

The planar amide linkage is a critical organizing element of protein structure because of its rigidity. In contrast, rotation about the bonds attaching carbon to it is rapid and lends flexibility to protein structures.

Figure 9 displays a general formula for a tripeptide and the formula for the specific tripeptide, Ala-Gly-Phe.

Figure 9. Structures of a generic tripeptide and of the specific tripeptide Ala-Gly-Phe (alanyl-glycyl-phenylalanine).

Amino Acid Sequence (Primary Structure) of Proteins

The sequence of the individual amino acids in a polypeptide or protein, the primary structure, is critical to its preferred three-dimensional shape and properties. It is the very large number of possible sequences that enables the existence of an unimaginably large number of possible proteins (20^100 for a 100 amino acid protein). There are six (3 × 2 × 1 = 3!) possible primary structures for the tripeptides containing a single Ala, Gly and Phe, as shown in Figure 10.

Figure 10. Six tripeptides, including Gly-Ala-Phe, may be generated using only three amino acids, Ala, Gly and Phe.
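The counting arguments for primary structures can be reproduced in a few lines. This is a small sketch under the assumptions stated in the text: twenty possible amino acids at each position of a chain, and tripeptides that use each of Ala, Gly and Phe exactly once; the function name is an illustrative choice.

```python
import math
from itertools import permutations

def sequence_count(length, alphabet_size=20):
    """Number of distinct chains of the given length when each position
    can hold any of `alphabet_size` amino acids."""
    return alphabet_size ** length

# All orderings of three different residues, written N-terminus first.
tripeptides = ["-".join(order) for order in permutations(["Ala", "Gly", "Phe"])]

print(len(tripeptides), "tripeptides:", ", ".join(tripeptides))
print("3! =", math.factorial(3))

n_100 = sequence_count(100)
print("Digits in the number of 100-residue sequences:", len(str(n_100)))
```

The number of 100-residue sequences has 131 digits, i.e. it is on the order of 10^130, which is why the space of possible proteins is described as unimaginably large.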

Preferred Conformations (Secondary Structure) of Proteins: The α-Helix, β-Turn and β-Strand as Motifs

The most common structural motif found in proteins is the α-helix. Figure 11 shows an α-helical polypeptide with 30 amino acid residues in various representations. In structure A the elements in an α-helix are shown, with carbon as gray, oxygen as red, nitrogen as blue and sulfur as yellow. Structure B displays the backbone of the polypeptide without any of the side chains (for clarity). Structure C illustrates the polypeptide chain with stabilization by hydrogen bonds between the carbonyl groups (C=O) and amino groups (N-H).

Figure 11. Three different representations of an α-helical polypeptide with 30 amino acid residues. Structure A displays the entire polypeptide with the side chains. Structure B shows only the backbone of the polypeptide. Structure C illustrates that the α-helix is held together by a network of hydrogen bonds.

For clarity, peptides and proteins are often represented by a ribbon diagram in which side chains are omitted and the backbone is replaced by a ribbon. Figure 12 shows how these two representations are related to the structure of the polypeptide.

The α-helices in proteins are all right-handed, with the backbone of the polypeptide chain receding from view in a clockwise fashion (Figure 13). The α-helical structure is compact and highly stabilized by hydrogen bonding between the amide C=O and N-H groups.

Figure 12. Generation of a ribbon diagram of a polypeptide. Structure D shows the entire polypeptide with the backbone. Structure E shows the backbone and its tracing. Structure F shows only the ribbon diagram of the original polypeptide.

Figure 13. An α-helix in ribbon and space-filling representations (side-view cartoon diagram and side-view space-filling diagram).

α-Helices can form bundles when the aggregate is stabilized by attractive interactions between neighboring side chains. An important example of such aggregation is the G protein-coupled receptor, rhodopsin, which has seven α-helical domains (Figure 14). Each helix is shown in a different color (see page 78 for more on G proteins).

Figure 14. Ribbon and space-filling diagrams of the G protein-coupled receptor rhodopsin. The seven α-helical domains (different colors) occur embedded in cellular membranes.

The other structural motif that is abundant in proteins is the β-sheet, which is formed by the stabilizing hydrogen bonds between the backbones of two β-strands (Figure 15).

Figure 15. β-Sheet stabilized by hydrogen bonding between two β-strands connected by a β-turn.

β-Sheets occur in proteins both in an antiparallel arrangement, as in Figures 15 and 16, and in a parallel sense.

Figure 16. Simplified representation of a β-sheet. The arrows point in the direction of the C-terminus of the polypeptide chain.

The Structure of Insulin

One of the simplest and most important protein-type structures is the hormone insulin. Produced in the islet cells of the pancreas, it is essential for life and health and has many different actions in the body. It is a fundamental regulator of energy production, metabolism and muscle function. Insulin is composed of two chains of 21 and 30 amino acids, held together by two disulfide bonds (S-S) formed from the pairing of S-H groups of four of its six cysteines. The other two, which are located in the smaller chain, form another disulfide bond, giving a tricyclic structure. The X-ray crystal structure of insulin, determined by Dorothy Hodgkin and her team in 1969, revealed a molecule with three short α-helices, which are indicated in the shorthand ribbon diagram of Figure 17. Many of the nonpolar (hydrophobic) side chains of insulin are buried in the interior region. There are also hydrophobic groups outside the core in a nonpolar surface which are involved in the binding of insulin to its receptor. It is the monomeric (single molecule) form of insulin that circulates in the body and activates the insulin receptor to produce the vital biological responses.

Figure 17. Ribbon representation of monomeric insulin. The α-helices are shown in green and the three disulfide (S-S) bridges are represented as yellow spheres.

An example of a larger and more complicated protein, TolC, is shown in Figure 18. TolC is a bacterial protein that plays a role in bacterial resistance to antibiotics. The TolC protein combines with two other proteins (MexA and AcrB) to form a giant assembly (or protein machine) to pump antibiotics or other molecules harmful to a bacterium out of the organism (see page 143 for more on drug resistance).


Figure 18. The protein TolC is a component of the bacterial drug efflux pump and has several α-helical domains (red) as well as a number of β-sheets (yellow).

Some Other Aspects of Protein Structure

There are an enormous number of possible conformations for a protein with a particular amino acid sequence because there can be fast rotation about the N-Cα and Cα-CO single bonds of each amino acid subunit. However, most naturally occurring proteins exist in one, or a small number, of preferred conformations. The reason for this is that there are many factors that operate to favor a particular three-dimensional geometry. Among the most important determinants of protein shape are the following:

(1) Sulfur-sulfur linkages may form between the SH groups of two cysteine subunits in the protein. These disulfide (S-S) bridges generate a ring that greatly limits molecular shape.

(2) The interior of proteins is generally rich in hydrophobic groups. Polar or charged side chains tend to be exposed to water or on the surface of the protein.

(3) Organized substructural motifs, such as the α-helix and the β-sheet (with both parallel and antiparallel β-strands), organize and favor a particular conformation for large subsections of the protein.

(4) These organized substructural motifs can adhere to one another through attractive interactions (H-bonding, electrostatic, van der Waals) leading to the favoring of a particular giant domain structure. For instance, attractions between α-helical subunits can result in the formation of α-helical bundles, such as the 7-helix bundle of rhodopsin.

(5) Repulsions between groups set in when they are closer than the optimal van der Waals contact distance. These repulsions, known as steric repulsions, occur basically because atoms behave as very hard objects once they are within a certain critical distance of one another. Although close packing is essential for intramolecular attraction of protein subsections, it is also limited by steric repulsion.

(6) Substructural motifs such as α-helices or β-sheets can be connected by short flexible loops or β-turns.

(7) α-Helices have polarity because the C=O groups all point in one direction along the helical axis, and they usually pack together in an antiparallel way.

Determination of the Structures of Proteins

The three-dimensional structures of proteins are most generally elucidated by the use of X-ray diffraction analysis using high-speed computers to decode the data on X-ray beam scattering by a single crystal of the protein. Since the first determinations of protein structures (myoglobin by John Kendrew and hemoglobin by Max Perutz, work recognized by the 1962 Nobel Prize in Chemistry), thousands of protein structures have been determined. The number of new known protein structures has been growing sharply year by year thanks to the convergence of several key circumstances.

(1) Modern molecular biology and protein science provide access to adequate quantities of pure protein.

(2) Improved techniques for the crystallization of proteins are available.

(3) The use of high-intensity, narrow-beam X-ray sources coupled to advances in analytical software and the ever increasing speed of computers has greatly simplified the determination of protein structure at atomic resolution.

The structure of a crystalline protein can now be determined in just a few weeks. The availability of a three-dimensional protein structure greatly assists in the discovery of molecules that can bind to the protein and affect its structure. Structure-guided design of therapeutic molecules is now a crucial tool for molecular medicine.



Methotrexate (Trexall™), used to treat rheumatoid arthritis, is shown in red bound in the active site of its target, the enzyme dihydrofolate reductase (see page 46; PDB ID: 1RG7). The image shows the full protein. The α-helical domains are colored cyan, the β-sheets are magenta and the loops are orange.

Imatinib (Gleevec™), used to treat leukemia, is shown in magenta bound in the active site of its target, the enzyme tyrosine kinase (see page 195; PDB ID: 1IEP). The image shows the full protein. The α-helical domains are colored red, the β-sheets are yellow and the loops are green.

Atorvastatin (Lipitor™), used for the reduction of LDL cholesterol levels, is shown in red bound in the active site of its target, the enzyme HMG-CoA reductase (see page 64; PDB ID: 1HWK). The top image shows the whole enzyme, whereas the bottom image is a close-up view.

Oseltamivir (Tamiflu™), used to prevent influenza A and B viral infections, is shown in magenta bound in the active site of its target, the viral enzyme neuraminidase (see page 150; PDB ID: 2HT8). The top image shows the whole enzyme, whereas the bottom image is a close-up view in which oseltamivir is colored red.
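The PDB IDs quoted in these captions identify the deposited coordinate files for each drug-target complex. The sketch below shows one way a reader might turn them into download links; it assumes the commonly used RCSB file-server address pattern (`https://files.rcsb.org/download/<ID>.pdb`), which is not given in the text, and it only builds the addresses without contacting any server. The dictionary and function names are hypothetical.

```python
import re

# Drug -> PDB ID, as quoted in the figure captions.
PDB_IDS = {
    "methotrexate": "1RG7",
    "imatinib": "1IEP",
    "atorvastatin": "1HWK",
    "oseltamivir": "2HT8",
}

# Classic PDB identifiers: one digit followed by three letters or digits.
_PDB_ID_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")

def pdb_download_url(pdb_id):
    """Build a download address for a PDB entry.

    The base address is an assumption about the RCSB file server,
    not something stated in the text.
    """
    if not _PDB_ID_PATTERN.match(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"

urls = {drug: pdb_download_url(pdb_id) for drug, pdb_id in PDB_IDS.items()}
for drug, url in urls.items():
    print(f"{drug:13s} {url}")
```

The downloaded coordinates can then be opened in any molecular viewer to reproduce ribbon and space-filling views like the ones described.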

