Data Mining Molecular Cloning and Characterization

For the identification and selection of a therapeutic target, several key issues must be considered. These issues include knowledge of the molecular basis of the disease, with regard to tissues, cells, or specific molecules that modulate the disease target [1]. In other words, knowledge about the links of clinical symptoms with specific cell types and molecules will be invaluable in developing potential molecular targets. In the past this knowledge was communicated through patents and manuscripts published in scientific and medical journals. As we enter the information age, with advanced computing and high-speed communication at our disposal, knowledge critical to the selection of molecular target information is readily available to anyone with Internet access. Still more information is available through the recent completion of the Human Genome Project that cataloged the genome sequence [2,3]. Therefore we continue to add to our knowledge the genetic information that may be useful for identifying molecular targets.

What remains relatively unchanged, however, is how one uses the information available in the literature or other sources to identify the drug candidate that will have an optimal effect on the therapeutic target. The strategy to clone and express disease-related genes also remains unchanged.

The ability to manipulate DNA, to analyze the genetic code, to transcribe DNA to RNA, and subsequently to translate RNA to protein in well-defined prokaryotic and eukaryotic cells is known as genetic engineering or recombinant DNA technology. The technology developed in the late 1960s and early 1970s continued to advance rapidly by integrating the results of basic studies from chemistry, biology, biochemistry, microbiology, genetics, pharmacology, and fermentation sciences [4,5]. Some of the key advances are listed below:

1. Basic understanding of DNA replication paves the way to isolate and manipulate genes of interest.

2. Elucidation of molecular processes involved in transcription and translation [6,7] leads to the development of a bacterial and mammalian protein expression system to mass-produce proteins.

3. Advances in B (antibody-producing) cell biology and tumor (e.g., myeloma) cell biology lead to the development of immortalized cells that overexpress a monospecific (monoclonal) antibody [8].

4. Advances in DNA and peptide sequencing techniques [9-14] lead to a large number of potentially useful gene and peptide sequences now widely available in the national databases.

5. Detailed understanding of gene expression (prokaryotic or eukaryotic plasmid vectors) and regulation (promoter, suppressor, feedback control, etc.) (Figure 4.1) allows large-scale manufacture of functional proteins, and forms the basis for more recent attempts at in vivo gene expression as part of gene therapy.

The first step in producing sufficiently pure and large enough quantities of new macromolecules or proteins is to clone the gene and express its protein in bacterial (prokaryotic) or mammalian (eukaryotic) cells. The detailed technologies and procedures to accomplish this are discussed in a number of recent publications and book chapters [4,5]. A brief description of terminology used in recombinant DNA applications is listed in Table 4.1. We will only discuss these techniques from the perspective of the strategies generally used to increase the efficiency of cloning, expression, and characterization of the recombinant products. Where appropriate, we will highlight advantages and limitations. For this purpose we will use the terms "macromolecule" and "protein" interchangeably. The isolation of the gene encoding the protein, and its expression and characterization, will be described as a sequential process. We must be mindful that this process requires multiple iterations and simultaneous execution of strategies. The process can be roughly divided into five steps:

1. Identify the best source for isolating the gene that encodes target protein.

2. Clone the gene.

3. Engineer an expression system for the respective protein.

4. Optimize the DNA sequences to enhance protein expression.

5. Verify the molecular and functional characteristics of the expressed recombinant protein.

Identify the Best Source for Isolating the Gene That Encodes Target Protein

To identify an optimal source for the isolation of a gene, one must know the tissues, cells, bacteria, or viruses that produce the highest possible copy numbers of the target genes in DNA or RNA transcripts. Large quantities of the target gene transcript are found in tissues or cells that express large quantities of the gene product. These tissues or cells can be used to isolate the messenger RNA that encodes the target protein. For example, isolation of genes for bacterial and viral proteins will require biologic sources enriched with the respective bacteria and virus-infected cells or tissues as starting materials to isolate DNA and RNA. For most therapeutic candidates that are endogenous proteins, the tissues responsible for producing these proteins will be the best sources to isolate genes of interest. Alternatively, one can use inducible elements of established tissue-culture cells to increase the mRNA that encodes the target protein. Inducible elements increase protein expression by several-fold. Additional details on using gene-activation signals to enhance gene isolation are listed in Table 4.2.

From the enriched source of tissues or cells, one can isolate RNA and DNA of target genes. These tissues or cells also allow one to purify small quantities of the putative proteins, thereby permitting generation of antibodies reactive to these proteins for downstream verification of the protein generated by the recombinant vectors using molecular cloning strategies.

TABLE 4.1. Terminology used in recombinant DNA research

Term | Definition
Genomic DNA | All DNA sequences of an organism
cDNA (complementary DNA) | DNA copied from a messenger RNA molecule that encodes for respective amino acid sequences
Plasmid | A small extrachromosomal circular DNA molecule capable of reproducing independently in a host cell, prokaryote or eukaryote
… | A plasmid containing either a cDNA or genomic DNA sequence or …
Genomic clone | A host cell (usually bacterium) with a vector containing a fragment of genomic DNA from a different organism of interest
cDNA clone | A vector containing a cDNA molecule from another organism, which can be transcribed and translated into protein in selected host cells
Library | A complete set of genomic clones from an organism, or cDNA clones from one cell type
Restriction enzymes or … | Prokaryotic enzymes with exquisite sequence recognition of target DNA duplex and precise cleavage site

Figure 4.1. Schematic representation of protein synthesis beginning with the genetic information in DNA sequences within nuclear chromosomes (A), and concluding with post-translational protein modification (B). Some DNA gene sequences may contain introns or sequences that do not encode for the putative polypeptide and are processed within the nucleus through RNA splicing as exemplified by the well-studied β-globin gene RNA (insert). Other polypeptides such as those of viral gene products may express multiple proteins in a single polypeptide that is proteolytically cleaved or processed after protein translation. Additional post-translational modification may include protein glycosylation in which glycan residues are added. Terminal sugar residues, glucose or galactose, have been shown to play a significant role in the rate of protein clearance from blood circulation.

Figure 4.1 labels: displaced strand of DNA; intervening sequence; β-globin gene; transcription, cap formation, and poly A addition; primary transcript; splicing; β-globin mRNA; translation of mRNA to protein in endoplasmic reticulum; post-translational modification (glycosylation, proteolytic cleavage, acylation, disulfide bridges for proper folding).

Clone the Gene

Once the enriched source of a gene that expresses a target macromolecule is identified, three general strategies are commonly used to clone the gene. They are (1) complementary DNA (cDNA) cloning from mRNA, (2) cloning of the gene employing polymerase chain reactions (PCR), or PCR cloning, and (3) cloning the gene by first generating a genomic library, or genomic cloning. In some cases, a combination of all three strategies or a variation will be used to isolate DNA fragments and reconstruct the complete DNA sequence encoding the gene product.




Figure 4.1 (continued), panel notes: Proper glycosylation may be required for (1) protein conformation, (2) transit to extracellular or proper sites, (3) biologic activity, and (4) stability, i.e., biologic half-life (in cells and plasma, etc.). Final stage in the synthesis of a neuraminidate-containing carbohydrate unit of a glycoprotein (human transferrin), an Asn N-linked or Ser O-linked protein, with UDP-GlcNAc, UDP-Gal, and CMP-NAN as sugar donors; this terminal glycosylation takes place in the Golgi apparatus. Asymmetry of cell membranes: the luminal faces of the ER and other organelles correspond to the extracellular face of the plasma membrane. Some common sugars found in glycoproteins (including sialic acid).

TABLE 4.2. Gene-activation signals and tissue or cell sources that enhance isolation and identification of target genes in eukaryotes

Signal Class | Activation Signals | Target Cells, Tissues, and Genes
Hormones | Growth hormone | Many cells
Hormones | Prolactin | Secretory cells in breast tissue
Hormones | Estrogens | Liver, brain, reproductive organs
Hormones | Testosterone | Muscle, bone, skin, reproductive organs
Proteins (circulating or secreted factors) | Nerve growth factor | Differentiating nerve cells (e.g., axons)
Proteins (circulating or secreted factors) | Epidermal growth factor (EGF) | Many cultured cells, cells of skin and eyes
Proteins (circulating or secreted factors) | Interleukins or lymphokines | Leukocytes or white blood cells
Proteins (circulating or secreted factors) | … | Red blood cell precursors
Proteins (circulating or secreted factors) | … | Epithelial cells, white blood cells
Proteins (circulating or secreted factors) | Platelet-derived growth factors (PDGFs) | Fibroblast cells
Environmental nutritional signals | Amino acids, nucleotides, phosphatases, glycosidases | Lower eukaryotes (cell growth, hydrolytic functions, and replications)
Environmental nutritional signals | Modulation of glucogenesis and protein synthesis activity | Animal cells and metabolically active tissues
Environmental nutritional signals | Heat shock and stress | Induction or elimination of background mRNA, allowing isolation of specific proteins and mRNAs
Environmental nutritional signals | Toxins, drugs, carcinogens, heavy metals | Cytochrome P-450, monooxygenase enzymes, transporter proteins in liver, kidney, gut, lung, and other tissues
Environmental nutritional signals | Hemorrhagic and inflammatory compounds | White blood cells

cDNA Cloning. When mRNA of the target gene, such as the insulin gene in pancreatic tissue, can be obtained, one can use the enzyme reverse transcriptase to synthesize DNA complementary (cDNA) to the messenger RNA. Reverse transcriptase is the enzyme used by retroviruses to copy their RNA genome into DNA in completing their life cycle. Messenger RNA is only a small fraction of total RNA, but it is expressed in high concentrations in tissues that express the target protein. Therefore messenger RNA isolated from enriched sources is likely to contain RNA sequences that encode for the target protein.

Reverse transcriptase requires a primer of 10 to 15 bases complementary to the target messenger RNA to initiate cDNA synthesis. Most eukaryotic mRNAs end in a run of adenine bases (the poly-A tail). Therefore a stretch of complementary thymine bases (oligo-dT) can be used as a primer that binds to the mRNA template, as required for the reverse transcriptase to synthesize the cDNA. In the case of pancreatic mRNAs (Figure 4.2), the significantly higher level of mRNA for insulin compared with other proteins allowed success in isolating the insulin-specific cDNA. Subsequent insertion of the cDNA into a bacterial expression vector allowed the production of functional insulin that is now marketed as a successful therapeutic product (Figure 4.2).
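The oligo-dT priming step described above can be illustrated with a short Python sketch. It is a toy model under stated assumptions: the transcript is a made-up sequence, the 12-base primer length is simply a value inside the 10 to 15 base range given in the text, and the function names are hypothetical.

```python
# Base pairing used when an RNA template is copied into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def has_poly_a_tail(mrna: str, min_len: int = 12) -> bool:
    """Return True if the 3' end of the mRNA is a run of at least min_len adenines."""
    return mrna.endswith("A" * min_len)


def first_strand_cdna(mrna: str, primer_len: int = 12) -> str:
    """Return the first-strand cDNA, written 5'->3', primed by oligo-dT.

    The cDNA is the reverse complement of the mRNA, so it begins with the
    run of thymines that paired with the poly-A tail.
    """
    if not has_poly_a_tail(mrna, primer_len):
        raise ValueError("no poly-A tail long enough for the oligo-dT primer")
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


# Hypothetical transcript: a short coding stretch followed by a 15-base poly-A tail.
toy_mrna = "AUGGCCCUGUGGAUGCGC" + "A" * 15
cdna = first_strand_cdna(toy_mrna)
print(cdna)
```

A transcript without a poly-A tail raises an error here, which mirrors the point that oligo-dT priming depends on the tail being present.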

PCR Cloning. Unfortunately, not every gene yields measurable levels of mRNA. For these situations, amplification of specific DNA sequences is necessary. This was achieved by the invention of in vitro polymerase chain reaction (PCR) techniques to amplify cDNA using DNA primers specific for the target. With this technique, a complementary oligonucleotide probe corresponding to target DNA sequences can be used to isolate the cDNA [15,16].
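As a rough illustration of how a primer pair defines the amplified region, the following Python sketch finds the stretch of a template bounded by a forward and a reverse primer. The template and primers are invented for the example, and exact string matching stands in for primer annealing, which is a strong simplification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str):
    """Return the region bounded by the two primers, or None if either fails to match.

    The forward primer matches the given (top) strand directly. The reverse
    primer anneals to the top strand, so its reverse complement is what
    appears in the top strand downstream of the forward primer.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical template and primers, for illustration only.
template = "GGATCCATGGCCCTGTGGATGCGCCTCCTGTAAGAATTC"
product_seq = amplicon(template, forward="ATGGCCCTG", reverse="TTACAGGAG")
print(product_seq)
```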

Figure 4.2. Cloning of insulin from cDNA isolated from pancreas, the main tissue responsible for synthesis of insulin, and inserted into plasmid vectors that permit expression in host cells. Labels: insulin mRNA; reverse transcriptase; insulin cDNA; insert into plasmid; recombinant DNA molecule containing gene for insulin; transformed bacterium.

In the event that no partial DNA sequences are known, but functional target protein is available, we can identify the terminal amino acid sequence. The three-to-five amino acid sequence can be used to predict the corresponding DNA sequences based on the degenerate genetic code (Appendix V). The predicted DNA sequences serve as the primers required to amplify the target gene. The highly selective amplified DNA product generated by PCR can then be cloned into expression vectors for sequence and functional characterization.
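Because most amino acids are encoded by more than one codon, a short peptide corresponds to a pool of possible DNA sequences rather than a single one. The sketch below, which assumes the standard genetic code and uses hypothetical peptides and function names, enumerates that pool and counts its size.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(amino_acid: str) -> list[str]:
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the peptide."""
    count = 1
    for amino_acid in peptide:
        count *= len(codons_for(amino_acid))
    return count


def primer_pool(peptide: str) -> list[str]:
    """Enumerate every DNA sequence that encodes the peptide."""
    return ["".join(codons) for codons in product(*(codons_for(aa) for aa in peptide))]


# Hypothetical peptides: Met-Trp-Lys is nearly unambiguous, Met-Leu-Ser is not.
print(degeneracy("MWK"), primer_pool("MWK"))
print(degeneracy("MLS"))
```

Peptides rich in methionine and tryptophan (one codon each) give small pools, while leucine, serine, and arginine (six codons each) expand the pool quickly, which is one reason the choice of the amino acid stretch matters.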

Because the efficiency of PCR performed in vitro is very high, we no longer require bacterial replication to amplify the quantity of DNA. Therefore, this is the method of choice for characterizing and cloning the gene for final expression of the protein products (see Figure 4.3 for brief presentation of PCR). PCR gene amplification can also be used to increase low concentrations of cDNA produced by reverse transcribing of mRNA, as described in the previous section.
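The exponential character of PCR can be made concrete with a small calculation. The sketch below uses the idealized 2^n doubling per cycle; the `efficiency` parameter is an added assumption (not from the text) that lets the per-cycle yield fall below perfect doubling.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after a number of cycles: initial * (1 + efficiency) ** cycles.

    An efficiency of 1.0 is perfect doubling and gives initial * 2 ** cycles.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# A single template molecule over 25 to 35 cycles with ideal doubling.
for n in (25, 30, 35):
    print(n, f"{pcr_copies(1, n):.3g}")
```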

A limitation of this strategy is the fidelity of the enzymes used in PCR (reverse transcriptase and DNA polymerase) in producing consistently error-free DNA sequences. These enzymes typically introduce error at the rate of one base per 400 to 700 kilobases synthesized [17]. By this error rate the methodology allows error-free isolation of DNA sequences that encode up to 1333 amino acids. This translates to 180 kDa and covers most of the proteins in use as therapeutics today.

Genomic Cloning. In the event that the two strategies cited above do not provide a satisfactory outcome, a "shotgun" approach to gene cloning, otherwise known as genomic cloning, can be used to clone the entire genome of a cell (Figure 4.4). The entire genome of a cell is digested with a specific restriction nuclease to generate a very large number of fragments, which are inserted into millions of bacterial plasmid vectors, each then containing a unique and sometimes overlapping genomic DNA sequence. When these plasmids are introduced into bacterial cells, each of the millions of bacteria contains some of the genomic DNA sequences, and each plasmid is a genomic DNA clone; the entire collection of plasmids is known as a genomic library.
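The digestion step can be sketched in a few lines of Python. The example uses the EcoRI recognition site (GAATTC, cut after the first G) on a made-up linear sequence; it models only the top strand and ignores the sticky ends, partial digestion, and ligation into vectors.

```python
def digest(dna: str, site: str, cut_offset: int) -> list[str]:
    """Cut linear DNA at every occurrence of a recognition site.

    cut_offset is the position inside the site at which the top strand is cut.
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


# Hypothetical 24-base sequence with two EcoRI sites (G^AATTC).
toy_genome = "AAAGAATTCCCCGGGGAATTCTTT"
fragments = digest(toy_genome, site="GAATTC", cut_offset=1)
print(fragments)
print([len(f) for f in fragments])
```

Each fragment would correspond to one potential insert, so a gene that spans a recognition site ends up split across different clones.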

Figure 4.3. The polymerase chain reaction (PCR) to exponentially amplify a unique sequence of DNA in a test tube over 25 to 35 cycles of heating and cooling the mixture. Since the discovery of DNA polymerase by Arthur Kornberg, researchers have been dreaming about developing ways to amplify DNA without having to insert the nucleic acid into vectors and allow the bacterial host to amplify the unique sequences. A process such as this is feasible, but it is time-consuming and yields unpredictable results. While the purified DNA polymerase allows synthesis of new strands of DNA, the products must be heat-denatured to produce single-stranded DNA as a starting template for the next round of DNA synthesis, a step that most DNA polymerases do not survive. Hence the reaction cannot be recycled. The search for enzymes that can survive heating and cooling has led to the discovery of heat-stable Taq DNA polymerase, which has made in vitro chain DNA amplification, or polymerase chain reaction, a reality. This process requires short DNA sequences to serve as primers for amplification of target sequences. Since then, many more heat-stable polymerases have been isolated and used in PCR. (Adapted from Lancet, 1988; Iss. 8599 1372-3)

Figure 4.3 labels: human DNA; double-stranded DNA containing sequence of interest; heat denature to single-stranded DNA; reassociate with primers (annealing of primers); extension of target (DNA synthesis); heat, denature, and reassociate; new copy. Total number of DNA copies produced by PCR = 2^n, where n is the number of heating and DNA synthesis cycles.

Figure 4.4. Genomic cloning of large DNA sequences. Typically chromosomal DNA is cleaved with restriction enzymes into small fragments suitable for inserting into plasmid vectors. These plasmid vectors are reintroduced into unique clones of bacterial host cells generally known as a library of genomic clones. By systematically screening for the genomic clone or clones that contain DNA sequences, the gene of interest can be identified. The expression vector containing the target gene is often required to be reconstructed from several clones, putting together DNA sequences isolated from these clones. The overall cloning process is called the genomic cloning strategy.

Because nuclease digestion of genomic DNA is a random process and higher eukaryote DNA contains introns or noncoding sequences, an individual DNA plasmid clone generated by this strategy is unlikely to contain the entire sequence coding for a target gene product. In other words, the entire DNA sequence encoding the polypeptide will be distributed in several clones within the genomic library. While this strategy permits, with minimum effort, generation of a genomic library with millions of DNA clones, the challenge is to find the clones that contain the DNA sequence of the target gene.

This is often done by in situ hybridization of the probe DNA constructed with oligonucleotides generated using the known or predicted sequences of the target gene. The bacterial cells containing these target DNA fragments will react to these probes. These probe-positive bacterial cells are expanded to collect DNA sufficient to provide for sequence analysis. Many of these clones are screened and analyzed to obtain the full genomic DNA, including sequences that are not coded for final protein synthesis. For some applications, where the target gene of interest is small and does not contain an intron, the DNA library can be cloned into vectors designed for protein expression and screened for protein function. Alternatively, if antibody to the target protein is available, it can be used to screen for bacterial clones that react to the antibody.
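In computational terms, probe screening amounts to asking which clones carry a sequence matching the probe on either strand. The following Python sketch shows that idea with an invented three-clone library; exact substring matching is an assumption standing in for hybridization, which in practice tolerates some mismatch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def screen_library(library: dict[str, str], probe: str) -> list[str]:
    """Return the IDs of clones whose insert contains the probe on either strand."""
    probe_rc = reverse_complement(probe)
    return sorted(
        clone_id
        for clone_id, insert in library.items()
        if probe in insert or probe_rc in insert
    )


# Hypothetical library: clone IDs and inserts are made up for illustration.
library = {
    "clone_001": "TTGACCATGGCCCTGTGGAT",   # probe on the given strand
    "clone_002": "GGGTTTAAACCCGGGTTTAA",   # no match
    "clone_003": "ATCCACAGGGCCATGG",       # probe on the opposite strand
}
positives = screen_library(library, probe="ATGGCCCTGTGG")
print(positives)
```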

Regardless of the strategy one uses to identify and clone the gene of interest, these genetic clones of DNA encoding the target protein are then analyzed initially by their physical size in relation to the size prediction based on protein data (if available). Other analyses include restriction endonuclease fragment analysis (if known), in which predicted unique fragments are generated by the enzyme, which acts on select sequences of the gene. Ultimately the gene is sequenced to deduce the entire coding sequence. It was not long ago that DNA sequencing was one of the most challenging, tedious, and time-consuming tasks in cloning of a protein. The recent advances in fluorescence dye labeling, PCR technology, and automation have expedited this process, allowing the DNA sequencing to be done with minimum effort [18-20]. Today an entire DNA sequence encoding about 4000 bases can be deduced and analyzed within one week.
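The size comparison mentioned above can be sketched as a simple estimate. The calculation assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from the text, and it ignores introns, untranslated regions, and post-translational modification.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb average, not from the text


def expected_coding_length(protein_mass_kda: float) -> int:
    """Estimate the coding sequence length in bases from the protein mass.

    Three bases per residue, plus one extra codon for the stop signal.
    """
    residues = round(protein_mass_kda * 1000 / AVERAGE_RESIDUE_MASS_DA)
    return 3 * (residues + 1)


def size_is_plausible(insert_length: int, protein_mass_kda: float, tolerance: float = 0.2) -> bool:
    """Check whether a cloned insert is within a fractional tolerance of the estimate."""
    expected = expected_coding_length(protein_mass_kda)
    return abs(insert_length - expected) <= tolerance * expected


# Hypothetical 11 kDa protein: roughly 100 residues, so about 300 bases of coding DNA.
print(expected_coding_length(11.0))
print(size_is_plausible(310, 11.0), size_is_plausible(900, 11.0))
```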

Engineer an Expression System for the Respective Protein

The next step in molecular cloning of the target gene product is to translate the genetic sequence into the protein product so that the functional characteristics can be verified and analyzed in detail. One can choose either prokaryotic hosts (bacteria and phage) or eukaryotic hosts (yeast, insect or mammalian cells) for this purpose (Table 4.3). The selected genes are inserted into the host cells by means of small circular DNA fragments called plasmids. Once introduced into respective host cells, the plasmid containing the gene of interest will be transcribed and translated into the protein of interest. The host-plasmid combination is known as an expression system.

The choice of prokaryotic versus eukaryotic expression systems is determined by the functional activity requirement of the protein product. When a therapeutic protein is designed, additional considerations are introduced, including post-translational modifications (i.e., glycosylation, acylation, proteolytic cleavage, and protein folding). The vectors destined to be expressed in bacteria are simpler and easier to engineer. The bacterial expression system has higher replication rates and costs less to produce a unit of protein.

Most target proteins of therapeutic interest are human or mammalian in origin, but some require post-translational modification during expression for optimum biological and pharmacokinetic properties. Bacterial expression systems do not allow most post-translational modifications. In those cases, eukaryotic expression systems are chosen. These systems, however, are more complex and require more time and resources to engineer, especially in mammalian cells. Consequently, the final products produced in mammalian cells are more expensive than those produced by bacterial expression systems.

An ideal plasmid vector can be replicated and expressed in both mammalian and prokaryotic cells. Verification of gene insertion in mammalian cells is difficult, and researchers usually turn to bacterial cells for isolation of easily replicated plasmid DNA and sequence analysis. This plasmid DNA is then introduced to mammalian cells for expression.

In the past there were few plasmid vectors that could be replicated and expressed in both mammalian and prokaryotic cells. However, with a better understanding of genetic sequences essential for replication, transcription, and translation, we now have the ability to construct plasmids with nearly ideal properties. Furthermore, some plasmid constructs are now engineered to express additional tags, such as polyhistidine, as a part of protein expression. By using standard techniques to purify the tag, the efficiency of isolating target protein is greatly increased.
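At the sequence level, adding a polyhistidine tag means adding a run of histidine codons to the coding sequence. The sketch below places the tag at the C-terminus, just before the stop codon; the tag length of six, the choice of the CAC codon, and the C-terminal placement are illustrative assumptions rather than details from the text.

```python
HIS_CODON = "CAC"  # histidine is encoded by CAT or CAC; CAC is an arbitrary choice here
STOP_CODONS = {"TAA", "TAG", "TGA"}


def add_c_terminal_his_tag(cds: str, n_his: int = 6) -> str:
    """Insert n_his histidine codons immediately before the stop codon."""
    if len(cds) % 3 != 0 or cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence must be in frame and end with a stop codon")
    return cds[:-3] + HIS_CODON * n_his + cds[-3:]


# Hypothetical minimal coding sequence: Met-Ala-stop.
tagged = add_c_terminal_his_tag("ATGGCCTAA")
print(tagged)
```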

Regardless of which plasmid expression system one chooses, the initial goal is to verify that the target gene is inserted and cloned properly in the plasmid so that functional characteristics of the expressed protein product can be verified. About 10 years ago, cloning of a gene and expression of a protein took up to 3 to 5 years. With the development of well-characterized plasmid constructs and standardization of molecular cloning tools, the process has been reduced to a few weeks [21-23].

TABLE 4.3. Comparison of the features and requirements of gene expression systems

Expression System | Type of Expression | Selection Strategy | Some Required Elements
Prokaryotic: Bacterial plasmid | Inducible or constitutive | Drug resistance | Bacterial origin (ori) sequence needed for DNA replication (a)
Prokaryotic: Bacterial cells infected with bacteriophage | Transient expression | Infected cells | Viral (bacterial phage) promoter and other elements required to support viral protein synthesis
Eukaryotic: Insect cells (e.g., Autographa californica polyhedrosis virus) | High levels of protein expression | Infected cells | All essential elements for virus gene; target gene is usually expressed as a polyhedrin fusion protein or used as a signal to express target gene
Eukaryotic: Yeast plasmid or integration into host chromosome by homologous recombination | Transient or permanent | Amino acid requirement in auxotrophic strain; heavy metal induction of resistance gene | Yeast ori sequence; constitutive or inducible promoter; transcription terminator
Eukaryotic: Animal (recombinant) virus vectors | Transient or lytic infection | Infection in susceptible cells | All essential viral genes; strong promoter/enhancers; polyadenylation signal; intron sequences
Eukaryotic: Animal (recombinant) virus vectors | Permanent infection | Alteration of transformed host cell phenotype | All essential viral genes; strong promoter/enhancers; polyadenylation signal; intron sequences; host transforming gene sequences
Eukaryotic: Mammalian cell, transient expression | Transient expression | None: a higher copy number can be achieved using ori sequence that responds to factors in recipient cells | Strong constitutive or inducible promoter; polyadenylation signal; intron sequences
Eukaryotic: Mammalian cell, permanent expression | Permanent expression; achieved by integration of gene into host chromosome | Drug resistance; complementation of deleted essential gene in recipient cells | Strong constitutive or inducible promoter; polyadenylation signal; intron sequences

(a) ori is the unique nucleic acid sequence that serves as an origin of DNA replication.

Optimize the DNA Sequences to Enhance Protein Expression

The purpose of initial cloning and expression of a target gene is to demonstrate its biochemical characteristics and its functional activities, including binding affinity to the putative receptor and mediation of cellular events related to therapeutic responses. The protein produced in the initial expression system, however, may not be enough for preclinical testing, and the level of expression may be so low that even expanding the system will not allow for large-scale manufacturing. Therefore, the initial cloning of a target gene product often will be followed by optimization of the expression system.

The aim of optimization is to (1) provide high efficiency and stable protein expression, (2) include signal peptides or other DNA sequences inserted into plasmid vectors to promote secretion of target protein by the host cells, (3) allow efficient subcloning or transferring the gene into prokaryotic or eukaryotic plasmids, according to the pharmacologic and pharmacokinetic requirements for the target protein, (4) contain introns or codon modifications to enhance transcription and translation of the target gene, and (5) be adaptable to express functional proteins in limited numbers of host cells (i.e., Chinese hamster ovary cells, E. coli, and yeast). Sometimes, during the optimization of expression systems, a drug company must commit to one of the systems so that their scientists can begin to optimize DNA sequences for production.
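The codon modification mentioned in aim (4) can be illustrated with a minimal Python sketch: each codon is swapped for a synonymous codon preferred by the host, leaving the encoded protein unchanged. The preference table below is a made-up placeholder, not real codon usage data for any organism, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize_codons(cds: str, preferred: dict[str, str]) -> str:
    """Replace each codon with the preferred synonymous codon, if one is listed."""
    for amino_acid, codon in preferred.items():
        if CODON_TABLE[codon] != amino_acid:
            raise ValueError(f"{codon} does not encode {amino_acid}")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(preferred.get(CODON_TABLE[c], c) for c in codons)


# Placeholder preferences for illustration only (not measured codon usage).
HYPOTHETICAL_PREFERENCES = {"L": "CTG", "R": "CGT", "P": "CCG"}

original = "ATGCTAAGACCCTAA"  # Met-Leu-Arg-Pro-stop
optimized = optimize_codons(original, HYPOTHETICAL_PREFERENCES)
print(optimized, translate(optimized))
```

The final check that both sequences translate to the same protein is the property that matters; in practice the preference table would come from codon usage data for the chosen host.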

Optimization of gene expression may be applied at every step of the process, from initial cloning and characterization to initiation of clinical trials. Often several rounds of optimization will be required to select the expression system that produces the highest yield with the lowest cost fermentation and purification schemes. Significant resources are allocated for optimization because the best protein expression system can lead to several hundred- to a thousandfold increased efficiency in protein produced. Under these conditions, as much as 10% of total proteins produced by the expression system are target proteins.

Verify the Molecular and Functional Characteristics of the Expressed Recombinant Protein

In order to verify the molecular and functional characteristics of the protein synthesized by the expression system, a number of molecular tools have been developed. These tools have been optimized over the years to allow efficient verification of the molecular characteristics of recombinant proteins. Some are also used for quality control procedures in the synthesis of recombinant proteins for pharmaceutical use.

During this process some key questions are: (1) Does the DNA sequence inserted into the expression vector contain all the sequences essential for the target protein? (2) Is the messenger RNA expressed at the predicted length and produced in high enough levels? (3) Does the recombinant protein react with antibody? (4) Is there evidence of functional activity of the recombinant protein? Methods for answering these questions are listed in Table 4.4.
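Question (1) has a simple computational counterpart once the insert has been sequenced: checking that the coding sequence forms an intact open reading frame. The sketch below assumes the standard genetic code and an ATG start codon, and it covers only this sequence-level check; questions (2) to (4) require the laboratory methods in Table 4.4.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def check_orf(cds: str) -> list[str]:
    """Return a list of problems found in a coding sequence (empty if none)."""
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("missing ATG start codon")
    usable = len(cds) - len(cds) % 3  # ignore trailing bases that do not fill a codon
    protein = "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))
    if not protein.endswith("*"):
        problems.append("missing stop codon")
    if "*" in protein[:-1]:
        problems.append("premature stop codon")
    return problems


# Hypothetical inserts, for illustration only.
print(check_orf("ATGGCCTAA"))
print(check_orf("ATGTAAGCCTAA"))
print(check_orf("GCCGCC"))
```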

TABLE 4.4. Some methods used for screening and verifying molecular clones from the gene expression system

Method | Purpose | Advantages | Limitations
Enzyme-linked immunosorbent assay | Immune-based quantitation of expressed protein | Rapid, specific, quantitative, and efficient | Requires one or more high-affinity antisera or monoclonal antibodies
Spot blot | Immune-based quantitation | Quantitative and specific | Low sensitivity
Western immunoblot | Immune-based verification of the size of expressed protein | Excellent tool to detect protein size and integrity in a sensitive manner | Not a quantitative assay and not suitable for large-scale screening
RNA screening | To verify expression of … | Sensitive and does not require antibodies | Protein detection is indirect, and problems in translation or post-translational modification may be missed
… | To determine the size and integrity of protein | Highly sensitive for use as initial characterization tool for product size and integrity | Semi-quantitative and labor-intensive; not suitable for large-scale screening
Direct colony screening using protein blot | To identify cells that express target protein | Screening of colonies in situ using a specific, high affinity … | Not suitable for screening intracellular or membrane bound protein; must have high affinity antibody or antiserum
… | To identify protein-expressing cells in the population and subcellular localization of expressed protein | Relatively efficient and provides intracellular details | A qualitative method requiring cell fixation, which inactivates the protein-expressing cells
Fluorescence-activated cell sorter (FACS) | To predict and select cells in the population that express target protein | Sensitive and allows direct sorting and cloning of positive cells | Requires proteins expressed intracellularly or on the cell surface. Not sensitive for expression systems engineered to secrete the protein

When clones of cells are determined to express a putative protein of interest, a set of small-scale purification procedures is developed to further characterize the biochemical and biophysical properties of the protein. Typically, to isolate sufficiently pure protein from either cells or culture medium, column chromatography techniques are used (Box 4.1). Column chromatography techniques allow separation of proteins from contaminants based on charge, protein size, and binding affinity. In some cases the initial purification method includes isolation of a small quantity of protein using antibody immobilized on a solid support. While this immune-based purification, using immuno-affinity column chromatography, permits purification of protein for research purposes, the limited life span of an antibody bound to a column and the limited availability and reproducibility of antibody sources do not permit large-scale purification for pharmaceutical use.

Alternatively, a terminal tag such as polyhistidine can be added to the target protein to improve the efficiency of the purification procedure [24]. An affinity column designed to recognize polyhistidine could be used to isolate the tagged protein. The polyhistidine is then cleaved and the mixture dialyzed. While this approach is useful for small-scale preparations, whether it can be used for large pharmaceutical scale preparations remains to be seen.

With even a small amount of purified target protein, detailed molecular characterization can proceed. At this stage the recombinant protein is evaluated in terms of secondary and tertiary molecular structure, protein fragmentation, molecular heterogeneity, degree of glycosylation, and stability. The technologies developed to address these issues are summarized in Table 4.5. Many of these methods are discussed in texts listed in the references. During small-scale preparation of the target protein, isolation and purification tools often are refined and optimized.

A combination of these optimized tools is used for large-scale operations essential to obtaining sufficient quantities of recombinant protein for preclinical and clinical development. Additional scale-up and purification strategies are discussed in the following section. Regardless of the purification strategy, a number of the methods used to verify the expressed protein form the basis of controlling the purity, functionality, and stability of the biopharmaceutical. Some key methods, namely functional (enzyme activity, cell growth, or receptor binding) assays and enzyme-linked immunosorbent assays (ELISA), are used to study the time course of drug disposition.
