High Throughput Gene Cloning And Expression For The Fabrication Of Proteome Microarrays

The genomics era ushered in a new type of research. For the first time in history, the decoded "blueprint of life" was available for study. The Human Genome Project (HGP), the first major undertaking of its kind, was a 13-year project coordinated by the U.S. Department of Energy and the National Institutes of Health and completed in 2003. Its goals were to identify all of the approximately 20,000 to 30,000 genes in human DNA, determine the sequence of the 3 billion chemical base pairs that make up human DNA, and store this information in databases. This initiative sparked the boom in genome sequencing, and today a vast amount of data is publicly available in numerous databases. The genomes of more than 180 organisms have been sequenced since 1995, and the data from these projects are available to the scientific community. The availability of genome sequence data led to the first big boom in microarray research. Researchers used the sequence data to create libraries of oligonucleotides representing all the open reading frames (ORFs) of the organism being studied. Robotic machines were then used to transfer and arrange nanoliter amounts of thousands of gene sequences, representing a cell's expression state, on a single microscope slide. These high-density, highly organized arrays of DNA are called microarrays. DNA microarrays allowed for the miniaturization and high-level multiplexing of Southern blots. Affymetrix and Agilent Technologies pioneered the use of DNA microarrays for whole-genome gene expression analysis. Researchers were now able to track the expression of genes in response to various stimuli and incorporated this new tool into their existing research. They could track synthesis and degradation at the mRNA level to determine which genes are turned on and which are turned off in a given cell. We now knew the number and sequence of the expressible genes and the expression pattern of those genes in response to a particular stimulus.
An area that still needs to be explored, however, is how the proteins perform their functions once expressed. This conundrum has ushered in the age of protein microarrays.

Currently, protein microarrays are being used for a variety of assays. There has been much trial and error in attempts to miniaturize binding assays, enzyme activity assays, and serological assays. The same tools that helped propel the era of the DNA microarray are being applied to protein microarrays. Protein microarrays permit the miniaturization and high-level multiplexing of protein-based assays. Companies like Protometrix cloned, expressed, and purified thousands of proteins to make the Yeast Proteome chip and the Human Partial Proteome chip. The Human chip, now on version 8, is available commercially from Invitrogen, with approximately 8000 human proteins that can be used for screening binding partners, enzyme substrates, and small molecules, and for antibody profiling. That is the equivalent of 8000 co-sedimentation assays, 8000 phosphorylation assays, or 8000 ELISA assays on a surface area of 3 inches x 1 inch. When all of the approximately 29,000 ORFs that have been cloned are eventually expressed and printed on a microarray, all proteins expressed by human cells will be contained on two or three microscope slides and ready for assay.

Traditional multistep cloning methods have been a major stumbling block because of their inherently low throughput. The workflow begins with amplification of the desired insert by polymerase chain reaction (PCR), followed by digestion of the insert and the vector, ligation, transformation, plating, colony picking, and verification of clones; the sheer volume of work is the rate-limiting step. Technical problems may be encountered in any of these steps, and more often than not, researchers spend months cloning a few genes of interest. There are, however, ways to circumvent this laborious cloning methodology when attempting to manufacture a high-content microarray.

One way is to take advantage of the vast genomic sequence data and the high-throughput-friendly PCR reaction to amplify all of the ORFs, followed by a second PCR step to make the products transcriptionally active. The secondary PCR step adds a promoter region at the 5' end and a stop codon at the 3' end of the PCR product, making a transcriptionally active PCR (TAP) product. The production of TAP fragments for expression can yield a large library of expressible ORFs in a relatively short amount of time. TAP fragments have been reported to express very well in a variety of expression systems [12]. Organisms with large genomes that would otherwise take decades to clone into an expressible form now require only two rounds of PCR. A recent report by Regis et al. has shown that in the case of Plasmodium falciparum, TAP fragments could be used for large-scale screening of the humoral immune response and eventual selection of antigens for further study [13]. Although TAP fragments provide the throughput, they are limited by the amount of product produced during the second PCR step.

Recent research has shown that serological screening of cDNA libraries created from cancer cell lines can be used for the discovery of cancer biomarkers. Cancer is the result of a synergistic malfunctioning of multiple signaling pathways. The set of pathways, proteins, and mutations involved may or may not be the same in every cancer type and tissue, so it becomes even more critical to cast the widest net to find the most potential biomarkers [14]. Lung cancer cell lines were used to make cDNA libraries that have been used successfully for serological identification of serodominant antigens [16,17]. This is encouraging, but the task at hand with this technology is more daunting. After the creation of the reverse complement of the mRNA, there is still a need to use brute-force cloning techniques to accomplish the task. One must keep in mind that the majority of mRNAs in most tissues are in low abundance, and thus quite a few of the mRNAs will be underrepresented in the cDNA library [15,18]. An extremely large number of clones is required to ensure good representation of these low-abundance genes. There is also the very real problem of partial cDNA clones, which do not have the complete sequence. Even if these two issues can be minimized, there is still the very real bottleneck of screening the clones. High-density gridding is becoming more and more common and makes it possible to screen large numbers of cDNA clones. This technology is designed largely for gene discovery, but if refined to the point that all mRNA transcripts are represented equally, we may have a tool for more reliable screening of human serum samples.
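To give a feel for how large "extremely large" can be, the sketch below uses the standard textbook estimate for library representation, N = ln(1 - P) / ln(1 - f), where f is the fraction of the mRNA pool contributed by a transcript and P is the desired probability of recovering it at least once. This formula is not taken from the chapter; it assumes independent, unbiased sampling of full-length clones, and the function name and example values are illustrative only.

```python
import math


def clones_needed(fraction: float, probability: float = 0.99) -> int:
    """Estimate how many independent clones must be picked so that a
    transcript making up `fraction` of the mRNA pool appears at least
    once with the given `probability`.

    Assumes unbiased, independent sampling (an idealization; real cDNA
    libraries are skewed by cloning and reverse-transcription bias).
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be between 0 and 1 (exclusive)")
    # P(at least one hit in N picks) = 1 - (1 - f)^N, solved for N.
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


if __name__ == "__main__":
    # Hypothetical abundances: one transcript in 1e4, 1e5, and 1e6 mRNAs.
    for f in (1e-4, 1e-5, 1e-6):
        print(f"fraction {f:g}: about {clones_needed(f):,} clones")
```

Under these assumptions, a transcript present at one copy per million mRNAs needs on the order of several million clones for 99% confidence, which is why equal representation of transcripts matters so much for screening.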

The next rung on the ladder would be the creation of an expressible genome library that contains all the ORFs in the genome. To make the genome library quickly and efficiently, we need to consolidate all the traditional cloning steps into one efficient step. Recombination in vivo provides the all-encompassing single step. Recombination cloning is widely used in research, but for a long time it was thought that bacteria lacked the mechanism to allow for recombination. However, bacterial in vivo homologous recombination has been an efficient and heavily used tool in the genetics field. Recombination-based cloning allows DNA sequences to be inserted or deleted without regard to the location of restriction sites [19,20]. One very widely used methodology for recombination cloning is the Gateway system, now offered commercially by Invitrogen. The first step in Gateway cloning is inserting the gene of interest into the Gateway entry vector. There are two ways to clone a gene of interest into a Gateway entry clone. The first is the standard cut-and-paste protocol using restriction enzymes and ligase. The second is to create a PCR product with terminal attB sites, using primers containing a 25-base pair attB sequence plus four terminal G's. This product is then inserted into the entry clone using reconstituted recombination machinery. PCR-based inserts can be made using genomic DNA, a cDNA library, or a plasmid clone containing the gene of interest. An entry clone is a vector containing the gene of interest flanked by Gateway att recombination sites. Bacteriophage lambda att site recombination is a well-characterized phenomenon [21]. Bacteria have a stretch of DNA sequence called att encoded in their genome, and bacteriophages have the same stretch of DNA sequence. When the phage infects a bacterium, the injected lambda DNA recombines with the corresponding bacterial DNA via the att sites in the presence of integration-specific enzymes.
The enzyme-assisted recombination results in integration of the phage DNA into the bacterial genome. Using the same reconstituted lambda att site recombination system, entry clones containing a gene of interest can then be transferred into any expression vector for subsequent protein expression. Although the Gateway System works and increases cloning throughput dramatically, it is not efficient enough to work as a true high-throughput alternative.

In vivo recombination cloning directly into an optimized expressible vector, however, appears to have solved the throughput dilemma. Recent advances in cloning methodologies have resulted in the ability to clone entire bacterial ORFeomes, comprising thousands of genes, in a relatively short time. Figure 2 illustrates the typical results one would see, beginning with the PCR step, followed by in vivo recombination cloning and verification of the clones by PCR using sequence-specific primers. Specifically, in vivo recombination cloning [22] in Escherichia coli has broken the barriers posed by traditional brute-force cloning methodologies that rely on multiple-step cloning techniques: PCR, restriction enzyme digestion of the PCR product and vector, ligation, and single-colony screening and selection. Available sequence data are used to design 5' and 3' gene-specific primers for all ORFs encoded in the genome. Each primer contains 53 nucleotides, comprising a 33-nucleotide recombination adapter sequence and a 20-nucleotide gene-specific sequence. The PCR products are then mixed with a T7-based linear expression vector as described previously [22] and transformed into supercompetent DH5α cells (Antigen Discovery Inc.). The transformed cells are grown overnight and checked for turbidity the following day. DNA can then be purified from the mixed cultures using 96-well plate-based mini-prep protocols such as Qiaprep Turbo 96.
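The primer layout described above (a 33-nucleotide adapter followed by a 20-nucleotide gene-specific stretch) can be sketched in a few lines of Python. This is only an illustration of the 33 + 20 = 53 arithmetic: the adapter sequences are passed in by the caller because the actual adapters are not given here, and the function names, the choice of taking the gene-specific part from the very ends of the ORF, and the use of the reverse complement for the 3' primer are assumptions based on ordinary PCR primer design rather than on the published protocol.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(orf: str, adapter_5: str, adapter_3: str,
                   gene_len: int = 20, adapter_len: int = 33):
    """Build a (forward, reverse) primer pair for one ORF.

    Each primer is adapter + gene-specific sequence, written 5' to 3'.
    The adapters are caller-supplied placeholders, not the real
    vector-homology sequences.
    """
    orf = orf.upper()
    if len(orf) < gene_len:
        raise ValueError("ORF is shorter than the gene-specific length")
    if len(adapter_5) != adapter_len or len(adapter_3) != adapter_len:
        raise ValueError(f"adapters must be {adapter_len} nucleotides long")
    # Forward primer anneals to the start of the ORF.
    forward = adapter_5 + orf[:gene_len]
    # Reverse primer anneals to the end of the ORF on the opposite strand.
    reverse = adapter_3 + reverse_complement(orf[-gene_len:])
    return forward, reverse


if __name__ == "__main__":
    # Toy ORF and dummy adapters, purely for demonstration.
    toy_orf = "ATG" + "GCA" * 10 + "TAA"
    fwd, rev = design_primers(toy_orf, "A" * 33, "C" * 33)
    print(len(fwd), fwd)
    print(len(rev), rev)
```

Looping such a function over every annotated ORF in a genome is what makes the primer-design step scale to thousands of genes; melting-temperature balancing and other real-world checks are deliberately left out of this sketch.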

Figure 2 Synthesis and verification of clones. Representative images of whole Vv ORFeome PCR (A), cloning (B), and QC-PCR (C).

The plasmid DNA purified from overnight culture can then be used to express the protein encoded by the insert. Proteins are expressed in a coupled in vitro transcription-translation (IVT) reaction in an E. coli-based cell-free expression system such as RTS 100 from Roche. The unpurified proteins can then be printed directly onto nitrocellulose-coated slides, along with a set of negative and positive controls, using a contact microarray printer. It is crucial that the mixture of unpurified proteins be spotted as quickly as possible. The quality of the arrays is inversely proportional to the amount of time from the end of the reaction to the deposition onto the nitrocellulose. Once the unpurified protein mixture is spotted and dry, it is very stable.

In the high-throughput microarray fabrication first reported by Davies et al. in 2005 [22], the cell-free expressed proteins can be detected directly on the chip using antibodies against the N-terminal polyhistidine (polyHis) tag and the C-terminal hemagglutinin (HA) tag. The antibodies were used to monitor expression in the large numbers of parallel reactions. The arrays, probed using a mouse monoclonal antibody raised against the polyHis epitope and a rat monoclonal antibody raised against the HA epitope, are visualized using a fluorescence-based microarray scanner. An example of one such hybridization and scanning output can be seen in Figure 3. The data can then be quantified using a microarray data analysis software package that measures the intensity of the spots on the microarray chip. Each array contains positive control spots printed from serial dilutions of whole immunoglobulin G (IgG). Each array also contains no-DNA negative control spots, and the reactivity of these spots is low for both serum samples. The positive and negative controls are used to normalize data from arrays that are probed on different days, using a modified VSN package in the R statistical environment [23,24]. There are also serially diluted EBNA1 protein control spots, which have been shown to be reactive to varying degrees in different human subjects [25-27]. Once an array has passed quality assurance, it is ready to be used in serological studies.

Figure 3 Quality assessment for protein expression. Images of whole Vv proteome microarray probed with mAb against polyhistidine (A) and hemagglutinin (B) tags. 99.0% of Vv proteins were reactive to anti-polyhistidine antibody and 88.2% to anti-hemagglutinin antibody. The images show four dilutions of human IgG (yellow box), positive controls (green box), and six negative controls (mock transcription/translation reaction) (red circles). The remaining spots are Vv proteins. (See insert for color reproduction of the figure.)
