Bioinformatics is a broad term that covers many different research areas. Its roots lie in the decades-old field of computational biology. Advances in computer technology that yielded faster computations fueled the expansion of this field into many scientific areas, as did the development of new mathematical algorithms that allowed highly sophisticated problems to be solved quickly. For instance, with the computer technology available in 2003, it is relatively straightforward to determine the three-dimensional structure of a protein by x-ray crystallography or by nuclear magnetic resonance combined with distance geometry and molecular dynamics, or to compare a large set of genes structurally. Several new scientific disciplines have grown out of bioinformatics: functional genomics, structural genomics, and evolutionary genomics. The term bioinformatics is now routinely applied to experiments in genomics that rely on sophisticated computations.
One reason why computational methods are so critical is that biological macromolecules must be simulated in a three-dimensional environment for realistic comparisons and visualizations to be made. A typical molecule is represented in a computer by three Cartesian coordinates per atom, which specify the position of each atom in space. Sets of three-atom (bond) angles and four-atom dihedral angles describe the interatomic connectivity, and the start and end points of chains are designated where needed. Fortunately, it is a relatively simple matter to build molecules or to download them from databases. The Cartesian coordinates alone give only a static picture of the relative positions of the atoms, but the computer offers the opportunity to let the molecule evolve in the fourth dimension, time. This is where energy minimization routines and molecular dynamics simulations are used.
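The coordinate representation described above can be illustrated with a minimal sketch that computes a three-atom angle and a dihedral angle directly from Cartesian coordinates. The function names and the four test points are assumptions chosen for illustration, not part of any particular modeling package.

```python
import math

# Small vector helpers for 3-tuples of Cartesian coordinates (x, y, z).
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def bond_angle(a, b, c):
    """Three-atom angle a-b-c in degrees, measured at the central atom b."""
    ba, bc = sub(a, b), sub(c, b)
    cos_t = dot(ba, bc) / (norm(ba) * norm(bc))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
    return math.degrees(math.acos(cos_t))

def dihedral(a, b, c, d):
    """Four-atom dihedral angle a-b-c-d in degrees, in the range (-180, 180]."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    x = dot(n1, n2)
    y = norm(b2) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical four-atom chain (illustrative coordinates, arbitrary units).
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
d_cis, d_trans = (1.0, 1.0, 0.0), (-1.0, 1.0, 0.0)

print(bond_angle(a, b, c))        # right angle at atom b
print(dihedral(a, b, c, d_cis))   # first and last atoms on the same side
print(dihedral(a, b, c, d_trans)) # first and last atoms on opposite sides
```

Energy minimization and molecular dynamics routines evaluate quantities like these repeatedly as the coordinates change over time.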
Functional genomics seeks to infer the function of a gene (Fig. 4.15). Given a novel gene sequence to be tested, the functional genomics method starts by comparing the new gene with genetic sequences in a database. In some cases, this comparison yields the name of the gene, a function, or a close similarity to previously identified genes. The test gene is then used as a template to construct, in a computer, a three-dimensional model of the protein. The three-dimensional model is refined by searching databases for possible folded structures of the protein product. Lastly, the likely folded structure of the protein from the new gene is inferred by comparison with folding patterns of proteins of known structure and function.
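The first step, comparing a new sequence with a database, can be sketched with a toy similarity measure. The example below ranks database entries by the fraction of short subsequences (k-mers) they share with the query. This is a deliberately simplified stand-in for real search tools such as BLAST, which use alignment-based scoring, and the database entries and their names are hypothetical.

```python
def kmers(seq, k=3):
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

def best_match(query, database, k=3):
    """Return (score, name) of the database entry most similar to the query."""
    return max((similarity(query, seq, k), name) for name, seq in database.items())

# Hypothetical mini-database: names and sequences are invented for illustration.
database = {
    "kinase_like": "ATGGCGTACGTTAGC",
    "transporter_like": "ATGTTTCCCGGGAAA",
}
query = "ATGGCGTACGTAAGC"  # novel sequence, one base different from kinase_like

score, name = best_match(query, database)
print(name, round(score, 2))
```

A high-scoring hit suggests, but does not prove, a shared function; the later structure-modeling steps serve to strengthen or weaken that inference.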
This method is not without difficulty. There is a lot of genetic overlap between organisms of different types. Usually, as a first step, the genomic sequences of prokaryotes (bacteria and Archaebacteria) are subtracted out, but there can still be overlap with organisms such as yeasts. In addition, human genes are divided into introns and exons, so the genetic components that encode one protein may lie on parts of one chromosome that are separated by a large number of nucleotides, or the gene may even be distributed over more than one chromosome.
Nearly every cell in the human body contains a full set of chromosomes and an identical complement of genes. At any given time in a given cell, only a fraction of these genes is expressed. It is the expressed genes that give each cell its uniqueness. We use the term gene expression to describe the transcription of the information contained in the DNA code into mRNA molecules, which are then translated into the proteins that perform the major functions of the cell. The amount and type of mRNA produced by a cell provide information on which genes are being expressed and how the cell responds dynamically to changing conditions (e.g., disease). Regulation of gene expression acts both as an "on/off" switch that controls which genes are expressed and as a level regulator, somewhat like a volume control, that increases or decreases the level of expression as necessary. Thus, genes can be on or off, and low or high.
Historically, scientists wishing to study gene expression could analyze mRNA from cell lines, but the complexity of the task meant that only a few genes could be studied at once. The advent of DNA microarray technology allows scientists to analyze expression of thousands of genes in a single experiment, quickly and efficiently. DNA microarray technology facilitates the identification and classification of DNA sequence information and takes steps toward assigning functions to the new genes. The fundamental precept of microarray technology is that any mRNA molecule can hybridize to the DNA template from which it originated.
In a typical microarray experiment, an array is constructed of many DNA samples. Automation that was developed for the silicon chip industry helps in this procedure, spotting thousands of different DNA samples on glass plates, silicon chips, or nylon membranes. mRNA isolated from the cells under study is treated with a mixture of fluorescent-tagged (usually red, green, and yellow) nucleotides in the presence of reverse transcriptase. This process generates fluorescent-tagged cDNA. The fluorophores emit light when excited with a laser. The labeled nucleic acids act as mobile probes that, when incubated with the stationary DNA, hybridize (bind) to complementary molecules, that is, to sequences with which they can base pair. After hybridization, bound cDNA is detected with a laser scanner. Data on the presence or absence of fluorescence, the color, and the intensity at the various points in the array are acquired in a computer for analysis.
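The base-pairing rule that underlies hybridization can be made concrete with a short sketch. It assumes perfect complementarity between a labeled probe and a stretch of the spotted strand, whereas real hybridization tolerates some mismatches and depends on temperature and salt conditions. The gene names and sequences are hypothetical.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the strand that base pairs with seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def hybridizes(probe, spot):
    """True if the probe pairs perfectly with some stretch of the spotted strand.

    Simplifying assumption: only exact complementarity counts as binding.
    """
    return reverse_complement(probe) in spot.upper()

# Hypothetical array: each spot holds one immobilized DNA strand.
array = {
    "gene_A": "ATGGCCATTGTAATGGGCCGC",
    "gene_B": "ATGAAACGCATTAGCACCACC",
}

# A labeled cDNA fragment complementary to part of gene_A (illustrative).
probe = reverse_complement("GCCATTGTAATG")

hits = [name for name, strand in array.items() if hybridizes(probe, strand)]
print(hits)
```

In this toy array the probe binds only the spot whose sequence it complements, which is the property that lets each spot report on a single gene.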
As an example, we can consider two cells: cell type 1, a healthy cell, and cell type 2, a diseased cell. Both cell types contain an identical set of four genes: A, B, C, and D. mRNA is isolated from each cell type and used to create fluorescent-tagged cDNA. In this case, red and green are used. Labeled samples are mixed and incubated with a microarray that contains the immobilized genes A, B, C, and D.
The tagged molecules bind to the sites on the array corresponding to the genes being expressed in each cell. A robotic scanner, also a product of silicon chip technology, excites the fluorescent labels, and the resulting images are stored in a computer. The computer subtracts background noise, computes the red-to-green fluorescence ratio, and creates a table of that ratio for every point in the array. Perhaps both cells express the same levels of gene A, cell 1 expresses more of gene B, cell 2 (the diseased cell) expresses more of gene C, and neither cell expresses gene D. This is a simplified example; experiments have been reported in which as many as 30,000 spots have been placed in the microarray.
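The analysis of the four-gene example might look roughly like the sketch below. It assumes that cell type 1 was labeled green and cell type 2 red, which the text does not specify, and the intensities, background level, detection floor, and twofold cutoff are invented for illustration rather than taken from any real experiment.

```python
import math

def call_spot(red, green, background=100.0, floor=50.0, fold=2.0):
    """Classify one spot from its raw red and green intensities.

    Returns (log2 ratio or None, label). All thresholds are illustrative.
    """
    r = max(red - background, 0.0)    # background-corrected red signal
    g = max(green - background, 0.0)  # background-corrected green signal
    if r < floor and g < floor:
        return None, "not expressed in either cell"
    # Clamp to 1.0 so a zero channel cannot cause division by zero or log(0).
    log_ratio = math.log2(max(r, 1.0) / max(g, 1.0))
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return log_ratio, "higher in cell 2 (red)"
    if log_ratio <= -cutoff:
        return log_ratio, "higher in cell 1 (green)"
    return log_ratio, "similar in both cells"

# Hypothetical raw scanner readings as (red, green) pairs.
spots = {
    "A": (1100.0, 1100.0),
    "B": (300.0, 2100.0),
    "C": (2100.0, 300.0),
    "D": (110.0, 120.0),
}

table = {gene: call_spot(red, green) for gene, (red, green) in spots.items()}
for gene, (log_ratio, label) in table.items():
    print(gene, log_ratio, label)
```

Working with the logarithm of the ratio treats a tenfold increase and a tenfold decrease symmetrically, which is why expression tables are commonly reported on a log scale.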
DNA microarrays can detect changes in gene expression levels, expression patterns (e.g., over the cell cycle), genomic gains and losses (e.g., lost or broken parts of chromosomes in cancer cells), and mutations in DNA (single-nucleotide polymorphisms [SNPs]). SNPs are also of interest because they may provide clues about why different people respond to the same drug in different ways.