In silico analysis of biological function and disease association

Finding associations that link a receptor with biological function and disease is a complex process that requires integrating information from sequence databases and the literature. In addition to searching the obvious human data sources, one must not forget that information from other species may be important in some cases. It is impossible to be comprehensive and include all available resources, so the process described below is given from the perspective of starting with a sequence and adding information to that knowledge (Fig. 9.3).

[Fig. 9.3 diagram. Elements shown: human receptor sequence (cDNA); human genomic sequence; EST/cDNA analysis; microsatellites and SNPs; tissue expression; human chromosome location (cytogenetic/cM); mouse receptor sequence; mouse chromosome location (cytogenetic/cM); KOs, transgenics and mutants; phenotype; literature (e.g. PubMed, OMIM); disease association and function.]

Fig. 9.3 Relationships of gene sequence and genetic data sources that can be used to find disease associations.

To interrogate the web resources, the chromosomal localization or genetic map position of the receptor is required, together with any information on close genetic markers (single nucleotide polymorphisms (SNPs) or microsatellites). Mapping information for a gene can be obtained in a variety of ways. Given that most of the human genome sequence is now available, it is easy to locate a gene on a chromosome by pair-wise sequence comparison methods. There are many web sites that allow this to be done. Ensembl (http://www.ensembl.org), the NCBI site (http://www.ncbi.nlm.nih.gov/BLAST/) and the Human Genome Browser (http://genome.ucsc.edu/goldenPath/hgTracks.html) are just three of the excellent resources available.

The OMIM database (Online Mendelian Inheritance in Man; http://www.ncbi.nlm.nih.gov/Omim/) is a catalogue of human genes and genetic disorders based on the published literature. It is now incorporated in the NCBI's Entrez system and can thus be queried in the same way as GenBank and PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM). This source provides information on mapping and on known diseases linked to that gene or region. Another useful resource for linking information is the NCBI's map viewer (http://www.ncbi.nlm.nih.gov/PMGIFs/Genomes/MapViewerHelp.html), a comprehensive viewer that provides searching opportunities at four levels: (1) the Organism home page summarizing the resources available for that organism; (2) a Genome view displaying the complete genome as a set of chromosome ideograms; (3) a Map view presenting one or more maps of a selected chromosome; and (4) a Sequence view of a chromosome region graphically depicting the biological features annotated to that region. With extensive links, a user can quickly gather information related to genes of interest.
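As a rough illustration of querying OMIM through Entrez programmatically, the sketch below builds a search URL. It assumes the NCBI E-utilities esearch endpoint, which is not mentioned in the text above (the chapter cites only the older query.fcgi address). The function name, its parameters and the example gene symbol are hypothetical and chosen for illustration only. No network request is made.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities "esearch" endpoint. The chapter itself
# only cites the older Entrez query.fcgi address.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_omim_search_url(term, max_records=20):
    """Return an esearch URL that looks up `term` in the OMIM database."""
    params = {
        "db": "omim",            # Entrez database to search
        "term": term,            # free-text query, e.g. a gene symbol
        "retmax": max_records,   # upper limit on returned record identifiers
        "retmode": "json",       # request a JSON response
    }
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical query: a receptor gene symbol used only as an example.
    print(build_omim_search_url("ADRB2"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the identifiers it returns would then be used to retrieve the individual OMIM entries.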

With regard to identifying genetic variation in the receptor sequence, there are other web sources in addition to the ones cited above that may provide further information. HGBASE (http://hgbase.cgr.ki.se/) is an attempt to summarize all known sequence variations in the human genome and includes highly curated information on SNPs, indels, simple tandem repeats and other sequence alternatives (Brookes 2001). The CGAP Genetic Annotation Initiative (http://lpg.nci.nih.gov/) is a research program to explore and apply technology for identifying genetic variation in genes involved in cancer, but it is in fact a source for examining variation in all genes. This is a data mining exercise, and differences observed in ESTs are a key source of data. The SNPs covered are classed as candidate (predicted), validated (observed in an experiment) and confirmed (tested in at least five CEPH families for Mendelian transmission and placed in genetic reference maps). This site also links to HGBASE and dbSNP, the NCBI database of SNPs (http://www.ncbi.nlm.nih.gov/SNP/).
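The three SNP classes can be expressed as a simple decision rule. The sketch below is only an illustration of the definitions given in the text. The record fields and function name are hypothetical and do not reflect the actual CGAP data schema.

```python
from dataclasses import dataclass

MIN_CEPH_FAMILIES = 5  # threshold stated in the text


@dataclass
class SnpEvidence:
    """Hypothetical summary of the evidence available for one SNP."""
    observed_in_experiment: bool = False
    ceph_families_tested: int = 0
    mendelian_transmission: bool = False
    placed_on_reference_map: bool = False


def classify_snp(ev):
    """Return 'confirmed', 'validated' or 'candidate' for an evidence record."""
    # Confirmed: tested in enough CEPH families, transmitted in a Mendelian
    # fashion, and placed on a genetic reference map.
    if (ev.ceph_families_tested >= MIN_CEPH_FAMILIES
            and ev.mendelian_transmission
            and ev.placed_on_reference_map):
        return "confirmed"
    # Validated: seen in at least one experiment.
    if ev.observed_in_experiment:
        return "validated"
    # Otherwise the SNP is only a prediction, e.g. from EST differences.
    return "candidate"
```

For example, a SNP predicted purely from EST differences would be passed as `SnpEvidence()` and classed as a candidate until experimental evidence is added.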

It is important to remember that studies on mouse receptors (knockouts, for example) and the existence of mouse mutants also increase our understanding of the human receptors. There are several web sources for finding the equivalent mouse receptors and the syntenic regions of mouse chromosomes. A comprehensive source is the Mouse Genome Informatics (MGI) site (http://www.informatics.jax.org/), where one can view mammalian homology maps and link even further to lower organisms. If the mouse chromosomal region is known, it is possible to see whether any phenotypes have been linked to the region. For example, the data provided by the mouse ENU Mutagenesis Programme (http://www.mgu.har.mrc.ac.uk/mutabase/) can suggest links to phenotypes (Nolan et al. 2000). This web resource is linked back to the MGI.

Sequence variation can also be analysed in terms of the significance of any nucleotide change to protein sequence and receptor function. It is useful to know whether a predicted change has been observed before. A useful reference source for GPCRs is tGRAP (originally tinyGRAP) (http://tinygrap.uit.no/), a database of GPCR mutation data containing over 10,000 mutations from close to 1400 papers (Beukers et al. 1999). This data source has multiple sequence alignments of family members, with TMDs highlighted, and also links to SWISSPROT entries.
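A first step in judging the significance of a nucleotide change is to ask whether it alters the encoded amino acid. The sketch below classifies a single-base substitution using the standard genetic code. It assumes an in-frame DNA coding sequence containing only A, C, G and T. The function name and the short example sequence are hypothetical, and the result says nothing about the effect on receptor function.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(cds, position, new_base):
    """Classify a single-base change in an in-frame coding sequence.

    `position` is a 0-based index into `cds`. Returns a tuple
    (kind, old_amino_acid, new_amino_acid), where '*' denotes a stop codon.
    """
    cds = cds.upper()
    new_base = new_base.upper()
    if new_base not in BASES:
        raise ValueError("new_base must be one of A, C, G, T")
    # Only positions inside a complete codon can be classified.
    if not 0 <= position < len(cds) - len(cds) % 3:
        raise IndexError("position is outside the complete codons of cds")

    offset = position % 3                 # position within the codon
    start = position - offset             # first base of the affected codon
    old_codon = cds[start:start + 3]
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa = CODON_TABLE[old_codon]
    new_aa = CODON_TABLE[new_codon]

    if old_aa == new_aa:
        kind = "synonymous"
    elif new_aa == "*":
        kind = "nonsense"
    elif old_aa == "*":
        kind = "stop-loss"
    else:
        kind = "missense"
    return kind, old_aa, new_aa
```

A missense or nonsense result would then be the cue to check a mutation database such as tGRAP for earlier observations at the same residue.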
