Identification of Human Gene by Sequence Homology

The step-by-step procedure described below allows identification of the human gene encoding the counterpart of a protein already known in another organism. We have chosen the yeast MRS4 gene, encoding a splicing protein, a member of the mitochondrial carrier family (MCF) that suppresses mitochondrial splicing defects. The human counterpart of this gene is unknown as of January 2001.

1 Deambulum: http://www.infobiogen.fr/services/deambulum/fr/

[Figure: the Deambulum server menus (Databases; Sequence analysis) with links to GenBank, BLAST sequence similarity searching, SWISS-PROT, YPD, UniGene, and the Reverse Complement, Translate, and Multiple Alignment tools.]

Fig. 1. Presentation of the Deambulum server and its links with various molecular biology servers referred to in this chapter.

Because amino acid sequences have been better conserved than nucleotide sequences through evolution, it is more efficient to start from the yeast protein sequence to identify the human nucleotide sequence. The yeast protein sequence encoded by MRS4 can be retrieved from various databases, for example, GenBank,2 EMBL,3 SWISS-PROT,4 and the Yeast Proteome Database (YPD).5 Using this last resource, the amino acid sequence is retrieved by (1) clicking on Quick search for YPD (Saccharomyces cerevisiae), (2) entering the acronym MRS4 in the Gene names section and clicking on Submit, and (3) selecting on the next screen the retrieved result, MRS4 (a hypertext link), which leads to the general description of the gene and the protein. Then, in the Sequence section of that page, select See protein sequence (also a hypertext link). Select the sequence displayed on the next screen (there is no need to remove the amino acid numbering), copy it [Ctrl-C], and close that window.
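The copied sequence can be pasted into BLAST as it is, but a clean copy is convenient if the sequence is to be saved or reused elsewhere. The following is a minimal sketch, assuming the pasted text contains only residue letters, position numbers, and whitespace; the function name pasted_to_fasta, the header MRS4_yeast, and the example fragment are illustrative assumptions and not the real MRS4 sequence.

```python
import re
import textwrap


def pasted_to_fasta(pasted: str, header: str = "MRS4_yeast", width: int = 60) -> str:
    """Turn a sequence copied from a web page into a FASTA record.

    Everything that is not a letter (position numbers, spaces,
    line breaks) is discarded; the residues are upper-cased and
    wrapped at `width` characters per line.
    """
    residues = re.sub(r"[^A-Za-z]", "", pasted).upper()
    if not residues:
        raise ValueError("no residues found in pasted text")
    body = "\n".join(textwrap.wrap(residues, width))
    return f">{header}\n{body}"


# Made-up fragment with position numbers, for illustration only.
example = """
  1 MKTAYIAKQR QISFVKSHFS
 21 RQLEERLGLI
"""
print(pasted_to_fasta(example))
```

The header name is arbitrary; any short identifier that helps to recognize the query later will do.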

This amino acid sequence is used to identify human ESTs that could represent part of the human cDNA encoding the human homolog of the MRS4 protein (Fig. 2). The search is performed with the Basic Local Alignment Search Tool (BLAST), that is, BLAST sequence similarity searching.6 Different kinds of analyses can be performed with this program. The one to use for our purpose is Translated BLAST Searches, and the program to choose is tblastn (click on

2 GenBank: http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html

3 EMBL: http://www.ebi.ac.uk/embl/index.html

4 SWISS-PROT: http://www.expasy.ch/sprot/

5 YPD: http://www.proteome.com/databases/index.html

6 BLAST: http://www.ncbi.nlm.nih.gov/BLAST/

[Fig. 2, flowchart excerpt: Yeast MRS4 amino acid sequence -> BLAST searching (tblastn option) in human dbEST (GenBank)]
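The tblastn program compares a protein query with a nucleotide database that is translated in all six reading frames, which is why a yeast protein can be used directly to search human ESTs. The toy sketch below illustrates only the six-frame translation step, using the standard genetic code. It is not BLAST: a real search scores local alignments and tolerates mismatches, whereas this sketch looks for an exact peptide match. All function names and the short test sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons give 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna: str) -> dict:
    """Translate a nucleotide sequence in the three forward (+1..+3)
    and three reverse-complement (-1..-3) reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def frames_containing(peptide: str, dna: str) -> list:
    """Names of the frames whose translation contains the exact peptide."""
    return [name for name, protein in six_frames(dna).items() if peptide in protein]


# Made-up "EST": the peptide MAG is encoded in forward frame 2.
est = "CATGGCTGGTTAA"
print(six_frames(est))
print(frames_containing("MAG", est))
```

Because an EST may have been sequenced from either strand and may start at any position within a codon, all six frames have to be examined; the frame in which the match is found indicates the strand and the codon phase of the coding region.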
