Database searching using BLAST and FASTA

The aim of this section is to give an overview of the tools available for database searching and sequence comparison; it is not intended to give an in-depth description of sequence alignment algorithms. For the theory and methodology behind tools such as the Basic Local Alignment Search Tool (BLAST) and FASTA, refer to the original research papers or a bioinformatics textbook such as 'Introduction to Bioinformatics' (Attwood and Parry-Smith 1999).

The BLAST program is based on a heuristic sequence comparison algorithm used to search sequence databases for optimal local alignments to a query (Altschul et al. 1990, 1997).

The BLAST suite of programs supports a number of different query options: DNA sequence query against a DNA database (BLASTN), protein sequence query against a protein database (BLASTP), six-frame translated DNA sequence query against a protein database (BLASTX), protein sequence query against a six-frame translated DNA database (TBLASTN) and six-frame translated DNA sequence query against a six-frame translated DNA database (TBLASTX). BLAST can be downloaded for local use or run at a number of public web sites such as the NCBI (http://www.ncbi.nlm.nih.gov/BLAST/) or DDBJ (http://www.ddbj.nig.ac.jp/E-mail/homology.html).
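The six-frame translation used by BLASTX, TBLASTN and TBLASTX can be illustrated with a short Python sketch. This is not code from BLAST itself: the function names and the frame labels (+1 to +3, -1 to -3) are assumptions made for illustration, and the standard genetic code is assumed throughout.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate three forward and three reverse-strand reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCATTGTAATGGGCCGC").items():
        print(name, protein)
```

Each of the six conceptual translations is then compared at the protein level, which is why a translated search can detect similarity that is no longer visible at the DNA level.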

Position Specific Iterated BLAST (PSI-BLAST) (Altschul et al. 1997) is a hybrid version of BLAST used for searching a protein sequence against a protein database. It combines the speed of the BLAST pairwise alignment algorithm with the advantages of searching with a sequence profile. On the first run of the program a normal BLASTP search is carried out. Sequences that are found within a pre-determined threshold are used to build a profile. On subsequent searches, or iterations, the profile is used to search the database and is refined based on the new matches identified. This method has the advantage that more distant sequence relationships can be found. However, care must be taken to mask out low complexity regions, as these can cause the profile to degenerate and align false positive matches. Applying this method to GPCR searching can be problematic due to the relatively low complexity of transmembrane regions. A public PSI-BLAST server can be found via http://www.ncbi.nlm.nih.gov/BLAST/.
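The iterate-and-refine idea can be sketched with a toy example in Python. This is a deliberately simplified illustration and not the PSI-BLAST algorithm: it assumes ungapped sequences of equal length, uses plain per-column residue frequencies in place of a log-odds position-specific scoring matrix, and uses an arbitrary score cut-off in place of an E-value threshold. All function names and parameters are hypothetical.

```python
from collections import Counter


def build_profile(seqs):
    """Per-column residue frequencies for equal-length, ungapped sequences."""
    profile = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        profile.append({res: n / len(seqs) for res, n in counts.items()})
    return profile


def profile_score(profile, seq):
    """Average frequency of the sequence's residues under the profile (0 to 1)."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, seq)) / len(profile)


def iterative_search(query, database, threshold=0.65, max_iterations=5):
    """Search with a profile, rebuild it from the hits, repeat until stable."""
    included = [query]  # the first round is effectively a single-sequence search
    for _ in range(max_iterations):
        profile = build_profile(included)
        hits = [s for s in database if profile_score(profile, s) >= threshold]
        new_included = [query] + [h for h in hits if h != query]
        if set(new_included) == set(included):
            break  # converged: no change in the set of included sequences
        included = new_included
    return included


if __name__ == "__main__":
    database = ["ACDEY", "ACDWY", "KLMNP"]
    print(iterative_search("ACDEF", database, max_iterations=1))
    print(iterative_search("ACDEF", database))
```

In this toy database the more distant sequence ACDWY falls below the cut-off in the first round and is only picked up once the closer match ACDEY has been folded into the profile. The same mechanism explains the risk described above: any false positive admitted in one round shifts the profile and makes further false positives more likely in the next.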

FASTA, described by Lipman and Pearson (1985), is based around the idea of identifying short regions common to both of the sequences being compared. These are then linked together in a heuristic manner similar to BLAST.
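The first step, finding short regions shared by two sequences, can be sketched in Python as a word lookup. This is a minimal illustration under stated assumptions rather than the FASTA implementation: the function name is hypothetical, the word length parameter is called ktup here, and the later scoring and joining stages are omitted.

```python
from collections import defaultdict


def shared_words_by_diagonal(query, subject, ktup=2):
    """Count exact words of length ktup shared by two sequences, per diagonal.

    A diagonal is identified by (query position - subject position), so
    matches lying on the same diagonal could belong to one ungapped alignment.
    """
    # Index every word in the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)

    # Scan the subject and record which diagonal each shared word falls on.
    diagonals = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            diagonals[i - j] += 1
    return dict(diagonals)


if __name__ == "__main__":
    print(shared_words_by_diagonal("ACDEFG", "XXACDEFG"))
```

Diagonals that collect many shared words mark candidate regions of similarity, which are the regions a heuristic method would go on to score and link together.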
