Data sources

Three centres maintain the main repositories for sequence data: the European Molecular Biology Laboratory (EMBL) data library, located at the European Bioinformatics Institute (EBI); GenBank®, the National Institutes of Health (NIH) database, located at the National Center for Biotechnology Information (NCBI); and the DNA Data Bank of Japan (DDBJ). Together the three centres make up the International Nucleotide Sequence Database Collaboration.

Each centre provides facilities for researchers to submit new sequences to the databases. Once submitted, a sequence is assigned a unique accession number and placed in a division of the database based on taxonomy or sequencing project, for example Primate (gb_pr), EST (gb_est) or high-throughput genomic (HTG; gb_htg). Beyond the sequence itself, other information is captured, such as the sequence description, organism, author references, biological sequence features and cross-references to other databases. The data are shared freely among the three centres, so there is no need to query each database separately. The centres also provide text-search interfaces for querying the databases, such as NCBI's Entrez system and the EBI's Sequence Retrieval System (SRS).
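
In the GenBank flat-file format, the division is recorded as a three-letter code (for example PRI, EST or HTG) on the LOCUS line, and the primary accession number on the ACCESSION line. The Python sketch below shows how these two fields might be read from a record header. It is illustrative only: it assumes a simplified header in which the LOCUS line ends with the division code followed by the date, the division table is deliberately partial, and the record (accession XY123456) is made up rather than taken from a real database entry.

```python
# Minimal sketch: read the division and primary accession from a
# GenBank-style flat-file header. The record below is hypothetical.

# Partial, illustrative mapping of three-letter division codes.
DIVISIONS = {
    "PRI": "Primate",
    "EST": "Expressed sequence tag",
    "HTG": "High throughput genomic",
}


def parse_header(text: str) -> dict:
    """Return the division code, division name and primary accession.

    Assumes the LOCUS line ends with '<division> <date>', so the division
    code is the second-to-last whitespace-separated token.
    """
    info = {}
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            tokens = line.split()
            code = tokens[-2]
            info["division_code"] = code
            info["division"] = DIVISIONS.get(code, "other")
        elif line.startswith("ACCESSION"):
            # The first token after the keyword is the primary accession;
            # any further tokens would be secondary accessions.
            info["accession"] = line.split()[1]
    return info


# Hypothetical header, not a real database record.
example = """LOCUS       XY123456                1200 bp    mRNA    linear   PRI 01-JAN-2000
DEFINITION  Hypothetical example record.
ACCESSION   XY123456
"""

print(parse_header(example))
```

Running the script prints the division code, its name and the accession for the made-up record. A real parser (for example one from an established bioinformatics library) would also need to handle older LOCUS layouts, secondary accessions and the full list of divisions.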

In parallel with the public sequencing efforts, a number of companies, such as Human Genome Sciences, Incyte and Celera, have established proprietary EST and genomic sequence databases.
