DDBJ Database BioInformatics Notes

The document discusses the DDBJ Database, which is one of three nucleotide databases that form the International Nucleotide Sequence Database Collaboration. It was established in 1986 in Japan and collects sequence data from researchers worldwide, making it publicly accessible. The database provides tools for data retrieval and submission, and also works to develop software and provide training.

Uploaded by

euphoria.ly29
Copyright
© All Rights Reserved

DDBJ Database

Introduction
Databases are information banks used for storing and retrieving sequence
information. The DNA Data Bank of Japan (DDBJ) is one of three nucleotide databases that,
together with those of the National Center for Biotechnology Information (NCBI) and the
European Bioinformatics Institute (EMBL-EBI), form a consortium known as the International
Nucleotide Sequence Database Collaboration (INSDC). DDBJ is the only nucleotide sequence
databank of Asian origin and mainly collects sequences from Japanese researchers. It is a
primary nucleotide database: it collects data directly from researchers and makes the data
freely accessible to everyone. On accepting a nucleotide sequence, DDBJ issues the submitter
an accession number that is internationally recognized.

DDBJ Homepage: https://www.ddbj.nig.ac.jp/index-e.html

History
DDBJ was established in 1986 at the National Institute of Genetics (NIG), Japan,
with support from the Japanese Ministry of Education, Culture, Sports, Science and
Technology (MEXT). Later, for its efficient functioning, the Center for Information Biology
(CIB) was established at NIG in 1995. In 2004, NIG became a member of the Research
Organization of Information and Systems.
The functioning and maintenance of DDBJ are monitored by an international advisory
committee consisting of 9 members from Japan, Europe and the USA. The committee reviews
the functioning of DDBJ and reports its progress every year in the database issue of the
journal Nucleic Acids Research. Since its inception there has been a tremendous increase in
the number of sequences submitted to DDBJ.

Figure: Growth of DDBJ/EMBL/NCBI in terms of nucleotide data submitted over the years.

Roles of DDBJ
As a member of INSDC, the primary objective of DDBJ is to collect sequence data from
researchers all over the world and to issue a unique accession number for each entry. The
data collected from the submitters is made publicly available and anyone can access the data
through data retrieval tools available at DDBJ. Data submitted to DDBJ, EMBL or NCBI is
exchanged every day; therefore, at any given time, the three databases contain the same
data.

Activities of DDBJ
Following are the activities of DDBJ:
Collection of sequences
The sequences collected from the submitters are stored in the form of entries in the
database. Each entry consists of a nucleotide sequence, author information, references,
the organism from which the sequence was determined, properties of the sequence, etc.

Figure: Snapshot of the nucleotide sequence of the large subunit of Rubisco of Arabidopsis
thaliana retrieved from DDBJ. The entry contains information about various features of the
sequence, such as the accession number and authors.

Tools for data retrieval


Retrieval of data is as important as submission, and one of the main objectives of any database
is to provide users with the required information. Any database contains an enormous
amount of information, and retrieving the required information is a tricky task that
depends on the right use of search strings. DDBJ hosts a number of tools for data retrieval, such as
getentry (database retrieval by unique identifiers) and All-round Retrieval of Sequence and
Annotation (ARSA). Unique identifiers required for retrieval through getentry can be an
accession number, gene name, etc.
Following are the steps, along with snapshots, showing data retrieval from DDBJ using
getentry:
1. Open the homepage of DDBJ (http://www.ddbj.nig.ac.jp).
2. Click on the Search/Analysis link on the menu bar.
3. Click on the getentry link (http://getentry.ddbj.nig.ac.jp/top-e.html).
4. Type the accession number in the search box and click on search.
5. The desired sequence will be retrieved.

Figure: Snapshots of steps taken to retrieve sequence from DDBJ using getentry tool.
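
Since getentry retrieves records by accession number, it can help to check that a query
looks like an accession number before typing it into the search box. The following is a
minimal sketch in Python, not part of any DDBJ tool. It assumes the commonly used INSDC
nucleotide accession layouts (1 letter + 5 digits, 2 letters + 6 digits, or 2 letters +
8 digits, optionally followed by a version suffix such as ".1"); other layouts, for example
those used for whole genome shotgun data, are not covered. The function names are
hypothetical.

```python
import re

# Assumption: nucleotide accession numbers follow one of three layouts,
# 1 letter + 5 digits, 2 letters + 6 digits, or 2 letters + 8 digits,
# optionally followed by ".<version>". Other layouts are not handled here.
ACCESSION_RE = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?")


def looks_like_accession(text: str) -> bool:
    """Return True if text matches one of the assumed accession layouts."""
    return ACCESSION_RE.fullmatch(text.strip().upper()) is not None


def split_query(text: str) -> list[str]:
    """Split a comma- or whitespace-separated query into candidate identifiers."""
    return [token for token in re.split(r"[,\s]+", text.strip()) if token]


if __name__ == "__main__":
    for candidate in split_query("AB000100, u12345 rubisco"):
        print(candidate, looks_like_accession(candidate))
```

A query that fails this check may still be a valid gene name or another identifier, so the
check is only a guard against typing mistakes in accession numbers.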

Software development
The DDBJ team continuously focuses on developing new software that can be used for data
analysis. For example, WINA (A Window Analysis Program for the number of synonymous
and nonsynonymous nucleotide substitutions) has been developed by DDBJ. It is a tool that
helps in visualizing the difference in accumulation of synonymous and nonsynonymous
nucleotide substitutions.
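
To illustrate what a window analysis of synonymous and nonsynonymous changes means, here is
a simplified Python sketch. It is not WINA's algorithm, which the notes do not describe. It
assumes two aligned, gap-free coding sequences in the same reading frame, uses the standard
genetic code, labels each differing codon pair as synonymous (same amino acid) or
nonsynonymous (different amino acid), and counts the labels in a sliding window of codons.
All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codons(seq_a: str, seq_b: str) -> list[str]:
    """Label each aligned codon pair: '-' identical, 'S' synonymous, 'N' nonsynonymous."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, gap-free and a multiple of 3 long")
    labels = []
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            labels.append("-")
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            labels.append("S")
        else:
            labels.append("N")
    return labels


def window_counts(labels: list[str], window: int) -> list[tuple[int, int]]:
    """Return (synonymous, nonsynonymous) counts for each sliding window of codons."""
    return [
        (labels[i:i + window].count("S"), labels[i:i + window].count("N"))
        for i in range(len(labels) - window + 1)
    ]


if __name__ == "__main__":
    labels = classify_codons("ATGGCTAAA", "ATGGCCAGA")
    print(labels)                    # per-codon labels
    print(window_counts(labels, 2))  # counts per window of 2 codons
```

Plotting the two counts against window position gives the kind of picture such a tool is
meant to provide: regions where nonsynonymous changes accumulate stand out from regions
where changes are mostly synonymous.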

Training courses
DDBJ also focuses on providing teaching assistance in bioinformatics. It conducts a
bioinformatics training course that teaches the analysis of data.

Nucleotide Sequence Submission System (NSSS)


The DDBJ Nucleotide Sequence Submission System (NSSS) is a recent tool for the submission of
small-scale nucleotide sequence data to DDBJ through a World Wide Web (WWW) server.
Researchers can submit their data over the WWW using this system. Along with the
sequence, other information such as a description, author information, and references can
also be added. While submitting the sequence, the following information is required in a
step-by-step procedure:
1. Details of the contact person: the person who can be contacted in case any
clarification regarding the sequence is needed. One of the submitters is the contact person.
2. Hold date: The submitter can either release the sequence immediately after
submission or enter any date within 3 years of submission, after which the sequence will
be released in the public domain.
3. Submitter: Details of all the authors are to be given.
4. References: If the sequence is already published, the reference is to be given as the primary
citation (including the title of the paper and the journal in which it is published); if it is
not published, the "unpublished" option should be selected.
5. Sequence: The nucleotide sequence needs to be entered.
6. Template: DDBJ provides many sample templates that can match the requirements of the
submitter. For example, a user submitting a sequence obtained from bacteria can
select the corresponding template. Templates are also used when submitting multiple sequences.
7. Annotation: It includes all the other information about the sequence, for example
organism, molecule type, strain, product, etc.
After submission, the sequence is reviewed and then a unique accession number is assigned.
This acts as an ID for the sequence, which can later be used to retrieve the sequence
from databases.
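
The information gathered in the steps above can be pictured as one record that is checked
before submission. The following Python sketch is only an illustration and does not
reflect the actual NSSS data model: the class, its field names and the checks are
hypothetical. The 3-year limit on the hold date and the rule that the contact person is
one of the submitters come from the steps above; the set of allowed sequence characters
(IUPAC nucleotide codes) is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

MAX_HOLD_YEARS = 3                      # from the text: hold date within 3 years
IUPAC_DNA = set("ACGTRYSWKMBDHVN")      # assumption: IUPAC nucleotide codes allowed


def add_years(day: date, years: int) -> date:
    """Return the same calendar day `years` later (29 February falls back to 28)."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


@dataclass
class SubmissionDraft:
    """Hypothetical container for the information requested during submission."""
    contact_person: str
    submitters: list[str]
    sequence: str
    submitted_on: date
    hold_date: Optional[date] = None    # None means release immediately
    reference: str = "unpublished"
    annotation: dict[str, str] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """Return a list of human-readable problems; empty if the draft looks fine."""
        issues = []
        if self.contact_person not in self.submitters:
            issues.append("contact person must be one of the submitters")
        if not self.sequence or set(self.sequence.upper()) - IUPAC_DNA:
            issues.append("sequence must be non-empty and use nucleotide codes only")
        if self.hold_date is not None:
            latest = add_years(self.submitted_on, MAX_HOLD_YEARS)
            if not self.submitted_on <= self.hold_date <= latest:
                issues.append("hold date must fall within 3 years of submission")
        return issues


if __name__ == "__main__":
    draft = SubmissionDraft(
        contact_person="A. Sato",
        submitters=["A. Sato", "B. Ito"],
        sequence="ATGGCTAAA",
        submitted_on=date(2024, 1, 10),
        hold_date=date(2026, 1, 10),
        annotation={"organism": "Arabidopsis thaliana", "mol_type": "genomic DNA"},
    )
    print(draft.problems())
```

The names and dates in the usage example are made up for illustration only.
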
Figure: Snapshot of the DDBJ Nucleotide Sequence Submission Homepage

Figure: Snapshot of template page of Nucleotide Sequence Submission Tool.

Mass Submission System (MSS)


The Mass Submission System (MSS) is used for submitting large-scale data that cannot be
submitted using NSSS. It is used under the following conditions:
1. When there are a large number of sequences (>1024)
2. If a sequence has many features (>30), i.e. it is a complex submission
3. If the sequence is long (>500 kb)
4. If the data is of a type that is not accepted by NSSS, such as Expressed
sequence tags (EST), Sequence tagged site (STS), Transcriptome shotgun assembly
(TSA), High throughput genomic sequence (HTG), Genome survey sequences
(GSS), Whole genome shotgun sequence data (WGS), Contig/constructed data
(CON).
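
The four conditions above amount to a simple decision rule. As a minimal sketch, the
Python function below returns which system a submission would go to; the thresholds and
data types are taken from the list above, while the function name, its parameters and the
"general" default label are hypothetical.

```python
# Thresholds and data types taken from the conditions listed in the text.
NSSS_MAX_SEQUENCES = 1024
NSSS_MAX_FEATURES = 30
NSSS_MAX_LENGTH_BP = 500_000  # 500 kb
MSS_ONLY_TYPES = {"EST", "STS", "TSA", "HTG", "GSS", "WGS", "CON"}


def choose_submission_system(
    n_sequences: int,
    max_features_per_sequence: int,
    longest_sequence_bp: int,
    data_type: str = "general",  # hypothetical label for ordinary sequence data
) -> str:
    """Return "MSS" if any condition from the text applies, otherwise "NSSS"."""
    if data_type.upper() in MSS_ONLY_TYPES:
        return "MSS"
    if n_sequences > NSSS_MAX_SEQUENCES:
        return "MSS"
    if max_features_per_sequence > NSSS_MAX_FEATURES:
        return "MSS"
    if longest_sequence_bp > NSSS_MAX_LENGTH_BP:
        return "MSS"
    return "NSSS"


if __name__ == "__main__":
    print(choose_submission_system(12, 5, 1_500))            # small submission
    print(choose_submission_system(12, 5, 1_500, "EST"))     # data type forces MSS
    print(choose_submission_system(2_000, 5, 1_500))         # too many sequences
```
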
MSS submission requires two files:
1. Sequence file: It contains all the sequences in FASTA format.
2. Annotation file: It contains all the data about the sequence like authors, references,
features etc.
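
As an illustration of the sequence file, the Python sketch below writes several sequences
in plain FASTA format: a header line starting with ">" followed by the sequence, wrapped
to a fixed width. This shows generic FASTA only. Any additional conventions that MSS may
require (for example entry naming rules or entry terminators) are not covered and should
be checked against the DDBJ documentation. The function name and the example entry names
are hypothetical.

```python
def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format {entry name: sequence} as FASTA text, wrapping sequences at `width`."""
    lines = []
    for name, sequence in records.items():
        lines.append(f">{name}")
        sequence = sequence.upper()
        # Wrap the sequence into chunks of at most `width` characters.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {"entry1": "atggctaaagct", "entry2": "ggccaa"}  # made-up sequences
    print(to_fasta(example, width=10), end="")
```

The annotation file would then refer to the same entry names so that each set of
annotations can be matched to its sequence.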

Data Updates
Once a sequence is submitted, the submitter receives an accession number. If the submitter
later needs to modify or update the entry, the data update option is used. Only the
original data submitter is authorized to update the data.

DDBJ Sequence Read Archive (DRA)


Data generated by next-generation sequencing machines is submitted through neither
NSSS nor MSS but via DRA. Next-generation sequencing machines produce
enormous amounts of sequencing data. They have also largely replaced microarrays, as they
can be used to measure quantitative differences. DRA (DDBJ Sequence Read Archive) is a
database of records of the output generated by next-generation sequencing machines. It
contains data from the primary analysis phase of next-generation sequencing. DRA is similar
to the NCBI Sequence Read Archive (SRA) and the EBI Sequence Read Archive (ERA), and all
three form a part of INSDC. Data submission to DRA is facilitated by a web-based tool called
MetaDefine.

DDBJ Trace Archive (DTA)


DDBJ Trace Archive (DTA) is a database of DNA sequence chromatograms, base calls and
quality estimates for single-pass reads from various large-scale sequencing projects. It
contains unprocessed sequencing data. It mirrors data from the NCBI Trace Archive and the
EBI Trace Archive.

BioProject Database
Sequences generated as a result of a single study tend to be related in one way or another.
Such sequences, having a common origin, are submitted via BioProject. BioProject contains
information about various aspects of the project such as title, objectives, funding, etc. Data
in BioProject can be submitted by one or more research groups working under the same
project. It organizes data from the projects into archival databases which are linked. This
organization of data helps in fast searching and retrieval of data from various primary
databases across INSDC.

Database Services
DDBJ Omics Archive
DDBJ Omics Archive (DOR) contains quantitative genomics data from DNA microarray and
next-generation sequencing platforms. It exchanges data with EBI ArrayExpress and, like
ArrayExpress, follows the MINSEQE (Minimum Information about a High-Throughput
Sequencing Experiment) and MIAME (Minimum Information about a Microarray
Experiment) guidelines. DOR accepts unprocessed as well as processed data from researchers.

DDBJ Read Annotation Pipeline


DDBJ Read Annotation Pipeline is a cloud-computing-based pipeline analysis system. It is used for
analysis and annotation of raw sequencing data from next-generation sequencers submitted to DRA.
Through cloud computing, users of the Read Annotation Pipeline can access computers at NIG to
annotate their raw sequencing data. The pipeline consists of two processes:
1. Basic Analysis: It involves sequence mapping and de novo assembly.
2. High-level Analysis: It involves analytical processes for automatic and manual annotation,
such as SNP detection.
