
Biological - Databases Class Work 60

The document provides an overview of biological databases and bioinformatics, detailing their definitions, types, applications, and limitations. It emphasizes the importance of computational tools in understanding biological macromolecules and highlights various databases used for sequence, structural, and functional analysis. Additionally, it discusses the challenges faced by biological databases, such as errors and redundancy, and lists major repositories for DNA sequences.

Uploaded by

ckittu009

Introduction and Biological Databases
Overview

Introduction
What Is Bioinformatics?
Goal
Scope
Applications
Limitations

Introduction to Biological Databases
What Is a Database?
Types of Databases
Biological Databases
Pitfalls of Biological Databases
Information Retrieval from Biological Databases
What is Bioinformatics?

Bioinformatics involves the technology that uses computers for storage, retrieval, manipulation, and distribution of information related to biological macromolecules

Bioinformatics & Computational Biology


Goal

Better understand a living cell and how it functions at the molecular level
Scope

The development of computational tools and databases

The application of these tools and databases in generating biological knowledge
Scope

Tools development:

Writing software for sequence, structural, and functional analysis

Construction and curation of biological databases
Tools: Used in three areas

Molecular Sequence Analysis

Molecular Structural Analysis

Molecular Functional Analysis


Sequence Analysis
Sequence Alignment

Sequence Database Searching

Motif and Pattern Discovery

Gene and Promoter Finding

Reconstruction of Evolutionary Relationships

...
Structural Analysis

• Protein and nucleic acid structure

Analysis

Comparison

Classification

Prediction
Functional Analysis

Gene Expression Profiling

Protein–Protein Interaction Prediction

Protein Subcellular Localization Prediction

Metabolic Pathway Reconstruction

...
Applications

Drug design

Agricultural biotechnology

Forensic DNA analysis


Limitations

Fighting a battle without intelligence is inefficient and dangerous
Introduction to
Biological Databases
What is a Database?

Type of Databases:

Relational Databases

Object-Oriented Databases
Biological Databases

Primary Databases

Secondary Databases
Databases in Bioinformatics

Sequence databases

Sequence analysis

Functional genomics

Literature databases

Structural databases

Metabolic pathway databases

Specialized databases
Pitfalls of Biological Databases

Errors in Sequence Databases

Redundancy in the Primary Sequence Databases

False or Incomplete Gene Annotations


Errors in Nucleotide Sequences

sequencing errors

frame-shifts

contamination with sequences from cloning vectors

Exceptional care is needed with sequences produced before the 1990s
Redundancy
repeated submission of identical or overlapping sequences by the same or different authors

revision of annotations

dumping of expressed sequence tag (EST) data

poor database management


Bioinformatics Databases
Growing steadily in number

Growing amazingly in size

Specialization

Which genome they contain (mouse, human, all of them)

Which types of information about the genome they contain

Contain information such as

Sequences: of bases and of residues

Structure: 3d conformations of known proteins

Families: Which sets of genes are known to be homologous

Annotations: which processes each gene is involved in

And lots of other information


The definitive source….

• More than 1300 DB

• http://nar.oxfordjournals.org/content/39/suppl_1.toc
DNA Sequence Databases
Main repositories:
GenBank (US)
(http://www.ncbi.nlm.nih.gov/Genbank/index.html)

EMBL (Europe)
(http://www.ebi.ac.uk/embl/)

DDBJ (Japan)
(http://www.ddbj.nig.ac.jp/)

All three are primary databases
They exchange data with one another, so the DNA sequences they hold are identical
EMBL Database
Number of entries
(current 199,575,971)

Graphs created on 22 November 2010

http://www.ebi.ac.uk/embl/Services/DBStats/
www.ncbi.nlm.nih.gov
ENTREZ
NCBI (USA) National Center for Biotechnology Information
PubMed: the biomedical literature

Nucleotide sequence database (GenBank)

Protein sequence database


http://www.ncbi.nlm.nih.gov/Entrez/
Structure: three-dimensional macromolecular structures

Genome: complete genome assemblies

PopSet: population study data sets

OMIM: Online Mendelian Inheritance in Man

Taxonomy: organisms in GenBank

Books: online books

ProbeSet: gene expression and microarray datasets

3D Domains: domains from Entrez Structure

UniSTS: markers and mapping data

SNP: single nucleotide polymorphisms

CDD: conserved domains

Journals: journals in Entrez

UniGene: gene-oriented clusters of transcript sequences

PMC: full-text digital archive of life sciences journal literature


PubMed is…

• National Library of Medicine's search service


• >20 million citations in MEDLINE
• links to participating online journals
• PubMed tutorial (via side bar)
Entrez integrates…
• the scientific literature;
• DNA and protein sequence databases;
• 3D protein structure data;
• population study data sets;
• assemblies of complete genomes
Entrez is a search and retrieval system

Sequence Databases

Annotated sequence databases


SWISS-PROT, GenBank etc…

Usage: identifying function, retrieving information

Low-annotation sequence databases


EST databases, high-throughput genome sequences

Usage: discovery of new genes


General Protein Databases

SWISS-PROT

– Manually curated

– high-quality annotations, less data

GenPept/TREMBL

– Translated coding sequences from GenBank/EMBL

– Few annotations, more up to date

PIR

– Phylogenetic-based annotations

All three have combined efforts to form UniProt (http://www.uniprot.org)


Low-annotation Databases

ESTs (Expressed Sequence Tags)

Low-quality sequences generated by high-volume sequencing of the 3' or 5' ends of cDNAs

High-throughput genome sequences

Produced by mass-sequencing of genomic DNA
Non-redundant Databases
Sequence data only: cannot be browsed, can only be searched using a sequence

Combine sequences from more than one database

Examples:

NR Nucleic (GenBank+EMBL+DDBJ+PDB DNA)
NR Protein (SWISS-PROT+TrEMBL+GenPept+PDB protein)
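
The idea of combining several databases into one non-redundant set can be sketched in Python (used here instead of the Perl recommended later in these slides). This is a simplified illustration that assumes "redundant" means exact sequence identity; the identifiers and sequences are made up, and the slides do not describe how the real NR sets are built.

```python
def merge_non_redundant(*databases):
    """Merge {identifier: sequence} mappings, keeping each distinct sequence once.

    Returns a dict mapping each sequence to the sorted list of identifiers
    under which it was found.
    """
    merged = {}
    for records in databases:
        for identifier, sequence in records.items():
            # Compare case-insensitively so 'acgt' and 'ACGT' count as identical.
            merged.setdefault(sequence.upper(), []).append(identifier)
    return {seq: sorted(ids) for seq, ids in merged.items()}


# Toy records with hypothetical identifiers (not real accessions).
db_one = {"one_0001": "ATGGATTTTATT", "one_0002": "ATGACACAGCTT"}
db_two = {"two_0001": "atggattttatt", "two_0002": "GATCCTCCATAT"}

nr = merge_non_redundant(db_one, db_two)
for sequence, identifiers in nr.items():
    print(sequence, identifiers)
```

Four input records collapse to three entries here, because the first record of each toy database carries the same sequence.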
Sequence & Structure Databases

PDB (Protein Databank)

Stores 3-dimensional atomic coordinates for biological molecules including proteins and nucleic acids

Data obtained by X-ray crystallography, NMR, or computer modelling

http://www.rcsb.org/pdb/

MMDB (Molecular Modeling Database)

Over 28,000 3D macromolecular structures, including proteins and polynucleotides

(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure)

SCOP (Structural Classification of Proteins)

Classification of proteins according to structural and evolutionary relationships


File Formats
GenBank/GB, genbank flatfile format

NBRF format

EMBL, EMBL flatfile format

Swissprot

GCG, single sequence format of GCG software

DNAStrider, for common Mac program

Pearson/Fasta, a common format used by Fasta programs and others

Phylip3.2, sequential format for Phylip programs

Phylip, interleaved format for Phylip programs (v3.3, v3.4)

Plain/Raw, sequence data only (no name, document, numbering)

MSF multi sequence format used by GCG software

PAUP's multiple sequence (NEXUS) format

ASN.1 format used by NCBI


EMBL Format
ID   TRBG361 standard; mRNA; PLN; 1859 BP.
XX
AC   X56734; S46826;
XX
SV   X56734.1
XX
DT   12-SEP-1991 (Rel. 29, Created)
DT   15-MAR-1999 (Rel. 59, Last updated, Version 9)
XX
DE   Trifolium repens mRNA for non-cyanogenic beta-glucosidase
XX
KW   beta-glucosidase.
XX
OS   Trifolium repens (white clover)
OC   Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
OC   Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;
OC   eurosids I; Fabales; Fabaceae; Papilionoideae; Trifolieae; Trifolium.
XX
RN   [5]
RP   1-1859
RX   MEDLINE; 91322517.
RX   PUBMED; 1907511.
RA   Oxtoby E., Dunn M.A., Pancoro A., Hughes M.A.;
RT   "Nucleotide and derived amino acid sequence of the cyanogenic
RT   beta-glucosidase (linamarase) from white clover (Trifolium repens L.).";
RL   Plant Mol. Biol. 17(2):209-219(1991).
XX
RN   [6]
RP   1-1859
RA   Hughes M.A.;
RT   ;
RL   Submitted (19-NOV-1990) to the EMBL/GenBank/DDBJ databases.
RL   M.A. Hughes, UNIVERSITY OF NEWCASTLE UPON TYNE, MEDICAL SCHOOL, NEW CASTLE
RL   UPON TYNE, NE2 4HH, UK
XX
DR   GOA; P26204.
DR   MENDEL; 11000; Trirp;1162;11000.
DR   SWISS-PROT; P26204; BGLS_TRIRP.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1859
FT                   /db_xref="taxon:3899"
FT                   /mol_type="mRNA"
FT                   /organism="Trifolium repens"
FT                   /tissue_type="leaves"
FT                   /clone_lib="lambda gt10"
FT                   /clone="TRE361"
FT   CDS             14..1495
FT                   /db_xref="GOA:P26204"
FT                   /db_xref="SWISS-PROT:P26204"
FT                   /note="non-cyanogenic"
FT                   /EC_number="3.2.1.21"
FT                   /product="beta-glucosidase"
FT                   /protein_id="CAA40058.1"
FT                   /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSRSSFPRGFI
FT                   FGAGSSAYQFEGAVNEGGRGPSIWDTFTHKYPEKIRDGSNADITVDQYHRYKEDVGIMK
FT                   DQNMDSYRFSISWPRILPKGKLSGGINHEGIKYYNNLINELLANGIQPFVTLFHWDLPQ
FT                   VLEDEYGGFLNSGVINDFRDYTDLCFKEFGDRVRYWSTLNEPWVFSNSGYALGTNAPGR
FT                   CSASNVAKPGDSGTGPYIVTHNQILAHAEAVHVYKTKYQAYQKGKIGITLVSNWLMPLD
FT                   DNSIPDIKAAERSLDFQFGLFMEQLTTGDYSKSMRRIVKNRLPKFSKFESSLVNGSFDF
FT                   IGINYYSSSYISNAPSHGNAKPSYSTNPMTNISFEKHGIPLGPRAASIWIYVYPYMFIQ
FT                   EDFEIFCYILKINITILQFSITENGMNEFNDATLPVEEALLNTYRIDYYYRHLYYIRSA
FT                   IRAGSNVKGFYAWSFLDCNEWFAGFTVRFGLNFVD"
FT   mRNA            1..1859
FT                   /evidence=EXPERIMENTAL
XX
SQ   Sequence 1859 BP; 609 A; 314 C; 355 G; 581 T; 0 other;
     aaacaaacca aatatggatt ttattgtagc catatttgct ctgtttgtta ttagctcatt        60
     cacaattact tccacaaatg cagttgaagc ttctactctt cttgacatag gtaacctgag       120
     tcggagcagt tttcctcgtg gcttcatctt tggtgctgga tcttcagcat accaatttga       180
     aggtgcagta aacgaaggcg gtagaggacc aagtatttgg gataccttca cccataaata       240
     tccagaaaaa ataagggatg gaagcaatgc agacatcacg gttgaccaat atcaccgcta       300
     caaggaagat gttgggatta tgaaggatca aaatatggat tcgtatagat tctcaatctc       360
     ttggccaaga atactcccaa agggaaagtt gagcggaggc ataaatcacg aaggaa
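
Because every line of an EMBL entry begins with a two-letter code, the format is easy to process with a short script. The sketch below is in Python (rather than the Perl recommended later in these slides) and uses only the SQ line and first sequence line of the entry above. It checks that the base counts on the SQ line add up to the declared length. The function names are illustrative, not part of any library.

```python
import re

# The SQ line and the first sequence line of the EMBL entry above.
SQ_LINE = "SQ   Sequence 1859 BP; 609 A; 314 C; 355 G; 581 T; 0 other;"
FIRST_SEQ_LINE = "     aaacaaacca aatatggatt ttattgtagc catatttgct ctgtttgtta ttagctcatt        60"


def parse_sq_line(line):
    """Return (declared_length, {base: count}) from an EMBL SQ line."""
    if not line.startswith("SQ"):
        raise ValueError("not an SQ line")
    length = int(re.search(r"Sequence\s+(\d+)\s+BP;", line).group(1))
    counts = {base: int(n) for n, base in re.findall(r"(\d+)\s+([ACGT]|other);", line)}
    return length, counts


def clean_sequence_line(line):
    """Drop the spacing and the trailing position counter from a sequence line."""
    return "".join(ch for ch in line if ch.isalpha())


length, counts = parse_sq_line(SQ_LINE)
print(length, counts)
print("counts add up:", sum(counts.values()) == length)
print("bases on first line:", len(clean_sequence_line(FIRST_SEQ_LINE)))
```

A check like this is one cheap way to catch truncated or corrupted entries before any further analysis.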
Genbank Format
LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999
DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p
            (AXL2) and Rev7p (REV7) genes, complete cds.
ACCESSION   U49845
VERSION     U49845.1  GI:1293613
KEYWORDS    .
SOURCE      Saccharomyces cerevisiae (baker's yeast)
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
            Saccharomycetales; Saccharomycetaceae; Saccharomyces.
REFERENCE   1  (bases 1 to 5028)
  AUTHORS   Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
  TITLE     Cloning and sequence of REV7, a gene whose function is required for
            DNA damage-induced mutagenesis in Saccharomyces cerevisiae
  JOURNAL   Yeast 10 (11), 1503-1509 (1994)
  MEDLINE   95176709
  PUBMED    7871890
REFERENCE   2  (bases 1 to 5028)
  AUTHORS   Roemer,T., Madden,K., Chang,J. and Snyder,M.
  TITLE     Selection of axial growth sites in yeast requires Axl2p, a novel
            plasma membrane glycoprotein
  JOURNAL   Genes Dev. 10 (7), 777-793 (1996)
  MEDLINE   96194260
  PUBMED    8846915
REFERENCE   3  (bases 1 to 5028)
  AUTHORS   Roemer,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (22-FEB-1996) Terry Roemer, Biology, Yale University, New
            Haven, CT, USA
FEATURES             Location/Qualifiers
     source          1..5028
                     /organism="Saccharomyces cerevisiae"
                     /db_xref="taxon:4932"
     gene            687..3158
                     /gene="AXL2"
     CDS             687..3158
                     /gene="AXL2"
                     /note="plasma membrane glycoprotein"
                     /codon_start=1
                     /function="required for axial budding pattern of
                     cerevisiae"
                     /product="Axl2p"
                     /protein_id="AAA98666.1"
                     /db_xref="GI:1293615"
                     /translation="MTQLQISLLLTATISLLHLVVATPYEAYPIGKQY
                     TFQISNDTYKSSVDKTAQITYNCFDLPSWLSFDSSSRTFSGEPSSDLL
                     VILEGTDSADSTSLNNTYQFVVTNRPSISLSSDFNLLALLKNYGYTNG
                     VFNVTFDRSMFTNEESIVSYYGRSQLYNAPLPNWLFFDSGELKFTGTA
                     TSYSFVIIATDIEGFSAVEVEFELVIGAHQLTTSIQNSLIINVTDTGN
                     YLDDDPISSDKLGSINLLDAPDWVALDNATISGSVPDELLGKNSNPAN
                     DVIYFNFEVVSTTDLFAISSLPNINATRGEWFSYYFLPSQFTDYVNTN
                     DHDWVKFQSSNLTLAGEVPKNFDKLSLGLKANQGSQSQELYFNIIGMD
                     NATSTRSSHHSTSTSSYTSSTYTAKISSTSAAATSSAPAALPAANKTS
                     CGVAIPLGVILVALICFLIFWRRRRENPDDENLPHAISGPDLNNPANK
                     NPFDDDASSYDDTSIARRLAALNTLKLDNHSATESDISSVDEKRDSLS
                     SQSKEELLAKPPVQPPESPFFDPQNRSSSVYMDSEPAVNKSWRYTGNL
                     YGSQKTVDTEKLFDLEAPEKEKRTSRDVTMSSLDPWNSNISPSPVRKS
                     HRNRHLQNIQDSQSGKNGITPTTMSTSSSDDFVPVKDGENFCWVHSME
                     VDFSNKSNVNVGQVKDIHGRIPEML
BASE COUNT     1510 a   1074 c    835 g   1609 t
ORIGIN
        1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaa
       61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtag
      121 ctgcatctga agccgctgaa gttctactaa gggtggataa catcatccgt gcaa
      181 gaaccgccaa tagacaacat atgtaacata tttaggatat acctcgaaaa taat
      241
Swissprot format
Specialized Sequence Databases

Focus on a specific type of sequences

Sequences are often modified or specially annotated
Usage depends on the database
Examples:
Ribosomal RNA databases
Immunology databases
Protein domain databases

Pfam (http://www.sanger.ac.uk/Software/Pfam/)

• Collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families

SMART (a Simple Modular Architecture Research Tool)

• Identification and annotation of genetically mobile domains and the analysis of domain architectures

• (http://smart.embl-heidelberg.de/help/smart_about.shtml)

CDD (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi)

• Combines SMART and Pfam databases

• Easier and quicker search


Sequence Motif Databases

ScanProsite (http://www.expasy.org/prosite)
and PRINTS (http://bioinf.man.ac.uk/dbbrowser/PRINTS/)

Store conserved motifs occurring in nucleic acid or protein sequences

Motifs can be stored as consensus sequences, alignments, or using statistical representations such as residue frequency tables
Ribosomal RNA Databases
RDP (Michigan State University, USA)
http://rdp.cme.msu.edu/html/

rRNA database (University of Antwerp, Belgium)
http://rrna.uia.ac.be/

Ribosomal RNA sequences are pre-aligned according to their secondary structure
Usage: creating data sets for molecular phylogeny, especially for microbial taxonomy and identification
Immunological Sequence Databases

The Kabat Database of Sequences of Proteins of Immunological Interest
www.hgmp.mrc.ac.uk/Bioinformatics/Databases/kabatp-help.html

Sequences are classified according to antigen specificity, and available in pre-aligned format

The Immunogenetics database (IMGT)
http://imgt.cnusc.fr:8104/

Focuses on immunoglobulins, T-cell receptors and MHC genes
Genome Databases

Focus on one organism or group of organisms:

Colibase (E. coli and related species) http://colibase.bham.ac.uk/

GDB (human) http://www.gdb.org/

Flybase (Drosophila) http://flybase.bio.indiana.edu/

WormBase (C. elegans) http://wormbase.org

AtDB (Arabidopsis) http://www.arabidopsis.org

SGD (S. cerevisiae) http://genome-www.stanford.edu/Saccharomyces/


Expression Databases
RNA expression

Results of microarray experiments measuring the change in specific mRNA content under certain conditions

ArrayExpress (EBI) and GEO (NCBI)

Not user friendly

Proteome databases

2D gel electrophoresis images representing the protein content of a cell or tissue under specific conditions

SWISS-2DPAGE at http://us.expasy.org/ch2d/


Other Database Types
Literature

MEDLINE (http://ncbi.nlm.nih.gov/PubMed/)

HighWire (http://www.highwire.org)

Variation

dbSNP (http://ncbi.nlm.nih.gov/SNP/)

HGBase (http://hgbase.interactiva.de)

Metabolic pathways

KEGG (http://kegg.genome.ad.jp/kegg/)

WIT (http://wit.mcs.anl.gov/WIT2)

Organisms and nomenclature

Taxonomies (e.g.: http://ncbi.nlm.nih.gov/Taxonomy/)

Mendel (http://mbclserver.rutgers.edu/CPGN)
Methods for Accessing Data

local installation

screen scraping

BioPerl

FTP sites
Local Installations
SRS

Need to obtain a license from Lion Bioscience

Download data from FTP sites

Ensembl

"framework to organize biology around the sequences of large genomes"

www.ensembl.org
Screen Scraping
URL spoofing
Construction of URLs that replicate the query

HTML parsing
Extraction of results from the HTML pages returned by the query

Requirements
An HTML module

Knowledge of the query mechanism

Method NOT advocated by most data providers
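
The two steps, building a query URL and pulling results out of the returned HTML, can be sketched with Python's standard library (Python is used here instead of Perl). The endpoint, parameter names, and results page below are all hypothetical, and the example parses a local string so that no server is contacted.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical query endpoint and parameter names, for illustration only.
BASE_URL = "http://example.org/cgi-bin/query"


def build_query_url(database, term):
    """Step 1: construct a URL that replicates what the web form would send."""
    return BASE_URL + "?" + urlencode({"db": database, "term": term})


class LinkTextParser(HTMLParser):
    """Step 2: collect the text of every <a> element in a results page."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link and data.strip():
            self.hits.append(data.strip())


# Made-up results page standing in for what a server might return.
SAMPLE_HTML = (
    "<html><body><ul>"
    "<li><a href='/entry/1'>X56734</a></li>"
    "<li><a href='/entry/2'>U49845</a></li>"
    "</ul></body></html>"
)

url = build_query_url("nucleotide", "beta-glucosidase")
parser = LinkTextParser()
parser.feed(SAMPLE_HTML)
print(url)
print(parser.hits)
```

A parser like this breaks as soon as the provider changes its page layout, which is one practical reason the method is discouraged.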


BioPerl

BioPerl is a collection of modules that facilitates the development of Perl scripts for bioinformatics applications.

www.bioperl.org
SWISSPROT
http://www.ebi.ac.uk/swissprot/

European/Swiss Bioinformatics Institute 1986

Highly accurate, hand curated resource
Aims:
Have a high level of annotation
Often by the people who have been working with the gene
Have a low level of redundancy
Have a high level of integration with other databases
TrEMBL
http://www.ebi.ac.uk/trembl/

SWISSPROT’s Big Brother

Translated coding sequences that have not yet been included in SWISSPROT

Computer annotated rather than human annotated


PROSITE
http://ca.expasy.org/prosite/

Families of proteins

Can search using regular expressions

Similar to unix commands using wildcards, etc.

E.g., [AC]-x-V-x(4)-{ED}

Interpreted as:
[Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}

Families exhibit these patterns


So we can search over families

1574 documents about 1308 different patterns
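
The pattern syntax on this slide maps almost directly onto ordinary regular expressions. A minimal Python sketch of the translation (Python instead of Perl) is given below. It covers only the elements shown on the slide plus ranges such as x(2,4); the N- and C-terminal anchors of the full PROSITE syntax are not handled, and `prosite_to_regex` is an illustrative name, not a library function.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as x(4) or x(2,4).
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        core, repeat = match.groups()
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core                     # one residue, or a [..] choice
        if repeat:
            regex += "{" + repeat + "}"
        parts.append(regex)
    return "".join(parts)


regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
print(regex)
print(bool(re.search(regex, "GGAKVLLLLQ")))  # last position is Q: allowed
print(bool(re.search(regex, "GGAKVLLLLE")))  # last position is E: excluded
```

The slide's example becomes `[AC].V.{4}[^ED]`, which matches the first test sequence and rejects the second.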


PFAM
http://pfam.sanger.ac.uk/

Maintained by the Sanger Centre (Cambridge)

Protein families aligned using HMMs

Hidden Markov Models (see later lecture)

Given a new sequence

Find families which the sequence might fit into

Sequence Coverage

11912 families

Split into Pfam-A (high quality) and Pfam-B (low quality)


SCOP and CATH
SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)
Structural Classification of Proteins

Hierarchically ordered and manually curated

38221 PDB Entries

110800 Domains

CATH (http://www.cathdb.info/)
Classification of protein domain structures

124 folds

226 Superfamily

1148 Sequence family

14473 Domain
Using Databases
with the FASTA Format

May need to know the FASTA format


For residue sequences
First line must start with a > sign
First line contains identification information for the gene
Other lines contain the residue sequence
OK to have a ragged right format
Usually OK to have lower case (but check)
Example FASTA Format

>gi|121664|sp|P00435|GSHC_BOVIN GLUTATHIONE PEROXIDASE
mcaaqrsaaalaaaaprtvyafsarplaggepfnlsslrgkvllienvak
slcgttvrdytqmndlqrrlgprglvvlgfpcnqfghqenakneeilncl
yvrpgggfepnfmlfekcevngekahplfaflrevlptpsddatalmtdp
kfitwspvcrndvswnfekflvgpdgvpvrrysrrfltidiepdietlls
qgasa

Header fields:
gi|121664: GenInfo number, assigned by the NCBI
sp: indicates that SWISS-PROT was the source database
P00435|GSHC_BOVIN: SWISS-PROT identifier (accession number and entry name)
GLUTATHIONE PEROXIDASE: molecule name
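
Reading this format takes only a few lines of code. The following Python sketch (Python instead of Perl) splits FASTA text into records and then breaks the NCBI-style header of the example above into its fields. The function names are illustrative, and the header splitter assumes exactly the `gi|number|db|accession|name` layout shown on the slide.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # ragged line lengths are fine
    if header is not None:
        yield header, "".join(chunks)


def split_ncbi_header(header):
    """Split a 'gi|number|db|accession|name description' header into fields."""
    ident, _, description = header.partition(" ")
    fields = ident.split("|")
    return {
        "gi": fields[1],
        "source": fields[2],
        "accession": fields[3],
        "entry_name": fields[4],
        "description": description,
    }


EXAMPLE = """>gi|121664|sp|P00435|GSHC_BOVIN GLUTATHIONE PEROXIDASE
mcaaqrsaaalaaaaprtvyafsarplaggepfnlsslrgkvllienvak
slcgttvrdytqmndlqrrlgprglvvlgfpcnqfghqenakneeilncl
yvrpgggfepnfmlfekcevngekahplfaflrevlptpsddatalmtdp
kfitwspvcrndvswnfekflvgpdgvpvrrysrrfltidiepdietlls
qgasa
"""

records = list(read_fasta(EXAMPLE))
header, sequence = records[0]
info = split_ncbi_header(header)
print(info)
print(len(sequence), sequence[:10])
```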
Analyzing Results
Using PERL Scripts

Database servers now do:

Increasingly specific analysis of your results

But you will eventually need to do analysis

Ideal programming language is PERL

Designed to manipulate text and files

Can use it to play around with (manipulate) strings

Will be using it in the coursework

PERL Tutorial
PDB Format
The PDB format consists of a collection of fixed-format records that describe:

Atomic coordinates,

Chemical and biochemical features

Experimental details of the structure determination

Some structural features such as

Secondary structure assignments,

Hydrogen bonding

Biological assemblies

Active sites
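
"Fixed format" means that each field sits in set columns of the line, so records are read by slicing rather than by splitting on whitespace. The Python sketch below (Python instead of Perl) slices one coordinate record, using the column positions of the ATOM record as I understand them from the PDB format description. The sample line is invented for illustration and is not taken from a real entry.

```python
# An invented ATOM record laid out in the fixed PDB columns.
SAMPLE = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"


def parse_atom_record(line):
    """Slice one ATOM/HETATM record into named fields.

    The format description numbers columns from 1; Python slices count from 0,
    so columns 31-38 (x coordinate) become line[30:38].
    """
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not a coordinate record")
    return {
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "residue_name": line[17:20].strip(),
        "chain_id": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


atom = parse_atom_record(SAMPLE)
print(atom)
```

Slicing by column keeps working when neighbouring fields touch with no space between them, a case where splitting on whitespace would fail.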
