Co3 Session 21

Uploaded by

ar3wnd

Department of CSE

COURSE NAME: DBMS

COURSE CODE: 23AD2102R
Topic: Index Structures, Indexing and Hashing

Session-21

CREATED BY K. VICTOR BABU

What is Indexing?

Indexing is a data structure technique that allows you to quickly retrieve records from a database file.

Indexes are used to quickly locate data without having to search every record in multiple disk blocks.

The collection of data that makes up a computerized database must be physically stored on some computer storage medium. The DBMS software can then retrieve, update, and process this data as needed.

Similar to Indexing in Textbooks

Indexes as Access Paths

• A single-level index is an auxiliary file that makes it more efficient to search for a record in the data file.
• The index is usually specified on one field of the file (although it could be specified on several fields).
• One form of an index is a file of entries <field value, pointer to record>, which is ordered by field value.
• The index is called an access path on the field.

What is an Index in a Database?

• An index is a small table with only two columns. The first column holds a copy of the primary or candidate key of a table. The second column holds a set of pointers, each giving the address of the disk block where that specific key value is stored.

An index:
• Takes a search key as input
• Efficiently returns a collection of matching records
Types of Indexing

• Primary Indexing
  • Dense
  • Sparse
• Clustered Indexing
• Secondary Indexing
• Multilevel Indexing

Which indexing method is used?
Primary Indexing
• A primary index is an ordered file of fixed-length entries with two fields.
• The first field is the same as the primary key, and the second field is a pointer to the data block that holds it.
• Primary indexing is further divided into two types:
  • Dense index
  • Sparse index
Dense Index
• An index record is created for every search-key value in the database.
• Searching is faster.
• Requires more space to store index records.
• Number of records in the index table (IT) = number of records in the data file on disk (HD).

Sparse Index
• A sparse index contains only the anchor records (the first record of each block).
• To locate a record, we find the index record with the largest search-key value <= the search-key value we are looking for.
• We start at the record pointed to by that index record and follow the pointers in the file sequentially until we find the desired record.
• Number of records in the index table (IT) = number of blocks in the data file on disk (HD).
• Time complexity = log2(N) + 1 block accesses: a binary search over an index of N blocks, plus one access to the data block.
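
The dense and sparse lookups described above can be sketched in a few lines of Python. This is a minimal illustration, not a DBMS implementation: the `blocks` layout, the sample keys, and the names `sparse_index`, `dense_index` and `sparse_lookup` are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical data file: records sorted by key and grouped into blocks of 3.
blocks = [
    [(5, "a"), (8, "b"), (12, "c")],
    [(15, "d"), (21, "e"), (27, "f")],
    [(30, "g"), (34, "h"), (41, "i")],
]

# Sparse index: one (anchor key, block number) entry per block.
sparse_index = [(block[0][0], n) for n, block in enumerate(blocks)]

# Dense index: one (key, block number, slot) entry per record.
dense_index = [(key, n, s)
               for n, block in enumerate(blocks)
               for s, (key, _) in enumerate(block)]

def sparse_lookup(key):
    """Find the largest anchor <= key, then scan that block sequentially."""
    anchors = [anchor for anchor, _ in sparse_index]
    i = bisect_right(anchors, key) - 1   # binary search on the index
    if i < 0:
        return None                      # key is smaller than every anchor
    _, block_no = sparse_index[i]
    for k, payload in blocks[block_no]:  # sequential scan inside one block
        if k == key:
            return payload
    return None
```

Here the sparse index has 3 entries (one per block) while the dense index has 9 (one per record), which mirrors the two counting rules on the slides.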

Indexes as Access Paths
• The index file usually occupies considerably fewer disk blocks than the data file because its entries are much smaller.
• A binary search on the index yields a pointer to the file record.
• Indexes can also be characterized as dense or sparse:
  • A dense index has an index entry for every search-key value (and hence every record) in the data file.
  • A sparse (or nondense) index, on the other hand, has index entries for only some of the search values.

Clustered Index
• Clustering index is defined on an ordered data file. The
data file is ordered on a non-key field.
• In some cases, the index is created on non-primary key
columns which may not be unique for each record.
• In such cases, in order to identify the records faster, we
will group two or more columns together to get the
unique values and create index out of them. This
method is known as the clustering index.
• Basically, records with similar characteristics are
grouped together and indexes are created for these
groups.
Indexes as Access Paths
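• Each data block contains a block pointer to the next block that holds records with the same clustering field value.
• The search cost is slightly higher than with a primary index.
• Uses a sparse index.
• Time complexity = log2(N) + 1 + 1 + ... block accesses: one extra access for each additional block that shares the clustering field value.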

Secondary Indexing

Unordered File with Secondary Key: Secondary Index Example

• The file is ordered on Eid (primary key).
• The search is to be done using Pno.
• So the index table maintains Pno as its key, in sorted order.
• Time complexity = log2(N) + 1
Secondary Indexing
Unordered File with Non-key: Secondary Index Example
• The search is done by Ename (non-key).
• The index file contains Ename as its key and is ordered.
• An intermediate index layer is maintained, which contains blocks of record pointers.
• A pointer in the index table (IT) points to a particular block, and the record pointers in that block point to the records in the data file (HD).
• Time complexity = log2(N) + 1 + 1
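
The extra level of indirection for a non-key secondary index can be sketched as follows. The sample employees, the in-memory lists standing in for disk blocks, and the names `pointer_blocks`, `index_table` and `find_by_name` are assumptions made for illustration only.

```python
from bisect import bisect_left

# Hypothetical unordered data file: position -> (Eid, Ename).
data_file = [(3, "Ravi"), (1, "Asha"), (4, "Ravi"), (2, "Kiran"), (5, "Asha")]

# Intermediate layer: one block of record pointers (positions) per Ename.
pointer_blocks = {}
for pos, (_, ename) in enumerate(data_file):
    pointer_blocks.setdefault(ename, []).append(pos)

# Index table: ordered on Ename, each entry points to a pointer block.
index_table = sorted(pointer_blocks.items())
index_keys = [name for name, _ in index_table]

def find_by_name(ename):
    """Binary search the index, then follow the pointer block to the records."""
    i = bisect_left(index_keys, ename)
    if i == len(index_keys) or index_keys[i] != ename:
        return []
    _, block = index_table[i]                 # one access: the pointer block
    return [data_file[pos] for pos in block]  # then the data records
```

The two hops after the binary search (pointer block, then data record) correspond to the "+ 1 + 1" in the cost stated above.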
Radhika
Rani
Types of Single-Level Indexes

Primary Index
• Defined on an ordered file.
• The data file is ordered on a key field (distinct value for each record).
• One index entry for each disk block. The key field value is that of the first record in the block, which is called the block anchor.
• Nondense (sparse) index.

Clustering Index
• Defined on an ordered file.
• The data file is ordered on a non-key field (no distinct value for each record).
• One index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value.
• Nondense (sparse) index.

Secondary Index
• A secondary means of accessing a file.
• The indexing field may be a candidate key with unique values or a non-key field with duplicate values.
• The index is an ordered file with two fields: (1) the field value; (2) either a block pointer or a record pointer.
• If the field is a key, the index is dense. If it is a non-key, the index may be dense or sparse.
Multi-Level Indexes

A Two-level Primary Index

Dynamic Multilevel Indexes Using B-trees
and B+-trees
Multi-Level Indexes

• Because a single-level index is an ordered file, we can create a primary index to the index itself.
• In this case, the original index file is called the first-level index and the index to the index is called the second-level index.
• We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block.
• A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index consists of more than one disk block.
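
The "repeat until the top level fits in one block" rule can be worked through numerically. The sketch below assumes a hypothetical fan-out (index entries per block) and entry count; `index_levels` is a name introduced here for illustration.

```python
import math

def index_levels(first_level_entries, fan_out):
    """Return the number of blocks at each index level, first level to top."""
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        levels.append(blocks)
        if blocks <= 1:
            return levels          # the top level fits in one disk block
        entries = blocks           # next level: one entry per block below it

# Hypothetical numbers: 30,000 first-level entries, 68 entries per block.
print(index_levels(30000, 68))     # [442, 7, 1]
```

With these assumed numbers, a search touches one block per level (three index blocks) plus the data block, instead of a binary search over 442 first-level blocks.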
[Figure: A Two-Level Primary Index]

Multi-Level Indexes

• Such a multi-level index is a form of search tree.
• However, insertion and deletion of index entries is a severe problem because every level of the index is an ordered file.

[Figure: Tree structure]
Dynamic Multilevel Indexes Using B-Trees and B+-Trees

• Most multi-level indexes use B-tree or B+-tree data structures because of the insertion and deletion problem.
• These structures leave space in each tree node (disk block) to allow for new index entries.
• These data structures are variations of search trees that allow efficient insertion and deletion of search values.
• In B-tree and B+-tree data structures, each node corresponds to a disk block.
• Each node is kept between half-full and completely full.
Dynamic Multilevel Indexes Using B-Trees and B+-Trees

• An insertion into a node that is not full is quite efficient.
• If a node is full, the insertion causes a split into two nodes.
• Splitting may propagate to other tree levels.
• A deletion is quite efficient if a node does not become less than half full.
• If a deletion causes a node to become less than half full, it must be merged with neighboring nodes.
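
The split rule can be illustrated for a single leaf node. This is a simplified sketch in the style of a B+-tree leaf split, not a full tree: the capacity of 4 keys and the name `insert_into_leaf` are assumptions, and propagation of the separator to the parent is left out.

```python
from bisect import insort

CAPACITY = 4  # hypothetical maximum number of keys per node (disk block)

def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf and return (left, right, separator).

    right and separator are None when no split was needed.
    """
    insort(leaf, key)                 # keep the keys in sorted order
    if len(leaf) <= CAPACITY:
        return leaf, None, None       # node was not full: the cheap case
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    return left, right, right[0]      # separator key goes up to the parent
```

For example, inserting 25 into the full leaf [10, 20, 30, 40] yields [10, 20] and [25, 30, 40] with separator 25; both halves remain at least half full. Inserting the separator into a full parent is what makes a split propagate upward.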
Dynamic Multilevel Indexes Using B-Trees and B+-Trees

• Balanced tree
• In multilevel indexing, inserting and deleting a record is difficult, as the corresponding entries in the index tables also need to be changed.
• B-trees make these tasks simple.
• Elements are kept in sorted order.
What is Hashing?

• In a huge database structure, it is very inefficient to search all the index values and reach the desired data.
• The hashing technique is used to calculate the direct location of a data record on the disk without using an index structure.
• Data is stored in the data blocks whose address is generated by the hashing function.
• The memory location where these records are stored is known as a data bucket or data block.
Types of Hashing
Static Hashing

• A bucket is a unit of storage containing one or more records (a bucket is typically a disk block).
• In a hash file organization we obtain the bucket of a record directly from its search-key value using a hash function.
• Hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.
• The hash function is used to locate records for access, insertion, and deletion.
• Records with different search-key values may be mapped to the same bucket; thus the entire bucket has to be searched sequentially to locate a record.
Example of Hash File Organization

Hash file organization of the account file, using branch-name as the key (see the figure on the next slide).

• There are 10 buckets.
• The binary representation of the i-th character of the alphabet is assumed to be the integer i.
• The hash function returns the sum of the binary representations of the characters modulo 10.
• E.g. h(Perryridge) = 5, h(Round Hill) = 3, h(Brighton) = 3
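
The example hash function can be checked with a short Python sketch. It assumes that letters are mapped case-insensitively (a = 1, ..., z = 26) and that spaces are ignored; both are readings of the slide that reproduce its three stated values, not something the slide spells out.

```python
def h(branch_name, buckets=10):
    """Sum the alphabet positions of the letters, modulo the bucket count."""
    total = sum(ord(c) - ord("a") + 1
                for c in branch_name.lower()
                if "a" <= c <= "z")       # skip spaces and other characters
    return total % buckets

for name in ("Perryridge", "Round Hill", "Brighton"):
    print(name, h(name))
```

Round Hill and Brighton both map to bucket 3, which is a small instance of different search-key values sharing a bucket.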

Example of Hash File Organization

Hash file organization of account file, using branch-name as key (see previous slide for details).
Hash Functions

• The worst hash function maps all search-key values to the same bucket; this makes access time proportional to the number of search-key values in the file.
• An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key values from the set of all possible values.
• An ideal hash function is random, so each bucket will have the same number of records assigned to it irrespective of the actual distribution of search-key values in the file.
• Typical hash functions perform computation on the internal binary representation of the search key.
  • For example, for a string search key, the binary representations of all the characters in the string could be added and the sum modulo the number of buckets could be returned.
Handling of Bucket Overflows

• Bucket overflow can occur because of:
  • Insufficient buckets
  • Skew in the distribution of records. This can occur for two reasons:
    * multiple records have the same search-key value
    * the chosen hash function produces a non-uniform distribution of key values
• Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets.
Handling of Bucket Overflows

• Overflow chaining: the overflow buckets of a given bucket are chained together in a linked list.
• The above scheme is called closed hashing.
  • An alternative, called open hashing, which does not use overflow buckets, is not suitable for database applications.
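
Overflow chaining can be sketched as a linked list of fixed-capacity buckets. The `Bucket` class, its capacity of 2, and the method names are assumptions made for this illustration.

```python
class Bucket:
    """A fixed-capacity bucket with a link to its overflow bucket, if any."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.records = []        # list of (key, value) pairs
        self.overflow = None     # next bucket in the overflow chain

    def insert(self, key, value):
        bucket = self
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(bucket.capacity)  # extend the chain
            bucket = bucket.overflow
        bucket.records.append((key, value))

    def search(self, key):
        bucket = self
        while bucket is not None:          # walk the chain sequentially
            for k, v in bucket.records:
                if k == key:
                    return v
            bucket = bucket.overflow
        return None

    def chain_length(self):
        n, bucket = 0, self
        while bucket is not None:
            n, bucket = n + 1, bucket.overflow
        return n
```

Each overflow bucket adds one more block to scan during a search, which is why long chains degrade performance.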
Hash Indices

• Hashing can be used not only for file organization, but also for index-structure creation.
• A hash index organizes the search keys, with their associated record pointers, into a hash file structure.
• Strictly speaking, hash indices are always secondary indices:
  • If the file itself is organized using hashing, a separate primary hash index on it using the same search key is unnecessary.
  • However, we use the term hash index to refer to both secondary index structures and hash-organized files.
Example of Hash Index
Deficiencies of Static Hashing

• In static hashing, function h maps search-key values to a fixed set B of bucket addresses.
  • Databases grow with time. If the initial number of buckets is too small, performance will degrade due to too many overflows.
  • If the file size at some point in the future is anticipated and the number of buckets allocated accordingly, a significant amount of space will be wasted initially.
  • If the database shrinks, again space will be wasted.
  • One option is periodic re-organization of the file with a new hash function, but it is very expensive.
• These problems can be avoided by using techniques that allow the number of buckets to be modified dynamically.
Dynamic Hashing
• Good for a database that grows and shrinks in size.
• Allows the hash function to be modified dynamically.
• Extendible hashing is one form of dynamic hashing:
  * The hash function generates values over a large range, typically b-bit integers with b = 32.
  * At any time, use only a prefix of the hash function to index into a table of bucket addresses.
  * Let the length of the prefix be i bits, 0 <= i <= 32.
  * Bucket address table size = 2^i. Initially i = 0.
  * The value of i grows and shrinks as the size of the database grows and shrinks.
  * Multiple entries in the bucket address table may point to the same bucket.
  * Thus, the actual number of buckets is at most 2^i.
  * The number of buckets also changes dynamically due to coalescing and splitting of buckets.
General Extendible Hash Structure

In this structure, i_2 = i_3 = i, whereas i_1 = i - 1 (see next slide for details).
Use of Extendible Hash Structure

• Each bucket j stores a value i_j; all the entries that point to the same bucket have the same values on the first i_j bits.
• To locate the bucket containing search-key K_j:
  1. Compute h(K_j) = X.
  2. Use the first i high-order bits of X as a displacement into the bucket address table, and follow the pointer to the appropriate bucket.
• To insert a record with search-key value K_j:
  * Follow the same procedure as look-up and locate the bucket, say j.
  * If there is room in bucket j, insert the record in the bucket.
  * Otherwise the bucket must be split and the insertion re-attempted (next slide).
  * Overflow buckets are used instead in some cases (will see shortly).
Updates in Extendible Hash Structure
To split a bucket j when inserting a record with search-key value K_j:
• If i > i_j (more than one pointer to bucket j):
  * allocate a new bucket z, and set i_j and i_z to the old i_j + 1.
  * make the second half of the bucket address table entries pointing to j point to z.
  * remove and reinsert each record in bucket j.
  * recompute the new bucket for K_j and insert the record in the bucket (further splitting is required if the bucket is still full).
• If i = i_j (only one pointer to bucket j):
  * increment i and double the size of the bucket address table.
  * replace each entry in the table by two entries that point to the same bucket.
  * recompute the new bucket address table entry for K_j. Now i > i_j, so use the first case above.
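
The look-up and split steps above can be sketched as a small in-memory model. Several things here are assumptions for illustration: an 8-bit hash instead of b = 32, integer keys with a toy hash function, a bucket size of 2, and the class and method names. When a bucket cannot be split any further, the sketch simply lets it grow, as a stand-in for an overflow bucket.

```python
B = 8             # hypothetical hash width in bits (the slides use b = 32)
BUCKET_SIZE = 2   # records per bucket

class Bucket:
    def __init__(self, depth):
        self.depth = depth    # i_j: prefix bits shared by all keys stored here
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.i = 0                    # current prefix length
        self.table = [Bucket(0)]      # bucket address table of size 2**i

    def _hash(self, key):
        return key % (1 << B)         # toy hash function, for illustration only

    def _slot(self, key):
        return self._hash(key) >> (B - self.i)   # first i high-order bits

    def lookup(self, key):
        return key in self.table[self._slot(key)].keys

    def insert(self, key):
        bucket = self.table[self._slot(key)]
        if len(bucket.keys) < BUCKET_SIZE or bucket.depth == B:
            bucket.keys.append(key)   # room available (or overflow stand-in)
            return
        if bucket.depth == self.i:
            # Only one pointer to this bucket: double the address table.
            self.table = [b for b in self.table for _ in (0, 1)]
            self.i += 1
        # Now i > i_j: split the bucket.
        bucket.depth += 1
        new_bucket = Bucket(bucket.depth)
        slots = [s for s, b in enumerate(self.table) if b is bucket]
        for s in slots[len(slots) // 2:]:     # second half now points to z
            self.table[s] = new_bucket
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:                    # remove and reinsert each record
            self.table[self._slot(k)].keys.append(k)
        self.insert(key)                      # re-attempt; may split again

eh = ExtendibleHash()
for k in (1, 128, 192, 64, 224):
    eh.insert(k)
```

After these five inserts the prefix length is 2 and the table has four entries, but only three distinct buckets exist, because the first two entries still share one bucket whose local depth is 1.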
Updates in Extendible Hash Structure

• When inserting a value, if the bucket is still full after several splits (that is, i reaches some limit b), create an overflow bucket instead of splitting the bucket address table further.
• To delete a key value:
  * locate it in its bucket and remove it.
  * the bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table).
  * coalescing of buckets can be done (a bucket can coalesce only with a "buddy" bucket having the same value of i_j and the same i_j - 1 prefix, if it is present).
  * decreasing the bucket address table size is also possible.
  * Note: decreasing the bucket address table size is an expensive operation and should be done only if the number of buckets becomes much smaller than the size of the table.
Example (figures)

• Initial hash structure, bucket size = 2
• Hash structure after insertion of one Brighton and two Downtown records
• Hash structure after insertion of the Mianus record
• Hash structure after insertion of three Perryridge records
• Hash structure after insertion of the Redwood and Round Hill records

Extendible Hashing vs. Other Schemes

• Benefits of extendible hashing:
  • Hash performance does not degrade with growth of the file
  • Minimal space overhead
• Disadvantages of extendible hashing:
  • Extra level of indirection to find the desired record
  • The bucket address table may itself become very big (larger than memory)
    * A tree structure is then needed to locate the desired record in the structure
  • Changing the size of the bucket address table is an expensive operation
• Linear hashing is an alternative mechanism which avoids these disadvantages at the possible cost of more bucket overflows
Comparison of Ordered Indexing and Hashing

• Cost of periodic re-organization
• Relative frequency of insertions and deletions
• Is it desirable to optimize average access time at the expense of worst-case access time?
• Expected type of queries:
  • Hashing is generally better at retrieving records having a specified value of the key.
  • If range queries are common, ordered indices are to be preferred.
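
The difference between the two query types can be seen in a small sketch. It uses a Python dict as a stand-in for a hash organization and a sorted key list with binary search as a stand-in for an ordered index; the sample records and function names are assumptions.

```python
from bisect import bisect_left, bisect_right

# Hypothetical key -> record mapping.
records = {17: "q", 3: "a", 42: "z", 8: "c", 25: "m"}

def point_lookup(key):
    """Hash-style access: go straight to the record for one key value."""
    return records.get(key)

# Ordered index: keys kept in sorted order.
ordered_keys = sorted(records)

def range_lookup(low, high):
    """Two binary searches bound the range; the keys between them are adjacent."""
    lo = bisect_left(ordered_keys, low)
    hi = bisect_right(ordered_keys, high)
    return [records[k] for k in ordered_keys[lo:hi]]
```

A hash organization has no counterpart to `range_lookup`: keys that are close in value land in unrelated buckets, so a range query would have to probe every possible key or scan every bucket.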
ACTIVITIES/ CASE STUDIES/ IMPORTANT FACTS
RELATED TO THE SESSION
Consider a dynamic hashing approach for 4-bit integer keys:

1. There is a main hash table of size 4.
2. The 2 least significant bits of a key are used to index into the main hash table.
3. Initially, the main hash table entries are empty.
4. Thereafter, when more keys are hashed into it, to resolve collisions, the set of all keys corresponding to a main hash table entry is organized as a binary tree that grows on demand.
5. First, the 3rd least significant bit is used to divide the keys into left and right subtrees.
6. To resolve more collisions, each node of the binary tree is further sub-divided into left and right subtrees based on the 4th least significant bit.
7. A split is done only if it is needed, i.e., only when there is a collision.

Consider the following state of the hash table.

SUMMARY

Hashing is a DBMS technique for locating needed data on disk without using an index structure. Hashing is used to index items and retrieve them in a database, since searching for a specific item using a shorter hashed key is faster than using the original value.
SELF-ASSESSMENT QUESTIONS

1. What is hashing?

a) A data structure for storing key-value pairs
b) A technique for converting data of arbitrary size to a fixed size
c) A process of compressing data to save space
d) A method for encrypting data

2. Which of the following is not a suitable use case for hashing?

a) Password storage
b) Data validation
c) Data encryption
d) Sorting large datasets
TERMINAL QUESTIONS

1. Can you explain the difference between static and dynamic hashing, and when each is appropriate?

2. What is collision handling, and how is it handled in hashing-based index structures?

3. How does the choice of hash function affect the performance of a hashing-based index?

4. What is a primary index, and how is it implemented using hashing in DBMS?

5. How does extendible hashing differ from linear and quadratic probing?
THANK YOU

Team – DBMS
