0% found this document useful (0 votes)

203 views25 pages

Hashing in Data Structure

Hashing is a technique used to implement dictionaries or hash tables that allows for constant-time operations by mapping keys to values using a hash function; collisions occur when multiple keys map to the same slot and require collision resolution methods like separate chaining or open addressing; the performance of hashing depends on the load factor and choice of hash function to minimize collisions.

Uploaded by

Travis Wood

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

Download as ppt, pdf, or txt

0% found this document useful (0 votes)

203 views25 pages

Hashing in Data Structure

Uploaded by

Travis Wood

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

Download as ppt, pdf, or txt

You are on page 1/ 25

CSCE 3110 Data Structures & Algorithm Analysis

Rada Mihalcea http://www.cs.unt.edu/~rada/CSCE3110 Hashing Reading: Chap.5, Weiss

How to Implement a Dictionary?

Sequences
ordered unordered

Binary Search Trees Skip lists Hashtables

Hashing
Another important and widely useful technique for implementing dictionaries Constant time per operation (on the average) Worst case time proportional to the size of the set for each operation (just like array and chain implementation)

Basic Idea
Use hash function to map keys into positions in a hash table Ideally If element e has key k and h is hash function, then e is stored in position h(k) of table To search for e, compute h(k) to locate position. If no element, dictionary does not contain e.

Example
Dictionary Student Records
Keys are ID numbers (951000 - 952000), no more than 100 students Hash function: h(k) = k-951000 maps ID into distinct table positions 0-1000 hash table array table[1001] ...
0 1 2 3
buckets

1000

Analysis (Ideal Case)

O(b) time to initialize hash table (b number of positions or buckets in hash table) O(1) time to perform insert, remove, search

Ideal Case is Unrealistic

Works for implementing dictionaries, but many applications have key ranges that are too large to have 1-1 mapping between buckets and keys! Example: Suppose key can take on values from 0 .. 65,535 (2 byte unsigned int) Expect 1,000 records at any given time Impractical to use hash table with 65,536 slots!

Hash Functions
If key range too large, use hash table with fewer buckets and a hash function which maps multiple keys to same bucket:
h(k1) = = h(k2): k1 and k2 have collision at slot

Popular hash functions: hashing by division

h(k) = k%D, where D number of buckets in hash table

Example: hash table with 11 buckets

h(k) = k%11 80 3 (80%11= 3), 40 7, 65 10 58 3 collision!

Collision Resolution Policies

Two classes:
(1) Open hashing, a.k.a. separate chaining (2) Closed hashing, a.k.a. open addressing

Difference has to do with whether collisions are stored outside the table (open hashing) or whether collisions result in storing one of the records at another slot in the table (closed hashing)

Closed Hashing
Associated with closed hashing is a rehash strategy: If we try to place x in bucket h(x) and find it occupied, find alternative location h1(x), h2(x), etc. Try each in order, if none empty table is full, h(x) is called home bucket Simplest rehash strategy is called linear hashing
hi(x) = (h(x) + i) % D

In general, our collision resolution strategy is to generate a sequence of hash table slots (probe sequence) that can hold the record; test each slot until find empty one (probing)

Example Linear (Closed) Hashing

D=8, keys a,b,c,d have hash values h(a)=3, h(b)=0, h(c)=4, h(d)=3 Where do we insert d? 3 already filled Probe sequence using linear hashing:
h1(d) = (h(d)+1)%8 = 4%8 = 4 h2(d) = (h(d)+2)%8 = 5%8 = 5* h3(d) = (h(d)+3)%8 = 6%8 = 6 etc. 7, 0, 1, 2

0 1 2 3 4 5 6 7

a c

Wraps around the beginning of the table!

Operations Using Linear Hashing

Test for membership: findItem Examine h(k), h1(k), h2(k), , until we find k or an empty bucket or home bucket If no deletions possible, strategy works! What if deletions? If we reach empty bucket, cannot be sure that k is not somewhere else and empty bucket was occupied when k was inserted Need special placeholder deleted, to distinguish bucket that was never used from one that once held a value May need to reorganize table after many deletions

Performance Analysis - Worst Case

Initialization: O(b), b# of buckets Insert and search: O(n), n number of elements in table; all n key values have same home bucket No better than linear list for maintaining dictionary!

Performance Analysis - Avg Case

Distinguish between successful and unsuccessful searches
Delete = successful search for record to be deleted Insert = unsuccessful search along its probe sequence

Expected cost of hashing is a function of how full the table is: load factor = n/b It has been shown that average costs under linear hashing (probing) are:
Insertion: 1/2(1 + 1/(1 - )2) Deletion: 1/2(1 + 1/(1 - ))

Improved Collision Resolution

Linear probing: hi(x) = (h(x) + i) % D
all buckets in table will be candidates for inserting a new record before the probe sequence returns to home position clustering of records, leads to long probing sequences Linear probing with skipping: hi(x) = (h(x) + ic) % D c constant other than 1 records with adjacent home buckets will not follow same probe sequence

(Pseudo)Random probing: hi(x) = (h(x) + ri) % D

ri is the ith value in a random permutation of numbers from 1 to D-1 insertions and searches use the same sequence of random numbers

Example
I
0 1 2 3 4 5 6 7 8 9 10

II
insert 1052 (h.b. 7)
0

1001 9537 3016

1. What if next element has home bucket 0? h(k) = k%11 go to bucket 3 Same for elements with home bucket 1 or 2! Only a record with home position 3 will stay.

1 2 3 4

5
6 7 8 9 10

p = 4/11 that next record will go to bucket 3

9874 2009 9875

9874 2009 9875 1052

2. Similarly, records hashing to 7,8,9 will end up in 10 3. Only records hashing to 4 will end up in 4 (p=1/11); same for 5 and 6

next element in bucket 3 with p = 8/11

Hash Functions - Numerical Values

Consider: h(x) = x%16
poor distribution, not very random depends solely on least significant four bits of key

Better, mid-square method

if keys are integers in range 0,1,,K , pick integer C such that DC2 about equal to K2, then h(x) = x2/C % D r 2 extracts middle r bits of x , where 2 =D (a base-D digit) better, because most or all of bits of key contribute to result

Hash Function Strings of Characters

Folding Method:
int h(String x, int D) { int i, sum; for (sum=0, i=0; i<x.length(); i++) sum+= (int)x.charAt(i); return (sum%D); }

sums the ASCII values of the letters in the string

ASCII value for A =65; sum will be in range 650-900 for 10 upper-case letters; good when D around 100, for example

order of chars in string has no effect

Hash Function Strings of Characters

Much better: Cyclic Shift
static long hashCode(String key, int D) { int h=0; for (int i=0, i<key.length(); i++){ h = (h << 4) | ( h >> 27); h += (int) key.charAt(i); } return h%D; }

Open Hashing
Each bucket in the hash table is the head of a linked list All elements that hash to a particular bucket are placed on that buckets linked list Records within a bucket can be ordered in several ways
by order of insertion, by key value order, or by frequency of access order

Open Hashing Data Organization

0 1 2 3 4

...
...

D-1

...

Analysis
Open hashing is most appropriate when the hash table is kept in main memory, implemented with a standard in-memory linked list

We hope that number of elements per bucket roughly equal in size, so that the lists will be short If there are n elements in set, then each bucket will have roughly n/D If we can estimate n and choose D to be roughly as large, then the average bucket will have only one or two members

Analysis Contd
Average time per dictionary operation: D buckets, n elements in dictionary average n/D elements per bucket insert, search, remove operation take O(1+n/D) time each If we can choose D to be about n, constant time Assuming each element is likely to be hashed to any bucket, running time constant, independent of n

Comparison with Closed Hashing

Worst case performance is O(n) for both

Number of operations for hashing

23 6 8 10 23 5 12 4 9 19 D=9 h(x) = x % D

Hashing Problem
Draw the 11 entry hashtable for hashing the keys 12, 44, 13, 88, 23, 94, 11, 39, 20 using the function (2i+5) mod 11, closed hashing, linear probing Pseudo-code for listing all identifiers in a hashtable in lexicographic order, using open hashing, the hash function h(x) = first character of x. What is the running time?

PGCIL GIS Specification - Rev 4 (June-14)
100% (1)
PGCIL GIS Specification - Rev 4 (June-14)
74 pages
Study of Van Emde Boas Tree With Application To Dijkstra: Advanced Problem Solving
No ratings yet
Study of Van Emde Boas Tree With Application To Dijkstra: Advanced Problem Solving
16 pages
Informatica B2B Data Transformation 9.1 DS
No ratings yet
Informatica B2B Data Transformation 9.1 DS
3 pages
KSEB Technical SPec PDF
0% (1)
KSEB Technical SPec PDF
279 pages
Transformer Factory Assembly Area Layout
100% (3)
Transformer Factory Assembly Area Layout
55 pages
DS Lecture - 6 (Hashing)
No ratings yet
DS Lecture - 6 (Hashing)
27 pages
Van Emde Boas Trees
No ratings yet
Van Emde Boas Trees
5 pages
Lecture 4: Divide and Conquer: Van Emde Boas Trees
No ratings yet
Lecture 4: Divide and Conquer: Van Emde Boas Trees
7 pages
Comparative Analysis of Brute Force and Boyer Moore Algorithms in Word Suggestion Search
No ratings yet
Comparative Analysis of Brute Force and Boyer Moore Algorithms in Word Suggestion Search
5 pages
Logistic Regression Model - A Review
No ratings yet
Logistic Regression Model - A Review
5 pages
ML UNIT-IV Notes
100% (1)
ML UNIT-IV Notes
23 pages
Morphological PCB
No ratings yet
Morphological PCB
5 pages
Van Emde Boas Tree
No ratings yet
Van Emde Boas Tree
27 pages
Segmentation
100% (1)
Segmentation
51 pages
Knuth-Morris-Pratt Algorithm KENT
No ratings yet
Knuth-Morris-Pratt Algorithm KENT
4 pages
Crop Yield Prediction Using Machine Learning - 2020 - Computers and Electronic
50% (2)
Crop Yield Prediction Using Machine Learning - 2020 - Computers and Electronic
18 pages
Divide and Conquer For Convex Hull
100% (1)
Divide and Conquer For Convex Hull
8 pages
Stock Market Forecasting Using Deep Learning and Technical Analysis A Systematic Review
No ratings yet
Stock Market Forecasting Using Deep Learning and Technical Analysis A Systematic Review
11 pages
Data Structures 2
No ratings yet
Data Structures 2
82 pages
Asymptotic Analysis
No ratings yet
Asymptotic Analysis
42 pages
Outline and Reading: Tries 4/1/2003 9:02 AM
No ratings yet
Outline and Reading: Tries 4/1/2003 9:02 AM
3 pages
2021S - A Step by Step Guide To Regression Analysis
No ratings yet
2021S - A Step by Step Guide To Regression Analysis
10 pages
Characteristics of Data Structures
No ratings yet
Characteristics of Data Structures
3 pages
Android Application For Crop Yield Prediction and Crop Disease Detection
No ratings yet
Android Application For Crop Yield Prediction and Crop Disease Detection
4 pages
Segmentation and Object Recognition Using Edge Detection Techniques
No ratings yet
Segmentation and Object Recognition Using Edge Detection Techniques
9 pages
Autosys Good One
No ratings yet
Autosys Good One
374 pages
Machine Learning
No ratings yet
Machine Learning
11 pages
Implementation Data Mining With K-Means Algorithm For Clustering Distribution Rabies Case Area in Palembang City PDF
No ratings yet
Implementation Data Mining With K-Means Algorithm For Clustering Distribution Rabies Case Area in Palembang City PDF
8 pages
Median Finding Algorithm
No ratings yet
Median Finding Algorithm
10 pages
Data Structures Questions
No ratings yet
Data Structures Questions
6 pages
ML Cheatsheet Final
No ratings yet
ML Cheatsheet Final
32 pages
Image Enhancement Techniques
No ratings yet
Image Enhancement Techniques
15 pages
Algorithm Lectures
No ratings yet
Algorithm Lectures
117 pages
Units of Measurements Handout 2
No ratings yet
Units of Measurements Handout 2
7 pages
Minimum Spanning Trees
No ratings yet
Minimum Spanning Trees
19 pages
CSE245 - Algorithms: Single Source Shortest Path (Dijkstra's Algorithm)
No ratings yet
CSE245 - Algorithms: Single Source Shortest Path (Dijkstra's Algorithm)
39 pages
Rice and Climate Change
No ratings yet
Rice and Climate Change
37 pages
Mac Flow
No ratings yet
Mac Flow
12 pages
HSSLiVE XII CS Chapter - 3 Data Structures Joy PDF
No ratings yet
HSSLiVE XII CS Chapter - 3 Data Structures Joy PDF
4 pages
CS7641 Machine Learning Midterm Notes PDF
No ratings yet
CS7641 Machine Learning Midterm Notes PDF
239 pages
Exam Data Structure
No ratings yet
Exam Data Structure
2 pages
Text Processing (Complete)
No ratings yet
Text Processing (Complete)
100 pages
Recurrent Neural Network: Dr. Sukanta Ghosh
100% (1)
Recurrent Neural Network: Dr. Sukanta Ghosh
34 pages
Unit 3 NLP
No ratings yet
Unit 3 NLP
9 pages
Constraint Satisfaction Problem
No ratings yet
Constraint Satisfaction Problem
10 pages
Chap 2 Emerging Database Landscape
No ratings yet
Chap 2 Emerging Database Landscape
10 pages
Algorithm Types and Classification
No ratings yet
Algorithm Types and Classification
5 pages
Automobile
No ratings yet
Automobile
15 pages
2-Divide and Conquer Approach
No ratings yet
2-Divide and Conquer Approach
162 pages
Machine Learning
100% (1)
Machine Learning
185 pages
Constraint Satisfaction Problems
No ratings yet
Constraint Satisfaction Problems
35 pages
Natural Language Processing
No ratings yet
Natural Language Processing
30 pages
Unit II Requirements Elicitation
No ratings yet
Unit II Requirements Elicitation
23 pages
Graph Analytics PDF
No ratings yet
Graph Analytics PDF
13 pages
8 Logic Programming
No ratings yet
8 Logic Programming
29 pages
Ds and Algo
No ratings yet
Ds and Algo
2 pages
Semiconductor Devices A Brief Review: Dr. K. Fobelets
No ratings yet
Semiconductor Devices A Brief Review: Dr. K. Fobelets
98 pages
New Learning of Python by Practical Innovation and Technology
From Everand
New Learning of Python by Practical Innovation and Technology
Sudhir Pathania
No ratings yet
Java Complete Self-Assessment Guide
From Everand
Java Complete Self-Assessment Guide
Gerardus Blokdyk
No ratings yet
(Excerpts From) Investigating Performance: Design and Outcomes With Xapi
From Everand
(Excerpts From) Investigating Performance: Design and Outcomes With Xapi
Janet Laane Effron
No ratings yet
The Datadog Handbook: A Guide to Monitoring, Metrics, and Tracing
From Everand
The Datadog Handbook: A Guide to Monitoring, Metrics, and Tracing
Robert Johnson
No ratings yet
The Definitive Guide to Data Integration: Unlock the power of data integration to efficiently manage, transform, and analyze data
From Everand
The Definitive Guide to Data Integration: Unlock the power of data integration to efficiently manage, transform, and analyze data
Pierre-yves Bonnefoy
No ratings yet
Java Reflection Complete Self-Assessment Guide
From Everand
Java Reflection Complete Self-Assessment Guide
Gerardus Blokdyk
No ratings yet
Anandaram Barua
No ratings yet
Anandaram Barua
1 page
Double Sampling-Wha It Means PDF
No ratings yet
Double Sampling-Wha It Means PDF
17 pages
Effect of Grading Ring On Voltage Distribution of Arrestors
No ratings yet
Effect of Grading Ring On Voltage Distribution of Arrestors
7 pages
Unidirectional Corona Ring
No ratings yet
Unidirectional Corona Ring
2 pages
CT Grounding To Avoid Nuisance Tripping
100% (3)
CT Grounding To Avoid Nuisance Tripping
2 pages
1363 Bolts and Nut Standard PDF
No ratings yet
1363 Bolts and Nut Standard PDF
7 pages
Ehv Disconnectors For Smart Grid
No ratings yet
Ehv Disconnectors For Smart Grid
2 pages
3 Sec-III-Hardware Fittings & Accessories
No ratings yet
3 Sec-III-Hardware Fittings & Accessories
49 pages
Spacer Damper Issues
No ratings yet
Spacer Damper Issues
5 pages
Mud Architecture: I J I R S E T
No ratings yet
Mud Architecture: I J I R S E T
6 pages
Fretting Fatigue in Overhead Conductors
No ratings yet
Fretting Fatigue in Overhead Conductors
16 pages
400 Puzzles and Answers PDF
No ratings yet
400 Puzzles and Answers PDF
188 pages
Anil Agarwal: Research: Mud As A Traditional Building Material
No ratings yet
Anil Agarwal: Research: Mud As A Traditional Building Material
10 pages
Electrical
No ratings yet
Electrical
34 pages
An Introduction To Conway's Games and Numbers
No ratings yet
An Introduction To Conway's Games and Numbers
30 pages
China Claim Final
No ratings yet
China Claim Final
1 page
Al Si Alloys PDF
No ratings yet
Al Si Alloys PDF
14 pages
Aluminium Recycling Process PDF
No ratings yet
Aluminium Recycling Process PDF
69 pages
CB Spec Svenska
No ratings yet
CB Spec Svenska
44 pages
Aluminium Recycling Process
100% (1)
Aluminium Recycling Process
69 pages