Web Caches, CDNs, and P2Ps

Web caches (proxy server)
Goal: satisfy client request without involving origin server
user sets browser: Web accesses via cache
browser sends all HTTP requests to cache
  object in cache: cache returns object
  else cache requests object from origin server, then returns object to client
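
The hit/miss behavior above can be sketched in a few lines of Python. This is a toy model, not a real proxy: `WebCache` and the `fetch_from_origin` callback are hypothetical names, and no freshness check is done.

```python
from typing import Callable, Dict


class WebCache:
    """Toy proxy cache: serves stored objects, otherwise asks the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin-server call
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0

    def get(self, url: str) -> bytes:
        if url in self._store:               # hit: origin server not involved
            return self._store[url]
        self.origin_requests += 1            # miss: cache acts as a client
        obj = self._fetch(url)
        self._store[url] = obj               # keep a copy for later requests
        return obj


# Stand-in origin server that just echoes the URL (assumption for the demo).
cache = WebCache(lambda url: f"body of {url}".encode())
first = cache.get("http://www.foo.com/index.html")
second = cache.get("http://www.foo.com/index.html")
```

The second request is answered from the cache, so the origin is contacted only once.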
Applications (part 3)
More about Web caching
Cache acts as both client and server
Cache can do up-to-date check using If-Modified-Since HTTP header
  Issue: should cache take risk and deliver cached object without checking?
  Heuristics are used.
Typically cache is installed by ISP (university, company, residential ISP)
Why Web caching?
Reduce response time for client request.
Reduce traffic on an institution's access link.
Internet dense with caches enables poor content providers to effectively deliver content

Caching example (1)
Assumptions
  average object size = 100,000 bits
  avg. request rate from institution's browsers to origin servers = 15/sec
  delay from institutional router to any origin server and back to router = 2 sec
Consequences
  utilization on LAN = 15%
  utilization on access link = 100%
  total delay = Internet delay + access delay + LAN delay
    = 2 sec + minutes + milliseconds
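
The utilization figures above follow from dividing the offered load by each link's capacity. The sketch below reproduces them. The link capacities (1.5 Mbps access link, 10 Mbps LAN) are not stated in the text; they are assumptions chosen because they are the values the 100% and 15% figures imply.

```python
OBJECT_SIZE_BITS = 100_000      # average object size (from the slide)
REQUESTS_PER_SEC = 15           # average request rate (from the slide)
ACCESS_LINK_BPS = 1_500_000     # assumed: implied by 100% utilization
LAN_BPS = 10_000_000            # assumed: implied by 15% utilization

# Bits per second that the browsers ask the network to carry.
offered_load_bps = OBJECT_SIZE_BITS * REQUESTS_PER_SEC


def utilization(load_bps: float, capacity_bps: float) -> float:
    """Fraction of a link's capacity consumed by the offered load."""
    return load_bps / capacity_bps


lan_util = utilization(offered_load_bps, LAN_BPS)
access_util = utilization(offered_load_bps, ACCESS_LINK_BPS)
print(f"LAN: {lan_util:.0%}, access link: {access_util:.0%}")
# prints: LAN: 15%, access link: 100%
```

A link at or near 100% utilization builds up long queues, which is where the "minutes" of access delay come from.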
Caching example (2)
Possible solution
  increase bandwidth of access link to, say, 10 Mbps
Consequences
  utilization on LAN = 15%
  utilization on access link = 15%
  total delay = Internet delay + access delay + LAN delay
    = 2 sec + msecs + msecs
  often a costly upgrade

Caching example (3)
Install cache
  suppose hit rate is .4
Consequence
  40% requests will be satisfied almost immediately
  60% requests satisfied by origin server
  utilization of access link reduced to 60%, resulting in negligible delays (say 10 msec)
  total delay = Internet delay + access delay + LAN delay
    = .6*2 sec + .6*.01 secs + milliseconds < 1.3 secs
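
The arithmetic for the cache case can be checked with a short sketch. It assumes, as the slide does, that hits cost almost nothing and that only misses pay the Internet and access delays. The function name `avg_delay` is made up for illustration, and the LAN delay ("milliseconds") defaults to zero.

```python
INTERNET_DELAY_S = 2.0     # router -> origin server -> router (from the slide)
ACCESS_DELAY_S = 0.01      # "say 10 msec" once the access link is at 60% load
HIT_RATE = 0.4


def avg_delay(hit_rate: float, internet_delay: float,
              access_delay: float, lan_delay: float = 0.0) -> float:
    """Average delay when only cache misses cross the access link."""
    miss_rate = 1.0 - hit_rate
    return miss_rate * (internet_delay + access_delay) + lan_delay


# With a 0.4 hit rate only 60% of the original traffic uses the access link.
access_util_with_cache = (1.0 - HIT_RATE) * 1.0
delay = avg_delay(HIT_RATE, INTERNET_DELAY_S, ACCESS_DELAY_S)
print(f"average delay is about {delay:.3f} s")
```

Under these assumptions the result stays below the 1.3 sec bound, without the cost of upgrading the access link.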
Content distribution networks (CDNs)
The content providers are the CDN customers.
Content replication
  CDN company installs hundreds of CDN servers throughout Internet
    in lower-tier ISPs, close to users
  CDN replicates its customers' content in CDN servers. When provider updates content, CDN updates servers

CDN example
Origin server
  www.foo.com
  distributes HTML
  replaces:
    http://www.foo.com/sports/ruth.gif
  with
    http://www.cdn.com/www.foo.com/sports/ruth.gif
CDN company
  cdn.com
  distributes gif files
  uses its authoritative DNS server to redirect requests
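
A rough sketch of the two steps in this example: the origin server rewrites object URLs to point at the CDN, and the CDN's authoritative DNS server picks a server for each query. The distance table, ISP names, and node names below are invented for illustration; how a real CDN measures distance is not covered here.

```python
# Hypothetical table: distance from each leaf ISP to each CDN node.
DISTANCE_MAP = {
    "isp-a": {"cdn-node-1": 3, "cdn-node-2": 9},
    "isp-b": {"cdn-node-1": 7, "cdn-node-2": 2},
}


def rewrite_url(url: str, cdn_host: str = "www.cdn.com") -> str:
    """Prefix the origin host and path with the CDN host, as in the example."""
    scheme, rest = url.split("://", 1)
    return f"{scheme}://{cdn_host}/{rest}"


def best_cdn_server(origin_isp: str) -> str:
    """Return the CDN node with the smallest tabulated distance for this ISP."""
    distances = DISTANCE_MAP[origin_isp]
    return min(distances, key=distances.get)


new_url = rewrite_url("http://www.foo.com/sports/ruth.gif")
chosen = best_cdn_server("isp-b")
```

Here `new_url` is the CDN form of the image URL, and a query arriving from `isp-b` is steered to the node that the table marks as nearest.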
More about CDNs
routing requests
  CDN creates a map, indicating distances between leaf ISPs and CDN nodes
  when query arrives at authoritative DNS server:
    server determines ISP from which query originates
    uses map to determine best CDN server
not just Web pages
  streaming stored audio/video
  streaming real-time audio/video
    CDN nodes create application-layer overlay network

P2P file sharing
Example
Alice runs P2P client application on her notebook computer
Intermittently connects to Internet; gets new IP address for each connection
Asks for file ABC
Application displays other peers that have copy of ABC.
Alice chooses one of the peers, Bob.
File is copied from Bob's PC to Alice's notebook: HTTP
While Alice downloads, other users are uploading from Alice.
Alice's peer is both a Web client and a transient Web server.
All peers are servers = highly scalable!
P2P: centralized directory
original Napster design
1) when peer connects, it informs central server:
  IP address
  content
2) Alice queries for ABC
3) Alice requests file from Bob

P2P: problems with centralized directory
Single point of failure
Performance bottleneck
Copyright infringement
file transfer is decentralized, but locating content is highly centralized
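
Steps 1 and 2 amount to a lookup table held by a single server. The sketch below is a minimal in-memory version; the class name and the IP addresses are hypothetical. It also makes the weakness visible: every lookup goes through one object.

```python
from collections import defaultdict


class CentralDirectory:
    """Central server state: which peer addresses hold which file."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def register(self, ip_address, files):
        # Step 1: a connecting peer reports its IP address and content.
        for name in files:
            self._peers_by_file[name].add(ip_address)

    def unregister(self, ip_address):
        # Peers connect intermittently, so entries must be removable.
        for holders in self._peers_by_file.values():
            holders.discard(ip_address)

    def query(self, name):
        # Step 2: return the peers that currently hold the file.
        return sorted(self._peers_by_file.get(name, set()))


directory = CentralDirectory()
directory.register("10.0.0.2", ["ABC"])           # Bob (address is made up)
directory.register("10.0.0.3", ["ABC", "XYZ"])    # another peer
peers = directory.query("ABC")                    # Alice queries for ABC
```

Step 3, the file transfer itself, happens directly between Alice and the peer she picks, and does not involve the directory.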
P2P: decentralized directory
Each peer is either a group leader or assigned to a group leader.
Group leader tracks the content in all its children.
Peer queries group leader; group leader may query other group leaders.

More about decentralized directory
overlay network
  peers are nodes
  edges between peers and their group leaders
  edges between some pairs of group leaders
  virtual neighbors
bootstrap node
  connecting peer is either assigned to a group leader or designated as leader
advantages of approach
  no centralized directory server
    location service distributed over peers
    more difficult to shut down
disadvantages of approach
  bootstrap node needed
  group leaders can get overloaded
P2P: Query flooding
Gnutella
no hierarchy
use bootstrap node to learn about others
join message
Send query to neighbors
Neighbors forward query
If queried peer has object, it sends message back to querying peer

P2P: more on query flooding
Pros
  peers have similar responsibilities: no group leaders
  highly decentralized
  no peer maintains directory info
Cons
  excessive query traffic
  query radius: may not have content when present
  bootstrap node
  maintenance of overlay network
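
Query flooding with a limited radius can be sketched as a breadth-first search over the overlay. The overlay, peer names, and the `max_hops` parameter below are assumptions for illustration and are not Gnutella's actual message format. The example shows both listed drawbacks: message count grows with the radius, and a small radius misses content that exists.

```python
from collections import deque


def flood_query(overlay, contents, start, wanted, max_hops):
    """Breadth-first flood; returns (holders within max_hops, messages sent)."""
    seen = {start}
    frontier = deque([(start, 0)])
    holders = []
    messages = 0
    while frontier:
        peer, hops = frontier.popleft()
        if peer != start and wanted in contents.get(peer, ()):
            holders.append(peer)          # this peer would reply to the querier
        if hops == max_hops:
            continue                      # query radius reached: stop forwarding
        for neighbor in overlay[peer]:
            messages += 1                 # every forwarded copy costs a message
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return holders, messages


# Hypothetical overlay: each peer lists its virtual neighbors.
OVERLAY = {
    "alice": ["p1", "p2"],
    "p1": ["alice", "p2", "p3"],
    "p2": ["alice", "p1"],
    "p3": ["p1", "bob"],
    "bob": ["p3"],
}
CONTENTS = {"bob": {"ABC"}}

near_holders, near_msgs = flood_query(OVERLAY, CONTENTS, "alice", "ABC", 2)
far_holders, far_msgs = flood_query(OVERLAY, CONTENTS, "alice", "ABC", 3)
```

With a radius of 2 the query never reaches Bob even though he holds ABC; with a radius of 3 it does, at the cost of more messages.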
Structured P2P networks
Traditional P2P file-sharing systems do not operate efficiently
  Spend too many messages on constructing and maintaining the overlay network
  Perform random global searches mostly by flooding the network
One advantage of the traditional scheme is that the documents can be placed anywhere and the document will be found if at least one of the machines holding a copy is up and reachable.

Structured P2P networks (file sharing example) raise two questions to consider:
  How do we map objects onto nodes?
  How do we route requests to the node that is responsible for the object?
For the first question, the simplest solution just uses hashing.
  hash(x) -> n
where x is the object identifier and n is the identifier of the node onto which the object is placed.
What properties do we need from the hash function?
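
One property worth checking is how many objects change nodes when the number of nodes changes. The sketch below uses a plain modulo hash with integer object IDs (an assumption made to keep the example small) and counts how many of 10,000 objects move when a network of 101 nodes grows to 102.

```python
def node_for(object_id: int, num_nodes: int) -> int:
    """Simplest placement rule: hash(x) -> n using the remainder."""
    return object_id % num_nodes


object_ids = range(10_000)

# Count objects whose assigned node differs after one node joins.
moved = sum(1 for x in object_ids if node_for(x, 101) != node_for(x, 102))
fraction_moved = moved / len(object_ids)
print(f"{moved} of {len(object_ids)} objects change nodes")
```

Almost every object is reassigned by a single join, which is expensive in a network where peers come and go constantly.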
Consistent hashing
Problems with traditional hashing:
  When nodes join and leave, the hashing function will be affected
    hash(x) { return x % 101 }
  Need to know the exact number of hosts (in this example, 101)
To address these issues, structured P2P networks use consistent hashing.
Consistent hashing maps both objects and nodes onto a 128-bit ID space that is organized as a circle.
  Hash(object_name) -> objid
  Hash(IP_addr) -> nodeid
Because the ID space is very large, an objid and nodeid would (most likely) not coincide.
Select the node whose id is closest in this 128-bit space to the object id.
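
A minimal sketch of this placement rule follows. MD5 is used only because it happens to produce 128 bits; the text does not name a hash function, so that choice, the function names, and the IP addresses are all assumptions.

```python
import hashlib
from typing import List

ID_BITS = 128
ID_SPACE = 1 << ID_BITS


def to_id(name: str) -> int:
    """Map an object name or IP address onto the 128-bit ID space."""
    # MD5 chosen only for its 128-bit output (assumption, not from the text).
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")


def circular_distance(a: int, b: int) -> int:
    """Distance between two IDs when the ID space wraps around as a circle."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def responsible_node(object_name: str, node_ips: List[str]) -> str:
    """Select the node whose nodeid is closest to the object's objid."""
    objid = to_id(object_name)
    return min(node_ips, key=lambda ip: circular_distance(to_id(ip), objid))


nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
owner = responsible_node("ABC", nodes)
```

Because each object's choice depends only on distances to individual nodes, adding a node can only pull objects toward the newcomer; objects never shuffle between the existing nodes.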
Consistent hashing
Like ordinary hashing, distributes objects evenly across the nodes. However, unlike ordinary hashing, only a small number of objects have to move when a node (hash bucket) leaves or joins.

Distributed hash tables
How does a user who wants to access an object know which node holds the object?
  Each node keeps a complete table of node IDs and associated IP addresses: search the list for the closest node ID and access the node!
  Not practical for large networks (i.e., not scalable)
Another approach: route the request to the appropriate node

Suppose you are at node 65a1fc (hex) and trying to locate objid d46a1c
  Your node does not share any prefix with the target object
  You know a node that shares at least the prefix d, so it is closer than you to this object
  Ask this node to locate object d46a1c for you
  Assuming node d13da3 knows another node with an even longer prefix, the message will be forwarded even further
As the message moves through the ID space, the actual message moves through the Internet
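
The forwarding idea can be sketched as follows: at every hop, hand the request to a known node that shares a strictly longer prefix with the object ID. Node IDs 65a1fc and d13da3 and objid d46a1c come from the example above; the other node IDs and the `KNOWN` table are hypothetical. A full protocol would also fall back on numerically closest nodes, which is omitted here.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(start: str, objid: str, known: dict) -> list:
    """Forward hop by hop to a known node with a longer shared prefix."""
    path = [start]
    current = start
    while True:
        here = shared_prefix_len(current, objid)
        better = [n for n in known.get(current, [])
                  if shared_prefix_len(n, objid) > here]
        if not better:
            return path                   # no known node is closer by prefix
        current = max(better, key=lambda n: shared_prefix_len(n, objid))
        path.append(current)


# Hypothetical knowledge: which other nodes each node knows about.
KNOWN = {
    "65a1fc": ["d13da3"],
    "d13da3": ["d4213f"],
    "d4213f": ["d462ba"],
}

path = route("65a1fc", "d46a1c", KNOWN)
```

Each hop matches one more digit of d46a1c (prefix lengths 0, 1, 2, 3 along the path), so the number of hops is bounded by the number of digits in the ID.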
Each node maintains a leaf set: these are nodes that are numerically closest to the node.
Leaf node peers with other leaf nodes within the same leaf set. Suppose a leaf node is unable to do some operation because of some error condition: that work may be offloaded onto another leaf node.
Each node maintains a route table.
Routing table is a 2-D array. Has a row for each hex digit in the ID (32 rows for a 128-bit ID)
  Entry in row i shares a prefix of length i with this node; the entry in the j-th column has hex value j at the (i+1)-th position
[Figure: routing table at 65a1fcx, where x denotes an unspecified suffix]
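
The row and column rule can be written down directly. The sketch assumes IDs are lowercase hex strings of equal length and counts rows from 0, so row i holds nodes sharing a prefix of length i. The function names and the second example ID are made up for illustration.

```python
HEX_DIGITS = "0123456789abcdef"


def empty_table(id_hex_digits: int = 32):
    """One row per hex digit of the ID, one column per hex value."""
    return [[None] * 16 for _ in range(id_hex_digits)]


def table_slot(own_id: str, other_id: str):
    """Return (row, column) where other_id belongs in own_id's table."""
    row = 0
    while row < len(own_id) and own_id[row] == other_id[row]:
        row += 1                          # row = length of the shared prefix
    if row == len(own_id):
        return None                       # identical IDs: no slot needed
    # Column = value of the first digit where the two IDs differ.
    return row, HEX_DIGITS.index(other_id[row])


slot_far = table_slot("65a1fc", "d13da3")    # shares no prefix with 65a1fc
slot_near = table_slot("65a1fc", "65a2bb")   # shares the prefix 65a
```

Node d13da3 lands in row 0 under column d, while a node sharing the prefix 65a lands three rows further down.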

Adding a node to overlay is much like routing a locate object message
New node must know at least one member of the P2P network (preferably the closest)
Learns about other nodes through the routing process and fills out its routing table.
Existing nodes also update their routing tables based on new arrivals
