Q1. Explain different issues in designing a distributed system.
1. Scalability
• The system should handle increased load efficiently as more users and devices connect.
• Scaling horizontally (adding more nodes) or vertically (upgrading hardware) should be
seamless.
2. Heterogeneity
• A heterogeneous distributed system consists of interconnected sets of dissimilar hardware or
software systems. Because of this diversity, designing heterogeneous distributed systems is
far more difficult than designing homogeneous ones.
• In a heterogeneous distributed system, some form of data translation is necessary for
interaction between two incompatible nodes.
3. Security
• Protecting data from unauthorized access, ensuring authentication, and securing
communication between nodes.
• Encryption, access control, and secure protocols (TLS, SSL) help mitigate risks.
4. Fault Tolerance
• Failure of a particular node should not affect the overall system.
• Locating a failed node and hiding its failure from the user is itself a big challenge.
5. Network Latency & Bandwidth
• Communication delays can affect system performance.
• Efficient data distribution strategies and caching mechanisms help optimize performance.
6. Consistency
• Ensuring data consistency across distributed nodes (e.g., strong vs. eventual consistency).
• Trade-offs between consistency, availability, and partition tolerance (CAP theorem).
7. Data Distribution & Load Balancing
• Efficiently distributing data across multiple nodes while preventing bottlenecks.
• Load balancers and partitioning techniques (sharding, hashing) are used.
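As a sketch of the partitioning idea above, the following hypothetical example maps keys to nodes by hashing; the node names and key format are illustrative only (real systems typically use consistent hashing so that adding a node moves only a fraction of the keys):

```python
# Hypothetical hash-based partitioning (sharding) across a 4-node cluster.
# Node names are illustrative only.
import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]

def shard_for(key: str) -> str:
    """Map a key to a node by hashing, spreading load evenly."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every lookup of the same key lands on the same node.
owner = shard_for("user:42")
```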
8. Naming & Addressing
• Each node/resource in the system must be uniquely identified and accessible.
• DNS, distributed directories, and unique identifiers (UUIDs) are common solutions.
9. Inter-Process Communication (IPC)
• Effective communication between distributed components using RPC, message queues, or
REST APIs.
• Handling message loss, duplication, and ordering issues.
10. Synchronization & Clock Management
• Maintaining a consistent timeline across distributed nodes.
• Techniques like logical clocks (Lamport clocks, Vector clocks) and synchronization protocols
(NTP, PTP) help address clock drift.
11. Resource Management
• Efficiently allocating CPU, memory, and storage resources across distributed nodes.
• Containerization and orchestration (Docker, Kubernetes) help with dynamic resource allocation.
12. Transparency Issues
• Access Transparency: Users should not need to know where resources are located.
• Replication Transparency: Users should not be concerned about multiple copies of data.
• Failure Transparency: The system should mask failures and recover automatically.
Q2. Discuss various system models of distributed system.
System Models of Distributed Systems
A distributed system can be analyzed using different system models, which help define the structure,
interaction, failure handling, security, and computing approaches. These models guide the design
and implementation of a distributed system.
1. Architectural Models
Architectural models define how different components interact and are structured.
Architectural Styles:
1. Layered Architecture
o In a layered architecture, components are organized in layers. Each layer
communicates with its adjacent layers by sending requests and receiving responses.
o The layered architecture separates components into units and structures
communication efficiently. A layer cannot communicate directly with a non-adjacent layer.
o Example: OSI model, web applications with presentation, business, and data layers.
2. Object-Oriented Architecture
o In this architecture, components are treated as objects that exchange
information with each other. It consists of an arrangement of loosely coupled
objects, which interact through method calls.
o Example: CORBA (Common Object Request Broker Architecture).
3. Data-Centered Architecture
o In a data-centered architecture, a common data space sits at the center: a shared
repository that holds all the required data in one place.
o All components connect to this data space and typically communicate in a
publish/subscribe style; required data is delivered from the central repository to
the components.
o Example: Database-centric architectures, cloud storage.
4. Event-Based Architecture
o Event-based architecture is similar to data-centered architecture, except that
events, rather than shared data, sit at the center. Events are placed on a central
event bus and delivered to the components that need them.
o All communication happens through events: when an event occurs, the system and
the subscribed receivers are notified. Data, URLs, etc. are transmitted inside events.
o Example: Publish-subscribe systems, message queues (Kafka, RabbitMQ).
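A minimal in-process sketch of the publish/subscribe pattern described above; the `EventBus` class and topic name are hypothetical, and systems like Kafka or RabbitMQ add persistence, partitioning, and delivery guarantees on top of this idea:

```python
# Minimal in-process publish/subscribe event bus (illustrative sketch).
from collections import defaultdict
from typing import Callable

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable) -> None:
        """Register a handler for a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload) -> None:
        """Deliver the event to every component subscribed to the topic."""
        for handler in self._subscribers[topic]:
            handler(payload)

bus = EventBus()
received = []
bus.subscribe("order.created", received.append)
bus.publish("order.created", {"id": 1})
# received == [{"id": 1}]
```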
Architectural Models:
1. Client-Server Model
o Clients request services from a central server that processes the request and sends a
response.
o Example: Web applications (browsers as clients, web servers like Apache).
2. Peer-to-Peer (P2P) Model
o Nodes (peers) act as both clients and servers, sharing resources without a central
server.
o Example: File-sharing networks (BitTorrent), blockchain networks.
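The client-server model above can be sketched as a tiny TCP echo service on the loopback interface; this is a minimal illustration, not a production server (which would handle many clients concurrently):

```python
# Minimal client-server sketch over TCP loopback: one client, one request.
import socket
import threading

def server(sock: socket.socket) -> None:
    conn, _ = sock.accept()
    with conn:
        data = conn.recv(1024)          # receive the client's request
        conn.sendall(b"echo: " + data)  # send the response back

listener = socket.socket()
listener.bind(("127.0.0.1", 0))         # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=server, args=(listener,), daemon=True).start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)               # b"echo: hello"
client.close()
```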
2. Interaction Models
Defines how distributed system components communicate.
1. Synchronous Distributed Systems
o Communication has strict time constraints (bounded message delays).
o Useful in real-time systems (e.g., air traffic control).
o Features:
o Lower and upper bounds on the execution time of processes can be set.
o Transmitted messages are received within a known bounded time.
o Drift rates between local clocks have a known bound.
o It is possible and safe to use timeouts to detect the failure of a process or a
communication link.
o Consequence: it is difficult and costly to implement.
2. Asynchronous Distributed Systems
o Features:
o No bound on the execution time of processes can be set.
o No bound on message transmission delays.
o No bound on drift rates between local clocks.
o Consequences:
o Unpredictable in terms of timing.
o Timeouts cannot be used reliably.
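The practical consequence is that a timeout is a safe failure detector only under the synchronous model's bounded delays. A small sketch with a simulated ping and an assumed round-trip bound (the node names and bound are illustrative):

```python
# Timeout-based failure detection: safe only when message delay has a
# known bound (synchronous model). Node names are illustrative.
import time

TIMEOUT = 0.05  # assumed upper bound on round-trip time (seconds)

def ping(node: str) -> bool:
    """Stand-in for a real network ping; 'dead-node' never answers in time."""
    if node == "dead-node":
        time.sleep(TIMEOUT * 2)  # simulate no reply within the bound
        return False
    return True

def is_alive(node: str) -> bool:
    start = time.monotonic()
    replied = ping(node)
    # Under the synchronous assumption, exceeding the bound means failure.
    return replied and (time.monotonic() - start) <= TIMEOUT
```

In an asynchronous system the same check would be unsound: a slow-but-alive node is indistinguishable from a crashed one.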
3. Failure Models
Failures are inevitable in distributed systems, and different models define their nature.
1. Omission Failure
o A component (process or communication link) fails to send or receive messages.
o Example: Packet loss in networks.
2. Arbitrary (Byzantine) Failure
o A component behaves unpredictably, sending incorrect or malicious responses.
o Example: A compromised node in a blockchain.
3. Timing Failure
o A process takes longer than expected to respond, violating timing constraints.
o Example: Delayed responses in financial trading systems.
4. Security Models
Defines how security threats like unauthorized access, data breaches, and attacks are handled.
• Authentication (verifying user identity)
• Authorization (granting permissions)
• Encryption (securing communication)
• Intrusion detection and prevention
5. Computing Models
Defines how computing resources are structured and utilized.
1. Workstation Model
o The workstation model consists of a network of personal computers, each with its own
hard disk and local file system, interconnected over a network. These are termed
diskful workstations.
o This is an ideal setup for an educational department or a company office where
individual workstations are scattered across the campus.
o These workstations are connected in a suitable network configuration, such as a star
topology. Each workstation can also work as a standalone single-user system.
o Example: Local-area networks in universities.
2. Minicomputer Model
o The minicomputer model is a simple extension of the centralized time-sharing
system. A distributed computing system based on this model consists of a few
minicomputers (they may be large supercomputers as well) interconnected by a
communication network.
o Each minicomputer usually has multiple users simultaneously logged on to it; for
this, several interactive terminals are connected to each minicomputer. Each user is
logged on to one specific minicomputer, with remote access to the others.
o The network allows a user to access remote resources that are available on some
machine other than the one the user is currently logged on to.
o The minicomputer model may be used when resource sharing with remote users is
desired, such as sharing information databases of different types, with each type of
database located on a different machine.
o Example: Early Unix-based multi-user systems.
3. Workstation-Server Model
o The workstation-server model is a network of personal workstations, each with its own
disk and local file system. A workstation with its own local disk is called a diskful
workstation; a workstation without a local disk is called a diskless workstation.
o In this model, a user logs onto a workstation called his or her home workstation.
Normal computation required by the user's processes is performed at the home
workstation, but requests for services provided by special servers (such as a file
server or a database server) are sent to a server of that type, which performs the
requested activity and returns the result to the user's workstation.
o Example: File servers, web servers.
4. Processor-Pool Model
o The pool of processors consists of a large number of microcomputers and
minicomputers attached to the network. Each processor in the pool has its own
memory in which to load and run a system program or an application program of the
distributed computing system.
o Unlike the workstation-server model, in which a processor is allocated to each user,
in the processor-pool model the processors are pooled together to be shared by
users as needed. When a computation is completed, its processors are returned to
the pool for use by other users.
o Example: High-performance computing clusters.
5. Hybrid Model
o To combine the advantages of both the workstation-server and processor-pool
models, a hybrid model may be used to build a distributed computing system. The
hybrid model is based on the workstation-server model but with the addition of a
pool of processors. The processors in the pool can be allocated dynamically for
computations that are too large for workstations or that require several computers
concurrently for efficient execution.
o In addition to efficient execution of computation-intensive jobs, the hybrid model
gives guaranteed response to interactive jobs by allowing them to be processed on
local workstations of the users.
Q3. Comparison Between Bully Election Algorithm & Ring Algorithm

| Feature | Bully Algorithm | Ring Algorithm |
| --- | --- | --- |
| Election Initiation | Any process can start the election. | Any process that detects the coordinator's failure can start the election; the election message then circulates around the ring. |
| Selection Method | The highest-numbered process always wins. | The process with the highest ID in the ring wins. |
| Message Complexity | O(N²) in the worst case. | O(N): roughly N messages in the best case, about 2N in the worst case (election round plus coordinator round). |
| Coordinator Announcement | The new leader immediately sends a COORDINATOR message to all processes. | The new leader passes a COORDINATOR message around the ring. |
| Process Failure Handling | If the highest process fails, another election starts. | If multiple processes fail, the ring may break and must be repaired. |
| Scalability | Not scalable due to high message exchange. | More scalable, as it exchanges fewer messages. |
| Speed | Faster, but expensive in terms of messages. | Slower due to message circulation. |
| Use Case | Suitable for smaller networks where quick leader election is needed. | Suitable for stable systems where processes do not fail frequently. |
| Best for | Systems with infrequent failures. | Systems with a predictable topology (ring-based networks). |
Conclusion
• The Bully Algorithm is better for small systems where fast election is needed but has high
message complexity.
• The Ring Algorithm is more efficient in larger systems but slower due to message
circulation.
• The choice depends on the system’s needs – Bully for fast, priority-based selection, and
Ring for fair and efficient elections.
Final Verdict:
• Bully Algorithm → Priority-based, Fast but Expensive
• Ring Algorithm → Fair, Scalable but Slower
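A minimal sketch of one Bully election round, assuming every process knows all IDs and which processes are currently alive; the real protocol exchanges ELECTION/OK/COORDINATOR messages rather than sharing a membership set:

```python
# Simplified Bully election: the initiator challenges all higher-ID
# processes. In the real protocol each responder then runs its own
# election, so the highest-ID alive process always ends up coordinator.
def bully_election(initiator: int, alive: set) -> int:
    higher = [p for p in alive if p > initiator]
    if not higher:
        return initiator  # no higher process alive: initiator wins
    return max(higher)    # highest alive ID becomes the new coordinator

# Processes 1..5; old coordinator 5 has crashed and process 2 notices.
new_leader = bully_election(2, alive={1, 2, 3, 4})  # -> 4
```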
Q4. Difference Between Logical Clock and Vector Clock

| Feature | Logical Clock (Lamport Clock) | Vector Clock |
| --- | --- | --- |
| Definition | A single integer counter per process, incremented with each event. | A vector of counters, one entry per process, maintaining causality. |
| Representation | A single scalar value (e.g., LC(P) = 5). | A vector of timestamps (e.g., VC(P) = [2,5,3]). |
| Event Ordering | Ensures the happens-before (→) relation but cannot detect concurrency. | Detects causal ordering and can identify concurrent events. |
| Message Passing | Sender increments its clock and sends it; receiver sets its clock to the maximum of the received and local clocks, plus 1. | Sender increments its own entry and sends the entire vector; receiver takes the entry-wise maximum and then increments its own entry. |
| Concurrency Detection | Cannot detect concurrent events. | Can detect concurrent events (neither vector dominates the other). |
| Memory Overhead | Low (single integer per process). | High (vector of size n, where n is the number of processes). |
| Complexity | O(1) (only one timestamp updated). | O(n) (entire vector updated and compared). |
| Use Case | Ordering of events in distributed databases, distributed logs. | Maintaining causality, detecting conflicts in version control and replicated stores (Git, DynamoDB). |
Example:
• Logical Clock (Lamport):
If P1 sends a message to P2, the timestamps are:
o P1: LC = 5 → sends the message
o P2 (at LC = 3) receives it and updates LC = max(3, 5) + 1 = 6
• Vector Clock:
If P1 sends a message to P2, the vector timestamps might be:
o P1: increments to [2,0,0] → sends the message
o P2 (at [1,1,0]) receives it, takes the entry-wise maximum max([1,1,0], [2,0,0]) = [2,1,0], then increments its own entry → [2,2,0]
Key Takeaways:
• Logical clocks are simpler but lack concurrency detection.
• Vector clocks provide better causality tracking but require more memory and computation.
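The rules above can be sketched as a compact vector clock for three processes; the process indices mirror the P1/P2 example, and this is an illustrative sketch, not a library API:

```python
# Vector clocks for 3 processes; index i is the process's own entry.
def local_event(vc, i):
    vc = vc.copy()
    vc[i] += 1                 # increment own entry on every local/send event
    return vc

def receive(local, msg, i):
    # Entry-wise maximum, then increment own entry for the receive event.
    merged = [max(a, b) for a, b in zip(local, msg)]
    merged[i] += 1
    return merged

def concurrent(a, b):
    # Neither clock dominates the other => the events are concurrent.
    return (not all(x <= y for x, y in zip(a, b))
            and not all(y <= x for x, y in zip(a, b)))

p1 = local_event([1, 0, 0], 0)   # P1 ticks to [2, 0, 0], then sends
p2 = receive([1, 1, 0], p1, 1)   # P2 merges and ticks: [2, 2, 0]
```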
Q5. Comparison of Lamport’s, Ricart-Agrawala’s, and Maekawa’s Non-Token-Based Mutual Exclusion Algorithms

| Feature | Lamport’s Algorithm | Ricart-Agrawala’s Algorithm | Maekawa’s Algorithm |
| --- | --- | --- | --- |
| Approach | Uses timestamps to order requests and ensure fairness. | Uses direct request-reply messages for access control. | Uses quorum-based communication for better scalability. |
| Message Complexity | O(N): 3(N−1) messages (request, reply, release). | O(N): 2(N−1) messages (request, reply). | O(√N): requests go only to a quorum (subset). |
| Messages per Request | 3(N−1) messages. | 2(N−1) messages. | About 3√N messages. |
| Request Handling | Process sends a timestamped request to all others and enters the CS only after receiving all replies. | Process sends a request to all others and waits for N−1 replies. | Process requests access from a quorum (subset) of processes. |
| Reply Mechanism | Replies are sent based on timestamp priority; a reply is delayed if another process is in the CS. | If a process is not in the CS, it replies immediately; otherwise, it delays the reply. | Replies come only from the quorum group, reducing message overhead. |
| Fairness | Ensures FIFO ordering using timestamps. | Maintains fairness but may not follow strict FIFO ordering. | Less fairness due to quorum-based decisions. |
| Concurrency | Low (only one process can access the CS at a time). | Similar to Lamport’s but optimized due to fewer messages. | Higher concurrency, as only quorum members grant access. |
| Failure Tolerance | Low – a crashed process blocks others until it is detected and excluded. | Low – a process that fails before sending replies blocks others. | Moderate – some failures are tolerable due to quorum overlap. |
| Scalability | Low – requires communication with all processes. | Moderate – still needs global communication, but with fewer messages. | High – communicates with only a subset, making it more scalable. |
| Deadlock Possibility | No deadlocks (timestamps resolve conflicts). | No deadlocks (requests are prioritized by timestamps). | Possible deadlocks due to quorum overlaps. |
Conclusion

| Best Choice For | Algorithm |
| --- | --- |
| Strict FIFO ordering | Lamport’s Algorithm |
| Lower message complexity | Ricart-Agrawala’s Algorithm |
| Scalability & concurrency | Maekawa’s Algorithm |
• Lamport’s Algorithm is best when fairness and ordering are critical.
• Ricart-Agrawala’s Algorithm is more efficient than Lamport’s.
• Maekawa’s Algorithm is best for large distributed systems due to lower message
complexity.
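Maekawa’s √N behaviour comes from how quorums are constructed. A common construction (a sketch, assuming N is a perfect square) arranges the N processes in a √N × √N grid and takes each process’s row plus column as its quorum, so any two quorums always intersect:

```python
# Maekawa-style grid quorums for N = 9 processes in a 3x3 grid:
# a process's quorum is its whole row plus its whole column, so any
# two quorums intersect (quorum size is about 2*sqrt(N) - 1).
import math

def grid_quorum(pid: int, n: int) -> set:
    k = math.isqrt(n)              # grid side; assumes n is a perfect square
    row, col = divmod(pid, k)
    row_members = {row * k + c for c in range(k)}
    col_members = {r * k + col for r in range(k)}
    return row_members | col_members

q0 = grid_quorum(0, 9)  # {0, 1, 2, 3, 6}
q8 = grid_quorum(8, 9)  # {2, 5, 6, 7, 8}
overlap = q0 & q8       # non-empty overlap serializes access to the CS
```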