
Understanding Processes in Distributed Systems

Chapter 3 discusses processes in distributed systems, emphasizing the importance of communication, management, and scheduling. It explains the concepts of threads, their implementation, and their role in both distributed and nondistributed systems, highlighting the benefits of multithreading for performance. The chapter also covers client-server architecture, server organization, and code migration, detailing how processes and resources interact in a distributed environment.

Uploaded by

willwellworld23
Copyright © All Rights Reserved

Introduction to Distributed Systems

Chapter 3 - Processes
Processes in Distributed Systems
• Communication takes place between processes
• A process is a program in execution
• From the OS perspective, the management and scheduling of processes are important
• Other important issues arise in distributed systems:
  • multithreading to enhance performance
  • how clients and servers are organized
  • process or code migration to achieve scalability and to dynamically configure clients and servers
Real-world examples
• Opening a Document: When you double-click a Microsoft Word (or Google Docs) file, the Word program (the executable file on your disk) is loaded into memory, and the Operating System (OS) creates a Process to run it. While Word is open and active, it is a Process.
• Web Browser Tab: When you open Chrome or Firefox, the browser itself is a Process. When you open a new tab to visit a website, the browser often creates a separate Process (or a set of threads within the main process) to render that new web page. Each running tab is typically managed as a distinct Process or as part of one.
• Command Execution: When you type a command like ls (list files) or copy in a terminal/command prompt, the shell creates a new Process to execute that specific command. Once the command finishes, the Process is terminated.
In short, any running application, service, or task that the Operating System is actively managing is a process.
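The command-execution example can be sketched in Python with the standard `subprocess` module. This is a minimal illustration, not how any particular shell is implemented: instead of `ls`, the child is assumed to be another Python interpreter that prints its own process ID, so the sketch runs on any platform. The helper name `run_in_child_process` is hypothetical.

```python
import os
import subprocess
import sys


def run_in_child_process(args):
    """Ask the OS to create a new process for `args`, wait for it to
    terminate, and return (child_pid, exit_code, captured_stdout)."""
    child = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)
    output, _ = child.communicate()  # blocks until the child process exits
    return child.pid, child.returncode, output


if __name__ == "__main__":
    # Portable stand-in for a shell command such as `ls`: start another
    # Python interpreter that simply prints its own process ID.
    pid, code, out = run_in_child_process(
        [sys.executable, "-c", "import os; print(os.getpid())"]
    )
    print("parent (this script) PID:", os.getpid())
    print("child PID:", pid, "exit code:", code, "child printed:", out.strip())
```

The child reports a PID different from the parent's, and once `communicate()` returns, the child process has terminated, just as a shell command does when it finishes.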
Threads and their Implementation
What is a Thread?
• A thread (sometimes called a "lightweight process") is the smallest sequence of programmed instructions that can be managed independently by a scheduler. Think of it as a separate path of execution within a larger process.
• A thread is the actual path of execution: the worker that carries out the instructions within a process. When a process is created, it always starts with at least one thread, which is its primary thread of control.
• Threads can be used in both distributed and nondistributed systems
• Threads in nondistributed systems
  • a process has an address space (containing program text and data) and a single thread of control, as well as other resources such as open files, child processes, accounting information, etc.
Threads and their Implementation
A process is a complete, self-contained environment for a program's execution. It is the primary unit of resource ownership and isolation.
[Figure: three separate processes, each containing a single thread of execution, versus one process containing three distinct threads of execution]
Threads and their Implementation
Each thread has its own program counter, registers, stack, and state, but all threads of a process share the address space, global variables, and other resources such as open files.
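A small Python sketch of this split between private and shared state: each thread's local variable lives on its own stack, while the module-level list is a single global visible to every thread, so access to it is guarded by a mutex. The names `worker`, `shared_log`, and `log_lock` are hypothetical, chosen for illustration.

```python
import threading

shared_log = []               # global variable: one copy, visible to all threads
log_lock = threading.Lock()   # mutex protecting shared_log


def worker(name, n):
    local_total = 0           # local variable: lives on this thread's own stack
    for i in range(1, n + 1):
        local_total += i
    with log_lock:            # only one thread at a time updates the shared list
        shared_log.append((name, local_total))


threads = [threading.Thread(target=worker, args=(f"T{k}", 10 * k)) for k in range(1, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared_log))     # [('T1', 55), ('T2', 210), ('T3', 465)]
```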
Threads and their Implementation
• Threads take turns in running (on a single processor)
• Threads allow multiple executions to take place in the same process environment; this is called multithreading
• Thread usage: why do we need threads?
  • e.g., a word processor has different parts; parts for
    • interacting with the user
    • formatting the page as soon as changes are made
    • timed saving (for auto-recovery)
    • spelling and grammar checking, etc.
1. Simplifying the programming model, since many activities are going on at once
2. They are easier to create and destroy than processes, since they do not have resources of their own attached to them
3. Performance improves by overlapping activities when there is a lot of I/O, i.e., to avoid blocking while waiting for input or doing calculations, say in a spreadsheet
4. Real parallelism is possible in a multiprocessor system
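The word-processor example can be sketched with two threads in one process: the main thread stands in for user interaction, and a background thread performs timed saves for auto-recovery. This is a toy under stated assumptions: the `Document` class is hypothetical, and its `saved_copy` attribute stands in for a recovery file on disk.

```python
import threading
import time


class Document:
    """Toy document edited by one thread and auto-saved by another."""

    def __init__(self):
        self._lock = threading.Lock()
        self._text = ""
        self.saved_copy = ""          # stands in for the recovery file on disk

    def type_text(self, chars):
        with self._lock:
            self._text += chars

    def save(self):
        with self._lock:
            self.saved_copy = self._text


def autosave(doc, interval, stop):
    """Background thread: save every `interval` seconds until `stop` is set."""
    while not stop.wait(interval):    # wait() returns False on timeout
        doc.save()
    doc.save()                        # final save on shutdown


doc = Document()
stop = threading.Event()
saver = threading.Thread(target=autosave, args=(doc, 0.05, stop))
saver.start()

for word in ["Threads ", "let ", "the ", "editor ", "stay ", "responsive."]:
    doc.type_text(word)               # the "user interaction" thread keeps working
    time.sleep(0.02)

stop.set()
saver.join()
print(doc.saved_copy)
```

Because saving runs in its own thread, typing never waits for a save to finish; the lock only serializes the brief moments when both threads touch the text.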
Threads and their Implementation
• Having finer granularity, in terms of multiple threads per process rather than multiple processes, provides better performance and makes it easier to build distributed applications
• In nondistributed systems, threads can be used with shared data instead of processes to avoid the context-switching overhead of interprocess communication (IPC)
[Figure: context switching as the result of IPC]
Thread Implementation
• threads are usually provided in the form of a thread package
• the package contains operations to create and destroy threads, as well as operations on synchronization variables such as mutexes and condition variables
• there are two approaches to constructing a thread package
a. construct a thread library that is executed entirely in user mode (the OS is not aware of threads)
  • cheap to create and destroy threads; just allocate and free memory
  • context switching can be done using a few instructions; store and reload only CPU register values
  • disadvantage: invoking a blocking system call will block the entire process to which the thread belongs, and thus all other threads in that process
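A user-level thread package can be imitated in Python with generators: each generator is a "thread", `yield` is a voluntary context switch, and a round-robin scheduler runs entirely in user space while the OS sees only one ordinary thread. This is an analogy, not how a C thread library saves registers; the names `user_thread` and `run_user_level` are hypothetical.

```python
from collections import deque


def user_thread(name, steps, trace):
    """A user-level 'thread': each `yield` voluntarily hands the CPU back."""
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield                       # context switch: only Python-level state is saved


def run_user_level(threads):
    """Round-robin scheduler living entirely in user space.
    The OS sees one ordinary thread and knows nothing about these 'threads'."""
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)                 # resume the thread until its next yield
            ready.append(t)         # still runnable: back of the queue
        except StopIteration:
            pass                    # thread finished; dropping it "destroys" it


trace = []
run_user_level([user_thread("A", 3, trace), user_thread("B", 2, trace)])
print(trace)   # ['A0', 'B0', 'A1', 'B1', 'A2']
```

Creating a thread here is just creating a generator object, and switching is a function return, which mirrors why user-level threads are cheap. The disadvantage also shows up: if one of these threads called a blocking function such as `time.sleep` instead of yielding, the scheduler and every other thread would stall with it.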
Thread Implementation
b. implement them in the OS’s kernel
  • let the kernel be aware of threads and schedule them
  • thread operations such as creation and deletion are expensive, since each requires a system call
  • solution: use a hybrid form of user-level and kernel-level threads, called lightweight processes (LWPs)
  • an LWP runs in the context of a single (heavyweight) process, and there can be several LWPs per process
  • the system also offers a user-level thread package for operations such as creating and destroying threads and for thread synchronization (mutexes and condition variables)
  • the thread package can be shared by multiple LWPs
Thread Implementation
[Figure: combining kernel-level lightweight processes and user-level threads]
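The hybrid scheme can be approximated in Python, assuming that CPython's `threading.Thread` objects (which are backed by native OS threads) play the role of LWPs and that generators play the role of user-level threads. Every LWP runs the same user-level scheduler code over one shared ready queue, so the thread package is shared by all LWPs. This is an illustrative sketch only; the names `run_hybrid` and `num_lwps` are hypothetical, and CPython's global interpreter lock means the Python bytecode itself does not execute in parallel.

```python
import queue
import threading


def user_thread(name, steps, trace):
    """User-level thread: yields so that its LWP can switch to another one."""
    for i in range(steps):
        trace.append((name, i, threading.current_thread().name))
        yield


def run_hybrid(user_threads, num_lwps=2):
    """Run generator-based user threads on a small pool of kernel threads (LWPs)."""
    remaining = len(user_threads)
    if remaining == 0:
        return
    ready = queue.Queue()             # shared thread table / ready list
    for ut in user_threads:
        ready.put(ut)
    lock = threading.Lock()
    all_done = threading.Event()

    def lwp_loop():
        nonlocal remaining
        while not all_done.is_set():
            try:
                ut = ready.get(timeout=0.05)
            except queue.Empty:
                continue
            try:
                next(ut)              # run the user thread until its next yield
                ready.put(ut)         # still alive: make it runnable again
            except StopIteration:
                with lock:            # user thread finished
                    remaining -= 1
                    if remaining == 0:
                        all_done.set()

    lwps = [threading.Thread(target=lwp_loop, name=f"LWP-{i}") for i in range(num_lwps)]
    for t in lwps:
        t.start()
    for t in lwps:
        t.join()


trace = []
run_hybrid([user_thread(n, 3, trace) for n in "ABC"], num_lwps=2)
print(len(trace))                              # 9
print(sorted({lwp for _, _, lwp in trace}))    # which LWPs ran user threads
```

Any user thread may be resumed by any LWP, which is the many-to-many mapping shown in the figure; which LWP runs which step depends on the OS scheduler.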
Threads in Distributed Systems
• Multithreaded clients
  • consider a Web browser: fetching different parts of a page can be implemented as separate threads, each opening its own TCP/IP connection to the server or to separate, replicated servers
  • each thread can display its results as soon as it gets its part of the page
• Multithreaded servers
  • servers can be constructed in three ways
  a. Single-threaded process
    • it gets a request, examines it, and carries it out to completion before getting the next request
    • the server is idle while waiting for a disk read, i.e., system calls are blocking
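The multithreaded Web client described above can be sketched with a thread pool: each page part is fetched by its own thread, and each part is handled as soon as it arrives. The function `fetch_part` is a hypothetical stand-in for one HTTP request; it only sleeps to model network latency, so no real network access takes place, and the URLs are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_part(url):
    """Hypothetical stand-in for one HTTP request over its own TCP connection.
    The sleep models network latency; no real network access happens here."""
    time.sleep(0.2)
    return url, f"<content of {url}>"


def load_page(part_urls):
    parts = {}
    with ThreadPoolExecutor(max_workers=max(1, len(part_urls))) as pool:
        futures = [pool.submit(fetch_part, u) for u in part_urls]
        for fut in as_completed(futures):   # handle each part as soon as it arrives
            url, body = fut.result()
            parts[url] = body               # a browser would render the part here
    return parts


urls = ["/index.html", "/style.css", "/logo.png", "/app.js"]  # hypothetical page parts
start = time.perf_counter()
page = load_page(urls)
elapsed = time.perf_counter() - start
print(len(page), "parts in", round(elapsed, 2), "s")
```

Because the four simulated waits overlap, the elapsed time is close to one latency period rather than four.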
Threads in Distributed Systems
b. Threads
  • threads are even more important for implementing servers
  • e.g., a file server
    • the dispatcher thread reads incoming requests for a file operation from clients and passes each one to an idle worker thread
    • the worker thread performs a blocking disk read, in which case another thread may continue, say the dispatcher or another worker thread
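The dispatcher/worker organization can be sketched with a shared job queue: the dispatcher (here the main thread) hands each request to whichever worker is idle, and a worker that blocks on a simulated disk read does not stop the others. `FAKE_DISK`, `blocking_disk_read`, and the client names are hypothetical; the sleep models the blocking read.

```python
import queue
import threading
import time

FAKE_DISK = {"a.txt": "alpha", "b.txt": "beta", "c.txt": "gamma"}   # hypothetical files


def blocking_disk_read(name):
    time.sleep(0.1)                   # models a blocking disk read
    return FAKE_DISK.get(name, "<not found>")


def worker(jobs, replies):
    while True:
        req = jobs.get()
        if req is None:               # shutdown signal from the dispatcher
            break
        client, filename = req
        # Only this worker blocks; the dispatcher and other workers keep running.
        replies.put((client, blocking_disk_read(filename)))


def dispatcher(incoming, num_workers=3):
    """Read requests and pass each one to an idle worker via a shared queue."""
    jobs, replies = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(jobs, replies)) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for req in incoming:              # e.g., requests read from the network
        jobs.put(req)
    for _ in workers:
        jobs.put(None)
    for w in workers:
        w.join()
    return [replies.get() for _ in range(replies.qsize())]


requests = [("client1", "a.txt"), ("client2", "b.txt"), ("client3", "c.txt")]
start = time.perf_counter()
results = dispatcher(requests)
print(sorted(results), round(time.perf_counter() - start, 2), "s")
```

With three workers the three reads overlap, so the total time stays near one read rather than three.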
Threads in Distributed Systems
c. Finite-state machine
  • used if threads are not available
  • it gets a request, examines it, and tries to fulfill the request from the cache; otherwise it sends a request to the file system, but instead of blocking it records the state of the current request and proceeds to the next request
• Summary: three ways to construct a server
Model                      Characteristics
Single-threaded process    No parallelism, blocking system calls
Threads                    Parallelism, blocking system calls (blocking only the calling thread)
Finite-state machine       Parallelism, nonblocking system calls
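The finite-state-machine server can be sketched as a single-threaded event loop. Disk I/O is simulated: instead of blocking, the server records the request's state in a table and later receives a hypothetical "disk_done" event. The function name, event format, and file names are assumptions for illustration.

```python
from collections import deque


def fsm_file_server(requests, cache, disk):
    """Single-threaded, finite-state-machine file server (simulated).

    requests: list of (request_id, filename) arriving from clients
    cache:    filename -> data already in memory
    disk:     filename -> data, fetched with a simulated *nonblocking* read
    Returns the replies in the order they are sent."""
    events = deque(("request", rid, name) for rid, name in requests)
    pending = {}                      # request_id -> saved state of that request
    replies = []
    while events:
        kind, rid, payload = events.popleft()
        if kind == "request":
            if payload in cache:
                replies.append((rid, cache[payload]))      # answer at once
            else:
                # Do not block: record the request's state and move on. The disk
                # completion arrives later as another event on the queue.
                pending[rid] = {"file": payload, "state": "WAITING_FOR_DISK"}
                events.append(("disk_done", rid, disk[payload]))
        elif kind == "disk_done":
            state = pending.pop(rid)                       # restore saved state
            cache[state["file"]] = payload
            replies.append((rid, payload))
    return replies


replies = fsm_file_server([(1, "a"), (2, "b"), (3, "a")], cache={"a": "A"}, disk={"b": "B"})
print(replies)   # [(1, 'A'), (3, 'A'), (2, 'B')]
```

Request 3 is answered before request 2 even though it arrived later: the single thread never waited for the disk.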
Anatomy of Clients
• Two issues: user interfaces and client-side software for distribution transparency
a. User Interfaces
  • to create a convenient environment for the interaction of a human user and a remote server; e.g., mobile phones with simple displays and a set of keys
  • GUIs are most commonly used
  • The X Window System (or simply X)
    • it has the X kernel: the part of the OS that controls the terminal (monitor, keyboard, pointing device like a mouse) and is hardware dependent
    • the X kernel contains all terminal-specific device drivers; its interface is made available to applications through a library called Xlib
[Figure: the basic organization of the X Window System]
Anatomy of Clients
b. Client-Side Software for Distribution Transparency
  • in addition to the user interface, parts of the processing and data level in a client-server application are executed at the client side
  • an example is embedded client software for ATMs, cash registers, etc.
  • moreover, client software can also include components to achieve distribution transparency
  • e.g., replication transparency
    • assume a distributed system with replicated servers; the client proxy can send requests to each replica, and client-side software can transparently collect all responses and pass a single return value to the client application
[Figure: transparent replication of a server using a client-side solution]
  • access transparency and failure transparency can also be achieved using client-side software
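A client-side proxy for replication transparency can be sketched as a function that calls every replica and returns one value. The slides do not say how the single value is chosen; majority voting is used here as an assumed policy, and the replicas are local stand-in functions (`good`, `faulty`, `crashed`) rather than remote servers. A real proxy would also send the requests in parallel.

```python
from collections import Counter


def replicated_call(replicas, request):
    """Client-side proxy: send `request` to every replica, collect all the
    responses, and hand one value back to the application.
    Majority voting is only one possible policy (an assumption here)."""
    responses = []
    for replica in replicas:
        try:
            responses.append(replica(request))
        except Exception:
            pass                      # a crashed replica simply gives no answer
    if not responses:
        raise RuntimeError("no replica answered")
    value, _count = Counter(responses).most_common(1)[0]
    return value


def good(req):                        # hypothetical correct replica
    return req.upper()


def faulty(req):                      # hypothetical replica returning a wrong value
    return "garbage"


def crashed(req):                     # hypothetical replica that is down
    raise ConnectionError("replica down")


print(replicated_call([good, faulty, good, crashed], "balance?"))   # BALANCE?
```

The application calls `replicated_call` as if there were one server, so both the replication and the failed replica stay hidden from it.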
Servers and design issues
• General design issues
  • How to organize servers?
  • Where do clients contact a server?
  • Whether and how a server can be interrupted
  • Whether or not the server is stateless
a. How to organize servers?
  • Iterative server
    • the server itself handles the request and returns the result
  • Concurrent server
    • it passes a request to a separate process or thread and waits for the next incoming request; e.g., a multithreaded server, or one that forks a new process, as is done in Unix
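Python's standard `socketserver` module happens to offer both organizations: `TCPServer` handles one request at a time (iterative), while `ThreadingTCPServer` hands each request to a new thread (concurrent); `ForkingTCPServer` forks a process instead, on Unix. The echo protocol and the helper names `EchoHandler`, `start_server`, and `ask` are assumptions for this sketch. Binding to port 0 lets the OS pick a free port.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        line = self.rfile.readline()          # one request per connection
        self.wfile.write(b"echo: " + line)


def start_server(concurrent):
    """Iterative server: TCPServer handles requests one at a time.
    Concurrent server: ThreadingTCPServer gives each request its own thread."""
    cls = socketserver.ThreadingTCPServer if concurrent else socketserver.TCPServer
    server = cls(("127.0.0.1", 0), EchoHandler)   # port 0: let the OS pick one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(address, message):
    with socket.create_connection(address) as s, s.makefile("rb") as reply:
        s.sendall(message + b"\n")
        return reply.readline()


server = start_server(concurrent=True)
print(ask(server.server_address, b"hello"))   # b'echo: hello\n'
server.shutdown()
server.server_close()
```

With the iterative variant, a slow handler would make every other client wait; the concurrent variant lets the server return to accepting connections immediately.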
Servers and design issues
b. Where do clients contact a server?
  • using endpoints or ports at the machine where the server is running, where each server listens to a specific endpoint
  • how do clients know the endpoint of a service?
    • globally assign endpoints for well-known services; e.g., FTP is on TCP port 21, HTTP is on TCP port 80
    • for services that do not require preassigned endpoints, the endpoint can be dynamically assigned by the local OS
  • IANA (Internet Assigned Numbers Authority) ranges
    • IANA divided the port numbers into three ranges
    • Well-known ports (0-1023): assigned and controlled by IANA for standard services; e.g., DNS uses port 53
    • Registered ports (1024-49151): not assigned or controlled by IANA, but they can be registered with IANA to prevent duplication; e.g., MySQL uses port 3306
    • Dynamic or ephemeral ports (49152-65535): neither controlled nor registered by IANA
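The three IANA ranges can be expressed as a small classification function. The range boundaries are IANA's; the function name `iana_port_range` is hypothetical. Note that operating systems may use a different range for the ports they assign dynamically (Linux, for example, defaults to 32768-60999), so an OS-assigned port is not guaranteed to fall in IANA's dynamic range.

```python
def iana_port_range(port):
    """Classify a TCP/UDP port number into the IANA ranges."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit: 0-65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/ephemeral"


for p in (53, 80, 3306, 50000):
    print(p, iana_port_range(p))
# 53 well-known
# 80 well-known
# 3306 registered
# 50000 dynamic/ephemeral
```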
  • how can the client know a dynamically assigned endpoint? two approaches
    i. Daemon (directory service) approach: have a daemon running and listening to a well-known endpoint; it keeps track of all endpoints of services on the collocated server
      • the client first contacts the daemon, which provides it with the endpoint, and then the client contacts the specific server
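The two-step binding through a daemon can be sketched in-process: servers register their dynamically assigned endpoints, and a client first asks the daemon, then contacts the server. The `EndpointDaemon` class, the service names, and the port numbers are all hypothetical, and the network messages between client, daemon, and server are left out.

```python
class EndpointDaemon:
    """Hypothetical daemon that listens on a well-known endpoint and maps
    service names to the (dynamically assigned) endpoints of local servers."""

    def __init__(self):
        self._table = {}

    def register(self, service, port):       # called by a server at start-up
        self._table[service] = port

    def lookup(self, service):               # called first by a client
        try:
            return self._table[service]
        except KeyError:
            raise LookupError(f"no endpoint registered for {service!r}") from None


daemon = EndpointDaemon()
daemon.register("time-service", 50731)       # made-up port numbers
daemon.register("print-service", 51002)

# Step 1: the client asks the daemon for the endpoint ...
port = daemon.lookup("time-service")
# Step 2: ... then contacts the server directly at that endpoint.
print("connect to port", port)   # connect to port 50731
```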
[Figure: client-to-server binding using a daemon]
    ii. Superserver approach: use a superserver (as in UNIX) that listens to the endpoints of all its services and then forks a process to take care of the request; this avoids having many servers running simultaneously, most of them idle
[Figure: client-to-server binding using a superserver]
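A superserver can be sketched with the standard `selectors` module: one loop waits on several listening endpoints at once and, when a connection arrives, starts the handler for that service only. Where a UNIX superserver such as inetd would fork a process, this sketch starts a thread; the two services (`echo_service`, `upper_service`) and the helper names are hypothetical.

```python
import selectors
import socket
import threading


def listen_on_free_port():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))                  # the OS assigns a free port
    s.listen()
    return s


def superserver(services, stop):
    """Wait on many endpoints at once; on a new connection, start a handler
    for that service only (a thread here; inetd would fork a process)."""
    sel = selectors.DefaultSelector()
    for listener, handler in services.items():
        sel.register(listener, selectors.EVENT_READ, data=handler)
    while not stop.is_set():
        for key, _events in sel.select(timeout=0.1):
            conn, _addr = key.fileobj.accept()
            threading.Thread(target=key.data, args=(conn,), daemon=True).start()
    sel.close()


def echo_service(conn):
    with conn:
        conn.sendall(conn.recv(1024))


def upper_service(conn):
    with conn:
        conn.sendall(conn.recv(1024).upper())


echo_sock, upper_sock = listen_on_free_port(), listen_on_free_port()
stop = threading.Event()
server_thread = threading.Thread(
    target=superserver, args=({echo_sock: echo_service, upper_sock: upper_service}, stop))
server_thread.start()

answers = []
for sock in (echo_sock, upper_sock):
    with socket.create_connection(sock.getsockname()) as client:
        client.sendall(b"hi")
        answers.append(client.recv(1024))
stop.set()
server_thread.join()
echo_sock.close()
upper_sock.close()
print(answers)   # [b'hi', b'HI']
```

No service handler exists until a request for it arrives, which is exactly what saves the resources of idle servers.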
c. Whether and how a server can be interrupted
  • for instance, a user may want to interrupt a file transfer; maybe it was the wrong file
  • let the client exit the client application; this will break the connection to the server, and the server will tear down the connection assuming that the client has crashed
  or
  • let the client send out-of-band data, i.e., data to be processed by the server before any other data from the client; the server may listen on a separate control endpoint, or the data can be sent on the same connection as urgent data, as in TCP
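The separate-control-endpoint option can be simulated with two threads in one process: the data side sends chunks and checks a cancellation flag before each one, while the control side sets that flag when an "ABORT" message arrives. Everything here is an in-process stand-in: real implementations would use a second TCP connection (or TCP urgent data), and the "ABORT" message and function names are hypothetical.

```python
import threading
import time


def transfer_file(chunks, send, cancelled):
    """Data connection: send chunks until done or until the control
    endpoint reports that the client wants to stop."""
    sent = 0
    for chunk in chunks:
        if cancelled.is_set():     # checked before every chunk
            break
        send(chunk)
        sent += 1
        time.sleep(0.01)           # models the time needed to push one chunk
    return sent


def control_endpoint(messages, cancelled):
    """Separate control channel: acts on out-of-band messages from the client."""
    for msg in messages:
        if msg == "ABORT":
            cancelled.set()


def client_aborts_later(cancelled):
    time.sleep(0.1)                # the user notices it is the wrong file
    control_endpoint(["ABORT"], cancelled)


cancelled = threading.Event()
received = []
aborter = threading.Thread(target=client_aborts_later, args=(cancelled,))
aborter.start()
sent = transfer_file([b"x" * 1024] * 100, received.append, cancelled)
aborter.join()
print(sent, "of 100 chunks sent before the abort")
```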
d. Whether or not the server is stateless
  • a stateless server does not keep information on the state of its clients; for instance, a Web server
  • soft state: a server promises to maintain state for a limited time, e.g., to keep a client informed about updates; after the time expires, the client has to poll
  • a stateful server maintains information about its clients; for instance, a file server that allows a client to keep a local copy of a file and to perform update operations on it
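Soft state can be sketched as a table of leases: the server remembers a subscriber only until its lease expires, after which that client is dropped and must poll. The `SoftStateServer` class and its lease length are hypothetical; the clock is injectable so the example runs without real waiting.

```python
import time


class SoftStateServer:
    """Keeps per-client state only for a limited time (a lease).
    `clock` is injectable so the example can run without real waiting."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease = lease_seconds
        self.clock = clock
        self._subscribers = {}            # client -> expiry time

    def subscribe(self, client):
        """Promise to push updates to `client` until the lease expires."""
        self._subscribers[client] = self.clock() + self.lease

    def clients_to_notify(self):
        """Drop expired entries; those clients must poll from now on."""
        now = self.clock()
        self._subscribers = {c: t for c, t in self._subscribers.items() if t > now}
        return sorted(self._subscribers)


fake_now = [0.0]
server = SoftStateServer(lease_seconds=30, clock=lambda: fake_now[0])
server.subscribe("alice")
fake_now[0] = 20.0
server.subscribe("bob")
fake_now[0] = 40.0                        # alice's lease (expires at 30) has run out
print(server.clients_to_notify())         # ['bob']
```

If the server crashes and restarts, it loses nothing important: clients simply stop receiving updates and fall back to polling, which is why soft state sits between stateless and stateful designs.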
• Server Clusters
  • a server cluster is a collection of machines connected through a network (normally a LAN with high bandwidth and low latency), where each machine runs one or more servers
  • it is logically organized into three tiers: a logical switch, application/compute servers, and a distributed file/database system
[Figure: the general organization of a three-tiered server cluster]
• Distributed Servers
  • the problem with a server cluster is that when the logical switch (the single access point) fails, the whole cluster becomes unavailable
  • hence, several access points can be provided whose addresses are publicly available, leading to a distributed server
  • e.g., the DNS can return several addresses for the same host name
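On the client side, a distributed server's multiple access points can be used by resolving the name and trying each returned address in turn until one accepts the connection. The sketch below uses the standard `socket.getaddrinfo`; the helper name `connect_to_any` is hypothetical, and the standard library's `socket.create_connection` already follows essentially the same strategy.

```python
import socket


def connect_to_any(host, port, timeout=2.0):
    """Resolve `host` (DNS may return several addresses for one name) and
    try each address in turn until one access point accepts the connection."""
    last_error = None
    for family, type_, proto, _canon, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        s = None
        try:
            s = socket.socket(family, type_, proto)
            s.settimeout(timeout)
            s.connect(addr)
            return s                  # first reachable access point wins
        except OSError as exc:
            last_error = exc
            if s is not None:
                s.close()
    raise OSError(f"no address of {host!r} is reachable") from last_error


# Local demo: a single listener stands in for one access point.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
port = listener.getsockname()[1]
conn = connect_to_any("127.0.0.1", port)
peer = conn.getpeername()
print("connected to", peer)
conn.close()
listener.close()
```

With a name such as `localhost`, resolution may return both an IPv6 and an IPv4 address; if the first one refuses the connection, the loop simply moves on to the next, which is the failover behavior a distributed server relies on.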
Code Migration
• so far, communication has been concerned with passing data
• we may also pass programs, even while they are running and even in heterogeneous systems
• code migration also involves moving data: when a program migrates while running, its status, pending signals, and other parts of its environment, such as the stack and the program counter, also have to be moved
Code Migration
• Reasons for migrating code
  • to improve performance: move processes from heavily loaded to lightly loaded machines (load balancing)
  • to reduce communication: move a client application that performs many database operations to the server where the database resides, and then send only the results to the client
  • to exploit parallelism (for nonparallel programs): e.g., copies of a mobile program (called a crawler in search engines) moving from site to site searching the Web
  • to gain flexibility by dynamically configuring distributed systems, instead of having a multitiered client-server application decide in advance which parts of a program are to be run where
Code Migration
[Figure: the principle of dynamically configuring a client to communicate with a server; the client first fetches the necessary software, and then invokes the server]
• Models for Code Migration
  • a process consists of three segments: the code segment (set of instructions), the resource segment (references to external resources such as files, printers, ...), and the execution segment (the current execution state of a process, such as private data, the stack, and the program counter)
• Weak Mobility
  • transfer only the code segment and maybe some initialization data; in this case a program always starts from its initial state, e.g., Java Applets
  • the code can be executed by the target process (in its own address space, as with Java Applets) or by a separate process
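Weak mobility can be sketched in Python by shipping source text (the code segment) plus some initialization data and executing it on the target in a fresh namespace, always from the beginning. The shipped function `word_count`, the data, and `run_migrated_code` are hypothetical; the actual transfer over the network is omitted.

```python
# --- on the sending machine: only the code segment (plus initial data) is shipped
CODE_SEGMENT = """
def word_count(text):
    return len(text.split())
"""
initial_data = {"text": "code migration moves programs between machines"}


def run_migrated_code(source, entry_point, data):
    """Target side: load the received code into a fresh namespace (its own
    'address space') and start it from the beginning; no execution state
    arrives with it."""
    namespace = {}
    exec(compile(source, "<migrated>", "exec"), namespace)
    return namespace[entry_point](**data)


print(run_migrated_code(CODE_SEGMENT, "word_count", initial_data))   # 6
```

Executing received code is only acceptable once its origin has been authenticated or it runs in a sandbox, much as Java Applets run under a security manager.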
• Strong Mobility
  • transfer both the code and execution segments; this allows a process to be migrated while it is executing
  • can also be supported by remote cloning: an exact copy of the original process runs on a different machine, in parallel with the original process; UNIX does this by forking a child process
  • migration can be
    • sender-initiated: initiated by the machine where the code resides or is currently running; e.g., uploading programs to a server; may require authentication or that the client be a registered one
    • receiver-initiated: initiated by the target machine; e.g., Java Applets; easier to implement
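Python cannot capture a real stack and program counter, so this sketch of strong mobility makes the execution segment explicit: the hypothetical `SumOfSquares` computation keeps its loop position and running total in fields, is suspended part-way, serialized with `pickle`, and resumed from the same point. Both "machines" are simulated in one process.

```python
import pickle


class SumOfSquares:
    """A computation whose execution state is kept explicitly in fields,
    so it can be captured mid-run, shipped, and resumed elsewhere."""

    def __init__(self, n):
        self.n = n
        self.i = 1          # "program counter": next loop iteration to run
        self.total = 0      # private data of the running computation

    def run(self, max_steps):
        """Execute at most `max_steps` iterations; return True when finished."""
        steps = 0
        while self.i <= self.n and steps < max_steps:
            self.total += self.i * self.i
            self.i += 1
            steps += 1
        return self.i > self.n


# Machine A: start the computation, then suspend it part-way.
job = SumOfSquares(10)
job.run(max_steps=4)
execution_segment = pickle.dumps(job)     # the state travels; the class is found by name

# Machine B: restore the state and continue where A stopped.
resumed = pickle.loads(execution_segment)
resumed.run(max_steps=100)
print(resumed.i, resumed.total)           # 11 385
```

`pickle` records the class only by name, so machine B must already have the code segment (here, the `SumOfSquares` class). Also, `pickle.loads` should never be applied to data from an untrusted sender.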
• Summary of models of code migration
[Figure: alternatives for code migration]
Migration and Local Resources
• how to migrate the resource segment
• it is not always possible to move a resource; e.g., a reference to a TCP port held by a process to communicate with other processes
• Types of Process-to-Resource Bindings
  • Binding by identifier (the strongest): a resource is referred to by its identifier; e.g., a URL to refer to a Web page, or an FTP server referred to by its Internet (IP) address
  • Binding by value (weaker): only the value of a resource is needed; in this case another resource can provide the same value; e.g., standard libraries of programming languages such as C or Java, which are normally locally available, but whose location in the file system may vary from site to site
  • Binding by type (weakest): a process needs a resource of a specific type; e.g., references to local devices such as monitors, printers, ...
• in migrating code, the above bindings cannot change, but the references to resources can
• how can a reference be changed? It depends on whether the resource can be moved along with the code, i.e., on the resource-to-machine binding
• Types of Resource-to-Machine Bindings
  • Unattached resources: can be easily moved with the migrating program (such as data files associated with the program)
  • Fastened resources: such as local databases and complete Web sites; moving or copying may be possible, but is very costly
  • Fixed resources: intimately bound to a specific machine or environment, such as local devices, and cannot be moved
• we have nine combinations to consider
                               Resource-to-machine binding
Process-to-resource binding    Unattached        Fastened          Fixed
By identifier                  MV (or GR)        GR (or MV)        GR
By value                       CP (or MV, GR)    GR (or CP)        GR
By type                        RB (or GR, CP)    RB (or GR, CP)    RB (or GR)
Actions to be taken with respect to the references to local resources when migrating code to another machine
• GR: establish a global, system-wide reference
• MV: move the resource
• CP: copy the value of the resource
• RB: rebind the process to a locally available resource
Exercise: for each of the nine combinations, give example resources
• Migration in Heterogeneous Systems
  • distributed systems are constructed on a heterogeneous collection of platforms, each with its own OS and machine architecture
  • heterogeneity problems are similar to those of portability
  • migration is easier in some languages
    • for scripting languages, the source code is interpreted
    • for Java, intermediate code (bytecode) is generated by the compiler for a virtual machine
  • in weak mobility
    • since there is no runtime information, compile the source code for each potential platform
  • in strong mobility
    • it is difficult to transfer the execution segment, since it may contain platform-dependent information such as register values; see the textbook for possible solutions
