Processes vs. Threads in Systems

Processes and Threads

The role of processes in distributed systems

1
Overview
• Role of processes & threads in a distributed system
– Asynchronous communication to overlap communication latencies with
local processing.
– Structure processes with multiple threads
– Process migration for load balancing

2
PROCESSES & THREADS IN DISTRIBUTED SYSTEMS

3
Concurrency Transparency
• Traditionally, operating systems used the process concept to
provide concurrency transparency to executing processes.
– In a single computer processes share CPU, memory, disk, etc.
(multiprogramming, process isolation, other techniques)
• Today, multithreading provides concurrency with less overhead in
some instances

4
Large Applications
• Early operating systems (e.g., UNIX) supported large apps by creating several cooperating programs via the fork() system call (a parent process forks multiple child processes)
– Initially, the child process is a clone of the parent
• The exec system call will replace the existing
executable (code, data, heap & stack) with a new
executable file: executable code, data, etc.
– i.e., an existing process executes a new program
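The fork-then-exec pattern above can be sketched in Python (the slides refer to the C-style UNIX calls; this is a minimal POSIX-only sketch, and the child program run here is an arbitrary example):

```python
import os
import sys

def run_program(path, args):
    """Fork a child, then replace its image with a new executable."""
    pid = os.fork()
    if pid == 0:
        # Child: initially a clone of the parent. exec replaces its
        # code, data, heap and stack with the new program.
        os.execvp(path, [path] + args)
        os._exit(127)  # reached only if exec itself fails
    # Parent: wait for the child and return its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

# Run "python -c 'import sys; sys.exit(7)'" as a child process.
status = run_program(sys.executable, ["-c", "import sys; sys.exit(7)"])
```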

5
Forking a New Process
1. [Link]
2. [Link]

• The fork/exec approach can be used to create a family of related processes.
• See the above web pages for examples.

6
Large Applications: Overhead
• Many of the steps in forking a new process
duplicate existing process structures
– When followed by an exec, this is wasted effort.
• Processes used IPC mechanisms to exchange
information:
– Pipes, message queues, shared memory
– Overhead: numerous context switches
– To create and then schedule multiple processes
– To create and access shared memory
– Inter-process communication
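The IPC overhead mentioned above can be seen in miniature with a pipe between a parent and a forked child (a POSIX-only Python sketch; every transfer crosses the kernel boundary):

```python
import os

# Parent and child exchange one message through a pipe. Each read
# and write is a system call, so the kernel mediates every exchange
# -- this is the source of the context-switch overhead.
r, w = os.pipe()
pid = os.fork()
if pid == 0:
    os.close(r)                      # child writes only
    os.write(w, b"hello from child")
    os._exit(0)
os.close(w)                          # parent reads only
msg = os.read(r, 64)
os.waitpid(pid, 0)
```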

7
Overhead Due to Process Switching

• Save CPU context
• Modify data in MMU registers
• Invalidate TLB entries (?)
• Restore the CPU context of the new process, modify MMU registers, ...

Context switching as the result of IPC.


8
Large Applications:
Multi-threaded
• Modern languages support multi-threaded
processes where a single process can contain
several separate threads of execution instead of
several separate programs.
• Benefits of multiple threads vs. multiple programs
– Less overhead for process creation, context switches…
– Less communication overhead
– Easier to handle asynchronous events
– Easier to handle priority scheduling

9
Thread
• Conceptually, one of several concurrent
execution paths contained in a process.
• Two threads in the same process can share
global data as easily as two functions in a single
process
– Threads can be scheduled concurrently with other
threads so they must synchronize access to shared
resources.
• Thread switches are more economical than
process switches; there is less
information to be saved.
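A minimal sketch of the point above: two threads share a global as easily as two functions, but must synchronize access to it (the names here are illustrative):

```python
import threading

counter = 0                  # global data shared by both threads
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:           # without the lock, updates could be lost
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is exactly 20_000 only because the lock serializes the updates
```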

10
11
Threads – Revisited
• Uses of multithreading:
– To allow a process to do I/O and computations at the
“same” time: one thread blocks to wait for input,
others can continue to execute. Two threads can do
two separate, but related tasks.
– To allow separate threads in a program to be
distributed across several processors in a shared
memory multiprocessor
– To allow a large application to be structured as
cooperating threads, rather than cooperating
processes (avoiding excess context switches &
process creation overhead)
– Simplify program development (divide-and-conquer)

12
Threads
• POSIX (pThreads): library functions to
manage threads
[Link]
[Link]#CREATIONTERMINATION
• Thread discussion:
[Link]
• Java Thread Tutorial:
[Link]
rency/[Link]

13
Thread Implementations
• Kernel-level
– Support multiprocessing
– Independently schedulable by OS
– Process can still run if one thread blocks on a system
call.
• User-level
– Less overhead than k-level; faster execution of the
entire process
– Not visible to the kernel
• Light weight processes (LWP)
– Hybrid; Example: in Sun’s Solaris OS
• Scheduler activations
– Research based
14
Basic Thread Types
for User Processes
Kernel-Level Threads
• The kernel creates & thus is aware of KLT threads and schedules them independently as if they were processes.
• One thread in a process may block for I/O, or some other event, while other threads in the process continue to run.
• Most of the previously mentioned uses of threads assume this kind.
• A kernel-level thread in a user process does not have kernel privileges; i.e., it cannot execute in kernel mode.

User-Level Threads
• User-level threads are created by calling functions in a user-level library rather than by the operating system.
• Thus the OS sees a process with user-level threads as a single-threaded process, so there is no way to distribute the threads on a multiprocessor or block one thread of the process.
• The advantage: they are even more efficient than kernel threads – no mode switches are involved in thread creation or switching.

15
Implementing KLT and ULT
• Remember
– The kernel creates kernel-level threads and is
responsible for scheduling them.
– User-level threads are created by functions in a user-
level library;
• Comparisons:
– mode switches (KLT) versus procedure call (ULT)
– OS scheduling (KLT) vs application scheduling (ULT)
– Multiple separately executable parts (KLT) vs one (ULT)
• Hybrid threads try to get the best of both worlds:
speed, flexible scheduling plus kernel access.
16
Hybrid Threads – Lightweight Processes (LWP)

• LWP is similar to a kernel-level thread:
– It runs in the context of a regular process
– The process can have several LWPs created
by the kernel in response to a system call.
• User level threads are created by calls to
the user-level thread package.
• The OS schedules an LWP; the user level
thread scheduler chooses a thread to run.
17
Hybrid threads – LWP
• Thread synchronization and context
switching are done at the user level; LWP
is not involved and continues to run.
• If a thread makes a blocking system call
control passes to the OS (mode switch)
– The OS can schedule another LWP or let the
existing LWP continue to execute, in which
case it will look for another thread to run.

18
Hybrid threads – LWP
• Solaris OS was designed to use a variation of
the hybrid approach.
– Processes have LWPs (scheduled by OS) and user-level
threads (1 per LWP, scheduled by the process)
– The OS knows about the LWPs, does not know about the
threads.
• Solaris: developed by Sun Microsystems to run on its hardware but is now owned by Oracle
– Originally a proprietary UNIX system
– Later Sun and AT&T unified several versions of UNIX as UNIX SVR4

19
• Advantages of the hybrid approach
– Most thread operations (schedule, destroy,
synchronize) are done at the user level
– Blocking system calls need not block the
whole process
– Applications only deal with user-level threads
– LWPs can be scheduled in parallel on the
separate processing elements of a
multiprocessor.

20
Scheduler Activations
• Another approach to combining benefits of u-level and k-
level threads
• When a thread blocks on a system call, the kernel
executes an upcall to a thread scheduler in user space
which selects another runnable thread
• Violates the principles of layered software
– (communication originates at the user level and proceeds down
to the lower-levels)

21
THREADS IN DISTRIBUTED SYSTEMS

22
Threads in Distributed Systems
• Threads gain much of their power by sharing an address
space
– But … no sharing across nodes in distributed systems
• However, multithreading can be used to improve the
performance of individual nodes in a distributed system.

23
Multithreaded Clients
• Main advantage: hide network latency
– Addresses problems such as delays in downloading documents from web
servers in a WAN
• Hide latency by starting several threads
– One to download text (display as it arrives)
– Others to download photographs, figures, etc.
• Browser displays results as they arrive.

24
Multithreaded Clients
• Even better: if servers are replicated, the multiple threads
may be sent to separate sites.
• Result: data can be downloaded in several parallel
streams, improving performance even more.
• Designate a thread in the client to handle and display
each incoming data stream.
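The parallel-stream idea can be sketched with a thread pool; `fetch` below is a hypothetical stand-in for a real download, using a sleep to model network latency:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(replica):
    """Hypothetical download from one replicated server site."""
    time.sleep(0.1)                      # model 100 ms of network latency
    return f"data from {replica}"

replicas = ["site-a", "site-b", "site-c"]
start = time.monotonic()
with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
    results = list(pool.map(fetch, replicas))
elapsed = time.monotonic() - start
# The three 100 ms "downloads" overlap, so total time is ~0.1 s, not 0.3 s.
```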

25
Multithreaded Servers
• Improve performance, provide better
structuring
• Consider what a file server does:
– Wait for a request
– Execute request (may need blocking I/O to access data)
– Send reply to client
• Several models for programming the server
– Single threaded
– Multi-threaded
– Finite-state machine
26
Threads in Distributed Systems - Servers
• A single-threaded (iterative) server processes
one request at a time – other requests must
wait.
– Possible solution: create (fork) a new server process
for a new request.
• This approach creates performance problems
(servers must share information across process
borders)
• Instead, create a new server thread.
– Faster, because threads share data structures directly; switching
between threads avoids full process context switches.

27
Multithreaded Servers

A multithreaded server organized in a dispatcher/worker model.

28
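A minimal sketch of the dispatcher/worker organization, with a queue standing in for the network and `None` as an assumed shutdown sentinel:

```python
import queue
import threading

requests = queue.Queue()     # dispatcher hands requests to workers here
results = []
results_lock = threading.Lock()

def worker():
    """Worker thread: block waiting for a request, then service it."""
    while True:
        req = requests.get()
        if req is None:      # shutdown sentinel from the dispatcher
            return
        with results_lock:
            results.append(f"handled {req}")

# Dispatcher: start a pool of workers, then route incoming requests.
pool = [threading.Thread(target=worker) for _ in range(3)]
for t in pool:
    t.start()
for i in range(6):
    requests.put(f"request-{i}")
for _ in pool:               # one sentinel per worker
    requests.put(None)
for t in pool:
    t.join()
```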
Finite-state machine
• The file server is single threaded but doesn’t block for I/O
operations
• Instead, save state of current request, switch to a new task –
client request or disk reply.
• Outline of operation:
– Get request, process until blocking I/O is needed
– Save state of current request, start I/O, get next task
– If task = completed I/O, resume process waiting on that I/O using saved
state, else service a new request if there is one.
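The save-state/resume cycle above can be sketched as a toy event loop; the event names and state table are illustrative, not from the slides:

```python
# Single-threaded finite-state machine server: never blocks on I/O.
pending = {}      # request id -> state saved when I/O was started
completed = []

def handle_event(event):
    kind, rid = event
    if kind == "request":
        # Process until blocking I/O would be needed, then save the
        # request's state and (conceptually) start the I/O.
        pending[rid] = {"stage": "awaiting_io"}
    elif kind == "io_done":
        # I/O finished: resume the waiting request from its saved state.
        state = pending.pop(rid)
        completed.append((rid, state["stage"]))

# Interleaved arrivals: two client requests, then their disk replies.
for ev in [("request", 1), ("request", 2), ("io_done", 1), ("io_done", 2)]:
    handle_event(ev)
```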

29
Processes in a Distributed System

Clients, Servers, and Code Migration

30
Another Distributed System Definition
“Distributed computing systems are
networked computer systems in which the
different components of a software
application program run on different
computers on a network, but all of the
distributed components work cooperatively
as if all were running on the same
machine.”
[Link]%202000%20from%20mba%20server/frankie_gulledge/corba/corba_overview.htm

Client and server components run on different or the same machine (usually different).
31
Client Side Software
• Manages user interface
• Parts of the processing and data (maybe)
• Support for distribution transparency
– Access transparency: RPC client side stubs hide
communication and hardware details.
– Location, migration, and relocation transparency rely
on naming systems, among other techniques
– Failure transparency (e.g., client middleware can
make multiple attempts to connect to a server)

32
Client-Side Software for Replication
Transparency
Transparent replication of a server using a client-side solution.

Here, the client application is shielded from replication issues by client-side software
that takes a single request and turns it into multiple requests, then collects the multiple
responses and turns them into a single response.
33
More About Servers
• Processes that implement a service for a collection of
clients
– Passive: servers wait until a request arrives
• Server Design:
– Iterative servers: handle one request at a time, return the
response to the client
– Concurrent servers: act as a central receiving point
• Multithreaded servers versus forking a new process

34
Stateful versus Stateless
• Some servers keep no information about clients
(Stateless)
– Example: a web server which honors HTTP requests doesn’t
need to remember which clients have contacted it.
• Stateful servers retain information about clients and their
current state, e.g., updating file X.
– Loss of server state may lead to permanent loss of information.

35
Server Clusters
• A server cluster is a collection of machines, connected
through a network, where each machine runs one or more
services.
• Often clustered on a LAN
• Three tiered structure is common
– Client requests are routed to one of the servers through a front-
end switch
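The front-end switch's routing can be as simple as round-robin; a sketch (the server names are made up):

```python
import itertools

# Tier-1 switch: rotate client requests across the tier-2 servers.
servers = ["app-1", "app-2", "app-3"]
next_server = itertools.cycle(servers)

# Six incoming requests get spread evenly across the three servers.
assigned = [next(next_server) for _ in range(6)]
```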

36
Server Clusters (1)
• The general organization of a three-tiered server cluster.

37
Three tiered server cluster
• Tier 1: the switch (access/replication transparency)
• Tier 2: the servers
– Some server clusters may need special compute-intensive machines in
this tier to process data
• Tier 3: data-processing servers, e.g. file servers and database
servers
– For other applications, the major part of the workload may be here

38
Server Clusters
• In some clusters, all server machines run the same
services
– May benefit from load balancing
• In others, different machines provide different services
– May benefit from server migration to idle machines
• Server clusters are one proposed use for virtual machines

39
Code Migration: Overview
• Instead of distributed system communication based
on passing data, why not pass code instead?
– Load balancing
– Reduce communication overhead
– Parallelism; e.g., mobile agents for web searches
– Flexibility – configure system architectures dynamically
• Code migration v process migration
– Process migration may require moving the entire process
state; can the overhead be justified?
– Early DS’s focused on process migration & tried to
provide it transparently (e.g., NOW & Condor)
40
Client-Server Examples
• Example 1: (Send Client code to Server)
– Server manages a huge database. If a client application needs
to perform many database operations (e.g., for data mining),
it may be better to ship part of the client application to the
server and send only the results across the network.
• Example 2: (Send Server code to Client)
– In many interactive DB applications, clients need to fill in
forms that are subsequently translated into a series of
DB operations. Reduce network traffic, improve service.

41
Examples
• Mobile agents: independent code modules that
can migrate from node to node in a network and
interact with local hosts; e.g. to conduct a search
at several sites in parallel
• Dynamic configuration of DS: Instead of pre-
installing client-side software to support remote
server access, download it dynamically from the
server when it is needed.
• Load balancing can be done with either code or
process migration.

42
Code Migration

The principle of dynamically configuring a client to communicate with
a server. The client first fetches the necessary software, and then
invokes the server. 43
Issues for Code Migration
• Heterogeneous systems: Target system has
different architecture, different OS
– Code must be recompiled, re-hosted to new OS.

• Do resources migrate?
– Files, databases, printers, …

• Security
– For host machine
– For migrated code

44
A Model for Code Migration (1)
as described in Fuggetta et al., 1998

• Three components of a process:
– Code segment: the executable instructions
– Resource segment: references to external resources (files,
printers, other processes, etc.)
– Execution segment: contains the current state
• Private data, stack, program counter, other registers, etc. – same data
that is saved during a context switch.
• How to handle virtual memory issues?

45
A Model for Code Migration (2)
• Weak mobility: transfer the code segment and possibly some
initialization data. (a form of code migration)
– Process can only migrate before it begins to run, or perhaps at a few
intermediate points.
– Requirements: portable code
– Example: Java applets
• Strong mobility: transfer code segment and execution segment.
(Process migration)
– Processes can migrate after they have already started to execute - more
difficult
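Weak mobility can be sketched by shipping a code segment as source text that the receiver installs and runs from the beginning (here the "migration" is just a string; running untrusted code this way is exactly the security risk discussed later):

```python
# The "migrated" code segment: source text only, no execution state.
migrated_code = """
def task(data):
    return sum(data)
"""

# Receiver side: install the shipped code, then start it fresh.
scope = {}
exec(migrated_code, scope)
result = scope["task"]([1, 2, 3])
```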

46
A Model for Code Migration (3)
• Sender-initiated: initiated at the “home” of the
migrating code
– e.g., upload code to a compute server; launch a
mobile agent, send code to a DB
• Receiver-initiated: host machine downloads
code to be executed locally
– e.g., applets, download client code, etc.
• If used for load balancing, sender-initiated
migration lets busy sites send work elsewhere;
receiver-initiated lets idle machines volunteer to
assume excess work.

47
Security in Code Migration
• Code executing remotely may have access to
remote host’s resources, so it should be trusted.
– For example, code uploaded to a server shouldn’t be
able to corrupt its disk

48
Resource Migration*
• Resources are bound to processes
– By identifier: resource reference that identifies a
particular object; e.g. a URL, an IP address, local port
numbers.
– By value: reference to a resource that can be replaced by
another resource with the same “value”, for example, a
standard library.
– By type: reference to a resource by a type; e.g., a printer
or a monitor
• Code migration must not change (weaken) the way
processes are bound to resources.

* - May be omitted 49
Resource Migration*
• How resources are bound to machines:
– Unattached: easy to move; my own files
– Fastened: harder/more expensive to move; a
large DB or a Web site
– Fixed: can’t be moved; local devices
• Global references: meaningful across the
system
– Rather than move fastened or fixed resources,
try to establish a global reference

50
Resource Migration in a Cluster*
• Migrating local resource bindings is simplified in this
example because we assume all machines are located on
the same LAN.
– “Announce” new address to clients
– If data storage is located in a third tier, migration of file bindings
is trivial.

51
Migration and Local Resources*

Actions to be taken with respect to the references to local resources
when migrating code to another machine.
52
Migration in Heterogeneous Systems
• Different computers, different operating systems –
migrated object code is not compatible
• Can be addressed by providing process virtual machines
(e.g., JVM):
– Directly interpret the migrated source code at the host site (as
with scripting languages)
– Interpret intermediate code (object code for a virtual computer)
generated by a compiler.

53