Spark Architecture and Deploy Modes

Spark study notes: core concepts visualized

Learning Spark is not easy for someone with little background knowledge of distributed systems. Even though I have been using Spark for quite some time, I still find it time-consuming to get a comprehensive grasp of all the core concepts. The official Spark documentation provides a very detailed explanation, yet it focuses more on the practical programming side, and the tons of online tutorials can be overwhelming to a beginner. Therefore, in this article I would like to note down those Spark core concepts, but in a more visual way. I hope you will find it useful as well!

Note: you probably already have some knowledge of Hadoop, so I will skip explanations of basic things such as nodes and clusters.

Spark architecture and deploy modes


To put it simply, Spark runs on a master-worker architecture, a typical parallel task computing model. When running Spark, there are a few modes we can choose from: local (master, executor and driver all run in the same single JVM), standalone, YARN and Mesos. Here we only talk about Spark on YARN and the difference between YARN client and YARN cluster mode, since these are the most commonly used, yet easily confused, setups. The two pictures below illustrate the setup for both modes. They look quite similar, don't they? However, the orange highlighted part reveals the one minor difference: the location of the Spark driver program. This is basically the only difference between the two modes.

Fig 1. Spark deployment mode YARN-client (left) and YARN-cluster (right)

Suppose you've written a Spark application called spark_hello_world.py. In client mode, when executing the Python file using spark-submit, the driver is launched directly within the spark-submit process, hence it resides on the same machine as spark_hello_world.py. When the Spark context is initialized, the driver on that local machine connects to the application master in the cluster, and starting from the application master, Spark launches the executors.

In cluster mode, the spark_hello_world.py code lives on the client machine, which is outside of the cluster. When the application Python code is executed, a driver program is launched on one of the nodes in the cluster. Together with the Spark application master, it can launch executors and issue application commands.

Given that the two setups do not differ much, you must be wondering why we need two different modes. In practice, this relates to whether the client machine is physically co-located with the worker machines or not. If the client machine is "far" from the worker nodes, e.g. you write spark_hello_world.py on your laptop but the workers are AWS EC2 instances, then it makes sense to use cluster mode, so as to minimize network latency between the driver and the executors. In the other scenario, if your Python file sits on a gateway machine quite "close" to the worker nodes, client mode is a good choice.
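The deploy mode is normally chosen when submitting the application. As a minimal sketch in Scala (assuming YARN as the cluster manager, and using the standard spark.submit.deployMode configuration key, which mirrors the --deploy-mode flag of spark-submit), it could look like this:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("spark_hello_world")
  .setMaster("yarn")
  .set("spark.submit.deployMode", "cluster") // or "client" when running close to the workers
val sc = new SparkContext(conf)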
Executors
Now that we understand the Spark cluster setup, let's zoom in on one of the most important elements in Spark: the executor. Executors are the processes that run tasks and keep data in memory or on disk across them.

When going through the Spark documentation you might be surprised at the number of configurable parameters related to executors. Instead of trying hard to figure out the relation between several parameters in one's head again and again, let's look at it visually.
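As a small sketch of what these parameters look like in practice, the following sets a handful of the standard executor configuration keys (the values are purely illustrative):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.instances", "4")         // number of executors to request
  .set("spark.executor.cores", "4")             // task slots per executor
  .set("spark.executor.memory", "4g")           // on-heap memory per executor
  .set("spark.executor.memoryOverhead", "400m") // off-heap overhead per executor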

Fig 2. Spark executor internals

As shown in Figure 2, in each executor there is an executor JVM, storing the RDD partitions, cached RDD partitions, internal threads and running tasks. If there are more cores than required by the tasks, there will also be free cores in the JVM. This green executor JVM block will be our starting point for looking at memory management in executors.
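As a minimal sketch of how RDD partitions end up cached inside the executor JVMs (assuming an existing SparkContext sc; the HDFS path is a hypothetical placeholder):

import org.apache.spark.storage.StorageLevel

val data = sc.textFile("hdfs:///some/input")        // hypothetical input path
val cached = data.persist(StorageLevel.MEMORY_ONLY)
cached.count() // first action computes the partitions and caches them in the executors
cached.count() // later actions reuse the cached partitions instead of re-reading the file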
Executor memory management
In the executor container, there are mainly two blocks of
memory allocated: memory overhead and executor
memory.

Memory overhead is reserved off-heap memory for things like VM overheads, interned strings and other native overheads. By keeping cached data outside the main Java heap space, but still in RAM, off-heap memory allows the cache to avoid lengthy JVM garbage collection pauses when working with large heap sizes.

Executor memory consists of three parts as follows.

 Reserved memory: set aside for Spark's internal objects.
 User memory: for storing things such as user data structures and internal metadata in Spark.
 Storage and execution memory: for storing all the RDD partitions and allocating run-time memory for tasks.

Figure 3 shows the relevant parameters for each memory block. Suppose we set spark.executor.memory to 4 GB; then Spark will request 4.4 GB of memory in total from the resource manager. Out of the 4 GB executor memory, we actually get 3.7 GB because the rest is reserved. By default, we get 2.2 GB (0.6 * 3.7) as execution + storage memory, of which 1.1 GB is used for storage, such as storing RDDs, and the rest is execution memory.
Fig 3. Spark executor memory decomposition
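As a back-of-the-envelope check of these numbers (assuming the defaults: 10% memory overhead with a 384 MB minimum, 300 MB reserved memory, spark.memory.fraction = 0.6 and spark.memory.storageFraction = 0.5), the arithmetic works out like this:

val executorMemoryGb = 4.0
val overheadGb       = math.max(0.384, 0.10 * executorMemoryGb) // ~0.4 GB off-heap overhead
val totalRequestGb   = executorMemoryGb + overheadGb            // ~4.4 GB requested from YARN
val usableGb         = executorMemoryGb - 0.3                   // ~3.7 GB after reserved memory
val unifiedGb        = 0.6 * usableGb                           // ~2.2 GB storage + execution
val storageGb        = 0.5 * unifiedGb                          // ~1.1 GB for storage by default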

RDD, jobs, stages and tasks


If you have already started debugging Spark applications using the Spark UI, then keywords like jobs, stages and tasks probably sound familiar. So how are they related to RDDs?

We know that there are two kinds of operations on RDDs: transformations (e.g. filter, union, distinct, intersection), by which a new RDD is produced from the existing one without actual execution, and actions (e.g. take, show, collect, foreach), which trigger the execution. When transforming an RDD, based on the relationship between the parent RDD and the transformed RDD, the dependency can be narrow or wide. With a narrow dependency, one or more partitions in the parent RDD map to exactly one partition in the new RDD, while with a wide dependency, such as in a join or sortBy, partitions need to be shuffled in order to compute the new RDD.
Fig 4–1. Narrow dependency in RDD transformation

Fig 4–2. Wide dependency in RDD transformation
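As a small sketch contrasting the two dependency types (assuming an existing SparkContext sc):

val pairs = sc.parallelize(1 to 100).map(x => (x % 10, x)) // map: narrow, no shuffle needed
val even  = pairs.filter { case (k, _) => k % 2 == 0 }     // filter: narrow, same stage

val summed = even.reduceByKey(_ + _) // wide: partitions must be shuffled by key
val joined = summed.join(pairs)      // join: wide, another shuffle boundary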

Jobs, stages and tasks are therefore determined by the type of operation and the type of transformation. A job is created whenever an action is called on an RDD. Within a job there can be multiple stages, depending on whether we need to perform a wide transformation (i.e. a shuffle). Each stage can contain one or multiple transformations, mapped to tasks in each executor.

Fig 5. Illustration of one Spark job

To understand this practically, let's look at the following simple code snippet.

val RDD1 = sc.parallelize(Array("1", "2", "3", "4", "5")).map { x => val xi = x.toInt; (xi, xi + 1) }
val RDD2 = sc.parallelize(Array("1", "2", "3", "4", "5")).map { x => val xi = x.toInt; (xi, xi * 10) }
val joinedData = RDD1.join(RDD2)
val filteredRDD = joinedData.filter { case (k, v) => k % 2 == 0 }
val resultRDD = filteredRDD.mapPartitions { iter => iter.map { case (k, (v1, v2)) => (k, v1 + v2) } }
resultRDD.take(2)

There are a few operations in this code, i.e. map, join, filter, mapPartitions and take. When creating the RDDs, Spark generates two separate stages for RDD1 and RDD2, shown as stage 0 and stage 1. Since map involves only a narrow dependency, the mapped RDDs are also included in stages 0 and 1 respectively. Then we join RDD1 and RDD2; because join is a wide transformation involving a shuffle, Spark creates another stage for this operation. Afterwards, filter and mapPartitions are again narrow transformations within stage 2, and by calling take (which is an action) we trigger Spark's execution.
Fig 6. DAG visualization
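A quick way to inspect these shuffle boundaries yourself, besides the Spark UI, is to print an RDD's lineage with toDebugString; the indentation marks the shuffle dependencies (here reusing resultRDD from the snippet above):

println(resultRDD.toDebugString)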

So, that is all the basic stuff about Spark. I hope these concepts are clearer to you after reading this article. Happy learning!



Running Spark Jobs on YARN


[Link]

When running Spark on YARN, each Spark executor runs as a YARN container. Where MapReduce schedules a container and fires up a JVM for each task, Spark hosts multiple tasks within the same container. This approach enables several orders of magnitude faster task startup time.

Spark supports two modes for running on YARN: "yarn-cluster" mode and "yarn-client" mode. Broadly, yarn-cluster mode makes sense for production jobs, while yarn-client mode makes sense for interactive and debugging uses where you want to see your application's output immediately.
Understanding the difference requires an understanding of YARN's Application Master concept. In YARN, each application instance has an Application Master process, which is the first container started for that application. The Application Master is responsible for requesting resources from the ResourceManager and, when allocated them, telling NodeManagers to start containers on its behalf. Application Masters obviate the need for an active client: the process starting the application can go away, and coordination continues from a process managed by YARN running on the cluster.

In yarn-cluster mode, the driver runs in the Application Master. This means that the same process is responsible for both driving the application and requesting resources from YARN, and this process runs inside a YARN container. The client that starts the app doesn't need to stick around for its entire lifetime.
yarn cluster mode

The yarn-cluster mode, however, is not well suited to using Spark interactively. Spark applications that require user input, like spark-shell and PySpark, need the Spark driver to run inside the client process that initiates the Spark application. In yarn-client mode, the Application Master is merely present to request executor containers from YARN. The client communicates with those containers to schedule work after they start:

Yarn Client Mode


Different Deployment Modes across the cluster

In YARN cluster mode, the Spark client submits the Spark application to YARN, and both the Spark driver and the Spark executors are under the supervision of YARN. In YARN client mode, only the Spark executors are under the supervision of YARN; the YARN ApplicationMaster requests resources just for the executors. The driver program runs in the client process, which has nothing to do with YARN.

Spark Architecture and Deployment Environment
[Link]

A Spark application consists of a driver, which runs either on the client or on the application master node, and many executors, which run across the worker (slave) nodes in the cluster.

An application can be used for a single batch job, an interactive session with multiple jobs spaced apart, or a long-lived server continually satisfying requests. Unlike MapReduce, an application will have processes, called Executors, running on the cluster on its behalf even when it's not running any jobs. This approach enables data storage in memory for quick access, as well as lightning-fast task startup time.

Job of Spark Driver

The driver is responsible for creating the SparkContext, building the DAG, breaking a job into stages and tasks, and scheduling the tasks. It defines the transformations and actions applied to the data set.

At its core, the driver has instantiated an object of the SparkContext class. This object allows the driver to acquire a connection to the cluster, request resources, split the application actions into tasks, and schedule and launch tasks in the executors.
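As a minimal sketch of what this looks like in code (the application name, numbers and operations are illustrative only): the driver creates the SparkContext, and each action it calls is split into tasks that run on the executors.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("driver-demo"))

val counts = sc.parallelize(1 to 1000, numSlices = 8) // 8 partitions -> 8 tasks per stage
  .map(_ % 10)                                        // transformation, planned by the driver
  .countByValue()                                     // action: the driver schedules tasks and gathers the results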

The driver first asks the application master to allocate resources for the containers on the worker/slave nodes and create the executor processes.

Once the executors are created, the driver coordinates directly with the worker nodes and assigns tasks to them.
Job of Executor

An executor is a JVM process responsible for executing tasks; many tasks can run in parallel within a single executor.

MapReduce runs each task in its own process. When a task completes, the process goes away. In Spark, many tasks can run concurrently in a single process, and this process sticks around for the lifetime of the Spark application, even when no jobs are running.

The advantage of this model, as mentioned above, is speed: tasks can start up very quickly and process in-memory data. The disadvantage is coarser-grained resource management. As the number of executors for an app is fixed and each executor has a fixed allotment of resources, an app takes up the same amount of resources for the full duration that it's running. (When YARN supports container resizing, we plan to take advantage of it in Spark to acquire and give back resources dynamically.)

Cluster Deployment

The SparkContext can connect to several types of cluster managers (either Spark's own standalone cluster manager, Mesos or YARN), which allocate resources for the containers in which the Spark executors run.

Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks to the executors to run.

Submitting Application (spark-submit)


The spark-submit script in Spark’s bin directory is used to
launch applications on a cluster. It can use all of Spark’s
supported cluster managers through a uniform interface
so you don’t have to configure your application especially
for each one.
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

 --deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)

If your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the driver and the executors.

Cluster Manager

Spark Standalone cluster

Standalone Master is the resource manager for the Spark Standalone cluster.

Standalone Worker (aka standalone slave) is the worker in the Spark Standalone cluster.

Standalone cluster mode is subject to the constraint that only one executor can be allocated on each worker per application.

A client first connects to the standalone master and asks it for resources; the executor processes are then started on the worker nodes, and the driver runs either on the client node itself or on one of the worker nodes.

Here the client acts as the application master, which is responsible for requesting resources from the resource manager (the standalone master).
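As a minimal sketch of pointing an application at a standalone master (master-host:7077 is a placeholder for your master's host and port):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("standalone-demo")
  .setMaster("spark://master-host:7077") // standalone master URL
val sc = new SparkContext(conf)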

Hadoop Yarn

YARN (Yet Another Resource Negotiator) is the resource management layer for the Apache Hadoop ecosystem.

YARN is a software rewrite that decouples MapReduce's resource management and scheduling capabilities from the data processing component, enabling Hadoop to support more varied processing approaches and a broader array of applications.

The Application Master oversees the full lifecycle of an application, all the way from requesting the needed containers from the ResourceManager to submitting container lease requests to the NodeManagers. Each application framework written for Hadoop must have its own Application Master implementation, and Spark has its own ApplicationMaster implementation as well.

Yarn Architecture

YARN vs Spark Standalone cluster

 YARN allows you to dynamically share and centrally configure the same pool of cluster resources between all frameworks that run on YARN. You can throw your entire cluster at a MapReduce job, then use some of it on an Impala query and the rest on a Spark application, without any changes in configuration.
 Spark standalone mode requires each application to run an executor on every node in the cluster, whereas with YARN, you choose the number of executors to use.
