GPU Computing With Spark and Python
Afif A. Iskandar
(AI Research Engineer)
My Bio
Afif A. Iskandar
Artificial Intelligence Research Engineer & Educator
AI Enthusiast
Bachelor's Degree in Mathematics @ Universitas Indonesia
Master's Degree in Computer Science @ Universitas Indonesia
Overview
● Why Python?
● Numba: Python JIT Compiler for CPU and GPU
● PySpark: Distributed Programming in Python
● Hands-On Tutorial
● Conclusion
Why Python?
Python is Fast
for writing, testing and developing code
Python is Fast
because it’s interpreted, dynamically typed, and high-level
Python is Slow
for repeated execution of low-level tasks
Python is Slow, Because
(Figure from: https://databricks.com/blog/2015/06/22/understanding-your-spark-application-through-visualization.html)
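To make the point concrete, here is a minimal sketch (not from the slides) contrasting an interpreted Python loop with the same reduction done in compiled NumPy code; the function name and data are illustrative.

import numpy as np

def sum_of_squares(values):
    # Pure-Python loop: every iteration pays interpreter dispatch,
    # dynamic type checks, and object boxing.
    total = 0.0
    for v in values:
        total += v * v
    return total

data = np.random.rand(1_000_000)

slow = sum_of_squares(data)       # interpreted, element by element
fast = float(np.dot(data, data))  # the same reduction in compiled code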
How Does Spark Scale?
● All cluster scaling is about minimizing I/O. Spark does this in several ways:
○ Keep intermediate results in memory with rdd.cache()
○ Move computation to the data whenever possible (functions are small and data is big!)
○ Provide computation primitives that expose parallelism and minimize communication between workers: map, filter, sample, reduce, … (see the sketch below)
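A minimal PySpark sketch (not from the slides) of the primitives listed above: cache an intermediate RDD, then use map, filter, and reduce so small functions travel to the data and only small results come back to the driver. The app name and data are illustrative.

from pyspark import SparkContext

sc = SparkContext(appName="scaling-sketch")

numbers = sc.parallelize(range(1_000_000))

# Keep the intermediate RDD in memory so later actions reuse it.
squares = numbers.map(lambda x: x * x).cache()

# Small functions are shipped to the workers; only scalars return.
even_count = squares.filter(lambda x: x % 2 == 0).count()
total = squares.reduce(lambda a, b: a + b)

print(even_count, total)
sc.stop()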
Python and Spark
● Numba lets you create compiled CPU and CUDA functions right inside your Python applications.
● Numba can be used with Spark to easily distribute and run your code on Spark workers with GPUs.
● There is room for improvement in how Spark interacts with the GPU, but things do work.
● Beware of accidentally multiplying fixed initialization and compilation costs (see the sketch below).
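A hedged sketch of how Numba-compiled GPU code might be combined with PySpark, assuming each worker has a CUDA-capable GPU and Numba installed. Compiling the vectorized function inside mapPartitions keeps the fixed compilation cost to once per partition rather than once per element; the names and sizes here are illustrative, not from the slides.

from pyspark import SparkContext

def gpu_add_partition(iterator):
    # Import and compile inside the function so the CUDA ufunc is built on
    # the worker. Compilation happens once per partition, so keep partitions
    # large enough that this fixed cost is amortized.
    # Assumes the worker has a CUDA-capable GPU and Numba installed.
    import numpy as np
    from numba import vectorize

    @vectorize(['float32(float32, float32)'], target='cuda')
    def gpu_add(a, b):
        return a + b

    chunk = np.fromiter(iterator, dtype=np.float32)
    return gpu_add(chunk, chunk).tolist()

sc = SparkContext(appName="numba-gpu-sketch")
data = sc.parallelize([float(i) for i in range(1_000_000)], numSlices=8)
doubled = data.mapPartitions(gpu_add_partition)
print(doubled.take(5))
sc.stop()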
Thank You