GPU Computing With Spark and Python
Afif A. Iskandar
(AI Research Engineer)
My Bio
Afif A. Iskandar
Artificial Intelligence Research Engineer & Educator
AI Enthusiast
Bachelor's Degree in Mathematics @ Universitas Indonesia
Master's Degree in Computer Science @ Universitas Indonesia
Overview
● Why Python?
● Numba: Python JIT Compiler for CPU and GPU
● PySpark: Distributed Programming in Python
● Hands-On Tutorial
● Conclusion
Why Python?
Python is Fast
for writing, testing and developing code
Python is Fast
because it’s interpreted, dynamically typed, and high-level
Python is Slow
for repeated execution of low-level tasks
Python is Slow, Because
(Figure from: https://databricks.com/blog/2015/06/22/understanding-your-spark-application-through-visualization.html)
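To make the point concrete, here is a minimal sketch (not from the slides) contrasting an interpreted Python loop with the same reduction done in compiled NumPy code; the function name and data are illustrative.

import numpy as np

def sum_of_squares(values):
    # Pure-Python loop: every iteration pays interpreter dispatch,
    # dynamic type checks, and object boxing.
    total = 0.0
    for v in values:
        total += v * v
    return total

data = np.random.rand(1_000_000)

slow = sum_of_squares(data)       # interpreted, element by element
fast = float(np.dot(data, data))  # the same reduction in compiled code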
How Does Spark Scale?
● All cluster scaling is about minimizing I/O. Spark does this in several ways:
○ Keep intermediate results in memory with rdd.cache()
○ Move computation to the data whenever possible (functions are small and data is big!)
○ Provide computation primitives that expose parallelism and minimize communication between workers: map, filter, sample, reduce, … (see the sketch below)
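A minimal PySpark sketch (not from the slides) of the primitives listed above: cache an intermediate RDD, then use map, filter, and reduce so small functions travel to the data and only small results come back to the driver. The app name and data are illustrative.

from pyspark import SparkContext

sc = SparkContext(appName="scaling-sketch")

numbers = sc.parallelize(range(1_000_000))

# Keep the intermediate RDD in memory so later actions reuse it.
squares = numbers.map(lambda x: x * x).cache()

# Small functions are shipped to the workers; only scalars return.
even_count = squares.filter(lambda x: x % 2 == 0).count()
total = squares.reduce(lambda a, b: a + b)

print(even_count, total)
sc.stop()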
Python and Spark
● Numba lets you create compiled CPU and CUDA functions right inside your Python applications.
● Numba can be used with Spark to easily distribute and run your code on Spark workers with GPUs.
● There is room for improvement in how Spark interacts with the GPU, but things do work.
● Beware of accidentally multiplying fixed initialization and compilation costs (see the sketch below).
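A hedged sketch of how Numba-compiled GPU code might be combined with PySpark, assuming each worker has a CUDA-capable GPU and Numba installed. Compiling the vectorized function inside mapPartitions keeps the fixed compilation cost to once per partition rather than once per element; the names and sizes here are illustrative, not from the slides.

from pyspark import SparkContext

def gpu_add_partition(iterator):
    # Import and compile inside the function so the CUDA ufunc is built on
    # the worker. Compilation happens once per partition, so keep partitions
    # large enough that this fixed cost is amortized.
    # Assumes the worker has a CUDA-capable GPU and Numba installed.
    import numpy as np
    from numba import vectorize

    @vectorize(['float32(float32, float32)'], target='cuda')
    def gpu_add(a, b):
        return a + b

    chunk = np.fromiter(iterator, dtype=np.float32)
    return gpu_add(chunk, chunk).tolist()

sc = SparkContext(appName="numba-gpu-sketch")
data = sc.parallelize([float(i) for i in range(1_000_000)], numSlices=8)
doubled = data.mapPartitions(gpu_add_partition)
print(doubled.take(5))
sc.stop()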
Thank You