andyogah/pyspark-tutorial

This branch is 2 commits behind mahmoudparsian/pyspark-tutorial:master.

Latest commit: f134a4d · Jan 20, 2023

PySpark Tutorial

  • PySpark is the Python API for Spark.

  • The purpose of this PySpark tutorial is to provide basic distributed algorithms using PySpark.

  • PySpark supports two types of data abstractions:

    • RDDs
    • DataFrames
  • PySpark Interactive Mode: PySpark has an interactive shell ($SPARK_HOME/bin/pyspark) for basic testing and debugging; it is not intended for production environments.

  • PySpark Batch Mode: you may use the $SPARK_HOME/bin/spark-submit command to run PySpark programs (suitable for both testing and production environments).
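
To make the two data abstractions and the two run modes above more concrete, here is a minimal word-count sketch that computes the same result once with an RDD and once with a DataFrame. The file name word_count_sketch.py, the helper names tokenize and main, and the input path are hypothetical and are not taken from this repository.

```python
"""word_count_sketch.py: a hypothetical file name, not part of this repository."""
import sys


def tokenize(line):
    """Split a line into lowercase, whitespace-separated words."""
    return line.lower().split()


def main(path):
    # Imported here so tokenize() stays importable on machines without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("word-count-sketch").getOrCreate()

    # RDD abstraction: explicit map/reduce over (word, 1) pairs.
    rdd_counts = (
        spark.sparkContext.textFile(path)
        .flatMap(tokenize)
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    for word, count in sorted(rdd_counts.collect()):
        print(word, count)

    # DataFrame abstraction: the same count expressed as column operations.
    # spark.read.text() yields one row per line in a column named "value".
    words = spark.read.text(path).select(
        F.explode(F.split(F.lower(F.col("value")), r"\s+")).alias("word")
    )
    words.filter(F.col("word") != "").groupBy("word").count().orderBy("word").show()

    spark.stop()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: spark-submit word_count_sketch.py <text-file>")
```

In batch mode, a script like this would be launched with $SPARK_HOME/bin/spark-submit word_count_sketch.py followed by the path to a text file. In interactive mode, the $SPARK_HOME/bin/pyspark shell already provides a session object named spark, so the body of main can be tried line by line without building a session first.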




PySpark Examples and Tutorials


Books


Miscellaneous


PySpark Tutorial and References...


Questions/Comments

Thank you!

best regards,
Mahmoud Parsian


Languages

  • Jupyter Notebook 79.0%
  • Python 18.5%
  • Shell 2.5%