Data Engineering Nanodegree Program Syllabus
Course Project: Data Modeling with Postgres
In this project, you’ll model user activity data for a music streaming
app called Sparkify. You’ll create a relational database and ETL
pipeline designed to optimize queries for understanding what songs
users are listening to. In PostgreSQL you will also define Fact and
Dimension tables and insert data into your new tables.
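As a rough illustration of the fact and dimension modeling described above, the sketch below builds a tiny star schema and runs one analytic query against it. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a database server. The table names (songplays, users, songs), the columns, and the sample rows are hypothetical and are not taken from the project specification.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables hold descriptive attributes (hypothetical schema).
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT)")
cur.execute("CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT, artist_name TEXT)")

# The fact table holds one row per song play and references the dimensions.
cur.execute("""
    CREATE TABLE songplays (
        songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
        start_time  TEXT NOT NULL,
        user_id     INTEGER NOT NULL REFERENCES users (user_id),
        song_id     TEXT NOT NULL REFERENCES songs (song_id)
    )
""")

# Made-up sample rows, for illustration only.
users = [(1, "Ada", "paid"), (2, "Lin", "free")]
songs = [("S1", "Blue Sky", "The Examples"), ("S2", "Night Drive", "Placeholder Band")]
plays = [
    ("2018-11-01T10:00:00", 1, "S1"),
    ("2018-11-01T10:05:00", 2, "S1"),
    ("2018-11-01T10:09:00", 1, "S2"),
]

cur.executemany("INSERT INTO users VALUES (?, ?, ?)", users)
cur.executemany("INSERT INTO songs VALUES (?, ?, ?)", songs)
cur.executemany(
    "INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)", plays
)
conn.commit()


def top_songs(connection):
    """Return (title, play_count) rows, most played first."""
    query = """
        SELECT s.title, COUNT(*) AS play_count
        FROM songplays AS sp
        JOIN songs AS s ON s.song_id = sp.song_id
        GROUP BY s.title
        ORDER BY play_count DESC, s.title ASC
    """
    return connection.execute(query).fetchall()


print(top_songs(conn))
```

The point of the layout is that a question such as "which songs are played most" needs only one join from the fact table to a single dimension.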
Course Project: Build a Cloud Data Warehouse
In this project, you are tasked with building an ELT pipeline that
extracts Sparkify's data from S3, stages it in Redshift, and transforms
it into a set of dimensional tables so that the analytics team can
continue finding insights into what songs users are listening to.
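The extract, stage, and transform flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: a list of JSON strings stands in for log files in S3, an in-memory SQLite database stands in for Redshift, and all table and field names (staging_events, dim_users, fact_songplays) are hypothetical.

```python
import json
import sqlite3

# Raw JSON lines standing in for log files stored in S3 (made-up records).
RAW_EVENTS = [
    '{"user_id": 1, "level": "paid", "song": "Blue Sky", "ts": 1541066400}',
    '{"user_id": 2, "level": "free", "song": "Blue Sky", "ts": 1541066700}',
    '{"user_id": 1, "level": "paid", "song": "Night Drive", "ts": 1541066940}',
]


def run_elt(raw_lines):
    """Load raw events into a staging table, then transform them with SQL."""
    # SQLite stands in for the warehouse here (assumption).
    conn = sqlite3.connect(":memory:")

    # Step 1: extract and load. Raw records go into staging unchanged.
    # In Redshift, a COPY command would typically perform this bulk load.
    conn.execute(
        "CREATE TABLE staging_events (user_id INTEGER, level TEXT, song TEXT, ts INTEGER)"
    )
    rows = [json.loads(line) for line in raw_lines]
    conn.executemany(
        "INSERT INTO staging_events VALUES (:user_id, :level, :song, :ts)", rows
    )

    # Step 2: transform inside the warehouse, using SQL over the staging table.
    conn.execute("CREATE TABLE dim_users (user_id INTEGER PRIMARY KEY, level TEXT)")
    conn.execute("INSERT INTO dim_users SELECT DISTINCT user_id, level FROM staging_events")
    conn.execute("CREATE TABLE fact_songplays (ts INTEGER, user_id INTEGER, song TEXT)")
    conn.execute("INSERT INTO fact_songplays SELECT ts, user_id, song FROM staging_events")
    conn.commit()
    return conn


warehouse = run_elt(RAW_EVENTS)
print(warehouse.execute("SELECT COUNT(*) FROM dim_users").fetchone()[0])
print(warehouse.execute("SELECT COUNT(*) FROM fact_songplays").fetchone()[0])
```

Loading before transforming keeps the raw data available in the warehouse, so the dimensional tables can be rebuilt with SQL alone if the model changes.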
Course Project: Build a Data Lake
In this project, you’ll build an ETL pipeline for a data lake. The data
resides in S3, in a directory of JSON logs on user activity on the app,
as well as a directory with JSON metadata on the songs in the app.
You will load data from S3, process the data into analytics tables
using Spark, and load them back into S3. You’ll deploy this Spark
process on a cluster using AWS.
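The read, process, and write-back shape of this pipeline can be sketched without a cluster. The example below is a simplified stand-in that uses only the Python standard library instead of Spark and local directories instead of S3. The field names (user_id, song, year, action) and the function process_logs are hypothetical. The output directories follow the key=value naming that Spark uses for partitioned writes.

```python
import csv
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def process_logs(input_dir, output_dir):
    """Read JSON-lines logs, keep song plays, and write one CSV per year."""
    partitions = defaultdict(list)

    # Extract: read every JSON log file in the input directory.
    for log_file in sorted(input_dir.glob("*.json")):
        for line in log_file.read_text().splitlines():
            if not line.strip():
                continue
            event = json.loads(line)
            # Transform: keep only play events and group them by year.
            if event["action"] == "play":
                partitions[event["year"]].append((event["user_id"], event["song"]))

    # Load: write each partition to its own year=<value> directory.
    written = []
    for year, rows in sorted(partitions.items()):
        part_dir = output_dir / f"year={year}"
        part_dir.mkdir(parents=True, exist_ok=True)
        out_file = part_dir / "part-0.csv"
        with out_file.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["user_id", "song"])
            writer.writerows(rows)
        written.append(out_file)
    return written


# Usage with made-up records in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    raw, lake = Path(tmp) / "raw", Path(tmp) / "lake"
    raw.mkdir()
    (raw / "events.json").write_text(
        '{"user_id": 1, "song": "Blue Sky", "year": 2018, "action": "play"}\n'
        '{"user_id": 2, "song": "", "year": 2018, "action": "login"}\n'
        '{"user_id": 1, "song": "Night Drive", "year": 2019, "action": "play"}\n'
    )
    files = process_logs(raw, lake)
    print([f.parent.name for f in files])
```

In the project itself, Spark would distribute the same three steps across a cluster. The sketch only shows the data flow on a single machine.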
LEARNING OUTCOMES
LESSON TWO: Data Wrangling with Spark
• Manipulate data with Spark SQL and Spark DataFrames
• Use Spark for ETL purposes
LESSON THREE: Debugging and Optimization
• Troubleshoot common errors and optimize code using the Spark Web UI
KNOWLEDGE
Find answers to your questions with Knowledge, our
proprietary wiki. Search questions asked by other students,
connect with technical mentors, and discover in real time
how to solve the challenges that you encounter.
STUDENT HUB
Leverage the power of community through a simple, yet
powerful chat interface built within the classroom. Use
Student Hub to connect with fellow students in your
program as you support and learn from each other.
WORKSPACES
See your code in action. Check the output and quality of
your code by running it in the workspaces that are part
of our classroom.
QUIZZES
Check your understanding of concepts learned in the
program by answering simple and auto-graded quizzes.
Easily go back to the lessons to brush up on concepts
anytime you get an answer wrong.
PROGRESS TRACKER
Stay on track to complete your Nanodegree program with
useful milestone reminders.
Sameh is the CEO of Novelari and a lecturer at Nile University and the
American University in Cairo (AUC), where he lectured on security,
distributed systems, software engineering, blockchain, and big data
engineering.

Olli works as a Data Engineer at Wolt. He has several years of
experience building and managing data pipelines in various data
warehousing environments and has been a fan and active user of Apache
Airflow since its first incarnations.
Juno Lee
CURRICULUM LEAD AT UDACITY
Juno is the curriculum lead for the School
of Data Science. She has been sharing her
passion for data and teaching, building
several courses at Udacity. As a data
scientist, she built recommendation
engines, computer vision and NLP models,
and tools to analyze user behavior.
CAREER SUPPORT
• Resume support
• GitHub portfolio review
• LinkedIn profile optimization
By the end of the Nanodegree program, you will have an impressive portfolio
of real-world projects and valuable hands-on experience.
The prerequisites for this program include proficiency in Python and SQL. You
should be comfortable writing functions and loops, using classes, and working
with libraries in Python. You should be comfortable querying data using joins,
aggregations, and subqueries in SQL.
You can also prepare by taking a number of Udacity’s free courses, such as:
Introduction to Python Programming and SQL for Data Analysis.
Each project will be reviewed by the Udacity reviewer network. Feedback will
be provided, and if you do not pass a project, you will be asked to resubmit
it until it passes.