
Top 30 Azure Data Factory Interview Questions and Answers


This post covers the top 30 Azure Data Factory interview questions. They are well researched, up to date, and among the most likely questions to come up in your next interview.

Azure Data Factory is a cloud-based ETL service for scaling out data integration and transformation. It also lets you lift and shift existing SSIS packages to Azure.

Topics of discussion:

• Azure Data Factory Interview Questions
• ADF Interview Questions Basic-Level
• ADF Interview Questions Intermediate-Level
• ADF Interview Questions Advanced-Level
• Conclusion

Azure Data Factory Interview Questions and Answers
I have divided the Azure Data Factory interview questions by difficulty level. Let's dive right in.

Azure Data Factory Interview Questions Basic Level


Q1) What is Azure Data Factory?

Azure Data Factory is a data integration and ETL service offered by Microsoft. You can create data-driven workflows to orchestrate and automate data movement, and transform the data in the cloud. It lets you create, schedule, and run data pipelines that move and transform data.

Q2) Why do we need Azure Data Factory?

As the world moves to the cloud and big data, data integration and migration remain an integral part of enterprises across all industries. ADF addresses both needs by letting you plan, build, monitor, and manage your ETL/ELT pipelines from a single view.


The reasons for the growing adoption of Azure Data Factory are:

• Increased value
• Improved results of business processes
• Reduced overhead costs
• Improved decision making
• Increased business process agility

Q3) What do we understand by Integration Runtime?

Integration Runtime is the compute infrastructure used by Azure Data Factory. It provides data integration capabilities across different network environments.

A quick look at the types of Integration Runtime:

• Azure Integration Runtime – It can copy data between cloud data stores and dispatch activities to various compute services such as SQL Server, Azure HDInsight, etc.
• Self-Hosted Integration Runtime – It is essentially software with the same code as the Azure Integration Runtime, but it is installed on your local machine or on a virtual machine inside a virtual network.
• Azure-SSIS Integration Runtime – It allows you to run SSIS packages in a managed environment. So when we lift and shift SSIS packages to Data Factory, we use the Azure-SSIS Integration Runtime.
Q4) What is the difference between Azure Data Lake and Azure Data Warehouse?

• Storage: A Data Lake can store data of any form, size, or shape. A Data Warehouse stores data that has already been filtered and structured from specific sources.
• Users: Data Scientists are the main users of a Data Lake, while business professionals are the main users of a Data Warehouse.
• Flexibility: A Data Lake is easily accessible and accommodates frequent changes, whereas changing a Data Warehouse is a strict and costly task.
• Schema: A Data Lake applies the schema after the data has been stored (schema-on-read); a Data Warehouse defines the schema before the data is stored (schema-on-write).
• Processing: A Data Lake uses the ELT (Extract, Load, Transform) approach; a Data Warehouse uses ETL (Extract, Transform, Load).
• Best suited for: A Data Lake is an excellent tool for in-depth analysis and research; a Data Warehouse is the better platform for operational users.


Q5) What is the limit on the number of Integration Runtimes?

There is no restriction on the number of integration runtime instances you can have in a data factory. There is, however, a per-subscription limit on the number of VM cores that the integration runtime can use for SSIS package execution.

Q6) What is Blob Storage in Azure?

Blob storage is designed for storing huge amounts of unstructured data such as text, images, and binary data. It can make your data publicly available across the globe or keep it private. Common uses of Blob storage are streaming audio and video, storing data for backup and restore, and holding data for analysis. Blob storage also underpins Azure Data Lake Storage Gen2, so you can work with data lakes for analytics on top of it.

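For illustration, here is a minimal sketch (not from the original article) of uploading a file to Blob storage with the azure-storage-blob Python SDK; the connection string, container name, and file paths are placeholder assumptions, and the container is assumed to exist already:

    # Minimal sketch: upload a local file to Azure Blob Storage.
    # The connection string, container name, and paths are placeholders.
    from azure.storage.blob import BlobServiceClient

    conn_str = "<storage-account-connection-string>"
    service = BlobServiceClient.from_connection_string(conn_str)
    container = service.get_container_client("backups")

    # Upload a local file as a block blob, overwriting any existing blob.
    with open("sales_export.csv", "rb") as data:
        container.upload_blob(name="2024/sales_export.csv", data=data, overwrite=True)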

Q7) Difference between Data Lake Storage and Blob Storage.

• Purpose: Data Lake Storage is optimized for big data analytics workloads. Blob Storage is general-purpose storage that can be used in a variety of situations, and it is also capable of big data analytics.
• Namespace: Data Lake Storage uses a hierarchical file system, while Blob Storage is based on a flat namespace object store.
• Structure: In Data Lake Storage, data is saved as files within folders. With Blob Storage, you create a storage account and the data is stored in containers inside that account.
• Typical data: Data Lake Storage holds data for batch, interactive, and stream analytics and for machine learning. Blob Storage holds text files, binary data, media for streaming, and general-purpose data.

Q8) Describe the process to create an ETL process in Azure Data Factory.

You can create an ETL process with a few steps (a minimal Python SDK sketch follows the list):

• Create a linked service for the source data store, e.g. a SQL Server database.
• Suppose the source holds a dataset of vehicles.
• Create a linked service for the destination data store, e.g. Azure Data Lake Storage.
• Create a dataset for each store so the data can be read and saved.
• Create a pipeline with a copy activity.
• Finally, schedule the pipeline by adding a trigger.

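The sketch below (not part of the original article) shows roughly how these steps come together with the azure-mgmt-datafactory Python SDK. The resource group, factory, dataset, and pipeline names are placeholder assumptions, the linked services and datasets are assumed to exist already, and the source/sink types are simplified to a SQL Server source and a Parquet sink:

    # Sketch of the copy pipeline from Q8 using the azure-mgmt-datafactory SDK.
    # All names ("rg-demo", "adf-demo", dataset names, etc.) are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        PipelineResource, CopyActivity, DatasetReference,
        SqlServerSource, ParquetSink,
    )

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
    rg, factory = "rg-demo", "adf-demo"

    # Copy activity: read from the SQL Server dataset, write to the Data Lake dataset.
    copy_vehicles = CopyActivity(
        name="CopyVehicles",
        inputs=[DatasetReference(type="DatasetReference", reference_name="VehiclesSqlTable")],
        outputs=[DatasetReference(type="DatasetReference", reference_name="VehiclesParquetFiles")],
        source=SqlServerSource(),
        sink=ParquetSink(),
    )

    # Create (or update) the pipeline that wraps the copy activity.
    pipeline = PipelineResource(activities=[copy_vehicles])
    adf_client.pipelines.create_or_update(rg, factory, "CopyVehiclesPipeline", pipeline)
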
Q9) What is the difference between Azure HDInsight and Azure Data Lake Analytics?

• Service model: Azure HDInsight is a Platform as a Service (PaaS) offering, whereas Azure Data Lake Analytics is Software as a Service (SaaS).
• Processing: With HDInsight, you configure a cluster with predetermined nodes to process data, and you can process it with languages such as Pig or Hive. With Data Lake Analytics, you simply submit the data processing queries you have written, and the service creates the compute nodes needed to process the data set.
• Configuration: HDInsight clusters can be freely configured by users, who get unrestricted access to Spark, Kafka, and similar tools. Data Lake Analytics does not offer many configuration or customization options; Azure handles that automatically for its users.

Q 10) What are the top-level concepts of Azure Data Factory?

There are four basic top-level Azure Data Factory concepts:

• Pipeline – A logical grouping of activities; it acts as the carrier in which the processing steps take place.
• Activities – The individual processing steps within the pipeline.
• Datasets – The data structures that point to the data our activities use.
• Linked Services – They store the connection information needed to connect to external resources or services. For example, for a SQL Server source or destination we need a connection string, and the linked service is where that connection information is kept.
Azure Data Factory Interview Questions Intermediate Level

Q 11) How can we schedule a pipeline?

We can schedule a pipeline using a trigger. A trigger follows a wall-clock calendar schedule, so pipelines can run periodically or on calendar-based recurrence patterns. The two main options (a Python SDK sketch of a schedule trigger follows) are:

• Schedule Trigger
• Tumbling Window Trigger

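As an illustration (not from the original article), here is a hedged sketch of creating and starting a daily schedule trigger with the azure-mgmt-datafactory SDK; the pipeline name, start time, and other identifiers are placeholder assumptions:

    # Sketch: create and start a schedule trigger that runs a pipeline once a day.
    # Resource names and the start time are placeholders.
    from datetime import datetime
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        TriggerResource, ScheduleTrigger, ScheduleTriggerRecurrence,
        TriggerPipelineReference, PipelineReference,
    )

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
    rg, factory = "rg-demo", "adf-demo"

    recurrence = ScheduleTriggerRecurrence(
        frequency="Day", interval=1, start_time=datetime(2024, 1, 1, 6, 0), time_zone="UTC"
    )
    trigger = ScheduleTrigger(
        recurrence=recurrence,
        pipelines=[TriggerPipelineReference(
            pipeline_reference=PipelineReference(
                type="PipelineReference", reference_name="CopyVehiclesPipeline"
            ),
            parameters={},
        )],
    )
    adf_client.triggers.create_or_update(rg, factory, "DailyTrigger", TriggerResource(properties=trigger))

    # Triggers are created in a stopped state; start it explicitly
    # (current SDK versions expose begin_start, older ones expose start).
    adf_client.triggers.begin_start(rg, factory, "DailyTrigger").result()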

Q 12) Is there any way to pass parameters to a pipeline run?

Yes, absolutely; passing parameters to a pipeline run is an easy task. Pipelines are the first-class, top-level concept in Azure Data Factory. We can define parameters at the pipeline level and then pass arguments for them when we run the pipeline (see the sketch below).

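A minimal sketch (not from the original article) of passing arguments when triggering a run with the azure-mgmt-datafactory SDK; the pipeline is assumed to declare matching parameters, and all names are placeholders:

    # Sketch: start a pipeline run and pass arguments for its declared parameters.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    run = adf_client.pipelines.create_run(
        resource_group_name="rg-demo",
        factory_name="adf-demo",
        pipeline_name="CopyVehiclesPipeline",
        parameters={"sourceTable": "dbo.Vehicles", "targetFolder": "raw/vehicles"},
    )
    print("Started run:", run.run_id)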



Q 13) What is the difference between the mapping data flow and wrangling data flow transformation?

• Mapping Data Flow: A visually designed data transformation activity that lets users build graphical data transformation logic without needing to be experienced developers.
• Wrangling Data Flow: A code-free data preparation activity that integrates with Power Query Online.
Q 14) How do I access the data using the other 80 Dataset types in Data Factory?

The mapping data flow feature natively supports Azure SQL Database, Azure Synapse Analytics (data warehouse), Azure Blob storage, and delimited text files in Azure Data Lake Storage for source and sink. For the remaining connectors, use a Copy activity to stage the data from one of those connectors, then run a Data Flow activity to transform the data after it has been staged.

Q 15) Explain the two levels of security in ADLS Gen2?

• Role-Based Access Control – It includes built-in Azure roles such as Reader, Contributor, Owner, or custom roles. It is used for two purposes: to specify who can manage the service itself, and to give users access to built-in data explorer tools.
• Access Control Lists – They specify exactly which data objects in Azure Data Lake Storage a user can read, write, or execute.
Q 16) What is the difference between the Dataset and Linked Service in Data Factory?

• Dataset: A named reference to the data within a data store that is described by a linked service (for example, a specific table or folder), as shown below.
• Linked Service: Essentially the connection string (connection information) used to connect to the data store.

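To make the relationship concrete, here is an illustrative sketch (not from the original article) of the JSON shape Data Factory stores for each, written as Python dicts; every name and the connection string are placeholder assumptions:

    # Illustrative only: a linked service holds connection information,
    # while a dataset points at data inside the store described by that
    # linked service. All names and the connection string are placeholders.
    linked_service = {
        "name": "AzureSqlLinkedService",
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {
                "connectionString": "Server=tcp:<server>.database.windows.net;Database=<db>;..."
            },
        },
    }

    dataset = {
        "name": "VehiclesTable",
        "properties": {
            "type": "AzureSqlTable",
            "linkedServiceName": {
                "referenceName": "AzureSqlLinkedService",
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"schema": "dbo", "table": "Vehicles"},
        },
    }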


Q 17) What has changed from private preview to limited public preview regarding data flows?

Some of the things that have changed are:

• There is no need to bring your own Azure Databricks clusters anymore.
• Data Factory handles cluster creation and deletion for you.
• We can still use Data Lake Storage Gen2 and Blob Storage to store the files, using the linked services appropriate to those storage engines.
• The Blob and Azure Data Lake Storage Gen2 datasets have been split into delimited text and Apache Parquet datasets.
Q 18) Data Factory supports two types of compute environments to execute the transform activities. What are those?

Let's take a look at the two types:

• On-Demand Compute Environment – A fully managed environment provided by ADF. In this option, a cluster is created to perform the transformation activity and is automatically deleted when the activity is complete.
• Bring Your Own Environment – In this option, you manage the compute environment yourself and register it with ADF.

Q 19) What is Azure SSIS Integration Runtime?

The Azure-SSIS Integration Runtime is a fully managed cluster of virtual machines hosted in Azure and dedicated to running SSIS packages in your data factory. You can scale up by configuring the node size, or scale out by configuring the number of nodes in the virtual machine cluster.

Q 20) What is required to execute an SSIS package in Data Factory?

You need to create an SSIS integration runtime and an SSISDB catalog hosted in Azure SQL Database or Azure SQL Managed Instance.

Azure Data Factory Interview Questions Advanced Level

Q 21) What is Azure Table Storage?

Azure Table Storage is a service that lets users store structured NoSQL data in the cloud. It provides a key/attribute store with a schemaless design, and it is fast and effective for modern-day applications.

Q 22) Can we monitor and manage Azure Data Factory Pipelines?

Yes, we can monitor and manage ADF pipelines using the following steps (a programmatic sketch follows the list):

• Go to the Data Factory tab and click on Monitor & Manage.
• Click on the resource manager.
• You will see the pipelines, datasets, and linked services in a tree format.

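Pipeline runs can also be monitored programmatically; below is a hedged sketch (not from the original article) using the azure-mgmt-datafactory SDK, with placeholder resource names:

    # Sketch: list pipeline runs updated in the last 24 hours and print their status.
    from datetime import datetime, timedelta
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import RunFilterParameters

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    filters = RunFilterParameters(
        last_updated_after=datetime.utcnow() - timedelta(days=1),
        last_updated_before=datetime.utcnow(),
    )
    runs = adf_client.pipeline_runs.query_by_factory("rg-demo", "adf-demo", filters)
    for run in runs.value:
        print(run.pipeline_name, run.status, run.run_id)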


Q 23) An Azure Data Factory Pipeline can be executed using three methods. Mention these methods.

Methods to execute an Azure Data Factory pipeline:

• Debug mode
• Manual execution using Trigger Now
• Adding a schedule, tumbling window, or event trigger

Q 24) If we need to copy data from an on-premises SQL Server instance using a data factory, which integration runtime should be used?

A self-hosted integration runtime should be installed on the on-premises machine where the SQL Server instance is hosted, or on a machine in the same network that can reach it.
Q 25) What are the steps involved in the ETL process?

The ETL (Extract, Transform, Load) process follows four main steps:

• Connect and Collect – Connect to the required sources and move the data to a centralized local or cloud data store.
• Transform – Transform the collected data using compute services such as HDInsight Hadoop, Spark, etc.
• Publish – Load the transformed data into Azure data warehouses, Azure SQL databases, Azure Cosmos DB, and more.
• Monitor – Pipeline monitoring is supported through Azure Monitor, API, PowerShell, log analytics, and the health panels in the Azure portal.
Q 26) Can an activity output property be consumed in another activity?

Yes. An activity's output can be consumed in a subsequent activity through the @activity() expression construct, as sketched below.

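As an illustration (placeholder activity and variable names, not from the original article), here is the rough JSON shape, written as a Python dict, of a Set Variable activity that reads the rowsCopied value from the output of a preceding Copy activity:

    # Illustrative only: a Set Variable activity that depends on a Copy activity
    # named "CopyVehicles" and stores its rowsCopied output in a pipeline variable.
    set_rows_copied = {
        "name": "RecordRowCount",
        "type": "SetVariable",
        "dependsOn": [
            {"activity": "CopyVehicles", "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "variableName": "rowsCopied",
            "value": {
                "value": "@string(activity('CopyVehicles').output.rowsCopied)",
                "type": "Expression",
            },
        },
    }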

Q 27) What is the way to access data by using the other 90 dataset types in Data Factory?

For source and sink, the mapping data flow feature supports Azure SQL Database, Azure Synapse Analytics, delimited text files from Azure Blob storage or Azure Data Lake Storage Gen2, and Parquet files from Blob storage or Data Lake Storage Gen2.

Use the Copy activity to stage data from any of the other connectors, then use a Data Flow activity to transform the data once it is staged. For example, your pipeline might first copy data into Blob storage and then transform it with a Data Flow activity that uses a Blob storage dataset as its source (see the sketch below).

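A hedged sketch (not from the original article) of what that staging pattern roughly looks like in a pipeline's activities list, written as a Python dict; the connector, dataset, and data flow names are placeholders and the property names are approximate:

    # Illustrative only: stage data with a Copy activity, then transform it with
    # a Data Flow activity that reads the staged Blob dataset. Names are
    # placeholders and property names are approximate.
    activities = [
        {
            "name": "StageToBlob",
            "type": "Copy",
            "inputs": [{"referenceName": "SourceConnectorDataset", "type": "DatasetReference"}],
            "outputs": [{"referenceName": "StagedBlobDataset", "type": "DatasetReference"}],
        },
        {
            "name": "TransformStagedData",
            "type": "ExecuteDataFlow",
            "dependsOn": [{"activity": "StageToBlob", "dependencyConditions": ["Succeeded"]}],
            "typeProperties": {
                "dataflow": {"referenceName": "TransformVehicles", "type": "DataFlowReference"}
            },
        },
    ]
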
Q 28) Is it possible to calculate a value for a new column from an existing column in a mapping data flow in ADF?

Yes. In a mapping data flow, you can use the derived column transformation to generate a new column based on whatever logic you want, for example concat(firstName, ' ', lastName) on two example name columns. You can either create a new column or update an existing one. Enter the name of the column you are creating in the Column textbox, or use the column dropdown to override an existing column in your schema. Click the Enter expression textbox to start building the derived column's expression; you can type the expression yourself or use the expression builder to construct the logic.

Q 29) What is the way to parameterize a column name in a data flow?

We can pass parameters to columns just like other properties. For example, in a derived column the customer can use $ColumnNameParam = toString(byName($myColumnNameParamInData)). These parameters can be passed from the pipeline run down to the data flow.


Q 30) Is there a way to write attributes to Cosmos DB in the same order as specified in the sink in an ADF data flow?

No. Because each document in Cosmos DB is stored as a JSON object, which is an unordered set of name/value pairs, the order cannot be guaranteed.

Conclusion
There is no doubt that there are plenty of job openings for Azure Data Engineers, and they will increase drastically in the coming years as more and more companies adopt cloud computing. How well you prepare for these opportunities is what matters most.

I have divided the latest Azure Data Factory interview questions by difficulty level. These ADF interview questions will surely give you an edge over other candidates in your next interview.
