wk3. Data-ETL

The document provides an overview of the Extract-Transform-Load (ETL) process in accounting analytics, detailing the steps of data extraction, transformation, and loading. It emphasizes the importance of data cleaning, restructuring, and integration, along with various patterns for identifying and correcting data issues. Additionally, it discusses the use of ETL tools like Power Query for managing data effectively throughout the ETL process.

Introduction to Accounting Analytics
APPLY PATTERNS TO ETL PROCESS

Instructor: Huijue Kelly Duan


The Extract-Transform-Load (ETL) Process


Extract Data
• Data extraction involves transferring data.
▪ Source data are moved to the platform where they will be transformed. The platform is normally a data warehouse (Power BI and Tableau are examples).
▪ The process also includes data validation, or confirming that the data were transferred completely and correctly:
o Compare the number of records.
o Compare descriptive statistics for numeric fields.
o Validate Date/Time fields.
o Compare string limits for text fields.
▪ ETL tools make it easy to extract data from databases, spreadsheets, text files, and many other data sources by providing data connectors, which are intuitive software programs designed to extract data.
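These checks can be scripted as well as done by hand. Below is a minimal sketch in Power Query M, assuming the extracted query is named Service, that its ID column is numeric, and that the source row count (1000 here) was noted before extraction; the names and the control figure are illustrative, not taken from the Beans data set.

let
    SourceRowCount = 1000,                        // control figure from the source system (assumed)
    ExtractedRowCount = Table.RowCount(Service),  // number of records actually transferred
    AverageID = List.Average(Service[ID]),        // descriptive statistic for a numeric field
    CountsMatch = ExtractedRowCount = SourceRowCount
in
    [Rows = ExtractedRowCount, RowCountsMatch = CountsMatch, AverageID = AverageID]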

Data Extraction using Data Connectors


Panel (A) shows some data connectors
that are available in Excel, while Panel
(B) does the same for Power BI.
Transform Data
Data transformation improves the raw data for analysis through:
• cleaning
• restructuring, and
• integration.

The purpose of transforming data is to validate the data for completeness and integrity.
Cleaning Data
❖ Data can be incorrect, invalid, inconsistent or incomplete.
❖ Cleaning is one of the most important and time-consuming aspects of data
transformation.
❖ Also known as data cleansing or data scrubbing.
❖ Incomplete data might need to be added.
• A specific strategy for dealing with incomplete data is imputation, which is when estimated values are
substituted for missing data.

❖ Modifying data is necessary when a current value must be replaced with a new value because the data are incorrect, invalid, inconsistent, or incomplete.
❖ Deleting data that are not relevant for analysis may be necessary.
• An example is redundant data, such as duplicate sales transactions.
Removing Rows with Power Query

Power Query, Excel's and Power BI's ETL tool, has commands for removing duplicate rows, blank rows, or rows containing errors.
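For orientation, here is a minimal Power Query M sketch of those three commands, assuming a query named Service; the blank-row filter mirrors the code the Remove Blank Rows command generates.

let
    NoDuplicates = Table.Distinct(Service),               // Remove Duplicates
    NoErrors = Table.RemoveRowsWithErrors(NoDuplicates),  // Remove Errors
    // Remove Blank Rows: keep rows with at least one non-empty, non-null field
    NoBlanks = Table.SelectRows(NoErrors, each not List.IsEmpty(
        List.RemoveMatchingItems(Record.FieldValues(_), {"", null})))
in
    NoBlanks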

Replacing Values with Power Query

Power Query's Replace Values (find-and-replace) option can fix an inconsistency problem.
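A minimal M sketch of the same operation, assuming an Employee query whose JobTitle column mixes "Sr Manager" with "Sr. Manager" (an inconsistency discussed later under Pattern 8):

// Replace whole-cell matches of the inconsistent spelling with the standard one
Table.ReplaceValue(Employee, "Sr Manager", "Sr. Manager", Replacer.ReplaceValue, {"JobTitle"})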
Restructuring Data
• Data restructuring does not change data values, but it does change how
the data are organized.
• Also known as data wrangling or data munging.

• ETL tools provide a variety of techniques to make restructuring easier (a few are sketched below):
• Adding and deleting columns.
• Renaming columns and tables.
• Splitting and merging columns.
• Splitting and combining tables.
• Transposing and unpivoting tables.
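A minimal M sketch of a few of these techniques; the query and column names (Service, EmpID, Notes) are illustrative assumptions, not part of the Beans data set.

let
    Renamed = Table.RenameColumns(Service, {{"EmpID", "EmployeeID"}}),  // rename a column
    Dropped = Table.RemoveColumns(Renamed, {"Notes"}),                  // delete a column
    // Unpivot: turn all columns except the key into Attribute/Value pairs
    Unpivoted = Table.UnpivotOtherColumns(Dropped, {"EmployeeID"}, "Attribute", "Value")
in
    Unpivoted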
Integrating Data
• Data integration is the process of connecting related data. There are two distinct forms of integration:
1. Linking two tables by defining a relationship between them
• Created using primary and foreign keys.
• Cardinality must be specified.

2. Combining two or more tables unites information about the same entity.
• A union combines different tables with the same data structure. The result is a table with more rows.
• A join, or merge, combines data elements or columns from different tables. The result is a table with more columns (see the sketch below).
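A minimal M sketch of both forms, borrowing the JanuarySales/FebruarySales and ProductDescriptions/ProductAccounting tables discussed later under Pattern 15; the expanded column names Cost and Price are illustrative assumptions.

let
    // Union: same structure, result has more rows
    AllSales = Table.Combine({JanuarySales, FebruarySales}),
    // Join (merge): match rows on the key, result has more columns
    Merged = Table.NestedJoin(ProductDescriptions, {"ID"}, ProductAccounting, {"ID"},
                              "Accounting", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Accounting", {"Cost", "Price"})
in
    [Union = AllSales, Join = Expanded]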

• Data matching poses a challenge to data integration. It is a process that compares data and determines whether they describe the same entity.
Data Matching Issues
Panel (A) contains financial information,
and Panel (B) contains demographic
information. How would we reconcile,
or match, the customer names? Specific
issues to be addressed include:
• Nicknames: Jen Pollack versus Jenny
Pollack.
• Typos: Carlos Panetta versus Carlos
Paretta.
• Reversed names: Margarita David
versus David Margo.
Panel (C) shows the merged customer
table. Most ETL tools provide advanced
support for data matching.
Load Data
• Once the data are cleaned and transformed, they are loaded into the
software for analysis.
• Data loading is the process of making the analytical database available for
use.
• Analytical databases are often posted in the cloud where they can be used
simultaneously by many users.
• Like extraction, part of transferring the data is validating whether all records
were transferred and whether they were transferred correctly.
Join Tables
Using Patterns to ETL Data
• A structured set of data preparation patterns can address most data preparation
challenges.
• These patterns signal potential problems and provide guidance for finding them
within the data set and correcting them.
• Each pattern identifies a data issue.
Pattern 1: Incomplete Data Transfer
Compare Row Counts
• Comparing row counts is one way to check completeness.

Add Missing Rows


• If the row counts don't match, add the missing rows to the source data (the Service worksheet in the Beans data set file) or to the ETL's data set.
Row Count for Excel Worksheet


Row Count with Power Query


The row count is shown as part of the ID column profile in Power Query.
Pattern 2: Incorrect Data Transfer
Compare Control Amounts
• Compare the Average in the Excel worksheet with the same number for the Service table in the ETL tool.
• Matching numbers indicate a correct transfer of the values for the ID column in the Service table. Similar tests can be run for the other columns.

Modify Incorrect Values
• If the numbers do not match, the next step is to determine which data were transferred incorrectly and what caused the problem. Once identified, the incorrectly transferred values in the ETL's data set can be modified.
Pattern 3: Irrelevant and Unreliable Data
• Data that are irrelevant for decisions bloat the data model.
• Avoid integrating unreliable data into the data model.
• Scan columns for irrelevant and unreliable data:
o Irrelevant columns can be identified primarily by scanning the data visually.
o The data dictionary can also be a helpful tool.
o Most ETL tools provide statistics about errors, null values, and more that can help determine a column's reliability.
• Remove columns with irrelevant or unreliable data (a minimal sketch follows below).
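A minimal M sketch, assuming a Service query and a hypothetical InternalNotes column judged irrelevant; Table.Profile supplies per-column statistics that help judge reliability.

let
    Stats = Table.Profile(Service),  // per-column min, max, average, count, null count, distinct count
    Trimmed = Table.RemoveColumns(Service, {"InternalNotes"})  // drop the irrelevant column (assumed name)
in
    Trimmed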
Detecting and Correcting Unreliable Data


Pattern 4: Incorrect and Ambiguous Column Names
• Column names become variables during data exploration and interpretation.
o Their names are important because other people might use the analytical database.
• Visually scan a column's content and its data dictionary definition.
o Names should accurately describe the column's content.
o They should be intuitive to business people.
o Use only common abbreviations that are understood by everyone, such as YTD.
o Eliminate spaces, underscores, or other symbols.
Modify Column Names
ETL tools make it easy to correct this
problem by renaming the column. This
illustration shows how to rename a
column in Power Query.
Pattern 5: Incorrect Data Types
• Data types are an integral part of column definitions because they determine what we can and cannot do with the data in a column.
• Inspect the data type: ETL tools automatically assign a data type to each column during extraction, but sometimes either the assignment is incorrect or the ETL tool is unable to determine a data type.
• Change the data type: Correct this problem by changing the data type with an ETL tool (a minimal sketch follows below).
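A minimal M sketch of changing data types, assuming a Service query with an ActualTime column and a hypothetical ServiceDate column:

// Assign a proper type to each column so calculations become possible
Table.TransformColumnTypes(Service, {{"ActualTime", type number}, {"ServiceDate", type date}})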
Inspecting and
Changing Data Types
Panel (A) shows the raw spreadsheet
data. Panel (B) shows the same data set
after extraction in Power Query. Notice
that the data type is ABC123, also
known as the Any data type. It
indicates that Power Query cannot
identify the data type, which means
calculations cannot be performed on
the data in the column.
Power Query Data Types


This shows the different data types available in Power Query.
Pattern 6: Composite and Multi-Valued Columns
• Each cell should contain one value describing one characteristic. Two or more values in the same cell make analysis more challenging.
• Scenarios that violate the single-valued rule and make analysis more complex are composite columns and multi-valued columns.
• Detect composite or multi-valued columns by visual scanning.
• How a column is restructured depends on whether the column is composite or multi-valued.
o The solution for a composite column is to split it.
Split Column with Power Query


This shows how to split a column in Power Query. Click on the Name column. Then select the
Home tab in the Main Menu. Click Split Column in the ribbon and select By Delimiter.
Split Column by Delimiter
Power Query | Split Column | Select Delimiter
This window appears after By
Delimiter is selected. Power
Query defaulted to Comma as the
delimiter. In this example, the
Left-most delimiter option was
selected.
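The M code Power Query generates for this dialog looks roughly like the sketch below, where Source stands for the previous query step; Name.1 and Name.2 are the default names Power Query assigns to the new columns.

// Split the Name column at the left-most comma into two new columns
Table.SplitColumn(Source, "Name",
    Splitter.SplitTextByEachDelimiter({","}, QuoteStyle.Csv, false),  // false = start from the left
    {"Name.1", "Name.2"})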
Pattern 7: Incorrect Values
• The wrong value is assigned to one of the entities' characteristics.
• It is helpful to look for outlying values that stand out in numeric data.
• An outlier falls more than 1.5 times the interquartile range below the first quartile or above the third quartile (sketched below).
• Once a questionable value is identified, there are a few options:
o Identify the error's root cause and eliminate it.
o Correct the value in the source data.
o Correct the value in the analytical database, but not in the source data.
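A minimal M sketch of the 1.5 x IQR rule for the ActualTime column, assuming a Service query; note that List.Percentile's interpolation may differ slightly from Excel's quartile functions.

let
    Q = List.Percentile(Service[ActualTime], {0.25, 0.75}),  // first and third quartiles
    IQR = Q{1} - Q{0},                                       // interquartile range
    Outliers = Table.SelectRows(Service,
        each [ActualTime] < Q{0} - 1.5 * IQR or [ActualTime] > Q{1} + 1.5 * IQR)
in
    Outliers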
Profiling for Incorrect Data
Panel (A) displays the column profile statistics generated by Power Query. Panel (B)
shows the values of the ActualTime in descending order. The top three values in
Panel (B) are outliers and most likely incorrect. Also, the actual time for the service
with ID “1325”, 25 hours, is invalid. It should be 2.5 hours.
Pattern 8: Inconsistent Values
• Data inconsistency occurs when two or more different representations of the same value are mixed in the same column.
• Two profiling techniques are useful for detection (see the sketch below):
o Distinct values: Visually scanning the distinct values of a column.
o Frequencies: Values with a low frequency could indicate inconsistent data.
• Correct the inconsistent data by identifying the root cause and eliminating it, or by modifying the values in either the source data or the analytical database.
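A minimal M sketch that computes value frequencies for a column, assuming the Employee query and its JobTitle column; rare values sorting to the top may flag misspellings.

let
    Counts = Table.Group(Employee, {"JobTitle"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    Suspects = Table.Sort(Counts, {{"Count", Order.Ascending}})  // low-frequency values first
in
    Suspects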
Profiling for Inconsistent Data
Panel (A) shows the distinct values for the
JobTitle column in the Employee table. This
information is available for all columns in
Power Query. The illustration shows that
Sr. Manager is inconsistently represented. Low
frequencies might also indicate a misspelling
resulting in an inconsistency. As Panel (B)
shows, the frequency for Sr Manager is 1. The
value distribution shown in Panel (B) is part of
a column’s profile in Power Query. Another
column with inconsistency issues in the Beans data set is University.
Pattern 9: Incomplete Values
• Addresses incompleteness that might make data unusable and unreliable.
• Explore:
o Should null values be allowed, and if not, are there any?
o If null values are allowed, what percentage of the values are null? If the percentage is high, should the column be loaded?
o How are incomplete values represented: nulls, or a specific code? Are the representations consistent?
Addressing Incomplete Values
• Investigate null values: ETL tools reveal, on a column-by-column basis, the percentage of null values.
• Remove the column or replace the null values (a minimal sketch follows below):
o If null values are not allowed but they exist, they should be replaced.
o If the number of null values is too high to be useful, then remove the column from the analytical database.
o If there is inconsistency in representing missing values, design a consistent schema and correct the values in terms of that schema.
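A minimal M sketch of both remedies, assuming a Client query with a hypothetical Phone column and an assumed 50% usefulness threshold:

let
    NullCount = List.Count(List.Select(Client[Phone], each _ = null)),
    NullShare = NullCount / Table.RowCount(Client),
    Result = if NullShare > 0.5
        then Table.RemoveColumns(Client, {"Phone"})  // too incomplete to load
        else Table.ReplaceValue(Client, null, "Unknown", Replacer.ReplaceValue, {"Phone"})
in
    Result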
Pattern 10: Invalid Values
• Domain-specific rules that determine whether data are acceptable can be created for most columns.
• Create and apply validation rules (a minimal sketch follows below):
o They can rely on the profiling information automatically generated by the ETL tool.
o For a mandatory column, the statistics about null values provided by ETL tools can be used for validation.
• If a questionable value is identified, eliminate the root cause, change the value in the source, or change the value in the analytical database.
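A minimal M sketch of a domain-specific rule, assuming a Service query and that any ActualTime outside the range (0, 12] hours is unacceptable; the bounds are illustrative assumptions.

// Rows returned here violate the rule and need investigation
Table.SelectRows(Service, each [ActualTime] <= 0 or [ActualTime] > 12)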
Design and Implementation of Validation Rules


Pattern 11: Non-Intuitive and Ambiguous Table Names
• Table names are part of both the data model and the data set's vocabulary, so they must be correct, intuitive, and clear.
• Scan tables for incorrect or ambiguous names and rename them.
o Examining a table's content and its data dictionary definition can help determine whether the name accurately reflects its content.
o Table names should be intuitive and avoid spaces, underscores, and special coding.
Pattern 12: Missing Primary Keys
• Tables are descriptions of entities, and each instance of an entity should be uniquely identified.
• To be a primary key, a column must have a unique value for each instance and no null values.
• Primary keys are normally already in place when data are extracted from a relational database.
• Identify tables with a missing primary key.
• Create a primary key.
Identify Tables with a Missing Primary Key
Column Profile in Power Query
The column profile in Power Query provides the information necessary to identify a primary key: the value for Empty should be zero, and the values for Count, Distinct, and Unique should be the same.
Creating a Primary Key


ETL tools can help with creation of a primary key as shown in
the Power Query Editor.
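A minimal M sketch of both steps, assuming a Service query and its ID column; the check mirrors the Empty = 0 and Count = Distinct = Unique conditions described above.

let
    IDs = Service[ID],
    IsKey = List.NonNullCount(IDs) = List.Count(IDs)           // no empty values
        and List.Count(List.Distinct(IDs)) = List.Count(IDs),  // every value unique
    Result = if IsKey then Service
        else Table.AddIndexColumn(Service, "RowID", 1, 1)      // otherwise create a surrogate key
in
    Result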
Pattern 13: Redundant Content Across Columns
• Data inconsistencies occur when the same data are recorded more than once and changed in one place but not the other, such as a customer's email address.
• Possible scenarios:
o Overlap, such as when an address column contains state information and there is also a separate state field.
o Dependency, which exists when one column's values are dependent on the values of another column in the same table. Assume both age and date of birth are recorded. Age changes as time passes, making the data inconsistent. Age should be calculated as part of the analytical database in this situation.
• Perform column-by-column comparisons for overlaps or dependencies.
• Delete redundant and dependent columns.
o When there is dependency, delete the column that contains the dependent value. Instead, use a formula to recreate the column in the analytical database (a minimal sketch follows below).
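A minimal M sketch of the age/date-of-birth example, assuming an Employee query with a date-typed DateOfBirth column; dividing by 365.25 is an approximation.

let
    NoStoredAge = Table.RemoveColumns(Employee, {"Age"}),  // delete the dependent column
    WithAge = Table.AddColumn(NoStoredAge, "Age",          // recreate it with a formula
        each Number.RoundDown(
            Duration.Days(Date.From(DateTime.LocalNow()) - [DateOfBirth]) / 365.25))
in
    WithAge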
Pattern 14: Find Invalid Values with Intra-Table Rules
• Pattern 14 determines the validity of a column's values based on the values in one or more other columns in the same table.
• Create and apply intra-table validation rules:
o The goal of a validation rule is to identify invalid data.
o Creating validation rules requires in-depth knowledge of the business, and they are implemented using a scripting language.
Design and Implementation of Intra-Table Validation Rules
An example of an intra-table validation rule follows.
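For illustration, such a rule in M might look like the sketch below, assuming the Service table carries both an EstimatedTime and an ActualTime column; the threshold is an illustrative assumption.

// Flag services whose actual time exceeds three times the estimate
Table.SelectRows(Service, each [ActualTime] > 3 * [EstimatedTime])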
Transform Models
• Transformation patterns at the model level search for data issues across
tables:

• Data that describe the same entity spread across multiple tables.

• Data models with a structure that is difficult to understand.

• Data models that do not support efficient processing.


Pattern 15: Data Spread Across Tables
• Analysis is more challenging when data that describe the same entity are spread across multiple tables.
• Two possible scenarios:
o In Panel (A), both tables, JanuarySales and FebruarySales, have the same structure but different rows.
o In Panel (B), the two tables describe different characteristics of the same entity, Product. Some information for the product with ID 1 is in the ProductDescriptions table. Other information for the same product, ID = 1, is in the ProductAccounting table. In this case, it is the Product table shown in Panel (D) split vertically.
Pattern 15: Strategies and Summary
• Identify similarly structured tables and tables describing different characteristics of the same entity:
▪ Look for two or more tables with the same structure. These tables would have the same columns and similar data.
▪ Search for tables that describe different characteristics of the same entity.
• Combine tables (i.e., union).
Pattern 16: Data Models Do Not Comply
with Principles of Dimensional Modeling
• Dimensional modeling is the technique of creating data models
with fact tables surrounded by dimension tables.
• These data models, such as star schemas, are easy to understand
and result in efficient data processing.
• Analyze a Data Model’s Compliance with Dimensional
Modeling Principles:
• Determine the fact and dimension tables and ensure all fields belong to the correct table.
• In an accounting context, fact tables correspond to business transactions.
• Dimension tables describe who participates in the transactions, when the transactions occurred, and
what was given up or acquired.
Current Beans Star Schema
The Service table is the fact table, the Employee table is a who dimension, and the Client table is also a who dimension.

An Ideal Beans Star/Snowflake Schema
The new data model to be created, which adds new dimension tables and transforms the multi-valued column into a single-valued column.
Pattern 16: Reconfiguration and Summary
• Carefully review the steps to reconfigure the data model
Pattern 17: Find Invalid Values with Inter-Table Rules
• Determines the validity of a column's values based on the values in one or more other tables.
• A widely used inter-table validation rule is referential integrity:
▪ All values in a foreign key should also exist as values in the corresponding primary key.
• Create and apply inter-table validation rules that identify invalid data.
• Modify invalid values.
Design and Implementation of an Inter-Table Validation Rule
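A minimal M sketch of a referential-integrity check, assuming the Service table carries an EmployeeID foreign key referencing the Employee table:

// A left anti join keeps only Service rows whose EmployeeID has no match in Employee
Table.NestedJoin(Service, {"EmployeeID"}, Employee, {"EmployeeID"}, "NoMatch", JoinKind.LeftAnti)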


Data Loading
• Once the data are cleaned and transformed, it is time to load them into
the software for analysis.
• Data loading is the process of making the analytical database available
for use.
• Since both extraction and loading are transfer processes, they have
similar issues when it comes to the completeness and correctness of
transferred data.
• It is also important that the data model of the analytical database is
validated–that is, that all relationships have been defined.
Pattern 18: Incomplete Data Loading
• Loading moves the data from the ETL tool to the analytical database. Three options:
1. Close and apply: Close Power Query and apply all transformations to the analytical database.
2. Apply: Apply all the transformations to the analytical database but keep Power Query open.
3. Close: Close Power Query without applying any transformations to the analytical database.

ETL-Analytical Database Transfer


Pattern 18 Summary
Compare Row Counts
• The row count for the analytical database can be compared with the row count of the data set in the ETL tool. The ETL tool will also issue an alert if any errors occur when the transformations are applied to the analytical database.

Add Missing Rows


• If the numbers do not match, determine which rows were not transferred and why.
Once identified, add the missing rows to the analytical database.
Pattern 19: Incorrect Data Loading
Compare Control Amounts
• An effective way to validate the correct transfer of data is by comparing sums, averages, or any other control amounts.

Modify Incorrect Values
• If the numbers do not match, determine which data were transferred incorrectly and what caused the problem. Once identified, modify the incorrectly transferred values in the analytical database.
Pattern 20: Missing or Incorrect Data Relationships
Investigate the completeness and accuracy of the data model:
• A complete and accurate data model is one in which all relationships are correct.

Data Model
The finalized data model for the analytical database created with Power Query can be compared with your data model to determine that no relationships are missing, that there are no unnecessary relationships, and that all relationships are defined correctly.
Modify the Data Model
• To do this in Power BI, select the Home tab in the Main Menu and click Manage Relationships in the ribbon. The window shown below will appear. Select the buttons at the bottom of the window to create, edit, or delete a relationship.

Define Relationships
This illustrates some of the aspects of a relationship that can be defined.
Pattern 20 Summary
Pattern Summary

Patterns to extract data:
1. Incomplete Data Transfer
2. Incorrect Data Transfer

Patterns to transform columns:
3. Irrelevant and Unreliable Data
4. Incorrect and Ambiguous Column Names
5. Incorrect Data Types
6. Composite and Multi-Valued Columns
7. Incorrect Values
8. Inconsistent Values
9. Incomplete Values
10. Invalid Values

Patterns to transform tables:
11. Non-Intuitive and Ambiguous Table Names
12. Missing Primary Keys
13. Redundant Content Across Columns
14. Find Invalid Values with Intra-Table Rules

Patterns to transform models:
15. Data Spread Across Tables
16. Data Models Do Not Comply with Principles of Dimensional Modeling
17. Find Invalid Values with Inter-Table Rules

Patterns to load data:
18. Incomplete Data Loading
19. Incorrect Data Loading
20. Missing or Incorrect Data Relationships
Thank you!
Contact me at:
duanh@[Link]
