Data Preparation using PowerExcel

The document outlines a course on Microsoft Excel's Power Query and Power Pivot, focusing on their workflow, benefits, and functionalities compared to traditional Excel. It covers data modeling, types of data connections, and the tools available for data transformation and analysis. Additionally, it highlights compatibility issues, particularly for Mac users, and emphasizes best practices for using Power Query effectively.

Uploaded by

Vrushank Bhatt
Copyright: © All Rights Reserved

MICROSOFT EXCEL: INTRO TO POWER QUERY, POWER PIVOT
GETTING STARTED
COURSE OUTLINE

1 The “Power” Excel Landscape
• Power Query/Power Pivot workflow and key benefits vs. “traditional” Excel

2 Power Query
• Types of data connectors, query editing tools, loading options, etc.

3 Data Modeling 101
• Excel Data Model interface, normalization, table relationships, hierarchies, etc.
VERSIONS & COMPATIBILITY

IMPORTANT NOTE: Power Pivot is currently not available for Mac, and is only available in certain versions of Excel for Windows/PC

For a full, current list of compatible versions, visit support.office.com (or Google “Where is Power Pivot?”):
https://support.office.com/en-us/article/Where-is-Power-Pivot-aa64e217-4b6e-410b-8337-20b87e1c2a4b (or use: bit.ly/2yd80rd)

Other considerations:
• Power Pivot works best with 64-bit Excel, which can use more memory than 32-bit Excel (helpful, but not critical)
• Note: make sure you’re running a 64-bit operating system and that you’ve updated Office to the 64-bit version

• Power Pivot menus, features and tools have evolved over time; what you see on your screen may differ from
what you see on mine, but the fundamental skills and concepts covered are universally applicable
• Even if you have a compatible version of Excel, you may need to enable the Power Pivot or Power Query
plug-ins to access the tools in this course (File > Options > Add-Ins > Manage: COM Add-Ins)
GETTING TO KNOW THE FOODMART DATABASE

• Throughout the course, we’ll be using sample data from a fictional supermarket chain
called “FoodMart”*
• In addition to daily transactional records from 1997-1998, our data set includes
information about products, customers, stores, and regions
• All files are available for download in the course resources section of your course
dashboard (Course Dashboard > Course Content > All Resources)

“Data” Tables
• Transactions: transaction_date, stock_date, product_id, customer_id, store_id, quantity
• Returns: return_date, product_id, store_id, quantity

“Lookup” Tables
• Customer Lookup: customer_id, customer_acct_num, first_name, last_name, customer_address, etc.
• Calendar Lookup: date, month_num, quarter, year, weekday_num, etc.
• Product Lookup: product_id, product_brand, product_name, product_sku, product_retail_price, etc.
• Store Lookup: store_id, region_id, store_type, store_name, store_street_address, etc.
• Region Lookup: region_id, sales_district, sales_region


SETTING EXPECTATIONS

1 I’m using Excel 365 for PC (365 ProPlus, 64-bit)


• Power Pivot is currently not available for Mac
• What you see on your screen will not always match what you see on mine (especially for Excel 2010 or 2013)

2 This course is designed to get you up & running with Excel’s BI tools
• The goal is to provide a solid foundational understanding of Power Query, Power Pivot and DAX; we may
simplify some concepts to make them easier to grasp, and will not cover some of the more advanced tools
LET’S DO THIS.
INTRO TO “POWER EXCEL”
THE “POWER EXCEL” WORKFLOW

These are Excel’s Business Intelligence tools, all of which are available directly in Excel
(provided you have a compatible version); no additional software is required!

• RAW DATA: Flat files (csv, txt), Excel tables, databases (SQL, Azure), folders, streaming sources, web data, etc.
• POWER QUERY (aka “Get & Transform”): Connect to sources, import data, and apply shaping and transformation tools (ETL)
• DATA MODEL: Create table relationships, add calculated columns, define hierarchies and perspectives, etc.
• POWER PIVOT & DAX: Explore and analyze the entire data model, and create powerful measures using Data Analysis Expressions (DAX)
“THE BEST THING TO HAPPEN TO EXCEL IN 20 YEARS”

• Import and analyze MILLIONS of rows of data in Excel


• Access data from virtually anywhere (database tables, flat files, cloud services, folders, etc.)

• Quickly build models to blend and analyze data across sources


• Instantly connect sources and analyze holistic performance across your entire data model

• Create fully automated data shaping and loading procedures


• Connect to databases and watch data flow through your model with the click of a button

• Define calculated measures using Data Analysis Expressions (DAX)


• No more redundant A1-style “grid” formulas; DAX expressions are flexible, powerful and portable
#1: IMPORT & ANALYZE MILLIONS OF ROWS

When was the last time you loaded 25,000,000 rows of data into Excel?

When you connect to data with Power Query and load it to Excel’s Data Model, the data is compressed and stored in memory, NOT in worksheets (no more 1,048,576 row limit!)
#2: BUILD DATA MODELS TO BLEND SOURCES

This is an example of a Data Model in “Diagram View”, which allows you to create connections between tables

Instead of manually stitching tables together with cell formulas, you create relationships to blend data based on common fields
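
To make the idea of a relationship on a common field concrete, here is a minimal Python sketch (it is not how Power Pivot is implemented). The field names product_id, quantity and product_retail_price come from the FoodMart tables; the sample rows and the helper name `revenue_by_transaction` are invented for illustration.

```python
# Illustrative only: a lookup table keyed by the shared field (product_id)
# and a data table whose rows refer to it. All sample values are made up.
product_lookup = {
    1: {"product_name": "Sample Cola", "product_retail_price": 1.50},
    2: {"product_name": "Sample Chips", "product_retail_price": 2.25},
}

transactions = [
    {"product_id": 1, "quantity": 4},
    {"product_id": 2, "quantity": 2},
    {"product_id": 1, "quantity": 1},
]


def revenue_by_transaction(rows, lookup):
    """Follow the product_id relationship to price each transaction row."""
    return [
        row["quantity"] * lookup[row["product_id"]]["product_retail_price"]
        for row in rows
    ]


revenue = revenue_by_transaction(transactions, product_lookup)
```

The price is stored once in the lookup table and reused by every transaction that shares the same product_id, which is the job a relationship does in place of a per-row lookup formula.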
#3: AUTOMATE YOUR DATA PROCESSING

With Power Query, you can filter, shape and transform your raw data before loading it into the data model

Each step is automatically recorded and saved with the query, and applied whenever the source data is refreshed, like a macro!
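
The "recorded steps replayed on refresh" idea can be sketched in Python as an ordered list of functions. This is only an analogy, not Power Query's actual engine; the step names and sample rows are hypothetical.

```python
# Illustrative only: each "applied step" is a function from table to table.
def remove_blank_rows(rows):
    """Drop rows whose quantity is missing."""
    return [r for r in rows if r.get("quantity") is not None]


def keep_large_orders(rows):
    """Keep only rows with quantity greater than 5."""
    return [r for r in rows if r["quantity"] > 5]


# The saved query is just the ordered list of steps.
applied_steps = [remove_blank_rows, keep_large_orders]


def refresh(source_rows, steps):
    """Replay every recorded step, in order, against the current source data."""
    table = source_rows
    for step in steps:
        table = step(table)
    return table


# The same steps run unchanged when the source gains new rows.
january = [{"quantity": 8}, {"quantity": None}, {"quantity": 2}]
february = january + [{"quantity": 12}]
```

Calling `refresh(february, applied_steps)` reuses the steps defined once, without repeating any manual cleanup.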
WHEN TO USE POWER QUERY & POWER PIVOT

Use Power Query and Power Pivot when you want to…
Analyze more data than can fit into a worksheet

Create connections to databases or external sources

Blend data across multiple large tables

Automate the process of loading and shaping your data

Unleash the full business intelligence capabilities of Excel


POWER QUERY
MEET POWER QUERY

Power Query (aka “Get & Transform”) allows you to:


• Connect to data across a wide range of sources
• Filter, shape, append and transform raw data for further analysis and modeling
• Create saved, repeatable queries to automate your data prep (like a macro!)

The Power Query tools live in the Data tab, under the “Get & Transform” section (Excel 2016)
TYPES OF DATA CONNECTIONS

• From File
• From Database
• From Azure
• From Online Services
• From Other Sources
THE QUERY EDITOR

Query Editor layout (screenshot labels):
• Query Editing Tools
• Formula Bar (this is “M” code)
• Name your table!
• Data Preview
• Applied Steps

Access the Query Editor by creating a new query and choosing the “Edit” option, or by launching
the Workbook Queries pane (Data > Show Queries) and right-clicking an existing query to edit
QUERY EDITOR TOOLS
The HOME tab includes general settings and common table transformation tools

The TRANSFORM tab includes tools to modify existing columns (splitting/grouping, transposing, extracting text, etc.)

The ADD COLUMN tools create new columns based on conditional rules, text operations, calculations, dates, etc.
DATA LOADING OPTIONS

When you load data from Power Query, you have several options:
• Table
   • Stores the data in a new or existing worksheet
   • Requires relatively small data sets (under 1 million rows)

• Connection Only
   • Saves the data connection settings and applied steps
   • Data does not load to a worksheet

• Add to Data Model
   • Compresses and loads data to Excel’s Data Model
   • Makes data accessible to Power Pivot for further analysis
BASIC TABLE TRANSFORMATIONS
• Sort values (A-Z, Low-High, etc.)
• Change data types (date, $, %, text, etc.)
• Promote header row
• Duplicate, move & rename columns
   Tip: Right-click the column header to access common tools
• Keep or remove columns
   Tip: use the “Remove Other Columns” option if you always want a specific set
• Keep or remove rows
   Tip: use the “Remove Duplicates” option to create a new lookup table from scratch
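
As a rough illustration of what these transformations do to a table, here is a short Python sketch. It does not use Power Query; the column names come from the FoodMart Store Lookup table and the sample values are invented.

```python
# Illustrative only: raw rows as they might arrive from a flat file.
raw = [
    ["store_id", "store_type", "region_id"],  # header arrives as a data row
    ["2", "Deluxe Supermarket", "10"],
    ["1", "Supermarket", "10"],
    ["1", "Supermarket", "10"],               # duplicate row
]

# Promote header row: the first row becomes the column names.
header, *body = raw
rows = [dict(zip(header, values)) for values in body]

# Change data types: the ID columns are text until converted.
for row in rows:
    row["store_id"] = int(row["store_id"])
    row["region_id"] = int(row["region_id"])

# Remove other columns: keep only the set we always want.
keep = ["store_id", "store_type"]
rows = [{name: row[name] for name in keep} for row in rows]

# Remove duplicates: keep the first occurrence of each distinct row.
seen = set()
unique_rows = []
for row in rows:
    key = tuple(row.items())
    if key not in seen:
        seen.add(key)
        unique_rows.append(row)

# Sort values: low to high by store_id.
unique_rows.sort(key=lambda r: r["store_id"])
```

The result is a small, de-duplicated table with one row per store, which is the shape a lookup table needs.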
TEXT-SPECIFIC TOOLS

• Extract characters from a text column using a fixed length, first or last, or a defined range
• Split a text column based on either a specific delimiter or a number of characters
   Tip: Select two or more columns to merge or concatenate fields
• Format a text column to upper, lower or proper case, or add a prefix or suffix
   Tip: Use “Trim” to eliminate leading & trailing spaces, or “Clean” to remove non-printable characters

HEY THIS IS IMPORTANT!
You can access many of these tools in both the “Transform” and “Add Column” menus; the difference is whether you want to add a new column or modify an existing one
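
The text tools above map onto familiar string operations. The Python sketch below is an analogy only (Power Query's own Trim and Clean may differ in edge cases); the sample value and the "CUST-" prefix are made up.

```python
# Illustrative only: a messy text value with padding and a control character.
raw_name = "  smith, JOHN\x07 "

# "Clean" then "Trim": drop non-printable characters, then outer spaces.
cleaned = "".join(ch for ch in raw_name if ch.isprintable()).strip()

# Split by delimiter: one column becomes two.
last_name, first_name = [part.strip() for part in cleaned.split(",")]

# Format: proper case.
last_name, first_name = last_name.title(), first_name.title()

# Merge columns: concatenate two fields with a separator.
full_name = f"{first_name} {last_name}"

# Extract: first character of a column.
initial = first_name[:1]

# Format: upper case plus a prefix (the prefix text is hypothetical).
customer_label = "CUST-" + full_name.upper()
```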
NUMBER-SPECIFIC TOOLS

• Statistics functions allow you to evaluate basic stats for the selected column (sum, min/max, average, count, count distinct, etc.)
   Note: These tools return a SINGLE value, and are commonly used to explore a table rather than prepare it for loading
• Standard, Scientific and Trigonometry tools allow you to apply standard operations (addition, multiplication, division, etc.) or more advanced calculations (power, logarithm, sine, tangent, etc.) to each value in a column
   Note: Unlike the Statistics options, these tools are applied to each individual row in the table
• Information tools allow you to define binary flags (TRUE/FALSE or 1/0) to mark each row in a column as even, odd, positive or negative
DATE-SPECIFIC TOOLS

Date & Time tools are relatively straightforward, and include the following options:
• Age: Difference between the current time and the date in each row
• Date Only: Removes the time component of a date/time field
• Year/Month/Quarter/Week/Day: Extracts individual components from a date field
(Time-specific options include Hour, Minute, Second, etc.)
• Earliest/Latest: Evaluates the earliest or latest date from a column as a single value (can
only be accessed from the “Transform” menu)

Note: You will almost always want to perform these operations from the “Add Column” menu to
build out new fields, rather than transforming an individual date/time column
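
A few of these options, sketched in Python with the standard datetime module. The timestamps are invented, and the "current time" is passed in explicitly (a choice made here so the example is repeatable; the Age tool itself uses the time of the refresh).

```python
from datetime import datetime

# Illustrative only: a date/time column.
stamps = [
    datetime(1997, 3, 14, 9, 30),
    datetime(1998, 11, 2, 17, 5),
    datetime(1997, 1, 1, 0, 0),
]


def age_in_days(value, now):
    """Age: difference between the current time and the value, in whole days."""
    return (now - value).days


# Date Only: drop the time component of each value.
date_only = [s.date() for s in stamps]

# Hour: extract one time component per row.
hours = [s.hour for s in stamps]

# Earliest / Latest: the whole column collapses to a single value.
earliest = min(stamps)
latest = max(stamps)
```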

PRO TIP:
Load up a table containing a single date column and use Date tools to build out an entire calendar table
CREATING A BASIC CALENDAR TABLE

Use pre-defined Date options in the “Add Column” menu to quickly build out a calendar table from a list of dates
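
For comparison, this is roughly what a calendar table built from a single date column looks like, sketched in Python. The column names match the FoodMart Calendar Lookup fields; the function name `build_calendar` and the Monday = 1 weekday numbering are assumptions made for this example.

```python
from datetime import date, timedelta


def build_calendar(start, end):
    """Return one row per day from start to end (inclusive) with derived fields."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date": current,
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month_num": current.month,
            # Assumption: ISO numbering, Monday = 1 through Sunday = 7.
            "weekday_num": current.isoweekday(),
        })
        current += timedelta(days=1)
    return rows


# The FoodMart transactions span 1997-1998.
calendar = build_calendar(date(1997, 1, 1), date(1998, 12, 31))
```

Every derived column is computed from the date alone, which is why a single date column is enough to start from.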
ADDING AN INDEX COLUMN

Index Columns contain a list of sequential values that can be used to identify each unique row in a table (typically starting from 0 or 1)

These columns are often used to create unique IDs that can be used to form relationships between tables (more on that later!)
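
A minimal Python sketch of an index column, assuming a hypothetical helper named `add_index_column` and invented sample rows:

```python
def add_index_column(rows, start=1, name="index"):
    """Add a sequential value to each row, starting from `start`."""
    return [{name: i, **row} for i, row in enumerate(rows, start=start)]


# Two rows that are otherwise identical become distinguishable.
returns = [
    {"product_id": 7, "quantity": 1},
    {"product_id": 7, "quantity": 1},
]
indexed = add_index_column(returns)
```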
ADDING A CONDITIONAL COLUMN

Conditional Columns allow you to define new fields based on logical rules and conditions (IF/THEN statements)

In this case we’re creating a new conditional column called “Order Size”, which depends on the values in the “quantity” column, as follows:
• If quantity > 5, Order Size = “Large”
• If quantity is from 2 to 5, Order Size = “Medium”
• If quantity = 1, Order Size = “Small”
• Otherwise Order Size = “Other”
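
The same rules written as a Python function, as a sketch of the logic only (the function name `order_size` is chosen for this example). The rules are checked top to bottom and the first match wins.

```python
def order_size(quantity):
    """Classify an order using the Order Size rules, checked in order."""
    if quantity > 5:
        return "Large"
    if 2 <= quantity <= 5:
        return "Medium"
    if quantity == 1:
        return "Small"
    return "Other"


# Apply the rule to every row to build the new column.
quantities = [8, 5, 2, 1, 0]
order_sizes = [order_size(q) for q in quantities]
```

The "Other" branch catches anything the first three rules miss, such as a quantity of 0 or a negative value.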
POWER QUERY BEST PRACTICES

Give your queries clear and intuitive names, before loading the data
• Define names immediately; updating query & table names later can be a headache,
especially if you’ve already referenced them in calculated measures
• Don’t use spaces in table names (otherwise you have to surround them with single quotes)

Do as much shaping as possible at the source of the data
• Shaping data at the source (e.g. SQL, Access) minimizes the need for complex procedures in
Power Query, and allows you to create new models without replicating the same process

When working with large tables, only load the data you need
• Don’t include hourly data when you only need daily, or product-level transactions when
you only care about store-level performance; extra data will only slow you down
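
As an illustration of loading only the grain you need, the Python sketch below rolls hourly rows up to one row per store per day before anything is loaded. The `hour` field and all sample values are hypothetical; store_id and quantity come from the FoodMart Transactions table.

```python
from collections import defaultdict

# Illustrative only: hourly rows (more detail than the analysis needs).
hourly = [
    {"store_id": 1, "date": "1997-01-01", "hour": 9, "quantity": 3},
    {"store_id": 1, "date": "1997-01-01", "hour": 14, "quantity": 5},
    {"store_id": 1, "date": "1997-01-02", "hour": 10, "quantity": 2},
    {"store_id": 2, "date": "1997-01-01", "hour": 9, "quantity": 4},
]

# Sum quantity per (store_id, date), discarding the hour.
totals = defaultdict(int)
for row in hourly:
    totals[(row["store_id"], row["date"])] += row["quantity"]

daily = [
    {"store_id": store_id, "date": day, "quantity": quantity}
    for (store_id, day), quantity in sorted(totals.items())
]
```

Four hourly rows become three daily rows here; on real data the reduction is usually much larger.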
THANK YOU!
