01 - Introduction To Data Analytics

A typical day for a data analyst involves cleaning, organizing, and analyzing data to identify trends and insights. They collaborate closely with other teams to understand business needs and communicate findings. The work involves using various tools like Excel, SQL, and visualization software. Key challenges include wrangling messy data and effectively telling the story of the data.

Data Analytics

Welcome to Data Analytics

WELCOME TO GA
GENERAL ASSEMBLY
What You’ll Learn Today

In this lesson, we’ll:

● Outline goals, expectations, and logistics.
● Define data analytics and compare it with other data fields.
● Identify the skills and mindset of a successful data analyst.
● Discuss the discipline of data analytics, including topics such as data formats and data ethics.
● Identify tools and topics within data analytics.

2 | © 2023 General Assembly


General Assembly is a global community of
individuals empowered to pursue work they love.

● 500,000+ workshop attendees
● 30+ global campuses
● 70,000+ course alumni
● 20,000+ expert instructors

Meet Your Instructor

Bharath Kumar
Data Analytics and Software Engineering Instructor

• Currently working as a Lead Instructor for Data Analytics and Software Engineering at General Assembly Malaysia.
• More than thirteen years of experience working for various financial organizations in Malaysia, including HSBC, Allianz, Hong Leong Bank, and CIMB Bank.
• Cloudera Certified Spark and Hadoop Developer and a multi-cloud certified engineer with certifications from AWS, Azure, Alibaba, and IBM.
• Strong expertise in designing and executing solutions for large-scale data warehousing, real-time analytics, and reporting.
• Extensive knowledge of Java, Hadoop, MapReduce, Hive, HBase, Spark, Kafka, Apache Airflow, JavaScript, and Python, plus experience with various ETL and DevOps tools.



Meet Your Instructor / Teaching Associate

Sanjeevani Shere
Python and Data Science Mentor

● Currently working as a Python Mentor at Code Young India.
● More than four years of experience working as a Python and data science mentor at various companies, such as ID Tech and Campk12.



Meet Your Instructor / Teaching Assistant

Ruzaini Amiraa Roslan


Content Creator, YouTube / WordPress

● Data scientist with 4+ years of experience in the field of data science and analytics.
● Experienced in data analysis, data visualization, machine learning, statistics, and deep learning.
● Programming languages: Python, R, SQL, HTML, CSS, PHP.
● Academic qualifications: BSc (Hons.) Statistics, Universiti Teknologi MARA; MSc (Data Science and Analytics), Universiti Kebangsaan Malaysia.
● Certifications from Python, Machine Learning, and Deep Learning courses on Kaggle and DataCamp.



Group Exercise:
15 minutes
Getting to Know You! (Icebreaker!)

Introduce yourself and say hi to your classmates!

Please include:

● Your name and your job.
● Why you’re taking this class.
● Fun fact: What was the first thing you did after the pandemic?
● Whether you’re familiar with Excel, SQL, or Tableau.



Orientation

● No such thing as a silly question.
● Slides and data files are in Google Drive.
● Your time: limit distractions.
● Last class: Thu 19 October.



Orientation

● Out-of-class support and working as a team
● Camera on, mic off
● Pulse check
● Timing and pace



At GA, we create norms for how we’ll work together during the course. Check out the working norms. Is there anything else we should add to the list?

● Be Present
● Work Hard
● Contribute Constructively
● Ask Questions
● Be Supportive
● Talk to Us!



Buckle Up for the Journey Ahead!



Gaining a New Idea: The Path

Confusion, then optimism, then mastery (figure: variance plotted over time).



Our Teaching Method

Effortful thought



Where We’re Going | Lesson by Lesson
1. Introduction to Data Analytics
2. Data Cleaning and Formulas
3. Referencing and Lookups
4. Aggregating Data With PivotTables
5. Communicating With Excel
6. Introduction to SQL
7. Grouping and CASE WHEN in SQL
8. JOINs and Merges
9. Subqueries in SQL
10. Introduction to Tableau
11. Data Manipulation in Tableau
12. Dashboards in Tableau
13. Data Narratives in Tableau
14. Final Presentation


You’ll Leave This Course Saying...

“I am no longer intimidated by rows and rows of data! I can combine, clean, and visualize data to gain insights into important business trends.”

“I feel empowered to continue learning new techniques and acquiring new ways of working with data.”

“Converting numbers into visually appealing, easy-to-understand visualizations is such a creative process. It’s a lot of fun!”



Introduction to Data Analytics

Course Logistics

Here for You

Office Hours

Calendly links are updated to

Classroom

Google Classroom and Telegram: used for course content, announcements, and class discussions.



Graduation Requirements

Complete 80% of homework assignments.

Maintain consistent attendance.

Complete and submit the final project.



Homework
● Homework may be assigned at the end of each lesson. It generally consists of pre-lesson work in myGA (optional), additional practice as assigned by your instructor, and project work (required).
○ To submit your project work, you’ll be given a link to your own Google Drive folder, which you’ll share with your instructor.
○ This folder will be your work repository, where you upload your project work and any assignments your instructor requires.
● Grading: The pre-lesson work is not graded. Project work will be graded after your project presentation.



Final Project Presentation

At the end of the course, you’ll give a 5–7 minute presentation that should
address the following:
● Who is your target audience?
● What problem are you solving?
● What is your solution and how does it solve the problem?
● What are some insights and trends you want to share, and why are they important?
● How do you tell the story of your data through data narratives and visualization?



Introduction to Data Analytics

The Role of a Data Analyst

Discussion:
15 minutes
What Does a Data Analyst Do?

Maybe you already know what a data analyst does on a daily basis, or maybe
you’re totally new to the field. Focus on 2–3 of the points below and share your
experience:

● What does the typical work day of a DA look like?
● How does a DA collaborate with others?
● What is it like to work for a small startup versus a mid-sized or large company?
● What are some fun things about the job?
● What are some challenges?
● What's the most important skill to have as a DA? (This can also include soft skills.)



Data Analyst: The Storyteller

● Uses Excel, SQL (or NoSQL), Python, R, and visualization software like Tableau, Qlik Sense, and Power BI.
● Explores data and presents trends and insights.

Data Scientist: The Wizard

● Uses Python, R, SAS, SQL, MATLAB, Hive, and Pig.
● Works with data using algorithms, machine learning, and AI.
● Also tells data stories, but with more of a focus on math and coding.



Data Analytics vs Data Science

Data Analytics: domain knowledge and descriptive statistics.

Data Science: computer programming, domain knowledge, and linear algebra and inferential statistics.

Warning: A gross over-simplification!!



Data Analytics vs Data Science

Data Analytics
● Preparing data: obtaining data, cleaning data
● Understanding data: analysis, investigation
● Crafting a ‘story’: communication, data visualisation, presentation

Data Science
● Preparing data: obtaining data, cleaning data
● Model creation: hypothesis creation/testing, model training, model refinement
● Crafting a ‘story’: communication, data visualisation, presentation



Data Scientist
• Clean data
• Massage data
• Organize data
• Build digestible/predictive/prescriptive models
• Build BI models

Data Analyst
• Collect data
• Process data
• Analyze data
• Build analytics descriptive/diagnostic models
• Build BI models

Data Engineer
• Software engineers
• Develop databases
• Prepare data for
• Maintain data sources
• Data lake, data warehouse owner

Data Architect
• Design and plan data architecture
• Plan data pipeline
• Plan infrastructure
• Domain expert who understands the business problem

Data Admin
• Data owner
• Ensures pipelines are in place
• Responds to any downtime

How many of us have encountered this?

Storytelling can make or break your organization.

Stop showing the data
"This chart shows me the data, but so what?"

Start storytelling with data
Communicate insight through data.

Developing an Analytical Mindset

● Be curious and ready to learn new things: Ask lots of questions. Explore new techniques and new ways of looking at data.
● Practice, practice, practice! Find data sets online and take them through the DA Workflow, from framing all the way to communicating.
● Follow what the data tells you: Never twist your analysis to support your initial hypothesis.



Developing an Analytical Mindset (Cont.)

● Stay ORGANIZED! Keep your spreadsheets tidy and easy to understand.
● Document, document, document! Maintain lists of what you delete or change when cleaning data, and comment your code.

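One way to keep that list is to have the cleaning code write it for you. The sketch below is a minimal illustration that assumes records arrive as a list of Python dicts. The `clean_records` helper, the `name`/`sales` fields, and the log wording are hypothetical choices, not a GA convention.

```python
def clean_records(records):
    """Drop rows with a missing name and trim whitespace, logging every change.

    Hypothetical helper: field names and log format are illustrative only.
    """
    cleaned = []
    log = []  # human-readable record of every deletion or change
    for i, row in enumerate(records):
        name = row.get("name")
        if name is None or name.strip() == "":
            log.append(f"row {i}: deleted (missing name)")
            continue
        trimmed = name.strip()
        if trimmed != name:
            log.append(f"row {i}: trimmed whitespace in name {name!r} -> {trimmed!r}")
        cleaned.append({**row, "name": trimmed})
    return cleaned, log


# Made-up example rows.
raw = [
    {"name": " Alice ", "sales": 120},
    {"name": "", "sales": 80},
    {"name": "Bob", "sales": 95},
]
rows, changes = clean_records(raw)
for entry in changes:
    print(entry)
```

You could save the returned log alongside the cleaned file, so anyone reviewing the analysis can see exactly what was removed or altered.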


So… What Does Data Actually Look Like?
Rows and
columns?!

Image source: Data Mining Data Set Reports
The Five Vs (or Characteristics) of Big Data

Volume: Consider the scale of the data (big or small) and its structure.

Velocity: Understand data sources, timing, and flow.

Variety: What forms and types are required to answer questions?

Veracity: Verify the quality, accuracy, and reliability of sources.

Value: What are the metrics or measurements for desired outcomes?

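Veracity checks can start very simply. The sketch below assumes records are Python dicts and counts three common quality problems: missing values, repeated keys, and out-of-range numbers. The `profile_quality` helper, the field names, the 0 to 120 valid range, and the data values are all made-up placeholders.

```python
def profile_quality(rows, key, numeric_field, valid_range):
    """Count missing values, repeated keys, and out-of-range numbers.

    Hypothetical helper for illustration; real profiling would check more.
    """
    low, high = valid_range
    seen = set()
    report = {"missing": 0, "repeated_keys": 0, "out_of_range": 0}
    for row in rows:
        value = row.get(numeric_field)
        if value is None:
            report["missing"] += 1
        elif not (low <= value <= high):
            report["out_of_range"] += 1
        if row[key] in seen:
            report["repeated_keys"] += 1
        seen.add(row[key])
    return report


# Placeholder data, not real statistics.
data = [
    {"country_code": "AAA", "life_expectancy": 70.5},
    {"country_code": "BBB", "life_expectancy": None},
    {"country_code": "AAA", "life_expectancy": 70.5},
    {"country_code": "CCC", "life_expectancy": 250.0},
]
print(profile_quality(data, "country_code", "life_expectancy", (0, 120)))
```

A non-zero count does not mean a row is necessarily wrong. It tells you where to look before trusting a source.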


The DA Workflow
Frame: Develop hypothesis-driven
questions for your analysis.
Extract: Select and import relevant data.
Wrangle/Prepare: Clean and prepare
relevant data.
Analyze: Structure, comprehend, and
visualize data.
Interpret: Leverage your analysis to make
decisions and recommendations.
Communicate: Present data-driven
findings and insights in a compelling
manner.
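To make the six steps concrete, here is a toy end-to-end sketch in Python, with one function per workflow step. It is an illustration under stated assumptions, not how the course implements the workflow (the course itself uses Excel, SQL, and Tableau). The question, the CSV text, and every function name are hypothetical.

```python
import csv
import io

# Frame: a hypothetical, hypothesis-driven question.
QUESTION = "Which region had the highest total sales?"

# Extract: made-up CSV text stands in for a real file or database query.
RAW_CSV = """region,sales
North, 100
South,250
North,
East,175
South,50
"""


def extract(text):
    """Read CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def wrangle(rows):
    """Trim whitespace, drop rows without a sales value, convert sales to numbers."""
    clean = []
    for row in rows:
        sales = row["sales"].strip()
        if not sales:
            continue  # missing value: drop the row
        clean.append({"region": row["region"].strip(), "sales": float(sales)})
    return clean


def analyze(rows):
    """Total sales per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["sales"]
    return totals


def interpret(totals):
    """Pick the region with the largest total."""
    return max(totals, key=totals.get)


def communicate(question, totals, answer):
    """Print a short, sorted summary."""
    print(question)
    for region, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"  {region}: {total:,.0f}")
    print(f"Answer: {answer}")


totals = analyze(wrangle(extract(RAW_CSV)))
communicate(QUESTION, totals, interpret(totals))
```

Even in this tiny example, most of the code goes into extracting and wrangling the data rather than into the analysis itself.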



The DA Workflow
Frame: Develop hypothesis-driven
questions for your analysis.
Extract: Select and import relevant data.
Wrangle/Prepare: Clean and prepare
relevant data.
Analyze: Structure, comprehend, and
visualize data.
80% of the time
Interpret: Leverage your analysis to make
decisions and recommendations.
Communicate: Present data-driven
findings and insights in a compelling
manner.



Introduction to Data Analytics

Data Formats

Let’s Talk About Data Formats

You’ll be looking at a lot of data throughout this course.

The formatting of data can make a real difference in your work as a DA!



Discussion:
2 minutes
Data Formats | Columns

Take a look at this example:

● What do you notice?
● What can we do to improve it?

Share your answers with the class.



Discussion:
2 minutes
Data Formats | Columns

How about now?

● What’s changed?
● What makes this
version better?



Partner Exercise:
5 minutes
Data Formats | Columns and Rows

Pair up with a classmate and take a look at the example below.

● What do you notice? What does this data set tell you?
● What can we do to improve it?



Data Formats | Columns and Rows

Here’s a better way!

One column for each variable: country name, country code, year, and life expectancy.

It’s OK if some data are repeated!

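The same idea can be sketched in code. This minimal example assumes a "wide" starting table with one column per year. The country names, codes, and values are placeholders, not real statistics, and `wide_to_long` is a hypothetical helper. It turns each year column into its own row, so that every variable ends up in a single column and the country name and code repeat on each row.

```python
# Wide layout: one column per year (placeholder values, not real statistics).
wide = [
    {"country_name": "Country A", "country_code": "AAA", "2000": 70.1, "2010": 72.4},
    {"country_name": "Country B", "country_code": "BBB", "2000": 65.0, "2010": 68.3},
]


def wide_to_long(rows, id_fields, value_name):
    """Hypothetical helper: one output row per (identifier, year) pair."""
    long_rows = []
    for row in rows:
        for field, value in row.items():
            if field in id_fields:
                continue  # identifier columns are copied, not unpivoted
            record = {f: row[f] for f in id_fields}
            record["year"] = int(field)  # the old column header becomes a value
            record[value_name] = value
            long_rows.append(record)
    return long_rows


tidy = wide_to_long(wide, ("country_name", "country_code"), "life_expectancy")
for r in tidy:
    print(r)
```

In pandas, `pandas.melt(df, id_vars=["country_name", "country_code"], var_name="year", value_name="life_expectancy")` performs the same reshape. The resulting year column holds the old headers as strings until you convert it.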


Data Tools

How many do you think there are?

Examples of Tools for Analytics

● Workbooks/sheets
● Analytic databases
● Visualization tools



Introduction to Data Analytics

Data Ethics

Data Ethics

Data ethics is about the responsible and sustainable use of data — doing the right thing for people and society. It refers to the principles and values on which human rights and personal data protection laws are based.

— DataEthics.eu



Data Ethics Principles

- Privacy
- Transparency
- Fairness
- Accountability
- Responsibility



Real Cases:
Netflix and… Data Ethics

According to a McKinsey
report, 75% of Netflix viewing
decisions result from product
recommendations. This raises
ethical implications such as:

● Addictiveness
● Radicalized content
● Privacy

Source: How Retailers Can Keep Up With Consumers
Deepfakes
Deepfakes pose ethical problems everywhere.
Real Cases:
A Healthier Landscape for Product Recommendations

It’s certainly possible! Canopy is working on a recommendation system that:

● Looks for signs of quality.
● Makes suggestions without centralized data collection.
● Runs the recommendation algorithms on a person’s device.
● Shares only anonymized usage data with company servers.

Source: The People Trying to Make Internet Recs Less Toxic
Partner Exercise:
10 minutes
Doing Our Part as Data Analysts

Discuss the following with your partner. We’ll regroup after five minutes.

● What does it mean to use data ethically?
● What are some ways you’ve seen data being used unethically?
● What role do data analysts play in using data ethically?



Group Exercise:
15 minutes
More Examples of Data Ethics Issues

● How did Google and Facebook run into problems with data?
● Biased data reporting
● Prediction models
● The accessibility of datasets: what do you do with them?
● Also check out deepfakes, and give an example of a good use of deepfakes.



Introduction to Data Analytics

Wrapping Up

Recap

Today, we:
● Outlined goals, expectations, and logistics.
● Identified the skills and mindset of a successful data analyst.
● Discussed the discipline of data analytics, including topics such as data formats and data ethics.

Looking Ahead

● Optional myGA lessons:
  ○ Exploring Data (unit)
    ■ Data Profiling
    ■ Probing Data With Logical Functions
  ○ Data Wrangling (unit)
    ■ Cleaning Your Data

Up Next: Data Cleaning and Formulas



Q&A

