Data Science Introduction

▪ INTRODUCTION TO DATA SCIENCE, OVERVIEW OF DATA TOOLS IN DATA SCIENCE, DATA SCIENCE METHODOLOGY
▪ DATA REQUIREMENTS, DATA UNDERSTANDING, DATA PREPARATION, DATA MODELLING
▪ MODEL EVALUATION, DEPLOYMENT, MODEL FEEDBACK, OVERVIEW OF STRATEGIC IMPACT OF BAI ACROSS KEY INDUSTRIES
▪ ANALYTICS 3.0, THE NATURE OF ANALYTICAL COMPETITION, WHAT MAKES ANALYTICAL COMPETITOR
▪ ANALYTICS AND BUSINESS PERFORMANCE, COMPETING ON ANALYTICS WITH INTERNAL AND EXTERNAL PROCESS
▪ A ROAD MAP TO ANALYTICAL CAPABILITIES
▪ MANAGING ANALYTICAL PEOPLE
▪ THE ARCHITECTURE OF BUSINESS INTELLIGENCE
▪ ESSENTIAL PRACTICE SKILLS FOR HIGH IMPACT ANALYTICAL PROJECTS
▪ LISTENING TO CLIENT, FRAMING CENTRAL PROBLEM AND SCOPING A PROJECT

Introduction to Data Science

▪ Data science has emerged as a critical field for businesses in
today's data-driven world. It involves the application of
scientific methods, statistical analysis, and computational
techniques to extract valuable insights and knowledge from
large and complex datasets. By leveraging data science,
businesses can make informed decisions, enhance
operational efficiency, and gain a competitive edge.
▪ In the context of business, data science focuses on
extracting meaningful information from various data sources,
including customer transactions, social media interactions,
website traffic, sensor data, and more. The goal is to uncover
patterns, trends, and relationships within the data that can
drive actionable insights and support evidence-based
decision-making.
Introduction to Business Analytics

▪ The act of working with factual information in an organization
▪ Uses appropriate tools
▪ Identifies nuggets of wisdom
▪ Helps in decision making
Business Analytics and Business
Intelligence

▪ BI is about gleaning information from past data sources
in the hope of deriving information that is useful for
decision makers.
▪ Business Analytics is driven by an objective: to find specific
insights, or to test and validate hunches that the
organization and its managers may have, using
appropriate tools and techniques, and to plan for future
trends.
Business Analytics and Business Process
Management

▪ BI is about gleaning information from past data sources
in the hope of deriving information that is useful for
decision makers.
▪ Business Analytics is driven by an objective: to find specific
insights, or to test and validate hunches that the
organization and its managers may have, using
appropriate tools and techniques, and to plan for future
trends.
Process of Data Science

Data Collection

Data Cleaning and Preprocessing

Exploratory Data Analysis

Feature Engineering

Machine Learning and Statistical Models

Model Evaluation and Validation

Insights and decision making
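
The stages listed above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the (ad_spend, revenue) records are invented, the model is a hand-written least-squares line instead of a library model, and feature engineering is skipped because there is a single input column.

```python
from statistics import mean

# Data collection: hypothetical (ad_spend, revenue) records; None marks a missing value.
raw = [(1.0, 3.1), (2.0, 4.9), (None, 5.0), (3.0, 7.2), (4.0, 8.8),
       (5.0, 11.1), (6.0, None), (7.0, 15.0), (8.0, 17.1)]

# Data cleaning and preprocessing: drop incomplete records.
rows = [(x, y) for x, y in raw if x is not None and y is not None]

# Exploratory data analysis: basic summary statistics.
summary = {
    "n": len(rows),
    "mean_x": mean(x for x, _ in rows),
    "mean_y": mean(y for _, y in rows),
}

# Hold out the last two records so the model is evaluated on unseen data.
train, test = rows[:-2], rows[-2:]

# Modelling: ordinary least squares for y = a + b * x.
mx = mean(x for x, _ in train)
my = mean(y for _, y in train)
b = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
a = my - b * mx

# Model evaluation: mean absolute error on the held-out records.
mae = mean(abs(y - (a + b * x)) for x, y in test)

# Insight: the slope b estimates the revenue change per unit of ad spend.
print(summary, round(a, 2), round(b, 2), round(mae, 3))
```

In a real project each stage would be far larger, but the order of the steps and the separation of training data from evaluation data carry over.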


Overview of Tools in Data Science
1. Programming Languages
2. Data Manipulation and Analysis
3. Data Visualization
4. Machine Learning and Data Modelling
5. Big Data Processing
6. Data Integration and Workflow

Programming Languages: Python, R
Data Manipulation and Analysis: Pandas, SQL
Data Visualization: Matplotlib, Seaborn
Machine Learning and Data Modelling: Scikit-learn, TensorFlow and Keras, PyTorch
Big Data Processing: Apache Spark, Hadoop
Data Integration and Workflow: Apache Airflow, KNIME
DATA SCIENCE AND METHODOLOGY

▪ CRISP-DM (Cross-Industry Standard Process for Data Mining)
It has six phases:
Business Understanding
Data Understanding
Data Preparation
Modelling
Evaluation
Deployment

▪ OSEMN (Obtain, Scrub, Explore, Model, iNterpret)
1. Data science methodology that provides a sequential framework for data analysis.
2. It begins with obtaining and collecting relevant data, followed by data cleaning and preprocessing (scrubbing).
3. Next, exploratory data analysis (EDA) techniques are applied to gain insights and understand the data. Modeling involves building and training machine learning models, and interpretation focuses on deriving meaningful conclusions and actionable insights from the results.
DATA SCIENCE AND METHODOLOGY

Hypothesis-Driven Approach:
▪ Formulating hypotheses based on domain knowledge and prior understanding, designing experiments or analyses to test these hypotheses, and drawing conclusions based on the results.
▪ It emphasizes the use of statistical inference and hypothesis testing to validate or reject hypotheses.

Agile Data Science:
▪ Borrowing from agile software development, this methodology emphasizes an iterative and collaborative approach to data science projects.
▪ It involves breaking down the project into smaller, manageable tasks, prioritizing them, and delivering incremental results. Feedback loops and regular communication with stakeholders are integral to this methodology.
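
As a rough illustration of the hypothesis-driven approach, the sketch below tests the hypothesis "the treatment changes the average order value" with a permutation test. The two samples, the function name `permutation_test`, and the choice of a permutation test (one of many possible tests) are assumptions made for the example.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (b minus a) and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle them and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical average order values for a control and a treatment group.
control = [20.1, 22.4, 19.8, 21.0, 20.5, 23.1, 19.5, 21.7]
treatment = [24.0, 25.2, 23.8, 26.1, 24.9, 22.9, 25.5, 24.4]

observed, p_value = permutation_test(control, treatment)
print(f"observed difference = {observed:.4f}, p-value = {p_value:.4f}")
```

A small p-value is evidence against the null hypothesis of no difference; whether the difference matters for the business is a separate judgement.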
DATA SCIENCE AND METHODOLOGY

Bayesian Inference:
▪ Bayesian inference is a probabilistic methodology that combines prior knowledge and data to make inferences and update beliefs.
▪ It involves defining prior probabilities, gathering data, and using Bayes' theorem to calculate posterior probabilities. Bayesian methods are particularly useful when dealing with uncertainty and incorporating prior knowledge.

Experimental Design:
▪ Experimental design focuses on planning and conducting controlled experiments to investigate causal relationships. It involves defining the experimental factors, selecting appropriate variables, randomization, and statistical analysis to draw valid conclusions.
▪ This methodology helps establish cause-and-effect relationships and supports decision-making based on experimental results.
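
The prior-to-posterior update in Bayesian inference can be sketched with a Beta prior on a conversion rate, which Bayes' theorem turns into a Beta posterior after observing binomial data. The prior parameters and the observed counts below are hypothetical, and the helper names are illustrative only.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    With a Beta(alpha, beta) prior and a binomial likelihood, Bayes' theorem
    gives a Beta(alpha + successes, beta + failures) posterior.
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior belief: conversion rate around 20% (Beta(2, 8)).
prior = (2, 8)

# Hypothetical data: 30 conversions out of 100 visitors.
posterior = update_beta(*prior, successes=30, failures=70)

print("prior mean:", beta_mean(*prior))
print("posterior parameters:", posterior)
print("posterior mean:", round(beta_mean(*posterior), 4))
```

The posterior mean lies between the prior mean and the observed rate, and moves closer to the observed rate as more data arrives.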
Data Science and Methods

Cross Validation
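▪ Cross-validation is a technique for evaluating
the performance of predictive models. In its common
k-fold form, the data is partitioned into k subsets
(folds); the model is trained on all folds but one and
tested on the held-out fold, and the procedure repeats
until each fold has served once as the test set.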
▪ This approach helps assess the generalization
capability of the model and identify potential
issues such as overfitting or underfitting.
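
A minimal k-fold cross-validation sketch follows, assuming a simple least-squares line as the model and a small synthetic dataset. The helper names (`k_fold_indices`, `fit_line`, `cross_validate`) are made up for the illustration and are not taken from any library.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(points):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    b = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x, _ in points)
    return my - b * mx, b


def cross_validate(points, k=5):
    """Return the mean absolute error measured on each held-out fold."""
    folds = k_fold_indices(len(points), k)
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except fold i, then test on fold i.
        train = [points[j] for f, fold in enumerate(folds) if f != i for j in fold]
        test = [points[j] for j in test_idx]
        a, b = fit_line(train)
        errors.append(mean(abs(y - (a + b * x)) for x, y in test))
    return errors


# Synthetic data: y = 2x + 1 with a small alternating disturbance.
points = [(float(x), 2.0 * x + 1.0 + (0.5 if x % 2 == 0 else -0.5)) for x in range(20)]

fold_errors = cross_validate(points, k=5)
print("per-fold MAE:", [round(e, 3) for e in fold_errors])
print("average MAE:", round(mean(fold_errors), 3))
```

Fold errors that vary widely, or that are much worse than the training error, are a hint that the model may be overfitting.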
DATA REQUIREMENT – Defining data requirements in data science
Data Type and Sources
Data Volume and Size
Data Quality and Completeness
Data Variables and Features
Data Granularity and Temporality
Data Privacy and Security
Data Accessibility and Availability
Data Governance and Documentation
Ethical Considerations

Clearly defining data requirements at the outset of a data science project helps set expectations, ensures data availability, and guides the subsequent stages of data collection, preprocessing, and analysis.
Regular reassessment and refinement of data requirements may be necessary as the project progresses and new insights are gained.
DATA UNDERSTANDING – Types of data

Primary
Secondary

Structured Data
Semi-Structured Data
Unstructured Data
Metadata (Descriptive, Structural, Administrative)
DATA UNDERSTANDING
• Data understanding is a crucial step in data science for business
analytics.
• It involves gaining a comprehensive understanding of the
available data to extract meaningful insights and support
decision-making.
▪ Data Exploration
▪ Data Profiling
▪ Data Sources and Integration
▪ Data Relationships and Dependencies
▪ Domain Knowledge Integration
▪ Data Sampling and Subset Creation
▪ Data Documentation and Metadata
▪ Data Privacy and Security
Data Preparation steps in Data Science

Data Cleaning

Data Integration

Data Transformation

Feature Selection

Feature Engineering

Data Encoding

Splitting the data
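
Several of the preparation steps above (cleaning, transformation, encoding, splitting) can be sketched on a toy dataset. The records, column names, and the 2:1 split ratio are all assumptions chosen for illustration.

```python
import random
from statistics import median

# Hypothetical customer records; None marks a missing age.
records = [
    {"age": 34, "segment": "retail", "spend": 120.0},
    {"age": None, "segment": "online", "spend": 80.0},
    {"age": 45, "segment": "retail", "spend": 200.0},
    {"age": 29, "segment": "wholesale", "spend": 150.0},
    {"age": 52, "segment": "online", "spend": 95.0},
    {"age": 41, "segment": "retail", "spend": 170.0},
]

# Data cleaning: impute missing ages with the median of the observed ages.
age_median = median(r["age"] for r in records if r["age"] is not None)
for r in records:
    if r["age"] is None:
        r["age"] = age_median

# Data transformation: min-max scale age into the range [0, 1].
age_min = min(r["age"] for r in records)
age_max = max(r["age"] for r in records)

# Data encoding: one-hot encode the categorical "segment" column.
categories = sorted({r["segment"] for r in records})


def to_features(record):
    scaled_age = (record["age"] - age_min) / (age_max - age_min)
    one_hot = [1 if record["segment"] == c else 0 for c in categories]
    return [scaled_age] + one_hot


X = [to_features(r) for r in records]
y = [r["spend"] for r in records]

# Splitting the data: shuffle the row indices, then keep two thirds for training.
indices = list(range(len(X)))
random.Random(42).shuffle(indices)
cut = len(indices) * 2 // 3
train_idx, test_idx = indices[:cut], indices[cut:]

print("categories:", categories)
print("first feature row:", X[0])
print("train rows:", train_idx, "test rows:", test_idx)
```

For brevity the imputation and scaling values here are computed on all rows before splitting; in practice they are usually derived from the training split only, so that no information from the test rows leaks into the model.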


Data Modelling – Data Science and Business Analytics

Descriptive Modelling: focuses on summarizing and describing historical data.
Use: understand patterns, trends, and relationships.
Techniques: statistical analysis, data visualization, and exploratory data analysis (EDA).

Predictive Modelling: aims to make predictions or forecasts based on historical data.
Use: train models on existing data and then apply those models to new or unseen data.
Techniques: linear regression, logistic regression, decision trees, random forests, support

Prescriptive Modelling: providing recommendations or optimization solutions.

Time Series Modelling: capturing and modeling the patterns, trends, and seasonality in the data to make future predictions.
Techniques: ARIMA, exponential smoothing,

Text Mining: extract meaningful information and insights from unstructured text data.
Techniques: sentiment analysis, topic modeling, document classification, and text generation.
Use: analyzing customer feedback, social

Graph Modeling: used to represent and analyze data with complex relationships or networks.
Use: social networks, recommendation systems, fraud detection, and supply chain optimization.
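
Of the time series techniques named above, simple exponential smoothing is the easiest to sketch. The version below tracks only the level of the series (no trend or seasonality), and the sales figures and the smoothing factor `alpha` are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Each level is alpha * current value + (1 - alpha) * previous level,
    starting from the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    smoothed = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


# Hypothetical monthly sales figures.
sales = [10.0, 12.0, 14.0, 16.0]
smoothed = simple_exponential_smoothing(sales, alpha=0.5)

# The one-step-ahead forecast is the last smoothed level.
forecast = smoothed[-1]
print(smoothed, forecast)
```

A larger `alpha` reacts faster to recent values, while a smaller one gives a smoother series. Data with a clear trend or seasonal pattern generally needs an extended method.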
Model Evaluation

Train/Test Split

Confusion Matrix

Performance Metrics

Cross-Validation

Overfitting and underfitting analysis

Feature Importance
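
The confusion matrix and the performance metrics derived from it can be computed directly for a binary classifier. The labels and predictions below are invented for the example, and the function names are illustrative only.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1, guarding against division by zero."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test-set labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

counts = confusion_matrix(y_true, y_pred)
metrics = classification_metrics(*counts)
print("TP, FP, FN, TN:", counts)
print(metrics)
```

Accuracy alone can be misleading when one class is rare, which is why precision and recall are usually reported alongside it.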
Model Feedback

User / Client Feedback

Performance Evaluation

Error Analysis

Domain Expertise

Continuous Improvement

Communication and Documentation

Collaboration and Feedback Loop


Overview of Strategic Impact of BAI across
key industries
HEALTHCARE
RETAIL
FINANCE
DEFENSE
MANUFACTURING
AGRICULTURE
TRANSPORTATION
ENERGY AND UTILITIES

Evolution of Analytics 3.0

▪ Analytics has undergone a transformative journey over the years. Analytics 1.0 was marked by descriptive analytics, where historical data was analyzed to gain insights into past events and trends. This phase provided organizations with a basic understanding of their operations and helped in reporting and visualization.
▪ With the advent of Analytics 2.0, the focus shifted towards predictive analytics. Organizations began utilizing statistical models and algorithms to forecast future outcomes and trends. This enabled them to make more informed decisions and take proactive measures based on data-driven insights.
Analytics 3.0 – the next frontier

▪ Analytics 3.0 represents a paradigm shift in the way data is utilized and
leveraged for decision making. It combines advanced technologies like
AI, machine learning, natural language processing, and big data
analytics to derive actionable insights from complex and diverse
datasets.
▪ One key aspect of Analytics 3.0 is the ability to process and analyze
unstructured data, such as text, images, audio, and video. AI-powered
algorithms can extract valuable information from these data sources,
enabling organizations to gain a deeper understanding of customer
sentiment, preferences, and behavior.
▪ Another defining characteristic of Analytics 3.0 is the integration of real-
time and streaming data analytics. With the proliferation of IoT devices
and sensors, organizations can capture and analyze data in real-time.
This facilitates faster decision making, proactive interventions, and the
ability to respond swiftly to changing market conditions.
Analytics 3.0 - Highlights

Enhanced customer insights

Advanced Risk Management

Process Optimization

Data Driven Decision Making

Innovation and New Business Model


Analytics 3.0 – Challenges and
Considerations

Talent and Skills Gap

Data Privacy and Security

Bias and Fairness

Data Quality and Integration

Interpretability and Explainability

Ethical Use of AI
Nature of Analytical Competition

Data Availability

Technological Advancements

Analytical Talent

Speed and Agility

Innovation and Creativity

Scalability and Infrastructure

Domain Knowledge and Context

Continuous Learning and Improvement


What makes an analytical competitor?

Data-driven Culture
Strong Analytical Skills
Advanced Technology and Tools
Domain Expertise
Scalability and Infrastructure
Innovation and Continuous Learning
Agile and Iterative Approach
Business Impact and Results
Collaboration and Communication
Ethical and Responsible Practices
Competing on analytics with internal and
external process
Internal Process:
Operational Efficiency
Talent Management
Supply Chain Optimization
Risk Management

External Process:
Customer Analytics
Market Intelligence
Sales and Marketing Optimization
Product and Service Innovation
A Roadmap to Analytical Capabilities
For Individual:
Foundation in Data Science
Machine Learning and Predictive Modeling
Advanced Analytical Techniques
Big Data Analytics
Data Visualization and Communication
Domain Expertise
Continuous Learning and Professional Development

For Companies / Teams:
Identify Business Goals and Objectives
Develop a Data Strategy
Build a Data Infrastructure
Hire the Right Talent
Develop Analytical Models
Implement Models
Measure Performance
Refine and Optimize
How to manage analytical people

Foster a Data-Driven Culture

Provide Access to Quality Data and Tools

Set Clear Goals and Expectations

Encourage Autonomy and Creativity

Support Professional Development

Foster Collaboration and Cross-Functional Communication

Recognize and Reward Excellence

Provide Regular Feedback and Support

Balance Workload and Priorities

Encourage Knowledge Sharing


Architecture of Business Process
Management
Architecture of Business Intelligence
From the Architecture of BI

Four Major Components of BI


Data Warehouse: source of data
Business Analytics: a collection of tools for manipulating, mining, and analyzing the data in the data warehouse
Business Performance Management (BPM): for monitoring and analyzing performance
User Interface: browser, portals, and dashboards
Essential practice skills for high impact
analytical projects

Problem Formulation
Data Exploration and Preparation
Data Visualization and Communication
Statistical and Analytical Techniques
Machine Learning and Predictive Modeling
Experimental Design and A/B Testing
Business Acumen
Continuous Learning and Adaptability
Collaboration and Teamwork
Ethical Considerations
Listening to client, framing central problem and
scoping a project in data science and business
analytics - A step by step process
Client Engagement: Start by actively listening to the client's needs, goals, and challenges. Engage in open and meaningful discussions to understand their business context, objectives, and desired outcomes. Ask probing questions to clarify any uncertainties and gain a comprehensive understanding of their requirements.

Identify the Central Problem: Based on the client discussions, distill the information to identify the central problem or opportunity that the project aims to address. Clearly define the problem statement, ensuring that it is specific, measurable, attainable, relevant, and time-bound (SMART). The central problem should capture the essence of the client's challenge and provide a clear focus for the project.

Understand the Business Context: Gain a deeper understanding of the client's industry, market dynamics, competition, and other relevant factors. This knowledge will help you frame the problem in the appropriate business context and identify key drivers and constraints that should be considered in the analysis.

Formulate Objectives and Goals: Collaborate with the client to establish clear project objectives and goals. These should align with the central problem and provide a roadmap for the analysis. Objectives should be specific, measurable, achievable, relevant, and time-bound (SMART). They should outline the desired outcomes, such as increasing revenue, optimizing operations, or improving customer satisfaction.

Scope the Project: Define the boundaries and scope of the project based on the problem statement and objectives. Determine the data sources and variables required for analysis, as well as any limitations or constraints. Identify the key stakeholders involved and establish communication channels and reporting requirements.

Breakdown of Deliverables and Milestones: Collaboratively establish the project's deliverables, including intermediate milestones and final outcomes. This breakdown helps manage client expectations, track progress, and ensure that the project stays on track. Define the specific analyses, models, reports, or recommendations that will be provided at each stage.

Resource Planning: Assess the resources required to execute the project successfully. This includes the
team members' skills, expertise, and availability, as well as any additional data or technology needs. Allocate
resources effectively to ensure a balanced workload and maximize efficiency.
Risk Assessment and Mitigation: Identify potential risks and challenges that could impact the project's
success. Evaluate the likelihood and impact of each risk and develop mitigation strategies to address them.
Communicate these risks to the client and establish contingency plans to manage any unforeseen
circumstances.

Establish Timelines and Deadlines: Create a project timeline with specific deadlines for each milestone
and deliverable. Clearly communicate the timeline to the client, ensuring mutual agreement on the project
schedule. Regularly monitor and update the timeline throughout the project to manage progress effectively.

Obtain Client Agreement: Seek formal client agreement and sign-off on the project scope, objectives,
deliverables, and timeline. This ensures that both parties have a shared understanding of the project's scope
and expectations.
Check if you are able to answer the
following questions

▪ Give an overview of the tools used in Data Science.
▪ What are the four major components of the architecture of BI? Give a skeletal representation of BI architecture.
▪ Describe the evolution of Analytics 3.0.
▪ What are the areas to compete with analytics as internal and external process?
▪ Give the perspectives of understanding the data.
▪ What is the nature of analytical competition? What are the qualities of an analytical competitor?
▪ List your observations on the strategic impact of BAI across key industries.
▪ Give a step-by-step approach to prepare your data for data analytics.
▪ What is data modelling? How do you evaluate and get feedback on a data model?
▪ List and evaluate the data science methods available for implementing business analytics.
