Data Science Introduction
Introduction
INTRODUCTION TO DATA SCIENCE, OVERVIEW OF DATA
TOOLS IN DATA SCIENCE, DATA SCIENCE METHODOLOGY
Tasks: Data Collection, Feature Engineering
Tools: R, SQL, Seaborn, Hadoop, KNIME, TensorFlow and Keras, PyTorch
DATA SCIENCE AND METHODOLOGY
▪ CRISP-DM (Cross-Industry Standard Process for Data Mining)
It has six phases:
Business Understanding
Data Understanding
Data Preparation
Modelling
Evaluation
Deployment
▪ OSEMN (Obtain, Scrub, Explore, Model, iNterpret)
1. A data science methodology that provides a sequential framework for data analysis.
2. It begins with obtaining and collecting relevant data, followed by data cleaning and preprocessing (scrubbing).
3. Next, exploratory data analysis (EDA) techniques are applied to gain insights and understand the data. Modeling involves building and training machine learning models, and interpretation focuses on deriving meaningful conclusions and actionable insights from the results.
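The five OSEMN stages can be sketched as a small pipeline. This is only an illustration using the Python standard library. The inline CSV data, the column names (`hours`, `score`), and all function names are hypothetical and chosen for the example.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice this would come from a file, database, or API.
RAW_CSV = """hours,score
1,52
2,55
,60
4,61
5,abc
6,67
"""

def obtain(text):
    """Obtain: read raw records from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def scrub(rows):
    """Scrub: drop rows whose values are missing or not numeric."""
    clean = []
    for row in rows:
        try:
            clean.append((float(row["hours"]), float(row["score"])))
        except (TypeError, ValueError):
            continue
    return clean

def explore(data):
    """Explore: compute simple summary statistics."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    return {"n": len(data),
            "mean_hours": statistics.mean(xs),
            "mean_score": statistics.mean(ys)}

def model(data):
    """Model: least-squares fit of score = slope * hours + intercept."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in data) / sxx
    return slope, my - slope * mx

def interpret(slope, intercept):
    """iNterpret: turn the fitted parameters into a plain-language statement."""
    return (f"Each extra hour is associated with about {slope:.1f} more points "
            f"(baseline {intercept:.1f}).")

rows = obtain(RAW_CSV)
clean = scrub(rows)
summary = explore(clean)
slope, intercept = model(clean)
print(summary)
print(interpret(slope, intercept))
```

Keeping each stage as its own function mirrors the sequential framework: the output of one stage is the only input to the next, so a stage can be revisited without touching the others.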
Cross Validation
▪ Cross-validation is a technique for evaluating
the performance of predictive models. It
typically involves partitioning the data into multiple
subsets (folds), training the model on all but one
fold, and testing it on the held-out fold, repeating
so that each fold is used once for testing.
▪ This approach helps assess the generalization
capability of the model and identify potential
issues such as overfitting or underfitting.
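A minimal sketch of k-fold cross-validation, written with the standard library only. The synthetic data, the simple least-squares line used as the model, and the function names are assumptions made for illustration. A real project would more likely rely on an established library.

```python
import random
import statistics

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx

def cross_validate(points, k=5, seed=0):
    """Return the mean squared error measured on each held-out fold."""
    folds = k_fold_indices(len(points), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then test on the i-th.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        slope, intercept = fit_line([points[j] for j in train_idx])
        errors = [(points[j][1] - (slope * points[j][0] + intercept)) ** 2
                  for j in test_idx]
        scores.append(statistics.mean(errors))
    return scores

# Hypothetical synthetic data: y = 2x + 1 plus a little Gaussian noise.
rng = random.Random(42)
data = [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in range(20)]

fold_mse = cross_validate(data, k=5)
print("MSE per fold:", [round(s, 3) for s in fold_mse])
print("Mean MSE:", round(statistics.mean(fold_mse), 3))
```

A large gap between the error on the training folds and the error on the held-out folds is one common sign of overfitting. Uniformly high error on both suggests underfitting.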
DATA REQUIREMENT – Defining data requirements in data science
Data Type and Sources
Data Volume and Size
Data Quality and Completeness
Data Variables and Features
Data Granularity and Temporality
Data Privacy and Security
Data Accessibility and Availability
Data Governance and Documentation
Ethical Considerations
Clearly defining data requirements at the outset of a data science project helps set expectations, ensures data availability, and guides the subsequent stages of data collection, preprocessing, and analysis.
Regular reassessment and refinement of data requirements may be necessary as the project progresses and new insights are gained.
DATA UNDERSTANDING – Types of data
Primary
Secondary
Unstructured Data
Semi-Structured Data
Metadata – (Descriptive, Structural, Administrative)
Structured Data
DATA UNDERSTANDING
• Data understanding is a crucial step in data science for business
analytics.
• It involves gaining a comprehensive understanding of the
available data to extract meaningful insights and support
decision-making.
Data Exploration
Data Profiling
Data Sources and Integration
Data Relationships and Dependencies
Domain Knowledge Integration
Data Sampling and Subset Creation
Data Documentation and Metadata
Data Privacy and Security
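Data profiling, one of the activities listed above, can be illustrated with a short sketch that summarises each column of a small dataset. The records, column names, and the `profile` function are hypothetical. The code uses only the Python standard library.

```python
from collections import Counter
import statistics

# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "city": "Pune", "income": 52000},
    {"age": 29, "city": "Delhi", "income": None},
    {"age": None, "city": "Pune", "income": 61000},
    {"age": 41, "city": None, "income": 58000},
]

def profile(rows):
    """Summarise each column: missing count, distinct count, and basic stats."""
    report = {}
    columns = sorted({key for row in rows for key in row})
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        if present and all(isinstance(v, (int, float)) for v in present):
            # Numeric column: report range and mean.
            summary.update(min=min(present), max=max(present),
                           mean=statistics.mean(present))
        else:
            # Non-numeric column: report the most frequent value.
            summary["most_common"] = Counter(present).most_common(1)
        report[col] = summary
    return report

report = profile(records)
for column, summary in report.items():
    print(column, summary)
```

A profile like this is usually a first pass: it flags columns with many missing values or unexpected ranges before deeper exploration begins.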
Data Preparation steps in Data Science
Data Cleaning
Data Integration
Data Transformation
Feature Selection
Feature Engineering
Data Encoding
Train/Test Split
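Several of the preparation steps above (cleaning, transformation, encoding, train/test split) can be sketched in a few lines. The rows, the column meanings, and the helper names are hypothetical. Mean imputation and min-max scaling are just one possible choice for each step.

```python
import random
import statistics

# Hypothetical raw rows: (size, colour, label); None marks a missing size.
rows = [
    (5.0, "red", 1), (None, "blue", 0), (7.0, "green", 1), (6.0, "red", 0),
    (8.0, "blue", 1), (4.0, "green", 0), (None, "red", 1), (9.0, "blue", 0),
]

def impute_mean(values):
    """Data cleaning: replace missing numeric values with the column mean."""
    mean = statistics.mean([v for v in values if v is not None])
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Data transformation: rescale to [0, 1] (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Data encoding: map each category to a 0/1 indicator vector."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories

def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle the items and hold out a fraction of them for testing."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

sizes = min_max_scale(impute_mean([r[0] for r in rows]))
colours, categories = one_hot([r[1] for r in rows])
features = [[s] + c for s, c in zip(sizes, colours)]
dataset = list(zip(features, [r[2] for r in rows]))
train, test = train_test_split(dataset)

print("categories:", categories)
print("train size:", len(train), "test size:", len(test))
```

For brevity the sketch imputes and scales before splitting. In a real project those statistics are usually computed on the training portion only, so that no information from the test set leaks into training.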
Confusion Matrix
Performance Metrics
Cross-Validation
Feature Importance
Model Feedback
Performance Evaluation
Error Analysis
Domain Expertise
Continuous Improvement
Communication and Documentation
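The confusion matrix and performance metrics listed above can be made concrete for a binary classifier. The labels and predictions below are made-up values for illustration, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts

def performance_metrics(cm):
    """Derive accuracy, precision, recall, and F1 from the four counts."""
    tp, fp, tn, fn = cm["tp"], cm["fp"], cm["tn"], cm["fn"]
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
scores = performance_metrics(cm)
print(cm)
print(scores)
```

Looking at the individual false positives and false negatives, not just the aggregate scores, is the starting point for the error analysis step.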
[Figure: industry sectors: Retail, Finance, Defense, BAI, Manufacturing, Agriculture, Transportation]
▪ Analytics 3.0 represents a paradigm shift in the way data is utilized and
leveraged for decision making. It combines advanced technologies like
AI, machine learning, natural language processing, and big data
analytics to derive actionable insights from complex and diverse
datasets.
▪ One key aspect of Analytics 3.0 is the ability to process and analyze
unstructured data, such as text, images, audio, and video. AI-powered
algorithms can extract valuable information from these data sources,
enabling organizations to gain a deeper understanding of customer
sentiment, preferences, and behavior.
▪ Another defining characteristic of Analytics 3.0 is the integration of real-
time and streaming data analytics. With the proliferation of IoT devices
and sensors, organizations can capture and analyze data in real-time.
This facilitates faster decision making, proactive interventions, and the
ability to respond swiftly to changing market conditions.
Analytics 3.0 - Highlights
Process Optimization
Ethical Use of AI
Nature of Analytical Competition
Data Availability
Technological Advancements
Analytical Talent
Data-driven Culture
Strong Analytical Skills
Advanced Technology and Tools
Domain Expertise
Scalability and Infrastructure
Innovation and Continuous Learning
Agile and Iterative Approach
Business Impact and Results
Collaboration and Communication
Ethical and Responsible Practices
Competing on analytics with internal and external processes
Internal Process: Operational Efficiency
External Process: Customer Analytics
Domain Expertise
Measure Performance
Problem Formulation
Data Exploration and Preparation
Data Visualization and Communication
Statistical and Analytical Techniques
Machine Learning and Predictive Modeling
Experimental Design and A/B Testing
Business Acumen
Continuous Learning and Adaptability
Collaboration and Teamwork
Ethical Considerations
Listening to client, framing central problem and
scoping a project in data science and business
analytics - A step by step process
Client Engagement: Start by actively listening to the client's needs, goals, and challenges. Engage in open and
meaningful discussions to understand their business context, objectives, and desired outcomes. Ask probing
questions to clarify any uncertainties and gain a comprehensive understanding of their requirements.
Identify the Central Problem: Based on the client discussions, distill the information to identify the central
problem or opportunity that the project aims to address. Clearly define the problem statement, ensuring that it is
specific, measurable, attainable, relevant, and time-bound (SMART). The central problem should capture the
essence of the client's challenge and provide a clear focus for the project.
Understand the Business Context: Gain a deeper understanding of the client's industry, market dynamics,
competition, and other relevant factors. This knowledge will help you frame the problem in the appropriate
business context and identify key drivers and constraints that should be considered in the analysis.
Formulate Objectives and Goals: Collaborate with the client to establish clear project objectives and goals.
These should align with the central problem and provide a roadmap for the analysis. Objectives should be
specific, measurable, achievable, relevant, and time-bound (SMART). They should outline the desired outcomes,
such as increasing revenue, optimizing operations, or improving customer satisfaction.
Scope the Project: Define the boundaries and scope of the project based on the problem statement and
objectives. Determine the data sources and variables required for analysis, as well as any limitations or
constraints. Identify the key stakeholders involved and establish communication channels and reporting
requirements.
Breakdown of Deliverables and Milestones: Collaboratively establish the project's deliverables, including
intermediate milestones and final outcomes. This breakdown helps manage client expectations, track progress,
and ensure that the project stays on track. Define the specific analyses, models, reports, or recommendations
that will be provided at each stage.
Resource Planning: Assess the resources required to execute the project successfully. This includes the
team members' skills, expertise, and availability, as well as any additional data or technology needs. Allocate
resources effectively to ensure a balanced workload and maximize efficiency.
Risk Assessment and Mitigation: Identify potential risks and challenges that could impact the project's
success. Evaluate the likelihood and impact of each risk and develop mitigation strategies to address them.
Communicate these risks to the client and establish contingency plans to manage any unforeseen
circumstances.
Establish Timelines and Deadlines: Create a project timeline with specific deadlines for each milestone
and deliverable. Clearly communicate the timeline to the client, ensuring mutual agreement on the project
schedule. Regularly monitor and update the timeline throughout the project to manage progress effectively.
Obtain Client Agreement: Seek formal client agreement and sign-off on the project scope, objectives,
deliverables, and timeline. This ensures that both parties have a shared understanding of the project's scope
and expectations.
Check if you are able to answer the
following questions