Data Science Revised


Q1. What is data science?

Ans.1- Data Science is a multidisciplinary field that uses scientific methods, algorithms, processes,
and systems to extract knowledge and insights from structured and unstructured data.

Note: - It combines principles from:

Statistics and Mathematics – for analysing and modelling data.

Computer Science and Programming – for managing, processing, and automating data workflows.

Domain Expertise – to understand the context and apply insights effectively.

Key Components of Data Science:

1.​ Data Collection – Gathering raw data from various sources (e.g., databases, APIs, web
scraping).

2.​ Data Cleaning & Preprocessing – Removing errors, handling missing values, and preparing
data for analysis.

3.​ Exploratory Data Analysis (EDA) – Summarizing main characteristics of the data using
visualizations and statistics.

4.​ Model Building – Applying machine learning or statistical models to make predictions or
discover patterns.

5.​ Interpretation & Insight Generation – Turning model outputs into actionable business
insights.

6.​ Deployment & Monitoring – Integrating models into applications and tracking their
performance in real-world scenarios.
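The components above can be sketched as one minimal pipeline. This is a toy illustration using pandas and scikit-learn; the churn dataset and its column names are invented for the example:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1-2. Collect and clean: a tiny invented customer dataset
df = pd.DataFrame({
    "tenure_months": [1, 24, 6, 48, 3, 36, 12, 60],
    "monthly_bill":  [70, 40, 65, 30, 80, 35, 55, 25],
    "churned":       [1, 0, 1, 0, 1, 0, 1, 0],
})
df = df.dropna()  # handle missing values

# 3. EDA: summary statistics
print(df.describe())

# 4. Model building: predict churn from usage features
X, y = df[["tenure_months", "monthly_bill"]], df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# 5. Interpretation: accuracy on held-out data
print("Accuracy:", model.score(X_test, y_test))
```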

Tools Commonly Used:

●​ Programming Languages: Python, R, SQL

●​ Libraries/Frameworks: Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch

●​ Visualization Tools: Matplotlib, Seaborn, Power BI, Tableau

Example Applications:

●​ Predicting customer churn in telecom

●​ Detecting fraud in banking

●​ Recommending products on e-commerce platforms

●​ Diagnosing diseases from medical images

Q2. In which sectors is data science used?

Ans.2 Key sectors of data science are as follows:

1.​ Healthcare

Use Cases:
●​ Disease prediction (e.g., cancer detection using imaging data)

●​ Drug discovery using AI models

●​ Patient risk scoring and personalized treatment

●​ Hospital resource optimization (e.g., ICU beds)

2. Finance & Banking

●​ Use Cases:

o​ Credit scoring and loan risk assessment

o​ Fraud detection using transaction pattern analysis

o​ Algorithmic trading and portfolio optimization

o​ Customer segmentation for targeted marketing

3. Retail & E-commerce

●​ Use Cases:

o​ Product recommendation engines (e.g., Amazon, Flipkart)

o​ Dynamic pricing and demand forecasting

o​ Customer sentiment analysis from reviews

o​ Inventory and supply chain optimization

4. Transportation & Logistics

●​ Use Cases:

o​ Route optimization and real-time tracking (e.g., Uber, FedEx)

o​ Predictive maintenance of vehicles

o​ Demand forecasting for ride-sharing services

o​ Logistics network optimization

5. Media & Entertainment

●​ Use Cases:

o​ Personalized content recommendations (e.g., Netflix, Spotify)

o​ Social media trend analysis

o​ Viewer behaviour analytics and ad targeting

Q3. What is the purpose of Python?

Ans.3 The purpose of Python is to provide a powerful, readable, and easy-to-learn programming
language that supports a wide range of applications — from automation to web development to data
science and beyond.
●​ Simplicity & Readability – Python emphasizes readable syntax (close to English), making it ideal for beginners and professionals alike.

●​ Versatility – Works across various domains: web, data science, automation, AI, etc.

●​ Extensive Libraries – Thousands of packages make it easy to do complex tasks without writing everything from scratch.

●​ Open Source & Community Support – Free to use and has a large, active community.

●​ Cross-platform – Runs on Windows, Mac, Linux, etc. without major changes in code.

Q4. What are the components of Python?

Ans.4 The main components of Python are:

1. Python Interpreter – The engine that reads and executes Python code line by line. Examples: CPython (default), PyPy, Jython.

2. Syntax – Python uses clear, indentation-based syntax, which improves code readability.

3. Variables & Data Types – Supports various built-in types: int, float, str, bool, list, tuple, dict, set, etc.

4. Operators – Includes arithmetic, comparison, logical, bitwise, and assignment operators.

5. Control Flow Statements – Used to control the execution flow: if, elif, else, for, while, break, continue.

6. Functions – Blocks of reusable code, defined using the def keyword; supports recursion, default arguments, lambda functions.

7. Modules & Packages – Modules are Python files (.py) with functions/classes; packages are collections of modules in directories with __init__.py.

8. Classes & Objects (OOP) – Python is object-oriented: supports inheritance, encapsulation, and polymorphism.

9. Exception Handling – Built-in support for handling errors using try, except, finally, raise.

10. Libraries & Frameworks – Rich ecosystem for data science, ML, web, automation, etc. (e.g., NumPy, Pandas, Flask, Django).

11. File I/O – Reading from and writing to files using built-in functions (open(), read(), write()).

12. Standard Library – Comes with pre-built modules for OS interaction, math, datetime, JSON, and more.

13. Third-party Libraries – Installable via pip; expand functionality (e.g., requests, matplotlib, scikit-learn).

14. Virtual Environments – Isolate project dependencies using venv or tools like virtualenv, conda.

15. Integrated Development Tools – Python is supported by many IDEs: PyCharm, VS Code, Jupyter Notebook, etc.
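Several of these components can be seen together in one short sketch (functions, classes, exception handling, and the standard library):

```python
import json  # standard library module (component 12)

def area(w, h=1):            # function with a default argument (component 6)
    return w * h

class Rect:                  # class with encapsulated state (component 8)
    def __init__(self, w, h):
        self.w, self.h = w, h
    def area(self):
        return area(self.w, self.h)

try:                         # exception handling (component 9)
    r = Rect(3, 4)
    print(r.area())          # control flow reaches here: prints 12
    print(json.dumps({"area": r.area()}))
except TypeError as e:
    print("Bad input:", e)
```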

Q5. What are the different data analytics processes?

Ans.5 – The different data analytics processes are as follows-

1. Data Discovery / Problem Definition

●​ Purpose: Understand the business problem or question.

●​ Activities: Define objectives and KPIs, Identify what kind of data is needed, Stakeholder
consultation.

2. Data Collection / Acquisition

Purpose: Gather relevant data from various sources.

Sources:

a.​ Internal databases (ERP, CRM, etc.)


b.​ Surveys and forms
c.​ Sensors or IoT devices
d.​ Web scraping, APIs, third-party data

3. Data Cleaning / Preprocessing

Purpose: Ensure data quality and consistency.

Tasks: Handle missing values, remove duplicates, correct errors, normalize and standardize data, and convert data types.
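A minimal cleaning sketch with pandas; the sample values are invented to show one instance of each task:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "age":  [25, np.nan, 31, 25, 200],               # a missing value and an impossible value
    "city": ["Delhi", "Mumbai", "delhi", "Delhi", "Pune"],
})

df = df.drop_duplicates()                            # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].median())     # handle missing values
df["city"] = df["city"].str.title()                  # standardize text casing
df = df[df["age"] < 120]                             # drop an obviously erroneous value
print(df)
```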

4. Data Integration / Transformation

●​ Purpose: Combine and format data for analysis.


●​ Tasks:
o​ Merge datasets
o​ Create new calculated fields
o​ Reshape data (e.g., pivot/unpivot)
o​ Apply business logic
5. Data Analysis / Modeling

●​ Purpose: Identify trends, correlations, and patterns.


●​ Types:
o​ Descriptive Analysis: What happened?
o​ Diagnostic Analysis: Why did it happen?
o​ Predictive Analysis: What will happen?
o​ Prescriptive Analysis: What should we do?

6. Statistical Modeling / Machine Learning (Advanced)

●​ Purpose: Build models to predict outcomes or classify data.


●​ Techniques:
o​ Regression, classification
o​ Clustering, time series forecasting
o​ Deep learning, NLP

7.​ Data Visualization & Reporting

●​ Purpose: Present insights clearly to stakeholders.


●​ Tools:
o​ Power BI, Tableau, Excel
o​ Dashboards and automated reports
o​ Charts, graphs, heatmaps

8.​ Decision Making / Action

●​ Purpose: Use insights for strategic, operational, or tactical decisions.


●​ Output:
o​ Business recommendations
o​ Operational improvements
o​ Customer segmentation
o​ Risk mitigation strategies

Q6. What is EDA (Exploratory Data Analysis)? What is the purpose of EDA?

Ans.6 Exploratory Data Analysis (EDA) is the process of examining and visualizing data to understand its
structure, patterns, relationships, and anomalies before applying more formal modelling or statistical
techniques.

Purpose of EDA

●​ Understand data distribution and summary statistics


●​ Identify missing values, outliers, or errors
●​ Discover patterns, trends, and correlations
●​ Decide on feature selection or transformation before modelling.
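The purposes above can be sketched in a few lines of pandas and matplotlib; the sales figures are invented, with one value planted as an outlier:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt

df = pd.DataFrame({"sales": [120, 135, 90, 410, 150, 142, 128, 95]})

print(df.describe())                 # distribution and summary statistics
print(df["sales"].isna().sum())      # count of missing values

q1, q3 = df["sales"].quantile([0.25, 0.75])
outliers = df[df["sales"] > q3 + 1.5 * (q3 - q1)]   # IQR outlier rule
print(outliers)                      # the 410 stands out

df["sales"].plot(kind="hist")        # quick look at the distribution
plt.savefig("sales_hist.png")
```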

Q7. What is a Quantitative technique?

Ans.7 Quantitative techniques in data science refer to mathematical, statistical, and computational
methods used to analyse numerical data and extract insights, patterns, and predictions.

Key Features:

●​ Based on numbers and measurable values


●​ Involves mathematical modelling, statistical analysis, and machine learning
●​ Used to predict outcomes, identify trends, and optimize processes
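A tiny quantitative sketch: computing a Pearson correlation coefficient with NumPy (the spend and sales values are invented):

```python
import numpy as np

ad_spend = np.array([10, 20, 30, 40, 50])
sales    = np.array([25, 44, 58, 82, 96])   # rises with spend

r = np.corrcoef(ad_spend, sales)[0, 1]      # Pearson correlation
print(f"correlation = {r:.2f}")             # close to 1: strong positive relationship
```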
Q8. What is a graphical technique?

Ans.8 A graphical technique in data science refers to the use of visual representations (such as charts,
graphs, plots, and maps) to explore, analyse, and communicate data insights.

These techniques help identify patterns, trends, outliers, and relationships in the data, often making
complex data easier to understand.

Key Purposes of Graphical Techniques:

●​ Exploratory Data Analysis (EDA) – to visually examine the data before formal modeling
●​ Data Communication – to present results clearly to stakeholders
●​ Pattern Recognition – to spot trends, clusters, or anomalies
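A small sketch of a graphical technique: a scatter plot used to spot a relationship by eye (the study-hours data is invented):

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

hours  = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 65, 70, 78, 85]   # rises with hours: a positive trend

plt.scatter(hours, scores)
plt.xlabel("Hours studied")
plt.ylabel("Test score")
plt.title("Pattern recognition by eye: a positive trend")
plt.savefig("scatter.png")
```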

Q9. State the differences between quantitative techniques and graphical techniques

Ans.9 Quantitative techniques versus graphical techniques:

●​ Definition – Quantitative: use of mathematical, statistical, and computational methods. Graphical: use of visual tools to represent and explore data.

●​ Nature – Quantitative: numerical and formula-based. Graphical: visual and intuitive.

●​ Purpose – Quantitative: to compute, model, and make precise inferences or predictions. Graphical: to visualize patterns, trends, and relationships.

●​ Examples – Quantitative: mean, regression, hypothesis testing, standard deviation. Graphical: histogram, scatter plot, box plot, heatmap.

●​ Tools Used – Quantitative: statistical formulas, coding algorithms. Graphical: visualization libraries, BI tools.

●​ Type of Insight – Quantitative: quantified insights (e.g., correlation = 0.8). Graphical: visual understanding (e.g., a positive trend seen in a scatter plot).

●​ Complexity – Quantitative: may involve complex models or formulas. Graphical: usually simpler, easier to interpret.

●​ Output Format – Quantitative: numbers, coefficients, metrics. Graphical: graphs, plots, charts.

●​ Stage of Analysis – Quantitative: used in deep analysis and model building. Graphical: used in initial exploration (EDA) and final communication.

●​ Accuracy – Quantitative: provides exact values. Graphical: provides intuitive and visual interpretation.
Q10. Which data types are used for plotting, and which charts suit them?

Ans.10 –

●​ Numerical (Quantitative) – Numbers with meaningful arithmetic (e.g., age, salary, temperature). Best plots: histogram, box plot, scatter plot, line chart, density plot.

➤ Continuous – Can take any value in a range (e.g., height, weight, income). Best plots: line chart, histogram, scatter plot.

➤ Discrete – Takes fixed values (e.g., number of children, count of visits). Best plots: bar chart, pie chart, strip plot.

●​ Categorical (Qualitative) – Describes categories or labels (e.g., gender, city, product type). Best plots: bar chart, pie chart, count plot.

➤ Nominal – No inherent order (e.g., color, department name). Best plots: bar chart, pie chart.

➤ Ordinal – Has a logical order (e.g., low/medium/high, rating scales). Best plots: bar chart, stacked bar chart.

●​ Time Series – Data indexed in time order (e.g., daily sales, stock prices over months). Best plots: line chart, area chart, time-series plot.

●​ Boolean – True/False or Yes/No values. Best plots: bar chart, count plot.

●​ Geospatial – Coordinates, regions, or location data. Best plots: maps (choropleth, scatter geo, heatmap), point maps.

In Python, for plotting purposes—especially with libraries like Matplotlib, Seaborn, or Plotly—the
most used data types are:

1. Lists: Basic and flexible; common for simple plots.

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

2. NumPy Arrays: Preferred for mathematical operations and performance.


import numpy as np

x = np.array([1, 2, 3, 4])

y = np.array([10, 20, 25, 30])

3. Pandas Series / DataFrame: Ideal for labelled data or time series.

import pandas as pd

df = pd.DataFrame({

'x': [1, 2, 3, 4],

'y': [10, 20, 25, 30]

})
4. Dictionary (for specific use cases): Sometimes used for pie charts or bar plots.

data = {'Apples': 10, 'Bananas': 15, 'Cherries': 7}
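For instance, a dict maps naturally onto a pie chart; a minimal sketch with matplotlib:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

data = {'Apples': 10, 'Bananas': 15, 'Cherries': 7}

# keys become slice labels, values become slice sizes
plt.pie(list(data.values()), labels=list(data.keys()), autopct='%1.0f%%')
plt.savefig("fruit_pie.png")
```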

Q11. What is statistics?

Ans.11- Statistics is the field of study that involves collecting, organizing, analysing, interpreting,
and presenting data to make decisions or draw conclusions.

In simple words, statistics helps us make sense of data — whether it is figuring out an average,
understanding patterns, or making predictions based on past information.

Data Type – When Used:

●​ list – Simple plotting and small data

●​ numpy.ndarray – Mathematical operations and high performance

●​ pandas.Series / pandas.DataFrame – Label-based and structured data

●​ dict – Category-based plots like pie/bar charts

Q12. What is statistical analysis? State some of the key components of Statistical Analysis.

Ans.12 - It is the process of collecting, exploring, summarizing, interpreting, and presenting data to
discover underlying patterns, trends, relationships, and insights.

Note: - It forms the backbone of data-driven decision-making in data science, business, healthcare,
economics, and many other fields.

Key components of Statistical Analysis are as follows: -

1. Descriptive Statistics – Summarizes and describes features of a dataset. It mainly includes mean, median, mode, standard deviation, variance, minimum, maximum, skewness, and kurtosis.

2. Inferential Statistics – Makes predictions or inferences about a population based on a sample. It includes hypothesis testing (t-test, z-test, chi-square test), confidence intervals, regression analysis, and ANOVA (Analysis of Variance).

3. Exploratory Data Analysis (EDA) – Visual and statistical exploration of data to find patterns or anomalies.
Tools: Histograms, Box plots, Scatter plots, Correlation matrix
4.​ Predictive Analytics (based on statistical modeling)

Purpose: Forecast future trends.


Tools:

o​ Linear & Logistic Regression


o​ Time Series Analysis
o​ Classification and Clustering (e.g., KNN, K-means)

5. Prescriptive Analytics – Uses statistical techniques and optimization to recommend actions.
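The inferential component above can be sketched with a one-sample t-test; this assumes SciPy is available, and the hypothesized mean of 11 is invented for the example:

```python
from scipy import stats

sample = [12, 15, 14, 10, 8, 13, 15, 16, 14, 10]

# H0: the population mean is 11 — does the sample give evidence against it?
t_stat, p_value = stats.ttest_1samp(sample, popmean=11)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# a small p-value (e.g. < 0.05) would suggest the true mean differs from 11
```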

Simple descriptive statistics with pandas:

import pandas as pd

data = [12, 15, 14, 10, 8, 13, 15, 16, 14, 10]
df = pd.DataFrame(data, columns=['Scores'])

print("Mean:", df['Scores'].mean())                # central tendency
print("Standard Deviation:", df['Scores'].std())   # spread around the mean
print("Median:", df['Scores'].median())            # middle value

Q13. Differences between statistical analysis and non-statistical analysis

Ans.13 –

●​ Definition – Statistical: analysing data using mathematical techniques, particularly probability and statistics, to make inferences or draw conclusions. Non-statistical: qualitative or logical reasoning, visual inspection, or descriptive examination without formal statistical methods.

●​ Data Type – Statistical: primarily quantitative data (numbers, measurements). Non-statistical: qualitative or quantitative data, often focusing on non-numerical insights.

●​ Tools Used – Statistical: software like R, SPSS, Python (with libraries like pandas, NumPy, SciPy), Excel (with formulas/statistics). Non-statistical: descriptive tables, reports, diagrams, text analysis tools, or simply human judgment.

●​ Examples – Statistical: regression analysis, hypothesis testing, correlation, standard deviation, t-tests, ANOVA. Non-statistical: SWOT analysis, trend observation without metrics, thematic analysis, heuristic evaluation.

●​ Objectivity – Statistical: generally more objective; relies on numerical evidence and probabilities. Non-statistical: often more subjective; may depend on the analyst's interpretation or intuition.

●​ Purpose – Statistical: to find patterns, test hypotheses, estimate parameters, or predict outcomes using data. Non-statistical: to understand context, categorize information, summarize findings, or generate insights without strict mathematical models.

●​ Accuracy & Reliability – Statistical: results can be tested and replicated statistically. Non-statistical: results are harder to replicate or verify without formal methods.

Q14. State the major categories of statistics.

Ans.14 The major categories of statistics are typically divided into two broad branches:

1. Descriptive Statistics

This branch deals with summarizing and organizing data so it can be easily understood.

Key Features:

●​ Focuses on what has happened.

●​ Does not draw conclusions beyond the data.

Common Techniques:

●​ Measures of Central Tendency:

o​ Mean, Median, Mode

●​ Measures of Dispersion:

o​ Range, Variance, Standard Deviation, Interquartile Range

●​ Data Visualization:

o​ Histograms, Pie Charts, Box Plots, Bar Charts

●​ Tabulation:

o​ Frequency distributions, Cross-tabulation

Example:

"The average score of students in a test is 74 out of 100."

2. Inferential Statistics
This branch involves making predictions or generalizations about a population based on a sample.

Key Features:

●​ Makes inferences and decisions about a population.

●​ Involves uncertainty and uses probability theory.

Common Techniques:
●​ Estimation:

o​ Confidence intervals

●​ Hypothesis Testing:

o​ t-test, z-test, ANOVA, chi-square test

●​ Regression Analysis:

o​ Linear & logistic regression

●​ Correlation Analysis:

o​ Pearson and Spearman correlation

Example:

"Based on a sample of 100 students, we are 95% confident that the average test score for all students is between 72 and 76."
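A confidence interval like the one above can be computed with SciPy; the ten scores below are invented, chosen so the sample mean is 74:

```python
import numpy as np
from scipy import stats

scores = np.array([72, 75, 74, 71, 76, 73, 78, 70, 74, 77])

mean = scores.mean()
sem = stats.sem(scores)  # standard error of the mean
# 95% CI from the t-distribution with n - 1 degrees of freedom
low, high = stats.t.interval(0.95, len(scores) - 1, loc=mean, scale=sem)
print(f"95% CI: ({low:.1f}, {high:.1f})")
```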
