Data Science Revised


Q1. What is data science?

Ans.1- Data Science is a multidisciplinary field that uses scientific methods, algorithms, processes,
and systems to extract knowledge and insights from structured and unstructured data.

Note: - It combines principles from:

Statistics and Mathematics – for analysing and modelling data.

Computer Science and Programming – for managing, processing, and automating data workflows.

Domain Expertise – to understand the context and apply insights effectively.

Key Components of Data Science:

1.​ Data Collection – Gathering raw data from various sources (e.g., databases, APIs, web
scraping).

2.​ Data Cleaning & Preprocessing – Removing errors, handling missing values, and preparing
data for analysis.

3.​ Exploratory Data Analysis (EDA) – Summarizing main characteristics of the data using
visualizations and statistics.

4.​ Model Building – Applying machine learning or statistical models to make predictions or
discover patterns.

5.​ Interpretation & Insight Generation – Turning model outputs into actionable business
insights.

6.​ Deployment & Monitoring – Integrating models into applications and tracking their
performance in real-world scenarios.
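The components above can be sketched as one minimal pipeline. This is a toy illustration using pandas and scikit-learn; the churn dataset and its column names are invented for the example:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1-2. Collect and clean: a tiny invented customer dataset
df = pd.DataFrame({
    "tenure_months": [1, 24, 6, 48, 3, 36, 12, 60],
    "monthly_bill":  [70, 40, 65, 30, 80, 35, 55, 25],
    "churned":       [1, 0, 1, 0, 1, 0, 1, 0],
})
df = df.dropna()  # handle missing values

# 3. EDA: summary statistics
print(df.describe())

# 4. Model building: predict churn from usage features
X, y = df[["tenure_months", "monthly_bill"]], df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# 5. Interpretation: accuracy on held-out data
print("Accuracy:", model.score(X_test, y_test))
```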

Tools Commonly Used:

●​ Programming Languages: Python, R, SQL

●​ Libraries/Frameworks: Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch

●​ Visualization Tools: Matplotlib, Seaborn, Power BI, Tableau

Example Applications:

●​ Predicting customer churn in telecom

●​ Detecting fraud in banking

●​ Recommending products on e-commerce platforms

●​ Diagnosing diseases from medical images

Q2. In which sectors is data science used?

Ans.2 Key sectors of data science are as follows:

1.​ Healthcare

Use Cases:
●​ Disease prediction (e.g., cancer detection using imaging data)

●​ Drug discovery using AI models

●​ Patient risk scoring and personalized treatment

●​ Hospital resource optimization (e.g., ICU beds)

2. Finance & Banking

●​ Use Cases:

o​ Credit scoring and loan risk assessment

o​ Fraud detection using transaction pattern analysis

o​ Algorithmic trading and portfolio optimization

o​ Customer segmentation for targeted marketing

3. Retail & E-commerce

●​ Use Cases:

o​ Product recommendation engines (e.g., Amazon, Flipkart)

o​ Dynamic pricing and demand forecasting

o​ Customer sentiment analysis from reviews

o​ Inventory and supply chain optimization

4. Transportation & Logistics

●​ Use Cases:

o​ Route optimization and real-time tracking (e.g., Uber, FedEx)

o​ Predictive maintenance of vehicles

o​ Demand forecasting for ride-sharing services

o​ Logistics network optimization

5. Media & Entertainment

●​ Use Cases:

o​ Personalized content recommendations (e.g., Netflix, Spotify)

o​ Social media trend analysis

o​ Viewer behaviour analytics and ad targeting

Q3. What is the purpose of Python?

Ans.3 The purpose of Python is to provide a powerful, readable, and easy-to-learn programming
language that supports a wide range of applications — from automation to web development to data
science and beyond.
●​ Simplicity & Readability – Python emphasizes readable syntax (close to English), making it ideal for beginners and professionals alike.

●​ Versatility – Works across various domains: web, data science, automation, AI, etc.

●​ Extensive Libraries – Thousands of packages make it easy to do complex tasks without writing everything from scratch.

●​ Open Source & Community Support – Free to use and has a large, active community.

●​ Cross-platform – Runs on Windows, Mac, Linux, etc. without major changes in code.

Q4. What are the components of Python?

Ans.4 The main components of Python are:

1. Python Interpreter – The engine that reads and executes Python code line by line. Examples: CPython (default), PyPy, Jython.

2. Syntax – Python uses clear, indentation-based syntax, which improves code readability.

3. Variables & Data Types – Supports various built-in types: int, float, str, bool, list, tuple, dict, set, etc.

4. Operators – Includes arithmetic, comparison, logical, bitwise, and assignment operators.

5. Control Flow Statements – Used to control the execution flow: if, elif, else, for, while, break, continue.

6. Functions – Blocks of reusable code, defined using the def keyword; supports recursion, default arguments, lambda functions.

7. Modules & Packages – Modules are Python files (.py) with functions/classes; packages are collections of modules in directories with __init__.py.

8. Classes & Objects (OOP) – Python is object-oriented: supports inheritance, encapsulation, and polymorphism.

9. Exception Handling – Built-in support for handling errors using try, except, finally, raise.

10. Libraries & Frameworks – Rich ecosystem for data science, ML, web, automation, etc. (e.g., NumPy, Pandas, Flask, Django).

11. File I/O – Reading from and writing to files using built-in functions (open(), read(), write()).

12. Standard Library – Comes with pre-built modules for OS interaction, math, datetime, JSON, and more.

13. Third-party Libraries – Installable via pip; expand functionality (e.g., requests, matplotlib, scikit-learn).

14. Virtual Environments – Isolate project dependencies using venv or tools like virtualenv, conda.

15. Integrated Development Tools – Python is supported by many IDEs: PyCharm, VS Code, Jupyter Notebook, etc.
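Several of these components can be seen together in one short sketch (functions, classes, exception handling, and the standard library):

```python
import json  # standard library module (component 12)

def area(w, h=1):            # function with a default argument (component 6)
    return w * h

class Rect:                  # class with encapsulated state (component 8)
    def __init__(self, w, h):
        self.w, self.h = w, h
    def area(self):
        return area(self.w, self.h)

try:                         # exception handling (component 9)
    r = Rect(3, 4)
    print(r.area())          # control flow reaches here: prints 12
    print(json.dumps({"area": r.area()}))
except TypeError as e:
    print("Bad input:", e)
```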

Q5. What are the different data analytics processes?

Ans.5 – The different data analytics processes are as follows-

1. Data Discovery / Problem Definition

●​ Purpose: Understand the business problem or question.

●​ Activities: Define objectives and KPIs, Identify what kind of data is needed, Stakeholder
consultation.

2. Data Collection / Acquisition

Purpose: Gather relevant data from various sources.

Sources:

a.​ Internal databases (ERP, CRM, etc.)


b.​ Surveys and forms
c.​ Sensors or IoT devices
d.​ Web scraping, APIs, third-party data

3. Data Cleaning / Preprocessing

Purpose: Ensure data quality and consistency.

Tasks: Handle missing values, remove duplicates, correct errors, normalize and standardize data, and convert data types.
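A minimal cleaning sketch with pandas; the sample values are invented to show one instance of each task:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "age":  [25, np.nan, 31, 25, 200],               # a missing value and an impossible value
    "city": ["Delhi", "Mumbai", "delhi", "Delhi", "Pune"],
})

df = df.drop_duplicates()                            # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].median())     # handle missing values
df["city"] = df["city"].str.title()                  # standardize text casing
df = df[df["age"] < 120]                             # drop an obviously erroneous value
print(df)
```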

4. Data Integration / Transformation

●​ Purpose: Combine and format data for analysis.


●​ Tasks:
o​ Merge datasets
o​ Create new calculated fields
o​ Reshape data (e.g., pivot/unpivot)
o​ Apply business logic
5. Data Analysis / Modeling

●​ Purpose: Identify trends, correlations, and patterns.


●​ Types:
o​ Descriptive Analysis: What happened?
o​ Diagnostic Analysis: Why did it happen?
o​ Predictive Analysis: What will happen?
o​ Prescriptive Analysis: What should we do?

6. Statistical Modeling / Machine Learning (Advanced)

●​ Purpose: Build models to predict outcomes or classify data.


●​ Techniques:
o​ Regression, classification
o​ Clustering, time series forecasting
o​ Deep learning, NLP

7.​ Data Visualization & Reporting

●​ Purpose: Present insights clearly to stakeholders.


●​ Tools:
o​ Power BI, Tableau, Excel
o​ Dashboards and automated reports
o​ Charts, graphs, heatmaps

8.​ Decision Making / Action

●​ Purpose: Use insights for strategic, operational, or tactical decisions.


●​ Output:
o​ Business recommendations
o​ Operational improvements
o​ Customer segmentation
o​ Risk mitigation strategies

Q6. What is EDA (Exploratory Data Analysis)? What is the purpose of EDA?

Ans.6 Exploratory Data Analysis (EDA) is the process of examining and visualizing data to understand its
structure, patterns, relationships, and anomalies before applying more formal modelling or statistical
techniques.

Purpose of EDA

●​ Understand data distribution and summary statistics


●​ Identify missing values, outliers, or errors
●​ Discover patterns, trends, and correlations
●​ Decide on feature selection or transformation before modelling.
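The purposes above can be sketched in a few lines of pandas and matplotlib; the sales figures are invented, with one value planted as an outlier:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt

df = pd.DataFrame({"sales": [120, 135, 90, 410, 150, 142, 128, 95]})

print(df.describe())                 # distribution and summary statistics
print(df["sales"].isna().sum())      # count of missing values

q1, q3 = df["sales"].quantile([0.25, 0.75])
outliers = df[df["sales"] > q3 + 1.5 * (q3 - q1)]   # IQR outlier rule
print(outliers)                      # the 410 stands out

df["sales"].plot(kind="hist")        # quick look at the distribution
plt.savefig("sales_hist.png")
```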

Q7. What is a Quantitative technique?

Ans.7 Quantitative techniques in data science refer to mathematical, statistical, and computational
methods used to analyse numerical data and extract insights, patterns, and predictions.

Key Features:

●​ Based on numbers and measurable values


●​ Involves mathematical modelling, statistical analysis, and machine learning
●​ Used to predict outcomes, identify trends, and optimize processes
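A tiny quantitative sketch: computing a Pearson correlation coefficient with NumPy (the spend and sales values are invented):

```python
import numpy as np

ad_spend = np.array([10, 20, 30, 40, 50])
sales    = np.array([25, 44, 58, 82, 96])   # rises with spend

r = np.corrcoef(ad_spend, sales)[0, 1]      # Pearson correlation
print(f"correlation = {r:.2f}")             # close to 1: strong positive relationship
```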
Q8. What is a graphical technique?

Ans.8 A graphical technique in data science refers to the use of visual representations (such as charts,
graphs, plots, and maps) to explore, analyse, and communicate data insights.

These techniques help identify patterns, trends, outliers, and relationships in the data, often making
complex data easier to understand.

Key Purposes of Graphical Techniques:

●​ Exploratory Data Analysis (EDA) – to visually examine the data before formal modeling
●​ Data Communication – to present results clearly to stakeholders
●​ Pattern Recognition – to spot trends, clusters, or anomalies
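A small sketch of a graphical technique: a scatter plot used to spot a relationship by eye (the study-hours data is invented):

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

hours  = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 65, 70, 78, 85]   # rises with hours: a positive trend

plt.scatter(hours, scores)
plt.xlabel("Hours studied")
plt.ylabel("Test score")
plt.title("Pattern recognition by eye: a positive trend")
plt.savefig("scatter.png")
```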

Q9. State the differences between quantitative techniques and graphical techniques

Ans.9 Quantitative techniques versus graphical techniques:

●​ Definition – Quantitative: use of mathematical, statistical, and computational methods. Graphical: use of visual tools to represent and explore data.

●​ Nature – Quantitative: numerical and formula-based. Graphical: visual and intuitive.

●​ Purpose – Quantitative: to compute, model, and make precise inferences or predictions. Graphical: to visualize patterns, trends, and relationships.

●​ Examples – Quantitative: mean, regression, hypothesis testing, standard deviation. Graphical: histogram, scatter plot, box plot, heatmap.

●​ Tools Used – Quantitative: statistical formulas, coding algorithms. Graphical: visualization libraries, BI tools.

●​ Type of Insight – Quantitative: quantified insights (e.g., correlation = 0.8). Graphical: visual understanding (e.g., a positive trend seen in a scatter plot).

●​ Complexity – Quantitative: may involve complex models or formulas. Graphical: usually simpler, easier to interpret.

●​ Output Format – Quantitative: numbers, coefficients, metrics. Graphical: graphs, plots, charts.

●​ Stage of Analysis – Quantitative: used in deep analysis and model building. Graphical: used in initial exploration (EDA) and final communication.

●​ Accuracy – Quantitative: provides exact values. Graphical: provides intuitive and visual interpretation.
Q10. Which data types are used for plotting, and which charts suit them?

Ans.10 –

●​ Numerical (Quantitative) – Numbers with meaningful arithmetic (e.g., age, salary, temperature). Best plots: histogram, box plot, scatter plot, line chart, density plot.

➤ Continuous – Can take any value in a range (e.g., height, weight, income). Best plots: line chart, histogram, scatter plot.

➤ Discrete – Takes fixed values (e.g., number of children, count of visits). Best plots: bar chart, pie chart, strip plot.

●​ Categorical (Qualitative) – Describes categories or labels (e.g., gender, city, product type). Best plots: bar chart, pie chart, count plot.

➤ Nominal – No inherent order (e.g., color, department name). Best plots: bar chart, pie chart.

➤ Ordinal – Has a logical order (e.g., low/medium/high, rating scales). Best plots: bar chart, stacked bar chart.

●​ Time Series – Data indexed in time order (e.g., daily sales, stock prices over months). Best plots: line chart, area chart, time-series plot.

●​ Boolean – True/False or Yes/No values. Best plots: bar chart, count plot.

●​ Geospatial – Coordinates, regions, or location data. Best plots: maps (choropleth, scatter geo, heatmap), point maps.

In Python, for plotting purposes—especially with libraries like Matplotlib, Seaborn, or Plotly—the
most used data types are:

1. Lists: Basic and flexible; common for simple plots.

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

2. NumPy Arrays: Preferred for mathematical operations and performance.


import numpy as np

x = np.array([1, 2, 3, 4])

y = np.array([10, 20, 25, 30])

3. Pandas Series / DataFrame: Ideal for labelled data or time series.

import pandas as pd

df = pd.DataFrame({

'x': [1, 2, 3, 4],

'y': [10, 20, 25, 30]

})
4. Dictionary (for specific use cases): Sometimes used for pie charts or bar plots.

data = {'Apples': 10, 'Bananas': 15, 'Cherries': 7}
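For instance, a dict maps naturally onto a pie chart; a minimal sketch with matplotlib:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

data = {'Apples': 10, 'Bananas': 15, 'Cherries': 7}

# keys become slice labels, values become slice sizes
plt.pie(list(data.values()), labels=list(data.keys()), autopct='%1.0f%%')
plt.savefig("fruit_pie.png")
```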

Q11. What is statistics?

Ans.11- Statistics is the field of study that involves collecting, organizing, analysing, interpreting,
and presenting data to make decisions or draw conclusions.

In simple words, statistics helps us make sense of data — whether it is figuring out an average,
understanding patterns, or making predictions based on past information.

Data Type – When Used:

●​ list – Simple plotting and small data

●​ numpy.ndarray – Mathematical operations and high performance

●​ pandas.Series / pandas.DataFrame – Label-based and structured data

●​ dict – Category-based plots like pie/bar charts

Q12. What is statistical analysis? State some of the key components of Statistical Analysis.

Ans.12 - It is the process of collecting, exploring, summarizing, interpreting, and presenting data to
discover underlying patterns, trends, relationships, and insights.

Note: - It forms the backbone of data-driven decision-making in data science, business, healthcare,
economics, and many other fields.

Key components of Statistical Analysis are as follows: -

1. Descriptive Statistics – Summarizes and describes features of a dataset. It mainly includes mean, median, mode, standard deviation, variance, minimum, maximum, skewness, and kurtosis.

2. Inferential Statistics – Makes predictions or inferences about a population based on a sample. It includes hypothesis testing (t-test, z-test, chi-square test), confidence intervals, regression analysis, and ANOVA (Analysis of Variance).

3. Exploratory Data Analysis (EDA) – Visual and statistical exploration of data to find patterns or anomalies.
Tools: Histograms, Box plots, Scatter plots, Correlation matrix
4.​ Predictive Analytics (based on statistical modeling)

Purpose: Forecast future trends.


Tools:

o​ Linear & Logistic Regression


o​ Time Series Analysis
o​ Classification and Clustering (e.g., KNN, K-means)

5. Prescriptive Analytics – Uses statistical techniques and optimization to recommend actions.
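The inferential component above can be sketched with a one-sample t-test; this assumes SciPy is available, and the hypothesized mean of 11 is invented for the example:

```python
from scipy import stats

sample = [12, 15, 14, 10, 8, 13, 15, 16, 14, 10]

# H0: the population mean is 11 — does the sample give evidence against it?
t_stat, p_value = stats.ttest_1samp(sample, popmean=11)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# a small p-value (e.g. < 0.05) would suggest the true mean differs from 11
```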

Simple descriptive statistics with pandas:

import pandas as pd

data = [12, 15, 14, 10, 8, 13, 15, 16, 14, 10]
df = pd.DataFrame(data, columns=['Scores'])

print("Mean:", df['Scores'].mean())                # central tendency
print("Standard Deviation:", df['Scores'].std())   # spread around the mean
print("Median:", df['Scores'].median())            # middle value

Q13. Differences between statistical analysis and non-statistical analysis

Ans.13 –

●​ Definition – Statistical: analysing data using mathematical techniques, particularly probability and statistics, to make inferences or draw conclusions. Non-statistical: qualitative or logical reasoning, visual inspection, or descriptive examination without formal statistical methods.

●​ Data Type – Statistical: primarily quantitative data (numbers, measurements). Non-statistical: qualitative or quantitative data, often focusing on non-numerical insights.

●​ Tools Used – Statistical: software like R, SPSS, Python (with libraries like pandas, NumPy, SciPy), Excel (with formulas/statistics). Non-statistical: descriptive tables, reports, diagrams, text analysis tools, or simply human judgment.

●​ Examples – Statistical: regression analysis, hypothesis testing, correlation, standard deviation, t-tests, ANOVA. Non-statistical: SWOT analysis, trend observation without metrics, thematic analysis, heuristic evaluation.

●​ Objectivity – Statistical: generally more objective; relies on numerical evidence and probabilities. Non-statistical: often more subjective; may depend on the analyst's interpretation or intuition.

●​ Purpose – Statistical: to find patterns, test hypotheses, estimate parameters, or predict outcomes using data. Non-statistical: to understand context, categorize information, summarize findings, or generate insights without strict mathematical models.

●​ Accuracy & Reliability – Statistical: results can be tested and replicated statistically. Non-statistical: results are harder to replicate or verify without formal methods.

Q14. State the major categories of statistics.

Ans.14 The major categories of statistics are typically divided into two broad branches:

1. Descriptive Statistics

This branch deals with summarizing and organizing data so it can be easily understood.

Key Features:

●​ Focuses on what has happened.

●​ Does not draw conclusions beyond the data.

Common Techniques:

●​ Measures of Central Tendency:

o​ Mean, Median, Mode

●​ Measures of Dispersion:

o​ Range, Variance, Standard Deviation, Interquartile Range

●​ Data Visualization:

o​ Histograms, Pie Charts, Box Plots, Bar Charts

●​ Tabulation:

o​ Frequency distributions, Cross-tabulation

Example:

"The average score of students in a test is 74 out of 100."

2. Inferential Statistics
This branch involves making predictions or generalizations about a population based on a sample.

Key Features:

●​ Makes inferences and decisions about a population.

●​ Involves uncertainty and uses probability theory.

Common Techniques:
●​ Estimation:

o​ Confidence intervals

●​ Hypothesis Testing:

o​ t-test, z-test, ANOVA, chi-square test

●​ Regression Analysis:

o​ Linear & logistic regression

●​ Correlation Analysis:

o​ Pearson and Spearman correlation

Example:

"Based on a sample of 100 students, we are 95% confident that the average test score for all students is between 72 and 76."
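A confidence interval like the one above can be computed with SciPy; the ten scores below are invented, chosen so the sample mean is 74:

```python
import numpy as np
from scipy import stats

scores = np.array([72, 75, 74, 71, 76, 73, 78, 70, 74, 77])

mean = scores.mean()
sem = stats.sem(scores)  # standard error of the mean
# 95% CI from the t-distribution with n - 1 degrees of freedom
low, high = stats.t.interval(0.95, len(scores) - 1, loc=mean, scale=sem)
print(f"95% CI: ({low:.1f}, {high:.1f})")
```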
