Q1. What is data science?
Ans.1- Data Science is a multidisciplinary field that uses scientific methods, algorithms, processes,
and systems to extract knowledge and insights from structured and unstructured data.
Note: It combines principles from:
Statistics and Mathematics – for analysing and modelling data.
Computer Science and Programming – for managing, processing, and automating data workflows.
Domain Expertise – to understand the context and apply insights effectively.
Key Components of Data Science:
1. Data Collection – Gathering raw data from various sources (e.g., databases, APIs, web
scraping).
2. Data Cleaning & Preprocessing – Removing errors, handling missing values, and preparing
data for analysis.
3. Exploratory Data Analysis (EDA) – Summarizing main characteristics of the data using
visualizations and statistics.
4. Model Building – Applying machine learning or statistical models to make predictions or
discover patterns.
5. Interpretation & Insight Generation – Turning model outputs into actionable business
insights.
6. Deployment & Monitoring – Integrating models into applications and tracking their
performance in real-world scenarios.
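For illustration, here is a minimal sketch of steps 1 to 5 in Python, assuming a hypothetical file customers.csv with a binary churn column (the file and column names are made up):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.read_csv("customers.csv")            # 1. Data collection (here, from a file)
df = df.dropna()                             # 2. Cleaning: drop rows with missing values
print(df.describe())                         # 3. EDA: summary statistics

X = df.drop(columns=["churn"])               # features (assumes remaining columns are numeric)
y = df["churn"]                              # target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)    # 4. Model building
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))   # 5. Insight generation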
Tools Commonly Used:
● Programming Languages: Python, R, SQL
● Libraries/Frameworks: Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch
● Visualization Tools: Matplotlib, Seaborn, Power BI, Tableau
Example Applications:
● Predicting customer churn in telecom
● Detecting fraud in banking
● Recommending products on e-commerce platforms
● Diagnosing diseases from medical images
Q2. What are the different sectors of using data science?
Ans.2 Key sectors of data science are as follows-
1. Healthcare
Use Cases:
● Disease prediction (e.g., cancer detection using imaging data)
● Drug discovery using AI models
● Patient risk scoring and personalized treatment
● Hospital resource optimization (e.g., ICU beds)
2. Finance & Banking
Use Cases:
● Credit scoring and loan risk assessment
● Fraud detection using transaction pattern analysis
● Algorithmic trading and portfolio optimization
● Customer segmentation for targeted marketing
3. Retail & E-commerce
Use Cases:
● Product recommendation engines (e.g., Amazon, Flipkart)
● Dynamic pricing and demand forecasting
● Customer sentiment analysis from reviews
● Inventory and supply chain optimization
4. Transportation & Logistics
Use Cases:
● Route optimization and real-time tracking (e.g., Uber, FedEx)
● Predictive maintenance of vehicles
● Demand forecasting for ride-sharing services
● Logistics network optimization
5. Media & Entertainment
Use Cases:
● Personalized content recommendations (e.g., Netflix, Spotify)
● Social media trend analysis
● Viewer behaviour analytics and ad targeting
Q3. What are the purposes of Python?
Ans.3 The purpose of Python is to provide a powerful, readable, and easy-to-learn programming
language that supports a wide range of applications — from automation to web development to data
science and beyond.
Feature – Purpose
● Simplicity & Readability – Python emphasizes readable syntax (close to English), making it ideal for beginners and professionals alike.
● Versatility – Works across various domains: web, data science, automation, AI, etc.
● Extensive Libraries – Thousands of packages make it easy to do complex tasks without writing everything from scratch.
● Open Source & Community Support – Free to use and has a large, active community.
● Cross-platform – Runs on Windows, Mac, Linux, etc. without major changes in code.
Q4. What are the components of Python?
Ans.4 Component – Description
1. Python Interpreter – The engine that reads and executes Python code line by line. Examples: CPython (default), PyPy, Jython.
2. Syntax – Python uses clear, indentation-based syntax which improves code readability.
3. Variables & Data Types – Supports various built-in types: int, float, str, bool, list, tuple, dict, set, etc.
4. Operators – Includes arithmetic, comparison, logical, bitwise, and assignment operators.
5. Control Flow Statements – Used to control the execution flow: if, elif, else, for, while, break, continue.
6. Functions – Blocks of reusable code, defined using the def keyword; supports recursion, default arguments, lambda functions.
7. Modules & Packages – Modules are Python files (.py) with functions/classes; packages are collections of modules in directories with __init__.py.
8. Classes & Objects (OOP) – Python is object-oriented: supports inheritance, encapsulation, and polymorphism.
9. Exception Handling – Built-in support for handling errors using try, except, finally, raise.
10. Libraries & Frameworks – Rich ecosystem for data science, ML, web, automation, etc. (e.g., NumPy, Pandas, Flask, Django).
11. File I/O – Reading from and writing to files using built-in functions (open(), read(), write()).
12. Standard Library – Comes with pre-built modules for OS interaction, math, datetime, JSON, and more.
13. Third-party Libraries – Installable via pip; expands functionality (e.g., requests, matplotlib, scikit-learn).
14. Virtual Environments – Isolates project dependencies using venv or tools like virtualenv, conda.
15. Integrated Development Tools – Python is supported by many IDEs: PyCharm, VS Code, Jupyter Notebook, etc.
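For illustration, a short sketch that touches several of these components (a standard-library module, variables, a function, a class, and exception handling):
import math                                  # standard library module

def circle_area(radius):                     # function defined with def
    if radius < 0:
        raise ValueError("radius must be non-negative")
    return math.pi * radius ** 2

class Circle:                                # class (OOP component)
    def __init__(self, radius):
        self.radius = radius                 # instance variable
    def area(self):
        return circle_area(self.radius)

try:                                         # exception handling
    print(Circle(2).area())
    print(circle_area(-1))                   # raises ValueError
except ValueError as e:
    print("Error:", e)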
Q5. What are the different data analytics processes?
Ans.5 – The different data analytics processes are as follows-
1. Data Discovery / Problem Definition
● Purpose: Understand the business problem or question.
● Activities: Define objectives and KPIs, identify what kind of data is needed, consult stakeholders.
2. Data Collection / Acquisition
● Purpose: Gather relevant data from various sources.
● Sources:
o Internal databases (ERP, CRM, etc.)
o Surveys and forms
o Sensors or IoT devices
o Web scraping, APIs, third-party data
3. Data Cleaning / Preprocessing
● Purpose: Ensure data quality and consistency.
● Tasks: Handle missing values, remove duplicates, correct errors, normalize and standardize data, convert data types (see the sketch after this list).
4. Data Integration / Transformation
● Purpose: Combine and format data for analysis.
● Tasks:
o Merge datasets
o Create new calculated fields
o Reshape data (e.g., pivot/unpivot)
o Apply business logic
5. Data Analysis / Modeling
● Purpose: Identify trends, correlations, and patterns.
● Types:
o Descriptive Analysis: What happened?
o Diagnostic Analysis: Why did it happen?
o Predictive Analysis: What will happen?
o Prescriptive Analysis: What should we do?
6. Statistical Modeling / Machine Learning (Advanced)
● Purpose: Build models to predict outcomes or classify data.
● Techniques:
o Regression, classification
o Clustering, time series forecasting
o Deep learning, NLP
7. Data Visualization & Reporting
● Purpose: Present insights clearly to stakeholders.
● Tools:
o Power BI, Tableau, Excel
o Dashboards and automated reports
o Charts, graphs, heatmaps
8. Decision Making / Action
● Purpose: Use insights for strategic, operational, or tactical decisions.
● Output:
o Business recommendations
o Operational improvements
o Customer segmentation
o Risk mitigation strategies
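As referenced in step 3, here is a minimal sketch of the cleaning, transformation, and descriptive-analysis steps on a small made-up dataset:
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "South", "North", None, "South"],
    "revenue": [1200, 950, None, 1100, 980],
})

# Step 3: cleaning – fill a missing revenue value, drop rows without a region
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].mean())
sales = sales.dropna(subset=["region"])

# Step 4: transformation – add a calculated field
sales["revenue_k"] = sales["revenue"] / 1000

# Step 5: descriptive analysis – what happened, by region?
print(sales.groupby("region")["revenue"].mean())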
Q6. What is EDA (Exploratory Data Analysis)? What is the purpose of EDA?
Ans.6 Exploratory Data Analysis (EDA) is the process of examining and visualizing data to understand its
structure, patterns, relationships, and anomalies before applying more formal modelling or statistical
techniques.
Purpose of EDA
● Understand data distribution and summary statistics
● Identify missing values, outliers, or errors
● Discover patterns, trends, and correlations
● Decide on feature selection or transformation before modelling.
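For illustration, a minimal EDA sketch with pandas and Matplotlib, assuming a hypothetical file data.csv:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")        # hypothetical dataset

print(df.shape)                     # number of rows and columns
df.info()                           # column types and missing values
print(df.describe())                # summary statistics
print(df.isnull().sum())            # missing values per column
print(df.corr(numeric_only=True))   # correlations between numeric columns

df.hist(figsize=(10, 6))            # distributions of numeric columns
plt.show()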
Q7. What is a Quantitative technique?
Ans.7 Quantitative techniques in data science refer to mathematical, statistical, and computational
methods used to analyse numerical data and extract insights, patterns, and predictions.
Key Features:
● Based on numbers and measurable values
● Involves mathematical modelling, statistical analysis, and machine learning
● Used to predict outcomes, identify trends, and optimize processes
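For illustration, a small sketch of quantitative techniques (summary statistics, correlation, and a least-squares trend line) on made-up advertising and sales figures:
import numpy as np

ad_spend = np.array([10, 20, 30, 40, 50])    # in thousands (made-up)
sales = np.array([25, 45, 62, 85, 101])      # in thousands (made-up)

print("Mean sales:", sales.mean())
print("Std dev of sales:", sales.std(ddof=1))
print("Correlation:", np.corrcoef(ad_spend, sales)[0, 1])

# Fit a straight line: sales ≈ slope * spend + intercept
slope, intercept = np.polyfit(ad_spend, sales, 1)
print(f"Predicted sales at spend = 60: {slope * 60 + intercept:.1f}")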
Q8. What is a graphical technique?
Ans.8 A graphical technique in data science refers to the use of visual representations (such as charts,
graphs, plots, and maps) to explore, analyse, and communicate data insights.
These techniques help identify patterns, trends, outliers, and relationships in the data, often making
complex data easier to understand.
Key Purposes of Graphical Techniques:
● Exploratory Data Analysis (EDA) – to visually examine the data before formal modeling
● Data Communication – to present results clearly to stakeholders
● Pattern Recognition – to spot trends, clusters, or anomalies
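For illustration, a small sketch of graphical techniques using Matplotlib, plotting made-up monthly sales as a line chart (trend) and a histogram (distribution):
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 150, 170, 165]        # made-up figures

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].plot(months, sales, marker="o")       # line chart: trend over time
axes[0].set_title("Monthly Sales Trend")
axes[1].hist(sales, bins=5)                   # histogram: distribution of values
axes[1].set_title("Distribution of Sales")
plt.tight_layout()
plt.show()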
Q9. State the differences between quantitative techniques and graphical techniques
Ans.9 Quantitative techniques versus graphical techniques:
● Definition – Quantitative: use of mathematical, statistical, and computational methods. Graphical: use of visual tools to represent and explore data.
● Nature – Quantitative: numerical and formula-based. Graphical: visual and intuitive.
● Purpose – Quantitative: to compute, model, and make precise inferences or predictions. Graphical: to visualize patterns, trends, and relationships.
● Examples – Quantitative: mean, regression, hypothesis testing, standard deviation. Graphical: histogram, scatter plot, box plot, heatmap.
● Tools Used – Quantitative: statistical formulas, coding algorithms. Graphical: visualization libraries, BI tools.
● Type of Insight – Quantitative: quantified insights (e.g., correlation = 0.8). Graphical: visual understanding (e.g., a positive trend seen in a scatter plot).
● Complexity – Quantitative: may involve complex models or formulas. Graphical: usually simpler, easier to interpret.
● Output Format – Quantitative: numbers, coefficients, metrics. Graphical: graphs, plots, charts.
● Stage of Analysis – Quantitative: used in deep analysis and model building. Graphical: used in initial exploration (EDA) and final communication.
● Accuracy – Quantitative: provides exact values. Graphical: provides intuitive and visual interpretation.
Data Type – Description – Best Plots/Charts
Numerical (Quantitative) – Numbers with meaningful arithmetic (e.g., age, salary, temperature) – Histogram, box plot, scatter plot, line chart, density plot
➤ Continuous – Can take any value in a range (e.g., height, weight, income) – Line chart, histogram, scatter plot
➤ Discrete – Takes fixed values (e.g., number of children, count of visits) – Bar chart, pie chart, strip plot
Categorical (Qualitative) – Describes categories or labels (e.g., gender, city, product type) – Bar chart, pie chart, count plot
➤ Nominal – No inherent order (e.g., color, department name) – Bar chart, pie chart
➤ Ordinal – Has a logical order (e.g., low/medium/high, rating scales) – Bar chart, stacked bar chart
Time Series – Data indexed in time order (e.g., daily sales, stock prices over months) – Line chart, area chart, time-series plot
Boolean – True/False or Yes/No values – Bar chart, count plot
Geospatial – Coordinates, regions, or location data – Maps (choropleth, scatter geo, heatmap), point maps
In Python, for plotting purposes—especially with libraries like Matplotlib, Seaborn, or Plotly—the
most used data types are:
1. Lists: Basic and flexible, Common for simple plots.
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
2. NumPy Arrays: Preferred for mathematical and performance reasons.
import numpy as np
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 25, 30])
3. Pandas Series / DataFrame: Ideal for labelled data or time series.
import pandas as pd
df = pd.DataFrame({
'x': [1, 2, 3, 4],
'y': [10, 20, 25, 30]
})
4. Dictionary (for specific use cases): Sometimes used for pie charts or bar plots.
data = {'Apples': 10, 'Bananas': 15, 'Cherries': 7}
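For illustration, a brief sketch of how each of these data types can be passed to a plot:
import matplotlib.pyplot as plt
import pandas as pd

# Lists (or NumPy arrays) go straight into plot()
plt.plot([1, 2, 3, 4], [10, 20, 25, 30])

# A DataFrame can plot its own labelled columns
df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 20, 25, 30]})
df.plot(x='x', y='y')

# A dictionary works well for bar (or pie) charts
data = {'Apples': 10, 'Bananas': 15, 'Cherries': 7}
plt.figure()
plt.bar(list(data.keys()), list(data.values()))
plt.show()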
Q11. What is statistics?
Ans.11- Statistics is the field of study that involves collecting, organizing, analysing, interpreting,
and presenting data to make decisions or draw conclusions.
In simple words, statistics helps us make sense of data — whether it is figuring out an average,
understanding patterns, or making predictions based on past information.
Recap – Python data types used for plotting:
● list – Simple plotting and small data
● numpy.ndarray – Mathematical operations and high performance
● pandas.Series / DataFrame – Label-based and structured data
● dict – Category-based plots like pie/bar charts
Q12. What is statistical analysis? State some of the key components of Statistical Analysis.
Ans.12 - It is the process of collecting, exploring, summarizing, interpreting, and presenting data to
discover underlying patterns, trends, relationships, and insights.
Note: It forms the backbone of data-driven decision-making in data science, business, healthcare,
economics, and many other fields.
Key components of Statistical Analysis are as follows:
1. Descriptive Statistics – It summarizes and describes the features of a dataset. It mainly includes the mean, median, and mode, standard deviation, variance, minimum, maximum, skewness, and kurtosis.
2. Inferential Statistics – It makes predictions or inferences about a population based on a sample. It includes hypothesis testing (t-test, z-test, chi-square test), confidence intervals, regression analysis, and ANOVA (Analysis of Variance).
3. Exploratory Data Analysis (EDA) – Visual and statistical exploration of data to find patterns or anomalies.
Tools: Histograms, Box plots, Scatter plots, Correlation matrix
4. Predictive Analytics (based on statistical modeling) – Forecast future trends.
Tools:
o Linear & Logistic Regression
o Time Series Analysis
o Classification and Clustering (e.g., KNN, K-means)
5. Prescriptive Analytics – Uses statistical techniques and optimization to recommend actions.
Example – simple descriptive statistics with pandas:
import pandas as pd

data = [12, 15, 14, 10, 8, 13, 15, 16, 14, 10]       # sample scores
df = pd.DataFrame(data, columns=['Scores'])

print("Mean:", df['Scores'].mean())                   # central tendency
print("Standard Deviation:", df['Scores'].std())      # spread of the data
print("Median:", df['Scores'].median())               # middle value
Q13. Differences between statistical analysis and non-statistical analysis
Ans.13 –
● Definition – Statistical analysis: involves analyzing data using mathematical techniques, particularly probability and statistics, to make inferences or draw conclusions. Non-statistical analysis: involves qualitative or logical reasoning, visual inspection, or descriptive examination without using formal statistical methods.
● Data Type – Statistical: primarily uses quantitative data (numbers, measurements). Non-statistical: can use qualitative or quantitative data, often focusing on non-numerical insights.
● Tools Used – Statistical: statistical software like R, SPSS, Python (with libraries like pandas, NumPy, scipy), Excel (with formulas/statistics). Non-statistical: descriptive tables, reports, diagrams, text analysis tools, or simply human judgment.
● Examples – Statistical: regression analysis, hypothesis testing, correlation, standard deviation, t-tests, ANOVA. Non-statistical: SWOT analysis, trend observation without metrics, thematic analysis, heuristic evaluation.
● Objectivity – Statistical: generally more objective; relies on numerical evidence and probabilities. Non-statistical: often more subjective; may depend on the analyst’s interpretation or intuition.
● Purpose – Statistical: to find patterns, test hypotheses, estimate parameters, or predict outcomes using data. Non-statistical: to understand context, categorize information, summarize findings, or generate insights without strict mathematical models.
● Accuracy & Reliability – Statistical: results can be tested and replicated statistically. Non-statistical: results are harder to replicate or verify without formal methods.
Q14. State the major categories of statistics.
Ans.14 The major categories of statistics are typically divided into two broad branches:
1. Descriptive Statistics
This branch deals with summarizing and organizing data so it can be easily understood.
Key Features:
● Focuses on what has happened.
● Does not draw conclusions beyond the data.
Common Techniques:
● Measures of Central Tendency:
o Mean, Median, Mode
● Measures of Dispersion:
o Range, Variance, Standard Deviation, Interquartile Range
● Data Visualization:
o Histograms, Pie Charts, Box Plots, Bar Charts
● Tabulation:
o Frequency distributions, Cross-tabulation
Example:
"The average score of students in a test is 74 out of 100."
2. Inferential Statistics
This branch involves making predictions or generalizations about a population based on a sample.
Key Features:
● Makes inferences and decisions about a population.
● Involves uncertainty and uses probability theory.
Common Techniques:
● Estimation:
o Confidence intervals
● Hypothesis Testing:
o t-test, z-test, ANOVA, chi-square test
● Regression Analysis:
o Linear & logistic regression
● Correlation Analysis:
o Pearson and Spearman correlation
Example:
"Based on a sample of 100 students, we are 95% confident that the average test score for all students is between 72 and 76."
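For illustration, a small sketch of the confidence-interval idea from the example above, computed with SciPy on a made-up sample of scores:
import numpy as np
from scipy import stats

scores = np.array([72, 75, 74, 76, 73, 77, 74, 75, 73, 76])   # made-up sample

mean = scores.mean()
sem = stats.sem(scores)                                        # standard error of the mean
# 95% confidence interval for the population mean, using the t-distribution
low, high = stats.t.interval(0.95, len(scores) - 1, loc=mean, scale=sem)
print(f"95% confidence interval for the mean score: ({low:.1f}, {high:.1f})")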