
Data Science

https://yashnote.notion.site/Data-Science-1180e70e8a0f80bbbfa2fdee5d1f1d85?pvs=4
Unit 1
Introduction to Data Science
Difference among AI, Machine Learning, and Data Science
Comparison of AI, ML, and Data Science:
Basic Introduction of Python
Key Features of Python:
Common Use Cases of Python:
Python for Data Science
1. Pandas
2. NumPy
3. Scikit-learn
4. Data Visualization
5. Advanced Python Concepts for Data Science
Introduction to Google Colab
Key Features of Google Colab:
Use Cases of Google Colab:
Popular Dataset Repositories
Discussion on Some Datasets:
Data Pre-processing
Python Example: Data Cleaning (Handling Missing Values)
Data Scales
Python Example: Encoding Ordinal Data
Similarity and Dissimilarity Measures
Python Example: Cosine and Euclidean Similarity
Sampling and Quantization of Data
Sampling:
Quantization:
Python Example: Random Sampling and Quantization
Filtering
Python Example: Moving Average and Median Filter
Data Transformation
Python Example: Data Normalization and Log Transformation

Data Merging
Python Example: Merging DataFrames
Data Visualization
Python Example: Basic Data Visualization using matplotlib
Principal Component Analysis (PCA)
Python Example: PCA in Python
Correlation
Python Example: Calculating Correlation
Chi-Square Test
Python Example: Chi-Square Test
Summary
Unit 2
Regression Analysis
Linear Regression
Python Example: Simple Linear Regression
Generalized Linear Models (GLM)
Python Example: Logistic Regression
Regularized Regression
Python Example: Ridge and Lasso Regression
Summary of Key Concepts
Cross-Validation
Types of Cross-Validation:
Python Example: K-Fold Cross-Validation
Training and Testing Data Set
Python Example: Train-Test Split
Overview of Nonlinear Regression
Python Example: Nonlinear Regression (Polynomial Regression)
Overview of Ridge Regression
Advantages:
Python Example: Ridge Regression
Summary of Key Concepts
Latent Variables
Examples:
Structural Equation Modeling (SEM)
Key Components of SEM:
Python Libraries for SEM:
Python Example: Factor Analysis (Latent Variable)
Factor Analysis Example (Latent Variables Extraction)

SEM Example Using semopy
Structural Equation Model Example:
Explanation:
Summary of Key Concepts

Unit 1
Introduction to Data Science
Data Science is a multidisciplinary field that combines statistics, computer
science, mathematics, and domain-specific knowledge to extract insights and
knowledge from structured and unstructured data. Data Science applies scientific
methods, processes, algorithms, and systems to analyze vast amounts of data
and generate actionable insights. In today's world, where data is generated in
massive volumes from various sources such as social media, business
transactions, IoT devices, etc., Data Science plays a critical role in making sense
of that data.
Key Aspects of Data Science:

1. Data Collection: Gathering data from various sources (web scraping, APIs, surveys, sensors, etc.).

2. Data Cleaning: Data often contains noise, missing values, and inconsistencies, which need to be addressed through data cleaning techniques.

3. Data Exploration and Analysis: Exploratory Data Analysis (EDA) involves visualizing and summarizing the key properties of the data.

4. Statistical Analysis: Using statistics and probability to interpret data patterns, trends, and relationships.

5. Data Modeling: Applying algorithms and machine learning models to make predictions or discover insights from the data.

6. Data Visualization: Presenting data in visual formats (graphs, charts, etc.) to communicate findings to stakeholders.

Skills Required for Data Science:

1. Mathematics and Statistics: Understanding of concepts like probability, distributions, hypothesis testing, linear algebra, etc.

2. Programming: Expertise in programming languages like Python, R, and SQL for data manipulation and analysis.

3. Machine Learning: Knowledge of machine learning algorithms like linear regression, decision trees, clustering, etc.

4. Data Wrangling and Cleaning: Ability to preprocess data, handle missing data, and deal with data inconsistencies.

5. Data Visualization: Familiarity with tools like Matplotlib, Seaborn, Tableau, or Power BI to create meaningful visualizations.

Applications of Data Science:

Healthcare: Predictive modeling for disease outbreaks, personalized medicine, medical image analysis.

Finance: Fraud detection, algorithmic trading, risk management.

Retail: Recommendation engines, inventory management, market analysis.

Entertainment: Recommendation systems in streaming services, content analysis.

Transportation: Route optimization, self-driving cars, traffic prediction.

Data Science is essentially a combination of statistics, domain expertise, and computer science to interpret large-scale data. It is vital in decision-making processes in various sectors such as business, healthcare, finance, and government. With advancements in big data technologies and AI, Data Science is a field with immense growth potential.

Difference among AI, Machine Learning, and Data Science
Artificial Intelligence (AI):
AI is a broader concept that refers to machines or systems that mimic human intelligence to perform tasks. It involves creating systems that can perceive their environment and take actions to achieve specific goals. AI encompasses various subfields like Natural Language Processing (NLP), computer vision, robotics, and more.

Key Points:

Goal of AI: To simulate human intelligence in machines.

Techniques in AI: Search algorithms, expert systems, neural networks, etc.

Types of AI:

Narrow AI: AI systems designed for specific tasks (e.g., Siri, Alexa,
recommendation engines).

General AI: A theoretical concept where machines would possess the ability to perform any cognitive task that a human can.

Super AI: A future concept where AI surpasses human intelligence.

Examples:

Self-driving cars (AI-driven vehicles).

Image recognition software (AI-based vision systems).

Machine Learning (ML):


Machine Learning is a subset of AI that involves the development of algorithms
that allow computers to learn patterns and make decisions without being explicitly
programmed. ML systems improve their performance over time by learning from
data.

Key Points:

Goal of ML: To enable machines to learn from data and improve with
experience.

Techniques in ML: Supervised learning, unsupervised learning, reinforcement learning, etc.

Types of ML:

Supervised Learning: The algorithm is trained on labeled data (e.g., classification, regression).

Unsupervised Learning: The algorithm is used to find hidden patterns in unlabeled data (e.g., clustering, association).

Reinforcement Learning: The model learns through trial and error to maximize rewards.

Examples:

Spam detection in emails (Supervised ML).

Customer segmentation (Unsupervised ML).

Data Science:
Data Science is a more comprehensive field that integrates AI, ML, and other tools
to work with data in various forms. It focuses on extracting insights and
knowledge from data using a mix of statistics, algorithms, and domain knowledge.
While AI and ML are tools used in Data Science, Data Science is concerned with
the entire data lifecycle from collection to insight generation.

Key Points:

Goal of Data Science: To extract actionable insights from large datasets using
a mix of techniques.

Techniques in Data Science: Data wrangling, data visualization, machine learning, and statistical analysis.

Scope: Data Science includes AI, ML, and various other techniques like data mining and business intelligence.

Examples:

Analyzing sales data to predict future trends (Data Science using ML algorithms).

Building recommendation engines for e-commerce platforms (Data Science using AI and ML).

Comparison of AI, ML, and Data Science:


| Aspect | Artificial Intelligence | Machine Learning | Data Science |
| --- | --- | --- | --- |
| Definition | Field of creating intelligent machines | Subfield of AI focused on learning | Broad field focusing on data insights |
| Scope | Very broad, includes ML and more | Narrower, focused on learning from data | Comprehensive, includes ML, AI, and more |
| Objective | Simulate human intelligence | Learn patterns from data | Extract insights from data |
| Techniques | Neural networks, NLP, etc. | Supervised, unsupervised learning | Data wrangling, visualization, ML |
| Applications | Robotics, game playing, virtual assistants | Spam filters, recommendation systems | Market analysis, fraud detection |

In conclusion, AI is the overarching concept that aims to create intelligent systems, Machine Learning is a subset of AI that focuses on algorithms capable of learning from data, and Data Science is a broader discipline that leverages AI and ML, along with statistics and other techniques, to extract insights from data.

Basic Introduction of Python


Python is a high-level, interpreted, general-purpose programming language. It
was created by Guido van Rossum and first released in 1991. Python emphasizes
code readability and simplicity, making it an ideal choice for beginners and
professionals alike. Its extensive libraries and frameworks make it highly versatile,
used across various domains, including web development, data analysis, artificial
intelligence, and scientific computing.

Key Features of Python:


1. Easy Syntax: Python has a clear and straightforward syntax that resembles
plain English, making it easier to learn and write code.

2. Interpreted Language: Python code is executed line by line, which allows for
interactive debugging.

3. Dynamically Typed: Variables in Python don’t need an explicit declaration, as the type is inferred during execution.

4. Cross-Platform: Python is available on multiple platforms like Windows, Linux, and macOS.

5. Extensive Libraries: Python has a vast collection of standard libraries and external packages for different tasks (e.g., NumPy for numerical computations, Pandas for data manipulation, Matplotlib for data visualization, etc.).

6. Object-Oriented and Functional: Python supports both object-oriented programming (OOP) and functional programming paradigms.

7. Community Support: Python has a large, active community that continually contributes to its development and provides support via forums and documentation.

Common Use Cases of Python:


Web Development: Using frameworks like Django, Flask.

Data Science and Machine Learning: With libraries like Pandas, NumPy, Scikit-learn, TensorFlow.

Automation/Scripting: Writing scripts for automating tasks like file management, data scraping, etc.

Scientific Computing: Python is widely used in academic and research settings for simulations, data analysis, and scientific computation.

Python for Data Science


1. Pandas
Pandas is a powerful library for data manipulation and analysis in Python.

1.1 Advanced DataFrame Operations

Grouping and Aggregation

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5., 8., 1., 2., 9.]})
grouped = df.groupby('A').agg({'B': 'sum', 'C': 'mean'})

Pivot Tables

pivoted = df.pivot_table(values='B', index='A', columns='C', aggfunc='sum')

Merging and Joining

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
merged = pd.merge(df1, df2, on='key')

1.2 Time Series Analysis

dates = pd.date_range('20230101', periods=6)
ts = pd.Series(range(6), index=dates)
resampled = ts.resample('2D').sum()

1.3 Handling Missing Data

import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, np.nan, 8],
                   'C': [9, 10, 11, 12]})
filled = df.fillna(method='ffill')
interpolated = df.interpolate()

2. NumPy
NumPy is fundamental for numerical computing in Python.

2.1 Advanced Array Operations

import numpy as np

# Broadcasting
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 20, 30])
result = a + b # b is broadcast to match a's shape

# Fancy indexing
x = np.arange(10)
indices = [2, 5, 8]
selected = x[indices]

2.2 Vectorization

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-10, 10, 100)
y = sigmoid(x)  # Vectorized operation

3. Scikit-learn
Scikit-learn is a machine learning library for Python.

3.1 Pipeline Creation

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC())
])

# Assumes X_train, y_train, X_test are already defined
pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)

3.2 Cross-Validation

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier()
scores = cross_val_score(rf, X, y, cv=5)

4. Data Visualization
4.1 Matplotlib

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.plot(x, y, 'r-', label='Data')
plt.title('Sample Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()

4.2 Seaborn

import seaborn as sns

sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
sns.scatterplot(x="total_bill", y="tip", hue="time", data=tips)
plt.show()

5. Advanced Python Concepts for Data Science
5.1 List Comprehensions and Generator Expressions

# List comprehension
squares = [x**2 for x in range(10)]

# Generator expression
sum_of_squares = sum(x**2 for x in range(1000000))

5.2 Lambda Functions

df['new_column'] = df['old_column'].apply(lambda x: x*2 if x > 0 else x)

5.3 Map, Filter, and Reduce

from functools import reduce

numbers = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x**2, numbers))
evens = list(filter(lambda x: x % 2 == 0, numbers))
product = reduce(lambda x, y: x * y, numbers)

These concepts and libraries form the core of Python's data science ecosystem,
providing powerful tools for data manipulation, analysis, and visualization.

Introduction to Google Colab


Google Colab (Colaboratory) is a free, cloud-based Jupyter notebook
environment that allows users to write and execute Python code in their browsers.
Colab is particularly useful for data science and machine learning projects due to
its ability to leverage powerful hardware like GPUs (Graphics Processing Units)
and TPUs (Tensor Processing Units) for computation.

Key Features of Google Colab:

1. Cloud-Based: No installation is required. Notebooks are stored in Google Drive, and you can access them from anywhere.

2. Free GPU/TPU Access: Colab provides free access to GPUs and TPUs, which are vital for high-performance tasks like deep learning.

3. Pre-installed Libraries: Colab comes with many popular libraries like TensorFlow, PyTorch, Pandas, NumPy, and Scikit-learn already installed.

4. Jupyter Notebook Interface: Colab uses the familiar Jupyter Notebook interface, allowing you to write, visualize, and execute Python code interactively.

5. Integration with Google Drive: You can save and load datasets and notebooks directly to and from Google Drive (a short mounting snippet follows the use cases below).

6. Collaboration: Similar to Google Docs, Colab supports real-time collaboration, enabling multiple users to work on the same notebook simultaneously.

7. Markdown and LaTeX Support: Colab allows for the inclusion of Markdown and LaTeX (for writing mathematical equations) alongside code.

Use Cases of Google Colab:


Data Science and Machine Learning: Due to its GPU and TPU support, Colab is commonly used for training machine learning models.

Collaborative Research: Colab’s real-time collaboration feature makes it suitable for teamwork and academic projects.

Educational Purposes: It's widely used by students and educators for learning Python and machine learning without the need for local installation.

Prototyping and Experimentation: Researchers and developers use Colab to quickly prototype and test machine learning models.
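As a small illustration of the Google Drive integration listed above, the snippet below mounts Drive inside a Colab notebook and reads a file from it; the CSV path is a hypothetical example.

from google.colab import drive
import pandas as pd

# Mount Google Drive into the Colab runtime (prompts for authorization)
drive.mount('/content/drive')

# Load a dataset stored in Drive (hypothetical file path)
df = pd.read_csv('/content/drive/MyDrive/datasets/example.csv')
print(df.head())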

Popular Dataset Repositories


Datasets are crucial for training, testing, and evaluating models in machine
learning and data science projects. Numerous repositories provide free access to
diverse datasets across various domains, such as healthcare, finance, image
recognition, and more. Here are some popular dataset repositories:

1. Kaggle Datasets:

Website: https://www.kaggle.com/datasets

Kaggle is one of the largest platforms for data science competitions and
also hosts a wide range of datasets. Users can search for datasets by
category, size, or application domain.

Popular Datasets:

Titanic Survival Dataset: A well-known dataset for learning data analysis and machine learning, focused on predicting the survival of passengers on the Titanic.

MNIST Dataset: A large dataset of handwritten digits commonly used for image classification.

COVID-19 Dataset: Datasets on COVID-19 cases and trends across countries, regions, and demographics.

2. UCI Machine Learning Repository:

Website: https://archive.ics.uci.edu/ml/index.php

The UCI Machine Learning Repository is a popular destination for publicly available datasets, widely used in machine learning research and education.

Popular Datasets:

Iris Dataset: A classic dataset in machine learning, used for classification problems involving flower species.

Wine Quality Dataset: Contains features related to wine composition and helps predict wine quality.

Adult Dataset: Used for income classification based on demographic attributes.

3. Google Dataset Search:

Website: https://datasetsearch.research.google.com/

Google’s Dataset Search allows users to find datasets across the web on different platforms. It indexes datasets from a variety of sources such as academic journals, governmental agencies, and open data platforms.

4. Data.gov:

Website: https://www.data.gov/

Data.gov is a U.S. government website that provides access to open datasets across various sectors such as agriculture, education, health, and public safety.

Popular Datasets:

US Census Data: Comprehensive demographic data about the U.S. population.

Crime Data: Data related to crimes across various U.S. cities and states.

Environmental Data: Contains data on climate change, water quality, and air pollution.

5. AWS Open Data Registry:

Website: https://registry.opendata.aws/

Amazon Web Services (AWS) hosts numerous open datasets for public
use, including datasets for satellite imagery, genomics, and machine
learning models.

Popular Datasets:

Amazon Reviews: A collection of product reviews from Amazon, useful for NLP tasks.

NOAA Weather Data: Weather-related datasets that include historical data and real-time monitoring.

SpaceNet Dataset: Satellite imagery datasets used for training models in geospatial analysis and computer vision.

Discussion on Some Datasets:


1. Titanic Dataset:

Description: The Titanic dataset contains information on passengers who
were aboard the Titanic when it sank. It includes features such as age,
sex, class, fare, and whether they survived.

Use Case: It is commonly used to teach binary classification algorithms. The goal is to predict whether a passenger survived or not based on the given features.

Analysis: With techniques like logistic regression or decision trees, you can predict passenger survival probability, visualizing patterns in demographics and survival.

2. MNIST Dataset:

Description: MNIST is a collection of 70,000 images of handwritten digits, where each image is labeled with the corresponding digit.

Use Case: It is a benchmark dataset for image classification algorithms, particularly in deep learning. It is widely used to test convolutional neural networks (CNNs).

Analysis: The dataset is preprocessed and allows researchers to focus on experimenting with different machine learning models. CNNs usually achieve high accuracy rates on this dataset.

3. Iris Dataset:

Description: The Iris dataset includes features such as petal length, petal
width, sepal length, and sepal width for three species of Iris flowers.

Use Case: It is widely used for supervised learning tasks like classification. The goal is to predict the species of the Iris flower based on its features.

Analysis: With this dataset, techniques like k-Nearest Neighbors (k-NN) or Support Vector Machines (SVM) can be applied to classify the flower species.

4. Wine Quality Dataset:

Description: The dataset contains chemical features of different wine samples, such as acidity, sugar content, and pH, along with their quality score (ranging from 0 to 10).

Use Case: It is used for regression problems, where the goal is to predict
the wine quality based on its features.

Analysis: Various regression techniques such as linear regression or decision trees can be applied to predict wine quality and study how different factors contribute to the overall quality.

In summary, Python and Google Colab are essential tools for data scientists,
offering powerful features for data analysis, machine learning, and scientific
computing. Popular dataset repositories like Kaggle, UCI, and Data.gov provide
valuable datasets that are commonly used for academic, research, and
commercial purposes. Understanding and analyzing these datasets is a critical
skill in data science.

Data Pre-processing
Data pre-processing is a critical step in the data analysis and machine learning
pipeline. It involves preparing raw data to make it suitable for further analysis or
model training. The quality of the data can significantly influence the performance
of machine learning models. Data pre-processing helps in handling missing
values, removing noise, scaling, transforming, and integrating data from multiple
sources.
Key steps in data pre-processing include:

1. Data Cleaning: Handling missing data, noise, and inconsistencies.

2. Data Integration: Combining data from multiple sources into a unified dataset.

3. Data Transformation: Scaling, normalizing, and converting data types to ensure uniformity.

4. Data Reduction: Reducing the volume of data to make analysis more efficient without losing important information.

5. Data Discretization: Converting continuous data into discrete intervals for certain algorithms that require categorical data (a short discretization sketch follows the data-cleaning example below).

Example: If you have a dataset with missing values, you can fill them using the
mean, median, or mode of the available data (imputation). Alternatively, rows with
missing values can be removed if they are not critical.

Python Example: Data Cleaning (Handling Missing Values)

import pandas as pd
import numpy as np

# Sample dataset
data = {'Age': [25, 30, np.nan, 22, np.nan],
        'Salary': [50000, 54000, np.nan, 42000, 60000]}
df = pd.DataFrame(data)

# Fill missing values with the mean
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())
print(df)
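To illustrate the data discretization step listed earlier, the sketch below bins a continuous Age column into labeled intervals with pandas; the bin edges and labels are illustrative choices.

import pandas as pd

ages = pd.Series([25, 30, 27, 22, 41, 55, 63])

# Discretize continuous ages into three labeled bins (illustrative edges)
age_groups = pd.cut(ages, bins=[0, 30, 50, 100], labels=['Young', 'Middle-aged', 'Senior'])
print(age_groups)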

Data Scales
Data can exist on different scales, which determine the type of statistical analysis
and machine learning techniques applicable to it. Understanding data scales is
vital for selecting the right methods for data processing.

1. Nominal Scale:

This is a categorical scale where values represent categories without any order or ranking.

Example: Gender (Male, Female), colors (Red, Blue, Green).

Operations: Count, Mode.

2. Ordinal Scale:

This scale represents categories that have a meaningful order but no precise difference between values.

Example: Ratings (Excellent, Good, Fair, Poor), ranking in a race (1st, 2nd,
3rd).

Operations: Median, Percentile.

3. Interval Scale:

In this scale, the intervals between values are meaningful, but there is no
true zero point. Differences are consistent.

Example: Temperature in Celsius or Fahrenheit, dates on a calendar.

Operations: Addition, Subtraction, Mean, Standard Deviation.

4. Ratio Scale:

This scale has all the characteristics of the interval scale, with a true zero
point that indicates the absence of the quantity being measured.

Example: Height, weight, age, income.

Operations: Multiplication, Division.

Python Example: Encoding Ordinal Data

from sklearn.preprocessing import OrdinalEncoder

# Example of ordinal data: education levels
education_levels = [['High School'], ['Bachelor'], ['Master'], ['PhD']]

# Ordinal encoding
encoder = OrdinalEncoder(categories=[['High School', 'Bachelor', 'Master', 'PhD']])
encoded_education = encoder.fit_transform(education_levels)
print(encoded_education)
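For nominal data (categories with no inherent order, such as the color example above), ordinal codes would impose a spurious ranking, so one-hot encoding is commonly used instead. A minimal sketch with pandas, using made-up values:

import pandas as pd

colors = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue']})

# One-hot encoding: one binary indicator column per category
one_hot = pd.get_dummies(colors, columns=['Color'])
print(one_hot)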

Similarity and Dissimilarity Measures


Similarity and dissimilarity measures are used to quantify how similar or different
two data points (or sets of data) are. These measures are critical for tasks such as
clustering, classification, and recommendation systems.

Python Example: Cosine and Euclidean Similarity

from sklearn.metrics.pairwise import cosine_similarity
from scipy.spatial.distance import euclidean

# Example vectors
vector_a = [1, 0, -1]
vector_b = [0, 1, 0]

# Cosine similarity
cos_sim = cosine_similarity([vector_a], [vector_b])
print("Cosine Similarity:", cos_sim)

# Euclidean distance
euc_dist = euclidean(vector_a, vector_b)
print("Euclidean Distance:", euc_dist)

Sampling and Quantization of Data


Sampling:
Sampling refers to the process of selecting a subset of data from a larger dataset.
It’s particularly important when working with large datasets, as it allows for faster
computation and analysis.

1. Random Sampling: Each data point has an equal probability of being selected.

2. Stratified Sampling: The population is divided into homogeneous subgroups (strata), and samples are taken from each subgroup proportionally (see the sketch right after this list).

3. Systematic Sampling: Data points are selected at regular intervals from the dataset.
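A minimal sketch of the stratified sampling idea, using scikit-learn's train_test_split with the stratify argument; the imbalanced labels are made up for illustration.

import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: 10 samples with an imbalanced binary label (8 zeros, 2 ones)
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# Stratified sampling: the 50% sample keeps roughly the same class proportions as y
_, X_sample, _, y_sample = train_test_split(X, y, test_size=0.5, stratify=y, random_state=42)
print("Sampled labels:", y_sample)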

Quantization:
Quantization involves converting continuous data into discrete values or levels.

1. Scalar Quantization: Converts continuous variables into discrete values by mapping them to quantization intervals.

Python Example: Random Sampling and Quantization

import numpy as np

# Random sampling
data = np.arange(1, 101)
sample = np.random.choice(data, size=10, replace=False)
print("Random Sample:", sample)

# Quantization (Bin data into 5 levels)
quantized_data = np.digitize(data, bins=[20, 40, 60, 80])
print("Quantized Data:", quantized_data)

Filtering
Filtering is a technique used to remove or reduce noise from a dataset. It is an
essential step in data pre-processing, especially in signal processing and time-
series data. The goal is to smooth the data or remove outliers that can skew the
results of your analysis.

1. Moving Average Filter: Averages the data points over a sliding window,
helping to smooth out short-term fluctuations.

2. Median Filter: Replaces each data point with the median of neighboring
points, often used for outlier removal.

Python Example: Moving Average and Median Filter

import numpy as np
import pandas as pd
from scipy.ndimage import median_filter

# Sample time-series data
data = pd.Series([10, 12, 11, 13, 20, 15, 14, 13, 15, 18, 19, 25])

# Moving average filter (window size = 3)
moving_avg = data.rolling(window=3).mean()
print("Moving Average Filter:\n", moving_avg)

# Median filter
median_filt = pd.Series(median_filter(data, size=3))
print("Median Filter:\n", median_filt)

Data Transformation
Data transformation is the process of converting data into a format suitable for
analysis. This can involve scaling, normalizing, encoding categorical data, or
transforming features to reduce skewness.

1. Normalization: Rescaling data to a range of [0, 1].

2. Standardization: Scaling data so that it has a mean of 0 and a standard deviation of 1 (a short sketch follows the example below).

3. Logarithmic Transformation: Useful for handling skewed data by applying a logarithmic function.

Python Example: Data Normalization and Log Transformation

from sklearn.preprocessing import MinMaxScaler, StandardScaler
import numpy as np

# Sample data
data = np.array([[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]])

# Normalization (Min-Max scaling)
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)
print("Normalized Data:\n", normalized_data)

# Log transformation
log_transformed = np.log(data + 1)
print("Log Transformed Data:\n", log_transformed)

Data Merging
Data merging involves combining two or more datasets into a single dataset based
on a common attribute or key. Common merging operations include:

1. Concatenation: Appending datasets along rows or columns (a short concatenation sketch follows the merge example below).

2. Joining: Merging datasets based on a key (like SQL joins: inner, left, right, and outer).

Python Example: Merging DataFrames

import pandas as pd

# Sample data
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Score': [85, 90, 75]})

# Merge (Inner Join)
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print("Merged Data (Inner Join):\n", merged_df)

Data Visualization
Data visualization is a key aspect of data analysis as it helps to understand
patterns, trends, and relationships in the data. Common visualization techniques
include:

1. Line Plot: Useful for time-series data.

2. Bar Plot: Displays categorical data.

3. Histogram: Shows the distribution of continuous data.

4. Scatter Plot: Shows relationships between two variables.

Python Example: Basic Data Visualization using matplotlib

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Height': [150, 160, 170, 180, 190],
    'Weight': [50, 60, 70, 80, 90]
})

# Scatter plot for Height vs. Weight
plt.scatter(data['Height'], data['Weight'])
plt.title('Height vs Weight')
plt.xlabel('Height')
plt.ylabel('Weight')
plt.show()

# Histogram for Weight distribution
plt.hist(data['Weight'], bins=5)
plt.title('Weight Distribution')
plt.xlabel('Weight')
plt.ylabel('Frequency')
plt.show()

Principal Component Analysis (PCA)


PCA is a dimensionality reduction technique used to reduce the number of
variables in a dataset while retaining most of the variation in the data. PCA
transforms the data into a new set of orthogonal components that capture the
variance of the data.
Steps in PCA:

1. Standardize the data.

2. Compute the covariance matrix.

3. Compute the eigenvectors and eigenvalues of the covariance matrix.

4. Project the data onto the principal components.

Python Example: PCA in Python

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np

# Sample data
data = np.array([[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]])

# Standardizing the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# Applying PCA
pca = PCA(n_components=1)  # Reducing to 1 principal component
data_pca = pca.fit_transform(data_scaled)
print("PCA Transformed Data:\n", data_pca)

Correlation
Correlation measures the strength and direction of a linear relationship between
two variables. It ranges from -1 to 1:

+1: Perfect positive correlation

0: No correlation

-1: Perfect negative correlation

Common correlation coefficients:

1. Pearson Correlation: Measures linear correlation between continuous variables.

2. Spearman Correlation: Measures monotonic relationships (used for ordinal data); a short sketch follows the Pearson example below.

Python Example: Calculating Correlation

import pandas as pd

# Sample data

data = pd.DataFrame({
'X': [1, 2, 3, 4, 5],
'Y': [2, 4, 6, 8, 10]
})

# Pearson correlation
correlation = data.corr(method='pearson')
print("Pearson Correlation:\\n", correlation)

Chi-Square Test
The Chi-Square test is used to determine if there is a significant association
between two categorical variables. It compares the observed frequencies with the
expected frequencies to test for independence.

Python Example: Chi-Square Test

import pandas as pd
from scipy.stats import chi2_contingency

# Contingency table for two categorical variables
data = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Purchased': ['Yes', 'No', 'Yes', 'Yes', 'No']
})

# Create contingency table
contingency_table = pd.crosstab(data['Gender'], data['Purchased'])

# Perform Chi-Square test
chi2, p, dof, expected = chi2_contingency(contingency_table)
print(f"Chi-Square Statistic: {chi2}, p-value: {p}")

Summary
Filtering: Smooths and cleans data using techniques like moving average and
median filters.

Data Transformation: Rescales, normalizes, or logs data to prepare it for analysis.

Data Merging: Combines datasets using joins or concatenation.

Data Visualization: Visualizes data trends using plots like scatter plots,
histograms, and bar charts.

PCA: Reduces dimensionality by projecting data onto principal components.

Correlation: Measures the linear relationship between variables.

Chi-Square Test: Tests the association between two categorical variables.

All these concepts are critical to understanding how to process, analyze, and draw
insights from data, and Python provides powerful libraries like pandas, numpy, and matplotlib to handle these tasks.

Unit 2
Regression Analysis
Regression analysis is a statistical technique used to model and analyze the relationship between a dependent variable (target) and one or more independent variables (features). The goal of regression is to predict or explain the dependent variable based on the given independent variables.
Types of regression analysis:

1. Linear Regression: Models a linear relationship between the dependent and independent variables.

2. Generalized Linear Models (GLM): Extends linear regression to model non-normal data (e.g., logistic regression for binary outcomes).

3. Regularized Regression: Enhances linear regression by adding penalty terms to control overfitting, such as Ridge and Lasso regression.

Linear Regression

The objective of linear regression is to minimize the sum of squared residuals between the actual and predicted values of y.
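Written out for a single feature (standard notation, stated here for completeness), the model and the least-squares objective are:

\[
y = \beta_0 + \beta_1 x + \varepsilon, \qquad \min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_i) \right)^2
\]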

Python Example: Simple Linear Regression

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Sample data (simple linear relationship)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predicting on test data
y_pred = model.predict(X_test)

# Plotting the regression line
plt.scatter(X, y, color='blue')
plt.plot(X, model.predict(X), color='red')
plt.title('Linear Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.show()

# Model parameters
print("Intercept:", model.intercept_)
print("Coefficient:", model.coef_)

Generalized Linear Models (GLM)


Generalized Linear Models (GLMs) extend linear regression to handle non-normal
response distributions. In GLMs, the relationship between the independent
variables \( X \) and the dependent variable \( y \) is modeled through a link
function, which connects the linear predictor to the mean of the distribution.
Common types of GLMs:

1. Logistic Regression: For binary classification outcomes, using the logit link function.

2. Poisson Regression: For count data, using the log link function (a short sketch follows the logistic regression example below).

Python Example: Logistic Regression

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample data for binary classification
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 1, 1, 1])  # Binary outcomes

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Logistic regression model
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

# Predicting on test data
y_pred = log_reg.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Probability of class 1
print("Predicted probabilities:\n", log_reg.predict_proba(X_test))
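The example above covers logistic regression; for the Poisson case listed earlier, here is a minimal sketch using scikit-learn's PoissonRegressor (the count data is made up for illustration):

import numpy as np
from sklearn.linear_model import PoissonRegressor

# Illustrative count data: y holds non-negative event counts
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 2, 4, 6])

# Poisson regression uses a log link, so predicted counts are always non-negative
poisson_reg = PoissonRegressor()  # includes a small L2 penalty by default
poisson_reg.fit(X, y)
print("Predicted counts:", poisson_reg.predict(X))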

Regularized Regression
Regularized regression methods help prevent overfitting by adding a penalty term
to the loss function in the linear regression model. The most common forms of
regularized regression are:

1. Ridge Regression: Adds an L2 penalty (the sum of squared coefficients) to the loss function.

2. Lasso Regression: Adds an L1 penalty (the sum of absolute coefficient values), which promotes sparsity.

3. Elastic Net Regression: A combination of L1 (Lasso) and L2 (Ridge) penalties (a short sketch follows the Ridge and Lasso example below).

Python Example: Ridge and Lasso Regression

from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split

# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ridge Regression (L2 regularization)
ridge_reg = Ridge(alpha=1.0)  # Alpha is the regularization strength
ridge_reg.fit(X_train, y_train)
y_pred_ridge = ridge_reg.predict(X_test)
print("Ridge Regression Predictions:", y_pred_ridge)

# Lasso Regression (L1 regularization)
lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X_train, y_train)
y_pred_lasso = lasso_reg.predict(X_test)
print("Lasso Regression Predictions:", y_pred_lasso)

Summary of Key Concepts


1. Linear Regression:

Assumes a linear relationship between the dependent and independent variables.

Useful for predicting continuous outcomes.

2. Generalized Linear Models (GLMs):

Extends linear regression to non-normal data distributions.

Common types include logistic regression (for binary classification) and
Poisson regression (for count data).

3. Regularized Regression:

Helps prevent overfitting by adding penalty terms to the loss function.

Ridge (L2) adds squared coefficients as a penalty.

Lasso (L1) adds absolute values of coefficients as a penalty, promoting sparsity.

Elastic Net combines both L1 and L2 regularization.

These techniques are fundamental in machine learning and statistical modeling for
solving various prediction and classification problems.

Cross-Validation
Cross-validation is a model evaluation technique that helps assess how well a
machine learning model will generalize to unseen data. Instead of splitting the
dataset into just training and testing sets, cross-validation divides the data into
multiple subsets (folds) and trains the model multiple times, each time using a
different subset for validation and the rest for training.

Types of Cross-Validation:
1. K-Fold Cross-Validation: The data is split into k equal-sized subsets (folds).
The model is trained k times, each time using k-1 folds for training and the
remaining fold for validation. The final result is the average of the results from
the k iterations.

2. Stratified K-Fold: Similar to K-Fold, but ensures each fold has a representative
proportion of classes for classification tasks.

3. Leave-One-Out Cross-Validation (LOOCV): A special case of K-Fold where k is equal to the number of samples, i.e., each sample gets used as a validation set once.

Python Example: K-Fold Cross-Validation

from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression
import numpy as np

# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# KFold Cross-Validation
kf = KFold(n_splits=3)
model = LinearRegression()

# Cross-validation scores (R-squared)
scores = cross_val_score(model, X, y, cv=kf)
print("Cross-validation scores:", scores)
print("Average R-squared score:", np.mean(scores))

Training and Testing Data Set


In machine learning, it is crucial to evaluate model performance on data that was
not used during the training phase. The dataset is typically divided into two parts:

1. Training Set: Used to train the machine learning model. The model learns the
relationships between the input features and the target variable.

2. Testing Set: Used to evaluate the model's performance on unseen data. The
testing set is used to assess how well the model generalizes to new, unseen
examples.

Splitting the dataset is typically done in a ratio, such as 70% for training and 30%
for testing. In cases where the dataset is large, an additional validation set may
also be used for hyperparameter tuning.

Python Example: Train-Test Split

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Split data into 70% training and 30% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Prediction and evaluation on the test set
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error on Test Set:", mse)

Overview of Nonlinear Regression


Nonlinear regression is used when the relationship between the dependent
variable and one or more independent variables is not linear. Unlike linear
regression, nonlinear regression fits a nonlinear function (e.g., polynomial,
exponential, logarithmic) to the data.

Python Example: Nonlinear Regression (Polynomial Regression)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Sample nonlinear data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 4, 9, 16, 25])  # Quadratic relationship

# Polynomial transformation (degree = 2)
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)

# Predict on new data
y_pred = poly_model.predict(X)

# Plotting the results
plt.scatter(X, y, color='blue')
plt.plot(X, y_pred, color='red')
plt.title('Nonlinear Regression (Polynomial)')
plt.xlabel('X')
plt.ylabel('y')
plt.show()

print("Predicted values:\n", y_pred)

Overview of Ridge Regression


Ridge regression is a type of regularized linear regression that adds an L2 penalty
term to the cost function. This penalty term helps to shrink the coefficients and
prevents overfitting by discouraging the model from fitting the training data too
closely.
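In formula form (standard notation), the ridge objective adds the squared coefficients, scaled by the regularization strength \( \alpha \), to the least-squares loss:

\[
\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - \mathbf{x}_i^\top \beta \right)^2 + \alpha \sum_{j=1}^{p} \beta_j^2
\]

Here \( \alpha \) corresponds to the alpha parameter used in the Ridge example below.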

Advantages:
Reduces model complexity and prevents overfitting.

Can handle multicollinearity (when independent variables are highly correlated).

Python Example: Ridge Regression

from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Ridge Regression (alpha = regularization strength)
ridge_reg = Ridge(alpha=1.0)
ridge_reg.fit(X, y)

# Predict on the same data
y_pred = ridge_reg.predict(X)
mse_ridge = mean_squared_error(y, y_pred)

# Results
print("Ridge Regression Predictions:", y_pred)
print("Mean Squared Error:", mse_ridge)

Summary of Key Concepts


1. Cross-Validation:

Helps assess the model's performance by splitting the dataset into multiple subsets.

K-Fold Cross-Validation is a popular method where the data is divided into k folds and the model is trained k times.

2. Training and Testing Data Set:

Data is typically split into training and testing sets.

The training set is used to train the model, and the test set is used to
evaluate performance on unseen data.

3. Nonlinear Regression:

Used when the relationship between the dependent and independent variables is not linear.

Polynomial regression is a common example of nonlinear regression.

4. Ridge Regression:

A type of regularized linear regression that adds an L2 penalty term.

Helps reduce overfitting by shrinking the coefficients.

By understanding and implementing these regression techniques, you can better
model complex data relationships and create more robust predictive models.

Latent Variables
Latent variables are variables that are not directly observed but are inferred or
estimated from other observed variables. They are commonly used in fields such
as psychology, social sciences, and econometrics to represent abstract concepts
like intelligence, socioeconomic status, or customer satisfaction, which are not
directly measurable.

Examples:
Customer Satisfaction: Latent variables might include satisfaction or loyalty,
which are inferred from responses to survey questions.

Intelligence: Inferred from various measurable cognitive tests, but intelligence itself is a latent variable.

Latent variables are often modeled using factor analysis or structural equation
modeling (SEM).

Structural Equation Modeling (SEM)


Structural Equation Modeling (SEM) is a statistical technique that combines
elements of factor analysis and multiple regression to examine complex
relationships between observed and latent variables. SEM allows researchers to
model relationships between:

1. Observed Variables: Measured directly (e.g., responses to a questionnaire).

2. Latent Variables: Inferred from observed variables (e.g., abstract traits like
"satisfaction").

3. Structural Relations: The cause-and-effect relationships between variables.

Key Components of SEM:


1. Measurement Model: Specifies how latent variables are measured by the
observed variables (similar to factor analysis).

2. Structural Model: Specifies the relationships between latent variables (similar
to regression).

SEM is represented visually using path diagrams, where:

Squares represent observed variables.

Circles represent latent variables.

Arrows represent the relationships between variables (unidirectional arrows for causal effects and bidirectional for correlations).

Python Libraries for SEM:


1. semopy : A Python library used to build and estimate SEM models.

2. statsmodels : For factor analysis.

Python Example: Factor Analysis (Latent Variable)


Factor analysis can be used to extract latent variables from a dataset of observed
variables. Below is an example of how to perform factor analysis to extract latent
factors using the factor_analyzer library.

Factor Analysis Example (Latent Variables Extraction)

import pandas as pd
from factor_analyzer import FactorAnalyzer

# Example dataset (observed variables)
data = {
    'Q1': [4, 5, 6, 7, 8],
    'Q2': [2, 4, 5, 6, 7],
    'Q3': [3, 5, 6, 7, 8],
    'Q4': [1, 3, 4, 6, 7]
}
df = pd.DataFrame(data)

# Perform factor analysis to extract latent variables
fa = FactorAnalyzer(n_factors=1, rotation=None)
fa.fit(df)

# Get the factor loadings (how much each observed variable contributes to the latent variable)
factor_loadings = fa.loadings_
print("Factor Loadings:\n", factor_loadings)

# Get the estimated latent variable scores for each observation
latent_variable_scores = fa.transform(df)
print("Latent Variable Scores:\n", latent_variable_scores)

In this example, we assume that the observed variables (e.g., survey questions Q1
to Q4) are used to estimate a single latent factor.

SEM Example Using semopy


To perform SEM in Python, we can use the semopy library, which provides tools for
estimating structural equation models.

Structural Equation Model Example:


In this example, we'll specify an SEM model where latent variable "Satisfaction" is
inferred from observed variables (survey questions), and it impacts another latent
variable "Loyalty".

# Install semopy library first if not installed:
# pip install semopy

import pandas as pd
from semopy import Model

# Example dataset (observed variables)
data = {
    'Q1': [4, 5, 6, 7, 8],
    'Q2': [2, 4, 5, 6, 7],
    'Q3': [3, 5, 6, 7, 8],
    'L1': [3, 4, 5, 6, 7],
    'L2': [4, 5, 6, 6, 8],
    'L3': [5, 6, 7, 7, 9]
}
df = pd.DataFrame(data)

# Define the SEM model
model_desc = """
# Latent variables
Satisfaction =~ Q1 + Q2 + Q3
Loyalty =~ L1 + L2 + L3

# Structural paths
Loyalty ~ Satisfaction
"""

# Build and fit the SEM model (semopy's current API fits directly from the Model object)
model = Model(model_desc)
model.fit(df)

# Print model parameters (factor loadings, path coefficients)
print(model.inspect())

Explanation:
Satisfaction =~ Q1 + Q2 + Q3 : This line specifies that the latent variable
"Satisfaction" is inferred from the observed variables Q1, Q2, and Q3.

Loyalty =~ L1 + L2 + L3 : Similarly, the latent variable "Loyalty" is inferred from L1, L2, and L3.

Loyalty ~ Satisfaction : This defines a structural path where "Loyalty" is influenced by "Satisfaction".

Summary of Key Concepts

1. Latent Variables: These are abstract variables that are not directly observed
but are inferred from other measured variables. Latent variables are commonly
used to represent unobservable constructs like intelligence, satisfaction, or
economic status.

2. Structural Equation Modeling (SEM): A powerful statistical method for examining relationships between observed and latent variables. SEM combines factor analysis and regression to test complex relationships between variables in a single model.

3. Python Code Examples:

Factor analysis can be used to extract latent variables.

semopy is a Python library that facilitates building and optimizing structural equation models.

By using SEM and latent variables, we can model complex relationships in datasets that involve unobservable concepts, leading to better understanding and prediction in fields such as social sciences, marketing, and psychology.
