DataScience Unit1 (+notes)
2004 Facebook
2005 YouTube
2010 Instagram
2011 Snapchat
Data Science Lifecycle
1 Business Problem
2 Data Acquisition
3 Data Preparation
4 Exploratory Data Analysis
5 Data Modeling
6 Visualization and Communication
7 Deployment and Maintenance
Business Problem
Data Acquisition
Web Servers
Logs
Databases
APIs
Online repository
Data Preparation
Data Cleaning
Transformation
Exploratory Data Analysis
Tableau
Power BI
QlikView
The Art of Data Science
"How Big Data is Changing the Whole Equation for Business," Wall Street Journal, March 8, 2013
Several "V"s of big data
Big Data
Big Data (Big Deal!)
Apache
Bigspark
Hadoop
Machine Learning
Each of these features can be combined by a classifier to provide evidence about whether an email is spam.
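As a concrete illustration, here is a minimal sketch (not from the original notes) of combining binary features in a classifier. The feature set and toy data are hypothetical, and scikit-learn's BernoulliNB is just one reasonable choice of classifier.

```python
# A minimal sketch of combining simple binary features in a spam classifier.
# Feature names and toy data are hypothetical illustrations.
from sklearn.naive_bayes import BernoulliNB

# Each row: [contains "free", contains "winner", unknown sender, has attachment]
X = [
    [1, 1, 1, 0],  # spam-like
    [1, 0, 1, 1],  # spam-like
    [0, 0, 0, 0],  # ham-like
    [0, 1, 0, 0],  # ham-like
]
y = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

clf = BernoulliNB().fit(X, y)
print(clf.predict([[1, 1, 0, 0]]))        # predicted label for a new email
print(clf.predict_proba([[1, 1, 0, 0]]))  # combined evidence as probabilities
```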
Machine Learning Methods
❖ Supervised
❖ Unsupervised
❖ Semi-supervised
❖ Reinforcement
Supervised Learning
Supervised machine learning algorithms can apply what has been learned in
the past to new data using labeled examples to predict future events. Starting from
the analysis of a known training dataset, the learning algorithm produces an
inferred function to make predictions about the output values.
❖ The system can provide targets for any new input after sufficient training.
❖ The learning algorithm can also compare its output with the correct, intended
output and find errors in order to modify the model accordingly.
An Example: Supervised Learning
Semi-supervised Learning
❖ Semi-supervised algorithms fall between supervised and unsupervised learning: they train on a small amount of labeled data combined with a large amount of unlabeled data. Systems that use this method can considerably improve learning accuracy.
❖ Semi-supervised learning is usually chosen when labeling the acquired data requires skilled and relevant resources to train from it, whereas acquiring unlabeled data generally requires no additional resources.
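A hedged sketch of the idea: a handful of labeled points plus many unlabeled ones. scikit-learn's LabelPropagation is one of several semi-supervised methods, and the data here is synthetic.

```python
# Semi-supervised sketch: a few labeled points, many unlabeled (marked -1).
import numpy as np
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y_true = (X[:, 0] > 0).astype(int)

y = np.full(100, -1)   # -1 marks "unlabeled" for scikit-learn
y[:5] = y_true[:5]     # only 5 labeled examples

model = LabelPropagation().fit(X, y)
print("accuracy on all points:", (model.transduction_ == y_true).mean())
```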
Reinforcement Learning
A reinforcement learning algorithm is a method that interacts with its environment by producing actions and discovering errors or rewards.
Reinforcement Learning (RL) is a type of machine learning technique that enables an agent to learn in
an interactive environment by trial and error using feedback from its own actions and experiences.
❖ Trial and error search and delayed reward are the most relevant characteristics of reinforcement
learning.
❖ This method allows machines and software agents to automatically determine the ideal behavior within
a specific context in order to maximize its performance.
❖ Reinforcement learning uses rewards and punishment as signals for positive and negative behavior.
❖ Simple reward feedback is required for the agent to learn which action is best; this is known as the
reinforcement signal.
❖ Richard Sutton is widely regarded as the father of reinforcement learning.
Reinforcement Learning (Contd.)
It differs from supervised learning in that the machine is not trained with a sample data set; instead, it learns by trial and error. A series of right decisions strengthens the method as it better solves the problem.
As compared to unsupervised learning, reinforcement learning is different in terms of goals. While the
goal in unsupervised learning is to find similarities and differences between data points, in reinforcement
learning the goal is to find a suitable action model that would maximize the total cumulative reward of the
agent.
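To make trial-and-error learning concrete, below is a minimal sketch (not from the original notes) of an epsilon-greedy agent on a 3-armed bandit; the reward probabilities are invented for illustration.

```python
# Reinforcement-learning sketch: an epsilon-greedy agent learns by trial
# and error which of 3 actions yields the most reward.
import numpy as np

rng = np.random.default_rng(42)
true_reward_prob = np.array([0.2, 0.5, 0.8])  # hidden from the agent
q = np.zeros(3)        # estimated value of each action
counts = np.zeros(3)
epsilon = 0.1          # exploration rate

for step in range(1000):
    if rng.random() < epsilon:
        action = int(rng.integers(3))   # explore: try a random action
    else:
        action = int(np.argmax(q))      # exploit: best action so far
    # Simple reward feedback: the reinforcement signal.
    reward = float(rng.random() < true_reward_prob[action])
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]  # incremental mean

print("estimated action values:", q.round(2))  # approach the true probabilities
```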
The Data Science Inference Stack
Data science is a field in which theories are implemented using data, some of it big data. This is embodied in an inference stack comprising (in sequence): theories, models, intuition, causality, prediction, and correlation.
Theories:
Theories are statements of how the world should be or is, and are derived from axioms (assumptions about the world) or from precedent theories.
Models:
• Models are implementations of theory, and in data science are often algorithms
based on theories that are run on data.
• The academic Thomas Davenport writes that models are key, and should not be eschewed as data volumes increase.
The Bell-Shaped Curve: Gaussian (Normal) Distribution
This curve is symmetric around the mean; the mean, median, and mode are all the same. The normal distribution describes how the values of a variable are distributed. It is typically a symmetric distribution where most observations cluster around the central peak, and values further away from the mean taper off equally in both directions.
Applications of Data Science
E-commerce
Recommend products to customers
Education
Explore current trends and find the latest courses as per industry needs
Collect student feedback
Understand student requirements
Internet Search
Take the user's query
Provide results
Show relevant recommendations
Advertising
Post ads on websites
Identify targeted customers and recommend products to them
Recommendation
Products, Entertainment (video streaming, music)
Predictive Modeling
Airline companies
Intuition
The results of running a model lead to intuition, i.e., a deeper understanding of the world based on theory, model, and data.
Causality
Once we have established intuition for the results of a model, it remains to be seen whether the relationships we observe are causal, predictive, or merely correlational. Theory may be causal and tested as such. Granger (1969) causality is often stated in mathematical form for two stationary time series of data as follows. X is said to Granger-cause Y if, in the equation system

$$Y_t = a_1 + b_1 Y_{t-1} + c_1 X_{t-1} + e_{1t}$$
$$X_t = a_2 + b_2 Y_{t-1} + c_2 X_{t-1} + e_{2t}$$

the coefficient $c_1$ is significant and $b_2$ is not significant. Then X causes Y, but not vice versa. Causality is a hard property to establish, even with theoretical foundation, as the causal effect has to be well entrenched in the data.
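A hedged sketch of running this test in Python: statsmodels provides grangercausalitytests, and the synthetic series below (where x leads y by one period) is invented for illustration.

```python
# Granger-causality sketch using statsmodels on synthetic data.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()  # x leads y

# Column order matters: the test asks whether the 2nd column
# Granger-causes the 1st.
data = np.column_stack([y, x])
grangercausalitytests(data, maxlag=1)  # prints F-tests and p-values
```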
Correlation
Finally, there is correlation, at the end of the data science inference chain. Contemporaneous movement between two variables is quantified using correlation. In many cases, we uncover correlation, but no prediction or causality.
Correlation has great value to firms attempting to tease out beneficial information from big data. And even though correlation captures only a linear relationship between variables, it lays the groundwork for uncovering nonlinear relationships, which are becoming easier to detect with more data.
Exponentials, Logarithms, and Compounding
• It is fitting to begin with the fundamental mathematical constant, e = 2.718281828..., which underlies the exponential function exp(·). We often write this function as $e^x$, where x can be a real or complex variable. Given $y = e^x$, a fixed change in x results in the same continuous percentage change in y. This is because $\ln(y) = x$, where ln(·) is the natural logarithm, the inverse of the exponential function.
• The constant e is defined as the limit of a specific function:
$$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$$
• Exponential compounding is the limit of discrete compounding over successively shorter intervals. Given a horizon of t years divided into n intervals per year, one dollar compounded from time zero to time t at per-annum rate r may be written as $\left(1 + \frac{r}{n}\right)^{nt}$.
• Continuous compounding is the limit of this expression as the number of periods n goes to infinity:
$$\lim_{n \to \infty} \left(1 + \frac{r}{n}\right)^{nt} = \lim_{n \to \infty} \left[\left(1 + \frac{1}{n/r}\right)^{n/r}\right]^{tr} = e^{rt}$$
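A quick numerical check of this limit, with an illustrative rate and horizon:

```python
# Discrete compounding (1 + r/n)**(n*t) approaches e**(r*t) as n grows.
import math

r, t = 0.05, 10.0  # illustrative per-annum rate and horizon in years
for n in (1, 12, 365, 100_000):
    print(n, (1 + r / n) ** (n * t))
print("e^(rt) =", math.exp(r * t))
```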
Normal Distribution
This distribution is the workhorse of many models in the social sciences, and is
assumed to generate much of the data that comprises the Big Data universe.
Interestingly, most phenomena (variables) in the real world are not normally
distributed. They tend to be “power law” distributed, i.e., many observations of low
value, and very few of high value. The probability distribution declines from left to right
and does not have the characteristic hump shape of the normal distribution.
Still, we do need to learn about the normal distribution because it is important in statistics, and the central limit theorem governs much of the data we look at. Examples of approximately normally distributed data are stock returns and human heights.
If $x \sim N(\mu, \sigma^2)$, that is, x is normally distributed with mean $\mu$ and variance $\sigma^2$, then the probability density function for x is:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right]$$
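A small check of this density formula against scipy's implementation, with illustrative parameter values:

```python
# The hand-coded normal density should match scipy.stats.norm.pdf.
import numpy as np
from scipy.stats import norm

mu, sigma, x = 0.0, 2.0, 1.5  # illustrative values
by_hand = np.exp(-0.5 * (x - mu) ** 2 / sigma**2) / np.sqrt(2 * np.pi * sigma**2)
print(by_hand, norm.pdf(x, loc=mu, scale=sigma))  # should agree
```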
Poisson Distribution
The Poisson is also known as the rare-event distribution. Its density function is:
$$f(n; \lambda) = \frac{e^{-\lambda}\lambda^n}{n!}$$
where there is only one parameter, the mean λ. The density function is over discrete values of n, the number of occurrences given the mean number of outcomes λ. The mean and variance of the Poisson distribution are both λ. The Poisson is a discrete-support distribution, with a range of values n = {0, 1, 2, ...}.
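A similar check for the Poisson mass function and its equal mean and variance, with an illustrative λ:

```python
# Hand-coded Poisson probabilities should match scipy.stats.poisson.pmf.
import math
from scipy.stats import poisson

lam = 3.0  # illustrative mean rate
for n in range(5):
    by_hand = math.exp(-lam) * lam**n / math.factorial(n)
    print(n, by_hand, poisson.pmf(n, lam))             # should agree
print("mean, var:", poisson.stats(lam, moments="mv"))  # both equal lam
```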
Moments of a Continuous Random Variable
The following formulae are useful to review because any analysis of data begins with descriptive statistics, and the following statistical "moments" are computed in order to get a first handle on the data. Given a random variable x with probability density function f(x), the first moment is:
$$\text{Mean (first moment or average)} = E(x) = \int x f(x)\,dx$$
In like fashion, powers of the variable result in higher (nth-order) moments. These are "non-central" moments, i.e., moments of the raw random variable x, not of its deviation from its mean, [x − E(x)].
$$n\text{th moment} = E(x^n) = \int x^n f(x)\,dx$$
Central moments are moments of demeaned random variables. The second central moment is the variance:
$$\text{variance} = Var(x) = E[x - E(x)]^2 = E(x^2) - [E(x)]^2$$
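These integrals can be evaluated numerically; a sketch using the standard normal density as a check (its mean is 0 and variance 1):

```python
# Computing the first two moments of a density by numerical integration.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mean = quad(lambda x: x * norm.pdf(x), -np.inf, np.inf)[0]
second = quad(lambda x: x**2 * norm.pdf(x), -np.inf, np.inf)[0]
print("mean:", mean, "variance:", second - mean**2)  # ~0 and ~1
```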
Moments of a Continuous Random Variable (Contd.)
• The standard deviation is the square root of the variance, i.e., $\sigma = \sqrt{Var(x)}$. The third central moment, normalized by the standard deviation raised to a suitable power, is the skewness:
$$\text{skewness} = \frac{E[x - E(x)]^3}{Var(x)^{3/2}}$$
• The absolute value of skewness relates to the degree of asymmetry in the probability density. If more extreme values occur to the left than to the right, the distribution is left-skewed; vice versa, it is right-skewed.
• Correspondingly, the fourth central, normalized moment is the kurtosis:
$$\text{kurtosis} = \frac{E[x - E(x)]^4}{[Var(x)]^2}$$
• Kurtosis of the normal distribution has value 3. We define "excess kurtosis" as kurtosis minus 3. When a probability distribution has positive excess kurtosis we call it "leptokurtic"; such distributions have fatter tails (on either or both sides) than the normal distribution.
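A sketch of these sample moments with scipy; the Laplace distribution is chosen here because it is symmetric (skewness near 0) but leptokurtic (excess kurtosis near 3):

```python
# Sample skewness and excess kurtosis of a fat-tailed (Laplace) sample.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
sample = rng.laplace(size=100_000)
print("skewness:", skew(sample))             # near 0 (symmetric)
print("excess kurtosis:", kurtosis(sample))  # scipy subtracts 3 by default
```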
Combining Random Variables
• Since we often have to deal with composites of random variables, i.e., more than one random variable, we review here some simple rules for moments of combinations of random variables. There are several other expressions for the same equations, but we examine just the ones we will use most frequently.
• First, means are additive and scalable:
$$E(ax + by) = aE(x) + bE(y)$$
where x, y are random variables and a, b are scalar constants.
• The variance of scaled, summed random variables is:
$$Var(ax + by) = a^2 Var(x) + b^2 Var(y) + 2ab\,Cov(x, y)$$
• The covariance and correlation between two random variables are:
$$Cov(x, y) = E(xy) - E(x)E(y), \qquad Corr(x, y) = \frac{Cov(x, y)}{\sqrt{Var(x)\,Var(y)}}$$
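A quick numerical verification of the variance rule on correlated synthetic data:

```python
# Check Var(ax + by) = a^2 Var(x) + b^2 Var(y) + 2ab Cov(x, y).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 0.6 * x + rng.normal(size=1_000_000)  # correlated with x
a, b = 2.0, -1.5

lhs = np.var(a * x + b * y)
cov = np.cov(x, y, ddof=0)[0, 1]
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * cov
print(lhs, rhs)  # should agree closely
```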
Vector Algebra
• We will be using linear algebra in many of the models. Linear algebra requires the manipulation of
vectors and matrices. We will also use vector calculus. Vector algebra and calculus are very powerful
methods for tackling problems that involve solutions in spaces of several variables, i.e., in high
dimension.
• Rather than work with an abstract exposition, it is better to introduce ideas using an example. We’ll
examine the use of vectors in the context of stock portfolios. We define the returns for each stock in
a portfolio as:
$$R = \begin{bmatrix} R_1 \\ R_2 \\ \vdots \\ R_N \end{bmatrix}, \qquad U = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$
• The use of this unit vector will become apparent shortly, but it will be used in myriad ways and is a
useful analytical object.
Vector Algebra (Contd.)
• A portfolio vector is defined as a set of portfolio weights, i.e., the fraction of the portfolio that is
invested in each stock:
$$W = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix}$$
• The total of portfolio weights must add up to 1:
$$\sum_{i=1}^{N} w_i = 1, \qquad W'\mathbf{1} = 1$$
• Pay special attention to the line above. It describes the sum of portfolio weights in two ways: the first uses summation notation, and the second uses a simple vector-algebraic statement, i.e., that the transpose of w, denoted w′, times the unit vector 1 equals 1.
• The two elements on the left-hand side of the equation are vectors, and the 1 on the right-hand side is a scalar. The dimension of w′ is (1 × N) and the dimension of 1 is (N × 1), and a (1 × N) vector multiplied by an (N × 1) vector results in a (1 × 1) matrix, i.e., a scalar.
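A tiny numpy sketch of the identity, with hypothetical weights:

```python
# The transpose-times-unit-vector form w'1 equals the summation form.
import numpy as np

w = np.array([0.4, 0.35, 0.25])  # hypothetical portfolio weights
ones = np.ones_like(w)           # the unit vector
print(w @ ones)                  # w'1 = 1.0
print(w.sum())                   # same sum, in summation notation
```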
Statistical Regression
• Consider a multivariate regression where a stock's returns $R_i$ are regressed on several market factors $R_k$:
$$R_{it} = \sum_{j=0}^{k} \beta_{ij} R_{jt} + e_{it}, \quad \forall i$$
where t = {1, 2, ..., T} (i.e., there are T items in the time series), there are k independent variables, and usually j = 0 denotes the intercept. We could also write this as:
$$R_{it} = \beta_{i0} + \sum_{j=1}^{k} \beta_{ij} R_{jt} + e_{it}, \quad \forall i$$
• Compactly, using vector notation, the same regression may be written as $R_i = R_k \beta_i + e_i$, where $R_i, e_i \in \mathbb{R}^T$, $R_k \in \mathbb{R}^{T \times (k+1)}$, and $\beta_i \in \mathbb{R}^{k+1}$. If there is an intercept in the regression, then the first column of $R_k$ is 1, the unit vector. Without providing a derivation, you should know that each regression coefficient is:
$$\beta_{ik} = \frac{Cov(R_i, R_k)}{Var(R_k)}$$
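A sketch of the regression in numpy on synthetic returns (the factor loading 1.2 and noise level are invented); for a single factor, the least-squares slope matches the Cov/Var formula:

```python
# Least-squares regression of returns on one factor, plus the Cov/Var check.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
rk = rng.normal(size=T)                               # one market factor
ri = 0.01 + 1.2 * rk + rng.normal(scale=0.5, size=T)  # stock returns

X = np.column_stack([np.ones(T), rk])   # first column is the unit vector
beta, *_ = np.linalg.lstsq(X, ri, rcond=None)
print("intercept, slope:", beta)

print("Cov/Var:", np.cov(ri, rk, ddof=0)[0, 1] / np.var(rk))  # ~ the slope
```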
Diversification
• It is useful to examine the power of using vector algebra with an application. Diversification occurs
when we increase the number of non-perfectly correlated stocks in a portfolio, thereby reducing
portfolio variance. In order to compute the variance of the portfolio we need to use the portfolio
weights w and the covariance matrix of stock returns R, denoted Σ. We first write down the formula
for a portfolio’s return variance:
$$Var(w'R) = w'\Sigma w = \sum_{i=1}^{n} w_i^2 \sigma_i^2 + \sum_{i=1}^{n}\sum_{j=1, j\neq i}^{n} w_i w_j \sigma_{ij}$$
• Readers are strongly encouraged to implement this by hand for n = 2 to convince themselves that the vector form of the expression for variance, w'Σw, is the same as the long form on the right-hand side of the equation above. If returns are independent, then the formula collapses to:
$$Var(w'R) = w'\Sigma w = \sum_{i=1}^{n} w_i^2 \sigma_i^2$$
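Implementing the suggested check in numpy, with a hypothetical 3-stock covariance matrix:

```python
# The quadratic form w' Sigma w equals the double-sum long form.
import numpy as np

w = np.array([0.5, 0.3, 0.2])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

quad_form = w @ Sigma @ w
long_form = sum(w[i] * w[j] * Sigma[i, j] for i in range(3) for j in range(3))
print(quad_form, long_form)  # identical
```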
Matrix Equations
• Here we examine how matrices may be used to represent large systems of equations compactly, and also to solve them. Consider the following system Aw = B, written out in long form:
$$\begin{bmatrix} 3 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$$
• Find the solution values $w_1$ and $w_2$ by hand. Then we may compute the solution for w by "dividing" B by A. This is not regular division, because A and B are matrices; instead, we multiply the inverse of A (its "reciprocal") by B.
• The inverse of A is:
$$A^{-1} = \begin{bmatrix} 0.500 & -0.250 \\ -0.250 & 0.375 \end{bmatrix}$$
• Now compute by hand:
$$w = A^{-1}B = \begin{bmatrix} 0.50 \\ 0.75 \end{bmatrix}$$
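The same computation in numpy; np.linalg.solve is generally preferred over forming the inverse explicitly, but both match the hand computation:

```python
# Solve Aw = B and compare with the explicit-inverse route.
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 4.0]])
B = np.array([3.0, 4.0])

print(np.linalg.inv(A))       # [[0.5, -0.25], [-0.25, 0.375]]
print(np.linalg.inv(A) @ B)   # [0.5, 0.75]
print(np.linalg.solve(A, B))  # same solution, numerically more stable
```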
Inter and Intra Cluster
Das, S. R., & DAS, S. (2016). Data science: theories, models, algorithms, and
analytics. Learning, 143, 145.
Sharaff, A., & Sinha, G. R. (Eds.). (2021). Data Science and Its Applications. CRC
Press.
Van Der Aalst, W. (2016). Data science in action. In Process mining (pp. 3-23).
Springer, Berlin, Heidelberg.
Provost, F., & Fawcett, T. (2013). Data Science for Business: What you need to know
about data mining and data-analytic thinking. " O'Reilly Media, Inc.".