Lecture+Notes (Upgrad)
Lecture+Notes (Upgrad)
systems to extract knowledge and insights from data in various forms. While machine learning is
the process of training a machine (computer) to extract patterns from data and apply it at
various stages.
Data science and data analytics are ideally the two sides of the same coin. Analytics is the
discovery, interpretation and communication of meaningful pattern in the data. Data science
also means extracting insights from data and help in data-driven decision making.
● Earlier when data was not very big in size, neither it was complex. Knowledge of
Statistics was good enough to analyse these data. This was the era of Analytics.
● Over time, data set has become more complex and managing them requires specialized
Engineering skill sets. Now you require stronghold on Statistics and Computer Science
to excel on this. This is the era of Data Science.
Some people limit Analytics to Descriptive level only, while Predictive, prescriptive, Artificial
intelligence are part of Data Science.
Exploratory Data Analysis refers to the critical process of performing initial investigations on
data so as to discover patterns, to spot anomalies, to test a hypothesis and to check
assumptions with the help of summary statistics and graphical representations.
2) Data Preparation: This steps included converting the feature variables into a suitable
form to be used for analysis. The transformation done is known as the feature
transformation.
Feature transformation: It refers to creating new features using the existing features.
Scaling or normalizing features within a range say between 0 to 1.
Univariate: As the term “univariate” suggests, it deals with analysing variables one at a time. It
is important to separately understand each variable before moving on to analysing multiple
variables together. Given a data set, the first step is to understand what it contains. Information
about a data set can be gained simply by performing a univariate analysis of the dataset.
Bivariate: It deals with the correlation between a pair of two variables. Correlation is a metric to
find the relationship between the variables. It is a number between -1 and 1 which quantifies the
extent to which two variables ‘correlate’ with each other. A positive correlation means that two
variables will increase together and decrease together, e.g. an increase in rain is accompanied
by an increase in humidity. A negative correlation means that if one variable increases the other
decreases, e.g. in some cases, as the price of a commodity decreases its demand increases.
Big Data: It refers to the ability to work with collections of data that had been impractical before
because of their volume, velocity, and variety. Inexpensive storage and distributed computing
have made it easy to work on Big data. Big data is characterised by 3 Vs - Volume, Velocity and
Variety. Volume refers to the size of the data, velocity refers to the rate at which the data is
being received, and variety refers to the different types of data that you may get - images, text,
numbers, speech, videos etc.
Prediction focuses to understand the relationship between different variables. We use many
machine learning algorithms to make predictions, such as linear regression, logistic regression,
SVM, decision trees, random forest etc.
Regression is a statistical measure used in finance, investing and other disciplines that attempt
to determine the strength of the relationship between one variable and a series of changing
variables
In (time series) forecasting, based on the previous value of a variable, you attempt to predict
its future values, i.e. given past sales data, you want to predict future sales. You look for
patterns in the sales data itself and not on the relationship between sales and the other
variables.
Neural Networks: A robust function that takes an arbitrary set of inputs and fits it to an arbitrary
set of outputs that are binary... In practice, Neural Networks are used in deep learning research
to match images to features and much more. What makes Neural Networks special is their use
of a hidden layer of weighted functions called neurons, with which you can effectively build a
network that maps a lot of other functions. Without a hidden layer of functions, Neural Networks
would be just a set of simple weighted functions.
Natural Language Processing: A branch of computer science for parsing text of spoken
languages (for example, English or Mandarin) to convert it to structured data that you can use to
drive program logic. Early efforts focused on translating one language to another or accepting
complete sentences as queries to databases; modern efforts often analyze documents and
other data (for example, tweets) to extract potentially valuable information.
Recommending – what to act next – based on the person’s earlier behavior, as well as
meta-layer of the products builds the recommender system. There are generally 3 kinds of
recommendations:
a) Person to person
b) Product to product
c) Product to person
● You can download this document from the website for self-use only.
● Any copies of this document, in part or full, saved to disc or to any other storage medium may
only be used for subsequent, self-viewing purposes or to print an individual extract or copy for
non-commercial personal use only.
● Any further dissemination, distribution, reproduction, copying of the content of the document
herein or the uploading thereof on other websites or use of content for any other
commercial/unauthorized purposes in any way which could infringe the intellectual property
rights of UpGrad or its contributors, is strictly prohibited.
● No graphics, images or photographs from any accompanying text in this document will be used
separately for unauthorised purposes.
● No material in this document will be modified, adapted or altered in any way.
● No part of this document or UpGrad content may be reproduced or stored in any other website
or included in any public or private electronic retrieval system or service without UpGrad’s prior
written permission.
● Any rights not expressly granted in these terms are reserved.