Lecture+Notes (Upgrad)


Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and
systems to extract knowledge and insights from data in various forms. Machine learning, by
contrast, is the process of training a machine (computer) to extract patterns from data and apply
them at various stages.

Data science and data analytics are essentially two sides of the same coin. Analytics is the
discovery, interpretation and communication of meaningful patterns in data. Data science
also means extracting insights from data and helping in data-driven decision making.

● Earlier, when data was neither very large nor complex, knowledge of Statistics was
enough to analyse it. This was the era of Analytics.
● Over time, data sets have become more complex, and managing them requires specialised
engineering skills. You now need a strong hold on both Statistics and Computer Science
to excel. This is the era of Data Science.

Some people limit Analytics to the descriptive level only, and treat predictive analytics,
prescriptive analytics and artificial intelligence as part of Data Science.

Exploratory Data Analysis refers to the critical process of performing initial investigations on
data so as to discover patterns, to spot anomalies, to test a hypothesis and to check
assumptions with the help of summary statistics and graphical representations.

It is extremely important in creating reports/dashboards for the decision makers.

EDA mainly consists of the following parts:

© Copyright 2019. UpGrad Education Pvt. Ltd. All rights reserved


1) Data Cleaning​: There are various types of quality issues when it comes to data, and
that’s why data cleaning is one of the most time-consuming steps of data analysis. For
example, there could be formatting errors (e.g. rows and columns are merged), missing
values, repeated rows, spelling inconsistencies etc. These issues could make it difficult
to analyse data and could lead to errors or irrelevant results. Thus, these issues need to
be corrected before data is analysed.

2) Data Preparation: This step includes converting the feature variables into a form
suitable for analysis. The transformation done is known as feature
transformation.

Feature transformation: It refers to creating new features from the existing features, for
example by scaling or normalising a feature to a range, say between 0 and 1.
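The cleaning and scaling steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the (city, sales) records, the function names clean_rows and min_max_scale, and the choice to drop rows with missing values (rather than fill them in) are all hypothetical and not part of the original notes.

```python
def clean_rows(rows):
    """Drop rows with missing values, normalise spelling/case, remove repeats."""
    seen = set()
    cleaned = []
    for city, sales in rows:
        if city is None or sales is None:
            continue  # missing value: dropped here for simplicity (an assumption)
        key = (city.strip().lower(), float(sales))  # fix spacing/case inconsistencies
        if key in seen:
            continue  # repeated row
        seen.add(key)
        cleaned.append(key)
    return cleaned


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw records with a repeat, a missing value and inconsistent spelling.
raw = [("Pune ", 120), ("pune", 120), ("Delhi", None), ("Mumbai", 200), ("Delhi", 160)]
rows = clean_rows(raw)
scaled = min_max_scale([sales for _, sales in rows])
print(rows)    # cleaned records
print(scaled)  # sales rescaled to the range 0 to 1
```

In practice the right treatment of missing values depends on the data; dropping the row is just the simplest option to show.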

Univariate: As the term “univariate” suggests, it deals with analysing variables one at a time. It
is important to separately understand each variable before moving on to analysing multiple
variables together. Given a data set, the first step is to understand what it contains. Information
about a data set can be gained simply by performing a univariate analysis of the dataset.
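As a rough illustration of univariate analysis, the sketch below summarises a single variable with Python's standard statistics module. The ages list is made-up sample data, and the particular set of summary statistics is one reasonable choice rather than a fixed recipe.

```python
import statistics

ages = [23, 25, 25, 31, 38, 41, 25, 60]  # hypothetical values of one variable

# Summary statistics for this single variable.
summary = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "stdev": statistics.stdev(ages),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```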

Bivariate: It deals with the correlation between a pair of variables. Correlation is a metric
that describes the relationship between the variables. It is a number between -1 and 1 which
quantifies the extent to which two variables ‘correlate’ with each other. A positive correlation
means that the two variables increase together and decrease together, e.g. an increase in rain
is accompanied by an increase in humidity. A negative correlation means that as one variable
increases the other decreases, e.g. in some cases, as the price of a commodity increases its
demand decreases.
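The correlation number described above can be computed, for example, as the Pearson correlation coefficient. The sketch below implements it directly; the rain, humidity, price and demand figures are invented purely to show one positive and one negative case.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observations.
rain = [0, 5, 10, 20, 30]
humidity = [40, 55, 60, 80, 90]
price = [10, 12, 14, 16, 18]
demand = [100, 90, 80, 70, 60]

print(pearson(rain, humidity))  # close to +1: the two rise together
print(pearson(price, demand))   # close to -1: one rises as the other falls
```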

Big Data: It refers to the ability to work with collections of data that had been impractical
to handle before because of their volume, velocity and variety. Inexpensive storage and
distributed computing have made it practical to work with big data. Big data is characterised
by 3 Vs: Volume, Velocity and Variety. Volume refers to the size of the data, velocity refers to
the rate at which the data is being received, and variety refers to the different types of data
that you may get, such as images, text, numbers, speech and videos.



Data Architecture: Data architecture is a set of rules, policies, standards and models that
govern and define the type of data collected and how it is used, stored, managed and integrated
within an organization and its database systems.
 
Parallel Computing: Parallel computing is a type of computing in which many calculations or
processes are carried out simultaneously. Large problems can often be divided into smaller
ones, which can then be solved at the same time.
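The idea of dividing a large problem into smaller ones that are solved at the same time can be sketched as follows. The helper names (chunk, sum_of_squares, parallel_sum_of_squares) are hypothetical. A thread pool is used here only to keep the example short and self-contained; for CPU-heavy work in Python a process pool or a distributed framework would normally be needed to get real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(values, n_chunks):
    """Split a list into at most n_chunks contiguous parts."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares(part):
    """The small sub-problem solved independently on each part."""
    return sum(v * v for v in part)


def parallel_sum_of_squares(values, workers=4):
    parts = chunk(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(sum_of_squares, parts))
    return sum(partial_results)  # combine the partial answers


numbers = list(range(1, 1001))
total = parallel_sum_of_squares(numbers)
print(total)
```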

Prediction focuses on understanding the relationship between different variables. Many
machine learning algorithms are used to make predictions, such as linear regression, logistic
regression, SVM, decision trees, random forest etc.

Regression is a statistical method used in finance, investing and other disciplines that attempts
to determine the strength of the relationship between one dependent variable and a series of
other (independent) variables.
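As a small worked example of regression, the sketch below fits a straight line to one independent variable by ordinary least squares. The ad_spend and sales numbers are invented (they lie exactly on the line y = 2x + 1), and fit_line is a hypothetical helper name; real data would not fit this cleanly.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data generated from y = 2x + 1.
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)
print(slope * 6 + intercept)  # predicted sales for an ad spend of 6
```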

In (time series) forecasting, you attempt to predict the future values of a variable based on
its previous values, e.g. given past sales data, you want to predict future sales. You look for
patterns in the sales data itself and not at the relationship between sales and other
variables.
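One of the simplest ways to forecast from past values alone is a moving average. The sketch below is a naive illustration, not a recommended forecasting method: the monthly_sales figures, the function name and the window of 3 are all assumptions chosen for the example.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("not enough history for the chosen window")
    return sum(history[-window:]) / window


monthly_sales = [100, 110, 120, 130, 140, 150]  # hypothetical past sales
next_month = moving_average_forecast(monthly_sales)
print(next_month)  # mean of the last three months
```

Note that the forecast uses only the sales series itself; no other variable enters the calculation.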

Supervised Learning: A class of machine learning algorithms in which a system is taught, using
labelled examples, to map inputs to known outputs. Classification is one such technique; it
assigns each data point to one of several known classes.
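To make classification concrete, the sketch below uses a 1-nearest-neighbour rule: a new point gets the label of the closest labelled training point. This is one simple classifier chosen for illustration; the training points, the labels "small" and "large", and the function name are hypothetical.

```python
import math


def nearest_neighbour(train, point):
    """Return the label of the training example closest to `point`.

    `train` is a list of (features, label) pairs with known labels.
    """
    _, label = min(train, key=lambda item: math.dist(item[0], point))
    return label


# Hypothetical labelled training data: the known classes are "small" and "large".
training_data = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
]

print(nearest_neighbour(training_data, (2.0, 1.5)))
print(nearest_neighbour(training_data, (7.0, 9.0)))
```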

Unsupervised Learning: A class of machine learning algorithms designed to identify groupings
of data without knowing in advance what the groups will be. Clustering is one such technique,
and K-means is one example of a clustering algorithm.
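A minimal one-dimensional sketch of K-means is shown below: points are assigned to the nearest centroid, then each centroid moves to the mean of its points, and the two steps repeat. The data, the hand-picked starting centroids and the fixed iteration count are assumptions made to keep the example deterministic; real implementations usually pick starting centroids randomly and stop on convergence.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]  # keep centroid if cluster is empty
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


values = [1, 2, 3, 10, 11, 12]  # hypothetical unlabelled data
centroids, clusters = k_means_1d(values, centroids=[1, 12])
print(centroids)
print(clusters)
```

No labels are supplied anywhere; the two groups emerge from the data alone.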



Deep Learning: ​Typically, a multi-level algorithm that gradually identifies things at higher levels
of abstraction. For example, the first level may identify certain lines, then the next level identifies
combinations of lines as shapes, and then the next level identifies combinations of shapes as
specific objects. As you might guess from this example, deep learning is popular for image
classification.

Neural Networks: A robust function that takes an arbitrary set of inputs and fits it to an arbitrary
set of outputs that are binary... In practice, Neural Networks are used in deep learning research
to match images to features and much more. What makes Neural Networks special is their use
of a hidden layer of weighted functions called neurons, with which you can effectively build a
network that maps a lot of other functions. Without a hidden layer of functions, Neural Networks
would be just a set of simple weighted functions.
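The role of the hidden layer can be illustrated with the XOR function, which a single weighted neuron of this kind cannot compute but a network with one hidden layer can. The weights and biases below are hand-picked for the illustration, not learned from data, and the step activation is a simplifying assumption.

```python
def step(z):
    """Simple threshold activation: fire (1) if the weighted sum is positive."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """A single neuron: weighted sum of the inputs plus bias, then activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    # Hidden layer: two neurons with hand-picked weights.
    h_or = neuron([x1, x2], [1, 1], -0.5)   # fires if at least one input is 1
    h_and = neuron([x1, x2], [1, 1], -1.5)  # fires only if both inputs are 1
    # Output neuron combines the hidden neurons: OR but not AND.
    return neuron([h_or, h_and], [1, -1], -0.5)


table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, output in table.items():
    print(inputs, "->", output)
```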

Natural Language Processing: A branch of computer science for parsing text of spoken
languages (for example, English or Mandarin) to convert it to structured data that you can use to
drive program logic. Early efforts focused on translating one language to another or accepting
complete sentences as queries to databases; modern efforts often analyze documents and
other data (for example, tweets) to extract potentially valuable information.

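PARTITION: Used in a window function (as PARTITION BY in SQL) to divide the rows into groups,
called partitions, over which the function is computed separately.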
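The effect of partitioning in a window function can be tried out with Python's built-in sqlite3 module. This sketch assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the sales table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", 1, 100), ("north", 2, 150), ("south", 1, 80), ("south", 2, 120)],
)

# The running total restarts for every region because of PARTITION BY.
rows = conn.execute(
    """
    SELECT region, month, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
    FROM sales
    ORDER BY region, month
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```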

Artificial Intelligence is essentially teaching a machine to think like a human. It is surprisingly
difficult. Suppose you are watching a cricket match. You can look at the eyes of the batsman
and know what shot he will play. Now if you can train a machine to predict the same, you can
imagine what a big breakthrough that is. That is Artificial Intelligence.
 

A recommender system suggests what a person should act on next, based on that person’s
earlier behaviour as well as a meta-layer of information about the products. There are
generally 3 kinds of recommendations:
a) Person to person
b) Product to product
c) Product to person



Disclaimer​: All content and material on the UpGrad website is copyrighted material, either belonging to
UpGrad or its bonafide contributors and is purely for the dissemination of education. You are permitted
to access print and download extracts from this site purely for your own education only and on the
following basis:

● You can download this document from the website for self-use only.
● Any copies of this document, in part or full, saved to disc or to any other storage medium may
only be used for subsequent, self-viewing purposes or to print an individual extract or copy for
non-commercial personal use only.
● Any further dissemination, distribution, reproduction, copying of the content of the document
herein or the uploading thereof on other websites or use of content for any other
commercial/unauthorized purposes in any way which could infringe the intellectual property
rights of UpGrad or its contributors, is strictly prohibited.
● No graphics, images or photographs from any accompanying text in this document will be used
separately for unauthorised purposes.
● No material in this document will be modified, adapted or altered in any way.
● No part of this document or UpGrad content may be reproduced or stored in any other website
or included in any public or private electronic retrieval system or service without UpGrad’s prior
written permission.
● Any rights not expressly granted in these terms are reserved.

