Lecture 1 - Introduction To Data Science

Uploaded by tinashemishoni28
Copyright © All Rights Reserved
Introduction to Data Science
What Is Data Science?
• Data Science is about data gathering, analysis and decision-making.
• Data science is a collection of techniques used to extract value from data.
• Data Science is about finding patterns in data through analysis and using them to make predictions about the future.
• By using Data Science, companies are able to make:
I. Better decisions (should we choose A or B?)
II. Predictive analyses (what will happen next?)
III. Pattern discoveries (finding patterns, or possibly hidden information, in the data)
Where is Data Science Needed?

▹ Data Science is used in many industries today, e.g. banking, consultancy, healthcare, and manufacturing.
▹ Examples of where Data Science is needed:
I. For route planning: to discover the best shipping routes
II. To foresee delays for flights, ships, trains, etc. (through predictive analysis)
III. To create promotional offers
IV. To find the best time to deliver goods
V. To forecast next year's revenue for a company
VI. To analyze the health benefits of training
VII. To predict who will win elections
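To make example V concrete, here is a minimal sketch of the simplest kind of revenue forecast: fit a straight-line trend to past yearly figures with ordinary least squares and extend it one year. The revenue numbers and the function names `fit_linear_trend` and `forecast` are hypothetical, chosen only for illustration; real forecasting would also account for seasonality, uncertainty and so on.

```python
def fit_linear_trend(years, values):
    """Ordinary least-squares fit of: value = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(years, values, next_year):
    """Extend the fitted straight line to a future year."""
    slope, intercept = fit_linear_trend(years, values)
    return intercept + slope * next_year


# Hypothetical revenue figures (in millions), made up for this example.
years = [2019, 2020, 2021, 2022, 2023]
revenue = [10.0, 12.0, 14.0, 16.0, 18.0]
print(forecast(years, revenue, 2024))  # 20.0
```

A straight line is chosen here only because it is the easiest model to read; it simply assumes the past trend continues.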
How Does a Data Scientist Work?

▹ A Data Scientist requires expertise in several areas:
▹ Machine Learning
▹ Statistics
▹ Programming (Python or R)
▹ Mathematics
▹ Databases
Cont.
Here is how a Data Scientist works:
▹ Ask the right questions - To understand the business problem.
▹ Explore and collect data - From databases, web logs, customer feedback, etc.
▹ Extract the data - Transform the data into a standardized format.
▹ Clean the data - Remove erroneous values from the data.
▹ Find and replace missing values - Check for missing values and replace them with a suitable value (e.g. an average value).
▹ Normalize data - Scale the values into a practical range (e.g. 140 cm is smaller than 1.8 m, yet the number 140 is larger than 1.8, so values must be brought to a common scale before they can be compared).
▹ Analyze data, find patterns and make future predictions.
▹ Represent the result - Present the result with useful insights in a way the "company" can understand.
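The extract, clean, fill-missing and normalize steps above can be sketched on a tiny list of height readings, reusing the 140 cm / 1.8 m example. This is an illustrative sketch only: the readings are made up, and the rules it assumes (convert everything to centimetres, treat non-positive heights as erroneous, fill gaps with the mean, min-max scale into [0, 1]) are one reasonable choice among many. All function names are hypothetical.

```python
from statistics import mean


def standardize(records):
    """Extract: convert (value, unit) pairs to centimetres; keep None for missing values."""
    factors = {"cm": 1.0, "m": 100.0}  # assumed unit table for this example
    return [None if v is None else v * factors[u] for v, u in records]


def clean(values, low=0.0):
    """Clean: drop readings that are physically impossible (here: not above `low`)."""
    return [v for v in values if v is None or v > low]


def fill_missing(values):
    """Replace missing values (None) with the average of the observed values."""
    avg = mean(v for v in values if v is not None)
    return [avg if v is None else v for v in values]


def min_max_scale(values):
    """Normalize: rescale linearly into [0, 1] (assumes at least two distinct values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw readings: mixed units, one missing value, one erroneous value.
raw = [(140, "cm"), (1.8, "m"), (None, "cm"), (-5, "cm"), (160, "cm")]
heights = fill_missing(clean(standardize(raw)))
print(heights)                 # [140.0, 180.0, 160.0, 160.0]
print(min_max_scale(heights))  # [0.0, 1.0, 0.5, 0.5]
```

Note that the order matters: converting units first is what makes 1.8 m correctly come out larger than 140 cm.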
AI, MACHINE LEARNING, AND DATA SCIENCE

▹ Artificial intelligence, machine learning, and data science are all related to each other.
▹ Unsurprisingly, the terms are often used interchangeably and conflated with each other in popular media and business communication.
▹ However, these three fields are distinct, and which term applies depends on the context.
▹ Artificial intelligence is about giving machines the capability of mimicking human behavior, particularly cognitive functions. Examples would be facial recognition and automated driving.
▹ Machine learning can be considered either a sub-field or one of the tools of artificial intelligence.
▹ It provides machines with the capability of learning from experience.
Cont.
▹ Data science is the business application of machine learning, artificial intelligence, and other quantitative fields like statistics, visualization, and mathematics.
▹ It is an interdisciplinary field that extracts value from data.
▹ In the context of how data science is used today, it relies heavily on machine learning and is sometimes called data mining.
▹ Examples of data science use cases are:
I. Recommendation engines that can recommend movies for a particular user.
II. A fraud alert model that detects fraudulent credit card transactions.
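As a toy illustration of use case II, the sketch below "learns" what normal spending looks like from past transaction amounts (their mean and standard deviation) and flags new amounts whose z-score exceeds a threshold. This is a deliberately simple stand-in, not how production fraud models work: the history, the threshold of 3.0 and the names `fit` and `is_suspicious` are all assumptions made for the example.

```python
from statistics import mean, stdev


def fit(history):
    """'Train' on past legitimate amounts: remember their mean and spread."""
    return mean(history), stdev(history)


def is_suspicious(amount, model, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean (z-score)."""
    mu, sigma = model
    return abs(amount - mu) / sigma > threshold


# Hypothetical card history (amounts in dollars), made up for illustration.
history = [20.0, 25.0, 22.0, 30.0, 18.0, 27.0, 24.0, 21.0]
model = fit(history)
new_transactions = [26.0, 500.0, 15.0]
flags = [is_suspicious(a, model) for a in new_transactions]
print(flags)  # [False, True, False]
```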
Data Science Classification

▹ Data science problems can be broadly categorized into supervised or unsupervised learning models.
▹ Supervised or directed data science tries to infer a function or relationship based on labeled training data and uses this function to map new unlabeled data.
▹ Supervised techniques predict the value of the output variables based on a set of input variables. To do this, a model is developed from a training dataset where the values of the input and output variables are already known.
▹ The model generalizes the relationship between the input and output variables and uses it to make predictions for a dataset where only the input variables are known.
▹ The output variable that is being predicted is also called a class label or target variable.
▹ Supervised data science needs a sufficient number of labeled records to learn the model from the data.
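A minimal sketch of the supervised idea: learn from records whose class label (target variable) is known, then predict labels for records where only the inputs are known. The algorithm used here is a nearest-centroid classifier, picked for brevity rather than taken from the slides; the two-feature records, the labels "A"/"B" and the names `train` and `predict` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def train(records):
    """Learn one centroid (mean input vector) per class label from labeled data."""
    sums, counts = {}, {}
    for features, label in records:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Map a new, unlabeled record to the label whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical labeled training data: (input variables, class label).
training = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((8.0, 8.0), "B"), ((9.0, 7.5), "B")]
model = train(training)
print(predict(model, (2.0, 1.5)))  # A
print(predict(model, (7.5, 9.0)))  # B
```

Here the "model" is just the per-class centroids; it generalizes the input-output relationship well enough to label inputs it never saw.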
Cont.

▹ Unsupervised or undirected data science uncovers hidden patterns in unlabeled data.
▹ In unsupervised data science, there are no output variables to predict.
▹ The objective of this class of data science techniques is to find patterns in data based on the relationships between the data points themselves.
▹ An application can employ both supervised and unsupervised learners.
▹ Data science problems can also be classified into tasks such as classification, regression, association analysis, clustering, anomaly detection, recommendation engines, feature selection, time series forecasting, deep learning, and text mining.
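Clustering is a typical unsupervised task: no labels are given, and the groups emerge from the distances between the points themselves. The sketch below is a plain k-means implementation on made-up 2-D points; the data, `k=2` and the function name `k_means` are assumptions for illustration. It also happens to be iterative (it repeats assign-and-update until the centroids stop moving) and randomized (the starting centroids are sampled, seeded here so runs are reproducible).

```python
import random
from math import dist


def k_means(points, k, iterations=100, seed=0):
    """Group unlabeled points into k clusters by repeated assign/update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # limiting condition: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```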
Data Science Algorithms

▹ An algorithm is a logical step-by-step procedure for solving a problem.
▹ In data science, it is the blueprint for how a particular data problem is solved.
▹ Many of the learning algorithms are iterative, where a set of steps is repeated many times until a limiting condition is met. Some algorithms also take a random variable as an input and are aptly called randomized algorithms.
▹ A classification task can be solved using many different learning algorithms, such as decision trees, artificial neural networks, k-NN, and even some regression algorithms.
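To illustrate that one classification task can be solved by different algorithms, the sketch below trains two of the families named above on the same labeled data: k-NN, and a decision tree limited to a single split (a "decision stump"). Both are minimal, hand-written versions for a two-class problem; the data, `k=3` and the function names are hypothetical choices for this example.

```python
from collections import Counter
from math import dist


def knn_predict(training, features, k=3):
    """k-NN: majority label among the k training records closest to `features`."""
    neighbours = sorted(training, key=lambda rec: dist(rec[0], features))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def fit_stump(training):
    """One-level decision tree: choose the feature/threshold split with the
    fewest training errors. Assumes exactly two class labels."""
    labels = sorted({label for _, label in training})
    low, high = labels[0], labels[-1]
    best = None  # (errors, feature index, threshold, label if <=, label if >)
    for i in range(len(training[0][0])):
        for features, _ in training:
            t = features[i]  # candidate thresholds: the observed values
            for left, right in ((low, high), (high, low)):
                errors = sum((left if f[i] <= t else right) != y for f, y in training)
                if best is None or errors < best[0]:
                    best = (errors, i, t, left, right)
    _, feat, thresh, left, right = best

    def predict(features):
        return left if features[feat] <= thresh else right

    return predict


# Hypothetical labeled data shared by both algorithms.
training = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
            ((8.0, 8.0), "B"), ((9.0, 7.5), "B"), ((8.5, 9.0), "B")]
stump = fit_stump(training)
for query in [(2.0, 2.0), (8.0, 8.5)]:
    print(query, knn_predict(training, query), stump(query))
```

On this easy, well-separated data both algorithms agree; on real data they can differ, which is why the choice of algorithm matters.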
Cont.
▹ The choice of which algorithm to use depends on the type of dataset, the objective, the structure of the data, the presence of outliers, the available computational power, the number of records, the number of attributes, and so on.