Chapter 1
CS300
By: Dr. Muhammad Khan Afridi
Why Data Science?
Data Science is a hot and growing field, and it
doesn’t take a great deal of sleuthing to find
analysts breathlessly prognosticating that over
the next 10 years, we’ll need billions and billions
more data scientists than we currently have.
But what is data science?
• Data science is the domain of study that deals
with vast volumes of data using modern tools
and techniques to find unseen patterns, derive
meaningful information, and make business
decisions.
• Data science uses complex machine learning
algorithms to build predictive models.
The Data Science Lifecycle
Data science’s lifecycle consists of five distinct
stages, each with its own tasks:
1. Capture: Data Acquisition, Data Entry, Signal
Reception, Data Extraction. This stage involves
gathering raw structured and unstructured data.
2. Maintain: Data Warehousing, Data Cleansing,
Data Staging, Data Processing, Data Architecture.
This stage covers taking the raw data and putting
it in a form that can be used.
3. Process: Data Mining, Clustering/Classification,
Data Modeling, Data Summarization. Data
scientists take the prepared data and examine its
patterns, ranges, and biases to determine how
useful it will be in predictive analysis.
4. Analyze: Exploratory/Confirmatory, Predictive
Analysis, Regression, Text Mining, Qualitative
Analysis. Here is the real meat of the lifecycle.
This stage involves performing the various
analyses on the data.
5. Communicate: Data Reporting, Data
Visualization, Business Intelligence, Decision
Making. In this final step, analysts prepare the
analysis in easily readable forms such as charts,
graphs, and reports.
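The five stages above can be sketched as a small pipeline. This is only an illustrative sketch using the Python standard library: the sample data, the column names (region, units_sold), and the function names are assumptions made for the example, not part of any particular tool.

```python
import csv
import io
import statistics

# Hypothetical raw data standing in for a captured source (assumption).
RAW_CSV = """region,units_sold
north,120
south,95
north,
east,143
south,101
"""

def capture(text):
    """Capture: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def maintain(rows):
    """Maintain: drop rows with missing values and convert types."""
    cleaned = []
    for row in rows:
        if row["units_sold"]:
            cleaned.append(
                {"region": row["region"], "units_sold": int(row["units_sold"])}
            )
    return cleaned

def process(rows):
    """Process: group the cleaned values by region."""
    groups = {}
    for row in rows:
        groups.setdefault(row["region"], []).append(row["units_sold"])
    return groups

def analyze(groups):
    """Analyze: compute the mean units sold per region."""
    return {region: statistics.mean(values) for region, values in groups.items()}

def communicate(summary):
    """Communicate: format the result as a short text report."""
    lines = [f"{region}: {mean:.1f}" for region, mean in sorted(summary.items())]
    return "\n".join(lines)

report = communicate(analyze(process(maintain(capture(RAW_CSV)))))
print(report)
```

In practice each stage is far larger (a warehouse instead of a string, a chart instead of printed text), but the order of the stages stays the same.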
What Does a Data Scientist Do?
A data scientist analyzes business data to extract
meaningful insights. In other words, a data
scientist solves business problems through a
series of steps, including:
– Before tackling the data collection and analysis,
the data scientist determines the problem by
asking the right questions and gaining
understanding.
– The data scientist then determines the correct set
of variables and data sets.
– The data scientist gathers structured and
unstructured data from many disparate sources—
enterprise data, public data, etc.
– Once the data is collected, the data scientist
processes the raw data and converts it into a
format suitable for analysis.
– After the data has been rendered into a usable
form, it’s fed into the analytic system (an ML
algorithm or a statistical model). This is where
the data scientist analyzes the data and
identifies patterns and trends.
– Once the analysis is complete, the data scientist
interprets the results to find opportunities and
solutions.
– The data scientist finishes the task by preparing
the results and insights and communicating them
to the appropriate stakeholders.
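As a rough illustration of the step where prepared data is fed into a statistical model, the sketch below fits a straight line by ordinary least squares and uses it for one prediction. The numbers and the variable names (spend, sales) are made up for the example, and a real project would more likely rely on a statistics or machine learning library than on hand-written formulas.

```python
# Hypothetical prepared data: advertising spend vs. sales (assumption).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(spend, sales)

# Use the fitted pattern to predict sales for an unseen spend value.
predicted = slope * 6.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

The fitted slope and intercept are the "pattern" found in the data. Interpreting what they mean for the business, and explaining that to stakeholders, is the part of the job the code does not do.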
Why Become a Data Scientist?
• According to Glassdoor and Forbes, demand
for data scientists will increase by 28 percent
by 2026, which speaks to the profession’s
durability and longevity. If you want a
secure career, data science offers you that
chance.
• Furthermore, the profession of data scientist
came in second place in the Best Jobs in
America for 2021 survey, with an average base
salary of USD 127,500.
Where Do You Fit in Data Science?
Data science offers you the opportunity to focus
on and specialize in one aspect of the field.
Here’s a sample of different ways you can fit into
this exciting, fast-growing field.
Data Scientist
• Job role: Determine what the problem is, what
questions need answers, and where to find
the data. Also, they mine, clean, and present
the relevant data.
• Skills needed: Programming skills (SAS, R,
Python), storytelling and data visualization,
statistical and mathematical skills, knowledge
of Hadoop, SQL, and Machine Learning.
Data Analyst
• Job role: Analysts bridge the gap between the
data scientists and the business analysts,
organizing and analyzing data to answer the
questions the organization poses. They take
the technical analyses and turn them into
qualitative action items.
• Skills needed: Statistical and mathematical
skills, programming skills (SAS, R, Python), plus
experience in data wrangling and data
visualization.
Data Engineer
• Job role: Data engineers focus on developing,
deploying, managing, and optimizing the
organization’s data infrastructure and data
pipelines. Engineers support data scientists by
helping to transfer and transform data for
queries.
• Skills needed: NoSQL databases (e.g.,
MongoDB, Cassandra), programming
languages such as Java and Scala, and
frameworks such as Apache Hadoop.
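A tiny sketch of what "transfer and transform data for queries" can look like in code. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for real infrastructure such as the NoSQL stores or Hadoop named above. The records, the table name (sales), and the transform function are assumptions made for the example.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system
# (assumption): inconsistent casing, stray spaces, numbers stored as text.
raw_events = [
    ("2024-01-01", " North ", "120"),
    ("2024-01-01", "south", "95"),
    ("2024-01-02", "NORTH", "80"),
]

def transform(record):
    """Normalize the region name and convert units to an integer."""
    day, region, units = record
    return day, region.strip().lower(), int(units)

# Load the transformed records into a queryable table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, region TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [transform(record) for record in raw_events],
)
conn.commit()

# A data scientist can now run a clean query against the prepared table.
totals = conn.execute(
    "SELECT region, SUM(units) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Without the transform step, " North " and "NORTH" would be counted as two different regions, which is the kind of problem a data pipeline is meant to remove before analysis starts.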
Data Science Tools
The data science profession is challenging, but
fortunately, there are plenty of tools available to
help the data scientist succeed at their job.
– Data Analysis: SAS, Jupyter, RStudio, MATLAB,
Excel, RapidMiner
– Data Warehousing: Informatica/Talend, AWS
Redshift
– Data Visualization: Jupyter, Tableau, Cognos, RAW
– Machine Learning: Spark MLlib, Mahout, Azure ML
Studio
Difference Between Business
Intelligence and Data Science
Business intelligence is a combination of the
strategies and technologies used for the analysis
of business data/information. Like data science,
it can provide historical, current, and predictive
views of business operations.
Difference Between Business
Intelligence and Data Science
Business Intelligence Data Science