
Chapter two

Data Science

Prepared by: Abinet A. (MSc).


17/05/2025 1
2.1 An Overview of Data Science
 Data science is a multi-disciplinary field that uses scientific
methods, processes, algorithms, and systems to extract knowledge
and useful information from structured, unstructured, and
semi-structured data.
 It draws on tools and techniques from many disciplines to
manipulate data so that you can find something new and meaningful.
 Data science uses powerful hardware, programming systems, and
efficient algorithms to solve data-related problems.
 We can say that data science is all about:
 Asking the correct questions and analyzing the raw data.
2.1 An Overview of Data Science…
 Modeling the data using various complex and efficient algorithms.
 Visualizing the data to get a better perspective.
 Understanding the data to make better decisions and find the final result.
• Example: suppose we want to travel from station A to station B by
car. We need to make some decisions, such as which route will get us
to the destination fastest, which route will have no traffic jam, and
which will be cost-effective. All these decision factors act as input
data, and we get an appropriate answer from them; this analysis of
data is called data analysis, which is a part of data science.

2.2 Data and information
What are data and information?
 Data can be defined as a representation of facts, concepts, or instructions
in a formalized manner.
 It should be suitable for communication, interpretation, or processing
by humans or electronic machines.
 It can be described as unprocessed facts and figures.
 Data is represented with the help of characters such as alphabets (A-Z,
a-z), digits (0-9), or special characters (+, -, /, *, <, >, =), etc.
 Information is processed data on which decisions and actions
are based.
 It is data that has been processed into a form that is meaningful to
the recipient.
Data and information…
 We can enter data into a computer by using input devices and
display data by using output devices.
 Input devices are used to insert data into the computer; the input
data is then processed by the processing unit.
 Examples of input devices are keyboards, mice, scanners,
cameras, etc.
 Output devices are used to display the result.
 Examples of output devices are printers, headphones,
speakers, projectors, etc.

2.2 Data and information…
Data Processing Cycle
 Data processing is the re-structuring of data by people or machines
to increase its usefulness and add value for a particular purpose.
 The basic steps of data processing are:
 Input
 Processing, and
 Output.

Figure 2.1 Data Processing Cycle

2.2 Data and information…
 Input − in this step, the input data is prepared in some convenient
form for processing.
Ex: P (principal amount), R (rate of interest), and N (number of years).
 Processing − the input data is changed to produce data in a
more useful form.
Ex: the interest can be calculated.
 Output − the result of the processing step is collected.
The particular form of the output data depends on the use of the data.
Ex: the output data may be a payroll for employees.
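The three steps above can be sketched in a few lines of Python, using the simple-interest example from the slide. The variable names and sample values are illustrative choices, not part of the original example.

```python
# A minimal sketch of the input -> processing -> output cycle,
# using the simple-interest example from the text.

def simple_interest(principal, rate, years):
    """Processing step: compute simple interest from the input data."""
    return principal * rate * years / 100

# Input step: the raw data, prepared in a convenient form.
p, r, n = 1000, 5, 2   # principal, rate of interest (%), number of years

# Output step: collect and display the result of processing.
interest = simple_interest(p, r, n)
print(interest)  # 100.0
```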

2.3 Data types and their representation
 Data types can be described from different perspectives:
2.3.1 Data types from Computer programming perspective
 Common data types in programming are:
 Integers (int) - used to store whole numbers, mathematically known as
integers
 Booleans (bool) - used to represent a value restricted to one of two
values: true or false
 Characters (char) - used to store a single character
 Floating-point numbers (float) - used to store real numbers
 Alphanumeric strings (string) - used to store a combination of characters
and numbers
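The data types listed above can be illustrated in Python; note that Python has no separate char type, so a one-character string stands in for it. The variable names and values are illustrative.

```python
# A short sketch of the common programming data types listed above.

whole_number = 42      # integer (int): a whole number
flag = True            # Boolean (bool): restricted to one of two values
letter = "A"           # character: in Python, a string of length 1
real_number = 3.14     # floating-point number (float): a real number
label = "Room 42B"     # alphanumeric string (str): characters and numbers

for value in (whole_number, flag, letter, real_number, label):
    print(type(value).__name__, value)
```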

2.3 Data types and their representation…
2.3.2 Data types from Data Analytics perspective
From a data analytics point of view, there are three common data types:
 Structured
 Unstructured, and
 Semi-structured data types.
 Structured data - data that is stored in a pre-defined format.
 It has a tabular format, i.e., it is stored in rows and columns.
Ex: Excel files or SQL databases
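A minimal sketch of structured data, using Python's built-in sqlite3 module: rows and columns with a pre-defined format in an in-memory SQL database. The table name, columns, and sample rows are illustrative.

```python
# Structured data: a pre-defined schema (columns with types),
# stored and queried as rows and columns.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, score REAL)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [(1, "Abinet", 91.5), (2, "Sara", 88.0)])

# Every row conforms to the same pre-defined format.
rows = list(conn.execute("SELECT name, score FROM students ORDER BY id"))
for row in rows:
    print(row)
conn.close()
```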

2.3 Data types and their representation…
 Unstructured data is information that either does not have a
predefined data model or is not organized in a pre-defined manner.
 It cannot be stored in rows and columns.
 Ex: text files, audio, video, images, etc.
 Semi-structured data looks like structured data in some respects and
like unstructured data in others.
 Ex: JSON (JavaScript Object Notation) and XML (eXtensible
Markup Language)
 Metadata – data about data
 It provides additional information about a specific set of data.
 Ex: in a set of photographs, metadata could describe when and where the
photos were taken.
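A small Python sketch ties the last two ideas together: a JSON record (semi-structured data) carrying metadata about a photo, as in the example above. The field names and values are illustrative, not a fixed standard.

```python
# Semi-structured data: JSON has tags/keys (structure) but no fixed
# schema; fields can vary from record to record. Here the record
# carries metadata describing when and where a photo was taken.
import json

record = ('{"file": "photo1.jpg", '
          '"metadata": {"taken": "2025-05-17", "place": "Addis Ababa"}}')

photo = json.loads(record)           # parse JSON text into a Python dict
print(photo["metadata"]["place"])    # Addis Ababa
```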

2.4 Data value Chain
 The Data Value Chain describes the information flow within a big
data system as a series of steps needed to generate value and useful
insights from data.
 The Big Data Value Chain includes the following key activities:
 Data Acquisition
 Data Analysis
 Data Curation
 Data Storage
 Data Usage
 Data Acquisition is the process of gathering, filtering, and cleaning
data before it is put in a data warehouse or any other storage solution
on which data analysis can be carried out.

Data value Chain…
 Data Analysis is concerned with making the acquired raw data
amenable to use in decision-making as well as domain-specific usage.
• Data Curation is the active management of data over its life cycle to
ensure it meets the necessary data quality requirements for its effective
usage.
• Data Storage is the persistence and management of data in a scalable
way that satisfies the needs of applications that require fast access to
the data.
• Data Usage covers the data-driven business activities that need access
to data, its analysis, and the tools needed to integrate the data analysis
within the business activity.
2.5 Basic concepts of big data
What Is Big Data?
 Big data refers to large sets of complex data, both structured and
unstructured, which traditional processing techniques and/or algorithms
are unable to operate on.
 It refers to data sets whose size is beyond the ability of typical
database software tools to capture, store, manage, and analyze.
Big data is commonly characterized by the 3Vs, often extended with a fourth:
• Volume: large amounts of data (zettabytes/massive datasets)
• Velocity: data is live-streaming or in motion
• Variety: data comes in many different forms from diverse sources
• Veracity: can we trust the data? How accurate is it? etc.
2.5 Basic concepts of big data…

Figure 2.4 Characteristics of big data

2.5 Basic concepts of big data…
Clustered Computing
Because of the qualities of big data, individual computers are often
inadequate for handling the data at most stages.
To better address the high storage and computational needs of big data,
computer clusters are a better fit.
Advantages of Clustered Computing
 Resource Pooling: combining the available storage space to hold
data is a clear benefit.
 CPU and memory pooling are also extremely important.

2.5 Basic concepts of big data…
• High Availability: clusters can provide varying levels of fault
tolerance and availability guarantees to prevent hardware or software
failures from affecting access to data and processing.
• Easy Scalability: clusters make it easy to scale horizontally by
adding additional machines to the group.
• Using clusters requires a solution for managing cluster membership,
coordinating resource sharing, and scheduling actual work on
individual nodes.
Hadoop and its Ecosystem
• Hadoop is an open-source framework intended to make interaction
with big data easier.

2.5 Basic concepts of big data…
Four key characteristics of Hadoop are:
• Economical: its systems are highly economical, as ordinary computers
can be used for data processing.
• Reliable: it is reliable, as it stores copies of the data on different
machines and is resistant to hardware failure.
• Scalable: it is easily scalable, both horizontally and vertically; a few
extra nodes help in scaling up the framework.
• Flexible: it is flexible; you can store as much structured and
unstructured data as you need and decide to use it later.

2.5 Basic concepts of big data…
• Hadoop has an ecosystem that has evolved from its four core
components:
 Data management
 Access
 Processing, and
 Storage.

2.5 Basic concepts of big data…

Figure 2.5 Hadoop Ecosystem
2.5 Basic concepts of big data…
Big Data Life Cycle with Hadoop
1. Ingesting data into the system - data is ingested or transferred to
Hadoop from various sources such as relational databases, systems,
or local files.
2. Processing the data in storage - the data is stored and processed in
this stage.
3. Computing and analyzing data - here, the data is analyzed by
processing frameworks such as Pig, Hive, and Impala. Pig converts
the data using map and reduce operations and then analyzes it.
4. Visualizing the results - in this stage, the analyzed data can be
accessed by users, using tools such as Hue and Cloudera Search.
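The map-and-reduce idea mentioned in step 3 can be sketched without Hadoop at all, using the classic word-count example: the map phase emits (word, 1) pairs, and the reduce phase sums the counts per word. Real frameworks such as Pig and Hive compile queries into these phases and distribute them across a cluster; the function and variable names below are illustrative.

```python
# A Hadoop-free sketch of the map-and-reduce pattern (word count).

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: group pairs by word and sum the counts."""
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts

lines = ["big data big insights", "data everywhere"]
print(reduce_phase(map_phase(lines)))
# {'big': 2, 'data': 2, 'insights': 1, 'everywhere': 1}
```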
