Data Types and Sources

Uploaded by

ljndeleon

Understanding of both structured and unstructured data
"Structured and unstructured data are two common types of data found in many
different fields, particularly in the areas of information technology and data
science. It is important to know the distinctions between them in order to
properly handle, analyze, and gain insights from the data."
Structured Data:
• Structured data is usually kept in tables and organized in a relational
database. Each field holds data in a specific format. Some fields
have a strict format like phone numbers or addresses, while others
can have varying lengths of text, like names or descriptions.
• Structured data can be created by people or machines. It is simple to
handle and can be easily searched through both manual searches and
automated analysis using statistical methods and machine learning
algorithms.
• Examples include databases, spreadsheets, CSV files, and relational
data.
Structured Data:
• Characteristics:
1. Structured data is well-organized and has a predetermined structure.
2. A clear schema or data model outlines the structure of the data.
3. It is simple to analyze structured data through tools like SQL.
4. Examples of structured data include customer details, financial data, and
inventory records stored in a database.
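The characteristics above can be sketched in a few lines of Python using the standard-library sqlite3 module. The customers table, its columns, and its rows are hypothetical, invented only to show a schema and a SQL query working together; this is an illustration, not a recommended design.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        phone   TEXT NOT NULL,
        balance REAL NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO customers (id, name, phone, balance) VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", "555-0101", 120.50),
        (2, "Ben", "555-0102", 0.00),
        (3, "Cara", "555-0103", 75.25),
    ],
)

# Every row follows the same schema, so a declarative query is enough
# to filter and sort the data.
rows = conn.execute(
    "SELECT name, balance FROM customers WHERE balance > ? ORDER BY balance DESC",
    (50,),
).fetchall()
print(rows)
```

The schema is fixed before any row is stored, which is what makes the query simple to write.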
Pros and cons of structured data
Pros
• Easily used by machine learning (ML) algorithms:
• Machine learning algorithms find it easy to work with structured data because
of its clear and organized format, making it simple to manipulate and query.
• Easily used by business users:
• Business users also find structured data user-friendly as it does not require
deep knowledge of various data types and their functions. With a basic
understanding of the topic, users can access and interpret the data easily.
• Accessible by more tools:
• Structured data has been in use far longer than unstructured data, so a
wider range of tools is available for using and analyzing it.
Cons
• Limited usage:
• Data with a predefined structure can only be used for its intended purpose,
which limits its flexibility and usability.
• Limited storage options:
• Structured data is usually kept in data storage systems with rigid schemas,
such as data warehouses. As a result, any change in data requirements means
all of the structured data must be updated, which can take up a lot of time
and resources.
Use cases for structured data
• Customer relationship management (CRM): CRM software analyzes structured
data, such as contact lists, to create sets of information that show patterns
and trends in how customers behave.
• Online booking: Hotel and ticket reservation information, such as dates,
prices, and destinations, follows the rows-and-columns format of a
predefined data model.
• Accounting: Businesses and accounting departments use structured data to
track and document financial activities, for example in invoicing systems.
Unstructured Data
• Unstructured data refers to different types of content like documents,
videos, audio files, social media posts, and emails. It can be challenging to
organize and classify these kinds of data.
• Unstructured data, usually considered as qualitative data, cannot be
analyzed using traditional data tools and methods. Because it lacks a
predefined data model, unstructured data is best stored in non-relational
(NoSQL) databases.
• Unstructured data is usually made up of data sets instead of distinct data
elements, like a lengthy document covering various subjects. This makes it
difficult to categorize the contents of the document as a single entity.
Tools designed for structured data cannot effectively organize unstructured
documents.
Unstructured Data
• Unstructured data can be managed, but the data items are usually
kept as objects in their original form. The data stays in its raw form
until users or tools need it, and structure is applied only at that
point, an approach called schema-on-read.
• The significance of unstructured data is growing quickly.
Current estimates suggest that unstructured data makes up more
than 80% of all data in businesses, and that 95% of businesses
give priority to managing unstructured data.
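Schema-on-read can be sketched as follows: records are kept exactly as they arrived, and a structure is imposed only when they are read. The raw records, the field names (user, text, likes), and the read_with_schema helper are all assumptions made for this illustration.

```python
import json

# Raw items stored as-is; the contents are made up for illustration.
raw_store = [
    '{"user": "ana", "text": "Love the new app!", "likes": 12}',
    '{"user": "ben", "text": "Checkout keeps failing"}',
    "free-form note: call supplier about late delivery",
]

def read_with_schema(raw):
    """Apply a structure at read time; return None when the item does not fit."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    return {
        "user": record.get("user"),
        "text": record.get("text", ""),
        "likes": int(record.get("likes", 0)),
    }

parsed = [read_with_schema(raw) for raw in raw_store]
structured = [row for row in parsed if row is not None]
print(len(raw_store), "raw items,", len(structured), "fit this reader's schema")
```

Nothing is rejected at storage time. A different reader could apply a different structure to the same raw items.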
Pros and cons of unstructured data
Unstructured data includes things like text, mobile activity, social media posts, and
IoT sensor data. Its benefits include flexible native-format storage, fast
accumulation, and pay-as-you-use data lake storage. However, the drawbacks include
the need for specialized skills and tools.
Pros
• Native format:
• Stored in its original form, unstructured data is not organized until it is
required. By keeping it in its natural state, the range of file formats in the
database grows, giving data scientists access to a larger pool of data that they
can carefully analyze.
• Fast accumulation rates:
• Since there is no requirement to define the data beforehand, it can be
gathered rapidly and effortlessly.
• Data lake storage:
• Allows the storage of large amounts of data and charges based on actual
usage, reducing expenses and making it easier to adjust capacity as needed.
Cons
• Requires expertise:
• Data science skills are needed to organize and analyze unstructured data
because it lacks a specific format. This is helpful for data analysts but can be
difficult for non-specialized business users who may not grasp specialized
data concepts or know how to use their data effectively.
• Specialized tools:
• Data managers are limited in their product choices because they need specific
tools to work with unstructured data.
Use cases for unstructured data
• Data mining:
• Businesses can use unstructured data to analyze how consumers behave,
feel about products, and make purchases in order to improve their services
for customers.
• Predictive data analytics:
• Alerts businesses to important upcoming events so they have time to plan
and adapt to major changes in the market.
• Chatbots:
• Perform text analysis to direct customer queries to the right sources for
answers.
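The chatbot use case can be illustrated with a deliberately simple keyword-overlap router. Real chatbots typically rely on trained language models; the queue names, keyword sets, and route_query function below are hypothetical and only show the idea of turning free text into a routing decision.

```python
import re

# Hypothetical routing table: queue names and keywords are assumptions.
ROUTES = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "shipping": {"delivery", "package", "tracking", "shipping"},
    "technical": {"error", "crash", "login", "password"},
}

def route_query(text, default="general"):
    """Return the queue whose keywords overlap most with the query text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best_queue, best_hits = default, 0
    for queue, keywords in ROUTES.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_queue, best_hits = queue, hits
    return best_queue

print(route_query("I was charged twice and need a refund"))
print(route_query("Where is my package? The tracking page shows nothing"))
print(route_query("What are your opening hours?"))
```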
What are the key differences between structured and unstructured data?
• Sources:
• Structured data comes from sources like GPS sensors, online forms, network
logs, web server logs, and OLTP systems. Unstructured data sources include
email messages, word-processing documents, and PDF files.
• Forms:
• Structured data is made up of numbers and values, while unstructured
data includes things like sensor data, text files, audio files, and video files.
• Models:
• Structured data follows a predefined data model and is formatted to a set
structure before it is stored, while unstructured data is stored in its
original form and isn't organized until needed.
• Storage:
• Structured data is information that is organized in tables, like Excel
spreadsheets or SQL databases, which take up less space. This type of data
can be stored in data warehouses, allowing for easy scalability. On the other
hand, unstructured data is stored as media files or in NoSQL databases, which
require more storage space. It can be stored in data lakes, but this makes it
harder to scale.
• Uses:
• Machine learning relies on structured data for its algorithms, while natural
language processing and text mining utilize unstructured data.
Levels of Measurement
• The level of measurement describes how precisely a variable has
been recorded. During the data collection process, different kinds of
information are gathered depending on the specific objectives of the
inquiry.
• If one were to examine the expenditure patterns of residents in
Tokyo, a survey could be distributed to 500 individuals in order to
gather information regarding their income, precise location, age, and
expenditure on different goods and services. These items are the
variables of the study: measurable characteristics whose values vary
among participants.
Why are levels of measurement important?
• The level of measurement is crucial because it dictates the kind of
statistical analysis that can be conducted. As a consequence, it
influences both the quality and the thoroughness of the conclusions
you can draw from your data.
• It is important to plan ahead and determine the specific levels of
measurement needed in order to accurately conduct certain
statistical tests.
Four Different Levels of Measurement
• In ascending order of precision, the four different levels of
measurement are:
• Nominal – Latin for name only (Republican, Democrat, Green, Libertarian)
• Ordinal – Think ordered levels or ranks (small–8oz, medium–12oz,
large–32oz)
• Interval – Equal intervals among levels (1 dollar to 2 dollars is the same
interval as 88 dollars to 89 dollars)
• Ratio – Let the “o” in ratio remind you of a zero in the scale (Day 0, day 1, day
2, day 3, …)
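As a rough illustration of how the level of measurement limits which summary statistics make sense, the sketch below pairs each of the four levels with one statistic from Python's standard statistics module. All sample values are invented, and the pairing shows the most informative common choice for each level rather than an exhaustive rule.

```python
import statistics

# Made-up observations, one list per level of measurement.
party = ["Green", "Democrat", "Republican", "Democrat"]  # nominal
size_rank = [1, 3, 2, 2, 3]  # ordinal: 1 = small, 2 = medium, 3 = large
temp_c = [18.0, 21.5, 19.0, 23.5]  # interval
days_elapsed = [2, 4, 8, 10]  # ratio

party_mode = statistics.mode(party)        # nominal: categories can only be counted
size_median = statistics.median(size_rank)  # ordinal: order allows a middle value
temp_mean = statistics.mean(temp_c)         # interval: equal spacing allows a mean
day_ratio = max(days_elapsed) / min(days_elapsed)  # ratio: true zero allows ratios

print(party_mode, size_median, temp_mean, day_ratio)
```

Each higher level also supports the statistics of the levels below it. For example, a mode can still be computed for ratio data.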
Nominal Level of Measurement
• At this measurement level, labels are used only to categorize the
data. The labels can be words, letters, symbols, or numbers, but they
carry no order or magnitude. Imagine there is information on individuals
categorized into three gender groups. For example, females are labeled
as F, males as M, and transgender individuals as T. This method of
categorization is known as nominal measurement.
Ordinal Level of Measurement
• This type of measurement shows a clear order among the
observations of a variable.
• For example, if a student gets the highest score of 100 in the class, he
would be given the first rank. If another classmate scores 92, she
would be given the second rank. A third student scoring 81 would
then be assigned the third rank, and so forth. The ordinal
measurement level highlights the ranking of the measurements.
• For example, you could measure the variable “income” on an ordinal
scale as follows:
• low income
• medium income
• high income.
• Another example could be level of education, classified as follows:
• high school
• master’s degree
• doctorate
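The income example above can be sketched in code by giving each category a rank and sorting by it. The category order comes from the list above; the responses are invented, and taking the middle element as the median assumes an odd number of responses.

```python
# Ordered categories from the income example; the responses are invented.
INCOME_LEVELS = ["low income", "medium income", "high income"]
rank = {label: position for position, label in enumerate(INCOME_LEVELS)}

responses = [
    "high income",
    "low income",
    "medium income",
    "low income",
    "medium income",
]

# Sorting uses the rank, not the alphabetical order of the labels.
ordered = sorted(responses, key=rank.__getitem__)
median_category = ordered[len(ordered) // 2]  # valid for an odd-sized sample

print(ordered)
print(median_category)
```

The ranks say which category is higher but not how far apart the categories are, so averaging the rank numbers would not be meaningful.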
Interval Level of Measurement
• The interval level of measurement not only puts measurements in
order, but it also ensures that the differences between each interval
on the scale are the same.
• For instance, if a student's anxiety is measured on an interval
scale, the difference between scores of 10 and 11 is the same as the
difference between scores of 40 and 41.
• An example of this type of measurement is temperature in Celsius,
where the difference between 94°C and 96°C is the same as the
difference between 100°C and 102°C.
Ratio Level of Measurement
• This type of measurement has equal intervals and a true zero point,
meaning a value of zero indicates that none of the quantity is present.
The true zero sets it apart from the interval level, with which it
otherwise shares its properties.
• In the ratio level, the distances between points on the scale are equal,
so ratios are meaningful: 20 kg is twice as heavy as 10 kg.
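The difference between the interval and ratio levels can be checked numerically. In the sketch below, the Celsius values reuse the temperature example above, while the weights, the helper names, and the approximate pound conversion factor are assumptions chosen for illustration.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    return kg * 2.20462  # approximate conversion factor

# Interval scale: equal steps are equal wherever they sit on the scale.
same_step = (96 - 94) == (102 - 100)

# The zero of the Celsius scale is arbitrary, so a ratio of two
# temperatures changes when the unit changes.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: zero means "none", so a ratio of two weights is the
# same in kilograms and in pounds.
ratio_kg = 20 / 10
ratio_lb = kg_to_lb(20) / kg_to_lb(10)

print(same_step, ratio_c, ratio_f, ratio_kg, ratio_lb)
```

This is why "twice as hot" is not a meaningful statement for temperatures in Celsius, whereas "twice as heavy" is meaningful for weights.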
Discrete Data vs. Continuous Data
Discrete data and continuous data are two types of quantitative data. The
key distinction between them is the kind of values they take. Discrete data usually
records counts of specific events, while continuous data is measured on a scale and
often reveals patterns or trends as time passes.
Values:
• Discrete data consists of specific, countable figures, like the number
of students in a class.
• On the other hand, continuous data consists of measured values that
can fall anywhere within a range, like the heights of the students in a
class, from the shortest to the tallest.
Types of data:
• Discrete data usually consists of whole numbers, while continuous
data often includes fractions or decimals.
Methods:
• Discrete data can typically be displayed with simple techniques like a
number line or bar chart, while continuous data may require more
advanced methods such as curves or histograms.
Time intervals:
• Discrete data stays the same during a certain period, while
continuous data changes values at different times.
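The contrast between counting discrete values and binning continuous ones can be sketched with the standard library alone. The classroom data and the 10 cm bin width below are invented for illustration.

```python
from collections import Counter

# Invented classroom data for illustration only.
siblings = [0, 1, 1, 2, 0, 3, 1, 2]  # discrete: whole-number counts
heights_cm = [150.2, 161.8, 158.4, 172.1, 165.0, 155.9, 168.3, 160.7]  # continuous

# Discrete values can be tallied directly, one bar per distinct value.
sibling_counts = Counter(siblings)
print(sorted(sibling_counts.items()))

# Continuous values rarely repeat, so they are grouped into bins first,
# which is the idea behind a histogram.
bin_width = 10
bins = Counter(int(h // bin_width) * bin_width for h in heights_cm)
for start in sorted(bins):
    print(f"{start}-{start + bin_width} cm: {bins[start]}")
```

The tallies map naturally onto a bar chart, and the binned counts onto a histogram.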
Discrete/continuous data visualization
Visualizing discrete and continuous data can be an important step to help you
understand the performance of a business. Bar charts suit discrete data, while
histograms or curves suit continuous data.
