Uploaded by ljndeleon
Copyright © All Rights Reserved
Data Types and Sources

Understanding both structured and unstructured data
"Structured and unstructured data are two common types of data found in many
different fields, particularly in the areas of information technology and data
science. It is important to know the distinctions between them in order to
properly handle, analyze, and gain insights from the data."
Structured Data:
• Structured data is usually kept in tables and organized in a relational
database. Each field holds data in a specific format. Some fields
have a strict format like phone numbers or addresses, while others
can have varying lengths of text, like names or descriptions.
• Structured data can be created by people or machines. It is simple to
handle and can be easily searched through both manual searches and
automated analysis using statistical methods and machine learning
algorithms.
• Examples include databases, spreadsheets, CSV files, and relational
data.
• Characteristics:
1. Structured data is well-organized and has a predetermined structure.
2. A clear schema or data model outlines the structure of the data.
3. It is simple to analyze structured data through tools like SQL.
4. Examples of structured data include customer details, financial data, and
inventory records stored in a database.
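The characteristics above can be sketched in a few lines of Python using the standard-library sqlite3 module. This is only an illustration: the customers table, its columns, and the sample rows are made up for the example and do not come from the slides.

```python
import sqlite3

# In-memory database; the table, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,   -- variable-length text field
        phone   TEXT NOT NULL,   -- field with a strict format
        balance REAL NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO customers (name, phone, balance) VALUES (?, ?, ?)",
    [
        ("Ana Cruz", "555-0101", 120.50),
        ("Ben Reyes", "555-0102", 0.0),
        ("Cara Lim", "555-0103", 87.25),
    ],
)

# Every row follows the same schema, so SQL can filter and sort by field.
rows = conn.execute(
    "SELECT name, balance FROM customers WHERE balance > ? ORDER BY balance DESC",
    (50,),
).fetchall()
print(rows)
conn.close()
```

The schema is declared before any data is stored, which is what makes the later query simple.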
Pros and cons of structured data
Pros
• Easily used by machine learning (ML) algorithms:
• Machine learning algorithms find it easy to work with structured data because
of its clear and organized format, making it simple to manipulate and query.
• Easily used by business users:
• Business users also find structured data user-friendly as it does not require
deep knowledge of various data types and their functions. With a basic
understanding of the topic, users can access and interpret the data easily.
• Accessible by more tools:
• Structured data has been in use for longer than unstructured data, so a
wider range of tools is available for storing, querying, and analyzing it.
Cons
• Limited usage:
• Data with a predefined structure can only be used for its intended purpose,
which limits its flexibility and usability.
• Limited storage options:
• Structured data is usually kept in storage systems with rigid schemas, such
as data warehouses. As a result, any change in data requirements means
updating all of the structured data, which can take up a lot of time and
resources.
Use cases for structured data
• Customer relationship management (CRM): CRM software analyzes structured
data (for example, contact lists) to create data sets that show patterns and
trends in how customers behave.
• Online booking: Hotel and ticket reservation information such as dates,
prices, and destinations follows the rows-and-columns format of a predefined
data model.
• Accounting: Businesses and accounting departments use structured data (for
example, in invoicing systems) to track and document financial activities.
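As a rough illustration of the rows-and-columns format mentioned for online booking, the sketch below reads a small CSV table with Python's csv module. The column names and bookings are invented for the example.

```python
import csv
import io

# Hypothetical booking records; each row has the same four fields.
BOOKINGS_CSV = """guest,check_in,nights,price_per_night
Ana Cruz,2024-03-01,2,95.00
Ben Reyes,2024-03-02,1,120.00
Cara Lim,2024-03-05,3,80.00
"""

reader = csv.DictReader(io.StringIO(BOOKINGS_CSV))

# Because the columns are known in advance, each field can be used directly.
totals = {
    row["guest"]: int(row["nights"]) * float(row["price_per_night"])
    for row in reader
}
print(totals)
```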
Unstructured Data
• Unstructured data refers to different types of content like documents,
videos, audio files, social media posts, and emails. It can be challenging to
organize and classify these kinds of data.
• Unstructured data, usually classified as qualitative data, cannot be
processed and analyzed with conventional data tools and methods. Because it
lacks a predefined data model, unstructured data is best stored in
non-relational (NoSQL) databases.
• Unstructured data is usually made up of data sets instead of distinct data
elements, like a lengthy document covering various subjects. This makes it
difficult to categorize the contents of the document as a single entity.
Tools designed for structured data cannot effectively organize unstructured
documents.
• Unstructured data can still be managed, but the data items are usually
kept as objects in their original form. The data stays raw until a user
or tool needs it, and structure is applied only at that point, an
approach called schema-on-read.
• The significance of unstructured data is growing quickly.
Current estimates show that unstructured data makes up more
than 80% of all data in businesses, with 95% of businesses
giving priority to managing unstructured data.
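Schema-on-read can be sketched as follows. This is a minimal illustration, not a description of any particular product: the raw objects, the read_with_schema helper, and its two-field "schema" (author, text) are all assumptions made for the example.

```python
import json

# Raw objects kept exactly as they arrived; nothing was enforced at write time.
raw_objects = [
    '{"type": "email", "from": "ana@example.com", "body": "Order arrived late"}',
    '{"type": "post", "user": "ben", "text": "Great service!", "likes": 12}',
    "free-form note: customer called about a refund",
]

def read_with_schema(raw):
    """Apply a hypothetical two-field schema only when the object is read."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON at all: treat the whole object as plain text.
        return {"author": None, "text": raw}
    if not isinstance(obj, dict):
        return {"author": None, "text": raw}
    author = obj.get("from") or obj.get("user")
    text = obj.get("body") or obj.get("text") or ""
    return {"author": author, "text": text}

# Structure appears only here, at read time; raw_objects is left untouched.
records = [read_with_schema(raw) for raw in raw_objects]
for record in records:
    print(record)
```

A different reader could apply a different schema to the same raw objects, which is the flexibility the slide describes.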
Pros and cons of unstructured data
Unstructured data includes things like text, mobile activity, social media posts, and
IoT sensor data. Its benefits relate to format, speed of collection, and storage
options. Its drawbacks relate to the special skills and tools it requires.
Pros
• Native format:
• Stored in its original form, unstructured data is not organized until it is
required. By keeping it in its natural state, the range of file formats in the
database grows, giving data scientists access to a larger pool of data that they
can carefully analyze.
• Fast accumulation rates:
• Since there is no requirement to define the data beforehand, it can be
gathered rapidly and effortlessly.
• Data lake storage:
• Data lakes allow the storage of large amounts of data and charge based on
actual usage, reducing expenses and making it easier to adjust capacity as
needed.
Cons
• Requires expertise:
• Data science skills are needed to organize and analyze unstructured data
because it lacks a specific format. This is helpful for data analysts but can be
difficult for non-specialized business users who may not grasp specialized
data concepts or know how to use their data effectively.
• Specialized tools:
• Data managers are limited in their product choices because they need specific
tools to work with unstructured data.
Use cases for unstructured data
• Data mining:
• Businesses can utilize unstructured data to analyze how consumers behave,
feel about products, and make purchases in order to improve their services
for customers.

• Predictive data analytics:
• Unstructured data can alert companies to important events ahead of time, so
they have time to plan and adapt to major changes in the market.

• Chatbots:
• Chatbots analyze text to direct customer queries to the right sources for
answers.
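The chatbot use case can be illustrated with a deliberately simple keyword router. Real chatbots typically rely on natural language processing models; the departments, trigger words, and route_query function below are hypothetical and only show the idea of turning free text into a routing decision.

```python
import re

# Hypothetical departments and trigger words; not taken from any real product.
ROUTES = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "shipping": {"delivery", "tracking", "package", "shipping"},
    "technical support": {"error", "login", "password", "crash"},
}

def route_query(query):
    """Send a free-text query to the department with the most keyword hits."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    scores = {dept: len(words & keywords) for dept, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    # Fall back to a general queue when no keyword matches.
    return best if scores[best] > 0 else "general inquiries"

print(route_query("I was charged twice, can I get a refund?"))
print(route_query("Where is my package? The tracking page shows an error."))
print(route_query("What are your opening hours?"))
```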
What are the key differences between structured and unstructured data?
• Sources:
• Structured data comes from sources like GPS sensors, online forms, network
logs, web server logs, and OLTP systems. Unstructured data sources include
email messages, word-processing documents, and PDF files.
• Forms:
• Structured data is made up of numbers and values, while unstructured data
includes things like raw sensor output, text files, audio files, and video
files.
• Models:
• Structured data is fitted to a predefined data model before it is stored
(schema-on-write), while unstructured data is stored in its original form and
isn't organized until needed (schema-on-read).
• Storage:
• Structured data is stored in tabular formats, like Excel spreadsheets or SQL
databases, which need relatively little storage space. It can be stored in
data warehouses, which makes it highly scalable. Unstructured data, on the
other hand, is stored as media files or in NoSQL databases, which need more
storage space. It can be stored in data lakes, where it can be harder to
manage at scale.
• Uses:
• Machine learning relies on structured data for its algorithms, while natural
language processing and text mining utilize unstructured data.
Levels of Measurement
• Level of measurement refers to how precisely a variable has been
measured. During data collection, different kinds of information are
gathered depending on the specific objectives of the study.
• To examine the expenditure patterns of residents in Tokyo, for
example, a survey could be distributed to 500 individuals to gather
information on their income, precise location, age, and expenditure
on different goods and services. These items are the variables of the
study: measurable characteristics whose values vary among
participants.
Why are levels of measurement important?
• The level of measurement is crucial because it dictates the kind of
statistical analysis that can be conducted. As a consequence, it
influences both the quality and the thoroughness of the conclusions
you can draw from your data.
• It is important to plan ahead and determine the specific levels of
measurement needed in order to accurately conduct certain
statistical tests.
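One way to picture how the level of measurement constrains analysis is a small lookup of the summary statistics commonly treated as meaningful at each level. The grouping below follows the usual textbook convention and is a simplification; the names LEVELS, NEW_STATS, and allowed_statistics are made up for this sketch.

```python
# Levels in ascending order of precision.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that first become meaningful at each level (textbook convention).
NEW_STATS = {
    "nominal": ["mode", "frequency counts"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "standard deviation"],
    "ratio": ["ratios between values", "coefficient of variation"],
}

def allowed_statistics(level):
    """Return the statistics meaningful at this level and every level below it."""
    position = LEVELS.index(level)
    stats = []
    for name in LEVELS[: position + 1]:
        stats.extend(NEW_STATS[name])
    return stats

for level in LEVELS:
    print(level, "->", allowed_statistics(level))
```

Each level inherits what the lower levels allow, which is why deciding the level before collecting data matters.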
Four Different Levels of Measurement
• In ascending order of precision, the four different levels of
measurement are:
• Nominal – Latin for name only (Republican, Democrat, Green, Libertarian)
• Ordinal – Think ordered levels or ranks (small–8oz, medium–12oz,
large–32oz)
• Interval – Equal intervals among levels (1 dollar to 2 dollars is the same
interval as 88 dollars to 89 dollars)
• Ratio – Let the “o” in ratio remind you of a true zero in the scale (Day 0,
day 1, day 2, day 3, …)
Nominal Level of Measurement
• At this measurement level, any numbers used serve only as labels for
categorizing the data; they carry no order or quantity. Words, letters,
and alphanumeric symbols can be used in the same way.
Imagine there is information on individuals categorized into three
gender groups. For example, females are labeled as F, males as M,
and transgender individuals as T. This method of categorization is
known as nominal measurement.
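For nominal data, counting is essentially the only meaningful operation. A minimal sketch using the F/M/T labels from the example above (the sample responses are invented):

```python
from collections import Counter

# Invented sample of nominal labels; the codes only name categories.
responses = ["F", "M", "F", "T", "M", "F"]

counts = Counter(responses)
mode_label, mode_count = counts.most_common(1)[0]

print(counts)                  # frequency of each category
print(mode_label, mode_count)  # the most common category and its count
```

Sorting or averaging these labels would have no meaning, because nothing about F, M, or T implies order or magnitude.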
Ordinal Level of Measurement
• This type of measurement shows a clear order among the
observations of a variable.
• For example, if a student gets the highest score of 100 in the class, he
would be given the first rank. If another classmate scores 92, she
would be given the second rank. A third student scoring 81 would
then be assigned the third rank, and so forth. The ordinal
measurement level highlights the ranking of the measurements.
• For example, you could measure the variable “income” on an ordinal
scale as follows:
• low income
• medium income
• high income.
• Another example could be level of education, classified as follows:
• high school
• master’s degree
• doctorate
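With ordinal data the categories can be sorted, so a median category is meaningful even though distances between categories are not. A small sketch using the income scale above (the survey responses are invented):

```python
# The order of the categories is part of the scale definition.
ORDER = ["low income", "medium income", "high income"]
RANK = {label: position for position, label in enumerate(ORDER)}

# Invented survey responses (odd count, so the median is a single category).
responses = [
    "high income",
    "low income",
    "medium income",
    "low income",
    "medium income",
]

# Sort by rank, not alphabetically, then take the middle response.
sorted_responses = sorted(responses, key=RANK.__getitem__)
median_label = sorted_responses[len(sorted_responses) // 2]
print(median_label)
```

A mean is not computed here: the ranks 0, 1, 2 say which category comes first, not how far apart the categories are.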
Interval Level of Measurement
• The interval level of measurement not only puts measurements in
order, but it also ensures that the differences between each interval
on the scale are the same.
• For instance, if anxiety in students is measured on an interval
scale, the difference between scores of 10 and 11 is the same as the
difference between scores of 40 and 41.
• An example of this type of measurement is temperature in Celsius,
where the difference between 94°C and 96°C is the same as the
difference between 100°C and 102°C.
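The Celsius example can be checked with a little arithmetic. The sketch below also shows why ratios are not meaningful on an interval scale: 0°C is an arbitrary reference point, so the comparison has to be redone on the Kelvin scale, which starts at absolute zero. The helper name is made up for the example.

```python
def celsius_to_kelvin(celsius):
    """Convert to Kelvin, a temperature scale with a true zero."""
    return celsius + 273.15

# Equal differences are meaningful on an interval scale.
diff_a = 96 - 94
diff_b = 102 - 100
print(diff_a == diff_b)

# Ratios are not: 20°C looks like "twice" 10°C only because of where 0°C sits.
naive_ratio = 20 / 10
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)
print(naive_ratio, round(kelvin_ratio, 4))
```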
Ratio Level of Measurement
• This type of measurement has equal intervals plus a true zero, meaning
that a value of zero indicates the complete absence of the quantity being
measured. The true zero sets it apart from the interval level, with
which it otherwise shares its properties.
• As in the interval level, the distances between points on the scale are
equal. Because of the true zero, ratios are also meaningful: a value of
20 is twice a value of 10.
Discrete Data vs. Continuous Data
Discrete data and continuous data are two types of quantitative information. The
key distinction between them is the kind of data they display. Discrete data usually
presents information on specific events, while continuous data often reveals
patterns or trends in data as time passes.
Values:
• Discrete data consists of specific, countable figures, like the number
of students in a class.
• On the other hand, continuous data often involves measurable values
that cover a range of information, like the height difference between
the shortest and tallest students in a class.
Types of data:
• Discrete data usually consists of whole numbers, while continuous
data often includes fractions or decimals.
Methods:
• Discrete data can typically be represented with simple techniques like a
number line or bar chart, while continuous data may require more
advanced methods such as curves or histograms.
Time intervals:
• Discrete data stays constant over a given time interval, while
continuous data can take different values at different points in time.
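The contrast between the two kinds of data can be sketched in plain Python. Discrete values are tallied one by one, as a bar chart would show them, while continuous values are grouped into bins, as a histogram would. The class sizes, heights, and 10 cm bin width are invented for the example.

```python
from collections import Counter

# Discrete: number of students per class (whole, countable values).
class_sizes = [28, 30, 28, 32, 30, 28]
bar_counts = Counter(class_sizes)  # one bar per distinct value
print(bar_counts)

# Continuous: student heights in cm (any value within a range is possible).
heights_cm = [150.2, 153.8, 158.1, 161.4, 162.0, 167.9, 171.3, 174.6]
bin_width = 10  # arbitrary choice for this sketch
hist_counts = Counter(int(h // bin_width) * bin_width for h in heights_cm)
print(sorted(hist_counts.items()))  # count per 10 cm bin, keyed by bin start

# The range between the shortest and tallest student is itself continuous.
height_range = max(heights_cm) - min(heights_cm)
print(round(height_range, 1))
```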
Discrete/continuous data visualization
Visualizing discrete and continuous data can be an important step to help you
understand the performance of a business. Here are a few ways you can represent
this type of data:
Sources:
• https://www.ibm.com/blog/structured-vs-unstructured-data/
• https://www.imperva.com/learn/data-security/structured-and-unstructured-data/
• https://careerfoundry.com/en/blog/data-analytics/data-levels-of-measurement/#what-are-levels-of-measurement-in-data-and-statistics
• https://www.indeed.com/career-advice/career-development/what-is-discrete-data#:~:text=Values%3A%20Discrete%20data%20represents%20exact,tallest%20student%20in%20a%20class.
