Data Types and Sources
• Chatbots:
• Analyze the text to direct customer queries to the right sources for answers.
What are the key differences between structured and unstructured data?
• Sources:
• Structured data comes from sources like GPS sensors, online forms, network
logs, web server logs, and OLTP systems. Unstructured data sources include
email messages, word-processing documents, and PDF files.
• Forms:
• Structured data consists of numbers and values, while unstructured data
includes text files, audio files, video files, and raw sensor output.
• Models:
• Structured data has a predefined data model and is formatted to a fixed
structure before it is stored (schema-on-write), while unstructured data is
stored in its native form and is not organized until it is used (schema-on-read).
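The difference in *when* structure is imposed (often called schema-on-write for structured data and schema-on-read for unstructured data) can be sketched in Python. This is a minimal illustration, not a real database: the `SCHEMA` dict, the helper functions, and the sample order and message are all hypothetical.

```python
import re

# Hypothetical fixed schema: field name -> required Python type.
SCHEMA = {"order_id": int, "customer": str, "amount": float}

structured_store: list[dict] = []   # stands in for a SQL table
unstructured_store: list[str] = []  # stands in for a data lake of raw text


def write_structured(record: dict) -> None:
    """Schema-on-write: validate the record against SCHEMA before storing it."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"fields {sorted(record)} do not match the schema")
    for field, expected in SCHEMA.items():
        if not isinstance(record[field], expected):
            raise TypeError(f"{field} must be {expected.__name__}")
    structured_store.append(record)


def write_unstructured(text: str) -> None:
    """Unstructured data is stored as-is, with no validation."""
    unstructured_store.append(text)


def read_unstructured_amounts() -> list[float]:
    """Schema-on-read: impose structure only at query time (extract dollar amounts)."""
    amounts: list[float] = []
    for text in unstructured_store:
        amounts += [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", text)]
    return amounts


write_structured({"order_id": 1, "customer": "Aiko", "amount": 42.5})
write_unstructured("Hi, I was charged $42.50 twice for order 1. Please refund.")

print(structured_store[0]["amount"])  # 42.5, already typed when stored
print(read_unstructured_amounts())    # [42.5], parsed only when read
```

The structured record is rejected up front if it does not fit the schema; the raw message is accepted unconditionally, and all the parsing work (and risk of misreading) moves to the query.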
• Storage:
• Structured data is organized in tables, such as Excel spreadsheets or SQL
databases, and generally takes up less storage space. It can be stored in data
warehouses, which scale easily. Unstructured data, on the other hand, is stored
as media files or in NoSQL databases, which require more storage space. It can
be stored in data lakes, though it is harder to organize and query at scale.
• Uses:
• Structured data is commonly used to drive classical machine learning
algorithms, while unstructured data is the raw material for natural language
processing (NLP) and text mining.
Levels of Measurement
• Level of measurement describes how precisely a variable has been
measured. During data collection, different kinds of information are
gathered, depending on the objectives of the study.
• For example, to study the spending patterns of Tokyo residents, you
might survey 500 people about their income, exact location, age, and
spending on various goods and services. These are the study's
variables: measurable attributes whose values vary among
participants.
Why are levels of measurement important?
• The level of measurement is crucial because it dictates the kind of
statistical analysis that can be conducted. As a consequence, it
influences both the quality and the thoroughness of the conclusions
you can draw from your data.
• Decide in advance which levels of measurement you need, because
certain statistical tests can only be carried out correctly on data
measured at the appropriate level.
Four Different Levels of Measurement
• In ascending order of precision, the four levels of measurement are:
• Nominal – from the Latin for “name”: categories are names only (Republican,
Democrat, Green, Libertarian)
• Ordinal – Think ordered levels or ranks (small–8oz, medium–12oz, large–32oz)
• Interval – Equal intervals among levels (the difference between 1°C and 2°C is
the same as the difference between 88°C and 89°C)
• Ratio – Let the “o” in ratio remind you of a zero in the scale (day 0, day 1,
day 2, day 3, …)
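One way to see why the level matters for analysis is to map each level to the summary statistics that are meaningful at it, with each level inheriting everything allowed below it. The sketch follows the commonly taught Stevens typology; the `Level` enum, the `NEW_AT_LEVEL` table, and `permitted_statistics` are hypothetical names, and the statistic lists are representative rather than exhaustive.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from least to most precise."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Statistics that first become meaningful at each level (representative, not exhaustive).
NEW_AT_LEVEL = {
    Level.NOMINAL: ["frequency counts", "mode"],
    Level.ORDINAL: ["median", "percentiles"],
    Level.INTERVAL: ["mean", "standard deviation", "differences"],
    Level.RATIO: ["ratios", "geometric mean", "coefficient of variation"],
}


def permitted_statistics(level: Level) -> list[str]:
    """Return every statistic that is meaningful at `level`, including inherited ones."""
    stats: list[str] = []
    for lvl in Level:  # iterates in definition order: NOMINAL ... RATIO
        if lvl <= level:
            stats.extend(NEW_AT_LEVEL[lvl])
    return stats


print(permitted_statistics(Level.ORDINAL))
# ['frequency counts', 'mode', 'median', 'percentiles']
```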
Nominal Level of Measurement
• At this level, values only label categories. Labels can be words,
letters, or symbols, including numbers, but any numbers act purely as
names and carry no order or magnitude. For example, imagine individuals
grouped into three gender categories: females are labeled F, males M,
and transgender individuals T. This kind of categorization is
nominal measurement.
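With nominal data, the meaningful operations are counting how often each category occurs and finding the most frequent one (the mode). A minimal sketch using the F/M/T labels from the example; the sample of six responses and the numeric `codes` mapping are hypothetical.

```python
from collections import Counter

# Hypothetical sample of nominal labels.
genders = ["F", "M", "F", "T", "M", "F"]

counts = Counter(genders)
print(counts)                 # Counter({'F': 3, 'M': 2, 'T': 1})
print(counts.most_common(1))  # [('F', 3)], i.e. the mode

# Coding the labels as numbers does not make arithmetic on them meaningful:
codes = {"F": 1, "M": 2, "T": 3}
mean_code = sum(codes[g] for g in genders) / len(genders)
print(mean_code)  # computable, but has no interpretation for nominal data
```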
Ordinal Level of Measurement
• This type of measurement shows a clear order among the
observations of a variable.
• For example, if a student gets the highest score of 100 in the class, he
would be given the first rank. If another classmate scores 92, she
would be given the second rank. A third student scoring 81 would
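then be assigned the third rank, and so forth. The ordinal
measurement level captures the ranking, but not the size of the gaps
between ranks: 8 points separate first and second place, while 11
separate second and third.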
• Similarly, you could measure the variable “income” on an ordinal
scale as follows:
• low income
• medium income
• high income
• Another example could be level of education, classified as follows:
• high school
• master’s degree
• doctorate
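Both ordinal examples can be sketched in Python: ranking the three scores, and taking the median of ordered income categories by their position in an explicit ordering. The student names and survey responses are hypothetical; `statistics.median_low` is used so that the result is always an actual category rather than an average of two.

```python
import statistics

# Ranking: the order matters, but the gaps between ranks are unequal.
scores = {"student_a": 100, "student_b": 92, "student_c": 81}
ranked = sorted(scores, key=scores.get, reverse=True)
ranks = {name: i for i, name in enumerate(ranked, start=1)}
print(ranks)  # {'student_a': 1, 'student_b': 2, 'student_c': 3}

# Ordered categories: store the order explicitly, then take the median by position.
INCOME_LEVELS = ["low income", "medium income", "high income"]
responses = ["low income", "high income", "medium income", "medium income", "low income"]

positions = [INCOME_LEVELS.index(r) for r in responses]
median_level = INCOME_LEVELS[statistics.median_low(positions)]
print(median_level)  # medium income
```

A mean of the positions would imply the categories are evenly spaced, which ordinal data does not guarantee; the median only relies on their order.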
Interval Level of Measurement
• The interval level of measurement not only orders measurements but
also guarantees that the distances between adjacent points on the
scale are equal.
• For instance, a student's anxiety might be measured on an interval
scale where the difference between scores of 10 and 11 is the same
as the difference between scores of 40 and 41.
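• An example of this type of measurement is temperature in Celsius,
where the difference between 94°C and 96°C is the same as the
difference between 100°C and 102°C. However, 0°C does not mean an
absence of temperature, so ratios are not meaningful: 20°C is not
twice as hot as 10°C.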
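Converting the Celsius example to Fahrenheit shows what an interval scale preserves and what it does not. Equal differences stay equal after conversion, but a "ratio" between two temperatures changes with the unit, because both scales put zero at an arbitrary point. A minimal sketch; `c_to_f` is a hypothetical helper implementing the standard formula F = C × 9/5 + 32.

```python
def c_to_f(celsius: float) -> float:
    """Convert Celsius to Fahrenheit: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


# Equal intervals: both 2-degree Celsius gaps map to the same Fahrenheit gap (3.6 F,
# up to floating-point rounding).
gap_low = c_to_f(96) - c_to_f(94)
gap_high = c_to_f(102) - c_to_f(100)
print(gap_low, gap_high)

# No true zero: the ratio between two temperatures depends on the unit chosen.
print(20 / 10)                  # 2.0 in Celsius
print(c_to_f(20) / c_to_f(10))  # 1.36 in Fahrenheit (68 / 50)
```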
Ratio Level of Measurement
• This level has equal intervals, like the interval level, plus a true
zero: a value of zero means the complete absence of the quantity being
measured (0 kg, 0 seconds). The true zero is what sets the ratio level
apart from the interval level.
• Because of the true zero, ratios between values are meaningful:
10 kg is twice as heavy as 5 kg.
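On a ratio scale, changing units is a pure multiplication with no offset, so zero stays zero and ratios between values survive the conversion (unlike the Celsius-to-Fahrenheit case). A minimal sketch with weights; `kg_to_lb` is a hypothetical helper and `2.20462` is an approximate kilograms-to-pounds factor.

```python
KG_TO_LB = 2.20462  # approximate conversion factor


def kg_to_lb(kg: float) -> float:
    """Ratio-scale unit change: multiply only, no offset, so 0 kg stays 0 lb."""
    return kg * KG_TO_LB


heavy, light = 10.0, 5.0
print(heavy / light)                      # 2.0
print(kg_to_lb(heavy) / kg_to_lb(light))  # 2.0: the ratio is unit-independent
print(kg_to_lb(0.0))                      # 0.0: the true zero is preserved
```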
Discrete Data vs. Continuous Data
Discrete data and continuous data are two types of quantitative data. The key
distinction between them is the values they can take: discrete data takes separate,
countable values, while continuous data can take any value within a range. In
practice, discrete data often records specific events, while continuous data often
reveals patterns or trends over time.
Values:
• Discrete data consists of specific, countable figures, like the number
of students in a class.
• Continuous data, on the other hand, consists of measurements that
can take any value within a range, like the heights of the students in a
class or the height difference between the shortest and tallest student.
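The two examples above can be contrasted directly: the number of students is obtained by counting and is always a whole number, while heights are measured and can take any value in a range. A minimal sketch; the class roster and the heights in centimetres are hypothetical.

```python
# Hypothetical class with measured heights in centimetres.
heights_cm = [152.4, 160.0, 171.3, 158.75, 165.1]

# Discrete: a count, always a whole number.
num_students = len(heights_cm)
print(num_students)  # 5

# Continuous: measurements can take any value in a range; their precision
# is limited only by the measuring instrument.
height_range = max(heights_cm) - min(heights_cm)
print(round(height_range, 2))  # 18.9
```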
Types of data:
• Discrete data usually consists of whole numbers, while continuous
data often includes fractions or decimals.
Methods:
• Discrete data is typically displayed with simple charts such as a
number line or bar chart, while continuous data is better shown with
histograms or curves.
Time intervals:
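• Discrete data is usually recorded as counts at specific points or over
fixed periods (such as sales per day), while continuous data can take
any value at any moment and can vary smoothly over time (such as
temperature).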
Discrete/continuous data visualization
Visualizing discrete and continuous data can be an important step to help you
understand the performance of a business. Here are a few ways you can represent
this type of data:
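As a minimal, text-only sketch, discrete values can be drawn as a bar chart with one bar per distinct value, while continuous values are first grouped into equal-width bins to form a histogram. The order and delivery data are hypothetical, and `BIN_WIDTH` is an arbitrary choice; a real report would more likely use a plotting library.

```python
from collections import Counter

# Hypothetical data: items per order (discrete) and delivery times in hours (continuous).
items_per_order = [1, 2, 2, 3, 1, 2, 4, 2, 3, 1]
delivery_hours = [1.2, 2.7, 3.1, 0.8, 2.2, 4.9, 3.6, 1.9, 2.5, 3.3]

# Discrete: one bar per distinct value.
item_counts = Counter(items_per_order)
print("Items per order (bar chart)")
for value, count in sorted(item_counts.items()):
    print(f"{value:>3} | {'#' * count}")

# Continuous: group values into equal-width bins, then draw one bar per bin.
BIN_WIDTH = 1.0
bins = Counter(int(h // BIN_WIDTH) for h in delivery_hours)
print("Delivery time in hours (histogram)")
for b in range(min(bins), max(bins) + 1):
    low, high = b * BIN_WIDTH, (b + 1) * BIN_WIDTH
    print(f"{low:.1f}-{high:.1f} | {'#' * bins[b]}")  # empty bins print no bar
```

The bin width is a design choice for continuous data: narrower bins show more detail but noisier bars, while a bar chart of discrete data needs no such choice.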