Understanding Data

The document provides an overview of data, including its definition, importance, types (structured and unstructured), and processes such as collection, storage, and processing. It also discusses measures of central tendency (mean, median, mode) and measures of variability (range, standard deviation) to analyze data effectively. The content is aimed at understanding the role of data in decision-making across various fields.

Uploaded by Jayabharathi

CSC

SUBJECT CODE : 41
2ND PU - 2025-2026

Understanding data
Introduction to data
Data collection
Data storage
Data processing
Statistical techniques for data processing
Data
• Data is a collection of characters, numbers, and other symbols that
represent values of some situations or variables.
E.g. name and gender of a person, images, online posts, comments, etc.
Importance of data
Data is crucial for decision making.
E.g.:
• Pharmaceutical companies record data while trying out a new medicine
to see its effectiveness.
• Libraries maintain data about books in the library and the membership
of the library.
• Search engines give us results after analysing large volumes of data
available on websites across the World Wide Web (WWW).
• Weather alerts are generated by analysing data received from various
satellites.
Types of data
• Because data come from different sources, they can be in different
formats.
E.g. an image is a collection of pixels; a video is made up of frames.
• There are two types of data:
1) Structured Data
2) Unstructured Data
Structured Data:
• Data which is organised and can be recorded in a well-defined format
is called structured data.
• Structured data is usually stored in a computer in a tabular format.
E.g. attendance register, sales transactions.
Unstructured Data:
• Data which are not in a well-defined format, that is, not in the
traditional row and column structure, are called unstructured data.
E.g. newspapers, text documents, business reports, etc.
• Unstructured data are sometimes described with the help of
metadata.
• Metadata is basically data about data.
E.g. the metadata of an email include subject, recipient, main body,
attachment, etc.
Structured data example:

ModelNo   ProductName       Unit Price   Discount(%)   Items_in_Inventory
ABC1      Water bottle      126          8             13
ABC2      Melamine Plates   320          5             45
ABC3      Dinner Set        4200         10            8
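As a rough illustration of why the tabular form is convenient, the rows of the example table can be held as records that all share the same fields. This is only a sketch: the field names are taken from the table, while reading the price column as a unit price and the discount as a percentage of that price is an assumption.

```python
# Each row of the example table becomes one record with the same named fields.
inventory = [
    {"ModelNo": "ABC1", "ProductName": "Water bottle", "Price": 126, "Discount": 8, "Items_in_Inventory": 13},
    {"ModelNo": "ABC2", "ProductName": "Melamine Plates", "Price": 320, "Discount": 5, "Items_in_Inventory": 45},
    {"ModelNo": "ABC3", "ProductName": "Dinner Set", "Price": 4200, "Discount": 10, "Items_in_Inventory": 8},
]

# Because every record has the same fields, a whole column can be processed uniformly.
total_items = sum(row["Items_in_Inventory"] for row in inventory)

# Assumption: Discount is a percentage taken off Price.
discounted = {
    row["ModelNo"]: row["Price"] * (100 - row["Discount"]) / 100
    for row in inventory
}

print(total_items)
print(discounted)
```

Unstructured data such as a newspaper article has no such fixed fields, so this kind of column-wise processing is not directly possible.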
Data Collection
• Data collection here means identifying already available data or
collecting from the appropriate sources.
E.g. suppose there are three different scenarios in which sales data of
a grocery store may be available:
• Sales data are available with the shopkeeper in a diary or register.
• Data are already available in a digital format, say in a CSV (comma
separated values) file.
• The shopkeeper has so far not recorded any data in either form but
wants to get software developed for maintaining sales data and
accounts.
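For the second scenario, where data already exist in CSV form, a minimal sketch of reading them with Python's standard csv module might look like this. The column names (item, quantity, unit_price) and all values are made up for illustration, and the text is read from a string rather than a real file.

```python
import csv
import io

# Hypothetical sales data as it might appear in a CSV file (values are made up).
csv_text = """item,quantity,unit_price
Rice,2,60
Sugar,1,45
Milk,3,25
"""

# csv.DictReader reads the header row and yields one dict per data row.
reader = csv.DictReader(io.StringIO(csv_text))

# CSV fields arrive as text, so numeric columns are converted explicitly.
sales = [
    {
        "item": row["item"],
        "quantity": int(row["quantity"]),
        "unit_price": float(row["unit_price"]),
    }
    for row in reader
]

print(len(sales))
print(sales[0])
```

With a real file, the io.StringIO object would be replaced by a file opened with open(..., newline="").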
Data are continuously being generated at different sources.
E.g.
• Hospitals: collecting data about patients.
• Shopping malls: collecting data about the items being purchased by
people, etc.
Data Storage
• Data storage is the process of storing data on storage devices so that
data can be retrieved later.
• Data storage is important because large volumes of data are
generated daily, and storing them ensures easy retrieval and
analysis when needed.
• There are numerous digital storage devices available to store the data
like, Hard Disk Drive (HDD), Solid State Drive (SSD), CD/DVD, Tape
Drive, Pen Drive, Memory Card, etc.
• We store data like images, documents, audio, videos, etc. as files on
our computers.
• However, file processing has certain limitations, which can be
overcome through a Database Management System (DBMS).
Data Processing
• Data need to be processed to get results; after analysing those
results, we draw conclusions or make decisions.
(or)
• Data processing is the method of converting raw data into meaningful
information.
E.g. online bill payment, registration of complaints, booking tickets, etc.

Raw data (numbers/text/images) → Processing → Information (in the form of table/chart/text)

Input: data collection, data entry
Processing: store, retrieve, update
Output: results, reports
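The input, processing, and output stages can be sketched in a few lines of Python. This is only an illustration: the names and marks are made up, and a plain dictionary stands in for real storage.

```python
# Input: raw data entered as plain text pairs (made-up values).
raw_entries = [("Asha", "78"), ("Ravi", "64"), ("Meena", "91")]

# Processing: store the entries, then update and retrieve them.
store = {}
for name, marks in raw_entries:
    store[name] = int(marks)          # store, converting text to a number
store["Ravi"] = 69                    # update one stored value
top_name = max(store, key=store.get)  # retrieve the name with the highest marks

# Output: information in the form of a short text report.
report = f"{len(store)} students, highest marks: {top_name} ({store[top_name]})"
print(report)
```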
Measures of Central Tendency
A measure of central tendency is a single value that gives us some idea
about the data. The three most common measures of central tendency are
the mean, median, and mode.
(A) Mean: The mean is simply the average of the numeric values of an
attribute. Mean is also called average.
Suppose there are data on the weights of 40 students in a class. Instead
of looking at each of the data values, we can calculate the average to
get an idea about the average weight of students in that class.
Definition: Given n values x1, x2, x3, ..., xn, the mean is computed as
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
Assume that the heights (in cm) of students in a class are as follows:
[90,102,110,115,85,90,100,110,110].
The mean or average height of the class is
$$\frac{90+102+110+115+85+90+100+110+110}{9} = \frac{912}{9} \approx 101.33 \text{ cm}$$
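The same calculation can be checked with a short Python sketch. The variable names are chosen only for this illustration.

```python
# Heights (in cm) from the example above.
heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]

total = sum(heights)       # sum of all values
n = len(heights)           # number of values
mean_height = total / n    # mean = sum of values / number of values

print(total, n, round(mean_height, 2))
```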
The mean is not a suitable choice if there are outliers in the data,
because a few extreme values can pull it away from the typical value.
In that case, the outliers or extreme values should first be removed
from the given data, and the mean of the remaining data calculated.
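To see the effect of an outlier, the sketch below adds one made-up extreme value (250) to the height data and compares the mean before and after removing it. The cut-off of 200 cm is a hypothetical rule used only for this illustration, not a general method for detecting outliers.

```python
heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]
with_outlier = heights + [250]   # 250 is a made-up extreme value

# Mean of all values, including the outlier.
mean_all = sum(with_outlier) / len(with_outlier)

# Hypothetical rule for this sketch only: treat anything above 200 cm as an outlier.
cleaned = [h for h in with_outlier if h <= 200]
mean_cleaned = sum(cleaned) / len(cleaned)

print(round(mean_all, 2), round(mean_cleaned, 2))
```

A single extreme value raises the mean from about 101.33 to 116.2, which no longer describes a typical student in the list.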
(B) Median:
• Median is also computed for a single attribute/variable at a time.
When all the values are sorted in ascending or descending order, the
middle value is called the Median.
• When there is an odd number of values, the median is the value at
the middle position.
• If the list has an even number of values, the median is the average
of the two middle values.
E.g.
• To compute the median for the above example, the first step is to
sort the data in ascending or descending order.
• The height data sorted in ascending order are
[85,90,90,100,102,110,110,110,115].
• As there are total 9 values (odd number), the median is the value at
position 5, that is 102 cm.
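The rule for odd and even counts can be written as a small Python function. This is a minimal sketch; the function name median is chosen for this illustration, and the even-count case uses the first eight heights only to show the second rule.

```python
def median(values):
    """Return the middle value of the sorted data (average of two middles if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]
print(median(heights))        # 9 values (odd)
print(median(heights[:8]))    # first 8 values (even)
```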
(C) Mode:
• The value that appears the greatest number of times in the given
data of an attribute/variable is called the mode.
• It is computed on the basis of the frequency of occurrence of
distinct values in the given data.
• A data set has no mode if each value occurs only once.
• There may be multiple modes in the data if more than one value has
the same highest frequency.
• Mode can be found for numeric as well as non-numeric data.
• In the above list of heights of students, the mode is 110, as its
frequency of occurrence in the list is 3, which is larger than the
frequency of the rest of the values.
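The points above (no mode, multiple modes, non-numeric data) can be illustrated with a frequency count. This is a sketch using collections.Counter from the standard library; the function name modes and the colour list are made up for the example.

```python
from collections import Counter

def modes(values):
    """Return a list of the most frequent values; empty if no value repeats."""
    counts = Counter(values)          # frequency of each distinct value
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []                     # every value occurs once: no mode
    return [value for value, count in counts.items() if count == highest]

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]
print(modes(heights))                                   # single mode
print(modes(["red", "blue", "red", "blue", "green"]))   # two modes, non-numeric data
print(modes([1, 2, 3]))                                 # no mode
```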
Measures of Variability
• The measures of variability refer to the spread or variation of the
values around the mean. They are also called measures of dispersion.
• They also indicate differences within the group.
• Common measures of dispersion or variability are Range and
Standard Deviation.
(A) Range:
• It is the difference between maximum and minimum values of the
data (the largest value minus the smallest value). Range can be
calculated only for numerical data.
E.g. difference in salaries of employees, marks of a student, price of
toys, etc.
• Let M be the largest or maximum value and S be the smallest or
minimum value in the data; then the range is the difference between the
two extreme values, i.e.
M – S or Maximum – Minimum.
Example: In the above example, the minimum height is 85 cm and the
maximum height is 115 cm. Hence, the range is 115 - 85 = 30 cm.
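In Python, the range of the height data can be computed directly from the built-in max and min functions. The variable name data_range is used here to avoid clashing with Python's own range function.

```python
heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]

largest = max(heights)    # M, the maximum value
smallest = min(heights)   # S, the minimum value
data_range = largest - smallest

print(largest, smallest, data_range)
```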
(B) Standard deviation:
• Standard deviation refers to differences within the group or set of
data of a variable.
• Range uses only two extreme values in the data, but calculation of
standard deviation considers all the given data.
• It is calculated as the positive square root of the average of squared
difference of each value from the mean value of data.
• A smaller value of standard deviation means the data are less spread
out, while a larger value means the data are more spread out.
• Given n values x1, x2, x3, ..., xn and their mean x̄, the standard
deviation, represented as σ (Greek letter sigma), is computed as
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
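Following the description above (the positive square root of the average of the squared differences from the mean), the standard deviation of the height data can be sketched step by step. The division is by n, as the description states; statistics.pstdev from the standard library uses the same definition and serves here only as a cross-check.

```python
import math
import statistics

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]

n = len(heights)
mean = sum(heights) / n

# Squared difference of each value from the mean.
squared_diffs = [(x - mean) ** 2 for x in heights]

# Average of the squared differences, then the positive square root.
variance = sum(squared_diffs) / n
sigma = math.sqrt(variance)

print(round(sigma, 2))
print(round(statistics.pstdev(heights), 2))  # library cross-check
```

For these heights the squared differences sum to 938, so σ is the square root of 938/9, about 10.21 cm.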
