2. Structured and Unstructured Data Disruptive System
Arindam Ghosh
DATA ANALYTICS
Disruptive Technology Innovation
Structured Data
Definition: Structured data refers to highly organized information that adheres to a predefined
schema or model. This type of data is typically stored in fixed fields within records or files, making it
easily accessible and manageable.
Characteristics:
1. Organization:
o Structured data is arranged in rows and columns, much like a spreadsheet or database
table. Each row represents a record, while each column corresponds to a specific
attribute or field of that record.
2. Schema:
o A predefined schema dictates how data is stored, including data types (e.g., integer,
string, date) and relationships between tables. This schema ensures consistency and
integrity of the data.
3. Data Types:
4. Querying:
o Structured data can be easily queried using Structured Query Language (SQL),
allowing users to perform complex searches and data manipulation. This makes it
ideal for reporting and analytics.
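The characteristics above (rows and columns, a predefined schema, SQL querying) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production design: the `customers` table, its columns, and the sample rows are hypothetical and do not come from the notes.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    # The schema fixes each column's name, data type, and constraints up front.
    conn.execute(
        """
        CREATE TABLE customers (
            id          INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            signup_date TEXT NOT NULL,   -- ISO date stored as text
            city        TEXT NOT NULL
        )
        """
    )
    # Each tuple is one record (row); each position is one attribute (column).
    rows = [
        (1, "Asha", "2023-01-15", "Kolkata"),
        (2, "Ben", "2023-03-02", "Delhi"),
        (3, "Chitra", "2023-03-20", "Kolkata"),
    ]
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def customers_per_city(conn):
    """Run a reporting-style SQL query: number of customers in each city."""
    cur = conn.execute(
        "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(customers_per_city(build_demo_db()))
```

Because the schema declares `name` as NOT NULL, an insert that omits it is rejected by the database, which is one way a schema enforces consistency.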
Examples:
• Databases: Relational databases like MySQL, PostgreSQL, and Microsoft SQL Server.
• Spreadsheets: Applications like Microsoft Excel or Google Sheets, where data is organized
in a tabular format.
• CRM Systems: Customer relationship management software that stores structured customer
data.
Uses:
Unstructured Data
Definition: Unstructured data refers to information that has no predefined format or schema, making
it challenging to collect, process, and analyse. Extracting insights from it often requires advanced
techniques.
Characteristics:
1. Lack of Organization:
o Unstructured data can exist in various forms and does not fit neatly into tables. It may
include free text, multimedia files, or other formats that require interpretation.
2. Variety of Formats:
3. Difficult to Analyse:
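To illustrate why free text requires interpretation, the sketch below pulls a few fields out of an email-like message with regular expressions. The message, the field names, and the patterns are all hypothetical and chosen only for this example. Real messages vary far more, which is why pattern matching alone is rarely sufficient.

```python
import re

# Hypothetical free-text message: the facts are present, but not in fixed fields.
EMAIL_TEXT = """Hi team, the shipment for order 4821 was delayed.
Please contact ravi@example.com before 2024-05-17 to reschedule."""


def extract_fields(text):
    """Impose a small structure on free text using simple, assumed patterns."""
    order = re.search(r"order\s+(\d+)", text, flags=re.IGNORECASE)
    email = re.search(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)
    date = re.search(r"\d{4}-\d{2}-\d{2}", text)
    # Any field may be missing, so each one falls back to None.
    return {
        "order_id": int(order.group(1)) if order else None,
        "contact": email.group(0) if email else None,
        "deadline": date.group(0) if date else None,
    }


if __name__ == "__main__":
    print(extract_fields(EMAIL_TEXT))
```

Once extracted, the resulting dictionary could be stored as a row in a table, which is the usual path from unstructured input to structured records.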
Examples:
• Social media: Posts, comments, and multimedia content on platforms like Facebook,
Twitter, and Instagram.
• Emails: The content and attachments in emails can provide valuable information but lack a
structured format.
• Multimedia Files: Videos, podcasts, and images that may contain rich content but require
specialized tools for analysis.
• Web Content: Blogs, articles, and forums that offer insights but do not follow a consistent
structure.
Uses:
• Machine Learning: Training algorithms on diverse datasets that include text, images, and
audio for predictive analytics and pattern recognition.
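Before text can be used to train an algorithm, it is usually converted into numeric features. One common and simple approach is a bag-of-words count vector. The sketch below uses only the standard library; the sample documents and the helper names `tokenize` and `bag_of_words` are hypothetical, and a real project would more likely rely on a dedicated machine learning library.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(docs):
    """Return a sorted vocabulary and one word-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        # Counter returns 0 for words that do not occur in this document.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


if __name__ == "__main__":
    docs = ["Great phone, great battery", "Battery died fast"]
    vocab, vectors = bag_of_words(docs)
    print(vocab)
    print(vectors)
```

Each document becomes a fixed-length row of numbers, so unstructured text ends up in a tabular form that a learning algorithm can consume.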