8_ Data Ingestion

Data ingestion is the process of obtaining and importing data for immediate use or storage in a database.

1_ Summary of data formats

This section provides an overview of various data formats like CSV, JSON, XML, etc., and their characteristics.
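As a rough illustration of how these formats differ, the sketch below parses the same record from CSV, JSON, and XML using only the Python standard library. The sample record and the helper names (`from_csv`, `from_json`, `from_xml`) are made up for this example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record, encoded three ways.
CSV_TEXT = "id,name,score\n1,Ada,9.5\n"
JSON_TEXT = '{"id": 1, "name": "Ada", "score": 9.5}'
XML_TEXT = "<record><id>1</id><name>Ada</name><score>9.5</score></record>"


def from_csv(text):
    # CSV is flat and untyped: every field arrives as a string.
    return next(csv.DictReader(io.StringIO(text)))


def from_json(text):
    # JSON keeps basic types (numbers, strings, booleans, null) and nesting.
    return json.loads(text)


def from_xml(text):
    # XML is hierarchical; element text is untyped, so values are strings.
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


csv_row = from_csv(CSV_TEXT)
json_row = from_json(JSON_TEXT)
xml_row = from_xml(XML_TEXT)

print(csv_row["score"], type(csv_row["score"]).__name__)
print(json_row["score"], type(json_row["score"]).__name__)
print(xml_row["score"], type(xml_row["score"]).__name__)
```

Only the JSON parser returns the score as a number; the CSV and XML values need an explicit conversion step after parsing.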

2_ Data discovery

Data discovery is the process of locating data across different sources and profiling it, typically through exploratory data analysis, to understand what is available and how usable it is.
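A minimal sketch of what a first profiling pass might look like, assuming the data arrives as CSV text. The sample data and the `profile` helper are hypothetical; a real exploratory analysis would go much further.

```python
import csv
import io
from collections import Counter

# Hypothetical sample with one missing temperature value.
SAMPLE = "city,temp_c\nParis,21\nOslo,\nParis,19\n"


def profile(text):
    """Return per-column counts of missing and distinct values."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return {}
    report = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        non_empty = [v for v in values if v != ""]
        counts = Counter(non_empty)
        report[column] = {
            "rows": len(values),
            "missing": len(values) - len(non_empty),
            "distinct": len(counts),
        }
    return report


report = profile(SAMPLE)
for column, stats in report.items():
    print(column, stats)
```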

3_ Data sources & Acquisition

This section discusses various data sources and the methods of acquiring data from these sources.

4_ Data integration

Data integration involves combining data from different sources and providing users with a unified view of the data.
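The idea of a unified view can be sketched as follows, assuming two hypothetical sources (`crm` and `billing`) that describe the same customers under different key names. The field names and the `integrate` helper are illustrative only.

```python
# Two hypothetical sources with different schemas for the same entities.
crm = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Linus"},
]
billing = [
    {"cust": 1, "balance": 30.0},
    {"cust": 3, "balance": 12.5},
]


def integrate(crm_rows, billing_rows):
    """Combine both sources into one record per customer (a full outer join)."""
    unified = {}
    for row in crm_rows:
        key = row["customer_id"]
        unified[key] = {"customer_id": key, "name": row["name"], "balance": None}
    for row in billing_rows:
        key = row["cust"]
        # Customers known only to billing still appear, with no name.
        record = unified.setdefault(
            key, {"customer_id": key, "name": None, "balance": None}
        )
        record["balance"] = row["balance"]
    return [unified[key] for key in sorted(unified)]


unified_view = integrate(crm, billing)
for record in unified_view:
    print(record)
```

Fields that a source does not supply are left as `None`, so consumers of the unified view can tell which values are missing.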

5_ Data fusion

Data fusion is the process of integrating multiple data sources to produce more consistent, accurate, and useful information than that provided by any individual data source.
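One common fusion rule, used here purely as an illustration since this section does not name a specific method, is inverse-variance weighting: each source's reading is weighted by how precise it is, and the fused estimate has a lower variance than any single source. The sensor readings below are hypothetical.

```python
def fuse(readings):
    """Fuse (value, variance) pairs by inverse-variance weighting.

    Returns the fused value and its variance.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    weights = [1.0 / variance for _, variance in readings]
    total_weight = sum(weights)
    fused_value = (
        sum(w * value for w, (value, _) in zip(weights, readings)) / total_weight
    )
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


# Hypothetical temperature readings: a precise sensor and a noisy one.
sensor_readings = [(20.0, 1.0), (22.0, 4.0)]
fused_value, fused_variance = fuse(sensor_readings)
print(fused_value, fused_variance)
```

The fused value sits closer to the more precise sensor, and the fused variance is smaller than either input variance.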

6_ Transformation & enrichment

Transformation involves converting the data from one format or structure into another. Enrichment refers to enhancing data with relevant information that could make the data more useful.
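The difference between the two steps can be sketched as below. The raw record layout, the `COUNTRY_NAMES` lookup table, and the `transform` and `enrich` helpers are all assumptions made for this example.

```python
from datetime import datetime

# Hypothetical reference table used for enrichment.
COUNTRY_NAMES = {"FR": "France", "NO": "Norway"}


def transform(raw):
    """Convert a raw record into a normalized structure."""
    return {
        "name": raw["name"].strip().title(),
        # Reformat the date from DD/MM/YYYY to ISO 8601.
        "signup_date": datetime.strptime(raw["signup"], "%d/%m/%Y").date().isoformat(),
        "country_code": raw["country"].upper(),
    }


def enrich(record, lookup=COUNTRY_NAMES):
    """Add information that was not present in the original record."""
    enriched = dict(record)
    enriched["country_name"] = lookup.get(record["country_code"])
    return enriched


raw_record = {"name": "  ada lovelace ", "signup": "10/12/2023", "country": "fr"}
clean_record = enrich(transform(raw_record))
print(clean_record)
```

Transformation only reshapes values already in the record; enrichment brings in a new field from an external source.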

7_ Data survey

Data survey involves collecting data by asking people questions and recording their answers.

8_ Google OpenRefine

OpenRefine (formerly Google Refine) is an open-source tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.

9_ How much data?

This section discusses the considerations and strategies for determining the amount of data needed for specific purposes.
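As one illustrative consideration, not necessarily the approach this section takes, the number of samples needed to estimate a proportion within a given margin of error can be approximated with Cochran's formula, n = z² p (1 - p) / e². The `sample_size` helper and its default values are assumptions for this sketch.

```python
import math


def sample_size(margin_of_error, z=1.96, proportion=0.5):
    """Cochran's formula for a large population.

    z=1.96 corresponds to 95% confidence; proportion=0.5 is the most
    conservative choice because it maximizes p * (1 - p).
    """
    n = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    return math.ceil(n)


n_5_percent = sample_size(0.05)
n_3_percent = sample_size(0.03)
print(n_5_percent, n_3_percent)
```

Because the margin of error is squared in the denominator, halving it roughly quadruples the required sample size.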

10_ Using ETL

ETL stands for Extract, Transform, Load. It is a process that extracts data from source systems, transforms it into a consistent format, and then loads it into a single target repository, such as a data warehouse.
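The three stages can be sketched with the Python standard library, assuming a CSV source and an in-memory SQLite database as the target. The `payments` table, the sample data, and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data with inconsistent spacing and casing.
SOURCE_CSV = "id,amount,currency\n1, 10.50 ,usd\n2,7,EUR\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types and normalize currency codes."""
    return [
        (int(row["id"]), float(row["amount"]), row["currency"].strip().upper())
        for row in rows
    ]


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute(
    "SELECT id, amount, currency FROM payments ORDER BY id"
).fetchall()
print(loaded)
```

Keeping each stage in its own function makes it possible to test the transformation logic without touching the source or the target.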