Data ingestion is the process of obtaining and importing data for immediate use or storage in a database.
This section provides an overview of common data formats, such as CSV, JSON, and XML, and their characteristics.
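To make the differences between these formats concrete, the following minimal sketch parses the same record from CSV, JSON, and XML using only the Python standard library. The record itself (`id`, `name`, `score`) is made up for illustration and is not taken from any particular dataset.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record expressed in three formats.
csv_text = "id,name,score\n1,Ada,9.5\n"
json_text = '{"id": 1, "name": "Ada", "score": 9.5}'
xml_text = "<record><id>1</id><name>Ada</name><score>9.5</score></record>"

# CSV: flat and tabular; every value is read back as a string.
csv_record = next(csv.DictReader(io.StringIO(csv_text)))

# JSON: supports nesting and preserves basic types such as numbers.
json_record = json.loads(json_text)

# XML: hierarchical and tag-based; element text is read back as a string.
root = ET.fromstring(xml_text)
xml_record = {child.tag: child.text for child in root}

print(csv_record)
print(json_record)
print(xml_record)
```

Note that only the JSON parser returns `score` as a number; with CSV and XML, type conversion is left to the consumer.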
Data discovery is the process of identifying and collecting data from different sources and examining it through exploratory data analysis.
This section discusses various data sources and the methods of acquiring data from these sources.
Data integration involves combining data from different sources and providing users with a unified view of the data.
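As a rough illustration of a unified view, the sketch below combines records from two sources on a shared key. The source names (`crm_rows`, `billing_rows`), the key `customer_id`, and the helper `integrate` are all hypothetical; real integration work usually also involves schema mapping and conflict handling, which are omitted here.

```python
# Two hypothetical sources describing overlapping sets of customers.
crm_rows = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
billing_rows = [
    {"customer_id": 1, "balance": 120.0},
    {"customer_id": 3, "balance": 40.0},
]


def integrate(left, right, key):
    """Combine two lists of dicts on `key`, keeping rows found in either source."""
    merged = {}
    for row in left + right:
        # Start a fresh dict per key so the input rows are never mutated.
        merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


unified = integrate(crm_rows, billing_rows, "customer_id")
for row in unified:
    print(row)
```

This behaves like an outer join: a customer that appears in only one source is still present in the unified view, just with fewer fields.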
Data fusion is the process of integrating multiple data sources to produce more consistent, accurate, and useful information than that provided by any individual data source.
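One common way to see why a fused result can be more accurate than any single source is inverse-variance weighting. The sketch below is only one possible fusion technique, and it assumes two independent, unbiased measurements of the same quantity with known variances; the function name `fuse` and the sensor values are made up for illustration.

```python
def fuse(measurements):
    """Fuse (value, variance) pairs by inverse-variance weighting.

    Assumes the measurements are independent and unbiased.
    Returns the fused estimate and its variance.
    """
    weights = [1.0 / variance for _, variance in measurements]
    total_weight = sum(weights)
    estimate = sum(w * value for w, (value, _) in zip(weights, measurements)) / total_weight
    return estimate, 1.0 / total_weight


# Hypothetical readings of the same temperature from two sensors:
# a noisy sensor (variance 4.0) and a more precise one (variance 1.0).
estimate, variance = fuse([(20.0, 4.0), (22.0, 1.0)])
print(estimate, variance)
```

Under these assumptions the fused variance is lower than the variance of either sensor alone, and the estimate is pulled toward the more precise reading.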
Transformation involves converting the data from one format or structure into another. Enrichment refers to enhancing data with relevant information that could make the data more useful.
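The two steps can be sketched as separate functions applied in sequence. In this minimal example the input record, the field names, and the `COUNTRY_NAMES` lookup table are all hypothetical; in practice the enrichment data might come from a reference database or a web service.

```python
from datetime import datetime

# Hypothetical reference data used for enrichment.
COUNTRY_NAMES = {"DE": "Germany", "JP": "Japan"}


def transform(record):
    """Convert a DD/MM/YYYY date to ISO 8601 and cast the amount to a float."""
    parsed_date = datetime.strptime(record["date"], "%d/%m/%Y").date()
    return {
        "order_date": parsed_date.isoformat(),
        "amount": float(record["amount"]),
        "country_code": record["country"],
    }


def enrich(record, lookup=COUNTRY_NAMES):
    """Add a human-readable country name taken from the lookup table."""
    country_name = lookup.get(record["country_code"], "unknown")
    return {**record, "country_name": country_name}


raw = {"date": "03/02/2024", "amount": "19.90", "country": "DE"}
result = enrich(transform(raw))
print(result)
```

Transformation changes the shape and types of the fields that were already present, while enrichment adds a field (`country_name`) that the original record did not contain.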
Data survey involves collecting data by asking people questions and recording their answers.
OpenRefine (formerly Google Refine) is an open-source tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.
This section discusses the considerations and strategies for determining the amount of data needed for specific purposes.
ETL stands for Extract, Transform, Load. It is a process that extracts data from source systems, transforms it into a consistent format, and then loads it into a single repository.
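The three stages can be sketched as three small functions. This is a minimal illustration under stated assumptions, not a production pipeline: the source is an inline CSV string, the target repository is an in-memory SQLite database, and the table name `users` and its columns are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; note the stray whitespace around the first name.
SOURCE_CSV = "id,name,signup\n1, Ada ,2024-01-05\n2,Grace,2024-02-11\n"


def extract(text):
    """Extract: read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast ids to integers and trim whitespace from names."""
    return [(int(r["id"]), r["name"].strip(), r["signup"]) for r in rows]


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id INTEGER PRIMARY KEY, name TEXT, signup TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(loaded)
```

Keeping the stages as separate functions makes each one testable on its own and makes it straightforward to swap the source or the target later.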