Introduction to Big Data Platform
Dr. Anil Kumar Dubey
Associate Professor,
Computer Science & Engineering Department,
ABES EC, Ghaziabad
Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Uttar Pradesh, Lucknow
Basics
A big data platform is a type of IT solution that combines the features and capabilities of several big data applications and utilities within a single solution.
It is an enterprise-class IT platform that enables organizations to develop, deploy, operate and manage a big data infrastructure/environment.
The term refers to IT solutions that combine several Big Data tools and utilities into one packaged solution, which is then used for managing as well as analyzing Big Data.
Conti…
A big data platform generally consists of:
Big data storage
Servers
Database
Big data management
Business intelligence & other big data management utilities.
It also supports custom development.
The primary benefit of a big data platform is that it reduces the complexity of dealing with multiple vendors/solutions by consolidating them into one cohesive solution.
Big data platforms are also delivered through the cloud, where the provider offers all-inclusive big data solutions and services.
Platforms
Hadoop - Delta Lake Migration Platform
Data Catalog Platform
Data Ingestion Platform
IoT Analytics Platform
Data Integration and Management Platform
ETL Data Transformation Platform
Hadoop - Delta Lake Migration Platform
Hadoop is an open-source software platform managed by the Apache Software Foundation.
It is used to manage and store large data sets at a low cost and with great efficiency.
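To make the storage idea concrete: HDFS, Hadoop's distributed file system, stores a large file as fixed-size blocks and keeps several copies of each block on different machines, which is how it stays reliable on cheap commodity hardware. The sketch below is a simplified model, not Hadoop code. It assumes the usual defaults of a 128 MB block size and a replication factor of 3, and it places replicas round-robin across hypothetical node names instead of following HDFS's real rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128   # assumed default HDFS block size
REPLICATION = 3       # assumed default replication factor

def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the size of each block; only the last block may be smaller."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * n_blocks
    if n_blocks and file_size_mb % block_size_mb:
        sizes[-1] = file_size_mb % block_size_mb
    return sizes

def place_replicas(n_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes (simplified round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(n_blocks)
    }

blocks = split_into_blocks(300)   # a hypothetical 300 MB file
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])  # hypothetical nodes
print(blocks)     # [128, 128, 44]
print(placement)  # {0: ['node1', 'node2', 'node3'], 1: ['node2', 'node3', 'node4'], 2: ['node3', 'node4', 'node1']}
```

Because every block lives on three different nodes, losing any single machine still leaves two copies of each block readable.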
Delta Lake is an open-source storage layer that helps bring reliability to data lakes. It provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing.
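Delta Lake builds its ACID guarantees on an ordered transaction log (the `_delta_log` directory), where each commit is a numbered JSON file that lists the data files added to or removed from the table. The toy sketch below imitates that idea in plain Python and is not the Delta Lake API. It assumes a local file system that supports hard links, and it uses a simplified action format. The real log records richer actions and table metadata. A writer publishes commit N only if no one else has taken N, and readers rebuild the table by replaying commits in order.

```python
import json
import os
import tempfile

def commit(log_dir: str, version: int, actions: list[dict]) -> bool:
    """Publish commit number `version` only if no other writer has taken it."""
    path = os.path.join(log_dir, f"{version:020d}.json")  # zero-padded so names sort by version
    # Write the complete commit to a temporary file first...
    fd, tmp = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    try:
        # ...then publish it with a hard link, which fails if the version already exists,
        # so readers never see a half-written commit and two writers cannot both win.
        os.link(tmp, path)
        return True
    except FileExistsError:
        return False
    finally:
        os.remove(tmp)

def snapshot(log_dir: str) -> set[str]:
    """Replay the commits in version order to find the table's current data files."""
    live: set[str] = set()
    for name in sorted(n for n in os.listdir(log_dir) if n.endswith(".json")):
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"])
                elif "remove" in action:
                    live.discard(action["remove"])
    return live

log_dir = tempfile.mkdtemp()
commit(log_dir, 0, [{"add": "part-0.parquet"}, {"add": "part-1.parquet"}])
commit(log_dir, 1, [{"remove": "part-0.parquet"}, {"add": "part-2.parquet"}])
print(commit(log_dir, 1, [{"add": "part-3.parquet"}]))  # False: version 1 already exists
print(sorted(snapshot(log_dir)))                        # ['part-1.parquet', 'part-2.parquet']
```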
Data Catalog Platform
It provides a single self-service environment that helps users find, understand, and trust their data sources.
It also helps users discover new data sources, if there are any. Discovering and understanding data sources are the initial steps for registering them.
Users search the data catalog tools based on their needs and filter the results to find the appropriate data.
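The register, search and filter workflow can be sketched as a tiny in-memory catalog. Everything here is hypothetical and chosen for illustration: the field names (description, steward, tags, popularity), the dataset names, and the ranking by popularity. Real catalog products keep this metadata in a backing store and offer much richer search.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CatalogEntry:
    name: str
    description: str                              # business definition of the dataset
    steward: str                                  # person or team responsible for the data
    tags: set[str] = field(default_factory=set)
    popularity: int = 0                           # e.g. how often it is queried (hypothetical metric)

class DataCatalog:
    """A toy in-memory catalog: register datasets, then search and filter them."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str, tag: Optional[str] = None) -> list[CatalogEntry]:
        """Match the keyword in names or descriptions, optionally keep only entries
        carrying `tag`, and return the most popular matches first."""
        kw = keyword.lower()
        hits = [
            e for e in self._entries.values()
            if (kw in e.name.lower() or kw in e.description.lower())
            and (tag is None or tag in e.tags)
        ]
        return sorted(hits, key=lambda e: e.popularity, reverse=True)

catalog = DataCatalog()
catalog.register(CatalogEntry("sales_2023", "Daily sales transactions", "finance-team", {"sales", "finance"}, 120))
catalog.register(CatalogEntry("web_clicks", "Clickstream events from the website", "web-team", {"marketing"}, 300))
catalog.register(CatalogEntry("sales_forecast", "Monthly sales forecast", "analytics-team", {"sales"}, 45))

print([e.name for e in catalog.search("sales")])                 # ['sales_2023', 'sales_forecast']
print([e.name for e in catalog.search("sales", tag="finance")])  # ['sales_2023']
```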
Data Catalog Platform
The exact data you need,
Exactly when you need it,
Complete with all the footnotes:
◦ Definitions
◦ Stewardship details
◦ Popularity metrics
◦ Nitty-gritty of policies.
It's not just about accessing data; it's about
Data Catalog Platform tools
Data Ingestion Platform
This layer is the first step in the journey of data coming from various sources.
Here, the data is prioritized and categorized so that it flows smoothly through the subsequent layers of the process.
Data ingestion refers to the process of collecting data from multiple sources and loading it into another
Data Ingestion Platform
Businesses most commonly use a subtype of data ingestion called ETL (extract, transform, load), in which data is extracted, cleaned up (transformed), and only then loaded into a warehouse.
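The three ETL stages can be sketched end to end in a few functions. This is a minimal illustration assuming a CSV export as the source and Python's built-in `sqlite3` in-memory database standing in for the warehouse. The column names and the cleaning rules (drop rows with no amount, drop duplicate orders, normalise customer names) are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system, with messy values.
RAW_CSV = """order_id,customer,amount
1, Alice ,100.50
2,Bob,
3,carol,75
1, Alice ,100.50
"""

def extract(raw: str) -> list[dict]:
    """Extract: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: clean the rows before they reach the warehouse."""
    cleaned, seen = [], set()
    for row in rows:
        if not row["amount"]:          # drop rows with a missing amount
            continue
        if row["order_id"] in seen:    # drop duplicate orders
            continue
        seen.add(row["order_id"])
        cleaned.append((int(row["order_id"]),
                        row["customer"].strip().title(),  # normalise names
                        float(row["amount"])))
    return cleaned

def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")   # stand-in for a real data warehouse
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [(1, 'Alice', 100.5), (3, 'Carol', 75.0)]
```

Because the cleaning happens before loading, the warehouse table never contains the incomplete or duplicated rows.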
Some of the most popular ETL tools include [Link],
Airbyte, Matillion, Talend, and Wavefront.
[Link] is a no-code data pipeline platform that simplifies
Data Ingestion Tools
IoT Analytics Platform
It provides a wide range of tools for working with big data, a capability that comes in handy in IoT use cases.
IoT data analytics platforms are software tools that help businesses collect and analyze the data from their far-flung networks of IoT (Internet of Things) devices.
IoT networks collect vast amounts of data, from consumer spending patterns to traffic usage, and IoT data analytics platforms are essential in helping companies generate the insights needed for competitive advantage.
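A typical first step in IoT analytics is grouping raw device readings and summarizing them so that unusual devices stand out. The sketch below assumes readings arrive as hypothetical (device_id, metric, value) tuples and uses a fixed threshold for alerts. A real platform would receive these readings from device messaging services and offer configurable rules and dashboards instead.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings: (device_id, metric, value)
readings = [
    ("thermostat-1", "temp_c", 21.5),
    ("thermostat-1", "temp_c", 22.5),
    ("thermostat-2", "temp_c", 30.0),
    ("thermostat-2", "temp_c", 32.0),
    ("meter-7", "kwh", 1.2),
]

def summarize(readings):
    """Group readings by (device, metric) and compute count, mean and max."""
    groups = defaultdict(list)
    for device, metric, value in readings:
        groups[(device, metric)].append(value)
    return {key: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
            for key, vals in groups.items()}

def alerts(summary, metric, threshold):
    """Devices whose mean value of `metric` exceeds `threshold`."""
    return sorted(dev for (dev, m), stats in summary.items()
                  if m == metric and stats["mean"] > threshold)

summary = summarize(readings)
print(summary[("thermostat-1", "temp_c")])  # {'count': 2, 'mean': 22.0, 'max': 22.5}
print(alerts(summary, "temp_c", 25.0))      # ['thermostat-2']
```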
Analytics Platform | Pros | Cons | Pricing
AWS IoT Analytics | Scalable; Predictive analysis | Lack of guidelines | Request a quote or start free.
Microsoft Azure IoT | Secure communication; Easy integration | Expensive | Request a quote or start free.
IBM Watson IoT Platform | Centralized dashboard; Flexible | Needs better training | Free trial or contact sales.
ThingSpeak | Event alerts; Instant visualizations | Limited support; Not for experts | Pricing for Standard, Academic, Student, and Home online.
Oracle IoT Cloud Service | Simple deployment; Documentation | Needs more integration | Free trial or contact sales.
Datadog | Great network mapping; Metric history | Not for beginners | Start for free or contact sales.
Cisco IoT | Strong visibility; Improves uptime | Needs more language tools | Request a quote.
Big Data Integration and Management Platform
ElixirData provides a highly customizable solution for enterprises.
ElixirData provides flexibility, security, and stability for enterprise applications and Big Data infrastructure, which can be deployed on-premises or on the public cloud, with cognitive insights using machine learning and artificial intelligence.
ETL Data Transformation Platform
This platform can be used to build data transformation pipelines and even to schedule their runs.
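Building a pipeline and scheduling its runs can be sketched with Python's standard `sched` module. The three transformation steps, the hourly interval, and the simulated clock are assumptions made so that the example finishes instantly. A real platform would use its own scheduler and persist run history.

```python
import sched

class SimulatedClock:
    """Stands in for wall-clock time so the demo finishes instantly (assumption)."""
    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds   # advance simulated time instead of really waiting

def build_pipeline(*steps):
    """Compose transformation steps into one callable that applies them in order."""
    def run(data):
        for step in steps:
            data = step(data)
        return data
    return run

# Hypothetical three-step transformation.
pipeline = build_pipeline(
    lambda rows: [r for r in rows if r is not None],  # drop missing values
    lambda rows: [r * 2 for r in rows],               # example transformation
    sum,                                              # aggregate to a single figure
)

clock = SimulatedClock()
scheduler = sched.scheduler(clock.time, clock.sleep)
results = []

def scheduled_run(every: float, runs_left: int) -> None:
    results.append((clock.time(), pipeline([1, None, 2, 3])))
    if runs_left > 1:   # re-arm the next run
        scheduler.enter(every, 1, scheduled_run, (every, runs_left - 1))

scheduler.enter(0, 1, scheduled_run, (3600, 3))   # run now, then hourly, 3 runs in total
scheduler.run()
print(results)  # [(0.0, 12), (3600.0, 12), (7200.0, 12)]
```

Keeping the pipeline definition separate from the schedule means the same steps can be run once for testing or repeatedly in production.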
Essential Components of a Big Data Platform
A big data platform has several essential components:
Data Ingestion, Management, ETL, and Warehouse
Stream Computing
Analytics/ Machine Learning
Integration
Data Governance
Provides Accurate Data
Scalability
Price Optimization
Reduced Latency
Data Ingestion, Management, ETL, and Warehouse – It provides these resources for effective data management and data warehousing, treating data as a valuable resource.
Stream Computing – Helps process streaming data for real-time analytics.
Analytics/Machine Learning – Provides features for advanced analytics and machine learning.
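Stream computing processes each record as it arrives instead of waiting for a complete batch. A minimal sketch of one common real-time metric, a moving average over the most recent readings, is shown below. The window size and input values are hypothetical. Stream-processing engines typically offer windowing as a built-in feature.

```python
from collections import deque
from typing import Iterable, Iterator

def sliding_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the average of the last `window` values each time a new value arrives."""
    buf: deque = deque(maxlen=window)
    total = 0.0
    for value in stream:
        if len(buf) == window:
            total -= buf[0]   # the oldest value is about to be evicted
        buf.append(value)
        total += value
        yield total / len(buf)

print(list(sliding_average([10, 20, 30, 40], window=2)))  # [10.0, 15.0, 25.0, 35.0]
```

Keeping a running total makes each update constant-time, so the cost per event does not grow with the window size.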
Integration – It lets users integrate big data from any source with ease.
Data Governance – It also provides comprehensive security, data governance, and solutions to protect the data.
Provides Accurate Data – It delivers analytic tools that help filter out inaccurate data that has not been analyzed. This also helps the business make the right decisions by utilizing
Scalability – It helps scale the application to analyze ever-growing data, sizing itself to provide efficient analysis. It offers scalable storage capacity.
Price Optimization – Data analytics on a big data platform provides insights for B2C and B2B enterprises, which help businesses optimize the prices they charge.
Reduced Latency – With its warehouse, analytics tools, and efficient data transformation, it helps reduce data latency and provide high
Big Data Platform Use Cases
Insurance Fraud Detection: Companies
handling a large number of financial
transactions use tools provided by this
platform to look for any fraud that’s
happening.
In Real Life: It can be used for various use
cases of real-time stream processing like in
the field of Media and Entertainment,
Weather patterns, the Transportation
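For the insurance fraud detection use case, one simple building block is flagging transaction amounts that are far from typical. The sketch below uses a robust outlier test, the modified z-score based on the median absolute deviation (MAD), with the conventional 3.5 cutoff. The amounts are hypothetical. Production fraud systems combine many features and models rather than a single rule like this.

```python
from statistics import median

def flag_suspicious(amounts: list[float], cutoff: float = 3.5) -> list[int]:
    """Return indices of amounts whose modified z-score exceeds `cutoff`.

    modified z = 0.6745 * (x - median) / MAD, where MAD is the median absolute
    deviation; unlike mean/std, it is not dragged around by the outliers themselves.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # Most values are identical: treat anything that differs from them as suspicious.
        return [i for i, a in enumerate(amounts) if a != med]
    return [i for i, a in enumerate(amounts) if 0.6745 * abs(a - med) / mad > cutoff]

amounts = [50, 52, 48, 51, 49, 50, 500]   # hypothetical transaction amounts
print(flag_suspicious(amounts))           # [6]
```

The median-based version is used instead of the plain mean and standard deviation because a single huge transaction inflates the standard deviation enough to hide itself in small samples.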
Drivers for Big Data
Six main business drivers
1) Digitization of Society
2) Plummeting of Technology Costs
3) Connectivity through Cloud Computing
4) Increased Knowledge about Data Science
5) Social Media Applications
6) Upcoming Internet-of-Things (IoT)
1. Digitization of Society
Big Data is largely consumer driven and consumer
oriented. Most of the data in the world is generated
by consumers, who are nowadays ‘always-on’.
Most people now spend 4-6 hours per day
consuming and generating data through a variety
of devices and (social) applications.
With every click, swipe or message, new data is created in a database somewhere around the world. Because nearly everyone now carries a smartphone, the total amount of data created is incomprehensibly large.
2. Plummeting of Technology Costs
The costs of data storage and processors keep declining,
making it possible for small businesses and individuals to
become involved with Big Data.
For storage capacity, a trend similar to the often-cited Moore’s Law (which originally described transistor counts on chips) is commonly quoted: storage density, and therefore capacity, roughly doubles every two years. The plummeting of technology costs has been depicted in the figure below.
Besides the plummeting of the storage costs, a second key
contributing factor to the affordability of Big Data has been
the development of open source Big Data software
frameworks.
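The doubling-every-two-years trend for storage compounds quickly. Writing C_0 for today's capacity and t for the number of years that pass, the trend gives the worked example below. The constant-price assumption in the note that follows is ours, added only for illustration.

```latex
C(t) = C_0 \cdot 2^{t/2}
\qquad\Rightarrow\qquad
C(10) = C_0 \cdot 2^{10/2} = C_0 \cdot 2^{5} = 32\,C_0
```

If the price of a storage device stayed roughly constant over those ten years, the cost per gigabyte would fall by the same factor, to about 1/32 of its starting value.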
The most popular software framework (nowadays considered the standard for Big Data) is Apache Hadoop, used for distributed storage and processing.
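Hadoop's processing side follows the MapReduce model. Map tasks turn input records into key/value pairs, the framework groups the pairs by key (the shuffle), and reduce tasks combine each group. The classic word-count example below runs in a single Python process to show the data flow only. It is not Hadoop's Java API, and in a real job the map and reduce tasks run in parallel across the cluster, close to where the data blocks are stored.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)

lines = ["big data big insights", "data platforms store data"]   # hypothetical input
intermediate = chain.from_iterable(map(map_phase, lines))
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(intermediate).items())
print(counts)  # {'big': 2, 'data': 3, 'insights': 1, 'platforms': 1, 'store': 1}
```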
3. Connectivity through Cloud Computing
Cloud computing environments (where data is
remotely stored in distributed storage systems)
have made it possible to quickly scale up or scale
down IT infrastructure and facilitate a pay-as-you-
go model.
This means that organizations that want to process
massive quantities of data (and thus have large
storage and processing requirements) do not have
to invest in large quantities of IT infrastructure.
Instead, they can license the storage and
processing capacity they need and only pay for the
amounts they actually use.
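The difference between paying for usage and owning capacity for the peak can be shown with a small calculation. The usage profile and the price per node-hour below are hypothetical, and both options are charged the same unit price so that only the effect of utilization is visible. Real on-premises costs are structured differently (purchase, power, staff).

```python
def pay_as_you_go_cost(nodes_per_hour: list[int], rate_per_node_hour: float) -> float:
    """Pay only for the node-hours actually consumed."""
    return sum(nodes_per_hour) * rate_per_node_hour

def peak_provisioned_cost(nodes_per_hour: list[int], rate_per_node_hour: float) -> float:
    """Own enough nodes for the busiest hour and keep them running all day."""
    return max(nodes_per_hour) * len(nodes_per_hour) * rate_per_node_hour

# Hypothetical day: 2 nodes suffice for 20 hours, but a 4-hour batch job needs 10.
hourly_nodes = [2] * 20 + [10] * 4
RATE = 0.50   # hypothetical price per node-hour

print(pay_as_you_go_cost(hourly_nodes, RATE))     # 40.0
print(peak_provisioned_cost(hourly_nodes, RATE))  # 120.0
```

The gap comes entirely from idle capacity: peak provisioning pays for 240 node-hours, while only 80 are used.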
4. Increased Knowledge about Data Science
Knowledge and education about data science have become greatly professionalized, and more information becomes available every day.
While statistics and data analysis previously remained mostly an academic field, they are quickly becoming popular subjects among students and the working population.
5. Social Media Applications
Social media data provides insights into the
behaviors, preferences and opinions of ‘the public’ on
a scale that has never been known before.
Due to this, it is immensely valuable to anyone who is
able to derive meaning from these large quantities of
data.
Social media data can be used to identify customer
preferences for product development, target new
customers for future purchases, or even target
potential voters in elections.
Social media data might even be considered one of
the most important business drivers of Big Data.
6. Upcoming Internet of Things (IoT)
IoT is the network of physical devices, vehicles,
home appliances and other items embedded
with electronics, software, sensors, actuators,
and network connectivity which enables these
objects to connect and exchange data.
It is increasingly gaining popularity as
consumer goods providers start including
‘smart’ sensors in household appliances.
Examples of these devices include thermostats,
smoke detectors, televisions, audio systems
and even smart refrigerators.
THANK YOU