
Unit 1 Topic 2 Big Data Platform

The document provides an introduction to Big Data platforms, highlighting their role as integrated IT solutions for managing and analyzing large data sets. It discusses various components of Big Data platforms, including data ingestion, management, and analytics, as well as specific platforms like Hadoop and IoT analytics. Additionally, it outlines the drivers of Big Data, including digitization, reduced technology costs, and the rise of IoT.

Introduction to Big Data Platform

Dr. Anil Kumar Dubey


Associate Professor,
Computer Science & Engineering Department,
ABES EC, Ghaziabad
Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Uttar
Pradesh, Lucknow
Basic
• A big data platform is an IT solution that combines the features and capabilities of several big data applications and utilities within a single solution.

• It is an enterprise-class IT platform that enables an organization to develop, deploy, operate, and manage a big data infrastructure or environment.

• The term refers to IT solutions that combine several big data tools and utilities into one packaged offering, which is then used to manage and analyze big data.
Conti…
A big data platform generally consists of:
• Big data storage
• Servers
• Databases
• Big data management
• Business intelligence and other big data management utilities

It also supports:
• Custom development
Conti..
The primary benefit of a big data platform is that it reduces the complexity of multiple vendors and solutions into one cohesive solution.

Big data platforms are also delivered through the cloud, where the provider offers all-inclusive big data solutions and services.
Platforms
Hadoop Delta Lake Migration Platform
Data Catalog Platform
Data Ingestion Platform
IoT Analytics Platform
Data Integration and Management Platform
ETL Data Transformation Platform
Conti…
Hadoop - Delta Lake Migration Platform
Hadoop is an open-source software platform managed by the Apache Software Foundation.

It is used to store and manage large data sets at low cost and with great efficiency.

Delta Lake is an open-source storage layer that helps bring reliability to data lakes. It provides ACID transactions, scalable metadata handling, and unified streaming and batch data processing.
Conti…
Conti…
Data Catalog Platform
It provides a single self-service environment that helps users find, understand, and trust data sources.

It also helps users discover new data sources, if there are any. Discovering and understanding data sources are the initial steps for registering them.

Users search the data catalog tools based on their needs and filter for the appropriate results.
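
The search-and-filter workflow described above can be illustrated with a small in-memory sketch. The `CatalogEntry` fields, the `search_catalog` function, and the sample entries are all hypothetical, chosen only for illustration; real data catalog tools expose their own interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One registered data source (hypothetical schema for illustration)."""
    name: str
    description: str
    owner: str
    tags: set = field(default_factory=set)


def search_catalog(entries, keyword, required_tag=None):
    """Return entries whose name or description mentions the keyword,
    optionally keeping only those that carry required_tag."""
    keyword = keyword.lower()
    hits = [
        e for e in entries
        if keyword in e.name.lower() or keyword in e.description.lower()
    ]
    if required_tag is not None:
        hits = [e for e in hits if required_tag in e.tags]
    return hits


# Made-up sample sources.
catalog = [
    CatalogEntry("sales_orders", "Daily sales orders by region", "finance", {"certified"}),
    CatalogEntry("web_clicks", "Raw clickstream events", "marketing", {"raw"}),
    CatalogEntry("sales_leads", "Unverified sales leads", "marketing", {"raw"}),
]

# Search by need ("sales"), then filter down to trusted sources.
certified_sales = search_catalog(catalog, "sales", required_tag="certified")
print([e.name for e in certified_sales])
```

The `owner` and `tags` fields stand in for the stewardship and trust details a catalog would normally attach to each source.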
Conti…
Data Catalog Platform
The exact data you need,
Exactly when you need it,
Complete with all the footnotes:
◦ Definitions
◦ Stewardship details
◦ Popularity metrics
◦ Nitty-gritty of policies.

It's not just about accessing data; it's about


Conti…
Data Catalog Platform tools
Conti…
Data Ingestion Platform
This layer is the first step in the journey of data arriving from various sources.

Here the data is prioritized and categorized, so that it flows smoothly through the later layers of the process.

It refers to the process of collecting data from multiple sources and loading it into another
Conti…
Data Ingestion Platform
• Businesses most commonly use a subtype of data ingestion called ETL (extract, transform, load), which allows the data to be transformed before it is loaded.

• Through this process, data is extracted and cleaned up before being loaded into a warehouse.

• Some of the most popular ETL tools include [Link], Airbyte, Matillion, Talend, and Wavefront.

• [Link] is a no-code data pipeline platform that simplifies
Conti…
Data Ingestion Tools
Conti…
IoT Analytics Platform
• It provides a wide range of tools for working with big data, which comes in handy in IoT use cases.

• IoT data analytics platforms are software tools that help businesses collect and analyze the data from their far-flung network of IoT (Internet of Things) devices.

• IoT networks collect vast amounts of data, from consumer spending patterns to traffic usage, and IoT data analytics platforms are essential in helping companies generate the insight needed for competitive advantage.
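
As a rough illustration of the "collect and analyze" step, the sketch below groups readings by device and computes simple statistics. The device names, the readings, and the `summarize_readings` function are made up for this example and do not correspond to any particular IoT platform.

```python
from collections import defaultdict
from statistics import mean


def summarize_readings(readings):
    """Group (device_id, value) pairs by device and compute simple statistics."""
    by_device = defaultdict(list)
    for device_id, value in readings:
        by_device[device_id].append(value)
    return {
        device_id: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
        for device_id, vals in by_device.items()
    }


# Made-up temperature readings from two hypothetical devices.
readings = [
    ("thermostat-1", 20.0),
    ("thermostat-2", 18.0),
    ("thermostat-1", 22.0),
    ("thermostat-2", 19.0),
    ("thermostat-1", 24.0),
]

summary = summarize_readings(readings)
print(summary["thermostat-1"])
```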
Conti…
Analytic Platform | Pros | Cons | Pricing
AWS IoT Analytics | Scalable; predictive analysis | Lack of guidelines | Request a quote or start free.
Microsoft Azure IoT | Secure communication; easy integration | Expensive | Request a quote or start free.
IBM Watson IoT Platform | Centralized dashboard; flexible | Needs better training | Free trial or contact sales.
ThingSpeak | Event alerts; instant visualizations | Limited support; not for experts | Pricing for Standard, Academic, Student, and Home online.
Oracle IoT Cloud Service | Simple deployment; documentation | Needs more integration | Free trial or contact sales.
Datadog | Great network mapping; metric history | Not for beginners | Start for free or contact sales.
Cisco IoT | Strong visibility; improves uptime | Needs more language tools | Request a quote.
Conti…
Big Data Integration and Management
Platform
ElixirData provides a highly customizable solution for enterprises.

ElixirData provides flexibility, security, and stability for enterprise applications and big data infrastructure, deployable on-premises and in the public cloud, with cognitive insights using machine learning and artificial intelligence.
Conti…
ETL Data Transformation Platform
This platform can be used to build data transformation pipelines and to schedule their runs.
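
A minimal sketch of such a pipeline, using only the Python standard library, is given here. The CSV text, the `orders` table, and the function names are assumptions made for illustration; an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Made-up raw input; a real pipeline would read from files, APIs, or queues.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not_a_number
3,carol,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, convert amounts, drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows with an unparseable amount
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse-like table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn, text):
    """Run the three stages in order."""
    load(conn, transform(extract(text)))


conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
run_pipeline(conn, RAW_CSV)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Scheduling is left out of the sketch; in practice a scheduler such as cron or a workflow orchestrator would call `run_pipeline` at fixed intervals.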
Essential Components of Big Data Platform
The essential components are:
Data Ingestion, Management, ETL, and Warehouse
Stream Computing
Analytics/ Machine Learning
Integration
Data Governance
Provides Accurate Data
Scalability
Price Optimization
Reduced Latency
Conti…
Data Ingestion, Management, ETL, and Warehouse – The platform provides these resources for effective data management and data warehousing, treating data as a valuable resource.

Stream Computing – Processes streaming data as it arrives, for use in real-time analytics.

Analytics/Machine Learning – Provides features for advanced analytics and machine learning.
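
To make the Stream Computing item above concrete, here is a minimal sketch of one common streaming computation: a rolling mean that is updated as each value arrives, without storing the whole stream. The `sensor_stream` generator and its values are made up; a real platform would read from a message queue or similar source.

```python
from collections import deque


def rolling_mean(stream, window_size):
    """Yield the mean of the most recent window_size values as each value arrives."""
    window = deque(maxlen=window_size)
    total = 0.0
    for value in stream:
        if len(window) == window.maxlen:
            total -= window[0]  # the oldest value is about to be evicted
        window.append(value)
        total += value
        yield total / len(window)


def sensor_stream():
    """Stand-in for an unbounded source such as a message queue (made-up values)."""
    for value in [10, 20, 30, 40]:
        yield value


averages = list(rolling_mean(sensor_stream(), window_size=2))
print(averages)
```

Only the last `window_size` values are kept in memory, which is what lets the same approach run over a stream that never ends.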
Conti…
Integration – Lets users integrate big data from any source with ease.

Data Governance – Provides comprehensive security, data governance, and solutions to protect the data.

Provides Accurate Data – Its analytic tools help filter out inaccurate data that has not been analyzed. This also helps the business make the right decisions by utilizing
Conti…
• Scalability – The platform scales the application to analyze ever-growing data, sizing itself to provide efficient analysis. It offers scalable storage capacity.

• Price Optimization – Data analytics on a big data platform provides insight for B2C and B2B enterprises, which helps them optimize the prices they charge.

• Reduced Latency – With its combination of warehouse, analytics tools, and efficient data transformation, it helps reduce data latency and provide high
Big Data Platform Use Cases
Insurance Fraud Detection: Companies handling a large number of financial transactions use the tools provided by this platform to look for any fraud that is taking place.

In Real Life: It can be used for various real-time stream processing use cases, such as in the fields of Media and Entertainment, Weather patterns, the Transportation
Drivers for Big Data
Six main business drivers
1) Digitization of Society
2) Plummeting of Technology Costs
3) Connectivity through Cloud Computing
4) Increased Knowledge about Data Science
5) Social Media Applications
6) Upcoming Internet of Things (IoT)
1. Digitization of Society
Big Data is largely consumer-driven and consumer-oriented. Most of the data in the world is generated by consumers, who are nowadays ‘always-on’. Most people now spend 4-6 hours per day consuming and generating data through a variety of devices and (social) applications.
With every click, swipe, or message, new data is created in a database somewhere around the world. Because almost everyone now has a smartphone in their pocket, the amount of data created adds up to incomprehensible volumes.
2. Plummeting of Technology Costs
• The costs of data storage and processors keep declining, making it possible for small businesses and individuals to become involved with Big Data.
• For storage capacity, the often-cited Moore’s Law (strictly a statement about transistor counts on chips) is applied by analogy: storage density, and therefore capacity, has roughly doubled every two years. The plummeting of technology costs has been depicted in the figure below.
• Besides the plummeting of storage costs, a second key contributing factor to the affordability of Big Data has been the development of open source Big Data software frameworks.
• The most popular software framework (nowadays considered the standard for Big Data) is Apache Hadoop, for distributed storage and processing.
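
The "doubles every two years" rule quoted above is simple compound growth, and a short sketch shows how quickly it adds up. The starting capacity of 1 TB and the function name `projected_capacity` are assumptions for illustration only; this is arithmetic on the stated rule, not a forecast.

```python
def projected_capacity(initial_capacity, years, doubling_period_years=2.0):
    """Capacity after `years` if it doubles every `doubling_period_years`."""
    return initial_capacity * 2 ** (years / doubling_period_years)


# Starting from a hypothetical 1 TB drive:
for years in (2, 4, 10):
    print(years, "years ->", projected_capacity(1.0, years), "TB")
```

Under this rule, ten years means five doublings, so capacity at the same price point grows by a factor of 32.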
3. Connectivity through Cloud Computing
Cloud computing environments (where data is
remotely stored in distributed storage systems)
have made it possible to quickly scale up or scale
down IT infrastructure and facilitate a pay-as-you-
go model.
This means that organizations that want to process
massive quantities of data (and thus have large
storage and processing requirements) do not have
to invest in large quantities of IT infrastructure.
Instead, they can rent the storage and processing capacity they need and pay only for the amounts they actually use.
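
The cost argument can be illustrated with a toy comparison. The workload, the price, and both function names are made up, and the sketch assumes the same unit price for owned and rented capacity, which is a simplification.

```python
def pay_as_you_go_cost(hourly_usage, price_per_unit_hour):
    """Pay only for the capacity actually used in each hour."""
    return sum(units * price_per_unit_hour for units in hourly_usage)


def fixed_capacity_cost(hourly_usage, price_per_unit_hour):
    """Own enough capacity for the peak hour and pay for it every hour."""
    return max(hourly_usage) * len(hourly_usage) * price_per_unit_hour


# Hypothetical workload: mostly idle, with a short processing peak.
usage = [2, 2, 2, 50, 50, 2, 2, 2]  # compute units needed per hour
price = 1                           # made-up price per unit-hour

print("pay-as-you-go:", pay_as_you_go_cost(usage, price))
print("fixed capacity:", fixed_capacity_cost(usage, price))
```

The gap between the two figures comes entirely from the idle hours, which is why bursty workloads benefit most from the pay-as-you-go model.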
4. Increased Knowledge about Data Science
Knowledge and education about data science have become far more professionalized, and more information becomes available every day.
While statistics and data analysis previously remained mostly an academic field, they are quickly becoming popular subjects among students and the working population.
5. Social Media Applications
• Social media data provides insights into the behaviors, preferences and opinions of ‘the public’ on a scale that has never been known before.
• Due to this, it is immensely valuable to anyone who is able to derive meaning from these large quantities of data.
• Social media data can be used to identify customer preferences for product development, target new customers for future purchases, or even target potential voters in elections.
• Social media data might even be considered one of the most important business drivers of Big Data.
6. Upcoming Internet of Things (IoT)
IoT is the network of physical devices, vehicles,
home appliances and other items embedded
with electronics, software, sensors, actuators,
and network connectivity which enables these
objects to connect and exchange data.
It is increasingly gaining popularity as
consumer goods providers start including
‘smart’ sensors in household appliances.
Examples of these devices include thermostats,
smoke detectors, televisions, audio systems
and even smart refrigerators.
Conti…
THANK
YOU
