Microsoft Modern Data Estate

Uploaded by

Shyam Sharma

Classified as Microsoft Confidential

Data & AI Strategy

Our strategy is to build best-in-class platforms and productivity services for an intelligent cloud and an intelligent edge infused with artificial intelligence (“AI”).


From Data to Decisions and Actions



THE MODERN DATA ESTATE

Data sources: LOB | CRM | Graph | Image | Social | IoT

Hybrid (two columns, each with the same three layers):
Operational databases | Data warehouses | Data lakes

Reason over any data, anywhere | Flexibility of choice | Security and performance


THE MICROSOFT OFFERING

Data sources: LOB | CRM | Graph | Image | Social | IoT

Layer | SQL Server | Azure Data Services
Operational databases | Industry leader 2 years in a row | 70% faster than Aurora
Data warehouses | #1 TPC-H performance | 2x the global reach of Redshift
Data lakes | T-SQL query over any data | No Limits Analytics with 99.9% SLA

Hybrid: Easiest lift and shift with no code changes

AI built-in | Most secure | Lowest TCO

Reason over any data, anywhere | Flexibility of choice | Security and performance


Breaking points of traditional approach

Evolution of the data warehouse
Increasing data volumes. New data sources and types. Big Data + DW workloads

Sources: LOB | CRM | Graph | Image | Social | IoT

INGEST: Data orchestration and monitoring
STORE: Big data store
PREP & TRAIN: Hadoop/Spark and machine learning
MODEL & SERVE: Data warehouse

Output: Apps + insights


The Data Lake Approach
Ingest all data: regardless of requirements
Store all data: in native format, without schema definition
Do analysis: using analytic engines like Hadoop and ADLA

Source: Devices
Workloads: Batch queries | Interactive queries | Real-time analytics | Machine Learning | Data warehouse
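The "store in native format, without schema definition" idea from the slide above can be illustrated with a small schema-on-read sketch. This is not how any Azure service is implemented; the sample events, field names, and the `read_with_schema` helper are all hypothetical and exist only to show that structure is applied when the data is read rather than when it is stored.

```python
import json

# Raw events are kept exactly as they arrived, one JSON document per line.
# Nothing is validated or reshaped at ingest time (hypothetical sample data).
RAW_EVENTS = [
    '{"device": "d1", "temp_c": 21.5, "ts": "2018-01-01T00:00:00Z"}',
    '{"device": "d2", "humidity": 40}',
    '{"device": "d1", "temp_c": 22.0, "fw": "1.2"}',
]


def read_with_schema(lines, fields):
    """Project each raw record onto the requested fields at read time.

    Fields a record does not carry come back as None; extra fields are ignored.
    """
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# One analysis asks for temperatures; another could ask for humidity
# over the same untouched raw data.
temperature_view = list(read_with_schema(RAW_EVENTS, ["device", "temp_c"]))
```

Because the raw lines are never rewritten, a later workload can apply a different projection to the same data without a migration step.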


Microsoft’s Data Lake Journey



Microsoft’s internal Data Lake (Cosmos)


A data lake for everyone to put their data
Tools approachable by any developer
• Batch, Interactive, Streaming, ML
• Used across Office, Xbox Live, Azure, Windows, Bing, Skype, etc.

By the numbers
• Exabytes of data under management
• 100Ks of Physical Servers
• 100Ks of Batch Jobs
• Millions of Interactive Queries
• Huge Streaming Pipelines
• 10K+ Developers running diverse workloads and scenarios

Data Stored: Xbox Live | Office365 | LCA | Live | Bing | SMSG | Yammer | CRM/Dynamics | Skype | Exchange | Windows | Malware Protection | Microsoft Stores | Commerce Risk


2010 2011 2012 2013 2014 2015 2016
Reflections on our Data Lake Journey
Collect the data first
We could not have predicted the scenarios or the value we derived ahead of time.

Data Virality
Valuable large datasets help bootstrap the entire company into using big data.

The Power of Sharing
Built a unified platform with a consistent security model. Built tools that allow users to shape and join all data, while maintaining security, auditing and compliance for key data sets.

Visibility & Control more important than ever
Ever-growing needs for Auditing, Compliance, Data provenance, Regulatory


Engineering for Data Lake

Design for Data/Schema Agility
Data producers and data consumers must be able to independently innovate frequently without critical data pipelines breaking.

Not about Cost Savings
It’s about delivering value for our business, not saving costs.

Engineering Mindset and Skillset
Adopt data science, machine learning, experimentation, telemetry, etc.
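One way to read "independently innovate without pipelines breaking" is as a compatibility check between what a consumer requires and what a producer currently emits. The sketch below is only an illustration under that assumption; the schemas, type names, and the `breaking_fields` helper are hypothetical and not taken from the deck.

```python
def breaking_fields(consumer_required: dict, producer_schema: dict) -> list:
    """Return the consumer's required fields that the producer no longer
    supplies with the expected type. An empty list means the change is safe
    for this consumer."""
    problems = []
    for name, expected_type in consumer_required.items():
        if producer_schema.get(name) != expected_type:
            problems.append(name)
    return sorted(problems)


# Hypothetical contract: the consumer only declares the fields it depends on.
CONSUMER_NEEDS = {"user_id": "string", "event_time": "timestamp"}

# The producer adds a field (safe), then changes a type and drops a field (breaking).
PRODUCER_V2 = {"user_id": "string", "event_time": "timestamp", "locale": "string"}
PRODUCER_V3 = {"user_id": "int", "locale": "string"}

v2_breaks = breaking_fields(CONSUMER_NEEDS, PRODUCER_V2)
v3_breaks = breaking_fields(CONSUMER_NEEDS, PRODUCER_V3)
```

Additive changes pass because the consumer ignores fields it did not ask for; only removals and type changes on required fields are reported.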


Azure

Sources: Business apps | Custom apps | Sensors and devices
Data movement and ingestion: Data Factory (Data Movement) | Event Hubs | Kafka on HDInsight
Storage: Blob Storage | Data Lake Store
Processing: Data Lake Analytics | Azure Databricks (Spark) | Machine Learning | Stream Analytics
Serving: Cosmos DB | SQL DB | SQL Data Warehouse | Analysis Services
Consumers: Web & mobile apps | Operational reports | Analytical dashboards

Azure ExpressRoute: Private Connections
Azure Data Factory: Orchestration
Azure Key Vault: Key Management
Operations Management Suite: Monitoring


KNOWING THE VARIOUS BIG DATA SOLUTIONS

From CONTROL to EASE OF USE (Reduced Administration)

BIG DATA ANALYTICS
Azure Marketplace (HDP | CDH | MapR): Any Hadoop technology, any distribution
Azure HDInsight: Workload optimized, managed clusters
Azure Databricks: Frictionless & Optimized Spark clusters
Azure Data Lake Analytics: Data Engineering in a Job-as-a-service model

Service models: IaaS Clusters | Managed Clusters | Big Data as-a-service

BIG DATA STORAGE
Azure Data Lake Store | Azure Storage


BIG DATA & ADVANCED ANALYTICS AT A GLANCE - PARTNERS

Ingest | Store | Prep & Train | Model & Serve | Intelligence

Sources: Business apps | Custom apps | Sensors and devices
Technologies shown: Apache Kafka with HortonWorks & Cloudera | Apache Storm | Apache Spark streaming | Open Source Spark/Hive/Pig and HDFS on HortonWorks/Cloudera/MapR | Machine Learning | Cassandra on DataStax | Teradata Intellicloud | Azure MySQL | MySQL database | Azure PostgreSQL
Intelligence: Predictive apps | Operational reports | Analytical dashboards


Microsoft OSS Portfolio

Microsoft Linux and Open Source
Azure Data Factory: HYBRID DATA INTEGRATION AT SCALE
Data Processing & Movement
Relational data: OLTP | ERP | CRM | LOB
Non-relational data: Web | Media | Social media | Devices
Environments: CLOUD | V-NET | ON-PREMISE

Any BI tool: Dashboards | Reporting | Mobile BI | Cubes
Advanced Analytics: Machine Learning | Stream analytics | Cognitive | AI
Any language: .NET | Java | R | Python | Ruby | PHP | Scala

AZURE DATA FACTORY ORCHESTRATES DATA PIPELINE ACTIVITY WORKFLOW & SCHEDULING


AZURE BLOB STORAGE
A highly scalable object storage for unstructured data

▪ Serverless Azure service.
▪ Automatically scales as more data is uploaded.
▪ Can store billions of objects.
▪ Can store images, videos, audio, documents, etc.
▪ Three types of blobs: Block, Append and Page. Blobs are mutable.
▪ Four replication options: LRS, GRS, ZRS and RA-GRS
▪ Three storage tiers: Hot, Cool and Archive. Objects can move between tiers.
▪ Strongly consistent
▪ SLA: 99.9% uptime, and 99.99% for reads with RA-GRS
▪ Monitoring via Azure Monitor
▪ Data encrypted at rest and in motion
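To put the SLA percentages on the Blob Storage slide in perspective, the arithmetic below converts an availability percentage into a downtime budget. It is a rough sketch that assumes a 30-day month and treats the percentage as time-based availability; real SLAs define their own measurement rules, and the function name is hypothetical.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Minutes of unavailability an availability percentage leaves over a period.

    Assumes the percentage applies to wall-clock time, which is a simplification.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


# Figures quoted on the slide: 99.9% overall, 99.99% for reads with RA-GRS.
uptime_budget = allowed_downtime_minutes(99.9)
read_budget = allowed_downtime_minutes(99.99)
```

Under these assumptions the 99.9% figure leaves roughly 43 minutes per 30 days, and 99.99% leaves roughly 4 minutes, which is the practical difference one extra nine makes.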


AZURE DATA LAKE STORE
A highly scalable, parallel file system in the cloud specifically optimized for big data analytics

▪ No limits on: number of files, size of individual files, total amount of data stored, how long data can be stored or ingestion throughput
▪ Supports low-latency, high-throughput workloads and can be used for ingesting streaming data.
▪ Stores all data types
▪ Is Hadoop-compatible (via WebHDFS REST API). Supported by leading Hadoop distros and HDInsight.
▪ Provides POSIX-style permissions for RBAC
▪ Integrates with AAD for authentication.

Diagram: an Azure Data Lake Store File is divided into Block 1, Block 2, … Block n, and the blocks are held on Data Nodes in Backend Storage in Azure.
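"POSIX-style permissions" on the slide refers to the familiar owner/group/other read, write and execute bits. The sketch below shows how such bits are evaluated in general, using Python's standard `stat` constants. It is not the service's implementation and it ignores ACL entries entirely; the `can_access` helper and the example mode are hypothetical.

```python
import stat


def can_access(mode: int, is_owner: bool, in_group: bool, want: str) -> bool:
    """Check one of 'r', 'w', 'x' against POSIX owner/group/other bits.

    The first matching class decides: owner, then group, then other.
    """
    owner_bit, group_bit, other_bit = {
        "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
        "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
        "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
    }[want]
    if is_owner:
        return bool(mode & owner_bit)
    if in_group:
        return bool(mode & group_bit)
    return bool(mode & other_bit)


# Example mode 750: owner rwx, group r-x, other ---.
MODE = 0o750
owner_can_write = can_access(MODE, is_owner=True, in_group=False, want="w")
group_can_write = can_access(MODE, is_owner=False, in_group=True, want="w")
other_can_read = can_access(MODE, is_owner=False, in_group=False, want="r")
```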


Azure SQL Database
SQL Database (PaaS): Singleton | Managed Instance | Elastic Pool
A flavor of SQL DB designed to provide easy migration to fully managed PaaS

Unmatched app compatibility
• Fully-fledged SQL instance with nearly 100% compat with on-prem

Unmatched PaaS capabilities
• Lowest TCO + rich Azure ecosystem
• Built-in auto management

Favorable business model
• Competitive
• Transparent
• Frictionless


Your work so far | How PaaS helps?
Hardware resources (purchasing and management) | Built-in: scales on demand
Protect data with backups (with health checks and retention) | Built-in: geo-redundancy and PITR
High availability implementation (Always On, mirroring, log shipping) | Built-in: 99.99% SLA and auto-failover
Disaster recovery | Geo-redundant backups; geo-replication on a click
Compliance with standards | Built-in
Security | Easy to use features
Patching and updates | Built-in: no regression guarantee
Monitoring | Easy to use features
Performance tuning and maintenance | Built-in: query plan regressions, index optimization
WHAT IS AZURE COSMOS DB
A globally distributed, massively scalable, multi-model database service

APIs: Table API | MongoDB | SQL
Data models: Key-value | Column-family | Document | Graph

• Turnkey global distribution
• Elastic scale out of storage & throughput
• Guaranteed low latency at the 99th percentile
• Five well-defined consistency models
• Comprehensive SLAs
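A latency guarantee "at the 99th percentile" is a statement about the slowest 1% of requests rather than the average. The sketch below computes percentiles with the nearest-rank method over a made-up list of latencies; the sample values and the helper name are hypothetical, and the service may measure its percentile differently.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in milliseconds, with one slow outlier.
latencies_ms = [4, 5, 5, 6, 7, 7, 8, 9, 9, 48]
p50 = percentile_nearest_rank(latencies_ms, 50)
p99 = percentile_nearest_rank(latencies_ms, 99)
```

In this sample the median looks healthy while the 99th percentile is dominated by the single outlier, which is why tail percentiles are the stricter target.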


Machine Learning & AI Portfolio
When to use what?
Microsoft ML & AI products

Build your own or consume pre-trained models?
• Build your own: Azure Machine Learning
• Consume: Cognitive services, bots

Which experience do you want?
• Code first | Visual tooling

Deployment target
• (On-prem) ML Server
• (cloud) AML services (Preview)
• (cloud) AML Studio

What engine(s) do you want to use?
• On-prem Hadoop | SQL Server | SQL Server | Spark | Hadoop | Azure Batch | DSVM | Azure Container Service
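The "When to use what?" questions on the slide read as a small decision tree, which can be sketched as a function. The pairing of experiences with deployment targets used here (code first with ML Server on-prem or AML services in the cloud, visual tooling with AML Studio) is an assumption about how the slide's columns line up, and the `recommend` function is hypothetical.

```python
def recommend(build_own: bool, code_first: bool = True, on_prem: bool = False) -> str:
    """Walk the slide's questions in order and return the product name.

    The experience-to-target pairing is assumed, not stated in the text.
    """
    if not build_own:
        # Consuming pre-trained models.
        return "Cognitive services, bots"
    if not code_first:
        # Visual tooling is listed only with a cloud target.
        return "AML Studio"
    return "ML Server" if on_prem else "AML services (Preview)"


consume_choice = recommend(build_own=False)
visual_choice = recommend(build_own=True, code_first=False)
on_prem_choice = recommend(build_own=True, code_first=True, on_prem=True)
cloud_choice = recommend(build_own=True, code_first=True, on_prem=False)
```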
Azure AI Services

Tools

Azure Infrastructure

Azure Machine Learning - Experimentation

AZURE ML EXPERIMENTATION
Tools: Command line tools | IDEs | Notebooks in Workbench | VS Code Tools for AI
Compute targets: Local machine | Scale up to DSVM | Scale out with Spark on HDInsight | Azure Batch AI (Coming Soon) | ML Server


Azure Machine Learning – Model Management

AZURE ML MODEL MANAGEMENT (DOCKER)
Deployment targets: Single node deployment (cloud/on-prem) | Azure Container Service | Azure IoT Edge | Microsoft ML Server | Spark clusters | SQL Server
What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure

Best of Databricks Best of Microsoft

Designed in collaboration with the founders of Apache Spark

One-click set up; streamlined workflows

Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.

Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)

Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)


Azure Databricks key audiences & benefits

Data scientist
• Integrated workspace
• Easy data exploration
• Collaborative experience
• Interactive dashboards
• Faster insights: Best Spark & serverless; Databricks managed Spark

Data engineer
• Improved ETL performance: Zero management clusters, serverless
• Easy to schedule jobs
• Automated workflows
• Enhanced monitoring & troubleshooting: Automated alerts & easy access to logs
• Zero Management Spark Cluster democratization (serverless)

CDO, VP of analytics
• Fast, collaborative analytics platform accelerating time to market
• No dev-ops required
• Enterprise grade security: Encryption; End-to-end auditing; Role-based control; Compliance

Unified analytics platform

Provided by Microsoft and Databricks under NDA
Introducing Azure

Figure labels: Azure | SaaS | Office 365 | Public Cloud
Azure Analysis Services
Cloud data sources: SQL Database | SQL Data Warehouse
On-prem data sources: SQL Server | Other data sources
Visualizations & insights: Power BI | Other 3rd party tools & services
Authoring & Dev: Visual Studio | SSMS


Solution scenarios
Let’s walk through these scenarios to see the architecture in action…

Modern DW
“We want to incorporate all of our data, including ‘big data’, with our data warehouse.”

Advanced Analytics
“We are trying to predict when our customers churn.”

Internet of Things (IoT)
“We are trying to get insights from our devices in real-time, etc.”


Sources: BUSINESS APPS | CUSTOM APPS
Components: AZURE CLI | AZURE DATA FACTORY | BCP COMMAND LINE UTILITY | SQL SERVER INTEGRATION SERVICES
Output: ANALYTICAL DASHBOARDS


Sources: LOGS, FILES AND MEDIA (UNSTRUCTURED) | BUSINESS / CUSTOM APPS (STRUCTURED)
Components: AZURE CLI, AZURE DATA FACTORY | DATA MIGRATION SERVICE | AZURE DATA LAKE STORE | AZURE STORAGE | POLYBASE | AZURE SQL DATA WAREHOUSE | AZURE ANALYSIS SERVICES
Output: ANALYTICAL DASHBOARDS


Sources: LOGS, FILES AND MEDIA (UNSTRUCTURED) | BUSINESS / CUSTOM APPS (STRUCTURED)
Components: DATA FACTORY | AZURE DATA LAKE STORE | POLYBASE | AZURE SQL DATA WAREHOUSE | AZURE ANALYSIS SERVICES
Output: ANALYTICAL DASHBOARDS


Sources: LOGS, FILES AND MEDIA (UNSTRUCTURED) | BUSINESS / CUSTOM APPS (STRUCTURED)
Components: DATA FACTORY | AZURE DATA LAKE STORE | AZURE HDINSIGHT | POLYBASE | AZURE SQL DATA WAREHOUSE
Output: ANALYTICAL DASHBOARDS


Sources: LOGS, FILES AND MEDIA (UNSTRUCTURED) | BUSINESS / CUSTOM APPS (STRUCTURED)
Components: DATA FACTORY | AZURE DATA LAKE STORE | AZURE DATA LAKE ANALYTICS | POLYBASE | AZURE SQL DATA WAREHOUSE
Output: ANALYTICAL DASHBOARDS


Sources: LOGS, FILES AND MEDIA (UNSTRUCTURED) | BUSINESS / CUSTOM APPS (STRUCTURED)
Components: DATA FACTORY | AZURE DATA LAKE STORE | AZURE DATABRICKS | AZURE MACHINE LEARNING & MACHINE LEARNING SERVER | POLYBASE | AZURE SQL DATA WAREHOUSE
Output: WEB & MOBILE APPS | ANALYTICAL DASHBOARDS


Sources: SENSORS AND IOT (UNSTRUCTURED)
Components: AZURE IOT HUB | AZURE EVENT HUBS | AZURE HDINSIGHT (Kafka) | AZURE DATA LAKE STORE | AZURE DATABRICKS | AZURE MACHINE LEARNING & MACHINE LEARNING SERVER | POLYBASE | AZURE SQL DATA WAREHOUSE
Output: ANALYTICAL DASHBOARDS


Sources: SENSORS AND IOT (UNSTRUCTURED)
Components: AZURE IOT HUB | AZURE EVENT HUBS | AZURE HDINSIGHT (Kafka) | AZURE DATA LAKE STORE | AZURE STREAM ANALYTICS | AZURE MACHINE LEARNING & MACHINE LEARNING SERVER | POLYBASE | AZURE SQL DATA WAREHOUSE
Output: ANALYTICAL DASHBOARDS


INGEST | STORE | TRANSFORM AND ANALYZE | PUBLISH

Sources: BUSINESS APPS | CUSTOM APPS | SENSORS AND DEVICES
Components: DATA FACTORY (Data movement) | AZURE IOT HUB | EVENT HUBS | KAFKA ON HDINSIGHT | BLOB STORAGE | DATA LAKE STORE | DATA LAKE ANALYTICS | AZURE DATABRICKS | MACHINE LEARNING | STREAM ANALYTICS | COSMOS DB, SQL DB | SQL DATA WAREHOUSE | ANALYSIS SERVICES
Output: WEB & MOBILE APPS | OPERATIONAL REPORTS | ANALYTICAL DASHBOARDS