Architecture Patterns of Analytics and Big Data
Lambda Architecture
• Purpose: This pattern aims to handle massive quantities of data by taking advantage of both batch and stream processing methods.
• Components:
• Batch Layer: Manages the master dataset and pre-computes batch views.
• Speed Layer: Deals with real-time data and computes real-time views.
• Serving Layer: Responds to ad-hoc queries by using both batch views and real-time views.
• Technologies: Hadoop (for the batch layer), Storm or Kafka Streams (for the speed layer), and Apache Cassandra or HBase (for the serving layer).
Kappa Architecture
• Purpose: This pattern was proposed as a simplification of the Lambda Architecture. Instead of maintaining two separate codebases for batch and stream processing, Kappa uses a single codebase.
• Components:
• Stream Processing Layer: Handles incoming data streams and processes them in real time. It can reprocess all data if needed by replaying the retained event log.
• Serving Layer: Holds the processed data and serves queries.
• Technologies: Apache Kafka for data stream management, Kafka Streams or Apache Flink for stream processing, and any fast, distributed database for the serving layer (e.g., Apache Cassandra or Elasticsearch).
Data Lake Architecture
• Purpose: This pattern caters to organizations that need to store vast amounts of raw data in its native format, allowing for flexible, on-demand processing and analytics.
• Components:
• Ingestion Layer: Collects and ingests data from various sources in its raw format.
• Storage Layer: Acts as a repository storing vast amounts of raw data, usually in a distributed file system.
• Processing & Analytics Layer: Contains tools and frameworks to process and analyze data on demand. This can be batch or real-time processing.
• Consumption Layer: Where processed data is made available to end users, either for querying, reporting, or further analytics.
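The practical difference between Lambda and Kappa can be sketched as a toy, single-process Python example. An in-memory list stands in for the event log (Kafka in the technologies above). The function names and the page-view counting task are hypothetical illustrations, not part of either pattern's definition.

```python
from collections import Counter
from typing import Iterable

# Toy stand-in for the event log (e.g. a Kafka topic): each event is a page view.


def batch_view(master_dataset: Iterable[str]) -> Counter:
    """Lambda batch layer: recompute page-view counts over the whole master dataset."""
    return Counter(master_dataset)


def realtime_view(recent_events: Iterable[str]) -> Counter:
    """Lambda speed layer: incremental counts for events the last batch run has not covered."""
    view: Counter = Counter()
    for page in recent_events:
        view[page] += 1
    return view


def lambda_query(page: str, batch: Counter, realtime: Counter) -> int:
    """Lambda serving layer: merge the batch view and the real-time view at query time."""
    return batch[page] + realtime[page]


def kappa_process(stream: Iterable[str]) -> Counter:
    """Kappa: a single stream-processing code path; reprocessing means replaying the log."""
    state: Counter = Counter()
    for page in stream:
        state[page] += 1
    return state


log = ["home", "cart", "home", "checkout", "home"]
covered_by_batch, newer = log[:3], log[3:]  # last batch run vs. events since then

batch = batch_view(covered_by_batch)
realtime = realtime_view(newer)
print(lambda_query("home", batch, realtime))  # 3

kappa_state = kappa_process(log)  # replay the entire log through the one code path
print(kappa_state["home"])  # 3
```

Lambda needs two code paths (`batch_view`, `realtime_view`) that must stay consistent with each other. Kappa keeps one path and recomputes by replaying the log, which is why it depends on a durable, replayable log.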
Microservices Architecture
Serverless (Function-as-a-Service) Architecture
Grid Computing Architecture
Single-host, Multiple-container Pattern
Purpose: Isolate application components or services within different containers on a single host, enabling component scaling and isolation without requiring multiple host machines.
Components:
• Host OS: The base operating system of the host machine.
• Container Engine: Software that manages the creation and runtime of containers (e.g., Docker).
• Containers: Each container encapsulates a specific service or component of the application.
Typical Use-Cases: Development environments, small-scale production applications, or any scenario with limited infrastructure resources.
Multi-host, Distributed Containers Pattern (Orchestrated Containers)
Purpose: Distribute and manage containers across multiple host machines, providing fault tolerance, high availability, and scalability.
Components:
• Orchestration Engine: Coordinates and manages containers across multiple hosts (e.g., Kubernetes, Docker Swarm).
• Service Discovery: Helps containers discover and communicate with each other across hosts.
• Load Balancer: Distributes incoming traffic to appropriate containers, ensuring even load and high availability.
• Nodes: Individual host machines or VMs that run containers.
• Containers: Each container encapsulates a specific service or component of the application.
Typical Use-Cases: Large-scale web applications, microservice architectures, and other distributed systems demanding high availability and scalability.
Sidecar Pattern
Purpose: Extend or enhance the functionality of a container without modifying the container itself. Sidecars are typically used for tasks like logging, monitoring, and security.
Components:
• Application Container: The primary container that holds the main application logic.
• Sidecar Container: A secondary container that runs alongside the application container, providing additional functionality or features.
• Shared Volumes: A storage volume shared between the application and sidecar containers to exchange data or configuration.
Typical Use-Cases: Enriching application features without touching its core logic, especially useful in microservices, where common concerns (e.g., logging, monitoring) are externalized into sidecars.
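The sidecar's use of a shared volume can be sketched in Python. In a real deployment the two parts would be separate containers (for example, in one Kubernetes Pod) mounting the same volume. Here, as an assumption for illustration, two functions in one process share a temporary directory. The function names, the `app.log` file name and the ERROR heuristic are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def application_container(shared_volume: Path) -> None:
    """Main application: writes plain log lines to the shared volume, nothing more."""
    with open(shared_volume / "app.log", "a", encoding="utf-8") as log:
        log.write("order created id=42\n")
        log.write("payment failed id=42\n")


def sidecar_container(shared_volume: Path, service: str) -> list[str]:
    """Sidecar: reads the same volume and turns raw lines into structured JSON records."""
    records = []
    with open(shared_volume / "app.log", encoding="utf-8") as log:
        for line in log:
            level = "ERROR" if "failed" in line else "INFO"  # hypothetical classification rule
            records.append(json.dumps({"service": service, "level": level, "msg": line.strip()}))
    return records


with tempfile.TemporaryDirectory() as tmp:
    volume = Path(tmp)  # stands in for the volume mounted into both containers
    application_container(volume)
    shipped = sidecar_container(volume, service="orders")

for record in shipped:
    print(record)
```

The application code never changes when the log format or destination changes. Only the sidecar does, which is the point of externalizing the concern.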
Different Architecture Patterns for Databases
Lift-and-Shift (Rehosting):
Purpose: Quickly migrate applications and data from one environment to another with minimal changes, often used for moving on-premises infrastructure to the cloud.
Rebuilding (Re-architecting):
Purpose: Redesign and rewrite the application from the ground up, leveraging the features and services of the target platform to achieve better scalability, resilience, and flexibility.
Zero Trust Network Architecture
Purpose: Assume no trust for any user or device, regardless of whether they are inside or outside the network perimeter. Grant access based on strict identity verification and ongoing context-aware security checks.
• Components:
• Identity Provider (IdP)
• Policy Engine
• Network Segmentation
• Continuous Monitoring & Analytics
• Technologies: Google's BeyondCorp, Cisco's Zero Trust, and solutions like Okta and Duo Security.
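The policy engine's per-request decision can be sketched in Python. This is a minimal sketch that assumes the Identity Provider, device posture checks and monitoring have already produced the signals in `AccessRequest`. The field names, the `SEGMENT_POLICY` table and the risk threshold are hypothetical. Note that `inside_perimeter` is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    mfa_verified: bool      # identity strongly verified by the Identity Provider
    device_compliant: bool  # device posture check passed (e.g. managed, patched)
    inside_perimeter: bool  # where the request came from; the policy ignores this on purpose
    segment: str            # network segment that holds the requested resource
    risk_score: float       # from continuous monitoring & analytics: 0.0 low .. 1.0 high


# Hypothetical policy: which users may reach which network segments.
SEGMENT_POLICY: dict[str, set[str]] = {
    "payments-db": {"alice"},
    "wiki": {"alice", "bob"},
}


def evaluate(request: AccessRequest, max_risk: float = 0.5) -> bool:
    """Policy engine: deny by default and re-check everything on every request."""
    if not (request.mfa_verified and request.device_compliant):
        return False
    if request.user not in SEGMENT_POLICY.get(request.segment, set()):
        return False
    return request.risk_score <= max_risk


remote_alice = AccessRequest("alice", True, True, False, "payments-db", 0.2)
office_bob = AccessRequest("bob", True, True, True, "payments-db", 0.1)
print(evaluate(remote_alice))  # True: verified, compliant, allowed, low risk
print(evaluate(office_bob))    # False: being on the office network grants nothing
```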
Identity and Access Management (IAM)
Purpose: Manage digital identities and their permissions across systems and services, ensuring that the right individuals access the right resources at the right times for the right reasons.
• Components:
• User Directory
• Authentication Service
• Authorization Engine
• Multi-Factor Authentication
• Role-Based Access Control (RBAC)
• Identity Federation
• Technologies: AWS IAM, Azure Active Directory, Google Cloud Identity, Okta, and many others.
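The Role-Based Access Control step can be sketched in Python. This sketch assumes authentication has already happened and covers only the authorization check. The role names, the `(action, resource)` permission tuples and the user directory contents are hypothetical.

```python
# Hypothetical role definitions: role -> set of (action, resource) permissions.
ROLE_PERMISSIONS: dict[str, set[tuple[str, str]]] = {
    "viewer": {("read", "reports")},
    "analyst": {("read", "reports"), ("read", "datasets")},
    "admin": {("read", "reports"), ("read", "datasets"), ("write", "datasets")},
}

# Hypothetical user directory: user -> assigned roles.
USER_ROLES: dict[str, set[str]] = {
    "alice": {"analyst"},
    "bob": {"viewer"},
    "carol": {"admin"},
}


def is_authorized(user: str, action: str, resource: str) -> bool:
    """RBAC check: allowed if any of the user's roles grants (action, resource)."""
    return any(
        (action, resource) in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_authorized("alice", "read", "datasets"))   # True
print(is_authorized("bob", "read", "datasets"))     # False
print(is_authorized("mallory", "read", "reports"))  # False: unknown users have no roles
```

Permissions attach to roles, not to users. Onboarding or offboarding someone is therefore a change to `USER_ROLES` only.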
Data-Centric Security Architecture
Purpose: Focus on securing the data itself rather than just the perimeter or endpoints. This is especially relevant with the rise of mobile devices, cloud computing, and decentralized data storage.
• Components:
• Data Classification
• Data Loss Prevention (DLP)
• Data Masking & Tokenization
• Data Encryption
• Access Control
• Audit & Monitoring
• Technologies: Symantec DLP, McAfee Total Protection for Data, Microsoft's Azure Information Protection, and various encryption solutions.
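The difference between masking and tokenization can be sketched in Python. Masking irreversibly hides part of a value. Tokenization swaps the value for a random token that only the vault can map back. The in-memory `TokenVault` is a hypothetical illustration; a real vault would itself be encrypted, access-controlled and audited.

```python
import secrets


def mask_card_number(pan: str, visible: int = 4) -> str:
    """Data masking: hide all but the last `visible` digits; the original is not recoverable."""
    digits = pan.replace(" ", "")
    return "*" * (len(digits) - visible) + digits[-visible:]


class TokenVault:
    """Tokenization: replace a sensitive value with a random token; only the vault can reverse it."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        if value in self._value_to_token:  # same value -> same token, so joins still work
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, not derived from the value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


vault = TokenVault()
card = "4111 1111 1111 1111"  # well-known test card number
token = vault.tokenize(card)
print(mask_card_number(card))           # ************1111
print(vault.detokenize(token) == card)  # True
```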
Different Architecture Patterns for Storage
Object Storage
Components:
• Storage Nodes
• Data Distribution & Management Layer
• RESTful API Interface
• Metadata Store
Technologies: Amazon S3, Google Cloud Storage, OpenStack Swift.
Block Storage
Components:
• Storage Array
• Controller
• Logical Unit Numbers (LUNs)
• Storage Protocol
• Cache Layer
• Redundancy Mechanisms
Technologies: Amazon EBS, Google Persistent Disk, traditional SAN (Storage Area Network) solutions.
File Storage (NAS)
Components:
• Storage Devices
• File System Layer
• Network Protocol Layer
• NAS Head or Controller
Technologies: Solutions like NetApp, QNAP, Synology, and cloud equivalents like Amazon EFS or Azure Files.
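How the object-storage components fit together can be sketched in Python. Storage nodes hold the bytes, a data distribution layer decides placement by hashing the key, and a metadata store records size, content hash and location. This is a toy sketch under stated assumptions. The hash-then-next-nodes placement rule, the replica count and the metadata fields are hypothetical simplifications. `put`/`get` stand in for the RESTful API, which is omitted.

```python
import hashlib


class ObjectStore:
    """Toy object store: flat key namespace, objects replicated across storage nodes."""

    def __init__(self, node_names: list[str], replicas: int = 2) -> None:
        # Storage nodes: each node is modeled as a dict of key -> bytes.
        self.nodes: dict[str, dict[str, bytes]] = {name: {} for name in node_names}
        self.replicas = replicas  # assumed <= number of nodes
        # Metadata store: key -> size, content hash, and which nodes hold the object.
        self.metadata: dict[str, dict] = {}

    def _placement(self, key: str) -> list[str]:
        """Data distribution layer: hash the key to a start node, then take the next nodes."""
        names = sorted(self.nodes)
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.replicas)]

    def put(self, key: str, data: bytes) -> None:
        placement = self._placement(key)
        for node in placement:
            self.nodes[node][key] = data
        self.metadata[key] = {
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
            "nodes": placement,
        }

    def get(self, key: str) -> bytes:
        for node in self.metadata[key]["nodes"]:
            if key in self.nodes[node]:  # fall back to another replica if one copy is lost
                return self.nodes[node][key]
        raise KeyError(key)


store = ObjectStore(["node-a", "node-b", "node-c"])
key = "reports/2024/q1.csv"
store.put(key, b"region,revenue\n")
print(store.metadata[key]["size"])  # 15

lost = store.metadata[key]["nodes"][0]
del store.nodes[lost][key]  # simulate losing one replica
print(store.get(key))  # b'region,revenue\n'
```

Block and file storage expose different interfaces (LUNs addressed by block offset, and a hierarchical file system served over a network protocol, respectively). So this sketch applies only to the object-storage column.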
Different Reference Architecture Patterns for Payments
Purpose: Break down the payment processing into smaller, decoupled services, each handling specific functionalities. This approach facilitates scalability, resilience, and faster iterations.
• Components:
• Payment Initiation Service
• Payment Queue
• Processing Microservices
• Notification Service
• Audit & Logging Service
• Security & Fraud Detection Service
• Typical Use-Cases: Large-scale e-commerce platforms, fintech startups, businesses requiring high customization or integration with multiple payment methods.
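The decoupling through the Payment Queue can be sketched in Python. Each class plays one of the services listed above, and the initiation and processing sides share nothing but the queue. As an assumption for illustration, everything runs in one process with a `deque` standing in for a real message broker. The class names, the `Payment` fields and the fixed-limit fraud rule are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Payment:
    payment_id: str
    amount_cents: int
    status: str = "INITIATED"


class AuditService:
    """Audit & Logging Service: append-only record of every status change."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str]] = []

    def record(self, payment: Payment) -> None:
        self.entries.append((payment.payment_id, payment.status))


class NotificationService:
    """Notification Service: reports the outcome of each payment."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    def notify(self, payment: Payment) -> None:
        self.sent.append(f"{payment.payment_id}: {payment.status}")


class FraudService:
    """Security & Fraud Detection Service (hypothetical rule: reject above a fixed limit)."""

    def __init__(self, limit_cents: int = 1_000_000) -> None:
        self.limit_cents = limit_cents

    def is_suspicious(self, payment: Payment) -> bool:
        return payment.amount_cents > self.limit_cents


class InitiationService:
    """Payment Initiation Service: accepts a request and enqueues it; no inline processing."""

    def __init__(self, queue: deque, audit: AuditService) -> None:
        self.queue = queue
        self.audit = audit

    def initiate(self, payment_id: str, amount_cents: int) -> None:
        payment = Payment(payment_id, amount_cents)
        self.audit.record(payment)
        self.queue.append(payment)


class ProcessingService:
    """Processing microservice: drains the Payment Queue independently of initiation."""

    def __init__(self, queue: deque, fraud: FraudService,
                 audit: AuditService, notifier: NotificationService) -> None:
        self.queue, self.fraud, self.audit, self.notifier = queue, fraud, audit, notifier

    def run(self) -> None:
        while self.queue:
            payment = self.queue.popleft()
            payment.status = "REJECTED" if self.fraud.is_suspicious(payment) else "SETTLED"
            self.audit.record(payment)
            self.notifier.notify(payment)


queue: deque = deque()  # Payment Queue
audit, notifier = AuditService(), NotificationService()
initiation = InitiationService(queue, audit)
processor = ProcessingService(queue, FraudService(), audit, notifier)

initiation.initiate("pay-1", 2_500)
initiation.initiate("pay-2", 5_000_000)
processor.run()
print(notifier.sent)  # ['pay-1: SETTLED', 'pay-2: REJECTED']
```

Because initiation only enqueues, the processing side can be scaled, redeployed or paused without blocking new payment requests. This is the resilience and iteration-speed benefit the purpose statement describes.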
Different Reference Architecture Patterns for Headless e-Commerce