
AWS Data Engineering

This document summarizes big data and cloud technologies, including Apache Kafka, Amazon Kinesis, Amazon S3, Amazon Redshift, Amazon DynamoDB, Amazon EMR, Amazon Athena, Amazon Elasticsearch Service, and Amazon QuickSight, along with data security technologies and container and orchestration tools such as Docker, Kubernetes, and Airflow. It covers data streams, producers, consumers, scaling, security, cost optimization, integration patterns, and best practices for these services.

PYTHON, SCALA, AWS, SPARK, KAFKA, REDSHIFT, DYNAMODB, POSTGRESQL, AIRFLOW, DOCKER COMPOSE, DOCKER SWARM, KUBERNETES, etc.


▪ Data Streams
▪ Producers
▪ Consumers
▪ Enhanced Fan Out
▪ Scaling
▪ Duplicate Records
▪ Security
▪ Data Firehose
▪ Overview
▪ Security
▪ Cost
▪ Limits and Restrictions
▪ Best Practices
▪ SQS with SNS
▪ IoT Actions and Correlated Services (AWS)
▪ IoT Policies and Roles
▪ Security Credentials
▪ Communication Protocols & Security for Devices
▪ Overview
▪ Sources and Targets
▪ Tasks and Task Assessment Reports
▪ Migration Types
▪ Migrating Large Tables
▪ Monitoring
▪ Validation
▪ Pricing, etc.
▪ KAFKA BASICS
▪ ON-PREMISES Vs. CLOUD
▪ MANAGED SERVICE FOR APACHE KAFKA (AWS CLOUD)
▪ CONNECT
▪ SERVERLESS
▪ KINESIS Vs. MSK
▪ Storage Classes
▪ Lifecycle Rules
▪ Versioning
▪ Replication
▪ Performance
▪ Encryption
▪ Security & Bucket Policies
▪ Glacier
▪ Event Notification
▪ Big Data
▪ RCU & WCU Throughput
▪ APIs
▪ Indexes
▪ PartiQL
▪ DAX
▪ Streams
▪ TTL
▪ Security
▪ Overview
▪ Pricing
▪ Redis
▪ Elastic Search
▪ Integration
▪ Costs
▪ Promises
▪ Anti-Patterns
▪ Data Catalog
▪ Endpoints
▪ Jobs
▪ Costs
▪ Anti-Patterns
▪ Studio
▪ DataBrew
▪ Elastic Views
▪ Integration
▪ Storage
▪ Promises
▪ Serverless
▪ Hive
▪ Pig
▪ HBase
▪ Presto
▪ Zeppelin
▪ Security
▪ Instance Types
▪ Spark Basics
▪ PySpark (Scala, Python & SQL Versions)
▪ Integration with Kinesis & Redshift
▪ Data Pipeline
▪ Step Functions
▪ Kinesis Analytics
▪ Index Management
▪ Service Performance
▪ Performance
▪ Costs
▪ Security
▪ Architecture
▪ Spectrum
▪ Performance Tuning
▪ Durability, Scaling
▪ Distribution Styles
▪ Sort Keys
▪ Data Flows
▪ COPY command
▪ Integration
▪ Resizing
▪ AQUA
▪ Security
▪ Serverless
▪ Overview
▪ Pricing
▪ SQL Vs NoSQL
▪ RDS (MySQL, PostgreSQL)
▪ Aurora, etc.
▪ Pricing
▪ Dashboards
▪ ML Insights
▪ Visualization Types
▪ S3 Encryption
▪ KMS
▪ Cloud HSM
▪ STS
▪ Cross Account Access
▪ Identity Federation
▪ Policies
▪ CloudTrail
▪ VPC Endpoints
▪ Overview
▪ Deploying Airflow with Flux
▪ Troubleshooting Deployments
▪ Deploying Airflow in EKS with CodePipeline and Flux
▪ Unit Testing in Airflow, etc.
▪ DockerHub
▪ Docker Compose
▪ Docker Swarm
▪ Minikube
▪ Configuration Files
▪ Object Types & API Versions
▪ Pods
▪ Connect to Containers, etc.
