Databricks Architecture Interview Preparation
Detailed Explanation
Databricks follows a two-plane architecture consisting of the Control Plane and the Compute
Plane, which separates platform management from data processing while maintaining security
and scalability. The Control Plane is fully managed by Databricks as a SaaS offering, while the
Compute Plane runs either in the customer's cloud account (classic) or in a Databricks-managed
environment (serverless). This architecture underpins the lakehouse paradigm, which combines
the flexibility of data lakes with the reliability of data warehouses through Delta Lake.
[1] [2] [3] [4]
Data on the platform is commonly organized with the medallion architecture pattern: Bronze
(raw), Silver (cleansed), and Gold (business-ready) layers. Both batch and streaming workloads
run on Databricks Runtime (optimized Apache Spark). Unity Catalog provides centralized
governance across workspaces with fine-grained access control, data lineage tracking, and
metadata management. [3] [5] [4]
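The Bronze → Silver → Gold flow can be sketched in plain Python. This is only a minimal illustration of the layering idea, not Databricks or Spark code: the sample rows, the field names, and the helpers `to_silver` and `to_gold` are all hypothetical, and a real pipeline would read and write Delta tables instead of in-memory lists.

```python
from collections import defaultdict

# Hypothetical raw events as they might land in a Bronze layer (unvalidated strings).
bronze = [
    {"order_id": "1", "region": "EU", "amount": "10.50"},
    {"order_id": "2", "region": "eu", "amount": "4.50"},
    {"order_id": "2", "region": "eu", "amount": "4.50"},  # duplicate record
    {"order_id": "3", "region": "US", "amount": "oops"},  # malformed amount
    {"order_id": "4", "region": "US", "amount": "20.00"},
]


def to_silver(rows):
    """Cleanse: drop malformed rows, normalize types and casing, de-duplicate."""
    seen = set()
    silver_rows = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if row["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        seen.add(row["order_id"])
        silver_rows.append(
            {
                "order_id": int(row["order_id"]),
                "region": row["region"].upper(),
                "amount": amount,
            }
        )
    return silver_rows


def to_gold(rows):
    """Aggregate cleansed rows into a business-ready total per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Each layer is derived only from the one before it, so a cleansing bug can be fixed by recomputing Silver and Gold from the untouched Bronze data.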
Cheat-Sheet Notes
Core Architecture Components:
Control Plane (Databricks-managed SaaS):
Web application & REST APIs
Unity Catalog & metastore for governance
Job scheduler & workflow management
Notebooks, DBSQL, Git integration
Access control (IAM/RBAC) [1] [3]
Compute Plane (Data processing):
Classic: Customer cloud account (AWS/Azure/GCP)
Serverless: Databricks-managed compute
Clusters with Databricks Runtime (optimized Spark)
Network isolation between workspaces [6] [1]
Storage Layer:
Delta Lake (ACID transactions, schema enforcement)
Cloud storage (S3/ADLS/GCS)
Workspace storage bucket [4] [1]
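As a memory aid for the classic versus serverless split, the mapping from compute type to where the compute runs can be written as a small lookup. This is a hypothetical sketch: `ComputeType`, `COMPUTE_LOCATION`, and `compute_location` are illustrative names, not part of any Databricks API.

```python
from enum import Enum


class ComputeType(Enum):
    CLASSIC = "classic"
    SERVERLESS = "serverless"


# Where the compute plane runs for each compute type, as described in the notes.
COMPUTE_LOCATION = {
    ComputeType.CLASSIC: "customer cloud account",
    ComputeType.SERVERLESS: "Databricks-managed environment",
}


def compute_location(compute_type: str) -> str:
    """Return where compute runs; raises ValueError for an unknown compute type."""
    return COMPUTE_LOCATION[ComputeType(compute_type.lower())]
```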
Key Features:
Delta Lake: ACID transactions, time travel, schema evolution
Unity Catalog: Three-level namespace (catalog.schema.table)
Medallion Architecture: Bronze → Silver → Gold data layers
Databricks Runtime: Performance-optimized Spark engine [5] [4]
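Unity Catalog's three-level namespace is conventionally written as catalog.schema.table. The sketch below splits such a name into its three parts. The helper `parse_table_name` and the example name `main.sales.orders` are hypothetical, and the sketch assumes simple identifiers: it does not handle backtick-quoted names that contain dots.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_table_name(full_name: str) -> TableName:
    """Split 'catalog.schema.table' into parts; reject any other shape."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)


name = parse_table_name("main.sales.orders")
```

A two-part name such as `sales.orders` is rejected here on purpose, which makes the difference from a two-level schema.table reference explicit.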
Blind Spots to Remember:
With classic compute, data processing stays within the customer's cloud account
Serverless compute still maintains network isolation
Unity Catalog enables cross-workspace governance
Delta Lake provides data versioning and rollback capabilities
Cost optimization through cluster autoscaling and spot instances [4] [1]
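The versioning and rollback point can be illustrated with a toy model. This is not how Delta Lake is implemented (Delta Lake records changes in a transaction log over data files); the class `VersionedTable` and its methods are hypothetical and only show the idea that every write produces a new readable version and that a restore is itself recorded as a new version.

```python
class VersionedTable:
    """Toy table that keeps a full snapshot of its rows for every version."""

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty table

    @property
    def latest_version(self):
        return len(self._versions) - 1

    def append(self, rows):
        """Write rows as a new version and return its version number."""
        self._versions.append(self._versions[-1] + list(rows))
        return self.latest_version

    def read(self, version=None):
        """Read the latest snapshot, or an older one ("time travel")."""
        if version is None:
            version = self.latest_version
        if not 0 <= version <= self.latest_version:
            raise IndexError(f"unknown version {version}")
        return list(self._versions[version])

    def restore(self, version):
        """Roll back by writing an old snapshot as a new version."""
        self._versions.append(self.read(version))
        return self.latest_version


table = VersionedTable()
v1 = table.append([{"id": 1}])
v2 = table.append([{"id": 2}])
v3 = table.restore(v1)  # history is kept; v2 remains readable
```

Because the restore adds a version rather than deleting history, the state before the rollback can still be read back by its version number.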
1. [Link]
2. [Link]
3. [Link]
4. [Link]
5. [Link]
6. [Link]
7. [Link]
8. [Link]
9. [Link]
10. [Link]
d72180ae2b
11. [Link]
questions-answers-
12. [Link]
nalytics-architecture
13. [Link]
14. [Link]
ml
15. [Link]
16. [Link]
17. [Link]
18. [Link]
0924140359681-VOhT
19. [Link]
20. [Link]