Databricks Architecture Interview Preparation

Detailed Explanation
Databricks follows a two-plane architecture consisting of the Control Plane and Compute Plane,
designed to separate data management from data processing while maintaining security and
scalability. The Control Plane is fully managed by Databricks as a SaaS offering, while the
Compute Plane can exist either in the customer's cloud account (classic) or in Databricks'
managed environment (serverless). This architecture enables the lakehouse paradigm,
combining the flexibility of data lakes with the reliability of data warehouses through Delta Lake.
[1] [2] [3] [4]
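Illustrative sketch: the control plane exposes the workspace's REST APIs, while the cluster those APIs create runs in the compute plane. The host URL, token, runtime version, and node type below are placeholders (not from this document), and exact field names can vary by cloud and API version.

import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                                   # placeholder credential

# Ask the control plane (REST API) to launch a cluster; the cluster itself
# starts in the compute plane (customer cloud account for classic compute).
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_name": "interview-demo",
        "spark_version": "13.3.x-scala2.12",  # example Databricks Runtime version
        "node_type_id": "i3.xlarge",          # example AWS node type
        "num_workers": 2,
    },
)
print(resp.json())  # on success, returns the new cluster_id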

The platform operates on a medallion architecture with Bronze (raw), Silver (cleansed), and
Gold (business-ready) data layers, supporting both batch and streaming workloads through
Databricks Runtime (optimized Apache Spark). Unity Catalog provides centralized governance
across workspaces with fine-grained access control, data lineage tracking, and metadata
management. [3] [5] [4]
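As a rough illustration of the medallion flow on Delta Lake, the sketch below assumes a Databricks notebook (where the spark session already exists) and uses made-up paths and column names:

from pyspark.sql import functions as F

bronze_path = "/mnt/lake/bronze/orders"      # assumed raw landing table
silver_path = "/mnt/lake/silver/orders"      # assumed cleansed table
gold_path = "/mnt/lake/gold/daily_revenue"   # assumed business aggregate

# Bronze: ingest raw files as-is, stamping each row for auditability
raw = (spark.read.format("json").load("/mnt/landing/orders/")
       .withColumn("_ingested_at", F.current_timestamp()))
raw.write.format("delta").mode("append").save(bronze_path)

# Silver: deduplicate and apply basic quality filters
silver = (spark.read.format("delta").load(bronze_path)
          .dropDuplicates(["order_id"])
          .filter(F.col("amount") > 0))
silver.write.format("delta").mode("overwrite").save(silver_path)

# Gold: business-ready aggregate for BI dashboards
gold = (silver.groupBy(F.to_date("order_ts").alias("order_date"))
        .agg(F.sum("amount").alias("revenue")))
gold.write.format("delta").mode("overwrite").save(gold_path)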

Cheat-Sheet Notes
Core Architecture Components:
Control Plane (Databricks-managed SaaS):
Web application & REST APIs
Unity Catalog & metastore for governance
Job scheduler & workflow management
Notebooks, DBSQL, Git integration
Access control (IAM/RBAC) [1] [3]
Compute Plane (Data processing):
Classic: Customer cloud account (AWS/Azure/GCP)
Serverless: Databricks-managed compute
Clusters with Databricks Runtime (optimized Spark)
Network isolation between workspaces [6] [1]
Storage Layer:
Delta Lake (ACID transactions, schema enforcement)
Cloud storage (S3/ADLS/GCS)
Workspace storage bucket [4] [1]
Key Features:
Delta Lake: ACID transactions, time travel, schema evolution
Unity Catalog: Three-level namespace (catalog.schema.table); see the sketch after this list
Medallion Architecture: Bronze → Silver → Gold data layers
Databricks Runtime: Performance-optimized Spark engine [5] [4]
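To make the three-level namespace and fine-grained access control concrete, a hedged sketch (catalog, schema, table, and group names are hypothetical, and the statements assume sufficient Unity Catalog privileges):

# Three-level namespace: catalog.schema.table
spark.sql("CREATE CATALOG IF NOT EXISTS main")
spark.sql("CREATE SCHEMA IF NOT EXISTS main.sales")
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders (
        order_id BIGINT, amount DOUBLE, order_ts TIMESTAMP
    ) USING DELTA
""")

# Fine-grained access control on the fully qualified object
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `analysts`")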
Blind Spots to Remember:
Data never leaves customer's cloud account in classic compute
Serverless compute still maintains network isolation
Unity Catalog enables cross-workspace governance
Delta Lake provides data versioning and rollback capabilities (see the sketch after this list)
Cost optimization through cluster autoscaling and spot instances [4] [1]
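A minimal sketch of Delta Lake versioning and rollback, assuming a hypothetical Unity Catalog table name:

table = "main.sales.orders"  # hypothetical table

# Every write appends a new version to the Delta transaction log
spark.sql(f"DESCRIBE HISTORY {table}").show(truncate=False)

# Time travel: query the table as it was at an earlier version
spark.sql(f"SELECT * FROM {table} VERSION AS OF 0").show()

# Rollback: restore the table to that earlier version
spark.sql(f"RESTORE TABLE {table} TO VERSION AS OF 0")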

Opener for Interview Answer


"Databricks employs a two-plane architecture that separates the control plane, which
Databricks manages as a SaaS service for workspace management and governance, from the
compute plane where actual data processing occurs in either customer-managed or serverless
environments, all built on top of Delta Lake for reliable lakehouse functionality." [2] [1]

Hot Interview Questions


1. "Explain the difference between Control Plane and Compute Plane in Databricks
architecture."
Control Plane: Databricks-managed backend services (web app, metadata, scheduling)
Compute Plane: Where data processing happens (customer cloud or serverless)
Data security: Customer data never touches control plane [7] [1]
2. "How does Unity Catalog fit into Databricks architecture and what problems does it
solve?"
Centralized governance across multiple workspaces
Three-level namespace: catalog.schema.table
Fine-grained access control and data lineage tracking
Eliminates data silos and governance fragmentation [3] [4]
3. "Describe the medallion architecture and when you'd use each layer."
Bronze: Raw ingestion, audit trails, data archiving
Silver: Cleansed data, analytics, operational reporting
Gold: Business aggregates, BI dashboards, ML models [5]
4. "What are the advantages of serverless compute vs classic compute in Databricks?"
Serverless: No infrastructure management, automatic scaling, faster startup
Classic: More control, custom networking, cost optimization with reserved instances
Both maintain data isolation and security [6] [1]
5. "How does Delta Lake enhance the traditional data lake architecture?"
ACID transactions for data reliability
Schema enforcement and evolution
Time travel for versioning and auditing
Optimized file formats and data skipping for performance [4]
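For the last point above (optimized file layout and data skipping), a hedged sketch; the table and column names are hypothetical:

table = "main.sales.orders"  # hypothetical table

# Compact small files and cluster data by a frequently filtered column
spark.sql(f"OPTIMIZE {table} ZORDER BY (order_id)")

# Filters on that column can now skip unrelated files using Delta's
# per-file min/max statistics
spark.sql(f"SELECT count(*) FROM {table} WHERE order_id = 42").show()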

1. [Link]
2. [Link]
3. [Link]
4. [Link]
5. [Link]
6. [Link]
7. [Link]
8. [Link]
9. [Link]
10. [Link]d72180ae2b
11. [Link]questions-answers-
12. [Link]nalytics-architecture
13. [Link]
14. [Link]ml
15. [Link]
16. [Link]
17. [Link]
18. [Link]0924140359681-VOhT
19. [Link]
20. [Link]
