Lakeflow Connect Databricks
© 2025 Impetus Technologies – Confidential
Contents
LakeFlow Adaptor: Solution to Upgrade to Low-Code Automatic Pipelines (LakeFlow)
Its Architecture Features
Key Features of Databricks Lakeflow
Benefits of Using Databricks Lakeflow
Probable Steps to Migrate Existing Databricks Jobs to Lakeflow Pipelines
Limitations
Conclusion
LakeFlow Adaptor: Solution to Upgrade to Low-Code Automatic Pipelines (LakeFlow)
Lakeflow is a new solution that contains everything you need to build and operate
production data pipelines.
It includes new native, highly scalable connectors for databases like SQL Server and for
enterprise applications like Salesforce, Workday, Google Analytics, ServiceNow, and
SharePoint.
Users can transform data in batch and streaming using standard SQL and Python.
Databricks has also announced Real Time Mode for Apache Spark, which enables stream
processing at latencies orders of magnitude lower than micro-batch.
Finally, you can orchestrate and monitor workflows and deploy to production using CI/CD.
Databricks Lakeflow is native to the Data Intelligence Platform, providing serverless
compute and unified governance with Unity Catalog.
Its architecture features:
Automated Workflows: Intuitive drag-and-drop design tools for building data
workflows.
Dependency Management: Guarantees that tasks execute in the proper order.
Delta Live Tables (DLT) Integration: Streamlines data processing and enhances
reliability.
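The dependency management described above boils down to running tasks in a topologically sorted order. As a minimal illustration (the task names are hypothetical, not actual Lakeflow identifiers), Python's standard library can compute such an order:

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
tasks = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "join_orders_customers": {"ingest_orders", "ingest_customers"},
    "publish_report": {"join_orders_customers"},
}

# static_order() yields every task only after all of its dependencies.
order = list(TopologicalSorter(tasks).static_order())

assert order.index("publish_report") > order.index("join_orders_customers")
```

An orchestrator guaranteeing "tasks execute in the proper order" is doing exactly this kind of scheduling, with the added ability to run independent branches (here, the two ingest tasks) in parallel.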
Key Features of Databricks Lakeflow:
Lakeflow is more than just an advanced tool; it offers substantial capabilities. Here’s what you
can expect:
1. Low-Code Interface: Design complex workflows without deep coding expertise.
2. Real-Time Monitoring: Stay informed with alerts and automated retries.
3. Delta Live Tables Integration: Enhanced data reliability and tracking.
4. Workflow Scheduling: Automate data ingestion, transformation, and analysis.
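The "automated retries" behavior in feature 2 can be sketched as a retry loop with exponential backoff and an alert hook. This is a simplified stand-in for what the platform does internally; the `alert` callback is a hypothetical notification channel (e.g. email or webhook), not a Lakeflow API:

```python
import time

def run_with_retries(task, max_retries=3, backoff_s=0.01, alert=print):
    """Run `task`, retrying on failure with exponential backoff.

    Sketch of automated retries with alerting; `alert` is a
    hypothetical notification hook, not an actual platform call.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return task()
        except Exception as exc:
            alert(f"attempt {attempt} failed: {exc}")
            if attempt == max_retries:
                raise
            time.sleep(backoff_s * 2 ** (attempt - 1))

# Example: a flaky task that succeeds on its third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = run_with_retries(flaky)
```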
Benefits of Using Databricks Lakeflow:
1. Scalability: Efficiently manages large-scale data workloads.
2. Reliability: Integrated monitoring and alerting help keep pipelines error-free.
3. Simplified Data Pipelines: A low-code approach minimizes complexity in pipeline
creation.
4. Cost-Effectiveness: Optimized resource utilization leads to savings in both time and
costs.
5. Fast time to value: Unlock value from your data in just a few easy steps.
6. Broad connectivity: Built-in data connectors are available for popular enterprise
   applications, file sources and databases.
7. Flexible and easy: Fully managed connectors provide a simple UI and API for easy
setup and democratize data access. Automated features also help simplify pipeline
maintenance with minimal overhead.
8. Built-in connectors: Data ingestion is fully integrated with the Data Intelligence
Platform. Create ingestion pipelines with governance from Unity Catalog, observability
from Lakehouse Monitoring, and seamless orchestration with workflows for analytics,
machine learning and BI.
9. Efficient ingestion: Increase efficiency and accelerate time to value. Optimized
   incremental reads and writes and built-in data transformation improve the
   performance and reliability of your pipelines, reduce bottlenecks, and reduce the
   impact on source systems as you scale.
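The "optimized incremental reads" in point 9 rest on a simple idea: track a high-watermark cursor and read only rows changed since the last run. A toy sketch (table and column names are illustrative):

```python
# Simulated source table; `updated_at` is a monotonically increasing version.
source = [
    {"id": 1, "updated_at": 100},
    {"id": 2, "updated_at": 105},
    {"id": 3, "updated_at": 110},
]

def read_increment(rows, watermark):
    """Return only rows changed since the last ingested watermark,
    plus the new watermark to persist for the next run."""
    new_rows = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark

batch, wm = read_increment(source, watermark=0)     # first run: all rows
batch2, wm2 = read_increment(source, watermark=wm)  # second run: nothing new
```

Because each run touches only new data, repeated ingestion stays cheap for both the pipeline and the source system.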
Probable Steps to Migrate Existing Databricks Jobs to Lakeflow Pipelines:
Evaluate Current Job Configurations:
Review existing Databricks jobs to understand their configurations, dependencies,
and data sources.
Identify any specific requirements or customizations that need to be replicated in
Lakeflow.
Utilize Managed Connectors:
Leverage the built-in connectors provided by Lakeflow Connect for seamless
integration with your data sources, such as SQL Server or other databases.
Ensure that the necessary permissions and access controls are in place for the
connectors to function properly.
Set Up Gateway and Ingestion Pipelines:
Create a gateway pipeline to extract data from the source database using a DLT
pipeline with classic compute.
Establish an ingestion pipeline that ingests the staged data into Delta tables using
serverless compute.
Use the Databricks SDK or UI to configure these pipelines effectively.
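As a rough sketch of configuring an ingestion pipeline programmatically, the request body below follows the shape of the Databricks Pipelines REST API (POST /api/2.0/pipelines). Field names should be verified against current documentation, and the notebook path, catalog, and schema are placeholders; no request is actually sent here:

```python
import json

# Assumed request-body shape for creating a serverless ingestion pipeline.
# Verify field names against the current Pipelines API docs before use.
payload = {
    "name": "sqlserver_ingestion",
    "serverless": True,                 # run the ingestion pipeline on serverless compute
    "catalog": "main",                  # Unity Catalog target catalog (placeholder)
    "target": "ingest_schema",          # target schema for the Delta tables (placeholder)
    "libraries": [
        {"notebook": {"path": "/pipelines/ingest_sqlserver"}}  # placeholder path
    ],
    "continuous": False,                # triggered rather than continuous execution
}

body = json.dumps(payload, indent=2)
```

The gateway pipeline on classic compute would be configured analogously, pointing at the source database instead of the staged data.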
Implement Change Data Capture (CDC):
Enable CDC or change tracking on your source databases to ensure that only
incremental changes are ingested.
This will help maintain data freshness and reduce the load on your systems.
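Conceptually, applying CDC means replaying a stream of insert/update/delete events against the target table, which is what a MERGE into a Delta table accomplishes. A toy dict-backed illustration (the event shape is made up for the example):

```python
# Target table keyed by primary key.
target = {}

# Illustrative change feed from the source database.
changes = [
    {"op": "insert", "id": 1, "name": "Ada"},
    {"op": "insert", "id": 2, "name": "Bob"},
    {"op": "update", "id": 2, "name": "Bobby"},
    {"op": "delete", "id": 1},
]

def apply_changes(table, events):
    """Replay CDC events in order: deletes remove the key,
    inserts and updates both upsert the latest values."""
    for e in events:
        if e["op"] == "delete":
            table.pop(e["id"], None)
        else:
            table[e["id"]] = {"name": e["name"]}
    return table

apply_changes(target, changes)
```

Only the events since the last sync need to be replayed, which is why enabling CDC keeps data fresh without full reloads.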
Test and Validate:
Run test migrations to validate that the data is being ingested correctly and that the
pipelines are functioning as expected.
Monitor for any errors or issues during the migration process and address them
promptly.
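Validation after a test migration usually starts with cheap structural checks, such as comparing row counts and primary-key coverage between source and target. A minimal sketch (in practice these would be queries against the source system and the Delta table):

```python
# Simulated query results from the source system and the migrated target.
source_rows = [{"id": 1}, {"id": 2}, {"id": 3}]
target_rows = [{"id": 1}, {"id": 2}, {"id": 3}]

def validate(source, target, key="id"):
    """Return a list of human-readable validation issues (empty if clean)."""
    issues = []
    if len(source) != len(target):
        issues.append(f"row count mismatch: {len(source)} vs {len(target)}")
    missing = {r[key] for r in source} - {r[key] for r in target}
    if missing:
        issues.append(f"missing keys in target: {sorted(missing)}")
    return issues

issues = validate(source_rows, target_rows)
```

An empty issue list is a necessary but not sufficient signal; deeper checks (checksums, sampled value comparisons) would follow the same pattern.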
Monitor and Optimize:
After migration, continuously monitor the performance of the Lakeflow pipelines.
Optimize configurations as needed to ensure efficient data processing and resource
utilization.
Limitations:
1. Learning Curve: Teams familiar with classic jobs may face challenges adapting to the
new system.
2. Process Adjustments Required: Existing processes and tools may need
modifications for full integration.
3. Schema Evolution Constraints: Certain schema changes may require a full table
refresh, limiting flexibility.
4. One table for multiple pipelines: A key limitation within pipelines is that pointing to
the same destination table from different pipelines will result in the table
being overwritten. This means you cannot concurrently manage or append data from
distinct pipelines into a single, shared table without one pipeline's output replacing
another's.
5. External Table Support: A key limitation is how pipelines handle external storage
   paths for the tables they manage. When using the Hive Metastore, pipelines allow you
   to specify a "path" attribute, giving you control over where data is stored. With
   Unity Catalog, however, the "path" attribute is unsupported; Unity Catalog
   automatically manages storage locations within its own cloud path, so you have less
   direct control over data placement.
6. Single-Language Notebooks: SQL and Python Separation: Unlike standard notebooks,
   which allow mixed-language cells, pipelines enforce a strict single-language rule
   per notebook. A pipeline notebook must be either entirely Python or entirely SQL. If
   both syntaxes are present, the pipeline executes only the notebook's designated
   language and ignores the other. This means you cannot interleave SQL and Python code
   within a single notebook for pipeline execution. (Two independent notebooks in
   different languages do work.)
Conclusion: