
Streams and Tasks

The document explains the concepts of streams and tasks in Snowflake, highlighting how streams capture changes to tables through Change Data Capture (CDC) and the different types of streams available. It also details how tasks automate and schedule SQL executions, including their scheduling options and handling of failures. Additionally, it provides examples of creating streams and tasks, emphasizing their roles in managing data processes efficiently.

Uploaded by

Vishakha Vyas

Understanding Streams and Tasks in Snowflake

Visakha
STREAMS
A stream object records changes made to tables, including inserts, updates, and deletes.

• It captures metadata about each change to allow actions to be taken using the modified data.
• This process is called Change Data Capture (CDC).
• A table stream tracks changes to rows in a source table.
• It creates a change table that shows what changed at the row level between two points in time.
• You can query and consume the changes in a transactional manner.
• A stream becomes stale when its offset exceeds the data retention period of the source table.
• To prevent staleness, consume stream records before the retention period ends.
• Snowflake temporarily extends the data retention period, up to 14 days, to avoid staleness.
• The STALE_AFTER timestamp indicates when a stream will become stale.
• Recreating a source table or view causes the stream to become stale.
• Streams can be created to query change data on standard tables (including shared tables), views (including secure views), directory tables, dynamic tables, Apache Iceberg™ tables (with limitations), event tables, and external tables.
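The points above can be sketched end to end; a minimal example with hypothetical table and stream names:

```sql
-- Hypothetical source table and stream, for illustration only.
CREATE OR REPLACE TABLE orders (id INT, amount NUMBER);
CREATE OR REPLACE STREAM orders_stream ON TABLE orders;

INSERT INTO orders VALUES (1, 100);

-- Querying the stream returns the changed rows plus METADATA$ columns.
SELECT * FROM orders_stream;

-- SHOW STREAMS reports a STALE_AFTER column indicating when the stream goes stale.
SHOW STREAMS LIKE 'orders_stream';
```

Consuming the stream inside a DML statement (for example, an INSERT ... SELECT in a transaction) advances its offset, which is what keeps it from going stale.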
Stream Columns

1. A stream does not store actual table data but tracks DML changes by recording an offset for the source object.
2. The METADATA$ACTION column indicates the type of DML operation (INSERT or DELETE) recorded in the stream.
3. The METADATA$ISUPDATE column shows whether the operation was part of an UPDATE statement; updates are represented as DELETE and INSERT pairs.
4. The METADATA$ROW_ID column provides a unique, immutable ID for each row, allowing changes to be tracked over time.
5. METADATA$ROW_IDs are consistent across streams on the same source object but may differ for streams on views or on clones/replicas of the source.
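A short sketch of how an UPDATE surfaces through these columns (the orders table and orders_stream are hypothetical names, assumed to already exist with a standard stream):

```sql
-- Assume a hypothetical orders table with a standard stream orders_stream on it.
UPDATE orders SET amount = 200 WHERE id = 1;

SELECT id, amount, METADATA$ACTION, METADATA$ISUPDATE, METADATA$ROW_ID
FROM orders_stream;
-- The UPDATE appears as two rows sharing one METADATA$ROW_ID:
-- a DELETE row (old values) and an INSERT row (new values),
-- both with METADATA$ISUPDATE = TRUE.
```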
Types of Streams
1. Standard Streams
• Track all DML changes: inserts, updates, deletes, and truncates, ensuring comprehensive change data capture.
• Provide a net delta by joining inserted and deleted rows, offering a clear view of the data changes.
• Not suitable for geospatial data, as they cannot retrieve change data for such types.
2. Append-only Streams
• Exclusively track row inserts, which makes them more efficient for scenarios requiring only new data.
• Deletion and truncation events are not captured, making them ideal for append-heavy datasets.
• Provide better performance in scenarios like ETL by reducing the overhead of handling deletions or updates. (APPEND_ONLY = TRUE)
3. Insert-only Streams
• Track only row inserts on Apache Iceberg™ or external tables, ideal for cloud storage scenarios.
• Do not record delete operations, so only newly added rows are captured, simplifying processing.
• Suitable for cloud-based external tables where files are overwritten or appended without triggering deletes. (INSERT_ONLY = TRUE)
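The three variants differ mainly in their creation syntax; a sketch with hypothetical object names:

```sql
-- Standard stream: captures inserts, updates, deletes, and truncates.
CREATE OR REPLACE STREAM orders_std_stream ON TABLE orders;

-- Append-only stream: captures inserts only.
CREATE OR REPLACE STREAM orders_append_stream ON TABLE orders
  APPEND_ONLY = TRUE;

-- Insert-only stream on an external table.
CREATE OR REPLACE STREAM ext_files_stream ON EXTERNAL TABLE my_ext_tbl
  INSERT_ONLY = TRUE;
```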
CREATE OR REPLACE STREAM order_raw_stream ON TABLE order_raw;
TASKS
A task in Snowflake automates and schedules the execution of SQL statements, stored procedures, or scripts to streamline data processes.

• Types of SQL a task can execute:

1. A single SQL statement
2. A call to a stored procedure
3. Procedural logic using Snowflake Scripting
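Each body type is supplied after the AS keyword; a hedged sketch, with hypothetical task, table, and procedure names:

```sql
-- 1. Single SQL statement
CREATE OR REPLACE TASK t_single WAREHOUSE = compute_wh SCHEDULE = '60 MINUTE'
AS DELETE FROM staging WHERE loaded = TRUE;

-- 2. Call to a stored procedure
CREATE OR REPLACE TASK t_proc WAREHOUSE = compute_wh SCHEDULE = '60 MINUTE'
AS CALL refresh_reports();

-- 3. Procedural logic using Snowflake Scripting
CREATE OR REPLACE TASK t_script WAREHOUSE = compute_wh SCHEDULE = '60 MINUTE'
AS
BEGIN
  INSERT INTO audit_log (run_at) VALUES (CURRENT_TIMESTAMP());
END;
```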
Task Scheduling and Execution

• Task Scheduling: Tasks can be scheduled using specific parameters to automate execution.
• Triggered Tasks: Tasks can be triggered automatically based on specific events or conditions.
• Manually Executing Tasks: Tasks can be executed manually for testing or one-time operations.
• Versioning of Task Runs: Each task run is versioned to track and manage changes over time.
• Automatically Suspend Tasks After Failed Runs: A task can be suspended automatically after a set number of consecutive failures, preventing further execution.
• Automatically Retry Failed Task Runs: Failed task runs can be retried automatically based on predefined settings (e.g., a retry limit of 3).
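The suspend and retry behaviors above map to task parameters and commands; a sketch with a hypothetical task name (the parameter names are Snowflake's documented ones, but verify them against the current documentation before use):

```sql
-- Suspend the task automatically after 3 consecutive failed runs,
-- and retry a failed run up to 3 times.
ALTER TASK my_task SET
  SUSPEND_TASK_AFTER_NUM_FAILURES = 3,
  TASK_AUTO_RETRY_ATTEMPTS = 3;

-- Manual, one-off execution (useful for testing).
EXECUTE TASK my_task;
```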
TYPES OF TASKS
CRON JOBS
• A cron job is a time-based job scheduler in Unix-like operating systems, allowing users to run commands or scripts automatically at specified intervals.

* * * * *
| | | | |
| | | | +---- Day of the week (0 - 7) (Sunday = 0 or 7)
| | | +------ Month (1 - 12)
| | +-------- Day of the month (1 - 31)
| +---------- Hour (0 - 23)
+------------ Minute (0 - 59)

• Minute (0 - 59): Specifies the minute when the command should run.
• Hour (0 - 23): Specifies the hour (in 24-hour format).
• Day of the month (1 - 31): The specific day of the month to run the job.
• Month (1 - 12): The month when the job should be executed.
• Day of the week (0 - 7): The day of the week (both 0 and 7 represent Sunday).
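Applied to Snowflake's task syntax, a few illustrative five-field expressions (Snowflake's USING CRON form appends a time zone after the five fields):

```sql
-- Every 5 minutes
SCHEDULE = 'USING CRON */5 * * * * UTC'
-- Daily at 02:30 UTC
SCHEDULE = 'USING CRON 30 2 * * * UTC'
-- Mondays at 09:00 UTC
SCHEDULE = 'USING CRON 0 9 * * 1 UTC'
```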
Create Stream
CREATE OR REPLACE STREAM order_raw_stream ON TABLE order_raw;

Create Tasks

CREATE OR REPLACE TASK load_order_analytics
  WAREHOUSE = compute_wh
  SCHEDULE = '5 MINUTE'  -- or: SCHEDULE = 'USING CRON */5 * * * * UTC'
AS
  INSERT INTO order_analytics
  SELECT * FROM order_raw_stream;

Resume Task

Newly created tasks are suspended by default, so they must be resumed before the schedule takes effect:

ALTER TASK load_order_analytics RESUME;
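Two follow-ups are worth sketching for a task like this: gating runs on pending stream data, and inspecting run history. The sketch below uses the task and stream names from this example together with Snowflake's SYSTEM$STREAM_HAS_DATA function and TASK_HISTORY table function (verify the exact column names against current documentation):

```sql
-- Skip scheduled runs when the stream has no unconsumed changes
-- (a task must be suspended before its definition can be altered).
ALTER TASK load_order_analytics SUSPEND;
ALTER TASK load_order_analytics
  MODIFY WHEN SYSTEM$STREAM_HAS_DATA('ORDER_RAW_STREAM');
ALTER TASK load_order_analytics RESUME;

-- Inspect recent runs of the task.
SELECT name, state, scheduled_time, error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY())
WHERE name = 'LOAD_ORDER_ANALYTICS'
ORDER BY scheduled_time DESC;
```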
