A Databricks job has been configured with 3 tasks, each of which is a Databricks notebook. Task A does not
depend on other tasks. Tasks B and C run in parallel, with each having a serial dependency on Task A.
If task A fails during a scheduled run, which statement describes the results of this run?
The team wants to create a batch processing pipeline to process all new records inserted in the orders_raw table and propagate them to the orders_cleaned table.
Which solution minimizes the compute costs to propagate this batch of data?
• raw_iot ingests raw device measurement data from a heart rate tracking device.
• bpm_stats incrementally computes user statistics based on BPM measurements from raw_iot.
How can the data engineer configure this pipeline to be able to retain manually deleted or updated records in the raw_iot table, while recomputing the downstream table bpm_stats table when a pipeline update is run?
© Copyrights DumpsEngine 2025. All Rights Reserved
We use cookies to ensure your best experience. So we hope you are happy to receive all cookies on the DumpsEngine.