You have historical data covering the last three years in BigQuery and a data pipeline that delivers new data to
BigQuery daily. You have noticed that when the Data Science team runs a query filtered on a date column and
limited to 30–90 days of data, the query scans the entire table. You also noticed that your bill is increasing
more quickly than you expected. You want to resolve the issue as cost-effectively as possible while
maintaining the ability to conduct SQL queries. What should you do?
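The full-table-scan symptom typically means the table is not partitioned on the date column; recreating it as a date-partitioned table lets BigQuery prune partitions outside the filter, so a 30–90 day query is billed only for the days it touches. A minimal sketch of the DDL such a migration would use (project, dataset, and column names are placeholders):

```python
def partitioned_table_ddl(project: str, dataset: str, table: str, date_col: str) -> str:
    """Build DDL that recreates `table` partitioned on `date_col`.

    After swapping in the new table, a query filtered on `date_col`
    scans only the matching daily partitions instead of the whole
    three-year table.
    """
    return (
        f"CREATE TABLE `{project}.{dataset}.{table}_partitioned`\n"
        f"PARTITION BY {date_col}\n"
        f"AS SELECT * FROM `{project}.{dataset}.{table}`;"
    )


print(partitioned_table_ddl("my-project", "analytics", "events", "event_date"))
```

This keeps the data queryable with ordinary SQL, which is why partitioning is usually preferred here over exporting to cheaper storage.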
You are designing an Apache Beam pipeline to enrich data from Cloud Pub/Sub with static reference data
from BigQuery. The reference data is small enough to fit in memory on a single worker. The pipeline should
write enriched results to BigQuery for analysis. Which job type and transforms should this pipeline use?
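Because the reference data fits in memory on a single worker, the usual Beam pattern is a streaming job that reads Pub/Sub as the main input and passes the BigQuery reference table to a Map/ParDo as a side input (for example via `beam.pvalue.AsDict`). A pure-Python sketch of the per-element enrichment logic, with the side input modeled as a plain dict and field names purely illustrative:

```python
def enrich(event: dict, reference: dict) -> dict:
    """Join one streaming event against the in-memory reference table.

    In a Beam pipeline this would be the body of a Map/ParDo whose
    `reference` argument is a side input built from the BigQuery read;
    here it is an ordinary dict to keep the sketch self-contained.
    """
    key = event["device_id"]
    enriched = dict(event)
    enriched["device_meta"] = reference.get(key, {})
    return enriched


reference = {"drone-7": {"model": "X2", "region": "us-west"}}
print(enrich({"device_id": "drone-7", "temp": 21.5}, reference))
```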
Which Cloud Dataflow / Beam feature should you use to aggregate data in an unbounded data source every
hour based on the time when the data entered the pipeline?
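"Every hour, based on when the data entered the pipeline" describes fixed one-hour windows keyed on processing time rather than event time. A minimal sketch of the bucketing such a window assigns, using Unix timestamps in seconds:

```python
def hourly_bucket(arrival_ts: int) -> int:
    """Map a processing-time (arrival) timestamp to the start of its
    one-hour fixed window, i.e. floor the timestamp to the hour."""
    return arrival_ts - (arrival_ts % 3600)


# Elements arriving within the same hour share a window start.
print(hourly_bucket(7200), hourly_bucket(7260), hourly_bucket(10800))
```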
You have several Spark jobs that run on a Cloud Dataproc cluster on a schedule. Some of the jobs run in
sequence, and some of the jobs run concurrently. You need to automate this process. What should you do?
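A mix of sequential and concurrent Spark jobs on Dataproc maps naturally onto a Dataproc Workflow Template, where each job gets a `stepId` and ordering is expressed with `prerequisiteStepIds`; steps with no prerequisites may run concurrently. A sketch of such a template built as a Python dict (cluster labels and jar URIs are placeholders):

```python
def spark_step(step_id, jar, prereqs=None):
    """Build one workflow-template step. `prereqs` lists stepIds that
    must finish first; steps without prereqs can run concurrently."""
    step = {"stepId": step_id, "sparkJob": {"mainJarFileUri": jar}}
    if prereqs:
        step["prerequisiteStepIds"] = list(prereqs)
    return step


template = {
    "placement": {"clusterSelector": {"clusterLabels": {"env": "prod"}}},
    "jobs": [
        spark_step("ingest-a", "gs://bucket/ingest_a.jar"),   # runs concurrently
        spark_step("ingest-b", "gs://bucket/ingest_b.jar"),   # runs concurrently
        spark_step("join", "gs://bucket/join.jar",
                   prereqs=["ingest-a", "ingest-b"]),         # runs after both
    ],
}
print([j["stepId"] for j in template["jobs"]])
```

The template can then be instantiated on a schedule (e.g. via Cloud Scheduler), which automates the whole sequence without hand-written orchestration code.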
You are consulting for a company developing an IoT application that analyzes data from sensors deployed on drones. The application depends on a database that can write large volumes of data at low latency. The company has used Apache HBase on Hadoop in the past but wants to migrate to a managed database service. Which service would you recommend?
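The HBase background and the low-latency, high-volume write requirement point at Cloud Bigtable, which offers an HBase-compatible API. One design detail that matters for drone sensor writes is the row key: leading with the sensor id rather than a raw timestamp spreads writes across tablets instead of hotspotting the newest rows. A minimal sketch (the key layout is illustrative, not a prescribed schema):

```python
def sensor_row_key(sensor_id: str, ts: int) -> str:
    """Compose a Bigtable-style row key: sensor id first so sequential
    timestamps don't pile onto one tablet, then a zero-padded timestamp
    so each sensor's readings sort contiguously for range scans."""
    return f"{sensor_id}#{ts:012d}"


print(sensor_row_key("drone-42", 1700000000))
```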