A data engineering team wants to grant a group access to all tables in the finance schema using Unity Catalog. Which SQL statement correctly grants SELECT privileges at the schema level?
A) GRANT SELECT ON SCHEMA finance TO GROUP finance_team
B) GRANT SELECT ON DATABASE finance TO GROUP finance_team
C) GRANT SELECT ON ALL TABLES IN SCHEMA finance TO GROUP finance_team
D) GRANT SELECT ON SCHEMA finance TO USER finance_team
You are configuring Lakeflow to ingest data from an external S3 bucket into a Delta table. Which configuration ensures that schema evolution is handled automatically during ingestion?
A) Set mergeSchema to true in the Lakeflow pipeline configuration
B) Set cloudFiles.schemaEvolutionMode to addNewColumns in the pipeline
C) Set cloudFiles.inferSchema to true in the pipeline
D) Set mergeSchema to auto in the pipeline configuration
A workspace admin wants to restrict access so that only specific clusters can access a Unity Catalog volume. Which scope should the admin configure the volume permissions at?
A) Workspace level B) Cluster level C) Catalog level D) Volume level
A Lakeflow pipeline is scheduled to run every hour, but sometimes fails due to missing source files. Which configuration allows the pipeline to skip missing files and continue processing?
A) Set cloudFiles.ignoreMissingFiles to true
B) Set cloudFiles.allowMissingFiles to true
C) Set cloudFiles.skipMissingFiles to true
D) Set cloudFiles.continueOnError to true
You want to grant a user the ability to create tables in a Unity Catalog schema but not drop existing tables. Which privilege should you grant?
A) CREATE TABLE B) MODIFY C) USAGE D) CREATE
A data analyst tries to query a table in Unity Catalog but receives a “permission denied” error. The analyst has SELECT on the table but not on the parent schema. What is the minimum additional privilege needed?
A) USAGE on the schema B) MODIFY on the schema C) SELECT on the catalog D) USAGE on the catalog
You are using Lakeflow to orchestrate a multi-step pipeline. Which statement about task dependencies is correct?
A) Tasks can only depend on immediate upstream tasks B) Tasks can depend on any other task in the pipeline C) Tasks must be executed sequentially in the order defined D) Tasks can only depend on the first task in the pipeline
A Unity Catalog administrator wants to audit all changes to a specific table. Which feature should they use?
A) Table lineage in Unity Catalog B) Audit logs in Databricks workspace C) Table change history in Unity Catalog D) Data access logs in Lakeflow
You want to use Lakeflow to ingest data from multiple cloud sources into a single Delta table. Which configuration is required to avoid duplicate records?
A) Set cloudFiles.deduplicate to true
B) Set cloudFiles.uniqueRecords to true
C) Set cloudFiles.mergeDuplicates to true
D) Set cloudFiles.dropDuplicates to true
A developer wants to use Unity Catalog to manage access to a Delta table across multiple workspaces. What is required for this setup?
A) The table must be in a catalog shared across workspaces B) The table must be registered in each workspace separately C) The table must be stored in the workspace’s default catalog D) The table must be assigned to a workspace-local schema
Test data setup:
CREATE SCHEMA IF NOT EXISTS finance;
CREATE TABLE finance.transactions (id INT, amount DOUBLE);
Hands-on test code:
-- Option A
GRANT SELECT ON SCHEMA finance TO GROUP finance_team;
-- Option B
GRANT SELECT ON DATABASE finance TO GROUP finance_team;
-- Option C
GRANT SELECT ON ALL TABLES IN SCHEMA finance TO GROUP finance_team;
-- Option D
GRANT SELECT ON SCHEMA finance TO USER finance_team;
Test data setup:
# Sample S3 bucket with evolving schema
import json
dbutils.fs.mkdirs("/mnt/s3/sample")
dbutils.fs.put("/mnt/s3/sample/file1.json", json.dumps({"id": 1, "name": "A"}), True)
dbutils.fs.put("/mnt/s3/sample/file2.json", json.dumps({"id": 2, "name": "B", "age": 30}), True)
Hands-on test code:
# Option A
pipeline_config = {"mergeSchema": True}
# Option B
pipeline_config = {"cloudFiles.schemaEvolutionMode": "addNewColumns"}
# Option C
pipeline_config = {"cloudFiles.inferSchema": True}
# Option D
pipeline_config = {"mergeSchema": "auto"}
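For the two sample files above, schema evolution means the inferred schema grows to the union of all fields seen so far. A minimal plain-Python sketch of that idea (illustrative only, not Auto Loader's implementation):

```python
# Records as they would arrive from the two sample files above
records = [
    {"id": 1, "name": "A"},             # file1.json
    {"id": 2, "name": "B", "age": 30},  # file2.json (new "age" column)
]

def evolve_schema(records):
    """Accumulate the union of all field names seen across records."""
    schema = set()
    for record in records:
        schema |= record.keys()  # new columns are added as they appear
    return sorted(schema)

print(evolve_schema(records))  # ['age', 'id', 'name']
```

The second file's extra `age` field widens the schema instead of failing the load, which is the behavior the addNewColumns mode provides.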
Test data setup:
CREATE VOLUME finance_vol;
Hands-on test code:
-- Option A
GRANT READ VOLUME ON VOLUME finance_vol TO GROUP finance_team; -- at workspace level
-- Option B (not valid: clusters are not grantable principals)
GRANT READ ON VOLUME finance_vol TO CLUSTER cluster_id;
-- Option C (not valid: a catalog is a securable, not a principal)
GRANT READ ON VOLUME finance_vol TO CATALOG finance;
-- Option D
GRANT READ VOLUME ON VOLUME finance_vol TO GROUP finance_team; -- at volume level
Test data setup:
# Simulate missing files in source directory
dbutils.fs.mkdirs("/mnt/source")
dbutils.fs.put("/mnt/source/file1.csv", "id,name\n1,A", True)
# file2.csv is intentionally missing
Hands-on test code:
# Option A
options = {"cloudFiles.ignoreMissingFiles": "true"}
# Option B
options = {"cloudFiles.allowMissingFiles": "true"}
# Option C
options = {"cloudFiles.skipMissingFiles": "true"}
# Option D
options = {"cloudFiles.continueOnError": "true"}
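The behavior all four option names describe, skipping absent files instead of failing the run, can be sketched in plain Python (illustrative; not the Auto Loader implementation, and the helper name `read_available` is made up for this sketch):

```python
import os
import tempfile

def read_available(paths):
    """Read each file, silently skipping any that are missing."""
    contents = []
    for path in paths:
        try:
            with open(path) as f:
                contents.append(f.read())
        except FileNotFoundError:
            continue  # skip missing files and keep processing
    return contents

# Mirror the setup above: file1 exists, file2 is intentionally missing
tmp = tempfile.mkdtemp()
file1 = os.path.join(tmp, "file1.csv")
with open(file1, "w") as f:
    f.write("id,name\n1,A")
file2 = os.path.join(tmp, "file2.csv")  # never created

result = read_available([file1, file2])
print(len(result))  # 1: only the existing file was read
```

The key point is that a missing file produces a skip, not a pipeline failure, so the hourly run completes with whatever source files are present.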
Test data setup:
CREATE SCHEMA hr;
Hands-on test code:
-- Option A
GRANT CREATE TABLE ON SCHEMA hr TO USER analyst;
-- Option B
GRANT MODIFY ON SCHEMA hr TO USER analyst;
-- Option C
GRANT USAGE ON SCHEMA hr TO USER analyst;
-- Option D
GRANT CREATE ON SCHEMA hr TO USER analyst;
Test data setup:
CREATE SCHEMA sales;
CREATE TABLE sales.orders (id INT, total DOUBLE);
GRANT SELECT ON TABLE sales.orders TO USER analyst;
Hands-on test code:
-- Option A
GRANT USAGE ON SCHEMA sales TO USER analyst;
-- Option B
GRANT MODIFY ON SCHEMA sales TO USER analyst;
-- Option C
GRANT SELECT ON CATALOG main TO USER analyst;
-- Option D
GRANT USAGE ON CATALOG main TO USER analyst;
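Unity Catalog requires traversal privileges on every container above an object, which is why SELECT on the table alone is not enough. A toy model of that check (plain Python; the grant keys and `can_select` helper are hypothetical, for illustration only):

```python
# Privileges the analyst holds, keyed by securable object
grants = {
    "catalog:main": {"USAGE"},
    "schema:main.sales": set(),            # no USAGE yet -> query fails
    "table:main.sales.orders": {"SELECT"},
}

def can_select(grants, catalog, schema, table):
    """SELECT succeeds only with USAGE on catalog and schema plus SELECT on the table."""
    return (
        "USAGE" in grants.get(f"catalog:{catalog}", set())
        and "USAGE" in grants.get(f"schema:{catalog}.{schema}", set())
        and "SELECT" in grants.get(f"table:{catalog}.{schema}.{table}", set())
    )

print(can_select(grants, "main", "sales", "orders"))  # False: schema USAGE missing

grants["schema:main.sales"].add("USAGE")  # Option A's grant
print(can_select(grants, "main", "sales", "orders"))  # True
```

Adding the schema-level USAGE grant is the single missing link in the chain, which is what makes option A the minimum additional privilege.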
Test data setup:
# Lakeflow pipeline with three tasks
tasks = [
    {"name": "extract"},
    {"name": "transform", "depends_on": ["extract"]},
    {"name": "load", "depends_on": ["transform"]},
]
Hands-on test code:
# Option A: Only immediate upstream
# Option B: Any task
# Option C: Sequential only
# Option D: Only first task
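The answer can be checked against the task list above: a task may depend on any other task, as long as the dependencies stay acyclic. A minimal topological ordering of the sample tasks (plain Python sketch, not the Lakeflow scheduler):

```python
tasks = [
    {"name": "extract"},
    {"name": "transform", "depends_on": ["extract"]},
    {"name": "load", "depends_on": ["transform"]},
]

def execution_order(tasks):
    """Return a valid run order: a task runs only after all its dependencies."""
    done, order = set(), []
    pending = {t["name"]: set(t.get("depends_on", [])) for t in tasks}
    while pending:
        ready = [name for name, deps in pending.items() if deps <= done]
        if not ready:
            raise ValueError("cycle detected: dependencies must form a DAG")
        for name in ready:
            order.append(name)
            done.add(name)
            del pending[name]
    return order

print(execution_order(tasks))  # ['extract', 'transform', 'load']
```

Any dependency list that forms a DAG yields a valid order this way; only a cycle (a task depending, directly or indirectly, on itself) is rejected.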
Test data setup:
CREATE TABLE audit_demo (id INT);
Hands-on test code:
-- Option A (there is no SHOW TABLE LINEAGE statement; lineage is viewed in Catalog Explorer or system tables)
SHOW TABLE LINEAGE audit_demo;
-- Option B
-- Check workspace audit logs
-- Option C
DESCRIBE HISTORY audit_demo;
-- Option D
-- Check Lakeflow data access logs
Test data setup:
# Multiple sources with overlapping data
dbutils.fs.put("/mnt/source1/file1.csv", "id,name\n1,A\n2,B", True)
dbutils.fs.put("/mnt/source2/file2.csv", "id,name\n2,B\n3,C", True)
Hands-on test code:
# Option A
options = {"cloudFiles.deduplicate": "true"}
# Option B
options = {"cloudFiles.uniqueRecords": "true"}
# Option C
options = {"cloudFiles.mergeDuplicates": "true"}
# Option D
options = {"cloudFiles.dropDuplicates": "true"}
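Whatever option name the pipeline exposes, the deduplication itself amounts to keeping the first occurrence of each key. A plain-Python sketch over the overlapping sample rows (illustrative; in Spark this is typically done with dropDuplicates or a MERGE in the query itself):

```python
# Rows as they would arrive from the two overlapping sources above
rows = [
    {"id": 1, "name": "A"},  # source1
    {"id": 2, "name": "B"},  # source1
    {"id": 2, "name": "B"},  # source2 (duplicate)
    {"id": 3, "name": "C"},  # source2
]

def drop_duplicates(rows, key="id"):
    """Keep the first row seen for each key, mirroring dropDuplicates([key])."""
    seen, unique = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique

print([r["id"] for r in drop_duplicates(rows)])  # [1, 2, 3]
```

The duplicate id 2 arriving from the second source is dropped, so the target Delta table ends up with one row per id.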
Test data setup:
CREATE CATALOG shared_catalog;
CREATE SCHEMA shared_catalog.data;
CREATE TABLE shared_catalog.data.sales (id INT, amount DOUBLE);
Hands-on test code:
-- Option A
-- Table in shared_catalog, accessible across workspaces
-- Option B
-- Register table in each workspace
-- Option C
-- Table in workspace's default catalog
-- Option D
-- Table in workspace-local schema
=== ANSWERS ===
Question 1: A - Only option A uses the correct syntax and scope for granting SELECT at the schema level to a group; in Unity Catalog a schema-level grant is inherited by every table in the schema.
Question 2: B - cloudFiles.schemaEvolutionMode is the Auto Loader option that controls schema evolution (supported values are addNewColumns, rescue, failOnNewColumns, and none). mergeSchema is a Delta write option, not an ingestion option, and it does not accept "auto"; cloudFiles.inferSchema only infers the initial schema and does not evolve it.
Question 3: D - Volume permissions are granted on the volume securable itself, not at the workspace, cluster, or catalog level; clusters are not grantable principals in Unity Catalog.
Question 4: A - cloudFiles.ignoreMissingFiles is the option that skips missing files so processing can continue; the other option names do not exist.
Question 5: A - In Unity Catalog, CREATE TABLE is the schema-level privilege that allows creating tables without allowing drops (dropping a table requires ownership). MODIFY governs data changes on existing tables, USAGE only allows traversal, and a bare CREATE privilege belongs to the legacy Hive metastore model, not Unity Catalog.
Question 6: A - USAGE (now named USE SCHEMA) on the parent schema is required in addition to SELECT on the table, because Unity Catalog requires traversal privileges on every container in the object's path.
Question 7: B - Lakeflow allows a task to depend on any other task in the pipeline, provided the dependencies form a directed acyclic graph; execution is not limited to immediate upstream tasks or a fixed sequence.
Question 8: C - Table change history (DESCRIBE HISTORY on a Delta table) records every change to the table, including the operation, timestamp, and user who made it.
Question 9: A - cloudFiles.deduplicate is the intended answer; in practice, cross-source duplicates are typically removed in the query itself, for example with dropDuplicates or a MERGE.
Question 10: A - The table must live in a catalog on a metastore shared across workspaces; Unity Catalog objects are defined at the metastore level, so every workspace attached to that metastore can access them.