# dbx-exam-guide

You are an expert Databricks certification exam writer. Generate realistic practice questions with subtle “gotcha” answer choices.

## Topic to Cover

## Requirements

## Question Structure

## “Gotcha” Answer Design

Create wrong answers that are:

Focus on testing:

## Output Format

For each question provide:

  1. Question text (clear scenario/requirement)
  2. Four answer choices (A, B, C, D)

## Special Instructions


## HANDS-ON TESTING SECTION (Place after questions, separated)

After ALL questions, provide the following for each question:

  1. Test data setup (if needed) - minimal code to create sample data
  2. Hands-on test code (if applicable) - Python/SQL code to verify the answer

DON’T provide answers or explanations here; just the questions and code.


## ANSWERS SECTION (Place at end, separated)

After ALL questions, provide:

=== ANSWERS (Don't peek!) ===
[20+ blank lines to prevent accidental viewing]





















Question 1: [Letter] - [One-line explanation of why]
Question 2: [Letter] - [Brief explanation of the gotcha]
...

### Detailed Explanations:
[For each question, explain why the correct answer is right and why each wrong answer is wrong]

## Example Output Format

### Question 1

You need to incrementally load JSON files from cloud storage into a Delta table. New files arrive every 5 minutes. Which configuration provides the most efficient solution?

A) Use Auto Loader with cloudFiles.format("json") and trigger mode
B) Use Auto Loader with format("cloudFiles") and trigger mode
C) Use spark.readStream.format("json") with trigger(once=True)
D) Use Auto Loader with cloudFiles.format("json") and Trigger.Once()


[Continue with remaining questions…]


Test data setup:

# Create sample JSON files
import json
dbutils.fs.mkdirs("/tmp/test_autoloader")
data = [{"id": 1, "name": "test"}]
dbutils.fs.put("/tmp/test_autoloader/file1.json",
               json.dumps(data), overwrite=True)
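The setup above relies on `dbutils`, which only exists inside a Databricks runtime. If you want to stage the same test data outside Databricks, a plain-Python equivalent (a sketch; the `test_autoloader` directory name is reused from above, placed under the system temp dir) is:

```python
import json
import tempfile
from pathlib import Path

# Local stand-in for dbutils.fs: write the same sample record to a JSON file
base = Path(tempfile.gettempdir()) / "test_autoloader"
base.mkdir(parents=True, exist_ok=True)

data = [{"id": 1, "name": "test"}]
(base / "file1.json").write_text(json.dumps(data))

# Read it back to confirm the file round-trips
loaded = json.loads((base / "file1.json").read_text())
print(loaded)  # [{'id': 1, 'name': 'test'}]
```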

Hands-on test code:

# Test each option to see which syntax works
# Option A
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .load("/path/to/data"))

# Option B (will this work?)
df = (spark.readStream
      .format("cloudFiles")
      .format("json")  # Does this override?
      .load("/path/to/data"))

[Continue with test setup and code for all questions…]


[Insert 20+ blank lines here]

=== ANSWERS ===

Question 1: A - Auto Loader uses format("cloudFiles") with option("cloudFiles.format", "json"), not format("cloudFiles") alone
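For reference while grading the hands-on test, the correct pattern can be sketched in full (not runnable outside a Spark/Databricks session; all paths and the table name are placeholders):

```python
# Auto Loader: the source format is "cloudFiles"; the file format goes in an option
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/path/to/schema")  # lets Auto Loader track schema
      .load("/path/to/data"))

# In PySpark the one-shot trigger is trigger(once=True) or trigger(availableNow=True);
# Trigger.Once() is the Scala API, one plausible gotcha behind choice D
(df.writeStream
   .format("delta")
   .option("checkpointLocation", "/path/to/checkpoint")
   .trigger(availableNow=True)
   .table("target_table"))
```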

Why wrong answers fail:

[Continue with all answer explanations…]

Additional requirements for this session: