dbx-exam-guide

Ways to Study

Here are some of the different ways to start your exam studies. All of them should include some practical, hands-on activity, as it is the crux of success for DBX certification. Also keep in mind that, although more recent publications are more likely to cover content that is part of the current certification exam, slightly outdated material is still a very good starting point if you’re completely new to Databricks. It just shouldn’t be the be-all and end-all.

Watching the Study Courses

There is plenty of video material available online which can prepare you for the certification exams. You can find both official and unofficial courses on all major educational platforms, with some being interactive, and most of them being on-demand.

To start, there are a lot of detailed and up-to-date courses available from Databricks itself, on the Databricks Academy platform. They come in either on-demand or scheduled interactive form (live, with an instructor). These courses should be your go-to. If you work for a company that is in a partnership with Databricks, you should be able to access this material for free. If not, some courses might still be available to you without payment. If you opt to enroll in an interactive course, make sure to attend the classes, and don’t shy away from asking the instructor any questions you have.

There are also several on-demand courses on platforms such as Udemy and LinkedIn Learning. Links to some of the courses are available in the video courses section below. Always check whether the course you intend to enroll in is up to date: look at the date of the last update and make sure the course is not stale. If the course was last updated a year or more ago, it is safe to assume that it is mostly outdated.
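The one-year rule of thumb above can be turned into a quick check if you keep a list of candidate courses. This is only a minimal sketch: the course titles and dates are hypothetical placeholders, and treating "a year" as 365 days is an assumption you may want to adjust.

```python
from datetime import date
from typing import Optional

# Assumption: "a year or more" is approximated as 365 days.
STALE_AFTER_DAYS = 365


def is_stale(last_updated: date, today: Optional[date] = None) -> bool:
    """Return True if the material was last updated a year or more ago."""
    today = today or date.today()
    return (today - last_updated).days >= STALE_AFTER_DAYS


# Hypothetical course titles and update dates, for illustration only.
courses = {
    "Course A": date(2024, 1, 15),
    "Course B": date(2025, 3, 1),
}

# A fixed reference day keeps the example reproducible;
# omit the second argument to compare against the current date instead.
reference_day = date(2025, 6, 1)
for title, updated in courses.items():
    status = "likely outdated" if is_stale(updated, reference_day) else "probably current"
    print(f"{title}: last updated {updated.isoformat()} ({status})")
```

The same check applies to books and notebooks: swap the last-updated date for the publication date or the date of the last commit.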

Reading the Books and Study Guides

If reading books is more up your alley, there are several publications available from various publishers. You can find links to some of them in the books section below.

The benefits of reading books (whether printed or e-books) are the ability to truly study at your own pace, mark interesting or tricky sections, quickly compare strategies or results from different chapters, etc. Also, books tend to be slightly more detailed in their explanations of various processes.

The drawbacks are that the material can sometimes feel dense, and the detail can easily be mistaken for unnecessary prose or complexity, which can lure you into the trap of skipping some of the material to get yourself “unstuck”. The devil is in the details, so try to avoid this. Allocate time for chapters or sections and commit to them. You also need to actively keep yourself from reading for too long: allow yourself to “interrupt” the reading and switch to Databricks for some hands-on application of the studied material, as reading alone is not enough to absorb it. Also, keep in mind that books tend to become obsolete fast in today’s rapidly developing world. Always take note of the publication date and be proactive about determining the freshness of the study material.

Reading the Official Tutorials and Documentation

Sometimes the best way is to just dive into the tutorials and sift through the online documentation. If you already have solid experience with similar technologies (Python, SQL, Spark, notebooks, etc.), going head first into the raw documentation can let you “hack your way through DBX”. However, do keep in mind that the terminology, while functionally similar, might not always be named or formulated the same way as in other tools you have used. Because of that, always follow some form of tutorial first and then get the details from the docs.

Study Buddy

If you have a colleague or a friend who is also pursuing this certification, you can try studying together. The advantage of having a study buddy is that two heads are often better than one. You can complement each other’s notes, point out errors right away, clear up any uncertainties, do a Q&A, review each other’s progress, and so much more. It can also bring the psychological benefit of knowing that you’re not alone in this and that you have support.

If you take this path, the best way to keep track is to have a common curriculum and tackle it together. This, of course, very much depends on how much time you can allocate, but always aim to study synchronously, online or in person, instead of meeting occasionally for checkpoints.

Example Notebooks

Using pre-made, annotated notebooks is a great way to dive into Databricks through a hands-on approach. They allow you to quickly go through the material and execute example code as you go. An added benefit is that you can immediately tinker with the code and try out different scenarios or compare different approaches to solving a certain problem.

Most notebooks available online are not strictly official, but can be very well written and structured. That said, there are plenty of official notebooks available in Databricks Academy Labs, though they are closely tied to the curriculum of the Databricks Academy courses.

Alternatively, you can find many personal notebooks by people who were preparing for the exams and archived their study files and notebooks online. Searching GitHub for “databricks data engineering exam” should yield some good results. As with other sources, check the freshness of the material. Be extra careful when executing code from unofficial sources; be vigilant and use your best judgement. For safety purposes, this guide will not contain links to unofficial sources.

Using Generative AI

Using generative AI services like ChatGPT, Claude or Gemini (there are tons of others, of course) to generate a custom curriculum can be a good alternative or supplement to all of the methods listed above. In fact, there is a whole section about using generative AI available here, so this is just a tl;dr.

Utilising gen-AI services can save you a lot of time when it comes to organising your studies and your schedule, and can provide you with a useful framework that serves as more than just a starting point. You can use them to generate a curriculum, generate use-case examples, get quick answers if you’re stuck, create mock tests and so on; the limit is your imagination. Keep in mind that you can also use the Databricks Assistant inside your notebooks, even in the Free Edition.

However, using generative AI to study for a certification exam is not without its caveats. First and foremost, generative AI is an emerging technology which is still not reliable. The output you get from these services can be outdated, misleading, or just plain wrong. As such, any output should be taken with a grain of salt. Second, depending on the output quality and accuracy, as well as the model’s age, you might end up spending a lot of time learning the wrong things. Third, even if you notice the errors but still decide to salvage the generated output, working around them can take more mental energy than reading the official docs or not using gen-AI at all. If you don’t have experience in using gen-AI models and in parsing and scrutinising their output, a better bet would be to use other studying methods.

Official exam resources

Additional official resources

Video courses

Books