Visit the Official SkillCertPro Website:
For a full set of 550+ questions, go to
https://skillcertpro.com/product/databricks-data-engineer-professional-exam-questions/
SkillCertPro offers detailed explanations for each question, which helps you understand the concepts better.
It is recommended to score above 85% on SkillCertPro exams before attempting the real exam.
SkillCertPro updates exam questions every 2 weeks.
You will get lifetime access and lifetime free updates.
SkillCertPro assures a 100% pass guarantee on the first attempt.
Question 1:
A data engineer executes the following command to add a NOT NULL constraint on one of the columns of a Delta table that already contains null values.
ALTER TABLE universities ALTER COLUMN location SET NOT NULL;
Which of the following would be the outcome if, after executing the above command, the data engineer tries to add another null value to the location column of the Delta table?
A. The ALTER TABLE command will fail but new NULLS cannot be added to the location column.
B. Since the ALTER TABLE command will return an error, new NULLS can be added to the location column.
C. The ALTER TABLE command will be successfully executed but new NULLS can still be added to the location column.
D. The ALTER TABLE command will be successful and no new NULLS will be accepted in the location column.
E. ALTER TABLE command will be successful and all the previous and new NULL values will be dropped.
Answer: B
Explanation:
As the location column already contains NULL values, the ALTER TABLE command will return an error and the NOT NULL constraint will not be added to the column. This means that NULL values can still be added to the column.
Also note that whenever a constraint is added to an existing table, the data already in the table must satisfy that constraint. If the existing data violates it, the constraint cannot be added to the table.
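A minimal sketch of this behaviour, assuming a throwaway universities table with the location column from the question (the sample rows are made up):
# Create a Delta table that already contains a NULL in the location column
spark.sql("CREATE TABLE universities (name STRING, location STRING) USING DELTA")
spark.sql("INSERT INTO universities VALUES ('MIT', 'Cambridge'), ('Unknown U', NULL)")

# The constraint cannot be added because an existing row violates it
try:
    spark.sql("ALTER TABLE universities ALTER COLUMN location SET NOT NULL")
except Exception as e:
    print("Constraint not added:", e)

# Since the constraint was never added, new NULLs are still accepted
spark.sql("INSERT INTO universities VALUES ('Another U', NULL)")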
Static Notebook
To view all the notebooks in this course, download Notebooks.dbc and import it into your Databricks account.
More Info: Adding a NOT NULL constraint
Question 2:
An organization stores the salaries of its employees in the emp_sal Delta table. The requirement is to store both the last salary and the current salary of every employee without increasing the number of rows in the table. Which of the following table types can be selected to meet this requirement?
A. Type 0 SCD
B. Type 1 SCD
C. Type 2 SCD
D. Type 3 SCD
E. Type 4 SCD
Answer: D
Explanation:
Three of the most commonly used types of Slowly Changing Dimension (SCD) are:
1. Type 1 SCD – The old record is overwritten by the new record.
2. Type 2 SCD – The new record is appended to the table, while the old record is marked as inactive, either with an end_date or an active column.
3. Type 3 SCD – Changes are tracked by adding a column. A new column is added for the current value, while the original value is retained as well.
In this question, since the number of rows should not increase and the previous record should also be retained, Type 3 SCD fulfills the requirement: another column, current_salary, is added to store the current salary, while the original column stores the previous salary.
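As a rough sketch only, here is how such a Type 3 update could look with a Delta MERGE. The emp_sal table name comes from the question; the emp_id column, the sample data, and the assumption that the original salary column holds the previous value are illustrative.
from delta.tables import DeltaTable

emp_sal = DeltaTable.forName(spark, "emp_sal")
new_salaries = spark.createDataFrame([(1, 95000), (2, 72000)], ["emp_id", "new_salary"])

# Move the old current value into the original column and store the new value
# in current_salary -- the number of rows never changes.
(emp_sal.alias("t")
    .merge(new_salaries.alias("s"), "t.emp_id = s.emp_id")
    .whenMatchedUpdate(set={
        "salary": "t.current_salary",        # previous value kept in the original column
        "current_salary": "s.new_salary"})   # new value becomes the current salary
    .execute())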
More Info: Type 3 SCD
Question 3:
A new member has recently been added to a team of developers. The new member wants to run an existing notebook using %run magic command in their newly created notebook. What is the minimum notebook-level permission that can be granted to the new member allowing them to run the existing notebook?
A. No permissions are required
B. Can Read permission
C. Can Run permission
D. Can Edit permission
E. Can Manage permission
Answer: B
Explanation:
There are 5 levels of permissions for a notebook in Databricks:
1. No permissions
2. Can Read – Can view the cells of the notebook, add comments and run the notebook using the %run magic command or the notebook workflows.
3. Can Run – All the permissions stated in Can Read plus can attach or detach notebooks from a cluster and run the cells of the notebook.
4. Can Edit – All the permissions in Can Run and can also edit the cells.
5. Can Manage – All the permissions stated in Can Edit and also the ability to change the permissions of the notebook.
So, Can Read is the minimum notebook-level permission that allows the new member to run the existing notebook using the %run magic command.
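As a small illustration, either of the following works once the new member has at least Can Read on the existing notebook (the notebook path is a made-up example):
# Option 1: inline execution with the %run magic command (must be alone in its own cell)
# %run ./existing_notebook

# Option 2: notebook workflows, which Can Read also permits
result = dbutils.notebook.run("./existing_notebook", 60)  # 60-second timeout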
More Info: Notebook-level permissions in Databricks
Question 4:
Two data engineers try to print the defects_df DataFrame using the show() and display() methods respectively. Which of the following statements describes the difference between the output formats displayed to the two data engineers?
A. The show() method displays the DataFrame in a tabular format but the display() method prints only the column names without any data.
B. The display() method can be used to visualize the DataFrame in the form of charts, graphs etc.
C. The show() method can only be used in Databricks.
D. Running the display() method converts a DataFrame to an RDD.
E. Both B and D are true.
Answer: B
Explanation:
Both show() and display() methods can be used to print a DataFrame in Databricks. While working in Databricks, the display() method enables visualization over a DataFrame whereas the show() method does not.
Also note, the show() method supports arguments like truncate and vertical to view the data in different forms.
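A quick illustration with a toy defects_df (the DataFrame name comes from the question; the columns and rows are invented):
defects_df = spark.createDataFrame(
    [(1, "scratch", 3), (2, "dent", 7)],
    ["defect_id", "defect_type", "count"])

# Plain, console-style text output; available in any Spark environment
defects_df.show(truncate=False)

# Databricks-specific: renders an interactive table with built-in chart and graph options
display(defects_df)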
Static Notebook
To view all the notebooks in this course, download Notebooks.dbc and import it into your Databricks account.
More Info: display() method in Databricks | show() method in Spark
Question 5:
A team of data engineers is working in their respective notebooks attached to a common cluster. While creating the cluster, the admin updated the Spark config properties by adding spark.sql.autoBroadcastJoinThreshold 100b. One of the data engineers runs the following command in their notebook named show_orders:
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", '1024b')
Another data engineer from the team, who is working on a notebook named fetch_results runs the following commands:
%run ./show_orders
spark.sparkContext.getConf().get("spark.sql.autoBroadcastJoinThreshold")
Assuming all commands run successfully, which of the following statements explains the output?
A. The output will be '1024b' as the config property defined in the notebook prevails over the config property set during the cluster creation.
B. The output will be '100b' because the config property was changed in another notebook.
C. The output will be '100b' as the config property defined during the cluster creation cannot be altered at the context level in the notebook.
D. There will be no output because there is no print statement.
E. The output will be None because the default value of the config property is None.
Answer: C
Explanation:
Once a config property is defined during cluster creation, it cannot be altered at the context level. To change such a config property at the context level, you will need to edit the cluster's Spark properties using the cluster UI.
IMPORTANT – If you run spark.conf.get("spark.sql.autoBroadcastJoinThreshold") after running spark.conf.set("spark.sql.autoBroadcastJoinThreshold", '1024b'), the result will be '1024b' because this change is made at the notebook level. But once you read the config property using the getConf() method of sparkContext (as asked in the question), the output will be '100b'.
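A small sketch of the two lookup paths, assuming the cluster was created with spark.sql.autoBroadcastJoinThreshold 100b in its Spark config:
# Session (SQL) configuration: reflects the value set in the notebook
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", '1024b')
print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))                     # '1024b'

# SparkContext configuration: fixed when the cluster was created
print(spark.sparkContext.getConf().get("spark.sql.autoBroadcastJoinThreshold"))   # '100b'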
Static Notebook
To view all the notebooks in this course, download Notebooks.dbc and import it into your Databricks account.
More Info: Spark Config properties in Databricks Clusters
Question 6:
A streaming query has been started while the target table already contains some records. Which output mode would be selected for the query given that the output mode is not specified by the data engineer?
A. As the output mode is mandatory, the query will fail.
B. The complete mode will be auto-selected.
C. The default output mode, i.e., update mode, will be selected.
D. As the output mode is omitted, the append mode will be selected.
E. The output mode depends on the type of data being loaded.
Answer: D
Explanation:
Output mode is specified using .outputMode() in a Spark streaming query. There are three types of output modes supported in a Spark Streaming query:
1. Complete Mode: The target table is overwritten with the entire updated result of the query after every trigger.
2. Append Mode: Only the new records are added to the target table.
3. Update Mode: Only the rows that have changed since the last micro-batch are updated in the target table.
Also, note that the default mode is append mode. So, if the output mode is not specified, the query will not fail and append mode will be selected automatically.
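A minimal sketch of a streaming write that leaves the output mode unspecified (the table names and checkpoint path are placeholders):
streaming_df = spark.readStream.table("raw_events")               # placeholder source table

query = (streaming_df.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/raw_events")  # placeholder path
    # no .outputMode() call, so the default -- append -- is used
    .toTable("events_target"))                                    # placeholder target table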
Static Notebook
To view all the notebooks in this course, download Notebooks.dbc and import it into your Databricks account.
More Info: Output modes in Spark streaming | Default output mode in Spark streaming
Question 7:
A streaming application is using AutoLoader to ingest new files as they arrive in an S3 location. To infer the schema, AutoLoader uses the first 50 GB of data or the first 1000 files, whichever limit is reached first. Which of the following configurations should be changed to set the default to 500 files for all future queries using AutoLoader?
A. spark.databricks.cloudFiles.schemaInference.sampleSize.numFiles
B. spark.sql.cloudFiles.schemaInference.sampleFileSize.numBytes
C. spark.sql.cloudFiles.schemaInference.sampleSize.numFiles
D. spark.databricks.cloudFiles.schemaInference.sampleFileSize.numBytes
E. spark.databricks.sql.cloudFiles.schemaInference.sampleSize.numFiles
Answer: A
Explanation:
To reiterate the learning from the question itself, AutoLoader uses the first 50 GB or the first 1000 files to infer the schema, whichever condition is met first.
For example, if there are 400 files with a collective file size of 50 GB, AutoLoader will infer the schema from those 400 files. Alternatively, if there are a total of 3400 files that collectively total 40 GB, AutoLoader will infer the schema from the first 1000 files only.
These default settings can be changed using spark configurations. There are two types of configurations that can be used to change the sample size to infer the schema:
1. spark.databricks.cloudFiles.schemaInference.sampleSize.numBytes – It can be used to set the limit on the total size, which is 50 GB by default.
2. spark.databricks.cloudFiles.schemaInference.sampleSize.numFiles – This is what is asked in the question. It is used to limit the number of files to infer the schema, which is 1000 by default.
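A sketch of lowering the file-count limit to the 500 files asked about in the question; the config name is the one from the explanation above, while the stream options, file format, and paths are illustrative assumptions:
# Lower the sampling limit before starting any AutoLoader streams
spark.conf.set("spark.databricks.cloudFiles.schemaInference.sampleSize.numFiles", "500")

# Streams started afterwards sample at most 500 files (or 50 GB,
# whichever limit is reached first) when inferring the schema.
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")                          # assumed file format
    .option("cloudFiles.schemaLocation", "/tmp/schemas/events")   # placeholder path
    .load("s3://my-bucket/events/"))                              # placeholder S3 location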
More Info: Auto Loader schema inference
Question 8:
A data engineer needs to create a function named print_details that prints the name of the current catalog and database separated by a space. Which of the following statements can be used to create the function?
A. CREATE FUNCTION print_details() RETURNS concat(current_catalog(),' ',current_database());
B. CREATE FUNCTION print_details(STRING) RETURNS concat(current_catalog(),' ',current_database());
C. CREATE FUNCTION print_details RETURNS STRING RETURN concat(current_catalog(),' ',current_database());
D. CREATE FUNCTION print_details(STRING) RETURN concat(current_catalog(),' ',current_database());
E. CREATE FUNCTION print_details() RETURNS STRING RETURN concat(current_catalog(),' ',current_database());
Answer: E
Explanation:
This is a tricky question, as the RETURN and RETURNS keywords create confusion. The RETURNS keyword specifies the data type of the return value, such as STRING or INT, whereas the RETURN keyword introduces the expression whose value the function returns. Also, note that the function name must be followed by empty parentheses if the function does not accept any arguments, i.e., print_details().
Knowing this distinction makes these types of questions easy to answer in the actual exam.
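A quick sketch of creating and calling the function from the correct option, wrapped in spark.sql() so it can be run from a Python notebook:
spark.sql("""
    CREATE FUNCTION print_details()
    RETURNS STRING
    RETURN concat(current_catalog(), ' ', current_database())
""")

# Prints something like 'spark_catalog default', depending on the current context
spark.sql("SELECT print_details()").show(truncate=False)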
Static Notebook
To view all the notebooks in this course, download Notebooks.dbc and import it into your Databricks account.
More Info: Create a user-defined SQL function | current_catalog() function | current_database() function
Question 9:
Three different types of views available in Databricks, i.e., a view, a temporary view, and a global temporary view, are being used in a single notebook. Which of the following views will be accessible if the notebook is detached and re-attached to the same cluster?
A. All the 3 views will be accessible.
B. Only the global temp view will be accessible.
C. None of the views will be accessible.
D. All the views except the temporary view will be accessible.
E. Temporary and global temporary views will not be accessible.
Answer: D
Explanation:
Let us take a look at each of the views one by one:
1. View – It persists through multiple sessions and thus, will be accessible even after the notebook is detached and re-attached.
2. Temporary view – As the name suggests, the temporary views are tied to a session. So, these types of views will not be accessible once the notebook is detached from the cluster.
3. Global temporary view – They are stored in a temporary database called global_temp and remain accessible even if the notebook is detached from and re-attached to the cluster.
Hence, only the temporary view is lost once the notebook is detached and re-attached.
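A short sketch creating each of the three view types over a hypothetical sales table (all names are illustrative):
df = spark.table("sales")  # assumed existing table

# Persistent view: stored in the metastore, survives detach/re-attach
spark.sql("CREATE VIEW sales_view AS SELECT * FROM sales")

# Temporary view: tied to the current session, lost when the notebook is detached
df.createOrReplaceTempView("sales_temp_view")

# Global temporary view: stored in the global_temp database, lives as long as the cluster
df.createOrReplaceGlobalTempView("sales_global_view")

# Global temp views must be referenced through the global_temp database
spark.sql("SELECT * FROM global_temp.sales_global_view").show()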
Static Notebook
To view all the notebooks in this course, download Notebooks.dbc and import it into your Databricks account.
More Info: Types of Views in Databricks
Question 10:
The data engineering team has a secret scope named “prod-scope” that contains sensitive secrets in a production workspace.
A data engineer in the team is writing a security and compliance documentation, and wants to explain who could use the secrets in this secret scope.
Which of the following roles is able to use the secrets in the specified secret scope?
A. Workspace Administrators
B. Secret creators
C. Users with MANAGE permission on the secret scope
D. Users with READ permission on the secret scope
E. All the above are able to use the secrets in the secret scope
Answer: E
Explanation:
Administrators*, secret creators, and users granted access permission can use Databricks secrets. The secret access permissions are as follows:
MANAGE – Allowed to change ACLs, and read and write to this secret scope.
WRITE – Allowed to read and write to this secret scope.
READ – Allowed to read this secret scope and list what secrets are available.
Each permission level is a subset of the previous level’s permissions (that is, a principal with WRITE permission for a given scope can perform all actions that require READ permission).
* Workspace administrators have MANAGE permissions to all secret scopes in the workspace.
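As a brief illustration, a user with at least READ on the scope could consume a secret like this (the scope name comes from the question; the key name is a placeholder):
# List the secrets available in the scope (READ permits listing)
dbutils.secrets.list("prod-scope")

# Retrieve a secret value; the value is redacted if displayed in a notebook
db_password = dbutils.secrets.get(scope="prod-scope", key="db-password")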
Reference:
https://docs.databricks.com/security/secrets/index.html
https://docs.databricks.com/security/auth-authz/access-control/secret-acl.html#permission-levels