Visit the Official SkillCertPro Website:
For the full set of 757 questions, go to
https://skillcertpro.com/product/databricks-certified-data-engineer-associate-exam-questions/
SkillCertPro offers detailed explanations for each question, which helps you understand the concepts better.
It is recommended to score above 85% in SkillCertPro practice exams before attempting the real exam.
SkillCertPro updates exam questions every 2 weeks.
You will get lifetime access and lifetime free updates.
SkillCertPro offers a 100% first-attempt pass guarantee.
Question 1:
A company has launched a new feature in its mobile application. The team of data analysts is starting a marketing campaign to monitor the KPIs of the new feature for the first 7 days. They have set up a Databricks SQL dashboard that is automatically refreshed every hour.
Which of the following options can be used to reduce the cost of this dashboard's refresh schedule over time?
A. Manually refresh the dashboard every hour instead of setting an automatic refresh schedule
B. Temporarily pause scheduled dashboard updates when it has been refreshed recently
C. Increase the cluster size of the SQL endpoint
D. Increase the maximum bound of the SQL endpoint’s scaling range
E. Stop automatically updating the dashboard after one week to reduce the cost of SQL warehouse
Answer: E
Explanation:
Stop automatically updating a dashboard
If a dashboard is configured for automatic updates, it has a Scheduled button at the top, rather than a Schedule button. To stop automatically updating the dashboard and remove its subscriptions:
1. Click Scheduled.
2. In the Refresh every drop-down, select Never.
3. Click Save. The Scheduled button label changes to Schedule.
Question 2:
A data analyst has created a query in Databricks SQL to analyze key monthly metrics for her e-commerce business, such as the number of orders placed, the number of repeat customers, total revenue generated, and total profit made, across multiple geographical regions. She wants to see the results dynamically for different states in the country based on the value selected.
Which of the following is the best approach that can be used by the data analyst to complete this task?
A. Create a view for every state and then query it to get the data directly for that particular state
B. Create a SQL UDF and pass the parameter state at runtime to get the data for that state
C. Use a Dropdown List with the single-value select option as a query parameter and select any value of State from that list to substitute into the query at runtime for getting the desired results
D. Manually provide the value of state in the where clause of query and then run it to get the corresponding data
E. This type of querying to filter the data at runtime is not possible in Databricks
Answer: C
Explanation:
Query parameters
A query parameter lets you substitute values into a query at runtime. Any string between double curly braces {{ }} is treated as a query parameter. A widget appears above the results pane where you set the parameter value.
Query parameter types
Text
Number
Dropdown List
Query Based Dropdown List
Date and Time
Dropdown List
To restrict the scope of possible parameter values when running a query, you can use the Dropdown List parameter type. An example would be SELECT * FROM users WHERE name = '{{ dropdown_param }}'. When selected from the parameter settings panel, a text box appears where you can enter your allowed values, each one separated by a new line. Dropdown lists are Text parameters, so if you want to use dates or dates and times in your Dropdown List, you should enter them in the format your data source requires. The strings are not escaped. You can choose between a single-value or multi-value dropdown.
Single value: Single quotes around the parameter are required.
Multi-value: Toggle the Allow multiple values option. In the Quotation drop-down, choose whether to leave the parameters unquoted or wrap them in single or double quotes. If you choose quotes, you don't need to add quotes around the parameter in the query.
In your query, change your WHERE clause to use the IN keyword.
SELECT …
FROM …
WHERE field IN ( {{ Multi Select Parameter }} )
The parameter multi-selection widget lets you pass multiple values to the database.
So, the correct answer is: "Use a Dropdown List with the single-value select option as a query parameter and select any value of State from that list to substitute into the query at runtime for getting the desired results."
Question 3:
A data engineering team has created a Databricks Job with 5 tasks in a linear dependency, and it takes around 1 hour to run. One day, the job suddenly failed: the first 3 tasks succeeded, but an issue in the 4th task caused the failure, and as a result the 5th task was skipped.
Which of the following steps can they perform to repair this job run successfully while reducing execution time and compute resources?
A. They can manually execute the code for the last 2 tasks to repair the job run
B. They can repair this job so that all the 5 tasks will be re-run
C. They can repair this job so that only the last 2 tasks will be re-run
D. They need to delete the failed job run and trigger a new run for this job
E. They can keep the failed job run and simply start a new run for the job
Answer: C
Explanation:
Repair an unsuccessful job run
You can repair failed or canceled multi-task jobs by running only the subset of unsuccessful tasks and any dependent tasks. Because successful tasks and any tasks that depend on them are not re-run, this feature reduces the time and resources required to recover from unsuccessful job runs.
You can change job or task settings before repairing the job run. Unsuccessful tasks are re-run with the current job and task settings. For example, if you change the path to a notebook or a cluster setting, the task is re-run with the updated notebook or cluster settings.
To repair an unsuccessful job run:
Click Jobs in the sidebar.
In the Name column, click a job name. The Runs tab shows active runs and completed runs, including any unsuccessful runs.
Click the link for the unsuccessful run in the Start time column of the Completed Runs (past 60 days) table. The Job run details page appears.
Click Repair run. The Repair job run dialog appears, listing all unsuccessful tasks and any dependent tasks that will be re-run.
To add or edit parameters for the tasks to repair, enter the parameters in the Repair job run dialog. Parameters you enter in the Repair job run dialog override existing values. On subsequent repair runs, you can return a parameter to its original value by clearing the key and value in the Repair job run dialog.
Click Repair run in the Repair job run dialog.
So, the correct answer is: "They can repair this job so that only the last 2 tasks will be re-run."
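Note: a repair can also be triggered programmatically through the Jobs REST API. The following is only a minimal sketch, assuming placeholder values for the workspace URL, token, run ID, and task keys, showing how only the two unsuccessful tasks could be re-run:

import requests

host = "https://<workspace-instance>"      # placeholder workspace URL
token = "<personal-access-token>"          # placeholder token

response = requests.post(
    f"{host}/api/2.1/jobs/runs/repair",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "run_id": 123456,                     # ID of the failed job run (placeholder)
        "rerun_tasks": ["task_4", "task_5"],  # re-run only the failed task and the skipped task
    },
)
response.raise_for_status()
print(response.json())                        # contains the repair_id of this repair attempt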
Question 4:
A senior data analyst called James is currently the owner of the marketing schema. As part of an upcoming social media marketing campaign, Michael wants to create a new table called digital_marketing inside that schema. James granted the CREATE TABLE privilege to Michael, and Michael then created the table for his work.
Who is the owner of the new table digital_marketing?
A. James and Michael both are owners of new table called digital_marketing
B. James is the owner of marketing schema. So, all the tables in this schema will be owned by him.
C. Michael is the owner of the digital_marketing table.
D. The workspace admin is the owner of the new table digital_marketing by default.
E. The account admin is the owner of the new table digital_marketing by default.
Answer: C
Explanation:
Manage Unity Catalog Object Ownership
Each securable object in Unity Catalog has an owner. The owner can be any account-level user or group, called a principal.
The principal that creates an object becomes its initial owner. An object's owner has all privileges on the object, such as SELECT and MODIFY on a table, in addition to the permission to grant privileges to other principals.
Object ownership can be transferred to other principals by either the current owner, a metastore admin, or the owner of the catalog or schema that contains the table.
So, the correct answer is: "Michael is the owner of the digital_marketing table."
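As a rough illustration only (the catalog, schema, table columns, and user names below are made up), the flow in this scenario could look like the following when run from a notebook on Unity Catalog-enabled compute:

# Run by James, the schema owner: allow Michael to create tables in the schema
# (Michael also needs USE CATALOG on the parent catalog)
spark.sql("GRANT USE SCHEMA, CREATE TABLE ON SCHEMA main.marketing TO `michael@example.com`")

# Run by Michael: because he creates the table, he becomes its initial owner
spark.sql("""
    CREATE TABLE main.marketing.digital_marketing (
        campaign_id STRING,
        channel     STRING,
        clicks      BIGINT
    )
""")

# The owner appears in the table details; ownership can later be transferred if needed
spark.sql("DESCRIBE TABLE EXTENDED main.marketing.digital_marketing").show(truncate=False)
spark.sql("ALTER TABLE main.marketing.digital_marketing OWNER TO `james@example.com`")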
Question 5:
Mike is currently the owner of the sales table. He has granted Ross the privilege to create a view called regional_sales_vw on it. After the view has been created, David asks Ross to grant him the privilege to query the data from the view. Ross gave him the privileges, but when David tries to access the view, he gets an error saying he does not have the SELECT privilege. Which of the following needs to be done so that David can query the view?
A. David needs to be the owner of regional_sales_vw to access it
B. Ross needs to take the transfer of ownership of sales table from Mike
C. Mike needs to provide SELECT privilege to David on sales table
D. David needs to be the owner of sales table in order to access regional_sales_vw
E. Ross needs to provide SELECT privilege to David on sales table
Answer: C
Explanation:
Let's figure out the correct answer for this question.
A user has SELECT privileges on a view of table T, but when that user tries to SELECT from that view, they get the error User does not have privilege SELECT on table.
This common error can occur for one of the following reasons:
Table T has no registered owner because it was created using a cluster or SQL warehouse for which table access control is disabled.
The grantor of the SELECT privilege on a view of table T is not the owner of table T, or the user does not also have the SELECT privilege on table T.
Suppose there is a table T owned by A. A owns view V1 on T and B owns view V2 on T.
A user can select on V1 when A has granted SELECT privileges on view V1.
A user can select on V2 when A has granted SELECT privileges on table T and B has granted SELECT privileges on V2.
In our scenario, Ross has granted the SELECT privilege on regional_sales_vw to David, but David does not have the SELECT privilege on the sales table. In order to query the view, David needs the SELECT privilege on both the table and the view.
So, the correct answer is: "Mike needs to provide SELECT privilege to David on sales table."
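A minimal sketch of the grants involved (the object and user names are illustrative; the TABLE securable is used for the view as well):

# Run by Mike, the owner of the sales table: David needs SELECT on the underlying table
spark.sql("GRANT SELECT ON TABLE sales TO `david@example.com`")

# Run by Ross, the owner of the view: David also needs SELECT on the view itself
spark.sql("GRANT SELECT ON TABLE regional_sales_vw TO `david@example.com`")

# David can now query the view
spark.sql("SELECT * FROM regional_sales_vw").show()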
Question 6:
What is the normalized unit of processing power on the Databricks Lakehouse Platform used for measurement and pricing purposes of different workloads?
A. Databricks Normalized Unit (DNU)
B. Databricks Lakehouse Platform Unit (DLPU)
C. Databricks Unit (DBU)
D. Databricks Cluster Unit (DCU)
E. Databricks Standard Unit (DSU)
Answer: C
Explanation:
A Databricks Unit (DBU) is a normalized unit of processing power on the Databricks Lakehouse Platform used for measurement and pricing purposes. The number of DBUs a workload consumes is driven by processing metrics which may include the compute resources used and the amount of data processed. For example, 1 DBU is the equivalent of Databricks running on an i3.xlarge machine with the Databricks 8.1 standard runtime for an hour.
So, the correct answer is: Databricks Unit (DBU)
Question 7:
What is the underlying mechanism used by the Auto Loader to incrementally and efficiently process new data files as they arrive in cloud storage?
A. Delta Live Tables
B. Databricks SQL
C. Multi-hop Architecture
D. COPY INTO
E. Structured Streaming
Answer: E
Explanation:
How does Auto Loader work?
Auto Loader provides a Structured Streaming source called cloudFiles. Given an input directory path on the cloud file storage, the cloudFiles source automatically processes new files as they arrive, with the option of also processing existing files in that directory.
You can use Auto Loader to process billions of files to migrate or backfill a table. Auto Loader scales to support near real-time ingestion of millions of files per hour.
How does Auto Loader track ingestion progress?
As files are discovered, their metadata is persisted in a scalable key-value store (RocksDB) in the checkpoint location of your Auto Loader pipeline. This key-value store ensures that data is processed exactly once.
In case of failures, Auto Loader can resume from where it left off using information stored in the checkpoint location, and continue to provide exactly-once guarantees when writing data into Delta Lake. You don't need to maintain or manage any state yourself to achieve fault tolerance or exactly-once semantics.
So, the correct answer is Structured Streaming.
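A minimal PySpark sketch of Auto Loader (the paths and table name are made up; spark is the session predefined in Databricks notebooks):

# Incrementally ingest new JSON files that land in a cloud storage directory
raw_stream = (
    spark.readStream
    .format("cloudFiles")                                             # Auto Loader source
    .option("cloudFiles.format", "json")                              # format of the incoming files
    .option("cloudFiles.schemaLocation", "/mnt/demo/_schemas/orders") # where the inferred schema is tracked
    .load("/mnt/demo/landing/orders")                                 # input directory being monitored
)

(
    raw_stream.writeStream
    .option("checkpointLocation", "/mnt/demo/_checkpoints/orders")    # RocksDB file state lives here
    .trigger(availableNow=True)                                       # process available files, then stop
    .toTable("bronze_orders")                                         # write into a Delta table
)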
Question 8:
A data engineer has set up a Delta Live Tables pipeline that updates all the tables once and then stops. The compute resources of the pipeline persist to allow for additional testing.
Which of the following options correctly describes the execution modes of this DLT pipeline?
A. DLT pipeline is configured to run in Development mode using the Continuous Pipeline mode.
B. DLT pipeline is configured to run in Production mode using the Continuous Pipeline mode.
C. DLT pipeline is configured to run in Development mode using the Triggered Pipeline mode.
D. DLT pipeline is configured to run in Production mode using the Triggered Pipeline mode.
E. Cannot infer the execution mode of DLT Pipeline from the above information
Answer: C
Explanation:
Let's try to understand the correct answer for this question by looking at different execution modes.
Development Mode: When you run your pipeline in development mode, the Delta Live Tables system does the following.
Reuses a cluster to avoid the overhead of restarts. By default, clusters run for two hours when development mode is enabled. You can change this with the pipelines.clusterShutdown.delay setting in the Cluster configuration.
Disables pipeline retries so you can immediately detect and fix errors.
Triggered Pipeline: Triggered pipelines update each table with whatever data is currently available and then stop the cluster running the pipeline. Delta Live Tables automatically analyzes the dependencies between your tables and starts by computing those that read from external sources. Tables within the pipeline are updated after their dependent data sources have been updated.
So, the correct answer is: DLT pipeline is configured to run in Development mode using the Triggered Pipeline mode.
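These two modes are pipeline settings rather than notebook code. As an illustrative sketch only (the pipeline name and notebook path are hypothetical), the relevant flags in a pipeline's settings, shown here as a Python dictionary rather than the raw JSON, would look roughly like this:

pipeline_settings = {
    "name": "feature_kpi_pipeline",                            # hypothetical pipeline name
    "development": True,                                       # Development mode: cluster reused, retries disabled
    "continuous": False,                                       # False = Triggered mode: update once, then stop
    "libraries": [{"notebook": {"path": "/Repos/demo/dlt_kpis"}}],
}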
Question 9:
What are the output modes available in Spark Structured Streaming?
A. Append, Complete, Merge
B. Merge, Overwrite, Append
C. Merge, Update, Complete
D. Overwrite, Append, Update
E. Append, Complete, Update
Answer: E
Explanation:
Let's try to find out the correct answer for this question.
Output Modes
Append mode (default) – This is the default mode, where only the new rows added to the Result Table since the last trigger will be outputted to the sink. This is supported only for those queries where rows added to the Result Table are never going to change. Hence, this mode guarantees that each row will be output only once (assuming a fault-tolerant sink). For example, queries with only select, where, map, flatMap, filter, join, etc. will support Append mode.
Complete mode – The whole Result Table will be outputted to the sink after every trigger. This is supported for aggregation queries.
Update mode – (Available since Spark 2.1.1) Only the rows in the Result Table that were updated since the last trigger will be outputted to the sink.
So, the correct answer is Append, Complete, Update
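As a small PySpark sketch (the source table, sink table, and checkpoint path are hypothetical), the output mode is set on the streaming writer:

events = spark.readStream.table("bronze_events")            # any streaming DataFrame

counts = events.groupBy("event_type").count()               # aggregation, so complete/update modes apply

query = (
    counts.writeStream
    .outputMode("complete")                                  # one of "append", "complete", "update"
    .option("checkpointLocation", "/mnt/demo/_checkpoints/event_counts")
    .toTable("gold_event_counts")
)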
Question 10:
Which of the following data workloads will utilize a Bronze table as its source?
A. A job that ingests raw data from a streaming source into the Lakehouse
B. A job that develops a feature set for a machine learning application
C. A job that aggregates cleaned data to create standard summary statistics
D. A job that cleanses data by removing junk characters from its columns and parses them into a human-readable format
E. A job that queries aggregated data to publish key insights into a dashboard
Answer: D
Explanation:
Let's try to understand the answer to this question.
Bronze Tables
Bronze tables contain raw data ingested from various sources, both batch and streaming. For example: JSON files, RDBMS data, IoT data, Kafka data, etc. Bronze tables serve as the data lake, where massive amounts of data come in continuously. When it arrives, it is dirty because it comes from different sources, some of which are not so structured. It's basically a place where data can be captured and retained in its rawest form.
Silver Tables
Silver tables provide a more refined view of our data. We can cleanse the data by removing unwanted characters from fields, select required fields and join fields from various bronze tables to enrich streaming records, or update account statuses based on recent activity.
So, the correct answer is: "A job that cleanses data by removing junk characters from its columns and parses them into a human-readable format."
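A small PySpark sketch of such a Bronze-to-Silver cleansing job (the table and column names are made up):

from pyspark.sql.functions import regexp_replace, to_date, col

bronze = spark.read.table("bronze_orders")                   # raw ingested data: the Bronze source

silver = (
    bronze
    .withColumn("customer_name",
                regexp_replace(col("customer_name"), "[^A-Za-z ]", ""))  # strip junk characters
    .withColumn("order_date", to_date(col("order_ts")))      # parse into a human-readable date
    .dropDuplicates(["order_id"])
)

silver.write.mode("overwrite").saveAsTable("silver_orders")  # refined Silver table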