Visit the official SkillCertPro website:
For the full set of 760+ questions, go to
https://skillcertpro.com/product/databricks-certified-data-engineer-associate-exam-questions/
SkillCertPro offers a detailed explanation for each question, which helps you understand the concepts better.
It is recommended to score above 85% on SkillCertPro exams before attempting the real exam.
SkillCertPro updates exam questions every 2 weeks.
You will get lifetime access and lifetime free updates.
SkillCertPro assures a 100% pass guarantee on the first attempt.
Question 1:
How does Delta Lake differ from a traditional data lake?
A. Delta Lake is a data warehouse service on top of a data lake that provides reliability, security, and performance
B. Delta Lake is a caching layer on top of a data lake that provides reliability, security, and performance
C. Delta Lake is an open storage format, like Parquet, with additional capabilities that provide reliability, security, and performance
D. Delta Lake is an open storage format designed to replace flat files, with additional capabilities that provide reliability, security, and performance
E. Delta Lake is proprietary software designed by Databricks that provides reliability, security, and performance
Answer: C
Explanation:
What is Delta Lake?
Delta Lake is:
· Open source
· Built on standard data formats
· Optimized for cloud object storage
· Built for scalable metadata handling
Delta Lake is not:
· Proprietary technology
· A storage format
· A storage medium
· A database service or data warehouse
Question 2:
How can the VACUUM and OPTIMIZE commands be used to manage a Delta Lake table?
A. The VACUUM command can be used to compact small Parquet files, and the OPTIMIZE command can be used to delete Parquet files that are marked for deletion/unused.
B. The VACUUM command can be used to delete empty/blank Parquet files in a Delta table, and the OPTIMIZE command can be used to update stale statistics on a Delta table.
C. The VACUUM command can be used to compress the Parquet files to reduce the size of the table, and the OPTIMIZE command can be used to cache frequently used Delta tables for better performance.
D. The VACUUM command can be used to delete empty/blank Parquet files in a Delta table, and the OPTIMIZE command can be used to cache frequently used Delta tables for better performance.
E. The VACUUM command can be used to delete Parquet files that are marked for deletion/unused, and the OPTIMIZE command can be used to compact small Parquet files.
Answer: E
Explanation:
VACUUM:
You can remove files that are no longer referenced by a Delta table and that are older than the retention threshold by running the VACUUM command on the table. VACUUM is not triggered automatically. The default retention threshold for the files is 7 days. To change this behavior, see Configure data retention for time travel.
OPTIMIZE:
OPTIMIZE compacts small data files on Delta Lake, which can improve the speed of read queries on the table; too many small files can significantly degrade query performance.
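As a minimal sketch (the table name sales is hypothetical), both commands can be run from a notebook:

# Compact many small Parquet files into fewer, larger ones to speed up reads
spark.sql("OPTIMIZE sales")

# Remove files no longer referenced by the table and older than the retention
# threshold (default 7 days; RETAIN overrides it, subject to the safety check)
spark.sql("VACUUM sales RETAIN 168 HOURS")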
Question 3:
Which of the following SQL statements can be used to update a transactions table, setting a flag on the table from Y to N?
A. MODIFY transactions SET active_flag = 'N' WHERE active_flag = 'Y'
B. MERGE transactions SET active_flag = 'N' WHERE active_flag = 'Y'
C. UPDATE transactions SET active_flag = 'N' WHERE active_flag = 'Y'
D. REPLACE transactions SET active_flag = 'N' WHERE active_flag = 'Y'
Answer: C
Explanation:
The answer is:
UPDATE transactions SET active_flag = 'N' WHERE active_flag = 'Y'
Delta Lake supports UPDATE statements on Delta tables, and all changes made as part of the update are ACID compliant.
Question 4:
You were asked to identify the number of times a temperature sensor exceeded a threshold temperature (100.00) for each device. Each row contains 5 readings collected every 5 minutes. Fill in the blanks with the appropriate functions.
Schema: deviceId INT, deviceTemp ARRAY<DOUBLE>, dateTimeCollected TIMESTAMP
SELECT deviceId, __(__(__(deviceTemp, i -> i > 100.00)))
FROM devices
GROUP BY deviceId
A. SUM, COUNT, SIZE
B. SUM, SIZE, SLICE
C. SUM, SIZE, ARRAY_CONTAINS
D. SUM, SIZE, ARRAY_FILTER
E. SUM, SIZE, FILTER
Answer: E
Explanation:
The FILTER function can be used to filter an array based on an expression.
The SIZE function can be used to get the size of an array.
SUM is used to calculate the total per device.
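Putting the three functions together, the completed query (run here via PySpark against the hypothetical devices table) is:

# FILTER keeps only readings above 100.00, SIZE counts them per row,
# and SUM totals those per-row counts across all rows for each device
spark.sql("""
    SELECT deviceId,
           SUM(SIZE(FILTER(deviceTemp, i -> i > 100.00))) AS times_exceeded
    FROM devices
    GROUP BY deviceId
""").show()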
Question 5:
You are currently looking at a table that contains data from an e-commerce platform. Each row contains a list of items (item numbers) that were present in the cart; when the customer makes a change to the cart, the entire cart is saved as a separate list and appended to the existing list for the duration of the customer session. To identify all the items the customer bought, you have to build a unique list of items. Fill in the blanks of the query below by choosing the appropriate higher-order functions.
Schema: cartId INT, items ARRAY<ARRAY<INT>>
Fill in the blanks: SELECT cartId, __(__(items)) FROM carts
A. ARRAY_UNION, ARRAY_DISTINCT
B. ARRAY_DISTINCT, ARRAY_UNION
C. ARRAY_DISTINCT, FLATTEN
D. FLATTEN, ARRAY_DISTINCT
E. ARRAY_DISTINCT, ARRAY_FLATTEN
Answer: C
Explanation:
FLATTEN -> Transforms an array of arrays into a single array.
ARRAY_DISTINCT -> Returns an array of the same type as the input argument with all duplicate values removed.
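For illustration, the completed query (run here via PySpark against the hypothetical carts table) is:

# FLATTEN collapses the array of arrays into a single array,
# then ARRAY_DISTINCT removes the duplicate item numbers
spark.sql("""
    SELECT cartId, ARRAY_DISTINCT(FLATTEN(items)) AS unique_items
    FROM carts
""").show()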
Question 6:
Which of the following Python statements can be used to replace the schema name and table name in the query?
A. table_name = "sales"
schema_name = "bronze"
query = f"select * from schema_name.table_name"
B. table_name = "sales"
query = "select * from {schema_name}.{table_name}"
C. table_name = "sales"
query = f"select * from {schema_name}.{table_name}"
D. table_name = "sales"
query = f"select * from + schema_name +"."+table_name"
Answer: C
Explanation:
The answer is:
table_name = "sales"
query = f"select * from {schema_name}.{table_name}"
It is best to use f-strings to substitute Python variables into a string, rather than using string concatenation.
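A self-contained version of the correct answer (note that schema_name must also be defined before the f-string references it):

schema_name = "bronze"
table_name = "sales"

# The f prefix tells Python to substitute {schema_name} and {table_name} at runtime
query = f"select * from {schema_name}.{table_name}"
print(query)  # select * from bronze.sales
# spark.sql(query)  # would execute it in a Databricks notebook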
Question 7:
You are asked to set up Auto Loader to process incoming data. The data arrives in JSON format, gets dropped into cloud object storage, and must be processed as soon as it arrives in cloud storage. Which of the following statements is correct?
A. Auto Loader is native to Delta Lake and cannot support external cloud object storage
B. Auto Loader has to be triggered from an external process when the file arrives in cloud storage
C. Auto Loader needs to be converted to a Structured Streaming process
D. Auto Loader can only process continuous data when it is stored in Delta Lake
E. Auto Loader supports a file notification method, so it can process data as it arrives
Answer: E
Explanation:
Auto Loader supports two modes for ingesting new files from cloud object storage:
Directory listing: Auto Loader identifies new files by listing the input directory, using a directory polling approach.
File notification: Auto Loader can automatically set up a notification service and queue service that subscribe to file events from the input directory.
File notification is more efficient and can be used to process data in near real time as it arrives in cloud object storage.
See: Choosing between file notification and directory listing modes | Databricks on AWS
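A minimal sketch of an Auto Loader stream in file notification mode (the paths and target table name are placeholders):

# Incrementally ingest new JSON files as file-event notifications arrive
df = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")  # file notification mode
        .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")
        .load("/mnt/landing/json"))

(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/bronze")
   .toTable("bronze_events"))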
Question 8:
What is the main difference between the silver layer and the gold layer in the medallion architecture?
A. Silver is optimized to perform ETL; Gold is optimized for query performance
B. Gold is optimized to perform ETL; Silver is optimized for query performance
C. Silver is a copy of Bronze; Gold is a copy of Silver
D. Silver is stored in Delta Lake; Gold is stored in memory
E. Silver may contain aggregated data; Gold may preserve the granularity of the original data
Answer: A
Explanation:
See: Medallion Architecture – Databricks
Gold layer:
1. Powers ML applications, reporting, dashboards, and ad hoc analytics
2. Refined views of data, typically with aggregations
3. Reduces strain on production systems
4. Optimizes query performance for business-critical data
Exam focus: Review the medallion architecture diagram in the linked article and understand the role of each layer (bronze, silver, gold); you will see varying questions targeting each layer and its purpose.
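As a small illustration of the split (table names are hypothetical): data lands in silver at its original granularity after cleansing, while gold holds pre-aggregated views so reports and dashboards can query it cheaply:

# Gold: a refined, aggregated view built from granular silver data
spark.sql("""
    CREATE OR REPLACE TABLE gold_daily_sales AS
    SELECT order_date, SUM(amount) AS total_sales
    FROM silver_orders
    GROUP BY order_date
""")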
Question 9:
How do you upgrade an existing workspace managed table to a Unity Catalog table?
A. ALTER TABLE table_name SET UNITY_CATALOG = TRUE
B. Create table catalog_name.schema_name.table_name as select * from hive_metastore.old_schema.old_table
C. Create table table_name as select * from hive_metastore.old_schema.old_table
D. Create table table_name format = UNITY as select * from old_table_name
E. Create or replace table_name format = UNITY using deep clone old_table_name
Answer: B
Explanation:
The answer is: Create table catalog_name.schema_name.table_name as select * from hive_metastore.old_schema.old_table
Essentially, we are moving the data from the workspace-local Hive metastore to a metastore and catalog registered in Unity Catalog.
Note: for a managed table, the data is copied to a different storage account, which can take a long time for large tables. For an external table the process is different.
Managed table: Upgrade a managed table to Unity Catalog
External table: Upgrade an external table to Unity Catalog
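A sketch of the upgrade (the catalog, schema, and table names are placeholders):

# Copy a workspace (hive_metastore) managed table into a Unity Catalog table
spark.sql("""
    CREATE TABLE main.default.sales
    AS SELECT * FROM hive_metastore.old_schema.sales
""")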
Question 10:
Which of the following SQL commands can be used to insert, update, or delete rows based on a condition that checks whether the row(s) exist?
A. MERGE INTO table_name
B. COPY INTO table_name
C. UPDATE table_name
D. INSERT INTO OVERWRITE table_name
E. INSERT IF EXISTS table_name
Answer: A
Explanation:
https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-merge-into.html
MERGE INTO target_table_name [target_alias]
USING source_table_reference [source_alias]
ON merge_condition
[ WHEN MATCHED [ AND condition ] THEN matched_action ] [...]
[ WHEN NOT MATCHED [ AND condition ] THEN not_matched_action ] [...]

matched_action
{ DELETE |
  UPDATE SET * |
  UPDATE SET { column1 = value1 } [, ...] }

not_matched_action
{ INSERT * |
  INSERT (column1 [, ...] ) VALUES (value1 [, ...]) }
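A concrete (hypothetical) upsert following the syntax above:

# Update matching rows, delete rows the source flags as deleted, insert the rest
spark.sql("""
    MERGE INTO customers AS t
    USING updates AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED AND s.is_deleted = true THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")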