Visit the official SkillCertPro website:
Databricks Spark Developer 3.0 (Scala) Exam Dumps 2022
Databricks Spark Developer 3.0 (Scala) Practice Tests 2022. Contains 160+ exam questions to help you pass the exam on the first attempt.
For the full set of 160+ questions, go to:
https://skillcertpro.com/product/databricks-spark-developer-3-0-scala-exam-questions/
SkillCertPro offers detailed explanations for each question, which helps you understand the concepts better.
It is recommended to score above 85% on SkillCertPro exams before attempting the real exam.
SkillCertPro updates exam questions every 2 weeks.
You will get lifetime access and lifetime free updates.
SkillCertPro assures a 100% pass guarantee on the first attempt.
Question 1:
Which of the following is true of the driver?
A. Reports the state of some computation back to a central system.
B. Responsible for allocating resources for worker nodes.
C. Is a chunk of data that sits on a single machine in the cluster.
D. Responsible for assigning work that will be completed in parallel.
E. Responsible for executing work that will be completed in parallel.
Answer: D
Explanation:
The driver is the machine in which the application runs. It is responsible for three main things: 1) Maintaining information about the Spark Application, 2) Responding to the user’s program, 3) Analyzing, distributing, and scheduling work across the executors.
Question 2:
The DataFrame df has 4 partitions, and we want to write it as a Parquet file to a given path. Choose the correct number of files after a successful write operation.
A. 8
B. 1
C. 4
Answer: C
Explanation:
We control the parallelism of the files that we write by controlling the partitions prior to writing; therefore, the number of partitions before writing equals the number of files created by the write operation, as sketched below.
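For illustration, a minimal sketch (the sample data and the output path /tmp/parquet_demo are assumptions, not part of the question):
// Repartition to 4 partitions and write; one part file is produced per partition
val demo = spark.range(100).toDF("id").repartition(4)
demo.write.mode("overwrite").parquet("/tmp/parquet_demo")
// The output directory contains 4 part-*.parquet files plus a _SUCCESS marker.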
Question 3:
For the following DataFrame, if we want to fully cache it, which functions should we call, and in which order?
val df = spark.range(1 * 10000000).toDF("id").withColumn("s2", $"id" * $"id")
A. df.count() only
B. df.take(1)
C. df.cache() and then df.count()
D. df.cache() and then df.take(1)
E. df.cache() only
Answer: C
Explanation:
When you use cache() or persist(), the DataFrame is not fully cached until you invoke an action that goes through every record (e.g., count()). If you use an action like take(1), only one partition will be cached because Catalyst realizes that you do not need to compute all the partitions just to retrieve one record.
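A minimal sketch of the cache-then-count pattern, reusing the question's example (spark.implicits._ is assumed to be in scope, as it is in Databricks or spark-shell):
import spark.implicits._
val df = spark.range(1 * 10000000).toDF("id").withColumn("s2", $"id" * $"id")
df.cache()   // only marks the DataFrame for caching; nothing is materialized yet
df.count()   // scans every partition, so the entire DataFrame is now cached
// df.take(1) instead of count() would cache only the partition needed to return one row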
Question 4:
Which of the following statements is true of broadcast variables?
A. It provides a mutable variable that a Spark cluster can safely update on a per-row basis
B. It is a way of updating a value inside of a variety of transformations and propagating that value to the driver node in an efficient and fault-tolerant way.
C. Broadcast variables are shared, immutable variables that are cached on every machine in the cluster instead of serialized with every single task
D. The canonical use case is to pass around an extremely large table that does not fit in memory on the executors.
Answer: C
Explanation:
Broadcast variables are a way you can share an immutable value efficiently around the cluster without encapsulating that variable in a function closure. The normal way to use a variable in your driver node inside your tasks is to simply reference it in your function closures (e.g., in a map operation), but this can be inefficient, especially for large variables such as a lookup table or a machine learning model. The reason for this is that when you use a variable in a closure, it must be deserialized on the worker nodes many times (one per task).
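A minimal sketch of a broadcast variable (the lookup map and the sample keys are assumptions):
val lookup = Map(1 -> "bronze", 2 -> "silver", 3 -> "gold")
val bLookup = spark.sparkContext.broadcast(lookup)   // cached once per executor, immutable
val labels = spark.sparkContext
  .parallelize(Seq(1, 2, 3, 4))
  .map(k => bLookup.value.getOrElse(k, "unknown"))   // referencing the map directly in the closure would ship it with every task
labels.collect()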
Question 5:
If we want to store an RDD as deserialized Java objects in the JVM, and, if the RDD does not fit in memory, store the partitions that don't fit on disk and read them from there when they're needed, while also replicating each partition on two cluster nodes, which storage level do we need to choose?
A. MEMORY_AND_DISK
B. MEMORY_ONLY_2
C. MEMORY_AND_DISK_2
D. MEMORY_AND_DISK_2_SER
Answer: C
Explanation:
See https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/storage/StorageLevel.html
StorageLevel.MEMORY_AND_DISK_2 is the same as the MEMORY_AND_DISK storage level, but it replicates each partition to two cluster nodes.
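A minimal sketch of persisting with an explicit storage level (the sample RDD is an assumption):
import org.apache.spark.storage.StorageLevel
val rdd = spark.sparkContext.parallelize(1 to 1000)
rdd.persist(StorageLevel.MEMORY_AND_DISK_2)   // deserialized in memory, spills to disk, replicated to two nodes
rdd.count()                                   // an action is still needed to materialize the persisted data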
Question 6:
You need to sort a DataFrame df that has some null values in column a. You want the null values to appear last, and the remaining rows to be ordered ascending by column a. Choose the correct code block to achieve your goal.
A. df.sort(asc_nulls_last(a))
B. df.orderBy(asc_nulls_last(a))
C. df.orderBy(asc_nulls_last("a"))
D. df.orderBy(asc("a"))
E. It is not possible to sort when there are null values in the specified column.
Answer: C
Explanation:
The correct syntax is:
df.orderBy(asc_nulls_last("a"))
df.sort(asc_nulls_last("a"))
Both forms put the null values last; sort and orderBy are equivalent.
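A small runnable sketch (the sample data is an assumption):
import org.apache.spark.sql.functions.asc_nulls_last
import spark.implicits._
val df = Seq(Some(3), None, Some(1), None, Some(2)).toDF("a")
df.orderBy(asc_nulls_last("a")).show()
// rows 1, 2, 3 come first, followed by the two null rows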
Question 7:
The code block shown below contains an error. The code block is intended to write a text file to the given path. What should we add at part __1__ in order to fix it?
val a = Array(1002, 3001, 4002, 2003, 2002, 3004, 1003, 4006)
val df = spark.createDataset(a).withColumn("a", col("value") % 1000).withColumn("b", col("value") % 1000)
__1__
df.write.text("my_file.txt")
A. df.drop(“a“)
B. df.repartition(8)
C. df.coalesce(4)
D. df.take()
Answer: A
Explanation:
When you write a text file, you need to make sure you have only one string column; otherwise, the write will fail.
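A minimal sketch of a text write that succeeds (the sample data and the output path are assumptions):
import org.apache.spark.sql.functions.col
import spark.implicits._
val demo = Seq(1002, 3001, 4002).toDS().withColumn("a", col("value") % 1000)
demo.select(col("value").cast("string")).write.mode("overwrite").text("/tmp/text_demo")
// Writing demo directly would fail because it has two columns ("value" and "a")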
Question 8:
What is the best description of the Catalog?
A. A JVM process in order to help garbage collection.
B. It is the interface for managing a metadata catalog of relational entities such as databases, tables, functions, etc.
C. It is Spark's core unit for parallelizing its workflow.
D. Logically equivalent of DataFrames.
Answer: B
Explanation:
The highest-level abstraction in Spark SQL is the Catalog. The Catalog is an abstraction for the storage of metadata about the data stored in your tables, as well as other helpful things like databases, tables, functions, and views.
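A quick sketch of the Catalog interface (the default database is an assumption):
spark.catalog.listDatabases().show()
spark.catalog.listTables("default").show()
spark.catalog.listFunctions().show(5)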
Question 9:
At which stage does the Catalyst optimizer generate one or more physical plans?
A. Code Generation
B. Analysis
C. Physical Planning
D. Logical Optimization
Answer: C
Explanation:
In the physical planning stage, Catalyst takes the optimized logical plan, generates one or more physical plans, and then selects one of them using a cost model. (The first set of optimizations takes place earlier, in the logical optimization step.) See the link for more detail: https://databricks.com/glossary/catalyst-optimizer
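A small sketch for inspecting the plans Catalyst produces (the sample DataFrame is an assumption):
import org.apache.spark.sql.functions.col
val df = spark.range(10).toDF("id").filter(col("id") > 5)
df.explain(true)   // prints the parsed, analyzed, and optimized logical plans plus the selected physical plan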
Question 10:
The following statement will create a managed table
dataframe.write.saveAsTable("unmanaged_my_table")
A. FALSE
B. TRUE
Answer: B
Explanation:
One important note is the concept of managed versus unmanaged tables. Tables store two important pieces of information: the data within the tables as well as the data about the tables, that is, the metadata. You can have Spark manage the metadata for a set of files as well as for the data. When you define a table from files on disk, you are defining an unmanaged table. When you use saveAsTable on a DataFrame, you are creating a managed table for which Spark will keep track of all of the relevant information, regardless of what the table happens to be named.
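A minimal sketch contrasting the two (the table names and the path are assumptions):
val df = spark.range(5).toDF("id")
df.write.saveAsTable("managed_demo")   // managed: Spark tracks both the data files and the metadata
spark.sql("CREATE TABLE unmanaged_demo (id BIGINT) USING parquet LOCATION '/some/existing/path'")
// unmanaged: Spark tracks only the metadata; the data files remain at the external location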