Guru is Brahma, Guru is Vishnu, Guru is the Lord Maheshwara; Guru is verily the supreme Brahman; salutations to that Guru!
Question 1 In Storm, data is broken up into pieces called: 1 point
Bolts
Tuples
Knuths
None of these options
SOLUTION-Tuples
Question 2 Small world networks have: 1 point
Low clustering coefficients and long path lengths
High clustering coefficients and long path lengths
High clustering coefficients and short path lengths
Low clustering coefficients and short path lengths
SOLUTION-High clustering coefficients and short path lengths
Question 3 Which of the following represents a power-law graph, when one plots the degree on the x (horizontal) axis and the number of vertices with degree x on the y (vertical) axis? 1 point
Both x and y axes are linear, and the plot is a straight downward sloping line.
The x axis is linear and the y axis is logarithmic, and the plot is a straight downward sloping line.
Both x and y axes are logarithmic, and the plot is a straight downward sloping line.
None of these options
SOLUTION-Both x and y axes are logarithmic, and the plot is a straight downward sloping line.
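The log-log answer follows directly from the form of a power law: if y = C·x^(-α), then log y = log C − α·log x, a straight line with slope −α. A minimal sketch (with hypothetical constants C and α chosen purely for illustration) checks that the slope between consecutive log-log points is constant:

```python
import math

# A power-law degree distribution: the number of vertices with degree x is
# y = C * x ** (-alpha). C and alpha here are illustrative, not from data.
C, alpha = 1000.0, 2.0

points = [(x, C * x ** (-alpha)) for x in (1, 2, 4, 8, 16)]

# On log-log axes the curve is linear: log y = log C - alpha * log x,
# so the slope between any two points is the constant -alpha.
slopes = []
for (x1, y1), (x2, y2) in zip(points, points[1:]):
    slopes.append((math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1)))

print(slopes)  # every slope equals -alpha = -2.0
```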
Question 4 Which one of the following is FALSE? (MULTIPLE CHOICE QUESTION) 1 point
Heavy-tailed degree distributions have some vertices with very high degree.
A graph with exponential degree distribution is heavy-tailed.
None of these options
SOLUTION-A graph with exponential degree distribution is heavy-tailed. (An exponentially decaying degree distribution falls off too fast to be heavy-tailed.)
Question 4 Which of the following is FALSE about power-law and small-world graphs? 1 point
For a graph that is small world and power law, killing a few randomly-selected vertices will disconnect the graph (split it into two graphs) with high probability.
Not all small-world graphs are power-law graphs.
For a graph that is small world and power law, killing a few of the highest degree vertices will disconnect the graph (split it into two graphs) with high probability.
Not all power-law graphs are small-world graphs.
SOLUTION-For a graph that is small world and power law, killing a few randomly-selected vertices will disconnect the graph (split it into two graphs) with high probability. (Power-law graphs are resilient to random vertex removals; only targeted removal of the highest-degree hubs is likely to disconnect them.)
Question 5 In a single processor system, switching from one task to another incurs an extra time overhead, called a context switch. Typical context switch overheads in some recent Intel processors are a few tens of nanoseconds. Which of the following scheduling algorithms will have the highest total context switch overhead for a given workload (consider the case where all jobs have arrived at time 0)? 1 point
Shortest task first
Priority, with no preemption
Round robin
First in first out
SOLUTION-Round robin
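Round robin incurs the most switches because it preempts on every quantum, while the other three run each job to completion and switch only between jobs. A toy simulation (round-robin model and quantum value are illustrative assumptions, not from the question) makes the gap concrete:

```python
# Count context switches for a workload where all jobs arrive at t = 0.
# FIFO / SJF / non-preemptive priority switch only when a job completes;
# round robin also switches at every quantum expiry while >1 job is alive.

def fifo_switches(burst_times):
    # One switch each time the CPU moves on to the next job.
    return len(burst_times) - 1

def rr_switches(burst_times, quantum):
    remaining = list(burst_times)
    switches = 0
    last = None  # index of the job that ran in the previous quantum
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r > 0:
                if last is not None and last != i:
                    switches += 1  # dispatched a different job: a switch
                remaining[i] = max(0, r - quantum)
                last = i
    return switches

jobs = [10, 15, 4, 15, 23]  # illustrative run-times in seconds
print(fifo_switches(jobs), rr_switches(jobs, quantum=1))
```

With a 1 s quantum, round robin performs dozens of switches on this workload versus only 4 for the run-to-completion policies.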
Question 6 Which of the following is FALSE about Hadoop Capacity Scheduler, as discussed in lecture? 1 point
It uses FIFO scheduling, e.g., within each queue.
It can use a hard limit and a soft limit.
It uses shortest job first scheduling and is thus optimal.
Normally it does not allow task preemption.
SOLUTION-It uses shortest job first scheduling and is thus optimal.
Question 7 You are implementing a variant of Storm where you have a few types of grouping available (some of them similar to Storm). Now at a given bolt B you’re trying to do a join (cross-product) of two incoming streams S1 and S2. S1 is the stream of incoming tweets on Twitter, while S2 is the stream of new accounts being created on Twitter.
Now, among the tasks of B (which is parallelized), which of the following grouping combinations would you use? 1 point
Shuffle grouping for both S1 and S2
All grouping for both S1 and S2
Shuffle grouping for S1, and All grouping for S2
None of these options
SOLUTION-Shuffle grouping for S1, and All grouping for S2
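The intuition: shuffle grouping shards the (high-volume) tweet stream for load balance, while All grouping replicates the (low-volume) accounts stream to every task, so each task can join its tweet shard against the full account set. A toy model of the routing (round-robin as a stand-in for shuffle; this is a sketch, not Storm's actual API) shows that every cross-product pair is produced exactly once:

```python
# Sketch of why Shuffle(S1) + All(S2) yields a correct parallel join:
# each task of bolt B holds a shard of tweets (S1) and a FULL copy of
# accounts (S2), so every (tweet, account) pair appears exactly once.

NUM_TASKS = 3

def shuffle(stream, n):
    # Round-robin stand-in for shuffle grouping: partitions the stream.
    shards = [[] for _ in range(n)]
    for i, item in enumerate(stream):
        shards[i % n].append(item)
    return shards

def all_grouping(stream, n):
    # Every task receives the whole stream.
    return [list(stream) for _ in range(n)]

tweets = ["t1", "t2", "t3", "t4"]    # S1: illustrative tweet ids
accounts = ["a1", "a2"]              # S2: illustrative account ids

s1_shards = shuffle(tweets, NUM_TASKS)
s2_copies = all_grouping(accounts, NUM_TASKS)

pairs = [(t, a) for task in range(NUM_TASKS)
         for t in s1_shards[task] for a in s2_copies[task]]

print(sorted(pairs))  # all 4 * 2 = 8 cross-product pairs, no duplicates
```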
Question 8 You are implementing a variant of Storm where you have a few types of grouping available (some of them similar to Storm, some of them new). Now you write a topology (for this Storm variant) that calculates Wordcount on a given dataset. The topology consists of a spout B1, which sends sentence tuples to a bolt B2 that generates <word, 1> tuples, which in turn sends data to a bolt B3 that sums up all <word, i> tuples.
For bolt B3, which of the following groupings would be the most preferable, in order to be correct, efficient, and also balance load: 1 point
Round-robin among tasks of B3
Fields grouping which uses consistent hashing underneath among tasks of B3, by hashing the first element (word) of the tuple
Fields grouping which uses consistent hashing among tasks of B3, by hashing the entire tuple
All grouping among tasks of B3, like in Storm
SOLUTION-Fields grouping which uses consistent hashing underneath among tasks of B3, by hashing the first element (word) of the tuple
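Hashing on the word field routes every <word, 1> tuple for a given word to the same B3 task, so each task's local count is already the global count for its words, while different words still spread across tasks for load balance. A minimal sketch of this routing (Python's built-in `hash` stands in for consistent hashing; toy model, not Storm's API):

```python
# Fields grouping sketch: hashing the first tuple field (the word) sends
# all tuples for the same word to one task of B3, so per-task sums are
# correct without any cross-task merging.
from collections import Counter

NUM_B3_TASKS = 4

def fields_grouping(word):
    # Stand-in for consistent hashing on the "word" field.
    return hash(word) % NUM_B3_TASKS

tuples = [(w, 1) for w in "the cat sat on the mat the end".split()]

per_task = [Counter() for _ in range(NUM_B3_TASKS)]
for word, one in tuples:
    per_task[fields_grouping(word)][word] += one

# Concatenating the per-task counters recovers the exact global wordcount,
# and each distinct word lives on exactly one task.
merged = Counter()
for c in per_task:
    merged.update(c)
print(merged["the"])  # 3
```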
Question 9 Your colleague has built a new distributed graph processing system that uses the Gather-Apply-Scatter philosophy and where vertex programs are run. She has a large Web graph on which she wants to run PageRank. In this graph in which each vertex is a page, while links between pages are directed edges denoting if a page links to another, someone has written the following gather, apply, and scatter programs.
Gather: each page P gathers all other pages' PageRanks
Apply: to compute the PageRank of vertex P:
    acc = 0;
    for each page Q that is an in-neighbor of P do:
        acc = acc - Q.PageRank/Q.degree;
    end for
    P.PageRank = 0.85 * acc + 0.15;
Scatter: Send page P's PageRank to all other pages.
Which of the following are errors (with respect to both correctness and efficiency) in the above pseudocode (MULTIPLE answers): 1 point
Scatter should only send PageRank of P to P’s neighbors, and not all other pages
There are no errors in this pseudocode.
In the for loop, the value of acc should be increased (+ instead of -) by “Q.PageRank/Q.degree”
Gather should only gather the PageRanks of P’s neighbors in the graph, not all other pages in the graph
SOLUTION-Scatter should only send PageRank of P to P’s neighbors, and not all other pages
In the for loop, the value of acc should be increased (+ instead of -) by “Q.PageRank/Q.degree”
Gather should only gather the PageRanks of P’s neighbors in the graph, not all other pages in the graph
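Pulling the three fixes together, a corrected per-vertex Apply for one iteration might look like the sketch below (the 3-page graph and its edges are invented for illustration): gather only from P's in-neighbors, add rather than subtract each Q.PageRank/Q.degree, and scatter P's rank only to P's out-neighbors.

```python
# Corrected per-vertex PageRank update reflecting the three fixes above.
# Toy 3-page graph: A -> C, B -> A, C -> A, C -> B.

in_neighbors = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
out_degree = {"A": 1, "B": 1, "C": 2}
rank = {p: 1.0 for p in in_neighbors}  # initial ranks (illustrative)

def apply_pagerank(p):
    acc = 0.0
    for q in in_neighbors[p]:           # gather: in-neighbors only
        acc += rank[q] / out_degree[q]  # fixed sign: add, not subtract
    return 0.85 * acc + 0.15

new_rank = {p: apply_pagerank(p) for p in rank}
print(new_rank)  # A: 1.425, B: 0.575, C: 1.0
```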
Question 10 In order to make a graph processing engine faster, your colleague wants servers to be able to execute the Apply phase before the Gather phase is completed. This can be done (MULTIPLE answers): 1 point
If the Apply phase is applied only for vertices whose neighbors' values have been completely received (i.e., their respective Gather phases are completed)
Always
Never
If the Scatter phase is not started for vertices whose Gather phases’ output (i.e., their respective neighbor information) has not been completely received
SOLUTION-If the Apply phase is applied only for vertices whose neighbors' values have been completely received (i.e., their respective Gather phases are completed)
If the Scatter phase is not started for vertices whose Gather phases’ output (i.e., their respective neighbor information) has not been completely received
Question 11 In an extended ring graph, each node has four neighbors: two clockwise neighbors and two anticlockwise neighbors. The clustering coefficient of this graph, calculated based on the formula discussed in lecture, is: 1 point
1/3
1
1/2
0
SOLUTION-1/2
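To see where 1/2 comes from: a node's four neighbors sit at ring distances −2, −1, +1, +2, and of the C(4,2) = 6 neighbor pairs, exactly 3 are themselves connected (those at ring distance ≤ 2 of each other). A small check using the lecture's formula (ring size N is an arbitrary choice, large enough that neighborhoods don't wrap):

```python
# Clustering coefficient of a node in an extended ring, where each node is
# linked to its neighbors at distance 1 and 2 in each direction:
# (edges among the node's neighbors) / (possible edges among them).
from itertools import combinations

N = 20  # ring size (illustrative; any N >= 6 gives the same answer)

def ring_dist(a, b):
    d = abs(a - b) % N
    return min(d, N - d)

def neighbors(v):
    return [(v + d) % N for d in (-2, -1, 1, 2)]

def clustering(v):
    nbrs = neighbors(v)
    links = sum(1 for a, b in combinations(nbrs, 2) if ring_dist(a, b) <= 2)
    possible = len(nbrs) * (len(nbrs) - 1) // 2  # C(4, 2) = 6
    return links / possible

print(clustering(0))  # 3/6 = 0.5
```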
Question 12 In a single processor system, five tasks arrive simultaneously, with the following known run-times: 10 s, 15 s, 4 s, 15 s, 23 s. What is the lowest average completion time for this task set? 1 point
31.6 s
35.0 s
67 s
32.6 s
SOLUTION-31.6 s
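With all tasks arriving at time 0, shortest-task-first minimizes average completion time: run the tasks in increasing order of run-time and average the cumulative finish times (4, 14, 29, 44, 67 here, summing to 158). A minimal sketch of that calculation:

```python
# Shortest-job-first minimizes average completion time when all tasks
# arrive at t = 0: sort by run-time, accumulate finish times, and average.

def min_avg_completion(run_times):
    elapsed, total = 0.0, 0.0
    for t in sorted(run_times):
        elapsed += t      # this task finishes at the current elapsed time
        total += elapsed  # sum of completion times
    return total / len(run_times)

print(min_avg_completion([10, 15, 4, 15, 23]))  # 31.6
```

The same function applies to Question 13's task set.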
Question 13 In a single processor system, seven tasks arrive simultaneously, with the following known run-times: 5 s, 4 s, 4 s, 3 s, 2 s, 1 s, 1 s. What is the lowest average completion time for this task set? 1 point
20 s
9.5 s
10.2 s
8.5 s
SOLUTION-8.5 s
Question 14 A system using the Dominant Resource Fairness scheduling strategy receives the following two jobs. Job 1 requires for each task <1 CPU, 4 GB RAM>, while Job 2 requires for each task <3 CPUs, 1 GB RAM>. The system has a total of 36 CPUs and 72 GB of RAM. Which of the following allocations satisfies Dominant Resource Fairness? 1 point
None of these options
12 tasks to Job 1, 8 tasks to Job 2
8 tasks to Job 1, 12 tasks to Job 2
12 tasks to Job 1, 12 tasks to Job 2
SOLUTION-12 tasks to Job 1, 8 tasks to Job 2
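To verify the DRF answer: Job 1's dominant resource is RAM (4/72 > 1/36 per task) and Job 2's is CPU (3/36 > 1/72), and the 12 : 8 allocation equalizes the two dominant shares at 2/3 while staying within capacity. A short check:

```python
# DRF check for the <36 CPU, 72 GB> cluster: the chosen allocation should
# equalize each job's dominant share (its max per-resource fraction).

TOTAL = {"cpu": 36, "ram": 72}
job1 = {"cpu": 1, "ram": 4}   # per-task demand of Job 1
job2 = {"cpu": 3, "ram": 1}   # per-task demand of Job 2

def dominant_share(demand, tasks):
    return max(tasks * demand[r] / TOTAL[r] for r in TOTAL)

s1 = dominant_share(job1, 12)  # 12 * 4/72 = 2/3 (RAM-dominant)
s2 = dominant_share(job2, 8)   # 8 * 3/36  = 2/3 (CPU-dominant)
print(s1, s2)

# Capacity is respected: 12*1 + 8*3 = 36 CPUs, 12*4 + 8*1 = 56 <= 72 GB.
```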
Question 15 A system using the Dominant Resource Fairness scheduling strategy receives the following two jobs. Job 1 requires for each task <1 CPU, 3 GB RAM>, while Job 2 requires for each task <3 CPUs, 2 GB RAM>. The system has a total of 20 CPUs and 40 GB of RAM. Which of the following is TRUE? 1 point
Both Job 1’s and Job 2’s dominant resource is CPU.
Job 1’s dominant resource is CPU, while Job 2’s dominant resource is RAM.
Job 1’s dominant resource is RAM, while Job 2’s dominant resource is CPU.
Both Job 1’s and Job 2’s dominant resource is RAM.
SOLUTION-Job 1’s dominant resource is RAM, while Job 2’s dominant resource is CPU.