Scheduling Theory
Fall 2023
(Permanently) Under Construction
(This course used to have its own website, which no longer appears to be available. That version had some of the figures redrawn as vector graphics; I do not know whether there are any other significant differences.)
Schedule of presentations
5th of February (Monday), 10:15 AM, room 5, Sungjoo Ryu, Simple Approximation Algorithms for Minimizing the Total Weighted Completion Time of Precedence-Constrained Jobs by S. Jaeger and P. Warode; slides: pptx, pdf
7th of February (Wednesday), 1:00 PM, room 4, Andrzej Tkaczyk, Complexity of preemptive minsum scheduling on unrelated parallel machines by R. Sitters
7th of February (Wednesday), 2:30 PM, room 4, Krystyna Korzonek, Open Shop and Flow Shop Scheduling based on the textbook
8th of February (Thursday), 2:30 PM, room 119, Agnieszka Tatarczuk, Minimizing Maximum Flow Time on Related Machines via Dynamic Posted Pricing by S. Im, B. Moseley, K. Pruhs, C. Stein
8th of February (Thursday), 4:00 PM, room 119, Jan Wańkowicz, Algorithm(s) for finding Sidney decomposition
9th of February (Friday), 10:15 AM, room 4, Kacper Puchalski, The "staircase algorithm" for Q|pmtn, r_j |C_max based on the textbook
9th of February (Friday), 11:45 AM, room 4, Mateusz Opala, Scheduling unit-time tasks with arbitrary release times and deadlines by M.R. Garey, D.S. Johnson, B.B. Simons, R.E. Tarjan
9th of February (Friday), 1:15 PM, room 4, Łukasz Pluta, The Recognition of Series Parallel Digraphs by Jacobo Valdes, Robert Endre Tarjan, Eugene L. Lawler (SIAM J. Comput. 11(2): 298-313, 1982)
12th of February (Monday), 1:30 PM, room 119, Krzysztof Nyczka, The Geometry of Scheduling by N. Bansal and K. Pruhs
12th of February (Monday), 3:00 PM, room 119, Szymon Kosakowski, Preference Orders and Precedence Constraints for Total Weighted Completion Time on a single machine based on the textbook
12th of February (Monday), 4:30 PM, room 119, Grzegorz Maliniak, Single machine scheduling with release dates by M.X. Goemans, M. Queyranne, A.S. Schulz, M. Skutella, and Y. Wang
12th of February (Monday), 6:00 PM, room 119, Paweł Głowacki, Preemptive scheduling of equal-length jobs to maximize weighted throughput, by P. Baptiste, M. Chrobak, C. Dürr, W. Jawor, and N. Vakhania
Handouts:
Handout 4, based on excerpts from Optimal Time-Critical Scheduling via Resource Augmentation by C. A. Phillips, C. Stein, E. Torng, and J. Wein
Handout 11 (Updated on Jan 25)
Handout 12 (Jan 25: To be continued)
Topics for student presentations:
We studied an O(n^3 log^2 n) algorithm for (1|r_j, p_j = p|L_max), which consisted of binary search for the maximum lateness combined with an O(n^3 log n) feasibility-testing algorithm based on the concept of forbidden regions. However, feasibility in this setting can be tested far more efficiently, in O(n log n) time. You are to present that algorithm, given in: M.R. Garey, D.S. Johnson, B.B. Simons, R.E. Tarjan, Scheduling unit-time tasks with arbitrary release times and deadlines. SIAM J. Comput. 10, 256–269 (1981).
We skipped some results on preference orders and precedence constraints from the textbook. You are to present those, specifically Chapter 4.2 and 4.3, and perhaps also 4.8.
Apart from in- and out-trees, series-parallel partial orders (DAGs) form an important class that can often be handled efficiently. You are to present a linear-time algorithm for recognizing such digraphs, and returning their decomposition trees. You may find that in, e.g.: Jacobo Valdes, Robert Endre Tarjan, Eugene L. Lawler, The Recognition of Series Parallel Digraphs. SIAM J. Comput. 11(2): 298-313 (1982). (You can probably find it also in some textbooks.)
We proved that the randomized α_j-point algorithm (with uniform distribution) gives a 2-approximate solution for 1|r_j|∑ w_jC_j. A more careful analysis and a different distribution yield a better bound, beating ratio 2 also if a single α is used for all jobs. You are to present these results, due to: M.X. Goemans, M. Queyranne, A.S. Schulz, M. Skutella, Y. Wang, Single machine scheduling with release dates. SIAM J. Discrete Math. 15, 165–192 (2002).
We have seen two 2-approximation algorithms for 1|prec|∑w_jC_j, and there are a few others, which all rely on linear programming and/or flow techniques, or Sidney decomposition, which again relies on sophisticated flow techniques. But there is another algorithm, based on a round-robin rule, see 2309.12031.pdf (arxiv.org). You are to present the results of that article, ideally all, not just the single-machine algorithm.
We discussed the application of Sidney decompositions, but not how to find them. Present one or more algorithms that do so. You can find pointers to articles discussing that in the introduction of the article from the previous point. Also, the first polynomial algorithm seems to be due to: E.L. Lawler, Sequencing Jobs to Minimize Total Weighted Completion Time Subject to Precedence Constraints, Annals of Discrete Mathematics, Volume 2, 1978, Pages 75-90
The NP-hardness of R|pmtn|∑C_j is surprising, since we know that the problem without preemptions (R||∑C_j) can be solved in polynomial time. You are to present this hardness result, due to: R. Sitters, Complexity of preemptive minsum scheduling on unrelated parallel machines. J. Algorithms 57 (2005), 37–48
We noted that the Smith Ratio Rule is optimal for 1||∑w_jC_j. Moving to P||∑w_jC_j, things get harder, as even for 2 machines this problem is NP-hard (check references for this in Scheduling Zoo). However, the Smith Ratio Rule still provides fairly good approximation guarantees: The tight ratio was established in: T. Kawaguchi, S. Kyan, Worst case bound of an LRF schedule for the mean weighted flow-time problem. SIAM J. Comput. 15, 1119–1129 (1986), and a simplified analysis is given in: U. Schwiegelshohn, An alternative proof of the Kawaguchi-Kyan bound for the Largest-Ratio-First rule. Oper. Res. Lett. 39(4): 255-259 (2011). You are to present the hardness result, as well as the (tight) analysis of the algorithm. If possible, compare the two analyses.
Excerpts from Chapter 8 not covered in the lecture, e.g., the stochastic setting and/or the MultiFit results (the latter are not covered in the textbook, but references are provided)
While solving 1|p_j=p[,pmtn]|∑w_jU_j, some of you already familiarized yourselves with the article by P. Baptiste, and perhaps also the one by M. Chrobak et al. improving the bounds from O(n^7) to O(n^5) for the unweighted non-preemptive case. There is another related article (by Baptiste, Chrobak, et al.), which improves the bound for the (weighted) preemptive setting from O(n^10) to O(n^4), i.e., a better bound than the aforementioned one. You are to present this result, comparing it with the approach of Baptiste.
The "staircase algorithm" for Q|pmtn, r_j|C_max from the textbook (from Chapter 8 or 9, depending on the version).
The "Megiddo method", as well as solutions for (P|Q)|pmtn|L_max, some of which involve its applications; perhaps also the same with release times (same chapter as in the previous point)
Own Topic
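As a quick reference for the topic on the Smith Ratio Rule above, here is a minimal Python sketch (my own illustration, not taken from any of the assigned readings) that evaluates the rule for 1||∑w_jC_j: sort jobs by non-increasing w_j/p_j and sum the weighted completion times.

```python
from fractions import Fraction

def smith_ratio_cost(jobs):
    """Schedule jobs on a single machine by the Smith Ratio Rule
    (non-increasing w_j/p_j) and return sum of w_j * C_j.
    Each job is a (p_j, w_j) pair with p_j > 0."""
    order = sorted(jobs, key=lambda job: Fraction(job[1], job[0]), reverse=True)
    t = 0
    total = 0
    for p, w in order:
        t += p          # completion time C_j of this job
        total += w * t
    return total

# Two jobs: (p, w) = (1, 3) has ratio 3, (2, 1) has ratio 1/2,
# so the first runs first: cost = 3*1 + 1*3 = 6.
print(smith_ratio_cost([(2, 1), (1, 3)]))  # 6
```

Exact fractions are used for the ratios only to avoid floating-point ties; any consistent tie-breaking gives the same optimal cost on a single machine.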
Presentations Assignment (by topic number):
Mateusz Opala or Łukasz Pluta
Szymon Kosakowski
Mateusz Opala or Łukasz Pluta
Grzegorz Maliniak
Sungjoo Ryu
Jan Wańkowicz
Andrzej Tkaczyk
Krystyna Korzonek (unless she chooses Open Shops and/or Flow Shops instead)
not picked
Paweł Głowacki
Kacper Puchalski
not picked
Agnieszka Tatarczuk, Krzysztof Nyczka (perhaps yet to be confirmed)
Those of you not listed above have chosen to take the exam.
Grades
Slightly different from those originally proposed, but these have a certain appeal, not merely aesthetic.
Grade Threshold (# presented solutions)
3.0 1
3.5 2
4.0 3
4.5 5
5.0 8
With some harder-to-quantify (beneficial) exceptions, e.g., extra weight for particularly hard problems or nice solutions, consistency, etc.
There is an option to present some results in a seminar/lecture format for several points. Inquire if interested.
An incomplete list of unsolved exercises:
(Last edit: Jan 15)
Set 1: 11 (!!)
Set 2: 4(?), 7, and 8
Set 4: 2, 3, 6, 7, and 8
Set 6: 5
Set 7: 1, 2
Set 8: 1, 2
Set 9: 1, 3, 6, and 7
Set 10: 4** (this one is tough), 5-6, 8-13, 14-15 (new)
Hints:
4c from 3rd set: Suppose that job k precedes job j. What does the inequality say about the relation between their respective lateness values?
6 from 4th set: First, realize (and explain why) that if a (speed-s) algorithm relies only on the order of deadlines, it has to complete every job by the time the (speed-1) YardStick does so. The instance you consider should keep YardStick in the underworked mode at all times, except for the first m identical jobs. All jobs can be released at time 0. Ideally, your bound will be a function of m. If that seems too hard, try working it out for m=2 and m=3.
7 from 4th set: The following describes the adversary strategy for the best known (and more general) bound. Let the algorithm have m+p machines of speed s, and let k be any integer between 0 and m. Consider successive "iterations", each of length 1, and each consisting of m jobs with processing time 1-k/m and deadline at the end of its iteration, as well as k jobs with processing time 1 and deadline at the end of the next iteration. Call the time 1-k/m after the beginning of an iteration its critical time. The adversary ends the sequence of iterations at the critical time of an iteration they choose, at which point they release m more jobs, each with processing time k/m and deadline at the end of that iteration (i.e., tight for speed 1). First, argue that this is a feasible instance. Next, consider the amount w of total remaining processing time of the first kind of jobs at the critical moment of any iteration. What upper bound can you give for w, given that this may be the final iteration? From this, derive an upper bound on the amount of processing that the algorithm can do in any iteration. On the other hand, you should be able to derive a lower bound on the amount of processing that the algorithm has to do in every iteration, lest it accumulate more "pending work" with each iteration, making it miss some deadline eventually. From these, derive a lower bound on s as a function of m and k (and p, if you allow p>0); to get the bound, optimize the choice of k.
8 from 2nd set: The following generalization of the lower bound on L_max to f_max is natural: max_{S: subset of all jobs} min_{j: a job in S} {f_j(r(S)+p(S))}; it should be fairly easy to prove that preemptive LCL does indeed match this bound for some job (which must be the last one to complete in one of the blocks)
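To experiment with this bound on small instances, here is a brute-force sketch (my own, for illustration only; it enumerates all subsets, so it is exponential in n and useful only for tiny examples):

```python
from itertools import combinations

def fmax_lower_bound(jobs):
    """Brute-force the lower bound
        max over nonempty S of min_{j in S} f_j(r(S) + p(S)),
    where r(S) is the earliest release date in S and p(S) the total
    processing time of S. Each job is a tuple (r_j, p_j, f_j),
    with f_j a (nondecreasing) cost function."""
    n = len(jobs)
    best = float('-inf')
    for size in range(1, n + 1):
        for S in combinations(jobs, size):
            t = min(r for r, p, f in S) + sum(p for r, p, f in S)
            best = max(best, min(f(t) for r, p, f in S))
    return best

# With lateness f_j(t) = t - d_j this recovers the familiar L_max bound.
jobs = [(0, 2, lambda t: t - 3), (1, 2, lambda t: t - 5)]
print(fmax_lower_bound(jobs))  # -1
```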
4 from 2nd set: (This is perhaps not very interesting, but you should be able to prove such statements.) Begin by showing that there exists an optimal schedule with no unforced idle time and at most n-1 preemptions. (Perhaps to that end you will need to consider a modified instance, where we aim to minimize L_max with d_j set to C*_j for every job, and think of the EDD schedule for it.) Next, show that such a schedule must conform with the block structure found by our algorithm, i.e., for each block B it schedules exactly the same jobs (though possibly in a different order) in [r(B), r(B)+p(B)]. Finally, for a given block B (with a fixed set of jobs), prove by induction on |B| (i.e., the number of jobs in the block) that the schedule given by Schedule Block is optimal.
7 from 2nd set: Much like in Ex 4, you should begin by identifying the block structure and proving that there exists an optimal schedule that shares it. To use EDD for this, as we did before (perhaps for due dates only?), modify the release dates so that they conform with the precedence constraints, i.e., if i->j, make sure that r_j >= r_i + p_i. With the block structure established, when determining the last job of a block to complete, the algorithm should restrict its choice to jobs that have no successors within the block.
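The release-date modification mentioned in the hint above can be sketched as follows (my own code; it assumes jobs are numbered in topological order of the precedence DAG, so each predecessor is adjusted before its successors):

```python
def adjust_release_dates(r, p, preds):
    """Make release dates conform with precedence constraints:
    if i -> j, then r_j >= r_i + p_i. preds[j] lists the immediate
    predecessors of job j; jobs 0..n-1 are assumed to be numbered
    in topological order."""
    r = list(r)  # copy, so the input list is left untouched
    for j in range(len(r)):
        for i in preds[j]:
            r[j] = max(r[j], r[i] + p[i])
    return r

# Chain 0 -> 1 -> 2 with unit processing times and all r_j = 0:
print(adjust_release_dates([0, 0, 0], [1, 1, 1], [[], [0], [1]]))  # [0, 1, 2]
```

Since only immediate predecessors are inspected, the topological numbering is what propagates constraints along longer chains.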
(Dec 08) 1 from 7th Set: Not a full solution, but an idea that feels right. Consider instances I_0,..,I_p such that I_0 has a single job of size p available at time 0, and I_k (for k>0) additionally has n-1 jobs of size 0 released at time k. Assume that n>>p. Then the optimal solution for I_k is to stay idle till time k, have the zero-length jobs complete at that point, and the long job at time k+p. This has cost kn+p. Also, there are at most p+1 different deterministic algorithms, whose costs can be easily determined on each instance. Now try finding a probability distribution over these instances such that the expected cost of all these algorithms is (almost) equal, and then optimize to maximize the ratio E[ALG]/E[OPT]. Some tweaks might simplify the calculations. An alternative approach is to explicitly use a correspondence (though not exact) with the Ski Rental Problem, for which a lower bound of e/(e-1) is known.
(Dec 11) 2 from 8th Set: More of an intuition/possible approaches. (1) Try a "graphical" proof using 2D Gantt charts, e.g., something similar to the one for Sidney decompositions, to be presented in today's lecture (though this seems to get trickier with preemptions). (2) Something that may be studied as an independent question, to which a positive answer seems relevant to the exercise: Suppose we are given a schedule S with mean busy times M_1,..,M_n. Is it true that if we take that schedule and stretch it by a factor of 2 (i.e., if j was partially processed in [a,b], it is now processed in [2a,2b]), then every job j completes by the time 2M_j? If not, is there another schedule where this holds?
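For question (2), one can at least sanity-check the conjecture numerically before trying to prove it. A small sketch (my own, for experimentation only, and of course no substitute for a proof), where a job's processing is given as a list of disjoint intervals:

```python
def mean_busy_time(intervals):
    """Mean busy time M_j = (1/p_j) * integral of t over the times
    when the job is processed; intervals are disjoint (a, b) pairs."""
    p = sum(b - a for a, b in intervals)
    return sum((b - a) * (a + b) / 2 for a, b in intervals) / p

def stretched_completion(intervals):
    """Completion time of the job in the 2x-stretched schedule:
    each interval (a, b) becomes (2a, 2b), but the job still needs
    only p_j units of processing, so it may finish partway through
    its stretched intervals."""
    p = sum(b - a for a, b in intervals)
    done = 0.0
    for a, b in sorted(intervals):
        if done + 2 * (b - a) >= p:
            return 2 * a + (p - done)
        done += 2 * (b - a)

# Single interval (a, b): the job completes exactly at a + b = 2*M_j,
# so the conjectured bound would be tight there.
iv = [(0, 1), (2, 3)]
print(stretched_completion(iv), 2 * mean_busy_time(iv))  # 2.0 3.0
```

Trying this on a few preemptive schedules may suggest whether the factor 2 is the right one.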
(Dec 17) 4 from 8th Set: The way to derandomize this algorithm is by "fixing" the alpha_j random variables successively, in any order of j, e.g., 1,2,..,n, in such a way that the (conditional) expected cost of the schedule, conditioned on the already fixed variables, is no larger than the previous/initial expected value of a suitable upper bound on the cost of the schedule. By "suitable", we mean one that was used in the analysis, and also one that can be efficiently evaluated. E.g., the upper bound using C*_j values is useless, since we cannot hope to determine those. Simply using the sum of weighted completion times seems bad as well, since (as we have seen in another exercise) there can be exponentially many possible schedules, so calculating the expected values of all the completion times has to be done in an efficient way. To that end, we choose to work with the P_j and R_j random variables. Their (weighted) sum over all j is exactly the cost of our schedule. Moreover, equation (3) and the one before it tell us how to evaluate each (perhaps conditional) expected value of P_j, by summing the n processing times, each weighted by an appropriate probability of one job preceding another. (We need a similar property for R_j, but try that one yourselves.) Let X denote this random sum and T its initial expectation. Moreover, let F_j be shorthand for "alpha_1, .., alpha_j being fixed the way described next". Then, for j=1,2,..,n, given E[X|F_{j-1}] <= T, which for j=1 holds vacuously because there is no conditioning, fix alpha_j by restricting it to a particular maximal time interval in which only j is processed in the reference schedule. Notice that this is sufficient for determining the job order and the probabilities mentioned above. Moreover, as the job being processed may only change in the Smith ratio schedule upon completion or release of a job, there are at most 2n-1 such intervals (for all jobs together, not just the job j).
Therefore, if there are k possible intervals for job j, let Q_i be the event that the alpha_j-point lies in the i-th of them. We can determine each E[X | F_{j-1} and Q_i] for i=1,2,..,k. As E[X|F_{j-1}] <= T by the inductive assumption, and E[X|F_{j-1}] is the sum of Pr[Q_i] * E[X | F_{j-1} and Q_i] over all i, for at least one of the events Q_i we have E[X | F_{j-1} and Q_i] <= T. Choose any such Q_i and fix it, i.e., make F_j the conjunction of F_{j-1} and Q_i. Once we are through fixing the alpha-points for all jobs, subject to F_n the schedule of our algorithm is completely determined, and its cost is no larger than T.