CPSC 327 | Data Structures and Algorithms | Spring 2024 |
Structural things about the recursive backtracking development process:
Be careful to distinguish between two views of "size" — in the big-Oh sense, "size" is the size of the input, but in the "smaller subproblem" sense, the size for a series-of-choices formulation is the number of choices left to make.
The "produce output" approach is to repeatedly produce the next thing in the output. For a subset problem — what's the next thing in the subset? For an ordering problem — what's the next thing in order? The goal of considering "process input" and "produce output" as approaches is to identify what the decision is in the series-of-choices formulation, not the criteria used to make the decision.
The subproblem output can be defined as only the solution to subproblem itself or it can be the solution to the entire problem — does the base case return an empty solution (solving the rest of an already-complete problem) or the whole solution (possible because the partial solution has been passed along as part of the input to each subproblem)? In general, prefer the former because then only the aspects of the partial solution relevant to the subproblem need to be passed along. This is essential for situations where dynamic programming is applicable.
Focus on identifying all of the legal alternatives for the "Alternatives" step. Legal alternatives which aren't going to lead to the desired solution should be identified (and justified as safe) as part of pruning.
The base case for a series-of-choices formulation is "out of choices" (i.e. a complete solution). Identify the circumstances under which you have a complete solution.
Showing correctness: "establish the base case" by explaining why the solution is complete at that point; that it is a legal solution comes from never making an illegal choice along the way. "Maintain the main case" by explaining why all of the legal (and only the legal) alternatives are covered, why the subproblem input is right, and why what is done with the subproblem output is right. These are {\em not} simply restatements of the base case and main case, though the steps are typically formulaic and the correctness is apparent.
The level of detail in the implementation step should be "just enough". The goal is to make any decisions that affect the big-Oh running time without getting unnecessarily detailed. "Well-known" things don't need to be explained further. Pseudocode- or code-level detail is not needed. You do need to identify when there's a choice of implementations with different running times or when a non-standard implementation is needed.
For example, if your algorithm is to repeatedly find the largest value to work with next, this is the time to identify that the more efficient implementation is to sort the values first.
For example, it is well-known and common that values can be retrieved in O(1) by accessing a variable or a field in an object, lookup in an array, or lookup in a Map. So if the distances between each pair of controls is part of the input, you don't need to specify that there's a 2D array or even state that the distance between two controls can be determined in O(1) time — just assume that. On the other hand, if input needs to be gone through in a particular order and there isn't anything in the specifications indicating that it comes in that order, you do need to account for sorting — but it is well-known that sorting is O(n log n) so you don't need to go further and identify a particular sorting algorithm.
For example, you don't need to say that a for loop will be used to go through a collection, or even provide details of how to traverse a standard type of collection.
The running time is the time for each subproblem (often O(1), though not always) × bh — b is the branching factor (the number of alternatives for each choice) and h is the length of the longest solution.
For pruning, identify legal alternatives that won't eliminate the desired solution. (For optimization problems, this would be legal alternatives that won't lead to a better solution than what is already known.) It is necessarily to explain why it is safe to prune.
The bound function can't be something that returns the same value for all partial solutions as that does nothing for distinguishing between partial solutions worth continuing and those not. A greedy strategy for computing a bound is only safe if it can't lead to a worse answer than a legal solution. For example, a solution for 0-1 knapsack is a legal solution for fractional knapsack, so a greedy solution (which is optimal) for fractional knapsack won't result in a lower value than a legal 0-1 solution. Thus, a greedy fractional solution as an upper bound estimate is safe.
Any legal solution is safe for the initial solution estimate. One strategy is to identify a greedy heuristic that hopefully does well in terms of finding a good solution. This will be safe because it is finding an actual solution. Sometimes other strategies can be used to establish a bound for the solution value without actually finding a solution; in that case, it is necessary to explain why it is safe.
Specific things about the time-o problem:
The output for this problem has characteristics of both subset and ordering problems — the order controls are visited in matters, and they may not all be included in the solution — but it is primarily an ordering problem because the order needs to be determined. (By contrast, longest increasing subsequence also has characteristics of both subset and ordering problems because the subsequence is ordered, but it is primarily a subset problem because the ordering of the output is dictated by the ordering in the original sequence S — the task is to figure out which elements are in the sequence because the ordering of those elements in the solution is known.)
It is legal to visit controls outside the time window, to exceed the time limit, and to visit controls more than once.
It is legal to wait for a time window to open. That adds to the alternatives — it's not just which control to visit next but also how long to wait at that control.
Everyone must report to the finish, and that can happen at any point — it is not necessary to visit all the controls or use up all the available time first.
Ideally, all of the legal alternatives are identified in the "Alternatives" step and consideration of which aren't going to be useful to explore is part of pruning. However, since it is, in fact, legal to stop and wait at any point on the course and for any amount of time, there are essentially an infinite number of alternatives — the decision is "what to do next?" and that can be to wait or to proceed in some direction. It is OK (and necessary) to recognize that there's always an equivalent solution where any waiting time along a leg is done at the next control instead, and where any waiting time done at a control that doesn't result in more points (e.g. waiting after a time window has closed) can be done at the next control instead — so the possibility of waiting only needs to be considered when arriving at a control before its window opens, and the only waiting times that need to be considered are to wait until the window opens or don't wait at all.
Competitors are only done when they have reached the finish. Since visiting controls more than once and going overtime is legal, there are legal solutions that are infinitely long. In order to ensure termination, it is necessary to expand the idea of "complete solution" to one that is "functionally complete" — for a maximization problem, once it is no longer possible to add value to the solution, there is no point in doing anything other than going to the finish.