CPSC 327 | Data Structures and Algorithms | Spring 2024 |
In general: be careful when copying or following a previous example as a template — watch out for copy-and-paste errors (leaving terminology from other problems in place) and make sure you understand the why of the solution for each step of the process (e.g. don't suddenly introduce an overall time limit for completing all of the challenges because 0-1 knapsack had an overall weight capacity).
The targets step is about what runtime you are trying to beat. If nothing else is stated, address the brute force algorithm and its running time. For optimization problems, the brute force algorithm is to enumerate all possible solutions and then pick the best. Also state the running time, and be more specific than just "exponential" — enumerating all possible solutions takes at least as much time as the number of solutions, so that gives a baseline. For find-a-subset problems, the number of possible subsets is 2^n — there are two choices for each of n things. For ordering problems, the number of possible orderings is n! — there are n choices for the first thing, n-1 choices remaining for the second thing, and so on.
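As a concrete illustration of these baselines, here is a minimal brute-force sketch for a made-up subset problem and a made-up ordering problem. The problem details (a feasibility check `ok`, a cost function on orderings) are illustrative assumptions, not from any particular assignment; the point is the 2^n and n! enumeration loops.

```python
from itertools import permutations

def brute_force_subset(items, ok):
    """Enumerate all 2^n subsets; return a best (max-total) feasible one."""
    best = None
    n = len(items)
    for mask in range(2 ** n):  # each bit: include item i or not
        subset = [items[i] for i in range(n) if mask & (1 << i)]
        if ok(subset) and (best is None or sum(subset) > sum(best)):
            best = subset
    return best

def brute_force_ordering(items, cost):
    """Enumerate all n! orderings; return a cheapest one."""
    return min(permutations(items), key=cost)
```

Even before any analysis, the loop structure makes the lower bounds visible: the subset version does at least 2^n iterations and the ordering version examines n! permutations.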
The tactics step is about looking for something specific to help guide the algorithm development process, so something like "try not to use too much time or space" isn't helpful. For a specific algorithmic paradigm (divide-and-conquer, greedy, recursive backtracking, dynamic programming) the choice of paradigm itself achieves an improvement over the brute force algorithm and is likely to be the most important tactic. (Recursive backtracking doesn't itself improve on the brute force, but it leads you to focus on pruning and branch-and-bound techniques.) In a case where the problem is to develop a particular variety of algorithm, you can skip this step.
Using dynamic programming means a series-of-choices formulation — the solution will be obtained by making a series of choices about something. The approaches step is to identify what that series of choices might be. "Process input" is a series of choices about the input elements — what to do with each input item. For a subset problem, this is include or not include the element in the subset. For an ordering problem, this is where the element goes in the ordering. "Produce output" is a series of choices about the output elements — how to produce the next thing in the output. For a subset problem, this is the next element to include in the subset. For an ordering problem, this is the next element in the ordering.
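The two approaches can be contrasted on a small made-up subset problem: count the subsets of positive items summing to a target. Both sketches below are illustrative assumptions, not from a specific assignment; note how the process-input version makes a two-way choice per item while the produce-output version chooses among all remaining elements.

```python
def count_process_input(items, target, k=0):
    """Process input: choice for item k is include it or not."""
    if k == len(items):
        return 1 if target == 0 else 0
    skip = count_process_input(items, target, k + 1)
    take = count_process_input(items, target - items[k], k + 1)
    return skip + take

def count_produce_output(items, target, start=0):
    """Produce output: choice is which element comes next in the subset.
    Only elements at index >= start are considered, so each subset is
    produced exactly once."""
    count = 1 if target == 0 else 0  # stopping here completes one subset
    for i in range(start, len(items)):
        count += count_produce_output(items, target - items[i], i + 1)
    return count
```

The branching factor of 2 in the first version versus up to n in the second is one reason process input is often easier to work with for subset problems.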
Once you've identified possible approaches, settle on one before moving forward with the rest of the algorithm. For subset problems, process input has a lower branching factor and can be easier to think about and represent efficiently. For ordering problems, produce output can be easier to think about and represent efficiently. (This doesn't mean it is wrong to choose a different approach, just keep in mind both what seems easier to think about and, when you get to thinking about subproblems and memoization, what is most efficient to represent. Sometimes you may need to work through with one approach and then come back and consider another if you get stuck or to see if there's an improvement.)
Subproblem definition: for development purposes, dynamic programming is really just an improved implementation of recursive backtracking. For recursive algorithms, the subproblem always involves the same task as the original problem (though there could be additional elements, such as more returned) but is generalized. For recursive backtracking, "generalized" means "solve the rest of the problem in light of the current partial solution". Write the subproblem task as this template filled in with the specifics of the particular problem.
Also explicitly identify the input and the output for the subproblem. The input will be the same as the input for the original problem plus what is needed about the partial solution — if solving the rest of the problem depends on the actual partial solution, dynamic programming isn't going to be helpful. (Remember that dynamic programming relies on repeated subproblems — the same subproblem arises from different partial solutions.) To determine what is needed about the partial solution, move on to the base case and main case steps, then consider what those depend on.
For optimization problems, the output should be the solution for the subproblem. This typically includes both the solution itself (which items picked, which ordering, etc) and the value of the optimization criteria for that solution (the total value, etc).
For the base case, identify both the conditions that define the base case (no more choices left to make) and what the solution is.
For the main case, the general form is "for each legal alternative for the current choice, solve the rest of the problem in light of adding that choice to the partial solution so far, then pick the desired result". For optimization problems, the "desired result" is the best one. Write the main case as this template with the specifics filled in for the particular problem.
Be specific about the input and usage of the output for the subproblems! Don't just say "solve the resulting subproblem" but instead identify exactly what the input is for each subproblem in terms of the input to the current subproblem, and identify exactly what the returned result is in terms of the returned results from the subproblems. Introduce some notation — it is convenient to treat the subproblem as a function call, so name it and name the input values. See the writeups for longest increasing subsequence and TSP for examples.
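For instance, the base case and main case templates can be sketched as a function call, using the challenges example mentioned above. The specific rules here are assumptions for illustration only (each challenge k is worth points[k], and doing a challenge conflicts with doing the next one); the shape of the recursion is what matters.

```python
def best(k, points):
    """Max score achievable from challenges k..n-1 (the rest of the
    problem, given that choices for challenges before k are already made)."""
    # base case: no choices left to make, so the solution is score 0
    if k >= len(points):
        return 0
    # main case: for each legal alternative for the current choice
    # (skip challenge k, or do it), solve the rest of the problem in
    # light of that choice, then pick the desired (best) result.
    skip = best(k + 1, points)               # input: next choice is k+1
    take = points[k] + best(k + 2, points)   # k+1 conflicts, so skip to k+2
    return max(skip, take)
```

Note how each recursive call's input is stated in terms of the current input (k+1 or k+2), and how the returned result incorporates the choice made (adding points[k] in the "take" branch).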
For termination, identify the measure of progress (how is the number of choices left being reduced?) and explain why that means the base case is always reached. As with the correctness steps discussed below, there should be at least an implied "because" — this step is often fairly self-evident, but it isn't just a restatement of what the base case is.
For the correctness steps, explain why each thing is correct — don't just restate the base case and main steps. For example, "we hand the right subproblems to the friends and they give us the right solutions back" and "all the legal alternatives are considered" doesn't explain why the subproblems handed off are right or why all the legal alternatives are considered. Instead, be specific ("the only options are to do challenge k or not") and connect what is done to what is legal (how does what is done in the main case avoid picking challenges that conflict with each other?). Also explain where the result returned comes from — what is done to the subproblem solution to incorporate the choice that was made to generate that subproblem? There can be a very fine distinction between simply stating what the base and main cases do and explaining why they are correct; keep in mind "because" even if the word isn't explicitly written. ("All the legal alternatives are considered" lacks a "because"; "the only two options are to do the next challenge or not, and doing the next challenge is a legal option because..." implies a "because" in the "two options" clause and explicitly states it in the second part.)
Memoization: the result of a subproblem is the solution for the rest of the problem, so that is what is stored in the array. For example, score[k] would store the maximum score possible from challenges k to n, not the score from challenges 1 to k (that sounds like the value of the partial solution).
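Continuing the challenges example (same assumed rules as above: points[k] per challenge, doing a challenge conflicts with the next one), the memoized version stores the solution for the rest of the problem in score[k]:

```python
def max_score(points):
    n = len(points)
    # score[k] stores the maximum score possible from challenges k..n-1,
    # i.e. the solution to the "rest of the problem" subproblem -- NOT the
    # value of a partial solution for challenges 0..k.
    score = [0] * (n + 2)  # score[n] and score[n+1] are 0: base case
    for k in range(n - 1, -1, -1):
        score[k] = max(score[k + 1], points[k] + score[k + 2])
    return score[0]  # the original problem: all choices still to make
```

Filling the array from k = n-1 down to 0 guarantees score[k+1] and score[k+2] are already computed when score[k] is, so each subproblem is solved exactly once.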
The running time is the size of the array (the number of subproblems) times the work per subproblem. It's not automatically O(n) just because each subproblem is solved exactly once — the point of dynamic programming is that each subproblem is solved only once, but the running time still depends on how many subproblems there are and how much work each one takes.