Limits of Computing
Lower Bounds on time for classes of algorithms.
11.1 Trivial lower bounds: the time to read all the input, or to generate all the output.
11.2 Decision trees: DasGupta ch 2 page 8, done earlier.
Goals for P-NP, Section 11.3
Understand P, NP, NP hard, NP complete at the level described below.
Be able to show a problem is in P by expressing the order of the algorithm in terms of the size of the input in bits.
Understand Polynomial Reducibility.
Given a simple reduction transformation, be able to prove it works and takes only polynomial time.
In simple situations, find reduction transformations.
Understand the problems associated with assorted standard names: CNF and 3-CNF
satisfiability, Hamiltonian Cycle, Hamiltonian Path, Graph coloring, Knapsack, Subset Sum, Travelling Salesperson. Follow
the other problems in 13.3.2 if their statements are repeated.
Rough Outline
P: a class of yes-no problems that execute in polynomial time, i.e. whose
execution time is O(n^k) for some constant k, where n is the
length of the input in bits. Problems that have polynomial time
solutions are called tractable. Problems that do not have
polynomial time solutions are called intractable.
This is not the ultimate practical class: an algorithm that is O(n^100000)
is not very useful. But polynomials are closed under addition,
multiplication, and composition, and they form the simplest class with these
properties that includes f(n) = n.
Example: Given the connection matrix for a connected
undirected graph, does there exist an Euler path (a path using each edge exactly once)?

def has_euler_path(adj):             # adj: the 0/1 connection matrix
    odd_count = 0
    for row in adj:                  # one row per vertex
        if sum(row) % 2 == 1:        # this vertex has odd degree
            odd_count += 1
    return odd_count < 3             # i.e. 0 or 2 odd-degree vertices
Here k = 1: the running time of the algorithm is O(number of matrix entries) = O(input size).
The
encoding issue. Decision problem: is p prime? You can find
remainders for i = 2, 3, ..., up to sqrt(p), which apparently takes on the order of
sqrt(p) steps. So is the time polynomial in the size of the input? See the comment at the end of this page.
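A minimal sketch of that trial-division test (my own illustration, not from the text):

def is_prime(p):
    # trial division: check every i with 2 <= i <= sqrt(p)
    if p < 2:
        return False
    i = 2
    while i * i <= p:        # about sqrt(p) iterations
        if p % i == 0:
            return False     # found a divisor
        i += 1
    return True

The loop runs roughly sqrt(p) times, but p is written in only about lg p bits, and sqrt(p) = 2^((lg p)/2) grows exponentially in that bit length.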
NP: a class of yes-no problems satisfying the following: given input data for which the answer is yes, there exists some string s of information that allows you to conclude that the answer is yes
in polynomial time.
The precise definition of NP is complicated by the details of encoding, but
it includes all problems of the form "Given this input data, is there some solution
data (given by s) for which you can confirm in polynomial time that the solution data is
correct?" You can skip the definitions in terms of
nondeterministic machines (except to remember that is where the N came
from in NP).
One idea from the "nondeterministic" view that is important to
remember is the idea of dealing with an arbitrary string s: you must
allow for the possibility that the string is in some sense
garbage. Part of checking s is to make sure it is in a legal
form, not just that the data actually solves the problem.
The central example problem:
CNF (Conjunctive Normal Form): Boolean expressions involving subexpressions with
variables or their negations ORed together, and any number of these
ORed subexpressions ANDed together.
Define k-CNF to be expressions in CNF where each of the ORed subexpressions
involves no more than k Boolean variables.
Example in 2-CNF: (x ⋁ ~y) ⋀ (~x ⋁ ~z) ⋀ (w) ⋀ (y ⋁ q).
Example in 3-CNF: (x ⋁ ~y ⋁ z) ⋀ (x ⋁ z) ⋀ (w ⋁ y ⋁ q)
CNF and k-CNF satisfiability problems: determine whether there is an
assignment of values to the Boolean variables so that the expression is
true. Easy for 2-CNF; no tractable solution is known for 3-CNF or
general CNF.
Proof that the CNF satisfiability problem is in NP (though it is not known if it is in P):
The length of the input, O(n), is at least the length of a list of the
distinct symbols. The string s could just be a sequence of the
symbols that are true in a solution, which will be of length O(n).
You can pass through the input expression once accumulating the
answer, and at each symbol spend O(n) time going through s
looking for the symbol, in all spending O(n^2) time.
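For concreteness, here is a small sketch of such a checking routine. The representation is my own choice: each clause is a list of literal strings like "x" or "~x", and the certificate s is a list of the variables assigned True. A set makes each lookup fast here, but even the linear scan described above keeps the total time polynomial.

def verify_cnf(clauses, s):
    # s must be in legal form: a collection of variable names (the ones set True)
    true_vars = set(s)
    for clause in clauses:
        satisfied = False
        for lit in clause:
            if lit.startswith("~"):
                if lit[1:] not in true_vars:   # negated variable assigned False
                    satisfied = True
            elif lit in true_vars:             # plain variable assigned True
                satisfied = True
        if not satisfied:
            return False        # this clause is false under the assignment
    return True

For example, with the 2-CNF expression above and w, q True, x, y, z False:
verify_cnf([["x", "~y"], ["~x", "~z"], ["w"], ["y", "q"]], ["w", "q"]) returns True.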
Be sure you understand the proof! You will be expected to
show that problems are in NP! It is essential to understand that
you are only confirming a proposed solution, not finding a solution!
For
a problem in P, the string s can be empty, since the answer can be
directly calculated in polynomial time, so P is contained in NP.
On the other hand there could be problems that cannot be solved in polynomial time, but for which the correctness of a proposed solution can be confirmed in polynomial time. Such a problem would be in NP but not P.
NP hard: a problem H that is "at least as hard" as any in NP:
"At least as hard" means roughly that any NP problem A can be translated into a special case of H in
polynomial time. More technically, "Problem A translates into a special case of H in
polynomial time" means A is polynomially reducible to H. (Defined below.)
NP complete problem: a problem that is in NP and is NP hard
(bounds from each side: easy enough to be in NP (an input with a yes
answer is testable in polynomial time) and hard enough
to be NP hard (all other NP problems are essentially a special
case of it)). The first example of an NP complete problem (CNF satisfiability) is due to
Cook in 1971, and many very different looking problems were
derived from this problem by Karp in 1972.
It is easy to see that P ⊆ NP. It is the Holy Grail of
algorithms to figure out whether P = NP or not. There are now hundreds of
different sounding (but not really different underneath) NP complete
problems, many apparently close to a problem in P, but not the
same. Most computer scientists working in algorithms believe there are NP problems
harder than those in P, but no one has ever found an NP complete
problem which they could prove could NOT be solved in polynomial time. There
was a proposed proof a few years ago, but it was withdrawn.
Some
understanding of NP complete problems is useful, even if you are not interested in proving anything so general. It is not a lot of work
to show that many problems are NP complete (by a polynomial reduction). Such a problem is likely
to be very hard to solve. Hence time spent looking for a tractable
solution is likely to be wasted (unless you are going for the long shot and the ACM's Turing Award).
Another briefly stated NP-complete example, Subset Sum: given n positive integers and a target value k, is there a subset summing to k?
Show it is in NP....
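One way, sketched here with a representation of my own choosing: let the string s be the list of positions of the chosen numbers, and check it in a single pass.

def verify_subset_sum(nums, k, s):
    # s is the certificate: a proposed list of indices into nums
    if len(set(s)) != len(s):                      # legal form: no index used twice
        return False
    if any(i < 0 or i >= len(nums) for i in s):    # legal form: indices in range
        return False
    return sum(nums[i] for i in s) == k            # one pass: polynomial time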
Similar problems of different difficulties
The ones listed as hard are NP complete:
- Hamiltonian circuit: a cycle that visits each vertex exactly
once (hard), as distinguished from an Euler circuit that includes each
edge once (easy; algorithm above).
- Minimum weight path between two vertices (easy with Dijkstra's algorithm) vs the travelling salesman
problem: minimum weight Hamiltonian circuit (hard).
- CNF and k-CNF satisfiability problems: determine whether there is an
assignment of values to the Boolean variables so that the expression is
true. Easy for 2-CNF; no tractable solution is known for 3-CNF or
general CNF.
- Graph coloring: can the vertices be colored with k colors so that no two adjacent
vertices get the same color? (Useful for setting up final exam
schedules: vertices are courses, connected if a student is taking
both courses.) 2-coloring is easy; 3-coloring (and k-coloring for any fixed k >= 3) is hard. (A sketch of the easy 2-coloring check follows this list.)
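Here is a rough sketch (my own illustration) of why 2-coloring is easy: breadth-first search alternates the two colors and fails only if some edge joins two vertices forced to the same color. An adjacency-list dictionary is assumed.

from collections import deque

def two_colorable(adj):
    # adj: dict mapping each vertex to a list of its neighbors (undirected graph)
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:                         # plain BFS, alternating colors
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:   # an edge inside one color class
                    return False
    return True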
Optimization vs decision problems
Only decision problems are officially in
P. Optimization problems are generally solvable in polynomial
time if related decision problems are:
- optimization: find the length of the longest simple cycle.
- decision: is there a cycle of length >= k?
In a graph with n nodes, you can solve the optimization problem
via a binary search over decision problems, multiplying the solution
time by lg n, so if the decision
problem is in P, the optimization problem is also solvable in polynomial time.
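A sketch of that binary search, assuming a decision routine has_cycle_at_least(graph, k) is available (the name is hypothetical, just to illustrate the pattern):

def longest_simple_cycle_length(graph, n, has_cycle_at_least):
    # n: number of nodes, so the answer lies somewhere in 0..n
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_cycle_at_least(graph, mid):   # decision: cycle of length >= mid?
            lo = mid                         # yes: the optimum is at least mid
        else:
            hi = mid - 1                     # no: the optimum is below mid
    return lo                                # about lg n calls to the decision routine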
Motivation of the proof of the first NP complete problem, CNF:
Roughly, an answer-testing algorithm can be thought of as a computer
program. If it runs in polynomial time, it can be
unwrapped into a program with no loops, with a polynomial bound on the
number of steps. The program can be translated to machine
language, which can be
implemented with AND, OR, and NOT gates with the data in the input,
which can be expressed in terms of Boolean algebra, which can be
expressed in CNF, so finding a solution to the CNF generates an answer
to the original problem. All the translations in the middle still
keep the size of the expressions polynomial in the length of the
input.
This proof was a great achievement, but the CNF
problem seemed like it could easily be more general than any other
common problem.
It was therefore also very important when, the next
year, Karp showed a tree of clever reductions from CNF to many
other common NP problems, and hence these other problems were also NP
complete. DasGupta outlines many of these. If you want to
see some amazing creativity, have a look. I am NOT going
to ask you to reproduce any of these. I will give examples and
exercises of extremely easy ones.
Polynomial Reduction
Polynomial reduction is a central idea in
discussions of NP complete problems! It refines the idea of "no
harder than" or "maps to a subset of the problems in".
Polynomial reducibility A -> B
- Problem A is polynomially reducible to B if there is a
transformation T that takes each possible input to A to an input for B,
such that the worst case time for the transformation is bounded by a
polynomial function t of the length of the input to A.
- A(x) is true iff B(T(x)) is true.
The arrow and the term "reducible" are counterintuitive
for me. I think of what you "reduce to" being less, but no,
that is backwards. It would be better to say that all of A can be shown
to be polynomially equivalent to a subset of the problems in B, so B is a larger class of problems, like A ⊂ B, but that is not the book's notation and terminology.
In class: how to reduce the Hamiltonian circuit problem to the travelling salesman decision
problem?
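One standard transformation, sketched here in my own notation: make the graph complete, give each original edge weight 1 and each missing edge weight 2, and ask whether a tour of total weight <= n exists.

def ham_circuit_to_tsp(adj):
    # adj: 0/1 adjacency matrix of an undirected graph on n vertices
    n = len(adj)
    weight = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weight[i][j] = 1 if adj[i][j] else 2   # missing edges made expensive
    return weight, n    # TSP decision question: is there a tour of weight <= n?

The original graph has a Hamiltonian circuit iff the weighted graph has a tour of weight <= n (such a tour can use only weight-1 edges), and building the weights takes O(n^2) time, so the transformation is polynomial.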
Theorem 13.3: If A -> B and B is in P, then A is in P.
Proof: A(x) = B(T(x)). Suppose the time bound on B with input of length
n is b(n), and the time to convert x to T(x) is bounded by
t(n). It takes at least one unit of time to create each unit of output, so
the size of T(x) is <= t(n), and hence the time to solve B on input T(x) is
bounded by b(t(n)). The total time, including the time to compute T, is at most
t(n) + b(t(n)), a polynomial since t and b are.
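For example, if t(n) = n^2 and b(n) = n^3, the total is at most n^2 + (n^2)^3 = n^2 + n^6, still a polynomial.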
Hence if an NP complete problem is in P, all problems reducible to it (all of NP) must also be in P, and P = NP.
A harder transformation, if we have time: Show the Hamiltonian Path problem (either way - say undirected) is NP Complete, given that the corresponding Hamiltonian Circuit Problem is NP Hard.
Other limits: Not only are there problems that are hard to
solve, there are problems that are impossible to solve, like the Halting
problem. A version is in the text, and there is also a concise
version in DasGupta ch 9 p 38 (after the exercises, at the very end of the chapter).
There will be no new topics for the final on the last several days of
class, only review and reworking old topics, more examples to work.
Comment: The length of a single integer k
is O(lg k). This does not generally matter when we consider a variable number n
of integers as the input. It does matter if the input is a fixed
finite number of possibly large integers (as with finding a prime
factorization).
The End!