So cleaning the input, which is more technically also known as pre-processing--what's the idea behind that?
Here's what you would normally do for an NP-complete problem as we have talked about so far:
So if you're given the input for an NP-complete problem. What you would do using the techniques from the previous units
is you would fire up your search tree to try and find an optimal solution.
And of course, that search tree has exponential size. So the algorithm goes through that tree here
until, at a certain point in time, it says, "Bingo, I found a solution," or, "I found the best possible solution."
The idea of pre-processing now is similar to something that we already saw for vertex cover or independent set,
where, for certain vertices, while we were traversing the search tree, or even in advance,
what we could already say for certain vertices, we know what assignment that vertex is going to have in an optimum solution.
And we could make that statement without actually going through any branching, but in polynomial time.
And that is the idea of pre-processing. The idea of pre-processing is, if you can actually find certain parts of the input,
where in polynomial time, of course, you can already say how they would be handled in an optimum solution.
So we're kind of nibbling away at the input here. And what that, of course, means is if your pre-processing is successful,
or especially if it's very successful, then the search tree that results from that input is not going to be as big.
So there's certain parts of the search tree that you don't have to do, because you already have found out
in the pre-processing what that part of the solution is going to look like.
So the search tree size will decrease. So we can, for example, cut off this branch here, because we've already
pre-processed this, and we can cut off this one here because we also pre-processed that one.
So now let's make this more concrete, and let me give you a concrete example.
And we're going to do this for SAT, because SAT is a problem where pre-processing is usually very successful.
So if you were, for example, to use a commercial SAT solver, then pre-processing will play a very very important role in that.
I once talked to somebody who develops those solvers, and they basically said that his package works 90-95%
through pre-processing. So even for SAT instances with thousands of variables, his package can basically solve it,
but it can only solve it because the pre-processing algorithms are very good.
So you'll remember that SAT was the problem of finding if a given Boolean formula has a satisfying assignment or not.
And I'm now going to write down a Boolean formula for you, and then we're going to do a little quiz
to make pre-processing more concrete.
So the SAT formula is x1 or x3 or x5, and not x1 or x2 or x4, and so on, and so on.
Now, of course this formula here doesn't have very many variables. It's just six variables--
x1, x2, x3, x4, x5 and x6. So with a little playing around, you would probably be able to figure out if this Boolean formula here
has a satisfying assignment or not. But of course, what we want to do now is pre-processing,
And that means that we want to see if, for certain variables, in this Boolean formula, we can figure out
if they should be set to true or false, without actually trying all possible combinations.
And as I said, we're going to do this as a quiz. So what I would like you to do is to look at this Boolean formula here,
and then consider the variables x1, x2, x3 and x4, and for each of those variables, determine if it's easy to see,
if they should be set to true or false. And by easy, I mean without actually trying around different true assignments
for the other variables, but you can basically immediately say, for these variables, if they should be set to true or false.
I'm going to give you one hint for the solution, and that is that, in my opinion--and this is a bit of a subjective question--
I think that for two of these variables here, it's rather easy to see. And I would like you to select those two.