Sunday, March 21, 2010

Big Os, little statements

When we teach big-Oh theory we consider functions at a very large-scale resolution. We can make sweeping, and in another context ridiculous, statements such as: 3n² + 5, 15n² + 17, and n² are all the "same," since they grow at the same rate as n grows very large.

It requires a recalibration of what we mean by "same." If you were to plot the functions in the previous paragraph on a graph, they would trace different lines, but would grow at the same rate (at least after a "while"). Their sameness is in how they grow.
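To make that "sameness" concrete, here is a small numerical sketch (mine, not from the original post): the ratio of each function to n² settles toward a constant, which is exactly the kind of bounded-constant witness a big-Oh argument relies on.

```python
# Numerical sanity check (not a proof): 3n^2 + 5 and 15n^2 + 17 both grow
# at the same rate as n^2, in the sense that their ratios to n^2 are
# eventually bounded by constants.
def f(n):
    return 3 * n**2 + 5

def g(n):
    return 15 * n**2 + 17

for n in (10, 100, 1000, 10000):
    print(n, f(n) / n**2, g(n) / n**2)
# The first ratio approaches 3 and the second approaches 15 -- both
# bounded, so all three functions are "the same" in the big-Oh sense.
```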

The weird thing is that our motivation for studying big-Oh is to use these functions as bounds on the running time of programs. We try to count algorithm operations in a platform-independent manner, so that our estimate of running time is not mired in a particular processor chip, programming language, or operating system. After lots of generalization all that's left is how quickly the algorithm execution time grows with the size of the input.

However, running times that differ by factors such as 15 or 3 matter a great deal. Once we're satisfied, or not concerned, with how an algorithm scales with input size, we have to worry about the bottlenecks that make the difference between, say, 15n² and 13n² as bounds on running time. At that point we'd get a good profiler and see whether we could shave off a statement here, a loop iteration there, and optimize our program.
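As a sketch of that last step (the function here is a made-up stand-in, not from the post), Python's built-in cProfile is one such profiler: it shows where the constant factors actually live once the asymptotics are settled.

```python
import cProfile

def slow_sum(n):
    # Deliberately naive O(n) loop -- the kind of hotspot a profiler exposes.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    # Prints a per-function table of call counts and cumulative times.
    cProfile.run("slow_sum(100000)")
```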

Saturday, March 6, 2010

Getting to zeroth base

Lots of students arrive in CSC165 with a template for proofs by induction that has served them well thus far but (I think) won't be quite enough for the rest of their existence as computer scientists. I try to show them a few examples that don't quite fit the template.

My first divergence from templates is to prove things about inequalities, rather than equalities. Such a small thing makes a large difference.

Equalities are tidy: we claim that for each natural number n, some expression involving n is equal to some other expression involving n. During the induction step we plug in the expression involving n (we're justified in doing this by the induction hypothesis) and it is an exact fit. Often the desired result just falls out the other end.
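For instance (a standard example, not one from the post), in proving that 0 + 1 + ... + n = n(n+1)/2 the induction hypothesis slots in exactly:

```latex
\begin{align*}
\sum_{i=0}^{n+1} i
  &= \Bigl(\sum_{i=0}^{n} i\Bigr) + (n+1) \\
  &= \frac{n(n+1)}{2} + (n+1) && \text{(induction hypothesis: exact fit)} \\
  &= \frac{(n+1)(n+2)}{2},
\end{align*}
```

which is precisely the claim for n+1, falling out the other end.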

Inequalities are another matter. Suppose we want to prove that for each natural number n, 3ⁿ is at least n³. During the induction step we must show that if the claim is true for some generic natural n, then it is true for n's successor n+1. In other words, we have to show that 3ⁿ⁺¹ is at least (n+1)³. In this case when we plug in the inductive hypothesis that 3ⁿ is at least n³, we don't have an exact fit (an equality) but an overestimate (an inequality). Some art is required to make sure that we can use overestimates at every step to show that the desired result follows.

The art that seems to work, in this case, requires the assumption that n is at least 3. This is strange, since the claim itself is true when n is 0, 1, 2, or 3, but the piece of logic that carries us from the claim for n to the analogous claim for n+1 needs n to be at least 3. That leaves us with no choice but to establish the claim for 0, 1, 2, and 3 by direct verification, as base cases.
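Spelled out (my reconstruction of the arithmetic, following the post's claim), the induction step chains two overestimates, and the second one is what demands n at least 3:

```latex
\begin{align*}
3^{n+1} = 3 \cdot 3^{n}
  &\ge 3 n^{3} && \text{(induction hypothesis, an overestimate)} \\
  &\ge (n+1)^{3} && \text{since } \Bigl(1 + \tfrac{1}{n}\Bigr)^{3}
       \le \Bigl(\tfrac{4}{3}\Bigr)^{3} = \tfrac{64}{27} < 3
       \text{ when } n \ge 3.
\end{align*}
```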

This is the second place I break templates. Lots of students have rules-of-thumb for deciding what base cases are needed in a proof by induction. In truth, the decision about what must be established as a base case, by direct verification, is made by examining the induction step and seeing which cases are left out (not covered) by its logic.
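A quick check of exactly those left-out cases (a sketch in Python, not part of the original post): the induction step for 3ⁿ ≥ n³ only operates for n at least 3, so n = 0, 1, 2, and 3 must be verified directly.

```python
# The induction step only carries us forward from n >= 3, so the cases
# n = 0, 1, 2, 3 are not covered by its logic and must be checked by hand.
for n in range(4):
    assert 3**n >= n**3, n
print("base cases 0..3 hold")
```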

The overall effect is that although induction is a familiar topic to most CSC165 students, there are still aspects of this proof technique that many of them haven't seen before.