In CSC165 I coerce students into using a highly structured expression of proof. Just as the programming language Python uses white space on the left (indentation) to indicate scope, they must indent portions of their proof that share scope.
You want to prove something about every real number, so you name a generic real number x so that you can reason about it...
Assume x is a generic real number. To tip off the reader that I'm within the world-view where x has the properties of a real number, I indent one level from the left.
Several assumptions later I may have indented the text nearly off the right-hand margin. To make things even more unnatural, any non-trivial step in reasoning must be justified. The prose ends up chopped into pieces, and the flow is lost.
It feels as though truth and beauty are at odds here, or at least structure and beauty. I convince myself that, down the road, there will be a payoff when I (or some of my students) will write proofs that have both an easy-flowing style and crystal-clear structure.
Until then, I coerce my students into writing structured proofs.
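For readers who like to see the discipline enforced by a machine: here is a sketch of the same idea in the Lean proof assistant (my illustration, assuming Lean 4 with mathlib; `mul_self_nonneg` is a mathlib lemma). The `intro x` step is exactly "assume x is a generic real number," and everything after it is indented one level, inside x's scope.

```lean
import Mathlib

-- Claim: every real number squared is nonnegative.
example : ∀ x : ℝ, x * x ≥ 0 := by
  intro x                    -- "assume x is a generic real number"; indent one level
  exact mul_self_nonneg x    -- the non-trivial step, justified by a named lemma
```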
Saturday, October 16, 2010
Thursday, September 2, 2010
Un-Nintendoed consequences

Decades ago Mike Constable showed me a bike route that follows laneways from Bathurst and College to about Clinton. Later I figured out how to combine this, using a north/south lane between Euclid and Manning, to join up with some more lanes running all the way to Shaw just above Dundas (there's a small traverse across a former schoolyard that has become either a movie set or a construction site).
Along the northern run of this route there's a municipal parking lot with this sign ("Purchase ticket and place face up on dashboard" --- to come). I can't ride by without having visions of happy customers snoozing with their cheeks on the dashboard. I passed my vision on to my sons, and thought of it as our private joke.
Then I noticed that somebody had spray-painted out the word "up," making the double-entendre much more intelligible. Nice to have more people on the same side of a joke as I am, I thought. That wasn't the end of it.
A week or so ago, I noticed that a new, graffiti-free sign had been put up. I guess this is in analogy with the broken-window theory of crime prevention: if you fix minor ruptures in the atmosphere such as broken windows and graffiti, you decrease the prevalence of harder stuff like muggings. I'm not sure I believe that, but I assume that the sign-replacer was thinking along those lines: better nip this bit of irony before it gets out of hand!

After all, irony allows us to consider alternative interpretations of reality. Start going down that road, and you might try to change reality as it's currently implemented. If you prefer your quo static, irony deficiency is a virtue rather than a vice.
If the world were programmed like a video game, irony could lead to un-Nintendoed consequences.
Sunday, March 21, 2010
Big Os, little statements
When we teach big-Oh theory we consider functions at a very large-scale resolution. We can make sweeping, and in another context ridiculous, statements such as: 3n² + 5, 15n² + 17, and n² are all the "same," since they grow at the same rate as n grows very large.
It requires a recalibration of what we mean by "same." If you were to plot the functions in the previous paragraph on a graph, they would trace different lines, but would grow at the same rate (at least after a "while"). Their sameness is in how they grow.
The weird thing is that our motivation for studying big-Oh is to use these functions as bounds on the running time of programs. We try to count algorithm operations in a platform-independent manner, so that our estimate of running time is not mired in a particular processor chip, programming language, or operating system. After lots of generalization all that's left is how quickly the algorithm execution time grows with the size of the input.
However, running times that differ by factors such as 15 or 3 matter a great deal. Once we're satisfied, or not concerned, with how an algorithm scales with input size, we have to worry about bottlenecks that make the difference between, say, 15n² and 13n² as bounds on running time. At that point we'd get a good profiler and see whether we could shave off a statement here, a loop iteration there, and optimize our program.
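To make the contrast concrete, here is a small Python sketch (the function names are mine, not from the course): two functions that are both Θ(n²) by the large-scale yardstick, even though one does three times the work per pair --- exactly the kind of constant factor that big-Oh erases and a profiler would catch.

```python
def count_pairs(n):
    """Roughly n**2 basic steps: one increment per ordered pair (i, j)."""
    count = 0
    for i in range(n):
        for j in range(n):
            count += 1
    return count

def count_pairs_heavier(n):
    """Still Theta(n**2), but three increments per pair: the same growth
    rate as count_pairs, with three times the constant factor."""
    count = 0
    for i in range(n):
        for j in range(n):
            count += 1
            count += 1
            count += 1
    return count
```

In big-Oh terms these are the "same" function of n; the factor of 3 between their step counts is invisible at that resolution, but very visible to whoever is waiting for the slower loop to finish.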
Saturday, March 6, 2010
Getting to zeroth base
Lots of students arrive in CSC165 with a template for proofs by induction that has served them well thus far but (I think) won't be quite enough for the rest of their existence as computer scientists. I try to show them a few examples that don't quite fit the template.
My first divergence from templates is to prove things about inequalities, rather than equalities. Such a small thing makes a large difference.
Equalities are tidy: we claim that for each natural number n, some expression involving n is equal to some other expression involving n. During the induction step we plug in the expression involving n (we're justified in doing this by the induction hypothesis) and it is an exact fit. Often the desired result just falls off the other end.
Inequalities are another matter. Suppose we want to prove that for each natural number n, 3ⁿ is at least n³. During the induction step we must show that if the claim is true for some generic natural n, then it is true for n's successor n+1. In other words, we have to show that 3ⁿ⁺¹ is at least (n+1)³. In this case when we plug in the inductive hypothesis that 3ⁿ is at least n³, we don't have an exact fit (an equality) but an overestimate (an inequality). Some art is required to make sure that we can use overestimates at every step to show that the desired result follows.
The art that seems to work, in this case, requires the assumption that n is at least 3. This is strange, since the claim itself is true when n is 0, 1, 2, or 3, but the piece of logic that carries us from the claim for n to the analogous claim for n+1 needs n to be at least 3. That leaves us with no choice but to establish the claim for 0, 1, 2, and 3 by direct verification, as base cases.
This is the second place I break templates. Lots of students have rules-of-thumb for deciding what base cases are needed in a proof by induction. In truth, the decision about what must be established as a base case, by direct verification, is made by examining the induction step and seeing which cases are left out (not covered) by its logic.
The entire effect is that although induction is a familiar topic to most CSC165 students, there are still aspects of this proof technique that many of them haven't seen before.
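Both halves of the argument can be machine-checked for small cases. A quick Python sanity check (my sketch, not part of the course materials): verify the base cases 0 through 3 directly, then verify the overestimate that drives the induction step --- for n at least 3, 3·n³ is at least (n+1)³, so 3ⁿ⁺¹ = 3·3ⁿ ≥ 3·n³ ≥ (n+1)³.

```python
# Base cases: 3**n >= n**3 holds for n = 0, 1, 2, 3 by direct verification.
for n in range(4):
    assert 3**n >= n**3, f"base case fails at n={n}"

# Key overestimate for the induction step: for n >= 3, 3 * n**3 >= (n+1)**3.
# Note that it fails at n = 2 (3 * 8 = 24 < 27), which is exactly why the
# step's logic needs n >= 3, and why 0 through 3 must be separate base cases.
for n in range(3, 1000):
    assert 3 * n**3 >= (n + 1)**3, f"step lemma fails at n={n}"
```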
Friday, February 26, 2010
oopsilon-deltoids
Proving, or disproving, the existence of limits is a great target for the tools of logic. The bare-bones limit concept involves a cascade of three quantifiers, mixing universal and existential quantification:
For every positive real number epsilon, there exists a positive real number delta, for every real number x, if |x-c| is less than delta, then |f(x)-f(c)| is less than epsilon.
Limit notation conceals some of the quantification of epsilon and delta:
As x approaches c, f(x) approaches L.
"Approaching" means getting arbitrarily close, and the limit form says you can get within epsilon of L by getting within delta of c.
Continuity adds an extra feature to the limit concept --- the limit L that is approached by f(x) is exactly f(c). Now pile on another feature: f is continuous at every real number. As a limit this says that for every real number c, the limit of f(x) as x approaches c is f(c). Expressed with quantifiers, the whole bundle becomes:
For every real number c, for every positive real number epsilon, there exists a positive real number delta, for every real number x, if |x-c| is less than delta, then |f(x)-f(c)| is less than epsilon.
Okay so far? This is the sort of statement we learn how to structure proofs (and disproofs) of in CSC165. We become adept at juggling long strings of symbols and realizing the importance of having one sort of quantifier precede another. We become used to freely changing the symbolic names of variables as suits our purposes. We (and here I think I mean me) can lose sight of conventions that attach particular meaning to particular symbols.
The generic statement above talks about the function f(x). To make it more concrete, let's suppose this is the square function, so f(x) is x². Suppose, in addition, that I use letters near the end of the Latin alphabet to stand for real numbers, so y seems as good a choice as c for the point in the domain we approach for the limit. Now the above statement about continuity becomes:
For every real number y, for every positive real number epsilon, there exists a positive real number delta, for every real number x, if |x-y| is less than delta, then |x² - y²| is less than epsilon.
From a logical point of view, this is extremely similar to the first statement about continuity at every real point, except instead of a generic function f(x), there is a definite function x². However, from a psychological and cognitive point of view, many people are in the habit of thinking of y as the dependent variable that (graphically) expresses position in the vertical dimension of a graph. Looking at the statement above, they picture y² shooting out sideways or something.
The lesson is that logical expression is very sensitive to context and connotations that it is afloat in. Although logic can't completely concede to the surrounding context, since it tries to be precise and self-sufficient, logic has to be aware of strange, distracting associations that the use of a particular symbol or word may cause.
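One way to internalize the quantifier order is to play the delta-chooser. A small Python sketch (my construction, not from the post): given y and epsilon, the standard choice delta = min(1, epsilon / (2|y| + 1)) works, because |x - y| < delta forces |x + y| ≤ |x - y| + 2|y| < 2|y| + 1, so |x² - y²| = |x - y|·|x + y| < delta·(2|y| + 1) ≤ epsilon.

```python
def delta_for_square(y, epsilon):
    """Choose a delta witnessing continuity of x**2 at y.

    Because delta <= 1, any x with |x - y| < delta satisfies
    |x + y| < 2*abs(y) + 1, and therefore
    |x**2 - y**2| = |x - y| * |x + y| < delta * (2*abs(y) + 1) <= epsilon.
    """
    return min(1.0, epsilon / (2 * abs(y) + 1))

# Spot-check: sampled x strictly within delta of y keep x**2 within epsilon of y**2.
y, epsilon = 5.0, 0.01
delta = delta_for_square(y, epsilon)
for k in range(-100, 101):
    x = y + (k / 101) * delta          # |x - y| < delta
    assert abs(x * x - y * y) < epsilon
```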
Thursday, February 18, 2010
symmetry
Symmetric: The same, only different.
That'll do as a first approximation of a definition. Symmetrical objects aren't identical, but their shared structure (the "same" part) provides us with a short-cut to understanding them.
Logical operations have an abundance of symmetry. Lots of us are familiar with De Morgan's Law: not (P and Q) is equivalent to (not P) or (not Q): the not operation distributes over and, plus "toggles" it into an or. Symmetrically, you can replace every and by an or (and vice-versa) above, to get the other half of De Morgan's law.
There's probably some deep aesthetic pleasure and satisfaction we experience when we discover symmetry, and this probably helps us remember and use symmetrical concepts. However, occasionally the symmetry is so striking that we find it difficult to completely absorb and use. Perhaps this difficulty recedes when we become used to the symmetry, but it's certainly there at first.
I'm thinking of the distributive laws. One form states that P and (Q or R) is equivalent to (P and Q) or (P and R). We can certainly verify this using logical tools such as truth tables and Venn diagrams, and it is analogous to the distributive property in arithmetic, where multiplication is distributed over addition (just substitute multiplication for and and addition for or). However, the comfort and support of analogy evaporate when the second, symmetrical, form is considered: transform each and above into an or, and vice versa. In algebra, there is no distribution of addition over multiplication to help our intuition, but in logic and distributes over or, and or distributes over and.
I became forcefully aware of this working on an exercise I had posed to my students. In the middle of several transformations of some logical expressions, many of us ended up staring at something like:
(not P or R1) and (not P or R2)
Most of us saw one application of the distributive law --- distribute the middle and over the bracketed ors, corresponding to what we would call "expanding" in arithmetic:
((not P or R1) and not P) or ((not P or R1) and R2)
Unfortunately this approach, even when repeated on the bracketed expression on either side of the central or, didn't seem to lead very quickly to the desired result. The interesting thing is that few of us saw that the original expression contained another application of the distributive law: the or following not P was distributed over the and, and this can be reversed, corresponding to what we would call "factoring" in arithmetic. This approach yields:
not P or (R1 and R2)
... which happened to take us closer to the solution of the exercise.
What intrigues me is the tendency to "see" the first possible application of the distributive law, and not the second. An underlying difficulty is that there are twice as many distributive laws in logic (and over or plus or over and) as there are in arithmetic. This is probably aggravated by it being cognitively a bit harder to recognize the possibility of factoring compared to the possibility of expanding (gathering versus distributing, perhaps).
And we haven't (most of us, anyway) had several years of schoolwork preparing us to recognize these patterns. In some parallel universe, kids learn to manipulate logic symbols in grade one, and they just shake their heads ironically at our difficulty. But then, they are completely stumped by addition being commutative.
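The factoring step is easy to verify exhaustively, since three variables admit only eight truth assignments. A short Python check (variable names match the post's P, R1, R2):

```python
from itertools import product

def expanded(p, r1, r2):
    # the original expression: (not P or R1) and (not P or R2)
    return ((not p) or r1) and ((not p) or r2)

def factored(p, r1, r2):
    # the "factored" form: not P or (R1 and R2)
    return (not p) or (r1 and r2)

# Exhaustive truth-table comparison: the two forms agree on all 8 assignments.
for p, r1, r2 in product([False, True], repeat=3):
    assert expanded(p, r1, r2) == factored(p, r1, r2)
```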
Friday, February 5, 2010
The goldilocks problem
In teaching students how to design and implement proofs, we start them with an extremely structured format. This structure is both a boon and an irritation, since it helps organize the ingredients of a proof, while at the same time making the format of a proof extremely predictable. We have the goldilocks problem of making the form predictable enough, but not too predictable.
If students want to prove something about all natural numbers, they introduce a name for a generic natural number, prove things about it, and then conclude that the results they've proved apply to all natural numbers.
We force them to indent the results they prove about the generic natural number, to emphasize that they are inside the "world" where the name they've introduced is a generic natural number.
If they go further and implement a proof about an implication, of the form P implies Q, we have them (at least for a direct proof) assume P is true, indent some more, and then derive Q. Our rationale for this step is that if P were false the implication would be vacuously true, so they only have to worry about the case where P is true.
We expect students who become mature theoreticians will develop their own voice, and write clear, valid proofs that diverge from the proof format that we impose in our course, with its arbitrary order and indentation. We don't know exactly when they will develop this mathematical maturity (some of them may already have), so in the meantime we coerce them into mastering one format for proofs that they, and the course teaching staff, can agree works.
My challenge in communicating the proof format to them is to get most students to the point that they are nodding along with the proof format because they understand it, but stop before they are all nodding off. Goldilocks.