Saturday, October 16, 2010

Navigating scope

In CSC165 I coerce students into using a highly structured expression of proof. Just as the programming language Python uses white space on the left (indentation) to indicate scope, they must indent portions of their proof that share scope.

You want to prove something about every real number, so you name a generic real number x so that you can reason about it...

Assume x is a generic real number. To tip off the reader that I'm within the world-view where x has the properties of a real number, I indent one level from the left.
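
Something like this toy fragment shows the shape I'm after (a made-up scrap, not one of our actual course proofs):

    Assume x is a generic real number.
        Then x + 1 > x.    (since 1 > 0)
        Then there is some real number greater than x.    (introducing an existential)
    Then every real number is exceeded by some real number.    (x was generic, so conclude for all x)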

Several assumptions later I may have indented the text nearly off the right-hand margin. To make things even more unnatural, any non-trivial step in reasoning must be justified. The prose ends up chopped into pieces, and the flow suffers.

It feels as though truth and beauty are at odds here, or at least structure and beauty. I convince myself that, down the road, there will be a payoff when I (or some of my students) write proofs that have both an easy-flowing style and crystal-clear structure.

Until then, I coerce my students into writing structured proofs.

Thursday, September 2, 2010

Un-Nintendoed consequences

Decades ago Mike Constable showed me a bike route that follows laneways from Bathurst and College to about Clinton. Later I figured out how to combine this, using a north/south lane between Euclid and Manning, to join up with some more lanes running all the way to Shaw just above Dundas (there's a small traverse across a former schoolyard that has become either a movie set or a construction site).

Along the northern run of this route there's a municipal parking lot with this sign ("Purchase ticket and place face up on dashboard" --- to come). I can't ride by without having visions of happy customers snoozing with their cheeks on the dashboard. I passed my vision on to my sons, and thought of it as our private joke.

Then I noticed that somebody had spray-painted out the word "up," making the double-entendre much more intelligible. Nice to have more people on the same side of a joke as I am, I thought. That wasn't the end of it.

A week or so ago, I noticed that a new, graffiti-free sign had been put up. I guess this is in analogy with the broken-window theory of crime prevention: if you fix minor ruptures in the atmosphere such as broken windows and graffiti, you decrease the prevalence of harder stuff like muggings. I'm not sure I believe that, but I assume that the sign-replacer was thinking along those lines: better nip this bit of irony before it gets out of hand!

After all, irony allows us to consider alternative interpretations of reality. Start going down that road, and you might try to change reality as it's currently implemented. If you prefer your quo static, irony deficiency is a virtue rather than a vice.

If the world were programmed like a video game, irony could lead to un-Nintendoed consequences.

Sunday, March 21, 2010

Big Os, little statements

When we teach big-Oh theory we consider functions at a very large-scale resolution. We can make sweeping, and in another context ridiculous, statements such as: 3n^2 + 5, 15n^2 + 17, and n^2 are all the "same," since they grow at the same rate as n grows very large.

It requires a recalibration of what we mean by "same." If you were to plot the functions in the previous paragraph, they would trace different curves, but would grow at the same rate (at least after a "while"). Their sameness is in how they grow.
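
A throwaway Python check (mine, not course material) makes the recalibrated "sameness" concrete: the ratios of these functions to n^2 settle toward plain constants.

    # ratios of 3n^2+5 and 15n^2+17 to n^2 flatten out as n grows
    for n in (10, 1000, 100000):
        print(n, (3 * n**2 + 5) / n**2, (15 * n**2 + 17) / n**2)
    # the ratios approach the constant factors 3 and 15,
    # exactly the sort of difference big-Oh deliberately ignores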

The weird thing is that our motivation for studying big-Oh is to use these functions as bounds on the running time of programs. We try to count algorithm operations in a platform-independent manner, so that our estimate of running time is not mired in a particular processor chip, programming language, or operating system. After lots of generalization all that's left is how quickly the algorithm's execution time grows with the size of the input.

However, running times that differ by factors such as 15 or 3 matter a great deal. Once we're satisfied, or not concerned, with how an algorithm scales with input size, we have to worry about the bottlenecks that make the difference between, say, 15n^2 and 13n^2 as bounds on running time. At that point we'd get a good profiler and see whether we could shave off a statement here, a loop iteration there, and optimize our program.
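
For what it's worth, here's the kind of minimal profiling session I have in mind, using Python's standard cProfile module (the function is a made-up stand-in):

    import cProfile

    def squares(data):
        # the per-iteration constant factor is what the profiler exposes
        return [x * x for x in data]

    # print a call-by-call account of where the time goes
    cProfile.run('squares(list(range(10**6)))')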

Saturday, March 6, 2010

Getting to zeroth base

Lots of students arrive in CSC165 with a template for proofs by induction that has served them well thus far but (I think) won't be quite enough for the rest of their existence as computer scientists. I try to show them a few examples that don't quite fit the template.

My first divergence from templates is to prove things about inequalities, rather than equalities. Such a small thing makes a large difference.

Equalities are tidy: we claim that for each natural number n, some expression involving n is equal to some other expression involving n. During the induction step we plug in the expression involving n (we're justified in doing this by the induction hypothesis) and it is an exact fit. Often the desired result just falls off the other end.

Inequalities are another matter. Suppose we want to prove that for each natural number n, 3^n is at least n^3. During the induction step we must show that if the claim is true for some generic natural n, then it is true for n's successor n+1. In other words, we have to show that 3^(n+1) is at least (n+1)^3. In this case when we plug in the inductive hypothesis that 3^n is at least n^3, we don't have an exact fit (an equality) but an overestimate (an inequality). Some art is required to make sure that we can use overestimates at every step to show that the desired result follows.

The art that seems to work, in this case, requires the assumption that n is at least 3. This is strange, since the claim itself is true when n is 0, 1, 2, or 3, but the piece of logic that carries us from the claim for n to the analogous claim for n+1 needs n to be at least 3. That leaves us with no choice but to establish the claim for 0, 1, 2, and 3 by direct verification, as base cases.
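
For the record, here is one chain of overestimates that works (my own bookkeeping; check it rather than trusting me):

    3^(n+1) = 3 * 3^n
            >= 3 * n^3                  (induction hypothesis)
            = n^3 + 2 * n^3
            >= n^3 + 3n^2 + 3n + 1      (using 3n^2 <= n^3, which needs n >= 3, and 3n + 1 <= n^3, which needs n >= 2)
            = (n+1)^3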

This is the second place I break templates. Lots of students have rules-of-thumb for deciding what base cases are needed in a proof by induction. In truth, the decision about what must be established as a base case, by direct verification, is made by examining the induction step and seeing which cases are left out (not covered) by its logic.

The net effect is that although induction is a familiar topic to most CSC165 students, there are still aspects of this proof technique that many of them haven't seen before.

Friday, February 26, 2010

oopsilon-deltoids

Proving, or disproving, the existence of limits is a great target for the tools of logic. The bare-bones limit concept involves a cascade of three quantifiers, mixing universal and existential quantification:
For every positive real number epsilon, there exists a positive real number delta, for every real number x, if |x-c| is less than delta (and x is not c itself), then |f(x)-L| is less than epsilon.

Limit notation conceals some of the quantification of epsilon and delta:
As x approaches c, f(x) approaches L.
"Approaching" means getting arbitrarily close, and the limit form says you can get f(x) within epsilon of L by getting x within delta of c.

Continuity adds an extra feature to the limit concept --- the limit L that is approached by f(x) is exactly f(c). Now pile on another feature: f is continuous at every real number. As a limit this says that for every real number c, the limit of f(x) as x approaches c is f(c). Expressed with quantifiers, the whole bundle becomes:
For every real number c, for every positive real number epsilon, there exists a positive real number delta, for every real number x, if |x-c| is less than delta, then |f(x)-f(c)| is less than epsilon.
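
In symbols (my own rendering, not necessarily anyone's house style, with R+ for the positive reals):

    ∀ c ∈ R, ∀ epsilon ∈ R+, ∃ delta ∈ R+, ∀ x ∈ R, |x - c| < delta → |f(x) - f(c)| < epsilon
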
Okay so far? This is the sort of statement we learn how to structure proofs (and disproofs) of in CSC165. We become adept at juggling long strings of symbols and realizing the importance of having one sort of quantifier precede another. We become used to freely changing the symbolic names of variables as suits our purposes. We (and here I think I mean me) can lose sight of conventions that attach particular meaning to particular symbols.

The generic statement above talks about the function f(x). To make it more concrete, let's suppose this is the square function, so f(x) is x^2. Suppose, in addition, that I use letters near the end of the Latin alphabet to stand for real numbers, so y seems as good a choice as c for the point in the domain we approach for the limit. Now the above statement about continuity becomes:
For every real number y, for every positive real number epsilon, there exists a positive real number delta, for every real number x, if |x-y| is less than delta, then |x^2 - y^2| is less than epsilon.

From a logical point-of-view, this is extremely similar to the first statement about continuity at every real point, except instead of a generic function f(x), there is a definite function x^2. However, from a psychological and cognitive point-of-view, many people are in the habit of thinking of y as the dependent variable that (graphically) expresses position in the vertical dimension of a graph. Looking at the statement above, they picture y^2 shooting out sideways or something.

The lesson is that logical expression is very sensitive to the context and connotations it is afloat in. Although logic can't completely concede to the surrounding context, since it tries to be precise and self-sufficient, it has to be aware of the strange, distracting associations that the use of a particular symbol or word may cause.

Thursday, February 18, 2010

symmetry

Symmetric: The same, only different.

That'll do as a first approximation of a definition. Symmetrical objects aren't identical, but their shared structure (the "same" part) provides us with a short-cut to understanding them.

Logical operations have an abundance of symmetry. Lots of us are familiar with De Morgan's law: not (P and Q) is equivalent to (not P) or (not Q): the not operation distributes over the and, and "toggles" it into an or. Symmetrically, you can replace every and by an or (and vice-versa) above, to get the other half of De Morgan's law.

There's probably some deep aesthetic pleasure and satisfaction we experience when we discover symmetry, and this probably helps us remember and use symmetrical concepts. However, occasionally the symmetry is so striking that we find it difficult to completely absorb and use. Perhaps this difficulty recedes when we become used to the symmetry, but it's certainly there at first.

I'm thinking of the distributive laws. One form states that P and (Q or R) is equivalent to (P and Q) or (P and R). We can certainly verify this using logical tools such as truth tables and Venn diagrams, and it is analogous to the distributive property in arithmetic, where multiplication is distributed over addition (just substitute multiplication for and and addition for or). However, the comfort and support of analogy evaporate when the second, symmetrical, form is considered: transform each and above into an or, and vice versa. In algebra, there is no distribution of addition over multiplication to help our intuition, but in logic and distributes over or, and or distributes over and.

I became forcefully aware of this while working on an exercise I had posed to my students. In the middle of several transformations of some logical expressions, many of us ended up staring at something like:

(not P or R1) and (not P or R2)

Most of us saw one application of the distributive law --- distribute the middle and over the bracketed ors, corresponding to what we would call "expanding" in arithmetic:

((not P or R1) and not P) or ((not P or R1) and R2)

Unfortunately this approach, even when repeated on the bracketed expressions on either side of the central or, didn't seem to lead very quickly to the desired result. The interesting thing is that few of us saw that the original expression contained another application of the distributive law: the or following not P was distributed over the and, and this can be reversed, corresponding to what we would call "factoring" in arithmetic. This approach yields:

not P or (R1 and R2)

... which happened to take us closer to the solution of the exercise.
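
For the suspicious, a brute-force Python check that the factoring is sound, treating P, R1, and R2 as plain booleans:

    from itertools import product

    # (not P or R1) and (not P or R2)  versus  not P or (R1 and R2)
    for P, R1, R2 in product((False, True), repeat=3):
        expanded = (not P or R1) and (not P or R2)
        factored = (not P) or (R1 and R2)
        assert expanded == factored
    print("equivalent on all 8 truth assignments")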

What intrigues me is the tendency to "see" the first possible application of the distributive law, and not the second. An underlying difficulty is that there are twice as many distributive laws in logic (and over or, plus or over and) as there are in arithmetic. This is probably aggravated by it being cognitively a bit harder to recognize the possibility of factoring compared to the possibility of expanding (gathering versus distributing, perhaps).

And we haven't (most of us, anyway) had several years of schoolwork preparing us to recognize these patterns. In some parallel universe, kids learn to manipulate logic symbols in grade one, and they just shake their heads ironically at our difficulty. But then, they are completely stumped by addition being commutative.

Friday, February 5, 2010

The goldilocks problem

In teaching students how to design and implement proofs, we start them with an extremely structured format. This structure is both a boon and an irritation, since it helps organize the ingredients of a proof, while at the same time making the format of a proof extremely predictable. We have the goldilocks problem of making the form predictable enough, but not too predictable.

If students want to prove something about all natural numbers, they introduce a name for a generic natural number, prove things about it, and then conclude that the results they've proved apply to all natural numbers.

We force them to indent the results they prove about the generic natural number, to emphasize that they are inside the "world" where the name they've introduced denotes a generic natural number.

If they go further and implement a proof about an implication, of the form P implies Q, we have them (at least for a direct proof) assume P is true, indent some more, and then derive Q. Our rationale for this step is that if P were false the implication would be vacuously true, so they only have to worry about the case where P is true.
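
Putting the two together, the nesting looks something like this (again, a toy shape rather than anything from our handouts):

    Assume n is a generic natural number.
        Assume n is even.    (the antecedent; if it were false we'd be vacuously fine)
            Then n^2 is even.    (n = 2k gives n^2 = 4k^2 = 2(2k^2))
        Then if n is even, then n^2 is even.
    Then for every natural number n, if n is even, then n^2 is even.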

We expect that students who become mature theoreticians will develop their own voice, and write clear, valid proofs that diverge from the proof format that we impose in our course, with its arbitrary order and indentation. We don't know exactly when they will develop this mathematical maturity (some of them may already have), so in the meantime we coerce them into mastering one format for proofs that they, and the course teaching staff, can agree works.

My challenge in communicating the proof format to them is to get most students to the point that they are nodding along with the proof format because they understand it, but stop before they are all nodding off. Goldilocks.

Friday, January 29, 2010

limits and turf

Sometimes we computer science folk get into jurisdictional disputes with our neighbours in mathematics (partly because our neighbours used to be us, depending on how you reckon the heritage of computer science).

A standard dispute is over whether zero is a natural number. From a computer science perspective it seems pretty innocuous: zero is often a useful case to consider in the realm of whole numbers of a non-negative flavour, and it's certainly possible to count things starting from zero (and we do, in many programming languages, with list indices and such). In calculus, though, they insist on excluding this completely innocuous (not to mention small) element from the natural numbers.

Now I find myself a little impatient with the way at least one calculus text presents limits. I think of finite limits (what value does a function "get close" to as its argument approaches some constant) and infinite limits (what does it mean for a function to "get close" to infinity as its argument gets close to some value) as part of the same topic. After finding that these two topics seemed hugely different to my fall semester students, I began checking my calculus texts. Courant and Spivak get around to limits of sequences, and limits that are unbounded, toward the end of their treatment of limits. Most of my students use Salas, Hille, and Etgen, where finite limits are treated in the first hundred pages, and the remaining limit-related topics are spread over hundreds more pages (and months of the course).

My motive for discussing limits is twofold. They provide a good example of mixing statements involving "for all" with statements involving "exists". They are also an ideal starting point for the computer science topic of asymptotics: how do you express the idea that one function grows qualitatively faster than another?

The mixed-quantifiers side of limits takes my students back to calculus. For every positive real number epsilon, there is a positive real number delta, for every real number x, if x is within delta of a, then x-squared is within epsilon of a-squared. A true statement, since you get to choose the delta based on the epsilon. However, if you swap the order of "for every positive real number epsilon" and "there is a positive real number delta" you get a falsehood: you can't choose a delta that works for every epsilon.

The same reasoning, with appropriate modifications, works for infinite limits. For every positive real number epsilon, there is a positive real number delta, for every real number x, if x is within delta of infinity, then x-squared is within epsilon of infinity. What does "within delta" mean in this context? It means delta to the right of zero --- being close to infinity is synonymous with being far from zero.

Computer scientists want to talk about how exp(x) grows faster than x-squared. One slick way to do this (so slick that we don't allow our students to use it for a few weeks) is to consider the ratio exp(x)/x-squared. If, for every positive real number epsilon there is a positive real number delta, for every real number x, when x is within delta of infinity (bigger than delta), exp(x)/x-squared is within epsilon of infinity (bigger than epsilon), then exp(x) grows faster than x-squared (it does).
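
A quick numerical look at that ratio (floating point, so suggestive rather than conclusive):

    import math

    # exp(x)/x^2 blows past any fixed bound once x is large enough
    for x in (1, 5, 10, 20, 50):
        print(x, math.exp(x) / x**2)
    # by x = 50 the ratio is already around 2e18, and it keeps growing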

To us CS folk, the same notion of limit is being used in all these cases. It's a bit odd that they wouldn't be closely combined in our students' calculus text.

Thursday, January 21, 2010

Mick's quantifiers

There exists a house h in New Orleans, for every poor boy p, they call h the Rising Sun, and h has been the ruin of p.
I know it doesn't scan, but I can't help it. I've got this ear-worm of Mick Jagger singing the lyric above. I don't think The Stones ever did a cover of House of the Rising Sun, but I can hear it just as though they did.

I got into this state through an occupational hazard filtered through a perceptual problem. I'll be teaching a topic in logic next week called mixed quantifiers, the sort of thing that happens when you make a statement that some object of one type exists with the property that every object of some (possibly other) type has a property ... and so on. That's the occupational hazard. My perceptual problem is that whenever I hear a phrase like "mixed quantifiers" some auditory circuit runs a simulation of what the world might be like if it were really "Mick's quantifiers." So I can't safely hear phrases such as "mixed metaphors," "mixed company," or "mixed grill" without exploring a whole line of speculative alternative interpretations. Although I'm duty-bound to teach my students about mixed quantifiers, there will constantly be a voice at the back of my head wondering how all this fits in with Mick's quantifiers.

I mean, shouldn't I be asking my students to consider the difference between there being at least one instance of a house that has a terrible effect on every boy (or girl, depending on the version), and the situation where we switch the order?

For every poor boy p, there exists a house h in New Orleans, h is the ruin of p, and h is called the Rising Sun.
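
In something like class notation (the predicate names are mine), the two lyrics differ only in the order of the quantifiers:

    ∃ h in New Orleans, ∀ poor boys p, RisingSun(h) and Ruin(h, p)
    ∀ poor boys p, ∃ h in New Orleans, Ruin(h, p) and RisingSun(h)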

With this reversed order, each poor boy finds their own particular Rising Sun to torment them, whereas in the first version they all end up tormented in the same place. Wouldn't the world be a better place if veteran blues musicians improvised on riffs that are part of their common culture, and switched the quantifiers around a bit? Surely this is what Trad. Anon. had in mind when he/she/they wrote:

With one foot on existence, the other foot on for all, I'm going back to New Orleans to wear that chain and ball.

Wednesday, January 20, 2010

This week Friday falls on Wednesday.

I've already taught one cycle of this week's material to my Monday evening class, and now I'm trying to make sure my Monday-Wednesday-Friday version of the same material is consistent with it, while preparing other material. So the look-backwards function of a blog can be exercised from this vantage point as well as any other.

I learned (or perhaps re-learned) just how thoroughly counter-intuitive the notion of vacuous truth can be this week. The name sounds as though this variety of truth is silly, trivial, or empty, but it's an important technique in logic. If we have an implication that claims that if P is true, then Q follows, then the implication as a whole is true whenever P is false. The notion rests on the fact that an implication claims that Q follows from P, and the only time that "following from" rule is broken is when P is true and Q breaches faith by being false.

In particular, whenever P is false, no breach of faith is possible.

I've often fired up vast and powerful pedagogical machinery to convey this point, being sure to make it several different ways since it registers differently with different people. There are always some people who resist the first onslaught, so I back the machinery up and take another run at it. Usually nobody is willing to admit that vacuous truth still seems weird after two or three runs at it.

Monday night, though, I presented a particularly twisted exemplar of vacuous truth to an audience that wasn't shy about admitting that it seemed weird. Some idea of the dynamics can be seen in the annotated slide where I present the claim that for every real number x, if x^2-2x+2 = 0, then x > x+5. The unnerving thing is that the entire if-then claim is true, since you'll never find a real number that satisfies the first part but falsifies the second (there are no real roots of that quadratic equation). There's no magic: the false conclusion doesn't become true, but the claim that it follows from the antecedent is true.
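
Encoding the implication as (not P) or Q, a standard equivalence, a quick check (my own, not from the lecture) shows the vacuous truth in action:

    def claim(x):
        antecedent = (x**2 - 2*x + 2 == 0)     # no real solutions: the discriminant is 4 - 8 = -4
        consequent = (x > x + 5)               # false for every real x
        return (not antecedent) or consequent  # material implication

    # the antecedent is always false, so the implication is (vacuously) always true
    print(all(claim(n / 10) for n in range(-10000, 10001)))   # True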

In response to my question about whether this implication is true for all real numbers (it is), the Monday night crowd was split. Once we thrashed out the point that any real number substituted for x gives us a false antecedent (hence a true implication), I asked whether the claim would still be true if it instead said that there exists some real number for which the antecedent implies the consequent. The majority disagreed, and asked me to come up with examples. I started spinning off various real numbers: 17.98, 13.532, ... (any real number would work), and we eventually agreed that the "there exists" form of the statement is also true. Then somebody asked what happens if we change the set x is a member of from the real numbers (where there is no solution to the quadratic equation) to the complex numbers (where the solutions exist). That made the question more interesting, and I found myself regretting that I hadn't thought of this twist first.

I had the feeling that several people were coming to grips with vacuous truth in real time during the lecture. Impressive.

Friday, January 15, 2010

Correct unless mistaken

The narrow way we use the English word "unless" in a course about mathematical expression is a real eye-opener. Consider the following statements:
You can't win the lottery unless you buy a ticket.
I won't go unless you come too.

In the first, you could translate "unless" as "if-not," so not buying the ticket guarantees not winning, but the converse is false: you aren't guaranteed to win if you buy the ticket. In the second statement, many of my workmates read "unless" as "if-and-only-if-not," so if you don't come too, I certainly won't go, but if you come, I will certainly go. If we take our case to the web, we'll find lots of support for the "if-not" translation, but also some for a combination of both. For the purposes of mathematics, I'll stick with the "if-not" version, but I can see that it's slippery.
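
Here is the "if-not" reading of the lottery sentence spelled out as a small truth table (quick Python, my encoding):

    # "You can't win the lottery unless you buy a ticket"
    # if-not reading: (not ticket) implies (not win)
    for win in (False, True):
        for ticket in (False, True):
            holds = ticket or not win    # same as (not ticket) -> (not win)
            print(f"win={win}, ticket={ticket}: {holds}")
    # only win=True, ticket=False violates the claim: a ticket is
    # necessary for winning, but buying one guarantees nothing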

Another good confrontation with real-world interpretation came when I asked students to explain the double-meaning in the headline:
Iraqi head seeks arms

They noted that the two meanings of head (the body part perched on your neck, and the leader of some entity) paired up neatly with two meanings of arms (the body parts hanging from your shoulders, and weapons), and they concluded that the headline certainly meant to pair the leader meaning of head with weapons. I realize that Condie probably wasn't thinking of this headline when he made La Sala, but it does leave open the question of whether the other interpretation of the headline doesn't have some support in the real world (wherever that is).