Since the cat’s out of the bag and we’ve had some public discussion of our first Problem of the Week, I thought I’d officially kick things off with a new problem, this time under the ground rules discussed in our last post. This one comes from our regular reader John Smith, who wrote me this problem in email. It comes in two parts; here’s the first:

Consider a rectangle with sides of integer lengths a and b, subdivided into ab unit squares, and a diagonal line from corner to corner. Show that the number of unit squares that the line crosses is a+b-\gcd(a,b). (Count only those cases where the line crosses the interior of a square.)

A little while later, John wrote me a follow-up question, which will be our part two:

What would be the n-dimensional analogue of the previous problem? How would you prove it?

Please submit your solutions through email to topological[dot]musings[At]gmail[dot]com, rendering the stuff in brackets in the obvious way; please do not reply in comments! Solutions should arrive by May 20 (11:59pm UTC time if you want to be exact). We will post one of the correct solutions (or otherwise our own), but all correct solutions will be acknowledged, and successful solvers will have their names entered into our Problem-Solving Hall of Fame. So get your answers in, and happy solving!

[Added May 15: if you just want to solve part one, that's fine -- credit will be awarded for correct solutions. In general for multi-part problems, we ask that you submit all parts you intend to solve in one mailing -- thanks.]

Todd and I, after some discussion, have decided to start a new series of posts called “Problem of the Week“. Since the title is self-explanatory, there is nothing much to write about, except that this series will be a permanent fixture on our blog. Each week we will post a problem (unless both of us are too busy), and we invite solutions from all our readers, regular or not; in most cases solutions should be submitted through email within a week after the post. To keep things fair, we ask readers not to submit solutions in the comment section of a post. At the end of the allotted time, we will then select the most elegant solution (or anyway, one that we like) to a given problem, and post it together with the name of the proposer. (Note: solutions may be edited, or composed from solutions of more than one proposer.) Other people who submit correct solutions will also be acknowledged. In addition, credit will be given to all such people by having their names enshrined in the Problem-Solving Hall of Fame page, forever! Who could ask for anything more?!

Most of our problems will be “elementary”, meaning they won’t require knowledge of advanced areas of mathematics; generally they will require some knowledge of subjects such as elementary number theory, college-level calculus, basic algebra, trigonometry, inequalities, etc., and some “mathematical maturity”.

Please submit solutions to topological[dot]musings[At]gmail.com (remove the characters within the square brackets and replace them with the appropriate ones). You may send in your solutions in pdf, LaTeX or text format. (Though we prefer LaTeX and/or pdf submissions, we ask you not to worry too much about the format.) [Added May 15: in case of multi-part problems, please submit all the parts you intend to solve in one mailing. Credit will be awarded for each part correctly solved.]

I guess that’s it! Keep checking the blog for the next POW problem!

[Update: I have decided to pose at least one problem (or perhaps, two problems) a week from now and invite solutions to those problems from the readers. This should be exciting. I am sure all of you would want your names to be listed in the Problem-Solving Hall of Fame page! 8) ]

This post is about getting the readers of this blog to participate! And so, let me pose an elementary divisibility problem and invite solutions from the readers. “Elementary” doesn’t necessarily mean “easy”, by the way. It just means elementary methods can be used to solve the problem. I am particularly looking for elegant solutions, though all kinds of solutions are welcome. Readers with correct solutions will be given credit and their names will be included in the Problem-Solving Hall of Fame page on this blog! So, go ahead, write down your solutions.

Problem: Suppose x and y are two positive natural numbers such that x^2 + xy + 1 is divisible by y^2 + xy + 1. Prove that x = y.

The following theorem, I feel, is not very well-known, though it is a particularly useful one for solving certain types of “limit” problems. Let me pose a couple of elementary problems and offer their solutions. First, the theorem.

Stolz-Cesàro: Let (a_n)_{n \ge 1} and (b_n)_{n \ge 1} be two sequences of real numbers, such that (b_n) is positive, strictly increasing and unbounded. Then,

\displaystyle \lim_{n \to \infty} \frac{a_n}{b_n} = \lim_{n \to \infty} \frac{a_{n+1} - a_n}{b_{n+1} - b_n},

if the limit on the right hand side exists.

The proof involves the usual \epsilon - \delta method, and I will avoid presenting it here since it isn’t particularly interesting. Just as Abel’s lemma is the discrete analogue of integration by parts, the Stolz-Cesàro theorem may be considered the discrete analogue of L’Hospital’s rule in calculus.

Problem 1: Evaluate the limit \displaystyle \lim_{n \to \infty} \frac{1^k + 2^k + \ldots + n^k}{n^{k+1}}, where k \in \mathbb{N}.

Solution: One may certainly consider the above limit as a Riemann-sum which may then be transformed into the integral \displaystyle \int_0^1 x^k \, dx, which then obviously evaluates to 1/(k+1). But, we will take a different route here.

First, let a_n = 1^k + 2^k + \ldots + n^k and b_n = n^{k+1}. Then, we note that the sequence (b_n) is positive, strictly increasing and unbounded. Now,

\displaystyle \lim_{n \to \infty} \frac{a_{n+1} - a_n}{b_{n+1} - b_n} = \lim_{n \to \infty} \frac{(n+1)^k}{(n+1)^{k+1} - n^{k+1}}

\displaystyle = \lim_{n \to \infty} \frac{(n+1)^k}{\left(1 + \binom{k+1}{1}n + \binom{k+1}{2}n^2 + \ldots + \binom{k+1}{k}n^k + n^{k+1}\right) - n^{k+1}}

(using the binomial theorem)

\displaystyle = \lim_{n \to \infty} \frac{(n+1)^k / n^k}{\left(1 + \binom{k+1}{1}n + \binom{k+1}{2}n^2 + \ldots + \binom{k+1}{k}n^k \right) / n^k}

\displaystyle = \lim_{n \to \infty} \frac{(1 + 1/n)^k}{\binom{k+1}{k}} = \frac1{k+1}.

Therefore, using the Stolz-Cesàro theorem, we conclude that the required limit is also 1/(k+1).

Let us now look at another problem where applying the aforesaid theorem makes our job a lot easier. This problem is an example of one that is not amenable to the other usual methods of evaluating limits.

Problem 2: Let k\geq 2 be integers and suppose C: y = \sqrt {2x + 1}\ (x > 0). Given the tangent line at the point (a_{k},\, \sqrt {2a_{k} + 1}) from the point (0, k) to C, evaluate

\displaystyle \lim_{n\to\infty}\frac {1}{n^{3}}\sum_{k = 2}^{n}a_{k}.

Solution:(This is basically the solution I had offered elsewhere a while ago; so, it’s pretty much copy/paste!)

\displaystyle y = \sqrt {2x + 1}\Rightarrow \frac {dy}{dx} = \frac {1}{\sqrt {2x + 1}}.

So, the equation of the tangent line at the point (a_{k}, \sqrt {2a_{k} + 1}) is given by

\displaystyle y - \sqrt {2a_{k} + 1} = \frac {1}{\sqrt {2a_{k} + 1}}(x - a_{k}).

Since the point (0,k) lies on this line, we must have

\displaystyle k - \sqrt {2a_{k} + 1} = \frac {1}{\sqrt {2a_{k} + 1}}( - a_{k})

The above, after squaring and some algebraic manipulation yields
a_{k}^{2} + 2(1 - k^{2})a_{k} + 1 - k^{2} = 0, which implies a_{k} = (k^{2} - 1) + k(\sqrt {k^{2} - 1}). We drop the negative root because a_{k} > 0 for all k\ge 2.

(This is where the Stolz-Cesàro theorem actually comes into play!)

Now, let (b_{n}) and (c_{n}) be two sequences such that \displaystyle b_{n} = \sum_{k = 2}^{n}a_{k} and c_{n} = n^{3}.
Note that (c_{n}) is a positive, increasing and unbounded sequence.

Therefore, \displaystyle \lim_{n\to \infty}\frac {b_{n + 1} - b_{n}}{c_{n + 1} - c_{n}} = \lim_{n\rightarrow \infty}\frac {\sum_{k = 2}^{n + 1}a_{k} - \sum_{k = 2}^{n}a_{k}}{(n + 1)^{3} - n^{3}} = \lim_{n\rightarrow \infty}\frac {a_{n + 1}}{3n^{2} + 3n + 1}

\displaystyle = \lim_{n\to \infty}\frac {(n + 1)^{2} - 1 + (n + 1)\sqrt {(n + 1)^{2} - 1}}{3n^{2} + 3n + 1}

\displaystyle = \lim_{n \to \infty} \frac{(1 + 1/n)^2 - 1/n^2 + (1 + 1/n)\sqrt{1 + 2/n}}{3 + 3/n + 1/n^2}

= 2/3.

Therefore, by the Stolz- Cesàro theorem, we have

\displaystyle \lim_{n\to \infty}\frac {b_{n}}{c_{n}} = 2/3 \,, and so

\displaystyle \lim_{n\to \infty}\frac {1}{n^{3}}\sum_{k = 2}^{n}a_{k} = 2/3.

The late great Paul Erdös was not a religious man (his take on religion seems to have been fairly ironic, referring for example to God as “The Supreme Fascist”), except of course when it came to mathematics. Ever the Platonist, he considered that when he died, he might finally get a chance to gaze upon “The Book” which, as if written by God, contains the most beautiful and enlightening proofs of all theorems. The highest form of praise from Erdös for a proof was, “It’s straight from The Book!” He also said, “You don’t have to believe in God, but you should believe in The Book!”

Do you believe in The Book? I’m not sure I do!

In fact, there is this book by Aigner and Ziegler, “Proofs from The Book”. In it they include the following one-sentence proof by Don Zagier on Fermat’s two square theorem (that a prime congruent to 1 \pmod 4 is a sum of two squares):

A One-Sentence Proof That Every Prime p congruent to 1 modulo 4 Is a Sum of Two Squares

                                        D. Zagier

      Department of Mathematics, University of Maryland, College Park, MD 20742

The involution on a finite set S = {(x,y,z) \in N^3 : x^2 +4yz = p } defined by

                      ( x+2z, z, y-x-z )  if   x < y-z
       (x,y,z) ---> { ( 2y-x, y, x-y+z )  if y-z < x < 2y
                      ( x-2y, x-y+z, y )  if   x > 2y

has exactly one fixed point, so |S| is odd and the involution defined by 

       (x,y,z) ---> (x,z,y)

also has a fixed point.

I plucked this off the Web from here; the author of the page prefaces it with a comment:

The following constitutes the essential text of a complete research article; I have
omitted only some comments at the end concerning the history of this type of argument.
The author reproves a famous result.  He builds his proof into a single sentence
as simply a tour-de-force.  In fact, he has left many straightforward steps for
the reader to verify.

1.  As an exercise in critical reading, list all the implicit claims that the
reader must verify in order to accept this argument as a proof.

2.  As an exercise in logic and algebra, supply all the details necessary to
support these claims.   Package all this as a long-winded rewrite of Zagier's
article written so that any high school algebra student could easily read it
with comprehension.

You should expect to expand Zagier's single sentence to a full page or more.

Um, yeah.

My own reaction to this proof: it is surely dazzling in its compression, although one’s first reaction is likely to be “WTF?!?” — what just happened here? The underlying idea is that the number of fixed points of an involution f: S \to S on a finite set S (i.e., a function f equal to its own inverse) has the same parity as S itself; it follows that if S has odd parity, then any involution on S has at least one fixed point; such a fixed point of the involution (x, y, z) \leftrightarrow (x, z, y) on Zagier’s set S yields a solution (x, y) to x^2 + 4y^2 = p, whence the theorem. So the bulk of the proof is in showing that S has odd parity, by showing that his nontrivial involution has exactly one fixed point.

And I guess you can see, by staring at his casewise-defined involution for a while, that its only fixed point is (x, y, z) = (1, 1, n) where p = 4n+1. It then remains to check that this really is a well-defined function from S to S, and it really is an involution. The full verification probably does take up at least a page.

It truly is a jaw-dropping proof. My problem though is that it looks like black magic. I mean, I can construct a line-by-line verification that the proof does what it purports to do, but in a deeper sense I still don’t get it. How Zagier cooked up this involution is a mystery to me, and unless I made a concerted effort to memorize it, it would remain unmemorable to me (that is, unless someone were to reveal the underlying mystery to me — I suspect that that would take a few sentences or more! Can anyone help me?).

What do you think — does it qualify as a Book proof? Me personally, I prefer proofs which are enlightening — arguments that I can really understand, proofs that stick, proofs I can take with me to the grave. Put it this way: if God were to write a proof which consumed an absolute minimum number of bytes in some optimal language, it still wouldn’t be much of a Book proof to me unless I (a limited human) could really understand it, and if it were really better in that sense than its closest competitors.

I don’t think I believe too strongly in the reality of “Book proofs”, or at least I’m skeptical that every theorem can be said to have a Book proof. Every mathematical statement and proof is embodied in some larger context or matrix of ideas, many requiring patient assimilation before a light suddenly flashes on. I tend to believe that’s the rule rather than the exception, and the idea that we should believe in a Book proof for every theorem, possessing a snappy immediacy which cries “Behold!”, is based on a dangerous and even crazy fallacy concerning the nature of mathematics.

[At the same time: we can all agree that Erdös was an absolute genius at finding Book proofs! :-) ]

In this post, I’d like to move from abstract, general considerations of Boolean algebras to more concrete ones, by analyzing what happens in the finite case. A rather thorough analysis can be performed, and we will get our first taste of a simple categorical duality, the finite case of Stone duality which we call “baby Stone duality”.

Since I have just mentioned the “c-word” (categories), I should say that a strong need for some very basic category theory makes itself felt right about now. It is true that Marshall Stone stated his results before the language of categories was invented, but it’s also true (as Stone himself recognized, after categories were invented) that the most concise and compelling and convenient way of stating them is in the language of categories, and it would be crazy to deny ourselves that luxury.

I’ll begin with a relatively elementary but very useful fact discovered by Stone himself — in retrospect, it seems incredible that it was found only after decades of study of Boolean algebras. It says that Boolean algebras are essentially the same things as what are called Boolean rings:

Definition: A Boolean ring is a commutative ring (with identity 1) in which every element x is idempotent, i.e., satisfies x^2 = x.

Before I explain the equivalence between Boolean algebras and Boolean rings, let me tease out a few consequences of this definition.

Proposition 1: For every element x in a Boolean ring, 2x = 0.

Proof: By idempotence, we have x + 1 = (x+1)^2 = x^2 + 2x + 1. Since x = x^2, we may additively cancel in the ring to conclude 0 = 2x. \Box

This proposition implies that the underlying additive group of a Boolean ring is a vector space over the field \mathbb{Z}_2 consisting of two elements. I won’t go into details about this, only that it follows readily from the proposition if we define a vector space over \mathbb{Z}_2 to be an abelian group V together with a ring homomorphism \mathbb{Z}_2 \to Hom(V, V) to the ring of abelian group homomorphisms from V to itself (where such homomorphisms are “multiplied” by composing them; the idea is that this ring homomorphism takes an element r = 0, 1 to scalar-multiplication r \cdot (-): V \to V).

Anyway, the point is that we can now apply some linear algebra to study this \mathbb{Z}_2-vector space; in particular, a finite Boolean ring B is a finite-dimensional vector space over \mathbb{Z}_2. By choosing a basis, we see that B is vector-space isomorphic to \mathbb{Z}_{2}^{n} where n is the dimension. So the cardinality of a finite Boolean ring must be of the form 2^n. Hold that thought!

Now, the claim is that Boolean algebras and Boolean rings are essentially the same objects. Let me make this more precise: given a Boolean ring B, we may construct a corresponding Boolean algebra structure on the underlying set of B, uniquely determined by the stipulation that the multiplication x \cdot y of the Boolean ring match the meet operation x \wedge y of the Boolean algebra. Conversely, given a Boolean algebra B, we may construct a corresponding Boolean ring structure on B, and this construction is inverse to the previous one.

In one direction, suppose B is a Boolean ring. We know from before that a binary operation on a set B that is commutative, associative, unital [has a unit or identity] and idempotent — here, the multiplication of B — can be identified with the meet operation of a meet-semilattice structure on B, uniquely specified by taking its partial order to be defined by: x \leq y iff x = x \cdot y. It immediately follows from this definition that the additive identity 0 \in B satisfies 0 \leq y for all y (is the bottom element), and the multiplicative identity 1 \in B satisfies x \leq 1 for all x (is the top element).

Notice also that x \wedge (1-x) = x (1-x) = 0, by idempotence. This leads one to suspect that 1-x will be the complement of x in the Boolean algebra we are trying to construct; we are partly encouraged in this by noting x = 1 - (1 - x), i.e., x is equal to its putative double negation.

Proposition 2: x \mapsto 1-x is order-reversing.

Proof: Looking at the definition of the order, this says that if x = x y, then 1-y = (1-x)(1-y). This is immediate. \Box

So, x \mapsto 1 - x is an order-reversing map B \to B (an order-preserving map B \to B^{op}) which is a bijection (since it is its own inverse). We conclude that B \to B^{op}: x \mapsto 1-x is a poset isomorphism. Since B has meets and B \cong B^{op}, B^{op} also has meets (and the isomorphism preserves them). But meets in B^{op} are joins in B. Hence B has both meets and joins, i.e., is a lattice. More exactly, we are saying that the function f(x) = 1 - x takes meets in B to joins in B; that is,

f(x \wedge y) = 1 - x y = f(x) \vee f(y) = (1 - x) \vee (1 - y)

or, replacing x by 1-x and y by 1-y,

1 - (1-x)(1-y) = x \vee y

whence x \vee y = x + y - x y = x + y + xy, using the proposition 1 above.

Proposition 3: 1 - x is the complement of x.

Proof: We already saw x \wedge (1-x) = x(1-x) = 0. Also

x \vee (1-x) = x + (1 - x) + x(1-x) = x + (1-x) + 0 = 1,

using the formula for join we just computed. This completes the proof. \Box

So the lattice is complemented; the only thing left to check is distributivity. Following the definitions, we have (x \vee y) \wedge z = (x + y + xy)z = xz + yz + xyz. On the other hand, (x \wedge z) \vee (y \wedge z) = xz + yz + (xz)(yz) = xz + yz + xyz, using idempotence once again. So the distributive law for the lattice is satisfied, and therefore we get a Boolean algebra from a Boolean ring.

Naturally, we want to invert the process: starting with a Boolean algebra structure on a set B, construct a corresponding Boolean ring structure on B whose multiplication is the meet of the Boolean algebra (and also show the two processes are inverse to one another). One has to construct an appropriate addition operation for the ring. The calculations above indicate that the addition should satisfy x \vee y = x + y + x \wedge y, so that x \vee y = x + y if x \wedge y = 0 (i.e., if x and y are disjoint): this gives a partial definition of addition. Continuing this thought, if we express x \vee y = x + y + x \wedge y as a disjoint sum of some element a and x \wedge y, we then conclude x \vee y = a + x \wedge y, whence a = x + y by cancellation. In the case where the Boolean algebra is a power set PX, this element a is the symmetric difference of x and y. This generalizes: if we define the addition by the symmetric difference formula x + y := (\neg x \wedge y) \vee (x \wedge \neg y), then x + y is disjoint from x \wedge y, so that

(x + y) + x \wedge y

= (x + y) \vee (x \wedge y) = (\neg x \wedge y) \vee (x \wedge \neg y) \vee (x \wedge y) = x \vee y

after a short calculation using the complementation and distributivity axioms. After more work, one shows that x + y is the addition operation for an abelian group, and that multiplication distributes over addition, so that one gets a Boolean ring.

Exercise: Verify this last assertion.

However, the assertion of equivalence between Boolean rings and Boolean algebras has a little more to it: recall for example our earlier result that sup-lattices “are” inf-lattices, or that frames “are” complete Heyting algebras. Those results came with caveats: that while e.g. sup-lattices are extensionally the same as inf-lattices, their morphisms (i.e., structure-preserving maps) are different. That is to say, the category of sup-lattices cannot be considered “the same as” or equivalent to the category of inf-lattices, even if they have the same objects.

Whereas here, in asserting Boolean algebras “are” Boolean rings, we are making the stronger statement that the category of Boolean rings is the same as (is isomorphic to) the category of Boolean algebras. In one direction, given a ring homomorphism f: B \to C between Boolean rings, it is clear that f preserves the meet x \cdot y and join x + y + x y of any two elements x, y [since it preserves multiplication and addition] and of course also the complement 1 + x of any x; therefore f is a map of the corresponding Boolean algebras. Conversely, a map f: B \to C of Boolean algebras preserves meet, join, and complementation (or negation), and therefore preserves the product x \wedge y and sum (\neg x \wedge y) \vee (x \wedge \neg y) in the corresponding Boolean ring. In short, the operations of Boolean rings and Boolean algebras are equationally interdefinable (in the official parlance, they are simply different ways of presenting of the same underlying Lawvere algebraic theory). In summary,

Theorem 1: The above processes define functors F: \mbox{BoolRing} \to \mbox{BoolAlg}, G: \mbox{BoolAlg} \to \mbox{BoolRing}, which are mutually inverse, between the category of Boolean rings and the category of Boolean algebras.

  • Remark: I am taking some liberties here in assuming that the reader is already familiar with, or is willing to read up on, the basic notion of category, and of functor (= structure-preserving map between categories, preserving identity morphisms and composites of morphisms). I will be introducing other categorical concepts piece by piece as the need arises, in a sort of apprentice-like fashion.

Let us put this theorem to work. We have already observed that a finite Boolean ring (or Boolean algebra) has cardinality 2^n — the same as the cardinality of the power set Boolean algebra PX if X has cardinality n. The suspicion arises that all finite Boolean algebras arise in just this way: as power sets of finite sets. That is indeed a theorem: every finite Boolean algebra B is naturally isomorphic to one of the form PX; one of our tasks is to describe X in terms of B in a “natural” (or rather, functorial) way. From the Boolean ring perspective, X is a basis of the underlying \mathbb{Z}_2-vector space of B; to pin it down exactly, we use the full ring structure.

X is naturally a basis of PX; more precisely, under the embedding i: X \to PX defined by x \mapsto \{x\}, every subset S \subseteq X is uniquely a disjoint sum of finitely many elements of i(X): S = \sum_{x \in X} a_x(S) \{x\} where a_x(S) \in \{0, 1\} = \mathbb{Z}_2: naturally, a_x(S) = 1 iff x \in S. For each S, we can treat the coefficient a_x(S) as a function of x valued in \mathbb{Z}_2. Let \hom(X, \mathbb{Z}_2) denote the set of functions X \to \mathbb{Z}_2; this becomes a Boolean ring under the obvious pointwise definitions (f + g)(x) := f(x) + g(x) and (f g)(x) = f(x) g(x). The function PX \to \hom(X, \mathbb{Z}_2) which takes S \in PX to the coefficient function a_{(-)}(S) is a Boolean ring map which is one-to-one and onto, i.e., is a Boolean ring isomorphism. (Exercise: verify this fact.)

Or, we can turn this around: for each x \in X, we get a Boolean ring map PX \to \mathbb{Z}_2 which takes S to a_x(S). Let \mbox{Bool}(PX, \mathbb{Z}_2) denote the set of Boolean ring maps PX \to \mathbb{Z}_2.

Proposition 4: For a finite set X, the function X \to \mbox{Bool}(PX, \mathbb{Z}_2) that sends x to a_x(-) is a bijection (in other words, an isomorphism).

Proof: We must show that for every Boolean ring map \phi: PX \to \mathbb{Z}_2, there exists a unique x \in X such that \phi = a_x(-), i.e., such that \phi(T) = a_x(T) for all T \in PX. So let \phi be given, and let S be the intersection (or Boolean ring product) of all T \in PX for which \phi(T) = 1. Then

\phi(S) = \phi(\prod_{T: \phi(T) = 1} T) = \prod_{T: \phi(T) = 1} \phi(T) = 1.

I claim that S must be a singleton \{x\} for some (evidently unique) x \in X. For 1 = \phi(S) = \phi(\sum_{x \in S} \{x\}) = \sum_{x \in S} \phi(\{x\}), forcing \phi(\{x\}) = 1 for some x \in S. But then S \subseteq \{x\} according to how S was defined, and so S = \{x\}. To finish, I now claim \phi(T) = a_x(T) for all T \in PX. But \phi(T) = 1 iff S \subseteq T iff x \in T iff a_x(T) = 1. This completes the proof. \Box

This proposition is a vital clue, for if B is to be isomorphic to a power set PX (equivalently, to \hom(X, \mathbb{Z}_2)), the proposition says that the X in question can be retrieved reciprocally (up to isomorphism) as \mbox{Bool}(B, \mathbb{Z}_2).

With this in mind, our first claim is that there is a canonical Boolean ring homomorphism

B \to \hom(\mbox{Bool}(B, \mathbb{Z}_2), \mathbb{Z}_2)

which sends b \in B to the function eval_b which maps \phi \in \mbox{Bool}(B, \mathbb{Z}_2) to \phi(b) (i.e., evaluates \phi at b). That this is a Boolean ring map is almost a tautology; for instance, that it preserves addition amounts to the claim that eval_{b+c}(\phi) = eval_b(\phi) + eval_c(\phi) for all \phi \in \mbox{Bool}(B, \mathbb{Z}_2). But by definition, this is the equation \phi(b+c) = \phi(b) + \phi(c), which holds since \phi is a Boolean ring map. Preservation of multiplication is proved in exactly the same manner.

Theorem 2: If B is a finite Boolean ring, then the Boolean ring map

eval: B \to \hom(\mbox{Bool}(B, \mathbb{Z}_2), \mathbb{Z}_2)

is an isomorphism. (So, there is a natural isomorphism B \cong P(\mbox{Bool}(B, \mathbb{Z}_2).)

Proof: First we prove injectivity: suppose b \in B is nonzero. Then \neg b \neq 1, so the ideal (\neg b) = \{a \cdot \neg b: a \in B\} = \{x \in B: x \leq \neg b\} is a proper ideal. Let I be a maximal proper ideal containing \neg b, so that B/I is both a field and a Boolean ring. Then B/I \cong \mathbb{Z}_2 (otherwise any element x \in B/I not equal to 0, 1 \in B/I would be a zero divisor on account of x(1-x) = 0). The evident composite

B \to B/I \cong \mathbb{Z}_2

yields a homomorphism \phi:  B  \to \mathbb{Z}_2 for which \phi(\neg b) = \phi(1-b) = 0, so \phi(b) = eval_b(\phi) = 1. Therefore eval_b is nonzero, as desired.

Now we prove surjectivity. A function g: \mbox{Bool}(B, \mathbb{Z}_2) \to \mathbb{Z}_2 is determined by the set of elements \phi mapping to 1 under g, and each such homomorphism \phi: B \to \mathbb{Z}_2, being surjective, is uniquely determined by its kernel, which is a maximal ideal. Let J be the intersection of these maximal ideals; it is an ideal. Notice that an ideal is closed under joins in the Boolean algebra, since if x, y belong to J, then so does x \vee y = x + y + x y. Let j be the join of the finitely many elements of J; notice J = \{x \in B: x \leq j\} = (j) (actually, this proves that every ideal of a finite Boolean ring B is principal). In fact, writing k_\phi for the unique element such that \ker(\phi) = (k_\phi), we have

j = \bigwedge_{\phi: g(\phi) = 1} k_\phi

(certainly j \leq k_\phi for all such \phi, since J \subseteq \ker(\phi) = \{x \in B: x \leq k_\phi\}, but also \bigwedge_{g(\phi) = 1} k_\phi belongs to the intersection of these kernels and hence to J = \{x \in B: x \leq j\}, whence \bigwedge_{g(\phi) = 1} k_\phi \leq j).

Now let b = 1- j; I claim that g = eval_b, proving surjectivity. We need to show g(\phi) = eval_b(\phi) = \phi(b) for all \phi \in \mbox{Bool}(B, \mathbb{Z}_2). In one direction, we already know from the above that if g(\phi) = 1, then j belongs to the kernel of \phi, so \phi(j) = 0, whence \phi(b) = \phi(1-j) = 1.

For the other direction, suppose \psi(b) = 1, or that \psi(j) = 0. Now the kernel of \psi is principal, say (k) for some k \neq 1. We have j \leq k, so

k = k \vee j = k \vee \bigwedge_{g(\phi) = 1} k_\phi = \bigwedge_{g(\phi) = 1} k \vee k_\phi

from which it follows that k \vee k_\phi \neq 1 for some \phi \in g^{-1}(1). But then (k \vee k_\phi) is a proper ideal containing the maximal ideals (k) and (k_\phi); by maximality it follows that (k) = (k \vee k_\phi) = (k_\phi). Since \psi and \phi have the same kernels, they are equal. And therefore g(\psi) = g(\phi) = 1. We have now proven both directions of the statement (\psi(b) = 1 if and only if g(\psi) = 1), and the proof is now complete. \Box

  • Remark: In proving both injectivity and surjectivity, we had in each case to pass back and forth between certain elements b and their negations, in order to take advantage of some ring theory (kernels, principal ideals, etc.). In the usual treatments of Boolean algebra theory, one circumvents this passage back-and-forth by introducing the notion of a filter of a Boolean algebra, dual to the notion of ideal. Thus, whereas an ideal is a subset I \subseteq B closed under joins and such that x \wedge y \in I for x \in B, y \in I, a filter is (by definition) a subset F closed under meets and such that x \vee y \in F whenever x \in B, y \in F (this second condition is equivalent to upward-closure: y \in F and y \leq x implies x \in F). There are also notions of principal filter and maximal filter, or ultrafilter as it is usually called. Notice that if I is an ideal, then the set of negations \{\neg x: x \in I\} is a filter, by the De Morgan laws, and vice-versa. So via negation, there is a bijective correspondence between ideals and filters, and between maximal ideals and ultrafilters. Also, if f: B \to C is a Boolean algebra map, then the inverse image f^{-1}(1) is a filter, just as the inverse image f^{-1}(0) is an ideal. Anyway, the point is that had we already had the language of filters, the proof of theorem 2 could have been written entirely in that language by straightforward dualization (and would have saved us a little time by not going back and forth with negation). In the sequel we will feel free to use the language of filters, when desired.

For those who know some category theory: what is really going on here is that we have a power set functor

P(-) = \hom(-, \mathbb{Z}_2): \mbox{FinSet}^{op} \to \mbox{FinBool}

(taking a function f: X \to Y between finite sets to the inverse image map f^{-1}: PY \to PX, which is a map between finite Boolean algebras) and a functor

Q(-) = \mbox{Bool}(-, \mathbb{Z}_2): \mbox{FinBool}^{op} \to \mbox{FinSet}

which we could replace by its opposite Q(-)^{op}: \mbox{FinBool} \to \mbox{FinSet}^{op}, and the canonical maps of proposition 4 and theorem 2,

X \to \mbox{Bool}(\hom(X, \mathbb{Z}_2), \mathbb{Z}_2),

B \to \hom(\mbox{Bool}(B, \mathbb{Z}_2), \mathbb{Z}_2),

are components (at X and B) of the counit and unit for an adjunction Q(-)^{op} \dashv P(-). The actual statements of proposition 4 and theorem 2 imply that the counit and unit are natural isomorphisms, and therefore we have defined an adjoint equivalence between the categories \mbox{FinSet}^{op} and \mbox{FinBool}. This is the proper categorical statement of Stone duality in the finite case, or what we are calling “baby Stone duality”. I will make some time soon to explain what these terms mean.

I will write a series of posts as a way of gently introducing category theory to the ‘beginner’, though I will assume that the beginner will have some level of mathematical maturity. This series will be based on the the book, Conceptual Mathematics: A first introduction to categories by Lawvere and Schanuel. So, this won’t go into most of the deeper stuff that’s covered in, say, Categories for the Working Mathematician by Mac Lane. We shall deal only with sets (as our objects) and functions (as our morphisms). This means we deal only with the Category of Sets! Therefore, the reader is not expected to know about advanced stuff like groups and/or group homomorphisms, vector spaces and/or linear transformations, etc. Also, no knowledge of college level calculus is required. Only knowledge of sets and functions, some familiarity in dealing with mathematical symbols and some knowledge of elementary combinatorics are required. That’s all!

Sets, maps and composition

An object (in this category) is a finite set or collection.

A map f: A \to B (in this category) consists of the following:

i) a set A called the domain of the map;

ii) a set B called the codomain of the map; and

iii) a rule assigning to each a \in A, an element b \in B.

We also use ‘function’, ‘transformation’, ‘operator’, ‘arrow’ and ‘morphism’ for ‘map’ depending on the context, as we shall see later.

An endomap f: A \to A is a map that has the same object as domain and codomain, which in this case is A.

An endomap in which f(a) = a for every a \in A is called an identity map, also denoted by 1_A.

Composition of maps is a process by which two maps are combined to yield a third map. Composition of maps is really at the heart of category theory, and this will be evident in plenty in the later posts. So, if we have two maps f: X \to Y and g:Y \to Z, then g \circ f: X \to Z is the third map obtained by composing f and g. Note that g \circ f is read ‘g following f‘.

Guess what? Those are all the ingredients we need to define our category of sets! Though we shall deal only with sets and functions, the following definition of a category of sets is actually pretty much the same as the general definition of a category.

Definition: A CATEGORY consists of the following:

(1) OBJECTS: these are usually denoted by A, B, C, \ldots etc.

(2) MAPS: these are usually denoted by f, g, h, \ldots etc.

(3) For each map f, one object as DOMAIN of f and one object as CODOMAIN of f. So, f: A \to B has domain A and codomain B. This is also read as ‘f is a map from A to B‘.

(4) For each object A, there exists an IDENTITY MAP, 1_A. This is also written as 1_A: A \to A.

(5) For each pair of maps f: A \to B and g: B \to C, there exists a COMPOSITE map, g \circ f: A \to C. ( ‘g following f‘.)

such that the following RULES are satisfied:

(i) (IDENTITY LAWS): If f: A \to B, then we have 1_B \circ f = f and f \circ 1_A = f.

(ii) (ASSOCIATIVE LAW): If f: A \to B , \, g: B \to C and h: C \to D, then we have (h \circ g)\circ f = h \circ (g \circ f). ( ‘h following g following f‘) Note that in this case we are allowed to write h \circ g \circ f without any ambiguity.

Exercises: Suppose A = \{ a, b, c, d \} and B = \{ 1, 2, 3 \}.

(1) How many maps f are there from A to B?

(2) How many maps f are there from A to A?

(3) How many maps f are there from B to A?

(4) How many maps f are there from B to B?

(5) How many maps f are there from A to A satisfying f \circ f = f?

(This is a non-trivial exercise for the general case in which |A| = n for some positive integer n.)

(6) How many maps g are there from B to B satisfying g \circ g = g?

(7) Can you find a pair of maps f: A \to B and g: B \to A such that g \circ f = 1_A. If yes, how many pairs can you find?

(8 ) Can you find a pair of maps h: B \to A and k: A \to B such that k \circ h = 1_B. If yes, how many pairs can you find?

Bonus exercise:

How many maps f: A \to B are there if A is the empty set and |B| = n for some n \in \mathbb{N}? What if |A| = n and B is the empty set? What if both A and B are empty?

In this installment, I will introduce the concept of Boolean algebra, one of the main stars of this series, and relate it to concepts introduced in previous lectures (distributive lattice, Heyting algebra, and so on). Boolean algebra is the algebra of classical propositional calculus, and so has an abstract logical provenance; but one of our eventual goals is to show how any Boolean algebra can also be represented in concrete set-theoretic (or topological) terms, as part of a powerful categorical duality due to Stone.

There are lots of ways to define Boolean algebras. Some definitions were for a long time difficult conjectures (like the Robbins conjecture, established only in the last ten years or so with the help of computers) — testament to the richness of the concept. Here we’ll discuss just a few definitions. The first is a traditional one, and one which is pretty snappy:

A Boolean algebra is a distributive lattice in which every element has a complement.

(If X is a lattice and x \in X, a complement of x is an element y such that x \wedge y = 0 and x \vee y = 1. A lattice is said to be complemented if every element has a complement. Observe that the notions of complement and complemented lattice are manifestly self-dual. Since the notion of distributive lattice is self-dual, so therefore is the notion of Boolean algebra.)

  • Example: Probably almost everyone reading this knows the archetypal example of a Boolean algebra: a power set PX, ordered by subset inclusion. As we know, this is a distributive lattice, and the complement