You are currently browsing the category archive for the ‘Exposition’ category.

Welcome to the 54th Carnival of Mathematics, and Happy Fourth of July to our American readers! Indeed, the carnival should have been hosted yesterday, and I apologize for being a day late.

Trivia: Today, we have the 234th Independence Day celebrations in the  US, and ours is the 54th carnival. 2+3+4 = 5+4, see? Boy, do I feel so clever!

Ok, let’s begin, now!

We start off with a post, submitted by Shai Deshe, that presents a collection of YouTube videos explaining different kinds of infinities in set theory, causality vs conditionality in probability and some topology. The videos are the kind of ones that “math people” could use to explain a few mathematical concepts to their friends, family members and colleagues who may not be enamored of math very much but may still possess a lingering interest in it.

Experimental philosophy, according to the Experimental Philosophy Society, “involves the collection of empirical data to shed light on philosophical issues“. As such, a careful quantitative analyses of results of experiments are used to shed light on many philosophical issues/debates. Anthony Chemero wrote a post titled, ‘What Situationist Experiments Show‘, that links to a paper with the same title that he coauthored with John Campbell and Sarah Meerschaert. In the paper, the authors, through quantitative analyses of actual experimental data, argue that virtue ethics has not lost to the siuationist side, whose critiques of virtue theory are far from convincing.

Next, I would like to bring the readers’ attention to two math blogs that came into existence somewhat recently and which I think have a lot of really good mathematical content. They are Annoying Precision and A Portion of the Book. In my opinion, their blog posts contain a wealth of mathematical knowledge, especially for undergraduates (and graduate students too!), who, if inclined toward problem-solving, will enjoy the posts even more. Go ahead and dive into them!

At Annoying Precision, a project aimed at the “Generally Interested Lay Audience” that Qiaochu Yuan started aims “to build up to a discussion of the Polya enumeration theorem without assuming any prerequisites other than a passing familiarity with group theory.” It begins with GILA I: Group Actions and Equivalence Relations, the last post of the series being GILA VI: The cycle index polynomials of the symmetric groups.

Usually, undergrads hardly think integrals have much to do with combinatorics. At A Portion of the Book, Masoud Zargar has a very nice post that deals with the intersection of Integrals, Combinatorics and Geometry.

Tom Escent submitted a link to an article titled, “Introduction to Nerds on Wall Street“, which actually provides a very small snapshot of the book named, Nerds on Wall Street: Math, Machines and Wired Markets whose author is David J. Leinweber. I haven’t read the book yet, but based on generally good reviews, it seems like it chronicles the contribution of Quant guys to Wall Street over the past several decades. Should be interesting to Math and CS majors, I think.

Let’s have a post on philosophy and logic, shall we? At Skeptic’s Play, there is a discussion on Gödel’s modal ontological argument regarding the possibility of existence of God. As someone who has just begun a self-study of modal logic, I will recommend Brian K. Chellas’ excellent introduction to the subject, titled Modal Logic: An Introduction.

Then, there is the Daily Integral, a blog dealing with solving elementary integrals and which I think may be particularly useful for high-school students.

Let me close this carnival by asking the reader, “What do you think is the world’s oldest mathematical artifact?” There are several candidates, and according to The Number Warrior, candidate #1 is The Lebombo Bone, found in the Lebombo Mountains of South Africa and Swaziland, that dates back to 35,000 BC!

That’s all for now! Thanks to everyone who made submissons.

After a long hiatus, I’d like to renew the discussion of axiomatic categorical set theory, more specifically the Elementary Theory of the Category of Sets (ETCS). Last time I blogged about this, I made some initial forays into “internalizing logic” in ETCS, and described in broad brushstrokes how to use that internal logic to derive a certain amount of the structure one associates with a category of sets. Today I’d like to begin applying some of the results obtained there to the problem of constructing colimits in a category satisfying the ETCS axioms (an ETCS category, for short).

(If you’re just joining us now, and you already know some of the jargon, an ETCS category is a well-pointed topos that satisfies the axiom of choice and with a natural numbers object. We are trying to build up some of the elementary theory of such categories from scratch, with a view toward foundations of mathematics.)

But let’s see — where were we? Since it’s been a while, I was tempted to review the philosophy behind this undertaking (why one would go to all the trouble of setting up a categories-based alternative to ZFC, when time-tested ZFC is able to express virtually all of present-day mathematics on the basis of a reasonably short list of axioms?). But in the interest of time and space, I’ll confine myself to a few remarks.

As we said, a chief difference between ZFC and ETCS resides in how ETCS treats the issue of membership. In ZFC, membership is a global binary relation: we can take any two “sets” A, B and ask whether A \in B. Whereas in ETCS, membership is a relation between entities of different sorts: we have “sets” on one side and “elements” on another, and the two are not mixed (e.g., elements are not themselves considered sets).

Further, and far more radical: in ETCS the membership relation x \in A is a function, that is, an element x “belongs” to only one set A at a time. We can think of this as “declaring” how we are thinking of an element, that is, declaring which set (or which type) an element is being considered as belonging to. (In the jargon, ETCS is a typed theory.) This reflects a general and useful philosophic principle: that elements in isolation are considered inessential, that what counts are the aggregates or contexts in which elements are organized and interrelated. For instance, the numeral ‘2’ in isolation has no meaning; what counts is the context in which we think of it (qua rational number or qua complex number, etc.). Similarly the set of real numbers has no real sense in isolation; what counts is which category we view it in.

I believe it is reasonable to grant this principle a foundational status, but: rigorous adherence to this principle completely changes the face of what set theory looks like. If elements “belong” to only one set at a time, how then do we even define such basic concepts as subsets and intersections? These are some of these issues we discussed last time.

There are other significant differences between ZFC and ETCS: stylistically, or in terms of presentation, ZFC is more “top-down” and ETCS is more “bottom-up”. For example, in ZFC, one can pretty much define a subset \{x \in X: P\} by writing down a first-order formula P in the language; the comprehension (or separation) axiom scheme is a mighty sledgehammer that takes care of the rest. In the axioms of ETCS, there is no such sledgehammer: the closest thing one has to a comprehension scheme in the ETCS axioms is the power set axiom (a single axiom, not an axiom scheme). However, in the formal development of ETCS, one derives a comprehension scheme as one manually constructs the internal logic, in stages, using the simple tools of adjunctions and universal properties. We started doing some of that in our last post. So: with ZFC it’s more as if you can just hop in the car and go; with ETCS you build the car engine from smaller parts with your bare hands, but in the process you become an expert mechanic, and are not so rigidly attached to a particular make and model (e.g., much of the theory is built just on the axioms of a topos, which allows a lot more semantic leeway than one has with ZF).

But, in all fairness, that is perhaps the biggest obstacle to learning ETCS: at the outset, the tools available [mainly, the idea of a universal property] are quite simple but parsimonious, and one has to learn how to build some set-theoretic and logical concepts normally taken as “obvious” from the ground up. (Talk about “foundations”!) On the plus side, by building big logical machines from scratch, one gains a great deal of insight into the inner workings of logic, with a corresponding gain in precision and control and modularity when one would like to use these developments to design, say, automated deduction systems (where there tend to be strong advantages to using type-theoretic frameworks).

Enough philosophy for now; readers may refer to my earlier posts for more. Let’s get to work, shall we? Our last post was about the structure of (and relationships between) posets of subobjects Sub(X) relative to objects X, and now we want to exploit the results there to build some absolute constructions, in particular finite coproducts and coequalizers. In this post we will focus on coproducts.

Note to the experts: Most textbook treatments of the formal development of topos theory (as for example Mac Lane-Moerdijk) are efficient but highly technical, involving for instance the slice theorem for toposes and, in the construction of colimits, recourse to Beck’s theorem in monad theory applied to the double power-set monad [following the elegant construction of Paré]. The very abstract nature of this style of argumentation (which in the application of Beck’s theorem expresses ideas of fourth-order set theory and higher) is no doubt partly responsible for the somewhat fearsome reputation of topos theory.

In these notes I take a much less efficient but much more elementary approach, based on an arrangement of ideas which I hope can be seen as “natural” from the point of view of naive set theory. I learned of this approach from Myles Tierney, who was my PhD supervisor, and who with Bill Lawvere co-founded elementary topos theory, but I am not aware of any place where the details of this approach have been written up before now. I should also mention that the approach taken here is not as “purist” as many topos theorists might want; for example, here and there I take advantage of the strong extensionality axiom of ETCS to simplify some arguments.

The Empty Set and Two-Valued Logic

We begin with the easy observation that a terminal category, i.e., a category \mathbf{1} with just one object and one morphism (the identity), satisfies all the ETCS axioms. Ditto for any category C equivalent to \mathbf{1} (where every object is terminal). Such boring ETCS categories are called degenerate; obviously our interest is in the structure of nondegenerate ETCS categories.

Let \mathbf{E} be an ETCS category (see here for the ETCS axioms). Objects of \mathbf{E} are generally called “sets”, and morphisms are generally called “functions” or “maps”.

Proposition 0: If an ETCS category \mathbf{E} is a preorder, then \mathbf{E} is degenerate.

Proof: Recall that a preorder is a category in which there is at most one morphism A \to B for any two objects A, B. Every morphism in a preorder is vacuously monic. If there is a nonterminal set A, then the monic A \to 1 to any terminal set defines a subset A \subseteq 1 distinct from the subset defined by 1 \to 1, thus giving (in an ETCS category) distinct classifying maps \chi_A, t: 1 \to P(1), contradicting the preorder assumption. Therefore all objects A are terminal. \Box

Assume from now on that \mathbf{E} is a nondegenerate ETCS category.

Proposition 1: There are at least two truth values, i.e., two elements 1 \to P(1), in \mathbf{E}.

Proof: By proposition 0, there exist sets X, Y and two distinct functions f, g: X \to Y. By the axiom of strong extensionality, there exists x: 1 \to X such that fx \neq gx. The equalizer E \to 1 of the pair fx, gx: 1 \to Y is then a proper subset of 1, and therefore there are at least two distinct elements \chi_E, \chi_1: 1 \to P(1). \Box

Proposition 2: There are at most two truth values 1 \to P(1); equivalently, there are at most two subsets of 1.

Proof: If U, V \subseteq 1 are distinct subsets of 1, then either U \neq U \cap V or V \neq U \cap V, say the former. Then 1_U: U \subseteq U and U \cap V \subseteq U are distinct subsets, with distinct classifying maps \chi_{1_U}, \chi_{U \cap V}: U \to P(1). By strong extensionality, there exists x: 1 \to U distinguishing these classifying maps. Because 1 is terminal, we then infer 1 \subseteq U and U \subseteq 1, so U = 1 as subsets of 1, and in that case only V can be a proper subset of 1. \Box

By propositions 1 and 2, there is a unique proper subset of the terminal object 1. Let 0 \to 1 denote this subset. Its domain may be called an “empty set”; by the preceding proposition, it has no proper subsets. The classifying map 1 \to P1 of 0 \subseteq 1 is the truth value we call “false”.

Proposition 3: 0 is an initial object, i.e., for any X there exists a unique function 0 \to X.

Proof: Uniqueness: if f, g: 0 \to X are maps, then their equalizer x: E \to 0, which is monic, must be an isomorphism since 0 has no proper subsets. Therefore f = g. Existence: there are monos

\displaystyle 0 \to 1 \stackrel{t_X}{\to} PX

\displaystyle X \stackrel{\sigma}{\to} PX

where t_X is “global truth” (classifying the subset X \subseteq X) on X and \sigma is the “singleton mapping x \mapsto \{x\}” on X, defined as the classifying map of the diagonal map \delta: X \subseteq X \times X (last time we saw \sigma is monic). Take their pullback. The component of the pullback parallel to \sigma is a mono P \to 0 which again is an isomorphism, whence we get a map 0 \cong P \to X using the other component of the pullback. \Box

Remark: For the “purists”, an alternative construction of the initial set 0 that avoids use of the strong extensionality axiom is to define the subset 0 \subseteq 1 to be “the intersection all subsets of 1“. Formally, one takes the extension \left[\phi\right] \subseteq 1 of the map

\displaystyle \phi = (1 \stackrel{t_{P1}}{\to} PP1 \stackrel{\bigcap}{\to} P1);

where the first arrow represents the class of all subsets of P1, and the second is the internal intersection operator defined at the end of our last post. Using formal properties of intersection developed later, this intersection 0 \subseteq 1 has no proper subsets, and then the proof of proposition 3 carries over verbatim. \Box

Corollary 1: For any X, the set 0 \times X is initial.

Proof: By cartesian closure, maps 0 \times X \to Y are in bijection with maps of the form 0 \to Y^X, and there is exactly one of these since 0 is initial. \Box

Corollary 2: If there exists f: X \to 0, then X is initial.

Proof: The composite of \langle f, 1_X \rangle: X \to 0 \times X followed by \pi_2: 0 \times X \to X is 1_X, and \pi_2 followed by \langle f, 1_X \rangle: X \to 0 \times X is also an identity since 0 \times X is initial by corollary 1. Hence X is isomorphic to an initial object 0 \times X. \Box

By corollary 2, for any object Y the arrow 0 \to Y is vacuously monic, hence defines a subset.

Proposition 4: If X \not\cong 0, then there exists an element x: 1 \to X.

Proof: Under the assumption, X has at least two distinct subsets: 0 \subseteq X and 1_X: X \subseteq X. By strong extensionality, their classifying maps \chi_0, \chi_1: X \to P1 are distinguished by some element x: 1 \to X.

External Unions and Internal Joins

One of the major goals in this post is to construct finite coproducts in an ETCS category. As in ordinary set theory, we will construct these as disjoint unions. This means we need to discuss unions first; as should be expected by now, in ETCS unions are considered locally, i.e., we take unions of subsets of a given set. So, let A, B \subseteq X be subsets.

To define the union A \cup B \subseteq X, the idea is to take the intersection of all subsets containing A and B. That is, we apply the internal intersection operator (constructed last time),

\displaystyle \bigcap: PPX \to PX,

to the element 1 \to PPX that represents the set of all subsets of X containing A and B; the resulting element 1 \to PX represents A \cup B. The element 1 \to PPX corresponds to the intersection of two subsets

\{C \in PX: A \subseteq C\} \cap \{C \in PX: B \subseteq C\} \subseteq PX.

Remark: Remember that in ETCS we are using generalized elements: C \in PX really means a function C: U \to PX over some domain U, which in turn classifies a subset \left[C\right] \subseteq U \times X. On the other hand, the A here is a subset A \subseteq X. How then do we interpret the condition “A \subseteq C“? We first pull back \chi_A: 1 \to PX over to the domain U; that is, we form the composite \displaystyle U \stackrel{!}{\to} 1 \stackrel{\chi_A}{\to} PX, and consider the condition that this is bounded above by C: U \to PX. (We will write \chi_A \leq C, thinking of the left side as constant over U.) Externally, in terms of subsets, this corresponds to the condition U \times A \subseteq \left[C\right].

We need to construct the subsets \{C \in PX: A \subseteq C\}, \{C \in PX: B \subseteq C\}. In ZFC, we could construct those subsets by applying the comprehension axiom scheme, but the axioms of ETCS have no such blanket axiom scheme. (In fact, as we said earlier, much of the work on “internalizing logic” goes to show that in ETCS, we instead derive a comprehension scheme!) However, one way of defining subsets in ETCS is by taking loci of equations; here, we express the condition A \subseteq C, more pedantically A \subseteq \left[C\right] or \chi_A \leq C, as the equation

(\chi_A \Rightarrow C) = t_X

where the right side is the predicate “true over X“.

Thus we construct the subset \{C \in PX: A \subseteq C\} of PX via the pullback:

{C: A ≤ C} -------> 1
     |              |
     |              | t_X
     V chi_A => -   V
    PX -----------> PX

Let me take a moment to examine what this diagram means exactly. Last time we constructed an internal implication operator

\Rightarrow: P1 \times P1 \to P1

and now, in the pullback diagram above, what we are implicitly doing is lifting this to an operator

\Rightarrow_X: PX \times PX \to PX

The easy and cheap way of doing this is to remember the isomorphism PX \cong P1^X we used last time to uncover the cartesian closed structure, and apply this to

\displaystyle P1^X \times P1^X \cong (P1 \times P1)^X \stackrel{\Rightarrow^X}{\to} P1^X

to define \Rightarrow_X: PX \times PX \to PX. This map classifies a certain subset of X \times PX \times PX, which I’ll just write down (leaving it as an exercise which involves just chasing the relevant definitions):

\{(x, T, S) \in X \times PX \times PX: x \in_X T \Rightarrow x \in_X S\}

Remark: Similarly we can define a meet operator \wedge_X: PX \times PX \to PX by exponentiating the internal meet P1 \times P1 \to P1. It is important to know that the general Heyting algebra identities which we established last time for P1 lift to the corresponding identities for the operators \wedge_X, \Rightarrow_X on PX. Ultimately this rests on the fact that the functor (-)^X, being a right adjoint, preserves products, and therefore preserves any algebraic identity which can be expressed as a commutative diagram of operations between such products.

Hence, for the fixed subset A \subseteq X (classified by \chi_A: 1 \to PX), the operator

\chi_A \Rightarrow -: PX \to PX

classifies the subset

\{(x, S): x \in_X A \Rightarrow x \in_X S\} \hookrightarrow X \times PX

Finally, in the pullback diagram above, we are pulling back the operator \chi_A \Rightarrow - against t_X. But, from last time, that was exactly the method we used to construct universal quantification. That is, given a subset

R \subseteq X \times Y

we defined \forall_Y R \subseteq X to be the pullback of t_Y: 1 \hookrightarrow PY along \chi_R: X \to PY. Putting all this together, the pullback diagram above expresses the definition

\{C \in PX: A \subseteq C\} := \{C \in PX: \forall_{x \in X} \ x \in_X A \Rightarrow x \in_X C\}

that one would expect “naively”.

Now that all the relevant constructions are in place, we show that A \cup B is the join of A and B in the poset Sub(X). There is nothing intrinsically difficult about this, but as we are still in the midst of constructing the internal logic, we will have to hunker down and prove some logic things normally taken for granted or zipped through without much thought. For example, the internal intersection operator was defined with the help of internal universal quantification, and we will need to establish some formal properties of that.

Here is a useful general principle for doing internal logic calculations. Let \chi_R: X \to PY be the classifying map of a subset R \subseteq X \times Y, and let f: U \to X be a function. Then the composite \chi_R f: U \to PY classifies the subset

(f \times 1_Y)^{-1}(R) \subseteq U \times Y

so that one has the general identity \chi_R f = \chi_{(f \times 1)^{-1}(R)}. In passing back and forth between the external and internal viewpoints, the general principle is to try to render “complicated” functions U \to PY into a form \chi_R f which one can more easily recognize. For lack of a better term, I’ll call this the “pullback principle”.

Lemma 1: Given a relation R \hookrightarrow X \times Y and a constant c: 1 \to Y, there is an inclusion

\forall_Y R \subseteq (1_X \times c)^{-1}(R)

as subsets of X. (In traditional logical syntax, this says that for any element c: 1 \to Y,

\forall_{y: Y} R(x, y) implies R(x, c)

as predicates over elements x \in X. This is the type of thing that ordinarily “goes without saying”, but which we actually have to prove here!)

Proof: As we recalled above, \forall_Y R \subseteq X was defined to be \chi_R^{-1}(t_Y), the pullback of global truth t_Y: 1 \to PY along the classifying map \chi_R: X \to PY. Hold that thought.

Let

Pc: PY \to P1

be the map which classifies the subset \{S \in PY: c \in_Y S\}. Equivalently, this is the map

P1^c: P1^Y \to P1^1

under the canonical isomorphisms PY \cong P1^Y, P1^1 \cong P1. Intuitively, this maps f \mapsto f(c), i.e., plugs an element c \in Y into an element f \in P1^Y.

Using the adjunction (- \times Y) \dashv (-)^Y of cartesian closure, the composite

\displaystyle X \stackrel{\chi_R}{\to} PY \stackrel{Pc}{\to} P1

transforms to the composite

X \times 1 \stackrel{1_X \times c}{\to} X \times Y \stackrel{\chi_{R}}{\to} P1

so by the pullback principle, (Pc)\chi_R: X \times 1 \to P1 classifies (1_X \times c)^{-1}(R) \subseteq X \times 1 \cong X.

Equivalently,

(1_X \times c)^{-1}(R) = \chi_{R}^{-1} (Pc)^{-1}(t) \qquad (1)

Also, as subsets of PY, we have the inclusion

(t_Y: 1 \hookrightarrow PY) \subseteq (Pc)^{-1}(t) \qquad (2)

[this just says that t_Y belongs to the subset classified by Pc, or equivalently that c: 1 \to Y is in the subset Y \subseteq Y]. Applying the pullback operation \chi_{R}^{-1} to (2), and comparing to (1), lemma 1 follows. \Box

Lemma 2: If R \subseteq S as subsets of X \times Y, then \forall_Y R \subseteq \forall_Y S.

Proof: From the last post, we have an adjunction:

T \times Y \subseteq S if and only if T \subseteq \forall_Y S

for any subset of T \subseteq X. So it suffices to show \forall_Y R \times Y \subseteq S. But

\forall_Y R \times Y \subseteq R \subseteq S

where the first inclusion follows from \forall_Y R \subseteq \forall_Y R. \Box

Next, recall from the last post that the internal intersection of F: 1 \to PPX was defined by interpreting the following formula on the right:

\displaystyle \bigcap F = \forall_{S \in PX} (S \in_{PX} F) \Rightarrow (x \in_X S)

Lemma 3: If F \leq G: 1 \to PPX, then \displaystyle \bigcap G \leq \bigcap F: 1 \to PX.

Proof: F: 1 \to PPX classifies the subset \{S \in PX: S \in_{PX} F\}, i.e.,  F is identified with the predicate S \in_{PX} F in the argument S, so by hypothesis (S \in_{PX} F) \leq (S \in_{PX} G) as predicates on S. Internal implication a \Rightarrow b is contravariant in the argument a [see the following remark], so

\left[(S \in G) \Rightarrow (x \in S)\right] \leq \left[(S \in F) \Rightarrow (x \in S)\right]

Now apply lemma 2 to complete the proof. \Box

Remark: The contravariance of - \Rightarrow b, that is, the fact that

x \leq y implies (y \Rightarrow b) \leq (x \Rightarrow b),

is a routine exercise using the adjunction [discussed last time]

a \wedge c \leq b if and only if c \leq (a \Rightarrow b).

Indeed, we have

x \wedge (y \Rightarrow b) \leq y \wedge (y \Rightarrow b) \leq b \qquad (*)

where the first inequality follows from the hypothesis x \leq y, and the second follows from y \Rightarrow b \leq y \Rightarrow b. By the adjunction, the inequality (*) implies (y \Rightarrow b) \leq (x \Rightarrow b). \Box

Theorem 1: For subsets A, B of X, the subset A \cup B is an upper bound of A and B, i.e., A, B \subseteq A \cup B.

Proof: It suffices to prove that \displaystyle A = \bigcap \{C \in PX: A \subseteq C\}, since then we need only apply lemma 3 to the trivially true inclusion

\{C \in PX: A \subseteq C\} \cap \{C \in PX: B \subseteq C\} \subseteq \{C \in PX: A \subseteq C\}

to infer A \subseteq A \cup B, and similarly B \subseteq A \cup B. (Actually, we need only show \displaystyle A \subseteq \bigcap \{C \in PX: A \subseteq C\}. We’ll do that first, and then show full equality.)

The condition we want,

A \subseteq \{x \in X: \forall_{S \in PX} (A \subseteq S) \Rightarrow (x \in_X S)\},

is, by the adjunction (- \times PX) \dashv \forall_{PX}: Sub(X \times PX) \to Sub(X), equivalent to

A \times PX \subseteq \{(x, S): A \subseteq S \Rightarrow (x \in_X S)\}

which, by a \wedge\Rightarrow adjunction, is equivalent to

(A \times PX) \cap (X \times \{S \in PX: A \subseteq S\}) \subseteq \ \in_X \qquad (1)

as subsets of X \times PX. So we just have to prove (1). At this point we recall, from our earlier analysis, that

\{S \in PX: A \subseteq S\} = \{S \in PX: \forall_{x \in X} x \in_X A \Rightarrow x \in_X S\}

Using the adjunction (X \times -) \dashv \forall_X, as in the proof of lemma 2, we have

X \times \{S \in PX: \forall_{x \in X} x \in_X A \Rightarrow x \in_X S\}

\subseteq \{(x, S): x \in_X A \Rightarrow x \in_X S\} := (A \times PX) \Rightarrow \in_X,

which shows that the left side of (1) is contained in

(A \times PX) \cap ((A \times PX) \Rightarrow \in_X) \subseteq \ \in_X,

where the last inclusion uses another \wedge\Rightarrow adjunction. Thus we have established (1) and therefore also the inclusion

\displaystyle A \subseteq \bigcap \{S \in PX: A \subseteq S\}

Now we prove the opposite inclusion

\displaystyle \bigcap \{S \in PX: A \subseteq S\} \subseteq A,

that is to say

\{x \in X: \forall_{S \in PX} A \subseteq S \Rightarrow x \in_X S\} \subseteq A \qquad (**)

Here we just use lemma 1, applied to the particular element \chi_A: 1 \to PX: we see that the left side of (**) is contained in

\{x \in X: A \subseteq \left[\chi_A\right] \Rightarrow x \in_X A\}

which collapses to \{x \in X: x \in_X A\} = A, since A = \left[\chi_A\right]. This completes the proof. \Box

Theorem 2: A \cup B is the least upper bound of A, B, i.e., if C \subseteq X is a subset containing both A \subseteq X and B \subseteq X, then A \cup B \subseteq C.

Proof: We are required to show that

\{x \in X: \forall_{S \in PX} \ (A \subseteq S \wedge B \subseteq S) \Rightarrow x \in_X S\} \subseteq C.

Again, we just apply lemma 1 to the particular element \chi_C: 1 \to PX: the left-hand side of the claimed inclusion is contained in

\{x \in X: (A \subseteq C \wedge B \subseteq C) \Rightarrow x \in_X C\}

but since (A \subseteq C \wedge B \subseteq C) is true by hypothesis (is globally true as a predicate on the implicit variable x \in X), this last subset collapses to

\{x \in X: t_X \Rightarrow x \in_X C\} = \{x \in X: x \in_X C\} = C

which completes the proof. \Box

Theorems 1 and 2 show that for any set X, the external poset Sub(X) admits joins. One may go on to show (just on the basis of the topos axioms) that as in the case of meets, the global external operation of taking joins is natural in X, so that by the Yoneda principle, it is classified by an internal join operation

\vee: P1 \times P1 \to P1,

namely, the map which classifies the union of the subsets

\displaystyle \left[\pi_1\right] = P1 \times 1 \stackrel{1 \times t}{\hookrightarrow} P1 \times P1

\displaystyle \left[\pi_2\right] = 1 \times P1 \stackrel{t \times 1}{\hookrightarrow} P1 \times P1

and this operation satisfies all the expected identities. In short, P1 carries an internal Heyting algebra structure, as does PX \cong P1^X for any set X.

We will come back to this point later, when we show (as a consequence of strong extensionality) that P1 is actually an internal Boolean algebra.

Construction of Coproducts

Next, we construct coproducts just as we do in ordinary set theory: as disjoint unions. Letting X, Y be sets (objects in an ETCS category), a disjoint union of X and Y is a pair of monos

i: X \hookrightarrow Z \qquad j: Y \hookrightarrow Z

whose intersection is empty, and whose union or join in Sub(Z) is all of Z. We will show that disjoint unions exist and are essentially unique, and that they satisfy the universal property for coproducts. We will use the notation X + Y for a disjoint union.

Theorem 3: A disjoint union of X and Y exists.

Proof: It’s enough to embed X, Y disjointly into some set C, since the union of the two monos in Sub(C) would then be the requisite Z. The idea now is that if a disjoint union or coproduct X+Y exists, then there’s a canonical isomorphism P(X + Y) \cong PX \times PY. Since the singleton map

\sigma: X + Y \to P(X + Y) \cong PX \times PY

is monic, one thus expects to be able to embed X and Y disjointly into PX \times PY. Since we can easily work out how all this goes in ordinary naive set theory, we just write out the formulas and hope it works out in ETCS.

In detail, define i_X: X \to PX \times PY to be

\displaystyle X \cong X \times 1 \stackrel{\sigma_X \times \chi_0}{\to} PX \times PY

where \sigma_X is the singleton mapping and \chi_0 classifies 0 \hookrightarrow Y; similarly, define i_Y: Y \to PX \times PY to be

\displaystyle Y \cong 1 \times Y \stackrel{\chi_0 \times \sigma_Y}{\to} PX \times PY.

Clearly i_X and i_Y are monic, so to show disjointness we just have to show that their pullback is empty. But their pullback is isomorphic to the cartesian product of the pullbacks of the diagrams

\displaystyle X \stackrel{\sigma_X}{\to} PX \stackrel{\chi_0}{\leftarrow} 1 \qquad 1 \stackrel{\chi_0}{\to} PY \stackrel{\sigma_Y}{\leftarrow} Y

so it would be enough to show that each (or just one) of these two pullbacks is empty, let’s say the first.

Suppose given a map h: A \to X which makes the square

  A -------> 1
  |          |
h |          | chi_0
  V sigma_X  V
  X -------> PX

commute. Using the pullback principle, the map A \to 1 \stackrel{\chi_0}{\to} PX classifies

0 \times A \hookrightarrow X \times A

which is just the empty subset. This must be the same subset as classified by \sigma_X h = \chi_{\delta}h (where \delta: X \hookrightarrow X \times X is the diagonal), which by the pullback principle is

(1_X \times h)^{-1}(\delta) \hookrightarrow X \times A.

An elementary calculation shows this to be the equalizer of the pair of maps

\pi_1, h\pi_2: X \times A \stackrel{\to}{\to} X

So this equalizer E is empty. But notice that \langle h, 1 \rangle: A \to X \times A equalizes this pair of maps. Therefore we have a map A \to E \cong 0. By corollary 2 above, we infer A \cong 0. This applies to the case where A is the pullback, so the pullback is empty, as was to be shown. \Box

Theorem 4: Any two disjoint unions of X, Y are canonically isomorphic.

Proof: Suppose i: X \to Z \leftarrow Y: j is a disjoint union. Define a map

\phi= \langle \phi_1, \phi_2 \rangle: Z \to PX \times PY

where \phi_1: Z \to PX classifies the subset \langle 1_X, i \rangle: X \to X \times Z, and \phi_2: Z \to PY classifies the subset \langle 1_Y, j \rangle: Y \to Y \times Z. Applying the pullback principle, the composite \phi_1 i: X \to PX classifies

(1_X \times i)^{-1}(\langle 1_X, i \rangle) \hookrightarrow X \times X

which is easily seen to be the diagonal on X. Hence \phi_1 i = \sigma_X. On the other hand, \phi_1 j: Y \to PX classifies the subset

(1_X \times j)^{-1}(\langle 1_X, i \rangle) \hookrightarrow X \times Y

which is empty because i and j are disjoint embeddings, so \phi_1 j = \chi_0: Y \to PX. Similar calculations yield

\phi_2 i = \chi_0: X \to PY \qquad \phi_2 j = \sigma_Y: Y \to PY.

Putting all this together, we conclude that \phi i = i_X: X \to PX \times PY and \phi j = i_Y: Y \to PX \times PY, where i_X and i_Y were defined in the proof of theorem 3.

Next, we show that \phi is monic. If not, then by strong extensionality, there exist distinct elements c, d: 1 \to Z for which \phi(c) = \phi(d); therefore, \phi_1 c = \phi_1 d: 1 \to PX and \phi_2 c = \phi_2 d: 1 \to PY. By the pullback principle, these equations say (respectively)

c^{-1}(i) = d^{-1}(i) \hookrightarrow 1 \qquad c^{-1}(j) = d^{-1}(j) \hookrightarrow 1

If c^{-1}(i) = 1, then both c, d: 1 \to Z factor through the mono i: X \to Z. However, since \phi i = i_{X} is monic, this would imply that \phi (c) \neq \phi (d), contradiction. Therefore c^{-1}(i) = c \cap i = 0. By similar reasoning, c \cap j = 0. Therefore

i \subseteq \neg c \qquad j \subseteq \neg c

where \neg = (- \Rightarrow 0): Sub(Z) \to Sub(Z) is the negation operator. But then i \cup j \subseteq \neg c. And since 1: Z \hookrightarrow Z is the union i \cup j by assumption, \neg c must be the top element \top \in Sub(Z), whence c: 1 \to Z is the bottom element 0. This contradicts the assumption that the topos is nondegenerate. Thus we have shown that \phi must be monic.

The argument above shows that \phi: Z \hookrightarrow PX \times PY is an upper bound of i_X: X \to PX \times PY and i_Y: Y \to PX \times PY in Sub(PX \times PY). It follows that the join X + Y constructed in theorem 3 is contained in \phi: Z \to PX \times PY, and hence can be regarded as the join of X and Y in Sub(Z). But Z is their join in Sub(Z) by assumption of being a disjoint union, so the containment X + Y \subseteq Z must be an equality. The proof is now complete. \Box

Theorem 5: The inclusions i_X: X \to X + Y, i_Y: Y \to X + Y exhibit X + Y as the coproduct of X and Y.

Proof: Let f: X \to B, g: Y \to B be given functions. Then we have monos

\langle 1_X, f \rangle: X \hookrightarrow X \times B \qquad \langle 1_Y, g \rangle: Y \hookrightarrow Y \times B \qquad (1)

Now the operation - \times B: Sub(C) \to Sub(C \times B) certainly preserves finite meets, and also preserves finite joins because it is left adjoint to \forall_B: Sub(C \times B) \to Sub(C). Therefore this operation preserves disjoint unions; we infer that the monos

X \times B \stackrel{i_X \times B}{\to} (X + Y) \times B \qquad Y \times B \stackrel{i_Y \times B}{\to} (X + Y) \times B \qquad (2)

exhibit (X + Y) \times B as a disjoint union of X \times B, Y \times B. Composing the monos of (1) and (2), we have disjoint embeddings of X and Y in (X + Y) \times B. Using theorem 4, X + Y is isomorphic to the join of these embeddings; this means we have an inclusion

\langle 1, h \rangle: X + Y \hookrightarrow (X + Y) \times B

whose restriction to X yields \langle i_X, f \rangle and whose restriction to Y yields \langle i_Y, g \rangle. Hence h: X + Y \to B extends f and g. It is the unique extension, for if there were two extensions h, h', then the equalizer of \langle 1, h \rangle and \langle 1, h' \rangle would be an upper bound of X, Y in Sub((X + Y) \times B), contradicting the fact that X + Y is the least upper bound. This completes the proof. \Box

I think that’s enough for one day. I will continue to explore the categorical structure and logic of ETCS next time.

This post is a continuation of the discussion of “the elementary theory of the category of sets” [ETCS] which we had begun last time, here and in the comments which followed. My thanks go to all who commented, for some useful feedback and thought-provoking questions.

Today I’ll describe some of the set-theoretic operations and “internal logic” of ETCS. I have a feeling that some people are going to love this, and some are going to hate it. My main worry is that it will leave some readers bewildered or exasperated, thinking that category theory has an amazing ability to make easy things difficult.

  • An aside: has anyone out there seen the book Mathematics Made Difficult? It’s probably out of print by now, but I recommend checking it out if you ever run into it — it’s a kind of extended in-joke which pokes fun at category theory and abstract methods generally. Some category theorists I know take a dim view of this book; I for my part found certain passages hilarious, in some cases making me laugh out loud for five minutes straight. There are category-theory-based books and articles out there which cry out for parody!

In an attempt to nip my concerns in the bud, let me remind my readers that there are major differences between the way that standard set theories like ZFC treat membership and the way ETCS treats membership, and that differences at such a fundamental level are bound to propagate throughout the theoretical development, and impart a somewhat different character or feel between the theories. The differences may be summarized as follows:

  • Membership in ZFC is a global relation between objects of the same type (sets).
  • Membership in ETCS is a local relation between objects of different types (“generalized” elements or functions, and sets).

Part of what we meant by “local” is that an element per se is always considered relative to a particular set to which it belongs; strictly speaking, as per the discussion last time, the same element is never considered as belonging to two different sets. That is, in ETCS, an (ordinary) element of a set A is defined to be a morphism x: 1 \to A; since the codomain is fixed, the same morphism cannot be an element 1 \to B of a different set B. This implies in particular that in ETCS, there is no meaningful global intersection operation on sets, which in ZFC is defined by:

A \cap B = \{x: (x \in A) \wedge (x \in B)\}

Instead, in ETCS, what we have is a local intersection operation on subsets A \hookrightarrow X, B \hookrightarrow X of a set. But even the word “subset” requires care, because of how we are now treating membership. So let’s back up, and lay out some simple but fundamental definitions of terms as we are now using them.

Given two monomorphisms i: A \to X, j: B \to X, we write i \subseteq j (A \subseteq B if the monos are understood, or A \subseteq_X B if we wish to emphasize this is local to X) if there is a morphism k: A \to B such that i = j k. Since j is monic, there can be at most one such morphism k; since i is monic, such k must be monic as well. We say i, j define the same subset if this k is an isomorphism. So: subsets of X are defined to be isomorphism classes of monomorphisms into X. As a simple exercise, one may show that monos i, j into X define the same subset if and only if i \subseteq j and j \subseteq i. The (reflexive, transitive) relation \subseteq_X on monomorphisms thus induces a reflexive, transitive, antisymmetric relation, i.e., a partial order on subsets of X.

Taking some notational liberties, we write A \subseteq X to indicate a subset of X (as isomorphism class of monos). If x: U \to X is a generalized element, let us say x is in a subset A \subseteq X if it factors (evidently uniquely) through any representative mono i: A \to X, i.e., if there exists x': U \to A such that x = i x'. Now the intersection of two subsets A \subseteq X and B \subseteq X is defined to be the subset A \cap B \subseteq X defined by the pullback of any two representative monos i: A \to X, j: B \to X. Following the “Yoneda principle”, it may equivalently be defined up to isomorphism by specifying its generalized elements:

A \cap B :=_i \{x \in X: (x \mbox{ is in } A) \wedge (x \mbox{ is in } B)\}.

Thus, intersection works essentially the same way as in ZFC, only it’s local to subsets of a given set.

While we’re at it, let’s reformulate the power set axiom in this language: it says simply that for each set B there is a set P(B) and a subset \in_B \subseteq B \times P(B), such that for any relation R \subseteq B \times A, there is a unique “classifying map” \chi_R: A \to P(B) whereby, under 1_B \times \chi_R: B \times A \to B \times P(B), we have

R = (1_B \times \chi_R)^{-1}(\in_B).

The equality is an equality between subsets, and the inverse image on the right is defined by a pullback. In categorical set theory notation,

R = \{\langle b, a \rangle \in B \times A: b \in_B \chi_R(a)\}.

Hence, there are natural bijections

\displaystyle \frac{R \subseteq B \times A}{A \to P(B)} \qquad \frac{R \subseteq B \times A}{B \to P(A)}

between subsets and classifying maps. The subset corresponding to \phi: A \to P(B) is denoted \left[\phi\right] \subseteq B \times A or \left[\phi\right] \subseteq A \times B, and is called the extension of \phi.

The set P(1) plays a particularly important role; it is called the “subset classifier” because subsets A \subseteq X are in natural bijection with functions \chi: X \to P(1). [Cf. classifying spaces in the theory of fiber bundles.]

In ordinary set theory, the role of P(1) is played by the 2-element set \{f, t\}. Here subsets A \subseteq X are classified by their characteristic functions \chi_A: X \to \{f, t\}, defined by \chi_A(x) := t iff x \in A. We thus have A = \chi_A^{-1}(t); the elementhood relation \in_1 \hookrightarrow 1 \times P(1) boils down to t: 1 \to P(1). Something similar happens in ETCS set theory:

Lemma 1: The domain of elementhood \in_1 \to 1 \times P(1) \cong P(1) is terminal.

Proof: A map X \to \in_1, that is, a map \chi: X \to P(1) which is in \in_1 \subseteq P(1), corresponds exactly to a subset \chi^{-1}(\in_1) \subseteq X which contains all of X (i.e., the subobject 1_X: X \subseteq X). Since the only such subset is 1_X, there is exactly one map X \to \in_1. \Box

Hence elementhood \in_1 \subseteq 1 \times P(1) is given by an element t: 1 \to P(1). The power set axiom says that a subset A \subseteq X is retrieved from its classifying map \chi_A: X \to P(1) as the pullback \chi^{-1}_A(t) \subseteq X.

Part of the power of, well, power sets is in a certain dialectic between external operations on subsets and internal operations on P(1); one can do some rather amazing things with this. The intuitive (and pre-axiomatic) point is that if C has finite products, equalizers, and power objects, then P(1) is a representing object for the functor

Sub: C^{op} \to Set

which maps an object X to the collection of subobjects of X, and which maps a morphism (“function”) f: X \to Y to the “inverse image” function f^{-1}: Sub(Y) \to Sub(X), that sends a subset j: B \subseteq Y to the subset f^{-1}(B) \subseteq X given by the pullback of the arrows f: X \to Y, j: B \subseteq Y. By the Yoneda lemma, this representability means that external natural operations on the Sub(X) correspond to internal operations on the object P(1). As we will see, one can play off the external and internal points of view against each other to build up a considerable amount of logical structure, enough for just about any mathematical purpose.

  • Remark: A category satisfying just the first three axioms of ETCS, namely existence of finite products, equalizers, and power objects, is called an (elementary) topos. Most or perhaps all of this post will use just those axioms, so we are really doing some elementary topos theory. As I was just saying, we can build up a tremendous amount of logic internally within a topos, but there’s a catch: this logic will be in general intuitionistic. One gets classical logic (including law of the excluded middle) if one assumes strong extensionality [where we get the definition of a well-pointed topos]. Topos theory has a somewhat fearsome reputation, unfortunately; I’m hoping these notes will help alleviate some of the sting.

To continue this train of thought: by the Yoneda lemma, the representing isomorphism

\displaystyle \theta: \hom(-, P(1)) \stackrel{\sim}{\to} Sub(-)

is determined by a universal element \theta_{P(1)}(1_{P(1)}), i.e., a subset of P(1), namely the mono t: 1 \to P(1). In that sense, t: 1 \to P(1) plays the role of a universal subset. The Yoneda lemma implies that external natural operations on general posets Sub(X) are completely determined by how they work on the universal subset.

Internal Meets

To illustrate these ideas, let us consider intersection. Externally, the intersection operation is a natural transformation

\cap_X: Sub(X) \times Sub(X) \to Sub(X).

This corresponds to a natural transformation

\hom(X, P(1)) \times \hom(X, P(1)) \to \hom(X, P(1))

which (by Yoneda) is given by a function \wedge: P(1) \times P(1) \to P(1). Working through the details, this function is obtained by putting X = P(1) \times P(1) and chasing

\langle \pi_1, \pi_2\rangle \in \hom(P(1) \times P(1), P(1)) \times \hom(P(1) \times P(1), P(1))

through the composite

\displaystyle \hom(X, P(1)) \times \hom(X, P(1))

\displaystyle \stackrel{\sim}{\to} Sub(X) \times Sub(X) \stackrel{\cap}{\to} Sub(X) \stackrel{\sim}{\to} \hom(X, P(1)).

Let’s analyze this bit by bit. The subset \left[\pi_1\right] = \pi_{1}^{-1}(t) \subseteq P(1) \times P(1) is given by

t \times id: 1 \times P(1) \to P(1) \times P(1),

and the subset \left[\pi_2\right] = \pi_{2}^{-1}(t) \subseteq P(1) \times P(1) is given by

id \times t: P(1) \times 1 \to P(1) \times P(1).

Hence \left[\pi_1\right] \cap \left[\pi_2\right] \subseteq P(1) \times P(1) is given by the pullback of the functions t \times id and id \times t, which is just

t \times t: 1 \times 1 \to P(1) \times P(1).

The map \wedge: P(1) \times P(1) \to P(1) is thus defined to be the classifying map of t \times t: 1 \times 1 \subseteq P(1) \times P(1).

To go from the internal meet \wedge: P(1) \times P(1) \to P(1) back to the external intersection operation, let A \subseteq X, B \subseteq X be two subsets, with classifying maps \chi_A, \chi_B: X \to P(1). By the definition of \wedge, we have that for all generalized elements x \in X

\chi_A(x) \wedge \chi_B(x) = t if and only if \langle \chi_A(x), \chi_B(x) \rangle = \langle t, t \rangle

(where the equality signs are interpreted with the help of equalizers). This holds true iff x is in the subset A \subseteq X and is in the subset B \subseteq X, i.e., if and only if x is in the subset A \cap B \subseteq X. Thus \chi_A \wedge \chi_B is indeed the classifying map of A \cap B \subseteq X. In other words, \chi_{A \cap B} = \chi_A \wedge \chi_B.

A by-product of the interplay between the internal and external is that the internal intersection operator

\wedge: P(1) \times P(1) \to P(1)

is the meet operator of an internal meet-semilattice structure on P(1): it is commutative, associative, and idempotent (because that is true of external intersection). The identity element for \wedge is the element t: 1 \to P(1). In particular, P(1) carries an internal poset structure: given generalized elements u, v: A \to P(1), we may define

u \leq v if and only if u = u \wedge v

and this defines a reflexive, symmetric, antisymmetric relation \left[\leq\right] \subseteq P(1) \times P(1):

\left[\leq\right] :=_i \{\langle u, v \rangle \in P(1) \times P(1): u = u \wedge v\},

equivalently described as the equalizer

\left[\leq\right] \to P(1) \times P(1) \stackrel{\to}{\to} P(1)

of the maps \pi_1: P(1) \times P(1) \to P(1) and \wedge: P(1) \times P(1) \to P(1). We have that u \leq v if and only if \left[u\right] \subseteq \left[v\right].

Internal Implication

Here we begin to see some of the amazing power of the interplay between internal and external logical operations. We will prove that P(1) carries an internal Heyting algebra structure (ignoring joins for the time being).

Let’s recall the notion of Heyting algebra in ordinary naive set-theoretic terms: it’s a lattice P that has a material implication operator \Rightarrow such that, for all x, y, z \in P,

x \wedge y \leq z if and only if x \leq y \Rightarrow z.

Now: by the universal property of P(1), a putative implication operation \Rightarrow: P(1) \times P(1) \to P(1) is uniquely determined as the classifying map of its inverse image (\Rightarrow)^{-1}(t) \subseteq P(1) \times P(1), whose collection of generalized elements is

\{\langle u, v \rangle \in P(1) \times P(1): (u \Rightarrow v) = t\}

Given \langle u, v \rangle: A \to P(1) \times P(1), the equality here is equivalent to

t \leq u \Rightarrow v

(because t: 1 \to P(1) is maximal), which in turn is equivalent to

t \wedge u = u \leq v

This means we should define \Rightarrow: P(1) \times P(1) \to P(1) to be the classifying map of the subset

\left[\leq\right] \subseteq P(1) \times P(1).

Theorem 1: P(1) admits internal implication.

Proof: We must check that for any three generalized elements u, v, w: A \to P(1), we have

w \leq u \Rightarrow v if and only if w \wedge u \leq v.

Passing to the external picture, let \left[u\right], \left[v\right], \left[w\right] be the corresponding subsets of A. Now: according to how we defined \Rightarrow, a generalized element e \in A is in \left[u \Rightarrow v\right] if and only if u(e) \leq v(e). This applies in particular to any monomorphism e: \left[w\right] \to A that represents the subset \left[w\right] \subseteq A.

Lemma 2: The composite

\displaystyle u(e) = (\left[w\right] \stackrel{e}{\to} A \stackrel{u}{\to} P(1))

is the classifying map of the subset \left[w\right] \cap \left[u\right] \subseteq \left[w\right].

Proof: As subsets of \left[w\right], (u e)^{-1}(t) = e^{-1} u^{-1}(t) = e^{-1}(\left[u\right]) = \left[w\right] \cap \left[u\right] where the last equation holds because both sides are the subsets defined as the pullback of two representative monos e: \left[w\right] \to A, i: \left[u\right] \to A. \Box

Continuing the proof of theorem 1, we see by lemma 2 that the condition u(e) \leq v(e) corresponds externally to the condition

\left[w\right] \cap \left[u\right] \subseteq \left[w\right] \cap \left[v\right]

and this condition is equivalent to \left[w\right] \cap \left[u\right] \subseteq \left[v\right]. Passing back to the internal picture, this is equivalent to w \wedge u \leq v, and the proof of theorem 1 is complete. \Box

Cartesian Closed Structure

Next we address a comment made by “James”, that a category satisfying the ETCS axioms is cartesian closed. As everything else in this article, this uses only the fact that such a category is a topos: has finite products, equalizers, and “power sets”.

Proposition 1: If A, B are “sets”, then P(A \times B) represents an exponential P(B)^A.

Proof: By the power set axiom, there is a bijection between maps into the power set and relations:

\displaystyle \frac{\phi: X \to P(A \times B)}{R \subseteq X \times (A \times B)}

which is natural in X. By the same token, there is a natural bijection

\displaystyle \frac{R \subseteq (X \times A) \times B}{\phi': X \times A \to P(B)}.

Putting these together, we have a natural isomorphism

\hom(-, P(A \times B)) \cong \hom(- \times A, P(B))

and this representability means precisely that P(A \times B) plays the role of an exponential P(B)^A. \Box

Corollary 1: P(A) \cong P(1)^A. \Box

The universal element of this representation is an evaluation map A \times P(A) \to P(1), which is just the classifying map of the subset \in_A \subseteq A \times P(A).

Thus, P(B)^A \cong P(A \times B) represents the set of all functions \phi: A \to P(B) (that is, relations from A to B). This is all we need to continue the discussion of internal logic in this post, but let’s also sketch how we get full cartesian closure. [Warning: for those who are not comfortable with categorical reasoning, this sketch could be rough going in places.]

As per our discussion, we want to internalize the set of such relations which are graphs of functions, i.e., maps \phi where each \phi(a) \subseteq B is a singleton, in other words which factor as

\displaystyle A \to B \stackrel{\sigma}{\to} P(B)

where \sigma: B \to P(B) is the singleton mapping:

b \mapsto \{b\} = \{c \in B: b = c\}.

We see from this set-theoretic description that \sigma: B \to P(B) classifies the equality relation

\{\langle b, c\rangle \in B \times B: b = c\} \subseteq B \times B

which we can think of as either the equalizer of the pair of maps \pi_1, \pi_2: B \times B \to B or, what is the same, the diagonal map \delta_B = \langle 1_B, 1_B \rangle: B \to B \times B.

Thus, we put \sigma = \chi_{\delta}: B \to P(B), and it is not too hard to show that the singleton mapping \sigma is a monomorphism. As usual, we get this monomorphism as the pullback \chi_{\sigma}^{-1}(t) of t: 1 \to P(1) along its classifying map \chi_{\sigma}: P(B) \to P(1).

Now: a right adjoint such as (-)^A preserves all limits, and in particular pullbacks, so we ought then to have a pullback

       B^A ---------------> 1^A
        |                    |
sigma^A |                    | t^A
        V                    V
     P(B)^A -------------> P(1)^A
            (chi_sigma)^A

Of course, we don’t even have B^A yet, but this should give us an idea: define \sigma^A, and in particular its domain B^A, by taking the pullback of the right-hand map along the bottom map. In case there is doubt, the map on the bottom is defined Yoneda-wise, applying the isomorphism

\hom(P(B)^A \times A, P(1)) \cong \hom(P(B)^A, P(1)^A)

to the element in the hom-set (on the left) given by the composite

\displaystyle P(B)^A \times A \stackrel{ev}{\to} P(B) \stackrel{\chi_\sigma}{\to} P(1).

The map on the right of the pullback is defined similarly. That this recipe really gives a construction of B^A will be left as an exercise for the reader.

Universal Quantification

As further evidence of the power of the internal-external dialectic, we show how to internalize universal quantification.

As we are dealing here now with predicate logic, let’s begin by defining some terms as to be used in ETCS and topos theory:

  • An ordinary predicate of type A is a function \phi: A \to P(1). Alternatively, it is an ordinary element \phi': 1 \to P(1)^A \cong P(A). It corresponds (naturally and bijectively) to a subset \left[\phi\right]: S \subseteq A.
  • A generalized predicate of type A is a function \phi': X \to P(A) \cong P(1)^A. It may be identified with (corresponds naturally and bijectively to) a function \phi: X \times A \to P(1), or to a subset \left[\phi\right]: S \subseteq X \times A.

We are trying to define an operator \forall_A which will take a predicate of the form \phi: X \times A \to P(1) [conventionally written \phi(x, a)] to a predicate \forall_A \phi: X \to P(1) [conventionally written \forall_{a \in A} \phi(x, a)]. Externally, this corresponds to a natural operation which takes subsets of X \times A to subsets of X. Internally, it corresponds to an operation of the form

\forall_A: P(A) \cong P(1)^A \to P(1).

This function is determined by the subset (\forall_A)^{-1}(t) \subseteq P(1)^A, defined elementwise by

\{\phi \in P(1)^A: \forall_A \phi = t\}.

Now, in ordinary logic, \forall_{a \in A} \phi(a) is true if and only if \phi(a) is true for all a \in A, or, in slightly different words, if \phi: A \to P(1) is constantly true over all of A:

\displaystyle \phi = (A \stackrel{!}{\to} 1 \stackrel{t}{\to} P(1)).

The expression on the right (global truth over A) corresponds to a function t_A: 1 \to P(1)^A, indeed a monomorphism since any function with domain 1 is monic. Thus we are led to define the desired quantification operator \forall_A: P(1)^A \to P(1) as the classifying map of t_A: 1 \subseteq P(1)^A.

Let’s check how this works externally. Let \phi: X \to P(1)^A be a generalized predicate of type A. Then according to how \forall_A has just been defined, \forall_A \phi: X \to P(1) classifies the subset

\displaystyle \{x \in X: \phi(x, -) = t_A: A \to P(1))\} \subseteq X

There is an interesting adjoint relationship between universal quantification and substitution (aka “pulling back”). By “substitution”, we mean that given any predicate \psi: X \to P(1) on X, we can always pull back to a predicate on X \times A (substituting in a dummy variable a of type A, forming e.g. \psi(x) \wedge \left[a=a\right]) by composing with the projection \pi: X \times A \to X. In terms of subsets, substitution along A is the natural external operation

(\left[\psi\right] \subseteq X) \mapsto (\left[\psi\right]\times A \subseteq X \times A).

Then, for any predicate \phi: X \times A \to P(1), we have the adjoint relationship

\left[\psi\right] \times A \subseteq \phi if and only if \left[\psi\right] \subseteq \forall_A \phi

so that substitution along A is left adjoint to universal quantification along A. This is easy to check; I’ll leave that to the reader.

Internal Intersection Operators

Now we put all of the above together, to define an internal intersection operator

\bigcap: PPX \to PX

which intuitively takes an element 1 \to PPX (a family F of subsets of X) to its intersection 1 \to PX, as a subset \bigcap F \subseteq X.

Let’s first write out a logical formula which expresses intersection:

x \in \bigcap F \ \ \mbox{if and only if} \ \ \forall_{S \in PX} (S \in F) \Rightarrow (x \in S)\}.

We have all the ingredients to deal with the logical formula on the right: we have an implication operator \Rightarrow as part of the internal Heyting algebra structure on P(1), and we have the quantification operator \forall_{PX}. The atomic expressions (S \in F) and (x \in S) refer to internal elementhood: (x \in S) means \langle x, S\rangle \in X \times PX is in \ \in_{X}\ \subseteq X \times PX, and (S \in F) means \langle S, F\rangle \in PX \times PPX is in \ \in_{PX}\ \subseteq PX \times PPX.

There is a slight catch, in that the predicates “(S \in_{PX} F)” and “(x \in_X S)” (as generalized predicates over PX, where S lives) are taken over different domains. The first is of the form \phi_1: PPX \to P(1)^{PX}, and the second is of the form \phi_2: X \to P(1)^{PX}. No matter: we just substitute in some dummy variables. That is, we just pull these maps back to a common domain PPX \times X, forming the composites

\displaystyle \psi_1 = (PPX \times X \stackrel{\pi_1}{\to} PPX \stackrel{\phi_1}{\to} P(1)^{PX})

and

\displaystyle \psi_2 = (PPX \times X \stackrel{\pi_2}{\to} X \stackrel{\phi_2}{\to} P(1)^{PX}).

Putting all this together, we form the composite

\displaystyle PPX \times X \stackrel{\langle \psi_1, \psi_2\rangle}{\to} P(1)^{PX} \times P(1)^{PX}

\displaystyle \cong (P(1) \times P(1))^{PX} \stackrel{(\Rightarrow)^{PX}}{\to} P(1)^{PX} \stackrel{\forall_{PX}}{\to} P(1)

This composite directly expresses the definition of the internal predicate (x \in \bigcap F) given above. By cartesian closure, this map PPX \times X \to P(1) induces the desired internal intersection operator, \bigcap: PPX \to PX.

This construction provides an important bridge to getting the rest of the internal logic of ETCS. Since we can can construct the intersection of arbitrary definable families of subsets, the power sets PX are internal inf-lattices. But inf-lattices are sup-lattices as well; on this basis we will be able to construct the colimits (e.g., finite sums, coequalizers) that we need. Similarly, the intersection operators easily allow us to construct image factorizations: any function f: X \to Y can be factored (in an essentially unique way) as an epi or surjection X \to I to the image, followed by a mono or injection I \to Y. The trick is to define the image as the smallest subset of Y through which f factors, by taking the intersection of all such subsets. Image factorization leads in turn to the construction of existential quantification.

As remarked above, the internal logic of a topos is generally intuitionistic (the law of excluded middle is not satisfied). But, if we add in the axiom of strong extensionality of ETCS, then we’re back to ordinary classical logic, where the law of excluded middle is satisfied, and where we just have the two truth values “true” and “false”. This means we will be able to reason in ETCS set theory just as we do in ordinary mathematics, taking just a bit of care with how we treat membership. The foregoing discussion gives indication that logical operations in categorical set theory work in ways familiar from naive set theory, and that basic set-theoretic constructions like intersection are well-grounded in ETCS.

One of the rare pleasures of doing mathematics — not necessarily high-powered research-level mathematics, but casual fun stuff too — is finally getting an answer to a question tucked away at the back of one’s mind for years and years, sometimes decades. Let me give an example: ever since I was pretty young (early teens), I’ve loved continued fractions; they are a marvelous way of representing numbers, with all sorts of connections to non-trivial mathematics [analysis, number theory (both algebraic and transcendental!), dynamical systems, knot theory, …]. And ever since I’ve heard of continued fractions, there’s one little factoid which I have frequently seen mentioned but which is hardly ever proved in the classic texts, at least not in the ones I looked at: the beautiful continued fraction representation for e = 2.7182818....

[Admittedly, most of my past searches were done in the pre-Google era — today it’s not that hard to find proofs online.]

This continued fraction was apparently “proved” by Euler way back when (1731); I once searched for a proof in his Collected Works, but for some reason didn’t find it; perhaps I just got lost in the forest. Sometimes I would ask people for a proof; the responses I got were generally along the lines of “isn’t that trivial?” or “I think I can prove that”. But talk is cheap, and I never did get no satisfaction. That is, until a few (maybe five) years ago, when by accident I spotted a proof buried in Volume 2 of Knuth’s The Art of Computer Programming. Huge rush of relief! So, if any of you have been bothered by this yourselves, maybe this is your lucky day.

I’m sure most of you know what I’m talking about. To get the (regular) continued fraction for a number, just iterate the following steps: write down the integer part, subtract it, take the reciprocal. Lather, rinse, repeat. For example, the sequence of integer parts you get for \sqrt{2} is 1, 2, 2, 2, … — this means

\sqrt{2} = 1 + 1/(2 + 1/(2 + 1/(2 + ...))),

giving the continued fraction representation for \sqrt{2}. Ignoring questions of convergence, this equation should be “obvious”, because it says that the continued fraction you get for \sqrt{2} - 1 equals the reciprocal of the continued fraction for \sqrt{2} + 1.

Before launching in on e, let me briefly recall a few well-known facts about continued fractions:

  • Every rational number has a continued fraction representation of finite length. The continued fraction expresses what happens when one iterates the Euclidean division algorithm.

For example, the integer parts appearing in the continued fraction for 37/14:

37/14 = \mathbf{2} + 1/(\mathbf{1} + 1/(\mathbf{1} + 1/(\mathbf{1} + 1/\mathbf{4})))

duplicate the successive quotients one gets by using the division algorithm to compute \mbox{gcd}(37, 14):

37 = \mathbf{2} \cdot 14 + 9

14 = \mathbf{1} \cdot 9 + 5

9 = \mathbf{1} \cdot 5 + 4

5 = \mathbf{1} \cdot 4 + 1

4 = \mathbf{4} \cdot 1 + 0

  • A number has an infinite continued fraction if and only if it is irrational. Let I denote the space of irrationals between 0 and 1 (as a subspace of \mathbb{R}). The continued fraction representation (mapping an irrational x \in I to the corresponding infinite sequence of integer parts a_1, a_2 \ldots in its continued fraction representation x = \left[0, a_1, a_2, \ldots \right]) gives a homeomorphism I \to \mathbb{N}^{\mathbb{N}}, where \mathbb{N}^{\mathbb{N}} carries a topology as product of countably many copies of the discrete space \mathbb{N}.

In particular, the shift map \displaystyle \sigma: \mathbb{N}^{\mathbb{N}} \to  \mathbb{N}^{\mathbb{N}}, defined by (\sigma(f))(n) = f(n+1), corresponds to the map \tau: I \to I defined by \tau(x) = 1/x \pmod 1. The behavior of \tau is a paragon, an exemplary model, of chaos:

  1. There is a dense set of periodic points of \tau. These are quadratic surds like \sqrt{2} - 1: elements of I that are fixed points of fractional linear transformations \displaystyle x \mapsto \frac{ax + b}{cx + d} (for integral a, b, c, d and |ad - bc| = 1).
  2. The transformation \tau is topologically mixing.
  3. There is sensitive dependence on initial conditions.

For some reason, I find it fun to observe this sensitive dependence using an ordinary calculator. Try calculating something like the golden mean \displaystyle \frac{\sqrt{5} - 1}{2}, and hit it with \tau over and over and watch the parade of integer parts go by (a long succession of 1’s until the precision of the arithmetic finally breaks down and the behavior looks random, chaotic). For me this activity is about as enjoyable as popping bubble wrap.

  • Remark: One can say rather more in addition to the topological mixing property. Specifically, consider the measure \displaystyle \mu(E) := \int_E \frac1{1 + x} dx on I, where \mu(I) = \log 2. It may be shown that \tau: I \to I is a measure-preserving transformation; much more significantly, \tau is an ergodic transformation on the measure space. It then follows from Birkhoff’s ergodicity theorem that whenever f: I \to \mathbb{R} is integrable, the time averages \displaystyle \frac1{n} \sum_{k=0}^{n-1} f(\tau^k(x)) approach the space average \displaystyle \frac1{\log 2}\int_I f(u) \frac{du}{1 + u} for almost all x \in I. Applying this fact to f([0, a_1, a_2, \ldots]) = \log(a_1), it follows that for almost all irrationals x \in I, the geometric mean of the integer parts a_1, a_2, \ldots a_n approaches a constant, Khinchin’s constant K = 2.685452.... A fantastic theorem!

Anyway, I digress. You are probably waiting to hear about the continued fraction representation of e, which is \left[2, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, \ldots \right]:

e = 2 + 1/(1 + 1/(2 + 1/(1 + 1/(1 + 1/(4 + \ldots)))))

Cute little sequence, except for the bump at the beginning where there’s a 2 instead of a 1. One thing I learned from Knuth is that the bump is smoothed away by writing it in a slightly different way,

e = \left[1, 0, 1, 1, 2, 1, 1, 4, 1, 1, 6, 1, \ldots \right],

involving triads 1, 2n, 1, where n = 0, 1, 2, \ldots.

Anyway, how to prove this fact? I’ll sketch two proofs. The first is the one I found in Knuth (loc. cit., p. 375, exercise 16; see also pp. 650-651), and I imagine it is close in spirit to how Euler found it. The second is from a lovely article of Henry Cohn which appeared in the American Mathematical Monthly (Vol. 116 [2006], pp. 57-62), and is connected with Hermite’s proof of the transcendence of e.

PROOF 1 (sketch)

Two functions which Euler must have been very fond of are the tangent function and its cousin the hyperbolic tangent function,

\displaystyle \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}},

related by the equation \tan(ix) = i\tanh(x). These functions crop up a lot in his investigations. For example, he knew that their Taylor expansions are connected with Bernoulli numbers, e.g.,

\displaystyle \tanh(x) = \sum_{n \geq 1} \frac{B_{2n}4^n (4^n - 1)}{(2n)!} x^{2n-1}.

The Taylor coefficients T_n = E_{2n-1} where \displaystyle \tan(x) = \sum_{n \geq 1} \frac{E_{2n-1} x^{2n-1}}{(2n-1)!} are integers called tangent numbers; they are the numbers 1, 2, 16, … which appear along the right edge of the triangle

1

0, 1

1, 1, 0

0, 1, 2, 2

5, 5, 4, 2, 0

0, 5, 10, 14, 16, 16

where each row is gotten by taking partial sums from the preceding row, moving alternately left-to-right and right-to-left. The numbers 1, 1, 5, … which appear along the left edge are called secant numbers S_n, the Taylor coefficients of the secant function. Putting E_{2n} = S_n, the secant and tangent numbers E_n together are called Euler numbers, and enjoy some interesting combinatorics: E_n counts the number of “zig-zag permutations” k \mapsto a_k of \{1, 2, \ldots, n\}, where a_1 > a_2 < a_3 > \ldots. For more on this, see Stanley’s Enumerative Combinatorics (Volume I), p. 149, and also Conway and Guy’s The Book of Numbers, pp. 110-111; I also once gave a brief account of the combinatorics of the E_n in terms the generating function \sec(x) + \tan(x), over here.

Euler also discovered a lovely continued fraction representation,

\displaystyle \tanh(x) = 1/(x^{-1} + 1/(3x^{-1} + 1/(5x^{-1} + \ldots ))),

as a by-product of a larger investigation into continued fractions for solutions to the general Riccati equation. Let’s imagine how he might have found this continued fraction. Since both sides of the equation are odd functions, we may as well consider just x > 0, where 0 < \tanh(x) < 1. Thus the integer part is 0; subtract the integer part and take the reciprocal, and see what happens.

The MacLaurin series for \tanh(x) is x - x^3/3 + \ldots = x(1 - x^2/3 + \ldots); its reciprocal has a pole at 0 of residue 1, so

\displaystyle \frac1{\tanh(x)} - \frac1{x} = f_1(x)

gives a function f_1(x) = x/3 + \ldots which is odd and analytic near 0. Now repeat: reciprocating f_1(x), we get a simple pole at 0 of residue 3, and

\displaystyle \frac1{f_1(x)} - \frac{3}{x} = f_2(x)

gives a function f_2(x) which is odd and analytic near 0, and one may check by hand that its MacLaurin series begins as x/5 + \ldots.

The pattern continues by a simple induction. Recursively define (for x > 0)

\displaystyle f_{n+1}(x) = \frac1{f_n(x)} - (2n+1)x^{-1}

It turns out (lemma 1 below) that each f_n(x) is odd and analytic near 0, and then it becomes plausible that the continued fraction for \tanh(x) above is correct: we have

\displaystyle \tanh(x) = 1/(x^{-1} + 1/(3x^{-1} + \ldots 1/((2n-1)x^{-1} + f_n(x)) \ldots ))

Indeed, assuming the fact that f_n(x) is uniformly bounded over n, these expressions converge as n \to \infty, so that the continued fraction expression for \tanh(x) is correct.

Lemma 1: Each f_n(x) (as recursively defined above) is odd and analytic near 0, and satisfies the differential equation

\displaystyle f_{n}'(x) = 1 - f_n(x)^2 - 2n x^{-1}f_n(x).

Proof: By induction. In the case n = 0, we have that f_0(x) = \tanh(x) is analytic and

f_{0}'(x) = 1/\cosh^2(x) = 1 - f_{0}^2(x).

Assuming the conditions hold when n = k, and writing

f_k(x) = a_{k1}x + a_{k3}x^3 + \ldots

we easily calculate from the differential equation that a_{k1} = 1/(2k+1). It follows that

\displaystyle f_{k+1}(x) := \frac1{f_k(x)} - (2k+1)x^{-1}

is indeed analytic in a neighborhood of 0. The verification of the differential equation (as inductive step) for the case n = k+1 is routine and left to the reader. \Box

  • Remark: The proof that the continued fraction above indeed converges to \tanh(x) is too involved to give in detail here; I’ll just refer to notes that Knuth gives in the answers to his exercises. Basically, for each x in the range (-\pi/2, \pi/2), he gets a uniform bound |f_n(x)| \leq \tan|x| for all n, and notes that as a result convergence of the continued fraction is then easy to prove for such x (good enough for us, as we’ll be taking x = 1/2). He goes on to say, somewhat telegraphically for my taste, “Careful study of this argument reveals that the power series for f_n(z) actually converges for |z| < \pi \sqrt{2n+1}/2; therefore the singularities of f_n(z) get farther and farther away from the origin as n grows, and the continued fraction actually represents \tanh(z) throughout the complex plane.” [Emphasis his] Hmm…

Assuming the continued fraction representation for \tanh(x), let’s tackle e. From the continued fraction we get for instance

\displaystyle \tanh(1/2) = \frac{e^{1/2} - e^{-1/2}}{e^{1/2} + e^{-1/2}} = 1/(2 + 1/(6 + 1/(10 + \ldots)))

Taking reciprocals and manipulating,

\displaystyle 1 + \frac{2}{e - 1} = 2 + 1/(6 + 1/(10 + 1/(14 + \ldots ))).

Theorem 1: e-1 = \left[ 1, 1, 2, 1, 1, 4, 1, 1, 6, \ldots \right].

Proof: By the last displayed equation, it suffices to show

2\left[ 0, 1, 1, 2, 1, 1, 4, 1, 1, 6, \ldots \right] = \left[ 1, 6, 10, 14, \ldots \right].

This follows from a recursive algorithm for multiplying a continued fraction by 2, due to Hurwitz (Knuth, loc. cit., p. 375, exercise 14):

Lemma 2: 2\left[ 0, 2a + 1, b, c, \ldots \right] = \left[ 0, a, 1, 1 + 2\left[ 0, b-1, c, \ldots \right] \right], and

2\left[0, 2a, b, c, \ldots \right] = \left[ 0, a, 2\left[ b, c, \ldots \right] \right] .

I won’t bother proving this; instead I’ll just run through a few cycles to see how it applies to theorem 1:

2\left[ 0, 1, 1, 2, 1, 1, 4, \ldots \right] = \left[ 0, 0, 1, 1 + 2\left[ 0, 0, 2, 1, 1, 4\ldots \right] \right]

= \left[ 1, 1 + 2\left[ 2, 1, 1, 4, \ldots \right] \right]

= \left[ 1, 1 + 4 + 2\left[ 0, 1, 1, 4, \ldots \right] \right]

= \left[ 1, 5 + \left[ 0, 0, 1, 1 + 2\left[ 0, 0, 4, \ldots \right] \right] \right]

= \left[ 1, 6 + \left[ 0, 1 + 2\left[ 4, 1, 1, 6, \ldots \right] \right] \right]

= \left[ 1, 6, 1 + 2\left[ 4, 1, 1, 6, \ldots \right] \right]

= \left[ 1, 6, 1 + 8 + 2\left[ 0, 1, 1, 6, \ldots \right] \right]

and so on. Continuing this procedure, we get \left[ 1, 6, 10, 14, \ldots \right], which finishes the proof of theorem 1. \Box

PROOF 2

I turn now to the second proof (by Cohn, loc. cit.), which I find rather more satisfying. It’s based on Padé approximants, which are “best fit” rational function approximations to a given analytic function, much as the rational approximants \displaystyle \frac{p_n}{q_n} = \left[ a_0, a_1, \ldots, a_n \right] provide “best fits” to a given real number x = \left[ a_0, a_1, \ldots \right]. (By “best fit”, I mean a sample theorem like: among all rational numbers whose denominator is bounded above in magnitude by q_n, the approximant \displaystyle \frac{p_n}{q_n} comes closest to x.)

Definition: Let f be a function analytic in a neighborhood of 0. The Padé approximant to f of order (m, n), denoted r_{m, n}(x), is the (unique) rational function \displaystyle \frac{p(x)}{q(x)} such that \mbox{deg}(p) = m, \mbox{deg}(q) = n, and the MacLaurin coefficients of r_{m, n} agree with those of f up to degree m + n. \Box

This agreement of MacLaurin coefficients is equivalent to the condition that the function

\displaystyle \frac{q(x)f(x) - p(x)}{x^{m+n+1}}

is analytic around 0. Here, we will be interested in Padé approximants to f(x) = e^x.

In general, Padé approximants may be computed by (tedious) linear algebra, but in the present case Hermite found a clever integration trick which gets the job done:

Proposition 1: Let r be a polynomial of degree k. Then there are polynomials p, q of degree at most k such that

\displaystyle \int_{0}^1 r(t)e^{xt} dt = \frac{q(x)e^x - p(x)}{x^{k+1}}

Explicitly,

p(x) = r(0)x^k - r'(0)x^{k-1} + r''(0)x^{k-2} - \ldots

q(x) = r(1)x^k - r'(1)x^{k-1} + r''(1)x^{k-2} - \ldots

Proof: Integration by parts yields

\displaystyle \int_{0}^1 r(t)e^{xt} dt = \frac{r(1)e^x - r(0)}{x} - \frac1{x} \int_{0}^1 r'(t)e^{xt} dt

and the general result follows by induction. \Box

It is clear that the integral of proposition 1 defines a function analytic in x. Taking k = m + n, this means we can read off the Padé approximant r_{m, n}(x) = p(x)/q(x) to e^x from the formulas for p, q in proposition 1, provided that the polynomial r(t) [of degree m + n] is chosen so that \mbox{deg}(p) = m, \mbox{deg}(q) = n. Looking at these formulas, all we have to do is choose r to have a zero of order n at t = 0, and a zero of order m at t = 1. Therefore r(t) = t^n (t-1)^m fits the bill.

Notice also we can adjust r by any constant multiple; the numerator and denominator p(x), q(x) are adjusted by the same constant multiples, which cancel each other in the Padé approximant r_{m, n}(x) = p(x)/q(x).

Taking r(t) = t^n (t-1)^m in proposition 1, we then infer

\displaystyle \int_{0}^1 t^n (t-1)^m e^t dt = q(1)e - p(1).

Notice that this integral is small when m, n are large. This means that r_{m, n}(1) = p(1)/q(1) will be close to e (see the following remark), and it turns out that by choosing m, n appropriately, the values r_{m, n}(1) coincide exactly with rational approximants coming from the continued fraction for e.

  • Remark: Note that for the choice r(t) =t^n (t-1)^m, the values p(1), q(1) derived from proposition 1 are manifestly integral, and q(1) \neq 0. [In particular, |q(1)| \geq 1, justifying the claim that |p(1)/q(1) - e| is small if q(1)e - p(1) is.] In fact, p(1), q(1) may be much larger than necessary; e.g., they may have a common factor, so that the fraction p(1)/q(1) is unreduced. This ties in with how we adjust r(t) by a constant factor, as in theorem 2 below.

For n \geq 0, let p_n/q_n denote the n^{th} rational approximant arising from the infinite continued fraction

\left[ a_0, a_1, a_2, \ldots \right] = \left[ 1, 0, 1, 1, 2, 1, 1, 4, 1, \dots \right],

where \mbox{gcd}(p_n, q_n) = 1. From standard theory of continued fractions, we have the following recursive rule for computing the integers p_n, q_n from the a_n: p_0 = a_0, q_0 = 1, p_{-1} := 1, q_{-1} := 0, and

p_{n+1} = a_{n+1} p_n + p_{n-1} \qquad q_{n+1} = a_{n+1} q_n + q_{n-1}.

Explicitly, a_{3n} = a_{3n+2} = 1 and a_{3n+1} = 2n, so

  1. p_{3n} = p_{3n-1} + p_{3n-2} \qquad q_{3n} = q_{3n-1} + q_{3n-2}
  2. p_{3n+1} = 2n p_{3n} + p_{3n-1} \qquad q_{3n+1} = 2n q_{3n} + q_{3n-1}
  3. p_{3n+2} = p_{3n+1} + p_{3n} \qquad q_{3n+2} = q_{3n+1} + q_{3n}

[Note: p_1 = 1 and q_1 = 0, so p_1/q_1 is infinite, but that won’t matter below.]

Theorem 2: Define, for n \geq 0,

\displaystyle A_n := \int_{0}^1 \frac{t^n (t-1)^n}{n!} e^t dt

\displaystyle B_n := \int_{0}^1 \frac{t^{n+1} (t-1)^n}{n!} e^t dt

\displaystyle C_n := \int_{0}^1 \frac{t^n (t-1)^{n+1}}{n!} e^t dt

Then A_n = q_{3n}e - p_{3n}, B_n = p_{3n+1} - q_{3n+1}e, and C_n = p_{3n+2} - q_{3n+2}e.

Proof: It is easy to see A_0 = e - 1, B_0 = 1, and C_0 = 2 - e. In view of the recursive relations for the p_n, q_n above, it suffices to show

  • A_n = -B_{n-1} - C_{n-1}
  • B_n = -2n A_{n-1} + C_{n-1}
  • C_n = B_n - A_n

The last relation is trivial. The first relation A_n + B_{n-1} + C_{n-1} = 0 follows by integrating both sides of the identity

\displaystyle \frac{t^n (t-1)^n e^t}{n!} + \frac{t^n (t-1)^{n-1} e^t}{(n-1)!} + \frac{t^{n-1} (t-1)^n e^t}{(n-1)!} = \frac{d}{dt} \frac{t^n (t-1)^n e^t}{n!}

over the interval \left[ 0, 1 \right]. The second relation B_n + 2n A_{n-1} - C_{n-1} = 0 follows by integrating both sides of the identity

\displaystyle \frac{t^{n+1} (t-1)^n e^t}{n!} + 2n\frac{t^{n-1} (t-1)^{n-1} e^t}{(n-1)!} - \frac{t^{n-1} (t-1)^n e^t}{(n-1)!} = \frac{d}{dt} \frac{t^n (t-1)^{n+1} e^t}{n!}

which we leave to the reader to check. This completes the proof. \Box

Theorem 2 immediately implies that

e = \left[ 1, 0, 1, 1, 2, 1, 1, 4, 1, 1, 6, 1, \ldots \right];

indeed, the rational approximants p_k/q_k to the right-hand side have the property that q_k |p_k/q_k - e| is one of |A_n|, |B_n|, or |C_n| (for k = 3n, 3n+1, 3n+2 respectively), and looking at their integral expressions, these quantities approach 0 very rapidly.

This in turn means, since the denominators q_k grow rapidly with k, that the rational approximants p_k/q_k approach einsanely” rapidly, and this in itself can be used as the basis of a proof that e is transcendental (Roth’s theorem). To give some quantitative inkling of just “how rapidly”: Knuth in his notes gives estimates on how close the approximant

1/(x^{-1} + 1/(3x^{-1} + \ldots + 1/(2n+1)x^{-1} \ldots ))

is to the function \tanh(x). It’s something on the order of \displaystyle \frac{(n!)^2}{(2n)!(2n+1)!} (loc. cit., p. 651).

  • Remark: Quoting Roth’s theorem in support of a theorem of Hermite is admittedly anachronistic. However, the Padé approximants and their integral representations used here did play an implicit role in Hermite’s proof of the transcendence of e; in fact, Padé was a student of Hermite. See Cohn’s article for further references to this topic.

[Wow, another long post. I wonder if anyone will read the whole thing!]

[Edits in response to the comment below by Henry Cohn.]

In mathematics you don’t understand things. You just get used to them.”

— John von Neumann

I had been wanting to write on this topic – no, I am not referring to the above quote by von Neumann – for quite some time but I wasn’t too sure if doing so would contribute anything “useful” to the ongoing discussion on the pedagogical roles of concrete and abstract examples in mathematics, a discussion that’s been going on on various blogs for some time now. In part coaxed by Todd, let me share some of my own observations for whatever they are worth.

First, some background. A few months ago, Scientific American published an article titled In Abstract: Avoid Concrete Example When Teaching Math (by Nikhil Swaminathan). Some excerpts from that article can be read below:

New research published in Science suggests that attempts by math teachers to make the subject easier to grasp by providing such practical examples may actually have made it tougher to learn.

For their study, Kaminski and her colleagues taught 80 undergraduate students—split into four 20-person groups—a new mathematical system (based on several simple arithmetic concepts) in different ways.

One group was taught using generic symbols such as circles and diamonds. The other groups were taught using practical scenarios such as combining liquids in measuring cups.

The researchers then tested their grasp of the concept by seeing how well they could apply it to an unrelated situation, in this case a children’s game. The results: students who learned using symbols on average scored 80 percent; the others scored between 40 and 50 percent, according to Kaminski.

One may read the entire article online to learn a bit more about the study done. Let me add that I do agree with the overall conclusion of the study cited: in mathematics concrete examples (in contradistinction to abstract ones) more often than not obfuscate the underlying concepts behind those examples, thus hindering “real” or complete understanding of those concepts. However, I also feel that such a claim must be somewhat qualified because there is more to it than meets the eye.

Sometimes the line between abstract examples and concrete ones can be quite blurry. What is more, some concrete examples may even be more abstract than other concrete ones. In this post, I will assume (and hope others do too) that the distinction between an abstract example and a concrete one (that I have chosen for this post) is sharp enough for our discussion. Of course, my aim is not to highlight such a distinction but to emphasize the importance of both abstract and concrete examples in mathematical education, for I firmly believe that a “concrete” understanding of concepts isn’t necessarily subsumed under an “abstract” one, even though a concrete example may just be a special case of a more general and abstract one. What is more, and this may sound surprising, abstract examples may sometimes not reveal certain useful principles which, on the other hand, may be clearly revealed by concrete ones!

Let me illustrate what I wrote above by discussing a somewhat well-known problem and its two related solutions, one of which employs an abstract approach and the other a concrete one, if you will. Some time ago, Isabel at God Plays Dice pointed to an online article titled An Intuitive Explanation of Bayesian Reasoning by Eliezer Yudkowsky, and I borrow the problem I am about to discuss in this post from that article.

PROBLEM: 1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

How may one proceed to solve this problem? Well, first, let us look at an “abstract” solution.

“ABSTRACT” SOLUTION: Here we employ the machinery of set-theoretic probability theory to arrive at our answer. We first note that what we really want to compute is the probability of a woman having breast cancer given that she has tested positive. That is, we want to compute the conditional probability P(A/B), where event A corresponds to that of a woman having breast cancer and event B corresponds to that of a woman testing positive for breast cancer. Now, from Bayes’ theorem, we have

\displaystyle P(A/B) = \frac{P(B/A) P(A)}{P(B/A) P(A) + P(B/A^{c}) P(A^{c})}.

Also, we note that P(A) = 0.01, P(B/A) = 0.8, P(A^{c}) = 0.99 and P(B/A^{c}) = 0.096. Plugging these values into the above formula immediately yields P(A/B) = 7.76%. And, we are done.

A couple of observations.

1. It is not hard to observe that the derivation of Bayes’ formula follows from the definition of conditional probability, viz. P(A/B) = P(AB)/P(B), where P(B) > 0, and the usual set-theoretic rules involving the union and intersection of sets (events). And, this derivation can be carried out through sheer manipulation of symbols under those rules. By that I mean, if a student knows enough set theory as well as the “laws” of set-theoretic probability theory, then the derivation of Bayes’ theorem makes absolutely no (or, almost no) use of the “intuitive” faculty of a student.

2. The abstract method presented above also subsumes the concrete method, as we shall see shortly. What is more, Bayes’ formula can be generalized even further. This means that once we have this particularly useful “abstract” tool at our disposal, we can solve any number of similar problems by repeatedly using this tool in concrete (and even abstract) cases. In addition, Bayes’ theorem can also be thought of as a “black box” to which we apply certain inputs in order to get our output. This should not surprise us, for in mathematics the use of theorems as black boxes is a common one.

Now, the above two observations may lead one to believe that indeed there is almost no need to find a “concrete” solution to the above problem. After all, the abstract case takes care of the concrete cases completely.

However, let us see if we can come up with a concrete (that is, a far less abstract) solution and examine it more closely to see if we can extract some useful ideas/techniques from the same.

“CONCRETE” SOLUTION: Suppose we choose a random sample of 100,000 women of age forty. (We choose that figure for reasons that will be clear soon.) Then, we have two groups of women.

1st group: 1,000 (1%) women who have breast cancer.

2nd group: 99,000 (99%) women who don’t have breast cancer.

Now, in the 1st group, 800 (80% of 1,000) women will test positive, and, in the 2nd group, 9,504 (9.6% of 99,000) women will test positive. So, it is clear that if a woman tests positive, then the probability that she belongs to the 1st group (that is, she really has cancer) is 800/(800+9504) = 7.76 %. And, we are done.

Let me quickly point out a very important advantage the above solution has over the abstract one we saw earlier.

Indeed, we finally “see” what’s really going on. That is, from an intuitive standpoint, we observe in the above solution that there is a “tree structure” involved in our reasoning. The sample of 1,00,000 women bifurcates into two distinct samples, one of which has 1,000 women who have breast cancer and the other that has 99,000 women who don’t. Next, we observe that each of these two samples in turn bifurcates into two samples, one of which comprises women who test positive and the other that comprises women who don’t. This clearly reveals to the student the “tree structure” in the above reasoning. This makes the concrete solution much more appealing and “satisfying” to the average student. In fact, the generalization we talked about earlier in regard to Bayes’ theorem can even be carried out in this particular method: we will only need to increase the depth and/or breadth of our “tree” by extending more nodes from existing ones!

Moreover, one may recall that the use of such “trees” in reasoning is quite common in mathematics. For instance, the two most basic rules of Combinatorial Principles, viz. the Rule of Sum and the Rule of Product are proved using such “trees”. So, this is one instance in which a concrete solution reveals much more clearly a quite fundamental principle/technique (use of “trees” in reasoning) in mathematics that isn’t clearly revealed at all in the abstract solution we examined earlier.

In other words, much thought needs to be put in deciding if abstract examples should necessarily be “favored” over concrete ones in mathematics education. From a pedagogical standpoint, sometimes concrete examples are simply much better than abstract ones!

A couple of weeks ago, when Miodrag Milenkovic posed an interesting general problem in connection with POW-7, I was reminded of the “hairy ball theorem” (obviously a phrase invented during a more innocent era!), and a surprisingly easy proof of same that John Baez once told me over beers in an English pub. John is quite a good story-teller, and I wasn’t able to guess the punch line of the proof before it came out of his mouth –when it came, I was so surprised that I nearly fell off my stool! Well, I happened to run across the original source of this proof recently, and though it may be “old news” for some, it’s such a nice proof that I thought it was worth sharing.

The hairy ball theorem says: every continuous tangent vector field on a sphere of even dimension must vanish somewhere (at some point of the sphere, the tangent vector is zero). In the case of an ordinary 2-dimensional sphere, if you think of a vector at a point as a little “hair” emanating from that point, then the theorem says that you can’t comb the hairs of a sphere so that they all lie flat against the sphere: there will be a cowlick sticking straight out somewhere.

The classical proofs usually make some kind of appeal to homology theory: for example, a deep result is that the Euler characteristic of a compact manifold can be computed in terms of any continuously differentiable tangent vector field, by adding up the so-called “indices” of the vector field in the neighborhoods of critical points, where the vector field vanishes (a technical result shows there is no loss of generality in assuming the vector field is continuously differentiable). If the vector field vanishes nowhere, then the Euler characteristic is the empty sum 0; this cannot be in the case of an even-dimensional sphere, because its Euler characteristic is 2. The hairy ball theorem follows.

Some of these homology-based proofs are quite slick, but generally speaking, homology theory requires some heavy infrastructure; the question is whether a more elementary proof exists. The following “analytic” proof is due to John Milnor and uses very little machinery, basically nothing beyond advanced calculus. I will follow his exposition (American Mathematical Monthly, July 1978, pp. 521-524) pretty closely.

For the first step, suppose that we have a continuously differentiable vector field v defined in a compact region of space \mathbb{R}^n, A. For any real t and for x \in A, define a function

f_t := (x \mapsto x + tv(x))

The matrix of first partial derivatives of f_t is \displaystyle I + t[\frac{\partial v_i}{\partial x_j}]_{ij}, where I denotes the n \times n identity matrix. For |t| sufficiently small, the determinant of this matrix is strictly positive over all of A.

Lemma 1: If |t| is sufficiently small, then f_t is a one-to-one function of A onto its image, and \mbox{vol}(f_t(A)) is a polynomial function of t.

Proof: Since v is continuously differentiable over a compact region, there is [using e.g. the mean value theorem, evidently a red rag for some people 😉 ] a Lipschitz constant c > 0 so that

|v(x) - v(y)| \leq c|x-y|, \qquad x, y \in A

We have f_t(x) = f_t(y) only if t(v(x) - v(y)) = y - x; if we assume |t| < c^{-1}, this can happen only if x = y. So f_t is one-to-one for such t.

The determinant of the matrix of first partials above is of the form

1 + t\sigma_1(x) + t^2\sigma_2(x) + \ldots + t^n \sigma_n(x)

where the \sigma_k are continuous functions in x. By the first part of the lemma, we may take |t| so small that f_t is a continuously differentiable embedding, and then by a change-of-variables formula in multivariate calculus we have that

\displaystyle \mbox{vol}(f_t(A)) = a_0 + a_1 t + \ldots + a_n t^n

where \displaystyle a_k = \int_A \sigma_k(x) dx_1 \ldots dx_n. This completes the proof. \Box

Next, suppose that on S^{n-1} we have a continuously differentiable non-vanishing field v of tangent vectors. Applying the continuously differentiable map v \mapsto v/|v|, we assume the vector field v consists of unit tangent vectors. For each u \in S^{n-1}, the vector u + tv(u) is thus of length (1 + t^2)^{1/2}, hence f_t maps the unit sphere S^{n-1} to the sphere of radius (1 + t^2)^{1/2}.

Lemma 2: For sufficiently small |t|, the map f_t maps S^{n-1} onto the sphere of radius (1 + t^2)^{1/2}.

Proof: Extend the vector field v on S^{n-1} (and therefore also f_t) to the compact region between two concentric spheres, A = \{v \in \mathbb{R}^n: 1/2 \leq |v| \leq 3/2\}, by homothety, i.e., put v(ru) = rv(u), f_t(ru) = rf_t(u) for 1/2 \leq r \leq 3/2 and u \in S^{n-1}. There is a Lipschitz constant c > 0 such that

|v(x) - v(y)| \leq c|x - y| for x, y \in A.

Take |t| < \mbox{min}(c^{-1}, 1/3), and let u_0 be any unit vector. The function

h(x) := u_0 - tv(x), \qquad x \in A

maps the complete metric space A to itself (because |t| < 1/3 and |v(x)| \leq 3/2 — just use triangle inequalities), and h satisfies a Lipschitz condition |h(x) - h(y)| \leq k|x - y| where k = c|t| < 1. By a classical fixed-point theorem, h has under these conditions a (unique) fixed point x, so that x + tv(x) = u_0. Rescaling u_0 and x by the factor (1 + t^2)^{1/2}, the statement of lemma 2 follows. \Box

Now we prove the hairy ball theorem. If w is a continuous non-vanishing vector field of tangent vectors on S^{n-1}, let \varepsilon > 0 be the absolute minimum of |w|. By standard techniques (e.g., using the Stone-Weierstrass approximation theorem), there is a continuously differentiable vector field v such that |v - w| < \varepsilon/2, and then |v| > \varepsilon/2 so that v is also non-vanishing. As above, we may then substitute v/|v| for v, i.e., assume that v consists of unit vectors.

Given b > 1 > a > 0, we extend v to the region A = \{x \in R^n: a \leq |x| \leq b\} by homothety. For sufficiently small |t|, the map f_t defined above maps a spherical shell |x| = r in this region bijectively onto the shell |x| = r(1 + t^2)^{1/2}. Hence f_t maps A to the region \{x: a(1 + t^2)^{1/2} \leq |x| \leq b(1 + t^2)^{1/2}\}, and we get a dilation factor:

\mbox{vol}(f_t(A)) = (1 + t^2)^{n/2}\mbox{vol}(A).

By lemma 1, (1 + t^2)^{n/2} is polynomial in t. So n must be even; therefore S^{n-1} admits a non-vanishing vector field only if n-1 is odd. This gives the hairy ball theorem. \Box

Man, what an awesome proof. That John Milnor is just a master of technique.

Just a quick note on how any of this bears on Milenkovic’s problem. He asked whether for any topological embedding of S^{n-1} in \mathbb{R}^n and any point P in the region I interior to the embedding, there exists a hyperplane H through P such that the barycenter of the (n-1)-dimensional region H \cap I coincides with P.

Under the further simplifying assumption that the barycenter varies continuously with H, the answer is ‘yes’ for even-dimensional spheres. For (taking P to be the origin) we can define a tangent vector field whose value at u \in S^{n-1} is the vector from P to the barycenter of u^\perp \cap I. For n-1 even, this vector vanishes for some u, hence P coincides with the barycenter for that particular u.

After this brief (?) categorical interlude, I’d like to pick up the main thread again, and take a closer look at the some of the ingredients of baby Stone duality in the context of categorical algebra, specifically through the lens of adjoint functors. By passing a topological light through this lens, we will produce the spectrum of a Boolean algebra: a key construction of full-fledged Stone duality!

Just before the interlude, we were discussing some consequences of baby Stone duality. Taking it from the top, we recalled that there are canonical maps

i_B: B \to \hom(\mbox{Bool}(B, \mathbf{2}), \mathbf{2}): b \mapsto (\phi \mapsto \phi(b))

j_X: X \to \mbox{Bool}(\hom(X, \mathbf{2}), \mathbf{2}): x \mapsto (\sigma \mapsto \sigma(x))

in the categories of Boolean algebras B and sets X. We said these are “natural” maps (even before the notion of naturality had been formally introduced), and recalled our earlier result that these are isomorphisms when B and X are finite (which is manifestly untrue in general; for instance, if B is a free Boolean algebra generated by a countable set, then for simple reasons of cardinality B cannot be a power set).

What we have here is an adjoint pair of functors between the categories Set and \mbox{Bool} of sets and Boolean algebras, each given by a hom-functor:

(P^{op} = \hom(-, \mathbf{2})^{op}: Set \to \mbox{Bool}^{op}) \dashv (Q = \mbox{Bool}(-, \mathbf{2}): \mbox{Bool}^{op} \to Set)

(P^{op} acts the same way on objects and morphisms as P: Set^{op} \to \mbox{Bool}, but is regarded as mapping between the opposite categories). This actually says something very simple: that there is a natural bijection between Boolean algebra maps and functions

\displaystyle \frac{\phi: B \to \hom(X, \mathbf{2})}{\hat{\phi}: X \to \mbox{Bool}(B, \mathbf{2})}

given by the formula \hat{\phi}(x)(b) = \phi(b)(x). [The very simple nature of this formula suggests that it’s nothing special to Boolean algebras — a similar adjunction could be defined for any algebraic theory defined by operations and (universally quantified) equations, replacing \mathbf{2} by any model of that theory.] The unit of the adjunction at X is the function j_X: X \to QP^{op}X, and the counit at B is the Boolean algebra map i_B (regarded as a morphism \varepsilon_B: P^{op}QB \to B mapping the other way in the opposite category \mbox{Bool}^{op}).

The functor QP^{op} is usually described in the language of ultrafilters, as I will now explain.

Earlier, we remarked that an ultrafilter in a Boolean algebra is a maximal filter, dual to a maximal ideal; let’s recall what that means. A maximal ideal I in a Boolean ring B is the kernel of a (unique) ring map

\phi: B \to \mathbf{2}

i.e., has the form I = \phi^{-1}(0) for some such map. Being an ideal, it is an additive subgroup I \subseteq  B such that x \in B, y \in I implies x y = x \wedge y \in I. It follows that if x, y \in I, then x \vee y = x + y + xy \in I, so I is closed under finite joins (including the empty join 0 = \bot). Also, if x \leq y and y \in I, then x = x \wedge y = xy \in I, so that I is “downward-closed”.

Conversely, a downward-closed subset I \subseteq B which is closed under finite joins is an ideal in B (exercise!). Finally, if I is a maximal ideal, then under the quotient map

\phi: B \to B/I \cong \mathbf{2}

we have that for all b \in B, either \phi(b) = 0 or \phi(b) = 1, i.e., that either b \in I or \neg b = 1 - b \in I.

Thus we have redefined the notion of maximal ideal in a Boolean algebra in the first-order theory of posets: a downward-closed set I \subseteq B closed under finite joins, such that every element or its complement (but never both!) is contained in I. [If both x, \neg x \in I, then x \vee \neg x = 1 \in I, whence b \in I for all b \in B (since b \leq 1 and I is downward-closed). But then I isn’t a maximal (proper) ideal!]

The notion of ultrafilter is dual, so an ultrafilter in a Boolean algebra B is defined to be a subset F \subseteq B which

  • Is upward-closed: if x \in F and x \leq y, then y \in F;
  • Is closed under finite meets: if x, y \in F, then x \wedge y \in F;
  • Satisfies dichotomy: for every x \in B, exactly one of x, \neg x belongs to F.

If I is a maximal ideal, then \neg I = \{\neg x: x \in I\} is an ultrafilter, and we have natural bijections between the following concepts:

Boolean algebra maps B \to \mathbf{2} \leftrightarrow maximal ideals I \subseteq B \leftrightarrow ultrafilters F \subseteq B

so that QB = \mbox{Bool}(B, \mathbf{2}) is naturally identified with the set of ultrafilters in B.

If X is a set, then an ultrafilter on X is by definition an ultrafilter in the Boolean algebra P X. Hence QP^{op}X is identified with the set of ultrafilters on X, usually denoted \beta X. The unit map

j_X: X \to QP^{op}X \cong \beta X

maps x \in X to an ultrafilter denoted \mbox{prin}_X(x) \subseteq PX, consisting of all subsets S \subseteq X which contain x, and called the principal ultrafilter generated by x.

We saw that when X is finite, the function j_X (and therefore also \mbox{prin}_X) is a bijection: there every ultrafilter is principal, as part of baby Stone duality (see Proposition 4 here). Here is a slight generalization:

Proposition 1: If an ultrafilter F on X contains a finite set S \subseteq X, then F is principal.

Proof: It is enough to show F contains \{x\} for some x \in S. If not, then F contains the complement \neg\{x\} for every x \in S (by dichotomy), and therefore also the finite intersection

\bigcap_{x \in S} \neg\{x\} = \neg S,

which contradicts the fact that S \in F. \Box

It follows that nonprincipal ultrafilters can exist only on infinite sets X, and that every cofinite subset of X (complement of a finite set) belongs to such an ultrafilter (by dichotomy). The collection of cofinite sets forms a filter, and so the question of existence of nonprincipal ultrafilters is the question of whether the filter of cofinite sets can be extended to an ultrafilter. Under the axiom of choice, the answer is yes:

Proposition 2: Every (proper) filter in a Boolean algebra is contained in some ultrafilter.

Proof: This is dual to the statement that every proper ideal in a Boolean ring is contained in a maximal ideal. Either statement may be proved by appeal to Zorn’s lemma: the collection of filters which contain a given filter has the property that every linear chain of such filters has an upper bound (namely, the union of the chain), and so by Zorn there is a maximal such filter. \Box

As usual, Zorn’s lemma is a kind of black box: it guarantees existence without giving a clue to an explicit construction. In fact, nonprincipal ultrafilters on sets X, like well-orderings of the reals, are notoriously inexplicit: no one has ever seen one directly, and no one ever will.

That said, one can still develop some intuition for ultrafilters. I think of them as something like “fat nets”. Each ultrafilter F on a set X defines a poset (of subsets ordered by inclusion), but I find it more suggestive to consider instead the opposite F^{op}, where U \leq V in F^{op} means V \subseteq U — so that the further or deeper you go in F^{op}, the smaller or more concentrated the element. Since F is closed under finite intersections, F^{op} has finite joins, so that F^{op} is directed (any two elements have an upper bound), just like the elements of a net (or more pedantically, the domain of a net). I call an ultrafilter a “fat net” because its elements, being subsets of X, are “fatter” than mere points.

Intuitively speaking, ultrafilters as nets “move in a definite direction”, in the sense that given an element U \in F, however far in the net, and given a subset T \subseteq U, the ultrafilter-as-net sniffs out a direction in which to proceed, “tunneling” either into T if T \in F, or into its relative complement U \cap \neg T if this belongs to F. In the case of a principal ultrafilter, there is a final element x of the net; otherwise not (but we can think of a nonprincipal ultrafilter as ending at an “ideal point” of the set X if we want).

Since the intuitive imagery here is already vaguely topological, we may as well make the connection with topology more precise. So, suppose now that X comes equipped with a topology. We say that an ultrafilter F on X converges to a point x \in X if each open set U containing x (or each neighborhood of x) belongs to the ultrafilter. In other words, by going deep enough into the ultrafilter-as-net, you get within any chosen neighborhood of the point. We write F \to x to say that F converges to x.

General topology can be completely developed in terms of the notion of ultrafilter convergence, often very efficiently. For example, starting with any relation whatsoever between ultrafilters and points,

c \subseteq \beta(X) \times X,

we can naturally define a topology \mbox{top}(c) on X so that

F \to x with respect to \mbox{top}(c) whenever (F, x) \in c.

Let’s tackle that in stages: in order for the displayed condition to hold, a neighborhood of x must belong to every ultrafilter F for which (F, x) \in c. This suggests that we try defining the filter N_x of neighborhoods of x to be the intersection of ultrafilters

N_x = \bigcap_{F: (F, x) \in c} F.

Then define a subset U \subseteq X to be open if it is a neighborhood of all the points it contains. In other words, define U to be open if

\forall_{(F, x) \in c} x \in U \Rightarrow U \in F.

Proposition 3: This defines a topology, \mbox{top}(c).

Proof: Since X \in F for every ultrafilter F, it is clear that X is open; also, it is vacuously true that the empty set is open. If U, V are open, then for all (F, x) \in c, whenever x \in U \cap V, we have x \in U and x \in V, so that U \in F and V \in F by openness, whence U \cap V \in F since F is closed under intersections. So U \cap V is also open. Finally, suppose U_i is a collection of open sets. For all (F, x) \in c, if x \in \bigcup_i U_i, then x \in U_i for some i, so that U_i \in F by openness, whence \bigcup_i U_i \in F since ultrafilters are upward closed. So \bigcup_i U_i is also open. \Box.

Let’s recap: starting from a topology \tau on X, we’ve defined a convergence relation \mbox{conv}(\tau) \subseteq \beta(X) \times X (consisting of pairs (F, x) such that F \to x), and conversely, given any relation c \subseteq \beta(X) \times X, we’ve defined a topology \mbox{top}(c) on X. What we actually have here is a Galois connection where

c \subseteq \mbox{conv}(\tau) if and only if \tau \subseteq \mbox{top}(c)

Of course not every relation c \subseteq \beta(X) \times X is the convergence relation of a topology, so we don’t quite have a Galois correspondence (that is, \mbox{conv} and \mbox{top} are not quite inverse to one another). But, it is true that every topology \tau is the topology of its ultrafilter convergence relation, i.e., \tau = \mbox{top(conv}(\tau)). For this, it suffices to show that every neighborhood filter N_x is the intersection of the ultrafilters that contain it. But that is true of any filter:

Theorem 1: If N is a filter in PX and A \notin N, then there exists an ultrafilter F for which N \subseteq F and A \notin F.

Proof: First I claim \neg A \cap B \neq 0 for all B \in N; otherwise \neg A \cap B = 0 for some B \in N, whence B \subseteq \neg \neg A = A, so that A \in N since filters are upward closed, contradiction. It follows that N can be extended to the (proper) filter

\{C \in PX: \exists_{B \in N} \neg A \cap B \subseteq C\}

which in turn extends to some ultrafilter F, by proposition 2. Since \neg A \in F, we have A \notin F. \Box

Corollary 1: Every filter is the intersection of all the ultrafilters which contain it.

The ultrafilter convergence approach to topology is particularly convenient for studies of compactness:

Theorem 2: A space X is compact if and only if every ultrafilter F converges to at least one point. It is Hausdorff if and only if every ultrafilter converges to at most one point.

Proof: First suppose that X is compact, and (in view of a contradiction) that F converges to no point of X. This means that for every x \in X there is a neighborhood U_x which does not belong to F, or that \neg U_x \in F. Finitely many of these U_x cover X, by compactness. By De Morgan’s law, this means finitely many \neg U_x have empty intersection. But this would mean \emptyset \in F, since F is closed under finite intersections, contradiction.

In the reverse direction, suppose that every ultrafilter converges. We need to show that if C = \{U_i\}_{i \in I} is any collection of open subsets of X such that no finite subcollection covers X, then the union of the U_i cannot cover X. First, because no finite subcollection covers, we may construct a filter generated by the complements:

F = \{A \subseteq X: \bigcap_{j=1}^n \neg U_{i_j} \subseteq A \mbox{ for finitely many  } U_{i_1}, \ldots, U_{i_n} \in C\}.

Extend this filter to an ultrafilter G; then by assumption \exists_{x \in X} G \to x. If some one of the U_i contained x, then U_i \in G by definition of convergence. But we also have \neg U_i \in F \subseteq G, and this is a contradiction. So, x lies outside the union of the U_i, as was to be shown.

Now let X be Hausdorff, and suppose that F \to x and F \to y. Let U_x, U_y be neighborhoods of x, y respectively with empty intersection. By definition of convergence, we have U_x, U_y \in F, whence \emptyset = U_x \cap U_y \in F, contradiction.

Conversely, suppose every ultrafilter converges to at most one point, and let x, y be two distinct points. Unless there are neighborhoods U_x, U_y of x, y respectively such that U_x \cap U_y = \emptyset (which is what we want), the smallest filter containing the two neighborhood filters N_x, N_y (that is to say, the join N_x \vee N_y in the poset of filters) is proper, and hence extends to an ultrafilter F. But then N_x \subseteq F and N_y \subseteq F, which is to say F \to x and F \to y, contradiction. \Box

Theorem 2 is very useful; among other things it paves the way for a clean and conceptual proof of Tychonoff’s theorem (that an arbitrary product of compact spaces is compact). For now we note that it says that a topology \tau is the topology of a compact Hausdorff space structure on X if and only if the convergence relation \mbox{conv}(\tau) \subseteq \beta(X) \times X is a function. And in practice, functions c: \beta(X) \to X which arise “naturally” tend to be such convergence relations, making X a compact Hausdorff space.

Here is our key example. Let B be a Boolean algebra, and let X = QB = \mbox{Bool}(B, \mathbf{2}), which we have identified with the set of ultrafilters in B. Define a map c: \beta(X) \to X by

\displaystyle \beta (QB) \cong QP^{op}QB \stackrel{Q\varepsilon_B}{\to} QB

where \varepsilon_B: P^{op}QB \to B was the counit (evaluated at B) of the adjunction P^{op} \dashv Q defined at the top of this post. Unpacking the definitions a bit, the map Q \varepsilon_B is the map \mbox{Bool}(i_B, \mathbf{2}), the result of applying the hom-functor \mbox{Bool}(-, \mathbf{2}) to

i_B: B \to P^{op}QB = \hom(\mbox{Bool}(B, \mathbf{2}), \mathbf{2}): b \mapsto (\phi \mapsto \phi(b))

Chasing this a little further, the map c “pulls back” an ultrafilter F \subseteq P^{op}QB to the ultrafilter i_B^{-1}(F) \subseteq B, viewed as an element of QB. We then topologize QB by the topology \mbox{top}(c).

This construction is about as “abstract nonsense” as it gets, but you have to admit that it’s pretty darned canonical! The topological space QB we get in this way is called the spectrum of the Boolean algebra B. If you’ve seen a bit of algebraic geometry, then you probably know another, somewhat more elementary way of defining the spectrum (of B as commutative ring), so we may as well make the connection explicit. However you define it, the result is a compact Hausdorff space structure with some other properties which make it very reminiscent of Cantor space.

It is first of all easy to see that X = QB is compact, i.e., that every ultrafilter F converges. Indeed, the relation c is a function \beta (QB) \to QB, and if you look at the condition for a set U to be open w.r.t. \mbox{top}(c),

\forall_{(F, x = c(F))} x \in U \Rightarrow U \in F,

you see immediately that F converges to x = c(F).

To get Hausdorffness, take two distinct points u, v \in QB (ultrafilters in B). Since these are distinct maximal filters, there exists b \in B such that b belongs to u but not to v, and then \neg b belongs to v but not to u. Define

U(b) := \{w \in QB: b \in w\}.

Proposition 4: U(b) is open in \mbox{top}(c).

Proof: We must check that for all ultrafilters F on QB, that

c(F) \in U(b) \Rightarrow U(b) \in F.

But c(F) = i_B^{-1}(F). By definition of U(b), we are thus reduced to checking that

b \in i_B^{-1}(F) \Rightarrow U(b) \in F

or that i_B(b) \in F \Rightarrow U(b) \in F. But i_B(b) \in P^{op}QB (as a subset of QB) is U(b)! \Box

As a result, U(b) and U(\neg b) are open sets containing the given points u, v. They are disjoint since in fact U(\neg b) = \neg U(b) (indeed, because i_B preserves negation). This gives Hausdorffness, and also that the U(b) are clopen (closed and open).

We actually get a lot more:

Proposition 5: The collection \{U(b): b \in B\} is a basis for the topology \mbox{top}(c) on QB.

Proof: The sets U(b) form a basis for some topology \tau, because U(b) \wedge U(c) = U(b \wedge c) (indeed, i_B preserves meets). By the previous proposition, \tau \subseteq \mbox{top}(c). So the identity on QB gives a continuous comparison map

QB_{\mbox{top}(c)} \to QB_{\tau}

between the two topologies. But a continuous bijection from a compact space to a Hausdorff space is necessarily a homeomorphism, so \tau = \mbox{top}(c). \Box

  • Remark: In particular, the canonical topology on \beta X = QP^{op}X is compact Hausdorff; this space is called the Stone-Cech compactification of (the discrete space) X. The methods exploited in this lecture can be used to show that in fact \beta X is the free compact Hausdorff space generated from the set X, meaning that the functor \beta is left adjoint to the underlying-set functor U: \mbox{CompHaus} \to Set. In fact, one can go rather further in this vein: a fascinating result (first proved by Eduardo Manes in his PhD thesis) is that the concept of compact Hausdorff space is algebraic (is monadic with respect to the monad \beta): there is a equationally defined theory where the class of J-ary operations (for each cardinal J) is coded by the set of ultrafilters \beta J, and whose models are precisely compact Hausdorff spaces. This goes beyond the scope of these lectures, but for the theory of monads, see the entertaining YouTube lectures by the Catsters!

In our last post on category theory, we continued our exploration of universal properties, showing how they can be used to motivate the concept of natural transformation, the “right” notion of morphism \phi: F \to G between functors F, G. In today’s post, I want to turn things around, applying the notion of natural transformation to explain generally what we mean by a universal construction. The key concept is the notion of representability, at the center of a circle of ideas which includes the Yoneda lemma, adjoint functors, monads, and other things — it won’t be possible to talk about all these things in detail (because I really want to return to Stone duality before long), but perhaps these notes will provide a key of entry into more thorough treatments.

Even for a fanatic like myself, it’s a little hard to see what would drive anyone to study category theory except a pretty serious “need to know” (there is a beauty and conceptual economy to categorical thinking, but I’m not sure that’s compelling enough motivation!). I myself began learning category theory on my own as an undergraduate; at the time I had only the vaguest glimmerings of a vast underlying unity to mathematics, but it was only after discovering the existence of category theory by accident (reading the introductory chapter of Spanier’s Algebraic Topology) that I began to suspect it held the answer to a lot of questions I had. So I got pretty fired-up about it then, and started to read Mac Lane’s Categories for the Working Mathematician. I think that even today this book remains the best serious introduction to the subject — for those who need to know! But category theory should be learned from many sources and in terms of its many applications. Happily, there are now quite a few resources on the Web and a number of blogs which discuss category theory (such as The Unapologetic Mathematician) at the entry level, with widely differing applications in mind. An embarrassment of riches!

Anyway, to return to today’s topic. Way back when, when we were first discussing posets, most of our examples of posets were of a “concrete” nature: sets of subsets of various types, ordered by inclusion. In fact, we went a little further and observed that any poset could be represented as a concrete poset, by means of a “Dedekind embedding” (bearing a familial resemblance to Cayley’s lemma, which says that any group can be represented concretely, as a group of permutations). Such concrete representation theorems are extremely important in mathematics; in fact, this whole series is a trope on the Stone representation theorem, that every Boolean algebra is an algebra of sets! With that, I want to discuss a representation theorem for categories, where every (small) category can be explicitly embedded in a concrete category of “structured sets” (properly interpreted). This is the famous Yoneda embedding.

This requires some preface. First, we need the following fundamental construction: for every category C there is an opposite category C^{op}, having the same classes O, M of objects and morphisms as C, but with domain and codomain switched (\mbox{dom}^{op} := \mbox{cod}: M \to O, and \mbox{cod}^{op} := \mbox{dom}: M \to O). The function O \to M: A \mapsto 1_A is the same in both cases, but we see that the class of composable pairs of morphisms is modified:

(f, g) \in C^{op}_2 [is a composable pair in C^{op}] if and only if (g, f) \in C_2

and accordingly, we define composition of morphisms in C^{op} in the order opposite to composition in C:

(g \circ f)^{op} := f \circ g in C.

Observation: The categorical axioms are satisfied in the structure C^{op} if and only if they are in C; also, (C^{op})^{op} = C.

This observation is the underpinning of a Principle of Duality in the theory of categories (extending the principle of duality in the theory of posets). As the construction of opposite categories suggests, the dual of a sentence expressed in the first-order language of category theory is obtained by reversing the directions of all arrows and the order of composition of morphisms, but otherwise keeping the logical structure the same. Let me give a quick example:

Definition: Let X_1, X_2 be objects in a category C. A coproduct of X_1 and X_2 consists of an object X and maps i_1: X_1 \to X, i_2: X_2 \to X (called injection or coprojection maps), satisfying the universal property that given an object Y and maps f_1: X_1 \to Y, f_2: X_2 \to Y, there exists a unique map f: X \to Y such that f_1 = f \circ i_1 and f_2 = f \circ i_2. \Box

This notion is dual to the notion of product. (Often, one indicates the dual notion by appending the prefix “co” — except of course if the “co” prefix is already there; then one removes it.) In the category of sets, the coproduct of two sets X_1, X_2 may be taken to be their disjoint union X_1 + X_2, where the injections i_1, i_2 are the inclusion maps of X_1, X_2 into X_1 + X_2 (exercise).

Exercise: Formulate the notion of coequalizer (by dualizing the notion of equalizer). Describe the coequalizer of two functions \displaystyle f, g: X \stackrel{\to}{\to} Y (in the category of sets) in terms of equivalence classes. Then formulate the notion dual to that of monomorphism (called an epimorphism), and by a process of dualization, show that in any category, coequalizers are epic.

Principle of duality: If a sentence expressed in the first-order theory of categories is provable in the theory, then so is the dual sentence. Proof (sketch): A proof P of a sentence proceeds from the axioms of category theory by applying rules of inference. The dualization of P proves the dual sentence by applying the same rules of inference but starting from the duals of the categorical axioms. A formal proof of the Observation above shows that collectively, the set of categorical axioms is self-dual, so we are done. \Box

Next, we introduce the all-important hom-functors. We suppose that C is a locally small category, meaning that the class of morphisms g: c \to d between any two given objects c, d is small, i.e., is a set as opposed to a proper class. Even for large categories, this condition is just about always satisfied in mathematical practice (although there is the occasional baroque counterexample, like the category of quasitopological spaces).

Let Set denote the category of sets and functions. Then, there is a functor

\hom_C: C^{op} \times C \to Set

which, at the level of objects, takes a pair of objects (c, d) to the set \hom(c, d) of morphisms g: c \to d (in C) between them. It takes a morphism (f: c \to c', h: d \to d') of C^{op} \times C (that is to say, a pair of morphisms (f: c' \to c, h: d \to d') of C) to the function

\hom_C(f, h): \hom(c, d) \to \hom(c', d'): g \mapsto hgf.

Using the associativity and identity axioms in C, it is not hard to check that this indeed defines a functor \hom_C: C^{op} \times C \to Set. It generalizes the truth-valued pairing P^{op} \times P \to \mathbf{2} we defined earlier for posets.

Now assume C is small. From last time, there is a bijection between functors

\displaystyle \frac{h: C^{op} \times C \to Set}{f: C \to Set^{C^{op}}}

and by applying this bijection to \hom_C: C^{op} \times C \to Set, we get a functor

y_C: C \to Set^{C^{op}}.

This is the famous Yoneda embedding of the category C. It takes an object c to the hom-functor \hom_C(-, c): C^{op} \to Set. This hom-functor can be thought of as a structured, disciplined way of considering the totality of morphisms mapping into the object c, and has much to do with the Yoneda Principle we stated informally last time (and which we state precisely below).

  • Remark: We don’t need C to be small to talk about \hom_C(-, c); local smallness will do. The only place we ask that C be small is when we are considering the totality of all functors C^{op} \to Set, as forming a category \displaystyle Set^{C^{op}}.

Definition: A functor F: C^{op} \to Set is representable (with representing object c) if there is a natural isomorphism \displaystyle \hom_C(-, c) \stackrel{\sim}{\to} F of functors.

The concept of representability is key to discussing what is meant by a universal construction in general. To clarify its role, let’s go back to one of our standard examples.

Let c_1, c_2 be objects in a category C, and let F: C^{op} \to Set be the functor \hom(-, c_1) \times \hom(-, c_2); that is, the functor which takes an object b of C to the set \hom(b, c_1) \times \hom(b, c_2). Then a representing object for F is a product c_1 \times c_2 in C. Indeed, the isomorphism between sets \hom(b, c_1 \times c_2) \cong \hom(b, c_1) \times \hom(b, c_2) simply recapitulates that we have a bijection

\displaystyle \frac{b \to c_1 \times c_2}{b \to c_1 \qquad b \to c_2}

between morphisms into the product and pairs of morphisms. But wait, not just an isomorphism: we said a natural isomorphism (between functors in the argument b) — how does naturality figure in?

Enter stage left the celebrated

Yoneda Lemma: Given a functor F: C^{op} \to Set and an object c of C, natural transformations \phi: \hom(-, c) \to F are in (natural!) bijection with elements \xi \in F(c).

Proof: We apply the “Yoneda trick” introduced last time: probe the representing object c with the identity morphism, and see where \phi takes it: put \xi = \phi_c(1_c). Incredibly, this single element \xi determines the rest of the transformation \phi: by chasing the element 1_c \in \hom(c, c) around the diagram

                phi_c
      hom(c, c) -----> Fc
          |            |
hom(f, c) |            | Ff
          V            V
      hom(b, c) -----> Fb
                phi_b

(which commutes by naturality of \phi), we see for any morphism f: b \to c in \hom(b, c) that \phi_b(f) = F(f)(\xi). That the bijection

\displaystyle \frac{\xi: 1 \to F(c)}{\phi: \hom(-, c) \to F}

is natural in the arguments F, c we leave as an exercise. \Box

Returning to our example of the product c_1 \times c_2 as representing object, the Yoneda lemma implies that the natural bijection

\displaystyle \phi_b: \hom(b, c_1 \times c_2) \cong \hom(b, c_1) \times \hom(b, c_2)

is induced by the element \xi = \phi_{c_1 \times c_2}(1_{c_1 \times c_2}), and this element is none other than the pair of projection maps

\xi = (\pi_1: c_1 \times c_2 \to c_1, \pi_2: c_1 \times c_2 \to c_2).

In summary, the Yoneda lemma guarantees that a hom-representation \phi: \hom(-, c) \cong F of a functor is, by the naturality assumption, induced in a uniform way from a single “universal” element \xi \in F(c). All universal constructions fall within this general pattern.

Example: Let C be a category with products, and let c, d be objects. Then a representing object for the functor \hom(- \times c, d): C^{op} \to Set is an exponential d^c; the universal element \xi \in \hom(d^c \times c, d) is the evaluation map d^c \times c \to d.

Exercise: Let \displaystyle f, g: x \stackrel{\to}{\to} y be a pair of parallel arrows in a category C. Describe a functor F: C^{op} \to Set which is represented by an equalizer of this pair (assuming one exists).

Exercise: Dualize the Yoneda lemma by considering hom-functors \hom_C(c, -): C \to Set. Express the universal property of the coproduct in terms of representability by such hom-functors.

The Yoneda lemma has a useful corollary: for any (locally small) category C, there is a natural isomorphism

\displaystyle \frac{\hom_C(-, a) \to \hom_C(-, b)}{a \to b}

between natural transformations between hom-functors and morphisms in C. Using C(a, b) as alternate notation for the hom-set, the action of the Yoneda embedding functor y_C on morphisms gives an isomorphism between hom-sets

\displaystyle C(a, b) \stackrel{\sim}{\to} Set^{C^{op}}(y_C a, y_C b);

the functor y_C is said in that case to be fully faithful (faithful means this action on morphisms is injective for all a, b, and full means the action is surjective for all a, b). The Yoneda embedding y_C thus maps C isomorphically onto the category of hom-functors y_C a = \hom_C(-, a) valued in the category Set.

It is illuminating to work out the meaning of this last statement in special cases. When the category C is a group G (that is, a category with exactly one object \bullet in which every morphism is invertible), then functors F: G^{op} \to Set are tantamount to sets X equipped with a group homomorphism G^{op} \to \hom(X, X), i.e., a left action of G^{op}, or a right action of G. In particular, \hom(-, \bullet): G^{op} \to Set is the underlying set of G, equipped with the canonical right action \rho: G \to \hom(G, G), where \rho(g)(h) = hg. Moreover, natural transformations between functors G^{op} \to Set are tantamount to morphisms of right G-sets. Now, the Yoneda embedding

y_G: G \to Set^{G^{op}}

identifies any abstract group G with a concrete group y_G(G), i.e., with a group of permutations — namely, exactly those permutations on G which respect the right action of G on itself. This is the sophisticated version of Cayley’s theorem in group theory. If on the other hand we take C to be a poset, then the Yoneda embedding is tantamount to the Dedekind embedding we discussed in the first lecture.

Tying up a loose thread, let us now formulate the “Yoneda principle” precisely. Informally, it says that an object is determined up to isomorphism by the morphisms mapping into it. Using the hom-functor \hom(-, c) to collate the morphisms mapping into c, the precise form of the Yoneda principle says that an isomorphism between representables \hom(-, c) \to \hom(-, d) corresponds to a unique isomorphism c \to d between objects. This follows easily from the Yoneda lemma.

But far and away, the most profound manifestation of representability is in the notion of an adjoint pair of functors. “Free constructions” give a particularly ubiquitous class of examples; the basic idea will be explained in terms of free groups, but the categorical formulation applies quite generally (e.g., to free monoids, free Boolean algebras, free rings = polynomial algebras, etc., etc.).

If X is a set, the free group (q.v.) generated by X is, informally, the group FX whose elements are finite “words” built from “literals” a, b, c, \ldots which are the elements of X and their formal inverses, where we identify a word with any other gotten by introducing or deleting appearances of consecutive literals a a^{-1} or a^{-1}a. Janis Joplin said it best:

Freedom’s just another word for nothin’ left to lose…

— there are no relations between the generators of FX beyond the bare minimum required by the group axioms.

Categorically, the free group FX is defined by a universal property; loosely speaking, for any group G, there is a natural bijection between group homomorphisms and functions

\displaystyle \frac{FX \to G}{X \to UG}

where UG denotes the underlying set of the group. That is, we are free to assign elements of G to elements of X any way we like: any function f: X \to UG extends uniquely to a group homomorphism \hat{f}: FX \to G, sending a word x_1 x_2 \ldots x_n in FX to the element f(x_1)f(x_2) \ldots f(x_n) in G.

Using the usual Yoneda trick, or the dual of the Yoneda trick, this isomorphism is induced by a universal function i: X \to UFX, gotten by applying the bijection above to the identity map id: FX \to FX. Concretely, this function takes an element x \in X to the one-letter word x \in UFX in the underlying set of the free group. The universal property states that the bijection above is effected by composing with this universal map:

\displaystyle \hom_{Grp}(FX, G) \to \hom_{Set}(UFX, UG) \stackrel{\hom(i, UG)}{\to} \hom_{Set}(X, UG)

where the first arrow refers to the action of the underlying-set or forgetful functor U: Grp \to Set, mapping the category of groups to the category of sets (U “forgets” the fact that homomorphisms f: G \to H preserve group structure, and just thinks of them as functions Uf: UG \to UH).

  • Remark: Some people might say this a little less formally: that the original function f: X \to G is retrieved from the extension homomorphism \hat{f}: FX \to G by composing with the canonical injection of the generators X \to FX. The reason we don’t say this is that there’s a confusion of categories here: properly speaking, FX \to G belongs to the category of groups, and X \to G to the category of sets. The underlying-set functor U: Grp \to Set is a device we apply to eliminate the confusion.

In different words, the universal property of free groups says that the functor \hom_{Set}(X, U-): Grp \to Set, i.e., the underlying functor U: Grp \to Set followed by the hom-functor \hom(X, -): Set \to Set, is representable by the free group FX: there is a natural isomorphism of functors from groups to sets:

Grp(FX, -) \stackrel{\sim}{\to} Set(X, U-).

Now, the free group FX can be constructed for any set X. Moreover, the construction is functorial: defines a functor F: Set \to Grp. This is actually a good exercise in working with universal properties. In outline: given a function f: X \to Y, the homomorphism Ff: FX \to FY is the one which corresponds bijectively to the function

\displaystyle X \stackrel{f}{\to} Y \stackrel{i_Y}{\to} UFY,

i.e., Ff is defined to be the unique map h such that Uh \circ i_X = i_Y \circ f.

Proposition: F: Set \to Grp is functorial (i.e., preserves morphism identities and morphism composition).

Proof: Suppose f: X \to Y, g: Y \to Z is a composable pair of morphisms in Set. By universality, there is a unique map h: FX \to FZ, namely F(g \circ f), such that Uh \circ i_X = i_Z \circ (g \circ f). But Fg \circ Ff also has this property, since

U(Fg \circ Ff) \circ i_X = UFg \circ UFf \circ i_X = UFg \circ i_Y \circ f = i_Z \circ g \circ f

(where we used functoriality of U in the first equation). Hence F(g \circ f) = Fg \circ Ff. Another universality argument shows that F preserves identities. \Box

Observe that the functor F is rigged so that for all morphisms f: X \to Y,

UFf \circ i_X = i_Y \circ f.

That is to say, that there is only one way of defining F so that the universal map i_X is (the component at X of) a natural transformation 1_{Set} \to UF!

The underlying-set and free functors U: Grp \to Set, F: Set \to Grp are called adjoints, generalizing the notion of adjoint in truth-valued matrix algebra: we have an isomorphism

\hom_{Grp}(FX, Y) \cong \hom_{Set}(X, UY)

natural in both arguments X, Y. We say that F is left adjoint to U, or dually, that U is right adjoint to F, and write F \dashv U. The transformation i: 1 \to UF is called the unit of the adjunction.

Exercise: Define the construction dual to the unit, called the counit, as a transformation \varepsilon: FU \to 1. Describe this concretely in the case of the free-underlying adjunction F \dashv U between sets and groups.

What makes the concept of adjoint functors so compelling is that it combines representability with duality: the manifest symmetry of an adjunction \hom(FX, Y) \cong \hom(X, GY) means that we can equally well think of GY as representing \hom(F-, Y) as we can FX as representing \hom(X, G-). Time is up for today, but we’ll be seeing more of adjunctions next time, when we resume our study of Stone duality.

[Tip of the hat to Robert Dawson for the Janis Joplin quip.]

I wish to bring the attention of our readers to the 36^{th} Carnival of Mathematics hosted by Charles at Rigorous Trivialities. I guess most of you already know about it. Among other articles/posts, one of Todd’s recent post Basic Category Theory I is part of the carnival. He followed it up with another post titled Basic Category Theory II. There will be a third post on the same topic some time soon. This sub-series of posts on basic category theory, if you recall, is part of the larger series on Stone Duality, which all began with Toward Stone Duality: Posets and Meets. Hope you enjoy the Carnival!

I will write a series of posts as a way of gently introducing category theory to the ‘beginner’, though I will assume that the beginner will have some level of mathematical maturity. This series will be based on the the book, Conceptual Mathematics: A first introduction to categories by Lawvere and Schanuel. So, this won’t go into most of the deeper stuff that’s covered in, say, Categories for the Working Mathematician by Mac Lane. We shall deal only with sets (as our objects) and functions (as our morphisms). This means we deal only with the Category of Sets! Therefore, the reader is not expected to know about advanced stuff like groups and/or group homomorphisms, vector spaces and/or linear transformations, etc. Also, no knowledge of college level calculus is required. Only knowledge of sets and functions, some familiarity in dealing with mathematical symbols and some knowledge of elementary combinatorics are required. That’s all!

Sets, maps and composition

An object (in this category) is a finite set or collection.

A map f: A \to B (in this category) consists of the following:

i) a set A called the domain of the map;

ii) a set B called the codomain of the map; and

iii) a rule assigning to each a \in A, an element b \in B.

We also use ‘function’, ‘transformation’, ‘operator’, ‘arrow’ and ‘morphism’ for ‘map’ depending on the context, as we shall see later.

An endomap f: A \to A is a map that has the same object as domain and codomain, which in this case is A.

An endomap in which f(a) = a for every a \in A is called an identity map, also denoted by 1_A.

Composition of maps is a process by which two maps are combined to yield a third map. Composition of maps is really at the heart of category theory, and this will be evident in plenty in the later posts. So, if we have two maps f: X \to Y and g:Y \to Z, then g \circ f: X \to Z is the third map obtained by composing f and g. Note that g \circ f is read ‘g following f‘.

Guess what? Those are all the ingredients we need to define our category of sets! Though we shall deal only with sets and functions, the following definition of a category of sets is actually pretty much the same as the general definition of a category.

Definition: A CATEGORY consists of the following:

(1) OBJECTS: these are usually denoted by A, B, C, \ldots etc.

(2) MAPS: these are usually denoted by f, g, h, \ldots etc.

(3) For each map f, one object as DOMAIN of f and one object as CODOMAIN of f. So, f: A \to B has domain A and codomain B. This is also read as ‘f is a map from A to B‘.

(4) For each object A, there exists an IDENTITY MAP, 1_A. This is also written as 1_A: A \to A.

(5) For each pair of maps f: A \to B and g: B \to C, there exists a COMPOSITE map, g \circ f: A \to C. ( ‘g following f‘.)

such that the following RULES are satisfied:

(i) (IDENTITY LAWS): If f: A \to B, then we have 1_B \circ f = f and f \circ 1_A = f.

(ii) (ASSOCIATIVE LAW): If f: A \to B , \, g: B \to C and h: C \to D, then we have (h \circ g)\circ f = h \circ (g \circ f). ( ‘h following g following f‘) Note that in this case we are allowed to write h \circ g \circ f without any ambiguity.

Exercises: Suppose A = \{ a, b, c, d \} and B = \{ 1, 2, 3 \}.

(1) How many maps f are there from A to B?

(2) How many maps f are there from A to A?

(3) How many maps f are there from B to A?

(4) How many maps f are there from B to B?

(5) How many maps f are there from A to A satisfying f \circ f = f?

(This is a non-trivial exercise for the general case in which |A| = n for some positive integer n.)

(6) How many maps g are there from B to B satisfying g \circ g = g?

(7) Can you find a pair of maps f: A \to B and g: B \to A such that g \circ f = 1_A. If yes, how many pairs can you find?

(8 ) Can you find a pair of maps h: B \to A and k: A \to B such that k \circ h = 1_B. If yes, how many pairs can you find?

Bonus exercise:

How many maps f: A \to B are there if A is the empty set and |B| = n for some n \in \mathbb{N}? What if |A| = n and B is the empty set? What if both A and B are empty?

Our other blog

Visitors to this blog

Blog Stats

  • 371,445 hits

Wikio Ranking

Wikio - Top Blogs - Sciences

Current Online Readers

Archives

March 2023
M T W T F S S
 12345
6789101112
13141516171819
20212223242526
2728293031