
Who doesn’t like self-referential paradoxes? There is something about them that appeals to all and sundry. There is also a certain air of mystery associated with them, but when people talk about such paradoxes indiscriminately in a non-technical fashion, especially in connection with Gödel’s incompleteness theorem, it quite often gets annoying!

Lawvere, in ‘Diagonal Arguments and Cartesian Closed Categories’, sought, among other things, to demystify the incompleteness theorem. To pique your interest: in a self-commentary on the above paper, he actually has quite a few harsh words, in a manner of speaking.

“The original aim of this article was to demystify the incompleteness theorem of Gödel and the truth-definition theory of Tarski by showing that both are consequences of some very simple algebra in the cartesian-closed setting. It was always hard for many to comprehend how Cantor’s mathematical theorem could be re-christened as a “paradox” by Russell and how Gödel’s theorem could be so often declared to be the most significant result of the 20th century. There was always the suspicion among scientists that such extra-mathematical publicity movements concealed an agenda for re-establishing belief as a substitute for science.”

In the aforesaid paper, Lawvere of course uses the language of category theory – the setting is that of cartesian closed categories – and the technical presentation can therefore easily get out of reach of most readers, myself included. Thankfully, Noson S. Yanofsky has written a nice paper, ‘A Universal Approach to Self-Referential Paradoxes, Incompleteness and Fixed Points’, that is a lot more accessible and fun to read as well. Yanofsky employs only the notions of sets and functions, thereby avoiding the language of category theory, to bring out and make accessible as much of the content of Lawvere’s paper as possible. Cantor’s theorem, Russell’s paradox, the non-definability of satisfiability, Tarski’s non-definability of truth and Gödel’s (first) incompleteness theorem are all shown to be paradoxical phenomena that merely result from the existence of a cartesian closed category satisfying certain conditions. The idea is to use a single formalism to describe all these diverse phenomena.
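To get a feel for the diagonal mechanism that Lawvere isolates, here is a minimal Python sketch of its simplest instance, Cantor’s theorem for finite sets: no function $f: X \to P(X)$ is surjective, because the diagonal subset $\{x \in X: x \notin f(x)\}$ is always missed. (The names and the brute-force enumeration are mine, purely for illustration.)

```python
import itertools
from itertools import combinations

def powerset(xs):
    """All subsets of xs, as frozensets."""
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def diagonal_witness(X, f):
    """The diagonal subset D = {x in X : x not in f(x)}.
    If some x0 had f(x0) == D, then x0 in D iff x0 not in D;
    so no f: X -> P(X) can ever hit D."""
    return frozenset(x for x in X if x not in f(x))

X = [0, 1, 2]
PX = powerset(X)
# Brute force: every one of the 8^3 functions f: X -> P(X) misses D.
for images in itertools.product(PX, repeat=len(X)):
    f = dict(zip(X, images))
    assert diagonal_witness(X, f.get) not in images
```

The same one-line diagonal trick, transplanted into other cartesian closed settings, is what yields Russell, Tarski and Gödel in Yanofsky’s account.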

(Dang, I just found that John Baez had already blogged on this before, way back in 2006!)

After a long hiatus, I’d like to renew the discussion of axiomatic categorical set theory, more specifically the Elementary Theory of the Category of Sets (ETCS). Last time I blogged about this, I made some initial forays into “internalizing logic” in ETCS, and described in broad brushstrokes how to use that internal logic to derive a certain amount of the structure one associates with a category of sets. Today I’d like to begin applying some of the results obtained there to the problem of constructing colimits in a category satisfying the ETCS axioms (an ETCS category, for short).

(If you’re just joining us now, and you already know some of the jargon, an ETCS category is a well-pointed topos that satisfies the axiom of choice and has a natural numbers object. We are trying to build up some of the elementary theory of such categories from scratch, with a view toward foundations of mathematics.)

But let’s see — where were we? Since it’s been a while, I was tempted to review the philosophy behind this undertaking (why would one go to all the trouble of setting up a categories-based alternative to ZFC, when time-tested ZFC is able to express virtually all of present-day mathematics on the basis of a reasonably short list of axioms?). But in the interest of time and space, I’ll confine myself to a few remarks.

As we said, a chief difference between ZFC and ETCS resides in how ETCS treats the issue of membership. In ZFC, membership is a global binary relation: we can take any two “sets” $A, B$ and ask whether $A \in B$. Whereas in ETCS, membership is a relation between entities of different sorts: we have “sets” on one side and “elements” on another, and the two are not mixed (e.g., elements are not themselves considered sets).

Further, and far more radical: in ETCS the membership relation $x \in A$ is a function, that is, an element $x$ “belongs” to only one set $A$ at a time. We can think of this as “declaring” how we are thinking of an element, that is, declaring which set (or which type) an element is being considered as belonging to. (In the jargon, ETCS is a typed theory.) This reflects a general and useful philosophic principle: that elements in isolation are considered inessential, that what counts are the aggregates or contexts in which elements are organized and interrelated. For instance, the numeral ‘2’ in isolation has no meaning; what counts is the context in which we think of it (qua rational number or qua complex number, etc.). Similarly the set of real numbers has no real sense in isolation; what counts is which category we view it in.

I believe it is reasonable to grant this principle a foundational status, but: rigorous adherence to this principle completely changes the face of what set theory looks like. If elements “belong” to only one set at a time, how then do we even define such basic concepts as subsets and intersections? These are some of the issues we discussed last time.

There are other significant differences between ZFC and ETCS: stylistically, or in terms of presentation, ZFC is more “top-down” and ETCS is more “bottom-up”. For example, in ZFC, one can pretty much define a subset $\{x \in X: P\}$ by writing down a first-order formula $P$ in the language; the comprehension (or separation) axiom scheme is a mighty sledgehammer that takes care of the rest. In the axioms of ETCS, there is no such sledgehammer: the closest thing one has to a comprehension scheme in the ETCS axioms is the power set axiom (a single axiom, not an axiom scheme). However, in the formal development of ETCS, one derives a comprehension scheme as one manually constructs the internal logic, in stages, using the simple tools of adjunctions and universal properties. We started doing some of that in our last post. So: with ZFC it’s more as if you can just hop in the car and go; with ETCS you build the car engine from smaller parts with your bare hands, but in the process you become an expert mechanic, and are not so rigidly attached to a particular make and model (e.g., much of the theory is built just on the axioms of a topos, which allows a lot more semantic leeway than one has with ZF).

But, in all fairness, that is perhaps the biggest obstacle to learning ETCS: at the outset, the tools available [mainly, the idea of a universal property] are quite simple but parsimonious, and one has to learn how to build some set-theoretic and logical concepts normally taken as “obvious” from the ground up. (Talk about “foundations”!) On the plus side, by building big logical machines from scratch, one gains a great deal of insight into the inner workings of logic, with a corresponding gain in precision and control and modularity when one would like to use these developments to design, say, automated deduction systems (where there tend to be strong advantages to using type-theoretic frameworks).

Enough philosophy for now; readers may refer to my earlier posts for more. Let’s get to work, shall we? Our last post was about the structure of (and relationships between) posets of subobjects $Sub(X)$ relative to objects $X$, and now we want to exploit the results there to build some absolute constructions, in particular finite coproducts and coequalizers. In this post we will focus on coproducts.

Note to the experts: Most textbook treatments of the formal development of topos theory (as for example Mac Lane-Moerdijk) are efficient but highly technical, involving for instance the slice theorem for toposes and, in the construction of colimits, recourse to Beck’s theorem in monad theory applied to the double power-set monad [following the elegant construction of Paré]. The very abstract nature of this style of argumentation (which in the application of Beck’s theorem expresses ideas of fourth-order set theory and higher) is no doubt partly responsible for the somewhat fearsome reputation of topos theory.

In these notes I take a much less efficient but much more elementary approach, based on an arrangement of ideas which I hope can be seen as “natural” from the point of view of naive set theory. I learned of this approach from Myles Tierney, who was my PhD supervisor, and who with Bill Lawvere co-founded elementary topos theory, but I am not aware of any place where the details of this approach have been written up before now. I should also mention that the approach taken here is not as “purist” as many topos theorists might want; for example, here and there I take advantage of the strong extensionality axiom of ETCS to simplify some arguments.

### The Empty Set and Two-Valued Logic

We begin with the easy observation that a terminal category, i.e., a category $\mathbf{1}$ with just one object and one morphism (the identity), satisfies all the ETCS axioms. Ditto for any category $C$ equivalent to $\mathbf{1}$ (where every object is terminal). Such boring ETCS categories are called degenerate; obviously our interest is in the structure of nondegenerate ETCS categories.

Let $\mathbf{E}$ be an ETCS category (see here for the ETCS axioms). Objects of $\mathbf{E}$ are generally called “sets”, and morphisms are generally called “functions” or “maps”.

Proposition 0: If an ETCS category $\mathbf{E}$ is a preorder, then $\mathbf{E}$ is degenerate.

Proof: Recall that a preorder is a category in which there is at most one morphism $A \to B$ for any two objects $A, B$. Every morphism in a preorder is vacuously monic. If there is a nonterminal set $A$, then the monic $A \to 1$ to any terminal set defines a subset $A \subseteq 1$ distinct from the subset defined by $1 \to 1$, thus giving (in an ETCS category) distinct classifying maps $\chi_A, t: 1 \to P(1)$, contradicting the preorder assumption. Therefore all objects $A$ are terminal. $\Box$

Assume from now on that $\mathbf{E}$ is a nondegenerate ETCS category.

Proposition 1: There are at least two truth values, i.e., two elements $1 \to P(1)$, in $\mathbf{E}$.

Proof: By proposition 0, there exist sets $X, Y$ and two distinct functions $f, g: X \to Y$. By the axiom of strong extensionality, there exists $x: 1 \to X$ such that $fx \neq gx$. The equalizer $E \to 1$ of the pair $fx, gx: 1 \to Y$ is then a proper subset of $1$, and therefore there are at least two distinct elements $\chi_E, \chi_1: 1 \to P(1)$. $\Box$

Proposition 2: There are at most two truth values $1 \to P(1)$; equivalently, there are at most two subsets of $1$.

Proof: If $U, V \subseteq 1$ are distinct subsets of $1$, then either $U \neq U \cap V$ or $V \neq U \cap V$, say the former. Then $1_U: U \subseteq U$ and $U \cap V \subseteq U$ are distinct subsets, with distinct classifying maps $\chi_{1_U}, \chi_{U \cap V}: U \to P(1)$. By strong extensionality, there exists $x: 1 \to U$ distinguishing these classifying maps. Because $1$ is terminal, we then infer $1 \subseteq U$ and $U \subseteq 1$, so $U = 1$ as subsets of $1$, and in that case only $V$ can be a proper subset of $1$. $\Box$

By propositions 1 and 2, there is a unique proper subset of the terminal object $1$. Let $0 \to 1$ denote this subset. Its domain may be called an “empty set”; by the preceding proposition, it has no proper subsets. The classifying map $1 \to P1$ of $0 \subseteq 1$ is the truth value we call “false”.
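In the category of finite sets this is all very down-to-earth: subsets of $X$ correspond exactly to characteristic functions $X \to \{$false, true$\}$, with the two-element set playing the role of $P1$. A quick Python sketch of this correspondence (the names `chi` and `extension` are mine, not standard):

```python
from itertools import product

def chi(A, X):
    """Characteristic map of a subset A of X, as a function X -> {False, True}."""
    return {x: (x in A) for x in X}

def extension(c):
    """The subset classified by a map c: X -> {False, True}."""
    return frozenset(x for x, v in c.items() if v)

X = ['a', 'b', 'c']
# Maps X -> {False, True} correspond exactly to subsets of X:
for bits in product([False, True], repeat=len(X)):
    c = dict(zip(X, bits))
    assert chi(extension(c), X) == c
```

The two truth values of propositions 1 and 2 are then the two elements of $P1$: “true” classifies $1 \subseteq 1$ and “false” classifies $0 \subseteq 1$.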

Proposition 3: 0 is an initial object, i.e., for any $X$ there exists a unique function $0 \to X$.

Proof: Uniqueness: if $f, g: 0 \to X$ are maps, then their equalizer $x: E \to 0$, which is monic, must be an isomorphism since 0 has no proper subsets. Therefore $f = g$. Existence: there are monos

$\displaystyle 0 \to 1 \stackrel{t_X}{\to} PX$

$\displaystyle X \stackrel{\sigma}{\to} PX$

where $t_X$ is “global truth” (classifying the subset $X \subseteq X$) on $X$ and $\sigma$ is the “singleton mapping $x \mapsto \{x\}$” on $X$, defined as the classifying map of the diagonal map $\delta: X \subseteq X \times X$ (last time we saw $\sigma$ is monic). Take their pullback. The component of the pullback parallel to $\sigma$ is a mono $P \to 0$ which again is an isomorphism, whence we get a map $0 \cong P \to X$ using the other component of the pullback. $\Box$

Remark: For the “purists”, an alternative construction of the initial set 0 that avoids use of the strong extensionality axiom is to define the subset $0 \subseteq 1$ to be “the intersection of all subsets of $1$”. Formally, one takes the extension $\left[\phi\right] \subseteq 1$ of the map

$\displaystyle \phi = (1 \stackrel{t_{P1}}{\to} PP1 \stackrel{\bigcap}{\to} P1);$

where the first arrow represents the class of all subsets of $1$ (the top element of $PP1$, classifying the full subset $P1 \subseteq P1$), and the second is the internal intersection operator defined at the end of our last post. Using formal properties of intersection developed later, this intersection $0 \subseteq 1$ has no proper subsets, and then the proof of proposition 3 carries over verbatim. $\Box$

Corollary 1: For any $X$, the set $0 \times X$ is initial.

Proof: By cartesian closure, maps $0 \times X \to Y$ are in bijection with maps of the form $0 \to Y^X$, and there is exactly one of these since 0 is initial. $\Box$

Corollary 2: If there exists $f: X \to 0$, then $X$ is initial.

Proof: The composite of $\langle f, 1_X \rangle: X \to 0 \times X$ followed by $\pi_2: 0 \times X \to X$ is $1_X$, and $\pi_2$ followed by $\langle f, 1_X \rangle: X \to 0 \times X$ is also an identity since $0 \times X$ is initial by corollary 1. Hence $X$ is isomorphic to an initial object $0 \times X$. $\Box$

By corollary 2, for any object $Y$ the arrow $0 \to Y$ is vacuously monic, hence defines a subset.

Proposition 4: If $X \not\cong 0$, then there exists an element $x: 1 \to X$.

Proof: Under the assumption, $X$ has at least two distinct subsets: $0 \subseteq X$ and $1_X: X \subseteq X$. By strong extensionality, their classifying maps $\chi_0, \chi_1: X \to P1$ are distinguished by some element $x: 1 \to X$. $\Box$

### External Unions and Internal Joins

One of the major goals in this post is to construct finite coproducts in an ETCS category. As in ordinary set theory, we will construct these as disjoint unions. This means we need to discuss unions first; as should be expected by now, in ETCS unions are considered locally, i.e., we take unions of subsets of a given set. So, let $A, B \subseteq X$ be subsets.

To define the union $A \cup B \subseteq X$, the idea is to take the intersection of all subsets containing $A$ and $B$. That is, we apply the internal intersection operator (constructed last time),

$\displaystyle \bigcap: PPX \to PX,$

to the element $1 \to PPX$ that represents the set of all subsets of $X$ containing $A$ and $B$; the resulting element $1 \to PX$ represents $A \cup B$. The element $1 \to PPX$ corresponds to the intersection of two subsets

$\{C \in PX: A \subseteq C\} \cap \{C \in PX: B \subseteq C\} \subseteq PX.$

Remark: Remember that in ETCS we are using generalized elements: $C \in PX$ really means a function $C: U \to PX$ over some domain $U$, which in turn classifies a subset $\left[C\right] \subseteq U \times X$. On the other hand, the $A$ here is a subset $A \subseteq X$. How then do we interpret the condition “$A \subseteq C$“? We first pull back $\chi_A: 1 \to PX$ over to the domain $U$; that is, we form the composite $\displaystyle U \stackrel{!}{\to} 1 \stackrel{\chi_A}{\to} PX$, and consider the condition that this is bounded above by $C: U \to PX$. (We will write $\chi_A \leq C$, thinking of the left side as constant over $U$.) Externally, in terms of subsets, this corresponds to the condition $U \times A \subseteq \left[C\right]$.

We need to construct the subsets $\{C \in PX: A \subseteq C\}, \{C \in PX: B \subseteq C\}$. In ZFC, we could construct those subsets by applying the comprehension axiom scheme, but the axioms of ETCS have no such blanket axiom scheme. (In fact, as we said earlier, much of the work on “internalizing logic” goes to show that in ETCS, we instead derive a comprehension scheme!) However, one way of defining subsets in ETCS is by taking loci of equations; here, we express the condition $A \subseteq C$, more pedantically $A \subseteq \left[C\right]$ or $\chi_A \leq C$, as the equation

$(\chi_A \Rightarrow C) = t_X$

where the right side is the predicate “true over $X$”.

Thus we construct the subset $\{C \in PX: A \subseteq C\}$ of $PX$ via the pullback:

{C: A ≤ C} --------> 1
     |               |
     |               | t_X
     v               v
     PX -----------> PX
          chi_A => -

Let me take a moment to examine what this diagram means exactly. Last time we constructed an internal implication operator

$\Rightarrow: P1 \times P1 \to P1$

and now, in the pullback diagram above, what we are implicitly doing is lifting this to an operator

$\Rightarrow_X: PX \times PX \to PX$

The easy and cheap way of doing this is to remember the isomorphism $PX \cong P1^X$ we used last time to uncover the cartesian closed structure, and apply this to

$\displaystyle P1^X \times P1^X \cong (P1 \times P1)^X \stackrel{\Rightarrow^X}{\to} P1^X$

to define $\Rightarrow_X: PX \times PX \to PX$. This map classifies a certain subset of $X \times PX \times PX$, which I’ll just write down (leaving it as an exercise which involves just chasing the relevant definitions):

$\{(x, T, S) \in X \times PX \times PX: x \in_X T \Rightarrow x \in_X S\}$

Remark: Similarly we can define a meet operator $\wedge_X: PX \times PX \to PX$ by exponentiating the internal meet $P1 \times P1 \to P1$. It is important to know that the general Heyting algebra identities which we established last time for $P1$ lift to the corresponding identities for the operators $\wedge_X, \Rightarrow_X$ on $PX$. Ultimately this rests on the fact that the functor $(-)^X$, being a right adjoint, preserves products, and therefore preserves any algebraic identity which can be expressed as a commutative diagram of operations between such products.

Hence, for the fixed subset $A \subseteq X$ (classified by $\chi_A: 1 \to PX$), the operator

$\chi_A \Rightarrow -: PX \to PX$

classifies the subset

$\{(x, S): x \in_X A \Rightarrow x \in_X S\} \hookrightarrow X \times PX$

Finally, in the pullback diagram above, we are pulling back the operator $\chi_A \Rightarrow -$ against $t_X$. But, from last time, that was exactly the method we used to construct universal quantification. That is, given a subset

$R \subseteq X \times Y$

we defined $\forall_Y R \subseteq X$ to be the pullback of $t_Y: 1 \hookrightarrow PY$ along $\chi_R: X \to PY$. Putting all this together, the pullback diagram above expresses the definition

$\{C \in PX: A \subseteq C\} := \{C \in PX: \forall_{x \in X} \ x \in_X A \Rightarrow x \in_X C\}$

that one would expect “naively”.
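As a sanity check, the “naive” description can be verified directly for finite sets: $A \cup B$ really is the intersection of all subsets $C \subseteq X$ containing both $A$ and $B$. A small Python sketch (names mine):

```python
from itertools import combinations

def subsets(X):
    """All subsets of X, as frozensets."""
    return [frozenset(c) for r in range(len(X) + 1)
            for c in combinations(sorted(X), r)]

def union_via_intersection(A, B, X):
    """A ∪ B the ETCS way: intersect every C ⊆ X with A ⊆ C and B ⊆ C."""
    out = frozenset(X)
    for C in subsets(X):
        if A <= C and B <= C:
            out &= C
    return out

X = frozenset(range(5))
A, B = frozenset({0, 1}), frozenset({1, 3})
assert union_via_intersection(A, B, X) == (A | B)
```

Of course, in ETCS the point is that this recipe has to be carried out with internal quantifiers and implications rather than with a naive membership relation; that is the work of the rest of this section.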

Now that all the relevant constructions are in place, we show that $A \cup B$ is the join of $A$ and $B$ in the poset $Sub(X)$. There is nothing intrinsically difficult about this, but as we are still in the midst of constructing the internal logic, we will have to hunker down and prove some logic things normally taken for granted or zipped through without much thought. For example, the internal intersection operator was defined with the help of internal universal quantification, and we will need to establish some formal properties of that.

Here is a useful general principle for doing internal logic calculations. Let $\chi_R: X \to PY$ be the classifying map of a subset $R \subseteq X \times Y$, and let $f: U \to X$ be a function. Then the composite $\chi_R f: U \to PY$ classifies the subset

$(f \times 1_Y)^{-1}(R) \subseteq U \times Y$

so that one has the general identity $\chi_R f = \chi_{(f \times 1)^{-1}(R)}$. In passing back and forth between the external and internal viewpoints, the general principle is to try to render “complicated” functions $U \to PY$ into a form $\chi_R f$ which one can more easily recognize. For lack of a better term, I’ll call this the “pullback principle”.
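In the category of finite sets, where $\chi_R$ sends $x$ to the fiber $\{y: (x, y) \in R\}$, the pullback principle is easy to check directly. A small Python sketch (names mine):

```python
def classify(R, X, Y):
    """chi_R: X -> P(Y), sending x to the fiber {y : (x, y) in R}."""
    return {x: frozenset(y for y in Y if (x, y) in R) for x in X}

def pullback(f, R, U, Y):
    """(f × 1_Y)^{-1}(R), a relation between U and Y."""
    return frozenset((u, y) for u in U for y in Y if (f[u], y) in R)

X, Y, U = [0, 1], ['p', 'q'], ['u0', 'u1', 'u2']
R = frozenset({(0, 'p'), (1, 'q')})
f = {'u0': 0, 'u1': 1, 'u2': 0}
chi_R = classify(R, X, Y)
lhs = {u: chi_R[f[u]] for u in U}             # chi_R ∘ f
rhs = classify(pullback(f, R, U, Y), U, Y)    # chi of the pulled-back relation
assert lhs == rhs                             # the pullback principle
```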

Lemma 1: Given a relation $R \hookrightarrow X \times Y$ and a constant $c: 1 \to Y$, there is an inclusion

$\forall_Y R \subseteq (1_X \times c)^{-1}(R)$

as subsets of $X$. (In traditional logical syntax, this says that for any element $c: 1 \to Y$,

$\forall_{y: Y} R(x, y)$ implies $R(x, c)$

as predicates over elements $x \in X$. This is the type of thing that ordinarily “goes without saying”, but which we actually have to prove here!)

Proof: As we recalled above, $\forall_Y R \subseteq X$ was defined to be $\chi_R^{-1}(t_Y)$, the pullback of global truth $t_Y: 1 \to PY$ along the classifying map $\chi_R: X \to PY$. Hold that thought.

Let

$Pc: PY \to P1$

be the map which classifies the subset $\{S \in PY: c \in_Y S\}$. Equivalently, this is the map

$P1^c: P1^Y \to P1^1$

under the canonical isomorphisms $PY \cong P1^Y$, $P1^1 \cong P1$. Intuitively, this maps $f \mapsto f(c)$, i.e., plugs an element $c \in Y$ into an element $f \in P1^Y$.

Using the adjunction $(- \times Y) \dashv (-)^Y$ of cartesian closure, the composite

$\displaystyle X \stackrel{\chi_R}{\to} PY \stackrel{Pc}{\to} P1$

transforms to the composite

$X \times 1 \stackrel{1_X \times c}{\to} X \times Y \stackrel{\chi_{R}}{\to} P1$

so by the pullback principle, $(Pc)\chi_R: X \times 1 \to P1$ classifies $(1_X \times c)^{-1}(R) \subseteq X \times 1 \cong X$.

Equivalently,

$(1_X \times c)^{-1}(R) = \chi_{R}^{-1} (Pc)^{-1}(t) \qquad (1)$

Also, as subsets of $PY$, we have the inclusion

$(t_Y: 1 \hookrightarrow PY) \subseteq (Pc)^{-1}(t) \qquad (2)$

[this just says that $t_Y$ belongs to the subset classified by $Pc$, or equivalently that $c: 1 \to Y$ is in the subset $Y \subseteq Y$]. Applying the pullback operation $\chi_{R}^{-1}$ to (2), and comparing to (1), lemma 1 follows. $\Box$
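For finite sets, lemma 1 is just the familiar fact that a universally quantified relation is contained in each of its instantiations. A quick Python check (names mine):

```python
def forall_Y(R, X, Y):
    """The subset ∀_Y R = {x in X : (x, y) in R for every y in Y}."""
    return frozenset(x for x in X if all((x, y) in R for y in Y))

def fiber(R, X, c):
    """(1_X × c)^{-1}(R) = {x in X : (x, c) in R}."""
    return frozenset(x for x in X if (x, c) in R)

X, Y = [0, 1, 2], ['a', 'b']
R = frozenset({(0, 'a'), (0, 'b'), (1, 'a')})
for c in Y:
    assert forall_Y(R, X, Y) <= fiber(R, X, c)   # lemma 1 in finite sets
```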

Lemma 2: If $R \subseteq S$ as subsets of $X \times Y$, then $\forall_Y R \subseteq \forall_Y S$.

Proof: From the last post, we have an adjunction:

$T \times Y \subseteq S$ if and only if $T \subseteq \forall_Y S$

for any subset $T \subseteq X$. So it suffices to show $\forall_Y R \times Y \subseteq S$. But

$\forall_Y R \times Y \subseteq R \subseteq S$

where the first inclusion follows from $\forall_Y R \subseteq \forall_Y R$. $\Box$

Next, recall from the last post that the internal intersection of $F: 1 \to PPX$ was defined by interpreting the formula on the right:

$\displaystyle \bigcap F = \{x \in X: \forall_{S \in PX} \ (S \in_{PX} F) \Rightarrow (x \in_X S)\}$

Lemma 3: If $F \leq G: 1 \to PPX$, then $\displaystyle \bigcap G \leq \bigcap F: 1 \to PX$.

Proof: $F: 1 \to PPX$ classifies the subset $\{S \in PX: S \in_{PX} F\}$, i.e.,  $F$ is identified with the predicate $S \in_{PX} F$ in the argument $S$, so by hypothesis $(S \in_{PX} F) \leq (S \in_{PX} G)$ as predicates on $S$. Internal implication $a \Rightarrow b$ is contravariant in the argument $a$ [see the following remark], so

$\left[(S \in G) \Rightarrow (x \in S)\right] \leq \left[(S \in F) \Rightarrow (x \in S)\right]$

Now apply lemma 2 to complete the proof. $\Box$

Remark: The contravariance of $- \Rightarrow b$, that is, the fact that

$x \leq y$ implies $(y \Rightarrow b) \leq (x \Rightarrow b),$

is a routine exercise using the adjunction [discussed last time]

$a \wedge c \leq b$ if and only if $c \leq (a \Rightarrow b).$

Indeed, we have

$x \wedge (y \Rightarrow b) \leq y \wedge (y \Rightarrow b) \leq b \qquad (*)$

where the first inequality follows from the hypothesis $x \leq y$, and the second follows from $y \Rightarrow b \leq y \Rightarrow b$. By the adjunction, the inequality (*) implies $(y \Rightarrow b) \leq (x \Rightarrow b)$. $\Box$
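On ordinary truth values, where $\leq$ is material implication, this contravariance reduces to a finite Boolean check. A Python sketch (names mine):

```python
from itertools import product

def implies(a, b):
    """Heyting implication on truth values is just Boolean implication."""
    return (not a) or b

# If x ≤ y then (y ⇒ b) ≤ (x ⇒ b), reading ≤ as 'implies':
for x, y, b in product([False, True], repeat=3):
    if implies(x, y):
        assert implies(implies(y, b), implies(x, b))
```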

Theorem 1: For subsets $A, B$ of $X$, the subset $A \cup B$ is an upper bound of $A$ and $B$, i.e., $A, B \subseteq A \cup B$.

Proof: It suffices to prove that $\displaystyle A = \bigcap \{C \in PX: A \subseteq C\}$, since then we need only apply lemma 3 to the trivially true inclusion

$\{C \in PX: A \subseteq C\} \cap \{C \in PX: B \subseteq C\} \subseteq \{C \in PX: A \subseteq C\}$

to infer $A \subseteq A \cup B$, and similarly $B \subseteq A \cup B$. (Actually, we need only show $\displaystyle A \subseteq \bigcap \{C \in PX: A \subseteq C\}$. We’ll do that first, and then show full equality.)

The condition we want,

$A \subseteq \{x \in X: \forall_{S \in PX} (A \subseteq S) \Rightarrow (x \in_X S)\},$

is, by the adjunction $(- \times PX) \dashv \forall_{PX}: Sub(X \times PX) \to Sub(X)$, equivalent to

$A \times PX \subseteq \{(x, S): A \subseteq S \Rightarrow (x \in_X S)\}$

which, by a $\wedge$-$\Rightarrow$ adjunction, is equivalent to

$(A \times PX) \cap (X \times \{S \in PX: A \subseteq S\}) \subseteq \ \in_X \qquad (1)$

as subsets of $X \times PX$. So we just have to prove (1). At this point we recall, from our earlier analysis, that

$\{S \in PX: A \subseteq S\} = \{S \in PX: \forall_{x \in X} x \in_X A \Rightarrow x \in_X S\}$

Using the adjunction $(X \times -) \dashv \forall_X$, as in the proof of lemma 2, we have

$X \times \{S \in PX: \forall_{x \in X} x \in_X A \Rightarrow x \in_X S\}$

$\subseteq \{(x, S): x \in_X A \Rightarrow x \in_X S\} := (A \times PX) \Rightarrow \in_X,$

which shows that the left side of (1) is contained in

$(A \times PX) \cap ((A \times PX) \Rightarrow \in_X) \subseteq \ \in_X,$

where the last inclusion uses another $\wedge$-$\Rightarrow$ adjunction. Thus we have established (1) and therefore also the inclusion

$\displaystyle A \subseteq \bigcap \{S \in PX: A \subseteq S\}$

Now we prove the opposite inclusion

$\displaystyle \bigcap \{S \in PX: A \subseteq S\} \subseteq A,$

that is to say

$\{x \in X: \forall_{S \in PX} A \subseteq S \Rightarrow x \in_X S\} \subseteq A \qquad (**)$

Here we just use lemma 1, applied to the particular element $\chi_A: 1 \to PX$: we see that the left side of (**) is contained in

$\{x \in X: A \subseteq \left[\chi_A\right] \Rightarrow x \in_X A\}$

which collapses to $\{x \in X: x \in_X A\} = A$, since $A = \left[\chi_A\right]$. This completes the proof. $\Box$

Theorem 2: $A \cup B$ is the least upper bound of $A, B$, i.e., if $C \subseteq X$ is a subset containing both $A \subseteq X$ and $B \subseteq X$, then $A \cup B \subseteq C$.

Proof: We are required to show that

$\{x \in X: \forall_{S \in PX} \ (A \subseteq S \wedge B \subseteq S) \Rightarrow x \in_X S\} \subseteq C.$

Again, we just apply lemma 1 to the particular element $\chi_C: 1 \to PX$: the left-hand side of the claimed inclusion is contained in

$\{x \in X: (A \subseteq C \wedge B \subseteq C) \Rightarrow x \in_X C\}$

but since $(A \subseteq C \wedge B \subseteq C)$ is true by hypothesis (is globally true as a predicate on the implicit variable $x \in X$), this last subset collapses to

$\{x \in X: t_X \Rightarrow x \in_X C\} = \{x \in X: x \in_X C\} = C$

which completes the proof. $\Box$

Theorems 1 and 2 show that for any set $X$, the external poset $Sub(X)$ admits joins. One may go on to show (just on the basis of the topos axioms) that as in the case of meets, the global external operation of taking joins is natural in $X$, so that by the Yoneda principle, it is classified by an internal join operation

$\vee: P1 \times P1 \to P1,$

namely, the map which classifies the union of the subsets

$\displaystyle \left[\pi_1\right] = P1 \times 1 \stackrel{1 \times t}{\hookrightarrow} P1 \times P1$

$\displaystyle \left[\pi_2\right] = 1 \times P1 \stackrel{t \times 1}{\hookrightarrow} P1 \times P1$

and this operation satisfies all the expected identities. In short, $P1$ carries an internal Heyting algebra structure, as does $PX \cong P1^X$ for any set $X$.

We will come back to this point later, when we show (as a consequence of strong extensionality) that $P1$ is actually an internal Boolean algebra.

### Construction of Coproducts

Next, we construct coproducts just as we do in ordinary set theory: as disjoint unions. Letting $X, Y$ be sets (objects in an ETCS category), a disjoint union of $X$ and $Y$ is a pair of monos

$i: X \hookrightarrow Z \qquad j: Y \hookrightarrow Z$

whose intersection is empty, and whose union or join in $Sub(Z)$ is all of $Z$. We will show that disjoint unions exist and are essentially unique, and that they satisfy the universal property for coproducts. We will use the notation $X + Y$ for a disjoint union.

Theorem 3: A disjoint union of $X$ and $Y$ exists.

Proof: It’s enough to embed $X, Y$ disjointly into some set $C$, since the union of the two monos in $Sub(C)$ would then be the requisite $Z$. The idea now is that if a disjoint union or coproduct $X+Y$ exists, then there’s a canonical isomorphism $P(X + Y) \cong PX \times PY$. Since the singleton map

$\sigma: X + Y \to P(X + Y) \cong PX \times PY$

is monic, one thus expects to be able to embed $X$ and $Y$ disjointly into $PX \times PY$. Since we can easily work out how all this goes in ordinary naive set theory, we just write out the formulas and hope it works out in ETCS.
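In naive finite-set terms, the formulas below send $x \mapsto (\{x\}, \emptyset)$ and $y \mapsto (\emptyset, \{y\})$. A quick Python sketch (names mine) confirming that these maps are injective with disjoint images, even when the carriers of $X$ and $Y$ overlap:

```python
def i_X(x):
    """x ↦ ({x}, ∅): the map X -> P(X) × P(Y) from the proof below."""
    return (frozenset({x}), frozenset())

def i_Y(y):
    """y ↦ (∅, {y}): the map Y -> P(X) × P(Y)."""
    return (frozenset(), frozenset({y}))

X, Y = ['a', 'b'], ['b', 'c']        # deliberately overlapping carriers
im_X = {i_X(x) for x in X}
im_Y = {i_Y(y) for y in Y}
assert len(im_X) == len(X) and len(im_Y) == len(Y)   # both injective
assert im_X.isdisjoint(im_Y)                          # disjoint images
assert len(im_X | im_Y) == len(X) + len(Y)            # a copy of X + Y
```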

In detail, define $i_X: X \to PX \times PY$ to be

$\displaystyle X \cong X \times 1 \stackrel{\sigma_X \times \chi_0}{\to} PX \times PY$

where $\sigma_X$ is the singleton mapping and $\chi_0$ classifies $0 \hookrightarrow Y$; similarly, define $i_Y: Y \to PX \times PY$ to be

$\displaystyle Y \cong 1 \times Y \stackrel{\chi_0 \times \sigma_Y}{\to} PX \times PY.$

Clearly $i_X$ and $i_Y$ are monic, so to show disjointness we just have to show that their pullback is empty. But their pullback is isomorphic to the cartesian product of the pullbacks of the diagrams

$\displaystyle X \stackrel{\sigma_X}{\to} PX \stackrel{\chi_0}{\leftarrow} 1 \qquad 1 \stackrel{\chi_0}{\to} PY \stackrel{\sigma_Y}{\leftarrow} Y$

so it would be enough to show that each (or just one) of these two pullbacks is empty, let’s say the first.

Suppose given a map $h: A \to X$ which makes the square

  A --------> 1
  |           |
h |           | chi_0
  v           v
  X --------> PX
      sigma_X

commute. Using the pullback principle, the map $A \to 1 \stackrel{\chi_0}{\to} PX$ classifies

$0 \times A \hookrightarrow X \times A$

which is just the empty subset. This must be the same subset as classified by $\sigma_X h = \chi_{\delta}h$ (where $\delta: X \hookrightarrow X \times X$ is the diagonal), which by the pullback principle is

$(1_X \times h)^{-1}(\delta) \hookrightarrow X \times A.$

An elementary calculation shows this to be the equalizer of the pair of maps

$\pi_1, h\pi_2: X \times A \stackrel{\to}{\to} X$

So this equalizer $E$ is empty. But notice that $\langle h, 1 \rangle: A \to X \times A$ equalizes this pair of maps. Therefore we have a map $A \to E \cong 0$. By corollary 2 above, we infer $A \cong 0$. This applies to the case where $A$ is the pullback, so the pullback is empty, as was to be shown. $\Box$

Theorem 4: Any two disjoint unions of $X, Y$ are canonically isomorphic.

Proof: Suppose $i: X \to Z \leftarrow Y: j$ is a disjoint union. Define a map

$\phi= \langle \phi_1, \phi_2 \rangle: Z \to PX \times PY$

where $\phi_1: Z \to PX$ classifies the subset $\langle 1_X, i \rangle: X \to X \times Z$, and $\phi_2: Z \to PY$ classifies the subset $\langle 1_Y, j \rangle: Y \to Y \times Z$. Applying the pullback principle, the composite $\phi_1 i: X \to PX$ classifies

$(1_X \times i)^{-1}(\langle 1_X, i \rangle) \hookrightarrow X \times X$

which is easily seen to be the diagonal on $X$. Hence $\phi_1 i = \sigma_X$. On the other hand, $\phi_1 j: Y \to PX$ classifies the subset

$(1_X \times j)^{-1}(\langle 1_X, i \rangle) \hookrightarrow X \times Y$

which is empty because $i$ and $j$ are disjoint embeddings, so $\phi_1 j = \chi_0: Y \to PX$. Similar calculations yield

$\phi_2 i = \chi_0: X \to PY \qquad \phi_2 j = \sigma_Y: Y \to PY.$

Putting all this together, we conclude that $\phi i = i_X: X \to PX \times PY$ and $\phi j = i_Y: Y \to PX \times PY$, where $i_X$ and $i_Y$ were defined in the proof of theorem 3.

Next, we show that $\phi$ is monic. If not, then by strong extensionality, there exist distinct elements $c, d: 1 \to Z$ for which $\phi(c) = \phi(d)$; therefore, $\phi_1 c = \phi_1 d: 1 \to PX$ and $\phi_2 c = \phi_2 d: 1 \to PY$. By the pullback principle, these equations say (respectively)

$c^{-1}(i) = d^{-1}(i) \hookrightarrow 1 \qquad c^{-1}(j) = d^{-1}(j) \hookrightarrow 1$

If $c^{-1}(i) = 1$, then both $c, d: 1 \to Z$ factor through the mono $i: X \to Z$. However, since $\phi i = i_{X}$ is monic, this would imply that $\phi (c) \neq \phi (d)$, contradiction. Therefore $c^{-1}(i) = c \cap i = 0$. By similar reasoning, $c \cap j = 0$. Therefore

$i \subseteq \neg c \qquad j \subseteq \neg c$

where $\neg = (- \Rightarrow 0): Sub(Z) \to Sub(Z)$ is the negation operator. But then $i \cup j \subseteq \neg c$. And since $1: Z \hookrightarrow Z$ is the union $i \cup j$ by assumption, $\neg c$ must be the top element $\top \in Sub(Z)$, whence $c: 1 \to Z$ is the bottom element 0. This contradicts the assumption that the topos is nondegenerate. Thus we have shown that $\phi$ must be monic.

The argument above shows that $\phi: Z \hookrightarrow PX \times PY$ is an upper bound of $i_X: X \to PX \times PY$ and $i_Y: Y \to PX \times PY$ in $Sub(PX \times PY)$. It follows that the join $X + Y$ constructed in theorem 3 is contained in $\phi: Z \to PX \times PY$, and hence can be regarded as the join of $X$ and $Y$ in $Sub(Z)$. But $Z$ is their join in $Sub(Z)$ by assumption of being a disjoint union, so the containment $X + Y \subseteq Z$ must be an equality. The proof is now complete. $\Box$

Theorem 5: The inclusions $i_X: X \to X + Y$, $i_Y: Y \to X + Y$ exhibit $X + Y$ as the coproduct of $X$ and $Y$.

Proof: Let $f: X \to B$, $g: Y \to B$ be given functions. Then we have monos

$\langle 1_X, f \rangle: X \hookrightarrow X \times B \qquad \langle 1_Y, g \rangle: Y \hookrightarrow Y \times B \qquad (1)$

Now the operation $- \times B: Sub(C) \to Sub(C \times B)$ certainly preserves finite meets, and also preserves finite joins because it is left adjoint to $\forall_B: Sub(C \times B) \to Sub(C)$. Therefore this operation preserves disjoint unions; we infer that the monos

$X \times B \stackrel{i_X \times B}{\to} (X + Y) \times B \qquad Y \times B \stackrel{i_Y \times B}{\to} (X + Y) \times B \qquad (2)$

exhibit $(X + Y) \times B$ as a disjoint union of $X \times B, Y \times B$. Composing the monos of (1) and (2), we have disjoint embeddings of $X$ and $Y$ in $(X + Y) \times B$. Using theorem 4, $X + Y$ is isomorphic to the join of these embeddings; this means we have an inclusion

$\langle 1, h \rangle: X + Y \hookrightarrow (X + Y) \times B$

whose restriction to $X$ yields $\langle i_X, f \rangle$ and whose restriction to $Y$ yields $\langle i_Y, g \rangle$. Hence $h: X + Y \to B$ extends $f$ and $g$. It is the unique extension, for if there were two extensions $h, h'$, then the equalizer of $\langle 1, h \rangle$ and $\langle 1, h' \rangle$ would be an upper bound of $X, Y$ in $Sub((X + Y) \times B)$, contradicting the fact that $X + Y$ is the least upper bound. This completes the proof. $\Box$
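For readers who like to see the construction concretely, here is a small finite-set sketch (my own illustration, not part of the formal ETCS development): the coproduct $X + Y$ carved out of $P X \times P Y$ via the embeddings $i_X, i_Y$ of theorem 3, together with the extension $h$ of theorem 5.

```python
# Finite-set sketch: X + Y realized inside P(X) x P(Y) via the disjoint
# embeddings i_X(x) = ({x}, {}) and i_Y(y) = ({}, {y}).

X = {'a', 'b'}
Y = {1, 2, 3}

def i_X(x):
    return (frozenset([x]), frozenset())

def i_Y(y):
    return (frozenset(), frozenset([y]))

# Join of the two embedded subsets in Sub(P(X) x P(Y)):
coproduct = {i_X(x) for x in X} | {i_Y(y) for y in Y}
assert len(coproduct) == len(X) + len(Y)    # the embeddings are disjoint

# Theorem 5: any f: X -> B, g: Y -> B extend to a unique h: X + Y -> B.
f = {'a': 10, 'b': 20}
g = {1: 30, 2: 40, 3: 50}

def h(z):
    left, right = z
    return f[next(iter(left))] if left else g[next(iter(right))]

assert h(i_X('a')) == 10 and h(i_Y(3)) == 50
```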

I think that’s enough for one day. I will continue to explore the categorical structure and logic of ETCS next time.

This post is a continuation of the discussion of “the elementary theory of the category of sets” [ETCS] which we had begun last time, here and in the comments which followed. My thanks go to all who commented, for some useful feedback and thought-provoking questions.

Today I’ll describe some of the set-theoretic operations and “internal logic” of ETCS. I have a feeling that some people are going to love this, and some are going to hate it. My main worry is that it will leave some readers bewildered or exasperated, thinking that category theory has an amazing ability to make easy things difficult.

• An aside: has anyone out there seen the book Mathematics Made Difficult? It’s probably out of print by now, but I recommend checking it out if you ever run into it — it’s a kind of extended in-joke which pokes fun at category theory and abstract methods generally. Some category theorists I know take a dim view of this book; I for my part found certain passages hilarious, in some cases making me laugh out loud for five minutes straight. There are category-theory-based books and articles out there which cry out for parody!

In an attempt to nip my concerns in the bud, let me remind my readers that there are major differences between the way that standard set theories like ZFC treat membership and the way ETCS treats membership, and that differences at such a fundamental level are bound to propagate throughout the theoretical development, and impart a somewhat different character or feel between the theories. The differences may be summarized as follows:

• Membership in ZFC is a global relation between objects of the same type (sets).
• Membership in ETCS is a local relation between objects of different types (“generalized” elements or functions, and sets).

Part of what we meant by “local” is that an element per se is always considered relative to a particular set to which it belongs; strictly speaking, as per the discussion last time, the same element is never considered as belonging to two different sets. That is, in ETCS, an (ordinary) element of a set $A$ is defined to be a morphism $x: 1 \to A$; since the codomain is fixed, the same morphism cannot be an element $1 \to B$ of a different set $B$. This implies in particular that in ETCS, there is no meaningful global intersection operation on sets, which in ZFC is defined by:

$A \cap B = \{x: (x \in A) \wedge (x \in B)\}$

Instead, in ETCS, what we have is a local intersection operation on subsets $A \hookrightarrow X, B \hookrightarrow X$ of a set. But even the word “subset” requires care, because of how we are now treating membership. So let’s back up, and lay out some simple but fundamental definitions of terms as we are now using them.

Given two monomorphisms $i: A \to X, j: B \to X$, we write $i \subseteq j$ ($A \subseteq B$ if the monos are understood, or $A \subseteq_X B$ if we wish to emphasize this is local to $X$) if there is a morphism $k: A \to B$ such that $i = j k$. Since $j$ is monic, there can be at most one such morphism $k$; since $i$ is monic, such $k$ must be monic as well. We say $i, j$ define the same subset if this $k$ is an isomorphism. So: subsets of $X$ are defined to be isomorphism classes of monomorphisms into $X$. As a simple exercise, one may show that monos $i, j$ into $X$ define the same subset if and only if $i \subseteq j$ and $j \subseteq i$. The (reflexive, transitive) relation $\subseteq_X$ on monomorphisms thus induces a reflexive, transitive, antisymmetric relation, i.e., a partial order on subsets of $X$.

Taking some notational liberties, we write $A \subseteq X$ to indicate a subset of $X$ (as isomorphism class of monos). If $x: U \to X$ is a generalized element, let us say $x$ is in a subset $A \subseteq X$ if it factors (evidently uniquely) through any representative mono $i: A \to X$, i.e., if there exists $x': U \to A$ such that $x = i x'$. Now the intersection of two subsets $A \subseteq X$ and $B \subseteq X$ is defined to be the subset $A \cap B \subseteq X$ defined by the pullback of any two representative monos $i: A \to X, j: B \to X$. Following the “Yoneda principle”, it may equivalently be defined up to isomorphism by specifying its generalized elements:

$A \cap B :=_i \{x \in X: (x \mbox{ is in } A) \wedge (x \mbox{ is in } B)\}.$

Thus, intersection works essentially the same way as in ZFC, only it’s local to subsets of a given set.
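As a concrete finite-set check (again just an illustration), the pullback definition of intersection agrees with the naive elementwise one:

```python
# The intersection of subsets A, B of X, formed as the pullback of the
# inclusion monos i: A -> X and j: B -> X, agrees with the naive
# elementwise intersection.

X = {0, 1, 2, 3, 4}
A = {0, 1, 2}
B = {2, 3}

# Pullback of i and j: pairs (a, b) with i(a) = j(b); the inclusions act
# as the identity on elements, so this picks out the matching pairs.
pullback = {(a, b) for a in A for b in B if a == b}
assert {a for (a, _) in pullback} == A & B   # both compute {2}
```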

While we’re at it, let’s reformulate the power set axiom in this language: it says simply that for each set $B$ there is a set $P(B)$ and a subset $\in_B \subseteq B \times P(B)$, such that for any relation $R \subseteq B \times A$, there is a unique “classifying map” $\chi_R: A \to P(B)$ whereby, under $1_B \times \chi_R: B \times A \to B \times P(B)$, we have

$R = (1_B \times \chi_R)^{-1}(\in_B).$

The equality is an equality between subsets, and the inverse image on the right is defined by a pullback. In categorical set theory notation,

$R = \{\langle b, a \rangle \in B \times A: b \in_B \chi_R(a)\}.$

Hence, there are natural bijections

$\displaystyle \frac{R \subseteq B \times A}{A \to P(B)} \qquad \frac{R \subseteq B \times A}{B \to P(A)}$

between subsets and classifying maps. The subset corresponding to $\phi: A \to P(B)$ is denoted $\left[\phi\right] \subseteq B \times A$ or $\left[\phi\right] \subseteq A \times B$, and is called the extension of $\phi$.
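Here is a finite model of the power set axiom (my own illustration): a relation $R \subseteq B \times A$ yields a classifying map $\chi_R: A \to P(B)$, and $R$ is recovered as the inverse image of the elementhood relation.

```python
# A relation R ⊆ B x A determines chi_R: A -> P(B); pulling the
# elementhood relation ∈_B ⊆ B x P(B) back along 1_B x chi_R recovers R.

A = {'x', 'y'}
B = {1, 2, 3}
R = {(1, 'x'), (2, 'x'), (3, 'y')}           # R ⊆ B x A

def chi_R(a):                                 # the classifying map A -> P(B)
    return frozenset(b for b in B if (b, a) in R)

# (1_B x chi_R)^{-1}(∈_B) = {(b, a) : b ∈ chi_R(a)}:
recovered = {(b, a) for b in B for a in A if b in chi_R(a)}
assert recovered == R
```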

The set $P(1)$ plays a particularly important role; it is called the “subset classifier” because subsets $A \subseteq X$ are in natural bijection with functions $\chi: X \to P(1)$. [Cf. classifying spaces in the theory of fiber bundles.]

In ordinary set theory, the role of $P(1)$ is played by the 2-element set $\{f, t\}$. Here subsets $A \subseteq X$ are classified by their characteristic functions $\chi_A: X \to \{f, t\}$, defined by $\chi_A(x) := t$ iff $x \in A$. We thus have $A = \chi_A^{-1}(t)$; the elementhood relation $\in_1 \hookrightarrow 1 \times P(1)$ boils down to $t: 1 \to P(1)$. Something similar happens in ETCS set theory:

Lemma 1: The domain of elementhood $\in_1 \to 1 \times P(1) \cong P(1)$ is terminal.

Proof: A map $X \to \in_1$, that is, a map $\chi: X \to P(1)$ which is in $\in_1 \subseteq P(1)$, corresponds exactly to a subset $\chi^{-1}(\in_1) \subseteq X$ which contains all of $X$ (i.e., the subobject $1_X: X \subseteq X$). Since the only such subset is $1_X$, there is exactly one map $X \to \in_1$. $\Box$

Hence elementhood $\in_1 \subseteq 1 \times P(1)$ is given by an element $t: 1 \to P(1)$. The power set axiom says that a subset $A \subseteq X$ is retrieved from its classifying map $\chi_A: X \to P(1)$ as the pullback $\chi^{-1}_A(t) \subseteq X$.
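In the familiar two-valued model (an illustration only), this pullback description is just the usual recovery of a subset from its characteristic function:

```python
# P(1) modeled as {False, True}, t as True: a subset is retrieved from
# its characteristic function as the pullback chi_A^{-1}(t).

X = {0, 1, 2, 3}
A = {1, 3}

chi_A = {x: (x in A) for x in X}             # chi_A: X -> P(1)
assert {x for x in X if chi_A[x]} == A       # A = chi_A^{-1}(t)
```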

Part of the power of, well, power sets is in a certain dialectic between external operations on subsets and internal operations on $P(1)$; one can do some rather amazing things with this. The intuitive (and pre-axiomatic) point is that if $C$ has finite products, equalizers, and power objects, then $P(1)$ is a representing object for the functor

$Sub: C^{op} \to Set$

which maps an object $X$ to the collection of subobjects of $X$, and which maps a morphism (“function”) $f: X \to Y$ to the “inverse image” function $f^{-1}: Sub(Y) \to Sub(X)$, that sends a subset $j: B \subseteq Y$ to the subset $f^{-1}(B) \subseteq X$ given by the pullback of the arrows $f: X \to Y$, $j: B \subseteq Y$. By the Yoneda lemma, this representability means that external natural operations on the $Sub(X)$ correspond to internal operations on the object $P(1)$. As we will see, one can play off the external and internal points of view against each other to build up a considerable amount of logical structure, enough for just about any mathematical purpose.

• Remark: A category satisfying just the first three axioms of ETCS, namely existence of finite products, equalizers, and power objects, is called an (elementary) topos. Most or perhaps all of this post will use just those axioms, so we are really doing some elementary topos theory. As I was just saying, we can build up a tremendous amount of logic internally within a topos, but there’s a catch: this logic will be in general intuitionistic. One gets classical logic (including law of the excluded middle) if one assumes strong extensionality [where we get the definition of a well-pointed topos]. Topos theory has a somewhat fearsome reputation, unfortunately; I’m hoping these notes will help alleviate some of the sting.

To continue this train of thought: by the Yoneda lemma, the representing isomorphism

$\displaystyle \theta: \hom(-, P(1)) \stackrel{\sim}{\to} Sub(-)$

is determined by a universal element $\theta_{P(1)}(1_{P(1)})$, i.e., a subset of $P(1)$, namely the mono $t: 1 \to P(1)$. In that sense, $t: 1 \to P(1)$ plays the role of a universal subset. The Yoneda lemma implies that external natural operations on general posets $Sub(X)$ are completely determined by how they work on the universal subset.

### Internal Meets

To illustrate these ideas, let us consider intersection. Externally, the intersection operation is a natural transformation

$\cap_X: Sub(X) \times Sub(X) \to Sub(X).$

This corresponds to a natural transformation

$\hom(X, P(1)) \times \hom(X, P(1)) \to \hom(X, P(1))$

which (by Yoneda) is given by a function $\wedge: P(1) \times P(1) \to P(1)$. Working through the details, this function is obtained by putting $X = P(1) \times P(1)$ and chasing

$\langle \pi_1, \pi_2\rangle \in \hom(P(1) \times P(1), P(1)) \times \hom(P(1) \times P(1), P(1))$

through the composite

$\displaystyle \hom(X, P(1)) \times \hom(X, P(1))$

$\displaystyle \stackrel{\sim}{\to} Sub(X) \times Sub(X) \stackrel{\cap}{\to} Sub(X) \stackrel{\sim}{\to} \hom(X, P(1)).$

Let’s analyze this bit by bit. The subset $\left[\pi_1\right] = \pi_{1}^{-1}(t) \subseteq P(1) \times P(1)$ is given by

$t \times id: 1 \times P(1) \to P(1) \times P(1),$

and the subset $\left[\pi_2\right] = \pi_{2}^{-1}(t) \subseteq P(1) \times P(1)$ is given by

$id \times t: P(1) \times 1 \to P(1) \times P(1).$

Hence $\left[\pi_1\right] \cap \left[\pi_2\right] \subseteq P(1) \times P(1)$ is given by the pullback of the functions $t \times id$ and $id \times t$, which is just

$t \times t: 1 \times 1 \to P(1) \times P(1).$

The map $\wedge: P(1) \times P(1) \to P(1)$ is thus defined to be the classifying map of $t \times t: 1 \times 1 \subseteq P(1) \times P(1)$.

To go from the internal meet $\wedge: P(1) \times P(1) \to P(1)$ back to the external intersection operation, let $A \subseteq X, B \subseteq X$ be two subsets, with classifying maps $\chi_A, \chi_B: X \to P(1)$. By the definition of $\wedge$, we have that for all generalized elements $x \in X$

$\chi_A(x) \wedge \chi_B(x) = t$ if and only if $\langle \chi_A(x), \chi_B(x) \rangle = \langle t, t \rangle$

(where the equality signs are interpreted with the help of equalizers). This holds true iff $x$ is in the subset $A \subseteq X$ and is in the subset $B \subseteq X$, i.e., if and only if $x$ is in the subset $A \cap B \subseteq X$. Thus $\chi_A \wedge \chi_B$ is indeed the classifying map of $A \cap B \subseteq X$. In other words, $\chi_{A \cap B} = \chi_A \wedge \chi_B$.
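In the two-valued model this identity $\chi_{A \cap B} = \chi_A \wedge \chi_B$ is easy to check directly, with the internal meet played by ordinary boolean conjunction (an illustration only):

```python
# chi_{A ∩ B} = chi_A ∧ chi_B, checked pointwise over a finite X.

X = {0, 1, 2, 3, 4}
A, B = {0, 1, 2}, {1, 2, 4}

def chi(S):
    return lambda x: x in S

for x in X:
    assert chi(A & B)(x) == (chi(A)(x) and chi(B)(x))
```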

A by-product of the interplay between the internal and external is that the internal intersection operator

$\wedge: P(1) \times P(1) \to P(1)$

is the meet operator of an internal meet-semilattice structure on $P(1)$: it is commutative, associative, and idempotent (because that is true of external intersection). The identity element for $\wedge$ is the element $t: 1 \to P(1)$. In particular, $P(1)$ carries an internal poset structure: given generalized elements $u, v: A \to P(1)$, we may define

$u \leq v$ if and only if $u = u \wedge v$

and this defines a reflexive, transitive, antisymmetric relation $\left[\leq\right] \subseteq P(1) \times P(1)$:

$\left[\leq\right] :=_i \{\langle u, v \rangle \in P(1) \times P(1): u = u \wedge v\},$

equivalently described as the equalizer

$\left[\leq\right] \to P(1) \times P(1) \stackrel{\to}{\to} P(1)$

of the maps $\pi_1: P(1) \times P(1) \to P(1)$ and $\wedge: P(1) \times P(1) \to P(1)$. We have that $u \leq v$ if and only if $\left[u\right] \subseteq \left[v\right]$.
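Computed in the two-valued model (illustrative), this equalizer description of $\left[\leq\right]$ yields exactly the familiar implication order on truth values:

```python
# [≤] as the equalizer of pi_1 and ∧ over P(1) = {False, True}.

P1 = [False, True]
leq = {(u, v) for u in P1 for v in P1 if u == (u and v)}   # pi_1 = ∧ here
assert leq == {(False, False), (False, True), (True, True)}
```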

### Internal Implication

Here we begin to see some of the amazing power of the interplay between internal and external logical operations. We will prove that $P(1)$ carries an internal Heyting algebra structure (ignoring joins for the time being).

Let’s recall the notion of Heyting algebra in ordinary naive set-theoretic terms: it’s a lattice $P$ that has a material implication operator $\Rightarrow$ such that, for all $x, y, z \in P$,

$x \wedge y \leq z$ if and only if $x \leq y \Rightarrow z.$
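As a naive-set illustration (my own example), the Heyting condition can be verified exhaustively on the three-element chain $0 < 1 < 2$ with meet given by minimum; this is also the smallest example where the logic is genuinely intuitionistic.

```python
# Heyting adjunction on the chain 0 < 1 < 2: x ∧ y ≤ z iff x ≤ (y ⇒ z).

chain = [0, 1, 2]

def implies(y, z):
    # y ⇒ z: the largest x with min(x, y) <= z
    return max(x for x in chain if min(x, y) <= z)

for x in chain:
    for y in chain:
        for z in chain:
            assert (min(x, y) <= z) == (x <= implies(y, z))

# Double negation fails: ¬1 = (1 ⇒ 0) = 0, but then ¬¬1 = (0 ⇒ 0) = 2 ≠ 1.
assert implies(implies(1, 0), 0) == 2
```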

Now: by the universal property of $P(1)$, a putative implication operation $\Rightarrow: P(1) \times P(1) \to P(1)$ is uniquely determined as the classifying map of its inverse image $(\Rightarrow)^{-1}(t) \subseteq P(1) \times P(1)$, whose collection of generalized elements is

$\{\langle u, v \rangle \in P(1) \times P(1): (u \Rightarrow v) = t\}$

Given $\langle u, v \rangle: A \to P(1) \times P(1)$, the equality here is equivalent to

$t \leq u \Rightarrow v$

(because $t: 1 \to P(1)$ is maximal), which in turn is equivalent to

$t \wedge u = u \leq v$

This means we should define $\Rightarrow: P(1) \times P(1) \to P(1)$ to be the classifying map of the subset

$\left[\leq\right] \subseteq P(1) \times P(1).$

Theorem 1: $P(1)$ admits internal implication.

Proof: We must check that for any three generalized elements $u, v, w: A \to P(1)$, we have

$w \leq u \Rightarrow v$ if and only if $w \wedge u \leq v.$

Passing to the external picture, let $\left[u\right], \left[v\right], \left[w\right]$ be the corresponding subsets of $A$. Now: according to how we defined $\Rightarrow,$ a generalized element $e \in A$ is in $\left[u \Rightarrow v\right]$ if and only if $u(e) \leq v(e)$. This applies in particular to any monomorphism $e: \left[w\right] \to A$ that represents the subset $\left[w\right] \subseteq A$.

Lemma 2: The composite

$\displaystyle u(e) = (\left[w\right] \stackrel{e}{\to} A \stackrel{u}{\to} P(1))$

is the classifying map of the subset $\left[w\right] \cap \left[u\right] \subseteq \left[w\right]$.

Proof: As subsets of $\left[w\right]$, $(u e)^{-1}(t) = e^{-1} u^{-1}(t) = e^{-1}(\left[u\right]) = \left[w\right] \cap \left[u\right]$ where the last equation holds because both sides are the subsets defined as the pullback of two representative monos $e: \left[w\right] \to A$, $i: \left[u\right] \to A$. $\Box$

Continuing the proof of theorem 1, we see by lemma 2 that the condition $u(e) \leq v(e)$ corresponds externally to the condition

$\left[w\right] \cap \left[u\right] \subseteq \left[w\right] \cap \left[v\right]$

and this condition is equivalent to $\left[w\right] \cap \left[u\right] \subseteq \left[v\right]$. Passing back to the internal picture, this is equivalent to $w \wedge u \leq v$, and the proof of theorem 1 is complete. $\Box$

### Cartesian Closed Structure

Next we address a comment made by “James”, that a category satisfying the ETCS axioms is cartesian closed. As everything else in this article, this uses only the fact that such a category is a topos: has finite products, equalizers, and “power sets”.

Proposition 1: If $A, B$ are “sets”, then $P(A \times B)$ represents an exponential $P(B)^A.$

Proof: By the power set axiom, there is a bijection between maps into the power set and relations:

$\displaystyle \frac{\phi: X \to P(A \times B)}{R \subseteq X \times (A \times B)}$

which is natural in $X$. By the same token, there is a natural bijection

$\displaystyle \frac{R \subseteq (X \times A) \times B}{\phi': X \times A \to P(B)}.$

Putting these together, we have a natural isomorphism

$\hom(-, P(A \times B)) \cong \hom(- \times A, P(B))$

and this representability means precisely that $P(A \times B)$ plays the role of an exponential $P(B)^A$. $\Box$

Corollary 1: $P(A) \cong P(1)^A$. $\Box$
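Proposition 1 in miniature (an illustration over finite sets): the natural bijection is just currying and uncurrying of relations.

```python
# P(A x B) serving as the exponential P(B)^A, via currying of relations.

A = {'x', 'y'}
B = {1, 2}

R = frozenset({('x', 1), ('x', 2), ('y', 2)})       # an element of P(A x B)

# Curried to a map A -> P(B), and uncurried back:
curried = {a: frozenset(b for b in B if (a, b) in R) for a in A}
uncurried = frozenset((a, b) for a in A for b in curried[a])
assert uncurried == R
```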

The universal element of this representation is an evaluation map $A \times P(A) \to P(1)$, which is just the classifying map of the subset $\in_A \subseteq A \times P(A)$.

Thus, $P(B)^A \cong P(A \times B)$ represents the set of all functions $\phi: A \to P(B)$ (that is, relations from $A$ to $B$). This is all we need to continue the discussion of internal logic in this post, but let’s also sketch how we get full cartesian closure. [Warning: for those who are not comfortable with categorical reasoning, this sketch could be rough going in places.]

As per our discussion, we want to internalize the set of such relations which are graphs of functions, i.e., maps $\phi$ where each $\phi(a) \subseteq B$ is a singleton, in other words which factor as

$\displaystyle A \to B \stackrel{\sigma}{\to} P(B)$

where $\sigma: B \to P(B)$ is the singleton mapping:

$b \mapsto \{b\} = \{c \in B: b = c\}.$

We see from this set-theoretic description that $\sigma: B \to P(B)$ classifies the equality relation

$\{\langle b, c\rangle \in B \times B: b = c\} \subseteq B \times B$

which we can think of as either the equalizer of the pair of maps $\pi_1, \pi_2: B \times B \to B$ or, what is the same, the diagonal map $\delta_B = \langle 1_B, 1_B \rangle: B \to B \times B$.

Thus, we put $\sigma = \chi_{\delta}: B \to P(B)$, and it is not too hard to show that the singleton mapping $\sigma$ is a monomorphism. As usual, we get this monomorphism as the pullback $\chi_{\sigma}^{-1}(t)$ of $t: 1 \to P(1)$ along its classifying map $\chi_{\sigma}: P(B) \to P(1)$.

Now: a right adjoint such as $(-)^A$ preserves all limits, and in particular pullbacks, so we ought then to have a pullback

          B^A ----------------> 1^A
           |                     |
   sigma^A |                     | t^A
           V                     V
        P(B)^A --------------> P(1)^A
              (chi_sigma)^A

Of course, we don’t even have $B^A$ yet, but this should give us an idea: define $\sigma^A$, and in particular its domain $B^A$, by taking the pullback of the right-hand map along the bottom map. In case there is doubt, the map on the bottom is defined Yoneda-wise, applying the isomorphism

$\hom(P(B)^A \times A, P(1)) \cong \hom(P(B)^A, P(1)^A)$

to the element in the hom-set (on the left) given by the composite

$\displaystyle P(B)^A \times A \stackrel{ev}{\to} P(B) \stackrel{\chi_\sigma}{\to} P(1).$

The map on the right of the pullback is defined similarly. That this recipe really gives a construction of $B^A$ will be left as an exercise for the reader.
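A crude finite rendering of the idea (my own sketch, with $P(B)^A$ modeled as tuples of subsets of $B$ indexed by $A$): the exponential $B^A$ is cut out of $P(B)^A$ as those $\phi$ whose every value lies in the image of the singleton map $\sigma$, which is precisely the set of graphs of functions $A \to B$.

```python
# B^A carved out of P(B)^A as the relations whose values are singletons.

from itertools import combinations, product

A = ['x', 'y']
B = [1, 2, 3]

def powerset(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

singletons = {frozenset([b]) for b in B}   # image of the mono sigma: B -> P(B)

BA = [phi for phi in product(powerset(B), repeat=len(A))
      if all(value in singletons for value in phi)]
assert len(BA) == len(B) ** len(A)         # 3^2 = 9 functions A -> B
```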

### Universal Quantification

As further evidence of the power of the internal-external dialectic, we show how to internalize universal quantification.

As we are dealing here now with predicate logic, let’s begin by defining some terms as to be used in ETCS and topos theory:

• An ordinary predicate of type $A$ is a function $\phi: A \to P(1)$. Alternatively, it is an ordinary element $\phi': 1 \to P(1)^A \cong P(A)$. It corresponds (naturally and bijectively) to a subset $\left[\phi\right]: S \subseteq A$.
• A generalized predicate of type $A$ is a function $\phi': X \to P(A) \cong P(1)^A$. It may be identified with (corresponds naturally and bijectively to) a function $\phi: X \times A \to P(1)$, or to a subset $\left[\phi\right]: S \subseteq X \times A$.

We are trying to define an operator $\forall_A$ which will take a predicate of the form $\phi: X \times A \to P(1)$ [conventionally written $\phi(x, a)$] to a predicate $\forall_A \phi: X \to P(1)$ [conventionally written $\forall_{a \in A} \phi(x, a)$]. Externally, this corresponds to a natural operation which takes subsets of $X \times A$ to subsets of $X$. Internally, it corresponds to an operation of the form

$\forall_A: P(A) \cong P(1)^A \to P(1).$

This function is determined by the subset $(\forall_A)^{-1}(t) \subseteq P(1)^A$, defined elementwise by

$\{\phi \in P(1)^A: \forall_A \phi = t\}.$

Now, in ordinary logic, $\forall_{a \in A} \phi(a)$ is true if and only if $\phi(a)$ is true for all $a \in A$, or, in slightly different words, if $\phi: A \to P(1)$ is constantly true over all of $A$:

$\displaystyle \phi = (A \stackrel{!}{\to} 1 \stackrel{t}{\to} P(1)).$

The expression on the right (global truth over $A$) corresponds to a function $t_A: 1 \to P(1)^A$, indeed a monomorphism since any function with domain $1$ is monic. Thus we are led to define the desired quantification operator $\forall_A: P(1)^A \to P(1)$ as the classifying map of $t_A: 1 \subseteq P(1)^A$.

Let’s check how this works externally. Let $\phi: X \to P(1)^A$ be a generalized predicate of type $A$. Then according to how $\forall_A$ has just been defined, $\forall_A \phi: X \to P(1)$ classifies the subset

$\displaystyle \{x \in X: \phi(x, -) = t_A\} \subseteq X$
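In the two-valued model (illustrative), this external description of $\forall_A$ is simply: quantify the predicate over all of $A$.

```python
# forall_A sends phi: X x A -> P(1) to the predicate picking out those x
# for which phi(x, -) is constantly true over A.

X = {0, 1, 2}
A = {'p', 'q'}
phi = {(x, a): (x > 0) or (a == 'p') for x in X for a in A}

forall_A_phi = {x: all(phi[(x, a)] for a in A) for x in X}
assert forall_A_phi == {0: False, 1: True, 2: True}
```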

There is an interesting adjoint relationship between universal quantification and substitution (aka “pulling back”). By “substitution”, we mean that given any predicate $\psi: X \to P(1)$ on $X$, we can always pull back to a predicate on $X \times A$ (substituting in a dummy variable $a$ of type $A$, forming e.g. $\psi(x) \wedge \left[a=a\right]$) by composing with the projection $\pi: X \times A \to X$. In terms of subsets, substitution along $A$ is the natural external operation

$(\left[\psi\right] \subseteq X) \mapsto (\left[\psi\right]\times A \subseteq X \times A).$

Then, for any predicate $\phi: X \times A \to P(1)$, we have the adjoint relationship

$\left[\psi\right] \times A \subseteq \left[\phi\right]$ if and only if $\left[\psi\right] \subseteq \left[\forall_A \phi\right]$

so that substitution along $A$ is left adjoint to universal quantification along $A$. This is easy to check; I’ll leave that to the reader.
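The check is indeed easy; here it is carried out exhaustively over a small example of my own devising:

```python
# The adjunction "substitution along A ⊣ forall_A":
# [psi] x A ⊆ [phi]  iff  [psi] ⊆ [forall_A phi], for every subset psi.

from itertools import combinations

X = [0, 1]
A = ['a', 'b']
phi = {(0, 'a'), (0, 'b'), (1, 'a')}                    # [phi] ⊆ X x A

forall_phi = {x for x in X if all((x, a) in phi for a in A)}

for r in range(len(X) + 1):
    for psi in map(set, combinations(X, r)):
        lhs = all((x, a) in phi for x in psi for a in A)
        rhs = psi <= forall_phi
        assert lhs == rhs
```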

### Internal Intersection Operators

Now we put all of the above together, to define an internal intersection operator

$\bigcap: PPX \to PX$

which intuitively takes an element $1 \to PPX$ (a family $F$ of subsets of $X$) to its intersection $1 \to PX$, as a subset $\bigcap F \subseteq X$.

Let’s first write out a logical formula which expresses intersection:

$x \in \bigcap F \ \ \mbox{if and only if} \ \ \forall_{S \in PX} (S \in F) \Rightarrow (x \in S).$

We have all the ingredients to deal with the logical formula on the right: we have an implication operator $\Rightarrow$ as part of the internal Heyting algebra structure on $P(1)$, and we have the quantification operator $\forall_{PX}$. The atomic expressions $(S \in F)$ and $(x \in S)$ refer to internal elementhood: $(x \in S)$ means $\langle x, S\rangle \in X \times PX$ is in $\ \in_{X}\ \subseteq X \times PX$, and $(S \in F)$ means $\langle S, F\rangle \in PX \times PPX$ is in $\ \in_{PX}\ \subseteq PX \times PPX$.

There is a slight catch, in that the predicates “$(S \in_{PX} F)$” and “$(x \in_X S)$” (as generalized predicates over $PX$, where $S$ lives) are taken over different domains. The first is of the form $\phi_1: PPX \to P(1)^{PX}$, and the second is of the form $\phi_2: X \to P(1)^{PX}$. No matter: we just substitute in some dummy variables. That is, we just pull these maps back to a common domain $PPX \times X$, forming the composites

$\displaystyle \psi_1 = (PPX \times X \stackrel{\pi_1}{\to} PPX \stackrel{\phi_1}{\to} P(1)^{PX})$

and

$\displaystyle \psi_2 = (PPX \times X \stackrel{\pi_2}{\to} X \stackrel{\phi_2}{\to} P(1)^{PX}).$

Putting all this together, we form the composite

$\displaystyle PPX \times X \stackrel{\langle \psi_1, \psi_2\rangle}{\to} P(1)^{PX} \times P(1)^{PX}$

$\displaystyle \cong (P(1) \times P(1))^{PX} \stackrel{(\Rightarrow)^{PX}}{\to} P(1)^{PX} \stackrel{\forall_{PX}}{\to} P(1)$

This composite directly expresses the definition of the internal predicate $(x \in \bigcap F)$ given above. By cartesian closure, this map $PPX \times X \to P(1)$ induces the desired internal intersection operator, $\bigcap: PPX \to PX$.
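Computed over finite sets (an illustration of the formula, not of the categorical composite itself), the operator does what one expects:

```python
# x ∈ ⋂F iff every S ⊆ X with S ∈ F has x ∈ S.

from itertools import combinations

X = {0, 1, 2, 3}
F = [{0, 1, 2}, {1, 2, 3}, {1, 2}]            # a family F ∈ PPX

def powerset(s):
    return [set(c) for r in range(len(s) + 1) for c in combinations(sorted(s), r)]

big_cap = {x for x in X if all(x in S for S in powerset(X) if S in F)}
assert big_cap == {1, 2}
```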

This construction provides an important bridge to getting the rest of the internal logic of ETCS. Since we can construct the intersection of arbitrary definable families of subsets, the power sets $PX$ are internal inf-lattices. But inf-lattices are sup-lattices as well; on this basis we will be able to construct the colimits (e.g., finite sums, coequalizers) that we need. Similarly, the intersection operators easily allow us to construct image factorizations: any function $f: X \to Y$ can be factored (in an essentially unique way) as an epi or surjection $X \to I$ to the image, followed by a mono or injection $I \to Y$. The trick is to define the image as the smallest subset of $Y$ through which $f$ factors, by taking the intersection of all such subsets. Image factorization leads in turn to the construction of existential quantification.
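The image-factorization trick just described can be seen in a finite sketch (my own illustration):

```python
# The image of f as the intersection of all subsets of Y through which f
# factors, giving the epi-mono factorization X ->> image >-> Y.

from itertools import combinations

X = {'a', 'b', 'c'}
Y = {0, 1, 2, 3}
f = {'a': 1, 'b': 1, 'c': 3}

def powerset(s):
    return [set(c) for r in range(len(s) + 1) for c in combinations(sorted(s), r)]

factoring = [S for S in powerset(Y) if all(f[x] in S for x in X)]
image = set.intersection(*factoring)
assert image == set(f.values()) == {1, 3}
```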

As remarked above, the internal logic of a topos is generally intuitionistic (the law of excluded middle is not satisfied). But, if we add in the axiom of strong extensionality of ETCS, then we’re back to ordinary classical logic, where the law of excluded middle is satisfied, and where we just have the two truth values “true” and “false”. This means we will be able to reason in ETCS set theory just as we do in ordinary mathematics, taking just a bit of care with how we treat membership. The foregoing discussion gives indication that logical operations in categorical set theory work in ways familiar from naive set theory, and that basic set-theoretic constructions like intersection are well-grounded in ETCS.

This is a post on “foundations of mathematics” (eek!). I was motivated to write it while I’ve been struggling to understand better certain applications of ultrafilters — namely the theory of measurable cardinals — from a point of view and language that I feel comfortable with. My original intent was to blog about that, as a kind of off-shoot of the general discussion of ultrafilters I started in connection with the series on Stone duality, and because it seems kind of cool. And I will. But this got finished first, and I thought that it would be of interest to some who have been following my category theory posts.

A lot of confusion seems to reign around “the categorical approach to foundations” and what it might entail; some seem to think it involves a “doing-away with elements” that we all know and love, or doing away with sets and supplanting them with categories, or something like that. That’s an unfortunate misunderstanding. My own attitude is pragmatic: I’m all in favor of mathematicians using ordinary “naive” (pre-axiomatic) set theory to express their thoughts if that’s the familiar and appropriate conveyance — I mean, obviously I do it myself. It’s our common heritage, learned through years of undergraduate and graduate school experience and beyond. I’m not proposing for a moment to “overthrow” it.

What I do propose to discuss is a formalized set theory which embodies this rich tradition, but which takes advantage of categorical insights honed over the decades, and which I would argue is ‘natural’ in its ease to accept formulas in naive set theory and give them a foundation true to mathematical practice; I also argue it addresses certain criticisms which I feel could be put to that hallowed foundational theory, ZFC. I should own up that this theory is not immune to criticism, a main one being that a certain amount of preface and commentary is required to make it accessible (and I don’t think category theorists have done a particularly hot job doing that, frankly).

Let’s start by putting down what we want in very simple, pragmatic terms:

• A (formalized) ‘set theory’ worthy of the name ought to realize a conception of sets as “completed collections”, and allow for the existence of enough sets and relations to fulfill the needs of working mathematicians.

This is intentionally vague. The “needs of working mathematicians” fluctuate over time and place and person. Some of the core needs would include the existence of the sets of natural numbers and real numbers, for instance. On the other hand, set theorists may have greater needs than specialists in the theory of several complex variables. For now I’ll ignore some of the deeper needs of set theorists, and try to focus on the basic stuff you’d need to formalize what goes on in your average graduate school text (to put it vaguely, again).

We will discuss two formalizations of set theory: ZFC, and Lawvere’s Elementary Theory of the Category of Sets [ETCS]. The first “needs no introduction”, as they say. The second is an autonomous category-based theory, described in detail below, and proposed by Saunders Mac Lane as an alternative approach to “foundations of mathematics” (see his book with Moerdijk). Either formalization provides fully adequate infrastructure to support the naive set theory of working mathematicians, but there are significant conceptual differences between them, centering precisely on how the notion of membership is handled. I’ll start with the more familiar ZFC.

As everyone knows, ZFC formalizes a conception of “set” as collection extensionally determined by the members it contains, and the ZFC axioms ensure a rich supply of ways in which to construct new sets from old (pairings, unions, power sets, etc.). Considering how old and well-developed this theory is, and the plenitude of available accounts, I won’t say much here on its inner development. Instead, I want to pose a question and answer which highlight a key ZFC conception, and which we will use to focus our discussion:

Question: “What are the members of sets?”

Answer: “Other sets.”

This may seem innocent enough, but the consequences are quite far-reaching. It says that “membership” is a relation $\in$ from the collection $V$ of all “sets” to itself. (Speaking at a pre-axiomatic level, a relation from a set $X$ to a set $Y$ is a subset $R \subseteq X \times Y$. So a structure for ZFC set theory consists of a “universe of discourse” $V$, together with a collection $\in \ \subseteq V \times V$ of pairs of elements of $V$, called the membership relation.)

Why is this a big deal? A reasonable analogue might be dynamical systems. If $X$ and $Y$ are manifolds, say, then you can study the properties of a given smooth map $f: X \to Y$ and perhaps say interesting things, but in the case $X = Y$, you get the extra bonus that outputs can be fed back in as inputs, and infinite processes are born: you can study periodic orbits, long-term behaviors, and so on, and this leads to some very intricate mathematics, even when $X$ is a simple manifold like a 2-sphere.

My point is that something analogous is happening in ZFC: we have a (binary) relation $\in$ from $V$ to itself, and we get a rich “dynamics” and feedback by iterative relational composition of $\in$ with itself, or by composing other derived binary relations from $V$ to itself. (Perhaps I should recall here, again at a pre-axiomatic level, that the composite $S \circ R$ of a relation $R \subseteq X \times Y$ and $S \subseteq Y \times Z$ is the subset

$\{(x, z): \exists_{y \in Y} (x, y) \in R \wedge (y, z) \in S\} \subseteq X \times Z$.)
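To make the “dynamics” of relational composition concrete, here is a minimal sketch in Python (my own illustration, not from the post), with relations coded as sets of ordered pairs and `compose` implementing exactly the displayed formula:

```python
# Relations coded as sets of ordered pairs; composition as in the formula above.
def compose(R, S):
    """Composite S∘R of R ⊆ X×Y with S ⊆ Y×Z: pairs (x, z) linked through Y."""
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

# A toy endo-relation standing in for ∈ on a three-element "universe":
membership = {(1, 2), (2, 3), (1, 3)}
# Composing ∈ with itself gives "member of a member":
print(compose(membership, membership))  # → {(1, 3)}
```

Iterating `compose` further produces exactly the “behaviors” (iterated membership chains) discussed next.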

A “behavior” then would correspond to an iterated membership chain

$(x \in y) \wedge (y \in z) \wedge (z \in w) \wedge \ldots$

and there are certain constraints on behavior provided by things like the axiom of foundation (no infinitely long backward evolutions). The deep meaning of the extensionality axiom is that a “set” $s$ is uniquely specified by the abstract structure of the tree of possible backward evolutions or behaviors starting from the “root set” $s$. This gives some intuitive but honest idea of the world of sets according to the ZFC picture: sets are tree-like constructions. The ZFC axioms are very rich, having to do with incredibly powerful operations on trees, and the combinatorial results are extremely complicated.

There are other formulations of ZFC. One is by posets: given any relation $R \subseteq V \times V$ (never mind one satisfying the ZFC axioms), one can create a reflexive and transitive relation $\leq$, defined by the first-order formula

$a \leq b$ if and only if $\forall_{x \in V} (x, a) \in R \Rightarrow (x, b) \in R$

The “extensionality axiom” for $R$ can then be formulated as the condition that $\leq$ also be antisymmetric, so that it is a partial ordering on $V$. If $R = \in$ is the membership relation for a model of ZFC, then this $\leq$ is of course just the usual “subset relation” between elements of $V$.
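The passage from $R$ to $\leq$ is easy to experiment with on finite carriers. A sketch (the helper names `leq` and `is_extensional` are my own), testing extensionality as antisymmetry:

```python
# Derive the preorder ≤ from an arbitrary relation R on a finite carrier V,
# and test extensionality as antisymmetry of ≤.
def leq(R, V):
    """a ≤ b iff every x with (x, a) ∈ R also satisfies (x, b) ∈ R."""
    members = {a: {x for (x, y) in R if y == a} for a in V}
    return {(a, b) for a in V for b in V if members[a] <= members[b]}

def is_extensional(R, V):
    le = leq(R, V)
    return all(a == b for (a, b) in le if (b, a) in le)

# A small well-founded "membership" relation: 0 = {}, 1 = {0}, 2 = {0, 1}.
V = {0, 1, 2}
E = {(0, 1), (0, 2), (1, 2)}
print(is_extensional(E, V))        # → True: distinct "sets" differ in members
print(is_extensional(set(), V))    # → False: all three elements are memberless
```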

Then, by adding in a suitable “singleton” operator $s: V \to V$ so that

$x \in y$ if and only if $s(x) = \{x\} \leq y$

the rest of the ZFC axioms can be equivalently recast as conditions on the augmented poset structure $(V, \leq, s)$. In fact, Joyal and Moerdijk wrote a slim volume, Algebraic Set Theory, which gives a precise (and for a categorist, attractive) sense in which models of axiomatic frameworks like ZF can be expressed as certain initial algebras [of structure type $(V, \leq, s)$] within an ambient category of classes, effectively capturing the “cumulative hierarchy” conception underlying ZFC in categorical fashion.

The structure of a ZFC poset $(V, \leq)$ is rich and interesting, but in some ways a little odd or inconvenient: e.g., it has a bottom element of course (the “empty set”), but no top (which would run straight into Russell’s paradox). Categorically, there are some cute things to point out about this poset, usually left unsaid; for example, taking “unions” is left adjoint to taking “power sets”:

$\bigcup F \leq X$  if and only if  $F \leq PX$.
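For finite sets this adjunction can be checked directly; a quick sketch (my own illustration, not part of the theory’s formal development):

```python
from itertools import chain, combinations

def union(F):
    """⋃F for a family F of frozensets."""
    return frozenset(chain.from_iterable(F))

def powerset(X):
    """PX as a set of frozensets."""
    return {frozenset(s) for r in range(len(X) + 1) for s in combinations(X, r)}

F = {frozenset({1}), frozenset({2, 3})}
# ⋃F ≤ X iff F ≤ PX, checked for two choices of X (one True, one False):
for X in (frozenset({1, 2, 3, 4}), frozenset({1, 2})):
    assert (union(F) <= X) == (F <= powerset(X))
```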

In summary: ZFC is an axiomatic theory (in the language of first-order logic with equality), with one basic type $V$ and one basic predicate $\in$ of binary type $V \times V$, satisfying a number of axioms. The key philosophic point is that there is no typed distinction between “elements” and “sets”: both are of type $V$, and there is a consequent very complicated dynamical “mixing” which results just on the basis of a short list of axioms: enough in principle to found all of present-day mathematics! I think the fact that one gets such great power, so economically, from apparently such slender initial data, is a source of great pride and pleasure among those who uphold the ZFC conception (or that of close kin like NBG) as a gold standard in foundations.

My own reaction is that ZFC is perhaps way too powerful! For example, the fact that $\in$ is an endo-relation makes possible the kind of feedback which can result in things like Russell’s paradox, if one is not careful. Even if one is free from the paradoxes, though, the point remains that ZFC pumps out not only all of mathematics, but all sorts of dross and weird by-products that are of no conceivable interest or relevance to mathematics. One might think, for example, that to understand a model of ZFC, we have to be able to understand which definable pairs $(a, b)$ satisfy $a \in b$. So, in principle, we can ask ourselves such otherwise meaningless gibberish as “what in our model and implementation is the set-theoretic intersection of the real number $\pi$ and Cantor space?” and expect to get a well-defined answer. When you get right down to it, the idea that everything in mathematics (like say the number $e$) is a “set” is just plain bizarre, and actually very far removed from the way mathematicians normally think. And yet this is how we are encouraged to think, if we are asked to take ZFC seriously as a foundation.

One might argue that all expressions and theorems of normal mathematics are interpretable or realizable in the single theory ZFC, and that’s really all we ever asked for — the details of the actual implementation (like, ‘what is an ordered pair?’) being generally of little genuine interest to mathematicians (which is why the mathematician in the street who says ZFC is so great usually can’t say with much specificity what ZFC is). But this would seem to demote ZFC foundations, for most mathematicians, to a security blanket — nice to know it’s there, maybe, but otherwise fairly irrelevant to their concerns. But if there really is such a disconnect between how a mathematician thinks of her materials at a fundamental level and how they specifically get coded up as trees in ZFC, with such huge wads of uninteresting or irrelevant stuff in the midst, we might re-examine just how appropriate ZFC is as “foundations” of our subject, or at least ask ourselves how much of it we usefully retain and how we might eliminate the dross.

We turn now to consider a categorical approach, ETCS. This will require retooling the way we think of mathematical membership. There are three respects in which “membership” or “elementhood” differs here from the way it is handled in ZFC:

1. “Elements” and “sets” are entities of different types. (Meaning, elements are not themselves presupposed to be sets.)
2. When we say “element”, we never mean an object considered in isolation; we always consider it relative to the specific “set” it is considered to be a member of. (That is, strictly speaking, the same object is never thought of as “belonging to” two distinct sets — use of such language must be carefully circumscribed.)
3. We consider not just “elements” in the usual sense, but what are sometimes called “generalized elements”. Civilians call them “functions”. Thus, an element of type $X$ over a domain of variation $U$ is fancy terminology for a function $f: U \to X$. We will call them functions or “generalized elements”, depending on the intuition we have in mind. A function $x: 1 \to X$ corresponds to an ordinary element of $X$.

Each of these corresponds to some aspect of normal practice, but taken together they are sufficiently different in how they treat “membership” that they might need some getting used to. The first corresponds to a decision to treat elements of a “set” like $\mathbb{R}$ as ‘urelements’: they are not considered to have elements themselves and are not considered as having any internal structure; they are just atoms. What counts in a mathematical structure then is not what the constituents are ‘like’ themselves, but only how they are interrelated among themselves qua the structure they are considered to be part of.

This brings us right to the second point. It corresponds e.g. to a decision never to consider a number like ‘3’ in isolation or as some Platonic essence, but always with respect to an ambient system to which it is bound, as in ‘3 qua natural number’, ‘3 qua rational number’, etc. It is a firm resolve to always honor context-dependence. Naturally, we can in a sense transport ‘3’ from one context to another via a specified function like $\mathbb{N} \to \mathbb{Q}$, but strictly speaking we’ve then changed the element. This raises interesting questions, like “what if anything plays the role of extensionality?”, or “what does it mean to take the intersection of sets?”. (Globally speaking, in ETCS we don’t — but we can, with a bit of care, speak of the intersection of two “subsets” of a given set. For any real mathematical purpose, this is good enough.)

My own sense is that it may be this second precept which takes the most getting used to — it certainly gives the lie to sometimes-heard accusations that categorical set theory is just a “slavish translation of ZFC into categorical terms”. Clearly, we are witnessing here a radical departure from how membership is treated in ZFC. Such unbending commitment to the principle of context-dependence might even be felt to be overkill, a perhaps pedantic exercise in austerity or purity, or to rob us of some freedom in how we want to manipulate sets. A few quick answers: no, we don’t lose any essential freedoms. Yes, the formal language may seem slightly awkward or stilted at certain points, but the bridges between the naive and formal are mercifully fairly obvious and easily navigated. Lastly, by treating membership not as a global endo-relation on sets, but as local and relative, we effectively eliminate all the extraneous dreck and driftwood which one rightly ignores when examining the mathematics of ZFC.

The third precept is familiar from the way category theorists and logicians have used generalized elements to extend set-theoretic notation, e.g., to chase diagrams in abelian categories, or to describe sheaf semantics of intuitionistic set theory, or to flesh out the Curry-Howard isomorphism. It is a technical move in some sense, but one which is easy to grow accustomed to, and very convenient. In ETCS, there is a strong “extensionality principle” (technically, the requirement that the terminal object is a generator) which guarantees enough “ordinary” elements $x: 1 \to X$ to make any distinctions that can sensibly be made, but experience with topos theory suggests that for many applications, it is often convenient to drop or significantly modify that principle. If anything in ETCS is a nod to traditional set theory, it is such a strong extensionality principle. [The Yoneda principle, which deals with generalized elements, is also an extensionality principle: it says that a set is determined uniquely (to within uniquely specified isomorphism) by its generalized elements.]

Okay, it is probably time to lay out the axioms of ETCS. The basic data are just those of a category; here, we are going to think of objects as “sets”, and morphisms $x: a \to b$ as functions or equivalently as “elements of a set $b$ over a domain of variation $a$“. The latter is a mouthful, and it is sometimes convenient to suppress explicit mention of the domain $a$, so that “$x \in b$” just means some morphism $x$ with codomain $b$. More on this below. The axioms of ETCS are the axioms of category theory, plus existence axioms which guarantee enough structure to express and support naive set theory (under the strictures imposed by precepts 1-3 above). For those who speak the lingo, the axioms below are those of a well-pointed topos with natural number object and axiom of choice. (This can be augmented later with a replacement axiom, so as to achieve bi-interpretability with full ZFC.)

Remark: As ETCS breaks the “dynamical” aspects of ZFC, and additionally treats issues of membership in a perhaps unaccustomed manner, its axioms do take longer to state. This should come as no surprise. Actually, we’ve discussed some of them already in other posts on category theory; we will repeat ourselves but make some minor adjustments to reflect normal notational practice of naive set theory, and build bridges between the naive and formal.

Axiom of products. For any two sets $a, b$, there is a set $c$ and functions $p_1: c \to a$, $p_2: c \to b$, such that given two elements $x \in a, y \in b$ over the same domain, there exists a unique element $\langle x, y \rangle \in c$ over that domain for which

$p_1\langle x, y \rangle = x \qquad p_2\langle x, y \rangle = y$

A choice of product $c$ is usually denoted $a \times b$. To make a bridge with naive set-theory notation, we suggestively write

$a \times b :=_i \{\langle x, y \rangle: x \in a, y \in b\}$

where the funny equality sign and bracketing notation on the right simply mean that the cartesian product is uniquely defined up to isomorphism by its collection of (generalized) elements, which correspond to pairs of elements, in accordance with the Yoneda principle as explained in the discussion here.

We also assume the existence of an “empty product” or terminal object 1: this is a set with a unique element $x \in 1$ over any domain.

Axiom of equalizers. For any two functions $\displaystyle f, g: a \stackrel{\to}{\to} b$, there exists a function $i: e \to a$ such that

1. $f i = g i$,
2. Given $x \in a$ over some domain such that $f x = g x$, there exists a unique $x' \in e$ over the same domain such that $i x' = x$.

An equalizer $i: e \to a$ is again defined up to isomorphism by its collection of generalized elements, denoted $e :=_i \{x \in a: f(x) = g(x)\} \hookrightarrow a$, again in accordance with the Yoneda principle.

Using the last two axioms, we can form pullbacks: given functions $f: a \to c, g: b \to c$, we can form the set denoted

$\{\langle x, y \rangle \in a \times b: f x = g y\}$

using the product and equalizer indicated by this notation.
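On finite sets, the recipe “product, then equalizer” can be carried out literally; a sketch (helper names are my own):

```python
def product(a, b):
    """A choice of product: the set of ordered pairs."""
    return {(x, y) for x in a for y in b}

def equalizer(f, g, a):
    """The subset {x ∈ a : f(x) = g(x)}."""
    return {x for x in a if f(x) == g(x)}

def pullback(f, g, a, b):
    """{(x, y) ∈ a × b : f(x) = g(y)}, built as an equalizer of a product."""
    c = product(a, b)
    return equalizer(lambda p: f(p[0]), lambda p: g(p[1]), c)

a, b = {0, 1, 2}, {0, 1, 2, 3}
f = lambda x: x % 2   # both maps land in the common codomain {0, 1}
g = lambda y: y % 2
print(sorted(pullback(f, g, a, b)))  # the pairs agreeing in parity
```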

Before stating the next axiom, a few important remarks. We recall that a function $i: a \to b$ is injective if for every $x, y \in a$ over the same domain, $i x = i y$ implies $x = y$. In that case we think of $i: a \to b$ as defining a “subset” $a$ of $b$, whose (generalized) elements correspond to those elements $z \in b$ which factor (evidently uniquely) through $i$. It is in that sense that we say $z \in b$ also “belongs to” a subset $a$ (cf. precept 2). A relation from $a$ to $b$ is an injective function or subset $r \hookrightarrow a \times b$.

Axiom of power sets. For every set $b$ there is a choice of power set $P(b)$ and a relation $\in_b \hookrightarrow b \times P(b)$, so that for every relation $r \hookrightarrow a \times b$, there exists a unique function $\chi_r: a \to P(b)$ such that $r$ is obtained up to isomorphism as the pullback

$\{\langle x, y \rangle \in a \times b: y \in_b \chi_r(x)\}$

In other words, $\langle x, y \rangle$ belongs to $r$ if and only if $\langle y, \chi_r(x) \rangle$ belongs to $\in_b$.
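For finite sets, the classifying map $\chi_r$ is just “take fibers”, and the pullback condition can be verified directly; a sketch (the coding choices are mine):

```python
def chi(r, a):
    """The classifying map χ_r: a → P(b), sending x to its fiber under r."""
    return {x: frozenset(y for (x2, y) in r if x2 == x) for x in a}

a, b = {1, 2}, {'p', 'q'}
r = {(1, 'p'), (2, 'p'), (2, 'q')}
chi_r = chi(r, a)

# r is recovered as the pullback {(x, y) : y ∈_b χ_r(x)}:
recovered = {(x, y) for x in a for y in b if y in chi_r[x]}
assert recovered == r
```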

Axiom of strong extensionality. For functions $\displaystyle f, g: a \to b$, we have $f = g$ if and only if $f x = g x$ for all “ordinary” elements $x: 1 \to a$.

Axiom of natural number object. There is a set $\mathbf{N}$, together with an element $0: 1 \to \mathbf{N}$ and a function $s: \mathbf{N} \to \mathbf{N}$, which is initial among sets equipped with such data. That is, given a set $a$ together with an element $x: 1 \to a$ and a function $f: a \to a$, there exists a unique function $h: \mathbf{N} \to a$ such that

$h(0) = x; \qquad h s = f h$

Or, in elementwise notation, $h(n+1) = f h(n)$ for every (generalized) element $n \in \mathbf{N}$, where $n+1$ means $s(n)$. Under strong extensionality, we may drop the qualifier “generalized”.
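The universal property amounts to saying that $h$ is $n$-fold iteration of $f$ starting from $x$; a sketch (my own illustration):

```python
def rec(x, f):
    """The unique h: N → a with h(0) = x and h(n+1) = f(h(n))."""
    def h(n):
        value = x
        for _ in range(n):   # h(n) is n-fold iteration of f starting at x
            value = f(value)
        return value
    return h

double = rec(1, lambda v: 2 * v)          # defines h(n) = 2^n by recursion
print([double(n) for n in range(5)])      # → [1, 2, 4, 8, 16]
```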

Before stating the last axiom, we formulate a notion of “surjective” function: $f: a \to b$ is surjective if for any two functions $g, h: b \to c$, we have $g = h$ if and only if $g f = h f$. This is dual to the notion of being injective, and under the axiom of strong extensionality, is equivalent to the familiar notion: that $f$ is surjective if for every element $y: 1 \to b$, there exists an element $x: 1 \to a$ such that $f x = y$.

Axiom of choice. Every surjective function $s: a \to b$ admits a section, i.e., a function $i: b \to a$ such that $s i = 1_b$, the identity function.
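For finite sets a section can always be computed by search; a sketch (the search is my own illustration; the axiom’s real force lies in the infinite case, where no such algorithm is available):

```python
def section(s, a, b):
    """Return i: b → a with s∘i = identity, given s: a → b surjective."""
    i = {}
    for x in a:
        i.setdefault(s(x), x)            # pick one preimage for each fiber
    assert set(i) == set(b), "s is not surjective"
    return lambda y: i[y]

a, b = {0, 1, 2, 3}, {0, 1}
s = lambda x: x % 2                      # a surjection onto {0, 1}
i = section(s, a, b)
assert all(s(i(y)) == y for y in b)      # s∘i = identity on b
```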

This completes the list of axioms for ETCS. I have been at pains to try to describe them in notation which is natural from the standpoint of naive set theory, with the clear implication that any formula of naive set theory is readily translated into the theory ETCS (provided we pay appropriate attention to our precepts governing membership), and that this theory provides a rigorous foundation for mainstream mathematics.

To make good on this claim, further discussion is called for. First, I have not discussed how classical first-order logic is internalized in this setting (which we would need to do justice to a comprehension or separation scheme), nor have I discussed the existence or construction of colimits. I plan to take this up later, provided I have the energy for it. Again, the plan would be to stick as closely as possible to naive set-theoretic reasoning. (This might actually be useful: the categorical treatments found in many texts tend to be technical, often involving things like monad theory and Beck’s theorem, which makes it hard for those not expert in category theory to get into. I want to show this need not be the case.)

Also, some sort of justification for the claim that ETCS “founds” mainstream mathematics is called for. Minimally, one should sketch how the reals are constructed, for instance, and one should include enough “definability theory” to make it plausible that almost all constructions in ordinary mathematics find a natural home in ETCS. What is excluded? Mainly certain parts of set theory, and parts of category theory (ha!) which involve certain parts of set theory, but this is handled by strengthening the theory with more axioms; I particularly have in mind a discussion of the replacement axiom, and perhaps large cardinal axioms. More to come!

After this brief (?) categorical interlude, I’d like to pick up the main thread again, and take a closer look at some of the ingredients of baby Stone duality in the context of categorical algebra, specifically through the lens of adjoint functors. By passing a topological light through this lens, we will produce the spectrum of a Boolean algebra: a key construction of full-fledged Stone duality!

Just before the interlude, we were discussing some consequences of baby Stone duality. Taking it from the top, we recalled that there are canonical maps

$i_B: B \to \hom(\mbox{Bool}(B, \mathbf{2}), \mathbf{2}): b \mapsto (\phi \mapsto \phi(b))$

$j_X: X \to \mbox{Bool}(\hom(X, \mathbf{2}), \mathbf{2}): x \mapsto (\sigma \mapsto \sigma(x))$

in the categories of Boolean algebras $B$ and sets $X$. We said these are “natural” maps (even before the notion of naturality had been formally introduced), and recalled our earlier result that these are isomorphisms when $B$ and $X$ are finite (which is manifestly untrue in general; for instance, if $B$ is a free Boolean algebra generated by a countable set, then for simple reasons of cardinality $B$ cannot be a power set).

What we have here is an adjoint pair of functors between the categories $Set$ and $\mbox{Bool}$ of sets and Boolean algebras, each given by a hom-functor:

$(P^{op} = \hom(-, \mathbf{2})^{op}: Set \to \mbox{Bool}^{op}) \dashv (Q = \mbox{Bool}(-, \mathbf{2}): \mbox{Bool}^{op} \to Set)$

($P^{op}$ acts the same way on objects and morphisms as $P: Set^{op} \to \mbox{Bool}$, but is regarded as mapping between the opposite categories). This actually says something very simple: that there is a natural bijection between Boolean algebra maps and functions

$\displaystyle \frac{\phi: B \to \hom(X, \mathbf{2})}{\hat{\phi}: X \to \mbox{Bool}(B, \mathbf{2})}$

given by the formula $\hat{\phi}(x)(b) = \phi(b)(x)$. [The very simple nature of this formula suggests that it's nothing special to Boolean algebras -- a similar adjunction could be defined for any algebraic theory defined by operations and (universally quantified) equations, replacing $\mathbf{2}$ by any model of that theory.] The unit of the adjunction at $X$ is the function $j_X: X \to QP^{op}X$, and the counit at $B$ is the Boolean algebra map $i_B$ (regarded as a morphism $\varepsilon_B: P^{op}QB \to B$ mapping the other way in the opposite category $\mbox{Bool}^{op}$).
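As the formula suggests, the bijection is nothing more than argument-swapping; a finite sketch (the toy data are my own, with $\mathbf{2}$ modeled by Python booleans):

```python
def transpose(phi):
    """Send φ: B → hom(X, 2) to φ̂: X → Bool(B, 2) by swapping arguments."""
    return lambda x: (lambda b: phi(b)(x))

# Toy data: B = {'b1', 'b2'}, X = {0, 1}.
table = {'b1': lambda x: x == 0, 'b2': lambda x: True}
phi = lambda b: table[b]
phi_hat = transpose(phi)

# The defining formula φ̂(x)(b) = φ(b)(x) holds on the nose:
assert all(phi_hat(x)(b) == phi(b)(x) for x in (0, 1) for b in table)
```

(Of course, this sketch only exhibits the formula; that $\hat{\phi}$ lands in Boolean algebra maps when $\phi$ does is part of the adjunction proper.)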

The functor $QP^{op}$ is usually described in the language of ultrafilters, as I will now explain.

Earlier, we remarked that an ultrafilter in a Boolean algebra is a maximal filter, dual to a maximal ideal; let’s recall what that means. A maximal ideal $I$ in a Boolean ring $B$ is the kernel of a (unique) ring map

$\phi: B \to \mathbf{2}$

i.e., has the form $I = \phi^{-1}(0)$ for some such map. Being an ideal, it is an additive subgroup $I \subseteq B$ such that $x \in B, y \in I$ implies $x y = x \wedge y \in I$. It follows that if $x, y \in I$, then $x \vee y = x + y + xy \in I$, so $I$ is closed under finite joins (including the empty join $0 = \bot$). Also, if $x \leq y$ and $y \in I$, then $x = x \wedge y = xy \in I$, so that $I$ is “downward-closed”.

Conversely, a downward-closed subset $I \subseteq B$ which is closed under finite joins is an ideal in $B$ (exercise!). Finally, if $I$ is a maximal ideal, then under the quotient map

$\phi: B \to B/I \cong \mathbf{2}$

we have that for all $b \in B$, either $\phi(b) = 0$ or $\phi(b) = 1$, i.e., that either $b \in I$ or $\neg b = 1 - b \in I$.

Thus we have redefined the notion of maximal ideal in a Boolean algebra in the first-order theory of posets: a downward-closed set $I \subseteq B$ closed under finite joins, such that every element or its complement (but never both!) is contained in $I$. [If both $x, \neg x \in I$, then $x \vee \neg x = 1 \in I$, whence $b \in I$ for all $b \in B$ (since $b \leq 1$ and $I$ is downward-closed). But then $I$ isn't a maximal (proper) ideal!]

The notion of ultrafilter is dual, so an ultrafilter in a Boolean algebra $B$ is defined to be a subset $F \subseteq B$ which

• Is upward-closed: if $x \in F$ and $x \leq y$, then $y \in F$;
• Is closed under finite meets: if $x, y \in F$, then $x \wedge y \in F$;
• Satisfies dichotomy: for every $x \in B$, exactly one of $x, \neg x$ belongs to $F$.

If $I$ is a maximal ideal, then $\neg I = \{\neg x: x \in I\}$ is an ultrafilter, and we have natural bijections between the following concepts:

Boolean algebra maps $B \to \mathbf{2}$ $\leftrightarrow$ maximal ideals $I \subseteq B$ $\leftrightarrow$ ultrafilters $F \subseteq B$

so that $QB = \mbox{Bool}(B, \mathbf{2})$ is naturally identified with the set of ultrafilters in $B$.
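For a small finite $X$, one can confirm by brute force that the ultrafilters in $PX$ are exactly the principal ones; a sketch (my own illustration):

```python
from itertools import chain, combinations

X = {0, 1, 2}
PX = [frozenset(s) for r in range(len(X) + 1) for s in combinations(X, r)]

def is_ultrafilter(F):
    """Test the three ultrafilter conditions listed above."""
    upward = all(T in F for S in F for T in PX if S <= T)
    meets = all(S & T in F for S in F for T in F)
    dichotomy = all((S in F) != (frozenset(X) - S in F) for S in PX)
    return upward and meets and dichotomy

families = map(frozenset, chain.from_iterable(
    combinations(PX, r) for r in range(len(PX) + 1)))
ultrafilters = {F for F in families if is_ultrafilter(F)}
principal = {frozenset(S for S in PX if x in S) for x in X}
assert ultrafilters == principal   # on a 3-point set, exactly prin(0..2)
```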

If $X$ is a set, then an ultrafilter on $X$ is by definition an ultrafilter in the Boolean algebra $P X$. Hence $QP^{op}X$ is identified with the set of ultrafilters on $X$, usually denoted $\beta X$. The unit map

$j_X: X \to QP^{op}X \cong \beta X$

maps $x \in X$ to an ultrafilter denoted $\mbox{prin}_X(x) \subseteq PX$, consisting of all subsets $S \subseteq X$ which contain $x$, and called the principal ultrafilter generated by $x$.

We saw that when $X$ is finite, the function $j_X$ (and therefore also $\mbox{prin}_X$) is a bijection: there, every ultrafilter is principal, as part of baby Stone duality (see Proposition 4 here). Here is a slight generalization:

Proposition 1: If an ultrafilter $F$ on $X$ contains a finite set $S \subseteq X$, then $F$ is principal.

Proof: It is enough to show $F$ contains $\{x\}$ for some $x \in S$. If not, then $F$ contains the complement $\neg\{x\}$ for every $x \in S$ (by dichotomy), and therefore also the finite intersection

$\bigcap_{x \in S} \neg\{x\} = \neg S,$

which contradicts the fact that $S \in F$. $\Box$

It follows that nonprincipal ultrafilters can exist only on infinite sets $X$, and that every cofinite subset of $X$ (complement of a finite set) belongs to such an ultrafilter (by dichotomy). The collection of cofinite sets forms a filter, and so the question of existence of nonprincipal ultrafilters is the question of whether the filter of cofinite sets can be extended to an ultrafilter. Under the axiom of choice, the answer is yes:

Proposition 2: Every (proper) filter in a Boolean algebra is contained in some ultrafilter.

Proof: This is dual to the statement that every proper ideal in a Boolean ring is contained in a maximal ideal. Either statement may be proved by appeal to Zorn’s lemma: the collection of filters which contain a given filter has the property that every linear chain of such filters has an upper bound (namely, the union of the chain), and so by Zorn there is a maximal such filter. $\Box$

As usual, Zorn’s lemma is a kind of black box: it guarantees existence without giving a clue to an explicit construction. In fact, nonprincipal ultrafilters on sets $X$, like well-orderings of the reals, are notoriously inexplicit: no one has ever seen one directly, and no one ever will.

That said, one can still develop some intuition for ultrafilters. I think of them as something like “fat nets”. Each ultrafilter $F$ on a set $X$ defines a poset (of subsets ordered by inclusion), but I find it more suggestive to consider instead the opposite $F^{op}$, where $U \leq V$ in $F^{op}$ means $V \subseteq U$ — so that the further or deeper you go in $F^{op}$, the smaller or more concentrated the element. Since $F$ is closed under finite intersections, $F^{op}$ has finite joins, so that $F^{op}$ is directed (any two elements have an upper bound), just like the elements of a net (or more pedantically, the domain of a net). I call an ultrafilter a “fat net” because its elements, being subsets of $X$, are “fatter” than mere points.

Intuitively speaking, ultrafilters as nets “move in a definite direction”, in the sense that given an element $U \in F$, however far in the net, and given a subset $T \subseteq U$, the ultrafilter-as-net sniffs out a direction in which to proceed, “tunneling” either into $T$ if $T \in F$, or into its relative complement $U \cap \neg T$ if this belongs to $F$. In the case of a principal ultrafilter, there is a final element $x$ of the net; otherwise not (but we can think of a nonprincipal ultrafilter as ending at an “ideal point” of the set $X$ if we want).

Since the intuitive imagery here is already vaguely topological, we may as well make the connection with topology more precise. So, suppose now that $X$ comes equipped with a topology. We say that an ultrafilter $F$ on $X$ converges to a point $x \in X$ if each open set $U$ containing $x$ (or each neighborhood of $x$) belongs to the ultrafilter. In other words, by going deep enough into the ultrafilter-as-net, you get within any chosen neighborhood of the point. We write $F \to x$ to say that $F$ converges to $x$.

General topology can be completely developed in terms of the notion of ultrafilter convergence, often very efficiently. For example, starting with any relation whatsoever between ultrafilters and points,

$c \subseteq \beta(X) \times X,$

we can naturally define a topology $\mbox{top}(c)$ on $X$ so that

$F \to x$ with respect to $\mbox{top}(c)$ whenever $(F, x) \in c$.

Let’s tackle that in stages: in order for the displayed condition to hold, a neighborhood of $x$ must belong to every ultrafilter $F$ for which $(F, x) \in c$. This suggests that we try defining the filter $N_x$ of neighborhoods of $x$ to be the intersection of ultrafilters

$N_x = \bigcap_{F: (F, x) \in c} F.$

Then define a subset $U \subseteq X$ to be open if it is a neighborhood of all the points it contains. In other words, define $U$ to be open if

$\forall_{(F, x) \in c} x \in U \Rightarrow U \in F.$

Proposition 3: This defines a topology, $\mbox{top}(c)$.

Proof: Since $X \in F$ for every ultrafilter $F$, it is clear that $X$ is open; also, it is vacuously true that the empty set is open. If $U, V$ are open, then for all $(F, x) \in c$, whenever $x \in U \cap V$, we have $x \in U$ and $x \in V$, so that $U \in F$ and $V \in F$ by openness, whence $U \cap V \in F$ since $F$ is closed under intersections. So $U \cap V$ is also open. Finally, suppose $U_i$ is a collection of open sets. For all $(F, x) \in c$, if $x \in \bigcup_i U_i$, then $x \in U_i$ for some $i$, so that $U_i \in F$ by openness, whence $\bigcup_i U_i \in F$ since ultrafilters are upward closed. So $\bigcup_i U_i$ is also open. $\Box$.
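On a finite set every ultrafilter is principal (Proposition 1), so a convergence relation $c$ can be coded as a set of pairs of points, with $(p, x)$ meaning $\mbox{prin}(p) \to x$; then $U \in \mbox{prin}(p)$ just means $p \in U$, and the definition of $\mbox{top}(c)$ becomes directly computable. A sketch (the coding is mine):

```python
from itertools import combinations

def top(c, X):
    """Open sets of top(c): U with p ∈ U whenever (p, x) ∈ c and x ∈ U."""
    candidates = (set(s) for r in range(len(X) + 1)
                  for s in combinations(sorted(X), r))
    return [U for U in candidates
            if all(p in U for (p, x) in c if x in U)]

X = {0, 1, 2}
c = {(0, 1)}             # declare that the "net" prin(0) converges to 1
opens = top(c, X)
print([sorted(U) for U in opens])
# → [[], [0], [2], [0, 1], [0, 2], [0, 1, 2]]; {1} is not open, since
#   1 ∈ {1} would force 0 ∈ {1}.
```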

Let’s recap: starting from a topology $\tau$ on $X$, we’ve defined a convergence relation $\mbox{conv}(\tau) \subseteq \beta(X) \times X$ (consisting of pairs $(F, x)$ such that $F \to x$), and conversely, given any relation $c \subseteq \beta(X) \times X$, we’ve defined a topology $\mbox{top}(c)$ on $X$. What we actually have here is a Galois connection where

$c \subseteq \mbox{conv}(\tau)$ if and only if $\tau \subseteq \mbox{top}(c)$

Of course not every relation $c \subseteq \beta(X) \times X$ is the convergence relation of a topology, so we don’t quite have a Galois correspondence (that is, $\mbox{conv}$ and $\mbox{top}$ are not quite inverse to one another). But, it is true that every topology $\tau$ is the topology of its ultrafilter convergence relation, i.e., $\tau = \mbox{top(conv}(\tau))$. For this, it suffices to show that every neighborhood filter $N_x$ is the intersection of the ultrafilters that contain it. But that is true of any filter:

Theorem 1: If $N$ is a filter in $PX$ and $A \notin N$, then there exists an ultrafilter $F$ for which $N \subseteq F$ and $A \notin F$.

Proof: First I claim $\neg A \cap B \neq 0$ for all $B \in N$; otherwise $\neg A \cap B = 0$ for some $B \in N$, whence $B \subseteq \neg \neg A = A$, so that $A \in N$ since filters are upward closed, contradiction. It follows that $N$ can be extended to the (proper) filter

$\{C \in PX: \exists_{B \in N} \neg A \cap B \subseteq C\}$

which in turn extends to some ultrafilter $F$, by proposition 2. Since $\neg A \in F$, we have $A \notin F$. $\Box$

Corollary 1: Every filter is the intersection of all the ultrafilters which contain it.

The ultrafilter convergence approach to topology is particularly convenient for studies of compactness:

Theorem 2: A space $X$ is compact if and only if every ultrafilter $F$ converges to at least one point. It is Hausdorff if and only if every ultrafilter converges to at most one point.

Proof: First suppose that $X$ is compact, and (aiming for a contradiction) that $F$ converges to no point of $X$. This means that for every $x \in X$ there is a neighborhood $U_x$ which does not belong to $F$, so that $\neg U_x \in F$ by dichotomy. Finitely many of these $U_x$ cover $X$, by compactness. By De Morgan’s law, this means the corresponding finitely many $\neg U_x$ have empty intersection. But this would mean $\emptyset \in F$, since $F$ is closed under finite intersections, contradiction.

In the reverse direction, suppose that every ultrafilter converges. We need to show that if $C = \{U_i\}_{i \in I}$ is any collection of open subsets of $X$ such that no finite subcollection covers $X$, then the union of the $U_i$ cannot cover $X$. First, because no finite subcollection covers, we may construct a filter generated by the complements:

$F = \{A \subseteq X: \bigcap_{j=1}^n \neg U_{i_j} \subseteq A \mbox{ for finitely many } U_{i_1}, \ldots, U_{i_n} \in C\}.$

Extend this filter to an ultrafilter $G$; then by assumption $\exists_{x \in X} G \to x$. If some one of the $U_i$ contained $x$, then $U_i \in G$ by definition of convergence. But we also have $\neg U_i \in F \subseteq G$, and this is a contradiction. So, $x$ lies outside the union of the $U_i$, as was to be shown.

Now let $X$ be Hausdorff, and suppose that $F \to x$ and $F \to y$. Let $U_x, U_y$ be neighborhoods of $x, y$ respectively with empty intersection. By definition of convergence, we have $U_x, U_y \in F$, whence $\emptyset = U_x \cap U_y \in F$, contradiction.

Conversely, suppose every ultrafilter converges to at most one point, and let $x, y$ be two distinct points. Unless there are neighborhoods $U_x, U_y$ of $x, y$ respectively such that $U_x \cap U_y = \emptyset$ (which is what we want), the smallest filter containing the two neighborhood filters $N_x, N_y$ (that is to say, the join $N_x \vee N_y$ in the poset of filters) is proper, and hence extends to an ultrafilter $F$. But then $N_x \subseteq F$ and $N_y \subseteq F$, which is to say $F \to x$ and $F \to y$, contradiction. $\Box$

Theorem 2 is very useful; among other things it paves the way for a clean and conceptual proof of Tychonoff’s theorem (that an arbitrary product of compact spaces is compact). For now we note that it says that a topology $\tau$ is the topology of a compact Hausdorff space structure on $X$ if and only if the convergence relation $\mbox{conv}(\tau) \subseteq \beta(X) \times X$ is a function. And in practice, functions $c: \beta(X) \to X$ which arise “naturally” tend to be such convergence relations, making $X$ a compact Hausdorff space.
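The "convergence relation is a function" criterion can be seen concretely on finite spaces, where every ultrafilter is principal and compactness is automatic, so everything reduces to the Hausdorff condition of Theorem 2. A small sanity check in Python (the two-point space and the two topologies are invented for illustration):

```python
from itertools import chain, combinations

def powerset(X):
    return [frozenset(s) for s in chain.from_iterable(
        combinations(sorted(X), r) for r in range(len(X) + 1))]

X = {0, 1}

# On a finite set the ultrafilters are principal, and the principal
# ultrafilter at x converges to y iff every open set containing y contains x.
def conv(topology):
    return {(x, y) for x in X for y in X
            if all(x in U for U in topology if y in U)}

discrete = set(powerset(X))
indiscrete = {frozenset(), frozenset(X)}

# Finite spaces are compact, so conv being a function is exactly Hausdorffness.
assert conv(discrete) == {(0, 0), (1, 1)}                    # unique limits
assert conv(indiscrete) == {(0, 0), (0, 1), (1, 0), (1, 1)}  # not a function
```

In the discrete (Hausdorff) case each ultrafilter has exactly one limit, so `conv` is a function; in the indiscrete case every ultrafilter converges to every point.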

Here is our key example. Let $B$ be a Boolean algebra, and let $X = QB = \mbox{Bool}(B, \mathbf{2})$, which we have identified with the set of ultrafilters in $B$. Define a map $c: \beta(X) \to X$ by

$\displaystyle \beta (QB) \cong QP^{op}QB \stackrel{Q\varepsilon_B}{\to} QB$

where $\varepsilon_B: P^{op}QB \to B$ was the counit (evaluated at $B$) of the adjunction $P^{op} \dashv Q$ defined at the top of this post. Unpacking the definitions a bit, the map $Q \varepsilon_B$ is the map $\mbox{Bool}(i_B, \mathbf{2})$, the result of applying the hom-functor $\mbox{Bool}(-, \mathbf{2})$ to

$i_B: B \to P^{op}QB = \hom(\mbox{Bool}(B, \mathbf{2}), \mathbf{2}): b \mapsto (\phi \mapsto \phi(b))$

Chasing this a little further, the map $c$ “pulls back” an ultrafilter $F \subseteq P^{op}QB$ to the ultrafilter $i_B^{-1}(F) \subseteq B$, viewed as an element of $QB$. We then topologize $QB$ by the topology $\mbox{top}(c)$.

This construction is about as “abstract nonsense” as it gets, but you have to admit that it’s pretty darned canonical! The topological space $QB$ we get in this way is called the spectrum of the Boolean algebra $B$. If you’ve seen a bit of algebraic geometry, then you probably know another, somewhat more elementary way of defining the spectrum (of $B$ as commutative ring), so we may as well make the connection explicit. However you define it, the result is a compact Hausdorff space structure with some other properties which make it very reminiscent of Cantor space.

It is first of all easy to see that $X = QB$ is compact, i.e., that every ultrafilter $F$ converges. Indeed, the relation $c$ is a function $\beta (QB) \to QB$, and if you look at the condition for a set $U$ to be open w.r.t. $\mbox{top}(c)$,

$\forall_{(F, x = c(F))} x \in U \Rightarrow U \in F,$

you see immediately that $F$ converges to $x = c(F)$.

To get Hausdorffness, take two distinct points $u, v \in QB$ (ultrafilters in $B$). Since these are distinct maximal filters, there exists $b \in B$ such that $b$ belongs to $u$ but not to $v$, and then $\neg b$ belongs to $v$ but not to $u$. Define

$U(b) := \{w \in QB: b \in w\}.$

Proposition 4: $U(b)$ is open in $\mbox{top}(c)$.

Proof: We must check that for all ultrafilters $F$ on $QB$, that

$c(F) \in U(b) \Rightarrow U(b) \in F.$

But $c(F) = i_B^{-1}(F)$. By definition of $U(b)$, we are thus reduced to checking that

$b \in i_B^{-1}(F) \Rightarrow U(b) \in F$

or that $i_B(b) \in F \Rightarrow U(b) \in F$. But $i_B(b) \in P^{op}QB$ (as a subset of $QB$) is $U(b)$! $\Box$

As a result, $U(b)$ and $U(\neg b)$ are open sets containing the given points $u, v$. They are disjoint since in fact $U(\neg b) = \neg U(b)$ (indeed, because $i_B$ preserves negation). This gives Hausdorffness, and also that the $U(b)$ are clopen (closed and open).

We actually get a lot more:

Proposition 5: The collection $\{U(b): b \in B\}$ is a basis for the topology $\mbox{top}(c)$ on $QB$.

Proof: The sets $U(b)$ form a basis for some topology $\tau$, because $U(b) \wedge U(b') = U(b \wedge b')$ (indeed, $i_B$ preserves meets). By the previous proposition, $\tau \subseteq \mbox{top}(c)$. So the identity on $QB$ gives a continuous comparison map

$QB_{\mbox{top}(c)} \to QB_{\tau}$

between the two topologies. But a continuous bijection from a compact space to a Hausdorff space is necessarily a homeomorphism, so $\tau = \mbox{top}(c)$. $\Box$
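For a finite Boolean algebra the whole picture collapses pleasantly: taking $B = PS$ for a finite set $S$, the ultrafilters of $B$ are principal at the atoms $\{x\}$, so $QB$ is identified with $S$ and the spectrum is discrete. The two identities used in Propositions 4 and 5 can then be checked mechanically; a sketch (the three-element $S$ is an arbitrary test case):

```python
from itertools import chain, combinations

# The finite Boolean algebra B = P(S); its ultrafilters are principal
# at the atoms {x}, so QB is identified with S itself.
S = {0, 1, 2}
B = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(S), r) for r in range(len(S) + 1))]

def U(b):
    """The basic clopen set U(b): ultrafilters (here, atoms) containing b."""
    return frozenset(x for x in S if frozenset({x}) <= b)

top = frozenset(S)
for b in B:
    for c in B:
        assert U(b & c) == U(b) & U(c)            # U preserves meets
    assert U(top - b) == frozenset(S) - U(b)      # U(not-b) = complement
```
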

• Remark: In particular, the canonical topology on $\beta X = QP^{op}X$ is compact Hausdorff; this space is called the Stone–Čech compactification of (the discrete space) $X$. The methods exploited in this lecture can be used to show that in fact $\beta X$ is the free compact Hausdorff space generated from the set $X$, meaning that the functor $\beta$ is left adjoint to the underlying-set functor $U: \mbox{CompHaus} \to Set$. In fact, one can go rather further in this vein: a fascinating result (first proved by Eduardo Manes in his PhD thesis) is that the concept of compact Hausdorff space is algebraic (is monadic with respect to the monad $\beta$): there is an equationally defined theory where the class of $J$-ary operations (for each cardinal $J$) is coded by the set of ultrafilters $\beta J$, and whose models are precisely compact Hausdorff spaces. This goes beyond the scope of these lectures, but for the theory of monads, see the entertaining YouTube lectures by the Catsters!

In our last post on category theory, we continued our exploration of universal properties, showing how they can be used to motivate the concept of natural transformation, the “right” notion of morphism $\phi: F \to G$ between functors $F, G$. In today’s post, I want to turn things around, applying the notion of natural transformation to explain generally what we mean by a universal construction. The key concept is the notion of representability, at the center of a circle of ideas which includes the Yoneda lemma, adjoint functors, monads, and other things — it won’t be possible to talk about all these things in detail (because I really want to return to Stone duality before long), but perhaps these notes will provide a key of entry into more thorough treatments.

Even for a fanatic like myself, it’s a little hard to see what would drive anyone to study category theory except a pretty serious “need to know” (there is a beauty and conceptual economy to categorical thinking, but I’m not sure that’s compelling enough motivation!). I myself began learning category theory on my own as an undergraduate; at the time I had only the vaguest glimmerings of a vast underlying unity to mathematics, but it was only after discovering the existence of category theory by accident (reading the introductory chapter of Spanier’s Algebraic Topology) that I began to suspect it held the answer to a lot of questions I had. So I got pretty fired up about it then, and started to read Mac Lane’s Categories for the Working Mathematician. I think that even today this book remains the best serious introduction to the subject — for those who need to know! But category theory should be learned from many sources and in terms of its many applications. Happily, there are now quite a few resources on the Web and a number of blogs which discuss category theory (such as The Unapologetic Mathematician) at the entry level, with widely differing applications in mind. An embarrassment of riches!

Anyway, to return to today’s topic. Way back when, when we were first discussing posets, most of our examples of posets were of a “concrete” nature: sets of subsets of various types, ordered by inclusion. In fact, we went a little further and observed that any poset could be represented as a concrete poset, by means of a “Dedekind embedding” (bearing a familial resemblance to Cayley’s lemma, which says that any group can be represented concretely, as a group of permutations). Such concrete representation theorems are extremely important in mathematics; in fact, this whole series is a trope on the Stone representation theorem, that every Boolean algebra is an algebra of sets! With that, I want to discuss a representation theorem for categories, where every (small) category can be explicitly embedded in a concrete category of “structured sets” (properly interpreted). This is the famous Yoneda embedding.

This requires some preface. First, we need the following fundamental construction: for every category $C$ there is an opposite category $C^{op}$, having the same classes $O, M$ of objects and morphisms as $C$, but with domain and codomain switched ($\mbox{dom}^{op} := \mbox{cod}: M \to O$, and $\mbox{cod}^{op} := \mbox{dom}: M \to O$). The function $O \to M: A \mapsto 1_A$ is the same in both cases, but we see that the class of composable pairs of morphisms is modified:

$(f, g) \in C^{op}_2$ [is a composable pair in $C^{op}$] if and only if $(g, f) \in C_2$

and accordingly, we define composition of morphisms in $C^{op}$ in the order opposite to composition in $C$:

$(g \circ f)^{op} := f \circ g$ in $C$.

Observation: The categorical axioms are satisfied in the structure $C^{op}$ if and only if they are in $C$; also, $(C^{op})^{op} = C$.

This observation is the underpinning of a Principle of Duality in the theory of categories (extending the principle of duality in the theory of posets). As the construction of opposite categories suggests, the dual of a sentence expressed in the first-order language of category theory is obtained by reversing the directions of all arrows and the order of composition of morphisms, but otherwise keeping the logical structure the same. Let me give a quick example:

Definition: Let $X_1, X_2$ be objects in a category $C$. A coproduct of $X_1$ and $X_2$ consists of an object $X$ and maps $i_1: X_1 \to X$, $i_2: X_2 \to X$ (called injection or coprojection maps), satisfying the universal property that given an object $Y$ and maps $f_1: X_1 \to Y$, $f_2: X_2 \to Y$, there exists a unique map $f: X \to Y$ such that $f_1 = f \circ i_1$ and $f_2 = f \circ i_2$. $\Box$

This notion is dual to the notion of product. (Often, one indicates the dual notion by appending the prefix “co” — except of course if the “co” prefix is already there; then one removes it.) In the category of sets, the coproduct of two sets $X_1, X_2$ may be taken to be their disjoint union $X_1 + X_2$, where the injections $i_1, i_2$ are the inclusion maps of $X_1, X_2$ into $X_1 + X_2$ (exercise).
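In $Set$ the universal property can be exercised directly. Below is a minimal sketch: the tags `1` and `2` are one arbitrary way of making the union disjoint, and `copair` builds the unique mediating map from a pair $f_1, f_2$:

```python
def coproduct(X1, X2):
    """Disjoint union of two sets, with tagged injection maps i1, i2."""
    X = [(1, x) for x in X1] + [(2, x) for x in X2]
    i1 = lambda x: (1, x)
    i2 = lambda x: (2, x)
    return X, i1, i2

def copair(f1, f2):
    """The unique f: X1 + X2 -> Y with f o i1 = f1 and f o i2 = f2."""
    return lambda t: f1(t[1]) if t[0] == 1 else f2(t[1])

X, i1, i2 = coproduct({'a', 'b'}, {0, 1})
f = copair(str.upper, str)            # f1 on letters, f2 on numbers
assert f(i1('a')) == 'A' and f(i2(0)) == '0'
```

Uniqueness is visible in the code: since every element of the disjoint union is $i_1(x)$ or $i_2(x)$ for exactly one $x$, the equations $f \circ i_1 = f_1$ and $f \circ i_2 = f_2$ leave no freedom in defining $f$.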

Exercise: Formulate the notion of coequalizer (by dualizing the notion of equalizer). Describe the coequalizer of two functions $\displaystyle f, g: X \stackrel{\to}{\to} Y$ (in the category of sets) in terms of equivalence classes. Then formulate the notion dual to that of monomorphism (called an epimorphism), and by a process of dualization, show that in any category, coequalizers are epic.
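For the $Set$ part of this exercise, one way to check your description is computationally: the coequalizer of $f, g: X \to Y$ is the quotient of $Y$ by the equivalence relation generated by $f(x) \sim g(x)$. The union-find representation below is my own implementation choice, not part of the exercise:

```python
def coequalizer(f, g, X, Y):
    """Quotient map q: Y -> Y/~ for the equivalence relation generated
    by f(x) ~ g(x), computed with union-find."""
    parent = {y: y for y in Y}
    def find(y):
        while parent[y] != y:
            parent[y] = parent[parent[y]]   # path halving
            y = parent[y]
        return y
    for x in X:
        a, b = find(f(x)), find(g(x))
        if a != b:
            parent[a] = b                   # glue the two classes
    return {y: find(y) for y in Y}          # q sends y to its class rep

q = coequalizer(lambda x: x, lambda x: x + 1, range(3), range(5))
# 0 ~ 1 ~ 2 ~ 3 are glued together; 4 stands alone: two classes
assert len(set(q.values())) == 2
assert q[0] == q[3] and q[4] != q[0]
```

By construction $q(f(x)) = q(g(x))$ for all $x$, and any map out of $Y$ coequalizing $f, g$ is constant on these classes, hence factors uniquely through $q$.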

Principle of duality: If a sentence expressed in the first-order theory of categories is provable in the theory, then so is the dual sentence. Proof (sketch): A proof $P$ of a sentence proceeds from the axioms of category theory by applying rules of inference. The dualization of $P$ proves the dual sentence by applying the same rules of inference but starting from the duals of the categorical axioms. A formal proof of the Observation above shows that collectively, the set of categorical axioms is self-dual, so we are done. $\Box$

Next, we introduce the all-important hom-functors. We suppose that $C$ is a locally small category, meaning that the class of morphisms $g: c \to d$ between any two given objects $c, d$ is small, i.e., is a set as opposed to a proper class. Even for large categories, this condition is just about always satisfied in mathematical practice (although there is the occasional baroque counterexample, like the category of quasitopological spaces).

Let $Set$ denote the category of sets and functions. Then, there is a functor

$\hom_C: C^{op} \times C \to Set$

which, at the level of objects, takes a pair of objects $(c, d)$ to the set $\hom(c, d)$ of morphisms $g: c \to d$ (in $C$) between them. It takes a morphism $(f: c \to c', h: d \to d')$ of $C^{op} \times C$ (that is to say, a pair of morphisms $(f: c' \to c, h: d \to d')$ of $C$) to the function

$\hom_C(f, h): \hom(c, d) \to \hom(c', d'): g \mapsto hgf.$

Using the associativity and identity axioms in $C$, it is not hard to check that this indeed defines a functor $\hom_C: C^{op} \times C \to Set$. It generalizes the truth-valued pairing $P^{op} \times P \to \mathbf{2}$ we defined earlier for posets.

Now assume $C$ is small. From last time, there is a bijection between functors

$\displaystyle \frac{h: C^{op} \times C \to Set}{f: C \to Set^{C^{op}}}$

and by applying this bijection to $\hom_C: C^{op} \times C \to Set$, we get a functor

$y_C: C \to Set^{C^{op}}.$

This is the famous Yoneda embedding of the category $C$. It takes an object $c$ to the hom-functor $\hom_C(-, c): C^{op} \to Set$. This hom-functor can be thought of as a structured, disciplined way of considering the totality of morphisms mapping into the object $c$, and has much to do with the Yoneda Principle we stated informally last time (and which we state precisely below).

• Remark: We don’t need $C$ to be small to talk about $\hom_C(-, c)$; local smallness will do. The only place we ask that $C$ be small is when we are considering the totality of all functors $C^{op} \to Set$, as forming a category $\displaystyle Set^{C^{op}}$.

Definition: A functor $F: C^{op} \to Set$ is representable (with representing object $c$) if there is a natural isomorphism $\displaystyle \hom_C(-, c) \stackrel{\sim}{\to} F$ of functors.

The concept of representability is key to discussing what is meant by a universal construction in general. To clarify its role, let’s go back to one of our standard examples.

Let $c_1, c_2$ be objects in a category $C$, and let $F: C^{op} \to Set$ be the functor $\hom(-, c_1) \times \hom(-, c_2)$; that is, the functor which takes an object $b$ of $C$ to the set $\hom(b, c_1) \times \hom(b, c_2)$. Then a representing object for $F$ is a product $c_1 \times c_2$ in $C$. Indeed, the isomorphism between sets $\hom(b, c_1 \times c_2) \cong \hom(b, c_1) \times \hom(b, c_2)$ simply recapitulates that we have a bijection

$\displaystyle \frac{b \to c_1 \times c_2}{b \to c_1 \qquad b \to c_2}$

between morphisms into the product and pairs of morphisms. But wait, not just an isomorphism: we said a natural isomorphism (between functors in the argument $b$) — how does naturality figure in?

Enter stage left the celebrated

Yoneda Lemma: Given a functor $F: C^{op} \to Set$ and an object $c$ of $C$, natural transformations $\phi: \hom(-, c) \to F$ are in (natural!) bijection with elements $\xi \in F(c)$.

Proof: We apply the “Yoneda trick” introduced last time: probe the representing object $c$ with the identity morphism, and see where $\phi$ takes it: put $\xi = \phi_c(1_c)$. Incredibly, this single element $\xi$ determines the rest of the transformation $\phi$: by chasing the element $1_c \in \hom(c, c)$ around the diagram

              phi_c
  hom(c, c) --------> Fc
      |                |
  hom(f, c)            Ff
      |                |
      v                v
  hom(b, c) --------> Fb
              phi_b

(which commutes by naturality of $\phi$), we see for any morphism $f: b \to c$ in $\hom(b, c)$ that $\phi_b(f) = F(f)(\xi)$. That the bijection

$\displaystyle \frac{\xi: 1 \to F(c)}{\phi: \hom(-, c) \to F}$

is natural in the arguments $F, c$ we leave as an exercise. $\Box$

Returning to our example of the product $c_1 \times c_2$ as representing object, the Yoneda lemma implies that the natural bijection

$\displaystyle \phi_b: \hom(b, c_1 \times c_2) \cong \hom(b, c_1) \times \hom(b, c_2)$

is induced by the element $\xi = \phi_{c_1 \times c_2}(1_{c_1 \times c_2})$, and this element is none other than the pair of projection maps

$\xi = (\pi_1: c_1 \times c_2 \to c_1, \pi_2: c_1 \times c_2 \to c_2).$

In summary, the Yoneda lemma guarantees that a hom-representation $\phi: \hom(-, c) \cong F$ of a functor is, by the naturality assumption, induced in a uniform way from a single “universal” element $\xi \in F(c)$. All universal constructions fall within this general pattern.
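The Yoneda bijection can even be verified by brute force in a tiny case: view a monoid as a category with one object, so that functors $M^{op} \to Set$ are right $M$-sets and $\hom(-, \ast)$ is $M$ acting on itself. The two-element monoid `M`, the carrier `Fset`, and the action `act` below are all invented for illustration:

```python
from itertools import product

# The monoid ({0, 1}, *) with identity 1, viewed as a one-object category;
# a functor M^op -> Set is then a set with a right M-action.
M = [0, 1]
mul = lambda a, b: a * b
e = 1

# hom(-, *) is M acting on itself on the right; F is another right M-set:
# acting by 0 collapses everything to 'p', acting by 1 does nothing.
Fset = ['p', 'q']
act = lambda y, m: y if m == e else 'p'

# Natural transformations hom(-, *) -> F are the equivariant maps
# phi: M -> Fset, i.e. those with phi(m * a) = act(phi(m), a).
nats = []
for vals in product(Fset, repeat=len(M)):
    phi = dict(zip(M, vals))
    if all(phi[mul(m, a)] == act(phi[m], a) for m in M for a in M):
        nats.append(phi)

# The Yoneda bijection: phi <-> phi(e), one transformation per element of F.
assert sorted(phi[e] for phi in nats) == sorted(Fset)
```

Of the four candidate maps $M \to F(\ast)$, exactly two are natural, matching the two elements of $F(\ast)$, and each is recovered from its value at the identity just as in the proof above.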

Example: Let $C$ be a category with products, and let $c, d$ be objects. Then a representing object for the functor $\hom(- \times c, d): C^{op} \to Set$ is an exponential $d^c$; the universal element $\xi \in \hom(d^c \times c, d)$ is the evaluation map $d^c \times c \to d$.

Exercise: Let $\displaystyle f, g: x \stackrel{\to}{\to} y$ be a pair of parallel arrows in a category $C$. Describe a functor $F: C^{op} \to Set$ which is represented by an equalizer of this pair (assuming one exists).

Exercise: Dualize the Yoneda lemma by considering hom-functors $\hom_C(c, -): C \to Set$. Express the universal property of the coproduct in terms of representability by such hom-functors.

The Yoneda lemma has a useful corollary: for any (locally small) category $C$, there is a natural isomorphism

$\displaystyle \frac{\hom_C(-, a) \to \hom_C(-, b)}{a \to b}$

between natural transformations between hom-functors and morphisms in $C$. Using $C(a, b)$ as alternate notation for the hom-set, the action of the Yoneda embedding functor $y_C$ on morphisms gives an isomorphism between hom-sets

$\displaystyle C(a, b) \stackrel{\sim}{\to} Set^{C^{op}}(y_C a, y_C b);$

the functor $y_C$ is said in that case to be fully faithful (faithful means this action on morphisms is injective for all $a, b$, and full means the action is surjective for all $a, b$). The Yoneda embedding $y_C$ thus maps $C$ isomorphically onto the category of hom-functors $y_C a = \hom_C(-, a)$ valued in the category $Set$.

It is illuminating to work out the meaning of this last statement in special cases. When the category $C$ is a group $G$ (that is, a category with exactly one object $\bullet$ in which every morphism is invertible), then functors $F: G^{op} \to Set$ are tantamount to sets $X$ equipped with a group homomorphism $G^{op} \to \hom(X, X)$, i.e., a left action of $G^{op}$, or a right action of $G$. In particular, $\hom(-, \bullet): G^{op} \to Set$ is the underlying set of $G$, equipped with the canonical right action $\rho: G \to \hom(G, G)$, where $\rho(g)(h) = hg$. Moreover, natural transformations between functors $G^{op} \to Set$ are tantamount to morphisms of right $G$-sets. Now, the Yoneda embedding

$y_G: G \to Set^{G^{op}}$

identifies any abstract group $G$ with a concrete group $y_G(G)$, i.e., with a group of permutations — namely, exactly those permutations on $G$ which respect the right action of $G$ on itself. This is the sophisticated version of Cayley’s theorem in group theory. If on the other hand we take $C$ to be a poset, then the Yoneda embedding is tantamount to the Dedekind embedding we discussed in the first lecture.
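This sophisticated Cayley theorem can be checked by exhaustive search for a small group; the choice of $G = \mathbb{Z}/3$ below is an arbitrary test case:

```python
from itertools import permutations

# Cayley via Yoneda, checked by brute force for G = Z/3.
G = [0, 1, 2]
op = lambda a, b: (a + b) % 3

# The Yoneda embedding sends g to the left translation h |-> g*h.
left = [{h: op(g, h) for h in G} for g in G]

# Permutations of G commuting with the right action rho(g): h |-> h*g,
# i.e. sigma(h*g) = sigma(h)*g for all h, g.
equivariant = []
for p in permutations(G):
    sigma = dict(zip(G, p))
    if all(sigma[op(h, g)] == op(sigma[h], g) for h in G for g in G):
        equivariant.append(sigma)

# The right-equivariant permutations are exactly the left translations.
assert {tuple(sorted(d.items())) for d in equivariant} == \
       {tuple(sorted(d.items())) for d in left}
```

Of the six permutations of a three-element set, only the three left translations survive the equivariance test, exactly as the Yoneda embedding predicts.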

Tying up a loose thread, let us now formulate the “Yoneda principle” precisely. Informally, it says that an object is determined up to isomorphism by the morphisms mapping into it. Using the hom-functor $\hom(-, c)$ to collate the morphisms mapping into $c$, the precise form of the Yoneda principle says that an isomorphism between representables $\hom(-, c) \to \hom(-, d)$ corresponds to a unique isomorphism $c \to d$ between objects. This follows easily from the Yoneda lemma.

But far and away, the most profound manifestation of representability is in the notion of an adjoint pair of functors. “Free constructions” give a particularly ubiquitous class of examples; the basic idea will be explained in terms of free groups, but the categorical formulation applies quite generally (e.g., to free monoids, free Boolean algebras, free rings = polynomial algebras, etc., etc.).

If $X$ is a set, the free group (q.v.) generated by $X$ is, informally, the group $FX$ whose elements are finite “words” built from “literals” $a, b, c, \ldots$ which are the elements of $X$ and their formal inverses, where we identify a word with any other gotten by introducing or deleting appearances of consecutive literals $a a^{-1}$ or $a^{-1}a$. Janis Joplin said it best:

Freedom’s just another word for nothin’ left to lose…

– there are no relations between the generators of $FX$ beyond the bare minimum required by the group axioms.

Categorically, the free group $FX$ is defined by a universal property; loosely speaking, for any group $G$, there is a natural bijection between group homomorphisms and functions

$\displaystyle \frac{FX \to G}{X \to UG}$

where $UG$ denotes the underlying set of the group. That is, we are free to assign elements of $G$ to elements of $X$ any way we like: any function $f: X \to UG$ extends uniquely to a group homomorphism $\hat{f}: FX \to G$, sending a word $x_1 x_2 \ldots x_n$ in $FX$ to the element $f(x_1)f(x_2) \ldots f(x_n)$ in $G$.
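The two ingredients here, reduction of words and the unique extension $\hat{f}$, can be sketched in a few lines of Python. This is an illustrative sketch only: literals are represented as (generator, sign) pairs, and the example target group $(\mathbb{Z}, +)$ and generator assignments are invented:

```python
def reduce_word(w):
    """Cancel adjacent inverse literals; a literal is (generator, +1 or -1)."""
    out = []
    for lit in w:
        if out and out[-1][0] == lit[0] and out[-1][1] == -lit[1]:
            out.pop()            # x followed by x^-1 (or vice versa) cancels
        else:
            out.append(lit)
    return tuple(out)

def extend(f, op, inv, e):
    """The unique homomorphism FX -> G extending f: X -> UG, given the
    multiplication, inverse, and identity of G."""
    def f_hat(word):
        g = e
        for (x, sign) in word:
            g = op(g, f(x) if sign == 1 else inv(f(x)))
        return g
    return f_hat

# illustration in G = (Z, +): send generators 'a', 'b' to 2 and 3
f_hat = extend({'a': 2, 'b': 3}.get, lambda m, n: m + n, lambda n: -n, 0)
w = reduce_word([('a', 1), ('b', 1), ('b', -1), ('a', 1)])   # a b b^-1 a
assert w == (('a', 1), ('a', 1))
assert f_hat(w) == 4
```

The stack-based cancellation in `reduce_word` is the standard way to compute normal forms in a free group, and `f_hat` is forced on us literal by literal, which is precisely the uniqueness clause of the universal property.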

Using the usual Yoneda trick, or the dual of the Yoneda trick, this isomorphism is induced by a universal function $i: X \to UFX$, gotten by applying the bijection above to the identity map $id: FX \to FX$. Concretely, this function takes an element $x \in X$ to the one-letter word $x \in UFX$ in the underlying set of the free group. The universal property states that the bijection above is effected by composing with this universal map:

$\displaystyle \hom_{Grp}(FX, G) \to \hom_{Set}(UFX, UG) \stackrel{\hom(i, UG)}{\to} \hom_{Set}(X, UG)$

where the first arrow refers to the action of the underlying-set or forgetful functor $U: Grp \to Set$, mapping the category of groups to the category of sets ($U$ “forgets” the fact that homomorphisms $f: G \to H$ preserve group structure, and just thinks of them as functions $Uf: UG \to UH$).

• Remark: Some people might say this a little less formally: that the original function $f: X \to G$ is retrieved from the extension homomorphism $\hat{f}: FX \to G$ by composing with the canonical injection of the generators $X \to FX$. The reason we don’t say this is that there’s a confusion of categories here: properly speaking, $FX \to G$ belongs to the category of groups, and $X \to G$ to the category of sets. The underlying-set functor $U: Grp \to Set$ is a device we apply to eliminate the confusion.

In different words, the universal property of free groups says that the functor $\hom_{Set}(X, U-): Grp \to Set$, i.e., the underlying functor $U: Grp \to Set$ followed by the hom-functor $\hom(X, -): Set \to Set$, is representable by the free group $FX$: there is a natural isomorphism of functors from groups to sets:

$Grp(FX, -) \stackrel{\sim}{\to} Set(X, U-).$

Now, the free group $FX$ can be constructed for any set $X$. Moreover, the construction is functorial: it defines a functor $F: Set \to Grp$. This is actually a good exercise in working with universal properties. In outline: given a function $f: X \to Y$, the homomorphism $Ff: FX \to FY$ is the one which corresponds bijectively to the function

$\displaystyle X \stackrel{f}{\to} Y \stackrel{i_Y}{\to} UFY,$

i.e., $Ff$ is defined to be the unique map $h$ such that $Uh \circ i_X = i_Y \circ f$.

Proposition: $F: Set \to Grp$ is functorial (i.e., preserves morphism identities and morphism composition).

Proof: Suppose $f: X \to Y$, $g: Y \to Z$ is a composable pair of morphisms in $Set$. By universality, there is a unique map $h: FX \to FZ$, namely $F(g \circ f)$, such that $Uh \circ i_X = i_Z \circ (g \circ f)$. But $Fg \circ Ff$ also has this property, since

$U(Fg \circ Ff) \circ i_X = UFg \circ UFf \circ i_X = UFg \circ i_Y \circ f = i_Z \circ g \circ f$

(where we used functoriality of $U$ in the first equation). Hence $F(g \circ f) = Fg \circ Ff$. Another universality argument shows that $F$ preserves identities. $\Box$

Observe that the functor $F$ is rigged so that for all morphisms $f: X \to Y$,

$UFf \circ i_X = i_Y \circ f.$

That is to say, that there is only one way of defining $F$ so that the universal map $i_X$ is (the component at $X$ of) a natural transformation $1_{Set} \to UF$!

The underlying-set and free functors $U: Grp \to Set$, $F: Set \to Grp$ are called adjoints, generalizing the notion of adjoint in truth-valued matrix algebra: we have an isomorphism

$\hom_{Grp}(FX, Y) \cong \hom_{Set}(X, UY)$

natural in both arguments $X, Y$. We say that $F$ is left adjoint to $U$, or dually, that $U$ is right adjoint to $F$, and write $F \dashv U$. The transformation $i: 1 \to UF$ is called the unit of the adjunction.

Exercise: Define the construction dual to the unit, called the counit, as a transformation $\varepsilon: FU \to 1$. Describe this concretely in the case of the free-underlying adjunction $F \dashv U$ between sets and groups.

What makes the concept of adjoint functors so compelling is that it combines representability with duality: the manifest symmetry of an adjunction $\hom(FX, Y) \cong \hom(X, GY)$ means that we can equally well think of $GY$ as representing $\hom(F-, Y)$ as we can $FX$ as representing $\hom(X, G-)$. Time is up for today, but we’ll be seeing more of adjunctions next time, when we resume our study of Stone duality.

[Tip of the hat to Robert Dawson for the Janis Joplin quip.]

I wish to bring the attention of our readers to the $36^{th}$ Carnival of Mathematics hosted by Charles at Rigorous Trivialities. I guess most of you already know about it. Among other articles/posts, one of Todd’s recent posts, Basic Category Theory I, is part of the carnival. He followed it up with another post titled Basic Category Theory II. There will be a third post on the same topic some time soon. This sub-series of posts on basic category theory, if you recall, is part of the larger series on Stone Duality, which all began with Toward Stone Duality: Posets and Meets. Hope you enjoy the Carnival!

I’ll continue then with this brief subseries on category theory. Today I want to talk more about universal properties, and about the notion of natural transformation. Maybe not today, but soon at any rate, I want to tie all this in with the central concept of representability, which leads directly and naturally to the powerful and fundamental idea of adjoint functors. This goes straight to the very heart of category theory.

The term “natural”, often bandied about by mathematicians, is perhaps an overloaded term (see the comments here for a recent disagreement about certain senses of the word). I don’t know the exact history of the word as used by mathematicians, but by the 1930s and 40s the description of something as “natural” was part of the working parlance of many mathematicians (in particular, algebraic topologists), and it is to the great credit of Eilenberg and Mac Lane that they sought to give the word a precise mathematical sense. A motivating problem in their case was to prove a universal coefficient theorem for Čech cohomology, for which they needed certain comparison maps (transformations) which cohered by making certain diagrams commute (which was the naturality condition). In trying to precisely define this concept of naturality, they were led to the concept of a “functor” and then, to define the concept of functor, they were led back to the notion of category! And the rest, as they say, is history.

More on naturality in a moment. Let me first give a few more examples of universal constructions. Last time we discussed the general concept of a cartesian product — obviously in honor of Descartes, for his tremendous idea of the method of coordinates and analytic geometry.

But of course products are only part of the story: he was also interested in the representation of equations by geometric figures: for instance, representing an equation $y = f(x)$ as a subset of the plane. In the language of category theory, the variable $y$ denotes the second coordinate or second projection map $\pi_2: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, and $f(x)$ denotes the composite of the first projection map followed by some given map $f$:

$\displaystyle \mathbb{R} \times \mathbb{R} \stackrel{\pi_1}{\to} \mathbb{R} \stackrel{f}{\to} \mathbb{R}.$

The locus of the equation $y = f(x)$ is the subset of $\mathbb{R} \times \mathbb{R}$ where the two morphisms $\pi_2$ and $f \circ \pi_1$ are equal, and we want to describe the locus $L$ in a categorical way (i.e., in a way which will port over to other categories).

Definition: Given a pair of morphisms

$\displaystyle f, g: X \stackrel{\to}{\to} Y$

their equalizer consists of an object $L$ and a map $e: L \to X$, such that $f \circ e = g \circ e$, and satisfying the following universal property: for any map $h: A \to X$ such that $f \circ h = g \circ h$, there exists a unique map $j: A \to L$ such that $h = e \circ j$ (any map $h: A \to X$ that equalizes $f$ and $g$ factors in a unique way through the equalizer $e: L \to X$). $\Box$

Another way of saying it is that there is a bijection between $(f, g)$-equalizing maps $h: A \to X$ and maps $j: A \to L$,

$\displaystyle \frac{h: A \to X \mbox{ such that } fh = gh}{j: A \to L \qquad },$

effected by composing such maps $j$ with the universal $(f, g)$-equalizing map $e: L \to X$.
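In $Set$ the equalizer is simply the subset where the two functions agree, with $e$ the inclusion. A short sketch, using Descartes’ locus idea with an invented pair of functions:

```python
def equalizer(f, g, X):
    """In Set: the subset L = {x in X : f(x) = g(x)}; e is the inclusion."""
    return [x for x in X if f(x) == g(x)]

# Descartes-style locus: where does x^2 agree with x + 6?
L = equalizer(lambda x: x * x, lambda x: x + 6, range(-5, 6))
assert L == [-2, 3]

# universality: any h: A -> X equalizing f and g lands inside L, so it
# factors through the inclusion (the factorization j is h itself, with
# codomain restricted to L)
h = lambda a: -2 if a % 2 == 0 else 3     # an invented equalizing map
assert all(h(a) in L for a in range(4))
```
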

Exercise: Apply a universality argument to show that any two equalizers of a given pair of maps $(f, g)$ are isomorphic.

It is not immediately apparent from the definition that an equalizer $e: L \to X$ describes a “subthing” (e.g., a subset) of $X$, but then we haven’t even discussed subobjects. The categorical idea of subobject probably takes some getting used to anyway, so I’ll be brief. First, there is the idea of a monomorphism (or a “mono” for short), which generalizes the idea of an injective or one-to-one function. A morphism $f: S \to T$ is monic if for all $g, h: A \to S$, $f \circ g = f \circ h$ implies $g = h$. Monos with codomain $T$ are preordered by a relation $\leq$, where

$(e: R \to T) \leq (f: S \to T)$

if there exists $g: R \to S$ such that $e = f \circ g$. (Such a $g$ is unique since $f$ is monic, so it doesn’t need to be specified particularly; also this $g$ is easily seen to be monic [exercise].) Then we say that two monics $e, f$ mapping into $T$ name the same subobject of $T$ if $e \leq f$ and $f \leq e$; in that case the mediator $g$ is an isomorphism. Writing $e \sim f$ to denote this condition, it is standard that $\sim$ is an equivalence relation.

Thus, a subobject of $X$ is an equivalence class of monos into $X$. So when we say an equalizer $e: L \to X$ of maps $f, g: X \to Y$ defines a subobject of $X$, all we really mean is that $e$ is monic. Proof: Suppose $eh = ej$ for maps $h, j: A \to L$. Since $fe = ge$, we have $f(ej) = g(ej)$ for instance. By definition of equalizer, this means there exists a unique map $k: A \to L$ for which $eh = ej = ek$. Uniqueness then implies $h, j$ are equal to this self-same $k$, so $h = j$ and we are done.

Let me turn to another example of a universal construction, which has been used in one form or another for centuries: that of “function space”. For example, in the calculus of variations, one may be interested in the “space” of all (continuous) paths $\alpha: I = \left[0, 1\right] \to X$ in a physical space $X$, and in paths which minimize “action” (principle of least action).

If $X$ is a topological space, then one is faced with a variety of choices for topologizing the path space (denoted $X^I$). How to choose? As in our discussion last time of topologizing products, our view here is that the “right” topology will be the unique one which ensures that an appropriate universal property is satisfied.

To get started on this: the points of the path space $X^I$ are of course paths $\alpha: I \to X$, and paths in the path space, $I \to X^I$, sending each $s \in I$ to a path $\alpha_s: I \to X$, should correspond to homotopies between paths, that is, continuous maps $h: I \times I \to X$; the idea is that $h(s, t) := \alpha_s(t)$. Now, just knowing what paths in a space $Y = X^I$ look like (homotopies between paths) may not be enough to pin down the topology on $Y$, but suppose we now generalize: decree that for any space $Z$, the continuous maps $Z \to X^I$ should correspond exactly to continuous maps $h: Z \times I \to X$, also called homotopies. Then that is enough to pin down the topology on $X^I$. (We could put it this way: we use general spaces $Z$ to probe the topology of $X^I$.)

This principle applies not just to topology, but is extremely general: it applies to any category! I’ll state it very informally for now, and more precisely later:

Yoneda principle: to determine any object $Y$ up to isomorphism, it suffices to understand what general maps $Z \to Y$ mapping into it look like.

For instance, a product $X_1 \times X_2$ is determined up to isomorphism by knowing what maps $Z \to X_1 \times X_2$ into it look like [they look like pairs of maps $(Z \to X_1, Z \to X_2)$]. In the first lecture in the Stone duality series, we stated the Yoneda principle just for posets; now we are generalizing it to arbitrary categories.

In the case at hand, we would like to express the bijection between continuous maps

$\displaystyle \frac{f: Z \to X^I}{h: Z \times I \to X}$

as a working universal property for the function space $X^I$. There is a standard “Yoneda trick” for doing this: probe the thing we seek a universal characterization of with the identity map, here $1_{X^I}: X^I \to X^I$. Passing to the other side of the bijection, the identity map corresponds to a map

$ev: X^I \times I \to X$

and this is the “universal map” we need. (I called it $ev$ because in this case it is the evaluation map, which maps a pair (path $\alpha: I \to X$, point $t \in I$) to $\alpha(t)$, i.e., evaluates $\alpha$ at $t$.)

Here then is the universal characterization of the path space: a space $X^I$ equipped with a continuous map $ev: X^I \times I \to X$, satisfying the following universal property: given any continuous map $h: Z \times I \to X$, there exists a unique continuous map $f: Z \to X^I$ such that $h$ is retrieved as the composite

$\displaystyle Z \times I \stackrel{f \times 1_I}{\to} X^I \times I \stackrel{ev}{\to} X$

(for the first arrow in the composite, cf. the exercise stated at the end of the last lecture).
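
In the category of sets this universal property is nothing but ordinary currying: $X^I$ is the set of all functions $I \to X$, $ev$ is honest evaluation, and the unique $f$ corresponding to $h: Z \times I \to X$ is its curried form. A sketch (the function names `ev` and `curry` are my own):

```python
# In Set, the exponential X^I is the set of all functions I -> X, ev is
# ordinary evaluation, and the unique f corresponding to a map
# h : Z x I -> X is its curried form.

def ev(pair):
    """The evaluation map ev : X^I x I -> X."""
    alpha, t = pair
    return alpha(t)

def curry(h):
    """The unique f : Z -> X^I with h(z, t) = ev((f(z), t))."""
    return lambda z: (lambda t: h(z, t))

h = lambda z, t: z + 3 * t          # a sample map h : Z x I -> X
f = curry(h)

# h is retrieved as the composite ev . (f x 1_I):
assert ev((f(2), 5)) == h(2, 5) == 17
```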

Exercise: Formulate a universality argument that this universal property determines $X^I$ up to isomorphism.

Remark: Incidentally, for any space $X$, such a path space exists; its topology turns out to be the topology of “uniform convergence”. We can pose a similar universal definition of any function space $X^Y$ (replacing $I$ by $Y$, mutatis mutandis); a somewhat non-trivial result is that such a function space exists for all $X$ if and only if $Y$ is locally compact; the topology on $X^Y$ is then the so-called “compact-open” topology.

But why stop there? A general notion of “exponential” object is available for any category $C$ with cartesian products: for objects $c, d$ of $C$, an exponential $d^c$ is an object equipped with a map $ev: d^c \times c \to d$, such that for any $h: b \times c \to d$, there exists a unique $f: b \to d^c$ such that $h$ is retrieved as the composite

$\displaystyle b \times c \stackrel{f \times 1_c}{\to} d^c \times c \stackrel{ev}{\to} d.$

Example: If the category is a meet-semilattice, then (assuming $x^y$ exists) there is a bijection or equivalence which takes the form

$\displaystyle \frac{a \leq x^y}{a \wedge y \leq x}$

But wait, we’ve seen this before: $x^y$ is what we earlier called the implication $y \Rightarrow x$. So implication is really a function space object! $\Box$
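
This adjunction can be checked by brute force in a concrete meet-semilattice. Below is a sketch (not from the text; the helper names are my own) using the poset of subsets of a three-element set, where meet is intersection and the candidate exponential $x^y$ is the familiar implication $(\neg y) \vee x$:

```python
from itertools import combinations

# The poset of subsets of a finite set, ordered by inclusion, is a
# meet-semilattice with meet = intersection.  We check that the
# exponential x^y is the implication (~y) | x, i.e. that
#   a & y <= x   iff   a <= x^y   for all a.

U = frozenset({0, 1, 2})

def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def implies(y, x):          # the candidate exponential x^y
    return (U - y) | x

for x in subsets(U):
    for y in subsets(U):
        for a in subsets(U):
            # frozenset's <= is the subset relation, i.e. the poset order
            assert ((a & y) <= x) == (a <= implies(y, x))
print("adjunction verified on all triples")
```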

Okay, let me turn now to the notion of natural transformation. I described the original discovery (or invention) of categories as a kind of reverse engineering (functors were invented to talk about natural transformations, and categories were invented to talk about functors). Moving now in the forward direction, the rough idea can be stated as a progression:

• The notion of functor is appropriately defined as a morphism between categories,
• The notion of natural transformation is appropriately defined as a morphism between functors.

That seems pretty bare-bones: how do we decide what the appropriate notion of morphism between functors should be? One answer is by pursuing an analogy:

• As a space $Y^X$ of continuous functions $X \to Y$ is to the category of topological spaces, so a category $D^C$ of functors $C \to D$ should be to the category of categories.

That is, we already “know” (or in a moment we’ll explain) that the objects of this alleged exponential category $D^C$ are functors $F: C \to D$. Since $D^C$ is defined by a universal property, it is uniquely determined up to isomorphism. This in turn will uniquely determine what the “right” notion of morphism between functors $F, G: C \to D$ should be: morphisms $F \to G$ in the exponential $D^C$! Then, to discover the nature of these morphisms, we employ an appropriate “probe”.

To carry this out, I’ll need two special categories. First, the category $\mathbf{1}$ denotes a (chosen) category with exactly one object and exactly one morphism (necessarily the identity morphism). It satisfies the universal property that for any category $C$, there exists a unique functor $C \to \mathbf{1}$. It is called a terminal category (for that reason). It can also be considered as an empty product of categories.

Proposition: For any category $C$, there is an isomorphism $\mathbf{1} \times C \cong C$.

Proof: Left to the reader. It can be proven either directly, or by applying universal properties. $\Box$

The category $\mathbf{1}$ can also be considered an “object probe”, in the sense that a functor $\mathbf{1} \to C$ is essentially the same thing as an object of $C$ (just look where the object of $\mathbf{1}$ goes to in $C$).

For example, to probe the objects of the exponential category $D^C$, we investigate functors $\mathbf{1} \to D^C$. By the universal property of exponentials $D^C$, these are in bijection with functors $\mathbf{1} \times C \to D$. By the proposition above, these are in bijection with functors $C \to D$. So objects of $D^C$ are necessarily tantamount to functors $C \to D$ (and so we might as well define them as such).

Now we want to probe the morphisms of $D^C$. For this, we use the special category given by the poset $\mathbf{2} = \{0 \leq 1\}$. For if $X$ is any category and $f: x \to y$ is a morphism of $X$, we can define a corresponding functor $F: \mathbf{2} \to X$ such that $F(0) = x$, $F(1) = y$, and $F$ sends the morphism $0 \leq 1$ to $f$. Thus, such functors $F$ are in bijection with morphisms of $X$. Speaking loosely, we could call the category $\mathbf{2}$ the “generic morphism”.

Thus, to probe the morphisms in the category $D^C$, we look at functors $\mathbf{2} \to D^C$. In particular, if $F, G$ are functors $C \to D$, let us consider functors $\phi: \mathbf{2} \to D^C$ such that $\phi(0) = F$ and $\phi(1) = G$. By the universal property of $D^C$, these are in bijection with functors $\eta: \mathbf{2} \times C \to D$ such that the composite

$\displaystyle C \cong \mathbf{1} \times C \stackrel{0 \times 1_C}{\to} \mathbf{2} \times C \stackrel{\eta}{\to} D$

equals $F$, and the composite

$\displaystyle C \cong \mathbf{1} \times C \stackrel{1 \times 1_C}{\to} \mathbf{2} \times C \stackrel{\eta}{\to} D$

equals $G$. Put more simply, this says $\eta(0, c) = F(c)$ and $\eta(1, c) = G(c)$ for objects $c$ of $C$, and $\eta(1_0, f) = F(f)$ and $\eta(1_1, f) = G(f)$ for morphisms $f$ of $C$.

The remaining morphisms of $\mathbf{2} \times C$ have the form $(0 \leq 1, f: c \to d)$. Introduce the following abbreviations:

1. $\phi_c := \eta(0 \leq 1, 1_c)$ for objects $c$ of $C$;
2. $\phi_f := \eta(0 \leq 1, f)$ for morphisms $f$ of $C$.

Since $\eta$ is a functor, it preserves morphism composition. We find in particular that since

$(1_1, f) \circ (0 \leq 1, 1_c) = (1_1 \circ (0 \leq 1), f \circ 1_c) = (0 \leq 1, f)$

$(0 \leq 1, 1_d) \circ (1_0, f) = ((0 \leq 1) \circ 1_0, 1_d \circ f) = (0 \leq 1, f)$

we have

$\eta(1_1, f) \circ \eta(0 \leq 1, 1_c) = \eta(0 \leq 1, f)$

$\eta(0 \leq 1, 1_d) \circ \eta(1_0, f) = \eta(0 \leq 1, f)$

or, using the abbreviations,

$G(f) \circ \phi_c = \phi_f = \phi_d \circ F(f).$

In particular, the datum $\phi_f$ is redundant: it may be defined as either side of the equation

$G(f) \circ \phi_c = \phi_d \circ F(f)$.

Exercise: Just on the basis of this last equation (for arbitrary morphisms $f$ and objects $c$ of $C$), prove that functoriality of $\eta$ follows.

This leads us at last to the definition of natural transformation:

Definition: Let $C, D$ be categories and let $F, G$ be functors from $C$ to $D$. A natural transformation $\phi: F \to G$ is an assignment of morphisms $\phi_c: F(c) \to G(c)$ in $D$ to objects $c$ of $C$, such that for every morphism $f: c \to d$, the following equation holds: $G(f) \circ \phi_c = \phi_d \circ F(f)$. $\Box$

Usually this equation is expressed in the form of a commutative diagram:

          F(f)
F(c) ----------> F(d)
  |                |
  | phi_c          | phi_d
  |                |
  V      G(f)      V
G(c) ----------> G(d)

which asserts the equality of the composites formed by following the paths from beginning (here $F(c)$) to end (here $G(d)$). (Despite the inconvenience in typesetting them, commutative diagrams as 2-dimensional arrays are usually easier to read and comprehend than line-by-line equations.) The commutative diagram says that the components $\phi_c$ of the transformation are coherent or compatible with all morphisms $f: c \to d$ of the domain category.
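
Here is a concrete naturality check, a Set-level illustration of my own (not an example from the text): take $F$ to send a set $c$ to lists over $c$, $G$ to send $c$ to "$c$ or nothing", and let the component $\phi_c$ take the head of a list (nothing if the list is empty). The same formula for $\phi_c$ works at every object, and the square commutes for every $f$:

```python
# A concrete naturality check: F sends a set c to lists over c, G sends
# c to "c or None", and phi_c takes the head of a list (None if empty).

def F(f):                   # morphism part of the list functor
    return lambda xs: [f(x) for x in xs]

def G(f):                   # morphism part of the "maybe" functor
    return lambda x: None if x is None else f(x)

def phi(xs):                # the component phi_c; same formula for every c
    return xs[0] if xs else None

f = lambda n: n * n         # a sample morphism f : c -> d

for xs in [[], [3], [2, 7, 1]]:
    # the naturality square: G(f) . phi_c = phi_d . F(f)
    assert G(f)(phi(xs)) == phi(F(f)(xs))
```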

Remarks: Now that I’ve written this post, I’m a little worried that any first-timers to category theory reading this will find this approach to natural transformations a little hardcore or intimidating. In that case I should say that my intent was to make this notion seem as inevitable as possible: by taking seriously the analogy

function space: category of spaces :: functor category: category of categories

we are inexorably led to the “right” (the “natural”) notion of natural transformation as morphism between functors. But if this approach is for some a pedagogical flop, then I urge those readers just to forget it, or come back to it later. Just remember the definition of natural transformation we finally arrived at, and you should be fine. Grasping the inner meaning of fundamental concepts like this takes time anyway, and isn’t merely a matter of pure deduction.

I should also say that the approach involved a kind of leap of faith that these functor categories (the exponentials $D^C$) “exist”. To be sure, the analysis above shows clearly what they must look like if they do exist (objects are functors $C \to D$; morphisms are natural transformations as we’ve defined them), but actually there’s some more work to do: one must show they satisfy the universal property with respect to not just the two probing categories $\mathbf{1}$ and $\mathbf{2}$ that we used, but any category $E$.

A somewhat serious concern here is that our talk of exponential categories played pretty fast and loose at the level of mathematical foundations. There’s that worrying phrase “category of categories”, for starters. That particular phraseology can be avoided, but nevertheless, it must be said that in the case where $C$ is a large category (i.e., involving a proper class of morphisms), the collection of all functors from $C$ to $D$ is not a well-formed construction within the confines of Gödel-Bernays-von Neumann set theory (it is not provably a class in general; in some approaches it could be called a “super-class”).

My own attitude toward these “problems” tends to be somewhat blasé, almost dismissive: these are mere technicalities, sez I. The main concepts are right and robust and very fruitful, and there are various solutions to the technical “problem of size” which have been developed over the decades (although how satisfactory they are is still a matter of debate) to address the apparent difficulties. Anyway, I try not to worry about it much. But, for those fine upstanding citizens who do worry about these things, I’ll just state one set-theoretically respectable theorem to convey that at least conceptually, all is right with the world.

Definition: A category with finite products is cartesian closed if for any two objects $c, d$, there exists an exponential object $d^c$.

Theorem: The category of small categories is cartesian closed. $\Box$

After a long hiatus (sorry about that!), I would like to resume the series on Stone duality. You may recall I started this series by saying that my own point of view on mathematics is strongly informed by category theory, followed by a little rant about the emotional reactions that category theory seems to excite in many people, and that I wouldn’t be “blathering” about categories unless a strong organic need was felt for it. Well, it’s come to that point: to continue talking sensibly about Stone duality, I really feel some basic concepts of category theory are now in order. So: before we pick up the main thread again, I’ll be talking about categories up to the point of the central concept of adjoint pairs, generalizing what we’ve discussed before in the context of truth-valued matrix algebra.

I’ll start by firmly denouncing a common belief: that category theory is some arcane, super-abstract subject. I just don’t believe that’s a healthy way of looking at it. To me, categories are no more and no less abstract than groups, rings, fields, etc. — they are just algebraic structures of a certain type (and a not too complicated type at that). That said, they are particularly ubiquitous and useful structures, which can be studied either as small structures (for example, posets provide examples of categories, and so do groups), or to organize the study of general types of structure in the large (for example, the class of posets and poset morphisms forms a category). Just think of them that way: they are certain sorts of algebraic structures which crop up just about everywhere, and it is very useful to learn something about them.

Usually, the first examples one is shown are large categories, typically of the following sort. One considers the class of mathematical structures of a given type: it could be the class of groups, or of posets, or of Boolean algebras, etc. The elements of a general such class are given the neutral name “objects”. Then, we are also interested in how the objects $A, B, C, \ldots$ are related to each other, typically through transformations $f: A \to B$ which “preserve” the given type of structure. In the case of sets, transformations are just functions; in the case of groups, the transformations are group homomorphisms (which preserve group multiplication, inversion, and identities); in the case of vector spaces, they are linear transformations (preserving vector addition and scalar multiplication); in the case of topological spaces, they are continuous maps (preserving a suitable notion of convergence). In general, the transformations are given the neutral name “homomorphisms”, or more often just “morphisms” or “maps”.

In all of these cases, two morphisms $f: A \to B$, $g: B \to C$ compose to give a new morphism $g \circ f: A \to C$ (for example the composite of two group homomorphisms is a group homomorphism), and do so in an associative way ($h \circ (g \circ f) = (h \circ g) \circ f$), and also there is an identity morphism $1_A: A \to A$ for each object $A$ which behaves as identities should ($f \circ 1_A = f = 1_B \circ f$ for any morphism $f: A \to B$). A collection of objects, morphisms between them, together with an associative law of composition and identities, is called a category.

A key insight of category theory is that in general, important structural properties of objects $A, B, C, \ldots$ can be described purely in terms of general patterns or diagrams of morphisms and their composites. By means of such general patterns, the same concept (like the concept of a product of two objects, or of a quotient object, or of a dual) takes on the same form in many different kinds of category, for many different kinds of structure (whether algebraic, or topological, or analytic, or some mixture thereof) — and this in large part gives category theory the power to unify and simplify the study of general mathematical structure. It came as quite a revelation to me personally that (to take one example) the general idea of a “quotient object” (quotient group, quotient space, etc.) is not based merely on vague family resemblances between different kinds of structure, but can be made absolutely precise and across the board, in a simple way. That sort of explanatory power and conceptual unification is what got me hooked!

In a nutshell, then, category theory is the study of commonly arising structures via general patterns or diagrams of morphisms, and the general application of such study to help simplify and organize large portions of mathematics. Let’s get down to brass tacks.

Definition: A category $C$ consists of the following data:

• A class $O$ of objects;
• A class $M$ of morphisms;
• A function $\mbox{dom}: M \to O$ which assigns to each morphism its domain, and a function $\mbox{cod}: M \to O$ which assigns to each morphism its codomain. If $f \in M$, we write $f: A \to B$ to indicate that $\mbox{dom}(f) = A$ and $\mbox{cod}(f) = B$.
• A function $\mbox{Id}: O \to M$, taking an object $A$ to a morphism $1_A: A \to A$, called the identity on $A$.

Finally, let $C_2$ denote the class of composable pairs of morphisms, i.e., pairs $(f, g) \in M \times M$ such that $\mbox{cod}(f) = \mbox{dom}(g)$. The final datum:

• A function $\mbox{comp}: C_2 \to M$, taking a composable pair $(f: A \to B, g: B \to C)$ to a morphism $g \circ f: A \to C$, called the composite of $f$ and $g$.

These data satisfy a number of axioms, some of which have already been given implicitly (e.g., $\mbox{dom}(g \circ f) = \mbox{dom}(f)$ and $\mbox{cod}(g \circ f) = \mbox{cod}(g)$). The ones which haven’t are

1. Associativity: $h \circ (g \circ f) = (h \circ g) \circ f$ for each composable triple $(f: A \to B, g: B \to C, h: C \to D)$.
2. Identity axiom: Given $f: A \to B$, $f \circ 1_A = f = 1_B \circ f$.

Sometimes we write $C_0$ for the class of objects, $C_1$ for the class of morphisms, and for $n > 1$, $C_n$ for the class of composable $n$-tuples of morphisms. $\Box$

Nothing in this definition says that objects of a category are “sets with extra structure” (or that morphisms preserve such structure); we are just thinking of objects as general “things” and depict them as nodes, and morphisms as arrows or directed edges between nodes, with a given law for composing them. The idea then is all about formal patterns of arrows and their compositions (cf. “commutative diagrams”). Vishal’s post on the notion of category had some picture displays of the categorical axioms, like associativity, which underline this point of view.

In the same vein, categories are used to talk about not just large classes of structures; in a number of important cases, the structures themselves can be viewed as categories. For example:

1. A preorder can be defined as a category for which there is at most one morphism $f: A \to B$ for any two objects $A, B$. Given there is at most one morphism from one object to another, there is no particular need to give it a name like $f$; normally we just write $a \leq b$ to say there is a morphism from $a$ to $b$. Morphism composition then boils down to the transitivity law, and the data of identity morphisms expresses the reflexivity law. In particular, posets (preorders which satisfy the antisymmetry law) are examples of categories.
2. A monoid is usually defined as a set $M$ equipped with an associative binary operation $(a, b) \mapsto a \cdot b$ and with a (two-sided) identity element $e$ for that operation. Alternatively, a monoid can be construed as a category with exactly one object. Here’s how it works: given a monoid $(M, \cdot, e)$, define a category where the class $O$ consists of a single object (which I’ll give a neutral name like $\bullet$; it doesn’t have to be any “thing” in particular; it’s just a “something”, it doesn’t matter what), and where the class of morphisms is defined to be the set $M$. Since there is only one object, we are forced to define $\mbox{dom}(a) = \bullet$ and $\mbox{cod}(a) = \bullet$ for all $a \in M$. In that case all pairs of morphisms are composable, and composition is defined to be the operation in $M$: $a \circ b := a \cdot b$. The identity morphism on $\bullet$ is defined to be $e$. We can turn the process around: given a category with exactly one object, the class of morphisms $M$ can be construed as a monoid in the usual sense.
3. A groupoid is a category in which every morphism is an isomorphism (by definition, an isomorphism is an invertible morphism, that is, a morphism $f: A \to B$ for which there exists a morphism $g: B \to A$ such that $g \circ f = 1_A$ and $f \circ g = 1_B$). For example, the category of finite sets and bijections between them is a groupoid. The category of topological spaces and homeomorphisms between them is a groupoid. A group is a monoid in which every element is invertible; hence a group is essentially the same thing as a groupoid with exactly one object.
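
The monoid-as-category construction in item 2 is easy to make concrete. A sketch (my own choice of example): take the monoid $\mathbb{Z}/4$ under multiplication mod 4, with identity $1$. Every morphism has the single object $\bullet$ as both domain and codomain, so all pairs compose, and the category axioms are exactly the monoid axioms:

```python
from itertools import product

# A monoid viewed as a one-object category: the monoid here is Z/4 under
# multiplication mod 4, with identity element 1.  Composition of
# morphisms is just the monoid operation.

M = [0, 1, 2, 3]
e = 1

def compose(a, b):          # a . b  :=  a * b  in the monoid
    return (a * b) % 4

# category axioms = monoid axioms: associativity and the identity laws
for a, b, c in product(M, repeat=3):
    assert compose(a, compose(b, c)) == compose(compose(a, b), c)
for a in M:
    assert compose(a, e) == a == compose(e, a)
```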

Remark: The notion of isomorphism is important in category theory: we think of an isomorphism $f: A \to B$ as a way in which objects $A, B$ are the “same”. For example, if two spaces are homeomorphic, then they are indistinguishable as far as topology is concerned (any topological property satisfied by one is shared by the other). In general there may be many ways or isomorphisms to exhibit such “sameness”, but typically in category theory, if two objects satisfy the same structural property (called a universal property; see below), then there is just one isomorphism between them which respects that property. Those are sometimes called “canonical” or “god-given” isomorphisms; they are 100% natural, no artificial ingredients! $\Box$

Earlier I said that category theory studies mathematical structure in terms of general patterns or diagrams of morphisms. Let me give a simple example: the general notion of “cartesian product”. Suppose $X_1, X_2$ are two objects in a category. A cartesian product of $X_1$ and $X_2$ is an object $X$ together with two morphisms $\pi_1: X \to X_1$, $\pi_2: X \to X_2$ (called the projection maps), satisfying the following universal property: given any object $Y$ equipped with a map $f_i: Y \to X_i$ for $i = 1, 2$, there exists a unique map $f: Y \to X$ such that $f_i = \pi_i \circ f$ for $i = 1, 2$. (Readers not familiar with this categorical notion should verify the universal property for the cartesian product of two sets, in the category of sets and functions.)

I said “a” cartesian product, but any two cartesian products are the same in the sense of being isomorphic. For suppose both $(X, \pi_1, \pi_2)$ and $(X', \pi_1', \pi_2')$ are cartesian products of $X_1, X_2$. By the universal property of the first product, there exists a unique morphism $f: X' \to X$ such that $\pi_i' = \pi_i \circ f$ for $i = 1, 2$. By the universal property of the second product, there exists a unique morphism $g: X \to X'$ such that $\pi_i = \pi_i' \circ g$. These maps $f$ and $g$ are inverse to one another. Why? By the universal property, there is a unique map $\phi: X \to X$ (namely, $\phi = 1_X$) such that $\pi_i = \pi_i \circ \phi$ for $i = 1, 2$. But $\phi = f \circ g$ also satisfies these equations: $\pi_i = \pi_i \circ (f \circ g)$ (using associativity). So $1_X = f \circ g$ by the uniqueness clause of the universal property; similarly, $1_{X'} = g \circ f$. Hence $f: X \to X'$ is an isomorphism.

This sort of argument using a universal property is called a universality argument. It is closely related to what we dubbed the “Yoneda principle” when we studied posets.

So: between any two products $X, X'$ of $X_1$ and $X_2$, there is a unique isomorphism $f: X' \to X$ respecting the product structure; we say that any two products are canonically isomorphic. Very often one also has chosen products (a specific choice of product for each ordered pair $(X_1, X_2)$), as in set theory when we construe the product of two sets as a set of ordered pairs $\{(x_1, x_2): x_1 \in X_1, x_2 \in X_2\}$. We use $X_1 \times X_2$ to denote (the object part of) a chosen cartesian product of $(X_1, X_2)$.
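
In the category of sets, with the chosen product taken to be the set of ordered pairs, the universal property can be exhibited directly. A sketch (the name `pair` for the mediating map is my own):

```python
# The universal property of the cartesian product in Set, with finite
# sets: given f1, f2 out of Y, the unique mediating map
# f = <f1, f2> : Y -> X1 x X2 is y |-> (f1(y), f2(y)).

def pi1(p): return p[0]     # the projection maps
def pi2(p): return p[1]

def pair(f1, f2):
    """The unique mediating map <f1, f2> : Y -> X1 x X2."""
    return lambda y: (f1(y), f2(y))

Y = range(5)
f1 = lambda y: y % 2
f2 = lambda y: y + 1

f = pair(f1, f2)
for y in Y:
    # f_i = pi_i . f, as the universal property demands
    assert pi1(f(y)) == f1(y) and pi2(f(y)) == f2(y)
```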

Exercise: Use universality to exhibit a canonical isomorphism $\sigma: X_1 \times X_2 \to X_2 \times X_1$. This is called a symmetry isomorphism for the cartesian product.

Many category theorists (including myself) are fond of the following notation for expressing the universal property of products:

$\displaystyle \frac{f_1: Y \to X_1 \qquad f_2: Y \to X_2}{f = \langle f_1, f_2 \rangle: Y \to X_1 \times X_2}$

where the dividing line indicates a bijection between pairs of maps $(f_1, f_2)$ and single maps $f$ into the product, effected by composing $f$ with the pair of projection maps. We have actually seen this before: when the category is a poset, the cartesian product is called the meet:

$\displaystyle \frac{a \leq x \qquad a \leq y}{a \leq x \wedge y}$

In fact, a lot of arguments we developed for dealing with meets in posets extend to more general cartesian products in categories, with little change (except that instead of equalities, there will typically be canonical isomorphisms). For example, we can speak of a cartesian product of any indexed collection of objects $\{X_i\}_{i \in I}$: an object $\prod_{i \in I} X_i$ equipped with projection maps $\pi_i: \prod_{i \in I} X_i \to X_i$, satisfying the universal property that for every $I$-tuple of maps $f_i: Y \to X_i$, there exists a unique map $f: Y \to \prod_{i \in I} X_i$ such that $f_i = \pi_i \circ f$. Here we have a bijection between $I$-tuples of maps and single maps:

$\displaystyle \frac{(f_i: Y \to X_i)_{i \in I}}{f = \langle f_i \rangle_{i \in I}: Y \to \prod_{i \in I} X_i}$

By universality, such products are unique up to unique isomorphism. In particular, $(X_1 \times X_2) \times X_3$ is a choice of product of the collection $\{X_1, X_2, X_3\}$, as seen by contemplating the bijection between triples of maps and single maps

$\displaystyle \frac{\frac{f_1: Y \to X_1 \qquad f_2: Y \to X_2}{\langle f_1, f_2 \rangle: Y \to X_1 \times X_2} \qquad \frac{f_3: Y \to X_3}{f_3: Y \to X_3}}{f: Y \to (X_1 \times X_2) \times X_3}$

and similarly $X_1 \times (X_2 \times X_3)$ is another choice of product. Therefore, by universality, there is a canonical associativity isomorphism

$\alpha: (X_1 \times X_2) \times X_3 \to X_1 \times (X_2 \times X_3).$

Remark: It might be thought that in all practical cases, the notion of cartesian product (in terms of good old-fashioned sets of tuples) is clear enough; why complicate matters with categories? One answer is that it isn’t always clear from purely set-theoretic considerations what the right product structure is, and in such cases the categorical description gives a clear guide to what we really need. For example, when I was first learning topology, the box topology on the set-theoretic product $\prod_{i \in I} X_i$ seemed to me to be a perfectly natural choice of topology; I didn’t understand the general preference for what is called the “product topology”. (The open sets in the box topology are unions of products $\prod_{i \in I} U_i$ of open sets in the factors $X_i$. The open sets in the product topology are unions of such products where $U_i = X_i$ for all but finitely many $i \in I$.)

In retrospect, the answer is obvious: the product topology on $\prod_{i \in I} X_i$ is the smallest topology making all the projection maps $\pi_i$ continuous. This means that a function $f: Y \to \prod_{i \in I} X_i$ is continuous if and only if each $f_i = \pi_i \circ f: Y \to X_i$ is continuous: precisely the universal property we need. Similarly, in seeking to understand products or other constructions of more abstruse mathematical structures (schemes for instance), the categorical description is de rigueur in making sure we get it right. $\Box$

For just about any mathematical structure we can consider a category of such structures, and this applies to the notion of category itself. That is, we can consider a category of categories! (Sounds almost religious to me: category of categories, holy of holies, light of lights…)

• Remark: Like “set of sets”, the idea of category of categories taken to a naive extreme leads to paradoxes or other foundational difficulties, but there are techniques for dealing with these issues, which I don’t particularly want to discuss right now. If anyone is uncomfortable around these issues, a stopgap measure is to consider rather the category of small categories (a category has a class of objects and morphisms; a small category is where these classes are sets), within some suitable framework like the set theory of Gödel-Bernays-von Neumann.

If categories are objects, the morphisms between them may be taken to be structure-preserving maps between categories, called “functors”.

Definition: If $C$ and $D$ are categories, a functor $F: C \to D$ consists of a function $F_0: C_0 \to D_0$ (between objects) and a function $F_1: C_1 \to D_1$ (between morphisms), such that

• $F_0(\mbox{dom}_C(f)) = \mbox{dom}_D(F_1(f))$ and $F_0(\mbox{cod}_C(f)) = \mbox{cod}_D(F_1(f))$, for each morphism $f \in C_1$ (i.e., $F$ preserves domains and codomains of morphisms);
• $F_1(1_A) = 1_{F_0(A)}$ for each object $A \in C_0$, and $F_1(g \circ f) = F_1(g) \circ F_1(f)$ for each composable pair $(f, g) \in C_2$ (i.e., $F$ preserves identity morphisms and composition of morphisms).

Normally we are not so fussy in writing $F_1(f)$ or $F_0(A)$; we write $F(f)$ and $F(A)$ for morphisms $f$ and objects $A$ alike. Sometimes we drop the parentheses as well. $\Box$

If $X, Y$ are groups or monoids regarded as one-object categories, then a functor between them amounts to a group or monoid homomorphism. If $X, Y$ are posets regarded as categories, then a functor between them amounts to a poset map. So no new surprises in these cases.
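
For the monoid case, the functor laws literally become the homomorphism laws. A sketch (my own choice of example): $F(n) = 2^n$ is a monoid homomorphism from $(\mathbb{N}, +, 0)$ to $(\mathbb{N}, \cdot, 1)$, which is exactly a functor between the corresponding one-object categories:

```python
# Monoids as one-object categories: a functor between them is exactly a
# monoid homomorphism.  Here F(n) = 2**n is a homomorphism from
# (N, +, 0) to (N, *, 1): it preserves composition and the identity.

def F(n):
    return 2 ** n

# functoriality: F(g . f) = F(g) . F(f), and F preserves the identity
for a in range(6):
    for b in range(6):
        assert F(a + b) == F(a) * F(b)
assert F(0) == 1
```

(The object part $F_0$ is forced: there is only one object on each side.)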

Exercise: Define a product $C \times D$ of two categories $C, D$, and verify that the definition satisfies the universal property of products in the “category of categories”.

Exercise: If a category $C$ has chosen products, show how a functor $C \times C \to C$ may be defined which takes a pair of objects $(c, d)$ to its product $c \times d$. (You need to define the morphism part $F_1$ of this functor; this will involve the universal property of products.)

In this post, I’d like to move from abstract, general considerations of Boolean algebras to more concrete ones, by analyzing what happens in the finite case. A rather thorough analysis can be performed, and we will get our first taste of a simple categorical duality, the finite case of Stone duality which we call “baby Stone duality”.

Since I have just mentioned the “c-word” (categories), I should say that a strong need for some very basic category theory makes itself felt right about now. It is true that Marshall Stone stated his results before the language of categories was invented, but it’s also true (as Stone himself recognized, after categories were invented) that the most concise and compelling and convenient way of stating them is in the language of categories, and it would be crazy to deny ourselves that luxury.

I’ll begin with a relatively elementary but very useful fact discovered by Stone himself — in retrospect, it seems incredible that it was found only after decades of study of Boolean algebras. It says that Boolean algebras are essentially the same things as what are called Boolean rings:

Definition: A Boolean ring is a commutative ring (with identity $1$) in which every element $x$ is idempotent, i.e., satisfies $x^2 = x$.

Before I explain the equivalence between Boolean algebras and Boolean rings, let me tease out a few consequences of this definition.

Proposition 1: For every element $x$ in a Boolean ring, $2x = 0$.

Proof: By idempotence, we have $x + 1 = (x+1)^2 = x^2 + 2x + 1$. Since $x = x^2$, we may additively cancel in the ring to conclude $0 = 2x$. $\Box$

This proposition implies that the underlying additive group of a Boolean ring is a vector space over the field $\mathbb{Z}_2$ consisting of two elements. I won't go into details, except to note that this follows readily from the proposition if we define a vector space over $\mathbb{Z}_2$ to be an abelian group $V$ together with a ring homomorphism $\mathbb{Z}_2 \to Hom(V, V)$ to the ring of abelian group homomorphisms from $V$ to itself (where such homomorphisms are "multiplied" by composing them; the idea is that this ring homomorphism takes an element $r = 0, 1$ to scalar multiplication $r \cdot (-): V \to V$).

Anyway, the point is that we can now apply some linear algebra to study this $\mathbb{Z}_2$-vector space; in particular, a finite Boolean ring $B$ is a finite-dimensional vector space over $\mathbb{Z}_2$. By choosing a basis, we see that $B$ is vector-space isomorphic to $\mathbb{Z}_{2}^{n}$ where $n$ is the dimension. So the cardinality of a finite Boolean ring must be of the form $2^n$. Hold that thought!
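A quick computational sanity check (a Python sketch of my own; the names `Z2n`, `add`, `mul` are not from the text): $\mathbb{Z}_2^n$ with pointwise operations is a Boolean ring of cardinality $2^n$, every element is idempotent, and $2x = 0$ holds throughout, as Proposition 1 predicts.

```python
from itertools import product

n = 3
Z2n = list(product((0, 1), repeat=n))   # all 2**n elements of Z2^n

def add(x, y):  # pointwise addition mod 2
    return tuple((a + b) % 2 for a, b in zip(x, y))

def mul(x, y):  # pointwise multiplication
    return tuple(a * b for a, b in zip(x, y))

zero = (0,) * n

assert len(Z2n) == 2 ** n                # cardinality is 2^n
for x in Z2n:
    assert mul(x, x) == x                # idempotence: x^2 = x
    assert add(x, x) == zero             # Proposition 1: 2x = 0

print("Z2^%d is a Boolean ring with %d elements" % (n, len(Z2n)))
```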

Now, the claim is that Boolean algebras and Boolean rings are essentially the same objects. Let me make this more precise: given a Boolean ring $B$, we may construct a corresponding Boolean algebra structure on the underlying set of $B$, uniquely determined by the stipulation that the multiplication $x \cdot y$ of the Boolean ring match the meet operation $x \wedge y$ of the Boolean algebra. Conversely, given a Boolean algebra $B$, we may construct a corresponding Boolean ring structure on $B$, and this construction is inverse to the previous one.

In one direction, suppose $B$ is a Boolean ring. We know from before that a binary operation on a set $B$ that is commutative, associative, unital [has a unit or identity] and idempotent — here, the multiplication of $B$ — can be identified with the meet operation of a meet-semilattice structure on $B$, uniquely specified by taking its partial order to be defined by: $x \leq y$ iff $x = x \cdot y$. It immediately follows from this definition that the additive identity $0 \in B$ satisfies $0 \leq y$ for all $y$ (is the bottom element), and the multiplicative identity $1 \in B$ satisfies $x \leq 1$ for all $x$ (is the top element).

Notice also that $x \wedge (1-x) = x (1-x) = 0$, by idempotence. This leads one to suspect that $1-x$ will be the complement of $x$ in the Boolean algebra we are trying to construct; we are partly encouraged in this by noting $x = 1 - (1 - x)$, i.e., $x$ is equal to its putative double negation.

Proposition 2: $x \mapsto 1-x$ is order-reversing.

Proof: Looking at the definition of the order, this says that if $x = x y$, then $1-y = (1-x)(1-y)$. This is immediate. $\Box$

So, $x \mapsto 1 - x$ is an order-reversing map $B \to B$ (an order-preserving map $B \to B^{op}$) which is a bijection (since it is its own inverse). We conclude that $B \to B^{op}: x \mapsto 1-x$ is a poset isomorphism. Since $B$ has meets and $B \cong B^{op}$, $B^{op}$ also has meets (and the isomorphism preserves them). But meets in $B^{op}$ are joins in $B$. Hence $B$ has both meets and joins, i.e., is a lattice. More exactly, we are saying that the function $f(x) = 1 - x$ takes meets in $B$ to joins in $B$; that is,

$f(x \wedge y) = 1 - x y = f(x) \vee f(y) = (1 - x) \vee (1 - y)$

or, replacing $x$ by $1-x$ and $y$ by $1-y$,

$1 - (1-x)(1-y) = x \vee y$

whence $x \vee y = x + y - x y = x + y + xy$, using Proposition 1 above.

Proposition 3: $1 - x$ is the complement of $x$.

Proof: We already saw $x \wedge (1-x) = x(1-x) = 0$. Also

$x \vee (1-x) = x + (1 - x) + x(1-x) = x + (1-x) + 0 = 1,$

using the formula for join we just computed. This completes the proof. $\Box$

So the lattice is complemented; the only thing left to check is distributivity. Following the definitions, we have $(x \vee y) \wedge z = (x + y + xy)z = xz + yz + xyz$. On the other hand, $(x \wedge z) \vee (y \wedge z) = xz + yz + (xz)(yz) = xz + yz + xyz$, using idempotence once again. So the distributive law for the lattice is satisfied, and therefore we get a Boolean algebra from a Boolean ring.

Naturally, we want to invert the process: starting with a Boolean algebra structure on a set $B$, construct a corresponding Boolean ring structure on $B$ whose multiplication is the meet of the Boolean algebra (and also show the two processes are inverse to one another). One has to construct an appropriate addition operation for the ring. The calculations above indicate that the addition should satisfy $x \vee y = x + y + x \wedge y$, so that $x \vee y = x + y$ if $x \wedge y = 0$ (i.e., if $x$ and $y$ are disjoint): this gives a partial definition of addition. Continuing this thought, if we express $x \vee y = x + y + x \wedge y$ as a disjoint sum of some element $a$ and $x \wedge y$, we then conclude $x \vee y = a + x \wedge y$, whence $a = x + y$ by cancellation. In the case where the Boolean algebra is a power set $PX$, this element $a$ is the symmetric difference of $x$ and $y$. This generalizes: if we define the addition by the symmetric difference formula $x + y := (\neg x \wedge y) \vee (x \wedge \neg y)$, then $x + y$ is disjoint from $x \wedge y$, so that

$(x + y) + x \wedge y$

$= (x + y) \vee (x \wedge y) = (\neg x \wedge y) \vee (x \wedge \neg y) \vee (x \wedge y) = x \vee y$

after a short calculation using the complementation and distributivity axioms. After more work, one shows that $x + y$ is the addition operation for an abelian group, and that multiplication distributes over addition, so that one gets a Boolean ring.

Exercise: Verify this last assertion.
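For the power-set case, this exercise can be run directly on a computer. A Python sketch using frozensets (my own encoding, not from the text), with addition as symmetric difference and multiplication as intersection:

```python
from itertools import chain, combinations

X = frozenset({1, 2, 3})
# enumerate the power set P(X)
PX = [frozenset(s) for s in chain.from_iterable(
    combinations(X, r) for r in range(len(X) + 1))]

add = lambda s, t: s ^ t      # addition = symmetric difference
mul = lambda s, t: s & t      # multiplication = intersection
zero, one = frozenset(), X

for s in PX:
    assert add(s, zero) == s and add(s, s) == zero     # abelian group; 2s = 0
    assert mul(s, s) == s and mul(s, one) == s         # idempotent, unital
    for t in PX:
        assert add(s, t) == add(t, s)                  # commutativity
        for u in PX:
            assert add(add(s, t), u) == add(s, add(t, u))          # associativity
            assert mul(s, add(t, u)) == add(mul(s, t), mul(s, u))  # distributivity

print("P(X) with (symmetric difference, intersection) is a Boolean ring")
```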

However, the assertion of equivalence between Boolean rings and Boolean algebras has a little more to it: recall for example our earlier result that sup-lattices “are” inf-lattices, or that frames “are” complete Heyting algebras. Those results came with caveats: that while e.g. sup-lattices are extensionally the same as inf-lattices, their morphisms (i.e., structure-preserving maps) are different. That is to say, the category of sup-lattices cannot be considered “the same as” or equivalent to the category of inf-lattices, even if they have the same objects.

Whereas here, in asserting Boolean algebras "are" Boolean rings, we are making the stronger statement that the category of Boolean rings is the same as (is isomorphic to) the category of Boolean algebras. In one direction, given a ring homomorphism $f: B \to C$ between Boolean rings, it is clear that $f$ preserves the meet $x \cdot y$ and join $x + y + x y$ of any two elements $x, y$ [since it preserves multiplication and addition] and of course also the complement $1 + x$ of any $x$; therefore $f$ is a map of the corresponding Boolean algebras. Conversely, a map $f: B \to C$ of Boolean algebras preserves meet, join, and complementation (or negation), and therefore preserves the product $x \wedge y$ and sum $(\neg x \wedge y) \vee (x \wedge \neg y)$ in the corresponding Boolean ring. In short, the operations of Boolean rings and Boolean algebras are equationally interdefinable (in the official parlance, they are simply different ways of presenting the same underlying Lawvere algebraic theory). In summary,

Theorem 1: The above processes define functors $F: \mbox{BoolRing} \to \mbox{BoolAlg}$, $G: \mbox{BoolAlg} \to \mbox{BoolRing}$, which are mutually inverse, between the category of Boolean rings and the category of Boolean algebras.

• Remark: I am taking some liberties here in assuming that the reader is already familiar with, or is willing to read up on, the basic notion of category, and of functor (= structure-preserving map between categories, preserving identity morphisms and composites of morphisms). I will be introducing other categorical concepts piece by piece as the need arises, in a sort of apprentice-like fashion.

Let us put this theorem to work. We have already observed that a finite Boolean ring (or Boolean algebra) has cardinality $2^n$ — the same as the cardinality of the power set Boolean algebra $PX$ if $X$ has cardinality $n$. The suspicion arises that all finite Boolean algebras arise in just this way: as power sets of finite sets. That is indeed a theorem: every finite Boolean algebra $B$ is naturally isomorphic to one of the form $PX$; one of our tasks is to describe $X$ in terms of $B$ in a “natural” (or rather, functorial) way. From the Boolean ring perspective, $X$ is a basis of the underlying $\mathbb{Z}_2$-vector space of $B$; to pin it down exactly, we use the full ring structure.

$X$ is naturally a basis of $PX$; more precisely, under the embedding $i: X \to PX$ defined by $x \mapsto \{x\}$, every subset $S \subseteq X$ is uniquely a disjoint sum of finitely many elements of $i(X)$: $S = \sum_{x \in X} a_x(S) \{x\}$ where $a_x(S) \in \{0, 1\} = \mathbb{Z}_2$: naturally, $a_x(S) = 1$ iff $x \in S$. For each $S$, we can treat the coefficient $a_x(S)$ as a function of $x$ valued in $\mathbb{Z}_2$. Let $\hom(X, \mathbb{Z}_2)$ denote the set of functions $X \to \mathbb{Z}_2$; this becomes a Boolean ring under the obvious pointwise definitions $(f + g)(x) := f(x) + g(x)$ and $(f g)(x) = f(x) g(x)$. The function $PX \to \hom(X, \mathbb{Z}_2)$ which takes $S \in PX$ to the coefficient function $a_{(-)}(S)$ is a Boolean ring map which is one-to-one and onto, i.e., is a Boolean ring isomorphism. (Exercise: verify this fact.)

Or, we can turn this around: for each $x \in X$, we get a Boolean ring map $PX \to \mathbb{Z}_2$ which takes $S$ to $a_x(S)$. Let $\mbox{Bool}(PX, \mathbb{Z}_2)$ denote the set of Boolean ring maps $PX \to \mathbb{Z}_2$.

Proposition 4: For a finite set $X$, the function $X \to \mbox{Bool}(PX, \mathbb{Z}_2)$ that sends $x$ to $a_x(-)$ is a bijection (in other words, an isomorphism).

Proof: We must show that for every Boolean ring map $\phi: PX \to \mathbb{Z}_2$, there exists a unique $x \in X$ such that $\phi = a_x(-)$, i.e., such that $\phi(T) = a_x(T)$ for all $T \in PX$. So let $\phi$ be given, and let $S$ be the intersection (or Boolean ring product) of all $T \in PX$ for which $\phi(T) = 1$. Then

$\phi(S) = \phi(\prod_{T: \phi(T) = 1} T) = \prod_{T: \phi(T) = 1} \phi(T) = 1$.

I claim that $S$ must be a singleton $\{x\}$ for some (evidently unique) $x \in X$. For $1 = \phi(S) = \phi(\sum_{x \in S} \{x\}) = \sum_{x \in S} \phi(\{x\})$, forcing $\phi(\{x\}) = 1$ for some $x \in S$. But then $S \subseteq \{x\}$ according to how $S$ was defined, and so $S = \{x\}$. To finish, I now claim $\phi(T) = a_x(T)$ for all $T \in PX$. But $\phi(T) = 1$ iff $S \subseteq T$ iff $x \in T$ iff $a_x(T) = 1$. This completes the proof. $\Box$
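Proposition 4 can be confirmed by brute force for a small $X$: enumerate all functions $PX \to \mathbb{Z}_2$, keep those preserving $0$, $1$, $+$, and $\cdot$, and check that they are exactly the point evaluations $a_x$. A Python sketch (names such as `is_hom` are my own):

```python
from itertools import chain, combinations, product

X = [1, 2, 3]
PX = [frozenset(s) for s in chain.from_iterable(
    combinations(X, r) for r in range(len(X) + 1))]

def is_hom(phi):
    """phi: dict PX -> {0,1}; check the Boolean ring map axioms."""
    if phi[frozenset()] != 0 or phi[frozenset(X)] != 1:
        return False
    return all(phi[s ^ t] == (phi[s] + phi[t]) % 2 and    # preserves +
               phi[s & t] == phi[s] * phi[t]              # preserves *
               for s in PX for t in PX)

# all 2**|PX| candidate functions, filtered down to the homomorphisms
homs = [phi for values in product((0, 1), repeat=len(PX))
        for phi in [dict(zip(PX, values))] if is_hom(phi)]

# the homomorphisms are exactly the point evaluations a_x(S) = [x in S]
evals = [{S: int(x in S) for S in PX} for x in X]
assert len(homs) == len(X)
assert all(e in homs for e in evals)

print("Bool(PX, Z2) has exactly", len(homs), "elements, one per point of X")
```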

This proposition is a vital clue, for if $B$ is to be isomorphic to a power set $PX$ (equivalently, to $\hom(X, \mathbb{Z}_2)$), the proposition says that the $X$ in question can be retrieved reciprocally (up to isomorphism) as $\mbox{Bool}(B, \mathbb{Z}_2)$.

With this in mind, our first claim is that there is a canonical Boolean ring homomorphism

$B \to \hom(\mbox{Bool}(B, \mathbb{Z}_2), \mathbb{Z}_2)$

which sends $b \in B$ to the function $eval_b$ which maps $\phi \in \mbox{Bool}(B, \mathbb{Z}_2)$ to $\phi(b)$ (i.e., evaluates $\phi$ at $b$). That this is a Boolean ring map is almost a tautology; for instance, that it preserves addition amounts to the claim that $eval_{b+c}(\phi) = eval_b(\phi) + eval_c(\phi)$ for all $\phi \in \mbox{Bool}(B, \mathbb{Z}_2)$. But by definition, this is the equation $\phi(b+c) = \phi(b) + \phi(c)$, which holds since $\phi$ is a Boolean ring map. Preservation of multiplication is proved in exactly the same manner.

Theorem 2: If $B$ is a finite Boolean ring, then the Boolean ring map

$eval: B \to \hom(\mbox{Bool}(B, \mathbb{Z}_2), \mathbb{Z}_2)$

is an isomorphism. (So, there is a natural isomorphism $B \cong P(\mbox{Bool}(B, \mathbb{Z}_2))$.)

Proof: First we prove injectivity: suppose $b \in B$ is nonzero. Then $\neg b \neq 1$, so the ideal $(\neg b) = \{a \cdot \neg b: a \in B\} = \{x \in B: x \leq \neg b\}$ is a proper ideal. Let $I$ be a maximal proper ideal containing $\neg b$, so that $B/I$ is both a field and a Boolean ring. Then $B/I \cong \mathbb{Z}_2$ (otherwise any element $x \in B/I$ not equal to $0, 1 \in B/I$ would be a zero divisor on account of $x(1-x) = 0$). The evident composite

$B \to B/I \cong \mathbb{Z}_2$

yields a homomorphism $\phi: B \to \mathbb{Z}_2$ for which $\phi(\neg b) = \phi(1-b) = 0$, so $\phi(b) = eval_b(\phi) = 1$. Therefore $eval_b$ is nonzero, as desired.

Now we prove surjectivity. A function $g: \mbox{Bool}(B, \mathbb{Z}_2) \to \mathbb{Z}_2$ is determined by the set of elements $\phi$ mapping to $1$ under $g$, and each such homomorphism $\phi: B \to \mathbb{Z}_2$, being surjective, is uniquely determined by its kernel, which is a maximal ideal. Let $J$ be the intersection of these maximal ideals; it is an ideal. Notice that an ideal is closed under joins in the Boolean algebra, since if $x, y$ belong to $J$, then so does $x \vee y = x + y + x y$. Let $j$ be the join of the finitely many elements of $J$; notice $J = \{x \in B: x \leq j\} = (j)$ (actually, this proves that every ideal of a finite Boolean ring $B$ is principal). In fact, writing $k_\phi$ for the unique element such that $\ker(\phi) = (k_\phi)$, we have

$j = \bigwedge_{\phi: g(\phi) = 1} k_\phi$

(certainly $j \leq k_\phi$ for all such $\phi$, since $J \subseteq \ker(\phi) = \{x \in B: x \leq k_\phi\}$, but also $\bigwedge_{g(\phi) = 1} k_\phi$ belongs to the intersection of these kernels and hence to $J = \{x \in B: x \leq j\}$, whence $\bigwedge_{g(\phi) = 1} k_\phi \leq j$).

Now let $b = 1 - j$; I claim that $g = eval_b$, proving surjectivity. We need to show $g(\phi) = eval_b(\phi) = \phi(b)$ for all $\phi \in \mbox{Bool}(B, \mathbb{Z}_2)$. In one direction, we already know from the above that if $g(\phi) = 1$, then $j$ belongs to the kernel of $\phi$, so $\phi(j) = 0$, whence $\phi(b) = \phi(1-j) = 1$.

For the other direction, suppose $\psi(b) = 1$, or that $\psi(j) = 0$. Now the kernel of $\psi$ is principal, say $(k)$ for some $k \neq 1$. We have $j \leq k$, so

$k = k \vee j = k \vee \bigwedge_{g(\phi) = 1} k_\phi = \bigwedge_{g(\phi) = 1} k \vee k_\phi$

from which it follows that $k \vee k_\phi \neq 1$ for some $\phi \in g^{-1}(1)$. But then $(k \vee k_\phi)$ is a proper ideal containing the maximal ideals $(k)$ and $(k_\phi)$; by maximality it follows that $(k) = (k \vee k_\phi) = (k_\phi)$. Since $\psi$ and $\phi$ have the same kernels, they are equal. And therefore $g(\psi) = g(\phi) = 1$. We have now proven both directions of the statement ($\psi(b) = 1$ if and only if $g(\psi) = 1$), and the proof is now complete. $\Box$
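Theorem 2 can likewise be verified computationally for $B = PX$ with $X$ small: build all Boolean ring maps $\phi: B \to \mathbb{Z}_2$ by brute force, record each $eval_b$ as the tuple of its values, and check that $eval$ is a bijection onto the full function space $\hom(\mbox{Bool}(B, \mathbb{Z}_2), \mathbb{Z}_2)$. A self-contained Python sketch (all names my own):

```python
from itertools import chain, combinations, product

X = [1, 2, 3]
B = [frozenset(s) for s in chain.from_iterable(
    combinations(X, r) for r in range(len(X) + 1))]

def is_hom(phi):
    """phi: dict B -> {0,1}; check the Boolean ring map axioms."""
    if phi[frozenset()] != 0 or phi[frozenset(X)] != 1:
        return False
    return all(phi[s ^ t] == (phi[s] + phi[t]) % 2 and
               phi[s & t] == phi[s] * phi[t]
               for s in B for t in B)

homs = [phi for values in product((0, 1), repeat=len(B))
        for phi in [dict(zip(B, values))] if is_hom(phi)]

# eval_b sends phi to phi(b); record eval_b as a tuple of values over homs
evals = {b: tuple(phi[b] for phi in homs) for b in B}

# eval is injective ...
assert len(set(evals.values())) == len(B)
# ... and surjective onto all functions Bool(B, Z2) -> Z2
assert set(evals.values()) == set(product((0, 1), repeat=len(homs)))

print("eval: B -> hom(Bool(B, Z2), Z2) is an isomorphism")
```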

• Remark: In proving both injectivity and surjectivity, we had in each case to pass back and forth between certain elements $b$ and their negations, in order to take advantage of some ring theory (kernels, principal ideals, etc.). In the usual treatments of Boolean algebra theory, one circumvents this passage back-and-forth by introducing the notion of a filter of a Boolean algebra, dual to the notion of ideal. Thus, whereas an ideal is a subset $I \subseteq B$ closed under joins and such that $x \wedge y \in I$ for $x \in B, y \in I$, a filter is (by definition) a subset $F$ closed under meets and such that $x \vee y \in F$ whenever $x \in B, y \in F$ (this second condition is equivalent to upward-closure: $y \in F$ and $y \leq x$ implies $x \in F$). There are also notions of principal filter and maximal filter, or ultrafilter as it is usually called. Notice that if $I$ is an ideal, then the set of negations $\{\neg x: x \in I\}$ is a filter, by the De Morgan laws, and vice-versa. So via negation, there is a bijective correspondence between ideals and filters, and between maximal ideals and ultrafilters. Also, if $f: B \to C$ is a Boolean algebra map, then the inverse image $f^{-1}(1)$ is a filter, just as the inverse image $f^{-1}(0)$ is an ideal. Anyway, the point is that had we already had the language of filters, the proof of theorem 2 could have been written entirely in that language by straightforward dualization (and would have saved us a little time by not going back and forth with negation). In the sequel we will feel free to use the language of filters, when desired.

For those who know some category theory: what is really going on here is that we have a power set functor

$P(-) = \hom(-, \mathbb{Z}_2): \mbox{FinSet}^{op} \to \mbox{FinBool}$

(taking a function $f: X \to Y$ between finite sets to the inverse image map $f^{-1}: PY \to PX$, which is a map between finite Boolean algebras) and a functor

$Q(-) = \mbox{Bool}(-, \mathbb{Z}_2): \mbox{FinBool}^{op} \to \mbox{FinSet}$

which we could replace by its opposite $Q(-)^{op}: \mbox{FinBool} \to \mbox{FinSet}^{op}$, and the canonical maps of proposition 4 and theorem 2,

$X \to \mbox{Bool}(\hom(X, \mathbb{Z}_2), \mathbb{Z}_2),$

$B \to \hom(\mbox{Bool}(B, \mathbb{Z}_2), \mathbb{Z}_2),$

are components (at $X$ and $B$) of the counit and unit for an adjunction $Q(-)^{op} \dashv P(-)$. The actual statements of proposition 4 and theorem 2 imply that the counit and unit are natural isomorphisms, and therefore we have defined an adjoint equivalence between the categories $\mbox{FinSet}^{op}$ and $\mbox{FinBool}$. This is the proper categorical statement of Stone duality in the finite case, or what we are calling “baby Stone duality”. I will make some time soon to explain what these terms mean.
