Last summer, Todd and I discussed a problem and its solution, and I had wondered if it was fit enough to be in the POW-series (on this blog) when he mentioned that the problem might be somewhat too easy for that purpose. Of course, I immediately saw that he was right. But, a few days back, I thought it wouldn’t be bad if we shared this cute problem and its solution over here, the motivation being that some of our readers may perhaps gain something out of it. What is more, an analysis of an egf solution to the problem lends itself naturally to a discussion of combinatorial species. Todd will talk more about it in the second half of this post. Anyway, let’s begin.

PROBLEM: Suppose $A = \{ 1,2, \ldots , n \}$, where $n$ is a positive natural number. Find the number of endofunctions $f: A \rightarrow A$ satisfying the idempotent property, i.e. $f \circ f = f$.

It turns out that finding a solution to the above problem is equivalent to counting the number of forests with $n$ nodes and height at most $1$, which I found here (click only if you wish to see the answer!) at the Online Encyclopedia of Integer Sequences. If you haven’t clicked on that link yet and wish to solve the problem on your own, then please stop reading beyond this point.

SOLUTION: There are two small (and related) observations that need to be made. And, both are easy ones.

Lemma 1: $f$ has at least one fixed point.

Proof: Pick any $i \in A$ and let $f(i) = j$, where $j \in A$. Then, using the idempotent property, we have $f(f(i)) = f(i)$, which implies $f(j) = j$. Therefore, $j$ is a fixed point, and this proves our lemma.

Lemma 2: The elements in $A$ that are not fixed points are mapped to fixed points of $f$.

Proof: Suppose$j \in A$ is not a fixed point such that $f(j) = k$.  Then, using the idempotent property again, we immediately have $f(f(j)) = f(j)$, which implies $f(k) = k$, thereby establishing the fact that $k$ itself is a fixed point. Hence, $j$ (which is not a fixed point) is mapped to some fixed point of $f$.

In both the lemmas above, the idempotent property “forces” everything.

Now, the solution is right before our eyes! Suppose $f$ has $m$ fixed points. Then there are $\displaystyle \binom{n}{m}$ ways of choosing them. And, each of the remaining $n - m$ elements of $A$ that are not fixed points are to be mapped to any one of the $m$ fixed points. And, there are a total of $m^{n-m}$ ways of doing that. So, summing over all $m$, our final answer is $\displaystyle \sum_{m=0}^{n} \binom{n}{m} m^{n-m}$.

### Exponential Generating Function and Introduction to Species

Hi; Todd here. Vishal asked whether I would discuss this problem from the point of view of exponential generating functions (or egf’s), and also from a categorical point of view, using the concept of species of structure, which gives the basis for a categorical or structural approach to generatingfunctionology.

I’ll probably need to write a new post of my own to do any sort of justice to these topics, but maybe I can whet the reader’s appetite by talking a little about the underlying philosophy, followed by a quick but possibly cryptic wrap-up which I could come back to later for illustrative purposes.

Enumerative combinatorics studies the problem of counting the number $a_n$ of combinatorial structures of some type on an $n$-element set, such as the number of idempotent functions on that set, or the number of equivalence relations, and so on. A powerful idea in enumerative combinatorics is the idea of a generating function, where we study the series $a_n$ by rolling them into a single analytic function, such as

$\displaystyle A(x) = \sum_{n \geq 0} \frac{a_n x^n}{n!},$

(this the so-called “exponential” generating function of $\{a_n\}_{n \geq 0}$). In many cases of interest, the function $A(x)$ will be recognizable in terms of operations familiar from calculus (addition, multiplication, differentiation, composition, etc.), and this can then be used to extract information about the series $a_n$, such as explicit formulas, asymptotics, and so on. If you’ve never seen this idea in action, you should definitely take a look at Wilf’s book generatingfunctionology, or at the book Concrete Mathematics by Graham, Knuth and Patashnik.

Each of the basic operations one performs on analytic functions (addition, multiplication, composition, etc.) will, it turns out, correspond to some set-theoretic operation directly at the level of combinatorial structures, and one of the trade secrets of generating function technology is to have very clear pictures of the combinatorial structures being counted, and how these pictures are built up using these basic structural operations.

In fact, why don’t we start right now, and figure out what some of these structural operations would be? In other words, let’s ask ourselves: if $A(x)$ and $B(x)$ are generating functions for counting combinatorial structures of type (or species) $A$ and $B$, then what types of structures would the function $A(x) + B(x)$ “count”?  How about $A(x)B(x)$? Composition $A(B(x))$?

The case of $A(x) + B(x)$ is easy: writing

$\displaystyle A(x) + B(x) = \sum_{n \geq 0} \frac{a_n x^n}{n!} + \sum_{n \geq 0} \frac{b_n x^n}{n!} = \sum_{n \geq 0} \frac{(a_n + b_n) x^n}{n!},$

and thinking of $a_n$ as counting structures of type $A$ on an $n$-element set, and $b_n$ as counting structures of type $B$, the quantity $a_n + b_n$ counts elements in the disjoint union of the sets of $A$-structures and $B$-structures.

In the categorical approach we will discuss later, we actually think of structure types (or species of structure) $A$ as functors, which take an $n$-element set $S$ to the set $A\left[S\right]$ of structures of type $A$ on $S$. Here, we have to be a little bit careful about what categories we’re talking about, but the general idea is that if we have a bijection $f: S \to T$ from one $n$-element set to another, then it should always be possible to “transport” $A$-structures on $S$ to $A$-structures on $T$, simply by relabeling points along the bijection $f$. So, let us define a species to be a functor

$A: FB \to Set$

where $FB$ is the category of finite sets and bijections (not all functions, just bijections!), and $Set$ is the category of sets. In enumerative combinatorics, the set $A\left[S\right]$ is normally assumed to be finite, but in other applications of the notion of species, we actually allow a lot more latitude, and allow the functor $A$ to map into other categories $C$, not just $Set$ (“$C$-valued species”). But if we stick for now just to set-valued species $A$, $B$, then we define the species $A + B$ by the functorial formula

$\displaystyle (A + B)\left[S\right] = A\left[S\right] \sqcup B\left[S\right]$

where $\sqcup$ denotes disjoint union. So addition of generating functions will correspond to the concrete operation of taking disjoint unions of sets of combinatorial species.

More interesting is the case of multiplication. Let’s calculate the product of two egf’s:

$\displaystyle A(x) B(x) = (\sum_{j \geq 0} \frac{a_j x^j}{j!})(\sum_{k \geq 0} \frac{b_k x^k}{k!}) = \sum_{n \geq 0} (\sum_{j + k = n} \frac{n!}{j! k!} a_j b_k) \frac{x^n}{n!}$

The question is: what type of structure does the expression $\displaystyle \sum_{j+k = n} \frac{n!}{j! k!} a_j b_k$ “count”? Look at the individual terms: the binomial coefficient $\displaystyle \frac{n!}{j! k!}$ describes the number of ways of decomposing an $n$-element set into two disjoint subsets, one with $j$ elements and the other with $k$, where $j$ and $k$ add to $n$. Then, $a_j$ is the number of ways of putting an $A$-structure on the $j$-element part, and $b_k$ is the number of $B$-structures on the $k$-element part.

This suggests a new operation on structure types: given structure types or species $A, B$, we define a new species $A \otimes B$ according to the formula

$\displaystyle (A \otimes B)\left[S\right] = \bigsqcup_{T \sqcup U = S} A\left[T\right] \times B\left[U\right]$

(that is, a structure of type $A \otimes B$ on a set $S$ is an ordered pair, consisting of an $A$-structure on a subset of $S$ and a $B$-structure on its complement). This functorial operation is usually called the “convolution product” of the combinatorial species $A, B$: it is the concrete set-theoretic operation which corresponds to multiplication of generating functions.

Finally, let’s look at composition $A(B(x))$. Here we make the technical assumption that $b_0 = 0$ (no $B$-structures on the empty set!), so that we won’t have divergence issues blowing up in our faces: we want to remain safely within the realm of finitary structures. Okay, then, what type of combinatorial structure does this egf count?

Perhaps not surprisingly, this is rather more challenging than the previous two examples. In analytic function language, we are trying here to give a meaning to the Taylor coefficients of a composite function in terms of the Taylor coefficients of the original functions — for this, there is a famous formula attributed to Faà di Bruno, which we then want to interpret combinatorially. If you don’t already know this but want to think about this on your own, then stop reading! But I’ll just give away the answer, and say no more for now about where it comes from, although there’s a good chance you can figure it out just by staring at it for a while, possibly with paper and pen in hand.

Definition: Let $A, B: FB \to Fin$ be species (functors from finite sets and bijections to finite sets), and assume $B\left[\emptyset\right] = \emptyset$. The substitution product $A \circ B$ is defined by the formula

$\displaystyle (A \circ B)\left[S\right] = \sum_{E \in Eq(S)} A\left[S/E\right] \times \prod_{c \in S/E} B\left[c\right]$

This clearly requires some explanation. The sum here denotes disjoint union, and $Eq(S)$ denotes the set of equivalence relations on the finite set $S$. So $E$ here is an equivalence relation, which partitions $S$ into nonempty sets $c$ ($E$-equivalence classes). And the quotient $S/E$ denotes the set of such equivalence classes (so we think of each class $c$ as a point of $S/E$). What this formula says is that a structure of type $A \circ B$ on $S$ consists of a partition of $S$ into a bunch of non-empty blobs, a $B$-structure on each blob, and then an $A$-structure on the set of blobs.

It’s high time for an example! So let’s look at Vishal’s problem, and see if we can picture it in terms of these operations. We’re going to need some basic functions (or functors!) to apply these operations to, and out of thin air I’ll pluck the two basic ones we’ll need:

$\displaystyle E(x) = \exp(x) = \sum_{n \geq 0} \frac{x^n}{n!}$

$F(x) = x$

The first is the generating function for the series $e_n = 1$. So for the species $E$, there’s just one structure of type $E$ for each set $S$ (in categorical language, the functor $E: FB \to Set$ is the terminal functor). We can just think of that structure as the set $S$ itself, if we like, with no other structure appended thereon.

For $F$, we have $f_n = 0$ unless $n = 1$, where $f_1 = 1$. So $F$ is the species for the one-element set structure (meaning that $F\left[S\right] = \emptyset$ unless $S$ has cardinality 1, in which case $F\left[S\right] = \{S\}$).

Okay, on to Vishal’s example. He was counting the number of idempotent functions $f: S \to S$, and now, as promised, I want to determine the corresponding egf. You might be able to find it by looking at his formula, but obviously I want to use the ideas I’ve developed thus far, which focuses much more on the pictures. So, let’s picture $f: S \to S$, first as Vishal did, by thinking of the elements of $S$ as ‘nodes’, and then drawing a directed edge from node $x$ to node $y$ if $f(x) = y$. (Then, by idempotence of $f$, $y$ will be a fixed point of $f$. Let’s agree not to bother drawing an edge from $y$ to itself, if $y$ is already a fixed point.)

In this picture, we get a directed graph which consists of a disjoint union of “sprouts”: little bushes, each rooted at a fixed point of $f$, whose only other nodes are “leaves” joined to the root by an edge. We can simplify the picture a little: if you put a circle around each sprout, you don’t need the edges at all: just mark one of the points inside as the root, and you know what to do.

So we arrive at a picture of an idempotent function on $S$: a partition of $S$ into a collection of (nonempty) blobs, and inside each blob, one of the points is marked “root”. In terms of our operations, what does it mean to mark a point in a blob? It just means: break the blob into two pieces, where one piece is given the structure of “one-element set”, and the other piece is just itself. In terms of the ideas developed above, this means each blob carries a $F \otimes E$ structure; we’ll suggestively write this structure type as $X \otimes \exp(X)$.

In this picture of idempotent $f$, there is no extra combinatorial structure imposed on the set of blobs, beyond the set itself. In other words, in this picture, the set of blobs carries merely an “$E$-structure”, nothing more.

So, putting all this together, we picture an idempotent function on $S$ as a partition or equivalence relation on $S$, together with an assignment of a marked point in each equivalence class. In the language of species operations, we may therefore identify the structure type of idempotent functions with

$E \circ (F \otimes E)$

or more suggestively, $\exp \circ (X \otimes \exp(X))$. The exponential generating function is, of course, $e^{x e^x}$!

In summary, the theory of species is a functorial calculus which projects onto its better-known “shadow”, the functional calculus of generating functions. That is to say, we lift operations on enumeration sequences $\{a_n\}$, as embodied in their generating functions, directly up to the level of the sets we’re counting, where the functorial operations become both richer and more concrete. The functorial analogue of the generating function itself is called the “analytic functor” attached to the species (the species itself being the concrete embodiment of the enumeration).

Much more could be said, of course. Instead, here’s a little exercise which can be solved by working through the ideas presented here: write down the egf for the number of ways a group of people can be split into pairs, and give an explicit formula for this number. Those of you who have studied quantum field theory may recognize this already (and certainly the egf is very suggestive!) ; in that case, you might find interesting the paper by Baez and Dolan, From Finite Sets to Feynman Diagrams, where the functorial point of view is used to shed light on, e.g., creation and annihilation operators in terms of simple combinatorial operations.

The literature on species (in all its guises) is enormous, but I’d strongly recommend reading the original paper on the subject:

• André Joyal, Une théorie combinatoire des séries formelles, Adv. Math. 42 (1981), 1-82.

which I’ve actually referred to before, in connection with a POW whose solution involves counting tree structures. Joyal could be considered to be a founding father of what I would call the “Montreal school of combinatorics”, of which a fine representative text would be

• F. Bergeron, G. Labelle, and P. Leroux, Combinatorial Species and Tree-like Structures, Encyclopedia of Mathematics and its Applications 67, 1998.

More to come, I hope!

I thought I would share with our chess-loving readers the following interesting (and somewhat well-known) mathematical chess paradox , apparently proving that $64=65$, and the accompanying explanation offered by Prof. Christian Hesse, University of Stuttgart (Germany).  It shows a curious connection between the well-known Cassini’s identity (related to Fibonacci numbers) and the $8 \times 8$ chessboard ($8$ being a Fibonacci number!). The connection can be exploited further to come up with similar paradoxes wherein any $F_n \times F_n$ -square can always be “rerranged” to form a $F_{n-1} \times F_{n+1}$ -rectangle such that the difference between their areas is either $+1$ or $-1$. Of course, for the curious reader there are plenty of such dissection problems listed in Prof David Eppstein’s Dissection page.

This being a mathematics blog, I’m sure a lot of readers out like to play Minesweeper. I’ve just obtained a personal best today (94 seconds on expert level) which, as Minesweeper buffs know, is nowhere close to world-class levels, but which made me happy anyway, as I’d never broken into the double digits before today!

It seems to me that world-class competitors must know some tricks with the mouse which I’ve never bothered to master, particularly because my laptop doesn’t have a mouse but rather a touchpad. This being the case, I keep my right index finger on the touchpad to guide the cursor, and the left index finger to click. I always left-click: that is, in my style of play, I [practically] never flag squares for bombs; I click only on non-bomb squares. For it’s well-known, or at least it should be, that the program doesn’t care if you identify where the bombs are — you get full credit for only identifying all the numbered squares.

To play in this style well, one needs to be fluent in a number of tactical tricks, which I don’t have good names for, but which in my personal argot I call things like “1-2-1″, “1-2-2-1″, “rule of three”, to name just a few. But that’s not what I set out to discuss, really. What I’d really like to hear from good players is: what opening strategies do you use?

The personal best I just set came after deciding on a new opening strategy. What I had been doing is clicking along border squares. Under that strategy, one could of course just keep clicking until one opens up an area, but often I would add to that the observation that if one clicked on a 1, thus leading to, e.g.,

x 1 x (–> border row)
x x x

then one could then click on the non-border square just adjacent to the 1, with only a 1 in 5 chance of setting off a bomb. If one then hits another 1:

x 1 x
x 1 x
x x x

then one can immediately open up the line of squares on the third rank, leading to a configuration such as

x 1 x
x 1 x
1 1 1

or better. This is often a cheap and quick way of opening up squares or otherwise getting a tactical toehold.

The new strategy I began using today is not to click along the border, but to click along the ranks or files adjacent to the border. Under this strategy, if one lands on a 1, leading to

x x x  (–> border row)
x 1 x
x x x

then one can click on the border square adjacent to the 1, with only a 1 in 8 chance of setting off a bomb. If one does not set off a bomb, that square has to be a 1:

x 1 x
x 1 x
x x x

and then one can proceed as before. So I’ve just lowered my odds of hitting a bomb, plus a very small fractional gain in processing time that comes with the certain knowledge that it’s a 1 if not a bomb. So far the strategy has paid off well!

I’d like to hear other people’s opening strategies, and also I’d like to know some statistics. For example, considered over the space of expert-level games, what is the probability of getting a 1, a 2, and so on? Does anyone know? (It seems this would be very difficult computing analytically — one is probably better off using a Monte Carlo simulation. But I don’t have the wherewithal to set that kind of thing up.)

Love is like PI – natural, irrational and very important!
– Lisa Hoffman

Happy Co-Valentine's Day

Valentine’s Day is usually associated with romantic love, but I think such an idea although wonderful is somewhat restrictive. This time of the year, I believe, is also about letting people close and dear to you know how much you love and care about them! Keeping that in mind, I wish my parents a Happy Valentine’s Day and hope that my younger brother, Vishant, has a great Valentine’s Day too!

I also sincerely hope that Todd gets to spend a great Valentine weekend with his wife and family! And, here’s hoping that all our readers and my friends (including Aditya, Pawan and Kenji!) will today not hesitate in expressing their love to their near and dear ones.

And very importantly, here’s wishing Carolyn an unforgettable Valentine’s Day! Thanks for being my Valentine even though you are thousands of miles away!!

[I do hope Todd will forgive me for posting something completely non-mathematical. In my defense, this post has at least a reference to PI and category theory! :-) ]

The solutions are in! This problem of the week was interesting for me: the usual pattern has been that I pose problems that I’ve solved myself at some point in the past, giving me a kind of “inside edge” on understanding the solutions that come in. But, as I said in POW-12, the difference this time is that the solution I knew of came from someone else (Arin Chaudhuri). What was interesting for me — and given the circumstances, it was probably inevitable — is that some of the solutions we received forced me to make contact with some mathematics I’d only heard about but never studied. Let me get to that in a minute.

Another interesting thing for me — speaking now with my category theorist’s hat on — is how utterly simple and conceptual Arin’s proof was! I was pleased to see that regular problem solver Philipp Lampe also spotted it. Wait for it…

Solution I by Philipp Lampe, University of Bonn: The answer is 8.

Among the eight neighbors of an arbitrary vertex, all colors of an admissible coloring must occur. Thus, 8 is an upper bound for the maximum number of colors one can use. We have to show that there is an admissible coloring with eight colors.

The vertices of the 8-dimensional cube may be represented by vectors $(a_1, a_2, \ldots, a_8)$ with $a_i$ in $\{0,1\}$, in other words as vectors in $\mathbb{F}_{2}^8$, the 8-dimensional vector space over the field $\mathbb{F}_2 = \{0, 1\}$ with two elements (i.e., the integers modulo 2). Two such vectors $u, v$ are neighbors iff their coordinate vectors differ in exactly one place, in other words if $u = v + e_i$ considered modulo 2, where $e_i$ is one of the eight standard basis elements (with $i$-th coordinate 1, and the other coordinates 0).

Now let our “colors” be the 8 elements $x_1, x_2, \ldots, x_8$ of $\mathbb{F}_{2}^3$, the 3-dimensional vector space over $\mathbb{F}_2$. Let the vertex “coloring”

$C: \mathbb{F}_{2}^8 \to \mathbb{F}_{2}^3$

be the unique $\mathbb{F}_2$-linear map such that $C(e_i) = x_i$; that is, define the color of a vertex/vector by

$\displaystyle C\left(a_1,\ldots, a_8\right) = \sum_{i=1}^{8} a_i x_i$

Now, if $v$ is any vector, the colors of its neighbors are $C(v + e_i) = C(v) + C(e_i) = C(v) + x_i$. The colors of these neighbors are all distinct since the $x_i$ are distinct. Hence all 8 colors are represented among the colors of the neighbors of any given vertex $v$, QED.

But I also learned a thing or two by studying the next solution. It relies on the theory of Hamming codes, with which I was not conversant. Up to some small edits, here is exactly what Sune Jakobsen submitted:

Solution II by Sune Jakobsen (first-year student), University of Copenhagen: Since each vertex only has 8 neighbors, the answer cannot be greater that 8.

Now we construct such a coloring with 8 colors. An 8-dimensional cube can be represented by the graph with vertices in $\{0, 1\}^8$, and with an edge between them iff the Hamming distance between them is 1. We color a vertex with color 8 if the seven first bits in the vertex is a “correct” Hamming message (cf. Hamming code (7,4)), and color it in color $i$ if the first seven bits give a correct Hamming message upon changing bit $i$. This is a well-defined coloring, since each element in $\{0, 1\}^7$ is either a correct Hamming message, or is Hamming distance 1 away to exactly one correct Hamming message.

It remains to show that no vertex is neighbor to two vertices of the same color. The Hamming distance between these two vertices is 2, thus the Hamming distance between the first 7 bits of two neighbors must be 1 or 2. If two neighbors had the same color $i$, then by definition one would get two correct Hamming messages by changing bit $i$ in both of them, and the Hamming distance between these messages would be the same as before — either 1 or 2. But the distance between any two correct Hamming messages is at least 3. Contradiction.

Remarks:

1. Let me give a little background to Sune’s solution. Mathematically, the Hamming code called “(7, 4)” is the image of injective linear map

$G: \mathbb{F}_{2}^4 \to \mathbb{F}_{2}^7$

given by the $7 \times 4$ matrix

1101
1011
1000
0111
0100
0010
0001

The Hamming code is what is known as an “error-correcting code”. Imagine that you want to send a 4-bit message (each bit being a 0 or 1) down a transmission line, but that due to noise in the line, there is the possibility of a bit being flipped and the wrong message being received. What can you do to help ensure that the intended message is received?

The answer is to add some “parity check bits” to the message. In the code (7, 4), one adds in three parity checks, so that the transmitted message is 7 bits long. What one does is apply the matrix $G$ to the 4-vector, to get a 7-vector (remember we’re working modulo 2), and this 7-vector is sent down the line. Assuming only a moderate amount of noise in the transmission line, perhaps the 7-bit message will remain intact or perhaps a single bit will be flipped, more rarely two bits will be flipped (and we’ll assume the event that more than two are flipped has negligible probability). Now, the Hamming encoding $G$ is rigged up so that if the 7-bit vector is received as sent, then the parity checks will report 0 errors. If the parity checks report an error, they report precisely where the error occurred if the received vector was off by one bit. (If two flips occurred, then the receiver can still detect from the parity checks that an error occurred. The parity checks can never detect how many bits were flipped, but if the receiver assumes correctly that just one bit got flipped, he can restore the intended message with accuracy. If three or more got flipped, then the receiver got the wrong message, but he would never know it if the parity checks reported back 0 errors.)

How does this work? By having the receiver apply a parity-check matrix $H$ to the received message, given by the $3 \times 7$ matrix

1010101
0110011
0001111

In the first place, $HG = 0$, and so if the message received belongs to the image of $G$ (is a “correct Hamming message” in the terminology of Solution II), which will indeed be the case if there were no errors in the transmission, then $H$ applied to the message will produce the zero vector. In the case where one bit got flipped in the transmission, $H$ applied to the received vector will return a nonzero 3-vector, but the beauty of the Hamming code is that the 3-vector will spell out in binary the bit the error occurred (for example, if the output vector is $(0 1 1)^t$, then error occurred in the bit with binary number 011, that is bit 3). In that case, the receiver flips that bit to restore the original message. In these two cases, the original 4 bits are then read off as the 3rd, 5th, 6th, and 7th coordinates of the (possibly corrected) 7-vector.

By the way, Sune reminded us that Hamming codes also came up in a post by Greg Muller over at The Everything Seminar, who used the existence of a Hamming code in every dimension $2^k - 1$ to solve general hat-guessing puzzles.

2. Within minutes of posting the original problem, we received a message from David Eppstein, who reported that the solution of 8 colors is essentially contained in a paper he coauthored with T. Givaris (page 4); his solution is close in spirit to Sune’s.

Arin Chaudhuri also surmised the connection to Hamming codes, and mentions that the problem (in a slightly different formulation) originally came from a friend of a friend, who blogs about a number of related problems on this page. Presumably the author had things like error-correcting codes in mind.

3. Sune and David noted that their solutions generalize to show that on the $2^k$-dimensional cube (for any $k$), it is possible to color the vertices with $2^k$ colors (the maximum number possible) so that all of the colors show up as colors of the neighbors of any given vertex.

This is very easy to see by adapting Philipp’s method. Indeed, for each $k$ just take the color set to be the $\mathbb{F}_2$-vector space $S = \mathbb{F}_{2}^k$. The set of vertices of the $2^k$-dimensional cube may identified with the $\mathbb{F}_2$-vector space $V(S)$ generated by the set $S$ (as basis), and the desired coloring is then just the $\mathbb{F}_2$-linear map $C: V(S) \to S$ that extends the identity function on $S$. As mathematical constructions go, you can’t get much more canonical than that! No fuss, no muss — it’s a purely categorical construction (categorists call it a counit of an adjunction). So then why didn’t I see that myself? :-)

4. We’re still not sure what the story is in other dimensions. If $a_n$ is the maximum number of admissible colors in dimension $n$, then about all any one of us knows right now is that $a_n = n$ for $n = 2^k$, $a_n$ is weakly monotone increasing, and that $a_n < n$ if $n$ is not a power of 2. There’s a conjecture afloat that $a_n = 2^{\left[\log_2 n\right]}$ where $\left[x\right]$ is the floor function, but for now that should be treated as a wild guess. If anyone has further information, please let us know!

Solved by Arin Chaudhuri, David Eppstein, Sune Jakobsen, and Philipp Lampe. Thanks to all those who wrote in!

Happy holidays, folks! And happy birthday to Topological Musings, which Vishal started just a little over a year ago — we’ve both had a really good time with it.

And in particular with the Problem of the Week series. Which, as we all know, doesn’t come out every week  — but then again, to pinch a line from John Baez’s This Week’s Finds, we never said it would: only that each time a new problem comes out, it’s always the problem for that week! Anyway, we’ve been very gratified by the response and by the ingenious solutions we routinely receive — please keep ‘em coming.

This problem comes courtesy of regular problem-solver Arin Chaudhuri and — I’ll freely admit — this one defeated me. I didn’t feel a bit bad though when Arin revealed his very elegant solution, and I’d like to see what our readers come up with. Here it is:

What is the maximum number of colors you could use to color the vertices of an 8-dimensional cube, so that starting from any vertex, each color occurs as the color of some neighbor of that vertex? (Call two vertices neighbors if they are the two endpoints of an edge of the cube.)

Arin, Vishal, and I would be interested and pleased if someone solved this problem for all dimensions $n$.

Please submit solutions to topological[dot]musings[At]gmail[dot]com by Friday, January 2, 2009, 11:59 pm (UTC); do not submit solutions in Comments. Everyone with a correct solution will be inducted into our Hall of Fame! We look forward to your response.

After a long hiatus, I’d like to renew the discussion of axiomatic categorical set theory, more specifically the Elementary Theory of the Category of Sets (ETCS). Last time I blogged about this, I made some initial forays into “internalizing logic” in ETCS, and described in broad brushstrokes how to use that internal logic to derive a certain amount of the structure one associates with a category of sets. Today I’d like to begin applying some of the results obtained there to the problem of constructing colimits in a category satisfying the ETCS axioms (an ETCS category, for short).

(If you’re just joining us now, and you already know some of the jargon, an ETCS category is a well-pointed topos that satisfies the axiom of choice and with a natural numbers object. We are trying to build up some of the elementary theory of such categories from scratch, with a view toward foundations of mathematics.)

But let’s see — where were we? Since it’s been a while, I was tempted to review the philosophy behind this undertaking (why one would go to all the trouble of setting up a categories-based alternative to ZFC, when time-tested ZFC is able to express virtually all of present-day mathematics on the basis of a reasonably short list of axioms?). But in the interest of time and space, I’ll confine myself to a few remarks.

As we said, a chief difference between ZFC and ETCS resides in how ETCS treats the issue of membership. In ZFC, membership is a global binary relation: we can take any two “sets” $A, B$ and ask whether $A \in B$. Whereas in ETCS, membership is a relation between entities of different sorts: we have “sets” on one side and “elements” on another, and the two are not mixed (e.g., elements are not themselves considered sets).

Further, and far more radical: in ETCS the membership relation $x \in A$ is a function, that is, an element $x$ “belongs” to only one set $A$ at a time. We can think of this as “declaring” how we are thinking of an element, that is, declaring which set (or which type) an element is being considered as belonging to. (In the jargon, ETCS is a typed theory.) This reflects a general and useful philosophic principle: that elements in isolation are considered inessential, that what counts are the aggregates or contexts in which elements are organized and interrelated. For instance, the numeral ‘2’ in isolation has no meaning; what counts is the context in which we think of it (qua rational number or qua complex number, etc.). Similarly the set of real numbers has no real sense in isolation; what counts is which category we view it in.

I believe it is reasonable to grant this principle a foundational status, but: rigorous adherence to this principle completely changes the face of what set theory looks like. If elements “belong” to only one set at a time, how then do we even define such basic concepts as subsets and intersections? These are some of these issues we discussed last time.

There are other significant differences between ZFC and ETCS: stylistically, or in terms of presentation, ZFC is more “top-down” and ETCS is more “bottom-up”. For example, in ZFC, one can pretty much define a subset $\{x \in X: P\}$ by writing down a first-order formula $P$ in the language; the comprehension (or separation) axiom scheme is a mighty sledgehammer that takes care of the rest. In the axioms of ETCS, there is no such sledgehammer: the closest thing one has to a comprehension scheme in the ETCS axioms is the power set axiom (a single axiom, not an axiom scheme). However, in the formal development of ETCS, one derives a comprehension scheme as one manually constructs the internal logic, in stages, using the simple tools of adjunctions and universal properties. We started doing some of that in our last post. So: with ZFC it’s more as if you can just hop in the car and go; with ETCS you build the car engine from smaller parts with your bare hands, but in the process you become an expert mechanic, and are not so rigidly attached to a particular make and model (e.g., much of the theory is built just on the axioms of a topos, which allows a lot more semantic leeway than one has with ZF).

But, in all fairness, that is perhaps the biggest obstacle to learning ETCS: at the outset, the tools available [mainly, the idea of a universal property] are quite simple but parsimonious, and one has to learn how to build some set-theoretic and logical concepts normally taken as “obvious” from the ground up. (Talk about “foundations”!) On the plus side, by building big logical machines from scratch, one gains a great deal of insight into the inner workings of logic, with a corresponding gain in precision and control and modularity when one would like to use these developments to design, say, automated deduction systems (where there tend to be strong advantages to using type-theoretic frameworks).

Enough philosophy for now; readers may refer to my earlier posts for more. Let’s get to work, shall we? Our last post was about the structure of (and relationships between) posets of subobjects $Sub(X)$ relative to objects $X$, and now we want to exploit the results there to build some absolute constructions, in particular finite coproducts and coequalizers. In this post we will focus on coproducts.

Note to the experts: Most textbook treatments of the formal development of topos theory (as for example Mac Lane-Moerdijk) are efficient but highly technical, involving for instance the slice theorem for toposes and, in the construction of colimits, recourse to Beck’s theorem in monad theory applied to the double power-set monad [following the elegant construction of Paré]. The very abstract nature of this style of argumentation (which in the application of Beck’s theorem expresses ideas of fourth-order set theory and higher) is no doubt partly responsible for the somewhat fearsome reputation of topos theory.

In these notes I take a much less efficient but much more elementary approach, based on an arrangement of ideas which I hope can be seen as “natural” from the point of view of naive set theory. I learned of this approach from Myles Tierney, who was my PhD supervisor, and who with Bill Lawvere co-founded elementary topos theory, but I am not aware of any place where the details of this approach have been written up before now. I should also mention that the approach taken here is not as “purist” as many topos theorists might want; for example, here and there I take advantage of the strong extensionality axiom of ETCS to simplify some arguments.

### The Empty Set and Two-Valued Logic

We begin with the easy observation that a terminal category, i.e., a category $\mathbf{1}$ with just one object and one morphism (the identity), satisfies all the ETCS axioms. Ditto for any category $C$ equivalent to $\mathbf{1}$ (where every object is terminal). Such boring ETCS categories are called degenerate; obviously our interest is in the structure of nondegenerate ETCS categories.

Let $\mathbf{E}$ be an ETCS category (see here for the ETCS axioms). Objects of $\mathbf{E}$ are generally called “sets”, and morphisms are generally called “functions” or “maps”.

Proposition 0: If an ETCS category $\mathbf{E}$ is a preorder, then $\mathbf{E}$ is degenerate.

Proof: Recall that a preorder is a category in which there is at most one morphism $A \to B$ for any two objects $A, B$. Every morphism in a preorder is vacuously monic. If there is a nonterminal set $A$, then the monic $A \to 1$ to any terminal set defines a subset $A \subseteq 1$ distinct from the subset defined by $1 \to 1$, thus giving (in an ETCS category) distinct classifying maps $\chi_A, t: 1 \to P(1)$, contradicting the preorder assumption. Therefore all objects $A$ are terminal. $\Box$

Assume from now on that $\mathbf{E}$ is a nondegenerate ETCS category.

Proposition 1: There are at least two truth values, i.e., two elements $1 \to P(1)$, in $\mathbf{E}$.

Proof: By proposition 0, there exist sets $X, Y$ and two distinct functions $f, g: X \to Y$. By the axiom of strong extensionality, there exists $x: 1 \to X$ such that $fx \neq gx$. The equalizer $E \to 1$ of the pair $fx, gx: 1 \to Y$ is then a proper subset of $1$, and therefore there are at least two distinct elements $\chi_E, \chi_1: 1 \to P(1)$. $\Box$

Proposition 2: There are at most two truth values $1 \to P(1)$; equivalently, there are at most two subsets of $1$.

Proof: If $U, V \subseteq 1$ are distinct subsets of $1$, then either $U \neq U \cap V$ or $V \neq U \cap V$, say the former. Then $1_U: U \subseteq U$ and $U \cap V \subseteq U$ are distinct subsets, with distinct classifying maps $\chi_{1_U}, \chi_{U \cap V}: U \to P(1)$. By strong extensionality, there exists $x: 1 \to U$ distinguishing these classifying maps. Because $1$ is terminal, we then infer $1 \subseteq U$ and $U \subseteq 1$, so $U = 1$ as subsets of $1$, and in that case only $V$ can be a proper subset of $1$. $\Box$

By propositions 1 and 2, there is a unique proper subset of the terminal object $1$. Let $0 \to 1$ denote this subset. Its domain may be called an “empty set”; by the preceding proposition, it has no proper subsets. The classifying map $1 \to P1$ of $0 \subseteq 1$ is the truth value we call “false”.

Proposition 3: 0 is an initial object, i.e., for any $X$ there exists a unique function $0 \to X$.

Proof: Uniqueness: if $f, g: 0 \to X$ are maps, then their equalizer $x: E \to 0$, which is monic, must be an isomorphism since 0 has no proper subsets. Therefore $f = g$. Existence: there are monos

$\displaystyle 0 \to 1 \stackrel{t_X}{\to} PX$

$\displaystyle X \stackrel{\sigma}{\to} PX$

where $t_X$ is “global truth” (classifying the subset $X \subseteq X$) on $X$ and $\sigma$ is the “singleton mapping $x \mapsto \{x\}$” on $X$, defined as the classifying map of the diagonal map $\delta: X \subseteq X \times X$ (last time we saw $\sigma$ is monic). Take their pullback. The component of the pullback parallel to $\sigma$ is a mono $P \to 0$ which again is an isomorphism, whence we get a map $0 \cong P \to X$ using the other component of the pullback. $\Box$

Remark: For the “purists”, an alternative construction of the initial set 0 that avoids use of the strong extensionality axiom is to define the subset $0 \subseteq 1$ to be “the intersection all subsets of $1$“. Formally, one takes the extension $\left[\phi\right] \subseteq 1$ of the map

$\displaystyle \phi = (1 \stackrel{t_{P1}}{\to} PP1 \stackrel{\bigcap}{\to} P1);$

where the first arrow represents the class of all subsets of $P1$, and the second is the internal intersection operator defined at the end of our last post. Using formal properties of intersection developed later, this intersection $0 \subseteq 1$ has no proper subsets, and then the proof of proposition 3 carries over verbatim. $\Box$

Corollary 1: For any $X$, the set $0 \times X$ is initial.

Proof: By cartesian closure, maps $0 \times X \to Y$ are in bijection with maps of the form $0 \to Y^X$, and there is exactly one of these since 0 is initial. $\Box$

Corollary 2: If there exists $f: X \to 0$, then $X$ is initial.

Proof: The composite of $\langle f, 1_X \rangle: X \to 0 \times X$ followed by $\pi_2: 0 \times X \to X$ is $1_X$, and $\pi_2$ followed by $\langle f, 1_X \rangle: X \to 0 \times X$ is also an identity since $0 \times X$ is initial by corollary 1. Hence $X$ is isomorphic to an initial object $0 \times X$. $\Box$

By corollary 2, for any object $Y$ the arrow $0 \to Y$ is vacuously monic, hence defines a subset.

Proposition 4: If $X \not\cong 0$, then there exists an element $x: 1 \to X$.

Proof: Under the assumption, $X$ has at least two distinct subsets: $0 \subseteq X$ and $1_X: X \subseteq X$. By strong extensionality, their classifying maps $\chi_0, \chi_1: X \to P1$ are distinguished by some element $x: 1 \to X$.

### External Unions and Internal Joins

One of the major goals in this post is to construct finite coproducts in an ETCS category. As in ordinary set theory, we will construct these as disjoint unions. This means we need to discuss unions first; as should be expected by now, in ETCS unions are considered locally, i.e., we take unions of subsets of a given set. So, let $A, B \subseteq X$ be subsets.

To define the union $A \cup B \subseteq X$, the idea is to take the intersection of all subsets containing $A$ and $B$. That is, we apply the internal intersection operator (constructed last time),

$\displaystyle \bigcap: PPX \to PX,$

to the element $1 \to PPX$ that represents the set of all subsets of $X$ containing $A$ and $B$; the resulting element $1 \to PX$ represents $A \cup B$. The element $1 \to PPX$ corresponds to the intersection of two subsets

$\{C \in PX: A \subseteq C\} \cap \{C \in PX: B \subseteq C\} \subseteq PX.$

Remark: Remember that in ETCS we are using generalized elements: $C \in PX$ really means a function $C: U \to PX$ over some domain $U$, which in turn classifies a subset $\left[C\right] \subseteq U \times X$. On the other hand, the $A$ here is a subset $A \subseteq X$. How then do we interpret the condition “$A \subseteq C$“? We first pull back $\chi_A: 1 \to PX$ over to the domain $U$; that is, we form the composite $\displaystyle U \stackrel{!}{\to} 1 \stackrel{\chi_A}{\to} PX$, and consider the condition that this is bounded above by $C: U \to PX$. (We will write $\chi_A \leq C$, thinking of the left side as constant over $U$.) Externally, in terms of subsets, this corresponds to the condition $U \times A \subseteq \left[C\right]$.

We need to construct the subsets $\{C \in PX: A \subseteq C\}, \{C \in PX: B \subseteq C\}$. In ZFC, we could construct those subsets by applying the comprehension axiom scheme, but the axioms of ETCS have no such blanket axiom scheme. (In fact, as we said earlier, much of the work on “internalizing logic” goes to show that in ETCS, we instead derive a comprehension scheme!) However, one way of defining subsets in ETCS is by taking loci of equations; here, we express the condition $A \subseteq C$, more pedantically $A \subseteq \left[C\right]$ or $\chi_A \leq C$, as the equation

$(\chi_A \Rightarrow C) = t_X$

where the right side is the predicate “true over $X$“.

Thus we construct the subset $\{C \in PX: A \subseteq C\}$ of $PX$ via the pullback:

{C: A ≤ C} -------> 1
|              |
|              | t_X
V chi_A => -   V
PX -----------> PX

Let me take a moment to examine what this diagram means exactly. Last time we constructed an internal implication operator

$\Rightarrow: P1 \times P1 \to P1$

and now, in the pullback diagram above, what we are implicitly doing is lifting this to an operator

$\Rightarrow_X: PX \times PX \to PX$

The easy and cheap way of doing this is to remember the isomorphism $PX \cong P1^X$ we used last time to uncover the cartesian closed structure, and apply this to

$\displaystyle P1^X \times P1^X \cong (P1 \times P1)^X \stackrel{\Rightarrow^X}{\to} P1^X$

to define $\Rightarrow_X: PX \times PX \to PX$. This map classifies a certain subset of $X \times PX \times PX$, which I’ll just write down (leaving it as an exercise which involves just chasing the relevant definitions):

$\{(x, T, S) \in X \times PX \times PX: x \in_X T \Rightarrow x \in_X S\}$

Remark: Similarly we can define a meet operator $\wedge_X: PX \times PX \to PX$ by exponentiating the internal meet $P1 \times P1 \to P1$. It is important to know that the general Heyting algebra identities which we established last time for $P1$ lift to the corresponding identities for the operators $\wedge_X, \Rightarrow_X$ on $PX$. Ultimately this rests on the fact that the functor $(-)^X$, being a right adjoint, preserves products, and therefore preserves any algebraic identity which can be expressed as a commutative diagram of operations between such products.

Hence, for the fixed subset $A \subseteq X$ (classified by $\chi_A: 1 \to PX$), the operator

$\chi_A \Rightarrow -: PX \to PX$

classifies the subset

$\{(x, S): x \in_X A \Rightarrow x \in_X S\} \hookrightarrow X \times PX$

Finally, in the pullback diagram above, we are pulling back the operator $\chi_A \Rightarrow -$ against $t_X$. But, from last time, that was exactly the method we used to construct universal quantification. That is, given a subset

$R \subseteq X \times Y$

we defined $\forall_Y R \subseteq X$ to be the pullback of $t_Y: 1 \hookrightarrow PY$ along $\chi_R: X \to PY$. Putting all this together, the pullback diagram above expresses the definition

$\{C \in PX: A \subseteq C\} := \{C \in PX: \forall_{x \in X} \ x \in_X A \Rightarrow x \in_X C\}$

that one would expect “naively”.

Now that all the relevant constructions are in place, we show that $A \cup B$ is the join of $A$ and $B$ in the poset $Sub(X)$. There is nothing intrinsically difficult about this, but as we are still in the midst of constructing the internal logic, we will have to hunker down and prove some logic things normally taken for granted or zipped through without much thought. For example, the internal intersection operator was defined with the help of internal universal quantification, and we will need to establish some formal properties of that.

Here is a useful general principle for doing internal logic calculations. Let $\chi_R: X \to PY$ be the classifying map of a subset $R \subseteq X \times Y$, and let $f: U \to X$ be a function. Then the composite $\chi_R f: U \to PY$ classifies the subset

$(f \times 1_Y)^{-1}(R) \subseteq U \times Y$

so that one has the general identity $\chi_R f = \chi_{(f \times 1)^{-1}(R)}$. In passing back and forth between the external and internal viewpoints, the general principle is to try to render “complicated” functions $U \to PY$ into a form $\chi_R f$ which one can more easily recognize. For lack of a better term, I’ll call this the “pullback principle”.

Lemma 1: Given a relation $R \hookrightarrow X \times Y$ and a constant $c: 1 \to Y$, there is an inclusion

$\forall_Y R \subseteq (1_X \times c)^{-1}(R)$

as subsets of $X$. (In traditional logical syntax, this says that for any element $c: 1 \to Y$,

$\forall_{y: Y} R(x, y)$ implies $R(x, c)$

as predicates over elements $x \in X$. This is the type of thing that ordinarily “goes without saying”, but which we actually have to prove here!)

Proof: As we recalled above, $\forall_Y R \subseteq X$ was defined to be $\chi_R^{-1}(t_Y)$, the pullback of global truth $t_Y: 1 \to PY$ along the classifying map $\chi_R: X \to PY$. Hold that thought.

Let

$Pc: PY \to P1$

be the map which classifies the subset $\{S \in PY: c \in_Y S\}$. Equivalently, this is the map

$P1^c: P1^Y \to P1^1$

under the canonical isomorphisms $PY \cong P1^Y$, $P1^1 \cong P1$. Intuitively, this maps $f \mapsto f(c)$, i.e., plugs an element $c \in Y$ into an element $f \in P1^Y$.

Using the adjunction $(- \times Y) \dashv (-)^Y$ of cartesian closure, the composite

$\displaystyle X \stackrel{\chi_R}{\to} PY \stackrel{Pc}{\to} P1$

transforms to the composite

$X \times 1 \stackrel{1_X \times c}{\to} X \times Y \stackrel{\chi_{R}}{\to} P1$

so by the pullback principle, $(Pc)\chi_R: X \times 1 \to P1$ classifies $(1_X \times c)^{-1}(R) \subseteq X \times 1 \cong X$.

Equivalently,

$(1_X \times c)^{-1}(R) = \chi_{R}^{-1} (Pc)^{-1}(t) \qquad (1)$

Also, as subsets of $PY$, we have the inclusion

$(t_Y: 1 \hookrightarrow PY) \subseteq (Pc)^{-1}(t) \qquad (2)$

[this just says that $t_Y$ belongs to the subset classified by $Pc$, or equivalently that $c: 1 \to Y$ is in the subset $Y \subseteq Y$]. Applying the pullback operation $\chi_{R}^{-1}$ to (2), and comparing to (1), lemma 1 follows. $\Box$

Lemma 2: If $R \subseteq S$ as subsets of $X \times Y$, then $\forall_Y R \subseteq \forall_Y S$.

Proof: From the last post, we have an adjunction:

$T \times Y \subseteq S$ if and only if $T \subseteq \forall_Y S$

for any subset of $T \subseteq X$. So it suffices to show $\forall_Y R \times Y \subseteq S$. But

$\forall_Y R \times Y \subseteq R \subseteq S$

where the first inclusion follows from $\forall_Y R \subseteq \forall_Y R$. $\Box$

Next, recall from the last post that the internal intersection of $F: 1 \to PPX$ was defined by interpreting the following formula on the right:

$\displaystyle \bigcap F = \forall_{S \in PX} (S \in_{PX} F) \Rightarrow (x \in_X S)$

Lemma 3: If $F \leq G: 1 \to PPX$, then $\displaystyle \bigcap G \leq \bigcap F: 1 \to PX$.

Proof: $F: 1 \to PPX$ classifies the subset $\{S \in PX: S \in_{PX} F\}$, i.e.,  $F$ is identified with the predicate $S \in_{PX} F$ in the argument $S$, so by hypothesis $(S \in_{PX} F) \leq (S \in_{PX} G)$ as predicates on $S$. Internal implication $a \Rightarrow b$ is contravariant in the argument $a$ [see the following remark], so

$\left[(S \in G) \Rightarrow (x \in S)\right] \leq \left[(S \in F) \Rightarrow (x \in S)\right]$

Now apply lemma 2 to complete the proof. $\Box$

Remark: The contravariance of $- \Rightarrow b$, that is, the fact that

$x \leq y$ implies $(y \Rightarrow b) \leq (x \Rightarrow b),$

is a routine exercise using the adjunction [discussed last time]

$a \wedge c \leq b$ if and only if $c \leq (a \Rightarrow b).$

Indeed, we have

$x \wedge (y \Rightarrow b) \leq y \wedge (y \Rightarrow b) \leq b \qquad (*)$

where the first inequality follows from the hypothesis $x \leq y$, and the second follows from $y \Rightarrow b \leq y \Rightarrow b$. By the adjunction, the inequality (*) implies $(y \Rightarrow b) \leq (x \Rightarrow b)$. $\Box$

Theorem 1: For subsets $A, B$ of $X$, the subset $A \cup B$ is an upper bound of $A$ and $B$, i.e., $A, B \subseteq A \cup B$.

Proof: It suffices to prove that $\displaystyle A = \bigcap \{C \in PX: A \subseteq C\}$, since then we need only apply lemma 3 to the trivially true inclusion

$\{C \in PX: A \subseteq C\} \cap \{C \in PX: B \subseteq C\} \subseteq \{C \in PX: A \subseteq C\}$

to infer $A \subseteq A \cup B$, and similarly $B \subseteq A \cup B$. (Actually, we need only show $\displaystyle A \subseteq \bigcap \{C \in PX: A \subseteq C\}$. We’ll do that first, and then show full equality.)

The condition we want,

$A \subseteq \{x \in X: \forall_{S \in PX} (A \subseteq S) \Rightarrow (x \in_X S)\},$

is, by the adjunction $(- \times PX) \dashv \forall_{PX}: Sub(X \times PX) \to Sub(X)$, equivalent to

$A \times PX \subseteq \{(x, S): A \subseteq S \Rightarrow (x \in_X S)\}$

which, by a $\wedge$-$\Rightarrow$ adjunction, is equivalent to

$(A \times PX) \cap (X \times \{S \in PX: A \subseteq S\}) \subseteq \ \in_X \qquad (1)$

as subsets of $X \times PX$. So we just have to prove (1). At this point we recall, from our earlier analysis, that

$\{S \in PX: A \subseteq S\} = \{S \in PX: \forall_{x \in X} x \in_X A \Rightarrow x \in_X S\}$

Using the adjunction $(X \times -) \dashv \forall_X$, as in the proof of lemma 2, we have

$X \times \{S \in PX: \forall_{x \in X} x \in_X A \Rightarrow x \in_X S\}$

$\subseteq \{(x, S): x \in_X A \Rightarrow x \in_X S\} := (A \times PX) \Rightarrow \in_X,$

which shows that the left side of (1) is contained in

$(A \times PX) \cap ((A \times PX) \Rightarrow \in_X) \subseteq \ \in_X,$

where the last inclusion uses another $\wedge$-$\Rightarrow$ adjunction. Thus we have established (1) and therefore also the inclusion

$\displaystyle A \subseteq \bigcap \{S \in PX: A \subseteq S\}$

Now we prove the opposite inclusion

$\displaystyle \bigcap \{S \in PX: A \subseteq S\} \subseteq A,$

that is to say

$\{x \in X: \forall_{S \in PX} A \subseteq S \Rightarrow x \in_X S\} \subseteq A \qquad (**)$

Here we just use lemma 1, applied to the particular element $\chi_A: 1 \to PX$: we see that the left side of (**) is contained in

$\{x \in X: A \subseteq \left[\chi_A\right] \Rightarrow x \in_X A\}$

which collapses to $\{x \in X: x \in_X A\} = A$, since $A = \left[\chi_A\right]$. This completes the proof. $\Box$

Theorem 2: $A \cup B$ is the least upper bound of $A, B$, i.e., if $C \subseteq X$ is a subset containing both $A \subseteq X$ and $B \subseteq X$, then $A \cup B \subseteq C$.

Proof: We are required to show that

$\{x \in X: \forall_{S \in PX} \ (A \subseteq S \wedge B \subseteq S) \Rightarrow x \in_X S\} \subseteq C.$

Again, we just apply lemma 1 to the particular element $\chi_C: 1 \to PX$: the left-hand side of the claimed inclusion is contained in

$\{x \in X: (A \subseteq C \wedge B \subseteq C) \Rightarrow x \in_X C\}$

but since $(A \subseteq C \wedge B \subseteq C)$ is true by hypothesis (is globally true as a predicate on the implicit variable $x \in X$), this last subset collapses to

$\{x \in X: t_X \Rightarrow x \in_X C\} = \{x \in X: x \in_X C\} = C$

which completes the proof. $\Box$

Theorems 1 and 2 show that for any set $X$, the external poset $Sub(X)$ admits joins. One may go on to show (just on the basis of the topos axioms) that as in the case of meets, the global external operation of taking joins is natural in $X$, so that by the Yoneda principle, it is classified by an internal join operation

$\vee: P1 \times P1 \to P1,$

namely, the map which classifies the union of the subsets

$\displaystyle \left[\pi_1\right] = P1 \times 1 \stackrel{1 \times t}{\hookrightarrow} P1 \times P1$

$\displaystyle \left[\pi_2\right] = 1 \times P1 \stackrel{t \times 1}{\hookrightarrow} P1 \times P1$

and this operation satisfies all the expected identities. In short, $P1$ carries an internal Heyting algebra structure, as does $PX \cong P1^X$ for any set $X$.

We will come back to this point later, when we show (as a consequence of strong extensionality) that $P1$ is actually an internal Boolean algebra.

### Construction of Coproducts

Next, we construct coproducts just as we do in ordinary set theory: as disjoint unions. Letting $X, Y$ be sets (objects in an ETCS category), a disjoint union of $X$ and $Y$ is a pair of monos

$i: X \hookrightarrow Z \qquad j: Y \hookrightarrow Z$

whose intersection is empty, and whose union or join in $Sub(Z)$ is all of $Z$. We will show that disjoint unions exist and are essentially unique, and that they satisfy the universal property for coproducts. We will use the notation $X + Y$ for a disjoint union.

Theorem 3: A disjoint union of $X$ and $Y$ exists.

Proof: It’s enough to embed $X, Y$ disjointly into some set $C$, since the union of the two monos in $Sub(C)$ would then be the requisite $Z$. The idea now is that if a disjoint union or coproduct $X+Y$ exists, then there’s a canonical isomorphism $P(X + Y) \cong PX \times PY$. Since the singleton map

$\sigma: X + Y \to P(X + Y) \cong PX \times PY$

is monic, one thus expects to be able to embed $X$ and $Y$ disjointly into $PX \times PY$. Since we can easily work out how all this goes in ordinary naive set theory, we just write out the formulas and hope it works out in ETCS.

In detail, define $i_X: X \to PX \times PY$ to be

$\displaystyle X \cong X \times 1 \stackrel{\sigma_X \times \chi_0}{\to} PX \times PY$

where $\sigma_X$ is the singleton mapping and $\chi_0$ classifies $0 \hookrightarrow Y$; similarly, define $i_Y: Y \to PX \times PY$ to be

$\displaystyle Y \cong 1 \times Y \stackrel{\chi_0 \times \sigma_Y}{\to} PX \times PY.$

Clearly $i_X$ and $i_Y$ are monic, so to show disjointness we just have to show that their pullback is empty. But their pullback is isomorphic to the cartesian product of the pullbacks of the diagrams

$\displaystyle X \stackrel{\sigma_X}{\to} PX \stackrel{\chi_0}{\leftarrow} 1 \qquad 1 \stackrel{\chi_0}{\to} PY \stackrel{\sigma_Y}{\leftarrow} Y$

so it would be enough to show that each (or just one) of these two pullbacks is empty, let’s say the first.

Suppose given a map $h: A \to X$ which makes the square

  A -------> 1
|          |
h |          | chi_0
V sigma_X  V
X -------> PX

commute. Using the pullback principle, the map $A \to 1 \stackrel{\chi_0}{\to} PX$ classifies

$0 \times A \hookrightarrow X \times A$

which is just the empty subset. This must be the same subset as classified by $\sigma_X h = \chi_{\delta}h$ (where $\delta: X \hookrightarrow X \times X$ is the diagonal), which by the pullback principle is

$(1_X \times h)^{-1}(\delta) \hookrightarrow X \times A.$

An elementary calculation shows this to be the equalizer of the pair of maps

$\pi_1, h\pi_2: X \times A \stackrel{\to}{\to} X$

So this equalizer $E$ is empty. But notice that $\langle h, 1 \rangle: A \to X \times A$ equalizes this pair of maps. Therefore we have a map $A \to E \cong 0$. By corollary 2 above, we infer $A \cong 0$. This applies to the case where $A$ is the pullback, so the pullback is empty, as was to be shown. $\Box$

Theorem 4: Any two disjoint unions of $X, Y$ are canonically isomorphic.

Proof: Suppose $i: X \to Z \leftarrow Y: j$ is a disjoint union. Define a map

$\phi= \langle \phi_1, \phi_2 \rangle: Z \to PX \times PY$

where $\phi_1: Z \to PX$ classifies the subset $\langle 1_X, i \rangle: X \to X \times Z$, and $\phi_2: Z \to PY$ classifies the subset $\langle 1_Y, j \rangle: Y \to Y \times Z$. Applying the pullback principle, the composite $\phi_1 i: X \to PX$ classifies

$(1_X \times i)^{-1}(\langle 1_X, i \rangle) \hookrightarrow X \times X$

which is easily seen to be the diagonal on $X$. Hence $\phi_1 i = \sigma_X$. On the other hand, $\phi_1 j: Y \to PX$ classifies the subset

$(1_X \times j)^{-1}(\langle 1_X, i \rangle) \hookrightarrow X \times Y$

which is empty because $i$ and $j$ are disjoint embeddings, so $\phi_1 j = \chi_0: Y \to PX$. Similar calculations yield

$\phi_2 i = \chi_0: X \to PY \qquad \phi_2 j = \sigma_Y: Y \to PY.$

Putting all this together, we conclude that $\phi i = i_X: X \to PX \times PY$ and $\phi j = i_Y: Y \to PX \times PY$, where $i_X$ and $i_Y$ were defined in the proof of theorem 3.

Next, we show that $\phi$ is monic. If not, then by strong extensionality, there exist distinct elements $c, d: 1 \to Z$ for which $\phi(c) = \phi(d)$; therefore, $\phi_1 c = \phi_1 d: 1 \to PX$ and $\phi_2 c = \phi_2 d: 1 \to PY$. By the pullback principle, these equations say (respectively)

$c^{-1}(i) = d^{-1}(i) \hookrightarrow 1 \qquad c^{-1}(j) = d^{-1}(j) \hookrightarrow 1$

If $c^{-1}(i) = 1$, then both $c, d: 1 \to Z$ factor through the mono $i: X \to Z$. However, since $\phi i = i_{X}$ is monic, this would imply that $\phi (c) \neq \phi (d)$, contradiction. Therefore $c^{-1}(i) = c \cap i = 0$. By similar reasoning, $c \cap j = 0$. Therefore

$i \subseteq \neg c \qquad j \subseteq \neg c$

where $\neg = (- \Rightarrow 0): Sub(Z) \to Sub(Z)$ is the negation operator. But then $i \cup j \subseteq \neg c$. And since $1: Z \hookrightarrow Z$ is the union $i \cup j$ by assumption, $\neg c$ must be the top element $\top \in Sub(Z)$, whence $c: 1 \to Z$ is the bottom element 0. This contradicts the assumption that the topos is nondegenerate. Thus we have shown that $\phi$ must be monic.

The argument above shows that $\phi: Z \hookrightarrow PX \times PY$ is an upper bound of $i_X: X \to PX \times PY$ and $i_Y: Y \to PX \times PY$ in $Sub(PX \times PY)$. It follows that the join $X + Y$ constructed in theorem 3 is contained in $\phi: Z \to PX \times PY$, and hence can be regarded as the join of $X$ and $Y$ in $Sub(Z)$. But $Z$ is their join in $Sub(Z)$ by assumption of being a disjoint union, so the containment $X + Y \subseteq Z$ must be an equality. The proof is now complete. $\Box$

Theorem 5: The inclusions $i_X: X \to X + Y$, $i_Y: Y \to X + Y$ exhibit $X + Y$ as the coproduct of $X$ and $Y$.

Proof: Let $f: X \to B$, $g: Y \to B$ be given functions. Then we have monos

$\langle 1_X, f \rangle: X \hookrightarrow X \times B \qquad \langle 1_Y, g \rangle: Y \hookrightarrow Y \times B \qquad (1)$

Now the operation $- \times B: Sub(C) \to Sub(C \times B)$ certainly preserves finite meets, and also preserves finite joins because it is left adjoint to $\forall_B: Sub(C \times B) \to Sub(C)$. Therefore this operation preserves disjoint unions; we infer that the monos

$X \times B \stackrel{i_X \times B}{\to} (X + Y) \times B \qquad Y \times B \stackrel{i_Y \times B}{\to} (X + Y) \times B \qquad (2)$

exhibit $(X + Y) \times B$ as a disjoint union of $X \times B, Y \times B$. Composing the monos of (1) and (2), we have disjoint embeddings of $X$ and $Y$ in $(X + Y) \times B$. Using theorem 4, $X + Y$ is isomorphic to the join of these embeddings; this means we have an inclusion

$\langle 1, h \rangle: X + Y \hookrightarrow (X + Y) \times B$

whose restriction to $X$ yields $\langle i_X, f \rangle$ and whose restriction to $Y$ yields $\langle i_Y, g \rangle$. Hence $h: X + Y \to B$ extends $f$ and $g$. It is the unique extension, for if there were two extensions $h, h'$, then the equalizer of $\langle 1, h \rangle$ and $\langle 1, h' \rangle$ would be an upper bound of $X, Y$ in $Sub((X + Y) \times B)$, contradicting the fact that $X + Y$ is the least upper bound. This completes the proof. $\Box$

I think that’s enough for one day. I will continue to explore the categorical structure and logic of ETCS next time.

Or, at least, that’s what this blog post at Science and Math Defeated aims to do. Normally, I avoid writing on such a topic but I think the following example could be instructive to a few people, at least, in learning how not to infer from mathematical induction. The author of that blog post sets to “disprove” the foundation of Calculus by showing that the “assumption” $0.999 \ldots = 1$ leads to a contradiction (which I am sure most of you have seen before.) And this is supposed to be achieved  through the use of Mathematical Induction.

Let $P(n)$ be the statement $\displaystyle 0.\underbrace{999...9}_{n \, 9's} < 1$ for all $n \in \mathbb{N}$ and $n \ge 1$.

Claim: $P(n)$ is true for all $n \ge 1$.

Proof: $0.9 < 1$, and so, $P(1)$ is true. This takes care of the base case. Now assume $P(n)$ is true for some $k \in \mathbb{N}$, where $k \geq 1$. Now, it is easy to show that $P(k+1)$ is true as well (I just skipped some details!). Hence, $P(k) \Rightarrow P(k+1)$ holds. This takes care of the induction step. (Note that $P(k+1)$ is shown to be true independent of $P(k)$!) And, this proves our claim.

(Erroneous) Conclusion: Hence, $0.999\ldots < 1$.

Notwithstanding the inductive proof (which is correct) above, why is the above conclusion wrong?

Ans. Because “infinity” is not a member of $\mathbb{N}$.

(Watch out for Todd’s next post in the ETCS series!)

I would like to quickly point out to our readers that Jason Dyer is currently hosting the 43rd Carnival of Mathematics and that the Carnival lists Todd’s POW-11 (Preserving Sums of Squares) post as one of its entries!

The following “polynomial-logarithmic” algebraic identity that one encounters on many occasions turns out to have a rather useful set of applications!

POLYNOMIAL-LOGARITHMIC IDENTITY: If $P(z)$ is a polynomial of degree $n \ge 1$ with roots $a_1, a_2, \ldots, a_n$, then $\displaystyle \frac{P'(z)}{P(z)} = \frac1{z-a_1} + \frac1{z-a_2} + \ldots + \frac1{z-a_n}$.

PROOF: This one is left as a simple exercise. (Hint: Logarithms!)

A nice application of the above identity is found in one of the exercises from the chapter titled Analysis (p120) in Proofs from the Book by Aigner, Ziegler and Hofmann.

EXERCISE: Let $p(x)$ be a non-constant polynomial with only real zeros. Show that $p'(x)^2 \ge p(x) p''(x)$ for all $x \in \mathbb{R}$.

SOLUTION: If $x = a_i$ is a zero of $p(x)$, then the right hand side of the above inequality equals zero, and we are done. So, suppose $x$ is not a root of $p(x)$. Then, differentiating the above identity w.r.t. $x$, we obtain $\displaystyle \frac{p''(x)p(x) - p'(x)^2}{p(x)^2} = - \sum_{k=1}^n \frac1{(x - a_k)^2} < 0$, and we are done.

It turns out that the above identity can also used to prove the well-known Gauss-Lucas theorem.

GAUSS-LUCAS: If $P$ is a non-constant polynomial, then the zeros of $P'$ lie in the convex hull of the roots of $P$.

PROOF: See this

HISTORY: The well-known Russian author V.V. Prasolov in his book Polynomials offers a brief and interesting historical background of the theorem, in which he points out that Gauss’ original proof (in 1836) of a variant of the theorem was motivated by physical concepts, and it was only in 1874 that F. Lucas, a French Engineer, formulated and proved the above theorem. (Note that the Gauss-Lucas theorem can also be thought of as some sort of a generalization (at least, in spirit!) of Rolle’s theorem.)

Even though I knew the aforesaid identity before, it was once again brought to my attention through a nice (and elementary) article, titled On an Algebraic Identity by Roberto Bosch Cabrera, available at Mathematical Reflections. In particular, Cabrera offers a simple solution, based on an application of the given identity, to the following problem (posed in the 2006 4th issue of Mathematical Reflections), the solution to which had either escaped regular problem solvers or required knowledge of some tedious (albeit elementary) technique.

PROBLEM: Evaluate the sum $\displaystyle \sum_{k=0}^{n-1} \frac1{1 + 8\sin^2 (k\pi /n)}$. (proposed by Dorin Andrica and Mihai Piticari.)

There is yet another problem which has a nice solution based again on our beloved identity!

PROBLEM: (Putnam A3/2005) Let $p(z)$ be a polynomial of degree $n$, all of whose zeros have absolute value 1 in the complex plane. Put $g(z) = p(z)/z^{n/2}$. Show that all zeros of $g'(z) = 0$ have absolute value 1.

• 262,812 hits