You are currently browsing Todd Trimble’s articles.

Huh — no solutions to POW-13 came in!  I guess I was surprised by that.

Ah well, that’s okay. The problem wasn’t exactly trivial; there are some fairly deep and interesting things going on in that problem that I’d like to share now. First off, let me say that the problem comes from The Unapologetic Mathematician, the blog written by John Armstrong, who posted the problem back in March this year. He in turn had gotten the problem from Noam Elkies, who kindly responded to some email I wrote and had some pretty insightful things to say.

In lieu of a reader solution, I’ll give John’s solution first, and then mine, and then touch on some of the things Noam related in his emails. But before we do so, let me paraphrase what John wrote at the end of his post:

Here’s a non-example. Pick $m$ points $(1, 0), (2, 0), (3, 0)$ and so on up to $(m, 0)$. Pick $n$ points $(1, 1)$, $(2, 1)$, $(3, 1)$, and so on up to $(n, 1)$. In this case we have $m+n-1$ blocking points at $(1, \frac1{2})$, $(\frac{3}{2}, \frac1{2})$, and so on by half-integers up to $(\frac{m+n}{2}, \frac1{2})$. Of course this solution doesn’t count because the first $m$ points lie on a line as do the $n$ points that follow, which violates the collinearity condition of the problem.

Here’s a picture of that scenario when $m = n = 3$:

Does this configuration remind you of anything? Did somebody say “Pappus’s theorem“? Good. Hold that thought please.

Okay, in the non-example the first $m+n$ points were on two lines, which is disallowed. Now the two lines here, $y = 0$ and $y = 1$, form a degenerate conic $y^2 - y = 0$. Thinking in good algebraic geometry fashion, perhaps we can modify the non-solution by replacing the degenerate conic by an honest (smooth) nondegenerate conic, like an ellipse or something, so that at most two of the $m+n$ points are on any given line. This should put one in the mood for our first solution.

John Armstrong writes:  Basically, I took the non-example and imagined bending back the two lines to satisfy the collinearity (pair-of-lines = degenerate conic, so non-example is degenerate example).  The obvious pair of curves to use is the two branches of a hyperbola.  But hyperbolas can be hard to work with, so I decided to do a projective transformation to turn it into a parabola.

So let’s consider points on a parabola.  The points $(-i, i^2)$ and $(j, j^2)$ are connected by a line of slope
$\displaystyle \frac{j^2 - i^2}{j + i} = j - i$

The line itself is $y = (j-i)x + i j$.  Which has the obvious $y$-intercept $(0, i j)$.  Now we need to pick a lot of $i$ and $j$ values to get repeated products. Place $m$ points at $(-1, 1)$, $(-2, 4)$, $(-4, 16)$, $(-8, 64)$, and so on above the negative powers of two.  Place $n$ points at $(1, 1)$, $(2, 4)$, $(4, 16)$, $(8, 64)$, and so on above the positive powers of two.  The blocking points are then at $(0, 1)$, $(0, 2)$, $(0, 4)$, $(0, 8)$, and so on up the $y$-axis by powers of two.  Presto! $\Box$

Very nice. My own solution was less explicit in the sense that I didn’t actually write down coordinates of points, but gave instead a general recipe which relies instead on the geometry of conics, in fact on a generalization of Pappus’s theorem known to me as “Pascal’s mystic hexagon“. I first learned about this from a wonderful book:

• C. Herbert Clemens, A Scrapbook of Complex Curve Theory (2nd Edition), Graduate Studies in Mathematics 55, AMS (2002).

Pascal’s Mystic Hexagon, version A: Consider any hexagon inscribed in a conic, with vertices marked $x_1, x_2, x_3, y_1, y_2, y_3$. For $i \neq j$, mark the intersection of the lines $\overline{x_i y_j} \cap \overline{x_j y_i}$ by $z_{i+j-1}$. Then $z_2, z_3, z_4$ are collinear. (The reason for the strange indexing will be clear in a moment.)

Pascal’s Mystic Hexagon, version B: Consider any pentagon inscribed in a conic $C$, with vertices marked $x_1, x_2, x_3, y_1, y_2$. Choose any line $L$ through $z_2 = \overline{x_1 y_2} \cap \overline{y_1 x_2}$, and define $z_3 = L \cap \overline{y_1 x_3}$ and $z_4 = L \cap \overline{y_2 x_3}$.  Then the intersection $y_3 := \overline{x_1 z_3} \cap \overline{x_2 z_4}$ is the sixth point of a hexagon inscribed in $C$.

The following solution uses version B.

Solution by Todd and Vishal: For the sake of explicitness, let the conic $C$ be a circle $x^2 + y^2 = 1$ and let the line (segment) $L$ be the diameter along $y = 0$. Choose two points $x_1, x_2$ on $C$ above $L$ and a point $y_1$ on $C$ below $L$. The remaining points are determined recursively by a zig-zag procedure, where at each stage $x_{n+1}$ is the intersection $C \cap \overline{y_1 z_{n+1}}$, where $z_{2n} = L \cap \overline{x_{n+1} y_n}$, where $y_{n+1} = C \cap \overline{x_1 z_{n+1}}$, and $z_{2n+1} = L \cap \overline{x_{n+1} y_{n+1}}$. We will show the blocking condition is satisfied: for all $i, j$, the point $z_{i+j-1}$ is on the line $\overline{x_i y_j}$.

To get started, notice that the blocking condition is trivially satisfied up to the point where we construct $z_1, z_2, y_2, z_3, x_3, z_4$. Mystic Hexagon B ensures that $y_3$, as defined above, is blocked from $x_1$ by $z_3$ and from $x_2$ by $z_4$. Then define $z_5$ as above. So far the blocking condition is satisfied.

Suppose the blocking condition is satisfied for the points $x_1, \ldots, x_n, y_1, \ldots, y_n, z_1, \ldots, z_{2n-1}$. Define $x_{n+1} = C \cap \overline{y_1 z_{n+1}}$, as above. Then Mystic Hexagon B, applied to the pentagon consisting of points $y_1, y_2, y_3, x_{n-1}, x_n$, shows that $x_{n+1}$ is blocked from $y_1$ by $z_{n+1}$ and from $y_2$ by $z_{n+2}$.

This shows that $x_{n+1}$ could have been defined to be $C \cap \overline{y_2 z_{n+2}}$. Then Mystic Hexagon B, applied to the pentagon $y_2, y_3, y_4, x_{n-1}, x_n$, shows that $x_{n+1}$ is blocked from $y_2$ by $z_{n+2}$ and from $y_3$ by $z_{n+3}$. This shows $x_{n+1}$ could have been defined to be $C \cap \overline{y_3 z_{n+3}}$.

And so on up the inductive ladder: for $2 \leq i \leq n-1$, defining $x_{n+1} = C \cap \overline{y_{i-1} z_{n+i-1}}$, Hexagon B applied to the pentagon $y_{i-1}, y_i, y_{i+1}, x_{n-1}, x_n$ shows that $x_{n+1}$ could have been defined to be $C \cap \overline{y_i z_{n+i}}$. This shows the blocking condition is satisfied for $1 \leq i \leq n+1$, $1 \leq j \leq n-1$. We block $x_{n+1}$ from $y_n$ by defining $z_{2n}$ as above, to extend up to $j = n$.

Then, as prescribed above, we define $y_{n+1} = C \cap \overline{x_1 z_{n+1}}$, and repeat the preceding argument mutatis mutandis (interchanging $x$‘s and $y$‘s), to obtain the blocking conditions up to $1 \leq i \leq n+1$, $1 \leq j \leq n+1$. This completes the proof. $\Box$

What this argument shows (with Hexagon B doing all the heavy lifting) is that no cleverness whatsoever is required to construct the desired points: starting with any nondegenerate conic $C$, any secant line $L$, and any three initial points $x_1, x_2, y_1$ on $C$ to get started (with $x_1, x_2$‘s on one side of $L$ and $y_1$ on the other), the whole construction is completely forced and works no matter what!

Which may lead one to ask: what is behind this “miracle” called Pascal’s Mystic Hexagon?  Actually, not that much! Let me give a seat-of-the-pants argument for why one might expect it to hold, and then give a somewhat more respectable argument.

Define a planar cubic to be the locus $P(x, y) = 0$ of any degree 3 polynomial $P$ in two variables. (We should actually be doing this as projective geometry, so I ought to write $P(x, y, z) = 0$ where $P$ is homogeneous of degree 3, but I think I’ll skip that.) For example, the union of three distinct lines is a cubic where $P$ is a product of three degree 1 polynomials. What is the dimension of the space of planar cubics? A cubic polynomial $P(x, y)$,

$a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 x y + a_5 y^2 + a_6 x^3 + a_7 x^2 y + a_8 x y^2 + a_9 y^3,$

has 10 coefficients. But the equation $P(x, y) = 0$ is equivalent to the equation $\lambda P(x, y) = 0$ for any nonzero scalar $\lambda$; modding out by scalars, there are 9 degrees of freedom in the space of cubics. What is the dimension of the space of cubics passing through a given point $(x, y) = (a, b)$? The condition $P(a, b) = 0$ gives one linear equation on the coefficients $a_0, \ldots, a_9$, so we cut down a degree of freedom, and the dimension would be 8. Similarly, if we ask for the dimension of cubics passing through 8 given points, we get a system of eight linear equations, and we cut down by eight degrees of freedom: in general, the space of cubics through 8 points is “expected” to be (is “almost always”) 1-dimensional, in fact, a (projective) line.

In the configuration for Pascal’s Mystic Hexagon, version A

we see three cubics passing through the 8 points $x_1, x_2, x_3, y_1, y_2, y_3, z_2, z_3$, namely:

• $A = \overline{x_1 y_2} \cup \overline{x_2 y_3} \cup \overline{x_3 y_1}$
• $B = \overline{x_2 y_1} \cup \overline{x_3 y_2} \cup \overline{x_1 y_3}$
• $C \cup L$ where the conic $C$ is defined by a degree 2 polynomial

Since we expect that the space of cubics through these eight points is a line, we should have a linear relationship between the cubic polynomials $P, Q, R$ used respectively to define $A, B$, and $C \cup L$ above, hence we would get

$\lambda P(x, y) + \mu Q(x, y) = R(x, y)$

for some scalars $\lambda, \mu$. Thus, if a ninth point $z_4 = (a, b)$ is in $A$ and $B$, so that $P(a, b) = Q(a, b) = 0$, then $R(a, b) = 0$. Thus $z_4$ lies in $C \cup L$ as well, and if $z_4$ isn’t on $C$, it must be on the line $L$. Thus, “generically” we expect $z_2, z_3, z_4$ to be collinear, whence Hexagon A.

This rough argument isn’t too far removed from a slightly more rigorous one. There’s a general result in projective algebraic geometry called Bézout’s theorem, which says that a degree $m$ planar curve and a degree $n$ planar curve either intersect in $m n$ points (if you count them right, “with multiplicity”) or they have a whole curve component in common. (Fine print: to make this generally true, you have to work in the projective plane, and you have to work over an algebraically closed field.) A much weaker result which removes all the fine print is that a degree $m$ curve and a degree $n$ curve either have a curve component in common, or they intersect in at most $m n$ points. In particular, in the notation above, the cubics $A$ and $B$ intersect in 9 points, 6 of which are on the conic $C$. Pick a seventh point $(a, b)$ on $C$, away from those six, and let $\lambda = Q(a, b)$ and $\mu = -P(a, b)$. Then we see that the locus of the degree 3 polynomial

$T(x, y) = \lambda P(x, y) + \mu Q(a, b) = 0$

intersects the degree 2 conic $C$ in at least 7 points (namely, $x_1, x_2, x_3, y_1, y_2, y_3$ and $(a, b)$), greater than the expected number $3 \cdot 2$, which is impossible  unless the loci of $T$ and $C$ have a component in common. But the conic $C$ has just one component — itself — so one can conclude that its defining degree 2 polynomial (I’ll call it $C(x, y)$) must divide $T$. Then we have

$T(x, y) = \lambda P(x, y) + \mu Q(x, y) = C(x, y)L(x, y)$

for some degree 1 polynomial $L$, so the last three of the nine points of intersection $A \cap B$, which are zeroes of $P$ and $Q$, must be zeroes of the linear polynomial $L$, and hence are collinear. Thus we obtain Pascal’s Mystic Hexagon, version A. $\Box$

It’s clear then that what makes the Mystic Hexagon tick has something to do with the geometry of cubic curves. With that in mind, I’m now going to kick the discussion up a notch, and relate a third rather more sophisticated construction on cubics which basically subsumes the first two constructions. It has to do with so-called “elliptic curves“.

Officially, an elliptic curve is a smooth projective (irreducible) cubic “curve” over the complex numbers. I put “curve” in quotes because while it is defined by an equation $P(x, y) = 0$ where $P$ is a polynomial of degree 3, the coefficients of $P$ are complex numbers as are the solutions to this equation. We say “curve” in the sense that locally it is like a “line”, but this is the complex line $\mathbb{C}$ we’re talking about, so from our real number perspective it is two-dimensional — it actually looks more like a surface. Indeed, an elliptic curve is an example of a Riemann surface. It would take me way too far afield to give explanations, but when you study these things, you find that elliptic curves are Riemann surfaces of genus 1. In more down-to-earth terms, this means they are tori (toruses), or doughnut-shaped as surfaces. Topologically, such tori or doughnuts are cartesian products of two circles.

Now a circle or 1-dimensional sphere $S^1$ carries a continuous (abelian) group structure, if we think of it as the set of complex numbers $\{z: |z| = 1\}$ of norm 1, where the group operation is complex multiplication. A torus $S^1 \times S^1$ also carries a group structure, obtained by multiplying in each of the two components. Thus, given what we have said, an elliptic curve also carries a continuous group structure. But it’s actually much better than that: one can define a group structure on a smooth complex cubic $C$ (in the complex plane $\mathbb{C}^2$, or rather the projective complex plane $P^2(\mathbb{C})$) not just by continuous operations, but by polynomially defined operations, and the definition of the group law is just incredibly elegant as a piece of geometry. Writing the group multiplication as addition, it says that if $a, b, c$ are points on $C$, then

$a + b = -c \qquad (a + b + c = 0)$

if $a, b, c$ are collinear. [To be precise, one must select a point 0 on $C$ to serve as identity, and this point must be one of the nine inflection points of $C$. When $a$ and $b$ coincide (are “infinitesimally close”), the line through $a$ and $b$ is taken to be tangent to $C$; when $a, b, c$ coincide, this is a line of inflection.]

This is rather an interesting thing to prove, that this prescription actually satisfies the axioms for an abelian group. The hardest part is proving associativity, but this turns out to be not unlike what we did for Pascal’s Mystic Hexagon: basically it’s an application of Bézout’s theorem again. (In algebraic geometry texts, such as Hartshorne’s famous book, the discussion of this point can be far more sophisticated, largely because one can and does define elliptic curves as certain abstract 1-dimensional varieties or schemes which have no presupposed extrinsic embeddings as cubic curves in the plane, and there the goal is to understand the operations intrinsically.)

In the special case where $C$ is defined by a cubic polynomial with real coefficients, we can look at the locus of real solutions (or “real points”), and it turns out that this prescription for the group law still works on the real locus, in particular is still well-defined. (Basically for the same reason that if you have two real numbers $a, b$ which are solutions to a real cubic equation $p(x) = 0$, then there is also a third real solution $c$.) There is still an identity element, which will be an inflection point of the cubic.

Okay, here is a third solution to the problem, lifted from one of Noam Elkies’ emails. (The original formulation of the problem spoke in terms of $r$ “red” points (instead of my $x_1, \ldots, x_m$), $b$ “blue” points (instead of my $y_1, \ldots, y_n$), and “blocking” points which play the role of my $z_1, \ldots, z_{m+n-1}$.) The addition referred to is the addition law on an elliptic curve. I’ve taken the liberty of paraphrasing a bit.

“Choose points $B, R$ on the real points of an elliptic curve such that $-(B+R)$ is in-between $B$ and $R$.  Then set

• red points:     $R + i P$, $0 \leq i \leq r-1$
• blue points:    $B + j P$, $1 \leq j \leq b-1$
• blocking points:  $-(R+B+kP)$, $0 \leq k \leq r+b-2$

where $P$ is a real point on the elliptic curve very close to the identity.  The  pair $R + i P$, $B + j P$ is blocked by $-(R + B + (i + j)P$, because these three points are collinear, and the smallness of P guarantees that the blocking point is actually between the red point and blue point, by continuity.”

Well, well. That’s awfully elegant. (According to Noam’s email, it came out of a three-way conversation between Roger Alperin, Joe Buhler, and Adam Chalcraft. Edit: Joe Buhler informs me in email that Joel Rosenberg’s name should be added. More at the end of this post.) Noam had given his own slick solution where again the red and blue points sit on a conic and the blocking points lie on a line not tangent to the conic, and he observed that his configuration was a degenerate cubic, leading him to surmise that his example could in a sense be seen as a special case of theirs.

How’s that? The last solution took place on a smooth (nondegenerate) cubic, so the degenerate cubic = conic+line examples could not, literally speaking, be special cases. Can the degenerate examples be seen in terms of algebraic group structures based on collinearity?

The answer is: yes!  As you slide around in the space of planar cubics, nondegenerate cubics (the generic or typical case) can converge to cubics which are degenerate in varying degrees (including the case of three lines, or even a triple line), but the group laws on nondegenerate cubics based on collinearity converge to group laws, even in degenerate cases! (I hadn’t realized that.)  You just have to be careful and throw away the singular points of the degenerate cubic, but otherwise you can basically still use the definition of the group law based on collineation, although it gets a little tricky saying exactly how you’re supposed to add points on a line component, such as the line of conic+line.

So let me give an example of how it works. It seems convenient for this purpose to use John Armstrong’s model which is based on the parabola+line, specifically the locus of $(y - x^2)x = 0$. The singular points of its projective completion are at $(0, 0)$ and the point where the $y$-axis meets the line at infinity. After throwing those away, what remains is a disjoint union of four pieces: right half of parabola, left half of parabola, positive $y$-axis, negative $y$-axis.

We can maybe guess that since $a + b + c = 0$ implies $a, b, c$ collinear, that the two pieces of the $y$-axis form a subgroup for the group law we are after (also, these two pieces together should suggest the two halves of the multiplicative group of nonzero reals $\mathbb{R}^*$, but don’t jump to conclusions how this works!). If so, then we notice that if $a$ and $b$ lie on the parabola, then the line between them intersects the $y$-axis at a point $c$, so then the parabolic part would not be closed under multiplication.

One is then led to consider that the group structure of this cubic overall is isomorphic to the group $\{-1, 1\} \times \mathbb{R}^*$, with the linear part identified somehow with the subgroup $\{1\} \times \mathbb{R}^*$, and the parabolic part with $\{-1\} \times \mathbb{R}^*$.

I claim that the abelian group structure on the punctured $y$-axis should be defined by

$(0, x) + (0, y) := (0, - x y)$

so that the identity element on the cubic is $(0, -1)$, and the inverse of $(0, x)$ is $(0, 1/x)$. The remainder of the abelian group structure on the cubic is defined as follows:

$(s, s^2) + (t, t^2) := (0, -1/(s t))$

$(0, x) + (s, s^2) := (-s/x, s^2/x^2)$

Undoubtedly this group law looks a bit strange!  So let’s do a spot check. Suppose $(0, a)$, $(s, s^2)$, and $(t, t^2)$ are collinear. Then it is easily checked that $a = -s t$, and each of the two equations

$(0, a) + (s, s^2) = (t^{-1}, t^{-2}) = -(t, t^2)$

$(s, s^2) + (t, t^2) = (0, 1/a) = -(0, a)$

is correct according to the group law, so the three collinear points do add to the identity and everything checks out.

All right, let’s retrieve John’s example as a special case. Take $R = (1, 1)$ as a red point, $B = (-1, 1)$ as a blue point, and $-(B + R) = (0, 1)$ as blocking point. Take a point $P$ “sufficiently close” to the identity, say $(0, -1/2)$. Then

$R + i P = (1, 1) + (0, -1/2^i) = (2^i, 2^{2i})$

$B + j P = (-1, 1) + (0, -1/2^j) = (-2^j, 2^{2j})$

which was John’s solution.

Another long post from yours truly. I was sorry no solutions came from our readers, but if you’d like another problem to chew on, here’s a really neat one I saw just the other day. Feel free to discuss in comments!

Exercise: Given a point on a circle, show how to draw a tangent to the point using ruler only, no compass. Hint: use a mystic hexagon where two of the points are “infinitesimally close”.

Added July 25: As edited in above, Joel Rosenberg also had a hand in the elliptic curves solution, playing an instrumental role in establishing some of the conjectures, such as that $r + b - 1$ is the minimal number of blocking points under certain assumptions, and (what is very nice) that the elliptic curves solution is the most general solution under certain assumptions. I thank Joe Buhler for transmitting this information.

It’s been an awfully long time since I’ve posted anything; time to finally break the silence.

This problem appeared elsewhere on the internet some months ago; some of you may have already seen it. I don’t want to say right away where I saw it, because there was some commentary which included some rough hints which I don’t want to give, but I’ll be sure to give credit when the solution is published. I’ll bet some of you will be able to find a solution, and will agree it’s quite cute. Here it is:

Given integers $m, n \geq 1$, show that it is possible to construct a set of $m + n$ points in the plane, let’s say $x_1, \ldots, x_m, y_1, \ldots, y_n$, so that no three points of the set are collinear, and for which there exist points $z_1, z_2, \ldots, z_{m+n-1}$, all lying on a straight line, and arranged so that on the line between any $x_i$ and any $y_j$, some $z_k$ lies between them.

So no $x_i$ can “see” any $y_j$, because there’s always some $z_k$ blocking the view. As the proposer observed, the problem would be easy if we had $m n$ $z$‘s to play with, one for each pair $(x_i, y_j)$. But here there are only $m+ n-1$ $z$‘s, so some of them will have to do more than their fair share, blocking the view between quite a few $(x, y)$-pairs simultaneously. Thus, you have to arrange the $x$‘s and $y$‘s so that a lot of the lines between them will be coincident at a point $z$, and subject to the constraint I italicized above.

Please submit solutions to topological[dot]musings[At]gmail[dot]com by Friday, July 17, 11:59 pm (UTC); do not submit solutions in Comments. Everyone with a correct solution will be inducted into our Hall of Fame! We look forward to your response.

This being a mathematics blog, I’m sure a lot of readers out like to play Minesweeper. I’ve just obtained a personal best today (94 seconds on expert level) which, as Minesweeper buffs know, is nowhere close to world-class levels, but which made me happy anyway, as I’d never broken into the double digits before today!

It seems to me that world-class competitors must know some tricks with the mouse which I’ve never bothered to master, particularly because my laptop doesn’t have a mouse but rather a touchpad. This being the case, I keep my right index finger on the touchpad to guide the cursor, and the left index finger to click. I always left-click: that is, in my style of play, I [practically] never flag squares for bombs; I click only on non-bomb squares. For it’s well-known, or at least it should be, that the program doesn’t care if you identify where the bombs are — you get full credit for only identifying all the numbered squares.

To play in this style well, one needs to be fluent in a number of tactical tricks, which I don’t have good names for, but which in my personal argot I call things like “1-2-1″, “1-2-2-1″, “rule of three”, to name just a few. But that’s not what I set out to discuss, really. What I’d really like to hear from good players is: what opening strategies do you use?

The personal best I just set came after deciding on a new opening strategy. What I had been doing is clicking along border squares. Under that strategy, one could of course just keep clicking until one opens up an area, but often I would add to that the observation that if one clicked on a 1, thus leading to, e.g.,

x 1 x (–> border row)
x x x

then one could then click on the non-border square just adjacent to the 1, with only a 1 in 5 chance of setting off a bomb. If one then hits another 1:

x 1 x
x 1 x
x x x

then one can immediately open up the line of squares on the third rank, leading to a configuration such as

x 1 x
x 1 x
1 1 1

or better. This is often a cheap and quick way of opening up squares or otherwise getting a tactical toehold.

The new strategy I began using today is not to click along the border, but to click along the ranks or files adjacent to the border. Under this strategy, if one lands on a 1, leading to

x x x  (–> border row)
x 1 x
x x x

then one can click on the border square adjacent to the 1, with only a 1 in 8 chance of setting off a bomb. If one does not set off a bomb, that square has to be a 1:

x 1 x
x 1 x
x x x

and then one can proceed as before. So I’ve just lowered my odds of hitting a bomb, plus a very small fractional gain in processing time that comes with the certain knowledge that it’s a 1 if not a bomb. So far the strategy has paid off well!

I’d like to hear other people’s opening strategies, and also I’d like to know some statistics. For example, considered over the space of expert-level games, what is the probability of getting a 1, a 2, and so on? Does anyone know? (It seems this would be very difficult computing analytically — one is probably better off using a Monte Carlo simulation. But I don’t have the wherewithal to set that kind of thing up.)

The solutions are in! This problem of the week was interesting for me: the usual pattern has been that I pose problems that I’ve solved myself at some point in the past, giving me a kind of “inside edge” on understanding the solutions that come in. But, as I said in POW-12, the difference this time is that the solution I knew of came from someone else (Arin Chaudhuri). What was interesting for me — and given the circumstances, it was probably inevitable — is that some of the solutions we received forced me to make contact with some mathematics I’d only heard about but never studied. Let me get to that in a minute.

Another interesting thing for me — speaking now with my category theorist’s hat on — is how utterly simple and conceptual Arin’s proof was! I was pleased to see that regular problem solver Philipp Lampe also spotted it. Wait for it…

Solution I by Philipp Lampe, University of Bonn: The answer is 8.

Among the eight neighbors of an arbitrary vertex, all colors of an admissible coloring must occur. Thus, 8 is an upper bound for the maximum number of colors one can use. We have to show that there is an admissible coloring with eight colors.

The vertices of the 8-dimensional cube may be represented by vectors $(a_1, a_2, \ldots, a_8)$ with $a_i$ in $\{0,1\}$, in other words as vectors in $\mathbb{F}_{2}^8$, the 8-dimensional vector space over the field $\mathbb{F}_2 = \{0, 1\}$ with two elements (i.e., the integers modulo 2). Two such vectors $u, v$ are neighbors iff their coordinate vectors differ in exactly one place, in other words if $u = v + e_i$ considered modulo 2, where $e_i$ is one of the eight standard basis elements (with $i$-th coordinate 1, and the other coordinates 0).

Now let our “colors” be the 8 elements $x_1, x_2, \ldots, x_8$ of $\mathbb{F}_{2}^3$, the 3-dimensional vector space over $\mathbb{F}_2$. Let the vertex “coloring”

$C: \mathbb{F}_{2}^8 \to \mathbb{F}_{2}^3$

be the unique $\mathbb{F}_2$-linear map such that $C(e_i) = x_i$; that is, define the color of a vertex/vector by

$\displaystyle C\left(a_1,\ldots, a_8\right) = \sum_{i=1}^{8} a_i x_i$

Now, if $v$ is any vector, the colors of its neighbors are $C(v + e_i) = C(v) + C(e_i) = C(v) + x_i$. The colors of these neighbors are all distinct since the $x_i$ are distinct. Hence all 8 colors are represented among the colors of the neighbors of any given vertex $v$, QED.

But I also learned a thing or two by studying the next solution. It relies on the theory of Hamming codes, with which I was not conversant. Up to some small edits, here is exactly what Sune Jakobsen submitted:

Solution II by Sune Jakobsen (first-year student), University of Copenhagen: Since each vertex only has 8 neighbors, the answer cannot be greater that 8.

Now we construct such a coloring with 8 colors. An 8-dimensional cube can be represented by the graph with vertices in $\{0, 1\}^8$, and with an edge between them iff the Hamming distance between them is 1. We color a vertex with color 8 if the seven first bits in the vertex is a “correct” Hamming message (cf. Hamming code (7,4)), and color it in color $i$ if the first seven bits give a correct Hamming message upon changing bit $i$. This is a well-defined coloring, since each element in $\{0, 1\}^7$ is either a correct Hamming message, or is Hamming distance 1 away to exactly one correct Hamming message.

It remains to show that no vertex is neighbor to two vertices of the same color. The Hamming distance between these two vertices is 2, thus the Hamming distance between the first 7 bits of two neighbors must be 1 or 2. If two neighbors had the same color $i$, then by definition one would get two correct Hamming messages by changing bit $i$ in both of them, and the Hamming distance between these messages would be the same as before — either 1 or 2. But the distance between any two correct Hamming messages is at least 3. Contradiction.

Remarks:

1. Let me give a little background to Sune’s solution. Mathematically, the Hamming code called “(7, 4)” is the image of injective linear map

$G: \mathbb{F}_{2}^4 \to \mathbb{F}_{2}^7$

given by the $7 \times 4$ matrix

1101
1011
1000
0111
0100
0010
0001

The Hamming code is what is known as an “error-correcting code”. Imagine that you want to send a 4-bit message (each bit being a 0 or 1) down a transmission line, but that due to noise in the line, there is the possibility of a bit being flipped and the wrong message being received. What can you do to help ensure that the intended message is received?

The answer is to add some “parity check bits” to the message. In the code (7, 4), one adds in three parity checks, so that the transmitted message is 7 bits long. What one does is apply the matrix $G$ to the 4-vector, to get a 7-vector (remember we’re working modulo 2), and this 7-vector is sent down the line. Assuming only a moderate amount of noise in the transmission line, perhaps the 7-bit message will remain intact or perhaps a single bit will be flipped, more rarely two bits will be flipped (and we’ll assume the event that more than two are flipped has negligible probability). Now, the Hamming encoding $G$ is rigged up so that if the 7-bit vector is received as sent, then the parity checks will report 0 errors. If the parity checks report an error, they report precisely where the error occurred if the received vector was off by one bit. (If two flips occurred, then the receiver can still detect from the parity checks that an error occurred. The parity checks can never detect how many bits were flipped, but if the receiver assumes correctly that just one bit got flipped, he can restore the intended message with accuracy. If three or more got flipped, then the receiver got the wrong message, but he would never know it if the parity checks reported back 0 errors.)

How does this work? By having the receiver apply a parity-check matrix $H$ to the received message, given by the $3 \times 7$ matrix

1010101
0110011
0001111

In the first place, $HG = 0$, and so if the message received belongs to the image of $G$ (is a “correct Hamming message” in the terminology of Solution II), which will indeed be the case if there were no errors in the transmission, then $H$ applied to the message will produce the zero vector. In the case where one bit got flipped in the transmission, $H$ applied to the received vector will return a nonzero 3-vector, but the beauty of the Hamming code is that the 3-vector will spell out in binary the bit the error occurred (for example, if the output vector is $(0 1 1)^t$, then error occurred in the bit with binary number 011, that is bit 3). In that case, the receiver flips that bit to restore the original message. In these two cases, the original 4 bits are then read off as the 3rd, 5th, 6th, and 7th coordinates of the (possibly corrected) 7-vector.

By the way, Sune reminded us that Hamming codes also came up in a post by Greg Muller over at The Everything Seminar, who used the existence of a Hamming code in every dimension $2^k - 1$ to solve general hat-guessing puzzles.

2. Within minutes of posting the original problem, we received a message from David Eppstein, who reported that the solution of 8 colors is essentially contained in a paper he coauthored with T. Givaris (page 4); his solution is close in spirit to Sune’s.

Arin Chaudhuri also surmised the connection to Hamming codes, and mentions that the problem (in a slightly different formulation) originally came from a friend of a friend, who blogs about a number of related problems on this page. Presumably the author had things like error-correcting codes in mind.

3. Sune and David noted that their solutions generalize to show that on the $2^k$-dimensional cube (for any $k$), it is possible to color the vertices with $2^k$ colors (the maximum number possible) so that all of the colors show up as colors of the neighbors of any given vertex.

This is very easy to see by adapting Philipp’s method. Indeed, for each $k$ just take the color set to be the $\mathbb{F}_2$-vector space $S = \mathbb{F}_{2}^k$. The set of vertices of the $2^k$-dimensional cube may identified with the $\mathbb{F}_2$-vector space $V(S)$ generated by the set $S$ (as basis), and the desired coloring is then just the $\mathbb{F}_2$-linear map $C: V(S) \to S$ that extends the identity function on $S$. As mathematical constructions go, you can’t get much more canonical than that! No fuss, no muss — it’s a purely categorical construction (categorists call it a counit of an adjunction). So then why didn’t I see that myself? :-)

4. We’re still not sure what the story is in other dimensions. If $a_n$ is the maximum number of admissible colors in dimension $n$, then about all any one of us knows right now is that $a_n = n$ for $n = 2^k$, $a_n$ is weakly monotone increasing, and that $a_n < n$ if $n$ is not a power of 2. There’s a conjecture afloat that $a_n = 2^{\left[\log_2 n\right]}$ where $\left[x\right]$ is the floor function, but for now that should be treated as a wild guess. If anyone has further information, please let us know!

Solved by Arin Chaudhuri, David Eppstein, Sune Jakobsen, and Philipp Lampe. Thanks to all those who wrote in!

Happy holidays, folks! And happy birthday to Topological Musings, which Vishal started just a little over a year ago — we’ve both had a really good time with it.

And in particular with the Problem of the Week series. Which, as we all know, doesn’t come out every week  — but then again, to pinch a line from John Baez’s This Week’s Finds, we never said it would: only that each time a new problem comes out, it’s always the problem for that week! Anyway, we’ve been very gratified by the response and by the ingenious solutions we routinely receive — please keep ‘em coming.

This problem comes courtesy of regular problem-solver Arin Chaudhuri and — I’ll freely admit — this one defeated me. I didn’t feel a bit bad though when Arin revealed his very elegant solution, and I’d like to see what our readers come up with. Here it is:

What is the maximum number of colors you could use to color the vertices of an 8-dimensional cube, so that starting from any vertex, each color occurs as the color of some neighbor of that vertex? (Call two vertices neighbors if they are the two endpoints of an edge of the cube.)

Arin, Vishal, and I would be interested and pleased if someone solved this problem for all dimensions $n$.

Please submit solutions to topological[dot]musings[At]gmail[dot]com by Friday, January 2, 2009, 11:59 pm (UTC); do not submit solutions in Comments. Everyone with a correct solution will be inducted into our Hall of Fame! We look forward to your response.

After a long hiatus, I’d like to renew the discussion of axiomatic categorical set theory, more specifically the Elementary Theory of the Category of Sets (ETCS). Last time I blogged about this, I made some initial forays into “internalizing logic” in ETCS, and described in broad brushstrokes how to use that internal logic to derive a certain amount of the structure one associates with a category of sets. Today I’d like to begin applying some of the results obtained there to the problem of constructing colimits in a category satisfying the ETCS axioms (an ETCS category, for short).

(If you’re just joining us now, and you already know some of the jargon, an ETCS category is a well-pointed topos that satisfies the axiom of choice and with a natural numbers object. We are trying to build up some of the elementary theory of such categories from scratch, with a view toward foundations of mathematics.)

But let’s see — where were we? Since it’s been a while, I was tempted to review the philosophy behind this undertaking (why one would go to all the trouble of setting up a categories-based alternative to ZFC, when time-tested ZFC is able to express virtually all of present-day mathematics on the basis of a reasonably short list of axioms?). But in the interest of time and space, I’ll confine myself to a few remarks.

As we said, a chief difference between ZFC and ETCS resides in how ETCS treats the issue of membership. In ZFC, membership is a global binary relation: we can take any two “sets” $A, B$ and ask whether $A \in B$. Whereas in ETCS, membership is a relation between entities of different sorts: we have “sets” on one side and “elements” on another, and the two are not mixed (e.g., elements are not themselves considered sets).

Further, and far more radical: in ETCS the membership relation $x \in A$ is a function, that is, an element $x$ “belongs” to only one set $A$ at a time. We can think of this as “declaring” how we are thinking of an element, that is, declaring which set (or which type) an element is being considered as belonging to. (In the jargon, ETCS is a typed theory.) This reflects a general and useful philosophic principle: that elements in isolation are considered inessential, that what counts are the aggregates or contexts in which elements are organized and interrelated. For instance, the numeral ‘2’ in isolation has no meaning; what counts is the context in which we think of it (qua rational number or qua complex number, etc.). Similarly the set of real numbers has no real sense in isolation; what counts is which category we view it in.

I believe it is reasonable to grant this principle a foundational status, but: rigorous adherence to this principle completely changes the face of what set theory looks like. If elements “belong” to only one set at a time, how then do we even define such basic concepts as subsets and intersections? These are some of these issues we discussed last time.

There are other significant differences between ZFC and ETCS: stylistically, or in terms of presentation, ZFC is more “top-down” and ETCS is more “bottom-up”. For example, in ZFC, one can pretty much define a subset $\{x \in X: P\}$ by writing down a first-order formula $P$ in the language; the comprehension (or separation) axiom scheme is a mighty sledgehammer that takes care of the rest. In the axioms of ETCS, there is no such sledgehammer: the closest thing one has to a comprehension scheme in the ETCS axioms is the power set axiom (a single axiom, not an axiom scheme). However, in the formal development of ETCS, one derives a comprehension scheme as one manually constructs the internal logic, in stages, using the simple tools of adjunctions and universal properties. We started doing some of that in our last post. So: with ZFC it’s more as if you can just hop in the car and go; with ETCS you build the car engine from smaller parts with your bare hands, but in the process you become an expert mechanic, and are not so rigidly attached to a particular make and model (e.g., much of the theory is built just on the axioms of a topos, which allows a lot more semantic leeway than one has with ZF).

But, in all fairness, that is perhaps the biggest obstacle to learning ETCS: at the outset, the tools available [mainly, the idea of a universal property] are quite simple but parsimonious, and one has to learn how to build some set-theoretic and logical concepts normally taken as “obvious” from the ground up. (Talk about “foundations”!) On the plus side, by building big logical machines from scratch, one gains a great deal of insight into the inner workings of logic, with a corresponding gain in precision and control and modularity when one would like to use these developments to design, say, automated deduction systems (where there tend to be strong advantages to using type-theoretic frameworks).

Enough philosophy for now; readers may refer to my earlier posts for more. Let’s get to work, shall we? Our last post was about the structure of (and relationships between) posets of subobjects $Sub(X)$ relative to objects $X$, and now we want to exploit the results there to build some absolute constructions, in particular finite coproducts and coequalizers. In this post we will focus on coproducts.

Note to the experts: Most textbook treatments of the formal development of topos theory (as for example Mac Lane-Moerdijk) are efficient but highly technical, involving for instance the slice theorem for toposes and, in the construction of colimits, recourse to Beck’s theorem in monad theory applied to the double power-set monad [following the elegant construction of Paré]. The very abstract nature of this style of argumentation (which in the application of Beck’s theorem expresses ideas of fourth-order set theory and higher) is no doubt partly responsible for the somewhat fearsome reputation of topos theory.

In these notes I take a much less efficient but much more elementary approach, based on an arrangement of ideas which I hope can be seen as “natural” from the point of view of naive set theory. I learned of this approach from Myles Tierney, who was my PhD supervisor, and who with Bill Lawvere co-founded elementary topos theory, but I am not aware of any place where the details of this approach have been written up before now. I should also mention that the approach taken here is not as “purist” as many topos theorists might want; for example, here and there I take advantage of the strong extensionality axiom of ETCS to simplify some arguments.

### The Empty Set and Two-Valued Logic

We begin with the easy observation that a terminal category, i.e., a category $\mathbf{1}$ with just one object and one morphism (the identity), satisfies all the ETCS axioms. Ditto for any category $C$ equivalent to $\mathbf{1}$ (where every object is terminal). Such boring ETCS categories are called degenerate; obviously our interest is in the structure of nondegenerate ETCS categories.

Let $\mathbf{E}$ be an ETCS category (see here for the ETCS axioms). Objects of $\mathbf{E}$ are generally called “sets”, and morphisms are generally called “functions” or “maps”.

Proposition 0: If an ETCS category $\mathbf{E}$ is a preorder, then $\mathbf{E}$ is degenerate.

Proof: Recall that a preorder is a category in which there is at most one morphism $A \to B$ for any two objects $A, B$. Every morphism in a preorder is vacuously monic. If there is a nonterminal set $A$, then the monic $A \to 1$ to any terminal set defines a subset $A \subseteq 1$ distinct from the subset defined by $1 \to 1$, thus giving (in an ETCS category) distinct classifying maps $\chi_A, t: 1 \to P(1)$, contradicting the preorder assumption. Therefore all objects $A$ are terminal. $\Box$

Assume from now on that $\mathbf{E}$ is a nondegenerate ETCS category.

Proposition 1: There are at least two truth values, i.e., two elements $1 \to P(1)$, in $\mathbf{E}$.

Proof: By proposition 0, there exist sets $X, Y$ and two distinct functions $f, g: X \to Y$. By the axiom of strong extensionality, there exists $x: 1 \to X$ such that $fx \neq gx$. The equalizer $E \to 1$ of the pair $fx, gx: 1 \to Y$ is then a proper subset of $1$, and therefore there are at least two distinct elements $\chi_E, \chi_1: 1 \to P(1)$. $\Box$

Proposition 2: There are at most two truth values $1 \to P(1)$; equivalently, there are at most two subsets of $1$.

Proof: If $U, V \subseteq 1$ are distinct subsets of $1$, then either $U \neq U \cap V$ or $V \neq U \cap V$, say the former. Then $1_U: U \subseteq U$ and $U \cap V \subseteq U$ are distinct subsets, with distinct classifying maps $\chi_{1_U}, \chi_{U \cap V}: U \to P(1)$. By strong extensionality, there exists $x: 1 \to U$ distinguishing these classifying maps. Because $1$ is terminal, we then infer $1 \subseteq U$ and $U \subseteq 1$, so $U = 1$ as subsets of $1$, and in that case only $V$ can be a proper subset of $1$. $\Box$

By propositions 1 and 2, there is a unique proper subset of the terminal object $1$. Let $0 \to 1$ denote this subset. Its domain may be called an “empty set”; by the preceding proposition, it has no proper subsets. The classifying map $1 \to P1$ of $0 \subseteq 1$ is the truth value we call “false”.

Proposition 3: 0 is an initial object, i.e., for any $X$ there exists a unique function $0 \to X$.

Proof: Uniqueness: if $f, g: 0 \to X$ are maps, then their equalizer $x: E \to 0$, which is monic, must be an isomorphism since 0 has no proper subsets. Therefore $f = g$. Existence: there are monos

$\displaystyle 0 \to 1 \stackrel{t_X}{\to} PX$

$\displaystyle X \stackrel{\sigma}{\to} PX$

where $t_X$ is “global truth” (classifying the subset $X \subseteq X$) on $X$ and $\sigma$ is the “singleton mapping $x \mapsto \{x\}$” on $X$, defined as the classifying map of the diagonal map $\delta: X \subseteq X \times X$ (last time we saw $\sigma$ is monic). Take their pullback. The component of the pullback parallel to $\sigma$ is a mono $P \to 0$ which again is an isomorphism, whence we get a map $0 \cong P \to X$ using the other component of the pullback. $\Box$

Remark: For the “purists”, an alternative construction of the initial set 0 that avoids use of the strong extensionality axiom is to define the subset $0 \subseteq 1$ to be “the intersection all subsets of $1$“. Formally, one takes the extension $\left[\phi\right] \subseteq 1$ of the map

$\displaystyle \phi = (1 \stackrel{t_{P1}}{\to} PP1 \stackrel{\bigcap}{\to} P1);$

where the first arrow represents the class of all subsets of $P1$, and the second is the internal intersection operator defined at the end of our last post. Using formal properties of intersection developed later, this intersection $0 \subseteq 1$ has no proper subsets, and then the proof of proposition 3 carries over verbatim. $\Box$

Corollary 1: For any $X$, the set $0 \times X$ is initial.

Proof: By cartesian closure, maps $0 \times X \to Y$ are in bijection with maps of the form $0 \to Y^X$, and there is exactly one of these since 0 is initial. $\Box$

Corollary 2: If there exists $f: X \to 0$, then $X$ is initial.

Proof: The composite of $\langle f, 1_X \rangle: X \to 0 \times X$ followed by $\pi_2: 0 \times X \to X$ is $1_X$, and $\pi_2$ followed by $\langle f, 1_X \rangle: X \to 0 \times X$ is also an identity since $0 \times X$ is initial by corollary 1. Hence $X$ is isomorphic to an initial object $0 \times X$. $\Box$

By corollary 2, for any object $Y$ the arrow $0 \to Y$ is vacuously monic, hence defines a subset.

Proposition 4: If $X \not\cong 0$, then there exists an element $x: 1 \to X$.

Proof: Under the assumption, $X$ has at least two distinct subsets: $0 \subseteq X$ and $1_X: X \subseteq X$. By strong extensionality, their classifying maps $\chi_0, \chi_1: X \to P1$ are distinguished by some element $x: 1 \to X$.

### External Unions and Internal Joins

One of the major goals in this post is to construct finite coproducts in an ETCS category. As in ordinary set theory, we will construct these as disjoint unions. This means we need to discuss unions first; as should be expected by now, in ETCS unions are considered locally, i.e., we take unions of subsets of a given set. So, let $A, B \subseteq X$ be subsets.

To define the union $A \cup B \subseteq X$, the idea is to take the intersection of all subsets containing $A$ and $B$. That is, we apply the internal intersection operator (constructed last time),

$\displaystyle \bigcap: PPX \to PX,$

to the element $1 \to PPX$ that represents the set of all subsets of $X$ containing $A$ and $B$; the resulting element $1 \to PX$ represents $A \cup B$. The element $1 \to PPX$ corresponds to the intersection of two subsets

$\{C \in PX: A \subseteq C\} \cap \{C \in PX: B \subseteq C\} \subseteq PX.$

Remark: Remember that in ETCS we are using generalized elements: $C \in PX$ really means a function $C: U \to PX$ over some domain $U$, which in turn classifies a subset $\left[C\right] \subseteq U \times X$. On the other hand, the $A$ here is a subset $A \subseteq X$. How then do we interpret the condition “$A \subseteq C$“? We first pull back $\chi_A: 1 \to PX$ over to the domain $U$; that is, we form the composite $\displaystyle U \stackrel{!}{\to} 1 \stackrel{\chi_A}{\to} PX$, and consider the condition that this is bounded above by $C: U \to PX$. (We will write $\chi_A \leq C$, thinking of the left side as constant over $U$.) Externally, in terms of subsets, this corresponds to the condition $U \times A \subseteq \left[C\right]$.

We need to construct the subsets $\{C \in PX: A \subseteq C\}, \{C \in PX: B \subseteq C\}$. In ZFC, we could construct those subsets by applying the comprehension axiom scheme, but the axioms of ETCS have no such blanket axiom scheme. (In fact, as we said earlier, much of the work on “internalizing logic” goes to show that in ETCS, we instead derive a comprehension scheme!) However, one way of defining subsets in ETCS is by taking loci of equations; here, we express the condition $A \subseteq C$, more pedantically $A \subseteq \left[C\right]$ or $\chi_A \leq C$, as the equation

$(\chi_A \Rightarrow C) = t_X$

where the right side is the predicate “true over $X$“.

Thus we construct the subset $\{C \in PX: A \subseteq C\}$ of $PX$ via the pullback:

{C: A ≤ C} -------> 1
|              |
|              | t_X
V chi_A => -   V
PX -----------> PX

Let me take a moment to examine what this diagram means exactly. Last time we constructed an internal implication operator

$\Rightarrow: P1 \times P1 \to P1$

and now, in the pullback diagram above, what we are implicitly doing is lifting this to an operator

$\Rightarrow_X: PX \times PX \to PX$

The easy and cheap way of doing this is to remember the isomorphism $PX \cong P1^X$ we used last time to uncover the cartesian closed structure, and apply this to

$\displaystyle P1^X \times P1^X \cong (P1 \times P1)^X \stackrel{\Rightarrow^X}{\to} P1^X$

to define $\Rightarrow_X: PX \times PX \to PX$. This map classifies a certain subset of $X \times PX \times PX$, which I’ll just write down (leaving it as an exercise which involves just chasing the relevant definitions):

$\{(x, T, S) \in X \times PX \times PX: x \in_X T \Rightarrow x \in_X S\}$

Remark: Similarly we can define a meet operator $\wedge_X: PX \times PX \to PX$ by exponentiating the internal meet $P1 \times P1 \to P1$. It is important to know that the general Heyting algebra identities which we established last time for $P1$ lift to the corresponding identities for the operators $\wedge_X, \Rightarrow_X$ on $PX$. Ultimately this rests on the fact that the functor $(-)^X$, being a right adjoint, preserves products, and therefore preserves any algebraic identity which can be expressed as a commutative diagram of operations between such products.

Hence, for the fixed subset $A \subseteq X$ (classified by $\chi_A: 1 \to PX$), the operator

$\chi_A \Rightarrow -: PX \to PX$

classifies the subset

$\{(x, S): x \in_X A \Rightarrow x \in_X S\} \hookrightarrow X \times PX$

Finally, in the pullback diagram above, we are pulling back the operator $\chi_A \Rightarrow -$ against $t_X$. But, from last time, that was exactly the method we used to construct universal quantification. That is, given a subset

$R \subseteq X \times Y$

we defined $\forall_Y R \subseteq X$ to be the pullback of $t_Y: 1 \hookrightarrow PY$ along $\chi_R: X \to PY$. Putting all this together, the pullback diagram above expresses the definition

$\{C \in PX: A \subseteq C\} := \{C \in PX: \forall_{x \in X} \ x \in_X A \Rightarrow x \in_X C\}$

that one would expect “naively”.

Now that all the relevant constructions are in place, we show that $A \cup B$ is the join of $A$ and $B$ in the poset $Sub(X)$. There is nothing intrinsically difficult about this, but as we are still in the midst of constructing the internal logic, we will have to hunker down and prove some logic things normally taken for granted or zipped through without much thought. For example, the internal intersection operator was defined with the help of internal universal quantification, and we will need to establish some formal properties of that.

Here is a useful general principle for doing internal logic calculations. Let $\chi_R: X \to PY$ be the classifying map of a subset $R \subseteq X \times Y$, and let $f: U \to X$ be a function. Then the composite $\chi_R f: U \to PY$ classifies the subset

$(f \times 1_Y)^{-1}(R) \subseteq U \times Y$

so that one has the general identity $\chi_R f = \chi_{(f \times 1)^{-1}(R)}$. In passing back and forth between the external and internal viewpoints, the general principle is to try to render “complicated” functions $U \to PY$ into a form $\chi_R f$ which one can more easily recognize. For lack of a better term, I’ll call this the “pullback principle”.

Lemma 1: Given a relation $R \hookrightarrow X \times Y$ and a constant $c: 1 \to Y$, there is an inclusion

$\forall_Y R \subseteq (1_X \times c)^{-1}(R)$

as subsets of $X$. (In traditional logical syntax, this says that for any element $c: 1 \to Y$,

$\forall_{y: Y} R(x, y)$ implies $R(x, c)$

as predicates over elements $x \in X$. This is the type of thing that ordinarily “goes without saying”, but which we actually have to prove here!)

Proof: As we recalled above, $\forall_Y R \subseteq X$ was defined to be $\chi_R^{-1}(t_Y)$, the pullback of global truth $t_Y: 1 \to PY$ along the classifying map $\chi_R: X \to PY$. Hold that thought.

Let

$Pc: PY \to P1$

be the map which classifies the subset $\{S \in PY: c \in_Y S\}$. Equivalently, this is the map

$P1^c: P1^Y \to P1^1$

under the canonical isomorphisms $PY \cong P1^Y$, $P1^1 \cong P1$. Intuitively, this maps $f \mapsto f(c)$, i.e., plugs an element $c \in Y$ into an element $f \in P1^Y$.

Using the adjunction $(- \times Y) \dashv (-)^Y$ of cartesian closure, the composite

$\displaystyle X \stackrel{\chi_R}{\to} PY \stackrel{Pc}{\to} P1$

transforms to the composite

$X \times 1 \stackrel{1_X \times c}{\to} X \times Y \stackrel{\chi_{R}}{\to} P1$

so by the pullback principle, $(Pc)\chi_R: X \times 1 \to P1$ classifies $(1_X \times c)^{-1}(R) \subseteq X \times 1 \cong X$.

Equivalently,

$(1_X \times c)^{-1}(R) = \chi_{R}^{-1} (Pc)^{-1}(t) \qquad (1)$

Also, as subsets of $PY$, we have the inclusion

$(t_Y: 1 \hookrightarrow PY) \subseteq (Pc)^{-1}(t) \qquad (2)$

[this just says that $t_Y$ belongs to the subset classified by $Pc$, or equivalently that $c: 1 \to Y$ is in the subset $Y \subseteq Y$]. Applying the pullback operation $\chi_{R}^{-1}$ to (2), and comparing to (1), lemma 1 follows. $\Box$

Lemma 2: If $R \subseteq S$ as subsets of $X \times Y$, then $\forall_Y R \subseteq \forall_Y S$.

Proof: From the last post, we have an adjunction:

$T \times Y \subseteq S$ if and only if $T \subseteq \forall_Y S$

for any subset of $T \subseteq X$. So it suffices to show $\forall_Y R \times Y \subseteq S$. But

$\forall_Y R \times Y \subseteq R \subseteq S$

where the first inclusion follows from $\forall_Y R \subseteq \forall_Y R$. $\Box$

Next, recall from the last post that the internal intersection of $F: 1 \to PPX$ was defined by interpreting the following formula on the right:

$\displaystyle \bigcap F = \forall_{S \in PX} (S \in_{PX} F) \Rightarrow (x \in_X S)$

Lemma 3: If $F \leq G: 1 \to PPX$, then $\displaystyle \bigcap G \leq \bigcap F: 1 \to PX$.

Proof: $F: 1 \to PPX$ classifies the subset $\{S \in PX: S \in_{PX} F\}$, i.e.,  $F$ is identified with the predicate $S \in_{PX} F$ in the argument $S$, so by hypothesis $(S \in_{PX} F) \leq (S \in_{PX} G)$ as predicates on $S$. Internal implication $a \Rightarrow b$ is contravariant in the argument $a$ [see the following remark], so

$\left[(S \in G) \Rightarrow (x \in S)\right] \leq \left[(S \in F) \Rightarrow (x \in S)\right]$

Now apply lemma 2 to complete the proof. $\Box$

Remark: The contravariance of $- \Rightarrow b$, that is, the fact that

$x \leq y$ implies $(y \Rightarrow b) \leq (x \Rightarrow b),$

is a routine exercise using the adjunction [discussed last time]

$a \wedge c \leq b$ if and only if $c \leq (a \Rightarrow b).$

Indeed, we have

$x \wedge (y \Rightarrow b) \leq y \wedge (y \Rightarrow b) \leq b \qquad (*)$

where the first inequality follows from the hypothesis $x \leq y$, and the second follows from $y \Rightarrow b \leq y \Rightarrow b$. By the adjunction, the inequality (*) implies $(y \Rightarrow b) \leq (x \Rightarrow b)$. $\Box$

Theorem 1: For subsets $A, B$ of $X$, the subset $A \cup B$ is an upper bound of $A$ and $B$, i.e., $A, B \subseteq A \cup B$.

Proof: It suffices to prove that $\displaystyle A = \bigcap \{C \in PX: A \subseteq C\}$, since then we need only apply lemma 3 to the trivially true inclusion

$\{C \in PX: A \subseteq C\} \cap \{C \in PX: B \subseteq C\} \subseteq \{C \in PX: A \subseteq C\}$

to infer $A \subseteq A \cup B$, and similarly $B \subseteq A \cup B$. (Actually, we need only show $\displaystyle A \subseteq \bigcap \{C \in PX: A \subseteq C\}$. We’ll do that first, and then show full equality.)

The condition we want,

$A \subseteq \{x \in X: \forall_{S \in PX} (A \subseteq S) \Rightarrow (x \in_X S)\},$

is, by the adjunction $(- \times PX) \dashv \forall_{PX}: Sub(X \times PX) \to Sub(X)$, equivalent to

$A \times PX \subseteq \{(x, S): A \subseteq S \Rightarrow (x \in_X S)\}$

which, by a $\wedge$-$\Rightarrow$ adjunction, is equivalent to

$(A \times PX) \cap (X \times \{S \in PX: A \subseteq S\}) \subseteq \ \in_X \qquad (1)$

as subsets of $X \times PX$. So we just have to prove (1). At this point we recall, from our earlier analysis, that

$\{S \in PX: A \subseteq S\} = \{S \in PX: \forall_{x \in X} x \in_X A \Rightarrow x \in_X S\}$

Using the adjunction $(X \times -) \dashv \forall_X$, as in the proof of lemma 2, we have

$X \times \{S \in PX: \forall_{x \in X} x \in_X A \Rightarrow x \in_X S\}$

$\subseteq \{(x, S): x \in_X A \Rightarrow x \in_X S\} := (A \times PX) \Rightarrow \in_X,$

which shows that the left side of (1) is contained in

$(A \times PX) \cap ((A \times PX) \Rightarrow \in_X) \subseteq \ \in_X,$

where the last inclusion uses another $\wedge$-$\Rightarrow$ adjunction. Thus we have established (1) and therefore also the inclusion

$\displaystyle A \subseteq \bigcap \{S \in PX: A \subseteq S\}$

Now we prove the opposite inclusion

$\displaystyle \bigcap \{S \in PX: A \subseteq S\} \subseteq A,$

that is to say

$\{x \in X: \forall_{S \in PX} A \subseteq S \Rightarrow x \in_X S\} \subseteq A \qquad (**)$

Here we just use lemma 1, applied to the particular element $\chi_A: 1 \to PX$: we see that the left side of (**) is contained in

$\{x \in X: A \subseteq \left[\chi_A\right] \Rightarrow x \in_X A\}$

which collapses to $\{x \in X: x \in_X A\} = A$, since $A = \left[\chi_A\right]$. This completes the proof. $\Box$

Theorem 2: $A \cup B$ is the least upper bound of $A, B$, i.e., if $C \subseteq X$ is a subset containing both $A \subseteq X$ and $B \subseteq X$, then $A \cup B \subseteq C$.

Proof: We are required to show that

$\{x \in X: \forall_{S \in PX} \ (A \subseteq S \wedge B \subseteq S) \Rightarrow x \in_X S\} \subseteq C.$

Again, we just apply lemma 1 to the particular element $\chi_C: 1 \to PX$: the left-hand side of the claimed inclusion is contained in

$\{x \in X: (A \subseteq C \wedge B \subseteq C) \Rightarrow x \in_X C\}$

but since $(A \subseteq C \wedge B \subseteq C)$ is true by hypothesis (is globally true as a predicate on the implicit variable $x \in X$), this last subset collapses to

$\{x \in X: t_X \Rightarrow x \in_X C\} = \{x \in X: x \in_X C\} = C$

which completes the proof. $\Box$

Theorems 1 and 2 show that for any set $X$, the external poset $Sub(X)$ admits joins. One may go on to show (just on the basis of the topos axioms) that as in the case of meets, the global external operation of taking joins is natural in $X$, so that by the Yoneda principle, it is classified by an internal join operation

$\vee: P1 \times P1 \to P1,$

namely, the map which classifies the union of the subsets

$\displaystyle \left[\pi_1\right] = P1 \times 1 \stackrel{1 \times t}{\hookrightarrow} P1 \times P1$

$\displaystyle \left[\pi_2\right] = 1 \times P1 \stackrel{t \times 1}{\hookrightarrow} P1 \times P1$

and this operation satisfies all the expected identities. In short, $P1$ carries an internal Heyting algebra structure, as does $PX \cong P1^X$ for any set $X$.

We will come back to this point later, when we show (as a consequence of strong extensionality) that $P1$ is actually an internal Boolean algebra.

### Construction of Coproducts

Next, we construct coproducts just as we do in ordinary set theory: as disjoint unions. Letting $X, Y$ be sets (objects in an ETCS category), a disjoint union of $X$ and $Y$ is a pair of monos

$i: X \hookrightarrow Z \qquad j: Y \hookrightarrow Z$

whose intersection is empty, and whose union or join in $Sub(Z)$ is all of $Z$. We will show that disjoint unions exist and are essentially unique, and that they satisfy the universal property for coproducts. We will use the notation $X + Y$ for a disjoint union.

Theorem 3: A disjoint union of $X$ and $Y$ exists.

Proof: It’s enough to embed $X, Y$ disjointly into some set $C$, since the union of the two monos in $Sub(C)$ would then be the requisite $Z$. The idea now is that if a disjoint union or coproduct $X+Y$ exists, then there’s a canonical isomorphism $P(X + Y) \cong PX \times PY$. Since the singleton map

$\sigma: X + Y \to P(X + Y) \cong PX \times PY$

is monic, one thus expects to be able to embed $X$ and $Y$ disjointly into $PX \times PY$. Since we can easily work out how all this goes in ordinary naive set theory, we just write out the formulas and hope it works out in ETCS.

In detail, define $i_X: X \to PX \times PY$ to be

$\displaystyle X \cong X \times 1 \stackrel{\sigma_X \times \chi_0}{\to} PX \times PY$

where $\sigma_X$ is the singleton mapping and $\chi_0$ classifies $0 \hookrightarrow Y$; similarly, define $i_Y: Y \to PX \times PY$ to be

$\displaystyle Y \cong 1 \times Y \stackrel{\chi_0 \times \sigma_Y}{\to} PX \times PY.$

Clearly $i_X$ and $i_Y$ are monic, so to show disjointness we just have to show that their pullback is empty. But their pullback is isomorphic to the cartesian product of the pullbacks of the diagrams

$\displaystyle X \stackrel{\sigma_X}{\to} PX \stackrel{\chi_0}{\leftarrow} 1 \qquad 1 \stackrel{\chi_0}{\to} PY \stackrel{\sigma_Y}{\leftarrow} Y$

so it would be enough to show that each (or just one) of these two pullbacks is empty, let’s say the first.

Suppose given a map $h: A \to X$ which makes the square

  A -------> 1
|          |
h |          | chi_0
V sigma_X  V
X -------> PX

commute. Using the pullback principle, the map $A \to 1 \stackrel{\chi_0}{\to} PX$ classifies

$0 \times A \hookrightarrow X \times A$

which is just the empty subset. This must be the same subset as classified by $\sigma_X h = \chi_{\delta}h$ (where $\delta: X \hookrightarrow X \times X$ is the diagonal), which by the pullback principle is

$(1_X \times h)^{-1}(\delta) \hookrightarrow X \times A.$

An elementary calculation shows this to be the equalizer of the pair of maps

$\pi_1, h\pi_2: X \times A \stackrel{\to}{\to} X$

So this equalizer $E$ is empty. But notice that $\langle h, 1 \rangle: A \to X \times A$ equalizes this pair of maps. Therefore we have a map $A \to E \cong 0$. By corollary 2 above, we infer $A \cong 0$. This applies to the case where $A$ is the pullback, so the pullback is empty, as was to be shown. $\Box$

Theorem 4: Any two disjoint unions of $X, Y$ are canonically isomorphic.

Proof: Suppose $i: X \to Z \leftarrow Y: j$ is a disjoint union. Define a map

$\phi= \langle \phi_1, \phi_2 \rangle: Z \to PX \times PY$

where $\phi_1: Z \to PX$ classifies the subset $\langle 1_X, i \rangle: X \to X \times Z$, and $\phi_2: Z \to PY$ classifies the subset $\langle 1_Y, j \rangle: Y \to Y \times Z$. Applying the pullback principle, the composite $\phi_1 i: X \to PX$ classifies

$(1_X \times i)^{-1}(\langle 1_X, i \rangle) \hookrightarrow X \times X$

which is easily seen to be the diagonal on $X$. Hence $\phi_1 i = \sigma_X$. On the other hand, $\phi_1 j: Y \to PX$ classifies the subset

$(1_X \times j)^{-1}(\langle 1_X, i \rangle) \hookrightarrow X \times Y$

which is empty because $i$ and $j$ are disjoint embeddings, so $\phi_1 j = \chi_0: Y \to PX$. Similar calculations yield

$\phi_2 i = \chi_0: X \to PY \qquad \phi_2 j = \sigma_Y: Y \to PY.$

Putting all this together, we conclude that $\phi i = i_X: X \to PX \times PY$ and $\phi j = i_Y: Y \to PX \times PY$, where $i_X$ and $i_Y$ were defined in the proof of theorem 3.

Next, we show that $\phi$ is monic. If not, then by strong extensionality, there exist distinct elements $c, d: 1 \to Z$ for which $\phi(c) = \phi(d)$; therefore, $\phi_1 c = \phi_1 d: 1 \to PX$ and $\phi_2 c = \phi_2 d: 1 \to PY$. By the pullback principle, these equations say (respectively)

$c^{-1}(i) = d^{-1}(i) \hookrightarrow 1 \qquad c^{-1}(j) = d^{-1}(j) \hookrightarrow 1$

If $c^{-1}(i) = 1$, then both $c, d: 1 \to Z$ factor through the mono $i: X \to Z$. However, since $\phi i = i_{X}$ is monic, this would imply that $\phi (c) \neq \phi (d)$, contradiction. Therefore $c^{-1}(i) = c \cap i = 0$. By similar reasoning, $c \cap j = 0$. Therefore

$i \subseteq \neg c \qquad j \subseteq \neg c$

where $\neg = (- \Rightarrow 0): Sub(Z) \to Sub(Z)$ is the negation operator. But then $i \cup j \subseteq \neg c$. And since $1: Z \hookrightarrow Z$ is the union $i \cup j$ by assumption, $\neg c$ must be the top element $\top \in Sub(Z)$, whence $c: 1 \to Z$ is the bottom element 0. This contradicts the assumption that the topos is nondegenerate. Thus we have shown that $\phi$ must be monic.

The argument above shows that $\phi: Z \hookrightarrow PX \times PY$ is an upper bound of $i_X: X \to PX \times PY$ and $i_Y: Y \to PX \times PY$ in $Sub(PX \times PY)$. It follows that the join $X + Y$ constructed in theorem 3 is contained in $\phi: Z \to PX \times PY$, and hence can be regarded as the join of $X$ and $Y$ in $Sub(Z)$. But $Z$ is their join in $Sub(Z)$ by assumption of being a disjoint union, so the containment $X + Y \subseteq Z$ must be an equality. The proof is now complete. $\Box$

Theorem 5: The inclusions $i_X: X \to X + Y$, $i_Y: Y \to X + Y$ exhibit $X + Y$ as the coproduct of $X$ and $Y$.

Proof: Let $f: X \to B$, $g: Y \to B$ be given functions. Then we have monos

$\langle 1_X, f \rangle: X \hookrightarrow X \times B \qquad \langle 1_Y, g \rangle: Y \hookrightarrow Y \times B \qquad (1)$

Now the operation $- \times B: Sub(C) \to Sub(C \times B)$ certainly preserves finite meets, and also preserves finite joins because it is left adjoint to $\forall_B: Sub(C \times B) \to Sub(C)$. Therefore this operation preserves disjoint unions; we infer that the monos

$X \times B \stackrel{i_X \times B}{\to} (X + Y) \times B \qquad Y \times B \stackrel{i_Y \times B}{\to} (X + Y) \times B \qquad (2)$

exhibit $(X + Y) \times B$ as a disjoint union of $X \times B, Y \times B$. Composing the monos of (1) and (2), we have disjoint embeddings of $X$ and $Y$ in $(X + Y) \times B$. Using theorem 4, $X + Y$ is isomorphic to the join of these embeddings; this means we have an inclusion

$\langle 1, h \rangle: X + Y \hookrightarrow (X + Y) \times B$

whose restriction to $X$ yields $\langle i_X, f \rangle$ and whose restriction to $Y$ yields $\langle i_Y, g \rangle$. Hence $h: X + Y \to B$ extends $f$ and $g$. It is the unique extension, for if there were two extensions $h, h'$, then the equalizer of $\langle 1, h \rangle$ and $\langle 1, h' \rangle$ would be an upper bound of $X, Y$ in $Sub((X + Y) \times B)$, contradicting the fact that $X + Y$ is the least upper bound. This completes the proof. $\Box$

I think that’s enough for one day. I will continue to explore the categorical structure and logic of ETCS next time.

The solutions to POW-11 are in! Or make that solution, unless you also count your hosts': only one reader responded. Perhaps everyone has been distracted by the upcoming US presidential election?!

Composite solution by Philipp Lampe, University of Bonn, and Todd and Vishal: We claim there are exactly two functions $h: \mathbb{N} \to \mathbb{N}$ which satisfy the identity $h(p^2 + q^2) = h(p)^2 + h(q)^2$: the identity function and the function which is identically zero.

Clearly $h(0) = h(0^2 + 0^2) = 2h(0)^2$ forces $h(0) = 0$, whence $h(1) = h(1^2 + 0^2) = h(1)^2$.  Therefore $h(1) = 0$ or $h(1) = 1$, and the claim above follows if we show that $h(p) = ph(1)$ for all $p \in \mathbb{N}$. The first few cases are built up as follows:

$h(2) = h(1^2 + 1^2) = 2h(1)^2 = 2h(1)$

$h(4) = h(2^2) = (2h(1))^2 = 4h(1)$

$h(5) = h(2^2 + 1^2) = (2h(1))^2 + h(1)^2 = 5h(1)$

$h(3) = \sqrt{h(3^2 + 4^2) - h(4)^2} = \sqrt{h(5)^2 - h(4)^2} = \sqrt{(25-16)h(1)^2}$

$= 3h(1)$

$h(8) = h(2^2 + 2^2) = (2h(1))^2 + (2h(1))^2 = 8h(1)$

$h(10) = h(3^2 + 1^2) = (3h(1))^2 + h(1)^2 = 10h(1)$

$h(6) = \sqrt{h(6^2 + 8^2) - h(8)^2} = \sqrt{h(10)^2 - h(8)^2}$

$= \sqrt{(100-64)h(1)^2} = 6h(1)$

where in a few places we have invoked the simple lemma that $h(p^2) = h(p)^2$ [since $h(p^2 + 0^2) = h(p)^2 + 0^2$]. The remainder of the argument proceeds by strong induction. For odd $p = 2m+1 > 5$, we have

$(2m+1)^2 + (m-2)^2 = (2m-1)^2 + (m+2)^2 \qquad (1)$

Since $h$ preserves sums of two squares,  we derive

$h(2m+1)^2 + h(m-2)^2 = h(2m-1)^2 + h(m+2)^2$

Therefore, using the inductive hypothesis that $h(q) = qh(1)$ for $q < 2m+1$, we have

$h(2m+1)^2 + (m-2)^2h(1) = (2m-1)^2 h(1) + (m+2)^2 h(1) \qquad (2)$

From equation (1) we also have

$(2m+1)^2h(1) + (m-2)^2h(1) = (2m-1)^2h(1) + (m+2)^2h(1) \qquad (3)$

Comparing equations (2) and (3), we infer $h(2m+1) = (2m+1)h(1)$. In particular, $h(q) = qh(1)$ for $q = 7, 9$, and we have the case $h(8) = 8h(1)$ from before. This will be used to start the induction for even $p$, starting from $p = 10$.

For even $p = 2m+2$, where $m \geq 4$, we have

$(2m + 2)^2 + (m-4)^2 = (2m-2)^2 + (m+4)^2 \qquad (4)$

and by arguing as we did above, we derive the two equations

$h(2m+2)^2 + h(m-4)^2 = h(2m-2)^2 + h(m+4)^2$

$h(2m+2)^2 + (m-4)^2h(1) = (2m-2)^2h(1) + (m+4)^2h(1)$

where the second follows from the first by inductive hypothesis. Multiplying (4) through by $h(1)$ and comparing with the last equation, we conclude $h(2m+2) = (2m+2)h(1)$, and this completes the proof.

Remarks:

The underlying idea is to exploit the fact that many numbers can be expressed as a sum of two squares in more than one way, so that one can generate enough relations of the form $h(p)^2 + h(q)^2 = h(r)^2 + h(s)^2$ to nail down all the $h(p)$ [in terms of $h(1)$]. A powerful way of generating equations of the form $p^2 + q^2 = r^2 + s^2$ is to consider prime factorizations in the ring of Gaussian integers $\mathbb{Z}\left[i\right]$. For example, in the Gaussian integers we have the prime factorizations

$5 = (2 + i)(2-i) \qquad 13 = (2 + 3i)(2-3i)$

so that there are two ways of writing their product 65 in the form $z \cdot \bar{z}$:

$(2+i)(2+3i) \cdot (2-i)(2-3i), \qquad (2+i)(2-3i) \cdot (2-i)(2+3i)$

Here we have, respectively, $z = 1 + 8i$, $z = 7 - 4i$, whence $65 = z \cdot \bar{z} = 1^2 + 8^2 = 7^2 + 4^2$. Extrapolating from this, if we set

$z = (a+bi)(c+di), \qquad z' = (a+bi)(c-di)$

then we have $|z|^2 = z \cdot \bar{z} = z' \cdot \bar{z'} = |z'|^2$, whence we derive

$(ac - bd)^2 + (bc + ad)^2 = (ac + bd)^2 + (bc - ad)^2.$

This gives a large supply of available relations to work with, and from there it is not too bad to get an inductive argument up and flying, once some base cases have been cleared out of the way with some numerical experimentation.

[Although it’s a little unusual to need such an outlay of base cases (up to at least $p = 6$) before one can start the induction; it creates a psychological hurdle, I feel. It reminds me of POW-6, where no complete solutions were received: one has to fiddle with a number of base cases before the general argument can be pushed through.]

Incidentally, while researching this problem, I found the wonderful website MathPages, created by an all but anonymous author named Kevin Brown. I really recommend checking out this website, and I hope that it is somehow kept alive: it would be a shame for such a treasure trove to be buried and forgotten in some deep hollow of cyberspace.

Time for another problem of the week! This one doesn’t necessarily require any number-theoretic knowledge, although a little bit might help point you in the right direction. As usual, let $\mathbb{N}$ be the set of natural numbers, i.e., the set of nonnegative integers.

Describe all functions $h: \mathbb{N} \to \mathbb{N}$ such that $h(p^2 + q^2) = h(p)^2 + h(q)^2$ for all $p, q \in \mathbb{N}$.

Please submit solutions to topological[dot]musings[At]gmail[dot]com by Sunday, November 2, 11:59 pm (UTC); do not submit solutions in Comments. Everyone with a correct solution will be inducted into our Hall of Fame! We look forward to your response.

A reader brought up essentially this question: does anyone happen to know a proof that $e^{x^2}$ does not possess an elementary antiderivative? By “elementary”, I mean a member of the class of functions which contains all constants valued in the complex numbers, the identity function, the exponential and log functions, and closed under the four basic arithmetic operations and composition.

The solutions are in! As is so often the case, solvers came up with a number of different approaches; the one which follows broadly represents the type of solution that came up most often. I’ll mention some others in the remarks below, some of it connected with the lore of Richard Feynman.

Solution by Nilay Vaish: The answer to POW-10 is $\displaystyle \frac{\pi}{2}\ln 2$. Put

$\displaystyle I = \int_{0}^{\pi/2} \frac{x}{\tan x} dx = \int_{0}^{\pi/2} \frac{x\cos x}{\sin x} dx$

and integrate by parts: we have

$\displaystyle I = x\ln \sin x |_{0}^{\pi/2} - \int_{0}^{\pi/2} \ln \sin x dx.$

The first term vanishes by a simple application of L’hôpital’s rule. We now have

$\displaystyle I \stackrel{(1)}{=} -\int_{0}^{\pi/2} \ln \sin x dx = -\int_{0}^{\pi/2} \ln \cos x dx$

where the second equation follows from $\cos x = \sin(\pi/2 - x)$, and the general elementary fact that $\int_{b}^{c} f(x) dx = \int_{a-c}^{a-b} f(a-x) dx.$ Adding the last two displayed equations, we obtain

$\displaystyle 2I = -\int_{0}^{\pi/2} \ln (\sin (x) \cos (x)) dx = -\int_{0}^{\pi/2} \ln(\frac{\sin 2x}{2}) dx.$

This gives

$\displaystyle 2I = \int_{0}^{\pi/2} \ln 2 dx - \int_{0}^{\pi/2} \ln \sin 2x dx = \frac{\pi \ln 2}{2} - \frac1{2} \int_{0}^{\pi} \ln \sin t dt \qquad (2)$

after a simple substitution. The last integral splits up as two integrals:

$\displaystyle \int_{0}^{\pi} \ln \sin t dt = \int_{0}^{\pi/2} \ln \sin t dt + \int_{\pi/2}^{\pi} \ln \sin t dt \qquad (3)$

but these two integrals are equal, using the identity $\sin t = \sin(\pi - t)$ together with the general integration fact cited above. Hence the two sides of (3) equal

$\displaystyle 2\int_{0}^{\pi/2} \ln \sin t dt = -2I \qquad (4),$

recalling equation (1) above. Substituting this for the last integral in equation (2), we arrive at

$\displaystyle 2I = \frac{\pi}{2} \ln 2 + I$

whence we derive the value of the desired integral $I$ .

Remarks:

1. A number of solvers exploited variations on the theme of the solution above, which could be summarized as involving symmetry considerations together with a double angle formula. For example, Philipp Lampe and Paul Shearer in their solutions made essential use of the identity

$2 \cot(2x) = \cot x - \tan x$

in conjunction with the complementarity $\cot x = \tan(\pi/2 - x)$, and the general integration fact cited above.

2. Arin Chaudhuri (and Vishal in private email) pointed out to me that the evaluation of the integral

$\displaystyle \int_{0}^{\pi/2} \ln \sin x dx$

is actually fairly well-known: it appears for example in Complex Analysis by Ahlfors (3rd edition, p. 160) to illustrate contour integration of a complex analytic function via the calculus of residues, and no doubt occurs in any number of other places.

3. Indeed, Simon Tyler in his solution referred to this as an integral of Clausen type, and gave a clever method for evaluating it: we have

$\displaystyle \int_{0}^{\pi/2} \ln \sin x dx = \frac1{2}\int_{0}^{\pi/2} \ln \frac1{2} (1 - \cos 2x) dx$

which works out to

$\displaystyle -\frac{\pi \ln 2}{4} + \frac1{4}\int_{0}^{\pi} \ln(1 - \cos x) dx \qquad (*)$

The last integral can be expanded as a series

$\displaystyle -\sum_{n \geq 1} \int_{0}^{\pi} \frac1{n} \cos^n(x) dx$

where the summands for odd $n$ vanish; the other terms can be collected and then resummed to yield

$\displaystyle \frac1{2} \int_{0}^{\pi} \ln(1 - \cos^2 x) dx = \int_{0}^{\pi} \ln \sin x dx = 2 \int_{0}^{\pi/2} \ln \sin x dx$

and by substituting this for the integral in (*), the original integral is easily evaluated.

4. Arin C. later described still another method which he says he got from an exercise in a book by Apostol — it’s close in spirit to the one I myself had in mind, called “differentiation under the integral sign”, famously referred to in Surely You’re Joking, Mr. Feynman!. As Feynman recounts, he never really learned fancy methods like complex contour integration, but he did pick up this method of differentiating under the integral sign from an old calculus book. In typical Feynman style, he writes,

“It turns out that’s not taught very much in the universities; they don’t emphasize it. But I caught on how to use that method, and I used that one damn tool again and again. So because I was self-taught using that book [Wilson’s Advanced Calculus], I had peculiar methods of doing integrals. The result was, when guys at MIT or Princeton had trouble doing a certain integral, it was because they couldn’t do it with the standard methods they had learned in school. If it was contour integration, they would have found it; if it was a simple series expansion, they would have found it. Then I come along and try differentiating under the integral sign, and often it worked. So I got a great reputation for doing integrals, only because my box of tools was different from everybody else’s, and they had tried all their tools on it before giving the problem to me.”

So, what is this method? Wikipedia has a pretty good article on it; the idea is to view a given definite integral $\int_{a}^{b} f(x) dx$ as an instance of a smoothly parametrized family $G(t) = \int_{a}^{b} F(x, t) dx$ of integrals, where the extra parameter $t$ is inserted at some judicious spot, in such a way that $G(t)$ has an easy-to-integrate derivative. Then one figures out $G$, and evaluates it at the $t$ which yields the original definite integral.

The best way to understand it is through an example. Pretend you’re Feynman, and someone hands you

$\displaystyle \int_{0}^{1} \frac{x^{10} - 1}{\ln x} dx$

[I suppose there may be a contour integration method to handle that one, but never mind — you’re Feynman now, and not supposed to know how to do that!] After fiddling around a little, you find that by inserting a parameter $t$ in the exponent,

$\displaystyle G(t) := \int_{0}^1 \frac{x^{10 t} - 1}{\ln x} dx,$

this $G(t)$ has a very simple derivative:

$\displaystyle G'(t) = \frac{d}{dt} \int_{0}^{1} \frac{x^{10 t} - 1}{\ln x} dx = \int_{0}^{1} \frac{\partial}{\partial t} \frac{x^{10 t} - 1}{\ln x} dx$

i.e., by differentiating under the integral sign, you manage to kill off that annoying factor $\ln x$ in the denominator:

$\displaystyle G'(t) = \int_{0}^{1} 10 x^{10 t} dx = \frac{10 x^{10 t + 1}}{10 t + 1} |_{0}^{1} = \frac{10}{10 t + 1}.$

We therefore have

$G(t) = \ln(10 t + 1) + C,$

and notice that from the original definition of $G(t)$, we have $G(0) = 0$. Thus $C = 0$, and the problem integral evaluates to $G(1) = \ln 11$. Bam!

It takes a little experience to judge where or how to stick in the extra parameter $t$ to make this method work, but the Wikipedia article has some good practice problems, including the integral of POW-10. For this problem they recommend considering

$\displaystyle G(t) = \int_{0}^{\pi/2} \frac{\tan^{-1} (t\tan x)}{\tan x} dx$

and I’ll let you the reader take it from there.

The point is not that this method is much more elegant than others, but that a little practice with it should be enough to convince one that it’s incredibly powerful, and often succeeds where other methods fail. (Not always, of course, and Feynman had to concede defeat when in response to a challenge, his pal Paul Olum concocted some hellacious integral which he said could only be done via contour integration.)

Doron Zeilberger is also fond of this method (as witnessed by this paper). Something about it makes me wonder whether it’s secretly connected with the idea of “creative telescoping” and the powerful methods (e.g., see here) developed by Gosper, Wilf, Zeilberger, and others to establish identities of hypergeometric type. But I haven’t had time to consider this carefully.

Also solved by Arin Chaudhuri, Philipp Lampe (University of Bonn), Paul Shearer (University of Michigan), and Simon Tyler. Thanks to all who wrote in!

• 260,977 hits