You are currently browsing the monthly archive for June 2008.

I’ll continue then with this brief subseries on category theory. Today I want to talk more about universal properties, and about the notion of natural transformation. Maybe not today, but soon at any rate, I want to tie all this in with the central concept of representability, which leads directly and naturally to the powerful and fundamental idea of adjoint functors. This goes straight to the very heart of category theory.

The term “natural”, often bandied about by mathematicians, is perhaps an overloaded term (see the comments here for a recent disagreement about certain senses of the word). I don’t know the exact history of the word as used by mathematicians, but by the 1930s and 40s the description of something as “natural” was part of the working parlance of many mathematicians (in particular, algebraic topologists), and it is to the great credit of Eilenberg and Mac Lane that they sought to give the word a precise mathematical sense. A motivating problem in their case was to prove a universal coefficient theorem for Čech cohomology, for which they needed certain comparison maps (transformations) which cohered by making certain diagrams commute (which was the naturality condition). In trying to precisely define this concept of naturality, they were led to the concept of a “functor” and then, to define the concept of functor, they were led back to the notion of category! And the rest, as they say, is history.

More on naturality in a moment. Let me first give a few more examples of universal constructions. Last time we discussed the general concept of a cartesian product — obviously in honor of Descartes, for his tremendous idea of the method of coordinates and analytic geometry.

But of course products are only part of the story: he was also interested in the representation of equations by geometric figures: for instance, representing an equation y = f(x) as a subset of the plane. In the language of category theory, the variable y denotes the second coordinate or second projection map \pi_2: \mathbb{R} \times \mathbb{R} \to \mathbb{R}, and f(x) denotes the composite of the first projection map followed by some given map f:

\displaystyle \mathbb{R} \times \mathbb{R} \stackrel{\pi_1}{\to} \mathbb{R} \stackrel{f}{\to} \mathbb{R}.

The locus of the equation y = f(x) is the subset of \mathbb{R} \times \mathbb{R} where the two morphisms \pi_2 and f \circ \pi_1 are equal, and we want to describe the locus L in a categorical way (i.e., in a way which will port over to other categories).

Definition: Given a pair of morphisms

\displaystyle f, g: X \stackrel{\to}{\to} Y

their equalizer consists of an object L and a map e: L \to X, such that f \circ e = g \circ e, and satisfying the following universal property: for any map h: A \to X such that f \circ h = g \circ h, there exists a unique map j: A \to L such that h = e \circ j (any map h: A \to X that equalizes f and g factors in a unique way through the equalizer e: L \to X). \Box

Another way of saying it is that there is a bijection between (f, g)-equalizing maps h: A \to X and maps j: A \to L,

\displaystyle \frac{h: A \to X \mbox{  such that  } fh = gh}{j: A \to L \qquad },

effected by composing such maps j with the universal (f, g)-equalizing map e: L \to X.
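To see the universal property computed in a concrete case, here is a minimal sketch in the category of finite sets and functions, where one choice of equalizer is the subset of X on which f and g agree. The function names `equalizer` and `factor`, and the sample maps, are illustrative assumptions rather than anything fixed by the definition above.

```python
def equalizer(X, f, g):
    """Return (L, e): the subset of X where f and g agree, with its inclusion map."""
    L = [x for x in X if f(x) == g(x)]
    e = lambda x: x  # inclusion L -> X
    return L, e

def factor(A, h, L):
    """Given h: A -> X with f.h == g.h, return the map j: A -> L with e.j == h."""
    assert all(h(a) in L for a in A), "h does not equalize f and g"
    return lambda a: h(a)  # same values as h, now regarded as landing in L

# Sample data (hypothetical): X = {-3, ..., 3}, f(x) = x*x, g(x) = x + 2.
X = list(range(-3, 4))
f = lambda x: x * x
g = lambda x: x + 2
L, e = equalizer(X, f, g)   # the locus of the equation x^2 = x + 2 inside X

A = ["p", "q", "r"]
h = lambda a: {"p": 2, "q": -1, "r": 2}[a]   # f(h(a)) == g(h(a)) for every a
j = factor(A, h, L)                          # e(j(a)) == h(a) for every a
```

In this setting uniqueness of j is automatic, because the inclusion e is injective: any j with e \circ j = h must take the same values as h.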

Exercise: Apply a universality argument to show that any two equalizers of a given pair of maps (f, g) are isomorphic.

It is not immediately apparent from the definition that an equalizer e: L \to X describes a “subthing” (e.g., a subset) of X, but then we haven’t even discussed subobjects. The categorical idea of subobject probably takes some getting used to anyway, so I’ll be brief. First, there is the idea of a monomorphism (or a “mono” for short), which generalizes the idea of an injective or one-to-one function. A morphism f: S \to T is monic if for all g, h: A \to S, f \circ g = f \circ h implies g = h. Monos with codomain T are preordered by a relation \leq, where

(e: R \to T) \leq (f: S \to T)

if there exists g: R \to S such that e = f \circ g. (Such a g is unique since f is monic, so it doesn’t need to be specified particularly; also this g is easily seen to be monic [exercise].) Then we say that two monics e, f mapping into T name the same subobject of T if e \leq f and f \leq e; in that case the mediator g is an isomorphism. Writing e \sim f to denote this condition, it is standard that \sim is an equivalence relation.

Thus, a subobject of X is an equivalence class of monos into X. So when we say an equalizer e: L \to X of maps f, g: X \to Y defines a subobject of X, all we really mean is that e is monic. Proof: Suppose eh = ej for maps h, j: A \to L. Since fe = ge, we have f(ej) = g(ej) for instance. By definition of equalizer, this means there exists a unique map k: A \to L for which eh = ej = ek. Uniqueness then implies h, j are equal to this self-same k, so h = j and we are done.

Let me turn to another example of a universal construction, which has been used in one form or another for centuries: that of “function space”. For example, in the calculus of variations, one may be interested in the “space” of all (continuous) paths \alpha: I = \left[0, 1\right] \to X in a physical space X, and in paths which minimize “action” (principle of least action).

If X is a topological space, then one is faced with a variety of choices for topologizing the path space (denoted X^I). How to choose? As in our discussion last time of topologizing products, our view here is that the “right” topology will be the unique one which ensures that an appropriate universal property is satisfied.

To get started on this: the points of the path space X^I are of course paths \alpha: I \to X, and paths in the path space, I \to X^I, sending each s \in I to a path \alpha_s: I \to X, should correspond to homotopies between paths, that is continuous maps h: I \times I \to X; the idea is that h(s, t) := \alpha_s(t). Now, just knowing what paths in a space Y = X^I look like (homotopies between paths) may not be enough to pin down the topology on Y, but: suppose we now generalize. Suppose we decree that for any space Z, the continuous maps Z \to X^I should correspond exactly to continuous maps h: Z \times I \to X, also called homotopies. Then that is enough to pin down the topology on X^I. (We could put it this way: we use general spaces Z to probe the topology of X^I.)

This principle applies not just to topology, but is extremely general: it applies to any category! I’ll state it very informally for now, and more precisely later:

Yoneda principle: to determine any object Y up to isomorphism, it suffices to understand what general maps Z \to Y mapping into it look like.

For instance, a product X_1 \times X_2 is determined up to isomorphism by knowing what maps Z \to X_1 \times X_2 into it look like [they look like pairs of maps (Z \to X_1, Z \to X_2)]. In the first lecture in the Stone duality series, we stated the Yoneda principle just for posets; now we are generalizing it to arbitrary categories.

In the case at hand, we would like to express the bijection between continuous maps

\displaystyle \frac{f: Z \to X^I}{h: Z \times I \to X}

as a working universal property for the function space X^I. There is a standard “Yoneda trick” for doing this: probe the thing we seek a universal characterization of with the identity map, here 1_{X^I}: X^I \to X^I. Passing to the other side of the bijection, the identity map corresponds to a map

ev: X^I \times I \to X

and this is the “universal map” we need. (I called it ev because in this case it is the evaluation map, which maps a pair (path \alpha: I \to X, point t \in I) to \alpha(t), i.e., evaluates \alpha at t.)

Here then is the universal characterization of the path space: a space X^I equipped with a continuous map ev: X^I \times I \to X, satisfying the following universal property: given any continuous map h: Z \times I \to X, there exists a unique continuous map f: Z \to X^I such that h is retrieved as the composite

\displaystyle Z \times I \stackrel{f \times 1_I}{\to} X^I \times I \stackrel{ev}{\to} X

(for the first arrow in the composite, cf. the exercise stated at the end of the last lecture).

Exercise: Formulate a universality argument that this universal property determines X^I up to isomorphism.

Remark: Incidentally, for any space X, such a path space exists; its topology turns out to be the topology of “uniform convergence”. We can pose a similar universal definition of any function space X^Y (replacing I by Y, mutatis mutandis); a somewhat non-trivial result is that such a function space exists for all X if and only if Y is locally compact; the topology on X^Y is then the so-called “compact-open” topology.

But why stop there? A general notion of “exponential” object is available for any category C with cartesian products: for objects c, d of C, an exponential d^c is an object equipped with a map ev: d^c \times c \to d, such that for any h: b \times c \to d, there exists a unique f: b \to d^c such that h is retrieved as the composite

\displaystyle b \times c \stackrel{f \times 1_c}{\to} d^c \times c \stackrel{ev}{\to} d.
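In the category of sets, the exponential d^c can be taken to be the set of functions c \to d, and the passage from h to f is what programmers call currying. The following is a rough sketch under that assumption; the names `ev`, `curry` and `times_identity` and the sample data are hypothetical choices made for illustration.

```python
from itertools import product

def ev(pair):
    """Evaluation map ev: d^c x c -> d, sending (function, argument) to the value."""
    func, x = pair
    return func(x)

def curry(h):
    """Send h: b x c -> d to the map f: b -> d^c with f(z)(x) = h((z, x))."""
    return lambda z: (lambda x: h((z, x)))

def times_identity(f):
    """The map f x 1_c: b x c -> d^c x c."""
    return lambda pair: (f(pair[0]), pair[1])

# Sample data (hypothetical): b = {0, 1, 2}, c = {"u", "v"}, d = strings.
b, c = [0, 1, 2], ["u", "v"]
h = lambda pair: pair[1] * pair[0]      # h(z, x) = the string x repeated z times
f = curry(h)

# h is retrieved as the composite ev . (f x 1_c).
recovered = lambda pair: ev(times_identity(f)(pair))
```

Checking `recovered` against `h` on every pair in b \times c confirms the factorization for this one example; it is of course not a proof of the universal property.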

Example: If the category is a meet-semilattice, then (assuming x^y exists) there is a bijection or equivalence which takes the form

\displaystyle \frac{a \leq x^y}{a \wedge y \leq x}

But wait, we’ve seen this before: x^y is what we earlier called the implication y \Rightarrow x. So implication is really a function space object! \Box
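As a quick sanity check, one can test this equivalence by brute force in a small Boolean algebra. The sketch below assumes the power set of a three-element set, ordered by inclusion, with the implication y \Rightarrow x computed as (complement of y) union x; the set U and the helper names are hypothetical.

```python
from itertools import combinations

U = frozenset({0, 1, 2})

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def implies(y, x):
    """Candidate exponential x^y in the power set of U: (U minus y) union x."""
    return (U - y) | x

P = subsets(U)
# a <= x^y  iff  a /\ y <= x, for all a, x, y (<= is inclusion, & is the meet).
adjunction_holds = all(
    (a <= implies(y, x)) == ((a & y) <= x)
    for a in P for x in P for y in P
)
```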

Okay, let me turn now to the notion of natural transformation. I described the original discovery (or invention) of categories as a kind of reverse engineering (functors were invented to talk about natural transformations, and categories were invented to talk about functors). Moving now in the forward direction, the rough idea can be stated as a progression:

  • The notion of functor is appropriately defined as a morphism between categories,
  • The notion of natural transformation is appropriately defined as a morphism between functors.

That seems pretty bare-bones: how do we decide what the appropriate notion of morphism between functors should be? One answer is by pursuing an analogy:

  • As a space Y^X of continuous functions X \to Y is to the category of topological spaces, so a category D^C of functors C \to D should be to the category of categories.

That is, we already “know” (or in a moment we’ll explain) that the objects of this alleged exponential category D^C are functors F: C \to D. Since D^C is defined by a universal property, it is uniquely determined up to isomorphism. This in turn will uniquely determine what the “right” notion of morphism between functors F, G: C \to D should be: morphisms F \to G in the exponential D^C! Then, to discover the nature of these morphisms, we employ an appropriate “probe”.

To carry this out, I’ll need two special categories. First, the category \mathbf{1} denotes a (chosen) category with exactly one object and exactly one morphism (necessarily the identity morphism). It satisfies the universal property that for any category C, there exists a unique functor C \to \mathbf{1}. It is called a terminal category (for that reason). It can also be considered as an empty product of categories.

Proposition: For any category C, there is an isomorphism \mathbf{1} \times C \cong C.

Proof: Left to the reader. It can be proven either directly, or by applying universal properties. \Box

The category \mathbf{1} can also be considered an “object probe”, in the sense that a functor \mathbf{1} \to C is essentially the same thing as an object of C (just look where the object of \mathbf{1} goes to in C).

For example, to probe the objects of the exponential category D^C, we investigate functors \mathbf{1} \to D^C. By the universal property of exponentials D^C, these are in bijection with functors \mathbf{1} \times C \to D. By the proposition above, these are in bijection with functors C \to D. So objects of D^C are necessarily tantamount to functors C \to D (and so we might as well define them as such).

Now we want to probe the morphisms of D^C. For this, we use the special category given by the poset \mathbf{2} = \{0 \leq 1\}. For if X is any category and f: x \to y is a morphism of X, we can define a corresponding functor F: \mathbf{2} \to X such that F(0) = x, F(1) = y, and F sends the morphism 0 \leq 1 to f. Thus, such functors F are in bijection with morphisms of X. Speaking loosely, we could call the category \mathbf{2} the “generic morphism”.

Thus, to probe the morphisms in the category D^C, we look at functors \mathbf{2} \to D^C. In particular, if F, G are functors C \to D, let us consider functors \phi: \mathbf{2} \to D^C such that \phi(0) = F and \phi(1) = G. By the universal property of D^C, these are in bijection with functors \eta: \mathbf{2} \times C \to D such that the composite

\displaystyle C \cong \mathbf{1} \times C  \stackrel{0 \times 1_C}{\to} \mathbf{2} \times C \stackrel{\eta}{\to} D

equals F, and the composite

\displaystyle C \cong \mathbf{1} \times C \stackrel{1 \times 1_C}{\to} \mathbf{2} \times C \stackrel{\eta}{\to} D

equals G. Put more simply, this says \eta(0, c) = F(c) and \eta(1, c) = G(c) for objects c of C, and \eta(1_0, f) = F(f) and \eta(1_1, f) = G(f) for morphisms f of C.

The remaining morphisms of \mathbf{2} \times C have the form (0 \leq 1, f: c \to d). Introduce the following abbreviations:

  1. \phi_c := \eta(0 \leq 1, 1_c) for objects c of C;
  2. \phi_f := \eta(0 \leq 1, f) for morphisms f of C.

Since \eta is a functor, it preserves morphism composition. We find in particular that since

(1_1, f) \circ (0 \leq 1, 1_c) = (1_1 \circ (0 \leq 1), f \circ 1_c) = (0 \leq 1, f)

(0 \leq 1, 1_d) \circ (1_0, f) = ((0 \leq 1) \circ 1_0, 1_d \circ f) = (0 \leq 1, f)

we have

\eta(1_1, f) \circ \eta(0 \leq 1, 1_c) = \eta(0 \leq 1, f)

\eta(0 \leq 1, 1_d) \circ \eta(1_0, f) = \eta(0 \leq 1, f)

or, using the abbreviations,

G(f) \circ \phi_c = \phi_f = \phi_d \circ F(f).

In particular, the data \phi_f is redundant: it may be defined as either side of the equation

G(f) \circ \phi_c = \phi_d \circ F(f).

Exercise: Just on the basis of this last equation (for arbitrary morphisms f and objects c of C), prove that functoriality of \eta follows.

This leads us at last to the definition of natural transformation:

Definition: Let C, D be categories and let F, G be functors from C to D. A natural transformation \phi: F \to G is an assignment of morphisms \phi_c: F(c) \to G(c) in D to objects c of C, such that for every morphism f: c \to d, the following equation holds: G(f) \circ \phi_c = \phi_d \circ F(f). \Box

Usually this equation is expressed in the form of a commutative diagram:

          F(f)
     F(c) ---> F(d)
      |         |
phi_c |         | phi_d
      V   G(f)  V
     G(c) ---> G(d)

which asserts the equality of the composites formed by following the paths from beginning (here F(c)) to end (here G(d)). (Despite the inconvenience in typesetting them, commutative diagrams as 2-dimensional arrays are usually easier to read and comprehend than line-by-line equations.) The commutative diagram says that the components \phi_c of the transformation are coherent or compatible with all morphisms f: c \to d of the domain category.
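For a programming-flavored illustration, take F = G to be the "list" construction, which sends a function f to "apply f to each entry". The sketch below is an assumption-laden example, not part of the development above: it checks the naturality square for list reversal on a few sample inputs, and shows that sorting fails the same test.

```python
def fmap(f):
    """Action of the list functor on a morphism f: c -> d, giving F(f): F(c) -> F(d)."""
    return lambda xs: [f(x) for x in xs]

def reverse(xs):
    """Component phi_c: F(c) -> F(c); the same recipe is used for every c."""
    return xs[::-1]

def is_natural_on(phi, f, samples):
    """Check G(f) . phi_c == phi_d . F(f) on sample inputs (here F = G = list functor)."""
    return all(fmap(f)(phi(xs)) == phi(fmap(f)(xs)) for xs in samples)

samples = [[], [3], [1, 2, 3], [2, 0, 2, 5]]   # hypothetical test inputs
negate = lambda n: -n
show = str

reverse_ok = is_natural_on(reverse, negate, samples) and is_natural_on(reverse, show, samples)
sorting_ok = is_natural_on(sorted, negate, samples)   # sorting inspects the entries
```

Reversal only shuffles positions and never looks at the entries, so it commutes with every f. Sorting depends on the entries themselves, and negation reverses their order, so the square fails to commute.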

Remarks: Now that I’ve written this post, I’m a little worried that any first-timers to category theory reading this will find this approach to natural transformations a little hardcore or intimidating. In that case I should say that my intent was to make this notion seem as inevitable as possible: by taking seriously the analogy

function space: category of spaces :: functor category: category of categories

we are inexorably led to the “right” (the “natural”) notion of natural transformation as morphism between functors. But if this approach is for some a pedagogical flop, then I urge those readers just to forget it, or come back to it later. Just remember the definition of natural transformation we finally arrived at, and you should be fine. Grasping the inner meaning of fundamental concepts like this takes time anyway, and isn’t merely a matter of pure deduction.

I should also say that the approach involved a kind of leap of faith that these functor categories (the exponentials D^C) “exist”. To be sure, the analysis above shows clearly what they must look like if they do exist (objects are functors C \to D; morphisms are natural transformations as we’ve defined them), but actually there’s some more work to do: one must show they satisfy the universal property with respect to not just the two probing categories \mathbf{1} and \mathbf{2} that we used, but any category E.

A somewhat serious concern here is that our talk of exponential categories played pretty fast and loose at the level of mathematical foundations. There’s that worrying phrase “category of categories”, for starters. That particular phraseology can be avoided, but nevertheless, it must be said that in the case where C is a large category (i.e., involving a proper class of morphisms), the collection of all functors from C to D is not a well-formed construction within the confines of Gödel-Bernays-von Neumann set theory (it is not provably a class in general; in some approaches it could be called a “super-class”).

My own attitude toward these “problems” tends to be somewhat blasé, almost dismissive: these are mere technicalities, sez I. The main concepts are right and robust and very fruitful, and there are various solutions to the technical “problem of size” which have been developed over the decades (although how satisfactory they are is still a matter of debate) to address the apparent difficulties. Anyway, I try not to worry about it much. But, for those fine upstanding citizens who do worry about these things, I’ll just state one set-theoretically respectable theorem to convey that at least conceptually, all is right with the world.

Definition: A category with finite products is cartesian closed if for any two objects c, d, there exists an exponential object d^c.

Theorem: The category of small categories is cartesian closed. \Box


This week’s problem is offered more in the spirit of a light and pleasant diversion — I don’t think you’ll need any deep insight to solve it. (A little persistence may come in handy though!)

Define a triomino to be a figure congruent to the union of three of the four unit squares in a 2 \times 2 square. For which pairs of positive integers (m, n) is an m \times n rectangle tileable by triominoes?

Please submit solutions to topological[dot]musings[At]gmail[dot]com by Wednesday, July 3, 11:59 pm (UTC); do not submit solutions in Comments. Everyone with a correct solution will be inducted into our Hall of Fame! We look forward to your response. Enjoy!

We got some very good responses to last week’s problem from several of our “regular” problem-solvers as well as several others who are “new”. There were solutions that were more “algebraic” than others, some that had a more “trigonometric” flavor to them, and some that had a combination of both. All the solutions we received this time were correct and they all deserve to be published, but for the sake of brevity I will post just one.

Solution to POW-5: (due to Animesh Datta, Univ of New Mexico)

Note that the given integral may be written as

\displaystyle \int  \frac{x^2 - 1}{x(x^2 + 1) \sqrt{x^2 + 1/x^2}} \, dx

\displaystyle =  \int \frac{1 - 1/x^2}{(x + 1/x) \sqrt{(x + 1/x)^2 - 2}} \, dx.

Now, we use the substitution t = x + 1/x, which transforms the integral into

\displaystyle \int \frac1{t \sqrt{t^2 - 2}} \, dt.

Finally, we use one last (trigonometric) substitution t = \sqrt{2} \sec \theta, which transforms the integral into \displaystyle \int \frac1{\sqrt{2}} \, d\theta, which evaluates to \theta /\sqrt{2} + C, which equals \displaystyle \frac1{\sqrt2} \arctan \sqrt{\frac12 (x^2 + \frac1{x^2})} + C. And this is our final answer!
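Readers who want to double-check the answer can differentiate it numerically and compare with the integrand. The following is a minimal sketch, assuming x > 0 and a central-difference approximation; the sample points and the step size are arbitrary choices, and the function names are hypothetical.

```python
import math

def integrand(x):
    """The integrand in the form displayed above."""
    return (x**2 - 1) / (x * (x**2 + 1) * math.sqrt(x**2 + 1 / x**2))

def antiderivative(x):
    """The claimed answer, with the constant C set to zero."""
    return math.atan(math.sqrt(0.5 * (x**2 + 1 / x**2))) / math.sqrt(2)

def central_difference(F, x, step=1e-5):
    """Approximate F'(x) by a symmetric difference quotient."""
    return (F(x + step) - F(x - step)) / (2 * step)

points = [0.3, 0.7, 1.5, 2.0, 4.0]   # sample points with x > 0, chosen arbitrarily
max_error = max(abs(central_difference(antiderivative, x) - integrand(x)) for x in points)
```

A small value of `max_error` is evidence, not proof, that the antiderivative is right at these points.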

Watch out for the next POW that will be posted by Todd!

Source: I had mentioned earlier that Carl Lira had brought this integral to our attention, and he in turn had found it in the MIT Integration Bee archives. This one was from the year 1994.

Trivia: Four out of the six people who sent correct solutions are either Indians or of Indian origin! Coincidence? 🙂

After a long hiatus (sorry about that!), I would like to resume the series on Stone duality. You may recall I started this series by saying that my own point of view on mathematics is strongly informed by category theory, followed by a little rant about the emotional reactions that category theory seems to excite in many people, and that I wouldn’t be “blathering” about categories unless a strong organic need was felt for it. Well, it’s come to that point: to continue talking sensibly about Stone duality, I really feel some basic concepts of category theory are now in order. So: before we pick up the main thread again, I’ll be talking about categories up to the point of the central concept of adjoint pairs, generalizing what we’ve discussed before in the context of truth-valued matrix algebra.

I’ll start by firmly denouncing a common belief: that category theory is some arcane, super-abstract subject. I just don’t believe that’s a healthy way of looking at it. To me, categories are no more and no less abstract than groups, rings, fields, etc. — they are just algebraic structures of a certain type (and a not too complicated type at that). That said, they are particularly ubiquitous and useful structures, which can be studied either as small structures (for example, posets provide examples of categories, and so do groups), or to organize the study of general types of structure in the large (for example, the class of posets and poset morphisms forms a category). Just think of them that way: they are certain sorts of algebraic structures which crop up just about everywhere, and it is very useful to learn something about them.

Usually, the first examples one is shown are large categories, typically of the following sort. One considers the class of mathematical structures of a given type: it could be the class of groups, or of posets, or of Boolean algebras, etc. The elements of a general such class are given the neutral name “objects”. Then, we are also interested in how the objects A, B, C, \ldots are related to each other, typically through transformations f: A \to B which “preserve” the given type of structure. In the case of sets, transformations are just functions; in the case of groups, the transformations are group homomorphisms (which preserve group multiplication, inversion, and identities); in the case of vector spaces, they are linear transformations (preserving vector addition and scalar multiplication); in the case of topological spaces, they are continuous maps (preserving a suitable notion of convergence). In general, the transformations are given the neutral name “homomorphisms”, or more often just “morphisms” or “maps”.

In all of these cases, two morphisms f: A \to B, g: B \to C compose to give a new morphism g \circ f: A \to C (for example the composite of two group homomorphisms is a group homomorphism), and do so in an associative way (h \circ (g \circ f) = (h \circ g) \circ f), and also there is an identity morphism 1_A: A \to A for each object A which behaves as identities should (f \circ 1_A = f = 1_B \circ f for any morphism f: A \to B). A collection of objects, morphisms between them, together with an associative law of composition and identities, is called a category.

A key insight of category theory is that in general, important structural properties of objects A, B, C, \ldots can be described purely in terms of general patterns or diagrams of morphisms and their composites. By means of such general patterns, the same concept (like the concept of a product of two objects, or of a quotient object, or of a dual) takes on the same form in many different kinds of category, for many different kinds of structure (whether algebraic, or topological, or analytic, or some mixture thereof) — and this in large part gives category theory the power to unify and simplify the study of general mathematical structure. It came as quite a revelation to me personally that (to take one example) the general idea of a “quotient object” (quotient group, quotient space, etc.) is not based merely on vague family resemblances between different kinds of structure, but can be made absolutely precise and across the board, in a simple way. That sort of explanatory power and conceptual unification is what got me hooked!

In a nutshell, then, category theory is the study of commonly arising structures via general patterns or diagrams of morphisms, and the general application of such study to help simplify and organize large portions of mathematics. Let’s get down to brass tacks.

Definition: A category C consists of the following data:

  • A class O of objects;
  • A class M of morphisms;
  • A function \mbox{dom}: M \to O which assigns to each morphism its domain, and a function \mbox{cod}: M \to O which assigns to each morphism its codomain. If f \in M, we write f: A \to B to indicate that \mbox{dom}(f) = A and \mbox{cod}(f) = B.
  • A function \mbox{Id}: O \to M, taking an object A to a morphism 1_A: A \to A, called the identity on A.

Finally, let C_2 denote the class of composable pairs of morphisms, i.e., pairs (f, g) \in M \times M such that \mbox{cod}(f) = \mbox{dom}(g). The final datum:

  • A function \mbox{comp}: C_2 \to M, taking a composable pair (f: A \to B, g: B \to C) to a morphism g \circ f: A \to C, called the composite of f and g.

These data satisfy a number of axioms, some of which have already been given implicitly (e.g., \mbox{dom}(g \circ f) = \mbox{dom}(f) and \mbox{cod}(g \circ f) = \mbox{cod}(g)). The ones which haven’t are

  1. Associativity: h \circ (g \circ f) = (h \circ g) \circ f for each composable triple (f: A \to B, g: B \to C, h: C \to D).
  2. Identity axiom: Given f: A \to B, f \circ 1_A = f = 1_B \circ f.

Sometimes we write C_0 for the class of objects, C_1 for the class of morphisms, and for n > 1, C_n for the class of composable n-tuples of morphisms. \Box
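For a very small category, all of this data can be written out as finite tables and the axioms checked mechanically. Here is a rough sketch of that idea; the two-object example, the table layout, and the name `check_category` are all assumptions made for illustration.

```python
from itertools import product

# A tiny category given as explicit data (hypothetical example): two objects and
# three morphisms, namely the identities plus a single arrow u: A -> B.
objects = ["A", "B"]
dom = {"1_A": "A", "1_B": "B", "u": "A"}
cod = {"1_A": "A", "1_B": "B", "u": "B"}
identity = {"A": "1_A", "B": "1_B"}
# comp[(f, g)] is the composite g . f, defined exactly when cod(f) == dom(g).
comp = {
    ("1_A", "1_A"): "1_A",
    ("1_A", "u"): "u",
    ("u", "1_B"): "u",
    ("1_B", "1_B"): "1_B",
}

def check_category(objects, dom, cod, identity, comp):
    morphisms = list(dom)
    composable = {(f, g) for f, g in product(morphisms, repeat=2) if cod[f] == dom[g]}
    if set(comp) != composable:                    # comp is defined exactly on C_2
        return False
    for (f, g), gf in comp.items():                # dom and cod of composites
        if dom[gf] != dom[f] or cod[gf] != cod[g]:
            return False
    for a in objects:                              # identities are endomorphisms
        if dom[identity[a]] != a or cod[identity[a]] != a:
            return False
    for f in morphisms:                            # identity axiom
        if comp[(identity[dom[f]], f)] != f or comp[(f, identity[cod[f]])] != f:
            return False
    for f, g, h in product(morphisms, repeat=3):   # associativity
        if cod[f] == dom[g] and cod[g] == dom[h]:
            if comp[(f, comp[(g, h)])] != comp[(comp[(f, g)], h)]:
                return False
    return True

is_category = check_category(objects, dom, cod, identity, comp)
```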

Nothing in this definition says that objects of a category are “sets with extra structure” (or that morphisms preserve such structure); we are just thinking of objects as general “things” and depict them as nodes, and morphisms as arrows or directed edges between nodes, with a given law for composing them. The idea then is all about formal patterns of arrows and their compositions (cf. “commutative diagrams”). Vishal’s post on the notion of category had some picture displays of the categorical axioms, like associativity, which underline this point of view.

In the same vein, categories are used to talk about not just large classes of structures; in a number of important cases, the structures themselves can be viewed as categories. For example:

  1. A preorder can be defined as a category for which there is at most one morphism f: A \to B for any two objects A, B. Given there is at most one morphism from one object to another, there is no particular need to give it a name like f; normally we just write a \leq b to say there is a morphism from a to b. Morphism composition then boils down to the transitivity law, and the data of identity morphisms expresses the reflexivity law. In particular, posets (preorders which satisfy the antisymmetry law) are examples of categories.
  2. A monoid is usually defined as a set M equipped with an associative binary operation (a, b) \mapsto a \cdot b and with a (two-sided) identity element e for that operation. Alternatively, a monoid can be construed as a category with exactly one object. Here’s how it works: given a monoid (M, \cdot, e), define a category where the class O consists of a single object (which I’ll give a neutral name like \bullet; it doesn’t have to be any “thing” in particular; it’s just a “something”, it doesn’t matter what), and where the class of morphisms is defined to be the set M. Since there is only one object, we are forced to define \mbox{dom}(a) = \bullet and \mbox{cod}(a) = \bullet for all a \in M. In that case all pairs of morphisms are composable, and composition is defined to be the operation in M: a \circ b := a \cdot b. The identity morphism on \bullet is defined to be e. We can turn the process around: given a category with exactly one object, the class of morphisms M can be construed as a monoid in the usual sense.
  3. A groupoid is a category in which every morphism is an isomorphism (by definition, an isomorphism is an invertible morphism, that is, a morphism f: A \to B for which there exists a morphism g: B \to A such that g \circ f = 1_A and f \circ g = 1_B). For example, the category of finite sets and bijections between them is a groupoid. The category of topological spaces and homeomorphisms between them is a groupoid. A group is a monoid in which every element is invertible; hence a group is essentially the same thing as a groupoid with exactly one object.
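The second and third examples can be played with concretely. The sketch below assumes a sample monoid, the integers mod 4 under multiplication, and treats it as a category with a single object; the invertible morphisms (the isomorphisms) then form a group. All names here are hypothetical.

```python
from itertools import product

# Hypothetical sample monoid: integers mod 4 under multiplication, identity 1.
M = [0, 1, 2, 3]
e = 1
mul = lambda a, b: (a * b) % 4

# The one-object category: every morphism has domain and codomain "*".
dom = {a: "*" for a in M}
cod = {a: "*" for a in M}
compose = lambda a, b: mul(a, b)     # a . b := a * b in the monoid

identity_axiom = all(compose(a, e) == a == compose(e, a) for a in M)
associativity = all(
    compose(a, compose(b, c)) == compose(compose(a, b), c)
    for a, b, c in product(M, repeat=3)
)
# Isomorphisms are the invertible morphisms; here they are the units of the monoid.
units = [a for a in M if any(compose(a, b) == e == compose(b, a) for b in M)]
```

Since not every element is a unit, this particular one-object category is not a groupoid; restricting to `units` gives one.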

Remark: The notion of isomorphism is important in category theory: we think of an isomorphism f: A \to B as a way in which objects A, B are the “same”. For example, if two spaces are homeomorphic, then they are indistinguishable as far as topology is concerned (any topological property satisfied by one is shared by the other). In general there may be many ways or isomorphisms to exhibit such “sameness”, but typically in category theory, if two objects satisfy the same structural property (called a universal property; see below), then there is just one isomorphism between them which respects that property. Those are sometimes called “canonical” or “god-given” isomorphisms; they are 100% natural, no artificial ingredients! \Box

Earlier I said that category theory studies mathematical structure in terms of general patterns or diagrams of morphisms. Let me give a simple example: the general notion of “cartesian product”. Suppose X_1, X_2 are two objects in a category. A cartesian product of X_1 and X_2 is an object X together with two morphisms \pi_1:  X \to X_1, \pi_2: X \to X_2 (called the projection maps), satisfying the following universal property: given any object Y equipped with a map f_i: Y \to X_i for i = 1, 2, there exists a unique map f: Y \to X such that f_i = \pi_i \circ f for i = 1, 2. (Readers not familiar with this categorical notion should verify the universal property for the cartesian product of two sets, in the category of sets and functions.)
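For the category of sets, the verification suggested above can be sketched in a few lines. This is only an illustration on sample data: the sets, the maps f_1, f_2 and the helper name `pairing` are assumptions.

```python
from itertools import product as cartesian

X1, X2 = ["a", "b"], [0, 1, 2]
X = list(cartesian(X1, X2))            # the set of ordered pairs
pi1 = lambda p: p[0]                   # projection X -> X1
pi2 = lambda p: p[1]                   # projection X -> X2

def pairing(f1, f2):
    """The map f: Y -> X with pi1 . f = f1 and pi2 . f = f2."""
    return lambda y: (f1(y), f2(y))

# Sample data (hypothetical): Y = {0, ..., 5} with maps into X1 and X2.
Y = list(range(6))
f1 = lambda y: "a" if y % 2 == 0 else "b"
f2 = lambda y: y % 3
f = pairing(f1, f2)
```

Uniqueness holds because an ordered pair is determined by its two coordinates: any f with the required composites must send y to (f_1(y), f_2(y)).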

I said “a” cartesian product, but any two cartesian products are the same in the sense of being isomorphic. For suppose both (X, \pi_1, \pi_2) and (X', \pi_1', \pi_2') are cartesian products of X_1, X_2. By the universal property of the first product, there exists a unique morphism f: X' \to X such that \pi_i' = \pi_i \circ f for i = 1, 2. By the universal property of the second product, there exists a unique morphism g: X \to X' such that \pi_i = \pi_i' \circ g. These maps f and g are inverse to one another. Why? By the universal property, there is a unique map \phi: X \to X (namely, \phi = 1_X) such that \pi_i = \pi_i \circ \phi for i = 1, 2. But \phi = f \circ g also satisfies these equations: \pi_i = \pi_i \circ (f \circ g) (using associativity). So 1_X = f \circ g by the uniqueness clause of the universal property; similarly, 1_{X'} = g \circ f. Hence f: X' \to X is an isomorphism.

This sort of argument using a universal property is called a universality argument. It is closely related to what we dubbed the “Yoneda principle” when we studied posets.

So: between any two products X, X' of X_1 and X_2, there is a unique isomorphism f: X' \to X respecting the product structure; we say that any two products are canonically isomorphic. Very often one also has chosen products (a specific choice of product for each ordered pair (X_1, X_2)), as in set theory when we construe the product of two sets as a set of ordered pairs \{(x_1, x_2): x_1 \in X_1, x_2 \in X_2\}. We use X_1 \times X_2 to denote (the object part of) a chosen cartesian product of (X_1, X_2).

Exercise: Use universality to exhibit a canonical isomorphism \sigma: X_1 \times X_2 \to X_2 \times X_1. This is called a symmetry isomorphism for the cartesian product.

Many category theorists (including myself) are fond of the following notation for expressing the universal property of products:

\displaystyle \frac{f_1: Y \to X_1 \qquad f_2: Y \to X_2}{f = \langle f_1, f_2 \rangle: Y \to X_1 \times X_2}

where the dividing line indicates a bijection between pairs of maps (f_1, f_2) and single maps f into the product, effected by composing f with the pair of projection maps. We have actually seen this before: when the category is a poset, the cartesian product is called the meet:

\displaystyle \frac{a \leq x \qquad a \leq y}{a \leq x \wedge y}
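As a concrete instance of this rule, take the positive integers ordered by divisibility, where the meet is the greatest common divisor. That choice of poset is an assumption made here purely for illustration, as are the helper names `divides` and `meet_rule_holds`. The sketch below checks the two-way rule on a small range of numbers.

```python
from math import gcd

def divides(a, b):
    """Order relation a <= b in the divisibility poset on positive integers."""
    return b % a == 0

def meet_rule_holds(a, x, y):
    """Check that (a <= x and a <= y) holds exactly when a <= meet(x, y), with meet = gcd."""
    return (divides(a, x) and divides(a, y)) == divides(a, gcd(x, y))

all_ok = all(
    meet_rule_holds(a, x, y)
    for a in range(1, 31)
    for x in range(1, 31)
    for y in range(1, 31)
)
```

Reading the rule top to bottom says that a common lower bound lies below the meet; reading it bottom to top says the meet is itself a lower bound of both x and y.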

In fact, a lot of arguments we developed for dealing with meets in posets extend to more general cartesian products in categories, with little change (except that instead of equalities, there will typically be canonical isomorphisms). For example, we can speak of a cartesian product of any indexed collection of objects \{X_i\}_{i \in I}: an object \prod_{i \in I} X_i equipped with projection maps \pi_i: \prod_{i \in I} X_i \to X_i, satisfying the universal property that for every I-tuple of maps f_i: Y \to X_i, there exists a unique map f: Y \to \prod_{i \in I} X_i such that f_i = \pi_i \circ f. Here we have a bijection between I-tuples of maps and single maps:

\displaystyle \frac{(f_i: Y \to X_i)_{i \in I}}{f = \langle f_i \rangle_{i \in I}: Y \to \prod_{i \in I} X_i}

By universality, such products are unique up to unique isomorphism. In particular, (X_1 \times X_2) \times X_3 is a choice of product of the collection \{X_1, X_2, X_3\}, as seen by contemplating the bijection between triples of maps and single maps

\displaystyle \frac{\frac{f_1: Y \to X_1 \qquad f_2: Y \to X_2}{\langle f_1, f_2 \rangle: Y \to X_1 \times X_2} \qquad \frac{f_3: Y \to X_3}{f_3: Y \to X_3}}{f: Y \to (X_1 \times X_2) \times X_3}

and similarly X_1 \times (X_2 \times X_3) is another choice of product. Therefore, by universality, there is a canonical associativity isomorphism

\alpha: (X_1 \times X_2) \times X_3 \to X_1 \times (X_2 \times X_3).
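In the category of sets, with products modelled as nested Python tuples, the associativity isomorphism can be assembled entirely from projections and pairings, just as the universality argument suggests. The following is a sketch under that assumption; `fst`, `snd`, `pairing`, and `compose` are hypothetical helper names, and the sample sets are arbitrary.

```python
from itertools import product

def fst(p):
    return p[0]

def snd(p):
    return p[1]

def pairing(f, g):
    """The map <f, g> into a product, induced by maps f and g out of a common domain."""
    return lambda y: (f(y), g(y))

def compose(g, f):
    return lambda y: g(f(y))

# alpha = < pi_1 . pi_1 , < pi_2 . pi_1 , pi_2 > >
alpha = pairing(compose(fst, fst), pairing(compose(snd, fst), snd))
# its inverse = < < pi_1 , pi_1 . pi_2 > , pi_2 . pi_2 >
alpha_inv = pairing(pairing(fst, compose(fst, snd)), compose(snd, snd))

X1, X2, X3 = [0, 1], "ab", [True, False]
left = [((a, b), c) for a, b, c in product(X1, X2, X3)]    # (X1 x X2) x X3
right = [(a, (b, c)) for a, b, c in product(X1, X2, X3)]   # X1 x (X2 x X3)
```

Applying `alpha` to each element of `left` should reproduce `right`, and `alpha_inv` should undo it.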

Remark: It might be thought that in all practical cases, the notion of cartesian product (in terms of good old-fashioned sets of tuples) is clear enough; why complicate matters with categories? One answer is that it isn’t always clear from purely set-theoretic considerations what the right product structure is, and in such cases the categorical description gives a clear guide to what we really need. For example, when I was first learning topology, the box topology on the set-theoretic product \prod_{i \in I} X_i seemed to me to be a perfectly natural choice of topology; I didn’t understand the general preference for what is called the “product topology”. (The open sets in the box topology are unions of products \prod_{i \in I} U_i of open sets in the factors X_i. The open sets in the product topology are unions of such products where U_i = X_i for all but finitely many i \in I.)

In retrospect, the answer is obvious: the product topology on \prod_{i \in I} X_i is the coarsest (smallest) topology making all the projection maps \pi_i continuous. This means that a function f: Y \to \prod_{i \in I} X_i is continuous if and only if each f_i = \pi_i \circ f: Y \to X_i is continuous: precisely the universal property we need. Similarly, in seeking to understand products or other constructions of more abstruse mathematical structures (schemes for instance), the categorical description is de rigueur in making sure we get it right. \Box

For just about any mathematical structure we can consider a category of such structures, and this applies to the notion of category itself. That is, we can consider a category of categories! (Sounds almost religious to me: category of categories, holy of holies, light of lights…)

  • Remark: Like “set of sets”, the idea of category of categories taken to a naive extreme leads to paradoxes or other foundational difficulties, but there are techniques for dealing with these issues, which I don’t particularly want to discuss right now. If anyone is uncomfortable around these issues, a stopgap measure is to consider rather the category of small categories (a category has a class of objects and morphisms; a small category is where these classes are sets), within some suitable framework like the set theory of Gödel-Bernays-von Neumann.

If categories are objects, the morphisms between them may be taken to be structure-preserving maps between categories, called “functors”.

Definition: If C and D are categories, a functor F: C \to D consists of a function F_0: C_0 \to D_0 (between objects) and a function F_1: C_1 \to D_1 (between morphisms), such that

  • F_0(\mbox{dom}_C(f)) = \mbox{dom}_D(F_1(f)) and F_0(\mbox{cod}_C(f)) = \mbox{cod}_D(F_1(f)), for each morphism f \in C_1 (i.e., F preserves domains and codomains of morphisms);
  • F_1(1_A) = 1_{F_0(A)} for each object A \in C_0, and F_1(g \circ f) = F_1(g) \circ F_1(f) for each composable pair (f, g) \in C_2 (i.e., F preserves identity morphisms and composition of morphisms).

Normally we are not so fussy in writing F_1(f) or F_0(A); we write F(f) and F(A) for morphisms f and objects A alike. Sometimes we drop the parentheses as well. \Box
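Programmers may recognize the two functor axioms from the familiar "map over a list" operation. The sketch below treats Python types and functions loosely as a category, which is an informal assumption and not something made precise here. The object part of the functor sends a type A to lists of A, and the hypothetical helper `fmap` is the morphism part. The laws are only spot-checked on a few sample lists.

```python
def fmap(f):
    """Morphism part of the list functor: send f: A -> B to a map list[A] -> list[B]."""
    return lambda xs: [f(x) for x in xs]

def compose(g, f):
    return lambda x: g(f(x))

identity = lambda x: x
f = lambda n: n + 1          # int -> int
g = lambda n: str(n) * 2     # int -> str

samples = [[], [0], [3, 1, 4, 1, 5]]

# F(1_A) = 1_{F(A)}
identity_law = all(fmap(identity)(xs) == xs for xs in samples)
# F(g . f) = F(g) . F(f)
composition_law = all(
    fmap(compose(g, f))(xs) == compose(fmap(g), fmap(f))(xs) for xs in samples
)
```

Preservation of domains and codomains is implicit here: `fmap(f)` accepts lists of whatever `f` accepts and returns lists of whatever `f` returns.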

If X, Y are groups or monoids regarded as one-object categories, then a functor between them amounts to a group or monoid homomorphism. If X, Y are posets regarded as categories, then a functor between them amounts to a poset map. So no new surprises in these cases.

Exercise: Define a product C \times D of two categories C, D, and verify that the definition satisfies the universal property of products in the “category of categories”.

Exercise: If a category C has chosen products, show how a functor C \times C \to C may be defined which takes a pair of objects (c, d) to its product c \times d. (You need to define the morphism part F_1 of this functor; this will involve the universal property of products.)

Time for our next problem in the POW series! Earlier, Todd and I deliberated for a bit on whether we should pose a “hard” Ramanujan identity (involving an integral and the Gamma function) as the next POW, but decided against it. Perhaps we will do so some time in the future.

Okay, the following integral was brought to our attention by Carl Lira, and for the time being I won’t reveal the actual source of the problem.

Compute \displaystyle \int \frac{x^2 - 1}{(x^2 + 1) \sqrt{x^4 + 1}} \, dx.

It is “hard” or “easy” depending on how you look at it!

Please send your solutions to topological[dot]musings[At]gmail[dot]com by Wednesday, June 26, 11:59pm (UTC); do not submit solutions in Comments. Everyone with a correct solution gets entered in our Hall of Fame! We look forward to your response.

Don’t forget to download Firefox 3.0 today!


The solutions are in! I thought last week’s problem might have been a little more challenging than problems of previous weeks — the identity is just gorgeous, but not at all obvious (I don’t think!) without some correspondingly gorgeous combinatorial insight. Luckily, some of our readers came up with the goods, and their solutions provide a forum for discussing a beautiful circle of ideas, involving the inter-related combinatorics of trees and endofunctions.

I can’t decide which of the solutions we received I like best. They all bear a certain familial resemblance, but each has its own distinct personality. I’ll give two representative examples, and append some comments at the end. Both proofs are conceptual “bijective” proofs, in which the two sides of the identity represent two different ways of counting essentially the same combinatorial objects. And both rely on a famous theorem of Cayley, on the number of tree structures or spanning trees on n distinct labeled nodes (maybe this would be sufficient hint, if you still want to think about it some more by yourself!). Here, I’ll add a little spoiler space:






1. (Solution by David Eppstein) As is well known (this is Cayley’s theorem), the number of different spanning trees on a set of n labeled nodes is n^{n-2}. Equivalently, the number of ways of choosing a spanning tree together with a specification of a single vertex s is n^{n-1}, and the number of ways of choosing a spanning tree together with a specification of two vertices s and t (possibly equal to each other) is n^n. So that’s the right hand side of the identity.

Now suppose you are given a tree T, and two nodes s and t. If s and t are different, let (s,u) be the first edge on the path in T from s to t; cutting T at that edge produces two disjoint subtrees, one containing one marked node s and the other containing two (possibly equal) marked nodes, namely t and the first node u on the path after s. Conversely, from this information (two trees, one containing a marked node s and the other containing two marks on nodes u and t) we can put together T simply by connecting the two trees by an edge (s,u). If j is the number of nodes in the tree containing s, the number of ways we can choose two disjoint marked subtrees in this way is

\displaystyle \sum_{j=1}^{n-1} {n\choose j} j^{j-1} (n-j)^{n-j},

almost the same as the left hand side of the identity, but missing the final term in the sum.

The final term comes from the case when the marked nodes s and t of tree T coincide. The number of ways this can happen is the same as the number of ways we can pick a single marked node of a tree, that is, n^{n-1}, which is the same as the final term in the left hand sum.

Thus, the left side counts (partitions of n vertices into two disjoint subtrees, one subtree having one marked node and one subtree having two possibly-equal marks) + (n-vertex trees with one marked node); the right side counts (n-vertex trees with two possibly-equal marks), and we have demonstrated a combinatorial equivalence between these two sets. \Box
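The count n^{n-2} that this argument rests on can be confirmed by brute force for small n. The sketch below is an illustration only, with hypothetical function names: it enumerates every set of n-1 edges of the complete graph on n labeled nodes and keeps those without a cycle, using a small union-find structure.

```python
from itertools import combinations

def is_spanning_tree(n, edges):
    """True if the given n-1 edges on nodes 0..n-1 contain no cycle (hence form a tree)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:            # this edge would close a cycle
            return False
        parent[ru] = rv
    return True                 # n-1 acyclic edges on n nodes: connected, so a tree

def count_spanning_trees(n):
    all_edges = list(combinations(range(n), 2))
    return sum(is_spanning_tree(n, es) for es in combinations(all_edges, n - 1))

tree_counts = {n: count_spanning_trees(n) for n in range(2, 7)}
```

Multiplying each count by n, or by n^2, gives the number of trees with one marked node, or with two possibly equal marks, as used in the solution.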

2. (Solution by Sune Jakobsen) Consider all (n-1)-tuples a=(a_1,a_2,...,a_{n-1}), where each term is from the set \{1,2, \ldots, n\}. Since each of the n-1 terms can take n values, there are n^{n-1} such tuples.

Given a (n-1)-tuple, a, construct a graph as follows. Begin with a vertex labeled n. Then, for each vertex labeled k in the graph, if a_i=k, add a new vertex labeled i, and connect i and k by an edge. This graph must be a tree since each a_i only takes one value and a_n doesn’t exist.

Using this graph, I will count the number of (n-1)-tuples in another way. Let j be the number of vertices in such a tree graph. The vertices may be chosen in \displaystyle \binom{n-1}{j-1} ways, since the vertex labeled n is already one of them. Given the vertices, the tree can be formed in j^{j-2} ways, by Cayley’s theorem. Given the tree graph, the values of the a_i’s, for each vertex labeled i in the graph, can be chosen in one and only one way (namely, a_i is the label of the first vertex after i along the unique path from vertex i to vertex n). The remaining n-j components of the tuple are not among the vertex labels in the graph, so each takes on one of n-j possible values, giving (n-j)^{n-j} possibilities for the remaining components. Therefore the number of (n-1)-tuples must be:

\displaystyle n^{n-1} = \sum_{j=1}^{n} \binom{n-1}{j-1} j^{j-2} (n-j)^{n-j}

Multiplying both sides of the previous equation by n and using \displaystyle \frac{n}{j}\binom{n-1}{j-1} = \binom{n}{j}, the claim follows. \Box
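Sune's construction is easy to run by machine for small n. The following sketch (function names are illustrative, and it is a check and not part of the proof) grows the tree from vertex n for every tuple, tallies the tuples by the tree size j, and compares each tally with the predicted term of the sum.

```python
from collections import Counter
from itertools import product
from math import comb

def tree_vertices(a, n):
    """Vertices of the tree grown from vertex n by the tuple a = (a_1, ..., a_{n-1}).

    Vertex i joins the tree as soon as a_i is already in it.
    """
    in_tree = {n}
    changed = True
    while changed:
        changed = False
        for i in range(1, n):
            if i not in in_tree and a[i - 1] in in_tree:
                in_tree.add(i)
                changed = True
    return in_tree

def count_by_tree_size(n):
    """Tally all n^(n-1) tuples by the number j of vertices in their tree."""
    sizes = Counter()
    for a in product(range(1, n + 1), repeat=n - 1):
        sizes[len(tree_vertices(a, n))] += 1
    return sizes

def predicted(n, j):
    """binom(n-1, j-1) * j^(j-2) * (n-j)^(n-j), reading 1^(-1) as 1 and 0^0 as 1."""
    trees = 1 if j == 1 else j ** (j - 2)
    return comb(n - 1, j - 1) * trees * (n - j) ** (n - j)
```

For n = 2, for example, the tuple (1) leaves vertex 2 alone in its tree, while (2) attaches vertex 1 to it, matching the two terms of the sum.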



1. I found this curious identity in HAKMEM, item 118. For those who don’t know, HAKMEM is a kind of archive of cool mathematical observations made by some of the original MIT computer “hackers” from the 60’s and 70’s, including Bill Gosper and Rick Schroeppel. This particular item is credited to Schroeppel, but the accompanying text is a bit cryptic in my view:

Differentiate ye^{-y} = x to get y + y x y' - x y' = 0. One observes the curious identity
\displaystyle \sum_{j=1}^n \binom{n}{j} j^{j-1} (n-j)^{(n-j)} = n^n \qquad (0^0 = 1)
and thus
\displaystyle y(x) = \sum_{n \geq 1} \frac{n^{n-1} x^n}{n!}.

Maybe it was just their style to record a lot of their observations in such terse, compact form, but it annoys me that these guys hide their light under a bushel in this way. No motivation whatsoever, even though (I’d be willing to bet) these guys knew about the connection to trees — they’re computer scientists, after all!

Personally, I find it easier to get from y = x e^y to \displaystyle y = \sum_{n=1}^\infty \frac{n^{n-1}x^n}{n!} by other means than through their intermediate identity. I feel sure that just about anyone who has played around with enumerative combinatorics, and with the combinatorics of trees in particular, could figure this one out.

For, as David pointed out in his solution, n^{n-1} is the number of spanning trees on the set [n] = \{1, 2, \ldots, n\} equipped with a distinguished vertex; I’ll call that vertex the root, and such structures rooted trees. (Incidentally, a spanning tree is by definition an acyclic subgraph of the complete graph on the set [n], such that any two elements of the set are connected or spanned by a path in the subgraph. The theorem of Cayley mentioned above is that there are n^{n-2} such spanning trees.) Thus,

\displaystyle y(x) = \sum_{n=1}^\infty \frac{n^{n-1} x^n}{n!}
is the exponential generating function (egf) for rooted trees.


On the other hand, it is not hard to see that the functional equation y = xe^y holds for the egf of rooted trees (and uniquely determines the power series of the egf). One just applies some basic principles; I’ll just say it briefly and hope it’s somewhat followable: a rooted tree structure on a finite set S is given by the selection of a root r \in S, together with a partition of the remainder S - \{r\} into equivalence classes and a choice of rooted tree structure on each class. (Severing the root results in a bunch of disjoint subtrees, whose roots are those vertices adjacent to the original root.) At the level of egf’s, selection of the root accounts for the factor x on the right of the functional equation, and if y is the egf for rooted trees, then the other factor e^y is the egf for the collection of ways of partitioning a set into nonempty classes and putting a rooted tree structure on each class. This is all part of the art and science of generatingfunctionology. It’s beautiful stuff.

Somehow I find this explanation much easier to understand than the machinations hinted at in HAKMEM 118.
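The claim that y = x e^y pins down the power series can be tested directly. The sketch below is one way to do it, not necessarily the most efficient, and all names in it are made up for illustration: it solves the functional equation coefficient by coefficient with exact rational arithmetic, then reads off n! times the coefficient of x^n.

```python
from fractions import Fraction
from math import factorial

def exp_series(y, order):
    """Coefficients of exp(y) up to x^order, for a power series y with y[0] == 0.

    Uses E' = y' E, that is, n * e_n = sum_{k=1..n} k * y_k * e_{n-k}.
    """
    e = [Fraction(1)] + [Fraction(0)] * order
    for n in range(1, order + 1):
        e[n] = sum(k * y[k] * e[n - k] for k in range(1, n + 1)) / n
    return e

def rooted_tree_egf(order):
    """Solve y = x * exp(y) up to x^order by fixed-point iteration."""
    y = [Fraction(0)] * (order + 1)
    for _ in range(order):
        e = exp_series(y, order)
        # Multiplying by x shifts every coefficient up by one degree.
        y = [Fraction(0)] + e[:order]
    return y

y = rooted_tree_egf(8)
counts = [int(y[n] * factorial(n)) for n in range(1, 9)]
```

Each pass of the iteration fixes one more coefficient, so eight passes suffice for eight coefficients; the resulting counts should be the rooted tree numbers n^{n-1}.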

2. David’s proof was actually the one I myself had in mind. I can’t say what inspired David, but I myself was inspired by an earlier reading of a beautiful (and in many respects revolutionary-for-its-time) article, on a systematic functorial approach to enumerative combinatorics:

  • André Joyal, Une théorie combinatoire des séries formelles, Adv. Math. 42 (1981), 1-82.

In particular, I am very fond of the proof Joyal gives for Cayley’s theorem (which he credits to Gilbert Labelle), and this proof is in a line of thought which also leads to David’s solution. I’d like to present that proof now.

Labelle’s proof of Cayley’s theorem:

The expression n^n probably makes most people think of the number of functions f: [n] \to [n] from an n-element set to itself. The art of combinatorics lies in drawing appropriate pictures, so draw a picture (a graph) of such a function by drawing a directed edge from i to j = f(i) whenever i \neq j (cf. Sune’s solution). Starting from any vertex and iterating f enough times, you always eventually land in a cycle where points get revisited periodically, infinitely often. Let’s call those points periodic points; the function f acts as a permutation on periodic points. Now, for each periodic point p, consider the union of directed paths which end at p without hitting any other periodic points. This union forms a subgraph T_p which is a tree, rooted at p (again, cf. Sune’s solution). The entire set [n] is thereby partitioned into (equivalence) classes (the underlying vertex sets of the trees T_p), and the structure of a function f: [n] \to [n] thus determines the following data:

  • An equivalence relation on [n];
  • A rooted tree structure on each equivalence class;
  • A permutation structure on the set of equivalence classes (each tagged by the periodic point at the root).

Conversely, these three data determine a function, and the correspondence is bijective.
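To make the decomposition concrete, here is a small sketch that extracts the periodic points and the tree partition from an endofunction. It uses 0-based labels {0, ..., n-1} in place of [n], and the function name `decompose` and the sample function are hypothetical choices for illustration.

```python
def decompose(f):
    """Split an endofunction f on {0, ..., n-1} (given as a list) into Labelle's data.

    Returns (periodic, root_of): the set of periodic points, and for each point x
    the periodic point at the root of the tree containing x.
    """
    n = len(f)
    # Applying f to the whole set n times leaves exactly the periodic points.
    image = set(range(n))
    for _ in range(n):
        image = {f[x] for x in image}
    periodic = image

    root_of = {}
    for x in range(n):
        p = x
        while p not in periodic:   # follow edges until a cycle is reached
            p = f[p]
        root_of[x] = p
    return periodic, root_of

# Sample: 0 -> 1 -> 2 -> 0 is a 3-cycle, 5 is a fixed point, and 3, 4, 6 hang off them.
f = [1, 2, 0, 0, 3, 5, 5]
periodic, root_of = decompose(f)

classes = {}
for x, p in root_of.items():
    classes.setdefault(p, set()).add(x)
```

Here `classes` is the equivalence relation with each class tagged by its root, the restriction of `f` to each class (minus the root's outgoing edge) is the rooted tree structure, and `f` restricted to `periodic` is the permutation.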

  • Remark: It’s not necessary to the proof, but let me add that by basic principles of generatingfunctionology, if p(x) is the egf for permutations [namely, \displaystyle p(x) = \frac1{1-x}], and if y(x) is the egf for rooted trees, then (p \circ y)(x) is the egf for structures given by such triplets of data. Thus, by the bijective correspondence, we have

    \displaystyle \sum_{n=0}^\infty \frac{n^n x^n}{n!} = (p \circ y)(x).
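This identity of generating functions can be checked coefficientwise. The sketch below takes the coefficients n^{n-1}/n! of y(x) as given and expands 1/(1 - y(x)) with exact fractions; the function name is an illustrative choice.

```python
from fractions import Fraction
from math import factorial

def endofunction_counts(order):
    """n! times the coefficients of 1 / (1 - y(x)), where y_n = n^(n-1) / n!.

    With q = 1 / (1 - y) we have q = 1 + y * q, so q_n = sum_{k=1..n} y_k * q_{n-k}.
    """
    y = [Fraction(0)] + [
        Fraction(n ** (n - 1), factorial(n)) for n in range(1, order + 1)
    ]
    q = [Fraction(1)] + [Fraction(0)] * order
    for n in range(1, order + 1):
        q[n] = sum(y[k] * q[n - k] for k in range(1, n + 1))
    return [int(q[n] * factorial(n)) for n in range(order + 1)]

counts = endofunction_counts(8)
```

The entries of `counts` should be n^n for n = 0, 1, ..., 8, with the n = 0 entry equal to 1.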

On the other hand, consider a tree structure T on n points, and suppose we also specify an ordered pair of such points (s, t), possibly equal. There is a unique path from s to t in T, which I’ll call the spine (of the “bipointed tree”); call the points along that path, including s and t, vertebrae. Now, for each point x \in T, there is a unique shortest path from x to the spine, terminating at a vertebra p. The union of all such paths which terminate at a vertebra p again forms a subtree T_p rooted at p. Again, the set of n points is partitioned by the (underlying vertex sets of) T_p , and the structure of a bipointed tree on an n-element set [n] is thus encoded [in bijective fashion] by

  • An equivalence relation on [n];
  • A rooted tree structure on each equivalence class;
  • A spine structure (that is, a linear ordering) on the roots which tag the equivalence classes.

However, the number of linear orderings on an n-element set, n!, is the same as the number of permutations on that set. We conclude that the number of bipointed tree structures on an n-element set is the same as the number of endofunctions, n^n. And, voilà! the number of tree structures on the n-element set must therefore be n^{n-2}. \Box

Note: regular solver Philipp Lampe, who submitted a solution similar to David’s, pointed out that there are no fewer than four proofs of Cayley’s theorem given in Aigner and Ziegler’s Proofs from The Book, which I referred to in an earlier post. At this point, I really wish I had that book! I’d be delighted if someone were to post one of those nice proofs in comments…

3. I’m not quite sure, but Sune’s solution just might be my current favorite, just because it makes obvious contact with the circle of ideas which embrace endofunctions, trees, and rooted trees (I think of the tuples there as endofunctions, or actually, partial endofunctions on (n-1)-element sets). In any event, my sincere thanks go to David, Philipp, and Sune for their insightful responses.

Encouraged and emboldened (embiggened?) by the ingenuity displayed by some of our readers, I’d like to see what sort of response we get to this Problem of the Week:

Establish the following identity: \displaystyle \sum_{j=1}^n \binom{n}{j} j^{j-1} (n-j)^{(n-j)} = n^n for all natural numbers n > 0.

(Here we make the convention 0^0 = 1.) I find this problem tantalizing because it looks as if there should be some sort of conceptual proof — can you find one?
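Before hunting for a proof, it can be reassuring to confirm the identity numerically. The sketch below does so with exact integer arithmetic for the first forty values of n; it is of course no substitute for a conceptual argument, and the function name `lhs` is just a label chosen here.

```python
from math import comb

def lhs(n):
    """Left side of the identity: sum_{j=1..n} binom(n, j) * j^(j-1) * (n-j)^(n-j)."""
    # Python evaluates 0 ** 0 as 1, which matches the stated convention.
    return sum(
        comb(n, j) * j ** (j - 1) * (n - j) ** (n - j) for j in range(1, n + 1)
    )

checked = all(lhs(n) == n ** n for n in range(1, 41))
```

For n = 3 the three terms are 12, 6, and 9, which add up to 27.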

Please send your solutions to topological[dot]musings[At]gmail[dot]com by Wednesday, June 11, 11:59pm (UTC); do not submit solutions in Comments. Everyone with a correct solution gets entered in our Hall of Fame! We look forward to your response.

The last Problem of the Week elicited some diverse and creative responses; many thanks to all those who submitted a solution!

Three basic approaches emerged from the solutions we received, each shedding different light on the problem, so Vishal and I thought it appropriate to give a representative example of each. (This is consonant with established practice in other fora, notably the Problems and Solutions section of the American Mathematical Monthly, which we look to as our model.) Here they are:

1. (Method based on Gamma and Beta functions; composite solution due to Philipp Lampe [U. Bonn], Jöel Duet, and Rod Carvalho) For integers n, k > 0, it is well-known that the Beta function

\displaystyle B(n, k+1) := \frac{\Gamma(n) \Gamma(k+1)}{\Gamma(n+k+1)} = \frac{(n-1)! k!}{(n+k)!}

admits an integral representation

\displaystyle B(n, k+1) = \int_{0}^{1} x^{n-1} (1-x)^k \ dx.

(See for instance Andrews, Askey, and Roy, Special Functions, pp. 4-5.) From the first equation we have \displaystyle \frac1{\binom{n+k}{k}} = nB(n, k+1). Now let n > 1; then we have

\displaystyle \sum_{k=0}^{\infty} \frac1{\binom{n+k}{k}} = n\sum_{k=0}^{\infty} \int_{0}^{1} x^{n-1} (1-x)^k \ dx.

Interchanging summation and integration, we get

\displaystyle n \int_{0}^{1} \sum_{k=0}^{\infty} x^{n-1} (1-x)^k\ dx = \int_{0}^{1} \frac{n x^{n-1}}{1 - (1-x)} dx = \int_{0}^{1} n x^{n-2} dx = \frac{n}{n-1}.

There is a slight difficulty with uniform convergence of the series (which would justify such interchange), at least in the case n = 2. There are various ways of handling this; for example, we may establish the result directly as follows: by change of variables x \leftrightarrow 1-x, we have

\displaystyle \int_{0}^{1} x^{n-1} (1-x)^k dx = \int_{0}^{1} (1-x)^{n-1} x^k dx = \int_{0}^{1} (1-x)^{n-2} (x^k - x^{k+1}) dx

and the partial sum \sum_{k=0}^m \int_{0}^{1} (1-x)^{n-2} (x^k - x^{k+1})\ dx telescopes to

\displaystyle \int_{0}^{1} (1-x)^{n-2}(1 - x^{m+1})\ dx = \frac1{n-1} - \int_{0}^{1} x^{m+1}(1-x)^{n-2}\ dx.

Since x(1-x) \leq \frac1{4} for 0 \leq x \leq 1, the last integral is bounded above by

\displaystyle \int_{0}^{1} x^{m+3-n} \frac1{4^{n-2}}\ dx = \frac1{(m+4-n) 4^{n-2}}

which tends to zero as m \to \infty. Putting all this together, it follows that

\displaystyle n \lim_{m\to \infty} \sum_{k=0}^m \int_{0}^1 x^{n-1} (1-x)^k \ dx = \frac{n}{n-1},

as claimed.
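The ingredients of this argument can be checked exactly for small parameters. In the sketch below (helper names are illustrative), the Beta integral is evaluated by expanding (1-x)^k with the binomial theorem and integrating term by term, which is a different route from the Gamma function formula and so gives an independent check.

```python
from fractions import Fraction
from math import comb, factorial

def beta_integral(n, k):
    """Exact value of the integral of x^(n-1) * (1-x)^k over [0, 1].

    Expands (1-x)^k by the binomial theorem and integrates term by term.
    """
    return sum(Fraction((-1) ** i * comb(k, i), n + i) for i in range(k + 1))

def partial_sum(n, m):
    """sum_{k=0..m} 1 / binom(n+k, k), computed exactly."""
    return sum(Fraction(1, comb(n + k, k)) for k in range(m + 1))

def telescoped(n, m):
    """n * (1/(n-1) - integral of x^(m+1) * (1-x)^(n-2) over [0, 1]), for n > 1."""
    return n * (Fraction(1, n - 1) - beta_integral(m + 2, n - 2))
```

For n = 2 the partial sums come out as 2(1 - 1/(m+2)), which visibly tends to 2 = n/(n-1).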

2. (Method of repeated integration, due to Gantumur Tsogtgerel, UC San Diego): Put \displaystyle a_{n k} = \frac1{(k+1)\ldots (k+n)}, and let \displaystyle s_n(x) = \sum_{k=0}^{\infty} a_{n k} x^{k+n}. We have a_{n k} \leq (k+1)^{-n}; thus for n > 1, the series defining s_n(x) converges for |x| \leq 1, and the series defining s_1(x) converges for |x| < 1. Observe that

\displaystyle \sigma_n := \sum_{k=0}^{\infty} \frac1{\binom{n+k}{k}} = n! s_n(1)

for n > 1.

We have s_1(x)=-\ln(1-x) for |x|<1. We also have

s_2(0)=0, \qquad \textrm{and} \qquad s_2'(x)=s_1(x) \qquad (|x|<1)

which gives s_2(x)=x+(1-x)\ln(1-x) for |x|<1. By continuity, we get s_2(1)=1. Further, for n>2 we have s_n(0) = 0 and

s_n'(x)=s_{n-1}(x) \qquad (|x|\leq 1).

Integrating repeatedly, we obtain

\displaystyle s_n(1)=\int_{0}^{1} \int_{0}^{x_{n-1}} \ldots \int_{0}^{x_3} s_2(x_2) dx_2 \ldots dx_{n-1}

\displaystyle \qquad =\frac1{(n-3)!} \int_{0}^{1} (1-y)^{n-3} s_2(y)dy,

where we have used the Cauchy formula for repeated integration. After a short integration by parts, this last expression evaluates to \displaystyle s_n(1) = \frac1{(n-1)!(n-1)}. This yields \displaystyle \sigma_n=\frac{n}{n-1} for n>1.
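For readers who want the details, here is one way the final evaluation can go, for n > 2. It substitutes u = 1 - y in s_2(y) = y + (1-y)\ln(1-y) and integrates the logarithmic term by parts; this is a reconstruction, not necessarily the route Gantumur had in mind.

```latex
\begin{aligned}
\int_0^1 (1-y)^{n-3} s_2(y)\,dy
  &= \int_0^1 u^{n-3}\bigl(1 - u + u \ln u\bigr)\,du \qquad (u = 1-y) \\
  &= \frac{1}{n-2} - \frac{1}{n-1} + \int_0^1 u^{n-2} \ln u \,du \\
  &= \frac{1}{(n-2)(n-1)} - \frac{1}{(n-1)^2}
   = \frac{1}{(n-2)(n-1)^2},
\end{aligned}
\qquad\text{where}\qquad
\int_0^1 u^{n-2} \ln u \,du
  = \left.\frac{u^{n-1}\ln u}{n-1}\right|_0^1 - \frac{1}{n-1}\int_0^1 u^{n-2}\,du
  = -\frac{1}{(n-1)^2}.
```

Dividing by (n-3)! then gives s_n(1) = \frac1{(n-3)!\,(n-2)(n-1)^2} = \frac1{(n-2)!\,(n-1)^2} = \frac1{(n-1)!\,(n-1)}.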

3. (Method of telescoping series, due to Nilay Vaish)

First, note that \displaystyle \, \frac1{\binom{n+k}{k}} = \frac{n!k!}{(n+k)!} = \frac{n!}{(k+1) (k+2) \cdots (k+n)}

\displaystyle = \frac{n!}{(n-1)} \left( \frac1{(k+1) \cdots (k+n-1)} - \frac1{(k+2) \cdots (k+n)} \right).

Therefore, \displaystyle \sum_{k=0}^{\infty} \frac1{\binom{n+k}{k}}

\displaystyle = \frac{n!}{(n-1)} \sum_{k=0}^{\infty} \left( \frac1{(k+1) \cdots (k+n-1)} - \frac1{(k+2) \cdots (k+n)} \right)

\displaystyle = \frac{n!}{(n-1)} \cdot \frac1{(n-1)!} = \frac{n}{n-1}.


Method 1 was the most popular; it has the advantage that provided one is already familiar with the Beta function, the calculation is essentially a triviality [modulo the slight technical bump in the road noted above]. For those unfamiliar with this function, I can strongly recommend the beautiful text by Andrews, Askey, and Roy, mentioned above, where the Beta function occupies a central place.

Method 2 was essentially how I accidentally discovered the calculation of the series to begin with: by considering iterated antiderivatives of \frac1{1-x}. This however was some years ago, and I am grateful to Gantumur for his elegant solution, which saved me the trouble of recalling the details! Incidentally, the Wikipedia article on the Cauchy formula (linked above) sketches a proof which implicitly invokes differentiation under the integral sign (a favorite technique of Feynman), but we can also think of it this way: the (n-1)-dimensional volume of the simplex y \leq \sigma_{n-1} \leq \ldots \leq \sigma_1 \leq x is \displaystyle \frac{(x-y)^{n-1}}{(n-1)!}, since there are (n-1)! possible linear orderings of coordinates of points in the (n-1)-cube [y, x] \times \ldots \times [y, x], and each ordering gives a simplex congruent to all the others. Then the result follows by an application of the Fubini theorem.

Method 3 is in some sense the most elementary solution [e.g., it avoids integration], if one is fortunate enough to see the trick of writing the summands as differences, so that the series telescopes. From one point of view this trick may seem awfully slick, like a magician pulling a rabbit out of a hat. But, it can be made to look utterly natural in the context of discrete calculus (dealing with sequences f: \mathbb{N} \to \mathbb{R} instead of continuous functions \mathbb{R} \to \mathbb{R}, and with the difference operator (\Delta f)(x) := f(x+1) - f(x) instead of the derivative).

Discrete calculus may be built up in analogy with continuous calculus; here are some basic terms of the analogy:

  • A basic identity of discrete calculus is \displaystyle \Delta \binom{x}{k} = \binom{x}{k-1}, the polynomial identity which restates the recursive identity that generates Pascal’s triangle: \displaystyle \binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1}. This is analogous to the identity of continuous calculus, \displaystyle D(\frac{x^k}{k!}) = \frac{x^{k-1}}{(k-1)!}.
  • The preceding observation suggests that the falling power \displaystyle x^{\underline{k}} := x(x-1)\ldots (x-k+1) is the discrete analogue of the ordinary power f(x) = x^k of continuous calculus. It satisfies the identity \displaystyle \Delta x^{\underline{k}} = kx^{\underline{k-1}}. The discrete analogue of the identity \displaystyle x^{k+l} = x^k \cdot x^l is the identity \displaystyle x^{\underline{k+l}} = x^{\underline{k}} \cdot (x-k)^{\underline{l}}.
  • The last identity may be used to motivate the definition of x^{\underline{k}} when k is negative: from \displaystyle 1 = x^{\underline{0}} = x^{\underline{k}}(x-k)^{\underline{-k}}, one quickly derives \displaystyle x^{\underline{-k}} = \frac1{(x+1)(x+2)\ldots (x+k)}. The identity \displaystyle \Delta x^{\underline{k}} = k x^{\underline{k-1}} holds for all integers k.
  • The analogue of continuous integration is finite summation. Formally, introduce the following symbolism for discrete integration: if a \leq b and b-a is integral, \textstyle \sum_{a}^{b} f(x) \delta x := f(a) + f(a+1) + \ldots + f(b-1). The fundamental theorem of discrete calculus is that \textstyle{\sum_{a}^{b}} f(x) \delta x = F(b) - F(a) if \Delta F(x) = f(x). This simply restates the method of telescoping sums.
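The terms of this analogy are easy to try out by machine. Here is a minimal sketch with exact fractions, restricted to nonnegative integer arguments x so that the negative falling powers never divide by zero. The names `falling`, `delta`, and `discrete_integral` are choices made here, not standard library functions.

```python
from fractions import Fraction

def falling(x, k):
    """Falling power of x with integer exponent k.

    k >= 0: x (x-1) ... (x-k+1);   k < 0: 1 / ((x+1)(x+2)...(x+|k|)).
    """
    result = Fraction(1)
    if k >= 0:
        for i in range(k):
            result *= x - i
    else:
        for i in range(1, -k + 1):
            result /= x + i
    return result

def delta(f):
    """Difference operator: (delta f)(x) = f(x+1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

def discrete_integral(f, a, b):
    """f(a) + f(a+1) + ... + f(b-1), for integers a <= b."""
    return sum((f(x) for x in range(a, b)), Fraction(0))

# Power rule, checked for exponents of both signs.
power_rule = all(
    delta(lambda t: falling(t, k))(x) == k * falling(x, k - 1)
    for k in range(-4, 5)
    for x in range(0, 8)
)
```

For instance, with F(x) = falling(x, 3)/3 and f(x) = falling(x, 2), the fundamental theorem predicts that `discrete_integral(f, 2, 9)` equals F(9) - F(2).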

Using discrete calculus, the solution could be expressed compactly as follows:

\displaystyle \sum_{k=0}^{N-1} \frac{n! k!}{(n+k)!} = n! \textstyle{\sum_{0}^{N}} x^{\underline{-n}} \delta x = n! \left. \displaystyle \frac{x^{\underline{1-n}}}{1-n} \right|_{0}^{N} = \displaystyle \frac{n!}{1-n}(N^{\underline{1-n}} - 0^{\underline{1-n}}).

Letting N \to \infty, our answer is: \displaystyle -\frac{n!}{1-n} 0^{\underline{1-n}} = \frac{n}{n-1}.
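As a sanity check on this compact calculation, the sketch below compares the partial sums with the closed form for small n and N, using exact fractions. It writes the negative falling power out directly as 1/((x+1)(x+2)...(x+n-1)); the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, prod

def partial_sum(n, N):
    """sum_{k=0..N-1} n! k! / (n+k)!, which equals sum_{k=0..N-1} 1 / binom(n+k, k)."""
    return sum((Fraction(1, comb(n + k, k)) for k in range(N)), Fraction(0))

def closed_form(n, N):
    """n!/(1-n) * (F(N) - F(0)) for n > 1, where F(x) is the falling power of exponent 1-n."""
    def neg_falling(x):
        # Falling power with exponent 1-n: 1 / ((x+1)(x+2)...(x+n-1)).
        return Fraction(1, prod(x + i for i in range(1, n)))
    return Fraction(factorial(n), 1 - n) * (neg_falling(N) - neg_falling(0))

agree = all(
    partial_sum(n, N) == closed_form(n, N)
    for n in range(2, 8)
    for N in range(0, 21)
)
```

Evaluating `closed_form(n, N)` for large N shows the approach to n/(n-1); for n = 2 the gap is exactly 2/(N+1).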

Update: Look for the ‘sexism’ video at the end of this post that essentially strengthens my argument about the media treatment meted out to a woman, who was/is as capable as any other candidate, running for a very high office in America, the world’s oldest democracy and whose founding fathers, as we learn from history, were the children of The Enlightenment!


With the Democratic primary race practically over now and knowing, as we all do, who the nominee is going to be, I just couldn’t resist writing a post on this one, having avoided writing anything about politics all this while.

Well, it was quite appalling to see/hear all these months that when it came to Hillary the discussions/commentaries in the so-called “mainstream” media were similar to ones that are generally heard in men’s locker rooms, while Obama has been treated almost like a God-like figure. And while Hillary’s “racist” remarks were dissected/analyzed with great relish, no one, it seemed to me, paid any particular attention to the disgusting misogynist remarks directed at her throughout the primary campaign season, with the result that the Democratic party has managed, or so it seems, to lose its grip over white women voters now. I have a feeling that this is going to cost the Democrats another general election. (Of course, I could be wrong; I am not a political “pundit”, after all!)

So much has my mother been miffed/angry at the blatant sexist remarks openly made in the media against Hillary that she has vowed now to vote for McCain this Fall. To her, the contest has “demonstrated” yet again that women still haven’t been able to break the glass ceiling in this male-dominated world. Is anyone listening to women voters like her?

A video sample:

In mathematical parlance, this is the only instance in which Left = Right, if you know what I mean.
