The solutions are in! I thought last week’s problem might have been a little more challenging than problems of previous weeks — the identity is just gorgeous, but not at all obvious (I don’t think!) without some correspondingly gorgeous combinatorial insight. Luckily, some of our readers came up with the goods, and their solutions provide a forum for discussing a beautiful circle of ideas, involving the inter-related combinatorics of trees and endofunctions.

I can’t decide which of the solutions we received I like best. They all bear a certain familial resemblance, but each has its own distinct personality. I’ll give two representative examples, and append some comments at the end. Both proofs are conceptual “bijective” proofs, in which the two sides of the identity represent two different ways of counting essentially the same combinatorial objects. And both rely on a famous theorem of Cayley, on the number of tree structures or spanning trees on distinct labeled nodes (maybe this would be sufficient hint, if you still want to think about it some more by yourself!). Here, I’ll add a little spoiler space:

**1**. (*Solution by David Eppstein*) As is well known (see, e.g., http://en.wikipedia.org/wiki/Pr%C3%BCfer_sequence), the number of different spanning trees on a set of n labeled nodes is . Equivalently, the number of ways of choosing a spanning tree together with a specification of a single vertex is , and the number of ways of choosing a spanning tree together with a specification of two vertices and (possibly equal to each other) is . So that’s the right hand side of the identity.

Now suppose you are given a tree , and two nodes and . If and are different, let be the first edge on the path in from to ; cutting at that edge produces two disjoint subtrees, one containing one marked node and the other containing two (possibly equal) marked nodes, namely and the first node on the path after . Conversely, from this information (two trees, one containing a marked node and the other containing two marks on nodes and ) we can put together simply by connecting the two trees by an edge . If is the number of nodes in the tree containing , the number of ways we can choose two disjoint marked subtrees in this way is

almost the same as the left hand side of the identity, but missing the final term in the sum.

The final term comes from the case when the marked nodes and of tree T coincide. The number of ways this can happen is the same as the number of ways we can pick a single marked node of a tree, that is, , which is the same as the final term in the left hand sum.

Thus, the left side counts (partitions of n vertices into two disjoint subtrees, one subtree having one marked node and one subtree having two possibly-equal marks) + (-vertex trees with one marked node); the right side counts (-vertex trees with two possibly-equal marks), and we have demonstrated a combinatorial equivalence between these two sets.

**2**. (*Solution by Sune Jakobsen*) Consider all -tuples , where each term is from the set Since each of the terms can take values, there are such tuples.

Given a -tuple, , construct a graph as follows. Begin with a vertex labeled . Then, for each vertex labeled in the graph, if , add a new vertex labeled , and connect and by an edge. This graph must be a tree since each only takes one value and doesn’t exist.

Using this graph, I will count the number of -tuples in another way. Let be the number of vertices in such a tree graph. The vertices may be chosen in ways, since the vertex labeled is already one of them. Given the vertices, the tree can be formed in ways, by Cayley’s theorem (see http://www.research.att.com/~njas/sequences/A000272). Given the tree graph, the values of the ‘s, for each vertex labeled in the graph, can be chosen in one and only one way (namely, is the label of the first vertex after along the unique path from vertex to vertex ). The remaining components of the tuple are not among the vertex labels in the graph, so each takes on one of possible values, giving possibilities for the remaining components. Therefore the number of -tuples must be:

**Remarks**:

1. I found this curious identity in HAKMEM, item 118. For those who don’t know, HAKMEM is a kind of archive of cool mathematical observations made by some of the original MIT computer “hackers” from the 60’s and 70’s, including Bill Gosper and Rick Schroeppel. This particular item is credited to Schroeppel, but the accompanying text is a bit cryptic in my view:

Differentiate to get . One observes the curious identity

()

and thus.

Maybe it was just their style to record a lot of their observations in such terse, compact form, but it annoys me that these guys hide their light under a bushel in this way. No motivation whatsoever, even though (I’d be willing to bet) these guys knew about the connection to trees — they’re computer scientists, after all!

Personally, I find it easier to get from to by other means than through their intermediate identity. I feel sure that just about anyone who has played around with enumerative combinatorics, and with the combinatorics of trees in particular, could figure this one out.

For, as David pointed out in his solution, is the number of spanning trees on the set [] equipped with a distinguished vertex; I’ll call that vertex the *root*, and such structures *rooted trees*. (Incidentally, a *spanning tree* is by definition an acyclic subgraph of the complete graph on the set [], such that any two elements of the set are connected or spanned by a path in the subgraph. The theorem of Cayley mentioned above is that there are such spanning trees.) Thus,

Somehow I find this explanation much easier to understand than the machinations hinted at in HAKMEM 118.

2. David’s proof was actually the one I myself had in mind. I can’t say what inspired David, but I myself was inspired by an earlier reading of a beautiful (and in many respects revolutionary-for-its-time) article, on a systematic functorial approach to enumerative combinatorics:

- André Joyal, Une théorie combinatoire des séries formelles, Adv. Math. 42 (1981), 1-82.

In particular, I am very fond of the proof Joyal gives for Cayley’s theorem (which he credits to Gilbert Labelle), and this proof is in a line of thought which also leads to David’s solution. I’d like to present that proof now.

**Labelle’s proof of Cayley’s theorem**:

The expression probably makes most people think of the number of functions : [] [] from an -element set to itself. The art of combinatorics lies in drawing appropriate pictures, so draw a picture (a graph) of such a function by drawing a directed edge from to whenever (cf. Sune’s solution). Starting from any vertex and iterating enough times, you always eventually land in a cycle where points get revisited periodically, infinitely often. Let’s call those points *periodic points*; the function acts as a permutation on periodic points. Now, for each periodic point , consider the union of directed paths which end at without hitting any other periodic points. This union forms a subgraph which is a tree, rooted at (again, cf. Sune’s solution). The entire set [] is thereby partitioned into (equivalence) classes (the underlying vertex sets of the trees ), and the structure of a function [] [] thus determines the following data:

- An equivalence relation on [];
- A rooted tree structure on each equivalence class;
- A permutation structure on the set of equivalence classes (each tagged by the periodic point at the root).

Conversely, these three data determine a function, and the correspondence is bijective.

**Remark**: It’s not necessary to the proof, but let me add that by basic principles of generatingfunctionology, if is the egf for permutations [namely, ], and if is the egf for rooted trees, then is the egf for structures given by such triplets of data. Thus, by the bijective correspondence, we have

On the other hand, consider a tree structure on points, and suppose we also specify an ordered pair of such points , possibly equal. There is a unique path from to in , which I’ll call the *spine* (of the “bipointed tree”); call the points along that path, including and , *vertebrae*. Now, for each point , there is a unique shortest path from to the spine, terminating at a vertebra . The union of all such paths which terminate at a vertebra again forms a subtree rooted at . Again, the set of points is partitioned by the (underlying vertex sets of) , and the structure of a bipointed tree on an -element set [] is thus encoded [in bijective fashion] by

- An equivalence relation on [];
- A rooted tree structure on each equivalence class;
- A spine structure (that is, a linear ordering) on the roots which tag the equivalence classes.

However, the number of linear orderings on an -element set, , is the same as the number of permutations on that set. We conclude that the number of bipointed tree structures on an -element set is the same as the number of endofunctions, . And, voilà! the number of tree structures on the -element set must therefore be .

Note: regular solver Philipp Lampe, who submitted a solution similar to David’s, pointed out that there are no fewer than four proofs of Cayley’s theorem given in Aigner and Ziegler’s *Proofs from The Book*, which I referred to in an earlier post. At this point, I really wish I had that book! I’d be delighted if someone were to post one of those nice proofs in comments…

3. I’m not quite sure, but Sune’s solution just might be my current favorite, just because it makes obvious contact with the circle of ideas which embrace endofunctions, trees, and rooted trees (I think of the tuples there as endofunctions, or actually, partial endofunctions on -element sets). In any event, my sincere thanks go to David, Philipp, and Sune for their insightful responses.

## 7 comments

Comments feed for this article

June 12, 2008 at 12:06 pm

Todd TrimbleDaniel Rife (UC Santa Barbara) jotted down and gmailed the following interesting suggestion for an alternative proof. I haven’t had much time to think it over, but I thought it merited consideration:

This is more of the flavor of the idea and not a proof, but here is what I noted.

n^n = total number of leaves in all (n+1)^(n-1) trees on {0,1,2,…,n} rooted at 0. So for n=2 we have {0->1,0->2},{0->1->2},{0->2->1} has 4=2^2 total leaves.

j^(j-1) = the number of labeled rooted trees w/ j nodes

The idea is that one way to sum up all the leaves on the trees described for n^n is to count up all the leaves of smaller trees that can expand to the trees in question. The way we can find which trees expand to our larger tree is by taking our larger trees and contracting any subtree they contain.

Each term of the sum chooses points to make a tree out of (nCj), finds all the trees with those nodes (j^(j-1)), and counts the number of leaves when we take those trees out: (n-j)^(n-j). For example, if j=3, we choose 3 points so nChoose3, find all 3^(3-1) trees with j nodes, and count the number of leaves contributed by trees with these 3 nodes taken away: (n-3)^(n-3).

June 12, 2008 at 4:26 pm

PhilippDear Todd,

you wanted to know one of the proofs of Cayley’s formula Aigner and Ziegler give in their book. Their first proof is actually Joyal’s / Labelle’s proof you mentioned above.

I will sketch their second proof. It uses linear algebra and it is based on Kirchhoff’s matrix tree theorem. (I think Aigner and Ziegler explain it much better than I do, so I recommend reading their book.)

Let n be a positive integer greater than one. Let us consider the matrix given by for and for . The first thing to notice is . They don’t give a proof for this, but this is easy. Notice that is the characteristic polynomial of the matrix , whose entries are all one, evaluated at . The matrix has got two eigenvalues. The eigenvalue 0 has a -dimensional eigenspace consisting of vectors with and multiplicity . The eigenvalue has multiplicity . Thus the characteristic polynomial of is . If we plug in we get .

Now we have to prove that is the number of spanning trees on vertices. Note that where the matrix is constructed as follows: We consider a complete graph on vertices labelled . It has got edges which we label . The matrix is a matrix. Its rows are labelled by and its columns are labelled by . If an edge joins the vertices then the entries of in the column corresponding to are at position , at position , and 0 otherwise. The identity follows from the fact that two different vertices in are joined by exactly one edge and every vertex has emanating edges.

Now they use the Cauchy-Binet formula. We have

where the sum runs over all subsets of size and is the matrix obtained from by considering only the columns indexed by elements corresponding to .

To finish the proof they show that is zero if the edges corresponding to do not form a spanning tree (which is easy since the matrix has a zero row) and if the edges corresponding to S form a spanning tree: In this case, one of the vertices is a leaf. The row i contains only one nonzero entry and this entry is . Laplace expansion and induction finishes the proof.

PS: I am not familiar with including latex commands in wordpress. I hope everything works.

PPS: There are variations of your identity. The following identity for example is due to V. Babev and holds for :

June 12, 2008 at 10:12 pm

Todd TrimbleThanks for writing that out, Philipp! That proof does seem clever and nice, but I don’t have enough insight into the Cauchy-Binet formula that this proof packs quite the same punch for me as Labelle’s. Could be a “cultural” thing, though.

I’m a little puzzled about this Babev identity: am I making a mistake, or is this “trivial”? If I write the identity as

then it looks to me like a special case of the statement that the order difference operator (that is, the iteration of the operator which takes a function to ), applied to the function , is identically 0. In any event, this type of alternating sum with binomial coefficients looks suspiciously along these lines — did I goof up somewhere?

June 12, 2008 at 11:13 pm

Todd TrimbleThat bit about “trivial” may have sounded a bit jerk-y; let me say it more nicely this time.

I mean that when we iterate the differencing operator, we get

and so on. The difference is

and my comment is that the difference applied to the function is identically zero, since the difference of a degree polynomial is a degree polynomial. Then we specialize to to get the Babev identity.

June 13, 2008 at 11:36 am

PhilippOh, you’re right. Babev’s identity is indeed very easy to prove. I wasn’t aware of that.

When I came across this identity I proved it in the following way. (By the way I found it in a list of problem proposals for a competition on the AoPS-Website http://www.artofproblemsolving.com/Forum/viewtopic.php?t=160249&sid=bd125f6f61c7ebdc5d92b90f485b66a8)

According to Cayley’s formula, the RHS is the number of spanning trees on the vertex set . It is enough to prove that the LHS is the same. Note that the LHS is equal to where the sum runs over all subsets S of V with . But is the number of spanning trees in which all vertices in S are leaves: if T is a spanning tree in which the elements in S are leaves, then no two elements in S are linked by an edge. The subgraph on the vertex set is a spanning tree. There are possible spanning trees. Each element in S is linked to an element in which can be done in ways. By the principle of inclusion and exclusion, is the number of spanning trees on V.

I had this problem in mind when I solved your POW 4. That’s the reason why I thought of Cayley’s formula.

Below I’ll (very briefly) sketch Aigner’s and Ziegler’s third proof. It is due to Riordan and Renyi and it is based on a recursion. Let as above. Define to be the number of forests that are the union of k disjoint trees such that every belongs to exactly one and belongs to iff . If you have such a forest, you can delete the vertex 1. If the vertex 1 is linked to i vertices, the new forest consists of trees. Its vertex set consists of the elements . In this way they obtain a recursion formula

Now they use the recursion formula to prove . This is proved by induction on n. The induction step is a calculation of a binomial sum using the recursion formula and it is maybe six lines long. Now gives Cayley’s formula.

July 2, 2008 at 11:16 pm

Todd TrimblePhilipp — thanks for posting this last proof of Cayley’s theorem (and sorry for the delay in acknowledging it) — it’s very pretty. I also like your proof of the Babev identity.

You are right that there are many variations on the identity of POW-4. I just ran across an article in an old American Mathematical Monthly, “More Trees and Power Sums” by James E. Simpson (November, 1986), which gives a whole bunch, e.g., for ,

His methods give pride of place to matrix tree theorems, and also Abel identities (see Catalog # 3900048 here).

March 28, 2009 at 9:06 pm

Number of idempotent endofunctions « Todd and Vishal’s blog[…] I’ve actually referred to before, in connection with a POW whose solution involves counting tree […]