The solutions are in! I thought last week’s problem might have been a little more challenging than problems of previous weeks — the identity is just gorgeous, but not at all obvious (I don’t think!) without some correspondingly gorgeous combinatorial insight. Luckily, some of our readers came up with the goods, and their solutions provide a forum for discussing a beautiful circle of ideas, involving the inter-related combinatorics of trees and endofunctions.
I can’t decide which of the solutions we received I like best. They all bear a certain familial resemblance, but each has its own distinct personality. I’ll give two representative examples, and append some comments at the end. Both proofs are conceptual “bijective” proofs, in which the two sides of the identity represent two different ways of counting essentially the same combinatorial objects. And both rely on a famous theorem of Cayley, on the number of tree structures or spanning trees on distinct labeled nodes (maybe this would be sufficient hint, if you still want to think about it some more by yourself!). Here, I’ll add a little spoiler space:
1. (Solution by David Eppstein) As is well known (see, e.g., http://en.wikipedia.org/wiki/Pr%C3%BCfer_sequence), the number of different spanning trees on a set of n labeled nodes is . Equivalently, the number of ways of choosing a spanning tree together with a specification of a single vertex
is
, and the number of ways of choosing a spanning tree together with a specification of two vertices
and
(possibly equal to each other) is
. So that’s the right hand side of the identity.
Now suppose you are given a tree , and two nodes
and
. If
and
are different, let
be the first edge on the path in
from
to
; cutting
at that edge produces two disjoint subtrees, one containing one marked node
and the other containing two (possibly equal) marked nodes, namely
and the first node
on the path after
. Conversely, from this information (two trees, one containing a marked node
and the other containing two marks on nodes
and
) we can put together
simply by connecting the two trees by an edge
. If
is the number of nodes in the tree containing
, the number of ways we can choose two disjoint marked subtrees in this way is
almost the same as the left hand side of the identity, but missing the final term in the sum.
The final term comes from the case when the marked nodes and
of tree T coincide. The number of ways this can happen is the same as the number of ways we can pick a single marked node of a tree, that is,
, which is the same as the final term in the left hand sum.
Thus, the left side counts (partitions of n vertices into two disjoint subtrees, one subtree having one marked node and one subtree having two possibly-equal marks) + (-vertex trees with one marked node); the right side counts (
-vertex trees with two possibly-equal marks), and we have demonstrated a combinatorial equivalence between these two sets.
2. (Solution by Sune Jakobsen) Consider all -tuples
, where each term is from the set
Since each of the
terms can take
values, there are
such tuples.
Given a -tuple,
, construct a graph as follows. Begin with a vertex labeled
. Then, for each vertex labeled
in the graph, if
, add a new vertex labeled
, and connect
and
by an edge. This graph must be a tree since each
only takes one value and
doesn’t exist.
Using this graph, I will count the number of -tuples in another way. Let
be the number of vertices in such a tree graph. The vertices may be chosen in
ways, since the vertex labeled
is already one of them. Given the vertices, the tree can be formed in
ways, by Cayley’s theorem (see http://www.research.att.com/~njas/sequences/A000272). Given the tree graph, the values of the
‘s, for each vertex labeled
in the graph, can be chosen in one and only one way (namely,
is the label of the first vertex after
along the unique path from vertex
to vertex
). The remaining
components of the tuple are not among the vertex labels in the graph, so each takes on one of
possible values, giving
possibilities for the remaining components. Therefore the number of
-tuples must be:
Remarks:
1. I found this curious identity in HAKMEM, item 118. For those who don’t know, HAKMEM is a kind of archive of cool mathematical observations made by some of the original MIT computer “hackers” from the 60′s and 70′s, including Bill Gosper and Rick Schroeppel. This particular item is credited to Schroeppel, but the accompanying text is a bit cryptic in my view:
Differentiateto get
. One observes the curious identity
(
)
and thus.
Maybe it was just their style to record a lot of their observations in such terse, compact form, but it annoys me that these guys hide their light under a bushel in this way. No motivation whatsoever, even though (I’d be willing to bet) these guys knew about the connection to trees — they’re computer scientists, after all!
Personally, I find it easier to get from to
by other means than through their intermediate identity. I feel sure that just about anyone who has played around with enumerative combinatorics, and with the combinatorics of trees in particular, could figure this one out.
For, as David pointed out in his solution, is the number of spanning trees on the set [
]
equipped with a distinguished vertex; I’ll call that vertex the root, and such structures rooted trees. (Incidentally, a spanning tree is by definition an acyclic subgraph of the complete graph on the set [
], such that any two elements of the set are connected or spanned by a path in the subgraph. The theorem of Cayley mentioned above is that there are
such spanning trees.) Thus,
Somehow I find this explanation much easier to understand than the machinations hinted at in HAKMEM 118.
2. David’s proof was actually the one I myself had in mind. I can’t say what inspired David, but I myself was inspired by an earlier reading of a beautiful (and in many respects revolutionary-for-its-time) article, on a systematic functorial approach to enumerative combinatorics:
- André Joyal, Une théorie combinatoire des séries formelles, Adv. Math. 42 (1981), 1-82.
In particular, I am very fond of the proof Joyal gives for Cayley’s theorem (which he credits to Gilbert Labelle), and this proof is in a line of thought which also leads to David’s solution. I’d like to present that proof now.
Labelle’s proof of Cayley’s theorem:
The expression probably makes most people think of the number of functions
: [
]
[
] from an
-element set to itself. The art of combinatorics lies in drawing appropriate pictures, so draw a picture (a graph) of such a function by drawing a directed edge from
to
whenever
(cf. Sune’s solution). Starting from any vertex and iterating
enough times, you always eventually land in a cycle where points get revisited periodically, infinitely often. Let’s call those points periodic points; the function
acts as a permutation on periodic points. Now, for each periodic point
, consider the union of directed paths which end at
without hitting any other periodic points. This union forms a subgraph
which is a tree, rooted at
(again, cf. Sune’s solution). The entire set [
] is thereby partitioned into (equivalence) classes (the underlying vertex sets of the trees
), and the structure of a function
[
]
[
] thus determines the following data:
- An equivalence relation on [
];
- A rooted tree structure on each equivalence class;
- A permutation structure on the set of equivalence classes (each tagged by the periodic point at the root).
Conversely, these three data determine a function, and the correspondence is bijective.
- Remark: It’s not necessary to the proof, but let me add that by basic principles of generatingfunctionology, if
is the egf for permutations [namely,
], and if
is the egf for rooted trees, then
is the egf for structures given by such triplets of data. Thus, by the bijective correspondence, we have
On the other hand, consider a tree structure on
points, and suppose we also specify an ordered pair of such points
, possibly equal. There is a unique path from
to
in
, which I’ll call the spine (of the “bipointed tree”); call the points along that path, including
and
, vertebrae. Now, for each point
, there is a unique shortest path from
to the spine, terminating at a vertebra
. The union of all such paths which terminate at a vertebra
again forms a subtree
rooted at
. Again, the set of
points is partitioned by the (underlying vertex sets of)
, and the structure of a bipointed tree on an
-element set [
] is thus encoded [in bijective fashion] by
- An equivalence relation on [
];
- A rooted tree structure on each equivalence class;
- A spine structure (that is, a linear ordering) on the roots which tag the equivalence classes.
However, the number of linear orderings on an -element set,
, is the same as the number of permutations on that set. We conclude that the number of bipointed tree structures on an
-element set is the same as the number of endofunctions,
. And, voilà! the number of tree structures on the
-element set must therefore be
.
Note: regular solver Philipp Lampe, who submitted a solution similar to David’s, pointed out that there are no fewer than four proofs of Cayley’s theorem given in Aigner and Ziegler’s Proofs from The Book, which I referred to in an earlier post. At this point, I really wish I had that book! I’d be delighted if someone were to post one of those nice proofs in comments…
3. I’m not quite sure, but Sune’s solution just might be my current favorite, just because it makes obvious contact with the circle of ideas which embrace endofunctions, trees, and rooted trees (I think of the tuples there as endofunctions, or actually, partial endofunctions on -element sets). In any event, my sincere thanks go to David, Philipp, and Sune for their insightful responses.

7 comments
Comments feed for this article
June 12, 2008 at 12:06 pm
Todd Trimble
Daniel Rife (UC Santa Barbara) jotted down and gmailed the following interesting suggestion for an alternative proof. I haven’t had much time to think it over, but I thought it merited consideration:
This is more of the flavor of the idea and not a proof, but here is what I noted.
n^n = total number of leaves in all (n+1)^(n-1) trees on {0,1,2,…,n} rooted at 0. So for n=2 we have {0->1,0->2},{0->1->2},{0->2->1} has 4=2^2 total leaves.
j^(j-1) = the number of labeled rooted trees w/ j nodes
The idea is that one way to sum up all the leaves on the trees described for n^n is to count up all the leaves of smaller trees that can expand to the trees in question. The way we can find which trees expand to our larger tree is by taking our larger trees and contracting any subtree they contain.
Each term of the sum chooses points to make a tree out of (nCj), finds all the trees with those nodes (j^(j-1)), and counts the number of leaves when we take those trees out: (n-j)^(n-j). For example, if j=3, we choose 3 points so nChoose3, find all 3^(3-1) trees with j nodes, and count the number of leaves contributed by trees with these 3 nodes taken away: (n-3)^(n-3).
June 12, 2008 at 4:26 pm
Philipp
Dear Todd,
you wanted to know one of the proofs of Cayley’s formula Aigner and Ziegler give in their book. Their first proof is actually Joyal’s / Labelle’s proof you mentioned above.
I will sketch their second proof. It uses linear algebra and it is based on Kirchhoff’s matrix tree theorem. (I think Aigner and Ziegler explain it much better than I do, so I recommend reading their book.)
Let n be a positive integer greater than one. Let us consider the
matrix
given by
for
and
for
. The first thing to notice is
. They don’t give a proof for this, but this is easy. Notice that
is the characteristic polynomial of the matrix
, whose entries are all one, evaluated at
. The matrix
has got two eigenvalues. The eigenvalue 0 has a
-dimensional eigenspace consisting of vectors
with
and multiplicity
. The eigenvalue
has multiplicity
. Thus the characteristic polynomial of
is
. If we plug in
we get
.
Now we have to prove that
is the number of spanning trees on
vertices. Note that
where the matrix
is constructed as follows: We consider a complete graph
on
vertices labelled
. It has got
edges which we label
. The matrix
is a
matrix. Its rows are labelled by
and its columns are labelled by
. If an edge
joins the vertices
then the entries of
in the column corresponding to
are
at position
,
at position
, and 0 otherwise. The identity
follows from the fact that two different vertices in
are joined by exactly one edge and every vertex has
emanating edges.
Now they use the Cauchy-Binet formula. We have
where the sum runs over all subsets of size
and
is the matrix obtained from
by considering only the columns indexed by elements corresponding to
.
To finish the proof they show that
is zero if the edges corresponding to
do not form a spanning tree (which is easy since the matrix has a zero row) and
if the edges corresponding to S form a spanning tree: In this case, one of the vertices
is a leaf. The row i contains only one nonzero entry and this entry is
. Laplace expansion and induction finishes the proof.
PS: I am not familiar with including latex commands in wordpress. I hope everything works.
PPS: There are variations of your identity. The following identity for example is due to V. Babev and holds for
:

June 12, 2008 at 10:12 pm
Todd Trimble
Thanks for writing that out, Philipp! That proof does seem clever and nice, but I don’t have enough insight into the Cauchy-Binet formula that this proof packs quite the same punch for me as Labelle’s. Could be a “cultural” thing, though.
I’m a little puzzled about this Babev identity: am I making a mistake, or is this “trivial”? If I write the identity as
then it looks to me like a special case of the statement that the
order difference operator (that is, the
iteration of the operator which takes a function
to
), applied to the function
, is identically 0. In any event, this type of alternating sum with binomial coefficients looks suspiciously along these lines — did I goof up somewhere?
June 12, 2008 at 11:13 pm
Todd Trimble
That bit about “trivial” may have sounded a bit jerk-y; let me say it more nicely this time.
I mean that when we iterate the differencing operator, we get
and so on. The
difference is
and my comment is that the
difference applied to the function
is identically zero, since the difference of a degree
polynomial is a degree
polynomial. Then we specialize to
to get the Babev identity.
June 13, 2008 at 11:36 am
Philipp
Oh, you’re right. Babev’s identity is indeed very easy to prove. I wasn’t aware of that.
When I came across this identity I proved it in the following way. (By the way I found it in a list of problem proposals for a competition on the AoPS-Website http://www.artofproblemsolving.com/Forum/viewtopic.php?t=160249&sid=bd125f6f61c7ebdc5d92b90f485b66a8)
According to Cayley’s formula, the RHS is the number of spanning trees on the vertex set
. It is enough to prove that the LHS is the same. Note that the LHS is equal to
where the sum runs over all subsets S of V with
. But
is the number of spanning trees in which all vertices in S are leaves: if T is a spanning tree in which the elements in S are leaves, then no two elements in S are linked by an edge. The subgraph on the vertex set
is a spanning tree. There are
possible spanning trees. Each element in S is linked to an element in
which can be done in
ways. By the principle of inclusion and exclusion,
is the number of spanning trees on V.
I had this problem in mind when I solved your POW 4. That’s the reason why I thought of Cayley’s formula.
Below I’ll (very briefly) sketch Aigner’s and Ziegler’s third proof. It is due to Riordan and Renyi and it is based on a recursion. Let
as above. Define
to be the number of forests
that are the union of k disjoint trees such that every
belongs to exactly one
and
belongs to
iff
. If you have such a forest, you can delete the vertex 1. If the vertex 1 is linked to i vertices, the new forest consists of
trees. Its vertex set consists of the
elements
. In this way they obtain a recursion formula
Now they use the recursion formula to prove
. This is proved by induction on n. The induction step is a calculation of a binomial sum using the recursion formula and it is maybe six lines long. Now
gives Cayley’s formula.
July 2, 2008 at 11:16 pm
Todd Trimble
Philipp — thanks for posting this last proof of Cayley’s theorem (and sorry for the delay in acknowledging it) — it’s very pretty. I also like your proof of the Babev identity.
You are right that there are many variations on the identity of POW-4. I just ran across an article in an old American Mathematical Monthly, “More Trees and Power Sums” by James E. Simpson (November, 1986), which gives a whole bunch, e.g., for
,
His methods give pride of place to matrix tree theorems, and also Abel identities (see Catalog # 3900048 here).
March 28, 2009 at 9:06 pm
Number of idempotent endofunctions « Todd and Vishal’s blog
[...] I’ve actually referred to before, in connection with a POW whose solution involves counting tree [...]