You are currently browsing the category archive for the ‘Elementary Math Problem Solving’ category.

High-school students and undergraduates are (almost) always taught the following definition of an equivalence relation.

A binary relation R on a set A is an equivalence iff it satisfies

  • the reflexive property: for all a  in A, a R a,
  • the symmetric property: for all a, b in A, if a R b, then b R a, and
  • the transitive property: for all a, b, c in A, if a R b and b R c, then a R c.

However, there is another formulation of an equivalence relation that one usually doesn’t hear about, as far as I know. And, it is the following one.

A binary relation R on a set A is an equivalence iff it satisfies

  • the reflexive property: for all a  in A, a R a, and
  • the euclidean property: for all a, b, c in A, if a R b and a R c, then b R c.

Exercise:  Show that a binary relation R on a set A is reflexive, symmetric and transitive iff it is reflexive and euclidean.

Last summer, Todd and I discussed a problem and its solution, and I had wondered if it was fit enough to be in the POW-series (on this blog) when he mentioned that the problem might be somewhat too easy for that purpose. Of course, I immediately saw that he was right. But, a few days back, I thought it wouldn’t be bad if we shared this cute problem and its solution over here, the motivation being that some of our readers may perhaps gain something out of it. What is more, an analysis of an egf solution to the problem lends itself naturally to a discussion of combinatorial species. Todd will talk more about it in the second half of this post. Anyway, let’s begin.

PROBLEM: Suppose A = \{ 1,2, \ldots , n \}, where n is a positive natural number. Find the number of endofunctions f: A \rightarrow A satisfying the idempotent property, i.e. f \circ f = f.

It turns out that finding a solution to the above problem is equivalent to counting the number of forests with n nodes and height at most 1, which I found here (click only if you wish to see the answer!) at the Online Encyclopedia of Integer Sequences. If you haven’t clicked on that link yet and wish to solve the problem on your own, then please stop reading beyond this point.

SOLUTION: There are two small (and related) observations that need to be made. And, both are easy ones.

Lemma 1: f has at least one fixed point.

Proof: Pick any i \in A and let f(i) = j, where j \in A. Then, using the idempotent property, we have f(f(i)) = f(i), which implies f(j) = j. Therefore, j is a fixed point, and this proves our lemma.

Lemma 2: The elements in A that are not fixed points are mapped to fixed points of f.

Proof: Supposej \in A is not a fixed point such that f(j) = k.  Then, using the idempotent property again, we immediately have f(f(j)) = f(j), which implies f(k) = k, thereby establishing the fact that k itself is a fixed point. Hence, j (which is not a fixed point) is mapped to some fixed point of f.

In both the lemmas above, the idempotent property “forces” everything.

Now, the solution is right before our eyes! Suppose f has m fixed points. Then there are \displaystyle \binom{n}{m} ways of choosing them. And, each of the remaining n - m elements of A that are not fixed points are to be mapped to any one of the m fixed points. And, there are a total of m^{n-m} ways of doing that. So, summing over all m, our final answer is \displaystyle \sum_{m=0}^{n} \binom{n}{m} m^{n-m}.

Exponential Generating Function and Introduction to Species

Hi; Todd here. Vishal asked whether I would discuss this problem from the point of view of exponential generating functions (or egf’s), and also from a categorical point of view, using the concept of species of structure, which gives the basis for a categorical or structural approach to generatingfunctionology.

I’ll probably need to write a new post of my own to do any sort of justice to these topics, but maybe I can whet the reader’s appetite by talking a little about the underlying philosophy, followed by a quick but possibly cryptic wrap-up which I could come back to later for illustrative purposes.

Enumerative combinatorics studies the problem of counting the number a_n of combinatorial structures of some type on an n-element set, such as the number of idempotent functions on that set, or the number of equivalence relations, and so on. A powerful idea in enumerative combinatorics is the idea of a generating function, where we study the series a_n by rolling them into a single analytic function, such as

\displaystyle A(x) = \sum_{n \geq 0} \frac{a_n x^n}{n!},

(this the so-called “exponential” generating function of \{a_n\}_{n \geq 0}). In many cases of interest, the function A(x) will be recognizable in terms of operations familiar from calculus (addition, multiplication, differentiation, composition, etc.), and this can then be used to extract information about the series a_n, such as explicit formulas, asymptotics, and so on. If you’ve never seen this idea in action, you should definitely take a look at Wilf’s book generatingfunctionology, or at the book Concrete Mathematics by Graham, Knuth and Patashnik.

Each of the basic operations one performs on analytic functions (addition, multiplication, composition, etc.) will, it turns out, correspond to some set-theoretic operation directly at the level of combinatorial structures, and one of the trade secrets of generating function technology is to have very clear pictures of the combinatorial structures being counted, and how these pictures are built up using these basic structural operations.

In fact, why don’t we start right now, and figure out what some of these structural operations would be? In other words, let’s ask ourselves: if A(x) and B(x) are generating functions for counting combinatorial structures of type (or species) A and B, then what types of structures would the function A(x) + B(x) “count”?  How about A(x)B(x)? Composition A(B(x))?

The case of A(x) + B(x) is easy: writing

\displaystyle A(x) + B(x) = \sum_{n \geq 0} \frac{a_n x^n}{n!} + \sum_{n \geq 0} \frac{b_n x^n}{n!} = \sum_{n \geq 0} \frac{(a_n + b_n) x^n}{n!},

and thinking of a_n as counting structures of type A on an n-element set, and b_n as counting structures of type B, the quantity a_n + b_n counts elements in the disjoint union of the sets of A-structures and B-structures.

In the categorical approach we will discuss later, we actually think of structure types (or species of structure) A as functors, which take an n-element set S to the set A\left[S\right] of structures of type A on S. Here, we have to be a little bit careful about what categories we’re talking about, but the general idea is that if we have a bijection f: S \to T from one n-element set to another, then it should always be possible to “transport” A-structures on S to A-structures on T, simply by relabeling points along the bijection f. So, let us define a species to be a functor

A: FB \to Set

where FB is the category of finite sets and bijections (not all functions, just bijections!), and Set is the category of sets. In enumerative combinatorics, the set A\left[S\right] is normally assumed to be finite, but in other applications of the notion of species, we actually allow a lot more latitude, and allow the functor A to map into other categories C, not just Set (“C-valued species”). But if we stick for now just to set-valued species A, B, then we define the species A + B by the functorial formula

\displaystyle (A + B)\left[S\right] = A\left[S\right] \sqcup B\left[S\right]

where \sqcup denotes disjoint union. So addition of generating functions will correspond to the concrete operation of taking disjoint unions of sets of combinatorial species.

More interesting is the case of multiplication. Let’s calculate the product of two egf’s:

\displaystyle A(x) B(x) = (\sum_{j \geq 0} \frac{a_j x^j}{j!})(\sum_{k \geq 0} \frac{b_k x^k}{k!}) = \sum_{n \geq 0} (\sum_{j + k = n} \frac{n!}{j! k!} a_j b_k) \frac{x^n}{n!}

The question is: what type of structure does the expression \displaystyle \sum_{j+k = n} \frac{n!}{j! k!} a_j b_k “count”? Look at the individual terms: the binomial coefficient \displaystyle \frac{n!}{j! k!} describes the number of ways of decomposing an n-element set into two disjoint subsets, one with j elements and the other with k, where j and k add to n. Then, a_j is the number of ways of putting an A-structure on the j-element part, and b_k is the number of B-structures on the k-element part.

This suggests a new operation on structure types: given structure types or species A, B, we define a new species A \otimes B according to the formula

\displaystyle (A \otimes B)\left[S\right] = \bigsqcup_{T \sqcup U = S} A\left[T\right] \times B\left[U\right]

(that is, a structure of type A \otimes B on a set S is an ordered pair, consisting of an A-structure on a subset of S and a B-structure on its complement). This functorial operation is usually called the “convolution product” of the combinatorial species A, B: it is the concrete set-theoretic operation which corresponds to multiplication of generating functions.

Finally, let’s look at composition A(B(x)). Here we make the technical assumption that b_0 = 0 (no B-structures on the empty set!), so that we won’t have divergence issues blowing up in our faces: we want to remain safely within the realm of finitary structures. Okay, then, what type of combinatorial structure does this egf count?

Perhaps not surprisingly, this is rather more challenging than the previous two examples. In analytic function language, we are trying here to give a meaning to the Taylor coefficients of a composite function in terms of the Taylor coefficients of the original functions — for this, there is a famous formula attributed to Faà di Bruno, which we then want to interpret combinatorially. If you don’t already know this but want to think about this on your own, then stop reading! But I’ll just give away the answer, and say no more for now about where it comes from, although there’s a good chance you can figure it out just by staring at it for a while, possibly with paper and pen in hand.

Definition: Let A, B: FB \to Fin be species (functors from finite sets and bijections to finite sets), and assume B\left[\emptyset\right] = \emptyset. The substitution product A \circ B is defined by the formula

\displaystyle (A \circ B)\left[S\right] = \sum_{E \in Eq(S)} A\left[S/E\right] \times \prod_{c \in S/E} B\left[c\right]

This clearly requires some explanation. The sum here denotes disjoint union, and Eq(S) denotes the set of equivalence relations on the finite set S. So E here is an equivalence relation, which partitions S into nonempty sets c (E-equivalence classes). And the quotient S/E denotes the set of such equivalence classes (so we think of each class c as a point of S/E). What this formula says is that a structure of type A \circ B on S consists of a partition of S into a bunch of non-empty blobs, a B-structure on each blob, and then an A-structure on the set of blobs.

It’s high time for an example! So let’s look at Vishal’s problem, and see if we can picture it in terms of these operations. We’re going to need some basic functions (or functors!) to apply these operations to, and out of thin air I’ll pluck the two basic ones we’ll need:

\displaystyle E(x) = \exp(x) = \sum_{n \geq 0} \frac{x^n}{n!}

F(x) = x

The first is the generating function for the series e_n = 1. So for the species E, there’s just one structure of type E for each set S (in categorical language, the functor E: FB \to Set is the terminal functor). We can just think of that structure as the set S itself, if we like, with no other structure appended thereon.

For F, we have f_n = 0 unless n = 1, where f_1 = 1. So F is the species for the one-element set structure (meaning that F\left[S\right] = \emptyset unless S has cardinality 1, in which case F\left[S\right] = \{S\}).

Okay, on to Vishal’s example. He was counting the number of idempotent functions f: S \to S, and now, as promised, I want to determine the corresponding egf. You might be able to find it by looking at his formula, but obviously I want to use the ideas I’ve developed thus far, which focuses much more on the pictures. So, let’s picture f: S \to S, first as Vishal did, by thinking of the elements of S as ‘nodes’, and then drawing a directed edge from node x to node y if f(x) = y. (Then, by idempotence of f, y will be a fixed point of f. Let’s agree not to bother drawing an edge from y to itself, if y is already a fixed point.)

In this picture, we get a directed graph which consists of a disjoint union of “sprouts”: little bushes, each rooted at a fixed point of f, whose only other nodes are “leaves” joined to the root by an edge. We can simplify the picture a little: if you put a circle around each sprout, you don’t need the edges at all: just mark one of the points inside as the root, and you know what to do.

So we arrive at a picture of an idempotent function on S: a partition of S into a collection of (nonempty) blobs, and inside each blob, one of the points is marked “root”. In terms of our operations, what does it mean to mark a point in a blob? It just means: break the blob into two pieces, where one piece is given the structure of “one-element set”, and the other piece is just itself. In terms of the ideas developed above, this means each blob carries a F \otimes E structure; we’ll suggestively write this structure type as X \otimes \exp(X).

In this picture of idempotent f, there is no extra combinatorial structure imposed on the set of blobs, beyond the set itself. In other words, in this picture, the set of blobs carries merely an “E-structure”, nothing more.

So, putting all this together, we picture an idempotent function on S as a partition or equivalence relation on S, together with an assignment of a marked point in each equivalence class. In the language of species operations, we may therefore identify the structure type of idempotent functions with

E \circ (F \otimes E)

or more suggestively, \exp \circ (X \otimes \exp(X)). The exponential generating function is, of course, e^{x e^x}!

In summary, the theory of species is a functorial calculus which projects onto its better-known “shadow”, the functional calculus of generating functions. That is to say, we lift operations on enumeration sequences \{a_n\}, as embodied in their generating functions, directly up to the level of the sets we’re counting, where the functorial operations become both richer and more concrete. The functorial analogue of the generating function itself is called the “analytic functor” attached to the species (the species itself being the concrete embodiment of the enumeration).

Much more could be said, of course. Instead, here’s a little exercise which can be solved by working through the ideas presented here: write down the egf for the number of ways a group of people can be split into pairs, and give an explicit formula for this number. Those of you who have studied quantum field theory may recognize this already (and certainly the egf is very suggestive!) ; in that case, you might find interesting the paper by Baez and Dolan, From Finite Sets to Feynman Diagrams, where the functorial point of view is used to shed light on, e.g., creation and annihilation operators in terms of simple combinatorial operations.

The literature on species (in all its guises) is enormous, but I’d strongly recommend reading the original paper on the subject:

  • André Joyal, Une théorie combinatoire des séries formelles, Adv. Math. 42 (1981), 1-82.

which I’ve actually referred to before, in connection with a POW whose solution involves counting tree structures. Joyal could be considered to be a founding father of what I would call the “Montreal school of combinatorics”, of which a fine representative text would be

  • F. Bergeron, G. Labelle, and P. Leroux, Combinatorial Species and Tree-like Structures, Encyclopedia of Mathematics and its Applications 67, 1998.

More to come, I hope!

I thought I would share with our chess-loving readers the following interesting (and somewhat well-known) mathematical chess paradox , apparently proving that 64=65, and the accompanying explanation offered by Prof. Christian Hesse, University of Stuttgart (Germany).  It shows a curious connection between the well-known Cassini’s identity (related to Fibonacci numbers) and the 8 \times 8 chessboard (8 being a Fibonacci number!). The connection can be exploited further to come up with similar paradoxes wherein any F_n \times F_n -square can always be “rerranged” to form a F_{n-1} \times F_{n+1} -rectangle such that the difference between their areas is either +1 or -1. Of course, for the curious reader there are plenty of such dissection problems listed in Prof David Eppstein’s Dissection page.

Happy holidays, folks! And happy birthday to Topological Musings, which Vishal started just a little over a year ago — we’ve both had a really good time with it.

And in particular with the Problem of the Week series. Which, as we all know, doesn’t come out every week  — but then again, to pinch a line from John Baez’s This Week’s Finds, we never said it would: only that each time a new problem comes out, it’s always the problem for that week! Anyway, we’ve been very gratified by the response and by the ingenious solutions we routinely receive — please keep ’em coming.

This problem comes courtesy of regular problem-solver Arin Chaudhuri and — I’ll freely admit — this one defeated me. I didn’t feel a bit bad though when Arin revealed his very elegant solution, and I’d like to see what our readers come up with. Here it is:

What is the maximum number of colors you could use to color the vertices of an 8-dimensional cube, so that starting from any vertex, each color occurs as the color of some neighbor of that vertex? (Call two vertices neighbors if they are the two endpoints of an edge of the cube.)

Arin, Vishal, and I would be interested and pleased if someone solved this problem for all dimensions n.

Please submit solutions to topological[dot]musings[At]gmail[dot]com by Friday, January 2, 2009, 11:59 pm (UTC); do not submit solutions in Comments. Everyone with a correct solution will be inducted into our Hall of Fame! We look forward to your response.

The following “polynomial-logarithmic” algebraic identity that one encounters on many occasions turns out to have a rather useful set of applications! 

POLYNOMIAL-LOGARITHMIC IDENTITY: If P(z) is a polynomial of degree n \ge 1 with roots a_1, a_2, \ldots, a_n, then \displaystyle \frac{P'(z)}{P(z)} = \frac1{z-a_1} + \frac1{z-a_2} + \ldots + \frac1{z-a_n}.

PROOF: This one is left as a simple exercise. (Hint: Logarithms!)

A nice application of the above identity is found in one of the exercises from the chapter titled Analysis (p120) in Proofs from the Book by Aigner, Ziegler and Hofmann. 

EXERCISE: Let p(x) be a non-constant polynomial with only real zeros. Show that p'(x)^2 \ge p(x) p''(x) for all x \in \mathbb{R}.

SOLUTION: If x = a_i is a zero of p(x), then the right hand side of the above inequality equals zero, and we are done. So, suppose x is not a root of p(x). Then, differentiating the above identity w.r.t. x, we obtain \displaystyle \frac{p''(x)p(x) - p'(x)^2}{p(x)^2} = - \sum_{k=1}^n \frac1{(x - a_k)^2} < 0, and we are done.

It turns out that the above identity can also used to prove the well-known Gauss-Lucas theorem.

GAUSS-LUCAS: If P is a non-constant polynomial, then the zeros of P' lie in the convex hull of the roots of P.

PROOF: See this

HISTORY: The well-known Russian author V.V. Prasolov in his book Polynomials offers a brief and interesting historical background of the theorem, in which he points out that Gauss’ original proof (in 1836) of a variant of the theorem was motivated by physical concepts, and it was only in 1874 that F. Lucas, a French Engineer, formulated and proved the above theorem. (Note that the Gauss-Lucas theorem can also be thought of as some sort of a generalization (at least, in spirit!) of Rolle’s theorem.)

Even though I knew the aforesaid identity before, it was once again brought to my attention through a nice (and elementary) article, titled On an Algebraic Identity by Roberto Bosch Cabrera, available at Mathematical Reflections. In particular, Cabrera offers a simple solution, based on an application of the given identity, to the following problem (posed in the 2006 4th issue of Mathematical Reflections), the solution to which had either escaped regular problem solvers or required knowledge of some tedious (albeit elementary) technique. 

PROBLEM: Evaluate the sum \displaystyle \sum_{k=0}^{n-1} \frac1{1 + 8\sin^2 (k\pi /n)}. (proposed by Dorin Andrica and Mihai Piticari.)

SOLUTION: (Read Cabrera’s article.)

There is yet another problem which has a nice solution based again on our beloved identity!

PROBLEM: (Putnam A3/2005) Let p(z) be a polynomial of degree n, all of whose zeros have absolute value 1 in the complex plane. Put g(z) = p(z)/z^{n/2}. Show that all zeros of g'(z) = 0 have absolute value 1.

SOLUTION: (Again, read Cabrera’s article.)

The solutions to POW-11 are in! Or make that solution, unless you also count your hosts’: only one reader responded. Perhaps everyone has been distracted by the upcoming US presidential election?!

Composite solution by Philipp Lampe, University of Bonn, and Todd and Vishal: We claim there are exactly two functions h: \mathbb{N} \to \mathbb{N} which satisfy the identity h(p^2 + q^2) = h(p)^2 + h(q)^2: the identity function and the function which is identically zero.

Clearly h(0) = h(0^2 + 0^2) = 2h(0)^2 forces h(0) = 0, whence h(1) = h(1^2 + 0^2) = h(1)^2.  Therefore h(1) = 0 or h(1) = 1, and the claim above follows if we show that h(p) = ph(1) for all p \in \mathbb{N}. The first few cases are built up as follows:

h(2) = h(1^2 + 1^2) = 2h(1)^2 = 2h(1)

h(4) = h(2^2) = (2h(1))^2 = 4h(1)

h(5) = h(2^2 + 1^2) = (2h(1))^2 + h(1)^2 = 5h(1)

h(3) = \sqrt{h(3^2 + 4^2) - h(4)^2} = \sqrt{h(5)^2 - h(4)^2} = \sqrt{(25-16)h(1)^2}

= 3h(1)

h(8) = h(2^2 + 2^2) = (2h(1))^2 + (2h(1))^2 = 8h(1)

h(10) = h(3^2 + 1^2) = (3h(1))^2 + h(1)^2 = 10h(1)

h(6) = \sqrt{h(6^2 + 8^2) - h(8)^2} = \sqrt{h(10)^2 - h(8)^2}

= \sqrt{(100-64)h(1)^2} = 6h(1)

where in a few places we have invoked the simple lemma that h(p^2) = h(p)^2 [since h(p^2 + 0^2) = h(p)^2 + 0^2]. The remainder of the argument proceeds by strong induction. For odd p = 2m+1 > 5, we have

(2m+1)^2 + (m-2)^2 = (2m-1)^2 + (m+2)^2 \qquad (1)

Since h preserves sums of two squares,  we derive

h(2m+1)^2 + h(m-2)^2 = h(2m-1)^2 + h(m+2)^2

Therefore, using the inductive hypothesis that h(q) = qh(1) for q < 2m+1, we have

h(2m+1)^2 + (m-2)^2h(1) = (2m-1)^2 h(1) + (m+2)^2 h(1) \qquad (2)

From equation (1) we also have

(2m+1)^2h(1) + (m-2)^2h(1) = (2m-1)^2h(1) + (m+2)^2h(1) \qquad (3)

Comparing equations (2) and (3), we infer h(2m+1) = (2m+1)h(1). In particular, h(q) = qh(1) for q = 7, 9, and we have the case h(8) = 8h(1) from before. This will be used to start the induction for even p, starting from p = 10.

For even p = 2m+2, where m \geq 4, we have

(2m + 2)^2 + (m-4)^2 = (2m-2)^2 + (m+4)^2 \qquad (4)

and by arguing as we did above, we derive the two equations

h(2m+2)^2 + h(m-4)^2 = h(2m-2)^2 + h(m+4)^2

h(2m+2)^2 + (m-4)^2h(1) = (2m-2)^2h(1) + (m+4)^2h(1)

where the second follows from the first by inductive hypothesis. Multiplying (4) through by h(1) and comparing with the last equation, we conclude h(2m+2) = (2m+2)h(1), and this completes the proof.


The underlying idea is to exploit the fact that many numbers can be expressed as a sum of two squares in more than one way, so that one can generate enough relations of the form h(p)^2 + h(q)^2 = h(r)^2 + h(s)^2 to nail down all the h(p) [in terms of h(1)]. A powerful way of generating equations of the form p^2 + q^2 = r^2 + s^2 is to consider prime factorizations in the ring of Gaussian integers \mathbb{Z}\left[i\right]. For example, in the Gaussian integers we have the prime factorizations

5 = (2 + i)(2-i) \qquad 13 = (2 + 3i)(2-3i)

so that there are two ways of writing their product 65 in the form z \cdot \bar{z}:

(2+i)(2+3i) \cdot (2-i)(2-3i), \qquad (2+i)(2-3i) \cdot (2-i)(2+3i)

Here we have, respectively, z = 1 + 8i, z = 7 - 4i, whence 65 = z \cdot \bar{z} = 1^2 + 8^2 = 7^2 + 4^2. Extrapolating from this, if we set

z = (a+bi)(c+di), \qquad z' = (a+bi)(c-di)

then we have |z|^2 = z \cdot \bar{z} = z' \cdot \bar{z'} = |z'|^2, whence we derive

(ac - bd)^2 + (bc + ad)^2 = (ac + bd)^2 + (bc - ad)^2.

This gives a large supply of available relations to work with, and from there it is not too bad to get an inductive argument up and flying, once some base cases have been cleared out of the way with some numerical experimentation.

[Although it’s a little unusual to need such an outlay of base cases (up to at least p = 6) before one can start the induction; it creates a psychological hurdle, I feel. It reminds me of POW-6, where no complete solutions were received: one has to fiddle with a number of base cases before the general argument can be pushed through.]

Incidentally, while researching this problem, I found the wonderful website MathPages, created by an all but anonymous author named Kevin Brown. I really recommend checking out this website, and I hope that it is somehow kept alive: it would be a shame for such a treasure trove to be buried and forgotten in some deep hollow of cyberspace.

Time for another problem of the week! This one doesn’t necessarily require any number-theoretic knowledge, although a little bit might help point you in the right direction. As usual, let \mathbb{N} be the set of natural numbers, i.e., the set of nonnegative integers.

Describe all functions h: \mathbb{N} \to \mathbb{N} such that h(p^2 + q^2) = h(p)^2 + h(q)^2 for all p, q \in \mathbb{N}.

Please submit solutions to topological[dot]musings[At]gmail[dot]com by Sunday, November 2, 11:59 pm (UTC); do not submit solutions in Comments. Everyone with a correct solution will be inducted into our Hall of Fame! We look forward to your response.

The solutions are in! As is so often the case, solvers came up with a number of different approaches; the one which follows broadly represents the type of solution that came up most often. I’ll mention some others in the remarks below, some of it connected with the lore of Richard Feynman.

Solution by Nilay Vaish: The answer to POW-10 is \displaystyle \frac{\pi}{2}\ln 2. Put

\displaystyle I = \int_{0}^{\pi/2} \frac{x}{\tan x} dx = \int_{0}^{\pi/2} \frac{x\cos x}{\sin x} dx

and integrate by parts: we have

\displaystyle I = x\ln \sin x |_{0}^{\pi/2} - \int_{0}^{\pi/2} \ln \sin x dx.

The first term vanishes by a simple application of L’hôpital’s rule. We now have

\displaystyle I \stackrel{(1)}{=} -\int_{0}^{\pi/2} \ln \sin x dx = -\int_{0}^{\pi/2} \ln \cos x dx

where the second equation follows from \cos x = \sin(\pi/2 - x), and the general elementary fact that \int_{b}^{c} f(x) dx = \int_{a-c}^{a-b} f(a-x) dx. Adding the last two displayed equations, we obtain

\displaystyle 2I = -\int_{0}^{\pi/2} \ln (\sin (x) \cos (x)) dx = -\int_{0}^{\pi/2} \ln(\frac{\sin 2x}{2}) dx.

This gives

\displaystyle 2I = \int_{0}^{\pi/2} \ln 2 dx - \int_{0}^{\pi/2} \ln \sin 2x dx = \frac{\pi \ln 2}{2} - \frac1{2} \int_{0}^{\pi} \ln \sin t dt \qquad (2)

after a simple substitution. The last integral splits up as two integrals:

\displaystyle \int_{0}^{\pi} \ln \sin t dt = \int_{0}^{\pi/2} \ln \sin t dt + \int_{\pi/2}^{\pi} \ln \sin t dt \qquad (3)

but these two integrals are equal, using the identity \sin t = \sin(\pi - t) together with the general integration fact cited above. Hence the two sides of (3) equal

\displaystyle 2\int_{0}^{\pi/2} \ln \sin t dt = -2I \qquad (4),

recalling equation (1) above. Substituting this for the last integral in equation (2), we arrive at

\displaystyle 2I = \frac{\pi}{2} \ln 2 + I

whence we derive the value of the desired integral I .


1. A number of solvers exploited variations on the theme of the solution above, which could be summarized as involving symmetry considerations together with a double angle formula. For example, Philipp Lampe and Paul Shearer in their solutions made essential use of the identity

2 \cot(2x) = \cot x - \tan x

in conjunction with the complementarity \cot x = \tan(\pi/2 - x), and the general integration fact cited above.

2. Arin Chaudhuri (and Vishal in private email) pointed out to me that the evaluation of the integral

\displaystyle \int_{0}^{\pi/2} \ln \sin x dx

is actually fairly well-known: it appears for example in Complex Analysis by Ahlfors (3rd edition, p. 160) to illustrate contour integration of a complex analytic function via the calculus of residues, and no doubt occurs in any number of other places.

3. Indeed, Simon Tyler in his solution referred to this as an integral of Clausen type, and gave a clever method for evaluating it: we have

\displaystyle \int_{0}^{\pi/2} \ln \sin x dx = \frac1{2}\int_{0}^{\pi/2} \ln \frac1{2} (1 - \cos 2x) dx

which works out to

\displaystyle -\frac{\pi \ln 2}{4} + \frac1{4}\int_{0}^{\pi} \ln(1 - \cos x) dx \qquad (*)

The last integral can be expanded as a series

\displaystyle -\sum_{n \geq 1} \int_{0}^{\pi} \frac1{n} \cos^n(x) dx

where the summands for odd n vanish; the other terms can be collected and then resummed to yield

\displaystyle \frac1{2} \int_{0}^{\pi} \ln(1 - \cos^2 x) dx = \int_{0}^{\pi} \ln \sin x dx = 2 \int_{0}^{\pi/2} \ln \sin x dx

and by substituting this for the integral in (*), the original integral is easily evaluated.

4. Arin C. later described still another method which he says he got from an exercise in a book by Apostol — it’s close in spirit to the one I myself had in mind, called “differentiation under the integral sign”, famously referred to in Surely You’re Joking, Mr. Feynman!. As Feynman recounts, he never really learned fancy methods like complex contour integration, but he did pick up this method of differentiating under the integral sign from an old calculus book. In typical Feynman style, he writes,

“It turns out that’s not taught very much in the universities; they don’t emphasize it. But I caught on how to use that method, and I used that one damn tool again and again. So because I was self-taught using that book [Wilson’s Advanced Calculus], I had peculiar methods of doing integrals. The result was, when guys at MIT or Princeton had trouble doing a certain integral, it was because they couldn’t do it with the standard methods they had learned in school. If it was contour integration, they would have found it; if it was a simple series expansion, they would have found it. Then I come along and try differentiating under the integral sign, and often it worked. So I got a great reputation for doing integrals, only because my box of tools was different from everybody else’s, and they had tried all their tools on it before giving the problem to me.”

So, what is this method? Wikipedia has a pretty good article on it; the idea is to view a given definite integral \int_{a}^{b} f(x) dx as an instance of a smoothly parametrized family G(t) = \int_{a}^{b} F(x, t) dx of integrals, where the extra parameter t is inserted at some judicious spot, in such a way that G(t) has an easy-to-integrate derivative. Then one figures out G, and evaluates it at the t which yields the original definite integral.

The best way to understand it is through an example. Pretend you’re Feynman, and someone hands you

\displaystyle \int_{0}^{1} \frac{x^{10} - 1}{\ln x} dx

[I suppose there may be a contour integration method to handle that one, but never mind — you’re Feynman now, and not supposed to know how to do that!] After fiddling around a little, you find that by inserting a parameter t in the exponent,

\displaystyle G(t) := \int_{0}^1 \frac{x^{10 t} - 1}{\ln x} dx,

this G(t) has a very simple derivative:

\displaystyle G'(t) = \frac{d}{dt} \int_{0}^{1} \frac{x^{10 t} - 1}{\ln x} dx = \int_{0}^{1} \frac{\partial}{\partial t} \frac{x^{10 t} - 1}{\ln x} dx

i.e., by differentiating under the integral sign, you manage to kill off that annoying factor \ln x in the denominator:

\displaystyle G'(t) = \int_{0}^{1} 10 x^{10 t} dx = \frac{10 x^{10 t + 1}}{10 t + 1} |_{0}^{1} = \frac{10}{10 t + 1}.

We therefore have

G(t) = \ln(10 t + 1) + C,

and notice that from the original definition of G(t), we have G(0) = 0. Thus C = 0, and the problem integral evaluates to G(1) = \ln 11. Bam!

It takes a little experience to judge where or how to stick in the extra parameter t to make this method work, but the Wikipedia article has some good practice problems, including the integral of POW-10. For this problem they recommend considering

\displaystyle G(t) = \int_{0}^{\pi/2} \frac{\tan^{-1} (t\tan x)}{\tan x} dx

and I’ll let you the reader take it from there.

The point is not that this method is much more elegant than others, but that a little practice with it should be enough to convince one that it’s incredibly powerful, and often succeeds where other methods fail. (Not always, of course, and Feynman had to concede defeat when in response to a challenge, his pal Paul Olum concocted some hellacious integral which he said could only be done via contour integration.)

Doron Zeilberger is also fond of this method (as witnessed by this paper). Something about it makes me wonder whether it’s secretly connected with the idea of “creative telescoping” and the powerful methods (e.g., see here) developed by Gosper, Wilf, Zeilberger, and others to establish identities of hypergeometric type. But I haven’t had time to consider this carefully.

Also solved by Arin Chaudhuri, Philipp Lampe (University of Bonn), Paul Shearer (University of Michigan), and Simon Tyler. Thanks to all who wrote in!

Another problem of the week has been long overdue. Here’s one that may appeal to those who liked POW-5 (maybe we should have an integration problem every fifth time?):

Evaluate \displaystyle \int_{0}^{\pi/2} \frac{x}{\tan x} dx.

Please submit solutions to topological[dot]musings[At]gmail[dot]com by Friday, October 10, 11:59 pm (UTC); do not submit solutions in Comments. Everyone with a correct solution will be inducted into our Hall of Fame! We look forward to your response.

It’s been quite a while since I’ve posted anything; I’ve been very busy with non-mathematical things this past month, and probably will be for at least another few weeks. In the meantime, I’d been idly thinking of another Problem of the Week to post, and while I was thinking of one such problem… I got stuck. So instead of just baldly posing the problem, this time I’ll explain how far as I got with it, and then ask you all for help.

The problem is easy to state: except for 1, are there any positive integers which are simultaneously triangular, square, and pentagonal numbers?

Just asking for all numbers which are simultaneously triangular and square is a fun little exercise in number theory, involving Pell’s equation and (yes) continued fractions. I’ll sketch how this goes. We are trying to solve the Diophantine equation

\displaystyle m^2 = \frac{n(n+1)}{2}

By completing the square and manipulating a little, we get

8m^2 = (2n+1)^2 - 1.

This converts to a Pell equation x^2 - 2y^2 = 1 where x = 2n+1, y = 2m. Writing the Pell equation as

N(x + \sqrt{2}y) := (x + \sqrt{2}y)(x - \sqrt{2}y) = 1,

and using the fact that the norm map N: \mathbb{Q}(\sqrt{2}) \to \mathbb{Q} just defined is a multiplicative homomorphism (because the Galois automorphism x + \sqrt{2}y \mapsto x - \sqrt{2}y is also multiplicative), we see that the set of solutions

\{(x, y): N(x + \sqrt{2}y) = 1\}

forms a group, under the group multiplication law

(x, y) \cdot (x', y') = (x x' + 2y y', x y' + y x')

which is read off just by expanding (x + \sqrt{2}y)(x' + \sqrt{2}y'). The right branch of the hyperbola x^2 - 2y^2 = 1 thus forms a group (under this multiplication law, with identity (1, 0)) which is isomorphic to the additive group of real numbers. The set of integer solutions sitting on this branch forms a discrete subgroup, and hence is isomorphic to the additive group of integers (provided that nontrivial solutions exist!). To find a generator of this group, just look for the integer solutions to x^2 - 2y^2 = 1 which come closest to the identity (1, 0); they are (x, y) = (3, 2), (3, -2).

So, (3, 2) generates the set of positive integer solutions. The next is

(3, 2)^2 = (3 \cdot 3 + 2(2 \cdot 2), 3 \cdot 2 + 2 \cdot 3) = (17, 12)

and in general, the solutions (x_n, y_n) = (3, 2)^n are given recursively by

(x_n, y_n) = (3x_{n-1} + 4y_{n-1}, 2x_{n-1} + 3y_{n-1}).

Let’s go back a minute to our original problem, which asked for numbers which are both triangular and square. We read off the k^{th} triangle-square, m_k^2 = n_k(n_k+1)/2, from (x_k, y_k) = (2n_k + 1, 2m_k). For example,

(x_2, y_2) = (17, 12) \Rightarrow (n_2, m_2) = (8, 6)

and so the square 36 = 6^2 is simultaneously the triangular number 1 + 2 + \ldots + 8. From (x_3, y_3) = (99, 70), we have (m_3, n_3) = (49, 35), and so 35^2 is the next triangle-square. The next after that is 204^2. And so on.

It would be nice to express these solutions in closed form. To this end, let me observe that the quotients x_k/y_k = 3/2, 17/12, 99/70, \ldots are rational approximants to the continued fraction of \sqrt{2}:

3/2 = 1 + 1/2,

17/12 = 1 + 1/(2 + 1/(2 + 1/2)),

99/70 = 1 + 1/(2 + 1/(2 + 1/2 + 1/(2 + 1/2), \ldots

(A similar observation applies to any Pell equation: positive solutions to x^2 - Dy^2 = 1, where D is square-free, can be read off by examining every other rational approximant to the continued fraction for \sqrt{D}. This, I find, is a very cool fact.)

Starting from the first two rational approximants p_0/q_0 = 1/1, p_1/q_1 = 3/2, the remaining approximants p_k/q_k can be read off from the recursive rule (*)

p_{k+1} = 2p_k + p_{k-1} \qquad q_{k+1} = 2q_k + q_{k-1}

and for the purpose of determining the triangle-squares, we are really interested only in every other one of the denominators q_k, that is to say, q_1, q_3, q_5, \ldots. We can get a recursive rule for these from

q_{k+2} = 2q_{k+1} + q_k = 2(2q_k + q_{k-1}) + q_k = 6q_k - q_{k-2}

where the last equation follows easily from the recursive rule (*). The same recursion thus gives the y_k and m_k:

y_k = 2, 12, 70, 408, \ldots,

m_k = 1, 6, 35, 204, \ldots,

namely m_k = 6m_{k-1} - m_{k-2}.

To get a closed form for the m_k (whose squares are triangular), we can now apply the familiar method of generating functions (see here for a worked example of this method to derive the closed form for the Fibonacci numbers). Form

f(x) = m_1 + m_2 x + m_3 x^2 + \ldots = 1 + 6x + 35x^2 + \ldots

and use the recursive rule for the m_k to deduce (after some manipulation) that

\displaystyle f(x) = \frac1{1 - 6x + x^2}.

The denominator expands as (1 - (3 + \sqrt{8})x)(1 - (3 - \sqrt{8})x); by the method of partial fractions, we obtain

\displaystyle f(x) = \frac{(3 + \sqrt{8})/2\sqrt{8}}{1 - (3 + \sqrt{8})x} - \frac{(3 - \sqrt{8})/2\sqrt{8}}{1 - (3 - \sqrt{8})x}

and from there, expanding the preceding line in geometric series, we derive a closed form for the m_k:

\displaystyle m_k = \frac1{2\sqrt{8}}((3 + \sqrt{8})^k - (3 - \sqrt{8})^k)

Nice! If you prefer, this is just the integer part of \displaystyle \frac1{2\sqrt{8}}(3 + \sqrt{8})^k.

So, the k^{th} triangular square is

\displaystyle \frac1{32}((3 + \sqrt{8})^{2k} + (3 - \sqrt{8})^{2k} - 2).


Having gone through this analysis, the same method can be used to determine those squares which are pentagonal (the first few pentagonal numbers are 1, 1+4, 1+4+7, 1+4+7+10, \ldots). I’ll let readers work this out if they like; suffice it to say that the k^{th} square which is also pentagonal is the square of

\displaystyle \frac1{2\sqrt{24}}((5 + \sqrt{24})^{2k-1} - (5 - \sqrt{24})^{2k-1}).

And that’s about as far as I got with this. After trying a few numerical forays with a hand-held calculator, my gut feeling is that I’d be amazed if there were any triangular-square-pentagonals past m = 1. But can anyone clinch the case?

[Edit: I’ve corrected the exponents in the last displayed line, from k to 2k-1.]

Our other blog

Visitors to this blog

Blog Stats

  • 372,989 hits

Wikio Ranking

Wikio - Top Blogs - Sciences

Current Online Readers


May 2023