You are currently browsing the category archive for the ‘Some theorems’ category.

The following “polynomial-logarithmic” algebraic identity that one encounters on many occasions turns out to have a rather useful set of applications! 

POLYNOMIAL-LOGARITHMIC IDENTITY: If P(z) is a polynomial of degree n \ge 1 with roots a_1, a_2, \ldots, a_n, then \displaystyle \frac{P'(z)}{P(z)} = \frac1{z-a_1} + \frac1{z-a_2} + \ldots + \frac1{z-a_n}.

PROOF: This one is left as a simple exercise. (Hint: Logarithms!)

A nice application of the above identity is found in one of the exercises from the chapter titled Analysis (p120) in Proofs from the Book by Aigner, Ziegler and Hofmann. 

EXERCISE: Let p(x) be a non-constant polynomial with only real zeros. Show that p'(x)^2 \ge p(x) p''(x) for all x \in \mathbb{R}.

SOLUTION: If x = a_i is a zero of p(x), then the right hand side of the above inequality equals zero, and we are done. So, suppose x is not a root of p(x). Then, differentiating the above identity w.r.t. x, we obtain \displaystyle \frac{p''(x)p(x) - p'(x)^2}{p(x)^2} = - \sum_{k=1}^n \frac1{(x - a_k)^2} < 0, and we are done.

It turns out that the above identity can also used to prove the well-known Gauss-Lucas theorem.

GAUSS-LUCAS: If P is a non-constant polynomial, then the zeros of P' lie in the convex hull of the roots of P.

PROOF: See this

HISTORY: The well-known Russian author V.V. Prasolov in his book Polynomials offers a brief and interesting historical background of the theorem, in which he points out that Gauss’ original proof (in 1836) of a variant of the theorem was motivated by physical concepts, and it was only in 1874 that F. Lucas, a French Engineer, formulated and proved the above theorem. (Note that the Gauss-Lucas theorem can also be thought of as some sort of a generalization (at least, in spirit!) of Rolle’s theorem.)

Even though I knew the aforesaid identity before, it was once again brought to my attention through a nice (and elementary) article, titled On an Algebraic Identity by Roberto Bosch Cabrera, available at Mathematical Reflections. In particular, Cabrera offers a simple solution, based on an application of the given identity, to the following problem (posed in the 2006 4th issue of Mathematical Reflections), the solution to which had either escaped regular problem solvers or required knowledge of some tedious (albeit elementary) technique. 

PROBLEM: Evaluate the sum \displaystyle \sum_{k=0}^{n-1} \frac1{1 + 8\sin^2 (k\pi /n)}. (proposed by Dorin Andrica and Mihai Piticari.)

SOLUTION: (Read Cabrera’s article.)

There is yet another problem which has a nice solution based again on our beloved identity!

PROBLEM: (Putnam A3/2005) Let p(z) be a polynomial of degree n, all of whose zeros have absolute value 1 in the complex plane. Put g(z) = p(z)/z^{n/2}. Show that all zeros of g'(z) = 0 have absolute value 1.

SOLUTION: (Again, read Cabrera’s article.)

The following theorem, I feel, is not very well-known, though it is a particularly useful one for solving certain types of “limit” problems. Let me pose a couple of elementary problems and offer their solutions. First, the theorem.

Stolz-Cesàro: Let (a_n)_{n \ge 1} and (b_n)_{n \ge 1} be two sequences of real numbers, such that (b_n) is positive, strictly increasing and unbounded. Then,

\displaystyle \lim_{n \to \infty} \frac{a_n}{b_n} = \lim_{n \to \infty} \frac{a_{n+1} - a_n}{b_{n+1} - b_n},

if the limit on the right hand side exists.

The proof involves the usual \epsilon - \delta method, and I will avoid presenting it here since it isn’t particularly interesting. Just as Abel’s lemma is the discrete analogue of integration by parts, the Stolz-Cesàro theorem may be considered the discrete analogue of L’Hospital’s rule in calculus.

Problem 1: Evaluate the limit \displaystyle \lim_{n \to \infty} \frac{1^k + 2^k + \ldots + n^k}{n^{k+1}}, where k \in \mathbb{N}.

Solution: One may certainly consider the above limit as a Riemann-sum which may then be transformed into the integral \displaystyle \int_0^1 x^k \, dx, which then obviously evaluates to 1/(k+1). But, we will take a different route here.

First, let a_n = 1^k + 2^k + \ldots + n^k and b_n = n^{k+1}. Then, we note that the sequence (b_n) is positive, strictly increasing and unbounded. Now,

\displaystyle \lim_{n \to \infty} \frac{a_{n+1} - a_n}{b_{n+1} - b_n} = \lim_{n \to \infty} \frac{(n+1)^k}{(n+1)^{k+1} - n^{k+1}}

\displaystyle = \lim_{n \to \infty} \frac{(n+1)^k}{\left(1 + \binom{k+1}{1}n + \binom{k+1}{2}n^2 + \ldots + \binom{k+1}{k}n^k + n^{k+1}\right) - n^{k+1}}

(using the binomial theorem)

\displaystyle = \lim_{n \to \infty} \frac{(n+1)^k / n^k}{\left(1 + \binom{k+1}{1}n + \binom{k+1}{2}n^2 + \ldots + \binom{k+1}{k}n^k \right) / n^k}

\displaystyle = \lim_{n \to \infty} \frac{(1 + 1/n)^k}{\binom{k+1}{k}} = \frac1{k+1}.

Therefore, using the Stolz-Cesàro theorem, we conclude that the required limit is also 1/(k+1).

Let us now look at another problem where applying the aforesaid theorem makes our job a lot easier. This problem is an example of one that is not amenable to the other usual methods of evaluating limits.

Problem 2: Let k\geq 2 be integers and suppose C: y = \sqrt {2x + 1}\ (x > 0). Given the tangent line at the point (a_{k},\, \sqrt {2a_{k} + 1}) from the point (0, k) to C, evaluate

\displaystyle \lim_{n\to\infty}\frac {1}{n^{3}}\sum_{k = 2}^{n}a_{k}.

Solution:(This is basically the solution I had offered elsewhere a while ago; so, it’s pretty much copy/paste!)

\displaystyle y = \sqrt {2x + 1}\Rightarrow \frac {dy}{dx} = \frac {1}{\sqrt {2x + 1}}.

So, the equation of the tangent line at the point (a_{k}, \sqrt {2a_{k} + 1}) is given by

\displaystyle y - \sqrt {2a_{k} + 1} = \frac {1}{\sqrt {2a_{k} + 1}}(x - a_{k}).

Since the point (0,k) lies on this line, we must have

\displaystyle k - \sqrt {2a_{k} + 1} = \frac {1}{\sqrt {2a_{k} + 1}}( - a_{k})

The above, after squaring and some algebraic manipulation yields
a_{k}^{2} + 2(1 - k^{2})a_{k} + 1 - k^{2} = 0, which implies a_{k} = (k^{2} - 1) + k(\sqrt {k^{2} - 1}). We drop the negative root because a_{k} > 0 for all k\ge 2.

(This is where the Stolz-Cesàro theorem actually comes into play!)

Now, let (b_{n}) and (c_{n}) be two sequences such that \displaystyle b_{n} = \sum_{k = 2}^{n}a_{k} and c_{n} = n^{3}.
Note that (c_{n}) is a positive, increasing and unbounded sequence.

Therefore, \displaystyle \lim_{n\to \infty}\frac {b_{n + 1} - b_{n}}{c_{n + 1} - c_{n}} = \lim_{n\rightarrow \infty}\frac {\sum_{k = 2}^{n + 1}a_{k} - \sum_{k = 2}^{n}a_{k}}{(n + 1)^{3} - n^{3}} = \lim_{n\rightarrow \infty}\frac {a_{n + 1}}{3n^{2} + 3n + 1}

\displaystyle = \lim_{n\to \infty}\frac {(n + 1)^{2} - 1 + (n + 1)\sqrt {(n + 1)^{2} - 1}}{3n^{2} + 3n + 1}

\displaystyle = \lim_{n \to \infty} \frac{(1 + 1/n)^2 - 1/n^2 + (1 + 1/n)\sqrt{1 + 2/n}}{3 + 3/n + 1/n^2}

= 2/3.

Therefore, by the Stolz- Cesàro theorem, we have

\displaystyle \lim_{n\to \infty}\frac {b_{n}}{c_{n}} = 2/3 \,, and so

\displaystyle \lim_{n\to \infty}\frac {1}{n^{3}}\sum_{k = 2}^{n}a_{k} = 2/3.

Last time in this series on Stone duality, we introduced the concept of lattice and various cousins (e.g., inf-lattice, sup-lattice). We said a lattice is a poset with finite meets and joins, and that inf-lattices and sup-lattices have arbitrary meets and joins (meaning that every subset, not just every finite one, has an inf and sup). Examples include the poset PX of all subsets of a set X, and the poset Sub(V) of all subspaces of a vector space V.

I take it that most readers are already familiar with many of the properties of the poset PX; there is for example the distributive law A \cap (B \cup C) = (A \cap B) \cup (A \cap C), and De Morgan laws, and so on — we’ll be exploring more of that in depth soon. The poset Sub(V), as a lattice, is a much different animal: if we think of meets and joins as modeling the logical operations “and” and “or”, then the logic internal to Sub(V) is a weird one — it’s actually much closer to what is sometimes called “quantum logic”, as developed by von Neumann, Mackey, and many others. Our primary interest in this series will be in the direction of more familiar forms of logic, classical logic if you will (where “classical” here is meant more in a physicist’s sense than a logician’s).

To get a sense of the weirdness of Sub(V), take for example a 2-dimensional vector space V. The bottom element is the zero space \{0\}, the top element is V, and the rest of the elements of Sub(V) are 1-dimensional: lines through the origin. For 1-dimensional spaces x, y, there is no relation x \leq y unless x and y coincide. So we can picture the lattice as having three levels according to dimension, with lines drawn to indicate the partial order:

       V = 1
       / | \
     /   |   \
    x    y    z
     \   |   /
       \ | /

Observe that for distinct elements x, y, z in the middle level, we have for example x \wedge y = 0 = x \wedge z (0 is the largest element contained in both x and y), and also for example y \vee z = 1 (1 is the smallest element containing y and z). It follows that x \wedge (y \vee z) = x \wedge 1 = x, whereas (x \wedge y) \vee (x \wedge z) = 0 \vee 0 = 0. The distributive law fails in Sub(V)!

Definition: A lattice is distributive if x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z) for all x, y, z. That is to say, a lattice X is distributive if the map x \wedge -: X \to X, taking an element y to x \wedge y, is a morphism of join-semilattices.

  1. Exercise: Show that in a meet-semilattice, x \wedge -: X \to X is a poset map. Is it also a morphism of meet-semilattices? If X has a bottom element, show that the map x \wedge - preserves it.
  2. Exercise: Show that in any lattice, we at least have (x \wedge y) \vee (x \wedge z) \leq x \wedge (y \vee z) for all elements x, y, z.

Here is an interesting theorem, which illustrates some of the properties of lattices we’ve developed so far:

Theorem: The notion of distributive lattice is self-dual.

Proof: The notion of lattice is self-dual, so all we have to do is show that the dual of the distributivity axiom, x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z), follows from the distributive lattice axioms.

Expand the right side to ((x \vee y) \wedge x) \vee ((x \vee y) \wedge z), by distributivity. This reduces to x \vee [(x \vee y) \wedge z], by an absorption law. Expand this again, by distributivity, to x \vee (x \wedge z) \vee (y \wedge z). This reduces to x \vee (y \wedge z), by the other absorption law. This completes the proof. \Box

Distributive lattices are important, but perhaps even more important in mathematics are lattices where we have not just finitary, but infinitary distributivity as well:

Definition: A frame is a sup-lattice for which x \wedge -: X \to X is a morphism of sup-lattices, for every x \in X. In other words, for every subset S \subseteq  X, we have \sup(\{x \wedge s: s \in S\}) = x \wedge \sup(S), or, as is often written,

\bigvee_{s \in S} x \wedge s = x \wedge \bigvee_{s \in S} s.

Example: A power set PX, as always partially ordered by inclusion, is a frame. In this case, it means that for any subset A and any collection of subsets \{B_i: i \in I\}, we have

A \cap (\bigcup_{i \in I} B_i) = \bigcup_{i \in I} A \cap B_i

This is a well-known fact from naive set theory, but soon we will see an alternative proof, thematically closer to the point of view of these notes.

Example: If X is a set, a topology on X is a subset \mathbf{T} \subseteq PX of the power set, partially ordered by inclusion as PX is, which is closed under finite meets and arbitrary sups. This means the empty sup or bottom element \emptyset and the empty meet or top element X of PX are elements of \mathbf{T}, and also:

  1. If U, V are elements of \mathbf{T}, then so is U \cap V.
  2. If \{U_i: i \in I\} is a collection of elements of \mathbf{T}, then \bigcup_{i \in I} U_i is an element of \mathbf{T}.

A topological space is a set X which is equipped with a topology \mathbf{T}; the elements of the topology are called open subsets of the space. Topologies provide a primary source of examples of frames; because the sups and meets in a topology are constructed the same way as in PX (unions and finite intersections), it is clear that the requisite infinite distributivity law holds in a topology.

The concept of topology was originally rooted in analysis, where it arose by contemplating very generally what one means by a “continuous function”. I imagine many readers who come to a blog titled “Topological Musings” will already have had a course in general topology! but just to be on the safe side I’ll give now one example of a topological space, with a promise of more to come later. Let X be the set \mathbb{R}^n of n-tuples of real numbers. First, define the open ball in \mathbb{R}^n centered at a point x \in \mathbb{R}^n and of radius r > 0 to be the set \{y \in \mathbb{R}: ||x - y|| < r\}. Then, define a subset U \subseteq \mathbb{R}^n to be open if it can be expressed as the union of a collection, finite or infinite, of (possibly overlapping) open balls; the topology is by definition the collection of open sets.

It’s clear from the definition that the collection of open sets is indeed closed under arbitrary unions. To see it is closed under finite intersections, the crucial lemma needed is that the intersection of two overlapping open balls is itself a union of smaller open balls. A precise proof makes essential use of the triangle inequality. (Exercise?)

Topology is a huge field in its own right; much of our interest here will be in its interplay with logic. To that end, I want to bring in, in addition to the connectives “and” and “or” we’ve discussed so far, the implication connective in logic. Most readers probably know that in ordinary logic, the formula p \Rightarrow q (“p implies q“) is equivalent to “either not p or q” — symbolically, we could define p \Rightarrow q as \neg p \vee q. That much is true — in ordinary Boolean logic. But instead of committing ourselves to this reductionistic habit of defining implication in this way, or otherwise relying on Boolean algebra as a crutch, I want to take a fresh look at material implication and what we really ask of it.

The main property we ask of implication is modus ponens: given p and p \Rightarrow q, we may infer q. In symbols, writing the inference or entailment relation as \leq, this is expressed as p \wedge (p \Rightarrow q) \leq q. And, we ask that implication be the weakest possible such assumption, i.e., that material implication p \Rightarrow q be the weakest a whose presence in conjunction with p entails q. In other words, for given p and q, we now define implication p \Rightarrow q by the property

(a \wedge p \leq q) if and only if (a \leq p \Rightarrow q).

As a very easy exercise, show by Yoneda that an implication p \Rightarrow q is uniquely determined when it exists. As the next theorem shows, not all lattices admit an implication operator; in order to have one, it is necessary that distributivity holds:


  • (1) If X is a meet-semilattice which admits an implication operator, then for every element p, the operator p \wedge -: X \to X preserves any sups which happen to exist in X.
  • (2) If X is a frame, then X admits an implication operator.

Proof: (1) Suppose S \subseteq X has a sup in X, here denoted \bigvee_{s \in S} s. We have

(\bigvee_{s \in S} s) \wedge p \leq q if and only if

\bigvee_{s \in S} s \leq p \Rightarrow q if and only if

for all s \in S, (s \leq p \Rightarrow q) if and only if

for all s \in S, (s \wedge p \leq q) if and only if

\bigvee_{s \in S} (s \wedge p) \leq q.

Since this is true for all q, the (dual of the) Yoneda principle tells us that (\bigvee_{s \in S} s) \wedge p = \bigvee_{s \in S} (s \wedge p), as desired. (We don’t need to add the hypothesis that the sup on the right side exists, for the first four lines after “We have” show that (\bigvee_{s \in S} s) \wedge p satisfies the defining property of that sup.)

(2) Suppose p, q are elements of a frame X. Define p \Rightarrow q to be \sup(\{a \in X: a \wedge p \leq q\}). By definition, if a \wedge p \leq q, then a \leq p \Rightarrow q. Conversely, if a \leq p \Rightarrow q, then

a \wedge p \leq \sup\{x: x \wedge p \leq q\} \wedge p = \sup\{x \wedge p: x \wedge p \leq q\},

where the equality holds because of the infinitary distributive law in a frame, and this last sup is clearly bounded above by q (according to the defining property of sups). Hence a \wedge p \leq q, as desired. \Box

Incidentally, part (1) this theorem gives an alternative proof of the infinitary distributive law for Boolean algebras such as PX, so long as we trust that p \Rightarrow q := \neg p \vee q really does what we ask of implication. We’ll come to that point again later.

Part (2) has some interesting consequences vis à vis topologies: we know that topologies provide examples of frames; therefore by part (2) they admit implication operators. It is instructive to work out exactly what these implication operators look like. So, let U, V be open sets in a topology. According to our prescription, we define U \Rightarrow V as the sup (the union) of all open sets W with the property that W \cap U \subseteq V. We can think of this inclusion as living in the power set PX. Then, assuming our formula U^c \cup V for implication in the Boolean algebra PX (where U^c denotes the complement of U), we would have W \subseteq U^c \cup V. And thus, our implication U \Rightarrow V in the topology is the union of all open sets W contained in the (usually non-open) set U^c \cup V. That is to say, U \Rightarrow V is the largest open contained in U^c \cup V, otherwise known as the interior of U^c \cup V. Hence our formula:

U \Rightarrow V = int(U^c \cup V).

Definition: A Heyting algebra is a lattice H which admits an implication p \Rightarrow q for any two elements p, q \in H. A complete Heyting algebra is a complete lattice which admits an implication for any two elements.

Again, our theorem above says that frames are (extensionally) the same thing as complete Heyting algebras. But, as in the case of inf-lattices and sup-lattices, we make intensional distinctions when we consider the appropriate notions of morphism for these concepts. In particular, a morphism of frames is a poset map which preserves finite meets and arbitrary sups. A morphism of Heyting algebras preserves all structure in sight (i.e., all implied in the definition of Heyting algebra — meets, joins, and implication). A morphism of complete Heyting algebras also preserves all structure in sight (sups, infs, and implication).

Heyting algebras are usually not Boolean algebras. For example, it is rare that a topology is a Boolean lattice. We’ll be speaking more about that next time soon, but for now I’ll remark that Heyting algebra is the algebra which underlies intuitionistic propositional calculus.

Exercise: Show that (0 \Rightarrow 0) = 1 in a Heyting algebra.

Exercise: (For those who know some general topology.) In a Heyting algebra, we define the negation \neg x to be x \Rightarrow 0. For the Heyting algebra given by a topology, what can you say about \neg U when U is open and dense?

Part 2:

After having understood the inclusion-exclusion principle by working out a few cases and examples in my earlier post, we are now ready to prove the general version of the principle.

As with many things in mathematics, there is a “normal” way of doing proofs and there is the “Polya/Szego” way of doing proofs. (Ok, ok, I admit it’s just a bias I have.) I will stick to the latter. Ok, let’s state the principle first and follow it up with its proof in a step-by-step fashion.

Inclusion-Exclusion Principle: Let there be a set of N objects. Suppose out of these N objects, there are N_a objects of type a, N_b objects of type b, \ldots , N_k objects of type k and N_l objects of type l. Also, suppose N_{ab}, N_{ac}, \ldots , N_{abc}, \ldots , N_{ab \ldots kl} denote the number of objects that are simultaneously of type a AND b, a AND c, \ldots , a, b AND c, \ldots , a, b, c, k AND l, respectively. Then, the number of objects that are NOT of type a, b, c, \ldots , k, l is

N - (N_a + N_b + \ldots + N_k + N_l)

+ (N_{ab} + N_{ac} + \ldots + N_{kl})

- (N_{abc} + \ldots ) + \ldots

\pm N_{abc \ldots kl}.

Notation —

Let U (finite or infinite) be the universe of discourse. Suppose A \subset U. Then, the characteristic function 1_A: U \to \{ 0, 1 \} of A is defined as

1_A (x) = 1 if x \in A,

and 1_A(x) = 0 otherwise, for all x \in U.

For example, suppose U = \{ 1, 2, 3, 4, \ldots , 29, 30 \}. Let A = \{ 2, 4, 6, \ldots , 28, 30 \} (i.e. even integers.) Then, 1_A(2) = 1_A(26) = 1, 1_A(3) = 1_A(29) = 0, and so on.

Note: 1_U (x) = 1 and 1_{\emptyset}(x) = 0 for all x \in U. Here, \emptyset denotes the empty set. Due to this, we will use 1 and 1_U interchangeably from now.



Lemma 1: A \subset B iff 1_A(x) \le 1_B(x) for all x \in U.

Proof: We first prove the “only if”part. So, suppose A \subset B. Let x \in U. If x \in A, then 1_A(x) = 1. But, we also have x \in B, in which case, 1_B(x) = 1. If, on the other hand, x \notin A, then 1_A(x) = 0 \le 1_B(x). Hence, in either case, 1_A(x) \le 1_B(x) for all x \in U.

We now prove the “if” part. So, suppose 1_A(x) \le 1_B(x) for all x \in U. Let x \in A. Then, 1_A(x) = 1, which forces 1_B(x) = 1, which implies x \in B. Hence, A \subset B, and this completes our proof.

Note: If U is finite, then \displaystyle |A| = \sum_{x \in U}1_A(x).

Lemma 2: 1_{A^c}(x) = 1 - 1_A(x) ,

1_{A \cap B}(x) = 1_A(x) 1_B(x), and

1_{A \cup B}(x) = 1 - (1 - 1_A(x))(1 - 1_B(x))

for all x \in U. (Here, A^c is the complement of A.)

Proof: The proof for each case is elementary.

Lemma 3: Suppose U is finite. If A, B, C, \ldots \subset U, then the characteristic function of A \cup B \cup C \cup \ldots is 1 - (1 - 1_A)(1 - 1_B)(1 - 1_C) \ldots, i.e.

1_{A \cup B \cup C \cup \ldots}(x) = 1 - (1 - 1_A(x))(1 - 1_B(x))(1 - 1_C(x)) \ldots for all x \in U.

Proof: Note the above is an extension of the third part of lemma (2). A simple induction on the number of subsets of U proves the result.

Proof of the inclusion-exclusion principle —

Now, suppose A, B, C, \ldots, K, L are subsets of objects of type a, b, c, \ldots, k, l, respectively. Observe that the set of objects that are NOT of type a, b, c, \ldots , k, l is simply the region outside of all the oval regions! (Look at the previous post to see what this means.) And this region is simply the subset (A \cup B \cup B \cup \ldots \cup K \cup L)^c. Using the first part of lemma (2), we see that the characteristic function of this outside region is 1 - 1_{A \cup B \cup B \cup \ldots \cup K \cup L}, which from lemma (3) is the same as

(1-1_A)(1-1_B)(1-1_C) \ldots (1-1_K)(1-1_L).

Expand the last expression to get

1 - (1_A + 1_B + 1_C + \ldots)

+ (1_A1_B + 1_A1_C + \ldots)

- (1_A1_B1_C + \ldots) + \ldots

\pm 1_A1_B\ldots 1_L.

Now, sum over all the elements of U and use the second part of lemma (2) to obtain the desired result. And this completes our proof.

Part 1:

A couple of weeks ago, my friend John (from UK) asked me if I could explain the Inclusion-Exclusion Principle to him. Wikipedia contains an article on the same topic but John felt that it wasn’t a very helpful introduction. So, as promised, here is the post on that topic, though I managed to finish it only after some delay. (Sorry, John!)

As the title of this post suggests, the inclusion-exclusion principle can simply be described as counting all the objects outside the oval regions! We will use Venn diagrams to explain what that means.

Note: \lfloor x \rfloor denotes the greatest integer less than x.

Ok, let’s now “build” the principle step by step.

1. Suppose there are N objects out of which there are exactly N_a objects of type a. How many objects are NOT of type a? The answer is obvious: N - N_a. The Venn diagram below depicts the answer pictorially. The rounded rectangular region (with the orange border) is the set of all N objects, and the oval region (with the blue border) is the set of all N_a objects of type a. Then, the remaining white region that is outside the oval region denotes the set of all objects that are NOT of type a, and clearly, there are N - N_a of ’em.


Indeed, let us take a simple example. Consider the set of first thirty natural numbers: \{ 1, 2, \ldots , 30 \}. So, N = 30. Now, out of these thirty integers, let N_2 be the number of integers divisible by 2. Then, N_2 = \lfloor 30/2 \rfloor = 15. It is easy to see that the number of integers NOT divisible by 2 equals N - N_2 = 30 - 15 = 15, which is what we would expect if we were to list all the integers not divisible by 2. Indeed, those integers are 1, 3, 5, \ldots , 29.

2. Now, suppose there are N objects out of which there are exactly N_a objects of type a and N_b objects of type b. Also, suppose there are exactly N_{ab} objects that are of both type a AND b. Then, how many objects are NOT of type a OR b? The Venn diagram below illustrates this case.


Again, we are counting the number of objects outside the two oval regions. To answer the above question, we first need to determine the number of objects inside the two oval regions, and then subtract this number from the total, which is N. Now, one might be tempted to think that the number of objects inside the two oval regions is simply N_a + N_b. But this is only true if the two oval regions don’t intersect (i.e. they have no objects in common.) In the general case, however, the expression N_a + N_b counts N_{ab} twice! And so, we must subtract N_{ab} from the expression to get N_a + N_b - N_{ab} as the exact number of objects inside the two oval regions. We can now see that the number of objects outside the two oval regions equals N - (N_a + N_b - N_{ab}) = N - (N_a + N_b) + N_{ab}, and we are done.

Continuing with our example used earlier, let N_3 be the number of integers divisible by 3. Also, let N_6 be the number of integers divisible by 2 AND 3 (i.e. we count multiples of 6.) Now, note that N_3 = \lfloor 30/3 \rfloor = 10, and N_6 = \lfloor 30/6 \rfloor = 5.

Thus, using the formula derived above, the number of integers that are NOT divisible by 2 OR 3 equals N - (N_2 + N_3) + N_6 = 30 - (15 + 10) + 5 = 10. In fact, we can list these ten integers: 1, 5, 7, 11, 13, 17, 19, 23, 25 and 29; and this confirms our answer.

3. Now, suppose there are N objects out of which there are exactly N_a objects of type a, N_b objects of type b and N_c objects of type c. Also, let N_{ab} denote the number of objects of type a AND b, N_{bc} the number of objects of type b AND c, N_{ca} the number of objects of type c AND a, and N_{abc} the number of objects of type a, b AND c. Then, how many objects are NOT of type a, b OR c? This case is illustrated by the Venn diagram shown below.


Once again, let us ask, what is the number of objects inside the three oval regions. A possible answer is N_a + N_b + N_c. Now this will only be true if the three oval regions are pairwise disjoint. In the general case, however, we will have to take care of overcounting, just as we did in (2) earlier. A brief thought will reveal that in the above expression, we have counted each of N_{ab}, N_{bc} and N_{ca} twice and N_{abc} thrice! To take care of this overcounting, we subtract each of N_{ab}, N_{bc} and N_{ca} once from the expression, but in doing so, we also end up subtracting N_{abc} thrice! We thus need to add N_{abc} back into the expression to get N_a + N_b + N_c - N_{ab} - N_{bc} - N_{ca} + N_{abc}, and this expression yields the exact number of objects inside the three oval regions. Therefore, the number of objects outside the three oval regions equals N - (N_a + N_b + N_c) + (N_{ab} + N_{bc} + N_{ca}) - N_{abc}. And, we are done.

Again, continuing with our earlier example, let N_5 denote the number of integers divisible by 5. Then, N_5 = \lfloor 30/5 \rfloor = 6. Also, let N_{15} denote the number of integers divisible by 3 AND 5 (i.e. we are counting multiples of 15); then, N_{15} = \lfloor 30/15 \rfloor = 2. Again, let N_{10} denote the number of integers divisible by 2 and 5; then N_{10} = \lfloor 30/10 \rfloor = 3. And, finally, let N_{30} denote the number of integers divisible by 2, 3 and 5; then N_{30} = \lfloor 30/30 \rfloor = 1.

So, the number of integers NOT divisible by 2, 3 OR 5 equals N - (N_2 + N_3 + N_5) + (N_{6} + N_{15} + N_{10}) - N_{30}

= 30 - (15 + 10 + 6) + (5 + 2 + 3) - 1 = 8. Indeed, those eight integers are 1, 7, 11, 13, 17, 19, 23 and 29.

It isn’t very hard to deduce the formula for the general case when we have a set of N objects, out of which there are N_a objects of type a, N_b objects of type b, and so on. The proof of the general formula will follow in the next post, which may include a couple of problems/solutions involving this principle.

[Update: Thanks to Andreas for pointing out that I may have been a little sloppy in stating the maximum modulus principle! The version below is an updated and correct one. Also, Andreas pointed out an excellent post on “amplification, arbitrage and the tensor power trick” (by Terry Tao) in which the “tricks” discussed are indeed very useful and far more powerful generalizations of the “method” of E. Landau discussed in this post. The Landau method mentioned here, it seems, is just one of the many examples of the “tensor power trick”.]

The maximum modulus principle states that if f: U \to \mathbb{C} (where U \subset \mathbb{C}) is a holomorphic function, then |f| attains its maximal value on any compact K \subset U on the boundary \partial K of K. (If |f| attains its maximal value anywhere in the interior of K, then f is a constant. However, we will not bother about this part of the theorem in this post.)

Problems and Theorems in Analysis II, by Polya and Szego, provides a short proof of the “inequality part” of the principle. The proof by E. Landau employs Cauchy’s integral formula, and the technique is very interesting and useful indeed. The proof is as follows.

From Cauchy’s integral formula, we have

\displaystyle f(z) = \frac1{2\pi i} \oint_L \frac{f(\zeta)}{\zeta - z} d\zeta,

for every z in the interior of D.

Now, suppose |f(\zeta)| \le M on L. Then,

\displaystyle |f(z)| \le \frac{M}{2\pi} \int_L \left|\frac{d\zeta}{\zeta - z}\right| = KM,

where the constant K depends only on the curve L and on the position of z, and is independent of the specific choices of f(z). Now, this rough estimate can be significantly improved by applying the same argument to (f(z))^n, where n \in \mathbb{N}, to obtain

|f(z)|^n \le K M^n, or |f(z)| \le K^{1/n} M.

By allowing n to go to infinity, we get |f(z)| \le M, which is what we set out to prove.

Polya/Szego mention that the proof shows that a rough estimate may sometimes be transformed into a sharper estimate by making appropriate use of the generality for which the original estimate is valid.

I will follow this up with, maybe, a couple of problems/solutions to demonstrate the effectiveness of this useful technique.

About a couple of weeks ago, Noah Snyder liveblogged at the Secret Blogging Seminar on two topology talks given by Jacob Lurie. You may want to learn more about the contents of the talks by clicking the appropriate link. The thing is when I came to know about Noah’s liveblog, the thought that immediately sprang to my mind was the one involving his elementary proof of the well-known Mason-Stothers theorem.

In this post, I wish to present Noah Snyder’s proof of the Mason-Stothers Theorem, which is the polynomial version of the yet unproven (and well-known) ABC Conjecture in number theory. I will follow it up with a problem and its solution using the aforesaid theorem. For a detailed and wonderful exposition on the ABC conjecture, you may want to read an article, titled The ABC’s of Number Theory, written by Noam Elkies for The Harvard College Mathematics Review.

First, a brief history. Though this theorem on polynomials was proved by Stothers in 1981, it didn’t attract much attention until 1983 when it was rediscovered by Mason. Noah, in 1998 (while still in high school), gave perhaps the most elegant elementary proof.

The proof below is the version given in Serge Lang’s Undergraduate Algebra, which it seems has quite a number of typos.

Okay, now some terminology. If f and g are polynomials, then

  • \deg(f) denotes the degree of f,
  • N_0 (f) denotes the number of distinct zeros of f, and
  • (f, g) denotes the \gcd (f, g).

To illustrate, suppose f(x) = x(x-1)^2 (x+2)^3 and g(x) = 5x^2. Then, \deg(f) = 6 and N_0 (f) = 3. And, \deg(g) = 2 and N_0 (g) = 1. Also, (f, g) = x.

Let us first prove a couple of useful lemmas before stating the theorem and its proof.

Lemma 1: If f \in \mathbb{C}[t] is a polynomial, then f has repeated roots iff f and f' have a common root.

Proof: If f has a root \alpha of multiplicity k, then f(x) = (x - \alpha)^k Q(x), where Q(\alpha) \ne 0. Therefore,

f'(x) = k(x - \alpha)^{k-1} Q(x) + (x - \alpha)^k Q'(x)

= (x - \alpha)^{k-1} (kQ(x) + (x - \alpha)Q'(x).

Now, if k = 1, then f'(\alpha) = Q(\alpha) \ne 0, which is the same thing as saying, if f does not have a repeated root, then f and f' don’t have a common root. And, if k > 1, then f' has root \alpha of multiplicity at least k-1. This is same as saying, if f has a repeated root \alpha, then f and f' have a common root \alpha. And, this completes our proof.

Lemma 2: If f \in \mathbb{C}[t] is a polynomial, then \deg (f, f') = \deg (f) - N_0 (f).

Proof: Suppose \deg (f) = n and f has distinct roots \alpha_1, \alpha_2, \ldots , \alpha_i, with multiplicities k_1, k_2, \ldots, k_i, respectively. Then, n = k_1 + k_2 + \ldots + k_i. Now, from the proof of lemma (1) above, we note that (f, f') = (x-\alpha_1)^{k_1 - 1}(x-\alpha_2)^{k_2 - 1} \ldots (x-\alpha_n)^{k_n - 1}. Therefore,

\deg (f, f')

= (k_1 - 1) + (k_2 - 1) + \ldots + (k_i - 1)

= (k_1 + k_2 + \ldots + k_i) - i

= n - i

= \deg (f) - N_0 (f). And, we are done.

Okay, we are now ready to state the theorem.

Mason-Stothers Theorem: If f, g, h \in \mathbb{C}[t] are relatively prime polynomials, not all constant, such that f + g = h, then \max \{\deg(f), \deg(g), \deg(h)\} \le N_0 (fgh) - 1.

Proof: We first note that

f'g -  fg' = f'h - fh' \quad \quad (*).

Indeed, we have f' + g' = h'. Therefore,

f'g - fg' = f'(h-f) - f(h' - f') = f'h - fh'. And, we are done.

Also, note that at least two of the polynomials f, g and h are non-constants, for if any two polynomials are constants, then this forces the third to be a constant as well. So, without any loss of generality, assume f and g are non-constant polynomials. Now, we note that f'g - fg' \ne 0, for otherwise, f'g = fg' \ne 0, and since f and g are relatively prime, this would imply g \mid g', which leads to a contradiction!

Now, we observe that (f, f') and (g, g') divide the left hand side of (*), and (h, h') divides the right hand side of (*), which is equal to the left hand side. And, since f, g and h are relatively prime, we conclude that

(f, f')(g, g')(h, h') divides f'g - fg'.

The above implies that

\deg (f, f') + \deg (g, g') + \deg (h, h') \le \deg (f'g - fg'),

which implies

\deg (f, f') + \deg (g, g') + \deg (h, h') \le \deg (f) + \deg (g) - 1 \quad \quad (**)

Now, applying lemma (2) to f, g, h, we obtain,

\deg (f, f') = \deg (f) - N_0 (f)

\deg (g, g') = \deg (g) - N_0 (g)

\deg (h, h') = \deg (h) - N_0 (h)

Using the above in (**), we get

\deg (h) \le N_0 (f) + N_0 (g) + N_0 (h) - 1 = N_0 (fgh) - 1

since f, g, h are relatively prime polynomials.

Due to symmetricity, the above arguments can similarly be repeated for f and g to get similar inequalities for \deg (f) and \deg(g). And, this concludes our proof.

( Earlier, I had mentioned I would pose a problem and also give its solution that would use the above theorem. I will do that in my next post.)

Person Jacob Lurie
Right click for SmartMenu shortcuts

Our other blog

Visitors to this blog

Blog Stats

  • 371,445 hits

Wikio Ranking

Wikio - Top Blogs - Sciences

Current Online Readers


March 2023