The following “polynomial-logarithmic” algebraic identity that one encounters on many occasions turns out to have a rather useful set of applications!
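POLYNOMIAL-LOGARITHMIC IDENTITY: If $P(x)$ is a polynomial of degree $n \geq 1$ with roots $a_1, a_2, \ldots, a_n$, then $\frac{P'(x)}{P(x)} = \frac{1}{x - a_1} + \frac{1}{x - a_2} + \cdots + \frac{1}{x - a_n}$ for every $x$ that is not a root of $P$.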
PROOF: This one is left as a simple exercise. (Hint: Logarithms!)
A nice application of the above identity is found in one of the exercises from the chapter titled Analysis (p120) in Proofs from the Book by Aigner, Ziegler and Hofmann.
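EXERCISE: Let $p(x)$ be a non-constant polynomial with only real zeros. Show that $p'(x)^2 \geq p(x) p''(x)$ for all $x \in \mathbb{R}$.
SOLUTION: If $x$ is a zero of $p(x)$, then the right hand side of the above inequality equals zero, and we are done. So, suppose $x$ is not a root of $p(x)$. Then, writing $a_1, \ldots, a_n$ for the zeros of $p(x)$ and differentiating the above identity w.r.t. $x$, we obtain $\frac{p''(x) p(x) - p'(x)^2}{p(x)^2} = -\sum_{k=1}^{n} \frac{1}{(x - a_k)^2} < 0$, and we are done.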
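The identity and the inequality from the exercise can be spot-checked with exact rational arithmetic. This is only a sketch under assumptions: the roots and the sample point below are arbitrary illustrative choices, and the helper names (`poly_from_roots`, `derivative`, `evaluate`) are hypothetical, not taken from any of the sources cited above.

```python
from fractions import Fraction

def poly_from_roots(roots):
    """Coefficients (lowest degree first) of the monic polynomial with the given roots."""
    coeffs = [Fraction(1)]
    for a in roots:
        shifted = [Fraction(0)] + coeffs                    # x * p(x)
        scaled = [-a * c for c in coeffs] + [Fraction(0)]   # -a * p(x)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs

def derivative(coeffs):
    """Coefficients of the derivative (lowest degree first)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Horner evaluation of the polynomial at x."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

# Illustrative real roots (one repeated) and a sample point that is not a root.
roots = [Fraction(-2), Fraction(1), Fraction(1), Fraction(3)]
p = poly_from_roots(roots)
dp = derivative(p)
ddp = derivative(dp)
x = Fraction(1, 2)

# p'(x)/p(x) versus the sum of 1/(x - a_i), compared exactly.
identity_holds = evaluate(dp, x) / evaluate(p, x) == sum(1 / (x - a) for a in roots)
# The inequality p'(x)^2 >= p(x) p''(x) from the exercise.
inequality_holds = evaluate(dp, x) ** 2 >= evaluate(p, x) * evaluate(ddp, x)
print(identity_holds, inequality_holds)
```

Because the arithmetic is done in `Fraction`, the comparison is exact rather than subject to floating-point error. A check at one point is of course no substitute for the proof.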
It turns out that the above identity can also be used to prove the well-known Gauss-Lucas theorem.
GAUSS-LUCAS: If $P(z)$ is a non-constant polynomial, then the zeros of $P'(z)$ lie in the convex hull of the roots of $P(z)$.
PROOF: See this.
HISTORY: The well-known Russian author V.V. Prasolov in his book Polynomials offers a brief and interesting historical background of the theorem, in which he points out that Gauss’ original proof (in 1836) of a variant of the theorem was motivated by physical concepts, and it was only in 1874 that F. Lucas, a French Engineer, formulated and proved the above theorem. (Note that the Gauss-Lucas theorem can also be thought of as some sort of a generalization (at least, in spirit!) of Rolle’s theorem.)
Even though I knew the aforesaid identity before, it was once again brought to my attention through a nice (and elementary) article, titled On an Algebraic Identity by Roberto Bosch Cabrera, available at Mathematical Reflections. In particular, Cabrera offers a simple solution, based on an application of the given identity, to the following problem (posed in the 2006 4th issue of Mathematical Reflections), the solution to which had either escaped regular problem solvers or required knowledge of some tedious (albeit elementary) technique.
PROBLEM: Evaluate the sum
. (proposed by Dorin Andrica and Mihai Piticari.)
SOLUTION: (Read Cabrera’s article.)
There is yet another problem which has a nice solution based again on our beloved identity!
PROBLEM: (Putnam A3/2005) Let $p(z)$ be a polynomial of degree $n$, all of whose zeros have absolute value 1 in the complex plane. Put $g(z) = p(z)/z^{n/2}$. Show that all zeros of $g'(z)$ have absolute value 1.
SOLUTION: (Again, read Cabrera’s article.)
The following theorem, I feel, is not very well-known, though it is a particularly useful one for solving certain types of “limit” problems. Let me pose a couple of elementary problems and offer their solutions. First, the theorem.
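Stolz-Cesàro: Let $(a_n)_{n \geq 1}$ and $(b_n)_{n \geq 1}$ be two sequences of real numbers, such that $(b_n)$ is positive, strictly increasing and unbounded. Then,
$\lim_{n \to \infty} \frac{a_n}{b_n} = \lim_{n \to \infty} \frac{a_{n+1} - a_n}{b_{n+1} - b_n}$,
if the limit on the right hand side exists.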
The proof involves the usual method, and I will avoid presenting it here since it isn’t particularly interesting. Just as Abel’s lemma is the discrete analogue of integration by parts, the Stolz-Cesàro theorem may be considered the discrete analogue of L’Hospital’s rule in calculus.
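Problem 1: Evaluate the limit $\lim_{n \to \infty} \frac{1^k + 2^k + \cdots + n^k}{n^{k+1}}$, where $k \in \mathbb{N}$.
Solution: One may certainly consider the above limit as a Riemann sum which may then be transformed into the integral $\int_0^1 x^k \, dx$, which then obviously evaluates to $\frac{1}{k+1}$. But, we will take a different route here.
First, let $a_n = 1^k + 2^k + \cdots + n^k$ and $b_n = n^{k+1}$. Then, we note that the sequence $(b_n)$ is positive, strictly increasing and unbounded. Now,
$\lim_{n \to \infty} \frac{a_{n+1} - a_n}{b_{n+1} - b_n} = \lim_{n \to \infty} \frac{(n+1)^k}{(n+1)^{k+1} - n^{k+1}} = \lim_{n \to \infty} \frac{(n+1)^k}{(k+1) n^k + \binom{k+1}{2} n^{k-1} + \cdots + 1}$ (using the binomial theorem)
$= \frac{1}{k+1}$.
Therefore, using the Stolz-Cesàro theorem, we conclude that the required limit is also $\frac{1}{k+1}$.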
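A quick numerical sanity check of the Stolz-Cesàro computation, assuming Problem 1 is the familiar power-sum limit of $(1^k + 2^k + \cdots + n^k)/n^{k+1}$ with value $1/(k+1)$. The function names and the sample values of $n$ and $k$ are illustrative choices only.

```python
def ratio(n, k):
    """a_n / b_n with a_n = 1^k + ... + n^k and b_n = n^(k+1)."""
    return sum(i ** k for i in range(1, n + 1)) / n ** (k + 1)

def stolz_quotient(n, k):
    """(a_{n+1} - a_n) / (b_{n+1} - b_n) for the same two sequences."""
    return (n + 1) ** k / ((n + 1) ** (k + 1) - n ** (k + 1))

k = 3
for n in (10, 100, 1000):
    # Both columns should drift towards 1/(k+1) as n grows.
    print(n, ratio(n, k), stolz_quotient(n, k), 1 / (k + 1))
```

Both quotients approach the same value, which is all the theorem promises; the printed numbers are only evidence, not a proof.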
Let us now look at another problem where applying the aforesaid theorem makes our job a lot easier. This problem is an example of one that is not amenable to the other usual methods of evaluating limits.
Problem 2: Let be integers and suppose
. Given the tangent line at the point
from the point
to
, evaluate
.
Solution: (This is basically the solution I had offered elsewhere a while ago; so, it’s pretty much copy/paste!)
.
So, the equation of the tangent line at the point is given by
Since the point lies on this line, we must have
The above, after squaring and some algebraic manipulation yields
, which implies
. We drop the negative root because
for all
.
(This is where the Stolz-Cesàro theorem actually comes into play!)
Now, let and
be two sequences such that
and
Note that is a positive, increasing and unbounded sequence.
Therefore,
.
Therefore, by the Stolz-Cesàro theorem, we have
, and so
.
Last time in this series on Stone duality, we introduced the concept of lattice and various cousins (e.g., inf-lattice, sup-lattice). We said a lattice is a poset with finite meets and joins, and that inf-lattices and sup-lattices have arbitrary meets and joins (meaning that every subset, not just every finite one, has an inf and sup). Examples include the poset $PX$ of all subsets of a set $X$, and the poset $\mathrm{Sub}(V)$ of all subspaces of a vector space $V$.
I take it that most readers are already familiar with many of the properties of the poset $PX$; there is for example the distributive law $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, and De Morgan laws, and so on (we’ll be exploring more of that in depth soon). The poset $\mathrm{Sub}(V)$, as a lattice, is a much different animal: if we think of meets and joins as modeling the logical operations “and” and “or”, then the logic internal to $\mathrm{Sub}(V)$ is a weird one; it’s actually much closer to what is sometimes called “quantum logic”, as developed by von Neumann, Mackey, and many others. Our primary interest in this series will be in the direction of more familiar forms of logic, classical logic if you will (where “classical” here is meant more in a physicist’s sense than a logician’s).
To get a sense of the weirdness of $\mathrm{Sub}(V)$, take for example a 2-dimensional vector space $V$. The bottom element is the zero space $\{0\}$, the top element is $V$, and the rest of the elements of $\mathrm{Sub}(V)$ are 1-dimensional: lines through the origin. For 1-dimensional spaces $x, y$, there is no relation $x \leq y$ unless $x$ and $y$ coincide. So we can picture the lattice as having three levels according to dimension, with lines drawn to indicate the partial order:
          V = 1
         /  |  \
        /   |   \
       x    y    z
        \   |   /
         \  |  /
            0
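Observe that for distinct elements $x, y, z$ in the middle level, we have for example $x \wedge y = 0 = x \wedge z$ (0 is the largest element contained in both $x$ and $y$), and also for example $y \vee z = 1$ (1 is the smallest element containing $y$ and $z$). It follows that $x \wedge (y \vee z) = x \wedge 1 = x$, whereas $(x \wedge y) \vee (x \wedge z) = 0 \vee 0 = 0$. The distributive law fails in $\mathrm{Sub}(V)$!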
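The failure of distributivity can be checked mechanically in the smallest possible case. The sketch below assumes the field is $GF(2)$, a choice made here purely so that the plane has finitely many vectors (it then has exactly three lines through the origin, matching the picture above); the post itself does not fix a field, and the names `span`, `meet` and `join` are hypothetical helpers.

```python
from itertools import product

ZERO = (0, 0)

def span(vectors):
    """Smallest subspace of GF(2)^2 containing the given vectors."""
    space = {ZERO} | set(vectors)
    changed = True
    while changed:
        changed = False
        # Close the set under addition mod 2 (scalars are only 0 and 1).
        for u, v in product(list(space), repeat=2):
            w = ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)
            if w not in space:
                space.add(w)
                changed = True
    return frozenset(space)

def meet(a, b):
    """Meet in Sub(V): intersection of subspaces."""
    return a & b

def join(a, b):
    """Join in Sub(V): subspace spanned by the union."""
    return span(a | b)

# The three lines through the origin in GF(2)^2.
x = span([(1, 0)])
y = span([(0, 1)])
z = span([(1, 1)])

left = meet(x, join(y, z))            # x ∧ (y ∨ z)
right = join(meet(x, y), meet(x, z))  # (x ∧ y) ∨ (x ∧ z)
print(sorted(left), sorted(right))
```

Here `left` comes out as the line `x` itself while `right` is the zero subspace, which is exactly the discrepancy described in the text.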
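Definition: A lattice is distributive if $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$ for all $x, y, z$. That is to say, a lattice $X$ is distributive if the map $x \wedge -: X \to X$, taking an element $y$ to $x \wedge y$, is a morphism of join-semilattices.
- Exercise: Show that in a meet-semilattice, $x \wedge -: X \to X$ is a poset map. Is it also a morphism of meet-semilattices? If $X$ has a bottom element, show that the map $x \wedge -$ preserves it.
- Exercise: Show that in any lattice, we at least have $(x \wedge y) \vee (x \wedge z) \leq x \wedge (y \vee z)$ for all elements $x, y, z$.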
Here is an interesting theorem, which illustrates some of the properties of lattices we’ve developed so far:
Theorem: The notion of distributive lattice is self-dual.
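Proof: The notion of lattice is self-dual, so all we have to do is show that the dual of the distributivity axiom, $x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z)$, follows from the distributive lattice axioms.
Expand the right side to $((x \vee y) \wedge x) \vee ((x \vee y) \wedge z)$, by distributivity. This reduces to $x \vee ((x \vee y) \wedge z)$, by an absorption law. Expand this again, by distributivity, to $x \vee (x \wedge z) \vee (y \wedge z)$. This reduces to $x \vee (y \wedge z)$, by the other absorption law. This completes the proof.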
Distributive lattices are important, but perhaps even more important in mathematics are lattices where we have not just finitary, but infinitary distributivity as well:
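Definition: A frame is a sup-lattice $X$ for which $x \wedge -: X \to X$ is a morphism of sup-lattices, for every $x \in X$. In other words, for every subset $S \subseteq X$, we have $\sup(\{x \wedge s: s \in S\}) = x \wedge \sup(S)$, or, as is often written,
$\bigvee_{s \in S} x \wedge s = x \wedge \bigvee_{s \in S} s$.
Example: A power set $PX$, as always partially ordered by inclusion, is a frame. In this case, it means that for any subset $A$ and any collection of subsets $\{B_i: i \in I\}$, we have
$A \cap (\bigcup_{i \in I} B_i) = \bigcup_{i \in I} A \cap B_i$.
This is a well-known fact from naive set theory, but soon we will see an alternative proof, thematically closer to the point of view of these notes.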
Example: If $X$ is a set, a topology on $X$ is a subset $\mathbf{T} \subseteq PX$ of the power set, partially ordered by inclusion as $PX$ is, which is closed under finite meets and arbitrary sups. This means the empty sup or bottom element $\emptyset$ and the empty meet or top element $X$ of $PX$ are elements of $\mathbf{T}$, and also:
- If $U, V$ are elements of $\mathbf{T}$, then so is $U \cap V$.
- If $\{U_i: i \in I\}$ is a collection of elements of $\mathbf{T}$, then $\bigcup_{i \in I} U_i$ is an element of $\mathbf{T}$.
A topological space is a set $X$ which is equipped with a topology $\mathbf{T}$; the elements of the topology are called open subsets of the space. Topologies provide a primary source of examples of frames; because the sups and meets in a topology are constructed the same way as in $PX$ (unions and finite intersections), it is clear that the requisite infinite distributivity law holds in a topology.
The concept of topology was originally rooted in analysis, where it arose by contemplating very generally what one means by a “continuous function”. I imagine many readers who come to a blog titled “Topological Musings” will already have had a course in general topology, but just to be on the safe side I’ll now give one example of a topological space, with a promise of more to come later. Let $X$ be the set $\mathbb{R}^n$ of $n$-tuples of real numbers. First, define the open ball in $\mathbb{R}^n$ centered at a point $x \in \mathbb{R}^n$ and of radius $r > 0$ to be the set $\{y \in \mathbb{R}^n: \|x - y\| < r\}$. Then, define a subset $U \subseteq \mathbb{R}^n$ to be open if it can be expressed as the union of a collection, finite or infinite, of (possibly overlapping) open balls; the topology is by definition the collection of open sets.
It’s clear from the definition that the collection of open sets is indeed closed under arbitrary unions. To see it is closed under finite intersections, the crucial lemma needed is that the intersection of two overlapping open balls is itself a union of smaller open balls. A precise proof makes essential use of the triangle inequality. (Exercise?)
Topology is a huge field in its own right; much of our interest here will be in its interplay with logic. To that end, I want to bring in, in addition to the connectives “and” and “or” we’ve discussed so far, the implication connective in logic. Most readers probably know that in ordinary logic, the formula $p \Rightarrow q$ (“$p$ implies $q$”) is equivalent to “either not $p$ or $q$”; symbolically, we could define $p \Rightarrow q$ as $\neg p \vee q$. That much is true in ordinary Boolean logic. But instead of committing ourselves to this reductionistic habit of defining implication in this way, or otherwise relying on Boolean algebra as a crutch, I want to take a fresh look at material implication and what we really ask of it.
The main property we ask of implication is modus ponens: given $p$ and $p \Rightarrow q$, we may infer $q$. In symbols, writing the inference or entailment relation as $\leq$, this is expressed as $p \wedge (p \Rightarrow q) \leq q$. And, we ask that implication be the weakest possible such assumption, i.e., that material implication $p \Rightarrow q$ be the weakest $a$ whose presence in conjunction with $p$ entails $q$. In other words, for given $p$ and $q$, we now define implication $p \Rightarrow q$ by the property
$(a \wedge p \leq q)$ if and only if $(a \leq p \Rightarrow q)$.
As a very easy exercise, show by Yoneda that an implication is uniquely determined when it exists. As the next theorem shows, not all lattices admit an implication operator; in order to have one, it is necessary that distributivity holds:
Theorem:
- (1) If $X$ is a meet-semilattice which admits an implication operator, then for every element $p \in X$, the operator $p \wedge -: X \to X$ preserves any sups which happen to exist in $X$.
- (2) If $X$ is a frame, then $X$ admits an implication operator.
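Proof: (1) Suppose $S \subseteq X$ has a sup in $X$, here denoted $\bigvee_{s \in S} s$. We have
$p \wedge \bigvee_{s \in S} s \leq q$ if and only if
$\bigvee_{s \in S} s \leq p \Rightarrow q$ if and only if
$s \leq p \Rightarrow q$ for all $s \in S$ if and only if
$p \wedge s \leq q$ for all $s \in S$ if and only if
$\bigvee_{s \in S} (p \wedge s) \leq q$.
Since this is true for all $q$, the (dual of the) Yoneda principle tells us that $p \wedge \bigvee_{s \in S} s = \bigvee_{s \in S} (p \wedge s)$, as desired. (We don’t need to add the hypothesis that the sup on the right side exists, for the first four lines after “We have” show that $p \wedge \bigvee_{s \in S} s$ satisfies the defining property of that sup.)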
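(2) Suppose $p, q$ are elements of a frame $X$. Define $p \Rightarrow q$ to be $\sup(\{a \in X: a \wedge p \leq q\})$. By definition, if $a \wedge p \leq q$, then $a \leq p \Rightarrow q$. Conversely, if $a \leq p \Rightarrow q$, then
$a \wedge p \leq \sup\{x: x \wedge p \leq q\} \wedge p = \sup\{x \wedge p: x \wedge p \leq q\}$,
where the equality holds because of the infinitary distributive law in a frame, and this last sup is clearly bounded above by $q$ (according to the defining property of sups). Hence $a \wedge p \leq q$, as desired.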
Incidentally, part (1) of this theorem gives an alternative proof of the infinitary distributive law for Boolean algebras such as $PX$, so long as we trust that $p \Rightarrow q := \neg p \vee q$ really does what we ask of implication. We’ll come to that point again later.
Part (2) has some interesting consequences vis à vis topologies: we know that topologies provide examples of frames; therefore by part (2) they admit implication operators. It is instructive to work out exactly what these implication operators look like. So, let $U, V$ be open sets in a topology. According to our prescription, we define $U \Rightarrow V$ as the sup (the union) of all open sets $W$ with the property that $W \cap U \subseteq V$. We can think of this inclusion as living in the power set $PX$. Then, assuming our formula $U^c \cup V$ for implication in the Boolean algebra $PX$ (where $U^c$ denotes the complement of $U$), we would have $W \subseteq U^c \cup V$. And thus, our implication $U \Rightarrow V$ in the topology is the union of all open sets $W$ contained in the (usually non-open) set $U^c \cup V$. That is to say, $U \Rightarrow V$ is the largest open contained in $U^c \cup V$, otherwise known as the interior of $U^c \cup V$. Hence our formula:
$U \Rightarrow V$ = int$(U^c \cup V)$
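To make the interior formula concrete, here is a small sketch on a finite topological space. The three-point set `X` and the topology `T` are illustrative choices made here, not taken from the post, and `interior` and `implies` are hypothetical helper names. The code checks the defining property of implication, namely that $W \cap U \subseteq V$ holds exactly when $W \subseteq (U \Rightarrow V)$, over all triples of open sets.

```python
X = frozenset({0, 1, 2})
# An assumed topology on X: contains the empty set and X, and is
# closed under unions and intersections.
T = [frozenset(s) for s in ([], [0], [1], [0, 1], [0, 1, 2])]

def interior(subset):
    """Largest open set contained in `subset`: the union of all opens inside it."""
    result = frozenset()
    for w in T:
        if w <= subset:
            result |= w
    return result

def implies(u, v):
    """Implication computed as the interior of (complement of U) union V."""
    return interior((X - u) | v)

# Defining property: (W ∩ U ⊆ V) if and only if (W ⊆ U ⇒ V), for all opens U, V, W.
adjunction_holds = all(
    ((w & u) <= v) == (w <= implies(u, v))
    for u in T for v in T for w in T
)
print(adjunction_holds)
print(sorted(implies(frozenset({0}), frozenset({1}))))
```

In this example the set $\{0\}^c \cup \{1\} = \{1, 2\}$ is not open, and its interior $\{1\}$ is what the implication returns, so the Boolean formula and the topological one genuinely differ.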
Definition: A Heyting algebra is a lattice $L$ which admits an implication $p \Rightarrow q$ for any two elements $p, q \in L$. A complete Heyting algebra is a complete lattice which admits an implication for any two elements.
Again, our theorem above says that frames are (extensionally) the same thing as complete Heyting algebras. But, as in the case of inf-lattices and sup-lattices, we make intensional distinctions when we consider the appropriate notions of morphism for these concepts. In particular, a morphism of frames is a poset map which preserves finite meets and arbitrary sups. A morphism of Heyting algebras preserves all structure in sight (i.e., all implied in the definition of Heyting algebra — meets, joins, and implication). A morphism of complete Heyting algebras also preserves all structure in sight (sups, infs, and implication).
Heyting algebras are usually not Boolean algebras. For example, it is rare that a topology is a Boolean lattice. We’ll be speaking more about that soon, but for now I’ll remark that Heyting algebra is the algebra which underlies intuitionistic propositional calculus.
Exercise: Show that in a Heyting algebra.
Exercise: (For those who know some general topology.) In a Heyting algebra, we define the negation $\neg x$ to be $x \Rightarrow 0$. For the Heyting algebra given by a topology, what can you say about $\neg U$ when $U$ is open and dense?
Part 2:
After having understood the inclusion-exclusion principle by working out a few cases and examples in my earlier post, we are now ready to prove the general version of the principle.
As with many things in mathematics, there is a “normal” way of doing proofs and there is the “Polya/Szego” way of doing proofs. (Ok, ok, I admit it’s just a bias I have.) I will stick to the latter. Ok, let’s state the principle first and follow it up with its proof in a step-by-step fashion.
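Inclusion-Exclusion Principle: Let there be a set of $N$ objects. Suppose out of these $N$ objects, there are $N_\alpha$ objects of type $\alpha$, $N_\beta$ objects of type $\beta$, $\ldots$, $N_\kappa$ objects of type $\kappa$ and $N_\lambda$ objects of type $\lambda$. Also, suppose $N_{\alpha\beta}, N_{\alpha\gamma}, \ldots, N_{\alpha\beta\gamma}, \ldots, N_{\alpha\beta\gamma\ldots\kappa\lambda}$ denote the number of objects that are simultaneously of type $\alpha$ AND $\beta$, $\alpha$ AND $\gamma$, $\ldots$, $\alpha, \beta$ AND $\gamma$, $\ldots$, $\alpha, \beta, \gamma, \ldots, \kappa$ AND $\lambda$ respectively. Then, the number of objects that are NOT of type $\alpha, \beta, \gamma, \ldots, \kappa, \lambda$ is
$N - (N_\alpha + N_\beta + \cdots + N_\lambda) + (N_{\alpha\beta} + N_{\alpha\gamma} + \cdots + N_{\kappa\lambda}) - (N_{\alpha\beta\gamma} + \cdots) + \cdots \pm N_{\alpha\beta\gamma\ldots\kappa\lambda}$.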
— Notation —
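Let $U$ (finite or infinite) be the universe of discourse. Suppose $A \subseteq U$. Then, the characteristic function $1_A: U \to \{0, 1\}$ of $A$ is defined as $1_A(x) = 1$ if $x \in A$, and $1_A(x) = 0$ otherwise, for all $x \in U$.
For example, suppose $U = \mathbb{N}$. Let $A = \{2, 4, 6, \ldots\}$ (i.e. even integers.) Then, $1_A(2) = 1$, $1_A(3) = 0$, and so on.
Note: $1_U(x) = 1$ and $1_\emptyset(x) = 0$ for all $x \in U$. Here, $\emptyset$ denotes the empty set. Due to this, we will use $1$ and $1_U$ interchangeably from now.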
—
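Lemma 1: $A \subseteq B$ iff $1_A(x) \leq 1_B(x)$ for all $x \in U$.
Proof: We first prove the “only if” part. So, suppose $A \subseteq B$. Let $x \in U$. If $x \in A$, then $1_A(x) = 1$. But, we also have $x \in B$, in which case, $1_B(x) = 1$. If, on the other hand, $x \notin A$, then $1_A(x) = 0 \leq 1_B(x)$. Hence, in either case, $1_A(x) \leq 1_B(x)$ for all $x \in U$.
We now prove the “if” part. So, suppose $1_A(x) \leq 1_B(x)$ for all $x \in U$. Let $x \in A$. Then, $1_A(x) = 1$, which forces $1_B(x) = 1$, which implies $x \in B$. Hence, $A \subseteq B$, and this completes our proof.
Note: If $U$ is finite, then $|A| = \sum_{x \in U} 1_A(x)$.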
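Lemma 2: $1_{A^c}(x) = 1 - 1_A(x)$, $1_{A \cap B}(x) = 1_A(x) 1_B(x)$ and $1_{A \cup B}(x) = 1 - (1 - 1_A(x))(1 - 1_B(x))$ for all $x \in U$. (Here, $A^c$ is the complement of $A$.)
Proof: The proof for each case is elementary.
Lemma 3: Suppose $U$ is finite. If $A, B, C, \ldots \subseteq U$, then the characteristic function of $A \cup B \cup C \cup \cdots$ is $1 - (1 - 1_A)(1 - 1_B)(1 - 1_C) \cdots$, i.e. $1_{A \cup B \cup C \cup \cdots}(x) = 1 - (1 - 1_A(x))(1 - 1_B(x))(1 - 1_C(x)) \cdots$ for all $x \in U$.
Proof: Note the above is an extension of the third part of lemma $2$. A simple induction on the number of subsets of $U$ proves the result.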
— Proof of the inclusion-exclusion principle —
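Now, suppose $A, B, C, \ldots, K, L$ are subsets of objects of type $\alpha, \beta, \gamma, \ldots, \kappa, \lambda$, respectively. Observe that the set of objects that are NOT of type $\alpha, \beta, \gamma, \ldots, \kappa, \lambda$ is simply the region outside of all the oval regions! (Look at the previous post to see what this means.) And this region is simply the subset $(A \cup B \cup C \cup \cdots \cup L)^c$. Using the first part of lemma $2$, we see that the characteristic function of this outside region is $1 - 1_{A \cup B \cup C \cup \cdots \cup L}$, which from lemma $3$ is the same as $(1 - 1_A)(1 - 1_B)(1 - 1_C) \cdots (1 - 1_L)$.
Expand the last expression to get
$1 - (1_A + 1_B + \cdots + 1_L) + (1_A 1_B + 1_A 1_C + \cdots + 1_K 1_L) - \cdots \pm 1_A 1_B 1_C \cdots 1_L$.
Now, sum over all the elements of $U$ and use the second part of lemma $2$ to obtain the desired result. And this completes our proof.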
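The characteristic-function argument can be traced step by step in code. The sketch below assumes a specific universe (the integers 1 to 30) and three "types" (divisibility by 2, 3 and 5), chosen here only for illustration; the function names are hypothetical. It computes the number of outside objects twice: once by summing the product of the factors $1 - 1_A(x)$ over the universe, and once by the expanded alternating sum of intersection sizes.

```python
from itertools import combinations

universe = range(1, 31)
# Assumed illustrative "types": divisibility by 2, 3 and 5.
types = {
    "divisible by 2": {x for x in universe if x % 2 == 0},
    "divisible by 3": {x for x in universe if x % 3 == 0},
    "divisible by 5": {x for x in universe if x % 5 == 0},
}

def indicator(subset, x):
    """Characteristic function of `subset`, evaluated at x."""
    return 1 if x in subset else 0

def outside_by_product():
    """Sum over the universe of the product of (1 - indicator) factors."""
    total = 0
    for x in universe:
        term = 1
        for subset in types.values():
            term *= 1 - indicator(subset, x)
        total += term
    return total

def outside_by_alternating_sum():
    """Expanded form: alternating sum of the sizes of all intersections."""
    sets = list(types.values())
    total = 0
    for r in range(len(sets) + 1):
        for chosen in combinations(sets, r):
            common = set(universe)      # empty intersection = whole universe
            for s in chosen:
                common &= s
            total += (-1) ** r * len(common)
    return total

print(outside_by_product(), outside_by_alternating_sum())
```

The two functions agree because the second is just the first with the product multiplied out and the sum over the universe taken term by term.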
Part 1:
A couple of weeks ago, my friend John (from UK) asked me if I could explain the Inclusion-Exclusion Principle to him. Wikipedia contains an article on the same topic but John felt that it wasn’t a very helpful introduction. So, as promised, here is the post on that topic, though I managed to finish it only after some delay. (Sorry, John!)
As the title of this post suggests, the inclusion-exclusion principle can simply be described as counting all the objects outside the oval regions! We will use Venn diagrams to explain what that means.
Note: $\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$.
Ok, let’s now “build” the principle step by step.
1. Suppose there are $N$ objects out of which there are exactly $N_\alpha$ objects of type $\alpha$. How many objects are NOT of type $\alpha$? The answer is obvious: $N - N_\alpha$. The Venn diagram below depicts the answer pictorially. The rounded rectangular region (with the orange border) is the set of all $N$ objects, and the oval region (with the blue border) is the set of all $N_\alpha$ objects of type $\alpha$. Then, the remaining white region that is outside the oval region denotes the set of all objects that are NOT of type $\alpha$, and clearly, there are $N - N_\alpha$ of ’em.

Indeed, let us take a simple example. Consider the set of first thirty natural numbers: $\{1, 2, \ldots, 30\}$. So, $N = 30$. Now, out of these thirty integers, let $N_\alpha$ be the number of integers divisible by $2$. Then, $N_\alpha = \lfloor 30/2 \rfloor = 15$. It is easy to see that the number of integers NOT divisible by $2$ equals $N - N_\alpha = 30 - 15 = 15$, which is what we would expect if we were to list all the integers not divisible by $2$. Indeed, those integers are $1, 3, 5, \ldots, 29$.
2. Now, suppose there are $N$ objects out of which there are exactly $N_\alpha$ objects of type $\alpha$ and $N_\beta$ objects of type $\beta$. Also, suppose there are exactly $N_{\alpha\beta}$ objects that are of both type $\alpha$ AND $\beta$. Then, how many objects are NOT of type $\alpha$ OR $\beta$? The Venn diagram below illustrates this case.

Again, we are counting the number of objects outside the two oval regions. To answer the above question, we first need to determine the number of objects inside the two oval regions, and then subtract this number from the total, which is $N$. Now, one might be tempted to think that the number of objects inside the two oval regions is simply $N_\alpha + N_\beta$. But this is only true if the two oval regions don’t intersect (i.e. they have no objects in common.) In the general case, however, the expression $N_\alpha + N_\beta$ counts $N_{\alpha\beta}$ twice! And so, we must subtract $N_{\alpha\beta}$ from the expression to get $N_\alpha + N_\beta - N_{\alpha\beta}$ as the exact number of objects inside the two oval regions. We can now see that the number of objects outside the two oval regions equals $N - N_\alpha - N_\beta + N_{\alpha\beta}$, and we are done.
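Continuing with our example used earlier, let $N_\beta$ be the number of integers divisible by $3$. Also, let $N_{\alpha\beta}$ be the number of integers divisible by $2$ AND $3$ (i.e. we count multiples of $6$.) Now, note that $N_\beta = \lfloor 30/3 \rfloor = 10$, and $N_{\alpha\beta} = \lfloor 30/6 \rfloor = 5$.
Thus, using the formula derived above, the number of integers that are NOT divisible by $2$ OR $3$ equals $30 - 15 - 10 + 5 = 10$. In fact, we can list these ten integers: $1, 5, 7, 11, 13, 17, 19, 23, 25$ and $29$; and this confirms our answer.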
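3. Now, suppose there are $N$ objects out of which there are exactly $N_\alpha$ objects of type $\alpha$, $N_\beta$ objects of type $\beta$ and $N_\gamma$ objects of type $\gamma$. Also, let $N_{\alpha\beta}$ denote the number of objects of type $\alpha$ AND $\beta$, $N_{\beta\gamma}$ the number of objects of type $\beta$ AND $\gamma$, $N_{\gamma\alpha}$ the number of objects of type $\gamma$ AND $\alpha$, and $N_{\alpha\beta\gamma}$ the number of objects of type $\alpha, \beta$ AND $\gamma$. Then, how many objects are NOT of type $\alpha, \beta$ OR $\gamma$? This case is illustrated by the Venn diagram shown below.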

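Once again, let us ask, what is the number of objects inside the three oval regions? A possible answer is $N_\alpha + N_\beta + N_\gamma$. Now this will only be true if the three oval regions are pairwise disjoint. In the general case, however, we will have to take care of overcounting, just as we did in case $2$ earlier. A brief thought will reveal that in the above expression, we have counted each of $N_{\alpha\beta}, N_{\beta\gamma}$ and $N_{\gamma\alpha}$ twice and $N_{\alpha\beta\gamma}$ thrice! To take care of this overcounting, we subtract each of $N_{\alpha\beta}, N_{\beta\gamma}$ and $N_{\gamma\alpha}$ once from the expression, but in doing so, we also end up subtracting $N_{\alpha\beta\gamma}$ thrice! We thus need to add $N_{\alpha\beta\gamma}$ back into the expression to get $N_\alpha + N_\beta + N_\gamma - N_{\alpha\beta} - N_{\beta\gamma} - N_{\gamma\alpha} + N_{\alpha\beta\gamma}$, and this expression yields the exact number of objects inside the three oval regions. Therefore, the number of objects outside the three oval regions equals $N - N_\alpha - N_\beta - N_\gamma + N_{\alpha\beta} + N_{\beta\gamma} + N_{\gamma\alpha} - N_{\alpha\beta\gamma}$. And, we are done.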
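Again, continuing with our earlier example, let $N_\gamma$ denote the number of integers divisible by $5$. Then, $N_\gamma = \lfloor 30/5 \rfloor = 6$. Also, let $N_{\beta\gamma}$ denote the number of integers divisible by $3$ AND $5$ (i.e. we are counting multiples of $15$); then, $N_{\beta\gamma} = \lfloor 30/15 \rfloor = 2$. Again, let $N_{\gamma\alpha}$ denote the number of integers divisible by $5$ and $2$; then $N_{\gamma\alpha} = \lfloor 30/10 \rfloor = 3$. And, finally, let $N_{\alpha\beta\gamma}$ denote the number of integers divisible by $2, 3$ and $5$; then $N_{\alpha\beta\gamma} = \lfloor 30/30 \rfloor = 1$.
So, the number of integers NOT divisible by $2, 3$ OR $5$ equals $30 - 15 - 10 - 6 + 5 + 2 + 3 - 1 = 8$. Indeed, those eight integers are $1, 7, 11, 13, 17, 19, 23$ and $29$.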
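The three worked cases follow one pattern, which the following sketch spells out using floor division. It assumes the divisors are distinct primes (so that "divisible by each of them" is the same as "divisible by their product"); the function name `count_not_divisible` is a hypothetical helper introduced here.

```python
from itertools import combinations
from math import prod

def count_not_divisible(n, primes):
    """Count integers in 1..n divisible by none of the given distinct primes.

    Each subset of the primes contributes (-1)^r * floor(n / product),
    which is the inclusion-exclusion sum built up in cases 1 to 3.
    """
    total = 0
    for r in range(len(primes) + 1):
        for chosen in combinations(primes, r):
            total += (-1) ** r * (n // prod(chosen))   # prod(()) == 1
    return total

# Direct listing, for comparison with the formula.
survivors = [m for m in range(1, 31) if all(m % p for p in (2, 3, 5))]

print(count_not_divisible(30, [2]))        # case 1
print(count_not_divisible(30, [2, 3]))     # case 2
print(count_not_divisible(30, [2, 3, 5]))  # case 3
print(survivors)
```

Comparing the formula against the brute-force list is a cheap way to catch a sign error when extending the pattern to more types.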
It isn’t very hard to deduce the formula for the general case when we have a set of $N$ objects, out of which there are $N_\alpha$ objects of type $\alpha$, $N_\beta$ objects of type $\beta$, and so on. The proof of the general formula will follow in the next post, which may include a couple of problems/solutions involving this principle.
[Update: Thanks to Andreas for pointing out that I may have been a little sloppy in stating the maximum modulus principle! The version below is an updated and correct one. Also, Andreas pointed out an excellent post on “amplification, arbitrage and the tensor power trick” (by Terry Tao) in which the “tricks” discussed are indeed very useful and far more powerful generalizations of the “method” of E. Landau discussed in this post. The Landau method mentioned here, it seems, is just one of the many examples of the “tensor power trick”.]
The maximum modulus principle states that if $f: U \to \mathbb{C}$ (where $U \subseteq \mathbb{C}$) is a holomorphic function, then $|f|$ attains its maximal value on any compact $K \subset U$ on the boundary $\partial K$ of $K$. (If $|f|$ attains its maximal value anywhere in the interior of $K$, then $f$ is a constant. However, we will not bother about this part of the theorem in this post.)
Problems and Theorems in Analysis II, by Polya and Szego, provides a short proof of the “inequality part” of the principle. The proof by E. Landau employs Cauchy’s integral formula, and the technique is very interesting and useful indeed. The proof is as follows.
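From Cauchy’s integral formula, we have
$f(z) = \frac{1}{2\pi i} \oint_{\partial K} \frac{f(w)}{w - z} \, dw$,
for every $z$ in the interior of $K$.
Now, suppose $|f(w)| \leq M$ on $\partial K$. Then,
$|f(z)| \leq \frac{M}{2\pi} \oint_{\partial K} \frac{|dw|}{|w - z|} = CM$,
where the constant $C$ depends only on the curve $\partial K$ and on the position of $z$, and is independent of the specific choices of $f$. Now, this rough estimate can be significantly improved by applying the same argument to $(f(z))^n$, where $n \in \mathbb{N}$, to obtain $|f(z)|^n \leq C M^n$, or $|f(z)| \leq C^{1/n} M$.
By allowing $n$ to go to infinity, we get $|f(z)| \leq M$, which is what we set out to prove.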
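The heart of Landau's argument is that the constant $C$, however large, is killed by taking $n$-th roots. A tiny numerical sketch, with an arbitrarily chosen constant $C = 50$ and bound $M = 1$ (both assumptions for illustration, as is the function name):

```python
def amplified_bound(C, M, n):
    """Bound on |f(z)| obtained by applying the rough estimate to f^n."""
    return C ** (1.0 / n) * M

# The rough bound C*M improves towards M as the power n grows.
bounds = [amplified_bound(50.0, 1.0, n) for n in (1, 10, 100, 1000)]
print(bounds)
```

The same constant $C$ serves for every power because it depends only on the curve and on $z$, not on the function; that independence is the "generality" the rough estimate is valid for.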
Polya/Szego mention that the proof shows that a rough estimate may sometimes be transformed into a sharper estimate by making appropriate use of the generality for which the original estimate is valid.
I will follow this up with, maybe, a couple of problems/solutions to demonstrate the effectiveness of this useful technique.
About a couple of weeks ago, Noah Snyder liveblogged at the Secret Blogging Seminar on two topology talks given by Jacob Lurie. You may want to learn more about the contents of the talks by clicking the appropriate link. The thing is when I came to know about Noah’s liveblog, the thought that immediately sprang to my mind was the one involving his elementary proof of the well-known Mason-Stothers theorem.
In this post, I wish to present Noah Snyder’s proof of the Mason-Stothers Theorem, which is the polynomial version of the yet unproven (and well-known) ABC Conjecture in number theory. I will follow it up with a problem and its solution using the aforesaid theorem. For a detailed and wonderful exposition on the ABC conjecture, you may want to read an article, titled The ABC’s of Number Theory, written by Noam Elkies for The Harvard College Mathematics Review.
First, a brief history. Though this theorem on polynomials was proved by Stothers in 1981, it didn’t attract much attention until 1983 when it was rediscovered by Mason. Noah, in 1998 (while still in high school), gave perhaps the most elegant elementary proof.
The proof below is the version given in Serge Lang’s Undergraduate Algebra, which it seems has quite a number of typos.
Okay, now some terminology. If $f$ and $g$ are polynomials, then $\deg f$ denotes the degree of $f$, $n_0(f)$ denotes the number of distinct zeros of $f$, and $(f, g)$ denotes the $\gcd(f, g)$.
To illustrate, suppose and
. Then,
and
. And,
and
. Also,
.
Let us first prove a couple of useful lemmas before stating the theorem and its proof.
Lemma 1: If $f$ is a polynomial, then $f$ has repeated roots iff $f$ and $f'$ have a common root.
Proof: If $f$ has a root $a$ of multiplicity $m \geq 1$, then $f(x) = (x - a)^m g(x)$, where $g(a) \neq 0$. Therefore, $f'(x) = (x - a)^{m-1} (m g(x) + (x - a) g'(x))$.
Now, if $m = 1$, then $f'(a) = g(a) \neq 0$, which is the same thing as saying, if $f$ does not have a repeated root, then $f$ and $f'$ don’t have a common root. And, if $m \geq 2$, then $f'$ has root $a$ of multiplicity at least $m - 1$. This is same as saying, if $f$ has a repeated root $a$, then $f$ and $f'$ have a common root $a$. And, this completes our proof.
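Lemma 2: If $f$ is a polynomial, then $\deg (f, f') = \deg f - n_0(f)$.
Proof: Suppose $\deg f = n$ and $f$ has distinct roots $a_1, a_2, \ldots, a_r$, with multiplicities $m_1, m_2, \ldots, m_r$, respectively. Then, $f(x) = c (x - a_1)^{m_1} (x - a_2)^{m_2} \cdots (x - a_r)^{m_r}$ for some nonzero constant $c$. Now, from the proof of lemma $1$ above, we note that $(f, f') = (x - a_1)^{m_1 - 1} (x - a_2)^{m_2 - 1} \cdots (x - a_r)^{m_r - 1}$. Therefore, $\deg (f, f') = (m_1 - 1) + \cdots + (m_r - 1) = n - r = \deg f - n_0(f)$. And, we are done.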
Okay, we are now ready to state the theorem.
Mason-Stothers Theorem: If $f, g, h$ are relatively prime polynomials, not all constant, such that $f + g = h$, then $\max\{\deg f, \deg g, \deg h\} \leq n_0(fgh) - 1$.
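Proof: We first note that
$f'g - fg' = f'h - fh'$. $(1)$
Indeed, we have $f + g = h$, and hence $f' + g' = h'$. Therefore, $f'g - fg' = f'(h - f) - f(h' - f') = f'h - fh'$. And, we are done.
Also, note that at least two of the polynomials $f, g$ and $h$ are non-constants, for if any two polynomials are constants, then this forces the third to be a constant as well. So, without any loss of generality, assume $f$ and $g$ are non-constant polynomials. Now, we note that $f'g - fg' \neq 0$, for otherwise, $f'g = fg'$, and since $f$ and $g$ are relatively prime, this would imply $f \mid f'$, which leads to a contradiction (a non-constant $f$ cannot divide the nonzero polynomial $f'$, whose degree is smaller)!
Now, we observe that $(f, f')$ and $(g, g')$ divide the left hand side of $(1)$, and $(h, h')$ divides the right hand side of $(1)$, which is equal to the left hand side. And, since $f, g$ and $h$ are relatively prime, we conclude that $(f, f')(g, g')(h, h')$ divides $f'g - fg'$.
The above implies that
$\deg (f, f') + \deg (g, g') + \deg (h, h') \leq \deg (f'g - fg') \leq \deg f + \deg g - 1$,
which implies
$\deg h \leq (\deg f - \deg (f, f')) + (\deg g - \deg (g, g')) + (\deg h - \deg (h, h')) - 1$. $(2)$
Now, applying lemma $2$ to $f, g, h$, we obtain,
$\deg f - \deg (f, f') = n_0(f)$, $\deg g - \deg (g, g') = n_0(g)$ and $\deg h - \deg (h, h') = n_0(h)$.
Using the above in $(2)$, we get
$\deg h \leq n_0(f) + n_0(g) + n_0(h) - 1 = n_0(fgh) - 1$,
since $f, g, h$ are relatively prime polynomials.
Due to symmetry, the above arguments can similarly be repeated for $f$ and $g$ to get similar inequalities for $\deg f$ and $\deg g$. And, this concludes our proof.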
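As a small worked check of the statement (an example chosen here for illustration, not one from Lang or from Snyder's paper), take $f = x^2$, $g = 2x + 1$ and $h = (x + 1)^2$, writing $n_0$ for the number of distinct zeros as above:

```latex
% Illustrative example (an assumption of this note, not from the cited sources).
\begin{align*}
  f + g &= x^2 + (2x + 1) = (x + 1)^2 = h, \\
  \max\{\deg f, \deg g, \deg h\} &= \max\{2, 1, 2\} = 2, \\
  fgh &= x^2 (2x + 1)(x + 1)^2
        \quad\text{has distinct zeros } 0,\ -\tfrac{1}{2},\ -1, \\
  n_0(fgh) - 1 &= 3 - 1 = 2 \;\geq\; 2 .
\end{align*}
% Lemma 2 on the same f: (f, f') = (x^2, 2x) = x, and
% \deg (f, f') = 1 = \deg f - n_0(f) = 2 - 1.
```

The three polynomials have no common zero, so they are relatively prime, and the bound is attained with equality. The theorem's inequality therefore cannot be improved in general.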
( Earlier, I had mentioned I would pose a problem and also give its solution that would use the above theorem. I will do that in my next post.)