Part 2:
After having understood the inclusion-exclusion principle by working out a few cases and examples in my earlier post, we are now ready to prove the general version of the principle.
As with many things in mathematics, there is a “normal” way of doing proofs and there is the “Polya/Szego” way of doing proofs. (Ok, ok, I admit it’s just a bias I have.) I will stick to the latter. Ok, let’s state the principle first and follow it up with its proof in a step-by-step fashion.
Inclusion-Exclusion Principle: Let there be a set of objects. Suppose out of these
objects, there are
objects of type
,
objects of type
objects of type
and
objects of type
. Also, suppose
denote the number of objects that are simultaneously of type
AND
AND
AND
AND
respectively. Then, the number of objects that are NOT of type
is
.
— Notation —
Let (finite or infinite) be the universe of discourse. Suppose
. Then, the characteristic function
of
is defined as
if
,
and otherwise, for all
.
For example, suppose . Let
(i.e. even integers.) Then,
, and so on.
Note: and
for all
. Here,
denotes the empty set. Due to this, we will use
and
interchangeably from now.
—
Lemma 1: iff
for all
.
Proof: We first prove the “only if”part. So, suppose . Let
. If
, then
. But, we also have
, in which case,
. If, on the other hand,
, then
. Hence, in either case,
for all
.
We now prove the “if” part. So, suppose for all
. Let
. Then,
, which forces
, which implies
. Hence,
, and this completes our proof.
Note: If is finite, then
.
Lemma 2:
and
for all . (Here,
is the complement of
.)
Proof: The proof for each case is elementary.
Lemma 3: Suppose is finite. If
, then the characteristic function of
is
, i.e.
for all
.
Proof: Note the above is an extension of the third part of lemma . A simple induction on the number of subsets of
proves the result.
— Proof of the inclusion-exclusion principle —
Now, suppose are subsets of objects of type
, respectively. Observe that the set of objects that are NOT of type
is simply the region outside of all the oval regions! (Look at the previous post to see what this means.) And this region is simply the subset
. Using the first part of lemma
, we see that the characteristic function of this outside region is
, which from lemma
is the same as
.
Expand the last expression to get
.
Now, sum over all the elements of and use the second part of lemma
to obtain the desired result. And this completes our proof.
18 comments
Comments feed for this article
March 21, 2008 at 10:34 pm
Todd Trimble
It’s a nice proof. So you’ve isomorphically translated the power set PX into 2^X, the set of functions from X to 2 = {0, 1}. The distributive law for the Boolean algebra PX is then faithfully represented by the distributive law for the Boolean ring of functions, with the pointwise structure of addition and multiplication on functions, itself inherited from the ring structure on 2. Then apply ring distributivity to death to get the conclusion!
The inclusion-exclusion principle is really a whole family of results having a similar flavor. Here’s one, mentioned in the wonderful book by Klain and Rota, An Introduction to Geometric Probability. If you have a “lattice of sets” (i.e., a subset of a power set PX which is closed under finite intersections and finite unions, and contains X and the empty set), call it L, then there is a notion of *valuation* on L: a real-valued function m on L such that
. Think of m as a general kind of “measure”; it could be the counting measure as in your post, or it could be a different kind of measure, like m(A) = probability that an element of X lies in A. Anyway, inclusion-exclusion can be generalized so as to apply to these more general valuations.
It probably comes as no surprise that inclusion-exclusion is a key principle in enumerative combinatorics. You might find the relatively high-powered discussion in Stanley’s Enumerative Combinatorics (chapter 2, Sieve Methods) pretty interesting. In fact, there is a far-reaching generalization of inclusion-exclusion called “Moebius inversion” on a poset; see example 3.8.3 on page 118 of Stanley’s book, and the surrounding material. Rota had a field day with this stuff in his work!
March 28, 2008 at 6:26 am
Vishal
Todd,
What you wrote in the first paragraph above, how might that be re-phrased in categorical language? I am very curious.
March 28, 2008 at 9:48 pm
Todd Trimble
From a categorical perspective, perhaps the most important thing to be rephrased is the relatively innocent-looking first (actually, second) sentence. [I’m sure you want to hear more than just about that one bit, but there is so much to say even about that one little sentence that I think I will defer discussion of the rest to another comment.] On the one hand, we have PX, the set of subsets of X, partially ordered by inclusion. On the other, we have 2^X, the set of functions from X to the set 2 of truth values {false = 0, true = 1}. Between them is a canonical bijection PX –> 2^X, which takes a subset A to the function which you denoted as 1_A (also called the characteristic function of the subset
, and often denoted by
).
The category-theoretic translation of this turns out to be the key idea underlying the notion of topos, as defined by F. William Lawvere and Myles Tierney (who was my PhD adviser).
First we need the concept of “subobject”, generalizing the notion of subset of a set. As a first approximation, a subset
gives an injective (i.e., a one-to-one, but perhaps not onto) function from
to
, and there is a clear-cut way of expressing injectivity in categorical terms: a morphism
is monic if, for any pair of morphisms
,
are equal whenever
.
Now, as you know, a key insight of category theory is that mathematical structure is determined not by what its elements “are”, but by the abstract relations between them (as Hilbert said, instead of points, lines, planes in geometry, we might just as well speak of tables, chairs, and beer mugs — provided that suitable relations among them were specified). It is thus mathematically more relevant to ask when two structures are isomorphic rather than exactly the same, and we are permitted to “identify” two structures as soon as an isomorphism between them is specified. With that in mind, let us say that two monomorphisms
,
define the same subobject of
if they are connected by an isomorphism
such that
. Observe that there can be at most one such isomorphism
(because
is a monomorphism), so there is no issue of specifying which isomorphism we mean.
(To give a little more detail: one can order the monomorphisms into
by the relation
which says there exists some [necessarily unique] morphism
[not necessarily an isomorphism] such that
. It is not difficult to prove that
itself must be a monomorphism. Exercise: prove that monomorphisms
define the same subobject of
if both
and
. Thus,
defines a partial order on subobjects of
.)
Now we begin to come to the main point. The crucial categorical property of the two-element set “2” is that it acts as a “subobject classifier” (in the category of sets). More precisely, 2 is equipped with a “generic” or “universal” subset given by a monomorphism
; this element of 2 is traditionally called “true”. The word “universal” here means that any subset
of
is obtained by “pulling back” the generic subset
, i.e., taking the inverse image
, along some unique map
called the characteristic map
of
. (If you’ve ever seen universal or classifying bundles in topology, this should ring a bell; if not, consider it foreshadowing!)
Here then is the official definition: if
is a category with pullbacks and with terminal object 1, a subobject classifier in
is an object
equipped with a morphism
(necessarily monic — why?), such that for any given monomorphism
, there exists a unique morphism
such that the pair
is the pullback of the diagram
. [Note that pullbacks are unique only up to isomorphism, so we are actually specifying a bijective correspondence between subobjects of
and morphisms
.]
Definition: A topos is a finitely complete cartesian closed category with a subobject classifier.
This presupposes that one knows what “cartesian closed” means (roughly, it means that for any two objects
there is a “function space object”
whose elements are identifiable with morphisms from
to
). There are lots of places where one can learn more about cartesian closed categories (I think for example John Armstrong treated these a bit on his blog), but let me just say that despite the somewhat forbidding reputation that topos theory seems to have, the notion itself is not terribly complicated. It does take time and effort though to see that how such a short definition can germinate such a rich and powerful (and beautiful!) theory.
Before taking leave, though, I’ll give another, very compact definition of topos, based on a categorical way to express the notion of “power set”. It is a slight elaboration on the theme of subobject classifier.
Let
be a category with finite limits (i.e., with pullbacks and a terminal object). If
are objects of
, a relation from
to
is by definition a subobject of
(given by a monomorphism
). A power object of
is nothing but a universal relation to
. This means an object (denoted
), together with a relation from
to
(that is, a certain subobject of
, usually denoted
; in the category of sets, it would be the subset
), that is “universal” in a sense similar to what we saw before for the subobject classifier. Namely, every monomorphism
is obtained up to isomorphism by pulling back the relation
along
, for some unique morphism $\chi_R: Y \to PX$.
(If this sounds like a mouthful, it is not so bad: in the category of sets, given a relation
, the map
sends an element
to the element of the power set
given by the subset
.)
Definition: A topos is a finitely complete category in which every object admits a power object.
And that’s it! It is a remarkable fact that from this one-line definition, one can deduce many of the fundamental constructions found in set theory (unions, coequalizers, image factorizations, etc. etc.). The notion of topos turns out to be a profound expansion of the concept “category of sets”.
More later…
March 29, 2008 at 12:39 am
#John Smith
Todd I don’t pretend to understand any of this but may I ask are you a math professor? I’m curious to know what your background is.
Instead of asking what the elements of a structure ‘are’ couldn’t we equally well ask what the relations ‘are’? What is the status of these ‘relations’?
March 29, 2008 at 12:35 pm
Todd Trimble
I used to be a math professor. Right now I’m a full time parent, but I continue to think and talk about math as time allows. Unsurprisingly, my background at the research level has centered on theory and (mathematical) applications of categories. But I’m interested in lots of math aside from that, too.
To paraphrase Bill Clinton, it depends what the definition of ‘are’ is. What I was trying to say is that in defining a mathematical structure like (say) “the” field with two elements, it doesn’t make a whit of difference what actual set we start with, whether {even, odd}, or {cat, dog}, whatever… The only thing which is mathematically relevant is to specify the relations between them, in the case of fields the addition and multiplication tables. And in comparing two structures plucked out of the air like this, it is not important to know for example which elements they have in common (e.g., whether ‘cat’ belongs to {even, odd}) — we are more interested in the mappings between them which preserve those relations relevant to the structures at hand.
We do of course use the words ‘are’, ‘is’, etc. in ways which are mathematically relevant. For instance, it would make sense to ask (if presented with a field structure on the set {cat, dog}) whether ‘cat’ is the additive identity element. But those instances of ‘is’, ‘is not’ are local ones — internal to the structure at hand. So questions of existence and ‘isness’ in actual mathematical practice are local ones, relative to the structure of discourse.
The ‘ontological’ status of relations is no different from that of elements: it makes sense to ask whether a binary relation R (a subset of A x A for some given set A) is or is not the same as another binary relation S on A, but again such questions as they arise in practice tend to be local and relative.
March 29, 2008 at 12:39 pm
Todd Trimble
Oh, by the way #John Smith, as long as we’re talking — can I ask you what your background is too? It’s sometimes hard to know how to pitch a reply without some idea of things like that.
March 29, 2008 at 6:23 pm
#John Smith
I am looking to finish my high school degree as an older student in my early 20’s. Vishal has been helping me in various ways, for which I’m grateful.
From wikipedia it says: ‘If a morphism f has domain X and codomain Y, we write f : X → Y. Thus a morphism is an arrow from its domain to its codomain’ So my question is what IS a morphism (rigorously defined)?
I do not know the details, however what would the n-morphism be for an n category where n is undefined? or if n tends to infinity?
March 29, 2008 at 7:56 pm
Todd Trimble
In the abstract, ‘morphism’ is an undefined term, much like ‘point’, ‘line’, ‘plane’ are in Euclidean geometry. It would be preferable to say that a category consists of a collection O (a set, if you like) of things called ‘objects’, a collection M of things called ‘morphisms’, a function dom: M –> O called the ‘domain’ function, another function cod: M –> O called the ‘codomain’ function… written this way, it’s a somewhat longish definition which no doubt you will already have seen on wikipedia. So in the abstract, it’s not that you define morphisms — rather you define the notion of category, calling certain of its constituents ‘morphisms’.
Having said that… in studying category theory, many of the examples of categories one first encounters have the flavor where the objects are sets equipped with a specified type of structure (such as a structure of group, or of ring, or of partially ordered set, etc.), and the morphisms are functions between those sets which preserve that type of structure; for example, a homomorphism from a group G to a group H is a function from G to H, f: G –> H, such that f takes a product g1g2 of elements of G to the corresponding product of elements in H, f(g1)f(g2). In symbols, f(g1g2) = f(g1)f(g2). In cases like that, mathematicians will sometimes introduce a category by describing just the objects (as sets with extra structure), and say something like “and where morphisms are defined in the obvious way” — they typically mean where the morphisms are functions which preserve the type of structure being discussed. But not all categories are like that.
It would be a little hard to meaningfully discuss the various levels of morphism used in n-categories, because these are very complicated types of structure, but a good starting place is n = 2. The prototypical example of a 2-category is where the ‘objects’ or ‘0-cells’ are by definition categories, where the 1-morphisms or 1-cells C –> D (where C and D are 0-cells, i.e., categories) are by definition functors from C to D, and where the 2-morphisms or 2-cells F –> G (where F and G are functors of the form C –> D) are natural transformations from F to G.
In the abstract, an n-category consists of collections C_k of k-cells, one for each k from 0 to n, together with, for each k from 1 to n, a domain function
dom_k: C_k –> C_{k-1}
and a codomain function
cod_k: C_k –> C_{k-1}
which roughly say that a k-cell is an ‘arrow’ between (k-1)-cells, which are themselves arrows between (k-2)-cells, etc., all the way down to the 0-cells. Then there is a boatload of data to describe the various ways these cells can be composed… it’s very complicated business. But you might like to poke around John Baez’s This Week’s Finds in Mathematical Physics, for some semi-popular accounts of n-categories and what they are good for — he has been one of the most effective champions of this brave and exciting new world.
March 30, 2008 at 4:39 pm
Todd Trimble
To continue in reply to Vishal’s query on categorically formulating his proof of inclusion-exclusion: so far I have discussed in categorical terms the crucial role played by characteristic functions, which gives a way of translating back and forth between subsets
and functions
into the two-element set
. This in turn gives a way of translating back and forth between the algebra of subsets, and the algebra of functions mapping into
. This back-and-forth translation is very clearly seen in Vishal’s Lemmas 1, 2, and 3. And, it is the key idea underlying the notion of topos.
Here, we have emphasized the role of 2 as the subobject classifier in the category of sets. There is of course a more obvious description of 2: it is a disjoint union of two 1-point sets, or, in categorical language, a coproduct of two copies of a terminal object.
It’s a sort of interesting and significant fact that in most toposes, the subobject classifier
is not the same (is not isomorphic to) the object 2 (defined as the coproduct 1 + 1, where 1 is terminal). There is, in any topos, a comparison map
: according to how coproducts are defined, such a map is specified by a pair of maps from 1 to
. One map we have already seen: it is the universal subobject
, aka the element “true”. The other is a map
called “false” — by definition, it is the characteristic map of the subobject
, where 0 is the initial object (in the category of sets, 0 is just the empty set). Now, when the comparison map
is an isomorphism, we say that the topos is Boolean. As I say, most toposes which arise in practice are not Boolean (and this is the source of the fact that the “internal logic” of a topos is in general intuitionistic). But, many toposes are Boolean, and it turns out there are effective methods for passing between Boolean and non-Boolean toposes, akin to the so-called “double negation translation” developed by Gödel and Kolmogorov.
I bring all this up because Vishal’s proof of inclusion-exclusion is tied to the Boolean nature of the category of sets: we are in a situation where we can use the classical logic we have all come to know and love, including (crucially) the law of the excluded middle: for any subobject
, we have
, equivalently, for all subobjects
,
, where
denotes complementation. (This may beg the question of what we mean by “complementation” in a general topos: briefly,
is defined to be the largest subobject of
whose intersection with
is the
.)
empty subobject
Anyway, life in a Boolean topos like the category of sets is a very convenient situation to be in, in part because of a very convenient fact discovered by the mathematician Marshall Stone: a Boolean algebra is essentially the same thing as a Boolean ring. I’ll assume “everyone” knows what a Boolean algebra is. A Boolean ring on the other hand is a commutative ring with identity in which every element
is idempotent: satisfies
. Given a Boolean algebra
, we get a Boolean ring whose underlying set is the underlying set of
, and where the multiplication is given by the meet or intersection operation of
(what else?), and of course this is idempotent because
. The addition operation on the other hand is defined to be the symmetric difference operation in the Boolean algebra; it should be observed that
(the symmetric difference of two elements in a Boolean algebra is the “empty” or bottom element 0). This is mandated by the assumption we get a Boolean ring (proof:
, then cancel to get
). Conversely, given a Boolean ring, we can get a Boolean algebra; I’ll leave that as an exercise. In categorical language, we would say that the category of Boolean algebras is equivalent to the category of Boolean rings.
This is important because Vishal systematically translated all the Boolean algebra operations into (Boolean) ring operations, which allowed him to do some straightforward algebra, including some crucial uses of subtraction in the ring.
Okay, anyway, that’s a categorical take on the matter in response to Vishal’s query. I should add that I am fully aware that all this category theory is perhaps a sophisticated acquired taste, and that I feel some diffidence in commandeering the discussion in this way (with comments which are both long-winded and rapid in pace — this material can take a while to digest), but maybe it will interest some people and stimulate them to read further into these matters.
March 30, 2008 at 6:10 pm
John Smith
But if we consider a group as an object and the elements as arrows, how would the group axioms be proven categorically?
March 30, 2008 at 11:52 pm
Todd Trimble
John, I’m not absolutely sure what you mean. At first it sounded like you might be referring to the (correct) fact that a group can be equivalently expressed as a one-object category in which every arrow is invertible.
If that was what you wanted to prove, then in outline it works like this. Given a group G, you construct a category with just one object — take the object to be anything you want, you could take it to be yourself for that matter, but let’s call it O. Define the morphisms of the category to be the elements of the group G. You need to specify the domain and codomain of each morphism, but since there’s just one object O, there’s no choice in the matter. Then you have to define composition of morphisms (say the composite of g: O –> O followed by g’: O –> O) — there you can define it to be gg’, their product as given by the group multiplication. The identity morphism 1_O: O –> O will then perforce be given by the identity element e of the group. You have to check that the axioms for this to be a category are satisfied (principally, the associativity law), but this follows straightforwardly from the associativity law for the group multiplication.
Every morphism in the category constructed this way is invertible; this is straightforward from the fact that every element g in the group has an inverse g^{-1}.
If all this is clear, then it should also be clear that you can reverse the process: given a category with just one object and in which every morphism is invertible, you can define a group whose elements are the morphisms of the category, and where the group multiplication is given by composition of morphisms.
If you haven’t already, you may want to get hold of a primer on category theory. John Armstrong’s notes might also be helpful.
March 31, 2008 at 7:28 pm
John Smith
Todd, is it alright if I speak to you by email because my questions may not be directly related to the topic here or may head in a different direction?
March 31, 2008 at 7:51 pm
John Smith
I’m not entirely clear about what you’ve said. When you say there’s no choice in the matter for the co/domain, where do the arrows go?
March 31, 2008 at 8:00 pm
Todd Trimble
John, sure, ask Vishal for my email address. I’ll try to set aside some time to address your questions.
I meant that in this construction, we stipulated only one object O, so the domain and codomain of any morphism in the category must be that object. Hence for this category, every morphism takes the form g: O –> O.
July 24, 2008 at 3:35 pm
The Inclusion-Exclusion Principle « The Unapologetic Mathematician
[…] the cardinality of unions in terms of the sets in question and their intersection: the Inclusion-Exclusion Principle. Predictably enough, this formula is reflected in the subspaces of a vector space. We […]
November 15, 2009 at 11:27 am
ALEEFA
how do you prove that |AUBUCUD|= |A| + |B| + |C| + |D| – |AnB| – |AnC| – |AnD| – |BnC| – |BnD| – |CnD| + |AnBnC| + |AnBnD| + |AnCnD| + |BnCnD| – |AnBnCnD| using the inclusion exclusion principle?
November 15, 2009 at 4:39 pm
John Armstrong
You do your own homework, ALEEFA.
March 19, 2016 at 7:24 am
Characteristic function of union of two sets formula and intuition - MathHub
[…] From https://topologicalmusings.wordpress.com/2008/03/20/inclusion-exclusion-principle-counting-all-the-ob… […]