**Part 2:**

After having understood the *inclusion-exclusion principle* by working out a few cases and examples in my earlier post, we are now ready to prove the general version of the principle.

As with many things in mathematics, there is a “normal” way of doing proofs *and* there is the “Polya/Szego” way of doing proofs. (Ok, ok, I admit it’s just a bias I have.) I will stick to the latter. Ok, let’s state the principle first and follow it up with its proof in a step-by-step fashion.

**Inclusion-Exclusion Principle:** Let there be a set of objects. Suppose out of these objects, there are objects of type , objects of type objects of type and objects of type . Also, suppose denote the number of objects that are *simultaneously* of type AND AND AND AND respectively. Then, the number of objects that are NOT of type is

.

**— ****Notation —
**

Let (finite or infinite) be the universe of discourse. Suppose . Then, the *characteristic function* of is defined as

if ,

and otherwise, for all .

For example, suppose . Let (*i.e.* even integers.) Then, , and so on.

*Note*: and for all . Here, denotes the *empty set*. Due to this, we will use and interchangeably from now.

**—**

**Lemma 1:** iff for all .

**Proof:** We first prove the “only if”part. So, suppose . Let . If , then . But, we also have , in which case, . If, on the other hand, , then . Hence, in either case, for all .

We now prove the “if” part. So, suppose for all . Let . Then, , which forces , which implies . Hence, , and this completes our proof.

*Note*: If is finite, then .

**Lemma 2:**

and

for all . (Here, is the complement of .)

**Proof:** The proof for each case is elementary.

**Lemma 3:** Suppose is finite. If , then the characteristic function of is , *i.e.*

for all .

**Proof:** Note the above is an extension of the third part of lemma . A simple induction on the number of subsets of proves the result.

*—* **Proof** of the *inclusion-exclusion principle — *

Now, suppose are subsets of objects of type , respectively. Observe that the set of objects that are NOT of type is simply the *region outside of all the oval regions*! (Look at the previous post to see what this means.) And this region is simply the subset . Using the first part of lemma , we see that the characteristic function of this *outside region* is , which from lemma is the same as

.

Expand the last expression to get

.

Now, sum over all the elements of and use the second part of lemma to obtain the desired result. And this completes our proof.

## 17 comments

Comments feed for this article

March 21, 2008 at 10:34 pm

Todd TrimbleIt’s a nice proof. So you’ve isomorphically translated the power set PX into 2^X, the set of functions from X to 2 = {0, 1}. The distributive law for the Boolean algebra PX is then faithfully represented by the distributive law for the Boolean ring of functions, with the pointwise structure of addition and multiplication on functions, itself inherited from the ring structure on 2. Then apply ring distributivity to death to get the conclusion!

The inclusion-exclusion principle is really a whole family of results having a similar flavor. Here’s one, mentioned in the wonderful book by Klain and Rota, An Introduction to Geometric Probability. If you have a “lattice of sets” (i.e., a subset of a power set PX which is closed under finite intersections and finite unions, and contains X and the empty set), call it L, then there is a notion of *valuation* on L: a real-valued function m on L such that . Think of m as a general kind of “measure”; it could be the counting measure as in your post, or it could be a different kind of measure, like m(A) = probability that an element of X lies in A. Anyway, inclusion-exclusion can be generalized so as to apply to these more general valuations.

It probably comes as no surprise that inclusion-exclusion is a key principle in enumerative combinatorics. You might find the relatively high-powered discussion in Stanley’s Enumerative Combinatorics (chapter 2, Sieve Methods) pretty interesting. In fact, there is a far-reaching generalization of inclusion-exclusion called “Moebius inversion” on a poset; see example 3.8.3 on page 118 of Stanley’s book, and the surrounding material. Rota had a field day with this stuff in his work!

March 28, 2008 at 6:26 am

VishalTodd,

What you wrote in the first paragraph above, how might that be re-phrased in categorical language? I am very curious.

March 28, 2008 at 9:48 pm

Todd TrimbleFrom a categorical perspective, perhaps the most important thing to be rephrased is the relatively innocent-looking first (actually, second) sentence. [I’m sure you want to hear more than just about that one bit, but there is so much to say even about that one little sentence that I think I will defer discussion of the rest to another comment.] On the one hand, we have PX, the set of subsets of X, partially ordered by inclusion. On the other, we have 2^X, the set of functions from X to the set 2 of truth values {false = 0, true = 1}. Between them is a canonical bijection PX –> 2^X, which takes a subset A to the function which you denoted as 1_A (also called the

characteristic functionof the subset , and often denoted by ).The category-theoretic translation of this turns out to be the key idea underlying the notion of

topos, as defined by F. William Lawvere and Myles Tierney (who was my PhD adviser).First we need the concept of “subobject”, generalizing the notion of subset of a set. As a first approximation, a subset gives an injective (i.e., a one-to-one, but perhaps not onto) function from to , and there is a clear-cut way of expressing injectivity in categorical terms: a morphism is

monicif, for any pair of morphisms , are equal whenever .Now, as you know, a key insight of category theory is that mathematical structure is determined not by what its elements “are”, but by the abstract relations between them (as Hilbert said, instead of points, lines, planes in geometry, we might just as well speak of tables, chairs, and beer mugs — provided that suitable relations among them were specified). It is thus mathematically more relevant to ask when two structures are

isomorphicrather than exactly the same, and we are permitted to “identify” two structures as soon as an isomorphism between them is specified. With that in mind, let us say that two monomorphisms ,define the same subobjectof if they are connected by an isomorphism such that . Observe that there can be at most one such isomorphism (because is a monomorphism), so there is no issue of specifying which isomorphism we mean.(To give a little more detail: one can order the monomorphisms into by the relation which says there exists some [necessarily unique] morphism [not necessarily an isomorphism] such that . It is not difficult to prove that itself must be a monomorphism. Exercise: prove that monomorphisms define the same subobject of if both and . Thus, defines a

partial orderon subobjects of .)Now we begin to come to the main point. The crucial categorical property of the two-element set “2” is that it acts as a “subobject classifier” (in the category of sets). More precisely, 2 is equipped with a “generic” or “universal” subset given by a monomorphism ; this element of 2 is traditionally called “true”. The word “universal” here means that any subset of is obtained by “pulling back” the generic subset , i.e., taking the inverse image , along some unique map called the characteristic map of . (If you’ve ever seen universal or classifying bundles in topology, this should ring a bell; if not, consider it foreshadowing!)

Here then is the official definition: if is a category with pullbacks and with terminal object 1, a

subobject classifierin is an object equipped with a morphism (necessarily monic — why?), such that for any given monomorphism , there exists a unique morphism such that the pair is the pullback of the diagram . [Note that pullbacks are unique only up to isomorphism, so we are actually specifying a bijective correspondence between subobjects of and morphisms .]Definition: A

toposis a finitely complete cartesian closed category with a subobject classifier.This presupposes that one knows what “cartesian closed” means (roughly, it means that for any two objects there is a “function space object” whose elements are identifiable with morphisms from to ). There are lots of places where one can learn more about cartesian closed categories (I think for example John Armstrong treated these a bit on his blog), but let me just say that despite the somewhat forbidding reputation that topos theory seems to have, the notion itself is not terribly complicated. It

doestake time and effort though to see that how such a short definition can germinate such a rich and powerful (and beautiful!) theory.Before taking leave, though, I’ll give another, very compact definition of topos, based on a categorical way to express the notion of “power set”. It is a slight elaboration on the theme of subobject classifier.

Let be a category with finite limits (i.e., with pullbacks and a terminal object). If are objects of , a

relationfrom to is by definition a subobject of (given by a monomorphism ). Apower objectof is nothing but a universal relation to . This means an object (denoted ), together with a relation from to (that is, a certain subobject of , usually denoted ; in the category of sets, it would be the subset ), that is “universal” in a sense similar to what we saw before for the subobject classifier. Namely, every monomorphism is obtained up to isomorphism by pulling back the relation along , for some unique morphism $\chi_R: Y \to PX$.(If this sounds like a mouthful, it is not so bad: in the category of sets, given a relation , the map sends an element to the element of the power set given by the subset .)

Definition: A topos is a finitely complete category in which every object admits a power object.

And that’s it! It is a remarkable fact that from this one-line definition, one can deduce many of the fundamental constructions found in set theory (unions, coequalizers, image factorizations, etc. etc.). The notion of topos turns out to be a profound expansion of the concept “category of sets”.

More later…

March 29, 2008 at 12:39 am

#John SmithTodd I don’t pretend to understand any of this but may I ask are you a math professor? I’m curious to know what your background is.

Instead of asking what the elements of a structure ‘are’ couldn’t we equally well ask what the relations ‘are’? What is the status of these ‘relations’?

March 29, 2008 at 12:35 pm

Todd TrimbleI used to be a math professor. Right now I’m a full time parent, but I continue to think and talk about math as time allows. Unsurprisingly, my background at the research level has centered on theory and (mathematical) applications of categories. But I’m interested in lots of math aside from that, too.

To paraphrase Bill Clinton, it depends what the definition of ‘are’ is. What I was trying to say is that in defining a mathematical structure like (say) “the” field with two elements, it doesn’t make a whit of difference what actual set we start with, whether {even, odd}, or {cat, dog}, whatever… The only thing which is mathematically relevant is to specify the relations between them, in the case of fields the addition and multiplication tables. And in comparing two structures plucked out of the air like this, it is not important to know for example which elements they have in common (e.g., whether ‘cat’ belongs to {even, odd}) — we are more interested in the mappings between them which preserve those relations relevant to the structures at hand.

We do of course use the words ‘are’, ‘is’, etc. in ways which

aremathematically relevant. For instance, it would make sense to ask (if presented with a field structure on the set {cat, dog}) whether ‘cat’isthe additive identity element. But those instances of ‘is’, ‘is not’ are local ones — internal to the structure at hand. So questions of existence and ‘isness’ in actual mathematical practice are local ones, relative to the structure of discourse.The ‘ontological’ status of relations is no different from that of elements: it makes sense to ask whether a binary relation R (a subset of A x A for some given set A) is or is not the same as another binary relation S on A, but again such questions as they arise in practice tend to be local and relative.

March 29, 2008 at 12:39 pm

Todd TrimbleOh, by the way #John Smith, as long as we’re talking — can I ask you what your background is too? It’s sometimes hard to know how to pitch a reply without some idea of things like that.

March 29, 2008 at 6:23 pm

#John SmithI am looking to finish my high school degree as an older student in my early 20’s. Vishal has been helping me in various ways, for which I’m grateful.

From wikipedia it says: ‘If a morphism f has domain X and codomain Y, we write f : X → Y. Thus a morphism is an arrow from its domain to its codomain’ So my question is what IS a morphism (rigorously defined)?

I do not know the details, however what would the n-morphism be for an n category where n is undefined? or if n tends to infinity?

March 29, 2008 at 7:56 pm

Todd TrimbleIn the abstract, ‘morphism’ is an undefined term, much like ‘point’, ‘line’, ‘plane’ are in Euclidean geometry. It would be preferable to say that a

categoryconsists of a collection O (a set, if you like) of things called ‘objects’, a collection M of things called ‘morphisms’, a function dom: M –> O called the ‘domain’ function, another function cod: M –> O called the ‘codomain’ function… written this way, it’s a somewhat longish definition which no doubt you will already have seen on wikipedia. So in the abstract, it’s not that you define morphisms — rather you define the notion of category, calling certain of its constituents ‘morphisms’.Having said that… in studying category theory, many of the examples of categories one first encounters have the flavor where the objects are sets equipped with a specified type of structure (such as a structure of group, or of ring, or of partially ordered set, etc.), and the morphisms are functions between those sets which

preservethat type of structure; for example, a homomorphism from a group G to a group H is a function from G to H, f: G –> H, such that f takes a product g1g2 of elements of G to the corresponding product of elements in H, f(g1)f(g2). In symbols, f(g1g2) = f(g1)f(g2). In cases like that, mathematicians will sometimes introduce a category by describing just the objects (as sets with extra structure), and say something like “and where morphisms are defined in the obvious way” — they typically mean where the morphisms are functions which preserve the type of structure being discussed. But not all categories are like that.It would be a little hard to meaningfully discuss the various levels of morphism used in n-categories, because these are very complicated types of structure, but a good starting place is n = 2. The prototypical example of a 2-category is where the ‘objects’ or ‘0-cells’ are by definition categories, where the 1-morphisms or 1-cells C –> D (where C and D are 0-cells, i.e., categories) are by definition functors from C to D, and where the 2-morphisms or 2-cells F –> G (where F and G are functors of the form C –> D) are

natural transformationsfrom F to G.In the abstract, an n-category consists of collections C_k of k-cells, one for each k from 0 to n, together with, for each k from 1 to n, a domain function

dom_k: C_k –> C_{k-1}

and a codomain function

cod_k: C_k –> C_{k-1}

which roughly say that a k-cell is an ‘arrow’ between (k-1)-cells, which are themselves arrows between (k-2)-cells, etc., all the way down to the 0-cells. Then there is a boatload of data to describe the various ways these cells can be composed… it’s very complicated business. But you might like to poke around John Baez’s This Week’s Finds in Mathematical Physics, for some semi-popular accounts of n-categories and what they are good for — he has been one of the most effective champions of this brave and exciting new world.

March 30, 2008 at 4:39 pm

Todd TrimbleTo continue in reply to Vishal’s query on categorically formulating his proof of inclusion-exclusion: so far I have discussed in categorical terms the crucial role played by

characteristic functions, which gives a way of translating back and forth between subsets and functions into the two-element set . This in turn gives a way of translating back and forth between the algebra of subsets, and the algebra of functions mapping into . This back-and-forth translation is very clearly seen in Vishal’s Lemmas 1, 2, and 3. And, it is the key idea underlying the notion of topos.Here, we have emphasized the role of 2 as the

subobject classifierin the category of sets. There is of course a more obvious description of 2: it is a disjoint union of two 1-point sets, or, in categorical language, a coproduct of two copies of a terminal object.It’s a sort of interesting and significant fact that in most toposes, the subobject classifier is not the same (is not isomorphic to) the object 2 (defined as the coproduct 1 + 1, where 1 is terminal). There

is, in any topos, a comparison map : according to how coproducts are defined, such a map is specified by a pair of maps from 1 to . One map we have already seen: it is the universal subobject , aka the element “true”. The other is a map called “false” — by definition, it is the characteristic map of the subobject , where 0 is the initial object (in the category of sets, 0 is just the empty set). Now, when the comparison map is an isomorphism, we say that the topos isBoolean. As I say, most toposes which arise in practice are not Boolean (and this is the source of the fact that the “internal logic” of a topos is in generalintuitionistic). But, many toposes are Boolean, and it turns out there are effective methods for passing between Boolean and non-Boolean toposes, akin to the so-called “double negation translation” developed by Gödel and Kolmogorov.I bring all this up because Vishal’s proof of inclusion-exclusion is tied to the Boolean nature of the category of sets: we are in a situation where we can use the classical logic we have all come to know and love, including (crucially) the law of the excluded middle: for any subobject , we have , equivalently, for all subobjects , , where denotes complementation. (This may beg the question of what we mean by “complementation” in a general topos: briefly, is defined to be the largest subobject of whose intersection with is the

empty subobject .)

Anyway, life in a Boolean topos like the category of sets is a very convenient situation to be in, in part because of a very convenient fact discovered by the mathematician Marshall Stone: a Boolean algebra is essentially the same thing as a Boolean ring. I’ll assume “everyone” knows what a Boolean algebra is. A

Boolean ringon the other hand is a commutative ring with identity in which every element isidempotent: satisfies . Given a Boolean algebra , we get a Boolean ring whose underlying set is the underlying set of , and where the multiplication is given by the meet or intersection operation of (what else?), and of course this is idempotent because . The addition operation on the other hand is defined to be thesymmetric differenceoperation in the Boolean algebra; it should be observed that (the symmetric difference of two elements in a Boolean algebra is the “empty” or bottom element 0). This is mandated by the assumption we get a Boolean ring (proof: , then cancel to get ). Conversely, given a Boolean ring, we can get a Boolean algebra; I’ll leave that as an exercise. In categorical language, we would say that the category of Boolean algebras is equivalent to the category of Boolean rings.This is important because Vishal systematically translated all the Boolean algebra operations into (Boolean)

ringoperations, which allowed him to do some straightforward algebra, including some crucial uses ofsubtractionin the ring.Okay, anyway, that’s a categorical take on the matter in response to Vishal’s query. I should add that I am fully aware that all this category theory is perhaps a sophisticated acquired taste, and that I feel some diffidence in commandeering the discussion in this way (with comments which are both long-winded and rapid in pace — this material can take a while to digest), but maybe it will interest some people and stimulate them to read further into these matters.

March 30, 2008 at 6:10 pm

John SmithBut if we consider a group as an object and the elements as arrows, how would the group axioms be proven categorically?

March 30, 2008 at 11:52 pm

Todd TrimbleJohn, I’m not absolutely sure what you mean. At first it sounded like you might be referring to the (correct) fact that a group can be equivalently expressed as a one-object category in which every arrow is invertible.

If that was what you wanted to prove, then in outline it works like this. Given a group G, you construct a category with just one object — take the object to be anything you want, you could take it to be yourself for that matter, but let’s call it O.

Definethe morphisms of the category to be the elements of the group G. You need to specify the domain and codomain of each morphism, but since there’s just one object O, there’s no choice in the matter. Then you have to define composition of morphisms (say the composite of g: O –> O followed by g': O –> O) — there you can define it to be gg’, their product as given by the group multiplication. The identity morphism 1_O: O –> O will then perforce be given by the identity element e of the group. You have to check that the axioms for this to be a category are satisfied (principally, the associativity law), but this follows straightforwardly from the associativity law for the group multiplication.Every morphism in the category constructed this way is invertible; this is straightforward from the fact that every element g in the group has an inverse g^{-1}.

If all this is clear, then it should also be clear that you can reverse the process: given a category with just one object and in which every morphism is invertible, you can define a group whose elements are the morphisms of the category, and where the group multiplication is given by composition of morphisms.

If you haven’t already, you may want to get hold of a primer on category theory. John Armstrong’s notes might also be helpful.

March 31, 2008 at 7:28 pm

John SmithTodd, is it alright if I speak to you by email because my questions may not be directly related to the topic here or may head in a different direction?

March 31, 2008 at 7:51 pm

John SmithI’m not entirely clear about what you’ve said. When you say there’s no choice in the matter for the co/domain, where do the arrows go?

March 31, 2008 at 8:00 pm

Todd TrimbleJohn, sure, ask Vishal for my email address. I’ll try to set aside some time to address your questions.

I meant that in this construction, we stipulated only one object O, so the domain and codomain of any morphism in the category must be that object. Hence for this category, every morphism takes the form g: O –> O.

July 24, 2008 at 3:35 pm

The Inclusion-Exclusion Principle « The Unapologetic Mathematician[…] the cardinality of unions in terms of the sets in question and their intersection: the Inclusion-Exclusion Principle. Predictably enough, this formula is reflected in the subspaces of a vector space. We […]

November 15, 2009 at 11:27 am

ALEEFAhow do you prove that |AUBUCUD|= |A| + |B| + |C| + |D| – |AnB| – |AnC| – |AnD| – |BnC| – |BnD| – |CnD| + |AnBnC| + |AnBnD| + |AnCnD| + |BnCnD| – |AnBnCnD| using the inclusion exclusion principle?

November 15, 2009 at 4:39 pm

John ArmstrongYou do your

ownhomework, ALEEFA.