This is a post on “foundations of mathematics” (eek!). I was motivated to write it while I’ve been struggling to understand better certain applications of ultrafilters — namely the theory of measurable cardinals — from a point of view and language that I feel comfortable with. My original intent was to blog about that, as a kind of off-shoot of the general discussion of ultrafilters I started in connection with the series on Stone duality, and because it seems kind of cool. And I will. But this got finished first, and I thought that it would be of interest to some who have been following my category theory posts.
A lot of confusion seems to reign around “the categorical approach to foundations” and what it might entail; some seem to think it involves a “doing-away with elements” that we all know and love, or doing away with sets and supplanting them with categories, or something like that. That’s an unfortunate misunderstanding. My own attitude is pragmatic: I’m all in favor of mathematicians using ordinary “naive” (pre-axiomatic) set theory to express their thoughts if that’s the familiar and appropriate conveyance — I mean, obviously I do it myself. It’s our common heritage, learned through years of undergraduate and graduate school experience and beyond. I’m not proposing for a moment to “overthrow” it.
What I do propose to discuss is a formalized set theory which embodies this rich tradition, but which takes advantage of categorical insights honed over the decades, and which I would argue is ‘natural’ in its ease to accept formulas in naive set theory and give them a foundation true to mathematical practice; I also argue it addresses certain criticisms which I feel could be put to that hallowed foundational theory, ZFC. I should own up that this theory is not immune to criticism, a main one being that a certain amount of preface and commentary is required to make it accessible (and I don’t think category theorists have done a particularly hot job doing that, frankly).
Let’s start by putting down what we want in very simple, pragmatic terms:
- A (formalized) ‘set theory’ worthy of the name ought to realize a conception of sets as “completed collections”, and allow for the existence of enough sets and relations to fulfill the needs of working mathematicians.
This is intentionally vague. The “needs of working mathematicians” fluctuate over time and place and person. Some of the core needs would include the existence of the sets of natural numbers and real numbers, for instance. On the other hand, set theorists may have greater needs than specialists in the theory of several complex variables. For now I’ll ignore some of the deeper needs of set theorists, and try to focus on the basic stuff you’d need to formalize what goes on in your average graduate school text (to put it vaguely, again).
We will discuss two formalizations of set theory: ZFC, and Lawvere’s Elementary Theory of the Category of Sets [ETCS]. The first “needs no introduction”, as they say. The second is an autonomous category-based theory, described in detail below, and proposed by Saunders Mac Lane as an alternative approach to “foundations of mathematics” (see his book with Moerdijk). Either formalization provides fully adequate infrastructure to support the naive set theory of working mathematicians, but there are significant conceptual differences between them, centering precisely on how the notion of membership is handled. I’ll start with the more familiar ZFC.
As everyone knows, ZFC formalizes a conception of “set” as collection extensionally determined by the members it contains, and the ZFC axioms ensure a rich supply of ways in which to construct new sets from old (pairings, unions, power sets, etc.). Considering how old and well-developed this theory is, and the plenitude of available accounts, I won’t say much here on its inner development. Instead, I want to pose a question and answer to highlight a key ZFC conception, and which we use to focus our discussion:
Question: “What are the members of sets?”
Answer: “Other sets.”
This may seem innocent enough, but the consequences are quite far-reaching. It says that “membership” is a relation $\in$ from the collection $V$ of all “sets” to itself. (Speaking at a pre-axiomatic level, a relation from a set $X$ to a set $Y$ is a subset $R \subseteq X \times Y$. So a structure for ZFC set theory consists of a “universe of discourse” $V$, together with a collection $\in \ \subseteq V \times V$ of pairs of elements of $V$, called the membership relation.)
Why is this a big deal? A reasonable analogue might be dynamical systems. If $X$ and $Y$ are manifolds, say, then you can study the properties of a given smooth map $f: X \to Y$ and maybe say interesting things of course, but in the case $X = Y$, you get the extra bonus that outputs can be fed back in as inputs, and infinite processes are born: you can study periodic orbits, long-term behaviors, and so on, and this leads to some very intricate mathematics, even when $X$ is a simple manifold like a 2-sphere.
My point is that something analogous is happening in ZFC: we have a (binary) relation $\in$ from $V$ to itself, and we get a rich “dynamics” and feedback by iterative relational composition of $\in$ with itself, or by composing other derived binary relations from $V$ to itself. (Perhaps I should recall here, again at a pre-axiomatic level, that the composite of a relation $R \subseteq X \times Y$ and $S \subseteq Y \times Z$ is the subset $S \circ R \subseteq X \times Z$ given by

$$S \circ R = \{(x, z) \in X \times Z : \exists_{y \in Y}\ (x, y) \in R \wedge (y, z) \in S\}.)$$

A “behavior” then would correspond to an iterated membership chain

$$\ldots \in w \in z \in y \in x$$

and there are certain constraints on behavior provided by things like the axiom of foundation (no infinitely long backward evolutions). The deep meaning of the extensionality axiom is that a “set” is uniquely specified by the abstract structure of the tree of possible backward evolutions or behaviors starting from the “root set” $x$. This gives some intuitive but honest idea of the world of sets according to the ZFC picture: sets are tree-like constructions. The ZFC axioms are very rich, having to do with incredibly powerful operations on trees, and the combinatorial results are extremely complicated.
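To make the “dynamics” of membership concrete, here is a minimal sketch in Python. It assumes an encoding of a few hereditarily finite sets as nested `frozenset`s; the names `compose`, `membership`, `empty`, `one` and `two` are chosen for illustration and are not part of the discussion above.

```python
# Hereditarily finite sets modelled as nested frozensets (an illustrative
# encoding only).
empty = frozenset()
one = frozenset({empty})           # {∅}
two = frozenset({empty, one})      # {∅, {∅}}

def compose(R, S):
    """Composite of R ⊆ X×Y followed by S ⊆ Y×Z, as a set of pairs (x, z)."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

def membership(universe):
    """The membership relation restricted to a finite universe of sets."""
    return {(a, b) for a in universe for b in universe if a in b}

V = {empty, one, two}
eps = membership(V)
eps2 = compose(eps, eps)           # pairs (x, z) with x ∈ y ∈ z for some y
eps3 = compose(eps2, eps)          # chains of length three inside V
```

In this tiny universe the only two-step chain is the one from `empty` through `one` to `two`, and there are no three-step chains, so the iteration dies out quickly; in a genuine model of ZFC it does not.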
There are other formulations of ZFC. One is by posets: given any relation $R \subseteq V \times V$ (never mind one satisfying the ZFC axioms), one can create a reflexive and transitive relation $\leq$, defined by the first-order formula

$a \leq b$ if and only if $\forall_{x \in V}\ (x, a) \in R \Rightarrow (x, b) \in R.$

The “extensionality axiom” for $R$ can then be formulated as the condition that $\leq$ also be antisymmetric, so that it is a partial ordering on $V$. If $R$ is the membership relation for a model of ZFC, then this is of course just the usual “subset relation” between elements of $V$.
Then, by adding in a suitable “singleton” operator $s: V \to V$ so that

$x \in y$ if and only if $s(x) \leq y,$

the rest of the ZFC axioms can be equivalently recast as conditions on the augmented poset structure $(V, \leq, s)$. In fact, Joyal and Moerdijk wrote a slim volume, Algebraic Set Theory, which gives a precise (and for a categorist, attractive) sense in which models of axiomatic frameworks like ZF can be expressed as certain initial algebras [of structure type $(V, \leq, s)$] within an ambient category of classes, effectively capturing the “cumulative hierarchy” conception underlying ZFC in categorical fashion.
The structure of a ZFC poset is rich and interesting, of course, but in some ways a little odd or inconvenient: e.g., it has a bottom element of course (the “empty set”), but no top (which would run straight into Russell’s paradox). Categorically, there are some cute things to point out about this poset, usually left unsaid; for example, taking “unions” is left adjoint to taking “power sets”:
$\bigcup F \leq X$ if and only if $F \leq PX$.
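The adjunction between unions and power sets can be checked mechanically on a small fragment of the universe. The following is only a sketch, assuming sets are encoded as nested `frozenset`s and that $\leq$ is read as the subset relation; the helper names `powerset`, `union` and `adjunction_holds` are hypothetical.

```python
from itertools import chain, combinations

def powerset(x):
    """The set of all subsets of x, as a frozenset of frozensets."""
    items = list(x)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return frozenset(frozenset(s) for s in subsets)

def union(f):
    """Union of a set f whose elements are themselves sets."""
    return frozenset(chain.from_iterable(f))

def adjunction_holds(f, x):
    """Check that (union of f ⊆ x) is equivalent to (f ⊆ powerset of x)."""
    return (union(f) <= x) == (f <= powerset(x))

empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
# Every subset of {∅, {∅}, {∅, {∅}}} is a set of sets, so it can play
# the role of either F or X in the adjunction.
samples = powerset(frozenset({empty, one, two}))
all_ok = all(adjunction_holds(f, x) for f in samples for x in samples)
```

The check passes for every pair in `samples`, which is what the adjunction predicts: a union of sets is contained in `x` exactly when each of those sets is a subset of `x`.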
In summary: ZFC is an axiomatic theory (in the language of first-order logic with equality), with one basic type $V$ and one basic predicate $\in$ of binary type $V \times V$, satisfying a number of axioms. The key philosophic point is that there is no typed distinction between “elements” and “sets”: both are of type $V$, and there is a consequent very complicated dynamical “mixing” which results just on the basis of a short list of axioms: enough in principle to found all of present-day mathematics! I think the fact that one gets such great power, so economically, from apparently such slender initial data, is a source of great pride and pleasure among those who uphold the ZFC conception (or that of close kin like NBG) as a gold standard in foundations.
My own reaction is that ZFC is perhaps way too powerful! For example, the fact that $\in$ is an endo-relation makes possible the kind of feedback which can result in things like Russell’s paradox, if one is not careful. Even if one is free from the paradoxes, though, the point remains that ZFC pumps out not only all of mathematics, but all sorts of dross and weird by-products that are of no conceivable interest or relevance to mathematics. One might think, for example, that to understand a model of ZFC, we have to be able to understand which definable pairs $(a, b)$ satisfy $a \in b$. So, in principle, we can ask ourselves such otherwise meaningless gibberish as “what in our model and implementation is the set-theoretic intersection of the real number $\pi$ and Cantor space?” and expect to get a well-defined answer. When you get right down to it, the idea that everything in mathematics (like say the number $e$) is a “set” is just plain bizarre, and actually very far removed from the way mathematicians normally think. And yet this is how we are encouraged to think, if we are asked to take ZFC seriously as a foundation.
One might argue that all expressions and theorems of normal mathematics are interpretable or realizable in the single theory ZFC, and that’s really all we ever asked for — the details of the actual implementation (like, ‘what is an ordered pair?’) being generally of little genuine interest to mathematicians (which is why the mathematician in the street who says ZFC is so great usually can’t say with much specificity what ZFC is). But this would seem to demote ZFC foundations, for most mathematicians, to a security blanket — nice to know it’s there, maybe, but otherwise fairly irrelevant to their concerns. But if there really is such a disconnect between how a mathematician thinks of her materials at a fundamental level and how it specifically gets coded up as trees in ZFC, with such huge wads of uninteresting or irrelevant stuff in its midst, we might re-examine just how appropriate ZFC is as “foundations” of our subject, or at least ask ourselves how much of it we usefully retain and how we might eliminate the dross.
We turn now to consider a categorical approach, ETCS. This will require retooling the way we think of mathematical membership. There are three respects in which “membership” or “elementhood” differs here from the way it is handled in ZFC:
- “Elements” and “sets” are entities of different types. (Meaning, elements are not themselves presupposed to be sets.)
- When we say “element”, we never mean an object considered in isolation; we always consider it relative to the specific “set” it is considered to be a member of. (That is, strictly speaking, the same object is never thought of as “belonging to” two distinct sets — use of such language must be carefully circumscribed.)
- We consider not just “elements” in the usual sense, but what are sometimes called “generalized elements”. Civilians call them “functions”. Thus, an element of type $X$ over a domain of variation $U$ is fancy terminology for a function $f: U \to X$. We will call them functions or “generalized elements”, depending on the intuition we have in mind. A function $x: 1 \to X$ corresponds to an ordinary element of $X$.
Each of these corresponds to some aspect of normal practice, but taken together they are sufficiently different in how they treat “membership” that they might need some getting used to. The first corresponds to a decision to treat elements of a “set” like $\mathbb{R}$ as ‘urelements’: they are not considered to have elements themselves and are not considered as having any internal structure; they are just atoms. What counts in a mathematical structure then is not what the constituents are ‘like’ themselves, but only how they are interrelated among themselves qua the structure they are considered being part of.
This brings us right to the second point. It corresponds e.g. to a decision never to consider a number like ‘3’ in isolation or as some Platonic essence, but always with respect to an ambient system to which it is bound, as in ‘3 qua natural number’, ‘3 qua rational number’, etc. It is a firm resolve to always honor context-dependence. Naturally, we can in a sense transport ‘3’ from one context to another via a specified function like $i: \mathbb{N} \to \mathbb{Q}$, but strictly speaking we’ve then changed the element. This raises interesting questions, like “what if anything plays the role of extensionality?”, or “what does it mean to take the intersection of sets?”. (Globally speaking, in ETCS we don’t, but we can, with a bit of care, speak of the intersection of two “subsets” of a given set. For any real mathematical purpose, this is good enough.)
My own sense is that it may be this second precept which takes the most getting used to — it certainly gives the lie to sometimes-heard accusations that categorical set theory is just a “slavish translation of ZFC into categorical terms”. Clearly, we are witnessing here radical departure from how membership is treated in ZFC. Such unbending commitment to the principle of context-dependence might even be felt to be overkill, a perhaps pedantic exercise in austerity or purity, or that it robs us of some freedom in how we want to manipulate sets. A few quick answers: no, we don’t lose any essential freedoms. Yes, the formal language may seem slightly awkward or stilted at certain points, but the bridges between the naive and formal are mercifully fairly obvious and easily navigated. Lastly, by treating membership not as a global endo-relation on sets, but as local and relative, we effectively eliminate all the extraneous dreck and driftwood which one rightly ignores when examining the mathematics of ZFC.
The third precept is familiar from the way category theorists and logicians have used generalized elements to extend set-theoretic notation, e.g., to chase diagrams in abelian categories, or to describe sheaf semantics of intuitionistic set theory, or to flesh out the Curry-Howard isomorphism. It is a technical move in some sense, but one which is easy to grow accustomed to, and very convenient. In ETCS, there is a strong “extensionality principle” (technically, the requirement that the terminal object is a generator) which guarantees enough “ordinary” elements to make any distinctions that can sensibly be made, but experience with topos theory suggests that for many applications, it is often convenient to drop or significantly modify that principle. If anything in ETCS is a nod to traditional set theory, it is such a strong extensionality principle. [The Yoneda principle, which deals with generalized elements, is also an extensionality principle: it says that a set is determined uniquely (to within uniquely specified isomorphism) by its generalized elements.]
Okay, it is probably time to lay out the axioms of ETCS. The basic data are just those of a category; here, we are going to think of objects as “sets”, and morphisms $x: U \to X$ as functions or equivalently as “elements of a set $X$ over a domain of variation $U$”. The latter is a mouthful, and it is sometimes convenient to suppress explicit mention of the domain $U$, so that “$x \in X$” just means some morphism $x$ with codomain $X$. More on this below. The axioms of ETCS are the axioms of category theory, plus existence axioms which guarantee enough structure to express and support naive set theory (under the strictures imposed by precepts 1-3 above). For those who speak the lingo, the axioms below are those of a well-pointed topos with natural number object and axiom of choice. (This can be augmented later with a replacement axiom, so as to achieve bi-interpretability with full ZFC.)
Remark: As ETCS breaks the “dynamical” aspects of ZFC, and additionally treats issues of membership in a perhaps unaccustomed manner, its axioms do take longer to state. This should come as no surprise. Actually, we’ve discussed some of them already in other posts on category theory; we will repeat ourselves but make some minor adjustments to reflect normal notational practice of naive set theory, and build bridges between the naive and formal.
Axiom of products. For any two sets $X, Y$, there is a set $Z$ and functions $p_1: Z \to X$, $p_2: Z \to Y$, such that given two elements $x \in X, y \in Y$ over the same domain, there exists a unique element $\langle x, y \rangle \in Z$ over that domain for which

$$p_1 \langle x, y \rangle = x \qquad p_2 \langle x, y \rangle = y$$

A choice of product $Z$ is usually denoted $X \times Y$. To make a bridge with naive set-theory notation, we suggestively write

$$X \times Y =_i \{\langle x, y \rangle : x \in X, y \in Y\}$$

where the funny equality sign and bracketing notation on the right simply mean that the cartesian product is uniquely defined up to isomorphism by its collection of (generalized) elements, which correspond to pairs of elements, in accordance with the Yoneda principle as explained in the discussion here.
We also assume the existence of an “empty product” or terminal object 1: this is a set with a unique element over any domain.
Axiom of equalizers. For any two functions $f, g: X \to Y$, there exists a function $i: E \to X$ such that
- $f i = g i$,
- Given $x \in X$ over some domain such that $f x = g x$, there exists a unique $e \in E$ over the same domain such that $i e = x$.
An equalizer $i: E \to X$ is again defined up to isomorphism by its collection of generalized elements, denoted $\{x \in X : f(x) = g(x)\}$, again in accordance with the Yoneda principle.
Using the last two axioms, we can form pullbacks: given functions $f: X \to Z$, $g: Y \to Z$, we can form the set denoted

$$\{\langle x, y \rangle \in X \times Y : f(x) = g(y)\}$$

using the product and equalizer indicated by this notation.
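For finite sets, the recipe “pullback = equalizer taken on a product” can be sketched directly in Python. This is only an illustration on ordinary elements, with Python sets standing in for objects and Python functions for morphisms; the names `cartesian`, `equalizer` and `pullback`, and the sample data, are assumptions made for the example.

```python
from itertools import product

def cartesian(X, Y):
    """The set of pairs (x, y) with x in X and y in Y."""
    return set(product(X, Y))

def equalizer(f, g, X):
    """The elements of X on which f and g agree."""
    return {x for x in X if f(x) == g(x)}

def pullback(f, g, X, Y):
    """{(x, y) in X×Y : f(x) == g(y)}, as an equalizer of two maps on X×Y."""
    return equalizer(lambda p: f(p[0]), lambda p: g(p[1]), cartesian(X, Y))

X = {0, 1, 2, 3}
Y = {"a", "bb", "ccc"}
# Both maps land in {0, 1}: parity of a number, parity of a string's length.
P = pullback(lambda x: x % 2, lambda s: len(s) % 2, X, Y)
```

Here `P` consists of the pairs whose two components have the same parity, which is exactly the set described by the notation above.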
Before stating the next axiom, a few important remarks. We recall that a function $i: A \to X$ is injective if for every $x, y \in A$ over the same domain, $i x = i y$ implies $x = y$. In that case we think of $i$ as defining a “subset” $A$ of $X$, whose (generalized) elements correspond to those elements $x \in X$ which factor (evidently uniquely) through $i$. It is in that sense that we say $x \in X$ also “belongs to” a subset $A$ (cf. precept 2). A relation from $X$ to $Y$ is an injective function or subset $R \hookrightarrow X \times Y$.
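Axiom of power sets. For every set $Y$ there is a choice of power set $P(Y)$ and a relation $\in_Y \hookrightarrow Y \times P(Y)$, so that for every relation $R \hookrightarrow X \times Y$, there exists a unique function $\chi_R: X \to P(Y)$ such that $R$ is obtained up to isomorphism as the pullback

$$\{\langle x, y \rangle \in X \times Y : y \in_Y \chi_R(x)\}$$

In other words, $\langle x, y \rangle$ belongs to $R$ if and only if $\langle y, \chi_R(x) \rangle$ belongs to $\in_Y$.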
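On finite sets the classifying map of a relation is easy to write down: it sends each $x$ to the set of all $y$ related to it. The sketch below assumes a relation is stored as a Python set of pairs; `chi` and `recover` are hypothetical helper names.

```python
def chi(R, X):
    """Classifying map X -> P(Y) of a relation R ⊆ X×Y: x ↦ {y : (x, y) ∈ R}."""
    return {x: frozenset(y for (x2, y) in R if x2 == x) for x in X}

def recover(chi_R):
    """Pull membership back along chi_R: all pairs (x, y) with y ∈ chi_R(x)."""
    return {(x, y) for x, ys in chi_R.items() for y in ys}

X = {1, 2, 3}
R = {(1, "a"), (1, "b"), (3, "a")}
chi_R = chi(R, X)
```

Recovering the relation from its classifying map returns `R` itself, which is the finite shadow of the pullback condition in the axiom.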
Axiom of strong extensionality. For functions $f, g: X \to Y$, we have $f = g$ if and only if $f x = g x$ for all “ordinary” elements $x: 1 \to X$.
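Axiom of natural number object. There is a set $\mathbb{N}$, together with an element $0: 1 \to \mathbb{N}$ and a function $s: \mathbb{N} \to \mathbb{N}$, which is initial among sets equipped with such data. That is, given a set $X$ together with an element $x_0: 1 \to X$ and a function $f: X \to X$, there exists a unique function $h: \mathbb{N} \to X$ such that

$$h(0) = x_0, \qquad h s = f h$$

Or, in elementwise notation, $h(n+1) = f(h(n))$ for every (generalized) element $n \in \mathbb{N}$, where $n+1$ means $s(n)$. Under strong extensionality, we may drop the qualifier “generalized”.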
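The universal property of the natural number object is definition by recursion. As a rough sketch, with Python's non-negative integers standing in for $\mathbb{N}$ and a hypothetical helper named `recursor`:

```python
def recursor(x0, f):
    """Return the function h with h(0) = x0 and h(n + 1) = f(h(n))."""
    def h(n):
        value = x0
        for _ in range(n):
            value = f(value)
        return value
    return h

def double(x):
    return 2 * x

power_of_two = recursor(1, double)   # h(n) = 2**n
```

For example, `power_of_two(10)` evaluates to `1024`, and by construction the function satisfies the recursion equations of the axiom at every argument one cares to test.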
Before stating the last axiom, we formulate a notion of “surjective” function: $f: X \to Y$ is surjective if for any two functions $g, h: Y \to Z$, we have $g = h$ if and only if $g f = h f$. This is dual to the notion of being injective, and under the axiom of strong extensionality, is equivalent to the familiar notion: that $f$ is surjective if for every element $y: 1 \to Y$, there exists an element $x: 1 \to X$ such that $f(x) = y$.
Axiom of choice. Every surjective function $s: X \to Y$ admits a section, i.e., a function $i: Y \to X$ such that $s i = 1_Y$, the identity function.
This completes the list of axioms for ETCS. I have been at pains to try to describe them in notation which is natural from the standpoint of naive set theory, with the clear implication that any formula of naive set theory is readily translated into the theory ETCS (provided we pay appropriate attention to our precepts governing membership), and that this theory provides a rigorous foundation for mainstream mathematics.
To make good on this claim, further discussion is called for. First, I have not discussed how classical first-order logic is internalized in this setting (which we would need to do justice to a comprehension or separation scheme), nor have I discussed the existence or construction of colimits. I plan to take this up later, provided I have the energy for it. Again, the plan would be to stick as closely as possible to naive set-theoretic reasoning. (This might actually be useful: the categorical treatments found in many texts tend to be technical, often involving things like monad theory and Beck’s theorem, which makes it hard for those not expert in category theory to get into. I want to show this need not be the case.)
Also, some sort of justification for the claim that ETCS “founds” mainstream mathematics is called for. Minimally, one should sketch how the reals are constructed, for instance, and one should include enough “definability theory” to make it plausible that almost all constructions in ordinary mathematics find a natural home in ETCS. What is excluded? Mainly certain parts of set theory, and parts of category theory (ha!) which involve certain parts of set theory, but this is handled by strengthening the theory with more axioms; I particularly have in mind a discussion of the replacement axiom, and perhaps large cardinal axioms. More to come!
Last time in this series on Stone duality, we introduced the concept of lattice and various cousins (e.g., inf-lattice, sup-lattice). We said a lattice is a poset with finite meets and joins, and that inf-lattices and sup-lattices have arbitrary meets and joins (meaning that every subset, not just every finite one, has an inf and sup). Examples include the poset $PX$ of all subsets of a set $X$, and the poset $Sub(V)$ of all subspaces of a vector space $V$.
I take it that most readers are already familiar with many of the properties of the poset $PX$; there is for example the distributive law $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, and De Morgan laws, and so on; we’ll be exploring more of that in depth soon. The poset $Sub(V)$, as a lattice, is a much different animal: if we think of meets and joins as modeling the logical operations “and” and “or”, then the logic internal to $Sub(V)$ is a weird one. It’s actually much closer to what is sometimes called “quantum logic”, as developed by von Neumann, Mackey, and many others. Our primary interest in this series will be in the direction of more familiar forms of logic, classical logic if you will (where “classical” here is meant more in a physicist’s sense than a logician’s).
To get a sense of the weirdness of $Sub(V)$, take for example a 2-dimensional vector space $V$. The bottom element is the zero space $\{0\}$, the top element is $V$, and the rest of the elements of $Sub(V)$ are 1-dimensional: lines through the origin. For 1-dimensional spaces $x, y$, there is no relation $x \leq y$ unless $x$ and $y$ coincide. So we can picture the lattice $Sub(V)$ as having three levels according to dimension, with lines drawn to indicate the partial order:
        V = 1
       /  |  \
      /   |   \
     x    y    z
      \   |   /
       \  |  /
          0
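Observe that for distinct elements $x, y, z$ in the middle level, we have for example $x \wedge y = 0 = x \wedge z$ (0 is the largest element contained in both $x$ and $y$), and also for example $y \vee z = 1$ (1 is the smallest element containing $y$ and $z$). It follows that $x \wedge (y \vee z) = x \wedge 1 = x$, whereas $(x \wedge y) \vee (x \wedge z) = 0 \vee 0 = 0$. The distributive law fails in $Sub(V)$!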
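The failure of distributivity can be replayed on the five-element lattice described above (bottom, top, and three lines). The following Python sketch computes meets and joins straight from the partial order; the string labels and the helper names `leq`, `meet` and `join` are illustrative choices.

```python
# Bottom "0", top "1", and three pairwise incomparable atoms x, y, z.
ELEMENTS = ["0", "x", "y", "z", "1"]

def leq(a, b):
    """The partial order: 0 is below everything, 1 is above everything."""
    return a == b or a == "0" or b == "1"

def meet(a, b):
    """Greatest lower bound of a and b."""
    lower = [c for c in ELEMENTS if leq(c, a) and leq(c, b)]
    return next(c for c in lower if all(leq(d, c) for d in lower))

def join(a, b):
    """Least upper bound of a and b."""
    upper = [c for c in ELEMENTS if leq(a, c) and leq(b, c)]
    return next(c for c in upper if all(leq(c, d) for d in upper))

lhs = meet("x", join("y", "z"))              # x ∧ (y ∨ z)
rhs = join(meet("x", "y"), meet("x", "z"))   # (x ∧ y) ∨ (x ∧ z)
```

Evaluating the two sides gives `"x"` on the left and `"0"` on the right, matching the computation in the text.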
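Definition: A lattice is distributive if $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$ for all $x, y, z$. That is to say, a lattice $X$ is distributive if the map $x \wedge -: X \to X$, taking an element $y$ to $x \wedge y$, is a morphism of join-semilattices.
- Exercise: Show that in a meet-semilattice, $x \wedge -: X \to X$ is a poset map. Is it also a morphism of meet-semilattices? If $X$ has a bottom element, show that the map $x \wedge -$ preserves it.
- Exercise: Show that in any lattice, we at least have $(x \wedge y) \vee (x \wedge z) \leq x \wedge (y \vee z)$ for all elements $x, y, z$.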
Here is an interesting theorem, which illustrates some of the properties of lattices we’ve developed so far:
Theorem: The notion of distributive lattice is self-dual.
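Proof: The notion of lattice is self-dual, so all we have to do is show that the dual of the distributivity axiom, $x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z)$, follows from the distributive lattice axioms.
Expand the right side to $((x \vee y) \wedge x) \vee ((x \vee y) \wedge z)$, by distributivity. This reduces to $x \vee ((x \vee y) \wedge z)$, by an absorption law. Expand this again, by distributivity, to $x \vee (x \wedge z) \vee (y \wedge z)$. This reduces to $x \vee (y \wedge z)$, by the other absorption law. This completes the proof.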
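As a sanity check rather than a proof, both distributive laws can be verified exhaustively on a concrete distributive lattice. The example below is my own choice, not one from the post: the divisors of 60 ordered by divisibility, where meet is `gcd` and join is the least common multiple.

```python
from itertools import product
from math import gcd

def lcm(a, b):
    """Least common multiple, the join in the divisibility order."""
    return a * b // gcd(a, b)

divisors = [d for d in range(1, 61) if 60 % d == 0]

# x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)
meet_over_join = all(
    gcd(x, lcm(y, z)) == lcm(gcd(x, y), gcd(x, z))
    for x, y, z in product(divisors, repeat=3)
)
# The dual law: x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z)
join_over_meet = all(
    lcm(x, gcd(y, z)) == gcd(lcm(x, y), lcm(x, z))
    for x, y, z in product(divisors, repeat=3)
)
```

Both flags come out true, as the theorem says they must once either one holds.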
Distributive lattices are important, but perhaps even more important in mathematics are lattices where we have not just finitary, but infinitary distributivity as well:
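Definition: A frame is a sup-lattice $X$ for which $x \wedge -: X \to X$ is a morphism of sup-lattices, for every $x \in X$. In other words, for every subset $S \subseteq X$, we have $\sup(\{x \wedge s : s \in S\}) = x \wedge \sup(S)$, or, as is often written,

$$\bigvee_{s \in S} (x \wedge s) = x \wedge \bigvee_{s \in S} s.$$

Example: A power set $PX$, as always partially ordered by inclusion, is a frame. In this case, it means that for any subset $A$ and any collection of subsets $\{B_i : i \in I\}$, we have

$$A \cap \Big(\bigcup_{i \in I} B_i\Big) = \bigcup_{i \in I} (A \cap B_i).$$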
This is a well-known fact from naive set theory, but soon we will see an alternative proof, thematically closer to the point of view of these notes.
Example: If $X$ is a set, a topology on $X$ is a subset $\mathcal{T} \subseteq PX$ of the power set, partially ordered by inclusion as $PX$ is, which is closed under finite meets and arbitrary sups. This means the empty sup or bottom element $\emptyset$ and the empty meet or top element $X$ of $PX$ are elements of $\mathcal{T}$, and also:
- If $U, V$ are elements of $\mathcal{T}$, then so is $U \cap V$.
- If $\{U_i : i \in I\}$ is a collection of elements of $\mathcal{T}$, then $\bigcup_{i \in I} U_i$ is an element of $\mathcal{T}$.
A topological space is a set $X$ which is equipped with a topology $\mathcal{T}$; the elements of the topology are called open subsets of the space. Topologies provide a primary source of examples of frames; because the sups and meets in a topology are constructed the same way as in $PX$ (unions and finite intersections), it is clear that the requisite infinite distributivity law holds in a topology.
The concept of topology was originally rooted in analysis, where it arose by contemplating very generally what one means by a “continuous function”. I imagine many readers who come to a blog titled “Topological Musings” will already have had a course in general topology, but just to be on the safe side I’ll now give one example of a topological space, with a promise of more to come later. Let $X$ be the set $\mathbb{R}^n$ of $n$-tuples of real numbers. First, define the open ball in $\mathbb{R}^n$ centered at a point $x \in \mathbb{R}^n$ and of radius $r > 0$ to be the set $\{y \in \mathbb{R}^n : \|x - y\| < r\}$. Then, define a subset $U \subseteq \mathbb{R}^n$ to be open if it can be expressed as the union of a collection, finite or infinite, of (possibly overlapping) open balls; the topology is by definition the collection of open sets.
It’s clear from the definition that the collection of open sets is indeed closed under arbitrary unions. To see it is closed under finite intersections, the crucial lemma needed is that the intersection of two overlapping open balls is itself a union of smaller open balls. A precise proof makes essential use of the triangle inequality. (Exercise?)
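The lemma has a concrete quantitative form: a point lying in two open balls is the center of a smaller ball whose radius is the lesser of its two “distances to the boundary”, and the triangle inequality keeps that smaller ball inside both. The Python sketch below illustrates this numerically in the plane; the function names and the sample balls are assumptions made for the example, and sampling a few points is of course no substitute for the proof.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def inner_radius(z, c1, r1, c2, r2):
    """Radius of a ball about z contained in both B(c1, r1) and B(c2, r2).

    The value is positive exactly when z lies in both open balls.
    """
    return min(r1 - dist(z, c1), r2 - dist(z, c2))

c1, r1 = (0.0, 0.0), 1.0
c2, r2 = (1.0, 0.0), 1.0
z = (0.5, 0.0)
eps = inner_radius(z, c1, r1, c2, r2)

# Sample points strictly inside the ball of radius eps about z.
samples = [
    (z[0] + 0.98 * eps * math.cos(t), z[1] + 0.98 * eps * math.sin(t))
    for t in (2 * math.pi * k / 16 for k in range(16))
]
inside_both = all(dist(y, c1) < r1 and dist(y, c2) < r2 for y in samples)
```

Every sampled point lands in both of the original balls, as the triangle inequality predicts.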
Topology is a huge field in its own right; much of our interest here will be in its interplay with logic. To that end, I want to bring in, in addition to the connectives “and” and “or” we’ve discussed so far, the implication connective in logic. Most readers probably know that in ordinary logic, the formula $p \Rightarrow q$ (“$p$ implies $q$”) is equivalent to “either not $p$ or $q$”; symbolically, we could define $p \Rightarrow q$ as $\neg p \vee q$. That much is true in ordinary Boolean logic. But instead of committing ourselves to this reductionistic habit of defining implication in this way, or otherwise relying on Boolean algebra as a crutch, I want to take a fresh look at material implication and what we really ask of it.
The main property we ask of implication is modus ponens: given $p$ and $p \Rightarrow q$, we may infer $q$. In symbols, writing the inference or entailment relation as $\leq$, this is expressed as $p \wedge (p \Rightarrow q) \leq q$. And, we ask that implication be the weakest possible such assumption, i.e., that material implication $p \Rightarrow q$ be the weakest $a$ whose presence in conjunction with $p$ entails $q$. In other words, for given $p$ and $q$, we now define implication $p \Rightarrow q$ by the property

$(a \wedge p \leq q)$ if and only if $(a \leq p \Rightarrow q).$
As a very easy exercise, show by Yoneda that an implication is uniquely determined when it exists. As the next theorem shows, not all lattices admit an implication operator; in order to have one, it is necessary that distributivity holds:
Theorem:
- (1) If $X$ is a meet-semilattice which admits an implication operator, then for every element $p \in X$, the operator $p \wedge -: X \to X$ preserves any sups which happen to exist in $X$.
- (2) If $X$ is a frame, then $X$ admits an implication operator.
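Proof: (1) Suppose $S \subseteq X$ has a sup in $X$, here denoted $\bigvee_{s \in S} s$. We have

$(\bigvee_{s \in S} s) \wedge p \leq q$ if and only if
$\bigvee_{s \in S} s \leq (p \Rightarrow q)$ if and only if
for all $s \in S$, $s \leq (p \Rightarrow q)$ if and only if
for all $s \in S$, $s \wedge p \leq q$ if and only if
$\bigvee_{s \in S} (s \wedge p) \leq q$.

Since this is true for all $q$, the (dual of the) Yoneda principle tells us that $(\bigvee_{s \in S} s) \wedge p = \bigvee_{s \in S} (s \wedge p)$, as desired. (We don’t need to add the hypothesis that the sup on the right side exists, for the first four lines after “We have” show that $(\bigvee_{s \in S} s) \wedge p$ satisfies the defining property of that sup.)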
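(2) Suppose $p, q$ are elements of a frame $X$. Define $p \Rightarrow q$ to be $\sup(\{a \in X : a \wedge p \leq q\})$. By definition, if $a \wedge p \leq q$, then $a \leq p \Rightarrow q$. Conversely, if $a \leq p \Rightarrow q$, then

$$a \wedge p \leq \sup\{x : x \wedge p \leq q\} \wedge p = \sup\{x \wedge p : x \wedge p \leq q\},$$

where the equality holds because of the infinitary distributive law in a frame, and this last sup is clearly bounded above by $q$ (according to the defining property of sups). Hence $a \wedge p \leq q$, as desired.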
Incidentally, part (1) of this theorem gives an alternative proof of the infinitary distributive law for Boolean algebras such as $PX$, so long as we trust that $p \Rightarrow q := \neg p \vee q$ really does what we ask of implication. We’ll come to that point again later.
Part (2) has some interesting consequences vis à vis topologies: we know that topologies provide examples of frames; therefore by part (2) they admit implication operators. It is instructive to work out exactly what these implication operators look like. So, let $U, V$ be open sets in a topology. According to our prescription, we define $U \Rightarrow V$ as the sup (the union) of all open sets $W$ with the property that $W \cap U \subseteq V$. We can think of this inclusion as living in the power set $PX$. Then, assuming our formula for implication in the Boolean algebra $PX$ (where $U^c$ denotes the complement of $U$), we would have $W \subseteq U^c \cup V$. And thus, our implication $U \Rightarrow V$ in the topology is the union of all open sets $W$ contained in the (usually non-open) set $U^c \cup V$. That is to say, $U \Rightarrow V$ is the largest open contained in $U^c \cup V$, otherwise known as the interior of $U^c \cup V$. Hence our formula:

$$U \Rightarrow V = \mathrm{int}(U^c \cup V).$$
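On a finite topological space all of this can be computed by brute force. The sketch below assumes a three-point set with a small chain of open sets, chosen purely for illustration; `implies` follows the sup prescription and `interior` the usual definition, and both names are hypothetical.

```python
X = frozenset({1, 2, 3})
# A small topology on X: it contains ∅ and X and is closed under
# unions and intersections because the open sets form a chain.
opens = [frozenset(), frozenset({1}), frozenset({1, 2}), X]

def implies(U, V):
    """Union of all open sets W with W ∩ U ⊆ V."""
    return frozenset().union(*(W for W in opens if W & U <= V))

def interior(A):
    """Largest open set contained in A."""
    return frozenset().union(*(W for W in opens if W <= A))

# The defining property of implication: W ∩ U ⊆ V iff W ⊆ (U ⇒ V).
adjunction_ok = all(
    (W & U <= V) == (W <= implies(U, V))
    for U in opens for V in opens for W in opens
)
# Agreement with the interior of the complement of U united with V.
interior_ok = all(
    implies(U, V) == interior((X - U) | V) for U in opens for V in opens
)
```

Both checks succeed for this topology. For instance, with `U = {1, 2}` and `V = {1}` the implication is `{1}`, even though the set-theoretic candidate `{1, 3}` is not open.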
Definition: A Heyting algebra is a lattice $H$ which admits an implication $p \Rightarrow q$ for any two elements $p, q \in H$. A complete Heyting algebra is a complete lattice which admits an implication for any two elements.
Again, our theorem above says that frames are (extensionally) the same thing as complete Heyting algebras. But, as in the case of inf-lattices and sup-lattices, we make intensional distinctions when we consider the appropriate notions of morphism for these concepts. In particular, a morphism of frames is a poset map which preserves finite meets and arbitrary sups. A morphism of Heyting algebras preserves all structure in sight (i.e., all implied in the definition of Heyting algebra — meets, joins, and implication). A morphism of complete Heyting algebras also preserves all structure in sight (sups, infs, and implication).
Heyting algebras are usually not Boolean algebras. For example, it is rare that a topology is a Boolean lattice. We’ll be speaking more about that soon, but for now I’ll remark that Heyting algebra is the algebra which underlies intuitionistic propositional calculus.
Exercise: Show that in a Heyting algebra.
Exercise: (For those who know some general topology.) In a Heyting algebra, we define the negation $\neg x$ to be $(x \Rightarrow 0)$. For the Heyting algebra given by a topology, what can you say about $\neg U$ when $U$ is open and dense?
The difference between sets $A$ and $B$, also known as the relative complement of $B$ in $A$, is the set $A - B$ defined by

$$A - B = \{x \in A : x \notin B\}.$$
If we assume the existence of a universe, $E$, such that all the sets are subsets of $E$, then we can considerably simplify our notation. So, for instance, $E - A$ can simply be written as $A'$, which denotes the complement of $A$ in $E$. Similarly, $(A')' = A$, $\emptyset' = E$, $E' = \emptyset$, and so on. A quick look at a few more facts:
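- $A \subset B$ if and only if $B' \subset A'$.
The last one is proved as follows. We prove the “if” part first. Suppose $B' \subset A'$. If $x \in A$, then clearly $x \notin A'$. But, since $B' \subset A'$, we have $x \notin B'$, which implies $x \in B$. Hence, $A \subset B$. This closes the proof of the “if” part. Now, we prove the “only if” part. So, suppose $A \subset B$. Now, if $x \in B'$, then clearly $x \notin B$. But, since $A \subset B$, we have $x \notin A$, which implies $x \in A'$. Hence, $B' \subset A'$. This closes the proof of the “only if” part, and we are done.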
The following are the well-known DeMorgan’s laws (about complements):
$(A \cup B)' = A' \cap B'$, and $(A \cap B)' = A' \cup B'$.
Let’s quickly prove the first one. Suppose $x$ belongs to the left hand side. Then, $x \notin A \cup B$, which implies $x \notin A$ and $x \notin B$, which implies $x \in A'$ and $x \in B'$, which implies $x \in A' \cap B'$. This proves that the left hand side is a subset of the right hand side. We can similarly prove the right hand side is a subset of the left hand side, and this closes our proof.
Though it isn’t very apparent, if we look carefully at the couple of problems whose proofs we did above, we notice something called the principle of duality for sets. One encounters such dual principles in mathematics quite often. In this case, the dual principle is stated as follows.
Principle of duality (for sets): If in an inclusion or equation involving unions, intersections and complements of subsets of $E$ (the universe) we replace each set by its complement, interchange unions and intersections, and reverse all set-inclusions, the result is another theorem.
Using the above principle, it is easy to “derive” one of DeMorgan’s laws from the other. In addition, DeMorgan’s laws can be extended to larger collections of sets instead of just pairs.
Here are a few exercises on complementation.
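- $A \subset B$ if and only if $A - B = \emptyset$,
We will prove the last one, namely $(A \cup C) \cap (B \cup C') \subset A \cup B$, leaving the remaining as exercises to the reader. Suppose $x$ belongs to the left hand side. Then, $x \in A \cup C$ and $x \in B \cup C'$. Now, note that if $x \in C$, then $x \notin C'$, which implies $x \in B$, which implies $x \in A \cup B$. If, on the other hand, $x \in C'$, then $x \notin C$, which implies $x \in A$, which implies $x \in A \cup B$. Hence, in either case, the left hand side is a subset of $A \cup B$, and we are done.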
We now define the symmetric difference (or Boolean sum) of two sets $A$ and $B$ as follows:
$A + B = (A - B) \cup (B - A)$.
This is basically the set of all elements in $A$ or $B$ but not in $A \cap B$. In other words, $A + B = (A \cup B) - (A \cap B)$. Again, a few facts (involving symmetric differences) that aren’t hard to prove:
This brings us now to the axiom of powers, which basically states that if $E$ is a set then there exists a set that contains all the possible subsets of $E$ as its elements.
Axiom of powers: If $E$ is a set, then there exists a set (collection) $\mathcal{P}$, such that if $X \subset E$, then $X \in \mathcal{P}$.
The set $\mathcal{P}$, described above, may be too “comprehensive”, i.e., it may contain sets other than the subsets of $E$. Once again, we “fix” this by applying the axiom of specification to form the new set $\{X \in \mathcal{P} : X \subset E\}$. This set is called the power set of $E$, and the axiom of extension, again, guarantees its uniqueness. We denote it by $\mathcal{P}(E)$ to show its dependence on $E$. A few illustrative examples: $\mathcal{P}(\emptyset) = \{\emptyset\}$, $\mathcal{P}(\{a\}) = \{\emptyset, \{a\}\}$, $\mathcal{P}(\{a, b\}) = \{\emptyset, \{a\}, \{b\}, \{a, b\}\}$, and so on.
Note that if $X$ is a finite set, containing $n$ elements, then the power set $\mathcal{P}(X)$ contains $2^n$ elements. The “usual” way to prove this is by either using a simple combinatorial argument or by using some algebra. The combinatorial argument is as follows. An element in $X$ either belongs to a subset of $X$ or it doesn’t: there are thus two choices; since there are $n$ elements in $X$, the number of all possible subsets of $X$ is therefore $2^n$. A more algebraic way of proving the same result is as follows. The number of subsets with $k$ elements is $\binom{n}{k}$. So, the number of subsets of $X$ is $\sum_{k=0}^{n} \binom{n}{k}$. But, from the binomial theorem, we have $(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$. Putting $x = 1$, we get $2^n$ as our required answer.
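Both counting arguments can be mirrored in a few lines of Python. This is only a sketch: the helper `power_set` and the three-element example set are my own choices, and subsets are represented as `frozenset`s so that they can be elements of another set.

```python
from itertools import combinations
from math import comb

def power_set(x):
    """Return the set of all subsets of the finite set `x`."""
    items = list(x)
    return {frozenset(c)
            for k in range(len(items) + 1)
            for c in combinations(items, k)}

X = {"a", "b", "c"}  # hypothetical example set
n = len(X)
P = power_set(X)

# Algebraic argument: group the subsets by their number of elements.
counts_by_size = [sum(1 for s in P if len(s) == k) for k in range(n + 1)]

print(len(P), counts_by_size)
```

The list `counts_by_size` holds the binomial coefficients for this $n$, and their sum is the size of the power set.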
A few elementary facts:
- If $E \subset F$, then $\mathcal{P}(E) \subset \mathcal{P}(F)$.
1. Prove that .
2. Prove that .
We are now ready to discuss a couple of familiar set theoretic operations: unions and intersections. Given two sets and , it would be “nice” to have a set that contains all the elements that belong to at least one of or . In fact, it would be nicer to generalize this to a collection of sets instead of just two, though we must be careful about using words like “two”, “three” and so on, since we haven’t really defined what numbers are so far. We don’t want to run the risk of inadvertently introducing circularity in our arguments! Anyway, this brings us to the following axiom.
Axiom of unions: For every collection of sets, there exists a set that contains all the elements that belong to at least one set of the given collection.
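In other words, for every collection $\mathcal{C}$, there exists a set $U$ such that if $x \in X$ for some $X$ in $\mathcal{C}$, then $x \in U$. Now, the set $U$ may contain “extra” elements that may not belong to any $X$ in $\mathcal{C}$. This can be easily fixed by invoking the axiom of specification to form the set $\{x \in U : x \in X \text{ for some } X \text{ in } \mathcal{C}\}$. This set is called the union of the collection $\mathcal{C}$ of sets. Its uniqueness is guaranteed by the axiom of extension.
Generally, if $\mathcal{C}$ is a collection of sets, then the union is denoted by $\bigcup \{X : X \in \mathcal{C}\}$, or $\bigcup_{X \in \mathcal{C}} X$.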
A quick look at a couple of simple facts.
1) , and
We finally arrive at the definition of the union of two sets, $A$ and $B$: $A \cup B = \{x : x \in A \text{ or } x \in B\}$.
Below is a list of a few facts about unions of pairs:
- $A \subset B$ if and only if $A \cup B = B$.
Now, we define the intersection of two sets, $A$ and $B$, as follows.
$A \cap B = \{x \in A : x \in B\}$.
Once again, a few facts about intersections of pairs (analogous to the ones involving unions):
- $A \subset B$ if and only if $A \cap B = A$.
Also, if $A \cap B = \emptyset$, then the sets $A$ and $B$ are called disjoint sets.
Two useful distributive laws involving unions and intersections:
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, and $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
We prove the first one of the above. The proof of the second one is left as an exercise to the reader. The proof relies on the idea that we show each side is a subset of the other. So, suppose $x$ belongs to the left hand side; then $x \in A$ and $x \in B \cup C$, which implies $x \in A$ and ($x \in B$ or $x \in C$), which implies $x \in A \cap B$ or $x \in A \cap C$, which implies $x \in (A \cap B) \cup (A \cap C)$; hence $x$ belongs to the right hand side. This proves that the left hand side is a subset of the right hand side. A similar argument shows that the right hand side is a subset of the left hand side. And, we are done.
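The element-chasing argument really only uses three yes/no facts about $x$: whether it lies in $A$, in $B$, and in $C$. So, as a rough sanity check, one can run through all eight combinations of those answers. The function names below are mine, and the booleans `a`, `b`, `c` stand for “$x \in A$”, “$x \in B$”, “$x \in C$”.

```python
from itertools import product

def first_law(a, b, c):
    """x in A ∩ (B ∪ C)  iff  x in (A ∩ B) ∪ (A ∩ C)."""
    return (a and (b or c)) == ((a and b) or (a and c))

def second_law(a, b, c):
    """x in A ∪ (B ∩ C)  iff  x in (A ∪ B) ∩ (A ∪ C)."""
    return (a or (b and c)) == ((a or b) and (a or c))

# Every possible answer to "is x in A?", "is x in B?", "is x in C?".
rows = list(product([False, True], repeat=3))
both_hold = all(first_law(*r) and second_law(*r) for r in rows)
print(both_hold)
```

This is the truth-table version of the same proof, which also settles the second law left as an exercise.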
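The operation of the intersection of sets from a collection, $\mathcal{C}$, is similar to that of the union of sets from $\mathcal{C}$. However, the definition will require that we prohibit $\mathcal{C}$ from being empty, and we will see why in the next section. So, for each collection, $\mathcal{C}$, there exists a set $V$ such that $x \in V$ if and only if $x \in X$ for every $X$ in $\mathcal{C}$. To construct such a set $V$, we choose any set $A$ in $\mathcal{C}$ (this is possible because $\mathcal{C} \neq \emptyset$) and write $V = \{x \in A : x \in X \text{ for every } X \text{ in } \mathcal{C}\}$.
Note that the above construction is only used to prove that $V$ exists. The existence of $V$ doesn’t depend on any arbitrary set $A$ in the collection $\mathcal{C}$. We can, in fact, write
$V = \{x : x \in X \text{ for every } X \text{ in } \mathcal{C}\}$.
The set $V$ is called the intersection of the collection $\mathcal{C}$ of sets. The axiom of extension, once again, guarantees its uniqueness. The usual notation for such a set is $\bigcap \{X : X \in \mathcal{C}\}$ or $\bigcap_{X \in \mathcal{C}} X$.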
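EXERCISE: $(A \cap B) \cup C = A \cap (B \cup C)$ if and only if $C \subset A$.
SOLUTION: We first prove the “if” part. So, suppose $C \subset A$. Now, if $x \in (A \cap B) \cup C$, then either $x \in A \cap B$ or $x \in C$. In the first case, $x \in A$ and $x \in B$, which implies $x \in A$ and $x \in B \cup C$. In the second case, we again have $x \in A$ (since $C \subset A$), which implies $x \in A$ and $x \in B \cup C$. In either case, we have $x \in A \cap (B \cup C)$. Hence, $(A \cap B) \cup C$ is a subset of $A \cap (B \cup C)$.
Similarly, if $x \in A \cap (B \cup C)$, then $x \in A$ and $x \in B \cup C$. Now, if $x \in B$, then $x \in A \cap B$, which implies $x \in (A \cap B) \cup C$. And, if $x \in C$, then once again $x \in (A \cap B) \cup C$. Thus, in either case, $x \in (A \cap B) \cup C$. Hence, $A \cap (B \cup C)$ is a subset of $(A \cap B) \cup C$. We have thus proved $(A \cap B) \cup C = A \cap (B \cup C)$. This concludes the proof of the “if” part.
Now, we prove the “only if” part. So, suppose $(A \cap B) \cup C = A \cap (B \cup C)$. If $x \in C$, then $x$ belongs to the left hand side of the equality, which implies $x$ belongs to the right hand side. This implies $x \in A$ (and $x \in B \cup C$). Hence, $C \subset A$. And, we are done.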
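For the skeptical, here is a brute-force check of the statement $(A \cap B) \cup C = A \cap (B \cup C)$ if and only if $C \subset A$, run over all triples of subsets of a three-element universe. The universe and the helper `subsets` are assumptions made only for this sketch.

```python
from itertools import combinations, product

def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

all_subsets = subsets({1, 2, 3})  # hypothetical small universe

def equation_holds(a, b, c):
    return (a & b) | c == a & (b | c)

# The equation should hold exactly when C is a subset of A.
exercise_verified = all(
    equation_holds(a, b, c) == (c <= a)
    for a, b, c in product(all_subsets, repeat=3)
)
print(exercise_verified)
```

Note that the check compares two truth values for each triple, so it tests both the “if” and the “only if” directions at once.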
After postulating a couple of important axioms in the previous two sections, we now arrive at a couple of important results.
1. There exists an empty set. (In fact, there exists exactly one!)
2. The empty set is a subset of every set.
Indeed, to prove the first result, suppose $A$ is some set. Then, the set $\{x \in A : x \neq x\}$ is clearly an empty set, i.e. it doesn’t contain any elements. To “picture” this, imagine an empty box with nothing inside it. In fact, we can apply the axiom of specification to $A$ with any universally false sentence to create an empty set. The empty set is denoted by $\emptyset$. The axiom of extension, on the other hand, guarantees there can be only one empty set.
Now, how do we argue that $\emptyset \subset A$, for any arbitrary set $A$? Well, the reasoning is an indirect one, and for most beginners, it doesn’t seem like a complete one. There is something in the argument that doesn’t feel quite “right!” However, there is nothing “incomplete” about the argument, and here it is anyway.
Suppose, for the sake of contradiction, the empty set, $\emptyset$, is not a subset of $A$. Then, there exists an element in $\emptyset$ that doesn’t belong to $A$. But, the empty set is empty, and hence, no such element exists! This means our initial hypothesis is false. Hence, we conclude (maybe, still somewhat reluctantly) $\emptyset \subset A$.
Now, the set theory we have developed thus far isn’t a very rich one; after all, we have only shown that there is one set and that it is empty! Can we do better? Can we come up with an axiom that can help us construct new sets? Well, it turns out, there is one.
Axiom of pairing: If $a$ and $b$ are two sets, then there exists a set $A$ such that $a \in A$ and $b \in A$.
The above axiom guarantees that if there are two sets, $a$ and $b$, then there exists another one, $A$, that contains both of these. However, $A$ may contain elements other than $a$ and $b$. So, can we guarantee there is a set that contains exactly $a$ and $b$ and nothing else? Indeed, we can. We just apply the axiom of specification to $A$ with the sentence “$x = a$ or $x = b$.” Thus, the set $\{x \in A : x = a \text{ or } x = b\}$ is the required one.
The above construction of a particular set illustrates one important fact: all the remaining principles of set construction are pseudo-special cases of the axiom of specification. Indeed, if it were given that there exists a set containing some particular elements, then the existence of a set containing exactly those elements (and nothing else) would follow as a special case of the axiom of specification.
Now, observe that if $a$ is a set, then the axiom of pairing implies the existence of the set $\{a, a\}$, which is the same as the set $\{a\}$ and is called a singleton of $a$. Also, note that $\emptyset$ and $\{\emptyset\}$ are different sets; the first has no elements at all, whereas the second has exactly one element, viz. the empty set. In fact, there is a minimalist (inductive) way of constructing the set of natural numbers, $\mathbb{N}$, (due to von Neumann) using the axiom of infinity as follows: $0 = \emptyset$, $1 = \{\emptyset\}$, $2 = \{\emptyset, \{\emptyset\}\}$, and so on.
But, more on this later.
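In the meantime, here is a small sketch of the von Neumann idea for the impatient: $0$ is the empty set and each successor $n + 1$ is $n \cup \{n\}$. I use Python’s `frozenset` because ordinary sets cannot be elements of other sets; the function name `von_neumann` is just a label I picked.

```python
def von_neumann(n):
    """Return the von Neumann set for n: 0 = {} and k + 1 = k ∪ {k}."""
    current = frozenset()
    for _ in range(n):
        # Successor step: add the current set to itself as a new element.
        current = current | frozenset([current])
    return current

zero = von_neumann(0)
one = von_neumann(1)
three = von_neumann(3)

print(len(zero), len(one), len(three))
```

Each number built this way has exactly as many elements as its value, and the empty set and its singleton come out as the distinct sets `zero` and `one`.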
The axiom of extension (discussed in Section 1) is unique in the sense that it postulates the existence of a relation between belonging and equality. All the other axioms of set theory, on the other hand, are designed to create new sets out of old ones!
The axiom of specification, loosely speaking, states that given some arbitrary (but well-defined) set $A$ (our universe), if we can make some “intelligent” assertion about the elements of $A$, then we specify or characterize a subset of $A$. An intelligent assertion about the elements of $A$ could, for example, specify a property that is shared by some elements of $A$ and not shared by the other elements. In the end, we will take up an example about an assertion that is tied to the famous Russell’s paradox.
For now, let us discuss a simple example. Suppose is the set of all (living) women. If we use to denote an arbitrary element of , then the sentence “ is married” is true for some of the elements of and false for others. Thus, we could generate a subset of using such a sentence. So, the subset of all the women who are married is denoted by . To take another example, if is the set of natural numbers, then . Now, note that the subset of is not the same as the number Loosely speaking, a box containing a hat is not the same thing as the hat itself.
Now, we only need to define what a sentence is before we can precisely formulate our axiom of specification. The following rules would be a formal way to (recursively) define a sentence:
“$x \in A$” is a sentence.
“$A = B$” is a sentence.
If $P$ is a sentence, then $\neg P$ is a sentence.
If $P, Q$ are sentences, then $P \wedge Q$ is a sentence.
If $P, Q$ are sentences, then $P \vee Q$ is a sentence.
If $P, Q$ are sentences, then $P \Rightarrow Q$ is a sentence.
If $P, Q$ are sentences, then $P \Leftrightarrow Q$ is a sentence.
If $P$ is a sentence, then $\exists x\, P$ is a sentence.
If $P$ is a sentence, then $\forall x\, P$ is a sentence.
Note that the two types of sentences, “$x \in A$” and “$A = B$”, stated in the first two rules, are what we would call atomic sentences, while the other rules specify (valid) ways of generating (infinitely) many sentences from those two atomic sentences using the usual logical operators. Also, note that some of the rules above are redundant because it is possible to convert certain sentences having a set of logical operators to another sentence having a different set of logical operators. For example, “$P \Rightarrow Q$” can be written as “$\neg P \vee Q$”, and so on. Anyway, we are digressing too far from our objective.
Having defined what a sentence is, we can now formulate the major principle of set theory, often referred to by its German name Aussonderungsaxiom.
Axiom of specification: To every set $A$ and to every condition $S(x)$, there corresponds a set $B$ whose elements are exactly those elements $x$ of $A$ for which $S(x)$ holds.
A “condition” here just means a sentence. The letter $x$ is free in the sentence $S(x)$, meaning $x$ occurs in $S(x)$ at least once without occurring in the phrases “for some $x$” or “for all $x$”. Now, the axiom of extension guarantees us that the axiom of specification determines the set $B$ uniquely, and we usually write $B = \{x \in A : S(x)\}$.
This finally brings us to the example we mentioned in the beginning of this section. Let us define $S(x)$ to be the sentence “$x \notin x$”. Suppose $A$ is some arbitrary set. Let $B = \{x \in A : x \notin x\}$. Then for all $y$,
$y \in B$ if and only if ($y \in A$ and $y \notin y$). $(*)$
Can we have $B \in A$? The answer is no, and here’s why. Suppose, for the sake of contradiction, $B \in A$. Then, we have either $B \in B$, or $B \notin B$. If $B \in B$, then using $(*)$, we have $B \notin B$, a contradiction. And, if $B \notin B$, then using $(*)$ again, the assumption $B \in A$ yields $B \in B$, a contradiction. This proves that $B \in A$ is false, and hence we conclude $B \notin A$. Note that our set $A$ was an arbitrary one, and we just showed that there is something (viz. $B$) that does not belong to $A$. We have, thus, essentially proved that
there is no universe.
Here, “universe” means “universe of discourse”, a set that contains all the objects that enter into that discussion.
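The argument above is a diagonal one, and it can be watched in action on a toy model. In the sketch below, which is my own illustration and not anything from Halmos, the “sets” are just the numbers `0` to `n - 1`, and “membership” is an arbitrary yes/no table `member[x][y]` (read: `x` is a member of `y`). Whatever table we pick, no `y` has as its members exactly those `x` that are not members of themselves.

```python
import random

def russell_subset_missing(n, member):
    """member[x][y] is True when node x is a 'member' of node y.

    Returns True when no node y has as its members exactly the
    nodes x that are not members of themselves.
    """
    target = [not member[x][x] for x in range(n)]
    return all(
        [member[x][y] for x in range(n)] != target
        for y in range(n)
    )

random.seed(0)
n = 5
results = []
for _ in range(200):
    # A random membership table on n nodes (hypothetical toy universe).
    member = [[random.random() < 0.5 for _ in range(n)] for _ in range(n)]
    results.append(russell_subset_missing(n, member))

always_missing = all(results)
print(always_missing)
```

The reason it can never fail is the same as in the proof: a candidate `y` would have to disagree with itself on the question of whether `y` is a member of `y`.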
It was mentioned earlier that the above example has something to do with Russell’s paradox. We shall see why. In the earlier pre-axiomatic approaches to set theory, the existence of a universe was taken for granted. Now, in the above example, we just showed that $B \notin A$ implies the non-existence of a universe. So, if we assume that a universe exists, then it implies that $B \in A$, but we have already shown that this leads to a contradiction! And this was exactly the content of Russell’s paradox. In Halmos’ own words:
The moral is that it is impossible, especially in mathematics, to get something for nothing. To specify a set, it is not enough to pronounce some magic words (which may form a sentence such as “$x \notin x$”); it is necessary also to have at hand a set to whose elements the magic words apply.
We encounter sets, or if we prefer, collections of objects, every day in our lives. A herd of cattle, a group of women, or a bunch of yahoos are all instances of sets of living beings. “The mathematical concept of a set can be used as the foundation for all known mathematics.” The purpose here is to develop the basic properties of sets. As a slight digression, I wouldn’t consider myself a Platonist; hence, I don’t believe there are some abstract properties of sets “out there” and that the chief purpose of mathematics is to discover those abstract things, so to speak. Even though the idea of a set is ubiquitous and it seems like the very concept of a set is “external” to us, I still think that we must build, or rather postulate, the existence of the fundamental properties of sets. (I think I am more of a quasi-empiricist.)
Now, we won’t define what a set is, just as we don’t define what points or lines are in the familiar axiomatic approach to elementary geometry. So, we somewhat rely on our intuition to develop a definition of sets. Of course, our intuition may go wrong once in a while, but one of the very purposes of our exposition is to reason very clearly about our intuitive ideas, so that we can correct them any time if we discover they are wrong.
Now, a very reasonable thing to “expect” from a set is it should have elements or members. So, for example, Einstein was a member of the set of all the people who lived in the past. In mathematics, a line has points as its members, and a plane has lines as its members. The last example is a particularly important one for it underscores the idea that sets can be members of other sets!
So, a way to formalize the above notion is by developing the concept of belonging. This is a primitive (undefined) concept in axiomatic set theory. If $x$ is a member of $A$ ($x$ is contained in $A$, or $x$ is an element of $A$), we write $x \in A$. ($\in$ is a derivation of the Greek letter epsilon, $\epsilon$, introduced by Peano in 1888.) If $x$ is not an element of $A$, we write $x \notin A$. Note that we generally reserve lowercase letters ($x$, $y$, etc) for members or elements of a set, and we use uppercase letters to denote sets.
A possible relation between sets, more elementary than belonging, is equality. If two sets $A$ and $B$ are equal, we write $A = B$. If two sets $A$ and $B$ are not equal, we write $A \neq B$.
Now, the most basic property of belonging is its relation to equality, which brings us to the following formulation of our first axiom of set theory.
Axiom of extension: Two sets are equal if and only if they have the same elements.
Let us examine the relation between equality and belonging a little more deeply. Suppose we consider human beings instead of sets, and change our definition of belonging a little. If $x$ and $A$ are human beings, we write $x \in A$ whenever $x$ is an ancestor of $A$. Then our new (or analogous) axiom of extension would say that if two human beings $A$ and $B$ are equal then they have the same ancestors (this is the “only if” part, and it is certainly true), and also that if $A$ and $B$ have the same ancestors, then they are equal (this is the “if” part, and it certainly is false). $A$ and $B$ could be two sisters, in which case they have the same ancestors but they are certainly not the same person.
Conclusion: The axiom of extension is not just a logically necessary property of equality but a non-trivial statement about belonging.
Also, note that the two sets and have the same elements, and hence, by the axiom of extension, , even though it seems like has just two elements while has five! It is due to this that we drop duplicates while writing down the elements of a set. So, in the above example, it is customary to simply write .
Now, we come to the definition of a subset. Suppose $A$ and $B$ are sets. If every member of $A$ is a member of $B$, then we say $A$ is a subset of $B$, or $B$ includes $A$, and write $A \subset B$ or $B \supset A$. This definition, clearly, implies that every set is a subset of itself, i.e. $A \subset A$, which demonstrates the reflexive property of set inclusion. (Of course, equality also satisfies the reflexive property, i.e. $A = A$.) We say $A$ is a proper subset of $B$ whenever $A \subset B$ but $A \neq B$. Now, if $A \subset B$ and $B \subset C$, then $A \subset C$, which demonstrates the transitive property of set inclusion. (Once again, equality also satisfies this property, i.e. if $A = B$ and $B = C$, then $A = C$.) However, we note that set inclusion doesn’t satisfy the symmetric property. This means that $A \subset B$ doesn’t necessarily imply $B \subset A$. (On the other hand, equality satisfies the symmetric property, i.e. if $A = B$, then $B = A$.)
But, set inclusion does satisfy one very important property: the antisymmetric one. If we have $A \subset B$ and $B \subset A$, then $A$ and $B$ have the same elements, and therefore, by the axiom of extension, $A = B$. In fact, we can reformulate the axiom of extension as follows:
Axiom of extension (another version): Two sets $A$ and $B$ are equal if and only if $A \subset B$ and $B \subset A$.
In mathematics, the above is almost always used whenever we are required to prove that two sets $A$ and $B$ are equal. All we do is show that $A \subset B$ and $B \subset A$, and invoke the (above version of) axiom of extension to conclude that $A = B$.
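Python’s built-in sets happen to behave extensionally, so they make a handy scratchpad for these ideas. In the sketch below the particular elements are placeholders of my own; `<=` plays the role of “is a subset of” and `<` the role of “is a proper subset of”.

```python
A = {"x", "y"}
B = set(["x", "x", "y", "y", "y"])  # duplicates are dropped on construction

# Double inclusion: A is a subset of B and B is a subset of A.
same_elements = A <= B and B <= A

# A proper subset is a subset that is not equal to the whole set.
proper = {"x"} < A

print(same_elements, A == B, proper)
```

Writing an element several times changes nothing, and showing inclusion in both directions amounts to equality, just as the second version of the axiom says.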
Before we conclude, we note that conceptually belonging ($\in$) and set inclusion ($\subset$) are two different things. $A \subset A$ always holds, but $A \in A$ is “false”; at least, it isn’t true of any reasonable set that anyone has ever constructed! This means, unlike set inclusion, belonging does not satisfy the reflexive property. Again, unlike set inclusion, belonging does not satisfy the transitive property. For example, a person could be considered a member of a country and a country could be considered a member of the United Nations Organization (UNO); however, a person is not a member of the UNO.
I have just started reading Paul R. Halmos’ classic text Naive Set Theory, and I intend to blog on each section of the book. The purpose is mainly to internalize all the material presented in the book and at the same time provide the gist of each section, so that I can always come back and read whenever I feel like doing so. The actual text, divided into 25 sections (or chapters, if you will), comprises 102 pages. Halmos’ original intention was “to tell the beginning student of advanced mathematics the basic set theoretic facts of life, and to do so with the minimum of philosophical discourse and logical formalism… The style is usually informal to the point of being conversational.”
The reader is warned that “the expert specialist will find nothing new here.” Halmos recommends Hausdorff’s Set Theory and Axiomatic Set Theory by Suppes for a more extensive treatment of the subject. Nevertheless, the treatment by Halmos is not trivial at all. I personally feel his exposition is impeccable!
Almost all the ideas presented in the following posts belong to the author of the book, and I make absolutely no claims to originality in the exposition.