% Upper-case A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
% Lower-case a b c d e f g h i j k l m n o p q r s t u v w x y z
% Digits 0 1 2 3 4 5 6 7 8 9
% Exclamation ! Double quote " Hash (number) #
% Dollar $ Percent % Ampersand &
% Acute accent ' Left paren ( Right paren )
% Asterisk * Plus + Comma ,
% Minus - Point . Solidus /
% Colon : Semicolon ; Less than <
% Equals = Greater than > Question mark ?
% At @ Left bracket [ Backslash \
% Right bracket ] Circumflex ^ Underscore _
% Grave accent ` Left brace { Vertical bar |
% Right brace } Tilde ~
% ---------------------------------------------------------------------|
% --------------------------- 72 characters ---------------------------|
% ---------------------------------------------------------------------|
%
% Optimal Foraging Theory Revisited: Appendix. Mathematical Background
% (this material not included in final version of document)
%
% (c) Copyright 2007 by Theodore P. Pavlic
%
% (it would be best to split this chapter into multiple files someday;
% it is a long book in one file at the moment)

\chapter{Mathematical Background}
\label{app:math}

\sym{*conventions}{{[\texttt{xx}]}}{see reference number \texttt{xx} in \hyperref[ch:bibliography]{the bibliography}}

This \appname{} is meant to provide most of the mathematical knowledge required for an understanding of our models and arguments. In order for this material to be useful to a diverse audience, we develop nearly all mathematical theory from first principles. That being said, we will immediately make use of the symbols \symdef{Ageneral.0}{equals}{$=$}{is equal to} and \symdef{Ageneral.0}{definedas}{$\triangleq$}{defined as}. The former indicates that some quantity is \emph{equal} to another quantity, and the latter indicates that some quantity is \emph{defined as} another quantity. The former will be used in the conclusions of arguments, and the latter will be used to define symbols useful in those arguments.
This difference between $\triangleq$ and $=$ will become clearer in our examples. We also make use of general counting principles that are surely well understood by any member of our audience. All other concepts and notation will be defined as needed. This \appname{} focuses on set theory, algebra, number systems, real analysis, elementary measure theory, and propositional logic. We will give references for each of these individual topics when appropriate. However, \citet{Stoll79} provides a detailed unified treatment of sets, algebra, numbers, and logic that could easily replace most of this \appname{}.

\section{Sets}
\label{app:math_sets}

This is meant to be a brief introduction to set theory, a topic on which nearly all of mathematics can be constructed. While there are some alternative foundational candidates for mathematics, set theory is commonly used, and we consider it to be the foundation of all of the constructs that we use. Common applications of mathematics (\eg, \emph{arithmetic}) do not make their set-theoretic foundations explicit. However, set theory is used explicitly and extensively in the study of probability and random processes. We focus on the few set-theoretic concepts that we use. \Citet{Viniotis98} provides another useful appendix on set theory that contains additional examples and definitions. \Citet{JW96} give a complete introduction to set theory. Set theory is fundamentally related to formal logic, which is discussed in \longref{app:math_logic}, and thus analogies between set-theoretic and logical constructs are not coincidental. While the actual history of set theory and formal logic is more complicated, we will view formal logic as a specialization of set theory. As mentioned, modern set theory generalizes nearly all of formal mathematics and thus is an important fundamental concept.
\subsection{Sets: Definition and Examples}

A \symdef[\emph{set}]{Csets.0}{set}{$\set{X}$}{a set $\set{X}$}\symdef[]{Csets.1a}{longset}{$\{a,b,c\}$}{a set of objects $a$, $b$, and $c$} is roughly a collection of distinct items, where an item is any abstract object. This definition follows from \emph{naive (or intuitive) set theory}. Unfortunately, this definition is not rigorous and can lead to the construction of paradoxical sets. The modern definition of a set follows from \emph{axiomatic set theory} (\ie, \emph{\acro[]{ZFC}{Zermelo-Fraenkel set theory with the axiom of choice assumed}}, which is \emph{\acro{ZF}{Zermelo-Fraenkel set theory}} with the \emph{axiom of choice} also assumed), which prevents these paradoxes by defining a set as an object that satisfies certain specific mathematical axioms. These axioms endow sets with important characteristics on which modern set theory is built. A proper handling of set theory would define a set using these axioms; however, for brevity, we give the naive set-theoretic definition. By doing this, we risk leading the reader into paradoxes of logic \citep[for details, see][]{JW96}; however, the theory used in the rest of our work depends upon the modern axiomatic definition.

\paragraph{Notation:} When sets are listed explicitly, their elements are usually separated by commas and bracketed with curly braces. Because sets are abstract entities, they are often specified with words. The following are some example sets that we will use throughout this \appname{}.
%
\begin{subequations}\label{eq:ex_sets}
\begin{align}
\set{Z} &\triangleq \{\text{The people in the living room right now}\} \label{eq:ex_set_Z}\\
\set{Q} &\triangleq \{\text{The objects that could fit inside a cube with $1 \text{ m}^3$ volume}\} \label{eq:ex_set_Q}\\
\set{J} &\triangleq \{\text{Statements made by Joe}\} \label{eq:ex_set_J}\\
% \set{S} &\triangleq \{\text{The four different outcomes of two
% successive coin tosses}\}\\
% &= \{ (\text{Tails},\text{Tails}),
% (\text{Tails},\text{Heads}), (\text{Heads},\text{Tails}),
% (\text{Heads},\text{Heads}) \}
\set{S} &\triangleq \{\text{The two different outcomes of a single coin toss}\}\nonumber\\
&= \{ \text{Heads}, \text{Tails} \} \label{eq:ex_set_S}
\end{align}
%
In the last case (\ie, the set $\set{S}$), we show how a set definition can be made more precise with an enumeration of its specific elements. We can similarly define the set $\setset{O}$ as
%
\begin{align}
\setset{O} &\triangleq \{\text{The set of the above examples of sets}\} \nonumber\\
&= \{\set{Z},\set{Q},\set{J},\set{S}\} \label{eq:ex_set_O}
\end{align}
\end{subequations}
%
That is, the set $\setset{O}$ is a set of sets. We will typically use calligraphic letters for the names of sets (\eg, $\set{A}$) and script letters for the names of sets of sets (\eg, $\setset{A}$).

\paragraph{Numbers and Infinite Sets:} It is important to note that sets can contain other sets as elements. In fact, the set $\{\}$ is different from the set $\{\{\}\}$, and so $\{ \{\}, \{\{\}\} \}$ is a legal representation of a set since it contains distinct items.
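This distinction can be checked mechanically. In the following Python sketch (an illustration of ours, outside the formal development), \texttt{frozenset} objects stand in for sets because, unlike Python's mutable sets, they are hashable and so can be elements of other sets.

```python
# frozenset objects stand in for sets; unlike Python's mutable set type,
# they can themselves be elements of other sets.
empty = frozenset()               # the set {}
singleton = frozenset({empty})    # the set {{}}

# {} and {{}} are different sets ...
assert empty != singleton

# ... so { {}, {{}} } is a legal representation of a set:
# it contains two distinct items.
pair = frozenset({empty, singleton})
assert len(pair) == 2
```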
In fact, the set of \emph{natural numbers} and the set of \emph{whole numbers} can each be formally defined by any of the sets
%
\begin{subequations} \label{eq:some_countably_infinite_sets}
\begin{align}
\biggl\{ \{\}, \bigl\{ \{\} \bigr\}, \Bigl\{ \bigl\{\{\}\bigr\} \Bigr\}, \dots \biggr\} \label{eq:some_countably_infinite_sets_a}\\
\Biggl\{ \{\}, \bigl\{ \{\} \bigr\}, \Bigl\{ \{\}, \bigl\{\{\}\bigr\} \Bigr\}, \biggl\{ \{\}, \bigl\{\{\}\bigr\}, \Bigl\{\{\}, \bigl\{\{\}\bigr\}\Bigr\} \biggr\}, \dots \Biggr\} \label{eq:some_countably_infinite_sets_b}\\
\Biggl\{ \bigl\{ \{\} \bigr\}, \Bigl\{ \{\}, \bigl\{\{\}\bigr\} \Bigr\}, \biggl\{ \{\}, \bigl\{\{\}\bigr\}, \Bigl\{\{\}, \bigl\{\{\}\bigr\}\Bigr\} \biggr\}, \dots \Biggr\} \label{eq:some_countably_infinite_sets_c}
\end{align}
\end{subequations}
%
where \symdef{Csets.1aa}{dots}{$\dots$}{continue the established pattern \adinfinitum{} (\eg, the infinite set $\{1,2,3,\dots\}$)} indicates that the established pattern should continue \adinfinitum{}. The pattern in \longref{eq:some_countably_infinite_sets_a} is that each element set contains the element before it (\ie, the element to the left of it in the list). The pattern in \longrefs{eq:some_countably_infinite_sets_b} and \shortref{eq:some_countably_infinite_sets_c} is that each element set contains \emph{all} of the sets before it; however, the initial element set is different in these two examples. Therefore, all three of these sets are \emph{infinite sets} since each contains an infinite (\ie, unbounded) number of elements. The previous example sets $\set{Z}$, $\set{S}$, and $\setset{O}$ are all \emph{finite sets} since they contain a finite (\ie, bounded) number of elements. Without further information, it is not clear whether the sets $\set{Q}$ and $\set{J}$ are infinite or finite. The concepts of finite and infinite sets will be explored further below.
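The first few elements of the first of these patterns can be generated programmatically; this short Python sketch (our illustration, with frozensets standing in for sets) shows that the pattern never repeats, which is why the set it generates is infinite.

```python
# Generate the first few elements of the pattern where each element is the
# singleton set containing the element before it: {}, {{}}, {{{}}}, ...
elems = [frozenset()]
for _ in range(3):
    elems.append(frozenset({elems[-1]}))

# After the first, each element is a singleton set ...
assert all(len(e) == 1 for e in elems[1:])
assert elems[2] == frozenset({frozenset({frozenset()})})

# ... and all four generated elements are distinct, so the pattern
# never repeats as it continues.
assert len(set(elems)) == 4
```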
We choose to define the set of whole numbers $\W$ and the set of natural numbers $\N$ as the infinite sets in \longrefs{eq:some_countably_infinite_sets_b} and \shortref{eq:some_countably_infinite_sets_c} respectively. First, note that every element of the infinite set in \longref{eq:some_countably_infinite_sets_c} is also an element of the set in \longref{eq:some_countably_infinite_sets_b}. Now assign familiar symbols to the elements of these two infinite sets in order to make the definitions of the \emph{whole numbers} and \emph{natural numbers} more explicit. The result is % \begin{align} \W &\triangleq \{0,1,2,3,\dots\} \label{eq:whole_numbers} \end{align} % and % \begin{align} \N &\triangleq \{1,2,3,\dots\} \label{eq:natural_numbers} \end{align} % where % \begin{subequations} \begin{align} 0 &\triangleq \{\} \label{eq:zero}\\ 1 &\triangleq \bigl\{ \{\} \bigr\} = \{0\} \label{eq:one}\\ 2 &\triangleq \Bigl\{ \{\}, \bigl\{\{\}\bigr\} \Bigr\} = \{0,1\}\label{eq:two}\\ 3 &\triangleq \biggl\{ \{\}, \bigl\{\{\}\bigr\}, \Bigl\{\{\}, \bigl\{\{\}\bigr\}\Bigr\} \biggr\} = \{0,1,2\} \label{eq:three}\\ {}&\mathrel{\vdots} {} \nonumber \end{align} \end{subequations} % The justification for the construction process of the whole numbers, which are used to count things, is as follows. If the universe was empty, it would be equivalent to the empty set $\{\}$ and would have zero items in it. Thus, $0 \triangleq \{\}$. Once $0$ was constructed, the universe would now have one thing in it, and so it would be represented by $\{0\}$, and thus $1 \triangleq \{0\}$. This construction process can continue \adinfinitum{} until all of the whole numbers (\ie, all elements of $\W$) are defined. We will discuss how \emph{arithmetic} can be defined on the sets $\W$ and $\N$ in \longref{app:math_numbers}. However, for the moment we will use these as two example infinite sets and each whole number as an example finite set. 
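The construction of the whole numbers described above can also be carried out computationally. In the following Python sketch (our illustration; the function name is ours), frozensets stand in for sets, and each whole number is built as the set of all whole numbers constructed before it.

```python
# Build the first few whole numbers by the construction in the text:
# 0 = {} and each successive number collects all numbers defined so far.
def successor(n):
    """Given the set n, return the set containing n and all of n's elements."""
    return n | frozenset({n})

zero = frozenset()        # 0 = {}
one = successor(zero)     # 1 = {0}
two = successor(one)      # 2 = {0,1}
three = successor(two)    # 3 = {0,1,2}

assert one == frozenset({zero})
assert two == frozenset({zero, one})
assert three == frozenset({zero, one, two})

# Each whole number n, viewed as a set, has exactly n elements.
assert len(three) == 3

# Smaller whole numbers are both elements and proper subsets of larger ones.
assert two in three and two < three
```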
To demonstrate how arbitrary finite sets can interact, we also introduce example finite sets % \begin{align*} \set{A} &\triangleq \{a,b,c\}\\ \set{B} &\triangleq \{c,d,e\}\\ \set{C} &\triangleq \{b\}\\ \set{D} &\triangleq \{d,e\}\\ \set{E} &\triangleq \{c,d,e\} \end{align*} % where $a,b,c,d,e$ are arbitrary abstract objects. Additionally, since the generic \emph{empty set} will frequently be used in discussion, we will often denote it with the symbol \symdef{Csets.1b}{emptyset}{$\emptyset$}{the empty set (\ie, $\{\}$)} which is defined $\emptyset \triangleq \{\}$. If a set is not the empty set, it will be called \emph{nonempty}. Also, note that the set $\set{C}$ only includes a single element. In this case, the set is called a \emph{singleton set}. \subsection{Set Inclusion, Set Exclusion, Subsets, and Supersets} There are a number of terms that capture the relationship between two sets or a set and its elements. \paragraph{Inclusion and Exclusion:} The notation \symdef[]{Csets.1b}{in}{$\in$}{is an element of (\ie, set inclusion)}$a \in \set{A}$ indicates that object $a$ is an \emph{element} of set $\set{A}$, and $\set{A}$ is said to \emph{contain} the \emph{set} $\{a\}$. Similarly, \symdef[]{Csets.1b}{notin}{$\notin$}{is not an element of (\ie, set exclusion)}$a \notin \set{B}$ denotes that object $a$ is not an element of set $\set{B}$. \paragraph{Containment:} Since every element of set $\set{D}$ is also an element of set $\set{B}$, set $\set{D}$ is called a \emph{subset} of set $\set{B}$ and set $\set{B}$ is called a \emph{superset} of set $\set{D}$; this is denoted by either \symdef[]{Csets.1c}{subsupseteq}{$\subseteq$ ($\supseteq$)}{is a subset (superset) of}$\set{D} \subseteq \set{B}$ or $\set{B} \supseteq \set{D}$. In this case, we say that $\set{D}$ \emph{is contained in} $\set{B}$ or $\set{B}$ \emph{contains} $\set{D}$. In particular, note that since $a \in \set{A}$, $\{a\} \subseteq \set{A}$, and so $\set{A}$ is said to \emph{contain} $\{a\}$. 
\paragraph{Equality:} \symdef[]{Csets.1d}{setequal}{$\set{X} = \set{Y}$}{set $\set{X}$ is equal to set $\set{Y}$ (\ie, $\set{X} \subseteq \set{Y}$ and $\set{Y} \subseteq \set{X}$)}\symdef[]{Csets.1d}{setnotequal}{$\set{X} \neq \set{Y}$}{set $\set{X}$ is not equal to set $\set{Y}$}Two sets are \emph{equal} when one set is both a subset and a superset of the other set. Otherwise, the two sets are not equal. For example, since $\set{E} \subseteq \set{B}$ and $\set{E} \supseteq \set{B}$ then $\set{E}$ and $\set{B}$ are equal, denoted $\set{E} = \set{B}$. However, since $\set{C} \subseteq \set{A}$ but set $\set{A}$ is not a subset of set $\set{C}$ then set $\set{A}$ and set $\set{C}$ are not equal, denoted $\set{A} \neq \set{C}$. \paragraph{Strict Containment:} \symdef[]{Csets.1c}{subsupset}{$\subset$ ($\supset$)}{is a proper/strict subset (superset) of}More generally, when one set is a subset of another set but the sets are not equal then the subset is called a \emph{proper (or strict) subset} and the superset is a \emph{proper (or strict) superset}. From the previous example, $\set{C} \subset \set{A}$ or $\set{A} \supset \set{C}$ both denote that set $\set{C}$ is a proper subset of set $\set{A}$ and $\set{A}$ is a proper superset of set $\set{C}$. In this case, we say that $\set{C}$ \emph{is strictly contained in} $\set{A}$ or $\set{A}$ \emph{strictly contains} $\set{C}$. Since $a \in \set{A}$ and $\{a\} \neq \set{A}$ then $\{a\} \subset \set{A}$, and so $\set{A}$ is said to \emph{strictly contain} $\{a\}$ or \emph{contain $\{a\}$ strictly}. Note that some authors omit symbols for strict containment and use the symbols $\subset$ and $\supset$ to represent containment in general. \paragraph{Containment of Empty Set:} The empty set $\emptyset$ is a subset of every set. Thus, $\emptyset \subseteq \set{A}$ and $\emptyset \subseteq \{\}$. 
In fact, $\emptyset \subset \set{A}$; every set contains the empty set with \emph{strict} containment if and only if the set is nonempty (\ie, if the set is nonempty then the set strictly contains $\emptyset$ and if the set strictly contains $\emptyset$ then the set must be nonempty). \paragraph{The Size of Sets:} To say that set $\set{X}$ is \emph{smaller} than $\set{Y}$ means that $\set{X} \subseteq \set{Y}$. For a set of sets $\setset{B}$ with $\set{X} \in \setset{B}$, to say that $\set{X}$ is the \emph{smallest} element of $\setset{B}$ means that for any $\set{B} \in \setset{B}$, $\set{X} \subseteq \set{B}$. Similarly, to say that set $\set{X}$ is \emph{larger} than $\set{Y}$ means that $\set{Y} \subseteq \set{X}$. For a set of sets $\setset{B}$ with $\set{X} \in \setset{B}$, to say that $\set{X}$ is the \emph{largest} element of $\setset{B}$ means that for any $\set{B} \in \setset{B}$, $\set{B} \subseteq \set{X}$. \paragraph{Infinite Sets:} Note that all of these relationships are defined for infinite sets as well; for example, $\N \subset \W$ and $\N \neq \W$. We should note that a formal definition of these set relations (\ie, $=$, $\neq$, $\subseteq$, $\supseteq$, etc.) requires a discussion of the \emph{universal set}, which we introduce in \longref{app:math_universal_set}; we will discuss this briefly in \longref{app:math_relations}. \paragraph{Set-Builder Notation:} \symdef[]{Csets.1ab}{setbuilder}{$\{ u : p \}$}{set of all elements of $u$ such that $p$}New sets can be built from other \emph{already existing} sets using \emph{set-builder notation}. That is, the notation $\{ u : p \}$ represents the set of all elements of the \emph{universe of discourse} $u$ that make the \emph{predicate} $p$ true. 
For example, the set $\{ x \in \set{A} : x \in \set{B} \}$ (\ie, the set of all elements of set $\set{A}$ that are also elements of set $\set{B}$) is equivalent to the set $\{ x : x \in \set{A} \text{ and } x \in \set{B} \}$ (\ie, the set of all elements of both sets $\set{A}$ and $\set{B}$), which represents the \emph{singleton set} $\{c\}$. We will use this notation heavily to construct sets. \symdef[]{Csets.1ab}{setbuilderlong}{$\{ u : p, q, r \}$}{set of all elements of $u$ such that $p$, $q$, and $r$}If a number of statements in the predicate are connected with commas, all must hold simultaneously. For example, the set $\{ x : x \in \W, x \notin \N\} = \{0\}$ represents the whole numbers that are not natural numbers.

\symdef[]{Dseq.0}{indexnotation}{$x(i)$~or~$x_i$~or~$x^i$}{alternate notations for an index $i$ on a symbol $x$}%
\paragraph{Index Notation and Index Sets:} A symbol $\theta$ may be equipped with a \emph{subscript} like $\theta_i$, a \emph{superscript} like $\theta^i$, or an \emph{argument} like $\theta(i)$. Depending on the types of symbols $\theta$ and $i$ and the context of their use, each of these notations may have a different meaning. However, very often $i$ serves as an \emph{index}, which makes a notation like $\theta_i$ distinct from a notation like $\theta_j$. In particular, often an \emph{index set} will be defined to provide indices that help generate notations that share some similarity. For example, take the index set $\set{I} \triangleq \{a,b,c\}$, which generates the symbols $\theta_a$, $\theta_b$, and $\theta_c$. These symbols can be easily collected using set-builder notation into the set $\{ \theta_i : i \in \set{I} \}$. This can be a more convenient notation than explicitly listing each element in the set, as in $\{ \theta_a, \theta_b, \theta_c \}$.
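The containment relations and set-builder examples above translate directly into Python, where set comprehensions play the role of set-builder notation. This is only an illustrative sketch of ours: strings stand in for the abstract objects $a$ through $e$, and finite ranges stand in for the infinite sets $\W$ and $\N$.

```python
# The example finite sets from the text, with strings standing in for the
# abstract objects a, b, c, d, e.
A = {"a", "b", "c"}
B = {"c", "d", "e"}
C = {"b"}
D = {"d", "e"}
E = {"c", "d", "e"}

# Containment, equality, and strict containment:
assert D <= B                           # D is a subset of B
assert E <= B and E >= B and E == B     # mutual containment is equality
assert C < A and A != C                 # proper (strict) containment
assert set() < A                        # the empty set is strictly
                                        # contained in any nonempty set

# Set-builder notation as a comprehension: { x in A : x in B } = {c}
assert {x for x in A if x in B} == {"c"}

# Comma-separated predicates must all hold simultaneously; finite ranges
# stand in for the infinite sets W and N:
W = set(range(100))       # {0, 1, ..., 99} in place of the whole numbers
N = set(range(1, 100))    # {1, 2, ..., 99} in place of the natural numbers
assert {x for x in W if x not in N} == {0}

# An index set I = {a, b, c} generating the set { theta_i : i in I }:
I = {"a", "b", "c"}
assert {"theta_" + i for i in I} == {"theta_a", "theta_b", "theta_c"}
```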
Note that very often index sets will be \emph{equipped} with an \emph{order relation}, the topic of \longref{app:math_order_theory}, for reasons discussed in \longref{app:math_sumprod_ind_fam}. \paragraph{Natural and Whole Numbers:} Note that the numbers defined in \longrefs{eq:zero}--\shortref{eq:three} which are all elements of the set $\W$ have element and subset relationships with each other. In particular, % \begin{align*} 0 \in 1 \text{ and } 0 \in 2 \text{ and } 0 \in 3 \text{ and } \cdots\\ 1 \in 2 \text{ and } 1 \in 3 \text{ and } \cdots\\ 2 \in 3 \text{ and } \cdots\\ \vdots \end{align*} % and % \begin{align*} 0 \subset 1 \text{ and } 0 \subset 2 \text{ and } 0 \subset 3 \text{ and } \cdots\\ 1 \subset 2 \text{ and } 1 \subset 3 \text{ and } \cdots\\ 2 \subset 3 \text{ and } \cdots\\ \vdots \end{align*} % This is a special and noteworthy property of the elements of the whole numbers $\W$. The subset relationship among the whole numbers can be summarized as % \begin{align*} 0 \subset 1 \subset 2 \subset 3 \subset 4 \subset 5 \subset 6 \subset \cdots \end{align*} % which, of course, also means that % \begin{align*} 0 \subseteq 1 \subseteq 2 \subseteq 3 \subseteq 4 \subseteq 5 \subseteq 6 \subseteq \cdots \end{align*} % and this kind of telescoping notation is common. This captures the more familiar notions of $<$ and $\leq$ (\ie, less than and less than or equal to) respectively, which both will be introduced in \longref{app:math_total_order_set} and explored in \longref{app:math_numbers}. \subsection{The Ordered Pair} \label{app:math_ordered_pair} We will use $(\cdot,\cdot)$ to denote an \symdef[\emph{ordered pair}]{Csets.2cart0}{orderedpair}{$(a,b)$}{ordered pair of objects $a$ and $b$ (\ie, $(a,b) \triangleq \{\{a\},\{a,b\}\}$)}. An ordered pair is a collection of two objects that has the property that for objects $a$, $b$, $c$, and $d$, the ordered pair $(a,b)$ is equal to ordered pair $(c,d)$ if and only if $a$ is equal to $c$ and $b$ is equal to $d$. 
This is a stronger property of equality than the one that is carried with sets. We refer to this special property as the \emph{equality property} of ordered pairs. Take arbitrary objects $a$ and $b$. There are two special traits of ordered pairs that distinguish them from simple sets. % \begin{itemize} \item While $\{a,b\}$ and $\{b,a\}$ describe equivalent sets, $(a,b)$ and $(b,a)$ describe two distinct ordered pairs. In other words, ordered pairs have some notion of element place or rank. Every ordered pair has a \emph{first element} which may also be called its \emph{left projection}; similarly, every ordered pair has a \emph{second element} which may also be called its \emph{right projection}. For distinct objects $a$ and $b$ and ordered pair $(a,b)$, $a$ is the ordered pair's first element and $b$ is the ordered pair's second element. \item Note that elements of an ordered pair need not be distinct. Thus, $(a,a)$ and $(b,b)$ are both valid ordered pairs. For each of these two examples, the first element and second element of the ordered pair are equal. \end{itemize} % Other common notations for the ordered pair $(a,b)$ include $\langle a,b \rangle$ and the \emph{Dirac inner-product notation} $\langle a|b \rangle$. These other notations have been introduced to reduce ambiguity between ordered pairs and other set-theoretic constructs. However, we will use the $(a,b)$ notation as we will use parentheses around any ordered list and curly braces around any unordered list (\eg, a set). We will remove any ambiguity by the context in which the notation is used. Ordered pairs can be formally defined using sets in a number of intuitive ways. Again, take the arbitrary objects $a$ and $b$. It is natural to define the ordered pair $(a,b)$ as the set $\{\{0,a\},\{1,b\}\}$, which emphasizes the order of the two objects by associating each of them with specific symbols $0$ and $1$. 
Additionally, it is easy to show that this definition of ordered pair has the special equality property required of all ordered pairs. However, we make use of the notion of a \emph{Kuratowski pair}, which defines the ordered pair $(a,b)$ as % \begin{equation*} (a,b) \triangleq \{ \{a\}, \{a,b\} \} \end{equation*} % This is the usual definition of ordered pair used in axiomatic set theory. It also has the equality property of ordered pairs, but it does not require the introduction of symbols $0$ and $1$ like the other definition. \subsection{The Ordered Tuple} An ordered list of zero or finite length is called an \emph{ordered tuple}, which we refer to as simply a \emph{tuple}. Take $n \in \{0,1,2,\dots\}$ and objects $x_1$, $x_2$, \dots, $x_n$. An \emph{ordered $n$-tuple}, which we refer to as an \emph{$n$-tuple}, is a tuple of length $n$. The shortest tuple, denoted $()$ and called a $0$-tuple, is defined to be the empty set. That is, % \begin{equation*} () \triangleq \emptyset \end{equation*} % A tuple made up of only the $x_1$ object, denoted $(x_1)$ and called a $1$-tuple, is defined as % \begin{equation*} (x_1) \triangleq ((), x_1) \end{equation*} % That is, a $1$-tuple is an ordered pair with a $0$-tuple left element and an object for its right element. Similarly $(x_1,x_2)$ denotes a $2$-tuple with the $x_1$ and $x_2$ items in that respective order and is defined as % \begin{equation*} (x_1,x_2) \triangleq ((x_1), x_2) \end{equation*} % That is, the $2$-tuple $(x_1,x_2)$ is defined as an ordered pair with the $1$-tuple $(x_1)$ as its first element and the object $x_2$ as its second element. This is different than the ordered pair $(x_1,x_2)$, which has a specific set-theoretic definition. The ambiguity between these two notations is one of the many reasons why other authors use a different notation for an ordered pair. However, this ambiguity should not cause any confusion in any of our arguments. 
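Both the Kuratowski pair and the recursive tuple construction can be experimented with directly. In the following Python sketch (our illustration; the function names are ours), frozensets stand in for sets.

```python
def pair(a, b):
    """Kuratowski ordered pair: (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# The equality property: (a, b) = (c, d) iff a = c and b = d.
assert pair(1, 2) == pair(1, 2)
assert pair(1, 2) != pair(2, 1)                   # order matters
assert pair(3, 3) == frozenset({frozenset({3})})  # (a, a) = {{a}} is valid

def tuple_n(*xs):
    """n-tuple built recursively: () = {} and
    (x1, ..., xn) = ((x1, ..., x_{n-1}), xn)."""
    t = frozenset()          # the 0-tuple is the empty set
    for x in xs:
        t = pair(t, x)
    return t

# A 3-tuple is an ordered pair whose first element is a 2-tuple:
assert tuple_n(1, 2, 3) == pair(tuple_n(1, 2), 3)
assert tuple_n(1, 2) != tuple_n(2, 1)
```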
Thus, we will use parentheses in all structures related to lists (\ie, collections of objects in which the order of the objects is important). In fact, this ambiguity will serve as a notational convenience in \longref{app:math_cartesian_prod}. In general, \symdef{Csets.2cart01}{ntuple}{$(x_1,x_2,\dots,x_n)$}{$n$-tuple (\ie, tuple of length $n \in \N$ with coordinates $x_1$, $x_2$,\dots,$x_n$ in their respective order)} denotes an $n$-tuple with the objects $x_1$, $x_2$, \dots, $x_n$ in their respective order and is defined as
%
\begin{equation*}
(x_1,x_2,\dots,x_n) \triangleq ((x_1,x_2,\dots,x_{n-1}), x_n)
\end{equation*}
%
using an ordered pair construction similar to the one used for a $2$-tuple. For an $n$-tuple $(x_1,x_2,\dots,x_n)$, $x_1$ is called the \emph{first coordinate}, $x_2$ is called the \emph{second coordinate}, and, continuing in this pattern, $x_n$ is called the \emph{n$^\text{th}$ coordinate}. Thus, an $n$-tuple has $n$ \emph{coordinates}. As defined here, all tuples of finite non-zero length can be expressed in terms of ordered pairs. The construction of these tuples grows ``rightward'' as new elements are introduced as right projections of each ordered pair. In computer science, it is common to define these tuples as growing ``leftward'' instead, with new elements introduced as left projections of each ordered pair. Whether tuples grow to the right or to the left is largely a matter of historical convention in different disciplines and has no major impact on the utility or application of tuples. As will be shown in \longref{app:math_cartesian_prod}, it is more common to use an ordered pair instead of a $2$-tuple primarily because the recursive construction of tuples allows the ordered pair to serve as a kind of fundamental tuple from which all other tuples can be built. Many authors define tuples only for lists of three or more items. For lists of two items, ordered pairs are used.
For lists of one item, the item stands alone without a list. We will follow this convention as well.

\subsection{Cartesian Products}
\label{app:math_cartesian_prod}

The \symdef[\emph{binary Cartesian product}]{Csets.2cart1}{cartesian2}{$\set{X} \times \set{Y}$}{(binary) Cartesian product of sets $\set{X}$ and $\set{Y}$ (\ie, $\set{X} \times \set{Y} \triangleq \{(x,y):x \in \set{X}, y \in \set{Y}\}$)} of two non-empty sets $\set{X}$ and $\set{Y}$ is denoted $\set{X} \times \set{Y}$ and is defined
%
\begin{equation*}
\set{X} \times \set{Y} \triangleq \{(x,y) : x \in \set{X}, y \in \set{Y}\}
\end{equation*}
%
where sets $\set{X}$ and $\set{Y}$ are called \emph{factors}. The parenthetical notation in this definition represents the ordered pair. That is, the binary Cartesian product of two non-empty sets is the set of all ordered pairs that have a first coordinate from one set and a second coordinate from the other set. If either of the two factor sets is the empty set then the binary Cartesian product is also the empty set. Since the result of a binary Cartesian product of two sets is itself a set, it can serve as a factor in another binary Cartesian product. For example, consider non-empty sets $\set{X}$, $\set{Y}$, and $\set{Z}$. Using the definitions above, the \emph{ternary Cartesian product} $\set{X} \times \set{Y} \times \set{Z}$ can be built with two binary Cartesian products and expressed as
%
\begin{equation*}
\set{X} \times \set{Y} \times \set{Z} \triangleq \{(x,y,z) : x \in \set{X}, y \in \set{Y}, z \in \set{Z}\}
\end{equation*}
%
That is, it can be expressed as the set of all possible $3$-tuples with a first coordinate from set $\set{X}$, a second coordinate from set $\set{Y}$, and a third coordinate from set $\set{Z}$. Also note that if any of these sets were empty, this ternary Cartesian product would also be empty. This example shows the utility of using the parenthetical notation for the ordered pair.
Because a binary Cartesian product is defined with an ordered pair, binary Cartesian products of binary Cartesian products can be defined with tuples. In particular, take $n \in \{2,3,\dots\}$ and non-empty sets $\set{X}_1$, $\set{X}_2$, \dots, $\set{X}_n$. If tuples and ordered pairs share the same notation then the very general \emph{$n$-ary Cartesian product}, or simply the \symdef[\emph{Cartesian product}]{Csets.2cart10}{cartesian}{$\set{X}_1 \times \cdots \times \set{X}_n$}{Cartesian product of $n$ sets $\set{X}_1$, \dots, $\set{X}_n$ (\ie, $\set{X}_1 \times \cdots \times \set{X}_n \triangleq \{(x_1,\dots,x_n):x_1 \in \set{X}_1, \dots, x_n \in \set{X}_n\}$)}, of these sets can be defined by
%
\begin{equation*}
\set{X}_1 \times \set{X}_2 \times \cdots \times \set{X}_n \triangleq \{(x_1,x_2,\dots,x_n) : x_1 \in \set{X}_1, x_2 \in \set{X}_2, \dots, x_n \in \set{X}_n\}
\end{equation*}
%
where the $n$-tuple in the set definition uses the convention that a $2$-tuple refers to an ordered pair and an $n$-tuple where $n > 2$ uses the standard tuple definition. If any of these $n$ sets is empty, the result is the empty set. The notation
%
\begin{equation*}
\prod\limits_{i=1}^n \set{X}_i \triangleq \set{X}_1 \times \set{X}_2 \times \cdots \times \set{X}_n
\end{equation*}
%
is also often used. Consider the special case of a Cartesian product of a single set $\set{X}$ with itself $n$ times. In this case, this Cartesian product \symdef{Csets.2cart11}{cartesiann}{$\set{X}^n$}{Cartesian product of set $\set{X}$ with itself $n$ times (\eg, $\set{X}^3 \triangleq \set{X} \times \set{X} \times \set{X}$)} is denoted $\set{X}^n$. That is,
%
\begin{align*}
\set{X}^n &\triangleq \prod\limits_{i=1}^n \set{X}\\
&= \set{X} \times \set{X} \times \cdots \times \set{X}
\end{align*}
%
Therefore, the set $\set{X}^n$ is the set of all $n$-tuples that can be made by choosing each of the $n$ coordinates to be an element of the set $\set{X}$.
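Finite Cartesian products are easy to enumerate computationally. The following Python sketch (our illustration) uses the standard-library \texttt{itertools.product}, whose tuples play the role of the $n$-tuples in the definition.

```python
from itertools import product

# A binary Cartesian product X1 x X2 of two small factor sets:
X1, X2 = {"a", "b"}, {0, 1}
assert set(product(X1, X2)) == {("a", 0), ("a", 1), ("b", 0), ("b", 1)}

# If any factor is empty, the whole product is empty:
assert set(product(X1, set())) == set()

# X^n: the product of a set with itself n times.  Here {0,1}^3, which has
# 2 * 2 * 2 = 8 elements.
cube = set(product({0, 1}, repeat=3))
assert len(cube) == 8
assert (0, 1, 1) in cube
```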
For example, the Cartesian product set $\{0,1\}^2 = \{(0,0),(0,1),(1,0),(1,1)\}$ and $(0,1,1,0) \in \{0,1\}^4$. In \longrefs{app:math_functions} and \shortref{app:math_cardinality}, it will be shown that the notation $\set{X}^n$ has other interesting interpretations showing that it was not chosen arbitrarily. The most general definition of Cartesian product also allows for products with an infinite (\ie, unbounded or even uncountable) number of factors. For example, the set of all countably infinite strings consisting of elements from the set $\{0,1\}$ is represented as
%
\begin{align*}
\{0,1\}^\N &\triangleq \prod\limits_{i=1}^\infty \{0,1\}\\
&= \{0,1\} \times \{0,1\} \times \{0,1\} \times \cdots
\end{align*}
%
where $\N$ is the set of natural numbers defined in \longref{eq:natural_numbers}. Roughly speaking, $\{0,1\}^\N$ represents every way that an ordered list indexed by the natural numbers can have each of its elements chosen to be either a $0$ or a $1$. Using the definition from \longref{eq:two}, this set can also be represented by $2^\N$, which is a notation that will be explored in more detail in \longref{app:math_power_sets}.

\subsection{Functions: Mappings Between Sets}
\label{app:math_functions}

Roughly speaking, a function relates elements from one set to elements of another set. Take two arbitrary sets $\set{G}$ (called the \emph{domain}) and $\set{H}$ (called the \emph{codomain}). A \emph{(total) function} $f$ is a set with $f \subseteq \set{G} \times \set{H}$ such that for every $g \in \set{G}$, there is \emph{exactly} one pair $(x,y) \in f$ such that $x = g$.
The set of all such functions is denoted % \begin{equation*} \set{H}^\set{G} \end{equation*} % and so function $f \in \set{H}^\set{G}$; however, it is more common to use the \symdef[]{Ganalysis.0011}{function}{$f: \set{X} \mapsto \set{Y}$}{a function $f$ with domain $\set{X}$ and codomain $\set{Y}$}notation % \begin{equation*} f: \set{G} \mapsto \set{H} \end{equation*} % Any ambiguity with the $\set{H}^\set{G}$ notation and the Cartesian product notation will be removed in \longref{app:math_congruent_sets}. For some $x \in \set{G}$ and the corresponding $(x,y) \in f$, the right projection $y$ of $(x,y)$ is denoted $f(x)$. In other words, by the definition of a function, for any function $f: \set{G} \mapsto \set{H}$, for all $x \in \set{G}$, there exists a unique $f(x)$ such that $(x,f(x)) \in f$. \paragraph{Examples:} Take the three finite sets % \begin{align*} \set{X} &\triangleq \{a,b,c,d\}\\ \set{Y} &\triangleq \{s,t,u,v,w\}\\ \set{Z} &\triangleq \{m,n,o,p\} \end{align*} % Now take functions $f_s: \set{X} \mapsto \set{Y}$, $f_i: \set{Y} \mapsto \set{X}$, $f: \set{X} \mapsto \set{Z}$, and $f^{-1}: \set{Z} \mapsto \set{X}$. Define these four functions by % \begin{align*} f_s &\triangleq \{(a,t),(b,u),(c,v),(d,w)\}\\ f_i &\triangleq \{(s,a),(t,a),(u,b),(v,c),(w,d)\}\\ f &\triangleq \{(a,m),(b,n),(c,o),(d,p)\}\\ f^{-1} &\triangleq \{(m,a),(n,b),(o,c),(p,d)\} \end{align*} % These four functions are depicted by \longrefs{fig:functions_injective}--\shortref{fig:functions_inverse}, respectively. 
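The defining property of a total function can be checked directly on finite examples. The following Python sketch (our illustration; the checking function is ours) represents the example function $f$ above as a set of ordered pairs, with strings standing in for the abstract objects.

```python
# The example function f : X -> Z from the text, as a set of ordered pairs.
G = {"a", "b", "c", "d"}
H = {"m", "n", "o", "p"}
f = {("a", "m"), ("b", "n"), ("c", "o"), ("d", "p")}

def is_total_function(f, domain, codomain):
    """True iff f is a subset of domain x codomain and every element of the
    domain appears as the first coordinate of exactly one pair in f."""
    if not all(x in domain and y in codomain for (x, y) in f):
        return False
    firsts = [x for (x, _) in f]
    return all(firsts.count(x) == 1 for x in domain)

assert is_total_function(f, G, H)

# Removing a pair violates "at least one pair per domain element";
# adding ("a", "n") violates "exactly one".
assert not is_total_function(f - {("a", "m")}, G, H)
assert not is_total_function(f | {("a", "n")}, G, H)

# f(x) is the unique second coordinate paired with x:
assert dict(f)["c"] == "o"
```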
% \begin{figure}[!ht]\centering \subfloat[Injective Function][Injective Function $f_i$]{ \begin{picture}(100,100)(0,0) \thinlines % \put(25,50){\oval(40,100)} \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}} \put(25,80){\circle*{2}} \put(25,60){\circle*{2}} \put(25,40){\circle*{2}} \put(25,20){\circle*{2}} \put(22,80){\makebox(0,0)[r]{\text{$a$}}} \put(22,60){\makebox(0,0)[r]{\text{$b$}}} \put(22,40){\makebox(0,0)[r]{\text{$c$}}} \put(22,20){\makebox(0,0)[r]{\text{$d$}}} % \put(75,50){\oval(40,100)} \put(75,98){\makebox(0,0)[t]{\text{$\set{Y}$}}} \put(75,83.3){\circle*{2}} \put(75,66.7){\circle*{2}} \put(75,50){\circle*{2}} \put(75,33.3){\circle*{2}} \put(75,16.7){\circle*{2}} \put(78,83.3){\makebox(0,0)[l]{\text{$s$}}} \put(78,66.7){\makebox(0,0)[l]{\text{$t$}}} \put(78,50){\makebox(0,0)[l]{\text{$u$}}} \put(78,33.3){\makebox(0,0)[l]{\text{$v$}}} \put(78,16.7){\makebox(0,0)[l]{\text{$w$}}} % \qbezier(25,80)(50,90)(75,66.7) \qbezier(25,60)(50,70)(75,50) \qbezier(25,40)(50,50)(75,33.3) \qbezier(25,20)(50,30)(75,16.7) % \linethickness{\unitlength} \put(75,66.7){\vector(1000,-932){0}} \put(75,50){\vector(5,-4){0}} \put(75,33.3){\vector(1000,-668){0}} \put(75,16.7){\vector(1000,-532){0}} \end{picture} \label{fig:functions_injective} } \quad \subfloat[Surjective Function][Surjective Function $f_s$]{ \begin{picture}(100,100)(0,0) \thinlines % \put(25,50){\oval(40,100)} \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}} \put(25,80){\circle*{2}} \put(25,60){\circle*{2}} \put(25,40){\circle*{2}} \put(25,20){\circle*{2}} \put(22,80){\makebox(0,0)[r]{\text{$a$}}} \put(22,60){\makebox(0,0)[r]{\text{$b$}}} \put(22,40){\makebox(0,0)[r]{\text{$c$}}} \put(22,20){\makebox(0,0)[r]{\text{$d$}}} % \put(75,50){\oval(40,100)} \put(75,98){\makebox(0,0)[t]{\text{$\set{Y}$}}} \put(75,83.3){\circle*{2}} \put(75,66.7){\circle*{2}} \put(75,50){\circle*{2}} \put(75,33.3){\circle*{2}} \put(75,16.7){\circle*{2}} \put(78,83.3){\makebox(0,0)[l]{\text{$s$}}} \put(78,66.7){\makebox(0,0)[l]{\text{$t$}}} 
\put(78,50){\makebox(0,0)[l]{\text{$u$}}} \put(78,33.3){\makebox(0,0)[l]{\text{$v$}}} \put(78,16.7){\makebox(0,0)[l]{\text{$w$}}} % \qbezier(25,80)(50,100)(75,83.3) \qbezier(25,80)(50,70)(75,66.7) \qbezier(25,60)(50,50)(75,50) \qbezier(25,40)(50,30)(75,33.3) \qbezier(25,20)(50,10)(75,16.7) % \linethickness{\unitlength} \put(25,80){\vector(-5,-4){0}} \put(25,80){\vector(-5,2){0}} \put(25,60){\vector(-5,2){0}} \put(25,40){\vector(-5,2){0}} \put(25,20){\vector(-5,2){0}} \end{picture} \label{fig:functions_surjective} }\\ \medskip \subfloat[Bijective Function][Bijective Function $f$]{ \begin{picture}(100,100)(0,0) \thinlines % \put(25,50){\oval(40,100)} \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}} \put(25,80){\circle*{2}} \put(25,60){\circle*{2}} \put(25,40){\circle*{2}} \put(25,20){\circle*{2}} \put(22,80){\makebox(0,0)[r]{\text{$a$}}} \put(22,60){\makebox(0,0)[r]{\text{$b$}}} \put(22,40){\makebox(0,0)[r]{\text{$c$}}} \put(22,20){\makebox(0,0)[r]{\text{$d$}}} % \put(75,50){\oval(40,100)} \put(75,98){\makebox(0,0)[t]{\text{$\set{Z}$}}} \put(75,80){\circle*{2}} \put(75,60){\circle*{2}} \put(75,40){\circle*{2}} \put(75,20){\circle*{2}} \put(78,80){\makebox(0,0)[l]{\text{$m$}}} \put(78,60){\makebox(0,0)[l]{\text{$n$}}} \put(78,40){\makebox(0,0)[l]{\text{$o$}}} \put(78,20){\makebox(0,0)[l]{\text{$p$}}} % \qbezier(25,80)(50,90)(75,80) \qbezier(25,60)(50,70)(75,60) \qbezier(25,40)(50,50)(75,40) \qbezier(25,20)(50,30)(75,20) % \linethickness{\unitlength} \put(75,80){\vector(5,-2){0}} \put(75,60){\vector(5,-2){0}} \put(75,40){\vector(5,-2){0}} \put(75,20){\vector(5,-2){0}} \end{picture} \label{fig:functions_bijective} } \quad \subfloat[Function Inverse][$f^{-1}$, Inverse of $f$]{ \begin{picture}(100,100)(0,0) \thinlines % \put(25,50){\oval(40,100)} \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}} \put(25,80){\circle*{2}} \put(25,60){\circle*{2}} \put(25,40){\circle*{2}} \put(25,20){\circle*{2}} \put(22,80){\makebox(0,0)[r]{\text{$a$}}} 
\put(22,60){\makebox(0,0)[r]{\text{$b$}}} \put(22,40){\makebox(0,0)[r]{\text{$c$}}} \put(22,20){\makebox(0,0)[r]{\text{$d$}}} % \put(75,50){\oval(40,100)} \put(75,98){\makebox(0,0)[t]{\text{$\set{Z}$}}} \put(75,80){\circle*{2}} \put(75,60){\circle*{2}} \put(75,40){\circle*{2}} \put(75,20){\circle*{2}} \put(78,80){\makebox(0,0)[l]{\text{$m$}}} \put(78,60){\makebox(0,0)[l]{\text{$n$}}} \put(78,40){\makebox(0,0)[l]{\text{$o$}}} \put(78,20){\makebox(0,0)[l]{\text{$p$}}} % \qbezier(25,80)(50,90)(75,80) \qbezier(25,60)(50,70)(75,60) \qbezier(25,40)(50,50)(75,40) \qbezier(25,20)(50,30)(75,20) % \linethickness{\unitlength} \put(25,80){\vector(-5,-2){0}} \put(25,60){\vector(-5,-2){0}} \put(25,40){\vector(-5,-2){0}} \put(25,20){\vector(-5,-2){0}} \end{picture} \label{fig:functions_inverse} } \caption[Examples of the Four Types of Functions.]{Examples of the four types of functions.} \label{fig:functions} \end{figure} % These images show each set as an oval with elements represented as dots within the oval. The arrowhead curves represent each element of the function where the head of the curve represents the right projection of the element and the tail of the curve represents the left projection of the element. In other words, these functions are \emph{mappings} from a domain set to a corresponding codomain set. The function $f_i$ is known as an \emph{injective} function since every element of the codomain is mapped to from at \emph{most} one element of the domain. The function $f_s$ is not an injective function since two of its elements, $(s,a)$ and $(t,a)$, both have $a$ as a right projection. However, $f_s$ is called a \emph{surjective} function since every member of its codomain is mapped to from at \emph{least} one element of the domain. Surjective functions are said to be \emph{onto} their codomains. It is clear that $f_i$ is not a surjective function because element $s$ of the codomain $\set{Y}$ is not a right projection of any of the elements of $f_i$.
The function $f$ is both injective and surjective, and thus it is called a \emph{bijective function} or simply a \emph{bijection}. For every bijective function, there exists an \emph{inverse function} that is also a bijective function. Because of this, bijective functions are also called \emph{invertible}. The inverse of function $f$ is denoted by $f^{-1}$. Roughly speaking, a function's inverse is a function which is the reverse mapping of the original function. Accordingly, the bijection $f$ may be denoted by $f: \set{X} \biject \set{Z}$, which indicates that a mapping exists both from set $\set{X}$ to set $\set{Z}$ as well as from set $\set{Z}$ to set $\set{X}$. A more precise definition of inverse is given below. \paragraph{The Identity Function:} For any set $\set{X}$, the function $f: \set{X} \mapsto \set{X}$ defined by % \begin{equation*} f \triangleq \{ (x,x): x \in \set{X} \} \end{equation*} % is called the \emph{identity function}. That is, for set $\set{X}$ and identity function $f: \set{X} \mapsto \set{X}$, for all $x \in \set{X}$, $f(x)=x$. \paragraph{Compositions and the Inverse:} Take three arbitrary sets $\set{F}$, $\set{G}$, and $\set{H}$. Take function $g: \set{F} \mapsto \set{G}$ and function $h: \set{G} \mapsto \set{H}$. The \emph{composition} of functions $h$ and $g$ is a new function $c \subseteq \set{F} \times \set{H}$ (\ie, $c: \set{F} \mapsto \set{H}$) such that for every $x \in \set{F}$, there is a pair $(x,h(g(x))) \in c$. This composition function is denoted $h \comp g$. For each $x \in \set{F}$, the right projection of the corresponding pair in $h \comp g$ is denoted by either $(h \comp g)(x)$ or $h(g(x))$. The function $f_i \comp f_s$ is shown in \longref{fig:function_comps_surjective_injective}. Its construction is depicted graphically in \longref{fig:function_comps_surjective_injective_composition}. Similarly, $f^{-1} \comp f$ is shown in \longref{fig:function_comps_identity}, and its construction is shown in \longref{fig:function_comps_identity_composition}.
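Composition of functions-as-sets-of-pairs amounts to chaining pairs through the shared middle set. The following Python sketch (illustrative only; the `compose` helper is our name) reproduces the two compositions from the running example:

```python
# Functions from the running example, as sets of (input, output) pairs.
f_s   = {('s', 'a'), ('t', 'a'), ('u', 'b'), ('v', 'c'), ('w', 'd')}  # Y -> X
f_i   = {('a', 't'), ('b', 'u'), ('c', 'v'), ('d', 'w')}              # X -> Y
f     = {('a', 'm'), ('b', 'n'), ('c', 'o'), ('d', 'p')}              # X -> Z
f_inv = {('m', 'a'), ('n', 'b'), ('o', 'c'), ('p', 'd')}              # Z -> X

def compose(h, g):
    """h o g: include (x, z) whenever (x, y) is in g and (y, z) is in h."""
    return {(x, z) for (x, y) in g for (y2, z) in h if y == y2}

# f_i o f_s maps Y back into Y (apply f_s first, then f_i):
assert compose(f_i, f_s) == {('s', 't'), ('t', 't'), ('u', 'u'),
                             ('v', 'v'), ('w', 'w')}

# f^{-1} o f is the identity function on X:
assert compose(f_inv, f) == {('a', 'a'), ('b', 'b'), ('c', 'c'), ('d', 'd')}
```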
% \begin{figure}[!ht]\centering \subfloat[Surjective Composed with Injective][$f_i$ composed with $f_s$]{ \begin{picture}(150,100)(0,0) \thinlines % \put(25,50){\oval(40,100)} \put(25,98){\makebox(0,0)[t]{\text{$\set{Y}$}}} \put(25,83.3){\circle*{2}} \put(25,66.7){\circle*{2}} \put(25,50){\circle*{2}} \put(25,33.3){\circle*{2}} \put(25,16.7){\circle*{2}} \put(22,83.3){\makebox(0,0)[r]{\text{$s$}}} \put(22,66.7){\makebox(0,0)[r]{\text{$t$}}} \put(22,50){\makebox(0,0)[r]{\text{$u$}}} \put(22,33.3){\makebox(0,0)[r]{\text{$v$}}} \put(22,16.7){\makebox(0,0)[r]{\text{$w$}}} % \put(75,50){\oval(40,100)} \put(75,98){\makebox(0,0)[t]{\text{$\set{X}$}}} \put(75,80){\circle*{2}} \put(75,60){\circle*{2}} \put(75,40){\circle*{2}} \put(75,20){\circle*{2}} \put(75,77){\makebox(0,0)[t]{\text{$a$}}} \put(75,57){\makebox(0,0)[t]{\text{$b$}}} \put(75,37){\makebox(0,0)[t]{\text{$c$}}} \put(75,17){\makebox(0,0)[t]{\text{$d$}}} % \put(125,50){\oval(40,100)} \put(125,98){\makebox(0,0)[t]{\text{$\set{Y}$}}} \put(125,83.3){\circle*{2}} \put(125,66.7){\circle*{2}} \put(125,50){\circle*{2}} \put(125,33.3){\circle*{2}} \put(125,16.7){\circle*{2}} \put(128,83.3){\makebox(0,0)[l]{\text{$s$}}} \put(128,66.7){\makebox(0,0)[l]{\text{$t$}}} \put(128,50){\makebox(0,0)[l]{\text{$u$}}} \put(128,33.3){\makebox(0,0)[l]{\text{$v$}}} \put(128,16.7){\makebox(0,0)[l]{\text{$w$}}} % \qbezier(25,83.3)(50,90)(75,80) \qbezier(25,66.7)(50,70)(75,80) \qbezier(25,50)(50,50)(75,60) \qbezier(25,33.3)(50,30)(75,40) \qbezier(25,16.7)(50,10)(75,20) % \put(50,100){\makebox(0,0)[t]{\text{$f_s$}}} % \qbezier(75,80)(100,90)(125,66.7) \qbezier(75,60)(100,70)(125,50) \qbezier(75,40)(100,50)(125,33.3) \qbezier(75,20)(100,30)(125,16.7) % \put(100,100){\makebox(0,0)[t]{\text{$f_i$}}} % \linethickness{\unitlength} % \put(75,80){\vector(5,-2){0}} \put(75,80){\vector(5,2){0}} \put(75,60){\vector(5,2){0}} \put(75,40){\vector(5,2){0}} \put(75,20){\vector(5,2){0}} % \put(125,66.7){\vector(1000,-932){0}} \put(125,50){\vector(5,-4){0}} 
\put(125,33.3){\vector(1000,-668){0}} \put(125,16.7){\vector(1000,-532){0}} \end{picture} \label{fig:function_comps_surjective_injective_composition} } \quad \subfloat[Composition of Injective with Surjective]% [Function $f_i \comp f_s$]{ \begin{picture}(100,100)(0,0) \thinlines % \put(25,50){\oval(40,100)} \put(25,98){\makebox(0,0)[t]{\text{$\set{Y}$}}} \put(25,83.3){\circle*{2}} \put(25,66.7){\circle*{2}} \put(25,50){\circle*{2}} \put(25,33.3){\circle*{2}} \put(25,16.7){\circle*{2}} \put(22,83.3){\makebox(0,0)[r]{\text{$s$}}} \put(22,66.7){\makebox(0,0)[r]{\text{$t$}}} \put(22,50){\makebox(0,0)[r]{\text{$u$}}} \put(22,33.3){\makebox(0,0)[r]{\text{$v$}}} \put(22,16.7){\makebox(0,0)[r]{\text{$w$}}} % \put(75,50){\oval(40,100)} \put(75,98){\makebox(0,0)[t]{\text{$\set{Y}$}}} \put(75,83.3){\circle*{2}} \put(75,66.7){\circle*{2}} \put(75,50){\circle*{2}} \put(75,33.3){\circle*{2}} \put(75,16.7){\circle*{2}} \put(78,83.3){\makebox(0,0)[l]{\text{$s$}}} \put(78,66.7){\makebox(0,0)[l]{\text{$t$}}} \put(78,50){\makebox(0,0)[l]{\text{$u$}}} \put(78,33.3){\makebox(0,0)[l]{\text{$v$}}} \put(78,16.7){\makebox(0,0)[l]{\text{$w$}}} % \qbezier(25,83.3)(50,93.3)(75,66.7) \qbezier(25,66.7)(50,56.7)(75,66.7) \qbezier(25,50)(50,60)(75,50) \qbezier(25,33.3)(50,43.3)(75,33.3) \qbezier(25,16.7)(50,26.7)(75,16.7) % \linethickness{\unitlength} \put(75,66.7){\vector(125,-133){0}} \put(75,66.7){\vector(5,2){0}} \put(75,50){\vector(5,-2){0}} \put(75,33.3){\vector(5,-2){0}} \put(75,16.7){\vector(5,-2){0}} \end{picture} \label{fig:function_comps_surjective_injective} }\\ \medskip \subfloat[Inverse Composed with Its Bijective][$f^{-1}$ composed with $f$]{ \begin{picture}(150,100)(0,0) \thinlines % \put(25,50){\oval(40,100)} \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}} \put(25,80){\circle*{2}} \put(25,60){\circle*{2}} \put(25,40){\circle*{2}} \put(25,20){\circle*{2}} \put(22,80){\makebox(0,0)[r]{\text{$a$}}} \put(22,60){\makebox(0,0)[r]{\text{$b$}}} \put(22,40){\makebox(0,0)[r]{\text{$c$}}}
\put(22,20){\makebox(0,0)[r]{\text{$d$}}} % \put(75,50){\oval(40,100)} \put(75,98){\makebox(0,0)[t]{\text{$\set{Z}$}}} \put(75,80){\circle*{2}} \put(75,60){\circle*{2}} \put(75,40){\circle*{2}} \put(75,20){\circle*{2}} \put(75,77){\makebox(0,0)[t]{\text{$m$}}} \put(75,57){\makebox(0,0)[t]{\text{$n$}}} \put(75,37){\makebox(0,0)[t]{\text{$o$}}} \put(75,17){\makebox(0,0)[t]{\text{$p$}}} % \put(125,50){\oval(40,100)} \put(125,98){\makebox(0,0)[t]{\text{$\set{X}$}}} \put(125,80){\circle*{2}} \put(125,60){\circle*{2}} \put(125,40){\circle*{2}} \put(125,20){\circle*{2}} \put(128,80){\makebox(0,0)[l]{\text{$a$}}} \put(128,60){\makebox(0,0)[l]{\text{$b$}}} \put(128,40){\makebox(0,0)[l]{\text{$c$}}} \put(128,20){\makebox(0,0)[l]{\text{$d$}}} % \qbezier(25,80)(50,90)(75,80) \qbezier(25,60)(50,70)(75,60) \qbezier(25,40)(50,50)(75,40) \qbezier(25,20)(50,30)(75,20) % \put(50,100){\makebox(0,0)[t]{\text{$f$}}} % \qbezier(75,80)(100,70)(125,80) \qbezier(75,60)(100,50)(125,60) \qbezier(75,40)(100,30)(125,40) \qbezier(75,20)(100,10)(125,20) % \put(100,100){\makebox(0,0)[t]{\text{$f^{-1}$}}} % \linethickness{\unitlength} % \put(75,80){\vector(5,-2){0}} \put(75,60){\vector(5,-2){0}} \put(75,40){\vector(5,-2){0}} \put(75,20){\vector(5,-2){0}} % \put(125,80){\vector(5,2){0}} \put(125,60){\vector(5,2){0}} \put(125,40){\vector(5,2){0}} \put(125,20){\vector(5,2){0}} \end{picture} \label{fig:function_comps_identity_composition} } \quad \subfloat[Composition of Inverse with Its Bijective][Identity function $f^{-1} \comp f$]{ \begin{picture}(100,100)(0,0) \thinlines % \put(25,50){\oval(40,100)} \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}} \put(25,80){\circle*{2}} \put(25,60){\circle*{2}} \put(25,40){\circle*{2}} \put(25,20){\circle*{2}} \put(22,80){\makebox(0,0)[r]{\text{$a$}}} \put(22,60){\makebox(0,0)[r]{\text{$b$}}} \put(22,40){\makebox(0,0)[r]{\text{$c$}}} \put(22,20){\makebox(0,0)[r]{\text{$d$}}} % \put(75,50){\oval(40,100)} \put(75,98){\makebox(0,0)[t]{\text{$\set{X}$}}} 
\put(75,80){\circle*{2}} \put(75,60){\circle*{2}} \put(75,40){\circle*{2}} \put(75,20){\circle*{2}} \put(78,80){\makebox(0,0)[l]{\text{$a$}}} \put(78,60){\makebox(0,0)[l]{\text{$b$}}} \put(78,40){\makebox(0,0)[l]{\text{$c$}}} \put(78,20){\makebox(0,0)[l]{\text{$d$}}} % \qbezier(25,80)(50,90)(75,80) \qbezier(25,60)(50,70)(75,60) \qbezier(25,40)(50,50)(75,40) \qbezier(25,20)(50,30)(75,20) % \linethickness{\unitlength} \put(75,80){\vector(5,-2){0}} \put(75,60){\vector(5,-2){0}} \put(75,40){\vector(5,-2){0}} \put(75,20){\vector(5,-2){0}} \end{picture} \label{fig:function_comps_identity} } \caption[Examples of Function Composition.]{Examples of function composition.} \label{fig:function_comps} \end{figure} % The latter example, $f^{-1} \comp f: \set{X} \mapsto \set{X}$, is equivalent to the \emph{identity function} which maps every element of set $\set{X}$ to itself. That is, for all $x \in \set{X}$, it is the case that $(f^{-1} \comp f)(x) = x$ (\ie, $f^{-1}(f(x))=x$). In fact, this result follows directly from the precise definition of a function's inverse. For an arbitrary bijection $f: \set{X} \mapsto \set{Y}$, its inverse $f^{-1}: \set{Y} \mapsto \set{X}$ is the function such that $f^{-1} \comp f$ is the identity function defined on set $\set{X}$ (\ie, an identity function for $\set{X} \mapsto \set{X}$). \paragraph{The Range of a Function:} Each function is defined to have a domain and a codomain. However, functions that are not surjective will not map to every element of their codomain. For example, the function $g: \{a,b,c\} \mapsto \{d,e,f\}$ defined as % \begin{equation*} g \triangleq \{(a,d),(b,f),(c,f)\} \end{equation*} % provides no mapping to the element $e$ of the function's codomain. The \emph{range} of a function is the subset of the function's codomain which represents all of the elements that have one or more mappings from the function's domain.
That is, for an arbitrary function $f: \set{X} \mapsto \set{Y}$, the range of function $f$, denoted $\range(f)$, is defined % \begin{equation*} \range(f) \triangleq \{ y \in \set{Y} : (x,y) \in f \text{ for some } x \in \set{X} \} \end{equation*} % which might also be denoted % \begin{equation*} \range(f) \triangleq \{ y \in \set{Y} : y=f(x) \text{ for some } x \in \set{X} \} \end{equation*} % Clearly, this set is a subset of the function's codomain. That is, $\range(f) \subseteq \set{Y}$. Note that if $\range(f) = \set{Y}$ then function $f$ is surjective. \paragraph{Images:} The \emph{image} of a subset of a function's domain under that function is the subset of the codomain of the function that is mapped to from that domain subset. That is, for a function $f: \set{X} \mapsto \set{Y}$ where set $\set{Z} \subseteq \set{X}$ then the image of $\set{Z}$ under $f$, denoted $f[\set{Z}]$ or $f(\set{Z})$, is defined % \begin{equation*} f[\set{Z}] \triangleq \{ y \in \set{Y} : y=f(x) \text{ for some } x \in \set{Z} \} \end{equation*} % Clearly $f[\set{Z}] \subseteq \set{Y}$. Additionally, the image of a function's domain is its range. In other words, for function $f: \set{X} \mapsto \set{Y}$, it is the case that $f[\set{X}] = \range(f)$. \paragraph{Pre-images:} The \emph{preimage} or \emph{inverse image} of a subset of a function's codomain under that function is the subset of the domain of the function that maps to that codomain subset. That is, for a function $f: \set{X} \mapsto \set{Y}$ where set $\set{Z} \subseteq \set{Y}$ then the preimage of $\set{Z}$ under $f$, denoted $f^{-1}[\set{Z}]$ or $f^{-1}(\set{Z})$, is defined % \begin{equation*} f^{-1}[\set{Z}] \triangleq \{ x \in \set{X} : f(x) \in \set{Z} \} \end{equation*} % Clearly $f^{-1}[\set{Z}] \subseteq \set{X}$. Additionally, the preimage of a function's range is its domain. In fact, the preimage of any superset of a function's range (\eg, its codomain) is also its domain.
In other words, for a function $f: \set{X} \mapsto \set{Y}$, it is the case that both $f^{-1}[\range(f)] = \set{X}$ and $f^{-1}[\set{Y}] = \set{X}$. It is important to note that the inverse image or preimage of a set under a function is \emph{not} equivalent to the inverse of a function. In fact, the inverse of a function only exists if the function is a bijection. For example, take the function $g: \{a,b,c\} \mapsto \{d,e,f\}$ defined by % \begin{equation*} g \triangleq \{(a,d),(b,f),(c,f)\} \end{equation*} % The preimage $g^{-1}[\{f\}] = \{b,c\}$; however, the inverse of $g$ does not exist since $g$ is not a bijection. This can cause some confusion because sometimes the notations $g^{-1}[f]$, $g^{-1}(\{f\})$, or even $g^{-1}(f)$ might be used to represent the preimage of $\{f\}$ under function $g$. \paragraph{Images of Sets of Sets:} Take sets $\set{X}$ and $\set{Y}$ and a function $f: \set{X} \mapsto \set{Y}$. Now take $\setset{B} \subseteq \Pow(\set{X})$. That is, $\setset{B}$ is a set of subsets of $\set{X}$. The \emph{image} of the set of sets $\setset{B}$ is denoted $f\{\setset{B}\}$ and defined by % \begin{equation*} f\{ \setset{B} \} \triangleq \{ f[\set{B}] : \set{B} \in \setset{B} \} \end{equation*} % where $f[\set{B}]$ is the image of set $\set{B} \in \setset{B}$ under $f$. That is, $f\{ \setset{B} \}$ is a set of images of sets. \paragraph{Function Restrictions:} New functions can be generated from existing functions by \emph{restricting} a function's mappings to map from a subset of the function's domain. That is, for a function $f: \set{X} \mapsto \set{Y}$ with subset $\set{Z} \subseteq \set{X}$, the restriction of function $f$ to set $\set{Z}$, denoted $f|_\set{Z}$, is defined % \begin{equation*} f|_\set{Z} \triangleq \{ (x,y) \in f : x \in \set{Z} \} \end{equation*} % Therefore, for $z \in \set{Z}$, it is also the case that $z \in \set{X}$, and so $f|_\set{Z}(z) = f(z)$.
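Range, image, preimage, and restriction all translate directly into set comprehensions over pairs. The following Python sketch (illustrative only; the helper names `rng`, `image`, `preimage`, and `restrict` are ours) uses the example function $g$ from the preimage discussion:

```python
# g: {a,b,c} -> {d,e,f} as a set of (input, output) pairs.
g = {('a', 'd'), ('b', 'f'), ('c', 'f')}

def rng(func):
    """Range: the set of all right projections."""
    return {y for (x, y) in func}

def image(func, subset):
    """Image of a subset of the domain."""
    return {y for (x, y) in func if x in subset}

def preimage(func, subset):
    """Preimage of a subset of the codomain."""
    return {x for (x, y) in func if y in subset}

def restrict(func, subset):
    """Restriction of func to a subset of its domain."""
    return {(x, y) for (x, y) in func if x in subset}

assert rng(g) == {'d', 'f'}                  # e is never mapped to
assert image(g, {'b', 'c'}) == {'f'}
assert preimage(g, {'f'}) == {'b', 'c'}      # preimage exists; g^{-1} does not
assert rng(restrict(g, {'a', 'b'})) == image(g, {'a', 'b'})
```

The final assertion spot-checks the identity $\range(f|_\set{Z}) = f[\set{Z}]$ on this example.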
Note that the image of set $\set{Z} \subseteq \set{X}$ under function $f: \set{X} \mapsto \set{Y}$ is equal to the range of the restriction of function $f$ to set $\set{Z}$. In other words, $\range(f|_\set{Z})=f[\set{Z}]$. \paragraph{Closure Under a Function:} Take sets $\set{X}$ and $\set{Y}$ such that $\set{Y} \subseteq \set{X}$. Take a function $f: \set{X} \mapsto \set{X}$. If the image $f[\set{Y}] \subseteq \set{Y}$ then the subset $\set{Y}$ is said to be \emph{closed} under the function $f$. In other words, the function $f$ maps elements of set $\set{Y}$ back to $\set{Y}$. Put another way, the range of the restriction of $f$ to $\set{Y}$ is also a subset of $\set{Y}$ (\ie, $\range(f|_\set{Y}) \subseteq \set{Y}$). \paragraph{Functions of Cartesian Products:} Take sets $\set{X}$, $\set{Y}$, and $\set{Z}$ and a function $f: \set{X} \times \set{Y} \mapsto \set{Z}$. For $(x,y) \in \set{X} \times \set{Y}$, the right projection of $(x,y)$ given by $f$ could be denoted $f((x,y))$; however, the extra parentheses are usually dropped, so the notation $f(x,y)$ is used. In other words, $((x,y),z) \in f$ if and only if $f(x,y)=z$. \paragraph{Partial Functions:} There is also a notion of \emph{partial function} that weakens the definition of function from including \emph{exactly} one element for each element of the domain to including \emph{at most} one element for each element of the domain. For example, for domain set $\set{X}$ and codomain set $\set{Y}$, a partial function $g \subset \set{X} \times \set{Y}$ (\ie, $g: \set{X} \mapsto \set{Y}$) could be defined % \begin{align*} g \triangleq \{(a,s),(b,t),(c,t)\} \end{align*} % This is a partial function (\ie, not a total function) because it provides no mapping for element $d$ of the domain set $\set{X}$. Of course, if the domain of $g$ were given as the set $\{a,b,c\}$ rather than set $\set{X}$ then $g$ would be a total function.
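The "at most one" versus "exactly one" distinction between partial and total functions can be sketched directly in Python (illustrative only; the predicate names are ours, and the example follows the text's $g$):

```python
# The partial function from the text: no pair for domain element d.
X = {'a', 'b', 'c', 'd'}
g = {('a', 's'), ('b', 't'), ('c', 't')}

def is_partial(func, domain):
    """At most one pair (x, y) in func for every element of the domain."""
    return all(sum(1 for (x, y) in func if x == e) <= 1 for e in domain)

def is_total(func, domain):
    """Exactly one pair (x, y) in func for every element of the domain."""
    return all(sum(1 for (x, y) in func if x == e) == 1 for e in domain)

assert is_partial(g, X) and not is_total(g, X)   # partial on X (d unmapped)
assert is_total(g, {'a', 'b', 'c'})              # total on the smaller domain
```

As the last assertion shows, whether $g$ counts as total is a property of the declared domain, not of the set of pairs alone.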
Whether a \emph{function} is a partial function or a total function will often not be important to a particular problem; when it is important, the context should make it clear what the meaning of \emph{function} is. However, note that the important notion of \emph{bijection} should only be interpreted as involving total functions. \subsection{Indexed Families} \label{app:math_indexed_families} The notion of an \emph{indexed family} is practically identical to the notion of a function; that is, an indexed family provides an alternate notation for a function with little loss of generality. Take a function $f: \set{I} \mapsto \set{Y}$. Recall that the range of the function $f$ is the set % \begin{equation*} \{ f(i) : i \in \set{I} \} \end{equation*} % This represents the \emph{set} of values to which the function maps. Regardless of how many mappings are present in function $f$, if every element of the domain $\set{I}$ gets mapped to a single element of the range $y \in \set{Y}$ then the range will be simply $\{ y \}$ because the range is a set and a set only contains distinct values. Thus, the range lists the values mapped to by the function; however, it destroys any information about the mappings and so the function cannot be reconstructed by simply knowing the range. However, an \symdef[indexed family]{Dseq.1}{indexedfamily}{$(x_i:i \in \set{I})$}{an indexed family with index set $\set{I}$ (also $(x_i)_{i \in \set{I}}$)}, which is often denoted by % \begin{equation*} ( f_i : i \in \set{I} ) \quad \text{ or } \quad ( f_i )_{i \in \set{I}} \end{equation*} % is not a set. This makes an indexed family a collection of values which may or may not be distinct. For example, take the function $g: \{a,b,c\} \mapsto \{d,e,f\}$ defined by % \begin{equation*} g \triangleq \{(a,d),(b,f),(c,f)\} \end{equation*} % The range of this function is $\{d,f\}$. 
However, the indexed family representation of this function is $( g_a, g_b, g_c )$ or $( g_i : i \in \{a,b,c\})$ where $g_a = d$, $g_b = f$, and $g_c = f$. Also note that $g_i$ can be replaced with other index notations, like $g(i)$ and $g^i$. Important applications of indexed families can be found in \longrefs{app:math_sumprod_ind_fam} and \shortref{app:math_probability}. \paragraph{Ordered Indexed Families:} The indexed family notation can be especially useful when the index set is a \emph{directed set}. Directed sets are discussed in \longref{app:math_order_theory}. In this case, the indexed family is called an \symdef[\emph{ordered indexed family}]{Dseq.2}{orderedindexedfamily}{$(x(t):t \geq 0)$}{an ordered indexed family with a directed index set $\set{T}$ where $0 \in \set{T}$}. For example, the set $\W$ with the standard $\leq$ order relation is totally ordered and thus is also a directed set, so for a function $f: \W \mapsto \set{Y}$, the corresponding ordered indexed family might be listed % \begin{equation*} ( f(i) : i \geq 0 ) \quad \text{ or } \quad ( f(i) )_{i \geq 0} \quad \text{ or } \quad ( f(i) )_{i=0}^\infty \end{equation*} % where the symbol $\infty$ indicates that the length of the list is unbounded; that is, an equivalent notation is % \begin{equation*} ( f(0), f(1), f(2), f(3), \cdots ) \end{equation*} % Note that the order of the elements of the list matches the order of the elements in the index set; this is intentional. Therefore, this notation provides a method for ordering the range values of the function. In other words, the value $f(0)$ comes \emph{before} all of the other values. This notation can also be used to restrict values of the function to a certain subset of $\W$ while also implying that the elements are still ordered.
For example, $f$ restricted to $\{5,6,7,8\}$ (\ie, $f|_{\{5,6,7,8\}}$) could be listed % \begin{equation*} ( f(i) : 5 \leq i \leq 8 ) \quad \text{ or } \quad ( f(i) )_{i=5}^{8} \end{equation*} % which is equivalent to the list notation % \begin{equation*} ( f(5), f(6), f(7), f(8) ) \end{equation*} % where again the order of the elements of the list matches the order of the elements of the index subset. This notation not only compactly restricts $f$ to a finite subset of $\W$, but it still \emph{maintains the order} of the elements of $f$. It is important to note that when viewing an indexed family as an alternate specification for a function, the indexed family does not communicate much information about the function's codomain. Thus, indexed families are primarily used to capture information about a list of objects. When that list of objects carries with it some special order, an indexed family can still carry information about that ordering while providing a more compact notation than the tuple or Cartesian product notation. Special versions of the ordered indexed family called \emph{nets} and \emph{sequences} will be introduced later in \longref{app:math_nets_and_sequences}. \subsection{Congruent Sets} \label{app:math_congruent_sets} The term \emph{congruent} can mean a number of things depending on its context. However, all uses will have in common that two things that are congruent are somehow equal. That is, congruence is a weaker form of equality: when unequal objects are similar enough to be substituted for each other with little to no impact on a problem, those objects might be called congruent. Our use of congruent is weaker than the use of most authors. In particular, our use of congruence roughly translates to stating that two objects are the same size, whereas other authors state that objects that are congruent not only have the same size but also have a roughly equivalent shape.
However, most of these stronger definitions of congruent are synonyms for more descriptive terms. Therefore, if we mean to imply some stronger relationship between two sets than congruence, we will simply use the more descriptive term. This will be the subject of \longref{app:math_abstract_algebra}. \paragraph{Congruence by Bijection:} For any two sets $\set{G}$ and $\set{H}$, if there exists a bijection from $\set{G}$ to $\set{H}$ (\ie, there exists a $g: \set{G} \biject \set{H}$) then the sets are said to be \symdef[\emph{congruent}]{Ageneral.2}{congruent}{$\cong$}{is congruent to}, which is denoted $\set{G} \cong \set{H}$. Congruence is a notion of equality. For finite sets, it is equivalent to say that the two sets have an equal number of elements. In the above examples, because $f$ is a bijection from $\set{X}$ to $\set{Z}$, the sets $\set{X}$ and $\set{Z}$ are congruent; that is, $\set{X} \cong \set{Z}$ and clearly the two finite sets have the same number of elements. Congruence also applies to infinite sets. For example, using the definition of $\W$ from \longref{eq:whole_numbers}, take the function $s: \W \biject \N$, defined by % \begin{equation*} s \triangleq \{(0,1),(1,2),(2,3),(3,4),\cdots\} \end{equation*} % That is, $s(47)=48$ and $s(1000)=1001$. Clearly, this function has an inverse $s^{-1}: \N \biject \W$ which is defined by % \begin{equation*} s^{-1} \triangleq \{(1,0),(2,1),(3,2),(4,3),\cdots\} \end{equation*} % That is, $s^{-1}(48)=47$ and $s^{-1}(1001)=1000$. So, $s$ is surely a bijection. As an exercise, note that $s^{-1} \comp s: \W \mapsto \W$ is % \begin{align*} s^{-1} \comp s \triangleq \{(0,0),(1,1),(2,2),(3,3),\cdots\} \end{align*} % and $s \comp s^{-1}: \N \mapsto \N$ is % \begin{align*} s \comp s^{-1} \triangleq \{(1,1),(2,2),(3,3),(4,4),\cdots\} \end{align*} % which are both identity functions, as expected since $s$ is a bijection. Since a bijection exists between the two infinite sets $\W$ and $\N$, it follows that $\W \cong \N$.
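The shift bijection $s$ and its inverse can be spot-checked numerically. Since the sets involved are infinite, the Python sketch below (illustrative only) represents $s$ and $s^{-1}$ as callables and checks the identity compositions on a finite sample:

```python
# The shift bijection s: W -> N and its inverse s^{-1}: N -> W,
# represented as callables rather than infinite sets of pairs.
s = lambda w: w + 1
s_inv = lambda n: n - 1

assert s(47) == 48 and s(1000) == 1001
assert s_inv(48) == 47 and s_inv(1001) == 1000

# Both compositions act as identity functions (finite sample only):
assert all(s_inv(s(w)) == w for w in range(100))
assert all(s(s_inv(n)) == n for n in range(1, 101))
```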
This is interesting because every element of $\N$ is also an element of $\W$; however, $\W$ includes $0$, which is not included in $\N$. That is, $\N$ is a strict subset of $\W$. In summary, % \begin{align*} \N \subset \W \quad \text{ and } \quad \N \neq \W \quad \text{ and } \quad \N \cong \W \end{align*} % It is impossible for two finite sets to be simultaneously related in these ways; congruent finite sets must have the same number of elements and so any set that is a strict subset of another set could never be congruent to that other set. Therefore, congruence is more generally a sort of structural equivalence between two sets rather than a size equivalence. Note that not all infinite sets are congruent. The congruence of finite and infinite sets plays a key role in the discussion in \longref{app:math_cardinality}. \paragraph{Countably and Uncountably Infinite Sets:} Note that since $\N \cong \N$ trivially and $\N \subset \W$ then $\N$ is not only congruent to $\W$ but is also congruent to a subset of $\W$. In fact, $\N$ is congruent to every countably infinite set. The definition of a \emph{countably infinite set} is one that is congruent with $\N$ (\ie, one in which there exists a bijection between it and $\N$). Therefore any infinite subset of $\W$, including $\N$, is a countably infinite set and is said to be \emph{countable}. If there is no bijection between a given infinite set and $\N$ then that set is an \emph{uncountably infinite set} and is said to be simply \emph{uncountable}. Put another way, if there is no injective function from a set to $\N$ then the set must be uncountable. \paragraph{Cartesian Product and Sets of Functions:} Take arbitrary set $\set{X}$. Recall that the Cartesian product $\set{X} \times \set{X}$ is also represented by $\set{X}^2$. Also recall that by the definition in \longref{eq:two}, $2 = \{0,1\}$. Thus, $\set{X}^2$ can also be written $\set{X}^{\{0,1\}}$, which is the set of all functions from $\{0,1\}$ to set $\set{X}$. 
The set $\set{X} \times \set{X}$ is certainly not equivalent to the set of functions $\set{X}^{\{0,1\}}$. However, these two sets are congruent. Each element of $\set{X}^{\{0,1\}}$ maps $0$ and $1$ to elements of $\set{X}$. Similarly, each element of $\set{X} \times \set{X}$ takes one element of $\set{X}$ as a left projection and one element of $\set{X}$ as a right projection. For each function $f \in \set{X}^{\{0,1\}}$, there is a pair $(f(0),f(1)) \in \set{X} \times \set{X}$. Additionally, for each pair $(x,y) \in \set{X} \times \set{X}$, there exists a function $f \in \set{X}^{\{0,1\}}$ such that $f(0)=x$ and $f(1)=y$. Therefore, there is a bijection between sets $\set{X}^{\{0,1\}}$ and $\set{X} \times \set{X}$ (\ie, roughly, they have the same size) and so the sets are congruent. This is why it is acceptable to substitute $\set{X}^2$ for $\set{X} \times \set{X}$. This is true for all $\set{X}^n$ with $n \in \{2,3,4,\dots\}$. \subsection{Cardinality} \label{app:math_cardinality} This is a brief introduction to the mathematical topic of \emph{cardinality}. To make it more complete, \emph{cardinals} and \emph{ordinals} should be discussed separately and contrasted. However, as it will not affect our work, our handling of cardinality may tend to blur the two concepts for simplicity. Roughly, cardinals represent some notion of size of a set and ordinals represent some notion of position in an order. The distinction between ordinals and cardinals becomes particularly important when handling infinite sets rigorously. \paragraph{Finite Cardinality and Congruence:} Consider the infinite number of sets of the form \longrefs{eq:zero}--\shortref{eq:three} that are each elements of $\W$ (\ie, the sets more commonly represented by symbols $0$, $1$, $2$, $3$, \dots). Take an arbitrary finite set $\set{X}$. There exists a unique element $c \in \W$ such that $c$ and $\set{X}$ are congruent.
That is, there exists an element $c \in \W$ such that there is a bijection mapping every element from $c$ to every element of $\set{X}$. This unique element $c$ is referred to as the \symdef[\emph{cardinality}]{Csets.1zz}{cardinality}% {$\pipe\set{X}\pipe$}{cardinality of set $\set{X}$} and is denoted $|\set{X}|$. For example, for some of the finite sets used as examples above, % \begin{align*} |\emptyset| &= 0\\ |\set{A}| &= 3\\ |\set{B}| &= 3\\ |\set{C}| &= 1\\ |\set{D}| &= 2\\ |\set{E}| &= 3 \end{align*} % and for the domain and codomain sets used in \longref{app:math_functions}, % \begin{align*} |\set{X}| &= 4\\ |\set{Y}| &= 5\\ |\set{Z}| &= 4 \end{align*} % as expected, since a bijection exists between sets $\set{X}$ and $\set{Z}$, they are congruent, and since they are congruent, they have the same cardinality. Likewise, since finite sets $\set{A}$, $\set{B}$, and $\set{E}$ have the same cardinality, they are all congruent even though set $\set{A}$ is not equivalent to either set $\set{B}$ or set $\set{E}$ (also note that $\set{B}$ is equivalent to $\set{E}$, and so it must be congruent as well). Of course, it is also true that % \begin{align*} |0| &= |\{\}| = 0\\ |1| &= |\{0\}| = 1\\ |2| &= |\{0,1\}| = 2\\ |3| &= |\{0,1,2\}| = 3\\ &\mathrel{\vdots} \end{align*} % In fact, the function $n: \W \mapsto \W$ defined by % \begin{equation*} n \triangleq \{(x,y) \in \W \times \W: y = |x|\} \end{equation*} % is the identity function on set $\W$. That is, $n(53)=53$. The cardinality of any whole number is equal to itself (\ie, $|x|=x$ for all $x \in \W$). \paragraph{Cardinality of Infinite Sets:} The subject of cardinality of infinite sets is an interesting topic, but it is not crucial to our work, and so our coverage of this subject is brief. For the same reason that not all infinite sets are congruent, not all infinite sets have the same cardinality. However, it is the case that all congruent infinite sets do have the same cardinality. 
Just as every element of the whole numbers $\W$ is a candidate cardinality for any finite set, there are special numbers that have been created to play an analogous role for infinite sets. As an example, take the set $2^\N$ (\ie, $\{0,1\}^\N$), which is the infinite set of all functions of the form $\N \mapsto \{0,1\}$. It can be shown that the natural numbers $\N$ are congruent to a strict subset of this set $2^\N$ just as the natural numbers are congruent to a strict subset of $\W$; however, while $\N \cong \W$, $\N$ is not congruent to $2^\N$. Roughly speaking, $2^\N$ is a larger set than $\N$ since no function taking the form $2^\N \mapsto \N$ can be injective. That is, for every function of the form $2^\N \mapsto \N$, there is at least one element of $\N$ onto which more than one element of $2^\N$ is mapped. This prevents there from being a bijection between these two sets, and so these two sets cannot be congruent. Therefore, since there is no bijection between the two infinite sets $2^\N$ and $\N$, the set $2^\N$ must be an uncountably infinite set. \subsection{Power Sets} \label{app:math_power_sets} The \symdef[\emph{power set}]{Csets.1z}{powerset}{$\Pow(\set{U})$}{power set of set $\set{U}$ (\ie, the set of all subsets of $\set{U}$)} of a set $\set{U}$, denoted $\Pow(\set{U})$, is defined to be the set of all subsets of set $\set{U}$. Clearly, % \begin{enumerate}[(i)] \item $\emptyset \in \Pow(\set{U})$ \label{item:power_set_emptyincl} \item $\set{U} \in \Pow(\set{U})$ \label{item:power_set_setincl} \item for all $\set{X} \subseteq \set{U}$, $\set{X} \in \Pow(\set{U})$ \end{enumerate} % By properties (\shortref{item:power_set_emptyincl}) and (\shortref{item:power_set_setincl}), $\Pow(\set{U}) \neq \emptyset$. This is true even for $\Pow(\emptyset)$ since $\Pow(\emptyset)=\{ \emptyset \}$. Therefore, all power sets are nonempty.
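For finite sets, the power set can be enumerated mechanically, which makes the definition easy to check by hand. The following is a minimal Python sketch (not part of the formal development) using \texttt{frozenset} values so that subsets can themselves be elements of a set:

```python
from itertools import chain, combinations

def power_set(s):
    """Return the set of all subsets of s, each subset as a frozenset."""
    items = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

U = {0, 1, 2}
P = power_set(U)
print(len(P))              # 8 subsets for a 3-element set
print(frozenset() in P)    # True: the empty set is always a member
print(frozenset(U) in P)   # True: the set itself is always a member
print(power_set(set()))    # the power set of the empty set is {emptyset}
```

Note that even `power_set(set())` is nonempty, matching the observation that all power sets are nonempty.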
\paragraph{Notations:} The notations $\set{Y} \in \Pow(\set{X})$ and $\set{Y} \subseteq \set{X}$ are equivalent ways of specifying that $\set{Y}$ is a subset of $\set{X}$. The power set is also denoted $2^\set{X}$, or equivalently $\{0,1\}^\set{X}$, because the set of all functions of the form $\set{X} \mapsto \{0,1\}$ is congruent to the set of subsets of $\set{X}$. To see this, take $\set{Y} \subseteq \set{X}$ to be a subset of $\set{X}$. Construct a function mapping every element in set $\set{Y}$ to $1$ and every element of $\set{X}$ that is not an element of $\set{Y}$ to $0$. By construction, this function is an element of $2^\set{X}$. Additionally, take $f \in 2^\set{X}$ to be an arbitrary function mapping all elements of $\set{X}$ to either a $0$ or a $1$. Construct a subset of $\set{X}$ made up of only those elements that map to a $1$. This subset will be a member of the power set of set $\set{X}$. Therefore, the power set and $2^\set{X}$ are congruent. \paragraph{Cardinality:} Recall from \longref{app:math_cardinality} that $2^\N$ is an infinite set that is somehow larger than the infinite set $\N$ (\ie, total functions from $\{0,1\}^\N$ to $\N$ can never be injective and thus will never be bijective). From the above explanation, $2^\N$ is congruent to $\Pow(\N)$ and is in fact an equivalent notation specifying the power set of set $\N$. It can be shown that any non-empty set is in some sense smaller than its corresponding power set. In the case of $\N$, an infinite yet countable set, its power set $\Pow(\N)$ is also infinite but is uncountable, and this lack of countability is what makes $\Pow(\N)$ somehow larger than $\N$. In \longref{app:math_numbers}, cardinality arithmetic is defined that allows the cardinality of a power set to be calculated; this helps justify that a set is always smaller (in terms of cardinality) than its power set. We also make notions of larger and smaller more precise in \longref{app:math_numbers}. 
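The correspondence between subsets and $\{0,1\}$-valued functions described above can be made concrete for finite sets. The following is a minimal Python sketch (an illustration only), representing each function in $2^\set{X}$ as a dictionary from elements of $\set{X}$ to $0$ or $1$:

```python
def subset_to_indicator(X, Y):
    """Map a subset Y of X to its 0/1-valued function, as a dict on X."""
    return {x: (1 if x in Y else 0) for x in X}

def indicator_to_subset(f):
    """Inverse map: recover the subset from a 0/1-valued function."""
    return {x for x, bit in f.items() if bit == 1}

X = {'a', 'b', 'c'}
Y = {'a', 'c'}
f = subset_to_indicator(X, Y)
print(f['b'])                         # 0: 'b' is not in the subset Y
print(indicator_to_subset(f) == Y)    # True: the two maps are inverses
```

Because each map inverts the other, the two constructions form a bijection between $\Pow(\set{X})$ and $2^\set{X}$, exactly as argued above.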
Another important uncountable set that \emph{is} congruent to $\Pow(\N)$ will be introduced in \longref{app:math_reals}. \subsection{The Universal Set and the Complement of a Set} \label{app:math_universal_set} For a given discussion, if all sets are subsets of a single set $\set{U}$, that set $\set{U}$ is known as the \emph{universal set}. There is no way to define a single universal set for all discussions; in fact, defining a general set of all sets leads to logical contradictions (\eg, Russell's paradox). In other words, many important operations on sets are not independent of context. This is particularly important when discussing the \symdef[\emph{complement}]{Csets.207}{complement}{$\set{X}^c$}% {complement of set $\set{X}$ (\eg, $\set{U} \setdiff \set{X}$ where $\set{X} \subseteq \set{U}$)} of a set, which could roughly be defined as a set made up of everything not in the set of interest. For example, the complement of set $\set{J}$ from \longref{eq:ex_set_J}, denoted $\set{J}^c$, could be $\{\text{Things not said by Joe}\}$ or could be $\{\text{Things said by other people}\}$. It is not possible to define $\set{J}^c$ without first defining the universal set. Once the universal set $\set{U}$ is defined as a superset of $\set{J}$ then $\set{J}^c$ is defined as % \begin{equation*} \set{J}^c \triangleq \{ x \in \set{U} : x \notin \set{J} \} \end{equation*} % For example, some valid universal sets that are each supersets of set $\set{J}$ are % \begin{align*} &\{\text{Things that were said or written by anyone}\} \text{ or }\\ &\{\text{Things that were said or written by Joe}\} \text{ or }\\ &\{\text{Things that were said by anyone}\} \end{align*} % or even $\set{J}$ itself. If the universal set $\set{U} = \set{J}$ then $\set{J}^c = \emptyset$; in fact, for any discussion, the complement of the universal set is always the empty set (\ie, $\set{U}^c = \emptyset$).
Similarly, for any discussion, the complement of the empty set is always the universal set (\ie, $\emptyset^c = \set{U}$). Of course, for any set $\set{X}$, $( \set{X}^c )^c = \set{X}$. That is, the complement of the complement of any set is the set itself. \paragraph{Power Set as Universal Set:} Take a universal set $\set{U}$ so that every set in a discussion is a subset of $\set{U}$. The power set $\Pow(\set{U})$ serves as a universal set for all subsets of $\set{U}$. That is, for any $\set{X} \subseteq \set{U}$, $\set{X} \in \Pow(\set{U})$. Take any set of sets $\setset{S}$. A minimal universal set $\set{U}$ for the elements of $\setset{S}$ can be defined by % \begin{equation*} \set{U} \triangleq \{ x : x \in \set{X}, \set{X} \in \setset{S} \} \end{equation*} % Clearly, for all $\set{X} \in \setset{S}$, $\set{X} \subseteq \set{U}$ and so $\set{X} \in \Pow(\set{U})$. Therefore, $\setset{S} \subseteq \Pow(\set{U})$, and so $\Pow(\set{U})$ can be viewed as the universal set for all elements of $\setset{S}$. \subsection{Operations on Sets} \label{app:math_set_operations} We will now discuss the standard \emph{set operations} and the corresponding \emph{set operators}. A formal definition of these set operations requires the \emph{universal set}, which was introduced in \longref{app:math_universal_set}; we return to this point briefly in \longref{app:math_operations}. \paragraph{Union:} The \symdef[\emph{union (or join)}]{Csets.202}{union}{$\set{X} \cup \set{Y}$}{set union (or join) of sets $\set{X}$ and $\set{Y}$} of two arbitrary sets is the set resulting from the inclusion of the elements of both sets. That is, take arbitrary sets $\set{X}$ and $\set{Y}$ that are each subsets of universal set $\set{U}$.
Their union is denoted $\set{X} \cup \set{Y}$ and defined by % \begin{equation*} \set{X} \cup \set{Y} \triangleq \{ z \in \set{U} : z \in \set{X} \text{ or } z \in \set{Y} \} \end{equation*} % Sometimes this operation is called set addition and denoted $\set{X} + \set{Y}$. However, for reasons explained in \longref{app:math_algebras_of_sets}, calling this operation addition may not make sense. The \emph{symmetric difference} operation, explained below, makes more sense as a set addition operation. Take sets $\set{X}$ and $\set{Y}$. Note that % \begin{equation*} \set{X} \subseteq \set{Y} \quad \text{ if and only if } \quad \set{X} \cup \set{Y} = \set{Y} \end{equation*} % which relates to the reason for calling the set union the \emph{join} of its elements. This actually shows one way of defining what it means to be a subset of another set. Thus, % \begin{itemize} \item to say that $\set{X}$ is \emph{larger} than $\set{Y}$ means that $\set{Y} \subseteq \set{X}$, which is equivalent to saying that $\set{Y} \cup \set{X} = \set{X}$ \item for a set of sets $\setset{B}$ with $\set{X} \in \setset{B}$, to say that $\set{X}$ is the \emph{largest} set of $\setset{B}$ means that $\set{B} \cup \set{X} = \set{X}$ (\ie, $\set{B} \subseteq \set{X}$) for all $\set{B} \in \setset{B}$ \end{itemize} % Take the indexed sets $\set{X}_1$, $\set{X}_2$, $\set{X}_3$, and $\set{X}_4$. Also take a set of sets $\setset{X}$ defined by % \begin{align*} \setset{X} &\triangleq \{ \set{X}_i : i \in \{1,2,3,4\} \}\\ &= \{ \set{X}_1, \set{X}_2, \set{X}_3, \set{X}_4 \} \end{align*} % In this case, the symbol \symdef{Csets.2}{bigunion}{$\bigcup$}{union of many sets (compare to $\sum$)} can be used to represent the union of the elements of $\setset{X}$.
That is, % \begin{equation*} \bigcup \{ \set{X}_1, \set{X}_2, \set{X}_3, \set{X}_4 \} \quad \text{ and } \quad \bigcup \{ \set{X}_i : i \in \{1,2,3,4\} \} \quad \text{ and } \quad \bigcup \setset{X} \end{equation*} % and the alternate notations % \begin{equation*} \bigcup\limits_{i \in \{1,2,3,4\}} \set{X}_i \quad \text{ and } \quad \bigcup\limits_{i=1}^4 \set{X}_i \end{equation*} % are all equivalent notations for $\set{X}_1 \cup \set{X}_2 \cup \set{X}_3 \cup \set{X}_4$. In other words, the symbol $\bigcup$ can be used to take the union of multiple sets, whether they be indexed or are simply elements of a set of sets. By convention, the union of an empty set of sets (\ie, $\bigcup \{\}$) is the empty set; this is analogous to the familiar \emph{additive identity}. Note that for arbitrary set $\set{X}$ which is a subset of universal set $\set{U}$ it is the case that $\set{X} \cup \set{X} = \set{X}$ and $\set{X} \cup \set{U} = \set{U}$. \paragraph{Intersection:} The \symdef[\emph{intersection (or meet)}]{Csets.201}{intersection}{$\set{X} \cap \set{Y}$}{set intersection (or meet) of sets $\set{X}$ and $\set{Y}$} of two arbitrary sets is the set of elements common to both sets. That is, take arbitrary sets $\set{X}$ and $\set{Y}$ that are each subsets of universal set $\set{U}$. Their intersection is denoted $\set{X} \cap \set{Y}$ and defined by % \begin{equation*} \set{X} \cap \set{Y} \triangleq \{ z \in \set{U} : z \in \set{X} \text{ and } z \in \set{Y} \} \end{equation*} % Of course, since $\set{X} \subseteq \set{U}$, it is equivalent to say % \begin{equation*} \set{X} \cap \set{Y} = \{ z \in \set{X} : z \in \set{Y} \} \end{equation*} % Take sets $\set{X}$ and $\set{Y}$. Note that % \begin{equation*} \set{X} \subseteq \set{Y} \quad \text{ if and only if } \quad \set{X} \cap \set{Y} = \set{X} \end{equation*} % which relates to the reason for calling the set intersection the \emph{meet} of its elements. 
This actually shows one way of defining what it means to be a subset of another set. Thus, % \begin{itemize} \item to say that $\set{X}$ is \emph{smaller} than $\set{Y}$ means that $\set{X} \subseteq \set{Y}$, which is equivalent to saying that $\set{X} \cap \set{Y} = \set{X}$ \item for a set of sets $\setset{B}$ with $\set{X} \in \setset{B}$, to say that $\set{X}$ is the \emph{smallest} set of $\setset{B}$ means that $\set{X} \cap \set{B} = \set{X}$ (\ie, $\set{X} \subseteq \set{B}$) for all $\set{B} \in \setset{B}$ \end{itemize} % As before, take $\setset{X} \triangleq \{ \set{X}_1, \set{X}_2, \set{X}_3, \set{X}_4 \}$. Then the symbol \symdef{Csets.2}{bigintersection}{$\bigcap$}{intersection of many sets (compare to $\prod$)} can be used to represent the intersection of the elements of $\setset{X}$. That is, % \begin{equation*} \bigcap \{ \set{X}_1, \set{X}_2, \set{X}_3, \set{X}_4 \} \quad \text{ and } \quad \bigcap \{ \set{X}_i : i \in \{1,2,3,4\} \} \quad \text{ and } \quad \bigcap \setset{X} \end{equation*} % and the alternate notations % \begin{equation*} \bigcap\limits_{i \in \{1,2,3,4\}} \set{X}_i \quad \text{ and } \quad \bigcap\limits_{i=1}^4 \set{X}_i \end{equation*} % are all equivalent notations for $\set{X}_1 \cap \set{X}_2 \cap \set{X}_3 \cap \set{X}_4$. In other words, the symbol $\bigcap$ can be used to take the intersection of multiple sets, whether they be indexed or simply elements of a set of sets. By convention, the intersection of an empty set of sets (\ie, $\bigcap \{\}$) is the universal set; this is analogous to the familiar \emph{multiplicative identity}. Note that for arbitrary sets $\set{X}$ and $\set{Y}$ which are subsets of universal set $\set{U}$, it is the case that $\set{X} \cap \set{X} = \set{X}$ and $\set{X} \cap \emptyset = \emptyset$; it is also the case that $\set{X} \cup ( \set{X} \cap \set{Y} ) = \set{X} \cap ( \set{X} \cup \set{Y} ) = \set{X}$, where parentheses group operations which should be applied first.
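The subset characterizations above (\eg, $\set{X} \subseteq \set{Y}$ if and only if $\set{X} \cap \set{Y} = \set{X}$, and the absorption identities) are easy to spot-check with Python's built-in set type. The following is a minimal sketch on small finite sets, purely for illustration:

```python
X = {1, 2}
Y = {1, 2, 3, 4}

# X is a subset of Y, so both characterizations of "subset" hold:
print(X | Y == Y)   # True: union with a superset gives the superset (join)
print(X & Y == X)   # True: intersection with a superset gives the subset (meet)

# Idempotence of union and intersection:
print(X | X == X and X & X == X)   # True

# Absorption: X union (X intersect Y) = X intersect (X union Y) = X
print((X | (X & Y)) == X and (X & (X | Y)) == X)   # True
```

Here `|` and `&` are Python's set union and intersection operators.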
\paragraph{Difference:} The \symdef[\emph{set difference}]{Csets.203}{setdiff}{$\set{X} \setdiff \set{Y}$}{difference of sets $\set{X}$ and $\set{Y}$} of two arbitrary sets is the set resulting from the exclusion of the common elements (\ie, the intersection) of both sets from one of the sets. That is, take arbitrary sets $\set{X}$ and $\set{Y}$ that are each subsets of universal set $\set{U}$. The set difference between them, denoted $\set{X} \setdiff \set{Y}$, is defined by % \begin{equation*} \set{X} \setdiff \set{Y} \triangleq \{ z \in \set{U}: z \in \set{X} \text{ and } z \notin \set{Y} \} \end{equation*} % Since $\set{X} \subseteq \set{U}$ then % \begin{equation*} \set{X} \setdiff \set{Y} = \{ z \in \set{X} : z \notin \set{Y} \} \end{equation*} % Both $-$ and $\setminus$ are frequently used to denote the set difference. Note that $\set{X} \setdiff \set{Y} = \set{X} \cap \set{Y}^c$. The set complement can be written in terms of the difference between the universal set and the set. For arbitrary set $\set{X}$ that is a subset of universal set $\set{U}$, % \begin{equation*} \set{X}^c = \set{U} \setdiff \set{X} \end{equation*} % Many authors choose to make the universal set explicit and refer to the set complement only in terms of this set difference. Additionally, the set difference is sometimes called the \emph{relative complement}. That is, $\set{X} \setdiff \set{Y}$ would be called the \emph{relative complement of $\set{Y}$ in $\set{X}$}, which is the set of elements in $\set{X}$ that are not in $\set{Y}$. In other words, the relative complement of $\set{Y}$ in $\set{X}$ is the complement that $\set{Y}$ would have if $\set{X}$ were taken to be the universal set.
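The identities $\set{X} \setdiff \set{Y} = \set{X} \cap \set{Y}^c$ and $\set{X}^c = \set{U} \setdiff \set{X}$ are simple to verify once a universal set is made explicit. The following is a minimal Python sketch (the particular sets are arbitrary choices for illustration):

```python
U = set(range(10))   # an explicit universal set for this discussion
X = {0, 1, 2, 3}
Y = {2, 3, 4, 5}

def complement(S):
    """Complement relative to the universal set U (i.e., U \\ S)."""
    return U - S

print(X - Y == {0, 1})                 # True: set difference X \ Y
print(X - Y == X & complement(Y))      # True: X \ Y equals X intersect Y^c
print(complement(U) == set())          # True: the complement of U is empty
print(complement(set()) == U)          # True: the complement of empty is U
```

Python's `-` operator on sets is exactly the set difference used in the text.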
\paragraph{Symmetric Difference:} The \symdef[\emph{symmetric difference}]{Csets.204}{setsymdiff}{$\set{X} \symdiff \set{Y}$}{symmetric difference of sets $\set{X}$ and $\set{Y}$ (\ie, an exclusive union; $(\set{X} \cup \set{Y}) \setdiff (\set{Y} \cap \set{X})$)} of two arbitrary sets is the set of elements taken from both (\ie, the union) that are not common to both (\ie, not the intersection). That is, take arbitrary sets $\set{X}$ and $\set{Y}$ that are each subsets of universal set $\set{U}$. The symmetric difference between them, denoted $\set{X} \symdiff \set{Y}$, is defined by % \begin{equation*} \set{X} \symdiff \set{Y} \triangleq \{ z \in \set{U}: z \in \set{X} \cup \set{Y} \text{ and } z \notin \set{X} \cap \set{Y} \} \end{equation*} % Of course, since $\set{X} \cup \set{Y} \subseteq \set{U}$, it is equivalent to say % \begin{equation*} \set{X} \symdiff \set{Y} = \{ z \in \set{X} \cup \set{Y} : z \notin \set{X} \cap \set{Y} \} \end{equation*} % For reasons explained in \longref{app:math_algebras_of_sets}, this operation will sometimes be called set addition and be denoted $\set{X} + \set{Y}$. However, because some authors denote set union with the $+$ operator, the alternate notation $\set{X} \oplus \set{Y}$ may be used. On the other hand, the operator $\oplus$ is identified with other operations, and thus $\symdiff$ may be the best choice of notation. 
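The definition above can be spot-checked directly against Python's built-in symmetric difference operator; a minimal sketch on arbitrary small sets:

```python
X = {1, 2, 3}
Y = {3, 4, 5}

# Directly from the definition: elements of the union not in the intersection.
by_definition = {z for z in (X | Y) if z not in (X & Y)}

print(X ^ Y == by_definition)    # True: ^ is Python's symmetric difference
print(X ^ Y == {1, 2, 4, 5})     # True: the shared element 3 is excluded
```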
It is easy to show that % \begin{itemize} \item $\set{X} \symdiff \set{Y} = (\set{X} \cup \set{Y}) \cap (\set{X} \cap \set{Y})^c$ \item $\set{X} \symdiff \set{Y} = (\set{X} \cup \set{Y}) \cap (\set{X}^c \cup \set{Y}^c)$ \item $\set{X} \symdiff \set{Y} = (\set{X} \cup \set{Y}) \setdiff (\set{X} \cap \set{Y})$ \item $\set{X} \symdiff \set{Y} = (\set{X} \setdiff \set{Y}) \cup (\set{Y} \setdiff \set{X})$ \item $\set{X} \symdiff \set{Y} = (\set{X} \cap \set{Y}^c) \cup (\set{Y} \cap \set{X}^c)$ \end{itemize} % We will use the symmetric difference rarely; however, it is important when viewing sets in an algebraic context like the ones described in \longrefs{app:math_abstract_algebra} and \shortref{app:math_linear_algebra}. In particular, it will be important to note that % \begin{itemize} \item $\set{X}^c = \set{U} \symdiff \set{X}$ \item $\set{X} \cup \set{Y} = \set{X} \symdiff \set{Y} \symdiff ( \set{X} \cap \set{Y} )$ \item $\set{X} \setdiff \set{Y} = ( \set{U} \symdiff \set{Y} ) \cap \set{X}$ \end{itemize} % That is, set complement, set union, and set difference can all be built from set symmetric difference and set intersection. Therefore, an analysis of the structure of sets of sets need only be concerned with these two operations. \subsection{Partitions of Sets} \label{app:math_sets_partitions} Take \emph{non-empty} sets $\set{X}$, $\set{Y}$ and $\set{Z}$ with $\set{X} \subseteq \set{Z}$ and $\set{Y} \subseteq \set{Z}$. % \begin{itemize} \item If $\set{X} \cap \set{Y} = \emptyset$ (\ie, sets $\set{X}$ and $\set{Y}$ have no common elements) then sets $\set{X}$ and $\set{Y}$ are said to be \emph{mutually exclusive} or \emph{(pairwise) disjoint}. That is, $\set{X}$ and $\set{Y}$ are \emph{disjoint sets}. \item If $\set{X} \cup \set{Y} = \set{Z}$ then sets $\set{X}$ and $\set{Y}$ are said to be \emph{collectively exhaustive} in set $\set{Z}$. 
\item If sets $\set{X}$ and $\set{Y}$ are both mutually exclusive and collectively exhaustive in $\set{Z}$ then sets $\set{X}$ and $\set{Y}$ are said to \emph{partition} set $\set{Z}$. In this case, every element $z \in \set{Z}$ is an element of exactly one of the sets $\set{X}$ and $\set{Y}$. This \emph{partition} of set $\set{Z}$ is denoted as the set $\{ \set{X}, \set{Y} \}$. \end{itemize} % While these definitions have been given in terms of two non-empty subsets, they apply to collections of any number of non-empty sets. \paragraph{Mutually Exclusive and Pairwise Disjoint:} Technically, two sets $\set{X}$ and $\set{Y}$ are said to be \emph{disjoint} if $\set{X} \cap \set{Y} = \emptyset$. For a set of sets $\{ \set{X}, \set{Y}, \set{Z} \}$, the collection of sets is said to be \emph{mutually exclusive} or \emph{mutually disjoint} or \emph{pairwise disjoint} if every pair of distinct sets in the collection has an empty intersection. For example, for the infinite set of indexed sets % \begin{equation*} \{ \set{X}_1, \set{X}_2, \set{X}_3, \set{X}_4, \set{X}_5, \dots \} \end{equation*} % the sets are said to be \emph{pairwise disjoint} if for any $i,j \in \N$ with $i \neq j$, it is the case that $\set{X}_i \cap \set{X}_j = \emptyset$. If it is also the case that the union of these sets is equal to set $\set{Y}$ then these sets are said to \emph{partition} $\set{Y}$ and the set of sets forms a \emph{partition} of $\set{Y}$. \subsection{Geometric Interpretation of Set Operations} \label{app:math_sets_venn_diagram} In \longrefs{fig:functions} and \shortref{fig:function_comps} in \longref{app:math_functions}, we made use of graphical depictions of sets in order to make function mappings more intuitive. There are similar diagrams for set operations that can be applied very generally. An understanding of these diagrams allows for quick justification of the statements made in \longrefs{app:math_sets_cadso} and \shortref{app:math_sets_dml}.
The diagrams that represent sets are often types of \emph{Euler diagrams}. This type of set diagram can be very useful when dealing with propositional logic, the topic of \longref{app:math_logic}. Because of this, versions of these diagrams used explicitly with logic are known as \emph{Johnston diagrams}. When an Euler diagram is used to show all possible relationships (\ie, union, intersection, and others) among a number of sets, the diagram is commonly known as a \emph{Venn diagram}. The combination of diagrams in \longref{fig:venn_diagrams} depicts a single Venn diagram shaded in six different ways. That is, all six diagrams depict arbitrary sets $\set{X}$ and $\set{Y}$ which are subsets of universal set $\set{U}$. The three sets $\set{X}$, $\set{Y}$, and $\set{U}$ are each shown as squares with solid borders. The $\set{X}$ and $\set{Y}$ squares overlap to indicate that sets $\set{X}$ and $\set{Y}$ may have some shared elements. The $\set{X}$ and $\set{Y}$ squares are both located within the $\set{U}$ square to indicate that all elements of sets $\set{X}$ and $\set{Y}$ are also elements of universal set $\set{U}$. Each of the six diagrams is identical except for the shading, which selects elements that result from the set operation in question. For example, \longref{fig:venn_set} is shaded to select only elements from set $\set{X}$. The large region in \longref{fig:venn_complement} shows the elements of the universal set $\set{U}$ that are not elements of set $\set{X}$ (\ie, the complement of $\set{X}$). The region in \longref{fig:venn_union} shows the elements that are members of set $\set{X}$ or $\set{Y}$ or both (\ie, the union of $\set{X}$ and $\set{Y}$). The small region in \longref{fig:venn_intersection} shows the few shared elements of sets $\set{X}$ and $\set{Y}$ (\ie, the intersection of $\set{X}$ and $\set{Y}$).
The region in \longref{fig:venn_difference} shows the elements of set $\set{X}$ that are not elements of $\set{Y}$ (\ie, the difference $\set{X} \setdiff \set{Y}$). Finally, the region in \longref{fig:venn_symdifference} shows elements that are members of $\set{X}$ or $\set{Y}$ but not members of both (\ie, the symmetric difference $\set{X} \symdiff \set{Y}$). % \begin{figure}[!ht]\centering \subfloat[Set $\set{X}$]{ \begin{picture}(120,120)(-10,-10) \definecolor{white}{gray}{1} \definecolor{gray}{gray}{.5} \thicklines \put(0,0){\fcolorbox{black}{white} {\makebox(100,100)[tr]{\text{$\set{U}$}}}} \put(23,48){\fcolorbox{black}{gray} {\makebox(30,30){\text{$\set{X}$}}}} \put(48,23){\fcolorbox{black}{white} {\makebox(30,30){\text{$\set{Y}$}}}} \put(48,48){\fcolorbox{black}{gray} {\makebox(5,5){}}} \end{picture} \label{fig:venn_set} } \quad \subfloat[Set Complement $\set{X}^c$]{ \begin{picture}(120,120)(-10,-10) \definecolor{white}{gray}{1} \definecolor{gray}{gray}{.5} \thicklines \put(0,0){\fcolorbox{black}{gray} {\makebox(100,100)[tr]{\text{$\set{U}$}}}} \put(23,48){\fcolorbox{black}{white} {\makebox(30,30){\text{$\set{X}$}}}} \put(48,23){\fcolorbox{black}{gray} {\makebox(30,30){\text{$\set{Y}$}}}} \put(48,48){\fcolorbox{black}{white} {\makebox(5,5){}}} \end{picture} \label{fig:venn_complement} }\\ \medskip \subfloat[Set Union $\set{X} \cup \set{Y}$]{ \begin{picture}(120,120)(-10,-10) \definecolor{white}{gray}{1} \definecolor{gray}{gray}{.5} \thicklines \put(0,0){\fcolorbox{black}{white} {\makebox(100,100)[tr]{\text{$\set{U}$}}}} \put(23,48){\fcolorbox{black}{gray} {\makebox(30,30){\text{$\set{X}$}}}} \put(48,23){\fcolorbox{black}{gray} {\makebox(30,30){\text{$\set{Y}$}}}} \put(48,48){\fcolorbox{black}{gray} {\makebox(5,5){}}} \end{picture} \label{fig:venn_union} } \quad \subfloat[Set Intersection $\set{X} \cap \set{Y}$]{ \begin{picture}(120,120)(-10,-10) \definecolor{white}{gray}{1} \definecolor{gray}{gray}{.5} \thicklines \put(0,0){\fcolorbox{black}{white} 
{\makebox(100,100)[tr]{\text{$\set{U}$}}}} \put(23,48){\fcolorbox{black}{white} {\makebox(30,30){\text{$\set{X}$}}}} \put(48,23){\fcolorbox{black}{white} {\makebox(30,30){\text{$\set{Y}$}}}} \put(48,48){\fcolorbox{black}{gray} {\makebox(5,5){}}} \end{picture} \label{fig:venn_intersection} }\\ \medskip \subfloat[Set Difference $\set{X} \setdiff \set{Y}$]{ \begin{picture}(120,120)(-10,-10) \definecolor{white}{gray}{1} \definecolor{gray}{gray}{.5} \thicklines \put(0,0){\fcolorbox{black}{white} {\makebox(100,100)[tr]{\text{$\set{U}$}}}} \put(23,48){\fcolorbox{black}{gray} {\makebox(30,30){\text{$\set{X}$}}}} \put(48,23){\fcolorbox{black}{white} {\makebox(30,30){\text{$\set{Y}$}}}} \put(48,48){\fcolorbox{black}{white} {\makebox(5,5){}}} \end{picture} \label{fig:venn_difference} } \quad \subfloat[Symmetric Difference $\set{X} \symdiff \set{Y}$]{ \begin{picture}(120,120)(-10,-10) \definecolor{white}{gray}{1} \definecolor{gray}{gray}{.5} \thicklines \put(0,0){\fcolorbox{black}{white} {\makebox(100,100)[tr]{\text{$\set{U}$}}}} \put(23,48){\fcolorbox{black}{gray} {\makebox(30,30){\text{$\set{X}$}}}} \put(48,23){\fcolorbox{black}{gray} {\makebox(30,30){\text{$\set{Y}$}}}} \put(48,48){\fcolorbox{black}{white} {\makebox(5,5){}}} \end{picture} \label{fig:venn_symdifference} } \caption[Graphical Interpretation of Set Operations]{Shaded regions depict operation result.} \label{fig:venn_diagrams} \end{figure} \subsection{Commutativity, Associativity, and Distributivity of Set Operations} \label{app:math_sets_cadso} \paragraph{Commutativity:} The order of the union, intersection, or symmetric difference of two sets has no impact on the outcome of the operation. In other words, set intersection, set union, and set symmetric difference are all \emph{commutative} operations. 
That is, for sets $\set{X}$ and $\set{Y}$, % \begin{equation*} \set{X} \cup \set{Y} = \set{Y} \cup \set{X} \quad \text{ and } \quad \set{X} \cap \set{Y} = \set{Y} \cap \set{X} \quad \text{ and } \quad \set{X} \symdiff \set{Y} = \set{Y} \symdiff \set{X} \end{equation*} % This can easily be seen in \longrefs{fig:venn_union} and \shortref{fig:venn_intersection} as the area shaded does not vary with the order of the arguments of the operation. \paragraph{Associativity:} When taking the union, intersection, or symmetric difference of a group of sets, the result will not be impacted by the order in which the operations were applied. In other words, set intersection, set union, and set symmetric difference are all \emph{associative} operations. That is, for sets $\set{X}$, $\set{Y}$, and $\set{Z}$, % \begin{equation*} \set{X} \cup (\set{Y} \cup \set{Z}) = (\set{X} \cup \set{Y}) \cup \set{Z} \end{equation*} % and % \begin{equation*} \set{X} \cap (\set{Y} \cap \set{Z}) = (\set{X} \cap \set{Y}) \cap \set{Z} \end{equation*} % and % \begin{equation*} \set{X} \symdiff (\set{Y} \symdiff \set{Z}) = (\set{X} \symdiff \set{Y}) \symdiff \set{Z} \end{equation*} % where parentheses are used as grouping symbols to indicate which operation should be completed first. \paragraph{Distributivity of Intersection and Union:} Set operations can distribute across grouping symbols. In other words, set intersection \emph{distributes} over set union, and set union distributes over set intersection. That is, for sets $\set{X}$, $\set{Y}$, and $\set{Z}$, % \begin{equation*} \set{X} \cup (\set{Y} \cap \set{Z}) = (\set{X} \cup \set{Y}) \cap (\set{X} \cup \set{Z}) \quad \text{ and } \quad \set{X} \cap (\set{Y} \cup \set{Z}) = (\set{X} \cap \set{Y}) \cup (\set{X} \cap \set{Z}) \end{equation*} \paragraph{Distributivity of Intersection over Symmetric Difference:} The set intersection operation also distributes over symmetric difference. 
That is, for sets $\set{X}$, $\set{Y}$, and $\set{Z}$, % \begin{equation*} \set{X} \cap (\set{Y} \symdiff \set{Z}) = (\set{X} \cap \set{Y}) \symdiff (\set{X} \cap \set{Z}) \end{equation*} \subsection{The Set-Theoretic De Morgan's Laws} \label{app:math_sets_dml} For any two sets $\set{X}$ and $\set{Y}$, it is always the case that % \begin{equation*} ( \set{X} \cap \set{Y} )^c = \set{X}^c \cup \set{Y}^c \quad \text{ and, dually, } \quad ( \set{X} \cup \set{Y} )^c = \set{X}^c \cap \set{Y}^c \end{equation*} % The first of these can also be written in terms of the universal set $\set{U}$ as % \begin{equation*} \set{U} \setdiff ( \set{X} \cap \set{Y} ) = ( \set{U} \setdiff \set{X} ) \cup ( \set{U} \setdiff \set{Y} ) \end{equation*} % These relationships can be verified using the diagrams in \longref{fig:venn_diagrams}. They are particularly important to applications of propositional logic, and so often the term \emph{De Morgan's Laws} implies a logical context. \section{Propositional Logic} \label{app:math_logic} The topic of \emph{propositional (or sentential) logic} provides a general method for analytical reasoning, and so we need to introduce logic as a tool to justify our claims. Here, we mean to define the vocabulary we use in those claims. For example, the phrases \emph{if and only if} and \emph{implies} will be defined here. Thus, our discussion of logic is less formal and less complete than our discussions of other mathematical constructs in this \appname{}. \Citet{Martin04} and \citet{Gabbay02} provide concise summaries of symbolic logic, and \citet{Hinman05} gives a more formal mathematical treatment. As already mentioned, \citet{Stoll79} explicitly integrates logic with set theory and algebra. We connect logic to algebra briefly in \longref{app:math_prop_logic_boolean_algebra}. Additionally, we connect sets to algebra in \longref{app:math_algebras_of_sets}. The connection between set theory and logic is through the algebra that analyzes their common structures.
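As a small computational bridge between the set-theoretic laws above and the logic developed next, the De Morgan relationship $(\set{X} \cap \set{Y})^c = \set{X}^c \cup \set{Y}^c$, together with its dual $(\set{X} \cup \set{Y})^c = \set{X}^c \cap \set{Y}^c$, can be spot-checked on small finite sets. A minimal sketch, with the complement taken relative to an explicit universal set:

```python
U = set(range(8))   # an explicit universal set for this check
X = {0, 1, 2, 3}
Y = {2, 3, 4, 5}

def c(S):
    """Complement relative to U."""
    return U - S

print(c(X & Y) == c(X) | c(Y))   # True: (X intersect Y)^c = X^c union Y^c
print(c(X | Y) == c(X) & c(Y))   # True: (X union Y)^c = X^c intersect Y^c
```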
\subsection{Sentences} In propositional logic (also known as sentential logic), a \emph{sentence} is a statement that can \emph{independently} be said to be either \emph{true} or \emph{false} (but not both nor some other truth value). For example, ``there exists a boy on Earth with a certain color jacket'' cannot be a sentence because it cannot be said to be true or false without knowing the color to which ``certain color'' refers. However, ``there exists a boy on Earth with a red jacket'' is a sentence; similarly, ``for any color, there exists a boy on Earth with a jacket of that color'' is also a sentence. Both of these can be evaluated as true or false without needing any additional information. Sentences can also be specified in terms of mathematical relationships. For example, ``$1+1=5$'' is a false sentence, and ``$2+2=4$'' is a true sentence. Similarly, ``for any number $x$, $1+x=5$'' is a false sentence and ``there exists a number $x$ such that $1+x=5$'' is a true sentence. However, ``$1+x=5$'' alone is not a sentence because its truth cannot be evaluated without knowing $x$. For simplicity, sentences will often be defined symbolically. For example, consider the definitions: % \begin{align*} p &\triangleq \text{$2+2=4$}\\ q &\triangleq \text{$4-2=2$}\\ r &\triangleq \text{Joe eats with a fork.}\\ s &\triangleq \text{Everyone on Earth eats with a fork.}\\ t &\triangleq \text{Today, the sky on Earth is blue.} \end{align*} % where Joe is a person on Earth. We will use these definitions in examples below. Note that a symbolic sentence ``$a$'' is true only when $a$. That is, since $2+2=4$ (\ie, $p$), the sentence ``$p$'' (\ie, ``$2+2=4$'') is true. For brevity, phrases like ``for all'' and ``for any'' are often replaced with the symbol \symdef{Elogic.exists0}{forall}{$\forall$}{for all/any}. Also, phrases like ``there exists'' are often replaced with the symbol \symdef{Elogic.exists1}{exists}{$\exists$}{there exists}. 
Similarly, phrases like ``there does not exist'' will be replaced with the symbol \symdef{Elogic.exists1}{nexists}{$\nexists$}{there does not exist}. The phrase ``there exists a unique'' (\ie, implying the existence of one and only one) is represented by the symbol \symdef{Elogic.exists1}{existsunique}{$\exists \bang$}{there exists a unique}. \subsection{Logical Connectives and Compound Sentences} Propositional logic is a \emph{truth-functional logic} because sentences can be combined to make \emph{compound sentences} whose ultimate truth depends \emph{only} upon the truth of their constituent sentences. In other words, these compound sentences can be thought of as functions mapping the truth of their constituents to some ultimate truth. These compound sentences are constructed with \emph{logical connectives}. These logical connectives are also known as \emph{logical operators} and may be defined using the same constructs as other algebraic operators. We describe the most common connectives here. Just as the order of operations can be made explicit or changed with grouping symbols (\eg, $($ and $)$) in arithmetic, those same grouping symbols can be used in compound sentences for analogous reasons. \paragraph{And and Or:} The sentence ``$p \text{ and } q$'' joins sentences ``$p$'' and ``$q$'' to form the compound sentence ``$(2+2=4) \text{ and } (4-2=2)$'' which is only true if both $2+2=4$ and $4-2=2$ (\ie, ``$p \text{ and } q$'' is only true if $p$ and $q$). Similarly, the sentence ``$s \text{ or } t$'' is true if ``$s$'' is true, ``$t$'' is true, or both ``$s$'' and ``$t$'' are true (\ie, this is an \emph{inclusive or}). That is, $s \text{ or } t$ only when $s$ or $t$ or both $s$ and $t$. The symbols $\land$ and $\lor$ are often used to represent \emph{and} and \emph{or} respectively. 
These symbols are related to the symbols $\cap$ and $\cup$ respectively, which were introduced in \longref{app:math_sets}; in fact, sometimes $\land$ and $\lor$ are replaced with $\cap$ and $\cup$ respectively. \paragraph{Negation:} For $a$, the sentence ``$\neg a$'' (\ie, ``not $a$'') is the \emph{logical negation} of sentence ``$a$'' and is only true when $a$ is not the case. That is, ``$\neg p$'' is a false sentence since it is not the case that $2+2 \neq 4$. Note that the negation of a sentence that includes ``every'' (\ie, $\forall$) is usually a sentence that involves ``there is'' (\ie, $\exists$). For example, ``$\neg s$'' might be written, ``There is someone on Earth who does not eat with a fork,'' which is most likely a true sentence. For reasons relating to the material in \longref{app:math_sets}, the logical negation is also known as the \emph{logical complement} or simply the \emph{complement}. In these cases, $\neg a$ might be denoted $a^c$. \paragraph{Implication:} The \symdef[\emph{logical implication}]{Elogic}{implies}{$\implies$}{logical implication} connective for $a$ and $b$ forms the sentence ``$a \implies b$'' (\ie, ``$a \text{ implies } b$''), which is true when $a$ is not the case or $b$ is the case (\ie, $\neg a$ or $b$). In other words, for $a$ and $b$, ``$a \implies b$'' represents the sentence ``if $a$ then $b$,'' which can only be shown to be false when ``$a$'' is true and ``$b$'' is not true. Thus, $p \implies q$ and $q \implies p$ (\ie, both ``$p \implies q$'' and ``$q \implies p$'' are true sentences). However, while $s \implies r$, it is not the case that $r \implies s$. \paragraph{Equivalence:} The \symdef[\emph{logical equivalence}]{Elogic}{iff}{$\iff$}{logical equivalence} connective for $a$ and $b$ forms the sentence ``$a \iff b$'' (\ie, ``$a \text{ is equivalent to } b$'' or ``$a \text{ if and only if } b$'') and is only true when $a \implies b$ and $b \implies a$.
In other words, for $a$ and $b$, if $a \iff b$ then $a$ and $b$ are equivalent sentences (with respect to their logic). Thus, $p \iff q$; however, it is not the case that $r \iff s$. The symbol $\equiv$ may sometimes be used instead of the symbol $\iff$, but $=$ is usually not an appropriate replacement. In summary, for $a$ and $b$, % \begin{equation*} \left( (a \implies b) \text{ and } (b \implies a) \right) \iff \left( (\neg a \text{ or } b) \text{ and } (\neg b \text{ or } a) \right) \iff (a \iff b) \end{equation*} % is always the case. \subsection{Converse, Inverse, and Contraposition} Take arbitrary $a$ and $b$ and the sentence ``$a \implies b$.'' The \emph{inverse} of the sentence is ``$\neg a \implies \neg b$.'' The \emph{converse} of the sentence is ``$b \implies a$.'' The \emph{contrapositive} of the sentence is the inverse of its converse, which is ``$\neg b \implies \neg a$.'' It is the case that % \begin{equation*} (a \implies b) \iff (\neg b \implies \neg a) \end{equation*} % That is, any sentence is logically equivalent to its contrapositive. This is not necessarily the case for its inverse and its converse. For example, if ``everyone on Earth eats with a fork,'' then ``Joe eats with a fork'' (\ie, $s \implies r$); however, if ``Joe does not eat with a fork'' then ``There is someone on earth who does not eat with a fork'' (\ie, $\neg r \implies \neg s$). \subsection{Commutativity, Associativity, and Distributivity of Logic Operations} \label{app:math_logic_cadso} \paragraph{Commutativity of And and Or:} The order of the arguments of $\land$ (\ie, \emph{and}) or $\lor$ (\ie, \emph{or}) has no impact on the outcome of the operation. In other words, $\land$ and $\lor$ are \emph{commutative} logic operations. 
That is, for $x$ and $y$, % \begin{equation*} x \lor y = y \lor x \quad \text{ and } \quad x \land y = y \land x \end{equation*} \paragraph{Associativity of And and Or:} A chain of three or more of the $\land$ logical connective can be evaluated in any order. Similarly, a chain of three or more of the $\lor$ logical connective can be evaluated in any order. In other words, $\land$ and $\lor$ are \emph{associative} operations. That is, for $x$, $y$, and $z$, % \begin{equation*} x \lor (y \lor z) = (x \lor y) \lor z \quad \text{ and } \quad x \land (y \land z) = (x \land y) \land z \end{equation*} % where parentheses are used as grouping symbols to indicate which operation should be completed first. \paragraph{Distributivity of And and Or:} Logic operations can distribute across grouping symbols. In other words, logical and \emph{distributes} over logical or, and logical or distributes over logical and. That is, for $x$, $y$, and $z$, % \begin{equation*} x \lor (y \land z) = (x \lor y) \land (x \lor z) \quad \text{ and } \quad x \land (y \lor z) = (x \land y) \lor (x \land z) \end{equation*} \subsection{The Logical De Morgan's Laws} \label{app:math_logic_dml} For $a$ and $b$, it is always the case that % \begin{equation*} \neg ( a \text{ and } b ) \iff \neg a \text{ or } \neg b \quad \text{ and } \quad \neg ( a \text{ or } b ) \iff \neg a \text{ and } \neg b \end{equation*} % In other words, to say that it is not the case that both $a$ and $b$ hold is to say that at least one of $a$ and $b$ fails to hold; likewise, to say that it is not the case that either $a$ or $b$ holds is to say that both fail to hold. \subsection{Application of Logic to Mathematical Proof} \label{eq:math_logic_application_proof} It is often necessary to prove that given some $a$, $b$ is either a logical consequence of $a$ (\ie, $a \implies b$) or equivalent to $a$ (\ie, $a \iff b$). To prove that $a$ is equivalent to $b$, it is necessary to prove $a \implies b$ and $b \implies a$. Thus, most mathematical proof involves showing logical implication. To prove that $a \implies b$, the methods of \emph{modus ponens} or \emph{modus tollens} can be used.
The former method assumes $a$ and asserts $b$. The latter method, also called \emph{proof by contraposition}, assumes $\neg b$ and asserts $\neg a$. We consider both proof methods to be equally valid; however, this is the subject of sophisticated debate among logicians. \section{Order Theory} \label{app:math_order_theory} In \longref{app:math_sets}, we introduced the set, one of the most fundamental constructs of mathematics, and hinted at ways in which sets are used to construct the numbers and arithmetic, the subjects of \longref{app:math_numbers}. To complete this picture, we must first introduce concepts from \emph{order theory} which allow elements of sets to be compared. \subsection{Relations} \label{app:math_relations} Take sets $\set{X}$ and $\set{Y}$. The \emph{relation} $\rel{R}$ is the ordered triple defined % \begin{equation*} {\rel{R}} \triangleq (\set{X}, \set{Y}, \set{G}) \end{equation*} % where $\set{G} \subseteq \set{X} \times \set{Y}$ is called the \emph{graph} of the relation $\rel{R}$. For two elements $x \in \set{X}$ and $y \in \set{Y}$, if $(x,y) \in \set{G}$ then $x$ is said to be \emph{$\rel{R}$-related} to $y$, which is denoted % \begin{equation*} x \rel{R} y \end{equation*} % In other words, if $x$ is $\leq$-related to $y$ then the more familiar notation % \begin{equation*} x \leq y \end{equation*} % is used. Because of its familiarity, being $\leq$-related is often called being \emph{less than or equal to} (\eg, $x$ is less than or equal to $y$). We will specifically define the $\leq$ relation for familiar numerical sets later. \paragraph{Examples:} For example, define the relation ${<} \triangleq (\{0,1,2\},\{0,1,2\},\set{G})$ where the graph $\set{G} \subset \{0,1,2\}^2$ is defined with % \begin{equation*} \set{G} \triangleq \{(0,1),(0,2),(1,2)\} \end{equation*} % Thus, it is the case that $0 < 1$, $0 < 2$, and $1 < 2$. The $<$ relation might be called the \emph{less than} relation.
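The triple construct above translates directly into a small program. This is a hedged sketch, not part of the formal development: a relation is represented as a tuple `(X, Y, G)` with the graph `G` stored as a set of ordered pairs, mirroring the ``less than'' example on a small numeric set; the helper name `related` is our own.

```python
# A relation as the ordered triple (X, Y, G) with graph G ⊆ X × Y.
def related(rel, x, y):
    """True exactly when x is R-related to y, i.e. (x, y) is in the graph."""
    X, Y, G = rel
    return (x, y) in G

# The "less than" relation on {0, 1, 2}, with the graph given in the text.
less_than = ({0, 1, 2}, {0, 1, 2}, {(0, 1), (0, 2), (1, 2)})

assert related(less_than, 0, 1)       # 0 < 1
assert related(less_than, 1, 2)       # 1 < 2
assert not related(less_than, 2, 0)   # it is not the case that 2 < 0
```

Storing the graph as an explicit set of pairs is only practical for finite carriers, but it makes the definition of ``$x \rel{R} y$'' as membership of $(x,y)$ in $\set{G}$ concrete.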
More abstractly, define the relation ${\prec} \triangleq (\{a,b\},\{c,d\},\set{G})$ where the graph $\set{G} \subset \{a,b\} \times \{c,d\}$ is defined with % \begin{equation*} \set{G} \triangleq \{(a,c),(b,d)\} \end{equation*} % Thus, it is the case that $a \prec c$ and $b \prec d$. However, there is no $\prec$ relationship between $a$ and $d$, and there is no $\prec$ relationship between $b$ and $c$. \paragraph{Chains of Relations:} Take a set $\set{X}$ equipped with the relations $\rel{R}$ and $\rel{S}$. Take the elements $x,y,z \in \set{X}$. The notation % \begin{equation*} x \rel{R} y \rel{S} z \end{equation*} % indicates that % \begin{equation*} x \rel{R} y \quad \text{ and } \quad y \rel{S} z \end{equation*} % That is, the former notation that shows a chain of relations is a shorthand for the latter notation that links many relationships logically. \paragraph{Set Relations and the Power Set:} We have already informally defined the relations $=$, $\subseteq$, $\supseteq$, $\subset$, and $\supset$ for sets. Recall that whenever sets are defined, a universal set needs to at least be implicitly defined. For example, define a universal set $\set{U}$ to be a superset of all possible sets of interest. That is, for any set $\set{X}$ and $\set{Y}$, $\set{X} \subseteq \set{U}$ and $\set{Y} \subseteq \set{U}$. Note that any subset of $\set{U}$ is an element of the power set $\Pow(\set{U})$. That is, % \begin{equation*} \Pow(\set{U}) = \{ \set{X} : x \in \set{X} \text{ implies } x \in \set{U} \} \end{equation*} % In fact, the power set can be viewed as a universal set for all subsets of $\set{U}$. Therefore, any relation $\rel{R}$ between two sets must take the form % \begin{equation*} {\rel{R}} = ( \Pow(\set{U}), \Pow(\set{U}), \setset{G} ) \end{equation*} % where $\setset{G} \subseteq \Pow(\set{U}) \times \Pow(\set{U})$. 
That is, for $\set{X} \subseteq \set{U}$ and $\set{Y} \subseteq \set{U}$ (\ie, $\set{X},\set{Y} \in \Pow(\set{U})$), % \begin{itemize} \item ${=} \triangleq ( \Pow(\set{U}), \Pow(\set{U}), \setset{G} )$ where $(\set{X},\set{Y}) \in \setset{G}$ if and only if $p \in \set{X}$ implies $p \in \set{Y}$ and $p \in \set{Y}$ implies $p \in \set{X}$ \item ${\subseteq} \triangleq ( \Pow(\set{U}), \Pow(\set{U}), \setset{G})$ where $(\set{X},\set{Y}) \in \setset{G}$ if and only if for all $p \in \set{X}$, $p \in \set{Y}$ \item ${\supseteq} \triangleq ( \Pow(\set{U}), \Pow(\set{U}), \setset{G})$ where $(\set{X},\set{Y}) \in \setset{G}$ if and only if for all $p \in \set{Y}$, $p \in \set{X}$ \item ${\subset} \triangleq ( \Pow(\set{U}), \Pow(\set{U}), \setset{G} )$ where $(\set{X},\set{Y}) \in \setset{G}$ if and only if there exists a $q \in \set{Y}$ such that $q \notin \set{X}$ and for all $p \in \set{X}$, $p \in \set{Y}$ \item ${\supset} \triangleq ( \Pow(\set{U}), \Pow(\set{U}), \setset{G} )$ where $(\set{X},\set{Y}) \in \setset{G}$ if and only if there exists a $q \in \set{X}$ such that $q \notin \set{Y}$ and for all $p \in \set{Y}$, $p \in \set{X}$ \end{itemize} % This shows one of the many important uses of the power set. \paragraph{Notation:} The shorthand notation $(\set{X}, {\rel{R}})$ indicates that the set $\set{X}$ is \emph{equipped} with the relation $\rel{R} = (\set{X},\set{X},\set{G})$. In other words, $(\set{X}, {\rel{R}})$ communicates that mutual elements of set $\set{X}$ are $\rel{R}$-related by the graph $\set{G}$. Note that all sets are typically equipped with the $=$ relation, as its definition is well understood and can be easily applied. Similarly, as we will show, familiar sets like $\W$ are typically assumed to be equipped with familiar relations like $\leq$. That is, it is rare to see these sets and these relations grouped together explicitly; instead, it is assumed that $\leq$ is provided with the standard definition.
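The bulleted definitions of the set relations over $\Pow(\set{U})$ can be taken quite literally for a tiny universal set. The following sketch (our own illustration; the names `pow_U`, `G_subseteq`, and `G_subset` are assumptions) builds the power set of a three-element universal set and defines the graphs of $\subseteq$ and $\subset$ by quantifying over elements, exactly as the bullets do.

```python
# Graphs of ⊆ and ⊂ over the power set of a small universal set U.
from itertools import combinations

U = {0, 1, 2}
pow_U = [frozenset(c) for r in range(len(U) + 1)
         for c in combinations(U, r)]          # Pow(U)

# Graph of ⊆: pairs (X, Y) such that every p in X is also in Y.
G_subseteq = {(X, Y) for X in pow_U for Y in pow_U
              if all(p in Y for p in X)}

# Graph of ⊂: additionally, some q in Y is missing from X.
G_subset = {(X, Y) for (X, Y) in G_subseteq
            if any(q not in X for q in Y)}

assert (frozenset({0}), frozenset({0, 1})) in G_subset
assert (frozenset({0, 1}), frozenset({0, 1})) in G_subseteq
assert (frozenset({0, 1}), frozenset({0, 1})) not in G_subset
```

Note that `frozenset` is used because the graph elements must themselves be hashable sets; this is a Python detail, not part of the mathematics.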
\subsection{Equivalence Relations on a Set} \label{app:math_equivalence_relations} An \emph{equivalence relation} on a set $\set{X}$ is a relation $\sim$ so that for $x,y,z \in \set{X}$, % \begin{itemize} \item $x \sim x$ \item if $x \sim y$ then $y \sim x$ \item if $x \sim y$ and $y \sim z$ then $x \sim z$ \end{itemize} % As mentioned above in \longref{app:math_relations}, the equivalence relation $=$ defined for sets is that a set $\set{X} = \set{Y}$ if and only if $\set{X} \subseteq \set{Y}$ and $\set{Y} \subseteq \set{X}$. It can be shown that this relation satisfies the three criteria for an equivalence relation. \subsection{Equivalence Class} \label{app:equivalence_class} Take a set $\set{X}$ equipped with an \emph{equivalence relation} $\sim$ (\eg, for set $\W$, $\sim$ might be replaced with $=$). Take an element $a \in \set{X}$. The \emph{equivalence class} of $a \in \set{X}$ is denoted \symdef{Csets.3}{equivclass}{${[a]}$}{equivalence class (\eg, $\{x \in \set{X} : x \sim a \}$)} and is defined % \begin{equation*} [a] \triangleq \{ x \in \set{X} : x \sim a \} \end{equation*} % That is, $[a] \subseteq \set{X}$ is a subset of set $\set{X}$ in which any two elements from $[a]$ are equivalent by the equivalence relation $\sim$. Therefore, any set $\set{X}$ equipped with equivalence relation $\sim$, which can be denoted $(\set{X}, {\sim})$, has equivalence classes of the form $[x]$ for every $x \in \set{X}$. Clearly, for two elements $x,y \in \set{X}$, it is the case that % \begin{equation*} x \sim y \text{ if and only if } [x] = [y] \end{equation*} % and so there may be many representations of the same equivalence class. That is, for $(\set{X},{\sim})$ and any equivalence class $[x]$, it is the case that % \begin{equation*} [x] = [y] \text{ for all } y \in [x] \end{equation*} % where $=$ is the equivalence relation defined for sets. That is, $[x] = [y]$ if and only if $[x] \subseteq [y]$ and $[y] \subseteq [x]$.
For $(\set{X},{\sim})$, the set of all equivalence classes \emph{induced} by equivalence relation $\sim$ is denoted \symdef{Csets.31}{quotientset}{$\set{X}/{\sim}$}{quotient set induced by set $\set{X}$ over relation $\sim$ (\ie, set of all $\sim$ equivalence classes in $\set{X}$)} and is called the \emph{quotient set} of $\set{X}$ by $\sim$. That is, % \begin{equation*} \set{X}/{\sim} \triangleq \{ [x] : x \in \set{X} \} \end{equation*} % In fact, the quotient set $\set{X}/{\sim}$ is a partition of $\set{X}$. That is, for any two equivalence classes $[x],[y] \in \set{X}/{\sim}$, it must be either that $[x] = [y]$ or $[x] \cap [y] = \emptyset$. Additionally, the union of all sets in $\set{X}/{\sim}$ is the set $\set{X}$. In other words, for a set $\set{X}$ with equivalence relation $\sim$, the equivalence relation \emph{divides} the set into disjoint subsets that collectively exhaust $\set{X}$. It is this \emph{division} that motivates the notation $\set{X}/{\sim}$ which shows $\set{X}$ being \emph{divided by} the equivalence relation $\sim$. \subsection{Preorder Relations on a Set} A \emph{preorder relation} on a set $\set{X}$ is a relation $\rel{R}$ (the symbol $\leq$ is often used) so that for $x,y,z \in \set{X}$, % \begin{enumerate}[(i)] \item $x \rel{R} x$ \label{item:preorder_reflexivity} \item if $x \rel{R} y$ and $y \rel{R} z$ then $x \rel{R} z$ \label{item:preorder_transitivity} \end{enumerate} % In this case, the set $\set{X}$ is called a \emph{preordered set}. To indicate the preorder relation on set $\set{X}$, it will often be written that $(\set{X},{\rel{R}})$ is a preordered set. \paragraph{Preorders as Equivalence Relations:} Take $(\set{X},{\rel{R}})$ to be a preorder. Assume that it is the case that for all $x,y \in \set{X}$, if $x \rel{R} y$ then $y \rel{R} x$. In this case, the preorder relation $\rel{R}$ is an equivalence relation.
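The equivalence-class and quotient-set constructions above can be sketched concretely. Under an assumed example relation (``same remainder modulo 3'' on $\{0,\dots,8\}$, our choice for illustration), the classes are pairwise disjoint and collectively exhaust the set, as the partition property requires.

```python
# Equivalence classes and the quotient set for an example relation:
# x ~ y iff x and y leave the same remainder modulo 3.
X = set(range(9))

def equiv(x, y):
    return x % 3 == y % 3

def equivalence_class(a):
    """[a] = { x in X : x ~ a }."""
    return frozenset(x for x in X if equiv(x, a))

# The quotient set X/~ = { [x] : x in X }; duplicates collapse automatically.
quotient = {equivalence_class(x) for x in X}

# Any two distinct classes are disjoint ...
classes = list(quotient)
for i in range(len(classes)):
    for j in range(i + 1, len(classes)):
        assert classes[i] & classes[j] == frozenset()

# ... and their union is all of X, so X/~ is a partition of X.
assert set().union(*quotient) == X
assert len(quotient) == 3
```

Building `quotient` as a set of `frozenset`s also demonstrates the ``many representations'' point: `equivalence_class(0)` and `equivalence_class(3)` are the same class, so only three classes survive.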
\subsection{Directed Sets} A \emph{direction} on a \emph{nonempty} set $\set{X}$ is a relation $\leq$ so that for $x,y,z \in \set{X}$, % \begin{enumerate}[(i)] \item $x \leq x$ \label{item:directed_reflexivity} \item if $x \leq y$ and $y \leq z$ then $x \leq z$ \label{item:directed_transitivity} \item there exists $t \in \set{X}$ such that $x \leq t$ and $y \leq t$ \label{item:directed_directedness} \end{enumerate} % In this case, the nonempty set $\set{X}$ is called a \emph{directed set}, and it is said that $\set{X}$ is \emph{directed} by the relation $\leq$. To indicate the direction relation on set $\set{X}$, it may be written that $(\set{X},{\leq})$ is a directed set. Note that all directed sets are preordered sets. \paragraph{Downward Directed Sets:} Take a nonempty set $\set{X}$ and a relation $\leq$ so that for $x,y,z \in \set{X}$, % \begin{enumerate}[(i)] \item $x \leq x$ \label{item:downdirected_reflexivity} \item if $z \leq y$ and $y \leq x$ then $z \leq x$ \label{item:downdirected_transitivity} \item there exists $t \in \set{X}$ such that $t \leq x$ and $t \leq y$ \label{item:downdirected_directedness} \end{enumerate} % In this case, the set $\set{X}$ is said to be \emph{downward directed} by the relation $\leq$. Now assume that there is a relation $\geq$ such that for any $x,y \in \set{X}$, $x \leq y$ is equivalent to $y \geq x$. In that case, $(\set{X},{\leq})$ is a downward directed set if and only if $(\set{X},{\geq})$ is a directed set, and $(\set{X},{\leq})$ is a directed set if and only if $(\set{X},{\geq})$ is a downward directed set. 
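A concrete instance of a directed set, offered here as our own illustrative sketch: the power set of any set, ordered by $\subseteq$, is directed, since for subsets $\set{X}$ and $\set{Y}$ the union $\set{X} \cup \set{Y}$ serves as the element $t$ with $\set{X} \subseteq t$ and $\set{Y} \subseteq t$. A finite spot check in Python:

```python
# (Pow(U), ⊆) is a directed set: t = X ∪ Y is above both X and Y.
from itertools import combinations

U = {0, 1, 2, 3}
pow_U = [frozenset(c) for r in range(len(U) + 1)
         for c in combinations(U, r)]          # Pow(U)

for X in pow_U:
    for Y in pow_U:
        t = X | Y                  # a common upper bound in (Pow(U), ⊆)
        assert X <= t and Y <= t   # <= on frozensets is the subset test
        assert X <= X              # reflexivity of ⊆
```

Reflexivity and transitivity of $\subseteq$ give the preorder conditions, and the union supplies the directedness condition; by the remark at the end of this subsection, $(\Pow(\set{U}),{\supseteq})$ is correspondingly downward directed.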
\subsection{Partial Order Relations on a Set} \label{app:partial_order_relations} A \emph{partial order relation} on a set $\set{X}$ already equipped with equivalence relation $=$ is a relation $\leq$ so that for $x,y,z \in \set{X}$, % \begin{enumerate}[(i)] \item $x \leq x$ \label{item:poset_reflexivity} \item if $x \leq y$ and $y \leq x$ then $y = x$ \label{item:poset_antisymmetry} \item if $x \leq y$ and $y \leq z$ then $x \leq z$ \label{item:poset_transitivity} \end{enumerate} % In this case, the set $\set{X}$ is called a \emph{partially ordered set} or a \emph{poset}. It is common to say that $(\set{X},{\leq})$ is a poset, which indicates that the set is ordered by the partial order relation $\leq$. Note that it is not necessarily the case that any two elements from a poset can be compared. If any two elements from a poset can be compared then that poset is called a \emph{totally ordered set}. Additionally, clearly properties (\shortref{item:poset_reflexivity}) and (\shortref{item:poset_transitivity}) make any partially ordered set a preordered set as well. \symdef[]{Ageneral.5}{ineq}{$\leq$ ($\geq$)}{less (greater) than or equal to}\symdef[]{Ageneral.5}{strictineq}{$<$ ($>$)}{strictly less (greater) than}The symbol $\leq$ typically indicates that an element is \emph{less than or equal to} another element or simply \emph{before} another element. The symbol $<$ can be used instead to indicate that an element is \emph{(strictly) less than} another element. That is, for a set $\set{X}$ and elements $x,y \in \set{X}$, % \begin{equation*} x < y \quad \text{ if and only if } \quad x \leq y \text{ and } x \neq y \end{equation*} % Additionally, for set $\set{X}$ and elements $x,y \in \set{X}$, the phrase $x < y$ ($x \leq y$) can be written $y > x$ ($y \geq x$), in which case the symbol $>$ ($\geq$) indicates that $y$ is \emph{greater than (or equal to)} $x$. \paragraph{Meets, Joins, and Lattices:} Take partially ordered set $(\set{X},{\leq})$ and $x,y \in \set{X}$.
Consider two cases. % \begin{enumerate}[(i)] \item Assume that there exists $a \in \set{X}$ such that $a \leq x$ and $a \leq y$ and $z \leq a$ for all $z \in \set{X}$ such that $z \leq x$ and $z \leq y$. In that case, $a$ is called the \emph{greatest lower bound} or \symdef[\emph{(pairwise) meet}]{Forder.02}{meet}{$x \land y$}{the pairwise meet (\ie, greatest lower bound) of $x$ and $y$} of $x$ and $y$. It can also be said that $a$ is the \emph{infimum} of the set $\{x,y\}$. Denote $a$ with $x \land y$ or $y \land x$. It is not coincidental that this notation is similar to the notation for a logical \emph{and}. \item Assume that there exists $b \in \set{X}$ such that $x \leq b$ and $y \leq b$ and $b \leq z$ for all $z \in \set{X}$ such that $x \leq z$ and $y \leq z$. In that case, $b$ is called the \emph{least upper bound} or \symdef[\emph{(pairwise) join}]{Forder.02}{join}{$x \lor y$}{the pairwise join (\ie, least upper bound) of $x$ and $y$} of $x$ and $y$. It can also be said that $b$ is the \emph{supremum} of the set $\{x,y\}$. Denote $b$ with $x \lor y$ or $y \lor x$. It is not coincidental that this notation is similar to the notation for a logical \emph{or}. \end{enumerate} % If the pairwise meet and pairwise join both exist for all $x,y \in \set{X}$, then $(\set{X},{\leq})$ is called a \emph{lattice}. Three common terms when dealing with lattices are the following, which assume that $(\set{X},{\leq})$ is a partially ordered set. % \begin{description} \item\emph{Totally Ordered Set:} Assume that it is the case that for all $x,y \in \set{X}$, the set $\{ x \lor y, x \land y \}$ is equivalent to the set $\{x,y\}$ (which is equivalent to the set $\{y,x\}$). In this case, $(\set{X},{\leq})$ is a \emph{totally ordered set} and it is the case that for all $x,y \in \set{X}$, $x \leq y$ if and only if $x \land y = x$. This type of set is the subject of \longref{app:math_total_order_set}. \item\emph{Complete Lattice:} Take a subset $\set{Y} \subseteq \set{X}$.
Assume that it is the case that there exists $a,b \in \set{X}$ such that $a \leq y$ and $y \leq b$ for all $y \in \set{Y}$. Take such an $a$ and $b$. In this case, $a$ is called a \emph{lower bound} for $\set{Y}$ and $b$ is called an \emph{upper bound} for $\set{Y}$. Now assume that for all $c,d \in \set{X}$ such that $c$ is a lower bound for $\set{Y}$ and $d$ is an upper bound for $\set{Y}$, $a \geq c$ and $b \leq d$. In this case, $a$ is called the \emph{greatest lower bound} for $\set{Y}$ and $b$ is called the \emph{least upper bound} for $\set{Y}$. The greatest lower bound of $\set{Y}$ is also called the \emph{meet} or the \emph{infimum} of $\set{Y}$ and is denoted by % \begin{equation*} \inf \set{Y} \quad \text{or} \quad \bigwedge \set{Y} \end{equation*} % The least upper bound of $\set{Y}$ is also called the \emph{join} or \emph{supremum} of $\set{Y}$ and is denoted by % \begin{equation*} \sup \set{Y} \quad \text{or} \quad \bigvee \set{Y} \end{equation*} % If for every subset $\set{Y} \subseteq \set{X}$, the meet of $\set{Y}$ and the join of $\set{Y}$ exist, then $\set{X}$ is called a \emph{complete lattice}. Upper and lower bounds are treated in detail in \longref{app:math_upper_lower_bound}. \item\emph{Bounded Lattice:} Assume that there exists an $a,b \in \set{X}$ such that $a \leq x$ and $x \leq b$ for all $x \in \set{X}$. In this case, $a$ is called the \emph{least element} or \emph{bottom} of $\set{X}$ and $b$ is called the \emph{greatest element} or \emph{top} of $\set{X}$. If a poset has both a greatest element and a least element, it is called a \emph{bounded poset}. If it is additionally a lattice, it is called a \emph{bounded lattice}. Note that all complete lattices are bounded lattices.
\end{description} \subsection{Total Ordering on a Set} \label{app:math_total_order_set} For a set $\set{X}$ already equipped with an equivalence relation $=$, a \emph{total ordering} on that set is a \emph{total order} relation $\leq$ such that for \emph{any} three elements $x,y,z \in \set{X}$, % \begin{enumerate}[(i)] \item $x \leq y$ or $y \leq x$ (or both) \label{item:toset_totality} \item if $x \leq y$ and $y \leq x$ then $x = y$ \label{item:toset_antisymmetry} \item if $x \leq y$ and $y \leq z$ then $x \leq z$ \label{item:toset_transitivity} \end{enumerate} % This is identical to a partially ordered set (\ie, a poset); however, in this case, \emph{every} element of the set can be \emph{compared} to every other element. This comparison is the \emph{ordering}, and since it captures a relationship between \emph{any two elements}, it is called a \emph{total ordering} or is said to be \emph{total}. A set $\set{X}$ equipped with an order relation $\leq$ is called a \emph{totally ordered set} or simply an \emph{ordered set}. Such a set is sometimes denoted with its (total) order relation as $(\set{X},{\leq})$; however, for sets with a well-understood standard ordering, this notation is often omitted. Note that the symbols $\leq$, $<$, $\geq$, and $>$ have the same interpretation as in a poset. Clearly, every (totally) ordered set is a poset as well. \paragraph{Totally Ordered Set as a Lattice:} Take a totally ordered set $(\set{X},{\leq})$. Take $a,b \in \set{X}$. Without loss of generality, assume $a \leq b$. In this case, clearly $a$ is the meet and $b$ is the join of $a$ and $b$. Therefore, every totally ordered set is a lattice as well. \paragraph{Total Ordering as Directed Set:} Take a \emph{nonempty} totally ordered set $(\set{X},{\leq})$.
Note that for any $x,y \in \set{X}$, totality guarantees that $x \leq y$ or $y \leq x$; taking $z$ to be the greater of the two (\ie, $z \triangleq x \lor y$) gives $x \leq z$ and $y \leq z$. Therefore, for all $x,y \in \set{X}$, there exists a $z \in \set{X}$ such that $x \leq z$ and $y \leq z$. Thus, every nonempty totally ordered set is also a directed set. \paragraph{Whole Numbers as Example:} The set of the whole numbers $\W$ can be equipped with an \emph{equivalence relation} $=$ and a total order relation $\leq$ such that for any two whole numbers $x,y \in \W$, % \begin{itemize} \item $x \leq y$ if and only if $x \subseteq y$ \item $x = y$ if and only if $x \subseteq y$ and $y \subseteq x$ \end{itemize} % That is, define ${\leq} \triangleq (\W,\W,\set{G}_\leq)$ and ${=} \triangleq (\W,\W,\set{G}_=)$ with % \begin{equation*} \set{G}_\leq \triangleq \{ (x,y) \in \W^2 : x \subseteq y \} \quad \text{ and } \quad \set{G}_= \triangleq \{ (x,y) \in \W^2 : x \subseteq y \text{ and } y \subseteq x \} \end{equation*} % By these relations, % \begin{align*} 0 < 1 < 2 < 3 < 4 < 5 < 6 < \cdots \end{align*} % which, of course, also means that % \begin{align*} 0 \leq 1 \leq 2 \leq 3 \leq 4 \leq 5 \leq 6 \leq \cdots \end{align*} % and this is the standard ordering of the whole numbers. The set $\W$ equipped with the order relation $\leq$ makes $\W$ a totally ordered set. Since this is the standard whole number ordering, $\W$ will rarely be written explicitly with $\leq$ (\ie, $(\W,{\leq})$ or even $(\W,{=},{\leq})$). However, non-traditional ordered sets or non-traditional order relations on traditional ordered sets will often be listed with their order relations. Note that $(\W,{\leq})$ and $(\W,{\subseteq})$ are both directed sets as well; this is not surprising because all nonempty totally ordered sets are also directed sets. \paragraph{Comparison to Partially Ordered Sets:} Note that all totally ordered sets are also partially ordered sets.
In a partially ordered set, it is not necessarily the case that every element can be compared to every other element. A poset (\ie, a partially ordered set) in which any two elements can be compared is \emph{total}; that is, it is a totally ordered set. \paragraph{Intervals of Totally Ordered Sets:} Take totally ordered set $(\set{X},{\leq})$. Take two elements $a,b \in \set{X}$ such that $a \leq b$. The notations \symdef[]{Csets.2intervals1}{interval1}{${[a,b]}$}{interval $[a,b] \triangleq \{ x \in \set{X} : a \leq x \leq b \}$}\symdef[]{Csets.2intervals2}{interval2}{${(a,b]}$}{interval $(a,b] \triangleq \{ x \in \set{X} : a < x \leq b \}$}\symdef[]{Csets.2intervals3}{interval3}{${[a,b)}$}{interval $[a,b) \triangleq \{ x \in \set{X} : a \leq x < b \}$}\symdef[]{Csets.2intervals4}{interval4}{${(a,b)}$}{interval $(a,b) \triangleq \{ x \in \set{X} : a < x < b \}$}$[a,b]$, $(a,b]$, $[a,b)$, and $(a,b)$ are defined with % \begin{align*} [a,b] &\triangleq \{ x \in \set{X} : a \leq x \leq b \}\\ (a,b] &\triangleq \{ x \in \set{X} : a < x \leq b \}\\ [a,b) &\triangleq \{ x \in \set{X} : a \leq x < b \}\\ (a,b) &\triangleq \{ x \in \set{X} : a < x < b \} \end{align*} % respectively. These sets are all called \emph{intervals} and $a$ and $b$ are called \emph{endpoints}. Specifically, $a$ is called the \emph{left endpoint} and $b$ is called the \emph{right endpoint} of each of the four intervals above. This notation provides a convenient way to specify a range of elements from an ordering. Since these are defined as sets, all standard set operations (\eg, intersection and union) apply to them. For example, for $c \in \set{X}$ with $a < c < b$, $[a,b] \setdiff \{c\} = [a,c) \cup (c,b]$. Also note that usually this notation is used with $a < b$. Special intervals are presented at the conclusions of \longrefs{app:math_reals} and \shortref{app:math_ext_reals}. \paragraph{Dense Ordering:} Take totally ordered set $(\set{X},{\leq})$.
If it is the case that for every $x,y \in \set{X}$ with $x < y$, there exists a $z \in \set{X}$ such that $x < z < y$ then $(\set{X},{\leq})$ is said to be \emph{densely ordered} and $\leq$ is called a \emph{dense order}. \subsection{Upper and Lower Bounds} \label{app:math_upper_lower_bound} Take $\set{S}$ to be a partially ordered set equipped with partial order relation $\leq$. Take $\set{X} \subseteq \set{S}$. If there exists an $\alpha \in \set{S}$ such that for every $x \in \set{X}$, it is the case that $\alpha \leq x$ then $\alpha$ is called a \emph{lower bound} of set $\set{X}$ and $\set{X}$ is said to be \emph{bounded from below}. Similarly, if there exists a $\beta \in \set{S}$ such that for every $x \in \set{X}$, it is the case that $x \leq \beta$ then $\beta$ is called an \emph{upper bound} of set $\set{X}$ and $\set{X}$ is said to be \emph{bounded from above}. If $\set{X}$ is both bounded from above and bounded from below, $\set{X}$ is simply called a \emph{bounded} set. Again, take $\set{S}$ to be an ordered set equipped with order relation $\leq$, and take $\set{X} \subseteq \set{S}$. Assume that $\alpha \in \set{S}$ is a lower bound of $\set{X}$ and that for every $s \in \set{S}$, if $\alpha < s$ then $s$ is \emph{not} a lower bound of $\set{X}$. 
In that case, $\alpha$ is called the \emph{greatest lower bound} or the \emph{meet} or the \symdef[]{Forder.11}{infbigwedge}{$\bigwedge$}{meet of a set (\ie, greatest lower bound or infimum)}\symdef[\emph{infimum}]{Forder.201}{inf}{$\inf$}{infimum (\ie, greatest lower bound or meet)} of $\set{X}$ and is denoted by % \begin{equation*} \inf \set{X} \quad \text{ or } \quad \bigwedge \set{X} \end{equation*} % If $\inf \set{X} \in \set{X}$ then $\inf \set{X}$ is said to be the \symdef[\emph{minimum}]{Forder.202}{min}{$\min$}{minimum element} of $\set{X}$ and is denoted by % \begin{equation*} \min \set{X} \end{equation*} % Now assume that $\beta \in \set{S}$ is an upper bound of $\set{X}$ and that for every $s \in \set{S}$, if $s < \beta$ then $s$ is \emph{not} an upper bound of $\set{X}$. In that case, $\beta$ is called the \emph{least upper bound} or the \emph{join} or the \symdef[]{Forder.12}{infbigvee}{$\bigvee$}{join of a set (\ie, least upper bound or supremum)}\symdef[\emph{supremum}]{Forder.201}{sup}{$\sup$}{supremum (\ie, least upper bound or join)} of $\set{X}$ and is denoted by % \begin{equation*} \sup \set{X} \quad \text{ or } \quad \bigvee \set{X} \end{equation*} % If $\sup \set{X} \in \set{X}$ then $\sup \set{X}$ is said to be the \symdef[\emph{maximum}]{Forder.202}{max}{$\max$}{maximum element} of $\set{X}$ and is denoted with % \begin{equation*} \max \set{X} \end{equation*} % We will call the infimum and supremum of a set its \emph{extremum bounds}. \paragraph{Bounded Poset Bounds:} Take bounded poset $(\set{X},{\leq})$. Take $a \in \set{X}$ to be the bottom (\ie, least element) of $\set{X}$ and $b \in \set{X}$ to be the top (\ie, greatest element) of $\set{X}$. Note that % \begin{equation*} \sup \emptyset = a \end{equation*} % That is, since every element of $\set{X}$ can be called an upper bound of $\emptyset$, the least upper bound of the empty set is the least element of $\set{X}$.
Similarly,
%
\begin{equation*}
\inf \emptyset = b
\end{equation*}
%
That is, since every element of $\set{X}$ can be called a lower bound of $\emptyset$, the greatest lower bound of the empty set is the greatest element of $\set{X}$.

\paragraph{Gapless and Complete:} Take partially ordered set $(\set{X},{\leq})$. If it is the case that
%
\begin{enumerate}[(i)]
\item the supremum of every nonempty subset of $\set{X}$ that is bounded from above exists (\ie, is an element of $\set{X}$) \label{item:lub_property}
\item the infimum of every nonempty subset of $\set{X}$ that is bounded from below exists (\ie, is an element of $\set{X}$) \label{item:glb_property}
\end{enumerate}
%
then set $\set{X}$ is called \emph{gapless} or \emph{Dedekind complete}. In particular, property (\shortref{item:lub_property}) is called the \emph{least-upper-bound property} and property (\shortref{item:glb_property}) is called the \emph{greatest-lower-bound property}. If set $\set{X}$ is gapless and every nonempty subset of $\set{X}$ is bounded then $\set{X}$ is called \emph{complete (in the sense of order)} or a \emph{complete lattice}. When we use the term \emph{complete}, we use it in this sense; to be more specific, some use the term \emph{complete lattice}. While the term \emph{gapless} is not our invention, it is not conventional; it should not be used in mathematical discourse without a definition.

Take partially ordered set $(\set{X},{\leq})$. Assume that nonempty sets $\set{A}$ and $\set{B}$ form a \emph{partition} of set $\set{X}$ (\ie, every element of set $\set{X}$ is either an element of $\set{A}$ or $\set{B}$ but not an element of both). Assume that for any element $a \in \set{A}$ and any element $b \in \set{B}$, it is the case that $a < b$ (\ie, every element in set $\set{A}$ is less than every element in $\set{B}$).
The set $\set{X}$ is \emph{gapless} if and only if there exists a $c \in \set{X}$ such that for any $a \in \set{A}$ and any $b \in \set{B}$, it is the case that $a \leq c \leq b$. That is, this is an equivalent definition of \emph{gapless}. Also note that in this case, for such a $c$, $c = \sup \set{A} = \inf \set{B}$; that is, $c$ forms a sort of \emph{boundary} between $\set{A}$ and $\set{B}$.

\paragraph{Existence of Upper Bounded Set Maxima:} Take partially ordered set $(\set{X},{\leq})$. Assume that $(\set{X},{\leq})$ is gapless but \emph{not} densely ordered. Now take a nonempty subset $\set{A} \subseteq \set{X}$ such that $\set{A}$ is bounded from above. Since $(\set{X},{\leq})$ is gapless, $\sup \set{A}$ exists. Moreover, since $(\set{X},{\leq})$ is not densely ordered, it can be shown that $\max \set{A}$ exists and, of course, $\max \set{A} = \sup \set{A}$.

\paragraph{Existence of Lower Bounded Set Minima:} Take partially ordered set $(\set{X},{\leq})$. Assume that $(\set{X},{\leq})$ is gapless but \emph{not} densely ordered. Now take a nonempty subset $\set{A} \subseteq \set{X}$ such that $\set{A}$ is bounded from below. Since $(\set{X},{\leq})$ is gapless, $\inf \set{A}$ exists. Moreover, since $(\set{X},{\leq})$ is not densely ordered, it can be shown that $\min \set{A}$ exists and, of course, $\min \set{A} = \inf \set{A}$.

\subsection{Order-Preserving Functions and Order Isomorphic Sets}
\label{app:math_order_preserving}

Take $(\set{X},{\preceq})$ and $(\set{Y},{\trianglelefteq})$ to each be sets paired with their corresponding total order relation. The function $f: \set{X} \mapsto \set{Y}$ is called \emph{monotone} or \emph{order-preserving} if it is the case that for every $x,y \in \set{X}$, if $x \preceq y$ then $f(x) \trianglelefteq f(y)$. If a monotone function is also a bijection (\ie, $\set{X} \cong \set{Y}$) then the function is said to be an \emph{order isomorphism} and the sets $\set{X}$ and $\set{Y}$ are \emph{order isomorphic}.
Roughly, this means that every element from set $\set{X}$ can be replaced with a unique element from $\set{Y}$ and as long as the order relations are also exchanged the ordering will not change. \subsection{Filters on Partially Ordered Sets} \label{app:math_filters_on_posets} Take partially ordered set $(\set{S},{\leq})$. Take a \emph{nonempty} subset $\set{F}$ (\ie, $\set{F} \subseteq \set{S}$ with $\set{F} \neq \emptyset$). Now assume that % \begin{enumerate}[(i)] \item for all $x,y \in \set{F}$, there exists some $z \in \set{F}$ such that $z \leq x$ and $z \leq y$ \label{item:poset_filter_base} \item for all $x \in \set{F}$ and $y \in \set{S}$, if $x \leq y$ then $y \in \set{F}$ \label{item:poset_upper_set} % \item $\set{F} \neq \set{S}$ % \label{item:poset_proper} \end{enumerate} % In this case, $\set{F}$ is called a \emph{filter}. If it is the case that $\set{F} \neq \set{S}$ then $\set{F}$ may be called a \emph{proper filter}. If only property $(\shortref{item:poset_filter_base})$ is met then $\set{F}$ is called a \emph{filter base} (or a \emph{filter basis}), and a filter base $\set{F}$ with $\set{F} \neq \set{S}$ is called a \emph{proper filter base}. \subsection{Nets and Sequences} \label{app:math_nets_and_sequences} Take a set $\set{X}$ and a directed set $(\set{A},{\leq})$. The ordered indexed family $(x_\alpha)_{\alpha \in \set{A}}$ (\ie, a family with domain $\set{A}$ and codomain $\set{X}$) is called a \symdef[\emph{net}]{Dseq.3}{net}{$(x_\alpha)$}{a net (\ie, an ordered indexed family $(x_\alpha : \alpha \in \set{A})$ with directed index set $\set{A}$)}. Usually nets are listed without their index sets and the indices are given by Greek lowercase alphabetic letters. For example, % \begin{equation*} (x_\alpha) \triangleq (x_\alpha)_{\alpha \in \set{A}} \end{equation*} % is a net. \paragraph{Sequences:} Take set $\set{X}$, totally ordered set $(\N,{\leq})$, and the net $(x_n)$ from $\N$ to $\set{X}$. 
In this case, when a net's domain is $\N$, the net is called a \symdef[\emph{sequence}]{Dseq.3}{sequence}{$(x_n)$}{a sequence (\ie, an ordered indexed family $(x_n : n \in \N)$ with totally ordered index set $\N$)} and its indices are usually given with English lowercase alphabetic letters. For example,
%
\begin{equation*}
(x_n) \triangleq (x_n)_{n \in \N}
\end{equation*}
%
is a sequence (and, of course, also a net).

\paragraph{Monotonic Sequences:} Take a totally ordered set $(\set{X},{\leq})$ and a sequence $(x_n)$ such that $x_n \in \set{X}$ for all $n \in \N$. If for all $m,n \in \N$ with $m > n$,
%
\begin{itemize}
\item $x_m \geq x_n$ then the sequence is said to be \emph{monotonically increasing}
\item $x_m > x_n$ then the sequence is said to be \emph{strictly monotonically increasing}
\item $x_m \leq x_n$ then the sequence is said to be \emph{monotonically decreasing}
\item $x_m < x_n$ then the sequence is said to be \emph{strictly monotonically decreasing}
\end{itemize}
%
For example, the sequence $(1,2,3,4,\dots)$ is clearly strictly monotonically increasing.

\section{Elementary Abstract Algebra}
\label{app:math_abstract_algebra}

Now that we have shown how elements of sets can be compared, we can introduce concepts from \emph{algebra}, which allow elements of sets to interact. That is, we will show how elements can be operated on in order to produce other elements. Together with the constructs from \longref{app:math_order_theory}, this gives sets a notion of structure and shape. We will then show how the structures of two different sets can be related. Once a set is endowed with a sufficient order and structure, familiar \emph{arithmetic} can be defined for its elements; this is our motivation for all of this discussion. \Citet{Roman92} provides further information about the algebraic structures important to us and their application.
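As a computational aside (not part of the formal development), the monotonicity definitions of the previous section lend themselves to direct mechanical checks on finite prefixes of a sequence. The following Python sketch is an illustration only; it assumes elements comparable with Python's built-in order, and the helper names are ours. By transitivity of a total order, comparing consecutive terms suffices in place of comparing all pairs $m > n$.

```python
# Illustrative check of the monotonic-sequence definitions on a finite
# prefix (x_1, ..., x_k). Helper names are hypothetical, not notation
# from the text. Consecutive comparisons suffice by transitivity.

def is_increasing(prefix, strict=False):
    """Monotonically increasing: x_m >= x_n for all m > n (> when strict)."""
    return all(b > a if strict else b >= a for a, b in zip(prefix, prefix[1:]))

def is_decreasing(prefix, strict=False):
    """Monotonically decreasing: x_m <= x_n for all m > n (< when strict)."""
    return all(b < a if strict else b <= a for a, b in zip(prefix, prefix[1:]))

# The sequence (1, 2, 3, 4, ...) from the text is strictly monotonically
# increasing; (3, 3, 2) is monotonically, but not strictly, decreasing.
assert is_increasing([1, 2, 3, 4], strict=True)
assert is_decreasing([3, 3, 2]) and not is_decreasing([3, 3, 2], strict=True)
```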
\subsection{Operations}
\label{app:math_operations}

We focus our attention on \emph{binary operations}, which are also called \emph{dyadic operations}. Our definition of these \emph{operations} is weaker than many of the conventional definitions. For sets $\set{X}$, $\set{Y}$, and $\set{Z}$, a \emph{binary operation} is a function of the form $\set{X} \times \set{Y} \mapsto \set{Z}$ where at least two of the sets are usually the same.

Take sets $\set{X}$, $\set{Y}$, and $\set{Z}$. Also take binary operation ${\bin{Q}}: \set{X} \times \set{Y} \mapsto \set{Z}$. For some $(x,y) \in \set{X} \times \set{Y}$ and $z \in \set{Z}$, if it is the case that $\mathop{\bin{Q}}(x,y) = z$, the notation
%
\begin{equation*}
x \bin{Q} y = z
\end{equation*}
%
is used and $\bin{Q}$ is referred to as a \emph{binary operator} or simply an \emph{operator}. For example, assume that an operator ${+}: \W \times \W \mapsto \W$ has been defined; then for any $x,y \in \W$, there exists $z \in \W$ such that
%
\begin{equation*}
x + y = z
\end{equation*}
%
This is possibly a more familiar notation than the generic one that uses $\bin{Q}$ above.

\paragraph{Set Operations and the Power Set:} We have already informally defined the operations $\cap$, $\cup$, and ${}^c$ (where ${}^c$ is a \emph{unary operation}) for sets. Recall that whenever sets are defined, a universal set needs to be at least implicitly defined. For example, define a universal set $\set{U}$ to be a superset of all possible sets of interest. That is, for any sets $\set{X}$ and $\set{Y}$, $\set{X} \subseteq \set{U}$ and $\set{Y} \subseteq \set{U}$. Note that any subset of $\set{U}$ is an element of the power set $\Pow(\set{U})$. That is,
%
\begin{equation*}
\Pow(\set{U}) = \{ \set{X} : x \in \set{X} \text{ implies } x \in \set{U} \}
\end{equation*}
%
In fact, the power set can be viewed as a universal set for all subsets of $\set{U}$.
Therefore, any operation $\bin{Q}$ between two sets must take the form
%
\begin{equation*}
{\bin{Q}}: \Pow(\set{U}) \times \Pow(\set{U}) \mapsto \Pow(\set{U})
\end{equation*}
%
That is, for $\set{X} \subseteq \set{U}$ and $\set{Y} \subseteq \set{U}$ (\ie, $\set{X},\set{Y} \in \Pow(\set{U})$),
%
\begin{itemize}
\item $\set{X} \cap \set{Y} \triangleq \{ x \in \set{X} : x \in \set{Y} \}$
\item $\set{X} \cup \set{Y} \triangleq \{ x \in \set{U} : x \in \set{X} \text{ or } x \in \set{Y} \}$
\item $\set{X}^c \triangleq \{ x \in \set{U} : x \notin \set{X} \}$
\end{itemize}
%
Of course, these are the standard definitions for these three operations.

\paragraph{Magma Notation:} The shorthand notation $(\set{X}, {\bin{Q}})$ indicates that the set $\set{X}$ is \emph{equipped} with the operation ${\bin{Q}}: \set{X} \times \set{X} \mapsto \set{X}$. In fact, $(\set{X}, {\bin{Q}})$ is called a \emph{magma} or \emph{groupoid}. The only requirement on ${\bin{Q}}$ is that the set $\set{X}$ be closed under the operation, which is implied by the codomain of ${\bin{Q}}$ being $\set{X}$.

\paragraph{Multiple-Operator Notation:} In general, when a set $\set{X}$ is equipped with $n \in \N$ operations, the $(n+1)$-tuple with $\set{X}$ as its first coordinate and the $n$ operations as its other coordinates is typically used. In the case where a set also has an order defined, that order may also be listed as a coordinate; in fact, the order is usually listed as the last coordinate of the tuple.

\paragraph{Implicit Operators:} Also, familiar sets like $\W$ are typically assumed to be equipped with familiar operations like $+$. That is, it is rare to see these familiar sets and these familiar operations grouped together explicitly in the $n$-tuple notation; instead, it is assumed that the familiar operations (\eg, $+$) are provided with the standard definitions.
That being said, we will explicitly define these operations for each of the familiar sets and then assume that each such set carries those operations with it.

\paragraph{Order of Operations, Grouping Symbols, and Precedence:} In a long string of binary operations, usually the leftmost operation should be executed first and the result should be used as the left argument of the operation adjacent to it. This chain of execution should continue from left to right. However, the grouping symbols $($ and $)$ can be used to indicate that certain operations should be executed out of order. For example, for set $\set{X}$ and binary operations ${\bin{Q}}: \set{X} \times \set{X} \mapsto \set{X}$ and ${\bin{R}}: \set{X} \times \set{X} \mapsto \set{X}$ and elements $x,y,z \in \set{X}$, the statement
%
\begin{equation}
x \bin{Q} y \bin{R} z
\label{eq:bin_oper_QR}
\end{equation}
%
is equivalent to the statement
%
\begin{equation*}
(x \bin{Q} y) \bin{R} z
\end{equation*}
%
which both state that $x \bin{Q} y$ should be the left argument to $\bin{R}$; however,
%
\begin{equation}
x \bin{Q} (y \bin{R} z)
\label{eq:bin_oper_QpR}
\end{equation}
%
is a completely different statement as it indicates that $y \bin{R} z$ should be the right argument to $\bin{Q}$. Note, though, that when no parentheses are given, operators may be defined so that one operator takes \emph{precedence} over another operator. That is, in the above example, $\bin{R}$ could have been defined to take precedence over $\bin{Q}$; in that case, \longrefs{eq:bin_oper_QR} and \shortref{eq:bin_oper_QpR} would be equivalent.

\subsection{Groups, Monoids, and Semigroups}

Take set $\set{X}$ equipped with equivalence relation $=$ and binary operation ${\diamond}: \set{X} \times \set{X} \mapsto \set{X}$. If it is the case that
%
\begin{enumerate}[(i)]
\item For all $x, y, z \in \set{X}$, $(x \diamond y) \diamond z = x \diamond ( y \diamond z )$.
\label{item:group_associativity}
\item There exists an element $e \in \set{X}$ such that for all $x \in \set{X}$, $e \diamond x = x \diamond e = x$ where $e$ is known as the \emph{identity element}. \label{item:group_identity}
\item For each $x \in \set{X}$, there exists a $y \in \set{X}$ such that $x \diamond y = y \diamond x = e$, where $e$ is the identity element from (\shortref{item:group_identity}), and $y$ is known as the \emph{inverse} of $x$. \label{item:group_inverse}
\end{enumerate}
%
then the magma $(\set{X}, {\diamond})$ is called a \emph{group} with identity element $e$. The property in (\shortref{item:group_associativity}) is known as \emph{associativity} and the operator $\diamond$ is said to be \emph{associative}. Properties (\shortref{item:group_associativity}) and (\shortref{item:group_identity}) make $(\set{X}, {\diamond}, e)$ a \emph{monoid}. Property (\shortref{item:group_associativity}) makes $(\set{X}, {\diamond})$ a \emph{semigroup}. In summary, all groups are monoids and all monoids are semigroups.

\paragraph{Trivial Monoids and Semigroups:} Because groups and monoids require the existence of an identity element, they \emph{must be nonempty}. However, semigroups have no requirement of the existence of any elements. Thus, we have the following.
%
\begin{itemize}
\item A singleton set forms the trivial monoid and thus also the trivial group. To see this, take the trivial monoid $(\{x\},{\diamond})$. It must be that $x$ is the identity element, and so $x \diamond x = x$. However, this also implies that $x$ is its own inverse, and so $(\{x\},{\diamond})$ is the trivial group as well. Therefore, all trivial monoids are trivial groups.
\item The empty set $\emptyset$ is the trivial semigroup.
\end{itemize}

\paragraph{Monoid Triple Notation:} Take $(\set{X},{\diamond})$ to be a monoid. In this case, there is an identity element $e_\diamond$ for the operation $\diamond$.
In order to identify this identity element, it is often listed explicitly in the notation. That is, $(\set{X},{\diamond},e_\diamond)$ is an equivalent notation for the monoid. Of course, since all groups are monoids, this is also an equivalent notation for a group.

\paragraph{Commutative Semigroups, Monoids, and Groups:} Take $(\set{X}, {\diamond})$ to be a semigroup, monoid, or group. If it is the case that for any $x, y \in \set{X}$, $x \diamond y = y \diamond x$ then $(\set{X}, {\diamond})$ is said to be \emph{Abelian} or \emph{commutative}. For example, if $(\set{X}, {\diamond})$ is a group that has this property then $(\set{X}, {\diamond})$ is a \emph{commutative group}. This property is known as \emph{commutativity} and operators with this property are said to be \emph{commutative} as well.

\subsection{Rings}

Take set $\set{X}$ equipped with equivalence relation $=$ and binary operations ${+}: \set{X} \times \set{X} \mapsto \set{X}$ and ${\times}: \set{X} \times \set{X} \mapsto \set{X}$. Call \symdef[$+$]{Ageneral.541}{addition}{$x + y$}{sum of $x$ and $y$} the \emph{addition} operator and \symdef[$\times$]{Ageneral.542}{multiplication}{$x \times y$}{product of $x$ and $y$ (also denoted $xy$)} the \emph{multiplication} operator. If it is the case that
%
\begin{enumerate}[(i)]
\item the magma $(\set{X},{+})$ is a commutative group (with identity element $e_+$) \label{item:ring_addition}
\item the magma $(\set{X},{\times})$ is a monoid (with identity element $e_\times$) \label{item:ring_multiplication}
\item for each $x,y,z \in \set{X}$, $x \times (y + z) = (x \times y) + (x \times z)$ and $(x + y) \times z = (x \times z) + (y \times z)$ \label{item:ring_distributivity}
\end{enumerate}
%
then $(\set{X},{+},{\times})$ is called a \emph{ring} and is often shown with its identity elements as $(\set{X},{+},{\times},e_+,e_\times)$.
The identity element $e_+$ in (\shortref{item:ring_addition}) is called the \emph{additive identity} and is often denoted $0$, and the identity element $e_\times$ in (\shortref{item:ring_multiplication}) is called the \emph{multiplicative identity} and is often denoted $1$. Thus, it is common to see a ring specified with $(\set{X},{+},{\times},0,1)$. The inverses for ${+}$ are called \emph{additive inverses} and the inverses for ${\times}$ (which are not guaranteed to exist) are called \emph{multiplicative inverses}. The property in (\shortref{item:ring_distributivity}) is called \emph{distributivity}; that is, multiplication \emph{distributes} over addition. The result of the addition operator ${+}$ is called the \emph{sum} of its arguments, and the result of the multiplication operator ${\times}$ is called the \emph{product} of its arguments. \paragraph{Additive Inverses and Subtraction:} Since $(\set{X},{+})$ is a group, it has inverses. For an element $x \in \set{X}$, the additive inverse for $x$ is often denoted \symdef{Ageneral.543}{addinverse}{$-x$}{additive inverse of $x$}. Additionally, for elements $x,y \in \set{X}$, the notation \symdef{Ageneral.5431}{subtraction}{$x - y$}{difference of $x$ and $y$ (\ie, $x - y \triangleq x + -y$)} is often used to represent $x + -y$, where $-y$ is the additive inverse of $y$. In this case, the operator ${-}$ is called the \emph{subtraction} operator, and its result is called the \emph{difference} of its two arguments. \paragraph{Multiplicative Inverses, Ratios, and Division:} Since $(\set{X},{\times})$ is a monoid, some of the elements of $\set{X}$ may have inverses. For an element $x \in \set{X}$ that has an inverse, the multiplicative inverse for $x$ is often denoted $x^{-1}$. Additionally, for elements $x,y \in \set{X}$ where $y$ has a multiplicative inverse, the notation $x/y$ is often used to represent $x \times y^{-1}$, where $y^{-1}$ is the multiplicative inverse of $y$. 
In this case, the operator ${/}$ is called the \emph{division} operator, and its result is called the \emph{quotient} of its two arguments. Additionally, the notation $\frac{x}{y}$ is equivalent to the notation $x/y$. Both notations are often referred to as \emph{ratios} of element $x$ to element $y$. \paragraph{Juxtaposition and Related Notations:} Note that the operator $\times$ is often denoted by $\cdot$ or simply omitted completely. That is, for $x,y \in \set{X}$, $x \times y$ and $x \cdot y$ and $xy$ all indicate the same operation. The latter case (\eg, $xy$) is called \emph{juxtaposition} of $x$ and $y$. \paragraph{Order of Operations:} The multiplication operation takes precedence over the addition operation. That is, unless explicit grouping symbols (\eg, $($ and $)$) denote otherwise, all multiplication operations should be executed first. \paragraph{Multiplication by Additive Identity:} Take a ring $(\set{X},{+},{\times},0,1)$ and elements $x,y \in \set{X}$. Note that $x(0 + -y) = x0 + x(-y)$. However, since $0 + -y = -y$ then $x(0 + -y)=x(-y)$. Thus, $x(-y) = x0 + x(-y)$ and so $-(x(-y)) + x(-y) = x0$. However, $-(x(-y)) + x(-y) = 0$, and so it must be that $x0 = 0$. Similarly, it is easy to show that $0x = 0$ and so $x0 = 0x = 0$. This holds for any ring. \paragraph{Commutative Rings:} If ring $(\set{X},{+},{\times})$ is such that $(\set{X},{\times})$ is a commutative monoid rather than just a monoid then $(\set{X},{+},{\times})$ is called a \emph{commutative ring}. \paragraph{The Zero Ring:} Take singleton set $\{x\}$ with $\times$ defined so that $x \times x = x$ and $+$ defined so that $x + x = x$. Clearly, $(\{x\}, {+}, {\times}, x, x)$ is a commutative ring. In fact, this is often called the \emph{trivial ring} or the \emph{zero ring}. Clearly, in this singleton set the multiplicative identity and the additive identity are the same element. Since these are usually denoted with $1$ and $0$ respectively, this is the same as saying $1 = 0$. 
Take a ring $(\set{X},{+},{\times},0,1)$ where $1 = 0$. In that case, for any $x \in \set{X}$,
%
\begin{equation*}
x = x \times 1 = x \times 0 = 0
\end{equation*}
%
Thus, $x$ can only be $0$ and so $\set{X}$ must be a singleton set. Therefore, a ring is a trivial ring if and only if its multiplicative identity and its additive identity are the same element. This trivial property is sometimes denoted by $1 = 0$. Thus, if a ring is required such that $1 \neq 0$ (\ie, the multiplicative and additive identities are different), it is required that the ring is not the trivial zero ring.

\paragraph{Semirings:} The definition of a ring can be relaxed slightly to define a \emph{semiring}. In particular, $(\set{X},{+},{\times})$ is called a \emph{semiring} if it is the case that
%
\begin{enumerate}[(i)]
\item the magma $(\set{X}, {+})$ is a commutative \emph{monoid} \label{item:semiring_addition}
\item the magma $(\set{X}, {\times})$ is a monoid \label{item:semiring_multiplication}
\item for each $x,y,z \in \set{X}$, $x \times (y + z) = (x \times y) + (x \times z)$ and $(x + y) \times z = (x \times z) + (y \times z)$ \label{item:semiring_distributivity}
\item for each $x \in \set{X}$, $x \times e_+ = e_+ \times x = e_+$ where $e_+$ is the identity element from the monoid $(\set{X}, {+})$
\end{enumerate}
%
If semiring $(\set{X},{+},{\times})$ is such that $(\set{X},{\times})$ is a commutative monoid rather than just a monoid then $(\set{X},{+},{\times})$ is called a \emph{commutative semiring}. It can be shown that every ring is a semiring and every commutative ring is a commutative semiring; the converse does not hold, since a semiring need not have additive inverses.

\subsection{Fields}

Take a commutative ring $(\set{X},{+},{\times},e_+,e_\times)$.
If it is the case that
%
\begin{enumerate}[(i)]
\item the additive identity $e_+$ (\eg, $0$) and the multiplicative identity $e_\times$ (\eg, $1$) are distinct (\ie, $e_+ \neq e_\times$) \label{item:field_not_trivial}
\item for all $x \in \set{X}$, if $x$ is not the additive identity (\ie, $x \neq e_+$) then the multiplicative inverse (\eg, $x^{-1}$) exists (\ie, $x x^{-1} = x^{-1} x = e_\times$) \label{item:field_division}
\end{enumerate}
%
then $(\set{X},{+},{\times},e_+,e_\times)$ is called a \emph{field}. The property in (\shortref{item:field_not_trivial}) simply excludes the trivial zero ring. The property in (\shortref{item:field_division}) allows for the operation of \emph{division}.

\subsection{Subgroups, Subrings, and Subfields}

Take a set $\set{X}$ and subset $\set{Y} \subset \set{X}$. If there is an algebraic structure (\eg, a group, a ring, or a field) for set $\set{X}$ that maintains its structure when $\set{Y}$ is substituted for $\set{X}$ and the operations are restricted to set $\set{Y}$ then the structure with set $\set{Y}$ is known as a \emph{sub}structure.

\paragraph{Examples:} For example, take $\set{X}$ and $\set{Y} \subset \set{X}$ and assume that $(\set{X},{\star})$ is a group and $(\set{X},{\star},{\divideontimes})$ is a field.
%
\begin{itemize}
\item If $(\set{Y},{\star}|_\set{Y})$ is also a group then $\set{Y}$ is called a \emph{subgroup} of $\set{X}$ under the operation $\star$.
\item If $(\set{Y},{\star}|_\set{Y},{\divideontimes}|_\set{Y})$ is also a field then $\set{Y}$ is called a \emph{subfield} of $\set{X}$ under the operations $\star$ and $\divideontimes$.
\end{itemize}
%
Recall that the operations $\star$ and $\divideontimes$ are functions that take the form $\set{X} \times \set{X} \mapsto \set{X}$, and thus the ${}|_\set{Y}$ notation restricts them to the subset $\set{Y}$; that is, the restrictions take the form $\set{Y} \times \set{Y} \mapsto \set{Y}$.
It is important that both the domain and codomain have been restricted to $\set{Y}$. If it is not possible to restrict both the operator function's domain and codomain then the subset cannot be considered a substructure. This is referred to as \emph{closure}. That is, the subset must be \emph{closed} under the operation in order to qualify as a substructure.

\paragraph{Other Relevant Substructures:} There are many other substructure examples. Later, in \longref{app:math_algebra_over_a_field}, we will define a type of \emph{algebra}, and so there may be \emph{subalgebras}. Similarly, there can be \emph{submonoids} and \emph{subrings}. If the main structure is commutative, the type of the substructure may be preceded with \emph{commutative} as well in order to indicate that it is also commutative. That is, a commutative group may have a \emph{commutative subgroup}.

\subsection{Homomorphisms and Homomorphic Structures}
\label{app:math_homomorphisms}

Take two sets $\set{X}$ and $\set{Y}$ and a function $f: \set{X} \mapsto \set{Y}$. The function $f$ is called a \emph{homomorphism} if algebraic structures are preserved through the function. For example, consider \emph{group homomorphisms} and \emph{ring homomorphisms}.
%
\begin{itemize}
\item Assume that $(\set{X},{\star},e_\star)$ and $(\set{Y},{\diamond},e_\diamond)$ are two groups. If the function $f$ is such that for $x,y \in \set{X}$,
%
\begin{equation*}
f(x \star y) = f(x) \diamond f(y)
\end{equation*}
%
then $f$ is called a \emph{group homomorphism}. That is, $f$ uses the group structure present in $(\set{Y},{\diamond})$ in order to transplant the existing group structure in $(\set{X},{\star})$.
It can be shown that
%
\begin{itemize}
\item $f(e_\star)=e_\diamond$
\item for element $x \in \set{X}$, $f(x^\star)=f(x)^\diamond$ where ${}^\star$ indicates an inverse in group $(\set{X},{\star})$ and ${}^\diamond$ indicates an inverse in group $(\set{Y},{\diamond})$
\end{itemize}
\item Assume that $(\set{X},{\oplus},{\otimes},e_\oplus,e_\otimes)$ and $(\set{Y},{\boxplus},{\boxtimes},e_\boxplus,e_\boxtimes)$ are two rings. If the function $f$ is such that for $x,y \in \set{X}$,
%
\begin{itemize}
\item $f(x \oplus y) = f(x) \boxplus f(y)$
\item $f(x \otimes y) = f(x) \boxtimes f(y)$
\item $f(e_\otimes) = e_\boxtimes$
\end{itemize}
%
then $f$ is called a \emph{ring homomorphism}. That is, $f$ uses the ring structure present in $(\set{Y},{\boxplus},{\boxtimes},e_\boxplus,e_\boxtimes)$ in order to transplant the existing ring structure in $(\set{X},{\oplus},{\otimes},e_\oplus,e_\otimes)$. It can be shown that
%
\begin{itemize}
\item $f(e_\oplus)=e_\boxplus$
\item for element $x \in \set{X}$, $f(x^\oplus)=f(x)^\boxplus$ where ${}^\oplus$ indicates an inverse in group $(\set{X},{\oplus})$ and ${}^\boxplus$ indicates an inverse in group $(\set{Y},{\boxplus})$
\item if $x \in \set{X}$ has an inverse $x^\otimes$ in monoid $(\set{X},{\otimes})$ then $f(x)$ has an inverse $f(x)^\boxtimes$ in monoid $(\set{Y},{\boxtimes})$ and $f(x^\otimes)=f(x)^\boxtimes$
\end{itemize}
\end{itemize}
%
Additionally,
%
\begin{itemize}
\item a \emph{semigroup homomorphism} is defined the same way as a group homomorphism, except that it relates two semigroups and thus has no consequences involving identity or inverses
\item a \emph{monoid homomorphism} is defined the same way as a group homomorphism, except that it relates two monoids and thus has no consequences involving inverses
\item a \emph{semiring homomorphism} is defined the same way as a ring homomorphism, except that it relates two semirings
\item a \emph{field homomorphism} is defined the same way as a ring homomorphism,
except that it relates two fields
\end{itemize}
%
Two structures for which there exists a homomorphism between them are said to be \emph{homomorphic}, which roughly means that they have the same shape.

\paragraph{Isomorphisms and Isomorphic Structures:} Any homomorphism that is also bijective is called an \emph{isomorphism}. Additionally, two algebraic structures for which there exists an isomorphism between them are said to be \emph{isomorphic}. In other words, isomorphic algebraic structures are ones that consist of congruent sets that are homomorphic in their algebraic structures.
%
\begin{itemize}
\item The fact that the two sets are congruent implies that every element of either set can be replaced with a unique element from the other set.
\item The fact that the two algebraic structures are homomorphic means that any operation on elements of either set can be replaced with operations on elements of the other set.
\end{itemize}
%
Therefore, two isomorphic algebraic structures are very similar. If the isomorphism is also an order isomorphism (\ie, it preserves the ordering) then one structure can often be used as an equally valid \emph{representation} of the other.

\subsection{Ordered Rings, Absolute Value, and Ordered Fields}
\label{app:math_ordered_rings}

Take commutative ring $(\set{X},{+},{\times},0,1)$. Also assume that set $\set{X}$ is \emph{totally} ordered with \emph{total} order relation $\leq$. It is common to denote this by $(\set{X},{+},{\times},{\leq})$ or even $(\set{X},{+},{\times},0,1,{\leq})$, which groups all operations, identities, and relations of interest. For any $x,y \in \set{X}$, use the notation $x < y$ to denote the relationship that $x \leq y$ and $x \neq y$.
If it is the case that for any $x,y,z \in \set{X}$,
%
\begin{enumerate}[(i)]
\item if $x \leq y$ then $z + x \leq z + y$ \label{item:ordered_ring_add}
\item if $0 \leq x$ and $0 \leq y$ then $0 \leq x \times y$ \label{item:ordered_ring_mult}
\end{enumerate}
%
then $(\set{X},{+},{\times},0,1,{\leq})$ is called an \emph{ordered ring}. Additionally, for all $x \in \set{X}$ with $x \neq 0$,
%
\begin{itemize}
\item if $x < 0$, $x$ is called \emph{negative}
\item if $0 < x$, $x$ is called \emph{positive}
\end{itemize}
%
Thus, the \emph{sign function} of element $x \in \set{X}$ is denoted \symdef{Ageneral.5432}{sgnfn}{$\sgn(x)$}{sign function of $x$} and defined by
%
\begin{equation*}
\sgn(x) \triangleq
\begin{cases}
{-1} &\text{if } x < 0\\
0 &\text{if } x = 0\\
1 &\text{if } x > 0
\end{cases}
\end{equation*}
%
where $-1$ is the additive inverse of the multiplicative identity $1$. As \citet{Rudin76} shows, it is simple to prove that every ordered ring is such that for all $x,y,z \in \set{X}$,
%
\begin{itemize}
\item if $0 < x$ then $-x < 0$ and vice versa, where $-x$ is the additive inverse of $x$
\item if $0 < x$ and $y < z$ then $x \times y < x \times z$
\item if $x < 0$ and $y < z$ then $x \times z < x \times y$
\item if $x \neq 0$ then $0 < x \times x$
\item $0 < 1$
\end{itemize}
%
Additionally, for any element $x \in \set{X}$, the \emph{absolute value} of $x$ is denoted \symdef{Ageneral.5433}{absvalue}{$\pipe x \pipe$}{absolute value of $x$ (\ie, $x = \sgn(x) \pipe x \pipe$)} and defined by
%
\begin{equation*}
|x| \triangleq
\begin{cases}
x &\text{if } x \geq 0\\
-x &\text{otherwise}
\end{cases}
\end{equation*}
%
Of course, for all $x \in \set{X}$, $x = \sgn(x) |x|$. Note that the absolute value can also be defined for \emph{complex numbers} (which we do not discuss here) even though they do not form an ordered ring. Clearly, every subring of an ordered ring is also an ordered ring.
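As a concrete sketch of the definitions above (an aside assuming the familiar ordered ring of integers; the function names are illustrative, not standard notation), the sign function, the absolute value, and the identity $x = \sgn(x)\,|x|$ can be exercised as follows:

```python
# Sign function and absolute value on the ordered ring of integers,
# following the definitions in the text (names are illustrative only).

def sgn(x):
    """Return -1, 0, or 1; here -1 stands for the additive inverse of 1."""
    if x < 0:
        return -1
    if x == 0:
        return 0
    return 1

def abs_val(x):
    """Absolute value: x itself when 0 <= x, otherwise the additive inverse -x."""
    return x if x >= 0 else -x

# The identity x = sgn(x) |x| holds for every element.
assert all(x == sgn(x) * abs_val(x) for x in range(-10, 11))
# One of the listed ordered-ring consequences: 0 < x * x whenever x != 0.
assert all(0 < x * x for x in range(-10, 11) if x != 0)
```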
\paragraph{Ordered Fields:} If an ordered ring is also a field, it is called an \emph{ordered field}. Take ordered field $(\set{X},{+},{\times},0,1,{\leq})$. For $x,y \in \set{X}$, use the relationship $x < y$ to indicate that $x \leq y$ and $x \neq y$. It is the case that for any $x,y \in \set{X}$, % \begin{itemize} \item if $0 < x < y$ then $0 < y^{-1} < x^{-1}$ \end{itemize} % where $x^{-1}$ and $y^{-1}$ are the multiplicative inverses of $x$ and $y$ respectively. Additionally, every subfield of an ordered field is also an ordered field. Intuitively, ordered fields have all of the characteristics necessary for familiar \emph{arithmetic}. \subsection{Summations and Products of Indexed Families} \label{app:math_sumprod_ind_fam} Recall the notion of an indexed family from \longref{app:math_indexed_families}. Also recall how indexed families were used with nets and sequences in \longref{app:math_nets_and_sequences}. When an indexed family is made up of elements for which addition and multiplication are defined, it may be useful to take sums and products of every element in that family. Here we present some common notations for these operations. \paragraph{Finite Summations over Commutative Magmas:} Take a nonempty commutative magma $(\set{X},{+})$ and \emph{finite} nonempty set $\set{I}$. Now take the nonempty indexed family $(a_i)_{i \in \set{I}}$ where $a_i \in \set{X}$ for all $i \in \set{I}$. Call the operator $+$ the addition operator. The \symdef[]{Ageneral.z}{summation}{$\sum$}{sum of elements of a set}notation % \begin{equation*} \sum\limits_{i \in \set{I}} a_i \end{equation*} % results in the sum of every instance of every value of the family. Since $(\set{X},{+})$ is commutative and set $\set{I}$ is finite, the order in which the sum is performed has no impact on the value of the sum. For example, take $\set{I} = \{2,1,3\}$.
Then % \begin{equation*} \sum\limits_{i \in \set{I}} a_i = \sum\limits_{i \in \{1,2,3\}} a_i = a_3 + a_2 + a_1 = a_2 + a_3 + a_1 \end{equation*} \paragraph{Ordered Summations over General Magmas:} Take a magma $(\set{X},{+})$ and \emph{totally} ordered set $(\set{I},{\leq})$ that is either \emph{finite} or \emph{countably infinite}. Now take the indexed family $(a_i)_{i \in \set{I}}$ where $a_i \in \set{X}$ for all $i \in \set{I}$. Call the operator $+$ the addition operator. If $m,n \in \set{I}$ with $m \leq n$ then the notation % \begin{equation*} \sum\limits_{i=m}^n a_i \end{equation*} % is the sum of all elements $a_i$ with $i \in \{ j \in \set{I} : m \leq j \leq n \}$ where the order of operation matches the ordering of index elements. For example, take $\set{I} = \N$ with the standard natural number order relation $\leq$. Then % \begin{equation*} \sum\limits_{i=4}^8 a_i = a_4 + a_5 + a_6 + a_7 + a_8 \end{equation*} % where the elements are listed in this order since $4 \leq 5 \leq 6 \leq 7 \leq 8$. \paragraph{Empty Summations over Magmas with Identity:} Take a magma $(\set{X},{+})$ such that there exists an element $0 \in \set{X}$ such that for all $x \in \set{X}$, $0 + x = x + 0 = x$ (\ie, $0$ is the \emph{identity element} for the magma operation $+$). Also take a set $\set{I}$. Now take the indexed family $(a_i)_{i \in \set{I}}$ where $a_i \in \set{X}$ for all $i \in \set{I}$. Call the operator $+$ the addition operator. In this case, the summation $\sum_{i \in \emptyset} a_i$ is defined by % \begin{equation*} \sum\limits_{i \in \emptyset} a_i \triangleq 0 \end{equation*} % That is, the \emph{empty sum} is the identity element for the magma. \paragraph{Finite Products over Commutative Magmas:} Take a nonempty commutative magma $(\set{X},{\times})$ and \emph{finite} nonempty set $\set{I}$. Now take the nonempty indexed family $(a_i)_{i \in \set{I}}$ where $a_i \in \set{X}$ for all $i \in \set{I}$. Call the operator $\times$ the multiplication operator.
The \symdef[]{Ageneral.z}{product}{$\prod$}{product of elements of a set}notation % \begin{equation*} \prod\limits_{i \in \set{I}} a_i \end{equation*} % results in the product of every instance of every value of the family. Since $(\set{X},{\times})$ is commutative and set $\set{I}$ is finite, the order in which the product is performed has no impact on the value of the product. For example, take $\set{I} = \{2,1,3\}$. Then % \begin{equation*} \prod\limits_{i \in \set{I}} a_i = \prod\limits_{i \in \{1,2,3\}} a_i = a_3 \times a_2 \times a_1 = a_2 \times a_3 \times a_1 \end{equation*} \paragraph{Ordered Products over General Magmas:} Take a magma $(\set{X},{\times})$ and \emph{totally} ordered set $(\set{I},{\leq})$ that is either \emph{finite} or \emph{countably infinite}. Now take the indexed family $(a_i)_{i \in \set{I}}$ where $a_i \in \set{X}$ for all $i \in \set{I}$. Call the operator $\times$ the multiplication operator. If $m,n \in \set{I}$ with $m \leq n$ then the notation % \begin{equation*} \prod\limits_{i=m}^n a_i \end{equation*} % is the product of all elements $a_i$ with $i \in \{ j \in \set{I} : m \leq j \leq n \}$ where the order of operation matches the ordering of index elements. For example, take $\set{I} = \N$ with the standard natural number order relation $\leq$. Then % \begin{equation*} \prod\limits_{i=4}^8 a_i = a_4 \times a_5 \times a_6 \times a_7 \times a_8 \end{equation*} % where the elements are listed in this order since $4 \leq 5 \leq 6 \leq 7 \leq 8$. \paragraph{Empty Products over Magmas with Identity:} Take a magma $(\set{X},{\times})$ such that there exists an element $1 \in \set{X}$ such that for all $x \in \set{X}$, $1 \times x = x \times 1 = x$ (\ie, $1$ is the \emph{identity element} for the magma operation $\times$). Also take a set $\set{I}$. Now take the indexed family $(a_i)_{i \in \set{I}}$ where $a_i \in \set{X}$ for all $i \in \set{I}$. Call the operator $\times$ the multiplication operator.
In this case, the product $\prod_{i \in \emptyset} a_i$ is defined by % \begin{equation*} \prod\limits_{i \in \emptyset} a_i \triangleq 1 \end{equation*} % That is, the \emph{empty product} is the identity element for the magma. \section{Linear Algebra: Vector Spaces and Algebras} \label{app:math_linear_algebra} When many variables are related in a problem, complicated mathematical structures can be used to represent those relationships. However, these relationships can often be shown to have a certain kind of structure. The area of \emph{linear algebra} studies one of those kinds of structure. \subsection{Vector Spaces} \label{app:math_vector_space} Let $(\set{F},{+},{\times})$ be a field with set elements called \emph{scalars}. Let $(\set{V},{\oplus})$ be a commutative group with set elements called \emph{vectors}. Take scalars $a,b \in \set{F}$ and vectors $\v{x},\v{y} \in \set{V}$. Define a \emph{scalar (vector) multiplication} operator $\mathop{\otimes}: \set{F} \times \set{V} \mapsto \set{V}$; however, use the juxtaposition notation so that $a \v{x}$ is an equivalent expression for $a \otimes \v{x}$. If it is the case that % \begin{enumerate}[(i)] \item $a (\v{x} \oplus \v{y}) = a \v{x} \oplus a \v{y}$ \item $(a + b) \v{x} = a \v{x} \oplus b \v{x}$ \item $a (b \v{x}) = (a b) \v{x}$ \item $1 \v{x} = \v{x}$ where $1$ is the multiplicative identity for $(\set{F},{\times})$ \end{enumerate} % then $\set{V}$ is called a \emph{vector space} over the field $\set{F}$. The field $\set{F}$ is called the \emph{base field} of vector space $\set{V}$. Additionally, set $\set{V}$ may be called a \emph{linear space} instead of a vector space. \paragraph{Operator Notation:} Usually the same symbol will be used for all forms of addition and all forms of multiplication. That is, symbol $+$ may be used to represent both scalar addition (\eg, ${+}$ above) and vector addition (\eg, ${\oplus}$). 
Similarly, symbol $\times$ may be used to represent both scalar field multiplication (\eg, ${\times}$) and scalar vector multiplication (\eg, ${\otimes}$). Furthermore, multiplication in both cases can be represented by \emph{juxtaposition} of arguments. The actual operator that should be used in these cases should be clear from the type of the argument. That is, it is clear that $ab$ denotes scalar field multiplication and $a\v{x}$ denotes scalar vector multiplication. Juxtaposition is usually the preferred form of multiplication because the two other common multiplication symbols, $\times$ and $\cdot$, are often used to represent other common special types of vector multiplication that we have not defined in this document. If it is said that $\set{V}$ is a vector space over the field $\set{F}$, it is implied that $+$ should be used for addition and juxtaposition (or $\times$ or $\cdot$) should be used for multiplication. \paragraph{Vector Subspaces:} Take commutative group $(\set{V},{+})$ and field $(\set{F},{+},{\times})$. Assume that $\set{V}$ is a vector space over the field $\set{F}$ with scalar multiplication operator $\times$. Juxtaposition (\ie, placing two elements next to each other without an operator) will be used as a shorthand for multiplication, where the definition of multiplication depends on the context. Additionally, take $\set{W}$ to be a commutative subgroup of $\set{V}$. If it is the case that for any $a \in \set{F}$ and any $\v{x},\v{y} \in \set{W}$, % \begin{enumerate}[(i)] \item $a \v{x} \in \set{W}$ \item $\v{x} + \v{y} \in \set{W}$ \label{item:vector_subspace_addition} \item $0 \in \set{W}$, where $0$ is the additive identity for group $(\set{V},{+})$ \label{item:vector_subspace_identity} \end{enumerate} % then $\set{W}$ is called a \emph{vector subspace} of $\set{V}$. 
Note that (\shortref{item:vector_subspace_addition}) and (\shortref{item:vector_subspace_identity}) are redundant since $(\set{W},{+})$ is a subgroup; we list them here for emphasis only. In other words, a commutative subgroup of a vector space only needs to be \emph{closed} under the vector space's scalar multiplication in order to be called a vector subspace. Sometimes the term \emph{linear subspace} or simply \emph{subspace} is used instead of vector subspace. \paragraph{Interpretation:} Roughly, a vector can be thought of as any element that has some form of magnitude and direction (\eg, length and angle). Different vectors that have the same magnitude may point in different directions. Scalars then \emph{scale} the length of a vector. If a vector is multiplied by a \emph{negative} scalar, the length of the vector is not only scaled but its direction is reversed. Concrete examples of vector spaces will be given in \longref{app:math_linear_algebra}. \paragraph{Fields as Vector Spaces:} Note that any field is trivially a vector space with itself as a base field. That is, any field $(\set{F},{+},{\times})$ is a vector space over itself equipped with scalar vector multiplication operator ${\times}$. \paragraph{Vector Spaces over Commutative Rings:} A vector space over a commutative ring can be defined exactly as above. In fact, everything above holds with vector spaces over commutative rings; this is the case because none of the requirements above involve multiplicative inverses of scalars (\ie, scalar division). \paragraph{Commutative Rings as Vector Spaces:} Note that any commutative ring is trivially a vector space with itself as a base commutative ring. That is, any commutative ring $(\set{R},{+},{\times})$ is a vector space over itself equipped with scalar vector multiplication operator ${\times}$. 
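As a small illustrative sketch of the vector space axioms (not part of the formal development), take vectors to be pairs of rationals and the base field to be the rationals, using exact arithmetic; the helper names \texttt{vadd} and \texttt{smul} are hypothetical, chosen here only to mirror vector addition and scalar multiplication.

```python
from fractions import Fraction as F

# A minimal sketch, assuming vectors are pairs of rationals and the
# base field is the rationals (exact arithmetic via Fraction).

def vadd(x, y):
    """Componentwise vector addition."""
    return (x[0] + y[0], x[1] + y[1])

def smul(a, x):
    """Scalar (vector) multiplication: scale each component by a."""
    return (a * x[0], a * x[1])

a, b = F(2, 3), F(-5, 7)
x, y = (F(1), F(4)), (F(-2), F(3, 2))

# Spot-check the four vector-space axioms from the text.
assert smul(a, vadd(x, y)) == vadd(smul(a, x), smul(a, y))  # a(x + y) = ax + ay
assert smul(a + b, x) == vadd(smul(a, x), smul(b, x))       # (a + b)x = ax + bx
assert smul(a, smul(b, x)) == smul(a * b, x)                # a(bx) = (ab)x
assert smul(F(1), x) == x                                   # 1x = x
```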
\subsection{Linear and Bilinear Functions} \label{app:math_linear_operator} Take $\set{X}$ and $\set{Y}$ to be two vector spaces over the same base field $(\set{F},{+},{\times})$. A function $f: \set{X} \mapsto \set{Y}$ is called \emph{linear} if for any vectors $x,y \in \set{X}$ and any scalar $a \in \set{F}$, it is the case that % \begin{enumerate}[(i)] \item $f(x+y) = f(x) + f(y)$ \item $f(ax) = af(x)$ \end{enumerate} % where juxtaposition is used to indicate scalar (vector) multiplication. It is equivalent to say that the function $f: \set{X} \mapsto \set{Y}$ is linear if and only if for any vectors $x,y \in \set{X}$ and scalars $a,b \in \set{F}$, $f(ax+by)=af(x)+bf(y)$. \paragraph{Bilinear Functions:} Take $\set{X}$, $\set{Y}$, and $\set{Z}$ to be three vector spaces over the same field $(\set{F},{+},{\times})$ so that juxtaposition denotes scalar (vector) multiplication. A function $f: \set{X} \times \set{Y} \mapsto \set{Z}$ is called \emph{bilinear} if for any vectors $\v{x}_1,\v{x}_2 \in \set{X}$ and $\v{y}_1,\v{y}_2 \in \set{Y}$ and scalars $a,b \in \set{F}$, it is the case that % \begin{enumerate}[(i)] \item $f(a \v{x}_1 + b \v{x}_2,\v{y}_1) = a f(\v{x}_1,\v{y}_1) + b f(\v{x}_2,\v{y}_1)$ \label{item:bilinear_first_argument} \item $f(\v{x}_1,a \v{y}_1 + b \v{y}_2) = a f(\v{x}_1,\v{y}_1) + b f(\v{x}_1,\v{y}_2)$ \label{item:bilinear_second_argument} \end{enumerate} % Take any $\v{y}_0 \in \set{Y}$. Define a function $g: \set{X} \mapsto \set{Z}$ with $g(\v{x}) \triangleq f(\v{x},\v{y}_0)$. Since $f$ is bilinear then by property (\shortref{item:bilinear_first_argument}), the new function $g$ is linear. This is why property (\shortref{item:bilinear_first_argument}) is called being \emph{linear in the first argument}. Similarly, property (\shortref{item:bilinear_second_argument}) is called being \emph{linear in the second argument}.
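As an illustrative sketch (with arbitrary rational values chosen only for the check), ordinary field multiplication, viewed as a function of two arguments, is bilinear; the two assertions below mirror linearity in the first and second argument.

```python
from fractions import Fraction as F

# A minimal sketch: the product of two rationals, viewed as a function
# of two arguments, is a bilinear function.

def f(x, y):
    """A bilinear function: ordinary field multiplication."""
    return x * y

a, b = F(3, 4), F(-1, 5)
x1, x2 = F(2), F(-7, 3)
y1, y2 = F(1, 6), F(5)

# linear in the first argument
assert f(a * x1 + b * x2, y1) == a * f(x1, y1) + b * f(x2, y1)
# linear in the second argument
assert f(x1, a * y1 + b * y2) == a * f(x1, y1) + b * f(x1, y2)
```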
Note that by these two properties, it is always the case that for any vectors $\v{x} \in \set{X}$ and $\v{y} \in \set{Y}$ and scalars $a,b \in \set{F}$, % \begin{equation} \begin{split} f( a \v{x}, b \v{y} ) &= b f( a \v{x}, \v{y} ) = a b f( \v{x}, \v{y} ) = b f( \v{x}, a \v{y} ) = f( b \v{x}, a \v{y} )\\ &= a f( \v{x}, b \v{y} ) = b a f( \v{x}, \v{y} ) = a f( b \v{x}, \v{y} )\\ &= f( a b \v{x}, \v{y} ) = f( \v{x}, a b \v{y} )\\ &= f( b a \v{x}, \v{y} ) = f( \v{x}, b a \v{y} ) \end{split} \label{eq:bilinear_faux_associative} \end{equation} % Also note that since $\set{F}$ is a field (and thus a commutative ring), $a b = b a$, and so some of the equalities in \longref{eq:bilinear_faux_associative} are redundant. \paragraph{Bilinear Operators:} Take $\set{X}$ to be a vector space over the field $(\set{F},{+},{\times})$ and denote scalar vector multiplication operator with $\otimes$. Recall that the scalar vector multiplication is a function of two arguments, namely ${\otimes}: \set{F} \times \set{X} \mapsto \set{X}$. Thus, for $a \in \set{F}$ and $\v{x} \in \set{X}$, the notation % \begin{equation*} a \otimes \v{x} \triangleq \mathop{\otimes}(a,\v{x}) \end{equation*} % where $a \v{x}$ (\ie, juxtaposition) will be an alternate way of indicating $a \otimes \v{x}$. Recall that any field is trivially a vector space over itself. Thus, $\set{F}$ and $\set{X}$ are two vector spaces defined over the same field. Additionally, by the definition of a vector space, the scalar vector multiplication $\otimes$ is a bilinear function. 
Using those properties, it is simple to verify that for $a,b,c,d \in \set{F}$ and $\v{x}_1,\v{x}_2 \in \set{X}$, % \begin{itemize} \item $\mathop{\otimes}( ac + bd, \v{x}_1 ) = a \mathop{\otimes}( c, \v{x}_1 ) + b \mathop{\otimes}( d, \v{x}_1 )$ \item $\mathop{\otimes}( c, a\v{x}_1 + b\v{x}_2 ) = a \mathop{\otimes}( c, \v{x}_1 ) + b \mathop{\otimes}( c, \v{x}_2 )$ \end{itemize} % which is equivalent to the statement that % \begin{itemize} \item $(ac + bd) \otimes \v{x}_1 = a ( c \otimes \v{x}_1 ) + b ( d \otimes \v{x}_1 )$ \item $c \otimes ( a\v{x}_1 + b\v{x}_2 ) = a ( c \otimes \v{x}_1 ) + b ( c \otimes \v{x}_2 )$ \end{itemize} % which is equivalent to the statement that % \begin{itemize} \item $(ac + bd) \v{x}_1 = ac\v{x}_1 + bd\v{x}_1$ \item $c ( a\v{x}_1 + b\v{x}_2 ) = ac\v{x}_1 + bc\v{x}_2$ \end{itemize} % and so operator $\otimes$ is called a \emph{bilinear operator} because it is an operator that is linear in both its first and second arguments. \subsection{Algebra over a Field} \label{app:math_algebra_over_a_field} Take magma $(\set{A},{\times})$ so that $\set{A}$ is a vector space over the field $(\set{F},{+},{\times})$. Denote vector addition with $+$ and scalar vector multiplication with $\times$. Thus, the symbol $\times$ can be used to denote three different multiplication operators, namely % \begin{enumerate}[(i)] \item vector (vector) multiplication from $(\set{A},{\times})$ (\ie, ${\times}: \set{A} \times \set{A} \mapsto \set{A}$) \label{item:vector_vector_mult} \item scalar (vector) multiplication from the vector space (\ie, ${\times}: \set{F} \times \set{A} \mapsto \set{A}$) \label{item:scalar_vector_mult} \item multiplication from $(\set{F},{\times})$ (\ie, ${\times}: \set{F} \times \set{F} \mapsto \set{F}$) \label{item:scalar_mult} \end{enumerate} % where the new multiplication in (\shortref{item:vector_vector_mult}) provides a method for finding the product of two vectors. 
Note that this vector multiplication is a binary operator where the vector spaces making up its two arguments are both defined over the same field. Thus, it is possible that this operator is bilinear. If the vector multiplication is a bilinear operator then $\set{A}$ is called an \emph{algebra} over the field $\set{F}$ or an $\set{F}$-algebra, where $\set{F}$ is also called the \emph{base field} of algebra $\set{A}$. For example, take $\set{A}$ to be an algebra over the field $\set{F}$ where $+$ denotes both scalar and vector addition and $\times$ or juxtaposition denotes all three forms of multiplication. By the bilinear property of vector multiplication, it is the case that for every $\v{x},\v{y},\v{z} \in \set{A}$ and scalars $a,b \in \set{F}$, % \begin{itemize} \item $(\v{x}+\v{y})\v{z} = \v{x}\v{z}+\v{y}\v{z}$ \item $(a \v{x})(\v{y}) = (a)(\v{x}\v{y})$ \item $\v{x}(\v{y}+\v{z}) = \v{x}\v{y}+\v{x}\v{z}$ \item $(\v{x})(b \v{y}) = (b)(\v{x}\v{y})$ \end{itemize} % which can be summarized by % \begin{itemize} \item $(a\v{x}+b\v{y})\v{z} = a\v{x}\v{z}+b\v{y}\v{z}$ \item $\v{x}(a\v{y}+b\v{z}) = a\v{x}\v{y}+b\v{x}\v{z}$ \end{itemize} % Note that for $\v{x},\v{y} \in \set{A}$ and $a \in \set{F}$, $a \v{x} \v{y} = \v{x} a \v{y}$. For this reason, while it is not technically correct, for $\v{x} \in \set{A}$ and $a \in \set{F}$, the notation $\v{x} a$ is usually taken to be equivalent to the notation $a \v{x}$ even though the product $\v{x} a$ is not technically defined. \paragraph{Associative Algebras:} Take algebra $\set{A}$ over the field $\set{F}$. If $(\set{A},{\times})$ is a semigroup (\ie, vector multiplication is associative) then $\set{A}$ is called an \emph{associative algebra}. \paragraph{Unitary Associative Algebras:} Take algebra $\set{A}$ over the field $\set{F}$. If $(\set{A},{\times})$ is a monoid (\ie, a vector multiplicative identity exists) then $\set{A}$ is called a \emph{unitary (or unital) associative algebra}.
Note that $(\set{A},{+})$ is a group and $(\set{A},{\times})$ is a monoid and multiplication distributes over addition; therefore, $\set{A}$ is also a ring. Note, however, that $\set{A}$ is not generally a commutative ring. \paragraph{Algebras over Commutative Rings:} As with vector spaces, all definitions above can be applied to algebras with bases of commutative rings instead of fields. Algebras need only be over fields when scalar multiplicative inverses (\ie, scalar division) are required. \paragraph{Fields as Algebras:} Because fields are trivially vector spaces over themselves and field multiplication is bilinear, any field is trivially an algebra over itself. In fact, because field multiplication is associative, any field is trivially an associative algebra over itself. \paragraph{Commutative Rings as Algebras:} Because commutative rings are trivially vector spaces over themselves and commutative ring multiplication is bilinear, any commutative ring is trivially an algebra over itself. In fact, because commutative ring multiplication is associative, any commutative ring is trivially an associative algebra over itself. \section{Boolean Rings and Algebras} \label{app:math_boolean_rings_and_algebras} We now introduce two new algebraic structures that have special applications in set theory and logic. \Citet{Stoll79} gives detailed information about these structures. We introduce them here to provide analytical background for \longrefs{app:math_logic}, \shortref{app:math_measure}, and \shortref{app:math_probability}. Although we introduce them separately, we will show that these two structures are equivalent. \subsection{Boolean Rings} \label{app:math_boolean_rings} Take a ring $(\set{X},{+},{\times},0,1)$. To say that this is a \emph{Boolean ring} means that for all $x \in \set{X}$, $x \times x = x$. \paragraph{Boolean Rings as Commutative Rings:} Take $(\set{X},{+},{\times},0,1)$ to be a Boolean ring and take juxtaposition to denote multiplication (\ie, $\times$). Take $x \in \set{X}$.
Since this is a Boolean ring, $xx = x$. Additionally, since $(\set{X},{+})$ is a commutative group, then there exists an element ${-x}$ such that $x + {-x} = 0$. Take such an element called ${-x}$. Now, note that % \begin{align*} x + x &= ( x + x )( x + x )\\ &= xx + xx + xx + xx\\ &= x + x + x + x \end{align*} % Thus, % \begin{align*} x + x + {-x} + {-x} &= x + x + x + x + {-x} + {-x}\\ &= x + x + x + {-x} + x + {-x}\\ &= x + x + 0 + 0\\ &= x + x + 0\\ &= x + x \end{align*} % However, $x + x + {-x} + {-x} = x + 0 + {-x} = x + {-x} = 0$. Therefore, % \begin{equation} x + x = 0 \label{eq:boolean_ring_xplusx} \end{equation} % Now, also take $y,{-y} \in \set{X}$ such that $y + {-y} = 0$. Again, $yy=y$. Additionally, % \begin{align*} x + y &= ( x + y )( x + y )\\ &= xx + xy + yx + yy\\ &= x + xy + yx + y \end{align*} % Thus, % \begin{align*} x + y + {-x} + {-y} &= x + xy + yx + y + {-x} + {-y}\\ &= x + {-x} + xy + yx + y + {-y}\\ &= 0 + xy + yx + y + {-y}\\ &= 0 + xy + yx + 0\\ &= xy + yx + 0\\ &= xy + yx \end{align*} % However, $x + y + {-x} + {-y} = x + {-x} + y + {-y} = 0 + 0 = 0$, and so $xy + yx = 0$. This means that $xy + yx + yx = yx$. However, $xy + yx + yx = xy + 0 = xy$. Therefore, % \begin{equation*} xy = yx \end{equation*} % Thus, every Boolean ring is a commutative ring. Additionally, in a Boolean ring, the addition of any element with itself is the additive identity (\ie, \longref{eq:boolean_ring_xplusx}). \paragraph{Boolean Rings as Algebras:} Take $(\set{X},{+},{\times},0,1)$ to be a Boolean ring. As shown, this Boolean ring is also a commutative ring. Of course, all commutative rings are trivially algebras over themselves. Thus, $(\set{X},{+},{\times},0,1)$ is an algebra with itself as a base ring. \subsection{Boolean Algebra} \label{app:math_boolean_algebra} We now introduce a new algebraic structure that is not based on any of the previous structures. Take a nonempty set $\set{X}$ with elements $0$ and $1$ (\ie, $0,1 \in \set{X}$). 
Additionally, define operations $\lor: \set{X} \times \set{X} \mapsto \set{X}$, $\land: \set{X} \times \set{X} \mapsto \set{X}$, and $\lnot: \set{X} \mapsto \set{X}$ such that all of the following are satisfied. % \begin{enumerate}[(i)] \item For all $x,y,z \in \set{X}$, % \begin{equation*} x \lor (y \lor z) = (x \lor y) \lor z \quad \text{ and } \quad x \land (y \land z) = (x \land y) \land z \end{equation*} % That is, $\lor$ and $\land$ are both \emph{associative} operations. \item For all $x,y \in \set{X}$, % \begin{equation*} x \lor y = y \lor x \quad \text{ and } \quad x \land y = y \land x \end{equation*} % That is, $\lor$ and $\land$ are both \emph{commutative} operations. \item For all $x,y,z \in \set{X}$, % \begin{equation*} x \lor (y \land z)=(x \lor y) \land (x \lor z) \quad \text{ and } \quad x \land (y \lor z)=(x \land y) \lor (x \land z) \end{equation*} % That is, $\lor$ \emph{distributes} over $\land$, and $\land$ distributes over $\lor$. \item For any $x \in \set{X}$, % \begin{equation*} x \lor 0 = x \quad \text{ and } \quad x \land 1 = x \end{equation*} % That is, $0$ is the \emph{identity element} for $\lor$ and $1$ is the \emph{identity element} for $\land$. \item For any $x \in \set{X}$, % \begin{equation*} x \lor \lnot x = 1 \quad \text{ and } \quad x \land \lnot x = 0 \end{equation*} % This is like an inverse property. In fact, for any $x \in \set{X}$, $\lnot x$ will be called the \emph{complement} of $x$. \end{enumerate} % Together this set, these operations, and these two elements, represented as the $6$-tuple $(\set{X},{\lor},{\land},{\lnot},0,1)$, is called a \emph{Boolean algebra}. \paragraph{Properties of Boolean Algebras:} Take a Boolean algebra $(\set{X},{\lor},{\land},{\lnot},0,1)$. It can be easily shown that all of the following hold. % \begin{itemize} \item Take $y \in \set{X}$. If $x \lor y = x$ for any $x \in \set{X}$ then $y = 0$. That is, $0$ is a unique element of $\set{X}$. \item Take $y \in \set{X}$. 
If $x \land y = x$ for any $x \in \set{X}$ then $y = 1$. That is, $1$ is a unique element of $\set{X}$. \item Take $x,y \in \set{X}$. If $x \land y = 1$ and $x \lor y = 0$ then $y = \lnot x$. That is, every element has a unique complement. \item For all $x \in \set{X}$, $\lnot( \lnot x ) = x$. \item It is the case that $0 = \lnot 1$ and $1 = \lnot 0$. That is, $0$ and $1$ are complements of each other. \item For any $x \in \set{X}$, $x \lor x = x$ and $x \land x = x$. \item For any $x \in \set{X}$, $x \lor 1 = 1$ and $x \land 0 = 0$. \item For any $x,y \in \set{X}$, $x \lor (x \land y) = x$ and $x \land (x \lor y) = x$. \item For any $x,y \in \set{X}$, $\lnot(x \lor y) = \lnot x \land \lnot y$ and $\lnot(x \land y) = \lnot x \lor \lnot y$. \item For any $x,y \in \set{X}$, $x \land y = x$ if and only if $x \lor y = y$. \end{itemize} \paragraph{Boolean Algebra Ordering:} Take a Boolean algebra $(\set{X},{\lor},{\land},{\lnot},0,1)$. Introduce the ordering operation $\leq$ such that for any $x,y \in \set{X}$, % \begin{equation*} x \leq y \quad \text{ if and only if } \quad x \land y = x \end{equation*} % Of course, $x \land y = x$ if and only if $x \lor y = y$ for any $x,y \in \set{X}$. Therefore, an equivalent definition of $\leq$ is that for any $x,y \in \set{X}$, % \begin{equation*} x \leq y \quad \text{ if and only if } \quad x \lor y = y \end{equation*} % Clearly, for all $x,y \in \set{X}$, % \begin{itemize} \item $x \leq x$ \item if $x \leq y$ and $y \leq x$ then $x = y$ \item if $x \leq y$ and $y \leq z$ then $x \leq z$ \end{itemize} % and so $\set{X}$ equipped with $\leq$, denoted $(\set{X},{\leq})$ or $(\set{X},{\lor},{\land},{\lnot},0,1,{\leq})$, is a partially ordered set which we will call an \emph{ordered Boolean algebra}. Take $x,y \in \set{X}$. 
Note that by one of the properties of a Boolean algebra, % \begin{equation*} (x \land y) \lor x = x \quad \text{ and } \quad (x \land y) \lor y = y \end{equation*} % and therefore $(x \land y) \leq x$ and $(x \land y) \leq y$. Now, assume that $z \in \set{X}$ is such that $z \leq x$ and $z \leq y$. That is, % \begin{equation*} z \land x = z \quad \text{ and } \quad z \land y = z \end{equation*} % Note that % \begin{align*} z \land (x \land y) &= z \land x \land y\\ &= (z \land x) \land y\\ &= z \land y\\ &= z \end{align*} % Therefore, $z \leq (x \land y)$. It can similarly be shown that $x \leq (x \lor y)$ and $y \leq (x \lor y)$ and for any $z$ such that $x \leq z$ and $y \leq z$, $(x \lor y) \leq z$. Thus, for all $x,y \in \set{X}$, % \begin{itemize} \item the \emph{greatest lower bound} or \emph{infimum} or \emph{meet} of $x$ and $y$ is $x \land y$; that is, $\inf \{x,y\} = x \land y$ \item the \emph{least upper bound} or \emph{supremum} or \emph{join} of $x$ and $y$ is $x \lor y$; that is, $\sup \{x,y\} = x \lor y$ \end{itemize} % Therefore, since the pairwise meet and pairwise join exist for any pair, $(\set{X},{\leq})$ is a lattice. Additionally, note that for all $x \in \set{X}$, % \begin{itemize} \item $x \leq 1$ \item $0 \leq x$ \end{itemize} % Therefore, $1$ is the greatest (\ie, top) element of $(\set{X},{\leq})$ and $0$ is the least (\ie, bottom) element of $(\set{X},{\leq})$. This makes $(\set{X},{\leq})$ a bounded lattice. Finally, take $x,y \in \set{X}$ and assume that $x \leq y$. That is, assume that $x \land y = x$. Thus, $\lnot x = \lnot ( x \land y ) = \lnot x \lor \lnot y$, and so $\lnot y \leq \lnot x$. 
In summary, % \begin{itemize} \item every ordered Boolean algebra is a partially ordered set \item every ordered Boolean algebra is a bounded lattice \item for any two elements of an ordered Boolean algebra, one element is less than or equal to the other if and only if its complement is greater than or equal to the complement of the other \end{itemize} \subsection{Boolean Rings as Boolean Algebras} Take a Boolean ring $(\set{X},{+},{\times},0,1)$. Recall that this Boolean ring can be called an algebra since all commutative rings are algebras. Now, introduce the meet operator $\land: \set{X} \times \set{X} \mapsto \set{X}$, the join operator $\lor: \set{X} \times \set{X} \mapsto \set{X}$, and the complement operator $\lnot: \set{X} \mapsto \set{X}$ such that for any $x,y \in \set{X}$, % \begin{itemize} \item $x \land y \triangleq x \times y$ \item $x \lor y \triangleq x + y + (x \times y)$ \item $\lnot x \triangleq 1 + x$ \end{itemize} % Denote Boolean ring $\set{X}$ equipped with operators $\land$, $\lor$, $\lnot$, and elements $0$ and $1$ with the $6$-tuple $(\set{X},{\lor},{\land},{\lnot},0,1)$. Using all of the properties endowed to $+$, $\times$, $0$, and $1$, it is easy to show that $(\set{X},{\lor},{\land},{\lnot},0,1)$ is a Boolean algebra. That is, the Boolean ring $(\set{X},{+},{\times},0,1)$ is an algebra and $(\set{X},{\lor},{\land},{\lnot},0,1)$ is a Boolean algebra. This is true for all Boolean rings. \subsection{Boolean Algebras as Boolean Rings} Take a Boolean algebra $(\set{X},{\lor},{\land},{\lnot},0,1)$.
Introduce the addition operator $+: \set{X} \times \set{X} \mapsto \set{X}$ and the multiplication operator $\times: \set{X} \times \set{X} \mapsto \set{X}$ such that for any $x,y \in \set{X}$, % \begin{itemize} \item $x \times y \triangleq x \land y$ \item $x + y \triangleq (x \lor y) \land (\lnot x \lor \lnot y)$ \end{itemize} % Denote Boolean algebra $\set{X}$ equipped with operations $+$ and $\times$ and elements $0$ and $1$ with the $5$-tuple $(\set{X},{+},{\times},0,1)$. \paragraph{Boolean Algebra as Commutative Group:} Take Boolean algebra $(\set{X},{+},{\times},0,1)$. Using all of the properties endowed to $\land$, $\lor$, $\lnot$, $0$, and $1$, it can be easily shown that % \begin{itemize} \item for all $x,y,z \in \set{X}$, $(x + y) + z = x + (y + z)$ \item for all $x \in \set{X}$, $0 + x = x + 0 = x$ \item for all $x \in \set{X}$, $x + x = 0$ \item for all $x,y \in \set{X}$, $x + y = y + x$ \end{itemize} % That is, $(\set{X},{+})$ is a commutative group with identity element $0$ where every element is its own additive inverse. \paragraph{Boolean Algebra as Commutative Monoid:} Take Boolean algebra $(\set{X},{+},{\times},0,1)$. Using all of the properties endowed to $\land$, $\lor$, $\lnot$, $0$, and $1$, it can be easily shown that % \begin{itemize} \item for all $x,y,z \in \set{X}$, $(x \times y) \times z = x \times (y \times z)$ \item for all $x \in \set{X}$, $1 \times x = x \times 1 = x$ \item for all $x,y \in \set{X}$, $x \times y = y \times x$ \end{itemize} % That is, $(\set{X},{\times})$ is a commutative monoid with identity element $1$. \paragraph{Boolean Algebra as Commutative Ring:} Take Boolean algebra $(\set{X},{+},{\times},0,1)$. 
Using all of the properties endowed to $\land$, $\lor$, $\lnot$, $0$, and $1$, it can be easily shown that for all $x,y,z \in \set{X}$, % \begin{equation*} x \times (y + z) = (x \times y) + (x \times z) \quad \text{ and } \quad (x + y) \times z = (x \times z) + (y \times z) \end{equation*} % Therefore, $(\set{X},{+},{\times},0,1)$ is a commutative ring. \paragraph{Boolean Algebra as Boolean Ring:} Take Boolean algebra $(\set{X},{+},{\times},0,1)$. Using all of the properties endowed to $\land$, $\lor$, $\lnot$, $0$, and $1$, it can be easily shown that for all $x \in \set{X}$, % \begin{equation*} x \times x = x \end{equation*} % Therefore, $(\set{X},{+},{\times},0,1)$ is a Boolean ring. In fact, we have already shown that $x + x = 0$ since each element is its own additive inverse. Therefore, all Boolean algebras are Boolean rings. Since Boolean rings are all algebras (over rings), then Boolean algebras are also algebras. \subsection{Equivalence of Boolean Algebras and Boolean Rings} Because every Boolean algebra is a Boolean ring and every Boolean ring is a Boolean algebra, the two structures are equivalent. \subsection{Subalgebras of Boolean Algebras} \label{app:math_boolean_subalgebras} Take a set $\set{U}$ where $(\set{U},{\lor},{\land},{\lnot},0,1)$ is a Boolean algebra. Sometimes this is called a \emph{Boolean algebra over the set $\set{U}$}. Note that $\{0,1\} \subseteq \set{U}$ and % \begin{equation*} (\{0,1\},{\lor},{\land},{\lnot},0,1) \end{equation*} % is also a Boolean algebra over the subset $\{0,1\}$; therefore, it is called a \emph{subalgebra}. It is not necessary to list every algebraic operation with a subalgebra as it is implied that they are the same as the Boolean algebra which is over the superset. In other words, the set $\{0,1\}$ is a subalgebra of the Boolean algebra $(\set{U},{\lor},{\land},{\lnot},0,1)$. That being said, we will call $(\set{U},{\lor},{\land},{\lnot},0,1)$ the \emph{$\set{U}$ Boolean algebra} for brevity. 
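The two conversions above can be sketched concretely over the two-element Boolean ring $\{0,1\}$ with addition and multiplication taken mod $2$; the helper names \texttt{meet}, \texttt{join}, and \texttt{comp} are illustrative, implementing exactly the definitions given in the text.

```python
# A minimal sketch over the two-element Boolean ring {0, 1}, with ring
# addition and multiplication taken mod 2.  The derived operations are
# the conversions from the text: meet is the ring product, join is
# x + y + xy, and complement is 1 + x.

def meet(x, y):
    return (x * y) % 2           # x AND y

def join(x, y):
    return (x + y + x * y) % 2   # x OR y

def comp(x):
    return (1 + x) % 2           # NOT x

elems = (0, 1)
for x in elems:
    assert join(x, comp(x)) == 1 and meet(x, comp(x)) == 0  # complement laws
    assert join(x, 0) == x and meet(x, 1) == x              # identity elements
    for y in elems:
        # recover ring addition from the Boolean algebra, as in the text:
        # x + y = (x OR y) AND (NOT x OR NOT y)
        assert (x + y) % 2 == meet(join(x, y), join(comp(x), comp(y)))
        for z in elems:
            # each operation distributes over the other
            assert meet(x, join(y, z)) == join(meet(x, y), meet(x, z))
            assert join(x, meet(y, z)) == meet(join(x, y), join(x, z))
```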
\paragraph{Requirements for a Subalgebra:} Take the algebra $(\set{U},{\lor},{\land},{\lnot},0,1)$ and subset $\set{X} \subseteq \set{U}$. To say subset $\set{X}$ forms a \emph{subalgebra of the \set{U} Boolean algebra} means that % \begin{enumerate}[(i)] \item for any $x,y \in \set{X}$, $x \land y \in \set{X}$ \label{item:boolean_algebra_closure_and} \item for any $x,y \in \set{X}$, $x \lor y \in \set{X}$ \label{item:boolean_algebra_closure_or} \item for any $x \in \set{X}$, $\lnot x \in \set{X}$ \label{item:boolean_algebra_closure_not} \end{enumerate} % This will ensure that every one of the requirements for a Boolean algebra hold, thus justifying calling $\set{X}$ a subalgebra of the $\set{U}$ Boolean algebra. Assume that $\set{X}$ is a subalgebra of the $\set{U}$ Boolean algebra and take $x \in \set{X}$. % \begin{itemize} \item By property (\shortref{item:boolean_algebra_closure_not}), $\lnot x \in \set{X}$. \item By property (\shortref{item:boolean_algebra_closure_and}), since $x \in \set{X}$ and $\lnot x \in \set{X}$ then $x \land \lnot x \in \set{X}$; however, $x \land \lnot x = 0$ and so $0 \in \set{X}$. \item Additionally, by property (\shortref{item:boolean_algebra_closure_or}), since $x \in \set{X}$ and $\lnot x \in \set{X}$ then $x \lor \lnot x \in \set{X}$; however, $x \lor \lnot x = 1$ and so $1 \in \set{X}$. \end{itemize} % Thus, the \emph{trivial subalgebra} of the $\set{U}$ Boolean algebra is $\{0,1\}$. \subsection{Propositional Logic and the Trivial Boolean Algebra} \label{app:math_prop_logic_boolean_algebra} A trivial Boolean algebra takes the form % \begin{equation*} (\{0,1\},{\lor},{\land},{\lnot},0,1) \end{equation*} % That is, the trivial Boolean algebra contains only the two unique identity elements. 
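The three closure requirements are easy to test mechanically. In the Python sketch below (our own example; as an assumed concrete Boolean algebra we use the four-element algebra of subsets of $\{a,b\}$, with $\cap$ for $\land$, $\cup$ for $\lor$, and set complementation for $\lnot$), a candidate subset passes exactly when it forms a subalgebra, and $\{0,1\} = \{\emptyset,\{a,b\}\}$ passes, as the trivial subalgebra must.

```python
# Illustrative sketch: test the subalgebra closure requirements on the
# four-element Boolean algebra of subsets of {a, b}, where 0 = {} and
# 1 = {a, b}, meet is intersection, join is union, and complement is NOT.
U = frozenset("ab")

def lnot(x): return U - x            # complement plays the role of NOT

def is_subalgebra(X):
    """Check closure under meet (here intersection), join (here union),
    and complement for a candidate subset X of the algebra."""
    return (all((x & y) in X and (x | y) in X for x in X for y in X)
            and all(lnot(x) in X for x in X))

zero, one = frozenset(), U
a, b = frozenset("a"), frozenset("b")

assert is_subalgebra({zero, one})          # the trivial subalgebra {0, 1}
assert is_subalgebra({zero, a, b, one})    # the whole algebra
assert not is_subalgebra({zero, a, one})   # not closed: complement of {a} is {b}
```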
Another trivial Boolean algebra that is important to us is % \begin{equation*} (\{\text{false},\text{true}\}, \text{or}, \text{and}, \text{not}, \text{false}, \text{true}) \end{equation*} % This is the Boolean algebra which is the basis for the logic described in \longref{app:math_logic}. Of course, all trivial Boolean algebras are isomorphic to each other; that is, symbols and notation can be substituted for each other. This is the justification for the use of $\land$ for \emph{and}, $\lor$ for \emph{or}, and $\lnot$ for \emph{not}. Similarly, as we will show in \longref{app:math_algebras_of_sets}, sets with their set operations can be shown to be Boolean algebras as well. This is the reason for the similarities between operations like $\cap$ with sets and $\land$ for logic. This shows the utility of algebra. By identifying common structures, algebra provides a context for very general results that can prevent repetitive work and reveal relationships that may not have been easily anticipated in the specialized context. \paragraph{Boolean Algebra Ordering and Logical Implication:} Recall the topic of \emph{statements} and \emph{implication} in propositional logic. Take $x$ and $y$ to be two logical statements. To say that $x$ implies $y$ means that % \begin{itemize} \item if $x$ is true then $y$ must be true \item if $x$ is not true then $y$ may be either true or false \end{itemize} % Assume that $x$ implies $y$. Clearly, if $x$ is false then the statement \emph{$x$ and $y$} must also be false. Additionally, if $x$ is true then $y$ must be true so \emph{$x$ and $y$} must also be true. Clearly, saying $x$ implies $y$ is equivalent to saying that % \begin{equation*} x \land y = x \end{equation*} % where $\land$ is a symbol that represents \emph{and}. However, above this was used as the definition for the partial order $\leq$. That is, saying that \emph{$x$ implies $y$} is equivalent to saying that $x \leq y$. 
To understand this, note that by this definition of $\leq$, it is the case that $\text{false} \leq \text{true}$. Now assume that $x \leq y$. Both $x,y \in \{\text{false},\text{true}\}$. Thus, if $x = \text{true}$ then it must be that $y = \text{true}$ because $x \leq y$ and it is not the case that $\text{true} \leq \text{false}$. However, if $x = \text{false}$ then $y \in \{\text{false},\text{true}\}$; that is, when $x$ is false, nothing can be said about $y$. Therefore, the $\leq$ relation matches what is expected from implication. Additionally, note that since every Boolean algebra is a partially ordered set, if $x \leq y$ and $y \leq x$ then $x = y$. This also matches implication. That is, if $x$ implies $y$ and $y$ implies $x$ then $x$ and $y$ are equivalent; $x$ is true if and only if $y$ is true. Therefore, in a Boolean algebra, % \begin{itemize} \item Elements are ordered by implication. \item Equivalent elements imply each other. \end{itemize} \paragraph{Boolean Algebra and the Exclusive Or:} Take the trivial Boolean algebra $(\{\text{false},\text{true}\}, \text{or}, \text{and}, \text{not}, \text{false}, \text{true})$. As every Boolean algebra is a Boolean ring, we can define an addition operator $\text{xor}$ so that for any $x,y \in \{\text{false},\text{true}\}$, % \begin{equation*} x \text{ xor } y \triangleq ( x \text{ or } y ) \text{ and } ( \text{not } x \text{ or } \text{not } y ) \end{equation*} % And thus, % \begin{itemize} \item $\text{false} \text{ xor } \text{false} = \text{false}$ \item $\text{false} \text{ xor } \text{true} = \text{true}$ \item $\text{true} \text{ xor } \text{false} = \text{true}$ \item $\text{true} \text{ xor } \text{true} = \text{false}$ \end{itemize} % In other words, using the more conventional $0$, $1$, and $+$, $0+0=0$, $0+1=1$, $1+0=1$, and $1+1=0$. 
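This truth table can be reproduced directly from the definition. A short Python sketch (our own illustration, not part of the text):

```python
# Illustrative sketch: the derived addition on {false, true} reproduces the
# exclusive-or truth table, using xor(x, y) = (x or y) and (not x or not y).
def xor(x, y):
    return (x or y) and (not x or not y)

assert xor(False, False) is False
assert xor(False, True) is True
assert xor(True, False) is True
assert xor(True, True) is False
```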
The operation \emph{xor} is known as the \emph{exclusive or} (as opposed to the \emph{inclusive or} which is another name for the conventional \emph{or}) and the statement $x \text{ xor } y$ is only true if exactly one of $x$ and $y$ is true. Thus, the exclusive or can be viewed as addition in a trivial Boolean algebra. In fact, addition in any Boolean algebra can be conceptualized as a type of exclusive or. \section{Sets of Sets: Order and Algebra} \label{app:math_sets_sets} Recall that the power set of a set $\set{U}$ is a set of sets that contains every subset of $\set{U}$. That is, for any set $\set{U}$ and any subset $\set{X} \subseteq \set{U}$, it is the case that $\set{X} \in \Pow(\set{U})$. In other words, for a set $\set{U}$, % \begin{equation*} \Pow(\set{U}) = \{ \set{X} : \set{X} \subseteq \set{U} \} \end{equation*} % Thus, $\Pow(\set{U})$ serves as a universal set for every subset of $\set{U}$. Elements of the power set are related by $\subseteq$ and can generate new subsets with the $\cap$, $\cup$, and ${}^c$ operations. Therefore, it is interesting to look at $\Pow(\set{U})$ in an order or algebraic context. This will not only reveal properties of the structure of the power set but of a large class of sets of sets. \subsection{The Partially Ordered and Complete Power Set} \label{app:math_poset_powerset} The power set has a special ordering with some interesting properties. Take a nonempty set $\set{S}$. We first show that $(\Pow(\set{S}),{\subseteq})$ is a partially ordered set and then we show that it is a complete lattice. In \longref{app:math_sets_of_subsets_lattices}, we use this as motivation for a general property of a class of sets of sets. \paragraph{Power Set as Poset:} Take a \emph{nonempty} set $\set{S}$. 
Note that for any subsets $\set{X} \subseteq \set{S}$ and $\set{Y} \subseteq \set{S}$ and $\set{Z} \subseteq \set{S}$, it is the case that % \begin{itemize} \item $\set{X},\set{Y},\set{Z} \in \Pow(\set{S})$ \item $\set{X} \subseteq \set{X}$ \item if $\set{X} \subseteq \set{Y}$ and $\set{Y} \subseteq \set{X}$ then $\set{X} = \set{Y}$ \item if $\set{X} \subseteq \set{Y}$ and $\set{Y} \subseteq \set{Z}$ then $\set{X} \subseteq \set{Z}$ \end{itemize} % Therefore, $(\Pow(\set{S}),{\subseteq})$ is a partially ordered set. In other words, any subset $\setset{S} \subseteq \Pow(\set{S})$ is partially ordered by $\subseteq$. This is known as being \emph{(partially) ordered by inclusion}. \paragraph{Power Set as Complete Lattice:} Take a nonempty set $\set{S}$. Define $\setset{S}$ to be an arbitrary \emph{nonempty} set of subsets of $\set{S}$. That is, $\setset{S} \subseteq \Pow(\set{S})$ and $\setset{S} \neq \emptyset$. Notice that for any set $\set{X} \in \setset{S}$, % \begin{itemize} \item $\bigcap \setset{S} \in \Pow(\set{S})$ \item $\bigcup \setset{S} \in \Pow(\set{S})$ \item $\bigcap \setset{S} \subseteq \set{X}$ \item $\set{X} \subseteq \bigcup \setset{S}$ \end{itemize} % where $\bigcup \setset{S}$ is the union of all sets included in $\setset{S}$ and $\bigcap \setset{S}$ is the intersection of all sets included in $\setset{S}$. Therefore, for partially ordered set $(\Pow(\set{S}),{\subseteq})$ and $\setset{S} \subseteq \Pow(\set{S})$, % \begin{itemize} \item $\inf \setset{S} = \bigcap \setset{S}$ \item $\sup \setset{S} = \bigcup \setset{S}$ \end{itemize} % and thus $\setset{S}$ has a least upper bound and a greatest lower bound. Since $\setset{S}$ is an arbitrary subset of $\Pow(\set{S})$, then $(\Pow(\set{S}),{\subseteq})$ is a \emph{complete lattice}. In this case, the infimum of $\setset{S}$ is often called its \emph{meet} as it represents the set of elements common to all sets included in $\setset{S}$. 
Similarly, the supremum of $\setset{S}$ is often called its \emph{join} as it represents the set of all elements collected from all sets included in $\setset{S}$. Keeping that in mind, note that for any sets $\set{X},\set{Y} \in \Pow(\set{S})$ (\ie, any $\set{X} \subseteq \set{S}$ and $\set{Y} \subseteq \set{S}$), % \begin{itemize} \item $\set{X} \cap \set{Y} \in \Pow(\set{S})$ \item $\set{X} \cup \set{Y} \in \Pow(\set{S})$ \item $\set{X} \cap \set{Y} \subseteq \set{X}$ and $\set{X} \cap \set{Y} \subseteq \set{Y}$ \item $\set{X} \subseteq \set{X} \cup \set{Y}$ and $\set{Y} \subseteq \set{X} \cup \set{Y}$ \end{itemize} % This is the reason why the intersection of two sets is often called their \emph{meet} and the union of two sets is often called their \emph{join}. Also note the similarity in the symbols $\cap$ and $\land$, $\cup$ and $\lor$, $\bigcap$ and $\bigwedge$, $\bigcup$ and $\bigvee$; this is not coincidental. Finally, note that % \begin{itemize} \item $\{\} \in \Pow(\set{S})$ \item $\set{S} \in \Pow(\set{S})$ \item $\inf \Pow(\set{S}) = \{\}$ \item $\sup \Pow(\set{S}) = \set{S}$ \end{itemize} % Therefore, $(\Pow(\set{S}),{\subseteq})$ is, of course, a bounded lattice and $\min \Pow(\set{S}) = \{\}$ and $\max \Pow(\set{S}) = \set{S}$. \subsection{General Sets of Subsets as Complete Lattices} \label{app:math_sets_of_subsets_lattices} As shown in \longref{app:math_poset_powerset}, the power set of any nonempty set is partially ordered by inclusion and forms a complete (and therefore bounded) lattice. There must be some subsets of the power set for which this is also true. 
In particular, consider a subset of a power set for which % \begin{enumerate}[(i)] \item the set is also a lattice (\ie, it includes all pairwise meets and joins) \label{item:subposet} \item the intersection or union of any finite or infinite set of its elements is also in the set \label{item:closure_under_meets_and_joins} \end{enumerate} % Using an argument similar to the one used in \longref{app:math_poset_powerset}, the subset of the power set must also be a complete lattice. However, because the ordering is by inclusion (\ie, $\subseteq$), if property (\shortref{item:closure_under_meets_and_joins}) is met then it is clear that property (\shortref{item:subposet}) is also met. That is, a pairwise meet is an intersection and a pairwise join is a union, and so the inclusion of all intersections and unions makes it necessary that pairwise meets and pairwise joins are included. Therefore, as long as a set of sets is \emph{closed} under arbitrary unions and intersections, it must be a complete lattice. \paragraph{Closure Implies Poset:} Take a set $\set{S}$ and a set $\setset{S} \subseteq \Pow(\set{S})$. Assume that $\setset{S}$ is closed under arbitrary (possibly infinite) intersections and unions. In other words, for any subset $\setset{S}_0 \subseteq \setset{S}$, it is the case that % \begin{equation*} \bigcap \setset{S}_0 \in \setset{S} \quad \text{ and } \quad \bigcup \setset{S}_0 \in \setset{S} \end{equation*} % It has already been shown that $(\Pow(\set{S}),{\subseteq})$ is a partially ordered set. 
In particular, it has been shown that for all $\set{X},\set{Y} \in \setset{S}$, % \begin{itemize} \item $\set{X} \cap \set{Y} \in \Pow(\set{S})$ \item $\set{X} \cap \set{Y} \subseteq \set{X}$ and $\set{X} \cap \set{Y} \subseteq \set{Y}$ \item $\set{X} \cup \set{Y} \in \Pow(\set{S})$ \item $\set{X} \subseteq \set{X} \cup \set{Y}$ and $\set{Y} \subseteq \set{X} \cup \set{Y}$ \end{itemize} % However, since $\setset{S}$ is closed under intersections and unions, then $\set{X} \cap \set{Y} \in \setset{S}$ and $\set{X} \cup \set{Y} \in \setset{S}$. Therefore, $(\setset{S},{\subseteq})$ must also be a poset. \paragraph{Closure Implies Complete Lattice:} Take a set $\set{S}$ and a set $\setset{S} \subseteq \Pow(\set{S})$. As before, assume that $\setset{S}$ is closed under arbitrary (possibly infinite) intersections and unions. Thus, $(\setset{S},{\subseteq})$ must also be a poset. It is clear that due to closure of $\setset{S}$ under intersections and unions, for any subset $\setset{S}_0 \subseteq \setset{S}$, % \begin{itemize} \item $\bigcap \setset{S}_0 \in \setset{S}$ \item $\inf \setset{S}_0 = \bigcap \setset{S}_0$ \item $\bigcup \setset{S}_0 \in \setset{S}$ \item $\sup \setset{S}_0 = \bigcup \setset{S}_0$ \end{itemize} % Therefore, any set of sets that is closed under arbitrary (possibly infinite) intersections and unions is a complete lattice when ordered by inclusion (\ie, $\subseteq$). That is, for any set of sets $\setset{X}$ closed under arbitrary intersections and unions, $(\setset{X},{\subseteq})$ is a complete lattice (and therefore a bounded lattice as well). Recall that the symbol $\bigwedge$ (meet) will sometimes be used for $\inf$ (infimum) and $\bigvee$ (join) will sometimes be used for $\sup$ (supremum); this is often the case for sets of sets (especially when the set of sets is closed under arbitrary intersections (meets) and unions (joins)). 
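For a small finite $\set{S}$, the complete-lattice claims can be checked by brute force. The Python sketch below (our own illustration, with the assumed choice $\set{S} = \{1,2,3\}$) enumerates every nonempty family $\setset{S}_0 \subseteq \Pow(\set{S})$ and confirms that $\bigcap \setset{S}_0$ and $\bigcup \setset{S}_0$ are respectively the greatest lower bound and least upper bound under $\subseteq$.

```python
# Illustrative brute-force check that (P(S), inclusion) is a complete
# lattice for a small S: every nonempty family has inf = intersection and
# sup = union, both of which lie in P(S).
from itertools import chain, combinations
from functools import reduce

S = frozenset({1, 2, 3})
power = [frozenset(c) for c in
         chain.from_iterable(combinations(sorted(S), r)
                             for r in range(len(S) + 1))]

for r in range(1, len(power) + 1):
    for family in combinations(power, r):
        meet = reduce(frozenset.intersection, family)
        join = reduce(frozenset.union, family)
        assert meet in power and join in power         # closure in P(S)
        assert all(meet <= X <= join for X in family)  # lower/upper bounds
        # meet is the *greatest* lower bound ...
        assert all(L <= meet for L in power
                   if all(L <= X for X in family))
        # ... and join is the *least* upper bound
        assert all(join <= B for B in power
                   if all(X <= B for X in family))
print("P({1,2,3}) is a complete lattice under inclusion")
```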
\subsection{Filters on Sets} \label{app:math_filters_on_sets} The application of filters from \longref{app:math_filters_on_posets} has important uses in \emph{topology}, the subject of \longref{app:math_topology}. A framework of filters on sets allows for the discussion of overall trends of infinite sets. Therefore, it is useful for us to introduce them. Recall from \longref{app:math_poset_powerset} that for any set $\set{S}$, $(\Pow(\set{S}),{\subseteq})$ is a partially ordered set. \paragraph{Filter Bases:} Take a set $\set{S}$ and a \emph{nonempty} set $\setset{B} \subseteq \Pow(\set{S})$ (\ie, $\setset{B}$ is a set of subsets of $\set{S}$ and $\setset{B} \neq \emptyset$) where % \begin{enumerate}[(i)] \item $\emptyset \notin \setset{B}$ (and, again, $\setset{B} \neq \emptyset$) \item for any $\set{X} \in \setset{B}$ and $\set{Y} \in \setset{B}$, there exists a $\set{T} \in \setset{B}$ such that $\set{T} \subseteq \set{X} \cap \set{Y}$ \end{enumerate} % In this case, $\setset{B}$ is called a \emph{filter base on set $\set{S}$}. Note that $\setset{B}$ satisfies the conditions for a proper filter base on poset $(\Pow(\set{S}),{\subseteq})$. Note that for any elements $\set{X},\set{Y},\set{Z} \in \setset{B}$, % \begin{itemize} \item $\set{X} \supseteq \set{X}$ \item if $\set{X} \supseteq \set{Y}$ and $\set{Y} \supseteq \set{Z}$ then $\set{X} \supseteq \set{Z}$ \item there exists a $\set{T} \in \setset{B}$ such that $\set{X} \supseteq \set{T}$ and $\set{Y} \supseteq \set{T}$ \end{itemize} % Therefore, $(\setset{B},{\supseteq})$ is a directed set and $(\setset{B},{\subseteq})$ is a downward directed set. Therefore, filter bases on sets are said to be \emph{downward directed} by $\subseteq$ (\ie, downward directed by inclusion). 
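The two filter-base conditions are straightforward to test on finite examples. In the Python sketch below (our own example; the sets used are assumptions, not from the text), the family $\{\{1,2\},\{2,3\},\{2\}\}$ on $\set{S} = \{1,2,3\}$ qualifies because $\{2\} \subseteq \{1,2\} \cap \{2,3\}$, while dropping $\{2\}$ breaks the downward-directedness condition.

```python
# Illustrative sketch: test the two conditions for a filter base on a set.
def is_filter_base(B):
    """B must be a nonempty family of nonempty sets that is downward
    directed: every pairwise intersection contains some member of B."""
    return (len(B) > 0 and frozenset() not in B
            and all(any(T <= (X & Y) for T in B) for X in B for Y in B))

one_two, two_three, two = frozenset({1, 2}), frozenset({2, 3}), frozenset({2})

assert is_filter_base({one_two, two_three, two})  # {2} lies inside {1,2} n {2,3}
assert not is_filter_base({one_two, two_three})   # no member inside {2}
assert not is_filter_base(set())                  # a filter base is nonempty
assert not is_filter_base({frozenset()})          # the empty set is excluded
```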
\paragraph{Filters:} Take set $\set{S}$ and a set $\setset{F} \subseteq \Pow(\set{S})$ where % \begin{enumerate}[(i)] \item if $\set{X} \in \setset{F}$ and $\set{Y} \in \setset{F}$ then $\set{X} \cap \set{Y} \in \setset{F}$ \item if $\set{X} \in \setset{F}$ and $\set{Y} \subseteq \set{S}$ with $\set{X} \subseteq \set{Y}$ then $\set{Y} \in \setset{F}$ \item $\emptyset \notin \setset{F}$ \item $\set{S} \in \setset{F}$ \end{enumerate} % In this case, $\setset{F}$ is called a \emph{filter on (nonempty) set $\set{S}$}. Note that % \begin{itemize} \item $\setset{F}$ satisfies the conditions for a proper filter on poset $(\Pow(\set{S}),{\subseteq})$ \item $\setset{F} \neq \emptyset$ since $\set{S} \in \setset{F}$ \item every filter on a set is also a filter base on the set \end{itemize} % As we will discuss, because every filter is also a filter base, results will usually be given in terms of filter bases rather than filters. \paragraph{Filters from Filter Bases:} Take set $\set{S}$. Assume $\setset{B}$ is a filter base on set $\set{S}$. Define $\setset{F}$ with % \begin{equation*} \setset{F} \triangleq \{ \set{A} \subseteq \set{S} : \text{there exists } \set{B} \in \setset{B} \text{ such that } \set{B} \subseteq \set{A} \} \end{equation*} % That is, $\setset{F}$ is the set of all subsets of $\set{S}$ that contain a set in the filter base $\setset{B}$. In this case, $\setset{F}$ is a filter on set $\set{S}$, and the filter $\setset{F}$ is said to be \emph{spanned} or \emph{generated} by the filter base $\setset{B}$. In other words, a filter base on a set completely specifies a filter on that set. Therefore, it is common to generate results with filter bases since any filter base generates a corresponding filter. \paragraph{Filter Base Refinements:} Take set $\set{S}$. Assume $\setset{B}$ and $\setset{C}$ are filter bases on set $\set{S}$. Assume that for all $\set{B} \in \setset{B}$, there is a $\set{C} \in \setset{C}$ such that $\set{C} \subseteq \set{B}$. 
In this case, filter base $\setset{C}$ is said to be \emph{finer} than filter base $\setset{B}$. It is also said that a filter base that is finer than another filter base is a \emph{refinement} of it, so $\setset{C}$ is a refinement of $\setset{B}$. Note that if $\setset{C}$ is finer than $\setset{B}$ and there is another filter base $\setset{D}$ on $\set{S}$ that is finer than $\setset{C}$ then $\setset{D}$ is finer than $\setset{B}$. \paragraph{Equivalent Filter Bases:} Take set $\set{S}$ and filter bases $\setset{B}$ and $\setset{C}$ on set $\set{S}$. If $\setset{B}$ is finer than $\setset{C}$ and $\setset{C}$ is finer than $\setset{B}$ then $\setset{B}$ and $\setset{C}$ are said to be \emph{equivalent} filter bases. \paragraph{Functions of Filter Bases:} Take sets $\set{X}$ and $\set{Y}$ and a function $f: \set{X} \mapsto \set{Y}$. Now take filter base $\setset{B}$ on $\set{X}$. It can be shown that $f\{ \setset{B} \}$ is a filter base on $\set{Y}$. \subsection{Nets and Sequences as Filters} \label{app:math_nets_and_sequences_as_filters} Take directed set $(\set{A},{\leq})$. Also take the set $\setset{A}$ defined by % \begin{equation*} \setset{A} \triangleq \{ \{ \alpha \in \set{A} : \alpha_0 \leq \alpha \} : \alpha_0 \in \set{A} \} \end{equation*} % In other words, $\setset{A}$ is a set of \emph{tails} of the directed set $\set{A}$. In fact, it is easy to verify that $\setset{A}$ is a filter base. Therefore, $\setset{A}$ is called the \emph{filter base of tails} of the directed set $\set{A}$. \paragraph{Filter Bases Generated by Nets:} Take set $\set{X}$ and directed set $(\set{A},{\leq})$ as well as the net $(x_\alpha)$ from $\set{A}$ to $\set{X}$. Now, take the set $\setset{X}$ defined by % \begin{equation*} \setset{X} \triangleq \{ \{ x_\alpha : \alpha \in \set{A}, \alpha_0 \leq \alpha \} : \alpha_0 \in \set{A} \} \end{equation*} % It is the case that $\setset{X}$ is a filter base. 
In fact, $\setset{X}$ is called the \emph{filter base of tails of net} $(x_\alpha)$ or the \emph{filter base generated by net} $(x_\alpha)$. \paragraph{Filter Bases Generated by Sequences:} Take set $\set{X}$, totally ordered set $(\N,{\leq})$, and the sequence $(x_n)$ with codomain $\set{X}$. Therefore, % \begin{equation*} \setset{A} \triangleq \{ \{ n \in \N : n_0 \leq n \} : n_0 \in \N \} \end{equation*} % is the \emph{filter base of tails} of the totally ordered set $\N$, and % \begin{equation*} \setset{X} \triangleq \{ \{ x_n : n \in \N, n_0 \leq n \} : n_0 \in \N \} \end{equation*} % is the \emph{filter base generated by sequence} $(x_n)$. \paragraph{Filters as General Framework:} Every sequence is a net, and every net generates a filter. Thus, any statement about filters also holds with nets and sequences. In fact, as we will discuss, a general framework has been built based on filters that agrees with the expected results derived independently with nets and sequences. Therefore, any result that we derive for a filter will have a consistent result for any net or sequence that generates that filter as well. \paragraph{Nets as Functions:} Take a directed set $(\set{A},{\leq})$ and a set $\set{X}$ and a net $(x_\alpha)$ with domain $\set{A}$ and codomain $\set{X}$. By definition, nets are indexed families which are functions, and so nets are functions. Therefore, for sake of notation, define the function $f: \set{A} \mapsto \set{X}$ as % \begin{equation*} f(\alpha) \triangleq x_\alpha \end{equation*} % for all $\alpha \in \set{A}$. Recall that for any $\set{B} \subseteq \set{A}$, the image of $\set{B}$ under $f$ is $f[\set{B}]$. Now, define the set $\setset{A}$ as % \begin{equation*} \setset{A} \triangleq \{ \{ \alpha \in \set{A} : \alpha_0 \leq \alpha \} : \alpha_0 \in \set{A} \} \end{equation*} % As discussed, this is the filter base of tails of $\set{A}$, and it is certainly a filter base on $\set{A}$. 
Therefore, $f\{\setset{A}\}$ (\ie, the image of filter base $\setset{A}$ under $f$) must also be a filter base. In fact, it is clear that % \begin{equation*} f\{ \setset{A} \} = \{ \{ x_\alpha : \alpha \in \set{A}, \alpha_0 \leq \alpha \} : \alpha_0 \in \set{A} \} \end{equation*} % which is the filter base of tails of the net $(x_\alpha)$. Therefore, the filter base of tails of $\set{A}$ is related to the filter base of tails of net $(x_\alpha)$ by the function $f$ which defines the net. Of course, we could pick any filter base $\setset{B}$ on $\set{A}$ and generate a new filter base $f\{\setset{B}\}$ on $\set{X}$. As we will show in \longref{app:math_topology}, the analysis of images of filter bases under nets is actually the analysis of the \emph{limits} of nets (and sequences). \subsection{Algebras, Subalgebras, and Fields of Sets} \label{app:math_algebras_of_sets} Take a set $\set{U}$ and its power set $\Pow(\set{U})$. It has been shown that $(\Pow(\set{U}),{\subseteq})$ is a complete lattice; that is, $\Pow(\set{U})$ can be viewed as being ordered by $\subseteq$. However, it should be clear that for set $\set{U}$, its power set $\Pow(\set{U})$ forms a Boolean algebra and therefore also a Boolean ring. That is, % \begin{equation*} ( \Pow(\set{U}), {\symdiff}, {\cap}, \emptyset, \set{U} ) \end{equation*} % is a Boolean ring and % \begin{equation*} ( \Pow(\set{U}), {\cup}, {\cap}, {{}^c}, \emptyset, \set{U} ) \end{equation*} % is a Boolean algebra. And so every power set is % \begin{itemize} \item a Boolean algebra \item a Boolean ring \item a commutative ring \item an algebra over a ring \end{itemize} % Therefore, the power set is called an \emph{algebra of sets}. Because of this, $\Pow(\set{U})$ is ordered by a relation $\leq$ where for any $\set{X},\set{Y} \in \Pow(\set{U})$, $\set{X} \leq \set{Y}$ if and only if $\set{X} \cap \set{Y} = \set{X}$. However, this is the definition of $\subseteq$. 
Therefore, ordering $\Pow(\set{U})$ by $\subseteq$ simply follows from it being a Boolean algebra; that is, \emph{since} $\Pow(\set{U})$ forms a Boolean algebra then % \begin{itemize} \item $(\Pow(\set{U}),{\subseteq})$ is a partially ordered set \item $(\Pow(\set{U}),{\subseteq})$ is a bounded lattice with greatest element $\set{U}$ and least element $\emptyset$ \item for $\set{X},\set{Y} \in \Pow(\set{U})$, if $\set{X} \subseteq \set{Y}$ then $\set{Y}^c \subseteq \set{X}^c$ \end{itemize} % In fact, as was already shown, $(\Pow(\set{U}),{\subseteq})$ is a complete lattice. This justifies the statement that any set $\set{X} \in \Pow(\set{U})$ is \emph{smaller} than a set $\set{Y} \in \Pow(\set{U})$ if $\set{X} \subseteq \set{Y}$ (\ie, $\set{X} \cap \set{Y} = \set{X}$). When sets are related by order terminology, the implicit order relation is $\subseteq$ (\ie, substitute $\subseteq$ for $\leq$). \paragraph{Subalgebras as Fields of Sets:} Just as any Boolean algebra can have subalgebras, the Boolean algebra formed by the power set has subalgebras formed by sets of sets. Take a universal set $\set{U}$ and the Boolean algebra $( \Pow(\set{U}), {\cup}, {\cap}, {{}^c}, \emptyset, \set{U} )$. We will call this the \emph{power set Boolean algebra of $\set{U}$}. Take a subset $\setset{S} \subseteq \Pow(\set{U})$ (\ie, a set of subsets of $\set{U}$). 
To say $\setset{S}$ forms a \emph{subalgebra of the power set Boolean algebra of $\set{U}$} means that % \begin{enumerate}[(i)] \item for any $\set{X},\set{Y} \in \setset{S}$, $\set{X} \cap \set{Y} \in \setset{S}$ \label{item:boolean_algebra_closure_intersection} \item for any $\set{X},\set{Y} \in \setset{S}$, $\set{X} \cup \set{Y} \in \setset{S}$ \label{item:boolean_algebra_closure_union} \item for any $\set{X} \in \setset{S}$, $\set{X}^c \in \setset{S}$ \label{item:boolean_algebra_closure_complement} \end{enumerate} % This will ensure that every one of the requirements for a Boolean algebra holds, thus justifying calling $\setset{S}$ a subalgebra of the power set Boolean algebra. Assume that $\setset{S}$ is a subalgebra of the power set Boolean algebra and take $\set{X} \in \setset{S}$. % \begin{itemize} \item By property (\shortref{item:boolean_algebra_closure_complement}), $\set{X}^c \in \setset{S}$. \item By property (\shortref{item:boolean_algebra_closure_intersection}), since $\set{X} \in \setset{S}$ and $\set{X}^c \in \setset{S}$ then $\set{X} \cap \set{X}^c \in \setset{S}$; however, $\set{X} \cap \set{X}^c = \emptyset$ and so $\emptyset \in \setset{S}$. \item Additionally, by property (\shortref{item:boolean_algebra_closure_union}), since $\set{X} \in \setset{S}$ and $\set{X}^c \in \setset{S}$ then $\set{X} \cup \set{X}^c \in \setset{S}$; however, $\set{X} \cup \set{X}^c = \set{U}$ and so $\set{U} \in \setset{S}$. \end{itemize} % Thus, the trivial subalgebra of the power set Boolean algebra is $\{\emptyset,\set{U}\}$. Any subalgebra of the power set Boolean algebra of $\set{U}$ is an \emph{algebra of sets} called an \emph{algebra over $\set{U}$}. If $\setset{S}$ is an algebra over $\set{U}$, then $(\set{U},\setset{S})$ is called a \emph{field of sets} and elements of $\set{U}$ are called \emph{points}. \section{The Numbers} \label{app:math_numbers} Now that we have described the set operations and have introduced basic algebra, we will define numbers and arithmetic. 
First we will revisit the whole numbers and natural numbers in detail, and then use them to build integers and rational numbers. Once rational numbers are defined, we will be able to define distance and limits and use these notions to build the real numbers. A more complete and yet similarly structured discussion of these number systems is given by \citet{Stoll79}. \subsection{Whole Numbers} \label{app:math_whole_numbers} The basis for mathematics is counting. That is, before any argument or analysis can be made quantitative, something must be counted. Mathematics provides the \emph{whole numbers} as an abstract quantity capturing the essence of counting. Each whole number represents how many there are of a particular object. \paragraph{Definition:} We have already introduced the natural numbers and the whole numbers. Recall that the set of the whole numbers is denoted \symdef{Bnumbers.2}{wholes}{$\W$}{the set of the whole numbers (\ie, $\{0,1,2,3,\dots\}$)} and is defined by % \begin{align*} \W \triangleq \{0,1,2,3,\dots\} \end{align*} % and the set of the natural numbers is denoted \symdef{Bnumbers.1}{naturals}{$\N$}{the set of the natural numbers (\ie, $\{1,2,3,\dots\}$)} and defined to be a subset of $\W$, namely % \begin{align*} \N \triangleq \W \setdiff \{0\} = \{1,2,3,\dots\} \end{align*} % where each whole number is defined by % \begin{align*} 0 &= \{\}\\ 1 &= 0 \cup \{0\} = \{0\}\\ 2 &= 1 \cup \{1\} = \{0,1\}\\ 3 &= 2 \cup \{2\} = \{0,1,2\}\\ 4 &= 3 \cup \{3\} = \{0,1,2,3\}\\ &\mathrel{\vdots} \end{align*} % It is important to note that whole numbers are simple sets that carry with them the standard set relations $=$, $\subseteq$, $\supseteq$, $\subset$, and $\supset$. That is, for two whole numbers $x,y \in \W$, it is only the case that $x = y$ if $x \subseteq y$ and $y \subseteq x$. In other words, the \emph{equivalence relation} $=$ on $\W$ is the same equivalence relation defined for sets. 
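The von Neumann construction of the whole numbers can be carried out quite literally, representing each whole number as the set of all smaller whole numbers. The following Python sketch is our own illustration (using `frozenset` so that sets may contain sets):

```python
# Illustrative sketch of the von Neumann whole numbers: 0 = {} and each
# successor is S(x) = x union {x}, so n = {0, 1, ..., n-1}.
def numeral(n):
    """Return the whole number n represented as a frozenset."""
    x = frozenset()                  # 0 = {}
    for _ in range(n):
        x = x | frozenset({x})       # successor: S(x) = x union {x}
    return x

assert numeral(0) == frozenset()
assert numeral(1) == frozenset({numeral(0)})
assert numeral(3) == frozenset({numeral(0), numeral(1), numeral(2)})
assert numeral(2) < numeral(3)       # 2 is a proper subset of 3 ...
assert numeral(2) in numeral(3)      # ... and also an element of 3
```

Note that `<` on `frozenset` is proper-subset comparison, so the final two assertions check exactly the subset and membership relations discussed above.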
Clearly, % \begin{equation} 0 \subseteq 1 \subseteq 2 \subseteq 3 \subseteq 4 \subseteq \cdots \label{eq:whole_number_subseteq_order} \end{equation} % In fact, % \begin{equation} 0 \subset 1 \subset 2 \subset 3 \subset 4 \subset \cdots \label{eq:whole_number_subset_order} \end{equation} % In other words, for any $x,y \in \W$ with $x \subset y$, $x \neq y$. Also, for any two whole numbers $x,y \in \W$ such that $x \subset y$, it is also the case that $x \in y$. \paragraph{Successor Function:} Now define the \emph{successor function} $S: \W \mapsto \N$ by % \begin{align*} S(x) \triangleq x \cup \{x\} \end{align*} % Thus, $S$ is a function that maps any whole number $x$ to its successor $x \cup \{x\}$. For example, $S(0)=1$ and $S(3)=4$. Note that distinct whole numbers have distinct successors, and so this function is injective. Additionally, since every natural number is the successor of some whole number, this function is also surjective and therefore bijective, so its inverse $S^{-1}$ exists. For example, $S^{-1}(1)=0$ and $S^{-1}(4)=3$. \paragraph{Addition:} Define the \emph{addition} operator $+$ so that for any two whole numbers $x,y \in \W$, % \begin{align*} x + 0 \triangleq x \quad \text{and} \quad x + S(y) \triangleq S(x+y) \end{align*} % where the result of an addition is called the \emph{sum}. For example, take $5 + 1$. Since the successor function $S$ is bijective, the right argument $1$ can be rewritten as $S(S^{-1}(1))$ as $S \comp S^{-1}$ is the identity function on $\N$. However, $S^{-1}(1)=0$, and so $5 + 1$ can be rewritten as $5 + S(0)$. By the definition of addition, this is $S(5+0)$. However, also by the definition of addition with $0$, $5+0=5$. Therefore, the result is $S(5)$ or $6$. This process can be applied to any operation $x + y$ where $x,y \in \W$. Note that this operator is \emph{commutative}. That is, for any two whole numbers $x,y \in \W$, it is the case that $x + y = y + x$. Additionally, this operator is \emph{associative}. 
That is, for any three whole numbers $x,y,z \in \W$, it is the case that $x+(y+z)=(x+y)+z$. Also, since $x + 0 = x$ for any whole number $x \in \W$, $0$ is known as the \emph{additive identity} for the whole numbers. \paragraph{Multiplication:} Similarly, define the \emph{multiplication} operator $\times$ so that for any two whole numbers $x,y \in \W$, % \begin{align*} x \times 0 \triangleq 0 \quad \text{and} \quad x \times S(y) \triangleq (x \times y) + x \end{align*} % where the parentheses indicate that $(x \times y)$ should be viewed as the left argument of the addition operator. Parentheses will often be used to instruct that an arithmetic operation should occur first; otherwise operations will occur from left to right. The result of a multiplication is known as a \emph{product}. This definition of multiplication can be applied in a similar fashion as the definition for addition above. For example, for any whole number $x \in \W$ % \begin{align*} x \times 1 &= x \times S(S^{-1}(1))\\ &= x \times S(0)\\ &= (x \times 0) + x\\ &= 0 + x\\ &= x\\ \end{align*} % This is why $1$ is known as the \emph{multiplicative identity} for the whole numbers. It can also be shown that the multiplication operator is \emph{commutative}; that is, for any two whole numbers $x,y \in \W$, it is the case that $x \times y = y \times x$. Additionally, this operator is \emph{associative}. That is, for any three whole numbers $x,y,z \in \W$, it is the case that $x \times ( y \times z ) = (x \times y) \times z$. Finally, note that if there are two whole numbers $x,y \in \W$ such that $x \times y = 0$, it must be that $x=0$, $y=0$, or both. When an operation involves both multiplication and addition, the multiplication operations should occur first unless grouping symbols like parentheses indicate that certain operations should occur first. 
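The recursive definitions of addition and multiplication can be transcribed almost verbatim. In the Python sketch below (our own illustration; `succ` and `pred` stand in for $S$ and $S^{-1}$ over ordinary machine integers rather than the set construction):

```python
# Illustrative sketch of the recursive definitions:
#   x + 0 = x        and  x + S(y) = S(x + y)
#   x * 0 = 0        and  x * S(y) = (x * y) + x
def succ(x): return x + 1    # stands in for S(x) = x union {x}
def pred(y): return y - 1    # stands in for S^{-1}(y), defined for y >= 1

def add(x, y):
    return x if y == 0 else succ(add(x, pred(y)))

def mul(x, y):
    return 0 if y == 0 else add(mul(x, pred(y)), x)

assert add(5, 1) == 6                     # the worked example: S(5 + 0) = S(5)
assert add(3, 4) == add(4, 3) == 7        # commutativity (spot check)
assert mul(4, 0) == 0 and mul(4, 1) == 4  # 0 annihilates, 1 is the identity
assert mul(3, 5) == 15
assert mul(2, add(3, 4)) == add(mul(2, 3), mul(2, 4))  # distributivity
```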
Additionally, it can be shown that for any three whole numbers $x,y,z \in \W$,
%
\begin{equation*}
x \times (y + z) = x \times y + x \times z
\end{equation*}
%
This is the \emph{distributive} property of whole number multiplication. Also note that for any two whole numbers $x,y \in \W$, the notation $xy$ or $x \cdot y$ is equivalent to $x \times y$. Unfortunately, the use of $\times$ for multiplication creates some ambiguity with the Cartesian product. However, it is rare to take the Cartesian product of two whole numbers.
\paragraph{Exponentiation:} Now that multiplication has been defined for the whole numbers, exponentiation can also be defined. For any whole numbers $x, y, a, b \in \W$, exponentiation of the whole numbers is such that
%
\begin{align*}
x^0 &\triangleq 1\\
x^1 &\triangleq x\\
x^{a+b} &\triangleq x^a \times x^b\\
(x^a)^b &\triangleq x^{a \times b}\\
(x \times y)^a &\triangleq x^a \times y^a
\end{align*}
%
For example, take a whole number $x \in \W$ and the exponentiation $x^3$. The following represents the successive steps that can be used to derive an equivalent expression for $x^3$ that does not involve exponentiation.
%
\begin{align*}
x^3 &= x^{2+1}\\
&= x^2 x^1\\
&= x^2 x\\
&= x^{1+1} x\\
&= x^1 x^1 x\\
&= x^1 x x\\
&= x x x
\end{align*}
%
In this case, the exponentiation $x^3$ is thus a shorthand for $x \times x \times x$. Note that for any $x \in \W$, $x^2 \geq 0$.
\paragraph{Even and Odd Whole Numbers:} Take whole number $y \in \W$.
%
\begin{itemize}
\item If it is the case that there exists another whole number $x \in \W$ such that $y = 2x$ then $y$ is called an \emph{even} whole number.
\item If it is the case that there exists another whole number $x \in \W$ such that $y = 2x+1$ then $y$ is called an \emph{odd} whole number.
\end{itemize}
%
It can be shown that for every whole number $y \in \W$, $y$ is either an even number or an odd number but not both.
That is, the sets % \begin{equation*} \W_E \triangleq \{ w \in \W : w \text{ is even} \} \quad \text{ and } \quad \W_O \triangleq \{ w \in \W : w \text{ is odd} \} \end{equation*} % are mutually exclusive and collectively exhaustive in $\W$ (\ie, $\W_E \cap \W_O = \emptyset$ and $\W_E \cup \W_O = \W$). Therefore, $\{ \W_E, \W_O \}$ is a partition of $\W$. Assume that $x,y \in \W$. Also assume that $x$ is even. Thus, there exists a $z \in \W$ such that $x = 2z$. Therefore, % \begin{equation*} x y = (2z) y = 2zy = 2 (zy) \end{equation*} % That is, $xy$ must also be an even whole number. Now take $x,y \in \W$ as before, but assume that $x$ and $y$ are both odd. Then there must exist $v,w \in \W$ such that $x = 2v+1$ and $y = 2w+1$. Then % \begin{align*} x y &= (2v + 1)(2w + 1) = 2v2w + 2v + 2w + 1 = 4vw + 2v + 2w + 1\\ &= 2(vw + v + w) + 1 \end{align*} % Therefore $xy$ must also be an odd number. To summarize, % \begin{itemize} \item The product of two odd whole numbers is odd. \item The product of an even whole number with any other whole number is even. \end{itemize} \paragraph{Total Ordering:} Now that addition has been defined, a \emph{total order} can be defined on the whole numbers. For any two whole numbers $x,y \in \W$, it is said that $x$ is less than or equal to $y$ (denoted $x \leq y$) if there exists another whole number $z \in \W$ such that $x + z = y$, and it is said that $x$ is strictly less than $y$ (denoted $x < y$) if there exists a natural number $z \in \N$ such that $x + z = y$. Note that the phrases $x \leq y$ and $x < y$ can be written $y \geq x$ and $y > x$ respectively. In this case, the symbol $>$ ($\geq$) represents a greater than (or equal to) relationship. 
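These order definitions are directly executable; the following is a small Python sketch (illustrative only) that decides $x \leq y$ and $x < y$ by searching for the witness $z$ from the definitions.

```python
def leq(x, y):
    # x ≤ y iff there exists z ∈ W = {0, 1, 2, ...} with x + z = y
    return any(x + z == y for z in range(y + 1))

def lt(x, y):
    # x < y iff there exists z ∈ N = {1, 2, 3, ...} with x + z = y
    return any(x + z == y for z in range(1, y + 1))
```

For example, `leq(3, 3)` holds via the witness $z = 0$, while `lt(3, 3)` fails because no natural-number witness exists.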
Note that
%
\begin{equation*}
0 \leq 1 \leq 2 \leq 3 \leq 4 \leq \cdots
\end{equation*}
%
and, in fact,
%
\begin{equation*}
0 < 1 < 2 < 3 < 4 < \cdots
\end{equation*}
%
Recall the subset relationships in \longrefs{eq:whole_number_subseteq_order} and \shortref{eq:whole_number_subset_order}. Clearly, the \emph{inequality order relations} $\leq$ and $<$ for the whole numbers have been constructed to match the relationships already in place by $\subseteq$ and $\subset$ respectively. In fact, it is the case that for any two whole numbers $x,y \in \W$, $x \leq y$ if and only if $x \subseteq y$, and similarly for any two whole numbers $x,y \in \W$, $x < y$ if and only if $x \subset y$.
\paragraph{Lack of Dense Ordering:} Note that for $2$ and $3$, it is the case that $2 < 3$; however, there is no whole number $z \in \W$ such that $2 < z < 3$. This can be shown analytically by using the definition of the successor function. Because of this, $\W$ cannot be densely ordered. In fact, $\N$ is also not densely ordered for the same reason.
\paragraph{Gaplessness:} It is easy to show that both $(\W,{\leq})$ and $(\N,{\leq})$ are \emph{gapless}. Of course, since $\W$ and $\N$ both lack an upper bound, neither is complete.
\paragraph{Existence of Minima and Maxima:} Because both $(\W,{\leq})$ and $(\N,{\leq})$ are gapless and \emph{not} densely ordered, any non-empty subset that is bounded from above has a maximum element and any non-empty subset that is bounded from below has a minimum element. Moreover, because $\W$ is bounded from below by $0$ and $\N \subseteq \W$, every subset of $\N$ or $\W$ is bounded from below; in particular, $\N$ itself has a minimum element, namely $1$. Therefore, all non-empty subsets of either $\N$ or $\W$ must have minimum elements. This important fact will be used in \longref{app:math_countability_and_order} to show that nontrivial densely ordered sets that are gapless (\eg, the \emph{real numbers} discussed in \longref{app:math_reals}) must also be uncountable.
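The claim that $\leq$ and $<$ on the whole numbers coincide with $\subseteq$, $\subset$, and membership on their set representations can be checked directly for small cases. A Python sketch (illustrative only) under the von Neumann encoding:

```python
def whole(n):
    # von Neumann encoding: 0 is the empty set and S(x) = x ∪ {x}
    x = frozenset()
    for _ in range(n):
        x = x | frozenset({x})
    return x

# m ≤ n iff whole(m) ⊆ whole(n); m < n iff whole(m) ⊂ whole(n) iff whole(m) ∈ whole(n)
assert all((m <= n) == whole(m).issubset(whole(n)) for m in range(8) for n in range(8))
assert all((m < n) == (whole(m) < whole(n)) for m in range(8) for n in range(8))
assert all((m < n) == (whole(m) in whole(n)) for m in range(8) for n in range(8))
```

Here `frozenset`'s `<` operator tests for a proper subset, matching the strict order $\subset$.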
\paragraph{Cardinal Arithmetic:} Now that arithmetic has been defined for $\W$, we state without proof that for any two \emph{finite} sets $\set{X}$ and $\set{Y}$,
%
\begin{align*}
|\set{X} \times \set{Y}| &= |\set{X}||\set{Y}|\\
|\set{X}^\set{Y}| &= |\set{X}|^{|\set{Y}|}
\end{align*}
%
For example, $\{(0,0),(0,1),(1,0),(1,1)\}$ can be represented as $\{0,1\} \times \{0,1\}$ or as $\{0,1\}^{\{0,1\}}$ (\ie, $\{0,1\}^2$). Clearly this set has cardinality $4$, and the cardinality of both of these representations is also $4$ by the above two rules. If there are two \emph{finite} sets $\set{X}$ and $\set{Y}$ such that $\set{X} \cap \set{Y} = \emptyset$ (\ie, they have no shared elements), then
%
\begin{align*}
|\set{X} \cup \set{Y}| &= |\set{X}|+|\set{Y}|
\end{align*}
%
If the intersection of finite sets $\set{X}$ and $\set{Y}$ is not empty then this only provides a bound on the union's cardinality. That is, in general for any two finite sets $\set{X}$ and $\set{Y}$, it is always the case that
%
\begin{align*}
|\set{X} \cup \set{Y}| &\leq |\set{X}|+|\set{Y}|
\end{align*}
%
For example, the union $\{0,1\} \cup \{2,3\}$ has cardinality $4$ while union $\{0,1\} \cup \{0,1\}$ has cardinality $2$, which is less than $4$. Cardinal arithmetic can be extended to infinite sets as well; however, the cardinality of infinite sets is not critically important to this work. Recall that for any set $\set{X}$, the power set $\Pow(\set{X})$ is congruent to the set $2^\set{X}$; therefore, since $|2|=2$, the cardinality $|\Pow(\set{X})|=2^{|\set{X}|}$. It can be shown that for any whole number $x \in \W$, $2^x \geq x$. This means that for any set $\set{X}$, the power set $2^\set{X}$ (\ie, $\Pow(\set{X})$) has a cardinality greater than or equal to the cardinality of set $\set{X}$.
That is, for any set $\set{X}$,
%
\begin{align*}
|\Pow(\set{X})| \geq |\set{X}|
\end{align*}
%
In fact, for any set $\set{X}$, the cardinality of the power set $\Pow(\set{X})$ is strictly greater than the cardinality of set $\set{X}$. That is, for any set $\set{X}$,
%
\begin{align*}
|\Pow(\set{X})| > |\set{X}|
\end{align*}
%
(Even the empty set obeys this; $|\Pow(\emptyset)| = |\{\emptyset\}| = 1 > 0$.) In other words, every set has strictly smaller cardinality than its power set.
\paragraph{Algebraic Structure of the Whole Numbers:} Note that for $(\W,{+},0)$, it is the case that
%
\begin{itemize}
\item for all $x,y \in \W$, $x + y = y + x$
\item for all $x,y,z \in \W$, $(x + y) + z = x + (y + z)$
\item for all $x \in \W$, $0 + x = x + 0 = x$
\end{itemize}
%
and for $(\W,{\times},1)$, it is the case that
%
\begin{itemize}
\item for all $x,y \in \W$, $x \times y = y \times x$
\item for all $x,y,z \in \W$, $(x \times y) \times z = x \times (y \times z)$
\item for all $x \in \W$, $1 \times x = x \times 1 = x$
\end{itemize}
%
And so for $(\W,{+},{\times},0,1)$,
%
\begin{itemize}
\item $(\W,{+},0)$ is a \emph{commutative monoid}
\item $(\W,{\times},1)$ is a \emph{commutative monoid}
\item for each $x,y,z \in \W$, $x(y + z) = xy + xz$ and $(x + y)z = xz + yz$
\end{itemize}
%
Therefore, $(\W,{+},{\times},0,1)$ is a \emph{commutative semiring}. Unless otherwise noted, whenever $\W$ is used, it is assumed that it is equipped with operators $+$ and $\times$ and order relation $\leq$; in other words, $\W$ is implicitly taken to be $(\W,{+},{\times},0,1,{\leq})$.
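The finite cardinal-arithmetic rules stated earlier (for Cartesian products, function sets, unions, and power sets) lend themselves to direct verification. A Python sketch (illustrative only) with small example sets:

```python
from itertools import combinations, product

X, Y = {0, 1, 2}, {3, 4, 5, 6}

# |X × Y| = |X| |Y|
assert len(set(product(X, Y))) == len(X) * len(Y)

# |X^Y| = |X|^|Y|: functions from Y to X, encoded as tuples of values
functions = set(product(X, repeat=len(Y)))
assert len(functions) == len(X) ** len(Y)

# |X ∪ Y| = |X| + |Y| for disjoint sets; in general |X ∪ Y| ≤ |X| + |Y|
assert len(X | Y) == len(X) + len(Y)      # X ∩ Y = ∅ here
assert len(X | X) <= len(X) + len(X)

# |Pow(X)| = 2^|X| > |X|
powerset = [frozenset(c) for r in range(len(X) + 1)
            for c in combinations(X, r)]
assert len(powerset) == 2 ** len(X) > len(X)
```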
\paragraph{Algebraic Structure of the Natural Numbers:} Note that for the magma $(\N,{+})$, it is the case that
%
\begin{itemize}
\item for all $x,y \in \N$, $x + y = y + x$
\item for all $x,y,z \in \N$, $(x + y) + z = x + (y + z)$
\end{itemize}
%
and for $(\N,{\times},1)$, it is the case that
%
\begin{itemize}
\item for all $x,y \in \N$, $x \times y = y \times x$
\item for all $x,y,z \in \N$, $(x \times y) \times z = x \times (y \times z)$
\item for all $x \in \N$, $1 \times x = x \times 1 = x$
\end{itemize}
%
Therefore, $(\N,{+})$ is a \emph{commutative semigroup} and $(\N,{\times},1)$ is a \emph{commutative monoid}. Since $(\N,{+})$ has no identity element, there is no structure identified with $(\N,{+},{\times})$. Unless otherwise noted, whenever $\N$ is used, it is assumed that it is equipped with operators $+$ and $\times$ and order relation $\leq$; in other words, $\N$ is implicitly taken to be $(\N,{+},{\times},{\leq})$ (with multiplicative identity $1$).
\subsection{Integers} Now that whole numbers have been defined so that items can be counted, it is useful to define a number system that can compare two quantities. That is, while the whole numbers are ordered, it is useful to further quantify where in the ordering each whole number sits with respect to some other whole number. In other words, a common framework for describing some sort of distance between whole numbers is needed. This common framework comes in the form of the \emph{integers}.
\paragraph{Definition:} Just as the whole numbers are defined to be sets of other whole numbers, the \emph{integers} are defined to be equivalence classes of ordered pairs of two whole numbers.
In particular, define the equivalence relation $=$ on $\W \times \W$ so that for whole numbers $p,q,r,s \in \W$, the elements $(p,q),(r,s) \in \W \times \W$ are \emph{equal} if and only if
%
\begin{equation}
p+s = q+r
\label{eq:integer_equivalence_relation}
\end{equation}
%
Each integer is then defined as an equivalence class $[(p,q)]$ where $(p,q) \in \W \times \W$. That is, the set of integers \symdef{Bnumbers.3}{integers}{$\Z$}{the set of the integers (\ie, $\{\dots,-3,-2,-1,0,1,2,3,\dots\}$)} is defined to be the quotient set
%
\begin{equation*}
\Z \triangleq (\W \times \W)/{=}
\end{equation*}
%
where the equivalence relation $=$ is given by \longref{eq:integer_equivalence_relation}.
\paragraph{Symbols:} For every natural number $p \in \N$, define two symbols $p^*$ and $-p^*$. The symbol $p^*$ represents the integer that includes $(p,0)$ in its equivalence class. Integers of this form are called \emph{positive integers}. The symbol $-p^*$ represents the integer that includes $(0,p)$ in its equivalence class. Integers of this form are called \emph{negative integers}. For the whole number $0$, define the symbol $0^*$, representing the integer that includes $(0,0)$ in its equivalence class.
In other words, define the symbols
%
\begin{align*}
&\mathrel{\vdots}\\
-q^* &\triangleq [(0,q)] = \{ (p,p+q): \text{ for all $p \in \W$} \} \text{ for all $q \in \N$}\\
&\mathrel{\vdots}\\
-3^* &\triangleq [(0,3)] = \{ (p,p+3): \text{ for all $p \in \W$} \}\\
-2^* &\triangleq [(0,2)] = \{ (0,2), (1,3), (2,4), (3,5), \dots \}\\
-1^* &\triangleq [(0,1)] = \{ (0,1), (1,2), (2,3), (3,4), \dots \}\\
0^* &\triangleq [(0,0)] = \{ (0,0), (1,1), (2,2), (3,3), \dots \}\\
1^* &\triangleq [(1,0)] = \{ (1,0), (2,1), (3,2), (4,3), \dots \}\\
2^* &\triangleq [(2,0)] = \{ (2,0), (3,1), (4,2), (5,3), \dots \}\\
3^* &\triangleq [(3,0)] = \{ (3+q,q): \text{ for all $q \in \W$} \}\\
&\mathrel{\vdots}\\
p^* &\triangleq [(p,0)] = \{ (p+q,q): \text{ for all $q \in \W$} \} \text{ for all $p \in \N$}\\
&\mathrel{\vdots}
\end{align*}
%
where the notation $[\cdot]$ indicates an equivalence class. As a review of equivalence classes, note that since both of the equivalence classes $[(1,3)]$ and $[(7,9)]$ include $(0,2)$ as an element, they are both equal to each other and are also both equal to the equivalence class $[(0,2)]$, which we have defined to be the symbol ${-2}^*$. Note that we will justify removing the $*$ superscript later (\ie, replacing $0^*$ by the symbol $0$) to make these symbols more familiar. Now that we have defined these symbols, it is clear that the set of the integers $\Z$ can also be expressed as
%
\begin{align}
\Z &= \{ p^* : \text{ for all } p \in \N \} \cup \{ -p^* : \text{ for all } p \in \N \} \cup \{ p^* : p = 0 \} \label{eq:integers_with_stars}\\
&= \{ \cdots, -4^*, -3^*, -2^*, -1^*, 0^*, 1^*, 2^*, 3^*, 4^*, \cdots \} \label{eq:integers_with_symbols}
\end{align}
\paragraph{Countability:} These integers in \longref{eq:integers_with_symbols} are listed with no starting point. That is, the pattern continues with no end to the left and to the right. However, they can be rewritten in an order that starts at $0^*$.
In \longref{tab:integers_and_naturals}, the integers are listed horizontally above a list of natural numbers.
%
\begin{table}[!ht]\centering
\begin{tabular}{|l|cccccccccc|}
\hline
Integers: & $0^*$ & ${-1}^*$ & $1^*$ & ${-2}^*$ & $2^*$ & ${-3}^*$ & $3^*$ & ${-4}^*$ & $4^*$ & $\cdots$ \\
Natural Numbers: & $1$ & $2$ & $3$ & $4$ & $5$ & $6$ & $7$ & $8$ & $9$ & $\cdots$ \\
\hline
\end{tabular}
\caption{Integers listed alongside natural numbers.}
\label{tab:integers_and_naturals}
\end{table}
%
This simple pattern can be continued \adinfinitum{}, matching up exactly one integer to exactly one natural number. Therefore, a bijection exists between the integers and the natural numbers. That is, define a function $f: \Z \mapsto \N$ with
%
\begin{equation*}
f \triangleq \{(0^*,1),(-1^*,2),(1^*,3),(-2^*,4),(2^*,5),\cdots\}
\end{equation*}
%
and its corresponding inverse $f^{-1}: \N \mapsto \Z$ with
%
\begin{equation*}
f^{-1} \triangleq \{(1,0^*),(2,-1^*),(3,1^*),(4,-2^*),(5,2^*),\cdots\}
\end{equation*}
%
Therefore, a bijection exists between $\Z$ and $\N$, and so those two sets are congruent. Any set congruent to the natural numbers is countably infinite. Therefore, the integers are countably infinite. It is interesting that $\Z \cong \N$ because (due to our choice of familiar symbols to represent each integer) it appears as if the integers are somehow twice as large as the natural numbers; however, this is not the case.
\paragraph{Total Ordering:} Take four whole numbers $p,q,r,s$ that are the left and right projections of two integers $[(p,q)]$ and $[(r,s)]$.
The integer $[(p,q)]$ is said to be less than or equal to integer $[(r,s)]$ (denoted $[(p,q)] \leq [(r,s)]$) if and only if
%
\begin{equation*}
p+s \leq q+r
\end{equation*}
%
Similarly, $[(p,q)]$ is strictly less than $[(r,s)]$ (denoted $[(p,q)] < [(r,s)]$) if and only if
%
\begin{equation*}
p+s < q+r
\end{equation*}
%
Just as with the related inequality relation on the whole numbers, $[(p,q)] \leq [(r,s)]$ can also be denoted $[(r,s)] \geq [(p,q)]$, and $[(p,q)] < [(r,s)]$ can also be denoted $[(r,s)] > [(p,q)]$. In these cases, $>$ ($\geq$) represents that an integer is greater than (or equal to) another integer. This ordering implies that
%
\begin{equation*}
\cdots \leq -5^* \leq -4^* \leq -3^* \leq -2^* \leq -1^* \leq 0^* \leq 1^* \leq 2^* \leq 3^* \leq 4^* \leq 5^* \leq \cdots
\end{equation*}
%
and, in fact,
%
\begin{equation*}
\cdots < -5^* < -4^* < -3^* < -2^* < -1^* < 0^* < 1^* < 2^* < 3^* < 4^* < 5^* < \cdots
\end{equation*}
%
We refer to any integer greater than $0^*$ as \emph{positive} and any integer less than $0^*$ as \emph{negative}. The \emph{non-negative integers} are the positive integers and $0^*$ (\ie, the complement of the negative integers). The \emph{non-positive integers} are the negative integers and $0^*$ (\ie, the complement of the positive integers). The \emph{non-zero integers} are all of the integers except for $0^*$ (\ie, $\Z \setdiff \{0^*\}$, the complement of $\{0^*\}$).
\paragraph{Lack of Dense Ordering:} Note that for $2^*$ and $3^*$, it is the case that $2^* < 3^*$; however, there is no integer $z \in \Z$ such that $2^* < z < 3^*$. This can be shown analytically by using the definition of the integer and the lack of dense ordering of the whole numbers. Because of this, $\Z$ cannot be densely ordered.
\paragraph{Gaplessness:} It is easy to show that $(\Z,{\leq})$ is \emph{gapless}. Of course, since $\Z$ lacks both an upper and a lower bound, it is not complete.
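The pair construction is easy to experiment with. The following Python sketch (illustrative only, using Python's built-in arithmetic for the whole-number components) implements the equivalence relation of \longref{eq:integer_equivalence_relation} and the order relation just defined.

```python
def int_eq(pq, rs):
    # [(p,q)] = [(r,s)] iff p + s = q + r
    (p, q), (r, s) = pq, rs
    return p + s == q + r

def int_leq(pq, rs):
    # [(p,q)] ≤ [(r,s)] iff p + s ≤ q + r
    (p, q), (r, s) = pq, rs
    return p + s <= q + r

def canonical(pq):
    # canonical representative of a class: (p - q, 0) when p ≥ q, else (0, q - p)
    p, q = pq
    return (p - q, 0) if p >= q else (0, q - p)
```

For example, `int_eq((1, 3), (7, 9))` holds, matching the observation that both classes contain $(0,2)$ and equal $-2^*$; and `int_leq((0, 2), (3, 0))` reflects $-2^* \leq 3^*$.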
\paragraph{Existence of Minima and Maxima:} Because $(\Z,{\leq})$ is gapless and \emph{not} densely ordered, any non-empty subset that is bounded from above has a maximum element and any non-empty subset that is bounded from below has a minimum element.
\paragraph{Addition:} Again, take four whole numbers $p,q,r,s \in \W$ that make up two integers $[(p,q)],[(r,s)] \in \Z$. The addition of these two integers is defined as
%
\begin{equation*}
[(p,q)] + [(r,s)] \triangleq [(p+r,q+s)]
\end{equation*}
%
where the result of an addition is called the \emph{sum}. For example,
%
\begin{align*}
5^* + 6^* &= [(5,0)]+[(8,2)]\\
&= [(5+8,0+2)]\\
&= [(13,2)]\\
&= 11^*
\end{align*}
%
and, similarly,
%
\begin{align*}
6^* + {-8}^* &= [(7,1)]+[(2,10)]\\
&= [(7+2,1+10)]\\
&= [(9,11)]\\
&= {-2}^*
\end{align*}
%
where the last steps in both of these examples are justified by the equivalence relation in \longref{eq:integer_equivalence_relation}. Take whole numbers $p,q,r,s \in \W$ making up integers $[(p,q)],[(r,s)] \in \Z$. Note that it is the case that
%
\begin{align*}
[(r,s)] + [(p,q)] &= [(r+p,s+q)]\\
&= [(p+r,q+s)]\\
&= [(p,q)] + [(r,s)]
\end{align*}
%
In other words, integer addition is \emph{commutative}, so for any two integers $x,y \in \Z$, $x+y=y+x$. Additionally, take whole numbers $p,q,r,s,t,u \in \W$ making up integers $[(p,q)]$, $[(r,s)]$, and $[(t,u)]$. Note that grouping symbols like parentheses also indicate that the operator surrounded by them should be calculated first rather than following the normal left-to-right operation. Thus,
%
\begin{align*}
[(p,q)] + ( [(r,s)]+[(t,u)] ) &= [(p,q)] + [(r+t,s+u)]\\
&= [(p+(r+t),q+(s+u))]\\
&= [((p+r)+t,(q+s)+u)]\\
&= [(p+r,q+s)]+[(t,u)]\\
&= ( [(p,q)]+[(r,s)] )+[(t,u)]
\end{align*}
%
In other words, integer addition is also \emph{associative}, so for any three integers $x,y,z \in \Z$, $x+(y+z)=(x+y)+z$. Now take whole numbers $p,q,r \in \W$ that make integers $[(p,q)]$ and $[(r,r)]$ (\ie, $0^*$).
It is the case that
%
\begin{align*}
[(p,q)]+[(r,r)] &= [(p+r,q+r)]\\
&= [(p,q)]
\end{align*}
%
where the last step is justified by the definition of equality for integers given in \longref{eq:integer_equivalence_relation}. That is, because $p+r+q=q+r+p$, $[(p+r,q+r)]=[(p,q)]$. Therefore, for any integer $z \in \Z$, $z + 0^* = z$. Thus, $0^*$ is the \emph{additive identity} for the integers. Also note that for any two whole numbers $p,q \in \W$, the integer addition
%
\begin{align*}
[(p,q)]+[(q,p)] &=[(p+q,q+p)]\\
&=[(p+q,p+q)]\\
&=0^*
\end{align*}
%
Therefore, the \emph{additive inverse} of $[(p,q)]$ is $[(q,p)]$. That is,
%
\begin{align*}
0^* + 0^* &= [(p,p)]+[(q,q)] = [(p+q,p+q)] = 0^*\\
1^* + {-1}^* &= [(p+1,p)]+[(q,q+1)]\\
&= [(p+1+q,p+q+1)] = [(p+q+1,p+q+1)] = 0^*\\
2^* + {-2}^* &= [(p+2,p)]+[(q,q+2)]\\
&= [(p+2+q,p+q+2)] = [(p+q+2,p+q+2)] = 0^*\\
3^* + {-3}^* &= [(p+3,p)]+[(q,q+3)]\\
&= [(p+3+q,p+q+3)] = [(p+q+3,p+q+3)] = 0^*\\
&\mathrel{\vdots}
\end{align*}
%
So $0^*$ is its own additive inverse, $-1^*$ is the additive inverse of $1^*$, $-2^*$ is the additive inverse of $2^*$, and so on. In other words, the familiar symbols chosen to represent the integers have been named so that each positive symbol and its negative counterpart denote additive inverses of each other.
\paragraph{Subtraction:} Motivated by the additive inverse of an integer, the \emph{subtraction} operator $-$ is defined so that for any four whole numbers $p,q,r,s \in \W$, the subtraction of two integers is
%
\begin{align*}
[(p,q)] - [(r,s)] &\triangleq [(p,q)]+[(s,r)]\\
&= [(p+s,q+r)]
\end{align*}
%
For example,
%
\begin{align*}
2^* - 5^* &= [(2,0)] - [(9,4)]\\
&= [(2,0)] + [(4,9)]\\
&= [(2+4,0+9)]\\
&= [(6,9)]\\
&= -3^*
\end{align*}
%
Note that
%
\begin{align*}
5^* - 2^* &= [(6,1)] - [(3,1)]\\
&= [(6,1)] + [(1,3)]\\
&= [(6+1,1+3)]\\
&= [(7,4)]\\
&= 3^*
\end{align*}
%
Thus, $5^*-2^*$ is the additive inverse of $2^*-5^*$. In fact, for any two integers $x,y \in \Z$, $x-y$ is the additive inverse of $y-x$.
This indicates that this operator is \emph{not} commutative. Additionally, we state without proof that this operator is also \emph{not} associative. Also note that for integers $x,y,y_i \in \Z$ where $y_i$ is the additive inverse of $y$, it is the case that $x-y=x+y_i$. For example, $5^*-2^*=5^*+{-2}^*$.
\paragraph{Multiplication:} As before, take four whole numbers $p,q,r,s \in \W$ that make up two integers $[(p,q)],[(r,s)] \in \Z$. The \emph{multiplication} of these two integers is defined as
%
\begin{equation*}
[(p,q)] \times [(r,s)] \triangleq [(p \times r + q \times s, p \times s + q \times r)]
\end{equation*}
%
and the result is called their \emph{product}. For example,
%
\begin{align*}
-2^* \times 5^* &= [(3,5)] \times [(5,0)]\\
&= [(3 \times 5 + 5 \times 0, 3 \times 0 + 5 \times 5)]\\
&= [(15 + 0, 0 + 25)]\\
&= [(15, 25)]\\
&= -10^*
\end{align*}
%
Also note that for any two integers $x,y \in \Z$, the notations $x \times y$, $x \cdot y$, and $xy$ are all equivalent. For those same $(p,q)$ and $(r,s)$ from above, it is also the case that
%
\begin{align*}
[(r,s)] \times [(p,q)] &= [(r \times p + s \times q, r \times q + s \times p)]\\
&= [(p \times r + q \times s, q \times r + p \times s)]\\
&= [(p \times r + q \times s, p \times s + q \times r)]\\
&= [(p,q)] \times [(r,s)]
\end{align*}
%
Therefore, integer multiplication is \emph{commutative}. In other words, for any two integers $x,y \in \Z$, it is the case that $x \times y = y \times x$. Similarly, for $p,q,r,s,t,u \in \W$ making up $[(p,q)],[(r,s)],[(t,u)] \in \Z$,
%
\begin{align*}
[(p,q)] \times ( [(r,s)] \times [(t,u)] ) &= [(p,q)] \times [(rt+su,ru+st)]\\
&= [(p(rt+su)+q(ru+st),p(ru+st)+q(rt+su))]\\
&= [(prt+psu+qru+qst,pru+pst+qrt+qsu)]\\
&= [(prt+qst+psu+qru,pst+qrt+qsu+pru)]\\
&= [((pr+qs)t+(ps+qr)u,(ps+qr)t+(qs+pr)u)]\\
&= [((pr+qs)t+(ps+qr)u,(ps+qr)t+(pr+qs)u)]\\
&= [(pr+qs,ps+qr)] \times [(t,u)]\\
&= ( [(p,q)] \times [(r,s)] ) \times [(t,u)]
\end{align*}
%
Therefore, integer multiplication is \emph{associative}.
In other words, for any three integers $x,y,z \in \Z$, it is the case that $x \times ( y \times z) = ( x \times y ) \times z$.
\paragraph{Multiplication Notables:} Take three whole numbers $p,q,r \in \W$ that make up the integers $[(p,q)]$ and $[(r,r)]$ (\ie, the integer $0^*$). It is the case that
%
\begin{align*}
[(p,q)] \times [(r,r)] &= [(pr+qr,pr+qr)]\\
&= 0^*
\end{align*}
%
where the last step is justified by \longref{eq:integer_equivalence_relation}. In fact, for any two integers $x,y \in \Z$, if $x \times y = 0^*$ then it must be that $x=0^*$, $y=0^*$, or both. Now take the integers $[(p,q)]$ and $[(r+1,r)]$ (\ie, the integer $1^*$). Note that
%
\begin{align*}
[(p,q)] \times [(r+1,r)] &= [(p(r+1)+qr,pr+q(r+1))]\\
&= [(pr+p+qr,pr+qr+q)]\\
&= [(pr+qr+p,pr+qr+q)]\\
&= [(p,q)]
\end{align*}
%
where again the last step is justified by \longref{eq:integer_equivalence_relation}. Therefore, the integer $1^*$ is the \emph{multiplicative identity}, and so for any integer $x \in \Z$, it is the case that $x \times 1^* = x$. Now take the integers $[(p,q)]$ and $[(r,r+1)]$ (\ie, the integer ${-1}^*$). It is such that
%
\begin{align*}
[(p,q)] \times [(r,r+1)] &= [(pr+q(r+1),p(r+1)+qr)]\\
&= [(pr+qr+q,pr+p+qr)]\\
&= [(pr+qr+q,pr+qr+p)]\\
&= [(q,p)]
\end{align*}
%
Therefore, for any integer $x \in \Z$, the multiplication ${-1}^* \times x$ is equivalent to the additive inverse of $x$. Because this operation is very useful, it has the shorthand ${-x}$. This is also consistent with the naming of the symbols that represent each integer (\eg, $3^*$ and its additive inverse ${-3}^*$).
\paragraph{Subtraction:} For any two integers $x,y \in \Z$, the \emph{subtraction} operator $-$ is defined so that
%
\begin{equation*}
x - y \triangleq x + ({-y})
\end{equation*}
%
where $-y$ is a shorthand for $-1^* \times y$, which results in the additive inverse of $y$. And thus, for any integer $x \in \Z$, it is the case that $x \times {-1}^* = -x$.
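The pair definitions of integer addition, additive inverse, and multiplication can likewise be executed directly. A Python sketch (illustrative only):

```python
def int_eq(pq, rs):
    # [(p,q)] = [(r,s)] iff p + s = q + r
    (p, q), (r, s) = pq, rs
    return p + s == q + r

def int_add(pq, rs):
    # [(p,q)] + [(r,s)] = [(p+r, q+s)]
    (p, q), (r, s) = pq, rs
    return (p + r, q + s)

def int_neg(pq):
    # the additive inverse of [(p,q)] is [(q,p)]
    p, q = pq
    return (q, p)

def int_mul(pq, rs):
    # [(p,q)] × [(r,s)] = [(pr + qs, ps + qr)]
    (p, q), (r, s) = pq, rs
    return (p * r + q * s, p * s + q * r)
```

For example, `int_add((5, 0), (8, 2))` gives `(13, 2)`, equivalent to $11^*$, and `int_mul((3, 5), (5, 0))` gives `(15, 25)`, equivalent to $-10^*$, matching the worked examples above; multiplying by $(0,1)$ (\ie, ${-1}^*$) returns the additive inverse.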
\paragraph{Absolute Value and Signum:} For any integer $x \in \Z$, denote its \emph{absolute value} with the notation $|x|$ defined by
%
\begin{equation*}
|x| \triangleq
\begin{cases}
x &\text{if } x \geq 0^*\\
-x &\text{if } x < 0^*
\end{cases}
\end{equation*}
%
and define the \emph{signum function} (also called the \emph{sign function}, not to be confused with the \emph{sine function}) $\sgn: \Z \mapsto \{-1^*,0^*,1^*\}$ with
%
\begin{equation*}
\sgn(x) \triangleq
\begin{cases}
-1^* &\text{if } x < 0^*\\
0^* &\text{if } x = 0^*\\
1^* &\text{if } x > 0^*
\end{cases}
\end{equation*}
%
Therefore, any integer $z \in \Z$ can be represented as a magnitude (\ie, absolute value $|z|$) and a sign (\ie, $\sgn(z)$), as in
%
\begin{equation*}
z = \sgn(z) \times |z|
\end{equation*}
%
Note that the absolute value has some special properties. In particular, for any two integers $x,y \in \Z$,
%
\begin{itemize}
\item $|x| \geq 0^*$
\item $|x| = 0^*$ if and only if $x = 0^*$
\item $|x \times y| = |x| \times |y|$
\item $|x + y| \leq |x| + |y|$
\item $|x - y| \geq |x| - |y|$
\item $|{-x}| = |x|$
\item $|x| \leq y$ if and only if $-y \leq x \leq y$
\end{itemize}
%
The last property will commonly be used to specify that an integer should be within a certain range of other integers.
\paragraph{Exponentiation:} Now that multiplication has been defined for the integers, exponentiation can also be defined. For any integers $x, y \in \Z$ and whole numbers $a,b \in \W$, exponentiation of the integers is such that
%
\begin{align*}
x^0 &\triangleq 1^*\\
x^1 &\triangleq x\\
x^{a+b} &\triangleq x^a \times x^b\\
(x^a)^b &\triangleq x^{a \times b}\\
(x \times y)^a &\triangleq x^a \times y^a
\end{align*}
%
These are identical to the properties of whole-number exponentiation. In fact, the exponents $a$ and $b$ are whole numbers, not integers. This is because the multiplication operation for integers does not have a multiplicative inverse.
When exponentiation is defined for the rational numbers, where every non-zero element has a multiplicative inverse, there will be additional properties. Note that for any $x \in \Z$, $x^2 \geq 0^*$.
\paragraph{Even and Odd Integers:} Take integer $y \in \Z$.
%
\begin{itemize}
\item If it is the case that there exists another integer $x \in \Z$ such that $y = 2^* x$ then $y$ is called an \emph{even} integer.
\item If it is the case that there exists another integer $x \in \Z$ such that $y = 2^* x+1^*$ then $y$ is called an \emph{odd} integer.
\end{itemize}
%
It can be shown that for every integer $y \in \Z$, $y$ is either an even number or an odd number but not both. That is, the sets
%
\begin{equation*}
\Z_E \triangleq \{ z \in \Z : z \text{ is even} \} \quad \text{ and } \quad \Z_O \triangleq \{ z \in \Z : z \text{ is odd} \}
\end{equation*}
%
are mutually exclusive and collectively exhaustive in $\Z$ (\ie, $\Z_E \cap \Z_O = \emptyset$ and $\Z_E \cup \Z_O = \Z$). Therefore, $\{ \Z_E, \Z_O \}$ is a partition of $\Z$. Assume that $x,y \in \Z$. Also assume that $x$ is even. Thus, there exists a $z \in \Z$ such that $x = 2^* z$. Therefore,
%
\begin{equation*}
x y = (2^* z) y = 2^* zy = 2^* (zy)
\end{equation*}
%
That is, $xy$ must also be an even integer. Now take $x,y \in \Z$ as before, but assume that $x$ and $y$ are both odd. Then there must exist $v,w \in \Z$ such that $x = 2^* v+1^*$ and $y = 2^* w+1^*$. Then
%
\begin{equation*}
x y = (2^* v + 1^*)(2^* w + 1^*) = 2^* v 2^* w + 2^* v + 2^* w + 1^* = 4^* vw + 2^* v + 2^* w + 1^* = 2^* (vw + v + w) + 1^*
\end{equation*}
%
Therefore $xy$ must also be an odd integer. To summarize,
%
\begin{itemize}
\item The product of two odd integers is odd.
\item The product of an even integer with any other integer is even.
\end{itemize}
\paragraph{Algebraic Structure of the Integers:} Note that for $(\Z,{+},0^*)$, it is the case that
%
\begin{itemize}
\item for all $x,y \in \Z$, $x + y = y + x$
\item for all $x,y,z \in \Z$, $(x + y) + z = x + (y + z)$
\item for all $x \in \Z$, $0^* + x = x + 0^* = x$
\item for all $x \in \Z$, $x + -x = -x + x = 0^*$
\end{itemize}
%
and for $(\Z,{\times},1^*)$, it is the case that
%
\begin{itemize}
\item for all $x,y \in \Z$, $x \times y = y \times x$
\item for all $x,y,z \in \Z$, $(x \times y) \times z = x \times (y \times z)$
\item for all $x \in \Z$, $1^* \times x = x \times 1^* = x$
\end{itemize}
%
And so for $(\Z,{+},{\times},0^*,1^*)$,
%
\begin{itemize}
\item $(\Z,{+},0^*)$ is a \emph{commutative group}
\item $(\Z,{\times},1^*)$ is a \emph{commutative monoid}
\item for each $x,y,z \in \Z$, $x(y + z) = xy + xz$ and $(x + y)z = xz + yz$
\end{itemize}
%
Therefore, $(\Z,{+},{\times},0^*,1^*)$ is a \emph{commutative ring}. Thus, $(\Z,{+},{\times},0^*,1^*)$ is trivially an algebra over itself (\ie, a $\Z$-algebra). However, also note that for any $x,y,z \in \Z$,
%
\begin{itemize}
\item if $x \leq y$ then $z + x \leq z + y$
\item if $0^* \leq x$ and $0^* \leq y$ then $0^* \leq xy$
\end{itemize}
%
and so $(\Z,{+},{\times},0^*,1^*,{\leq})$ is an \emph{ordered ring} and aspects of familiar arithmetic that do not involve multiplicative inverses apply to it. Unless otherwise noted, whenever $\Z$ is used, it is assumed that it is equipped with operators $+$ and $\times$ and order relation $\leq$; in other words, $\Z$ is implicitly taken to be the ordered ring $(\Z,{+},{\times},0^*,1^*,{\leq})$.
\paragraph{Relationship to Whole Numbers:} Define the set $\W^*$ as the set of non-negative integers. That is, define $\W^*$ by
%
\begin{equation*}
\W^* \triangleq \{ z \in \Z : z \geq 0^* \}
\end{equation*}
It is easy to show that the image of $\W^* \times \W^*$ through either operator $+$ or $\times$ is $\W^*$. Additionally, it can be shown that $(\W^*,{+}|_{\W^*},{\times}|_{\W^*})$ forms a commutative semiring, and so $\W^*$ is a subsemiring of $\Z$. Now take the function $f: \W^* \mapsto \W$ defined by
%
\begin{align*}
f &\triangleq \{ (z^*, z): \text{ for all } z \in \W \}\\
&= \{ (0^*, 0), (1^*, 1), (2^*, 2), (3^*, 3), \dots \}
\end{align*}
%
Clearly this is a bijection. That is, the inverse $f^{-1}: \W \mapsto \W^*$ is defined by
%
\begin{align*}
f^{-1} &\triangleq \{ (z, z^*): \text{ for all } z \in \W \}\\
&= \{ (0, 0^*), (1, 1^*), (2, 2^*), (3, 3^*), \dots \}
\end{align*}
%
Therefore $\W \cong \W^*$. Also, note that for any integers $x,y \in \W^*$,
%
\begin{enumerate}[(i)]
\item if $x \geq y$ then $f(x) \geq f(y)$ \label{item:integer_whole_ordering}
\item $f(x + y) = f(x) + f(y)$ \label{item:integer_whole_ring_homomorphism_plus}
\item $f(x \times y) = f(x) \times f(y)$ \label{item:integer_whole_ring_homomorphism_times}
\item $f(1^*)=1$ \label{item:integer_whole_ring_homomorphism_m_identity}
\end{enumerate}
%
Property (\shortref{item:integer_whole_ordering}) shows that $f$ is a monotone function, and properties (\shortref{item:integer_whole_ring_homomorphism_plus})--%
(\shortref{item:integer_whole_ring_homomorphism_m_identity}) show that $f$ is a semiring homomorphism. Since $f$ is also a bijection, it can be said that $f$ is an isomorphism in both the order sense and the algebraic sense. In other words, $\W$ is isomorphic to $\W^*$ in both an order sense and an algebraic sense. Therefore, not only is $\W \cong \W^*$, but $\W^*$ is a valid \emph{representation} for $\W$, and it is justifiable to say that $\W$ is a subsemiring of $\Z$.
For example, note that for any integers $x,y \in \W^*$ and whole number $a \in \W$, % \begin{itemize} \item $x = y$ if and only if $f(x) = f(y)$ \item $x \leq y$ if and only if $f(x) \leq f(y)$ \item $f(0^*) = 0$ \item $f(x + y) = f(x)+f(y)$ \item $f(x - y) = f(x)-f(y)$ \item $f(1^*) = 1$ \item $f(x \times y) = f(x) \times f(y)$ \item $f(x^a) = f(x)^a$ \end{itemize} % So arithmetic and order are both preserved by the bijection $f$. Thus, while $\W$ is certainly not equal to $\W^*$, it is equal in all of the important ways that matter to us, and so we can consider $\W \subset \Z$ with all of its standard ordering and operations. In other words, the $*$ superscript can be dropped from all of the integer symbols above; the non-negative integers (\ie, $\W^*$) are a valid representation of the whole numbers (\ie, $\W$). \subsection{Rational Numbers} While the integers provide a way of analyzing the difference between whole numbers (and, in fact, provide an equivalent representation of the whole numbers as well), they do not answer questions about the relative scale of whole numbers. That is, a difference between two whole numbers may be significant in one case but insignificant in another. Thus, it would be useful to have a framework to analyze differences with respect to some common scale. This framework comes in the form of the \emph{rational numbers}. \paragraph{Definition:} Just as each integer is an equivalence class on the set $\W \times \W$, each \emph{rational number} is an equivalence class on the set $\Z \times (\Z \setdiff \{0\})$. 
That is, define the equivalence relation $=$ on $\Z \times (\Z \setdiff \{0\})$ so that for integers $p,r \in \Z$ and $q,s \in \Z \setdiff \{0\}$, the elements $(p,q),(r,s) \in \Z \times (\Z \setdiff \{0\})$ are \emph{equal} if and only if % \begin{equation} ps = qr \label{eq:rational_equivalence_relation} \end{equation} % Each rational number is then defined as an equivalence class $[(p,q)]$ where $(p,q) \in \Z \times (\Z \setdiff \{0\})$. That is, the set of rational numbers \symdef{Bnumbers.4}{rationals}{$\Q$}{the set of the rationals (\ie, ratios of integers)} is defined to be the quotient set % \begin{equation*} \Q \triangleq (\Z \times (\Z \setdiff \{0\}))/{=} \end{equation*} % where the equivalence relation $=$ is given by \longref{eq:rational_equivalence_relation}. To make things more familiar, we introduce the notation % \begin{equation*} \frac{p}{q} \triangleq [(p,q)] \end{equation*} % where $p \in \Z$ and $q \in \Z \setdiff \{0\}$. The notation $p/q$ is equivalent. In both cases, the left projection $p$ is called the \emph{numerator} and the right projection $q$ is called the \emph{denominator} of the \emph{ratio} $\frac{p}{q}$. For example, $\frac{1}{2} = [(1,2)]$. However, by \longref{eq:rational_equivalence_relation}, $[(1,2)]=[(5,10)]$, which could also be written $\frac{1}{2} = \frac{5}{10}$.
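The equivalence relation \longref{eq:rational_equivalence_relation} is easy to check mechanically. The following Python sketch (our addition; `rationals_equal` is a hypothetical helper name) tests representatives by cross-multiplication; Python's `fractions.Fraction` implements the same equivalence by normalizing each class to a canonical reduced representative:

```python
from fractions import Fraction

def rationals_equal(p, q, r, s):
    """(p, q) and (r, s) represent the same rational iff p*s == q*r."""
    if q == 0 or s == 0:
        raise ValueError("denominators must be non-zero")
    return p * s == q * r

# (1, 2) and (5, 10) are in the same equivalence class
assert rationals_equal(1, 2, 5, 10)
assert not rationals_equal(1, 2, 2, 3)
# Python's Fraction realizes each class by a canonical (reduced) member
assert Fraction(5, 10) == Fraction(1, 2)
```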
Therefore, each line of the following represents a single particular rational number % \begin{align*} \frac{1}{2} = \frac{2}{4} = \frac{3}{6} = \cdots &= \frac{n}{2n} \text{ for all $n \in \Z \setdiff \{0\}$}\\ \frac{1}{3} = \frac{2}{6} = \frac{3}{9} = \cdots &= \frac{n}{3n} \text{ for all $n \in \Z \setdiff \{0\}$}\\ \frac{-1}{5} = \frac{-2}{10} = \frac{-3}{15} = \cdots &= \frac{-n}{5n} \text{ for all $n \in \Z \setdiff \{0\}$} \end{align*} % and thus the set of the rationals $\Q$ can be alternatively written as % \begin{equation*} \Q \triangleq \left\{ \frac{p}{q} : p \in \Z, q \in \Z \setdiff \{0\} \right\} \end{equation*} \paragraph{Countability:} It may seem like the set $\Q$ is not countable. That is, it may seem like the sets $\Q$ and $\N$ could not be congruent. However, this would be a mistake. We will show that a bijection exists between $\Q$ and $\N$. To motivate this, construct a table of rational numbers with $\frac{1}{1}$ in its upper-left corner that has increasing numerators down its rows and increasing denominators across its columns, where increasing is in the integer sense. Now map the upper-left corner of this table to natural number $1$ and then map the cells of the nearest diagonal to $2$ and $3$. Continue in this pattern of mapping integers by diagonal until the entire table is filled. This mapping is shown in \longref{tab:motiv_rationals_and_naturals}.
% \begin{table}[!ht]\centering \begin{tabular}{|cccccc|} \hline[5pt] $\left( \frac{1}{1}, 1 \right)$ & $\left( \frac{1}{2}, 3 \right)$ & $\left( \frac{1}{3}, 6 \right)$ & $\left( \frac{1}{4}, 10 \right)$ & $\left( \frac{1}{5}, 15 \right)$ & $\cdots$ \\[5pt] $\left( \frac{2}{1}, 2 \right)$ & $\left( \frac{2}{2}, 5 \right)$ & $\left( \frac{2}{3}, 9 \right)$ & $\left( \frac{2}{4}, 14 \right)$ & $\left( \frac{2}{5}, 20 \right)$ & $\cdots$ \\[5pt] $\left( \frac{3}{1}, 4 \right)$ & $\left( \frac{3}{2}, 8 \right)$ & $\left( \frac{3}{3}, 13 \right)$ & $\left( \frac{3}{4}, 19 \right)$ & $\left( \frac{3}{5}, 26 \right)$ & $\cdots$ \\[5pt] $\left( \frac{4}{1}, 7 \right)$ & $\left( \frac{4}{2}, 12 \right)$ & $\left( \frac{4}{3}, 18 \right)$ & $\left( \frac{4}{4}, 25 \right)$ & $\left( \frac{4}{5}, 33 \right)$ & $\cdots$ \\[5pt] $\left( \frac{5}{1}, 11 \right)$ & $\left( \frac{5}{2}, 17 \right)$ & $\left( \frac{5}{3}, 24 \right)$ & $\left( \frac{5}{4}, 32 \right)$ & $\left( \frac{5}{5}, 41 \right)$ & $\cdots$ \\[5pt] $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\ddots$ \\[5pt] \hline \end{tabular} \caption{Motivation for rational number and natural number bijection} \label{tab:motiv_rationals_and_naturals} \end{table} % Of course, this mapping is not a valid total function because elements of the table do not represent distinct rationals. For example, creating a mapping from $\frac{1}{1}$ that is different from the mapping from $\frac{2}{2}$ is not valid since both ratios represent the same rational number. Thus, construct a new table of rationals by traversing \longref{tab:motiv_rationals_and_naturals} in the order of the natural numbers in each mapping (\ie, traverse the diagonals starting in the upper-left corner and move right) but skip the rationals that have already been listed. That is, since $\frac{1}{1}$ is listed first, $\frac{2}{2}$ can be skipped. Map the rationals that are not skipped to the natural numbers, starting with $1$.
The result is \longref{tab:motiv_rationals_and_naturals_2}, where skipped ratios are shown with the symbol $\cdot$. % \begin{table}[!ht]\centering \begin{tabular}{|cccccc|} \hline[5pt] $\left( \frac{1}{1}, 1 \right)$ & $\left( \frac{1}{2}, 3 \right)$ & $\left( \frac{1}{3}, 5 \right)$ & $\left( \frac{1}{4}, 9 \right)$ & $\left( \frac{1}{5}, 11 \right)$ & $\cdots$ \\[5pt] $\left( \frac{2}{1}, 2 \right)$ & $\cdot$ & $\left( \frac{2}{3}, 8 \right)$ & $\cdot$ & $\left( \frac{2}{5}, 16 \right)$ & $\cdots$ \\[5pt] $\left( \frac{3}{1}, 4 \right)$ & $\left( \frac{3}{2}, 7 \right)$ & $\cdot$ & $\left( \frac{3}{4}, 15 \right)$ & $\left( \frac{3}{5}, 20 \right)$ & $\cdots$ \\[5pt] $\left( \frac{4}{1}, 6 \right)$ & $\cdot$ & $\left( \frac{4}{3}, 14 \right)$ & $\cdot$ & $\left( \frac{4}{5}, 26 \right)$ & $\cdots$ \\[5pt] $\left( \frac{5}{1}, 10 \right)$ & $\left( \frac{5}{2}, 13 \right)$ & $\left( \frac{5}{3}, 19 \right)$ & $\left( \frac{5}{4}, 25 \right)$ & $\cdot$ & $\cdots$ \\[5pt] $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\ddots$ \\[5pt] \hline \end{tabular} \caption{More motivation for rational number and natural number bijection} \label{tab:motiv_rationals_and_naturals_2} \end{table} % However, this provides no mapping for rational numbers represented by ratios that include a single negative integer. It also does not provide a mapping for the rational number $\frac{0}{1}$. So, use the mapping depicted in \longref{tab:rationals_and_naturals}.
% \begin{table}[!ht]\centering \begin{tabular}{|c|} \hline[5pt] $\left( \frac{0}{1}, 1 \right)$ \\[5pt] \hline \end{tabular}\\ \medskip \begin{tabular}{|cccc|} \hline[5pt] $\left( \frac{1}{1}, 2 \right)$ & $\left( \frac{1}{2}, 6 \right)$ & $\left( \frac{1}{3}, 10 \right)$ & $\cdots$ \\[5pt] $\left( \frac{2}{1}, 4 \right)$ & $\cdot$ & $\left( \frac{2}{3}, 16 \right)$ & $\cdots$ \\[5pt] $\left( \frac{3}{1}, 8 \right)$ & $\left( \frac{3}{2}, 14 \right)$ & $\cdot$ & $\cdots$ \\[5pt] $\vdots$ & $\vdots$ & $\vdots$ & $\ddots$ \\[5pt] \hline \end{tabular} \quad \begin{tabular}{|cccc|} \hline[5pt] $\left( \frac{-1}{1}, 3 \right)$ & $\left( \frac{-1}{2}, 7 \right)$ & $\left( \frac{-1}{3}, 11 \right)$ & $\cdots$ \\[5pt] $\left( \frac{-2}{1}, 5 \right)$ & $\cdot$ & $\left( \frac{-2}{3}, 17 \right)$ & $\cdots$ \\[5pt] $\left( \frac{-3}{1}, 9 \right)$ & $\left( \frac{-3}{2}, 15 \right)$ & $\cdot$ & $\cdots$ \\[5pt] $\vdots$ & $\vdots$ & $\vdots$ & $\ddots$ \\[5pt] \hline \end{tabular} \caption{The rational number to natural number bijection} \label{tab:rationals_and_naturals} \end{table} % Clearly, this mapping is a total function that is surjective (\ie, it maps onto every natural number) and injective (\ie, each natural number receives at most one mapping), and thus it is a bijection. Therefore, $\Q \cong \N$. In other words, $\Q$ is countably infinite; it is possible to count each of the rationals. In fact, any set that can be listed in a table as in \longref{tab:motiv_rationals_and_naturals} can be shown to be countable. This includes the set $\N^2$ (\ie, $\N \times \N$) which can easily be written in table form; therefore, $\N^2 \cong \N$. In fact, it can be shown that for any $n \in \N \setdiff \{1\}$, it is the case that $\N^n \cong \N$. Additionally, any set that is congruent to such a set is also congruent to $\N$. However, the power set of any of these sets (\ie, $\Pow(\N^2)$ which is congruent to and also denoted $2^{\N^2}$) is not countable.
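The diagonal enumeration underlying \longref{tab:rationals_and_naturals} can be sketched in Python. The numbering below differs from the exact numbering in the tables (here each positive reduced ratio is immediately followed by its negative), but the idea is the same: diagonal traversal with skipping of duplicates lists every rational exactly once, so $\Q$ is countable.

```python
from fractions import Fraction
from math import gcd

def enumerate_rationals():
    """Enumerate all of Q without repetition: 0 first, then each positive
    reduced ratio p/q, visited anti-diagonal by anti-diagonal, immediately
    followed by its negative.  Ratios with gcd(p, q) > 1 are skipped, just
    as the duplicate cells are skipped in the tables."""
    yield Fraction(0)
    d = 2                       # d = p + q indexes the anti-diagonals
    while True:
        for p in range(1, d):
            q = d - p
            if gcd(p, q) == 1:  # skip ratios already listed
                yield Fraction(p, q)
                yield Fraction(-p, q)
        d += 1

gen = enumerate_rationals()
first = [next(gen) for _ in range(9)]
# first == [0, 1, -1, 1/2, -1/2, 2, -2, 1/3, -1/3]
```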
\paragraph{Symbols:} There are some symbols that are used to represent some of the equivalence classes that are elements of set $\Q$. For every integer $z \in \Z$, define the symbol $z^*$ as the rational $z/1$ (\ie, the rational that includes $(z,1)$ in its equivalence class). That is, define the familiar symbols % \begin{align*} &\mathrel{\vdots}\\ -2^* &\triangleq \frac{-2}{1} = \frac{-4}{2} = \frac{-6}{3} = \cdots\\ -1^* &\triangleq \frac{-1}{1} = \frac{-2}{2} = \frac{-3}{3} = \cdots\\ 0^* &\triangleq \frac{0}{1} = \frac{0}{2} = \frac{0}{3} = \cdots\\ 1^* &\triangleq \frac{1}{1} = \frac{2}{2} = \frac{3}{3} = \cdots\\ 2^* &\triangleq \frac{2}{1} = \frac{4}{2} = \frac{6}{3} = \cdots\\ &\mathrel{\vdots} \end{align*} % As with the symbols used for $\Z$, we will later justify dropping the $*$ superscripts on these symbols. \paragraph{Total Ordering:} Take four integers $p,q,r,s \in \Z$ with $q \neq 0$ and $s \neq 0$. The rational $\frac{p}{q}$ is said to be less than or equal to $\frac{r}{s}$ (denoted $\frac{p}{q} \leq \frac{r}{s}$) if and only if % \begin{equation*} ( qs > 0 \text{ and } ps \leq qr ) \text{ or } ( qs < 0 \text{ and } ps \geq qr ) \end{equation*} % Similarly, $\frac{p}{q}$ is strictly less than $\frac{r}{s}$ (denoted $\frac{p}{q} < \frac{r}{s}$) if and only if % \begin{equation*} ( qs > 0 \text{ and } ps < qr ) \text{ or } ( qs < 0 \text{ and } ps > qr ) \end{equation*} % Just as with the related inequality relation on the other numbers, $\frac{p}{q} \leq \frac{r}{s}$ can also be denoted $\frac{r}{s} \geq \frac{p}{q}$, and $\frac{p}{q} < \frac{r}{s}$ can also be denoted $\frac{r}{s} > \frac{p}{q}$. In these cases, $>$ ($\geq$) represents that a rational is greater than (or equal to) another rational.
This ordering implies that % \begin{equation*} \cdots \leq -2^* \leq -1^* \leq \frac{-1}{2} \leq \frac{-1}{4} \leq \frac{-1}{8} \leq 0^* \leq \frac{1}{8} \leq \frac{1}{4} \leq \frac{1}{2} \leq 1^* \leq 2^* \leq \cdots \end{equation*} % and, in fact, % \begin{equation*} \cdots < -2^* < -1^* < \frac{-1}{2} < \frac{-1}{4} < \frac{-1}{8} < 0^* < \frac{1}{8} < \frac{1}{4} < \frac{1}{2} < 1^* < 2^* < \cdots \end{equation*} % We refer to any rational greater than $0^*$ as \emph{positive} and any rational less than $0^*$ as \emph{negative}. The \emph{non-negative rationals} are the positive rationals and $0^*$ (\ie, the complement of the negative rationals). The \emph{non-positive rationals} are the negative rationals and $0^*$ (\ie, the complement of the positive rationals). The \emph{non-zero rationals} are all of the rationals except for $0^*$ (\ie, $\Q \setdiff \{0^*\}$, the complement of $\{0^*\}$). \paragraph{Dense Ordering:} Note that for any two \emph{distinct} rational numbers $x,y \in \Q$ such that $x < y$, there is a third rational number $z \in \Q$ such that $x < z < y$. As discussed, this is not the case with the whole numbers nor the integers. This property makes the set of rational numbers $\Q$ a \emph{densely ordered set}. This is an important property of the rational numbers. \paragraph{Lack of Gaplessness:} In \longref{app:math_ordering_issues}, a subset of the rationals is presented that has no least upper bound. Therefore, $\Q$ cannot be gapless. It is interesting that the rational numbers are both a countable set and a densely ordered set. Being both densely ordered and countable prevents the rational numbers from being \emph{gapless}, as is shown in \longref{app:math_countability_and_order}. This motivates the need for the \emph{real numbers}, described in \longref{app:math_reals}, which are gapless and have a dense ordering; however, this requires that the set of the real numbers is uncountable. 
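The dense-ordering property discussed above is constructive: the midpoint of two distinct rationals is itself rational and lies strictly between them. A minimal Python sketch using exact rational arithmetic (the helper name `between` is ours):

```python
from fractions import Fraction

def between(x, y):
    """Return the midpoint of two rationals, a rational strictly between
    them whenever x < y; this witnesses the dense ordering of Q."""
    assert x < y
    return (x + y) / 2

x, y = Fraction(1, 3), Fraction(1, 2)
z = between(x, y)
assert x < z < y      # z == 5/12
```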
\paragraph{Lack of Certain Existence of Minima and Maxima:} Since $\Q$ is not gapless, there are subsets of $\Q$ that do not have greatest lower bounds or do not have least upper bounds; such subsets also cannot have minima or maxima. An example of this is shown in \longref{app:math_ordering_issues}. \paragraph{Addition:} For integers $p,q,r,s \in \Z$ with $q \neq 0$ and $s \neq 0$, define the \emph{addition} operator $+$ such that % \begin{equation*} \frac{p}{q} + \frac{r}{s} \triangleq \frac{ps + qr}{qs} \end{equation*} % where the result of the addition is called the \emph{sum}. Thus, % \begin{align*} \frac{r}{s} + \frac{p}{q} = \frac{rq + sp}{sq} = \frac{qr + ps}{qs} = \frac{ps + qr}{qs} = \frac{p}{q} + \frac{r}{s} \end{align*} % Therefore, rational number addition is \emph{commutative}. Thus, for any two rationals $x,y \in \Q$, $x + y = y + x$. Additionally, for integers $p,q,r,s,t,u \in \Z$ with $q \neq 0$, $s \neq 0$, and $u \neq 0$, % \begin{align*} \frac{p}{q} + \left(\frac{r}{s} + \frac{t}{u}\right) &= \frac{p}{q} + \frac{ru+st}{su} = \frac{psu+q(ru+st)}{qsu} = \frac{psu+qru+qst}{qsu}\\ &= \frac{(ps+qr)u+qst}{qsu} = \frac{ps+qr}{qs} + \frac{t}{u}\\ &= \left(\frac{p}{q}+\frac{r}{s}\right) + \frac{t}{u} \end{align*} % Therefore, rational number addition is also \emph{associative}. Thus, for any three rationals $x,y,z \in \Q$, $x + (y+z) = (x+y)+z$. Note that for any three integers $p,q,r \in \Z$ with $q \neq 0$ and $r \neq 0$, % \begin{align*} \frac{p}{q} + 0^* &= \frac{p}{q} + \frac{0}{r} = \frac{pr+q \times 0}{rq} = \frac{pr}{rq} = \frac{p}{q} \end{align*} % where the second and last steps are justified by \longref{eq:rational_equivalence_relation}. Thus, for any rational number $x \in \Q$, $x + 0^* = x$, and so $0^*$ is known as the \emph{additive identity} for rational numbers.
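The commutativity and associativity arguments above can be spot-checked directly on raw (numerator, denominator) pairs, using the definition of $+$ and the cross-multiplication equivalence relation. A Python sketch (the helper names `add` and `equal` are ours, and the sample is an arbitrary choice):

```python
from itertools import product

def add(pq, rs):
    """p/q + r/s := (p*s + q*r)/(q*s) on raw (numerator, denominator) pairs."""
    (p, q), (r, s) = pq, rs
    return (p * s + q * r, q * s)

def equal(ab, cd):
    """Cross-multiplication equivalence: a/b = c/d iff a*d == b*c."""
    (a, b), (c, d) = ab, cd
    return a * d == b * c

pairs = [(p, q) for p, q in product(range(-2, 3), [1, 2, 3])]
for x, y, z in product(pairs, repeat=3):
    assert equal(add(x, y), add(y, x))                  # commutativity
    assert equal(add(x, add(y, z)), add(add(x, y), z))  # associativity
    assert equal(add(x, (0, 1)), x)                     # 0* is the identity
```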
\paragraph{Additive Inverses:} For integers $p,q \in \Z$ with $q \neq 0$, note that % \begin{align*} \frac{p}{q} + \frac{-p}{q} &= \frac{pq+q({-p})}{qq} = \frac{pq-qp}{qq} = \frac{pq-pq}{qq}\\ &= \frac{(p-p)q}{qq} = \frac{p-p}{q} = \frac{0}{q} = 0^* \end{align*} % Therefore, for any rational $\frac{p}{q}$, its \emph{additive inverse} is the rational $\frac{-p}{q}$. For reasons to be explained, the additive inverse of rational $x \in \Q$ will be denoted by $-x$. It can be shown that for $x,y \in \Q$, if $x > 0^*$ then $-x < 0^*$, and if $y < 0^*$ then $-y > 0^*$. It can also be shown that for $x \in \Q$, $-(-x)=x$. \paragraph{Multiplication:} For integers $p,q,r,s \in \Z$ with $q \neq 0$ and $s \neq 0$, define the \emph{multiplication} operator $\times$ such that % \begin{equation*} \frac{p}{q} \times \frac{r}{s} \triangleq \frac{pr}{qs} \end{equation*} % where the result of a multiplication is called the \emph{product}. Thus % \begin{align*} \frac{r}{s} \times \frac{p}{q} = \frac{rp}{sq} = \frac{pr}{qs} = \frac{p}{q} \times \frac{r}{s} \end{align*} % Therefore, rational number multiplication is \emph{commutative}. That is, for any two rationals $x,y \in \Q$, $x \times y = y \times x$. Now take integers $p,q,r,s,t,u \in \Z$ with $q \neq 0$, $s \neq 0$, and $u \neq 0$. Note that % \begin{align*} \frac{p}{q} \times \left(\frac{r}{s} \times \frac{t}{u}\right) = \frac{p}{q} \times \frac{rt}{su} = \frac{prt}{qsu} = \frac{pr}{qs} \times \frac{t}{u} = \left(\frac{p}{q} \times \frac{r}{s}\right) \times \frac{t}{u} \end{align*} % Therefore, rational number multiplication is also \emph{associative}. That is, for any three rationals $x,y,z \in \Q$, $x\times(y\times z)=(x\times y)\times z$. Also note that when multiplication is used with addition, all multiplication operations should be completed first unless grouping symbols like parentheses indicate that an addition should be completed first.
However, note that for any three rationals $x,y,z \in \Q$, % \begin{equation*} x\times(y + z) = xy + xz \end{equation*} % That is, rational number multiplication and addition have the distributive property. Additionally, the notation $x \cdot y$ or simply $x y$ will often be used instead of $x \times y$. Note that for any three integers $p,q,r \in \Z$ with $q \neq 0$ and $r \neq 0$, % \begin{align*} \frac{p}{q} \times 0^* = \frac{p}{q} \times \frac{0}{r} = \frac{p \times 0}{qr} = \frac{0}{qr} = 0^* \end{align*} % where the second and last steps are justified by \longref{eq:rational_equivalence_relation}. Thus, for any rational number $x \in \Q$, $x \times 0^* = 0^*$. In fact, for any two rational numbers $x,y \in \Q$, if $x y = 0^*$ then it must be that $x=0^*$ or $y=0^*$ or both. Additionally, it is the case that % \begin{align*} \frac{p}{q} \times 1^* = \frac{p}{q} \times \frac{r}{r} = \frac{pr}{qr} = \frac{p}{q} \end{align*} % Therefore, for any rational number $x \in \Q$, it is the case that $x \times 1^* = x$. Thus, $1^*$ is known as the \emph{multiplicative identity} for the rational numbers. Additionally, % \begin{align*} \frac{p}{q} \times -1^* = \frac{p}{q} \times \frac{-r}{r} = \frac{p(-r)}{qr} = \frac{-pr}{qr} = \frac{-p}{q} \end{align*} % Thus, multiplying any rational number $x \in \Q$ by the rational $-1^*$ produces the additive inverse of $x$. Therefore, a shorthand notation for $-1^* \times x$ is simply $-x$. \paragraph{Multiplicative Inverses:} Take integers $p,q \in \Z$ with $p \neq 0$ and $q \neq 0$. Note that % \begin{align*} \frac{p}{q} \times \frac{q}{p} &= \frac{pq}{qp} = \frac{pq}{pq} = \frac{1}{1} = 1^* \end{align*} % In other words, $\frac{q}{p}$ is the \emph{multiplicative inverse} of $\frac{p}{q}$. That is, the multiplicative inverse of a rational number is generated by interchanging the numerator and denominator of any ratio that represents that rational number.
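The multiplication laws, distributivity, and the multiplicative-inverse identity just derived can all be spot-checked on raw (numerator, denominator) pairs in the same style as before (the helper names are ours, and the finite sample is an arbitrary choice):

```python
from itertools import product

def mul(pq, rs):
    """p/q * r/s := (p*r)/(q*s) on raw (numerator, denominator) pairs."""
    (p, q), (r, s) = pq, rs
    return (p * r, q * s)

def add(pq, rs):
    """p/q + r/s := (p*s + q*r)/(q*s)."""
    (p, q), (r, s) = pq, rs
    return (p * s + q * r, q * s)

def equal(ab, cd):
    """Cross-multiplication equivalence: a/b = c/d iff a*d == b*c."""
    (a, b), (c, d) = ab, cd
    return a * d == b * c

pairs = [(p, q) for p, q in product(range(-2, 3), [1, 2, 3])]
for x, y, z in product(pairs, repeat=3):
    assert equal(mul(x, y), mul(y, x))                          # commutative
    assert equal(mul(x, mul(y, z)), mul(mul(x, y), z))          # associative
    assert equal(mul(x, (1, 1)), x)                             # 1* identity
    assert equal(mul(x, add(y, z)), add(mul(x, y), mul(x, z)))  # distributive
for p, q in pairs:
    if p != 0:
        assert equal(mul((p, q), (q, p)), (1, 1))               # inverses
```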
For a rational number, its multiplicative inverse is also called its \emph{reciprocal}. It should be clear that every rational number $x \in \Q$ such that $x \neq 0^*$ has a multiplicative inverse, and therefore the multiplicative inverse of $x$ is denoted $x^{-1}$. \paragraph{Subtraction:} We can define the \emph{subtraction} operator $-$ for rational numbers so that for any two rationals $x,y \in \Q$, % \begin{align*} x - y &\triangleq x + -y \end{align*} % However, even though this is clearly a shorthand for addition, this operation is neither commutative nor associative. The result of a subtraction is called a \emph{difference}. \paragraph{Division:} For integers $p,q,r,s \in \Z$ with $q \neq 0$, $r \neq 0$, and $s \neq 0$, define the \emph{division} operator $/$ such that % \begin{align*} \frac{p}{q} / \frac{r}{s} &\triangleq \frac{ps}{qr}\\ &= \frac{p}{q} \times \frac{s}{r} = \frac{p}{q} \times \left(\frac{r}{s}\right)^{-1} \end{align*} % where the result of the division is known as a \emph{quotient}. Sometimes the division operator $/$ will be represented as a ratio. That is, for rationals $x,y \in \Q$, $x/y$ will be written $\frac{x}{y}$. Notice that division is simply multiplication by the multiplicative inverse; that is, % \begin{align*} \frac{\frac{p}{q}}{\frac{r}{s}} &= \frac{p}{q} \frac{s}{r} \end{align*} % As described above, the ratio $\frac{s}{r}$ is known as the \emph{reciprocal} of the ratio $\frac{r}{s}$, and so $\frac{s}{r}$ is the \emph{multiplicative inverse} of $\frac{r}{s}$ (\ie, $\left(\frac{r}{s}\right)^{-1}$). Therefore, division is identical to multiplication with a reciprocal. However, it is \emph{not} the case that division is commutative, associative, or distributive. It is simply a shorthand. Also note that % \begin{align*} 1^* / \frac{r}{s} = \frac{q}{q} / \frac{r}{s} = \frac{q}{q} \times \frac{s}{r} = \frac{qs}{qr} = \frac{s}{r} \end{align*} % where the last step is justified by \longref{eq:rational_equivalence_relation}.
Therefore, for any non-zero rational $x \in \Q$ (\ie, $x \neq 0^*$), the notation $1^*/x$ or $\frac{1^*}{x}$ represents its reciprocal. For integers $p,q \in \Z \setdiff \{0\}$, note that % \begin{align*} \frac{p}{q} \times \frac{1}{\frac{p}{q}} &= \frac{p}{q} \times \frac{q}{p} = \frac{pq}{qp} = \frac{pq}{pq} = \frac{1}{1} = 1^* \end{align*} % That is, the reciprocal of any non-zero rational number is its \emph{multiplicative inverse}. For any non-zero rational number $x \in \Q$, $x \times ( 1^*/x ) = 1^*$. For example, $\frac{1}{2}$ is the multiplicative inverse of $2^*$ since $2^* = \frac{2}{1}$. Note that for $p,q \in \Z \setdiff \{0\}$, % \begin{align*} \frac{\frac{p}{q}}{\frac{p}{q}} &= \frac{p}{q} \times \frac{q}{p} = \frac{p}{q} \times \frac{1}{\frac{p}{q}} = \frac{p}{q} \times \left( \frac{p}{q} \right)^{-1} = 1^* \end{align*} % In other words, by the definition of the ratio of two rational numbers, for any non-zero rational number $x \in \Q$, $x/x = 1^*$. \paragraph{Exponentiation:} Now that multiplication and division have been defined for the rationals, exponentiation can also be defined. For any rationals $x, y, a, b \in \Q$, exponentiation of the rationals is such that % \begin{align*} x^{0^*} &\triangleq 1^*\\ x^{1^*} &\triangleq x\\ x^{-1^*} &\triangleq \frac{1^*}{x}\\ x^{a+b} &\triangleq x^a \times x^b\\ x^{-b} &\triangleq \frac{1^*}{x^b}\\ x^{a-b} &\triangleq \frac{x^a}{x^b}\\ (x^a)^b &\triangleq x^{a \times b}\\ (x \times y)^a &\triangleq x^a \times y^a\\ \left(\frac{x}{y}\right)^a &\triangleq \frac{x^a}{y^a} \end{align*} % Take rational $x \in \Q$ and integers $p,q \in \Z$ with $q \neq 0$ that make up rational $\frac{p}{q} \in \Q$. By the laws above, the rational % \begin{equation*} x^\frac{p}{q} = ( x^\frac{1}{q} )^{p^*} \end{equation*} % where $p^* = \frac{p}{1}$. Note that if $q < 0$ then $x^\frac{1}{q} = ( x^\frac{1}{|q|})^{-1^*}$, and so assume that $q > 0$.
Thus, the existence of $x^\frac{1}{q}$ where $q \in \Z$ with $q > 0$ is of critical importance. The rational number $x^\frac{1}{q}$ should be such that $( x^\frac{1}{q} )^{q^*} = x$. Note that $( -1^* )^\frac{1}{2}$ does not exist since there is no rational $x \in \Q$ such that $x \times x = -1^*$. Similarly, there is no rational $x \in \Q$ such that $x \times x = 2^*$ (as will be shown below), and so $(2^*)^\frac{1}{2}$ does not exist. However, $( -8^* )^\frac{1}{3} = -2^*$ since $-2^* \times -2^* \times -2^* = -8^*$. Also note that for any $x \in \Q$, $x^{2^*} \geq 0^*$. Additionally, by this definition, ${0^*}^{0^*} = 1^*$. This definition also gives an alternate notation for the multiplicative inverse. That is, for any $x \in \Q \setdiff \{0^*\}$, its multiplicative inverse $1^*/x$ is also denoted $x^{-1^*}$ and so $x \times x^{-1^*} = x^{-1^*} \times x = 1^*$. \paragraph{Roots:} Take integer $q \in \Z$ with $q > 0$ and rational $x \in \Q$. The rational number $x^\frac{1}{q}$, when it exists, is called the \emph{$q\th$ root} of $x$ and is also denoted $\sqrt[q]{x}$. The special case of $\sqrt[3]{x}$ is called the \emph{cube root} of $x$. The special case of $\sqrt[2]{x}$ is called the \emph{square root} of $x$ and is often written as $\sqrt{x}$. \paragraph{Ratios of Even Integers:} Take integers $p,q \in \Z$ whose ratio $p/q$ represents a particular rational number (\ie, $q \neq 0$). Additionally, assume that $p$ and $q$ are both even integers. Thus, it must be that there are integers $r,s$ such that $p = 2r$ and $q = 2s$. Therefore, % \begin{align*} \frac{p}{q} &= \frac{2r}{2s} = \frac{r}{s} \end{align*} % where the last step is justified by \longref{eq:rational_equivalence_relation}. Thus, the rational number represented by $p/q$ must also be represented by $r/s$. It can be shown that applying this argument repeatedly leads to the conclusion that every rational number can be represented by a ratio of two integers where one integer is odd.
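The repeated-cancellation argument can be written out as a short routine; the following Python sketch (our addition; `reduce_twos` is a hypothetical helper name) cancels common factors of 2 until at least one of the two integers is odd:

```python
def reduce_twos(p, q):
    """Repeatedly cancel a common factor of 2 from the ratio p/q, as in
    the argument above, until at least one of the two integers is odd."""
    while p % 2 == 0 and q % 2 == 0:
        p //= 2
        q //= 2
    return p, q

assert reduce_twos(12, 8) == (3, 2)    # 12/8 = 6/4 = 3/2
assert reduce_twos(40, 16) == (5, 2)   # 40/16 = 20/8 = 10/4 = 5/2
assert reduce_twos(7, 4) == (7, 4)     # already includes an odd integer
```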
For example, assume that there exists a rational number $x \in \Q$ such that $x^2 = 2^*$. Therefore it must be that there exist integers $p,q \in \Z$ with $q \neq 0$ such that $p^2/q^2 = 2/1$. By \longref{eq:rational_equivalence_relation}, $p^2 = 2 q^2$. Therefore $p^2$ is even. However, as was shown above, this must mean that $p$ is even as well. If $p$ is even then there exists $r \in \Z$ such that $p = 2r$. Thus, $p^2 = 4 r^2 = 2 \times 2 r^2$. Since $p^2 = 2 q^2$, it follows that $2 q^2 = 4 r^2$, and so $q^2 = 2 r^2$. This implies that $q^2$ must also be even, and thus $q$ must be even. Therefore any ratio representing rational number $x$ must be a ratio of two even integers. However, it was shown that every rational number can be expressed as the ratio of two integers, one of which is odd. So, this is a contradiction. Therefore, it must be the case that there exists no $x \in \Q$ such that $x^2 = 2^*$. \paragraph{Base-10 (Decimal) Notation:} Now that we have defined the rationals and have endowed them with addition, multiplication, and exponentiation, it is possible to introduce familiar decimal notations such as % \begin{align*} 1.205 \triangleq 1^* \times {10^*}^{0} + 2^* \times {10^*}^{-1} + 0^* \times {10^*}^{-2} + 5^* \times {10^*}^{-3} \end{align*} % We trust that the reader is familiar with such notation. For brevity, we will not explain it any further. A slightly more detailed discussion will be given in \longref{app:math_reals}.
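A decimal numeral as just defined is a finite sum of powers of ten, which can be evaluated exactly with rational arithmetic; for instance, the example from the text:

```python
from fractions import Fraction

# 1.205 as the finite sum of powers of ten from the text, evaluated with
# exact rational arithmetic.
x = (1 * Fraction(10) ** 0
     + 2 * Fraction(10) ** -1
     + 0 * Fraction(10) ** -2
     + 5 * Fraction(10) ** -3)
assert x == Fraction(1205, 1000) == Fraction(241, 200)
```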
\paragraph{Absolute Value and Signum:} For any rational $x \in \Q$, denote its \emph{absolute value} with the notation $|x|$ defined by % \begin{equation*} |x| \triangleq \begin{cases} x &\text{if } x \geq 0^*\\ -x &\text{if } x < 0^* \end{cases} \end{equation*} % and define the \emph{signum function} (also called the \emph{sign function}, not to be confused with the \emph{sine function}) $\sgn: \Q \mapsto \{-1^*,0^*,1^*\}$ with % \begin{equation*} \sgn(x) \triangleq \begin{cases} -1^* &\text{if } x < 0^*\\ 0^* &\text{if } x = 0^*\\ 1^* &\text{if } x > 0^* \end{cases} \end{equation*} % Therefore, any rational $z \in \Q$ can be represented as a magnitude (\ie, absolute value $|z|$) and a sign (\ie, $\sgn(z)$), as in % \begin{equation*} z = \sgn(z) \times |z| \end{equation*} % Note that the absolute value has some special properties. In particular, for any two rationals $x,y \in \Q$, % \begin{itemize} \item $|x| \geq 0^*$ \item $|x| = 0^*$ if and only if $x = 0^*$ \item $|x \times y| = |x| \times |y|$ \item $|x + y| \leq |x| + |y|$ \item $|x - y| \geq |x| - |y|$ \item $|{-x}| = |x|$ \item $|x| \leq y$ if and only if $-y \leq x \leq y$ \item $|x/y| = |x|/|y|$ if $y \neq 0^*$ \end{itemize} % All of these properties are identical to the ones for integers, except for the last property, which has been added specifically for the rationals.
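The signum function and the listed absolute-value properties can be spot-checked with exact rational arithmetic. In the following Python sketch (ours), `sgn` mirrors the definition above, and Python's built-in `abs` on `Fraction` plays the role of $|\cdot|$:

```python
from fractions import Fraction
from itertools import product

def sgn(x):
    """Signum of a rational, following the definition in the text."""
    return Fraction((x > 0) - (x < 0))

sample = [Fraction(p, q) for p, q in product(range(-3, 4), [1, 2, 3])]
for x in sample:
    assert x == sgn(x) * abs(x)              # magnitude-and-sign form
    assert abs(x) >= 0
    assert (abs(x) == 0) == (x == 0)
    assert abs(-x) == abs(x)
for x, y in product(sample, repeat=2):
    assert abs(x * y) == abs(x) * abs(y)
    assert abs(x + y) <= abs(x) + abs(y)     # triangle inequality
    assert abs(x - y) >= abs(x) - abs(y)
```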
\paragraph{Algebraic Structure of the Rationals:} Note that for $(\Q,{+},0^*)$, it is the case that % \begin{itemize} \item for all $x,y \in \Q$, $x + y = y + x$ \item for all $x,y,z \in \Q$, $(x + y) + z = x + (y + z)$ \item for all $x \in \Q$, $0^* + x = x + 0^* = x$ \item for all $x \in \Q$, $x + -x = -x + x = 0^*$ \end{itemize} % and for $(\Q,{\times},1^*)$, it is the case that % \begin{itemize} \item for all $x,y \in \Q$, $x \times y = y \times x$ \item for all $x,y,z \in \Q$, $(x \times y) \times z = x \times (y \times z)$ \item for all $x \in \Q$, $1^* \times x = x \times 1^* = x$ \end{itemize} % And so for $(\Q,{+},{\times},0^*,1^*)$, % \begin{itemize} \item $(\Q,{+},0^*)$ is a \emph{commutative group} with additive inverse $-x$ for every $x \in \Q$ \item $(\Q,{\times},1^*)$ is a \emph{commutative monoid} with multiplicative inverse $x^{-1}$ for every $x \in \Q \setdiff \{0^*\}$ \item $0^* \neq 1^*$ \item for each $x,y,z \in \Q$, $x(y + z) = xy + xz$ and $(x + y)z = xz + yz$ \item for all $x \in \Q \setdiff \{0^*\}$, $x \times x^{-1} = x^{-1} \times x = 1^*$ \end{itemize} % Therefore, $(\Q,{+},{\times},0^*,1^*)$ is a \emph{field}. Thus, $(\Q,{+},{\times},0^*,1^*)$ is trivially an algebra over itself (\ie, a $\Q$-algebra). However, also note that for any $x,y,z \in \Q$, % \begin{itemize} \item if $x \leq y$ then $z + x \leq z + y$ \item if $0^* \leq x$ and $0^* \leq y$ then $0^* \leq xy$ \end{itemize} % and so $(\Q,{+},{\times},0^*,1^*,{\leq})$ is an \emph{ordered field} and all aspects of familiar arithmetic apply to it. Unless otherwise noted, whenever $\Q$ is used, it is assumed that it is equipped with operators $+$ and $\times$ and order relation $\leq$; in other words, $\Q$ is implicitly taken to be the ordered field $(\Q,{+},{\times},0^*,1^*,{\leq})$. \paragraph{Relationship to Integers:} Define the set $\Z^*$ as the set of rationals with a denominator of $1$.
That is, define $\Z^*$ by % \begin{align*} \Z^* &\triangleq \left\{ \frac{p}{q} \in \Q : q = 1 \right\} = \{ [(p,q)] : p \in \Z, q = 1 \}\\ &= \{ [(p,1)] : p \in \Z \} = \left\{ \frac{p}{1} : p \in \Z \right\}\\ &= \left\{ p^* : p \in \Z \right\} \end{align*} % and so $\Z^*$ is the set of rationals that have an element in their equivalence class with denominator $1$. It is easy to show that the image of $\Z^* \times \Z^*$ under either operator $+$ or $\times$ is $\Z^*$. Additionally, it can be shown that $(\Z^*,{+}|_{\Z^*},{\times}|_{\Z^*})$ forms a commutative ring, and so $\Z^*$ is a subring of $\Q$. Of course, since $\Q$ is an ordered field, $\Q$ is also an ordered ring; since every subring of an ordered ring is also an ordered ring, $(\Z^*,{+}|_{\Z^*},{\times}|_{\Z^*})$ is an ordered ring. Now, take the function $f: \Z^* \mapsto \Z$ defined by % \begin{align*} f &\triangleq \left\{ \left(\frac{p}{1}, p\right): \text{ for all } p \in \Z \right\}\\ &= \left\{ \dots, \left(\frac{-2}{1}, -2\right), \left(\frac{-1}{1}, -1\right), \left(\frac{0}{1}, 0\right), \left(\frac{1}{1}, 1\right), \left(\frac{2}{1}, 2\right), \dots \right\}\\ &= \{ (p^*, p): \text{ for all } p \in \Z \}\\ &= \{ \dots, (-2^*, -2), (-1^*, -1), (0^*, 0), (1^*, 1), (2^*, 2), \dots \} \end{align*} % Clearly this is a bijection. That is, the inverse $f^{-1}: \Z \mapsto \Z^*$ is defined by % \begin{align*} f^{-1} &\triangleq \left\{ \left(p,\frac{p}{1}\right): \text{ for all } p \in \Z \right\}\\ &= \left\{ \dots, \left(-2,\frac{-2}{1}\right), \left(-1,\frac{-1}{1}\right), \left(0,\frac{0}{1}\right), \left(1,\frac{1}{1}\right), \left(2,\frac{2}{1}\right), \dots \right\}\\ &= \{ (p, p^*): \text{ for all } p \in \Z \}\\ &= \{ \dots, (-2,-2^*), (-1,-1^*), (0,0^*), (1,1^*), (2,2^*), \dots \} \end{align*} % Therefore $\Z \cong \Z^*$.
Also, note that for any rationals $x,y \in \Z^*$, % \begin{enumerate}[(i)] \item if $x \geq y$ then $f(x) \geq f(y)$ \label{item:rational_integer_ordering} \item $f(x + y) = f(x) + f(y)$ \label{item:rational_integer_ring_homomorphism_plus} \item $f(x \times y) = f(x) \times f(y)$ \label{item:rational_integer_ring_homomorphism_times} \item $f(1^*)=1$ \label{item:rational_integer_ring_homomorphism_m_identity} \end{enumerate} % Property (\shortref{item:rational_integer_ordering}) shows that $f$ is a monotone function, and properties (\shortref{item:rational_integer_ring_homomorphism_plus})--% (\shortref{item:rational_integer_ring_homomorphism_m_identity}) show that $f$ is a ring homomorphism. Since $f$ is also a bijection, $f$ is an isomorphism in both the order sense and the algebraic sense. In other words, $\Z$ is isomorphic to $\Z^*$ in both an order sense and an algebraic sense. Therefore, not only is $\Z \cong \Z^*$, but $\Z^*$ is a valid \emph{representation} for $\Z$, and it is justifiable to say that $\Z$ is a subring of $\Q$. For example, note that for any rationals $x,y \in \Z^*$ and whole number $a \in \W$, % \begin{itemize} \item $x = y$ if and only if $f(x) = f(y)$ \item $x \leq y$ if and only if $f(x) \leq f(y)$ \item $f(0^*) = 0$ \item $f(x + y) = f(x)+f(y)$ \item $f(x - y) = f(x)-f(y)$ \item $f(1^*) = 1$ \item $f(x y) = f(x) f(y)$ \item $f(x^{a^*}) = f(x)^a$ \end{itemize} % So arithmetic and order are both preserved by the bijection $f$. Thus, while $\Z$ is certainly not equal to $\Z^*$, it is equal in all of the important ways that matter to us, and so we can consider $\Z \subset \Q$ with all of its standard ordering and operations. In other words, the $*$ superscript can be dropped from all of the rational symbols above; the set $\Z^*$ is a valid representation of the set of the integers $\Z$.
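The order- and ring-preserving properties of this embedding of $\Z$ into $\Q$ can be spot-checked using `fractions.Fraction`, whose values $p/1$ play the role of $\Z^*$ (the helper name `f_inv` is ours, mirroring $f^{-1}$ above):

```python
from fractions import Fraction

def f_inv(p):
    """Embed the integer p into Q as the rational p/1 (an element of Z*)."""
    return Fraction(p, 1)

for p in range(-5, 6):
    for q in range(-5, 6):
        assert f_inv(p + q) == f_inv(p) + f_inv(q)   # preserves +
        assert f_inv(p * q) == f_inv(p) * f_inv(q)   # preserves x
        assert (p <= q) == (f_inv(p) <= f_inv(q))    # preserves order
assert f_inv(1) == 1
```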
\subsection{Ordering Issues with the Countable Numbers}
\label{app:math_ordering_issues}
Up to this point, the numbers that have been defined have been very intuitive. That is,
%
\begin{itemize}
\item Whole numbers (and natural numbers) are an abstraction of standard counting.
\item Integers quantify the differences between whole numbers.
\item Rationals provide a scale on which to order differences (\ie, integers) on equal footing.
\end{itemize}
%
Additionally, since there is a subset of the integers that is isomorphic to the whole numbers and a subset of the rationals that is isomorphic to the integers, the rationals provide an interesting new perspective on the other sets of numbers. That is, the rationals seem to fill gaps in the other numbers; they provide the same extension to the whole numbers that the integers do while also yielding an unbounded set of numbers between each integer. However, like the whole numbers and integers, the rationals are countable. As we will show, this will ultimately limit how well the rationals can fill the gaps between the integers.
\paragraph{Example of Existence of Bounds:}
As an exercise, take the sets $\set{X},\set{Y} \subset \Q$ defined as
%
\begin{equation*}
\set{X} \triangleq \{ q \in \Q : -2 < q \text{ and } q < 2 \}
\quad \text{ and } \quad
\set{Y} \triangleq \{ q \in \Q : 1 \leq q \leq 5 \}
\end{equation*}
%
Of course, these sets could also be specified with $\{ q \in \Q : |q| < 2 \}$ and $\{ q \in \Q : |q-3| \leq 2 \}$ respectively. Note that the infimum and supremum of these two sets both exist. In particular,
%
\begin{equation*}
\inf \set{X} = -2 \quad \text{ and } \quad \sup \set{X} = 2
\end{equation*}
%
and
\begin{equation*}
\inf \set{Y} = 1 \quad \text{ and } \quad \sup \set{Y} = 5
\end{equation*}
%
While set $\set{Y}$ has a maximum (\ie, $\max \set{Y} = 5$) and a minimum (\ie, $\min \set{Y} = 1$), set $\set{X}$ has neither a maximum nor a minimum.
Both sets contain a countably infinite number of elements, and because of that they can both be put in one-to-one correspondence with the set $\N$. These are typical sets of rational numbers. Both are bounded, though one includes its bounds and one does not. The bounds exist and are members of $\Q$.
\paragraph{Example of Nonexistence of Bounds:}
We borrow this example from \citet{Rudin76}. Consider the set $\set{Z} \subset \Q$ defined as
%
\begin{equation*}
\set{Z} \triangleq \{ q \in \Q : q^2 \leq 2 \text{ and } q \geq 0 \}
\end{equation*}
%
Note that it has been shown that there is no rational $q \in \Q$ such that $q^2 = 2$. Therefore,
%
\begin{equation*}
\set{Z} = \{ q \in \Q : q^2 < 2 \text{ and } q \geq 0 \}
\end{equation*}
%
Since $0^2 < 2$ and $0 \geq 0$, it is clear that $0 \in \set{Z}$. In fact, $\min \set{Z} = \inf \set{Z} = 0$. That is, $0$ is the greatest lower bound and, in fact, the minimum of $\set{Z}$. Also, it can be shown that for all $z \in \set{Z}$, $z < 2$. Therefore, $2$ is an upper bound for $\set{Z}$. Thus, $\set{Z}$ is certainly bounded from above and bounded from below (\ie, $\set{Z}$ is \emph{bounded}). However, note that $2^2 > 2$, and so while $2$ is an upper bound on $\set{Z}$, $2 \notin \set{Z}$. To search for the \emph{least} upper bound of $\set{Z}$, take $q \in \Q$ with $q \geq 0$ and define $r \in \Q$ such that
%
\begin{align}
r &= q + \frac{2 - q^2}{q + 2} \label{eq:def_r_one}\\
&= \frac{q(q+2)}{q+2} + \frac{2 - q^2}{q + 2} = \frac{q^2+2q+2-q^2}{q+2} = \frac{2q+2}{q+2} \label{eq:def_r_two}\\
&= \frac{2(q+1)}{q+2} = 2 \frac{q+1}{q+2} \label{eq:def_r_three}
\end{align}
%
then, by \longref{eq:def_r_two},
%
\begin{align}
r^2 &= \frac{(2q+2)^2}{(q+2)^2} = \frac{4 q^2 + 8q + 4}{(q+2)^2} = \frac{2 q^2 + 8q + 8 + 2q^2 - 4}{(q+2)^2} \nonumber\\
&= \frac{2 (q^2 + 4q + 4) + 2(q^2 - 2)}{(q+2)^2} = \frac{2 (q+2)^2 + 2(q^2 - 2)}{(q+2)^2} \nonumber\\
&= 2 + \frac{2(q^2 - 2)}{(q+2)^2} = 2 - \frac{2(2 - q^2)}{(q+2)^2} \label{eq:def_r2}
\end{align}
%
Assume that $q \in \set{Z}$.
Then $2 - q^2$ is positive and, by \longref{eq:def_r_one}, $r > q$. Moreover, \longref{eq:def_r2} shows that $r^2 - 2$ is negative. That is, $r^2 < 2$. Therefore, $r \in \set{Z}$. Thus, for every $q \in \set{Z}$, there exists an $r > q$ such that $r \in \set{Z}$, and so there can be no upper bound for $\set{Z}$ contained in $\set{Z}$. Next, assume that $q \in \Q$ is such that $q \notin \set{Z}$ and $q \geq 0$. Then $2 - q^2$ is negative and, by \longref{eq:def_r_one}, $r < q$. Moreover, \longref{eq:def_r2} shows that $r^2 - 2$ is positive. That is, $r^2 > 2$. Therefore, $r \notin \set{Z}$. Thus, for every $q \in \Q$ with $q \notin \set{Z}$ and $q \geq 0$, there exists an $r < q$ such that $r \notin \set{Z}$. Therefore, $\set{Z} \subset \Q$ has no least upper bound in $\Q$. This means $\Q$ cannot be \emph{gapless}.
\paragraph{Gaps in Rational Numbers:}
The prior example shows that the rationals $\Q$ are somehow missing important numbers. That is, despite the fact that the rationals are densely ordered (\ie, between any two distinct rationals there is an unbounded number of other rationals), there are still some sort of gaps or holes between rational numbers. This is what prevents the rationals from being \emph{gapless} and therefore from being \emph{complete}. In fact, it can be shown that any partially ordered set that is densely ordered cannot be \emph{gapless} if it is also countable; the gaps in the rationals are a direct consequence of their countability. Nontrivial dense sets that are gapless must also be uncountable.
\subsection{Countability and Order: Gaplessness and Dense Ordering}
\label{app:math_countability_and_order}
These well-known results are due to Cantor, who made significant contributions to the analysis of infinite sets.
\paragraph{Lemma:}
Every nonempty subset of $(\N,{\leq})$ has a minimum element. This was discussed in \longref{app:math_whole_numbers}. It depends upon $\N$ being gapless, bounded from below, and not densely ordered.
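The recurrence $r = q + (2 - q^2)/(q + 2)$ from the example above is easy to iterate with exact rational arithmetic. The following Python sketch (our illustration; the starting point $q = 1$ and the iteration count are arbitrary choices) shows each iterate remaining in $\set{Z}$ while its square creeps toward $2$ from below:

```python
from fractions import Fraction

q = Fraction(1)              # q = 1 is in Z since 1^2 < 2 and 1 >= 0
iterates = [q]
for _ in range(8):
    r = q + (2 - q * q) / (q + 2)    # equivalently r = 2(q + 1)/(q + 2)
    assert r > q and r * r < 2       # r is a strictly larger member of Z
    q = r
    iterates.append(q)

# The squares approach 2 from below but never reach it: Z has no
# largest element, yet every member's square stays short of 2.
assert all(x * x < 2 for x in iterates)
```

The first few iterates are $1$, $4/3$, $7/5$, $24/17$, \dots, the classical rational approximations to $\sqrt{2}$.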
\paragraph{Theorem:}
Take partially ordered set $(\set{X},{\leq})$. If it is the case that
%
\begin{enumerate}[(i)]
\item $\set{X}$ contains at least two elements \label{item:countability_two_elements}
\item $\set{X}$ is densely ordered \label{item:countability_densely_ordered}
\item $\set{X}$ is gapless \label{item:countability_gapless}
\end{enumerate}
%
then set $\set{X}$ must be uncountable. As a logical consequence of this theorem, if $\set{X}$ is countable and has at least two elements then it must either not be gapless (\eg, $\Q$) or not be densely ordered (\eg, $\W$, $\N$, and $\Z$). Of course, if a set is not gapless then it cannot be complete.
\paragraph{Proof of Theorem:}
To prove this theorem, take a partially ordered set $(\set{X},{\leq})$ that meets properties (\shortref{item:countability_two_elements}), (\shortref{item:countability_densely_ordered}), and (\shortref{item:countability_gapless}). However, assume that set $\set{X}$ is countable. We will show that this leads to a logical contradiction, and thus $\set{X}$ must be uncountable. This proof method is a form of \emph{proof by contradiction} (\ie, \emph{reductio ad absurdum}); see \longref{eq:math_logic_application_proof}. Since $\set{X}$ is countable, there exists a bijective function $f: \N \mapsto \set{X}$. Take such a function $f$. Then, $f[\N]=\set{X}$ and $f^{-1}[\set{X}]=\N$. In particular,
%
\begin{equation*}
\{ f(n) : n \in \N \} = \{ f(1), f(2), f(3), f(4), \dots \} = \set{X}
\end{equation*}
%
In other words, each element of $\set{X}$ can be represented by a \emph{unique} symbol of the form $f(n)$ with $n \in \N$. Clearly,
%
\begin{equation*}
\{ f^{-1}(x) : x \in \set{X} \} = \{f^{-1}(f(1)),f^{-1}(f(2)),f^{-1}(f(3)),f^{-1}(f(4)),\dots\} = \N
\end{equation*}
%
Also take $a_0,b_0 \in \set{X}$ such that $a_0 < b_0$. This is possible by property (\shortref{item:countability_two_elements}).
Define the set $\set{A}_0 \subseteq \N$ by
%
\begin{equation*}
\set{A}_0 \triangleq \{ n \in \N : a_0 < f(n) < b_0 \}
\end{equation*}
%
By property (\shortref{item:countability_densely_ordered}), $\set{A}_0 \neq \emptyset$. Additionally, since $\set{A}_0 \subseteq \N$, it has a minimum element by the lemma stated above. Therefore, define $a_1 \triangleq f( \min \set{A}_0 )$. Similarly, define the set $\set{B}_0 \subseteq \N$ by
%
\begin{align*}
\set{B}_0 &\triangleq \{ n \in \N : a_1 < f(n) < b_0 \}\\
&\subseteq \set{A}_0 \setdiff \{\min \set{A}_0\}
\end{align*}
%
and define $b_1 \triangleq f( \min \set{B}_0 )$. By property (\shortref{item:countability_densely_ordered}), this process can continue \adinfinitum{} with $a_i$ and $b_i$ defined for all $i \in \N$ as
%
\begin{equation*}
a_i \triangleq f\left(\min\{n \in \N : a_{i-1} < f(n) < b_{i-1}\}\right)
\end{equation*}
%
and
%
\begin{equation*}
b_i \triangleq f\left( \min\{n \in \N : a_i < f(n) < b_{i-1}\} \right)
\end{equation*}
%
The inequalities are strict so that each newly selected element is distinct from all of its predecessors. This process is shown graphically in \longref{fig:countable_intervals}, where the arrow points in the direction of increasing order; that is, since $a_1$ is to the right of $a_0$ then $a_1 > a_0$.
%
\begin{figure}[!ht]\centering
\begin{picture}(300,20)(-150,-10) % x: -150 to 150
\put(-140,6){\makebox(0,0)[b]{$a_0$}}
\put(-140,-3){\line(0,1){6}}
\put(-140,-6){\makebox(0,0)[t]{$f(1)$}}
\put(-118,6){\makebox(0,0)[b]{$a_1$}}
\put(-118,-3){\line(0,1){6}}
\put(-118,-6){\makebox(0,0)[t]{$f(3)$}}
\put(-91,6){\makebox(0,0)[b]{$a_2$}}
\put(-91,-3){\line(0,1){6}}
\put(-91,-6){\makebox(0,0)[t]{$f(5)$}}
\put(-59,6){\makebox(0,0)[b]{$a_3$}}
\put(-59,-3){\line(0,1){6}}
\put(-59,-6){\makebox(0,0)[t]{$f(7)$}}
%
%\put(-47.5,0){\vector(-1,0){102.5}}
\put(-47.5,0){\line(-1,0){102.5}}
\put(-39.5,0){\makebox(0,0){$\cdots$}}
\put(-31.5,0){\vector(1,0){197.5}}
%
\put(-20,6){\makebox(0,0)[b]{$b_3$}}
\put(-20,-3){\line(0,1){6}}
\put(-20,-6){\makebox(0,0)[t]{$f(8)$}}
\put(40,6){\makebox(0,0)[b]{$b_2$}}
\put(40,-3){\line(0,1){6}}
\put(40,-6){\makebox(0,0)[t]{$f(6)$}}
\put(100,6){\makebox(0,0)[b]{$b_1$}}
\put(100,-3){\line(0,1){6}}
\put(100,-6){\makebox(0,0)[t]{$f(4)$}}
\put(140,6){\makebox(0,0)[b]{$b_0$}}
\put(140,-3){\line(0,1){6}}
\put(140,-6){\makebox(0,0)[t]{$f(2)$}}
\end{picture}
\caption{Nested Intervals of a Countable Densely Ordered Set}
\label{fig:countable_intervals}
\end{figure}
%
Clearly, for any $i \in \N$, $a_i > a_{i-1}$ and $b_i < b_{i-1}$. Additionally, for any $i \in \N$ and $j \in \N$, it is the case that $a_i < b_j$. Now take sets $\set{A} \subset \set{X}$ and $\set{B} \subset \set{X}$, defined by
%
\begin{equation*}
\set{A} \triangleq \{ a_i : i \in \N \}
\end{equation*}
%
and
%
\begin{equation*}
\set{B} \triangleq \{ b_i : i \in \N \}
\end{equation*}
%
As shown above, any element of $\set{B}$ is an upper bound of set $\set{A}$, and any element of set $\set{A}$ is a lower bound of set $\set{B}$. Therefore, since $\set{X}$ is gapless, the least upper bound of $\set{A}$ (\ie, $\sup \set{A}$) and the greatest lower bound of $\set{B}$ (\ie, $\inf \set{B}$) exist.
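The construction of the nested $a_i$ and $b_i$ can be simulated for a concrete countable densely ordered set: the rationals themselves under a fixed enumeration. The Python sketch below is only an illustration; the particular enumeration order and the starting pair $a_0 = 0$, $b_0 = 1$ are our arbitrary choices, and at each stage the rational with the smallest index strictly between the current bounds is selected:

```python
from fractions import Fraction
from math import gcd

def rationals():
    """Enumerate every rational exactly once: 0, then +/- p/q in lowest
    terms, ordered by p + q (one of many possible bijections N -> Q)."""
    yield Fraction(0)
    s = 2
    while True:
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:
                yield Fraction(p, q)
                yield Fraction(-p, q)
        s += 1

_cache, _gen = [], rationals()

def f(n):
    """f(n) is the n-th rational in the enumeration (n = 1, 2, 3, ...)."""
    while len(_cache) < n:
        _cache.append(next(_gen))
    return _cache[n - 1]

def first_between(lo, hi):
    """The enumerated rational of least index strictly between lo and hi;
    density of Q guarantees the scan terminates."""
    n = 1
    while not (lo < f(n) < hi):
        n += 1
    return f(n)

a, b = [Fraction(0)], [Fraction(1)]          # a_0 = 0, b_0 = 1
for _ in range(5):
    a.append(first_between(a[-1], b[-1]))    # a_i
    b.append(first_between(a[-1], b[-1]))    # b_i

# The a_i strictly increase, the b_i strictly decrease, and every a_i
# stays below every b_j -- the nested intervals of the figure.
assert all(x < y for x, y in zip(a, a[1:]))
assert all(y < x for x, y in zip(b, b[1:]))
assert all(x < y for x in a for y in b)
```

With this enumeration the construction yields $a_1 = 1/2$, $b_1 = 2/3$, $a_2 = 3/5$, $b_2 = 5/8$, and so on; the intervals shrink exactly as in the figure.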
Since $\sup \set{A} \in \set{X}$ and $\set{X}$ is countable, there exists some $m \in \N$ such that $f(m) = \sup \set{A}$. Take such an $m$. It is easy to show that $\sup \set{A} \leq \inf \set{B}$, and so $f(m) \leq \inf \set{B}$. Therefore, for all $a \in \set{A}$ and all $b \in \set{B}$, it is the case that
%
\begin{equation}
a \leq f(m) \leq b
\label{eq:uncountable_proof_contradiction}
\end{equation}
%
However, by the construction of elements $a_i$ and $b_i$, there exists some $n$ such that $a_n = f(m)$ or $b_n = f(m)$; either $f(m)$ equals one of the selected elements outright, or $f(m)$ lies strictly between all of them, in which case index $m$ satisfies the selection condition at every stage, and since the selected minimum indices strictly increase but can never exceed $m$, index $m$ must itself eventually be selected. Take such an $n$. As discussed, $a_{n+1} > a_n$ and $b_{n+1} < b_n$. Thus, it is either the case that $a_{n+1} > f(m)$ or $b_{n+1} < f(m)$, which contradicts \longref{eq:uncountable_proof_contradiction} since $a_{n+1} \in \set{A}$ and $b_{n+1} \in \set{B}$. Therefore, $\set{X}$ must be uncountable.
\subsection{The Real Numbers}
\label{app:math_reals}
As shown in \longref{app:math_ordering_issues}, the rational numbers are somehow not complete; despite there being an unbounded number of rationals between any two rationals, there are still numbers missing from $\Q$. The real numbers have been constructed to fill these gaps. However, by the theorem above, this construction forces the reals to be uncountable.
\paragraph{Definition:}
Following the example of \citet{Rudin76}, we construct the reals using \emph{Dedekind cuts} of the rational numbers, a method attributable to Dedekind. The basic idea is to cut the rational numbers $\Q$ into a partition of two sets where one set is constructed to have no least upper bound; each real number can be thought of as taking up the space in between the two sets of the partition. That is, each real number cuts the rationals into two halves. Alternatively, the real numbers can be defined as \emph{Cauchy sequences} of rational numbers, as is discussed by \citet{Stoll79}; this other construction is originally due to Cantor.
These two constructions are isomorphic to each other in both the order and algebraic senses and thus form equivalent notions of the real numbers. The following construction is a condensed form of the derivation given by \citet{Rudin76}. We omit much of the proof for brevity. The real numbers are the most abstract of the conventional number systems, and thus their construction is considerably more complicated than those of the other number systems. Define a real number as a strict subset $\alpha \subset \Q$ where
%
\begin{enumerate}[(i)]
\item $\alpha \neq \emptyset$ and $\alpha \neq \Q$. \label{item:real_proper}
\item If $q \in \Q$ and $p \in \alpha$ are such that $q < p$ then $q \in \alpha$. \label{item:real_member}
\item If $p \in \alpha$ then $p < r$ for some $r \in \alpha$. \label{item:real_nonmember}
\end{enumerate}
%
which is called a \emph{Dedekind cut} of the rational numbers. In other words, $\alpha \subset \Q$ is a strict non-empty subset of the rationals that is closed downward and has no largest member. Additionally, any rational that is not a member of $\alpha$ is greater than any member of $\alpha$. Similarly, any rational that is greater than a non-member of $\alpha$ cannot itself be a member of $\alpha$. Define the set of the real numbers \symdef{Bnumbers.50}{reals}{$\R$}{the set of the real numbers}, also called the reals, as
%
\begin{equation*}
\R \triangleq \{ \xi \subset \Q : \xi \text{ has properties (\shortref{item:real_proper}), (\shortref{item:real_member}), and (\shortref{item:real_nonmember})} \}
\end{equation*}
%
Real numbers $\alpha, \beta \in \R$ are equal if and only if $\alpha \subseteq \beta$ and $\beta \subseteq \alpha$; that is, two real numbers are equal exactly when they are equal as sets. Of course, this will be denoted $\alpha = \beta$.
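To make the definition concrete, a Dedekind cut can be modeled informally as a membership test on the rationals. In the Python sketch below (our illustration only; a genuine cut is an infinite set, so we can check membership but never enumerate it), `sqrt2_cut` represents the cut for $\sqrt{2}$ and `one_cut` represents $1^*$:

```python
from fractions import Fraction

def sqrt2_cut(p):
    """Membership in the cut for sqrt(2): all negative rationals together
    with the non-negative rationals whose square is less than 2."""
    return p < 0 or p * p < 2

def one_cut(p):
    """Membership in the cut 1* = { p in Q : p < 1 }."""
    return p < 1

sample = [Fraction(n, 8) for n in range(-40, 41)]   # -5 to 5 in steps of 1/8

# Property (ii): anything below a member is a member (spot check).
members = [p for p in sample if sqrt2_cut(p)]
assert all(sqrt2_cut(q) for p in members for q in sample if q < p)

# Ordering (defined below) is set inclusion: every sampled member of the
# cut 1* is a member of the sqrt(2) cut, but not conversely.
assert all(sqrt2_cut(p) for p in sample if one_cut(p))
assert any(sqrt2_cut(p) and not one_cut(p) for p in sample)
```

The last two assertions are the set-inclusion statement $1^* \subset (2^*)^{\frac{1}{2}^*}$, restricted to the sample.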
\paragraph{Symbols:}
For every rational number $q \in \Q$, define the set $q^*$ with
%
\begin{equation*}
q^* \triangleq \{ p \in \Q : p < q \}
\end{equation*}
%
Therefore,
%
\begin{align*}
&\mathrel{\vdots}\\
-2^* &\triangleq \{ p \in \Q : p < -2 \}\\
-1^* &\triangleq \{ p \in \Q : p < -1 \}\\
-\frac{1}{2}^* &\triangleq \left\{ p \in \Q : p < -\frac{1}{2} \right\}\\
0^* &\triangleq \{ p \in \Q : p < 0 \}\\
\frac{1}{2}^* &\triangleq \left\{ p \in \Q : p < \frac{1}{2} \right\}\\
1^* &\triangleq \{ p \in \Q : p < 1 \}\\
2^* &\triangleq \{ p \in \Q : p < 2 \}\\
&\mathrel{\vdots}
\end{align*}
%
Make special note of $0^*$, $1^*$, and $-1^*$, which will all be used explicitly below. Also note that $0^*$ is the set of all negative rational numbers. Clearly, for every $q \in \Q$, $q^* \in \R$. Accordingly, define the set $\Q^*$ as
%
\begin{align*}
\Q^* &\triangleq \{ \{ p \in \Q : p < q \} : q \in \Q \}\\
&= \{ q^* : q \in \Q \}
\end{align*}
%
It is clear that $\Q^* \subseteq \R$. Also note that by construction, $\Q^* \cong \Q$. It is also clear that for any $r^* \in \Q^*$, the least upper bound of $r^*$ exists and is $r \in \Q$ (\ie, for all $r \in \Q$, $\sup r^* = r$). In fact, $\Q^*$ is the collection of all real numbers that have a least upper bound in $\Q$. Later we will justify denoting $r^*$ simply by $r$. We refrain from making this substitution early in order to stress the difference between real numbers and rational numbers. Note that since the least upper bound of every element of $\Q^*$ exists, it can be written that for every $r^* \in \Q^*$,
%
\begin{equation*}
\sup r^* = \inf ( \Q \setdiff r^* ) = \min ( \Q \setdiff r^* ) = r
\end{equation*}
%
where $r \in \Q$ such that $r^* = \{ p \in \Q : p < r \}$. Also note that eventually we will show that $\Q^*$ is isomorphic in both order and algebraic senses to $\Q$, and thus $\Q$ and $\Q^*$ can be considered equivalent without any loss of generality (\ie, since $\Q^* \subseteq \R$ then $\Q \subseteq \R$).
\paragraph{Total Ordering:} Take two real numbers $\alpha, \beta \in \R$. It is the case that $\alpha$ is less than or equal to $\beta$ (denoted $\alpha \leq \beta$) if and only if $\alpha \subseteq \beta$. Similarly, $\alpha$ is strictly less than $\beta$ (denoted $\alpha < \beta$) if and only if $\alpha \subset \beta$. Just as with the related inequality relation on the other numbers, $\alpha \leq \beta$ can also be denoted $\beta \geq \alpha$, and $\alpha < \beta$ can also be denoted $\beta > \alpha$. In these cases, $>$ ($\geq$) represents that a real number is greater than (or equal to) another real. This ordering implies that % \begin{equation*} \cdots \leq -2^* \leq -1^* \leq \frac{-1}{2}^* \leq \frac{-1}{8}^* \leq 0^* \leq \frac{1}{8}^* \leq \frac{1}{2}^* \leq 1^* \leq 2^* \leq \cdots \end{equation*} % and, in fact, % \begin{equation*} \cdots < -2^* < -1^* < \frac{-1}{2}^* < \frac{-1}{8}^* < 0^* < \frac{1}{8}^* < \frac{1}{2}^* < 1^* < 2^* < \cdots \end{equation*} \paragraph{Special Subsets of the Reals:} We refer to any real number greater than $0^*$ as \emph{positive} and define the set of \emph{positive real numbers} \symdef{Bnumbers.510}{realsg0}{$\R_{>0}$}{the set of the strictly positive real numbers} as % \begin{equation*} \R_{>0} \triangleq \{ r \in \R : r > 0^* \} \end{equation*} % The set of the \emph{non-negative real numbers} \symdef{Bnumbers.511}{realsgeq0}{$\R_{\geq0}$}{the set of the non-negative real numbers} is defined to be the union of the positive real numbers with the singleton set $\{0^*\}$. 
That is, $\R_{\geq0}$ is defined to be % \begin{align*} \R_{\geq0} &\triangleq \R_{>0} \cup \{ 0^* \}\\ &= \{ r \in \R : r \geq 0^* \} \end{align*} % Similarly, we refer to any real less than $0^*$ as \emph{negative} and define the set of \emph{negative real numbers} \symdef{Bnumbers.520}{realsl0}{$\R_{<0}$}{the set of the strictly negative real numbers} as % \begin{equation*} \R_{<0} \triangleq \{ r \in \R : r < 0^* \} \end{equation*} % The set of the \emph{non-positive real numbers} \symdef{Bnumbers.521}{realsleq0}{$\R_{\leq0}$}{the set of the non-positive real numbers} is defined to be the union of the negative real numbers with the singleton set $\{0^*\}$. That is, $\R_{\leq0}$ is defined to be % \begin{align*} \R_{\leq0} &\triangleq \R_{<0} \cup \{ 0^* \}\\ &= \{ r \in \R : r \leq 0^* \} \end{align*} % Note that $\R_{\geq0} = \R \setdiff \R_{<0} = \R_{<0}^c$ and $\R_{\leq0} = \R \setdiff \R_{>0} = \R_{>0}^c$. That is, the complement of the negative reals is the non-negative reals and the complement of the positive reals is the non-positive reals. The \emph{non-zero reals} \symdef{Bnumbers.53}{realsneq0}{$\R_{\neq0}$}{the set of the non-zero real numbers} is defined to be the union of the positive reals and the negative reals. That is, $\R_{\neq0}$ is defined to be % \begin{align*} \R_{\neq0} &\triangleq \R_{>0} \cup \R_{<0}\\ &= \R \setdiff \{0^*\}\\ &= \{0^*\}^c\\ &= \{ r \in \R : r \neq 0^* \} \end{align*} % As shown, $\R_{\neq0}$ is the complement of the singleton set $\{0^*\}$. \paragraph{Dense Ordering:} Note that for any two \emph{distinct} real numbers $x,y \in \R$ such that $x < y$, there is a third real number $z \in \R$ such that $x < z < y$. As discussed, this is not the case with the whole numbers nor the integers. This property makes the set of real numbers $\R$ a \emph{densely ordered set}. This is an important property of the real numbers. The real numbers share this property with the rational numbers. 
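Dense ordering is easy to exhibit on the rational representatives: the midpoint of two distinct numbers always lies strictly between them. A brief Python sketch (exact arithmetic via the \texttt{fractions} module; the sample pairs are our arbitrary choices):

```python
from fractions import Fraction

def midpoint(x, y):
    """A witness for dense ordering: if x < y then x < (x + y)/2 < y."""
    return (x + y) / 2

pairs = [(Fraction(0), Fraction(1)),
         (Fraction(1, 3), Fraction(1, 2)),
         (Fraction(-7, 5), Fraction(-7, 5) + Fraction(1, 10 ** 9))]

for x, y in pairs:
    z = midpoint(x, y)
    assert x < z < y     # a third number always lies strictly between
```

The same witness works no matter how close $x$ and $y$ are, which is exactly what dense ordering asserts.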
\paragraph{Gaplessness:}
Let $\set{A} \subset \R$ be nonempty and have an \emph{upper bound} $\beta \in \R$. That is, for every $\alpha \in \set{A}$, it is the case that $\alpha \leq \beta$. Now define $\gamma$ to be the union of all elements $\alpha \in \set{A}$; that is, define $\gamma \triangleq \bigcup \{ \alpha: \alpha \in \set{A} \}$. It can be shown that $\gamma \in \R$ and $\sup \set{A} = \gamma$. In other words, $\sup \set{A} \in \R$. That is, for any nonempty subset of $\R$ that is bounded from above, the least upper bound of that set exists and is a member of $\R$. Similarly, let $\set{A} \subset \R$ be nonempty and have a \emph{lower bound} $\beta \in \R$. That is, for every $\alpha \in \set{A}$, it is the case that $\beta \leq \alpha$. Now let $\set{L}$ be the set of all lower bounds of $\set{A}$. The set $\set{L}$ is nonempty (it contains $\beta$) and is bounded from above (by any element of $\set{A}$), and so $\sup \set{L}$ exists by the argument above. It can be shown that $\inf \set{A} = \sup \set{L}$, and so $\inf \set{A} \in \R$. That is, for any nonempty subset of $\R$ that is bounded from below, the greatest lower bound of that set exists and is a member of $\R$. Therefore, $\R$ is \emph{gapless} (\ie, \emph{Dedekind complete}). Of course, subsets of $\R$ that are unbounded from above (below) have no least upper (greatest lower) bound.
\paragraph{Countability:}
By the theorem in \longref{app:math_countability_and_order}, since $\R$ is both densely ordered and gapless, $\R$ is \emph{uncountable}. This will be important in our discussion of the cardinality of $\R$ below.
\paragraph{Addition:}
Take real numbers $\alpha \in \R$ and $\beta \in \R$. Define the \emph{addition} operator $+$ such that
%
\begin{equation*}
\alpha + \beta \triangleq \{ r + s : r \in \alpha, s \in \beta \}
\end{equation*}
%
That is, $\alpha + \beta$ is the set of all rationals of the form $r + s$ where $r$ is any element of $\alpha$ and $s$ is any element of $\beta$. Take any three real numbers $\alpha, \beta, \gamma \in \R$.
The following statements can be shown.
%
\begin{itemize}
\item $\alpha + \beta$ meets the requirements for a real number; it is a Dedekind cut.
\item Since rational addition is commutative, $\alpha + \beta = \beta + \alpha$, and so real addition is also commutative.
\item Since rational addition is associative, $(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$, where the grouping symbols have the standard meaning. In other words, real addition is associative.
\item It is the case that $\alpha + 0^* = \alpha$, and so $0^*$ is the \emph{additive identity} for real addition.
\end{itemize}
%
The result of an addition is called a \emph{sum}.
\paragraph{Additive Inverses:}
Take real number $\alpha \in \R$. Define the symbol $-\alpha$ as
%
\begin{align*}
-\alpha &\triangleq \{ p \in \Q : \text{there exists } r \in \Q \text{ with } r>0 \text{ such that } -p - r \notin \alpha \}\\
&= \Q \setdiff \{ p \in \Q : \text{for all } r \in \Q \text{ with } r>0, -p - r \in \alpha \}
\end{align*}
%
It can be shown that $-\alpha \in \R$ (\ie, $-\alpha$ is a Dedekind cut, and so it is a valid real number). It can also be shown that $\alpha + (-\alpha) = 0^*$. Thus, $-\alpha$ is the \emph{additive inverse} for real number $\alpha$. Of course, it can be shown that $\alpha$ is positive (\ie, $\alpha \in \R_{>0}$) if and only if $-\alpha$ is negative (\ie, $-\alpha \in \R_{<0}$). Similarly, $\alpha \in \R_{<0}$ if and only if $-\alpha \in \R_{>0}$. Finally, $-(-\alpha)=\alpha$.
\paragraph{Subtraction:}
We can define the \emph{subtraction} operator $-$ for real numbers so that for any two reals $\alpha,\beta \in \R$,
%
\begin{align*}
\alpha - \beta &\triangleq \alpha + (-\beta)
\end{align*}
%
where $-\beta$ is the additive inverse for $\beta$. However, even though this is clearly a shorthand for addition, this operation is neither commutative nor associative. The result of a subtraction is called a \emph{difference}.
\paragraph{Multiplication:}
Take \emph{positive} real numbers $\alpha, \beta \in \R_{>0}$. Define the \emph{multiplication} operator $\times$ (where juxtaposition implies this operator) such that
%
\begin{equation*}
\alpha \beta \triangleq \{ p \in \Q : p \leq r s \text{ for some } r \in \alpha \text{ and } s \in \beta \text{ with } r > 0 \text{ and } s > 0 \}
\end{equation*}
%
Now take real numbers $\gamma,\delta \in \R$. Multiplication has so far been defined only for positive real numbers; thus, multiplication involving negative reals will be defined in terms of positive multiplication, and multiplication by $0^*$ will be defined explicitly to yield $0^*$. That is,
%
\begin{equation*}
\gamma \delta \triangleq
\begin{cases}
0^* &\text{if } \gamma = 0^* \text{ or } \delta = 0^*\\
(-\gamma)(-\delta) &\text{if } \gamma < 0^* \text{ and } \delta < 0^*\\
-((-\gamma)\delta) &\text{if } \gamma < 0^* \text{ and } \delta > 0^*\\
-(\gamma(-\delta)) &\text{if } \gamma > 0^* \text{ and } \delta < 0^*
\end{cases}
\end{equation*}
%
It can be shown that $\gamma \times 1^* = 1^* \times \gamma = \gamma$. Therefore, $1^*$ is the \emph{multiplicative identity} for real multiplication. Additionally, $-1^* \times \gamma = -\gamma$. Also, the distributive property holds; multiplication distributes over addition. That is, for $\gamma,\delta,\varepsilon \in \R$,
%
\begin{equation*}
\gamma ( \delta + \varepsilon ) = \gamma \delta + \gamma \varepsilon
\end{equation*}
%
It is also easy to show that multiplication is both \emph{commutative} and \emph{associative}. That is, for $\alpha,\beta,\gamma \in \R$,
%
\begin{equation*}
\alpha \beta = \beta \alpha
\end{equation*}
%
and
%
\begin{equation*}
\alpha (\beta \gamma) = (\alpha \beta) \gamma
\end{equation*}
%
where the grouping symbols have the normal impact on the order of operations. The result of a multiplication is called a \emph{product}.
\paragraph{Multiplicative Inverses:}
Take \emph{non-zero} real number $\alpha \in \R_{\neq0}$.
There exists $\alpha^{-1} \in \R_{\neq0}$ such that $\alpha \times \alpha^{-1} = 1^*$, where $\alpha^{-1}$ is called the \emph{multiplicative inverse} of $\alpha$.
\paragraph{Division:}
Take real numbers $\alpha,\beta \in \R$ with $\beta \neq 0^*$. Define the \emph{division} operator $/$ such that
%
\begin{equation*}
\alpha / \beta \triangleq \alpha \times \beta^{-1}
\end{equation*}
%
where $\beta^{-1}$ is the multiplicative inverse of $\beta$. Even though this is clearly a shorthand for multiplication, this operation is neither commutative nor associative. The result of a division is called a \emph{quotient}. Note that $1^* / \alpha = \alpha^{-1}$, and therefore the multiplicative inverse of $\alpha$ will sometimes be denoted $1^*/\alpha$. It is also common that $\alpha / \beta$ is denoted as the \emph{ratio} $\frac{\alpha}{\beta}$.
\paragraph{Exponentiation:}
Now that multiplication and division have been defined for the reals, exponentiation can also be defined. For any reals $x, y, a, b \in \R$, exponentiation of the reals is such that
%
\begin{align*}
x^{0^*} &\triangleq 1^*\\
x^{1^*} &\triangleq x\\
x^{-1^*} &\triangleq \frac{1^*}{x}\\
x^{a+b} &\triangleq x^a \times x^b\\
x^{-b} &\triangleq \frac{1^*}{x^b}\\
x^{a-b} &\triangleq \frac{x^a}{x^b}\\
(x^a)^b &\triangleq x^{a \times b}\\
(x \times y)^a &\triangleq x^a \times y^a\\
\left(\frac{x}{y}\right)^a &\triangleq \frac{x^a}{y^a}
\end{align*}
%
Take real $x \in \R$ and integers $p,q \in \Z$ with $q \neq 0$ that make up rational $\frac{p}{q} \in \Q$. By the laws above, the real
%
\begin{equation*}
x^{\frac{p}{q}^*} = ( x^{\frac{1}{q}^*} )^{p^*}
\end{equation*}
%
where $p^* = \frac{p}{1}^*$. Note that if $q < 0$ then $x^{\frac{1}{q}^*} = ( x^{\frac{1}{|q|}^*})^{-1^*}$, and so assume that $q > 0$. Thus, the existence of $x^{\frac{1}{q}^*}$ where $q \in \Z$ with $q > 0$ is of critical importance. The real number $x^{\frac{1}{q}^*}$ should be such that $( x^{\frac{1}{q}^*} )^{q^*} = x$.
Note that $(-1^*)^{\frac{1}{2}^*}$ does not exist since there is no real $x \in \R$ such that $x \times x = -1^*$; however, $(-8^*)^{\frac{1}{3}^*} = -2^*$ since $-2^* \times -2^* \times -2^* = -8^*$. Additionally, there \emph{is} a real number $x \in \R$ such that $x^{2^*} = 2^*$; that is, $(2^*)^{\frac{1}{2}^*}$ exists. In particular,
%
\begin{equation*}
(2^*)^{\frac{1}{2}^*} = \{ p \in \Q : p < 0 \text{ or } p^2 < 2 \}
\end{equation*}
%
Also note that for any $x \in \R$, $x^{2^*} \geq 0^*$. Additionally, by this definition, ${0^*}^{0^*} = 1^*$. This definition also gives an alternate notation for the multiplicative inverse. That is, for any $x \in \R \setdiff \{0^*\}$, its multiplicative inverse $1^*/x$ is also denoted $x^{-1^*}$ and so $x \times x^{-1^*} = x^{-1^*} \times x = 1^*$. Our discussion of \emph{logarithms} in \longref{app:math_logarithms} is intimately related to exponentiation of the real numbers.
\paragraph{Roots:}
Take integer $q \in \Z$ with $q > 0$ and real $x \in \R$. The real number $x^{\frac{1}{q}^*}$ is called the \emph{$q\th$ root} of $x$ and is also denoted $\sqrt[q]{x}$. The special case of $\sqrt[3]{x}$ is called the \emph{cube root} of $x$. The special case of $\sqrt[2]{x}$ is called the \emph{square root} of $x$ and is often written as $\sqrt{x}$.
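The existence claim for $q\th$ roots can be made tangible by bisection: the root is the least upper bound of the non-negative rationals whose $q\th$ power stays below $x$, and repeatedly halving an interval that brackets the root traps it as tightly as desired. A Python sketch with exact rationals (the bracketing interval and the step count are our arbitrary choices, and this is a numerical illustration, not the set-theoretic construction):

```python
from fractions import Fraction

def nth_root_bracket(x, q, steps=40):
    """Trap the q-th root of x > 0 in an interval [lo, hi] with
    lo**q < x <= hi**q, halving the bracket `steps` times."""
    lo, hi = Fraction(0), max(Fraction(1), x)   # the root lies in [lo, hi]
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid ** q < x:
            lo = mid      # mid is below the root: raise the lower bound
        else:
            hi = mid      # mid is at or above the root: lower the upper bound
    return lo, hi

lo, hi = nth_root_bracket(Fraction(2), 2)
assert lo ** 2 < 2 < hi ** 2            # the bracket still straddles sqrt(2)
assert hi - lo == Fraction(2, 2 ** 40)  # the width halves at every step
```

The invariant `lo**q < x` says that `lo` always belongs to the cut for the root, while `hi` never does; the two squeeze toward the least upper bound.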
\paragraph{Absolute Value and Signum:}
For any real $x \in \R$, denote its \emph{absolute value} with the notation $|x|$ defined by
%
\begin{equation*}
|x| \triangleq
\begin{cases}
x &\text{if } x \geq 0^*\\
-x &\text{if } x < 0^*
\end{cases}
\end{equation*}
%
and define the \emph{signum function} (also called the \emph{sign function}, not to be confused with the \emph{sine function}) $\sgn: \R \mapsto \{-1^*,0^*,1^*\}$ with
%
\begin{equation*}
\sgn(x) \triangleq
\begin{cases}
-1^* &\text{if } x < 0^*\\
0^* &\text{if } x = 0^*\\
1^* &\text{if } x > 0^*
\end{cases}
\end{equation*}
%
Therefore, any real $z \in \R$ can be represented as a magnitude (\ie, absolute value $|z|$) and a sign (\ie, $\sgn(z)$), as in
%
\begin{equation*}
z = \sgn(z) \times |z|
\end{equation*}
%
Note that the absolute value has some special properties. In particular, for any two reals $x,y \in \R$,
%
\begin{itemize}
\item $|x| \geq 0^*$
\item $|x| = 0^*$ if and only if $x = 0^*$
\item $|x \times y| = |x| \times |y|$
\item $|x + y| \leq |x| + |y|$
\item $|x - y| \geq |x| - |y|$
\item $|{-x}| = |x|$
\item $|x| \leq y$ if and only if $-y \leq x \leq y$
\item $|x/y| = |x|/|y|$ provided that $y \neq 0^*$
\end{itemize}
%
All of these properties are identical to the ones for rationals.
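These properties are mechanically checkable on rational samples. A short Python sketch (our illustration; the sample grid is an arbitrary choice) defining `sgn` and spot-checking the magnitude-and-sign decomposition together with the listed inequalities:

```python
from fractions import Fraction

def sgn(x):
    """The signum: -1 for negatives, 0 for zero, +1 for positives."""
    return (x > 0) - (x < 0)

sample = [Fraction(n, 4) for n in range(-8, 9)]   # -2 to 2 in steps of 1/4
for x in sample:
    assert x == sgn(x) * abs(x)                   # magnitude-and-sign form
    for y in sample:
        assert abs(x + y) <= abs(x) + abs(y)      # triangle inequality
        assert abs(x * y) == abs(x) * abs(y)
        assert abs(x - y) >= abs(x) - abs(y)
        assert abs(-x) == abs(x)
```

Each assertion corresponds to one of the bulleted properties above, restricted to the sample.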
\paragraph{Algebraic Structure of the Reals:}
Note that for $(\R,{+},0^*)$, it is the case that
%
\begin{itemize}
\item for all $x,y \in \R$, $x + y = y + x$
\item for all $x,y,z \in \R$, $(x + y) + z = x + (y + z)$
\item for all $x \in \R$, $0^* + x = x + 0^* = x$
\item for all $x \in \R$, $x + -x = -x + x = 0^*$
\end{itemize}
%
and for $(\R,{\times},1^*)$, it is the case that
%
\begin{itemize}
\item for all $x,y \in \R$, $x \times y = y \times x$
\item for all $x,y,z \in \R$, $(x \times y) \times z = x \times (y \times z)$
\item for all $x \in \R$, $1^* \times x = x \times 1^* = x$
\end{itemize}
%
And so for $(\R,{+},{\times},0^*,1^*)$,
%
\begin{itemize}
\item $(\R,{+},0^*)$ is a \emph{commutative group} with additive inverse $-x$ for every $x \in \R$
\item $(\R,{\times},1^*)$ is a \emph{commutative monoid} with multiplicative inverse $x^{-1}$ for every $x \in \R \setdiff \{0^*\}$
\item $0^* \neq 1^*$
\item for each $x,y,z \in \R$, $x(y + z) = xy + xz$ and $(x + y)z = xz + yz$
\item for all $x \in \R \setdiff \{0^*\}$, $x \times x^{-1} = x^{-1} \times x = 1^*$
\end{itemize}
%
Therefore, $(\R,{+},{\times},0^*,1^*)$ is a \emph{field}. Thus, $(\R,{+},{\times},0^*,1^*)$ is trivially an algebra over itself (\ie, an $\R$-algebra). However, also note that for any $x,y,z \in \R$,
%
\begin{itemize}
\item if $x \leq y$ then $z + x \leq z + y$
\item if $0^* \leq x$ and $0^* \leq y$ then $0^* \leq xy$
\end{itemize}
%
and so $(\R,{+},{\times},0^*,1^*,{\leq})$ is an \emph{ordered field} and all aspects of familiar arithmetic apply to it. Unless otherwise noted, whenever $\R$ is used, it is assumed that it is equipped with operators $+$ and $\times$ and order relation $\leq$; in other words, $\R$ is implicitly taken to be the ordered field $(\R,{+},{\times},0^*,1^*,{\leq})$.
\paragraph{Relationship to Rational Numbers:} Recall that % \begin{align*} \Q^* &\triangleq \{ \{ p \in \Q : p < q \} : q \in \Q \}\\ &= \{ q^* : q \in \Q \} \end{align*} % and so $\Q^*$ is the set of reals that have a least upper bound that is a rational number. It is easy to show that the image of $\Q^* \times \Q^*$ under either operator $+$ or $\times$ is $\Q^*$. Additionally, it can be shown that $(\Q^*,{+}|_{\Q^*},{\times}|_{\Q^*})$ forms a field, and so $\Q^*$ is a subfield of $\R$. Since every subfield of an ordered field is also an ordered field then $(\Q^*,{+}|_{\Q^*},{\times}|_{\Q^*})$ is an ordered field. Now, take the function $f: \Q^* \mapsto \Q$ defined by % \begin{align*} f &\triangleq \left\{ \left(\{q \in \Q : q < p\}, p\right): \text{ for all } p \in \Q \right\}\\ &= \{ (p^*, p): \text{ for all } p \in \Q \} \end{align*} % Clearly this is a bijection. That is, the inverse $f^{-1}: \Q \mapsto \Q^*$ is defined by % \begin{align*} f^{-1} &\triangleq \left\{ \left(p,\{q \in \Q : q < p\}\right): \text{ for all } p \in \Q \right\}\\ &= \{ (p, p^*): \text{ for all } p \in \Q \} \end{align*} % Therefore $\Q \cong \Q^*$. Also, note that for any rationals $x,y \in \Q^*$, % \begin{enumerate}[(i)] \item if $x \geq y$ then $f(x) \geq f(y)$ \label{item:real_rational_ordering} \item $f(x + y) = f(x) + f(y)$ \label{item:real_rational_field_homomorphism_plus} \item $f(x \times y) = f(x) \times f(y)$ \label{item:real_rational_field_homomorphism_times} \item $f(1^*)=1$ \label{item:real_rational_field_homomorphism_m_identity} \end{enumerate} % Property (\shortref{item:real_rational_ordering}) shows that $f$ is a monotone function, and properties (\shortref{item:real_rational_field_homomorphism_plus})--(\shortref{item:real_rational_field_homomorphism_m_identity}) show that $f$ is a field homomorphism. Since $f$ is also a bijection, it can be said that $f$ is an isomorphism in both the order sense and the algebraic sense.
In other words, $\Q$ is isomorphic to $\Q^*$ in both an order sense and an algebraic sense. Therefore, not only is $\Q \cong \Q^*$, but $\Q^*$ is a valid \emph{representation} for $\Q$, and it is justifiable to say that $\Q$ is a subfield of $\R$. For example, note that for any rationals $x,y \in \Q^*$ and $a \in \Q$, % \begin{itemize} \item $x = y$ if and only if $f(x) = f(y)$ \item $x \leq y$ if and only if $f(x) \leq f(y)$ \item $f(0^*) = 0$ \item $f(x + y) = f(x)+f(y)$ \item $f(x - y) = f(x)-f(y)$ \item $f(1^*) = 1$ \item $f(x y) = f(x) f(y)$ \item $f(x^{a^*}) = f(x)^a$ \end{itemize} % So arithmetic and order are both preserved by the bijection $f$. Thus, while $\Q$ is certainly not equal to $\Q^*$, it is equal in all of the important ways that matter to us, and so we can consider $\Q \subset \R$ with all of its standard ordering and operations. In other words, the $*$ superscript can be dropped from all of the real symbols above; the set $\Q^*$ is a valid representation of the set of the rationals $\Q$. Note that the set $\R \setdiff \Q$ is known as the set of the \emph{irrational numbers}. An important irrational number, Euler's constant, is introduced in \longref{app:math_logarithms}. \paragraph{Ceiling and Floor:} Take any real number $x \in \R$. The \emph{floor} of real number $x$ is denoted \symdef{Bnumbers.61}{floor}{$\lfloor x \rfloor$}{the floor of real number $x$ (\ie, the greatest integer not greater than $x$)} and defined by % \begin{equation*} \lfloor x \rfloor \triangleq \sup\{ n \in \Z : n \leq x \} \end{equation*} % This can be viewed as the greatest integer that is not greater than the real number. For example, % \begin{itemize} \item $\lfloor 2.2 \rfloor = 2$ \item $\lfloor 2 \rfloor = 2$ \item $\lfloor -1.8 \rfloor = -2$ \end{itemize} % Clearly, $\lfloor x \rfloor \leq x < \lfloor x \rfloor + 1$.
Similarly, the \emph{ceiling} of real number $x$ is denoted \symdef{Bnumbers.60}{ceiling}{$\lceil x \rceil$}{the ceiling of real number $x$ (\ie, the least integer not less than $x$)} and defined by % \begin{equation*} \lceil x \rceil \triangleq \inf\{ n \in \Z : x \leq n \} \end{equation*} % This can be viewed as the least integer that is not less than the real number. For example, % \begin{itemize} \item $\lceil 2.2 \rceil = 3$ \item $\lceil 2 \rceil = 2$ \item $\lceil -1.8 \rceil = -1$ \end{itemize} % Clearly, $\lceil x \rceil - 1 < x \leq \lceil x \rceil$. \paragraph{Base-10 (Decimal) Notation:} Now that we have defined the reals and have endowed them with addition, multiplication, and exponentiation, it is possible to introduce familiar decimal notations. We also make use of the isomorphism of $\Q$ and $\Q^*$ (and the isomorphisms between $\Z^*$ and $\Z$ and between $\W^*$ and $\W$) for simplicity. Define $n: \W \times \R_{>0} \mapsto \W$ so that $n(0,x) \triangleq \max\{ w \in \W : w \leq x \}$ and $n(k,x)$ is defined by % \begin{equation*} n(k,x) \triangleq \max\{ w \in \W : n(0,x) + n(1,x) \times 10^{-1} + \dots + w \times 10^{-k} \leq x \} \end{equation*} % Note that $0 \leq n(k,x) < 10$ for all $x \in \R_{>0}$ and all $k \in \W$ with $k \geq 1$; the integer part $n(0,x)$ may, of course, be arbitrarily large. Now take $x \in \R_{>0}$. Define the set $\set{E}_x \triangleq \{ n(0,x) + n(1,x) \times 10^{-1} + \dots + n(k,x) \times 10^{-k} : k \in \W \}$. Then $x = \sup \set{E}_x$ and the decimal expansion of $x$ is represented by % \begin{equation*} n(0,x).n(1,x) n(2,x) n(3,x) n(4,x) \cdots \end{equation*} % where juxtaposition is simply notation and does not imply multiplication. For $x = 0$, the decimal expansion of $x$ is simply $0$ or $0.0$ followed by any number of the symbol $0$. Finally, for $x < 0$, the decimal expansion of $x$ is identical to the decimal expansion of $|x|$ except that a $-$ is prepended to the front of the expansion.
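The floor, the ceiling, and the greedy digit construction $n(k,x)$ can all be illustrated computationally. The following Python sketch (illustrative only; the helper \texttt{digits} is our own naming) uses exact rational arithmetic to reproduce the examples above and to extract decimal digits exactly as in the definition:

```python
# Floor, ceiling, and the digit function n(k, x) from the decimal-
# expansion construction above; an illustrative sketch using exact
# rationals to avoid floating-point artifacts.
import math
from fractions import Fraction as F

# The floor and ceiling examples from the text.
assert math.floor(F(22, 10)) == 2 and math.floor(2) == 2
assert math.floor(F(-18, 10)) == -2
assert math.ceil(F(22, 10)) == 3 and math.ceil(2) == 2
assert math.ceil(F(-18, 10)) == -1

def digits(x, k_max):
    """Return [n(0,x), n(1,x), ..., n(k_max,x)] for rational x > 0 by
    greedily maximizing each successive digit, as in the definition."""
    ds = [math.floor(x)]              # n(0, x): the integer part
    partial = F(ds[0])                # running truncated expansion
    for k in range(1, k_max + 1):
        d = math.floor((x - partial) * 10**k)  # greatest admissible digit
        ds.append(d)
        partial += F(d, 10**k)
    return ds

# 25/8 = 3.125, so its expansion begins 3.1250...
assert digits(F(25, 8), 4) == [3, 1, 2, 5, 0]
```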
\paragraph{Cardinality and a Continuum:} Note that % \begin{itemize} \item $\W$ is countable, gapless, and not densely ordered \item $\Z$ is countable, gapless, and not densely ordered \item $\Q$ is countable, not gapless, and densely ordered \item $\R$ is uncountable, gapless, and densely ordered \end{itemize} % As a consequence of the theorem in \longref{app:math_countability_and_order}, it is impossible to be both gapless and densely ordered while also being countable. The rational numbers are able to fill spaces between each of the numbers in the integers by being densely ordered. However, since $\Z$ and $\Q$ are both countable then a bijection exists between them and so the rationals can be constructed by simply reordering the integers in a method similar to the one shown in \longref{tab:rationals_and_naturals}. Unfortunately, simply adding a dense ordering to a countable set destroys gaplessness. If a set is both gapless and densely ordered, it must somehow have more elements than a countably infinite set in order to fill the gaps introduced by adding a dense ordering to a countable set. In other words, a set that is both gapless and densely ordered must be uncountable. In fact, it can be shown that a bijection exists between the power set $\Pow(\Q)$ and the set of the real numbers $\R$. Since a set always has a smaller cardinality than its power set then the reals must somehow have more elements than the rationals. This is the expected result; the difference in cardinality reflects the extra elements of $\R$ that fill the gaps in $\Q$. Because the real numbers lack any gaps, the real numbers are sometimes called a \emph{continuum}. \paragraph{Bounded Intervals of Real Numbers and Compact Sets:} Take $a,b \in \R$ with $a \leq b$. The interval $[a,b]$ where % \begin{equation*} [a,b] \triangleq \{x \in \R : a \leq x \leq b \} \end{equation*} % is called a \emph{closed interval}.
The interval $(a,b)$ where % \begin{equation*} (a,b) \triangleq \{x \in \R : a < x < b \} \end{equation*} % is called an \emph{open interval} or a \emph{segment} \citep{Rudin76}. The intervals % \begin{align*} [a,b) &\triangleq \{x \in \R : a \leq x < b \}\\ (a,b] &\triangleq \{x \in \R : a < x \leq b \} \end{align*} % are called \emph{half-open (or half-closed) intervals}. Note that these four intervals are \emph{bounded}. In fact, because the interval $[a,b]$ is not only bounded but includes its bounds and is gapless, it is \emph{complete}. In \longref{app:math_real_numbers_as_metric_spaces}, we will discuss how closed intervals of the real numbers are called \emph{compact sets} because they are closed and bounded; however, that notion of boundedness is different from the boundedness from order theory. In the case of the reals, though, the two notions are equivalent. \paragraph{Unbounded Intervals of Real Numbers:} Take $a \in \R$. Define the intervals $[a,\infty)$, $(a,\infty)$, $(-\infty,a]$, and $(-\infty,a)$ by % \begin{align*} [a,\infty) &\triangleq \{x \in \R : x \geq a \}\\ (a,\infty) &\triangleq \{x \in \R : x > a \}\\ (-\infty,a] &\triangleq \{x \in \R : x \leq a \}\\ (-\infty,a) &\triangleq \{x \in \R : x < a \} \end{align*} % respectively. Similarly, $(-\infty,\infty) \triangleq \R$. Also note that the symbol $+\infty$ will sometimes be used in place of $\infty$. \paragraph{Real Functions:} Take any set $\set{X}$. The function $f: \set{X} \mapsto \R$ is called a \emph{real function} or a \emph{real functional} because the range of the function only takes values from $\R$. That is, a real function provides a relationship between set $\set{X}$ and the real number system $\R$.
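The four bounded interval types differ only in which endpoints they include, which is easy to express as a single membership test. The following Python sketch (illustrative only; the helper \texttt{in\_interval} is our own naming) makes that distinction concrete:

```python
# Membership tests for the four bounded interval types defined above
# ([a,b], [a,b), (a,b], and (a,b)); an illustrative sketch.

def in_interval(x, a, b, closed_left=True, closed_right=True):
    """True iff x lies between a and b under the stated endpoint
    conventions: closed endpoints use <=, open endpoints use <."""
    left = (a <= x) if closed_left else (a < x)
    right = (x <= b) if closed_right else (x < b)
    return left and right

assert in_interval(0, 0, 1)                             # 0 in [0, 1]
assert not in_interval(0, 0, 1, closed_left=False)      # 0 not in (0, 1]
assert not in_interval(1, 0, 1, closed_right=False)     # 1 not in [0, 1)
assert in_interval(0.5, 0, 1, closed_left=False,
                   closed_right=False)                  # 0.5 in (0, 1)
```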
\subsection{The Extended Real Numbers} \label{app:math_ext_reals} Call \symdef{Bnumbers.54}{extreals}{$\extR$}{the set of the extended real numbers (\ie, $\R \cup \{-\infty,+\infty\}$)} the set of the \emph{extended real numbers}, which are defined by % \begin{equation*} \extR \triangleq \{{-\infty},{+\infty}\} \cup \R \end{equation*} % where $\infty$ is a shorthand notation for ${+\infty}$. Note that $\R \subset \extR$. \paragraph{Finite Numbers:} Take $x,y \in \extR \cap \R$. For $x$ and $y$, define relation $\leq$ and operators $+$, $-$, $\times$, and $/$ in the same way as in $\R$ and call $x$ and $y$ \emph{finite real numbers}. \paragraph{Ordering:} Take $x \in \extR \cap \R$ (\ie, $x$ is \emph{finite}). Define $\leq$ so that % \begin{equation*} {-\infty} < x < {+\infty} \end{equation*} % This way ${-\infty}$ is a lower bound and ${+\infty}$ is an upper bound for every subset of $\extR$. Refer to ${-\infty}$ and ${+\infty}$ as being \emph{infinite}. \paragraph{Upper and Lower Bounds:} By construction of $\extR$, any subset $\set{X} \subseteq \extR$ will have both a least upper bound and a greatest lower bound, which makes $\extR$ not only gapless but complete. However, note that % \begin{equation*} \inf \emptyset = \infty \end{equation*} % That is, the greatest lower bound of the empty set is the infinite upper bound $\infty$. This is because every $x \in \extR$ is (vacuously) a lower bound for $\emptyset$, and so the \emph{greatest} lower bound must be $\sup \extR$, which is $\infty$. Similarly, % \begin{equation*} \sup \emptyset = -\infty \end{equation*} % That is, the least upper bound of the empty set must be the infinite lower bound $-\infty$ (\ie, $\inf \extR$). \paragraph{Arithmetic:} Take $x \in \extR$.
Define $+$, $-$, $\times$, and $/$ (also represented as a ratio) such that % \begin{enumerate}[(i)] \item for finite $x$ (\ie, $x \in \R$), \begin{itemize} \item $x + \infty = \infty$ \item $x + {-\infty} = {-\infty}$ \item $x - \infty = {-\infty}$ \item $\frac{x}{+\infty} = 0$ \item $\frac{x}{-\infty} = 0$ \end{itemize} \item for $x > 0$, $x \times {+\infty} = {+\infty}$ and $x \times {-\infty} = {-\infty}$ \item for $x < 0$, $x \times {+\infty} = {-\infty}$ and $x \times {-\infty} = {+\infty}$ \end{enumerate} % Notice that $\infty - \infty$, $\infty + {-\infty}$, $0 \times \infty$, $0 \times {-\infty}$, $y/0$ for $y \in \extR$, and $\alpha/\beta$ for $\alpha,\beta \in \{ {-\infty},{+\infty}\}$ are not defined. However, as is done in \longref{app:math_lebesgue_integral}, it will sometimes be convenient to define $\infty \times 0 = 0 \times \infty = 0$; this will never be assumed unless otherwise noted. \paragraph{Algebraic Structure of the Extended Reals:} Unlike $\R$, it is not true that $\extR$ is a field. In fact, $\extR$ is not even a ring. However, the arithmetic defined above is usually sufficient for the situations in which it is needed. \paragraph{Completeness:} The extended real numbers are sometimes called the \emph{completion} or the \emph{closure} or the \emph{compactification} of the real numbers. That is, whereas the real numbers are only gapless, the extended reals are not only gapless but also \emph{complete}; every subset of the extended reals has both a least upper bound and a greatest lower bound. \paragraph{Intervals of Extended Real Numbers and Compactness:} Intervals of the extended real numbers are defined exactly the same as the intervals for the real numbers. However, each unbounded real interval can be considered a bounded extended real interval, and so even these intervals can be called closed, half-open, or open. There are also additional intervals that include $\infty$ and $-\infty$.
Take $a \in \extR \cap \R$ (\ie, finite $a$). Then define the intervals % \begin{align*} [a,\infty] &\triangleq \{x \in \R : x \geq a \} \cup \{\infty\}\\ (a,\infty] &\triangleq \{x \in \R : x > a \} \cup \{\infty\}\\ [-\infty,a] &\triangleq \{x \in \R : x \leq a \} \cup \{-\infty\}\\ [-\infty,a) &\triangleq \{x \in \R : x < a \} \cup \{-\infty\}\\ (-\infty,\infty] &\triangleq \R \cup \{\infty\}\\ [-\infty,\infty) &\triangleq \R \cup \{-\infty\}\\ [-\infty,\infty] &\triangleq \extR \end{align*} % As mentioned, $[-\infty,\infty]$ is a \emph{closed interval} and $[-\infty,\infty)$ is a \emph{half-open (or half-closed) interval}. This is due to the completeness of the extended real numbers. Since every interval is bounded, every closed interval is a \emph{compact set}. Again, this quality of closed and bounded being equivalent to compact is a special quality of the real numbers. \paragraph{Real Functions as Extended Real Functions:} Take arbitrary set $\set{X}$. Any function $f: \set{X} \mapsto \R$ is a real function, as discussed; however, such a function is implicitly an extended real function. That is, a function $f: \set{X} \mapsto \R$ can be said to be a function $f: \set{X} \mapsto \extR$ with almost no loss of generality since its range will still be a subset of $\R$. \section{Basic Topology} \label{app:math_topology} One of the key reasons why $\R$ has so many practical applications is that it is uncountable. However, this presents many challenges for analysis. That is, if points in a set cannot even be counted, it is difficult to reason about them. Thus, the mathematical study of \emph{topology} presents ways to describe the placement of points relative to one another. In other words, points can be viewed as existing in a certain \emph{place} with respect to other points. By placing this sort of map over a set, a topology adds \emph{shape} to the set. Therefore, topology is roughly a study of the place or location of points in a set.
The topology that we discuss is often called \emph{point-set topology} for this reason. \subsection{The Topological Space} \label{app:math_topological_spaces} Take a set $\set{X}$ and a set $\setset{T} \subseteq \Pow(\set{X})$ (\ie, $\setset{T}$ is a set of subsets of $\set{X}$) such that % \begin{enumerate}[(i)] \item $\emptyset \in \setset{T}$ and $\set{X} \in \setset{T}$ \item for any set of sets $\setset{C} \subseteq \setset{T}$, the union $\bigcup \setset{C} \in \setset{T}$ \item for any sets $\set{G} \in \setset{T}$ and $\set{H} \in \setset{T}$, the intersection $\set{G} \cap \set{H} \in \setset{T}$ \end{enumerate} % Then $(\set{X},\setset{T})$ is called a \emph{topological space} and $\setset{T}$ is called a \emph{topology on $\set{X}$}. Elements of $\set{X}$ will be called \emph{points}. The sets that are contained in the topology $\setset{T}$ are called \emph{open sets} and the complements of these sets are called \emph{closed sets}. \paragraph{Open Sets and Neighborhoods:} For the following definitions, take the generic topological space $(\set{X},\setset{T})$. That is, take a set $\set{X}$ with topology $\setset{T}$ with elements of $\set{X}$ called points. Also take subset $\set{E} \subseteq \set{X}$. Also recall the definitions of \emph{filter on a set} and \emph{filter base on a set} from \longref{app:math_filters_on_sets}. % \begin{description} \item\emph{Universal Set:} The \emph{universal set} for the topological space is $\set{X}$. \item\emph{Set Complement:} The \emph{complement} of $\set{E}$ is denoted $\set{E}^c$ and defined by $\set{E}^c \triangleq \set{X} \setdiff \set{E}$. This is consistent with calling $\set{X}$ the \emph{universal set} for all sets contained in the topological space. \item\emph{Open Sets:} To say that $\set{E}$ is an \emph{open set} means that $\set{E} \in \setset{T}$. That is, $\setset{T}$ is a collection of all open sets in topological space $(\set{X},\setset{T})$.
Sometimes it may be convenient to call this \emph{open in $\set{X}$} or \emph{open with respect to topology $\setset{T}$ on $\set{X}$}. \item\emph{Closed Sets:} To say that $\set{E}$ is a \emph{closed set} means that $\set{E}^c$ is an open set (\ie, $\set{E}^c \in \setset{T}$). That is, the complements of the sets in the topology $\setset{T}$ are the closed sets in topological space $(\set{X},\setset{T})$. Sometimes it may be convenient to call this \emph{closed in $\set{X}$} or \emph{closed with respect to topology $\setset{T}$ on $\set{X}$}. \item\emph{Clopen Sets:} To say that $\set{E}$ is a \emph{clopen set} or to call it \emph{clopen} means that $\set{E}$ is both an open set (\ie, $\set{E} \in \setset{T}$) and a closed set (\ie, $\set{E}^c \in \setset{T}$). Of course, the complement of any clopen set is also clopen. It can be shown that $\emptyset$ and $\set{X}$ are clopen. Sometimes it may be convenient to call this \emph{clopen in $\set{X}$} or \emph{clopen with respect to topology $\setset{T}$ on $\set{X}$}. \item\emph{Neighborhoods:} Take $x \in \set{X}$. A \emph{neighborhood} $\set{U}$ of $x$ (in $\set{X}$) is a set such that there exists an open set $\set{G} \in \setset{T}$ with $\set{G} \subseteq \set{U}$ where $x \in \set{G}$. A neighborhood of $x$ does not need to be an open set; however, it must contain an open set that includes $x$. \item\emph{Neighborhood Systems:} The notation \symdef{Ganalysis.0001}{nhd}{$\nhd_x$}{neighborhood system of $x$ (\ie, set of all topological neighborhoods of $x$)} is called the \emph{neighborhood system} at $x$ (for $\set{X}$) or the \emph{neighborhood filter} at $x$ (for $\set{X}$). This is the set of all neighborhoods of $x$. Therefore, to say that $\set{U}$ is a neighborhood of $x$ is equivalent to saying that $\set{U} \in \nhd_x$. It can be verified that $\nhd_x$ is a \emph{filter on set $\set{X}$}. \item\emph{Neighborhood Base:} Take $x \in \set{X}$.
A \emph{neighborhood base} $\setset{B}$ at $x$ (for $\set{X}$) is such that % \begin{itemize} \item for all $\set{B} \in \setset{B}$, $\set{B}$ is a neighborhood of $x$ (\ie, $\set{B} \in \nhd_x$) \item for any neighborhood $\set{U}$ of $x$ (\ie, $\set{U} \in \nhd_x$), there exists a $\set{B} \in \setset{B}$ such that $\set{B} \subseteq \set{U}$ \end{itemize} % That is, a neighborhood base $\setset{B}$ at $x$ is a set of neighborhoods of $x$ such that every neighborhood of $x$ contains some set that belongs to $\setset{B}$. Note that any neighborhood base $\setset{B}$ at $x$ is such that $\setset{B} \subseteq \nhd_x$. It can be verified that any neighborhood base $\setset{B}$ of a point $x$ is a \emph{filter base on set $\set{X}$} that \emph{generates} the neighborhood system $\nhd_x$ (\ie, $\setset{B}$ is a basis for the neighborhood filter $\nhd_x$). \end{description} \paragraph{Points and Sets:} The following are some common terms used to describe points and sets in topological spaces. For the following definitions, take the generic topological space $(\set{X},\setset{T})$. That is, take a set $\set{X}$ with topology $\setset{T}$ with elements of $\set{X}$ called points. Also take subset $\set{E} \subseteq \set{X}$. % \begin{description} \item\emph{Limit Points of Sets:} Take point $x \in \set{X}$. The point $x$ is a \emph{limit point} of a set $\set{E}$ if every neighborhood of $x$ includes a point $p \in \set{E}$ with $p \neq x$. In other words, to say that $x$ is a limit point of set $\set{E}$ means that for all $\set{U} \in \nhd_x$, there is a point $p \in \set{U}$ with $p \in \set{E} \setdiff \{x\}$ (\ie, $\set{U} \cap (\set{E} - \{x\}) \neq \emptyset$). Note that if $x$ is a limit point of $\set{E}$, it need not be an element of $\set{E}$. It can be shown that the set $\set{E}$ is a closed set if and only if every limit point of $\set{E}$ is also an element of set $\set{E}$. An extension of this shows that the set of limit points of a set is a closed set. 
\item\emph{Isolated Points:} If point $x \in \set{E}$ is not a limit point of set $\set{E}$ then $x$ is an \emph{isolated point} of set $\set{E}$. \item\emph{Interior Points:} A point $x$ is an \emph{interior point} of set $\set{E}$ if there is a neighborhood $\set{U}$ of $x$ such that $\set{U} \subseteq \set{E}$. It can be shown that $\set{E}$ is an open set if and only if every element of $\set{E}$ is an interior point of $\set{E}$. \item\emph{Interior:} The \emph{interior} of $\set{E}$ is denoted $\interior(\set{E})$ and is the set of all interior points of set $\set{E}$. Some authors denote the interior of $\set{E}$ by $\overset{\circ}{\set{E}}$ or $\set{E}^\circ$. \item\emph{Dense Sets:} The set $\set{E}$ is called \emph{dense in $\set{X}$} if every point in $\set{X}$ is either a limit point of $\set{E}$, a point in $\set{E}$, or both. Roughly speaking, if $\set{E}$ is dense in $\set{X}$, then for any point in $\set{X}$, there is a point in $\set{E}$ that is near to it. Precisely, this means that if $\set{E}$ is dense in $\set{X}$ then for any point $x \in \set{X}$ and neighborhood $\set{U} \in \nhd_x$, there exists a point $p \in \set{E}$ such that $p \in \set{U}$. To say a set is \emph{dense in itself} means that the set contains no isolated points. Note that this is similar to what we called densely ordered; however, it is not the same notion. \item\emph{Set Closure:} The \emph{(topological) closure} of $\set{E}$ is denoted $\overline{\set{E}}$ and is the intersection of all closed sets that are supersets of $\set{E}$. Equivalently, $\overline{\set{E}} = \set{E} \cup \set{E}'$ where $\set{E}'$ is the set of all limit points of $\set{E}$. In other words, the closure of set $\set{E}$ is the set of all elements of $\set{E}$ and all limit points of $\set{E}$. \item\emph{Closure Point:} A \emph{closure point} of set $\set{E}$ is a point that is an element of its closure $\overline{\set{E}}$.
That is, $x \in \set{X}$ is a closure point for $\set{E}$ if and only if $x \in \overline{\set{E}}$. \end{description} \paragraph{Some Useful Results:} The following results relate the terms given above. Again, take the generic topological space $(\set{X},\setset{T})$. Also take subset $\set{E} \subseteq \set{X}$. % \begin{itemize} \item Every neighborhood contains an open set. For example, for a point $x \in \set{X}$ and neighborhood $\set{U}$ of $x$ (\ie, $\set{U} \in \nhd_x$), there exists a set $\set{G} \in \setset{T}$ with $x \in \set{G}$ such that $\set{G} \subseteq \set{U}$. \item $(\set{E}^c)^c = \set{E}$ \item $\emptyset = \set{X}^c$ and $\set{X} = \emptyset^c$ \item Set $\set{E}$ is an open set \emph{if and only if} its complement is closed (\ie, $(\set{E}^c)^c \in \setset{T}$). \item Set $\set{E}$ is a closed set if and only if its complement is open (\ie, $\set{E}^c \in \setset{T}$). \item Set $\set{E}$ is a closed set if and only if it includes all of its limit points. \item Sets can be both open and closed simultaneously. That is, there may exist some set $\set{G} \subseteq \set{X}$ such that $\set{G} \in \setset{T}$ and $\set{G}^c \in \setset{T}$. When a set is both open and closed, it is called a \emph{clopen set} or simply \emph{clopen}. \item $\set{E}$ is clopen if and only if $\set{E}^c$ is clopen. \item Some sets may be neither open nor closed. That is, it may be that $\set{E} \notin \setset{T}$ and $\set{E}^c \notin \setset{T}$. \item The empty set $\emptyset$ and the universal set $\set{X}$ are both open and closed in $\set{X}$ (\ie, they are \emph{clopen}). This is clear since $\emptyset \in \setset{T}$, $\emptyset = \set{X}^c$, $\set{X} \in \setset{T}$, and $\set{X} = \emptyset^c$. \item Recall that $\overline{\set{E}}$ is the closure of $\set{E}$.
It can be shown that % \begin{itemize} \item $\overline{\set{E}}$ is a closed set (\ie, $\overline{\set{E}}^c \in \setset{T}$) \item $\set{E} \subseteq \overline{\set{E}}$ \item $\set{E} = \overline{\set{E}}$ if and only if $\set{E}$ is a closed set \item $\overline{\overline{\set{E}}} = \overline{\set{E}}$ \item if $\set{F} \subseteq \set{E}$ then $\overline{\set{F}} \subseteq \overline{\set{E}}$ \item if $\set{F} \subseteq \set{X}$ is a closed set (\ie, $\set{F}^c \in \setset{T}$) then $\set{E} \subseteq \set{F}$ if and only if $\overline{\set{E}} \subseteq \set{F}$ \item $\overline{\set{E}}$ is the intersection of all closed sets that contain $\set{E}$ (\ie, $\overline{\set{E}} = \bigcap \{ \set{G}: \set{G}^c \in \setset{T}, \set{E} \subseteq \set{G} \}$) \item $\overline{\set{E}}$ is the smallest closed subset of $\set{X}$ that contains $\set{E}$ (\ie, $\overline{\set{E}}^c \in \setset{T}$ and for all $\set{G} \in \setset{T}$ with $\set{E} \subseteq \set{G}^c$, $\overline{\set{E}} \subseteq \set{G}^c$) \item $\overline{\set{E}} = \interior(\set{E}^c)^c$ where $\interior(\set{F})$ is the interior of a set $\set{F} \subseteq \set{X}$ \end{itemize} % \item Recall that $\interior(\set{E})$ is the interior of $\set{E}$.
It can be shown that % \begin{itemize} \item $\interior(\set{E})$ is an open set (\ie, $\interior(\set{E}) \in \setset{T}$) \item $\interior(\set{E}) \subseteq \set{E}$ \item $\set{E} = \interior(\set{E})$ if and only if $\set{E}$ is an open set \item $\interior(\interior(\set{E})) = \interior(\set{E})$ \item if $\set{E} \subseteq \set{F}$ then $\interior(\set{E}) \subseteq \interior(\set{F})$ \item if $\set{F} \subseteq \set{X}$ is an open set (\ie, $\set{F} \in \setset{T}$) then $\set{F} \subseteq \set{E}$ if and only if $\set{F} \subseteq \interior(\set{E})$ \item $\interior(\set{E})$ is the union of all open sets contained in $\set{E}$ (\ie, $\interior(\set{E}) = \bigcup \{ \set{G} \in \setset{T} : \set{G} \subseteq \set{E} \}$) \item $\interior(\set{E})$ is the largest open subset of $\set{X}$ contained in $\set{E}$ (\ie, $\interior(\set{E}) \in \setset{T}$ and for all $\set{G} \in \setset{T}$ with $\set{G} \subseteq \set{E}$, $\set{G} \subseteq \interior(\set{E})$) \item $\interior(\set{E}) = \overline{\set{E}^c}^c$ where $\overline{\set{F}}$ is the closure of a set $\set{F} \subseteq \set{X}$ \end{itemize} % From this, it should be clear that the interior is in some sense a \emph{dual} notion to closure. \item A point $x \in \set{X}$ is a limit point of $\set{E}$ if and only if $x \in \overline{\set{E} \setdiff \{x\}}$. \item Define $\set{E}' \triangleq \{ \text{limit points of $\set{E}$} \}$. The set $\set{E}'$ is a closed set. \item $\set{E}$ is an open set if and only if any point $x \in \set{E}$ is an interior point of $\set{E}$ (\ie, $x \in \interior(\set{E})$). \item For a point $x \in \overline{\set{E}}$, for all $\set{U} \in \nhd_x$, $\set{U} \cap \set{E} \neq \emptyset$. \end{itemize} \paragraph{Compactness and Compact Sets:} The analysis of topological spaces often involves a property known as \emph{compactness}. Take $(\set{X},\setset{T})$ to be a topological space and subset $\set{E} \subseteq \set{X}$.
To say that $\set{E}$ is \emph{compact} means that for any $\setset{U} \subseteq \setset{T}$ (\ie, a set of open sets) such that $\set{E} \subseteq \bigcup \setset{U}$, there exists a finite $\setset{U}_0 \subseteq \setset{U}$ (\ie, $\setset{U}_0$ is a finite set of open sets) such that $\set{E} \subseteq \bigcup \setset{U}_0$. It is often said that a set is called compact if all of its open \emph{covers} have a finite \emph{subcover}. This is a useful property for dealing with infinite sets. For example, imagine two objects separated by some finite distance. Even though there are an infinite number of points between the two objects, a ruler with a finite number of points can be placed between the objects to measure the distance separating them. This is possible because the set of points between the two objects is compact. Note that it already has been said that any closed and bounded subset of $\R$ (\eg, $[a,b]$ with $a,b \in \R$) is called compact; this is because all covers of closed and bounded subsets of $\R$ have a finite subcover, which is similar to the ruler example (note that we will discuss the topological properties of $\R$ below; in particular, we will show that all closed intervals are closed sets). Compact sets are generalizations of finite sets; they provide a way to reduce an infinite set to a finite union of open sets. \paragraph{First-Countable Spaces:} Take a topological space $(\set{X},\setset{T})$. Take any point $x \in \set{X}$. Assume that there is a sequence $(\set{B}_n)$ where $\set{B}_n \subseteq \set{X}$ for all $n \in \N$ such that for every $\set{U} \in \nhd_x$, there exists some $i \in \N$ where $\set{B}_i \subseteq \set{U}$; that is, assume that there is a countable neighborhood base at $x$. In this case, the topological space is called \emph{first-countable}. That is, a \emph{first-countable space} is a topological space where each point in the space has a countable neighborhood base.
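For a finite set, the defining axioms of a topology, as well as notions like interior and closure, can be verified by brute force. The following Python sketch (illustrative only; the finite example topology and the helper names are our own) checks the axioms and computes an interior and a closure exactly as defined above:

```python
# Brute-force verification of the topology axioms on a small finite
# example, together with interior and closure; an illustrative sketch.
# (For a finite topology, closure under pairwise unions and
# intersections implies closure under arbitrary ones.)

X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}  # a topology on X

def is_topology(X, T):
    """Check the three defining axioms of a topology on X."""
    if frozenset() not in T or X not in T:
        return False
    return all(G | H in T and G & H in T for G in T for H in T)

def interior(E, T):
    """Union of all open sets contained in E."""
    return frozenset().union(*(G for G in T if G <= E))

def closure(E, X, T):
    """Intersection of all closed sets (complements of open sets)
    that contain E."""
    result = X
    for G in T:
        F = X - G                 # F is a closed set
        if E <= F:
            result &= F
    return result

assert is_topology(X, T)
E = frozenset({2})
assert interior(E, T) == frozenset()          # no open set fits inside {2}
assert closure(E, X, T) == frozenset({2, 3})  # smallest closed superset
```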
\subsection{Limits of Sets} \label{app:math_topology_set_limits} The following defines constructs that are commonly used with filters; however, we define them here for sets to motivate the filter case. For the following, take the topological space $(\set{X},\setset{T})$. \paragraph{Limit Inferior and Limit Superior of a Set:} Take the subset $\set{E} \subseteq \set{X}$; however, also assume that $\set{X}$ is a partially ordered set. Take $\set{L}$ to be the set of all limit points of $\set{E}$. The \emph{limit superior} or \emph{supremum limit} of $\set{E}$, denoted $\limsup \set{E}$, is defined as the least upper bound of $\set{L}$. That is, % \begin{equation*} \limsup \set{E} \triangleq \sup \set{L} = \sup \{ x \in \set{X} : x \text{ is a limit point of } \set{E} \} \end{equation*} % The \emph{limit inferior} or \emph{infimum limit} of $\set{E}$, denoted $\liminf \set{E}$, is defined as the greatest lower bound of $\set{L}$. That is, % \begin{equation*} \liminf \set{E} \triangleq \inf \set{L} = \inf \{ x \in \set{X} : x \text{ is a limit point of } \set{E} \} \end{equation*} % Neither $\limsup \set{E}$ nor $\liminf \set{E}$ must exist; however, they will always exist when $\set{X}$ is a complete lattice. If they do exist and $\limsup \set{E} = \liminf \set{E}$ then $\set{L}$ must be a \emph{singleton set} (\ie, $\set{E}$ must have exactly one limit point). \subsection{Convergence of a Filter Base} As we will show, filter bases provide a very general framework for studying \emph{convergence} and \emph{limits}, which are very important topics in mathematical analysis. Therefore, here we describe the limiting and clustering behavior of filter bases. \paragraph{Limit Points of Filter Bases:} Take the generic topological space $(\set{X},\setset{T})$ and point $x \in \set{X}$. Assume that $\setset{B}$ is a filter base on $\set{X}$.
To say \symdef[]{Ganalysis.120}{limarrow}{$\to$}{a limit}\symdef[]{Ganalysis.1201}{limfb}{$\setset{B} \to p$}{filter base $\setset{B}$ converges to $p$}$\setset{B} \to x$ means that for any neighborhood $\set{U}$ of $x$, there exists a $\set{B} \in \setset{B}$ such that $\set{B} \subseteq \set{U}$. If $\setset{B} \to x$, then it is said that $\setset{B}$ is a \emph{convergent filter base (in $\set{X}$)} that \emph{converges (in $\set{X}$) to} $x$, where $x$ is called the \emph{limit point} of $\setset{B}$. In other words, for a convergent filter base $\setset{B}$ (in $\set{X}$), the following are equivalent: % \begin{itemize} \item $\setset{B} \to x$ (in $\set{X}$) \item $x$ is a limit point of $\setset{B}$ (in $\set{X}$) \item $\setset{B}$ converges (in $\set{X}$) to $x$ \item for all $\set{U} \in \nhd_x$, there exists a $\set{B} \in \setset{B}$ such that $\set{B} \subseteq \set{U}$ \end{itemize} % Technically, convergence should always be stated with the topological space (and the particular topology, if multiple exist) in which the convergence is occurring. In many cases, the relevant topology should be obvious, and so we will omit the text in the parenthetical expressions shown above. \paragraph{Convergence in Hausdorff Spaces:} Take the topological space $(\set{X},\setset{T})$ and point $x \in \set{X}$. To say that $\set{X}$ is a \emph{Hausdorff} space means that every filter base in $\set{X}$ has at \emph{most} one limit (\ie, every convergent filter base has exactly one limit). In other words, in a Hausdorff space, % \begin{enumerate}[(i)] \item for any filter base $\setset{B}$ in $\set{X}$ and points $x,y \in \set{X}$, if $\setset{B} \to x$ and $\setset{B} \to y$ then $x = y$ \label{item:Hausdorff_unique_limits} \item for distinct points $x,y \in \set{X}$ (\ie, $x \neq y$), there exist $\set{U} \in \nhd_x$ and $\set{V} \in \nhd_y$ such that $\set{U} \cap \set{V} = \emptyset$ \label{item:Hausdorff_separated} \end{enumerate} % These two properties are actually equivalent.
Property (\shortref{item:Hausdorff_unique_limits}) states that limits are unique in a Hausdorff space. Property (\shortref{item:Hausdorff_separated}) states that disjoint neighborhoods of any two distinct points exist. That is, there is no pair of distinct points such that every neighborhood of one overlaps every neighborhood of the other. Since limits are unique in a Hausdorff space, if $\setset{B} \to x$ then $x$ is called \symdef[\emph{the limit of}]{Ganalysis.10}{lim}{$\lim$}{limit (\eg, unique limit of filter base, function, net, or sequence)} $\setset{B}$ and the notation % \begin{equation*} \lim \setset{B} = x \end{equation*} % may be used. \paragraph{Cluster Points of Filter Bases:} Take the generic topological space $(\set{X},\setset{T})$, point $x \in \set{X}$, and filter base $\setset{B}$ on $\set{X}$. To say that $\setset{B}$ \emph{clusters (in $\set{X}$) at} $x$ or that $x$ is a \emph{cluster point for} $\setset{B}$ (\emph{in} $\set{X}$) means that for each $\set{B} \in \setset{B}$ and each $\set{U} \in \nhd_x$, $\set{B} \cap \set{U} \neq \emptyset$. \paragraph{Limit Points as Cluster Points:} Take the generic topological space $(\set{X},\setset{T})$, point $x \in \set{X}$, and filter base $\setset{B}$ on $\set{X}$. Assume that $x$ is a limit point of $\setset{B}$. Take a neighborhood $\set{U} \in \nhd_x$ and set $\set{B} \in \setset{B}$ with $\set{B} \subseteq \set{U}$; this is possible since $x$ is a limit point of $\setset{B}$. Now take $\set{C} \in \setset{B}$. By the definition of a filter base, $\set{B} \cap \set{C} \neq \emptyset$. However, since $\set{B} \subseteq \set{U}$, then $\set{U} \cap \set{C} \neq \emptyset$. Moreover, $\set{U}$ and $\set{C}$ were chosen arbitrarily. Therefore, for each $\set{C} \in \setset{B}$ and each $\set{U} \in \nhd_x$, $\set{C} \cap \set{U} \neq \emptyset$. Thus, the limit point $x$ must also be a cluster point for $\setset{B}$.
That is, every limit point of a filter base is a cluster point of the filter base, and so if a filter base has no cluster points then it will have no limit points as well; however, it is not necessarily the case that a cluster point is a limit point. Assume that $\set{X}$ is a Hausdorff space. In that case, % \begin{itemize} \item as mentioned, if $\setset{B}$ has no cluster points then $\setset{B}$ must also have no limit points \item if $\setset{B}$ has a single cluster point then that cluster point is the single limit point of $\setset{B}$ \item if $\setset{B}$ has more than one cluster point then there are no limit points of $\setset{B}$ \end{itemize} % Note the similarity between cluster points of a filter base and limit points of a set. \paragraph{Filter Bases on Subsets:} Take the generic topological space $(\set{X},\setset{T})$, a subset $\set{E} \subseteq \set{X}$, and point $x \in \set{X}$. Assume that $\setset{B}$ is a filter base on $\set{E}$ and that $\setset{B} \to x$. Recalling the definitions given above, it should be clear that $\setset{B} \to x$ means that % \begin{enumerate}[(i)] \item for any $\set{B} \in \setset{B}$, $\set{B} \subseteq \set{E}$ \label{item:base_on_E_on_E} \item for any $\set{U} \in \nhd_x$, there exists a $\set{B} \in \setset{B}$ with $\set{B} \neq \emptyset$ and $\set{B} \subseteq \set{U}$ \label{item:base_on_E_base} \end{enumerate} % where property (\shortref{item:base_on_E_on_E}) comes from $\setset{B}$ being \emph{on $\set{E}$} and property (\shortref{item:base_on_E_base}) comes from $\setset{B}$ being a filter base. Therefore, to say $\setset{B} \to x$ means that for any set $\set{U} \in \nhd_x$, there exists a $\set{B} \in \setset{B}$ with $\set{B} \neq \emptyset$ and $\set{B} \subseteq \set{U} \cap \set{E}$. Similarly, to say that $\setset{B}$ clusters at $x$ means that for any set $\set{U} \in \nhd_x$, there exists a $\set{B} \in \setset{B}$ with $\set{B} \cap \set{U} \cap \set{E} \neq \emptyset$. 
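Cluster points can be made concrete with a finite numerical sketch. The following Python snippet is our own illustration (the helper names \texttt{tail} and \texttt{clusters\_at} are hypothetical, and a finite sample can only suggest the behavior, not prove it): it approximates the cluster points of the tail filter base $\{ \{a_k : k \geq n\} : n \in \N \}$ of a real sequence that has two cluster points and hence, in the Hausdorff space $\R$, no limit point.

```python
# Numerical sketch (not a proof): approximate the cluster points of the
# tail filter base B = { {a_k : k >= n} : n in N } of a real sequence.
# A point x clusters at B when every tail meets every neighborhood of x,
# so we test candidates against finitely many tails and a small radius.

def tail(a, n):
    """The tail set {a_k : k >= n} of the finite sample a."""
    return a[n:]

def clusters_at(a, x, n_tails=50, eps=1e-2):
    """True if every sampled tail meets the eps-ball around x."""
    return all(any(abs(y - x) < eps for y in tail(a, n))
               for n in range(n_tails))

# Sample of a_k = (-1)^k * (1 + 1/(k+1)): the cluster points are -1 and 1,
# so this tail filter base has two cluster points and no limit point.
a = [(-1) ** k * (1 + 1 / (k + 1)) for k in range(1000)]

print(clusters_at(a, 1.0))   # True: 1 is a cluster point
print(clusters_at(a, -1.0))  # True: -1 is a cluster point
print(clusters_at(a, 0.0))   # False: 0 is not a cluster point
```

The sketch also illustrates the bullet list above: with two cluster points, no finer filter base built from these tails can converge.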
Also note that filter bases should always be listed with the sets on which they are defined; however, many topological results will apply to filter bases regardless of the sets on which they are defined. Additionally, many times the set on which the filter base is defined will be obvious. Thus, we will often omit information about the set on which a filter base is defined. \paragraph{Filter Base Cluster Points as Set Closure Points:} Take the topological space $(\set{X},\setset{T})$ and a filter base $\setset{B}$. Consider two cases. % \begin{enumerate}[(i)] \item Assume that $x \in \set{X}$ is a cluster point of $\setset{B}$. Take $\set{B} \in \setset{B}$. By the definition of a cluster point, for every $\set{U} \in \nhd_x$, $\set{U} \cap \set{B} \neq \emptyset$. However, this is the definition of a closure point of arbitrary set $\set{B} \in \setset{B}$. Therefore, $x \in \overline{\set{B}}$. In other words, it is \emph{necessary} that any cluster point of a filter base is a closure point of \emph{every} set included in the filter base. That is, $x \in \bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \}$. \item Assume that $x \in \set{X}$ is a closure point of every set in the filter base $\setset{B}$. That is, assume that $x \in \bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \}$. Then, for any set $\set{B} \in \setset{B}$ and any neighborhood $\set{U} \in \nhd_x$, it is such that $\set{B} \cap \set{U} \neq \emptyset$. However, this is the definition of a cluster point for filter base $\setset{B}$. Therefore, $x$ is a cluster point for $\setset{B}$. \end{enumerate} % This proves that $x$ is a cluster point for $\setset{B}$ if and only if $x \in \bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \}$. 
In other words, the set of closure points common to all sets in the filter base, described by % \begin{equation*} \bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \} \end{equation*} % is precisely the set of cluster points for filter base $\setset{B}$. \paragraph{Some Useful Results:} For the following, take the generic topological space $(\set{X},\setset{T})$, a point $x \in \set{X}$, and subset $\set{E} \subseteq \set{X}$. % \begin{itemize} \item $\nhd_x \to x$ \item For any neighborhood base $\setset{N}$ of $x$, $\setset{N} \to x$. \item If $\setset{N}$ is a neighborhood base of $x$ and $\setset{B}$ is a filter base on $\set{X}$ then $\setset{B} \to x$ if and only if $\setset{B}$ is finer than $\setset{N}$. \item $\overline{\set{E}} = \{ x \in \set{X} : \text{there exists a filter base } \setset{B} \text{ on } \set{E} \text{ such that } \setset{B} \to x \}$, where $\overline{\set{E}}$ is the closure of set $\set{E}$. \item For a point $x \in \set{X}$, $x$ is a limit point of $\set{E}$ if and only if there exists a filter base $\setset{B}$ on $\set{E} \setdiff \{x\}$ such that $\setset{B} \to x$. \item For any convergent filter base $\setset{B}$ such that $\setset{B} \to x$, $\setset{B}$ clusters at $x$ and thus $x$ is a cluster point for $\setset{B}$ in $\set{X}$. \item For any filter base $\setset{B}$, $x$ is a cluster point for $\setset{B}$ if and only if there exists a filter base $\setset{C}$ such that $\setset{C}$ is finer than $\setset{B}$ and $\setset{C} \to x$. \item For any filter base $\setset{B}$, the intersection $\bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \}$ is the set of all cluster points of $\setset{B}$. This was proved above. \end{itemize} \paragraph{Set Limit Points and Filter Base Limit Points:} Take the topological space $(\set{X},\setset{T})$ and subset $\set{E} \subseteq \set{X}$.
Recall that the claim that a point $x \in \set{X}$ is a \emph{limit point of $\set{E}$} means that for any $\set{U} \in \nhd_x$, $\set{U} \cap (\set{E} \setdiff \{x\}) \neq \emptyset$. This is equivalent to saying that $x$ is a limit point of $\set{E}$ if and only if there exists a filter base $\setset{B}$ on $\set{E} \setdiff \{x\}$ such that $\setset{B} \to x$. \subsection{The Limit Inferior and Limit Superior} \label{app:math_liminf_limsup_fb} Take a topological space $(\set{X},\setset{T})$ such that $(\set{X},{\leq})$ is a partially ordered set. Also take subset $\set{E} \subseteq \set{X}$. Recall from \longref{app:math_topology_set_limits} that the limit inferior of $\set{E}$ is % \begin{equation*} \liminf \set{E} = \inf \{ x \in \set{X} : x \text{ is a limit point of } \set{E} \} \end{equation*} % and the limit superior of $\set{E}$ is % \begin{equation*} \limsup \set{E} = \sup \{ x \in \set{X} : x \text{ is a limit point of } \set{E} \} \end{equation*} % and so these are the greatest lower and least upper bounds of the set of limit points of $\set{E}$, respectively. If $(\set{E},{\leq})$ is a complete lattice, then the limit inferior and limit superior are actually the least and greatest limit points of $\set{E}$ respectively. If both bounds exist and are equal to each other, then there must be exactly one limit point of $\set{E}$ and that single point \emph{might} be called the limit of $\set{E}$ (though it is not common to do this). Now recall that the limit points of a set are similar to the cluster points of a filter base. Thus, it is natural to define bounds on the cluster points in a similar fashion. \paragraph{The Limit Inferior of a Filter Base:} Take the topological space $(\set{X},\setset{T}_\set{X})$ where $(\set{X},{\leq})$ is a partially ordered set. Now take a filter base $\setset{B}$.
Recall that the cluster points of $\setset{B}$ are given by the set % \begin{equation*} \bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \} \end{equation*} % The \symdef[\emph{limit inferior}]{Ganalysis.11}{liminf}{$\liminf$}{limit inferior (\ie, $\sup \inf$)} of filter base $\setset{B}$, denoted $\liminf \setset{B}$, is the greatest lower bound of the cluster points of filter base $\setset{B}$. That is, % \begin{equation*} \liminf \setset{B} \triangleq \inf \bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \} \end{equation*} % It can be shown that % \begin{equation*} \liminf \setset{B} = \sup \{ \inf \set{B} : \set{B} \in \setset{B} \} \end{equation*} % which is the more common definition of the limit inferior of a filter base. Note that the limit inferior may not exist; however, the limit inferior will always exist if $(\set{X},{\leq})$ is a complete lattice. \paragraph{The Limit Superior of a Filter Base:} Take the topological space $(\set{X},\setset{T}_\set{X})$ where $(\set{X},{\leq})$ is a partially ordered set. Now take a filter base $\setset{B}$. The \symdef[\emph{limit superior}]{Ganalysis.11}{limsup}{$\limsup$}{limit superior (\ie, $\inf \sup$)} of filter base $\setset{B}$, denoted $\limsup \setset{B}$, is the least upper bound of the cluster points of filter base $\setset{B}$. That is, % \begin{equation*} \limsup \setset{B} \triangleq \sup \bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \} \end{equation*} % It can be shown that % \begin{equation*} \limsup \setset{B} = \inf \{ \sup \set{B} : \set{B} \in \setset{B} \} \end{equation*} % which is the more common definition of the limit superior of a filter base. Note that the limit superior may not exist; however, the limit superior will always exist if $(\set{X},{\leq})$ is a complete lattice. \paragraph{Agreement of Limit Inferior and Limit Superior:} Take the topological space $(\set{X},\setset{T}_\set{X})$ where $(\set{X},{\leq})$ is a partially ordered set.
Now take a filter base $\setset{B}$. Assume that $\liminf \setset{B}$ and $\limsup \setset{B}$ both exist. In that case, for some $q \in \set{X}$, % \begin{equation*} \liminf \setset{B} = \limsup \setset{B} = q \quad \text{ if and only if } \lim \setset{B} = q \end{equation*} % In other words, if the limit superior and limit inferior both exist and agree, then there must be only one cluster point. If there is only one cluster point, then that cluster point must be the limit point of the filter base. Similarly, if the limit of the filter base exists, it must be the only cluster point, and therefore the upper and lower bounds on the cluster points must agree. Note that if the limit inferior and limit superior both exist and do \emph{not} agree, then the limit will not exist. \section{Metric Spaces and Numerical Topology} \label{app:math_metric_spaces} So far all of our results from topology have been given in terms of general topological spaces; however, we have not yet provided concrete examples of topological spaces. Before we can do that, we must introduce a specific kind of topological space: the \emph{metric space}. It can be said that topology establishes a sort of distance relationship between points. Metric spaces explicitly define that distance, and because of that they are a sort of ideal topological space. Once we define the metric space, we show how metric spaces can be used to make $\R$ and $\extR$ valid topological spaces. \subsection{The Metric Space} \label{app:math_metric_space_specification} Take a set $\set{X}$ and $p,q,r \in \set{X}$ which will be called \emph{points}.
Define the \emph{distance} function $d: \set{X} \times \set{X} \mapsto \R$ such that % \begin{enumerate}[(i)] \item $d(p,q) \geq 0$ \item $d(p,q) = 0$ if and only if $p = q$ \item $d(p,q) = d(q,p)$ \item $d(p,r) \leq d(p,q) + d(q,r)$ \label{item:metric_triangle_inequality} \end{enumerate} % Then $(\set{X},d)$ is called a \emph{metric space} and $d$ is called a \emph{metric} on $\set{X}$. As we will discuss, every metric space is a topological space; that is, the metric $d$ can \emph{induce} a topology on $\set{X}$. Note that property (\shortref{item:metric_triangle_inequality}), known as the \emph{triangle inequality}, is equivalent to the statement that % \begin{equation*} d(p,q) \geq d(p,r) - d(q,r) \end{equation*} % which is sometimes called the \emph{inverse triangle inequality}. \subsection{Metric Space as Topological Space} \label{app:math_metric_space_as_topological_space} We will now show that all metric spaces are topological spaces. We do this by defining an \emph{open ball} and then constructing all of the open sets of the topology in terms of those open balls. For the following, take a metric space $(\set{X},d)$ and subset $\set{E} \subseteq \set{X}$. \paragraph{Open and Closed Balls:} Take $x \in \set{X}$ and $r \in \R_{>0}$. Call \symdef{Ganalysis.00000}{openball}{$B(x;r)$}{open metric ball of radius $r$ centered at $x$} an \emph{open (metric) ball} of radius $r$ with center $x$, and define it as the set % \begin{equation*} B(x;r) \triangleq \{ y \in \set{X} : d(x,y) < r \} \end{equation*} % Similarly, call \symdef{Ganalysis.00001}{closedball}{${B[x;r]}$}{closed metric ball of radius $r$ centered at $x$} a \emph{closed (metric) ball} of radius $r$ with center $x$, and define it as the set % \begin{equation*} B[x;r] \triangleq \{ y \in \set{X} : d(x,y) \leq r \} \end{equation*} % Note that it is always the case that $B(x;r) \subseteq B[x;r]$.
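A candidate distance function can be sanity-checked against the four axioms mechanically. The following Python sketch is our own illustration (a finite check suggests, but of course does not prove, that $d$ is a metric): it verifies the axioms, along with the inverse triangle inequality, for $d(x,y) = |x - y|$ on a small sample of real points.

```python
# Sketch: check the four metric axioms for d(x, y) = |x - y| on a finite
# sample of points.  A finite check is only illustrative, not a proof
# that d is a metric on all of R.
import itertools

def d(x, y):
    return abs(x - y)

points = [-2.0, -0.5, 0.0, 1.0, 3.5]

for p, q, r in itertools.product(points, repeat=3):
    assert d(p, q) >= 0                      # (i)   non-negativity
    assert (d(p, q) == 0) == (p == q)        # (ii)  zero iff equal
    assert d(p, q) == d(q, p)                # (iii) symmetry
    assert d(p, r) <= d(p, q) + d(q, r)      # (iv)  triangle inequality
    assert d(p, q) >= d(p, r) - d(q, r)      # inverse triangle inequality

print("all metric axioms hold on the sample")
```

The same loop can be pointed at any other candidate function to hunt for a counterexample before attempting a proof.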
\paragraph{Metrically Open Sets:} The set $\set{E}$ is a \emph{metrically open set} if for any point $x \in \set{E}$, there is an $\varepsilon \in \R_{>0}$ such that $B(x;\varepsilon) \subseteq \set{E}$. In other words, all points of a metrically open set are elements of open metric balls that are subsets of $\set{E}$. Sometimes it may be convenient to call this \emph{metrically open in $\set{X}$}. It is equivalent to say that $\set{E}$ is a metrically open set if and only if it is a (possibly infinite) union of open metric balls. \paragraph{Definition of Topology on a Metric Space:} Note that % \begin{itemize} \item the empty set $\emptyset$ has no points, and so it is trivially a metrically open set \item the set $\set{X}$ is a metrically open set \item the union of any set of metrically open sets is also a metrically open set \item the intersection of any two metrically open sets is also a metrically open set \end{itemize} % therefore the set $\setset{T} \triangleq \{ \set{S} : \set{S} \text{ is a metrically open set in } \set{X} \}$ is a valid topology on $\set{X}$, and $(\set{X},\setset{T})$ is a topological space. Thus, any metrically open set in $\set{X}$ is equivalently an \emph{open set (in $\set{X}$)} and so all definitions and results from \longref{app:math_topology} also apply to metric space $(\set{X},d)$ with this notion of an open set being a union of open balls. \subsection{Definitions and Notation} \label{app:math_metric_space_definitions} Now that the metric space has been shown to be a topological space, it is useful to translate the constructs used with topological spaces into a framework specific to metric spaces. Thus, we now redefine terms used with topological spaces in a way more applicable to metric spaces. We also introduce some additional terms used specifically with metric spaces. Note that all topological relationships among these terms still hold. 
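As a concrete illustration of metrically open sets, consider $\R$ with the absolute-value metric $d(x,y) = |x - y|$ (developed formally later in this \appname{}). The following Python sketch, with our own hypothetical helper name, exhibits for sample points $x$ a witness radius $\varepsilon$ with $B(x;\varepsilon) \subseteq (0,1)$, and shows that no such radius exists at the endpoint $0$ of $[0,1]$:

```python
# Sketch: witness that the interval (0, 1) is metrically open in (R, |.|)
# by exhibiting, for each sample point x, a radius eps with
# B(x; eps) a subset of (0, 1); the witness fails at the endpoint 0,
# which is why [0, 1] is not metrically open.

def ball_radius_inside_unit_interval(x):
    """Largest eps with B(x; eps) inside (0, 1), or None if none exists."""
    eps = min(x - 0.0, 1.0 - x)   # distance from x to the boundary
    return eps if eps > 0 else None

for x in [0.1, 0.5, 0.999]:
    print(x, ball_radius_inside_unit_interval(x))  # positive radii

print(0.0, ball_radius_inside_unit_interval(0.0))  # None: no ball fits at 0
```

Every interior sample point gets a positive witness radius, matching the characterization of a metrically open set as a union of open balls.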
\paragraph{Open Sets and Neighborhoods:} For the following definitions, take the generic metric space $(\set{X},d)$. That is, take a set $\set{X}$ with metric $d$, whose elements are called points. Also take subset $\set{E} \subseteq \set{X}$. As before, recall the definitions of \emph{filter on a set} and \emph{filter base on a set} from \longref{app:math_filters_on_sets}. % \begin{description} \item\emph{Universal Set:} The \emph{universal set} for the metric space is $\set{X}$. \item\emph{Set Complement:} The \emph{complement} of $\set{E}$ is denoted $\set{E}^c$ and defined by $\set{E}^c \triangleq \set{X} \setdiff \set{E}$. This is consistent with calling $\set{X}$ the \emph{universal set} for all sets contained in the metric space. \item\emph{Open Sets:} To say that $\set{E}$ is an \emph{open set} means that for all $x \in \set{E}$, there is an $\varepsilon \in \R_{>0}$ such that $B(x;\varepsilon) \subseteq \set{E}$. Sometimes it may be convenient to call this \emph{open in $\set{X}$} or \emph{open with respect to metric $d$ on $\set{X}$}. \item\emph{Closed Sets:} To say that $\set{E}$ is a \emph{closed set} means that $\set{E}^c$ is an open set. Sometimes it may be convenient to call this \emph{closed in $\set{X}$} or \emph{closed with respect to metric $d$ on $\set{X}$}. \item\emph{Clopen Sets:} To say that $\set{E}$ is a \emph{clopen set} or that it is \emph{clopen} means that $\set{E}$ is both an open set and a closed set. It is the case that $\set{E}$ is clopen if and only if $\set{E}^c$ is clopen. It can be shown that $\emptyset$ and $\set{X}$ are clopen. Sometimes it may be convenient to call this \emph{clopen in $\set{X}$} or \emph{clopen with respect to metric $d$ on $\set{X}$}. \item\emph{Neighborhoods:} Take $x \in \set{X}$. A \emph{neighborhood} $\set{U}$ of $x$ (in $\set{X}$) is a set such that there exists an open set $\set{G} \in \setset{T}$ with $\set{G} \subseteq \set{U}$ where $x \in \set{G}$.
A neighborhood of $x$ does not need to be an open set; however, it must contain an open set that includes $x$. It can be shown that for a metric space, $\set{U} \subseteq \set{X}$ is a neighborhood of $x$ if and only if there exists an $\varepsilon \in \R_{>0}$ where $B(x;\varepsilon) \subseteq \set{U}$. In fact, for any $r \in \R_{>0}$, $B(x;r)$ is a neighborhood of $x$. \item\emph{Neighborhood Systems:} The notation $\nhd_x$ is called the \emph{neighborhood system} at $x$ (for $\set{X}$) or the \emph{neighborhood filter} at $x$ (for $\set{X}$). This is the set of all neighborhoods of $x$. Therefore, to say that $\set{U}$ is a neighborhood of $x$ is equivalent to saying that $\set{U} \in \nhd_x$. It can be verified that $\nhd_x$ is a \emph{filter on set $\set{X}$}. \item\emph{Neighborhood Base:} Take $x \in \set{X}$. A \emph{neighborhood base} $\setset{B}$ at $x$ (for $\set{X}$) is such that % \begin{itemize} \item for all $\set{B} \in \setset{B}$, $\set{B}$ is a neighborhood of $x$ (\ie, $\set{B} \in \nhd_x$) \item for any neighborhood $\set{U}$ of $x$ (\ie, $\set{U} \in \nhd_x$), there exists a $\set{B} \in \setset{B}$ such that $\set{B} \subseteq \set{U}$ \end{itemize} % That is, a neighborhood base $\setset{B}$ at $x$ is a set of neighborhoods of $x$ such that every neighborhood of $x$ contains some set that belongs to $\setset{B}$. Note that any neighborhood base $\setset{B}$ at $x$ is such that $\setset{B} \subseteq \nhd_x$. It can be verified that any neighborhood base $\setset{B}$ of a point $x$ is a \emph{filter base on set $\set{X}$} that \emph{generates} the neighborhood system $\nhd_x$ (\ie, $\setset{B}$ is a basis for the neighborhood filter $\nhd_x$). \end{description} \paragraph{Points and Sets:} The following are some common terms used to describe points and sets in metric spaces. For the following definitions, take the generic metric space $(\set{X},d)$. That is, take a set $\set{X}$ with metric $d$ with elements of $\set{X}$ called points. 
Also take subset $\set{E} \subseteq \set{X}$. % \begin{description} \item\emph{Limit Points of Sets:} Take point $x \in \set{X}$. The point $x$ is a \emph{limit point} of a set $\set{E}$ if every neighborhood of $x$ includes a point $p \in \set{E}$ with $p \neq x$. In other words, to say that $x$ is a limit point of set $\set{E}$ means that for all $\set{U} \in \nhd_x$, there is a point $p \in \set{U}$ with $p \in \set{E} \setdiff \{x\}$ (\ie, $\set{U} \cap (\set{E} \setdiff \{x\}) \neq \emptyset$). Note that if $x$ is a limit point of $\set{E}$, it need not be an element of $\set{E}$. It can be shown that the set $\set{E}$ is a closed set if and only if every limit point of $\set{E}$ is also an element of set $\set{E}$. An extension of this shows that the set of limit points of a set is a closed set. \item\emph{Isolated Points:} If point $x \in \set{E}$ is not a limit point of set $\set{E}$ then $x$ is an \emph{isolated point} of set $\set{E}$. \item\emph{Interior Points:} A point $x$ is an \emph{interior point} of set $\set{E}$ if there is a neighborhood $\set{U}$ of $x$ such that $\set{U} \subseteq \set{E}$. In other words, a point $x$ is an interior point of $\set{E}$ if there exists some $\varepsilon \in \R_{>0}$ with $B(x;\varepsilon) \subseteq \set{E}$. It can be shown that $\set{E}$ is an open set if and only if every element of $\set{E}$ is an interior point of $\set{E}$. \item\emph{Interior:} The \emph{interior} of $\set{E}$ is denoted $\interior(\set{E})$ and is the set of all interior points of set $\set{E}$. Some authors denote the interior of $\set{E}$ by $\overset{\circ}{\set{E}}$ or $\set{E}^\circ$. In a metric space, $\interior(\set{E})$ is the union of all open balls contained in $\set{E}$ (\ie, $\interior(\set{E}) = \bigcup \{ B(x;\varepsilon) : x \in \set{E}, \varepsilon \in \R_{>0}, B(x;\varepsilon) \subseteq \set{E} \}$).
\item\emph{Bounded:} The set $\set{E}$ is called \emph{bounded} if there is a real number $b \in \R_{>0}$ and a point $x \in \set{X}$ such that $d(x,y) < b$ for all $y \in \set{E}$. We have already defined bounded for partially ordered sets. While this is not the same notion, because of a special property of the real numbers, in our examples there should be no conflict between these two definitions. \item\emph{Dense Sets:} The set $\set{E}$ is called \emph{dense in $\set{X}$} if every point in $\set{X}$ is either a limit point of $\set{E}$, a point in $\set{E}$, or both. Roughly speaking, if $\set{E}$ is dense in $\set{X}$, then for any point in $\set{X}$, there is a point in $\set{E}$ that is near to it. Precisely, this means that if $\set{E}$ is dense in $\set{X}$ then for any point $x \in \set{X}$ and $\varepsilon \in \R_{>0}$, there exists a point $p \in \set{E}$ such that $p \in B(x;\varepsilon)$. To say a set is \emph{dense in itself} means that the set contains no isolated points. Note that this is similar to what we called densely ordered; however, it is not the same notion. \item\emph{Set Closure:} The \emph{(topological) closure} of $\set{E}$ is denoted $\overline{\set{E}}$ and is the intersection of all closed sets that are supersets of $\set{E}$. It can equivalently be defined by $\overline{\set{E}} \triangleq \set{E} \cup \set{E}'$ where $\set{E}'$ is the set of all limit points of $\set{E}$. In other words, the closure of set $\set{E}$ is the set of all elements of $\set{E}$ and all limit points of $\set{E}$. \item\emph{Closure Point:} A \emph{closure point} of set $\set{E}$ is a point that is an element of its closure $\overline{\set{E}}$. That is, $x \in \set{X}$ is a closure point for $\set{E}$ if and only if $x \in \overline{\set{E}}$. \end{description} \subsection{Important Metric Space Results} \label{app:important_metric_results} There are a number of important results for metric spaces.
\paragraph{Open Balls as Open Sets:} All open balls are open sets. To see this, consider $p \in \set{X}$, $r \in \R_{>0}$, and open ball $B(p;r)$. Take $q \in B(p;r)$; therefore, $d(p,q) < r$, and so there is an $h \in \R_{>0}$ such that $d(p,q) = r - h$. Now, since $d$ is a metric then for all $s \in \set{X}$, % \begin{equation*} d(p,s) \leq d(p,q) + d(q,s) \end{equation*} % Therefore, for all $s \in B(q;h)$ (\ie, $s \in \set{X}$ with $d(q,s) < h$), it must be that % \begin{equation*} d(p,s) < (r - h) + h = r \end{equation*} % Thus, for all $s \in B(q;h)$, $d(p,s) < r$, and so $s \in B(p;r)$. Therefore $q$ is an interior point of $B(p;r)$. Since $q$ was chosen arbitrarily then every point of $B(p;r)$ is an interior point of $B(p;r)$, and so $B(p;r)$ must be an open set. This proves that every open ball is an open set. However, every open set is a neighborhood of each of its points by definition; and thus, every open ball is a neighborhood of its center. \paragraph{Cascades of Open Balls:} Take a metric space $(\set{X},d)$ and $r_1,r_2 \in \R_{>0}$ such that $r_1 > r_2$. Now take a point $x \in \set{X}$ and another point $y \in B(x;r_2)$. Note that % \begin{align*} d(x,y) < r_2 < r_1 \end{align*} % and therefore $y \in B(x;r_1)$. Thus, $B(x;r_2) \subseteq B(x;r_1)$. \paragraph{Metric Spaces as Hausdorff Topological Spaces:} Take a metric space $(\set{X},d)$ and points $p,q \in \set{X}$ with $p \neq q$. By the definition of a metric, $d(p,q) > 0$. Therefore, there exists some $r \in \R_{>0}$ where $d(p,q) = r$. Take such an $r$. Now take a point $x \in \set{X}$ such that $d(p,x) < r/2$. Clearly $x \in B(p;r/2)$. Additionally, by the properties of metric $d$, % \begin{align*} d(q,x) &\geq d(q,p) - d(x,p)\\ &= d(p,q) - d(x,p)\\ &= d(p,q) - d(p,x)\\ &= r - d(p,x)\\ &\geq r - \frac{r}{2}\\ &= \frac{r}{2} \end{align*} % Thus, since $d(q,x) \geq r/2$ it must be that $x \notin B(q;r/2)$.
Therefore, these two metric balls have no common elements and % \begin{equation} B(p;\tfrac{r}{2}) \cap B(q;\tfrac{r}{2}) = \emptyset \label{eq:metric_hausdorff_proof} \end{equation} % Now, assume that there exists a filter base $\setset{B}$ such that $\setset{B} \to p$ and $\setset{B} \to q$. This implies that there are sets $\set{B}_p,\set{B}_q \in \setset{B}$ such that $\set{B}_p \subseteq B(p;r/2)$ and $\set{B}_q \subseteq B(q;r/2)$. Therefore, % \begin{equation*} \set{B}_p \cap \set{B}_q \subseteq B(p;\tfrac{r}{2}) \cap B(q;\tfrac{r}{2}) \end{equation*} % However, since $\setset{B}$ is a filter base, $\set{B}_p \cap \set{B}_q \neq \emptyset$, and so % \begin{equation*} B(p;\tfrac{r}{2}) \cap B(q;\tfrac{r}{2}) \neq \emptyset \end{equation*} % However, this contradicts \longref{eq:metric_hausdorff_proof}, and so no filter base can converge to two distinct points. That is, if $\setset{B} \to p$ and $\setset{B} \to q$, then $p$ and $q$ must be the same point. Therefore, all metric spaces are Hausdorff topological spaces. \paragraph{Metric Spaces as First-Countable Spaces:} As discussed in \longref{app:math_topological_spaces}, a first-countable space is a topological space where each point in the space has a countable neighborhood base. Take a metric space $(\set{X},d)$ and a point $x \in \set{X}$. Now define $\setset{B}_x \subseteq \Pow(\set{X})$ by % \begin{equation*} \setset{B}_x \triangleq \{ B(x;\tfrac{1}{n}) : n \in \N \} \end{equation*} % Since every metric ball centered at $x$ is a neighborhood of $x$ then $\setset{B}_x$ is a set of neighborhoods of $x$. Additionally, since for all $r \in \R_{>0}$, there exists an $n \in \N$ such that $1/n < r$, then for any neighborhood $\set{U} \in \nhd_x$, there exists a $\set{B} \in \setset{B}_x$ such that $\set{B} \subseteq \set{U}$. Therefore, $\setset{B}_x$ is a countable neighborhood base of $x$. Since $x$ was chosen arbitrarily, then metric space $(\set{X},d)$ is a first-countable topological space.
That is, all metric spaces are first-countable spaces. \subsection{Real Numbers as Metric Spaces} \label{app:math_real_numbers_as_metric_spaces} Take the ordered field $\R$. Define the function $d: \R \times \R \mapsto \R$ as the absolute value of the difference between its arguments; that is, define $d$ by % \begin{equation*} d(x,y) \triangleq |x-y| \end{equation*} % It can be verified that $d$ is a metric. This makes $(\R,d)$ a metric space (and thus a Hausdorff topological space). In fact, $d$ is the standard metric on $\R$, and $\R$ is usually assumed to be equipped with this metric unless otherwise specified. \paragraph{Open and Closed Balls:} Take $x \in \R$ and $\delta \in \R_{>0}$. It is clear that % \begin{align*} B(x;\delta) &= \{ y \in \R : x - \delta < y < x + \delta \}\\ &= (x - \delta,x + \delta) \end{align*} % where $(x - \delta,x + \delta)$ is an \emph{open interval} or a \emph{segment} as defined above. Similarly, % \begin{equation*} B[x;\delta] = [x - \delta,x + \delta] \end{equation*} % where $[x - \delta,x + \delta]$ is a \emph{closed interval} as defined above. \paragraph{Open Intervals as Neighborhoods:} Clearly, for $x \in \R$ and $\delta \in \R_{>0}$, the open interval $(x - \delta,x + \delta)$ is not only an open ball centered at $x$ but is also a neighborhood of $x$. \paragraph{Intervals as Open and Closed Sets:} Any open interval is an open set, and any closed interval is a closed set. For example, for $a,b \in \R$ with $a < b$, the open interval $(a,b)$ is an open set and the closed interval $[a,b]$ is a closed set. \subsection{Limits of Functions} Take the topological spaces $(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$, a subset $\set{E} \subseteq \set{X}$, a function $f: \set{E} \mapsto \set{Y}$, and points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a limit point of set $\set{E}$. Define the filter base % \begin{equation} \setset{B}_{p} \triangleq \{ (\set{U} \cap \set{E}) \setdiff \{p\} : \set{U} \in \nhd_p \} \label{eq:filter_base_for_limit} \end{equation} % Since $p$ is a limit point of $\set{E}$ then $\setset{B}_{p}$ is a filter base on $\set{E} \setdiff \{p\}$ such that $\setset{B}_{p} \to p$. To say that $f(x) \to q$ as $x \to p$ means that the image filter base $f\{\setset{B}_{p}\}$ converges to $q$ (\ie, $f\{\setset{B}_{p}\} \to q$). \paragraph{One-Sided Limits:} Now additionally assume that $(\set{X},{\leq})$ is a totally ordered set. Define the filter base % \begin{equation} \setset{B}_{p-} \triangleq \{ \{ x \in \set{U} \cap \set{E} : x < p \} : \set{U} \in \nhd_p \} \label{eq:filter_base_for_left_limit} \end{equation} % When $p$ is a limit point of $\{ x \in \set{E} : x < p \}$ then $\setset{B}_{p-}$ is a filter base on that set such that $\setset{B}_{p-} \to p$; in this case, we say that $x$ \emph{approaches $p$ from the left}, which we denote by $x \to {p-}$, and when $f\{\setset{B}_{p-}\} \to q$ it is said that $f(x) \to q$ as $x \to {p-}$, which is known as the \emph{left-hand limit} of function $f$ at point $p$. Similarly, define the filter base % \begin{equation} \setset{B}_{p+} \triangleq \{ \{ x \in \set{U} \cap \set{E} : x > p \} : \set{U} \in \nhd_p \} \label{eq:filter_base_for_right_limit} \end{equation} % Since $p$ is a limit point of $\set{E}$ then $\setset{B}_{p+}$ is a filter base on $\{ x \in \set{E} : x > p \}$ such that $\setset{B}_{p+} \to p$, and therefore we can say that $x \to p$ as $x \to \setset{B}_{p+}$. In this special case, we say that $x$ \emph{approaches $p$ from the right}, which we denote by $x \to {p+}$. Now assume that $f\{ \setset{B}_{p+} \} \to q$. That is, $f(x) \to q$ as $x \to \setset{B}_{p+}$.
In this case, it is said that % \begin{equation*} f(x) \to q \text{ as } x \to {p+} \end{equation*} % or that $f(x)$ \emph{converges to $q$ as $x$ approaches $p$ from the right}. This is known as the \emph{right-hand limit} of function $f$ at point $p$. Now assume that $(\set{Y},\setset{T}_\set{Y})$ is a Hausdorff space. In this case, it is said that $q$ is the \emph{limit of $f$ as $x$ approaches $p$ from the right} or the \emph{right-hand limit of $f$ at $p$}, and it is written % \begin{equation*} \lim\limits_{x \to {p+}} f(x) = q \end{equation*} % Technically, this notation can be used when $\set{Y}$ is not a Hausdorff space just as long as $q$ is the unique limit of $f$ as $x$ approaches $p$ from the right. Note that since $f\{ \setset{B}_{p+} \} \to q$ in a Hausdorff space then % \begin{equation*} \lim\limits_{x \to {p+}} f(x) = \lim f\{\setset{B}_{p+}\} \end{equation*} % where $\setset{B}_{p+}$ is from \longref{eq:filter_base_for_right_limit}. \paragraph{Agreement of Left and Right Limits:} Take the topological spaces $(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ and a subset $\set{E} \subseteq \set{X}$. Assume that $(\set{X},{\leq})$ is a totally ordered set and $\set{Y}$ is a Hausdorff space. Now take function $f: \set{E} \mapsto \set{Y}$ and points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a limit point of set $\set{E}$. It is the case that $f(x) \to q$ as $x \to p$ if and only if $f(x) \to q$ as $x \to {p-}$ and $f(x) \to q$ as $x \to {p+}$. That is, % \begin{equation} \lim\limits_{x \to {p-}} f(x) = \lim\limits_{x \to {p+}} f(x) = q \quad \text{ if and only if } \quad \lim\limits_{x \to p} f(x) = q \label{eq:left_and_right_limit_agreement} \end{equation} % Technically, there is something similar that is true when $\set{Y}$ is not a Hausdorff space; however, we omit that case for brevity. Note that if the left-hand limit and the right-hand limit do not agree, then the limit cannot exist.
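The filter bases $\setset{B}_{p-}$ and $\setset{B}_{p+}$ can be imitated numerically by sampling a function at points that march toward $p$ from one side. The following Python sketch is an informal illustration with a hypothetical helper name, not a formal computation of a limit; it suggests that the sign function $x \mapsto x/|x|$ has disagreeing one-sided limits at $0$, so by \longref{eq:left_and_right_limit_agreement} its two-sided limit there cannot exist.

```python
# Informal sketch: sample f at points p + side * 2^-n, which march toward
# p from one side -- mimicking ever-smaller members of the filter base
# B_{p+} (side = +1) or B_{p-} (side = -1).  Helper names are our own.

def one_sided_samples(f, p, side, n_samples=20):
    return [f(p + side * 2.0 ** -n) for n in range(5, 5 + n_samples)]

def sign(x):
    return x / abs(x)   # not defined at x = 0 itself

right = one_sided_samples(sign, 0.0, +1)
left = one_sided_samples(sign, 0.0, -1)

print(set(right))  # {1.0}:  suggests f(x) -> 1 as x -> 0+
print(set(left))   # {-1.0}: suggests f(x) -> -1 as x -> 0-
# The one-sided samples stabilize at different values, so the two-sided
# limit at 0 cannot exist.
```

Sampling along $p \pm 2^{-n}$ only probes one sequence from each side, of course; the filter-base definition quantifies over all neighborhoods, which no finite computation can do.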
\subsection{The Limit Inferior and Limit Superior} Recall the definitions of limit superior and limit inferior of a filter base from \longref{app:math_liminf_limsup_fb}. These establish bounds on the cluster points of the filter base. The image of every filter base under a function is another filter base, and so it makes sense to focus on the bounds of the cluster points of this filter base. \paragraph{The Limit Inferior of a Function:} Take the topological spaces $(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ such that $(\set{Y},{\leq})$ is a partially ordered set. Also take a subset $\set{E} \subseteq \set{X}$, a function $f: \set{E} \mapsto \set{Y}$, and points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a limit point of set $\set{E}$. Recall the definition of $\setset{B}_p$ from \longref{eq:filter_base_for_limit}. The image of this filter base under $f$ is another filter base, and that filter base may have a limit inferior. Therefore, call $\liminf_{x \to p} f(x)$ the \emph{limit inferior of function $f$ as $x$ approaches $p$} and define it by % \begin{equation*} \liminf\limits_{x \to p} f(x) \triangleq \liminf f\{\setset{B}_p\} \end{equation*} % This bound may not exist. However, if $(\set{Y},{\leq})$ is a complete lattice then it will exist. \paragraph{The Handed Limit Inferiors of a Function:} Take the topological spaces $(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ such that $(\set{X},{\leq})$ is a totally ordered set and $(\set{Y},{\leq})$ is a partially ordered set. Also take a subset $\set{E} \subseteq \set{X}$, a function $f: \set{E} \mapsto \set{Y}$, and points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a limit point of set $\set{E}$. Recall the definitions of $\setset{B}_{p-}$ and $\setset{B}_{p+}$ from \longrefs{eq:filter_base_for_left_limit} and \shortref{eq:filter_base_for_right_limit} respectively. 
The image of each of these filter bases under $f$ is another filter base, and that filter base may have a limit inferior. Therefore, call $\liminf_{x \to {p-}} f(x)$ the \emph{limit inferior of function $f$ as $x$ approaches $p$ from the left} and define it by % \begin{equation*} \liminf\limits_{x \to {p-}} f(x) \triangleq \liminf f\{\setset{B}_{p-}\} \end{equation*} % Additionally, call $\liminf_{x \to {p+}} f(x)$ the \emph{limit inferior of function $f$ as $x$ approaches $p$ from the right} and define it by % \begin{equation*} \liminf\limits_{x \to {p+}} f(x) \triangleq \liminf f\{\setset{B}_{p+}\} \end{equation*} % Of course, neither of these two limit inferiors need exist. However, if $(\set{Y},{\leq})$ is a complete lattice then both will exist. \paragraph{The Limit Superior of a Function:} Take the topological spaces $(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ such that $(\set{Y},{\leq})$ is a partially ordered set. Also take a subset $\set{E} \subseteq \set{X}$, a function $f: \set{E} \mapsto \set{Y}$, and points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a limit point of set $\set{E}$. Recall the definition of $\setset{B}_p$ from \longref{eq:filter_base_for_limit}. The image of this filter base under $f$ is another filter base, and that filter base may have a limit superior. Therefore, call $\limsup_{x \to p} f(x)$ the \emph{limit superior of function $f$ as $x$ approaches $p$} and define it by % \begin{equation*} \limsup\limits_{x \to p} f(x) \triangleq \limsup f\{\setset{B}_p\} \end{equation*} % This bound may not exist. However, if $(\set{Y},{\leq})$ is a complete lattice then it will exist. \paragraph{The Handed Limit Superiors of a Function:} Take the topological spaces $(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ such that $(\set{X},{\leq})$ is a totally ordered set and $(\set{Y},{\leq})$ is a partially ordered set.
Also take a subset $\set{E} \subseteq \set{X}$, a function $f: \set{E} \mapsto \set{Y}$, and points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a limit point of set $\set{E}$. Recall the definitions of $\setset{B}_{p-}$ and $\setset{B}_{p+}$ from \longrefs{eq:filter_base_for_left_limit} and \shortref{eq:filter_base_for_right_limit} respectively. The image of each of these filter bases under $f$ is another filter base, and that filter base may have a limit superior. Therefore, call $\limsup_{x \to {p-}} f(x)$ the \emph{limit superior of function $f$ as $x$ approaches $p$ from the left} and define it by % \begin{equation*} \limsup\limits_{x \to {p-}} f(x) \triangleq \limsup f\{\setset{B}_{p-}\} \end{equation*} % Additionally, call $\limsup_{x \to {p+}} f(x)$ the \emph{limit superior of function $f$ as $x$ approaches $p$ from the right} and define it by % \begin{equation*} \limsup\limits_{x \to {p+}} f(x) \triangleq \limsup f\{\setset{B}_{p+}\} \end{equation*} % Of course, neither of these two limit superiors need exist. However, if $(\set{Y},{\leq})$ is a complete lattice then both will exist. \paragraph{Agreement of Four Limits:} Take the topological spaces $(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ and a subset $\set{E} \subseteq \set{X}$. Assume that $(\set{X},{\leq})$ is a totally ordered set and $(\set{Y},{\leq})$ is a partially ordered set. Now take function $f: \set{E} \mapsto \set{Y}$ and points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a limit point of set $\set{E}$. Now assume that $\liminf_{x \to {p-}} f(x)$ and $\limsup_{x \to {p-}} f(x)$ both exist. It is clear that for some $q \in \set{Y}$, % \begin{equation*} \liminf\limits_{x \to {p-}} f(x) = \limsup\limits_{x \to {p-}} f(x) = q \quad \text{ if and only if } \quad \lim\limits_{x \to {p-}} f(x) = q \end{equation*} % and so if the limit superior and limit inferior do not agree, the limit from the left will not exist.
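As an example of such disagreement, take for granted the familiar sine function on the reals (which is not developed in this \appname{}). Define $f: \R \setminus \{0\} \mapsto \R$ by $f(x) \triangleq \sin(1/x)$, and take the point $p = 0$, which is a limit point of $\R \setminus \{0\}$. On every set $\{ x \in \R : -\delta < x < 0 \}$ with $\delta \in \R_{>0}$, the function $f$ attains every value in the closed interval $[-1,1]$. Therefore,
%
\begin{equation*}
\liminf\limits_{x \to {0-}} f(x) = -1
\quad \text{ and } \quad
\limsup\limits_{x \to {0-}} f(x) = 1
\end{equation*}
%
and since these do not agree, the left-hand limit $\lim_{x \to {0-}} f(x)$ does not exist.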
Similarly, instead assume that $\liminf_{x \to {p+}} f(x)$ and $\limsup_{x \to {p+}} f(x)$ both exist. For some $q \in \set{Y}$, % \begin{equation*} \liminf\limits_{x \to {p+}} f(x) = \limsup\limits_{x \to {p+}} f(x) = q \quad \text{ if and only if } \quad \lim\limits_{x \to {p+}} f(x) = q \end{equation*} % and so if the limit superior and limit inferior do not agree, the limit from the right will not exist. Now, as is commonly done, define the notations % \begin{align*} f(p-) &\triangleq \lim\limits_{x \to {p-}} f(x)\\ f(p+) &\triangleq \lim\limits_{x \to {p+}} f(x)\\ f(p^{+}) &\triangleq \limsup\limits_{x \to {p+}} f(x)\\ f(p^{-}) &\triangleq \limsup\limits_{x \to {p-}} f(x)\\ f(p_{+}) &\triangleq \liminf\limits_{x \to {p+}} f(x)\\ f(p_{-}) &\triangleq \liminf\limits_{x \to {p-}} f(x) \end{align*} % However, note that each of these may or may not exist, though the latter four will always exist when $(\set{Y},{\leq})$ is a complete lattice. Also recall that there is no guarantee that $f(p)$ is equal to $\lim_{x \to p} f(x)$; in fact, there is no guarantee that $f(p)$ is even defined. Now assume that $f(p^{+})$, $f(p^{-})$, $f(p_{+})$, and $f(p_{-})$ all exist (\eg, $(\set{Y},{\leq})$ is a complete lattice). In this case, \longref{eq:left_and_right_limit_agreement} dictates that for some $q \in \set{Y}$, % \begin{equation*} f(p^{+}) = f(p^{-}) = f(p_{+}) = f(p_{-}) = q \quad \text{ if and only if } \quad \lim\limits_{x \to p} f(x) = q \end{equation*} % and so if these limit inferiors and limit superiors do not all agree then the limit will not exist. \subsection{Limits of Nets} \label{app:math_lim_nets} As mentioned, filters generalize nets and sequences, and thus everything defined above can be applied to nets and sequences as well. In other words, filters give a general framework for working in analysis. Here, we discuss results for directed sets and nets. \paragraph{Limit of Tails of Directed Sets:} Take a directed set $(\set{A},{\leq})$.
Let $(a_\alpha)$ be the net with domain $\set{A}$ and codomain $\set{A}$ where $a_\alpha = \alpha$ for all $\alpha \in \set{A}$. Now, define the filter base $\setset{A}$ by % \begin{equation} \setset{A} \triangleq \{ \{ \alpha \in \set{A} : \alpha_0 \leq \alpha \} : \alpha_0 \in \set{A} \} \label{eq:filter_base_of_A_tails} \end{equation} % which is the filter base of tails of $(a_\alpha)$. However, since $a_\alpha = \alpha$ for all $\alpha \in \set{A}$, $\setset{A}$ could be called the filter base of tails of the net $(\alpha)$. For ease of notation, define the identity function $f: \set{A} \mapsto \set{A}$ by $f(\alpha)=a_\alpha=\alpha$ for all $\alpha \in \set{A}$. Thus, $f\{\setset{A}\} = \setset{A}$. Now assume that $(\set{A},\setset{T}_\set{A})$ is a topological space and there exists a $p \in \set{A}$ such that $f\{\setset{A}\} \to p$. Thus, $f(\alpha) \to p$ as $\alpha \to \setset{A}$. In fact, since $f(\alpha) = \alpha$ for all $\alpha \in \set{A}$, it can be said that $\alpha \to p$ as $\alpha \to \setset{A}$. \paragraph{Limit of a Net:} Take directed set $(\set{A},{\leq})$ and topological space $(\set{X},\setset{T}_\set{X})$. Let $(x_\alpha)$ be a net with domain $\set{A}$ and codomain $\set{X}$. For ease of notation, define the function $f: \set{A} \mapsto \set{X}$ by $f(\alpha) = x_\alpha$ for all $\alpha \in \set{A}$. Take $\setset{A}$ to be the filter base defined in \longref{eq:filter_base_of_A_tails}. Thus, the filter base that is the image of $\setset{A}$ under $f$ is % \begin{equation*} f\{ \setset{A} \} = \{ \{ x_\alpha : \alpha \in \set{A}, \alpha_0 \leq \alpha \} : \alpha_0 \in \set{A} \} \end{equation*} % Assume that there exists a $q \in \set{X}$ such that $f\{ \setset{A} \} \to q$. That is, $f(\alpha) \to q$ as $\alpha \to \setset{A}$. In this case, we say that the net $(x_\alpha)$ \emph{converges to} $q$.
In other words, for any $\set{U} \in \nhd_q$, there exists an $\alpha_0 \in \set{A}$ such that $\{ x_\alpha : \alpha \in \set{A}, \alpha_0 \leq \alpha \} \subseteq \set{U}$. In the case of this net, it can be written that $x_\alpha \to q$. Now assume that it is also the case that $(\set{A},\setset{T}_\set{A})$ is a topological space and $p \in \set{A}$ is such that $\setset{A} \to p$ (\eg, for poset $\set{A}$ assume that $\sup \set{A}$ exists and let $p = \sup \set{A}$). Thus, it can be said that $\alpha \to p$ as $\alpha \to \setset{A}$. Since it is also the case that $x_\alpha \to q$ as $\alpha \to \setset{A}$, we can say that $x_\alpha \to q$ as $\alpha \to p$. If $q$ is the unique limit of the net (\eg, if $\set{X}$ is a Hausdorff space) then we can write % \begin{equation*} \lim\limits_{\alpha \to p} x_\alpha = q \end{equation*} % This is the standard definition for convergence of a net. Note that if no such $q$ exists, then the net is said to \emph{diverge (in $\set{X}$)}. \subsection{Limits of Sequences} \label{app:math_lim_sequences} As discussed in \longref{app:important_metric_results}, every metric space is a first-countable space. In first-countable spaces, sequences can be used in the place of nets. Thus, even though sequences are nets, here we focus on sequences for clarity. \paragraph{Limit of Tails of Natural Numbers:} As already discussed, $(\N,{\leq})$ is a directed set (indeed, it is totally ordered). Let $(a_n)$ be the sequence (\ie, a net with domain $\N$) with codomain $\extR$ where $a_n = n$ for all $n \in \N$. In other words, $(a_n)$ is the sequence $(1,2,3,4,\dots)$. Now, define the filter base $\setset{R}$ by % \begin{align} \setset{R} &\triangleq \{ \{ n \in \N : n_0 \leq n \} : n_0 \in \N \} \nonumber\\ &= \{ \{ n_0, n_0+1, n_0+2, \dots \} : n_0 \in \N \} \label{eq:filter_base_of_N_tails} \end{align} % which is called the filter base of tails of the sequence $(a_n)$.
However, since $a_n = n$ for all $n \in \N$, $\setset{R}$ could be called the filter base of tails of the sequence $(n)$. For ease of notation, define the function $f: \N \mapsto \extR$ by $f(n)=a_n=n$ for all $n \in \N$. Thus $f\{ \setset{R} \} = \setset{R}$. Note that $\N \subseteq \extR$ and $\extR$ is a Hausdorff topological space where $\nhd_\infty$ is defined as % \begin{equation*} \nhd_\infty \triangleq \{ (a,\infty] : a \in \R \} \end{equation*} % Now, for any $a \in \R$, there exists an $n_0 \in \N$ such that $\{ n_0, n_0+1, n_0+2, \dots \} \subseteq (a,\infty]$. By the definition of $\nhd_\infty$, this means that $f\{ \setset{R} \} \to \infty$. Thus, $f(n) \to \infty$ as $n \to \setset{R}$. In fact, since $f(n)=n$ for all $n \in \N$, it can be said that $n \to \infty$ as $n \to \setset{R}$. \paragraph{Limit of a Sequence:} Take topological space $(\set{X},\setset{T}_\set{X})$ and directed set $(\N,{\leq})$. Let $(x_n)$ be a sequence (\ie, a net with domain $\N$) with codomain $\set{X}$. For ease of notation, define the function $f: \N \mapsto \set{X}$ by $f(n) = x_n$ for all $n \in \N$. Take $\setset{R}$ to be the filter base defined in \longref{eq:filter_base_of_N_tails}. Thus, the filter base that is the image of $\setset{R}$ under $f$ is % \begin{align*} f\{ \setset{R} \} &= \{ \{ x_n : n \in \N, n_0 \leq n \} : n_0 \in \N \}\\ &= \{ \{ x_{n_0}, x_{n_0+1}, x_{n_0+2}, \dots \} : n_0 \in \N \} \end{align*} % which is called the filter base of tails of the sequence $(x_n)$. Assume that there exists a $q \in \set{X}$ such that $f\{ \setset{R} \} \to q$. That is, $f(n) \to q$ as $n \to \setset{R}$. In this case, we say that the sequence $(x_n)$ \emph{converges to} $q$. In other words, for any $\set{U} \in \nhd_q$, there exists an $n_0 \in \N$ such that $\{ x_{n_0}, x_{n_0+1}, x_{n_0+2}, \dots \} \subseteq \set{U}$.
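For a concrete case, let $\set{X} = \R$ with its standard metric topology, and take the sequence $(x_n)$ defined by $x_n \triangleq 1/n$ for all $n \in \N$. Take any $\varepsilon \in \R_{>0}$ and the open ball $B(0;\varepsilon)$, which is a neighborhood of $0$. Choosing any $n_0 \in \N$ with $n_0 > 1/\varepsilon$ gives
%
\begin{equation*}
\{ x_{n_0}, x_{n_0+1}, x_{n_0+2}, \dots \} \subseteq B(0;\varepsilon)
\end{equation*}
%
Since every neighborhood of $0$ contains such an open ball, $f\{ \setset{R} \} \to 0$, and so the sequence $(x_n)$ converges to $0$.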
Note that since $x_n \to q$ as $n \to \setset{R}$ and since $n \to \infty$ as $n \to \setset{R}$, we can simply say that $x_n \to q$ as $n \to \infty$ or, for this sequence, simply \symdef[$x_n \to q$]{Ganalysis.121}{limseq}{$p_n \to p$}{limit of sequence $(p_n)$}. If $q$ is the unique limit of the sequence (\eg, if $\set{X}$ is a Hausdorff space) then we can write % \begin{equation*} \lim\limits_{n \to \infty} x_n = q \end{equation*} % This is the standard definition for convergence of a sequence. Note that if no such $q$ exists, then the sequence is said to \emph{diverge (in $\set{X}$)}. \paragraph{Monotonic Sequences:} Take a totally ordered set $(\set{X},{\leq})$ where $(\set{X},\setset{T})$ is also a topological space. Also take a sequence $(a_n)$ such that $a_n \in \set{X}$ for all $n \in \N$. Assume that $(a_n)$ is monotonically increasing. If $\sup \{ a_n : n \in \N \}$ exists then % \begin{equation} \lim\limits_{n \to \infty} a_n = \sup\{ a_n : n \in \N \} \label{eq:monotonically_increasing_limit} \end{equation} % Now assume that $(a_n)$ is monotonically decreasing. If $\inf \{ a_n : n \in \N \}$ exists then % \begin{equation} \lim\limits_{n \to \infty} a_n = \inf\{ a_n : n \in \N \} \label{eq:monotonically_decreasing_limit} \end{equation} % We state this without proof; however, this is an intuitive result. \paragraph{Limit Inferior and Limit Superior:} Take a partially ordered set $(\set{X},{\leq})$ and a sequence $(x_n)$ such that $x_i \in \set{X}$ for all $i \in \N$. Recall the definitions of $\inf$ (\ie, greatest lower bound) and $\sup$ (\ie, least upper bound) provided in \longref{app:math_upper_lower_bound}. 
The \emph{limit inferior} of the sequence $(x_n)$ is denoted $\liminf_{n \to \infty} x_n$ and defined by % \begin{equation} \liminf\limits_{n \to \infty} x_n \triangleq \sup\{ \inf\{ x_m : m \geq n \}: n \in \N \} \label{eq:liminf_seq_definition} \end{equation} % This can be called an eventual greatest lower bound of the sequence $(x_n)$; somewhat roughly speaking, all but a finite number of elements of $(x_n)$ are bounded from below by the limit inferior. The \emph{limit superior} of the sequence $(x_n)$ is denoted $\limsup_{n \to \infty} x_n$ and defined by % \begin{equation} \limsup\limits_{n \to \infty} x_n \triangleq \inf\{ \sup\{ x_m : m \geq n \}: n \in \N \} \label{eq:limsup_seq_definition} \end{equation} % This can be called an eventual least upper bound of the sequence $(x_n)$; somewhat roughly speaking, all but a finite number of elements of $(x_n)$ are bounded from above by the limit superior. Since $(\set{X},{\leq})$ is only a partially ordered set, these limits are not guaranteed to exist. However, if both exist then it is always the case that % \begin{equation*} \liminf\limits_{n \to \infty} x_n \leq \limsup\limits_{n \to \infty} x_n \end{equation*} % We will call the limit inferior and limit superior the \emph{extremum limits} of a sequence. \paragraph{Limit Inferior and Limit Superior as Limits:} Take a partially ordered set $(\set{X},{\leq})$ where $(\set{X},\setset{T})$ is a topological space. Also take a sequence $(x_n)$ such that $x_i \in \set{X}$ for all $i \in \N$. Assume that $\inf\{ x_m : m \geq n \}$ exists for all $n \in \N$ and define the sequence $(a_n)$ such that for all $n \in \N$, % \begin{equation*} a_n \triangleq \inf\{ x_m : m \geq n \} \end{equation*} % That is, % \begin{equation*} (a_n) = ( \inf\{ x_m : m \geq 1 \}, \inf\{ x_m : m \geq 2 \}, \inf\{ x_m : m \geq 3 \}, \dots ) \end{equation*} % Therefore, for each $n \in \N$, $a_n$ is the greatest lower bound of all but the first $n-1$ elements of $(x_n)$.
Note that for all $m,n \in \N$ with $m > n$, the greatest lower bound of all but the first $m-1$ elements of $(x_n)$ must be greater than or equal to the greatest lower bound of all but the first $n-1$ elements of $(x_n)$. Therefore, $(a_n)$ must be a monotonically increasing sequence. Assume that $\sup\{ a_n : n \in \N \}$ exists. Therefore, \longref{eq:monotonically_increasing_limit} applies, and so % \begin{equation} \liminf\limits_{n \to \infty} x_n = \lim\limits_{n \to \infty} \inf\{ x_m : m \geq n \} \label{eq:seq_liminf_as_limit} \end{equation} % By similar reasoning, as long as the relevant suprema and infima exist, \longref{eq:monotonically_decreasing_limit} gives % \begin{equation} \limsup\limits_{n \to \infty} x_n = \lim\limits_{n \to \infty} \sup\{ x_m : m \geq n \} \label{eq:seq_limsup_as_limit} \end{equation} % This is one justification for $\liminf$ and $\limsup$ being called limits. \paragraph{Dominated Sequences:} Take a partially ordered set $(\set{X},{\leq})$ and a sequence $(x_n)$ such that $x_i \in \set{X}$ for all $i \in \N$. Now, take an additional sequence $(y_n)$ such that there is some $N \in \N$ such that $y_i \leq x_i$ for all $i \geq N$. In that case, if the limit inferior and limit superior of both sequences exist then % \begin{equation} \liminf\limits_{n \to \infty} y_n \leq \liminf\limits_{n \to \infty} x_n \quad \text{ and } \quad \limsup\limits_{n \to \infty} y_n \leq \limsup\limits_{n \to \infty} x_n \label{eq:theorem_limsupinf_seq} \end{equation} % which is not a surprising result. Of course, neither the limit inferior nor the limit superior need exist. \paragraph{Agreement of Limit Inferior and Limit Superior:} Take a partially ordered set $(\set{X},{\leq})$ where $(\set{X},\setset{T})$ is a topological space. Also take a sequence $(x_n)$ such that $x_i \in \set{X}$ for all $i \in \N$.
If both the limit inferior and limit superior of the sequence exist, then it must be the case that for some $q \in \set{X}$, % \begin{equation*} \liminf\limits_{n \to \infty} x_n = \limsup\limits_{n \to \infty} x_n = q \quad \text{ if and only if } \quad \lim\limits_{n \to \infty} x_n = q \end{equation*} % Thus, if the limit inferior and limit superior do not exist or do exist but do not agree, then the limit will not exist. \subsection{Series} \label{app:math_series} Take a topological space $(\set{X},\setset{T})$ where $(\set{X},{+})$ is a magma. Take sequence $(a_n)$ such that $a_n \in \set{X}$ for all $n \in \N$. \paragraph{Definition of a Series:} Given the sequence $(a_n)$, a new sequence $(s_n)$ can be constructed with % \begin{equation*} s_n \triangleq \sum_{i=1}^n a_i \end{equation*} % where $s_n$ is called a \emph{partial sum} and the sequence $(s_n)$ is known as a \emph{sequence of partial sums}. That is, the sequence $(s_n)$ is defined by % \begin{equation*} (s_n) \triangleq (a_1, a_1 + a_2, a_1 + a_2 + a_3, a_1 + a_2 + a_3 + a_4, \cdots) \end{equation*} % However, $(s_n)$ is sometimes denoted % \begin{equation*} a_1 + a_2 + a_3 + \cdots \end{equation*} % or % \begin{equation*} \sum\limits_{i=1}^\infty a_i \end{equation*} % The latter notation is called an \emph{infinite series} or simply a \emph{series}. If the sequence $(s_n)$ converges to some limit $s$ (\ie, $s_n \to s$ as $n \to \infty$) then the notation % \begin{equation} \sum\limits_{i=1}^\infty a_i = s \label{eq:series_notation} \end{equation} % is used where the limit $s$ is called the \emph{sum of the series}. This is simply a compact notation for a limit. This does \emph{not} attribute a sum to the sequence $(a_n)$ directly; it only indicates that the sequence of partial sums $(s_n)$ converges to $s$. \paragraph{Alternate Notations:} The \emph{index} $i$ in \longref{eq:series_notation} will often be used when elements of the sequence being summed have a certain pattern.
If an element $a_i$ of a sequence $(a_n)$ is a function of $i-1$ rather than just $i$ then it is often convenient to use the notation % \begin{equation*} \sum\limits_{i=0}^\infty a_{i+1} = s \end{equation*} % For example, consider the sequence $(0,1,2,3,\dots)$. In this case, the sum of this series might be denoted $\sum_{i=1}^\infty (i-1)$ or, more simply, $\sum_{i=0}^\infty i$. For similar reasons, the most general series notation for the sum of the sequence $(a_n)$ is % \begin{equation*} \sum\limits_{i=1-z}^\infty a_{i+z} = s \end{equation*} % where $z \in \Z$. \section{Continuous Functions} Take the topological spaces $(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ and a subset $\set{E} \subseteq \set{X}$. Assume that $(\set{Y},\setset{T}_\set{Y})$ is a Hausdorff space. Now take function $f: \set{E} \mapsto \set{Y}$, subset $\set{F} \subseteq \set{E}$, and point $p \in \set{E}$ that is a limit point for set $\set{E}$. % \begin{itemize} \item If it is the case that $\lim_{x \to p} f(x) = f(p)$ then the function $f$ is called \emph{continuous at (point) $p$}. \item If $f$ is continuous at $x$ for all $x \in \set{F}$ then $f$ is called \emph{continuous on set $\set{F}$}. \item Furthermore, if it is the case that $f$ is continuous at $x$ for all $x \in \set{E}$ then $f$ is simply called \emph{continuous (on its domain)}. \end{itemize} % Let $q \in \set{Y}$ be such that $q = f(p)$. To summarize, to say that $f$ is continuous at $p$ means that for any $\set{V} \in \nhd_q$, the preimage $f^{-1}[\set{V}] \in \nhd_p$. Note the following. % \begin{itemize} \item The function $f$ is continuous if and only if the preimage of every open set is open. \item The function $f$ is continuous if and only if the preimage of every closed set is closed.
\end{itemize} \paragraph{Compactness and Continuity:} Take the topological spaces $(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ and a subset $\set{C}_\set{X} \subseteq \set{X}$. Assume that $(\set{Y},\setset{T}_\set{Y})$ is a Hausdorff space and the set $\set{C}_\set{X}$ is a compact set. Also take a continuous function $f: \set{C}_\set{X} \mapsto \set{Y}$. It is the case that the image $f[\set{C}_\set{X}]$ is compact. That is, the image of every compact set under a continuous function is compact. \paragraph{Compositions of Continuous Functions:} Take $(\set{X},\setset{T}_{\set{X}})$, $(\set{Y},\setset{T}_{\set{Y}})$, and $(\set{Z},\setset{T}_{\set{Z}})$ to be three topological spaces. Take subset $\set{E} \subseteq \set{X}$ and a function $f: \set{E} \mapsto \set{Y}$. Also take function $g: \range(f) \mapsto \set{Z}$. Now define function $h: \set{E} \mapsto \set{Z}$ by % \begin{equation*} h(x) \triangleq g(f(x)) \end{equation*} % for all $x \in \set{E}$. In other words, $h$ is the composition of $g$ and $f$; that is, $h = g \comp f$. In this case, if $f$ is continuous at $p \in \set{E}$ and $g$ is continuous at point $f(p) \in \set{Y}$ then $h$ is continuous at $p \in \set{E}$ as well. In fact, if $f$ is continuous on set $\set{E}$ and $g$ is continuous on set $\range(f) \subseteq \set{Y}$ then $h$ is also continuous on set $\set{E}$. In other words, compositions of continuous functions are also continuous. \section{Basic Real Analysis} \label{app:math_real_analysis} The following are some useful remarks about $\extR$. Note that due to the various isomorphisms, $\N \subset \W \subset \Z \subset \Q \subset \R \subset \extR$, and thus these statements apply to all of the numbers discussed here. \subsection{Real-Valued Sequences and Functions} For brevity, we now introduce \emph{real-valued} sequences and functions.
\paragraph{Real and Extended Real Sequences:} Take a sequence $(x_n)$ such that $x_i \in \R$ for all $i \in \N$. This sequence is called a \emph{real sequence} because all of its elements come from the real number system. Sometimes such a sequence will be called a \emph{real-valued sequence} or simply \emph{real-valued}. Note that all real sequences are implicitly \emph{extended real sequences} since $\R \subset \extR$. \paragraph{Real and Extended Real Functions:} Take a set $\set{X}$ and a function $f: \set{X} \mapsto \R$. This function is called a \emph{real function} or a \emph{real functional} because all of its values come from the real number system. Sometimes such a function will be called a \emph{real-valued function} or simply \emph{real-valued}. Note that all real functions are implicitly \emph{extended real functions} since $\R \subset \extR$. \subsection{Limiting Behavior} As explained, the real numbers $\R$ are a metric space and the extended real numbers $\extR$ are a Hausdorff topological space that is a totally ordered complete lattice. This greatly simplifies the limiting behavior of real-valued sequences and real-valued functions. For the following, recall that every real sequence can be viewed as a real-valued function with the domain $\N$. That is, for every real sequence $(x_n)$, there is a function $f: \N \mapsto \R$ defined by $f(n) = x_n$ for all $n \in \N$. Thus, for simplicity, the following results will be based on sequences with limits at $\infty$; however, they also follow for functions with limits anywhere. \paragraph{Divergence of a Limit of Function or Sequence:} Take the real sequences $(a_n)$ and $(b_n)$ defined by % \begin{equation*} (a_n) \triangleq (1,2,3,4,5,\dots) \quad \text{ and } \quad (b_n) \triangleq (-1,-2,-3,-4,-5,\dots) \end{equation*} % As explained in \longref{app:math_lim_sequences}, the extended real sequence $(a_n)$ converges to $\infty$; similarly, $(b_n)$ converges to $-\infty$.
However, neither of these two sequences has a limit in the metric space sense. Thus, to say that $a_n \to \infty$ is to say that $(a_n)$ eventually exceeds every real bound. Similarly, to say that $b_n \to {-\infty}$ is to say that $(b_n)$ eventually falls below every real bound. That is, % \begin{itemize} \item for all $R \in \R$, there exists an $N \in \N$ such that $a_n \geq R$ for all $n \geq N$, and so $a_n \to \infty$ \item for all $R \in \R$, there exists an $N \in \N$ such that $b_n \leq R$ for all $n \geq N$, and so $b_n \to {-\infty}$ \end{itemize} % Therefore, stating that a sequence (or function) converges in an infinite sense communicates information about the sequence. Thus, we will always consider real sequences and real functions to be extended real functions so that $\infty$ and $-\infty$ can always be used as limits. \paragraph{Oscillation of a Sequence or Function:} It should be clear that a real sequence or function can diverge in both a real metric sense as well as an extended real topological sense. For example, take the sequence % \begin{equation*} (1,0,1,0,1,0,1,0,1,0,\dots) \end{equation*} % It \emph{oscillates}: all of its values are bounded, and yet it still does not converge. Its limit simply does not exist; however, its limit inferior is $0$ and its limit superior is $1$. As we will show, the limit inferior and limit superior will always exist in the extended real context, and the limit will only exist when the limit superior and limit inferior agree. \paragraph{Extended Real Limit Inferior and Limit Superior:} Recall the definitions of the limit inferior and limit superior of a sequence from \longrefs{eq:liminf_seq_definition} and \shortref{eq:limsup_seq_definition} respectively. Since the extended real numbers are a complete lattice, the limit inferior and limit superior must exist for all real sequences.
Thus, for real sequence $(x_n)$, it is always the case that % \begin{equation} \liminf\limits_{n \to \infty} x_n = \infty \quad \text{ or } \quad \liminf\limits_{n \to \infty} x_n = a \quad \text{ or } \quad \liminf\limits_{n \to \infty} x_n = -\infty \label{eq:liminf_seq_always} \end{equation} % and % \begin{equation} \limsup\limits_{n \to \infty} x_n = \infty \quad \text{ or } \quad \limsup\limits_{n \to \infty} x_n = b \quad \text{ or } \quad \limsup\limits_{n \to \infty} x_n = -\infty \label{eq:limsup_seq_always} \end{equation} % where $a,b \in \R$ (\ie, $a$ and $b$ are finite); note that $a$ and $b$ need not be equal. If the limit inferior or limit superior is neither $\infty$ nor $-\infty$ then it is said to be \emph{finite}. An infinite limit indicates no eventual extremum bound in the real sense. These results hold for real functions as well. \paragraph{Interpretation of Limit Inferior and Limit Superior for Reals:} Take a sequence $(x_n)$ such that $x_i \in \R$ for all $i \in \N$. % \begin{itemize} \item If there is a $b \in \R$ such that for any $\varepsilon \in \R_{>0}$ there exists an $M \in \N$ with $x_n < b + \varepsilon$ for all $n \geq M$, and additionally $x_n > b - \varepsilon$ for infinitely many $n$, then $\limsup_{n \to \infty} x_n = b$. \item If there is an $a \in \R$ such that for any $\varepsilon \in \R_{>0}$ there exists an $N \in \N$ with $x_n > a - \varepsilon$ for all $n \geq N$, and additionally $x_n < a + \varepsilon$ for infinitely many $n$, then $\liminf_{n \to \infty} x_n = a$. \end{itemize} % In other words, the limit superior is the least upper bound and the limit inferior is the greatest lower bound \emph{for all but a finite number of elements} of the real sequence $(x_n)$. For example, take the sequence % \begin{equation*} (100,-100,1,0,1,0,1,0,1,0,\dots) \end{equation*} % where the pattern of $1$ and $0$ continues \adinfinitum{}. In this case, the limit of the sequence does not exist. However, $1$ is an upper bound for all but the first element, and so $1$ is the limit superior of the sequence.
Similarly, $0$ is a lower bound for all but the second element, and so $0$ is the limit inferior of the sequence. In fact, the set of cluster points for the filter base generated by this sequence is $\{0,1\}$. Now take the sequence % \begin{equation*} (1,-1,2,-2,3,-3,4,-4,5,-5,6,-6,7,-7,\dots) \end{equation*} % Again, it is clear that the limit of this sequence does not exist because it oscillates. In fact, there is also no finite upper bound nor finite lower bound. However, by definition $\infty \in \extR$ is always an upper bound for the elements of the sequence and $-\infty \in \extR$ is always a lower bound for the elements of the sequence. Therefore, the sequence's limit superior is $\infty$ and the sequence's limit inferior is $-\infty$. It should be clear that $\infty$ and $-\infty$ are always possibilities for the limit inferior and limit superior, and therefore the limit inferior and limit superior are always defined (and thus always exist) in the extended real context. \paragraph{Special Case of Limit Inferior and Limit Superior:} Take a real sequence $(x_n)$. Also take a second real sequence $(y_n)$ such that $y_i=-x_i$ for all $i \in \N$. Recall the arithmetic rules for extended real numbers (\eg, $-1 \times \infty = -\infty$). Keeping these in mind, it is always the case that % \begin{equation*} -\liminf\limits_{n \to \infty} x_n = \limsup\limits_{n \to \infty} y_n \end{equation*} % and % \begin{equation*} -\limsup\limits_{n \to \infty} x_n = \liminf\limits_{n \to \infty} y_n \end{equation*} % In fact, the second statement is redundant. \paragraph{Extremum Limits and Convergence:} Take a real sequence $(x_n)$. 
Of course, using the standard metric for $\R$, the sequence $(x_n)$ converges to point $x \in \extR$ if and only if % \begin{equation*} \liminf\limits_{n \to \infty} x_n = \limsup\limits_{n \to \infty} x_n = x \end{equation*} % That is, as discussed, the limit $\lim_{n \to \infty} x_n$ exists if and only if the limit superior and limit inferior of $(x_n)$ agree. When the limit does exist, it is the case that % \begin{equation*} \lim\limits_{n \to \infty} x_n = \liminf\limits_{n \to \infty} x_n = \limsup\limits_{n \to \infty} x_n \end{equation*} % For example, consider the real sequences $(a_n)$, $(b_n)$, $(c_n)$, and $(d_n)$ defined by % \begin{align*} (a_n) &\triangleq (1,2,3,4,5,\dots)\\ (b_n) &\triangleq (-1,-2,-3,-4,-5,\dots)\\ (c_n) &\triangleq (1,-1,2,-2,3,-3,4,-4,5,-5,\dots)\\ (d_n) &\triangleq (1,0,1,0,1,0,1,0,\dots) \end{align*} % It is the case that % \begin{align*} \liminf\limits_{n \to \infty} a_n = \infty \quad &\text{ and } \quad \limsup\limits_{n \to \infty} a_n = \infty\\ \liminf\limits_{n \to \infty} b_n = -\infty \quad &\text{ and } \quad \limsup\limits_{n \to \infty} b_n = -\infty\\ \liminf\limits_{n \to \infty} c_n = -\infty \quad &\text{ and } \quad \limsup\limits_{n \to \infty} c_n = \infty\\ \liminf\limits_{n \to \infty} d_n = 0 \quad &\text{ and } \quad \limsup\limits_{n \to \infty} d_n = 1 \end{align*} % and therefore $a_n \to \infty$ and $b_n \to -\infty$ as $n \to \infty$. However, the limits for $(c_n)$ and $(d_n)$ simply do not exist since the limit inferior and limit superior do not agree for each of them. \paragraph{Limit Arithmetic:} Take $(\set{X},\setset{T}_\set{X})$ to be a topological space and a subset $\set{E} \subseteq \set{X}$. Also take functions $f: \set{E} \mapsto \R$ and $g: \set{E} \mapsto \R$. Thus, $f$ and $g$ are both extended real functions. Now take $x_0 \in \set{X}$ to be a limit point of $\set{E}$ in $\set{X}$. 
Assume that there exist $p,q \in \extR$ such that $f(x) \to p$ and $g(x) \to q$ as $x \to x_0$ (in the topological extended real sense of the limit). That is, assume that $p$ and $q$ are such that % \begin{equation*} \lim\limits_{x \to x_0} f(x) = p \quad \text{ and } \quad \lim\limits_{x \to x_0} g(x) = q \end{equation*} % Keeping in mind the rules for arithmetic in the extended real numbers, % \begin{itemize} \item if $(p,q) \notin \{({-\infty},\infty), (\infty,{-\infty})\}$ then % \begin{equation*} \lim\limits_{x \to x_0} ( f(x)+g(x) ) = p + q \end{equation*} \item if $(p,q) \notin \{(\infty,\infty), ({-\infty},{-\infty})\}$ then % \begin{equation*} \lim\limits_{x \to x_0} ( f(x)-g(x) ) = p - q \end{equation*} \item if $(p,q) \notin \{(0,\infty),(\infty,0), (0,{-\infty}),({-\infty},0)\}$ then % \begin{equation*} \lim\limits_{x \to x_0} ( f(x) g(x) ) = p q \end{equation*} \item if $q \neq 0$ and $p,q \notin \{\infty,{-\infty}\}$ then % \begin{equation*} \lim\limits_{x \to x_0} \frac{ f(x) }{ g(x) } = \frac{p}{q} \end{equation*} \end{itemize} % In the excluded cases, these sums, differences, products, and quotients are \emph{indeterminate forms}; their limits cannot be determined from $p$ and $q$ alone. This same arithmetic holds for right-handed and left-handed limits as well as limit inferiors and limit superiors. \subsection{Semi-Continuity of Real-Valued Functions} For a real-valued function, the concept of continuity can be broken into lower semi-continuity and upper semi-continuity. A function is continuous if and only if these two notions agree. \paragraph{Lower Semi-Continuous Functions:} Take the topological space $(\set{X},\setset{T}_\set{X})$ and a subset $\set{E} \subseteq \set{X}$. Now take function $f: \set{E} \mapsto \R$, subset $\set{F} \subseteq \set{E}$, and point $p \in \set{E}$ that is a limit point for set $\set{E}$. % \begin{itemize} \item If it is the case that $\liminf_{x \to p} f(x) \geq f(p)$ then the function $f$ is called \emph{lower semi-continuous at (point) $p$}.
\item If, for some subset $\set{F} \subseteq \set{E}$, $f$ is lower semi-continuous at $x$ for all $x \in \set{F}$, then $f$ is called \emph{lower semi-continuous on set $\set{F}$}. \item Furthermore, if it is the case that $f$ is lower semi-continuous at $x$ for all $x \in \set{E}$ then $f$ is simply called \emph{lower semi-continuous (on its domain)}. \end{itemize} % Define the function $f_*: \set{E} \mapsto \extR$ by % \begin{equation*} f_*(p) = \liminf_{x \to p} f(x) \end{equation*} % for all $p \in \set{E}$. It is clear that $f_*$ is a lower semi-continuous function. Additionally, so is the function $g: \R \mapsto \R$ defined by % \begin{equation*} g(x) \triangleq \lceil x \rceil \end{equation*} % for all $x \in \R$. That is, the \emph{ceiling function} is lower semi-continuous. \paragraph{Upper Semi-Continuous Functions:} Take the topological space $(\set{X},\setset{T}_\set{X})$ and a subset $\set{E} \subseteq \set{X}$. Now take function $f: \set{E} \mapsto \R$, subset $\set{F} \subseteq \set{E}$, and point $p \in \set{E}$ that is a limit point for set $\set{E}$. % \begin{itemize} \item If it is the case that $\limsup_{x \to p} f(x) \leq f(p)$ then the function $f$ is called \emph{upper semi-continuous at (point) $p$}. \item If, for some subset $\set{F} \subseteq \set{E}$, $f$ is upper semi-continuous at $x$ for all $x \in \set{F}$, then $f$ is called \emph{upper semi-continuous on set $\set{F}$}. \item Furthermore, if it is the case that $f$ is upper semi-continuous at $x$ for all $x \in \set{E}$ then $f$ is simply called \emph{upper semi-continuous (on its domain)}. \end{itemize} % Define the function $f^*: \set{E} \mapsto \extR$ by % \begin{equation*} f^*(p) = \limsup_{x \to p} f(x) \end{equation*} % for all $p \in \set{E}$. It is clear that $f^*$ is an upper semi-continuous function.
Additionally, so is the function $g: \R \mapsto \R$ defined by % \begin{equation*} g(x) \triangleq \lfloor x \rfloor \end{equation*} % for all $x \in \R$. That is, the \emph{floor function} is upper semi-continuous. \paragraph{From Semi-Continuity to Continuity:} Take the topological space $(\set{X},\setset{T}_\set{X})$ and a subset $\set{E} \subseteq \set{X}$. Now take function $f: \set{E} \mapsto \R$, subset $\set{F} \subseteq \set{E}$, and point $p \in \set{E}$ that is a limit point for set $\set{E}$. It is the case that $f$ is continuous at $p$ if and only if $f$ is upper semi-continuous at $p$ and $f$ is lower semi-continuous at $p$. \subsection{The Intermediate Value Theorem} Take $a,b \in \R$ with $a < b$ and a function $f: [a,b] \mapsto \R$ that is continuous on $[a,b]$. The \emph{intermediate value theorem} states that for any $y \in \R$ between $f(a)$ and $f(b)$, there exists some $c \in [a,b]$ such that $f(c) = y$. Now additionally assume that $f$ is differentiable on $(a,b)$. It is the case that % \begin{itemize} \item for any $c,d \in [a,b]$ with $d > c$, $f(d) \geq f(c)$ (\ie, $f$ is increasing) if and only if for all $x \in (a,b)$, $f'(x) \geq 0$ \item for any $c,d \in [a,b]$ with $d > c$, $f(d) \leq f(c)$ (\ie, $f$ is decreasing) if and only if for all $x \in (a,b)$, $f'(x) \leq 0$ \item if for all $x \in (a,b)$, $f'(x) > 0$ then for all $c,d \in [a,b]$ with $d > c$, $f(d) > f(c)$ (\ie, $f$ is strictly increasing) \item if for all $x \in (a,b)$, $f'(x) < 0$ then for all $c,d \in [a,b]$ with $d > c$, $f(d) < f(c)$ (\ie, $f$ is strictly decreasing) \end{itemize} % These match the intuitive description of a derivative as a \emph{slope} of a \emph{tangent line} at a point. \subsection{Necessary and Sufficient Conditions for Maxima and Minima} The problem of optimization involves the maximization or minimization of a function. When these functions are differentiable, this process is simplified. \paragraph{Necessary Conditions for Minima:} Take set $\set{A} \subseteq \R$ and function $f: \set{A} \mapsto \R$. Take a point $p$ such that there exists some $\varepsilon \in \R_{>0}$ such that $(p,p+\varepsilon) \cap \set{A} \neq \emptyset$ and $(p-\varepsilon,p) \cap \set{A} \neq \emptyset$ (\eg, $p \in \interior(\set{A})$) and assume that $f$ is differentiable at point $p$.
Also assume that there exists some $\varepsilon \in \R_{>0}$ such that for all $x \in \set{A} \cap (p-\varepsilon,p+\varepsilon)$, $f(p) \leq f(x)$. That is, $f$ has a \emph{local minimum} at $p$. In this case, it must be that $f'(p)=0$. Now assume that function $f'$ is differentiable at $p$. In this case, it must be that $0 \leq f''(p)$. \paragraph{Sufficient Conditions for Minima:} Take set $\set{A} \subseteq \R$ and function $f: \set{A} \mapsto \R$. Take a point $p$ such that $f$ is differentiable at $p$. Also assume that $f'$ is differentiable at $p$. If it is the case that $f'(p)=0$ and $f''(p) > 0$, then $f$ must have a local minimum at $p$. \paragraph{Necessary Conditions for Maxima:} Take set $\set{A} \subseteq \R$ and function $f: \set{A} \mapsto \R$. Take a point $p$ such that there exists some $\varepsilon \in \R_{>0}$ such that $(p,p+\varepsilon) \cap \set{A} \neq \emptyset$ and $(p-\varepsilon,p) \cap \set{A} \neq \emptyset$ (\eg, $p \in \interior(\set{A})$) and assume that $f$ is differentiable at point $p$. Also assume that there exists some $\varepsilon \in \R_{>0}$ such that for all $x \in \set{A} \cap (p-\varepsilon,p+\varepsilon)$, $f(x) \leq f(p)$. That is, $f$ has a \emph{local maximum} at $p$. In this case, it must be that $f'(p)=0$. Now assume that function $f'$ is differentiable at $p$. In this case, it must be that $f''(p) \leq 0$. \paragraph{Sufficient Conditions for Maxima:} Take set $\set{A} \subseteq \R$ and function $f: \set{A} \mapsto \R$. Take a point $p$ such that $f$ is differentiable at $p$. Also assume that $f'$ is differentiable at $p$. If it is the case that $f'(p)=0$ and $f''(p) < 0$, then $f$ must have a local maximum at $p$. \section{Partial and Total Derivatives} \label{app:math_partial_derivatives} Take sets $\set{T} \subseteq \R$ and $\set{X} \subseteq \R$ and functions $x: \set{T} \mapsto \set{X}$ and $f: \set{X} \mapsto \R$.
Take a point $p \in \set{T}$ and assume that function $x$ is differentiable at $p$ and function $f$ is differentiable at $x(p)$. Define the composition $f \comp x$ as $g$. That is, define $g: \set{T} \mapsto \R$ by % \begin{equation*} g(t) \triangleq f(x(t)) \end{equation*} % for all $t \in \set{T}$. As discussed in \longref{app:math_chain_rule}, $g$ is differentiable at $p$ and $g'(p) = f'( x(p) ) x'(p)$. For simplicity, whenever $f$ is evaluated at a symbol $x$, assume the normal definition of $f$; however, whenever $f$ is evaluated at symbol $t$, assume that $g(t)$ is meant. That is, % \begin{equation*} f(x) \triangleq f(x) \quad \text{ and } \quad f(t) \triangleq g(t) = f(x(t)) \end{equation*} % Now assume that $f(x)$ is differentiable for all $x \in \set{X}$ and $f(t)$ (\ie, $g(t)$) is differentiable for all $t \in \set{T}$. Therefore, there are two relevant derivatives, namely $f'(x)$ for all $x \in \set{X}$ and $f'(t)$ (\ie, $g'(t)$) for all $t \in \set{T}$. % \begin{itemize} \item We call $f'(t)$ the \symdef[\emph{total derivative of $f$ at point $t$}]{Ganalysis.2y}{total_deriv}{$\total f/\total t$}{total derivative of function $f$ at point $t$} and use the notations % \begin{enumerate}[(i)] \item $\frac{\total f}{\total t} \triangleq g'$ \label{item:total_deriv_function} \item $\frac{\total f(t)}{\total t} \triangleq f'(t) = g'(t) = f'( x(t) ) x'(t)$ \label{item:total_deriv_point} \item $\left.\frac{\total f(t)}{\total t}\right|_{t = t_0} \triangleq g'(t_0) = f'( x(t_0) ) x'(t_0)$ \label{item:total_deriv_function_eval} \end{enumerate} % Notation (\shortref{item:total_deriv_function}) represents the first derivative function. Notation (\shortref{item:total_deriv_point}) represents the first derivative function evaluated at point $t$ (\ie, the derivative of $g$ at $t \in \set{T}$). Notation (\shortref{item:total_deriv_function_eval}) represents the first derivative function evaluated at point $t_0$ (\ie, the derivative of $g$ at $t_0 \in \set{T}$).
\item We call $f'(x)$ the \symdef[\emph{partial derivative of $f$ at point $x$}]{Ganalysis.2z}{partial_deriv}{$\partial f/\partial x$}{partial derivative of function $f$ with respect to $x$} and use the notations % \begin{enumerate}[(i)] \item $\frac{\partial f}{\partial x} \triangleq f'$ \label{item:partial_deriv_function} \item $\frac{\partial f(x)}{\partial x} \triangleq f'(x)$ \label{item:partial_deriv_point} \item $\left.\frac{\partial f(x)}{\partial x}\right|_{x = x_0} \triangleq f'(x_0)$ \label{item:partial_deriv_function_eval} \end{enumerate} % Notation (\shortref{item:partial_deriv_function}) represents the first derivative function. Notation (\shortref{item:partial_deriv_point}) represents the first derivative function evaluated at point $x$ (\ie, the derivative of $f$ at $x \in \set{X}$). Notation (\shortref{item:partial_deriv_function_eval}) represents the first derivative function evaluated at point $x_0$ (\ie, the derivative of $f$ at $x_0 \in \set{X}$). \end{itemize} % By these definitions, it is clear that % \begin{equation*} \frac{ \total f(t) }{ \total t } = f'(t) = \frac{ \partial f(x(t)) }{ \partial x } \frac{ \total x(t) }{ \total t } \end{equation*} % This is a restatement of the chain rule. \subsection{Functions of Multiple Variables} Take sets $\set{X} \subseteq \R$, $\set{Y} \subseteq \R$, $\set{Z} \subseteq \R$, and $\set{T} \subseteq \R$ and a function $p: \set{X} \times \set{Y} \times \set{Z} \times \set{T} \mapsto \R$. Choose $x_0,y_0,z_0,t_0 \in \R$.
Now, define the functions $p_x: \set{X} \mapsto \R$, $p_y: \set{Y} \mapsto \R$, and $p_z: \set{Z} \mapsto \R$ with % \begin{equation*} p_x(x) \triangleq p( x, y_0, z_0, t_0 ) \quad \text{ and } \quad p_y(y) \triangleq p( x_0, y, z_0, t_0 ) \quad \text{ and } \quad p_z(z) \triangleq p( x_0, y_0, z, t_0 ) \end{equation*} % Next, take functions $x: \set{T} \mapsto \R$, $y: \set{T} \mapsto \R$, and $z: \set{T} \mapsto \R$, and define functions $\hat{p}_x: \set{T} \mapsto \R$, $\hat{p}_y: \set{T} \mapsto \R$, and $\hat{p}_z: \set{T} \mapsto \R$ by % \begin{equation*} \hat{p}_x(t) \triangleq p_x(x(t)) = p( x(t), y_0, z_0, t_0 ) \end{equation*} % and % \begin{equation*} \hat{p}_y(t) \triangleq p_y(y(t)) = p( x_0, y(t), z_0, t_0 ) \end{equation*} % and % \begin{equation*} \hat{p}_z(t) \triangleq p_z(z(t)) = p( x_0, y_0, z(t), t_0 ) \end{equation*} % Therefore, we can define partial derivatives % \begin{equation*} \frac{ \partial p }{ \partial x } \triangleq \frac{ \partial p_x }{ \partial x } \quad \text{ and } \quad \frac{ \partial p }{ \partial y } \triangleq \frac{ \partial p_y }{ \partial y } \quad \text{ and } \quad \frac{ \partial p }{ \partial z } \triangleq \frac{ \partial p_z }{ \partial z } \end{equation*} % Clearly, provided differentiability holds, each of these can be considered functions with domain $\set{X} \times \set{Y} \times \set{Z} \times \set{T}$. Additionally, if % \begin{equation*} p(t) \triangleq p( x(t), y(t), z(t), t ) \end{equation*} % then it can be shown that the \emph{total derivative} of $p(t)$ is % \begin{equation*} \frac{ \total p }{ \total t } = \frac{ \partial p }{ \partial x } \frac{ \total x }{ \total t } + \frac{ \partial p }{ \partial y } \frac{ \total y }{ \total t } + \frac{ \partial p }{ \partial z } \frac{ \total z }{ \total t } + \frac{ \partial p }{ \partial t } \end{equation*} % This general form extends to functions of any finite number of variables.
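As a sanity check of the total-derivative formula above, the following Python fragment compares both sides numerically using central finite differences. The particular choices of $p$, $x$, $y$, and $z$ here are illustrative assumptions, not functions taken from the text.

```python
# Numerical check of d p/dt = (dp/dx)(dx/dt) + (dp/dy)(dy/dt)
#                           + (dp/dz)(dz/dt) + dp/dt
# for an arbitrary illustrative p(x, y, z, t) and paths x(t), y(t), z(t).
import math

def p(x, y, z, t):
    return x * y + z * math.sin(t)

x = lambda t: t ** 2
y = lambda t: math.cos(t)
z = lambda t: 3.0 * t

def g(t):
    # p(t) in the text's overloaded notation: p(x(t), y(t), z(t), t)
    return p(x(t), y(t), z(t), t)

def diff(f, t, h=1e-6):
    # central finite-difference approximation of the derivative of f at t
    return (f(t + h) - f(t - h)) / (2 * h)

t0 = 0.7
# Left-hand side: total derivative computed directly on the composition g.
total = diff(g, t0)
# Right-hand side: partial derivatives weighted by the path derivatives.
x0, y0, z0 = x(t0), y(t0), z(t0)
dp_dx = diff(lambda s: p(s, y0, z0, t0), x0)
dp_dy = diff(lambda s: p(x0, s, z0, t0), y0)
dp_dz = diff(lambda s: p(x0, y0, s, t0), z0)
dp_dt = diff(lambda s: p(x0, y0, z0, s), t0)
chain = dp_dx * diff(x, t0) + dp_dy * diff(y, t0) + dp_dz * diff(z, t0) + dp_dt
print(abs(total - chain) < 1e-5)  # both sides agree to finite-difference accuracy
```

Here the two sides agree up to the accuracy of the finite-difference approximation, as the formula predicts.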
\subsection{Second and Higher Total Derivatives} Take set $\set{T} \subseteq \R$ and function $f: \set{T} \mapsto \R$. Define the notations\symdef[]{Ganalysis.2y2}{second_total_deriv}{$\total^2 f/{\total t}^2$}{second total derivative of function $f$ (\ie, $f''$)} \symdef[]{Ganalysis.2y3}{third_total_deriv}{$\total^3 f/{\total t}^3$}{third total derivative of function $f$ (\ie, $f'''$)} \symdef[]{Ganalysis.2yn}{n_total_deriv}{$\total^n f/{\total t}^n$}{$n\th$ total derivative of function $f$ (\ie, $f^{(n)}$)} % \begin{equation*} \frac{ \total^2 f }{ {\total t}^2 } \triangleq f'' \quad \text{ and } \quad \frac{ \total^3 f }{ {\total t}^3 } \triangleq f''' \quad \text{ and } \quad \frac{ \total^4 f }{ {\total t}^4 } \triangleq f^{(4)} \end{equation*} % and, in general, % \begin{equation*} \frac{ \total^n f }{ {\total t}^n } \triangleq f^{(n)} \end{equation*} % for all $n \in \{4,5,6,7,\dots\}$. \subsection{Second Partial Derivatives} Take sets $\set{X} \subseteq \R$, $\set{Y} \subseteq \R$, $\set{Z} \subseteq \R$, and $\set{T} \subseteq \R$ and a function $p: \set{X} \times \set{Y} \times \set{Z} \mapsto \R$. Assuming differentiability, the partial derivatives\symdef[]{Ganalysis.2zxy}{second_partial_deriv}{$\partial^2 f/\partial x \partial y$}{partial derivative of function $\partial f/\partial x$ with respect to $y$} % \begin{equation*} \frac{ \partial p }{ \partial x } \quad \text{ and } \quad \frac{ \partial p }{ \partial y } \quad \text{ and } \quad \frac{ \partial p }{ \partial z } \end{equation*} % are each functions with domain $\set{X} \times \set{Y} \times \set{Z}$. The notations % \begin{equation*} \frac{ \partial^2 p }{ \partial x \partial x } \quad \text{ and } \quad \frac{ \partial^2 p }{ \partial x \partial y } \quad \text{ and } \quad \frac{ \partial^2 p }{ \partial x \partial z } \end{equation*} % represent the partial derivatives of $\partial p/\partial x$ with respect to $x$, $y$, and $z$ respectively. 
However, $\partial^2 p/\partial x \partial x$ is usually denoted $\partial^2 p/{\partial x}^2$. That is, % \begin{equation*} \begin{matrix} \frac{ \partial^2 p }{ {\partial x}^2 } & \frac{ \partial^2 p }{ \partial x \partial y } & \frac{ \partial^2 p }{ \partial x \partial z }\\ \frac{ \partial^2 p }{ \partial y \partial x } & \frac{ \partial^2 p }{ {\partial y}^2 } & \frac{ \partial^2 p }{ \partial y \partial z }\\ \frac{ \partial^2 p }{ \partial z \partial x } & \frac{ \partial^2 p }{ \partial z \partial y } & \frac{ \partial^2 p }{ {\partial z}^2 } \end{matrix} \end{equation*} % represent each of the nine second partial derivatives of the function $p$. \section{Special Real-Valued Functions} Here, we discuss a number of commonly used real-valued functions and classes of real-valued functions. \subsection{The Exponential Function and Logarithms} \label{app:math_logarithms} Now that sequences and series have been defined, it is possible to construct \emph{Euler's number}, an important constant in mathematics. This gives us an opportunity to introduce logarithms in terms of Euler's constant. \paragraph{Euler's Number:} The symbol \symdef{Bnumbers.55}{econst}{$e$}{Euler's number (\ie, constant $e \approx 2.71828182845904523536$)} is often defined by % \begin{equation} e \triangleq \sum\limits_{n=0}^\infty \frac{1}{n!} \label{eq:definition_e} \end{equation} % where the symbol $!$ indicates a \symdef[\emph{factorial}]{Ganalysis.001}{factorial}{${n\bang}$}{factorial of $n$ (\ie, ${n\bang}=1\times2\times\cdots\times n$ with ${0\bang}=1$)}, which is defined so that for $n \in \W$, % \begin{equation*} n! \triangleq \begin{cases} 1 &\text{if } n=0\\ 1 \times 2 \times 3 \times \cdots \times n &\text{if } n > 0 \end{cases} \end{equation*} % It can be shown that the series in \longref{eq:definition_e} converges.
In fact, using the result summarized by \longref{eq:theorem_limsupinf_seq}, it can be shown that % \begin{equation*} e = \lim\limits_{n \to \infty} \left(1 + \frac{1}{n}\right)^n \end{equation*} % For technical reasons, $e$ has a great many applications in mathematics. Because $e$ is an \emph{irrational number}, its decimal expansion neither terminates nor repeats, and so its exact value cannot be written in a compact fashion. However, it is approximately (\ie, the difference between this rational number and $e$ is very small) % \begin{equation*} e \approx 2.71828182845904523536 \end{equation*} % where \symdef{Ageneral.1}{approx}{$\approx$}{is approximately equal to} is a symbol indicating an approximation rather than an equality. \paragraph{Exponential Function:} \symdef{Bnumbers.595}{expfunc}{$\exp(x)$}{exponential function (\ie, $\exp(x) \triangleq e^x$)}The \emph{exponential function} ${\exp}: \R \mapsto \R_{>0}$ is defined by % \begin{equation*} \exp(x) \triangleq e^x \end{equation*} % This is a widely used function in science and mathematics. \paragraph{The Natural Logarithm:} The \emph{natural logarithm} of a positive real number $x \in \R_{>0}$ is denoted by $\log_e(x)$ or \symdef{Bnumbers.58}{naturallog}{$\ln(x)$}{natural logarithm of positive real number $x$ (\ie, $e^{\ln(x)} = x$)} and is such that % \begin{equation*} e^{\ln(x)} = x \end{equation*} % In other words, the natural logarithm is the \emph{exponent} or \emph{power} to which the number $e$ is \emph{raised} in order to result in the positive real number $x$. Note that $\ln(1)=0$, $\ln(e)=1$, and $\ln(e^n)=n$ for all $n \in \W$. Also note that when the logarithm is viewed as a function ${\ln}: \R_{>0} \mapsto \R$, it is the inverse of the exponential function $\exp: \R \mapsto \R_{>0}$.
In other words, for all $x \in \R$ and $y \in \R_{>0}$, % \begin{equation*} \ln(\exp(x)) = x \quad \text{ and } \quad \exp(\ln(y)) = y \end{equation*} \paragraph{The Common Logarithm:} The \emph{common logarithm} of a positive real number $x \in \R_{>0}$ is denoted by $\log_{10}(x)$ or \symdef{Bnumbers.57}{commonlog}{$\log(x)$}{common logarithm of positive real number $x$ (\ie, $10^{\log(x)} = x$)} and is such that % \begin{equation*} 10^{\log(x)} = x \end{equation*} % In other words, the common logarithm is the exponent or power to which the number $10$ is raised in order to result in the positive real number $x$. Note that $\log(1)=0$, $\log(10)=1$, and $\log(10^n)=n$ for all $n \in \W$. \paragraph{The Logarithm:} For \emph{base} $b \in \R_{>0}$ with $b \neq 1$, the \emph{logarithm} of a number \emph{in base $b$} is denoted by \symdef{Bnumbers.56}{log}{$\log_b(x)$}{logarithm of positive real number $x$ in base $b$ (\ie, $b^{\log_b(x)} = x$)} and is such that % \begin{equation*} b^{\log_b(x)} = x \end{equation*} % In other words, the logarithm is the exponent or power to which the positive real number $b$ is raised in order to result in the positive real number $x$. Note that $\log_b(1)=0$, $\log_b(b)=1$, and $\log_b(b^n)=n$ for all $n \in \W$. It can be shown that for $b,c,x \in \R_{>0}$ with $b,c \neq 1$, % \begin{equation*} \log_c(x) = \frac{ \log_b(x) }{ \log_b(c) } \end{equation*} % In particular, % \begin{equation*} \log_c(x) = \frac{ \ln(x) }{ \ln(c) } \quad \text{ and } \quad \log(x) = \frac{ \ln(x) }{ \ln(10) } \end{equation*} % Therefore, the choice of base $b$ is usually arbitrary when the general term \emph{logarithm} is used. \subsection{Special Classes of Real Functions} \label{app:math_special_real_functions} There are classes of real-valued functions that take forms that have some useful properties. Here, we introduce two such classes that we will use frequently.
\paragraph{Polynomials:} Take $n \in \W$ and indexed family $(a_i : i \in \{0,1,2,\dots,n\})$ where $a_i \in \R$ for all $i \in \{0,1,2,\dots,n\}$. Take subset $\set{E} \subseteq \R$ and function $f: \set{E} \mapsto \R$ defined by % \begin{equation*} f(x) \triangleq a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \end{equation*} % The function $f$ is called a \emph{polynomial} and is differentiable (and therefore continuous) on set $\set{E}$. In fact, the derivative $f': \set{E} \mapsto \R$ is also a polynomial, and so it will also be differentiable (and continuous) on set $\set{E}$. \paragraph{Rational Functions:} Take subset $\set{E} \subseteq \R$ and polynomial functions $p: \set{E} \mapsto \R$ and $q: \set{E} \mapsto \R$. Define set $\set{E}_0 \subseteq \set{E}$ by % \begin{equation*} \set{E}_0 \triangleq \{ x \in \set{E} : q(x) \neq 0 \} \end{equation*} % That is, $\set{E}_0$ is the set of points in $\set{E}$ for which polynomial $q$ is not $0$. Now take a function $h: \set{E}_0 \mapsto \R$ defined by % \begin{equation*} h(x) \triangleq \frac{p(x)}{q(x)} \end{equation*} % The function $h$ is called a \emph{rational function} and is differentiable (and therefore continuous) on the set $\set{E}_0$. In fact, the derivative $h': \set{E}_0 \mapsto \R$ is also a rational function, and so it will also be differentiable (and continuous) on set $\set{E}_0$. \section{Coordinate Vectors and Matrices} \label{app:math_vectors_matrices} We now define constructs from linear algebra that have many practical applications. In particular, we define the coordinate vector space and spaces of matrices that transform vectors from that space. \subsection{The Coordinate Vector Space} \label{app:math_coord_vector_space} Let $(\set{F},{+},{\times},0,1)$ be a field whose elements are called scalars. Take $n \in \N$ to be some finite natural number. For simplicity, assume that $n > 1$.
However, note that all of these definitions naturally extend to the $n=1$ case (\eg, $\set{F}$ can be substituted for $\set{F}^1$). Note that an element $\v{x} \in \set{F}^n$ (recall this notation from \longref{app:math_cartesian_prod}) takes the form % \begin{equation*} \v{x} = (x_1, x_2, x_3, \dots, x_n) \end{equation*} % where $x_i \in \set{F}$ for all $i \in \{1,2,3,\dots,n\}$. That is, for vector $\v{y} \in \set{F}^n$, the $i\th$ coordinate of $\v{y}$ is denoted \symdef{Hvectors.2}{ithcoordinate}{$y_i$}{the $i\th$ coordinate of vector $\v{y}$}. Next, take any $\v{x},\v{y} \in \set{F}^n$. Define the operation ${+}: \set{F}^n \times \set{F}^n \mapsto \set{F}^n$ such that % \begin{equation*} \v{x} + \v{y} \triangleq (x_1+y_1,x_2+y_2,x_3+y_3,\dots,x_n+y_n) \end{equation*} Also take $a \in \set{F}$. Define the operation ${\times}: \set{F} \times \set{F}^n \mapsto \set{F}^n$ (which will be represented with juxtaposition) by % \begin{equation*} a\v{x} \triangleq (a x_1, a x_2, a x_3, \dots, a x_n) \end{equation*} Additionally, define the notation $-\v{x}$ to represent % \begin{equation*} {-\v{x}} \triangleq ({-x_1},{-x_2},{-x_3},\dots,{-x_n}) \end{equation*} % and use $0$ to represent % \begin{equation*} 0 \triangleq (0,0,0,\dots,0) \end{equation*} % Of course, $0 \in \set{F}^n$. Also use the notation $\v{x} - \v{y}$ to represent $\v{x} + {-\v{y}}$.
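The componentwise operations just defined are straightforward to realize concretely. The following Python fragment is a minimal, purely illustrative sketch using tuples over the (floating-point approximation of the) real field.

```python
# Minimal sketch of the coordinate vector operations on F^n,
# with vectors represented as Python tuples.

def vec_add(x, y):
    # componentwise addition: (x1+y1, ..., xn+yn)
    return tuple(a + b for a, b in zip(x, y))

def scal_mul(a, x):
    # scalar multiplication: (a*x1, ..., a*xn)
    return tuple(a * c for c in x)

def vec_neg(x):
    # -x, defined componentwise
    return scal_mul(-1, x)

def vec_sub(x, y):
    # x - y is notation for x + (-y)
    return vec_add(x, vec_neg(y))

n = 4
zero = (0,) * n              # the additive identity (0, 0, ..., 0)
x = (1, 2, 3, 4)
y = (5, 6, 7, 8)

print(vec_add(x, y))         # (6, 8, 10, 12)
print(scal_mul(2, x))        # (2, 4, 6, 8)
print(vec_sub(x, x) == zero) # True, since x - x = 0
```

Note that each operation is defined coordinate by coordinate, exactly as in the definitions above.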
Finally, use \symdef{Hvectors.42}{ithbasisvector}{$\v{e}_i$}{the $i\th$ elementary (or standard) basis vector} for $i \in \{1,2,3,\dots,n\}$ to represent % \begin{align*} \v{e}_1 &\triangleq (1,0,0,\dots,0)\\ \v{e}_2 &\triangleq (0,1,0,\dots,0)\\ \v{e}_3 &\triangleq (0,0,1,\dots,0)\\ &\vdots\\ \v{e}_n &\triangleq (0,0,0,\dots,1) \end{align*} % These are called \emph{basis vectors} for $\set{F}^n$ because, using the definitions above, for any vector $\v{z} \in \set{F}^n$, there exists an $n$-tuple $(a_1,a_2,a_3,\dots,a_n)$ where $a_i \in \set{F}$ for all $i \in \{1,2,3,\dots,n\}$ (\ie, scalars) such that % \begin{equation*} \v{z} = a_1 \v{e}_1 + a_2 \v{e}_2 + a_3 \v{e}_3 + \cdots + a_n \v{e}_n \end{equation*} % In particular, for these basis vectors, $(a_1,a_2,a_3,\dots,a_n)=(z_1,z_2,z_3,\dots,z_n)$. Thus, these are called the \emph{elementary (or standard) basis vectors} for $\set{F}^n$. Note that $\v{x} + 0 = 0 + \v{x} = \v{x}$. Also note that $\v{x} - \v{x} = {-\v{x}} + \v{x} = 0$. In this case, it is easy to show that for all $a,b \in \set{F}$ and $\v{x},\v{y} \in \set{F}^n$, % \begin{itemize} \item $(\set{F}^n,{+})$ is a commutative group \item $a( \v{x} + \v{y} ) = a\v{x} + a\v{y}$ \item $(a + b)\v{x} = a\v{x} + b\v{x}$ \item $a(b\v{x}) = (ab)\v{x}$ \item $1\v{x} = \v{x}$ \end{itemize} % Because of this, $\set{F}^n$ is a vector space over the field $\set{F}$. In particular, we call $\set{F}^n$ a \emph{coordinate (vector) space}. Elements of $\set{F}^n$ are thus called \emph{vectors} and elements of $\set{F}$ are thus called \emph{scalars}. We will typically use the coordinate vector space $\R^n$, which is called the \emph{real coordinate space}. Later, we will endow $\R^n$ with a particular notion of distance, in which $\R^n$ will become a \emph{Euclidean space}. \paragraph{Notation and the Covector Space:} Take $n \in \N$ and the coordinate vector space $\set{F}^n$ with operations defined above. Take a vector $\v{x} \in \set{F}^n$.
Rather than denoting $\v{x} = (x_1, x_2, x_3, \dots, x_n)$, use the \emph{matrix notation} % \begin{equation*} \v{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} \end{equation*} % That is, denote $\v{x}$ as a \emph{column} vector and call it an \emph{$n$-dimensional vector} or simply an \emph{$n$-vector}. Now call $\set{F}^{1 \times n}$ a vector space identical to $\set{F}^n$ except with elements represented as \emph{row} vectors that are otherwise called \emph{covectors}. That is, call $\set{F}^{1 \times n}$ a \emph{covector space}. Take covector $\v{y} \in \set{F}^{1 \times n}$. The covector $\v{y}$ is represented by % \begin{equation*} \v{y} = \begin{bmatrix} y_1 & y_2 & y_3 & \dots & y_n \end{bmatrix} \end{equation*} % Therefore, call $\v{y}$ an \emph{$n$-dimensional covector} or simply an \emph{$n$-covector}. Now define the \emph{transpose} operations ${\T}: \set{F}^n \mapsto \set{F}^{1 \times n}$ and ${\T}: \set{F}^{1 \times n} \mapsto \set{F}^n$ so that\symdef[]{Hvectors.3}{xtranspose}{$\v{x}^\T$}{the transpose of vector or covector $\v{x}$ (\ie, if $\v{x}$ is an $n$-vector then $\v{x} = [x_1, x_2, \dots, x_n]^\T)$} % \begin{equation*} \v{x}^\T = \begin{bmatrix} x_1 & x_2 & x_3 & \dots & x_n \end{bmatrix} \end{equation*} % and % \begin{equation*} \v{y}^\T = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix} \end{equation*} % Therefore, $\v{x} = [ x_1, x_2, x_3, \dots, x_n ]^\T$, and so this notation will often be used when it is more convenient to denote vectors horizontally. That is, all vectors are transposes of covectors and all covectors are transposes of vectors. Because of this duality, the vector space $\set{F}^n$ will sometimes be denoted $\set{F}^{n \times 1}$. These topics will be generalized in \longref{app:math_matrices}. \paragraph{Multiplication of Vectors and Covectors:} Take $n \in \N$ and the coordinate vector space $\set{F}^n$ and coordinate covector space $\set{F}^{1 \times n}$. 
The multiplication operators $\times: \set{F}^{1 \times n} \times \set{F}^n \mapsto \set{F}$ and $\times: \set{F}^n \times \set{F}^{1 \times n} \mapsto \set{F}$ (both denoted by juxtaposition) are defined such that for vectors $\v{x},\v{y} \in \set{F}^n$, % \begin{equation*} \v{x} \v{y}^\T \triangleq \v{x}^\T \v{y} \triangleq x_1 y_1 + x_2 y_2 + x_3 y_3 + \dots + x_n y_n = \sum\limits_{i=1}^n x_i y_i \end{equation*} % Note that % \begin{align*} \v{x}^\T \v{y} = \v{y}^\T \v{x} = \v{x} \v{y}^\T = \v{y} \v{x}^\T \end{align*} % and, of course, all of these products are scalars by definition. \subsection{Real Inner-Product Spaces} Take $n \in \N$ and real coordinate vector space $\R^n$. Define a bilinear function \symdef[]{Hvectors.4}{innerprod}{$\langle \v{x}, \v{y} \rangle$}{the inner product of vectors $\v{x}$ and $\v{y}$}$\langle \cdot , \cdot \rangle : \R^n \times \R^n \mapsto \R$. That is, for any scalars $a,b \in \R$ and any vectors $\v{x},\v{y},\v{z} \in \R^n$, % \begin{equation*} \langle a \v{x} + b \v{y}, \v{z} \rangle = a \langle \v{x}, \v{z} \rangle + b \langle \v{y}, \v{z} \rangle \end{equation*} % and % \begin{equation*} \langle \v{x}, a \v{y} + b \v{z} \rangle = a \langle \v{x}, \v{y} \rangle + b \langle \v{x}, \v{z} \rangle \end{equation*} % If, in addition, $\langle \v{x}, \v{y} \rangle = \langle \v{y}, \v{x} \rangle$ for all $\v{x},\v{y} \in \R^n$ (\ie, symmetry) and $\langle \v{x}, \v{x} \rangle \geq 0$ with equality if and only if $\v{x} = 0$ (\ie, positive definiteness), then $\langle \cdot, \cdot \rangle$ is called a \emph{real inner product}. This may sometimes be denoted with the \emph{Dirac inner-product notation} $\langle \cdot | \cdot \rangle$, which replaces the comma (\ie, $,$) with a vertical bar (\ie, $|$). If $\R^n$ is endowed with a real inner product then it is called a \emph{real inner-product space}.
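As a concrete, purely illustrative check of the first bilinearity condition, the following Python fragment uses the componentwise product $\sum_i x_i y_i$ on $\R^3$ (the standard choice of inner product) with arbitrarily chosen vectors and scalars.

```python
# Numerical check of bilinearity in the first argument:
#   <a*x + b*y, z> = a<x, z> + b<y, z>
# Vectors are plain Python tuples over floating-point reals.

def inner(x, y):
    # the componentwise sum-of-products on R^n
    return sum(a * b for a, b in zip(x, y))

def add(x, y):
    # componentwise vector addition
    return tuple(a + b for a, b in zip(x, y))

def mul(a, x):
    # scalar multiplication
    return tuple(a * c for c in x)

x, y, z = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)
a, b = 2.0, -3.0

lhs = inner(add(mul(a, x), mul(b, y)), z)
rhs = a * inner(x, z) + b * inner(y, z)
print(lhs == rhs)  # True: linearity in the first argument holds
```

Symmetry of this particular product makes linearity in the second argument follow immediately, so it is indeed a real inner product.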
\paragraph{Dot Product:} For $n \in \N$, on $\R^n$, the standard inner product is the \emph{dot product}, which is defined for any $\v{x},\v{y} \in \R^n$ by % \begin{align*} \langle \v{x}, \v{y} \rangle &\triangleq x_1 y_1 + x_2 y_2 + x_3 y_3 + \dots + x_n y_n\\ &= \sum\limits_{i=1}^n x_i y_i\\ &= \v{x}^\T \v{y} = \v{x} \v{y}^\T = \v{y}^\T \v{x} = \v{y} \v{x}^\T \end{align*} % That is, the dot product is simply the multiplication of one vector by the transpose of the other and results in a scalar. This motivates the \emph{Dirac inner-product notation} which defines $\langle \v{x} |$ and $| \v{y} \rangle$ by % \begin{align*} \langle \v{x} | &\triangleq \v{x}^\T\\ | \v{y} \rangle &\triangleq \v{y} \end{align*} % The inner product denoted $\langle \v{x} | \v{y} \rangle$ is defined by % \begin{equation*} \langle \v{x} | \v{y} \rangle \triangleq \langle \v{x} | | \v{y} \rangle = \v{x}^\T \v{y} \end{equation*} % Thus, it is natural to use this notation when using the dot product as an inner product. \subsection{Normed Vector Spaces} \label{app:math_normed_vector_spaces} Take $n \in \N$ and coordinate vector space $\set{F}^n$. However, assume that $\set{F}$ is a subfield of $\R$. Therefore, $\set{F}$ is an ordered field with an absolute value defined. This definition can be expanded so that $\set{F}$ can be a subfield of the \emph{complex numbers}, which we do not discuss, since the complex numbers also have an absolute value function. However, we restrict our definition to the real case. 
\symdef[]{Hvectors.401}{realnorm}{$\ppipe \v{x} \ppipe$}{the norm of vector $\v{x}$}Define a function $\|\cdot\| : \set{F}^n \mapsto \R$ such that for all $a \in \set{F}$ and all $\v{x},\v{y} \in \set{F}^n$, % \begin{enumerate}[(i)] \item $\| a \v{x} \| = |a| \| \v{x} \|$ \item $\| \v{x} + \v{y} \| \leq \|\v{x}\|+\|\v{y}\|$ \item $\| \v{x} \| = 0$ if and only if $\v{x} = 0$ \end{enumerate} % In this case, $\|\cdot\|$ is called a \emph{(real) norm} and $\set{F}^n$ is called a \emph{normed vector space}. Note that $\|\v{x}\| \geq 0$ for all $\v{x} \in \set{F}^n$ with equality only when $\v{x}=0$. In \longref{app:math_euclidean_space}, we will demonstrate how norms can be defined on $\R^n$, a frequently used coordinate vector space. \paragraph{Norms as Metrics:} For a normed vector space $\set{F}^n$, define $d: \set{F}^n \times \set{F}^n \mapsto \R$ such that for any $\v{x},\v{y} \in \set{F}^n$, % \begin{equation*} d(\v{x},\v{y}) = \| \v{x} - \v{y} \| \end{equation*} % It can be shown that $d$ is a metric. Therefore, any normed vector space is a metric space. In fact, since any inner-product space is a normed vector space, any inner-product space is a metric space. \paragraph{Metrics as Norms and Boundedness:} Take a vector space $\set{V}$ over a field $\set{F}$ that is a subfield of $\R$, and take a metric $d$ on $\set{V}$ (\ie, $(\set{V},d)$ is a metric space). Assume that the metric $d$ satisfies the additional properties % \begin{enumerate}[(i)] \item $d(\v{x},\v{y}) = d(\v{x}+\v{z},\v{y}+\v{z})$ \item $d(a\v{x},a\v{y}) = |a|d(\v{x},\v{y})$ \label{item:homogenous_metric} \end{enumerate} % for all $\v{x},\v{y},\v{z} \in \set{V}$ and all $a \in \set{F}$. Now, for all $\v{x} \in \set{V}$, define $\|\v{x}\|$ by % \begin{equation*} \|\v{x}\| \triangleq d(\v{x},0) \end{equation*} % where $0$ is the additive identity for $(\set{V},{+})$. It can be shown that this $\|\cdot\|$ satisfies the conditions required for being a norm.
That is, under these restrictions on a metric space, the distance from zero can be considered to be the norm (\ie, the length) of a vector. Finally, note the following two remarks. % \begin{itemize} \item This definition of a norm induced by a metric can be weakened so that $\set{F}$ may also be a subfield of the \emph{complex numbers}, which are not ordered. However, we do not introduce the complex numbers here and so we force $\set{F}$ to be a subfield of $\R$. \item Metric spaces with this quality are metric spaces in which \emph{bounded} with respect to order is equivalent to \emph{bounded} with respect to metric. That is, the norm induced by this metric gives the distance away from zero; however, the notion of order can be viewed as a distance away from zero as well. Thus, being bounded in one sense is equivalent to being bounded in the other sense. \end{itemize} % Therefore, many metric spaces are also normed vector spaces. \subsection{The Euclidean Space} \label{app:math_euclidean_space} We will frequently use the \emph{Euclidean space} $\R^n$, which is defined to be a vector space equipped with a special inner product, norm, and metric. In fact, what is special about the Euclidean space is how distances are defined. This space captures the familiar notions of distance. Before defining this space precisely, we must show how norms, inner products, and metrics are related (on $\R^n$). \paragraph{Norm Induced by Inner-Product:} Take the real inner-product space $\R^n$. By the properties of the inner product, a norm $\|\cdot\|$ can be defined so that for any $\v{x} \in \R^n$, % \begin{equation*} \|\v{x}\| \triangleq \sqrt{\langle \v{x}, \v{x} \rangle} \end{equation*} % and, of course, % \begin{equation*} \|\v{x}\|^2 = \langle \v{x}, \v{x} \rangle \end{equation*} % This is known as the norm \emph{induced by} the inner product. It can be shown that for all $\v{x} \in \R^n$, $\|\v{x}\| \geq 0$ where $\|\v{x}\| > 0$ if and only if $\v{x} \neq 0$.
Thus, every inner-product space is also a normed space. \paragraph{2-Norm Induced by Dot Product:} Take the real inner-product space $\R^n$ where the inner product is taken to be the dot product. \symdef[]{Hvectors.402}{2norm}{$\ppipe \v{x} \ppipe_2$}{the Euclidean norm of vector $\v{x}$ (\ie, the norm induced by the dot product)}In other words, for all $\v{x},\v{y} \in \R^n$, % \begin{equation*} \langle \v{x}, \v{y} \rangle = \v{x}^\T \v{y} \end{equation*} % The \emph{$2$-norm} or the \emph{Euclidean norm}, denoted $\|\cdot\|_2$, is the norm induced by this inner product. That is, for $\v{x} \in \R^n$, % \begin{equation*} \|\v{x}\|_2 \triangleq \sqrt{ \v{x}^\T \v{x} } = \sqrt{ x_1^2 + x_2^2 + x_3^2 + \dots + x_n^2 } \end{equation*} % and, of course, % \begin{equation*} \|\v{x}\|_2^2 = \v{x}^\T \v{x} = x_1^2 + x_2^2 + x_3^2 + \dots + x_n^2 \end{equation*} \paragraph{The Euclidean Metric:} Take the real inner-product space $\R^n$ with the dot product and the Euclidean norm (\ie, the $2$-norm). The \emph{Euclidean metric} $d: \R^n \times \R^n \mapsto \R$ is defined so that for any $\v{x},\v{y} \in \R^n$, % \begin{align*} d( \v{x}, \v{y} ) &\triangleq \| \v{x} - \v{y} \|_2\\ &= \sqrt{ ( \v{x} - \v{y} )^\T ( \v{x} - \v{y} ) }\\ &= \sqrt{ (x_1-y_1)^2 + (x_2-y_2)^2 + (x_3-y_3)^2 + \cdots + (x_n-y_n)^2 } \end{align*} % and so % \begin{equation*} d( \v{x}, \v{y} )^2 = (x_1-y_1)^2 + (x_2-y_2)^2 + (x_3-y_3)^2 + \cdots + (x_n-y_n)^2 \end{equation*} % This is a very familiar distance function. Note that % \begin{itemize} \item $d(\v{x},\v{y}) = d(\v{x}+\v{z},\v{y}+\v{z})$ \item $d(a\v{x},a\v{y}) = |a|d(\v{x},\v{y})$ \label{item:homogenous_euclidean_metric} \end{itemize} % for all $\v{x},\v{y},\v{z} \in \R^n$ and $a \in \R$. By the discussion in \longref{app:math_normed_vector_spaces}, $\R^n$ is a special metric space where the term \emph{bounded} with respect to order is equivalent to the term \emph{bounded} with respect to metric. 
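The Euclidean norm and metric defined above can be computed directly from their defining sums. A short Python sketch (the helper names are ours, not from the text) also checks the translation invariance and absolute homogeneity properties just noted:

```python
import math

# Sketch: the Euclidean (2-)norm and the Euclidean metric on R^n.
def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def dist(x, y):
    return norm2([xi - yi for xi, yi in zip(x, y)])

x, y = [3.0, 4.0], [0.0, 0.0]
print(norm2(x))    # sqrt(3^2 + 4^2) = 5.0
print(dist(x, y))  # distance from (3, 4) to the origin = 5.0

# translation invariance: d(x, y) = d(x+z, y+z) with z = (1, 1)
assert dist(x, y) == dist([4.0, 5.0], [1.0, 1.0])
# absolute homogeneity: d(a x, a y) = |a| d(x, y)
a = -2.0
assert dist([a * xi for xi in x], [a * yi for yi in y]) == abs(a) * dist(x, y)
```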
\paragraph{The Euclidean Space:} Take $n \in \N$ and real coordinate space $\R^n$. Equip $\R^n$ with the dot product, the Euclidean norm (\ie, the $2$-norm), and the Euclidean metric. Of course, $\R^n$ is a metric space and thus a Hausdorff topological space. Additionally, under these conditions, \symdef[]{Bnumbers.545}{euclideanspace}{$\R^n$}{the Euclidean $n$-space} is called the \emph{Euclidean space} or the \emph{Euclidean $n$-space}. \paragraph{The Euclidean Topology and Compact Sets:} The set of sets $\setset{T} \subseteq \Pow(\R^n)$ defined by % \begin{equation*} \setset{T} \triangleq \{ \set{X}_1 \times \set{X}_2 \times \cdots \times \set{X}_n : \set{X}_i \text{ is an open set in } \R \} \end{equation*} % is a base that generates the standard topology (\ie, the open sets) for $\R^n$; that is, a subset of $\R^n$ is open exactly when it is a union of sets from $\setset{T}$, and $\R^n$ with this collection of open sets is a topological space. Of course, we define open sets in $\R$ using the standard metric for $\R$ (\ie, $d(x,y)=|x-y|$ for all $x,y \in \R$). The topology generated by $\setset{T}$ is called the \emph{box topology} for reasons involving a geometric interpretation of the shape of its basic open sets. Because $n$ is a finite number, it will also be called a \emph{product topology} for reasons outside the scope of this document. A subset $\set{X} \subseteq \R^n$ is compact if and only if it is closed and bounded (this is the Heine-Borel theorem), using the standard definitions of a closed set and a bounded set in a metric space (though, in this space, boundedness in the sense of order is also applicable). \subsection{Matrices} \label{app:math_matrices} Take $n,m \in \N$, the vector space $\set{F}^n$, and the $m$-tuple of $\set{F}^n$ $n$-vectors $(\v{x}^1,\v{x}^2,\v{x}^3,\dots,\v{x}^m)$.
Recall that % \begin{equation*} \v{x}^1 = \begin{bmatrix} x_1^1 \\ x_2^1 \\ x_3^1 \\ \vdots\\ x_n^1 \end{bmatrix} \end{equation*} % Collect each of the $m$ vectors into an \emph{$n$-by-$m$ matrix} $\mat{X}$ with $n$ rows and $m$ columns so that % \begin{equation*} \mat{X} \triangleq \begin{bmatrix} x_1^1 & x_1^2 & x_1^3 & \cdots & x_1^m \\ x_2^1 & x_2^2 & x_2^3 & \cdots & x_2^m \\ x_3^1 & x_3^2 & x_3^3 & \cdots & x_3^m \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_n^1 & x_n^2 & x_n^3 & \cdots & x_n^m \end{bmatrix} \end{equation*} % which can be written more compactly as % \begin{equation*} \mat{X} = \begin{bmatrix} \v{x}^1 & \v{x}^2 & \v{x}^3 & \cdots & \v{x}^m \end{bmatrix} \end{equation*} % \symdef[]{Bnumbers.5451}{realmatrices}{$\R^{n \times m}$}{space of $n$-by-$m$ real matrices}All $n$-by-$m$ matrices are said to be elements of the $\set{F}^{n \times m}$ space. Matrices from a space of the form $\set{F}^{n \times n}$ (\ie, $n=m$) are said to be \emph{square matrices}. Now take the covector space $\set{F}^{1 \times n}$ and the $m$-tuple of $\set{F}^{1 \times n}$ $n$-covectors $(\v{y}^1,\v{y}^2,\v{y}^3,\dots,\v{y}^m)$. Collect each of the $m$ covectors into an \emph{$m$-by-$n$ matrix} $\mat{Y}$ with $m$ rows and $n$ columns so that % \begin{equation*} \mat{Y} \triangleq \begin{bmatrix} y_1^1 & y_2^1 & y_3^1 & \cdots & y_n^1 \\ y_1^2 & y_2^2 & y_3^2 & \cdots & y_n^2 \\ y_1^3 & y_2^3 & y_3^3 & \cdots & y_n^3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_1^m & y_2^m & y_3^m & \cdots & y_n^m \end{bmatrix} \end{equation*} % which can be written more compactly as % \begin{equation*} \mat{Y} = \begin{bmatrix} \v{y}^1 \\ \v{y}^2 \\ \v{y}^3 \\ \vdots \\ \v{y}^m \end{bmatrix} \end{equation*} % All $m$-by-$n$ matrices are said to be elements of the $\set{F}^{m \times n}$ space. 
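The column-by-column construction above is easy to mirror computationally. In the following Python sketch (a representation choice of ours, not from the text), a matrix is stored simply as its list of columns:

```python
# Sketch: a 3-by-2 matrix assembled from two column vectors of R^3,
# stored column-by-column as in the construction above.
x1 = [1, 2, 3]        # first column
x2 = [4, 5, 6]        # second column
X_cols = [x1, x2]     # the matrix X = [x1 x2], kept as its list of columns

# the entry in row i, column j (0-indexed) is X_cols[j][i]
print(X_cols[1][0])   # row 1, column 2 of X, i.e. 4
n, m = len(X_cols[0]), len(X_cols)
print((n, m))         # (3, 2): X is an element of F^{3 x 2}
```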
\symdef[]{Hvectors.31}{mattranspose}{$\mat{A}^\T$}{the transpose of matrix $\mat{A}$}Now define the transpose operators $\T: \set{F}^{n \times m} \mapsto \set{F}^{m \times n}$ and $\T: \set{F}^{m \times n} \mapsto \set{F}^{n \times m}$ such that % \begin{equation*} \mat{X}^\T = \begin{bmatrix} {\v{x}^1}^\T \\ {\v{x}^2}^\T \\ {\v{x}^3}^\T \\ \vdots \\ {\v{x}^m}^\T \end{bmatrix} = \begin{bmatrix} x_1^1 & x_2^1 & x_3^1 & \cdots & x_n^1 \\ x_1^2 & x_2^2 & x_3^2 & \cdots & x_n^2 \\ x_1^3 & x_2^3 & x_3^3 & \cdots & x_n^3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_1^m & x_2^m & x_3^m & \cdots & x_n^m \end{bmatrix} \end{equation*} % and % \begin{equation*} \mat{Y}^\T = \begin{bmatrix} {\v{y}^1}^\T & {\v{y}^2}^\T & {\v{y}^3}^\T & \cdots & {\v{y}^m}^\T \end{bmatrix} = \begin{bmatrix} y_1^1 & y_1^2 & y_1^3 & \cdots & y_1^m \\ y_2^1 & y_2^2 & y_2^3 & \cdots & y_2^m \\ y_3^1 & y_3^2 & y_3^3 & \cdots & y_3^m \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_n^1 & y_n^2 & y_n^3 & \cdots & y_n^m \end{bmatrix} \end{equation*} \paragraph{Matrix Addition:} Take $m,n \in \N$. Take matrices $\mat{X},\mat{Y} \in \set{F}^{n \times m}$ denoted to make their columns explicit. 
That is, % \begin{equation*} \mat{X} = \begin{bmatrix} \v{X}^1 & \v{X}^2 & \v{X}^3 & \cdots & \v{X}^m \end{bmatrix} \quad \text{ and } \quad \mat{Y} = \begin{bmatrix} \v{Y}^1 & \v{Y}^2 & \v{Y}^3 & \cdots & \v{Y}^m \end{bmatrix} \end{equation*} % Now, define the matrix addition operator ${+}: \set{F}^{n \times m} \times \set{F}^{n \times m} \mapsto \set{F}^{n \times m}$ so that % \begin{equation*} \mat{X} + \mat{Y} \triangleq \begin{bmatrix} (\v{X}^1 + \v{Y}^1) & (\v{X}^2 + \v{Y}^2) & (\v{X}^3 + \v{Y}^3) & \cdots & (\v{X}^m + \v{Y}^m) \end{bmatrix} \end{equation*} % Equivalently, define the matrix addition operator ${+}: \set{F}^{m \times n} \times \set{F}^{m \times n} \mapsto \set{F}^{m \times n}$ so that, for the transposes $\mat{X}^\T, \mat{Y}^\T \in \set{F}^{m \times n}$ of the matrices above, % \begin{equation*} \mat{X}^\T + \mat{Y}^\T \triangleq \begin{bmatrix} (\v{X}^1 + \v{Y}^1)^\T \\ (\v{X}^2 + \v{Y}^2)^\T \\ (\v{X}^3 + \v{Y}^3)^\T \\ \vdots \\ (\v{X}^m + \v{Y}^m)^\T \end{bmatrix} \end{equation*} % Note that matrix addition is commutative and associative. \paragraph{Scalar (Matrix) Multiplication:} Take $m,n \in \N$. Take matrix $\mat{X} \in \set{F}^{n \times m}$ denoted to make its columns explicit.
That is, % \begin{equation*} \mat{X} = \begin{bmatrix} \v{X}^1 & \v{X}^2 & \v{X}^3 & \cdots & \v{X}^m \end{bmatrix} \end{equation*} % Now, define the scalar multiplication operator ${\times}: \set{F} \times \set{F}^{n \times m} \mapsto \set{F}^{n \times m}$ so that for $a \in \set{F}$, % \begin{equation*} a \mat{X} \triangleq \begin{bmatrix} a \v{X}^1 & a \v{X}^2 & a \v{X}^3 & \cdots & a \v{X}^m \end{bmatrix} \end{equation*} % Equivalently, define the scalar multiplication operator ${\times}: \set{F} \times \set{F}^{m \times n} \mapsto \set{F}^{m \times n}$ so that for $a \in \set{F}$, % \begin{equation*} a \mat{X}^\T \triangleq \begin{bmatrix} ( a \v{X}^1 )^\T \\ ( a \v{X}^2 )^\T \\ ( a \v{X}^3 )^\T \\ \vdots \\ ( a \v{X}^m )^\T \end{bmatrix} \end{equation*} % In other words, a scalar multiplied by a matrix will multiply each scalar element of the matrix by the scalar. \paragraph{Matrix Multiplication:} Take $k,m,n \in \N$. Take matrices $\mat{X} \in \set{F}^{m \times k}$ and $\mat{Y} \in \set{F}^{m \times n}$ denoted to make their columns explicit. That is, % \begin{equation*} \mat{X} = \begin{bmatrix} \v{X}^1 & \v{X}^2 & \v{X}^3 & \cdots & \v{X}^k \end{bmatrix} \quad \text{ and } \quad \mat{Y} = \begin{bmatrix} \v{Y}^1 & \v{Y}^2 & \v{Y}^3 & \cdots & \v{Y}^n \end{bmatrix} \end{equation*} % Note that $\mat{X}^\T \in \set{F}^{k \times m}$. 
Now, define the matrix multiplication operator ${\times}: \set{F}^{k \times m} \times \set{F}^{m \times n} \mapsto \set{F}^{k \times n}$ (using juxtaposition notation) so that % \begin{equation*} \mat{X}^\T \mat{Y} \triangleq \begin{bmatrix} {\v{X}^1}^\T \v{Y}^1 & {\v{X}^1}^\T \v{Y}^2 & {\v{X}^1}^\T \v{Y}^3 & \cdots & {\v{X}^1}^\T \v{Y}^n \\ {\v{X}^2}^\T \v{Y}^1 & {\v{X}^2}^\T \v{Y}^2 & {\v{X}^2}^\T \v{Y}^3 & \cdots & {\v{X}^2}^\T \v{Y}^n \\ {\v{X}^3}^\T \v{Y}^1 & {\v{X}^3}^\T \v{Y}^2 & {\v{X}^3}^\T \v{Y}^3 & \cdots & {\v{X}^3}^\T \v{Y}^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ {\v{X}^k}^\T \v{Y}^1 & {\v{X}^k}^\T \v{Y}^2 & {\v{X}^k}^\T \v{Y}^3 & \cdots & {\v{X}^k}^\T \v{Y}^n \end{bmatrix} \end{equation*} % Equivalently, define the matrix multiplication operator ${\times}: \set{F}^{n \times m} \times \set{F}^{m \times k} \mapsto \set{F}^{n \times k}$ (using juxtaposition notation) so that % \begin{equation*} \mat{Y}^\T \mat{X} \triangleq \begin{bmatrix} {\v{Y}^1}^\T \v{X}^1 & {\v{Y}^1}^\T \v{X}^2 & {\v{Y}^1}^\T \v{X}^3 & \cdots & {\v{Y}^1}^\T \v{X}^k \\ {\v{Y}^2}^\T \v{X}^1 & {\v{Y}^2}^\T \v{X}^2 & {\v{Y}^2}^\T \v{X}^3 & \cdots & {\v{Y}^2}^\T \v{X}^k \\ {\v{Y}^3}^\T \v{X}^1 & {\v{Y}^3}^\T \v{X}^2 & {\v{Y}^3}^\T \v{X}^3 & \cdots & {\v{Y}^3}^\T \v{X}^k \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ {\v{Y}^n}^\T \v{X}^1 & {\v{Y}^n}^\T \v{X}^2 & {\v{Y}^n}^\T \v{X}^3 & \cdots & {\v{Y}^n}^\T \v{X}^k \end{bmatrix} \end{equation*} % Note that while $\mat{X}^\T \mat{Y}$ and $\mat{Y}^\T \mat{X}$ are defined, all of % \begin{equation*} \mat{X} \mat{X} \qquad \mat{Y} \mat{Y} \qquad \mat{X}^\T \mat{X}^\T \qquad \mat{Y}^\T \mat{Y}^\T \qquad \mat{X} \mat{Y} \qquad \mat{Y} \mat{X} \qquad \mat{X} \mat{Y}^\T \qquad \mat{Y} \mat{X}^\T \end{equation*} % are not defined in general. However, if $k=n$ then % \begin{equation*} \mat{X} \mat{Y}^\T \qquad \mat{X}^\T \mat{Y} \end{equation*} % are always defined. 
However, the former results in an $m$-by-$m$ matrix and the latter results in a $k$-by-$k$ (\ie, $n$-by-$n$) matrix, and so clearly $\mat{X} \mat{Y}^\T \neq \mat{X}^\T \mat{Y}$ if $m \neq n$. In fact, this comparison is nonsense unless $k=m=n$. In that case, $\mat{X}$ and $\mat{Y}$ are \emph{square matrices}. If $\mat{X}$ and $\mat{Y}$ are $n$-by-$n$ square matrices, then so are $\mat{X}^\T$ and $\mat{Y}^\T$, and so any combination of $\mat{X}$, $\mat{Y}$, $\mat{X}^\T$, and $\mat{Y}^\T$ can be multiplied in any order. \subsection{Square Matrices} For a field $(\set{F},+,\times,0,1)$ and $n \in \N$, the square matrices in the space $\set{F}^{n \times n}$ have some special properties. \paragraph{Square Matrix Multiplication:} Take $n \in \N$ and matrices $\mat{X},\mat{Y} \in \set{F}^{n \times n}$. All of $\mat{X}$, $\mat{Y}$, $\mat{X}^\T$, and $\mat{Y}^\T$ are square $n$-by-$n$ matrices, and therefore % \begin{equation*} \mat{X} \mat{X} \qquad \mat{Y} \mat{Y} \qquad \mat{X}^\T \mat{X}^\T \qquad \mat{Y}^\T \mat{Y}^\T \qquad \mat{X} \mat{Y} \qquad \mat{Y} \mat{X} \qquad \mat{X} \mat{Y}^\T \qquad \mat{Y} \mat{X}^\T \end{equation*} % are all defined. However, in general $\mat{X} \mat{Y} \neq \mat{Y} \mat{X}$. In other words, matrix multiplication is \emph{not} commutative. However, it can be shown that matrix multiplication is associative. \paragraph{Square Matrix Identity:} Take $n \in \N$ and a vector space $\set{F}^n$. Recall the definitions of $\v{e}_i$ for all $i \in \N$ from \longref{app:math_coord_vector_space}.
Define the \emph{identity matrix} $\I_n \in \set{F}^{n \times n}$ as % \begin{equation*} \I_n \triangleq \begin{bmatrix} \v{e}_1 & \v{e}_2 & \v{e}_3 & \cdots & \v{e}_n \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} \end{equation*} % Now take $m \in \N$ and matrices $\mat{X} \in \set{F}^{m \times n}$ and $\mat{Y} \in \set{F}^{n \times m}$. It can easily be verified that % \begin{equation*} \mat{X} \I_n = \mat{X} \quad \text{ and } \quad \I_n \mat{Y} = \mat{Y} \end{equation*} % In fact, for a square matrix $\mat{Z} \in \set{F}^{n \times n}$, % \begin{equation*} \mat{Z} \I_n = \I_n \mat{Z} = \mat{Z} \end{equation*} % and thus $\I_n$ is known as the \emph{identity matrix} for $n$-by-$n$ square matrices. Note that the notation \symdef{Hvectors.45}{identitymatrix}{$\I$}{the identity matrix} will often be used instead of $\I_n$ because the value of $n$ will usually be obvious in the context. \paragraph{Square Matrices as Unitary Associative Algebra:} Clearly, with the operations defined for matrices, for any $n \in \N$, the space $\set{F}^{n \times n}$ not only forms a vector space over the field $\set{F}$ but also forms a unitary associative algebra over the field $\set{F}$ (\ie, a unitary associative $\set{F}$-algebra). This means that many aspects of familiar arithmetic can be easily applied to matrices. \subsection{Matrices as Vector Functions} Take $m,n \in \N$. Note that the vector space $\set{F}^n$ can be viewed as a space of $n$-by-$1$ matrices and the covector space $\set{F}^{1 \times n}$ can be viewed as a space of $1$-by-$n$ matrices. Therefore, take a matrix $\mat{A} \in \set{F}^{m \times n}$ and vector $\v{x} \in \set{F}^n$ and covector $\v{y} \in \set{F}^{1 \times n}$.
There exists vector $\v{q} \in \set{F}^m$ and covector $\v{r} \in \set{F}^{1 \times m}$ such that % \begin{equation*} \v{q} = \mat{A} \v{x} \quad \text{ and } \quad \v{r} = \v{y} \mat{A} \end{equation*} % In other words, the matrix $\mat{A}$ can be viewed as a function that translates one vector into another. \paragraph{Square Matrices as Functions:} Take $n \in \N$, vector space $\set{F}^n$, and unitary associative algebra $\set{F}^{n \times n}$. Take a matrix $\mat{A} \in \set{F}^{n \times n}$. Take vector $\v{x} \in \set{F}^n$ and covector $\v{y} \in \set{F}^{1 \times n}$. There exists a vector $\v{q} \in \set{F}^n$ and a covector $\v{r} \in \set{F}^{1 \times n}$ such that % \begin{equation*} \v{q} = \mat{A} \v{x} \quad \text{ and } \quad \v{r} = \v{y} \mat{A} \end{equation*} % Therefore, the matrix $\mat{A}$ can be thought of as a function that maps vectors from $\set{F}^n$ (or covectors from $\set{F}^{1 \times n}$) to other vectors in $\set{F}^n$ (or other covectors from $\set{F}^{1 \times n}$). Additionally, there exists a scalar $a \in \set{F}$ such that % \begin{equation*} a = \v{x}^\T \mat{A} \v{x} \end{equation*} % This is known as a \emph{quadratic form}. Thus, the square matrix $\mat{A}$ can also be thought of as a function that converts vectors from $\set{F}^n$ to scalars from $\set{F}$. \subsection{The Unitary Associative Real Algebra} Take $n \in \N$. Clearly, \symdef{Bnumbers.5452}{realalgebra}{$\R^{n \times n}$}{the unitary associative real algebra} is a unitary associative algebra over the field $\R$. \paragraph{Symmetric Matrices:} Take $n \in \N$ and matrix $\mat{A} \in \R^{n \times n}$. To say that $\mat{A}$ is a \emph{symmetric (real) matrix} means that $\mat{A} = \mat{A}^\T$.
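The view of a square matrix as a function on vectors, and the quadratic form $\v{x}^\T \mat{A} \v{x}$, can be sketched directly from the definitions. The Python helper names below are ours, and the matrix is stored as a list of rows:

```python
# Sketch: a square matrix acting as a function on vectors, plus the
# quadratic form x^T A x; matrices are stored as lists of rows.
def matvec(A, x):
    """q = A x: each entry is the inner product of a row of A with x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def quad_form(A, x):
    """The scalar x^T A x."""
    return sum(xi * qi for xi, qi in zip(x, matvec(A, x)))

A = [[2.0, 1.0],
     [1.0, 3.0]]        # symmetric: A equals its transpose
I = [[1.0, 0.0],
     [0.0, 1.0]]        # the 2-by-2 identity matrix
x = [1.0, 2.0]

print(matvec(I, x))     # the identity leaves x unchanged: [1.0, 2.0]
print(matvec(A, x))     # [2*1 + 1*2, 1*1 + 3*2] = [4.0, 7.0]
print(quad_form(A, x))  # 1*4 + 2*7 = 18.0
```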
\subsection{Vector Derivatives: Gradients and Hessians} For $n \in \N$ and $n$-dimensional vector $\v{x} \in \R^n$, the $n$-dimensional operator vector $\nabla_{\v{x}}$ is % \begin{equation*} \nabla_{\v{x}} \triangleq \begin{bmatrix} \frac{ \partial }{ \partial x_1 }\\ \frac{ \partial }{ \partial x_2 }\\ \vdots\\ \frac{ \partial }{ \partial x_n } \end{bmatrix} \end{equation*} % Take $n \in \N$ and function $f: \set{D} \mapsto \R$ where $\set{D} \subseteq \R^n$. The \symdef[\emph{gradient}]{Hvectors.5}{gradient}{$\nabla_{\v{x}} f(\v{x})$}{the gradient vector of function $f$ at $\v{x}$} of function $f$ at $\v{x}$ is % \begin{equation*} \nabla_{\v{x}} f(\v{x}) \triangleq \begin{bmatrix} \frac{ \partial f(\v{x}) }{ \partial x_1 }\\ \frac{ \partial f(\v{x}) }{ \partial x_2 }\\ \vdots\\ \frac{ \partial f(\v{x}) }{ \partial x_n } \end{bmatrix} \end{equation*} % That is, this is a vector of the $n$ first partial derivatives of function $f$. % \begin{itemize} \item If every partial derivative that makes up the gradient exists for all $\v{x} \in \set{D}$ then $f$ is said to be \emph{differentiable}. \item If every partial derivative that makes up the gradient exists and is continuous for all $\v{x} \in \set{D}$ then $f$ is said to be \emph{continuously differentiable}.
\end{itemize} % The $n$-by-$n$ operator matrix $\nabla^2_{\v{x}\v{x}} \triangleq \nabla_{\v{x}} \nabla^\T_{\v{x}}$, and the \symdef[\emph{Hessian}]{Hvectors.51}{hessian}{$\nabla^2_{\v{x}\v{x}} f(\v{x})$}{the Hessian matrix of function $f$ at point $\v{x}$} of function $f$ at $\v{x}$ is % \begin{equation*} \nabla^2_{\v{x}\v{x}} f(\v{x}) \triangleq \begin{bmatrix} \frac{ \partial^2 f(\v{x})}{ \partial x_1 \partial x_1 } & \frac{ \partial^2 f(\v{x})}{ \partial x_1 \partial x_2 } & \cdots & \frac{ \partial^2 f(\v{x})}{ \partial x_1 \partial x_n } \\ \frac{ \partial^2 f(\v{x})}{ \partial x_2 \partial x_1 } & \frac{ \partial^2 f(\v{x})}{ \partial x_2 \partial x_2 } & \cdots & \frac{ \partial^2 f(\v{x})}{ \partial x_2 \partial x_n } \\ \vdots & \vdots & \ddots & \vdots\\ \frac{ \partial^2 f(\v{x})}{ \partial x_n \partial x_1 } & \frac{ \partial^2 f(\v{x})}{ \partial x_n \partial x_2 } & \cdots & \frac{ \partial^2 f(\v{x})}{ \partial x_n \partial x_n } \end{bmatrix} \end{equation*} % That is, this is a matrix of the $n^2$ second partial derivatives of function $f$. Note that if $f$ is twice continuously differentiable then its Hessian matrix is symmetric (by the equality of mixed second partial derivatives). % \begin{itemize} \item If every partial derivative that makes up the Hessian exists for all $\v{x} \in \set{D}$ then $f$ is said to be \emph{twice differentiable}. \item If every partial derivative that makes up the Hessian exists and is continuous for all $\v{x} \in \set{D}$ then $f$ is said to be \emph{twice continuously differentiable}. \end{itemize} \subsection{Euclidean Convexity} \label{app:math_euclidean_convexity} Take $n \in \N$ and the Euclidean space $\R^n$. Also take a set $\set{X} \subseteq \R^n$. The set $\set{X}$ is said to be \emph{convex (over $\R^n$)} if % \begin{equation*} t \v{x} + (1-t) \v{y} \in \set{X} \end{equation*} % for all $\v{x},\v{y} \in \set{X}$ with $\v{x} \neq \v{y}$ and all $t \in (0,1)$. \paragraph{Convex Sets of Scalars:} Consider the Euclidean space $\R$ (\ie, $\R^n$ with $n=1$).
It is easy to show that all of the convex sets of $\R$ take the form % \begin{equation*} [a,b] \text{ or } (a,b] \text{ or } [a,b) \text{ or } (a,b) \end{equation*} % or % \begin{equation*} [a,\infty) \text{ or } (a,\infty) \text{ or } (-\infty,b] \text{ or } (-\infty,b) \end{equation*} % or % \begin{equation*} (-\infty,\infty) \end{equation*} % where $a,b \in \R$. Therefore, all intervals of $\R$ are convex sets. In fact, $\R$ is trivially a convex set. \paragraph{Cartesian Products of Convex Sets:} The Cartesian product of convex sets is convex. For example, for $a,b,c,d \in \R$, the set % \begin{equation*} [a,b] \times (c,\infty) \times (-\infty,d] \end{equation*} % is a convex subset of $\R^3$ because it is the Cartesian product of three convex sets of $\R$ (\ie, intervals of $\R$). Clearly, $\R^n$ is a convex set for all $n \in \N$ since $\R$ is trivially a convex set. \paragraph{Functions on Convex Sets:} Take $n \in \N$ and a function $f: \set{E} \mapsto \R$ where $\set{E} \subseteq \R^n$ is a convex set. Take some $\v{x}^* \in \set{E}$ and assume that there exists an $\varepsilon \in \R_{>0}$ such that % \begin{equation*} f(\v{x}^*) \leq f(\v{y}) \text{ for all } \v{y} \in \set{E} \setdiff \{\v{x}^*\} \text{ with } \| \v{x}^* - \v{y} \|_2 < \varepsilon \end{equation*} % In this case, $\v{x}^*$ is called a \emph{local minimum} of function $f$. If $f$ is differentiable and $\v{x}^*$ is a local minimum then % \begin{equation} ( \nabla_{\v{x}} f(\v{x}^*) )^\T (\v{x} - \v{x}^*) \geq 0 \label{eq:convex_function_if} \end{equation} % for all $\v{x} \in \set{E}$. Therefore, this is a \emph{necessary condition} for a point to be a local minimum of a differentiable function over a convex set. \paragraph{Convex Functions:} Take $n \in \N$ and a function $f: \set{D} \mapsto \R$ where $\set{D} \subseteq \R^n$. Also take a convex set $\set{E} \subseteq \set{D}$.
To say that the function $f$ is \emph{convex over (the convex set) $\set{E}$} means that % \begin{equation*} f(t \v{x} + (1-t) \v{y} ) \leq t f(\v{x}) + (1-t) f(\v{y}) \end{equation*} % for all $\v{x},\v{y} \in \set{E}$ with $\v{x} \neq \v{y}$ and all $t \in (0,1)$. To say that the function $f$ is \emph{strictly convex over (the convex set) $\set{E}$} means that % \begin{equation*} f(t \v{x} + (1-t) \v{y} ) < t f(\v{x}) + (1-t) f(\v{y}) \end{equation*} % for all $\v{x},\v{y} \in \set{E}$ with $\v{x} \neq \v{y}$ and all $t \in (0,1)$. Take function $f$ defined above to be convex over $\set{E}$. Note the following statements. % \begin{itemize} \item If $\set{D} = \set{E}$ then $f$ is simply called \emph{convex}. Similarly, when this is the case, all references to $\set{E}$ below may be omitted. Therefore, the restriction of $f$ to $\set{E}$ (\ie, $f|_\set{E}$) may be simply called convex. \item If $f^*$ is defined such that $f^*(\v{x}) \triangleq -f(\v{x})$ for all $\v{x} \in \set{D}$ then $f^*$ is called a \emph{concave} function over (the convex set) $\set{E}$. Take such $f^*$. If $f^*$ is also convex over $\set{E}$ then $f$ and $f^*$ must both be \emph{affine} functions over $\set{E}$. That is, there exists some $\v{a} \in \R^n$ and some $b \in \R$ such that % \begin{equation*} f(\v{x}) = \v{a}^\T \v{x} + b \quad \text{ and } \quad f^*(\v{x}) = -\v{a}^\T \v{x} - b \end{equation*} % for all $\v{x} \in \set{E}$. An affine function defined over a convex set is always both convex and concave over that convex set. \item Assume that there exists some $\varepsilon \in \R_{>0}$ and some $\v{x}^* \in \set{E}$ such that % \begin{equation*} f(\v{x}^*) \leq f(\v{y}) \text{ for all } \v{y} \in \set{E} \setdiff \{\v{x}^*\} \text{ with } \|\v{x}^*-\v{y}\|_2 < \varepsilon \end{equation*} % Take such an $\v{x}^*$. In this case, $\v{x}^*$ is called a \emph{local minimum} of function $f$.
However, since function $f$ is convex over $\set{E}$, it is the case that % \begin{equation*} f(\v{x}^*) \leq f(\v{y}) \text{ for all } \v{y} \in \set{E} \setdiff \{\v{x}^*\} \end{equation*} % In other words, $\v{x}^*$ can be called a \emph{global minimum} of function $f$ over $\set{E}$. \item Assume that there exists some $\varepsilon \in \R_{>0}$ and some $\v{x}^* \in \set{E}$ such that % \begin{equation*} f(\v{x}^*) < f(\v{y}) \text{ for all } \v{y} \in \set{E} \setdiff \{\v{x}^*\} \text{ with } \|\v{x}^*-\v{y}\|_2 < \varepsilon \end{equation*} % Take such a $\v{x}^*$. In this case, $\v{x}^*$ is called a \emph{strict local minimum} of function $f$. Clearly, $\v{x}^*$ is also a local minimum of function $f$. However, since function $f$ is convex over $\set{E}$, it is the case that % \begin{equation*} f(\v{x}^*) < f(\v{y}) \text{ for all } \v{y} \in \set{E} \setdiff \{\v{x}^*\} \end{equation*} % In other words, $\v{x}^*$ can be called a \emph{strict global minimum} of function $f$ over $\set{E}$ (and, of course, a global minimum of function $f$ over $\set{E}$ as well). \item The point $\v{x}^* \in \set{E}$ is a local minimum of $f$ over convex set $\set{E}$ if and only if % \begin{equation} ( \nabla_{\v{x}} f(\v{x}^*) )^\T (\v{x} - \v{x}^*) \geq 0 \label{eq:convex_function_iff} \end{equation} % for all $\v{x} \in \set{E}$. This condition is both necessary \emph{and sufficient} for a local minimum because $\set{E}$ is a convex set \emph{and} $f$ is a convex function over $\set{E}$. This condition is necessary for all functions defined over convex sets. However, it becomes sufficient when those functions are also convex. \item The point $\v{x}^* \in \interior(\set{E})$ is a local minimum of $f$ over convex set $\set{E}$ if and only if % \begin{equation*} \nabla_{\v{x}} f(\v{x}^*) = 0 \end{equation*} % This is equivalent to the condition in \longref{eq:convex_function_iff} when $\v{x}^*$ is an element of the \emph{interior} of convex set $\set{E}$. 
Again, note that this condition is both necessary \emph{and sufficient} for a point to be a local minimum of $f$ over convex set $\set{E}$. % \item If $f$ is strictly convex over $\set{E}$, not only is every local minimum a global minimum, but there can be at most one global minimum of $f$. Therefore, if a local minimum has been found, it must be the global minimum of function $f$. \end{itemize} \paragraph{Sufficiency Conditions for Convexity:} Take $n \in \N$ and a function $f: \set{D} \mapsto \R$ where $\set{D} \subseteq \R^n$. Also take a convex set $\set{E} \subseteq \set{D}$, and assume that $f$ is twice continuously differentiable. If it is the case that % \begin{equation*} \v{\Delta}^\T \nabla^2_{\v{x}\v{x}} f(\v{x}) \v{\Delta} \geq 0 \text{ for all } \v{\Delta} \in \R^n \text{ and } \v{x} \in \set{E} \end{equation*} % then $f$ is convex over $\set{E}$. Additionally, if it is the case that % \begin{equation*} \v{\Delta}^\T \nabla^2_{\v{x}\v{x}} f(\v{x}) \v{\Delta} > 0 \text{ for all } \v{\Delta} \in \R^n \setdiff \{0\} \text{ and } \v{x} \in \set{E} \end{equation*} % then $f$ is strictly convex over $\set{E}$. \section{Measure Theory and Integration} \label{app:math_measure} Measure theory provides a method for measuring the size of sets. Our treatment of measure theory will be relatively sparse; however, it is necessary in order to discuss probability, the subject of \longref{app:math_probability}. Our definitions are based on the ones given by \citet{Krantz01} and \citet{Rudin76}. \Citet{Halmos50} gives a more complete treatment. \subsection{Sigma Algebras} Take a set $\set{U}$ and a set of sets $\setset{S} \subseteq \Pow(\set{U})$ (\ie, $\setset{S}$ is a set of subsets of $\set{U}$). 
Assume that $\setset{S}$ is such that % \begin{enumerate}[(i)] \item $\setset{S} \neq \emptyset$ \label{item:sigma_nonempty} \item if $\set{X} \in \setset{S}$ then $\set{U} \setdiff \set{X} \in \setset{S}$ \label{item:sigma_closed_complement} \item for a sequence $(\set{X}_n)$ where $\set{X}_n \in \setset{S}$ for all $n \in \N$, $\bigcup \{ \set{X}_i : i \in \N \} \in \setset{S}$ \label{item:sigma_closed_countable_union} \end{enumerate} % Property (\shortref{item:sigma_nonempty}) states that $\setset{S}$ is nonempty. Property (\shortref{item:sigma_closed_complement}) states that $\setset{S}$ is closed under complements. Property (\shortref{item:sigma_closed_countable_union}) states that $\setset{S}$ is closed under countable unions. It is clear that % \begin{itemize} \item $\emptyset \in \setset{S}$ \item $\set{U} \in \setset{S}$ \item for a sequence $(\set{X}_n)$ where $\set{X}_n \in \setset{S}$ for all $n \in \N$, $\bigcap \{ \set{X}_i : i \in \N \} \in \setset{S}$ \item $(\setset{S}, {\cap}, {\cup}, {{}^c}, \set{U}, \emptyset)$ is an algebra of sets (\ie, $\setset{S}$ is an algebra over $\set{U}$ and so $(\set{U},\setset{S})$ is a field of sets) \end{itemize} % Thus, $\setset{S}$ is called a \emph{$\sigma$-algebra} (\ie, a \emph{sigma algebra}), and the field of sets $(\set{U},\setset{S})$ is called a \emph{$\sigma$-field} (\ie, a \emph{sigma field}). Of course, $(\set{U}, \Pow(\set{U}))$ is trivially a $\sigma$-field. In particular, take the finite set $\set{U} = \{ a,b,c,d \}$.
Some possible $\sigma$-algebras for $\set{U}$ include
%
\begin{itemize}
\item $\{\emptyset, \{ a, b, c, d \}\}$
\item $\{\emptyset, \{ a, b \}, \{ c, d \}, \{ a, b, c, d \}\}$
\item $\{\emptyset, \{ a, c \}, \{ b, d \}, \{ a, b, c, d \}\}$
\item $\{\emptyset, \{ a, d \}, \{ b, c \}, \{ a, b, c, d \}\}$
\item $\{\emptyset, \{ a \}, \{ b, c, d \}, \{ a, b, c, d \}\}$
\item $\{\emptyset, \{ a, b, c \}, \{ d \}, \{ a, b, c, d \}\}$
\end{itemize}
%
However, there are many more (in fact, there are $15$ total for this four-element set). All are closed under complements and countable unions. Because of this, any $\sigma$-algebra that includes all of the singleton sets is necessarily the power set. Of course, all include the empty set and the universal set.

\paragraph{Sigma Notation:} Very often $\sigma$-algebras will be denoted with the Greek uppercase letter $\Sigma$. It is not a coincidence that this is similar to the summation symbol $\sum$. Recall that all $\sigma$-algebras are closed under countable unions (\ie, property (\shortref{item:sigma_closed_countable_union}) above). Later, we will introduce a \emph{measure}, which is a function that maps sets from a $\sigma$-algebra to nonnegative extended real numbers. In other words, measures assign some notion of \emph{size} to sets. Specifically because $\sigma$-algebras are closed under countable unions, measures are \emph{countably additive}. That is, the union of a sequence of pairwise disjoint sets from a $\sigma$-algebra has a size equal to the sum of the sizes of the individual sets. This relationship to summation is the reason why $\sigma$-algebras are denoted with $\Sigma$; of course, this is also the reason why they are called \emph{sigma} algebras.

\subsection{The Borel Algebra}

Take a topological space $(\set{U},\setset{T})$. Recall that $\setset{T}$ is by definition the set of all of the open sets in the topological space.
Assume that there exists a $\sigma$-algebra $\setset{B}$ of $\set{U}$ such that
%
\begin{itemize}
\item $\setset{T} \subseteq \setset{B}$
\item for any $\sigma$-algebra $\setset{A}$ of $\set{U}$ such that $\setset{T} \subseteq \setset{A}$, $\setset{A} \cap \setset{B} = \setset{B}$
\end{itemize}
%
In other words, $\setset{B}$ is the smallest $\sigma$-algebra that includes all open sets of $\set{U}$. Because $\Pow(\set{U})$ is a $\sigma$-algebra of $\set{U}$ and $\setset{T} \subseteq \Pow(\set{U})$, such a $\setset{B}$ must exist. In this case, $\setset{B}$ is the \emph{Borel algebra} of $\set{U}$ and will be denoted \symdef{Iprob.3}{borelalgebra}{$\Borel(\set{U})$}{the Borel algebra of set $\set{U}$}; that is,
%
\begin{equation*}
\Borel(\set{U}) \triangleq \setset{B}
\end{equation*}
%
Any subset $\setset{S} \subseteq \Borel(\set{U})$ is called a \emph{Borel subset} and any set $\set{E} \in \Borel(\set{U})$ is called a \emph{Borel set}. Additionally, $(\set{U},\Borel(\set{U}))$ is called a \emph{Borel $\sigma$-field} or simply a \emph{Borel field}.

\paragraph{Generalized Construction of Borel Algebra:} Take a topological space $(\set{U},\setset{T})$. Define $\setset{B}_0 \triangleq \setset{T}$. That is, $\setset{B}_0$ is the set of all of the open sets of $\set{U}$. Now, define $\setset{B}_n$ for all $n \in \N$ such that
%
\begin{enumerate}[(i)]
\item $\setset{B}_{n-1} \subseteq \setset{B}_n$
\item for all $\set{B} \in \setset{B}_{n-1}$, $\set{U} \setdiff \set{B} \in \setset{B}_n$ (\ie, $\set{B}^c \in \setset{B}_n$)
\item for any countable $\setset{S} \subseteq \setset{B}_{n-1}$, $\bigcap \setset{S} \in \setset{B}_n$
\item for any countable $\setset{S} \subseteq \setset{B}_{n-1}$, $\bigcup \setset{S} \in \setset{B}_n$
\end{enumerate}
%
The Borel algebra $\Borel(\set{U})$ is the set that results from continuing this process \adinfinitum{}. In other words, $\Borel(\set{U})$ can be viewed as $\setset{B}_\infty$.
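On a finite universe the iterative construction above terminates after finitely many steps, so it can be explored directly in code. The following sketch (plain Python; all names are our own) closes a family of generator subsets of $\set{U} = \{a,b,c,d\}$ under complements and unions until a fixed point is reached, and then counts all $\sigma$-algebras on a four-element universe by brute force.

```python
# Finite illustration of the iterative construction (the true Borel
# construction is transfinite; on a finite universe the process stops).
def generated_algebra(universe, generators):
    family = {frozenset(g) for g in generators}
    while True:
        nxt = set(family)
        nxt.update(universe - s for s in family)            # complements
        nxt.update(s | t for s in family for t in family)   # pairwise unions
        if nxt == family:        # fixed point: closed under both operations
            return family
        family = nxt

U = frozenset('abcd')
B = generated_algebra(U, [{'a'}, {'b'}])
print(len(B))   # 8: generated by the atoms {a}, {b}, and {c, d}

# Brute-force count of all sigma-algebras on a four-element universe:
# encode each subset as a 4-bit integer, so a family is a subset of range(16).
FULL = 0b1111
count = 0
for mask in range(1, 1 << 16):                  # every nonempty family
    fam = {s for s in range(16) if mask >> s & 1}
    if all(FULL ^ s in fam for s in fam) and \
       all(s | t in fam for s in fam for t in fam):
        count += 1
print(count)    # 15
```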
It is the case that
%
\begin{itemize}
\item $\setset{T} \subseteq \Borel(\set{U})$
\item $\Borel(\set{U})$ is a $\sigma$-algebra of $\set{U}$ (\ie, $(\set{U},\Borel(\set{U}))$ is a $\sigma$-field)
\item for any $\sigma$-algebra $\setset{A}$ of $\set{U}$ such that $\setset{T} \subseteq \setset{A}$, $\Borel(\set{U}) \cap \setset{A} = \Borel(\set{U})$ (\ie, $\Borel(\set{U}) \subseteq \setset{A}$)
\end{itemize}
%
These are the traits desired to call $\Borel(\set{U})$ the Borel algebra of $\set{U}$.

\paragraph{Construction of Borel Algebra of the Extended Reals:} Take the extended real topological space $\extR$. Define $\setset{B}_0$ as the set of all intervals of the extended reals. That is,
%
\begin{equation*}
\setset{B}_0 \triangleq \setset{B}_{00} \cup \setset{B}_{01} \cup \setset{B}_{10} \cup \setset{B}_{11}
\end{equation*}
%
where
%
\begin{align*}
\setset{B}_{00} &\triangleq \{ (a,b) : a,b \in \extR, a \leq b \}\\
\setset{B}_{01} &\triangleq \{ (a,b] : a,b \in \extR, a \leq b \}\\
\setset{B}_{10} &\triangleq \{ [a,b) : a,b \in \extR, a \leq b \}\\
\setset{B}_{11} &\triangleq \{ [a,b] : a,b \in \extR, a \leq b \}
\end{align*}
%
That is, $\setset{B}_0$ is the set of all of the intervals of $\extR$. Now, define $\setset{B}_n$ for all $n \in \N$ such that
%
\begin{enumerate}[(i)]
\item $\setset{B}_{n-1} \subseteq \setset{B}_n$
\item for all $\set{B} \in \setset{B}_{n-1}$, $\extR \setdiff \set{B} \in \setset{B}_n$ (\ie, $\set{B}^c \in \setset{B}_n$)
\item for any countable $\setset{S} \subseteq \setset{B}_{n-1}$, $\bigcap \setset{S} \in \setset{B}_n$
\item for any countable $\setset{S} \subseteq \setset{B}_{n-1}$, $\bigcup \setset{S} \in \setset{B}_n$
\end{enumerate}
%
The Borel algebra $\Borel(\extR)$ is the set that results from continuing this process \adinfinitum{}. In other words, $\Borel(\extR)$ can be viewed as $\setset{B}_\infty$.
It is the case that
%
\begin{itemize}
\item $\setset{B}_0 \subseteq \Borel(\extR)$
\item $\Borel(\extR)$ is a $\sigma$-algebra of $\extR$ (\ie, $(\extR,\Borel(\extR))$ is a $\sigma$-field)
\item for any $\sigma$-algebra $\setset{A}$ of $\extR$ such that $\setset{B}_0 \subseteq \setset{A}$, $\Borel(\extR) \cap \setset{A} = \Borel(\extR)$ (\ie, $\Borel(\extR) \subseteq \setset{A}$)
\end{itemize}
%
These are the traits desired to call $\Borel(\extR)$ the Borel algebra of $\extR$. In other words, $\Borel(\extR)$ is the smallest $\sigma$-algebra of $\extR$ that includes all of the intervals of $\extR$.

\paragraph{Half-Line Construction of Extended Real Borel Algebra:} It is important to note that the Borel algebra of $\extR$ can also be said to be the smallest $\sigma$-algebra of $\extR$ that includes intervals of the form $[-\infty,a]$ where $a \in \extR$. Take $\set{R}$ to be the set of these \emph{half lines}; that is,
%
\begin{equation*}
\set{R} \triangleq \{ [-\infty,a] : a \in \extR \}
\end{equation*}
%
The Borel algebra $\Borel(\extR)$ is a $\sigma$-algebra such that $\set{R} \subseteq \Borel(\extR)$. In fact, $\Borel(\extR) \subseteq \setset{A}$ for any $\sigma$-algebra $\setset{A}$ of $\extR$ such that $\set{R} \subseteq \setset{A}$. Therefore, any Borel set $\set{E} \in \Borel(\extR)$ can be constructed with a countable number of unions, intersections, and complements of elements from $\set{R}$ (\ie, the half lines).

\subsection{Measures}

Take a $\sigma$-field $(\set{U},\Sigma)$. Define a function $\mu: \Sigma \mapsto [0,\infty]$, where interval $[0,\infty] \subset \extR$.
Assume that
%
\begin{enumerate}[(i)]
\item $\mu( \emptyset ) = 0$ \label{item:measure_zero}
\item for a sequence $(\set{X}_n)$ where $\set{X}_n \in \Sigma$ for all $n \in \N$ and $\set{X}_i \cap \set{X}_j = \emptyset$ for all $i,j \in \N$ with $i \neq j$, it is the case that
%
\begin{equation*}
\mu\left( \bigcup \{ \set{X}_i : i \in \N \} \right) = \sum\limits_{i=1}^\infty \mu\left( \set{X}_i \right)
\end{equation*}
\label{item:measure_countable_additivity}
\end{enumerate}
%
In this case,
%
\begin{itemize}
\item $\mu$ is called a \emph{measure}
\item $(\set{U},\Sigma,\mu)$ is a \emph{measure space}
\item any set $\set{X} \in \Sigma$ is called a \emph{measurable set}
\end{itemize}
%
Property (\shortref{item:measure_zero}) states that the empty set has \emph{measure zero}. Any set $\set{X} \in \Sigma$ such that $\mu(\set{X}) = 0$ is said to have \emph{measure zero} or is said to be a \emph{null set} or simply \emph{null}. Property (\shortref{item:measure_countable_additivity}) is called \emph{countable additivity}.

\paragraph{Singleton Notation:} Take $(\set{U},\Sigma,\mu)$ to be a measure space. Take a point $x \in \set{U}$ such that the singleton set $\{x\} \in \Sigma$. For simplicity, we will use the notation
%
\begin{equation*}
\mu(x) \triangleq \mu( \{x\} )
\end{equation*}
%
That is, the measure of a single point is defined to be the measure of the singleton set that includes that point.

\subsection{Measurable Functions}

Take $\sigma$-fields $(\set{U},\Sigma_\set{U})$ and $(\set{Y},\Sigma_\set{Y})$ and a function $f: \set{U} \mapsto \set{Y}$. To say that the function $f$ is \emph{measurable} means that for all $\set{B} \in \Sigma_\set{Y}$, the preimage $f^{-1}[\set{B}] \in \Sigma_\set{U}$. Measurable functions will typically have real codomains, and so they can be viewed as mapping sets into numbers so that the size of the domain sets can be measured with respect to some numerical measure.
That is, measurable functions combined with measures provide a way to quantify the size of measurable sets. This will be explained further in \longref{app:math_lebesgue_integral}.

\paragraph{Borel measurable:} Take $\sigma$-field $(\set{X},\Sigma_\set{X})$ and Borel field $(\set{Y},\Borel(\set{Y}))$ and a function $f: \set{X} \mapsto \set{Y}$. To say that the function $f$ is \emph{Borel measurable} means that for all $\set{B} \in \Borel(\set{Y})$, the preimage $f^{-1}[\set{B}] \in \Sigma_\set{X}$.

\paragraph{Real-valued measurable function:} When a function that is (extended) real-valued is said to be measurable, it is conventional to assume that Borel measurability is implied. That is, for $\sigma$-field $(\set{X},\Sigma_\set{X})$ and measurable function $f: \set{X} \mapsto \extR$, it is usually assumed (as it will be here) that the measurable sets of the codomain $\extR$ are the Borel sets of $\extR$ (\ie, $\Borel(\extR)$ is the applicable $\sigma$-algebra).

\paragraph{Almost Everywhere Equivalence:} Take measure space $(\set{X},\Sigma_\set{X},\mu)$ and $\sigma$-field $(\set{Y},\Sigma_\set{Y})$. Also take measurable functions $f: \set{X} \mapsto \set{Y}$ and $g: \set{X} \mapsto \set{Y}$ and a set $\set{E} \in \Sigma_\set{X}$. To say that $f$ and $g$ are equivalent \emph{almost everywhere on $\set{E}$} or \emph{essentially equivalent on $\set{E}$} means that $\mu( \{ x \in \set{E} : f(x) \neq g(x) \} ) = 0$. That is, two functions are equal almost everywhere if the (measurable) set on which they differ is null.

\subsection{The Lebesgue Integral}
\label{app:math_lebesgue_integral}

Given a measurable function from a measurable space into the Borel field of the reals, the Lebesgue integral provides a way of measuring the volume under the graph of the function over a measurable set.

\paragraph{Characteristic Function:} Take a set $\set{U}$ and subset $\set{X} \subseteq \set{U}$. Denote the \emph{characteristic function} or \emph{indicator function} for set $\set{X}$ by $K_\set{X}$.
Define $K_\set{X}: \set{U} \mapsto \R$ with
%
\begin{equation*}
K_\set{X}(x) \triangleq
\begin{cases}
1 &\text{if } x \in \set{X}\\
0 &\text{otherwise}
\end{cases}
\end{equation*}
%
for all $x \in \set{U}$.

\paragraph{Simple Functions:} Take $\sigma$-field $(\set{U},\Sigma)$ and a function $s: \set{U} \mapsto \R$. To say that $s$ is a simple function means that $\range(s)$ is a finite set of real numbers. Assume that $s$ is a simple function. Without loss of generality, assume that $n \in \N$ and $\range(s) = \{ c_1, c_2, c_3, \dots, c_n \}$ where $c_i \in \R$ for each $i \in \{1,2,3,\dots,n\}$. In this case, for each $i \in \{1,2,3,\dots,n\}$, define the set $\set{X}_i$ such that
%
\begin{equation*}
\set{X}_i \triangleq \{ x \in \set{U} : s(x) = c_i \}
\end{equation*}
%
Therefore, the function $s$ is
%
\begin{equation*}
s(x) = \sum\limits_{i=1}^n c_i K_{\set{X}_i}(x)
\end{equation*}
%
Every simple function can be written as the sum of a finite number of characteristic functions each multiplied by some real number. Now take $(\set{U},\Sigma,\mu)$ to be a measure space and take a set $\set{X} \in \Sigma$. Denote the \emph{(Lebesgue) integral} of simple function $s$ over $\set{X}$ with respect to measure $\mu$ as $\int_\set{X} s \total \mu$, which is defined by
%
\begin{equation*}
\int_\set{X} s \total \mu \triangleq \sum\limits_{i=1}^n c_i \mu( \set{X} \cap \set{X}_i )
\end{equation*}
%
Note that it will be consistent to define $0 \times \infty = \infty \times 0 = 0$ for the integral, as is commonly done in measure theory. Additionally, sometimes the notation
%
\begin{equation*}
\int_\set{X} s(x) \total \mu(x) \triangleq \int_\set{X} s \total \mu
\end{equation*}
%
will be used instead. There is little value to this notation here; however, it can be a helpful way to avoid confusion when functions of multiple variables are involved.
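As a concrete sketch, the integral of a simple function can be computed directly from this definition when $\mu$ is given by finitely many point masses. The Python names and data below are hypothetical:

```python
# A sketch of the simple-function integral, assuming a measure mu given by
# finitely many point masses on a finite universe.
def mu(E, w):
    # Measure of E induced by nonnegative point masses w.
    return sum(w.get(x, 0.0) for x in E)

def integrate_simple(s, X, w):
    # int_X s dmu = sum_i c_i * mu(X & X_i), where X_i = {x : s(x) = c_i}.
    total = 0.0
    for c in set(s.values()):                       # range(s) = {c_1, ..., c_n}
        X_c = {x for x, v in s.items() if v == c}   # the level set X_i
        total += c * mu(X & X_c, w)
    return total

s = {0: 2.0, 1: 2.0, 2: 5.0, 3: 0.0}   # simple function with range {0, 2, 5}
w = {0: 0.1, 1: 0.4, 2: 0.5, 3: 1.0}   # point masses defining mu
X = {0, 1, 2, 3}
print(integrate_simple(s, X, w))        # 2*(0.1 + 0.4) + 5*0.5 + 0*1.0 = 3.5
```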
\paragraph{The Integral:} Take measure space $(\set{U},\Sigma,\mu)$, a measurable function $g: \set{U} \mapsto \extR$, and a set $\set{X} \in \Sigma$. Assume that for all $x \in \set{U}$, $g(x) \geq 0$. That is, assume that $g$ is non-negative. Define the \emph{(Lebesgue) integral} of measurable non-negative function $g$ over $\set{X}$ with respect to measure $\mu$ by
%
\begin{equation*}
\int_\set{X} g \total \mu \triangleq \sup \left\{ \int_\set{X} s \total \mu : s \text{ is a simple function with } 0 \leq s \leq g \right\}
\end{equation*}
%
where $0 \leq s \leq g$ indicates that for all $x \in \set{X}$, $0 \leq s(x) \leq g(x)$, and the integral following the supremum was defined above for simple functions. Note that if $g$ is simple, this agrees with the definition already given for simple functions. Now take a measurable function $f: \set{U} \mapsto \extR$ and define non-negative measurable functions $f^+: \set{U} \mapsto \extR$ and $f^-: \set{U} \mapsto \extR$ by
%
\begin{equation*}
f^+(x) \triangleq \max \{ f(x), 0 \} \quad \text{ and } \quad f^-(x) \triangleq -\min \{ f(x), 0 \}
\end{equation*}
%
Finally, define the \emph{(Lebesgue) integral} of measurable function $f$ over $\set{X}$ with respect to measure $\mu$ by
%
\begin{equation*}
\int_\set{X} f \total \mu \triangleq \int_\set{X} f^+ \total \mu - \int_\set{X} f^- \total \mu
\end{equation*}
%
where the two integrals on the right were defined above for non-negative measurable functions. Note that extended real arithmetic should be used to evaluate this integral. It may be that $\int_\set{X} f \total \mu$
%
\begin{itemize}
\item exists and is finite
\item exists and is $\infty$
\item exists and is $-\infty$
\item does not exist
\end{itemize}
%
If the integral is finite, then $f$ is said to be \emph{Lebesgue integrable with respect to measure $\mu$}. Again, sometimes the alternate notation
%
\begin{equation*}
\int_\set{X} f(x) \total \mu(x) \triangleq \int_\set{X} f \total \mu
\end{equation*}
%
will be used.
This notation adds little value here. However, it will be useful when functions of multiple variables are used. See \longref{app:math_convolution} for an example.

\paragraph{Useful Properties of Integrals:} Take measure space $(\set{U},\Sigma,\mu)$ and set $\set{X} \in \Sigma$. Note that
%
\begin{equation*}
\mu(\set{X}) = \int_\set{X} \total \mu
\end{equation*}
%
Now take a measurable function $f: \set{U} \mapsto \extR$. If $\mu(\set{X}) = 0$ then
%
\begin{equation*}
\int_\set{X} f \total \mu = 0
\end{equation*}
%
Now take an additional measurable function $g: \set{U} \mapsto \extR$ and assume that $f$ and $g$ are equal almost everywhere on $\set{X}$. In that case,
%
\begin{equation*}
\int_\set{E} f \total \mu = \int_\set{E} g \total \mu
\end{equation*}
%
for all $\set{E} \in \Sigma$ with $\set{E} \subseteq \set{X}$ for which the integrals exist.

\subsection{The Lebesgue Measure}

Note that any measure can be used with the Lebesgue integral. However, it is common to use the Lebesgue measure, denoted $m$. Take some $\set{X} \subseteq \extR$ and define the outer measure $m^*: \Pow(\extR) \mapsto \extR$ as
%
\begin{equation*}
m^*( \set{X} ) \triangleq \inf \left\{ \sum_{i=1}^\infty ( b_i - a_i ) : \set{X} \subseteq \bigcup \{ [a_i,b_i] : i \in \N \} \right\}
\end{equation*}
%
In other words, $m^*(\set{X})$ is the greatest lower bound of the total lengths of countable covers of $\set{X}$ by intervals. The function $m^*$ is called an \emph{outer measure}; it extends the notion of length from intervals to arbitrary subsets of $\extR$. To say that $\set{X}$ is \emph{Lebesgue measurable} means that
%
\begin{equation*}
m^*( \set{E} ) = m^*( \set{E} \cap \set{X} ) + m^*( \set{E} \setdiff \set{X} )
\end{equation*}
%
for all $\set{E} \in \Pow(\extR)$. Define the set $\setset{L}$ by
%
\begin{equation*}
\setset{L} \triangleq \{ \set{X} \in \Pow(\extR) : m^*( \set{E} ) = m^*( \set{E} \cap \set{X} ) + m^*( \set{E} \setdiff \set{X} ) \text{ for all } \set{E} \in \Pow(\extR) \}
\end{equation*}
%
This is the set of all Lebesgue measurable sets.
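The interval covers in the definition of $m^*$ already suggest why countable sets turn out to be null: covering the $i$th point of a countable set with an interval of length $\epsilon/2^i$ gives a cover of total length less than $\epsilon$. A small numerical sketch (the helper name is our own):

```python
# Covering the i-th point of a countable set with an interval of length
# eps / 2**i gives a cover whose total length stays below eps.
def cover_length(eps, n_points):
    return sum(eps / 2.0 ** i for i in range(1, n_points + 1))

for eps in (1.0, 0.1, 0.001):
    print(cover_length(eps, 50) < eps)   # True for every eps > 0
```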
Note that
%
\begin{itemize}
\item the set $\setset{L}$ is a $\sigma$-algebra on $\extR$
\item $(\extR,\setset{L})$ is a $\sigma$-field
\item all of the Borel sets of the extended reals are Lebesgue measurable (\ie, $\Borel(\extR) \subseteq \setset{L}$)
\end{itemize}
%
The \emph{Lebesgue measure} $m: \setset{L} \mapsto \extR$ is defined to be
%
\begin{equation*}
m( \set{X} ) = m^*(\set{X})
\end{equation*}
%
for all $\set{X} \in \setset{L}$. Note that for all $a,b \in \extR$ with $a \leq b$,
%
\begin{equation*}
m( (a,b) ) = m( [a,b) ) = m( (a,b] ) = m( [a,b] ) = b - a
\end{equation*}
%
That is, all intervals with the same endpoints have equal measure, and that measure is the difference in the endpoints. Additionally, for all $a \in \extR$,
%
\begin{equation*}
m( \{a\} ) = 0
\end{equation*}
%
That is, all singleton sets have zero measure. Note that a countable set of points is simply a countable union of singletons. Since measures are countably additive and singletons have measure zero, any countable set of points also has measure zero. That is, for all sequences $(x_n)$ where $x_i \in \extR$ for all $i \in \N$,
%
\begin{equation*}
m( \{ x_i : i \in \N \} ) = 0
\end{equation*}
%
In general, to say that a set $\set{E} \in \setset{L}$ is \emph{Lebesgue null} means that $m(\set{E})=0$, where $m$ is the Lebesgue measure. Thus, all countable subsets of $\extR$ are Lebesgue null.

\paragraph{Implied Measure Notation:} Take the measure space $(\extR,\setset{L},m)$, a measurable function $f: \extR \mapsto \extR$, and a set $\set{X} \in \setset{L}$.
The Lebesgue integral of $f$ over $\set{X}$ with respect to measure $m$ (\ie, the Lebesgue measure) would typically be denoted
%
\begin{equation*}
\int_\set{X} f \total m
\end{equation*}
%
However, because the Lebesgue measure is the standard measure for the Lebesgue integral, sometimes the notation
%
\begin{equation*}
\int_\set{X} f(x) \total x \triangleq \int_\set{X} f(x) \total m(x) = \int_\set{X} f \total m
\end{equation*}
%
is used instead. Additionally, since $\set{X}$ can be represented as a countable number of unions of other elements of the Borel algebra on $\extR$, the integral will very often be taken over an interval. \symdef[]{Iprob.4}{integral}{$\int_a^b f(x) \total x$}{the Lebesgue integral of function $f$ over interval $[a,b] \subset \extR$ with respect to the Lebesgue measure}Thus, when $\set{X}$ is an interval of $\extR$ with endpoints $a,b \in \extR$ where $a \leq b$,
%
\begin{equation*}
\int_a^b f(x) \total x \triangleq \int_\set{X} f(x) \total x
\end{equation*}
%
This is the familiar form of the integral.

\subsection{Dirac Delta Measure}

Take $\sigma$-field $(\set{U},\Sigma)$ and a point $a \in \set{U}$. Define the function (which is indexed by $a$) \symdef[]{Iprob.5}{diracdelta}{$\delta_a(\set{E})$}{Dirac delta measure of set $\set{E}$ at point $a$ (\eg, $f(0) = \linebreak[4] \int_{-1}^1 f(x) \delta_0(\{x\}) \total x$)}\symdef[]{Iprob.50}{diracdeltasimp}{$\delta(x-p)$}{Simplified Dirac delta measure notation (\ie, $\delta(x-p) \triangleq \delta_p(\{x\})$)}$\delta_a: \Sigma \mapsto \extR$ by
%
\begin{equation*}
\delta_a( \set{X} ) \triangleq
\begin{cases}
1 &\text{if } a \in \set{X}\\
0 &\text{otherwise}
\end{cases}
\end{equation*}
%
It is easy to verify that $\delta_a$ is a measure for $(\set{U},\Sigma)$, and so $(\set{U},\Sigma,\delta_a)$ forms a measure space. The measure $\delta_a$ is called the \emph{Dirac delta measure at $a$}.

\paragraph{Integral Mass Notation:} Take a point $p \in \R$.
Recall that $\{p\}$ has Lebesgue measure $0$. That is, $\{p\}$ is Lebesgue null and therefore has no Lebesgue mass. However, $\{p\}$ has measure $1$ with respect to the Dirac measure at $p$. Therefore, Dirac measures are often added to Lebesgue measures in order to add \emph{point mass}. To simplify notation, it is conventional to take
%
\begin{equation}
\int_a^b f(x) \delta_p(\{x\}) \total x \triangleq \int_{[a,b]} f \total \delta_p =
\begin{cases}
f(p) &\text{if } p \in [a,b]\\
0 &\text{otherwise}
\end{cases}
\label{eq:dirac_convention}
\end{equation}
%
This way, the Dirac delta function can be viewed as forcing mass into the Lebesgue measure on sets that are typically Lebesgue null.

\paragraph{Singleton Notation for Reals:} Take a $\sigma$-field $(\R,\Sigma)$. Take a point $x \in \R$ such that the singleton set $\{x\} \in \Sigma$. For simplicity, some use the notation
%
\begin{equation*}
\delta(x) \triangleq \delta_0( \{x\} )
\end{equation*}
%
Note that for a point $a \in \R$,
%
\begin{equation*}
\delta(x-a) = \delta_0( \{x-a\} ) = \delta_a( \{x\} )
\end{equation*}
%
These notations simplify the convention shown in \longref{eq:dirac_convention}. That is,
%
\begin{align*}
\int_a^b f(x) \delta(x-p) \total x &= \int_a^b f(x) \delta_p(\{x\}) \total x\\
&=
\begin{cases}
f(p) &\text{if } p \in [a,b]\\
0 &\text{otherwise}
\end{cases}
\end{align*}

\subsection{Convolution}
\label{app:math_convolution}

Take the $\sigma$-field $(\set{X},\Sigma)$ and the measure $\mu: \Sigma \mapsto [0,\infty]$. Assume that $(\set{X},{+})$ is a group where $+$ is an addition operator (and thus the $-$ operator notation is defined as the addition of the additive inverse). Also take subset $\set{D} \subseteq \set{X}$ and the measurable functions $f: \set{D} \mapsto \extR$ and $g: \set{D} \mapsto \extR$.
From function $g$, define function $g^*: \extR \mapsto \extR$ by
%
\begin{equation*}
g^*(t) \triangleq
\begin{cases}
g(t) &\text{if } t \in \set{D}\\
0 &\text{if } t \in \extR \setdiff \set{D}
\end{cases}
\end{equation*}
%
for all $t \in \extR$; define $f^*$ from $f$ in the same way. \symdef[]{Iprob.41}{convolution}{$f * g$}{convolution of function $f$ with function $g$ (\ie, $(f * g)(t) \triangleq \int_{-\infty}^\infty f(\tau) g(t-\tau) \total \tau$)}Define the \emph{convolution} operator ${*}: \extR^\set{D} \times \extR^\set{D} \mapsto \extR^\set{D}$ such that
%
\begin{equation*}
(f * g)(t) \triangleq \int_\set{D} f^*(\tau) g^*(t - \tau) \total \mu(\tau)
\end{equation*}
%
for all $t \in \set{D}$. Therefore, for any $f$ and $g$, the function $f * g: \set{D} \mapsto \extR$ can be defined using the convolution definition above. Now, take additional function $h: \set{D} \mapsto \extR$ and real numbers $a,b \in \R$. For these $f,g,h$ and $a,b$, the convolution operator has a number of important properties.
%
\begin{description}
\item\emph{Commutativity:} $f * g = g * f$
\item\emph{Associativity:} $f * (g * h) = (f * g) * h$
\item\emph{Linearity in the First Argument:} $(af + bg) * h = a(f * h) + b(g * h)$
\item\emph{Linearity in the Second Argument:} $f * (ag + bh) = a(f * g) + b(f * h)$
\end{description}
%
Clearly, convolution is a bilinear operation.

\section{Probability, Random Variables, and Random Vectors}
\label{app:math_probability}

Probability is a specialization of measure theory, the subject of \longref{app:math_measure}. \Citet{PapoulisPillai02} and \citet{Viniotis98} provide good references on the theory of probability, random variables, and random processes. The application of probability is an attempt to model \emph{randomness} or extreme complexity. That is, when a parameter is known with complete certainty, it is said to be \emph{deterministic} and is not usually cast in a probabilistic framework.
However, when the values of a parameter are uncertain but come from a set of possible values, the parameter is said to be \emph{stochastic}. \subsection{Probability Measures and Probability Spaces} Take $\sigma$-field $(\set{U},\Sigma)$. Define a function $\Pr: \Sigma \mapsto [0,\infty]$, where interval $[0,\infty] \subset \extR$. Assume that % \begin{enumerate}[(i)] \item for all $\set{E} \in \Sigma$, $\Pr(\set{E}) \geq 0$ \label{item:prob_nonnegtaive} \item $\Pr( \set{U} ) = 1$ \label{item:prob_certain_event} \item for a sequence $(\set{X}_n)$ where $\set{X}_n \in \Sigma$ for all $n \in \N$ and $\set{X}_i \cap \set{X}_j = \emptyset$ for all $i,j \in \N$ with $i \neq j$, it is the case that % \begin{equation*} \Pr\left( \bigcup \{ \set{X}_i : i \in \N \} \right) = \sum\limits_{i=1}^\infty \Pr\left( \set{X}_i \right) \end{equation*} \label{item:prob_countable_additivity} \end{enumerate} % Take $\set{E} \in \Sigma$ and the sequence % \begin{equation*} (\set{X}_n) \triangleq (\set{E},\emptyset,\emptyset,\emptyset,\emptyset,\dots) \end{equation*} % Clearly, for any $\set{X}_i$ and $\set{X}_j$ with $i \neq j$, $\set{X}_i \cap \set{X}_j = \emptyset$. Therefore, by the definition of $\Pr$, % \begin{equation*} \Pr\left( \bigcup \{ \set{X}_i : i \in \N \} \right) = \sum\limits_{i=1}^\infty \Pr\left( \set{X}_i \right) = \Pr( \set{E} ) + \Pr( \emptyset ) + \Pr( \emptyset ) + \cdots \end{equation*} % However, $\bigcup \{ \set{X}_i : i \in \N \} = \set{E}$, and so % \begin{equation*} \Pr( \set{E} ) = \Pr( \set{E} ) + \Pr( \emptyset ) + \Pr( \emptyset ) + \cdots \end{equation*} % Therefore, it must be that $\Pr( \emptyset ) = 0$. Thus, $\Pr$ meets all of the requirements for being a measure. In this case, \symdef{Iprob.541}{probspace}{$(\set{U},\Sigma,\Pr)$}{Probability space with outcomes $\set{U}$, $\sigma$-field of events $\Sigma$, and probability measure $\Pr$} is called a \emph{probability space} that models some \emph{random experiment}. 
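These axioms can be spot-checked on a small example: a fair six-sided die with the uniform measure defined on all subsets of outcomes. The construction below is an illustrative sketch in plain Python, not part of the formal development.

```python
# Spot-checking the probability axioms and identities on a fair six-sided
# die with the uniform measure over all subsets of outcomes.
from fractions import Fraction

U = frozenset(range(1, 7))
def Pr(E):
    return Fraction(len(E), len(U))       # uniform probability measure

A = frozenset({1, 2, 3, 4})
B = frozenset({3, 4, 5})
print(Pr(U) == 1)                               # True: the certain event
print(Pr(frozenset()) == 0)                     # True: the empty event
print(Pr(A | B) == Pr(A) + Pr(B) - Pr(A & B))   # True: inclusion-exclusion
print(Pr(A) == 1 - Pr(U - A))                   # True: complement rule
```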
Additionally,
%
\begin{itemize}
\item \symdef{Iprob.540}{probmeasure}{$\Pr$}{Probability measure} is called a \emph{probability measure}
\item the set $\set{U}$ is called the \emph{(universal) sample space} and is viewed as a set of \emph{outcomes} of the random experiment being modeled
\item the set $\Sigma$ is called a set of \emph{events} (\ie, the events are the measurable subsets of the outcomes)
\item the \emph{probability} of any event $\set{E} \in \Sigma$ is given by $\Pr(\set{E})$
\end{itemize}
%
Note that it is common for $(\set{U},\Sigma,\Pr)$ to be called a random experiment rather than a probability space.

\paragraph{Properties of a Probability Space:} Take a probability space $(\set{U},\Sigma,\Pr)$. Take an event $\set{A} \in \Sigma$. Its complement is $\set{A}^c = \set{U} \setdiff \set{A}$ (note that $\set{A}^c \in \Sigma$, and so $\set{A}^c$ is also an event). Take an additional event $\set{B}$. It can be shown that
%
\begin{itemize}
\item $\Pr(\emptyset) = 0$
\item $\Pr(\set{A}) = 1 - \Pr(\set{A}^c)$
\item $\Pr(\set{A}) \leq 1$
\item $\Pr(\set{A} \cup \set{B}) = \Pr(\set{A}) + \Pr(\set{B}) - \Pr(\set{A} \cap \set{B})$
\item $\Pr(\set{A} \cap \set{B}) = \Pr(\set{A}) + \Pr(\set{B}) - \Pr(\set{A} \cup \set{B})$
\item if $\set{B} \subseteq \set{A}$ then $\Pr(\set{A}) = \Pr(\set{B}) + \Pr(\set{A} \cap \set{B}^c)$ and $\Pr(\set{A}) \geq \Pr(\set{B})$
\end{itemize}
%
Because $\Pr(\set{A}) \leq 1$ for all $\set{A} \in \Sigma$, it is not uncommon for $\Pr$ to be defined with a codomain of $[0,1]$.

\paragraph{Terminology:} Take a probability space $(\set{U},\Sigma,\Pr)$ and events $\set{A},\set{B} \in \Sigma$. In application, there are a number of terms that describe properties of events.
%
\begin{description}
\item\emph{Independent Events:} Saying that $\set{A}$ and $\set{B}$ are \emph{(pairwise) independent} means that $\Pr( \set{A} \cap \set{B}) = \Pr(\set{A}) \Pr(\set{B})$.
\item\emph{Disjoint Events:} Saying that $\set{A}$ and $\set{B}$ are \emph{disjoint (events)} means that $\set{A} \cap \set{B} = \emptyset$. Of course, if $\set{A}$ and $\set{B}$ are disjoint then $\Pr( \set{A} \cap \set{B}) = 0$. Assume that $\set{A}$ and $\set{B}$ are disjoint and independent. In this case, $\Pr( \set{A} \cap \set{B}) = \Pr(\set{A}) \Pr(\set{B}) = 0$, which can only occur if $\Pr(\set{A})=0$ or $\Pr(\set{B})=0$ (or both).
\item\emph{With Probability Zero:} If $\Pr(\set{A})=0$ then event $\set{A}$ is said to happen \emph{with probability zero} or \emph{almost never}. In a general measure context, $\set{A}$ is a \emph{null set}. Note that the random experiment that this probability space models may still produce outcomes from $\set{A}$ even though $\set{A}$ occurs with probability zero.
\item\emph{Almost Sure:} If $\Pr(\set{A})=1$ then $\set{A}$ is said to happen \emph{with probability one} or \emph{almost surely}. In this case, $\Pr(\set{A}^c)=0$; therefore, the event $\set{A}^c$ occurs almost never. However, this does not guarantee that actual outcomes in the random experiment modeled by this probability space will always come from $\set{A}$.
\end{description}

\subsection{The Extended Reals as Probability Space}
\label{app:math_extended_reals_prob_space}

Take the probability space $(\extR,\Borel(\extR),\Pr)$. The justification for using $\Borel(\extR)$ will be introduced in \longref{app:math_random_variables}. Of course, $\Pr$ has domain $\Borel(\extR)$ and thus must be defined for all events $\set{E} \in \Borel(\extR)$. However, since $\Borel(\extR)$ is a Borel field then any element $\set{E} \in \Borel(\extR)$ is a Borel set and can be constructed by countable intersections, unions, or complements of half lines (\ie, intervals of the form $[-\infty,a]$ where $a \in \extR$). By the properties of a probability space, if the probability of each half line is known, then the probability of $\set{E}$ can be determined analytically.
Therefore, define a function $F: \extR \mapsto [0,1]$ by
%
\begin{equation*}
F(x) \triangleq \Pr( [-\infty,x] ) = \Pr( \{ z \in \extR : z \leq x \} )
\end{equation*}
%
for all $x \in \extR$. In this case, $F$ is called the \emph{cumulative distribution function} and can be used to find the probability of every event. In fact, it can be shown that $F$ is a monotonically increasing upper semi-continuous (in fact, right-continuous) function. That is, for all $p,q \in \extR$ with $p \leq q$, it is the case that
%
\begin{equation*}
F(p) \leq F(q)
\end{equation*}
%
and
%
\begin{equation}
\limsup\limits_{x \to p} F(x) \leq F(p)
\label{eq:cdf_lsc}
\end{equation}
%
However, note that
%
\begin{equation}
\liminf\limits_{x \to p} F(x) \leq \limsup\limits_{x \to p} F(x)
\label{eq:cdf_lsc_limsup}
\end{equation}
%
Therefore, we use the notation
%
\begin{equation*}
F(p-) \triangleq \liminf\limits_{x \to p} F(x)
\end{equation*}
%
Note that by \longrefs{eq:cdf_lsc} and \shortref{eq:cdf_lsc_limsup} then $F(p-) \leq F(p)$ for all $p \in \extR$. Additionally, note that if $F(p-) = F(p)$ then $F$ is lower semi-continuous at $p$ and therefore continuous at $p$. Take $a \in \extR$. The following always hold.
%
\begin{itemize}
\item It is always the case that $\Pr(\{a\}) = F(a) - F(a-)$. Thus, if $\Pr(\{a\}) = 0$ then $F$ is continuous at $a$.
\item If $F$ is continuous then $\Pr(\{a\}) = 0$.
\end{itemize}
%
Note that for $a,b \in \extR$ with $a < b$,
%
\begin{itemize}
\item $\Pr( (a,b] ) = F(b) - F(a)$
\item $\Pr( [a,b] ) = F(b) - F(a) + ( F(a) - F(a-) ) = F(b) - F(a-)$
\item $\Pr( [a,b) ) = F(b) - F(a) + ( F(a) - F(a-) ) - ( F(b) - F(b-) ) = F(b-) - F(a-)$
\item $\Pr( (a,b) ) = F(b) - F(a) - ( F(b) - F(b-) ) = F(b-) - F(a)$
\item $F(\infty)=\Pr(\extR)=1$.
\item $\Pr( (a,\infty] ) = F(\infty) - F(a) = 1 - F(a)$
\end{itemize}
%
Recall that for any $\set{E} \in \Borel(\extR)$,
%
\begin{equation*}
\Pr( \set{E} ) = \int_\set{E} \total \Pr
\end{equation*}
%
Take a point $x \in \R$ and the interval $[-\infty,x]$.
Additionally, define $\set{E}_x$ by % \begin{equation*} \set{E}_x \triangleq \{ p \in [-\infty,x] : F(p+) \neq F(p) \} \end{equation*} % In other words, $\set{E}_x$ is the set of all points in the interval $[-\infty,x]$ where $F$ is not continuous. It can be shown that $\set{E}_x$ is a countable set of points, and thus it is Lebesgue null (\ie, $m(\set{E}_x)=0$). It can also be shown that % \begin{align*} F(x) &= \Pr( [-\infty,x] )\\ &= \int_{[-\infty,x]} \total \Pr\\ &= \int_{-\infty}^x F'(z) \total z + \Pr( \set{E}_x )\\ &= \int_{-\infty}^x F'(z) \total z + \int_{\set{E}_x} \total \Pr\\ &= \int_{-\infty}^x ( F'(z) + \sum\limits_{p \in \set{E}_x} (F(p+)-F(p)) \delta_p(\{z\}) ) \total z\\ &= \int_{-\infty}^x F'(z) \total z + \sum\limits_{p \in \set{E}_x} ( F(p+) - F(p) ) \end{align*} % In this case, denote the \emph{density function of measure $\Pr$ with respect to the Lebesgue measure} as $f: \extR \mapsto \extR$, which is defined by % \begin{align*} f(x) &\triangleq F'(x) + \sum_{p \in \set{E}_x} (F(p+)-F(p)) \delta_p(\{x\})\\ &= F'(x) + \sum_{p \in \set{E}_x} \Pr(\{p\}) \delta_p(\{x\})\\ &= F'(x) + \sum_{p \in \set{E}_x} \Pr(\{p\}) \delta(x-p) \end{align*} % where $F'(x)$ can be viewed as the derivative of $F$ with respect to $x$. Of course, $F$ may not be differentiable everywhere, and so its derivative may not exist at some points. However, we can somewhat arbitrarily define $F'$ on those points. The function $f$ is known as the \emph{probability density function}. It can be shown to be measurable, and so it is the case that % \begin{equation*} \Pr( [-\infty,a] ) = F(a) = \int_{-\infty}^a f(x) \total x \end{equation*} % for all $a \in \extR$. \subsection{Random Variables} \label{app:math_random_variables} Take a random experiment modeled by probability space $(\set{U},\Sigma,\Pr)$. Clearly, any event $\set{E} \in \Sigma$ has a probability $\Pr(\set{E})$. However, it is difficult to specify the form of the probability measure $\Pr$ for every experiment.
Thus, we introduce the \emph{random variable}. Assume that function % \begin{equation*} X: \set{U} \mapsto \extR \end{equation*} % is a Borel measurable function. In this case, $X$ is called a \emph{random variable}. That is, for all outcomes $\zeta \in \set{U}$, $X(\zeta)$ is an extended real number. Additionally, for any Borel set $\set{E} \in \Borel(\extR)$, the preimage $X^{-1}[\set{E}] \in \Sigma$. In other words, for all $\set{E} \in \Borel(\extR)$, % \begin{equation*} \{ \zeta \in \set{U} : X(\zeta) \in \set{E} \} \in \Sigma \end{equation*} % and so the preimage of $\set{E}$ under $X$ is a measurable set that has a probability associated with it. \symdef[]{Iprob.545}{setRV}{$\{X \leq a\}$}{Measurable set induced by preimage of random variable $X$ (\ie, \linebreak[3] $\{ \zeta \in \set{U} : X(\zeta) \leq a \}$)}Motivated by this, we introduce the notation % \begin{equation*} \{ \text{statement about } X \} \triangleq \{ \zeta \in \set{U} : \text{statement about } X(\zeta) \} \end{equation*} % For example, for $a,b \in \extR$, % \begin{itemize} \item $\{ a \leq X \} \triangleq \{ \zeta \in \set{U} : a \leq X(\zeta) \}$ \item $\{ a < X \} \triangleq \{ \zeta \in \set{U} : a < X(\zeta) \}$ \item $\{ X \leq b \} \triangleq \{ \zeta \in \set{U} : X(\zeta) \leq b \}$ \item $\{ X < b \} \triangleq \{ \zeta \in \set{U} : X(\zeta) < b \}$ \item $\{ a \leq X \leq b \} \triangleq \{ \zeta \in \set{U} : a \leq X(\zeta) \leq b \}$ \item $\{ a < X \leq b \} \triangleq \{ \zeta \in \set{U} : a < X(\zeta) \leq b \}$ \item $\{ a \leq X < b \} \triangleq \{ \zeta \in \set{U} : a \leq X(\zeta) < b \}$ \item $\{ a < X < b \} \triangleq \{ \zeta \in \set{U} : a < X(\zeta) < b \}$ \end{itemize} % Some authors will use square brackets (\eg, $[ \text{statement about } X ]$) since preimages of sets are being generated by this notation.
\symdef[]{Iprob.546}{probRV}{$\Pr(X \leq a)$}{Probability induced by preimage of random variable $X$ (\ie, \linebreak[3] $\Pr(\{ \zeta \in \set{U} : X(\zeta) \leq a \})$)}Additionally, we will use the notation % \begin{align*} \Pr( \text{statement about } X ) &\triangleq \Pr( \{ \text{statement about } X \} )\\ &= \Pr( \{ \zeta \in \set{U} : \text{statement about } X(\zeta) \} ) \end{align*} % For example, for some $a \in \extR$, % \begin{equation*} \Pr( X \leq a ) = \Pr( \{ \zeta \in \set{U} : X(\zeta) \leq a \} ) \end{equation*} % Again, some authors will use square brackets (\eg, $\Pr[ X \leq a ]$ for $a \in \extR$), which relates to the preimages being generated. \paragraph{Cumulative Distributions and Probability Densities:} Take a random experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and a random variable $X: \set{U} \mapsto \extR$. Note that every set in $\Borel(\extR)$ can be generated by a countable number of operations on half-lines (\ie, sets of the form $[-\infty,a]$ for all $a \in \extR$). Thus, we will focus on sets of the form $\{ X \leq a \}$ for all $a \in \extR$. Recall the discussion in \longref{app:math_extended_reals_prob_space}. \symdef[]{Iprob.55}{cdf}{$F_X(x)$}{Cumulative distribution function for random variable $X$ (\ie, $F_X(a) \triangleq \Pr(X \leq a)$)}\symdef[]{Iprob.55}{cdfplus}{$F_X(x+)$}{Limit superior of $F_X$ at point $p$}Denote the \emph{cumulative distribution function for random variable $X$} as the function $F_X: \extR \mapsto [0,1]$ defined by % \begin{equation*} F_X(x) \triangleq \Pr( X \leq x ) \end{equation*} % for all $x \in \extR$. It can be shown that $F_X$ is lower semi-continuous and monotonically increasing. Again, use the notation $F_X(p+)$ to denote the limit superior of $F_X$ at $p$. That is, % \begin{equation*} F_X(p+) = \limsup\limits_{x \to p} F_X(x) \end{equation*} % for any $p \in \extR$. Again, let $\set{E}_p$ be the set of points in $[-\infty,p]$ where $F_X$ is not continuous.
That is, % \begin{equation*} \set{E}_p \triangleq \{ x \in [-\infty,p] : F_X(x+) \neq F_X(x) \} \end{equation*} % It can be shown that the Lebesgue measure of $\set{E}_p$ is zero; the set $\set{E}_p$ is Lebesgue null. \symdef[]{Iprob.56}{pdf}{$f_X(x)$}{Probability density function for random variable $X$ (\ie, $F_X(a) = \int_{-\infty}^a f_X(x) \total x$)}Now denote the \emph{probability density function for random variable $X$} as the function $f_X: \extR \mapsto [0,\infty]$ defined by % \begin{align*} f_X(x) &\triangleq F_X'(x) + \sum_{p \in \set{E}_x} (F_X(p+)-F_X(p)) \delta(x-p)\\ &= F_X'(x) + \sum_{p \in \set{E}_x} \Pr(X=p) \delta(x-p) \end{align*} % for all $x \in \extR$. While $F_X'$ may not exist at all points in $\extR$, it can somewhat arbitrarily be defined on those points. It can be shown that $f_X$ is a measurable function, and so % \begin{equation*} \Pr( X \leq a ) = F_X(a) = \int_{-\infty}^a f_X(x) \total x \end{equation*} % Therefore, if either $F_X$ or $f_X$ is specified for a random variable, the probability of the preimages generated can be calculated easily. \paragraph{Omission of Domain and Codomain in Notation:} Notice that $\extR$ is the domain of all cumulative distribution and probability density functions. Because of this, the codomain of any random variable should technically always be $\extR$. Additionally, the codomain (and, in fact, range) of any cumulative distribution function will be $[0,1]$ and the codomain of any probability density function can safely be taken to be $\extR$. Finally, the domain of any random variable associated with a given probability space should be clear. Therefore, if a probability space is given, the domain and codomain of any random variable, cumulative distribution, or probability density function may be omitted. For example, for a random experiment modeled by probability space $(\set{U},\Sigma,\Pr)$, it is sufficient to declare a random variable $X$ with distribution $F_X$ and density $f_X$. 
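The relationship $F_X(a) = \int_{-\infty}^a f_X(x) \total x$ can be checked numerically. The following sketch is our own illustration, not part of the original development; it assumes an exponentially distributed $X$ with an example rate $\lambda = 2$, for which $f_X(x) = \lambda \exp(-\lambda x)$ and $F_X(a) = 1 - \exp(-\lambda a)$ for $x, a \geq 0$.

```python
import math

# Sanity check (an added illustration, not from the text): integrating
# the density f_X of an exponential random variable with assumed rate
# lam = 2.0 should recover the cumulative distribution F_X.

def f_X(x, lam=2.0):
    # probability density function of the exponential random variable
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def F_X(a, lam=2.0):
    # closed-form cumulative distribution function, for comparison
    return 1.0 - math.exp(-lam * a) if a >= 0 else 0.0

def integrate(f, lo, hi, n=100000):
    # composite trapezoidal rule on [lo, hi]
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    total += sum(f(lo + i * h) for i in range(1, n))
    return total * h

a = 1.5
assert abs(integrate(f_X, 0.0, a) - F_X(a)) < 1e-6
```

The same check applies to any density and distribution pair; only the two closed-form functions above are specific to the exponential example.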
\paragraph{Statistical Independence of Events:} Take a random experiment modeled by probability space $(\set{U},\Sigma,\Pr)$. Take $\set{N} \subseteq \N$ and family $(\set{A}_n)_{n \in \set{N}}$ such that $\set{A}_i \in \Sigma$ for all $i \in \set{N}$. To say that the family of events $(\set{A}_n)_{n \in \set{N}}$ is \emph{pairwise independent} means that for any $i,j \in \set{N}$ such that $i \neq j$, % \begin{equation*} \Pr\left(\set{A}_i \cap \set{A}_j\right) = \Pr\left(\set{A}_i\right) \Pr\left(\set{A}_j\right) \end{equation*} % To say that the family of events $(\set{A}_n)_{n \in \set{N}}$ is \emph{mutually independent} means that % \begin{equation*} \Pr\left( \bigcap \left\{ \set{A}_i : i \in \set{N} \right\} \right) = \prod\limits_{i \in \set{N}} \Pr( \set{A}_i ) \end{equation*} % Of course, mutual independence implies pairwise independence. Additionally, these events could be generated as preimages of random variables. Below we will define statistical independence for random variables by doing exactly that. \paragraph{Conditional Probabilities:} Take a random experiment modeled by probability space $(\set{U},\Sigma_\set{U},\Pr)$ and a random variable $X: \set{U} \mapsto \extR$. Take a set $\set{B} \in \Sigma_\set{U}$. Of course, this set has probability $\Pr(\set{B})$ and % \begin{equation*} \Pr(\set{B}) \leq 1 \end{equation*} % For simplicity, we implicitly assume that $\Pr(\set{B}) > 0$; however, a more rigorous development would not require this. Now, assume that it is \emph{given} that outcomes for this experiment will come from this set. In this case, we define a new experiment modeled by probability space $(\set{B},\Sigma_\set{B},\Pr|_{\set{B}})$ where $\Pr|_{\set{B}}$ is defined by % \begin{equation*} \Pr|_{\set{B}}(\set{E}) \triangleq \frac{ \Pr(\set{E} \cap \set{B}) }{ \Pr(\set{B}) } \end{equation*} % for all $\set{E} \in \Sigma_\set{B}$.
Note that % \begin{equation*} \Pr|_{\set{B}}(\set{B}) = \frac{ \Pr(\set{B} \cap \set{B}) }{ \Pr(\set{B}) } = \frac{ \Pr(\set{B}) }{ \Pr(\set{B}) } = 1 \end{equation*} % which is expected since $\Pr|_{\set{B}}$ is defined to be a probability measure on $\set{B}$. For simplicity, use the notation % \begin{equation*} \Pr( \set{E} | \set{B} ) \triangleq \Pr|_{\set{B}}(\set{E}) = \frac{ \Pr(\set{E} \cap \set{B}) }{ \Pr(\set{B}) } \end{equation*} % for all $\set{E} \in \Sigma_\set{B}$; $\Pr(\set{E}|\set{B})$ is called the \emph{conditional probability} of $\set{E}$ \emph{given} $\set{B}$. Note that for an event $\set{A} \in \Sigma_\set{U}$ with $\Pr(\set{A}) > 0$, $\set{A}$ and $\set{B}$ are statistically independent events if and only if % \begin{equation*} \Pr( \set{A} | \set{B} ) = \Pr( \set{A} ) \quad \text{ or, equivalently, } \quad \Pr( \set{B} | \set{A} ) = \Pr( \set{B} ) \end{equation*} % In other words, two events are independent if the probability of one event is not affected by the condition that the other event is certain. In geometric terms, the fraction of the universal set occupied by one event matches the fraction of the conditioning event that it occupies. Note that conditional probabilities can be used with random variables as well; that is, random variables can be used to specify the events. For example, for any $a,b \in \extR$, % \begin{equation*} \Pr( X \leq a | X < b ) = \frac{ \Pr(\{X \leq a\} \cap \{X < b\}) }{ \Pr(\{X < b\}) } \end{equation*} % Take $Y: \set{U} \mapsto \extR$ to be another random variable for the original process. It may be used to specify given conditions.
For example, for any $a,b,c \in \extR$, % \begin{equation*} \Pr( X \leq a | X < b, Y = c ) = \frac{\Pr(\{X \leq a \} \cap \{ X < b \} \cap \{ Y = c \})}% {\Pr(\{ X < b \} \cap \{ Y = c \})} \end{equation*} \paragraph{Memorylessness:} Take a random experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and a random variable $X: \set{U} \mapsto \extR$. To say that $X$ is \emph{memoryless} or has the \emph{memoryless property} means that for any $a,b \in \R_{>0}$, % \begin{equation*} \Pr(X > a + b | X > b) = \Pr(X > a) \end{equation*} \paragraph{Functions of Random Variables:} Take a random experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and a random variable $X: \set{U} \mapsto \extR$. Define another Borel measurable function $f: \extR \mapsto \extR$. Denote the composition $f \comp X$ as function $Y: \set{U} \mapsto \extR$; that is, define $Y$ by % \begin{equation*} Y(\zeta) \triangleq f(X(\zeta)) \end{equation*} % for all $\zeta \in \set{U}$. The function $Y$ is another random variable. In fact, $Y$ will often be denoted as $f(X)$ (\ie, $Y = f(X)$). \paragraph{Exclusion of Outcome in Notation:} Take a random experiment modeled by probability space $(\set{U},\Sigma,\Pr)$. Also let $X: \set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$ be random variables. Following our convention, the notation % \begin{equation*} \{ X = Y \} = \{ \zeta \in \set{U} : X(\zeta) = Y(\zeta) \} \end{equation*} % generates a measurable set. However, some authors will use $X = Y$ to denote $\{X=Y\}$ instead. For example, to say \emph{$X = Y$ with probability 1} means that % \begin{equation*} \Pr(X=Y) = \Pr( \{ \zeta \in \set{U} : X(\zeta) = Y(\zeta) \} ) = 1 \end{equation*} % However, the statement that $X = Y$ might denote that \emph{for any $\zeta \in \set{U}$, $X(\zeta) = Y(\zeta)$}. In this case, $X = Y$ is a statement about the functional form of $X$ and $Y$ and not about the preimages that they induce.
Our convention is to use curly braces around preimages (\eg, $\{X = Y\}$) whenever measurable preimages need to be generated. Thus, if curly braces are not being used and a particular $\zeta \in \set{U}$ has not been identified, we mean that the functional expression holds for all $\zeta \in \set{U}$. \paragraph{Expectation of a Random Variable:} Take a random experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and a random variable $X: \set{U} \mapsto \extR$. Define another Borel measurable function $g: \extR \mapsto \extR$ and denote the composition $g \comp X$ as function $Y: \set{U} \mapsto \extR$ defined by $Y(\zeta) \triangleq g(X(\zeta))$ for all $\zeta \in \set{U}$. The \emph{expectation of $Y$} is denoted $\E(Y)$ or \symdef{Iprob.61}{expectationgX}{$\E(g(X))$}{Expectation of function $g$ of random variable $X$ (\ie, \linebreak[4] $\int_{-\infty}^\infty g(x) f_X(x) \total x$)} and is defined by % \begin{equation*} \E(g(X)) \triangleq \int_{-\infty}^\infty g(x) f_X(x) \total x \end{equation*} % where $f_X$ is the probability density function of random variable $X$. In particular, % \begin{equation*} \E(X) \triangleq \int_{-\infty}^\infty x f_X(x) \total x \end{equation*} % where \symdef{Iprob.60}{expectationX}{$\E(X)$}{Expectation of random variable $X$ (\ie, \linebreak[4] $\int_{-\infty}^\infty x f_X(x) \total x$)} is called the \emph{expectation of $X$}, which is a \emph{first-order statistic} of $X$. This is sometimes called the \emph{average} or \emph{mean} of random variable $X$; however, it should not be confused with other non-random uses of those terms. Additionally, % \begin{equation*} \E(X^2) \triangleq \int_{-\infty}^\infty x^2 f_X(x) \total x \end{equation*} % where $\E(X^2)$ is called the \emph{second moment} of random variable $X$, which is one of its \emph{second-order statistics}.
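To make these expectation integrals concrete, the following sketch (our own added illustration, not from the text) numerically evaluates $\E(X)$ and $\E(X^2)$ for an exponentially distributed $X$ with an assumed example rate $\lambda = 2$; the closed forms for comparison are $\E(X) = 1/\lambda$ and $\E(X^2) = 2/\lambda^2$.

```python
import math

# Added illustration: E(g(X)) = integral of g(x) f_X(x) dx, evaluated
# with the trapezoidal rule for an exponential X with assumed rate 2.

lam = 2.0

def f_X(x):
    # probability density function of the exponential random variable
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def expect(g, lo=0.0, hi=40.0, n=200000):
    # trapezoidal approximation of E(g(X)); the exponential tail
    # beyond hi is negligible for lam = 2
    h = (hi - lo) / n
    vals = [g(lo + i * h) * f_X(lo + i * h) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

mean = expect(lambda x: x)               # should approximate 1/lam = 0.5
second_moment = expect(lambda x: x * x)  # should approximate 2/lam**2 = 0.5
```

Any other first- or second-order statistic follows by changing the function $g$ passed to `expect`.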
\paragraph{Linearity of Expectation:} Take a random experiment modeled by probability space $(\set{U},\Sigma,\Pr)$, random variables $X: \set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$, and $a,b \in \R$. It is easy to show that % \begin{equation*} \E( a X + b Y ) = a \E(X) + b \E(Y) \end{equation*} % That is, the expectation is \emph{linear}. Additionally, assume that $c \in \extR$. Trivially, $c$ is a random variable. Therefore, % \begin{equation*} \E( c ) = c \end{equation*} % Of course, $\E(X) \in \extR$. Therefore, % \begin{equation*} \E( \E(X) ) = \E(X) \end{equation*} % Thus, % \begin{equation*} \E( X - \E(X) ) = \E( X ) - \E( \E(X) ) = \E(X) - \E(X) = 0 \end{equation*} \paragraph{Variance of a Random Variable:} Take a random experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and a random variable $X: \set{U} \mapsto \extR$. The \emph{variance of $X$} or the \emph{second central moment of $X$} is denoted $\var(X)$ or \symdef{Iprob.62}{varianceX}{$\var(X)$}{Variance of random variable $X$ (\ie, $\var(X) = \E(X^2) - \E(X)^2$)} and is defined by % \begin{equation*} \var(X) \triangleq \E( (X - \E(X))^2 ) = \int_{-\infty}^\infty (x - \E(X))^2 f_X(x) \total x \end{equation*} % where $f_X$ is the probability density function of random variable $X$. Note that this implies % \begin{equation*} \var(X) = \E(X^2) - \E(X)^2 \end{equation*} % which is a useful property of the variance. Equivalently, % \begin{equation*} \E(X^2) = \var(X) + \E(X)^2 \end{equation*} % The variance of $X$ is one of its \emph{second-order statistics}. \paragraph{Properties of Variance:} Take a random experiment modeled by probability space $(\set{U},\Sigma,\Pr)$, a random variable $X: \set{U} \mapsto \extR$, and $a,b \in \R$.
It is easy to show that % \begin{equation*} \var( a X + b ) = a^2 \var( X ) \end{equation*} % This implicitly uses the fact that % \begin{equation*} \var( b ) = 0 \end{equation*} % In fact, % \begin{equation*} \var( \var(X) ) = 0 \quad \text{ and } \quad \var( \E(X) ) = 0 \quad \text{ and } \quad \E( \var(X) ) = \var(X) \end{equation*} % Now take additional random variable $Y: \set{U} \mapsto \extR$. It is the case that % \begin{align*} \var( a X + b Y ) &= a^2 \var(X) + b^2 \var(Y) + 2ab \E( (X-\E(X))(Y-\E(Y)) )\\ &= a^2 \var(X) + b^2 \var(Y) + 2ab ( \E(XY) - \E(X)\E(Y) )\\ &= a^2 \var(X) + b^2 \var(Y) + 2ab \E(XY) - 2ab \E(X)\E(Y) \end{align*} % where $\E( (X-\E(X))(Y-\E(Y)) )$ is sometimes called the \emph{covariance of $X$ and $Y$} and is denoted \symdef{Iprob.63}{covarianceXY}{$\cov(X,Y)$}{Covariance of random variables $X$ and $Y$ (\ie, $\cov(X,Y) = \E(XY) - \E(X)\E(Y)$)}. That is, % \begin{equation*} \cov(X,Y) \triangleq \E( (X-\E(X)) (Y-\E(Y)) ) = \E(XY) - \E(X)\E(Y) \end{equation*} \subsection{Relationship Between Random Variables} It is common for the random variables generated by a single experiment to be related to each other through that experiment. Examples of this have already been given. For example, take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and random variables $X: \set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$ and $a,b \in \R$. As we have discussed, it is the case that % \begin{equation*} \E( a X + b Y ) = a \E(X) + b \E(Y) \end{equation*} % and % \begin{equation*} \var( a X + b Y ) = a^2 \var(X) + b^2 \var(Y) + 2ab \cov(X,Y) \end{equation*} % Of course, these results will generalize to any finite collection of random variables. \paragraph{Identically Distributed Random Variables:} Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$.
Also take $\set{N} \subseteq \N$ and the family $(X_n)_{n \in \set{N}}$ where $X_i: \set{U} \mapsto \extR$ is a random variable for each $i \in \set{N}$. For example, these random variables could represent successive \emph{trials} of the same experiment. Now assume that for any $i,j \in \set{N}$ with $i \neq j$, % \begin{equation*} f_{X_i}(x) = f_{X_j}(x) \end{equation*} % for all $x \in \extR$ (\ie, all of the random variables have the same distribution). In this case, the random variables are said to be \emph{identically distributed}. \paragraph{Joint Distributions and Densities:} Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and random variables $X: \set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$ and $a,b \in \R$. Consider the events generated by % \begin{equation*} \{ X \leq a \} \quad \text{ and } \quad \{ Y \leq b \} \end{equation*} % Of course, both of these events are from the $\sigma$-field $\Sigma$, and so their intersection is also included in that field. We can generate that event by taking the intersection of the two events above. Thus, it is useful for us to define the notation: % \begin{equation*} \{ X \leq a, Y \leq b \} \triangleq \{ X \leq a \} \cap \{ Y \leq b \} \end{equation*} % \symdef[]{Iprob.65}{jcdf}{$F_{XY}(x,y)$}{Joint distribution function for random variables $X$ and $Y$ (\ie, $F_{XY}(a,b) \triangleq \Pr(X \leq a, Y \leq b)$)}Now, we can define the \emph{joint distribution} $F_{XY}: \extR \times \extR \mapsto [0,1]$ as % \begin{equation*} F_{XY}(x,y) \triangleq \Pr( X \leq x, Y \leq y ) \end{equation*} % for all $x,y \in \extR$. Recall how Dirac delta functions were introduced in the construction of a density function.
\symdef[]{Iprob.66}{jpdf}{$f_{XY}(x,y)$}{Joint density function for random variables $X$ and $Y$}Through a similar process, we can introduce a \emph{joint density function} $f_{XY}: \extR \times \extR \mapsto [0,\infty]$ such that % \begin{equation*} F_{XY}(a,b) = \int_{-\infty}^b \int_{-\infty}^a f_{XY}(x,y) \total x \total y \end{equation*} \paragraph{Conditional Random Variables:} Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and random variables $X: \set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$ and $a,b \in \R$. Assume that the experiment is changed so that it is given that $X = a$. As with the definition of conditional events above, we can define a new probability space $(\{X = a\},\Sigma_{\{X=a\}},\Pr|_{\{X=a\}})$ where we use the notation % \begin{equation*} \Pr(\set{E}|X=a) \triangleq \Pr|_{\{X=a\}}(\set{E}) \end{equation*} % for all $\set{E} \in \Sigma_{\{X=a\}}$. Thus, we can define the \emph{conditional density function} of $Y$ \emph{given} $X=x$ \symdef[]{Iprob.670}{condpdf}{$f_{Y \pipe X}(y \pipe x)$}{Conditional density function for random variable $Y$ given $X=x$}$f_{Y|X}: \extR \times \extR \mapsto [0,\infty]$ by % \begin{equation*} f_{Y|X}(y|x) \triangleq \frac{ f_{XY}(x,y) }{ f_X(x) } \end{equation*} % which will lead to % \begin{equation*} F_{Y|X}(y|x) \triangleq \Pr( Y \leq y | X=x ) = \int_{-\infty}^y f_{Y|X}(z|x) \total z \end{equation*} % where \symdef[]{Iprob.671}{condcdf}{$F_{Y \pipe X}(y \pipe x)$}{Conditional distribution function for random variable $Y$ given $X=x$}$F_{Y|X}: \extR \times \extR \mapsto [0,1]$ is the \emph{conditional distribution function} of random variable $Y$ \emph{given} $X=x$. Similarly, % \begin{equation*} f_{X|Y}(x|y) \triangleq \frac{ f_{XY}(x,y) }{ f_Y(y) } \end{equation*} % which can be used in a similar way to generate conditional distribution function $F_{X|Y}$.
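Because the conditional density is a pointwise ratio of the joint density to a marginal density, it is easy to illustrate with a discrete analogue. The following sketch is our own example with assumed numbers, not from the text: it computes $f_{Y|X}$ from a joint probability mass function on $\{0,1\} \times \{0,1\}$ and checks that each conditional distribution sums to one.

```python
# Discrete analogue (hypothetical example values, not from the text):
# f_{Y|X}(y|x) = f_{XY}(x, y) / f_X(x) for a joint pmf on {0,1} x {0,1}.

joint = {           # f_{XY}(x, y), an assumed example joint pmf
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.40, (1, 1): 0.20,
}

def marginal_X(x):
    # f_X(x) = sum over y of f_{XY}(x, y)
    return sum(p for (xi, _), p in joint.items() if xi == x)

def cond_Y_given_X(y, x):
    # f_{Y|X}(y|x) = f_{XY}(x, y) / f_X(x)
    return joint[(x, y)] / marginal_X(x)

# each conditional distribution must itself sum to one
for x in (0, 1):
    assert abs(cond_Y_given_X(0, x) + cond_Y_given_X(1, x) - 1.0) < 1e-12
```

The same ratio construction, with integrals in place of sums, gives the continuous conditional density defined above.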
\paragraph{Conditional Expectation:} Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and random variables $X: \set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$. We can define the \emph{conditional expectation} of $Y$ given $X=x$ as % \begin{equation*} \E(Y|X=x) \triangleq \int_{-\infty}^\infty y f_{Y|X}(y|x) \total y \end{equation*} % Note that this is a function of $x$. \symdef[]{Iprob.68}{condexp}{$\E(Y \pipe X)$}{Conditional expectation of $Y$ given $X$}Therefore, use the notation % \begin{equation*} \E(Y|X) \triangleq \E(Y|X=X) \end{equation*} % to represent a new random variable generated from the composition of $\E(Y|X=x)$ and $X$. This is called the \emph{conditional expectation of $Y$ given $X$}. It is the case that % \begin{equation*} \E(\E(Y|X)) = \E(Y) \quad \text{ and } \quad \E(\E(X|Y)) = \E(X) \end{equation*} % In fact, for measurable functions $g: \extR \mapsto \extR$ and $h: \extR \mapsto \extR$, % \begin{equation*} \E( g(X) h(Y) ) = \E( \E( g(X) h(Y) | Y ) ) = \E( h(Y) \E( g(X) | Y ) ) \end{equation*} % This is a useful fact. Note that it implies % \begin{equation} \E( X Y ) = \E( Y \E( X | Y ) ) \quad \text{ and } \quad \E( X ) = \E( \E( X | Y ) ) \label{eq:expectation_to_condexp} \end{equation} % These two relationships can be especially useful if the range of $Y$ is countable or finite. \paragraph{Uncorrelated Random Variables:} Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and random variables $X: \set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$. To say $X$ and $Y$ are \emph{uncorrelated} means that their covariance is zero (\ie, $\cov(X,Y) = \cov(Y,X) = 0$). Equivalently, to say $X$ and $Y$ are uncorrelated means that % \begin{equation*} \E(XY) = \E(X) \E(Y) \end{equation*} \paragraph{Statistically Independent Random Variables:} Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and random variables $X: \set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$ and $a,b \in \R$.
Random variables $X$ and $Y$ are said to be \emph{(statistically) independent} or \emph{(statistically) pairwise independent} if, for all $\set{E}_X,\set{E}_Y \in \Borel(\extR)$, % \begin{equation*} \Pr( X \in \set{E}_X, Y \in \set{E}_Y ) = \Pr( X \in \set{E}_X ) \Pr( Y \in \set{E}_Y ) \end{equation*} % In fact, it can be shown that $X$ and $Y$ are statistically independent if and only if % \begin{equation*} F_{XY}(x,y) = F_X(x) F_Y(y) \quad \text{ or } \quad f_{XY}(x,y) = f_X(x) f_Y(y) \end{equation*} % for all $x,y \in \extR$. Note that the condition that $f_{XY}(x,y) = f_X(x) f_Y(y)$ for all $x,y \in \extR$ is equivalent to requiring that % \begin{equation*} f_{X|Y}(x|y) = f_X(x) \quad \text{ and } \quad f_{Y|X}(y|x) = f_Y(y) \end{equation*} % In other words, this is also equivalent to statistical independence. Now, assume that $X$ and $Y$ are statistically independent. Also take $g: \extR \mapsto \extR$ and $h: \extR \mapsto \extR$ to be two measurable functions. It is the case that % \begin{equation*} \E( g(X) h(Y) ) = \E( g(X) ) \E( h(Y) ) \end{equation*} % In fact, % \begin{equation*} \E( X Y ) = \E(X) \E(Y) \end{equation*} % Therefore, statistical independence implies uncorrelatedness. Note, however, that the converse is not necessarily true. Additionally, because these two random variables are uncorrelated (since they are statistically independent), % \begin{equation*} \var(X + Y) = \var(X) + \var(Y) \end{equation*} % Now take random variable $Z: \set{U} \mapsto \extR$ defined by % \begin{equation*} Z(\zeta) \triangleq X(\zeta) + Y(\zeta) \end{equation*} % for all $\zeta \in \set{U}$. Since $X$ and $Y$ are independent random variables, it can be shown that for all $z \in \extR$, % \begin{equation*} f_Z(z) = (f_X * f_Y)(z) \end{equation*} % where $*$ denotes convolution, which is discussed in \longref{app:math_convolution}. \paragraph{Pairwise Independent Random Variables:} Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$.
Also take $\set{N} \subseteq \N$ and the family $(X_n)_{n \in \set{N}}$ where $X_i: \set{U} \mapsto \extR$ is a random variable for each $i \in \set{N}$. Assume that for any family $(a_n)_{n \in \set{N}}$ such that $a_n \in \R$ for all $n \in \set{N}$, for any $i,j \in \set{N}$ with $i \neq j$, % \begin{equation*} \Pr\left(\{ X_i \leq a_i \} \cap \{ X_j \leq a_j \}\right) = \Pr\left(\{ X_i \leq a_i \}\right) \Pr\left(\{ X_j \leq a_j \}\right) \end{equation*} % These random variables are said to be \emph{pairwise independent}. This is equivalent to the statistical independence described above. \paragraph{Mutually Independent Random Variables:} Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$. Also take $\set{N} \subseteq \N$ and the family $(X_n)_{n \in \set{N}}$ where $X_i: \set{U} \mapsto \extR$ is a random variable for each $i \in \set{N}$. Assume that for any family $(a_n)_{n \in \set{N}}$ such that $a_n \in \R$ for all $n \in \set{N}$, % \begin{equation*} \Pr\left( \bigcap \left\{ \{ X_i \leq a_i \} : i \in \set{N} \right\} \right) = \prod\limits_{i \in \set{N}} \Pr( \{ X_i \leq a_i \} ) \end{equation*} % These random variables are said to be \emph{mutually independent}. Note that any collection of mutually independent random variables are necessarily pairwise independent as well. \paragraph{Independent and Identically Distributed Random Variables:} Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$. Also take $\set{N} \subseteq \N$ and the family $(X_n)_{n \in \set{N}}$ where $X_i: \set{U} \mapsto \extR$ is a random variable for each $i \in \set{N}$. If these random variables all have the same distribution (\ie, they are identically distributed) and are all \emph{mutually} independent, they are said to be \emph{\acro[\defarg][IID]{\iid}{independent and identically distributed}}. \subsection{Random Vectors} Take $n \in \N$ and an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$. 
Now take an indexed family $(X_i)_{i=1}^n$ where $X_i: \set{U} \mapsto \extR$ is a random variable for all $i \in \{1,2,\dots,n\}$. Denote the $n$-tuple $(X_1,X_2,X_3,\dots,X_n)$ by $\v{X}$. Thus, $\v{X}: \set{U} \mapsto \extR^n$ is called a \emph{random vector} or an \emph{$n$-dimensional random vector}. Of course, if $n=1$ then the random vector is simply a random variable (which may also be called a one-dimensional random vector). \subsection{Common Random Variables} There are a number of common random variables used in applications. We define a few here. For each of these, take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$. The random variable being defined is function $X: \set{U} \mapsto \extR$. % \begin{description} \item\emph{The Constant Function:} Assume that there exists some $c \in \extR$ such that $X(\zeta)=c$ for all $\zeta \in \set{U}$. That is, $X$ is \emph{constant}. Clearly, its probability density function is % \begin{equation*} f_X( x ) = \delta(x-c) \end{equation*} % That is, all of its probability mass is concentrated on $\{ X = c \}$. Indeed, $\{X=c\}=\set{U}$ and so $\Pr( X=c ) = 1$ trivially. Notice that % \begin{equation*} \E( X ) = c \quad \text{ and } \quad \var(X) = 0 \end{equation*} \item\emph{The Bernoulli Random Variable:} Take some $p \in [0,1]$. Assume $X$ is a \emph{Bernoulli random variable}. This means that $X$ has the probability density function % \begin{equation*} f_X( x ) = (1-p) \delta(x) + p \delta(x-1) \end{equation*} % Clearly, $\range(X)=\{0,1\}$ and so $\{ X = 0 \} \cup \{X = 1\} = \set{U}$. In particular, % \begin{equation*} \Pr( X = 0 ) = 1-p \quad \text{ and } \quad \Pr( X = 1 ) = p \end{equation*} % Notice that % \begin{equation*} \E( X ) = p \quad \text{ and } \quad \var(X) = p(1-p) \end{equation*} % If $(X_n)$ is a sequence of \iid{}\ Bernoulli random variables then $X_n$ is called a \emph{Bernoulli trial} for each $n \in \N$.
The Bernoulli random variable can be viewed as a weighted coin flip (\ie, $\set{U} = \{ \text{heads}, \text{tails} \}$), where the event $\{X = 0\} = \{ \text{tails} \}$ and the event $\{X=1\} = \{ \text{heads} \}$. If its parameter $p=0.5$ then the outcome is equally likely to be \emph{heads} or \emph{tails}; if its parameter is $p=0.80$ then there is a much greater chance that the outcome will be \emph{heads}. Take some random variable $Y: \set{U} \mapsto \extR$ such that % \begin{equation*} f_{Y|X}(y|1) = f_{Y}(y) \end{equation*} % Note that this is a weak kind of statistical independence. It implies that $\E(Y|X=1)=\E(Y)$. By the definition of a Bernoulli random variable, it is then necessary that $X$ and $Y$ are uncorrelated (\ie, $\E(XY)=\E(X)\E(Y)$). Now, take some $n \in \N$. Notice that for all $\zeta \in \set{U}$, $X^n(\zeta) = X(\zeta)$. Therefore, since $X$ is a Bernoulli random variable with parameter $p$, $X^n$ is also a Bernoulli random variable with parameter $p$ for all $n \in \N$. Thus, % \begin{equation*} \E( X^n ) = p \quad \text{ and } \quad \var( X^n ) = p(1-p) \end{equation*} % In fact, any statistical properties endowed to $X$ will be inherited by $X^n$. For example, for a random variable $Y: \set{U} \mapsto \extR$ such that $X$ and $Y$ are uncorrelated, it is also the case that $X^n$ and $Y$ are uncorrelated (\ie, if $\E(XY)=p \E(Y)$ then $\E(X^n Y)=\E(XY)=p \E(Y)$). This is a special property of Bernoulli random variables. \item\emph{The Poisson Random Variable:} Take $\lambda \in \R_{>0}$. Assume $X$ is a \emph{Poisson random variable}. This means that $X$ has the probability density function % \begin{equation*} f_X( x ) = \sum\limits_{k=0}^\infty \frac{ \exp(-\lambda) \lambda^k }{ k! } \delta( x - k ) \end{equation*} % and so $\Pr( X = k ) = \exp(-\lambda) \lambda^k / k!$ for all $k \in \W$. Clearly, $\range(X)=\W$.
Notice that % \begin{equation*} \E( X ) = \lambda \quad \text{ and } \quad \var(X) = \lambda \end{equation*} % Such a random variable is said to be \emph{Poisson distributed} or \emph{Poissonian}. \item\emph{The Continuous Uniform Random Variable:} Take $a,b \in \R$ with $a < b$. Assume $X$ is a \emph{continuous uniform random variable}. This means that $X$ has the probability density function % \begin{equation*} f_X( x ) = \begin{cases} \frac{1}{b-a} &\text{if } x \in [a,b]\\ 0 &\text{otherwise} \end{cases} \end{equation*} % Clearly, $\range(X)=[a,b]$. Notice that % \begin{equation*} \E( X ) = \frac{a + b}{2} \quad \text{ and } \quad \var(X) = \frac{ (b-a)^2 }{12} \end{equation*} % Such a random variable is said to be \emph{uniformly distributed} on $[a,b]$. \item\emph{The Exponential Random Variable:} Take $\lambda \in \R_{>0}$. Assume $X$ is an \emph{exponential random variable}. This means that $X$ has the probability density function % \begin{equation*} f_X( x ) = \begin{cases} \lambda \exp( -\lambda x ) &\text{if } x \geq 0\\ 0 &\text{if } x < 0 \end{cases} \end{equation*} % Clearly, $\range(X)=[0,\infty)$. Notice that % \begin{equation*} \E( X ) = \frac{1}{\lambda} \quad \text{ and } \quad \var(X) = \frac{1}{\lambda^2} \end{equation*} % Note that for all $a,b \in \R_{>0}$, % \begin{equation*} \Pr(X > a + b | X > b) = \Pr(X > a) \end{equation*} % which follows since $\Pr( X > x ) = \exp(-\lambda x)$ for all $x \geq 0$, and so $\Pr(X > a+b | X > b) = \exp(-\lambda(a+b))/\exp(-\lambda b) = \exp(-\lambda a)$. Therefore, this random variable has the \emph{memoryless property}. A random variable with this distribution is said to be \emph{exponentially distributed}. \item\emph{The Erlang Random Variable:} Take $\lambda \in \R_{>0}$ and $k \in \N$. Assume $X$ is an \emph{Erlang random variable}. This means that $X$ has the probability density function % \begin{equation*} f_X( x ) = \begin{cases} \frac{\lambda(\lambda x)^{k-1}\exp(-\lambda x)}{(k-1)!} &\text{if } x \geq 0\\ 0 &\text{if } x < 0 \end{cases} \end{equation*} % Clearly, $\range(X)=[0,\infty)$.
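It is a standard fact (stated here without proof) that if $X_1, X_2, \dots, X_k$ are \iid{}\ exponential random variables, each with parameter $\lambda$, then the sum $X_1 + X_2 + \dots + X_k$ is an Erlang random variable with parameters $\lambda$ and $k$; in particular, the exponential random variable is the Erlang random variable with $k=1$. Since expectations are additive and variances of independent random variables are additive, this view makes the mean and variance below transparent.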
Notice that % \begin{equation*} \E( X ) = \frac{k}{\lambda} \quad \text{ and } \quad \var(X) = \frac{k}{\lambda^2} \end{equation*} % Such a random variable is said to be \emph{Erlang distributed} or \emph{Erlang-$k$ distributed}. \end{description} \section{Random Processes} \label{app:probability_rp} Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and some $n \in \N$. Now take a totally ordered set $\set{A}$ and a net \symdef[]{Iprob.70}{randomprocess}{$( \v{N}(t) : t \in \R_{\geq 0})$}{Random process (\ie, $\v{N}(t)$ is a random vector for all $t \in \R_{\geq 0}$)}$(\v{X}(t) : t \in \set{A})$ such that $\v{X}(t): \set{U} \mapsto \extR^n$ is an $n$-dimensional random vector for all $t \in \set{A}$. This is known as an \emph{($n$-dimensional) stochastic process} or an \emph{($n$-dimensional) random process}. \subsection{Continuous and Discrete Time Processes} Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$. Assume that this experiment runs over some period of time. Therefore, the experiment can be viewed as having an outcome at each instant of time. Assume that the outcomes are characterized by $n \in \N$ random variables. The experiment's time may be viewed in two distinct ways % \begin{description} \item\emph{Continuous Time:} Time ranges over a continuum of values taken from $\R_{\geq 0}$ (or, more generally, an uncountable subset of $\R_{\geq 0}$). That is, any $t \in \R_{\geq 0}$ is of interest. In this case, we can bundle those $n$ random variables into a random vector $\v{X}(t): \set{U} \mapsto \extR^n$ where $t \in \R_{\geq 0}$ is an instant of time. Therefore, $( \v{X}(t): t \in \R_{\geq 0})$ is called an \emph{($n$-dimensional) continuous-time random process} (\ie, the process is a net but not a sequence). \item\emph{Discrete Time:} Time ranges over a countable set of values taken from $\N$ (or, more generally, some countable set isomorphic to $\N$). That is, any $t \in \N$ is of interest.
In this case, we say that time has been \emph{discretized} and we can bundle those $n$ random variables into a random vector $\v{X}(t): \set{U} \mapsto \extR^n$ where $t \in \N$ is an instant of time that comes immediately after instant $(t-1)$. Therefore, $( \v{X}(t): t \in \N)$ or simply $(\v{X}(t))$ is called an \emph{($n$-dimensional) discrete-time random process} (\ie, the process is a sequence). \end{description} % In both cases, each time might be viewed as a different \emph{trial} of a particular random variable, where a continuous-time random process is the limit as the density of trials (with respect to some interesting outcome) increases. \paragraph{Markov Processes and Chains:} Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and some $n \in \N$. Also take an $n$-dimensional random process $(\v{X}(t) : t \in \set{T})$ on this probability space where $\set{T} \subseteq \R$ (\ie, this may be a continuous-time or a discrete-time process). Additionally, take $\v{x}: \R \mapsto \extR^n$ to be some function of time and $\v{y} \in \extR^n$ to be some constant. Assume that it is the case that for any $t \in \R_{\geq 0}$ and any $h \in \R_{>0}$, % \begin{equation*} \Pr( \v{X}(t+h)=\v{y} | \v{X}(s) = \v{x}(s) \text{ for all } s \leq t ) = \Pr( \v{X}(t+h)=\v{y} | \v{X}(t) = \v{x}(t) ) \end{equation*} % That is, given the current state of the process, knowledge of any of the past states of the process has no impact on the probability of the future states of the process. It might be said that this process has no memory since its future trajectory depends only on its present state and not on any of its past states. This is known as the \emph{Markov property} and such a process is called a \emph{Markov process}. If this is a discrete-time random process, it will be called a \emph{Markov chain}.
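As a simple illustration (included only as an example), take a sequence $(W(t) : t \in \N)$ of \iid{}\ Bernoulli random variables on this probability space and define a discrete-time random process $(X(t) : t \in \W)$ by $X(0) \triangleq 0$ and % \begin{equation*} X(t) \triangleq X(t-1) + W(t) \end{equation*} % for all $t \in \N$. Because $W(t)$ is independent of $(X(0),X(1),\dots,X(t-1))$, the distribution of $X(t)$ conditioned on the entire past of the process depends only on $X(t-1)$. Hence, this process has the Markov property and is a Markov chain (in fact, a counting process that counts successful Bernoulli trials).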
\subsection{Sure and Almost Sure Stochastic Convergence} Take a totally ordered set $\set{A} \subseteq \extR$ such that $\infty$ is a limit point of $\set{A}$ in $\extR$ (\eg, $\set{A} = \N$ or $\set{A} = \R$). Also take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$. With these, define a random process $( Y_t : t \in \set{A} )$ where $Y_t: \set{U} \mapsto \extR$ is a random variable (\ie, a one-dimensional random vector) for each $t \in \set{A}$. Define the set $\Omega \subseteq \set{U}$ by % \begin{equation*} \Omega \triangleq \{ \zeta \in \set{U} : \text{there exists } p \in \extR \text{ such that } Y_t(\zeta) \to p \text{ as } t \to \infty \} \end{equation*} % Now define function $Y: \Omega \mapsto \extR$ by % \begin{equation*} Y(\zeta) \triangleq \lim\limits_{t \to \infty} Y_t(\zeta) \end{equation*} % for all $\zeta \in \Omega$. Note that $\Omega$ may be a proper subset of $\set{U}$, and so $Y$ may not be a random variable in general. Additionally, even if $\Omega = \set{U}$, there is no guarantee that $Y$ is a Borel measurable function. Therefore, $Y$ should simply be viewed as a function with domain $\Omega$ and codomain $\extR$. Of course, it may be the case that there exists some $c \in \extR$ such that $Y(\zeta)=c$ for all $\zeta \in \Omega$; in fact, this is often the case of most interest in applications. However, here $\Omega$ is of critical interest. % \begin{description} \item\emph{Sure Convergence:} To say that $( Y_t: t \in \set{A})$ \emph{converges surely (to $Y(\zeta)$)} or \emph{converges (to $Y(\zeta)$) everywhere} means that $\Omega = \set{U}$.
\symdef[]{Iprob.72}{sureconvergence}{$Y(t) \to Y$}{Random process $Y(t)$ converges surely to $Y$}\symdef[]{Iprob.7201}{ssureconvergence}{$Y(t) \xto{s.} Y$}{Random process $Y(t)$ converges surely to $Y$}\symdef[]{Iprob.7202}{slimsureconvergence}{$\lim \limits_{t\to\infty} Y(t) = Y$}{Random process $Y(t)$ converges surely to $Y$}In this case, it is written % \begin{equation*} Y_t \to Y \quad \text{ or } \quad Y_t \xto{s.} Y \quad \text{ or } \quad \lim\limits_{t \to \infty} Y_t = Y \end{equation*} % and this is called \emph{sure convergence} or \emph{everywhere convergence}. \item\emph{Almost Sure Convergence:} To say that $( Y_t: t \in \set{A} )$ \emph{converges almost surely (to $Y(\zeta)$)} or \emph{converges (to $Y(\zeta)$) with probability 1} or \emph{converges (to $Y(\zeta)$) almost everywhere} as $t \to \infty$ means that $\Pr( \Omega ) = 1$. \sym{Iprob.7301}{$Y(t) \xto {a.s.} Y$}{Random process $Y(t)$ converges almost surely (\ie{}, $\Pr(\lim_{t \to \infty} Y(t) = Y) = 1$) to $Y$}\symdef[]{Iprob.7302}{asureconvergencewp1}{$Y(t) \xto{w.p.1} Y$}{Random process $Y(t)$ converges almost surely (\ie, with probability 1) to $Y$}\symdef[]{Iprob.7303}{asureconvergenceaslim}{$\aslim \limits_{t \to \infty} Y(t) = Y$}{Random process $Y(t)$ converges almost surely (\ie, with probability 1) to $Y$}In this case, it is written % \begin{equation*} Y_t \xto{a.s.} Y \quad \text{ or } \quad Y_t \xto{w.p.1} Y \quad \text{ or } \quad \aslim\limits_{t \to \infty} Y_t = Y \end{equation*} % and this is called \emph{\acro{AS}{almost sure} convergence} or \emph{almost everywhere convergence}. \end{description} \subsection{Stochastic Convergence to Random Variables} Take a totally ordered set $\set{A} \subseteq \extR$ such that $\infty$ is a limit point of $\set{A}$ in $\extR$ (\eg, $\set{A} = \N$ or $\set{A} = \R$). Also take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$. 
With these, define a random process $( Y_t : t \in \set{A} )$ where $Y_t: \set{U} \mapsto \extR$ is a random variable (\ie, a one-dimensional random vector) for each $t \in \set{A}$. Additionally, define a random variable $Y: \set{U} \mapsto \extR$. There are four cases of interest. % \begin{description} \item\emph{Convergence in Probability:} \symdef[]{Iprob.7401}{convergenceinp}{$Y(t) \xto{P} Y$}{Random process $Y(t)$ converges in probability to random variable $Y$}\symdef[]{Iprob.7402}{convergenceinpr}{$Y(t) \xto{\Pr} Y$}{Random process $Y(t)$ converges in probability to random variable $Y$}\symdef[]{Iprob.7403}{convergenceinplim}{$\plim\limits_{t \to \infty} Y(t) = Y$}{Random process $Y(t)$ converges in probability to random variable $Y$}To say $(Y_t: t \in \set{A})$ \emph{converges in probability} to random variable $Y$, denoted % \begin{equation*} Y_t \xto{P} Y \quad \text{ or } \quad Y_t \xto{\Pr} Y \quad \text{ or } \quad \plim\limits_{t \to \infty} Y_t = Y \end{equation*} % means that for all $\varepsilon \in \R_{>0}$, % \begin{equation*} \Pr( |Y_t - Y| > \varepsilon ) \to 0 \text{ as } t \to \infty \end{equation*} % or, equivalently, % \begin{equation*} \Pr( |Y_t - Y| \leq \varepsilon ) \to 1 \text{ as } t \to \infty \end{equation*} \item\emph{Convergence in Mean:} \symdef[]{Iprob.7501}{meanconvergence}{$Y(t) \xto{m.} Y$}{Random process $Y(t)$ converges in the mean to random variable $Y$}\symdef[]{Iprob.7502}{limeanconvergence}{$\limean \limits_{t \to \infty} Y(t) = Y$}{Random process $Y(t)$ converges in the mean to random variable $Y$ (\ie, $Y$ is \emph{l}imit \emph{i}n the \emph{m}ean)}To say $(Y_t: t \in \set{A})$ \emph{converges in the mean} to random variable $Y$, denoted % \begin{equation*} Y_t \xto{m.} Y \quad \text{ or } \quad \limean\limits_{t \to \infty} Y_t = Y \end{equation*} % means that % \begin{equation*} \lim\limits_{t \to \infty} \E( |Y_t - Y| ) = 0 \end{equation*}
% where $Y$ is called the \emph{limit in the mean}. \item\emph{Mean-Square Convergence:} \symdef[]{Iprob.7503}{msconvergence}{$Y(t) \xto{m.s.} Y$}{Random process $Y(t)$ converges in the mean square to random variable $Y$}\symdef[]{Iprob.7504}{mslimconvergence}{$\mslim\limits_{t \to \infty} Y(t) = Y$}{Random process $Y(t)$ converges in the mean square to random variable $Y$}To say $(Y_t: t \in \set{A})$ \emph{converges in the mean square} to random variable $Y$, denoted % \begin{equation*} Y_t \xto{m.s.} Y \quad \text{ or } \quad \mslim\limits_{t \to \infty} Y_t = Y \end{equation*} % means that % \begin{equation*} \lim\limits_{t \to \infty} \E( (Y_t - Y)^2 ) = 0 \end{equation*} % where this type of convergence is called \emph{\acro{MS}{mean-square} convergence}. \item\emph{Convergence in Distribution:} \symdef[]{Iprob.7601}{convergenceind}{$Y(t) \xto{D} Y$}{Random process $Y(t)$ converges in distribution to random variable $Y$}\symdef[]{Iprob.7602}{convergenceinsmalld}{$Y(t) \xto{d} Y$}{Random process $Y(t)$ converges in distribution to random variable $Y$}\symdef[]{Iprob.7603}{convergenceindlim}{$\dlim \limits_{t \to \infty} Y(t) = Y$}{Random process $Y(t)$ converges in distribution to random variable $Y$}To say $(Y_t: t \in \set{A})$ \emph{converges in distribution} to random variable $Y$, denoted % \begin{equation*} Y_t \xto{D} Y \quad \text{ or } \quad Y_t \xto{d} Y \quad \text{ or } \quad \dlim\limits_{t \to \infty} Y_t = Y \end{equation*} % means that % \begin{equation*} \lim\limits_{t \to \infty} F_{Y_t}(x) = F_{Y}(x) \end{equation*} % for all points $x \in \extR$ where $F_{Y}$ is continuous. \end{description} \subsection{Relationships Among Kinds of Stochastic Convergence} Take a totally ordered set $\set{A} \subseteq \extR$ such that $\infty$ is a limit point of $\set{A}$ in $\extR$ (\eg, $\set{A} = \N$ or $\set{A} = \R$). Also take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$.
With these, define a random process $( Y_t : t \in \set{A} )$ where $Y_t: \set{U} \mapsto \extR$ is a random variable (\ie, a one-dimensional random vector) for each $t \in \set{A}$. Additionally, define a function $Y: \set{U} \mapsto \extR$. First, note that % \begin{equation*} \text{ If } Y_t \to Y \text{ then } Y_t \xto{a.s.} Y. \end{equation*} % Now assume that $Y$ is a random variable. In this case, % \begin{itemize} \item if $Y_t \xto{m.s.} Y$ then $Y_t \xto{m.} Y$ \item if $Y_t \xto{a.s.} Y$ then $Y_t \xto{P} Y$ \item if $Y_t \xto{m.} Y$ then $Y_t \xto{P} Y$ \item if $Y_t \xto{P} Y$ then $Y_t \xto{D} Y$ \end{itemize} % Thus, \ac{MS} convergence and \ac{AS} convergence are of particular interest in applications as they are relatively strong forms of stochastic convergence. %That is, %% %\begin{equation*} % \text{m.s.} \implies \text{m.} \implies \text{P} \implies % \text{D} % \quad \text{ and } \quad % \text{a.s.} \implies \text{P} \implies \text{D} %\end{equation*}
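Two of these implications follow from standard inequalities, which we record here for completeness. By the Cauchy--Schwarz inequality, % \begin{equation*} \E( |Y_t - Y| ) \leq \sqrt{ \E( (Y_t - Y)^2 ) } \end{equation*} % and so \ac{MS} convergence implies convergence in the mean. Similarly, by the Markov inequality, for all $\varepsilon \in \R_{>0}$, % \begin{equation*} \Pr( |Y_t - Y| > \varepsilon ) \leq \frac{ \E( |Y_t - Y| ) }{ \varepsilon } \end{equation*} % and so convergence in the mean implies convergence in probability.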