% Upper-case    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
% Lower-case    a b c d e f g h i j k l m n o p q r s t u v w x y z
% Digits        0 1 2 3 4 5 6 7 8 9
% Exclamation   !           Double quote "          Hash (number) #
% Dollar        $           Percent      %          Ampersand     &
% Acute accent  '           Left paren   (          Right paren   )
% Asterisk      *           Plus         +          Comma         ,
% Minus         -           Point        .          Solidus       /
% Colon         :           Semicolon    ;          Less than     <
% Equals        =           Greater than >          Question mark ?
% At            @           Left bracket [          Backslash     \
% Right bracket ]           Circumflex   ^          Underscore    _
% Grave accent  `           Left brace   {          Vertical bar  |
% Right brace   }           Tilde        ~

% ---------------------------------------------------------------------|
% --------------------------- 72 characters ---------------------------|
% ---------------------------------------------------------------------|
%
% Optimal Foraging Theory Revisited: Appendix. Mathematical Background
% (this material not included in final version of document)
%
% (c) Copyright 2007 by Theodore P. Pavlic
%

% (it would be best to split this chapter into multiple files someday;
%  it is a long book in one file at the moment)
\chapter{Mathematical Background}
\label{app:math}
\sym{*conventions}{{[\texttt{xx}]}}{see reference number \texttt{xx} in
\hyperref[ch:bibliography]{the bibliography}}

This \appname{} is meant to provide most of the mathematical knowledge
required for understanding of our models and arguments. In order for
this material to be useful to a diverse audience, we develop nearly all
mathematical theory from first principles. That being said, we will
immediately make use of the symbols \symdef{Ageneral.0}{equals}{$=$}{is
equal to} and \symdef{Ageneral.0}{definedas}{$\triangleq$}{defined as}.
The former indicates that some quantity is \emph{equal} to another
quantity and the second quantity indicates that some quantity is
\emph{defined as} another quantity. The former will be used in the
conclusions of arguments, and the latter will be used to define symbols
useful in those arguments. This difference between $\triangleq$ and $=$
will be more clear in our examples. We also make use of general counting
principles that are surely well-understood by any member of our
audience. All other concepts and notation will be defined as needed.

This \appname{} focusses on set theory, algebra, number systems, real
analysis, elementary measure theory, and propositional logic. We will
give references for each of these individual topics when appropriate.
However, \citet{Stoll79} provides a detailed unified treatment of the
sets, algebra, numbers, and logic that could easily replace most of this
\appname{}.

\section{Sets}
\label{app:math_sets}

This is meant to be a brief introduction to set theory, a topic on which
nearly all mathematics can be constructed. While there are some
alternative foundation candidates for mathematics, set theory is
commonly used, and we consider it to be the foundation of all of the
constructs that we use. Common applications of mathematics (\eg,
\emph{arithmetic}) do not make their set theoretic foundations explicit.
However, set theory is extensively used explicitly in the study of
probability and random processes. We focus on the few set-theoretic
concepts that we use. \Citet{Viniotis98} provides another useful
appendix on set theory that contains additional examples and
definitions. \Citet{JW96} give a complete introduction to set theory.
Set theory is fundamentally related to formal logic, which is discussed
in \longref{app:math_logic}, and thus analogies between set-theoretic
and logical constructs are not coincidental. While the actual history of
set theory and formal logic is more complicated, we will view formal
logic as a specialization of set theory. As mentioned, modern set theory
generalizes nearly all of formal mathematics and thus is an important
fundamental concept.

\subsection{Sets: Definition and Examples}

A \symdef[\emph{set}]{Csets.0}{set}{$\set{X}$}{a set
$\set{X}$}\symdef[]{Csets.1a}{longset}{$\{a,b,c\}$}{a set of objects
$a$, $b$, and $c$} is a roughly a collection of distinct items, where an
item is any abstract object. This definition follows from \emph{naive
(or intuitive) set theory}. Unfortunately, this definition is not
rigorous and can lead to the construction of paradoxical sets. The
modern definition of a set follows from \emph{axiomatic set theory}
(\ie, \emph{\acro[]{ZFC}{Zermelo-Fraenkel set theory with the axiom of
choice assumed}}, which is \emph{\acro{ZF}{Zermelo-Fraenkel set theory}}
with the \emph{axiom of choice} also assumed), which prevents these
paradoxes by defining a set as an object that satisfies certain specific
mathematical axioms. These axioms endow sets with important
characteristics on which modern set theory is built. A proper handling
of set theory would define a set using these axioms; however, for
brevity, we give the naive set-theoretic definition. By doing this, we
risk leading the reader into paradoxes of logic \citep[for details,
see][]{JW96}; however, the theory used in the rest of our work depends
upon the modern axiomatic definition.

\paragraph{Notation:} When sets are listed explicitly, their elements
are usually separated by commas and bracketed with curly braces. Because
sets are abstract entities, they are often specified with words. The
following are some example sets that we will use throughout this
\appname{}.
%
\begin{subequations}\label{eq:ex_sets}
\begin{align}
        \set{Z} &\triangleq \{\text{The people in the living room right
        now}\}
        \label{eq:ex_set_Z}\\
        \set{Q} &\triangleq \{\text{The objects that could fit inside a
        cube with $1 \text{ m}^3$ volume}\}
        \label{eq:ex_set_Q}\\
        \set{J} &\triangleq \{\text{Statements made by Joe}\}
        \label{eq:ex_set_J}\\
 %       \set{S} &\triangleq \{\text{The four different outcomes of two
 %       successive coin tosses}\}\\
 %       &= \{ (\text{Tails},\text{Tails}),
 %       (\text{Tails},\text{Heads}), (\text{Heads},\text{Tails}),
 %       (\text{Heads},\text{Heads}) \}
        \set{S} &\triangleq \{\text{The two different outcomes of a
        single coin toss}\}\nonumber\\
        &= \{ Heads, Tails \}
        \label{eq:ex_set_S}
\end{align}
%
In the last case (\ie, the set $\set{S}$), we show how a set definition
can be made more precise with an enumeration of its specific elements.
We can similarly define the set $\setset{O}$ as
%
\begin{align}
        \setset{O} 
        &\triangleq \{\text{The set of the above examples of sets}\}
        \nonumber\\
        &= \{\set{Z},\set{Q},\set{J},\set{S}\}
        \label{eq:ex_set_O}
\end{align}
\end{subequations}
%
That is, the set $\setset{O}$ is a set of sets. We will typically use
calligraphy for the names of sets (\eg, $\set{A}$) and script for the
names of sets of sets (\eg, $\setset{A}$).

\paragraph{Numbers and Infinite Sets:} It is important that sets can
have elements of other sets. In fact, the set $\{\}$ is different than
the set $\{\{\}\}$ and so $\{ \{\}, \{\{\}\} \}$ is a legal
representation of a set since it contains distinct items. In fact, the
formal definitions of the set of \emph{natural numbers} and the set of
\emph{whole numbers} can each be defined by any of the sets
%
\begin{subequations}
\label{eq:some_countably_infinite_sets}
\begin{align}
        \biggl\{ 
        \{\}, 
        \bigl\{ \{\} \bigr\}, 
        \Bigl\{ \bigl\{\{\}\bigr\} \Bigr\}, 
        \dots 
        \biggr\}
        \label{eq:some_countably_infinite_sets_a}\\
        \Biggl\{ 
        \{\}, 
        \bigl\{ \{\} \bigr\}, 
        \Bigl\{ \{\}, \bigl\{\{\}\bigr\} \Bigr\},
        \biggl\{ \{\}, 
                 \bigl\{\{\}\bigr\}, 
                 \Bigl\{\{\}, \bigl\{\{\}\bigr\}\Bigr\} \biggr\},
        \dots 
        \Biggr\}
        \label{eq:some_countably_infinite_sets_b}\\
        \Biggl\{ 
        \bigl\{ \{\} \bigr\}, 
        \Bigl\{ \{\}, \bigl\{\{\}\bigr\} \Bigr\},
        \biggl\{ \{\}, 
                 \bigl\{\{\}\bigr\}, 
                 \Bigl\{\{\}, \bigl\{\{\}\bigr\}\Bigr\} \biggr\},
        \dots 
        \Biggr\}
        \label{eq:some_countably_infinite_sets_c}
\end{align}
\end{subequations}
%
where \symdef{Csets.1aa}{dots}{$\dots$}{continue the established pattern
\adinfinitum{} (\eg, the infinite set $\{1,2,3,\dots\}$)} indicates that
the established pattern should continue \adinfinitum{}. The pattern in
\longref{eq:some_countably_infinite_sets_a} is that each element set
contains the element before it (\ie, the element to the left of it in
the list). The pattern in \longrefs{eq:some_countably_infinite_sets_b}
and \shortref{eq:some_countably_infinite_sets_c} is that each element
set contains \emph{all} of the sets before it; however, the initial
element set is different in these two examples. Therefore, all three of
these sets are each \emph{infinite sets} since they contain an infinite
(\ie, unbounded) number of elements. The previous example sets
$\set{Z}$, $\set{S}$ and $\setset{O}$ are all \emph{finite sets} since
they contain a finite (\ie, bounded) number of elements. Without further
information, it is not clear whether the sets $\set{Q}$ and $\set{J}$
are infinite or finite sets. The concepts of finite and infinite sets
will be further explored in more below. 

We choose to define the set of whole numbers $\W$ and the set of natural
numbers $\N$ as the infinite sets in
\longrefs{eq:some_countably_infinite_sets_b} and
\shortref{eq:some_countably_infinite_sets_c} respectively. First, note
that every element of the infinite set in
\longref{eq:some_countably_infinite_sets_c} is also an element of the
set in \longref{eq:some_countably_infinite_sets_b}. Now assign familiar
symbols to the elements of these two infinite sets in order to make the
definitions of the \emph{whole numbers} and \emph{natural numbers} more
explicit. The result is
%
\begin{align}
        \W &\triangleq \{0,1,2,3,\dots\}
        \label{eq:whole_numbers}
\end{align}
%
and
%
\begin{align}
        \N &\triangleq \{1,2,3,\dots\}
        \label{eq:natural_numbers}
\end{align}
%
where
%
\begin{subequations}
\begin{align}
        0 &\triangleq \{\}
        \label{eq:zero}\\
        1 &\triangleq \bigl\{ \{\} \bigr\} = \{0\}
        \label{eq:one}\\
        2 
        &\triangleq 
        \Bigl\{ \{\}, \bigl\{\{\}\bigr\} \Bigr\} 
        = 
        \{0,1\}\label{eq:two}\\
        3 
        &\triangleq 
        \biggl\{ \{\}, 
                 \bigl\{\{\}\bigr\}, 
                 \Bigl\{\{\}, \bigl\{\{\}\bigr\}\Bigr\} \biggr\}
        = 
        \{0,1,2\}
        \label{eq:three}\\
        {}&\mathrel{\vdots} {}
        \nonumber
\end{align}
\end{subequations}
%
The justification for the construction process of the whole numbers,
which are used to count things, is as follows. If the universe was
empty, it would be equivalent to the empty set $\{\}$ and would have
zero items in it. Thus, $0 \triangleq \{\}$. Once $0$ was constructed,
the universe would now have one thing in it, and so it would be
represented by $\{0\}$, and thus $1 \triangleq \{0\}$. This construction
process can continue \adinfinitum{} until all of the whole numbers (\ie,
all elements of $\W$) are defined. We will discuss how \emph{arithmetic}
can be defined on the sets $\W$ and $\N$ in \longref{app:math_numbers}.
However, for the moment we will use these as two example infinite sets
and each whole number as an example finite set. To demonstrate how
arbitrary finite sets can interact, we also introduce example finite
sets
%
\begin{align*}
        \set{A} &\triangleq \{a,b,c\}\\
        \set{B} &\triangleq \{c,d,e\}\\
        \set{C} &\triangleq \{b\}\\
        \set{D} &\triangleq \{d,e\}\\
        \set{E} &\triangleq \{c,d,e\}
\end{align*}
%
where $a,b,c,d,e$ are arbitrary abstract objects. Additionally, since
the generic \emph{empty set} will frequently be used in discussion, we
will often denote it with the symbol
\symdef{Csets.1b}{emptyset}{$\emptyset$}{the empty set (\ie, $\{\}$)}
which is defined $\emptyset \triangleq \{\}$. If a set is not the empty
set, it will be called \emph{nonempty}. Also, note that the set
$\set{C}$ only includes a single element. In this case, the set is
called a \emph{singleton set}.

\subsection{Set Inclusion, Set Exclusion, Subsets, and Supersets}

There are a number of terms that capture the relationship between two
sets or a set and its elements.

\paragraph{Inclusion and Exclusion:} The notation
\symdef[]{Csets.1b}{in}{$\in$}{is an element of (\ie, set inclusion)}$a
\in \set{A}$ indicates that object $a$ is an \emph{element} of set
$\set{A}$, and $\set{A}$ is said to \emph{contain} the \emph{set}
$\{a\}$. Similarly, \symdef[]{Csets.1b}{notin}{$\notin$}{is not an
element of (\ie, set exclusion)}$a \notin \set{B}$ denotes that object
$a$ is not an element of set $\set{B}$. 

\paragraph{Containment:} Since every element of set $\set{D}$ is also an
element of set $\set{B}$, set $\set{D}$ is called a \emph{subset} of set
$\set{B}$ and set $\set{B}$ is called a \emph{superset} of set
$\set{D}$; this is denoted by either
\symdef[]{Csets.1c}{subsupseteq}{$\subseteq$ ($\supseteq$)}{is a subset
(superset) of}$\set{D} \subseteq \set{B}$ or $\set{B} \supseteq
\set{D}$. In this case, we say that $\set{D}$ \emph{is contained in}
$\set{B}$ or $\set{B}$ \emph{contains} $\set{D}$. In particular, note
that since $a \in \set{A}$, $\{a\} \subseteq \set{A}$, and so $\set{A}$
is said to \emph{contain} $\{a\}$.

\paragraph{Equality:} \symdef[]{Csets.1d}{setequal}{$\set{X} =
\set{Y}$}{set $\set{X}$ is equal to set $\set{Y}$ (\ie, $\set{X}
\subseteq \set{Y}$ and $\set{Y} \subseteq
\set{X}$)}\symdef[]{Csets.1d}{setnotequal}{$\set{X} \neq \set{Y}$}{set
$\set{X}$ is not equal to set $\set{Y}$}Two sets are \emph{equal} when
one set is both a subset and a superset of the other set. Otherwise, the
two sets are not equal. For example, since $\set{E} \subseteq \set{B}$
and $\set{E} \supseteq \set{B}$ then $\set{E}$ and $\set{B}$ are equal,
denoted $\set{E} = \set{B}$. However, since $\set{C} \subseteq \set{A}$
but set $\set{A}$ is not a subset of set $\set{C}$ then set $\set{A}$
and set $\set{C}$ are not equal, denoted $\set{A} \neq \set{C}$.

\paragraph{Strict Containment:} \symdef[]{Csets.1c}{subsupset}{$\subset$
($\supset$)}{is a proper/strict subset (superset) of}More generally,
when one set is a subset of another set but the sets are not equal then
the subset is called a \emph{proper (or strict) subset} and the superset
is a \emph{proper (or strict) superset}. From the previous example,
$\set{C} \subset \set{A}$ or $\set{A} \supset \set{C}$ both denote that
set $\set{C}$ is a proper subset of set $\set{A}$ and $\set{A}$ is a
proper superset of set $\set{C}$. In this case, we say that $\set{C}$
\emph{is strictly contained in} $\set{A}$ or $\set{A}$ \emph{strictly
contains} $\set{C}$. Since $a \in \set{A}$ and $\{a\} \neq \set{A}$ then
$\{a\} \subset \set{A}$, and so $\set{A}$ is said to \emph{strictly
contain} $\{a\}$ or \emph{contain $\{a\}$ strictly}. Note that some
authors omit symbols for strict containment and use the symbols
$\subset$ and $\supset$ to represent containment in general.

\paragraph{Containment of Empty Set:} The empty set $\emptyset$ is a
subset of every set. Thus, $\emptyset \subseteq \set{A}$ and $\emptyset
\subseteq \{\}$. In fact, $\emptyset \subset \set{A}$; every set
contains the empty set with \emph{strict} containment if and only if the
set is nonempty (\ie, if the set is nonempty then the set strictly
contains $\emptyset$ and if the set strictly contains $\emptyset$ then
the set must be nonempty).

\paragraph{The Size of Sets:} To say that set $\set{X}$ is
\emph{smaller} than $\set{Y}$ means that $\set{X} \subseteq \set{Y}$.
For a set of sets $\setset{B}$ with $\set{X} \in \setset{B}$, to say
that $\set{X}$ is the \emph{smallest} element of $\setset{B}$ means that
for any $\set{B} \in \setset{B}$, $\set{X} \subseteq \set{B}$.
Similarly, to say that set $\set{X}$ is \emph{larger} than $\set{Y}$
means that $\set{Y} \subseteq \set{X}$.  For a set of sets $\setset{B}$
with $\set{X} \in \setset{B}$, to say that $\set{X}$ is the
\emph{largest} element of $\setset{B}$ means that for any $\set{B} \in
\setset{B}$, $\set{B} \subseteq \set{X}$.

\paragraph{Infinite Sets:} Note that all of these relationships are
defined for infinite sets as well; for example, $\N \subset \W$ and $\N
\neq \W$. We should note that a formal definition of these set relations
(\ie, $=$, $\neq$, $\subseteq$, $\supseteq$, etc.) requires a discussion
of the \emph{universal set}, which we introduce in
\longref{app:math_universal_set}; we will discuss this briefly in
\longref{app:math_relations}.

\paragraph{Set-Builder Notation:} \symdef[]{Csets.1ab}{setbuilder}{$\{ u
: p \}$}{set of all elements of $u$ such that $p$}New sets can be built
from other \emph{already existing} sets using \emph{set-builder
notation}. That is, the notation $\{ u : p \}$ represents the set of all
elements of the \emph{universe of discourse} $u$ that make the
\emph{predicate} $p$ true. For example, the set $\{ x \in \set{A} : x
\in \set{B} \}$ (\ie, the set of all elements of set $\set{A}$ that are
also elements of set $\set{B}$) is equivalent to the set $\{ x : x \in
\set{A} \text{ and } x \in \set{B} \}$ (\ie, the set of all elements of
both sets $\set{A}$ and $\set{B}$) which represents the \emph{singleton
set} $\{c\}$. We will use this notation heavily to construct sets.
\symdef[]{Csets.1ab}{setbuilderlong}{$\{ u : p, q, r \}$}{set of all
elements of $u$ such that $p$, $q$, and $r$}If a number of statements in
the predicate are connected with commas, all must occur simultaneously.
For example, the set $\{ x : x \in \W, x \notin \N\} = \{0\}$ represents
the whole numbers that are not natural numbers.

\symdef[]{Dseq.0}{indexnotation}{$x(i)$~or~$x_i$~or~$x^i$}{alternate
notations for an index $i$ on a symbol $x$}% 
\paragraph{Index Notation and Index Sets:} A symbol $\theta$ may be
equipped with a \emph{subscript} like $\theta_i$, a \emph{superscript}
like $\theta^i$, or an \emph{argument} like $\theta(i)$. Depending on
the types of symbols $\theta$ and $i$ and the context of their use, each
of these notations may have a different meaning. However, very often $i$
is serves as an \emph{index} which makes a notation like $\theta_i$
distinct from a notation like $\theta_j$.   

In particular, often an \emph{index set} will be defined to provide
indices that help generate notations that share some similarity. For
example, take the index set $\set{I} \triangleq \{a,b,c\}$ which
generates the symbols $\theta_a$, $\theta_b$, and $\theta_c$. These
symbols can be easily collected using set-builder notation into the set
$\{ \theta_i : i \in \set{I} \}$. This is can be a more convenient
notation than explicitly listing each element in the set, as in $\{
\theta_a, \theta_b, \theta_c \}$. Note that very often index sets will
be \emph{equipped} with an \emph{order relation}, the topic of
\longref{app:math_order_theory}, for reasons discussed in
\longref{app:math_sumprod_ind_fam}.

\paragraph{Natural and Whole Numbers:} Note that the numbers defined in
\longrefs{eq:zero}--\shortref{eq:three} which are all elements of the
set $\W$ have element and subset relationships with each other. In
particular,
%
\begin{align*}
        0 \in 1 \text{ and } 
        0 \in 2 \text{ and } 
        0 \in 3 \text{ and } \cdots\\
        1 \in 2 \text{ and } 
        1 \in 3 \text{ and } \cdots\\
        2 \in 3 \text{ and } \cdots\\
        \vdots
\end{align*}
%
and
%
\begin{align*}
        0 \subset 1 \text{ and } 
        0 \subset 2 \text{ and } 
        0 \subset 3 \text{ and } \cdots\\
        1 \subset 2 \text{ and } 
        1 \subset 3 \text{ and } \cdots\\
        2 \subset 3 \text{ and } \cdots\\
        \vdots
\end{align*}
%
This is a special and noteworthy property of the elements of the whole
numbers $\W$. The subset relationship among the whole numbers can be
summarized as
%
\begin{align*}
        0 \subset 
        1 \subset 
        2 \subset 
        3 \subset 
        4 \subset 
        5 \subset 
        6 \subset 
        \cdots
\end{align*}
%
which, of course, also means that
%
\begin{align*}
        0 \subseteq
        1 \subseteq
        2 \subseteq
        3 \subseteq
        4 \subseteq
        5 \subseteq
        6 \subseteq
        \cdots
\end{align*}
%
and this kind of telescoping notation is common. This captures the more
familiar notions of $<$ and $\leq$ (\ie, less than and less than or
equal to) respectively, which both will be introduced in
\longref{app:math_total_order_set} and explored in
\longref{app:math_numbers}.

\subsection{The Ordered Pair}
\label{app:math_ordered_pair}

We will use $(\cdot,\cdot)$ to denote an \symdef[\emph{ordered
pair}]{Csets.2cart0}{orderedpair}{$(a,b)$}{ordered pair of objects $a$
and $b$ (\ie, $(a,b) \triangleq \{\{a\},\{a,b\}\}$)}. An ordered pair is
a collection of two objects that has the property that for objects $a$,
$b$, $c$, and $d$, the ordered pair $(a,b)$ is equal to ordered pair
$(c,d)$ if and only if $a$ is equal to $c$ and $b$ is equal to $d$. This
is a stronger property of equality than the one that is carried with
sets. We refer to this special property as the \emph{equality property}
of ordered pairs. Take arbitrary objects $a$ and $b$. There are two
special traits of ordered pairs that distinguish them from simple sets.
%
\begin{itemize}
        \item While $\{a,b\}$ and $\{b,a\}$ describe equivalent sets,
                $(a,b)$ and $(b,a)$ describe two distinct ordered pairs.
                In other words, ordered pairs have some notion of
                element place or rank. Every ordered pair has a
                \emph{first element} which may also be called its
                \emph{left projection}; similarly, every ordered pair
                has a \emph{second element} which may also be called its
                \emph{right projection}. For distinct objects $a$ and
                $b$ and ordered pair $(a,b)$, $a$ is the ordered pair's
                first element and $b$ is the ordered pair's second
                element. 
        \item Note that elements of an ordered pair need not be
                distinct. Thus, $(a,a)$ and $(b,b)$ are both valid
                ordered pairs. For each of these two examples, the first
                element and second element of the ordered pair are
                equal.
\end{itemize}
%
Other common notations for the ordered pair $(a,b)$ include $\langle a,b
\rangle$ and the \emph{Dirac inner-product notation} $\langle a|b
\rangle$. These other notations have been introduced to reduce ambiguity
between ordered pairs and other set-theoretic constructs. However, we
will use the $(a,b)$ notation as we will use parentheses around any
ordered list and curly braces around any unordered list (\eg, a set). We
will remove any ambiguity by the context in which the notation is used.

Ordered pairs can be formally defined using sets in a number of
intuitive ways. Again, take the arbitrary objects $a$ and $b$. It is
natural to define the ordered pair $(a,b)$ as the set
$\{\{0,a\},\{1,b\}\}$, which emphasizes the order of the two objects by
associating each of them with specific symbols $0$ and $1$.
Additionally, it is easy to show that this definition of ordered pair
has the special equality property required of all ordered pairs.
However, we make use of the notion of a \emph{Kuratowski pair}, which
defines the ordered pair $(a,b)$ as
%
\begin{equation*}
        (a,b) \triangleq \{ \{a\}, \{a,b\} \}
\end{equation*}
%
This is the usual definition of ordered pair used in axiomatic set
theory. It also has the equality property of ordered pairs, but it does
not require the introduction of symbols $0$ and $1$ like the other
definition.

\subsection{The Ordered Tuple}

An ordered list of zero or finite length is called an \emph{ordered
tuple}, which we refer to as simply a \emph{tuple}. Take $n \in
\{0,1,2,\dots\}$ and objects $x_1$, $x_2$, \dots, $x_n$. An
\emph{ordered $n$-tuple}, which we refer to as an \emph{$n$-tuple}, is a
tuple of length $n$. The shortest tuple, denoted $()$ and called a
$0$-tuple, is defined to be the empty set. That is,
%
\begin{equation*}
        () \triangleq \emptyset
\end{equation*}
%
A tuple made up of only the $x_1$ object, denoted $(x_1)$ and called a
$1$-tuple, is defined as
%
\begin{equation*}
        (x_1) \triangleq ((), x_1)
\end{equation*}
%
That is, a $1$-tuple is an ordered pair with a $0$-tuple left element
and an object for its right element. Similarly $(x_1,x_2)$ denotes a
$2$-tuple with the $x_1$ and $x_2$ items in that respective order and is
defined as
%
\begin{equation*}
        (x_1,x_2) \triangleq ((x_1), x_2)
\end{equation*}
%
That is, the $2$-tuple $(x_1,x_2)$ is defined as an ordered pair with
the $1$-tuple $(x_1)$ as its first element and the object $x_2$ as its
second element. This is different than the ordered pair $(x_1,x_2)$,
which has a specific set-theoretic definition. The ambiguity between
these two notations is one of the many reasons why other authors use a
different notation for an ordered pair. However, this ambiguity should
not cause any confusion in any of our arguments. Thus, we will use
parentheses in all structures related to lists (\ie, collections of
objects in which the order of the objects is important). In fact, this
ambiguity will serve as a notational convenience in
\longref{app:math_cartesian_prod}. In general,
\symdef{Csets.2cart01}{ntuple}{$(x_1,x_2,\dots,x_n)$}{$n$-tuple (\ie,
tuple of length $n \in \N$ with coordinates $x_1$, $x_2$,\dots,$x_n$ in
their respective order)} denotes an $n$-tuple with the $x_1$, $x_2$,
\dots, $x_n$ in their respective order and is defined as
%
\begin{equation*}
        (x_1,x_2,\dots,x_n) \triangleq ((x_1,x_2,\dots,x_{n-1}), x_n)
\end{equation*}
%
using an ordered pair construction similar to the one used for a
$2$-tuple. For an $n$-tuple $(x_1,x_2,\dots,x_n)$, $x_1$ is called the
\emph{first coordinate} and $x_2$ is called the \emph{second coordinate}
and, continuing in this pattern, $x_n$ is called the \emph{n$^\text{th}$
coordinate}. Thus, an $n$-tuple has $n$ \emph{coordinates}.

As defined here, all tuples of finite non-zero length can be expressed
in terms of ordered pairs. The construction of these tuples grows
``rightward'' as new elements are introduced as right projections of
each ordered pair. In computer science, it is common to define these
tuples as growing ``leftward'' with new elements introduced as left
projections of each ordered pair. It can be shown that this difference
has no major impact on the utility of the tuple. Growing tuples to the
right or to the left is largely influenced by historical conventions in
different disciplines and has little impact on the application of
tuples.

As will be shown in \longref{app:math_cartesian_prod}, it is more common
to use an ordered pair instead of a $2$-tuple primarily because the
recursive construction of tuples allows the ordered pair to serve as a
kind of fundamental tuple from which all other tuples can be built.
Many authors only define tuples for lists of three or more. For lists of
two items, ordered pairs are used. For lists of one item, the item is
stands alone without a list. We will follow this convention as well.

\subsection{Cartesian Products}
\label{app:math_cartesian_prod}

The \symdef[\emph{binary Cartesian
product}]{Csets.2cart1}{cartesian2}{$\set{X} \times \set{Y}$}{(binary)
Cartesian product of sets $\set{X}$ and $\set{Y}$ (\ie, $\set{X} \times
\set{Y} \triangleq \{(x,y):x \in \set{X}, y \in \set{Y}\}$)} of two
non-empty sets $\set{X}$ and $\set{Y}$ is denoted $\set{X} \times
\set{Y}$ and is defined
%
\begin{equation*}
        \set{X} \times \set{Y}
        \triangleq
        \{(x,y) : x \in \set{X}, y \in \set{Y}\}
\end{equation*}
%
where sets $\set{X}$ and $\set{Y}$ are called \emph{factors}. The
parenthetical notation in this definition represents the ordered pair.
That is, the binary Cartesian product of two non-empty sets is the set
of all ordered pairs that have a first coordinate from one set and a
second coordinate from the other set. If either of the two factor sets
are the empty set then the binary Cartesian product is also the empty
set.

Since the result of a binary Cartesian product of two sets is an
additional set, it can serve as a factor in an additional binary
Cartesian product. For example, consider non-empty sets $\set{X}$,
$\set{Y}$, and $\set{Z}$. Using the definitions above, the \emph{ternary
Cartesian product} $\set{X} \times \set{Y} \times \set{Z}$ can be built
with two binary Cartesian products and expressed as
%
\begin{equation*}
        \set{X} \times \set{Y} \times \set{Z}
        \triangleq
        \{(x,y,z) : x \in \set{X}, y \in \set{Y}, z \in \set{Z}\}
\end{equation*}
%
That is, it can be expressed as a set of all possible $3$-tuples with a
first coordinate from set $\set{X}$, a second coordinate from set
$\set{Y}$, and a third coordinate from set $\set{Z}$. Also note that if
any of these sets were empty, this ternary Cartesian product would also
be empty. This example shows the utility of using the parenthetical
notation for ordered pair. Because a binary Cartesian product is defined
with an ordered pair, binary Cartesian products of binary Cartesian
products can be defined with tuples. In particular, take $n \in
\{2,3,\dots\}$ and non-empty sets $\set{X}_1$, $\set{X}_2$, \dots,
$\set{X}_n$. If tuples and ordered pairs share the same notation then
the very general \emph{$n$-ary Cartesian product}, or simply the
\symdef[\emph{Cartesian product}]{Csets.2cart10}{cartesian}{$\set{X}_1
\times \cdots \times \set{X}_n$}{Cartesian product of $n$ sets
$\set{X}_1$, \dots, $\set{X}_n$ (\ie, $\set{X}_1 \times \cdots \times
\set{X}_n \triangleq \{(x_1,\dots,x_n):x_1 \in \set{X}_1, \dots, x_n \in
\set{X}_n\}$)}, of these sets can be defined by
%
\begin{equation*}
        \set{X}_1 \times \set{X}_2 \times \cdots \times \set{X}_n
        \triangleq
        \{(x_1,x_2,\dots,x_n) : 
        x_1 \in \set{X}_1, x_2 \in \set{X}_2, \dots, x_n \in \set{X}_n\}
\end{equation*}
%
where the $n$-tuple in the set definition uses the convention that a
$2$-tuple refers to an ordered pair and an $n$-tuple where $n > 2$ uses
the standard tuple definition. If any of these $n$ sets are empty, the
result is an empty set. The notation
%
\begin{equation*}
        \prod\limits_{i=1}^n \set{X}_i
        \triangleq
        \set{X}_1 \times \set{X}_2 \times \cdots \times \set{X}_n
\end{equation*}
%
is also often used.

Consider the special case of a Cartesian product of a single set
$\set{X}$ with itself $n$ times. In this case, this Cartesian product
\symdef{Csets.2cart11}{cartesiann}{$\set{X}^n$}{Cartesian product of set
$\set{X}$ with itself $n$ times (\eg, $\set{X}^3 \triangleq \set{X}
\times \set{X} \times \set{X}$)} is denoted $\set{X}^n$. That is, 
%
\begin{align*}
        \set{X}^n
        &\triangleq
        \prod\limits_{i=1}^n \set{X}\\
        &=
        \set{X} \times \set{X} \times \cdots \times \set{X}
\end{align*}
%
Therefore, the set $\set{X}^n$ is a the set of all $n$-tuples that can
be made by choosing each of the $n$ coordinates to be an element of the
set $\set{X}$. For example, the Cartesian product set $\{0,1\}^2 =
\{(0,0),(0,1),(1,0),(1,1)\}$ and $(0,1,1,0) \in \{0,1\}^4$. In
\longrefs{app:math_functions} and \shortref{app:math_cardinality}, it
will be shown that the notation $\set{X}^n$ has other interesting
interpretations that show that it was not chosen arbitrarily.

The most general definition of Cartesian product also allows for
products with an infinite (\ie, unbounded or even uncountable) number of
factors. For example, the set of all countably infinite strings
consisting of elements from the set $\{0,1\}$ is represented as
%
\begin{align*}
        \{0,1\}^\N
        &\triangleq
        \prod\limits_{i=1}^\infty \{0,1\}\\
        &=
        \{0,1\} \times
        \{0,1\} \times
        \{0,1\} \times
        \cdots
\end{align*}
%
where $\N$ are the natural numbers defined in
\longref{eq:natural_numbers}. Roughly speaking, $\{0,1\}^\N$ represents
every way an ordered list of every natural number can have each of its
elements replaced with either a $0$ or a $1$. Using the definition from
\longref{eq:two}, this set can also be represented by $2^\N$, which is a
notation that will be explored in more detail in
\longref{app:math_power_sets}. 

\subsection{Functions: Mappings Between Sets}
\label{app:math_functions}

Roughly speaking, a function relates elements from one set to elements
of another set. Take two arbitrary sets $\set{G}$ (called the
\emph{domain}) and $\set{H}$ (called the \emph{codomain}). A
\emph{(total) function} $f$ is a set with $f \subseteq \set{G} \times
\set{H}$ such that for every $g \in \set{G}$, there is \emph{exactly}
one pair $(x,y) \in f$ such that $x = g$. The set of all such functions
is denoted 
%
\begin{equation*}
        \set{H}^\set{G}
\end{equation*}
%
and so function $f \in \set{H}^\set{G}$; however, it is more common to
use the \symdef[]{Ganalysis.0011}{function}{$f: \set{X} \mapsto
\set{Y}$}{a function $f$ with domain $\set{X}$ and codomain
$\set{Y}$}notation 
%
\begin{equation*}
        f: \set{G} \mapsto \set{H}
\end{equation*}
%
Any ambiguity with the $\set{H}^\set{G}$ notation and the Cartesian
product notation will be removed in \longref{app:math_congruent_sets}.
For some $x \in \set{G}$ and the corresponding $(x,y) \in f$, the right
projection $y$ of $(x,y)$ is denoted $f(x)$. In other words, by the
definition of a function, for any function $f: \set{G} \mapsto \set{H}$,
for all $x \in \set{G}$, there exists a unique $f(x)$ such that
$(x,f(x)) \in f$.

\paragraph{Examples:} Take the three finite sets
%
\begin{align*}
        \set{X} &\triangleq \{a,b,c,d\}\\
        \set{Y} &\triangleq \{s,t,u,v,w\}\\
        \set{Z} &\triangleq \{m,n,o,p\}
\end{align*}
%
Now take functions $f_s: \set{X} \mapsto \set{Y}$,
$f_i: \set{Y} \mapsto \set{X}$, $f: \set{X} \mapsto \set{Z}$, and
$f^{-1}: \set{Z} \mapsto \set{X}$. Define these four functions by
%
\begin{align*}
        f_s &\triangleq \{(a,t),(b,u),(c,v),(d,w)\}\\
        f_i &\triangleq \{(s,a),(t,a),(u,b),(v,c),(w,d)\}\\
        f &\triangleq \{(a,m),(b,n),(c,o),(d,p)\}\\
        f^{-1} &\triangleq \{(m,a),(n,b),(o,c),(p,d)\}
\end{align*}
%
These four functions are depicted by
\longrefs{fig:functions_injective}--\shortref{fig:functions_inverse},
respectively. 
%
\begin{figure}[!ht]\centering
        \subfloat[Injective Function][Injective Function $f_i$]{
        \begin{picture}(100,100)(0,0)
                \thinlines
                %
                \put(25,50){\oval(40,100)}
                \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}}
                \put(25,80){\circle*{2}}
                \put(25,60){\circle*{2}}
                \put(25,40){\circle*{2}}
                \put(25,20){\circle*{2}}
                \put(22,80){\makebox(0,0)[r]{\text{$a$}}}
                \put(22,60){\makebox(0,0)[r]{\text{$b$}}}
                \put(22,40){\makebox(0,0)[r]{\text{$c$}}}
                \put(22,20){\makebox(0,0)[r]{\text{$d$}}}
                %
                \put(75,50){\oval(40,100)}
                \put(75,98){\makebox(0,0)[t]{\text{$\set{Y}$}}}
                \put(75,83.3){\circle*{2}}
                \put(75,66.7){\circle*{2}}
                \put(75,50){\circle*{2}}
                \put(75,33.3){\circle*{2}}
                \put(75,16.7){\circle*{2}}
                \put(78,83.3){\makebox(0,0)[l]{\text{$s$}}}
                \put(78,66.7){\makebox(0,0)[l]{\text{$t$}}}
                \put(78,50){\makebox(0,0)[l]{\text{$u$}}}
                \put(78,33.3){\makebox(0,0)[l]{\text{$v$}}}
                \put(78,16.7){\makebox(0,0)[l]{\text{$w$}}}
                %
                \qbezier(25,80)(50,90)(75,66.7)
                \qbezier(25,60)(50,70)(75,50)
                \qbezier(25,40)(50,50)(75,33.3)
                \qbezier(25,20)(50,30)(75,16.7)
                %
                \linethickness{\unitlength}
                \put(75,66.7){\vector(1000,-932){0}}
                \put(75,50){\vector(5,-4){0}}
                \put(75,33.3){\vector(1000,-668){0}}
                \put(75,16.7){\vector(1000,-532){0}}
        \end{picture}
        \label{fig:functions_injective}
        }
        \quad
        \subfloat[Surjective Function][Surjective Function $f_s$]{
        \begin{picture}(100,100)(0,0)
                \thinlines
                %
                \put(25,50){\oval(40,100)}
                \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}}
                \put(25,80){\circle*{2}}
                \put(25,60){\circle*{2}}
                \put(25,40){\circle*{2}}
                \put(25,20){\circle*{2}}
                \put(22,80){\makebox(0,0)[r]{\text{$a$}}}
                \put(22,60){\makebox(0,0)[r]{\text{$b$}}}
                \put(22,40){\makebox(0,0)[r]{\text{$c$}}}
                \put(22,20){\makebox(0,0)[r]{\text{$d$}}}
                %
                \put(75,50){\oval(40,100)}
                \put(75,98){\makebox(0,0)[t]{\text{$\set{Y}$}}}
                \put(75,83.3){\circle*{2}}
                \put(75,66.7){\circle*{2}}
                \put(75,50){\circle*{2}}
                \put(75,33.3){\circle*{2}}
                \put(75,16.7){\circle*{2}}
                \put(78,83.3){\makebox(0,0)[l]{\text{$s$}}}
                \put(78,66.7){\makebox(0,0)[l]{\text{$t$}}}
                \put(78,50){\makebox(0,0)[l]{\text{$u$}}}
                \put(78,33.3){\makebox(0,0)[l]{\text{$v$}}}
                \put(78,16.7){\makebox(0,0)[l]{\text{$w$}}}
                %
                \qbezier(25,80)(50,100)(75,83.3)
                \qbezier(25,80)(50,70)(75,66.7)
                \qbezier(25,60)(50,50)(75,50)
                \qbezier(25,40)(50,30)(75,33.3)
                \qbezier(25,20)(50,10)(75,16.7)
                %
                \linethickness{\unitlength}
                \put(25,80){\vector(-5,-4){0}}
                \put(25,80){\vector(-5,2){0}}
                \put(25,60){\vector(-5,2){0}}
                \put(25,40){\vector(-5,2){0}}
                \put(25,20){\vector(-5,2){0}}
        \end{picture}
        \label{fig:functions_surjective}
        }\\
        \medskip
        \subfloat[Bijective Function][Bijective Function $f$]{
        \begin{picture}(100,100)(0,0)
                \thinlines
                %
                \put(25,50){\oval(40,100)}
                \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}}
                \put(25,80){\circle*{2}}
                \put(25,60){\circle*{2}}
                \put(25,40){\circle*{2}}
                \put(25,20){\circle*{2}}
                \put(22,80){\makebox(0,0)[r]{\text{$a$}}}
                \put(22,60){\makebox(0,0)[r]{\text{$b$}}}
                \put(22,40){\makebox(0,0)[r]{\text{$c$}}}
                \put(22,20){\makebox(0,0)[r]{\text{$d$}}}
                %
                \put(75,50){\oval(40,100)}
                \put(75,98){\makebox(0,0)[t]{\text{$\set{Z}$}}}
                \put(75,80){\circle*{2}}
                \put(75,60){\circle*{2}}
                \put(75,40){\circle*{2}}
                \put(75,20){\circle*{2}}
                \put(78,80){\makebox(0,0)[l]{\text{$m$}}}
                \put(78,60){\makebox(0,0)[l]{\text{$n$}}}
                \put(78,40){\makebox(0,0)[l]{\text{$o$}}}
                \put(78,20){\makebox(0,0)[l]{\text{$p$}}}
                %
                \qbezier(25,80)(50,90)(75,80)
                \qbezier(25,60)(50,70)(75,60)
                \qbezier(25,40)(50,50)(75,40)
                \qbezier(25,20)(50,30)(75,20)
                %
                \linethickness{\unitlength}
                \put(75,80){\vector(5,-2){0}}
                \put(75,60){\vector(5,-2){0}}
                \put(75,40){\vector(5,-2){0}}
                \put(75,20){\vector(5,-2){0}}
        \end{picture}
        \label{fig:functions_bijective}
        }
        \quad
        \subfloat[Function Inverse][$f^{-1}$, Inverse of $f$]{
        \begin{picture}(100,100)(0,0)
                \thinlines
                %
                \put(25,50){\oval(40,100)}
                \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}}
                \put(25,80){\circle*{2}}
                \put(25,60){\circle*{2}}
                \put(25,40){\circle*{2}}
                \put(25,20){\circle*{2}}
                \put(22,80){\makebox(0,0)[r]{\text{$a$}}}
                \put(22,60){\makebox(0,0)[r]{\text{$b$}}}
                \put(22,40){\makebox(0,0)[r]{\text{$c$}}}
                \put(22,20){\makebox(0,0)[r]{\text{$d$}}}
                %
                \put(75,50){\oval(40,100)}
                \put(75,98){\makebox(0,0)[t]{\text{$\set{Z}$}}}
                \put(75,80){\circle*{2}}
                \put(75,60){\circle*{2}}
                \put(75,40){\circle*{2}}
                \put(75,20){\circle*{2}}
                \put(78,80){\makebox(0,0)[l]{\text{$m$}}}
                \put(78,60){\makebox(0,0)[l]{\text{$n$}}}
                \put(78,40){\makebox(0,0)[l]{\text{$o$}}}
                \put(78,20){\makebox(0,0)[l]{\text{$p$}}}
                %
                \qbezier(25,80)(50,90)(75,80)
                \qbezier(25,60)(50,70)(75,60)
                \qbezier(25,40)(50,50)(75,40)
                \qbezier(25,20)(50,30)(75,20)
                %
                \linethickness{\unitlength}
                \put(25,80){\vector(-5,-2){0}}
                \put(25,60){\vector(-5,-2){0}}
                \put(25,40){\vector(-5,-2){0}}
                \put(25,20){\vector(-5,-2){0}}
        \end{picture}
        \label{fig:functions_inverse}
        }
        \caption[Examples of the Four Types of Functions.]{Examples of
        the four types of functions.}
        \label{fig:functions}
\end{figure}
%
These images show each set as an oval with elements represented as dots
within the oval. The arrowhead curves represent each element of the
function where the head of the curve represents the right projection of
the element and the tail of the curve represents the left projection of
the element. In other words, these functions are \emph{mappings} from a
domain set to a corresponding codomain set. The function $f_i$ is known
as an \emph{injective} function since every element of the codomain is
mapped to from at \emph{most} one element of the domain. The function
$f_s$ is not an injective function since two of its elements, $(s,a)$
and $(t,a)$, both have $a$ as a right projection. However, $f_s$ is
called an \emph{surjective} function since every member of its codomain
is mapped to from at \emph{least} one element of the domain. Surjective
functions are said to be \emph{onto} their codomains. It is clear that
$f_i$ is not a surjective function because element $s$ of the codomain
$\set{Y}$ is not a right projection of any of the elements of $f_s$. The
function $f$ is both injective and surjective, and thus it is called a
\emph{bijective function} or simply a \emph{bijection}. For every
bijective function, there exists an \emph{inverse function} that is also
a bijective function. Because of this, bijective functions are also
called \emph{invertible}. The inverse of function $f$ is denoted by
$f^{-1}$. Roughly speaking, a function's inverse is a function which is
the reverse mapping of the original function. Because of this, the
bijection $f$ may be denoted by $f: \set{X} \biject \set{Z}$, which
indicates that a mapping exists both from set $\set{X}$ to set $\set{Z}$
as well as from set $\set{Z}$ to set $\set{X}$. A more precise
definition of inverse is given below.

\paragraph{The Identity Function:} For any set $\set{X}$, the function
$f: \set{X} \mapsto \set{X}$ defined by
%
\begin{equation*}
        f \triangleq \{ (x,x): x \in \set{X} \}
\end{equation*}
%
is called the \emph{identity function}. That is, for set $\set{X}$ and
identity function $f: \set{X} \mapsto \set{X}$, for all $x \in \set{X}$,
$f(x)=x$.

\paragraph{Compositions, and the Inverse:} Take three arbitrary sets
$\set{F}$, $\set{G}$, and $\set{H}$. Take function $g: \set{F} \mapsto
\set{G}$ and function $h: \set{G} \mapsto \set{H}$. The
\emph{composition} of functions $h$ and $g$ is a new function $c:
\mapsto \set{F} \times \set{H}$, such that for every $x \in \set{F}$,
there is a pair $(x,h(g(x))) \in c$. This composition function is
denoted $h \comp g$. For each $x \in \set{F}$, the right projection of
the corresponding pair in $h \comp g$ is denoted by either $(h \comp
g)(x)$ or $h(g(x))$.

The function $f_i \comp f_s$ is shown in
\longref{fig:function_comps_surjective_injective}. Its construction is
depicted graphically in
\longref{fig:function_comps_surjective_injective_composition}.
Similarly, $f^{-1} \comp f$ is shown in
\longref{fig:function_comps_identity}, and its construction is shown in
\longref{fig:function_comps_identity_composition}.
%
\begin{figure}[!ht]\centering
        \subfloat[Surjective Composed with Injective][$f_i$ composed
        with $f_s$]{
        \begin{picture}(150,100)(0,0)
                \thinlines
                %
                \put(25,50){\oval(40,100)}
                \put(25,98){\makebox(0,0)[t]{\text{$\set{Y}$}}}
                \put(25,83.3){\circle*{2}}
                \put(25,66.7){\circle*{2}}
                \put(25,50){\circle*{2}}
                \put(25,33.3){\circle*{2}}
                \put(25,16.7){\circle*{2}}
                \put(22,83.3){\makebox(0,0)[r]{\text{$s$}}}
                \put(22,66.7){\makebox(0,0)[r]{\text{$t$}}}
                \put(22,50){\makebox(0,0)[r]{\text{$u$}}}
                \put(22,33.3){\makebox(0,0)[r]{\text{$v$}}}
                \put(22,16.7){\makebox(0,0)[r]{\text{$w$}}}
                %
                \put(75,50){\oval(40,100)}
                \put(75,98){\makebox(0,0)[t]{\text{$\set{X}$}}}
                \put(75,80){\circle*{2}}
                \put(75,60){\circle*{2}}
                \put(75,40){\circle*{2}}
                \put(75,20){\circle*{2}}
                \put(75,77){\makebox(0,0)[t]{\text{$a$}}}
                \put(75,57){\makebox(0,0)[t]{\text{$b$}}}
                \put(75,37){\makebox(0,0)[t]{\text{$c$}}}
                \put(75,17){\makebox(0,0)[t]{\text{$d$}}}
                %
                \put(125,50){\oval(40,100)}
                \put(125,98){\makebox(0,0)[t]{\text{$\set{Y}$}}}
                \put(125,83.3){\circle*{2}}
                \put(125,66.7){\circle*{2}}
                \put(125,50){\circle*{2}}
                \put(125,33.3){\circle*{2}}
                \put(125,16.7){\circle*{2}}
                \put(128,83.3){\makebox(0,0)[l]{\text{$s$}}}
                \put(128,66.7){\makebox(0,0)[l]{\text{$t$}}}
                \put(128,50){\makebox(0,0)[l]{\text{$u$}}}
                \put(128,33.3){\makebox(0,0)[l]{\text{$v$}}}
                \put(128,16.7){\makebox(0,0)[l]{\text{$w$}}}
                %
                \qbezier(25,83.3)(50,90)(75,80)
                \qbezier(25,66.7)(50,70)(75,80)
                \qbezier(25,50)(50,50)(75,60)
                \qbezier(25,33.3)(50,30)(75,40)
                \qbezier(25,16.7)(50,10)(75,20)
                %
                \put(50,100){\makebox(0,0)[t]{\text{$f_s$}}}
                %
                \qbezier(75,80)(100,90)(125,66.7)
                \qbezier(75,60)(100,70)(125,50)
                \qbezier(75,40)(100,50)(125,33.3)
                \qbezier(75,20)(100,30)(125,16.7)
                %
                \put(100,100){\makebox(0,0)[t]{\text{$f_i$}}}
                %
                \linethickness{\unitlength}
                %
                \put(75,80){\vector(5,-2){0}}
                \put(75,80){\vector(5,2){0}}
                \put(75,60){\vector(5,2){0}}
                \put(75,40){\vector(5,2){0}}
                \put(75,20){\vector(5,2){0}}
                %
                \put(125,66.7){\vector(1000,-932){0}}
                \put(125,50){\vector(5,-4){0}}
                \put(125,33.3){\vector(1000,-668){0}}
                \put(125,16.7){\vector(1000,-532){0}}
        \end{picture}
        \label{fig:function_comps_surjective_injective_composition}
        }
        \quad
        \subfloat[Composition of Injective with Surjective]%
                 [Function $f_i \comp f_s$]{
        \begin{picture}(100,100)(0,0)
                \thinlines
                %
                \put(25,50){\oval(40,100)}
                \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}}
                \put(25,83.3){\circle*{2}}
                \put(25,66.7){\circle*{2}}
                \put(25,50){\circle*{2}}
                \put(25,33.3){\circle*{2}}
                \put(25,16.7){\circle*{2}}
                \put(22,83.3){\makebox(0,0)[r]{\text{$s$}}}
                \put(22,66.7){\makebox(0,0)[r]{\text{$t$}}}
                \put(22,50){\makebox(0,0)[r]{\text{$u$}}}
                \put(22,33.3){\makebox(0,0)[r]{\text{$v$}}}
                \put(22,16.7){\makebox(0,0)[r]{\text{$w$}}}
                %
                \put(75,50){\oval(40,100)}
                \put(75,98){\makebox(0,0)[t]{\text{$\set{Y}$}}}
                \put(75,83.3){\circle*{2}}
                \put(75,66.7){\circle*{2}}
                \put(75,50){\circle*{2}}
                \put(75,33.3){\circle*{2}}
                \put(75,16.7){\circle*{2}}
                \put(78,83.3){\makebox(0,0)[l]{\text{$s$}}}
                \put(78,66.7){\makebox(0,0)[l]{\text{$t$}}}
                \put(78,50){\makebox(0,0)[l]{\text{$u$}}}
                \put(78,33.3){\makebox(0,0)[l]{\text{$v$}}}
                \put(78,16.7){\makebox(0,0)[l]{\text{$w$}}}
                %
                \qbezier(25,83.3)(50,93.3)(75,66.7)
                \qbezier(25,66.7)(50,56.7)(75,66.7)
                \qbezier(25,50)(50,60)(75,50)
                \qbezier(25,33.3)(50,43.3)(75,33.3)
                \qbezier(25,16.7)(50,26.7)(75,16.7)
                %
                \linethickness{\unitlength}
                \put(75,66.7){\vector(125,-133){0}}
                \put(75,66.7){\vector(5,2){0}}
                \put(75,50){\vector(5,-2){0}}
                \put(75,33.3){\vector(5,-2){0}}
                \put(75,16.7){\vector(5,-2){0}}
        \end{picture}
        \label{fig:function_comps_surjective_injective}
        }\\
        \medskip
        \subfloat[Inverse Composed with Its Bijective][$f^{-1}$ composed
        with $f$]{
        \begin{picture}(150,100)(0,0)
                \thinlines
                %
                \put(25,50){\oval(40,100)}
                \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}}
                \put(25,80){\circle*{2}}
                \put(25,60){\circle*{2}}
                \put(25,40){\circle*{2}}
                \put(25,20){\circle*{2}}
                \put(22,80){\makebox(0,0)[r]{\text{$a$}}}
                \put(22,60){\makebox(0,0)[r]{\text{$b$}}}
                \put(22,40){\makebox(0,0)[r]{\text{$c$}}}
                \put(22,20){\makebox(0,0)[r]{\text{$d$}}}
                %
                \put(75,50){\oval(40,100)}
                \put(75,98){\makebox(0,0)[t]{\text{$\set{Z}$}}}
                \put(75,80){\circle*{2}}
                \put(75,60){\circle*{2}}
                \put(75,40){\circle*{2}}
                \put(75,20){\circle*{2}}
                \put(75,77){\makebox(0,0)[t]{\text{$m$}}}
                \put(75,57){\makebox(0,0)[t]{\text{$n$}}}
                \put(75,37){\makebox(0,0)[t]{\text{$o$}}}
                \put(75,17){\makebox(0,0)[t]{\text{$p$}}}
                %
                \put(125,50){\oval(40,100)}
                \put(125,98){\makebox(0,0)[t]{\text{$\set{X}$}}}
                \put(125,80){\circle*{2}}
                \put(125,60){\circle*{2}}
                \put(125,40){\circle*{2}}
                \put(125,20){\circle*{2}}
                \put(128,80){\makebox(0,0)[l]{\text{$a$}}}
                \put(128,60){\makebox(0,0)[l]{\text{$b$}}}
                \put(128,40){\makebox(0,0)[l]{\text{$c$}}}
                \put(128,20){\makebox(0,0)[l]{\text{$d$}}}
                %
                \qbezier(25,80)(50,90)(75,80)
                \qbezier(25,60)(50,70)(75,60)
                \qbezier(25,40)(50,50)(75,40)
                \qbezier(25,20)(50,30)(75,20)
                %
                \put(50,100){\makebox(0,0)[t]{\text{$f$}}}
                %
                \qbezier(75,80)(100,70)(125,80)
                \qbezier(75,60)(100,50)(125,60)
                \qbezier(75,40)(100,30)(125,40)
                \qbezier(75,20)(100,10)(125,20)
                %
                \put(100,100){\makebox(0,0)[t]{\text{$f^{-1}$}}}
                %
                \linethickness{\unitlength}
                %
                \put(75,80){\vector(5,-2){0}}
                \put(75,60){\vector(5,-2){0}}
                \put(75,40){\vector(5,-2){0}}
                \put(75,20){\vector(5,-2){0}}
                %
                \put(125,80){\vector(5,2){0}}
                \put(125,60){\vector(5,2){0}}
                \put(125,40){\vector(5,2){0}}
                \put(125,20){\vector(5,2){0}}
        \end{picture}
        \label{fig:function_comps_identity_composition}
        }
        \quad
        \subfloat[Composition of Inverse with Its Bijective][Identity
        function $f^{-1} \comp f$]{
        \begin{picture}(100,100)(0,0)
                \thinlines
                %
                \put(25,50){\oval(40,100)}
                \put(25,98){\makebox(0,0)[t]{\text{$\set{X}$}}}
                \put(25,80){\circle*{2}}
                \put(25,60){\circle*{2}}
                \put(25,40){\circle*{2}}
                \put(25,20){\circle*{2}}
                \put(22,80){\makebox(0,0)[r]{\text{$a$}}}
                \put(22,60){\makebox(0,0)[r]{\text{$b$}}}
                \put(22,40){\makebox(0,0)[r]{\text{$c$}}}
                \put(22,20){\makebox(0,0)[r]{\text{$d$}}}
                %
                \put(75,50){\oval(40,100)}
                \put(75,98){\makebox(0,0)[t]{\text{$\set{X}$}}}
                \put(75,80){\circle*{2}}
                \put(75,60){\circle*{2}}
                \put(75,40){\circle*{2}}
                \put(75,20){\circle*{2}}
                \put(78,80){\makebox(0,0)[l]{\text{$a$}}}
                \put(78,60){\makebox(0,0)[l]{\text{$b$}}}
                \put(78,40){\makebox(0,0)[l]{\text{$c$}}}
                \put(78,20){\makebox(0,0)[l]{\text{$d$}}}
                %
                \qbezier(25,80)(50,90)(75,80)
                \qbezier(25,60)(50,70)(75,60)
                \qbezier(25,40)(50,50)(75,40)
                \qbezier(25,20)(50,30)(75,20)
                %
                \linethickness{\unitlength}
                \put(75,80){\vector(5,-2){0}}
                \put(75,60){\vector(5,-2){0}}
                \put(75,40){\vector(5,-2){0}}
                \put(75,20){\vector(5,-2){0}}
        \end{picture}
        \label{fig:function_comps_identity}
        }
        \caption[Examples of Function Composition.]{Examples of
        function composition.}
        \label{fig:function_comps}
\end{figure}
%
The latter example, $f^{-1} \comp f: \set{X} \mapsto \set{X}$ is
equivalent to the \emph{identity function} which maps every element of
set $\set{X}$ to itself. That is, for all $x \in \set{X}$, it is the
case that $(f^{-1} \comp f)(x) = x$ (\ie, $f^{-1}(f(x))=x$). In fact,
this result follows directly from the precise definition of a function's
inverse. For an arbitrary function $f: \set{X} \to \set{Y}$, its inverse
$f^{-1}: \set{Y} \to \set{X}$ is function such that $f^{-1} \comp f$ is
the identity function defined on set $\set{X}$ (\ie, an identity
function for $\set{X} \mapsto \set{X}$).

\paragraph{The Range of a Function:} Each function is defined to have a
domain and a codomain. However, functions that are not surjective will
not map to every element of their codomain. For example, the function
$g: \{a,b,c\} \mapsto \{d,e,f\}$ defined as
%
\begin{equation*}
        g \triangleq \{(a,d),(b,f),(c,f)\}
\end{equation*}
%
provides no mapping to the element $e$ of the function's codomain. The
\emph{range} of a function is the subset of the function's codomain
which represents all of the elements that have one or more mapping from
the function's domain. That is, for an arbitrary function $f: \set{X}
\mapsto \set{Y}$, the range of function $f$, denoted $\range(f)$, is
defined
%
\begin{equation*}
        \range(f)
        \triangleq
        \{ y \in \set{Y} : (x,y) \in f \text{ for all } x \in \set{X} \}
\end{equation*}
%
which might also be denoted
%
\begin{equation*}
        \range(f)
        \triangleq
        \{ y \in \set{Y} : y=f(x) \text{ for all } x \in \set{X} \}
\end{equation*}
%
Clearly, this set is a subset of the function's codomain. That is,
$\range(f) \subseteq \set{Y}$. Note that if $\range(f) = \set{Y}$ then
function $f$ is surjective.

\paragraph{Images:} The \emph{image} of a subset of a function's domain
under that function is the subset of the codomain of the function that
are mapped to from that domain subset. That is, for a function $f:
\set{X} \mapsto \set{Y}$ where set $\set{Z} \subseteq \set{X}$ then the
image of $\set{Z}$ under $f$, denoted $f[\set{Z}]$ or $f(\set{Z})$, is
defined
%
\begin{equation*}
        f[\set{Z}]
        \triangleq
        \{ y \in \set{Y} : y=f(x) \text{ for all } x \in \set{Z} \}
\end{equation*}
%
Clearly $f[\set{Z}] \subseteq \set{Y}$. Additionally, the image of a
function's domain is its range. In other words, for function $f: \set{X}
\mapsto \set{Y}$, it is the case that $f[\set{X}] = \range(f)$.

\paragraph{Pre-images:} The \emph{preimage} or \emph{inverse image} of a
subset of a function's codomain under that function is the subset of the
domain of the function that maps to that codomain subset. That is, for a
function $f: \set{X} \mapsto \set{Y}$ where set $\set{Z} \subseteq
\set{Y}$ then the preimage of $\set{Z}$ under $f$, denoted
$f^{-1}[\set{Z}]$ or $f^{-1}(\set{Z})$, is defined
%
\begin{equation*}
        f^{-1}[\set{Z}]
        \triangleq
        \{ x \in \set{X} : f(x) \in \set{Z} \}
\end{equation*}
%
Clearly $f^{-1}[\set{Z}] \subseteq \set{X}$. Additionally, the image of
a function's range is its domain. In fact, the image of any superset of
a function's range (\eg, its codomain) is also its domain. In other
words, for a function $f: \set{X} \mapsto \set{Y}$, it is the case that
both $f^{-1}[\range{f}] = \set{X}$ and $f^{-1}[\set{Y}] = \set{X}$. It
is important to note that the inverse image or preimage of a set under a
function is \emph{not} equivalent to the inverse of a function. In fact,
the inverse of a function only exists if the function is a bijection.
For example, take the function $g: \{a,b,c\} \mapsto \{d,e,f\}$ defined
with
%
\begin{equation*}
        g \triangleq \{(a,d),(b,f),(c,f)\}
\end{equation*}
%
The preimage $g^{-1}[\{f\}] = \{b,c\}$; however, the inverse of $g$ does
not exist since $g$ is not a bijection. This can cause some confusion
because sometimes the notations $g^{-1}[f]$, $g^{-1}(\{f\})$, or even
$g^{-1}(f)$ might be used to represent the image of $\{f\}$ under
function $g$. 

\paragraph{Images of Sets of Sets:} Take sets $\set{X}$ and $\set{Y}$
and a function $f: \set{X} \mapsto \set{Y}$. Now take $\setset{B}
\subseteq \Pow(\set{X})$. That is, $\setset{B}$ is a set of subsets of
$\set{X}$. The \emph{image} of set of sets $\setset{B}$ is denoted
$f\{\setset{B}\}$ and defined by
%
\begin{equation*}
        f\{ \setset{B} \}
        \triangleq
        \{ f[\set{B}] : \set{B} \in \setset{B} \}
\end{equation*}
%
where $f[\set{B}]$ is the image of set $\set{B} \in \setset{B}$ under
$f$. That is, $f\{ \setset{B} \}$ is a set of images of sets.

\paragraph{Function Restrictions:} New functions can be generated from
existing functions by \emph{restricting} a function's mappings to map
from a subset of the function's domain. That is, for a function $f:
\set{X} \mapsto \set{Y}$ with subset $\set{Z} \subseteq \set{X}$, the
restriction of function $f$ to set $\set{Z}$, denoted $f|_\set{Z}$, is
defined
%
\begin{equation*}
        f|_\set{Z}
        \triangleq
        \{ (x,y) \in f : x \in \set{Z} \}
\end{equation*}
%
Therefore, for $z \in \set{Z}$, it is also the case that $z \in
\set{X}$, and so $f|_\set{Z}(z) = f(z)$. Note that the image of function
$f: \set{X} \mapsto \set{Y}$ of set $\set{Z} \subseteq \set{X}$ is equal
to the range of the restriction of function $f$ to set $\set{Z}$. In
other words, $\range(f|_\set{Z})=f[\set{Z}]$.

\paragraph{Closure Under a Function:} Take sets $\set{X}$ and $\set{Y}$
such that $\set{Y} \subseteq \set{X}$. Take a function $f: \set{X}
\mapsto \set{X}$. If the image $f[\set{Y}] \subseteq \set{Y}$ then the
subset $\set{Y}$ is said to be \emph{closed} under the function $f$. In
other words, the function $f$ maps elements of set $\set{Y}$ back to
$\set{Y}$. Put another way, the range of the restriction of $f$ to
$\set{Y}$ is also a subset $\set{Y}$ (\ie, $\range(f|_\set{Y}) \subseteq
\set{Y}$).

\paragraph{Functions of Cartesian Products:} Take sets $\set{X}$,
$\set{Y}$, and $\set{Z}$ and a function $f: \set{X} \times \set{Y}
\mapsto \set{Z}$. For $(x,y) \in \set{X} \times \set{Y}$, the right
projection of $(x,y)$ given by $f$ could be denoted $f((x,y))$; however,
the extra parentheses are usually dropped, so the notation $f(x,y)$ is
used. In other words, $((x,y),z) \in f$ if and only if $f(x,y)=z$.

\paragraph{Partial Functions:} There is also a notion of \emph{partial
function} that weakens the definition of function from including
\emph{exactly} one element for each element of the domain to including
\emph{at most} one element for each element of the domain. For example,
for domain set $\set{X}$ and codomain set $\set{Y}$, a partial function
$g \subset \set{X} \times \set{Y}$ (\ie, $g: \set{X} \mapsto \set{Y}$)
could be defined
%
\begin{align*}
        g \triangleq \{(a,s),(b,t),(c,t)\}
\end{align*}
%
This is a partial function (\ie, not a total function) because it
provides no mapping for element $d$ of the domain set $\set{X}$. Of
course, if the domain of $g$ was given as the set $\{a,b,c\}$ rather
than set $\set{X}$ then $g$ would be a total function. Whether a
\emph{function} is a partial function or a total function will often not
be important to a particular problem; when it is important, the context
should make it clear what the meaning of \emph{function} is. However,
note that the important notion of \emph{bijection} should only be
interpreted as involving total functions.

\subsection{Indexed Families}
\label{app:math_indexed_families}

The notion of an \emph{indexed family} is practically identical to the
notion of a function; that is, an indexed family provides an alternate
notation for a function with little loss of generality. Take a function
$f: \set{I} \mapsto \set{Y}$. Recall that the range of the function $f$
is the set
%
\begin{equation*}
        \{ f(i) : i \in \set{I} \}
\end{equation*}
%
This represents the \emph{set} of values to which the function maps.
Regardless of how many mappings are present in function $f$, if every
element of the domain $\set{I}$ gets mapped to a single element of the
range $y \in \set{Y}$ then the range will be simply $\{ y \}$ because
the range is a set and a set only contains distinct values. Thus, the
range lists the values mapped to by the function; however, it destroys
any information about the mappings and so the function cannot be
reconstructed by simply knowing the range. However, an \symdef[indexed
family]{Dseq.1}{indexedfamily}{$(x_i:i \in \set{I})$}{an indexed family
with index set $\set{I}$ (also $(x_i)_{i \in \set{I}}$)}, which is often
denoted by
%
\begin{equation*}
        ( f_i : i \in \set{I} )
        \quad \text{ or } \quad
        ( f_i )_{i \in \set{I}}
\end{equation*}
%
is not a set. This makes an indexed family a collection of values which
may or may not be distinct. For example, take the function $g: \{a,b,c\}
\mapsto \{d,e,f\}$ defined by
%
\begin{equation*}
        g \triangleq \{(a,d),(b,f),(c,f)\}
\end{equation*}
%
The range of this function is $\{d,f\}$. However, the indexed family
representation of this function is $( f_a, f_b, f_c )$ or $( f_i : i \in
\{a,b,c\})$ where $f_a = d$, $f_b = f$, and $f_c = f$. Also note that
$f_i$ can be replaced with other index notations, like $f(i)$ and $f^i$.
Important applications of indexed families can be found in
\longrefs{app:math_sumprod_ind_fam} and \shortref{app:math_probability}.

\paragraph{Ordered Indexed Families:} The indexed family notation can be
especially useful when the index set is a \emph{directed set}.  Directed
sets are discussed in \longref{app:math_order_theory}. In this case, the
indexed family is called an \symdef[\emph{ordered indexed
family}]{Dseq.2}{orderedindexedfamily}{$(x(t):t \geq 0)$}{an ordered
indexed family with a directed index set $\set{T}$ where $0 \in
\set{T}$}. For example, the set $\W$ with the standard $\leq$ order
relation is totally ordered and thus is also a directed set, so for a
function $f: \W \mapsto \set{Y}$, the corresponding ordered indexed
family might be listed
%
\begin{equation*}
        ( f(i) : i \geq 0 ) 
        \quad \text{ or } \quad
        ( f(i) )_{i \geq 0} 
        \quad \text{ or } \quad
        ( f(i) )_{i=0}^\infty
\end{equation*}
%
where the symbol $\infty$ indicates that the length of the list is
unbounded; that is, an equivalent notation is
%
\begin{equation*}
        ( f(0), f(1), f(2), f(3), \cdots )
\end{equation*}
%
Note that the order of the elements of the list match the order of the
elements in the index set; this is intentional. Therefore, this notation
provides a method for ordering the range values of the function. In
other words, the value $f(0)$ comes \emph{before} all of the other
values. This notation can also be used to restrict values of the
function to a certain subset of $\W$ while also implying that the
elements are still ordered. For example, $f$ restricted to $\{5,6,7,8\}$
(\ie, $f|_{\{5,6,7,8\}}$) could be listed
%
\begin{equation*}
        ( f(i) : 5 \leq i \leq 8 )
        \quad \text{ or } \quad
        ( f(i) )_{i=5}^{8}
\end{equation*}
%
which is equivalent to the list notation
%
\begin{equation*}
        ( f(5), f(6), f(7), f(8) )
\end{equation*}
%
where again the order of the elements of the list match the order of the
elements of the index subset. This notation not only compactly restricts
$f$ to a finite subset of $\W$, but it still \emph{maintains the order}
of the elements of $f$. 

It is important to note that when viewing an indexed family as an
alternate specification for a function, the indexed family does not
communicate much information about the function's codomain. Thus,
indexed families are primarily used to capture information about an list
of objects. When that list of objects carries with it some special
order, an indexed family can still carry information about that ordering
while providing a more compact notation than the tuple or Cartesian
product notation. Special versions of the ordered indexed family called
\emph{nets} and \emph{sequences} will be introduced later in
\longref{app:math_nets_and_sequences}

\subsection{Congruent Sets}
\label{app:math_congruent_sets}

The term \emph{congruent} can mean a number of things depending on its
context. However, all uses will have in common that two things that are
congruent are somehow equal. That is, congruence is some weaker form of
equality. That is, when unequal objects are similar enough to be
substituted for each other with little to no impact on a problem then
those objects might be called congruent. 

Our use of congruent is weaker than the use of most authors. That is,
our use of congruence roughly translates to stating that two objects are
the same size whereas other authors state that objects that are
congruent not only have the same size but also have a roughly equivalent
shape. However, most of these stronger definitions of congruent are
synonyms for more descriptive terms. Therefore, if we mean to imply some
stronger relationship between two sets than congruence, we will simply
use the more descriptive term. This will be the subject of
\longref{app:math_abstract_algebra}.

\paragraph{Congruence by Bijection:} For any two sets $\set{G}$ and
$\set{H}$, if there exists a bijection from $\set{G}$ to $\set{H}$ (\ie,
there exists a $g: \set{G} \biject \set{H}$) then the sets are said to
be where \symdef[\emph{congruent}]{Ageneral.2}{congruent}{$\cong$}{is
congruent to}, which is denoted $\set{G} \cong \set{H}$.  Congruence is
a notion of equality. For finite sets, it is equivalent to say that the
two sets have an equal number of elements. In the above examples,
because $f$ is a bijection from $\set{X}$ to $\set{Z}$, the sets
$\set{X}$ and $\set{Z}$ are congruent; that is, $\set{X} \cong \set{Z}$
and clearly the two finite sets have the same number of elements.
Congruence also applies to infinite sets. For example, using the
definition of $\W$ from \longref{eq:whole_numbers}, take the function
$s: \W \biject \N$, defined by
%
\begin{equation*}
        s \triangleq \{(0,1),(1,2),(2,3),(3,4),\cdots\}
\end{equation*}
%
That is, $s(47)=48$ and $s(1000)=1001$. Clearly, this function has an
inverse $s^{-1}: \N \biject \W$ which is defined by
%
\begin{equation*}
        s^{-1} \triangleq \{(1,0),(2,1),(3,2),(4,3),\cdots\}
\end{equation*}
%
That is, $s^{-1}(48)=47$ and $s^{-1}(1001)=1000$. So, $s$ is surely a
bijection. As an exercise, note that $s^{-1} \comp s: \W \mapsto \W$ is
%
\begin{align*}
        s^{-1} \comp s 
        \triangleq
        \{(0,0),(1,1),(2,2),(3,3),\cdots\}
\end{align*}
%
and $s \comp s^{-1}: \N \mapsto \N$ is
%
\begin{align*}
        s \comp s^{-1} 
        \triangleq
        \{(1,1),(2,2),(3,3),(4,4),\cdots\}
\end{align*}
%
which are both identity functions, as expected since $s$ is a bijection.
Since a bijection exists between the two infinite sets $\W$ and $\N$
then $\W \cong \N$. This is interesting because every element of $\N$ is
also an element of $\W$; however, $\W$ includes $0$, which is not
included in $\N$. That is, $\N$ is a strict subset of $\W$. In summary,
%
\begin{align*}
        \N \subset \W \quad \text{ and } \quad 
        \N \neq \W \quad \text{ and } \quad
        \N \cong \W
\end{align*}
%
It is impossible for two finite sets to be simultaneously related in
these ways; congruent finite sets must have the same number of elements
and so any set that is a strict subset of another set could never be
congruent to that other set. Therefore, congruence is more generally a
sort of structural equivalence between two sets rather than a size
equivalence. Note that not all infinite sets are congruent. The
congruence of finite and infinite sets plays a key role in the
discussion in \longref{app:math_cardinality}. 

\paragraph{Countably and Uncountably Infinite Sets:} Note that since $\N
\cong \N$ trivially and $\N \subset \W$ then $\N$ is not only congruent
to $\W$ but is also congruent to a subset of $\W$. In fact, $\N$ is
congruent to every countably infinite set. The definition of a
\emph{countably infinite set} is one that is congruent with $\N$ (\ie,
one in which there exists a bijection between it and $\N$). Therefore
any infinite subset of $\W$, including $\N$, is a countably infinite set
and is said to be \emph{countable}. If there is no bijection between a
given infinite set and $\N$ then that set is an \emph{uncountably
infinite set} and is said to be simply \emph{uncountable}. Put another
way, if there is no injective function from a set to $\N$ then the set
must be uncountable.

\paragraph{Cartesian Product and Sets of Functions:} Take arbitrary set
$\set{X}$. Recall that the Cartesian product $\set{X} \times \set{X}$ is
also represented by $\set{X}^2$. Also recall that by the definition in
\longref{eq:two}, $2 = \{0,1\}$. Thus, $\set{X}^2$ can also be written
$\set{X}^{\{0,1\}}$, which is the set of all functions from $\{0,1\}$ to
set $\set{X}$. It is true that $\set{X} \times \set{X}$ is certainly not
equivalent to the set of functions $\set{X}^{\{0,1\}}$. However, these
two sets are congruent. Each element of $\set{X}^{\{0,1\}}$ maps $0$ and
$1$ to elements of $\set{X}$. Similarly, each element of $\set{X} \times
\set{X}$ takes one element of $\set{X}$ as a left projection and one
element of $\set{Y}$ as a right projection. For each function in $f \in
\set{X}^{\{0,1\}}$, there is a pair $(f(0),f(1)) \in \set{X} \times
\set{X}$. Additionally, for each pair $(x,y) \in \set{X} \times
\set{X}$, there exists a function $f \in \set{X}^{\{0,1\}}$ such that
$f(0)=x$ and $f(1)=y$.  Therefore, there is a bijection between sets
$\set{X}^{\{0,1\}}$ and $\set{X} \times \set{X}$ (\ie, roughly, they
have the same size) and so the sets are congruent. This is why it is
acceptable to substitute $\set{X}^2$ for $\set{X} \times \set{X}$. This
is true for all $\set{X}^n$ with $n \in \{2,3,4,\dots\}$.

\subsection{Cardinality}
\label{app:math_cardinality}

This is a brief introduction to the mathematical topic of
\emph{cardinality}. To make it more complete, \emph{cardinals} and
\emph{ordinals} should be discussed separately and contrasted. However,
as it will not affect our work, our handling of cardinality may tend to
blur the two concepts for simplicity. Roughly, cardinals represent some
notion of size of a set and ordinals represent some notion of position
in an order. The distinction between ordinals and cardinals becomes
particularly important when handling infinite sets rigorously.

\paragraph{Finite Cardinality and Congruence:} Consider the infinite
number of sets of the form \longrefs{eq:zero}--\shortref{eq:three} that
are each elements of $\W$ (\ie, the sets more commonly represented by
symbols $0$, $1$, $2$, $3$, \dots). Take a arbitrary finite set
$\set{X}$.  There exists a unique element $c \in \W$ such that $c$ and
$\set{X}$ are congruent.  That is, there exists an element $c \in \W$
such that there is a bijection mapping every element from $c$ to every
element of $\set{X}$.  This unique element $c$ is referred to as the
\symdef[\emph{cardinality}]{Csets.1zz}{cardinality}%
{$\pipe\set{X}\pipe$}{cardinality of set $\set{X}$} and is denoted
$|\set{X}|$. For example, for some of the finite sets used as examples
above,
%
\begin{align*}
        |\emptyset| &= 0\\
        |\set{A}| &= 3\\
        |\set{B}| &= 3\\
        |\set{C}| &= 1\\
        |\set{D}| &= 2\\
        |\set{E}| &= 3
\end{align*}
%
and for the domain and codomain sets used in
\longref{app:math_functions},
%
\begin{align*}
        |\set{X}| &= 4\\
        |\set{Y}| &= 5\\
        |\set{Z}| &= 4
\end{align*}
%
as expected, since a bijection exists between sets $\set{X}$ and
$\set{Z}$, they are congruent, and since they are congruent, they have
the same cardinality. Likewise, since finite sets $\set{A}$, $\set{B}$,
and $\set{E}$ have the same cardinality, they are all congruent even
though set $\set{A}$ is not equivalent to either set $\set{B}$ or set
$\set{E}$ (also note that $\set{B}$ is equivalent to $\set{E}$, and so
it must be congruent as well). Of course, it is also true that
%
\begin{align*}
        |0| &= |\{\}| = 0\\
        |1| &= |\{0\}| = 1\\
        |2| &= |\{0,1\}| = 2\\
        |3| &= |\{0,1,2\}| = 3\\
        &\mathrel{\vdots}
\end{align*}
%
In fact, the function $n: \W \mapsto \W$ defined by
%
\begin{equation*}
        n \triangleq \{(x,y) \in \W \times \W: y = |x|\}
\end{equation*}
%
is the identity function on set $\W$. That is, $n(53)=53$. The
cardinality of any whole number is equal to itself (\ie, $|x|=x$ for all
$x \in \W$). 

\paragraph{Cardinality of Infinite Sets:} The subject of cardinality of
infinite sets is an interesting topic, but it is not crucial to our
work, and so our coverage of this subject is brief. For the same reason
that not all infinite sets are congruent, not all infinite sets have the
same cardinality. However, it is the case that all congruent infinite
sets do have the same cardinality. Just as every element of the whole
numbers $\W$ is a candidate cardinality for any finite set, there are
special numbers that have been created to play an analogous role for
infinite sets. 

As an example, take the set $2^\N$ (\ie, $\{0,1\}^\N$), which is the the
infinite set of all functions of the form $\N \mapsto \{0,1\}$. It can
be shown that the natural numbers $\N$ are congruent to a strict subset
of this set $2^\N$ just as the natural numbers are congruent to a strict
subset of $\W$; however, while $\N \cong \W$, $\N$ is not congruent to
$2^\N$. Roughly speaking, $2^\N$ is a larger set than $\N$ since any
function taking the form $2^\N \mapsto \N$ is surjective but not
injective. That is, for all functions of the form $2^\N \mapsto \N$,
there is at least one element of $2^\N$ that is mapped to more than one
element from $\N$. This prevents there from being a bijection between
these two sets, and so these two sets cannot be congruent. Therefore,
since there is no bijection between the two infinite sets $2^\N$ and
$\N$ then $2^\N$ must be an uncountably infinite set.

\subsection{Power Sets}
\label{app:math_power_sets}

The \symdef[\emph{power set}]{Csets.1z}{powerset}{$\Pow(\set{U})$}{power
set of set $\set{U}$ (\ie, the set of all subsets of $\set{U}$)} of a
set $\set{U}$, denoted $\Pow(\set{U})$, is defined to be the set of all
subsets of set $\set{U}$. Clearly,
%
\begin{enumerate}[(i)]
        \item $\emptyset \in \Pow(\set{U})$
                \label{item:power_set_emptyincl}
        \item $\set{U} \in \Pow(\set{U})$
                \label{item:power_set_setincl}
        \item for all $\set{X} \subseteq \set{U}$, $\set{X} \in
                \Pow(\set{U})$
\end{enumerate}
%
By properties (\shortref{item:power_set_emptyincl}) and
(\shortref{item:power_set_setincl}), $\Pow(\set{U}) \neq \emptyset$.
This is true even for $\Pow(\emptyset)$ since $\Pow(\emptyset)=\{
\emptyset \}$. Therefore, all power sets are nonempty.
                
\paragraph{Notations:} The notations $\set{Y} \in \Pow(\set{X})$ and
$\set{Y} \subseteq \set{X}$ are equivalent ways of specifying that
$\set{Y}$ is a subset of $\set{X}$. The power set is also denoted
$2^\set{X}$, or equivalently $\{0,1\}^\set{X}$, because the set of all
functions of the form $\set{X} \mapsto \{0,1\}$ is congruent to the set
of subsets of $\set{X}$. To see this, take $\set{Y} \subseteq \set{X}$
to be a subset of $\set{X}$.  Construct a function mapping every element
in set $\set{Y}$ to $1$ and every element of $\set{X}$ that is not an
element of $\set{Y}$ to $0$.  By construction, this function is an
element of $2^\set{X}$.  Additionally, take $f \in 2^\set{X}$ to be an
arbitrary function mapping all elements of $\set{X}$ to either a $0$ or
a $1$. Construct a subset of $\set{X}$ made up of only those elements
that map to a $1$. This subset will be a member of the power set of set
$\set{X}$. Therefore, the power set and $2^\set{X}$ are congruent.

\paragraph{Cardinality:} Recall from \longref{app:math_cardinality} that
$2^\N$ is an infinite set that is somehow larger than the infinite set
$\N$ (\ie, total functions from $\{0,1\}^\N$ to $\N$ can never be
injective and thus will never be bijective). From the above explanation,
$2^\N$ is congruent to $\Pow(\N)$ and is in fact an equivalent notation
specifying the power set of set $\N$. It can be shown that any non-empty
set is in some sense smaller than its corresponding power set. In the
case of $\N$, an infinite yet countable set, its power set $\Pow(\N)$ is
also infinite but is uncountable, and this lack of countability is what
makes $\Pow(\N)$ somehow larger than $\N$. In
\longref{app:math_numbers}, cardinality arithmetic is defined that
allows the cardinality of a power set to be calculated; this helps
justify that a set is always smaller (in terms of cardinality) than its
power set. We also make notions of larger and smaller more precise in
\longref{app:math_numbers}. Another important uncountable set that
\emph{is} congruent to $\Pow(\N)$ will be introduced in
\longref{app:math_reals}.

\subsection{The Universal Set and the Complement of a Set}
\label{app:math_universal_set}

For a given discussion, if all sets are subsets of a single set
$\set{U}$, that set $\set{U}$ is known as the \emph{universal set}.
There is no way to define a single universal set for all discussions; in
fact, defining a general set of all sets leads to contradictions of
logic. In other words, many important operations on sets are not
independent of context. This is particularly important when discussing
the \symdef[\emph{complement}]{Csets.207}{complement}{$\set{X}^c$}%
{complement of set $\set{X}^c$ (\eg, $\set{U} \setdiff \set{X}$ where
$\set{X} \subseteq \set{U}$)} of a set, which could roughly be defined
as a set made up of everything not in the set of interest. For example,
the complement of set $\set{J}$ from \longref{eq:ex_set_J}, denoted
$\set{J}^c$, could be $\{\text{Things not said by Joe}\}$ or could be
$\{\text{Things said by other people}\}$. It is not possible to define
$\set{J}^c$ without first defining the universal set. Once the universal
set $\set{U}$ is defined as a superset of $\set{J}$ then $\set{J}^c$ is
defined as
%
\begin{equation*}
        \set{J}^c \triangleq \{ x \in \set{U} : x \notin \set{J} \}
\end{equation*}
%
For example, some valid universal sets that are each supersets of set
$\set{J}$ are
%
\begin{align*}
        &\{\text{Things that were said or written by anyone}\} 
        \text{ or }\\
        &\{\text{Things that were said or written by Joe}\} 
        \text{ or }\\
        &\{\text{Things that were said by anyone}\}
\end{align*}
%
or even $\set{J}$ itself. If the universal set $\set{U} = \set{J}$ then
$\set{J}^c = \emptyset$; in fact, for any discussion, the complement of
the universal set is always the empty set (\ie, $\set{U}^c =
\emptyset$). Similarly, for any discussion, the complement of the empty
set, is always the universal set (\ie, $\emptyset^c = \set{U}$). Of
course, for any set $\set{X}$, $( \set{X}^c )^c = \set{X}$. That is, the
complement of the complement of any set is the set itself.

\paragraph{Power Set as Universal Set:} Take a universal set $\set{U}$
so that every set in a discussion would be a subset of $\set{U}$.
The power set $\Pow(\set{U})$ serves as a universal set for all subsets
of $\set{U}$. That is, for any $\set{X} \subseteq \set{U}$, $\set{X} \in
\Pow(\set{U})$. Take any set of sets $\setset{S}$. A minimal universal
set $\set{U}$ for each element of $\setset{S}$ can be defined by 
%
\begin{equation*}
        \set{U} 
        \triangleq 
        \{ x : x \in \set{X}, \set{X} \in \setset{S} \} 
\end{equation*}
%
Clearly, for all $\set{X} \in \setset{S}$, $\set{X} \subseteq \set{U}$
and so $\set{X} \in \Pow(\set{U})$. Therefore, $\setset{S} \subseteq
\Pow(\set{U})$, and so $\Pow(\set{U})$ can be viewed as the universal
set for all elements of $\setset{S}$.

\subsection{Operations on Sets}
\label{app:math_set_operations}

We will now discuss the standard \emph{set operations} and the
corresponding \emph{set operators}. A formal definition of these set
operations requires a discussion of the \emph{universal set}, which we
introduce in \longref{app:math_universal_set}; we will discuss this
briefly in \longref{app:math_operations}.

\paragraph{Union:} The \symdef[\emph{union (or
join)}]{Csets.202}{union}{$\set{X} \cup \set{Y}$}{set union (or join) of
sets $\set{X}$ and $\set{Y}$} of two arbitrary sets is the set resulting
from the inclusion of elements from both sets. That is, take arbitrary
sets $\set{X}$ and $\set{Y}$ that are each subsets of universal set
$\set{U}$. Their union is denoted $\set{X} \cup \set{Y}$ and defined by
%
\begin{equation*}
        \set{X} \cup \set{Y}
        \triangleq 
        \{ z \in \set{U} : z \in \set{X} \text{ or } z \in \set{Y} \}
\end{equation*}
%
Sometimes this operation is called as set addition and denoted $\set{X}
+ \set{Y}$. However, for reasons explained in
\longref{app:math_algebras_of_sets}, calling this operation addition may
not make sense. The \emph{symmetric difference} operation, explained
below, makes more sense as a set addition operation. Take sets $\set{X}$
and $\set{Y}$. Note that
%
\begin{equation*}
        \set{X} \subseteq \set{Y}
        \quad \text{ if and only if } \quad
        \set{X} \cup \set{Y} = \set{Y}
\end{equation*}
%
which relates to the reason for calling the set union the \emph{join} of
its elements. This actually shows one way of defining what it means to
be a subset of another set. Thus, 
%
\begin{itemize}
        \item to say that $\set{X}$ is \emph{larger} than $\set{Y}$
                means that $\set{Y} \subseteq \set{X}$ which is
                equivalent to saying that $\set{Y} \cup \set{X} =
                \set{X}$
        \item for set of sets $\setset{B}$, if $\set{X} \in \setset{B}$
                then set $\set{X}$ is the \emph{largest} set of
                $\setset{B}$ means that $\set{B} \cup \set{X} = \set{X}$
                (\ie, $\set{X} \subseteq \set{B}$) for all $\set{B} \in
                \setset{B}$
\end{itemize}
%
Take the indexed sets $\set{X}_1$, $\set{X}_2$, $\set{X}_3$, and
$\set{X}_4$. Also take a set of sets $\setset{X}$ defined by
%
\begin{align*}
        \setset{X}
        &\triangleq
        \{ \set{X}_i : i \in \{1,2,3,4\} \}\\
        &=
        \{ \set{X}_1, \set{X}_2, \set{X}_3, \set{X}_4 \}
\end{align*}
%
In this case, the symbol \symdef{Csets.2}{bigunion}{$\bigcup$}{union of
many sets (compare to $\sum$)} can be used to represent the union of the
elements of $\setset{X}$. That is, 
%
\begin{equation*}
        \bigcup \{ \set{X}_1, \set{X}_2, \set{X}_3, \set{X}_4 \}
        \quad
        \text{ and }
        \quad
        \bigcup \{ \set{X}_i : i \in \{1,2,3,4\} \}
        \quad
        \text{ and }
        \quad
        \bigcup \setset{X}
\end{equation*}
%
and the alternate notations
%
\begin{equation*}
        \bigcup\limits_{i \in \{1,2,3,4\}} \set{X}_i
        \quad
        \text{ and }
        \quad
        \bigcup\limits_{i=1}^4 \set{X}_i
\end{equation*}
%
are all equivalent notations for $\set{X}_1 \cup \set{X}_2 \cup
\set{X}_3 \cup \set{X}_4$. In other words, the symbol $\bigcup$ can be
used to take the union of multiple sets, whether they be indexed or are
simply elements of a set of sets. By convention, the union of an empty
set of sets (\ie, $\bigcup \{\}$) is the empty set; this is analogous to
the familiar \emph{additive identity}. Note that for arbitrary set
$\set{X}$ which is a subset of universal set $\set{U}$ it is the case
that $\set{X} \cup \set{X} = \set{X}$ and $\set{X} \cup \set{U} =
\set{U}$. 

\paragraph{Intersection:} The \symdef[\emph{intersection (or
meet)}]{Csets.201}{intersection}{$\set{X} \cap \set{Y}$}{set
intersection (or meet) of sets $\set{X}$ and $\set{Y}$} of two arbitrary
sets is the set of elements common to both sets. That is, take arbitrary
sets $\set{X}$ and $\set{Y}$ that are each subsets of universal set
$\set{U}$. Their intersection is denoted $\set{X} \cap \set{Y}$ and
defined by
%
\begin{equation*}
        \set{X} \cap \set{Y}
        \triangleq 
        \{ z \in \set{U} : z \in \set{X} \text{ and } z \in \set{Y} \}
\end{equation*}
%
Of course, since $\set{X} \subseteq \set{U}$, it is equivalent to say
%
\begin{equation*}
        \set{X} \cap \set{Y}
        =
        \{ z \in \set{X} : z \in \set{Y} \}
\end{equation*}
%
Take sets $\set{X}$ and $\set{Y}$. Note that
%
\begin{equation*}
        \set{X} \subseteq \set{Y}
        \quad \text{ if and only if } \quad
        \set{X} \cap \set{Y} = \set{X}
\end{equation*}
%
which relates to the reason for calling the set intersection the
\emph{meet} of its elements. This actually shows one way of defining
what it means to be a subset of another set. Thus, 
%
\begin{itemize}
        \item to say that $\set{X}$ is \emph{smaller} than $\set{Y}$
                means that $\set{X} \subseteq \set{Y}$ which is
                equivalent to saying that $\set{X} \cap \set{Y} =
                \set{X}$
        \item for set of sets $\setset{B}$, if $\set{X} \in \setset{B}$
                then set $\set{X}$ is the \emph{smallest} set of
                $\setset{B}$ means that $\set{X} \cap \set{B} = \set{X}$
                (\ie, $\set{X} \subseteq \set{B}$) for all $\set{B} \in
                \setset{B}$
\end{itemize}
%
As before, take $\setset{X} \triangleq \{ \set{X}_1, \set{X}_2,
\set{X}_3, \set{X}_4 \}$. Then the symbol
\symdef{Csets.2}{bigintersection}{$\bigcap$}{intersection of many sets
(compare to $\sum$)} can be used to represent the intersection of the
elements of $\setset{X}$. That is,
%
\begin{equation*}
        \bigcap \{ \set{X}_1, \set{X}_2, \set{X}_3, \set{X}_4 \}
        \quad
        \text{ and }
        \quad
        \bigcap \{ \set{X}_i : i \in \{1,2,3,4\} \}
        \quad
        \text{ and }
        \quad
        \bigcap \setset{X}
\end{equation*}
%
and the alternate notations
%
\begin{equation*}
        \bigcap\limits_{i \in \{1,2,3,4\}} \set{X}_i
        \quad
        \text{ and }
        \quad
        \bigcap\limits_{i=1}^4 \set{X}_i
\end{equation*}
%
are all equivalent notations for $\set{X}_1 \cap \set{X}_2 \cap
\set{X}_3 \cap \set{X}_4$. In other words, the symbol $\bigcap$ can be
used to take the intersection of multiple sets, whether they be indexed
or are simply elements of a set of sets. By convention, the intersection
of an empty set of sets (\ie, $\bigcap \{\}$) is the universal set; this
is analogous to the familiar \emph{multiplicative identity}. Note that
for arbitrary sets $\set{X}$ and $\set{Y}$ which are subsets of
universal set $\set{U}$, it is the case that $\set{X} \cap \set{X} =
\set{X}$ and $\set{X} \cap \emptyset = \emptyset$; it is also the case
that $\set{X} \cup ( \set{X} \cap \set{Y} ) = \set{X} \cap ( \set{X}
\cup \set{Y} ) = \set{X}$, where parentheses group operations which
should be applied first. 

\paragraph{Difference:} The \symdef[\emph{set
difference}]{Csets.203}{setdiff}{$\set{X} \setdiff \set{Y}$}{difference
of sets $\set{X}$ and $\set{Y}$} of two arbitrary sets is the set
resulting from the exclusion of the common elements (\ie, the
intersection) of both sets from one of the sets. That is, take arbitrary
sets $\set{X}$ and $\set{Y}$ that are each subsets of universal set
$\set{U}$. The set difference between them, denoted $\set{X} \setdiff
\set{Y}$, is defined by
%
\begin{equation*}
        \set{X} \setdiff \set{Y}
        \triangleq 
        \{ z \in \set{U}: z \in \set{X} \text{ and } z \notin \set{Y} \}
\end{equation*}
%
Since $\set{X} \subseteq \set{U}$ then
%
\begin{equation*}
        \set{X} \setdiff \set{Y}
        =
        \{ z \in \set{X} : z \notin \set{Y} \}
\end{equation*}
%
Both $-$ and $\setminus$ are frequently used to denote the set
difference. Note that $\set{X} \setdiff \set{Y} = \set{X} \cap
\set{Y}^c$. The set complement can be written in terms of the difference
between the set and the universal set. For arbitrary set $\set{X}$ that
is a subset of universal set $\set{U}$,
%
\begin{equation*}
        \set{X}^c = \set{U} \setdiff \set{X}
\end{equation*}
%
Many authors choose to make the universal set explicit and refer to the
set complement only in terms of this set difference. Additionally, the
set difference is sometimes called the \emph{relative complement}. That
is, $\set{X} \setdiff \set{Y}$ would be called the \emph{relative
complement of $\set{Y}$ in $\set{X}$}, which would be the elements in
$\set{Y}$ that are not in $\set{X}$. In other words, the relative
complement is the complement of a set if a new set was taken to be the
universal set.

\paragraph{Symmetric Difference:} The \symdef[\emph{symmetric
difference}]{Csets.204}{setsymdiff}{$\set{X} \symdiff
\set{Y}$}{symmetric difference of sets $\set{X}$ and $\set{Y}$ (\ie, an
exclusive union; $(\set{X} \cup \set{Y}) \setdiff (\set{Y} \cap
\set{X})$)} of two arbitrary sets is the set of elements taken from both
(\ie, the union) that are not common to both (\ie, not the
intersection). That is, take arbitrary sets $\set{X}$ and $\set{Y}$ that
are each subsets of universal set $\set{U}$. The symmetric difference
between them, denoted $\set{X} \symdiff \set{Y}$, is defined by
%
\begin{equation*}
        \set{X} \symdiff \set{Y}
        \triangleq 
        \{ z \in \set{U}: z \in \set{X} \cup \set{Y} \text{ and } z
        \notin \set{X} \cap \set{Y} \}
\end{equation*}
%
Of course, since $\set{X} \cup \set{Y} \subseteq \set{U}$, it is
equivalent to say
%
\begin{equation*}
        \set{X} \symdiff \set{Y}
        =
        \{ z \in \set{X} \cup \set{Y} : z \notin \set{X} \cap \set{Y} \}
\end{equation*}
%
For reasons explained in \longref{app:math_algebras_of_sets}, this
operation will sometimes be called set addition and be denoted $\set{X}
+ \set{Y}$. However, because some authors denote set union with the $+$
operator, the alternate notation $\set{X} \oplus \set{Y}$ may be used.
On the other hand, the operator $\oplus$ is identified with other
operations, and thus $\symdiff$ may be the best choice of notation. It
is easy to show that
%
\begin{itemize}
        \item $\set{X} \symdiff \set{Y} = (\set{X} \cup \set{Y}) \cap
                (\set{X} \cap \set{Y})^c$
        \item $\set{X} \symdiff \set{Y} = (\set{X} \cup \set{Y}) \cap
                (\set{X}^c \cup \set{Y}^c)$
        \item $\set{X} \symdiff \set{Y} = 
                (\set{X} \cup \set{Y}) \setdiff (\set{X} \cap \set{Y})$
        \item $\set{X} \symdiff \set{Y} = 
                (\set{X} \setdiff \set{Y}) \cup
                (\set{Y} \setdiff \set{X})$
        \item $\set{X} \symdiff \set{Y} = (\set{X} \cap \set{Y}^c) \cup
                (\set{Y} \cap \set{X}^c)$
\end{itemize}
%
We will use the symmetric difference rarely; however, it is important
when viewing sets in an algebraic context like the ones described in
\longrefs{app:math_abstract_algebra} and
\shortref{app:math_linear_algebra}. In particular, it will be important
to note that
%
\begin{itemize}
        \item $\set{X}^c = \set{U} \symdiff \set{X}$
        \item $\set{X} \cup \set{Y} = \set{X} \symdiff \set{Y} \symdiff
                ( \set{X} \cap \set{Y} )$
        \item $\set{X} \setdiff \set{Y} = ( \set{U} \symdiff \set{Y} )
                \cap \set{X}$
\end{itemize}
%
That is, set complement, set union, and set difference can all be built
from set symmetric difference and set intersection. Therefore, an
analysis of the structure of sets of sets need only be concerned with
these two operations.

\subsection{Partitions of Sets}
\label{app:math_sets_partitions}

Take \emph{non-empty} sets $\set{X}$, $\set{Y}$ and $\set{Z}$ with
$\set{X} \subseteq \set{Z}$ and $\set{Y} \subseteq \set{Z}$. 
%
\begin{itemize}
        \item If $\set{X} \cap \set{Y} = \emptyset$ (\ie, sets $\set{X}$
                and $\set{Y}$ have no common elements) then sets
                $\set{X}$ and $\set{Y}$ are said to be \emph{mutually
                exclusive} or \emph{(pairwise) disjoint}. That is,
                $\set{X}$ and $\set{Y}$ are \emph{disjoint sets}.
        \item If $\set{X} \cup \set{Y} = \set{Z}$ then sets $\set{X}$
                and $\set{Y}$ are said to be \emph{collectively
                exhaustive} in set $\set{Z}$.
        \item If sets $\set{X}$ and $\set{Y}$ are both mutually
                exclusive and collectively exhaustive in $\set{Z}$ then
                sets $\set{X}$ and $\set{Y}$ are said to
                \emph{partition} set $\set{Z}$. In this case, every
                element $z \in \set{Z}$ is an element of exactly one of
                the sets $\set{X}$ and $\set{Y}$. This \emph{partition}
                of set $\set{Z}$ is denoted as the set $\{ \set{X},
                \set{Y} \}$.
\end{itemize}
%
While these definitions have been given in terms of two non-empty
subsets, they apply to collections of any number of non-empty sets. 

\paragraph{Mutually Exclusive and Pairwise Disjoint:} Technically, two
sets $\set{X}$ and $\set{Y}$ are said to be \emph{disjoint} if $\set{X}
\cap \set{Y} = \emptyset$. For a set of sets $\{ \set{X}, \set{Y},
\set{Z} \}$, the collection of sets is said to be \emph{mutually
exclusive} or \emph{mutually disjoint} or \emph{pairwise disjoint} if
any two sets has an empty intersection. For example, for the infinite
set of indexed sets 
%
\begin{equation*}
        \{ 
        \set{X}_1, \set{X}_2, \set{X}_3, \set{X}_4, \set{X}_5, \dots 
        \}
\end{equation*}
%
the sets are said to be \emph{pairwise disjoint} if for any $i,j \in \N$
with $i \neq j$, it is the case that $\set{X}_i \cap \set{X}_j =
\emptyset$. If it is also the case that the union of these sets is equal
to set $\set{Y}$ then these sets are said to \emph{partition} $\set{Y}$
and the set of sets forms a \emph{partition} of $\set{Y}$.

\subsection{Geometric Interpretation of Set Operations}
\label{app:math_sets_venn_diagram}

In \longrefs{fig:functions} and \shortref{fig:function_comps} in
\longref{app:math_functions}, we made use of graphical depictions of
sets in order to make function mappings more intuitive. There are
similar diagrams for set operations that can be applied very generally.
An understanding of these diagrams allows for quick justification of the
statements made in \longrefs{app:math_sets_cadso} and
\shortref{app:math_sets_dml}.

The diagrams that represent sets are often types of \emph{Euler
diagrams}. This type of set diagram can be very useful when dealing with
propositional logic, the topic of \longref{app:math_logic}. Because of
this, versions of these diagrams used explicitly with logic are known as
\emph{Johnston diagrams}. When a Euler diagram is used to show all
possible relationships (\ie, union, intersection, and others) among a
number of sets, the diagram is commonly known as a \emph{Venn diagram}.

The combination of diagrams in \longref{fig:venn_diagrams} depict a
single Venn diagram shaded in six different ways. That is, all six
diagrams depict arbitrary sets $\set{X}$ and $\set{Y}$ which are subsets
of universal set $\set{U}$. The three sets $\set{X}$, $\set{Y}$, and
$\set{U}$ are each shown as squares with solid borders. The $\set{X}$
and $\set{Y}$ squares overlap to indicate that sets $\set{X}$ and
$\set{Y}$ may have some shared elements. The $\set{X}$ and $\set{Y}$
squares are both located within the $\set{U}$ square to indicate that
all elements of sets $\set{X}$ and $\set{Y}$ are also elements of
universal set $\set{U}$. Each of the six diagrams are identical except
for the shading, which selects elements that result from the set
operation in question. For example, \longref{fig:venn_set} is shaded to
select only elements from set $\set{X}$. The large region in
\longref{fig:venn_complement} shows the elements of the universal set
$\set{U}$ that are not elements of set $\set{X}$ (\ie, the complement of
$\set{X}$). The region in \longref{fig:venn_union} shows the elements
that are either members of set $\set{X}$ or $\set{Y}$ or both (\ie, the
union of $\set{X}$ and $\set{Y}$). The small region in
\longref{fig:venn_intersection} shows the few shared elements of sets
$\set{X}$ and $\set{Y}$ (\ie, the intersection of $\set{X}$ and
$\set{Y}$). The region in \longref{fig:venn_difference} shows the
elements of set $\set{X}$ that are not elements of $\set{Y}$ (\ie, the
difference $\set{X} \setdiff \set{Y}$). Finally, the region in
\longref{fig:venn_symdifference} shows elements that are members of
$\set{X}$ or $\set{Y}$ but not members of both (\ie, the symmetric
difference $\set{X} \symdiff \set{Y}$).
%
\begin{figure}[!ht]\centering
        \subfloat[Set $\set{X}$]{
        \begin{picture}(120,120)(-10,-10)
                \definecolor{white}{gray}{1}
                \definecolor{gray}{gray}{.5}
                \thicklines
                \put(0,0){\fcolorbox{black}{white}
                        {\makebox(100,100)[tr]{\text{$\set{U}$}}}}
                \put(23,48){\fcolorbox{black}{gray}
                        {\makebox(30,30){\text{$\set{X}$}}}}
                \put(48,23){\fcolorbox{black}{white}
                        {\makebox(30,30){\text{$\set{Y}$}}}}
                \put(48,48){\fcolorbox{black}{gray}
                        {\makebox(5,5){}}}
        \end{picture}
        \label{fig:venn_set}
        }
        \quad
        \subfloat[Set Complement $\set{X}^c$]{
        \begin{picture}(120,120)(-10,-10)
                \definecolor{white}{gray}{1}
                \definecolor{gray}{gray}{.5}
                \thicklines
                \put(0,0){\fcolorbox{black}{gray}
                        {\makebox(100,100)[tr]{\text{$\set{U}$}}}}
                \put(23,48){\fcolorbox{black}{white}
                        {\makebox(30,30){\text{$\set{X}$}}}}
                \put(48,23){\fcolorbox{black}{gray}
                        {\makebox(30,30){\text{$\set{Y}$}}}}
                \put(48,48){\fcolorbox{black}{white}
                        {\makebox(5,5){}}}
        \end{picture}
        \label{fig:venn_complement}
        }\\
        \medskip
        \subfloat[Set Union $\set{X} \cup \set{Y}$]{
        \begin{picture}(120,120)(-10,-10)
                \definecolor{white}{gray}{1}
                \definecolor{gray}{gray}{.5}
                \thicklines
                \put(0,0){\fcolorbox{black}{white}
                        {\makebox(100,100)[tr]{\text{$\set{U}$}}}}
                \put(23,48){\fcolorbox{black}{gray}
                        {\makebox(30,30){\text{$\set{X}$}}}}
                \put(48,23){\fcolorbox{black}{gray}
                        {\makebox(30,30){\text{$\set{Y}$}}}}
                \put(48,48){\fcolorbox{black}{gray}
                        {\makebox(5,5){}}}
        \end{picture}
        \label{fig:venn_union}
        }
        \quad
        \subfloat[Set Intersection $\set{X} \cap \set{Y}$]{
        \begin{picture}(120,120)(-10,-10)
                \definecolor{white}{gray}{1}
                \definecolor{gray}{gray}{.5}
                \thicklines
                \put(0,0){\fcolorbox{black}{white}
                        {\makebox(100,100)[tr]{\text{$\set{U}$}}}}
                \put(23,48){\fcolorbox{black}{white}
                        {\makebox(30,30){\text{$\set{X}$}}}}
                \put(48,23){\fcolorbox{black}{white}
                        {\makebox(30,30){\text{$\set{Y}$}}}}
                \put(48,48){\fcolorbox{black}{gray}
                        {\makebox(5,5){}}}
        \end{picture}
        \label{fig:venn_intersection}
        }\\
        \medskip
        \subfloat[Set Difference $\set{X} \setdiff \set{Y}$]{
        \begin{picture}(120,120)(-10,-10)
                \definecolor{white}{gray}{1}
                \definecolor{gray}{gray}{.5}
                \thicklines
                \put(0,0){\fcolorbox{black}{white}
                        {\makebox(100,100)[tr]{\text{$\set{U}$}}}}
                \put(23,48){\fcolorbox{black}{gray}
                        {\makebox(30,30){\text{$\set{X}$}}}}
                \put(48,23){\fcolorbox{black}{white}
                        {\makebox(30,30){\text{$\set{Y}$}}}}
                \put(48,48){\fcolorbox{black}{white}
                        {\makebox(5,5){}}}
        \end{picture}
        \label{fig:venn_difference}
        }
        \quad
        \subfloat[Symmetric Difference $\set{X} \symdiff \set{Y}$]{
        \begin{picture}(120,120)(-10,-10)
                \definecolor{white}{gray}{1}
                \definecolor{gray}{gray}{.5}
                \thicklines
                \put(0,0){\fcolorbox{black}{white}
                        {\makebox(100,100)[tr]{\text{$\set{U}$}}}}
                \put(23,48){\fcolorbox{black}{gray}
                        {\makebox(30,30){\text{$\set{X}$}}}}
                \put(48,23){\fcolorbox{black}{gray}
                        {\makebox(30,30){\text{$\set{Y}$}}}}
                \put(48,48){\fcolorbox{black}{white}
                        {\makebox(5,5){}}}
        \end{picture}
        \label{fig:venn_symdifference}
        }
        \caption[Graphical Interpretation of Set Operations]{Shaded
        regions depict operation result.}
        \label{fig:venn_diagrams}
\end{figure}

\subsection{Commutativity, Associativity, and Distributivity of Set
Operations}
\label{app:math_sets_cadso}

\paragraph{Commutativity:} The order of the union, intersection, or
symmetric difference of two sets has no impact on the outcome of the
operation. In other words, set intersection, set union, and set
symmetric difference are all \emph{commutative} operations. That is, for
sets $\set{X}$ and $\set{Y}$,
%
\begin{equation*}
        \set{X} \cup \set{Y} = \set{Y} \cup \set{X}
        \quad
        \text{ and }
        \quad
        \set{X} \cap \set{Y} = \set{Y} \cap \set{X}
        \quad
        \text{ and }
        \quad
        \set{X} \symdiff \set{Y} = \set{Y} \symdiff \set{X}
\end{equation*}
%
This can easily be seen in \longrefs{fig:venn_union} and
\shortref{fig:venn_intersection} as the area shaded does not vary with
the order of the arguments of the operation.

\paragraph{Associativity:} When taking the union, intersection, or
symmetric difference of a group of sets, the result will not be impacted
by the order in which the operations were applied. In other words, set
intersection, set union, and set symmetric difference are all
\emph{associative} operations. That is, for sets $\set{X}$, $\set{Y}$,
and $\set{Z}$,
%
\begin{equation*}
        \set{X} \cup (\set{Y} \cup \set{Z})
        = 
        (\set{X} \cup \set{Y}) \cup \set{Z}
\end{equation*}
%
and
%
\begin{equation*}
        \set{X} \cap (\set{Y} \cap \set{Z})
        = 
        (\set{X} \cap \set{Y}) \cap \set{Z}
\end{equation*}
%
and
%
\begin{equation*}
        \set{X} \symdiff (\set{Y} \symdiff \set{Z})
        = 
        (\set{X} \symdiff \set{Y}) \symdiff \set{Z}
\end{equation*}
%
where parentheses are used as grouping symbols to indicate which
operation should be completed first.

\paragraph{Distributivity of Intersection and Union:} Set operations can
distribute across grouping symbols. In other words, set intersection
\emph{distributes} over set union, and set union distributes over set
intersection. That is, for sets $\set{X}$, $\set{Y}$, and $\set{Z}$,
%
\begin{equation*}
        \set{X} \cup (\set{Y} \cap \set{Z})
        = 
        (\set{X} \cup \set{Y}) \cap (\set{X} \cup \set{Z})
        \quad
        \text{ and }
        \quad
        \set{X} \cap (\set{Y} \cup \set{Z})
        = 
        (\set{X} \cap \set{Y}) \cup (\set{X} \cap \set{Z})
\end{equation*}

\paragraph{Distributivity of Intersection over Symmetric Difference:}
The set intersection operation also distributes over symmetric
difference. That is, for sets $\set{X}$, $\set{Y}$, and $\set{Z}$,
%
\begin{equation*}
        \set{X} \cap (\set{Y} \symdiff \set{Z})
        = 
        (\set{X} \cap \set{Y}) \symdiff (\set{X} \cap \set{Z})
\end{equation*}

\subsection{The Set-Theoretic De Morgan's Laws}
\label{app:math_sets_dml}

For any two sets $\set{X}$ and $\set{Y}$, it is always the case that
%
\begin{equation*}
        ( \set{X} \cap \set{Y} )^c = \set{X}^c \cup \set{Y}^c 
\end{equation*}
%
which can also be written in terms of the universal set $\set{U}$ as
%
\begin{equation*}
        \set{U} \setdiff ( \set{X} \cap \set{Y} ) 
        = 
        ( \set{U} \setdiff \set{X} ) 
        \cup 
        ( \set{U} \setdiff \set{Y} )
\end{equation*}
%
This can be verified using the diagrams in \longref{fig:venn_diagrams}.
This relationship is particularly important to applications of
propositional logic, and so often the term \emph{De Morgan's Laws}
implies a logical context.

\section{Propositional Logic}
\label{app:math_logic}

The topic of \emph{propositional (or sentential) logic} provides a
general method for analytical reasoning, and so we need to introduce
logic as a tool to justify our claims. Here, we mean to define the
vocabulary we use in those claims. For example, the phrases \emph{if and
only if} and \emph{implies} will be defined here. Thus, our discussion
of logic is less formal and less complete than our discussions of other
mathematical constructs in this \appname{}. \Citet{Martin04} and
\citet{Gabbay02} provide concise summaries of symbolic logic, and
\citet{Hinman05} gives a more formal mathematical treatment. As already
mentioned, \citet{Stoll79} explicitly integrates logic with set theory
and algebra. We connect logic to algebra briefly in
\longref{app:math_prop_logic_boolean_algebra}. Additionally, we connect
sets to algebra in \longref{app:math_algebras_of_sets}. The connection
between set theory and logic is through the algebra that analyzes their
common structures.

\subsection{Sentences}

In propositional logic (also known as sentential logic), a
\emph{sentence} is a statement that can \emph{independently} be said to
be either \emph{true} or \emph{false} (but not both nor some other truth
value). For example, ``there exists a boy on Earth with a certain color
jacket'' cannot be a sentence because it cannot be said to be true or
false without knowing the color to which ``certain color'' refers.
However, ``there exists a boy on Earth with a red jacket'' is a
sentence; similarly, ``for any color, there exists a boy on Earth with a
jacket of that color'' is also a sentence. Both of these can be
evaluated as true or false without needing any additional information.
Sentences can also be specified in terms of mathematical relationships.
For example, ``$1+1=5$'' is a false sentence, and ``$2+2=4$'' is a true
sentence. Similarly, ``for any number $x$, $1+x=5$'' is a false sentence
and ``there exists a number $x$ such that $1+x=5$'' is a true sentence.
However, ``$1+x=5$'' alone is not a sentence because its truth cannot be
evaluated without knowing $x$. 

For simplicity, sentences will often be defined symbolically. For
example, consider the definitions:
%
\begin{align*}
        p &\triangleq \text{$2+2=4$}\\
        q &\triangleq \text{$4-2=2$}\\
        r &\triangleq \text{Joe eats with a fork.}\\
        s &\triangleq \text{Everyone on Earth eats with a fork.}\\
        t &\triangleq \text{Today, the sky on Earth is blue.}
\end{align*}
%
where Joe is a person on Earth. We will use these definitions in
examples below. Note that a symbolic sentence ``$a$'' is true only when
$a$. That is, since $2+2=4$ (\ie, $p$), the sentence ``$p$'' (\ie,
``$2+2=4$'') is true. For brevity, phrases like ``for all'' and ``for
any'' are often replaced with the symbol
\symdef{Elogic.exists0}{forall}{$\forall$}{for all/any}. Also, phrases
like ``there exists'' are often replaced with the symbol
\symdef{Elogic.exists1}{exists}{$\exists$}{there exists}. Similarly,
phrases like ``there does not exist'' will be replaced with the symbol
\symdef{Elogic.exists1}{nexists}{$\nexists$}{there does not exist}. The
phrase ``there exists a unique'' (\ie, implying the existence of one and
only one) is represented by the symbol
\symdef{Elogic.exists1}{existsunique}{$\exists \bang$}{there exists a
unique}.

\subsection{Logical Connectives and Compound Sentences}

Propositional logic is a \emph{truth-functional logic} because sentences
can be combined to make \emph{compound sentences} whose ultimate truth
depends \emph{only} upon the truth of their constituent sentences. In
other words, these compound sentences can be thought of as functions
mapping the truth of their constituents to some ultimate truth. These
compound sentences are constructed with \emph{logical connectives}.
These logical connectives are also known as \emph{logical operators} and
may be defined using the same constructs as other algebraic operators.
We describe the most common connectives here. Just as the order of
operations can be made explicit or changed with grouping symbols (\eg,
$($ and $)$) in arithmetic, those same grouping symbols can be used in
compound sentences for analogous reasons.

\paragraph{And and Or:} The sentence ``$p \text{ and } q$'' joins
sentences ``$p$'' and ``$q$'' to form the compound sentence ``$(2+2=4)
\text{ and } (4-2=2)$'' which is only true if both $2+2=4$ and $4-2=2$
(\ie, ``$p \text{ and } q$'' is only true if $p$ and $q$). Similarly,
the sentence ``$s \text{ or } t$'' is true if ``$s$'' is true, ``$t$''
is true, or both ``$s$'' and ``$t$'' are true (\ie, this is an
\emph{inclusive or}). That is, $s \text{ or } t$ only when $s$ or $t$ or
both $s$ and $t$. The symbols $\land$ and $\lor$ are often used to
represent \emph{and} and \emph{or} respectively. These symbols are
related to the symbols $\cap$ and $\cup$ respectively, which were
introduced in \longref{app:math_sets}; in fact, sometimes $\land$ and
$\lor$ are replaced with $\cap$ and $\cup$ respectively.

\paragraph{Negation:} For $a$, the sentence ``$\neg a$'' (\ie, ``not
$a$'') is the \emph{logical negation} of sentence ``$a$'' and is only
true when $a$ is not the case. That is, ``$\neg p$'' is a false sentence
since it is not the case that $2+2 \neq 4$. Note that the negation of a
sentence that includes ``every'' (\ie, $\forall$) is usually a sentence
that involves ``there is'' (\ie, $\exists$). For example, ``$\neg s$''
might be written, ``There is no one on earth who eats with a fork,''
which is most likely a false sentence. For reasons relating to the
material in \longref{app:math_sets}, the logical negation is also known
as the \emph{logical complement} or simply the \emph{complement}. In
these cases, $\neg a$ might be denoted $a^c$. 

\paragraph{Implication:} The \symdef[\emph{logical
implication}]{Elogic}{implies}{$\implies$}{logical implication}
connective for $a$ and $b$ forms the sentence ``$a \implies b$'' (\ie,
``$a \text{ implies } b$''), which is true when $a$ is not the case or
$b$ is the case (\ie, $\neg a$ or $b$). In other words, for $a$ and $b$,
``$a \implies b$'' represents the sentence ``if $a$ then $b$,'' which
can only be shown to be false when ``$a$'' is true and ``$b$'' is not
true. Thus, $p \implies q$ and $q \implies p$ (\ie, both ``$p \implies
q$'' and ``$q \implies p$'' are true sentences). However, while $s
\implies r$, it is not the case that $r \implies s$. 

\paragraph{Equivalence:} The \symdef[\emph{logical
equivalence}]{Elogic}{iff}{$\iff$}{logical equivalence} connective for
$a$ and $b$ forms the sentence ``$a \iff b$'' (\ie, ``$a \text{ is
equivalent to } b$'' or ``$a \text{ if and only if } b$'') and is only
true when $a \implies b$ and $b \implies a$. In other words, for $a$ and
$b$, if $a \iff b$ then $a$ and $b$ are equivalent sentences (with
respect to their logic). Thus, $p \iff q$; however, it is not the case
that $r \iff s$.  The symbol $\equiv$ may sometimes be used instead of
the symbol $\iff$, but $=$ is usually not an appropriate replacement.

In summary, for $a$ and $b$,
%
\begin{equation*}
        \left( 
        (a \implies b) \text{ and } (b \implies a) 
        \right)
        \iff
        \left( 
        (\neg a \text{ or } b) \text{ and } (\neg b \text{ or } a)
        \right)
        \iff
        (a \iff b)
\end{equation*}
%
is always the case.

\subsection{Converse, Inverse, and Contraposition}

Take arbitrary $a$ and $b$ and the sentence ``$a \implies b$.'' The
\emph{inverse} of the sentence is ``$\neg a \implies \neg b$.'' The
\emph{converse} of the sentence is ``$b \implies a$.'' The
\emph{contrapositive} of the sentence is the inverse of its converse,
which is ``$\neg b \implies \neg a$.'' It is the case that 
%
\begin{equation*}
        (a \implies b) \iff (\neg b \implies \neg a)
\end{equation*}
%
That is, any sentence is logically equivalent to its contrapositive.
This is not necessarily the case for its inverse and its converse. For
example, if ``everyone on Earth eats with a fork,'' then ``Joe eats with
a fork'' (\ie, $s \implies r$); however, if ``Joe does not eat with a
fork'' then ``There is someone on earth who does not eat with a fork''
(\ie, $\neg r \implies \neg s$).

\subsection{Commutativity, Associativity, and Distributivity of Logic
Operations}
\label{app:math_logic_cadso}

\paragraph{Commutativity of And and Or:} The order of the arguments of
$\land$ (\ie, \emph{and}) or $\lor$ (\ie, \emph{or}) has no impact on
the outcome of the operation. In other words, $\land$ and $\lor$ are
\emph{commutative} logic operations. That is, for $x$ and $y$,
%
\begin{equation*}
        x \lor y = y \lor x
        \quad
        \text{ and }
        \quad
        x \land y = y \land x
\end{equation*}

\paragraph{Associativity of And and Or:} A chain of three or more of the
$\land$ logical connective can be evaluated in any order.  Similarly, a
chain of three or more of the $\lor$ logical connective can be evaluated
in any order. In other words, $\land$ and $\lor$ are \emph{associtiave}
operations. That is, for $x$, $y$, and $z$,
%
\begin{equation*}
        x \lor (y \lor z)
        = 
        (x \lor y) \lor z
        \quad
        \text{ and }
        \quad
        x \land (y \land z)
        = 
        (x \land y) \land z
\end{equation*}
%
where parentheses are used as grouping symbols to indicate which
operation should be completed first.

\paragraph{Distributivity of And and Or:} Logic operations can
distribute across grouping symbols. In other words, logical and
\emph{distributes} over logical or, and logical or distributes over
logical and. That is, for $x$, $y$, and $z$,
%
\begin{equation*}
        x \lor (y \land z)
        = 
        (x \lor y) \land (x \lor z)
        \quad
        \text{ and }
        \quad
        x \land (y \lor z)
        = 
        (x \land y) \lor (x \land z)
\end{equation*}

\subsection{The Logical De Morgan's Laws}
\label{app:math_logic_dml}

For $a$ and $b$, it is always the case that
%
\begin{equation*}
        \neg ( a \text{ and } b ) \iff \neg a \text{ or } \neg b
\end{equation*}
%
In other words, to say that both $a$ and $b$ are not the case is the
same as saying that neither $a$ nor $b$ are the case.

\subsection{Application of Logic to Mathematical Proof}
\label{eq:math_logic_application_proof}

It is often necessary to prove that given some $a$, $b$ is either a
logical consequence of $a$ (\ie, $a \implies b$) or equivalent to $a$
(\ie, $a \iff b$). To prove that $a$ is equivalent to $b$, it is
necessary to prove $a \implies b$ and $b \implies a$. Thus, most
mathematical proof involves showing logical implication. To prove that
$a \implies b$, the methods of \emph{modus ponens} or \emph{modus
tollens} can be used. The former method assumes $a$ and asserts $b$. The
latter method, also called \emph{proof by contraposition}, assumes $\neg
b$ and asserts $\neg a$. We consider both proof methods to be equally
valid; however, this is the subject of sophisticated debate among
logicians.

\section{Order Theory}
\label{app:math_order_theory}

In \longref{app:math_sets}, we introduced the set, one of the most
fundamental constructs of mathematics, and hinted at ways in which sets
are used to construct the numbers and arithmetic, the subjects of
\longref{app:math_numbers}. To complete this picture, we must first
introduce concepts from \emph{order theory} which allow elements of sets
to be compared.

\subsection{Relations}
\label{app:math_relations}

Take sets $\set{X}$ and $\set{Y}$. The \emph{relation} $\rel{R}$ is the
ordered triple defined
%
\begin{equation*}
        {\rel{R}} \triangleq (\set{X}, \set{Y}, \set{G})
\end{equation*}
%
where $\set{G} \subseteq \set{X} \times \set{Y}$ is called the
\emph{graph} of the relation $\rel{R}$. For two elements $x \in \set{X}$
and $y \in \set{Y}$, if $(x,y) \in \set{G}$ then $x$ is said to be
\emph{$\rel{R}$-related} to $y$, which is denoted
%
\begin{equation*}
        x \rel{R} y
\end{equation*}
%
In other words, if $x$ is $\leq$-related to $y$ then the more familiar
notation
%
\begin{equation*}
        x \leq y
\end{equation*}
%
is used. Because of its familiarity, being $\leq$-related is often
called being \emph{less than or equal to} (\eg, $x$ is less than or
equal to $y$). We will specifically define the $\leq$ relation for
familiar numerical sets later.

\paragraph{Examples:} For example, define the relation ${<} \triangleq
(\{0,1\},\{0,1\},\set{G})$ where the graph $\set{G} \subset \{0,1\}^2$
is defined with
%
\begin{equation*}
        \set{G} \triangleq \{(0,1),(0,2),(1,2)\}
\end{equation*}
%
Thus, it is the case that $0 < 1$, $0 < 2$, and $1 < 2$. The $<$
relation might be called the \emph{less than} relation. More abstractly,
define the relation ${\prec} \triangleq (\{a,b\},\{c,d\},\set{G})$ where
the graph $\set{G} \subset \{a,b\} \times \{c,d\}$ is defined with
%
\begin{equation*}
        \set{G} \triangleq \{(a,c),(b,d)\}
\end{equation*}
%
Thus, it is the case that $a \prec c$ and $b \prec d$. However, there is
no $\prec$ relationship between $a$ and $d$, and there is no $\prec$
relationship between $b$ and $c$. 

\paragraph{Chains of Relations:} Take a set $\set{X}$ equipped with
the relations $\rel{R}$ and $\rel{S}$. Take the elements $x,y,z \in
\set{X}$. The notation
%
\begin{equation*}
        x \rel{R} y \rel{S} z
\end{equation*}
%
indicates that
%
\begin{equation*}
        x \rel{R} y 
        \quad \text{ and } \quad
        y \rel{S} z
\end{equation*}
%
That is, the former notation that shows a chain of relations is a
shorthand for the latter notation that links many relationships
logically.

\paragraph{Set Relations and the Power Set:} We have already informally
defined the relations $=$, $\subseteq$, $\supseteq$, $\subset$, and
$\supset$ for sets. Recall that whenever sets are defined, a universal
set needs to at least be implicitly defined. For example, define a
universal set $\set{U}$ to be a superset of all possible sets of
interest. That is, for any set $\set{X}$ and $\set{Y}$, $\set{X}
\subseteq \set{U}$ and $\set{Y} \subseteq \set{U}$. Note that any subset
of $\set{U}$ is an element of the power set $\Pow(\set{U})$. That is,
%
\begin{equation*}
        \Pow(\set{U}) = \{ \set{X} : x \in \set{X} \text{ implies } x
        \in \set{U} \}
\end{equation*}
%
In fact, the power set can be viewed as a universal set for all subsets
of $\set{U}$. Therefore, any relation $\rel{R}$ between two sets must
take the form
%
\begin{equation*}
        {\rel{R}} = ( \Pow(\set{U}), \Pow(\set{U}), \setset{G} )
\end{equation*}
%
where $\setset{G} \subseteq \Pow(\set{U}) \times \Pow(\set{U})$. That
is, for $\set{X} \subseteq \set{U}$ and $\set{Y} \subseteq \set{U}$
(\ie, $\set{X},\set{Y} \in \Pow(\set{U})$),
%
\begin{itemize}
        \item ${=} \triangleq ( \Pow(\set{U}), \Pow(\set{U}), \setset{G}
                )$ where $(\set{X},\set{Y}) \in \setset{G}$ if and only
                if $p \in \set{X}$ implies $p \in \set{Y}$ and $p \in
                \set{Y}$ implies $p \in \set{X}$
        \item ${\subseteq} \triangleq ( \Pow(\set{U}), \Pow(\set{U}),
                \setset{G})$ where $(\set{X},\set{Y}) \in \setset{G}$ if
                and only if for all $p \in \set{X}$, $p \in \set{Y}$
        \item ${\supseteq} \triangleq ( \Pow(\set{U}), \Pow(\set{U}),
                \setset{G})$ where $(\set{X},\set{Y}) \in \setset{G}$ if
                and only if for all $p \in \set{Y}$, $p \in \set{X}$
        \item ${\subset} \triangleq ( \Pow(\set{U}), \Pow(\set{U}),
                \setset{G} )$ where $(\set{X},\set{Y}) \in \setset{G}$
                if and only if there exists a $q \in \set{Y}$ such that
                $q \notin \set{X}$ and for all $p \in \set{X}$, $p \in
                \set{Y}$
        \item ${\supset} \triangleq ( \Pow(\set{U}), \Pow(\set{U}),
                \setset{G} )$ where $(\set{X},\set{Y}) \in \setset{G}$
                if and only if there exists a $q \in \set{X}$ such that
                $q \notin \set{Y}$ and for all $p \in \set{Y}$, $p \in
                \set{X}$
\end{itemize}
%
This shows one of the many important uses of the power set.

\paragraph{Notation:} The shorthand notation $(\set{X}, {\rel{R}})$
indicates that the set $\set{X}$ is \emph{equipped} with the relation
$\rel{R} = (\set{X},\set{X},\set{G})$. In other words, $(\set{X},
{\rel{R}})$ communicates that mutual elements of set $\set{X}$ are
$\rel{R}$-related by the graph $\set{G}$. Note that the $=$ relation is
typically equipped with all sets as its definition is well understood
and can be easily applied. Similarly, as we will show, familiar
relations like $\leq$ are typically assumed to be equipped with familiar
sets like $\W$. That is, it is rare to see these sets and these
relations grouped together explicitly; however, it is assumed that
$\leq$ is provided with the standard definition.

\subsection{Equivalence Relations on a Set}
\label{app:math_equivalence_relations}

An \emph{equivalence relation} on a set $\set{X}$ is a relation $\sim$
so that for $x,y,z \in \set{X}$,
%
\begin{itemize}
        \item $x \sim x$
        \item if $x \sim y$ then $y \sim x$
        \item if $x \sim y$ and $y \sim z$ then $y \sim z$
\end{itemize}
%
As mentioned above in \longref{app:math_relations}, the equivalence
relation $=$ defined for sets is that a set $\set{X} = \set{Y}$ if and
only if $\set{X} \subseteq \set{Y}$ and $\set{Y} \subseteq \set{X}$. It
can be shown that this relation satisfies the three criteria for an
equivalence relation.

\subsection{Equivalence Class}
\label{app:equivalence_class}

Take a set $\set{X}$ equipped with an \emph{equivalence relation} $\sim$
(\eg, for set $\W$, $\sim$ might be replaced with $=$). Take an element
$a \in \set{X}$. The \emph{equivalence class} of $a \in \set{X}$ is
denoted \symdef{Csets.3}{equivclass}{${[a]}$}{equivalence class (\eg,
$\{x \in \set{X} : x = a \}$)} and is defined
%
\begin{equation*}
        [a] \triangleq \{ x \in \set{X} : x = a \}
\end{equation*}
%
That is, $[a] \subseteq \set{X}$ is a subset of set $\set{X}$ in which
any two elements from $[a]$ are equivalent by the equivalence relation
$\sim$. 

Therefore any set $\set{X}$ equipped with equivalence relation $\sim$,
which can be denoted $(\set{X}, {\sim})$, has equivalence classes of the
form $[x]$ for every $x \in \set{X}$. Clearly, for two elements $x,y \in
\set{X}$, it is the case that
%
\begin{equation*}
        x \sim y \text{ if and only if } [x] \sim [y]
\end{equation*}
%
and so there may be many representations of the same equivalence class.
That is, for $(\set{X},{\sim})$ and any equivalence class $[x]$, it is
the case that
%
\begin{equation*}
        [x] = [y] \text{ for all } y \in [x]
\end{equation*}
%
where $=$ is the equivalence relation defined for sets. That is, $[x] =
[y]$ if and only if $[x] \subseteq [y]$ and $[y] \subseteq [x]$.

For $(\set{X},{\sim})$, the set of all equivalence classes
\emph{induced} by equivalence relation $\sim$ is denoted
\symdef{Csets.31}{quotientset}{$\set{X}/{=}$}{quotient set induced by
set $\set{X}$ over relation $=$ (\ie, set of all $\sim$ equivalence
classes in $\set{X}$)} and is defined and is called the \emph{quotient
set} of $\set{X}$ by $\sim$. That is,
%
\begin{equation*}
        \set{X}/{\sim}
        \triangleq
        \{ [x] : x \in \set{X} \}
\end{equation*}
%
In fact, the quotient set $\set{X}/{\sim}$ is a partition of $\set{X}$.
That is, for any two equivalence classes $[x],[y] \in \set{X}/{\sim}$,
it must be either that $[x] = [y]$ or $[x] \cap [y] = \emptyset$.
Additionally, the union of all sets in $\set{X}/{\sim}$ is the set
$\set{X}$. In other words, for a set $\set{X}$ with equivalence relation
$\sim$, the equivalence relation \emph{divides} the set into disjoint
subsets that collectively exhaust $\set{X}$. It is this \emph{division}
that motivates the notation $\set{X}/{\sim}$ which shows $\set{X}$ being
\emph{divided by} the equivalence relation $\sim$.

\subsection{Preorder Relations on a Set}

A \emph{preorder relation} on a set $\set{X}$ is a relation $\rel{R}$
(the symbol $\leq$ is often used) so that for $x,y,z \in \set{X}$,
%
\begin{enumerate}[(i)]
        \item $x \rel{R} x$
                \label{item:preorder_reflexivity}
        \item if $x \rel{R} y$ and $y \rel{R} z$ then $y \rel{R} z$
                \label{item:preorder_transitivity}
\end{enumerate}
%
In this case, the set $\set{X}$ is called a \emph{preordered set}. To
indicate the preorder relation on set $\set{X}$, it will often be
written that $(\set{X},{\rel{R}})$ is a preordered set.

\paragraph{Preorders as Equivalence Relations:} Take
$(\set{X},{\rel{R}})$ to be a preorder. Assume that it is the case that
for all $x,y \in \set{X}$, if $x \rel{R} y$ then $y \rel{R} x$. In this
case, the preorder relation ${\leq}$ is an equivalence relation.

\subsection{Directed Sets}

A \emph{direction} on a \emph{nonempty} set $\set{X}$ is a relation
$\leq$ so that for $x,y,z \in \set{X}$,
%
\begin{enumerate}[(i)]
        \item $x \leq x$
                \label{item:directed_reflexivity}
        \item if $x \leq y$ and $y \leq z$ then $x \leq z$
                \label{item:directed_transitivity}
        \item there exists $t \in \set{X}$ such that $x \leq t$ and $y
                \leq t$
                \label{item:directed_directedness}
\end{enumerate}
%
In this case, the nonempty set $\set{X}$ is called a \emph{directed
set}, and it is said that $\set{X}$ is \emph{directed} by the relation
$\leq$. To indicate the direction relation on set $\set{X}$, it may be
written that $(\set{X},{\leq})$ is a directed set. Note that all
directed sets are preordered sets. 

\paragraph{Downward Directed Sets:} Take a nonempty set $\set{X}$ and a
relation $\leq$ so that for $x,y,z \in \set{X}$,
%
\begin{enumerate}[(i)]
        \item $x \leq x$
                \label{item:downdirected_reflexivity}
        \item if $z \leq y$ and $y \leq x$ then $z \leq x$
                \label{item:downdirected_transitivity}
        \item there exists $t \in \set{X}$ such that $t \leq x$ and $t
                \leq y$
                \label{item:downdirected_directedness}
\end{enumerate}
%
In this case, the set $\set{X}$ is said to be \emph{downward directed}
by the relation $\leq$. Now assume that there is a relation $\geq$ such
that for any $x,y \in \set{X}$, $x \leq y$ is equivalent to $y \geq x$.
In that case, $(\set{X},{\leq})$ is a downward directed set if and only
if $(\set{X},{\geq})$ is a directed set, and $(\set{X},{\leq})$ is a
directed set if and only if $(\set{X},{\geq})$ is a downward directed
set.

\subsection{Partial Order Relations on a Set}
\label{app:partial_order_relations}

A \emph{partial order relation} on a set $\set{X}$ already equipped with
equivalence relation $=$ is a relation $\leq$ so that for $x,y,z \in
\set{X}$,
%
\begin{enumerate}[(i)]
        \item $x \leq x$
                \label{item:poset_reflexivity}
        \item if $x \leq y$ and $y \leq x$ then $y = x$
                \label{item:poset_antisymmetry}
        \item if $x \leq y$ and $y \leq z$ then $x \leq z$
                \label{item:poset_transitivity}
\end{enumerate}
%
In this case, the set $\set{X}$ is called a \emph{partially ordered set}
or a \emph{poset}. It is common to say that $(\set{X},{\leq})$ is a
poset, which indicates that the set is ordered by the partial order
relation $\leq$. Note that it is not the case that any two elements from
a poset can be compared. If any two elements from a poset can be
compared then that poset is called a \emph{totally ordered set}.
Additionally, clearly properties (\shortref{item:poset_reflexivity}) and
(\shortref{item:poset_transitivity}) make any partially ordered set a
preoredered set as well.

\symdef[]{Ageneral.5}{ineq}{$\leq$ ($\geq$)}{less (greater) than or
equal to}\symdef[]{Ageneral.5}{strictineq}{$<$ ($>$)}{strictly less
(greater) than}The symbol $\leq$ typically indicates that an element is
\emph{less than or equal to} another element or simply \emph{before}
another element.  The symbol $<$ can be used instead to indicate that an
element is \emph{(strictly) less than} another element. That is, for a
set $\set{X}$ and elements $x,y \in \set{X}$,
%
\begin{equation*}
        x < y 
        \quad \text{ if and only if } \quad 
        x \leq y \text{ and } x \neq y
\end{equation*}
%
Additionally, for set $\set{X}$ and elements $x,y \in \set{X}$, the
phrase $x < y$ ($x \leq y$) can be written $y > x$ ($y \geq x$), in
which case the symbol $>$ ($\geq$) indicates that $y$ is \emph{greater
than (or equal to)} $x$. 

\paragraph{Meets, Joins, and Lattices:} Take partially ordered set
$(\set{X},{\leq})$ and $x,y \in \set{X}$. Consider two cases.
%
\begin{enumerate}[(i)]
        \item Assume that there exists $a \in \set{X}$ such that $a \leq
                x$ and $a \leq y$ and $z \leq a$ for all $z \in \set{X}$
                such that $z \leq x$ and $z \leq y$. In that case, $a$
                is called the \emph{greatest lower bound} or
                \symdef[\emph{(pairwise) meet}]{Forder.02}{meet}{$x
                \land y$}{the pairwise meet (\ie, greatest lower bound)
                of $x$ and $y$} of $x$ and $y$. It can also be said that
                $a$ is the \emph{infima} of the set $\{x,y\}$. Denote
                $a$ with $x \land y$ or $y \land x$. It is not
                coincidental that this notation is similar to the
                notation for a logical \emph{and}.
        \item Assume that there exists $b \in \set{X}$ such that $x \leq
                b$ and $y \leq b$ and $b \leq z$ for all $z \in \set{X}$
                such that $x \leq z$ and $y \leq z$. In that case, $b$
                is called the \emph{least upper bound} or
                \symdef[\emph{(pairwise) join}]{Forder.02}{join}{$x \lor
                y$}{the pairwise join (\ie, least upper bound) of $x$
                and $y$} of $x$ and $y$. It can also be said that $b$ is
                the \emph{suprema} of the set $\{x,y\}$. Denote $b$ with
                $x \lor y$ or $y \lor x$.  It is not coincidental that
                this notation is similar to the notation for a logical
                \emph{or}.
\end{enumerate}
%
If the pairwise meet and pairwise join both exist for all $x,y \in
\set{X}$, then $(\set{X},{\leq})$ is called a \emph{lattice}. Three
common terms when dealing with lattices are the following, which assume
that $(\set{X},{\leq})$ is a partially ordered set.
%
\begin{description}
        \item\emph{Totally Ordered Set:} Assume that it is the case that
                for all $x,y \in \set{X}$, the set $\{ x \lor y, x \land
                y \}$ is equivalent to the set $\{x,y\}$ (which is
                equivalent to the set $\{y,x\}$). In this case,
                $(\set{X},{\leq})$ is a \emph{totally ordered set} and
                it is the case that for all $x,y \in \set{X}$, $x \leq
                y$ if and only if $x \land y = x$. This type of set is
                the subject of \longref{app:math_total_order_set}.
        \item\emph{Complete Lattice:} Take a subset $\set{Y} \subseteq
                \set{X}$. Assume that it is the case that there exists
                $a,b \in \set{X}$ such that $a \leq y$ and $y \leq b$
                for all $y \in \set{Y}$. Take such an $a$ and $b$. In
                this case, $a$ is called a \emph{lower bound} for
                $\set{Y}$ and $b$ is called an \emph{upper bound} for
                $\set{Y}$. Now assume that for all $c,d \in \set{X}$
                such that $c$ is a lower bound for $\set{Y}$ and $d$ is
                an upper bound for $\set{Y}$, $a \geq c$ and $b \leq d$.
                In this case, $a$ is called the \emph{greatest lower
                bound} for $\set{Y}$ and $b$ is called the \emph{least
                upper bound} for $\set{Y}$. The greatest lower bound of
                $\set{Y}$ is also called the \emph{meet} or the
                \emph{infima} of $\set{Y}$ and is denoted by
                %
                \begin{equation*}
                        \sup \set{Y} 
                        \quad \text{or} \quad 
                        \bigwedge \set{Y}
                \end{equation*}
                %
                The least upper bound of $\set{Y}$ is also called the
                \emph{join} or \emph{suprema} of $\set{Y}$ and is
                denoted by
                %
                \begin{equation*}
                        \inf \set{Y} 
                        \quad \text{or} \quad 
                        \bigvee \set{Y}
                \end{equation*}
                %
                If for every subset $\set{Y} \subseteq \set{X}$, the
                meet of $\set{Y}$ and the join of $\set{Y}$ exists, then
                $\set{X}$ is called a \emph{complete lattice}. Upper and
                lower bounds are treated in detail in
                \longref{app:math_upper_lower_bound}.
        \item\emph{Bounded Lattice:} Assume that there exists an $a,b
                \in \set{X}$ such that $a \leq x$ and $x \leq b$ for all
                $x \in \set{X}$. In this case, $a$ is called the
                \emph{least element} or \emph{bottom} of $\set{X}$ and
                $b$ is called the \emph{greatest element} or \emph{top}
                of $\set{X}$. If a poset has both a greatest element and
                a least element, it is called a \emph{bounded poset}. If
                it is additionally a lattice, it is called a
                \emph{bounded lattice}. Note that all complete lattices
                are bounded lattices.
\end{description}

\subsection{Total Ordering on a Set}
\label{app:math_total_order_set}

For a set $\set{X}$ already equipped with an equivalence relation $=$, a
\emph{total ordering} on that set is a \emph{total order} relation
$\leq$ such that for \emph{any} three elements $x,y,z \in \set{X}$, 
%
\begin{enumerate}[(i)]
        \item $x \leq y$ or $y \leq x$ (or both)
                \label{item:toset_totality}
        \item if $x \leq y$ and $y \leq x$ then $x = y$
                \label{item:toset_antisymmetry}
        \item if $x \leq y$ and $y \leq z$ then $x \leq z$
                \label{item:toset_transitivity}
\end{enumerate}
%
This is identical to a partially ordered set (\ie, a poset); however, in
this case, \emph{every} element of the set can be \emph{compared} to
every other element. This comparison is the \emph{ordering}, and since
it captures a relationship between \emph{any two elements}, it is called
a \emph{total ordering} or is said to be \emph{total}. A set $\set{X}$
equipped with an order relation $\leq$ is called a \emph{totally ordered
set} or simply an \emph{ordered set}. Such a set is sometimes denoted
with its (total) order relation as $(\set{X},{\leq})$; however, for sets
with a well-understood standard ordering, this notation is often
omitted. Note that the symbols $\leq$, $<$, $\geq$, and $>$ have the
same interpretation as in a poset. Clearly, every (totally) ordered set
is a poset as well.

\paragraph{Totally Ordered Set as a Lattice:} Take a totally ordered set
$(\set{X},{\leq})$. Take $a,b \in \set{X}$. Without loss of generality,
assume $a \leq b$. In this case, clearly $a$ is the join and $b$ is the
meet of $a$ and $b$. Therefore, every totally ordered set is a lattice
as well.

\paragraph{Total Ordering as Directed Set:} Take a \emph{nonempty}
totally ordered set $(\set{X},{\leq})$. Note that for all $x,y,z \in
\set{X}$, without loss of generality we can assume that $x$, $y$, and
$z$ were chosen such that
%
\begin{equation*}
        x \leq y \leq z
\end{equation*}
%
Therefore, for all $x,y \in \set{X}$, there exists a $z \in \set{X}$
such that $x \leq z$ and $y \leq z$. Therefore, every nonempty totally
ordered set is also a directed set.

\paragraph{Whole Numbers as Example:} The set of the whole numbers $\W$
can be equipped with an \emph{equivalence relation} $=$ and a total
order relation $\leq$ such that for any two whole numbers $x,y \in \W$,
%
\begin{itemize}
        \item $x \leq y$ if and only if $x \subseteq y$
        \item $x = y$ if and only if $x \subseteq y$ and $y \subseteq x$
\end{itemize}
%
That is, define ${\leq} \triangleq (\W,\W,\set{G}_\leq)$ and ${=}
\triangleq (\W,\W,\set{G}_=)$ with
%
\begin{equation*}
        \set{G}_\leq 
        \triangleq 
        \{ (x,y) \in \W^2 : x \subseteq y \}
        \quad \text{ and } \quad
        \set{G}_=
        \triangleq 
        \{ (x,y) \in \W^2 : x \subseteq y \text{ and } y \subseteq x \}
\end{equation*}
%
By these relations,
%
\begin{align*}
        0 < 1 < 2 < 3 < 4 < 5 < 6 < \cdots
\end{align*}
%
which, of course, also means that
%
\begin{align*}
        0 \leq 1 \leq 2 \leq 3 \leq 4 \leq 5 \leq 6 \leq \cdots
\end{align*}
%
and this is the standard ordering of the whole numbers. The set $\W$
equipped with the order relation $\leq$ makes $\W$ a totally ordered
set. Since this is the standard whole number ordering, $\W$ will rarely
be written explicitly with $\leq$ (\ie, $(\W,{\leq})$ or even
$(\W,{=},{\leq})$). However, non-traditional ordered sets or
non-traditional order relations on traditional ordered sets will often
be listed with their order relations. Note that $(\W,{\leq})$ and
$(\W,{\subseteq})$ are both directed sets as well; this is not
surprising because all nonempty totally ordered sets are also directed
sets.

\paragraph{Comparison to Partially Ordered Sets:} Note that all totally
ordered sets are also partially ordered sets. In a partially ordered
set, it is not the case that every element can be compared to every
other element. A poset (\ie, a partially ordered set) in which any two
elements can be compared is \emph{total}; that is, it is a totally
ordered set.

\paragraph{Intervals of Totally Ordered Sets:} Take totally ordered set
$(\set{X},{\leq})$. Take two elements $a,b \in \set{X}$ such that $a
\leq b$. The notations
\symdef[]{Csets.2intervals1}{interval1}{${[a,b]}$}{interval $[a,b]
\triangleq \{ x \in \set{X} : a \leq x \leq b
\}$}\symdef[]{Csets.2intervals2}{interval2}{${(a,b]}$}{interval $(a,b]
\triangleq \{ x \in \set{X} : a < x \leq b
\}$}\symdef[]{Csets.2intervals3}{interval3}{${[a,b)}$}{interval $[a,b)
\triangleq \{ x \in \set{X} : a \leq x < b
\}$}\symdef[]{Csets.2intervals4}{interval4}{${(a,b)}$}{interval $(a,b)
\triangleq \{ x \in \set{X} : a < x < b \}$}$[a,b]$, $(a,b]$, $[a,b)$,
and $(a,b)$ are defined with
%
\begin{align*}
        [a,b] &\triangleq \{ x \in \set{X} : a \leq x \leq b \}\\
        (a,b] &\triangleq \{ x \in \set{X} : a < x \leq b \}\\
        [a,b) &\triangleq \{ x \in \set{X} : a \leq x < b \}\\
        (a,b) &\triangleq \{ x \in \set{X} : a < x < b \}
\end{align*}
%
respectively. These sets are all called \emph{intervals} and $a$ and $b$
are called \emph{endpoints}. Specifically, $a$ is called the \emph{left
endpoint} and $b$ is called the \emph{right endpoint} of each of the
four intervals above. This notation provides a convenient way to specify
a range of elements from an ordering. Since these are defined as sets,
all standard set operations (\eg, intersection and union) apply to them.
For example, for $c \in \set{X}$ with $a < c < b$, $[a,b] \setdiff \{c\}
= [a,c) \cup (c,b]$. Also note that usually this notation is used with
$a < b$. Special intervals are presented at the conclusions of
\longrefs{app:math_reals} and \shortref{app:math_ext_reals}. 

\paragraph{Dense Ordering:} Take totally ordered set $(\set{X},{\leq})$.
If it is the case that for every $x,y \in \set{X}$ with $x < y$, there
exists a $z \in \set{X}$ such that $x < z < y$ then $(\set{X},{\leq})$
is said to be \emph{densely ordered} and $\leq$ is called a \emph{dense
order}.

\subsection{Upper and Lower Bounds}
\label{app:math_upper_lower_bound}

Take $\set{S}$ to be a partially ordered set equipped with partial order
relation $\leq$. Take $\set{X} \subseteq \set{S}$. If there exists an
$\alpha \in \set{S}$ such that for every $x \in \set{X}$, it is the case
that $\alpha \leq x$ then $\alpha$ is called a \emph{lower bound} of set
$\set{X}$ and $\set{X}$ is said to be \emph{bounded from below}.
Similarly, if there exists a $\beta \in \set{S}$ such that for every $x
\in \set{X}$, it is the case that $x \leq \beta$ then $\beta$ is called
an \emph{upper bound} of set $\set{X}$ and $\set{X}$ is said to be
\emph{bounded from above}. If $\set{X}$ is both bounded from above and
bounded from below, $\set{X}$ is simply called a \emph{bounded} set. 

Again, take $\set{S}$ to be an ordered set equipped with order relation
$\leq$, and take $\set{X} \subseteq \set{S}$. Assume that $\alpha \in
\set{S}$ is a lower bound of $\set{X}$ and that for every $s \in
\set{S}$, if $\alpha < s$ then $s$ is \emph{not} a lower bound of
$\set{X}$. In that case, $\alpha$ is called the \emph{greatest lower
bound} or the \emph{join} or the
\symdef[]{Forder.11}{infbigwedge}{$\bigwedge$}{join of a set (\ie,
lowest upper bound or
supremum)}\symdef[\emph{supremum}]{Forder.201}{sup}{$\sup$}{supremum
(\ie, lowest upper bound or join)} of $\set{X}$ and is denoted by 
%
\begin{equation*}
        \sup \set{X}
        \quad
        \text{ or }
        \quad
        \bigvee \set{X}
\end{equation*}
%
If $\sup \set{X} \in \set{X}$ then $\sup \set{X}$ is said to be the
\symdef[\emph{maximum}]{Forder.202}{max}{$\max$}{maximum element} of
$\set{X}$ and is denoted by 
%
\begin{equation*}
        \max \set{X}
\end{equation*}
%
Now assume that $\beta \in \set{S}$ is an upper bound of $\set{X}$ and
that for every $s \in \set{S}$, if $s < \beta$ then $s$ is \emph{not} an
upper bound of $\set{X}$. In that case, $\beta$ is called the
\emph{least upper bound} or the \emph{meet} or the
\symdef[]{Forder.12}{infbigvee}{$\bigvee$}{meet of a set (\ie, greatest
lower bound or
infimum)}\symdef[\emph{infimum}]{Forder.201}{inf}{$\inf$}{infimum (\ie,
greatest lower bound or meet)} of $\set{X}$ and is denoted by 
%
\begin{equation*}
        \inf \set{X}
        \quad
        \text{ or }
        \quad
        \bigwedge \set{X}
\end{equation*}
%
If $\inf \set{X} \in \set{X}$ then $\inf \set{X}$ is said to be the
\symdef[\emph{minimum}]{Forder.202}{min}{$\min$}{minimum element}
of $\set{X}$ and is denoted with 
%
\begin{equation*}
        \min \set{X}
\end{equation*}
%
We will call the infimum and supremum of a set its \emph{extremum
bounds}.

\paragraph{Bounded Poset Bounds:} Take bounded poset $(\set{X},{\leq})$.
Take $a \in \set{X}$ to be the bottom (\ie, least element) of $\set{X}$
and $b \in \set{X}$ to be the top (\ie, greatest element) of $\set{X}$.
Note that
%
\begin{equation*}
        \sup \emptyset = a
\end{equation*}
%
That is, since every element of $\set{X}$ can be called an upper bound
of $\emptyset$, the least upper bound of the empty set is the least
element of $\set{X}$. Similarly,
%
\begin{equation*}
        \inf \emptyset = b
\end{equation*}
%
That is, since every element of $\set{X}$ can be called a lower bound of
$\emptyset$, the greatest lower bound of the empty set is the greatest
element of $\set{X}$.

\paragraph{Gapless and Complete:} Take partially ordered set
$(\set{X},{\leq})$. If it is the case that
%
\begin{enumerate}[(i)]
        \item the supremum of every nonempty subset of $\set{X}$ that is
                bounded from above exists (\ie, is an element of
                $\set{X}$) \label{item:lub_property}
        \item the infimum of every nonempty subset of $\set{X}$ that is
                bounded from above exists (\ie, is an element of
                $\set{X}$) \label{item:glb_property}
\end{enumerate}
%
then set $\set{X}$ is called \emph{gapless} or \emph{Dedekind complete}.
In particular, property (\shortref{item:lub_property}) is called the
\emph{least-upper-bound property} and property
(\shortref{item:glb_property}) is called the \emph{greatest-lower-bound}
property. If set $\set{X}$ is gapless and every nonempty subset of
$\set{X}$ is bounded then $\set{X}$ is called \emph{complete (in the
sense of order)} or a \emph{complete lattice}. When we use the term
\emph{complete}, we use it in this sense. To be more specific, some use
the term \emph{complete lattice}. While the term \emph{gapless} is not
our invention, it is not conventional; it should not be used in
mathematical discourse without a definition.

Take partially ordered set $(\set{X},{\leq})$. Assume that nonempty sets
$\set{A}$ and $\set{B}$ form a \emph{partition} of set $\set{X}$ (\ie,
every element of set $\set{X}$ is either an element of $\set{A}$ or
$\set{B}$ but not an element of both). Assume that it is the case for
that any element $a \in \set{A}$ and any element $b \in \set{B}$, $a <
b$ (\ie, every element in set $\set{A}$ is less than every element in
$\set{B}$). If and only if there exists $c \in \set{X}$ such that for
any $a \in \set{A}$ and any $b \in \set{B}$ it is the case that $a \leq
c \leq b$ then $\set{X}$ is \emph{gapless}. That is, this is an
equivalent definition of \emph{gapless}. Also note that in this case,
for such a $c$, $c = \sup \set{A} = \inf \set{B}$; that is, $c$ forms a
sort of \emph{boundary} between $\set{A}$ and $\set{B}$.

\paragraph{Existence of Upper Bounded Set Maxima:} Take partially
ordered set $(\set{X},{\leq})$. Assume that $(\set{X},{\leq})$ is
gapless but \emph{not} densely ordered. Now take a nonempty subset
$\set{A} \subseteq \set{X}$ such that $\set{A}$ is bounded from above.
Since $(\set{X},{\leq})$ is gapless, $\sup \set{A}$ exists. However,
since $(\set{X},{\leq})$ is not densely ordered, it can be shown that
$\max \set{A}$ exists and, of course, $\max \set{A} = \sup \set{A}$. 

\paragraph{Existence of Lower Bounded Set Minima:} Take partially
ordered set $(\set{X},{\leq})$. Assume that $(\set{X},{\leq})$ is
gapless but \emph{not} densely ordered. Now take a nonempty subset
$\set{A} \subseteq \set{X}$ such that $\set{A}$ is bounded from below.
Since $(\set{X},{\leq})$ is gapless, $\inf \set{A}$ exists. However,
since $(\set{X},{\leq})$ is not densely ordered, it can be shown that
$\min \set{A}$ exists and, of course, $\min \set{A} = \inf \set{A}$. 

\subsection{Order-Preserving Functions and Order Isomorphic Sets}
\label{app:math_order_preserving}

Take $(\set{X},{\preceq})$ and $(\set{Y},{\trianglelefteq})$ to each be
sets paired with their corresponding total order relation. The function
$f: \set{X} \mapsto \set{Y}$ is called \emph{monotone} or
\emph{order-preserving} if it is the case that for every $x,y \in
\set{X}$ if $x \preceq y$ then $f(x) \trianglelefteq f(y)$. If a
monotone function is also a bijection (\ie, $\set{X} \cong \set{Y}$)
then the function is said to be an \emph{order isomorphism} and the sets
$\set{X}$ and $\set{Y}$ are \emph{order isomorphic}. Roughly, this means
that every element from set $\set{X}$ can be replaced with a unique
element from $\set{Y}$ and as long as the order relations are also
exchanged the ordering will not change.

\subsection{Filters on Partially Ordered Sets}
\label{app:math_filters_on_posets}

Take partially ordered set $(\set{S},{\leq})$. Take a \emph{nonempty}
subset $\set{F}$ (\ie, $\set{F} \subseteq \set{S}$ with $\set{F} \neq
\emptyset$). Now assume that
%
\begin{enumerate}[(i)]
        \item for all $x,y \in \set{F}$, there exists some $z \in
                \set{F}$ such that $z \leq x$ and $z \leq y$
                \label{item:poset_filter_base}
        \item for all $x \in \set{F}$ and $y \in \set{S}$, if $x \leq y$
                then $y \in \set{F}$
                \label{item:poset_upper_set}
%        \item $\set{F} \neq \set{S}$
%                \label{item:poset_proper}
\end{enumerate}
%
In this case, $\set{F}$ is called a \emph{filter}. If it is the case
that $\set{F} \neq \set{S}$ then $\set{F}$ may be called a \emph{proper
filter}. If only property $(\shortref{item:poset_filter_base})$ is met
then $\set{F}$ is called a \emph{filter base} (or a \emph{filter
basis}), and a filter base $\set{F}$ with $\set{F} \neq \set{S}$ is
called a \emph{proper filter base}.

\subsection{Nets and Sequences}
\label{app:math_nets_and_sequences}

Take a set $\set{X}$ and a directed set $(\set{A},{\leq})$. The ordered
indexed family $(x_\alpha)_{\alpha \in \set{A}}$ (\ie, a family with
domain $\set{A}$ and codomain $\set{X}$) is called a
\symdef[\emph{net}]{Dseq.3}{net}{$(x_\alpha)$}{a net (\ie, an ordered
indexed family $(x_\alpha : \alpha \in \set{A})$ with directed index set
$\set{A}$)}. Usually nets are listed without their index sets and the
indices are given by Greek lowercase alphabetic letters. For example,
%
\begin{equation*}
        (x_\alpha) \triangleq (x_\alpha)_{\alpha \in \set{A}}
\end{equation*}
%
is a net.

\paragraph{Sequences:} Take set $\set{X}$, totally ordered set
$(\N,{\leq})$, and the net $(x_n)$ from $\N$ to $\set{X}$. In this case,
when a net's domain is $\N$, the net is called a
\symdef[\emph{sequence}]{Dseq.3}{sequence}{$(x_n)$}{a sequence (\ie, an
ordered indexed family $(x_n : n \in \N)$ with totally ordered index set
$\N$)} and its indices are usually given with English lowercase
alphabetic letters. For example,
%
\begin{equation*}
        (x_n) \triangleq (x_n)_{n \in \N}
\end{equation*}
%
is a sequence (and, of course, also a net). 

\paragraph{Monotonic Sequences:} Take $(\set{X},{\leq})$ to be a totally
ordered set and a sequence $(x_n)$ such that $x_n \in \set{X}$ for all
$n \in \N$. If for all $m,n \in \N$ with $m > n$, 
%
\begin{itemize}
        \item $x_m \geq x_n$ then the sequence is said to be
                \emph{monotonically increasing}
        \item $x_m > x_n$ then the sequence is said to be
                \emph{strictly monotonically increasing}
        \item $x_m \leq x_n$ then the sequence is said to be
                \emph{monotonically decreasing}
        \item $x_m < x_n$ then the sequence is said to be
                \emph{strictly monotonically decreasing}
\end{itemize}
%
For example, the sequence $(1,2,3,4,\dots)$ is clearly strictly
monotonically increasing.

\section{Elementary Abstract Algebra}
\label{app:math_abstract_algebra}

Now that we have shown how elements of sets can be compared, we can
introduce concepts from \emph{algebra} which allow elements of sets to
interact. That is, we will show how elements can be operated on in order
to produce other elements. Together with the constructs from
\longref{app:math_order_theory}, this gives sets an idea of structure
and shape. We will then show how the structures of two different sets
can be related. Once a set is endowed with a sufficient order and
structure, familiar \emph{arirthmetic} can be defined for its elements;
this is our motivation for all of this discussion. \Citet{Roman92}
provides further information about the algebraic structures important to
us and their application.

\subsection{Operations}
\label{app:math_operations}

We focus our attention on \emph{binary operations}, which are also
called \emph{dyadic operations}. Our definition of these
\emph{operations} is weaker than many of the the conventional
definitions. For sets $\set{X}$, $\set{Y}$, and $\set{Z}$, a
\emph{binary operation} is a function of the form $\set{X} \times
\set{Y} \mapsto \set{Z}$ where at least two of the sets are usually the
same.

Take sets $\set{X}$, $\set{Y}$, and $\set{Z}$. Also take binary
operation ${\bin{Q}}: \set{X} \times \set{Y} \mapsto \set{Z}$.  For some
$(x,y) \in \set{X} \times \set{Y}$ and $z \in \set{Z}$, if it is the
case that $\mathop{\bin{Q}}(x,y) = z$, the notation
%
\begin{equation*}
        x \bin{Q} y = z
\end{equation*}
%
is used and $\bin{Q}$ is referred to as a \emph{binary operator} or
simply an \emph{operator}. For example, assume that an operator
${+}: \W \times \W \mapsto \W$ has been defined; then for any
$x,y \in \W$, there exists $z \in \W$ such that
%
\begin{equation*}
        x + y = z
\end{equation*}
%
This is possibly a more familiar notation than the generic one that uses
$\bin{Q}$ above.

\paragraph{Set Operations and the Power Set:} We have already informally
defined the operations $\cap$, $\cup$, and ${}^c$ (where ${}^c$ is a
\emph{unary operation}) for sets. Recall that whenever sets are defined,
a universal set needs to at least be implicitly defined. For example,
define a universal set $\set{U}$ to be a superset of all possible sets
of interest. That is, for any set $\set{X}$ and $\set{Y}$, $\set{X}
\subseteq \set{U}$ and $\set{Y} \subseteq \set{U}$. Note that any subset
of $\set{U}$ is an element of the power set $\Pow(\set{U})$. That is,
%
\begin{equation*}
        \Pow(\set{U}) = \{ \set{X} : x \in \set{X} \text{ implies } x
        \in \set{U} \}
\end{equation*}
%
In fact, the power set can be viewed as a universal set for all subsets
of $\set{U}$. Therefore, any operation $\bin{Q}$ between two sets must
take the form
%
\begin{equation*}
        {\bin{Q}}: 
        \Pow(\set{U}) \times \Pow(\set{U}) \mapsto \Pow(\set{U})
\end{equation*}
%
That is, for $\set{X} \subseteq \set{U}$ and $\set{Y} \subseteq \set{U}$
(\ie, $\set{X},\set{Y} \in \Pow(\set{U})$),
%
\begin{itemize}
        \item $\set{X} \cap \set{Y} \triangleq \{ x \in \set{X} : x \in
                \set{Y} \}$
        \item $\set{X} \cup \set{Y} \triangleq \{ x \in \set{U} : x \in
                \set{X} \text{ or } x \in \set{Y} \}$
        \item $\set{X}^c \triangleq \{ x \in \set{U} : x \notin \set{X}
                \}$
\end{itemize}
%
Of course, these are the standard definitions for these three
operations.

\paragraph{Magma Notation:} The shorthand notation $(\set{X},
{\bin{Q}})$ indicates that the set $\set{X}$ is \emph{equipped} with the
operation ${\bin{Q}}: \set{X} \times \set{X} \mapsto \set{X}$. In fact,
$(\set{X}, {\bin{Q}})$ is called a \emph{magma} or \emph{groupoid}. The
only requirement on ${\bin{Q}}$ is that it the set $\set{X}$ is closed
under the operation, which is implied by the codomain of ${\bin{Q}}$
being $\set{X}$. 

\paragraph{Multiple-Operator Notation:} In general, when a set $\set{X}$
is equipped with $n \in \N$ operations, the $(n+1)$-tuple with $\set{X}$
as its first coordinate and the $n$ operations as its other coordinates
is typically used. In the case where a set also has an order defined,
that order may also be listed as a coordinate; in fact, the order is
usually listed as the last coordinate of the tuple. 

\paragraph{Implicit Operators:} Also, familiar operations like $+$ are
typically assumed to be equipped with familiar sets like $\W$. That is,
it is rare to see these familiar sets and these familiar operations
grouped together explicitly in the $n$-tuple notation; instead, it is
assumed that the familiar operations (\eg, $+$) are provided with the
standard definitions. That being said, we will explicitly define these
operations for each of the familiar sets and then assume that that set
carries those operations with it. 

\paragraph{Order of Operations, Grouping Symbols, and Precedence:} In a
long string of binary operations, usually the leftmost operation should
be executed first and the result should be used as the left argument of
the operation adjacent to it. This chain of execution should continue
from left to right. However, the grouping symbols $($ and $)$ can be
used to indicate that certain operations should be executed out of
order. For example, for set $\set{X}$ and binary operations ${\bin{Q}}:
\set{X} \times \set{X} \mapsto \set{X}$ and ${\bin{R}}: \set{X} \times
\set{X} \mapsto \set{X}$ and elements $x,y,z \in \set{X}$, the statement
%
\begin{equation}
        x \bin{Q} y \bin{R} z
        \label{eq:bin_oper_QR}
\end{equation}
%
is equivalent to the statement
%
\begin{equation*}
        (x \bin{Q} y) \bin{R} z
\end{equation*}
%
which both state that $x \bin{Q} y$ should be the left argument to
$\bin{R}$; however,
%
\begin{equation}
        x \bin{Q} (y \bin{R} z)
        \label{eq:bin_oper_QpR}
\end{equation}
%
is a completely different statement as it indicates that $y \bin{R} z$
should be the right argument to $\bin{Q}$. However, note that when no
parentheses are given, operators may be defined so that one operator
takes \emph{precedence} over another operator. That is, in the above
example, $\bin{R}$ could have been defined to take precedence over
$\bin{Q}$; in that case, \longrefs{eq:bin_oper_QR} and
\shortref{eq:bin_oper_QpR} will be the equivalent pair above.

\subsection{Groups, Monoids, and Semigroups}

Take set $\set{X}$ equipped with equivalence relation $=$ and binary
operation ${\diamond}: \set{X} \times \set{X} \mapsto \set{X}$. If it
is the case that
%
\begin{enumerate}[(i)]
        \item For all $x, y, z \in \set{X}$, $(x \diamond y) \diamond z
                = x \diamond ( y \diamond c )$.
                \label{item:group_associativity}
        \item There exists an element $e \in \set{X}$ such that for all
                $x \in \set{X}$, $e \diamond x = x \diamond e = x$ where
                $e$ is known as the \emph{identity element}.
                \label{item:group_identity}
        \item For each $x \in \set{X}$, there exists a $y \in \set{X}$
                such that $x \diamond y = y \diamond x = e$, where $e$
                is the identity element from
                (\shortref{item:group_identity}), and $y$ is known as
                the \emph{inverse} of $x$.
                \label{item:group_inverse}
\end{enumerate}
%
then the magma $(\set{X}, {\diamond})$ is called a \emph{group} with
identity element $e$. The property in
(\shortref{item:group_associativity}) is known as \emph{associativity}
and the operator $\diamond$ is said to be \emph{associative}. Properties
(\shortref{item:group_associativity}) and
(\shortref{item:group_identity}) make $(\set{X}, {\diamond}, e)$ a
\emph{monoid}. Property (\shortref{item:group_associativity}) makes
$(\set{X}, {\diamond})$ a \emph{semigroup}. In summary, all groups are
monoids and all monoids are semigroups. 

\paragraph{Trivial Monoids and Semigroups:} Because groups and monoids
require the existence of an identity element, they \emph{must be
nonempty}. However, semigroups have no requirement of the existence of
any elements. Thus, we have the following.
%
\begin{itemize}
        \item The singleton set is the trivial monoid and thus is also
                the trivial group. Now take the trivial monoid
                $(\{x\},{\diamond})$. Note that it must be that $x$ is
                the identity element, and so $x
                \diamond x = x$. However, this also implies that $x$ is
                its own inverse, and so $(\{x\},{\diamond})$ is the
                trivial group as well. Therefore, all trivial monoids
                are trivial groups.
        \item The emptyset $\emptyset$ is the trivial semigroup.
\end{itemize}

\paragraph{Monoid Triple Notation:} Take $(\set{X},{\diamond})$ to be a
monoid. In this case, there is an identity element $e_\diamond$ for the
operation $\diamond$. In order to identify this identity element, it is
often listed explicitly in the notation. That is,
$(\set{X},{\diamond},e_\diamond)$ is an equivalent notation for the
monoid. Of course, since all groups are monoids then this is also an
equivalent notation for a group.

\paragraph{Commutative Semigroups, Monoids, and Groups:} Take $(\set{X},
{\diamond})$ that is either a semigroup, monoid, or group. If it is the
case that for any $x, y \in \set{X}$, $x \diamond y = y \diamond x$
then $(\set{X}, {\diamond})$ is said to be \emph{Abelian} or
\emph{commutative}. For example, if $(\set{X}, {\diamond})$ is a group
that has this property then $(\set{X}, {\diamond})$ is a
\emph{commutative group}. This property is known as \emph{commutivity}
and operators with this property are said to be \emph{commutative} as
well. 

\subsection{Rings}

Take set $\set{X}$ equipped with equivalence relation $=$ and binary
operations ${+}: \set{X} \times \set{X} \mapsto \set{X}$ and ${\times}:
\set{X} \times \set{X} \mapsto \set{X}$. Call
\symdef[$+$]{Ageneral.541}{addition}{$x + y$}{sum of $x$ and $y$} the
\emph{addition} operator and
\symdef[$\times$]{Ageneral.542}{multiplication}{$x \times y$}{product of
$x$ and $y$ (also denoted $xy$)} the \emph{multiplication} operator. If
it is the case that
%
\begin{enumerate}[(i)]
        \item the magma $(\set{X},{+},e_+)$ is a commutative group (with
                identity element $e_+$)
              \label{item:ring_addition}
        \item the magma $(\set{X},{\times},e_\times)$ is a monoid (with
                identity element $e_\times$)
              \label{item:ring_multiplication}
        \item for each $x,y,z \in \set{X}$, $x \times (y + z) = (x
                \times y) + (x \times z)$ and $(x + y) \times z = (x
                \times z) + (y \times z)$
              \label{item:ring_distributivity}
\end{enumerate}
%
then $(\set{X},{+},{\times})$ is called a \emph{ring} and is often shown
with its identity elements as $(\set{X},{+},{\times},e_+,e_\times)$. The
identity element $e_+$ in (\shortref{item:ring_addition}) is called the
\emph{additive identity} and is often denoted $0$, and the identity
element $e_\times$ in (\shortref{item:ring_multiplication}) is called
the \emph{multiplicative identity} and is often denoted $1$. Thus, it is
common to see a ring specified with $(\set{X},{+},{\times},0,1)$.  The
inverses for ${+}$ are called \emph{additive inverses} and the inverses
for ${\times}$ (which are not guaranteed to exist) are called
\emph{multiplicative inverses}. The property in
(\shortref{item:ring_distributivity}) is called \emph{distributivity};
that is, multiplication \emph{distributes} over addition. The result of
the addition operator ${+}$ is called the \emph{sum} of its arguments,
and the result of the multiplication operator ${\times}$ is called the
\emph{product} of its arguments. 

\paragraph{Additive Inverses and Subtraction:} Since $(\set{X},{+})$ is
a group, it has inverses. For an element $x \in \set{X}$, the additive
inverse for $x$ is often denoted
\symdef{Ageneral.543}{addinverse}{$-x$}{additive inverse of $x$}.
Additionally, for elements $x,y \in \set{X}$, the notation
\symdef{Ageneral.5431}{subtraction}{$x - y$}{difference of $x$ and $y$
(\ie, $x - y \triangleq x + -y$)} is often used to represent $x + -y$,
where $-y$ is the additive inverse of $y$. In this case, the operator
${-}$ is called the \emph{subtraction} operator, and its result is
called the \emph{difference} of its two arguments.

\paragraph{Multiplicative Inverses, Ratios, and Division:} Since
$(\set{X},{\times})$ is a monoid, some of the elements of $\set{X}$ may
have inverses. For an element $x \in \set{X}$ that has an inverse, the
multiplicative inverse for $x$ is often denoted $x^{-1}$. Additionally,
for elements $x,y \in \set{X}$ where $y$ has a multiplicative inverse,
the notation $x/y$ is often used to represent $x \times y^{-1}$, where
$y^{-1}$ is the multiplicative inverse of $y$. In this case, the
operator ${/}$ is called the \emph{division} operator, and its result is
called the \emph{quotient} of its two arguments. Additionally, the
notation $\frac{x}{y}$ is equivalent to the notation $x/y$. Both
notations are often referred to as \emph{ratios} of element $x$ to
element $y$.

\paragraph{Juxtaposition and Related Notations:} Note that the operator
$\times$ is often denoted by $\cdot$ or simply omitted completely. That
is, for $x,y \in \set{X}$, $x \times y$ and  $x \cdot y$ and $xy$ all
indicate the same operation. The latter case (\eg, $xy$) is called
\emph{juxtaposition} of $x$ and $y$.

\paragraph{Order of Operations:} The multiplication operation takes
precedence over the addition operation. That is, unless explicit
grouping symbols (\eg, $($ and $)$) denote otherwise, all multiplication
operations should be executed first.

\paragraph{Multiplication by Additive Identity:} Take a ring
$(\set{X},{+},{\times},0,1)$ and elements $x,y \in \set{X}$. Note that
$x(0 + -y) = x0 + x(-y)$. However, since $0 + -y = -y$ then $x(0 +
-y)=x(-y)$. Thus, $x(-y) = x0 + x(-y)$ and so $-(x(-y)) + x(-y) = x0$.
However, $-(x(-y)) + x(-y) = 0$, and so it must be that $x0 = 0$.
Similarly, it is easy to show that $0x = 0$ and so $x0 = 0x = 0$. This
holds for any ring.

\paragraph{Commutative Rings:} If ring $(\set{X},{+},{\times})$ is such
that $(\set{X},{\times})$ is a commutative monoid rather than just a
monoid then $(\set{X},{+},{\times})$ is called a \emph{commutative
ring}.

\paragraph{The Zero Ring:} Take singleton set $\{x\}$ with $\times$
defined so that $x \times x = x$ and $+$ defined so that $x + x = x$.
Clearly, $(\{x\}, {+}, {\times}, x, x)$ is a commutative ring. In fact,
this is often called the \emph{trivial ring} or the \emph{zero ring}.
Clearly, in this singleton set the multiplicative identity and the
additive identity are the same element. Since these are usually denoted
with $1$ and $0$ respectively, this is the same as saying $1 = 0$. Take
a ring $(\set{X},{+},{\times},0,1)$ where $1 = 0$. In that case, for any
$x \in \set{X}$,
%
\begin{equation*}
        x \times 1 = x \times 0 = 0
\end{equation*}
%
And thus $x$ can only be $0$ and so $\set{X}$ must be a singleton set.
Therefore, a ring is a trivial ring if and only if its multiplicative
identity and its additive identity are the same element. This trivial
property is sometimes denoted by $1 = 0$. Thus, if a ring is required
such that $1 \neq 0$ (\ie, the multiplicative and additive identities
are different), it is required that the ring is not the trivial zero
ring.

\paragraph{Semirings:} The definition of a ring can be relaxed slightly
to define a \emph{semiring}. In particular, $(\set{X},{+},{\times})$ is
called a \emph{semiring} if it is the case that
%
\begin{enumerate}[(i)]
        \item the magma $(\set{X}, {+})$ is a commutative \emph{monoid}
              \label{item:semiring_addition}
        \item the magma $(\set{X}, {\times})$ is a monoid
              \label{item:semiring_multiplication}
        \item for each $x,y,z \in \set{X}$, $x \times (y + z) = (x
                \times y) + (x \times z)$ and $(x + y) \times z = (x
                \times z) + (y \times z)$
              \label{item:semiring_distributivity}
        \item for each $x \in \set{X}$, $x \times e_+ = e_+ \times x =
                e_+$ where $e_+$ is the identity element from the monoid
                $(\set{X}, {+})$
\end{enumerate}
%
If semiring $(\set{X},{+},{\times})$ is such that $(\set{X},{\times})$
is a commutative monoid rather than just a monoid then
$(\set{X},{+},{\times})$ is called a \emph{commutative semiring}. It can
be shown that every semiring is a ring and every commutative semiring is
a commutative ring.

\subsection{Fields}

Take a commutative ring $(\set{X},{+},{\times},e_+,e_\times)$. If it is
the case that
%
\begin{enumerate}[(i)]
        \item the additive identity $e_+$ (\eg, $0$) and the
                multiplicative identity $e_\times$ (\eg, $1$) are unique
                (\ie, $e_+ \neq e_\times$)
                \label{item:field_not_trivial}
        \item for all $x \in \set{X}$, if $x$ is not the additive
                identity (\ie, $x \neq e_+$) then the multiplicative
                inverse (\eg, $x^{-1}$) exists (\eg, $x x^{-1} = x^{-1}
                x = e_\times$)
                \label{item:field_division}
\end{enumerate}
%
then $(\set{X},{+},{\times},e_+,e_\times)$ is called a \emph{field}. The
property in (\shortref{item:field_not_trivial}) simply excludes the
trivial zero ring. The property in (\shortref{item:field_division})
allows for the operation of \emph{division}.

\subsection{Subgroups, Subrings, and Subfields}

Take a set $\set{X}$ and subset $\set{Y} \subset \set{X}$. If there is
an algebraic structure (\eg, a group, a ring, or a field) for set
$\set{X}$ that maintains its structure when $\set{Y}$ is substituted for
$\set{X}$ and the operations are restricted to set $\set{Y}$ then the
structure with set $\set{Y}$ is known as a \emph{sub}structure. 

\paragraph{Examples:} For example, take $\set{X}$ and $\set{Y} \subset
\set{X}$ and assume that $(\set{X},{\star})$ is a group and
$(\set{X},{\star},{\divideontimes})$ is a field.
%
\begin{itemize}
        \item If $(\set{Y},{\star}|_\set{Y})$ is also a group then
                $\set{Y}$ is called a \emph{subgroup} of $\set{X}$ under
                the operation $\star$.
        \item If $(\set{Y},{\star}|_\set{Y},{\divideontimes}|_\set{Y})$
                is also a field then $\set{Y}$ is called a
                \emph{subfield} of $\set{X}$ under the operations
                $\star$ and $\divideontimes$.
\end{itemize}
%
Recall that the operations $\star$ and $\divideontimes$ are functions
that take the form $\set{X} \times \set{X} \mapsto \set{X}$, and thus
the ${}|_\set{Y}$ notation restricts them to the subset $\set{Y}$; that
is, the restrictions take the form $\set{Y} \times \set{Y} \mapsto
\set{Y}$. It is important that both the domain and codomain have been
restricted to $\set{Y}$. If it is not possible to restrict both of the
operator function's domain and codomain then the subset cannot be
considered a substructure. This is referred to as \emph{closure}. That
is, the subset must be \emph{closed} under the operation in order to
qualify as a substructure.

\paragraph{Other Relevant Substructures:} There are many other
substructure examples. Later, in
\longref{app:math_algebra_over_a_field}, we will define a type of
\emph{algebra}, and so there may be \emph{subalgebras}. Similarly, there
can be \emph{submonoids} and \emph{subrings}. If the main structure is
commutative, the type of the substructure may be preceded with
\emph{commutative} as well in order to indicate that it is also
commutative. That is, a commutative group may have a \emph{commutative
subgroup}.

\subsection{Homomorphisms and Homomorphic Structures}
\label{app:math_homomorphisms}

Take two sets $\set{X}$ and $\set{Y}$ and a function $f: \set{X} \mapsto
\set{Y}$. The function $f$ is called a \emph{homomorphism} if algebraic
structures are preserved through the function. For example, consider
\emph{group homomorphisms} and \emph{ring homomorphisms}.
%
\begin{itemize}
        \item Assume that $(\set{X},{\star},e_\star)$ and
                $(\set{Y},{\diamond},e_\diamond)$ are two groups. If the
                function $f$ is such that for $x,y \in \set{X}$,
                %
                \begin{equation*}
                        f(x \star y) = f(x) \diamond f(y)
                \end{equation*}
                %
                then $f$ is called a \emph{group homomorphism}. That is,
                $f$ preserves uses the group structure present in
                $(\set{Y},{\diamond})$ in order to transplant the
                existing group structure in $(\set{X},{\star})$. It can
                be shown that 
                %
                \begin{itemize}
                        \item $f(e_\star)=e_\diamond$
                        \item for element $x \in \set{X}$,
                                $f(x^\star)=f(x)^\diamond$ where
                                ${}^\star$ indicates an inverse in group
                                $(\set{X},{\star})$ and ${}^\diamond$
                                indicates an inverse in group
                                $(\set{Y},{\diamond})$
                \end{itemize}
        \item Assume that
                $(\set{X},{\oplus},{\otimes},e_\oplus,e_\otimes)$ and
                $(\set{Y},{\boxplus},{\boxtimes},%
                          e_\boxplus,e_\boxtimes)$
                are two rings. If the function $f$ is such that for $x,y
                \in \set{X}$,
                %
                \begin{itemize}
                        \item $f(x \oplus y) = f(x) \boxplus f(y)$
                        \item $f(x \otimes y) = f(x) \boxtimes f(y)$
                        \item $f(e_\otimes) = e_\boxtimes$
                \end{itemize}
                %
                then $f$ is called a \emph{ring homomorphism}. That is,
                $f$ preserves uses the ring structure present in
                $(\set{Y},{\boxplus},{\boxtimes},%
                          e_\boxplus,e_\boxtimes)$
                in order to transplant the existing group structure in
                $(\set{X},{\oplus},{\otimes},e_\oplus,e_\otimes)$. It
                can be shown that 
                %
                \begin{itemize}
                        \item $f(e_\oplus)=e_\boxplus$
                        \item for element $x \in \set{X}$,
                                $f(x^\oplus)=f(x)^\boxplus$ where
                                ${}^\oplus$ indicates an inverse in
                                group $(\set{X},{\oplus})$ and
                                ${}^\boxplus$ indicates an inverse in
                                group $(\set{Y},{\boxplus})$
                        \item if $x \in \set{X}$ has an inverse
                                $x^\otimes$ in monoid
                                $(\set{X},{\otimes})$ then $f(x)$ has
                                an inverse $f(x)^\boxtimes$ in monoid
                                $(\set{Y},\boxtimes)$ and
                                $f(x^\otimes)=f(x)^\boxtimes$
                \end{itemize}
\end{itemize}
%
Additionally,
%
\begin{itemize}
        \item a \emph{semigroup homomorphism} is defined the same way as
                a group homomorphism, except that it relates two
                semigroups and thus has no consequences involving
                identity or inverses
        \item a \emph{monoid homomorphism} is defined the same way as a
                group homomorphism, except that it relates two monoids
                and thus has no consequences involving inverses
        \item a \emph{semiring homomorphism} is defined the same way as
                a ring homomorphism, except that it relates two
                semirings
        \item a \emph{field homomorphism} is defined the same way as a
                ring homomorphism, except that it relates two fields
\end{itemize}
%
Two structures for which there exists a homomorphism between them are
said to be \emph{homomorphic}, which roughly means that they have the
same shape. 

\paragraph{Isomorphisms and Isomorphic Structures:} Any homomorphism
that is also bijective is called an \emph{isomorphism}. Additionally,
two algebraic structures for which there exists an isomorphism between
them are said to be \emph{isomorphic}. In other words, isomorphic
algebraic structures are ones that consist of congruent sets that are
homomorphic in their algebraic structures.
%
\begin{itemize}
        \item The fact that the two sets are congruent implies that
                every element of either set can be replaced with a
                unique element from the other set.
        \item The fact that the two algebraic structures are homomorphic
                means that any operation on elements of either set can
                be replaced with operations on elements of the other
                set.
\end{itemize}
%
Therefore two isomorphic algebraic structures are very similar. If the
isomorphism is also an order isomorphism (\ie, it preserves the
ordering) then one structure most likely can be used as an equally valid
\emph{representation} of the other.

\subsection{Ordered Rings, Absolute Value, and Ordered Fields}
\label{app:math_ordered_rings}

Take commutative ring $(\set{X},{+},{\times},0,1)$. Also assume that set
$\set{X}$ is \emph{totally} ordered with \emph{total} order relation
$\leq$. It is common to denote this by $(\set{X},{+},{\times},{\leq})$
or even $(\set{X},{+},{\times},0,1,{\leq})$, which groups all
operations, identities, and relations of interest. For any $x,y \in
\set{X}$, use the notation $x < y$ to denote the relationship that $x
\leq y$ and $x \neq y$. If it is the case that for any $x,y,z \in
\set{X}$,
%
\begin{enumerate}[(i)]
        \item if $x \leq y$ then $z + x \leq z + y$
                \label{item:ordered_ring_add}
        \item if $0 \leq x$ and $0 \leq y$ then $0 \leq x \times y$
                \label{item:ordered_ring_mult}
\end{enumerate}
%
then $(\set{X},{+},{\times},0,1,{\leq})$ is called an \emph{ordered
ring}. Additionally, for all $x \in \set{X}$ with $x \neq 0$,
%
\begin{itemize}
        \item if $x < 0$, $x$ is called \emph{negative}
        \item if $0 < x$, $x$ is called \emph{positive}
\end{itemize}
%
Thus, the \emph{sign function} of element $x \in \set{X}$ is denoted
\symdef{Ageneral.5432}{sgnfn}{$\sgn(x)$}{sign function of $x$} and
defined by
%
\begin{equation*}
        \sgn(x)
        \triangleq
        \begin{cases}
                {-1} &\text{if } x < 0\\
                0 &\text{if } x = 0\\
                1 &\text{if } x > 0
        \end{cases}
\end{equation*}
%
where $-1$ is the additive inverse of the multiplicative identity $1$.
As \citet{Rudin76} does, it is simple to show that every ordered ring is
such that for all $x,y,z \in \set{X}$,
%
\begin{itemize}
        \item if $0 < x$ then $-x < 0$ and vice versa, where $-x$
                is the additive inverse of $x$
        \item if $0 < x$ and $y < z$ then $x \times y < x \times z$
        \item if $x < 0$ and $y < z$ then $x \times z < x \times y$
        \item if $x \neq 0$ then $0 < x \times x$ 
        \item $0 < 1$
\end{itemize}
%
Additionally, for any element $x \in \set{X}$, the \emph{absolute value}
of $x$ is denoted \symdef{Ageneral.5433}{absvalue}{$\pipe x
\pipe$}{absolute value of $x$ (\ie, $x = \sgn(x) \pipe x \pipe$)} and
defined
%
\begin{equation*}
        |x|
        \triangleq
        \begin{cases}
                x &\text{if } x \geq 0\\
                -x &\text{otherwise}
        \end{cases}
\end{equation*}
%
Of course, for all $x \in \set{X}$, $x = \sgn(x) |x|$. Note that the
absolute value can also be defined for \emph{complex numbers}, which we
do not discuss here, even though they are not an ordered ring. Clearly,
every subring of an ordered ring is also an ordered ring.

\paragraph{Ordered Fields:} If ordered ring is also a field, it is
called an \emph{ordered field}. Take ordered field
$(\set{X},{+},{\times},0,1,{\leq})$. For $x,y \in \set{X}$, use the
relationship $x < y$ to indicate that $x \leq y$ and $x \neq y$. It is
the case that for any $x,y \in \set{X}$, 
%
\begin{itemize}
        \item if $0 < x < y$ then $0 < y^{-1} < x^{-1}$
\end{itemize}
%
where $x^{-1}$ and $y^{-1}$ are the multiplicative inverses for $x$ and
$y$ respectively. Additionally, every subfield of an ordered field is
also an ordered field. Intuitively, ordered fields have all of the
characteristics necessary for familiar \emph{arithmetic}.

\subsection{Summations and Products of Indexed Families}
\label{app:math_sumprod_ind_fam}

Recall the notion of indexed family from
\longref{app:math_indexed_families}. Also recall how these indexed
families were used with nets and sequences from
\longref{app:math_nets_and_sequences}. When an indexed family is made up
of elements for which addition and multiplication are defined, it may be
useful to take sums and products of every element in that family. Here
we present some common notations for these operations. 

\paragraph{Finite Summations over Commutative Magmas:} Take a nonempty
commutative magma $(\set{X},{+})$ and \emph{finite} nonempty set
$\set{I}$. Now take the nonempty indexed family $(a_n)_{i \in \set{I}}$
where $a_i \in \set{X}$ for all $i \in \set{I}$. Call the operator $+$
the addition operator. The \symdef[]{Ageneral.z}{summation}{$\sum$}{sum
of elements of a set}notation
%
\begin{equation*}
        \sum\limits_{i \in \set{I}} a_i
\end{equation*}
%
results in the sum of every instance of every value of the family. Since
$(\set{X},{+})$ is commutative and set $\set{I}$ is finite then the
order in which the sum is performed has no impact on the value of the
sum. For example, take $\set{I} = \{2,1,3\}$. Then
%
\begin{equation*}
        \sum\limits_{i \in \set{I}} a_i
        =
        \sum\limits_{i \in \{1,2,3\}} a_i
        =
        a_3 + a_2 + a_1
        =
        a_2 + a_3 + a_1
\end{equation*}

\paragraph{Ordered Summations over General Magmas:} Take a magma
$(\set{X},{+})$ and \emph{totally} ordered set $(\set{I},{\leq})$ that
is either \emph{finite} or \emph{countably infinite}. Now take the
indexed family $(a_n)_{i \in \set{I}}$ where $a_i \in \set{X}$ for all
$i \in \set{I}$. Call the operator $+$ the addition operator. If $m,n
\in \set{I}$ with $m \leq n$ then the notation
%
\begin{equation*}
        \sum\limits_{i=m}^n a_i
\end{equation*}
%
is the sum of all elements $a_i$ with $i \in \{ j \in \set{I} : m \leq j
\leq n \}$ where the order of operation matches the ordering of index
elements. For example, take $\set{I} = \N$ with the standard natural
number order relation $\leq$. Then
%
\begin{equation*}
        \sum\limits_{i=4}^8 a_i
        =
        a_4 + a_5 + a_6 + a_7 + a_8
\end{equation*}
%
where the elements are listed in this order since $4 \leq 5 \leq 6 \leq
7 \leq 8$.  

\paragraph{Empty Summations over Magmas with Identity:} Take a magma
$(\set{X},{+})$ such that there exists an element $0 \in \set{X}$ such
that for all $x \in \set{X}$, $0 + x = x + 0 = x$ (\ie, $0$ is the
\emph{identity element} for the magma operation $+$). Also take a set
$\set{I}$. Now take the indexed family $(a_n)_{i \in \set{I}}$ where
$a_i \in \set{X}$ for all $i \in \set{I}$.  Call the operator $+$ the
addition operator.  In this case, the summation $\sum_{i \in \emptyset}
a_i$ is defined by
%
\begin{equation*}
        \sum\limits_{i \in \emptyset} a_i
        \triangleq
        0
\end{equation*}
%
That is, the \emph{empty sum} is the identity element for the magma.

\paragraph{Finite Products over Commutative Magmas:} Take a nonempty
commutative magma $(\set{X},{\times})$ and \emph{finite} nonempty set
$\set{I}$. Now take the nonempty indexed family $(a_n)_{i \in \set{I}}$
where $a_i \in \set{X}$ for all $i \in \set{I}$. Call the operator
$\times$ the multiplication operator. The
\symdef[]{Ageneral.z}{product}{$\prod$}{product of elements of a
set}notation
%
\begin{equation*}
        \prod\limits_{i \in \set{I}} a_i
\end{equation*}
%
results in the product of every instance of every value of the family.
Since $(\set{X},{\times})$ is commutative and set $\set{I}$ is finite,
then the order in which the product is performed has no impact on the
value of the product. For example, take $\set{I} = \{2,1,3\}$. Then
%
\begin{equation*}
        \prod\limits_{i \in \set{I}} a_i
        =
        \prod\limits_{i \in \{1,2,3\}} a_i
        =
        a_3 \times a_2 \times a_1
        =
        a_2 \times a_3 \times a_1
\end{equation*}

\paragraph{Ordered Products over General Magmas:} Take a magma
$(\set{X},{\times})$ and \emph{totally} ordered set $(\set{I},{\leq})$
that is either \emph{finite} or \emph{countably infinite}. Now take the
indexed family $(a_n)_{i \in \set{I}}$ where $a_i \in \set{X}$ for all
$i \in \set{I}$. Call the operator $\times$ the multiplication operator.
If $m,n \in \set{I}$ with $m \leq n$ then the notation
%
\begin{equation*}
        \prod\limits_{i=m}^n a_i
\end{equation*}
%
is the product of all elements $a_i$ with $i \in \{ j \in \set{I} : m
\leq j \leq n \}$ where the order of operation matches the ordering of
index elements. For example, take $\set{I} = \N$ with the standard
natural number order relation $\leq$. Then
%
\begin{equation*}
        \prod\limits_{i=4}^8 a_i
        =
        a_4 \times a_5 \times a_6 \times a_7 \times a_8
\end{equation*}
%
where the elements are listed in this order since $4 \leq 5 \leq 6 \leq
7 \leq 8$.  

\paragraph{Empty Products over Magmas with Identity:} Take a magma
$(\set{X},{\times})$ such that there exists an element $1 \in \set{X}$
such that for all $x \in \set{X}$, $1 \times x = x \times 1 = x$ (\ie,
$1$ is the \emph{identity element} for the magma operation $\times$).
Also take a set $\set{I}$. Now take the indexed family $(a_n)_{i \in
\set{I}}$ where $a_i \in \set{X}$ for all $i \in \set{I}$.  Call the
operator $\times$ the multiplication operator. In this case, the product
$\prod_{i \in \emptyset} a_i$ is defined by
%
\begin{equation*}
        \prod\limits_{i \in \emptyset} a_i
        \triangleq
        1
\end{equation*}
%
That is, the \emph{empty product} is the identity element for the
magma.

\section{Linear Algebra: Vector Spaces and Algebras}
\label{app:math_linear_algebra}

When many variables are related in a problem, complicated mathematical
structures can be used to represent those relationships. However, these
relationships can often be shown to have a certain kind of structure.
The area of \emph{linear algebra} studies one of those kinds of
structure.

\subsection{Vector Spaces}
\label{app:math_vector_space}

Let $(\set{F},{+},{\times})$ be a field with set elements called
\emph{scalars}. Let $(\set{V},{\oplus})$ be a commutative group with set
elements called \emph{vectors}. Take scalars $a,b \in \set{F}$ and
vectors $\v{x},\v{y} \in \set{V}$. Define a \emph{scalar (vector)
multiplication} operator $\mathop{\otimes}: \set{F} \times \set{V}
\mapsto \set{V}$; however, use the juxtaposition notation so that $a
\v{x}$ is an equivalent expression for $a \otimes \v{x}$. If it is the
case that 
%
\begin{enumerate}[(i)]
        \item $a (\v{x} \oplus \v{y}) = a \v{x} \oplus a \v{y}$
        \item $(a + b) \v{x} = a \v{x} \oplus b \v{x}$
        \item $a (b \v{x}) = (a b) \v{x}$
        \item $1 \v{x} = \v{x}$ where $1$ is the multiplicative identity
                for $(\set{F},{\times})$
\end{enumerate}
%
then $\set{V}$ is called a \emph{vector space} over the field $\set{F}$.
The field $\set{F}$ is called the \emph{base field} of vector space
$\set{V}$. Additionally, set $\set{V}$ may be called a \emph{linear
space} instead of a vector space. 

\paragraph{Operator Notation:} Usually the same symbol will be used for
all forms of addition and all forms of multiplication. That is, symbol
$+$ may be used to represent both scalar addition (\eg, ${+}$ above) and
vector addition (\eg, ${\oplus}$). Similarly, symbol $\times$ may be
used to represent both scalar field multiplication (\eg, ${\times}$) and
scalar vector multiplication (\eg, ${\otimes}$). Furthermore,
multiplication in both cases can be represented by \emph{juxtaposition}
of arguments. The actual operator that should be used in these cases
should be clear from the type of the argument. That is, it is clear that
$ab$ denotes scalar field multiplication and $a\v{x}$ denotes scalar
vector multiplication. Juxtaposition is usually the preferred form of
multiplication because the two other common multiplication symbols,
$\times$ and $\cdot$, are often used to represent other common special
types of vector multiplication that we have not defined in this
document. If it is said that $\set{V}$ is a vector space over the field
$\set{F}$, it is implied that $+$ should be used for addition and
juxtaposition (or $\times$ or $\cdot$) should be used for
multiplication.

\paragraph{Vector Subspaces:} Take commutative group $(\set{V},{+})$ and
field $(\set{F},{+},{\times})$. Assume that $\set{V}$ is a vector space
over the field $\set{F}$ with scalar multiplication operator $\times$.
Juxtaposition (\ie, placing two elements next to each other without an
operator) will be used as a shorthand for multiplication, where the
definition of multiplication depends on the context. Additionally, take
$\set{W}$ to be a commutative subgroup of $\set{V}$. If it is the case
that for any $a \in \set{F}$ and any $\v{x},\v{y} \in \set{W}$,
%
\begin{enumerate}[(i)]
        \item $a \v{x} \in \set{W}$
        \item $\v{x} + \v{y} \in \set{W}$
                \label{item:vector_subspace_addition}
        \item $0 \in \set{W}$, where $0$ is the additive identity for
                group $(\set{V},{+})$
                \label{item:vector_subspace_identity}
\end{enumerate}
%
then $\set{W}$ is called a \emph{vector subspace} of $\set{V}$. Note
that (\shortref{item:vector_subspace_addition}) and
(\shortref{item:vector_subspace_identity}) are redundant since
$(\set{W},{+})$ is a subgroup; we list them here for emphasis only. In
other words, a commutative subgroup of a vector space only needs to be
\emph{closed} under the vector space's scalar multiplication in order to
be called a vector subspace. Sometimes the term \emph{linear subspace}
or simply \emph{subspace} is used instead of vector subspace. 

\paragraph{Interpretation:} Roughly, a vector can be thought of as any
element that has some form of magnitude and direction (\eg, length and
angle). Different vectors that have the same magnitude may point in
different directions. Scalars then \emph{scale} the length of a vector.
If a vector is multiplied by a \emph{negative} scalar, the length of the
vector is not only scaled but its direction is reversed. Concrete
examples of vector spaces will be given in
\longref{app:math_linear_algebra}.

\paragraph{Fields as Vector Spaces:} Note that any field is trivially a
vector space with itself as a base field. That is, any field
$(\set{F},{+},{\times})$ is a vector space over itself equipped with
scalar vector multiplication operator ${\times}$.

\paragraph{Vector Spaces over Commutative Rings:} A vector space over a
commutative ring can be defined exactly as above. In fact, everything
above holds with vector spaces over commutative rings; this is the case
because none of the requirements above involve multiplicative inverses
of scalars (\ie, scalar division).

\paragraph{Commutative Rings as Vector Spaces:} Note that any
commutative ring is trivially a vector space with itself as a base
commutative ring. That is, any commutative ring $(\set{R},{+},{\times})$
is a vector space over itself equipped with scalar vector multiplication
operator ${\times}$.

\subsection{Linear and Bilinear Functions}
\label{app:math_linear_operator}

Take $\set{X}$ and $\set{Y}$ to be two vector spaces over the same base
field $(\set{F},{+},{\times})$. A function $f: \set{X} \mapsto \set{Y}$
is called \emph{linear} if for any vectors $x,y \in \set{X}$ and any
scalar $a \in \set{F}$, it is the case that
%
\begin{enumerate}[(i)]
        \item $f(x+y) = f(x) + f(y)$
        \item $f(ax) = af(x)$
\end{enumerate}
%
where juxtaposition is used to indicate scalar (vector) multiplication.
It is equivalent to say that the function $f: \set{X} \mapsto \set{Y}$
is linear if and only if for any vectors $x,y \in \set{X}$ and scalars
$a,b \in \set{F}$, $f(ax+by)=af(x)+bf(y)$.

\paragraph{Bilinear Functions:} Take $\set{X}$, $\set{Y}$, and $\set{Z}$
to be three vector spaces over the same field $(\set{F},{+},{\times})$
so that juxtaposition denotes scalar (vector) multiplication. A function
$f: \set{X} \times \set{Y} \mapsto \set{Z}$ is called \emph{bilinear} if
for any vectors $\v{x}_1,\v{x}_2 \in \set{X}$ and $\v{y}_1,\v{y}_2 \in
\set{Y}$ and scalars $a,b \in \set{F}$, it is the case that
%
\begin{enumerate}[(i)]
        \item $f(a \v{y}_1 + b \v{y}_2,\v{y}_1) 
                = a f(\v{x}_1,\v{y}_1) + b f(\v{x}_2,\v{y}_1)$
                \label{item:bilinear_first_argument}
        \item $f(\v{x}_1,a \v{y}_1 + b \v{y}_2)
                = a f(\v{x}_1,\v{y}_1) + b f(\v{x}_1,\v{y}_2)$
                \label{item:bilinear_second_argument}
\end{enumerate}
%
Take any $\v{y}_0 \in \set{Y}$. Define a function $g: \set{X} \mapsto
\set{Z}$ with $g(\v{x}) \triangleq f(\v{x},\v{y}_0)$. Since $f$ is
bilinear then by property (\shortref{item:bilinear_first_argument}),
the new function $g$ is linear. This is why property
(\shortref{item:bilinear_first_argument}) is called being \emph{linear
in the first argument}. Similarly, property
(\shortref{item:bilinear_second_argument}) is called being \emph{linear
in the second argument}. Note that by these two properties, it is always
the case that for any vectors $\v{x} \in \set{X}$ and $\v{y} \in
\set{Y}$ and scalars $a,b \in \set{F}$, 
%
\begin{equation}
        \begin{split}
        f( a \v{x}, b \v{y} ) 
        &= b f( a \v{x}, \v{y} ) 
        = a b f( \v{x}, \v{y} ) 
        = b f( \v{x}, a \v{y} ) 
        = f( b \v{x}, a \v{y} )\\
        &= a f( \v{x}, b \v{y} ) 
        = b a f( \v{x}, \v{y} ) 
        = a f( b \v{x}, \v{y} )\\
        &= f( a b \v{x}, \v{y} ) 
        = f( \v{x}, a b \v{y} )\\
        &= f( b a \v{x}, \v{y} ) 
        = f( \v{x}, b a \v{y} )
        \end{split}
        \label{eq:bilinear_faux_associative}
\end{equation}
%
Also note that since $\set{F}$ is a field (and thus a commutative ring),
$a b = b a$, and so some of the equalities in
\longref{eq:bilinear_faux_associative} are redundant.

\paragraph{Bilinear Operators:} Take $\set{X}$ to be a vector space over
the field $(\set{F},{+},{\times})$ and denote scalar vector
multiplication operator with $\otimes$. Recall that the scalar vector
multiplication is a function of two arguments, namely ${\otimes}:
\set{F} \times \set{X} \mapsto \set{X}$. Thus, for $a \in \set{F}$ and
$\v{x} \in \set{X}$, the notation
%
\begin{equation*}
        a \otimes \v{x} \triangleq \mathop{\otimes}(a,\v{x})
\end{equation*}
%
where $a \v{x}$ (\ie, juxtaposition) will be an alternate way of
indicating $a \otimes \v{x}$. 

Recall that any field is trivially a vector space over itself. Thus,
$\set{F}$ and $\set{X}$ are two vector spaces defined over the same
field. Additionally, by the definition of a vector space, the scalar
vector multiplication $\otimes$ is a bilinear function. Using those
properties, it is simple to verify that for $a,b,c,d \in \set{F}$ and
$\v{x}_1,\v{x}_2 \in \set{X}$, 
%
\begin{itemize}
        \item $\mathop{\otimes}( ac + bd, \v{x}_1 ) 
                = a \mathop{\otimes}( c, \v{x}_1 ) 
                + b \mathop{\otimes}( d, \v{x}_1 )$
        \item $\mathop{\otimes}( c, a\v{x}_1 + b\v{x}_2 ) 
                = a \mathop{\otimes}( c, \v{x}_1 ) 
                + b \mathop{\otimes}( c, \v{x}_2 )$
\end{itemize}
%
which is equivalent to the statement that
%
\begin{itemize}
        \item $(ac + bd) \otimes \v{x}_1
                = a ( c \otimes \v{x}_1 ) + b ( d \otimes \v{x}_1 )$
        \item $c \otimes ( a\v{x}_1 + b\v{x}_2 ) 
                = a ( c \otimes \v{x}_1 ) + b ( c \otimes \v{x}_2 )$
\end{itemize}
%
which is equivalent to the statement that
%
\begin{itemize}
        \item $(ac + bd) \v{x}_1 = ac\v{x}_1 + bd\v{x}_1$
        \item $c ( a\v{x}_1 + b\v{x}_2 ) = ac\v{x}_1 + bc\v{x}_2$
\end{itemize}
%
and so operator $\otimes$ is called a \emph{bilinear operator} because
it is an operator that is linear in both its first and second arguments.

\subsection{Algebra over a Field}
\label{app:math_algebra_over_a_field}

Take magma $(\set{A},{\times})$ so that $\set{A}$ is a vector space over
the field $(\set{F},{+},{\times})$. Denote vector addition with $+$ and
scalar vector multiplication with $\times$. Thus, the symbol $\times$
can be used to denote three different multiplication operators, namely
%
\begin{enumerate}[(i)]
        \item vector (vector) multiplication from $(\set{A},{\times})$
                (\ie, ${\times}: \set{A} \times \set{A} \mapsto
                \set{A}$)
                \label{item:vector_vector_mult}
        \item scalar (vector) multiplication from the vector space
                (\ie, ${\times}: \set{F} \times \set{A} \mapsto
                \set{A}$)
                \label{item:scalar_vector_mult}
        \item multiplication from $(\set{F},{\times})$
                (\ie, ${\times}: \set{F} \times \set{F} \mapsto
                \set{F}$)
                \label{item:scalar_mult}
\end{enumerate}
%
where the new multiplication in (\shortref{item:vector_vector_mult})
provides a method for finding the product of two vectors. Note that this
vector multiplication is a binary operator where the vector spaces
making up its two arguments are both defined over the same field. Thus,
it is possible that this operator is bilinear. If the vector
multiplication is a bilinear operator then $\set{A}$ is called an
\emph{algebra} over the field $\set{F}$ or an $\set{F}$-algebra, where
$\set{F}$ is also called the \emph{base field} of algebra $\set{A}$.

For example, take $\set{A}$ to be an algebra over the field $\set{F}$
where $+$ denotes both scalar and vector addition and $\times$ or
juxtaposition denotes all three forms of multiplication. By the
bilinear property of vector multiplication, it is the case that for
every $\v{x},\v{y},\v{z} \in \set{A}$ and scalars $a,b \in \set{F}$, 
%
\begin{itemize}
        \item $(\v{x}+\v{y})\v{z} = \v{x}\v{z}+\v{y}\v{z}$
        \item $(a \v{x})(\v{y}) = (a)(\v{x}\v{y})$
        \item $\v{x}(\v{y}+\v{z}) = \v{x}\v{y}+\v{x}\v{z}$
        \item $(\v{x})(b \v{y}) = (b)(\v{x}\v{y})$
\end{itemize}
%
which can be summarized by
%
\begin{itemize}
        \item $(a\v{x}+b\v{y})\v{z} = a\v{x}\v{z}+b\v{y}\v{z}$
        \item $\v{x}(a\v{y}+b\v{z}) = a\v{x}\v{y}+b\v{x}\v{z}$
\end{itemize}
% 
Note that for $\v{x},\v{y} \in \set{A}$ and $a \in \set{F}$, $a \v{x}
\v{y} = \v{x} a \v{y}$. For this reason, while it is not technically
correct, for $\v{x} \in \set{A}$ and $a \in \set{F}$, the notation
$\v{x} a$ is usually taken to be equivalent to the notation $\v{x} a$
even though the product $\v{x} a$ is not technically defined.

\paragraph{Associative Algebras:} Take algebra $\set{A}$ over the field
$\set{F}$. If $(\set{A},{\times})$ is a semigroup (\ie, vector
multiplication is associative) then $\set{A}$ is called an
\emph{associative algebra}.

\paragraph{Unitary Associative Algebras:} Take algebra $\set{A}$ over
the field $\set{F}$. If $(\set{A},{\times})$ is a monoid (\ie, a vector
multiplicative identity exists) then $\set{A}$ is called an
\emph{unitary (or unital) associative algebra}. Note that
$(\set{A},{+})$ is a group and $(\set{A},{\times})$ is a monoid and
multiplication distributes over addition; therefore, $\set{A}$ is also a
ring. Note, however, that $\set{A}$ is not generally a commutative ring.

\paragraph{Algebras over Commutative Rings:} As with vector spaces, all
definitions above can be applied to algebras with bases of commutative
rings instead of fields. Algebras need only be over fields when scalar
multiplicative inverses (\ie, scalar division) are required.

\paragraph{Fields as Algebras:} Because fields are trivially vector
spaces over themselves and field multiplication is bilinear, any field
is trivially an algebra over itself. In fact, because field
multiplication is associative, any field is trivially an associative
algebra over itself.

\paragraph{Commutative Rings as Algebras:} Because commutative rings are
trivially vector spaces over themselves and commutative ring
multiplication is bilinear, any commutative ring is trivially an algebra
over itself. In fact, because commutative ring multiplication is
associative, any commutative ring is trivially an associative algebra
over itself.

\section{Boolean Rings and Algebras}
\label{app:math_boolean_rings_and_algebras}

We now introduce two new algebraic structures that have special
applications in set theory and logic. \Citet{Stoll79} gives detailed
information about these structures. We introduce them here to provide
analytical background for \longrefs{app:math_logic},
\shortref{app:math_measure}, and \shortref{app:math_probability}.
However, we do show that these two structures are identical.

\subsection{Boolean Rings}
\label{app:math_boolean_rings}

Take a ring $(\set{X},{+},{\times},0,1)$. To say that this is a
\emph{Boolean ring} means that for all $x \in \set{X}$, $x \times x =
x$. 

\paragraph{Boolean Rings as Commutative Rings:} Take
$(\set{X},{+},{\times},0,1)$ to be a Boolean ring and take juxtaposition
to denote multiplication (\ie, $\times$). Take $x \in \set{X}$. Since
this is a Boolean ring, $xx = x$.  Additionally, since $(\set{X},{+})$
is a commutative group, then there exists an element ${-x}$ such that $x
+ {-x} = 0$. Take such an element called ${-x}$. Now, note that
%
\begin{align*}
        x + x
        &= ( x + x )( x + x )\\
        &= xx + xx + xx + xx\\ 
        &= x + x + x + x
\end{align*}
%
Thus,
%
\begin{align*}
        x + x + {-x} + {-x} 
        &= x + x + x + x + {-x} + {-x}\\
        &= x + x + x + {-x} + x + {-x}\\
        &= x + x + 0 + 0\\
        &= x + x + 0\\
        &= x + x
\end{align*}
%
However, $x + x + {-x} + {-x} = x + 0 + {-x} = x + {-x} = 0$. Therefore,
%
\begin{equation}
        x + x = 0
        \label{eq:boolean_ring_xplusx}
\end{equation}
%
Now, also take $y,{-y} \in \set{X}$ such that $y + {-y} = 0$. Again,
$yy=y$. Additionally,
%
\begin{align*}
        x + y 
        &= ( x + y )( x + y )\\
        &= xx + xy + yx + yy\\
        &= x + xy + yx + y
\end{align*}
%
Thus, 
%
\begin{align*}
        x + y + {-x} + {-y}
        &= x + xy + yx + y + {-x} + {-y}\\
        &= x + {-x} + xy + yx + y + {-y}\\
        &= 0 + xy + yx + y + {-y}\\
        &= 0 + xy + yx + 0\\
        &= xy + yx + 0\\
        &= xy + yx
\end{align*}
%
However, $x + y + {-x} + {-y} = x + {-x} + y + {-y} = 0 + 0 = 0$, and so
$xy + yx = 0$. This means that $xy + yx + yx = yx$. However, $xy + yx +
yx = xy + 0 = xy$. Therefore, 
%
\begin{equation*}
        xy = yx
\end{equation*}
%
Thus, every Boolean ring is a commutative ring. Additionally, in a
Boolean ring, the addition of any element with itself is the additive
identity (\ie, \longref{eq:boolean_ring_xplusx}).

\paragraph{Boolean Rings as Algebras:} Take $(\set{X},{+},{\times},0,1)$
to be a Boolean ring. As shown, this Boolean ring is also a commutative
ring. Of course, all commutative rings are trivially algebras over
themselves. Thus, $(\set{X},{+},{\times},0,1)$ is an algebra with itself
as a base ring.

\subsection{Boolean Algebra}
\label{app:math_boolean_algebra}

We now introduce a new algebraic structure that is not based on any of
the previous structures. Take a nonempty set $\set{X}$ with elements $0$
and $1$ (\ie, $0,1 \in \set{X}$). Additionally, define operations $\lor:
\set{X} \times \set{X} \mapsto \set{X}$, $\land: \set{X} \times \set{X}
\mapsto \set{X}$, and $\lnot: \set{X} \mapsto \set{X}$ such that all of
the following are satisfied.
%
\begin{enumerate}[(i)]
        \item For all $x,y,z \in \set{X}$, 
                %
                \begin{equation*}
                        x \lor (y \lor z) = (x \lor y) \lor z
                        \quad \text{ and } \quad
                        x \land (y \land z) = (x \land y) \land z
                \end{equation*}
                %
                That is, $\lor$ and $\land$ are both \emph{associative}
                operations.
        \item For all $x,y \in \set{X}$,
                %
                \begin{equation*}
                        x \lor y = y \lor x
                        \quad \text{ and } \quad
                        x \land y = y \land x
                \end{equation*}
                %
                That is, $\lor$ and $\land$ are both \emph{commutative}
                operations.
        \item For all $x,y,z \in \set{X}$,
                %
                \begin{equation*}
                        x \lor (y \land z)=(x \lor y) \land (x \lor z)
                        \quad \text{ and } \quad
                        x \land (y \lor z)=(x \land y) \lor (x \land z)
                \end{equation*}
                %
                That is, $\lor$ \emph{distributes} over $\land$, and
                $\land$ distributes over $\lor$.
        \item For any $x \in \set{X}$, 
                %
                \begin{equation*}
                        x \lor 0 = x
                        \quad \text{ and } \quad
                        x \land 1 = x
                \end{equation*}
                %
                That is, $0$ is the \emph{identity element} for $\lor$
                and $1$ is the \emph{identity element} for $\land$.
        \item For any $x \in \set{X}$,
                %
                \begin{equation*}
                        x \lor \lnot x = 1
                        \quad \text{ and } \quad
                        x \land \lnot x = 0
                \end{equation*}
                %
                This is like an inverse property. In fact, for any $x
                \in \set{X}$, $\lnot x$ will be called the
                \emph{complement} of $x$.
\end{enumerate}
%
Together this set, these operations, and these two elements, represented
as the $6$-tuple $(\set{X},{\lor},{\land},{\lnot},0,1)$, is called a
\emph{Boolean algebra}.

\paragraph{Properties of Boolean Algebras:} Take a Boolean algebra
$(\set{X},{\lor},{\land},{\lnot},0,1)$. It can be easily shown that all
of the following hold.
%
\begin{itemize}
        \item Take $y \in \set{X}$. If $x \lor y = x$ for any $x \in
                \set{X}$ then $y = 0$. That is, $0$ is a unique element
                of $\set{X}$.
        \item Take $y \in \set{X}$. If $x \land y = x$ for any $x \in
                \set{X}$ then $y = 1$. That is, $1$ is a unique element
                of $\set{X}$.
        \item Take $x,y \in \set{X}$. If $x \land y = 1$ and $x \lor y =
                0$ then $y = \lnot x$. That is, every element has a
                unique complement.
        \item For all $x \in \set{X}$, $\lnot( \lnot x ) = x$.
        \item It is the case that $0 = \lnot 1$ and $1 = \lnot 0$. That
                is, $0$ and $1$ are complements of each other.
        \item For any $x \in \set{X}$, $x \lor x = x$ and $x \land x =
                x$.
        \item For any $x \in \set{X}$, $x \lor 1 = 1$ and $x \land 0 =
                0$.
        \item For any $x,y \in \set{X}$, $x \lor (x \land y) = x$ and $x
                \land (x \lor y) = x$.
        \item For any $x,y \in \set{X}$, $\lnot(x \lor y) = \lnot x
                \land \lnot y$ and $\lnot(x \land y) = \lnot x \lor
                \lnot y$.
        \item For any $x,y \in \set{X}$, $x \land y = x$ if and only if
                $x \lor y = y$.
\end{itemize}

\paragraph{Boolean Algebra Ordering:} Take a Boolean algebra
$(\set{X},{\lor},{\land},{\lnot},0,1)$. Introduce the ordering operation
$\leq$ such that for any $x,y \in \set{X}$,
%
\begin{equation*}
        x \leq y \quad \text{ if and only if } \quad x \land y = x
\end{equation*}
%
Of course, $x \land y = x$ if and only if $x \lor y = y$ for any $x,y
\in \set{X}$. Therefore, an equivalent definition of $\leq$ is that for
any $x,y \in \set{X}$,
%
\begin{equation*}
        x \leq y \quad \text{ if and only if } \quad x \lor y = y
\end{equation*}
%
Clearly, for all $x,y \in \set{X}$,
%
\begin{itemize}
        \item $x \leq x$
        \item if $x \leq y$ and $y \leq x$ then $x = y$
        \item if $x \leq y$ and $y \leq z$ then $x \leq z$
\end{itemize}
%
and so $\set{X}$ equipped with $\leq$, denoted $(\set{X},{\leq})$ or
$(\set{X},{\lor},{\land},{\lnot},0,1,{\leq})$, is a partially ordered
set which we will call an \emph{ordered Boolean algebra}. Take $x,y \in
\set{X}$. Note that by one of the properties of a Boolean algebra,
%
\begin{equation*}
        (x \land y) \lor x = x
        \quad \text{ and } \quad
        (x \land y) \lor y = y
\end{equation*}
%
and therefore $(x \land y) \leq x$ and $(x \land y) \leq y$. Now, assume
that $z \in \set{X}$ is such that $z \leq x$ and $z \leq y$. That is,
%
\begin{equation*}
        z \land x = z \quad \text{ and } \quad z \land y = z
\end{equation*}
%
Note that
%
\begin{align*}
        z \land (x \land y)
        &= z \land x \land y\\
        &= (z \land x) \land y\\
        &= z \land y\\
        &= z
\end{align*}
%
Therefore, $z \leq (x \land y)$. It can similarly be shown that $x \leq
(x \lor y)$ and $y \leq (x \lor y)$ and for any $z$ such that $x \leq z$
and $y \leq z$, $(x \lor y) \leq z$. Thus, for all $x,y \in \set{X}$,
%
\begin{itemize}
        \item the \emph{greatest lower bound} or \emph{infimum} or
                \emph{meet} of $x$ and $y$ is $x \land y$; that is,
                $\inf \{x,y\} = x \land y$
        \item the \emph{least upper bound} or \emph{supremum} or
                \emph{join} of $x$ and $y$ is $x \lor y$; that is,
                $\sup \{x,y\} = x \lor y$
\end{itemize}
%
Therefore, since the pairwise meet and pairwise join exist for any pair,
$(\set{X},{\leq})$ is a lattice. Additionally, note that for all $x \in
\set{X}$,
%
\begin{itemize}
        \item $x \leq 1$
        \item $0 \leq x$
\end{itemize}
%
Therefore, $1$ is the greatest (\ie, top) element of $(\set{X},{\leq})$
and $0$ is the least (\ie, bottom) element of $(\set{X},{\leq})$. This
makes $(\set{X},{\leq})$ a bounded lattice. Finally, take $x,y \in
\set{X}$ and assume that $x \leq y$. That is, assume that $x \land y =
x$. Thus, $\lnot x = \lnot ( x \land y ) = \lnot x \lor \lnot y$, and so
$\lnot y \leq \lnot x$. In summary, 
%
\begin{itemize}
        \item every ordered Boolean algebra is a partially ordered set
        \item every ordered Boolean algebra is a bounded lattice
        \item for any two elements of an ordered Boolean algebra, one
                element is less than or equal to another element if and
                only if its complement is greater than or equal to the
                other complement
\end{itemize}

\subsection{Boolean Rings as Boolean Algebras}

Take a Boolean ring $(\set{X},{+},{\times},0,1)$. Recall that this
Boolean ring can be called an algebra since all commutative rings are
algebras. Now, introduce the meet operator $\land: \set{X} \times
\set{X} \mapsto \set{X}$, the join operator $\lor: \set{X} \times
\set{X} \mapsto \set{X}$, and the complement operator $\lnot: \set{X}
\times \set{X}$ such that for any $x,y \in \set{X}$,
%
\begin{itemize}
        \item $x \land y \triangleq x \times y$
        \item $x \lor y \triangleq x + y + (x \times y)$
        \item $\lnot x \triangleq 1 + x$
\end{itemize}
%
Denote Boolean ring $\set{X}$ equipped with operators $\land$, $\lor$,
$\lnot$, and elements $0$ and $1$ with the $6$-tuple
$(\set{X},{\lor},{\land},{\lnot},0,1)$. Using all of the properties
endowed to $+$, $\times$, $0$, and $1$, it is easy to show that
$(\set{X},{\lor},{\land},{\lnot},0,1)$ is a Boolean algebra. That is,
the Boolean ring $(\set{X},{+},{\times},0,1)$ is an algebra and
$(\set{X},{\lor},{\land},{\lnot},0,1)$ is a Boolean algebra. This is
true for all Boolean rings.

\subsection{Boolean Algebras as Boolean Rings}

Take a Boolean algebra $(\set{X},{\lor},{\land},{\lnot},0,1)$. Introduce
the addition operator $+: \set{X} \times \set{X} \mapsto \set{X}$ and
the multiplication operator $\times: \set{X} \times \set{X} \mapsto
\set{X}$ such that for any $x,y \in \set{X}$,
%
\begin{itemize}
        \item $x \times y \triangleq x \land y$
        \item $x + y \triangleq (x \lor y) \land (\lnot x \lor \lnot y)$
\end{itemize}
%
Denote Boolean algebra $\set{X}$ equipped with operations $+$ and
$\times$ and elements $0$ and $1$ with the $5$-tuple
$(\set{X},{+},{\times},0,1)$.

\paragraph{Boolean Algebra as Commutative Group:} Take Boolean algebra
$(\set{X},{+},{\times},0,1)$. Using all of the properties endowed to
$\land$, $\lor$, $\lnot$, $0$, and $1$, it can be easily shown that
%
\begin{itemize}
        \item for all $x,y,z \in \set{X}$, $(x + y) + z = x + (y + z)$
        \item for all $x \in \set{X}$, $0 + x = x + 0 = x$
        \item for all $x \in \set{X}$, $x + x = 0$
        \item for all $x,y \in \set{X}$, $x + y = y + x$
\end{itemize}
%
That is, $(\set{X},{+})$ is a commutative group with identity element
$0$ where every element is its own additive inverse.

\paragraph{Boolean Algebra as Commutative Monoid:} Take Boolean algebra
$(\set{X},{+},{\times},0,1)$. Using all of the properties endowed to
$\land$, $\lor$, $\lnot$, $0$, and $1$, it can be easily shown that
%
\begin{itemize}
        \item for all $x,y,z \in \set{X}$, $(x \times y) \times z = x
                \times (y \times z)$
        \item for all $x \in \set{X}$, $1 \times x = x \times 1 = x$
        \item for all $x,y \in \set{X}$, $x \times y = y \times x$
\end{itemize}
%
That is, $(\set{X},{\times})$ is a commutative monoid with identity
element $1$.

\paragraph{Boolean Algebra as Commutative Ring:} Take Boolean algebra
$(\set{X},{+},{\times},0,1)$. Using all of the properties endowed to
$\land$, $\lor$, $\lnot$, $0$, and $1$, it can be easily shown that for
all $x,y,z \in \set{X}$,
%
\begin{equation*}
        x \times (y + z) = (x \times y) + (x \times z)
        \quad \text{ and } \quad
        (x + y) \times z = (x \times z) + (y \times z)
\end{equation*}
%
Therefore, $(\set{X},{+},{\times},0,1)$ is a commutative ring.

\paragraph{Boolean Algebra as Boolean Ring:} Take Boolean algebra
$(\set{X},{+},{\times},0,1)$. Using all of the properties endowed to
$\land$, $\lor$, $\lnot$, $0$, and $1$, it can be easily shown that for
all $x \in \set{X}$,
%
\begin{equation*}
        x \times x = x
\end{equation*}
%
Therefore, $(\set{X},{+},{\times},0,1)$ is a Boolean ring. In fact, we
have already shown that $x + x = 0$ since each element is its own
additive inverse. Therefore, all Boolean algebras are Boolean rings.
Since Boolean rings are all algebras (over rings), then Boolean algebras
are also algebras.

\subsection{Equivalence of Boolean Algebras and Boolean Rings}

Because every Boolean algebra is a Boolean ring and every Boolean ring
is a Boolean algebra, the two structures are equivalent. 

\subsection{Subalgebras of Boolean Algebras}
\label{app:math_boolean_subalgebras}

Take a set $\set{U}$ where $(\set{U},{\lor},{\land},{\lnot},0,1)$ is a
Boolean algebra. Sometimes this is called a \emph{Boolean algebra over
the set $\set{U}$}. Note that $\{0,1\} \subseteq \set{U}$ and
%
\begin{equation*}
        (\{0,1\},{\lor},{\land},{\lnot},0,1)
\end{equation*}
%
is also a Boolean algebra over the subset $\{0,1\}$; therefore, it is
called a \emph{subalgebra}. It is not necessary to list every algebraic
operation with a subalgebra as it is implied that they are the same as
the Boolean algebra which is over the superset. In other words, the set
$\{0,1\}$ is a subalgebra of the Boolean algebra
$(\set{U},{\lor},{\land},{\lnot},0,1)$. That being said, we will call
$(\set{U},{\lor},{\land},{\lnot},0,1)$ the \emph{$\set{U}$ Boolean
algebra} for brevity.

\paragraph{Requirements for a Subalgebra:} Take the algebra
$(\set{U},{\lor},{\land},{\lnot},0,1)$ and subset $\set{X} \subseteq
\set{U}$. To say subset $\set{X}$ forms a \emph{subalgebra of the
\set{U} Boolean algebra} means that
%
\begin{enumerate}[(i)]
        \item for any $x,y \in \set{X}$, $x \land y \in \set{X}$
                \label{item:boolean_algebra_closure_and}
        \item for any $x,y \in \set{X}$, $x \lor y \in \set{X}$
                \label{item:boolean_algebra_closure_or}
        \item for any $x \in \set{X}$, $\lnot x \in \set{X}$
                \label{item:boolean_algebra_closure_not}
\end{enumerate}
%
This will ensure that every one of the requirements for a Boolean
algebra hold, thus justifying calling $\set{X}$ a subalgebra of the
$\set{U}$ Boolean algebra. Assume that $\set{X}$ is a subalgebra of
the $\set{U}$ Boolean algebra and take $x \in \set{X}$. 
%
\begin{itemize}
        \item By property (\shortref{item:boolean_algebra_closure_not}),
                $\lnot x \in \set{X}$.
        \item By property (\shortref{item:boolean_algebra_closure_and}),
                since $x \in \set{X}$ and $\lnot x \in \set{X}$ then $x
                \land \lnot x \in \set{X}$; however, $x \land \lnot x =
                0$ and so $0 \in \set{X}$. 
        \item Additionally, by property
                (\shortref{item:boolean_algebra_closure_or}), since $x
                \in \set{X}$ and $\lnot x \in \set{X}$ then $x \lor
                \lnot x \in \set{X}$; however, $x \lor \lnot x = 1$ and
                so $1 \in \set{X}$.
\end{itemize}
%
Thus, the \emph{trivial subalgebra} of the $\set{U}$ Boolean algebra is
$\{0,1\}$.

\subsection{Propositional Logic and the Trivial Boolean Algebra}
\label{app:math_prop_logic_boolean_algebra}

A trivial Boolean algebra takes the form
%
\begin{equation*}
        (\{0,1\},{\lor},{\land},{\lnot},0,1)
\end{equation*}
%
That is, the trivial Boolean algebra contains only the two unique
identity elements. Another trivial Boolean algebra that is
important to us is
%
\begin{equation*}
        (\{\text{false},\text{true}\},
         \text{or},
         \text{and},
         \text{not},
         \text{false},
         \text{true})
\end{equation*}
%
This is the Boolean algebra which is the basis for the logic described
in \longref{app:math_logic}. Of course, all trivial Boolean algebras are
isomorphic to each other; that is, symbols and notation can be
substituted for each other. This is the justification for the use of
$\land$ for \emph{and}, $\lor$ for \emph{or}, and $\lnot$ for
\emph{not}. Similarly, as we will show in
\longref{app:math_algebras_of_sets}, sets with their set operations can
be shown to be Boolean algebras as well. This is the reason for the
similarities between operations like $\cap$ with sets and $\land$ for
logic. This shows the utility of algebra. By identifying common
structures, algebra provides a context for very general result that can
prevent repetitive work and reveal relationships that may not have been
easily anticipated in the specialized context.

\paragraph{Boolean Algebra Ordering and Logical Implication:} Recall the
topic of \emph{statements} and \emph{implication} in propositional
logic. Take $x$ and $y$ to be two logical statements. To say that $x$
implies $y$ means that
%
\begin{itemize}
        \item if $x$ is true then $y$ must be true
        \item if $x$ is not true then $y$ may be either true or false
\end{itemize}
%
Assume that $x$ implies $y$. Clearly, if $x$ is false then the
statement \emph{$x$ and $y$} must also be false. Additionally, if $x$ is
true then $y$ must be true so \emph{$x$ and $y$} must also be true.
Clearly, saying $x$ implies $y$ is equivalent to saying that
%
\begin{equation*}
        x \land y = x
\end{equation*}
%
where $\land$ is a symbol that represents \emph{and}. However, above
this was used as the definition for the partial order $\leq$. That is,
saying that \emph{$x$ implies $y$} is equivalent to saying that $x \leq
y$. To understand this, note that by this definition of $\leq$, it is
the case that $\text{false} \leq \text{true}$. Now assume that $x \leq
y$. Both $x,y \in \{\text{false},\text{true}\}$. Thus, if $x =
\text{true}$ then it must be that $y = \text{true}$ because $x \leq y$
and $\text{false} \leq \text{true}$. However, if $x = \text{false}$ then
$y \in \{\text{false},\text{true}\}$; that is, when $x$ is false,
nothing can be said about $y$. Therefore, the $\leq$ relation matches
what is expected from implication. Additionally, note that since every
Boolean algebra is a totally ordered set, if $x \leq y$ and $y \leq x$
then $x = y$. This also matches implication. That is, if $x$ implies $y$
and $y$ implies $x$ then $x$ and $y$ are equivalent; $x$ is true if and
only if $y$ is true. Therefore, in a Boolean algebra,
%
\begin{itemize}
        \item Elements are ordered by implication.
        \item Equivalent elements imply each other.
\end{itemize}

\paragraph{Boolean Algebra and the Exclusive Or:} Take the trivial
Boolean algebra $(\{\text{false},\text{true}\}, \text{or}, \text{and},
\text{not}, \text{false}, \text{true})$. As every Boolean algebra is a
Boolean ring, we can define an addition operator $\text{xor}$ so that
for any $x,y \in \{\text{false},\text{true}\}$,
%
\begin{equation*}
        x \text{ xor } y 
        \triangleq
        ( x \text{ or } y )
        \text{ and }
        ( \text{not } x \text{ or } \text{not } y )
\end{equation*}
%
And thus,
%
\begin{itemize}
        \item $\text{false} \text{ xor } \text{false} = \text{false}$
        \item $\text{false} \text{ xor } \text{true} = \text{true}$
        \item $\text{true} \text{ xor } \text{false} = \text{true}$
        \item $\text{false} \text{ xor } \text{false} = \text{false}$
\end{itemize}
%
In other words, using the more conventional $0$, $1$, and $+$, $0+0=0$,
$0+1=1$, $1+0=1$, and $1+1=0$. The operation \emph{xor} is known as the
\emph{exclusive or} (as opposed to the \emph{inclusive or} which is
another name for the conventional \emph{or}) and the statement $x \text{
xor } y$ is only true if exactly one of $x$ and $y$ are true. Thus, the
exclusive or can be viewed as addition in a trivial Boolean algebra. In
fact, addition in any Boolean algebra can be conceptualized as a type of
exclusive or. 

\section{Sets of Sets: Order and Algebra}
\label{app:math_sets_sets}

Recall that the power set of a set $\set{U}$ is a set of sets that
contains every subset of $\set{U}$. That is, for any set $\set{U}$ and
any subset $\set{X} \subseteq \set{U}$, it is the case that $\set{X} \in
\Pow(\set{U})$. In other words, for a set $\set{U}$,
%
\begin{equation*}
        \Pow(\set{U}) = \{ \set{X} : \set{X} \subseteq \set{U} \}
\end{equation*}
%
Thus, $\Pow(\set{U})$ serves as a universal set for every subset of
$\set{U}$. Elements of the power set are related by $\subseteq$ and can
generate new subsets with the $\cap$, $\cup$, and ${}^c$ operations.
Therefore, it is interesting to look at $\Pow(\set{U})$ in an order or
algebraic context. This will not only reveal properties of the structure
of the power set but of a large class of sets of sets.

\subsection{The Partially Ordered and Complete Power Set}
\label{app:math_poset_powerset}

The power set has a special ordering with some interesting properties.
Take a nonempty set $\set{S}$. We first show that
$(\Pow(\set{X}),{\subseteq})$ is a partially ordered set and then we
show that it is a complete lattice. In
\longref{app:math_sets_of_subsets_lattices}, we use this as
motivation for a general property of a class of sets of sets.

\paragraph{Power Set as Poset:} Take a \emph{nonempty} set $\set{S}$.
Note that for any subsets $\set{X} \subseteq \set{S}$ and $\set{Y}
\subseteq \set{S}$ and $\set{Z} \subseteq \set{S}$, it is the case that
%
\begin{itemize}
        \item $\set{X},\set{Y},\set{Z} \in \Pow(\set{S})$
        \item $\set{X} \subseteq \set{X}$
        \item if $\set{X} \subseteq \set{Y}$ and $\set{Y} \subseteq
                \set{X}$ then $\set{X} = \set{Y}$
        \item if $\set{X} \subseteq \set{Y}$ and $\set{Y} \subseteq
                \set{Z}$ then $\set{X} \subseteq \set{Z}$
\end{itemize}
%
Therefore, $(\Pow(\set{S}),{\subseteq})$ is a partially ordered set. In
other words, any subset $\setset{S} \subseteq \Pow(\set{S})$ is
partially ordered by $\subseteq$. This is known as being
\emph{(partially) ordered by inclusion}.

\paragraph{Power Set as Complete Lattice:} Take a nonempty set
$\set{S}$. Define $\setset{S}$ to be an arbitrary \emph{nonempty} set of
subsets of $\set{S}$. That is, $\setset{S} \subseteq \Pow(\set{S})$ and
$\setset{S} \neq \emptyset$. Notice that for any set $\set{X} \in
\setset{S}$,
%
\begin{itemize}
        \item $\bigcap \setset{S} \in \Pow(\set{S})$
        \item $\bigcup \setset{S} \in \Pow(\set{S})$
        \item $\bigcap \setset{S} \subseteq \set{X}$
        \item $\set{S} \subseteq \bigcup \setset{S}$
\end{itemize}
%
where $\bigcup \setset{S}$ is the union of all sets included in
$\setset{S}$ and $\bigcap \setset{S}$ is the intersection of sets
included in $\setset{S}$. Therefore, for partially ordered set
$(\Pow(\set{S}),{\subseteq})$ and $\setset{S} \subseteq \Pow(\set{S})$,
%
\begin{itemize}
        \item $\inf \setset{S} = \bigcap \setset{S}$
        \item $\sup \setset{S} = \bigcup \setset{S}$
\end{itemize}
%
and thus $\setset{S}$ has a least upper bound and a greatest lower
bound. Since $\setset{S}$ is an arbitrary subset of $\Pow(\set{S})$,
then $(\Pow(\set{S}),{\subseteq})$ is a \emph{complete lattice}. In this
case, the infimum of $\setset{S}$ is often called its \emph{meet} as it
represents the set of elements common to all sets included in
$\setset{S}$. Similarly, the supremum of $\setset{S}$ is often called
its \emph{join} as it represents the set of all elements collected from
all sets included in $\setset{S}$. Keeping that in mind, note that for
any sets $\set{X},\set{Y} \in \Pow{S}$ (\ie, any $\set{X} \subseteq
\set{S}$ and $\set{Y} \subseteq \set{S}$),
%
\begin{itemize}
        \item $\set{X} \cap \set{Y} \in \Pow(S)$
        \item $\set{X} \cup \set{Y} \in \Pow(S)$
        \item $\set{X} \cap \set{Y} \subseteq \set{X}$ and $\set{Y} \cap
                \set{Y} \subseteq \set{X}$
        \item $\set{X} \subseteq \set{X} \cup \set{Y}$ and $\set{Y}
                \subseteq \set{X} \cup \set{Y}$
\end{itemize}
%
This is the reason why the intersection of two sets is often called
their \emph{meet} and the union of two sets is often called their
\emph{join}. Also note the similarity in the symbols $\cap$ and $\land$,
$\cup$ and $\lor$, $\bigcap$ and $\bigwedge$, $\bigcup$ and $\bigvee$;
this is not coincidental. Finally, note that
%
\begin{itemize}
        \item $\{\} \in \Pow(\set{S})$
        \item $\set{S} \in \Pow(\set{S})$
        \item $\inf \Pow(\set{S}) = \{\}$
        \item $\sup \Pow(\set{S}) = \set{S}$
\end{itemize}
%
Therefore, $\set{S}$ is, of course, a bounded lattice and $\min
\Pow(\set{S}) = \{\}$ and $\max \Pow(\set{S}) = \set{S}$.

\subsection{General Sets of Subsets as Complete Lattices}
\label{app:math_sets_of_subsets_lattices}

As shown in \longref{app:math_poset_powerset}, the power set of any
nonempty set is partially ordered by inclusion and forms a complete
(and therefore bounded) lattice. There must be some subsets of the
power set for which this is also true. In particular, consider a subset
of a power set for which
%
\begin{enumerate}[(i)]
        \item the set is also a poset (\ie, it includes all pairwise
                meets and joins)
                \label{item:subposet}
        \item the intersection or union of any finite or infinite set of
                its elements is also in the set
                \label{item:closure_under_meets_and_joins}
\end{enumerate}
%
Using an argument similar to the one used in
\longref{app:math_poset_powerset}, the subset of the power set must also
be a complete lattice. However, because the ordering is by inclusion
(\ie, $\subseteq$), if property
(\shortref{item:closure_under_meets_and_joins}) is met then it is clear
that property (\shortref{item:subposet}) is also met. That is, a
pairwise meet is an intersection and a pairwise join is a union, and so
the inclusion of all intersections and unions makes it necessary that
pairwise meets and pairwise joins are included. Therefore, as long as a
set of sets is \emph{closed} under arbitrary unions and intersections,
it must be a complete lattice.

\paragraph{Closure Implies Poset:} Take a set $\set{S}$ and a set
$\setset{S} \in \Pow(\set{S})$. Assume that $\setset{S}$ is closed under
arbitrary (possibly infinite) intersections and unions. In other words,
for any subset $\setset{S}_0 \subseteq \setset{S}$, it is the case that
%
\begin{equation*}
        \bigcap \setset{S}_0 \in \setset{S}
        \quad \text{ and } \quad
        \bigcup \setset{S}_0 \in \setset{S}
\end{equation*}
%
It has already been shown that $(\Pow(\set{S}),{\subseteq})$ is a
partially ordered set. In particular, it has been shown that for all
$\set{X},\set{Y} \in \setset{S}$, 
%
\begin{itemize}
        \item $\set{X} \cap \set{Y} \in \Pow(\set{S})$
        \item $\set{X} \cap \set{Y} \subseteq \set{X}$ and $\set{X} \cap
                \set{Y} \subseteq \set{Y}$
        \item $\set{X} \cup \set{Y} \in \Pow(\set{S})$
        \item $\set{X} \subseteq \set{X} \cup \set{Y}$ and $\set{Y}
                \subseteq \set{X} \cup \set{Y}$
\end{itemize}
%
However, since $\setset{S}$ is closed under intersections and unions,
then $\set{X} \cap \set{Y} \in \setset{S}$ and $\set{X} \cup \set{Y} \in
\setset{S}$. Therefore, $(\setset{S},{\subseteq})$ must also be a poset.

\paragraph{Closure Implies Complete Lattice:} Take a set $\set{S}$ and a
set $\setset{S} \in \Pow(\set{S})$. As before, assume that $\setset{S}$
is closed under arbitrary (possibly infinite) intersections and unions.
Thus, $(\setset{S},{\subseteq})$ must also be a poset. It is clear to
see that due to closure of $\setset{S}$ under intersections and unions,
for any subset $\setset{S}_0 \subseteq \setset{S}$,
%
\begin{itemize}
        \item $\bigcap \setset{S}_0 \in \setset{S}$
        \item $\inf \setset{S}_0 = \bigcap \setset{S}_0$
        \item $\bigcup \setset{S}_0 \in \setset{S}$
        \item $\sup \setset{S}_0 = \bigcup \setset{S}_0$
\end{itemize}
%
Therefore, any set of sets that is closed under arbitrary (possibly
infinite) intersections and unions is a complete lattice when ordered by
inclusion (\ie, $\subseteq$). That is, for any set $\set{X}$ closed
under arbitrary intersections and unions, $(\set{X},{\subseteq})$ is a
complete lattice (and therefore a bounded lattice as well). Recall that
the symbol $\bigwedge$ (meet) will sometimes be used for $\inf$
(infimum) and $\bigvee$ (join) will sometimes be used for $\sup$
(supremum); this is often the case for sets of sets (especially when the
set of sets is closed under arbitrary intersections (meets) and unions
(joins)).


\subsection{Filters on Sets}
\label{app:math_filters_on_sets}

The application of filters from \longref{app:math_filters_on_posets} has
important uses in \emph{topology}, the subject of
\longref{app:math_topology}. A framework of filters on sets allows for
the discussion of overall trends of infinite sets. Therefore, it is
useful for us to introduce them. Recall from
\longref{app:math_poset_powerset} that for any set $\set{S}$,
$(\Pow(\set{S}),{\subseteq})$ is a partially ordered set.

\paragraph{Filter Bases:} Take a set $\set{S}$ and a \emph{nonempty} set
$\setset{B} \subseteq \Pow(\set{S})$ (\ie, $\setset{B}$ is a set of
subsets of $\set{S}$ and $\setset{B} \neq \emptyset$) where
%
\begin{enumerate}[(i)]
        \item $\emptyset \notin \setset{B}$ (and, again, $\setset{B}
                \neq \emptyset$)
        \item for any $\set{X} \in \setset{B}$ and $\set{Y} \in
                \setset{B}$, there exists a $\set{T} \in \setset{B}$
                such that $\set{T} \subseteq \set{X} \cap \set{Y}$
\end{enumerate}
%
In this case, $\setset{B}$ is called a \emph{filter base on set
$\set{S}$}. Note that $\setset{B}$ satisfies the conditions for a
proper filter base on poset $(\Pow(\set{S}),{\subseteq})$. Note that for
any elements $\set{X},\set{Y},\set{Z} \in \setset{B}$,
%
\begin{itemize}
        \item $\set{X} \supseteq \set{X}$
        \item if $\set{X} \supseteq \set{Y}$ and $\set{Y} \supseteq
                \set{Z}$ then $\set{X} \supseteq \set{Z}$
        \item there exists a $\set{T} \in \set{B}$ such that $\set{X}
                \supseteq \set{T}$ and $\set{Y} \supseteq \set{T}$
\end{itemize}
%
Therefore, $(\setset{B},{\supseteq})$ is a directed set and
$(\setset{B},{\subseteq})$ is a downward directed set. Therefore, filter
bases on sets are said to be \emph{downward directed} by $\subseteq$
(\ie, downward directed by inclusion).

\paragraph{Filters:} Take set $\set{S}$ and a set $\setset{F} \subseteq
\Pow(\set{S})$ where
%
\begin{enumerate}[(i)]
        \item if $\set{X} \in \setset{F}$ and $\set{Y} \in \setset{F}$
                then $\set{X} \cap \set{Y} \in \setset{F}$
        \item if $\set{X} \in \setset{F}$ and $\set{Y} \subseteq
                \set{S}$ with $\set{X} \subseteq \set{Y}$ then $\set{Y}
                \in \setset{F}$
        \item $\emptyset \notin \setset{F}$
        \item $\set{S} \in \setset{F}$
\end{enumerate}
%
In this case, $\setset{F}$ is called a \emph{filter on (nonempty) set
$\set{S}$}. Note that 
%
\begin{itemize}
        \item $\setset{F}$ satisfies the conditions for a proper filter
                on poset $(\Pow(\set{S}),{\subseteq})$
        \item $\setset{F} \neq \emptyset$ since $\set{S} \in \setset{F}$
        \item every filter on a set is also a filter base on the set
\end{itemize}
%
As we will discuss, because every filter is also a filter base, results
will usually be given in terms of filter bases rather than filters.

\paragraph{Filters from Filter Bases:} Take set $\set{S}$. Assume
$\setset{B}$ is a filter base on set $\set{S}$. Define $\setset{F}$ with
%
\begin{equation*}
        \setset{F}
        \triangleq
        \{ 
        \set{A} \subseteq \set{S}
        :
        \text{there exists } \set{B} \in
        \setset{B} \text{ such that } \set{B} \subseteq \set{A}
        \}
\end{equation*}
%
That is, $\setset{F}$ is the set of all subsets of $\set{S}$ that
contain a set in the filter base $\setset{B}$. In this case,
$\setset{F}$ is a filter on set $\set{S}$, and the filter $\setset{F}$
is said to be \emph{spanned} or \emph{generated} by the filter base
$\setset{B}$. In other words, a filter basis on a set completely
specifies a filter on that set. Therefore, it is common to generate
results with filter bases since any filter base generates a
corresponding filter.

\paragraph{Filter Base Refinements:} Take set $\set{S}$. Assume
$\setset{B}$ and $\setset{C}$ are filter bases on set $\set{S}$. Assume
that for all $\set{B} \in \setset{B}$, there is a $\set{C} \in
\setset{C}$ such that $\set{C} \subseteq \set{B}$. In this case, filter
base $\setset{C}$ is said to be \emph{finer} than filter base
$\setset{B}$. It is also said that a filter that is finer than another
filter is a \emph{refinement} of that filter, so $\setset{C}$ is a
refinement of $\setset{B}$. Note that if $\setset{C}$ is finer than
$\setset{B}$ and there is another filter base $\setset{D}$ on $\set{S}$
that is finer than $\setset{C}$ then $\setset{D}$ is finer than
$\setset{B}$.

\paragraph{Equivalent Filter Bases:} Take set $\set{S}$ and filter bases
$\setset{B}$ and $\setset{C}$ on set $\set{S}$. If $\setset{B}$ is a
finer than $\setset{C}$ and $\setset{C}$ is finer than $\setset{B}$
then $\setset{B}$ and $\setset{C}$ are said to be \emph{equivalent}
filter bases.

\paragraph{Functions of Filter Bases:} Take sets $\set{X}$ and $\set{Y}$
and a function $f: \set{X} \mapsto \set{Y}$. Now take filter base
$\setset{B}$ on $\set{X}$. It can be shown that $f\{ \setset{B} \}$ is a
filter base on $\set{Y}$.

\subsection{Nets and Sequences as Filters}
\label{app:math_nets_and_sequences_as_filters}

Take directed set $(\set{A},{\leq})$. Also take the set $\setset{A}$
defined by
%
\begin{equation*}
        \setset{A}
        \triangleq
        \{ 
        \{ \alpha \in \set{A} : \alpha_0 \leq \alpha \} 
        : \alpha_0 \in \set{A}
        \}
\end{equation*}
%
In other words, $\setset{A}$ is a set of \emph{tails} of the directed
set $\set{A}$. In fact, it is easy to verify that $\setset{A}$ is a
filter base. Therefore, $\setset{A}$ is called the \emph{filter base of
tails} of the directed set $\set{A}$.

\paragraph{Filter Bases Generated by Nets:} Take set $\set{X}$ and
directed set $(\set{A},{\leq})$ as well as the net $(x_\alpha)$ from
$\set{A}$ to $\set{X}$. Now, take the set $\setset{X}$ defined by
%
\begin{equation*}
        \setset{X}
        \triangleq
        \{ 
        \{ x_\alpha : \alpha \in \set{A}, \alpha_0 \leq \alpha \} 
        : \alpha_0 \in \set{A}
        \}
\end{equation*}
%
It is the case that $\setset{X}$ is a filter base. In fact, $\setset{X}$
is called the \emph{filter base of tails of net} $(x_\alpha)$ or the
\emph{filter base generated by net} $(x_\alpha)$.

\paragraph{Filter Bases Generated by Sequences:} Take set $\set{X}$,
totally ordered set $(\N,{\leq})$, and the sequence $(x_n)$ with
codomain $\set{X}$. Therefore, 
%
\begin{equation*}
        \setset{A}
        \triangleq
        \{ \{ n \in \N : n_0 \leq n \} : n_0 \in \N \}
\end{equation*}
%
is the \emph{filter base of tails} of the totally ordered set $\N$, and
%
\begin{equation*}
        \setset{X}
        \triangleq
        \{ \{ x_n : n \in \N, n_0 \leq n \} : n_0 \in \N \}
\end{equation*}
%
is the \emph{filter base generated by sequence} $(x_n)$.

\paragraph{Filters as General Framework:} Every sequence is a net, and
every net generates a filter. Thus, any statement about filters also
holds with nets and sequences. In fact, as we will discuss, a general
framework has been built based on filters that agrees with the expected
results derived independently with nets and sequences. Therefore, any
result that we derive for a filter will have a consistent result for a
any net or sequence that generates that filter as well.

\paragraph{Nets as Functions:} Take a directed set $(\set{A},{\leq})$
and a set $\set{X}$ and a net $(x_\alpha)$ with domain $\set{A}$ and
codomain $\set{X}$. By definition, nets are indexed families which are
functions, and so nets are functions. Therefore, for sake of notation,
define the function $f: \set{A} \mapsto \set{X}$ as
%
\begin{equation*}
        f(\alpha) \triangleq x_\alpha
\end{equation*}
%
for all $\alpha \in \set{A}$. Recall that for any $\set{B} \subseteq
\set{A}$, the image of $\set{B}$ under $f$ is $f[B]$. Now, define the
set $\setset{A}$ as
%
\begin{equation*}
        \setset{A}
        \triangleq
        \{ 
        \{ \alpha \in \set{A} : \alpha_0 \leq \alpha \} 
        : \alpha_0 \in \set{A}
        \}
\end{equation*}
%
As discussed, this is the filter base of tails of $\set{A}$, and it is
certainly a filter base on $\set{A}$. Therefore, $f\{\setset{A}\}$ (\ie,
the image of filter base $\setset{A}$ under $f$) must also be a filter
base. In fact, it is clear that
%
\begin{equation*}
        f\{ \setset{A} \}
        =
        \{ 
        \{ x_\alpha : \alpha \in \set{A}, \alpha_0 \leq \alpha \} 
        : \alpha_0 \in \set{A}
        \}
\end{equation*}
%
which is the filter base of tails of the net $(x_\alpha)$. Therefore the
filter base of tails of $\set{A}$ are related to the filter base of
tails of net $(x_\alpha)$ by the function $f$ which defines the net. Of
course, we could pick any filter base $\setset{B}$ on $\set{A}$ and
generate a new filter base $f\{\setset{B}\}$ on $\set{X}$. As we will
show in \longref{app:math_topology}, the analysis of images of filter
bases under nets is actually the analysis of the \emph{limits} of nets
(and sequences).

\subsection{Algebras, Subalgebras, and Fields of Sets}
\label{app:math_algebras_of_sets}

Take a set $\set{U}$ and its power set $\Pow(\set{U})$. It has been
shown that $(\Pow(\set{U}),{\subseteq})$ is a complete lattice; that is,
$\Pow(\set{U})$ can be viewed as being ordered by $\subseteq$. However,
it should be clear that for set $\set{U}$, its power set $\Pow(\set{U})$
forms a Boolean algebra and therefore also a Boolean ring. That is, 
%
\begin{equation*}
        ( \Pow(\set{U}), {\symdiff}, {\cap}, \emptyset, \set{U} )
\end{equation*}
%
is a Boolean ring and
%
\begin{equation*}
        ( \Pow(\set{U}), {\cup}, {\cap}, {{}^c}, \emptyset, \set{U} )
\end{equation*}
%
is a Boolean algebra. And so every power set is 
%
\begin{itemize}
        \item a Boolean algebra
        \item a Boolean ring
        \item a commutative ring
        \item an algebra over a ring
\end{itemize}
%
Therefore, the power set is called an \emph{algebra of sets}. Because of
this, $\Pow(\set{U})$ is ordered by a relation $\leq$ where for any
$\set{X},\set{Y} \in \Pow(\set{U})$, $\set{X} \leq \set{Y}$ if and only
if $\set{X} \cap \set{Y} = \set{X}$. However, this is the definition of
$\subseteq$. Therefore, ordering $\Pow(\set{U})$ by $\subseteq$ simply
follows from it being a Boolean algebra; that is, \emph{since}
$\Pow(\set{U})$ forms a Boolean algebra then
%
\begin{itemize}
        \item $(\Pow(\set{U}),{\subseteq})$ is a partially ordered set
        \item $(\Pow(\set{U}),{\subseteq})$ is bounded lattice with
                greatest element $\set{U}$ and least element $\emptyset$
        \item for $\set{X},\set{Y} \in \Pow(\set{U})$, if $\set{X}
                \subseteq \set{Y}$ then $\set{Y}^c \subseteq \set{X}^c$
\end{itemize}
%
In fact, as was already shown, $(\Pow(\set{U}),{\subseteq})$ is a
complete lattice. This justifies the statement that any set $\set{X} \in
\Pow(\set{U})$ is \emph{smaller} than a set $\set{Y} \in \Pow(\set{U})$
if $\set{X} \subseteq \set{Y}$ (\ie, $\set{X} \cap \set{Y} = \set{X}$).
When sets are related by order terminology, the implicit order relation
is $\subseteq$ (\ie, substitute $\subseteq$ for $\leq$).

\paragraph{Subalgebras as Fields of Sets:} Just as any Boolean algebra
can have subalgebras, the Boolean algebra formed by the power set has
subalgebras formed by sets of sets. Take a universal set $\set{U}$ and
the Boolean algebra $( \Pow(\set{U}), {\cup}, {\cap}, {{}^c}, \emptyset,
\set{U} )$. We will call this the \emph{power set Boolean algebra of
$\set{U}$}.

Take a subset $\setset{S} \subseteq \Pow(\set{U})$ (\ie, a set of
subsets of $\set{U}$). To say $\setset{S}$ forms a \emph{subalgebra of
the power set Boolean algebra of $\set{U}$} means that
%
\begin{enumerate}[(i)]
        \item for any $\set{X},\set{Y} \in \setset{S}$, $\set{X} \cap
                \set{Y} \in \set{S}$
                \label{item:boolean_algebra_closure_intersection}
        \item for any $\set{X},\set{Y} \in \setset{S}$, $\set{X} \cup
                \set{Y} \in \set{S}$
                \label{item:boolean_algebra_closure_union}
        \item for any $\set{X} \in \setset{S}$, $\set{X}^c \in
                \setset{S}$
                \label{item:boolean_algebra_closure_complement}
\end{enumerate}
%
This will ensure that every one of the requirements for a Boolean
algebra hold, thus justifying calling $\setset{S}$ a subalgebra of the
power set Boolean algebra. Assume that $\setset{S}$ is a subalgebra of
the power set Boolean algebra and take $\set{X} \in \setset{S}$. 
%
\begin{itemize}
        \item By property
                (\shortref{item:boolean_algebra_closure_complement}),
                $\set{X}^c \in \setset{S}$.
        \item By property
                (\shortref{item:boolean_algebra_closure_intersection}),
                since $\set{X} \in \setset{S}$ and $\set{X}^c \in
                \setset{S}$ then $\set{X} \cap \set{X}^c \in
                \setset{S}$; however, $\set{X} \cap \set{X}^c =
                \emptyset$ and so $\emptyset \in \setset{S}$. 
        \item Additionally, by property
                (\shortref{item:boolean_algebra_closure_union}), since
                $\set{X} \in \setset{S}$ and $\set{X}^c \in \setset{S}$
                then $\set{X} \cup \set{X}^c \in \setset{S}$; however,
                $\set{X} \cup \set{X}^c = \set{X}$ and so $\set{X} \in
                \setset{S}$. 
\end{itemize}
%
Thus, the trivial subalgebra of the power set Boolean algebra is
$\{\emptyset,\set{X}\}$. Any subalgebra of the power set Boolean algebra
of $\set{U}$ is \emph{algebra of sets} called an \emph{algebra over
$\set{U}$}. If $\setset{S}$ is an algebra over $\set{U}$, then
$(\set{U},\setset{S})$ is called a \emph{field of sets} and elements of
$\set{U}$ are called \emph{points}.

\section{The Numbers}
\label{app:math_numbers}

Now that we have described the set operations and have introduced basic
algebra, we will define numbers and arithmetic. First we will revisit
the whole numbers and natural numbers in detail, and then use them to
build integers and rational numbers. Once rational numbers are defined,
we will be able to define distance and limits and use these notions to
build the real numbers. A more complete and yet similarly structured
discussion of these number systems is given by \citet{Stoll79}.

\subsection{Whole Numbers}
\label{app:math_whole_numbers}

The basis for mathematics is counting. That is, before any argument or
analysis can be made quantitative, something must be counted.
Mathematics provides the \emph{whole numbers} as an abstract quantity
capturing the essence of counting. Each whole number represents how many
there are of a particular object.

\paragraph{Definition:} We have already introduced the natural numbers
and the whole numbers. Recall that the set of the whole numbers is
denoted \symdef{Bnumbers.2}{wholes}{$\W$}{the set of the whole numbers
(\ie, $\{0,1,2,3,\dots\}$)} and is defined by
%
\begin{align*}
        \W \triangleq \{0,1,2,3,\dots\}
\end{align*}
%
and the set of the natural numbers is denoted
\symdef{Bnumbers.1}{naturals}{$\N$}{the set of the natural numbers (\ie,
$\{1,2,3,\dots\}$)} and defined to be a subset of $\W$, namely 
%
\begin{align*}
        \N \triangleq \W \setdiff \{0\} = \{1,2,3,\dots\}
\end{align*}
%
where each whole number is defined by
%
\begin{align*}
        0 &= \{\}\\
        1 &= 0 \cup \{0\} = \{0\}\\
        2 &= 1 \cup \{1\} = \{0,1\}\\
        3 &= 2 \cup \{2\} = \{0,1,2\}\\
        4 &= 3 \cup \{3\} = \{0,1,2,3\}\\
        &\mathrel{\vdots}
\end{align*}
%
It is important to note that whole numbers are simple sets that carry
with them the standard set relations $=$, $\subseteq$, $\supseteq$,
$\subset$, and $\supset$. That is, for two whole numbers $x,y \in \W$,
it is only the case that $x = y$ if $x \subseteq y$ and $y \subseteq x$.
In other words, the \emph{equivalence relation} $=$ on $\W$ is the same
equivalence relation defined for sets. Clearly,
%
\begin{equation}
        0 \subseteq 
        1 \subseteq 2 \subseteq 3 \subseteq 4 \subseteq \cdots
        \label{eq:whole_number_subseteq_order}
\end{equation}
%
In fact,
%
\begin{equation}
        0 \subset 1 \subset 2 \subset 3 \subset 4 \subset \cdots
        \label{eq:whole_number_subset_order}
\end{equation}
%
In other words, for any $x,y \in \W$, $x \neq y$. Also, for any two
whole numbers $x,y \in \W$ such that $x \subset y$, it is also the case
that $x \in y$.

\paragraph{Successor Function:} Now define the \emph{successor function}
$S: \W \mapsto \N$ by
%
\begin{align*}
        S(x) \triangleq x \cup \{x\} 
\end{align*}
%
Thus, $S$ is a function that maps any whole number $x$ to its successor
$x \cup \{x\}$. For example, $S(0)=1$ and $S(3)=4$. Note that since
every whole number has a unique successor this function is injective.
Additionally, since every natural number is a successor of a whole
number, this function is bijective and therefore its inverse $S^{-1}$
exists. For example, $S^{-1}(1)=0$ and $S^{-1}(4)=3$. 

\paragraph{Addition:} Define the \emph{addition} operator $+$ so that
for any two whole numbers $x,y \in \W$,
%
\begin{align*}
        x + 0 \triangleq x 
        \quad \text{and} \quad 
        x + S(y) \triangleq S(x+y)
\end{align*}
%
where the result of an addition is called the \emph{sum}. For example,
take $5 + 1$. Since the successor function $S$ is bijective then the
right argument $7$ can be rewritten as $S(S^{-1}(1))$ as $S \comp
S^{-1}$ is the identity function on $\N$.  However, $S^{-1}(1)=0$, and
so $5 + 1$ can be rewritten as $5 + S(0)$.  By the definition of
addition, this is $S(5+0)$. However, also by the definition of addition
with $0$, $5+0=5$. Therefore, the result is $S(5)$ or $6$. This process
can be applied to any operation $x + y$ where $x,y \in \W$. Note that
this operator is \emph{commutative}. That is, for any two whole numbers
$x,y \in \W$, it is the case that $x + y = y + x$.  Additionally, this
operator is \emph{associative}. That is, for any three whole numbers
$x,y,z \in \W$, it is the case that $x+(y+z)=(x+y)+z$. Also, since $x +
0 = x$ for any whole number $x \in \W$, $0$ is known as the
\emph{additive identity} for the whole numbers.

\paragraph{Multiplication:} Similarly, define the \emph{multiplication}
operator $\times$ so that for any two whole numbers $x,y \in \W$,
%
\begin{align*}
        x \times 0 \triangleq 0 
        \quad \text{and} \quad 
        x \times S(y) \triangleq (x \times y) + x
\end{align*}
%
where the parentheses indicate that $(x \times y)$ should be viewed as
the left argument of the addition operator. Parentheses will often be
used to instruct that an arithmetic operation should occur first;
otherwise operations will occur from left to right. The result of a
multiplication is known as a \emph{product}. This definition of
multiplication can be applied in a similar fashion as the definition for
addition above. For example, for any whole number $x \in \W$
%
\begin{align*}
        x \times 1
        &= x \times S(S^{-1}(1))\\
        &= x \times S(0)\\
        &= (x \times 0) + x\\
        &= 0 + x\\
        &= x\\
\end{align*}
%
This is why $1$ is known as the \emph{multiplicative identity} for the
whole numbers. It can also be shown that the multiplication operator is
\emph{commutative}; that is, for any two whole numbers $x,y \in \W$, it
is the case that $x \times y = y \times x$. Additionally, this operator
is \emph{associative}. That is, for any three whole numbers $x,y,z \in
\W$, it is the case that $x \times ( y \times z ) = (x \times y) \times
z$. Finally, note that if there are two whole numbers $x,y \in \W$ such
that $x \times y = 0$, it must be that $x=0$, $y=0$, or both.

When an operation involves both multiplication and addition, the
multiplication operations should occur first unless grouping symbols
like parentheses indicate that certain operations should occur first.
Additionally, it can be shown that for any three whole numbers $x,y,z
\in \W$,
%
\begin{equation*}
        x \times (y + z) = x \times y + x \times z
\end{equation*}
%
This is the \emph{distributive} property of whole number multiplication.
Also note that for any two whole numbers $x,y \in \W$, the notation $xy$
or $x \cdot y$ is equivalent to $x \times y$.  Unfortunately, the use of
$\times$ for multiplication creates some ambiguity with the Cartesian
product. However, it is rare to take the Cartesian product of two whole
numbers. 

\paragraph{Exponentiation:} Now that multiplication has been defined for
the whole numbers, exponentiation can also be defined. For any three
whole numbers $x, a, b \in \W$, exponentiation of the whole numbers is
such that
%
\begin{align*}
        x^0 &\triangleq 1\\
        x^1 &\triangleq x\\
        x^{a+b} &\triangleq x^a \times x^b\\
        (x^a)^b &\triangleq x^{a \times b}\\
        (a \times b)^x &\triangleq a^x \times b^x
\end{align*}
%
For example, take a whole number $x \in \W$ and the exponentiation
$x^3$. The following represents the successive steps that can be used to
derive an equivalent expression for $x^3$ that does not involve
exponentiation.
%
\begin{align*}
        x^3 
        &= x^{2+1}\\
        &= x^2 x^1\\
        &= x^2 x\\
        &= x^{1+1} x\\
        &= x^1 x^1 x\\
        &= x^1 x x\\
        &= x x x
\end{align*}
%
In this case, the exponentiation $x^3$ is thus a shorthand for $x \times
x \times x$. Note that for any $x \in \W$, $x^2 \geq 0$.

\paragraph{Even and Odd Whole Numbers:} Take whole number $y \in \W$.
%
\begin{itemize}
        \item If it is the case that there exists another whole number
                $x \in \W$ such that $y = 2x$ then $y$ is called an
                \emph{even} whole number. 
        \item If it is the case that there exists another whole number
                $x \in \W$ such that $y = 2x+1$ then $y$ is called an
                \emph{odd} whole number. 
\end{itemize}
%
It can be shown that for every whole number $y \in \W$, $y$ is either an
even number or an odd number but not both. That is, the sets
%
\begin{equation*}
        \W_E \triangleq \{ w \in \W : w \text{ is even} \}
        \quad \text{ and } \quad
        \W_O \triangleq \{ w \in \W : w \text{ is odd} \}
\end{equation*}
%
are mutually exclusive and collectively exhaustive in $\W$ (\ie, $\W_E
\cap \W_O = \emptyset$ and $\W_E \cup \W_O = \W$). Therefore, $\{ \W_E,
\W_O \}$ is a partition of $\W$.

Assume that $x,y \in \W$. Also assume that $x$ is even. Thus, there
exists a $z \in \W$ such that $x = 2z$. Therefore,
%
\begin{equation*}
        x y = (2z) y = 2zy = 2 (zy)
\end{equation*}
%
That is, $xy$ must also be an even whole number. Now take $x,y \in \W$
as before, but assume that $x$ and $y$ are both odd. Then there must
exist $v,w \in \W$ such that $x = 2v+1$ and $y = 2w+1$. Then
%
\begin{align*}
        x y 
        &= (2v + 1)(2w + 1) 
        = 2v2w + 2v + 2w + 1 = 4vw + 2v + 2w + 1\\
        &= 2(vw + v + w) + 1
\end{align*}
%
Therefore $xy$ must also be an odd number. To summarize,
%
\begin{itemize}
        \item The product of two odd whole numbers is odd.
        \item The product of an even whole number with any other whole
                number is even.
\end{itemize}

\paragraph{Total Ordering:} Now that addition has been defined, a
\emph{total order} can be defined on the whole numbers. For any two
whole numbers $x,y \in \W$, it is said that $x$ is less than or equal to
$y$ (denoted $x \leq y$) if there exists another whole number $z \in \W$
such that $x + z = y$, and it is said that $x$ is strictly less than $y$
(denoted $x < y$) if there exists a natural number $z \in \N$ such that
$x + z = y$. Note that the phrases $x \leq y$ and $x < y$ can be written
$y \geq x$ and $y > x$ respectively. In this case, the symbol $>$
($\geq$) represents a greater than (or equal to) relationship. Note that
%
\begin{equation*}
        0 \leq 1 \leq 2 \leq 3 \leq 4 \leq \cdots
\end{equation*}
%
and, in fact,
%
\begin{equation*}
        0 < 1 < 2 < 3 < 4 < \cdots
\end{equation*}
%
Recall the subset relationships in
\longrefs{eq:whole_number_subseteq_order} and
\shortref{eq:whole_number_subset_order}. Clearly, the \emph{inequality
order relations} $\leq$ and $<$ for the whole numbers have been
constructed to match the relationships already in place by $\subseteq$
and $\subset$ respectively. In fact, it is the case that for any two
whole numbers $x,y \in \W$, $x \leq y$ if and only if $x \subseteq y$,
and similarly for any two whole numbers $x,y \in \W$, $x < y$ if and
only if $x \subset y$.

\paragraph{Lack of Dense Ordering:} Note that for $2$ and $3$, it is the
case that $2 < 3$; however, there is no whole number $z \in \W$ such
that $2 < z < 3$. This can be shown analytically by using the definition
of the successor function. Because of this, $\W$ cannot be densely
ordered. In fact, $\N$ is also not densely ordered for the same reason.

\paragraph{Gaplessness:} It is easy to show that both $(\W,{\leq})$ and
$(\N,{\leq})$ are \emph{gapless}. Of course, since $\W$ and $\N$ both
lack an upper bound then neither are complete.

\paragraph{Existence of Minima and Maxima:} Because both $(\W,{\leq})$
and $(\N,{\leq})$ are gapless and \emph{not} densely ordered then any
subset that is bounded from above has a maximum element and any subset
that is bounded from below has a minimum element. However, because $\W$
is bounded from below by $0$ and $\N \subseteq \W$ then $\N$ has a
minimum element, namely $1$. Therefore, all subsets of either $\N$ or
$\W$ must have minimum elements. This important fact will be used in
\longref{app:math_countability_and_order} to show that nontrivial
densely ordered sets that are gapless (\eg, the \emph{real numbers}
discussed in \longref{app:math_reals}) must also be uncountable.

\paragraph{Cardinal Arithmetic:} Now that arithmetic has been defined
for $\W$, we state without proof that for any two \emph{finite} sets
$\set{X}$ and $\set{Y}$,
%
\begin{align*}
        |\set{X} \times \set{Y}| &= |\set{X}||\set{Y}|\\
        |\set{X}^\set{Y}| &= |\set{X}|^{|\set{Y}|}
\end{align*}
%
For example, $\{(0,0),(0,1),(1,0),(1,1)\}$ can be represented as
$\{0,1\} \times \{0,1\}$ or as $\{0,1\}^{\{0,1\}}$ (\ie, $\{0,1\}^2$).
Clearly this set has cardinality $4$, and the cardinality of both of
these representations is also $4$ by the above two rules. 

Assume that there are two \emph{finite} sets $\set{X}$ and $\set{Y}$
such that $\set{X} \cap \set{Y} = \emptyset$ (\ie, they have no shared
elements) then
%
\begin{align*}
        |\set{X} \cup \set{Y}| &= |\set{X}|+|\set{Y}|
\end{align*}
%
If the intersection of finite sets $\set{X}$ and $\set{Y}$ is not empty
then this only provides a bound on the intersection's cardinality. That
is, in general for any two finite sets $\set{X}$ and $\set{Y}$, it is
always the case that
%
\begin{align*}
        |\set{X} \cup \set{Y}| &\leq |\set{X}|+|\set{Y}|
\end{align*}
%
For example, the union $\{0,1\} \cup \{2,3\}$ has cardinality $4$ while
union $\{0,1\} \cup \{0,1\}$ has cardinality $2$, which is less than
$4$. Cardinal arithmetic can be extended to infinite sets as well;
however, the cardinality of infinite sets is not critically important to
this work.

Recall that for any set $\set{X}$, the power set $\Pow(\set{X})$
is congruent to the set $2^\set{X}$; therefore, since $|2|=2$, the
cardinality $|\Pow(\set{X})|=2^{|\set{X}|}$. It can be shown that for
any whole number $x \in \W$, $2^x \geq x$. This means that for any set
$\set{X}$, the power set $2^\set{X}$ (\ie, $\Pow(\set{X})$) is has a
cardinality greater than or equal to the cardinality of set $\set{X}$.
That is, for any set $\set{X}$,
%
\begin{align*}
        |\Pow(\set{X})| \geq |\set{X}|
\end{align*}
%
In fact, for any non-empty set $\set{X}$, the cardinality of the power
set $\Pow(\set{X})$ is strictly greater than the cardinality of set
$\set{X}$. That is, for any non-empty set $\set{X}$, 
%
\begin{align*}
        |\Pow(\set{X})| > |\set{X}|
\end{align*}
%
In other words, in some sense all sets have a smaller or equal size as
their power sets, with equality only when the set is empty.

\paragraph{Algebraic Structure of the Whole Numbers:} Note that for
$(\W,{+},0)$, it is the case that
%
\begin{itemize}
        \item for all $x,y \in \W$, $x + y = y + x$
        \item for all $x,y,z \in \W$, $(x + y) + z = x + (y + z)$
        \item for all $x \in \W$, $0 + x = x + 0 = x$
\end{itemize}
%
and for $(\W,{\times},1)$, it is the case that
%
\begin{itemize}
        \item for all $x,y \in \W$, $x \times y = y \times x$
        \item for all $x,y,z \in \W$, $(x \times y) \times z = x \times
                (y \times z)$
        \item for all $x \in \W$, $1 \times x = x \times 1 = x$
\end{itemize}
%
And so for $(\W,{+},{\times},0,1)$,
%
\begin{itemize}
        \item $(\W,{+},0)$ is a \emph{commutative monoid}
        \item $(\W,{\times},1)$ is a \emph{commutative monoid}
        \item for each $x,y,z \in \W$, $x(y + z) = xy + yz$ and
                $(x + y)z = xz + yz$
\end{itemize}
%
Therefore, $(\W,{+},{\times},0,1)$ is a \emph{commutative semiring}.
Unless otherwise noted, whenever $\W$ is used, it is assumed that it is
equipped with operators $+$ and $\times$ and order relation $\leq$; in
other words, $\W$ is implicitly taken to be
$(\W,{+},{\times},0,1,{\leq})$.

\paragraph{Algebraic Structure of the Natural Numbers:} Note that for
the magma $(\N,{+})$, it is the case that
%
\begin{itemize}
        \item for all $x,y \in \N$, $x + y = y + x$
        \item for all $x,y,z \in \N$, $(x + y) + z = x + (y + z)$
\end{itemize}
%
and for $(\W,{\times},1)$, it is the case that
%
\begin{itemize}
        \item for all $x,y \in \N$, $x \times y = y \times x$
        \item for all $x,y,z \in \N$, $(x \times y) \times z = x \times
                (y \times z)$
        \item for all $x \in \N$, $1 \times x = x \times 1 = x$
\end{itemize}
%
Therefore, $(\N,{+})$ is a \emph{commutative semigroup} and
$(\N,{\times},1)$ is \emph{commutative monoid}. Since $(\N,{+})$ has no
identity element, there is no structure identified with
$(\N,{+},{\times})$. Unless otherwise noted, whenever $\N$ is used, it
is assumed that it is equipped with operators $+$ and $\times$ and order
relation $\leq$; in other words, $\N$ is implicitly taken to be
$(\N,{+},{\times},{\leq})$ (with multiplicative identity $1$).

\subsection{Integers}

Now that whole numbers have been defined so that items can be counted,
it is useful to define a number system that can compare two quantities.
That is, while the whole numbers are ordered, it is possible to further
quantity where in the ordering each whole number is with respect to some
other whole number. In other words, a common framework for describing
some sort of distance between whole numbers is needed. This common
framework comes in the form of the \emph{integers}.

\paragraph{Definition:} Just as the whole numbers are defined to be sets
of other whole numbers, the \emph{integers} are defined to be
equivalence classes of ordered pairs of two whole numbers. In
particular, for an element $(p,q) \in \W \times \W$, define the
equivalence relation $=$ on $\W \times \W$ so that for whole numbers
$p,q,r,s \in \W$, the elements $(p,q),(r,s) \in \W \times \W$ are
\emph{equal} if and only if
%
\begin{equation}
        p+s = q+r
        \label{eq:integer_equivalence_relation}
\end{equation}
%
Each integer is then defined as an equivalence class $[(p,q)]$ where
$(p,q) \in \W \times \W$. That is, the set of integers 
\symdef{Bnumbers.3}{integers}{$\Z$}{the set of the integers (\ie,
$\{\dots,-3,-2,-1,0,1,2,3,\dots\}$)} is defined to be the quotient set
%
\begin{equation*}
        \Z 
        \triangleq 
        (\W \times \W)/{=}
\end{equation*}
%
where the equivalence relation $=$ is given by
\longref{eq:integer_equivalence_relation}. 

\paragraph{Symbols:} For every natural number $p \in \N$, define two
symbols $p^*$ and $-p^*$. The symbol $p^*$ represents the integer that
includes $(p,0)$ in its equivalence class. Integers of this form are
called \emph{positive integers}. The symbol $-p^*$ represents the
integer that includes $(0,p)$ in its equivalence class. Integers of this
form are called \emph{negative integers}. For whole number $0$, define
symbol $0^*$ that includes $(0,0)$ in its equivalence class. In other
words, define the symbols 
%
\begin{align*}
        &\mathrel{\vdots}\\
        -q^*
        &\triangleq [(0,q)] 
        = \{ (p,p+q): \text{ for all $p \in \W$} \}
        \text{ for all $q \in \N$}\\
        &\mathrel{\vdots}\\
        -3^* 
        &\triangleq [(0,3)] 
        = \{ (p,p+3): \text{ for all $p \in \W$} \}\\
        -2^* 
        &\triangleq [(0,2)] = \{ (0,2), (1,3), (2,4), (3,5), \dots \}\\
        -1^* 
        &\triangleq [(0,1)] = \{ (0,1), (1,2), (2,3), (3,4), \dots \}\\
        0^* 
        &\triangleq [(0,0)] = \{ (0,0), (1,1), (2,2), (3,3), \dots \}\\
        1^* 
        &\triangleq [(1,0)] = \{ (1,0), (2,1), (3,2), (4,3), \dots \}\\
        2^* 
        &\triangleq [(2,0)] = \{ (2,0), (3,1), (4,2), (5,3), \dots \}\\
        3^* 
        &\triangleq [(3,0)] 
        = \{ (3+q,q): \text{ for all $q \in \W$} \}\\
        &\mathrel{\vdots}\\
        p^*
        &\triangleq [(p,0)] 
        = \{ (p+q,q): \text{ for all $q \in \W$} \}
        \text{ for all $p \in \N$}\\
        &\mathrel{\vdots}
\end{align*}
%
where the notation $[\cdot]$ indicates an equivalence class. As a review
of equivalence classes, note that since both of the equivalence
relations $[(1,3)]$ and $[(7,9)]$ include $(0,2)$ as an element, they
are both equal to each other and are also both equal to the equivalence
class $[(0,2)]$, which we have defined to be the symbol ${-2}^*$. Note
that we will justify removing the $*$ superscript later (\ie, replacing
$0^*$ by the symbol $0$) to make these symbols more familiar. Now that
we have defined these symbols, it is clear that the set of the integers
$\Z$ can also be expressed as
%
\begin{align}
        \Z 
        &=
        \{ p^* : \text{ for all } p \in \N \} 
        \cup 
        \{ -p^* : \text{ for all } p \in \N \} 
        \cup
        \{ p^* : p = 0 \}
        \label{eq:integers_with_stars}\\
        &=
        \{ \cdots, -4^*, -3^*, -2^*, -1^*, 
           0^*, 
           1^*, 2^*, 3^*, 4^*, \cdots \}
        \label{eq:integers_with_symbols}
\end{align}

\paragraph{Countability:} These integers in
\longref{eq:integers_with_symbols} are listed with no starting point.
That is, the pattern continues with no end from the left and to the
right. However, they can be rewritten in an order that starts at $0^*$.
In \longref{tab:integers_and_naturals}, the integers are listed
horizontally above a list of natural numbers.
%
\begin{table}[!ht]\centering
        \begin{tabular}{|l|cccccccccc|}
                \hline
                Integers: 
                & $0^*$ & ${-1}^*$ & $1^*$ & ${-2}^*$ 
                & $2^*$ & ${-3}^*$ & $3^*$ & ${-4}^*$
                & $4^*$ & $\cdots$
                \\
                Natural Numbers: 
                & $1$ &  $2$   & $3$ &  $4$   
                & $5$ &  $6$   & $7$ &  $8$ 
                & $9$ & $\cdots$
                \\
                \hline
        \end{tabular}
        \caption{Integers listed alongside natural numbers.}
        \label{tab:integers_and_naturals}
\end{table}
%
This simple pattern can be continued \adinfinitum{}, matching up exactly
one integer to exactly one natural number. Therefore, a bijection exists
between the integers and the natural numbers. That is, define a function
$f: \Z \mapsto \N$ with
%
\begin{equation*}
        f 
        \triangleq 
        \{(0^*,1),(-1^*,2),(1^*,3),(-2^*,4),(2^*,5),\cdots\}
\end{equation*}
%
and its corresponding inverse $f^{-1}: \N \mapsto \Z$ with
%
\begin{equation*}
        f^{-1} 
        \triangleq 
        \{(1,0^*),(2,-1^*),(3,1^*),(4,-2^*),(5,2^*),\cdots\}
\end{equation*}
%
Therefore, a bijection exists between $\Z$ and $\N$, and so those two
sets are congruent. Any set congruent to the natural numbers is
countably infinite. Therefore, the integers are countably infinite. It
is interesting that $\Z \cong \N$ because (due to our choice of familiar
symbols to represent each integer) it appears as if the integers are
somehow twice as large as the integers; however, this is not the case.

\paragraph{Total Ordering:} Take four whole numbers $p,q,r,s$ that are
the left and right projections of two integers $[(p,q)]$ and $[(r,s)]$.
The integer $[(p,q)]$ is said to be less than or equal to integer
$[(r,s)]$ (denoted $[(p,q)] \leq [(r,s)]$) if and only if 
%
\begin{equation*}
        p+s \leq q+r
\end{equation*}
%
Similarly, $[(p,q)]$ is strictly less than $[(r,s)]$ (denoted $[(p,q)] <
[(r,s)]$) if and only if
%
\begin{equation*}
        p+s < q+r
\end{equation*}
%
Just as with the related inequality relation on the whole numbers,
$[(p,q)] \leq [(r,s)]$ can also be denoted $[(r,s)] \geq [(p,q)]$, and
$[(p,q)] < [(r,s)]$ can also be denoted $[(r,s)] > [(p,q)]$. In these
cases, $>$ ($\geq$) represents that an integer is greater than (or equal
to) another integer. This ordering implies that
%
\begin{equation*}
        \cdots \leq -5^* \leq -4^* \leq -3^* \leq -2^* \leq -1^* \leq
        0^* \leq 1^* \leq 2^* \leq 3^* \leq 4^* \leq 5^* \leq \cdots
\end{equation*}
%
and, in fact,
%
\begin{equation*}
        \cdots < -5^* < -4^* < -3^* < -2^* < -1^* <
        0^* < 1^* < 2^* < 3^* < 4^* < 5^* \leq \cdots
\end{equation*}
%
We refer to any integer greater than $0^*$ as \emph{positive} and any
integer less than $0^*$ as \emph{negative}. The \emph{non-negative
integers} are the positive integers and $0^*$ (\ie, the complement of
the negative integers). The \emph{non-positive integers} are the
negative integers and $0^*$ (\ie, the complement of the positive
integers). The \emph{non-zero integers} are all of the integers except
for $0^*$ (\ie, $\Z \setdiff \{0^*\}$, the complement of $\{0^*\}$).

\paragraph{Lack of Dense Ordering:} Note that for $2^*$ and $3^*$, it is
the case that $2^* < 3^*$; however, there is no integer $z \in \Z$ such
that $2^* < z < 3^*$. This can be shown analytically by using the
definition of the integer and the lack of dense ordering of the whole
numbers. Because of this, $\Z$ cannot be densely ordered. 

\paragraph{Gaplessness:} It is easy to show that $(\Z,{\leq})$ and
is \emph{gapless}. Of course, since $\Z$ lacks both an upper and a lower
bound, it is not complete.

\paragraph{Existence of Minima and Maxima:} Because $(\Z,{\leq})$ is
gapless and \emph{not} densely ordered then any subset that is bounded
from above has a maximum element and any subset that is bounded from
below has a minimum element. 

\paragraph{Addition:} Again, take four whole numbers $p,q,r,s \in \W$
that make up two integers $[(p,q), [(r,s)] \in \Z$. The addition of
these two integers is defined as
%
\begin{equation*}
        [(p,q)] + [(r,s)] \triangleq [(p+r,q+s)]
\end{equation*}
%
where the result of an addition is called the \emph{sum}. For example,
%
\begin{align*}
        5^* + 6^* 
        &= [(5,0)]+[(8,2)]\\
        &= [(5+8,0+2)]\\
        &= [(13,2)]\\
        &= 11^*
\end{align*}
%
and, similarly,
%
\begin{align*}
        6^* + {-8}^*
        &= [(7,1)]+[(2,10)]\\
        &= [(7+2,1+10)]\\
        &= [(9,11)]\\
        &= {-2}^*
\end{align*}
%
where the last steps in both of these examples is justified by the
equivalence relation in \longref{eq:integer_equivalence_relation}.

Take whole numbers $p,q,r,s \in \W$ making up integers $[(p,q)],[(r,s)]
\in \Z$. Note that it is the case that
%
\begin{align*}
        [(r,s)] + [(p,q)]
        &= [(r+p,s+q)]\\
        &= [(p+r,q+s)]\\
        &= [(p,q)] + [(r,s)]
\end{align*}
%
In other words, integer addition is \emph{commutative}, so for any two
integers $x,y \in \Z$, $x+y=y+x$. 

Additionally, take whole numbers $p,q,r,s,t,u \in \W$ making up integers
$[(p,q)]$, $[(r,s)]$, and $[(t,u)]$. Note that grouping symbols like
parentheses also indicate that the operator surrounded by them should be
calculated first rather than following the normal left-to-right
operation. Thus,
%
\begin{align*}
        [(p,q)] + ( [(r,s)]+[(t,u)] )
        &= [(p,q)] + [(r+t,s+u)]\\
        &= [(p+(r+t),q+(s+u))]\\
        &= [((p+r)+t,(q+s)+u)]\\
        &= [(p+r,q+s)]+[(t,u)]\\
        &= ( [(p,q)]+[(r,s)] )+[(t,u)]
\end{align*}
%
In other words, integer addition is also \emph{associative}, so for any
three integers $x,y,z \in \Z$, $x+(y+z)=(x+y)+z$. 

Now take whole numbers $p,q,r \in \W$ that make integers $[(p,q)]$ and
$[(r,r)]$ (\ie, $0^*$). It is the case that
%
\begin{align*}
        [(p,q)]+[(r,r)]
        &= [(p+r,q+r)]\\
        &= [(p,q)]
\end{align*}
%
where the last step is justified by the definition of equality for
integers given in \longref{eq:integer_equivalence_relation}. That is,
because $p+r+q=q+r+p$ then $[(p+r,q+r)]=[(p,q)]$. Therefore, for any
integer $z \in \Z$, $z + 0^* = z$. Thus, $0^*$ is the \emph{additive
identity} for the integers. 

Also note that for any two whole numbers $p,q \in \W$, the integer
addition
%
\begin{align*}
        [(p,q)]+[(q,p)]
        &=[(p+q,q+p)]\\
        &=[(p+q,p+q)]\\
        &=0^*
\end{align*}
%
Therefore, the \emph{additive inverse} of $[(p,q)]$ is $[(q,p)]$. That
is,
%
\begin{align*}
        0^* + 0^*
        &= [(p,p)]+[(q,q)] = [(p+q,p+q)] = 0^*\\
        1^* + {-1}^* 
        &= [(p+1,p)]+[(q,q+1)]\\
        &= [(p+1+q,p+q+1)] = [(p+q+1,p+q+1)] = 0^*\\
        2^* + {-2}^* 
        &= [(p+2,p)]+[(q,q+2)]\\
        &= [(p+2+q,p+q+2)] = [(p+q+2,p+q+2)] = 0^*\\
        3^* + {-3}^* 
        &= [(p+3,p)]+[(q,q+3)]\\ 
        &= [(p+3+q,p+q+3)] = [(p+q+3,p+q+3)] = 0^*\\
        &\mathrel{\vdots}
\end{align*}
%
So $0^*$ is its own additive inverse, $-1^*$ is the additive inverse of
$1^*$, $-2^*$ is the additive inverse of $2^*$, and so on. In other
words, the familiar symbols chosen to represent the integers have been
named with focus on additive inverses.

\paragraph{Subtraction:} Motivated by the additive inverse of an
integer, the \emph{subtraction} operator $-$ is defined so that for any
four whole numbers $p,q,r,s \in \W$, the subtraction of two integers is
%
\begin{align*}
        [(p,q)] - [(r,s)]
        &\triangleq [(p,q)]+[(s,r)]\\
        &= [(p+s,q+r)]
\end{align*}
%
For example,
%
\begin{align*}
        2^* - 5^*
        &= [(2,0)] - [(9,4)]\\
        &= [(2,0)] + [(4,9)]\\
        &= [(2+4,0+9)]\\
        &= [(6,9)]\\
        &= -3^*
\end{align*}
%
Note that
%
\begin{align*}
        5^* - 2^* 
        &= [(6,1)] - [(3,1)]\\
        &= [(6,1)] + [(1,3)]\\
        &= [(6+1,1+3)]\\
        &= [(7,4)]\\
        &= 3^*
\end{align*}
%
Thus, $5^*-2^*$ is the additive inverse of $2^*-5^*$. In fact, for any
two integers $x,y \in \Z$, $x-y$ is the additive inverse of $y-x$. This
indicates that this operator is \emph{not} commutative. Additionally, we
state without proof that this operator is also \emph{not} associative.
Also note that for integers $x,y,y_i \in \Z$ where $y_i$ is the additive
inverse of $y$ then $x-y=x+y_i$. For example, $5^*-2^*=5^*+{-2}^*$.

\paragraph{Multiplication:} As before, take four whole numbers $p,q,r,s
\in \W$ that make up two integers $[(p,q)],[(r,s)] \in \Z$. The
\emph{multiplication} of these two integers is defined as
%
\begin{equation*}
        [(p,q)] \times [(r,s)]
        =
        [(p \times r + q \times s, p \times s + q \times r)]
\end{equation*}
%
and the result is called their \emph{product}. For example,
%
\begin{align*}
        -2^* \times 5^*
        &= [(3,5)] \times [(5,0)]\\
        &= [(3 \times 5 + 5 \times 0, 3 \times 0 + 5 \times 5)]\\
        &= [(15 + 0, 0 + 25)]\\
        &= [(15, 25)]\\
        &= -10^*
\end{align*}
%
Also note that for any two integers $x,y \in \Z$, the notations $x
\times y$, $x \cdot y$, and $xy$ are all equivalent. For those same
$(p,q)$ and $(r,s)$ from above, it is also the case that
%
\begin{align*}
        [(r,s)] \times [(p,q)] 
        &= [(r \times p + s \times q, r \times q + s \times p)]\\
        &= [(p \times r + q \times s, q \times r + p \times s)]\\
        &= [(p \times r + q \times s, p \times s + q \times r)]\\
        &= [(p,q)] \times [(r,s)]
\end{align*}
%
Therefore, integer multiplication is \emph{commutative}. In other words,
for any two integers $x,y \in \Z$, it is the case that $x \times y = y
\times x$. Similarly, for $p,q,r,s,t,u \in \W$ making up
$[(p,q)],[(r,s)],[(t,u)] \in \Z$,
%
\begin{align*}
        [(p,q)] \times ( [(r,s)] \times [(t,u)] )
        &= [(p,q)] \times [(rt+su,ru+st)]\\
        &= [(p(rt+su)+q(ru+st),p(ru+st)+q(rt+su))]\\
        &= [(prt+psu+qru+qst,pru+pst+qrt+qsu)]\\
        &= [(prt+qst+psu+qru,pst+qrt+qsu+pru)]\\
        &= [((pr+qs)t+(ps+qr)u,(ps+qr)t+(qs+pr)u)]\\
        &= [((pr+qs)t+(ps+qr)u,(ps+qr)t+(pr+qs)u)]\\
        &= [(pr+qs,ps+qr)] + [(t,u)]\\
        &= ( [(p,q)] + [(r,s)] ) + [(t,u)]
\end{align*}
%
Therefore, integer multiplication is \emph{associative}. In other words,
for any three integers $x,y,z \in \Z$, it is the case that $x \times ( y
\times z) = ( x \times y ) \times z$.

\paragraph{Multiplication Notables:} Take three whole numbers $p,q,r \in
\W$ that make up the integers $(p,q)$ and $(r,r)$ (\ie, the integer
$0^*$). It is the case that
%
\begin{align*}
        [(p,q)] \times [(r,r)]
        &= [(pr+qr,pr+qr)]\\
        &= 0^*
\end{align*}
%
where the last step is justified by
\longref{eq:integer_equivalence_relation}. In fact, for any two integers
$x,y \in \Z$, if $x \times y = 0^*$ then it must be that $x=0^*$,
$y=0^*$, or both. Now take the integers integers $(p,q)$ and $(r+1,r)$
(\ie, the integer $1^*$). Note that
%
\begin{align*}
        [(p,q)] \times [(r+1,r)]
        &= [(p(r+1)+qr,pr+q(r+1))]\\
        &= [(pr+p+qr,pr+qr+q)]\\
        &= [(pr+qr+p,pr+qr+q)]\\
        &= [(p,q)]
\end{align*}
%
where again the last step is justified by
\longref{eq:integer_equivalence_relation}. Therefore, the integer $1^*$
is the \emph{multiplicative identity}, and so for any integer $x \in
\Z$, it is the case that $x \times 1^* = x$. Now take the integers
$(p,q)$ and $(r,r+1)$ (\ie, the integer ${-1}^*$). It is such that 
%
\begin{align*}
        [(p,q)] \times [(r,r+1)]
        &= [(pr+q(r+1),p(r+1)+qr)]\\
        &= [(pr+qr+q,pr+p+qr)]\\
        &= [(pr+qr+q,pr+qr+p)]\\
        &= [(q,p)]
\end{align*}
%
Therefore, for any integer $x \in \Z$, the multiplication ${-1}^* \times
x$ is equivalent to the additive inverse of $x$. Because this operation
is very useful, it has the shorthand ${-x}$. This is also consistent
with the naming of the symbols that represent each integer (\eg, $3^*$
and its additive inverse ${-3}^*$). 

\paragraph{Subtraction:} For any two integers $x,y \in \Z$, the
\emph{subtraction} operator $-$ is defined so that
%
\begin{equation*}
        x - y \triangleq x + -y
\end{equation*}
%
where $-y$ is a shorthand for $-1^* \times y$, which results in the
additive inverse of $y$. And thus, for any integer $x \in \Z$, it is the
case that $x \times {-1}^* = -x$.

\paragraph{Absolute Value and Signum:} For any integer $x \in \Z$,
denote its \emph{absolute value} with the notation $|x|$ defined by
%
\begin{equation*}
        |x| 
        \triangleq 
        \begin{cases}
                x &\text{if } x \geq 0^*\\
                -x &\text{if } x < 0^*
        \end{cases}
\end{equation*}
%
and define the \emph{signum function} (also called the \emph{sign
function}, not to be confused with the \emph{sine function}) $\sgn: \Z
\mapsto \{-1^*,0^*,1^*\}$ with
%
\begin{equation*}
        \sgn(x)
        \triangleq
        \begin{cases}
                -1^* &\text{if } x < 0^*\\
                0^* &\text{if } x = 0^*\\
                1^* &\text{if } x > 0^*
        \end{cases}
\end{equation*}
%
Therefore, any integer $z \in \Z$ can be represented as a magnitude
(\ie, absolute value $|z|$) and a sign (\ie, $\sgn(z)$), as in
%
\begin{equation*}
        z = \sgn(z) \times |z|
\end{equation*}
%
Note that the absolute value has some special properties. In particular,
for any two integers $x,y \in \Z$,
%
\begin{itemize}
        \item $|x| \geq 0^*$
        \item $|x| = 0^*$ if and only if $x = 0^*$
        \item $|x \times y| = |x| \times |y|$
        \item $|x + y| \leq |x| + |y|$
        \item $|x - y| \geq |x| - |y|$
        \item $|{-x}| = |x|$
        \item $|x| \leq y$ if and only if $-y \leq x \leq y$
\end{itemize}
%
The last property will commonly be used to specify that an integer
should be within a certain range of other integers.

\paragraph{Exponentiation:} Now that multiplication has been defined for
the integers, exponentiation can also be defined. For any integers $x, y
\in \Z$ and whole numbers $a,b \in \W$, exponentiation of the integers
is such that
%
\begin{align*}
        x^0 &\triangleq 1^*\\
        x^1 &\triangleq x\\
        x^{a+b} &\triangleq x^a \times x^b\\
        (x^a)^b &\triangleq x^{a \times b}\\
        (x \times y)^a &\triangleq x^a \times y^a
\end{align*}
%
These are identical to the properties of whole number exponentiation. In
fact, $a$ and $b$ are not integers but are whole numbers. This is
because the multiplication operation for integers does not have a
multiplicative inverse. When exponentiation is defined for the rational
numbers, which does have a multiplicative inverse, there will be
additional properties. Note that for any $x \in \Z$, $x^2 \geq 0^*$.

\paragraph{Even and Odd Integers:} Take integer $y \in \Z$.
%
\begin{itemize}
        \item If it is the case that there exists another integer $x \in
                \Z$ such that $y = 2^* x$ then $y$ is called an
                \emph{even} integer. 
        \item If it is the case that there exists another integer $x \in
                \Z$ such that $y = 2^* x+1^*$ then $y$ is called an
                \emph{odd} integer. 
\end{itemize}
%
It can be shown that for every integer $y \in \Z$, $y$ is either an
even number or an odd number but not both. That is, the sets
%
\begin{equation*}
        \Z_E \triangleq \{ z \in \Z : z \text{ is even} \}
        \quad \text{ and } \quad
        \Z_O \triangleq \{ z \in \Z : z \text{ is odd} \}
\end{equation*}
%
are mutually exclusive and collectively exhaustive in $\Z$ (\ie, $\Z_E
\cap \Z_O = \emptyset$ and $\Z_E \cup \Z_O = \Z$). Therefore, $\{ \Z_E,
\Z_O \}$ is a partition of $\Z$.

Assume that $x,y \in \Z$. Also assume that $x$ is even. Thus, there
exists a $z \in \Z$ such that $x = 2z$. Therefore,
%
\begin{equation*}
        x y = (2^* z) y = 2^* zy = 2^* (zy)
\end{equation*}
%
That is, $xy$ must also be an even whole number. Now take $x,y \in \Z$
as before, but assume that $x$ and $y$ are both odd. Then there must
exist $v,w \in \Z$ such that $x = 2^* v+1$ and $y = 2^* w+1$. Then
%
\begin{equation*}
        x y 
        = (2^* v + 1)(2^* w + 1) 
        = 2^* v 2^* w + 2^* v + 2^* w + 1^* 
        = 4^* vw + 2^* v + 2^* w + 1^* 
        = 2^* (vw + v + w) + 1^*
\end{equation*}
%
Therefore $xy$ must also be an odd number. To summarize,
%
\begin{itemize}
        \item The product of two odd integers is odd.
        \item The product of an even integer with any other integer is
                even.
\end{itemize}

\paragraph{Algebraic Structure of the Integers:} Note that for
$(\Z,{+},0^*)$, it is the case that
%
\begin{itemize}
        \item for all $x,y \in \Z$, $x + y = y + x$
        \item for all $x,y,z \in \Z$, $(x + y) + z = x + (y + z)$
        \item for all $x \in \Z$, $0^* + x = x + 0^* = x$
        \item for all $x \in \Z$, $x + -x = -x + x = 0^*$
\end{itemize}
%
and for $(\Z,{\times},1^*)$, it is the case that
%
\begin{itemize}
        \item for all $x,y \in \Z$, $x \times y = y \times x$
        \item for all $x,y,z \in \Z$, $(x \times y) \times z = x \times
                (y \times z)$
        \item for all $x \in \Z$, $1^* \times x = x \times 1^* = x$
\end{itemize}
%
And so for $(\Z,{+},{\times},0^*,1^*)$,
%
\begin{itemize}
        \item $(\Z,{+},0^*)$ is a \emph{commutative group}
        \item $(\Z,{\times},1^*)$ is a \emph{commutative monoid}
        \item for each $x,y,z \in \Z$, $x(y + z) = xy + yz$ and
                $(x + y)z = xz + yz$
\end{itemize}
%
Therefore, $(\Z,{+},{\times},0^*,1^*)$ is a \emph{commutative ring}.
Thus, $(\Z,{+},{\times},0^*,1^*)$ is trivially an algebra over itself
(\ie, a $\Z$-algebra). However, also note that for any $x,y,z \in \Z$,
%
\begin{itemize}
        \item if $x \leq y$ then $z + x \leq z + y$
        \item if $0^* \leq x$ and $0^* \leq y$ then $0^* \leq xy$
\end{itemize}
%
and so $(\Z,{+},{\times},0^*,1^*,{\leq})$ is an \emph{ordered ring} and
aspects of familiar arithmetic that do not involve multiplicative
inverses apply to it. Unless otherwise noted, whenever $\Z$ is used, it
is assumed that it is equipped with operators $+$ and $\times$ and order
relation $\leq$; in other words, $\Z$ is implicitly taken to be the
ordered ring $(\Z,{+},{\times},0^*,1^*,{\leq})$.

\paragraph{Relationship to Whole Numbers:} Define the set $\W^*$ as the
set of non-negative integers. That is, define $\W^*$ by
%
\begin{equation*}
        \W^*
        \triangleq
        \{ z \in \Z : z \geq 0 \}
\end{equation*}
%
and so $\W^*$ is the set of the non-negative integers. It is easy to
show that if the image of $\W^* \times \W^*$ through either operator $+$
or $\times$ is $\W^*$. Additionally, it can be shown that
$(\W^*,{+}|_{\W^*},{\times}|_{\W^*})$ forms a commutative semiring, and
so $\W^*$ is a subsemiring of $\Z$.

Now take the function $f: \W^* \mapsto \W$ defined by
%
\begin{align*}
        f
        &\triangleq
        \{ (z^*, z): \text{ for all } z \in \W \}\\
        &=
        \{ (0^*, 0), (1^*, 1), (2^*, 2), (3^*, 3), \dots \}
\end{align*}
%
Clearly this is a bijection. That is, the inverse $f^{-1}: \W \mapsto
\W^*$ is defined by
%
\begin{align*}
        f^{-1}
        &\triangleq
        \{ (z, z^*): \text{ for all } z \in \W \}\\
        &=
        \{ (0, 0^*), (1, 1^*), (2, 2^*), (3, 3^*), \dots \}
\end{align*}
%
Therefore $\W \cong \W^*$. Also, note that for any integers $x,y \in
\W^*$, 
%
\begin{enumerate}[(i)]
        \item if $x \geq y$ then $f(x) \geq f(y)$
                \label{item:integer_whole_ordering}
        \item $f(x + y) = f(x) + f(y)$
                \label{item:integer_whole_ring_homomorphism_plus}
        \item $f(x \times y) = f(x) \times f(y)$
                \label{item:integer_whole_ring_homomorphism_times}
        \item $f(1^*)=1$
                \label{item:integer_whole_ring_homomorphism_m_identity}
\end{enumerate}
%
Property (\shortref{item:integer_whole_ordering}) shows that $f$ is a
monotone function, and properties
(\shortref{item:integer_whole_ring_homomorphism_plus})--%
(\shortref{item:integer_whole_ring_homomorphism_m_identity}) show that
$f$ is a semiring homomorphism. Since $f$ is also a bijection, it can be
said that $f$ is both an isomorphism in both the order sense and the
algebraic sense. In other words, $\W$ is isomorphic to $\W^*$ in both an
order sense and an algebraic sense. Therefore, not only is $\W \cong
\W^*$, but $\W^*$ is a valid \emph{representation} for $\W$, and it is
justifiable to say that $\W$ is a subsemiring of $\Z$.

For example, note that for any integers $x,y \in \W^*$ and whole number
$a \in \W$,
%
\begin{itemize}
        \item $x = y$ if and only if $f(x) = f(y)$
        \item $x \leq y$ if and only if $f(x) \leq f(y)$
        \item $f(0^*) = 0$
        \item $f(x + y) = f(x)+f(y)$
        \item $f(x - y) = f(x)-f(y)$
        \item $f(1^*) = 1$
        \item $f(x \times y) = f(x) \times f(y)$
        \item $f(x^a) = f(x)^a$
\end{itemize}
%
So arithmetic and order are both preserved by the bijection $f$. Thus,
while $\W$ is certainly not equal to $\W^*$, it is equal in all of the
important ways that matter to us, and so we can consider $\W \subset \Z$
with all of its standard ordering and operations. In other words, the
$*$ superscript can be dropped from all of the integer symbols above;
the non-negative integers (\ie, $\W^*$) are a valid representation of
the whole numbers (\ie, $\W$).

\subsection{Rational Numbers}

While the integers provide a way of analyzing the difference between
whole numbers (and, in fact, provide an equivalent representation of the
whole numbers as well), they do not answer questions about the relative
scale of whole numbers. That is, a difference between two whole numbers
may be significant in one case but insignificant in another. Thus, it
would be useful to have a framework to analyze differences with respect
to some common scale. This framework comes in the form of the
\emph{rational numbers}.

\paragraph{Definition:} Just as each integer is an equivalence class on
the set $\W \times \W$, each \emph{rational number} is an equivalence
class on the set $\Z \times (\Z \setdiff \{0\})$. That is, for an
element $(p,q) \in \Z \times (\Z \setdiff \{0\})$, define the
equivalence relation $=$ on $\Z \times (\Z \setdiff \{0\})$ so that for
integers $p,r \in \Z$ and $q,s \in \Z \setdiff \{0\}$, the elements
$(p,q),(r,s) \in \Z \times (\Z \setdiff \{0\})$ are \emph{equal} if and
only if
%
\begin{equation}
        ps = qr
        \label{eq:rational_equivalence_relation}
\end{equation}
%
Each rational number is then defined as an equivalence class $[(p,q)]$
where $(p,q) \in \Z \times (\Z \setdiff \{0\})$. That is, the set of
rational numbers \symdef{Bnumbers.4}{rationals}{$\Q$}{the set of the
rationals (\ie, ratios of integers)} is defined to be the quotient set
%
\begin{equation*}
        \Q
        \triangleq
        (\Z \times (\Z \setdiff \{0\}))/{=}
\end{equation*}
%
where the equivalence relation $=$ is given by
\longref{eq:rational_equivalence_relation}. To make things more
familiar, we introduce the notation
%
\begin{equation*}
        \frac{p}{q} \triangleq [(p,q)]
\end{equation*}
%
where $p \in \Z$ and $q \in \Z \setdiff \{0\}$. The notation $p/q$ is
equivalent. In both cases, the left projection $p$ is called the
\emph{numerator} and the right projection $q$ is called the
\emph{denominator} of the \emph{ratio} $\frac{p}{q}$. For example,
$\frac{1}{2} = [(1,2)]$. However, by
\longref{eq:rational_equivalence_relation}, $[(1,2)]=[(5,10)]$, which
could also be written $\frac{1}{2} = \frac{5}{10}$. Therefore, each line
of the following represents a single particular rational number
%
\begin{align*}
        \frac{1}{2} 
        = \frac{2}{4} = \frac{3}{6} = \cdots 
        &= \frac{n}{2n} \text{ for all $n \in \Z \setdiff \{0\}$}\\
        \frac{1}{3} 
        = \frac{2}{6} = \frac{3}{9} = \cdots 
        &= \frac{n}{3n} \text{ for all $n \in \Z \setdiff \{0\}$}\\
        \frac{-1}{5} 
        = \frac{-2}{10} = \frac{-3}{15} = \cdots 
        &= \frac{-n}{3n} \text{ for all $n \in \Z \setdiff \{0\}$}
\end{align*}
%
and thus the set of the rationals $\Q$ can be alternatively written as
%
\begin{equation*}
        \Q 
        \triangleq 
        \left\{
        \frac{p}{q} : p \in \Z, q \in \Z \setdiff \{0\}
        \right\}
\end{equation*}

\paragraph{Countability:} It may seem like the set $\Q$ is not
countable. That is, it may seem like the sets $\Q$ and $\N$ could not be
congruent. However, this would be a mistake. We will show that a
bijection exists between $\Q$ and $\N$. To motivate this, construct a
table of rational numbers with $\frac{1}{1}$ in its upper-left corner
that has increasing numerators down its rows and increasing denominators
across its columns, where increasing is in the integer sense. Now map
the upper-left corner of this table to natural number $1$ and then map
the cells of the nearest diagonal to $2$ and $3$. Continue in this
pattern of mapping integers by diagonal until the entire table is
filled. This mapping is shown in
\longref{tab:motiv_rationals_and_naturals}.
%
\begin{table}[!ht]\centering
        \begin{tabular}{|cccccc|}
                \hline[5pt]
                $\left( \frac{1}{1}, 1 \right)$ & 
                $\left( \frac{1}{2}, 3 \right)$ & 
                $\left( \frac{1}{3}, 6 \right)$ & 
                $\left( \frac{1}{4}, 10 \right)$ &
                $\left( \frac{1}{5}, 15 \right)$ &
                $\cdots$ \\[5pt]
                $\left( \frac{2}{1}, 2 \right)$ & 
                $\left( \frac{2}{2}, 5 \right)$ & 
                $\left( \frac{2}{3}, 9 \right)$ & 
                $\left( \frac{2}{4}, 14 \right)$ &
                $\left( \frac{2}{5}, 20 \right)$ &
                $\cdots$ \\[5pt]
                $\left( \frac{3}{1}, 4 \right)$ & 
                $\left( \frac{3}{2}, 8 \right)$ & 
                $\left( \frac{3}{3}, 13 \right)$ & 
                $\left( \frac{3}{4}, 19 \right)$ &
                $\left( \frac{3}{5}, 26 \right)$ &
                $\cdots$ \\[5pt]
                $\left( \frac{4}{1}, 7 \right)$ & 
                $\left( \frac{4}{2}, 12 \right)$ & 
                $\left( \frac{4}{3}, 18 \right)$ & 
                $\left( \frac{4}{4}, 25 \right)$ &
                $\left( \frac{4}{5}, 33 \right)$ &
                $\cdots$ \\[5pt]
                $\left( \frac{5}{1}, 11 \right)$ & 
                $\left( \frac{5}{2}, 17 \right)$ & 
                $\left( \frac{5}{3}, 24 \right)$ & 
                $\left( \frac{5}{4}, 32 \right)$ &
                $\left( \frac{5}{5}, 41 \right)$ &
                $\cdots$ \\[5pt]
                $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ &
                $\ddots$ \\[5pt]
                \hline
        \end{tabular}
        \caption{Motivation for rational number and natural number
        bijection}
        \label{tab:motiv_rationals_and_naturals}
\end{table}
%
Of course, this mapping is not a valid total function because elements
of the table do not represent distinct rationals. For example, creating
a mapping from $\frac{1}{1}$ that is different from the mapping from
$\frac{2}{2}$ is not valid since both ratios represent the same rational
number. Thus, construct a new table of rationals by traversing
\longref{tab:rationals_and_naturals} in the order of the natural numbers
in each mappings (\ie, traverse the diagonals starting in the upper-left
corner and move right) but skip the rationals that have already been
listed. That is, since $\frac{1}{1}$ is listed first, $\frac{2}{2}$ can
be skipped. Map the rationals that are not skipped to the natural
numbers, starting with $1$. The result is
\longref{tab:motiv_rationals_and_naturals_2}, where skipped ratios are
shown with the symbol $\cdot$.
%
\begin{table}[!ht]\centering
        \begin{tabular}{|cccccc|}
                \hline[5pt]
                $\left( \frac{1}{1}, 1 \right)$ & 
                $\left( \frac{1}{2}, 3 \right)$ & 
                $\left( \frac{1}{3}, 5 \right)$ & 
                $\left( \frac{1}{4}, 9 \right)$ &
                $\left( \frac{1}{5}, 11 \right)$ &
                $\cdots$ \\[5pt]
                $\left( \frac{2}{1}, 2 \right)$ & 
                $\cdot$ & 
                $\left( \frac{2}{3}, 8 \right)$ & 
                $\cdot$ &
                $\left( \frac{2}{5}, 16 \right)$ &
                $\cdots$ \\[5pt]
                $\left( \frac{3}{1}, 4 \right)$ & 
                $\left( \frac{3}{2}, 7 \right)$ & 
                $\cdot$ & 
                $\left( \frac{3}{4}, 15 \right)$ &
                $\left( \frac{3}{5}, 20 \right)$ &
                $\cdots$ \\[5pt]
                $\left( \frac{4}{1}, 6 \right)$ & 
                $\cdot$ & 
                $\left( \frac{4}{3}, 14 \right)$ & 
                $\cdot$ &
                $\left( \frac{4}{5}, 26 \right)$ &
                $\cdots$ \\[5pt]
                $\left( \frac{5}{1}, 10 \right)$ & 
                $\left( \frac{5}{2}, 13 \right)$ & 
                $\left( \frac{5}{3}, 19 \right)$ & 
                $\left( \frac{5}{4}, 25 \right)$ &
                $\cdot$ &
                $\cdots$ \\[5pt]
                $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ &
                $\ddots$ \\[5pt]
                \hline
        \end{tabular}
        \caption{More motivation for rational number and natural number
        bijection}
        \label{tab:motiv_rationals_and_naturals_2}
\end{table}
%
However, this provides no mapping for rational numbers representing with
ratios that include a single negative integer. It also does not provide
a mapping for rational number $\frac{0}{1}$. So, use the mapping
depicted in \longref{tab:rationals_and_naturals}.
%
\begin{table}[!ht]\centering
        \begin{tabular}{|c|}
                \hline[5pt]
                $\left( \frac{0}{1}, 1 \right)$ \\[5pt]
                \hline
        \end{tabular}\\
        \medskip
        \begin{tabular}{|cccc|}
                \hline[5pt]
                $\left( \frac{1}{1}, 2 \right)$ & 
                $\left( \frac{1}{2}, 6 \right)$ & 
                $\left( \frac{1}{3}, 10 \right)$ & 
                $\cdots$ \\[5pt]
                $\left( \frac{2}{1}, 4 \right)$ & 
                $\cdot$ & 
                $\left( \frac{2}{3}, 16 \right)$ & 
                $\cdots$ \\[5pt]
                $\left( \frac{3}{1}, 8 \right)$ & 
                $\left( \frac{3}{2}, 14 \right)$ & 
                $\cdot$ & 
                $\cdots$ \\[5pt]
                $\vdots$ & $\vdots$ & $\vdots$ &
                $\ddots$ \\[5pt]
                \hline
        \end{tabular}
        \quad
        \begin{tabular}{|cccc|}
                \hline[5pt]
                $\left( \frac{-1}{1}, 3 \right)$ & 
                $\left( \frac{-1}{2}, 7 \right)$ & 
                $\left( \frac{-1}{3}, 11 \right)$ & 
                $\cdots$ \\[5pt]
                $\left( \frac{-2}{1}, 5 \right)$ & 
                $\cdot$ & 
                $\left( \frac{-2}{3}, 17 \right)$ & 
                $\cdots$ \\[5pt]
                $\left( \frac{-3}{1}, 9 \right)$ & 
                $\left( \frac{-3}{2}, 15 \right)$ & 
                $\cdot$ & 
                $\cdots$ \\[5pt]
                $\vdots$ & $\vdots$ & $\vdots$ &
                $\ddots$ \\[5pt]
                \hline
        \end{tabular}
        \caption{The rational number to natural number bijection}
        \label{tab:rationals_and_naturals}
\end{table}
%
Clearly, this is mapping is a total function that is surjective (\ie, it
maps to every natural number) and is injective (each natural number only
receives one mapping), and thus it is a bijection. Therefore, $\Q \cong
\N$. In other words, $\Q$ is countably infinite; it is possible to count
each of the rationals. In fact, any set that can be listed in a table as
in \longref{tab:motiv_rationals_and_naturals} can be shown to be
countable. This includes the set $\N^2$ (\ie, $\N \times \N$) which can
easily be written in table form; therefore, $\N^2 \cong \N$. In fact, it
can be shown that for any $n \in \N - \{1\}$, it is the case that $\N^n
\cong \N$. Additionally, any set that is congruent to such a set is also
congruent to $\N$. However, the power set of any of these sets (\ie,
$\Pow(\N^2)$ which is congruent to and also denoted $2^{\N^2}$) is not
countable.

\paragraph{Symbols:} There are some symbols that are used to represent
some of the equivalence classes that are elements of set $\Q$. For every
integer $z \in \Z$, define the symbol $z^*$ as the rational $z/1$ (\ie,
the rational that includes $(z,1)$ in its equivalence class). That is,
define the familiar symbols 
%
\begin{align*}
        &\mathrel{\vdots}\\
        -2^* &\triangleq
        \frac{-2}{1} = \frac{-4}{2} = \frac{-6}{3} = \cdots\\
        -1^* &\triangleq
        \frac{-1}{1} = \frac{-2}{2} = \frac{-3}{3} = \cdots\\
        0^* &\triangleq 
        \frac{0}{1} = \frac{0}{2} = \frac{0}{3} = \cdots\\
        1^* &\triangleq 
        \frac{1}{1} = \frac{2}{2} = \frac{3}{3} = \cdots\\
        2^* &\triangleq 
        \frac{2}{1} = \frac{4}{2} = \frac{6}{3} = \cdots\\
        &\mathrel{\vdots}
\end{align*}
%
As we did with the symbols we used with $\Z$, later we will justify
dropping the $*$ superscripts on these symbols.

\paragraph{Total Ordering:} Take four whole numbers $p,q,r,s$ with $q
\neq 0$ and $s \neq 0$. The rational $\frac{p}{q}$ is said to be less
than or equal to $\frac{r}{s}$ (denoted $\frac{p}{q} \leq \frac{r}{s}$)
if and only if 
%
\begin{equation*}
        ( qs > 0 \text{ and } ps \leq qr ) 
        \text{ or }
        ( qs < 0 \text{ and } ps \geq qr ) 
\end{equation*}
%
Similarly, $\frac{p}{q}$ is strictly less than $\frac{r}{s}$ (denoted
$\frac{p}{q} < \frac{r}{s}$) if and only if
%
\begin{equation*}
        ( qs > 0 \text{ and } ps < qr ) 
        \text{ or }
        ( qs < 0 \text{ and } ps > qr ) 
\end{equation*}
%
Just as with the related inequality relation on the other numbers,
$\frac{p}{q} \leq \frac{r}{s}$ can also be denoted $\frac{r}{s} \geq
\frac{p}{q}$, and $\frac{p}{q} < \frac{r}{s}$ can also be denoted
$\frac{r}{s} > \frac{p}{q}$. In these cases, $>$ ($\geq$) represents
that a rational is greater than (or equal to) another rational. This
ordering implies that
%
\begin{equation*}
        \cdots 
        \leq -2^* \leq -1^*
        \leq \frac{-1}{2} \leq \frac{-1}{4} \leq \frac{-1}{8} \leq
        0^* 
        \leq \frac{1}{8} \leq \frac{1}{4} \leq \frac{1}{2} 
        \leq 1^* \leq 2^* \leq 
        \cdots
\end{equation*}
%
and, in fact,
%
\begin{equation*}
        \cdots 
        < -2^* < -1^* 
        < \frac{-1}{2} < \frac{-1}{4} < \frac{-1}{8} <
        0^* 
        < \frac{1}{8} < \frac{1}{4} < \frac{1}{2} 
        < 1^* < 2^* < \cdots
\end{equation*}
%
We refer to any rational greater than $0^*$ as \emph{positive} and any
rational less than $0^*$ as \emph{negative}. The \emph{non-negative
rationals} are the positive rationals and $0^*$ (\ie, the complement of
the negative rationals). The \emph{non-positive rationals} are the
negative rationals and $0^*$ (\ie, the complement of the positive
rationals). The \emph{non-zero rationals} are all of the rationals
except for $0^*$ (\ie, $\Q \setdiff \{0^*\}$, the complement of
$\{0^*\}$).

\paragraph{Dense Ordering:} Note that for any two \emph{distinct}
rational numbers $x,y \in \Q$ such that $x < y$, there is a third
rational number $z \in \Q$ such that $x < z < y$. As discussed, this is
not the case with the whole numbers nor the integers. This property
makes the set of rational numbers $\Q$ a \emph{densely ordered set}.
This is an important property of the rational numbers. 

\paragraph{Lack of Gaplessness:} In \longref{app:math_ordering_issues},
a subset of the rationals is presented that has no least upper bound.
Therefore, $\Q$ cannot be gapless. It is interesting that the rational
numbers are both a countable set and a densely ordered set. Being both
densely ordered and countable prevents the rational numbers from being
\emph{gapless}, as is shown in
\longref{app:math_countability_and_order}. This motivates the need for
the \emph{real numbers}, described in \longref{app:math_reals}, which
are gapless and have a dense ordering; however, this requires that the
set of the real numbers is uncountable.

\paragraph{Lack of Certain Existence of Minima and Maxima:} Since $\Q$
is not gapless, there are subsets of $\Q$ that do not have greatest
lower bounds or do not have least upper bounds; these such subsets also
cannot have minima nor maxima. An example of this is shown in
\longref{app:math_ordering_issues}.

\paragraph{Addition:} For integers $p,q,r,s \in \Z$ with $q \neq 0^*$
and $s \neq 0$, define the \emph{addition} operator $+$ such that
%
\begin{equation*}
        \frac{p}{q} + \frac{r}{s} \triangleq \frac{ps + qr}{qs}
\end{equation*}
%
where the result of the addition is called the \emph{sum}. Thus,
%
\begin{align*}
        \frac{r}{s} + \frac{p}{q} 
        = \frac{rq + sp}{sq}
        = \frac{qr + ps}{qs}
        = \frac{ps + qr}{qs}
        = \frac{p}{q} + \frac{r}{s}
\end{align*}
%
Therefore, rational number addition is \emph{commutative}. Thus, for any
two rationals $x,y \in \Q$, $x + y = y + x$. Additionally, for integers
$p,q,r,s,t,u \in \Z$ with $q \neq 0$, $s \neq 0$, and $u \neq 0$,
%
\begin{align*}
        \frac{p}{q} + \left(\frac{r}{s} + \frac{t}{u}\right)
        &= \frac{p}{q} + \frac{ru+st}{su}
        = \frac{psu+q(ru+st)}{qsu} = \frac{psu+qru+qst}{qsu}\\
        &= \frac{(ps+qr)u+qst}{qsu} = \frac{ps+qr}{qs} + \frac{t}{u}\\
        &= \left(\frac{p}{q}+\frac{r}{s}\right) + \frac{t}{u}
\end{align*}
%
Therefore, rational number addition is also \emph{associative}. Thus,
for any three rationals $x,y,z \in \Q$, $x + (y+z) = (x+y)+z$. Note that
for any three integers $p,q,r \in \Z$ with $q \neq 0$ and $r \neq 0$,
%
\begin{align*} 
        \frac{p}{q} + 0^*
        &= \frac{p}{q} + \frac{0}{r} = \frac{pr+q \times 0}{rq}
        = \frac{pr}{rq} = \frac{p}{q}
\end{align*}
%
where the second and last steps are justified by
\longref{eq:rational_equivalence_relation}. Thus, for any rational
number $x \in \Q$, $x + 0^* = x$, and so $0^*$ is known as the
\emph{additive identity} for rational numbers. 

\paragraph{Additive Inverses:} For integers $p,q \in \Z$ with $q
\neq 0^*$, note that
%
\begin{align*}
        \frac{p}{q} + {-p}{q}
        &= \frac{pq+q({-p})}{qq} = \frac{pq-qp}{qq} = \frac{pq-pq}{qq}\\
        &= \frac{(p-p)q}{qq} = \frac{p-p}{q} = \frac{0}{q} = 0^*
\end{align*}
%
Therefore, for any rational $\frac{p}{q}$, its \emph{additive inverse}
is the rational $\frac{-p}{q}$. For reasons to be explained, the
additive inverse of rational $x \in \Q$ will be denoted by $-x$. It can
be shown that for $x,y \in \Q$ with $x > 0^*$ and $y < 0^*$ then $-x <
0^*$ and $-y > 0^*$. It can also be shown that for $x \in \Q$,
$-(-x)=x$.

\paragraph{Multiplication:} For integers $p,q,r,s \in \Z$ with $q \neq
0$ and $s \neq 0$, define the \emph{multiplication} operator $\times$
such that
%
\begin{equation*}
        \frac{p}{q} \times \frac{r}{s} \triangleq \frac{pr}{qs}
\end{equation*}
%
where the result of a multiplication is called the \emph{product}. Thus
%
\begin{align*}
        \frac{r}{s} \times \frac{p}{q} 
        = \frac{rp}{sq}
        = \frac{pr}{qs}
        = \frac{p}{q} \times \frac{r}{s}
\end{align*}
%
Therefore, rational number multiplication is \emph{commutative}. That
is, for any two rationals $x,y \in \Q$, $x \times y = y \times x$. Now
take integers $p,q,r,s,t,u \in \Z$ with $q \neq 0$, $s \neq 0$, and $u
\neq 0$. Note that 
%
\begin{align*}
        \frac{p}{q} \times \left(\frac{r}{s} \times \frac{t}{u}\right)
        =
        \frac{p}{q} \times \frac{rt}{su} = \frac{prt}{qsu}
        =
        \frac{pr}{qs} \times \frac{t}{u}
        =
        \left(\frac{p}{q} \times \frac{r}{s}\right) \times \frac{t}{u}
\end{align*}
%
Therefore, rational number multiplication is also \emph{associative}.
That is, for any three rationals $x,y,z \in \Q$, $x\times(y\times
z)=(x\times y)\times z$. Also note that when multiplication is used with
addition, all multiplication operations should be completed first unless
grouping symbols like parentheses indicate that an addition should be
completed first. However, note that for any three rationals $x,y,z \in
\Q$,
%
\begin{equation*}
        p\times(q + r) = pq + pr
\end{equation*}
%
That is, rational number multiplication and addition have the
distributive property. Additionally, the notation $x \dot y$ or simply
$x y$ will often be used instead of $x \times y$. Note that for
any three integers $p,q,r \in \Z$ with $q \neq 0$ and $r \neq 0$,
%
\begin{align*} 
        \frac{p}{q} \times 0^*
        = \frac{p}{q} \times \frac{0}{r}
        = \frac{p \times 0}{qr}
        = \frac{0}{qr} = 0^*
\end{align*}
%
where the second and last steps are justified by
\longref{eq:rational_equivalence_relation}. Thus, for any rational
number $x \in \Q$, $x \times 0^* = 0^*$. In fact, for any two rational
numbers $x,y \in \Q$, if $x y = 0^*$ then it must be that $x=0^*$ or
$y=0^*$ or both. Additionally, it is the case that
%
\begin{align*}
        \frac{p}{q} \times 1^*
        = \frac{p}{q} \times \frac{r}{r}
        = \frac{pr}{qr}
        = \frac{p}{q}
\end{align*}
%
Therefore, for any rational number $x \in \Q$, it is the case that $x
\times 1^* = x$. Thus, $1^*$ is known as the \emph{multiplicative
identity} for the rational numbers. Additionally,
%
\begin{align*}
        \frac{p}{q} \times -1^*
        = \frac{p}{q} \times \frac{-r}{r}
        = \frac{p(-r)}{qr}
        = \frac{-pr}{qr}
        = \frac{-p}{q}
\end{align*}
%
Thus, multiplying any rational number $x \in \Q$ by the rational $-1^*$
produces the additive inverse of $x$. Therefore, a shorthand notation
for $-1^* \times x$ is simply $-x$. 

\paragraph{Multiplicative Inverses:} Take integers $p,q \in \Z$ with $p
\neq 0$ and $q \neq 0$. Note that
%
\begin{align*}
        \frac{p}{q} \times \frac{q}{p}
        &=
        \frac{pq}{qp} = \frac{pq}{pq} = \frac{1}{1} = 1^*
\end{align*}
%
In other words, $\frac{q}{p}$ is the \emph{multiplicative inverse} of
$\frac{p}{q}$. That is, the multiplicative inverse of a rational number
is generated by substituting the numerator and denominator of any ratio
that represents that rational number. For a rational number, its
multiplicative inverse is also called its \emph{reciprocal}. It should
be clear that every rational number $x \in \Q$ such that $x \neq 0^*$
has a multiplicative inverse, and therefore the multiplicative inverse
of $x$ is denoted $x^{-1}$.

\paragraph{Subtraction:} We can define the \emph{subtraction} operator
$-$ for rational numbers so that for any two rationals $x,y \in \Q$,
%
\begin{align*}
        x - y &\triangleq x + -y
\end{align*}
%
However, even though this is clearly a shorthand for addition, this
operation is not commutative nor associative. The result of a
subtraction is called a \emph{difference}.

\paragraph{Division:} For integers $p,q,r,s \in \Z$ with $q \neq 0$, $r
\neq 0$, $r \neq 0$, and $s \neq 0$, define the \emph{division} operator
$/$ such that
%
\begin{align*}
        \frac{p}{q} / \frac{r}{s} 
        &\triangleq \frac{ps}{qr}\\
        &= \frac{p}{q} \times \frac{s}{r}
        = \frac{p}{q} \times \frac{r}{s}^{-1}
\end{align*}
%
where the result of the division is known as a \emph{quotient}.
Sometimes the division operator $/$ will be represented as a ratio. That
is, for rationals $x,y \in Q$, $x/y$ will be written $\frac{x}{y}$.
Notice that division is simply multiplication by the multiplicative
inverse; that is, notice that
%
\begin{align*}
        \frac{\frac{p}{q}}{\frac{r}{s}}
        &=
        \frac{p}{q} \frac{s}{r}
\end{align*}
%
As described above, the ratio $\frac{s}{r}$ is known as the
\emph{reciprocal} of the ratio $\frac{r}{s}$, and so $\frac{s}{r}$ is
the \emph{multiplicative inverse} of $\frac{r}{s}$ (\ie,
$\left(\frac{r}{s}\right)^{-1}$). Therefore, division is identical to
multiplication with a reciprocal. However, it is \emph{not} the case
that division is commutative, associative, nor distributive. It is
simply a shorthand. Also note that
%
\begin{align*}
        1^* / \frac{r}{s} 
        = \frac{q}{q} / \frac{r}{s}
        = \frac{q}{q} \times \frac{s}{r}
        = \frac{qs}{qr}
        = \frac{s}{r}
\end{align*}
%
where the last step is justified by
\longref{eq:rational_equivalence_relation}. Therefore, for any non-zero
rational $x \in \Q$ (\ie, $x \neq 0^*$), the notation $1^*/x$ or
$\frac{1^*}{x}$ represents its reciprocal.

For integers $p,q \in \Z \setdiff \{0\}$, note that
%
\begin{align*}
        \frac{p}{q} \times \frac{1}{\frac{p}{q}}
        &=
        \frac{p}{q} \times \frac{q}{p}
        =
        \frac{pq}{qp}
        =
        \frac{pq}{pq}
        =
        \frac{1}{1}
        =
        1^*
\end{align*}
%
That is, the reciprocal of any non-zero rational number is its own
\emph{multiplicative inverse}. For any rational number $x \in \Q$, $x
\times ( 1^*/x ) = 1^*$. For example, $\frac{1}{2}$ is the
multiplicative inverse of $2^*$ since $2^* = \frac{2}{1}$. Note that for
the $p,q \in Z \setdiff \{0\}$,
%
\begin{align*}
        \frac{\frac{p}{q}}{\frac{p}{q}}
        &=
        \frac{p}{q} \times \frac{q}{p}
        =
        \frac{p}{q} \times \frac{1}{\frac{p}{q}}
        =
        \frac{p}{q} \times \left( \frac{p}{q} \right)^{-1}
        =
        1^*
\end{align*}
%
In other words, by the definition of the ratio of two rational numbers,
for any non-zero rational number $x \in \Q$, $x/x = 1^*$.

\paragraph{Exponentiation:} Now that multiplication and division have
been defined for the rationals, exponentiation can also be defined. For
any rationals $x, y, a, b \in \Q$, exponentiation of the rationals is
such that
%
\begin{align*}
        x^{0^*} &\triangleq 1^*\\
        x^{1^*} &\triangleq x\\
        x^{-1^*} &\triangleq \frac{1^*}{x}\\
        x^{a+b} &\triangleq x^a \times x^b\\
        x^{-b} &\triangleq \frac{1^*}{x^b}\\
        x^{a-b} &\triangleq \frac{x^a}{x^b}\\
        (x^a)^b &\triangleq x^{a \times b}\\
        (x \times y)^a &\triangleq x^a \times y^a\\
        \left(\frac{x}{y}\right)^a &\triangleq \frac{x^a}{y^a}
\end{align*}
%
Take rational $x \in \Q$ and integers $p,q \in \Z$ with $q \neq 0$ that
make up rational $\frac{p}{q} \in \Q$. By the laws above, the rational
%
\begin{equation*}
        x^\frac{p}{q} = ( x^\frac{1}{q} )^{p^*}
\end{equation*}
%
where $p^* = \frac{p}{1}$. Note that if $q < 0$ then $x^\frac{1}{q} = (
x^\frac{1}{|q|})^{-1^*}$, and so assume that $q > 0$. Thus, the
existence of $x^\frac{1}{q}$ where $q \in \Z$ with $q > 0$ is of
critical importance. The rational number $x^\frac{1}{s}$ should be such
that $( x^\frac{1}{q} )^{q^*} = x$. Note that $( -1^* )^\frac{1}{2}$
does not exist since there is no rational $x \in \Q$ such that $x \times
x = -1^*$. Similarly, there is no rational $x \in \Q$ such that $x
\times x = 2^*$, and so $(2^*)^\frac{1}{2}$ does not exist. However, $(
-8^* )^\frac{1}{3} = -2^*$ since $-2^* \times -2^* \times -2^* = -8^*$.
Also note that for any $x \in \Q$, $x^{2^*} \geq 0^*$. Additionally, by
this definition, ${0^*}^{0^*} = 1^*$. This definition also gives an
alternate notation for the multiplicative inverse. That is, for any $x
\in \Q \setdiff \{0^*\}$, its multiplicative inverse $1^*/x$ is also
denoted $x^{-1^*}$ and so $x \times x^{-1^*} = x^{-1^*} \times x = 1^*$.

\paragraph{Roots:} Take integer $q \in \Z$ with $q > 0$ and rational $x
\in \Q$. The rational number $x^\frac{1}{q}$ is called the \emph{$q\th$
root} of $x$ and is also denoted $\sqrt[q]{x}$. The special case of
$\sqrt[3]{x}$ is called the \emph{cube root} of $x$. The special case of
$\sqrt[2]{x}$ is called the \emph{square root} of $x$ and is often
written as $\sqrt{x}$.

\paragraph{Ratios of Even Integers:} Take integers $p,q \in \Z$ whose
ratio $p/q$ represents a particular rational number (\ie, $q \neq 0$).
Additionally, assume that $p$ and $q$ are both even integers. Thus, it
must be that there are integers $r,s$ such that $p = 2r$ and $q = 2s$.
Therefore,
%
\begin{align*}
        \frac{p}{q}
        &=
        \frac{2r}{2s}
        =
        \frac{r}{s}
\end{align*}
%
where the last step is justified by
\longref{eq:rational_equivalence_relation}. Thus, the rational number
represented by $p/q$ must also be represented by $r/s$. It can be shown
that applying this argument repeatedly leads to the conclusion that
every rational number can be represented by a ratio of two integers
where one integer is odd.

For example, assume that there exists a rational number $x \in \Q$ such
that $x^2 = 2^*$. Therefore it must be that there exists integers $p,q
\in \Z$ with $q \neq 0$ such that $p^2/q^2 = 2/1$. By
\longref{eq:rational_equivalence_relation}, $p^2 = 2 q^2$. Therefore
$p^2$ is even. However, as was shown above, this must mean that $p$ is
even as well. If $p$ is even then there exists $r \in \Q$ such that $p
= 2r$. Thus, $p^2 = 4 r^2 = 2 \times 2 r^2$. This implies that $q^2$
must also be even, and thus $q$ must be even. Therefore any ratio
representing rational number $x$ must be a ratio of two even integers.
However, it was shown that every rational number can be expressed as the
ratio of two integers, one of which is odd. So, this is a contradiction.
Therefore, it must be the case that there exists no $x \in \Q$ such that
$x^2 = 2^*$.

\paragraph{Base-10 (Decimal) Notation:} Now that we have defined the
rationals and have endowed them with addition, multiplication, and
exponentiation, it is possible to introduce familiar decimal notations
such as
%
\begin{align*}
        1.205
        \triangleq
        1^* \times {10^*}^{0} 
        + 2^* \times {10^*}^{-1} 
        + 0^* \times {10^*}^{-2} 
        + 5^* \times {10^*}^{-3}
\end{align*}
%
We trust that the reader is familiar with such notation. For brevity, we
will not explain it any further. A slightly more detailed discussion
will be given in \longref{app:math_reals}.

\paragraph{Absolute Value and Signum:} For any rational $x \in \Q$,
denote its \emph{absolute value} with the notation $|x|$ defined by
%
\begin{equation*}
        |x| 
        \triangleq 
        \begin{cases}
                x &\text{if } x \geq 0^*\\
                -x &\text{if } x < 0^*
        \end{cases}
\end{equation*}
%
and define the \emph{signum function} (also called the \emph{sign
function}, not to be confused with the \emph{sine function}) $\sgn: \Q
\mapsto \{-1^*,0^*,1^*\}$ with
%
\begin{equation*}
        \sgn(x)
        \triangleq
        \begin{cases}
                -1^* &\text{ if} x < 0^*\\
                0^* &\text{ if} x = 0^*\\
                1^* &\text{ if} x > 0^*
        \end{cases}
\end{equation*}
%
Therefore, any rational $z \in \Q$ can be represented as a magnitude
(\ie, absolute value $|z|$) and a sign (\ie, $\sgn(z)$), as in
%
\begin{equation*}
        z = \sgn(z) \times |z|
\end{equation*}
%
Note that the absolute value has some special properties. In particular,
for any two rationals $x,y \in \Q$,
%
\begin{itemize}
        \item $|x| \geq 0^*$
        \item $|x| = 0^*$ if and only if $x = 0^*$
        \item $|x \times y| = |x| \times |y|$
        \item $|x + y| \leq |x| + |y|$
        \item $|x - y| \geq |x| - |y|$
        \item $|{-x}| = |x|$
        \item $|x| \leq y$ if and only if $-y \leq x \leq y$
        \item $|x/y| = |x|/|y|$ if and only if $y \neq 0^*$
\end{itemize}
%
All of these properties are identical to the ones for integers, except
for the last property which has been added specifically for the
rationals.

\paragraph{Algebraic Structure of the Rationals:} Note that for 
$(\Q,{+},0^*)$, it is the case that
%
\begin{itemize}
        \item for all $x,y \in \Q$, $x + y = y + x$
        \item for all $x,y,z \in \Q$, $(x + y) + z = x + (y + z)$
        \item for all $x \in \Q$, $0^* + x = x + 0^* = x$
        \item for all $x \in \Q$, $x + -x = -x + x = 0^*$
\end{itemize}
%
and for $(\Q,{\times},1^*)$, it is the case that
%
\begin{itemize}
        \item for all $x,y \in \Q$, $x \times y = y \times x$
        \item for all $x,y,z \in \Q$, $(x \times y) \times z = x \times
                (y \times z)$
        \item for all $x \in \Q$, $1^* \times x = x \times 1^* = x$
\end{itemize}
%
And so for $(\Q,{+},{\times},0^*,1^*)$,
%
\begin{itemize}
        \item $(\Q,{+},0^*)$ is a \emph{commutative group} with
                additive inverse $-x$ for every $x \in \Q$
        \item $(\Q,{\times},1^*)$ is a \emph{commutative monoid} with
                multiplicative inverse $x^{-1}$ for every $x \in \Q
                \setdiff \{0^*\}$
        \item $0^* \neq 1^*$
        \item for each $x,y,z \in \Q$, $x(y + z) = xy + yz$ and
                $(x + y)z = xz + yz$
        \item for all $x \in \Q \setdiff \{0^*\}$, $x \times x^{-1} =
                x^{-1} \times x = 1$
\end{itemize}
%
Therefore, $(\Q,{+},{\times},0^*,1^*)$ is a \emph{field}. Thus,
$(\Q,{+},{\times},0^*,1^*)$ is trivially an algebra over itself (\ie, a
$\Q$-algebra). However, also note that for any $x,y,z \in \Q$,
%
\begin{itemize}
        \item if $x \leq y$ then $z + x \leq z + y$
        \item if $0^* \leq x$ and $0^* \leq y$ then $0^* \leq xy$
\end{itemize}
%
and so $(\Q,{+},{\times},0^*,1^*,{\leq})$ is an \emph{ordered field} and
all aspects of familiar arithmetic apply to it. Unless otherwise noted,
whenever $\Q$ is used, it is assumed that it is equipped with operators
$+$ and $\times$ and order relation $\leq$; in other words, $\Q$ is
implicitly taken to be the ordered field
$(\Q,{+},{\times},0^*,1^*,{\leq})$.

\paragraph{Relationship to Integers:} Define the set $\Z^*$ as the set
of rationals with a denominator of $1$. That is, define $\Z^*$ by
%
\begin{align*}
        \Z^*
        &\triangleq
        \left\{ \frac{p}{q} \in \Q : q = 1 \right\}
        =
        \{ [(p,q)] \in \Z \times \Z : p \in \Z, q = 1 \}\\
        &=
        \{ [(p,1)] : p \in \Z \}
        =
        \left\{ \frac{p}{1} : p \in \Z \right\}\\
        &=
        \left\{ p^* : p \in \Z \right\}
\end{align*}
%
and so $\Z^*$ is the set of rationals that have a element in their
equivalence class with denominator $1$. It is easy to show that if the
image of $\Z^* \times \Z^*$ through either operator $+$ or $\times$ is
$\Z^*$. Additionally, it can be shown that
$(\Z^*,{+}|_{\Z^*},{\times}|_{\Z^*})$ forms a commutative ring, and so
$\Z^*$ is a subring of $\Q$. Of course, since $\Q$ is an ordered field,
$\Q$ is also an ordered ring; since every subring of an ordered ring is
also an ordered ring then $(\Z^*,{+}|_{\Z^*},{\times}|_{\Z^*})$ is an
ordered ring.

Now, take the function $f: \Z^* \mapsto \Z$ defined by
%
\begin{align*}
        f
        &\triangleq 
        \left\{ 
        \left(\frac{p}{1}, p\right): \text{ for all } p \in \Z
        \right\}\\
        &=
        \left\{ \dots, 
           \left(\frac{-2}{1}, -2\right), 
           \left(\frac{-1}{1}, -1\right), 
           \left(\frac{0}{1}, 0\right), 
           \left(\frac{1}{1}, 1\right), 
           \left(\frac{2}{1}, 2\right), \dots \right\}\\
        &= \{ (p^*, p): \text{ for all } p \in \Z \}\\
        &=
        \{ \dots, (-2^*, -2), (-1^*, -1), 
                  (0^*, 0), (1^*, 1), (2^*, 2), \dots \}
\end{align*}
%
Clearly this is a bijection. That is, the inverse $f^{-1}: \Z \mapsto
\Z^*$ is defined by
%
\begin{align*}
        f^{-1}
        &\triangleq 
        \left\{ 
        \left(p,\frac{p}{1}\right): \text{ for all } p \in \Z
        \right\}\\
        &=
        \left\{ \dots, 
           \left(-2,\frac{-2}{1}\right), 
           \left(-1,\frac{-1}{1}\right), 
           \left(0,\frac{0}{1}\right), 
           \left(1,\frac{1}{1}\right), 
           \left(2,\frac{2}{1}\right), \dots \right\}\\
        &= \{ (p^*, p): \text{ for all } p \in \Z \}\\
        &=
        \{ \dots, (-2,-2^*), (-1,-1^*), 
                  (0,0^*), (1,1^*), (2,2^*), \dots \}
\end{align*}
%
Therefore $\Z \cong \Z^*$. Also, note that for any rationals $x,y \in
\Z^*$, 
%
\begin{enumerate}[(i)]
        \item if $x \geq y$ then $f(x) \geq f(y)$
                \label{item:rational_integer_ordering}
        \item $f(x + y) = f(x) + f(y)$
                \label{item:rational_integer_ring_homomorphism_plus}
        \item $f(x \times y) = f(x) \times f(y)$
                \label{item:rational_integer_ring_homomorphism_times}
        \item $f(1^*)=1$
              \label{item:rational_integer_ring_homomorphism_m_identity}
\end{enumerate}
%
Property (\shortref{item:rational_integer_ordering}) shows that $f$ is a
monotone function, and properties
(\shortref{item:rational_integer_ring_homomorphism_plus})--%
(\shortref{item:rational_integer_ring_homomorphism_m_identity}) show
that $f$ is a ring homomorphism. Since $f$ is also a bijection, it can
be said that $f$ is both an isomorphism in both the order sense and the
algebraic sense. In other words, $\Z$ is isomorphic to $\Z^*$ in both an
order sense and an algebraic sense. Therefore, not only is $\Z \cong
\Z^*$, but $\Z^*$ is a valid \emph{representation} for $\Z$, and it is
justifiable to say that $\Z$ is a subring of $\Q$.

For example, note that for any rationals $x,y \in \Z^*$ and whole number
$a \in \W$,
%
\begin{itemize}
        \item $x = y$ if and only if $f(x) = f(y)$
        \item $x \leq y$ if and only if $f(x) \leq f(y)$
        \item $f(0^*) = 0$
        \item $f(x + y) = f(x)+f(y)$
        \item $f(x - y) = f(x)-f(y)$
        \item $f(1^*) = 1$
        \item $f(x y) = f(x) f(y)$
        \item $f(x^{a^*}) = f(x)^a$
\end{itemize}
%
So arithmetic and order are both preserved by the bijection $f$. Thus,
while $\Z$ is certainly not equal to $\Z^*$, it is equal in all of the
important ways that matter to us, and so we can consider $\Z \subset \Q$
with all of its standard ordering and operations. In other words, the
$*$ superscript can be dropped from all of the rational symbols above;
the set $\Z^*$ is a valid representation of the set of the integers
$\Z$.

\subsection{Ordering Issues with the Countable Numbers}
\label{app:math_ordering_issues}

Up to this point, the numbers that have been defined have been very
intuitive. That is,
%
\begin{itemize}
        \item Whole numbers (and natural numbers) are an abstraction of
                standard counting. 
        \item Integers quantify the differences between whole numbers.
        \item Rationals provide a scale on which to order differences
                (\ie, integers) on equal footing.
\end{itemize}
%
Additionally, since there is a subset of the integers that is isomorphic
to a the whole numbers and a subset of the rationals that is isomorphic
to the integers, the rationals provide an interesting new perspective on
the other sets of numbers. That is, the rationals seem to fill gaps in
the other numbers--they provide the same extension to the whole numbers
that the integers do while also yielding an unbounded set of numbers
between each integer. However, like the whole numbers and integers, the
rationals are countable. As we will show, this will ultimately limit how
well the rationals can fill the gaps between the integers.

\paragraph{Example of Existence of Bounds:} As an exercise, take the
sets $\set{X},\set{Y} \in \Q$ defined as
%
\begin{equation*}
        \set{X}
        \triangleq
        \{ q \in \Q : -2 < q \text{ and } q < 2 \}
        \quad \text{ and } \quad
        \set{Y}
        \triangleq
        \{ q \in \Q : 1 \leq q \leq 5 \}
\end{equation*}
%
Of course, these sets could also be specified with $\{ q \in \Q : |q|
< 2 \}$ and $\{ q \in \Q : |q-3| \leq 2 \}$ respectively. Note that the
infimum and supremum of these two sets both exist. In particular,
%
\begin{equation*}
        \inf \set{X} = -2 
        \quad \text{ and } \quad
        \sup \set{X} = 2 
\end{equation*}
%
and
\begin{equation*}
        \inf \set{Y} = 1 
        \quad \text{ and } \quad
        \sup \set{Y} = 5 
\end{equation*}
%
While set $\set{Y}$ has a maximum (\ie, $\max \set{Y} = 5$) and a
minimum (\ie, $\min \set{Y} = 1$), set $\set{X}$ has neither a maximum
nor a minimum. Both sets contain a countably infinite number of
elements, and because of that they are both congruent to the set $\N$.
These are typical sets of rational numbers. Both are bounded, though one
includes its bounds and one does not. The bounds exist and are members
of $\Q$. 

\paragraph{Example of Nonexistence of Bounds:} We borrow this example
from \citet{Rudin76}. Consider the set $\set{Z} \subset \Q$ defined as
%
\begin{equation*}
        \set{Z}
        \triangleq
        \{ q \in \Q : q^2 \leq 2 \text{ and } q \geq 0 \}
\end{equation*}
%
Note that it has been shown that there is no rational $q \in \Q$ such
that $q^2 = 2$. Therefore, 
%
\begin{equation*}
        \set{Z}
        =
        \{ q \in \Q : q^2 < 2 \text{ and } q \geq 0 \}
\end{equation*}
%
Since $0^2 < 2$ and $0 \geq 0$, it is clear that $0 \in \set{Z}$. In
fact, $\min \set{Z} = \inf \set{Z} = 0$. That is, $0$ is the greatest
lower bound and, in fact, the minimum of $\set{Z}$. Also, it can be
shown that for all $z \in \set{Z}$, $z < 2$. Therefore, $2$ is an upper
bound for $\set{X}$. Thus, $\set{Z}$ is certainly bounded from above and
bounded from below (\ie, $\set{Z}$ is \emph{bounded}).  However, note
that $2^2 > 2$, and so while $2$ is an upper bound on $\set{Z}$, $2
\notin \set{Z}$. To find the \emph{least} upper bound of $\set{Z}$, take
$q,r \in \Q$ such that $q \geq 0$ and $r$ is such that
%
\begin{align}
        r 
        &= 
        q + \frac{2 - q^2}{q + 2}
        \label{eq:def_r_one}\\
        &= 
        \frac{q(q+2)}{q+2} + \frac{2 - q^2}{q + 2}
        = 
        \frac{q^2+2q+2-q^2}{q+2}
        = 
        \frac{2q+2}{q+2}
        \label{eq:def_r_two}\\
        &= 
        \frac{2(q+1)}{q+2}
        = 
        2 \frac{q+1}{q+2}
        \label{eq:def_r_three}
\end{align}
%
then, by \longref{eq:def_r_one},
%
\begin{align}
        r^2 
        &= 
        \frac{(2q+2)^2}{(q+2)^2}
        = 
        \frac{4 q^2 + 8q + 4}{(q+2)^2}
        = 
        \frac{2 q^2 + 8q + 8 + 2q^2 - 4}{(q+2)^2}
        \nonumber\\
        &= 
        \frac{2 (q^2 + 4q + 4) + 2(q^2 - 2)}{(q+2)^2}
        = 
        \frac{2 (q+2)^2 + 2(q^2 - 2)}{(q+2)^2}
        \nonumber\\
        &= 
        2 + \frac{2(q^2 - 2)}{(q+2)^2}
        = 
        2 - \frac{2(2 - q^2)}{(q+2)^2}
        \label{eq:def_r2}
\end{align}
%
Assume that $q \in \set{Z}$. Then $2 - q^2$ is positive and, by
\longref{eq:def_r_one}, $r > q$. However, \longref{eq:def_r2} shows that
$r^2 - 2$ is negative. That is, $r^2 < 2$. Therefore, $r \in \set{Z}$.
Thus, for every $q \in \set{Z}$, there exists an $r > q$ such that $r
\in \set{Z}$, and so there can be no upper bound for $\set{Z}$ contained
in $\set{Z}$. Thus, assume that $q \in \Q$ such that $q \notin \set{Z}$
and$q \geq 0$. Then $2 - q^2$ is negative and, by
\longref{eq:def_r_one}, $r < q$. However, \longref{eq:def_r2} shows that
$r^2 - 2$ is positive. That is, $r^2 > 2$. Therefore, $r \notin
\set{Z}$. Thus, for every $q \in \Q$ with $q \notin \set{Z}$ and $q \geq
0$, there exists an $r < q$ such that $r \notin \set{Z}$. Therefore,
$\set{Z} \subset \Q$ has no least upper bound in $\Q$. This means $\Q$
cannot be \emph{gapless}.

\paragraph{Gaps in Rational Numbers:} Thus, the prior example shows that
the rationals $\Q$ are somehow missing important numbers. That is,
despite that rationals are densely ordered (\ie, any gap between any two
rationals is filled by an unbounded number of rationals), there are
still some sort of gaps or holes between rational numbers. This is what
prevents the rationals from being \emph{gapless} and therefore also not
\emph{complete}. In fact, it can be shown that any partially ordered set
that is densely ordered cannot be \emph{gapless} if it is also
countable; the gaps in the rationals are a direct consequence of their
countability. Nontrivial dense sets that are gapless must also be
uncountable.

\subsection{Countability and Order: Gaplessness and Dense Ordering} 
\label{app:math_countability_and_order}

These well-known results are due to Cantor, who provided significant
contributions to the analysis of infinite sets.

\paragraph{Lemma:} Every subset of $(\N,{\leq})$ has a minimum element.
This was discussed in \longref{app:math_whole_numbers}. It depends upon
$\N$ being gapless, bounded from below, and not densely ordered.

\paragraph{Theorem:} Take partially ordered set $(\set{X},{\leq})$. If
it is the case that 
%
\begin{enumerate}[(i)]
        \item $\set{X}$ contains at least two elements
                \label{item:countability_two_elements}
        \item $\set{X}$ is densely ordered
                \label{item:countability_densely_ordered}
        \item $\set{X}$ is gapless
                \label{item:countability_gapless}
\end{enumerate}
%
then set $\set{X}$ must be uncountable. As a logical consequence of this
theorem, if $\set{X}$ is countable and has at lest two elements then it
must either not be gapless (\eg, $\Q$) or not be densely ordered (\eg,
$\W$, $\N$, and $\Z$). Of course, if a set is not gapless then it
cannot be complete.

\paragraph{Proof of Theorem:} To prove this theorem, take a partially
ordered set $(\set{X},{\leq})$ that meets properties
(\shortref{item:countability_two_elements}),
(\shortref{item:countability_densely_ordered}), and
(\shortref{item:countability_gapless}). However, assume that set
$\set{X}$ is countable. We will show that this leads to a logical
contradiction, and thus $\set{X}$ must be uncountable. This proof method
is a form of \emph{modus tollens}, or \emph{proof by contraposition},
which is described in \longref{eq:math_logic_application_proof}.

Since $\set{X}$ is countable, there exists a bijective function $f:
\N \mapsto \set{X}$. Take such a function $f$. Then, $f[\N]=\set{X}$ and
$f^{-1}[\set{X}]=\N$. In particular,
%
\begin{equation*}
        \{ f(n) : n \in \N \} 
        = \{ f(1), f(2), f(3), f(4), \dots \} 
        = \set{X}
\end{equation*}
%
In other words, each element of $\set{X}$ can be represented by a
\emph{unique} symbol of the form $f(n)$ with $n \in \N$. Clearly,
%
\begin{equation*}
        \{ f^{-1}(x) : x \in \set{X} \} 
        = \{f^{-1}(f(1)),f^{-1}(f(2)),f^{-1}(f(3)),f^{-1}(f(4)),\dots\}
        = \N
\end{equation*}
%
Also take $a_0,b_0 \in \set{X}$ such that $a_0 < b_0$. This is possible
by property (\shortref{item:countability_two_elements}). Define the set
$\set{A}_0 \subseteq \N$ by
%
\begin{equation*}
        \set{A}_0 \triangleq \{ n \in \N : a_0 \leq f(n) \leq b_0 \}
\end{equation*}
%
By property (\shortref{item:countability_densely_ordered}), $\set{A}_0
\neq \emptyset$. Additionally, since $\set{A}_0 \subseteq \N$, it has a
minimum element by the lemma stated above. Therefore, define $a_1
\triangleq f( \min \set{A}_0 )$. Similarly, define the set $\set{B}_0
\subseteq \N$ by
%
\begin{align*}
        \set{B}_0 
        &\triangleq \{ n \in \N : a_1 \leq f(n) \leq b_0 \}\\
        &= \set{A}_0 \setdiff \{\min \set{A}_0\}
\end{align*}
%
and define $b_1 \triangleq f( \min \set{B}_0 )$.  By property
(\shortref{item:countability_densely_ordered}), this process can
continue \adinfinitum{} with $a_i$ and $b_i$ defined for all $i \in \N$
as
%
\begin{equation*}
        a_i 
        \triangleq 
        f\left(\min\{n \in \N : a_{i-1} \leq f(n) \leq b_{i-1}\}\right)
\end{equation*}
%
and
%
\begin{equation*}
        b_i 
        \triangleq 
        f\left( \min\{n \in \N : a_i \leq f_n \leq b_{i-1}\} \right)
\end{equation*}
%
This process is shown graphically in \longref{fig:countable_intervals},
where the arrow points in the direction of increasing order; that is,
since $a_1$ is to the right of $a_0$ then $a_1 \geq a_0$. 
%
\begin{figure}[!ht]\centering
        \begin{picture}(300,20)(-150,-10) % x: -150 to 150
                \put(-140,6){\makebox(0,0)[b]{$a_0$}}
                \put(-140,-3){\line(0,1){6}}
                \put(-140,-6){\makebox(0,0)[t]{$f(1)$}}
                \put(-118,6){\makebox(0,0)[b]{$a_1$}}
                \put(-118,-3){\line(0,1){6}}
                \put(-118,-6){\makebox(0,0)[t]{$f(3)$}}
                \put(-91,6){\makebox(0,0)[b]{$a_2$}}
                \put(-91,-3){\line(0,1){6}}
                \put(-91,-6){\makebox(0,0)[t]{$f(5)$}}
                \put(-59,6){\makebox(0,0)[b]{$a_3$}}
                \put(-59,-3){\line(0,1){6}}
                \put(-59,-6){\makebox(0,0)[t]{$f(7)$}}
                %
                %\put(-47.5,0){\vector(-1,0){102.5}}
                \put(-47.5,0){\line(-1,0){102.5}}
                \put(-39.5,0){\makebox(0,0){$\cdots$}}
                \put(-31.5,0){\vector(1,0){197.5}}
                %
                \put(-20,6){\makebox(0,0)[b]{$b_3$}}
                \put(-20,-3){\line(0,1){6}}
                \put(-20,-6){\makebox(0,0)[t]{$f(8)$}}
                \put(40,6){\makebox(0,0)[b]{$b_2$}}
                \put(40,-3){\line(0,1){6}}
                \put(40,-6){\makebox(0,0)[t]{$f(6)$}}
                \put(100,6){\makebox(0,0)[b]{$b_1$}}
                \put(100,-3){\line(0,1){6}}
                \put(100,-6){\makebox(0,0)[t]{$f(4)$}}
                \put(140,6){\makebox(0,0)[b]{$b_0$}}
                \put(140,-3){\line(0,1){6}}
                \put(140,-6){\makebox(0,0)[t]{$f(2)$}}
        \end{picture}
        \caption{Nested Intervals of a Countable Densely Ordered Set}
        \label{fig:countable_intervals}
\end{figure}
%
Clearly, for any $i \in \N$, $a_i \geq a_{i-1}$ and $b_i \leq b_{i-1}$.
Additionally, for any $i \in \N$ and $j \in \N$, it is the case that
$a_i < b_j$.

Now take sets $\set{A} \subset \set{X}$ and $\set{B} \subset \set{X}$,
defined by
%
\begin{equation*}
        \set{A} \triangleq \{ a_i : i \in \N \}
\end{equation*}
%
and
%
\begin{equation*}
        \set{B} \triangleq \{ b_i : i \in \N \}
\end{equation*}
%
As shown above, any element of $\set{B}$ is an upper bound of set
$\set{A}$, and any element of set $\set{A}$ is a lower bound of set
$\set{B}$. Therefore, since $\set{X}$ is gapless, the least upper bound
of $\set{A}$ (\ie, $\sup \set{A}$) and the greatest lower bound of
$\set{B}$ (\ie, $\inf \set{A}$) exist. Since $\sup \set{A} \in \set{X}$
and $\set{X}$ is countable then there exists some $m \in \N$ such that
$f(m) = \sup \set{A}$. Take such an $m$. It is easy to show that $\sup
\set{A} \leq \inf \set{B}$, and so $f(m) \leq \inf \set{B}$. Therefore,
for all $a \in \set{A}$ and all $b \in \set{B}$, it is the case that
%
\begin{equation}
        a \leq f(m) \leq b
        \label{eq:uncountable_proof_contradiction}
\end{equation}
%
However, by the construction of elements $a_i$ and $b_i$, there exists
some $n$ such that $a_n = f(m)$ or $b_n = f(m)$. Take such an $n$. As
discussed, $a_{n+1} > a_n$ and $b_{n+1} < b_n$. Thus, it is either the
case that $a_{n+1} > f(m)$ or $b_{n+1} < f(m)$, which contradicts
\longref{eq:uncountable_proof_contradiction} since $a_{n+1} \in \set{A}$
and $b_{n+1} \in \set{B}$. Therefore, $\set{X}$ must be uncountable.

\subsection{The Real Numbers}
\label{app:math_reals}

As shown in \longref{app:math_ordering_issues}, the rational numbers are
somehow not complete; despite there being an unbounded number of
rationals between any two rationals, there are still numbers missing
from $\Q$. The real numbers have been constructed to fill these gaps.
However, this construction requires the reals to be uncountable. 

\paragraph{Definition:} Following the example of \citet{Rudin76}, we
construct the reals using \emph{Dedekind cuts} of the rational numbers,
which is a method attributable to Dedekind. The basic idea is to cut the
rational numbers $\Q$ into a partition of two sets where one set is
constructed to have no least upper bound; each real number can be
thought of as taking up the space in between the two sets of the
partition. That is, each real number cuts the rationals into two halves.
Alternatively, the real numbers can be defined as \emph{Cauchy
sequences} of rational numbers, as is discussed by \citet{Stoll79}; this
other construction is originally due to Cantor. These two constructions
are isomorphic to each other in both the order and algebraic senses and
thus form equivalent notions of the real numbers.

The following construction is a condensed form of the derivation given
by \citet{Rudin76}. We omit much of the proof for brevity. The real
numbers are the most abstract of the conventional number systems, and
thus their construction is considerably more complicated than the other
number systems.

Define a real number as a strict subset $\alpha \subset \Q$ where
%
\begin{enumerate}[(i)]
        \item $\alpha \neq \emptyset$ and $\alpha \neq \Q$.
                \label{item:real_proper}
        \item If $q \in \Q$ and $p \in \alpha$ such that $q < p$ then
                $q \in \alpha$. \label{item:real_member}
        \item If $p \in \alpha$ then $p < r$ for some $r \in \alpha$.
                \label{item:real_nonmember}
\end{enumerate}
%
which is called a \emph{Dedekind cut} of the rational numbers. In other
words, $\alpha \subset \Q$ is a strict non-empty subset of the rationals
that has no largest member. Additionally, any rational that is not a
member of $\alpha$ is greater than any member of $\alpha$.  Similarly,
if any rational is greater than a non-member of $\alpha$, that rational
itself cannot be a member of $\alpha$. Define the set of the real
numbers \symdef{Bnumbers.50}{reals}{$\R$}{the set of the real numbers},
also called the reals, as
%
\begin{equation*}
        \R 
        \triangleq 
        \{ \xi \subset \Q : \xi \text{ has properties
        (\shortref{item:real_proper}), 
        (\shortref{item:real_member}), and
        (\shortref{item:real_nonmember})} \}
\end{equation*}
%
If real numbers $\alpha, \beta \in \R$ are equal then $\alpha \subseteq
\beta$ and $\beta \subseteq \alpha$. In other words, real numbers
$\alpha$ and $\beta$ are equal if $\alpha = \beta$ (\ie, if the sets are
equal). Of course, this will also be denoted $\alpha = \beta$.

\paragraph{Symbols:} For every rational number $q \in \Q$, define the
set $q^*$ with
%
\begin{equation*}
        q^*
        \triangleq 
        \{ p \in \Q : p < q \}
\end{equation*}
%
Therefore,
%
\begin{align*}
        &\mathrel{\vdots}\\
        -2^* &\triangleq \{ p \in \Q : p < -2 \}\\
        -\frac{1}{2}^* 
        &\triangleq \left\{ p \in \Q : p < -\frac{1}{2} \right\}\\
        -1^* &\triangleq \{ p \in \Q : p < -1 \}\\
        0^* &\triangleq \{ p \in \Q : p < 0 \}\\
        1^* &\triangleq \{ p \in \Q : p < 1 \}\\
        \frac{1}{2}^* 
        &\triangleq \left\{ p \in \Q : p < \frac{1}{2} \right\}\\
        2^* &\triangleq \{ p \in \Q : p < 2 \}\\
        &\mathrel{\vdots}
\end{align*}
%
Make special note of $0^*$, $1^*$, and $-1^*$, which will all be used
explicitly below. Also note that $0^*$ is the set of all negative
rational numbers.

Clearly, for every $q \in \Q$, $q^* \in \R$. That is, define the set
$\Q^*$ as
%
\begin{align*}
        \Q^*
        &\triangleq
        \{ \{ p \in \Q : p < q \} : q \in \Q \}\\
        &=
        \{ q^* : q \in \Q \}
\end{align*}
%
It is clear that $\Q^* \subseteq \R$. Also note that by construction,
$\Q^* \cong \Q$. It is also clear that for any $r^* \in \Q^*$, the least
upper bound of $r^*$ exists and is $r \in \Q$ (\ie, for all $r \in \Q$,
$\sup r^* = r$). In fact, $\Q^*$ is the collection of all real numbers
that have a least upper bound in $\Q$. Later we will justify denoting
$r^*$ simply by $r$. We refrain from making this substitution early in
order to stress the difference between real numbers and rational
numbers.

Note that since the least upper bound of every element of $\Q^*$ exists,
it can be written that for every $r^* \in \Q^*$,
%
\begin{equation*}
        \sup r^* 
        = \inf ( \Q \setdiff r^* ) 
        = \min ( \Q \setdiff r^* ) 
        = r
\end{equation*}
%
where $r \in \Q$ such that $r^* = \{ p \in \Q : p < r \}$. Also note
that eventually we will show that $\Q^*$ is isomorphic in both order and
algebraic senses to $\Q$, and thus $\Q$ and $\Q^*$ can be considered
equivalent without any loss of generality (\ie, since $\Q^* \subseteq
\R$ then $\Q \subseteq \R$).

\paragraph{Total Ordering:} Take two real numbers $\alpha, \beta \in
\R$. It is the case that $\alpha$ is less than or equal to $\beta$
(denoted $\alpha \leq \beta$) if and only if $\alpha \subseteq \beta$.
Similarly, $\alpha$ is strictly less than $\beta$ (denoted $\alpha <
\beta$) if and only if $\alpha \subset \beta$. Just as with the related
inequality relation on the other numbers, $\alpha \leq \beta$ can also
be denoted $\beta \geq \alpha$, and $\alpha < \beta$ can also be denoted
$\beta > \alpha$. In these cases, $>$ ($\geq$) represents that a
real number is greater than (or equal to) another real.

This ordering implies that
%
\begin{equation*}
        \cdots 
        \leq -2^* \leq -1^*
        \leq \frac{-1}{2}^* \leq \frac{-1}{8}^* \leq
        0^* 
        \leq \frac{1}{8}^* \leq \frac{1}{2}^*
        \leq 1^* \leq 2^* \leq 
        \cdots
\end{equation*}
%
and, in fact,
%
\begin{equation*}
        \cdots 
        < -2^* < -1^* 
        < \frac{-1}{2}^* < \frac{-1}{8}^* <
        0^* 
        < \frac{1}{8}^* < \frac{1}{2}^*
        < 1^* < 2^* < \cdots
\end{equation*}

\paragraph{Special Subsets of the Reals:} We refer to any real number
greater than $0^*$ as \emph{positive} and define the set of
\emph{positive real numbers}
\symdef{Bnumbers.510}{realsg0}{$\R_{>0}$}{the set of the strictly
positive real numbers} as
%
\begin{equation*}
        \R_{>0} \triangleq \{ r \in \R : r > 0^* \}
\end{equation*}
%
The set of the \emph{non-negative real numbers}
\symdef{Bnumbers.511}{realsgeq0}{$\R_{\geq0}$}{the set of the
non-negative real numbers} is defined to be the union of the positive
real numbers with the singleton set $\{0^*\}$. That is, $\R_{\geq0}$ is
defined to be
%
\begin{align*}
        \R_{\geq0} 
        &\triangleq \R_{>0} \cup \{ 0^* \}\\
        &= \{ r \in \R : r \geq 0^* \}
\end{align*}
%
Similarly, we refer to any real less than $0^*$ as \emph{negative} and
define the set of \emph{negative real numbers}
\symdef{Bnumbers.520}{realsl0}{$\R_{<0}$}{the set of the strictly
negative real numbers} as
%
\begin{equation*}
        \R_{<0} \triangleq \{ r \in \R : r < 0^* \}
\end{equation*}
%
The set of the \emph{non-positive real numbers}
\symdef{Bnumbers.521}{realsleq0}{$\R_{\leq0}$}{the set of the
non-positive real numbers} is defined to be the union of the negative
real numbers with the singleton set $\{0^*\}$. That is, $\R_{\leq0}$ is
defined to be
%
\begin{align*}
        \R_{\leq0} 
        &\triangleq \R_{<0} \cup \{ 0^* \}\\
        &= \{ r \in \R : r \leq 0^* \}
\end{align*}
%
Note that $\R_{\geq0} = \R \setdiff \R_{<0} = \R_{<0}^c$ and $\R_{\leq0}
= \R \setdiff \R_{>0} = \R_{>0}^c$. That is, the complement of the
negative reals is the non-negative reals and the complement of the
positive reals is the non-positive reals. The \emph{non-zero reals}
\symdef{Bnumbers.53}{realsneq0}{$\R_{\neq0}$}{the set of the non-zero
real numbers} is defined to be the union of the positive reals and the
negative reals. That is, $\R_{\neq0}$ is defined to be
%
\begin{align*}
        \R_{\neq0} 
        &\triangleq \R_{>0} \cup \R_{<0}\\
        &= \R \setdiff \{0^*\}\\
        &= \{0^*\}^c\\
        &= \{ r \in \R : r \neq 0^* \}
\end{align*}
%
As shown, $\R_{\neq0}$ is the complement of the singleton set $\{0^*\}$.

\paragraph{Dense Ordering:} Note that for any two \emph{distinct} real
numbers $x,y \in \R$ such that $x < y$, there is a third real number $z
\in \R$ such that $x < z < y$. As discussed, this is not the case with
the whole numbers nor the integers. This property makes the set of real
numbers $\R$ a \emph{densely ordered set}. This is an important
property of the real numbers. The real numbers share this property with
the rational numbers.

\paragraph{Gaplessness:} Let $\set{A} \subset \R$ be nonempty and have
an \emph{upper bound} $\beta \in \R$. That is, for every $\alpha \in
\set{A}$, it is the case that $\alpha \leq \beta$. Now define $\gamma$
to be the union of all elements $\alpha \in \set{A}$; that is, define
$\gamma \triangleq \bigcup \{ \alpha: \alpha \in \set{A} \}$. It can be
shown that $\gamma \in \R$ and $\sup \set{A} = \gamma$. In other words,
$\sup \set{A} \in \R$. That is, for any nonempty strict subset of $\R$
that is bounded from above, the least upper bound of that set exists and
is a member of $\R$.

Let $\set{A} \subset \R$ be nonempty and have a \emph{lower bound}
$\beta \in \R$. That is, for every $\alpha \in \set{A}$, it is the case
that $\beta \leq \alpha$. Now define $\gamma$ to be the intersection of
all elements $\alpha \in \set{A}$; that is, define $\gamma \triangleq
\bigcap \{ \alpha: \alpha \in \set{A} \}$. It can be shown that $\gamma
\in \R$ and $\inf \set{A} = \gamma$. In other words, $\inf \set{A} \in
\R$. That is, for any nonempty strict subset of $\R$ that is bounded
from below, the greatest lower bound of that set exists and is a member
of $\R$.

Therefore, $\R$ is \emph{gapless} (\ie, \emph{Dedekind complete}). Of
course, there are subsets of $\R$ that do not have this property.

\paragraph{Countability:} By the theorem in
\longref{app:math_countability_and_order}, since $\R$ is both densely
ordered and gapless, $\R$ is \emph{uncountable}. This will be important
in our discussion of the cardinality of $\R$ below.

\paragraph{Addition:} Take real numbers $\alpha \in \R$ and $\beta \in
\R$. Define the \emph{addition} operator $+$ such that
%
\begin{equation*}
        \alpha + \beta 
        \triangleq
        \{ r + s : r \in \alpha, s \in \beta \}
\end{equation*}
%
That is, $\alpha + \beta$ is the set of all rationals of the form $r +
s$ where $r$ is any element of $\alpha$ and $s$ is any element of
$\beta$. 

Take any three rationals $\alpha, \beta, \gamma \in \R$. The following
statements can be shown.
%
\begin{itemize}
        \item $\alpha + \beta$ meets the requirements for a real number;
                it is a Dedekind cut.
        \item Since rational addition is commutative then $\alpha +
                \beta = \beta + \alpha$, and so real addition is also
                commutative.
        \item Since rational addition is associative then $(\alpha +
                \beta) + \gamma = \alpha + (\beta + \gamma)$, where the
                grouping symbols have the standard meaning. In other
                words, real addition is associative.
        \item It is the case that $\alpha + 0^* = \alpha$, and so $0^*$
                is the \emph{additive identity} for real addition.
\end{itemize}
%
The result of an addition is called a \emph{sum}.

\paragraph{Additive Inverses:} Take real number $\alpha \in \R$. Define
the symbol $-\alpha$ as
%
\begin{align*}
        -\alpha
        &\triangleq
        \{ p \in \Q : \text{there exists } r \in \Q \text{ with } r>0
                      \text{ such that } -p - r \notin \alpha \}\\
        &=
        \Q \setdiff \{ p \in \Q : \text{ for all } r \in \Q 
                                  \text{ with } r>0
                                  \text{ such that } 
                                  -p - r \in \alpha \}
\end{align*}
%
It can be shown that $-\alpha \in \R$ (\ie, $-\alpha$ is a Dedekind cut,
and so it is a valid real number). It can also be shown that $\alpha +
-\alpha = 0^*$. Thus, $-\alpha$ is the \emph{additive inverse} for real
number $\alpha$. Of course, it can be shown that if and only if $\alpha$
is positive (\ie, $\alpha \in \R_{>0}$) then $-\alpha$ is negative (\ie,
$-\alpha \in \R_{<0}$). Similarly, if and only if $\alpha \in \R_{<0}$
then $-\alpha \in \R_{>0}$. Finally, $-(-\alpha)=\alpha$.

\paragraph{Subtraction:} We can define the \emph{subtraction} operator
$-$ for real numbers so that for any two reals $\alpha,\beta \in \R$,
%
\begin{align*}
        \alpha - \beta &\triangleq \alpha + -\beta
\end{align*}
%
where $-\beta$ is the additive inverse for $\beta$. However, even though
this is clearly a shorthand for addition, this operation is not
commutative nor associative. The result of a subtraction is called a
\emph{difference}.

\paragraph{Multiplication:} Take \emph{positive} real numbers $\alpha,
\beta \in \R_{>0}$. Define the \emph{multiplication} operator $\times$
(where juxtaposition implies this operator) such that
%
\begin{equation*}
        \alpha \beta 
        \triangleq
        \{ p \in \Q : r \in \alpha, s \in \beta, p \leq rs \}
\end{equation*}
%
Now take real numbers $\gamma,\delta \in \R$. Multiplication has not yet
been defined for negative real numbers; however, it is defined for
positive real numbers. Thus, negative multiplication will be defined in
terms of positive multiplication.  Multiplication by $0^*$ will be
defined explicitly as $0^*$. That is, 
%
\begin{equation*}
        \gamma \delta
        \triangleq
        \begin{cases}
               0^*
               &\text{if } \gamma = 0^* \text{ or } \delta = 0^*\\
               (-\gamma)(-\delta)
               &\text{if } \gamma < 0^* \text{ and } \delta < 0^*\\
               -((-\gamma)\delta)
               &\text{if } \gamma < 0^* \text{ and } \delta > 0^*\\
               -(\gamma(-\delta))
               &\text{if } \gamma > 0^* \text{ and } \delta < 0^*
        \end{cases}
\end{equation*}
%
It can be shown that $\gamma \times 1^* = 1^* \times \gamma = \gamma$.
Therefore, $1^*$ is the \emph{multiplicative identity} for real
multiplication. Additionally, $-1^* \times \gamma = -\gamma$. Also, the
distributive property holds; multiplication distributes over addition.
That is, for $\gamma,\delta,\varepsilon \in \R$,
%
\begin{equation*}
        \gamma ( \delta + \varepsilon )
        =
        \gamma \delta + \gamma \varepsilon
\end{equation*}
%
It is also easy to show that multiplication both \emph{commutative} and
\emph{associative}. That is, for $\alpha,\beta,\gamma \in \R$,
%
\begin{equation*}
        \alpha \beta = \beta \alpha
\end{equation*}
%
and
%
\begin{equation*}
        \alpha (\beta \gamma) = (\alpha \beta) \gamma
\end{equation*}
%
where the grouping symbols have the normal impact on the order of
operations. The result of a multiplication is called a \emph{product}.

\paragraph{Multiplicative Inverses:} Take \emph{non-zero} real number
$\alpha \in \R_{\neq0}$. There exists $\alpha^{-1} \in \R_{\neq0}$ such
that $\alpha \times \alpha^{-1} = 1^*$, where $\alpha^{-1}$ is called
the \emph{multiplicative inverse} of $\alpha$.

\paragraph{Division:} Take real numbers $\alpha,\beta \in \R$ with
$\beta \neq 0^*$. Define the \emph{division} operator $/$ such that
%
\begin{equation*}
        \alpha / \beta
        \triangleq
        \alpha \times \beta^{-1}
\end{equation*}
%
where $\beta^{-1}$ is the multiplicative inverse of $\beta$. Even though
this is clearly a shorthand for multiplication, this operation is not
commutative nor associative. The result of a division is called a
\emph{quotient}. Note that $1^* / \alpha = \alpha^{-1}$, and therefore
the multiplicative inverse of $\alpha$ will sometimes be denoted
$1^*/\alpha$. It is also common that $\alpha / \beta$ is denoted as the
\emph{ratio} $\frac{\alpha}{\beta}$.

\paragraph{Exponentiation:} Now that multiplication and division have
been defined for the reals, exponentiation can also be defined. For any
reals $x, y, a, b \in \R$, exponentiation of the reals is such that
%
\begin{align*}
        x^{0^*} &\triangleq 1^*\\
        x^{1^*} &\triangleq x\\
        x^{-1^*} &\triangleq \frac{1^*}{x}\\
        x^{a+b} &\triangleq x^a \times x^b\\
        x^{-b} &\triangleq \frac{1^*}{x^b}\\
        x^{a-b} &\triangleq \frac{x^a}{x^b}\\
        (x^a)^b &\triangleq x^{a \times b}\\
        (x \times y)^a &\triangleq x^a \times y^a\\
        \left(\frac{x}{y}\right)^a &\triangleq \frac{x^a}{y^a}
\end{align*}
%
Take real $x \in \R$ and integers $p,q \in \Z$ with $q \neq 0$ that make
up rational $\frac{p}{q} \in \Q$. By the laws above, the real
%
\begin{equation*}
        x^{\frac{p}{q}^*} = ( x^{\frac{1}{q}^*} )^{p^*}
\end{equation*}
%
where $p^* = \frac{p}{1}^*$. Note that if $q < 0$ then
$x^{\frac{1}{q}^*} = ( x^{\frac{1}{|q|}^*})^{-1^*}$, and so assume that
$q > 0$. Thus, the existence of $x^{\frac{1}{q}^*}$ where $q \in \Z$
with $q > 0$ is of critical importance. The real number
$x^{\frac{1}{s}^*}$ should be such that $( x^{\frac{1}{q}^*} )^{q^*} =
x$. Note that $(-1^*)^{\frac{1}{2}^*}$ does not exist since there is no
real $x \in \Q$ such that $x \times x = -1^*$; however,
$(-8^*)^{\frac{1}{3}^*} = -2^*$ since $-2^* \times -2^* \times -2^* =
-8^*$. Additionally, there \emph{is} a real number $x \in \R$ such that
$x^{2^*} = 2^*$; that is, $(2^*)^{\frac{1}{2}^*}$ exists. In particular,
%
\begin{equation*}
        (2^*)^{\frac{1}{2}^*}
        =
        \{ p \in \Q : p^2 < 2 \}
\end{equation*}
%
Also note that for any $x \in \Q$, $x^{2^*} \geq 0^*$. Additionally, by
this definition, ${0^*}^{0^*} = 1^*$. This definition also gives an
alternate notation for the multiplicative inverse. That is, for any $x
\in \R \setdiff \{0^*\}$, its multiplicative inverse $1^*/x$ is also
denoted $x^{-1^*}$ and so $x \times x^{-1^*} = x^{-1^*} \times x = 1^*$.
Our discussion of \emph{logarithms} in \longref{app:math_logarithms} is
intimately related with exponentiation of the real numbers.

\paragraph{Roots:} Take integer $q \in \Z$ with $q > 0$ and real $x \in
\R$. The real number $x^{\frac{1}{q}^*}$ is called the \emph{$q\th$
root} of $x$ and is also denoted $\sqrt[q]{x}$. The special case of
$\sqrt[3]{x}$ is called the \emph{cube root} of $x$. The special case of
$\sqrt[2]{x}$ is called the \emph{square root} of $x$ and is often
written as $\sqrt{x}$.

\paragraph{Absolute Value and Signum:} For any real $x \in \R$,
denote its \emph{absolute value} with the notation $|x|$ defined by
%
\begin{equation*}
        |x| 
        \triangleq 
        \begin{cases}
                x &\text{if } x \geq 0^*\\
                -x &\text{if } x < 0^*
        \end{cases}
\end{equation*}
%
and define the \emph{signum function} (also called the \emph{sign
function}, not to be confused with the \emph{sine function}) $\sgn: \R
\mapsto \{-1^*,0^*,1^*\}$ with
%
\begin{equation*}
        \sgn(x)
        \triangleq
        \begin{cases}
                -1^* &\text{ if} x < 0^*\\
                0^* &\text{ if} x = 0^*\\
                1^* &\text{ if} x > 0^*
        \end{cases}
\end{equation*}
%
Therefore, any real $z \in \R$ can be represented as a magnitude
(\ie, absolute value $|z|$) and a sign (\ie, $\sgn(z)$), as in
%
\begin{equation*}
        z = \sgn(z) \times |z|
\end{equation*}
%
Note that the absolute value has some special properties. In particular,
for any two reals $x,y \in \R$,
%
\begin{itemize}
        \item $|x| \geq 0^*$
        \item $|x| = 0^*$ if and only if $x = 0^*$
        \item $|x \times y| = |x| \times |y|$
        \item $|x + y| \leq |x| + |y|$
        \item $|x - y| \geq |x| - |y|$
        \item $|{-x}| = |x|$
        \item $|x| \leq y$ if and only if $-y \leq x \leq y$
        \item $|x/y| = |x|/|y|$ if and only if $y \neq 0^*$
\end{itemize}
%
All of these properties are identical to the ones for rationals.

\paragraph{Algebraic Structure of the Reals:} Note that for 
$(\R,{+},0^*)$, it is the case that
%
\begin{itemize}
        \item for all $x,y \in \R$, $x + y = y + x$
        \item for all $x,y,z \in \R$, $(x + y) + z = x + (y + z)$
        \item for all $x \in \R$, $0^* + x = x + 0^* = x$
        \item for all $x \in \R$, $x + -x = -x + x = 0^*$
\end{itemize}
%
and for $(\R,{\times},1^*)$, it is the case that
%
\begin{itemize}
        \item for all $x,y \in \R$, $x \times y = y \times x$
        \item for all $x,y,z \in \R$, $(x \times y) \times z = x \times
                (y \times z)$
        \item for all $x \in \R$, $1^* \times x = x \times 1^* = x$
\end{itemize}
%
And so for $(\R,{+},{\times},0^*,1^*)$,
%
\begin{itemize}
        \item $(\R,{+},0^*)$ is a \emph{commutative group} 
                with additive inverse $-x$ for every $x \in \R$
        \item $(\R,{\times},1^*)$ is a \emph{commutative monoid} with
                multiplicative inverse $x^{-1}$ for every $x \in \R
                \setdiff \{0^*\}$
        \item $0^* \neq 1^*$
        \item for each $x,y,z \in \R$, $x(y + z) = xy + yz$ and
                $(x + y)z = xz + yz$
        \item for all $x \in \R \setdiff \{0\}$, $x \times x^{-1} =
                x^{-1} \times x = 1^*$
\end{itemize}
%
Therefore, $(\R,{+},{\times},0^*,1^*)$ is a \emph{field}. Thus,
$(\R,{+},{\times},0^*,1^*)$ is trivially an algebra over itself (\ie, an
$\R$-algebra). However, also note that for any $x,y,z \in \R$,
%
\begin{itemize}
        \item if $x \leq y$ then $z + x \leq z + y$
        \item if $0^* \leq x$ and $0^* \leq y$ then $0^* \leq xy$
\end{itemize}
%
and so $(\R,{+},{\times},0^*,1^*,{\leq})$ is an \emph{ordered field} and
all aspects of familiar arithmetic apply to it. Unless otherwise noted,
whenever $\R$ is used, it is assumed that it is equipped with operators
$+$ and $\times$ and order relation $\leq$; in other words, $\R$ is
implicitly taken to be the ordered field
$(\R,{+},{\times},0^*,1^*,{\leq})$.

\paragraph{Relationship to Rational Numbers:} Recall that 
%
\begin{align*}
        \Q^*
        &\triangleq
        \{ \{ p \in \Q : p < q \} : q \in \Q \}\\
        &=
        \{ q^* : q \in \Q \}
\end{align*}
%
and so $\Q^*$ is the set of reals that have a least upper bound that is
a rational number. It is easy to show that if the image of $\Q^* \times
\Q^*$ through either operator $+$ or $\times$ is $\Q^*$. Additionally,
it can be shown that $(\Q^*,{+}|_{\Q^*},{\times}|_{\Q^*})$ forms a
field, and so $\Q^*$ is a subfield of $\R$. Since every subfield of an
ordered field is also an ordered field then
$(\Q^*,{+}|_{\Q^*},{\times}|_{\Q^*})$ is an ordered field.

Now, take the function $f: \Q^* \mapsto \Q$ defined by
%
\begin{align*}
        f
        &\triangleq 
        \left\{ 
        \left(\{q \in \Q : q < p\}, p\right): \text{ for all } p \in \Q
        \right\}\\
        &= \{ (p^*, p): \text{ for all } p \in \Q \}
\end{align*}
%
Clearly this is a bijection. That is, the inverse $f^{-1}: \Q \mapsto
\Q^*$ is defined by
%
\begin{align*}
        f^{-1}
        &\triangleq 
        \left\{ 
        \left(p,\{q \in \Q : q < p\}\right): \text{ for all } p \in \Q
        \right\}\\
        &= \{ (p^*, p): \text{ for all } p \in \Q \}
\end{align*}
%
Therefore $\Q \cong \Q^*$. Also, note that for any rationals $x,y \in
\Q^*$, 
%
\begin{enumerate}[(i)]
        \item if $x \geq y$ then $f(x) \geq f(y)$
                \label{item:real_rational_ordering}
        \item $f(x + y) = f(x) + f(y)$
                \label{item:real_rational_field_homomorphism_plus}
        \item $f(x \times y) = f(x) \times f(y)$
                \label{item:real_rational_field_homomorphism_times}
        \item $f(1^*)=1$
              \label{item:real_rational_field_homomorphism_m_identity}
\end{enumerate}
%
Property (\shortref{item:real_rational_ordering}) shows that $f$ is a
monotone function, and properties
(\shortref{item:real_rational_field_homomorphism_plus})--%
(\shortref{item:real_rational_field_homomorphism_m_identity}) show
that $f$ is a field homomorphism. Since $f$ is also a bijection, it can
be said that $f$ is both an isomorphism in both the order sense and the
algebraic sense. In other words, $\Q$ is isomorphic to $\Q^*$ in both an
order sense and an algebraic sense. Therefore, not only is $\Q \cong
\Q^*$, but $\Q^*$ is a valid \emph{representation} for $\Q$, and it is
justifiable to say that $\Q$ is a subfield of $\R$.

For example, note that for any rationals $x,y,a \in \Q^*$,
%
\begin{itemize}
        \item $x = y$ if and only if $f(x) = f(y)$
        \item $x \leq y$ if and only if $f(x) \leq f(y)$
        \item $f(0^*) = 0$
        \item $f(x + y) = f(x)+f(y)$
        \item $f(x - y) = f(x)-f(y)$
        \item $f(1^*) = 1$
        \item $f(x y) = f(x) f(y)$
        \item $f(x^{a^*}) = f(x)^a$
\end{itemize}
%
So arithmetic and order are both preserved by the bijection $f$. Thus,
while $\Q$ is certainly not equal to $\Q^*$, it is equal in all of the
important ways that matter to us, and so we can consider $\Q \subset \R$
with all of its standard ordering and operations. In other words, the
$*$ superscript can be dropped from all of the real symbols above; the
set $\Q^*$ is a valid representation of the set of the rationals $\Q$.
Note that the set $\R \setdiff \Q$ is known as the set of the
\emph{irrational numbers}. An important irrational number, Euler's
constant, is introduced in \longref{app:math_logarithms}.

\paragraph{Ceiling and Floor:} Take any real number $x \in \R$. The the
\emph{floor} of real number $x$ is denoted
\symdef{Bnumbers.61}{floor}{$\lfloor x \rfloor$}{the floor of real
number $x$ (\ie, the greatest integer not greater than $x$)} and defined
by
%
\begin{equation*}
        \lfloor x \rfloor
        \triangleq
        \sup\{ n \in \Z : n \leq x \}
\end{equation*}
%
This can be viewed as the greatest integer that is not greater than the
real number. For example,
%
\begin{itemize}
        \item $\lfloor 2.2 \rfloor = 2$
        \item $\lfloor 2 \rfloor = 2$
        \item $\lfloor -1.8 \rfloor = -2$
\end{itemize}
%
Clearly, $\lfloor x \rfloor \leq x \leq \lfloor x \rfloor + 1$.
Similarly, the \emph{ceiling} of real number $x$ is denoted
\symdef{Bnumbers.60}{ceiling}{$\lceil x \rceil$}{the ceiling of real
number $x$ (\ie, the least integer not less than $x$)} and defined by
%
\begin{equation*}
        \lceil x \rceil
        \triangleq
        \inf\{ n \in \Z : x \leq n \}
\end{equation*}
%
This can be viewed as the least integer that is not less than the real
number. For example,
%
\begin{itemize}
        \item $\lceil 2.2 \rceil = 3$
        \item $\lceil 2 \rceil = 2$
        \item $\lceil -1.8 \rceil = -1$
\end{itemize}
%
Clearly, $\lceil x \rceil - 1 \leq x \leq \lceil x \rceil$.

\paragraph{Base-10 (Decimal) Notation:} Now that we have defined the
reals and have endowed them with addition, multiplication, and
exponentiation, it is possible to introduce familiar decimal notations.
We also make use of the isomorphism of $\Q$ and $\Q^*$ (and the
isomorphisms between $\Z^*$ and $\Z$ and between $\W^*$ and $\W$) for
simplicity.

Define $n: \W \times \R_{>0} \mapsto \W$ so that $n(0,x) \triangleq
\max\{ w \in \W : w \leq x \}$ and $n(k,x)$ is defined by
%
\begin{equation*}
        n(k,x)
        \triangleq
        \max\{ w \in \W :
        n(0,x) 
        + n(1,x) \times 10^{-1} 
        + \dots 
        + w \times 10^{-k}
        \leq x \}
\end{equation*}
%
Note that $0 \leq n(k,x) < 10$ for all $x \in \R_{>0}$ and all $k \in
\W$. Now take $x \in \R_{>0}$. Define the set $\set{E}_x$ to be $\{
n(x,k) : k \in \W \}$. Then $x = \sup \set{E}$ and the decimal expansion
of $x$ is represented by
%
\begin{equation*}
        n(0,x).n(1,x) n(2,x) n(3,x) n(4,x) \cdots
\end{equation*}
%
where juxtaposition is simply notation and does not imply
multiplication. For $x = 0$, the decimal expansion of $x$ is simply $0$
or $0.0$ followed by any number of the symbol $0$. Finally, for $x < 0$,
the decimal expansion of $x$ is identical to the decimal expansion of
$|x|$ except that a $-$ is prepended to the front of the expansion.

\paragraph{Cardinality and a Continuum:} Note that
%
\begin{itemize}
        \item $\W$ is countable, gapless, and not densely ordered
        \item $\Z$ is countable, gapless, and not densely ordered
        \item $\Q$ is countable, not gapless, and densely ordered
        \item $\R$ is uncountable, gapless, and densely ordered
\end{itemize}
%
As a consequence of the theorem in
\longref{app:math_countability_and_order}, it is impossible to be both
gapless and densely ordered while also being countable. The rational
numbers are able to fill spaces between each of the numbers in the
integers by being densely ordered. However, since $\Z$ and $\Q$ are both
countable then a bijection exists between them and so the rationals can
be constructed by simply reordering the integers in a method similar to
the one shown in \longref{tab:rationals_and_naturals}. Unfortunately,
simply adding a dense ordering to a countable set destroys gaplessness.
If a set is both gapless and densely ordered, it must somehow have more
elements than a countably infinite set in order to fill the gaps
introduced by adding a dense ordering to a countable set. In other
words, a set that is both gapless and densely ordered must be
uncountable. In fact, it can be shown that a bijection exists between
the power set $\Pow(\set{Q})$ and the set of the real numbers $\R$.
Since a set always has a smaller cardinality than its power set then
the reals must somehow have more elements than the rationals. This is
the expected result; the difference in cardinality reflects the extra
elements of $\R$ that fill the gaps in $\Q$. Because the real numbers
lack any gaps, the real numbers are sometimes called a \emph{continuum}.

\paragraph{Bounded Intervals of Real Numbers and Compact Sets:} Take
$a,b \in \R$ with $a \leq b$. The interval $[a,b]$ where
%
\begin{equation*}
        [a,b] \triangleq \{x \in \R : a \leq x \leq b \}
\end{equation*}
%
is called a \emph{closed interval}. The interval $(a,b)$ where
%
\begin{equation*}
        (a,b) \triangleq \{x \in \R : a < x < b \}
\end{equation*}
%
is called an \emph{open interval} or a \emph{segment} \citep{Rudin76}.
The intervals
%
\begin{align*}
        [a,b) &\triangleq \{x \in \R : a \leq x < b \}\\
        (a,b] &\triangleq \{x \in \R : a < x \leq b \}
\end{align*}
%
are called \emph{half-open (or half-closed) intervals}. Note that these
four intervals are \emph{bounded}. In fact, because the interval $[a,b]$
is not only bounded but includes its bounds and is gapless, it is
\emph{complete}. In \longref{app:math_real_numbers_as_metric_spaces}, we
will discuss how closed intervals of the real numbers are called
\emph{compact sets} because they are closed and bounded; however, that
notion of boundedness is different than the boundedness from order
theory. However, in the case of the reals, the two notions are
equivalent. 

\paragraph{Unbounded Intervals of Real Numbers:} Take $a \in \R$. Define
the intervals $[a,\infty)$, $(a,\infty)$, $(-\infty,a]$, and
$(-\infty,a)$ by
%
\begin{align*}
        [a,\infty) &\triangleq \{x \in \R : x \geq a \}\\
        (a,\infty) &\triangleq \{x \in \R : x > a \}\\
        (-\infty,a] &\triangleq \{x \in \R : x \leq a \}\\
        (-\infty,a) &\triangleq \{x \in \R : x < a \}
\end{align*}
%
respectively. Similarly, $(-\infty,\infty) \triangleq \R$. Also note
that the symbol $+\infty$ will sometimes be used in place of $\infty$.

\paragraph{Real Functions:} Take any set $\set{X}$. The function $f:
\set{X} \mapsto \R$ is called a \emph{real function} or a \emph{real
functional} because the range of the function only takes values from
$\R$. That is, a real function provides a relationship between set
$\set{X}$ and the real number system $\R$.

\subsection{The Extended Real Numbers}
\label{app:math_ext_reals}

Call \symdef{Bnumbers.54}{extreals}{$\extR$}{the set of the extended
real numbers (\ie, $\R \cup \{-\infty,+\infty\}$)} the set of the
\emph{extended real numbers}, which are defined by
%
\begin{equation*}
        \extR
        \triangleq
        \{{-\infty},{+\infty}\} \cup \R
\end{equation*}
%
where $\infty$ is a shorthand notation for ${+\infty}$. Note that $\R
\subset \extR$.

\paragraph{Finite Numbers:} Take $x,y \in \extR \cap \R$. For $x$ and
$y$, define relation $\leq$ and operators $+$, $-$, $\times$, and $/$ in
the same was as in $\R$ and call $x$ and $y$ \emph{finite real numbers}.

\paragraph{Ordering:} Take $x \in \extR \cap \R$ (\ie, $x$ is
\emph{finite}). Define $\leq$ so that
%
\begin{equation*}
        {-\infty} < x < {+\infty}
\end{equation*}
%
This way ${-\infty}$ is a lower bound and ${+\infty}$ is an upper bound
for every subset of $\extR$. Refer to ${-\infty}$ and ${+\infty}$ as
being \emph{infinite}.

\paragraph{Upper and Lower Bounds:} By construction of $\extR$, any
subset $\set{X} \subseteq \extR$ will have both a least upper bound and
a greatest lower bound, which makes $\extR$ not only gapless but
complete. However, note that
%
\begin{equation*}
        \inf \emptyset = \infty
\end{equation*}
%
That is, the greatest lower bound of the empty set is the infinite upper
bound $\infty$. This is because every $x \in R$ is an lower bound for
$\infty$, and so the \emph{greatest} lower bound must be $\sup \extR$,
which is $\infty$. Similarly,
%
\begin{equation*}
        \sup \emptyset = -\infty
\end{equation*}
%
That is, the least upper bound of the empty set must be the infinite
lower bound $-\infty$ (\ie, $\inf \extR$).

\paragraph{Arithmetic:} Take $x \in \extR$. Define $+$, $-$, $\times$,
and $/$ (also represented as a ratio) such that
%
\begin{enumerate}[(i)]
        \item for finite $x$ (\ie, $x \in \R$),
                \begin{itemize}
                        \item $x + \infty = \infty$
                        \item $x + {-\infty} = {-\infty}$
                        \item $x - \infty = {-\infty}$
                        \item $\frac{x}{+\infty} = 0$
                        \item $\frac{x}{-\infty} = 0$
                \end{itemize}
        \item for $x > 0$, 
                $x \times {+\infty} = {+\infty}$ and
                $x \times {-\infty} = {-\infty}$ 
        \item for $x < 0$, 
                $x \times {+\infty} = {-\infty}$ and
                $x \times {-\infty} = {+\infty}$ 
\end{enumerate}
%
Notice that $\infty - \infty$, $\infty + {-\infty}$, $0 \times \infty$,
$0 \times {-\infty}$, $y/0$ for $y \in \extR$, and $\alpha/\beta$ for
$\alpha,\beta \in \{ {-\infty},{+\infty}\}$ are not defined. However, as
is done in \longref{app:math_lebesgue_integral}, it will sometimes be
convenient to define $\infty \times 0 = 0 \times \infty = 0$; this will
never be assumed unless otherwise noted.

\paragraph{Algebraic Structure of the Extended Reals:} Unlike $\R$, it
is not true that $\extR$ is a field. In fact, $\extR$ is also not a
ring. However, the arithmetic defined above is usually sufficient for
the situations in which is it is needed.

\paragraph{Completeness:} The extended real numbers are sometimes called
the \emph{completion} or the \emph{closure} or the
\emph{compactification} of the real numbers. That is, whereas the real
numbers are only gapless, the extended reals are not only gapless but
also \emph{complete}. That is, every (nonempty) subset of the extended
reals has both a least upper bound and a greatest lower bound.

\paragraph{Intervals of Extended Real Numbers and Compactness:}
Intervals of the extended real numbers are defined exactly the same as
the intervals for the real numbers. However, each unbounded real
interval can be considered a bounded extended real interval, and so even
these intervals can be called closed, half-open, or open. There are also
additional intervals that include $\infty$ and $-\infty$. Take $a \in
\extR \cap \R$ (\ie, finite $a$). Then define the intervals
%
\begin{align*}
        [a,\infty] 
        &\triangleq \{x \in \R : x \geq a \} \cup \{\infty\}\\
        (a,\infty]
        &\triangleq \{x \in \R : x > a \} \cup \{\infty\}\\
        [-\infty,a] 
        &\triangleq \{x \in \R : x \leq a \} \cup \{-\infty\}\\
        [-\infty,a) 
        &\triangleq \{x \in \R : x < a \} \cup \{-\infty\}\\
        (-\infty,\infty]
        &\triangleq \R \cup \{\infty\}\\
        [-\infty,\infty)
        &\triangleq \R \cup \{-\infty\}\\
        [-\infty,\infty]
        &\triangleq \extR
\end{align*}
%
As mentioned, $[-\infty,\infty]$ is a a \emph{closed interval} and
$[-\infty,\infty)$ is a \emph{half-open (or half-closed) interval}. This
is due to the completeness of the extended real numbers. Since every
interval is bounded, every closed interval is a \emph{compact set}.
Again, this quality of closed and bounded being equivalent to compact is
a special quality of the real numbers.

\paragraph{Real Functions as Extended Real Functions:} Take arbitrary
set $\set{X}$. Any function $f: \set{X} \mapsto \R$ is a real function,
as discussed; however, such a function is implicitly an extended real
function. That is, a function $f: \set{X} \mapsto \R$ can be said to be
a function $f: \set{X} \mapsto \extR$ with almost no loss of generality
since its range will still be a subset of $\R$.

\section{Basic Topology}
\label{app:math_topology}

One of the key reasons why $\R$ has so many real practical applications
is because it is uncountable. However, this presents many challenges for
analysis. That is, if points in a set cannot even be counted, it is
difficult to reason about them. Thus, the mathematical study of
\emph{topology} presents ways to put distance between points. In other
words, points can be viewed as existing in a certain \emph{place} with
respect to other points. By placing this sort of map over a set, a
topology adds \emph{shape} to the set. Therefore, topology is roughly
a study of the place or location of points in a set. The topology that
we discuss is often called \emph{point-set topology} for this reason.

\subsection{The Topological Space}
\label{app:math_topological_spaces}

Take a set $\set{X}$ and a set $\setset{T} \in \Pow(\set{X})$ (\ie,
$\setset{T}$ is a set of subsets of $\set{X}$) such that
%
\begin{enumerate}[(i)]
        \item $\emptyset \in \setset{T}$ and $\set{X} \in \setset{T}$
        \item for any set of sets $\setset{C} \subseteq \setset{T}$, the
                union $\bigcup \setset{C} \in \setset{T}$
        \item for any sets $\set{G} \in \setset{T}$ and $\set{H} \in
                \setset{T}$, the intersection $\set{G} \cap \set{H} \in
                \setset{T}$
\end{enumerate}
%
Then $(\set{X},\setset{T})$ is called a \emph{topological space} and
$\setset{T}$ is called a \emph{topology on $\set{X}$}. Elements of
$\set{X}$ will be called \emph{points}. The sets that are contained in
the topology $\setset{T}$ are called \emph{open sets} and the
complements of these sets are called \emph{closed sets}. 

\paragraph{Open Sets and Neighborhoods:} For the following definitions,
take the generic topological space $(\set{X},\setset{T})$. That is, take
a set $\set{X}$ with topology $\setset{T}$ with elements of $\set{X}$
called points. Also take subset $\set{E} \subseteq \set{X}$. Also recall
the definitions of \emph{filter on a set} and \emph{filter base on a
set} from \longref{app:math_filters_on_sets}.
%
\begin{description}
        \item\emph{Universal Set:} The \emph{universal set} for the
                topological space is $\set{X}$.
        \item\emph{Set Complement:} The \emph{complement} of $\set{E}$
                is denoted $\set{E}^c$ and defined by $\set{E}^c
                \triangleq \set{X} \setdiff \set{E}$. This is consistent
                with calling $\set{X}$ the \emph{universal set} for all
                sets contained in the topological space.
        \item\emph{Open Sets:} To say that $\set{E}$ is an \emph{open
                set} means that $\set{E} \in \setset{T}$. That is,
                $\setset{T}$ is a collection of all open sets in
                topological space $(\set{X},\setset{T})$. Sometimes it
                may be convenient to call this \emph{open in $\set{X}$}
                or \emph{open with respect to topology $\setset{T}$ on
                $\set{X}$}.
        \item\emph{Closed Sets:} To say that $\set{E}$ is a \emph{closed
                set} means that $\set{E}^c$ is an open set (\ie,
                $\set{E}^c \in \setset{T}$). That is, the complements of
                the sets in the topology $\setset{T}$ are the closed
                sets in topological space $(\set{X},\setset{T})$.
                Sometimes it may be convenient to call this \emph{closed
                in $\set{X}$} or \emph{closed with respect to topology
                $\setset{T}$ on $\set{X}$}.
        \item\emph{Clopen Sets:} To say that $\set{E}$ is a \emph{clopen
                set} or to call it \emph{clopen} means that $\set{E}$ is
                both an open set (\ie, $\set{E} \in \setset{T}$) and a
                closed set (\ie, $\set{E}^c \in \setset{T}$). Of course,
                the complement of any clopen set is also clopen. It can
                be shown that $\emptyset$ and $\set{X}$ are clopen.
                Sometimes it may be convenient to call this \emph{clopen
                in $\set{X}$} or \emph{clopen with respect to toplogy
                $\setset{T}$ on $\set{X}$}.
        \item\emph{Neighborhoods:} Take $x \in \set{X}$. A
                \emph{neighborhood} $\set{U}$ of $x$ (in $\set{X}$) is a
                set such that there exists an open set $\set{G} \in
                \setset{T}$ with $\set{G} \subseteq \set{U}$ where $x
                \in \set{G}$. A neighborhood of $x$ does not need to be
                an open set; however, it must contain an open set that
                includes $x$.
        \item\emph{Neighborhood Systems:} The notation
                \symdef{Ganalysis.0001}{nhd}{$\nhd_x$}{neighborhood
                system of $x$ (\ie, set of all topological neighborhoods
                of $x$)} is called the \emph{neighborhood system} at $x$
                (for $\set{X}$) or the \emph{neighborhood filter} at $x$
                (for $\set{X}$). This is the set of all neighborhoods of
                $x$. Therefore, to say that $\set{U}$ is a neighborhood
                of $x$ is equivalent to saying that $\set{U} \in
                \nhd_x$. It can be verified that $\nhd_x$ is a
                \emph{filter on set $\set{X}$}.
        \item\emph{Neighborhood Base:} Take $x \in \set{X}$. A
                \emph{neighborhood base} $\setset{B}$ at $x$ (for
                $\set{X}$) is such that
                %
                \begin{itemize}
                        \item for all $\set{B} \in \setset{B}$,
                                $\set{B}$ is a neighborhood of $x$ (\ie,
                                $\set{B} \in \nhd_x$)
                        \item for any neighborhood $\set{U}$ of $x$
                                (\ie, $\set{U} \in \nhd_x$), there
                                exists a $\set{B} \in \setset{B}$ such
                                that $\set{B} \subseteq \set{U}$
                \end{itemize}
                %
                That is, a neighborhood base $\setset{B}$ at $x$ is a
                set of neighborhoods of $x$ such that every neighborhood
                of $x$ contains some set that belongs to $\setset{B}$.
                Note that any neighborhood base $\setset{B}$ at $x$ is
                such that $\setset{B} \subseteq \nhd_x$. It can be
                verified that any neighborhood base $\setset{B}$ of a
                point $x$ is a \emph{filter base on set $\set{X}$} that
                \emph{generates} the neighborhood system $\nhd_x$ (\ie,
                $\setset{B}$ is a basis for the neighborhood filter
                $\nhd_x$).
\end{description}

\paragraph{Points and Sets:} The following are some common terms used to
describe points and sets in topological spaces. For the following
definitions, take the generic topological space $(\set{X},\setset{T})$.
That is, take a set $\set{X}$ with topology $\setset{T}$ with elements
of $\set{X}$ called points. Also take subset $\set{E} \subseteq
\set{X}$.
%
\begin{description}
        \item\emph{Limit Points of Sets:} Take point $x \in \set{X}$.
                The point $x$ is a \emph{limit point} of a set $\set{E}$
                if every neighborhood of $x$ includes a point $p \in
                \set{E}$ with $p \neq x$. In other words, to say that
                $x$ is a limit point of set $\set{E}$ means that for all
                $\set{U} \in \nhd_x$, there is a point $p \in \set{U}$
                with $p \in \set{E} \setdiff \{x\}$ (\ie, $\set{U} \cap
                (\set{E} - \{x\}) \neq \emptyset$). Note that if $x$ is
                a limit point of $\set{E}$, it need not be an element of
                $\set{E}$. It can be shown that the set $\set{E}$ is a
                closed set if and only if every limit point of $\set{E}$
                is also an element of set $\set{E}$. An extension of
                this shows that the set of limit points of a set is a
                closed set.
        \item\emph{Isolated Points:} If point $x \in \set{E}$ is not a
                limit point of set $\set{E}$ then $x$ is an
                \emph{isolated point} of set $\set{E}$.
        \item\emph{Interior Points:} A point $x$ is an \emph{interior
                point} of set $\set{E}$ if there is a neighborhood
                $\set{U}$ of $x$ such that $\set{U} \subseteq \set{E}$.
                It can be shown that $\set{E}$ is an open set if and
                only if every element of $\set{E}$ is an interior point
                of $\set{E}$.
        \item\emph{Interior:} The \emph{interior} of $\set{E}$ is
                denoted $\interior(\set{E})$ and is the set of all
                interior points of set $\set{E}$. Some authors denote
                the interior of $\set{E}$ by $\overset{\circ}{\set{E}}$
                or $\set{E}^\circ$.
        \item\emph{Dense Sets:} The set $\set{E}$ is called \emph{dense
                in $\set{X}$} if every point in $\set{X}$ is either a
                limit point of $\set{E}$, a point in $\set{E}$, or both.
                Roughly speaking, if $\set{E}$ is dense in $\set{X}$,
                then for any point in $\set{E}$, there is a point in
                $\set{X}$ that is near to it. Precisely, this means that
                if $\set{E}$ is dense in $\set{X}$ then for any point $p
                \in \set{E}$ and neighborhood $\set{U} \in \nhd_p$,
                there exists a point $x \in \set{X}$ such that $x \in
                \set{U}$. To say a set is \emph{dense in itself} means
                that the set contains no isolated points. Note that this
                is similar to what we called densely ordered; however,
                it is not the same notion.
        \item\emph{Set Closure:} The \emph{(topological) closure} of
                $\set{E}$ is denoted $\overline{\set{E}}$ and is the
                intersection of all closed sets that are supersets of
                $\set{E}$. It is defined by $\overline{\set{E}}
                \triangleq \set{E} \cup \set{E}'$ where $\set{E}'$ is
                the set of all limit points of $\set{E}$. In other
                words, the closure of set $\set{E}$ is the set of all
                elements of $\set{E}$ and all limit points of $\set{E}$.
        \item\emph{Closure Point:} A \emph{closure point} is a point of
                set $\set{E}$ is a point that is an element of its
                closure $\overline{\set{E}}$. That is, $x \in \set{X}$
                is a closure point for $\set{E}$ if and only if $x \in
                \overline{\set{E}}$.
\end{description}

\paragraph{Some Useful Results:} The following results relate the terms
given above. Again, take the generic topological space
$(\set{X},\setset{T})$. Also take subset $\set{E} \subseteq \set{X}$.
%
\begin{itemize}
        \item Every neighborhood contains an open set. For example, for
                a point $x \in \set{X}$ and neighborhood $\set{U}$ of
                $x$ (\ie, $\set{U} \in \nhd_x$), there exists a set
                $\set{G} \in \setset{T}$ such that $\set{G} \subseteq
                \set{U}$.
        \item $(\set{E}^c)^c = \set{E}$
        \item $\emptyset = \set{X}^c$ and $\set{X} = \emptyset^c$
        \item Set $\set{E}$ is an open set \emph{if and only if} its
                complement is closed (\ie, $(\set{E}^c)^c \in
                \setset{T}$).
        \item Set $\set{E}$ is a closed set if and only if its
                complement is open (\ie, $\set{E}^c \in \setset{T}$).
        \item Set $\set{E}$ is a closed set if and only if it includes
                all of its limit points.
        \item Sets can be both open and closed simultaneously. That is,
                there may exist some set $\set{G} \subseteq \set{X}$
                such that $\set{G} \in \setset{T}$ and $\set{G}^c \notin
                \setset{T}$. When a set is both open and closed, it is
                called a \emph{clopen set} or simply \emph{clopen}.
        \item $\set{E}$ is clopen if and only if $\set{E}^c$ is clopen.
        \item Some sets may be neither open nor closed. That is, it may
                be that $\set{E} \notin \setset{T}$ and $\set{E}^c
                \notin \setset{T}$.
        \item The empty set $\emptyset$ and the universal set $\set{X}$
                are both open and closed in $\set{X}$ (\ie, they are
                \emph{clopen}). This is clear since $\emptyset \in
                \setset{T}$, $\emptyset = \set{X}^c$, $\set{X} \in
                \setset{T}$, and $\set{X} = \emptyset^c$. 
        \item Recall that $\overline{\set{E}}$ is the closure of
                $\set{E}$. It can be shown that
                %
                \begin{itemize}
                        \item $\overline{\set{E}}$ is a closed set (\ie,
                                $\overline{\set{E}}^c \in \setset{T}$)
                        \item $\set{E} \subseteq \overline{\set{E}}$
                        \item $\set{E} = \overline{\set{E}}$ if and only
                                if $\set{E}$ is a closed set
                        \item $\overline{\overline{\set{E}}} =
                                \overline{\set{E}}$
                        \item if $\set{F} \subseteq \set{E}$ then
                                $\overline{\set{F}} \subseteq
                                \overline{\set{E}}$
                        \item if $\set{F} \subseteq \set{X}$ is a closed
                                set (\ie, $\set{F}^c \subseteq
                                \setset{T}$) then $\set{E} \subseteq
                                \set{F}$ if and only if
                                $\overline{\set{E}} \subseteq \set{F}$
                        \item $\overline{\set{E}}$ is the intersection
                                of all closed sets that contain
                                $\set{E}$ (\ie, $\overline{\set{E}} =
                                \bigcap \{ \set{G}: \set{G}^c \in
                                \setset{T}, \set{E} \subseteq \set{G}
                                \}$)
                        \item $\overline{\set{E}}$ is the smallest
                                closed subset of $\set{X}$ that contains
                                $\set{E}$ (\ie, $\overline{\set{E}}^c
                                \in \setset{T}$ and for all $\set{G} \in
                                \setset{T}$ with $\set{E} \subseteq
                                \set{G}^c$, $\overline{\set{E}}
                                \subseteq \set{G}^c$)
                        \item $\overline{\set{E}} =
                                \interior(\set{E}^c)^c$ where
                                $\interior(\set{F})$ is the interior of
                                a set $\set{F} \subseteq \set{X}$
                \end{itemize}
                %
        \item Recall that $\interior(\set{E})$ is the interior of
                $\set{E}$. It can be shown that
                %
                \begin{itemize}
                        \item $\interior(\set{E})$ is an open set (\ie,
                                $\interior(\set{E}) \in \setset{T}$)
                        \item $\interior(\set{E}) \subseteq \set{E}$
                        \item $\set{E} = \interior(\set{E})$ if and only
                                if $\set{E}$ is an open set
                        \item $\interior(\interior(\set{E})) =
                                \interior(\set{E})$
                        \item if $\set{E} \subseteq \set{F}$ then
                                $\interior(\set{E}) \subseteq
                                \interior(\set{F})$
                        \item if $\set{F} \subseteq \set{X}$ is an open
                                set (\ie, $\set{F} \subseteq
                                \setset{T}$) then $\set{F} \subseteq
                                \set{E}$ if and only if $\set{F}
                                \subseteq \interior(\set{E})$ 
                        \item $\interior(\set{E})$ is the union of all
                                open sets contained in $\set{E}$ (\ie,
                                $\interior(\set{E}) = \bigcup \{ \set{G}
                                \in \setset{T} : \set{G} \subseteq
                                \set{E} \}$)
                        \item $\interior(\set{E})$ is the largest open
                                subset of $\set{X}$ contained in
                                $\set{E}$ (\ie, $\interior(\set{E}) \in
                                \setset{T}$ and for all $\set{G} \in
                                \setset{T}$ with $\set{G} \subseteq
                                \set{E}$, $\set{G} \subseteq
                                \interior(\set{E})$)
                        \item $\interior(\set{E}) =
                                \overline{\set{E}^c}^c$ where
                                $\overline{\set{F}}$ is the closure of a
                                set $\set{F} \subseteq \set{X}$
                \end{itemize}
                %
                From this, it should be clear that the interior is
                somewhat a \emph{dual} notion of closure.  
        \item A point $x \in \set{X}$ is a limit point of $\set{E}$ if
                and only if $x \in \overline{\set{E} \setdiff \{x\}}$.
        \item Define $\set{E}' \triangleq \{ \text{limit points of
                $\set{E}$} \}$. The set $\set{E}'$ is a closed set.
        \item $\set{E}$ is an open set if and only if any point $x \in
                \set{E}$ is an interior point of $\set{E}$ (\ie, $x \in
                \interior(\set{E})$).
        \item For a point $x \in \overline{\set{E}}$, for all $\set{U}
                \in \nhd_x$, $\set{U} \cap \set{E} \neq \emptyset$.
\end{itemize}

\paragraph{Compactness and Compact Sets:} The analysis of topological
spaces often involves a property known as the \emph{compactness}. Take
$(\set{X},\setset{T})$ to be a topological space and subset $\set{E}
\subseteq \set{X}$. To say that $\set{E}$ is \emph{compact} means that
for any $\setset{U} \in \Pow(\set{E})$ (\ie, a set of subsets of
$\set{E}$) such that $\set{E} \subseteq \bigcup \setset{U}$, there
exists a finite $\setset{U}_0 \subseteq \setset{U}$ (\ie, $\setset{U}_0$
is a finite set of subsets of $\set{E}$) such that $\set{E} \subseteq
\bigcup \setset{U}_0$. It is often said that a set is called compact if
all of its open \emph{covers} have a finite \emph{subcover}. This is a
useful property for dealing with infinite sets. For example, imagine two
objects separated by some finite distance. Even though there are an
infinite number of points between the two objects, a ruler with a finite
number of points can be placed between the objects to measure the
distance separating them. Therefore, the set of points between the two
objects must be compact. Note that it already has been said that any
closed and bounded subset of $\R$ (\eg, $[a,b]$ with $a,b \in \R$) is
called compact; this is because all covers of closed and bounded subsets
of $\R$ have a finite subcover, which is similar to the ruler example
(note that we will discuss the topological properties of $\R$ below; in
particular, we will show that all closed intervals are closed sets).
Compact sets are generalizations of finite sets; they provide a way to
reduce an infinite set to a finite union of open sets.

\paragraph{First-Countable Spaces:} Take a topological space
$(\set{X},\setset{T})$. Take any point $x \in \set{X}$. Assume that
there is a sequence $(\set{B}_n)$ where $\set{B}_n \subseteq \set{X}$
for all $n \in \N$ such that for every $\set{U} \in \nhd_x$, there
exists some $i \in \N$ where $\set{B}_i \subseteq \set{U}$; that is,
assume that there is a countable neighborhood base at $x$. In this case,
the topological space is called \emph{first-countable}. That is, a
\emph{first-countable space} is a topological space where each point in
the space has a countable neighborhood base. 

\subsection{Limits of Sets}
\label{app:math_topology_set_limits}

The following defined constructs which are commonly used with filters;
however, we define them for sets to motivate the filter case. For the
following, take the topological space $(\set{X},\setset{T})$.

\paragraph{Limit Inferior and Limit Superior of a Set:} Take the subset
$\set{E} \subseteq \set{X}$; however, also assume that $\set{X}$ is a
partially ordered set. Take $\set{L}$ to be the set of all limit points
of $\set{E}$. The \emph{limit superior} or \emph{supremum limit} of
$\set{E}$, denoted $\limsup \set{E}$, is defined as the least upper
bound of $\set{L}$. That is,
%
\begin{equation*}
        \limsup \set{E}
        \triangleq
        \sup \set{L}
        =
        \sup 
        \{ x \in \set{X} : x \text{ is a limit point of } \set{E} \}
\end{equation*}
%
The \emph{limit inferior} or \emph{infimum limit} of $\set{E}$, denoted
$\liminf \set{E}$, is defined as the greatest lower bound of $\set{L}$.
That is,
%
\begin{equation*}
        \liminf \set{E}
        \triangleq
        \inf \set{L}
        =
        \inf 
        \{ x \in \set{X} : x \text{ is a limit point of } \set{E} \}
\end{equation*}
%
Neither $\limsup \set{E}$ nor $\liminf \set{E}$ must exist; however,
they will always exist when $\set{X}$ is a complete lattice. If they do
exist and $\limsup \set{E} = \liminf \set{E}$ then $\set{L}$ must be a
\emph{singleton set} (\ie, $\set{E}$ must have exactly one limit point).

\subsection{Convergence of a Filter Base}

As we will show, filter bases provide a very general framework for
studying \emph{convergence} and \emph{limits}, which are very important
topics in mathematical analysis. Therefore, here we describe the
limiting and clustering behavior of filter bases.

\paragraph{Limit Points of Filter Bases:} Take the generic topological
space $(\set{X},\setset{T})$ and point $x \in \set{X}$. Assume that
$\setset{B}$ is a filter base on $\set{X}$. To say
\symdef[]{Ganalysis.120}{limarrow}{$\to$}{a
limit}\symdef[]{Ganalysis.1201}{limfb}{$\setset{B} \to p$}{filter base
$\setset{B}$ converges to $p$}$\setset{B} \to x$ means that for any
neighborhood $\set{U}$ of $x$, there exists a $\set{B} \in \setset{B}$
such that $\set{B} \subseteq \set{U}$. If $\setset{B} \to x$, then it is
said that $\setset{B}$ is a \emph{convergent filter base (in $\set{X}$)}
that \emph{converges (in $\set{X}$) to} $x$, where $x$ is called the
\emph{limit point} of $\setset{B}$. In other words, for a convergent
filter base $\setset{B}$ (in $\set{X}$), the following are equivalent:
%
\begin{itemize}
        \item $\setset{B} \to x$ (in $\set{X}$)
        \item $x$ is a limit point of $\setset{B}$ (in $\set{X}$)
        \item $\setset{B}$ converges (in $\set{X}$) to $x$
        \item for all $\set{U} \in \nhd_x$, there exists a $\set{B} \in
                \setset{B}$ such that $\set{B} \subseteq \set{U}$
\end{itemize}
%
Technically, convergence should always be listed with the topological
space (and the particular topology, if multiple exist) in which the
convergence is occurring. In many cases, the relevant topology should be
obvious, and so we will omit the text in the parenthetical expressions
shown above.

\paragraph{Convergence in Hausdorff Spaces:} Take the topological space
$(\set{X},\setset{T})$ and point $x \in \set{X}$. To say that $\set{X}$
is a \emph{Hausdorff} space means that every filter base in $\set{X}$
has at \emph{most} one limit (\ie, every convergent filter base has
exactly one limit). In other words, in a Hausdorff space, 
%
\begin{enumerate}[(i)]
        \item for any filter base $\setset{B}$ in $\set{X}$ and points
                $x,y \in \set{X}$, if $\setset{B} \to x$ and $\setset{B}
                \to y$ then $x = y$
                \label{item:Hausdorff_unique_limits}
        \item for points $x,y \in \set{X}$, there exists $\set{U} \in
                \nhd_x$ and $\set{V} \in \nhd_y$ such that $\set{U} \cap
                \set{V} = \emptyset$
                \label{item:Hausdorff_separated}
\end{enumerate}
%
These two properties are actually identical. Property
(\shortref{item:Hausdorff_unique_limits}) states that limits are unique
in a Hausdorff space. Property (\shortref{item:Hausdorff_separated})
states that disjoint neighborhoods of any two points exist. That is,
there are no two points in which all neighborhoods overlap. If a space
is Hausdorff, since limits are unique, if $\setset{B} \to x$ then $x$ is
called \symdef[\emph{the limit of}]{Ganalysis.10}{lim}{$\lim$}{limit
(\eg, unique limit of filter base, function, net, or sequence)}
$\setset{B}$ and the notation
%
\begin{equation*}
        \lim \setset{B} = x
\end{equation*}
%
may be used.

\paragraph{Cluster Points of Filter Bases:} Take the generic topological
space $(\set{X},\setset{T})$, point $x \in \set{X}$, and filter base
$\setset{B}$ on $\set{X}$. To say that $\setset{B}$ \emph{clusters (in
$\set{X}$) at} $x$ or that $x$ is a \emph{cluster point for}
$\setset{B}$ (\emph{in} $\set{X}$) means that for each $\set{B} \in
\setset{B}$ and each $\set{U} \in \nhd_x$, $\set{B} \cap \set{U} \neq
\emptyset$. 

\paragraph{Limit Points as Cluster Points:} Take the generic topological
space $(\set{X},\setset{T})$, point $x \in \set{X}$, and filter base
$\setset{B}$ on $\set{X}$. Assume that $x$ is a limit point of
$\setset{B}$. Take a neighborhood $\set{U} \in \nhd_x$ and set $\set{B}
\in \setset{B}$ with $\set{B} \subseteq \set{U}$; this is possible since
$x$ is a limit point of $\setset{B}$. Now take $\set{C} \in \setset{B}$.
By the definition of a filter base, $\set{B} \cap \set{C} \neq
\emptyset$. However, since $\set{B} \subseteq \set{U}$, then $\set{U}
\cap \set{C} \neq \emptyset$. However, $\set{U}$ and $\set{C}$ were
chosen arbitrarily. Therefore, for each $\set{C} \in \setset{B}$ and
each $\set{U} \in \nhd_x$, $\set{C} \cap \set{U} \neq \emptyset$. Thus,
the limit point $x$ must also be a cluster point for $\setset{B}$. That
is, every limit point of a filter base is a cluster point of the filter
base, and so if a filter base has no cluster points then it will have no
limit points as well; however, it is not necessarily the case that a
cluster point is a limit point. Assume that $\set{X}$ is a Hausdorff
space. In that case,
%
\begin{itemize}
        \item as mentioned, if $\setset{B}$ has no cluster points then
                $\setset{B}$ must also have no limit points 
        \item if $\setset{B}$ has a single cluster point then that
                cluster point is the single limit point of $\setset{B}$ 
        \item if $\setset{B}$ has more than one cluster point then there
                are no limit points of $\setset{B}$
\end{itemize}
%
Note the similarity between cluster points of a filter base and limit
points of a set.

\paragraph{Filter Bases on Subsets:} Take the generic topological space
$(\set{X},\setset{T})$, a subset $\set{E} \subseteq \set{X}$, and point
$x \in \set{X}$. Assume that $\setset{B}$ is a filter base on $\set{E}$
and that $\setset{B} \to x$. Recalling the definitions given above, it
should be clear that $\setset{B} \to x$ means that
%
\begin{enumerate}[(i)]
        \item for any $\set{B} \in \setset{B}$, $\set{B} \subseteq
                \set{E}$
                \label{item:base_on_E_on_E}
        \item for any $\set{U} \in \nhd_x$, there exists a $\set{B} \in
                \setset{B}$ with $\set{B} \neq \emptyset$ and $\set{B}
                \subseteq \set{U}$
                \label{item:base_on_E_base}
\end{enumerate}
%
where property (\shortref{item:base_on_E_on_E}) comes from $\setset{B}$
being \emph{on $\set{E}$} and property (\shortref{item:base_on_E_base})
comes from $\setset{B}$ being a filter base. Therefore, to say
$\setset{B} \to x$ means that for any set $\set{U} \in \nhd_x$, there
exists a $\set{B} \in \setset{B}$ with $\set{B} \neq \emptyset$ and
$\set{B} \subseteq \set{U} \cap \set{E}$. Similarly, to say that
$\setset{B}$ clusters at $x$ means that for any set $\set{U} \in
\nhd_x$, there exists a $\set{B} \in \setset{B}$ with $\set{B} \cap
\set{U} \cap \set{E} \neq \emptyset$. Also note that filter bases should
always be listed with the sets on which they are defined; however, many
topological results will apply to filter bases regardless of the sets on
which they are defined. Additionally, many times the set on which the
filter base is defined will be obvious. Thus, we will often omit
information about the set on which a filter base is defined.

\paragraph{Filter Base Cluster Points as Set Closure Points:} Take the
topological space $(\set{X},\setset{T})$ and a filter base $\setset{B}$.
Consider two cases.
%
\begin{enumerate}[(i)]
        \item Assume that $x \in \set{X}$ is a cluster point of
                $\setset{B}$. Take $\set{B} \in \setset{B}$. By the
                definition of a cluster point, for every $\set{U} \in
                \nhd_x$, $\set{U} \cap \set{B} \neq \emptyset$. However,
                this is the definition of a closure point of arbitrary
                set $\set{B} \in \setset{B}$. Therefore, $x \in
                \overline{\set{B}}$. In other words, it is
                \emph{necessary} that any cluster point of a filter base
                is a closure point of \emph{every} set included in the
                filter base. That is, $x \in \bigcap \{
                \overline{\set{B}} : \set{B} \in \setset{B} \}$.
        \item Assume that $x \in \set{X}$ is a closure point of every
                set in the filter base $\setset{B}$. That is, assume
                that $x \in \bigcap \{ \overline{\set{B}} : \set{B} \in
                \setset{B} \}$. Then, for any set $\set{B} \in
                \setset{B}$ and any neighborhood $\set{U} \in \nhd_x$,
                it is such that $\set{B} \cap \set{U} \neq \emptyset$.
                However, this is the definition of a cluster point for
                filter base $\setset{B}$. Therefore, $x$ is a cluster
                point for $\setset{B}$.
\end{enumerate}
%
This proves that $x$ is a cluster point for $\setset{B}$ if and only if
$x \in \bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \}$. In
other words, the set of closure points common to all sets in the filter
base, described by
%
\begin{equation*}
        \bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \}
\end{equation*}
%
is precisely the set of cluster points for filter base $\setset{B}$. 

\paragraph{Some Useful Results:} For the following, take the generic
topological space $(\set{X},\setset{T})$ and subset $\set{E} \subseteq
\set{X}$. 
%
\begin{itemize}
        \item $\nhd_x \to x$
        \item For any neighborhood base $\setset{N}$ of $x$, $\setset{N}
                \to x$.
        \item If $\setset{N}$ is a neighborhood base of $x$ and
                $\setset{B}$ is a filter base on $\set{X}$ then
                $\setset{B} \to x$ if and only if $\setset{B}$ is finer
                than $\setset{N}$.
        \item $\overline{\set{E}} = \{ x \in \set{X} : \text{there
                exists a filter base } \setset{B} \text{ on } \set{E}
                \text{ such that } \setset{B} \to x \}$, where
                $\overline{\set{E}}$ is the closure of set $\set{E}$.
        \item For a point $x \in \set{X}$, $x$ is a limit point of
                $\set{E}$ if and only if there exists a filter base
                $\setset{B}$ on $\set{E} \setdiff \{x\}$ such that
                $\setset{B} \to x$.
        \item For any convergent filter base $\setset{B}$ such that
                $\setset{B} \to x$, $\setset{B}$ clusters at $x$ and
                thus $x$ is a cluster point for $\setset{B}$ in
                $\set{X}$.
        \item For any filter base $\setset{B}$, $x$ is a cluster point
                for $\setset{B}$ if and only if there exists a filter
                base $\setset{C}$ such that $\setset{C}$ is finer than
                $\setset{B}$ and $\setset{C} \to x$.
        \item For any filter base $\setset{B}$, the intersection
                $\bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B}
                \}$ is the set of all cluster points of $\setset{B}$.
                This was proved above.
\end{itemize}

\paragraph{Set Limit Points and Filter Base Limit Points:} Take the
topological space $(\set{X},\setset{T})$ and subset $\set{E} \subseteq
\set{X}$. Recall that the claim $x \in \set{X}$ is a \emph{limit point
of $\set{E}$} means that for any $\set{U} \in \nhd_x$, $\set{U} \cap
(\set{E} \setdiff \{x\}) \neq \emptyset$. This is equivalent to saying
that $x$ is a limit point of $\set{E}$ if and only if there exists a
filter base $\setset{B}$ on $\set{E} \setdiff \{x\}$ such that
$\setset{B} \to x$.

\subsection{The Limit Inferior and Limit Superior}
\label{app:math_liminf_limsup_fb}

Take a topological space $(\set{X},\setset{T})$ such that
$(\set{X},{\leq})$ is a partially ordered set. Also take subset $\set{E}
\subseteq \set{X}$. Recall from \longref{app:math_topology_set_limits}
that the limit inferior of $\set{E}$ is
%
\begin{equation*}
        \liminf \set{E}
        =
        \inf 
        \{ x \in \set{X} : x \text{ is a limit point of } \set{E} \}
\end{equation*}
%
and the limit superior of $\set{E}$ is
%
\begin{equation*}
        \limsup \set{E}
        =
        \sup 
        \{ x \in \set{X} : x \text{ is a limit point of } \set{E} \}
\end{equation*}
%
and so these are the greatest lower and least upper bounds of $\set{E}$
respectively. If $(\set{E},{\leq})$ is a complete lattice, then the
limit inferior and limit superior are actually the least and greatest
limit points of $\set{E}$ respectively. If both bounds exist and are
equal to each other, then there must be exactly one limit point of
$\set{E}$ and that single point \emph{might} be called the limit of
$\set{E}$ (though it is not common to do this). Now recall that the
limit points of a set are similar to the cluster points of a filter
base. Thus, it is natural to define a bounds on the cluster points in a
similar fashion.

\paragraph{The Limit Inferior of a Filter Base:} Take the topological
space $(\set{X},\set{T}_\set{X})$ where $(\set{X},{\leq})$ is a
partially ordered set. Now take a filter base $\setset{B}$. Recall that
the cluster points of $\setset{B}$ are given by the set
%
\begin{equation*}
        \bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \}
\end{equation*}
%
The \symdef[\emph{limit
inferior}]{Ganalysis.11}{liminf}{$\liminf$}{limit inferior (\ie, $\sup
\inf$)} of filter base $\setset{B}$, denoted $\liminf \setset{B}$, is
the greatest lower bound of the cluster points of filter base
$\setset{B}$. That is,
%
\begin{equation*}
        \liminf \setset{B}
        \triangleq
        \inf 
        \bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \}
\end{equation*}
%
It can be shown that
%
\begin{equation*}
        \liminf \setset{B}
        =
        \sup \{ \inf \set{B} : \set{B} \in \setset{B} \}
\end{equation*}
%
which is the more common definition of the limit inferior of a filter
base. Note that the limit inferior may not exist; however, the limit
inferior will always exist if $(\set{X},{\leq})$ is a complete lattice.

\paragraph{The Limit Superior of a Filter Base:} Take the topological
space $(\set{X},\set{T}_\set{X})$ where $(\set{X},{\leq})$ is a
partially ordered set. Now take a filter base $\setset{B}$. The
\symdef[\emph{limit superior}]{Ganalysis.11}{limsup}{$\limsup$}{limit
superior (\ie, $\inf \sup$)} of filter base $\setset{B}$, denoted
$\limsup \setset{B}$, is the least upper bound of the cluster points of
filter base $\setset{B}$. That is,
%
\begin{equation*}
        \limsup \setset{B}
        \triangleq
        \sup 
        \bigcap \{ \overline{\set{B}} : \set{B} \in \setset{B} \}
\end{equation*}
%
It can be shown that
%
\begin{equation*}
        \limsup \setset{B}
        =
        \inf \{ \sup \set{B} : \set{B} \in \setset{B} \}
\end{equation*}
%
which is the more common definition of the limit superior of a filter
base. Note that the limit superior may not exist; however, the limit
superior will always exist if $(\set{X},{\leq})$ is a complete lattice.

\paragraph{Agreement of Limit Inferior and Limit Superior:} Take the
topological space $(\set{X},\set{T}_\set{X})$ where $(\set{X},{\leq})$
is a partially ordered set. Now take a filter base $\setset{B}$. Assume
that $\liminf \setset{B}$ and $\limsup \setset{B}$ both exist. In that
case, for some $q \in \set{X}$,
%
\begin{equation*}
        \liminf \setset{B} =
        \limsup \setset{B} = q
        \quad \text{ if and only if }
        \lim \setset{B} = q
\end{equation*}
%
In other words, if the limit superior and limit inferior both exist and
agree, then there must be only one cluster point. If there is only one
cluster point, then that cluster point must be the limit point of the
filter base. Similarly, if the limit of the filter base exists, it must
be the only cluster point, and therefore the upper and lower bounds on
the cluster points must agree. Note that if the limit inferior and limit
superior both exist and do \emph{not} agree, then the limit will not
exist.

\section{Metric Spaces and Numerical Topology}
\label{app:math_metric_spaces}

So far all of our results from topology have been given in terms of
general topological spaces; however, we have not yet provided concrete
examples of topological spaces. Before we can do that, we must introduce
a specifically kind of topological space: the \emph{metric space}. It
can be said that topology establishes a sort of distance relationship
between points. Metric spaces explicitly define that distance, and
because of that they are a sort of ideal topological space. Once we
define the metric space, we show how metric spaces can be used to make
$\R$ and $\extR$ valid topological spaces.

\subsection{The Metric Space}
\label{app:math_metric_space_specification}

Take a set $\set{X}$ and $p,q,r \in \set{X}$ which will be called
\emph{points}. Define the \emph{distance} function $d: \set{X} \times
\set{X} \mapsto \R$ such that
%
\begin{enumerate}[(i)]
        \item $d(p,q) \geq 0$
        \item $d(p,q) = 0$ \text{ if and only if } $p = q$
        \item $d(p,q) = d(q,p)$
        \item $d(p,r) \leq d(p,q) + d(q,r)$
                \label{item:metric_triangle_inequality}
\end{enumerate}
%
Then $(\set{X},d)$ is called a \emph{metric space} and $d$ is called a
\emph{metric} on $\set{X}$. As we will discuss, every metric space is a
topological space; that is, the metric $d$ can \emph{induce} a topology
on $\set{X}$. Note that property
(\shortref{item:metric_triangle_inequality}), known as the
\emph{triangle inequality}, is equivalent to the statement that 
%
\begin{equation*}
        d(p,q) \geq d(p,r) - d(q,r)
\end{equation*}
%
which is sometimes called the \emph{inverse triangle inequality}.

\subsection{Metric Space as Topological Space}
\label{app:math_metric_space_as_topological_space}

We will now show that all metric spaces are topological spaces. We do
this by defining an \emph{open ball} and then constructing all of the
open sets of the topology in terms of those open balls. For the
following, take a metric space $(\set{X},d)$ and subset $\set{E}
\subseteq \set{X}$.

\paragraph{Open and Closed Balls:} Take $x \in \set{X}$ and $r \in
\R_{>0}$. Call \symdef{Ganalysis.00000}{openball}{$B(x;r)$}{open metric
ball of radius $r$ centered at $x$} an \emph{open (metric) ball} of
radius $r$ with center $x$, and define it as the set
%
\begin{equation*}
        B(x;r)
        \triangleq
        \{ y \in \set{X} : d(x,y) < r \}
\end{equation*}
%
Similarly, call \symdef{Ganalysis.00001}{closedball}{${B[x;r]}$}{closed
metric ball of radius $r$ centered at $x$} a \emph{closed (metric) ball}
of radius $r$ with center $x$, and define it as the set
%
\begin{equation*}
        B[x;r]
        \triangleq
        \{ y \in \set{X} : d(x,y) \leq r \}
\end{equation*}
%
Note that it is always the case that $B(x;r) \subseteq B[x;r]$.

\paragraph{Metrically Open Sets:} The set $\set{E}$ is a
\emph{metrically open set} if for any point $x \in \set{E}$, there is an
$\varepsilon \in \R_{>0}$ such that $B(x;\varepsilon) \subseteq
\set{E}$. In other words, all points of a metrically open set are
elements of open metric balls that are subsets of $\set{E}$.  Sometimes
it may be convenient to call this \emph{metrically open in $\set{X}$}.
It is equivalent to say that $\set{E}$ is a metrically open set if and
only if it is a (possibly infinite) union of open metric balls.

\paragraph{Definition of Topology on a Metric Space:} Note that
%
\begin{itemize}
        \item the empty set $\emptyset$ has no points, and so it is
                trivially a metrically open set
        \item the set $\set{X}$ is a metrically open set
        \item the union of any set of metrically open sets is also a
                metrically open set
        \item the intersection of any two metrically open sets is also
                a metrically open set
\end{itemize}
%
therefore the set $\setset{T} \triangleq \{ \set{S} : \set{S} \text{ is
a metrically open set in } \set{X} \}$ is a valid topology on $\set{X}$,
and $(\set{X},\setset{T})$ is a topological space. Thus, any metrically
open set in $\set{X}$ is equivalently an \emph{open set (in $\set{X}$)}
and so all definitions and results from \longref{app:math_topology} also
apply to metric space $(\set{X},d)$ with this notion of an open set
being a union of open balls. 

\subsection{Definitions and Notation}
\label{app:math_metric_space_definitions}

Now that the metric space has been shown to be a topological space, it
is useful to translate the constructs used with topological spaces into
a framework specific to metric spaces. Thus, we now redefine terms used
with topological spaces in a way more applicable to metric spaces. We
also introduce some additional terms used specifically with metric
spaces. Note that all topological relationships among these terms still
hold.

\paragraph{Open Sets and Neighborhoods:} For the following definitions,
take the generic metric space $(\set{X},d)$. That is, take a set
$\set{X}$ with metric $d$ with elements of $\set{X}$ called points. Also
take subset $\set{E} \subseteq \set{X}$. As before, recall the
definitions of \emph{filter on a set} and \emph{filter base on a set}
from \longref{app:math_filters_on_sets}.
%
\begin{description}
        \item\emph{Universal Set:} The \emph{universal set} for the
                metric space is $\set{X}$.
        \item\emph{Set Complement:} The \emph{complement} of $\set{E}$
                is denoted $\set{E}^c$ and defined by $\set{E}^c
                \triangleq \set{X} \setdiff \set{E}$. This is consistent
                with calling $\set{X}$ the \emph{universal set} for all
                sets contained in the metric space.
        \item\emph{Open Sets:} To say that $\set{E}$ is an \emph{open
                set} means that for all $x \in \set{E}$, there is an
                $\varepsilon \in \R_{>0}$ such that $B(x;\varepsilon)
                \subseteq \set{E}$. Sometimes it may be convenient to
                call this \emph{open in $\set{X}$} or \emph{open with
                respect to metric $d$ on $\set{X}$}.
        \item\emph{Closed Sets:} To say that $\set{E}$ is a \emph{closed
                set} means that $\set{E}^c$ is an open set. Sometimes it
                may be convenient to call this \emph{closed in
                $\set{X}$} or \emph{closed with respect to metric $d$ on
                $\set{X}$}.
        \item\emph{Clopen Sets:} To say that $\set{E}$ is a \emph{clopen
                set} or that it is \emph{clopen} means that $\set{E}$
                both an open set and a closed set. It is the case that
                $\set{E}$ is clopen if and only if $\set{E}^c$ is
                clopen. It can be shown that that $\emptyset$ and
                $\set{X}$ are clopen.  Sometimes it may be convenient to
                call this \emph{clopen in $\set{X}$} or \emph{clopen
                with respect to metric $d$ on $\set{X}$}.
        \item\emph{Neighborhoods:} Take $x \in \set{X}$. A
                \emph{neighborhood} $\set{U}$ of $x$ (in $\set{X}$) is a
                set such that there exists an open set $\set{G} \in
                \setset{T}$ with $\set{G} \subseteq \set{U}$ where $x
                \in \set{G}$. A neighborhood of $x$ does not need to be
                an open set; however, it must contain an open set that
                includes $x$. It can be shown that for a metric space,
                $\set{U} \subseteq \set{X}$ is a neighborhood of $x$ if
                and only if there exists an $\varepsilon \in \R_{>0}$
                where $B(x;\varepsilon) \subseteq \set{U}$. In fact, for
                any $r \in \R_{>0}$, $B(x;r)$ is a neighborhood of $x$.
        \item\emph{Neighborhood Systems:} The notation $\nhd_x$ is
                called the \emph{neighborhood system} at $x$ (for
                $\set{X}$) or the \emph{neighborhood filter} at $x$ (for
                $\set{X}$). This is the set of all neighborhoods of $x$.
                Therefore, to say that $\set{U}$ is a neighborhood of
                $x$ is equivalent to saying that $\set{U} \in \nhd_x$.
                It can be verified that $\nhd_x$ is a \emph{filter on
                set $\set{X}$}.
        \item\emph{Neighborhood Base:} Take $x \in \set{X}$. A
                \emph{neighborhood base} $\setset{B}$ at $x$ (for
                $\set{X}$) is such that
                %
                \begin{itemize}
                        \item for all $\set{B} \in \setset{B}$,
                                $\set{B}$ is a neighborhood of $x$ (\ie,
                                $\set{B} \in \nhd_x$)
                        \item for any neighborhood $\set{U}$ of $x$
                                (\ie, $\set{U} \in \nhd_x$), there
                                exists a $\set{B} \in \setset{B}$ such
                                that $\set{B} \subseteq \set{U}$
                \end{itemize}
                %
                That is, a neighborhood base $\setset{B}$ at $x$ is a
                set of neighborhoods of $x$ such that every neighborhood
                of $x$ contains some set that belongs to $\setset{B}$.
                Note that any neighborhood base $\setset{B}$ at $x$ is
                such that $\setset{B} \subseteq \nhd_x$. It can be
                verified that any neighborhood base $\setset{B}$ of a
                point $x$ is a \emph{filter base on set $\set{X}$} that
                \emph{generates} the neighborhood system $\nhd_x$ (\ie,
                $\setset{B}$ is a basis for the neighborhood filter
                $\nhd_x$).
\end{description}

\paragraph{Points and Sets:} The following are some common terms used to
describe points and sets in metric spaces. For the following
definitions, take the generic metric space $(\set{X},d)$. That is, take
a set $\set{X}$ with metric $d$ with elements of $\set{X}$ called
points. Also take subset $\set{E} \subseteq \set{X}$.
%
\begin{description}
        \item\emph{Limit Points of Sets:} Take point $x \in \set{X}$.
                The point $x$ is a \emph{limit point} of a set $\set{E}$
                if every neighborhood of $x$ includes a point $p \in
                \set{E}$ with $p \neq x$. In other words, to say that
                $x$ is a limit point of set $\set{E}$ means that for all
                $\set{U} \in \nhd_x$, there is a point $p \in \set{U}$
                with $p \in \set{E} \setdiff \{x\}$ (\ie, $\set{U} \cap
                (\set{E} - \{x\}) \neq \emptyset$). Note that if $x$ is
                a limit point of $\set{E}$, it need not be an element of
                $\set{E}$. It can be shown that the set $\set{E}$ is a
                closed set if and only if every limit point of $\set{E}$
                is also an element of set $\set{E}$. An extension of
                this shows that the set of limit points of a set is a
                closed set.
        \item\emph{Isolated Points:} If point $x \in \set{E}$ is not a
                limit point of set $\set{E}$ then $x$ is an
                \emph{isolated point} of set $\set{E}$.
        \item\emph{Interior Points:} A point $x$ is an \emph{interior
                point} of set $\set{E}$ if there is a neighborhood
                $\set{U}$ of $x$ such that $\set{U} \subseteq \set{E}$.
                In other words, a point $x$ is an interior point of
                $\set{E}$ if there exists some $\varepsilon \in \R_{>0}$
                with $B(x;\varepsilon) \subseteq \set{E}$. It can be
                shown that $\set{E}$ is an open set if and only if every
                element of $\set{E}$ is an interior point of $\set{E}$.
        \item\emph{Interior:} The \emph{interior} of $\set{E}$ is
                denoted $\interior(\set{E})$ and is the set of all
                interior points of set $\set{E}$. Some authors denote
                the interior of $\set{E}$ by $\overset{\circ}{\set{E}}$
                or $\set{E}^\circ$. In this metric space, the
                $\interior(\set{E})$ is the union of all open balls
                contained in $\set{E}$ (\ie, $\interior(\set{E}) =
                \bigcup \{ B(x;\varepsilon) : x \in \set{E}, \varepsilon
                \in \R_{>0}, B(x;\varepsilon) \subseteq \set{E} \}$).  
        \item\emph{Bounded:} The set $\set{E}$ is called \emph{bounded}
                if there is a real number $b \in \R_{>0}$ and a point $x
                \in \set{X}$ such that $d(x,y) < b$ for all $y \in
                \set{E}$. We have already defined bounded for partially
                ordered sets. While this is not the same notion, because
                of a special property of the real numbers, in our
                examples there should be no conflict between these two
                definitions.
        \item\emph{Dense Sets:} The set $\set{E}$ is called \emph{dense in
                $\set{X}$} if every point in $\set{X}$ is either a limit
                point of $\set{E}$, a point in $\set{E}$, or both.
                Roughly speaking, if $\set{E}$ is dense in $\set{X}$,
                then for any point in $\set{E}$, there is a point in
                $\set{X}$ that is near to it. Precisely, this means that
                if $\set{E}$ is dense in $\set{X}$ then for any point $p
                \in \set{E}$ and $\varepsilon \in \R_{>0}$, there exists
                a point $x \in \set{X}$ such that $x \in
                B(p;\varepsilon)$. To say a set is \emph{dense in
                itself} means that the set contains no isolated points.
                Note that this is similar to what we called densely
                ordered; however, it is not the same notion.
        \item\emph{Set Closure:} The \emph{(topological) closure} of
                $\set{E}$ is denoted $\overline{\set{E}}$ and is the
                intersection of all closed sets that are supersets of
                $\set{E}$. It is defined by $\overline{\set{E}}
                \triangleq \set{E} \cup \set{E}'$ where $\set{E}'$ is
                the set of all limit points of $\set{E}$. In other
                words, the closure of set $\set{E}$ is the set of all
                elements of $\set{E}$ and all limit points of $\set{E}$.
        \item\emph{Closure Point:} A \emph{closure point} is a point of
                set $\set{E}$ is a point that is an element of its
                closure $\overline{\set{E}}$. That is, $x \in \set{X}$
                is a closure point for $\set{E}$ if and only if $x \in
                \overline{\set{E}}$.
\end{description}

\subsection{Important Metric Space Results}
\label{app:important_metric_results}

There are a number of important results for metric spaces.

\paragraph{Open Balls as Open Sets:} All open balls are open sets. To
see this, consider $p \in \set{X}$, $r \in \R_{>0}$, and open ball
$B(p;r)$. Take $q \in B(p;r)$; therefore, $d(p,q) < r$, and so there is
an $h \in \R_{>0}$ such that $d(p,q) = r - h$. Now, since $d$ is a
metric then for all $s \in \set{X}$,
%
\begin{equation*}
        d(p,s) \leq d(p,q) + d(q,s)
\end{equation*}
%
Therefore, for all $s \in B(q;h)$ (\ie, $s \in \set{X}$ with $d(q,s) <
h$), it must be that
%
\begin{equation*}
        d(p,s) < (r - h) + h = r
\end{equation*}
%
Thus, for all $s \in B(q;h)$, $d(p,s) < r$, and so $s \in B(p;r)$.
Therefore $q$ is an interior point of $B(p;r)$. Since $q$ was chosen
arbitrarily then every point of $B(p;r)$ is an interior point of
$B(p;r)$, and so $B(p;r)$ must be an open set. This proves that every
open ball is an open set. However, every open set is a neighborhood by
definition; and thus, every open ball is a neighborhood of its center.

\paragraph{Cascades of Open Balls:} Take a metric space $(\set{X},d)$
and $r_1,r_2 \in \R_{>0}$ such that $r_1 > r_2$. Now take a point $x \in
\set{X}$ and another point $y \in B(x;r_2)$. Note that
%
\begin{align*}
        d(x,y) < r_2 < r_1
\end{align*}
%
and therefore $y \in B(x;r_1)$. Thus, $B(x;r_2) \subseteq B(x;r_1)$.

\paragraph{Metric Spaces as Hausdorff Topological Spaces:} Take a metric
space $(\set{X},d)$ and points $p,q \in \set{X}$ with $p \neq q$. By the
definition of a metric, $d(p,q) > 0$. Therefore, there exists some $r
\in \R_{>0}$ where $d(p,q) = r$. Take such an $r$. Now take a point $x
\in \set{X}$ such that $d(p,x) < r/2$. Clearly $x \in B(p,r/2)$.
Additionally, by the properties of metric $d$,
%
\begin{align*}
        d(q,x) 
        &\geq d(q,p) - d(x,p)\\
        &= d(p,q) - d(x,p)\\
        &= d(p,q) - d(p,x)\\
        &= r - d(p,x)\\
        &\geq r - \frac{r}{2}\\
        &= \frac{r}{2}
\end{align*}
%
Thus, since $d(q,x) \geq r/2$ it must be that $x \notin B(q;r/2)$.
Therefore, these two metric balls have no common elements and 
%
\begin{equation}
        B(p;\tfrac{r}{2}) \cap B(q;\tfrac{r}{2}) = \emptyset
        \label{eq:metric_hausdorff_proof}
\end{equation}
%
Now, assume that there exists a filter base $\setset{B}$ such that
$\setset{B} \to p$ and $\setset{B} \to q$. This implies that there are
sets $\set{B}_p,\set{B}_q \in \setset{B}$ such that $\set{B}_p \subseteq
B(p;r/2)$ and $\set{B}_q \subseteq B(q;r/2)$. Therefore, 
%
\begin{equation*}
        \set{B}_p \cap \set{B}_q 
        \subseteq 
        B(p;\tfrac{r}{2}) \cap B(q;\tfrac{r}{2})
\end{equation*}
%
However, since $\setset{B}$ is a filter base, $\set{B}_p \cap \set{B}_q
\neq \emptyset$, and so 
%
\begin{equation*}
        B(p;\tfrac{r}{2}) \cap B(q;\tfrac{r}{2}) \neq \emptyset
\end{equation*}
%
However, this is a contradiction of \longref{eq:metric_hausdorff_proof},
and so it must be that $p=q$. That is, since $\setset{B} \to p$ and
$\setset{B} \to q$, then $p$ and $q$ are the same point. Therefore, all
metric spaces are Hausdorff topological spaces.

\paragraph{Metric Spaces as First-Countable Spaces:} As discussed in
\longref{app:math_topological_spaces}, a first-countable space is a
topological space where each point in the space has a countable
neighborhood base. Take a metric space $(\set{X},d)$ and a point $x \in
\set{X}$. Now define $\setset{B}_x \subseteq \Pow(\set{X})$ by
%
\begin{equation*}
        \setset{B}_x
        \triangleq
        \{
        B(x,\tfrac{1}{n}) : n \in \N)
        \}
\end{equation*}
%
Since every metric ball centered at $x$ is a neighborhood of $x$ then
$\setset{B}_x$ is a set of neighborhoods of $x$. Additionally, since for
all $r \in \R_{>0}$, there exists an $N \in \N$ such that $1/n < r$,
then for any neighborhood $\set{U} \in \nhd_x$, there exists a $\set{B}
\in \setset{B}_x$ such that $\set{B} \subseteq \set{U}$. Therefore,
$\setset{B}_x$ is a countable neighborhood base of $x$. Since $x$ was
chosen arbitrarily, then metric space $(\set{X},d)$ is a first-countable
topological space. That is, all metric spaces are first-countable
spaces.

\subsection{Real Numbers as Metric Spaces}
\label{app:math_real_numbers_as_metric_spaces}

Take the ordered field $\R$. Define the function $d: \R \times \R
\mapsto \R$ as the absolute value of the difference between its
arguments; that is, define $d$ by
%
\begin{equation*}
        d(x,y) \triangleq |x-y|
\end{equation*}
%
It can be verified that $d$ is a metric. This makes $(\R,d)$ a metric
space (and thus a Hausdorff topological space). In fact, $d$ is the
standard metric on $\R$ and is usually assumed to be equipped with $\R$
unless otherwise specified.

\paragraph{Open and Closed Balls:} Take $x \in \R$ and $\delta \in
\R_{>0}$. It is clear that
%
\begin{align*}
        B(x;\delta) 
        &= \{ y \in \R : x - \delta < y < x + \delta \}\\
        &= (x - \delta,x + \delta)
\end{align*}
%
where $(x - \delta,x + \delta)$ is an \emph{open interval} or a
\emph{segment} as defined above. Similarly,
%
\begin{equation*}
        B[x;\delta] = [x - \delta,x + \delta]
\end{equation*}
%
where $[x - \delta,x + \delta]$ is a \emph{closed interval} as defined
above.

\paragraph{Open Intervals as Neighborhoods:} Clearly, for $x \in \R$ and
$\delta \in \R_{>0}$, the open interval $(x - \delta,x + \delta)$ is a
not only an open ball centered at $x$ but is also a neighborhood of $x$.

\paragraph{Intervals as Open and Closed Sets:} Any open interval is an
open set, and any closed interval is a closed set. For example, for $a,b
\in \R$ with $a<b$, $(a,b)$ and $(a,\infty)$ are both open sets and
$[a,b]$ is a closed set. However, for $a,b \in \R$ with $a<b$, the sets
$(a,b]$ and $[a,b)$ are neither closed nor open sets. Note that since
$(-\infty,\infty) = \R$ and $\R$ is clopen then $(-\infty,\infty)$ must
be both a closed and open set. Of course, this is not the case for
$\extR$.

\paragraph{Not All Closed Sets Are Bounded:} Take $a \in \R$. The
interval $[a,\infty)$ is a closed set. However, there is no $b \in \R$
such that $x \leq b$ for all $x \in [a,\infty)$. Similarly, there is no
$b \in \R$ such that $d(x,y) < b$ for all $x,y \in [a,\infty)$. Thus,
while $[a,\infty)$ is closed, it is not bounded in any sense. In fact,
because of a special property of the real numbers, boundedness in the
order sense is equivalent to boundedness in the metric sense.

\paragraph{Compact Sets:} When considering metric space $\R$ with the
standard metric, a set is \emph{compact} if and only if it is both
closed and bounded. Note that for $a,b \in \R$ with $a \leq b$, it can
easily be shown that $[a,b]$ is closed and bounded. It can be shown that
all closed and bounded subsets of $\R$ meet the topological definition
of compact given above.

\subsection{Extended Real Numbers as Topological Spaces}
\label{app:math_extended_real_numbers_as_metric_spaces}

There is no desirable way to extend the standard metric in $\R$ to the
elements $\infty, {-\infty} \in \extR$. However, by defining topological
neighborhoods of $\infty$ and $-\infty$ explicitly and inheriting all of
the existing neighbhorhoods of elements of $\R$, we can treat the
extended reals as a Hausdorff topological space.

\paragraph{Neighborhoods of Infinity:} Explicitly define a topological
\emph{neighborhood of $\infty$} as any interval of the form $(c,\infty]$
where $c \in \R$. Define $\nhd_\infty$ as the set of all neighborhoods
of $\infty$. Note that $\nhd_\infty$ is a filter base and $\nhd_\infty
\to \infty$. In fact, $\infty$ is the unique limit of $\nhd_\infty$.

\paragraph{Neighborhoods of Negative Infinity:} Explicitly define a
topological \emph{neighborhood of $-\infty$} as any interval of the form
$[-\infty,c)$ where $c \in \R$. Define $\nhd_{-\infty}$ as the set of
all neighborhoods of $-\infty$. Note that $\nhd_{-\infty}$ is a filter
base and $\nhd_{-\infty} \to {-\infty}$. In fact, $-\infty$ is the
unique limit of $\nhd_{-\infty}$.

\paragraph{Intervals as Open and Closed Sets:} Any open interval is an
open set, and any closed interval is a closed set. For example, for $a,b
\in \R$ with $a<b$, $(a,b)$ and $(a,\infty)$ are both open sets and
$[a,b]$ and $[-\infty,a]$ are both closed sets. However, for $a,b \in
\R$ with $a<b$, the sets $(-\infty,\infty]$, $[-\infty,\infty)$,
$(a,b]$, $[b,a)$ are neither closed nor open sets. Note that in $\extR$,
$(-\infty,\infty)$ is not a closed set; this is a critical difference
between $\R$ and $\extR$. Of course, in $\extR$, $[-\infty,\infty]$ is
clopen since $[-\infty,\infty] = \extR$.

\paragraph{All Sets are Bounded:} Because $\extR$ is a complete lattice,
all subsets of $\extR$ are bounded. Therefore, all closed sets are
bounded. For example, take $a \in \R$. The set $[a,\infty] \subseteq
\extR$ is bounded from above by $\infty \in \extR$ and bounded from
below by $a \in \R$. Note that the sets $[a,\infty)$, $({-\infty},a]$,
and $({-\infty},\infty)$ are all bounded but are not closed in $\extR$.

\paragraph{Compact Sets:} When considering Hausdorff topological space
$\extR$ with the neighborhoods described above, a set is \emph{compact}
if and only if it is closed. However, for $a \in \R$, while sets like
$(-\infty,\infty)$, $[a,\infty)$, $(-\infty,a]$ are all closed sets in
$\R$, these are not closed in $\extR$. Therefore, these sets are not
compact in either $\R$ or $\extR$. However, the set $[-\infty,\infty]$
is compact in $\extR$.

\section{Limits of Functions on Topological Spaces}
\label{app:limits_in_topological_spaces}

For the following, recall the definitions of the \emph{image} of a set
under a function and the \emph{image} of a set of sets under a function.
In particular, recall that for sets $\set{X}$ and $\set{Y}$, subset
$\set{E} \subseteq \set{X}$, set of sets $\setset{B}$ on $\set{X}$, and
function $f: \set{X} \mapsto \set{Y}$,
%
\begin{itemize}
        \item $f[E] = \{ f(x) : x \in \set{E} \}$ is the image of
                $\set{E}$ under $f$
        \item $f\{ \setset{B} \} = \{ f[B] : B \in \setset{B} \}$ is the
                image of $\setset{B}$ under $f$
\end{itemize}
%
Also recall that the image of a filter base under a function is a filter
base on that function's codomain.

\subsection{Limits of Functions}

Now that we have defined the limiting behaviors of filters, we can use
filters to describe the behavior of functions.

\paragraph{Functions with Topological Codomains:} Take a set $\set{X}$,
a topological space $(\set{Y},\setset{T}_\set{Y})$, and subset $\set{E}
\subseteq \set{X}$.  Also take a function $f: \set{E} \mapsto \set{Y}$.
Additionally, let $\setset{B}$ be a filter base on $\set{E}$ and point
$q \in \set{Y}$. To say that $f$ \emph{converges to $q$ (as $x$ runs)
along} $\setset{B}$ means that for all $\set{V} \in \nhd_q$ there exists
a $\set{B} \in \setset{B}$ such that $f[B] \subseteq \set{V}$. Recall
that the image $f\{ \setset{B} \}$ is a filter base on $\set{Y}$. By
this definition of convergence, $f$ converges to $q$ along $\setset{B}$
if and only if $f\{ \setset{B} \} \to q$. This is denoted by any of
%
\begin{itemize}
        \item $f \to q$ along $\setset{B}$
        \item $f(x) \to q$ as $x \to \setset{B}$
\end{itemize}
%
Therefore, it can be said that $f \to q$ along $\setset{B}$ if and only
if $f\{ \setset{B} \} \to q$.

\paragraph{The Identity Function:} Take a topological space
$(\set{X},\setset{T}_\set{X})$ and subset $\set{E} \subseteq \set{X}$.
Additionally, let $f: \set{E} \mapsto \set{E}$ be the identity function,
which is defined by $f(x) \triangleq x$ for all $x \in \set{E}$. Take a
point $p \in \set{X}$ and filter base $\setset{B}$ on $\set{E}$ such
that $\setset{B} \to p$. Clearly it is also the case that $f\{
\setset{B} \} = \setset{B}$, and so it must also be that $f\{ \setset{B}
\} \to p$. From the notation introduced above, we could say that $f \to
p$ as $x \to \setset{B}$; however, since $f$ is the identity function,
we say
%
\begin{equation*}
        x \to p \text{ as } x \to \setset{B}
\end{equation*}
%
Of course, this is the case if and only if $\setset{B} \to p$. In this
case, we say that $x$ \emph{approaches} $p$.

\paragraph{Functions on Topological Spaces:} Take the topological spaces
$(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ and a
subset $\set{E} \subseteq \set{X}$. Now take function $f: \set{E}
\mapsto \set{Y}$ and points $p \in \set{X}$ and $q \in \set{Y}$ where
$p$ is a limit point of set $\set{E}$. Define $\setset{B}_p$ as
%
\begin{equation}
        \setset{B}_p 
        \triangleq 
        \{ 
        \set{U} \cap ( \set{E} \setdiff \{p\} )
        :
        \set{U} \in \nhd_p
        \}
        \label{eq:filter_base_for_limit}
\end{equation}
%
Since $p$ is a limit point of $\set{E}$ then $\setset{B}_p$ is a filter
base on $\set{E} \setdiff \{p\}$ such that $\setset{B}_p \to p$, and
therefore we can say that $x \to p$ as $x \to \setset{B}_p$. Now assume
that $f\{ \setset{B}_p \} \to q$. That is,
\symdef{Ganalysis.122}{limfunc}{$f(x) \to q$}{limit of function $f$
(\eg, as $x \to p$)} as $x \to \setset{B}_p$. In this case, it is said
that
%
\begin{equation*}
        f(x) \to q \text{ as } x \to p
\end{equation*}
%
or that $f(x)$ \emph{converges to $q$ as $x$ approaches} $p$. In
summary, this is a statement that for all $\set{V} \in \nhd_q$, there
exists a $\set{U} \in \nhd_p$ such that $f[\set{U}] \subseteq \set{V}$.
Note that this does not imply that $f(p)=q$. In fact, since $p$ is only
a limit point of $\set{E}$, it may be that $p \notin \set{E}$, in which
case $f(p)$ would not be defined. Now assume that
$(\set{Y},\setset{T}_\set{Y})$ is a Hausdorff space. In this case, for
$y_1,y_2 \in \set{Y}$, $f\{ \setset{B}_p \} \to y_1$ and $f\{ \setset{B}
\}_p \to y_2$ implies that $y_1 = y_2 = q$. Therefore, it is said that
$q$ is \emph{the limit of $f$ as $x$ approaches} $p$, and it is written
%
\begin{equation*}
        \lim\limits_{x \to p} f(x) = q
\end{equation*}
%
Technically, this notation can be used when $\set{Y}$ is not a Hausdorff
space just as long as $q$ is the unique limit of $f$ as $x$ approaches
$p$. Note that since $f\{ \setset{B}_p \} \to q$ in a Hausdorff space
then
%
\begin{equation*}
        \lim\limits_{x \to p} f(x)
        =
        \lim f\{\setset{B}_p\} = q
\end{equation*}
%
where $\setset{B}_p$ is from \longref{eq:filter_base_for_limit}. Note
that if no such $q \in \set{Y}$ exists, then the function is said to
\emph{diverge at $p$ (in $\set{X}$)}.

\subsection{Limits from the Left and the Right}

If the domain of a function happens to be totally ordered, there are two
important filter bases that can be built that lead to limits with
important practical applications.

\paragraph{Limits from the Left:} Take the topological spaces
$(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ and a
subset $\set{E} \subseteq \set{X}$. Assume that $(\set{X},{\leq})$ is a
totally ordered set. Now take function $f: \set{E} \mapsto \set{Y}$ and
points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a limit point of
set $\set{E}$. Define $\setset{B}_{p-}$ as
%
\begin{equation}
        \setset{B}_{p-}
        \triangleq 
        \{ 
        \set{U} \cap \{ x \in \set{E} : x < p \} 
        :
        \set{U} \in \nhd_p
        \}
        \label{eq:filter_base_for_left_limit}
\end{equation}
%
Since $p$ is a limit point of $\set{E}$ then $\setset{B}_{p-}$ is a
filter base on $\{ x \in \set{E} : x < p \}$ such that $\setset{B}_{p-}
\to p$, and therefore we can say that $x \to p$ as $x \to
\setset{B}_{p-}$. In this special case, we say that $x$ \emph{approaches
$p$ from the left}, which we denote by $x \to {p-}$. Now assume that
$f\{ \setset{B}_{p-} \} \to q$. That is, $f(x) \to q$ as $x \to
\setset{B}_{p-}$. In this case, it is said that
%
\begin{equation*}
        f(x) \to q \text{ as } x \to {p-}
\end{equation*}
%
or that $f(x)$ \emph{converges to $q$ as $x$ approaches $p$ from the
left}. This is known as the \emph{left-hand limit} of function $f$ at
point $p$. Now assume that $(\set{Y},\set{T}_\set{Y})$ is a Hausdorff
space. In this case, it is said that $q$ is the \emph{limit of $f$ as
$x$ approaches $p$ from the left} or the \emph{left-hand limit of $f$ at
$p$}, and it is written
%
\begin{equation*}
        \lim\limits_{x \to {p-}} f(x) = q
\end{equation*}
%
Technically, this notation can be used when $\set{Y}$ is not a Hausdorff
space just as long as $q$ is the unique limit of $f$ as $x$ approaches
$p$ from the left. Note that since $f\{ \setset{B}_{p-} \} \to q$ in
a Hausdorff space then
%
\begin{equation*}
        \lim\limits_{x \to {p-}} f(x)
        =
        \lim f\{\setset{B}_{p-}\}
\end{equation*}
%
where $\setset{B}_{p-}$ is from \longref{eq:filter_base_for_left_limit}.

\paragraph{Limits from the Right:} Take the topological spaces
$(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ and a
subset $\set{E} \subseteq \set{X}$.  Assume that $(\set{X},{\leq})$ is a
totally ordered set. Now take function $f: \set{E} \mapsto \set{Y}$ and
points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a limit point of
set $\set{E}$. Define $\setset{B}_{+}$ as
%
\begin{equation}
        \setset{B}_{+}
        \triangleq 
        \{ 
        \set{U} \cap \{ x \in \set{E} : x > p \} 
        :
        \set{U} \in \nhd_p
        \}
        \label{eq:filter_base_for_right_limit}
\end{equation}
%
Since $p$ is a limit point of $\set{E}$ then $\setset{B}_{+}$ is a
filter base on $\{ x \in \set{E} : x > p \}$ such that $\setset{B}_{+}
\to p$, and therefore we can say that $x \to p$ as $x \to
\setset{B}_{+}$. In this special case, we say that $x$ \emph{approaches
$p$ from the right}, which we denote by $x \to {p+}$. Now assume that
$f\{ \setset{B}_{+} \} \to q$. That is, $f(x) \to q$ as $x \to
\setset{B}_{+}$. In this case, it is said that
%
\begin{equation*}
        f(x) \to q \text{ as } x \to {p+}
\end{equation*}
%
or that $f(x)$ \emph{converges to $q$ as $x$ approaches $p$ from the
right}. This is known as the \emph{right-hand limit} of function $f$ at
point $p$. Now assume that $(\set{Y},\set{T}_\set{Y})$ is a Hausdorff
space. In this case, it is said that $q$ is the \emph{limit of $f$ as
$x$ approaches $p$ from the right} or the \emph{right-hand limit of $f$
at $p$}, and it is written
%
\begin{equation*}
        \lim\limits_{x \to {p+}} f(x) = q
\end{equation*}
%
Technically, this notation can be used when $\set{Y}$ is not a Hausdorff
space just as long as $q$ is the unique limit of $f$ as $x$ approaches
$p$ from the right. Note that since $f\{ \setset{B}_{p+} \} \to q$ in
a Hausdorff space then
%
\begin{equation*}
        \lim\limits_{x \to {p+}} f(x)
        =
        \lim f\{\setset{B}_{p+}\}
\end{equation*}
%
where $\setset{B}_{p+}$ is from
\longref{eq:filter_base_for_right_limit}.

\paragraph{Agreement of Left and Right Limits:} Take the topological
spaces $(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$
and a subset $\set{E} \subseteq \set{X}$. Assume that $(\set{X},{\leq})$
is a totally ordered set and $\set{Y}$ is a Hausdorff space. Now take
function $f: \set{E} \mapsto \set{Y}$ and points $p \in \set{X}$ and $q
\in \set{Y}$ where $p$ is a limit point of set $\set{E}$. It is the case
that $f(x) \to q$ as $x \to p$ if and only if $f(x) \to q$ as $x \to
{p-}$ and $f(x) \to q$ as $x \to {p+}$. That is,
%
\begin{equation}
        \lim\limits_{x \to {p-}} f(x) =
        \lim\limits_{x \to {p+}} f(x) = q
        \quad \text{ if and only if } \quad
        \lim\limits_{x \to p} f(x) = q
        \label{eq:left_and_right_limit_agreement}
\end{equation}
%
Technically, there is something similar that is true when $\set{Y}$ is
not a Hausdorff space; however, we omit that case for brevity. Note that
if the left-hand limit and the right-hand limit do not agree, then the
limit cannot exist.

\subsection{The Limit Inferior and Limit Superior}

Recall the definitions of limit superior and limit inferior of a filter
base from \longref{app:math_liminf_limsup_fb}. These establish bounds on
the cluster points of the filter base. The image of every filter base
under a function is another filter base, and so it makes sense to focus
on the bounds of the cluster points of this filter base.

\paragraph{The Limit Inferior of a Function:} Take the topological
spaces $(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$
such that $(\set{Y},{\leq})$ is a partially ordered set. Also take a
subset $\set{E} \subseteq \set{X}$, a function $f: \set{E} \mapsto
\set{Y}$, and points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a
limit point of set $\set{E}$. Recall the definition of $\setset{B}_p$
from \longref{eq:filter_base_for_limit}. The image of this filter base
under $f$ is another filter base, and that filter base may have a limit
inferior. Therefore, call $\liminf_{x \to p} f(x)$ the \emph{limit
inferior of function $f$ as $x$ approaches $p$} and define it by
%
\begin{equation*}
        \liminf\limits_{x \to p} f(x)
        \triangleq
        \liminf f\{\setset{B}_p\}
\end{equation*}
%
This bound may not exist. However, if $(\set{Y},{\leq})$ is a complete
lattice then it will exist.

\paragraph{The Handed Limit Inferiors of a Function:} Take the
topological spaces $(\set{X},\setset{T}_\set{X})$ and
$(\set{Y},\setset{T}_\set{Y})$ such that $(\set{X},{\leq})$ is a totally
ordered set and $(\set{Y},{\leq})$ is a partially ordered set. Also take
a subset $\set{E} \subseteq \set{X}$, a function $f: \set{E} \mapsto
\set{Y}$, and points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a
limit point of set $\set{E}$. Recall the definitions of
$\setset{B}_{p-}$ and $\setset{B}_{p+}$ from
\longrefs{eq:filter_base_for_left_limit} and
\shortref{eq:filter_base_for_right_limit} respectively. The image of
each of these filter bases under $f$ is another filter base, and that
filter base may have a limit inferior. Therefore, call $\liminf_{x \to
{p-}} f(x)$ the \emph{limit inferior of function $f$ as $x$ approaches
$p$ from the left} and define it by
%
\begin{equation*}
        \liminf\limits_{x \to {p-}} f(x)
        \triangleq
        \liminf f\{\setset{B}_{p-}\}
\end{equation*}
%
Additionally, call $\liminf_{x \to {p+}} f(x)$ the \emph{limit inferior
of function $f$ as $x$ approaches $p$ from the right} and define it by
%
\begin{equation*}
        \liminf\limits_{x \to {p+}} f(x)
        \triangleq
        \liminf f\{\setset{B}_{p+}\}
\end{equation*}
%
Of course, neither of these two limit inferiors must exist. However, if
$(\set{Y},{\leq})$ is a complete lattice then they will exist.

\paragraph{The Limit Superior of a Function:} Take the topological
spaces $(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$
such that $(\set{Y},{\leq})$ is a partially ordered set. Also take a
subset $\set{E} \subseteq \set{X}$, a function $f: \set{E} \mapsto
\set{Y}$, and points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a
limit point of set $\set{E}$. Recall the definition of $\setset{B}_p$
from \longref{eq:filter_base_for_limit}. The image of this filter base
under $f$ is another filter base, and that filter base may have a limit
superior. Therefore, call $\limsup_{x \to p} f(x)$ the \emph{limit
superior of function $f$ as $x$ approaches $p$} and define it by
%
\begin{equation*}
        \limsup\limits_{x \to p} f(x)
        \triangleq
        \limsup f\{\setset{B}_p\}
\end{equation*}
%
This bound may not exist. However, if $(\set{Y},{\leq})$ is a complete
lattice then it will exist.

\paragraph{The Handed Limit Superiors of a Function:} Take the
topological spaces $(\set{X},\setset{T}_\set{X})$ and
$(\set{Y},\setset{T}_\set{Y})$ such that $(\set{X},{\leq})$ is a totally
ordered set and $(\set{Y},{\leq})$ is a partially ordered set. Also take
a subset $\set{E} \subseteq \set{X}$, a function $f: \set{E} \mapsto
\set{Y}$, and points $p \in \set{X}$ and $q \in \set{Y}$ where $p$ is a
limit point of set $\set{E}$. Recall the definitions of
$\setset{B}_{p-}$ and $\setset{B}_{p+}$ from
\longrefs{eq:filter_base_for_left_limit} and
\shortref{eq:filter_base_for_right_limit} respectively. The image of
each of these filter bases under $f$ is another filter base, and that
filter base may have a limit superior. Therefore, call $\limsup_{x \to
{p-}} f(x)$ the \emph{limit superior of function $f$ as $x$ approaches
$p$ from the left} and define it by
%
\begin{equation*}
        \limsup\limits_{x \to {p-}} f(x)
        \triangleq
        \limsup f\{\setset{B}_{p-}\}
\end{equation*}
%
Additionally, call $\limsup_{x \to {p+}} f(x)$ the \emph{limit superior
of function $f$ as $x$ approaches $p$ from the right} and define it by
%
\begin{equation*}
        \limsup\limits_{x \to {p+}} f(x)
        \triangleq
        \limsup f\{\setset{B}_{p+}\}
\end{equation*}
%
Of course, neither of these two limit superiors must exist. However, if
$(\set{Y},{\leq})$ is a complete lattice then they will exist.

\paragraph{Agreement of Four Limits:} Take the topological spaces
$(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ and a
subset $\set{E} \subseteq \set{X}$. Assume that $(\set{X},{\leq})$ is a
totally ordered set and $(\set{Y},{\leq})$ is a partially ordered set.
Now take function $f: \set{E} \mapsto \set{Y}$ and points $p \in
\set{X}$ and $q \in \set{Y}$ where $p$ is a limit point of set
$\set{E}$. Now assume that $\liminf_{x \to {p-}} f(x)$ and $\limsup_{x
\to {p-}} f(x)$ both exist. It is clear that for some $q \in \set{Y}$,
%
\begin{equation*}
        \liminf\limits_{x \to {p-}} f(x) = 
        \limsup\limits_{x \to {p-}} f(x) = q
        \quad \text{ if and only if } \quad
        \lim\limits_{x \to {p-}} f(x) = q
\end{equation*}
%
and so if the limit superior and limit inferior do not agree, the limit
from the left will not exist. Similarly, instead assume that $\liminf_{x
\to {p+}} f(x)$ and $\limsup_{x \to {p+}} f(x)$ both exist. For some $q
\in \set{Y}$,
%
\begin{equation*}
        \liminf\limits_{x \to {p+}} f(x) = 
        \limsup\limits_{x \to {p+}} f(x) = q
        \quad \text{ if and only if } \quad
        \lim\limits_{x \to {p+}} f(x) = q
\end{equation*}
%
and so if the limit superior and limit inferior do not agree, the limit
from the right will not exist. Now, as is commonly done, define the
notations
%
\begin{align*}
        f(p-) &\triangleq \lim\limits_{x \to {p-}} f(x)\\
        f(p+) &\triangleq \lim\limits_{x \to {p+}} f(x)\\
        f(p^{+}) &\triangleq \limsup\limits_{x \to {p+}} f(x)\\
        f(p^{-}) &\triangleq \limsup\limits_{x \to {p-}} f(x)\\
        f(p_{+}) &\triangleq \liminf\limits_{x \to {p+}} f(x)\\
        f(p_{-}) &\triangleq \liminf\limits_{x \to {p-}} f(x)
\end{align*}
%
However, note that each of these may or may not exist, where the latter
four will always exist when $(\set{Y},{\leq})$ is a complete lattice.
Also recall that there is no guarantee that $f(p)$ is equal to $\lim_{x
\to p} f(x)$; in fact, there is no guarantee that $f(p)$ is even
defined. Now assume that $f(p^{+})$, $f(p^{-})$, $f(p_{+})$, and
$f(p_{-})$ all exist (\eg, $(\set{Y},{\leq})$ is a complete lattice). In
this case, \longref{eq:left_and_right_limit_agreement} dictates that for
some $q \in \set{Y}$,
%
\begin{equation*}
        f(p^{+}) =
        f(p^{-}) =
        f(p_{+}) =
        f(p_{-}) = q
        \quad \text{ if and only if } \quad
        \lim\limits_{x \to p} f(x) = q
\end{equation*}
%
and so if all of the limit inferiors and limit superiors discussed here
do not agree then the limit will not exist.

\subsection{Limits of Nets}
\label{app:math_lim_nets}

As mentioned, filters generalize nets and sequences, and thus everything
defined above can be applied to nets and sequences as well. In other
words, filters give a general framework for working in analysis. Here,
we discuss results for directed sets and nets.

\paragraph{Limit of Tails of Directed Sets:} Take a directed set
$(\set{A},{\leq})$. Let $(a_\alpha)$ be the directed set with domain
$\set{A}$ and codomain $\set{A}$ where $a_\alpha = \alpha$ for all
$\alpha \in \set{A}$. Now, define the filter base $\setset{A}$ by
%
\begin{equation}
        \setset{A}
        \triangleq
        \{ 
        \{ \alpha \in \set{A} : \alpha_0 \leq \alpha \} 
        : \alpha_0 \in \set{A}
        \}
        \label{eq:filter_base_of_A_tails}
\end{equation}
%
which is the filter base of tails of $(a_\alpha)$. However, since
$a_\alpha = \alpha$ for all $\alpha \in \set{A}$, $\setset{A}$ could be
called the filter base of tails of the sequence $(\alpha)$. For ease of
notation, define the identity function $f: \set{A} \mapsto \set{A}$ by
$f(\alpha)=a_\alpha=\alpha$ for all $\alpha \in \set{A}$. Thus,
$f\{\setset{A}\} = \setset{A}$. Now assume that
$(\set{A},\setset{T}_\set{A})$ is a topological space and there exists a
$p \in \set{A}$ such that $f\{\setset{A}\} \to p$. Thus, $f(\alpha) \to
p$ as $\alpha \to \setset{A}$. In fact, since $f(\alpha) = \alpha$ for
all $\alpha \in \set{A}$ then it can be said that $\alpha \to p$ as
$\alpha \to \set{A}$.

\paragraph{Limit of a Net:} Take directed set $(\set{A},{\leq})$ and
topological space $(\set{X},\setset{T}_\set{X})$.  Let $(x_\alpha)$ be a
net with domain $\set{A}$ and codomain $\set{X}$.  For ease of notation,
define the function $f: \set{A} \mapsto \set{X}$ by $f(\alpha) =
x_\alpha$ for all $\alpha \in \set{A}$. Take $\setset{A}$ to be the
filter base defined in \longref{eq:filter_base_of_A_tails}. Thus, the
filter base that is the image of $\setset{A}$ under $f$ is
%
\begin{equation*}
        f\{ \setset{A} \}
        =
        \{ 
        \{ x_\alpha : \alpha \in \set{A}, \alpha_0 \leq \alpha \} 
        : \alpha_0 \in \set{A}
        \}
\end{equation*}
%
Assume that there exists a $q \in \set{Y}$ such that $f\{ \setset{A} \}
\to q$. That is, $f(\alpha) \to q$ as $\alpha \to \setset{A}$. In this
case, we say that the net $(x_\alpha)$ \emph{converges to} $q$. In other
words, for any $\set{U} \in \nhd_q$, there exists an $\alpha_0 \in
\set{A}$ such that $\{ x_\alpha : \alpha \in \set{A}, \alpha_0 \leq
\alpha \} \subseteq \set{U}$. In the case of this net, it can be written
that $x_\alpha \to q$. Now assume that it is also the case that
$(\set{A},\setset{T}_\set{A})$ is a topological space and $p \in
\set{A}$ is such that $\setset{A} \to p$ (\eg, for poset $\set{A}$
assume that $\sup \set{A}$ exists and let $p = \sup \set{A}$). Thus, it
can be said that $\alpha \to p$ as $\alpha \to \setset{A}$. Since it is
also the case that $x_\alpha \to q$ as $\alpha \to \setset{A}$, then we
can say that $x_\alpha \to q$ as $\alpha \to p$. If $q$ is the unique
limit of the sequence (\eg, if $\set{X}$ is a Hausdorff space) then we
can write
%
\begin{equation*}
        \lim\limits_{\alpha \to p} x_\alpha = q
\end{equation*}
%
This is the standard definition for convergence of a net. Note that if
no such $q$ exists, then the net is said to \emph{diverge (in
$\set{X}$)}.


\subsection{Limits of Sequences}
\label{app:math_lim_sequences}

As discussed in \longref{app:important_metric_results}, every metric
space is a first-countable space. In first-countable spaces, sequences
can be used in the place of nets. Thus, even though sequences are nets,
here we focus on sequences for clarity.

\paragraph{Limit of Tails of Natural Numbers:} As already discussed,
$(\N,{\leq})$ is a directed set (\ie, it is totally ordered). Let
$(a_n)$ be the sequence (\ie, a net with domain $\N$) with codomain
$\extR$ where $a_n = n$ for all $n \in \N$. In other words, $(a_n)$ is
the sequence $(1,2,3,4,\dots)$. Now, define the filter base $\setset{R}$
by
%
\begin{align}
        \setset{R}
        &\triangleq
        \{ \{ n \in \N : n_0 \leq n \} : n_0 \in \N \}
        \nonumber\\
        &=
        \{ \{ n_0, n_0+1, n_0+2, \dots \} : n_0 \in \N \}
        \label{eq:filter_base_of_N_tails}
\end{align}
%
which is called the filter base of tails of the sequence $(a_n)$.
However, since $a_n = n$ for all $n \in \N$, $\setset{R}$ could be
called the filter base of tails of the sequence $(n)$. For ease of
notation, define the function $f: \N \mapsto \extR$ by $f(n)=a_n=n$ for
all $n \in \N$. Thus $f\{ \setset{R} \} = \setset{R}$. Note that $\N
\subseteq \extR$ and $\extR$ is a Hausdorff topological space where
$\nhd_\infty$ is defined as
%
\begin{equation*}
        \nhd_\infty \triangleq \{ (a,\infty] : a \in \R \}
\end{equation*}
%
Now, for any $a \in \R$, there exists a $n_0 \in \N$ such that $\{ n_0,
n_0+1, n_0+2, \dots \} \subseteq (a,\infty]$. By the definition of
$\nhd_\infty$, this means that $f\{ \setset{R} \} \to \infty$. Thus,
$f(n) \to \infty$ as $n \to \setset{R}$. In fact, since $f(n)=n$ for all
$n \in \N$ then it can be said that $n \to \infty$ as $n \to
\setset{R}$.

\paragraph{Limit of a Sequence:} Take topological space
$(\set{X},\setset{T}_\set{X})$ and directed set $(\N,{\leq})$. Let
$(x_n)$ be a sequence (\ie, a net with domain $\N$) with codomain
$\set{X}$. For ease of notation, define the function $f: \N \mapsto
\set{X}$ by $f(n) = x_n$ for all $n \in \N$. Take $\setset{R}$ to be the
filter base defined in \longref{eq:filter_base_of_N_tails}. Thus, the
filter base that is the image of $\setset{R}$ under $f$ is
%
\begin{align*}
        f\{ \setset{R} \}
        &\triangleq 
        \{ \{ x_n : n \in \N, n_0 \leq n \} : n_0 \in \N \}\\
        &=
        \{ \{ x_{n_0}, x_{n_0+1}, x_{n_0+2}, \dots \} : n_0 \in \N \}
\end{align*}
%
which is called the filter base of tails of the sequence $(x_n)$.
Assume that there exists a $q \in \set{Y}$ such that $f\{ \setset{R} \}
\to q$. That is, $f(n) \to q$ as $n \to \setset{R}$. In this case, we
say that the net $(x_n)$ \emph{converges to} $q$. In other words, for
any $\set{U} \in \nhd_q$, there exists an $n_0 \in \N$ such that $\{
x_{n_0}, x_{n_0+1}, x_{n_0+2}, \dots \} \subseteq \set{U}$. Note that
since $x_n \to q$ as $n \to \setset{R}$ and since $n \to \infty$ as $n
\to \setset{R}$, we can simply say that $x_n \to q$ as $n \to \infty$
or, for this sequence, simply \symdef[$x_n \to
q$]{Ganalysis.121}{limseq}{$p_n \to p$}{limit of sequence $(p_n)$}. If
$q$ is the unique limit of the sequence (\eg, if $\set{X}$ is a
Hausdorff space) then we can write
%
\begin{equation*}
        \lim\limits_{n \to \infty} x_n = q
\end{equation*}
%
This is the standard definition for convergence of a sequence. Note that
if no such $q$ exists, then the sequence is said to \emph{diverge (in
$\set{X}$)}.

\paragraph{Monotonic Sequences:} Take a totally ordered set
$(\set{X},{\leq})$ where $(\set{X},\setset{T})$ is also a topological
space. Also take a sequence $(a_n)$ such that $a_n \in \set{X}$ for all
$n \in \N$. Assume that $(a_n)$ is monotonically increasing. If $\sup \{
a_n : n \in \N \}$ exists then
%
\begin{equation}
        \lim\limits_{n \to \infty} a_n = \sup\{ a_n : n \in \N \}
        \label{eq:monotonically_increasing_limit}
\end{equation}
%
Now assume that $(a_n)$ is monotonically decreasing. If $\inf \{ a_n : n
\in \N \}$ exists then
%
\begin{equation}
        \lim\limits_{n \to \infty} a_n = \inf\{ a_n : n \in \N \}
        \label{eq:monotonically_decreasing_limit}
\end{equation}
%
We state this without proof; however, this is an intuitive result.

\paragraph{Limit Inferior and Limit Superior:} Take a partially ordered
set $(\set{X},{\leq})$ and a sequence $(x_n)$ such that $x_i \in
\set{X}$ for all $i \in \N$. Recall the definitions of $\inf$ (\ie,
greatest lower bound) and $\sup$ (\ie, least upper bound) provided in
\longref{app:math_upper_lower_bound}. The \emph{limit inferior} of the
sequence $(x_n)$ is denoted $\liminf_{n \to \infty} x_n$ and defined by
%
\begin{equation}
        \liminf\limits_{n \to \infty} x_n
        \triangleq 
        \sup\{ \inf\{ x_m : m \geq n \}: n \geq 0 \}
        \label{eq:liminf_seq_definition}
\end{equation}
%
This can be called an eventual greatest lower bound of the sequence
$(x_n)$; somewhat roughly speaking, all but a finite number of elements
of $x_n$ are bounded from below by the limit inferior. The \emph{limit
superior} of the sequence $(x_n)$ is denoted $\limsup_{n \to \infty}
x_n$ and defined by
%
\begin{equation}
        \limsup\limits_{n \to \infty} x_n
        \triangleq 
        \inf\{ \sup\{ x_m : m \geq n \}: n \geq 0 \}
        \label{eq:limsup_seq_definition}
\end{equation}
%
This can be called an eventual least upper bound of the sequence
$(x_n)$; somewhat roughly speaking, all but a finite number of elements
of $x_n$ are bounded from above by the limit superior. Since
$(\set{X},{\leq})$ is only a partially ordered set then these limits
are not guaranteed to exist. However, if both exist then it is always
the case that
%
\begin{equation*}
        \liminf\limits_{n \to \infty} x_n
        \leq
        \limsup\limits_{n \to \infty} x_n
\end{equation*}
%
We will call the limit inferior and limit superior the \emph{extremum
limits} of a sequence.

\paragraph{Limit Inferior and Limit Superior as Limits:} Take a
partially ordered set $(\set{X},{\leq})$ where $(\set{X},\setset{T})$ is
a topological space. Also take a sequence $(x_n)$ such that $x_i \in
\set{X}$ for all $i \in \N$. Assume that $\inf\{ x_m : m \geq n \}$
exists for all $n \in \N$ and define the sequence $(a_n)$ such that for
all $n \in \N$,
%
\begin{equation*}
        a_n \triangleq \inf\{ x_m : m \geq n \}
\end{equation*}
%
That is,
%
\begin{equation*}
        (a_n)
        =
        (
        \inf\{ x_m : m \geq 1 \},
        \inf\{ x_m : m \geq 2 \},
        \inf\{ x_m : m \geq 3 \},
        \dots
        )
\end{equation*}
%
Therefore, for each $n \in \N$, $a_n$ is the greatest lower bound of all
but the first $n-1$ elements of $(x_n)$. Note that for all $m,n \in \N$
with $m > n$, the greatest lower bound of all but the first $m-1$
elements of $(x_n)$ must be greater than or equal to the greatest lower
bound of all but the first $n-1$ elements of $(x_n)$. Therefore, $(a_n)$
must be a monotonically increasing sequence. Assume that $\sup\{ a_n : n
\in \N \}$ exists. Therefore,
\longref{eq:monotonically_decreasing_limit} applies, and so
%
\begin{equation}
        \liminf\limits_{n \to \infty} x_n
        =
        \lim\limits_{n \to \infty} \inf\{ x_m : m \geq n \}
        \label{eq:seq_liminf_as_limit}
\end{equation}
%
Similarly, using similar reasoning, as long as the relevant suprema and
infima exist, by \longref{eq:monotonically_increasing_limit},
%
\begin{equation}
        \limsup\limits_{n \to \infty} x_n
        =
        \lim\limits_{n \to \infty} \sup\{ x_m : m \geq n \}
        \label{eq:seq_limsup_as_limit}
\end{equation}
%
This is one justification for $\liminf$ and $\limsup$ being called
limits.

\paragraph{Dominated Sequences:} Take a partially ordered set
$(\set{X},{\leq})$ and a sequence $(x_n)$ such that $x_i \in \set{X}$
for all $i \in \N$. Now, take an additional sequence $(y_n)$ such that
there is some $N \in \N$ such that $y_i \leq x_i$ for all $i \geq N$. In
that case, if the limit inferior and limit superior of both sequences
exist then
%
\begin{equation}
        \liminf\limits_{n \to \infty} y_n
        \leq
        \liminf\limits_{n \to \infty} x_n
        \quad
        \text{ and }
        \quad
        \limsup\limits_{n \to \infty} y_n
        \leq
        \limsup\limits_{n \to \infty} x_n
        \label{eq:theorem_limsupinf_seq}
\end{equation}
%
which is not a surprising result. Of course, neither the limit inferior
and limit superior must exist. 

\paragraph{Agreement of Limit Inferior and Limit Superior:} Take a
partially ordered set $(\set{X},{\leq})$ where $(\set{X},\setset{T})$ is
a topological space. Also take a sequence $(x_n)$ such that $x_i \in
\set{X}$ for all $i \in \N$. If both the limit inferior and limit
superior of the sequence exist, then it must be the case that for some
$q \in \set{X}$,
%
\begin{equation*}
        \liminf\limits_{n \to \infty} x_n =
        \limsup\limits_{n \to \infty} x_n = q
        \quad \text{ if and only if } \quad
        \lim\limits_{n \to \infty} x_n = q
\end{equation*}
%
Thus, if the limit inferior and limit superior do not exist or do exist
but do not agree, then the limit will not exist.


\subsection{Series}
\label{app:math_series}

Take a topological space $(\set{X},\setset{T})$ where $(\set{X},{+})$ is
a magma. Take sequence $(a_n)$ such that $a_n \in \set{X}$ for all $n
\in \N$.

\paragraph{Definition of a Series:} Given the sequence $(a_n)$, a new
sequence $(s_n)$ can be constructed with
%
\begin{equation*}
        s_n \triangleq \sum_{i=1}^n a_i
\end{equation*}
%
where $s_n$ is called a \emph{partial sum} and the sequence $(s_n)$ is
known as a \emph{sequence of partial sums}. That is, the sequence
$(s_n)$ is defined by
%
\begin{equation*}
        (s_n) 
        \triangleq
        (a_1, 
         a_1 + a_2, 
         a_1 + a_2 + a_3, 
         a_1 + a_2 + a_3 + a_4, \cdots)
\end{equation*}
%
However, $(s_n)$ is sometimes denoted
%
\begin{equation*}
        a_1 + a_2 + a_3 + \cdots
\end{equation*}
%
or
%
\begin{equation*}
        \sum\limits_{i=1}^\infty a_i
\end{equation*}
%
The latter notation is called an \emph{infinite series} or simply a
\emph{series}. If the sequence $(s_n)$ converges to some limit $s$ (\ie,
$s_n \to s$ as $n \to \infty$) then the notation
%
\begin{equation}
        \sum\limits_{i=1}^n a_i = s
        \label{eq:series_notation}
\end{equation}
%
is used where the limit $s$ is called the \emph{sum of the series}. This
is simply a compact notation for a limit. This does \emph{not} indicate
that the sequence $(a_n)$ has the sum of $s$; it only indicates that the
real sequence of partial sums $(s_n)$ converges to $s$.

\paragraph{Alternate Notations:} The \emph{index} $i$ in
\longref{eq:series_notation} will often be used when elements of the
sequence being summed have a certain pattern. If an element $a_i$ of a
sequence $(a_n)$ is a function of $i-1$ rather than just $i$ then it is
often convenient to use the notation
%
\begin{equation*}
        \sum\limits_{i=0}^\infty a_{i+1} = s
        \label{eq:series_notation_1}
\end{equation*}
%
For example, consider the sequence $(0,1,2,3,\dots)$. In this case, the
sum of this series might be denoted $\sum_{i=1}^\infty (i-1)$ or, more
simply, $\sum_{i=0}^\infty i$. For similar reasons, the most general
series notation for the sum of the sequence $(a_n)$ is
%
\begin{equation*}
        \sum\limits_{i=1-z}^\infty a_{i+z} = s
        \label{eq:series_notation_a}
\end{equation*}
%
where $z \in \Z$.

\section{Continuous Functions}

Take the topological spaces $(\set{X},\setset{T}_\set{X})$ and
$(\set{Y},\setset{T}_\set{Y})$ and a subset $\set{E} \subseteq \set{X}$.
Assume that $(\set{Y},\setset{T}_\set{Y})$ is a Hausdorff space. Now
take function $f: \set{E} \mapsto \set{Y}$, subset $\set{F} \subseteq
\set{E}$, and point $p \in \set{E}$ that is a limit point for set
$\set{E}$.
%
\begin{itemize}
        \item If it is the case that $\lim_{x \to p} f(x) = f(p)$ then
                the function $f$ is called \emph{continuous at (point)
                $p$}. 
        \item If it is the case that for some subset $\set{F} \subseteq
                \set{E}$ that $f$ is continuous at $x$ for all $x \in
                \set{F}$ then $f$ is called \emph{continuous on set
                $\set{F}$}. 
        \item Furthermore, if it is the case that $f$ is continuous at
                $x$ for all $x \in \set{E}$ then $f$ is simply called
                \emph{continuous (on its domain)}.
\end{itemize}
%
Let $q \in \set{Y}$ be such that $q = f(p)$. To summarize, to say that
$f$ is continuous at $p$ means that for any $\set{V} \in \nhd_q$, the
preimage $f^{-1}[\set{V}] \in \nhd_p$. Note the following. 
%
\begin{itemize}
        \item The function $f$ is continuous if and only if the preimage
                of every open set is open.
        \item The function $f$ is continuous if and only if the preimage
                of every closed set is closed.
\end{itemize}

\paragraph{Compactness and Continuity:} Take the topological spaces
$(\set{X},\setset{T}_\set{X})$ and $(\set{Y},\setset{T}_\set{Y})$ and
subsets $\set{C}_\set{X} \subseteq \set{X}$ and $\set{C}_\set{Y}
\subseteq \set{Y}$. Assume that $(\set{Y},\setset{T}_\set{Y})$ is a
Hausdorff space and the set $\set{C}_\set{Y}$ is a compact set. Also
take a continuous function $f: \set{C}_\set{X} \mapsto \set{Y}$. It is
the case that the image $f[\set{C}_\set{X}]$ is compact. In particular,
the continuous image of every compact set is compact. That is, the image
of every contact set under a continuous function is compact.

\paragraph{Compositions of Continuous Functions:} Take
$(\set{X},\setset{T}_{\set{X}})$, $(\set{Y},\setset{T}_{\set{Y}})$, and
$(\set{Z},\setset{T}_{\set{Z}})$ to be three topological spaces.  Take
subset $\set{E} \subseteq \set{X}$ and a function $f: \set{E} \mapsto
\set{Y}$. Also take function $g: \range(f) \mapsto \set{Z}$. Now define
function $h: \set{E} \mapsto \set{Z}$ by
%
\begin{equation*}
        h(x) \triangleq g(f(x))
\end{equation*}
%
for all $x \in \set{E}$. In other words, $h$ is the composition of $g$
and $f$; that is, $h = g \comp f$. In this case, if $f$ is continuous at
$p \in \set{E}$ and $g$ is continuous at point $f(p) \in \set{Y}$ then
$h$ is continuous at $p \in \set{E}$ as well. In fact, if $f$ is
continuous on set $\set{E}$ and $g$ is continuous on set $\range(f)
\subseteq \set{Y}$ then $h$ is also continuous on set $\set{E}$. In
other words, compositions of continuous functions are also continuous.

\section{Basic Real Analysis}
\label{app:math_real_analysis}

The following are some useful remarks about $\extR$. Note that due to
the various isomorphisms, $\N \subset \W \subset \Z \subset \Q \subset
\R \subset \extR$, and thus these statements apply to all of the numbers
discussed here.

\subsection{Real-Valued Sequences and Functions}

For brevity, we now introduce \emph{real-valued} sequences and
functions.

\paragraph{Real and Extended Real Sequences:} Take a sequence $(x_n)$
such that $x_i \in \R$ for all $i \in \N$. This sequence is called a
\emph{real sequence} because all of its elements come from the real
number system. Sometimes such a sequence will be called a
\emph{real-valued sequence} or simply \emph{real-valued}. Note that all
real sequences are implicitly \emph{extended real sequences} since $\R
\subset \extR$.

\paragraph{Real and Extended Real Functions:} Take a set $\set{X}$ and a
function $f: \set{X} \mapsto \R$. This function is called a \emph{real
function} or a \emph{real functional} because all of its elements come
from the real number system. Sometimes such a function will be called a
\emph{real-valued function} or simply \emph{real-valued}. Note that all
real functions are implicitly \emph{extended real functions} since $\R
\subset \extR$.

\subsection{Limiting Behavior}

As explained, the real numbers $\R$ are a metric space and the extended
real numbers $\extR$ are a Hausdorff topological space that is a totally
ordered complete lattice. This greatly simplifies the limiting behavior
of real-valued sequences and real-valued functions. For the following,
recall that every real sequence can be viewed as a real-valued function
with the domain $\N$. That is, for every real sequence $(x_n)$, there is
a function $f: \N \mapsto \R$ defined by $f(n) = x_n$ for all $n \in
\N$. Thus, for simplicity, the following results will be based on
sequences with limits at $\infty$; however, they also follow for
functions with limits anywhere.

\paragraph{Divergence of a Limit of Function or Sequence:} Take the real
sequences $(a_n)$ and $(b_n)$ defined by
%
\begin{equation*}
        (a_n) \triangleq (1,2,3,4,5,\dots)
        \quad \text{ and } \quad
        (b_n) \triangleq (-1,-2,-3,-4,-5,\dots)
\end{equation*}
%
As explained in \longref{app:math_lim_sequences}, the extended real
sequence $(a_n)$ converges to $\infty$; similarly, $(b_n)$ converges to
$-\infty$. However, neither of these two sequences has a limit in the
metric space sense. Thus, to say that $a_n \to \infty$ is to say that
$a_n$ has no upper bound. Similarly, to say that $b_n \to {-\infty}$ is
to say that $b_n$ has no lower bound. That is,
%
\begin{itemize}
        \item for all $R \in \R$, there exists an $N \in \N$ such that
                $a_n \geq R$ for all $n \geq N$, and so $a_n \to \infty$
        \item for all $R \in \R$, there exists an $N \in \N$ such that
                $b_n \leq R$ for all $n \geq N$, and so $b_n \to
                {-\infty}$
\end{itemize}
%
Therefore, stating that a sequence (or function) converges in an
infinite sense communicates information about the sequence. Thus, we
will always consider real sequences and real functions to be extended
real functions so that $\infty$ and $-\infty$ can always be used as
limits.

\paragraph{Oscillation of a Sequence or Function:} It should be clear
that a real sequence or function can diverge in both a real metric sense
as well as an extended real topological sense. For example, take the
sequence
%
\begin{equation*}
        (1,0,1,0,1,0,1,0,1,0,\dots)
\end{equation*}
%
It \emph{oscillates}, and so all of its values are bounded; however, it
still does not converge. Its limit simply does not exist; however, its
limit inferior is $0$ and its limit superior is $1$. As we will show,
the limit inferior and limit superior will always exist in the extended
real context, and the limit will only exist when the limit superior and
limit inferior agree.

\paragraph{Extended Real Limit Inferior and Limit Superior:} Recall the
definitions of the limit inferior and limit superior of a sequence from
\longrefs{eq:liminf_seq_definition} and
\shortref{eq:limsup_seq_definition} respectively. Since the extended
real numbers are a complete lattice, the limit inferior and limit
superior must exist for all real sequences. Thus, for real sequence
$(x_n)$, it is always the case that
%
\begin{equation}
        \liminf\limits_{n \to \infty} x_n = \infty
        \quad \text{ or } \quad
        \liminf\limits_{n \to \infty} x_n = a
        \quad \text{ or } \quad
        \liminf\limits_{n \to \infty} x_n = -\infty
        \label{eq:liminf_seq_always}
\end{equation}
%
and
%
\begin{equation}
        \limsup\limits_{n \to \infty} x_n = \infty
        \quad \text{ or } \quad
        \limsup\limits_{n \to \infty} x_n = b
        \quad \text{ or } \quad
        \limsup\limits_{n \to \infty} x_n = -\infty
        \label{eq:limsup_seq_always}
\end{equation}
%
where $a,b \in \R$ (\ie, $a$ and $b$ are finite); note that $a$ and $b$
need not be equivalent. In the case that the limit inferior or limit
superior is not $\infty$ or $-\infty$ then that limit is said to be
\emph{finite}. An infinite limit indicates no eventual extremum bound in
the real sense. These results hold for real functions as well.

\paragraph{Interpretation of Limit Inferior and Limit Superior for
Reals:} Take a sequence $(x_n)$ such that $x_i \in \R$ for all $i \in
\N$.
%
\begin{itemize}
        \item If there is a $b \in \R$ such that for any $\varepsilon
                \in \R_{>0}$ there exists an $M \in \N$ such that $x_n <
                b + \varepsilon$ for all $n \geq M$ then $\limsup_{n
                \to \infty} x_n = b$. 
        \item If there is an $a \in \R$ such that for any $\varepsilon
                \in \R_{>0}$, there exists an $N \in \N$ such that $x_n
                > b - \varepsilon$ for all $n \geq N$ then $\liminf_{n
                \to \infty} x_n = a$. 
\end{itemize}
%
In other words, the limit superior is the least upper bound and the
limit inferior is the greatest lower bound \emph{for all but a finite
number of elements} of the real sequence $(x_n)$. For example, take the
sequence
%
\begin{equation*}
        (100,-100,1,0,1,0,1,0,1,0,\dots)
\end{equation*}
%
where the pattern of $1$ and $0$ continues \adinfinitum{}. In this case,
the limit of the sequence does not exist. However, $1$ is an upper bound
for all but the first element, and so $1$ is the limit superior of the
sequence. Similarly, $0$ is a lower bound for all but the second
element, and so $0$ is the limit inferior of the sequence. In fact, the
set of cluster points for the filter base generated by this sequence is
$\{0,1\}$. Now take the sequence
%
\begin{equation*}
        (1,-1,2,-2,3,-3,4,-4,5,-5,6,-6,7,-7,\dots)
\end{equation*}
%
Again, it is clear that the limit of this sequence does not exist
because it oscillates. In fact, there is also no finite upper bound nor
finite lower bound. However, by definition $\infty \in \extR$ is always
an upper bound for the elements of the sequence and $-\infty \in \extR$
is always a lower bound for the elements of the sequence. Therefore, the
sequence's limit superior is $\infty$ and the sequence's limit inferior
is $-\infty$. It should be clear that $\infty$ and $-\infty$ are always
possibilities for the limit inferior and limit superior, and therefore
the limit inferior and limit superior are always defined (and thus
always exist) in the extended real context.

\paragraph{Special Case of Limit Inferior and Limit Superior:} Take a
real sequence $(x_n)$. Also take a second real sequence $(y_n)$ such
that $y_i=-x_i$ for all $i \in \N$. Recall the arithmetic rules for
extended real numbers (\eg, $-1 \times \infty = -\infty$). Keeping these
in mind, it is always the case that
%
\begin{equation*}
        -\liminf\limits_{n \to \infty} x_n 
        =
        \limsup\limits_{n \to \infty} y_n
\end{equation*}
%
and
%
\begin{equation*}
        -\limsup\limits_{n \to \infty} x_n 
        =
        \liminf\limits_{n \to \infty} y_n
\end{equation*}
%
In fact, the second statement is redundant.

\paragraph{Extremum Limits and Convergence:} Take a real sequence
$(x_n)$. Of course, using the standard metric for $\R$, the sequence
$(x_n)$ converges to point $x \in \extR$ if and only if
%
\begin{equation*}
        \liminf\limits_{n \to \infty} x_n
        =
        \limsup\limits_{n \to \infty} x_n
        =
        x
\end{equation*}
%
That is, as discussed, the limit $\lim_{n \to \infty} x_n$ exists if and
only if the limit superior and limit inferior of $(x_n)$ agree. When the
limit does exist, it is the case that
%
\begin{equation*}
        \lim\limits_{n \to \infty} x_n
        =
        \liminf\limits_{n \to \infty} x_n
        =
        \limsup\limits_{n \to \infty} x_n
\end{equation*}
%
For example, consider the real sequences $(a_n)$, $(b_n)$, $(c_n)$, and
$(d_n)$ defined by
%
\begin{align*}
        (a_n) &\triangleq (1,2,3,4,5,\dots)\\
        (b_n) &\triangleq (-1,-2,-3,-4,-5,\dots)\\
        (c_n) &\triangleq (1,-1,2,-2,3,-3,4,-4,5,-5,\dots)\\
        (d_n) &\triangleq (1,0,1,0,1,0,1,0,\dots)
\end{align*}
%
It is the case that
%
\begin{align*}
        \liminf\limits_{n \to \infty} a_n = \infty
        \quad &\text{ and } \quad
        \limsup\limits_{n \to \infty} a_n = \infty\\
        \liminf\limits_{n \to \infty} b_n = -\infty
        \quad &\text{ and } \quad
        \limsup\limits_{n \to \infty} b_n = -\infty\\
        \liminf\limits_{n \to \infty} c_n = -\infty
        \quad &\text{ and } \quad
        \limsup\limits_{n \to \infty} c_n = \infty\\
        \liminf\limits_{n \to \infty} d_n = 0
        \quad &\text{ and } \quad
        \limsup\limits_{n \to \infty} d_n = 1
\end{align*}
%
and therefore $a_n \to \infty$ and $b_n \to -\infty$ as $n \to \infty$.
However, the limits for $(c_n)$ and $(d_n)$ simply do not exist since
the limit inferior and limit superior do not agree for each of them.

\paragraph{Limit Arithmetic:} Take $(\set{X},\setset{T}_\set{X})$ to be
a topological space and a subset $\set{E} \subseteq \set{X}$. Also take
functions $f: \set{E} \mapsto \R$ and $g: \set{E} \mapsto \R$. Thus, $f$
and $g$ are both extended real functions. Now take $x_0 \in \set{X}$ to
be a limit point of $\set{E}$ in $\set{X}$. Assume that there exists
$p,q \in \extR$ such that $f(x) \to p$ and $g(x) \to q$ as $x \to x_0$
(in the topological extended real sense of the limit). That is, assume
that $p$ and $q$ are such that
%
\begin{equation*}
        \lim\limits_{x \to x_0} f(x) = p
        \quad \text{ and } \quad
        \lim\limits_{x \to x_0} g(x) = q
\end{equation*}
%
Keeping in mind the rules for arithmetic in the extended real numbers,
%
\begin{itemize}
        \item if $(p,q) \notin \{({-\infty},\infty),
                (\infty,{-\infty})\}$ then 
                %
                \begin{equation*}
                        \lim\limits_{x \to x_0} ( f(x)+g(x) )
                        =
                        p + q
                \end{equation*}
        \item if $(p,q) \notin \{(\infty,\infty),
                ({-\infty},{-\infty})\}$ then 
                %
                \begin{equation*}
                        \lim\limits_{x \to x_0} ( f(x)-g(x) )
                        =
                        p - q
                \end{equation*}
        \item if $(p,q) \notin \{(0,\infty),(\infty,0),
                (0,{-\infty}),({-\infty},0)\}$ then 
                %
                \begin{equation*}
                        \lim\limits_{x \to x_0} ( f(x) g(x) )
                        =
                        p q
                \end{equation*}
        \item if $q \neq 0$ and $p,q \notin \{\infty, {-\infty})\}$ then
                %
                \begin{equation*}
                        \lim\limits_{x \to x_0} 
                        \frac{ f(x) }{ g(x) }
                        =
                        \frac{p}{q}
                \end{equation*}
\end{itemize}
%
Otherwise, the limits of these sums, differences, products, and
quotients of functions do not exist. This same arithmetic holds for
right-handed and left-handed limits as well as limit inferiors and
limit superiors.

\subsection{Semi-Continuity of Real-Valued Functions}

For a real-valued function, the concept of continuity can be broken
lower semi-continuity and upper semi-continuity. A function is
continuous if and only if these two notions agree.

\paragraph{Lower Semi-Continuous Functions:} Take the topological space
$(\set{X},\setset{T}_\set{X})$ and a subset $\set{E} \subseteq \set{X}$.
Now take function $f: \set{E} \mapsto \R$, subset $\set{F} \subseteq
\set{E}$, and point $p \in \set{E}$ that is a limit point for set
$\set{E}$.
%
\begin{itemize}
        \item If it is the case that $\liminf_{x \to p} f(x) \geq f(p)$
                then the function $f$ is called \emph{lower
                semi-continuous at (point) $p$}. 
        \item If it is the case that for some subset $\set{F} \subseteq
                \set{E}$ that $f$ is lower semi-continuous at $x$ for
                all $x \in \set{F}$ then $f$ is called \emph{lower
                semi-continuous on set $\set{F}$}. 
        \item Furthermore, if it is the case that $f$ is lower
                semi-continuous at $x$ for all $x \in \set{E}$ then $f$
                is simply called \emph{lower semi-continuous (on its
                domain)}.
\end{itemize}
%
Define the function $f_*: \set{E} \mapsto \extR$ by
%
\begin{equation*}
        f_*(p) = \liminf_{x \to p} f(x)
\end{equation*}
%
for all $p \in \set{E}$. It is clear that $f_*$ is a lower
semi-continuous function. Additionally, so is the function $g: \R
\mapsto \R$ defined by
%
\begin{equation*}
        g(x) \triangleq \lceil x \rceil
\end{equation*}
%
for all $x \in \R$. That is, the \emph{ceiling function} is lower
semi-continuous.

\paragraph{Upper Semi-Continuous Functions:} Take the topological space
$(\set{X},\setset{T}_\set{X})$ and a subset $\set{E} \subseteq \set{X}$.
Now take function $f: \set{E} \mapsto \R$, subset $\set{F} \subseteq
\set{E}$, and point $p \in \set{E}$ that is a limit point for set
$\set{E}$.
%
\begin{itemize}
        \item If it is the case that $\limsup_{x \to p} f(x) \leq f(p)$
                then the function $f$ is called \emph{upper
                semi-continuous at (point) $p$}. 
        \item If it is the case that for some subset $\set{F} \subseteq
                \set{E}$ that $f$ is upper semi-continuous at $x$ for
                all $x \in \set{F}$ then $f$ is called \emph{upper
                semi-continuous on set $\set{F}$}. 
        \item Furthermore, if it is the case that $f$ is upper
                semi-continuous at $x$ for all $x \in \set{E}$ then $f$
                is simply called \emph{upper semi-continuous (on its
                domain)}.
\end{itemize}
%
Define the function $f^*: \set{E} \mapsto \extR$ by
%
\begin{equation*}
        f^*(p) = \limsup_{x \to p} f(x)
\end{equation*}
%
for all $p \in \set{E}$. It is clear that $f^*$ is an upper
semi-continuous function. Additionally, so is the function $g: \R
\mapsto \R$ defined by
%
\begin{equation*}
        g(x) \triangleq \lfloor x \rfloor
\end{equation*}
%
for all $x \in \R$. That is, the \emph{floor function} is upper
semi-continuous.

\paragraph{From Semi-Continuity to Continuity:} Take the topological
space $(\set{X},\setset{T}_\set{X})$ and a subset $\set{E} \subseteq
\set{X}$. Now take function $f: \set{E} \mapsto \R$, subset $\set{F}
\subseteq \set{E}$, and point $p \in \set{E}$ that is a limit point for
set $\set{E}$. It is the case that $f$ is continuous at $p$ if and only
if $f$ is upper semi-continuous at $p$ and $f$ is lower-semicontinuous
at $p$.

\subsection{The Intermediate Value Theorem}

Take $a,b \in \R$ with $a<b$ and a continuous function $f: [a,b] \mapsto
\R$. Take $y$ such that $f(a) \leq y \leq f(b)$ or $f(b) \leq y \leq
f(a)$. There exists some $c \in [a,b]$ such that $f(c) = y$. In other
words, the image of interval $[a,b]$ under $f$ is also an interval. This
result follows from the continuity of $f$ and the gaplessness of $\R$.
In fact, if we were to define some other special properties of the
interval $[a,b]$, then we could generalize this theorem. This is
typically called the \emph{intermediate value theorem}. 

\subsection{The Extreme Value Theorem}

Take $a,b \in \R$ with $a<b$ and a continuous function $f: [a,b] \mapsto
\R$. There exists $\ell,u \in \R$ such that
%
\begin{equation*}
        \ell \leq f(x) \leq u
\end{equation*}
%
for all $x \in [a,b]$. In other words, $f$ must be a \emph{bounded}
function. In fact, there must exist $c,d \in [a,b]$ such that
%
\begin{equation*}
        f(c) \leq f(x) \leq f(d)
\end{equation*}
%
for all $x \in [a,b]$. This last fact is known as the \emph{extreme
value theorem}. It is the statement that every continuous function
defined on a closed interval must not only be bounded but must actually
attain its maximum and minimum values. Sometimes this is called the
\emph{Weierstrass theorem} which should not be confused with the
well-known \emph{Stone-Weierstrass Theorem}.

\section{Differentiation of Real-Valued Functions}

Take $\set{D} \subseteq \R$ and function $f: \set{D} \mapsto \R$. Take
$p \in \set{D}$ to be a limit point of $\set{D}$ (notice that $p \in
\set{D}$). Define
\symdef[$f'(p)$]{Ganalysis.2b}{first_oderiv}{$f'(x_0)$}{the first
(ordinary) derivative of function $f$ at point $x_0$} by
%
\begin{equation*}
        f'(p)
        \triangleq
        \lim\limits_{x \to p} \frac{ f(x) - f(p) }{ x - p }
\end{equation*}
%
To say $f$ is \emph{differentiable at} $p$ means that $f'(p)$ exists in
$\R$ (\ie, it is \emph{finite}). If $f$ is differentiable at $p$ then
$f'(p)$ is called the \emph{first (ordinary) derivative of $f$ at $p$}
or simply the \emph{derivative} of $f$ at $p$. Take a subset $\set{E}
\subseteq \set{D}$. 
%
\begin{itemize}
        \item If it is the case that $f$ is differentiable at $p$ for
                all $p \in \set{E}$, it is said that $f$ is
                \emph{differentiable on} $\set{E}$, and we define the
                function $f': \set{E} \mapsto \R$ where $f'(p)$ is the
                derivative of $f$ at every point $p \in \set{E}$.
        \item If it is the case that $f$ is differentiable at $p$ for
                all $p \in \set{D}$, it is simply said that $f$ is
                \emph{differentiable (on its domain)}, and we define the
                function $f': \set{D} \mapsto \R$ where $f'(p)$ is the
                derivative of $f$ at every point $p \in \set{D}$.
        \item If it is the case that $f$ is differentiable on its
                domain and the function $f'$ is continuous, then $f$ is
                called \emph{continuously differentiable}.
        \item If it is the case that $f$ is continuously differentiable
                and $f'$ is differentiable, we define function $f'':
                \set{D} \mapsto \R$ to represent the derivative of
                function $f'$ at all points in $\set{D}$. In this case,
                when $f'$ is differentiable, $f$ is called \emph{twice
                differentiable}.
        \item If it is the case that $f$ and $f'$ are both continuously
                differentiable (\ie, $f''$ is continuous) then $f$ is
                called \emph{twice continuously differentiable}.
\end{itemize}
%
An intuitive geometric interpretation of the derivative at
a point is the \emph{slope} of the \emph{tangent line} at the point. 

\subsection{Handed Derivatives} 

Again, take $\set{D} \subseteq \R$ and function $f: \set{D} \mapsto \R$.
Take $p \in \set{D}$ to be a limit point of $\set{D}$. Define
\symdef[$f'({p+})$]{Ganalysis.2a}{right_deriv}{$f'(x_0+)$}{the
right-hand derivative of function $f$ at point $x_0$} by
%
\begin{equation*}
        f'({p+})
        \triangleq
        \lim\limits_{x \to {p+}} \frac{ f(x) - f(p) }{ x - p }
\end{equation*}
%
which is called the \emph{right-hand derivative of $f$ at $p$}.
Similarly, define
\symdef[$f'({p-})$]{Ganalysis.2a}{left_deriv}{$f'(x_0-)$}{the left-hand
derivative of function $f$ at point $x_0$} by
%
\begin{equation*}
        f'({p-})
        \triangleq
        \lim\limits_{x \to {p-}} \frac{ f(x) - f(p) }{ x - p }
\end{equation*}
%
which is called the \emph{left-hand derivative of $f$ at $p$}. Note that
if $x_0 \in \interior(\set{D})$ then there is a $q \in \R$ such that
$f'(p)=q$ if and only if $f'({p-})=f'({p+})=q$.

\subsection{Functions with Interval Domains} 

Now take $a,b \in \R$ with $a < b$ and a function $f: [a,b] \mapsto \R$.
Take $p \in [a,b]$. This way, $p$ is a limit point of $[a,b]$. In this
case,
%
\begin{itemize}
        \item if $p \in (a,b)$ then there is a $q \in \R$ such that
                $f'(p)=q$ if and only if $f'({p-})=f'({p+})=q$
        \item if $p=b$ then there is a $q \in \R$ such that $f'(p)=q$ if
                and only if $f'({p-})=q$
        \item if $p=a$ then there is a $q \in \R$ such that $f'(p)=q$ if
                and only if $f'({p+})=q$
\end{itemize}
%
That is, assuming that $f'(p)$ exists,
%
\begin{equation*}
        f'(p)
        =
        \begin{cases}
                \lim\limits_{x \to {p+}} \frac{ f(x) - f(p) }{ x - p }
                &\text{if } p = a\\
                \lim\limits_{x \to p} \frac{ f(x) - f(p) }{ x - p }
                &\text{if } p \in (a,b)\\
                \lim\limits_{x \to {p-}} \frac{ f(x) - f(p) }{ x - p }
                &\text{if } p = b
        \end{cases}
\end{equation*}

\subsection{Useful Information about Derivatives} 

The derivative is used often in analysis. Consider a differentiable
function $f: \set{D} \mapsto \R$ in the following.
%
\begin{itemize}
        \item This derivative is also called the \emph{ordinary
                derivative} or the \emph{total derivative} to
                differentiate it from the \emph{partial derivative}.
                These are described further in
                \longref{app:math_partial_derivatives}.
        \item The notion of derivative is often only defined for any
                point $x_0 \in \interior(\set{D})$; however, since our
                definition of limits is very general, we are able to
                include the natural extension of the derivative to any
                point $x_0 \in \set{D}$.

        \item Take $\set{E} \subseteq \set{D}$ to be the set of points
                on which $f$ is differentiable. Thus, the function $f':
                \set{E} \mapsto \R$ is usually defined and called the
                \emph{first derivative} of function $f$; we say that
                this function is generated by the derivative of $f$.
                Therefore, for some $x_0 \in \set{E}$, $f'(x_0)$ can
                either represent the derivative of $f$ at point $x_0$ or
                the value of function $f'$ at point $x_0$. Similarly,
                %
                \symdef[]{Ganalysis.2b2}{second_oderiv}{$f''(x_0)$}{the
                second (ordinary) derivative of function $f$ at point
                $x_0$}
                \symdef[]{Ganalysis.2b3}{third_oderiv}{$f'''(x_0)$}{the
                third (ordinary) derivative of function $f$ at point
                $x_0$}
                \symdef[]{Ganalysis.2b4}{n_oderiv}{$f^{(n)}(x_0)$}{the
                $n\th$ (ordinary) derivative of function $f$ at point
                $x_0$ where $n \in \{4,5,6,\dots\}$}
                %
                \begin{itemize}
                        \item the function $f''$ is generated by
                                function $f'$ and is called the
                                \emph{second derivative} of function $f$
                        \item the function $f'''$ is generated by
                                function $f''$ and is called the
                                \emph{third derivative} of function $f$
                        \item the function $f^{(4)}$ is generated by
                                function $f'''$ and is called the
                                \emph{fourth derivative} of function $f$
                        \item the function $f^{(5)}$ is generated by
                                function $f^{(4)}$ and is called the
                                \emph{fifth derivative} of function
                                $f$
                        \item the function $f^{(n)}$ is generated by
                                function $f^{(n-1)}$ and is called the
                                \emph{$n\th$ derivative} of function $f$
                                where $n \in \{5,6,7,8,\dots\}$
                \end{itemize}
                %
                If $f'$ is continuous, $f$ is called \emph{continuously
                differentiable}. If $f''$ is continuous, $f$ is called
                \emph{twice continuously differentiable}. This pattern
                continuous for all derivatives of $f$.
        \item Notice that in the definition above, a derivative is only
                said to exist if it is \emph{finite}. That is, if the
                limit that defines the derivative is $\infty$ or
                $-\infty$, the derivative is said to not exist. In other
                words, the limits that define derivatives may be
                considered to be metric space $\R$ limits rather than
                topological space $\extR$ limits.
        \item If the function $f$ is differentiable at point $x_0 \in
                \set{D}$, it is also continuous at point $x_0$; however,
                the converse of this statement is not true. By
                extension, if $f$ is a differentiable function, $f$ must
                also be a continuous function.
\end{itemize}

\subsection{Notation} 

This $f'$-notation for the derivative is known as Lagrange's notation.
We will also use Leibniz's notation, where
%
\begin{align*}
        \frac{\total f}{\total x} 
        &\triangleq f'\\
        \frac{\total f(x)}{\total x} 
        &\triangleq f'(x)\\
        \left.\frac{\total f(x)}{\total x}\right|_{x = x_0}
        &\triangleq f'(x_0)
\end{align*}
%
are three different representations of the first derivative of function
$f$ defined by $f(x)$, where the third representation expresses the
derivative evaluated at point $x=x_0$. Similarly, the second derivative
can be written in in an analogous way as
%
\begin{align*}
        \frac{\total^2 f}{{\total x}^2}
        &\triangleq f''\\
        \frac{\total^2 f(x)}{{\total x}^2}
        &\triangleq f''(x)\\
        \left.\frac{\total f(x)}{{\total x}^2}\right|_{x = x_0}
        &\triangleq f''(x_0)
\end{align*}


\subsection{The Intermediate Value Theorem}

Take $a,b \in \R$ with $a<b$ and a function $f: [a,b] \mapsto \R$.
Assume that $f$ is differentiable on its domain. Then for any $y$ such
that that $f'(a) \leq y \leq f'(b)$ or $f'(b) \leq y \leq f'(a)$, there
exists some $c \in [a,b]$ such that $f'(c) = y$. In other words, even
though $f'$ is not continuous, it does satisfy the \emph{intermediate
value theorem} that holds for continuous functions as well. In other
words, the image of interval $[a,b]$ under $f'$ is also an interval.
This is known as \emph{Darboux's Theorem (of real analysis)}. 

\subsection{The Chain Rule}
\label{app:math_chain_rule}

Take sets $\set{A} \subseteq \R$ and $\set{B} \subseteq \R$ and
functions $f: \set{A} \mapsto \set{B}$ and $g: \set{B} \mapsto \R$. Take
a point $p \in \set{A}$ and assume that $f$ is differentiable at $p$ and
$g$ is differentiable at $f(p)$. Define the composition $g \comp f$ as
$h$. That is, define $h: \set{A} \mapsto \R$ by
%
\begin{equation*}
        h(x) \triangleq g(f(x))
\end{equation*}
%
for all $x \in \set{A}$. In this case, $h$ is differentiable at $p$ and
$h'(p) = g'( f(p) ) f'(p)$.

\subsection{Products and Quotients of Functions}

It is common to take products and quotients of functions, and thus their
derivatives are an important special case.

\paragraph{Products of Functions:} Take set $\set{A} \mapsto \R$ and
functions $f: \set{A} \mapsto \R$ and $g: \set{A} \mapsto \R$.  Take a
point $p \in \set{A}$ such that $f$ and $g$ are both differentiable at
$p$. Define a new function $h: \set{A} \mapsto \R$ by
%
\begin{equation*}
        h(x) \triangleq f(x) g(x)
\end{equation*}
%
The function $h$ is differentiable at $p$ and
%
\begin{equation*}
        h'(x) = g(x) f'(x) + f(x) g'(x)
\end{equation*}

\paragraph{Quotients of Functions:} Take set $\set{A} \mapsto \R$ and
functions $f: \set{A} \mapsto \R$ and $g: \set{A} \mapsto \R$ where
there is no $x \in \set{A}$ such that $g(x) = 0$. Take a point $p \in
\set{A}$ such that $f$ and $g$ are both differentiable at $p$. Define a
new function $h: \set{A} \mapsto \R$ by
%
\begin{equation*}
        h(x) \triangleq \frac{ f(x) }{ g(x) }
\end{equation*}
%
The function $h$ is differentiable at $p$ and
%
\begin{equation*}
        h'(x) = \frac{ g(x) f'(x) - f(x) g'(x) }{ g(x) g(x) }
\end{equation*}

\subsection{The Mean Value Theorem}

An important tool in analysis is the \emph{mean value theorem}. However,
there is a generalized form that we present first. The commonly-known
mean value theorem will be presented as a result of the generalized mean
value theorem.

\paragraph{Generalized Mean Value Theorem:} Take $a,b \in \R$ with $a<b$
and functions $f: [a,b] \mapsto \R$ and $g: [a,b] \mapsto \R$. Assume
that $f$ and $g$ are both continuous on $[a,b]$ and differentiable on
$(a,b)$. Then there exists some $c \in (a,b)$ such that
%
\begin{equation*}
        ( f(b) - f(a) ) g'(c) = ( g(b) - g(a) ) f'(c)
\end{equation*}

\paragraph{The Mean Value Theorem:} Take $a,b \in \R$ with $a<b$ and a
function $f: [a,b] \mapsto \R$ that is continuous on $[a,b]$ and
differentiable on $(a,b)$. There exists some $c \in (a,b)$ such that
%
\begin{equation*}
        f'(c) = \frac{ f(b) - f(a) }{ b - a }
\end{equation*}
%
Note that this follows from the generalized mean value theorem if the
function $g: [a,b] \mapsto \R$ is defined by$g(x) \triangleq x$ for all
$x \in [a,b]$. In that case, $g(b) - g(a) = ( b - a )$ and $g'(x) = 1$
for all $x \in (a,b)$.

\subsection{The Racetrack Principle}

Another important tool in analysis is the \emph{racetrack principle}.
This is a generalized result of something known as \emph{Rolle's
theorem}. 

\paragraph{Rolle's Theorem:} Take $a,b \in \R$ with $a<b$ and a function
$f: [a,b] \mapsto \R$ that is continuous on $[a,b]$ and differentiable
on $(a,b)$. Also assume that $f(a)=f(b)$. Then there exists some $c \in
(a,b)$ such that $h'(c) = 0$.

\paragraph{The Racetrack Principle:} Take $a,b \in \R$ with $a<b$ and
functions $f: [a,b] \mapsto \R$ and $g: [a,b] \mapsto \R$. Assume that
both functions are continuous on $[a,b]$ and differentiable on $(a,b)$.
Assume that $f(a)=g(a)$ and $f(b)=g(b)$. There exists some $c \in (a,b)$
such that $f'(c) = g'(c)$. Note that this follows from Rolle's theorem
applied to the function $h: [a,b] \mapsto \R$ defined by $h(x) = g(x) -
f(x)$ for all $x \in [a,b]$.

\subsection{Limits of Ratios of Differentiable Functions}

The following is typically known as \emph{l'H\^{o}spital's rule}. The
final result is familiar to most students of calculus; however, that
result is not trivial. Take $a,b \in \extR$ with $a<b$ and functions $f:
(a,b) \mapsto \R$ and $g: (a,b) \mapsto \R$ which are both
differentiable on $(a,b)$. Consider two cases.
%
\begin{enumerate}[{Case} 1:]
        \item Assume that
                %
                \begin{equation*}
                        f(x) \to 0 
                        \text{ and } 
                        g(x) \to 0 
                        \text{ as } x \to {a+}
                \end{equation*}
                %
                \emph{or}
                %
                \begin{equation*}
                        |g(x)| \to \infty \text{ as } x \to {a+}
                \end{equation*}
                %
                Also assume that for all $x \in (a,b)$, $g'(x) \neq 0$.
                In this case,
                %
                \begin{equation}
                        \liminf\limits_{x \to {a+}}
                        \frac{ f'(x) }{ g'(x) }
                        \leq
                        \liminf\limits_{x \to {a+}}
                        \frac{ f(x) }{ g(x) }
                        \leq
                        \limsup\limits_{x \to {a+}}
                        \frac{ f(x) }{ g(x) }
                        \leq
                        \limsup\limits_{x \to {a+}}
                        \frac{ f'(x) }{ g'(x) }
                        \label{eq:lhospitals_left}
                \end{equation}
                %
                That is, we can bound the limit superior and limit
                inferior of $f(x)/g(x)$ by the limit superior and limit
                inferior of $f'(x)/g'(x)$. Thus, if the limit inferior
                and limit superior of $f'(x)/g'(x)$ agree, then the
                limit inferior and limit superior of $f(x)/g(x)$ must
                agree. Therefore, if $y \in \extR$ and
                %
                \begin{equation*}
                        \frac{ f'(x) }{ g'(x) } \to y 
                        \text{ as } x \to {a+}
                \end{equation*}
                %
                then
                %
                \begin{equation*}
                        \frac{ f(x) }{ g(x) } \to y 
                        \text{ as } x \to {a+}
                \end{equation*}
                %
                This is half of the common result. However,
                \longref{eq:lhospitals_left} is a more interesting
                result.
        \item Assume that
                %
                \begin{equation*}
                        f(x) \to 0 \text{ and } g(x) \to 0 
                        \text{ as } x \to {b-}
                \end{equation*}
                %
                \emph{or}
                %
                \begin{equation*}
                        |g(x)| \to \infty \text{ as } x \to {b-}
                \end{equation*}
                %
                Also assume that for all $x \in (a,b)$, $g'(x) \neq 0$.
                In this case,
                %
                \begin{equation}
                        \liminf\limits_{x \to {b-}}
                        \frac{ f'(x) }{ g'(x) }
                        \leq
                        \liminf\limits_{x \to {b-}}
                        \frac{ f(x) }{ g(x) }
                        \leq
                        \limsup\limits_{x \to {b-}}
                        \frac{ f(x) }{ g(x) }
                        \leq
                        \limsup\limits_{x \to {b-}}
                        \frac{ f'(x) }{ g'(x) }
                        \label{eq:lhospitals_right}
                \end{equation}
                %
                That is, we can bound the limit superior and limit
                inferior of $f(x)/g(x)$ by the limit superior and limit
                inferior of $f'(x)/g'(x)$. Thus, if the limit inferior
                and limit superior of $f'(x)/g'(x)$ agree, then the
                limit inferior and limit superior of $f(x)/g(x)$ must
                agree. Therefore, if $y \in \extR$ and
                %
                \begin{equation*}
                        \frac{ f'(x) }{ g'(x) } \to y 
                        \text{ as } x \to {b-}
                \end{equation*}
                %
                then
                %
                \begin{equation*}
                        \frac{ f(x) }{ g(x) } \to y 
                        \text{ as } x \to {b-}
                \end{equation*}
                %
                This is half of the common result. However,
                \longref{eq:lhospitals_right} is a more interesting
                result.
\end{enumerate}
%
The combined result is that for $c \in [a,b]$, if
%
\begin{equation*}
        f(x) \to 0 \text{ and } g(x) \to 0 \text{ as } x \to c
\end{equation*}
%
\emph{or}
%
\begin{equation*}
        |g(x)| \to \infty \text{ as } x \to c
\end{equation*}
%
then
%
\begin{equation}
        \liminf\limits_{x \to c}
        \frac{ f'(x) }{ g'(x) }
        \leq
        \liminf\limits_{x \to c}
        \frac{ f(x) }{ g(x) }
        \leq
        \limsup\limits_{x \to c}
        \frac{ f(x) }{ g(x) }
        \leq
        \limsup\limits_{x \to c}
        \frac{ f'(x) }{ g'(x) }
        \label{eq:lhospitals}
\end{equation}
%
That is, we can bound the limit superior and limit inferior of
$f(x)/g(x)$ by the limit superior and limit inferior of $f'(x)/g'(x)$.
Thus, if the limit inferior and limit superior of $f'(x)/g'(x)$ agree,
then the limit inferior and limit superior of $f(x)/g(x)$ must agree.
Therefore, if $y \in \extR$ and
%
\begin{equation*}
        \frac{ f'(x) }{ g'(x) } \to y \text{ as } x \to c
\end{equation*}
%
then
%
\begin{equation*}
        \frac{ f(x) }{ g(x) } \to y \text{ as } x \to c
\end{equation*}
%
This is the commonly known result. However, the result in
\longref{eq:lhospitals} is more interesting.

\subsection{General Results for Differentiable Functions}

Take $a,b \in \R$ where $a < b$ and a continuous function $f: [a,b]
\mapsto \R$. Also assume that $f$ is differentiable on the \emph{open}
interval $(a,b)$. It is the case that
%
\begin{itemize}
        \item there exists some $c \in \R$ such that $f(x)=c$ for all $x
                \in [a,b]$ (\ie, $f$ is constant) if and only if for all
                $x \in (a,b)$, $f'(x) = 0$
        \item for any $c,d \in [a,b]$ with $d > c$, $f(d) \geq f(c)$
                (\ie, $f$ is increasing) if and only if for all $x \in
                (a,b)$, $f'(x) \geq 0$ 
        \item for any $c,d \in [a,b]$ with $d > c$, $f(d) \leq f(c)$
                (\ie, $f$ is decreasing) if and only if for all $x \in
                (a,b)$, $f'(x) \leq 0$
        \item if for all $x \in (a,b)$, $f'(x) > 0$ then for all $c,d
                \in [a,b]$ with $d > c$, $f(d) > f(c)$ (\ie, $f$ is
                strictly increasing)
        \item if for all $x \in (a,b)$, $f'(x) < 0$ then for all $c,d
                \in [a,b]$ with $d > c$, $f(d) < f(c)$ (\ie, $f$ is
                strictly decreasing)
\end{itemize}
%
These match the intuitive description of a derivative as a \emph{slope}
of a \emph{tangent line} at a point.

\subsection{Necessary Condition for Maxima and Minima}

The problem of optimization involves the maximization or minimization of
a function. When these functions are differentiable, this process is
simplified.

\paragraph{Necessary Conditions for Minima:} Take set $\set{A} \subseteq
\R$ and function $f: \set{A} \mapsto \R$. Take a point $p$ such that
there exists some $\varepsilon \in \R_{>0}$ such that $(p,p+\varepsilon)
\cap \set{A} \neq \emptyset$ and $(p-\varepsilon,p) \cap \set{A} \neq
\emptyset$ (\eg, $p \in \interior(\set{A})$) and assume that $f$ is
differentiable at point $p$. Also assume that there exists some
$\varepsilon \in \R_{>0}$ such that for all $x \in \set{A} \cap
(p-\varepsilon,p+\varepsilon)$, $f(p) \leq f(x)$. That is, $f$ has a
\emph{local minimum} at $p$. In this case, it must be that $f'(p)=0$.
Now assume that function $f'$ is differentiable at $p$. In this case, it
must be that $0 \leq f''(p)$.

\paragraph{Sufficient Conditions for Minima:} Take set $\set{A}
\subseteq \R$ and function $f: \set{A} \mapsto \R$. Take a point $p$
such that $f$ is differentiable at $p$. Also assume that $f'$ is
differentiable at $p$. If it is the case that $f'(p)=0$ and $f''(p) >
0$, then $f$ must have a local minimum at $p$.

\paragraph{Necessary Conditions for Maxima:} Take set $\set{A} \subseteq
\R$ and function $f: \set{A} \mapsto \R$. Take a point $p$ such that
there exists some $\varepsilon \in \R_{>0}$ such that $(p,p+\varepsilon)
\cap \set{A} \neq \emptyset$ and $(p-\varepsilon,p) \cap \set{A} \neq
\emptyset$ (\eg, $p \in \interior(\set{A})$) and assume that $f$ is
differentiable at point $p$. Also assume that there exists some
$\varepsilon \in \R_{>0}$ such that for all $x \in \set{A} \cap
(p-\varepsilon,p+\varepsilon)$, $f(x) \leq f(p)$. That is, $f$ has a
\emph{local maximum} at $p$. In this case, it must be that $f'(p)=0$.
Now assume that function $f'$ is differentiable at $p$. In this case, it
must be that $f''(p) \leq 0$.

\paragraph{Sufficient Conditions for Minima:} Take set $\set{A}
\subseteq \R$ and function $f: \set{A} \mapsto \R$. Take a point $p$
such that $f$ is differentiable at $p$. Also assume that $f'$ is
differentiable at $p$. If it is the case that $f'(p)=0$ and $f''(p) <
0$, then $f$ must have a local maximum at $p$.

\section{Partial and Total Derivatives}
\label{app:math_partial_derivatives}

Take sets $\set{T} \subseteq \R$ and $\set{X} \subseteq \R$ and
functions $x: \set{T} \mapsto \set{X}$ and $f: \set{X} \mapsto \R$. Take
a point $p \in \set{T}$ and assume that function $x$ is differentiable
at $p$ and function $f$ is differentiable at $x(p)$. Define the
composition $f \comp x$ as $g$. That is, define $g: \set{T} \mapsto \R$
by
%
\begin{equation*}
        g(t) \triangleq f(x(t))
\end{equation*}
%
for all $t \in \set{T}$. As discussed in \longref{app:math_chain_rule},
$g$ is differentiable at $p$ and $g'(p) = f'( x(p) ) x'(p)$. For
simplicity, whenever $f$ is evaluated at a symbol $x$, assume the normal
definition of $f$; however, whenever $f$ is evaluated at symbol $t$,
assume that $g(t)$ is meant. That is,
%
\begin{equation*}
        f(x) \triangleq f(x) 
        \quad \text{ and } \quad
        f(t) \triangleq g(t) = f(x(t))
\end{equation*}
%
Now assume that $f(x)$ is differentiable for all $x \in \set{X}$ and
$f(t)$ (\ie, $g(t)$) is differentiable for all $t \in \set{T}$.
Therefore, there are two relevant derivatives, namely $f'(x)$ for all $x
\in \set{X}$ and $f'(t)$ (\ie, $g'(t)$) for all $t \in \set{T}$. 
%
\begin{itemize}
        \item We call $f'(t)$ the \symdef[\emph{total derivative of $f$
                at point $t$}]{Ganalysis.2y}{total_deriv}{$\total
                f/\total t$}{total derivative of function $f$ at point
                $t$} and use the notations
                %
                \begin{enumerate}[(i)]
                        \item $\frac{\total f}{\total t} \triangleq g'$
                                \label{item:total_deriv_function}
                        \item $\frac{\total f(t)}{\total t} \triangleq
                                f'(t) = g'(t) = f'( x(t)) x'(t)$
                                \label{item:total_deriv_point}
                        \item $\left.\frac{\total f(t)}{\total
                                t}\right|_{t = t_0} \triangleq =
                                g'(t_0) = f'( x(t_0) ) x'(t_0)$
                                \label{item:total_deriv_function_eval}
                \end{enumerate}
                %
                Notation (\shortref{item:total_deriv_function})
                represents the first derivative function. Notation
                (\shortref{item:total_deriv_point}) represents the first
                derivative function evaluated at point $t$ (\ie, the
                derivative of $g$ at $t \in \set{T}$). Notation
                (\shortref{item:total_deriv_function_eval}) represents
                the first derivative function evaluated at point $t_0$
                (\ie, the derivative of $g$ at $t_0 \in \set{T}$).
        \item We call $f'(x)$ the \symdef[\emph{partial derivative of
                $f$ at point
                $x$}]{Ganalysis.2z}{partial_deriv}{$\partial f/\partial
                x$}{partial derivative of function $f$ with respect to
                $x$} and use the notations
                %
                \begin{enumerate}[(i)]
                        \item $\frac{\partial f}{\partial x} 
                                \triangleq f'$
                                \label{item:partial_deriv_function}
                        \item $\frac{\partial f(x)}{\partial x} 
                                \triangleq f'(x)$
                                \label{item:partial_deriv_point}
                        \item $\left.\frac{\partial f(x)}{\partial
                                x}\right|_{x = x_0} \triangleq f'(x_0)$
                                \label{item:partial_deriv_function_eval}
                \end{enumerate}
                %
                Notation (\shortref{item:partial_deriv_function})
                represents the first derivative function. Notation
                (\shortref{item:partial_deriv_point}) represents the
                first derivative function evaluated at point $x$ (\ie,
                the derivative of $f$ at $x in \set{X}$). Notation
                (\shortref{item:partial_deriv_function_eval}) represents
                the first derivative function evaluated at point $x_0$
                (\ie, the derivative of $f$ at $x_0 \in \set{X}$).
\end{itemize}
%
By these definitions, it is clear that
%
\begin{equation*}
        \frac{ \total f(t) }{ \total t }
        =
        f'(t) 
        = 
        \frac{ \partial f(x(t)) }{ \partial x }
        \frac{ \total x(t) }{ \total t }
\end{equation*}
%
This is a restatement of the chain rule.

\subsection{Functions of Multiple Variables} 

Take sets $\set{X} \subseteq \R$, $\set{Y} \subseteq \R$, $\set{Z}
\subseteq \R$, and $\set{T} \subseteq \R$ and a function $p: \set{X}
\times \set{Y} \times \set{Z} \times \set{T} \mapsto \R$. Choose
$x_0,y_0,z_0,t_0 \in \R$. Now, define the functions $p_x: \set{X}
\mapsto \R$, $p_y: \set{Y} \mapsto \R$, and $p_z: \set{Z} \mapsto \R$
with
%
\begin{equation*}
        p_x(x) \triangleq p( x, y_0, z_0, t_0 )
        \quad \text{ and } \quad
        p_y(y) \triangleq p( x_0, y, z_0, t_0 )
        \quad \text{ and } \quad
        p_z(z) \triangleq p( x_0, y_0, z, t_0 )
\end{equation*}
%
Next, take functions $x: \set{T} \mapsto \R$, $y: \set{T} \mapsto \R$,
and $z: \set{T} \mapsto \R$, and define functions $\hat{p}_x: \set{T}
\mapsto \R$, $\hat{p}_y: \set{T} \mapsto \R$, and $\hat{p}_z: \set{T}
\mapsto \R$ by
%
\begin{equation*}
        \hat{p}_x(t) \triangleq p_x(x(t)) = p( x(t), y_0, z_0, t_0 )
\end{equation*}
%
and
%
\begin{equation*}
        \hat{p}_y(t) \triangleq p_y(y(t)) = p( x_0, y(t), z_0, t_0 )
\end{equation*}
%
and
%
\begin{equation*}
        \hat{p}_z(t) \triangleq p_z(z(t)) = p( x_0, y_0, z(t), t_0 )
\end{equation*}
%
Therefore, we can define partial derivatives
%
\begin{equation*}
        \frac{ \partial p }{ \partial x }
        \triangleq
        \frac{ \partial p_x }{ \partial x }
        \quad \text{ and } \quad
        \frac{ \partial p }{ \partial y }
        \triangleq
        \frac{ \partial p_y }{ \partial y }
        \quad \text{ and } \quad
        \frac{ \partial p }{ \partial z }
        \triangleq
        \frac{ \partial p_z }{ \partial z }
\end{equation*}
%
Clearly, provided differentiability holds, each of these can be
considered functions with domain $\set{X} \times \set{Y} \times \set{Z}
\times \set{T}$. Additionally, if
%
\begin{equation*}
        p(t) \triangleq p( x(t), y(t), z(t), t )
\end{equation*}
%
then it can be shown that the the \emph{total derivative} of $p(t)$ is
%
\begin{equation*}
        \frac{ \total p }{ \total t }
        =
        \frac{ \partial p }{ \partial x }
        \frac{ \total x }{ \total t }
        +
        \frac{ \partial p }{ \partial y }
        \frac{ \total y }{ \total t }
        +
        \frac{ \partial p }{ \partial z }
        \frac{ \total z }{ \total t }
        +
        \frac{ \partial p }{ \partial t }
\end{equation*}
%
This general form extends to functions of any finite number of
variables.

\subsection{Second and Higher Total Derivatives}

Take set $\set{T} \subseteq \R$ and function $f: \set{T} \mapsto \R$.
Define the
notations\symdef[]{Ganalysis.2y2}{second_total_deriv}{$\total^2
f/{\total t}^2$}{second total derivative of function $f$ (\ie, $f''$)}
\symdef[]{Ganalysis.2y3}{third_total_deriv}{$\total^3 f/{\total
t}^3$}{third total derivative of function $f$ (\ie, $f'''$)}
\symdef[]{Ganalysis.2yn}{n_total_deriv}{$\total^n f/{\total
t}^n$}{$n\th$ total derivative of function $f$ (\ie, $f^{(n)}$)}
%
\begin{equation*}
        \frac{ \total^2 f }{ {\total t}^2 } \triangleq f''
        \quad \text{ and } \quad
        \frac{ \total^3 f }{ {\total t}^3 } \triangleq f'''
        \quad \text{ and } \quad
        \frac{ \total^4 f }{ {\total t}^4 } \triangleq f^{(4)}
\end{equation*}
%
and, in general,
%
\begin{equation*}
        \frac{ \total^n f }{ {\total t}^n } \triangleq f^{(n)}
\end{equation*}
%
for all $n \in \{4,5,6,7,\dots\}$.

\subsection{Second Partial Derivatives}

Take sets $\set{X} \subseteq \R$, $\set{Y} \subseteq \R$, $\set{Z}
\subseteq \R$, and $\set{T} \subseteq \R$ and a function $p: \set{X}
\times \set{Y} \times \set{Z} \mapsto \R$. Assuming differentiability,
the partial
derivatives\symdef[]{Ganalysis.2zxy}{second_partial_deriv}{$\partial^2
f/\partial x \partial y$}{partial derivative of function $\partial
f/\partial x$ with respect to $y$}
%
\begin{equation*}
        \frac{ \partial p }{ \partial x }
        \quad \text{ and } \quad
        \frac{ \partial p }{ \partial y }
        \quad \text{ and } \quad
        \frac{ \partial p }{ \partial z }
\end{equation*}
%
are each functions with domain $\set{X} \times \set{Y} \times \set{Z}$.
The notations
%
\begin{equation*}
        \frac{ \partial^2 p }{ \partial x \partial x }
        \quad \text{ and } \quad
        \frac{ \partial^2 p }{ \partial x \partial y }
        \quad \text{ and } \quad
        \frac{ \partial^2 p }{ \partial x \partial z }
\end{equation*}
%
represent the partial derivatives of $\partial p/\partial x$ with
respect to $x$, $y$, and $z$ respectively. However, $\partial^2
p/\partial x \partial x$ is usually denoted $\partial^2 p/{\partial
x}^2$. That is,
%
\begin{equation*}
        \begin{matrix}
                \frac{ \partial^2 p }{ {\partial x}^2 } &
                \frac{ \partial^2 p }{ \partial x \partial y } &
                \frac{ \partial^2 p }{ \partial x \partial z }\\
                \frac{ \partial^2 p }{ \partial y \partial x } &
                \frac{ \partial^2 p }{ {\partial y}^2 } &
                \frac{ \partial^2 p }{ \partial y \partial z }\\
                \frac{ \partial^2 p }{ \partial z \partial x } &
                \frac{ \partial^2 p }{ \partial z \partial y } &
                \frac{ \partial^2 p }{ {\partial z}^2 }
        \end{matrix}
\end{equation*}
%
represent each of the nine second partial derivatives of the function
$p$.

\section{Special Real-Valued Functions}

Here, we discuss a number of commonly used real-valued functions and
classes of real-valued functions.

\subsection{The Exponential Function and Logarithms}
\label{app:math_logarithms}

Now that sequences and series have been defined, it is possible to
construct \emph{Euler's number}, an important constant in mathematics.
This gives us an opportunity to introduce logarithms in terms of Euler's
constant.

\paragraph{Euler's Number:} The symbol
\symdef{Bnumbers.55}{econst}{$e$}{Euler's number (\ie, constant $e
\approx 2.71828182845904523536$)} is often defined by
%
\begin{equation}
        e \triangleq \sum\limits_{i=0}^\infty \frac{1}{n!}
        \label{eq:definition_e}
\end{equation}
%
where the symbol $!$ indicates a
\symdef[\emph{factorial}]{Ganalysis.001}%
{factorial}{${n\bang}$}{factorial of $n$ (\ie,
${n\bang}=1\times2\times\cdots\times n$ with ${0\bang}=1$)}, which is
defined so that for $n \in \W$,
%
\begin{equation*}
        n!
        \triangleq
        \begin{cases}
                1 &\text{if } n=0\\
                1 \times 2 \times 3 \times \cdots \times n
                &\text{if } n > 0
        \end{cases}
\end{equation*}
%
It can be shown that the series in \longref{eq:definition_e} converges.
In fact, using the result summarized by
\longref{eq:theorem_limsupinf_seq}, it can be shown that
%
\begin{equation*}
        e = \lim\limits_{n \to \infty} \left(1 + \frac{1}{n}\right)^n
\end{equation*}
%
For technical reasons, $e$ has a great many applications in mathematics.
Because $e$ is an \emph{irrational number}, its value cannot be written
in a compact fashion. However, it is approximately (\ie, the difference
between this rational number and $e$ is very small)
%
\begin{equation*}
        e \approx 2.71828182845904523536
\end{equation*}
%
where \symdef{Ageneral.1}{approx}{$\approx$}{is approximately equal to}
is a symbol indicating an approximation rather than an equality.

\paragraph{Exponential Function:}
\symdef{Bnumbers.595}{expfunc}{$\exp(x)$}{exponential function (\ie,
$\exp(x) \triangleq e^x$)}The \emph{exponential function} ${\exp}: \R
\mapsto \R_{>0}$ is defined by
%
\begin{equation*}
        \exp(x) \triangleq e^x
\end{equation*}
%
This is a widely used function in science and mathematics.

\paragraph{The Natural Logarithm:} The \emph{natural logarithm} of a
positive real number $x \in \R_{>0}$ is denoted by $\log_e(x)$ or
\symdef{Bnumbers.58}{naturallog}{$\ln(x)$}{natural logarithm of positive
real number $x$ (\ie, $e^{\ln(x)} = x$)} is such that
%
\begin{equation*}
        e^{\ln(x)} = x
\end{equation*}
%
In other words, the natural logarithm is the \emph{exponent} or
\emph{power} to which the number $e$ is \emph{raised} in order to result
in the positive real number $x$. Note that $\ln(1)=0$, $\ln(e)=1$, and
$\ln(e^n)=n$ for all $n \in \W$. Also note that when the logarithm is
viewed as a function ${\ln}: \R_{>0} \mapsto \R$, it is the inverse of
the exponential function $\exp: \R \mapsto \R_{>0}$. In other words, for
all $x \in \R$ and $y \in \R_{>0}$,
%
\begin{equation*}
        \ln(\exp(x)) = x
        \quad \text{ and } \quad
        \exp(\ln(y)) = y
\end{equation*}

\paragraph{The Common Logarithm:} The \emph{common logarithm} of a
positive real number $x \in \R_{>0}$ is denoted by $\log_{10}(x)$ or
\symdef{Bnumbers.57}{commonlog}{$\log(x)$}{common logarithm of positive
real number $x$ (\ie, $10^{\log(x)} = x$)} is such that
%
\begin{equation*}
        10^{\log(x)} = x
\end{equation*}
%
In other words, the common logarithm is the exponent or power to which
the number $10$ is raised in order to result in the positive real number
$x$. Note that $\log(1)=0$, $\log(10)=1$, and $\log(10^n)=n$ for all $n
\in \W$.

\paragraph{The Logarithm:} For \emph{base} $b \in \R_{>0}$, the
\emph{logarithm} of a number \emph{in base $b$} is denoted by
\symdef{Bnumbers.56}{log}{$\log_b(x)$}{logarithm of positive real number
$x$ in base $b$ (\ie, $b^{\log_b(x)} = x$)} is such that
%
\begin{equation*}
        b^{\log_b(x)} = x
\end{equation*}
%
In other words, the logarithm is the exponent or power to which the
positive real number $b$ is raised in order to result in the positive
real number $x$. Note that $\log_b(1)=0$, $\log_b(b)=1$, and
$\log_b(b^n)=n$ for all $n \in \W$. It can be shown that for $a,b,x \in
\R_{>0}$,
%
\begin{equation*}
        \log_c(x) = \frac{ \log_b(x) }{ \log_b(c) }
\end{equation*}
%
In particular,
%
\begin{equation*}
        \log_c(x) = \frac{ \ln(x) }{ \ln(b) }
        \quad \text{ and } \quad
        \log(x) = \frac{ \ln(x) }{ \ln(10) }
\end{equation*}
%
Therefore, the choice of base $b$ is usually arbitrary when the general
term \emph{logarithm} is used.

\subsection{Special Classes of Real Functions}
\label{app:math_special_real_functions}

There are classes of real-valued functions that take forms that have
some useful properties. Here, we introduce two such classes that we will
use frequently.

\paragraph{Polynomials:} Take $n \in \W$ and indexed family $(a_i : i
\in \{0,1,2,\dots,n\})$ where $a_i \in \R$ for all $i \in
\{0,1,2,\dots,n\}$. Take subset $\set{E} \subseteq \R$ and function $f:
\set{E} \mapsto \R$ defined by
%
\begin{equation*}
        f(x) \triangleq a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n
\end{equation*}
%
The function $f$ is called a \emph{polynomial} and is continuous (and
therefore differentiable) on set $\set{E}$. In fact, the derivative $f':
\set{E} \mapsto \R$ is also a polynomial, and so it will also be
continuous (and differentiable) on set $\set{E}$.

\paragraph{Rational Functions:} Take subset $\set{E} \subseteq \R$ and
polynomial functions $p: \set{E} \mapsto \R$ and $q: \set{E} \mapsto
\R$. Define set $\set{E}_0 \subseteq \set{E}$ by 
%
\begin{equation*}
        \set{E}_0 \triangleq \{ x \in \set{E} : q(x) \neq 0 \}
\end{equation*}
%
That is, $\set{E}_0$ is the set of points in $\set{E}$ for which
polynomial $q$ is not $0$. Now take a function $h: \set{E}_0 \mapsto \R$
defined by
%
\begin{equation*}
        h(x) \triangleq \frac{p(x)}{q(x)}
\end{equation*}
%
The function $h$ is called a \emph{rational function} and is continuous
(and therefore differentiable) on the set $\set{E}_0$. In fact, the
derivative $h': \set{E}_0 \mapsto \R$ is also a rational function, and
so it will also be continuous (and differentiable) on set $\set{E}_0$.

\section{Coordinate Vectors and Matrices}
\label{app:math_vectors_matrices}

We now define constructs from linear algebra that have many practical
applications. In particular, we define the coordinate vector space and
spaces of matrices that can shape vectors from that space.

\subsection{The Coordinate Vector Space}
\label{app:math_coord_vector_space}

Let $(\set{F},{+},{\times},0,1)$ be a field with set elements called
scalars. Take $n \in \N$ to be some finite natural number. For
simplicity, assume that $n > 1$. However, note that all of these
definitions naturally extend to the $n=1$ case and should be (\eg,
$\set{F}$ can be substituted for $\set{F}^1$). Note that an element
$\v{x} \in \set{F}^n$ (recall this notation from
\longref{app:math_cartesian_prod}) takes the form
%
\begin{equation*}
        \v{x} = (x_1, x_2, x_3, \dots, x_n)
\end{equation*}
%
where $x_i \in \set{F}$ for all $i \in \{1,2,3,\dots,n\}$. That is, for
vector $\v{y} \in \set{F}^n$, the $i\th$ coordinate of $\v{y}$ is
denoted \symdef{Hvectors.2}{ithcoordinate}{$y_i$}{the $i\th$ coordinate
of vector $\v{y}$}. Next, take any $x,y \in \set{F}^n$. Define the
operation ${+}: \set{F}^n \times \set{F}^n \mapsto \set{F}^n$ such that
%
\begin{equation*}
        \v{x} + \v{y} \triangleq (x_1+y_1,x_2+y_2,x_3+y_3,\dots,x_n+y_n)
\end{equation*}

Also take $a \in \set{F}$. Define the operation ${\times}: \set{F}
\times \set{F}^n \mapsto \set{F}^n$ (which will be represented with
juxtaposition) by
%
\begin{equation*}
        a\v{x} \triangleq (a x_1, a x_2, a x_3, \dots, a x_n)
\end{equation*}

Additionally, define the notation $-\v{x}$ to represent
%
\begin{equation*}
        {-\v{x}} \triangleq ({-x_1},{-x_2},{-x_3},\dots,{-x_n})
\end{equation*}
%
and use $0$ to represent
%
\begin{equation*}
        0 \triangleq \{0\}^n = (0,0,0,\dots,0)
\end{equation*}
%
Of course, $0 \in \set{F}^n$. Also use the notation $\v{x} - \v{y}$ to
represent $\v{x} + {-\v{y}}$. 

Finally, use \symdef{Hvectors.42}{ithbasisvector}{$\v{e}_i$}{the $i\th$
elementary (or standard) basis vector} for $i \in \{1,2,3,\dots,n\}$ to
represent
%
\begin{align*}
        \v{e}_1 &\triangleq (1,0,0,\dots,0)\\
        \v{e}_2 &\triangleq (0,1,0,\dots,0)\\
        \v{e}_3 &\triangleq (0,0,1,\dots,0)\\
        &\vdots\\
        \v{e}_n &\triangleq (0,0,0,\dots,1)
\end{align*}
%
These are called \emph{basis vectors} for $\set{F}^n$ because, using the
definitions above, for any vector $\v{z} \in \set{F}^n$, there exists an
$n$-tuple $(a_1,a_2,a_3,\dots,a_n)$ where $a_i \in \set{F}$ for all $i
\in \N$ (\ie, scalars) such that
%
\begin{equation*}
        \v{z} 
        = 
        a_1 \v{e}_1 + a_2 \v{e}_2 + a_3 \v{e}_3 \cdots + a_n \v{e}_n
\end{equation*}
%
In particular, for these particular basis vectors,
$(a_1,a_2,a_3,\dots,a_n)=(z_1,z_2,z_3,\dots,z_n)$. Thus, these are
called the \emph{elementary (or standard) basis vectors} for
$\set{F}^n$.

Note that $\v{x} + 0 = 0 + \v{x} = \v{x}$. Also note that $\v{x} - \v{x}
= {-\v{x}} + \v{x} = 0$. In this case, it is easy to show that for all
$a,b \in \set{F}$ and $\v{x},\v{y} \in \set{F}^n$,
%
\begin{itemize}
        \item $(\set{F}^n,{+})$ is a commutative group
        \item $a( x + y ) = ax + ay$
        \item $(a + b)x = ax + bx$
        \item $a(bx) = (ab)x$
        \item $1x = x$
\end{itemize}
%
Because of this, $\set{F}^n$ is a vector space over the field $\set{F}$.
In particular, we call $\set{F}^n$ a \emph{coordinate (vector) space}.
Elements of $\set{F}^n$ are thus called \emph{vectors} and elements of
$\set{F}$ are thus called \emph{scalars}. We will typically use the
coordinate vector space $\R^n$, which is called the \emph{real
coordinate space}. Later, we will endow $\R^n$ with a particular notion
of distance, in which $\R^n$ will become a \emph{Euclidean space}.

\paragraph{Notation and the Covector Space:} Take $n \in \N$ and the
coordinate vector space $\set{F}^n$ with operations defined above. Take
a vector $\v{x} \in \set{F}^n$. Rather than denoting $\v{x} = (x_1, x_2,
x_3, \dots, x_n)$, use the \emph{matrix notation}
%
\begin{equation*}
        \v{x} 
        = 
        \begin{bmatrix} 
                x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n
        \end{bmatrix}
\end{equation*}
%
That is, denote $\v{x}$ as a \emph{column} vector and call it an
\emph{$n$-dimensional vector} or simply an \emph{$n$-vector}. Now call
$\set{F}^{1 \times n}$ a vector space identical to $\set{F}^n$ except
with elements represented as \emph{row} vectors that are otherwise
called \emph{covectors}. That is, call $\set{F}^{1 \times n}$ a
\emph{covector space}. Take covector $\v{y} \in \set{F}^{1 \times n}$.
The covector $\v{y}$ is represented by
%
\begin{equation*}
        \v{y} 
        = 
        \begin{bmatrix} 
                y_1 & y_2 & y_3 & \dots & y_n
        \end{bmatrix}
\end{equation*}
%
Therefore, call $\v{y}$ an \emph{$n$-dimensional covector} or simply an
\emph{$n$-covector}. Now define the \emph{transpose} operations ${\T}:
\set{F}^n \mapsto \set{F}^{1 \times n}$ and ${\T}: \set{F}^{1 \times n}
\mapsto \set{F}^n$ so
that\symdef[]{Hvectors.3}{xtranspose}{$\v{x}^\T$}{the transpose of
vector or covector $\v{x}$ (\ie, if $\v{x}$ is an $n$-vector then $\v{x}
= [x_1, x_2, \dots, x_n]^\T)$}
%
\begin{equation*}
        \v{x}^\T
        =
        \begin{bmatrix}
                x_1 & x_2 & x_3 & \dots & x_n
        \end{bmatrix}
\end{equation*}
%
and
%
\begin{equation*}
        \v{y}^\T
        =
        \begin{bmatrix}
                y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n
        \end{bmatrix}
\end{equation*}
%
Therefore, $\v{x} = [ x_1, x_2, x_3, \dots, x_n ]^\T$, and so this
notation will often be used when it is more convenient to denote vectors
horizontally. That is, all vectors are transposes of covectors and all
covectors are transposes of vectors. Because of this duality, the vector
space $\set{F}^n$ will sometimes be denoted $\set{F}^{n \times 1}$.
These topics will be generalized in \longref{app:math_matrices}.

\paragraph{Multiplication of Vectors and Covectors:} Take $n \in \N$ and
the coordinate vector space $\set{F}^n$ and coordinate covector space
$\set{F}^{1 \times n}$. The multiplication operators $\times: \set{F}^{1
\times n} \times \set{F}^n \mapsto \set{F}$ and $\times: \set{F} \times
\set{F}^{1 \times n} \mapsto \set{F}$ (both denoted by juxtaposition)
are defined such that for vectors $\v{x},\v{y} \in \set{F}^n$,
%
\begin{equation*}
        \v{x} \v{y}^\T
        \triangleq
        \v{x}^\T \v{y}
        \triangleq
        x_1 y_1 + x_2 y_2 + x_3 y_3 + \dots + x_n y_n
        =
        \sum\limits_{i=1}^n x_i y_i
\end{equation*}
%
Note that
%
\begin{align*}
        \v{x}^\T \v{y} = 
        \v{y}^\T \v{x} = 
        \v{x} \v{y}^\T = 
        \v{y} \v{x}^\T
\end{align*}
%
and, of course, all of these products are scalars by definition.

\subsection{Real Inner-Product Spaces}

Take $n \in \N$ and real coordinate vector space $\R^n$. Define a
bilinear function \symdef[]{Hvectors.4}{innerprod}{$\langle \v{x}, \v{y}
\rangle$}{the inner product of vectors $\v{x}$ and $\v{y}$}$\langle
\cdot , \cdot \rangle : \R^n \times \R^n \mapsto \R$. That is, for any
scalars $a,b \in \set{F}$ and any vectors $\v{x},\v{y},\v{z} \in \R^n$,
%
\begin{equation*}
        \langle a \v{x} + b \v{y}, \v{z} \rangle
        =
        a \langle \v{x}, \v{z} \rangle 
        +
        b \langle \v{y}, \v{z} \rangle
\end{equation*}
%
and
%
\begin{equation*}
        \langle \v{x}, a \v{y} + b \v{z} \rangle
        =
        a \langle \v{x}, \v{y} \rangle 
        +
        b \langle \v{x}, \v{z} \rangle
\end{equation*}
%
In this case, $\langle \cdot, \cdot \rangle$ is called an \emph{real
inner product}. This may sometimes be denoted with the \emph{Dirac
inner-product notation} $\langle \cdot | \cdot \rangle$, which replaces
the comma (\ie, $,$) with a vertical bar (\ie, $|$). If $\R^n$ is
endowed with a real inner product then it is called a \emph{real
inner-product space}.

\paragraph{Dot Product:} For $n \in \N$, on $\R^n$, the standard inner
product is the \emph{dot product}, which is defined for any $\v{x},\v{y}
\in \R^n$ by
%
\begin{align*}
        \langle \v{x}, \v{y} \rangle
        &\triangleq
        x_1 y_1 + x_2 y_2 + x_3 y_3 + \dots + x_n y_n\\
        &=
        \sum\limits_{i=1}^n x_i y_i\\
        &=
        \v{x}^\T \v{y}
        =
        \v{x} \v{y}^\T
        =
        \v{y}^\T \v{x}
        =
        \v{y} \v{x}^\T
\end{align*}
%
That is, the dot product is simply the multiplication of one vector by
the transpose of the other and results in a scalar. This motivates the
\emph{Dirac inner-product notation} which defines $\langle \v{x} |$ and
$| \v{y} \rangle$ by
%
\begin{align*}
        \langle \v{x} | &\triangleq \v{x}^\T\\
        | \v{y} \rangle &\triangleq \v{y}
\end{align*}
%
The inner product denoted $\langle \v{x} | \v{y} \rangle$ is defined by
%
\begin{equation*}
        \langle \v{x} | \v{y} \rangle 
        \triangleq 
        \langle \v{x} | | \v{y} \rangle
        =
        \v{x}^\T \v{y}
\end{equation*}
%
Thus, it is natural to use this notation when using the dot product as
an inner product.

\subsection{Normed Vector Spaces}
\label{app:math_normed_vector_spaces}

Take $n \in \N$ and coordinate vector space $\set{F}^n$. However, assume
that $\set{F}$ is a subfield of $\R$. Therefore, $\set{F}$ is an ordered
field with an absolute value defined. This definition can be expanded so
that $\set{F}$ can be a subfield of the \emph{complex numbers}, which we
do not discuss, since the complex numbers also have an absolute value
function. However, we restrict our definition to the real case.
\symdef[]{Hvectors.401}{realnorm}{$\ppipe \v{x} \ppipe$}{the norm of
vector $\v{x}$}Define a function $\|\cdot\| : \set{F}^n \mapsto \R$ such
that for all $a \in \set{F}$ and all $\v{x},\v{y} \in \set{F}^n$,
%
\begin{enumerate}[(i)]
        \item $\| a \v{x} \| = |a| \| \v{x} \|$
        \item $\| \v{x} + \v{y} \| \leq \|\v{x}\|+\|\v{y}\|$
        \item $\| \v{x} \| = 0$ if and only if $\v{x} = 0$
\end{enumerate}
%
In this case, $\|\cdot\|$ is called a \emph{(real) norm} and $\set{F}^n$
is called a \emph{normed vector space}. Note that $\|\v{x}\| \geq 0$ for
all $\v{x} \in \set{F}^n$ with equality only when $\v{x}=0$. In
\longref{app:math_euclidean_space}, we will demonstrate how norms can be
defined on $\R^n$, a frequently used coordinate vector space.

\paragraph{Norms as Metrics:} For a normed vector space $\set{F}^n$,
define $d: \set{F}^n \times \set{F}^n \mapsto \set{F}$ such that for any
$\v{x},\v{y} \in \set{F}^n$,
%
\begin{equation*}
        d(\v{x},\v{y}) 
        = 
        \| \v{x} - \v{y} \|
\end{equation*}
%
It can be shown that $d$ is a metric. Therefore, any normed vector
space is a metric space. In fact, since any inner-product space is a
normed vector space, any inner-product space is a metric space. 

\paragraph{Metrics as Norms and Boundedness:} Take metric space
$(\set{V},d)$ where $\set{F}$ is a subfield of $\R$ and $\set{V}$ is a
vector space over $\set{F}$. Assume that the metric $d$ satisfies the
additional properties
%
\begin{enumerate}[(i)]
        \item $d(\v{x},\v{y}) = d(\v{x}+\v{z},\v{y}+\v{z})$
        \item $d(a\v{x},a\v{y}) = |a|d(\v{x},\v{y})$
                \label{item:homogenous_metric}
\end{enumerate}
%
for all $\v{x},\v{y},\v{z} \in \set{V}$ and all $a \in \set{F}$. Now,
for all $\v{x} \in \set{V}$, define $\|\v{x}\|$ by
%
\begin{equation*}
        \|\v{x}\|
        \triangleq
        d(\v{x},0)
\end{equation*}
%
where $0$ is the additive identity for $(\set{V},{+})$. It can be shown
that this $\|\cdot\|$ satisfies the conditions required for being a
norm. That is, under these restrictions on a metric space, the distance
from zero can be considered to be the norm (\ie, the length) of a
vector. Finally, note the following two remarks.
%
\begin{itemize}
        \item This definition of a norm induced by a metric can be
                weakened so that $\set{F}$ may also be the subfield of
                the \emph{complex numbers}, which are not ordered.
                However, we do not introduce the complex numbers here
                and so we force $\set{F}$ to be a subfield of $\R$.
        \item Metric spaces with this quality are metric spaces in which
                \emph{bounded} with respect to order is equivalent to
                \emph{bounded} with respect to metric. That is, the norm
                induced by this metric gives the distance away from
                zero; however, the notion of order can be viewed as a
                distance away from zero as well. Thus, being bounded in
                one sense is equivalent to being bounded in the other
                sense.
\end{itemize}
%
Therefore, many metric spaces are also normed vector spaces.

\subsection{The Euclidean Space}
\label{app:math_euclidean_space}

We will frequently use the \emph{Euclidean space} $\R^n$, which is
defined to be a vector space equipped with a special inner product,
norm, and metric. In fact, what is special about the Euclidean space is
how distances are defined. This space captures the familiar notions of
distance. Before defining this space precisely, we must show how norms,
inner products, and metrics are related (on $\R^n$). 

\paragraph{Norm Induced by Inner-Product:} Take the real inner-product
space $\R^n$. By the properties of the inner product, the a norm
$\|\cdot\|$ can be defined so that for any $\v{x} \in \R^n$,
%
\begin{equation*}
        \|\v{x}\| \triangleq \sqrt{\langle \v{x}, \v{x} \rangle}
\end{equation*}
%
and, of course,
%
\begin{equation*}
        \|\v{x}\|^2 = \langle \v{x}, \v{x} \rangle
\end{equation*}
%
This is known as the norm \emph{induced by} the inner product. It can be
shown that for all $\v{x} \in \R^n$, $\|\v{x}\| \geq 0$ where $\|\v{x}\|
> 0$ if and only if $\v{x} \neq 0$. Thus, every inner-product space is
also a normed space. 

\paragraph{2-Norm Induced by Dot Product:} Take the real inner-product
space $\R^n$ where the inner product is taken to be the dot product.
\symdef[]{Hvectors.402}{2norm}{$\ppipe \v{x} \ppipe_2$}{the Euclidean
norm of vector $\v{x}$ (\ie, the norm induced by the dot product)}In
other words, for all $\v{x},\v{y} \in \R^n$,
%
\begin{equation*}
        \langle \v{x}, \v{y} \rangle = \v{x}^\T \v{y}
\end{equation*}
%
The \emph{$2$-norm} or the \emph{Euclidean norm}, denoted $\|\cdot\|_2$,
is the norm induced by this inner product. That is, for $\v{x} \in
\R^n$,
%
\begin{equation*}
        \|\v{x}\|_2
        \triangleq \sqrt{ \v{x}^\T \v{x} }
        = \sqrt{ x_1^2 + x_2^2 + x_3^2 + \dots + x_n^2 }
\end{equation*}
%
and, of course,
%
\begin{equation*}
        \|\v{x}\|_2^2
        = \v{x}^\T \v{x} = x_1^2 + x_2^2 + x_3^2 + \dots + x_n^2
\end{equation*}

\paragraph{The Euclidean Metric:} Take the real inner-product space
$\R^n$ with the dot product and the Euclidean norm (\ie, the $2$-norm).
The \emph{Euclidean metric} $d: \R^n \times \R^n \mapsto \R$ is defined
so that for any $\v{x},\v{y} \in \R^n$,
%
\begin{align*}
        d( \v{x}, \v{y} )
        &\triangleq
        \| \v{x} - \v{y} \|_2\\
        &=
        \sqrt{ ( \v{x} - \v{y} )^\T ( \v{x} - \v{y} ) }\\
        &= 
        \sqrt{ 
        (x_1-y_1)^2 + (x_2-y_2)^2 + (x_3-y_3)^2 + \cdots + (x_n-y_n)^2
        }
\end{align*}
%
and so
%
\begin{equation*}
        d( \v{x}, \v{y} )^2
        = 
        (x_1-y_1)^2 + (x_2-y_2)^2 + (x_3-y_3)^2 + \cdots + (x_n-y_n)^2
\end{equation*}
%
This is a very familiar distance function. Note that
%
\begin{itemize}
        \item $d(\v{x},\v{y}) = d(\v{x}+\v{z},\v{y}+\v{z})$
        \item $d(a\v{x},a\v{y}) = |a|d(\v{x},\v{y})$
                \label{item:homogenous_euclidean_metric}
\end{itemize}
%
for all $\v{x},\v{y},\v{z} \in \R^n$ and $a \in \R$. By the discussion
in \longref{app:math_normed_vector_spaces}, $\R^n$ is a special metric
space where the term \emph{bounded} with respect to order is equivalent
to the term \emph{bounded} with respect to metric.

\paragraph{The Euclidean Space:} Take $n \in \N$ and real coordinate
space $\R^n$. Equip $\R^n$ with the dot product, the Euclidean norm
(\ie, the $2$-norm), and the Euclidean metric. Of course, $\R^n$ is a
metric space and thus a Hausdorff topological space. Additionally, under
these conditions, \symdef[]{Bnumbers.545}{euclideanspace}{$\R^n$}{the
Euclidean $n$-space} is called the \emph{Euclidean space} or the
\emph{Euclidean $n$-space}. 

\paragraph{The Euclidean Topology and Compact Sets:} The set of sets
$\setset{T} \subseteq \Pow(\R^n)$ defined by
%
\begin{equation*}
        \setset{T} 
        \triangleq
        \{ \set{X}_1 \times \set{X}_2 \times \cdots \times \set{X}_n :
        \set{X}_i \text{ is an open set in } \R \}
\end{equation*}
%
is the standard topology (\ie, the open sets) for $\R^n$ (\ie,
$(\R^n,\set{T})$ is a topological space). Of course, we define open sets
in $\R$ using the standard metric for $\R$ (\ie, $d(x,y)=|x-y|$ for all
$x,y \in \R$). The set $\setset{T}$ is
called the \emph{box topology} for reasons involving a geometric
interpretation of the shape of its open sets. Because $n$ is a finite
number, $\setset{T}$ will also be called a \emph{product topology} for
reasons outside the scope of this document. A subset $\set{X} \subseteq
\R^n$ is compact if and only if it is closed and bounded, using the
standard definitions of a closed set and a bounded set in a metric
space (though, in this space, boundedness in the sense of order is also
applicable).

\subsection{Matrices}
\label{app:math_matrices}

Take $n,m \in \N$, the vector space $\set{F}^n$, and the $m$-tuple of
$\set{F}^n$ $n$-vectors $(\v{x}^1,\v{x}^2,\v{x}^3,\dots,\v{x}^m)$.
Recall that 
%
\begin{equation*}
        \v{x}^1
        =
        \begin{bmatrix}
                x_1^1 \\ 
                x_2^1 \\
                x_3^1 \\
                \vdots\\
                x_n^1
        \end{bmatrix}
\end{equation*}
%
Collect each of the $m$ vectors into an \emph{$n$-by-$m$ matrix}
$\mat{X}$ with $n$ rows and $m$ columns so that
%
\begin{equation*}
        \mat{X}
        \triangleq
        \begin{bmatrix}
                x_1^1 & x_1^2 & x_1^3 & \cdots & x_1^m \\
                x_2^1 & x_2^2 & x_2^3 & \cdots & x_2^m \\
                x_3^1 & x_3^2 & x_3^3 & \cdots & x_3^m \\
                \vdots & \vdots & \vdots & \ddots & \vdots \\
                x_n^1 & x_n^2 & x_n^3 & \cdots & x_n^m
        \end{bmatrix}
\end{equation*}
%
which can be written more compactly as
%
\begin{equation*}
        \mat{X}
        =
        \begin{bmatrix}
                \v{x}^1 & \v{x}^2 & \v{x}^3 & \cdots & \v{x}^m
        \end{bmatrix}
\end{equation*}
%
\symdef[]{Bnumbers.5451}{realmatrices}{$\R^{n \times m}$}{space of
$n$-by-$m$ real matrices}All $n$-by-$m$ matrices are said to be elements
of the $\set{F}^{n \times m}$ space. Matrices from a space of the form
$\set{F}^{n \times n}$ (\ie, $n=m$) are said to be \emph{square
matrices}.

Now take the covector space $\set{F}^{1 \times n}$ and the $m$-tuple of
$\set{F}^{1 \times n}$ $n$-covectors
$(\v{y}^1,\v{y}^2,\v{y}^3,\dots,\v{y}^m)$. Collect each of the $m$
covectors into an \emph{$m$-by-$n$ matrix} $\mat{Y}$ with $m$ rows and
$n$ columns so that
%
\begin{equation*}
        \mat{Y}
        \triangleq
        \begin{bmatrix}
                y_1^1 & y_2^1 & y_3^1 & \cdots & y_n^1 \\
                y_1^2 & y_2^2 & y_3^2 & \cdots & y_n^2 \\
                y_1^3 & y_2^3 & y_3^3 & \cdots & y_n^3 \\
                \vdots & \vdots & \vdots & \ddots & \vdots \\
                y_1^m & y_2^m & y_3^m & \cdots & y_n^m
        \end{bmatrix}
\end{equation*}
%
which can be written more compactly as
%
\begin{equation*}
        \mat{Y}
        =
        \begin{bmatrix}
                \v{y}^1 \\ \v{y}^2 \\ \v{y}^3 \\ \vdots \\ \v{y}^m
        \end{bmatrix}
\end{equation*}
%
All $m$-by-$n$ matrices are said to be elements of the $\set{F}^{m
\times n}$ space.

\symdef[]{Hvectors.31}{mattranspose}{$\mat{A}^\T$}{the transpose of
matrix $\mat{A}$}Now define the transpose operators $\T: \set{F}^{n
\times m} \mapsto \set{F}^{m \times n}$ and $\T: \set{F}^{m \times n}
\mapsto \set{F}^{n \times m}$ such that
%
\begin{equation*}
        \mat{X}^\T
        =
        \begin{bmatrix}
                {\v{x}^1}^\T \\ 
                {\v{x}^2}^\T \\ 
                {\v{x}^3}^\T \\ 
                \vdots \\ 
                {\v{x}^m}^\T
        \end{bmatrix}
        =
        \begin{bmatrix}
                x_1^1 & x_2^1 & x_3^1 & \cdots & x_n^1 \\
                x_1^2 & x_2^2 & x_3^2 & \cdots & x_n^2 \\
                x_1^3 & x_2^3 & x_3^3 & \cdots & x_n^3 \\
                \vdots & \vdots & \vdots & \ddots & \vdots \\
                x_1^m & x_2^m & x_3^m & \cdots & x_n^m
        \end{bmatrix}
\end{equation*}
%
and
%
\begin{equation*}
        \mat{Y}^\T
        =
        \begin{bmatrix}
                {\v{y}^1}^\T &
                {\v{y}^2}^\T & 
                {\v{y}^3}^\T & 
                \cdots & 
                {\v{y}^m}^\T
        \end{bmatrix}
        =
        \begin{bmatrix}
                y_1^1 & y_1^2 & y_1^3 & \cdots & y_1^m \\
                y_2^1 & y_2^2 & y_2^3 & \cdots & y_2^m \\
                y_3^1 & y_3^2 & y_3^3 & \cdots & y_3^m \\
                \vdots & \vdots & \vdots & \ddots & \vdots \\
                y_n^1 & y_n^2 & y_n^3 & \cdots & y_n^m
        \end{bmatrix}
\end{equation*}

\paragraph{Matrix Addition:} Take $m,n \in \N$. Take matrices
$\mat{X},\mat{Y} \in \set{F}^{n \times m}$ denoted to make their columns
explicit. That is, 
%
\begin{equation*}
        \mat{X}
        =
        \begin{bmatrix}
                \v{X}^1 & \v{X}^2 & \v{X}^3 & \cdots & \v{X}^m
        \end{bmatrix}
        \quad \text{ and } \quad
        \mat{Y}
        =
        \begin{bmatrix}
                \v{Y}^1 & \v{Y}^2 & \v{Y}^3 & \cdots & \v{Y}^m
        \end{bmatrix}
\end{equation*}
%
Now, define the matrix addition operator ${+}: \set{F}^{n \times m}
\times \set{F}^{n \times m} \mapsto \set{F}^{n \times m}$ so that
%
\begin{equation*}
        \mat{X} + \mat{Y}
        \triangleq
        \begin{bmatrix}
                (\v{X}^1 + \v{Y}^1) & 
                (\v{X}^2 + \v{Y}^2) & 
                (\v{X}^3 + \v{Y}^3) & 
                \cdots & 
                (\v{X}^m + \v{Y}^m)
        \end{bmatrix}
\end{equation*}
%
Equivalently, define the matrix addition operator ${+}: \set{F}^{m
\times n} \times \set{F}^{m \times n} \mapsto \set{F}^{m \times n}$ so
that for $\mat{X},\mat{Y} \in \set{F}^{m \times n}$,
%
\begin{equation*}
        \mat{X}^\T + \mat{Y}^\T
        \triangleq
        \begin{bmatrix}
                (\v{X}^1 + \v{Y}^1)^\T \\
                (\v{X}^2 + \v{Y}^2)^\T \\
                (\v{X}^3 + \v{Y}^3)^\T \\
                \vdots \\ 
                (\v{X}^m + \v{Y}^m)^\T
        \end{bmatrix}
\end{equation*}
%
Note that matrix addition is commutative and associative.

\paragraph{Scalar (Matrix) Multiplication:} Take $m,n \in \N$. Take
matrix $\mat{X} \in \set{F}^{n \times m}$ denoted to make its columns
explicit. That is, 
%
\begin{equation*}
        \mat{X}
        =
        \begin{bmatrix}
                \v{X}^1 & \v{X}^2 & \v{X}^3 & \cdots & \v{X}^m
        \end{bmatrix}
\end{equation*}
%
Now, define the scalar multiplication operator ${\times}: \set{F} \times
\set{F}^{n \times m} \mapsto \set{F}^{n \times m}$ so that for $a \in
\set{F}$,
%
\begin{equation*}
        a \mat{X}
        \triangleq
        \begin{bmatrix}
                a \v{X}^1 & a \v{X}^2 & a \v{X}^3 & \cdots & a \v{X}^m
        \end{bmatrix}
\end{equation*}
%
Equivalently, define the scalar multiplication operator ${\times}:
\set{F} \times \set{F}^{m \times n} \mapsto \set{F}^{m \times n}$ so
that for $a \in \set{F}$,
%
\begin{equation*}
        a \mat{X}^\T
        \triangleq
        \begin{bmatrix}
                ( a \v{X}^1 )^\T \\
                ( a \v{X}^2 )^\T \\
                ( a \v{X}^3 )^\T \\ 
                \vdots \\
                ( a \v{X}^m )^\T
        \end{bmatrix}
\end{equation*}
%
In other words, a scalar multiplied by a matrix will multiply each
scalar element of the matrix by the scalar.

\paragraph{Matrix Multiplication:} Take $k,m,n \in \N$. Take matrices
$\mat{X} \in \set{F}^{m \times k}$ and $\mat{Y} \in \set{F}^{m \times
n}$ denoted to make their columns explicit. That is, 
%
\begin{equation*}
        \mat{X}
        =
        \begin{bmatrix}
                \v{X}^1 & \v{X}^2 & \v{X}^3 & \cdots & \v{X}^k
        \end{bmatrix}
        \quad \text{ and } \quad
        \mat{Y}
        =
        \begin{bmatrix}
                \v{Y}^1 & \v{Y}^2 & \v{Y}^3 & \cdots & \v{Y}^n
        \end{bmatrix}
\end{equation*}
%
Note that $\mat{X}^\T \in \set{F}^{k \times m}$. Now, define the matrix
multiplication operator ${\times}: \set{F}^{k \times m} \times
\set{F}^{m \times n} \mapsto \set{F}^{k \times n}$ (using juxtaposition
notation) so that
%
\begin{equation*}
        \mat{X}^\T \mat{Y}
        \triangleq
        \begin{bmatrix}
                {\v{X}^1}^\T \v{Y}^1 &
                {\v{X}^1}^\T \v{Y}^2 &
                {\v{X}^1}^\T \v{Y}^3 &
                \cdots &
                {\v{X}^1}^\T \v{Y}^n \\
                {\v{X}^2}^\T \v{Y}^1 &
                {\v{X}^2}^\T \v{Y}^2 &
                {\v{X}^2}^\T \v{Y}^3 &
                \cdots &
                {\v{X}^2}^\T \v{Y}^n \\
                {\v{X}^3}^\T \v{Y}^1 &
                {\v{X}^3}^\T \v{Y}^2 &
                {\v{X}^3}^\T \v{Y}^3 &
                \cdots &
                {\v{X}^3}^\T \v{Y}^n \\
                \vdots & \vdots & \vdots & \ddots & \vdots \\
                {\v{X}^k}^\T \v{Y}^1 &
                {\v{X}^k}^\T \v{Y}^2 &
                {\v{X}^k}^\T \v{Y}^3 &
                \cdots &
                {\v{X}^k}^\T \v{Y}^n
        \end{bmatrix}
\end{equation*}
%
Equivalently, define the matrix multiplication operator ${\times}:
\set{F}^{n \times m} \times \set{F}^{m \times k} \mapsto \set{F}^{n
\times k}$ (using juxtaposition notation) so that 
%
\begin{equation*}
        \mat{Y}^\T \mat{X}
        \triangleq
        \begin{bmatrix}
                {\v{Y}^1}^\T \v{X}^1 &
                {\v{Y}^1}^\T \v{X}^2 &
                {\v{Y}^1}^\T \v{X}^3 &
                \cdots &
                {\v{Y}^1}^\T \v{X}^k \\
                {\v{Y}^2}^\T \v{X}^1 &
                {\v{Y}^2}^\T \v{X}^2 &
                {\v{Y}^2}^\T \v{X}^3 &
                \cdots &
                {\v{Y}^2}^\T \v{X}^k \\
                {\v{Y}^3}^\T \v{X}^1 &
                {\v{Y}^3}^\T \v{X}^2 &
                {\v{Y}^3}^\T \v{X}^3 &
                \cdots &
                {\v{Y}^3}^\T \v{X}^k \\
                \vdots & \vdots & \vdots & \ddots & \vdots \\
                {\v{Y}^n}^\T \v{X}^1 &
                {\v{Y}^n}^\T \v{X}^2 &
                {\v{Y}^n}^\T \v{X}^3 &
                \cdots &
                {\v{Y}^n}^\T \v{X}^k
        \end{bmatrix}
\end{equation*}
%
Note that while $\mat{X}^\T \mat{Y}$ and $\mat{Y}^\T \mat{X}$ are
defined, all of
%
\begin{equation*}
        \mat{X} \mat{X}
        \qquad
        \mat{Y} \mat{Y}
        \qquad
        \mat{X}^\T \mat{X}^\T
        \qquad
        \mat{Y}^\T \mat{Y}^\T
        \qquad
        \mat{X} \mat{Y}
        \qquad
        \mat{Y} \mat{X}
        \qquad
        \mat{X} \mat{Y}^\T
        \qquad
        \mat{Y} \mat{X}^\T
\end{equation*}
%
are not defined in general. However, if $k=n$ then
%
\begin{equation*}
        \mat{X} \mat{Y}^\T
        \qquad
        \mat{X}^\T \mat{Y}
\end{equation*}
%
are always defined. However, the former results in a $k$-by-$k$ matrix
and the latter results in an $m$-by-$m$ matrix and so clearly $\mat{X}
\mat{Y}^\T \neq \mat{X}^\T \mat{Y}$ if $m \neq n$. In fact, this
comparison is nonsense unless $k=m=n$. In that case, $\set{X}$ and
$\set{Y}$ are \emph{square matrices}. If $\set{X}$ and $\set{Y}$ are
$n$-by-$n$ square matrices, then so are $\mat{X}^\T$ and $\mat{Y}^\T$,
and so any combination of $\mat{X}$, $\mat{Y}$, $\mat{X}^\T$, and
$\mat{Y}^\T$ can be multiplied in any order. 

\subsection{Square Matrices}

For a field $(\set{F},+,\times,0,1)$ and $n \in \N$, the square matrices
in the space $\set{F}^{n \times n}$ have some special properties.

\paragraph{Square Matrix Multiplication:} Take $n \in \N$ and matrices
$\mat{X},\mat{Y} \in \set{F}^{n \times n}$. All of $\mat{X}$, $\mat{Y}$,
$\mat{X}^\T$, and $\mat{Y}^\T$ are square $n$-by-$n$ matrices, and
therefore
%
\begin{equation*}
        \mat{X} \mat{X}
        \qquad
        \mat{Y} \mat{Y}
        \qquad
        \mat{X}^\T \mat{X}^\T
        \qquad
        \mat{Y}^\T \mat{Y}^\T
        \qquad
        \mat{X} \mat{Y}
        \qquad
        \mat{Y} \mat{X}
        \qquad
        \mat{X} \mat{Y}^\T
        \qquad
        \mat{Y} \mat{X}^\T
\end{equation*}
%
are all defined. However, in general $\mat{X} \mat{Y} \neq \mat{Y}
\mat{X}$. In other words, matrix multiplication is \emph{not}
communicative. However, it can be shown that matrix multiplication is
associative. 

\paragraph{Square Matrix Identity:} Take $n \in \N$ and a vector space
$\set{F}^n$. Recall the definitions of $\v{e}_i$ for all $i \in \N$ from
\longref{app:math_coord_vector_space}. Define the \emph{identity matrix}
$\mat{I}_n \in \set{F}^{n \times n}$ as
%
\begin{equation*}
        \I_n
        \triangleq
        \begin{bmatrix}
                \v{e}_1 &
                \v{e}_2 &
                \v{e}_3 &
                \cdots &
                \v{e}_n
        \end{bmatrix}
        =
        \begin{bmatrix}
                1 & 0 & 0 & \cdots & 0 \\
                0 & 1 & 0 & \cdots & 0 \\
                0 & 0 & 1 & \cdots & 0 \\
                \vdots & \vdots & \vdots & \ddots & \vdots \\
                0 & 0 & 0 & \cdots & 1
        \end{bmatrix}
\end{equation*}
%
Now take $m \in \N$ and matrices $\mat{X} \in \set{F}^{m \times n}$ and
$\mat{Y} \in \set{F}^{n \times m}$. It can easily be verified that
%
\begin{equation*}
        \mat{X} \I_n = \mat{X}
        \quad \text{ and } \quad
        \I_n \mat{Y} = \mat{Y}
\end{equation*}
%
In fact, for a square matrix $\mat{Z} \in \set{F}^{n \times n}$,
%
\begin{equation*}
        \mat{Z} \I_n = \I_n \mat{Z} = \mat{Z}
\end{equation*}
%
and thus $\I_n$ is known as the \emph{identity matrix} for $n$-by-$n$
square matrices. Note that the notation
\symdef{Hvectors.45}{identitymatrix}{$\I$}{the identity matrix} will
often be used instead of $\I_n$ because the value of $n$ will usually be
obvious in the context.

\paragraph{Square Matrices as Unitary Associative Algebra:} Clearly,
with the operations defined for matrices, for any $n \in \N$, the space
$\set{F}^{n \times n}$ not only forms a vector space over the field
$\set{F}$ but also forms a unitary associative algebra over the field
$\set{F}$ (\ie, a unitary associative $\set{F}$-algebra). This means
that many aspects of familiar arithmetic can be easily applied to
matrices. 

\subsection{Matrices as Vector Functions}

Take $m,n \in \N$. Note that the vector space $\set{F}^n$ can be viewed
as a space of $n$-by-$1$ matrices and the covector space $\set{F}^{1
\times n}$ can be viewed as a space of $1$-by-$n$ matrices. Therefore,
take a matrix $\mat{A} \in \set{F}^{m \times n}$ and vector $\v{x} \in
\set{F}^n$ and covector $\v{y} \in \set{F}^{1 \times n}$. There exists
vector $\v{q} \in \set{F}^m$ and covector $\v{r} \in \set{F}^{1 \times
m}$ such that
%
\begin{equation*}
        \v{q} = \mat{A} \v{x}
        \quad \text{ and } \quad
        \v{r} = \v{y} \mat{A}
\end{equation*}
%
In other words, the matrix $\mat{A}$ can be viewed as a function that
translates one vector into another.

\paragraph{Square Matrices as Functions:} Take $n \in \N$, vector space
$\set{F}^n$, and unitary associative algebra $\mat{F}^{n \times n}$.
Take a matrix $\mat{A} \in \mat{F}^{n \times n}$. Take vector $\v{x} \in
\set{F}^n$ and covector $\v{y} \in \set{F}^{1 \times n}$. There exists a
vector $\v{q} \in \set{F}^n$ and a covector $\v{r} \in \set{F}^{1 \times
n}$ such that
%
\begin{equation*}
        \v{q} = \mat{A} \v{x}
        \quad \text{ and } \quad
        \v{r} = \v{y} \mat{A}
\end{equation*}
%
Therefore, the matrix $\mat{A}$ can be thought of as a function which
reshapes vectors from $\set{F}^n$ (or covectors from $\set{F}^{1 \times
n}$) to other vectors in $\set{F}^n$ (or other covectors from
$\set{F}^{1 \times n}$). Additionally, there exists a scalar $a \in
\set{F}$ such that
%
\begin{equation*}
        a = \v{x}^\T \mat{A} \v{x}
\end{equation*}
%
This is known as a \emph{quadratic form}. Thus, the square matrix
$\mat{A}$ can also be thought of as a function which somehow converts
vectors from $\set{F}^n$ to scalars from $\set{F}$.

\subsection{The Unitary Associative Real Algebra}

Take $n \in \N$. Clearly, \symdef{Bnumbers.5452}{realalgebra}{$\R^{n
\times n}$}{the unitary associative real algebra} is a unitary
associative algebra over the field $\R$.

\paragraph{Symmetric Matrices:} Take $n \in \N$ and matrix $\mat{A} \in
\R^{n \times n}$. To say that $\mat{A}$ is a \emph{symmetric (real)
matrix} means that $\mat{A} = \mat{A}^\T$.

\subsection{Vector Derivatives: Gradients and Hessians}

For $n \in \N$ and $n$-dimensional vector $\v{x} \in \R^n$, the
$n$-dimensional operator vector $\nabla_{\v{x}}$ is
%
\begin{equation*}
        \nabla_{\v{x}} 
        \triangleq 
        \begin{bmatrix}
                \frac{ \partial }{ \partial x_1 }\\
                \frac{ \partial }{ \partial x_2 }\\
                \vdots\\
                \frac{ \partial }{ \partial x_n }
        \end{bmatrix}
\end{equation*}
%
Take $n \in \N$ and function $f: \set{D} \mapsto \R$ where $\set{D}
\subseteq \R^n$.  The
\symdef[\emph{gradient}]{Hvectors.5}{gradient}{$\nabla_{\v{x}}
f(\v{x})$}{the gradient vector of function $f$ at $\v{x}$} of function
$f$ at $\v{x}$ is
%
\begin{equation*}
        \nabla_{\v{x}} f(\v{x}) 
        \triangleq 
        \begin{bmatrix}
                \frac{ \partial f(\v{x}) }{ \partial x_1 }\\
                \frac{ \partial f(\v{x}) }{ \partial x_2 }\\
                \vdots\\
                \frac{ \partial f(\v{x}) }{ \partial x_n }
        \end{bmatrix}
\end{equation*}
%
That is, this a vector of the $n$ first partial derivatives of function
$f$.
%
\begin{itemize}
        \item If every partial derivative that makes up the gradient
                exists for all $x \in \set{D}$ then $f$ is said to be
                \emph{differentiable}.
        \item If every partial derivative that makes up the gradient
                exists and is continuous for all $x \in \set{D}$ then
                $f$ is said to be \emph{continuously differentiable}.
\end{itemize}
%
The $n$-by-$n$ operator matrix $\nabla^2_{\v{x}\v{x}} \triangleq
\nabla_{\v{x}} \nabla^\T_{\v{x}}$, and the
\symdef[\emph{Hessian}]{Hvectors.51}{hessian}{$\nabla^2_{\v{x}\v{x}}
f(\v{x})$}{the Hessian matrix of function $f$ at point $\v{x}$} of
function $f$ at $\v{x}$ is 
%
\begin{equation*}
        \nabla^2_{\v{x}\v{x}} f(\v{x})
        \triangleq
        \begin{bmatrix}
                \frac{ \partial^2 f(\v{x})}{ \partial x_1 \partial x_1 }
                &
                \frac{ \partial^2 f(\v{x})}{ \partial x_1 \partial x_2 }
                &
                \cdots
                &
                \frac{ \partial^2 f(\v{x})}{ \partial x_1 \partial x_n }
                \\
                \frac{ \partial^2 f(\v{x})}{ \partial x_2 \partial x_1 }
                &
                \frac{ \partial^2 f(\v{x})}{ \partial x_2 \partial x_2 }
                &
                \cdots
                &
                \frac{ \partial^2 f(\v{x})}{ \partial x_2 \partial x_n }
                \\
                \vdots & \vdots & \ddots & \vdots\\
                \frac{ \partial^2 f(\v{x})}{ \partial x_n \partial x_1 }
                &
                \frac{ \partial^2 f(\v{x})}{ \partial x_n \partial x_2 }
                &
                \cdots
                &
                \frac{ \partial^2 f(\v{x})}{ \partial x_n \partial x_n }
        \end{bmatrix}
\end{equation*}
%
That is, this is a matrix of the $n^2$ second partial derivatives of
function $f$. Note that if the function $f$ is continuous then its
Hessian matrix is symmetric.
%
\begin{itemize}
        \item If every partial derivative that makes up the Hessian
                exists for all $x \in \set{D}$ then $f$ is said to be
                \emph{twice differentiable}.
        \item If every partial derivative that makes up the Hessian
                exists and is continuous for all $x \in \set{D}$ then
                $f$ is said to be \emph{twice continuously
                differentiable}.
\end{itemize}

\subsection{Euclidean Convexity}
\label{app:math_euclidean_convexity}

Take $n \in \N$ and the Euclidean space $\R^n$. Also take a set $\set{X}
\subseteq \R^n$. The set $\set{X}$ is said to be \emph{convex (over
$\R^n$)} if 
%
\begin{equation*}
        t \v{x} + (1-t) \v{y} \in \set{X}
\end{equation*}
%
for all $\v{x},\v{y} \in \set{X}$ with $\v{x} \neq \v{y}$ and all $t \in
(0,1)$.

\paragraph{Convex Sets of Scalars:} Consider the Euclidean space $\R$
(\ie, $\R^n$ with $n=1$). It is easy to show that all of the convex sets
of $\R$ take the form
%
\begin{equation*}
        [a,b] 
        \text{ or }
        (a,b]
        \text{ or }
        [a,b)
        \text{ or }
        (a,b)
\end{equation*}
%
or
%
\begin{equation*}
        [a,\infty)
        \text{ or }
        (a,\infty)
        \text{ or }
        (-\infty,b]
        \text{ or }
        (-\infty,b)
\end{equation*}
%
or
%
\begin{equation*}
        (-\infty,\infty)
\end{equation*}
%
for all $a,b \in \R$. Therefore, all intervals of $\R$ are convex sets.
In fact, $\R$ is trivially a convex set.

\paragraph{Cartesian Products of Convex Sets:} The Cartesian product of
convex sets is convex. For example, for $a,b,c,d \in \R$, the set
%
\begin{equation*}
        [a,b] \times (c,\infty) \times (-\infty,d]
\end{equation*}
%
is convex subset of $\R^3$ because it is the Cartesian product of three
convex sets of $\R$ (\ie, intervals of $\R$). Clearly, $\R^n$ is a
convex set for all $n \in \N$ since $\R$ is trivially a convex set.

\paragraph{Functions on Convex Sets:} Take $n \in \N$ and a function $f:
\set{E} \mapsto \R$ where $\set{E} \subseteq \R^n$ is a convex set.
Take some $\v{x}^* \in \set{E}$ and assume that there exists a
$\varepsilon \in \R_{>0}$ such that
%
\begin{equation*}
        f(\v{x}^*) \leq f(\v{y})
        \text{ for all }
        \v{y} \in \set{E} \setdiff \{\v{x}^*\}
        \text{ with }
        \| \v{x}^* - \v{y} \| < \varepsilon
\end{equation*}
%
In this case, $\v{x}^*$ is called a \emph{local minimum} of set $f$. If
$\v{x}^*$ is a local minimum then
%
\begin{equation}
        ( \nabla_{\v{x}} f(\v{x}^*) )^\T (\v{x} - \v{x}^*) \geq 0
        \label{eq:convex_function_if}
\end{equation}
%
for all $\v{x} \in \set{E}$. Therefore, this is a \emph{necessary
condition} for a point to be a local minimum of a function over a convex
set.

\paragraph{Convex Functions:} Take $n \in \N$ and a function $f: \set{D}
\mapsto \R$ where $\set{D} \subseteq \R^n$. Also take a convex set
$\set{E} \subseteq \set{D}$. To say that the function $f$ is
\emph{convex over (the convex set) $\set{E}$} means that
%
\begin{equation*}
        f(t \v{x} + (1-t) \v{y} )
        \leq
        t f(\v{x}) + (1-t) f(\v{y})
\end{equation*}
%
for all $\v{x},\v{y} \in \set{E}$ with $\v{x} \neq \v{y}$ and all $t \in
(0,1)$. To say that the function $f$ is \emph{strictly convex over (the
convex set) $\set{E}$} means that
%
\begin{equation*}
        f(t \v{x} + (1-t) \v{y} )
        <
        t f(\v{x}) + (1-t) f(\v{y})
\end{equation*}
%
for all $\v{x},\v{y} \in \set{E}$ with $\v{x} \neq \v{y}$ and all $t \in
(0,1)$. Take function $f$ defined above to be convex over $\set{E}$.
Note the following statements.
%
\begin{itemize}
        \item If $\set{D} = \set{E}$ then $f$ is simply called
                \emph{convex}. Similarly, when this is the case, all
                references to $\set{E}$ below may be omitted. Therefore,
                the restriction of $f$ to $\set{E}$ (\ie, $f|_\set{E}$)
                may be simply called convex.
        \item If $f^*$ is defined such that $f^*(\v{x}) \triangleq
                -f(\v{x})$ for all $\v{x} \in \set{D}$ then $f^*$ is
                called a \emph{concave} function over (the convex set)
                $\set{E}$. Take such $f^*$. If $f^*$ is also convex over
                $\set{E}$ then $f$ and $f^*$ must both be \emph{affine}
                functions. That is, there exists some $\v{a} \in \R^n$
                and some $b \in \R$ such that 
                %
                \begin{equation*}
                        f(\v{x}) = \v{a}^\T \v{x} + b
                        \quad \text{ and } \quad
                        f^*(\v{x}) = -\v{a}^\T \v{x} - b
                \end{equation*}
                %
                for all $\v{x} \in \set{D}$. An affine function defined
                over a convex set is always both convex and concave over
                that convex set.
        \item Assume that there exists some $\varepsilon \in \R_{>0}$
                and some $\v{x}^* \in \set{E}$ such that 
                %
                \begin{equation*}
                        f(\v{x}^*) \leq f(\v{y}) 
                        \text{ for all } 
                        \v{y} \in \set{E} \setdiff \{\v{x}^*\}
                        \text{ with } 
                        \|\v{x}^*-\v{y}\|_2 < \varepsilon
                \end{equation*}
                %
                Take such a $\v{x}^*$. In this case, $\v{x}^*$ is called
                a \emph{local minimum} of function $f$. However, since
                function $f$ is convex over $\set{E}$, it is the case
                that
                % 
                \begin{equation*}
                        f(\v{x}^*) \leq f(\v{y}) 
                        \text{ for all } 
                        \v{y} \in \set{E} \setdiff \{\v{x}^*\}
                \end{equation*} 
                %
                In other words, $\v{x}^*$ can be called a \emph{global
                minimum} of function $f$ over $\set{E}$. 
        \item Assume that there exists some $\varepsilon \in \R_{>0}$
                and some $\v{x}^* \in \set{E}$ such that 
                %
                \begin{equation*}
                        f(\v{x}^*) < f(\v{y}) 
                        \text{ for all } 
                        \v{y} \in \set{E} \setdiff \{\v{x}^*\}
                        \text{ with } 
                        \|\v{x}^*-\v{y}\|_2 < \varepsilon
                \end{equation*}
                %
                Take such a $\v{x}^*$. In this case, $\v{x}^*$ is called
                a \emph{strict local minimum} of function $f$. Clearly,
                $\v{x}^*$ is also a local minimum of function $f$.
                However, since function $f$ is convex over $\set{E}$, it
                is the case that
                % 
                \begin{equation*}
                        f(\v{x}^*) < f(\v{y}) 
                        \text{ for all } 
                        \v{y} \in \set{E} \setdiff \{\v{x}^*\}
                \end{equation*} 
                %
                In other words, $\v{x}^*$ can be called a \emph{strict
                global minimum} of function $f$ over $\set{E}$ (and, of
                course, a global minimum of function $f$ over $\set{E}$
                as well).
        \item The point $\v{x}^* \in \set{E}$ is a local minimum of $f$
                over convex set $\set{E}$ if and only if
                %
                \begin{equation}
                        ( \nabla_{\v{x}} f(\v{x}^*) )^\T 
                        (\v{x} - \v{x}^*) 
                        \geq 
                        0
                        \label{eq:convex_function_iff}
                \end{equation}
                %
                for all $\v{x} \in \set{E}$. This condition is both
                necessary \emph{and sufficient} for a local minimum
                because $\set{E}$ is a convex set \emph{and} $f$ is a convex
                function over $\set{E}$. This condition is necessary for
                all functions defined over convex sets. However, it
                becomes sufficient when those functions are also convex.
        \item The point $\v{x}^* \in \interior(\set{E})$ is a local
                minimum of $f$ over convex set $\set{E}$ if and only if
                %
                \begin{equation*}
                        \nabla_{\v{x}} f(\v{x}^*) = 0
                \end{equation*}
                %
                This is equivalent to the condition in
                \longref{eq:convex_function_iff} when $\v{x}^*$ is an
                element of the \emph{interior} of convex set $\set{E}$.
                Again, note that this condition is both necessary
                \emph{and sufficient} for a point to be a local minimum
                of $f$ over convex set $\set{E}$.
                %
        \item If $f$ is strictly convex over $\set{E}$, not only is
                every local minimum a global minimum, but there can be
                at most one global minimum of $f$. Therefore, if a local
                minimum has been found, it must be the global minimum of
                function $f$.
\end{itemize}

\paragraph{Sufficiency Conditions for Convexity:} Take $n \in \N$ and a
function $f: \set{D} \mapsto \R$ where $\set{D} \subseteq \R^n$. Also
take a convex set $\set{E} \subseteq \set{D}$, and assume that $f$ is
twice continuously differentiable. If it is the case that
%
\begin{equation*}
        \v{\Delta}^\T \nabla^2_{\v{x}\v{x}} f(\v{x}) \v{\Delta} \geq 0
        \text{ for all }
        \v{\Delta} \in \R^n
        \text{ and }
        \v{x} \in \set{E}
\end{equation*}
%
then $f$ is convex over $\set{E}$. Additionally, if it is the case that
%
\begin{equation*}
        \v{\Delta}^\T \nabla^2_{\v{x}\v{x}} f(\v{x}) \v{\Delta} > 0
        \text{ for all }
        \v{\Delta} \in \R^n \setdiff \{0\}
        \text{ and }
        \v{x} \in \set{E}
\end{equation*}
%
then $f$ is strictly convex over $\set{E}$.

\section{Measure Theory and Integration}
\label{app:math_measure}

Measure theory provides a method for measuring the size of sets. Our
treatment of measure theory will be relatively sparse; however, it is
necessary in order to discuss probability, the subject of
\longref{app:math_probability}. Our definitions are based on the ones
given by \citet{Krantz01} and \citet{Rudin76}. \Citet{Halmos50} gives a
more complete treatment.

\subsection{Sigma Algebras}

Take a set $\set{U}$ and a set of sets $\setset{S} \subseteq
\Pow(\set{U})$ (\ie, $\setset{S}$ is a set of subsets of $\set{U}$).
Assume that $\setset{S}$ is such that
%
\begin{enumerate}[(i)]
        \item $\setset{S} \neq \emptyset$
                \label{item:sigma_nonempty}
        \item if $\set{X} \in \setset{S}$ then $\set{U} \setdiff
                \set{X} \in \setset{S}$
                \label{item:sigma_closed_complement}
        \item for a sequence $(\set{X}_n)$ where $\set{X}_n \in
                \setset{S}$ for all $n \in \N$, $\bigcup \{ \set{X}_i :
                i \in \N \} \in \setset{S}$
                \label{item:sigma_closed_countable_union}
\end{enumerate}
%
Property (\shortref{item:sigma_nonempty}) states that $\setset{S}$ is
nonempty. Property (\shortref{item:sigma_closed_complement}) states that
$\setset{S}$ is closed under complements. Property
(\shortref{item:sigma_closed_countable_union}) states that $\setset{S}$
is closed under countable unions. It is clear that
%
\begin{itemize}
        \item $\emptyset \in \setset{S}$
        \item $\set{U} \in \setset{S}$
        \item for a sequence $(\set{X}_n)$ where $\set{X}_n \in
                \setset{S}$ for all $n \in \N$, $\bigcap \{ \set{X}_i :
                i \in \N \} \in \setset{S}$
        \item $(\setset{S}, {\cap}, {\cup}, {{}^c}, \set{X}, \emptyset)$
                is an algebra of sets (\ie, $\setset{S}$ is an algebra
                over $\set{U}$ and so $(\set{U},\setset{S})$ is a field
                of sets)
\end{itemize}
%
Thus, $\setset{S}$ is called a \emph{$\sigma$-algebra} (\ie, a
\emph{sigma algebra}) or a \emph{$\sigma$-ring} (\ie, a \emph{sigma
ring}) and the field of sets $(\set{U},\setset{S})$ is called a
\emph{$\sigma$-field} (\ie, a \emph{sigma field}). Of course, $(\set{U},
\Pow(\set{U}))$ is trivially a $\sigma$-field. In particular, take the
finite set $\set{U} = \{ a,b,c,d \}$. Some possible $\sigma$-algebras
for $\set{U}$ include
%
\begin{itemize}
        \item $\{\emptyset, \{ a, b, c, d \}\}$
        \item $\{\emptyset, \{ a, b \}, \{ c, d \}, \{ a, b, c, d \}\}$
        \item $\{\emptyset, \{ a, c \}, \{ b, d \}, \{ a, b, c, d \}\}$
        \item $\{\emptyset, \{ a, d \}, \{ b, c \}, \{ a, b, c, d \}\}$
        \item $\{\emptyset, \{ a \}, \{ b, c, d \}, \{ a, b, c, d \}\}$
        \item $\{\emptyset, \{ a, b, c \}, \{ d \}, \{ a, b, c, d \}\}$
\end{itemize}
%
However, there are many more (in fact, there are $16$ total for this
four element set). All are closed under complements and countable
unions. Because of this, any $\sigma$-algebra that includes all of the
singleton sets is necessarily the power set. Of course, all include the
empty set and the universal set.

\paragraph{Sigma Notation:} Very often $\sigma$-algebras will be denoted
with the Greek uppercase letter $\Sigma$. It is not a coincidence that
this is similar to the summation symbol $\sum$. Recall that all
$\sigma$-algebras are closed under countable unions (\ie, property
(\shortref{item:sigma_closed_countable_union}) above). Later, we will
introduce a \emph{measure} which is a function which maps sets from
$\sigma$-algebras to positive extended real numbers. In other words,
measures quantify some notion of \emph{size} to sets. Specifically
because $\sigma$-algebras are closed under countable unions, measures
are \emph{countably additive}. That is, the union of a sequence of sets
from a $\sigma$-algebra has a size equal to the sum of the size of each
of its elements. This relationship to summation is the reason why
$\sigma$-algebras are denoted with $\Sigma$; of course, this is also the
reason why they are given they are called \emph{sigma} algebras.

\subsection{The Borel Algebra}

Take a topological space $(\set{U},\setset{T})$. Recall that
$\setset{T}$ is by definition the set of all of open sets in the
topological space. Assume that there exists a $\sigma$-algebra
$\setset{B}$ of $\set{U}$ such that
%
\begin{itemize}
        \item $\setset{T} \subseteq \setset{B}$
        \item for any $\sigma$-algebra $\setset{A}$ of $\set{U}$ such
                that $\setset{T} \subseteq \setset{A}$, $\setset{A} \cap
                \setset{B} = \setset{B}$
\end{itemize}
%
In other words, $\setset{B}$ is the smallest $\sigma$-algebra that
includes all open sets of $\set{U}$. Because $\Pow(\set{U})$ is a
$\sigma$-algebra of $\set{U}$ and $\setset{T} \subseteq \Pow(\set{U})$
then $\setset{B}$ must exist. In this case, $\setset{B}$ is the
\emph{Borel algebra} of $\set{U}$ and will be denoted
\symdef{Iprob.3}{borelalgebra}{$\Borel(\set{U})$}{the Borel algebra of
set $\set{U}$}; that is,
%
\begin{equation*}
        \Borel(\set{U}) \triangleq \setset{B}
\end{equation*}
%
Any subset $\setset{S} \subseteq \Borel(\set{U})$ is called a
\emph{Borel subset} and any set $\set{E} \in \Borel(\set{U})$ is called
a \emph{Borel set}. Additionally, $(\set{U},\Borel(\set{U}))$ is called
a \emph{Borel $\sigma$-field} or simply a \emph{Borel field}.

\paragraph{Generalized Construction of Borel Algebra:} Take a
topological space $(\set{U},\setset{T})$. Define $\setset{B}_0
\triangleq \setset{T}$. That is, $\setset{B}_0$ is a set of all of the
open sets of $\set{U}$. Now, define $\setset{B}_n$ for all $n \in \N$
such that
%
\begin{enumerate}[(i)]
        \item $\setset{B}_{n-1} \subseteq \setset{B}_n$ 
        \item for all $\set{B} \in \setset{B}_{n-1}$, $\set{U} \setdiff
                \set{B} \in \setset{B}_n$ (\ie, $\set{B}^c \in
                \setset{B}_n$)
        \item for any $\setset{S} \subseteq \setset{B}_{n-1}$, $\bigcap
                \setset{S} \in \setset{B}_n$
        \item for any $\setset{S} \subseteq \setset{B}_{n-1}$, $\bigcup
                \setset{S} \in \setset{B}_n$
\end{enumerate}
%
The Borel algebra $\Borel(\set{U})$ is the set that results from
continuing this process \adinfinitum{}. In other words,
$\Borel(\set{U})$ can be viewed as $\setset{B}_\infty$. It is the case
that
%
\begin{itemize}
        \item $\setset{T} \subseteq \Borel(\set{U})$
        \item $\Borel(\set{U})$ is a $\sigma$-algebra of $\set{U}$ (\ie,
                $(\set{U},\Borel(\set{U}))$ is a $\sigma$-field)
        \item for any $\sigma$-algebra $\setset{A}$ of $\set{U}$ such
                that $\setset{T} \subseteq \setset{A}$, $\Borel(\set{U})
                \cap \setset{A} = \Borel(\set{U})$ (\ie,
                $\Borel(\set{U}) \subseteq \setset{A}$)
\end{itemize}
%
These are the traits desired to call $\Borel(\set{U})$ the Borel algebra
of $\set{U}$.

\paragraph{Construction of Borel Algebra of the Extended Reals:} Take
the extended real topological space $\extR$. Define $\setset{B}_0$ as
all intervals of the reals. That is,
%
\begin{equation*}
        \setset{B}_0
        \triangleq
        \setset{B}_{00} \cup
        \setset{B}_{01} \cup
        \setset{B}_{10} \cup
        \setset{B}_{11}
\end{equation*}
%
where
%
\begin{align*}
        \setset{B}_{00} 
        &\triangleq \{ (a,b) : a,b \in \extR, a \leq b \}\\
        \setset{B}_{01} 
        &\triangleq \{ (a,b] : a,b \in \extR, a \leq b \}\\
        \setset{B}_{10} 
        &\triangleq \{ [a,b) : a,b \in \extR, a \leq b \}\\
        \setset{B}_{11} 
        &\triangleq \{ [a,b] : a,b \in \extR, a \leq b \}
\end{align*}
%
That is, $\setset{B}_0$ is a set of all of the intervals of $\extR$.
Now, define $\setset{B}_n$ for all $n \in \N$ such that
%
\begin{enumerate}[(i)]
        \item $\setset{B}_{n-1} \subseteq \setset{B}_n$ 
        \item for all $\set{B} \in \setset{B}_{n-1}$, $\set{U} \setdiff
                \set{B} \in \setset{B}_n$ (\ie, $\set{B}^c \in
                \setset{B}_n$)
        \item for any $\setset{S} \subseteq \setset{B}_{n-1}$, $\bigcap
                \setset{S} \in \setset{B}_n$
        \item for any $\setset{S} \subseteq \setset{B}_{n-1}$, $\bigcup
                \setset{S} \in \setset{B}_n$
\end{enumerate}
%
The Borel algebra $\Borel(\extR)$ is the set that results from
continuing this process \adinfinitum{}. In other words, $\Borel(\extR)$
can be viewed as $\setset{B}_\infty$. It is the case that
%
\begin{itemize}
        \item $\setset{B}_0 \subseteq \Borel(\extR)$
        \item $\Borel(\extR)$ is a $\sigma$-algebra of $\extR$ (\ie,
                $(\extR,\Borel(\extR)$ is a $\sigma$-field)
        \item for any $\sigma$-algebra $\setset{A}$ of $\R$ such
                that $\setset{B}_0 \subseteq \setset{A}$,
                $\setset{B} \cap \setset{A} = \Borel(\extR)$ (\ie,
                $\Borel(\extR) \subseteq \setset{A}$)
\end{itemize}
%
These are the traits desired to call $\Borel(\extR)$ the Borel algebra
of $\extR$. In other words, $\Borel(\extR)$ is the smallest
$\sigma$-algebra of $\extR$ that includes all of the intervals of
$\extR$.

\paragraph{Half-Line Construction of Extended Real Borel Algebra:} It is
important to note that the Borel algebra of $\extR$ can also be said to
be the smallest $\sigma$-algebra of $\extR$ that includes intervals of
the form $[-\infty,a]$ where $a \in \extR$. Take $\set{R}$ to be the set
of these \emph{half lines}; that is,
%
\begin{equation*}
        \set{R} \triangleq \{ [-\infty,a] : a \in \extR \}
\end{equation*}
%
The Borel algebra $\Borel(\extR)$ is a $\sigma$-algebra such that
$\set{R} \subseteq \Borel(\extR)$. In fact, $\Borel(\extR) \subseteq
\setset{A}$ for any $\sigma$-algebra $\setset{A}$ of $\extR$ such that
$\set{R} \subseteq \setset{A}$. Therefore, any Borel set $\set{E} \in
\Borel(\extR)$ can be constructed with a countable number of unions,
intersections, and complements of elements from $\set{R}$ (\ie, the half
lines).

\subsection{Measures}

Take a $\sigma$-field $(\set{U},\Sigma)$. Define a function $\mu: \Sigma
\mapsto [0,\infty]$, where interval $[0,\infty] \subset \extR$. Assume
that
%
\begin{enumerate}[(i)]
        \item $\mu( \emptyset ) = 0$
                \label{item:measure_zero}
        \item for a sequence $(\set{X}_n)$ where $\set{X}_n \in \Sigma$
                for all $n \in \N$ and $\set{X}_i \cap \set{X}_j
                = \emptyset$ for all $i,j \in \N$ with $i \neq j$, it is
                the case that
                %
                \begin{equation*}
                        \mu\left( 
                        \bigcup \{ \set{X}_i : i \in \N \} 
                        \right) 
                        = 
                        \sum\limits_{i=1}^\infty 
                        \mu\left( \set{X}_i \right)
                \end{equation*}
                \label{item:measure_countable_additivity}
\end{enumerate}
%
In this case, 
%
\begin{itemize}
        \item $\mu$ is called a \emph{measure}
        \item $(\set{U},\Sigma,\mu)$ is a \emph{measure space}
        \item any set $\set{X} \in \Sigma$ is called a \emph{measurable
                set}
\end{itemize}
%
Property (\shortref{item:measure_zero}) states that the empty set has
\emph{measure zero}. Any set $\set{X} \in \Sigma$ such that
$\mu(\set{X}) = 0$ is said to have \emph{measure zero} or is said to be
a \emph{null set} or simply \emph{null}. Property
(\shortref{item:measure_countable_additivity}) is called \emph{countable
additivity}.

\paragraph{Singleton Notation:} Take $(\set{U},\Sigma,\mu)$ to be a
measure space. Take a point $x \in \set{U}$ such that the singleton set
$\{x\} \in \Sigma$. For simplicity, we will use the notation
%
\begin{equation*}
        \mu(x) \triangleq \mu( \{x\} )
\end{equation*}
%
That is, the measure of a single point is defined to be the measure of
the singleton set that includes that point.

\subsection{Measurable Functions}

Take $\sigma$-fields $(\set{X},\Sigma_\set{U})$ and 
$(\set{Y},\Sigma_\set{Y})$ and a function $f: \set{U} \mapsto \set{Y}$.
To say that the function $f$ is \emph{measurable} means that for all
$\set{B} \in \Sigma_\set{Y}$, the preimage $f^{-1}[\set{B}] \in
\Sigma_\set{U}$. Measurable functions will typically have real
codomains, and so they can be viewed as mapping sets into numbers so
that the size of the domain sets can be measured with respect to some
numerical measure. That is, measurable functions combined with measures
provide a way to quantify the size of measurable sets. This will be
explained further in \longref{app:math_lebesgue_integral}.

\paragraph{Borel measurable:} Take $\sigma$-field
$(\set{X},\Sigma_\set{X})$ and Borel field $(\set{Y},\Borel(\set{Y}))$
and a function $f: \set{X} \mapsto \set{Y}$. To say that the function
$f$ is \emph{Borel measurable} means that for all $\set{B} \in
\Borel(\set{Y})$, the preimage $f^{-1}[\set{B}] \in \Sigma_\set{X}$. 

\paragraph{Real-valued measurable function:} When a function that is
(extended) real-valued is said to be measurable, it is conventional to
assume that Borel measurability is implied. That is, for $\sigma$-field
$(\set{X},\Sigma_\set{X})$ and measurable function $f: \set{X} \mapsto
\extR$, it is usually assumed (as it will be here) that the measurable
sets of the codomain $\extR$ are the Borel sets of $\extR$ (\ie,
$\Borel(\extR)$ is the applicable $\sigma$-algebra).

\paragraph{Almost Everywhere Equivalence:} Take $\sigma$-field
$(\set{X},\Sigma_\set{X})$ and measure space
$(\set{Y},\Sigma_\set{Y},\mu)$. Also take measurable functions $f:
\set{X} \mapsto \set{Y}$ and $g: \set{X} \mapsto \set{Y}$ and a set
$\set{E} \in \Sigma_\set{X}$. To say that $f$ and $g$ are equivalent
\emph{almost everywhere on $\set{E}$} or \emph{essentially equivalent on
$\set{E}$} means that $\mu( x \in \set{E} : f(x) \neq g(x)) = 0$. That
is, two functions are equal almost everywhere if the set where they
differ is null.

\subsection{The Lebesgue Integral}
\label{app:math_lebesgue_integral}

Given a measurable function mapping measurable sets into the Borel field
of the reals, the Lebesgue integral provides a way of measuring the
volume of the image of such a set. 

\paragraph{Characteristic Function:} Take a set $\set{U}$ and subset
$\set{X} \subseteq \set{U}$. Denote the
\emph{characteristic function} or \emph{indicator function} for set
$\set{X}$ by $K_\set{X}$. Define $K_\set{X}: \set{U} \mapsto \R$ with
%
\begin{equation*}
        K_\set{X}(x) 
        \triangleq 
        \begin{cases}
                1       &\text{if } x \in \set{X}\\
                0       &\text{otherwise}
        \end{cases}
\end{equation*}
%
for all $x \in \set{U}$.

\paragraph{Simple Functions:} Take $\sigma$-field $(\set{U},\Sigma)$ and
a function $s: \set{U} \mapsto \R$. To say that $s$ is a simple function
means that $\range(f)$ is a finite set of real numbers. Assume that $s$
is a simple function. Without loss of generality, assume that $n \in \N$
and $\range(s) = \{ c_1, c_2, c_3, \dots, c_n \}$ where $c_i \in \R$ for
each $i \in \{1,2,3,\dots,n\}$. In this case, for each $i \in
\{1,2,3,\dots,n\}$, define the set $\set{X}_i$ such that
%
\begin{equation*}
        \set{X}_i
        \triangleq
        \{ x \in \set{U} : s(x) = c_i \}
\end{equation*}
%
Therefore, the function $s$ is
%
\begin{equation*}
        s(x)
        =
        \sum\limits_{i=1}^n c_i K_{\set{X}_i}(x)
\end{equation*}
%
Every simple function can be written as the sum of a finite number of
characteristic functions each multiplied by some real number. Now take
$(\set{U},\Sigma,\mu)$ to be a measure space and take a set $\set{X} \in
\Sigma$. Denote the \emph{(Lebesgue) integral} of simple function $s$
over $\set{X}$ with respect to measure $\mu$ as $\int_\set{X} s$, which
is defined by
%
\begin{equation*}
        \int_\set{X} s \total \mu
        \triangleq
        \sum\limits_{i=1}^n c_i \mu( \set{X} \cap \set{X}_i )
\end{equation*}
%
Note that it will be consistent to define $0 \times \infty = \infty
\times 0 = 0$ for the integral, as is commonly done in measure theory.
Additionally, sometimes the notation
%
\begin{equation*}
        \int_\set{X} s(x) \total \mu(x)
        \triangleq
        \int_\set{X} s \total \mu
\end{equation*}
%
will be used instead. There is little value to this notation here;
however, when functions of multiple variables are defined, it can be
a helpful way to avoid confusion when functions have multiple variables.

\paragraph{The Integral:} Take measure space $(\set{U},\Sigma,\mu)$, a
measurable function $g: \set{U} \mapsto \extR$, and a set $\set{X} \in
\Sigma$. Assume that for all $x \in \set{U}$, $g(x) \geq 0$. That is,
assume that $g$ is non-negative. Define the \emph{(Lebesgue) integral}
of measurable non-negative function $g$ over $\set{X}$ with respect to
measure $\mu$ by
%
\begin{equation*}
        \int_\set{X} g \total \mu
        \triangleq
        \sup
        \left\{ 
        \int_\set{X} s \total \mu 
        : 
        s \text{ is a simple function with } 0 \leq s \leq g
        \right\}
\end{equation*}
%
where $0 \leq s \leq g$ indicates that for all $x \in \set{X}$, $0 \leq
s(x) \leq g(x)$ and the integral following the supremum was defined
above for simple functions. Note that if $g$ is simple, this agrees with
the definition already given for simple functions. Now take a measurable
function $f: \set{U} \mapsto \extR$ and define non-negative measurable
functions $f^+: \set{U} \mapsto \extR$ and $f^-: \set{U} \mapsto \extR$
by
%
\begin{equation*}
        f^+(x) \triangleq \max \{ f(x), 0 \}
        \quad \text{ and } \quad
        f^-(x) \triangleq -\min \{ f(x), 0 \}
\end{equation*}
%
Finally, define the \emph{(Lebesgue) integral} of measurable function
$f$ over $\set{X}$ with respect to measure $\mu$ by
%
\begin{equation*}
        \int_\set{X} f \total \mu
        \triangleq
        \int_\set{X} f^+ \total \mu
        -
        \int_\set{X} f^- \total \mu
\end{equation*}
%
where the two integrals on the right were defined above for non-negative
measurable functions. Note that extended real arithmetic should be used
to evaluate this integral. It may be that $\int_\set{X} f \total \mu$
%
\begin{itemize}
        \item exists and is finite
        \item exists and is $\infty$
        \item exists and is $-\infty$
        \item does not exist
\end{itemize}
%
If the integral is finite, then $f$ is said to be \emph{Lebesgue
measurable with respect to measure $\mu$}. Again, sometimes the
alternate notation 
%
\begin{equation*}
        \int_\set{X} f(x) \total \mu(x)
        \triangleq
        \int_\set{X} f \total \mu
\end{equation*}
%
will be used. This notation adds little value here. However, it will be
useful when functions of multiple variables are used. See
\longref{app:math_convolution} for an example.

\paragraph{Useful Properties of Integrals:} Take measure space
$(\set{U},\Sigma,\mu)$ and set $\set{X} \in \Sigma$. Note that
%
\begin{equation*}
        \mu(\set{X}) = \int_\set{X} \total \mu
\end{equation*}
%
Now take a measurable function $f: \set{U} \mapsto \extR$. If
$\mu(\set{X}) = 0$ then
%
\begin{equation*}
        \int_\set{X} f \total \mu = 0
\end{equation*}
%
Now take an additional measurable function $g: \set{U} \mapsto \extR$
and assume that $f$ and $g$ are equal almost everywhere on $\set{X}$. In
that case,
%
\begin{equation*}
        \int_\set{E} f \total \mu = \int_\set{E} g \total \mu
\end{equation*}
%
for all $\set{E} \subseteq \set{X}$ where the integrals exist.

\subsection{The Lebesgue Measure}

Note that any measure can be used with the Lebesgue integral. However,
it is common to use the Lebesgue measure, denoted $m$. Take some
$\set{X} \subseteq \extR$ an define the outer measure $m^*: \Pow(\extR)
\mapsto \extR$ as
%
\begin{equation*}
        m^*( \set{X} )
        \triangleq
        \inf
        \left\{
        \sum_{i=1}^\infty ( b_i - a_i )
        :
        \set{X}
        \subseteq
        \bigcup
        \{ [a_i,b_i] : i \in \N \}
        \right\}
\end{equation*}
%
In other words, $m^*$ is the greatest lower bound of the sums of the
lengths of countable unions of intervals that cover $\set{X}$. This is
called an \emph{outer measure} of the lengths of intervals. To say that
$\set{X}$ is \emph{Lebesgue measurable} means that
%
\begin{equation*}
        m^*( \set{E} )
        =
        m^*( \set{E} \cap \set{X} ) + m^*( \set{X} \setdiff \set{E} )
\end{equation*}
%
for all $\set{E} \in \Pow(\extR)$. Define the set $\setset{L}$ by
%
\begin{equation*}
        \setset{L}
        \triangleq
        \{ 
        \set{X} \in \Pow(\extR) : 
        m^*( \set{E} )
        =
        m^*( \set{E} \cap \set{X} ) + m^*( \set{X} \setdiff \set{E} )
        \text{ for all } \set{E} \in \Pow(\extR)
        \}
\end{equation*}
%
this is the set of all Lebesgue measurable sets. Note that 
%
\begin{itemize}
        \item the set $\setset{L}$ is a $\sigma$-algebra on $\extR$
        \item $(\extR,\setset{L})$ is a $\sigma$-field
        \item all of the Borel sets of extended reals are Lebesgue
                measurable (\ie, $\Borel(\extR) \subseteq \setset{L}$)
\end{itemize}
%
The \emph{Lebesgue measure} $m: \setset{L} \mapsto \extR$ is defined to
be
%
\begin{equation*}
        m( \set{X} ) = m^*(\set{X})
\end{equation*}
%
for all $\set{X} \in \setset{L}$. Note that for all $a,b \in \extR$ with
$a \leq b$,
%
\begin{equation*}
        m( (a,b) ) =
        m( [a,b) ) =
        m( (a,b] ) =
        m( [a,b] ) = b - a
\end{equation*}
%
That is, all intervals with the same endpoints have equal measure, and
that measure is the difference in the endpoints. Additionally, for all
$a \in \extR$,
%
\begin{equation*}
        m( \{a\} ) = 0
\end{equation*}
%
That is, all singleton sets have zero measure. Note that a countable set
of points is simply a countable union of singletons. Since measures are
countably additive and singletons have measure zero, then any countable
set of points is also going to have measure zero. That is, for all
sequences $(x_n)$ where $x_i \in \extR$ for all $i \in \N$,
%
\begin{equation*}
        m( \{ x_i : i \in \N \} ) = 0
\end{equation*}
%
In general, to say that a subset $\set{E} \in \setset{L}$ is
\emph{Lebesgue null} means that $m(\set{E})=0$, where $m$ is the
Lebesgue measure. Thus, all countable subsets of $\extR$ are Lebesgue
null.

\paragraph{Implied Measure Notation:} Take measure space
$(\set{U},\Sigma,\mu)$, a measurable function $f: \set{U} \mapsto
\extR$, and a set $\set{X} \in \Sigma$. The Lebesgue integral of $f$
over $\set{X}$ with respect to measure $m$ (\ie, the Lebesgue measure)
would typically be denoted
%
\begin{equation*}
        \int_\set{X} f \total m
\end{equation*}
%
However, because the Lebesgue measure is the standard measure for the
Lebesgue integral, sometimes the notation
%
\begin{equation*}
        \int_\set{X} f(x) \total x
        \triangleq
        \int_\set{X} f(x) \total \mu(x)
        =
        \int_\set{X} f \total m
\end{equation*}
%
is used instead. Additionally, since $\set{X}$ can be represented as a
countable number of unions of other elements of the Borel algebra on
$\extR$, it is very often that the integral will be taken over an
interval. \symdef[]{Iprob.4}{integral}{$\int_a^b f(x) \total x$}{the
Lebesgue integral of function $f$ over interval $[a,b] \subset \extR$
with respect to the Lebesgue measure}Thus, when $\set{X}$ is an interval
of $\extR$ with endpoints there is a $a,b \in \extR$ with $a \leq b$,
%

\begin{equation*}
        \int_a^b f(x) \total x
        \triangleq
        \int_\set{X} f(x) \total x
\end{equation*}
%
This is the familiar form of the integral.

\subsection{Dirac Delta Measure}

Take $\sigma$-field $(\set{U},\Sigma)$ and a point $a \in \set{U}$.
Define the function (which is indexed by $a$)
\symdef[]{Iprob.5}{diracdelta}{$\delta_a(\set{E})$}{Dirac delta measure
of set $\set{E}$ at point $a$ (\eg, $f(0) = \linebreak[4] \int_{-1}^1
f(x) \delta_0(\{x\}) \total
x$)}\symdef[]{Iprob.50}{diracdeltasimp}{$\delta(x-p)$}{Simplified Dirac
delta measure notation (\ie, $\delta(x-p) \triangleq
\delta_p(\{x\})$)}$\delta_a: \Sigma \mapsto \extR$ by
%
\begin{equation*}
        \delta_a( \set{X} )
        \triangleq
        \begin{cases}
                1 &\text{if } a \in \set{X}\\
                0 &\text{otherwise}
        \end{cases}
\end{equation*}
%
It is easy to verify that $\delta_a$ is a measure for
$(\set{U},\Sigma)$, and so $(\set{U},\Sigma,\delta_a)$ forms a measure
space. The measure $\delta_a$ is called the \emph{Dirac delta measure at
$a$}.

\paragraph{Integral Mass Notation:} Take a point $p \in \R$. Recall that
$\{p\}$ has Lebesgue measure $0$. That is, $\{p\}$ is Lebesgue null and
therefore has no Lebesgue mass. However, $\{p\}$ has measure $1$ with
respect to the Dirac measure at $p$. Therefore, Dirac measures are often
added to Lebesgue measures in order to add \emph{point mass}. To
simplify notation, it is conventional to take
%
\begin{equation}
        \int_a^b f(x) \delta_p(\{x\}) \total x
        \triangleq 
        \int_{[a,b]} f \total \delta_p
        =
        \begin{cases}
                f(p) &\text{if } p \in [a,b]\\
                0 &\text{otherwise}
        \end{cases}
        \label{eq:dirac_convention}
\end{equation}
%
This way, the Dirac delta function can be viewed as forcing mass into
the Lebesgue measure on sets that are typically Lebesgue null. 

\paragraph{Singleton Notation for Reals:} Take a $\sigma$-field
$(\R,\Sigma)$. Take a point $x \in \R$ such that the singleton set
$\{x\} \in \R$. For simplicity, some use the notation
%
\begin{equation*}
        \delta(x)
        \triangleq
        \delta_0( \{x\} )
\end{equation*}
%
Note that for a point $a \in \R$,
%
\begin{equation*}
        \delta(x-a)
        =
        \delta_0( \{x-a\} )
        =
        \delta_a( \{x\} )
\end{equation*}
%
These notations simplify the convention shown in
\longref{eq:dirac_convention}. That is,
%
\begin{align*}
        \int_a^b f(x) \delta(x-p) \total x
        &=
        \int_a^b f(x) \delta_p(\{x\}) \total x\\
        &=
        \begin{cases}
                f(p) &\text{if } p \in [a,b]\\
                0 &\text{otherwise}
        \end{cases}
\end{align*}

\subsection{Convolution}
\label{app:math_convolution}

Take the $\sigma$-field $(\set{X},\Sigma)$ and the measure $\mu: \Sigma
\mapsto [0,\infty]$. Assume that $(\set{X},{+})$ is a group where $+$ is
an addition operator (and thus the $-$ operator notation is defined as
the addition of the additive inverse). Also take subset $\set{D}
\subseteq \set{X}$ and the measurable functions $f: \set{D} \mapsto
\extR$ and $g: \set{D} \mapsto \extR$. From function $g$, define
function $g^*: \extR \mapsto \extR$ by
%
\begin{equation*}
        g^*(t) \triangleq \begin{cases}
                g(t) &\text{if } t \in \set{D}\\
                0 &\text{if } t \in \extR \setdiff \set{D}
        \end{cases}
\end{equation*}
%
for all $t \in \extR$. \symdef[]{Iprob.41}{convolution}{$f *
g$}{convolution of function $f$ with function $g$ (\ie, $(f * g)(t)
\triangleq \int_{-\infty}^\infty f(\tau) g(t-\tau) \total \tau$)}Define
the \emph{convolution} operator ${*}: \extR^\set{D} \times \extR^\set{D}
\mapsto \extR^\set{D}$ such that
%
\begin{equation*}
        (f * g)(t)
        \triangleq
        \int_\set{D} f^*(\tau) g^*(t - \tau) \total \mu(\tau)
\end{equation*}
%
for all $t \in \set{D}$. Therefore, for any $f$ and $g$, the function $f
* g: \set{D} \mapsto \extR$ can be defined using the convolution
definition above. Now, take additional function $h: \set{D} \mapsto
\extR$ and real numbers $a,b \in \extR$. For these $f,g,h$ and $a,b$,
the convolution operator has a number of important properties.
%
\begin{description}
        \item\emph{Commutativity:} $f * g = g * f$
        \item\emph{Associativity:} $f * (g * h) = (f * g) * h$
        \item\emph{Linearity in the First Argument:} $(af + bg) * h = 
                a(f * h) + b(g * h)$
        \item\emph{Linearity in the Second Argument:} $f * (ag + bh) =
                a(f * g) + b(f * h)$
\end{description}
%
Clearly, convolution is a bilinear operation.

\section{Probability, Random Variables, and Random Vectors}
\label{app:math_probability}

Probability is a specialization of measure theory, the subject of
\longref{app:math_measure}. \Citet{PapoulisPillai02} and
\citet{Viniotis98} provide good references on the theory of probability,
random variables, and random processes. The application of probability
is an attempt to model \emph{randomness} or extreme complexity. That is,
when a parameter is known with complete certainty, it is said to be
\emph{deterministic} and is not usually cast in a probabilistic
framework. However, when the values of a parameter are uncertain but
come from a set of possible values, the parameter is said to be
\emph{stochastic}.

\subsection{Probability Measures and Probability Spaces}

Take $\sigma$-field $(\set{U},\Sigma)$. Define a function $\Pr: \Sigma
\mapsto [0,\infty]$, where interval $[0,\infty] \subset \extR$. Assume
that
%
\begin{enumerate}[(i)]
        \item for all $\set{E} \in \Sigma$, $\Pr(\set{E}) \geq 0$
                \label{item:prob_nonnegtaive}
        \item $\Pr( \set{U} ) = 1$
                \label{item:prob_certain_event}
        \item for a sequence $(\set{X}_n)$ where $\set{X}_n \in \Sigma$
                for all $n \in \N$ and $\set{X}_i \cap \set{X}_j
                = \emptyset$ for all $i,j \in \N$ with $i \neq j$, it is
                the case that
                %
                \begin{equation*}
                        \Pr\left( 
                        \bigcup \{ \set{X}_i : i \in \N \} 
                        \right) 
                        = 
                        \sum\limits_{i=1}^\infty 
                        \Pr\left( \set{X}_i \right)
                \end{equation*}
                \label{item:prob_countable_additivity}
\end{enumerate}
%
Take $\set{E} \in \Sigma$ and the sequence
%
\begin{equation*}
        (\set{X}_n)
        \triangleq 
        (\set{E},\emptyset,\emptyset,\emptyset,\emptyset,\dots)
\end{equation*}
%
Clearly, for any $\set{X}_i$ and $\set{X}_j$ with $i \neq j$, $\set{X}_i
\cap \set{X}_j = \emptyset$. Therefore, by the definition of $\Pr$,
%
\begin{equation*}
        \Pr\left( 
        \bigcup \{ \set{X}_i : i \in \N \} 
        \right) 
        = 
        \sum\limits_{i=1}^\infty 
        \Pr\left( \set{X}_i \right)
        =
        \Pr( \set{E} ) + \Pr( \emptyset ) + \Pr( \emptyset ) + \cdots
\end{equation*}
%
However, $\bigcup \{ \set{X}_i : i \in \N \} = \set{E}$, and so
%
\begin{equation*}
        \Pr( \set{E} ) 
        =
        \Pr( \set{E} ) + \Pr( \emptyset ) + \Pr( \emptyset ) + \cdots
\end{equation*}
%
Therefore, it must be that $\Pr( \emptyset ) = 0$. Thus, $\Pr$ meets all
of the requirements for being a measure. In this case,
\symdef{Iprob.541}{probspace}{$(\set{U},\Sigma,\Pr)$}{Probability space
with outcomes $\set{U}$, $\sigma$-field of events $\Sigma$, and
probability measure $\Pr$} is called a \emph{probability space} that
models some \emph{random experiment}. Additionally,
%
\begin{itemize}
        \item \symdef{Iprob.540}{probmeasure}{$\Pr$}{Probability
                measure} is called a \emph{probability measure}
        \item the set $\set{U}$ is called the \emph{(universal) sample
                space} and is viewed as a set of \emph{outcomes} of the
                random experiment being modeled
        \item the set $\Sigma$ is called a set of \emph{events} (\ie,
                the events are the measurable subsets of the outcomes). 
        \item the \emph{probability} of any event $\set{E} \in \Sigma$
                is given by $\Pr(\set{E})$
\end{itemize}
%
Note that it is common for $(\set{U},\Sigma,\Pr)$ to be called a random
experiment rather than a probability space.

\paragraph{Properties of a Probability Space:} Take a probability space
$(\set{U},\Sigma,\Pr)$. Take an event $\set{A} \in \Sigma$. Its
complement $\set{A}^c = \set{U} \setdiff \set{A}$ (note that $\set{A}^c
\in \Sigma$, and so $\set{A}^c$ is also an event). Take an
additional event $\set{B}$. It can be shown that
%
\begin{itemize}
        \item $\Pr(\emptyset) = 0$
        \item $\Pr(\set{A}) = 1 - P(\set{A}^c)$
        \item $\Pr(\set{A}) \leq 1$
        \item $\Pr(\set{A} \cup \set{B}) = \Pr(\set{A}) + \Pr(\set{B}) -
                \Pr(\set{A} \cap \set{B})$ 
        \item $\Pr(\set{A} \cap \set{B}) = \Pr(\set{A}) + \Pr(\set{B}) -
                \Pr(\set{A} \cup \set{B})$ 
        \item if $\set{B} \subseteq \set{A}$ then $\Pr(\set{A}) =
                \Pr(\set{B}) + \Pr(\set{A} \cap \set{B}^c)$ and
                $\Pr(\set{A}) \geq \Pr(\set{B})$
\end{itemize}
%
Because $\Pr(\set{A}) \leq 1$ for all $\set{A} \in \Sigma$, it is not
uncommon for $\Pr$ to be defined with a codomain of $[0,1]$.

\paragraph{Terminology:} Take a probability space $(\set{U},\Sigma,\Pr)$
and events $\set{A},\set{B} \in \Sigma$. In application, there are a
number of terms that describe properties of events.
%
\begin{description}
        \item\emph{Independent Events:} Saying that $\set{A}$ and
                $\set{B}$ are \emph{(pairwise) independent} means that
                $\Pr( \set{A} \cap \set{B}) = \Pr(\set{A})
                \Pr(\set{B})$.
        \item\emph{Disjoint Events:} Saying that $\set{A}$ and $\set{B}$
                are \emph{disjoint (events)} means that $\set{A} \cap
                \set{B} = \emptyset$. Of course, if $\set{A}$ and
                $\set{B}$ are disjoint then $\Pr( \set{A} \cap \set{B})
                = 0$.  Assume that $\set{A}$ and $\set{B}$ are disjoint
                and independent. In this case, $\Pr( \set{A} \cap
                \set{B}) = \Pr(\set{A}) P(\set{B}) = 0$, which can only
                occur if $\Pr(\set{A})=0$ or $\Pr(\set{B})=0$ (or both).
        \item\emph{With Probability Zero:} If $\Pr(\set{A})=0$ then
                event $\set{A}$ is said to happen \emph{with probability
                zero} or \emph{almost never}. In a general measure
                context, $\set{A}$ is a \emph{null set}. Note that the
                random experiment that this probability space models may
                still have outcomes from $\set{A}$ that occur even
                though they exist in set that occurs with probability
                zero.
        \item\emph{Almost Sure:} If $\Pr(\set{A})=1$ then $\set{A}$ is
                said to happen \emph{with probability one} or
                \emph{almost surely}. In this case, $\Pr(\set{A}^c)=0$;
                therefore, the event $\set{A}^c$ occurs almost never.
                However, this does not guarantee that actual outcomes in
                the random experiment modeled by this probability space
                will always come from $\set{A}$. 
\end{description}

\subsection{The Extended Reals as Probability Space} 
\label{app:math_extended_reals_prob_space}

Take the probability space $(\extR,\Borel(\extR),\Pr)$. The
justification for using $\Borel(\extR)$ will be introduced in
\longref{app:math_random_variables}. Of course, $\Pr$ has domain
$\Borel(\extR)$ and thus must be defined for all events $\set{E} \in
\Borel(\extR)$. However, since $\Borel(\extR)$ is a Borel field then any
element $\set{E} \in \Borel(\extR)$ is a Borel set and can be
constructed by countable intersections, unions, or complements of half
lines (\ie, intervals of the form $[-\infty,a]$ where $a \in \R$). By
the properties of a probability space, if the probability of each half
line is known, then the probability of $\set{E}$ can be determined
analytically. Therefore, define a function $F: \extR \mapsto [0,1]$ by
%
\begin{equation*}
        F(x) 
        \triangleq \Pr( [-\infty,x] ) 
        = \Pr( \{ z \in \R : z \leq x \} ) 
\end{equation*}
%
for all $x \in \extR$. In this case, $F$ is called the \emph{cumulative
distribution function} and can be used to find the probability of every
event. In fact, it can be shown that $F$ is a monotonically increasing
lower semi-continuous function. That is, for all $p,q \in \extR$ with $p
\leq q$, it is the case that
%
\begin{equation*}
        F(p) \leq F(q)
\end{equation*}
%
and
%
\begin{equation}
        \liminf\limits_{x \to p} F(x) \geq F(p)
        \label{eq:cdf_lsc}
\end{equation}
%
However, note that
%
\begin{equation}
        \limsup\limits_{x \to p} F(x)
        \geq
        \liminf\limits_{x \to p} F(x)
        \label{eq:cdf_lsc_limsup}
\end{equation}
%
Therefore, we use the notation
%
\begin{equation*}
        F(p+) 
        \triangleq
        \limsup\limits_{x \to p} F(x)
\end{equation*}
%
Note that by \longrefs{eq:cdf_lsc} and \shortref{eq:cdf_lsc_limsup} then
$F(p+) \geq F(p)$ for all $p \in \extR$. Additionally, note that if
$F(p+) = F(p)$ then $F$ is upper semi-continuous at $p$ and therefore
continuous at $p$. Take $a \in \extR$. The following always hold.
%
\begin{itemize}
        \item It is always the case that $\Pr(\{a\}) = F(a+) - F(a)$.
                Thus, if $\Pr(\{a\}) = 0$ then $F$ is continuous at $a$.
        \item If $F$ is continuous then $\Pr(\{a\}) = 0$. 
\end{itemize}
%
Note that for $a,b \in \extR$ with $a \leq b$,
%
\begin{itemize}
        \item $\Pr( (a,b] ) = F(b) - F(a)$
        \item $\Pr( [a,b] ) = F(b) - F(a) + F(a+) - F(a)$
        \item $\Pr( [a,b) ) = F(b) - F(a) + F(a+) - F(a) - (F(b+) -
                F(b))$
        \item $\Pr( (a,b) ) = F(b) - F(a) - (F(b+) - F(b))$
        \item $F(\infty)=\Pr(\extR)=1$.
        \item $\Pr( (a,\infty] ) = F(\infty) - F(a) = 1 - F(a)$
\end{itemize}
%
Recall that for any $\set{E} \in \Borel(\extR)$,
%
\begin{equation*}
        \Pr( \set{E} ) = \int_\set{E} \total \Pr
\end{equation*}
%
Take a point $x \in \R$ and the interval $[-\infty,x]$. Additionally,
define $\set{E}_x$ by
%
\begin{equation*}
        \set{E}_x 
        \triangleq
        \{ p \in [-\infty,x] : F(p+) \neq F(p) \} 
\end{equation*}
%
In other words, $\set{E}_x$ is the set of all points in the interval
$[-\infty,x]$ where $F$ is not continuous. It can be shown that
$\set{E}_x$ is a countable set of points, and thus it is Lebesgue null
(\ie, $m(\set{E}_x)=0$). It can also be shown that
%
\begin{align*}
        F(x) 
        &= \Pr( [-\infty,x] )\\
        &= \int_{[-\infty,x]} \total \Pr\\
        &= \int_{-\infty}^x F'(x) \total x + \Pr( \set{E}_x )\\
        &= 
        \int_{-\infty}^x F'(x) \total x + \int_{\set{E}_x} \total \Pr\\
        &=
        \int_{-\infty}^x 
        ( 
        F'(x) 
        +
        \sum\limits_{p \in \set{E}_x} (F(p+)-F(p)) \delta_p(\{x\})
        ) \total x\\
        &= 
        \int_{-\infty}^x F'(x) \total x 
        + 
        \sum\limits_{p \in \set{E}_x} ( F(p+) - F(p) )
\end{align*}
%
In this case, denote the \emph{density function of measure $\Pr$ with
respect to the Lebesgue measure} as $f: \extR \mapsto \extR$ which is
defined by
%
\begin{align*}
        f(x)
        &\triangleq
        F'(x) + \sum_{p \in \set{E}_x} (F(p+)-F(p)) \delta_p(\{x\})\\
        &=
        F'(x) + \sum_{p \in \set{E}_x} \Pr(\{p\}) \delta_p(\{x\})\\
        &=
        F'(x) + \sum_{p \in \set{E}_x} \Pr(\{p\}) \delta(x-p)
\end{align*}
%
where $F'(x)$ can be viewed as the derivative of $F$ with respect to
$x$. Of course, $F$ may not be differentiable everywhere, and so its
derivative may not exist at some points. However, we can somewhat
arbitrarily define $F'$ on those points. The function $f$ is known as
the \emph{probability density function}. It can be shown to be
measurable, and so it is the case that
%
\begin{equation*}
        \Pr( [-\infty,a] ) = F(a) = \int_{-\infty}^a f(x) \total x
\end{equation*}
%
for all $a \in \extR$.

\subsection{Random Variables}
\label{app:math_random_variables}

Take a random experiment modeled by probability space
$(\set{U},\Sigma,\Pr)$. Clearly, any event $\set{E} \in \Sigma$ has a
probability $\Pr(\set{E})$. However, it is difficult to specify the form
of the probability measure $\Pr$ for every experiment. Thus, we
introduce the \emph{random variable}. Assume that function 
%
\begin{equation*}
        X: \set{U} \mapsto \extR
\end{equation*}
%
is a Borel measurable function. In this case, $X$ is called a
\emph{random variable}. That is, for all outcomes $\zeta \in \set{U}$,
$\set{X}(\zeta)$ is a real number. Additionally, for any Borel set
$\set{E} \in \Borel(\extR)$, the preimage $X^{-1}[\set{E}] \in \Sigma$.
In other words, for all $\set{E} \in \Borel(\extR)$, 
%
\begin{equation*}
        \{ \zeta \in \set{U} : X(\zeta) \in \set{E} \} \in \Sigma
\end{equation*}
%
and so the preimage of $\set{E}$ under $X$ is a measurable set and so it
also has a probability associated with it.
\symdef[]{Iprob.545}{setRV}{$\{X \leq a\}$}{Measurable set induced by
preimage of random variable $X$ (\ie, \linebreak[3] $\{ \zeta \in
\set{U} : X(\zeta) \leq a \}$)}Motivated by this, we introduce the
notation
%
\begin{equation*}
        \{ \text{statement about } X \}
        \triangleq
        \{ \zeta \in \set{U} : \text{statement about } X(\zeta) \}
\end{equation*}
%
For example, for $a,b \in \extR$,
%
\begin{itemize}
        \item $\{ a \leq X \} \triangleq \{ \zeta \in \set{U} : a \leq
                X(\zeta) \}$
        \item $\{ a < X \} \triangleq \{ \zeta \in \set{U} : a <
                X(\zeta) \}$
        \item $\{ X \leq b \} \triangleq \{ \zeta \in \set{U} : X(\zeta)
                \leq b \}$
        \item $\{ X < b \} \triangleq \{ \zeta \in \set{U} : X(\zeta)
                < b \}$
        \item $\{ a \leq X \leq b \} \triangleq \{ \zeta \in \set{U} : a
                \leq X(\zeta) \leq  b \}$
        \item $\{ a < X \leq b \} \triangleq \{ \zeta \in \set{U} : a <
                X(\zeta) \leq  b \}$
        \item $\{ a \leq X < b \} \triangleq \{ \zeta \in \set{U} : a
                \leq X(\zeta) < b \}$
        \item $\{ a < X < b \} \triangleq \{ \zeta \in \set{U} : a <
                X(\zeta) < b \}$
\end{itemize}
%
Some authors will use square brackets (\eg, $[ \text{statement about } X
]$) since preimages of sets are being generated by this notation.
\symdef[]{Iprob.546}{probRV}{$\Pr(X \leq a)$}{Probability induced by
preimage of random variable $X$ (\ie, \linebreak[3] $\Pr(\{ \zeta \in
\set{U} : X(\zeta) \leq a \})$)}Additionally, we will use the notation
%
\begin{align*}
        \Pr( \text{statement about } X )
        &\triangleq
        \Pr( \{ \text{statement about } X \} )\\
        &=
        \Pr( 
        \{ \zeta \in \set{U} : \text{statement about } X(\zeta) \}
        )
\end{align*}
%
For example, for some $a \in \extR$,
%
\begin{equation*}
        \Pr( X \leq a )
        =
        \Pr( \{ \zeta \in \set{U} : X(\zeta) \leq a \} )
\end{equation*}
%
Again, some authors will use square brackets (\eg, $\Pr[ X \leq a ]$ for
$a \in \extR$) which relates to the preimages being generated.

\paragraph{Cumulative Distributions and Probability Densities:} Take a
random experiment modeled by probability space $(\set{U},\Sigma,\Pr)$
and a random variable $X: \set{U} \mapsto \extR$.  Note that every set
in $\Borel(\extR)$ can be generated by a countable number of operations
on half-lines (\ie, sets of the form $[-\infty,a]$ for all $a \in
\extR$). Thus, we will focus on sets of the form $\{ X \leq a \}$ for
all $a \in \extR$. Recall the discussion in
\longref{app:math_extended_reals_prob_space}.
\symdef[]{Iprob.55}{cdf}{$F_X(x)$}{Cumulative distribution function for
random variable $X$ (\ie, $F_X(a) \triangleq \Pr(X \leq
a)$)}\symdef[]{Iprob.55}{cdfplus}{$F_X(x+)$}{Limit superior of $F_x$ at
point $p$}Denote the \emph{cumulative distribution function for random
variable $X$} as the function $F_X: \extR \mapsto [0,1]$ defined by
%
\begin{equation*}
        F_X(x) \triangleq \Pr( X \leq x )
\end{equation*}
%
for all $x \in \extR$. It can be shown that $F_X$ is lower
semi-continuous and monotonically increasing. Again, use the notation
$F_X(p+)$ to denote the limit superior of $F_X$ at $p$. That is,
%
\begin{equation*}
        F_X(p+) = \limsup\limits_{x \to p} F_X(x)
\end{equation*}
%
for any $p \in \extR$. Again, use $\set{E}_p$ to be the set of points in
$[-\infty,p]$ where $F_X$ is not continuous. That is,
%
\begin{equation*}
        \set{E}_p
        \triangleq
        \{ x \in [-\infty,p] : F_X(x+) \neq F_X(x) \}
\end{equation*}
%
It can be shown that the Lebesgue measure of $\set{E}_p$ is zero; the
set $\set{E}_p$ is Lebesgue null.
\symdef[]{Iprob.56}{pdf}{$f_X(x)$}{Probability density function for
random variable $X$ (\ie, $F_X(a) = \int_{-\infty}^a f_X(x) \total
x$)}Now denote the \emph{probability density function for random
variable $X$} as the function $f_X: \extR \mapsto [0,\infty]$ defined by
%
\begin{align*}
        f_X(x)
        &\triangleq
        F_X'(x) 
        + 
        \sum_{p \in \set{E}_x} (F_X(p+)-F_X(p)) \delta(x-p)\\
        &=
        F_X'(x) 
        + 
        \sum_{p \in \set{E}_x} \Pr(X=p) \delta(x-p)
\end{align*}
%
for all $x \in \extR$. While $F_X'$ may not exist at all points in
$\extR$, it can somewhat arbitrarily be defined on those points. It can
be shown that $f_X$ is a measurable function, and so 
%
\begin{equation*}
        \Pr( X \leq a )
        =
        F_X(a)
        =
        \int_{-\infty}^a f_X(x) \total x
\end{equation*}
%
Therefore, if either $F_X$ or $f_X$ is specified for a random variable,
the probability of the preimages generated can be calculated easily.

\paragraph{Omission of Domain and Codomain in Notation:} Notice that
$\extR$ is the domain of all cumulative distribution and probability
density functions. Because of this, the codomain of any random variable
should technically always be $\extR$. Additionally, the codomain (and,
in fact, range) of any cumulative distribution function will be $[0,1]$
and the codomain of any probability density function can safely be taken
to be $\extR$. Finally, the domain of any random variable associated
with a given probability space should be clear. Therefore, if a
probability space is given, the domain and codomain of any random
variable, cumulative distribution, or probability density function may
be omitted. For example, for a random experiment modeled by probability
space $(\set{U},\Sigma,\Pr)$, it is sufficient to declare a random
variable $X$ with distribution $F_X$ and density $f_X$.

\paragraph{Statistical Independence of Events:} Take a random experiment
modeled by probability space $(\set{U},\Sigma,\Pr)$. Take $\set{N}
\subseteq \N$ and family $(\set{A}_n)_{n \in \set{N}}$ such that
$\set{A}_i \in \Sigma$ for all $i \in \set{N}$. To say that the family
of events $(\set{A}_n)_{n \in \set{N}}$ are \emph{pairwise independent}
means that it is the case that for any $i,j \in \set{N}$ such that $i
\neq j$,
%
\begin{equation*}
        \Pr\left(\set{A}_i \cap \set{A}_j\right)
        =
        \Pr\left(\set{A}_i\right)
        \Pr\left(\set{A}_j\right)
\end{equation*}
%
To say that the family of events $(\set{A}_n)_{n \in \set{N}}$ are
\emph{mutually independent} means that
%
\begin{equation*}
        \Pr\left(
        \bigcap \left\{ \set{A}_i : i \in \set{N} \right\}
        \right)
        =
        \prod\limits_{i \in \set{N}} \Pr( \set{A}_i )
\end{equation*}
%
Of course, mutual independence implies pairwise independence.
Additionally, these events could be generated as preimages of random
variables. Below we will define statistical independence for random
variables by doing exactly that.

\paragraph{Conditional Probabilities:} Take a random experiment modeled
by probability space $(\set{U},\Sigma_\set{U},\Pr)$ and a random
variable $X: \set{U} \mapsto \extR$. Take a set $\set{B} \in \Sigma$. Of
course, this set has probability $\Pr(\set{B})$ and
%
\begin{equation*}
        \Pr(\set{B}) \leq 1
\end{equation*}
%
For simplicity, we implicitly assume that $\Pr(\set{B}) > 0$; however, a
more rigorous development would not require this. Now, assume that it is
\emph{given} that outcomes for this experiment will come from this set.
In this case, we define a new experiment modeled by probability space
$(\set{B},\Sigma_\set{B},\Pr|_{\set{B}})$ where $\Pr|_{\set{B}}$ is
defined by
%
\begin{equation*}
        \Pr|_{\set{B}}(\set{E})
        \triangleq
        \frac{ \Pr(\set{E} \cap \set{B}) }{ \Pr(\set{B}) }
\end{equation*}
%
for all $\set{E} \in \Sigma_\set{B}$. Note that
%
\begin{equation*}
        \Pr|_{\set{B}}(\set{B})
        =
        \frac{ \Pr(\set{B} \cap \set{B}) }{ \Pr(\set{B}) }
        =
        \frac{ \Pr(\set{B}) }{ \Pr(\set{B}) }
        =
        1
\end{equation*}
%
which is expected since $\Pr|_{\set{B}}$ is defined to be a probability
measure on $\set{B}$. For simplicity, use the notation
%
\begin{equation*}
        \Pr( \set{E} | \set{B} )
        \triangleq
        \Pr|_{\set{B}}(\set{E})
        =
        \frac{ \Pr(\set{E} \cap \set{B}) }{ \Pr(\set{B}) }
\end{equation*}
%
for all $\set{E} \in \Sigma_\set{B}$; $\Pr(\set{E}|\set{B})$ is called
the \emph{conditional probability} of $\set{E}$ \emph{given} $\set{B}$.
Note that if a random variable $\set{A}: \set{U} \mapsto \extR$ with
$\Pr(\set{A}) > 0$, $\set{A}$ and $\set{B}$ are statistically
independent events if and only if
%
\begin{equation*}
        \Pr( \set{A} | \set{B} )
        =
        \Pr( \set{A} )
        \quad \text{ or, equivalently, } \quad
        \Pr( \set{B} | \set{A} )
        =
        \Pr( \set{B} )
\end{equation*}
%
In other words, two events are independent if the probability of one
event is not affected by the condition that the other event is certain.
In geometric terms, the fraction of the universal probability space
filled by one event matches the fraction of some subset of that
probability space. Note that conditional probabilities can be used with
random variables as well; that is, random variables can be used to
specify the events. For example, for any $a,b \in \extR$,
%
\begin{equation*}
        \Pr( X \leq a | X < b )
        =
        \frac{ \Pr(\{X \leq a\} \cap \{X < b\}) }{ \Pr(\{X < b\}) }
\end{equation*}
%
Take $Y: \set{U} \mapsto \extR$ to be another random variable for the
original process. It may be used to specify given conditions. For
example, for any $a,b,c \in \extR$,
%
\begin{equation*}
        \Pr( X \leq a | X < b, Y = c )
        =
        \frac{\Pr(\{X \leq a \} \cap \{ X < b \} \cap \{ Y = c \})}%
             {\Pr(\{ X < b \} \cap \{ Y = c \})}
\end{equation*}

\paragraph{Memorylessness:} Take a random experiment modeled by
probability space $(\set{U},\Sigma,\Pr)$ and a random variable $X:
\set{U} \mapsto \extR$. To say that $X$ is \emph{memoryless} or has the
\emph{memoryless property} means that for any $a,b \in \R_{>0}$,
%
\begin{equation*}
        \Pr(X > a + b | X > b) = \Pr(X > a)
\end{equation*}

\paragraph{Functions of Random Variables:} Take a random experiment
modeled by probability space $(\set{U},\Sigma,\Pr)$ and a random
variable $X: \set{U} \mapsto \extR$. Define another Borel measurable
function $f: \extR \mapsto \extR$. Denote the composition $f \comp X$ as
function $Y: \set{U} \mapsto \extR$; that is, define $Y$ by
%
\begin{equation*}
        Y(\zeta) \triangleq f(X(\zeta))
\end{equation*}
%
for all $\zeta \in \set{U}$. The function $Y$ is another random
variable. In fact, $Y$ will often be denoted as $f(X)$ (\ie, $Y =
f(X)$). 

\paragraph{Exclusion of Outcome in Notation:} Take a random experiment
modeled by probability space $(\set{U},\Sigma,\Pr)$. Also let  $X:
\set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$ be random
variables. The notation
%
\begin{equation*}
        \{ X = Y \}
        =
        \{ \zeta \in \set{U} : X(\zeta) = Y(\zeta) \}
\end{equation*}
%
However, some authors will use $X = Y$ to denote $\{X=Y\}$ instead. For
example, to say \emph{$X = Y$ with probability 1} means that 
%
\begin{equation*}
        \Pr(X=Y) 
        = 
        \Pr( \{ \zeta \in \set{U} : X(\zeta) = Y(\zeta) \} ) 
        = 1
\end{equation*}
%
However, the statement that $X = Y$ might denote that \emph{for any
$\zeta \in \set{U}$, $X(\zeta) = Y(\zeta)$}. In this case, $X = Y$ is a
statement about the functional form of $X$ and $Y$ and not about the
preimages that they induce. Our convention is to use curly braces around
preimages (\eg, $\{X = Y\}$) whenever measurable preimages need to be
generated. Thus, if curly braces are not being used and a particular
$\zeta \in \set{U}$ has not been identified, we mean that the functional
expression holds for all $\zeta \in \set{U}$.

\paragraph{Expectation of a Random Variable:} Take a random experiment
modeled by probability space $(\set{U},\Sigma,\Pr)$ and a random
variable $X: \set{U} \mapsto \extR$. Define another Borel measurable
function $g: \extR \mapsto \extR$ and denote the composition $g \comp X$
as function $Y: \set{U} \mapsto \extR$ defined by $Y(\zeta) \triangleq
g(X(\zeta))$ for all $\zeta \in \set{U}$. The \emph{expectation of $Y$}
is denoted $\E(Y)$ or
\symdef{Iprob.61}{expectationgX}{$\E(g(X))$}{Expectation of function $g$
of random variable $X$ (\ie, \linebreak[4] $\int_{-\infty}^\infty g(x)
f_X(x) \total x$)} is defined
%
\begin{equation*}
        \E(g(X))
        \triangleq
        \int_{-\infty}^\infty  
        g(x)
        f_X(x) 
        \total x
\end{equation*}
%
where $f_X$ is the probability density function of random variable $X$.
Note that this implies
%
\begin{equation*}
        \E(X)
        \triangleq
        \int_{-\infty}^\infty  
        x
        f_X(x) 
        \total x
\end{equation*}
%
where \symdef{Iprob.60}{expectationX}{$\E(X)$}{Expectation of random
variable $X$ (\ie, \linebreak[4] $\int_{-\infty}^\infty x f_X(x) \total
x$)} is called the \emph{expectation of $X$}, which is a
\emph{first-order statistic} of $X$. This is sometimes called the
\emph{average} or \emph{mean} of random variable $X$; however, it should
not be confused with other non-random uses of those terms.
Additionally,
%
\begin{equation*}
        \E(X^2)
        \triangleq
        \int_{-\infty}^\infty  
        x^2
        f_X(x) 
        \total x
\end{equation*}
%
where $\E(X^2)$ is called the \emph{second moment} of random variable
$X$, which is one of its \emph{second-order statistics}.

\paragraph{Linearity of Expectation:} Take a random experiment
modeled by probability space $(\set{U},\Sigma,\Pr)$, random
variables $X: \set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$, and
$a,b \in \R$. It is easy to show that
%
\begin{equation*}
        \E( a X + b Y ) = a \E(X) + b \E(Y)
\end{equation*}
%
That is, the expectation is \emph{linear}. Additionally, assume that $c
\in \extR$. Trivially, $c$ is a random variable. Therefore, 
%
\begin{equation*}
        \E( c ) = c
\end{equation*}
%
Of course, $\E(X) \in \extR$. Therefore,
%
\begin{equation*}
        \E( \E(X) ) = \E(X)
\end{equation*}
%
Thus,
%
\begin{equation*}
        \E( X - \E(X) ) = \E( X ) - \E( \E(X) ) = \E(X) - \E(X) = 0
\end{equation*}

\paragraph{Variance of a Random Variable:} Take a random experiment
modeled by probability space $(\set{U},\Sigma,\Pr)$ and a random
variable $X: \set{U} \mapsto \extR$. The \emph{variance of $X$} or the
\emph{second central moment of $X$} is denoted $\var(X)$ or
\symdef{Iprob.62}{varianceX}{$\var(X)$}{Variance of random variable $X$
(\ie, $\var(X) = \E(X^2) - \E(X)^2$)} is defined
%
\begin{equation*}
        \var(X)
        \triangleq
        \E( (X - \E(X))^2 )
        =
        \int_{-\infty}^\infty  
        (x - \E(X))^2 
        f_X(x) 
        \total x
\end{equation*}
%
where $f_X$ is the probability density function of random variable $X$.
Note that this implies
%
\begin{equation*}
        \var(X)
        =
        \E(X^2) - \E(X)^2
\end{equation*}
%
which is a useful property of the variance. Note that this implies
%
\begin{equation*}
        \E(X^2)
        =
        \var(X) + \E(X)^2
\end{equation*}
%
The variance of $X$ is one of its \emph{second-order statistics}.

\paragraph{Properties of Variance:} Take a random experiment modeled by
probability space $(\set{U},\Sigma,\Pr)$, random variables $X: \set{U}
\mapsto \extR$, and $a,b \in \R$. It is easy to show that
%
\begin{equation*}
        \var( a X + b ) = a^2 \var( X )
\end{equation*}
%
This implicitly uses the fact that
%
\begin{equation*}
        \var( b ) = 0
\end{equation*}
%
In fact,
%
\begin{equation*}
        \var( \var(X) ) = 0
        \quad \text{ and } \quad
        \var( \E(X) ) = 0
        \quad \text{ and } \quad
        \E( \var(X) ) = \var(X)
\end{equation*}
%
Now take additional random variable $Y: \set{U} \mapsto \extR$. It is
the case that
%
\begin{align*}
        \var( a X + b Y ) 
        &= a^2 \var(X) + b^2 \var(Y) + 2ab \E( (X-\E(X))(Y-\E(Y)) )\\
        &= a^2 \var(X) + b^2 \var(Y) + 2ab ( \E(XY) - \E(X)\E(Y) )\\
        &= a^2 \var(X) + b^2 \var(Y) + 2ab \E(XY) - 2ab \E(X)\E(Y)
\end{align*}
%
where $\E( (X-\E(X))(Y-\E(Y)) )$ is sometimes called the
\emph{covariance of $X$ and $Y$} and is denoted
\symdef{Iprob.63}{covarianceXY}{$\cov(X,Y)$}{Covariance of random
variables $X$ and $Y$ (\ie, $\cov(X,Y) = \E(XY) - \E(X)\E(Y)$)}. That
is,
%
\begin{equation*}
        \cov(X,Y) 
        \triangleq \E( (X-\E(X)) (Y-\E(Y)) )
        = \E(XY) - \E(X)\E(Y)
\end{equation*}

\subsection{Relationship Between Random Variables}

It is common for two probability spaces to be related to each other.
That is, it is common for an experiment to generate multiple random
variables that are related through the experiment. Examples of this have
already been give. For example, take an experiment modeled by
probability space $(\set{U},\Sigma,\Pr)$ and random variables $X:
\set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$ and $a,b \in \R$.
As we have discussed, it is the case that
%
\begin{equation*}
        \E( a X + b Y ) = a \E(X) + b \E(Y)
\end{equation*}
%
and
%
\begin{equation*}
        \var( a X + b Y ) = a^2 \var(X) + b^2 \var(Y) + 2ab \cov(X,Y)
\end{equation*}
%
Of course, these results will generalize to any finite collection of
random variables.

\paragraph{Identically Distributed Random Variables:} Take an experiment
modeled by probability space $(\set{U},\Sigma,\Pr)$.  Also take $\set{N}
\subseteq \N$ and the family $(X_n)_{n \in \set{N}}$ where $X_i: \set{U}
\mapsto \extR$ is a random variable for each $i \in \set{N}$. For
example, these random variables could represent successive \emph{trials}
of the same experiment. Now assume that for any $i \in \set{N}$ with $i
\neq j$,
%
\begin{equation*}
        f_{X_i}(x) = f_{X_j}(x)
\end{equation*}
%
for all $x \in \extR$ (\ie, all random variables have the same
distributions). In this case, the random variables are said to be
\emph{identically distributed}.

\paragraph{Joint Distributions and Densities:} Take an experiment
modeled by probability space $(\set{U},\Sigma,\Pr)$ and random variables
$X: \set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$ and $a,b \in
\R$.  Consider the events generated by
%
\begin{equation*}
        \{ X \leq a \}
        \quad \text{ and } \quad
        \{ Y \leq b \}
\end{equation*}
%
Of course, both of these events are from the $\sigma$-field $\Sigma$,
and so their intersection is also included in that field. We can
generate that event by taking the intersection of the two statements
above. Thus, it is useful for us to define the notation:
%
\begin{equation*}
        \{ X \leq a, Y \leq b \}
        \triangleq
        \{ X \leq a \}
        \cap
        \{ Y \leq b \}
\end{equation*}
%
\symdef[]{Iprob.65}{jcdf}{$F_{XY}(x,y)$}{Joint distribution function for
random variables $X$ and $Y$ (\ie, $F_{XY}(a,b) \triangleq \Pr(X \leq a,
Y \leq b)$)}Now, we can define the \emph{joint distribution} $F_{XY}:
\extR \times \extR \mapsto [0,1]$ as
%
\begin{equation*}
        F_{XY}(x,y) \triangleq P( X \leq x, Y \leq y )
\end{equation*}
%
for all $x,y \in \extR$. Recall how Dirac delta functions were
introduced in the construction of a density function.
\symdef[]{Iprob.66}{jpdf}{$f_{XY}(x,y)$}{Joint density function for
random variables $X$ and $Y$}Through a similar process, we can introduce
a \emph{joint density function} $f_{XY}: \extR \times \extR \mapsto
[0,\infty]$ such that
%
\begin{equation*}
        F_{XY}(a,b)
        =
        \int_{-\infty}^b \int_{-\infty}^a f(x,y) \total x \total y
\end{equation*}

\paragraph{Conditional Random Variables:} Take an experiment modeled by
probability space $(\set{U},\Sigma,\Pr)$ and random variables $X:
\set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$ and $a,b \in \R$.
Assume that the experiment is changed so that it is given that $X = a$.
As with the definition of conditional events above, we can define a new
probability space $(\{X = a\},\Sigma_{\{X=a\}},\Pr|_{\{X=a\}})$ where
the notation
%
\begin{equation*}
        \Pr(\set{E}|X=a)
        \triangleq
        \Pr|_{\{X=a\}}(\set{E})
\end{equation*}
%
for all $\set{E} \in \Sigma_{\{X=a\}}$. Thus, we can define the
\emph{conditional density function} of $Y$ \emph{given} $X=x$
\symdef[]{Iprob.670}{condpdf}{$f_{Y \pipe X}(y \pipe x)$}{Conditional
density function for random variable $Y$ given $X=x$}$f_{Y|X}: \extR
\times \extR \mapsto [0,\infty]$ by
%
\begin{equation*}
        f_{Y|X}(y|x)
        \triangleq
        \frac{ f_{XY}(x,y) }{ f_X(x) }
\end{equation*}
%
which will lead to
%
\begin{equation*}
        F_{YX}(y|x)
        \triangleq
        \Pr( Y \leq y | X=x )
        =
        \int_{-\infty}^y f_{Y|X}(y|x) \total y
\end{equation*}
%
where \symdef[]{Iprob.671}{condcdf}{$F_{Y \pipe X}(y \pipe
x)$}{Conditional distribution function for random variable $Y$ given
$X=x$}$F_{Y|X}: \extR \times \extR \mapsto [0,1]$ is the
\emph{conditional distribution function} of random variable $Y$
\emph{given} $X=x$. Similarly,
%
\begin{equation*}
        f_{X|Y}(x|y)
        \triangleq
        \frac{ f_{XY}(x,y) }{ f_Y(y) }
\end{equation*}
%
which can be used in a similar way to generate conditional distribution
function $F_{XY}$.

\paragraph{Conditional Expectation:} Take an experiment modeled by
probability space $(\set{U},\Sigma,\Pr)$ and random variables $X:
\set{U} \mapsto \extR$ and $Y: \set{U} \mapsto \extR$. We can define the
\emph{conditional expectation} of $Y$ given $X=x$ as
%
\begin{equation*}
        \E(Y|X=x)
        \triangleq
        \int_{-\infty}^\infty
        y f(y|x) \total y
\end{equation*}
%
Note that this is a function of $x$.
\symdef[]{Iprob.68}{condexp}{$\E(Y \pipe X)$}{Conditional expectation of
$Y$ given $X$}Therefore, use the notation
%
\begin{equation*}
        \E(Y|X) \triangleq \E(Y|X=X)
\end{equation*}
%
to represent a new random variable generated from the composition of
$\E(Y|X=x)$ and $X$. This is called the \emph{conditional expectation of
$Y$ given $X$}. It is the case that
%
\begin{equation*}
        \E(\E(Y|X)) = \E(Y)
        \quad \text{ and } \quad
        \E(\E(X|Y)) = \E(X)
\end{equation*}
%
In fact, for measurable functions $g: \extR \mapsto \extR$ and $h: \extR
\mapsto \extR$,
%
\begin{equation*}
        \E( g(X) h(Y) )
        =
        \E( \E( g(X) h(Y) | Y ) )
        =
        \E( h(Y) \E( g(X) | Y ) )
\end{equation*}
%
This is a useful fact. Note that it implies
%
\begin{equation}
        \E( X Y )
        =
        \E( Y \E( X | Y ) )
        \quad \text{ and } \quad
        \E( X )
        =
        \E( \E( X | Y ) )
        \label{eq:expectation_to_condexp}
\end{equation}
%
These two relationships can be especially useful if the range of $Y$ is
countable or finite.

\paragraph{Uncorrelated Random Variables:} Take an
experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and
random variables $X: \set{U} \mapsto \extR$ and $Y: \set{U} \mapsto
\extR$. To say $X$ and $Y$ are \emph{uncorrelated} means that their
covariance is zero (\ie, $\cov(X,Y) = \cov(Y,X) = 0$). Equivalently, to
say $X$ and $Y$ are uncorrelated means that
%
\begin{equation*}
        \E(XY) = \E(X) \E(Y)
\end{equation*}

\paragraph{Statistically Independent Random Variables:} Take an
experiment modeled by probability space $(\set{U},\Sigma,\Pr)$ and
random variables $X: \set{U} \mapsto \extR$ and $Y: \set{U} \mapsto
\extR$ and $a,b \in \R$. Also take some $\set{E}_X,\set{E}_Y \in
\Borel(\extR)$. Random variables $X$ and $Y$ are said to be
\emph{(statistically) independent} or \emph{(statistically) pairwise
independent} if
%
\begin{equation*}
        \Pr( X \in \set{E}_X, Y \in \set{E}_Y )
        =
        \Pr( X \in \set{E}_X )
        \Pr( Y \in \set{E}_Y )
\end{equation*}
%
In fact, it can be shown that $X$ and $Y$ are statistically independent
if and only if
%
\begin{equation*}
        F_{XY}(x,y) = F_X(x) F_Y(y)
        \quad \text{ or } \quad
        f_{XY}(x,y) = f_X(x) f_Y(y)
\end{equation*}
%
for all $x,y \in \extR$. Note that the condition that $f_{XY}(x,y) =
f_X(x) f_Y(y)$ for all $x,y \in \extR$ is equivalent to requiring that
%
\begin{equation*}
        f_{X|Y}(x|y) = f_X(x)
        \quad \text{ and } \quad
        f_{Y|X}(y|x) = f_Y(y)
\end{equation*}
%
In other words, this is also equivalent to statistical independence.
Now, assume that $X$ and $Y$ are statistically independent. Also take
$g: \extR \mapsto \extR$ and $h: \extR \mapsto \extR$ are two measurable
functions. It is the case that
%
\begin{equation*}
        \E( g(X) h(Y) ) = \E( g(X) ) \E( h(Y) )
\end{equation*}
%
In fact, 
%
\begin{equation*}
        \E( X Y ) = \E(X) \E(Y)
\end{equation*}
%
Therefore, statistical independence implies uncorrelatedness. Note,
however, that the converse is not necessarily true. Additionally,
because these two random variables are uncorrelated (since they are
statistically independent),
%
\begin{equation*}
        \var(X + Y) = \var(X) + \var(Y)
\end{equation*}
%
Now take random variable $Z: \set{U} \mapsto \extR$ defined by
%
\begin{equation*}
        Z(\zeta) \triangleq X(\zeta) + Y(\zeta)
\end{equation*}
%
for all $\zeta \in \set{U}$. If $X$ and $Y$ are independent random
variables, it can be shown for all $z \in \extR$,
%
\begin{equation*}
        f_Z(z) = f_X(x) * f_Y(y)
\end{equation*}
%
where $*$ denotes convolution, which is discussed in
\longref{app:math_convolution}.

\paragraph{Pairwise Independent Random Variables:} Take an experiment
modeled by probability space $(\set{U},\Sigma,\Pr)$. Also take $\set{N}
\subseteq \N$ and the family $(X_n)_{n \in \set{N}}$ where $X_i: \set{U}
\mapsto \extR$ is a random variable for each $i \in \set{N}$. Assume
that for any family $(a_n)_{n \in \set{N}}$ such that $a_n \in \R$ for
all $n \in \set{N}$, for any $i,j \in \set{N}$ with $i \neq j$,
%
\begin{equation*}
        \Pr\left(\{ X_i \leq a_i \} \cap \{ X_j \leq a_j \}\right)
        =
        \Pr\left(\{ X_i \leq a_i \}\right)
        \Pr\left(\{ X_j \leq a_j \}\right)
\end{equation*}
%
These random variables are said to be \emph{pairwise independent}. This
is equivalent to the statistical independence described above.

\paragraph{Mutually Independent Random Variables:} Take an experiment
modeled by probability space $(\set{U},\Sigma,\Pr)$. Also take $\set{N}
\subseteq \N$ and the family $(X_n)_{n \in \set{N}}$ where $X_i: \set{U}
\mapsto \extR$ is a random variable for each $i \in \set{N}$. Assume
that for any family $(a_n)_{n \in \set{N}}$ such that $a_n \in \R$ for
all $n \in \set{N}$,
%
\begin{equation*}
        \Pr\left(
        \bigcap \left\{ \{ X_i \leq a_i \} : i \in \set{N} \right\}
        \right)
        =
        \prod\limits_{i \in \set{N}} \Pr( \{ X_i \leq a_i \} )
\end{equation*}
%
These random variables are said to be \emph{mutually independent}. Note
that any collection of mutually independent random variables are
necessarily pairwise independent as well.

\paragraph{Independent and Identically Distributed Random Variables:}
Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$.
Also take $\set{N} \subseteq \N$ and the family $(X_n)_{n \in \set{N}}$
where $X_i: \set{U} \mapsto \extR$ is a random variable for each $i \in
\set{N}$. If these random variables all have the same distribution (\ie,
they are identically distributed) and are all \emph{mutually}
independent, they are said to be
\emph{\acro[\defarg][IID]{\iid}{independent and identically
distributed}}. 

\subsection{Random Vectors}

Take $n \in \N$ and an experiment modeled by probability space
$(\set{U},\Sigma,\Pr)$. Now take an indexed family $(X_i)_{i=1}^n$ where
$X_i: \set{U} \mapsto \extR$ is a random variable for all $i \in
\{1,2,\dots,n\}$. Denote the $n$-tuple $(X_1,X_2,X_3,\dots,X_n)$ by
$\v{X}$. Thus, $\v{X}: \set{U} \mapsto \extR^n$ is called a \emph{random
vector} or an \emph{$n$-dimensional random vector}. Of course, if $n=1$
then the random vector is simply a random variable (which may also be
called a one dimensional random vector).

\subsection{Common Random Variables}

There are a number of common random variables used in applications. We
define a few here. For each of these, take an experiment modeled by
probability space $(\set{U},\Sigma,\Pr)$. The random variable being
defined is function $X: \set{U} \mapsto \extR$.
%
\begin{description}
        \item\emph{The Constant Function:} Assume that there exists some
                $c \in \extR$ such that $X(\zeta)=c$ for all $\zeta \in
                \set{U}$. That is, $X$ is \emph{constant}. Clearly, its
                probability density function is
                %
                \begin{equation*}
                        f_X( x ) = \delta(x-c)
                \end{equation*}
                %
                That is, all of the mass that is its probability is
                concentrated on $\{ X = c \}$. That is,
                $\{X=c\}=\set{U}$ and so $\Pr( X=c ) = 1$ trivially.
                Notice that
                %
                \begin{equation*}
                        \E( X ) = c 
                        \quad \text{ and } \quad 
                        \var(X) = 0
                \end{equation*}
        \item\emph{The Bernoulli Random Variable:} Take some $p \in
                [0,1]$. Assume $X$ is a \emph{Bernoulli random
                variable}. This means that $X$ has the probability
                density function
                %
                \begin{equation*}
                        f_X( x ) = (1-p) \delta(x) + p \delta(x-1)
                \end{equation*}
                %
                Clearly, $\range(X)=\{0,1\}$ and so $\{ X = 0 \} \cup
                \{X = 1\} = \set{U}$. In particular,
                %
                \begin{equation*}
                        \Pr( X = 0 ) = 1-p
                        \quad \text{ and } \quad
                        \Pr( X = 1 ) = p
                \end{equation*}
                %
                Notice that
                %
                \begin{equation*}
                        \E( X ) = p 
                        \quad \text{ and } \quad 
                        \var(X) = p(1-p)
                \end{equation*}
                %
                If $(X_N)$ is a sequence of \iid{}\ Bernoulli random
                variables then $X_N$ is called a \emph{Bernoulli trial}
                for each $N \in \N$. The Bernoulli random variable can
                be viewed as a weighted coin flip (\ie, $\set{U} = \{
                \text{heads}, \text{tails} \}$), where the event $\{X =
                0\} = \{ \text{tails} \}$ and the event $\{X=1\} = \{
                \text{heads} \}$.  If its parameter $p=0.5$ then the
                outcome is equally likely to be \emph{heads} or
                \emph{tails}; if its parameter is $p=0.80$ then there is
                a much greater chance that the outcome will be
                \emph{heads}.

                Take some random variable $Y: \set{U} \mapsto \extR$
                such that
                %
                \begin{equation*}
                        f_{Y|X}(y|1) = f_{Y}(y)
                \end{equation*}
                %
                Note that this is a weak kind of statistical
                independence. It implies that $\E(Y|X=1)=\E(Y)$. By the
                definition of a Bernoulli random variable, it is then
                necessary that $X$ and $Y$ are uncorrelated (\ie,
                $\E(XY)=\E(X)\E(Y)$).

                Now, take some $n \in \N$. Notice that for all $\zeta
                \in \set{U}$, $X^n(\zeta) = X(\zeta)$. Therefore, since
                $X$ is a Bernoulli random variable with parameter $p$,
                $X^n$ is also a Bernoulli random variable with parameter
                $p$ for all $n \in \N$. Thus, 
                %
                \begin{equation*}
                        \E( X^n ) = p
                        \quad \text{ and } \quad 
                        \var( X^n ) = p(1-p)
                \end{equation*}
                %
                In fact, any statistical properties endowed to $X$ will
                be inherited by $X^n$. For example, for a random
                variable $Y: \set{U} \mapsto \extR$ such that $X$ and
                $Y$ are uncorrelated, it is also the case that $X^n$ and
                $Y$ are uncorrelated (\ie, if $\E(XY)=p \E(Y)$ then
                $\E(X^n Y)=\E(XY)=p \E(Y)$). This is a special property
                of Bernoulli random variables.
        \item\emph{The Poisson Random Variable:} Take $\lambda \in
                \R_{>0}$.  Assume $X$ is an \emph{Poisson random
                variable}. This means that $X$ has the probability
                density function
                %
                \begin{align*}
                        f_X( x ) 
                        &= 
                        \begin{cases}
                                \frac{ \exp(-\lambda) \lambda^x }{ x! }
                                &\text{if } x \in \W\\
                                0 &\text{otherwise}
                        \end{cases}\\
                        &=
                        \sum\limits_{k=0}^\infty
                        \frac{ \exp(-\lambda) \lambda^k }{ k! }
                        \delta( x - k )
                \end{align*}
                %
                Clearly, $\range(X)=\W$. Notice that
                %
                \begin{equation*}
                        \E( X ) = \lambda
                        \quad \text{ and } \quad 
                        \var(X) = \lambda
                \end{equation*}
                %
                Such a random variable is said to be \emph{Poisson
                distributed} or \emph{Poissonian}.
        \item\emph{The Continuous Uniform Random Variable:} Take $a,b
                \in \R$ with $a < b$. Assume $X$ is a \emph{continuous
                uniform random variable}.  This means that $X$ has the
                probability density function
                %
                \begin{equation*}
                        f_X( x ) 
                        = 
                        \begin{cases}
                                \frac{1}{b-a} &\text{if } x \in [a,b]\\
                                0 &\text{otherwise}
                        \end{cases}
                \end{equation*}
                %
                Clearly, $\range(X)=[0,1]$. Notice that
                %
                \begin{equation*}
                        \E( X ) = \frac{a + b}{2}
                        \quad \text{ and } \quad 
                        \var(X) = \frac{ (b-a)^2 }{12}
                \end{equation*}
                %
                Such a random variable is said to be \emph{uniformly
                distributed} on $[a,b]$.
        \item\emph{The Exponential Random Variable:} Take $\lambda \in
                \R_{>0}$.  Assume $X$ is an \emph{exponential random
                variable}. This means that $X$ has the probability
                density function
                %
                \begin{equation*}
                        f_X( x ) 
                        = 
                        \begin{cases}
                                \lambda \exp( -\lambda x ) 
                                &\text{if } x \geq 0\\
                                0 &\text{if } x < 0
                        \end{cases}
                \end{equation*}
                %
                Clearly, $\range(X)=[0,\infty)$. Notice that
                %
                \begin{equation*}
                        \E( X ) = \frac{1}{\lambda}
                        \quad \text{ and } \quad 
                        \var(X) = \frac{1}{\lambda^2}
                \end{equation*}
                %
                Note that for all $a,b \in \R_{>0}$,
                %
                \begin{equation*}
                        \Pr(X > a + b | X > b) = \Pr(X > a)
                \end{equation*}
                %
                Therefore, this random variable has the \emph{memoryless
                property}. A random variable with this distribution is
                said to be \emph{exponentially distributed}.
        \item\emph{The Erlang Random Variable:} Take $\lambda \in
                \R_{>0}$ and $k \in \N$. Assume $X$ is an \emph{Erlang
                random variable}.  This means that $X$ has the
                probability density function
                %
                \begin{equation*}
                        f_X( x )
                        = 
                        \begin{cases}
                                \frac{\lambda(\lambda x)^{k-1}\exp(-\lambda x)}{(k-1)!}
                                &\text{if } x \geq 0\\
                                0 &\text{if } x < 0
                        \end{cases}
                \end{equation*}
                %
                Clearly, $\range(X)=[0,\infty)$. Notice that
                %
                \begin{equation*}
                        \E( X ) = \frac{k}{\lambda}
                        \quad \text{ and } \quad 
                        \var(X) = \frac{k}{\lambda^2}
                \end{equation*}
                %
                Such a random variable is said to be \emph{Erlang
                distributed} or \emph{Erlang-$k$ distributed}.
\end{description}

\section{Random Processes}
\label{app:probability_rp}

Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$
and some $n \in \N$. Now take a totally ordered set $\set{A}$ and a net
\symdef[]{Iprob.70}{randomprocess}{$( \v{N}(t) : t \in \R_{\geq
0})$}{Random process (\ie, $\v{N}(t)$ is a random vector for all $t \in
\R_{>0}$)}$(X(t) : t \in \set{A})$ such that $\v{X}(t): \set{U} \mapsto
\extR^n$ is an $n$-dimensional random vector for all $t \in \set{A}$.
This is known as an \emph{($n$-dimensional) stochastic process} or an
\emph{($n$-dimensional) random process}. 

\subsection{Continuous and Discrete Time Processes}

Take an experiment modeled by probability space $(\set{U},\Sigma,\Pr)$.
Assume that this experiment runs over some period of time. Therefore, at
each instant of time, the experiment can be viewed as having outcomes.
Assume that the outcomes are characterized by $n \in \N$ random
variables. The experiment's time may be viewed in two distinct ways
%
\begin{description}
        \item\emph{Continuous Time:} Time ranges over a continuum of
                values taken from $\R_{\geq 0}$ (or, more generally, an
                uncountable subset of $\R_{\geq 0}$). That is, any $t
                \in \R_{\geq 0}$ is of interest. In this case, we can
                bundle those $n$ random variables into a random vector
                $\v{X}(t): \set{U} \mapsto \extR^n$ where $t \in
                \R_{\geq 0}$ is an instant of time. Therefore, $(
                \v{X}(t): t \in \R_{\geq 0})$ is called an
                \emph{($n$-dimensional) continuous-time random process}
                (\ie, the process is a net but not a sequence).
        \item\emph{Discrete Time:} Time ranges over a countable set of
                values taken from $\N$ (or, more generally, some
                countable set isomorphic to $\N$). That is, any $t \in
                \N$ is of interest. In this case, we say that time has
                been \emph{discretized} and we can bundle those $n$
                random variables into a random vector $\v{X}(t): \set{U}
                \mapsto \extR^n$ where $t \in \N$ is an instant of time
                that comes immediately after instant $(t-1)$. Therefore,
                $( \v{X}(t): t \in \N)$ or simply $(\v{X}(t))$ is called
                an \emph{($n$-dimensional) discrete-time random process}
                (\ie, the process is a sequence).
\end{description}
%
In both cases, each time might be viewed as a different \emph{trial} of
a particular random variable, where a continuous-time random process is
the limit as the density of trials (with respect to some interesting
outcome) increases.

\paragraph{Markov Processes and Chains:} Take an experiment modeled by
probability space $(\set{U},\Sigma,\Pr)$ and some $n \in \N$. Also take
an $n$-dimensional random process $(\v{X}(t) : t \in \set{T})$ on this
probability space where $\set{T} \subseteq \R$ (\ie, this may be a
continuous-time or a discrete-time process). Additionally, take
$\v{x}(t): \R \mapsto \extR^n$ to be some function of time and $y \in
\extR^n$ to be some constant. Assume that it is the case that for any $t
\in \R_{\geq 0}$ and any $h \in \R_{>0}$,
%
\begin{equation*}
        \Pr( \v{X}(t+h)=y 
        | \v{X}(s) = \v{x}(s) \text{ for all} s \leq t )
        =
        \Pr( \v{X}(t+h)=y | \v{X}(t) = \v{x}(t) )
\end{equation*}
%
That is, given the current state of the process, knowledge of any of the
past states of the process makes no impact on the probability of the
future states of the process. It might be said that this process has no
memory since its future trajectory depends only on its present state and
not any of its past states. This is known as the \emph{Markov property}
and such a process is called a \emph{Markov process}. If this is a
discrete-time random process, it will be called a \emph{Markov chain}.

\subsection{Sure and Almost Sure Stochastic Convergence}

Take a totally ordered set $\set{A} \subseteq \extR$ such that $\infty$
is a limit point of $\set{A}$ in $\extR$ (\eg, $\set{A} = \N$ or
$\set{A} = \R$). Also take an experiment modeled by probability space
$(\set{U},\Sigma,\Pr)$. With these, define a random process $( Y_t : t
\in \set{A} )$ where $Y_t: \set{U} \mapsto \extR$ is a random variable
(\ie, a one dimensional random vector) for each $t \in \set{A}$. Define
the set $\Omega \subseteq \set{U}$ by
%
\begin{equation*}
        \Omega
        \triangleq
        \{ 
        \zeta \in \set{U} 
        : 
        \text{there exists }
        p \in \extR 
        \text{ such that } 
        Y_t(\zeta) \to p 
        \text{ as } 
        t \to \infty
        \}
\end{equation*}
%
Now define function $Y: \Omega \mapsto \extR$ by
%
\begin{equation*}
        Y(\zeta) \triangleq \lim\limits_{t \to \infty} Y_t(\zeta)
\end{equation*}
%
for all $\zeta \in \Omega$. Note that $\Omega \subseteq \set{U}$, and so
$Y$ may not be a random variable in general. Additionally, even if
$\Omega = \set{U}$, there is no guarantee that $Y$ is a Borel measurable
function. Therefore, $Y$ should simply be viewed as a function with
domain $\Omega$ and codomain $\extR$. Of course, it may be the case that
there exists some $c \in \extR$ such that $Y(\zeta)=c$ for all $\zeta
\in \Omega$; in fact, this is often the case of most interest in
applications. However, here $\Omega$ is of critical interest.
%
\begin{description}
        \item\emph{Sure Convergence:} To say that $( Y_t: t \in
                \set{A})$ \emph{converges surely (to $Y(\zeta)$)} or
                \emph{converges (to $Y(\zeta)$) everywhere} means that
                $\Omega = \set{U}$.
                \symdef[]{Iprob.72}{sureconvergence}{$Y(t) \to
                Y$}{Random process $Y(t)$ converges surely to
                $Y$}\symdef[]{Iprob.7201}{ssureconvergence}{$Y(t)
                \xto{s.} Y$}{Random process $Y(t)$ converges surely to
                $Y$}\symdef[]{Iprob.7202}{slimsureconvergence}{$\lim
                \limits_{t\to\infty} Y(t) = Y$}{Random process $Y(t)$
                converges surely to $Y$}In this case, it is written
                %
                \begin{equation*}
                        Y_t \to Y
                        \quad \text{ or } \quad
                        Y_t \xto{s.} Y
                        \quad \text{ or } \quad
                        \lim\limits_{t \to \infty} Y_t = Y
                \end{equation*}
                %
                and this is called \emph{sure convergence} or
                \emph{everywhere convergence}.
        \item\emph{Almost Sure Convergence:} To say that $( Y_t: t \in
                \set{A} )$ \emph{converges almost surely (to
                $Y(\zeta)$)} or \emph{converges (to $Y(\zeta)$) with
                probability 1} or \emph{converges (to $Y(\zeta)$) almost
                everywhere} as $t \to \infty$ means that $\Pr( \Omega )
                = 1$. \sym{Iprob.7301}{$Y(t) \xto {a.s.} Y$}{Random
                process $Y(t)$ converges almost surely (\ie{},
                $\Pr(\lim_{t \to \infty} Y(t) = Y) = 1$) to
                $Y$}\symdef[]{Iprob.7302}{asureconvergencewp1}{$Y(t)
                \xto{w.p.1} Y$}{Random process $Y(t)$ converges almost
                surely (\ie, with probability 1) to
                $Y$}\symdef[]{Iprob.7303}{asureconvergenceaslim}{$\aslim
                \limits_{t \to \infty} Y(t) = Y$}{Random process $Y(t)$
                converges almost surely (\ie, with probability 1) to
                $Y$}In this case, it is written
                %
                \begin{equation*}
                        Y_t \xto{a.s.} Y
                        \quad \text{ or } \quad
                        Y_t \xto{w.p.1} Y
                        \quad \text{ or } \quad
                        \aslim\limits_{t \to \infty} Y_t = Y
                \end{equation*}
                %
                and this is called \emph{\acro{AS}{almost sure}
                convergence} or \emph{almost everywhere convergence}.
\end{description}

\subsection{Stochastic Convergence to Random Variables}

Take a totally ordered set $\set{A} \subseteq \extR$ such that $\infty$
is a limit point of $\set{A}$ in $\extR$ (\eg, $\set{A} = \N$ or
$\set{A} = \R$). Also take an experiment modeled by probability space
$(\set{U},\Sigma,\Pr)$. With these, define a random process $( Y_t : t
\in \set{A} )$ where $Y_t: \set{U} \mapsto \extR$ is a random variable
(\ie, a one dimensional random vector) for each $t \in \set{A}$.
Additionally, define an additional random variable $Y: \set{U} \mapsto
\extR$. There are four cases of interest.
%
\begin{description}
        \item\emph{Convergence in Probability:}
                \symdef[]{Iprob.7401}{convergenceinp}{$Y(t) \xto{P}
                Y$}{Random process $Y(t)$ converges in probability to
                random variable
                $Y$}\symdef[]{Iprob.7402}{convergenceinpr}{$Y(t)
                \xto{\Pr} Y$}{Random process $Y(t)$ converges in
                probability to random variable $Y$}To say $(Y_t: t \in
                \set{A})$ \emph{converges in probability}
                \symdef[]{Iprob.7403}{convergenceinplim}{$\plim\limits_{t
                \to \infty} Y(t) = Y$}{Random process $Y(t)$ converges
                in probability to random variable $Y$}To say $(Y_t: t
                \in \set{A})$ \emph{converges in probability} to random
                variable $Y$, denoted
                %
                \begin{equation*}
                        Y_t \xto{P} Y
                        \quad \text{ or } \quad
                        Y_t \xto{\Pr} Y
                        \quad \text{ or } \quad
                        \plim\limits_{t \to \infty} Y_t = Y
                \end{equation*}
                %
                means that for all $\varepsilon \in \R_{>0}$,
                %
                \begin{equation*}
                        \Pr( |Y_t - Y| > \varepsilon ) \to 0 
                        \text{ as }
                        t \to \infty
                \end{equation*}
                %
                or, equivalently,
                %
                \begin{equation*}
                        \Pr( |Y_t - Y| \leq \varepsilon ) \to 1
                        \text{ as }
                        t \to \infty
                \end{equation*}
        \item\emph{Convergence in Mean:}
                \symdef[]{Iprob.7501}{meanconvergence}{$Y(t) \xto{m.}
                Y$}{Random process $Y(t)$ converges in the mean to
                random variable
                $Y$}\symdef[]{Iprob.7502}{limeanconvergence} {$\limean
                \limits_{t \to \infty} Y(t) = Y$}{Random process $Y(t)$
                converges in the mean to random variable $Y$ (\ie, $Y$
                is \emph{l}imit \emph{i}n the \emph{m}ean)}To say $(Y_t:
                t \in \set{A})$ \emph{converges in the mean} to random
                variable $Y$, denoted
                %
                \begin{equation*}
                        Y_t \xto{m.} Y
                        \quad \text{ or } \quad
                        \limean\limits_{t \to \infty} Y_t = Y
                \end{equation*}
                %
                means that
                %
                \begin{equation*}
                        \lim\limits_{t \to \infty} 
                        \E( |Y_t - Y| ) = 0
                \end{equation*}
                %
                where $Y$ is called the \emph{limit in the mean}.
        \item\emph{Mean-Square Convergence:}
                \symdef[]{Iprob.7503}{msconvergence}{$Y(t) \xto{m.s.}
                Y$}{Random process $Y(t)$ converges in the mean square
                to random variable
                $Y$}\symdef[]{Iprob.7504}{mslimconvergence}{$\mslim\limits_{t
                \to \infty} Y(t) = Y$}{Random process $Y(t)$ converges
                in the mean square to random variable $Y$}To say $(Y_t:
                t \in \set{A})$ \emph{converges in the mean square} to
                random variable $Y$, denoted
                %
                \begin{equation*}
                        Y_t \xto{m.s.} Y
                        \quad \text{ or } \quad
                        \mslim\limits_{t \to \infty} Y_t = Y
                \end{equation*}
                %
                means that
                %
                \begin{equation*}
                        \lim\limits_{t \to \infty} 
                        \E( (Y_t - Y)^2 ) = 0
                \end{equation*}
                %
                where this type of convergence is called
                \emph{\acro{MS}{mean-square} convergence}.
        \item\emph{Convergence in Distribution:}
                \symdef[]{Iprob.7601}{convergenceind}{$Y(t) \xto{D}
                Y$}{Random process $Y(t)$ converges in distribution to
                random variable
                $Y$}\symdef[]{Iprob.7602}{convergenceinsmalld}{$Y(t)
                \xto{d} Y$}{Random process $Y(t)$ converges in
                distribution to random variable
                $Y$}\symdef[]{Iprob.7603}{convergenceindlim}{$\dlim
                \limits_{t \to \infty} Y(t) = Y$}{Random process $Y(t)$
                converges in distribution to random variable $Y$}To say
                $(Y_t: t \in \set{A})$ \emph{converges in distribution}
                to random variable $Y$, denoted
                %
                \begin{equation*}
                        Y_t \xto{D} Y
                        \quad \text{ or } \quad
                        Y_t \xto{d} Y
                        \quad \text{ or } \quad
                        \dlim\limits_{t \to \infty} Y_t = Y
                \end{equation*}
                %
                means that
                %
                \begin{equation*}
                        \lim\limits_{t \to \infty} F_{X_t}(x) = F_{X}(x)
                \end{equation*}
                %
                for all points $x \in \extR$ where $F_{X_t}$ is
                continuous.
\end{description}

\subsection{Relationships Among Kinds of Stochastic Convergence}

Take a totally ordered set $\set{A} \subseteq \extR$ such that $\infty$
is a limit point of $\set{A}$ in $\extR$ (\eg, $\set{A} = \N$ or
$\set{A} = \R$). Also take an experiment modeled by probability space
$(\set{U},\Sigma,\Pr)$. With these, define a random process $( Y_t : t
\in \set{A} )$ where $Y_t: \set{U} \mapsto \extR$ is a random variable
(\ie, a one dimensional random vector) for each $t \in \set{A}$.
Additionally, define a function $Y: \set{U} \mapsto \extR$. 
%
\begin{equation*}
        \text{ If } Y_t \to Y \text{ then } Y_t \xto{a.s.} Y.
\end{equation*}
%
Now assume that $Y$ is a random variable. In this case,
%
\begin{itemize}
        \item if $Y_t \xto{m.s.} Y$ then $Y_t \xto{m.} Y$
        \item if $Y_t \xto{a.s.} Y$ then $Y_t \xto{P} Y$
        \item if $Y_t \xto{m.} Y$ then $Y_t \xto{P} Y$
        \item if $Y_t \xto{P} Y$ then $Y_t \xto{D} Y$
\end{itemize}
%
Thus, \ac{MS} convergence and \ac{AS} convergence are of particular
interest in applications as they are relatively strong forms of
stochastic convergence. 

%That is,
%%
%\begin{equation*}
%        \text{m.s.} \implies \text{m.} \implies \text{P} \implies
%        \text{D}
%        \quad \text{ and } \quad
%        \text{a.s.} \implies \text{P} \implies \text{D}
%\end{equation*}