% Upper-case A B C D E F G H I J K L M N O P Q R S T U V W X Y Z % Lower-case a b c d e f g h i j k l m n o p q r s t u v w x y z % Digits 0 1 2 3 4 5 6 7 8 9 % Exclamation ! Double quote " Hash (number) # % Dollar $ Percent % Ampersand & % Acute accent ' Left paren ( Right paren ) % Asterisk * Plus + Comma , % Minus - Point . Solidus / % Colon : Semicolon ; Less than < % Equals = Greater than > Question mark ? % At @ Left bracket [ Backslash \ % Right bracket ] Circumflex ^ Underscore _ % Grave accent ` Left brace { Vertical bar | % Right brace } Tilde ~ % ---------------------------------------------------------------------| % --------------------------- 72 characters ---------------------------| % ---------------------------------------------------------------------| % % Optimal Foraging Theory Revisited: Chapter 4. Finite-Lifetime % Optimization Results % (note: this material was not included in final document) % % (c) Copyright 2007 by Theodore P. Pavlic % \chapter{Finite-Lifetime Optimization Results} \label{ch:optimization_results_old} \todo{The commentary on the COMBINED methods below simply is not correct. While I was confident that a simpler algorithm than Stephen and Krebs combined algorithm existed, now I'm fairly sure that their method is the only way possible. While it is simple (and also a contribution) to show the necessary conditions (sufficient is harder, but also doable) on an optimum point of the combined problem, it is not so simple to prove that an algorithm leads to those conditions. ADDITIONALLY, it appears like the PATCH *AND* COMBINED results can be generalized just like the PREY results. That would shorten this section and allow the results to apply to a much wider set of examples. The end goal is to derive a simple generalized combined algorithm, give it, and show the existing cases here as simple examples.} In this \chname{} results are derived for a number of the optimization cases outlined in \longref{ch:optimization_objectives}. First, an algorithm is presented that generalizes the solution approach commonly used in classical \ac{OFT} and extends it for use with some additional cases. We then show how that algorithm is used by solving the task-type and processing-length choice problems for the classical \ac{OFT} case as well as some additional cases that take similar forms. Finally, we derive solutions for some more complex cases. Due to the constraints in the model, most of our optimization approaches will use the Lagrange multiplier method (\ie, solutions will be given by the Karush-Kuhn-Tucker conditions) \citep{Bertsekas95}; this is the same approach used by \citet{Pulliam75} which derives optimal foraging solutions with nutrient constraints. \section{Generalized Profitability Ordering} \label{sec:generalized_algorithm} Here we derive an algorithm for solving certain types of task-type choice problems. Conventional solution approaches to task-type optimal foraging problems rely on ordering the profitability of choices of task types \citep{Cha73,SK86}. We begin by generalizing this approach. A key mathematical relationship that enables this generalization is that if $a,b,c,d \in \R$ with $b>0$ and $d>0$ then % \begin{equation} \frac{a}{b} \geq \frac{c}{d} \iff \frac{a}{b} \geq \frac{c+a}{d+b} \iff \frac{c+a}{d+b} \geq \frac{c}{d} \label{eq:lemma} \end{equation} % In fact, this is true if the three $\geq$ relations are all replaced with any one of the relations from the set $\{{=},{>},{<},{\geq}, {\leq}\}$. 
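Although the full proof is omitted here, the key step is worth sketching. Since $b > 0$ and $d > 0$, multiplying both sides of an inequality by $bd$ or dividing both sides by $b(b+d)$ preserves its direction, and so
%
\begin{equation*}
\frac{a}{b} \geq \frac{c}{d}
\iff ad \geq cb
\iff ab + ad \geq ab + cb
\iff a(b+d) \geq b(c+a)
\iff \frac{a}{b} \geq \frac{c+a}{d+b}
\end{equation*}
%
The second equivalence in \longref{eq:lemma} follows from the symmetric argument with the roles of $a/b$ and $c/d$ exchanged, and the same steps hold verbatim with $\geq$ replaced by any other relation from the set above.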
%A proof of this can be found in \longref{app:lemma_proof}. Assume that for each type $i \in \{1,\dots,n\}$, there exists a parameter $\lambda_i \in \R_{>0}$ and a parameter vector % \begin{equation} \v{x}^i = [ x^i_1, x^i_2 ]^\T \in \{\v{z} = [z_1, z_2]^\T \in \R^2: z_2 \geq 0\} \label{eq:prof_vector} \end{equation} % These (\ie, $\lambda_1, \dots, \lambda_n$, and $\v{x}^1, \dots, \v{x}^n$) and constants $\gamma \in \R$ and $\varphi \in \R_{\geq 0}$ are parameters of a general payoff function $J: \{\set{S}: \set{S} \subseteq \{1,\dots,n\}\} \mapsto \R$, defined as % \begin{equation} J(\set{Q}) \triangleq \frac{ \gamma + \sum\limits_{j \in \set{Q} \cap \set{Z}} \lambda_j x^j_1 + \sum\limits_{i \in \set{Q} \setdiff \set{Z}} \lambda_i x^i_1 } { \varphi + \sum\limits_{i \in \set{Q} \setdiff \set{Z}} \lambda_i x^i_2 } \label{eq:general_type_choice_cost_function} \end{equation} % where the set of types $\set{Z} \subseteq \{1,\dots,n\}$ is defined as % \begin{equation*} \set{Z} \triangleq \{ i \in \{1,\dots,n\} : x^i_2 = 0 \} \end{equation*} % In order for $J$ to be well-defined, we make the additional restriction that if $\set{Z} \neq \emptyset$, it must be that $\varphi \neq 0$ (\ie, $\varphi > 0$). $J$ maps a finite set of task types $\set{Q} \subseteq \{1,\dots,n\}$ to some payoff $J(\set{Q})$. We maximize $J(\set{Q})$ over all $\set{Q} \in \{\set{S}: \set{S} \subseteq \{1,\dots,n\}\}$. In other words, we choose the set of task-type indices to include to \emph{globally} maximize the payoff function; all task-type indices not in this set should be excluded. Note that if a type $i \in \set{Z}^=$, where $\set{Z}^= \triangleq \{ i \in \set{Z} : x^i_1 = 0 \}$, then its inclusion has no impact on the optimality of a set. Therefore, for simplicity, we assume that $\set{Z}^= = \emptyset$. However, if there are $n+m$ types and $|\set{Z}^=|=m$, the following can be applied to the $n$-sized set $\{1,\dots,n+m\} \setdiff \set{Z}^=$; optimal sets resulting from the method can then have any number of types from $\set{Z}^=$ added to them arbitrarily. Define sets $\set{Z}^+ \triangleq \{ i \in \set{Z} : x^i_1 > 0 \}$ and $\set{Z}^- \triangleq \{ i \in \set{Z} : x^i_1 < 0 \}$. By the assumption that $\set{Z}^= = \emptyset$, $\set{Z}^+ \cup \set{Z}^- = \set{Z}$ and $\set{Z}^+ \cap \set{Z}^- = \emptyset$. Thus, these two sets form a \emph{partition} of $\set{Z}$. For a task of type $i$, the limit of $J(\set{Q})$ as $\lambda_i \to \infty$ is important to us. If $\lambda_i$ is thought of as an encounter rate with tasks of type $i$ then this limit is the payoff returned from spending roughly all time processing those tasks. We call this limit the profitability of tasks of type $i$, and we define the profitability function $P: \{1,\dots,n\} \mapsto \extR$ with % \begin{equation*} P( i ) \triangleq \lim\limits_{\lambda_i \to \infty} J(\{1,\dots,n\}) = \lim\limits_{\lambda_i \to \infty} J(\{i\}) = \begin{cases} \infty &\text{if } i \in \set{Z}^+\\ \frac{x^i_1}{x^i_2} &\text{if } i \in \{1,\dots,n\} \setdiff \set{Z}\\ {-\infty} &\text{if } i \in \set{Z}^- \end{cases} \end{equation*} % Additionally, it is important to note that for each $i \in \set{Q} \setdiff \set{Z}$, % \begin{equation*} P( i ) = \frac{x^i_1}{x^i_2} = \frac{\lambda_i x^i_1}{\lambda_i x^i_2} %\label{eq:lambda_profitability} \end{equation*} % Without loss of generality, we assume that task-type indices are ordered such that $P(1) \geq P(2) \geq \dots \geq P( n-1 ) \geq P( n )$.
In particular, if $i,j \in \{0,\dots,n+1\}$ are such that $i=|\set{Z}^+|+1$ and $j=|\{1,\dots,n\}\setdiff\set{Z}^-|+1$, the indices are ordered such that % \begin{align} \begin{split} \{1,\dots,i-1\} &= \set{Z}^+\\ &\text{and}\\ \frac{x^i_1}{x^i_2} \geq \frac{x^{i+1}_1}{x^{i+1}_2} \geq &\dots \geq \frac{x^{j-2}_1}{x^{j-2}_2} \geq \frac{x^{j-1}_1}{x^{j-1}_2} \end{split} \label{eq:prof_ordering} \end{align} % The ordering of the types in $\set{Z}^+$ is not important as long as they are lower in order than all other types. We will show that all of these types will be included in the optimal set. Take the sets $\set{R}, \set{S} \subseteq \{1,\dots,n\}$. If $i \notin \set{R}$ and there exists $j \in \set{R}$ such that $i < j$ then we refer to $i$ as a hole in $\set{R}$. Additionally, if $J(\set{S}) \geq J(\set{R})$ then we say that $\set{S}$ is a better set than $\set{R}$. Take $\set{Q} \subseteq \{1,\dots,n\}$ and $i,j \in \{1,\dots,n\}$ with $i < j$ such that $i \notin \set{Q}$ and $j \in \set{Q}$. That is, $i$ is a hole in set $\set{Q}$. Our claim is that there exists a set $\set{V}$ with $J(\set{V}) \geq J(\set{Q})$ such that for each $p \in \set{V}$, for all $q \in \{1,\dots,p\}$, $q \in \set{V}$. In other words, for any index set $\set{Q}$ that has a hole there exists a better index set $\set{V}$ that has no holes. First, take hole $i$ such that $i \in \set{Z}$. There are two cases that result from whether or not $x^i_1$ is less than zero (\ie, whether $P(i) = \infty$ or $P(i) = -\infty$): % \begin{itemize} \item $x^i_1 \geq 0$: Since the denominator of $J(\set{Q})$ is positive by construction, $J(\set{Q}\cup\{i\}) \geq J(\set{Q})$. Thus, $\set{Q}\cup\{i\}$ is a better set with one less hole. In fact, by \longref{eq:prof_ordering}, for all $k \in \{1,\dots,i-1\}$, $x^k_1 \geq 0$. Hence, if there exists $k \in \{1,\dots,i-1\}$ such that $k \notin \set{Q}$, this case will apply again for the hole $k$ in the better set $\set{Q}\cup\{i\}$. Therefore, this case can be applied repeatedly until all holes in $\set{Q}$ less than $i$ are removed. \item $x^i_1 < 0$: By \longref{eq:prof_ordering}, $x^j_1 < 0$. Since the denominator of $J(\set{Q})$ is positive by construction, $J(\set{Q}\setdiff\{j\}) > J(\set{Q})$. Hence, the smaller set $\set{Q}\setdiff\{j\}$ is better than set $\set{Q}$. However, by \longref{eq:prof_ordering}, for all $k \in \{i+1,\dots,n\}$, $x^k_1 < 0$. Therefore, if there exists $k \in \set{Q}\setdiff\{j\}$ with $k > i$, this case can be applied again to produce a better set without $k$. This repetition can continue until no such $k$ exists. At that point, $i$ ceases to be a hole; the result is a better set with at least one less hole. \end{itemize} % Clearly, the repeated application of the first case guarantees that a better set $\set{V}$ exists that includes every task $\ell \in \set{Z}^+$. Additionally, the repeated application of the second case guarantees that a better set $\set{V}$ exists that excludes every task $\ell \in \set{Z}^-$. In other words, $J( ( \set{Q} \cup \set{Z}^+ ) \setdiff \set{Z}^- ) \geq J( \set{Q} )$; therefore, assume that there is no hole $i$ such that $i \in \set{Z}$. If no holes remain then a better set with no holes has been found and the claim is proved. Assume that there exists a hole $i$ such that $i \notin \set{Z}$. Therefore, by the definition of a hole, there must also exist a $j \in \set{Q}$ with $j > i$. By the assumption just made, there is no $\ell \in \set{Z}^-$ such that $\ell \in \set{Q}$, so it must be that $j \notin \set{Z}$.
There are two cases that result from whether or not $P(i)$, the profitability of the hole $i$, is less than $J(\set{Q})$: % \begin{itemize} \item $P(i) = x^i_1/x^i_2 = \lambda_i x^i_1 / ( \lambda_i x^i_2) \geq J(\set{Q})$: By \longref{eq:lemma}, $J(\set{Q}\cup\{i\}) \geq J(\set{Q})$. Thus, $\set{Q}\cup\{i\}$ is a better set with one less hole. In fact, by \longref{eq:prof_ordering}, for all $k \in \{1,\dots,i-1\} \setdiff \set{Z}$, $x^k_1/x^k_2 \geq x^i_1/x^i_2$, and by \longref{eq:lemma}, $x^i_1/x^i_2 \geq J(\set{Q}\cup\{i\})$, so $x^k_1/x^k_2 \geq J(\set{Q}\cup\{i\})$. Hence, if there exists $k \in \{1,\dots,i-1\}\setdiff\set{Z}$ such that $k \notin \set{Q}$, this case will apply again for the hole $k$ in the better set $\set{Q}\cup\{i\}$. Therefore, this case can be applied repeatedly until all holes in $\set{Q}\setdiff\set{Z}$ less than $i$ are removed. \item $J(\set{Q}) > P(i) = x^i_1/x^i_2 = \lambda_i x^i_1 / ( \lambda_i x^i_2 )$: By \longref{eq:prof_ordering}, $x^i_1/x^i_2 \geq x^j_1/x^j_2$; thus, $J(\set{Q}) > x^j_1/x^j_2$. So, by \longref{eq:lemma}, $J(\set{Q}\setdiff\{j\}) > J(\set{Q})$. Hence, the smaller set $\set{Q}\setdiff\{j\}$ is better than set $\set{Q}$. However, by \longref{eq:prof_ordering}, for all $k \in \{i+1,\dots,n\} \setdiff \set{Z}$, $x^i_1/x^i_2 \geq x^k_1/x^k_2$, and thus, $J(\set{Q}\setdiff\{j\}) > x^k_1/x^k_2$. Therefore, if there exists $k \in \set{Q}\setdiff\{j\}\setdiff\set{Z}$ with $k > i$, this case can be applied again to produce a better set without $k$. This repetition can continue until no such $k$ exists. At that point, because all types in set $\set{Z}^-$ are also excluded in the better set, $i$ ceases to be a hole; the result is a better set with at least one less hole. \end{itemize} % In the first case, all holes $i$ with $P(i) \geq J(\set{Q})$ are shown to be included in a better set. In the second case, all holes $i$ with $P(i) < J(\set{Q})$ are shown to be excluded in a better set. By the ordering of the indices, this new better set is of the form $\{1,2,\dots,k\}$ where $k \in \{0,\dots,n\}$, and so it has no holes. The claim is proved. Therefore, when maximizing $J$ over all $\set{Q} \subseteq \{1,\dots,n\}$, only the $n$ sets of the form $\{1,\dots,k\}$ with $k \in \{1,\dots,n\}$ and the empty set need to be considered. These $n+1$ possible combinations are a significant reduction from the original $2^n$ possible combinations. Assuming that tasks have been ordered by their profitabilities as described above (\ie, as in \longref{eq:prof_ordering}) and $i$ is the lowest-ordered type such that $x^i_2 > 0$ (or $n+1$ if no such type exists) and $j$ is the lowest-ordered type such that $x^j_1 < 0$ (or $n+1$ if no such type exists), an algorithm for finding a set of task types for inclusion that will globally maximize $J$ is the following (it depends on the fact that there will always exist an optimal set of the form $\{1,\dots,k\}$ where $k \in \{0,\dots,n\}$ and uses \longref{eq:lemma} to find such a set): % \begin{itemize} \item Exclude all sets of the form $\{1,\dots,\ell\}$ where $\ell < i-1$ and continue if $j > i$. \item If $P(i) = x^i_1/x^i_2 \geq J(\{1,\dots,i-1\})$ then $J(\{1,\dots,i\}) \geq J(\{1,\dots,i-1\})$; so, exclude the set $\{1,\dots,i-1\}$ and continue if $j > i+1$. \item If $P(i+1) = x^{i+1}_1/x^{i+1}_2 \geq J(\{1,\dots,i\})$, then $J(\{1,\dots,i+1\}) \geq J(\{1,\dots,i\})$; so, exclude the set $\{1,\dots,i\}$ and continue if $j > i+2$.
\item \dots \item If $P(k) = x^k_1/x^k_2 \geq J(\{1,\dots,k-1\})$ then $J(\{1,\dots,k\}) \geq J(\{1,\dots,k-1\})$; so, exclude the set $\{1,\dots,k-1\}$ and continue if $j > k+1$. \item If $P(k+1) = x^{k+1}_1/x^{k+1}_2 < J(\{1,\dots,k\})$ then inclusion of any type $j \in \{k+1,\dots,n\}$ will only cause a decrease in $J$ by the profitability ordering. Hence, the set $\{1,\dots,k\}$ maximizes $J$. Choose $\{1,\dots,k\}$ and stop. \end{itemize} % If these steps fail to find an optimal set then the set $\{1,\dots,j-1\}$ should be used (\ie, every type except for the types in set $\set{Z}^-$). While this algorithm finds a single set to maximize $J$, there may be additional sets of equal value. In fact, this algorithm finds the largest set that will globally maximize $J$; a similar algorithm can be constructed to find the smallest maximizing set (recall that it was assumed that $\set{Z}^= = \emptyset$; if this is not true, larger sets can be made by adding types from $\set{Z}^=$). Assume that these steps lead to the conclusion that the empty set is an optimal set; however, also assume that the empty set is not valid for the particular application. Consider the set $\set{G} \triangleq \{ i \in \{1,\dots,n\}: \text{for all } j \in \{1,\dots,n\}, P(i) \geq P(j) \}$ (\ie, the set of all types that share the highest profitability). The optimal set will be any singleton set $\{i\}$ with $i \in \set{G}$ such that for all $j \in \set{G}$, $\lambda_i x^i_1 \geq \lambda_j x^j_1$. We leave this claim without proof; however, it also follows from \longref{eq:lemma}. Alternatively, as long as task-type indices are ordered by profitability, a trivial algorithm can be used that calculates $J(\set{V})$ for each $\set{V} \in \{ \set{Q} \subseteq \{1,\dots,n\}: i \in \set{Q}, j \in \{1,\dots,i\} \implies j \in \set{Q} \}$ and picks the set that results in the greatest $J(\set{V})$ over these $n+1$ sets. Note that the restriction of $J$ to these $n+1$ sets is actually unimodal in the cardinality of its argument (\ie, it is nondecreasing in cardinality up to a maximizing set and nonincreasing afterward); in fact, this is a key reason why the above algorithm works. This unimodality can reduce the number of steps required to complete such an algorithm. \section{OFT and The Rate-Maximizing Approach} \label{sec:rate_maximization} As discussed in \longref{ch:optimization_objectives}, classical \ac{OFT} makes use of the long-term rate of gain in \longref{eq:oft_payoff_long_rate}. In our model, where we assume $N^p$ to be fixed, we define the long-term rate of gain as the ratio of expectations in \longref{eq:payoff_rate}, which results in the same expression as the rate used in classical \ac{OFT}. Hence, the results here will be the same results found in classical \ac{OFT}. Nevertheless, we present these results here to assist in comparing and contrasting them with our other optimization measures. Additionally, our solution to the task-type choice problem makes use of the generalized algorithm developed in \longref{sec:generalized_algorithm}, and thus it serves as an important example. We also present a simple method for solving combined task-type and processing-length choice problems that is far simpler than the algorithm in \citet{SK86} and does not require any of the strange assumptions on the gain function.
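Because the algorithm of \longref{sec:generalized_algorithm} is stated abstractly, a concrete illustration may help before it is applied. The following minimal Python sketch is our own illustration (all parameter values are hypothetical, and the function names are not part of the model); it evaluates the general payoff of \longref{eq:general_type_choice_cost_function} over the $n+1$ hole-free candidate sets from the trivial prefix-scan variant described above and keeps the largest maximizing prefix. The final lines instantiate it with the rate-maximizing parameterization derived in the next section, $\v{x}^i = [ g_i - c_i \tau_i, \tau_i ]^\T$, $\gamma = -c^s$, and $\varphi = 1$.
%
\begin{verbatim}
# Minimal sketch of the generalized profitability ordering
# (hypothetical data; not a transcription of any published algorithm).
import math

def payoff(Q, lam, x, gamma, phi):
    # General payoff J(Q); types with x2 = 0 add nothing to the denominator.
    num = gamma + sum(lam[i] * x[i][0] for i in Q)
    den = phi + sum(lam[i] * x[i][1] for i in Q)
    return num / den

def profitability(xi):
    # P(i): +/- infinity for zero-denominator types, else the ratio x1/x2.
    x1, x2 = xi
    if x2 == 0:
        return math.inf if x1 > 0 else -math.inf
    return x1 / x2

def best_prefix(lam, x, gamma, phi):
    # Order indices by profitability, then scan the n+1 hole-free sets
    # {}, {1}, {1,2}, ...; ">=" keeps the largest globally maximizing set.
    order = sorted(range(len(x)), key=lambda i: profitability(x[i]),
                   reverse=True)
    best, best_val = [], payoff([], lam, x, gamma, phi)
    for k in range(1, len(order) + 1):
        val = payoff(order[:k], lam, x, gamma, phi)
        if val >= best_val:
            best, best_val = order[:k], val
    return best, best_val

# Rate-maximizing parameterization: x^i = (g_i - c_i*tau_i, tau_i),
# gamma = -c_s, phi = 1 (all values below are made up for illustration).
lam = [0.2, 0.5, 0.1]
x = [(4.0, 1.0), (1.5, 0.5), (0.5, 2.0)]
print(best_prefix(lam, x, gamma=-0.1, phi=1.0))
# -> ([0, 1], 1.0): the two most profitable types (indices 0 and 1) are
#    kept; the third has profitability 0.25 < 1.0 and is excluded.
\end{verbatim}
%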
%\citet[section~2.4]{SK86} \section{Task-Type Choice} \label{sec:type_rate} For the task-type choice case, the long-term rate is a function $J: [0,1]^n \mapsto \R$, defined as % \begin{equation*} J(\v{p}) \triangleq \frac{\E(G_1)}{\E(T_1)} = \frac{ \sum\limits_{i=1}^n p_i \lambda_i \left( g_i - c_i \tau_i \right) - c^s } { \sum\limits_{i=1}^n p_i \lambda_i \tau_i + 1 } \end{equation*} % Take $j \in \{1,\dots,n\}$ and $\v{q},\v{r},\v{s} \in [0,1]^n$ with $q_j \in (0,1)$, $r_j = 0$, $s_j = 1$, and $q_i = r_i = s_i$ for all $i \in \{1,\dots,n\} \setdiff \{j\}$. If $\v{q}$ is optimal then $\partial J(\v{p})/\partial p_j|_{\v{p}=\v{q}} = 0$. However, % \begin{equation*} \left. \frac{ \partial J }{ \partial p_j } \right|_{\v{p}=\v{q}} % = % \frac{ % \left( % \sum\limits_{i=1,i \neq j}^n q_i \lambda_i \tau_i + 1 % \right) % \lambda_j \left( g_j - c_j \tau_j \right) % - % \left( % \sum\limits_{i=1,i \neq j}^n q_i \lambda_i \left( g_i - c_i % \tau_i \right) - c^s % \right) % \lambda_j \tau_j % } % { \left( % \sum\limits_{i=1}^n q_i \lambda_i \tau_i + 1 % \right)^2 } = \lambda_j \frac{ \left( \sum\limits_{\substack{1 \leq i \leq n\\i \neq j}} q_i \lambda_i \tau_i + 1 \right) \left( g_j - c_j \tau_j \right) - \left( \sum\limits_{\substack{1 \leq i \leq n\\i \neq j}} q_i \lambda_i \left( g_i - c_i \tau_i \right) - c^s \right) \tau_j } { \left( \sum\limits_{i=1}^n q_i \lambda_i \tau_i + 1 \right)^2 } \end{equation*} % The sign of this expression is completely determined by the numerator of the fraction, which is independent of $q_j$. Thus, if the numerator is nonzero, $\v{q}$ cannot be optimal. If the numerator is zero then % \begin{equation*} \frac{q_j \lambda_j \left( g_j - c_j \tau_j \right) } {q_j \lambda_j \tau_j} = \frac{\lambda_j \left( g_j - c_j \tau_j \right) } {\lambda_j \tau_j} = % \frac{g_j - c_j \tau_j}{\tau_j} % = % \frac{ % \sum\limits_{i=1,i \neq j}^n q_i \lambda_i % \left( g_i - c_i \tau_i \right) - c^s % }{\sum\limits_{i=1,i \neq j}^n q_i \lambda_i \tau_i + 1} \frac{ \sum\limits_{\substack{1 \leq i \leq n\\i \neq j}} q_i \lambda_i \left( g_i - c_i \tau_i \right) - c^s }{ \sum\limits_{\substack{1 \leq i \leq n\\i \neq j}} q_i \lambda_i \tau_i + 1 } = J(\v{r}) \end{equation*} % which, by \longref{eq:lemma}, indicates that $J(\v{q}) = J(\v{r}) = J(\v{s})$. Hence, if $\v{q}$ is optimal then it provides no benefit over $\v{r}$ and $\v{s}$, and those vectors can be chosen instead. Therefore, vectors of the form $\v{q}$ can be excluded and $J$ can be replaced with its restriction to the set $\{0,1\}^n$. This is known as the \emph{zero-one rule} \citep[p.~18]{SK86}. In other words, preference can be thought of as choosing an index set $\set{Q} \subseteq \{1,\dots,n\}$ for inclusion. That is, for any index set $\set{Q} \subseteq \{1,\dots,n\}$, $\v{p}$ is the corresponding preference vector if $i \in \set{Q} \iff p_i = 1$ for all $i \in \{1,\dots,n\}$. Hence, rather than restricting $J(\v{p})$ to $\{0,1\}^n$, we redefine it without loss of generality as $J: \{\set{S}: \set{S} \subseteq \{1,\dots,n\}\} \mapsto \R$ with % \begin{equation} J(\set{Q}) \triangleq \frac{ \sum\limits_{i \in \set{Q}} \lambda_i \left( g_i - c_i \tau_i \right) - c^s } { \sum\limits_{i \in \set{Q}} \lambda_i \tau_i + 1 } \label{eq:task_type_choice_rate_cost} \end{equation} % For each task type $i \in \{1,\dots,n\}$, define the vector in \longref{eq:prof_vector} as % \begin{equation*} \v{x}^i = [ g_i - c_i \tau_i, \tau_i ]^\T \end{equation*} % Also let constants $\gamma=-c^s$ and $\varphi=1$.
Hence, \longref{eq:task_type_choice_rate_cost} can be written in the general payoff function form of \longref{eq:general_type_choice_cost_function}. So, the profitability % \begin{equation*} P( i ) = \begin{cases} \infty &\text{if } \tau_i = 0 \text{ and } g_i > 0\\ \frac{g_i - c_i \tau_i}{\tau_i} &\text{if } \tau_i > 0\\ {-\infty} &\text{if } \tau_i = 0 \text{ and } g_i < 0 \end{cases} \end{equation*} % which is the long-term expected rate of gain for processing $N^p$ tasks if there is zero wait time between sequential task-type $i$ encounters (provided that task type $i$ has $g_i \neq 0$ or $\tau_i > 0$). In this special case, the form of $P(i)$ can also be interpreted as the long-term rate of gain for processing a \emph{single task} of type $i$. Order the task-type indices such that \longref{eq:prof_ordering} holds. Now, the generalized task-type choice algorithm described in \longref{sec:generalized_algorithm} can be used to find a set of task types for inclusion that will globally maximize the rate. \subsection{Discussion} %\citet[section~2.2]{SK86} Note that while the algorithm described here is very similar to the \citet{SK86} algorithm, its subtle differences allow for use with our model, which uses a much more general parameter set. As we will discuss in \longref{sec:combined_type_length_rate}, one consequence of this is that the combined task-type and processing-length choice problem can be solved in a simple, intuitive way that is completely different from the method described by \citet{SK86}. %\citet[section~2.4]{SK86}. \section{Processing-Length Choice} \label{sec:length_rate} For the processing-length choice case, the long-term rate is a function $J: \R_{\geq 0}^n \mapsto \R$, defined as % \begin{equation*} J(\v{\tau}) \triangleq \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau_i) - c_i \tau_i \right) - c^s } { \sum\limits_{i=1}^n \lambda_i \tau_i + 1 } \end{equation*} % In the task-type choice case, we found an algorithm to simplify picking the global maximum from a finite set of candidates. For this case, there are uncountably many candidates to consider and a wide range of possible gain function shapes that prevent us from making generalizations about global optimality. Thus, we will instead derive first-order and second-order necessary conditions for local optimality. We will also provide sufficiency conditions for a candidate point to be a local maximum. We will use the Lagrange multiplier method \citep{Bertsekas95} to derive necessary and sufficiency conditions for local maxima of $J$. That is, we will solve the constrained optimization problem % \begin{equation} \begin{split} &\text{minimize} \enskip {-J(\v{\tau})}\\ &\text{subject to} \enskip {-\tau_i} \leq 0 \enskip \text{for all} \enskip i \in \{1,\dots,n\} \end{split} \label{eq:problem_processing_length} \end{equation} % To do this, we assume that for all $i \in \{1,\dots,n\}$, the gain function $g_i(\tau_i)$ is continuously differentiable. This assumption will lead to first-order necessary conditions on local optimality. We also derive second-order necessary and sufficiency conditions in the case where for all $i \in \{1,\dots,n\}$ the gain function $g_i(\tau_i)$ is twice continuously differentiable. These restrictions only need to hold within some open neighborhood containing a local maximum of $J$.
Define the Lagrangian $L: \R_{\geq 0}^n \times \R^n \mapsto \R$, % \begin{equation} L( \v{\tau}, \v{\mu} ) \triangleq -J( \v{ \tau } ) - \v{ \mu }^\T \v{ \tau } \label{eq:lagrangian_processing_length} \end{equation} % Also define $\set{A}(\v{\tau})$, the set of active inequality constraints, as % \begin{equation} \set{A}(\v{\tau}) \triangleq \{j \in \{1,\dots,n\}: \tau_j = 0\} \label{eq:active_constraints_processing_length} \end{equation} % Note that for all $\v{\tau} \in \R_{\geq 0}^n$, for all $j \in \set{A}(\v{\tau})$, the active inequality constraint gradient $\nabla_\v{\tau} ( -\tau_j ) = -\v{e}_j$ where $\v{e}_j$ is the $j\th$ elementary basis vector for $\R^n$. Clearly the active inequality constraint gradients are linearly independent at all points $\v{\tau} \in \R_{\geq 0}^n$. Thus, all points $\v{\tau} \in \R_{\geq 0}^n$ are regular. \subsection{First-Order Necessary Conditions} Let $\v{\tau}^* \in \R^n$ be a local maximum of $J$. By the above reasoning, $\v{\tau}^*$ is regular. Then there must exist a unique Lagrange multiplier vector $\v{\mu}^* \in \R^n$ such that % \begin{equation} \begin{split} \left. \nabla_\v{\tau} L(\v{\tau}, \v{\mu}) \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} &= \v{0},\\ j \in \set{A}(\v{\tau}^*) &\implies \mu^*_j \geq 0,\\ j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*) &\implies \mu^*_j = 0 \end{split} \label{eq:lagrangian_first_order} \end{equation} % Let $\v{\mu}^*$ be such a multiplier vector. For all $j \in \{1,\dots,n\}$, % \begin{equation} \left. \frac{ \partial L }{ \partial \tau_j } \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = {-\left. \frac{ \partial J }{ \partial \tau_j } \right|_{\v{\tau}=\v{\tau}^*}} - \mu^*_j = 0 \label{eq:lagrangian_first_order_j} \end{equation} % which means that the multiplier % \begin{align*} \mu^*_j &= \left. -\frac{ \partial J }{ \partial \tau_j } \right|_{\v{\tau}=\v{\tau}^*}\\ &= -\frac{ \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right) \lambda_j \left( {g_j}'(\tau^*_j) - c_j \right) - \left( \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau^*_i) - c_i \tau^*_i \right) - c^s \right) \lambda_j } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^2 } \end{align*} % For $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, $\tau^*_j > 0$ and the multiplier $\mu^*_j = 0$, hence it must be that % \begin{equation} {g_j}'( \tau^*_j ) - c_j = \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau^*_i) - c_i \tau^*_i \right) - c^s } { \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 } = J( \v{\tau}^* ) \label{eq:patch_rate_mvt} \end{equation} % However, for $j \in \set{A}(\v{\tau}^*)$, $\tau^*_j = 0$ and the multiplier $\mu^*_j \geq 0$, hence, it must be that % \begin{equation} {g_j}'( 0 ) - c_j \leq \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau^*_i) - c_i \tau^*_i \right) - c^s } { \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 } = J( \v{\tau}^* ) \label{eq:patch_rate_mvt_0} \end{equation} \subsection{Second-Order Necessary Conditions} Furthermore, assume that for all $i \in \{1,\dots,n\}$, $g_i(\tau_i)$ is twice continuously differentiable at $\tau_i = \tau^*_i$. Define the feasible variations at $\v{\tau}^*$ as % \begin{equation*} \set{V}(\v{\tau}^*) \triangleq \left\{ \v{x} \in \R_{\geq 0}^n : x_j = 0, j \in \set{A}(\v{\tau}^*) \right\} \end{equation*} % It holds that for all $\v{y} \in \set{V}(\v{\tau}^*)$, % \begin{equation} \left.
\v{y}^\T \nabla_{\v{\tau}\v{\tau}}^2 L(\v{\tau},\v{\mu}) \v{y} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} \geq 0 \label{eq:patch_rate_sufficient} \end{equation} % In other words, for all $\v{x} \in \R_{\geq 0}^n$, % \begin{equation} \sum \limits_{\substack{1 \leq i \leq n\\ i \notin \set{A}(\v{\tau}^*)}} \sum \limits_{\substack{1 \leq j \leq n\\ j \notin \set{A}(\v{\tau}^*)}} x_i x_j \frac{\partial^2 L}{\partial \tau_i \partial \tau_j} \geq 0 \label{eq:lagrangian_feasible_hessian} \end{equation} % However, all the off-diagonal terms are zero since for all $j,\ell \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ with $j \neq \ell$, % \begin{align*} \left. \frac{\partial^2 L}{\partial \tau_j \partial \tau_\ell} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} &= 2 \lambda_\ell \lambda_j \frac{ \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right) \left( g_j'(\tau^*_j) - c_j \right) } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^3 } \\ &\mathrel{\phantom{=}} {}- 2 \lambda_\ell \lambda_j \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau^*_i) - c_i \tau^*_i \right) - c^s } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^3 } \\ &\mathrel{\phantom{=}} {}- \lambda_\ell \lambda_j \frac{ \left( g'_j( \tau^*_j ) - c_j \right) - \left( g'_\ell( \tau^*_\ell ) - c_\ell \right) } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^2 }\\ \intertext{and by \longref{eq:patch_rate_mvt},} \left. \frac{\partial^2 L}{\partial \tau_j \partial \tau_\ell} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} &= 2 \lambda_\ell \lambda_j \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau^*_i) - c_i \tau^*_i \right) - c^s } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^3 } \\ &\mathrel{\phantom{=}} {}- 2 \lambda_\ell \lambda_j \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau^*_i) - c_i \tau^*_i \right) - c^s } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^3 } \\ &\mathrel{\phantom{=}} {}- \lambda_\ell \lambda_j \frac{ J(\v{\tau}^*) - J(\v{\tau}^*) } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^2 }\\ &= 0 \end{align*} % Therefore, \longref{eq:patch_rate_sufficient} is equivalent to % \begin{equation} \sum \limits_{\substack{1 \leq i \leq n\\ i \notin \set{A}(\v{\tau}^*)}} x_i^2 \frac{\partial^2 L}{{\partial \tau_i}^2} \geq 0 \label{eq:lagrangian_perfect_square_hessian} \end{equation} % Thus, for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, % \begin{align*} \left.
\frac{\partial^2 L}{{\partial \tau_j}^2} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = % &= % -\frac{ % \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^2 % \left( % \begin{split} % \lambda_i^2 &\left( {g_i}'(\tau^*_i) - c_i \right)\\ % &{}+\lambda_i % \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right) % {g_i}''(\tau^*_i)\\ % &{}-\lambda_i^2 \left( {g_i}'(\tau^*_i) - c_i \right) % \end{split} % \right) % } % { % \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^4 % }\\ % &= -\frac{ \lambda_j {g_j}''(\tau^*_j) } { \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 } \geq 0 \end{align*} % In other words, for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, % \begin{equation} {g_j}''(\tau_j^*) \leq 0 \label{eq:patch_rate_mvt_2nd} \end{equation} \subsection{Second-Order Sufficiency Conditions} Assume that for all $i \in \{1,\dots,n\}$, the gain function $g_i(\tau_i)$ is twice continuously differentiable, and let $\v{\tau}^* \in \R_{\geq 0}^n$ be a vector such that \longrefs{eq:patch_rate_mvt} and \shortref{eq:patch_rate_mvt_2nd} hold for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ and \longref{eq:patch_rate_mvt_0} holds for all $j \in \set{A}(\v{\tau}^*)$. If \longrefs{eq:patch_rate_mvt_0} and \longref{eq:patch_rate_mvt_2nd} both hold with strict inequality then $\v{\tau}^*$ is a local maximum of $J$. \subsection{Discussion} %\citet[p.~25]{SK86} Together, \longrefs{eq:patch_rate_mvt}, \shortref{eq:patch_rate_mvt_0}, and \shortref{eq:patch_rate_mvt_2nd} indicate that for tasks of diminishing returns, if the initial rate of return from the individual task is higher than the mean rate of return from the environment as a whole then the agent should process the task until the instantaneous rate of return processing the task drops to the mean rate of return from the environment. This is known as the marginal-value theorem \citep{Cha76}. However, in order to use this marginal-value theorem interpretation, gain functions that do not meet all of these conditions have to be excluded. For example, \citet{SK86} choose the restrictions that for all $i \in \{1,\dots,n\}$: % \begin{itemize} \item $c_i = 0$ since $g_i(\tau_i)$ is defined as the average net gain. \item $g_i(0) = 0$ \item $g_i'(0) > 0$ \item There exists some $\widetilde{\tau}_i \in \R_{\geq 0}$ such that for all $\tau_i \geq \widetilde{\tau}_i$, $g_i''(\tau_i) < 0$. \end{itemize} % If $c^s = 0$ then these conditions meet all of the requirements of the marginal-value theorem. However, our method for solving the processing-length problem does not require any of these assumptions. Thus, even if solutions do not have an intuitive marginal-value interpretation, they still exist. As we discuss in \longref{sec:combined_type_length_rate}, the restriction that $g_i(0)=0$ for all $i \in \{1,\dots,n\}$ actually eliminates the combined type and length choice problem entirely. For this case, any type $i \in \{1,\dots,n\}$ with an optimal processing-length choice solution $\tau_i^* = 0$ can safely be excluded from the optimal task-type choice set and all other types can be included. This is intuitive.
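Before turning to the combined problem, note that \longrefs{eq:patch_rate_mvt} and \shortref{eq:patch_rate_mvt_0} suggest a simple numerical procedure for the processing-length choice problem: guess the environment-wide rate $R$, choose each interior $\tau_j$ so that $g_j'(\tau_j) - c_j = R$ (setting $\tau_j = 0$ when $g_j'(0) - c_j \leq R$), and then recompute $R = J(\v{\tau})$ until the two agree. The Python sketch below is our own illustration with made-up saturating gain functions; convergence of this fixed-point iteration is not guaranteed in general, although it is typical for gains of diminishing returns.
%
\begin{verbatim}
# Sketch: marginal-value fixed point for rate-maximizing processing
# lengths (hypothetical parameters and gain functions).
import math

# Gains g_i(tau) = a_i*(1 - exp(-b_i*tau)), so
# g_i'(tau) = a_i*b_i*exp(-b_i*tau) (diminishing returns).
a, b, c = [5.0, 3.0], [1.0, 0.5], [0.1, 0.2]
lam, c_s = [0.4, 0.6], 0.05

def rate(tau):
    # Long-term rate J(tau) for the gain functions above.
    num = sum(l * (ai * (1 - math.exp(-bi * t)) - ci * t)
              for l, ai, bi, ci, t in zip(lam, a, b, c, tau)) - c_s
    den = sum(l * t for l, t in zip(lam, tau)) + 1
    return num / den

def tau_for_rate(R):
    # Interior condition g_j'(tau_j) - c_j = R; boundary case tau_j = 0
    # when even the initial marginal rate g_j'(0) - c_j is at most R.
    out = []
    for ai, bi, ci in zip(a, b, c):
        target = ci + R        # positive for these parameters when R >= 0
        if ai * bi <= target:
            out.append(0.0)
        else:
            out.append(-math.log(target / (ai * bi)) / bi)
    return out

R = 0.0
for _ in range(100):           # fixed-point iteration on R = J(tau)
    R = rate(tau_for_rate(R))
print(R, tau_for_rate(R))
\end{verbatim}
%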
\section{Combined Type and Length Choice} \label{sec:combined_type_length_rate} \todo{Like I mentioned above, the stuff below needs to be changed. I realized this while completing it, which is why it is incomplete.} For the combined task-type and processing-length choice case, the long-term rate is a function $J: [0,1]^n \times \R_{\geq 0}^n \mapsto \R$, defined as % \begin{equation*} J(\v{p},\v{\tau}) \triangleq \frac{ \sum\limits_{i=1}^n \lambda_i p_i \left( g_i(\tau_i) - c_i \tau_i \right) - c^s } { \sum\limits_{i=1}^n \lambda_i p_i \tau_i + 1 } \end{equation*} % Using an argument similar to the one in \longref{sec:type_rate}, it can be shown that the zero-one rule still holds; thus, for any optimal point $(\v{p}^*,\v{\tau}^*) \in [0,1]^n \times \R_{\geq 0}^n$, there exists an optimal point $(\v{\hat{p}},\v{\tau}^*) \in \{0,1\}^n \times \R_{\geq 0}^n$ with $\hat{p}_i \in \{0,1\}$ for all $i \in \{1,\dots,n\}$. For all $i \in \{1,\dots,n\}$, assume that $g_i(0)=0$. Take $(\v{p}^*,\v{\tau}^*)$ to be a local maximum of $J(\v{p},\v{\tau})$. Take $(\v{\hat{p}},\v{\hat{\tau}})$ so that for all $i \in \{1,\dots,n\}$, % \begin{equation*} (\hat{p}_i,\hat{\tau}_i) = \begin{cases} (0,0) &\text{if } p^*_i = 0 \text{ or } \tau^*_i = 0\\ (p^*_i,\tau^*_i) &\text{otherwise} \end{cases} \end{equation*} % Certainly the point $(\v{\hat{p}},\v{\hat{\tau}})$ is not a \emph{local maximum} of $J$ in general; however, due to our assumption about the initial value of the gain functions, for all $i \in \{1,\dots,n\}$ with either $p^*_i = 0$ or $\tau^*_i = 0$ or both, $p^*_i \tau^*_i = p^*_i ( g_i( \tau^*_i ) - c_i \tau^*_i ) = 0$, and thus $J(\v{p}^*,\v{\tau}^*)=J(\v{\hat{p}},\v{\hat{\tau}})$. More formally, again assume that for all $i \in \{1,\dots,n\}$, $g_i(0)=0$, and take vectors $\v{\tau} \in \R_{\geq 0}^n$ and $\v{p} \in \{0,1\}^n$. We claim that there exist vectors $\v{\bar{\tau}} \in \R_{\geq 0}^n$ and $\v{\bar{p}} \in \{0,1\}^n$ such that for all $i \in \{1,\dots,n\}$, $\bar{\tau}_i = 0 \iff \bar{p}_i = 0$ and $J(\v{p},\v{\tau}) = J(\v{\bar{p}},\v{\bar{\tau}})$. To generate these vectors, for each $i \in \{1,\dots,n\}$, assign % \begin{equation*} (\bar{p}_i,\bar{\tau}_i) = \begin{cases} (p_i,\tau_i) &\text{if } p_i = 1 \text{ and } \tau_i > 0\\ (0,0) &\text{otherwise} \end{cases} \end{equation*} % Note that for any $i \in \{1,\dots,n\}$, if $\tau_i = 0$ then $g_i(\tau_i) = 0$; thus, if $\tau_i = 0$ or $p_i = 0$ or both then $p_i \tau_i = p_i g_i(\tau_i) = 0$ and type $i$ makes no contribution to the payoff function evaluated at $(\v{p},\v{\tau})$. Also, by the definition of $\v{\bar{p}}$ and $\v{\bar{\tau}}$, for all $i \in \{1,\dots,n\}$, $\bar{p}_i = 0 \iff \bar{\tau}_i = 0 \iff (p_i = 0 \text{ or } \tau_i = 0 )$. Thus, $J(\v{p},\v{\tau}) = J(\v{\bar{p}},\v{\bar{\tau}})$, and the claim is proved. Now assume that $(\v{p},\v{\tau})=(\v{p}^*,\v{\tau}^*)$ locally maximizes $J(\v{p},\v{\tau})$. By the claim, there must exist some point $(\v{\bar{p}},\v{\bar{\tau}})$ with $J(\v{\bar{p}},\v{\bar{\tau}}) = J(\v{p}^*,\v{\tau}^*)$ such that for all $i \in \{1,\dots,n\}$, $\bar{p}_i = 0 \iff \bar{\tau}_i = 0$; that is, a type is included exactly when a positive processing length is spent on it. %\citet[ch.~2]{SK86} %\citet[section~2.4]{SK86} This assumption is made along with many others by \citet{SK86} and is used to generate a combined task-type and processing-length choice algorithm. However, if only this restriction on the initial value of the gain function is assumed, an algorithm much simpler than the one by \citet{SK86} can be used.
It can easily be shown (\eg, by contradiction) that a locally optimal solution for the combined task-type and processing-length choice problem is found with the following steps. % \begin{itemize} % \item First, solve the related processing-length problem that results from setting $p_i = 1$ for all $i \in \{1,\dots,n\}$; call the resulting optimal solution $\v{\tau}^*$. It can be shown that there exists a task-type choice preference vector $\v{p}^*$ such that $\v{p}=\v{p}^*$ and $\v{\tau}=\v{\tau}^*$ will be an optimal solution for the combined problem. % \item Next, solve the related task-type choice problem assuming that for all $i \in \{1,\dots,n\}$, the average processing length $\tau_i = \tau_i^*$ and the average gain $g_i = g_i(\tau_i^*)$ where the gain function $g_i(\tau_i)$ is taken from the combined problem. Call the resulting optimal solution $\v{p}^*$. An optimal solution for the combined problem uses $\v{p}=\v{p}^*$ and $\v{\tau}=\v{\tau}^*$. % \end{itemize} % Thus, set task-type choice decisions with $\v{p}=\v{p}^*$ and processing-length choice decisions with $\v{\tau}=\v{\tau}^*$. The result will be a local maximum of the rate. \subsection{Discussion} %\citet[section~2.2]{SK86} %\citet[ch.~2]{SK86} This intuitive approach follows from the ability of the generalized task-type choice algorithm in \longref{sec:generalized_algorithm} to handle parameter vectors of the form of \longref{eq:prof_vector} with $x^i_2 = 0$. Usually a type $i \in \{1,\dots,n\}$ associated with such a parameter vector will \emph{not} have a finite profitability (\ie, $P(i) = \infty$ or $P(i) = -\infty$) and so the algorithm presented by \citet{SK86} for solving the task-type choice problem is unable to handle this case. The algorithm in \longref{sec:generalized_algorithm} can be considered the natural extension of the \citeauthor{SK86} algorithm to profitabilities of all forms. It is interesting to note that \citeauthor{SK86} assume that for all $i \in \{1,\dots,n\}$, the gain function satisfies $g_i(0)=0$. Under this restriction, our combined task-type and processing-length choice solution states that if a local optimum behavior is desired, a type $i \in \{1,\dots,n\}$ should be chosen for processing \emph{if and only if} the related processing-length choice solution results in $\tau_i^* > 0$. That is, for the case where all gain functions are initially zero, the combined problem is identical to the processing-length choice problem. Since \citet{SK86} always assume this property of the gain functions, there is no need for any discussion of the combined problem beyond the existing discussion of the processing-length choice problem. \section{Generalized Extension of OFT Results} We have optimized the expression of $\E(G_1)/\E(T_1)$, which is equivalent to the expression of the long-term rate of gain usually optimized in classical \ac{OFT}. Thus, our solutions are consistent with the classical \ac{OFT} solutions. However, due to the algorithms we have developed, we were able to make our parameters much more general. This allows for the analysis and design of rate-maximizing behaviors in much more general contexts. \section{Maximizing Point Gain Discounted by Weighted Time} The rate-maximizing approach may not be appropriate for all cases. As we discussed, there are other methods of simultaneously applying upward pressure on point gain and downward pressure on time.
One such method is to apply \longref{eq:compound_optimization_objectives} with $\E(G)$ and $-\E(T)$ as in % \begin{equation*} \E(G) - w_1 \E(T) = N^p \left( \E(G_1) - w_1 \E(T_1) \right) \label{eq:payoff_discounted_gain} \end{equation*} % where $w_1 \in \R_{\geq 0}$ is in points per second. Note that maximizing this function is identical to maximizing $\E(G_1) - w_1 \E(T_1)$ for an arbitrary cycle $k \in \{1,\dots,N^p\}$, and so there are similarities between this payoff function and \longref{eq:HouMc99_approach}, the function used in the \citet{HouMc99} approach to rate maximization. However, we allow $w_1$ to be chosen in order to adjust the relative importance of time. What is interesting about this function is that while it provides the same qualitative pressures on gain and time as the rate-maximizing payoff function, its optimization leads to very different results. \section{Task-Type Choice} Using the same analysis procedure as in the task-type choice case for the rate-maximizing approach, it is easy to show that the zero-one rule holds here as well. Thus, we define the payoff function as $J: \{\set{S}: \set{S} \subseteq \{1,\dots,n\}\} \mapsto \R$ with % \begin{equation} J(\set{Q}) \triangleq N^p \frac{ \sum\limits_{i \in \set{Q}} \lambda_i \left( g_i - c_i \tau_i - w_1 \tau_i \right) - c^s - w_1} { \sum\limits_{i \in \set{Q}} \lambda_i } \label{eq:task_type_choice_gain_cost} \end{equation} % Similar to before, for each task type $i \in \{1,\dots,n\}$, define the vector in \longref{eq:prof_vector} as % \begin{equation*} \v{x}^i = [ N^p ( g_i - c_i \tau_i - w_1 \tau_i ), 1]^\T \end{equation*} % Also let constants $\gamma = -N^p \left( c^s + w_1 \right)$ and $\varphi = 0$; because $x^i_2 = 1 > 0$ for every type, $\set{Z} = \emptyset$ and this choice of $\varphi$ is admissible. Hence, \longref{eq:task_type_choice_gain_cost} can be written in the general payoff function form of \longref{eq:general_type_choice_cost_function}. So, the profitability % \begin{equation*} P(i) = N^p ( g_i - c_i \tau_i - w_1 \tau_i ) \end{equation*} % which is equal to the expected time-discounted gain for processing $N^p$ tasks when the wait time between sequential encounters of tasks of type $i$ is zero. In this special case, the normalized profitability $P(i)/N^p$ is equal to the mean time-discounted gain for processing a \emph{single task} of type $i$. Order the task-type indices such that \longref{eq:prof_ordering} holds. Just as before, the generalized task-type choice algorithm can be used to find a set of task types for inclusion that will globally maximize the payoff function. Note that the positive constant factor $N^p$ in $P(i)$ for all $i \in \{1,\dots,n\}$ does not affect the ordering, and so ordering by $P(i)/N^p$ is equivalent. This means that tasks can be ordered by the mean time-discounted gain for processing a \emph{single task}, just as the ordering for the rate-maximization case orders by the long-term rate of gain for processing a \emph{single task}. Also notice that the profitability ordering for this mean time-discounted gain case could lead to the same ordering as would be seen in the optimization algorithm for the rate-maximizer. In fact, often the two orderings will only differ slightly. Because of this, if evidence for using a rate-maximizing approach to model nature comes entirely from observations of task-type choice ordering then that evidence does not exclude the use of this payoff function as well.
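To illustrate this last point, the short sketch below (our own illustration with hypothetical parameter values) ranks three task types both by the single-task rate $(g_i - c_i\tau_i)/\tau_i$ and by the single-task time-discounted gain $g_i - c_i\tau_i - w_1\tau_i$; for these values the two orderings coincide, so ordering data alone could not distinguish the two objectives.
%
\begin{verbatim}
# Sketch: rate profitability versus time-discounted profitability
# (hypothetical task parameters; w1 is the weight on time).
g   = [6.0, 3.0, 2.0]    # mean gains g_i
c   = [0.1, 0.1, 0.1]    # processing cost rates c_i
tau = [1.8, 1.0, 1.5]    # mean processing lengths tau_i
w1  = 0.5

rate_rank = sorted(range(3), reverse=True,
                   key=lambda i: (g[i] - c[i] * tau[i]) / tau[i])
disc_rank = sorted(range(3), reverse=True,
                   key=lambda i: g[i] - c[i] * tau[i] - w1 * tau[i])
print(rate_rank, disc_rank)   # [0, 1, 2] [0, 1, 2]: identical orderings
\end{verbatim}
%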
\section{Processing-Length Choice} For the processing-length choice case, the payoff function is a function $J: \R_{\geq 0}^n \mapsto \R$, defined as % \begin{equation*} J(\v{\tau}) \triangleq N^p \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau_i) - c_i \tau_i - w_1 \tau_i \right) - c^s - w_1} { \sum\limits_{i=1}^n \lambda_i } \end{equation*} % As in the rate-maximizing approach, we will derive necessary and sufficiency conditions for local maxima of $J$. That is, we will solve the constrained optimization problem in \longref{eq:problem_processing_length}. Again, we assume that for all $i \in \{1,\dots,n\}$, the gain function $g_i(\tau_i)$ is continuously differentiable. When necessary, we will also assume that for all $i \in \{1,\dots,n\}$ the gain function $g_i(\tau_i)$ is twice continuously differentiable. These restrictions only need to hold within some open neighborhood containing a local maximum of $J$. Define the Lagrangian $L: \R_{\geq 0}^n \times \R^n \mapsto \R$ by \longref{eq:lagrangian_processing_length}. Also define $\set{A}(\v{\tau})$, the set of active inequality constraints, as in \longref{eq:active_constraints_processing_length}. As discussed in the rate-maximizing approach, all points $\v{\tau} \in \R_{\geq 0}^n$ are said to be regular. \subsection{First-Order Necessary Conditions} Let $\v{\tau}^* \in \R^n$ be a local maximum of $J$. By the above reasoning, $\v{\tau}^*$ is regular. Then there must exist a unique Lagrange multiplier vector $\v{\mu}^* \in \R^n$ such that \longref{eq:lagrangian_first_order} holds. Let $\v{\mu}^*$ be such a multiplier vector. For all $j \in \{1,\dots,n\}$, \longref{eq:lagrangian_first_order_j} holds, which means that the multiplier $\mu^*_j$ is given by % \begin{align*} \mu^*_j &= \left. -\frac{ \partial J }{ \partial \tau_j } \right|_{\v{\tau}=\v{\tau}^*}\\ &= -\frac{N^p}{ \sum\limits_{i=1}^n \lambda_i } \lambda_j \left( g'_j(\tau^*_j) - c_j - w_1 \right) \end{align*} % For $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, $\tau^*_j > 0$ and the multiplier $\mu^*_j = 0$, hence it must be that % \begin{equation} {g_j}'( \tau^*_j ) - c_j = w_1 \label{eq:patch_discount_mvt} \end{equation} % However, for $j \in \set{A}(\v{\tau}^*)$, $\tau^*_j = 0$ and the multiplier $\mu^*_j \geq 0$, hence, it must be that % \begin{equation} {g_j}'( 0 ) - c_j \leq w_1 \label{eq:patch_discount_mvt_0} \end{equation} \subsection{Second-Order Necessary Conditions} Furthermore, assume that for all $i \in \{1,\dots,n\}$, $g_i(\tau_i)$ is twice continuously differentiable at $\tau_i = \tau^*_i$. The same discussion of feasible variations from the rate-maximizing approach applies here, so for all $\v{x} \in \R_{\geq 0}^n$, \longref{eq:lagrangian_feasible_hessian} holds. However, all of the off-diagonal terms are zero. That is, for all $j,\ell \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ with $j \neq \ell$, $\partial^2 L(\v{\tau},\v{\mu})/{\partial \tau_j \partial \tau_\ell}|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = 0$. Therefore, \longref{eq:lagrangian_feasible_hessian} is equivalent to \longref{eq:lagrangian_perfect_square_hessian}. Thus, for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, % \begin{align*} \left.
\frac{\partial^2 L}{{\partial \tau_j}^2} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = -\frac{N^p}{ \sum\limits_{i=1}^n \lambda_i } \lambda_j {g_j}''(\tau^*_j) \geq 0 \end{align*} % In other words, for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, % \begin{equation} {g_j}''(\tau_j^*) \leq 0 \label{eq:patch_discount_mvt_2nd} \end{equation} \subsection{Second-Order Sufficiency Conditions} Assume that for all $i \in \{1,\dots,n\}$, the gain function $g_i(\tau_i)$ is twice continuously differentiable, and let $\v{\tau}^* \in \R_{\geq 0}^n$ be a vector such that \longrefs{eq:patch_discount_mvt} and \shortref{eq:patch_discount_mvt_2nd} hold for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ and \longref{eq:patch_discount_mvt_0} holds for all $j \in \set{A}(\v{\tau}^*)$. If \longrefs{eq:patch_discount_mvt_0} and \longref{eq:patch_discount_mvt_2nd} both hold with strict inequality then $\v{\tau}^*$ is a local maximum of $J$. \subsection{Marginal-Value Theorem} It is interesting to note that in this case there is no coupling between the different processing lengths. In the rate-maximizing approach, the problem was a system of coupled equations. However, in this case there are $n$ independent equations that are not affected by the task distributions. Nevertheless, there is still a marginal-value notion here. Together, \longrefs{eq:patch_discount_mvt}, \shortref{eq:patch_discount_mvt_0}, and \shortref{eq:patch_discount_mvt_2nd} indicate that for tasks of diminishing returns, if the initial rate of discounted point gain from the individual task is higher than the overall value of time to the agent then the agent should process the task until the instantaneous rate of discounted point gain processing the task drops to the value of time. If time is very important to the agent, it will only continue processing a task for a long time if the task has a steep gain function. If time is not very important to the agent, it will continue processing every task for a long time. However, the time it spends on tasks of one type will have no impact on the time it spends on tasks of a different type. \section{The Efficiency Approach} Behavioral ecologists generally dislike considering efficiency approaches because efficiency puts no pressure on time, and thus an agent with an efficiency-maximizing behavior may be at a significant survival risk due to the long periods of time required to receive significant point gains. However, in engineering applications where time may not be as important to an agent, efficiency could be a good metric to use. Additionally, because costs in this model are proportional to the time spent during certain activities, any pressure on cost may have an implicit pressure on time as well. We maximize the negative \acro{CBR}{cost-to-benefit ratio} in \longref{eq:payoff_efficiency}. We assume that $c_i \geq 0$ for all $i \in \{1,\dots,n\}$ and $c^s \geq 0$. To avoid dividing by zero in the task-type choice problem, we assume that for all $i \in \{1,\dots,n\}$, $g_i > 0$. Similarly, for the task-processing length choice problem, for all $i \in \{1,\dots,n\}$, we will define the average gain function as $g_i: \R_{\geq 0} \mapsto \R_{>0}$. \section{Task-Type Choice} Again, using the same analysis as in the rate-maximizing approach, it is easy to show that the zero-one rule holds here as well.
Thus, we define the payoff function $J: \{\set{S}: \set{S} \subseteq \{1,\dots,n\}\} \mapsto \R$ with % \begin{equation} J(\set{Q}) \triangleq -\frac{ \sum\limits_{i \in \set{Q}} \lambda_i c_i \tau_i + c^s} { \sum\limits_{i \in \set{Q}} \lambda_i g_i } \label{eq:task_type_choice_efficiency_cost} \end{equation} % Notice that this ratio is \emph{not} the same ratio as in the rate-maximizing payoff function. Similar to before, for each task type $i \in \{1,\dots,n\}$, define the vector in \longref{eq:prof_vector} as % \begin{equation*} \v{x}^i = [ -c_i \tau_i, g_i ]^\T \end{equation*} % Also let constants $\gamma=-c^s$ and $\varphi=0$; because we have assumed $g_i > 0$ for all $i \in \{1,\dots,n\}$, $\set{Z} = \emptyset$ and this choice of $\varphi$ is admissible. Hence, \longref{eq:task_type_choice_efficiency_cost} can be written in the general payoff function form of \longref{eq:general_type_choice_cost_function}. So, the profitability % \begin{equation*} P( i ) = \frac{-c_i \tau_i}{g_i} \end{equation*} % which is the efficiency for processing $N^p$ tasks if there is zero wait time between sequential task-type $i$ encounters. In this special case, the profitability $P(i)$ can also be interpreted as the efficiency of processing a \emph{single task} of type $i$. Order the task-type indices such that \longref{eq:prof_ordering} holds. Now, the generalized task-type choice algorithm can be used to find a set of task types for inclusion that will globally maximize the payoff function. \section{Processing-Length Choice} For the processing-length choice case, the payoff function is a function $J: \R_{\geq 0}^n \mapsto \R$, defined as % \begin{equation*} J(\v{\tau}) \triangleq -\frac{ \sum\limits_{i=1}^n \lambda_i c_i \tau_i + c^s} { \sum\limits_{i=1}^n \lambda_i g_i(\tau_i) } \end{equation*} % Notice that this ratio is \emph{not} the same ratio as in the rate-maximizing payoff function. As in the rate-maximizing approach, we will derive necessary and sufficiency conditions for local maxima of $J$. That is, we will solve the constrained optimization problem in \longref{eq:problem_processing_length}. Again, we assume that for all $i \in \{1,\dots,n\}$, the gain function $g_i(\tau_i)$ is continuously differentiable. When necessary, we will also assume that for all $i \in \{1,\dots,n\}$ the gain function $g_i(\tau_i)$ is twice continuously differentiable. These restrictions only need to hold within some open neighborhood containing a local maximum of $J$. Define the Lagrangian $L: \R_{\geq 0}^n \times \R^n \mapsto \R$ by \longref{eq:lagrangian_processing_length}. Also define $\set{A}(\v{\tau})$, the set of active inequality constraints, as in \longref{eq:active_constraints_processing_length}. As discussed in the rate-maximizing approach, all points $\v{\tau} \in \R_{\geq 0}^n$ are said to be regular. \subsection{First-Order Necessary Conditions} Let $\v{\tau}^* \in \R^n$ be a local maximum of $J$. By the above reasoning, $\v{\tau}^*$ is regular. Then there must exist a unique Lagrange multiplier vector $\v{\mu}^* \in \R^n$ such that \longref{eq:lagrangian_first_order} holds. Let $\v{\mu}^*$ be such a multiplier vector. For all $j \in \{1,\dots,n\}$, \longref{eq:lagrangian_first_order_j} holds, which means that the multiplier $\mu^*_j$ is given by % \begin{align*} \mu^*_j &= \left.
-\frac{ \partial J }{ \partial \tau_j } \right|_{\v{\tau}=\v{\tau}^*}\\ &= \frac { \lambda_j c_j \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) - \lambda_j g'_j(\tau^*_j) \left( \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s \right) } { \left( \sum\limits_{i=1}^n \lambda_i g_i(\tau^*_i) \right)^2 } \end{align*} % For $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, $\tau^*_j > 0$ and the multiplier $\mu^*_j = 0$, hence it must be that % \begin{equation} g'_j(\tau^*_j) = 0 \implies c_j = 0 \label{eq:patch_efficiency_mvt_special} \end{equation} % and % \begin{equation} g'_j(\tau^*_j) \neq 0 \implies {-\frac{c_j}{ {g_j}'( \tau^*_j ) }} = -\frac{ \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s} { \sum\limits_{i=1}^n \lambda_i g_i(\tau^*_i) } = J(\v{\tau}^*) \label{eq:patch_efficiency_mvt} \end{equation} % However, for $j \in \set{A}(\v{\tau}^*)$, $\tau^*_j = 0$ and the multiplier $\mu^*_j \geq 0$, hence, it must be that % \begin{equation} g'_j(0) = 0 \implies c_j \geq 0 \label{eq:patch_efficiency_mvt_0_special} \end{equation} % and % \begin{equation} g'_j(0) \neq 0 \implies {-\frac{c_j}{ {g_j}'( 0 ) }} \leq -\frac{ \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s} { \sum\limits_{i=1}^n \lambda_i g_i(\tau^*_i) } = J(\v{\tau}^*) \label{eq:patch_efficiency_mvt_0} \end{equation} % Note that since we have assumed $c_i \geq 0$ for all $i \in \{1,\dots,n\}$, \longref{eq:patch_efficiency_mvt_0_special} will always be true for all $j \in \set{A}(\v{\tau}^*)$ such that $g'_j(0) = 0$. \subsection{Second-Order Necessary Conditions} Furthermore, assume that for all $i \in \{1,\dots,n\}$, $g_i(\tau_i)$ is twice continuously differentiable at $\tau_i = \tau^*_i$. The same discussion of feasible variations from the rate-maximizing approach applies here, so for all $\v{x} \in \R_{\geq 0}^n$, \longref{eq:lagrangian_feasible_hessian} holds. So, take $a,b \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ such that $a \neq b$. % \begin{align*} \left. \frac{\partial^2 L}{\partial \tau_a \partial \tau_b} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} &= \frac{ \lambda_b \lambda_a \left( c_a g'_b( \tau^*_b ) - c_b g'_a( \tau^*_a ) \right) \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) } { \left( \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) \right)^3 }\\ &\mathrel{\phantom{=}} {}- 2 \lambda_b \lambda_a g'_b(\tau^*_b) \frac{ c_a \sum\limits_{i=1}^n \lambda_i g_i(\tau^*_i) - g_a'(\tau^*_a) \left( \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s \right) } { \left( \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) \right)^3 } \end{align*} % By \longref{eq:patch_efficiency_mvt_special}, if $g'_a( \tau^*_a ) = 0$ then $c_a = 0$, and thus $\partial^2 L(\v{\tau},\v{\mu})/{\partial \tau_a \partial \tau_b}|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = 0$. Additionally, by the same reasoning, if $g'_b( \tau^*_b ) = 0$ then $c_b = 0$, and so $\partial^2 L(\v{\tau},\v{\mu})/{\partial \tau_a \partial \tau_b}|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = 0$. In the case where both $g'_a(\tau^*_a) \neq 0$ and $g'_b(\tau^*_b) \neq 0$ then % \begin{equation*} \frac{c_a}{ {g_a}'( \tau^*_a ) } = \frac{c_b}{ {g_b}'( \tau^*_b ) } = \frac{ \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s} { \sum\limits_{i=1}^n \lambda_i g_i(\tau^*_i) } \end{equation*} % so, % \begin{align*} \left.
\frac{\partial^2 L}{\partial \tau_a \partial \tau_b} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} &= \frac{ \lambda_b \lambda_a ( 0 ) \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) } { \left( \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) \right)^3 } %\\ %&\mathrel{\phantom{=}} %{}- - 2 \lambda_b \lambda_a g'_b(\tau^*_b) \frac{ 0 } { \left( \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) \right)^3 }\\ &= 0 \end{align*} % Hence, all of the off-diagonal terms are zero; that is, for all $j,\ell \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ with $j \neq \ell$, $\partial^2 L(\v{\tau},\v{\mu})/{\partial \tau_j \partial \tau_\ell}|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = 0$. Therefore, \longref{eq:lagrangian_feasible_hessian} is equivalent to \longref{eq:lagrangian_perfect_square_hessian}. Thus, for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, % \begin{align*} \left. \frac{\partial^2 L}{{\partial \tau_j}^2} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = \frac { -\lambda_j {g_j}''(\tau^*_j) \left( \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s \right) } { \left( \sum\limits_{i=1}^n \lambda_i g_i(\tau^*_i) \right)^2 } \geq 0 \end{align*} % In other words, for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, % \begin{equation*} {g_j}''(\tau^*_j) \left( \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s \right) \leq 0 \end{equation*} % However, if there exists a $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ then there exists a $j \in \{1,\dots,n\}$ such that $\tau^*_j > 0$. Additionally, we assume that $c^s$ and $\{c_1,\dots,c_n\}$ were not chosen so that $J \equiv 0$ (\ie, so that $\sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s > 0$). So, % \begin{equation} {g_j}''(\tau^*_j) \leq 0 \label{eq:patch_efficiency_mvt_2nd} \end{equation} \subsection{Second-Order Sufficiency Conditions} Assume that for all $i \in \{1,\dots,n\}$, the gain function $g_i(\tau_i)$ is twice continuously differentiable, and let $\v{\tau}^* \in \R_{\geq 0}^n$ be a vector such that \longrefs{eq:patch_efficiency_mvt_special}, \shortref{eq:patch_efficiency_mvt}, and \shortref{eq:patch_efficiency_mvt_2nd} hold for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ and \longrefs{eq:patch_efficiency_mvt_0_special} and \shortref{eq:patch_efficiency_mvt_0} hold for all $j \in \set{A}(\v{\tau}^*)$. If \longrefs{eq:patch_efficiency_mvt_0_special}, \shortref{eq:patch_efficiency_mvt_0}, and \longref{eq:patch_efficiency_mvt_2nd} all hold with strict inequality then $\v{\tau}^*$ is a local maximum of $J$. \subsection{Marginal-Value Theorem} This processing-length result also has a marginal-value notion. Together, \longrefs{eq:patch_efficiency_mvt_special}, \shortref{eq:patch_efficiency_mvt}, \shortref{eq:patch_efficiency_mvt_0_special}, \shortref{eq:patch_efficiency_mvt_0}, and \shortref{eq:patch_efficiency_mvt_2nd} indicate that for tasks of diminishing returns, if the initial \acro{CRBRR}{cost-rate-to-benefit-rate ratio} from the individual task is lower than the overall \ac{CBR} then the agent should process the task until the instantaneous \ac{CRBRR} processing the task rises to the overall \ac{CBR}. \todo{Of course, additional constraint and variance cases should be added here.} % \section{Threshold Approaches and Explicit Cycle Count} % % Another advantage to using an explicit cycle count (\ie, $N^p$) rather % than using a limiting case is that methods that seek to achieve some % threshold (\eg, some minimum number of points or some maximum amount % of time) will have different behaviors based on the size of $N^p$.
% % For example, if the goal is to solve a task-type choice problem for % the case where minimum $\E(T)$ is desired as long as $\E(G) \geq G^c$ % is met, where $G^c \in \R$ then a \ac{KKT} analysis shows that when % the solution sits on the boundary where $\E(G) = G^c$, the task-type % inclusion criteria is dependent upon $N^p$. That is, if a particular % task type $i \in \{1,\dots,n\}$ is such that % % % \begin{equation*} % \tau_i \leq \frac{1}{N^p} \E(T) \quad \land \quad % g_i - c_i \tau_i \geq \frac{G^c}{N^p} % \end{equation*} % % % then this task type should never be excluded completely. Both of these % conditions depend upon $N^p$. As $N^p$ increases, more types will meet % the gain condition, but fewer types will meet the time condition. % Without making the cycle count $N^p$ explicit then this variation in % optimal behavior would not have been evident.