% Upper-case A B C D E F G H I J K L M N O P Q R S T U V W X Y Z % Lower-case a b c d e f g h i j k l m n o p q r s t u v w x y z % Digits 0 1 2 3 4 5 6 7 8 9 % Exclamation ! Double quote " Hash (number) # % Dollar $ Percent % Ampersand & % Acute accent ' Left paren ( Right paren ) % Asterisk * Plus + Comma , % Minus - Point . Solidus / % Colon : Semicolon ; Less than < % Equals = Greater than > Question mark ? % At @ Left bracket [ Backslash \ % Right bracket ] Circumflex ^ Underscore _ % Grave accent ` Left brace { Vertical bar | % Right brace } Tilde ~ % ---------------------------------------------------------------------| % --------------------------- 72 characters ---------------------------| % ---------------------------------------------------------------------| % % Optimal Foraging Theory Revisited: Chapter 4. Finite-Lifetime % Optimization Results % (note: this material was not included in final document) % % (c) Copyright 2007 by Theodore P. Pavlic % \chapter{Finite-Lifetime Optimization Results} \label{ch:optimization_results_old} \todo{The commentary on the COMBINED methods below simply is not correct. While I was confident that a simpler algorithm than Stephen and Krebs combined algorithm existed, now I'm fairly sure that their method is the only way possible. While it is simple (and also a contribution) to show the necessary conditions (sufficient is harder, but also doable) on an optimum point of the combined problem, it is not so simple to prove that an algorithm leads to those conditions. ADDITIONALLY, it appears like the PATCH *AND* COMBINED results can be generalized just like the PREY results. That would shorten this section and allow the results to apply to a much wider set of examples. The end goal is to derive a simple generalized combined algorithm, give it, and show the existing cases here as simple examples.} In this \chname{} results are derived for a number of the optimization cases outlined in \longref{ch:optimization_objectives}. First, an algorithm is presented that generalizes the solution approach commonly used in classical \ac{OFT} and extends it for use with some additional cases. We then show how that algorithm is used by solving the task-type and processing-length choice problems for the classical \ac{OFT} case as well as some additional cases that take similar forms. Finally, we derive solutions for some more complex cases. Due to the constraints in the model, most of our optimization approaches will use the Lagrange multiplier method (\ie, solutions will be given by the Karush-Kuhn-Tucker conditions) \citep{Bertsekas95}; this is the same approach used by \citet{Pulliam75} which derives optimal foraging solutions with nutrient constraints. \section{Generalized Profitability Ordering} \label{sec:generalized_algorithm} Here we derive an algorithm for solving certain types of task-type choice problems. Conventional solution approaches to task-type optimal foraging problems rely on ordering the profitability of choices of task types \citep{Cha73,SK86}. We begin by generalizing this approach. A key mathematical relationship that enables this generalization is that if $a,b,c,d \in \R$ with $b>0$ and $d>0$ then % \begin{equation} \frac{a}{b} \geq \frac{c}{d} \iff \frac{a}{b} \geq \frac{c+a}{d+b} \iff \frac{c+a}{d+b} \geq \frac{c}{d} \label{eq:lemma} \end{equation} % In fact, this is true if the three $\geq$ relations are all replaced with any one of the relations from the set $\{{=},{>},{<},{\geq}, {\leq}\}$. 
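Although the full proof is omitted here, the key step is worth sketching. Since $b > 0$ and $d > 0$, multiplying both sides of an inequality by $bd$ or dividing both sides by $b(b+d)$ preserves its direction, and so
%
\begin{equation*}
\frac{a}{b} \geq \frac{c}{d}
\iff ad \geq cb
\iff ab + ad \geq ab + cb
\iff a(b+d) \geq b(c+a)
\iff \frac{a}{b} \geq \frac{c+a}{d+b}
\end{equation*}
%
The second equivalence in \longref{eq:lemma} follows from the symmetric argument with the roles of $a/b$ and $c/d$ exchanged, and the same steps hold verbatim with $\geq$ replaced by any other relation from the set above.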
%A proof of this can be found in \longref{app:lemma_proof}. Assume that for each type $i \in \{1,\dots,n\}$, there exists a parameter $\lambda_i \in \R_{>0}$ and a parameter vector % \begin{equation} \v{x}^i = [ x^i_1, x^i_2 ]^\T \in \{\v{z} = [z_1, z_2]^\T \in \R^2: z_2 \geq 0\} \label{eq:prof_vector} \end{equation} % These (\ie, $\lambda_1, \dots, \lambda_n$, and $\v{x}^1, \dots, \v{x}^n$) and constants $\gamma \in \R$ and $\varphi \in \R_{\geq 0}$ are parameters of a general payoff function $J: \{\set{S}: \set{S} \subseteq \{1,\dots,n\}\} \mapsto \R$, defined as % \begin{equation} J(\set{Q}) \triangleq \frac{ \gamma + \sum\limits_{j \in \set{Q} \cap \set{Z}} \lambda_j x^j_1 + \sum\limits_{i \in \set{Q} \setdiff \set{Z}} \lambda_i x^i_1 } { \varphi + \sum\limits_{i \in \set{Q} \setdiff \set{Z}} \lambda_i x^i_2 } \label{eq:general_type_choice_cost_function} \end{equation} % where the set of types $\set{Z} \subseteq \{1,\dots,n\}$ is defined as % \begin{equation*} \set{Z} \triangleq \{ i \in \{1,\dots,n\} : x^i_2 = 0 \} \end{equation*} % In order for $J$ to be well-defined, we make the additional restriction that if $\set{Z} \neq \emptyset$, it must be that $\varphi \neq 0$ (\ie, $\varphi > 0$). $J$ maps a finite set of task types $\set{Q} \subseteq \{1,\dots,n\}$ to some payoff $J(\set{Q})$. We maximize $J(\set{Q})$ over all $\set{Q} \in \{\set{S}: \set{S} \subseteq \{1,\dots,n\}\}$. In other words, we choose the set of task-type indices to include to \emph{globally} maximize the payoff function; all task-type indices not in this set should be excluded. Note that if a type $i \in \set{Z}^=$, where $\set{Z}^= \triangleq \{ i \in \set{Z} : x^i_1 = 0 \}$, then its inclusion has no impact on the optimality of a set. Therefore, for simplicity, we assume that $\set{Z}^= = \emptyset$. However, if there are $n+m$ types and $|\set{Z}^=|=m$, the following can be applied to the $n$-sized set $\{1,\dots,n+m\} \setdiff \set{Z}^=$; optimal sets resulting from the method can then have any number of types from $\set{Z}^=$ added to them arbitrarily. Define sets $\set{Z}^+ \triangleq \{ i \in \set{Z} : x^i_1 > 0 \}$ and $\set{Z}^- \triangleq \{ i \in \set{Z} : x^i_1 < 0 \}$. By the assumption that $\set{Z}^= = \emptyset$, $\set{Z}^+ \cup \set{Z}^- = \set{Z}$ and $\set{Z}^+ \cap \set{Z}^- = \emptyset$. Thus, these two sets form a \emph{partition} of $\set{Z}$. For a task of type $i$, the limit of $J(\set{Q})$ as $\lambda_i \to \infty$ is important to us. If $\lambda_i$ is thought of as an encounter rate with tasks of type $i$ then this limit is the payoff returned from spending roughly all time processing those tasks. We call this limit the profitability of tasks of type $i$, and we define the profitability function $P: \{1,\dots,n\} \mapsto \extR$ with % \begin{equation*} P( i ) \triangleq \lim\limits_{\lambda_i \to \infty} J(\{1,\dots,n\}) = \lim\limits_{\lambda_i \to \infty} J(\{i\}) = \begin{cases} \infty &\text{if } i \in \set{Z}^+\\ \frac{x^i_1}{x^i_2} &\text{if } i \in \{1,\dots,n\} \setdiff \set{Z}\\ {-\infty} &\text{if } i \in \set{Z}^- \end{cases} \end{equation*} % Additionally, it is important to note that for each $i \in \set{Q} \setdiff \set{Z}$, % \begin{equation*} P( i ) = \frac{x^i_1}{x^i_2} = \frac{\lambda_i x^i_1}{\lambda_i x^i_2} %\label{eq:lambda_profitability} \end{equation*} % Without loss of generality, we assume that task-type indices are ordered such that $P(1) \geq P(2) \geq \dots \geq P( n-1 ) \geq P( n )$.
In particular, if $i,j \in \{0,\dots,n+1\}$ are such that $i=|\set{Z}^+|+1$ and $j=|\{1,\dots,n\}\setdiff\set{Z}^-|+1$, the indices are ordered such that % \begin{align} \begin{split} \{1,\dots,i-1\} &= \set{Z}^+\\ &\text{and}\\ \frac{x^i_1}{x^i_2} \geq \frac{x^{i+1}_1}{x^{i+1}_2} \geq &\dots \geq \frac{x^{j-2}_1}{x^{j-2}_2} \geq \frac{x^{j-1}_1}{x^{j-1}_2} \end{split} \label{eq:prof_ordering} \end{align} % The ordering of the types in $\set{Z}^+$ is not important as long as they are lower in order than all other types. We will show that all of these types will be included in the optimal set. Take the sets $\set{R}, \set{S} \subseteq \{1,\dots,n\}$. If $i \notin \set{R}$ and there exists $j \in \set{R}$ such that $i < j$ then we refer to $i$ as a hole in $\set{R}$. Additionally, if $J(\set{S}) \geq J(\set{R})$ then we say that $\set{S}$ is a better set than $\set{R}$. Take $\set{Q} \subseteq \{1,\dots,n\}$ and $i,j \in \{1,\dots,n\}$ with $i < j$ such that $i \notin \set{Q}$ and $j \in \set{Q}$. That is, $i$ is a hole in set $\set{Q}$. Our claim is that there exists a set $\set{V}$ with $J(\set{V}) \geq J(\set{Q})$ such that for each $p \in \set{V}$, for all $q \in \{1,\dots,p\}$, $q \in \set{V}$. In other words, for any index set $\set{Q}$ that has a hole there exists a better index set $\set{V}$ that has no holes. First, take hole $i$ such that $i \in \set{Z}$. There are two cases that result from whether or not $x^i_1$ is less than zero (\ie, whether $P(i) = \infty$ or $P(i) = -\infty$): % \begin{itemize} \item $x^i_1 \geq 0$: Since the denominator of $J(\set{Q})$ is positive by construction, $J(\set{Q}\cup\{i\}) \geq J(\set{Q})$. Thus, $\set{Q}\cup\{i\}$ is a better set with one less hole. In fact, by \longref{eq:prof_ordering}, for all $k \in \{1,\dots,i-1\}$, $x^k_1 \geq 0$. Hence, if there exists $k \in \{1,\dots,i-1\}$ such that $k \notin \set{Q}$, this case will apply again for the hole $k$ in the better set $\set{Q}\cup\{i\}$. Therefore, this case can be applied repeatedly until all holes in $\set{Q}$ less than $i$ are removed. \item $x^i_1 < 0$: By \longref{eq:prof_ordering}, $x^j_1 < 0$. Since the denominator of $J(\set{Q})$ is positive by construction, $J(\set{Q}\setdiff\{j\}) > J(\set{Q})$. Hence, the smaller set $\set{Q}\setdiff\{j\}$ is better than set $\set{Q}$. However, by \longref{eq:prof_ordering}, for all $k \in \{i+1,\dots,n\}$, $x^k_1 < 0$. Therefore, if there exists $k \in \set{Q}\setdiff\{j\}$ with $k > i$, this case can be applied again to produce a better set without $k$. This repetition can continue until no such $k$ exists. At that point, $i$ ceases to be a hole; the result is a better set with at least one less hole. \end{itemize} % Clearly, the repeated application of the first case guarantees that a better set $\set{V}$ exists that includes every task $\ell \in \set{Z}^+$. Additionally, the repeated application of the second case guarantees that a better set $\set{V}$ exists that excludes every task $\ell \in \set{Z}^-$. In other words, $J( ( \set{Q} \cup \set{Z}^+ ) \setdiff \set{Z}^- ) \geq J( \set{Q} )$; therefore, assume that there is no hole $i$ such that $i \in \set{Z}$. If no holes remain then a better set with no holes has been found and the claim is proved. Assume that there exists a hole $i$ such that $i \notin \set{Z}$. Therefore, by the definition of a hole, there must also exist a $j \in \set{Q}$ with $j > i$. By the assumption just made, there is no $\ell \in \set{Z}^-$ such that $\ell \in \set{Q}$, so it must be that $j \notin \set{Z}$.
There are two cases that result from whether or not $P(i)$, the profitability of the hole $i$, is less than $J(\set{Q})$: % \begin{itemize} \item $P(i) = x^i_1/x^i_2 = \lambda_i x^i_1 / ( \lambda_i x^i_2) \geq J(\set{Q})$: By \longref{eq:lemma}, $J(\set{Q}\cup\{i\}) \geq J(\set{Q})$. Thus, $\set{Q}\cup\{i\}$ is a better set with one less hole. In fact, by \longref{eq:prof_ordering}, for all $k \in \{1,\dots,i-1\} \setdiff \set{Z}$, $x^k_1/x^k_2 \geq x^i_1/x^i_2$, and by \longref{eq:lemma}, $x^i_1/x^i_2 \geq J(\set{Q}\cup\{i\})$, so $x^k_1/x^k_2 \geq J(\set{Q}\cup\{i\})$. Hence, if there exists $k \in \{1,\dots,i-1\}\setdiff\set{Z}$ such that $k \notin \set{Q}$, this case will apply again for the hole $k$ in the better set $\set{Q}\cup\{i\}$. Therefore, this case can be applied repeatedly until all holes in $\set{Q}\setdiff\set{Z}$ less than $i$ are removed. \item $J(\set{Q}) > P(i) = x^i_1/x^i_2 = \lambda_i x^i_1 / ( \lambda_i x^i_2 )$: By \longref{eq:prof_ordering}, $x^i_1/x^i_2 \geq x^j_1/x^j_2$; thus, $J(\set{Q}) > x^j_1/x^j_2$. So, by \longref{eq:lemma}, $J(\set{Q}\setdiff\{j\}) > J(\set{Q})$. Hence, the smaller set $\set{Q}\setdiff\{j\}$ is better than set $\set{Q}$. However, by \longref{eq:prof_ordering}, for all $k \in \{i+1,\dots,n\} \setdiff \set{Z}$, $x^i_1/x^i_2 \geq x^k_1/x^k_2$, and thus, $J(\set{Q}\setdiff\{j\}) > x^k_1/x^k_2$. Therefore, if there exists $k \in \set{Q}\setdiff\{j\}\setdiff\set{Z}$ with $k > i$, this case can be applied again to produce a better set without $k$. This repetition can continue until no such $k$ exists. At that point, because all types in set $\set{Z}^-$ are also excluded in the better set, $i$ ceases to be a hole; the result is a better set with at least one less hole. \end{itemize} % In the first case, all holes $i$ with $P(i) \geq J(\set{Q})$ are shown to be included in a better set. In the second case, all holes $i$ with $P(i) < J(\set{Q})$ are shown to be excluded in a better set. By the ordering of the indices, this new better set is of the form $\{1,2,\dots,k\}$ where $k \in \{0,\dots,n\}$, and so it has no holes. The claim is proved. Therefore, when maximizing $J$ over all $\set{Q} \subseteq \{1,\dots,n\}$, only the $n$ sets of the form $\{1,\dots,k\}$ with $k \in \{1,\dots,n\}$ and the empty set need to be considered. These $n+1$ possible combinations are a significant reduction from the original $2^n$ possible combinations. Assuming that tasks have been ordered by their profitabilities as described above (\ie, as in \longref{eq:prof_ordering}) and $i$ is the lowest-ordered type such that $x^i_2 > 0$ (or $n+1$ if no such type exists) and $j$ is the lowest-ordered type such that $x^j_1 < 0$ (or $n+1$ if no such type exists), an algorithm for finding a set of task types for inclusion that will globally maximize $J$ is the following (it depends on the fact that there will always exist an optimal set of the form $\{1,\dots,k\}$ where $k \in \{0,\dots,n\}$ and uses \longref{eq:lemma} to find such a set): % \begin{itemize} \item Exclude all sets of the form $\{1,\dots,\ell\}$ where $\ell < i-1$ and continue if $j > i$. \item If $P(i) = x^i_1/x^i_2 \geq J(\{1,\dots,i-1\})$ then $J(\{1,\dots,i\}) \geq J(\{1,\dots,i-1\})$; so, exclude the set $\{1,\dots,i-1\}$ and continue if $j > i+1$. \item If $P(i+1) = x^{i+1}_1/x^{i+1}_2 \geq J(\{1,\dots,i\})$, then $J(\{1,\dots,i+1\}) \geq J(\{1,\dots,i\})$; so, exclude the set $\{1,\dots,i\}$ and continue if $j > i+2$.
\item \dots \item If $P(k) = x^k_1/x^k_2 \geq J(\{1,\dots,k-1\})$ then $J(\{1,\dots,k\}) \geq J(\{1,\dots,k-1\})$; so, exclude the set $\{1,\dots,k-1\}$ and continue if $j > k+1$. \item If $P(k+1) = x^{k+1}_1/x^{k+1}_2 < J(\{1,\dots,k\})$ then inclusion of any type $j \in \{k+1,\dots,n\}$ will only cause a decrease in $J$ by the profitability ordering. Hence, the set $\{1,\dots,k\}$ maximizes $J$. Choose $\{1,\dots,k\}$ and stop. \end{itemize} % If these steps fail to find an optimal set then the set $\{1,\dots,j-1\}$ should be used (\ie, every type except for the types in set $\set{Z}^-$). While this algorithm finds a single set to maximize $J$, there may be additional sets of equal value. In fact, this algorithm finds the largest set that will globally maximize $J$; a similar algorithm can be constructed to find the smallest maximizing set (recall that it was assumed that $\set{Z}^= = \emptyset$; if this is not true, larger sets can be made by adding types from $\set{Z}^=$). Assume that these steps lead to the conclusion that the empty set is an optimal set; however, also assume that the empty set is not valid for the particular application. Consider the set $\set{G} \triangleq \{ i \in \{1,\dots,n\}: \text{for all } j \in \{1,\dots,n\}, P(i) \geq P(j) \}$ (\ie, the set of all types that share the highest profitability). The optimal set will be any singleton set $\{i\}$ with $i \in \set{G}$ such that for all $j \in \set{G}$, $\lambda_i x^i_1 \geq \lambda_j x^j_1$. We leave this claim without proof; however, it also follows from \longref{eq:lemma}. Alternatively, as long as task-type indices are ordered by profitability, a trivial algorithm can be used that calculates $J(\set{V})$ for each $\set{V} \in \{ \set{Q} \subseteq \{1,\dots,n\}: i \in \set{Q}, j \in \{1,\dots,i\} \implies j \in \set{Q} \}$ and picks the set that results in the greatest $J(\set{V})$ over these $n+1$ sets. Note that the restriction of $J$ to these $n+1$ sets is actually unimodal in the cardinality of its argument (\ie, it is nondecreasing in cardinality up to a maximizing set and nonincreasing afterward); in fact, this is a key reason why the above algorithm works. This unimodality can reduce the number of steps required to complete such an algorithm. \section{OFT and The Rate-Maximizing Approach} \label{sec:rate_maximization} As discussed in \longref{ch:optimization_objectives}, classical \ac{OFT} makes use of the long-term rate of gain in \longref{eq:oft_payoff_long_rate}. In our model, where we assume $N^p$ to be fixed, we define the long-term rate of gain as the ratio of expectations in \longref{eq:payoff_rate}, which results in the same expression as the rate used in classical \ac{OFT}. Hence, the results here will be the same results found in classical \ac{OFT}. Nevertheless, we present these results here to assist in comparing and contrasting them with our other optimization measures. Additionally, our solution to the task-type choice problem makes use of the generalized algorithm developed in \longref{sec:generalized_algorithm}, and thus it serves as an important example. We also present a simple method for solving combined task-type and processing-length choice problems that is far simpler than the algorithm in \citet{SK86} and does not require any of the strange assumptions on the gain function.
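Because the algorithm of \longref{sec:generalized_algorithm} is stated abstractly, a concrete illustration may help before it is applied. The following minimal Python sketch is our own illustration (all parameter values are hypothetical, and the function names are not part of the model); it evaluates the general payoff of \longref{eq:general_type_choice_cost_function} over the $n+1$ hole-free candidate sets from the trivial prefix-scan variant described above and keeps the largest maximizing prefix. The final lines instantiate it with the rate-maximizing parameterization derived in the next section, $\v{x}^i = [ g_i - c_i \tau_i, \tau_i ]^\T$, $\gamma = -c^s$, and $\varphi = 1$.
%
\begin{verbatim}
# Minimal sketch of the generalized profitability ordering
# (hypothetical data; not a transcription of any published algorithm).
import math

def payoff(Q, lam, x, gamma, phi):
    # General payoff J(Q); types with x2 = 0 add nothing to the denominator.
    num = gamma + sum(lam[i] * x[i][0] for i in Q)
    den = phi + sum(lam[i] * x[i][1] for i in Q)
    return num / den

def profitability(xi):
    # P(i): +/- infinity for zero-denominator types, else the ratio x1/x2.
    x1, x2 = xi
    if x2 == 0:
        return math.inf if x1 > 0 else -math.inf
    return x1 / x2

def best_prefix(lam, x, gamma, phi):
    # Order indices by profitability, then scan the n+1 hole-free sets
    # {}, {1}, {1,2}, ...; ">=" keeps the largest globally maximizing set.
    order = sorted(range(len(x)), key=lambda i: profitability(x[i]),
                   reverse=True)
    best, best_val = [], payoff([], lam, x, gamma, phi)
    for k in range(1, len(order) + 1):
        val = payoff(order[:k], lam, x, gamma, phi)
        if val >= best_val:
            best, best_val = order[:k], val
    return best, best_val

# Rate-maximizing parameterization: x^i = (g_i - c_i*tau_i, tau_i),
# gamma = -c_s, phi = 1 (all values below are made up for illustration).
lam = [0.2, 0.5, 0.1]
x = [(4.0, 1.0), (1.5, 0.5), (0.5, 2.0)]
print(best_prefix(lam, x, gamma=-0.1, phi=1.0))
# -> ([0, 1], 1.0): the two most profitable types (indices 0 and 1) are
#    kept; the third has profitability 0.25 < 1.0 and is excluded.
\end{verbatim}
%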
%\citet[section~2.4]{SK86} \section{Task-Type Choice} \label{sec:type_rate} For the task-type choice case, the long-term rate is a function $J: [0,1]^n \mapsto \R$, defined as % \begin{equation*} J(\v{p}) \triangleq \frac{\E(G_1)}{\E(T_1)} = \frac{ \sum\limits_{i=1}^n p_i \lambda_i \left( g_i - c_i \tau_i \right) - c^s } { \sum\limits_{i=1}^n p_i \lambda_i \tau_i + 1 } \end{equation*} % Take $j \in \{1,\dots,n\}$ and $\v{q},\v{r},\v{s} \in [0,1]^n$ with $q_j \in (0,1)$, $r_j = 0$, $s_j = 1$, and $q_i = r_i = s_i$ for all $i \in \{1,\dots,n\} \setdiff \{j\}$. If $\v{q}$ is optimal then $\partial J(\v{p})/\partial p_j|_{\v{p}=\v{q}} = 0$. However, % \begin{equation*} \left. \frac{ \partial J }{ \partial p_j } \right|_{\v{p}=\v{q}} % = % \frac{ % \left( % \sum\limits_{i=1,i \neq j}^n q_i \lambda_i \tau_i + 1 % \right) % \lambda_j \left( g_j - c_j \tau_j \right) % - % \left( % \sum\limits_{i=1,i \neq j}^n q_i \lambda_i \left( g_i - c_i % \tau_i \right) - c^s % \right) % \lambda_j \tau_j % } % { \left( % \sum\limits_{i=1}^n q_i \lambda_i \tau_i + 1 % \right)^2 } = \lambda_j \frac{ \left( \sum\limits_{\substack{1 \leq i \leq n\\i \neq j}} q_i \lambda_i \tau_i + 1 \right) \left( g_j - c_j \tau_j \right) - \left( \sum\limits_{\substack{1 \leq i \leq n\\i \neq j}} q_i \lambda_i \left( g_i - c_i \tau_i \right) - c^s \right) \tau_j } { \left( \sum\limits_{i=1}^n q_i \lambda_i \tau_i + 1 \right)^2 } \end{equation*} % The sign of this expression is completely determined by the numerator of the fraction, which is independent of $q_j$. Thus, if the numerator is nonzero, $\v{q}$ cannot be optimal. If the numerator is zero then % \begin{equation*} \frac{q_j \lambda_j \left( g_j - c_j \tau_j \right) } {q_j \lambda_j \tau_j} = \frac{\lambda_j \left( g_j - c_j \tau_j \right) } {\lambda_j \tau_j} = % \frac{g_j - c_j \tau_j}{\tau_j} % = % \frac{ % \sum\limits_{i=1,i \neq j}^n q_i \lambda_i % \left( g_i - c_i \tau_i \right) - c^s % }{\sum\limits_{i=1,i \neq j}^n q_i \lambda_i \tau_i + 1} \frac{ \sum\limits_{\substack{1 \leq i \leq n\\i \neq j}} q_i \lambda_i \left( g_i - c_i \tau_i \right) - c^s }{ \sum\limits_{\substack{1 \leq i \leq n\\i \neq j}} q_i \lambda_i \tau_i + 1 } = J(\v{r}) \end{equation*} % which, by \longref{eq:lemma}, indicates that $J(\v{q}) = J(\v{r}) = J(\v{s})$. Hence, if $\v{q}$ is optimal then it provides no benefit over $\v{r}$ and $\v{s}$, and those vectors can be chosen instead. Therefore, vectors of the form $\v{q}$ can be excluded and $J$ can be replaced with its restriction to the set $\{0,1\}^n$. This is known as the \emph{zero-one rule} \citep[p.~18]{SK86}. In other words, preference can be thought of as choosing an index set $\set{Q} \subseteq \{1,\dots,n\}$ for inclusion. That is, for any index set $\set{Q} \subseteq \{1,\dots,n\}$, $\v{p}$ is the corresponding preference vector if $i \in \set{Q} \iff p_i = 1$ for all $i \in \{1,\dots,n\}$. Hence, rather than restricting $J(\v{p})$ to $\{0,1\}^n$, we redefine it without loss of generality as $J: \{\set{S}: \set{S} \subseteq \{1,\dots,n\}\} \mapsto \R$ with % \begin{equation} J(\set{Q}) \triangleq \frac{ \sum\limits_{i \in \set{Q}} \lambda_i \left( g_i - c_i \tau_i \right) - c^s } { \sum\limits_{i \in \set{Q}} \lambda_i \tau_i + 1 } \label{eq:task_type_choice_rate_cost} \end{equation} % For each task type $i \in \{1,\dots,n\}$, define the vector in \longref{eq:prof_vector} as % \begin{equation*} \v{x}^i = [ g_i - c_i \tau_i, \tau_i ]^\T \end{equation*} % Also let constants $\gamma=-c^s$ and $\varphi=1$.
Hence, \longref{eq:task_type_choice_rate_cost} can be written in the general payoff function form of \longref{eq:general_type_choice_cost_function}. So, the profitability % \begin{equation*} P( i ) = \begin{cases} \infty &\text{if } \tau_i = 0 \text{ and } g_i > 0\\ \frac{g_i - c_i \tau_i}{\tau_i} &\text{if } \tau_i > 0\\ {-\infty} &\text{if } \tau_i = 0 \text{ and } g_i < 0 \end{cases} \end{equation*} % which is the long-term expected rate of gain for processing $N^p$ tasks if there is zero wait time between sequential task-type $i$ encounters (provided that task type $i$ has $g_i \neq 0$ or $\tau_i > 0$). In this special case, the form of $P(i)$ can also be interpreted as the long-term rate of gain for processing a \emph{single task} of type $i$. Order the task-type indices such that \longref{eq:prof_ordering} holds. Now, the generalized task-type choice algorithm described in \longref{sec:generalized_algorithm} can be used to find a set of task types for inclusion that will globally maximize the rate. \subsection{Discussion} %\citet[section~2.2]{SK86} Note that while the algorithm described here is very similar to the \citet{SK86} algorithm, its subtle differences allow for use with our model, which uses a much more general parameter set. As we will discuss in \longref{sec:combined_type_length_rate}, one consequence of this is that the combined task-type and processing-length choice problem can be solved in a simple, intuitive way that is completely different from the method described by \citet{SK86}. %\citet[section~2.4]{SK86}. \section{Processing-Length Choice} \label{sec:length_rate} For the processing-length choice case, the long-term rate is a function $J: \R_{\geq 0}^n \mapsto \R$, defined as % \begin{equation*} J(\v{\tau}) \triangleq \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau_i) - c_i \tau_i \right) - c^s } { \sum\limits_{i=1}^n \lambda_i \tau_i + 1 } \end{equation*} % In the task-type choice case, we found an algorithm to simplify picking the global maximum from a finite set of candidates. For this case, there are uncountably many candidates to consider and a wide range of possible gain function shapes that prevent us from making generalizations about global optimality. Thus, we will instead derive first-order and second-order necessary conditions for local optimality. We will also provide sufficiency conditions for a candidate point to be a local maximum. We will use the Lagrange multiplier method \citep{Bertsekas95} to derive necessary and sufficiency conditions for local maxima of $J$. That is, we will solve the constrained optimization problem % \begin{equation} \begin{split} &\text{minimize} \enskip {-J(\v{\tau})}\\ &\text{subject to} \enskip {-\tau_i} \leq 0 \enskip \text{for all} \enskip i \in \{1,\dots,n\} \end{split} \label{eq:problem_processing_length} \end{equation} % To do this, we assume that for all $i \in \{1,\dots,n\}$, the gain function $g_i(\tau_i)$ is continuously differentiable. This assumption will lead to first-order necessary conditions on local optimality. We also derive second-order necessary and sufficiency conditions in the case where for all $i \in \{1,\dots,n\}$ the gain function $g_i(\tau_i)$ is twice continuously differentiable. These restrictions only need to hold within some open neighborhood containing a local maximum of $J$.
Define the Lagrangian $L: \R_{\geq 0}^n \times \R^n \mapsto \R$, % \begin{equation} L( \v{\tau}, \v{\mu} ) \triangleq -J( \v{ \tau } ) - \v{ \mu }^\T \v{ \tau } \label{eq:lagrangian_processing_length} \end{equation} % Also define $\set{A}(\v{\tau})$, the set of active inequality constraints, as % \begin{equation} \set{A}(\v{\tau}) \triangleq \{j \in \{1,\dots,n\}: \tau_j = 0\} \label{eq:active_constraints_processing_length} \end{equation} % Note that for all $\v{\tau} \in \R_{\geq 0}^n$, for all $j \in \set{A}(\v{\tau})$, the active inequality constraint gradient $\nabla_\v{\tau} ( -\tau_j ) = -\v{e}_j$ where $\v{e}_j$ is the $j\th$ elementary basis vector for $\R^n$. Clearly the active inequality constraint gradients are linearly independent at all points $\v{\tau} \in \R_{\geq 0}^n$. Thus, all points $\v{\tau} \in \R_{\geq 0}^n$ are regular. \subsection{First-Order Necessary Conditions} Let $\v{\tau}^* \in \R^n$ be a local maximum of $J$. By the above reasoning, $\v{\tau}^*$ is regular. Then there must exist a unique Lagrange multiplier vector $\v{\mu}^* \in \R^n$ such that % \begin{equation} \begin{split} \left. \nabla_\v{\tau} L(\v{\tau}, \v{\mu}) \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} &= \v{0},\\ j \in \set{A}(\v{\tau}^*) &\implies \mu^*_j \geq 0,\\ j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*) &\implies \mu^*_j = 0 \end{split} \label{eq:lagrangian_first_order} \end{equation} % Let $\v{\mu}^*$ be such a multiplier vector. For all $j \in \{1,\dots,n\}$, % \begin{equation} \left. \frac{ \partial L }{ \partial \tau_j } \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = {-\left. \frac{ \partial J }{ \partial \tau_j } \right|_{\v{\tau}=\v{\tau}^*}} - \mu^*_j = 0 \label{eq:lagrangian_first_order_j} \end{equation} % which means that the multiplier % \begin{align*} \mu^*_j &= \left. -\frac{ \partial J }{ \partial \tau_j } \right|_{\v{\tau}=\v{\tau}^*}\\ &= -\frac{ \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right) \lambda_j \left( {g_j}'(\tau^*_j) - c_j \right) - \left( \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau^*_i) - c_i \tau^*_i \right) - c^s \right) \lambda_j } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^2 } \end{align*} % For $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, $\tau^*_j > 0$ and the multiplier $\mu^*_j = 0$, hence it must be that % \begin{equation} {g_j}'( \tau^*_j ) - c_j = \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau^*_i) - c_i \tau^*_i \right) - c^s } { \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 } = J( \v{\tau}^* ) \label{eq:patch_rate_mvt} \end{equation} % However, for $j \in \set{A}(\v{\tau}^*)$, $\tau^*_j = 0$ and the multiplier $\mu^*_j \geq 0$, hence, it must be that % \begin{equation} {g_j}'( 0 ) - c_j \leq \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau^*_i) - c_i \tau^*_i \right) - c^s } { \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 } = J( \v{\tau}^* ) \label{eq:patch_rate_mvt_0} \end{equation} \subsection{Second-Order Necessary Conditions} Furthermore, assume that for all $i \in \{1,\dots,n\}$, $g_i(\tau_i)$ is twice continuously differentiable at $\tau_i = \tau^*_i$. Define the feasible variations at $\v{\tau}^*$ as % \begin{equation*} \set{V}(\v{\tau}^*) \triangleq \left\{ \v{x} \in \R_{\geq 0}^n : x_j = 0, j \in \set{A}(\v{\tau}^*) \right\} \end{equation*} % It holds that for all $\v{y} \in \set{V}(\v{\tau}^*)$, % \begin{equation} \left.
\v{y}^\T \nabla_{\v{\tau}\v{\tau}}^2 L(\v{\tau},\v{\mu}) \v{y} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} \geq 0 \label{eq:patch_rate_sufficient} \end{equation} % In other words, for all $\v{x} \in \R_{\geq 0}^n$, % \begin{equation} \sum \limits_{\substack{1 \leq i \leq n\\ i \notin \set{A}(\v{\tau}^*)}} \sum \limits_{\substack{1 \leq j \leq n\\ j \notin \set{A}(\v{\tau}^*)}} x_i x_j \frac{\partial^2 L}{\partial \tau_i \partial \tau_j} \geq 0 \label{eq:lagrangian_feasible_hessian} \end{equation} % However, all the off-diagonal terms are zero since for all $j,\ell \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ with $j \neq \ell$, % \begin{align*} \left. \frac{\partial^2 L}{\partial \tau_j \partial \tau_\ell} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} &= 2 \lambda_\ell \lambda_j \frac{ \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right) \left( g_j'(\tau^*_j) - c_j \right) } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^3 } \\ &\mathrel{\phantom{=}} {}- 2 \lambda_\ell \lambda_j \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau^*_i) - c_i \tau^*_i \right) - c^s } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^3 } \\ &\mathrel{\phantom{=}} {}- \lambda_\ell \lambda_j \frac{ \left( g'_j( \tau^*_j ) - c_j \right) - \left( g'_\ell( \tau^*_\ell ) - c_\ell \right) } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^2 }\\ \intertext{and by \longref{eq:patch_rate_mvt},} \left. \frac{\partial^2 L}{\partial \tau_j \partial \tau_\ell} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} &= 2 \lambda_\ell \lambda_j \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau^*_i) - c_i \tau^*_i \right) - c^s } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^3 } \\ &\mathrel{\phantom{=}} {}- 2 \lambda_\ell \lambda_j \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau^*_i) - c_i \tau^*_i \right) - c^s } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^3 } \\ &\mathrel{\phantom{=}} {}- \lambda_\ell \lambda_j \frac{ J(\v{\tau}^*) - J(\v{\tau}^*) } { \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^2 }\\ &= 0 \end{align*} % Therefore, \longref{eq:patch_rate_sufficient} is equivalent to % \begin{equation} \sum \limits_{\substack{1 \leq i \leq n\\ i \notin \set{A}(\v{\tau}^*)}} x_i^2 \frac{\partial^2 L}{{\partial \tau_i}^2} \geq 0 \label{eq:lagrangian_perfect_square_hessian} \end{equation} % Thus, for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, % \begin{align*} \left.
\frac{\partial^2 L}{{\partial \tau_j}^2} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = % &= % -\frac{ % \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^2 % \left( % \begin{split} % \lambda_i^2 &\left( {g_i}'(\tau^*_i) - c_i \right)\\ % &{}+\lambda_i % \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right) % {g_i}''(\tau^*_i)\\ % &{}-\lambda_i^2 \left( {g_i}'(\tau^*_i) - c_i \right) % \end{split} % \right) % } % { % \left( \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 \right)^4 % }\\ % &= -\frac{ \lambda_j {g_j}''(\tau^*_j) } { \sum\limits_{i=1}^n \lambda_i \tau^*_i + 1 } \geq 0 \end{align*} % In other words, for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, % \begin{equation} {g_j}''(\tau_j^*) \leq 0 \label{eq:patch_rate_mvt_2nd} \end{equation} \subsection{Second-Order Sufficiency Conditions} Assume that for all $i \in \{1,\dots,n\}$, the gain function $g_i(\tau_i)$ is twice continuously differentiable, and let $\v{\tau}^* \in \R_{\geq 0}^n$ be a vector such that \longrefs{eq:patch_rate_mvt} and \shortref{eq:patch_rate_mvt_2nd} hold for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ and \longref{eq:patch_rate_mvt_0} holds for all $j \in \set{A}(\v{\tau}^*)$. If \longrefs{eq:patch_rate_mvt_0} and \longref{eq:patch_rate_mvt_2nd} both hold with strict inequality then $\v{\tau}^*$ is a local maximum of $J$. \subsection{Discussion} %\citet[p.~25]{SK86} Together, \longrefs{eq:patch_rate_mvt}, \shortref{eq:patch_rate_mvt_0}, and \shortref{eq:patch_rate_mvt_2nd} indicate that for tasks of diminishing returns, if the initial rate of return from the individual task is higher than the mean rate of return from the environment as a whole then the agent should process the task until the instantaneous rate of return processing the task drops to the mean rate of return from the environment. This is known as the marginal-value theorem \citep{Cha76}. However, in order to use this marginal-value theorem interpretation, gain functions that do not meet all of these conditions have to be excluded. For example, \citet{SK86} choose the restrictions that for all $i \in \{1,\dots,n\}$: % \begin{itemize} \item $c_i = 0$ since $g_i(\tau_i)$ is defined as the average net gain. \item $g_i(0) = 0$ \item $g_i'(0) > 0$ \item There exists some $\widetilde{\tau}_i \in \R_{\geq 0}$ such that for all $\tau_i \geq \widetilde{\tau}_i$, $g_i''(\tau_i) < 0$. \end{itemize} % If $c^s = 0$ then these conditions meet all of the requirements of the marginal-value theorem. However, our method for solving the processing-length problem does not require any of these assumptions. Thus, even if solutions do not have an intuitive marginal-value interpretation, they still exist. As we discuss in \longref{sec:combined_type_length_rate}, the restriction that $g_i(0)=0$ for all $i \in \{1,\dots,n\}$ actually eliminates the combined type and length choice problem entirely. For this case, any type $i \in \{1,\dots,n\}$ with an optimal processing-length choice solution $\tau_i^* = 0$ can safely be excluded from the optimal task-type choice set and all other types can be included. This is intuitive.
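Before turning to the combined problem, note that \longrefs{eq:patch_rate_mvt} and \shortref{eq:patch_rate_mvt_0} suggest a simple numerical procedure for the processing-length choice problem: guess the environment-wide rate $R$, choose each interior $\tau_j$ so that $g_j'(\tau_j) - c_j = R$ (setting $\tau_j = 0$ when $g_j'(0) - c_j \leq R$), and then recompute $R = J(\v{\tau})$ until the two agree. The Python sketch below is our own illustration with made-up saturating gain functions; convergence of this fixed-point iteration is not guaranteed in general, although it is typical for gains of diminishing returns.
%
\begin{verbatim}
# Sketch: marginal-value fixed point for rate-maximizing processing
# lengths (hypothetical parameters and gain functions).
import math

# Gains g_i(tau) = a_i*(1 - exp(-b_i*tau)), so
# g_i'(tau) = a_i*b_i*exp(-b_i*tau) (diminishing returns).
a, b, c = [5.0, 3.0], [1.0, 0.5], [0.1, 0.2]
lam, c_s = [0.4, 0.6], 0.05

def rate(tau):
    # Long-term rate J(tau) for the gain functions above.
    num = sum(l * (ai * (1 - math.exp(-bi * t)) - ci * t)
              for l, ai, bi, ci, t in zip(lam, a, b, c, tau)) - c_s
    den = sum(l * t for l, t in zip(lam, tau)) + 1
    return num / den

def tau_for_rate(R):
    # Interior condition g_j'(tau_j) - c_j = R; boundary case tau_j = 0
    # when even the initial marginal rate g_j'(0) - c_j is at most R.
    out = []
    for ai, bi, ci in zip(a, b, c):
        target = ci + R        # positive for these parameters when R >= 0
        if ai * bi <= target:
            out.append(0.0)
        else:
            out.append(-math.log(target / (ai * bi)) / bi)
    return out

R = 0.0
for _ in range(100):           # fixed-point iteration on R = J(tau)
    R = rate(tau_for_rate(R))
print(R, tau_for_rate(R))
\end{verbatim}
%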
\section{Combined Type and Length Choice} \label{sec:combined_type_length_rate} \todo{Like I mentioned above, the stuff below needs to be changed. I realized this while completing it, which is why it is incomplete.} For the combined task-type and processing-length choice case, the long-term rate is a function $J: [0,1]^n \times \R_{\geq 0}^n \mapsto \R$, defined as % \begin{equation*} J(\v{p},\v{\tau}) \triangleq \frac{ \sum\limits_{i=1}^n \lambda_i p_i \left( g_i(\tau_i) - c_i \tau_i \right) - c^s } { \sum\limits_{i=1}^n \lambda_i p_i \tau_i + 1 } \end{equation*} % Using an argument similar to the one in \longref{sec:type_rate}, it can be shown that the zero-one rule still holds; thus, for any optimal point $(\v{p}^*,\v{\tau}^*) \in [0,1]^n \times \R_{\geq 0}^n$, there exists an optimal point $(\v{\hat{p}},\v{\tau}^*) \in \{0,1\}^n \times \R_{\geq 0}^n$ with $\hat{p}_i \in \{0,1\}$ for all $i \in \{1,\dots,n\}$. For all $i \in \{1,\dots,n\}$, assume that $g_i(0)=0$. Take $(\v{p}^*,\v{\tau}^*)$ to be a local maximum of $J(\v{p},\v{\tau})$. Take $(\v{\hat{p}},\v{\hat{\tau}})$ so that for all $i \in \{1,\dots,n\}$, % \begin{equation*} (\hat{p}_i,\hat{\tau}_i) = \begin{cases} (0,0) &\text{if } p^*_i = 0 \text{ or } \tau^*_i = 0\\ (p^*_i,\tau^*_i) &\text{otherwise} \end{cases} \end{equation*} % Certainly the point $(\v{\hat{p}},\v{\hat{\tau}})$ is not a \emph{local maximum} of $J$ in general; however, due to our assumption about the initial value of the gain functions, for all $i \in \{1,\dots,n\}$ with either $p^*_i = 0$ or $\tau^*_i = 0$ or both, $p^*_i \tau^*_i = p^*_i ( g_i( \tau^*_i ) - c_i \tau^*_i ) = 0$, and thus $J(\v{p}^*,\v{\tau}^*)=J(\v{\hat{p}},\v{\hat{\tau}})$. More formally, again assume that for all $i \in \{1,\dots,n\}$, $g_i(0)=0$, and take vectors $\v{\tau} \in \R_{\geq 0}^n$ and $\v{p} \in \{0,1\}^n$. We claim that there exist vectors $\v{\bar{\tau}} \in \R_{\geq 0}^n$ and $\v{\bar{p}} \in \{0,1\}^n$ such that for all $i \in \{1,\dots,n\}$, $\bar{\tau}_i = 0 \iff \bar{p}_i = 0$ and $J(\v{p},\v{\tau}) = J(\v{\bar{p}},\v{\bar{\tau}})$. To generate these vectors, for each $i \in \{1,\dots,n\}$, assign % \begin{equation*} (\bar{p}_i,\bar{\tau}_i) = \begin{cases} (p_i,\tau_i) &\text{if } p_i = 1 \text{ and } \tau_i > 0\\ (0,0) &\text{otherwise} \end{cases} \end{equation*} % Note that for any $i \in \{1,\dots,n\}$, if $\tau_i = 0$ then $g_i(\tau_i) = 0$; thus, if $\tau_i = 0$ or $p_i = 0$ or both then $p_i \tau_i = p_i g_i(\tau_i) = 0$ and type $i$ makes no contribution to the payoff function evaluated at $(\v{p},\v{\tau})$. Also, by the definition of $\v{\bar{p}}$ and $\v{\bar{\tau}}$, for all $i \in \{1,\dots,n\}$, $\bar{p}_i = 0 \iff \bar{\tau}_i = 0 \iff (p_i = 0 \text{ or } \tau_i = 0 )$. Thus, $J(\v{p},\v{\tau}) = J(\v{\bar{p}},\v{\bar{\tau}})$, and the claim is proved. Now assume that $(\v{p},\v{\tau})=(\v{p}^*,\v{\tau}^*)$ locally maximizes $J(\v{p},\v{\tau})$. By the claim, there must exist some point $(\v{\bar{p}},\v{\bar{\tau}})$ with $J(\v{\bar{p}},\v{\bar{\tau}}) = J(\v{p}^*,\v{\tau}^*)$ such that for all $i \in \{1,\dots,n\}$, $\bar{p}_i = 0 \iff \bar{\tau}_i = 0$; that is, a type is included exactly when a positive processing length is spent on it. %\citet[ch.~2]{SK86} %\citet[section~2.4]{SK86} This assumption is made along with many others by \citet{SK86} and is used to generate a combined task-type and processing-length choice algorithm. However, if only this restriction on the initial value of the gain function is assumed, an algorithm much simpler than the one by \citet{SK86} can be used.
It can easily be shown (\eg, by contradiction) that a locally optimal solution for the combined task-type and processing-length choice problem is found with the following steps. % \begin{itemize} % \item First, solve the related processing-length problem that results from setting $p_i = 1$ for all $i \in \{1,\dots,n\}$; call the resulting optimal solution $\v{\tau}^*$. It can be shown that there exists a task-type choice preference vector $\v{p}^*$ such that $\v{p}=\v{p}^*$ and $\v{\tau}=\v{\tau}^*$ will be an optimal solution for the combined problem. % \item Next, solve the related task-type choice problem assuming that for all $i \in \{1,\dots,n\}$, the average processing length $\tau_i = \tau_i^*$ and the average gain $g_i = g_i(\tau_i^*)$ where the gain function $g_i(\tau_i)$ is taken from the combined problem. Call the resulting optimal solution $\v{p}^*$. An optimal solution for the combined problem uses $\v{p}=\v{p}^*$ and $\v{\tau}=\v{\tau}^*$. % \end{itemize} % Thus, set task-type choice decisions with $\v{p}=\v{p}^*$ and processing-length choice decisions with $\v{\tau}=\v{\tau}^*$. The result will be a local maximum of the rate. \subsection{Discussion} %\citet[section~2.2]{SK86} %\citet[ch.~2]{SK86} This intuitive approach follows from the ability of the generalized task-type choice algorithm in \longref{sec:generalized_algorithm} to handle parameter vectors of the form of \longref{eq:prof_vector} with $x^i_2 = 0$. Usually a type $i \in \{1,\dots,n\}$ associated with such a parameter vector will \emph{not} have a finite profitability (\ie, $P(i) = \infty$ or $P(i) = -\infty$) and so the algorithm presented by \citet{SK86} for solving the task-type choice problem is unable to handle this case. The algorithm in \longref{sec:generalized_algorithm} can be considered the natural extension of the \citeauthor{SK86} algorithm to profitabilities of all forms. It is interesting to note that \citeauthor{SK86} assume that for all $i \in \{1,\dots,n\}$, the gain function satisfies $g_i(0)=0$. Under this restriction, our combined task-type and processing-length choice solution states that if a local optimum behavior is desired, a type $i \in \{1,\dots,n\}$ should be chosen for processing \emph{if and only if} the related processing-length choice solution results in $\tau_i^* > 0$. That is, for the case where all gain functions are initially zero, the combined problem is identical to the processing-length choice problem. Since \citet{SK86} always assume this property of the gain functions, there is no need for any discussion of the combined problem beyond the existing discussion of the processing-length choice problem. \section{Generalized Extension of OFT Results} We have optimized the expression of $\E(G_1)/\E(T_1)$, which is equivalent to the expression of the long-term rate of gain usually optimized in classical \ac{OFT}. Thus, our solutions are consistent with the classical \ac{OFT} solutions. However, due to the algorithms we have developed, we were able to make our parameters much more general. This allows for the analysis and design of rate-maximizing behaviors in much more general contexts. \section{Maximizing Point Gain Discounted by Weighted Time} The rate-maximizing approach may not be appropriate for all cases. As we discussed, there are other methods of simultaneously applying upward pressure on point gain and downward pressure on time.
One such method is to apply \longref{eq:compound_optimization_objectives} with $\E(G)$ and $-\E(T)$ as in % \begin{equation*} \E(G) - w_1 \E(T) = N^p \left( \E(G_1) - w_1 \E(T_1) \right) \label{eq:payoff_discounted_gain} \end{equation*} % where $w_1 \in \R_{\geq 0}$ is in points per second. Note that maximizing this function is identical to maximizing $\E(G_1) - w_1 \E(T_1)$ for an arbitrary cycle $k \in \{1,\dots,N^p\}$, and so there are similarities between this payoff function and \longref{eq:HouMc99_approach}, the function used in the \citet{HouMc99} approach to rate maximization. However, we allow $w_1$ to be chosen in order to adjust the relative importance of time. What is interesting about this function is that while it provides the same qualitative pressures on gain and time as the rate-maximizing payoff function, its optimization leads to very different results. \section{Task-Type Choice} Using the same analysis procedure as in the task-type choice case for the rate-maximizing approach, it is easy to show that the zero-one rule holds here as well. Thus, we define the payoff function as $J: \{\set{S}: \set{S} \subseteq \{1,\dots,n\}\} \mapsto \R$ with % \begin{equation} J(\set{Q}) \triangleq N^p \frac{ \sum\limits_{i \in \set{Q}} \lambda_i \left( g_i - c_i \tau_i - w_1 \tau_i \right) - c^s - w_1} { \sum\limits_{i \in \set{Q}} \lambda_i } \label{eq:task_type_choice_gain_cost} \end{equation} % Similar to before, for each task type $i \in \{1,\dots,n\}$, define the vector in \longref{eq:prof_vector} as % \begin{equation*} \v{x}^i = [ N^p ( g_i - c_i \tau_i - w_1 \tau_i ), 1]^\T \end{equation*} % Also let constants $\gamma = -N^p \left( c^s + w_1 \right)$ and $\varphi = 0$; because $x^i_2 = 1 > 0$ for every type, $\set{Z} = \emptyset$ and this choice of $\varphi$ is admissible. Hence, \longref{eq:task_type_choice_gain_cost} can be written in the general payoff function form of \longref{eq:general_type_choice_cost_function}. So, the profitability % \begin{equation*} P(i) = N^p ( g_i - c_i \tau_i - w_1 \tau_i ) \end{equation*} % which is equal to the expected time-discounted gain for processing $N^p$ tasks when the wait time between sequential encounters of tasks of type $i$ is zero. In this special case, the normalized profitability $P(i)/N^p$ is equal to the mean time-discounted gain for processing a \emph{single task} of type $i$. Order the task-type indices such that \longref{eq:prof_ordering} holds. Just as before, the generalized task-type choice algorithm can be used to find a set of task types for inclusion that will globally maximize the payoff function. Note that the positive constant factor $N^p$ in $P(i)$ for all $i \in \{1,\dots,n\}$ does not affect the ordering, and so ordering by $P(i)/N^p$ is equivalent. This means that tasks can be ordered by the mean time-discounted gain for processing a \emph{single task}, just as the ordering for the rate-maximization case orders by the long-term rate of gain for processing a \emph{single task}. Also notice that the profitability ordering for this mean time-discounted gain case could lead to the same ordering as would be seen in the optimization algorithm for the rate-maximizer. In fact, often the two orderings will only differ slightly. Because of this, if evidence for using a rate-maximizing approach to model nature comes entirely from observations of task-type choice ordering then that evidence does not exclude the use of this payoff function as well.
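To illustrate this last point, the short sketch below (our own illustration with hypothetical parameter values) ranks three task types both by the single-task rate $(g_i - c_i\tau_i)/\tau_i$ and by the single-task time-discounted gain $g_i - c_i\tau_i - w_1\tau_i$; for these values the two orderings coincide, so ordering data alone could not distinguish the two objectives.
%
\begin{verbatim}
# Sketch: rate profitability versus time-discounted profitability
# (hypothetical task parameters; w1 is the weight on time).
g   = [6.0, 3.0, 2.0]    # mean gains g_i
c   = [0.1, 0.1, 0.1]    # processing cost rates c_i
tau = [1.8, 1.0, 1.5]    # mean processing lengths tau_i
w1  = 0.5

rate_rank = sorted(range(3), reverse=True,
                   key=lambda i: (g[i] - c[i] * tau[i]) / tau[i])
disc_rank = sorted(range(3), reverse=True,
                   key=lambda i: g[i] - c[i] * tau[i] - w1 * tau[i])
print(rate_rank, disc_rank)   # [0, 1, 2] [0, 1, 2]: identical orderings
\end{verbatim}
%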
\section{Processing-Length Choice} For the processing-length choice case, the payoff function is a function $J: \R_{\geq 0}^n \mapsto \R$, defined as % \begin{equation*} J(\v{\tau}) \triangleq N^p \frac{ \sum\limits_{i=1}^n \lambda_i \left( g_i(\tau_i) - c_i \tau_i - w_1 \tau_i \right) - c^s - w_1} { \sum\limits_{i=1}^n \lambda_i } \end{equation*} % As in the rate-maximizing approach, we will derive necessary and sufficiency conditions for local maxima of $J$. That is, we will solve the constrained optimization problem in \longref{eq:problem_processing_length}. Again, we assume that for all $i \in \{1,\dots,n\}$, the gain function $g_i(\tau_i)$ is continuously differentiable. When necessary, we will also assume that for all $i \in \{1,\dots,n\}$ the gain function $g_i(\tau_i)$ is twice continuously differentiable. These restrictions only need to hold within some open neighborhood containing a local maximum of $J$. Define the Lagrangian $L: \R_{\geq 0}^n \times \R^n \mapsto \R$ by \longref{eq:lagrangian_processing_length}. Also define $\set{A}(\v{\tau})$, the set of active inequality constraints, as in \longref{eq:active_constraints_processing_length}. As discussed in the rate-maximizing approach, all points $\v{\tau} \in \R_{\geq 0}^n$ are said to be regular. \subsection{First-Order Necessary Conditions} Let $\v{\tau}^* \in \R^n$ be a local maximum of $J$. By the above reasoning, $\v{\tau}^*$ is regular. Then there must exist a unique Lagrange multiplier vector $\v{\mu}^* \in \R^n$ such that \longref{eq:lagrangian_first_order} holds. Let $\v{\mu}^*$ be such a multiplier vector. For all $j \in \{1,\dots,n\}$, \longref{eq:lagrangian_first_order_j} holds, which means that the multiplier $\mu^*_j$ is given by % \begin{align*} \mu^*_j &= \left. -\frac{ \partial J }{ \partial \tau_j } \right|_{\v{\tau}=\v{\tau}^*}\\ &= -\frac{N^p}{ \sum\limits_{i=1}^n \lambda_i } \lambda_j \left( g'_j(\tau^*_j) - c_j - w_1 \right) \end{align*} % For $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, $\tau^*_j > 0$ and the multiplier $\mu^*_j = 0$, hence it must be that % \begin{equation} {g_j}'( \tau^*_j ) - c_j = w_1 \label{eq:patch_discount_mvt} \end{equation} % However, for $j \in \set{A}(\v{\tau}^*)$, $\tau^*_j = 0$ and the multiplier $\mu^*_j \geq 0$, hence, it must be that % \begin{equation} {g_j}'( 0 ) - c_j \leq w_1 \label{eq:patch_discount_mvt_0} \end{equation} \subsection{Second-Order Necessary Conditions} Furthermore, assume that for all $i \in \{1,\dots,n\}$, $g_i(\tau_i)$ is twice continuously differentiable at $\tau_i = \tau^*_i$. The same discussion of feasible variations from the rate-maximizing approach applies here, so for all $\v{x} \in \R_{\geq 0}^n$, \longref{eq:lagrangian_feasible_hessian} holds. However, all of the off-diagonal terms are zero. That is, for all $j,\ell \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ with $j \neq \ell$, $\partial^2 L(\v{\tau},\v{\mu})/{\partial \tau_j \partial \tau_\ell}|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = 0$. Therefore, \longref{eq:lagrangian_feasible_hessian} is equivalent to \longref{eq:lagrangian_perfect_square_hessian}. Thus, for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, % \begin{align*} \left.
\frac{\partial^2 L}{{\partial \tau_j}^2} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = -\frac{N^p}{ \sum\limits_{i=1}^n \lambda_i } \lambda_j {g_j}''(\tau^*_j) \geq 0 \end{align*} % In other words, for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, % \begin{equation} {g_j}''(\tau_j^*) \leq 0 \label{eq:patch_discount_mvt_2nd} \end{equation} \subsection{Second-Order Sufficiency Conditions} Assume that for all $i \in \{1,\dots,n\}$, the gain function $g_i(\tau_i)$ is twice continuously differentiable, and let $\v{\tau}^* \in \R_{\geq 0}^n$ be a vector such that \longrefs{eq:patch_discount_mvt} and \shortref{eq:patch_discount_mvt_2nd} hold for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ and \longref{eq:patch_discount_mvt_0} holds for all $j \in \set{A}(\v{\tau}^*)$. If \longrefs{eq:patch_discount_mvt_0} and \longref{eq:patch_discount_mvt_2nd} both hold with strict inequality then $\v{\tau}^*$ is a local maximum of $J$. \subsection{Marginal-Value Theorem} It is interesting to note that in this case there is no coupling between the different processing lengths. In the rate-maximizing approach, the problem was a system of coupled equations. However, in this case there are $n$ independent equations that are not affected by the task distributions. Nevertheless, there is still a marginal-value notion here. Together, \longrefs{eq:patch_discount_mvt}, \shortref{eq:patch_discount_mvt_0}, and \shortref{eq:patch_discount_mvt_2nd} indicate that for tasks of diminishing returns, if the initial rate of discounted point gain from the individual task is higher than the overall value of time to the agent then the agent should process the task until the instantaneous rate of discounted point gain processing the task drops to the value of time. If time is very important to the agent, it will only continue processing a task for a long time if the task has a steep gain function. If time is not very important to the agent, it will continue processing every task for a long time. However, the time it spends on tasks of one type will have no impact on the time it spends on tasks of a different type. \section{The Efficiency Approach} Behavioral ecologists generally dislike considering efficiency approaches because efficiency puts no pressure on time, and thus an agent with an efficiency-maximizing behavior may be at a significant survival risk due to the long periods of time required to receive significant point gains. However, in engineering applications where time may not be as important to an agent, efficiency could be a good metric to use. Additionally, because costs in this model are proportional to the time spent during certain activities, any pressure on cost may have an implicit pressure on time as well. We maximize the negative \acro{CBR}{cost-to-benefit ratio} in \longref{eq:payoff_efficiency}. We assume that $c_i \geq 0$ for all $i \in \{1,\dots,n\}$ and $c^s \geq 0$. To avoid dividing by zero in the task-type choice problem, we assume that for all $i \in \{1,\dots,n\}$, $g_i > 0$. Similarly, for the task-processing length choice problem, for all $i \in \{1,\dots,n\}$, we will define the average gain function as $g_i: \R_{\geq 0} \mapsto \R_{>0}$. \section{Task-Type Choice} Again, using the same analysis as in the rate-maximizing approach, it is easy to show that the zero-one rule holds here as well.
Thus, we define the payoff function $J: \{\set{S}: \set{S} \subseteq \{1,\dots,n\}\} \mapsto \R$ with % \begin{equation} J(\set{Q}) \triangleq -\frac{ \sum\limits_{i \in \set{Q}} \lambda_i c_i \tau_i + c^s} { \sum\limits_{i \in \set{Q}} \lambda_i g_i } \label{eq:task_type_choice_efficiency_cost} \end{equation} % Notice that this ratio is \emph{not} the same ratio as in the rate-maximizing payoff function. Similar to before, for each task type $i \in \{1,\dots,n\}$, define the vector in \longref{eq:prof_vector} as % \begin{equation*} \v{x}^i = [ -c_i \tau_i, g_i ]^\T \end{equation*} % Also let constants $\gamma=-c^s$ and $\varphi=0$; because we have assumed $g_i > 0$ for all $i \in \{1,\dots,n\}$, $\set{Z} = \emptyset$ and this choice of $\varphi$ is admissible. Hence, \longref{eq:task_type_choice_efficiency_cost} can be written in the general payoff function form of \longref{eq:general_type_choice_cost_function}. So, the profitability % \begin{equation*} P( i ) = \frac{-c_i \tau_i}{g_i} \end{equation*} % which is the efficiency for processing $N^p$ tasks if there is zero wait time between sequential task-type $i$ encounters. In this special case, the profitability $P(i)$ can also be interpreted as the efficiency of processing a \emph{single task} of type $i$. Order the task-type indices such that \longref{eq:prof_ordering} holds. Now, the generalized task-type choice algorithm can be used to find a set of task types for inclusion that will globally maximize the payoff function. \section{Processing-Length Choice} For the processing-length choice case, the payoff function is a function $J: \R_{\geq 0}^n \mapsto \R$, defined as % \begin{equation*} J(\v{\tau}) \triangleq -\frac{ \sum\limits_{i=1}^n \lambda_i c_i \tau_i + c^s} { \sum\limits_{i=1}^n \lambda_i g_i(\tau_i) } \end{equation*} % Notice that this ratio is \emph{not} the same ratio as in the rate-maximizing payoff function. As in the rate-maximizing approach, we will derive necessary and sufficiency conditions for local maxima of $J$. That is, we will solve the constrained optimization problem in \longref{eq:problem_processing_length}. Again, we assume that for all $i \in \{1,\dots,n\}$, the gain function $g_i(\tau_i)$ is continuously differentiable. When necessary, we will also assume that for all $i \in \{1,\dots,n\}$ the gain function $g_i(\tau_i)$ is twice continuously differentiable. These restrictions only need to hold within some open neighborhood containing a local maximum of $J$. Define the Lagrangian $L: \R_{\geq 0}^n \times \R^n \mapsto \R$ by \longref{eq:lagrangian_processing_length}. Also define $\set{A}(\v{\tau})$, the set of active inequality constraints, as in \longref{eq:active_constraints_processing_length}. As discussed in the rate-maximizing approach, all points $\v{\tau} \in \R_{\geq 0}^n$ are said to be regular. \subsection{First-Order Necessary Conditions} Let $\v{\tau}^* \in \R^n$ be a local maximum of $J$. By the above reasoning, $\v{\tau}^*$ is regular. Then there must exist a unique Lagrange multiplier vector $\v{\mu}^* \in \R^n$ such that \longref{eq:lagrangian_first_order} holds. Let $\v{\mu}^*$ be such a multiplier vector. For all $j \in \{1,\dots,n\}$, \longref{eq:lagrangian_first_order_j} holds, which means that the multiplier $\mu^*_j$ is given by % \begin{align*} \mu^*_j &= \left.
-\frac{ \partial J }{ \partial \tau_j } \right|_{\v{\tau}=\v{\tau}^*}\\ &= \frac { \lambda_j c_j \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) - \lambda_j g'_j(\tau^*_j) \left( \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s \right) } { \left( \sum\limits_{i=1}^n \lambda_i g_i(\tau^*_i) \right)^2 } \end{align*} % For $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, $\tau^*_j > 0$ and the multiplier $\mu^*_j = 0$, hence it must be that % \begin{equation} g'_j(\tau^*_j) = 0 \implies c_j = 0 \label{eq:patch_efficiency_mvt_special} \end{equation} % and % \begin{equation} g'_j(\tau^*_j) \neq 0 \implies {-\frac{c_j}{ {g_j}'( \tau^*_j ) }} = -\frac{ \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s} { \sum\limits_{i=1}^n \lambda_i g_i(\tau^*_i) } = J(\v{\tau}^*) \label{eq:patch_efficiency_mvt} \end{equation} % However, for $j \in \set{A}(\v{\tau}^*)$, $\tau^*_j = 0$ and the multiplier $\mu^*_j \geq 0$, hence, it must be that % \begin{equation} g'_j(0) = 0 \implies c_j \geq 0 \label{eq:patch_efficiency_mvt_0_special} \end{equation} % and % \begin{equation} g'_j(0) \neq 0 \implies {-\frac{c_j}{ {g_j}'( 0 ) }} \leq -\frac{ \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s} { \sum\limits_{i=1}^n \lambda_i g_i(\tau^*_i) } = J(\v{\tau}^*) \label{eq:patch_efficiency_mvt_0} \end{equation} % Note that since we have assumed $c_i \geq 0$ for all $i \in \{1,\dots,n\}$, \longref{eq:patch_efficiency_mvt_0_special} will always be true for all $j \in \set{A}(\v{\tau}^*)$ such that $g'_j(0) = 0$. \subsection{Second-Order Necessary Conditions} Furthermore, assume that for all $i \in \{1,\dots,n\}$, $g_i(\tau_i)$ is twice continuously differentiable at $\tau_i = \tau^*_i$. The same discussion of feasible variations from the rate-maximizing approach applies here, so for all $\v{x} \in \R_{\geq 0}^n$, \longref{eq:lagrangian_feasible_hessian} holds. So, take $a,b \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ such that $a \neq b$. % \begin{align*} \left. \frac{\partial^2 L}{\partial \tau_a \partial \tau_b} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} &= \frac{ \lambda_b \lambda_a \left( c_a g'_b( \tau^*_b ) - c_b g'_a( \tau^*_a ) \right) \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) } { \left( \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) \right)^3 }\\ &\mathrel{\phantom{=}} {}- 2 \lambda_b \lambda_a g'_b(\tau^*_b) \frac{ c_a \sum\limits_{i=1}^n \lambda_i g_i(\tau^*_i) - g_a'(\tau^*_a) \left( \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s \right) } { \left( \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) \right)^3 } \end{align*} % By \longref{eq:patch_efficiency_mvt_special}, if $g'_a( \tau^*_a ) = 0$ then $c_a = 0$, and thus $\partial^2 L(\v{\tau},\v{\mu})/{\partial \tau_a \partial \tau_b}|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = 0$. Additionally, by the same reasoning, if $g'_b( \tau^*_b ) = 0$ then $c_b = 0$, and so $\partial^2 L(\v{\tau},\v{\mu})/{\partial \tau_a \partial \tau_b}|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = 0$. In the case where both $g'_a(\tau^*_a) \neq 0$ and $g'_b(\tau^*_b) \neq 0$ then % \begin{equation*} \frac{c_a}{ {g_a}'( \tau^*_a ) } = \frac{c_b}{ {g_b}'( \tau^*_b ) } = \frac{ \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s} { \sum\limits_{i=1}^n \lambda_i g_i(\tau^*_i) } \end{equation*} % so, % \begin{align*} \left.
\frac{\partial^2 L}{\partial \tau_a \partial \tau_b} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} &= \frac{ \lambda_b \lambda_a ( 0 ) \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) } { \left( \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) \right)^3 } %\\ %&\mathrel{\phantom{=}} %{}- - 2 \lambda_b \lambda_a g'_b(\tau^*_b) \frac{ 0 } { \left( \sum\limits_{i=1}^n \lambda_i g_i( \tau^*_i ) \right)^3 }\\ &= 0 \end{align*} % Hence, all of the off-diagonal terms are zero; that is, for all $j,\ell \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ with $j \neq \ell$, $\partial^2 L(\v{\tau},\v{\mu})/{\partial \tau_j \partial \tau_\ell}|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = 0$. Therefore, \longref{eq:lagrangian_feasible_hessian} is equivalent to \longref{eq:lagrangian_perfect_square_hessian}. Thus, for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, % \begin{align*} \left. \frac{\partial^2 L}{{\partial \tau_j}^2} \right|_{(\v{\tau},\v{\mu})=(\v{\tau}^*,\v{\mu}^*)} = \frac { -\lambda_j {g_j}''(\tau^*_j) \left( \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s \right) } { \left( \sum\limits_{i=1}^n \lambda_i g_i(\tau^*_i) \right)^2 } \geq 0 \end{align*} % In other words, for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$, % \begin{equation*} {g_j}''(\tau^*_j) \left( \sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s \right) \leq 0 \end{equation*} % However, if there exists a $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ then there exists a $j \in \{1,\dots,n\}$ such that $\tau^*_j > 0$. Additionally, we assume that $c^s$ and $\{c_1,\dots,c_n\}$ were not chosen so that $J \equiv 0$ (\ie, so that $\sum\limits_{i=1}^n \lambda_i c_i \tau^*_i + c^s > 0$). So, % \begin{equation} {g_j}''(\tau^*_j) \leq 0 \label{eq:patch_efficiency_mvt_2nd} \end{equation} \subsection{Second-Order Sufficiency Conditions} Assume that for all $i \in \{1,\dots,n\}$, the gain function $g_i(\tau_i)$ is twice continuously differentiable, and let $\v{\tau}^* \in \R_{\geq 0}^n$ be a vector such that \longrefs{eq:patch_efficiency_mvt_special}, \shortref{eq:patch_efficiency_mvt}, and \shortref{eq:patch_efficiency_mvt_2nd} hold for all $j \in \{1,\dots,n\} \setdiff \set{A}(\v{\tau}^*)$ and \longrefs{eq:patch_efficiency_mvt_0_special} and \shortref{eq:patch_efficiency_mvt_0} hold for all $j \in \set{A}(\v{\tau}^*)$. If \longrefs{eq:patch_efficiency_mvt_0_special}, \shortref{eq:patch_efficiency_mvt_0}, and \longref{eq:patch_efficiency_mvt_2nd} all hold with strict inequality then $\v{\tau}^*$ is a local maximum of $J$. \subsection{Marginal-Value Theorem} This processing-length result also has a marginal-value notion. Together, \longrefs{eq:patch_efficiency_mvt_special}, \shortref{eq:patch_efficiency_mvt}, \shortref{eq:patch_efficiency_mvt_0_special}, \shortref{eq:patch_efficiency_mvt_0}, and \shortref{eq:patch_efficiency_mvt_2nd} indicate that for tasks of diminishing returns, if the initial \acro{CRBRR}{cost-rate-to-benefit-rate ratio} from the individual task is lower than the overall \ac{CBR} then the agent should process the task until the instantaneous \ac{CRBRR} processing the task rises to the overall \ac{CBR}. \todo{Of course, additional constraint and variance cases should be added here.} % \section{Threshold Approaches and Explicit Cycle Count} % % Another advantage to using an explicit cycle count (\ie, $N^p$) rather % than using a limiting case is that methods that seek to achieve some % threshold (\eg, some minimum number of points or some maximum amount % of time) will have different behaviors based on the size of $N^p$.
% % For example, if the goal is to solve a task-type choice problem for % the case where minimum $\E(T)$ is desired as long as $\E(G) \geq G^c$ % is met, where $G^c \in \R$ then a \ac{KKT} analysis shows that when % the solution sits on the boundary where $\E(G) = G^c$, the task-type % inclusion criteria is dependent upon $N^p$. That is, if a particular % task type $i \in \{1,\dots,n\}$ is such that % % % \begin{equation*} % \tau_i \leq \frac{1}{N^p} \E(T) \quad \land \quad % g_i - c_i \tau_i \geq \frac{G^c}{N^p} % \end{equation*} % % % then this task type should never be excluded completely. Both of these % conditions depend upon $N^p$. As $N^p$ increases, more types will meet % the gain condition, but fewer types will meet the time condition. % Without making the cycle count $N^p$ explicit then this variation in % optimal behavior would not have been evident.