Let $\{X_t\}_{t \ge 1}$ be a Markov chain. At each time $t$, a decision maker observes the state $X_t$ of the Markov chain and decides whether to continue or stop the process. If the decision maker decides to continue, he incurs a continuation cost $c_t(X_t)$ and the state evolves. If the DM decides to stop, he incurs a stopping cost of $s_t(X_t)$ and the problem is terminated. The objective is to determine an optimal stopping time $\tau$ to minimize
$$ \mathbb{E}\bigg[ \sum_{t=1}^{\tau-1} c_t(X_t) + s_\tau(X_\tau) \bigg]. $$
Such problems are called optimal stopping problems. They can be solved using dynamic programming as follows.
Define the cost-to-go function of any stopping rule $\tau$ as
$$ J_t(x; \tau) = \mathbb{E}\bigg[ \sum_{u=t}^{\tau-1} c_u(X_u) + s_\tau(X_\tau) \,\bigg|\, X_t = x \bigg] $$
and the value function as
$$ V_t(x) = \min_{\tau} J_t(x; \tau). $$
Then, it can be shown that the value functions satisfy the following recursion:
$$ V_T(x) = s_T(x) \quad\text{and}\quad V_t(x) = \min\big\{ s_t(x),\ c_t(x) + \mathbb{E}[ V_{t+1}(X_{t+1}) \mid X_t = x ] \big\}, \quad t < T. $$
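To make the recursion concrete, here is a minimal computational sketch in Python. It assumes a finite state space, a time-homogeneous transition matrix, and arrays of continuation and stopping costs; the function name `solve_stopping` and all inputs are illustrative, not part of the model above.

```python
import numpy as np

def solve_stopping(P, c, s):
    """Backward induction for a finite-horizon optimal stopping problem.

    P: (n, n) transition matrix; c, s: (T, n) continuation and stopping
    costs. Returns the value functions and the optimal stop decisions.
    """
    T, n = s.shape
    V = np.zeros((T, n))
    stop = np.zeros((T, n), dtype=bool)
    V[T - 1] = s[T - 1]              # at the horizon, stopping is forced
    stop[T - 1] = True
    for t in range(T - 2, -1, -1):
        cont = c[t] + P @ V[t + 1]   # expected cost of continuing one step
        V[t] = np.minimum(s[t], cont)
        stop[t] = s[t] <= cont       # stop whenever it is no costlier
    return V, stop
```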
Consider an optimal stopping problem and define the benefit (or advantage) function¹ as the expected benefit of delaying the stopping decision at time $t$, i.e.,
$$ B_t(x) = s_t(x) - c_t(x) - \mathbb{E}[ V_{t+1}(X_{t+1}) \mid X_t = x ]. $$
Thus, it is optimal to stop whenever $B_t(x) \le 0$.
1 The terminology comes from reward maximization problems. In cost minimization problems, this may be thought of as the disadvantage function.
Note that we can write the value function in terms of the benefit function as follows:
$$ V_t(x) = s_t(x) - [ B_t(x) ]^+, $$
where $[y]^+$ is a shorthand for $\max\{y, 0\}$.
Now, define the one-step look-ahead function as the benefit of delaying the stopping decision by one step, i.e.,
$$ M_t(x) = s_t(x) - c_t(x) - \mathbb{E}[ s_{t+1}(X_{t+1}) \mid X_t = x ]. $$
The benefit function and the one-step look-ahead function are related as follows:
$$ B_{T-1}(x) = M_{T-1}(x) $$
and, for $t < T - 1$,
$$ B_t(x) = M_t(x) + \mathbb{E}\big[ [ B_{t+1}(X_{t+1}) ]^+ \,\big|\, X_t = x \big]. $$
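This recursion gives a direct way to compute the benefit function from the one-step look-ahead function alone, without first computing the value functions. A sketch under the same illustrative setup as before (the name `benefit_from_lookahead` is ours):

```python
import numpy as np

def benefit_from_lookahead(P, M):
    """Compute B from M via B_{T-1} = M_{T-1} and
    B_t = M_t + E[ max(B_{t+1}, 0) | X_t ]."""
    T, n = M.shape
    B = np.zeros_like(M)
    B[T - 1] = M[T - 1]          # base case: the last decision epoch
    for t in range(T - 2, -1, -1):
        B[t] = M[t] + P @ np.maximum(B[t + 1], 0.0)
    return B                     # optimal to stop at (t, x) iff B[t, x] <= 0
```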
Theorem 11.1 Suppose the state space $\mathcal{X}$ is totally ordered and the following conditions hold.

- For all $t$, $M_t(x)$ is weakly increasing in $x$.
- $\{X_t\}_{t \ge 1}$ is stochastically monotone.

Then $B_t(x)$ is weakly increasing in $x$ for all $t$ and there exists a sequence of thresholds $\{\lambda_t\}_{t \ge 1}$ such that it is optimal to stop at time $t$ if and only if $X_t \le \lambda_t$.
We first prove monotonicity of $B_t(x)$. As usual, the proof is by backward induction. For $t = T - 1$, $B_{T-1}(x) = M_{T-1}(x)$, which is weakly increasing in $x$ by assumption. This forms the basis of induction. Now assume that $B_{t+1}(x)$ is increasing in $x$ and consider the problem at time $t$.
Since $B_{t+1}(x)$ is increasing, so is $[B_{t+1}(x)]^+$. Moreover, since $\{X_t\}_{t \ge 1}$ is stochastically monotone, $\mathbb{E}[ [B_{t+1}(X_{t+1})]^+ \mid X_t = x ]$ is increasing in $x$. Therefore,
$$ B_t(x) = M_t(x) + \mathbb{E}\big[ [B_{t+1}(X_{t+1})]^+ \,\big|\, X_t = x \big] $$
is increasing in $x$. Thus, by induction, $B_t(x)$ is increasing in $x$ for all $t$.
Recall that it is optimal to stop iff $B_t(x) \le 0$. Since $B_t(x)$ is increasing in $x$, the stopping set $\{ x \in \mathcal{X} : B_t(x) \le 0 \}$ is a lower set. Hence, the optimal decision rule is of a threshold type.
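As a numerical sanity check of Theorem 11.1, the sketch below builds a stochastically monotone birth-death chain and a look-ahead function that increases in the state, verifies that the resulting benefit function is increasing, and reads off the thresholds. All numbers are made up, and the code reuses `benefit_from_lookahead` from the earlier sketch.

```python
import numpy as np

n, T = 10, 6
P = np.zeros((n, n))
for x in range(n):                       # birth-death chain with upward drift
    P[x, min(x + 1, n - 1)] += 0.6       # (stochastically monotone)
    P[x, max(x - 1, 0)] += 0.4
# look-ahead values, increasing in the state x for every t
M = np.array([[0.5 * x - 2.0 + 0.1 * t for x in range(n)] for t in range(T)])

B = benefit_from_lookahead(P, M)
assert all(np.all(np.diff(B[t]) >= 0) for t in range(T))   # B increasing in x
# stop at time t iff B[t, x] <= 0, i.e., iff x is below the threshold
thresholds = [next((x for x in range(n) if B[t, x] > 0), n) for t in range(T)]
print(thresholds)
```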
Let’s contrast the result of Theorem 11.1 with the monotonicity of optimal policies in general MDPs. Here, in addition to stochastic monotonicity of the Markov chain, we only require the one-step look-ahead function to be monotone. There is no assumption on the submodularity of the cost.
Example: Time-to-Market Model
Consider a firm that decides when to introduce a new product. When the firm introduces the product earlier than the competition, it captures a larger market share. However, an early introduction results in high production costs and low profit margins due to low manufacturing yields. Hence, the firm needs to determine the optimal time to enter the market. Suppose that the total market demand is deterministic. Let $N_t$ denote the number of competitors at time $t$. It is assumed that
$$ N_{t+1} = N_t + W_t, $$
where $\{W_t\}_{t \ge 1}$ is an independent process that is also independent of $N_1$.
Let $m_n$ denote the market share of the firm when it enters the market after the $n$-th competitor. It is assumed that $m_n$ is decreasing and concave in $n$.
Let $p$ denote the sale price of the product and $c_t$ denote the production cost at time $t$. It is assumed that $c_t$ decreases with $t$.
The continuation reward is zero and the stopping reward at time $t$ is
$$ s_t(n) = m_n ( p - c_t ). $$
When should the firm enter the market?
First observe that $\{N_t\}_{t \ge 1}$ is a monotone process. Now consider the one-step look-ahead function (for reward maximization; see Exercise 11.2):
$$ M_t(n) = \mathbb{E}[ s_{t+1}(N_{t+1}) \mid N_t = n ] - s_t(n) = ( p - c_{t+1} )\, \mathbb{E}[ m_{n + W_t} - m_n ] + ( c_t - c_{t+1} )\, m_n. $$
Since $m_n$ is concave, the first term is decreasing in $n$. The second term is also decreasing in $n$ because $m_n$ is decreasing in $n$ and $c_t \ge c_{t+1}$. Therefore, $M_t(n)$ is decreasing in $n$. Hence, by the reward-maximization analogue of Theorem 11.1 (see Exercises 11.1 and 11.2), there exists a sequence of thresholds $\{\lambda_t\}_{t \ge 1}$ such that the firm should enter the market at time $t$ iff $N_t \ge \lambda_t$.
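For a concrete instance, the entry thresholds can be computed by backward induction on the reward-maximization dynamic program. In the sketch below, the market-share curve, prices, costs, and Bernoulli competitor arrivals are all made-up illustrative choices.

```python
import numpy as np

T, n_max, p = 8, 30, 10.0
c = np.array([8.0 * 0.8 ** t for t in range(T)])        # costs fall over time
m = 1.0 - (np.arange(n_max + 1) / (n_max + 1)) ** 2     # decreasing, concave
s = np.outer(p - c, m)                                  # s[t, n] = m_n (p - c_t)

V = s[T - 1].copy()                                     # enter by the horizon
nxt = np.minimum(np.arange(n_max + 1) + 1, n_max)       # n + 1, truncated
entry = []
for t in range(T - 2, -1, -1):
    cont = 0.5 * V + 0.5 * V[nxt]       # E[V_{t+1}(n + W)], W ~ Bernoulli(1/2)
    stop = s[t] >= cont
    entry.append((t, int(np.argmax(stop)) if stop.any() else None))
    V = np.maximum(s[t], cont)
print(entry)    # (t, smallest number of competitors at which entry is optimal)
```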
Example: Optimal choice of the best alternative
A decision maker (DM) wants to select the best alternative from a set of $T$ alternatives. The DM evaluates the alternatives sequentially. After evaluating alternative $t$, the DM knows whether that alternative was the best alternative so far or not. Based on this information, the DM must decide whether to choose alternative $t$ and stop the search, or to permanently reject alternative $t$ and evaluate the remaining alternatives. The DM may reject the last alternative and not make a choice at all. All alternatives are equally likely to be the best. Find the optimal choice strategy that maximizes the probability of picking the best alternative.
This optimization problem is known by different names, including the secretary problem (in which the alternatives correspond to finding the best candidate as a secretary), the marriage problem (in which the alternatives correspond to finding the best spouse), Googol (in which the alternatives correspond to finding the biggest number), the parking problem (in which the alternatives correspond to finding the nearest parking spot), and so on.
We can view this as an optimal stopping problem with binary state $X_t \in \{0, 1\}$, where $X_t = 1$ means that alternative $t$ is the best alternative so far. Thus, $\{X_t\}_{t \ge 1}$ is an independent process with $\mathbb{P}(X_t = 1) = 1/t$. The continuation reward is zero. The DM receives a stopping reward only if the current alternative is best, i.e., the current alternative is best so far ($X_t = 1$) and better than all future alternatives ($X_u = 0$ for all $u > t$). Thus, the expected stopping reward conditioned on $X_t = x$ is
$$ s_t(x) = x \prod_{u = t+1}^{T} \Big( 1 - \frac 1u \Big) = \frac{x\, t}{T}. $$
Thus, the optimal strategy is given by the solution of the following dynamic program:
$$ V_T(x) = x \quad\text{and}\quad V_t(x) = \max\Big\{ \frac{x\, t}{T},\ \mathbb{E}[ V_{t+1}(X_{t+1}) ] \Big\}, \quad t < T. $$
Lemma 11.1 Define
$$ L_t = \mathbb{E}[ V_{t+1}(X_{t+1}) ] = \frac{1}{t+1} V_{t+1}(1) + \frac{t}{t+1} V_{t+1}(0), \quad t < T, \qquad\text{and}\qquad L_T = 0. $$
Then,
$$ V_t(0) = L_t \quad\text{and}\quad V_t(1) = \max\Big\{ \frac tT,\ L_t \Big\} $$
and, therefore,
$$ L_{t-1} = \frac 1t \max\Big\{ \frac tT,\ L_t \Big\} + \frac{t-1}{t} L_t. $$
The result follows immediately from the definition of $L_t$ and the dynamic program above.
Note that it is never optimal to select an alternative if it is not the best so far (i.e., $X_t = 0$). Thus, we can completely characterize an optimal strategy by solving for $\{L_t\}_{t \ge 1}$ in a backward manner.
Theorem 11.2
1. There exists a critical time $t^*$, $1 \le t^* \le T$, such that it is optimal to reject all alternatives until time $t^* - 1$ and then select the first alternative that is superior to all previous ones, if it occurs.
2. The critical time $t^*$ is the smallest integer $t$ such that
$$ \sum_{u = t}^{T-1} \frac 1u \le 1. $$
3. The value functions are given by $V_t(1) = \max\{ t/T,\ L_t \}$ and $V_t(0) = L_t$, where
$$ L_t = \begin{cases} \dfrac{t^* - 1}{T} \displaystyle\sum_{u = t^* - 1}^{T - 1} \frac 1u, & t < t^* - 1, \\[2ex] \dfrac{t}{T} \displaystyle\sum_{u = t}^{T - 1} \frac 1u, & t \ge t^* - 1. \end{cases} $$
4. For large $T$, $t^* \approx T / e$ and the probability of selecting the best candidate is approximately $1/e \approx 0.368$.
Note that
$$ L_{t-1} = \frac 1t V_t(1) + \frac{t-1}{t} V_t(0) \ge \frac 1t V_t(0) + \frac{t-1}{t} V_t(0) = L_t, $$
where the inequality uses $V_t(1) \ge V_t(0)$. Thus, $L_t$ is decreasing with time.
By Lemma 11.1, it is optimal to select a best-so-far alternative at time $t$ iff $\frac tT \ge L_t$, where the first term is increasing with time and the second term is decreasing with time. Thus, the critical time $t^*$ is the first time when $\frac tT \ge L_t$. Since $\frac TT = 1$ and $L_T = 0$, such a $t^*$ always exists.
For any $t \ge t^*$ (i.e., $\frac tT \ge L_t$),
$$ L_{t-1} = \frac 1t \cdot \frac tT + \frac{t-1}{t} L_t = \frac 1T + \frac{t-1}{t} L_t. $$
Solving this recursion backward from $L_T = 0$ gives $L_t = \frac tT \sum_{u=t}^{T-1} \frac 1u$ for $t \ge t^* - 1$.
For any $t < t^*$ (i.e., $\frac tT < L_t$), we have $V_t(1) = L_t$. Therefore,
$$ L_{t-1} = \frac 1t L_t + \frac{t-1}{t} L_t = L_t. $$
The above can be shown to be equal to the form given in point 3 by induction.
For large $T$,
$$ \sum_{u=t}^{T-1} \frac 1u \approx \log \frac Tt. $$
Thus, $t^*$ is approximately the smallest integer $t$ such that $\log(T/t) \le 1$, i.e., $t^* \approx T/e$. Moreover, the probability of selecting the best alternative is
$$ L_0 = L_{t^* - 1} = \frac{t^* - 1}{T} \sum_{u = t^* - 1}^{T-1} \frac 1u \approx \frac 1e \log\Big( \frac{T}{T/e} \Big) = \frac 1e. $$
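The recursion for $L_t$ in the proof translates directly into code. The sketch below (the function name is ours) computes $t^*$ and the success probability for a given $T$ and compares them with the $T/e$ and $1/e$ asymptotics.

```python
import math

def secretary(T):
    """Backward recursion for L_t; returns (t*, success probability L_0)."""
    L = 0.0                             # L_T = 0
    t_star = T
    for t in range(T, 0, -1):           # compute L_{t-1} from L_t
        if t / T >= L:                  # selecting a best-so-far is optimal
            t_star = t
            L = 1 / T + (t - 1) / t * L
        # otherwise L_{t-1} = L_t, so L is unchanged
    return t_star, L

t_star, p_win = secretary(100)
print(t_star, p_win)                    # compare with T/e and 1/e:
print(100 / math.e, 1 / math.e)
```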
Example: Call options
An investor has a call option to buy one share of a stock at a fixed price $p$ dollars and has $T$ days to exercise this option. For simplicity, we assume that the investor makes a decision at the beginning of each day.
The investor may decide not to exercise the option, but if he does exercise it when the stock price is $x$, he effectively gets $x - p$.
Assume that the price of the stock varies with independent increments, i.e., the price on day $t + 1$ is
$$ X_{t+1} = X_t + W_t, $$
where $\{W_t\}_{t \ge 1}$ is an i.i.d. process with mean $\mu$.
Let $\tau$ denote the stopping time when the investor exercises his option. Then the optimization problem is to maximize
$$ \mathbb{E}\big[ (X_\tau - p)\, \mathbb{1}\{ \tau \le T \} \big]. $$
Since this is an optimal stopping problem with perfect state observation, the optimal strategy is given by the solution of the following dynamic program:
$$ V_T(x) = \max\{ x - p,\ 0 \} $$
and, for $t < T$,
$$ V_t(x) = \max\big\{ x - p,\ \mathbb{E}[ V_{t+1}(x + W) ] \big\}, $$
where $W$ has the common distribution of the increments $\{W_t\}_{t \ge 1}$.
Lemma 11.2
- For all $t$, $V_t(x)$ is increasing in $x$.
- For all $t$, $V_t(x) - x$ is decreasing in $x$.
- For all $x$, $V_t(x) \ge V_{t+1}(x)$.
The first property follows immediately from the monotonicity of the terminal reward and the monotonicity of the dynamics. From Exercise 5.1, to show the third property, we need to show that $V_{T-1}(x) \ge V_T(x)$. Observe that
$$ V_{T-1}(x) = \max\big\{ x - p,\ \mathbb{E}[ V_T(x + W) ] \big\} \ge \max\{ x - p,\ 0 \} = V_T(x), $$
where the inequality holds because $V_T(\cdot) \ge 0$.
Now we prove the second property using backward induction. At $t = T$,
$$ V_T(x) - x = \max\{ -p,\ -x \}, $$
which is decreasing in $x$. This forms the basis of induction. Now assume that $V_{t+1}(x) - x$ is decreasing in $x$. Then,
$$ V_t(x) - x = \max\big\{ -p,\ \mathbb{E}[ V_{t+1}(x + W) - (x + W) ] + \mu \big\}. $$
By the induction hypothesis, the second term is decreasing in $x$. The maximum of a constant and a decreasing function is decreasing. Thus, $V_t(x) - x$ is decreasing in $x$. This completes the induction step.
Lemma 11.3 At any time $t$, if it is optimal to exercise the option when the stock price is $x$, then it is optimal to exercise it at all $x' \ge x$.
Since it is optimal to exercise at $x$, we must have
$$ x - p \ge \mathbb{E}[ V_{t+1}(x + W) ]. $$
Since $V_{t+1}(y) - y$ is decreasing in $y$, we have that for any $x' \ge x$,
$$ \mathbb{E}[ V_{t+1}(x' + W) ] - x' \le \mathbb{E}[ V_{t+1}(x + W) ] - x \le -p, $$
where the last inequality follows from the previous display. This implies that
$$ x' - p \ge \mathbb{E}[ V_{t+1}(x' + W) ]. $$
Thus, the stopping action is also optimal at $x'$.
Theorem 11.3 The optimal strategy is of the threshold type. In particular, there exist numbers $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_T$ such that it is optimal to exercise the option at time $t$ if and only if $X_t \ge \lambda_t$.
Let $S_t = \{ x : \text{it is optimal to exercise the option at time } t \text{ in state } x \}$ denote the stopping set at time $t$. Lemma 11.3 shows that $S_t$ is of the form $[\lambda_t, \infty)$, where $\lambda_t = \min S_t$ and we set $\lambda_t = \infty$ if it is not optimal to stop in any state. This proves the threshold property.
To show that the thresholds are decreasing with time, it suffices to show that $S_t \subseteq S_{t+1}$. Suppose $x \in S_t$. Then,
$$ x - p \ge \mathbb{E}[ V_{t+1}(x + W) ] \ge \mathbb{E}[ V_{t+2}(x + W) ], $$
where the first inequality follows because $x \in S_t$ and the second inequality follows because $V_{t+1}(y) \ge V_{t+2}(y)$ (Lemma 11.2). This implies that $x - p \ge \mathbb{E}[ V_{t+2}(x + W) ]$, i.e., $x \in S_{t+1}$. Hence, $S_t \subseteq S_{t+1}$ and, therefore, the optimal thresholds are decreasing.
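The threshold structure can be observed numerically by running the dynamic program on a truncated price grid. In the sketch below, the grid, the strike price, and the increment distribution (uniform on $\{-2, 0, +1\}$, so $\mu < 0$) are all illustrative choices, not part of the model.

```python
import numpy as np

T, p = 10, 50.0
prices = np.arange(0.0, 101.0)           # truncated price grid
idx = np.arange(len(prices))
up = np.minimum(idx + 1, idx[-1])        # x + 1, saturated at the boundary
dn = np.maximum(idx - 2, 0)              # x - 2, saturated at the boundary

V = np.maximum(prices - p, 0.0)          # V_T: exercise now or let it expire
thresholds = []
for t in range(T - 1, 0, -1):
    cont = (V[dn] + V + V[up]) / 3.0     # E[V_{t+1}(x + W)], W ~ U{-2, 0, +1}
    exercise = (prices - p) >= cont
    thresholds.append(prices[np.argmax(exercise)] if exercise.any() else None)
    V = np.maximum(prices - p, cont)
print(thresholds[::-1])                  # lambda_1 >= lambda_2 >= ... (decreasing)
```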
Exercises
Exercise 11.1 Derive a version of Theorem 11.1 where $M_t(x)$ is weakly decreasing in $x$.
Exercise 11.2 Derive a version of Theorem 11.1 for an optimal stopping problem where the objective is reward maximization instead of cost minimization. In particular, assume that $r_t(x)$ denotes the continuation reward and $s_t(x)$ denotes the stopping reward at time $t$. Define the benefit function $B_t(x)$ and the one-step look-ahead function $M_t(x)$ as above.

- Write the benefit function in terms of the one-step look-ahead function.
- Derive a version similar to Theorem 11.1 assuming $M_t(x)$ is increasing in $x$.

Exercise 11.3 (Selling an asset) Consider the decision problem faced by a person selling an asset. Let $W_t$ denote the offer received by the person at time $t$. We assume that $\{W_t\}_{t \ge 1}$ is an i.i.d. process. If the person sells the asset at time $t$, then he receives a reward equal to the best offer received so far, i.e., $\max\{ W_1, \dots, W_t \}$. If he decides to continue, then he has to pay a continuation cost of $c$. Show that there exists a sequence of thresholds $\{\lambda_t\}_{t \ge 1}$ such that the optimal strategy is to sell the asset when $\max\{ W_1, \dots, W_t \} \ge \lambda_t$.
Notes
Theorem 11.1 is taken from Oh and Özer (2016).
The solution of the secretary problem presented here is due to Lindley (1961). For a history of the secretary problem and some generalizations, see Ferguson (1989) and Freeman (1983).
The above model for pricing options was introduced by Taylor (1967).
Ferguson, T.S. 1989. Who solved the secretary problem? Statistical Science, 282–289.
Freeman, P.R. 1983. The secretary problem and its extensions: A review. International Statistical Review / Revue Internationale de Statistique 51, 2, 189. DOI: 10.2307/1402748.
Lindley, D.V. 1961. Dynamic programming and decision theory. Applied Statistics 10, 1, 39. DOI: 10.2307/2985407.
Oh, S. and Özer, Ö. 2016. Characterizing the structure of optimal stopping policies. Production and Operations Management 25, 11, 1820–1838. DOI: 10.1111/poms.12579.
Taylor, H.M. 1967. Evaluating a call option and optimal timing strategy in the stock market. Management Science 14, 1, 111–120. Available at: http://www.jstor.org/stable/2628546.