References
Altman, E. 1999. Constrained
Markov decision processes. CRC Press. Available at: http://www-sop.inria.fr/members/Eitan.Altman/TEMP/h.pdf.
Arrow, K.J., Harris, T., and Marschak, J.
1952. Optimal inventory policy. Econometrica 20, 1,
250–272. DOI: 10.2307/1907830.
Bellman, R., Glicksberg, I., and Gross,
O. 1955. On the optimal inventory equation. Management
Science 2, 1, 83–104. DOI: 10.1287/mnsc.2.1.83.
Berry, R.A. 2000. Power and delay
trade-offs in fading channels. Available at: https://dspace.mit.edu/handle/1721.1/9290.
Berry, R.A. 2013. Optimal power-delay
tradeoffs in fading channels—small-delay asymptotics. IEEE
Transactions on Information Theory 59, 6, 3939–3952. DOI:
10.1109/TIT.2013.2253194.
Berry, R.A. and Gallager, R.G. 2002.
Communication over fading channels with delay constraints.
IEEE Transactions on Information Theory
48, 5, 1135–1149. DOI: 10.1109/18.995554.
Berry, R., Modiano, E., and Zafer, M.
2012. Energy-efficient scheduling under delay constraints for wireless
networks. Synthesis Lectures on Communication Networks
5, 2, 1–96. DOI: 10.2200/S00443ED1V01Y201208CNT011.
Bertsekas, D.P. 2011. Dynamic
programming and optimal control. Athena Scientific. Available at:
http://www.athenasc.com/dpbook.html.
Bitar, E., Poolla, K., Khargonekar, P.,
Rajagopal, R., Varaiya, P., and Wu, F. 2012. Selling random wind.
Proceedings of the 45th Hawaii international conference on system sciences,
IEEE, 1931–1937.
Blackwell, D. 1964. Memoryless strategies
in finite-stage dynamic programming. The Annals of Mathematical
Statistics 35, 2, 863–865. DOI: 10.1214/aoms/1177703586.
Bohlin, T. 1970. Information pattern for
linear discrete-time models with stochastic coefficients. IEEE
Transactions on Automatic Control 15, 1, 104–106.
Cassandra, A., Littman, M.L., and Zhang,
N.L. 1997. Incremental pruning: A simple, fast, exact method for
partially observable Markov decision processes.
Proceedings of the thirteenth conference on uncertainty
in artificial intelligence.
Cassandra, A.R., Kaelbling, L.P., and Littman,
M.L. 1994. Acting optimally in partially observable stochastic
domains. AAAI, 1023–1028.
Chakravorty, J. and Mahajan, A. 2018.
Sufficient conditions for the value function and optimal strategy to be
even and quasi-convex. IEEE Transactions on Automatic Control
63, 11, 3858–3864. DOI: 10.1109/TAC.2018.2800796.
Cheng, H.-T. 1988. Algorithms for
partially observable Markov decision processes. Ph.D. thesis, University of British Columbia.
Daley, D.J. 1968. Stochastically monotone
Markov chains. Zeitschrift für
Wahrscheinlichkeitstheorie und verwandte Gebiete 10, 4,
305–317. DOI: 10.1007/BF00531852.
Davis, M.H.A. and Varaiya, P.P. 1972.
Information states for linear stochastic systems. Journal of
Mathematical Analysis and Applications 37, 2, 384–402.
Devlin, S. 2014. Potential based reward
shaping tutorial. Available at: http://www-users.cs.york.ac.uk/~devlin/presentations/pbrs-tut.pdf.
Devlin, S. and Kudenko, D. 2012. Dynamic
potential-based reward shaping. Proceedings of the 11th
international conference on autonomous agents and multiagent
systems, International Foundation for Autonomous Agents and Multiagent
Systems, 433–440.
Ding, N., Sadeghi, P., and Kennedy, R.A.
2016. On monotonicity of the optimal transmission policy in cross-layer
adaptive m-QAM modulation.
IEEE Transactions on Communications 64, 9, 3771–3785.
DOI: 10.1109/TCOMM.2016.2590427.
Edgeworth, F.Y. 1888. The mathematical
theory of banking. Journal of the Royal Statistical Society
51, 1, 113–127. Available at: https://www.jstor.org/stable/2979084.
Feinberg, E.A. and He, G. 2020.
Complexity bounds for approximately solving discounted MDPs
by value iterations. Operations Research Letters. DOI: 10.1016/j.orl.2020.07.001.
Ferguson, T.S. and Gilstein, C.Z. 2004.
Optimal investment policies for the horse race model. Available at: https://www.math.ucla.edu/~tom/papers/unpublished/Zach2.pdf.
Grzes, M. and Kudenko, D. 2009.
Theoretical and empirical analysis of reward shaping in reinforcement
learning. International conference on machine learning and
applications, 337–344. DOI: 10.1109/ICMLA.2009.33.
Harris, F.W. 1913. How many parts to make
at once. Factory, The Magazine of Management 10, 2, 135–152.
Reprinted in Operations Research 38, 6 (1990), 947–950. DOI: 10.1287/opre.38.6.947.
Hinderer, K. 2005. Lipschitz continuity
of value functions in Markovian decision processes.
Mathematical Methods of Operations Research 62, 1,
3–22. DOI: 10.1007/s00186-005-0438-1.
Hopcroft, J. and Kannan, R. 2012.
Computer science theory for the information age. Available at: https://www.cs.cmu.edu/~venkatg/teaching/CStheory-infoage/hopcroft-kannan-feb2012.pdf.
Howard, R.A. 1960. Dynamic
programming and Markov processes. The M.I.T. Press.
Jenner, E., Hoof, H. van, and Gleave, A.
2022. Calculus on MDPs: Potential shaping as a gradient. Available at:
https://arxiv.org/abs/2208.09570v1.
Keilson, J. and Kester, A. 1977. Monotone
matrices and monotone Markov processes. Stochastic Processes and
their Applications 5, 3, 231–241.
Kelly, J.L., Jr. 1956. A new
interpretation of information rate. Bell System Technical
Journal 35, 4, 917–926. DOI: 10.1002/j.1538-7305.1956.tb03809.x.
Koole, G. 2006. Monotonicity in Markov
reward and decision chains: Theory and applications. Foundations and
Trends in Stochastic Systems 1, 1, 1–76. DOI:
10.1561/0900000002.
Kumar, P.R. and Varaiya, P. 1986.
Stochastic systems: Estimation, identification, and adaptive
control. Prentice Hall.
Kwakernaak, H. 1965. Theory of
self-adaptive control systems. In: Springer, 14–18.
Levy, H. 1992. Stochastic dominance and
expected utility: Survey and analysis. Management Science
38, 4, 555–593. DOI: 10.1287/mnsc.38.4.555.
Levy, H. 2015. Stochastic dominance:
Investment decision making under uncertainty. Springer. DOI: 10.1007/978-3-319-21708-6.
Marshall, A.W., Olkin, I., and Arnold,
B.C. 2011. Inequalities: Theory of majorization and its
applications. Springer New York. DOI: 10.1007/978-0-387-68276-1.
Morse, P. and Kimball, G. 1951.
Methods of operations research. Technology Press of MIT.
Nerode, A. 1958. Linear automaton
transformations. Proceedings of the American Mathematical
Society 9, 541–544.
Ng, A.Y., Harada, D., and Russell, S.
1999. Policy invariance under reward transformations: Theory and
application to reward shaping. ICML, 278–287. Available at: http://aima.eecs.berkeley.edu/~russell/papers/icml99-shaping.pdf.
Picard, J. 2007. Concentration
inequalities and model selection. Springer Berlin Heidelberg. DOI:
10.1007/978-3-540-48503-2.
Porteus, E.L. 1975. Bounds and
transformations for discounted finite Markov decision chains.
Operations Research 23, 4, 761–784. DOI: 10.1287/opre.23.4.761.
Porteus, E.L. 2008. The newsvendor problem. In: D.
Chhajed and T.J. Lowe, eds., Building intuition: Insights from basic
operations management models and principles. Springer, 115–134. DOI: 10.1007/978-0-387-73699-0.
Puterman, M.L. 2014. Markov decision
processes: Discrete stochastic dynamic programming. John Wiley
& Sons. DOI: 10.1002/9780470316887.
Rachelson, E. and Lagoudakis, M.G. 2010.
On the locality of action domination in sequential decision making.
Proceedings of 11th international symposium on artificial
intelligence and mathematics. Available at: https://oatao.univ-toulouse.fr/17977/.
Rigollet, P. 2015. High-dimensional
statistics. Available at: https://ocw.mit.edu/courses/mathematics/18-s997-high-dimensional-statistics-spring-2015/lecture-notes/.
Rivasplata, O. 2012. Subgaussian random
variables: An expository note. Available at: http://stat.cmu.edu/~arinaldo/36788/subgaussians.pdf.
Ross, S.M. 1974. Dynamic programming and
gambling models. Advances in Applied Probability 6, 3,
593–606. DOI: 10.2307/1426236.
Sayedana, B. and Mahajan, A. 2020.
Counterexamples on the monotonicity of delay optimal strategies for
energy harvesting transmitters. IEEE Wireless
Communications Letters, 1–1. DOI: 10.1109/LWC.2020.2981066.
Sayedana, B., Mahajan, A., and Yeh, E.
2020. Cross-layer communication over fading channels with adaptive
decision feedback. International symposium on modeling and
optimization in mobile, ad hoc, and wireless networks (WiOPT), 1–8.
Serfozo, R.F. 1976. Monotone optimal
policies for Markov decision processes. In: Mathematical programming
studies. Springer Berlin Heidelberg, 202–215. DOI: 10.1007/bfb0120752.
Shwartz, A. 2001. Death and discounting.
IEEE Transactions on Automatic Control
46, 4, 644–647. DOI: 10.1109/9.917668.
Skinner, B.F. 1938. Behavior of
organisms. Appleton-Century.
Smallwood, R.D. and Sondik, E.J. 1973.
The optimal control of partially observable Markov processes over a
finite horizon. Operations Research 21, 5, 1071–1088.
DOI: 10.1287/opre.21.5.1071.
Striebel, C. 1965. Sufficient statistics
in the optimal control of stochastic systems. Journal of
Mathematical Analysis and Applications 12, 576–592.
Subramanian, J., Sinha, A., Seraj, R., and
Mahajan, A. 2022. Approximate information state for approximate
planning and reinforcement learning in partially observed systems.
Journal of Machine Learning Research 23, 12, 1–83.
Available at: http://jmlr.org/papers/v23/20-1165.html.
Topkis, D.M. 1998. Supermodularity
and complementarity. Princeton University Press.
Tsitsiklis, J.N. 1984. Periodic review
inventory systems with continuous demand and discrete order sizes.
Management Science 30, 10, 1250–1254. DOI: 10.1287/mnsc.30.10.1250.
Urgaonkar, R., Wang, S., He, T., Zafer, M.,
Chan, K., and Leung, K.K. 2015. Dynamic service migration and
workload scheduling in edge-clouds. Performance Evaluation
91, 205–228. DOI: 10.1016/j.peva.2015.06.013.
Veinott, A.F. 1965. The optimal inventory
policy for batch ordering. Operations Research 13, 3,
424–432. DOI: 10.1287/opre.13.3.424.
Wainwright, M.J. 2019.
High-dimensional statistics. Cambridge University Press. DOI:
10.1017/9781108627771.
Wang, S., Urgaonkar, R., Zafer, M., He, T.,
Chan, K., and Leung, K.K. 2019. Dynamic service migration in
mobile edge computing based on Markov decision process.
IEEE/ACM Transactions on Networking
27, 3, 1272–1288. DOI: 10.1109/TNET.2019.2916577.
Whitin, T.M. 1953. The theory of
inventory management. Princeton University Press.
Whittle, P. 1982. Optimization over
time: Dynamic programming and stochastic control. Vol. 1 and 2.
Wiley.
Whittle, P. 1996. Optimal control:
Basics and beyond. Wiley.
Whittle, P. and Komarova, N. 1988. Policy
improvement and the Newton-Raphson algorithm. Probability in the
Engineering and Informational Sciences 2, 2, 249–255. DOI:
10.1017/S0269964800000760.
Wiewiora, E. 2003. Potential-based
shaping and Q-value initialization are equivalent. Journal of
Artificial Intelligence Research 19, 1, 205–208.
Witsenhausen, H.S. 1975. On policy
independence of conditional expectation. Information and
Control 28, 65–75.
Witsenhausen, H.S. 1976. Some remarks on
the concept of state. In: Y.C. Ho and S.K. Mitter, eds., Directions
in large-scale systems. Plenum, 69–75.
Witsenhausen, H.S. 1979. On the structure
of real-time source coders. Bell System Technical Journal
58, 6, 1437–1451.
Yeh, E.M. 2012. Fundamental performance
limits in cross-layer wireless optimization: Throughput, delay, and
energy. Foundations and Trends in Communications and Information
Theory 9, 1, 1–112. DOI: 10.1561/0100000014.
Zhang, H. 2009. Partially observable
Markov decision processes: A geometric technique and
analysis. Operations Research.
Zhang, N. and Liu, W. 1996. Planning
in stochastic domains: Problem characteristics and approximation.
Hong Kong University of Science and Technology.