References

Updated: April 22, 2023

Altman, E. 1999. Constrained Markov decision processes. CRC Press. Available at: http://www-sop.inria.fr/members/Eitan.Altman/TEMP/h.pdf.
Arrow, K.J., Harris, T., and Marschak, J. 1952. Optimal inventory policy. Econometrica 20, 1, 250–272. DOI: 10.2307/1907830.
Bellman, R., Glicksberg, I., and Gross, O. 1955. On the optimal inventory equation. Management Science 2, 1, 83–104. DOI: 10.1287/mnsc.2.1.83.
Berry, R.A. 2000. Power and delay trade-offs in fading channels. PhD thesis, Massachusetts Institute of Technology. Available at: https://dspace.mit.edu/handle/1721.1/9290.
Berry, R.A. 2013. Optimal power-delay tradeoffs in fading channels—small-delay asymptotics. IEEE Transactions on Information Theory 59, 6, 3939–3952. DOI: 10.1109/TIT.2013.2253194.
Berry, R.A. and Gallager, R.G. 2002. Communication over fading channels with delay constraints. IEEE Transactions on Information Theory 48, 5, 1135–1149. DOI: 10.1109/18.995554.
Berry, R., Modiano, E., and Zafer, M. 2012. Energy-efficient scheduling under delay constraints for wireless networks. Synthesis Lectures on Communication Networks 5, 2, 1–96. DOI: 10.2200/S00443ED1V01Y201208CNT011.
Bertsekas, D.P. 2011. Dynamic programming and optimal control. Athena Scientific. Available at: http://www.athenasc.com/dpbook.html.
Bitar, E., Poolla, K., Khargonekar, P., Rajagopal, R., Varaiya, P., and Wu, F. 2012. Selling random wind. 2012 45th Hawaii international conference on system sciences, IEEE, 1931–1937.
Blackwell, D. 1964. Memoryless strategies in finite-stage dynamic programming. The Annals of Mathematical Statistics 35, 2, 863–865. DOI: 10.1214/aoms/1177703586.
Bohlin, T. 1970. Information pattern for linear discrete-time models with stochastic coefficients. IEEE Transactions on Automatic Control 15, 1, 104–106.
Cassandra, A., Littman, M.L., and Zhang, N.L. 1997. Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes. Proceedings of the thirteenth conference on uncertainty in artificial intelligence.
Cassandra, A.R., Kaelbling, L.P., and Littman, M.L. 1994. Acting optimally in partially observable stochastic domains. AAAI, 1023–1028.
Chakravorty, J. and Mahajan, A. 2018. Sufficient conditions for the value function and optimal strategy to be even and quasi-convex. IEEE Transactions on Automatic Control 63, 11, 3858–3864. DOI: 10.1109/TAC.2018.2800796.
Cheng, H.-T. 1988. Algorithms for partially observable Markov decision processes. PhD thesis, University of British Columbia.
Daley, D.J. 1968. Stochastically monotone Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 10, 4, 305–317. DOI: 10.1007/BF00531852.
Davis, M.H.A. and Varaiya, P.P. 1972. Information states for linear stochastic systems. Journal of Mathematical Analysis and Applications 37, 2, 384–402.
Devlin, S. 2014. Potential based reward shaping tutorial. Available at: http://www-users.cs.york.ac.uk/~devlin/presentations/pbrs-tut.pdf.
Devlin, S. and Kudenko, D. 2012. Dynamic potential-based reward shaping. Proceedings of the 11th international conference on autonomous agents and multiagent systems, International Foundation for Autonomous Agents and Multiagent Systems, 433–440.
Ding, N., Sadeghi, P., and Kennedy, R.A. 2016. On monotonicity of the optimal transmission policy in cross-layer adaptive m-QAM modulation. IEEE Transactions on Communications 64, 9, 3771–3785. DOI: 10.1109/TCOMM.2016.2590427.
Edgeworth, F.Y. 1888. The mathematical theory of banking. Journal of the Royal Statistical Society 51, 1, 113–127. Available at: https://www.jstor.org/stable/2979084.
Feinberg, E.A. and He, G. 2020. Complexity bounds for approximately solving discounted MDPs by value iterations. Operations Research Letters. DOI: 10.1016/j.orl.2020.07.001.
Ferguson, T.S. and Gilstein, C.Z. 2004. Optimal investment policies for the horse race model. Available at: https://www.math.ucla.edu/~tom/papers/unpublished/Zach2.pdf.
Grzes, M. and Kudenko, D. 2009. Theoretical and empirical analysis of reward shaping in reinforcement learning. International conference on machine learning and applications, 337–344. DOI: 10.1109/ICMLA.2009.33.
Harris, F.W. 1913. How many parts to make at once. Factory, The Magazine of Management 10, 2, 135–152. DOI: 10.1287/opre.38.6.947.
Hinderer, K. 2005. Lipschitz continuity of value functions in Markovian decision processes. Mathematical Methods of Operations Research 62, 1, 3–22. DOI: 10.1007/s00186-005-0438-1.
Hopcroft, J. and Kannan, R. 2012. Computer science theory for the information age. Available at: https://www.cs.cmu.edu/~venkatg/teaching/CStheory-infoage/hopcroft-kannan-feb2012.pdf.
Howard, R.A. 1960. Dynamic programming and Markov processes. The M.I.T. Press.
Jenner, E., Hoof, H. van, and Gleave, A. 2022. Calculus on MDPs: Potential shaping as a gradient. Available at: https://arxiv.org/abs/2208.09570v1.
Keilson, J. and Kester, A. 1977. Monotone matrices and monotone Markov processes. Stochastic Processes and their Applications 5, 3, 231–241.
Kelly, J.L., Jr. 1956. A new interpretation of information rate. Bell System Technical Journal 35, 4, 917–926. DOI: 10.1002/j.1538-7305.1956.tb03809.x.
Koole, G. 2006. Monotonicity in Markov reward and decision chains: Theory and applications. Foundations and Trends in Stochastic Systems 1, 1, 1–76. DOI: 10.1561/0900000002.
Kumar, P.R. and Varaiya, P. 1986. Stochastic systems: Estimation, identification and adaptive control. Prentice Hall.
Kwakernaak, H. 1965. Theory of self-adaptive control systems. In: Springer, 14–18.
Levy, H. 1992. Stochastic dominance and expected utility: Survey and analysis. Management Science 38, 4, 555–593. DOI: 10.1287/mnsc.38.4.555.
Levy, H. 2015. Stochastic dominance: Investment decision making under uncertainty. Springer. DOI: 10.1007/978-3-319-21708-6.
Marshall, A.W., Olkin, I., and Arnold, B.C. 2011. Inequalities: Theory of majorization and its applications. Springer New York. DOI: 10.1007/978-0-387-68276-1.
Morse, P. and Kimball, G. 1951. Methods of operations research. Technology Press of MIT.
Nerode, A. 1958. Linear automaton transformations. Proceedings of the American Mathematical Society 9, 541–544.
Ng, A.Y., Harada, D., and Russell, S. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. ICML, 278–287. Available at: http://aima.eecs.berkeley.edu/~russell/papers/icml99-shaping.pdf.
Picard, J. 2007. Concentration inequalities and model selection. Springer Berlin Heidelberg. DOI: 10.1007/978-3-540-48503-2.
Porteus, E.L. 1975. Bounds and transformations for discounted finite Markov decision chains. Operations Research 23, 4, 761–784. DOI: 10.1287/opre.23.4.761.
Porteus, E.L. 2008. Building intuition: Insights from basic operations management models and principles. D. Chhajed and T.J. Lowe, eds. Springer, 115–134. DOI: 10.1007/978-0-387-73699-0.
Puterman, M.L. 2014. Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons. DOI: 10.1002/9780470316887.
Rachelson, E. and Lagoudakis, M.G. 2010. On the locality of action domination in sequential decision making. Proceedings of 11th international symposium on artificial intelligence and mathematics. Available at: https://oatao.univ-toulouse.fr/17977/.
Rigollet, P. 2015. High-dimensional statistics. Available at: https://ocw.mit.edu/courses/mathematics/18-s997-high-dimensional-statistics-spring-2015/lecture-notes/.
Rivasplata, O. 2012. Subgaussian random variables: An expository note. Available at: http://stat.cmu.edu/~arinaldo/36788/subgaussians.pdf.
Ross, S.M. 1974. Dynamic programming and gambling models. Advances in Applied Probability 6, 3, 593–606. DOI: 10.2307/1426236.
Sayedana, B. and Mahajan, A. 2020. Counterexamples on the monotonicity of delay optimal strategies for energy harvesting transmitters. IEEE Wireless Communications Letters, 1–1. DOI: 10.1109/lwc.2020.2981066.
Sayedana, B., Mahajan, A., and Yeh, E. 2020. Cross-layer communication over fading channels with adaptive decision feedback. International symposium on modeling and optimization in mobile, ad hoc, and wireless networks (WiOpt), 1–8.
Serfozo, R.F. 1976. Monotone optimal policies for Markov decision processes. In: Mathematical programming studies. Springer Berlin Heidelberg, 202–215. DOI: 10.1007/bfb0120752.
Shwartz, A. 2001. Death and discounting. IEEE Transactions on Automatic Control 46, 4, 644–647. DOI: 10.1109/9.917668.
Skinner, B.F. 1938. Behavior of organisms. Appleton-Century.
Smallwood, R.D. and Sondik, E.J. 1973. The optimal control of partially observable Markov processes over a finite horizon. Operations Research 21, 5, 1071–1088. DOI: 10.1287/opre.21.5.1071.
Striebel, C. 1965. Sufficient statistics in the optimal control of stochastic systems. Journal of Mathematical Analysis and Applications 12, 576–592.
Subramanian, J., Sinha, A., Seraj, R., and Mahajan, A. 2022. Approximate information state for approximate planning and reinforcement learning in partially observed systems. Journal of Machine Learning Research 23, 12, 1–83. Available at: http://jmlr.org/papers/v23/20-1165.html.
Topkis, D.M. 1998. Supermodularity and complementarity. Princeton University Press.
Tsitsiklis, J.N. 1984. Periodic review inventory systems with continuous demand and discrete order sizes. Management Science 30, 10, 1250–1254. DOI: 10.1287/mnsc.30.10.1250.
Urgaonkar, R., Wang, S., He, T., Zafer, M., Chan, K., and Leung, K.K. 2015. Dynamic service migration and workload scheduling in edge-clouds. Performance Evaluation 91, 205–228. DOI: 10.1016/j.peva.2015.06.013.
Veinott, A.F. 1965. The optimal inventory policy for batch ordering. Operations Research 13, 3, 424–432. DOI: 10.1287/opre.13.3.424.
Wainwright, M.J. 2019. High-dimensional statistics. Cambridge University Press. DOI: 10.1017/9781108627771.
Wang, S., Urgaonkar, R., Zafer, M., He, T., Chan, K., and Leung, K.K. 2019. Dynamic service migration in mobile edge computing based on Markov decision process. IEEE/ACM Transactions on Networking 27, 3, 1272–1288. DOI: 10.1109/tnet.2019.2916577.
Whitin, T.M. 1953. The theory of inventory management. Princeton University Press.
Whittle, P. 1982. Optimization over time: Dynamic programming and stochastic control. Vol. 1 and 2. Wiley.
Whittle, P. 1996. Optimal control: Basics and beyond. Wiley.
Whittle, P. and Komarova, N. 1988. Policy improvement and the Newton-Raphson algorithm. Probability in the Engineering and Informational Sciences 2, 2, 249–255. DOI: 10.1017/s0269964800000760.
Wiewiora, E. 2003. Potential-based shaping and Q-value initialization are equivalent. Journal of Artificial Intelligence Research 19, 1, 205–208.
Witsenhausen, H.S. 1975. On policy independence of conditional expectation. Information and Control 28, 65–75.
Witsenhausen, H.S. 1976. Some remarks on the concept of state. In: Y.C. Ho and S.K. Mitter, eds., Directions in large-scale systems. Plenum, 69–75.
Witsenhausen, H.S. 1979. On the structure of real-time source coders. Bell System Technical Journal 58, 6, 1437–1451.
Yeh, E.M. 2012. Fundamental performance limits in cross-layer wireless optimization: Throughput, delay, and energy. Foundations and Trends in Communications and Information Theory 9, 1, 1–112. DOI: 10.1561/0100000014.
Zhang, H. 2009. Partially observable Markov decision processes: A geometric technique and analysis. Operations Research.
Zhang, N. and Liu, W. 1996. Planning in stochastic domains: Problem characteristics and approximation. Technical report, Hong Kong University of Science and Technology.