
Title: Berkeley IEOR Seminar Series: Vivek F Farias, MIT
Title: Massive Speedups for Policy Simulation with applications to Inventory Management
Abstract: We consider the task of producing a single trajectory of a dynamical system under some state dependent policy. This ‘policy simulation’ task is often the core computational bottleneck in modern Reinforcement Learning algorithms. The multiple, inherently serial, policy evaluations that must be performed in one such simulation constitute the bulk of this bottleneck. As a concrete example, simulating a fulfillment optimization policy on a month’s worth of demand at a moderately large retailer is a task that can take several hours, rendering granular RL infeasible at scale.
We present a class of iterative algorithms we dub Picard Iteration. Our scheme carefully allocates policy evaluation tasks across independent GPU ‘processes’. Within each iteration a single process only evaluates the policy on its assigned tasks while assuming a certain ‘cached’ evaluation for other tasks. This cache is updated at the end of the iteration. A single iteration is ideally suited for the type of ‘single program multiple data’ parallelism offered by a GPU. We prove that the structure afforded by many inventory management problems allows Picard iteration to converge in a small number of iterations independent of the horizon. As one practical consequence, we demonstrate a 500x speedup in policy simulation for large-scale fulfillment optimization. Picard iteration offers a blueprint for similar speedups in related policy simulation and sequential inference tasks.
Joint work with Joren Gijsbrechts, Aryan Khojandi, Tianyi Peng and Andy Zheng.
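To make the iteration described in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: the greedy fulfillment policy, the function names, and the even split of orders into chunks are all assumptions made for illustration, and a plain Python loop over chunks stands in for the independent GPU ‘processes’. Each chunk rebuilds its starting inventory from the cached actions of earlier orders (no policy calls), evaluates the policy only on its own orders, and the cache is replaced at the end of the iteration.

```python
import numpy as np

def policy(inventory, order_costs):
    """Hypothetical greedy policy: cheapest node with stock, or -1 if none has stock."""
    in_stock = inventory > 0
    if not in_stock.any():
        return -1
    return int(np.argmin(np.where(in_stock, order_costs, np.inf)))

def replay(inv0, actions):
    """Inventory after applying a batch of actions, with no policy evaluations."""
    used = actions[actions >= 0]
    return inv0 - np.bincount(used, minlength=len(inv0))

def simulate_sequential(inv0, costs):
    """Baseline: one policy evaluation per order, strictly in order."""
    inv = inv0.copy()
    actions = np.empty(len(costs), dtype=int)
    for t in range(len(costs)):
        actions[t] = policy(inv, costs[t])
        if actions[t] >= 0:
            inv[actions[t]] -= 1
    return actions

def simulate_picard(inv0, costs, num_chunks):
    """Chunked fixed-point iteration; returns (actions, iterations used)."""
    T = len(costs)
    bounds = np.linspace(0, T, num_chunks + 1).astype(int)
    cached = np.full(T, -1)  # assumed initial guess: no order is fulfilled
    for it in range(1, num_chunks + 1):
        new = np.empty(T, dtype=int)
        # The chunks below do not depend on each other within an iteration,
        # so on a GPU they could run in parallel.
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            inv = replay(inv0, cached[:lo])  # trust the cache for earlier orders
            for t in range(lo, hi):          # evaluate the policy on own orders only
                new[t] = policy(inv, costs[t])
                if new[t] >= 0:
                    inv[new[t]] -= 1
        if np.array_equal(new, cached):      # fixed point reached
            return new, it
        cached = new                         # cache update at end of iteration
    return cached, it

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inv0 = rng.integers(5, 30, size=4)       # stock at 4 hypothetical nodes
    costs = rng.random((200, 4))             # per-order shipping cost from each node
    actions, iters = simulate_picard(inv0, costs, num_chunks=8)
    print("matches sequential:", np.array_equal(actions, simulate_sequential(inv0, costs)))
    print("iterations used:", iters)
```

In this sketch the first chunk is exact after one iteration, the second after two, and so on, so the loop reproduces the sequential trajectory within `num_chunks` iterations in the worst case. The result claimed in the abstract is stronger: for many inventory problems the number of iterations needed is small and independent of the horizon.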
Bio: Vivek is interested in the development of new methodologies and applications for large scale dynamic optimization. He received his Ph.D. in Electrical Engineering from Stanford University in 2007 and is the Patrick J. McGovern (1959) Professor at MIT. Vivek is a recipient of an INFORMS MSOM Student Paper Prize (2006), an INFORMS JFIG paper prize (2009, 2011), the NSF CAREER award (2011), MIT Sloan’s Outstanding Teacher award (2013), the INFORMS Simulation Society Best Publication Award (2014), the INFORMS Pricing and Revenue Management Best Publication Award (2015), the INFORMS MSOM Best Publication award in Management Science (2016), the MSOM Young Scholar Prize and the MIT-wide Jamieson Prize for Excellence in Teaching (2020). His practice-based work has won the Wagner Prize (2022) and the Pierskalla Award (2024) and has been judged a finalist for the Pierskalla Award (2011), the Gary L. Lilien ISMS-MSI Marketing Practice Prize (2016) and the Wagner Prize (2018). Vivek’s doctoral advisees have on various occasions won the Nicholson, MSOM, APS and RMP student paper prizes. Outside of academia, Vivek was co-founder/CTO at Celect (2014-19; acquired by Nike); was a corresponding author of the technology at Seer (2018-IPO in 2020); and is co-founder/CTO at Cimulate (current).