Inverse reinforcement learning seeks to recover a cost function that explains an expert’s behavior, but the problem is generally ill-posed. This talk presents a regularized inverse-optimization approach in which prior beliefs are used to select meaningful costs, even when the observed expert is not exactly optimal. After discussing the main results in the discrete setting, I develop a parallel with continuous-time stochastic control, where occupation measures, HJB inequalities, and Sobolev regularization lead naturally to an inverse PDE and variational-inequality framework.