Is RL Fine-Tuning Harder Than Regression? A PDE-Based Approach for Diffusion Models

Published: 2025-08-02 11:55
Time:
Venue:
Speaker: Wenlong Mou

Abstract:

Reinforcement learning (RL) has gained significant traction for post-training optimization of deep generative models, where fine-tuning adapts pre-trained models to task-specific rewards. The theoretical literature consistently positions RL as fundamentally harder than regression, a view echoed in practice by challenges such as training instability and sample inefficiency. This raises a critical question: is RL fine-tuning inherently harder than regression?


This talk addresses this question through recent advances in RL algorithm design for fine-tuning diffusion processes. First, I demonstrate how uniform ellipticity -- a key structural property of Markov diffusions -- enables efficient RL with strong theoretical guarantees. Leveraging this framework, I introduce a novel PDE-based algorithm that reduces the RL fine-tuning problem to standard nonparametric regression. Among other results, I will highlight a self-mitigating statistical error bound, which allows RL fine-tuning to achieve efficiency superior even to that of the corresponding regression task.
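To give a feel for what a "reduction to regression" can look like, here is a minimal, hypothetical sketch -- not the algorithm presented in the talk -- based on a classical construction: for entropy-regularized reward fine-tuning of a diffusion, the optimal drift correction is sigma^2 * grad_x log h(t, x), where h(t, x) = E[exp(r(X_T)/lambda) | X_t = x] under the pre-trained dynamics solves a linear backward PDE (Feynman-Kac). Estimating h is then a least-squares regression on trajectories simulated from the pre-trained model. All concrete choices below (the OU drift, the toy reward, sigma, lam, the MLP regressor) are illustrative assumptions.

```python
# Illustrative sketch: reward fine-tuning of a 1-D diffusion via regression.
# Step 1: simulate pre-trained trajectories; Step 2: regress exp(r(X_T)/lam)
# on (t, X_t) to estimate h; Step 3: sample with drift b(x) + sigma^2 * grad log h.
import torch

torch.manual_seed(0)
sigma, lam, T, n_steps, n_paths = 1.0, 1.0, 1.0, 50, 4096
dt = T / n_steps

def drift(x):                  # toy pre-trained drift: Ornstein-Uhlenbeck toward 0
    return -x

def reward(x):                 # toy task reward: prefer terminal states near x = 1
    return -(x - 1.0) ** 2

# 1. Euler-Maruyama simulation under the pre-trained dynamics.
xs = [torch.randn(n_paths, 1)]
for _ in range(n_steps):
    x = xs[-1]
    xs.append(x + drift(x) * dt + sigma * dt**0.5 * torch.randn_like(x))
traj = torch.stack(xs, dim=1)                        # (n_paths, n_steps+1, 1)

# 2. Regression data: target exp(r(X_T)/lam) for each path, features (t, X_t).
y = torch.exp(reward(traj[:, -1]) / lam)             # (n_paths, 1)
t_grid = torch.linspace(0.0, T, n_steps + 1).view(1, -1, 1).expand_as(traj)
feats = torch.cat([t_grid.reshape(-1, 1), traj.reshape(-1, 1)], dim=1)
targets = y.repeat_interleave(n_steps + 1, dim=0)

# Fit h(t, x) by plain least squares with a small MLP -- the "regression" step.
mlp = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = ((mlp(feats) - targets) ** 2).mean()
    loss.backward()
    opt.step()

# 3. Fine-tuned drift: b(x) + sigma^2 * grad_x log h(t, x), via autograd.
def finetuned_drift(t, x):
    x = x.clone().requires_grad_(True)
    h = mlp(torch.cat([torch.full_like(x, t), x], dim=1)).clamp_min(1e-6)
    (grad,) = torch.autograd.grad(torch.log(h).sum(), x)
    return drift(x.detach()) + sigma**2 * grad

# Sampling with the corrected drift should shift terminal states toward the reward peak.
x = torch.randn(1000, 1)
for k in range(n_steps):
    x = x + finetuned_drift(k * dt, x) * dt + sigma * dt**0.5 * torch.randn_like(x)
print("mean terminal state after fine-tuning:", x.mean().item())
```

In this toy setting the only learning step is the least-squares fit of h, which is exactly a nonparametric regression problem; the RL aspect enters only through how the fitted regressor is plugged back into the sampling drift.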


Bio: Wenlong Mou is an Assistant Professor in the Department of Statistical Sciences at the University of Toronto. In 2023, he received his Ph.D. in Electrical Engineering and Computer Sciences (EECS) from UC Berkeley. Prior to Berkeley, he received his B.Sc. in Computer Science and B.A. in Economics, both from Peking University. Wenlong's research interests include machine learning theory, mathematical statistics, optimization, and applied probability. He is particularly interested in data-driven decision-making in modern AI paradigms. His work has been published in leading journals in statistical machine learning, and his research has been recognized as a Best Student Paper finalist by the INFORMS Applied Probability Society.