Reinforcement Learning: Zero to ChatGPT
I gave this talk at a Machine Learning seminar hosted by the Department of Applied Mathematics, University of Waterloo. It introduces reinforcement learning, proximal policy optimization (PPO), and reinforcement learning from human feedback (RLHF), the approach used to train large language models (LLMs) such as ChatGPT.