Reinforcement Learning: Zero to ChatGPT

Date: March 03, 2023

I gave this talk at a Machine Learning seminar hosted by the Department of Applied Mathematics, University of Waterloo. It introduces reinforcement learning, proximal policy optimization (PPO), and reinforcement learning from human feedback (RLHF), which is used to train large language models (LLMs) such as ChatGPT.

You can find the slides here.


Marty Mukherjee
