CZECH TECHNICAL UNIVERSITY IN PRAGUE
STUDY PLANS
2025/2026

Reinforcement Learning

The course is not on the list.
Without time-table.
Code: B4M36PSU
Completion: Z,ZK (assessment and examination)
Credits: 6
Range: 2P+2C (2 hours of lectures and 2 hours of tutorials per week)
Language: Czech
Supervisor:
Department of Computer Science
Syllabus of lectures:

1. Motivation (successes, AGI, human feedback, history)

2. Multi-armed bandit problems (stochastic, contextual)

3. Solving MDPs 1 (Bellman equations, value iteration)

4. Solving MDPs 2 (contraction, policy iteration)

5. Temporal difference learning 1 (TD(0), Sarsa, Q-learning)

6. Temporal difference learning 2 (n-step, Double-Q, DQN)

7. Policy gradient methods 1 (tabular)

8. Policy gradient methods 2 (variance reduction, neural)

9. Combining learning and planning (AlphaZero, MuZero)

10. Exploration in RL

11. Multi-agent RL (cooperative vs. adversarial)

12. Applications: advertising, RLHF, robotics

13. Neuroscience and RL

Study materials:

Online lecture notes (not slides) will be available as the primary study material.

Recommended literature:

Reinforcement Learning: An Introduction, second edition, Richard S. Sutton, Andrew G. Barto, 2018.

Deep Reinforcement Learning Hands-On: A practical and easy-to-follow guide to RL from Q-learning and DQNs to PPO and RLHF, Maxim Lapan, 2020.

Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions, Warren B. Powell, 2022.

Further information:
No time-table has been prepared for this course
The course is a part of the following study plans:
Data valid to 2026-05-22
For updated information see http://bilakniha.cvut.cz/en/predmet8709006.html