Authors

  • Ruitong Huang - Borealis AI
  • Tianyang Yu - Nanchang University
  • Zihan Ding - Princeton University
  • Shanghang Zhang* - University of California, Berkeley (shzhang.pku[at]gmail.com)

Abstract

This chapter introduces deep Q-networks (DQN), one of the most important deep reinforcement learning algorithms. We start with the Q-learning algorithm via temporal difference learning, then introduce the DQN algorithm and its variants. We end the chapter with code examples and an experimental comparison of DQN and its variants in practice.

Keywords: temporal difference learning, DQN, double DQN, dueling DQN, prioritized experience replay, distributional reinforcement learning

Code

Code for this chapter is available here.

Citation

To cite this book, please use this BibTeX entry:

@incollection{deepRL-chapter5-2020,
 title={Policy Gradient},
 chapter={5},
 author={Ruitong Huang and Tianyang Yu and Zihan Ding and Shanghang Zhang},
 editor={Hao Dong and Zihan Ding and Shanghang Zhang},
 booktitle={Deep Reinforcement Learning: Fundamentals, Research, and Applications},
 publisher={Springer Nature},
 pages={161--212},
 note={\url{http://www.deepreinforcementlearningbook.org}},
 year={2020}
}

If you find any typos or have suggestions for improving the book, please contact the corresponding author (marked with *).