
Alternated Deep Q Network Based on Upper Confidence Bound

Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1006-6977/CN:61-1281/TN]

Issue:
2022, No. 1
Page:
24-29
Research Field:
Machine Learning
Info

Title:
Alternated Deep Q Network Based on Upper Confidence Bound
Author(s):
Wu Qingyuan1, Tan Xiaoyang1,2
(1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; 2. MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing 211006, China)
Keywords:
reinforcement learning; deep reinforcement learning; deep Q-network; upper confidence bound
CLC Number:
TP18
DOI:
10.3969/j.issn.1672-1292.2022.01.004
Abstract:
In the deep reinforcement learning (DRL) paradigm, the agent learns by interacting with the environment, and a central dilemma is that the agent must balance exploitation and exploration. How to improve the sample efficiency and the exploration ability of algorithms is therefore a very active research direction in DRL. Unlike existing work, we maintain multiple DQNs with independent random initialization and let them interact with the environment alternately. Exploiting the diverse exploration behavior brought by the random initialization of the networks, this paper proposes a method that alternately selects a DQN according to the upper confidence bound (UCB) criterion, called Alternated DQN (ADQN). Experimental results on several standard reinforcement learning environments show that ADQN achieves higher sample efficiency and learning efficiency than the benchmark algorithms.
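
The abstract describes maintaining several independently initialized DQNs and choosing which one interacts with the environment by an upper confidence bound rule. Below is a minimal, hypothetical Python sketch of such an alternation loop; the class `DQN`, the use of episode return as the UCB statistic, and the constants `K` and `C` are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): UCB-style alternation among K
# independently initialized DQNs. All names and constants are assumptions.
import math
import random

K = 5     # number of independently initialized DQNs (assumed)
C = 2.0   # exploration coefficient in the UCB bonus (assumed)

class DQN:
    """Stand-in for a DQN agent; real code would hold a neural network."""
    def __init__(self, seed):
        self.rng = random.Random(seed)   # independent random initialization

    def run_episode(self):
        # Placeholder: interact with the environment for one episode and
        # return the episode return; here we only simulate a noisy score.
        return self.rng.gauss(0.0, 1.0)

agents = [DQN(seed=i) for i in range(K)]
counts = [0] * K           # how often each DQN has been selected
values = [0.0] * K         # running mean episode return per DQN

for t in range(1, 201):
    def ucb(i):
        # Mean return plus a bonus that favors rarely selected networks.
        if counts[i] == 0:
            return float("inf")          # try every DQN at least once
        return values[i] + C * math.sqrt(math.log(t) / counts[i])

    i = max(range(K), key=ucb)           # pick the DQN with the highest UCB
    ret = agents[i].run_episode()        # let it interact with the environment
    counts[i] += 1
    values[i] += (ret - values[i]) / counts[i]   # incremental mean update
```

The sketch only illustrates the alternation mechanism; in the actual algorithm the UCB statistic would be derived from the DQN ensemble itself rather than from a simulated return.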


Last Update: 2022-03-15