[1] SUTTON R S,BARTO A G. Reinforcement learning:an introduction[J]. IEEE Transactions on Neural Networks,1998,9(5):1054-1054.
[2] MNIH V,KAVUKCUOGLU K,SILVER D,et al. Playing Atari with deep reinforcement learning[J]. arXiv Preprint arXiv:1312.5602,2013.
[3] MNIH V,KAVUKCUOGLU K,SILVER D,et al. Human-level control through deep reinforcement learning[J]. Nature,2015,518(7540):529-533.
[4] SILVER D,HUANG A,MADDISON C J,et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature,2016,529(7587):484-489.
[5] SCHRITTWIESER J,ANTONOGLOU I,HUBERT T,et al. Mastering Atari,Go,chess and shogi by planning with a learned model[J]. Nature,2020,588(7839):604-609.
[6] VAN HASSELT H,GUEZ A,SILVER D. Deep reinforcement learning with double Q-learning[J]. arXiv Preprint arXiv:1509.06461v3,2016.
[7] SCHAUL T,QUAN J,ANTONOGLOU I,et al. Prioritized experience replay[J]. arXiv Preprint arXiv:1511.05952,2015.
[8] WANG Z,SCHAUL T,HESSEL M,et al. Dueling network architectures for deep reinforcement learning[C]//International Conference on Machine Learning. New York,USA,2016:1995-2003.
[9] HESSEL M,MODAYIL J,VAN HASSELT H,et al. Rainbow:combining improvements in deep reinforcement learning[C]//Thirty-Second AAAI Conference on Artificial Intelligence. Louisiana,USA,2018.
[10] FORTUNATO M,AZAR M G,PIOT B,et al. Noisy networks for exploration[J]. arXiv Preprint arXiv:1706.10295,2017.
[11] OSBAND I,BLUNDELL C,PRITZEL A,et al. Deep exploration via bootstrapped DQN[J]. Advances in Neural Information Processing Systems,2016,29.
[12] CHEN R Y,SIDOR S,ABBEEL P,et al. UCB exploration via Q-ensembles[J]. arXiv Preprint arXiv:1706.01502,2017.
[13] ZHU F,WU W,LIU Q,et al. A deep Q-network method based on upper confidence bound experience sampling[J]. Journal of Computer Research and Development,2018,55(8):1694-1705.
[14] WATKINS C J C H,DAYAN P. Q-learning[J]. Machine Learning,1992,8(3/4):279-292.
[15] ANSCHEL O,BARAM N,SHIMKIN N. Averaged-DQN:variance reduction and stabilization for deep reinforcement learning[C]//International Conference on Machine Learning. Sydney,Australia,2017:176-185.