参考文献/References:
[1] SUTTON R S,BARTO A G. Reinforcement learning:an introduction[J]. IEEE Transactions on Neural Networks,1998,9(5):1054-1054.
[2]MNIH V,KAVUKCUOGLU K,SILVER D,et al. Playing atari with deep reinforcement learning[J]. arXiv Preprint arXiv:1312.5602,2013.
[3]MNIH V,KAVUKCUOGLU K,SILVER D,et al. Human-level control through deep reinforcement learning[J]. Nature,2015,518(7540):529-533.
[4]SILVER D,HUANG A,MADDISON C J,et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature,2016,529(7587):484-489.
[5]SCHRITTWIESER J,ANTONOGLOU I,HUBERT T,et al. Mastering atari,go,chess and shogi by planning with a learned model[J]. Nature,2020,588(7839):604-609.
[6]VAN HASSELT H,GUEZ A,SILVER D. Deep reinforcement learning with double Q-learning[J]. arXiv Preprint arXiv:1509.06461v3,2016.
[7]SCHAUL T,QUAN J,ANTONOGLOU I,et al. Prioritized experience replay[J]. arXiv Preprint arXiv:1511.05952,2015.
[8]WANG Z,SCHAUL T,HESSEL M,et al. Dueling network architectures for deep reinforcement learning[C]//International Conference on Machine Learning. Lodon,UK,2016:1995-2003.
[9]HESSEL M,MODAYIL J,VAN HASSELT H,et al. Rainbow:Combining improvements in deep reinforcement learning[C]//Thirty-second AAAI Conference on Artificial Intelligence. Lousiana,USA,2018.
[10]FORTUNATO M,AZAR M G,PIOT B,et al. Noisy networks for exploration[J]. arXiv Preprint arXiv:1706.10295,2017.
[11]OSBAND I,BLUNDELL C,PRITZEL A,et al. Deep exploration via bootstrapped DQN[J]. Advances in Neural Information Processing Systems,2016,29.
[12]CHEN R Y,SIDOR S,ABBEEL P,et al. UCB exploration via Q-ensembles[J]. arXiv Preprint arXiv:1706.01502,2017.
[13]朱斐,吴文,刘全,等. 一种最大置信上界经验采样的深度Q网络方法[J]. 计算机研究与发展,2018,55(8):1694-1705.
[14]WATKINS C,DAYAN P. Q-learning[J]. Machine Learning,1992,8(3/4):279-292.
[15]ANSCHEL O,BARAM N,SHIMKIN N. Averaged-DQN:variance reduction and stabilization for deep reinforcement learning[C]//International Conference on Machine Learning. Sydney,Australia,2017:176-185.
相似文献/References:
[1]毛 晋,熊 轲,位 宁,等.基于深度强化学习的超密集网络中多用户上行功率控制方法[J].南京师范大学学报(工程技术版),2022,22(01):016.[doi:10.3969/j.issn.1672-1292.2022.01.003]
Mao Jin,Xiong Ke,Wei Ning,et al.Power Control in Ultra Dense Network:A DeepReinforcement Learning Based Method[J].Journal of Nanjing Normal University(Engineering and Technology),2022,22(01):016.[doi:10.3969/j.issn.1672-1292.2022.01.003]
[2]王哲超,傅启明,陈建平,等.小样本场景下的强化学习研究综述[J].南京师范大学学报(工程技术版),2022,22(01):086.[doi:10.3969/j.issn.1672-1292.2022.01.013]
Wang Zhechao,Fu Qiming,Chen Jianping,et al.Review of Research on Reinforcement Learning in Few-Shot Scenes[J].Journal of Nanjing Normal University(Engineering and Technology),2022,22(01):086.[doi:10.3969/j.issn.1672-1292.2022.01.013]
[3]黄江涛,刘 刚,周 攀,等.基于深度强化学习技术的舰载无人机自主着舰控制研究[J].南京师范大学学报(工程技术版),2022,22(03):063.[doi:10.3969/j.issn.1672-1292.2022.03.009]
Huang Jiangtao,Liu Gang,Zhou Pan,et al.Research on Autonomous Landing Control of Carrier-borne UCAV Based on Deep Reinforcement Learning Technology[J].Journal of Nanjing Normal University(Engineering and Technology),2022,22(01):063.[doi:10.3969/j.issn.1672-1292.2022.03.009]