References:
[1]JI S S. Data clustering based on neural network tree and artificial bee colony optimization[J]. Journal of Nanjing Normal University(Natural Science Edition),2021,44(1):119-127.
[2]LI F F,FERGUS R,PERONA P. A Bayesian approach to unsupervised one-shot learning of object categories[C]//Proceedings of the 9th IEEE International Conference on Computer Vision. Nice,France:IEEE,2003:1134-1141.
[3]SUTTON R S,BARTO A G. Reinforcement learning:an introduction[M]. 2nd ed. Cambridge:MIT Press,2018.
[4]MITCHELL T M. Machine learning[M]. New York:McGraw-Hill,1997.
[5]TOBIN J,FONG R,RAY A,et al. Domain randomization for transferring deep neural networks from simulation to the real world[J]. arXiv Preprint arXiv:1703.06907,2017.
[6]HESTER T,VECERIK M,PIETQUIN O,et al. Deep Q-learning from demonstrations[C]//The 32nd AAAI Conference on Artificial Intelligence. New Orleans,USA,2018:3223-3230.
[7]ANDERSON J R. Cognitive psychology and its implications[M]. 3rd ed. New York:Freeman,1990.
[8]WANG H,GAO Y,CHEN X G. Transfer in reinforcement learning:methods and progress[J]. Acta Electronica Sinica,2008,36(Suppl 1):39-43.
[9]KIM B,FARAHMAND A,PINEAU J,et al. Approximate policy iteration with demonstration data[C]//The 1st Multi-disciplinary Conference on Reinforcement Learning and Decision Making. Princeton,USA,2013:168-172.
[10]BERTSEKAS D P. Approximate policy iteration:a survey and some new methods[J]. Journal of Control Theory and Applications,2011,9(3):310-335.
[11]PIOT B,GEIST M,PIETQUIN O. Boosted Bellman residual minimization handling expert demonstrations[C]//The 25th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Nancy,France,2014:549-564.
[12]CHEMALI J,LAZARIC A. Direct policy iteration with demonstrations[C]//The 24th International Joint Conference on Artificial Intelligence. Buenos Aires,Argentina,2015:3380-3386.
[13]LAZARIC A,RESTELLI M,BONARINI A. Transfer of samples in batch reinforcement learning[C]//The 25th International Conference on Machine Learning. Helsinki,Finland,2008:544-551.
[14]CORTES C,MOHRI M,RILEY M,et al. Sample selection bias correction theory[C]//The 19th International Conference on Algorithmic Learning Theory. Budapest,Hungary,2008:38-53.
[15]LAROCHE R,BARLIER M. Transfer reinforcement learning with shared dynamics[C]//The 31st AAAI Conference on Artificial Intelligence. San Francisco,USA,2017:2147-2153.
[16]TIRINZONI A,SESSA A,PIROTTA M,et al. Importance weighted transfer of samples in reinforcement learning[C]//The 35th International Conference on Machine Learning. Stockholm,Sweden,2018:4943-4952.
[17]ERNST D,GEURTS P,WEHENKEL L. Tree-based batch mode reinforcement learning[J]. Journal of Machine Learning Research,2005,6(4):503-556.
[18]NG A Y,HARADA D,RUSSELL S J. Policy invariance under reward transformations:theory and application to reward shaping[C]//The 16th International Conference on Machine Learning. Bled,Slovenia,1999:278-287.
[19]WIEWIORA E,COTTRELL G W,ELKAN C. Principled methods for advising reinforcement learning agents[C]//The 20th International Conference on Machine Learning. Washington DC,USA,2003:792-799.
[20]DEVLIN S,KUDENKO D. Dynamic potential-based reward shaping[C]//The 11th International Conference on Autonomous Agents and Multiagent Systems. Valencia,Spain,2012:433-440.
[21]HARUTYUNYAN A,DEVLIN S,VRANCX P,et al. Expressing arbitrary reward functions as potential-based advice[C]//The 29th AAAI Conference on Artificial Intelligence. Austin,USA,2015:2652-2658.
[22]FINN C,ABBEEL P,LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//The 34th International Conference on Machine Learning. Sydney,Australia,2017:1126-1135.
[23]DELEU T,BENGIO Y. The effects of negative adaptation in Model-Agnostic Meta-Learning[J]. arXiv Preprint arXiv:1812.02159,2018.
[24]RUSU A A,COLMENAREJO S G,GULCEHRE C,et al. Policy distillation[J]. arXiv Preprint arXiv:1511.06295,2016.
[25]ABEL D. A theory of state abstraction for reinforcement learning[C]//The 31st Innovative Applications of Artificial Intelligence Conference. Honolulu,USA,2019:9876-9877.
[26]ABEL D,HERSHKOWITZ D E,LITTMAN M L. Near optimal behavior via approximate state abstraction[C]//International Conference on Machine Learning. New York,USA,2016:2915-2923.
[27]VALIANT L G. A theory of the learnable[J]. Communications of the Association for Computing Machinery,1984,27(11):1134-1142.
[28]YAO H,ZHANG C,WEI Y,et al. Graph few-shot learning via knowledge transfer[C]//The 34th AAAI Conference on Artificial Intelligence. New York,USA,2020:6656-6663.
[29]ZHANG C,YAO H,HUANG C,et al. Few-shot knowledge graph completion[C]//The 34th AAAI Conference on Artificial Intelligence. New York,USA,2020:3041-3048.
[30]PARISOTTO E,BA J L,SALAKHUTDINOV R. Actor-mimic:deep multitask and transfer reinforcement learning[C]//The 4th International Conference on Learning Representations. San Juan,Puerto Rico,2016:156-171.
[31]MEHTA B,DELEU T,RAPARTHY S C,et al. Curriculum in gradient-based meta-reinforcement learning[J]. arXiv Preprint arXiv:2002.07956,2020.
[32]BENGIO Y,LOURADOUR J,COLLOBERT R,et al. Curriculum learning[C]//The 26th Annual International Conference on Machine Learning. New York,USA,2009:41-48.
[33]HESTER T,STONE P. TEXPLORE:real-time sample-efficient reinforcement learning for robots[J]. Machine Learning,2013,90(3):385-429.
[34]SHI W,FENG Y H,CHENG G Q,et al. Research on multi-aircraft cooperative air combat method based on deep reinforcement learning[J]. Acta Automatica Sinica,2021,47(7):1610-1623.
[35]MENG L,SHEN N,QI Y Q,et al. Three-dimensional game control algorithm based on reinforcement learning[J]. Journal of Northeastern University(Natural Science),2021,42(4):478-482,493.