Review of Research on Reinforcement Learning in Few-Shot Scenes
Wang Zhechao 1,2,3, Fu Qiming 1,2,3, Chen Jianping 2,3, Hu Fuyuan 1,2,3, Lu You 1,2,3, Wu Hongjie 1,2,3
(1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China)
(2. Jiangsu Provincial Key Laboratory of Building Intelligence and Energy Saving, Suzhou University of Science and Technology, Suzhou 215009, China)
(3. Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology, Suzhou 215009, China)
Keywords: reinforcement learning; few-shot learning; meta-learning; transfer learning; lifelong learning; knowledge generalization
Abstract: Starting from the background of the few-shot problem, this paper divides few-shot scenes into two types: the first pursues more specialized performance, while the other pursues more general performance. In the process of knowledge generalization, different scenes show a clear preference regarding the knowledge carrier they require. Based on this observation, few-shot learning (FSL) methods are further divided into two types according to the knowledge carrier: one uses procedural knowledge, and the other uses declarative knowledge. Few-shot reinforcement learning (FS-RL) algorithms are then discussed under this classification. Finally, possible development directions are proposed from both theoretical and application perspectives, in the hope of providing insights for subsequent research.




Last Update: 2022-03-15