How can one learn offline meta-RL through quick-read notes on classic papers?
Contents:
(MAML) Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks [ICML 2017]
(MACAW) Offline Meta-Reinforcement Learning with Advantage Weighting [ICML 2021]
(PEARL) Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables [ICML 2019]
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
(BOReL) Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies [NeurIPS 2021]
(MBML) Multi-task Batch Reinforcement Learning with Metric Learning [NeurIPS 2020]
Meta-Q-Learning [ICLR 2020]
FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [ICLR 2021]
See also: offline meta-RL | quick-read notes on recent work
(MAML) Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks [ICML 2017]
arxiv:https://arxiv.org/abs/1703.03400
pdf:https://arxiv.org/pdf/1703.03400
html:https://ar5iv.labs.arxiv.org/html/1703.03400
Source: the classic meta-RL work, MAML.
Reference blog post: quick-read paper notes | 2025.04 - MAML
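As a refresher on the MAML recipe itself, here is a minimal sketch of the inner/outer loop on a toy sine-wave regression problem (not the paper's RL experiments); the network size, learning rates, and the sine task family are illustrative choices of mine:

```python
import torch

def net(params, x):
    # Tiny 2-layer MLP with explicitly managed parameters,
    # so the inner-loop update stays differentiable.
    w1, b1, w2, b2 = params
    return torch.tanh(x @ w1 + b1) @ w2 + b2

def sample_task():
    # Toy task family: sine waves with random amplitude and phase.
    amp, phase = torch.rand(1) * 4 + 0.1, torch.rand(1) * 3.14
    def draw(n=10):
        x = torch.rand(n, 1) * 10 - 5
        return x, amp * torch.sin(x + phase)
    return draw

params = [torch.randn(1, 40) * 0.1, torch.zeros(40),
          torch.randn(40, 1) * 0.1, torch.zeros(1)]
for p in params:
    p.requires_grad_()
meta_opt, inner_lr = torch.optim.Adam(params, lr=1e-3), 0.01

for step in range(1000):
    meta_loss = 0.0
    for _ in range(4):  # tasks per meta-batch
        draw = sample_task()
        (x_s, y_s), (x_q, y_q) = draw(), draw()  # support / query sets
        # Inner loop: one SGD step on the support set; create_graph=True
        # keeps the update differentiable for the outer loop.
        loss_s = ((net(params, x_s) - y_s) ** 2).mean()
        grads = torch.autograd.grad(loss_s, params, create_graph=True)
        fast = [p - inner_lr * g for p, g in zip(params, grads)]
        # Outer loop: evaluate the adapted ("fast") params on the query set.
        meta_loss = meta_loss + ((net(fast, x_q) - y_q) ** 2).mean()
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```

The create_graph=True in the inner step is the crux: it lets the outer (meta) gradient flow through the inner adaptation, which is exactly the machinery MACAW below reuses with an offline-friendly inner loss.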
(MACAW) Offline Meta-Reinforcement Learning with Advantage Weighting [ICML 2021]
arxiv:https://arxiv.org/abs/2008.06043
pdf:https://arxiv.org/pdf/2008.06043
html:https://ar5iv.labs.arxiv.org/html/2008.06043
project page: https://sites.google.com/view/macaw-metarl
Source: this paper seems to be the one that proposed "offline MAML" (ICML 2021).
Main content:
This paper formally defines the offline meta-RL setting: an offline multi-task dataset for meta-training, plus a small amount of offline data from a new task (fewer than 5 trajectories) for adapting to it.
Method: an enhanced AWR (an offline RL algorithm) + MAML.
Inner-loop swap: replace MAML's policy gradient with AWR, making it work in the offline setting;
More expressive loss: naive MAML + AWR fails because the AWR gradient carries too little information. MACAW adds an advantage-regression head so that the gradient encodes both "what the action should be" and "how large the advantage is" (see the sketch after this list); [I haven't fully understood this yet]
Architecture upgrade: weight-transform layers that break past the "rank-1 update" limitation of plain MLPs, giving the inner loop more capacity. [I haven't fully understood this yet]
Experiment environments: MuJoCo cheetah-direction, cheetah-velocity, walker-params, and ant-direction.
I'm curious how the baselines were set up, i.e., how meta-BC and multi-task offline RL with fine-tuning are implemented.
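A minimal sketch of how I read the MACAW inner-loop policy objective: an AWR-style advantage-weighted regression term, plus an auxiliary regression term from the extra advantage head. Everything here (the PolicyWithAdvHead module, the fixed-scale Gaussian policy, the temperature lam, and the weight clipping) is my own illustrative assumption, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class PolicyWithAdvHead(nn.Module):
    """Gaussian policy plus an advantage-regression head (illustrative design)."""
    def __init__(self, s_dim, a_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(s_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, a_dim)       # action mean
        self.adv = nn.Linear(hidden + a_dim, 1)  # predicted advantage

    def forward(self, s, a):
        h = self.body(s)
        log_prob = torch.distributions.Normal(self.mu(h), 1.0).log_prob(a).sum(-1)
        adv_pred = self.adv(torch.cat([h, a], dim=-1)).squeeze(-1)
        return log_prob, adv_pred

def macaw_inner_policy_loss(policy, s, a, advantages, lam=1.0):
    # AWR term: regress onto dataset actions, weighted by exp(A / lam).
    # On its own, its gradient only says "move toward this action, this much";
    # the advantage-regression term forces the gradient to also carry
    # "how good this action is", which is the extra information MACAW needs.
    log_prob, adv_pred = policy(s, a)
    weights = torch.exp(advantages / lam).clamp(max=20.0).detach()
    awr_loss = -(weights * log_prob).mean()
    adv_loss = ((adv_pred - advantages) ** 2).mean()
    return awr_loss + adv_loss

# Toy usage with random offline data (shapes only, no real environment):
policy = PolicyWithAdvHead(s_dim=4, a_dim=2)
s, a, advantages = torch.randn(32, 4), torch.randn(32, 2), torch.randn(32)
loss = macaw_inner_policy_loss(policy, s, a, advantages)
loss.backward()
```

In the full method, a loss of this shape would sit inside the MAML inner loop, applied to the new task's few offline trajectories, with the value function that produces the advantages trained by a separate regression.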
