Paper speed-reading notes, revised March 2026: what are the open questions?

Contents:
On the Role of Iterative Computation in Reinforcement Learning
WileReward: Learning Reward Models from In-the-Wild Human Interactions
Can We Really Learn One Representation to Optimize All Rewards?
The Magic Correlations: Understanding Knowledge Transfer from Pretraining to Supervised Fine-Tuning
Improving Interactive In-Context Learning from Natural Language Feedback
Learning to Learn with Contrastive Meta-Objective
Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences
MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery
auto-curriculum learning (Jiang et al., 2021b)
Meta-Motivo (Tirinzoni et al., 2025), zero-shot goal-conditioned RL
Unsupervised Skill Discovery via Recurrent Skill Training
Learning to Discover Skills through Guidance
One After Another: Learning Incremental Skills for a Changing World
Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching
Horizon Generalization in Reinforcement Learning
HIQL: Offline Goal-Conditioned RL with Latent States as Actions
Contrastive Preference Learning: Learning from Human Feedback without RL
Few is More: Task-Efficient Skill-Discovery for Multi-Task Offline Multi-Agent Reinforcement Learning
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback
Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset
Data Center Cooling System Optimization Using Offline Reinforcement Learning
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment
Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning
Thinkless: LLM Learns When to Think
Learning to Reason wi