KALM: Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts


NeurIPS 2024

We introduce KALM, a method that leverages the knowledge within pre-trained LLMs in the form of imaginary rollouts for novel text goals. (Left) Goal: Use the green ball as the nucleus of the circle, arranging the rest around it. (Middle) Goal: Employ the gripper tool to pick up the desired object and move it to the intended location, notwithstanding the hindrance of a wall in the path. (Right) Goal: Position the gripper to reach the target area, with awareness of the wall obstructing the path.

Abstract

Reinforcement learning (RL) traditionally trains agents on interaction data, which limits their capabilities to the scope of the training data. Leveraging knowledge from large language models (LLMs) is a promising way to create more knowledgeable agents. Despite various attempts to combine LLMs with RL, there is commonly a semantic gap between action signals and LLM tokens that hinders their integration. This paper introduces a novel approach, KALM (Knowledgeable Agents from Language Model Rollouts), to learn knowledgeable agents by bridging this gap. KALM extracts knowledge from LLMs in the form of imaginary rollouts, from which agents can learn through offline RL. To overcome the limitation that LLMs are inherently text-based and may be incompatible with numerical environmental data, KALM fine-tunes the LLM to perform bidirectional translation between textual goals and rollouts. This process helps the LLM understand the environment better, facilitating the generation of meaningful rollouts. Experiments on robotic manipulation tasks demonstrate that KALM allows agents to rephrase complex goals and tackle novel tasks requiring new optimal behaviors. KALM achieves a 46% success rate in completing 1,400 varied novel goals, significantly outperforming the 26% success rate of baseline methods.

Overview of KALM Method

KALM consists of three main steps: (1) LLM grounding, which enables the LLM to understand the elements of the environment; in this phase, KALM fine-tunes the LLM on rollout-goal pairs collected from the environment, and the LLM's architecture is modified to process and output non-textual rollout data (states and actions) in addition to text. (2) Rollout generation, which produces imaginary rollouts for novel skills by prompting the fine-tuned LLM with various text goals. (3) Skill acquisition, which trains the policy on both real data and imaginary rollouts with offline RL.
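Below is a minimal Python sketch of this three-step pipeline. It is illustrative only: the interfaces train_step, generate_rollout, and offline_update, as well as the batch sizes and step counts, are hypothetical placeholders and not the authors' released code.

import random

# Step 1: LLM grounding. Fine-tune the LLM on (goal text, rollout) pairs in
# both directions, so it learns to translate textual goals into state-action
# rollouts and rollouts back into goal descriptions.
def ground_llm(llm, goal_rollout_pairs):
    for goal_text, rollout in goal_rollout_pairs:
        llm.train_step(source=goal_text, target=rollout)   # goal -> rollout
        llm.train_step(source=rollout, target=goal_text)   # rollout -> goal
    return llm

# Step 2: Rollout generation. Prompt the grounded LLM with novel text goals
# to obtain imaginary rollouts for skills absent from the offline dataset.
def generate_imaginary_rollouts(llm, novel_goals):
    return [llm.generate_rollout(goal) for goal in novel_goals]

# Step 3: Skill acquisition. Train the policy with offline RL on the union of
# real rollouts and LLM-imagined rollouts.
def acquire_skills(policy, real_rollouts, imaginary_rollouts, steps=10000):
    data = list(real_rollouts) + list(imaginary_rollouts)
    for _ in range(steps):
        batch = random.sample(data, k=min(256, len(data)))
        policy.offline_update(batch)   # e.g. a CQL / IQL / TD3+BC update
    return policy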

Main Results

KALM is compatible with a wide range of offline RL algorithms, enabling agents to acquire novel skills beyond the pre-collected environment data.
KALM enables the LLM to imagine rollouts it has never seen before. In (a), the LLM is never trained on data containing a "wall," yet it raises the robot arm to avoid colliding with the wall. In (b), the LLM is trained only on goals like "move blue ball to the left of red ball," yet it generates rollouts for novel rearrangement goals such as arranging the balls in a circle.
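As a rough illustration of this algorithm-agnostic usage, the sketch below merges real and imagined transitions into one offline dataset and applies a single gradient update to a policy network. The behavior-cloning loss is only a stand-in for whichever offline RL objective (e.g., TD3+BC, CQL, IQL) is plugged in, and all names and dimensions are assumptions rather than the released code.

import numpy as np
import torch
import torch.nn as nn

def merge_transitions(real, imaginary):
    # Concatenate (state, action) transitions from the pre-collected dataset
    # and the LLM-generated imaginary rollouts into one offline dataset.
    states = np.concatenate([real["states"], imaginary["states"]], axis=0)
    actions = np.concatenate([real["actions"], imaginary["actions"]], axis=0)
    return (torch.as_tensor(states, dtype=torch.float32),
            torch.as_tensor(actions, dtype=torch.float32))

def offline_update(policy, optimizer, states, actions):
    # One offline update on the merged data; a simple behavior-cloning
    # regression loss stands in for the chosen offline RL objective.
    loss = nn.functional.mse_loss(policy(states), actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with a small MLP policy (obs_dim and act_dim are placeholders):
# policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))
# optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
# states, actions = merge_transitions(real_data, imaginary_data)
# offline_update(policy, optimizer, states, actions)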

Takeaways

1. LLMs can generalize to novel execution rollouts after being fine-tuned on low-level control data.
2. Utilizing LLM knowledge in the form of imaginary rollouts is more effective than using LLMs directly as policies for control.

BibTeX

@inproceedings{KALM,
 title={KALM: Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts},
 author={Jing-Cheng Pang and Si-Hang Yang and Kaiyuan Li and Jiaji Zhang and Xiong-Hui Chen and Nan Tang and Yang Yu},
 booktitle={The 38th Annual Conference on Neural Information Processing Systems},
 year={2024}
}