SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning (paper from July 12, 2023)

Abstract:

Large language models (LLMs) have demonstrated impressive results in developing generalist planning agents for diverse tasks. However, grounding these plans in expansive, multi-floor, and multi-room environments presents a significant challenge for robotics. We introduce SayPlan, a scalable approach to LLM-based, large-scale task planning for robotics using 3D scene graph (3DSG) representations. To ensure the scalability of our approach, we: (1) exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a semantic search for task-relevant subgraphs from a smaller, collapsed representation of the full graph; (2) reduce the planning horizon for the LLM by integrating a classical path planner; and (3) introduce an iterative replanning pipeline that refines the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures. We evaluate our approach on two large-scale environments spanning up to 3 floors, 36 rooms, and 140 objects, and show that our approach is capable of grounding large-scale, long-horizon task plans from abstract, natural language instructions for a mobile manipulator robot to execute. We provide real robot video demonstrations and code on our project page sayplan.github.io.
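The core ideas in points (1) and (3) can be sketched in a few lines of code. The snippet below is a toy illustration based only on the abstract, not the authors' implementation: the graph layout, the `expand`/`collapsed_view` operations, and the simulator feedback loop are all simplified assumptions standing in for the LLM-driven search and the scene graph simulator.

```python
# Toy sketch of (1) a collapsible scene-graph hierarchy that a planner
# expands only where task-relevant, and (3) an iterative replanning loop
# driven by simulator feedback. All names here are illustrative
# assumptions, not SayPlan's actual API.

class SceneGraph:
    def __init__(self):
        # A tiny 3-level hierarchy: floor -> rooms, room -> objects.
        self.children = {
            "floor1": ["kitchen", "office"],
            "kitchen": ["mug", "fridge"],
            "office": ["stapler"],
        }
        self.expanded = set()  # rooms whose contents are currently visible

    def collapsed_view(self):
        """Return only the floor/room level, plus contents of expanded rooms.

        This keeps the representation handed to the LLM small, as in the
        paper's collapsed-graph idea."""
        view = {"floor1": self.children["floor1"]}
        for room in self.expanded:
            view[room] = self.children[room]
        return view

    def expand(self, room):
        self.expanded.add(room)


def semantic_search(graph, task_keyword):
    """Expand rooms one at a time until a task-relevant object appears.

    Stands in for the LLM's semantic expand/collapse search over the
    collapsed graph."""
    for room in graph.children["floor1"]:
        graph.expand(room)
        view = graph.collapsed_view()
        if any(task_keyword in objs for objs in view.values()):
            return view
    return graph.collapsed_view()


def iterative_replan(plan, simulate, max_iters=5):
    """Refine a plan until a simulator reports it feasible.

    `simulate` returns (ok, feedback); the naive repair here just appends
    the suggested missing action."""
    for _ in range(max_iters):
        ok, feedback = simulate(plan)
        if ok:
            return plan
        plan = plan + [feedback]
    return plan
```

As a usage example, searching for "mug" expands only the kitchen, leaving the office collapsed, and a simulator that flags a missing "open fridge" precondition causes the replanner to patch the plan with that action.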

paper: https://arxiv.org/pdf/2307.06135.pdf

Video: https://cdn-uploads.huggingface.co/production/uploads/6258561f4d4291e8e63d8ae6/d_U_pzeCoJ2dTcBWz6n0r.mp4
