Xuan Huang 2257380739@qq.com
School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
Sequential multi-step fabric folding is a challenging problem in robotic manipulation, requiring a robot to perceive the fabric state and plan a sequence of actions leading to the desired goal. Existing approaches suffer from low data efficiency, weak cross-task generalization, and high deployment costs, making them impractical for industrial applications. We therefore present Fold-Llama, a novel method for robotic fabric folding that combines geometry-to-text encoding with lightweight large language models (LLMs). Fold-Llama explicitly encodes human folding strategies as structured geometric operations, covering 10 common geometric operation strategies; DeepSeek-R1 then generates 2,640 structured data pairs from these strategies for Low-Rank Adaptation (LoRA) fine-tuning of the Llama-3.2-3B-Instruct model. The fine-tuned model combines general LLM reasoning with the 10 learned folding strategies, enabling it to generalize across diverse fabric configurations. This text-only synthesis and local deployment pipeline fundamentally shifts fabric manipulation learning from vision-based imitation to language-grounded geometric reasoning. We experimentally evaluate Fold-Llama on four representative sequential folding tasks and show that it significantly outperforms baseline LLM-based approaches in simulation, demonstrating strong data efficiency with only 2,640 text samples (approximately 6% of the 48K+ robotic interaction samples required by traditional methods) while achieving a 90.5% error reduction over the best LLM baseline. Furthermore, our approach transfers from simulation to the real world without additional training by directly calling the API of the model deployed on the vLLM platform. Although trained only on rectangular fabrics, our approach also generalizes to T-shirts and shorts.
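To make the geometry-to-text encoding and fine-tuning pipeline concrete, a minimal sketch follows. The data-pair fields, the coordinate convention, and the LoRA hyperparameters shown here are illustrative assumptions for exposition, not Fold-Llama's exact format.

```python
# Illustrative sketch only: one synthesized geometry-to-text training pair and a
# LoRA fine-tuning setup with Hugging Face PEFT. Field names, the normalized
# corner-coordinate convention, and all hyperparameters are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# A structured data pair: the fabric state is encoded as corner coordinates in
# text, and the target output is a single pick-and-place fold action.
sample = {
    "instruction": "Fold the fabric using the Double Triangle strategy.",
    "input": "corners: TL=(0.0,0.0) TR=(1.0,0.0) BR=(1.0,1.0) BL=(0.0,1.0)",
    "output": "pick=(0.0,0.0) place=(1.0,1.0)",  # fold top-left onto bottom-right
}

model_name = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Low-Rank Adaptation: base weights stay frozen; small rank-decomposition
# matrices injected into the attention projections are the only trainable part.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```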
Keywords: Fabric folding, Single-arm robotic systems, Geometric rule encoding, Low-Rank Adaptation (LoRA) fine-tuning, Large language models (LLMs)
[Table: Evaluation results of Fold-Llama and other methods across four tasks. Simulation experiments: All Corners Inward, Double Triangle, Corners Edges Inward, Double Straight. Physical experiments (stream output): All Corners Inward, Double Triangle, Corners Edges Inward, Short Fold.]
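Because vLLM exposes an OpenAI-compatible endpoint, the sim-to-real deployment described in the abstract reduces to serving the fine-tuned model and querying it over HTTP. The sketch below assumes a local server on port 8000 (e.g. started with `vllm serve <model>`); the served model name and the prompt format are hypothetical.

```python
# Illustrative sketch: querying a fine-tuned model served by vLLM's
# OpenAI-compatible endpoint. Host/port, model name, and prompt are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

state = "corners: TL=(0.02,0.01) TR=(0.98,0.00) BR=(1.00,0.99) BL=(0.01,1.00)"
response = client.chat.completions.create(
    model="fold-llama-3.2-3b",  # hypothetical name of the served model
    messages=[
        {"role": "system", "content": "You output one pick-and-place fold action."},
        {"role": "user", "content": f"Task: All Corners Inward. State: {state}"},
    ],
    temperature=0.0,  # deterministic action prediction
)
print(response.choices[0].message.content)  # e.g. "pick=(0.02,0.01) place=(0.50,0.50)"
```

The same client call works unchanged in simulation and on the physical robot, which is what makes the zero-additional-training transfer possible: only the perception module that produces the textual state and the controller that executes the returned action differ between the two settings.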