Xuan Huang 2257380739@qq.com
School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
Sequential multi-step fabric folding is a challenging problem in robotic manipulation, requiring a robot to perceive the fabric state and plan a sequence of actions leading to the desired goal. Existing approaches suffer from low data efficiency, weak cross-task generalization, and high deployment costs, making them impractical for industrial applications. We therefore present Fold-Llama, a novel method for robotic fabric folding that combines geometry-to-text encoding with lightweight large language models (LLMs). Fold-Llama explicitly encodes human folding strategies as structured geometric operations, covering 10 common geometric operation strategies; DeepSeek-R1 then generates 2,640 structured data pairs from these strategies for Low-Rank Adaptation (LoRA) fine-tuning of the Llama-3.2-3B-Instruct model. The fine-tuned model combines general LLM reasoning with the 10 learned folding strategies, enabling it to generalize across diverse fabric configurations. This text-only synthesis and local deployment pipeline fundamentally shifts fabric manipulation learning from vision-based imitation to language-grounded geometric reasoning. We experimentally evaluate Fold-Llama on four representative sequential folding tasks and show that it significantly outperforms baseline LLM-based approaches in simulation, demonstrating strong data efficiency with only 2,640 text samples (approximately 6% of the 48K+ robotic interaction samples required by traditional methods) while achieving a 90.5% error reduction over the best LLM baseline. Furthermore, our approach transfers from simulation to the real world without additional training by directly calling the API of the model deployed on the vLLM platform. Although trained only on rectangular fabrics, our approach also generalizes to T-shirts and shorts.
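To make the geometry-to-text encoding and fine-tuning pipeline concrete, a minimal sketch follows. The data-pair fields, the coordinate convention, and the LoRA hyperparameters shown here are illustrative assumptions for exposition, not Fold-Llama's exact format.

```python
# Illustrative sketch only: one synthesized geometry-to-text training pair and a
# LoRA fine-tuning setup with Hugging Face PEFT. Field names, the normalized
# corner-coordinate convention, and all hyperparameters are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# A structured data pair: the fabric state is encoded as corner coordinates in
# text, and the target output is a single pick-and-place fold action.
sample = {
    "instruction": "Fold the fabric using the Double Triangle strategy.",
    "input": "corners: TL=(0.0,0.0) TR=(1.0,0.0) BR=(1.0,1.0) BL=(0.0,1.0)",
    "output": "pick=(0.0,0.0) place=(1.0,1.0)",  # fold top-left onto bottom-right
}

model_name = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Low-Rank Adaptation: base weights stay frozen; small rank-decomposition
# matrices injected into the attention projections are the only trainable part.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```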
Keywords: Fabric folding, Single-arm robotic systems, Geometric rule encoding, Low-Rank Adaptation (LoRA) fine-tuning, Large language models (LLMs)
[Table: Evaluation results of Fold-Llama and other methods across four tasks. Simulation experiments: All Corners Inward, Double Triangle, Corners Edges Inward, Double Straight. Physical experiments (stream output): All Corners Inward, Double Triangle, Corners Edges Inward, Short Fold.]
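Because vLLM exposes an OpenAI-compatible endpoint, the sim-to-real deployment described in the abstract reduces to serving the fine-tuned model and querying it over HTTP. The sketch below assumes a local server on port 8000 (e.g. started with `vllm serve <model>`); the served model name and the prompt format are hypothetical.

```python
# Illustrative sketch: querying a fine-tuned model served by vLLM's
# OpenAI-compatible endpoint. Host/port, model name, and prompt are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

state = "corners: TL=(0.02,0.01) TR=(0.98,0.00) BR=(1.00,0.99) BL=(0.01,1.00)"
response = client.chat.completions.create(
    model="fold-llama-3.2-3b",  # hypothetical name of the served model
    messages=[
        {"role": "system", "content": "You output one pick-and-place fold action."},
        {"role": "user", "content": f"Task: All Corners Inward. State: {state}"},
    ],
    temperature=0.0,  # deterministic action prediction
)
print(response.choices[0].message.content)  # e.g. "pick=(0.02,0.01) place=(0.50,0.50)"
```

The same client call works unchanged in simulation and on the physical robot, which is what makes the zero-additional-training transfer possible: only the perception module that produces the textual state and the controller that executes the returned action differ between the two settings.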