TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach

Weikun Peng1, Jun Lv2, Yuwei Zeng1, Haonan Chen3, Siheng Zhao3, Jichen Sun2, Cewu Lu2, Lin Shao1,✝
1National University of Singapore, 2Shanghai Jiao Tong University, 3Nanjing University
✝Corresponding author

CoRL 2024 (Oral)

Abstract

The tie-knotting task is highly challenging due to the tie's high deformability and the long-horizon manipulation actions it requires. This work presents TieBot, a Real-to-Sim-to-Real learning-from-visual-demonstration system for robots to learn to knot a tie. We introduce the Hierarchical Feature Matching approach to estimate a sequence of the tie's meshes from the demonstration video. With these estimated meshes used as subgoals, we first learn a teacher policy using privileged information. Then, we learn a student policy with point cloud observations by imitating the teacher policy. Lastly, our pipeline learns a residual policy when the learned policy is applied to real-world execution, mitigating the Sim2Real gap. We demonstrate the effectiveness of TieBot in simulation and the real world. In the real-world experiment, a dual-arm robot successfully knots a tie, achieving a 50% success rate over 10 trials.

Video Summary

Pipeline Overview

Given a mesh model of the tie, our pipeline first applies local feature matching and keypoint detection to estimate the mesh of the tie throughout the human demonstration. Then, we use RL to learn a teacher policy that selects grasping points for the robots, and train a student policy to imitate the teacher policy. Finally, we learn a residual policy when deploying to real robots.

Pipeline overview.
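The final-stage action composition can be summarized as the student's action plus a learned correction. Below is a minimal PyTorch sketch of this residual composition; the network sizes, the flat observation vector, and all names are illustrative assumptions rather than the paper's implementation (the actual student policy consumes point clouds).

```python
import torch
import torch.nn as nn

class ResidualPolicy(nn.Module):
    """Hypothetical residual network; layer sizes are illustrative assumptions."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def corrected_action(student: nn.Module, residual: ResidualPolicy,
                     obs: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        base = student(obs)      # student policy trained in simulation (frozen)
    return base + residual(obs)  # residual correction learned on the real robot

# Example with stand-in dimensions and a dummy student network.
student = nn.Linear(64, 7)
residual = ResidualPolicy(obs_dim=64, act_dim=7)
action = corrected_action(student, residual, torch.randn(1, 64))
```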

Human Demonstration

Human demonstration of the first tie-knotting task.

Human demonstration of the second tie-knotting task.

Human demonstration of the towel-folding task.

Local Feature Matching

We use LoFTR to establish feature matches between consecutive frames. Here are some example matching results; a minimal usage sketch follows the examples below.

First tie-knotting task

Second tie-knotting task

Towel-folding task
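As a point of reference, here is a minimal sketch of running LoFTR on two consecutive frames via the kornia implementation. The choice of the pretrained "outdoor" weights and the input resolution are assumptions, and the authors' preprocessing may differ.

```python
import torch
import kornia.feature as KF

def match_consecutive_frames(img0: torch.Tensor, img1: torch.Tensor):
    """Match two grayscale frames in [0, 1] with shape (1, 1, H, W)."""
    matcher = KF.LoFTR(pretrained="outdoor")  # weight choice is an assumption
    matcher.eval()
    with torch.no_grad():
        out = matcher({"image0": img0, "image1": img1})
    # Pixel coordinates of matched keypoints in each frame, plus confidences.
    return out["keypoints0"], out["keypoints1"], out["confidence"]

# Example with random stand-ins for two consecutive video frames.
kpts0, kpts1, conf = match_consecutive_frames(
    torch.rand(1, 1, 480, 640), torch.rand(1, 1, 480, 640)
)
```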

Keypoint Detection

In our Real2Sim pipeline, we use the previously estimated results to train a keypoint detection model. Here we illustrate several detection results, followed by an illustrative model sketch. The blue arrows show the predicted z-axis, and the green arrows show the predicted x-axis.

First tie-knotting task

Second tie-knotting task
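Since the detector architecture is not spelled out here, the following is only an illustrative PyTorch sketch of a head that predicts keypoint heatmaps together with per-keypoint z- and x-axis directions (the blue and green arrows above); all layer sizes and names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointAxisHead(nn.Module):
    """Hypothetical head: per-keypoint heatmaps plus z/x-axis directions."""
    def __init__(self, in_channels: int, num_keypoints: int):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.heatmap = nn.Conv2d(in_channels, num_keypoints, kernel_size=1)
        # 6 values per keypoint: a 3D z-axis and a 3D x-axis direction.
        self.axes = nn.Conv2d(in_channels, num_keypoints * 6, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        heat = self.heatmap(feats)                       # (B, K, H, W)
        axes = self.axes(feats)                          # (B, 6K, H, W)
        B, _, H, W = axes.shape
        axes = axes.view(B, self.num_keypoints, 2, 3, H, W)
        axes = F.normalize(axes, dim=3)                  # unit-length directions
        return heat, axes  # axes[:, :, 0] = z-axis, axes[:, :, 1] = x-axis

# Example on dummy backbone features.
head = KeypointAxisHead(in_channels=256, num_keypoints=8)
heat, axes = head(torch.randn(2, 256, 32, 32))
```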

Real2Sim Results

Here we show several meshes estimated by our Real2Sim pipeline, followed by a sketch of a simple evaluation metric. The first and third columns show point clouds extracted from the human demonstration videos; the second and fourth columns show the corresponding estimated meshes.

First tie-knotting task

Second tie-knotting task

Towel-folding task
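One simple way to sanity-check an estimated mesh against the observed point cloud is a one-directional Chamfer-style distance from observed points to mesh vertices. The sketch below is an illustrative metric of our own choosing, not necessarily the evaluation used in the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_to_mesh(observed_points: np.ndarray,
                    mesh_vertices: np.ndarray) -> float:
    """Mean distance from each observed point to its nearest mesh vertex."""
    tree = cKDTree(mesh_vertices)
    dists, _ = tree.query(observed_points)  # nearest-neighbor distances
    return float(dists.mean())

# Example with random stand-in data (N x 3 arrays of 3D points).
points = np.random.rand(2048, 3)
verts = np.random.rand(1024, 3)
print(chamfer_to_mesh(points, verts))
```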

Real-World Experiment

Tie-Knotting Task

Towel-Folding Task

Acknowledgement

The authors would like to thank Zihao Xu from the National University of Singapore for setting up the real-world tie-knotting experiment, Zhixuan Xu and Haoyu Zhou from the National University of Singapore for their support with computational resources, and Hongjie Fang from Shanghai Jiao Tong University as well as Flexiv Robotics for their help with the real-world towel-folding experiment.