Real-time pixel-wise grasp affordance prediction based on multi-scale context information fusion

Yongxiang Wu (State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China)
Yili Fu (State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China)
Shuguo Wang (State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China)

Industrial Robot

ISSN: 0143-991X

Article publication date: 30 December 2021

Issue publication date: 11 February 2022

Abstract

Purpose

This paper aims to use a fully convolutional network (FCN) to predict pixel-wise antipodal grasp affordances for unknown objects and to improve grasp detection performance through multi-scale feature fusion.

Design/methodology/approach

A modified FCN is used as the backbone to extract pixel-wise features from the input image; these features are then fused with multi-scale context information gathered by a three-level pyramid pooling module to make more robust predictions. On top of the proposed unified feature-embedding framework, two head networks are designed to implement different grasp rotation prediction strategies (regression and classification), and their performances are evaluated and compared with a defined point metric. The regression network is further extended to predict grasp rectangles for comparison with previous methods and for real-world robotic grasping of unknown objects.
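
The following PyTorch code is a minimal, hypothetical sketch of the kind of architecture this paragraph describes: an FCN-style backbone, a three-level pyramid pooling module that gathers multi-scale context, and two interchangeable heads for grasp rotation (regression vs. classification). The backbone layers, channel sizes, pyramid scales and angle encoding are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPooling(nn.Module):
    """Pool features at several scales, project, upsample, and concatenate."""

    def __init__(self, in_ch, scales=(1, 2, 4)):
        super().__init__()
        branch_ch = in_ch // len(scales)
        self.out_ch = in_ch + branch_ch * len(scales)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(s),                  # one pyramid level
                nn.Conv2d(in_ch, branch_ch, kernel_size=1),
                nn.ReLU(inplace=True),
            )
            for s in scales
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        ctx = [
            F.interpolate(b(x), size=(h, w), mode="bilinear", align_corners=False)
            for b in self.branches
        ]
        return torch.cat([x, *ctx], dim=1)                # fuse local + context


class GraspFCN(nn.Module):
    def __init__(self, feat_ch=64, n_angle_bins=18, classify_angle=False):
        super().__init__()
        # Stand-in backbone; the paper uses a modified FCN here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.ppm = PyramidPooling(feat_ch)
        fused_ch = self.ppm.out_ch
        self.quality = nn.Conv2d(fused_ch, 1, 1)          # pixel-wise affordance
        if classify_angle:
            # classification strategy: one logit per discretised rotation bin
            self.rotation = nn.Conv2d(fused_ch, n_angle_bins, 1)
        else:
            # regression strategy: (cos 2theta, sin 2theta), a common encoding
            # for antipodal grasps, which are symmetric under 180-degree rotation
            self.rotation = nn.Conv2d(fused_ch, 2, 1)

    def forward(self, img):
        feat = self.ppm(self.backbone(img))
        return torch.sigmoid(self.quality(feat)), self.rotation(feat)


model = GraspFCN()
quality, rotation = model(torch.randn(1, 3, 224, 224))
print(quality.shape, rotation.shape)  # (1, 1, 224, 224) and (1, 2, 224, 224)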

Findings

The ablation study of the pyramid pooling module shows that multi-scale information fusion significantly improves model performance. The regression approach outperforms the classification approach built on the same feature-embedding framework on two data sets. The regression network achieves state-of-the-art accuracy (up to 98.9%) and speed (4 ms per image), as well as high success rates (97% for household objects, 94.4% for adversarial objects and 95.3% for objects in clutter) in the unknown-object grasping experiments.
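
The abstract does not reproduce the paper's point metric. In this literature, a point-style metric typically counts a predicted grasp as correct when its grasp point lies within a fixed pixel radius of a ground-truth point and the rotation is within an angular tolerance. The sketch below is purely illustrative, with hypothetical thresholds, and does not claim to reproduce the metric defined in the paper.

import math


def point_metric_correct(pred, ground_truths,
                         dist_thresh=5.0, angle_thresh=math.radians(30)):
    """pred and each ground truth are (x, y, theta) tuples, theta in radians."""
    px, py, pth = pred
    for gx, gy, gth in ground_truths:
        close = math.hypot(px - gx, py - gy) <= dist_thresh
        # antipodal grasps are symmetric under a 180-degree rotation,
        # so compare angles modulo pi
        dth = abs((pth - gth + math.pi / 2) % math.pi - math.pi / 2)
        if close and dth <= angle_thresh:
            return True
    return False

# e.g. point_metric_correct((100, 120, 0.30), [(102, 118, 0.25)]) -> True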

Originality/value

A novel pixel-wise grasp affordance prediction network based on multi-scale feature fusion is proposed to improve grasp detection performance. Two prediction approaches are formulated and compared within the proposed framework. The method achieves excellent performance on three benchmark data sets and in real-world robotic grasping experiments.

Citation

Wu, Y., Fu, Y. and Wang, S. (2022), "Real-time pixel-wise grasp affordance prediction based on multi-scale context information fusion", Industrial Robot, Vol. 49 No. 2, pp. 368-381. https://doi.org/10.1108/IR-06-2021-0118

Publisher: Emerald Publishing Limited

Copyright © 2021, Emerald Publishing Limited
