Random Bin Picking of Unknown Objects: Development of a Modular System Architecture using Synthetic Datasets and Deep Learning

Author:

Zhang, Hui

Abstract:

Grasping is a core task for robotic systems and is widely used in manufacturing, medicine, warehousing, and exploratory scientific research. However, robotic bin picking of unknown objects remains challenging due to limitations in robotic perception, situational awareness, and control. This dissertation proposes a grasping system architecture for random bin picking of unknown objects. The architecture consists of three modules: grasp simulation, neural network, and grasp execution. It collects large synthetic datasets containing millions of grasp examples for various grippers in simulation, and trains neural networks to learn the grasp principles of the applied gripper so that feasible grasp poses can be detected during real-world grasping. Visual and force-torque feedback are both integrated into the closed-loop control to optimize real-world grasping trials. One of the main contributions of this research is a versatile grasp simulator that can simulate grasping trials for different grippers. The simulator collects synthetic grasp examples that are used to develop grasping methods for real-world use. When a complex gripper is simulated, the contact model and force-torque wrenches are tracked by a set of deformable sub-surfaces that behave according to human-designed principles, an approach that outperforms many existing grasp simulators. The simulator can render grasp scenes with stacked objects, rather than the isolated objects used in many state-of-the-art methods, so its synthetic grasp scenes more closely resemble real-world cases. This research develops neural networks that evaluate grasp parameters and grasp feasibility for random picking with different grippers, ranging from a lightweight 7-layer CNN to a 52-layer deep neural network.
The architecture and complexity of the adopted neural network mainly depend on the characteristics and grasp parameters of the applied gripper. The effects of network architecture, depth, kernel size, and other parameters are investigated. The developed neural networks may take the form of a traditional linear regression model, a deep topology with gripping attention modules, an encoder-decoder structure, and so on. A suitable neural network is selected by taking into account the characteristics of each investigated gripper, the prediction error of the neural network, and the computational complexity of the desired grasping method. Furthermore, the proposed system architecture provides closed-loop grasping methods in both SE(2), with a top-down grasp direction, and SE(3), with flexible grasp directions, depending on the gripper and robot. Various benchmarking experiments are conducted in both simulation and the real world, including grasp quality estimation for isolated objects and dense clutter, and short-term and long-term picking tasks. In these experiments, the proposed grasping methods achieve higher grasp success rates than state-of-the-art grasping methods. In addition, a novel nine-level criterion is presented to define the difficulty of picking tasks, considering the complexity and distribution of the target objects. Unlike the informal descriptions common in published work and open-source grasping datasets, such as "a random clutter", "a set of stacked objects", or "objects with random poses", explicit equations are defined to compute the objects' complexity and distribution. New objects and clutters can be defined using the same principles for experiments with other grippers, providing a relatively fair evaluation of a grasping method during benchmarking tests.
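
To illustrate the two pose spaces named in the abstract, the following minimal sketch embeds a planar top-down SE(2) grasp (x, y, theta) as a full SE(3) homogeneous transform. It is not taken from the dissertation: the function name `se2_to_se3`, the fixed grasp height `z`, and the convention that the gripper approaches along its own z-axis pointing straight down are all assumptions made for illustration.

```python
import math


def se2_to_se3(x, y, theta, z=0.0):
    """Embed a top-down planar grasp as a 4x4 homogeneous transform.

    Assumed convention (hypothetical, not from the dissertation):
    the gripper approaches along its own z-axis, which points along
    world -z, and theta is the in-plane rotation about world z.
    The rotation is therefore Rz(theta) @ Rx(pi).
    """
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c,   s,   0.0, x],
        [s,  -c,   0.0, y],
        [0.0, 0.0, -1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def approach_axis(transform):
    """Return the gripper z-axis (third rotation column) in world coordinates."""
    return [transform[i][2] for i in range(3)]


if __name__ == "__main__":
    grasp = se2_to_se3(0.10, 0.20, math.pi / 2, z=0.05)
    for row in grasp:
        print(row)
    print("approach axis:", approach_axis(grasp))
```

An SE(2) grasp fixes the approach axis, leaving three free parameters, whereas a general SE(3) grasp would replace the fixed rotation above with an arbitrary rotation matrix and so allow the flexible grasp directions the abstract mentions.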