Planning for grasping cluttered objects based on obstruction degree

Grasping objects in clutter is more difficult than grasping a single, isolated object. One important issue is that unsafe grasps may occur when one object sits or leans on another, which could cause a collapse of the pile. In addition, the reachability of each object surrounded by other obstacles also has to be considered. The order in which multiple objects are grasped and the grasp configuration of each object must therefore be planned simultaneously. This article combines grasp order and grasp configuration planning to perform fast and safe multiobject grasping in cluttered scenes. First, a comprehensive grasp configuration database is built to provide enough feasible grasp configurations for the objects. Then, we propose an obstruction degree to estimate the likelihood that each grasp configuration, and each object, is reachable. This measure also implicitly infers object interactions. Finally, grasp order and grasp configurations are planned together to handle the constraints caused by reachability and object interaction. Simulations and experiments in a series of cluttered scenes demonstrate that our method grasps objects efficiently and greatly reduces unsafe grasps.


Introduction
Multiobject grasp planning in cluttered scenes is regarded as a key problem in the field of robot manipulation. Grasping multiple objects is constrained by reachability and object interaction. On the one hand, there are various grasp configurations for different objects, and each object is surrounded by others, so a reachable grasp configuration must be chosen for each object. On the other hand, objects will collapse if an object that supports others is removed first; objects may then be damaged or roll to unreachable places. Therefore, an appropriate grasp order and reachable grasp configurations for all objects are required to achieve safe and fast grasping in cluttered scenes.
Many researchers have addressed grasping known objects by dividing the whole process into two steps: offline grasp generation and online grasp planning. Shape primitives 1-3 are widely used in grasp generation: the target object is approximated by one or more shape primitives, each of which comes with an appropriate set of grasp configurations. Region masking [4][5][6] is also a popular methodology; it can quickly locate regions that have a high probability of containing high-quality grasp configurations. Tsuji et al. 7 combine the two methods, selecting the constricted region of the object as the grasp interest region and fitting the local model of the object near that region to grasp primitives. Li et al. 8 wrap ropes around the object to find possible grasp regions and then compute the contacts of a multifingered hand with the object surface around these regions. Wan et al. 9 apply superimposed segmentation to the object mesh model and use the uniform facets to locate contacts and generate grasp poses for grippers and suction cups. Grasp generation only provides possible grasp configurations; it does not decide the final configuration used to grasp an object in a cluttered scene.
Online grasp planning is usually used to decide how to grasp each object in clutter. One type is planning on the action level, which finds the best way to grasp one object in the current scene. Most studies choose some metric as an objective function to optimize grasp configurations in the scene. Berenson et al. 10 propose a grasp objective function that takes into account the kinematics of the robot, the environment around the object, and the grasp force-closure quality. Some studies are interested in planning a path that approaches the target while pushing away other obstacles [11][12][13] or in pushing the target itself to make it graspable. 14,15 The other type is planning on the task level, which finds an order in which to grasp the objects in the current scene. Stilman et al. 16 and Dogar and Srinivasa 17 consider occlusion between objects and plan how to move objects to reach a target object, but they cannot deal with complex object interactions such as stacking or slanting. Some studies work on recognizing complex object interactions, [18][19][20][21][22] which are used to decide a safe grasping order. Because concrete grasp configuration planning is neglected, however, the actual feasibility of the plan cannot be guaranteed.
In addition, much research addresses grasping unknown objects, which usually requires generating grasp configurations online. Lippiello et al. 23 reconstruct the object surface from camera images and move the fingertips on the surface to find an optimal grasp. Lei and Wisse 24 extract the object's concave hull contour from the point cloud and find a suitable grasp by maximizing a force-balance coefficient. Lin et al. 25 transfer example grasps taught by human demonstration to similar objects. Deep learning [26][27][28] is now becoming the prevailing approach to grasping problems. However, these methods usually consider scenes where objects are sparsely placed or where object collapse is not detrimental.
In summary, most current research on multiobject grasp planning separates the task level and the action level or involves only one of the two. We aim to combine the two planning levels. Concretely, we focus on a task in which a robotic arm with a two-finger parallel gripper is commanded to grasp all objects on a table. The order of objects to grasp and a reachable grasp configuration for each object are planned simultaneously, so that the risk of object collapse is reduced and reachable grasp configurations are found quickly. We make three assumptions: (1) object three-dimensional (3D) models are known; (2) identifications (IDs) and poses of visible objects can be recognized; and (3) no object acts as a container, that is, no object is inside another object.
Grasp planning based on obstruction degree (OD) is proposed to realize a safe grasp order and fast grasp configuration search. Unlike previous works, we do not infer object interactions explicitly using complex algorithms. Our method is simple to implement and highly efficient in planning.
The structure of this article is as follows: the second section gives an overview of the overall planning framework; the third section describes the method used to generate grasp configurations offline; the fourth section proposes the OD computation and the online planning algorithms; the fifth section evaluates our planning framework through simulations and experiments; and the sixth section concludes and discusses future work.

Grasp planning framework
The whole planning framework is shown in Figure 1. Grasp generation is computed once offline and provides a grasp database with widely distributed grasp center points and grasp approach directions. Grasp planning is performed online to decide the grasp order and grasp configurations for all objects based on ODs.

Offline grasp generation
Region masking extracts the object's mean curvature skeleton SK and samples interested vertices according to their connectivity to neighboring vertices. Hypotheses generation then searches a spherical grid built on each interested vertex to find collision-free grasp hypotheses H. Next, grasp stability analysis formulates a grasp quality function to filter unstable grasps from the generated hypotheses, and the remaining grasps are stored in the grasp database B.

Online grasp planning
Grasp candidates selection selects from the database B the possible grasp candidates B′ that may be reached by the robot in the current scene. Then, obstruction analysis determines ODs between grasp configurations and objects, as well as between pairs of objects, in the current scene. Next, grasp order planning plans the order X_n in which to grasp all objects, minimizing the total OD. Finally, grasp configuration planning searches for reachable grasp configurations Y_n for all objects.

Grasp parameterization
A grasp is commonly parameterized by gripper-specific information and the gripper pose. The following parameters define a grasp configuration P for a two-finger parallel gripper:

P_c: grasp center point, the point at which the center between the two fingertips is located.
P_d: grasp approach direction, a vector indicating the direction the fingers point to.
P_r: gripper roll angle, the roll angle of the gripper around the approach direction.
P_s: opening degree, the distance between the two parallel fingers before grasping.

P_c, P_d, and P_r are described in the object coordinate frame and determine the gripper pose when grasping an object. The gripper usually first moves to a pregrasp pose and then moves forward along P_d to reach the final grasp pose; this is why P_d is called the "grasp approach direction." Since a grasp configuration is determined by these parameters, grasp generation can be regarded as a sampling process over them.
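For concreteness, these parameters map naturally onto a small data structure. The sketch below (Python; all names are ours, not from the article) also shows how a pregrasp point can be obtained by backing off from P_c along −P_d:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GraspConfiguration:
    """One grasp for a two-finger parallel gripper, in the object frame."""
    center: np.ndarray   # P_c: grasp center point, shape (3,)
    approach: np.ndarray # P_d: unit grasp approach direction, shape (3,)
    roll: float          # P_r: gripper roll angle about P_d (rad)
    opening: float       # P_s: finger opening before grasping (m)

    def pregrasp_center(self, standoff: float) -> np.ndarray:
        """Pregrasp point: back off from P_c along -P_d by `standoff`."""
        return self.center - standoff * self.approach

g = GraspConfiguration(center=np.array([0.0, 0.0, 0.05]),
                       approach=np.array([0.0, 0.0, -1.0]),
                       roll=0.0, opening=0.14)
print(g.pregrasp_center(0.1))  # pregrasp 10 cm behind the grasp center
```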

Region masking
We assume that 3D models of the objects are available. To reduce the computational effort of grasp generation and online planning, the models can be simplified, for example, as unions of primitive shapes. [1][2][3] According to human grasp experience, the areas surrounding skeleton vertices are usually preferred areas for humans to grasp objects. 29 We choose the mean curvature skeleton 30 as the interested region, as Vahrenkamp et al. 6 have done. The resulting skeleton is a graph SK = (V, E), in which each vertex v ∈ V is connected to one or more neighbors via edges e ∈ E. According to its connectivity to neighboring vertices, each skeleton vertex v is classified as a branching, endpoint, or connecting vertex. We select all branching and endpoint vertices as interested vertices, and connecting vertices are uniformly sampled as interested vertices.

Hypotheses generation
Grasp parameters are sampled according to Algorithm 1. First, a spherical coordinate frame is established at an interested vertex (line 2), with its x, y, and z axes parallel to those of the object coordinate frame, as depicted in Figure 2. Then a point (r, θ, φ) inside the object model is sampled (lines 3, 4, and 6) as the grasp center point P_c. The grasp approach direction P_d is parallel to the direction (θ, φ) and points toward the interested vertex; the gripper roll angle P_r is also sampled (line 7). P_c determines the gripper position, P_d and P_r determine the gripper rotation, and P_s is conservatively predefined as the largest opening degree of the gripper. A grasp hypothesis can then be generated (line 8). If the gripper in that configuration does not collide with the object model (line 9), the hypothesis is inserted into the set of grasp hypotheses H (line 10).
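The sampling loop of Algorithm 1 can be sketched as follows (a simplified Python rendering under our own naming; the collision check is passed in as a stub, and the grid values for r, θ, φ, and roll are assumed to be supplied by the caller):

```python
import itertools
import numpy as np

def sample_hypotheses(vertex, r_vals, theta_vals, phi_vals, roll_vals,
                      max_opening, in_collision):
    """Sample grasp hypotheses on a spherical grid centred at one
    interested skeleton vertex (sketch of Algorithm 1)."""
    hypotheses = []
    for r, th, ph in itertools.product(r_vals, theta_vals, phi_vals):
        # Unit direction of (theta, phi) in the vertex-centred frame.
        d = np.array([np.sin(th) * np.cos(ph),
                      np.sin(th) * np.sin(ph),
                      np.cos(th)])
        p_c = vertex + r * d   # grasp center point P_c at radius r
        p_d = -d               # P_d points back toward the vertex
        for roll in roll_vals:
            h = (p_c, p_d, roll, max_opening)
            if not in_collision(h):  # keep only collision-free hypotheses
                hypotheses.append(h)
    return hypotheses
```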
A grasp hypothesis for a two-finger parallel gripper can also be written as h = (P_h, P_s). P_h is the pose of the gripper in the object coordinate frame and can be computed by equation (1), where [x, y, z]^T is the position in the object coordinate frame and [roll, pitch, yaw]^T is the RPY Euler angle representing the rotation in the object coordinate frame.

Grasp stability analysis
Inspired by force balance, 24 we propose a metric to evaluate grasp stability for a two-finger parallel gripper. As shown in Figure 3, we establish a set of line segments connecting corresponding points between the two fingers in the region that may contact the object. These line segments are perpendicular to P_d and parallel to P_m, the movement direction of the fingers.
The gripper is then placed at a grasp configuration, and all intersection points of the line segments with the object surface are collected. The intersection points closest to finger 1 and finger 2 are labeled P_1 and P_2, respectively; they are the first contact points of this grasp. If multiple intersection points are equally close to a finger, P_1 and P_2 are selected such that |P_1P_2| is shortest.
N_1 and N_2 are the surface normals at P_1 and P_2, respectively. For an ideal grasp configuration, N_1 and N_2 should be parallel to P_m so that the contacts between the fingertips and the object are stable. Besides, P_1P_2 should also be parallel to P_m; otherwise the object may rotate when the gripper closes. We therefore use the following value to evaluate grasp stability:

M(P) = λ_1 sin∠(N_1, P_m) + λ_2 sin∠(N_2, P_m) + λ_3 sin∠(P_1P_2, P_m)  (2)

where the operator ∠ denotes the angle between two vectors, and λ_1, λ_2, λ_3 are positive weight coefficients. When the angles are close to π/2, M(P) is large, indicating that the grasp is unstable. When the angles are close to 0 or π, M(P) is small, indicating that the grasp is stable. A threshold M_t is set for the grasp stability filter. All grasp hypotheses in H are evaluated by equation (2), and those with M(P) < M_t are added to the grasp database B.
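A minimal sketch of this stability metric follows, using the sine of each angle as one functional form consistent with the behavior described above (small near 0 or π, large near π/2); the article's exact expression may differ:

```python
import numpy as np

def angle(u, v):
    """Angle between two vectors, in [0, pi]."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def stability(n1, n2, p1, p2, p_m, lambdas=(1.0, 1.0, 1.0)):
    """Grasp stability metric M(P): small when N1, N2, and P1P2 are
    nearly parallel (or antiparallel) to the finger motion direction
    P_m, large when they are nearly perpendicular. Lower is better."""
    l1, l2, l3 = lambdas
    return (l1 * np.sin(angle(n1, p_m))
            + l2 * np.sin(angle(n2, p_m))
            + l3 * np.sin(angle(p2 - p1, p_m)))
```

An ideal antipodal pinch (normals and contact line aligned with P_m) scores near 0, while a grasp on surfaces sliding past the fingers scores near λ_1 + λ_2 + λ_3.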

Grasp candidates selection
Grasp candidates B′ are selected from the grasp database B. First, all grasp configurations of the objects recognized in the scene are transformed into the robot base frame according to the objects' poses. Then, for each object, the grasp configurations in the upper quarter-sphere toward the robot are selected as its grasp candidates, since the supporting table and manipulator limitations prevent grasping from the bottom and the back. As shown in Figure 4, any selected grasp candidate P of object O must satisfy a constraint defined in terms of RO_H, the horizontal-plane projection of the vector from the origin of the robot base frame to the origin of the object frame, and V_D, a vector pointing straight down. For each grasp candidate P, its relative pose score E(P) is defined by equation (4), where RP_c is the vector from the origin of the robot base frame to the grasp center point P_c, and K_q and K_d are positive weight coefficients. The first term on the right of equation (4) represents the angle the gripper has to turn to reach P, and the second term represents the distance the gripper has to move to reach P. The smaller the angle or distance, the larger E(P), which indicates that the gripper can reach P more easily when obstacles are not considered.

Figure 3. Grasp stability analysis. A set of line segments connects corresponding points between the two fingers; they are parallel to P_m, the movement direction of the fingers. P_1 and P_2 are the intersection points of the line segments with the object surface that are closest to finger 1 and finger 2. N_1 and N_2 are the surface normals at P_1 and P_2.
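The quarter-sphere filter can be sketched as below; the dot-product inequalities are one reading of the condition on RO_H and V_D (approach from above and from the robot's side), not the article's exact formula:

```python
import numpy as np

def select_candidates(grasps, robot_to_object_h):
    """Keep grasps approaching from the upper quarter-sphere toward the
    robot: P_d has a downward component and a component along the
    horizontal robot-to-object direction RO_H. `grasps` is a list of
    dicts with an "approach" vector in the robot base frame (z up)."""
    v_down = np.array([0.0, 0.0, -1.0])  # V_D: straight down
    keep = []
    for g in grasps:
        p_d = g["approach"]
        if np.dot(p_d, v_down) >= 0.0 and np.dot(p_d, robot_to_object_h) >= 0.0:
            keep.append(g)
    return keep
```

Grasps approaching from below or from behind the object (relative to the robot) are rejected before any motion planning is attempted.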

Obstruction analysis
We utilize the distance map 10 to identify obstruction relationships between grasp candidates and objects. We build a set of rays {r} constrained by ∠(−P_d, r) ≤ ε, where ε denotes the range over which free space around a grasp candidate is estimated. The first ray r_0 is built according to Figure 5. Then rays {r_i | i = 1, 2, ...} are built by rotating r_0 around P_m by angles iε′ ≤ ε, where ε′ is a constant interval. Finally, rays {r_ij | j = 1, 2, ...} are built by rotating each r_i around −P_d by angles jε′ < 2π. All these rays constitute the set {r}.
where OD(O_i, O_j) is also between 0 and 1; the larger it is, the more O_j obstructs the grasping of O_i.
The set {OD(O_i, O_j) | i, j = 1, ..., n} is defined as the object obstruction set of the current scene, which includes the ODs between every pair of objects.
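As an illustration only, a grasp-level OD in the described range [0, 1] could be computed from the per-ray distances Dist(r, O); the exponential falloff with α and the averaging used here are our assumptions, not the article's formula:

```python
import math

def grasp_od(ray_distances, alpha=0.01):
    """Obstruction degree of an obstacle on one grasp candidate,
    sketched as the mean of per-ray terms exp(-alpha * Dist(r, O)):
    a ray whose nearest obstacle is close contributes nearly 1, a ray
    with free space contributes nearly 0. `ray_distances` holds
    Dist(r, O) for every ray r in the set {r}."""
    if not ray_distances:
        return 0.0
    return sum(math.exp(-alpha * d) for d in ray_distances) / len(ray_distances)
```

Whatever the exact formula, the essential properties used by the planner are that the value lies in [0, 1] and grows as obstacles crowd the approach cone of the grasp.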

Grasp order planning
Suppose X_n = [x_1, ..., x_i, ..., x_n] represents a grasp order of all objects in O, where x_i is the object grasped ith. For a given X_n, we can arrange the elements of the object obstruction set as an obstruction matrix S(X_n) whose (i, j) entry is OD(x_i, x_j). The sum Σ_j OD(x_i, x_j) is called the scene OD of x_i, which indicates the total obstruction by the other objects when grasping x_i. The objective of grasp order planning is to minimize the sum of the scene ODs of all objects under grasp order X_n, that is,

Z(X_n) = Σ_{i=1}^{n} Σ_{j=i}^{n} OD(x_i, x_j)

This actually minimizes the sum of the upper-right part of the matrix S(X_n); the bottom-left part is ignored because objects that have been removed no longer obstruct the remaining objects.
This problem is similar to the traveling salesman problem (TSP), and any planning method that can solve the TSP can also solve our problem. In our implementation, the branch and bound method 31 is utilized, as shown in Algorithm 2.

Figure 5. P_0 is the intersection of the object surface and a ray starting at P_c with direction −P_d. r_0 is a ray starting at P_0 with direction −P_d.
The branch and bound method incrementally builds a tree to search for an optimal plan. A node q on the tree contains a grasp order q.order = X_m = [x_1, ..., x_i, ..., x_m] for grasping m objects, m ≤ n. Each node also has a lower bound q.lowerbound, whose value is computed by Z′(X_m) = Σ_{i=1}^{m} Σ_{j=i}^{n} OD(x_i, x_j), where x_{m+1} to x_n are the objects in O − q.order, in arbitrary order.
At the beginning, an upper bound is initialized by a greedy method that always chooses as the next object the one with the least scene OD. Then a set Q is created to hold all expandable leaf nodes of the tree; it initially contains a single root node q_0, which includes no object. At each iteration, the leaf node q ∈ Q with the lowest lower bound is taken out to be expanded. Each object not in q.order is appended to create a new node q′, and the lower bound is updated. If the new node's lower bound is less than the upper bound, it is inserted into Q as a child of q. Moreover, if q′.order already includes all the objects, the upper bound is updated to q′.lowerbound, and in this case q′ is not inserted into Q since it cannot be expanded further. The loop repeats until no node can be expanded or the lowest lower bound exceeds the upper bound.
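Algorithm 2 can be sketched as follows; the greedy upper bound, the lower bound Z′, and the best-first expansion follow the description above, while the data layout (an `od` matrix indexed by object position) is our own choice:

```python
import heapq

def plan_order(objects, od):
    """Branch and bound over grasp orders (sketch of Algorithm 2).
    od[i][j] is OD(O_i, O_j); the cost of an order is the sum of scene
    ODs, i.e. the upper-right part of the obstruction matrix."""
    n = len(objects)

    def cost(order):
        return sum(od[order[i]][order[j]]
                   for i in range(n) for j in range(i + 1, n))

    # Greedy initial upper bound: always grasp the least-obstructed object.
    remaining, greedy = set(range(n)), []
    while remaining:
        nxt = min(remaining,
                  key=lambda i: sum(od[i][j] for j in remaining if j != i))
        greedy.append(nxt)
        remaining.remove(nxt)
    best, upper = greedy, cost(greedy)

    # Lower bound Z' of a partial order: obstruction already committed
    # by its prefix against all objects placed (or still to place) later.
    def lower(prefix):
        rest = [i for i in range(n) if i not in prefix]
        return sum(od[x][j]
                   for k, x in enumerate(prefix)
                   for j in prefix[k + 1:] + rest)

    heap = [(0.0, [])]                      # expandable leaf nodes
    while heap:
        lb, prefix = heapq.heappop(heap)
        if lb > upper:
            break                           # nothing better remains
        for i in range(n):
            if i in prefix:
                continue
            child = prefix + [i]
            clb = lower(child)
            if len(child) == n:
                if clb < upper:             # complete order: update bound
                    upper, best = clb, child
            elif clb < upper:
                heapq.heappush(heap, (clb, child))
    return [objects[i] for i in best], upper
```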
The planning is actually driven by the asymmetry of ODs between two objects: OD(O_i, O_j) ≠ OD(O_j, O_i). Figure 6 shows three kinds of basic object interactions. If an object is in front of another, then since there is no grasp candidate from the back, the front object is not obstructed by the back one while the back object is obstructed by the front one. If an object sits on another, then since there is no grasp candidate from the bottom, the upper object is not obstructed by the bottom one, while the bottom object is obstructed by the upper one. Similarly, if an object leans against another, its OD is less than that of the supporting object, as the slanting object has few grasp candidates oriented toward the supporting object. The planning therefore gives front, upper, and slanting objects higher priority.

Grasp configuration planning
After planning the grasp order, we must decide which grasp candidate is used to grasp each object. Suppose Y_n = [y_1, ..., y_i, ..., y_n] represents the grasp configurations used to grasp the objects in X_n, where y_i is used to grasp x_i. For a grasp candidate P of x_i, its relative pose score under obstruction E′(P) is defined such that E′(P) estimates how easily the gripper can reach P when obstacles are taken into account.
For object x_i, its final grasp configuration is computed by Algorithm 3. A motion planner is required to compute collision-free trajectories of the robot; if at least one trajectory arrives at a grasp configuration, that configuration is reachable. First, y_i is given a default value P_0, which cannot be executed, and B′(x_i) is ranked in descending order of E′(P). Then motion planning is called for each P ∈ B′(x_i) until the first reachable one is found.
The complete Y_n could be decided at once according to X_n before any object is grasped. However, the remaining objects' poses may change after each grasp, due to accidental collision with the robot or the removal of an object that supports others. In addition, previously invisible objects may be revealed after each grasp. We therefore implement the whole planning process as Algorithm 4. At each step, only one object's grasp configuration is planned, and that object is grasped immediately; grasp order planning and grasp configuration planning are then called again. The grasp order of the remaining objects may change because of changed object poses or newly discovered objects. If no grasp candidate of an object is reachable, the next object in X_n is tried. In the worst case, no object can be grasped (line 11), and the task is aborted.
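The replanning loop of Algorithm 4 can be sketched as follows, with scene sensing, order planning, configuration planning, and execution passed in as callables (the interfaces are our assumption):

```python
def grasp_all(scene, plan_order, plan_grasp, execute):
    """Replanning loop (sketch of Algorithm 4): after every grasp the
    order is replanned, because object poses may change and hidden
    objects may appear. `scene` returns the currently visible objects;
    `plan_grasp` returns a reachable grasp configuration or None."""
    while True:
        objects = scene()
        if not objects:
            return True                    # table cleared
        order = plan_order(objects)
        for obj in order:                  # try objects in planned order
            grasp = plan_grasp(obj)
            if grasp is not None:
                execute(obj, grasp)        # grasp immediately, then replan
                break
        else:
            return False                   # no object graspable: abort
```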

Complexity analysis
Suppose one object has at most m grasp configurations, that is, |B′(O)| ≤ |B(O)| ≤ m. In obstruction analysis, Dist(r, O) has to be computed at most m·A_n^2 = m·n(n − 1) times, so the time complexity is O(n² − n). For a whole task, replanning is done after grasping each object, so the total complexity is O(Σ_{k=2}^{n} (k² − k)) = O(n³ − n), which is polynomial.
Grasp order planning is not guaranteed to run in polynomial time, as in the worst case n! grasp orders may be explored. In general, the branch and bound method is fairly efficient for small-scale instances. For large numbers of objects, the objects can be divided into several groups according to their poses, and a grasp order planned for each group separately.
In grasp configuration planning, at most mn grasp configurations are tested by the motion planner for n objects, so the time complexity is O(n). Although this looks low, grasp configuration planning actually accounts for most of the total planning time, because motion planning itself is time-consuming.

Setup
We built an experiment platform as shown in Figure 7(a). The robot is an ABB IRB 120 manipulator with a Robotiq 2F-140 gripper; a Microsoft Kinect is mounted on the manipulator. A small table of size 40 cm × 40 cm stands in front of the robot. We also built a simulation platform, shown in Figure 7(b), based on Gazebo; the virtual robot is identical to the real robot.
Six objects and their IDs are shown in Figure 8. We measured the sizes of these objects and modeled them directly as cylinders or boxes. These approximate models are used not only for grasp generation, obstruction analysis, and collision checking but also for grasping in simulations.
The parameters for grasp generation include θ_0 = π/4. The parameters for grasp planning are ε = π/9, ε′ = π/18, α = 0.01, and K_q = K_d = 1. Rapidly-exploring random tree (RRT)-connect 32 integrated in MoveIt is adopted as the motion planner. If motion planning reaches a run time of 1 s without finding a collision-free trajectory, the grasp candidate is regarded as unreachable.
We compare our OD-based planning method with a baseline based on grasp ranking (GR), which ranks all grasp candidates by some metric and selects the best one. 33 Here, we use the relative pose score as the metric; Algorithm 5 shows the details. Grasp candidate selection is the same as in our method. Then the candidate sets B′ of all objects are pooled and sorted in descending order of relative pose score E. Motion planning is tested for each grasp candidate until the first reachable one is found, and the robot immediately grasps that object. These steps are repeated until no object can be grasped. The GR method is satisfactory most of the time for grasping a single object, and we wish to compare the performance of the two methods when grasping multiple cluttered objects.
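One iteration of the GR baseline (Algorithm 5) can be sketched as follows; the interfaces are assumed, with `motion_plan` returning a trajectory or None:

```python
def grasp_ranking_baseline(candidates, score, motion_plan):
    """GR baseline (sketch of Algorithm 5): pool the grasp candidates of
    all objects, sort them by relative pose score E in descending order,
    and return the first candidate for which motion planning succeeds."""
    for cand in sorted(candidates, key=score, reverse=True):
        traj = motion_plan(cand)
        if traj is not None:
            return cand, traj
    return None, None   # no candidate reachable: stop the task
```

Note that the baseline never consults object interactions: the highest-scoring reachable grasp is executed even if it removes a supporting object.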
To evaluate performance, the following data are recorded during simulations and experiments:

5. Grasp order planning time: the time spent on grasp order planning.
6. Grasp configuration planning time: the time spent on grasp configuration planning.
7. Number of motion planning trials: the number of grasp candidates tested by motion planning.

Simulation
We generated four sets of scenes for 3, 4, 5, and 6 objects, respectively, each containing 50 random scenes. To generate a scene, first a certain number of objects are randomly selected, then they are dropped one by one from random positions over the table. Figure 9 shows some examples of the scenes. In simulation, the robot can obtain the IDs and poses of all objects on the table. Each scene was tested once under each of the methods OD and GR, and the results are presented in Table 1. The source code for the simulation is available at https://github.com/Kazfyx/grasp-planning. Unsuccessful grasping has two possible causes: (1) the object is initially at an unreachable pose, that is, none of its grasp candidates is reachable, and (2) the object falls and rolls to an unreachable pose. More objects were grasped under method OD since fewer objects fell down. Method GR caused more fallen objects because it was originally designed for grasping a single object and plans no appropriate grasping order. As the number of objects increased, more objects fell because complex object interactions occurred more often.
Planning time (including the time for obstruction analysis, grasp order planning, and grasp configuration planning) increased with the number of objects. Obstruction analysis time conformed to the time complexity O(n³ − n). Grasp order planning also ran in approximately polynomial time, which confirms the efficiency of the branch and bound method. Grasp configuration planning took the most time, as many unreachable grasp candidates were tested. Table 2 presents average data computed from Table 1. The average planning time per grasp shows no obvious difference between the two methods, given that GR performs neither obstruction analysis nor grasp order planning. Fewer motion planning trials were needed under OD, since obstruction analysis indicates which object is more likely to be graspable and which grasp candidate is more likely to be reachable. Besides, the average motion time per grasp is nearly unchanged across all simulations, because the grasp candidates likely to generate short paths are tested first, and object positions did not vary much on such a small table.
Infrequently, objects fell under method OD; two such cases occurred. In Figure 10(a), object 1 sat on object 6 while leaning against object 4. The slanting angle of object 1 was too small to filter out grasp configurations oriented toward object 4, so object 4 was grasped first as it had more unobstructed grasp candidates. In Figure 10(b), object 2 leaned against object 6 from the back, so object 6 was not obstructed at all and was grasped first. In this case, object 2 could never be grasped first, as it was unreachable.

Experiment
We designed four scenes, shown in Figure 11, for the experiments. Object recognition 34 is based on the 3D point cloud captured by the Kinect, so occluded objects may not be recognized. Scene 1 represents objects leaning against others, scene 2 represents objects sitting on others, and scenes 3 and 4 are complex situations with multiple object interactions. Each scene was tested once under each of the methods OD and GR, and the results are presented in Table 3. Method OD picked all the objects in the four scenes with no object collapse and took less time for planning. Method GR led to object collapse in three scenes; three objects fell down and could not be grasped afterward.
In scene 1, all objects were recognized at the beginning. Figure 12 shows the selected grasp candidates, and Table 4 presents the object obstruction set. The execution processes are shown in Figure 13. OD planned the grasp order X_4 = [object 2, object 1, object 3, object 4]. GR grasped object 4 before object 3 because object 4 had a reachable grasp candidate with the highest relative pose score among all grasp candidates of objects 3 and 4. Object 3 then fell into an unreachable pose and could not be grasped afterward. In this case, much time was spent on grasp configuration planning, as all grasp candidates of object 3 were tested by motion planning.

Conclusion and discussion
This article proposes a grasp planning method based on OD. First, a grasp configuration database with widely distributed grasp approach directions is built, which facilitates the search for reachable grasp configurations in cluttered scenes. Then, the ODs of grasp configurations and objects are analyzed according to the geometric relations between grasp configurations and object models. Finally, a grasp order is planned in which object interactions are implicitly inferred from the ODs, while a reachable grasp configuration is quickly searched for each object.
Simulations and experiments in a series of scenes demonstrate that our method can grasp objects in clutter efficiently and safely.
Compared with previous grasp action planning, we consider the relationships among objects and produce a proper grasping order, which reduces the risk of object collapse. Compared with previous work on object support relation recognition, our method needs neither complex computer vision algorithms nor mechanical analysis, and it plans concrete grasp configurations. For the same reason, however, it cannot completely avoid improper grasp orders, as object interactions are not explicitly recognized.
In the future, we plan to add more constraints to grasp planning and introduce risk estimation so that the robot can terminate task execution when object collapse is likely. In addition, since perception uncertainty is common, possible unseen objects will also be taken into account in grasp order planning.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Key R&D Program of China under grant 2017YFB1303600.

Supplemental material
Supplemental material for this article is available online.