A robotic grasping approach with elliptical cone-based potential fields under disturbed scenes

Vision-based grasping plays an important role in enabling robots to provide better services. It remains challenging in disturbed scenes, where the target object cannot be grasped directly because of interference from other objects. In this article, a robotic grasping approach that first moves the interference objects is proposed based on elliptical cone-based potential fields. The single-shot multibox detector (SSD) is adopted to detect objects, and, considering the scene complexity, Euclidean clustering is also employed to extract objects on which SSD has not been trained. We then acquire the vertical projection of the point cloud of each object. Because different objects have different shapes and orientations, the vertical projection is executed along the major axis acquired by principal component analysis. On this basis, the minimum projected envelope rectangle of each object is obtained. To construct continuous potential field functions, an ellipse-based functional representation is introduced, since among continuous closed convex curves the ellipse matches the envelope rectangle best. Guided by design principles that include continuity, same-eccentricity equivalence, and monotonicity, potential fields based on the elliptical cone are designed. The current interference object to be grasped generates an attractive field, whereas the other objects correspond to repulsive ones, and their resultant field is used to solve for the best placement of the current interference object. The effectiveness of the proposed approach is verified by experiments.


Introduction
With the rapid development of artificial intelligence, robots are increasingly expected in our daily lives. They are required to operate in complex environments and to carry out diverse assignments. [1][2][3] To accomplish a given task, the robot has to interact with the environment, and autonomous grasping is an important aspect of this interaction. With this capability, the robot can provide better services.
In robotic grasping, the grasp detection can be determined directly on point clouds. 4,5 Zapata-Impata et al. presented a method to find the best pair of grasping points given a three-dimensional point cloud of an unknown object, where a set of geometric rules is employed to explore the cloud. 4 Considering the raw, incomplete 3D point cloud, Gori et al. first reconstruct the object in 3D and then obtain candidate triplets using discrete particle swarm optimization for three-finger manipulation; finally, the best grasp triplet is selected. 5 Suzuki and Oka presented a method for untrained objects that uses a single depth image. 6 The planar surface and the object are extracted by random sample consensus (RANSAC), and the robot grasps the object along the principal axis obtained by principal component analysis (PCA). Such solutions are suitable for untrained objects in tasks such as table clearing, but they cannot handle a specific object because point clouds alone do not discriminate between objects. To solve this problem, object detection provides a preferable scheme.
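The plane extraction and principal-axis steps mentioned above can be sketched with a few lines of numpy. This is a minimal illustration under assumed thresholds and iteration counts, not the cited implementations:

```python
import numpy as np

def ransac_plane(points, n_iters=200, dist_thresh=0.01, seed=0):
    """Fit a dominant plane n.x + d = 0 to an (N, 3) point cloud with a
    plain RANSAC loop. Returns ((n, d), inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(n_iters):
        i, j, k = rng.choice(len(points), size=3, replace=False)
        n = np.cross(points[j] - points[i], points[k] - points[i])
        norm = np.linalg.norm(n)
        if norm < 1e-9:                     # skip degenerate (collinear) samples
            continue
        n = n / norm
        d = -n @ points[i]
        inliers = np.abs(points @ n + d) < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, d)
    return best_model, best_inliers

def principal_axis(points):
    """Principal axis of a point cloud: the eigenvector of the covariance
    matrix with the largest eigenvalue (PCA)."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    return eigvecs[:, -1]
```

In a pipeline like the one cited, the RANSAC inliers give the table plane and the remaining points are the objects, whose grasp axis follows the PCA result.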
For object detection, visual perception is commonly used. Traditionally, the robot can use depth and appearance features to recognize an object, 7,8 but the detection accuracy may be affected by illumination variations. With the development of deep learning, researchers have proposed abundant deep networks. Representative methods include the two-stage faster regions with convolutional neural networks (Faster R-CNN), 9 the single-stage You Only Look Once (YOLO), 10 and the single-shot multibox detector (SSD). 11 SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, and it achieves competitive accuracy with fast processing. On the basis of object detection, grasp detection is then executed. One grasp detection solution is the model-based method. Morales et al. determined the best grasp based on a global model database that contains computer-aided design object models and a grasp set for each object, 12 but the construction process is time consuming. Asif et al. 13 presented a framework of hierarchical cascaded forests with CNN features to perform object recognition and grasp detection on RGB-D images. Chao et al. applied a direct predictor and a multimodal grasp predictor to obtain the best graspable region, where Faster R-CNN and SSD are used to locate the grasp objects. 14 Chu et al. proposed a deep learning architecture to predict grasps for robotic manipulation, where the learning problem is defined as classification with null-hypothesis competition instead of regression, so the deep neural network can predict multiple grasp candidates for multiple objects. 15 These methods mainly concentrate on grasp detection for the target object; the influence of other objects on the target is seldom considered.
Actually, in everyday environments, the scenes tend to be complex, and the robot should grasp the target object with consideration of the other objects. Kuehnle et al. proposed a collision-free method to grasp objects in the presence of obstacles, where each object is represented by a 3D scale-invariant feature transform (SIFT) model. 16 A problem is the computation burden of updating the scene model. Berenson et al. presented a framework to find valid grasps in cluttered environments, which considers the robot kinematics, the local environment around the object, and the grasp force-closure quality. 17 Nagata et al. found a grasp point by grasp evaluation using geometric information about the target object and a 3D environment model around it obtained by stereo vision. After the target object is assigned by the user, the user also determines its category from a list of object models and selects the grasp mode from the corresponding list on the monitor. 18 These methods often find no solution in disturbed scenes, where interference objects severely affect the grasping and the target cannot be grasped directly. Some pioneering works have been proposed to handle grasping under disturbed scenes. [19][20][21] Dogar and Srinivasa presented a push-grasp planner that can push the target object away from clutter. 19 The object is pushed using a robot finger, and a capture region is introduced to let the object roll into the robot hand. An assumption is that the pressure distribution is at the object's periphery. This method also places a critical requirement on the contact point of the finger on the object; an improper contact point causes the object to deviate out of the hand. Different from the above push-grasp scheme, moving the interference object away from the target object provides another choice. Stilman et al.
presented a resolve spatial constraint (RSC) algorithm for manipulation planning, 20 where the placement of an interference object is determined from feasible sampled placements in a probabilistic way: the larger the object-free area of a region, the higher the probability that a placement sampled in that region is chosen. This placement scheme neglects the energy consumption of the manipulator, and the interference object is sometimes placed at a needlessly distant position, so the evaluation of placements still needs to be improved. Zhao et al. proposed to move the interference object, whose best placement is determined based on an artificial potential field. 21 One drawback is that the object pose is not considered: the object is assumed non-inclined and is expressed by a projected envelope rectangle on the table plane whose two neighboring sides coincide with the X_w and Y_w axes of the base coordinate system O_wX_wY_wZ_w of the robot. In practice, an object can be placed in an arbitrary orientation, and the adaptability of this method to arbitrarily oriented objects is poor. How to better select the placement of the interference object is still a challenge.
To achieve the matching of the projected envelope rectangle and the object pose, a natural solution is to generate an inclined projected envelope rectangle along the object orientation. However, it is difficult to express this projected envelope rectangle in the form of an equation, and thus the corresponding artificial potential field has to be represented discretely. This discretization in the inclined state introduces a computation burden, especially for fine-grained fields, so a nondiscretized functional representation becomes crucial. In this article, a circumscribed ellipse corresponding to the minimum projected envelope rectangle is designed because of its small occupied space among continuous closed convex curves. Then, the design principles of the potential field are presented; they include continuity, same-eccentricity equivalence, and monotonicity. Guided by these principles, the elliptical cone is introduced, and the attractive and repulsive potential field functions are constructed according to the object type: the current interference object to be grasped versus the other objects. Finally, the resultant potential field is acquired to determine the best placement of the current interference object.
This article is organized as follows. The problem statement is first given. Then, an object grasping method based on elliptical cone potential field model is presented. The effectiveness of the proposed method is verified by experiments.

Problem statement
This article is motivated by robotic grasping under disturbed scenes, where the target object cannot be grasped directly, and we focus on the moving of interference objects by the manipulator. Figure 1 shows an illustration, where a robot is required to use its manipulator to grasp the target object. O_wX_wY_wZ_w is the base coordinate system of the robot, with O_w at the center of the robot base. We label O_cX_cY_cZ_c as the camera coordinate system, with its origin O_c at the center of the camera. Besides, the pixel coordinate system is expressed by o-uv. The joint angles of the robot manipulator are described as q_1, q_2, ..., q_6.
A point p(u_p, v_p) in the o-uv system can be converted to ^cP(^cx_P, ^cy_P, ^cz_P) under O_cX_cY_cZ_c according to the camera's intrinsic matrix T = [f_x, 0, u_c; 0, f_y, v_c; 0, 0, 1], and its coordinate ^wP(^wx_P, ^wy_P, ^wz_P) in O_wX_wY_wZ_w is then obtained as follows

[^wx_P, ^wy_P, ^wz_P, 1]^T = ^wT_c [^cx_P, ^cy_P, ^cz_P, 1]^T    (1)

where ^cx_P = (u_p − u_c) ^cz_P / f_x, ^cy_P = (v_p − v_c) ^cz_P / f_y, ^cz_P is the depth value at p, and ^wT_c is the homogeneous transformation from the camera coordinate system to the base coordinate system.

To accomplish the grasping of the target object, the robot should possess the capability of object detection. In this article, SSD 11 is adopted, and each detectable object is enclosed by a bounding box. In Figure 1, an apple and a cup are the detectable objects, with their bounding boxes described in the o-uv coordinate system. With the depth map provided by the Kinect V2 camera, one can obtain the point cloud corresponding to each detectable object. In practice, the environments are complex, and there inevitably exist objects on which SSD has not been trained. These untrained objects may disturb the grasping of detectable objects, and thus their point cloud information is also required. The table plane where the objects are placed is first fitted by RANSAC, 22 and then a straight-pass filter is applied to obtain the point clouds of the objects. A 2D coordinate system O_dX_dY_d is then established on the fitted table plane. To better extract the untrained objects, a prerequisite step is to remove the point clouds related to the detectable objects; then, Euclidean clustering 23 is utilized to acquire the point cloud of each untrained object. For every object, detectable or untrained, PCA 24 is used on its point cloud to obtain the respective principal axis. Moreover, a minimum 3D bounding box B_{O_j} along the principal axis direction is acquired, from which the height of the object is computed.
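The pixel-to-base-frame conversion above amounts to an inverse pinhole projection followed by a homogeneous transform. A minimal sketch, where `T_wc` (the camera-to-base transform) and the intrinsic parameters are assumed inputs:

```python
import numpy as np

def pixel_to_world(u, v, depth, fx, fy, uc, vc, T_wc):
    """Back-project pixel (u, v) with metric depth cz_P into the robot base
    frame: inverse pinhole model, then a 4x4 homogeneous transform."""
    xc = (u - uc) * depth / fx          # cx_P
    yc = (v - vc) * depth / fy          # cy_P
    p_w = T_wc @ np.array([xc, yc, depth, 1.0])
    return p_w[:3]
```

With `T_wc = np.eye(4)`, a pixel at the principal point with depth 1 m maps to (0, 0, 1) in the base frame.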
For an object O_j, j = 1, 2, ..., N_o, the vertex coordinates of its 3D bounding box are calculated and then vertically projected onto W_m to form a closure region, where N_o is the number of objects and W_m refers to the manipulator's workspace (see Figure 1), a predefined zone in O_dX_dY_d. This closure region is bounded by a minimum envelope rectangle, expressed as R_s, whose four vertexes are described as P^{O_j}_t (t = 1, ..., 4). Algorithm 1 presents the information extraction process for all objects, detectable and untrained, where Num is the number of detectable objects, whose information is stored in Det_Obj, and C_O[i] refers to the point cloud of the i-th detectable object.

Figure 1. Illustration of the robot grasping the target object, where O_g is the target object, O_1 is an untrained object, and O_2 refers to the detectable object. W_m describes the workspace of the manipulator.
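The projection and minimum-envelope-rectangle steps can be sketched as follows. The rectangle is aligned with the PCA major axis, as in the text, rather than found by an exhaustive search over orientations, and the helper name is ours:

```python
import numpy as np

def envelope_rectangle(points_2d):
    """Minimum envelope rectangle of projected points, aligned with the PCA
    major axis. Returns the 4 rectangle vertices and the axis angle theta."""
    c = points_2d.mean(axis=0)
    centered = points_2d - c
    cov = centered.T @ centered / len(points_2d)
    _, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, -1]                      # major-axis direction
    theta = np.arctan2(major[1], major[0])
    R = np.array([[np.cos(theta), np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])
    local = centered @ R.T                      # rotate into the axis-aligned frame
    mins, maxs = local.min(axis=0), local.max(axis=0)
    corners_local = np.array([[mins[0], mins[1]], [maxs[0], mins[1]],
                              [maxs[0], maxs[1]], [mins[0], maxs[1]]])
    return corners_local @ R + c, theta         # rotate the corners back
```

Feeding it the projected vertices of the 3D bounding box B_{O_j} yields the four vertexes P^{O_j}_t of R_s.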

Object grasping based on potential field model with elliptical cone
To achieve grasping under disturbed scenes, the robot should first move the interference objects with consideration of the other objects. The artificial potential field method is a general solution for expressing the interaction among objects. In reality, different objects have different shapes and orientations, so the design of the potential field should take the object information, including its orientation and height, into account.

Elliptical potential field
For an inclined object, constructing the potential field from an arbitrary projected envelope rectangle is not accurate; a minimum envelope rectangle corresponding to the major axis is preferable. Instead of adopting a rectangle-based discrete potential field, in this article we first obtain a circumscribed ellipse for this minimum rectangle and then propose an ellipse-based continuous potential field. As shown in Figure 2, for the minimum envelope rectangle R_s of the object O_j, the minimum circumscribed ellipse E_l can be obtained, where F_1 and F_2 are the focal points of E_l, and the points U^{O_j}_o, F_1, and F_2 lie on the line l_a. Because the object O_j may be in an arbitrary pose, it needs to be transformed to the standard pose to calculate the parameters of E_l. With θ denoting the inclination angle of the major axis and (x_o, y_o) the center of R_s, the transformation is

[x'_{P_{O_j}}; y'_{P_{O_j}}] = R [x_{P_{O_j}} − x_o; y_{P_{O_j}} − y_o]    (2)

where (x_{P_{O_j}}, y_{P_{O_j}}) is the coordinate of any point P related to E_l, (x'_{P_{O_j}}, y'_{P_{O_j}}) is the coordinate of point P in the standard pose, and R = [cos θ, sin θ; −sin θ, cos θ] is the transformation matrix. In the standard pose, E_l satisfies the standard ellipse equation

x'^2/a^2 + y'^2/b^2 = 1    (3)

where the parameters a and b are the semimajor axis and semiminor axis, respectively. For the object O_j, combining with the rectangle vertexes P^{O_j}_t (t = 1, ..., 4), whose side lengths along the major-axis and minor-axis directions are L_1 and L_2, the parameters of its minimum circumscribed ellipse are

a = √2 L_1/2,  b = √2 L_2/2    (4)

since every vertex lies on the ellipse (L_1^2/(4a^2) + L_2^2/(4b^2) = 1) and the enclosed area is minimized. Substituting (4) into (3) and applying the inverse of transformation (2), the general elliptical equation of E_l in O_dX_dY_d is then acquired as equation (5). After the minimum circumscribed ellipse E_l of the object O_j is obtained, the influence of O_j could be considered by concentrically expanding E_l with the same eccentricity in a discrete way (see E_l1 in Figure 2). However, this solution, based on an incremental focal distance, leads to a complicated calculation process, so a functional solution is required. Note that the potential field function is constant within the ellipse E_l.
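Under the assumption that the circumscribed ellipse is the minimum-area one with axes along the rectangle sides (which makes the semiaxes √2 times the half side lengths, since the corners must lie on the ellipse while the area is minimized), the parameter computation and the standard-pose evaluation can be sketched as:

```python
import numpy as np

def circumscribed_ellipse(rect_vertices):
    """Minimum-area circumscribed ellipse of a rectangle given its 4 vertices
    in order. Corners (±p, ±q) on the ellipse give p²/a² + q²/b² = 1, and
    minimizing the area ab yields p²/a² = q²/b² = 1/2, i.e. a = √2·p, b = √2·q."""
    v = np.asarray(rect_vertices, dtype=float)
    center = v.mean(axis=0)
    side1, side2 = v[1] - v[0], v[3] - v[0]
    l1, l2 = np.linalg.norm(side1), np.linalg.norm(side2)
    if l1 >= l2:                                # major axis along the longer side
        major_dir, la, lb = side1 / l1, l1, l2
    else:
        major_dir, la, lb = side2 / l2, l2, l1
    a, b = np.sqrt(2) * la / 2, np.sqrt(2) * lb / 2
    theta = np.arctan2(major_dir[1], major_dir[0])
    return center, a, b, theta

def ellipse_value(p, center, a, b, theta):
    """Standard-pose transform followed by x'²/a² + y'²/b²: the value is
    < 1 inside the ellipse, = 1 on it, > 1 outside."""
    d = np.asarray(p, dtype=float) - center
    xp = np.cos(theta) * d[0] + np.sin(theta) * d[1]
    yp = -np.sin(theta) * d[0] + np.cos(theta) * d[1]
    return (xp / a) ** 2 + (yp / b) ** 2
```

For a 2-by-1 rectangle centered at the origin, this gives a = √2 and b = √2/2, and every corner evaluates to exactly 1.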
In the region outside E_l, the potential field function should conform to the following design principles:
1. The potential field function is continuous.
2. For any two positions on the plane O_dX_dY_d that lie on an ellipse whose center and eccentricity are the same as those of E_l, the influences of O_j on them are equal; this is called same-eccentricity equivalence.
3. The variation trends influenced by the object O_j differ between the major-axis and minor-axis directions of E_l.
4. The function is monotonically increasing if we expect that the farther a position is from the center of E_l, the larger the influence of O_j is. It is then referred to as an attractive potential field function.
5. The function is monotonically decreasing if we expect that the farther a position is from the center of E_l, the smaller the influence of O_j is. In this case, it corresponds to a repulsive potential field function.
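The article's exact field functions appear in its equations (6) to (8) and are not reproduced here. As a sketch only, the elliptical cone z = sqrt(x'²/a² + y'²/b²) is one function satisfying the first three principles: it is continuous, constant on every concentric ellipse with the same eccentricity as E_l, and strictly monotone in the elliptical radius:

```python
import numpy as np

def elliptical_cone(p, center, a, b, theta):
    """Elliptical cone value at point p: transform to the standard pose,
    then take sqrt(x'²/a² + y'²/b²). Level sets are concentric ellipses
    with the same eccentricity as the base ellipse (semiaxes a, b)."""
    d = np.asarray(p, dtype=float) - center
    xp = np.cos(theta) * d[0] + np.sin(theta) * d[1]
    yp = -np.sin(theta) * d[0] + np.cos(theta) * d[1]
    return np.sqrt((xp / a) ** 2 + (yp / b) ** 2)
```

For a base ellipse with a = 2, b = 1, the points (4, 0) and (0, 2) lie on the same scaled ellipse and receive the same cone value, illustrating same-eccentricity equivalence.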

Calculation of the potential fields
When the manipulator cannot grasp the target object, it has to select an interference object and move it away first. The selection criterion is the distance to the center of the target object, with smaller values representing higher priorities. The chosen interference object is called the current grasped object, and the other objects are subdivided into two categories: the residual interference objects together with the target, and the noninterference objects. The numbers of objects in these two categories are labeled N_r1 and N_r2, respectively. Note that the less the current grasped object is moved, the less energy the manipulator consumes. Therefore, the influence of the current grasped object is consistent with the fourth design principle. On the contrary, every other object generates a repulsion on the current grasped object, and its influence coincides with the fifth design principle.
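The nearest-first selection rule reduces to an argmin over center distances; a trivial sketch with a hypothetical helper name:

```python
import numpy as np

def next_grasp_object(target_center, interference_centers):
    """Return the index of the interference object whose center is closest
    to the target object's center (smaller distance = higher priority)."""
    t = np.asarray(target_center, dtype=float)
    d = [np.linalg.norm(np.asarray(c, dtype=float) - t) for c in interference_centers]
    return int(np.argmin(d))
```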
Meanwhile, for any object O_j with its minimum circumscribed ellipse E_l, the first three design principles are the basis. To satisfy these requirements, we introduce an elliptical cone E^{O_j}_cone, whose value e_cone, given in equation (6), takes effect for points outside E_l. For the current grasped object O_ig, its elliptical potential field function F_at should conform to the fourth design principle, and the object tends to be moved along the short-edge direction of the projected rectangle. The attractive field of O_ig is designed in equation (7), where d_cg refers to the distance between the object O_ig and its nearest object, and e_cone is calculated according to equation (6); its corresponding ellipse has the same eccentricity and center as the ellipse E^{O_ig}, with k_vir times the semimajor axis, whose direction is perpendicular to that of E^{O_ig}, where k_vir (k_vir < 1) and a are given values. F_at is set to a constant C_at when a point satisfies E^{O_ig} ≤ 0, and it is monotonically increasing in the region outside the minimum circumscribed ellipse.
Different from the object O_ig, every other object O_r generates a repulsive elliptical field F^{O_r}_rep in accordance with the fifth design principle, as given in equation (8), where d_{O_r O_ig} is the distance between the objects O_r and O_ig, and s is a given value. F^{O_r}_rep is set to a constant C_rep for points satisfying E^{O_r} ≤ 0. Note that F^{O_r}_rep is monotonically decreasing in the region outside the minimum circumscribed ellipse.
Considering all the objects, we calculate the resultant potential field F_res as

F_res = F_at + Σ_{r=1}^{N_r1+N_r2} F^{O_r}_rep    (9)

The placement that minimizes the resultant potential field is considered the best one, p*_{O_ig}, which is given by

p*_{O_ig} = argmin_{p ∈ W_m} F_res(p)    (10)

where p represents a position in the manipulator's workspace W_m. If there is more than one minimum solution, the robot chooses the solution closest to the object O_ig. Note that the point p*_{O_ig} is expressed with respect to O_dX_dY_d.

The flowchart of the proposed approach is shown in Figure 3. The image and depth information of the scene are first provided by the Kinect V2. The robot recognizes the detectable objects, and the others are treated as untrained objects. Combining with the depth information, the point cloud of each detectable or untrained object is then obtained by Euclidean clustering. Using PCA, we acquire the minimum 3D bounding box of each object, including its size and pose information. By comparing the Euclidean distance D_is between the target object and each nontarget object with a given threshold D_t, the nontarget objects are classified into interference objects and noninterference objects. If the number N_fer of interference objects is zero, the robot may grasp the target object directly; otherwise, the robot has to move the interference objects according to the placement solution based on elliptical cone potential fields, where an interference object with a smaller D_is has a higher priority. This process repeats until the robot grasps the target object.
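A grid-based solver for the best placement can be sketched as below. The attractive and repulsive terms are illustrative monotone functions of the elliptical cone value; the article's exact expressions with d_cg, k_vir, and s are not reproduced, and the constants here are assumed:

```python
import numpy as np

def cone(px, py, center, a, b, theta):
    """Elliptical cone value sqrt(x'²/a² + y'²/b²) evaluated on a grid."""
    dx, dy = px - center[0], py - center[1]
    xp = np.cos(theta) * dx + np.sin(theta) * dy
    yp = -np.sin(theta) * dx + np.cos(theta) * dy
    return np.sqrt((xp / a) ** 2 + (yp / b) ** 2)

def best_placement(grasped, others, workspace, n=200):
    """Solve argmin over a sampled workspace grid for the resultant field.
    Each object is a tuple (center, a, b, theta); 'grasped' is attractive,
    every object in 'others' is repulsive."""
    (x0, x1), (y0, y1) = workspace
    px, py = np.meshgrid(np.linspace(x0, x1, n), np.linspace(y0, y1, n))
    e_at = cone(px, py, *grasped)
    F = np.where(e_at <= 1.0, 1.0, e_at)            # attractive: constant inside, grows outward
    for obj in others:
        e_rep = cone(px, py, *obj)
        F = F + np.where(e_rep <= 1.0, 10.0, 10.0 / e_rep)  # repulsive: constant inside, decays outward
    i = np.unravel_index(np.argmin(F), F.shape)
    return px[i], py[i]
```

The minimum lands near the current grasped object on the side away from the repulsive objects, which matches the placement behavior described for equation (10).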

Experiments
In the experiments, a Kinect V2 is used to obtain the scene information, and a service robot with a six-degree-of-freedom Kinova manipulator is required to grasp the target object. SSD is adopted to detect objects with 2D red bounding boxes, and then we acquire the point clouds of the untrained objects. The experiments concern the following objects: a target apple, a cup, a beverage, and a box, where the box and the beverage belong to the untrained objects. Experiment 1 considers the box and the beverage as noninterference objects. Figure 4 shows the detection result of experiment 1, and Figure 5 describes the potential fields. The potential field of each object is shown in Figure 5(a) to (d), from which the resultant field in Figure 5(e) is obtained. In Figure 5(b), the beverage is close to the border of the robot workspace, and thus its potential field appears cut off. The video snapshots of the manipulator grasping the target apple are shown in Figure 6. The manipulator first moves the cup away and places it at a new position (−0.24, −0.49) determined by equation (10), and then the apple is grasped smoothly.
In experiment 2, the target apple is located at the bottom-right corner of the scene. The beverage is regarded as a noninterference object and the box as an interference object. Figure 7 shows the detection result of experiment 2. All the potential fields are shown in Figure 8; combining the potential fields of all objects shown in Figure 8(a) to (c) yields the resultant field in Figure 8(d). The video snapshots of the manipulator grasping are shown in Figure 9. The manipulator first moves the box to a new position (−0.12, −0.46), and the apple is then grasped smoothly. Experiment 3 considers two interference objects. The detection result is shown in Figure 10. Figures 11 and 12 demonstrate the potential fields and video snapshots of experiment 3, respectively. For a better description, the process is divided into two stages: in stage I the interference cup is moved away, and in stage II the interference box is moved. The potential fields of the box, apple, and cup in stage I are shown in Figure 11(a) to (c); at this moment, the current grasped object, the cup, corresponds to the attractive one. Based on the best placement obtained from the resultant field shown in Figure 11(d), the cup is moved to its new placement position, and the moving process is shown in Figure 12(a) to (d). After the cup is released, the robot continues with stage II. In this stage, the box becomes the current grasped object and thus generates an attractive field, whereas the cup corresponds to a repulsive one. Note that the target apple always generates a repulsive field. Combining the fields shown in Figure 11(b), (e), and (f), one can obtain the resultant field (see Figure 11(g)). On this basis, the robot grasps the box and moves it to its new placement position; the moving process is shown in Figure 12(e) to (g). Finally, the apple is grasped, as shown in Figure 12(h) and (i).

Conclusion
In this article, a robotic grasping approach with elliptical cone-based potential fields is proposed to handle the challenge of disturbed scenes. Based on the extraction results of SSD and Euclidean clustering for the detectable and untrained objects, the robot acquires the attractive or repulsive potential field of each object and determines the placements of the interference objects. Compared with the conventional circumscribed-circle envelope, the circumscribed-ellipse envelope used in this article is better because of its small occupied space among continuous closed convex curves, and the elliptical form reflects the poses of different objects with a better fitting degree. The resultant continuous elliptical cone fields improve the placement position of the interference object. The experimental results verify the effectiveness of the proposed approach. In the near future, we shall conduct deeper research on the moving sequence of interference objects as well as on grasping in larger environments, which also relies on the navigation ability of the robot.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.