Use of Colour and Shape Constraints in Vision-based Valve Operation by Robot

This paper proposes a new strategy for a humanoid robot to approach and operate a valve based on colour and shape constraints. The strategy consists of four stages, namely rough base approaching, fine base approaching, rough hand approaching, and fine hand approaching and grasping. In the first stage, the robot estimates the object's position using its stereo vision. In the second stage, a new visual positioning method is used to determine the valve's position and pose in the robot's frame. When its hands are near the valve, a visual servoing method is employed to catch the handle of the valve via cameras in the end-effectors. The advantages of both eye-in-hand and eye-to-hand systems are exploited. Experimental results are presented to verify the effectiveness of the proposed method.


Introduction
Hand-eye systems are widely used in robotics applications. They fall into two types: eye-in-hand systems (EIHS), which have cameras installed on and moving with the hands, and eye-to-hand systems (ETHS), which have cameras that do not move with the hands (Flandin 2000). EIHS is very popular in industrial robotics. When a manipulator approaches a target, the distance between the camera and the target is reduced, and the measurement error of the camera decreases. Visual control methods in an EIHS are divided into three types, namely image-based, position-based and hybrid (a combination of both). The image-based visual control method can effectively eliminate camera calibration error because of the closed loop established in image space. On the other hand, the absolute measurement error in position-based visual control is dramatically reduced as a manipulator approaches a target. The same holds for the hybrid visual control method (Hager 1996, Chaumette 2000, Corke 2000, Zhu 2000, Wells 2001). However, EIHS has a vital drawback: the object cannot be guaranteed to stay in the field of view of the cameras at all times, especially during pose adjustment of the hand at long range (Hager 1996). In contrast, ETHS can be used effectively in humanoid robots and mobile manipulators that operate in a large workspace. When the robot is far from a target, it travels toward it and stops at close range. Then, according to visual measurements, the manipulator approaches the target and manipulates it. To ensure that the end-effector can reach the target accurately, some researchers have designed special marks that are installed on the end-effector and the target (Han 2002, Cardenas 2003). The approaching task is realized through closed-loop control of the end-effectors. However, because the target or the markers may be partially blocked during approach or manipulation, image-based or hybrid visual control methods may not be able to bring the manipulator to the target accurately. As is well known, the position of an object in 3D space can be calculated from two image points using stereo cameras, according to the projecting view lines. The lack of constraints, errors in calibration and errors in the image coordinates of matching points result in large errors in object positioning and pose estimation. By using the shape constraints of an object and its multiple imaging points, positioning accuracy, and especially pose estimation accuracy, can be increased, and the influence of the last factor can be partly eliminated (Bartoli 2001). By combining ETHS and EIHS, a humanoid robot can use its hands to reach and manipulate an object accurately. In this paper, the advantages of both eye-to-hand and eye-in-hand systems are fully exploited in the development of a new positioning method. The blocking problem of the eye-to-hand system is effectively avoided because the cameras on the head are active, and the problem of losing the target from the field of view of the eye-in-hand system is resolved, since the end-effectors only adjust their positions within a small range.
The rest of this paper is organized as follows. Section 2 introduces our humanoid robot and the four-stage process for finding and manipulating a valve. The camera models are described in Section 3. Section 4 proposes a new visual positioning method based on rectangle constraints, which accurately provides the position and pose of the valve. System calibration is conducted in Section 5 to verify the accuracy of the proposed positioning method. Section 6 presents the application experiment, in which the humanoid robot approaches and operates the valve autonomously; the results show the effectiveness of the proposed method. Finally, Section 7 provides a brief conclusion.

The robot and its control strategy
As shown in Fig. 1, our humanoid robot consists of a head, a body with two arms and a wheeled mobile base. The robot body has three degrees of freedom (DOFs), i.e. twist, pitch and yaw. The two arms/manipulators have six DOFs each and are fixed, one on each side of the body. Each has an end-effector as its hand, and each wrist is equipped with a mini camera and force sensors. Note that from now on we treat the end-effector, gripper and hand as the same in this paper, without further explanation.
The robot head has two cameras as eyes and a PC104 computer to process images used to position the valve.Once the robot finds the valve, it moves towards it and operates it using its hands, as shown in Fig. 2. Operations include turning on or turning off the valve.These operations can be remotely controlled by an operator using audio commands sent via radio.
The process of finding and operating the valve consists of four stages as follows:
o Stage 1 − The robot first uses its stereo vision to estimate the rough position of the valve relative to its own position in order to approach the valve. At this stage, the centre of the image area of the red colour marker is selected as the feature point, and the pose of the valve is not important. When the distance between the valve and the robot is less than two metres, the first stage of the positioning method ends and the second stage begins. A new strategy is developed for measuring the position and pose of the valve, in the robot frame, based on the shape constraint of the marker.
o Stage 2 − According to the position and pose of the valve in the robot frame, the robot moves near to it, to a range reachable by its arms. The position and pose of the valve, calculated at the end of the 2nd stage, are used for the movement control of the arms in the 3rd stage of the positioning method. The goal pose of the end-effector of the robot arm is calculated and kept for the later stages.
o Stage 3 − The position that the hand should reach at this stage is calculated according to the position of the valve (by considering the positions of the mark and the handles). Based on kinematics and inverse kinematics, the hand is controlled to move towards the handle while the camera in the hand measures the image size of the green handle marker. The hand stops when the marker size is large enough or a given position is reached.
o Stage 4 − An image-based visual servoing method is adopted to guide the end-effector to reach and catch the handle. Finally, hybrid force/position control is employed to rotate the valve using the two hands.
With regard to control, several methods are employed in the process described above. The 1st and 2nd stages employ position-based visual servoing, the 3rd stage employs model-based control, and the 4th stage involves image-based visual servoing.
The position-based visual servoing methods in the 1st and 2nd stages and the model-based control method in the 3rd stage are traditional, and are omitted here for reasons of space. The pose of the valve, given at the end of the 2nd stage, is an important parameter because it ensures that the end-effector can catch the handle with the correct orientation. The visual positioning method of the 2nd stage is described in the next section.
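The stage transitions described above can be sketched as a simple state machine. This is an illustrative reconstruction, not the robot's actual controller: the two-metre switching distance comes from the text, while `MARKER_AREA_THRESHOLD` and the function names are hypothetical.

```python
from enum import Enum, auto

class Stage(Enum):
    ROUGH_BASE = auto()   # stage 1: stereo vision, rough valve position
    FINE_BASE = auto()    # stage 2: rectangle-constraint position/pose
    ROUGH_HAND = auto()   # stage 3: model-based move near the handle
    FINE_HAND = auto()    # stage 4: image-based visual servoing + grasp

SWITCH_DISTANCE_M = 2.0        # from the text: stage 1 ends inside 2 m
MARKER_AREA_THRESHOLD = 1500   # pixels; hypothetical stage-3 stop size

def next_stage(stage, valve_distance_m=None, marker_area_px=None,
               at_goal=False):
    """Advance the four-stage strategy based on simple conditions."""
    if stage is Stage.ROUGH_BASE and valve_distance_m is not None \
            and valve_distance_m < SWITCH_DISTANCE_M:
        return Stage.FINE_BASE
    if stage is Stage.FINE_BASE and at_goal:
        return Stage.ROUGH_HAND
    if stage is Stage.ROUGH_HAND and marker_area_px is not None \
            and (marker_area_px >= MARKER_AREA_THRESHOLD or at_goal):
        return Stage.FINE_HAND
    return stage  # otherwise stay in the current stage
```

The point of the sketch is only that each transition is triggered by a measurable condition: distance for stage 1, base pose for stage 2, marker size or goal position for stage 3.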

Camera model
To enlarge the field of view, 8 mm focal-length lenses are selected for the cameras in the robot head. However, lenses of this kind suffer from distortion, which needs to be corrected. In this research, distortion correction is carried out by simply changing the non-linear image into a linear one; in other words, the imaged curve of a straight line is corrected back to a straight line. To simplify the process, the non-linear model shown in (1) is used to describe the radial distortion:

[u d , v d ] = (1 + kr 2 )[u, v],  r 2 = u 2 + v 2 ,  (1)

where [u, v] are the ideal (undistorted) image coordinates of a point, [u d , v d ] are its coordinates in the practical (distorted) image, r is the radius from the centre of distortion and k is the radial distortion coefficient.
The intrinsic and extrinsic parameter models of the cameras are shown in (2) and (3):

z c [u, v, 1] T = M 1 [x c , y c , z c ] T ,  (2)

[x c , y c , z c ] T = M 2 [x w , y w , z w , 1] T ,  (3)

where [x c , y c , z c ] are the coordinates of a point in the camera frame and [x w , y w , z w ] are its coordinates in the object frame. M 1 is the intrinsic parameter matrix, in which [k x , k y ] are the magnification coefficients from the imaging-plane coordinates to the image coordinates and [u 0 , v 0 ] are the image coordinates of the centre of the optical axis. M 2 = [n v  o v  a v  p v ] is the extrinsic parameter matrix, whose rotation columns are n v , o v and a v and whose translation is p v .
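As a concrete (hypothetical) instance of models (1) and (2), the sketch below projects a camera-frame point to pixel coordinates and applies a one-coefficient radial distortion. The intrinsic values `k_x`, `k_y`, `u0`, `v0` and the coefficient `k1` are made-up placeholders, not the calibrated parameters of Table 1.

```python
import numpy as np

# Hypothetical intrinsics for illustration only.
k_x, k_y = 800.0, 800.0   # magnification coefficients [k_x, k_y]
u0, v0 = 320.0, 240.0     # principal point [u_0, v_0]
k1 = -0.12                # radial distortion coefficient (assumed)

M1 = np.array([[k_x, 0.0, u0],
               [0.0, k_y, v0],
               [0.0, 0.0, 1.0]])

def project(p_cam):
    """Pinhole projection as in (2): camera-frame point -> ideal pixels."""
    x_c, y_c, z_c = p_cam
    u, v, _ = (M1 @ np.array([x_c, y_c, z_c])) / z_c
    return u, v

def distort(u, v):
    """One-coefficient radial model in the spirit of (1)."""
    # Work in normalized coordinates relative to the principal point.
    x, y = (u - u0) / k_x, (v - v0) / k_y
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2
    return u0 + k_x * x * scale, v0 + k_y * y * scale
```

With a negative `k1` (barrel distortion), points are pulled toward the image centre; inverting this mapping is what the correction step of the paper performs.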

Visual positioning method
A red rectangular colour marker is attached to the valve, as shown in Fig. 2. The measurement of the position and pose of the valve is similar to that of the red marker. A target frame is established at the rectangle centre, taking the rectangle plane as the XOY plane. The line between the two handle markers acts as the X axis, as shown in Fig. 3. The rectangle is 2X w wide and 2Y w high, so the coordinates of the four vertices P 1 to P 4 are known in this frame. Obviously, any point on the plane satisfies z w = 0.

The derivation of vector n v
According to the orthogonal constraints of M 2 , we have (4), which is derived from (3) with the condition z w = 0,
where x′ c = x c /z c and y′ c = y c /z c are the normalized imaging coordinates; both can be obtained from (1) and (2) according to the imaging coordinates [u, v].
All points on a line parallel to the X axis have the same coordinate y w , so A 1 and B 1 are constants. Taking two points on the line arbitrarily, such as point i and point j, and applying them to (6), we obtain (7), and also its simplified form (8), which results from simplification using the orthogonal restriction of the rotation matrix M 2 .
Any two points on the same line parallel to the X axis must satisfy (8). Therefore, we can obtain two equations for one camera from the two lines parallel to the X axis, i.e. four such equations for two cameras. If the camera's optical axis is not perpendicular to the target plane, then n z ≠ 0 and (8) can be divided by n z to give (9). Solving these equations and applying the unit-norm constraint of (8) yields n x and n y ; n x is taken as positive, and the sign of n y is determined by (8).
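The paper's equations (4)-(9) are not reproduced here, but the quantity being derived can be illustrated with a standard equivalent: the images of 3D lines parallel to the target's X axis meet at a vanishing point, and the direction n v follows by applying the inverse intrinsic matrix. The sketch below uses this vanishing-point formulation with an assumed intrinsic matrix `K`; it is a stand-in for, not a transcription of, the derivation above.

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],   # assumed intrinsics for the sketch
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def vanishing_point(l1_pts, l2_pts):
    """Homogeneous intersection of the images of two parallel 3D lines.
    Each argument is a pair of pixel points on one image line."""
    def line_through(p, q):
        return np.cross([*p, 1.0], [*q, 1.0])
    return np.cross(line_through(*l1_pts), line_through(*l2_pts))

def direction_from_vp(vp):
    """3D direction of the parallel lines: d ~ K^-1 * vp, normalized
    (the sign is ambiguous, as in the paper's sign discussion)."""
    d = np.linalg.solve(K, vp)
    return d / np.linalg.norm(d)
```

Two image points per line suffice, which mirrors the paper's choice of two arbitrary points i and j on each line parallel to the X axis.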

The derivation of vector a v
According to (3) and the orthogonal restriction of the rotation matrix M 2 , we have (10) and (11). On a line parallel to the Y axis, x w is constant, so A 2 is a constant too. Because A 2 ≠ 0, B 1 ≠ 0 and z c ≠ 0, (11) becomes (12), where C 2 = B 1 /A 2 . Similarly, two equations are formed from the lines x = X w and x = −X w , as in (18). Combining (17) and (18), p x , p y and p z are calculated, where C 21 and C 22 correspond to the lines x = X w and x = −X w , respectively.

Fine positioning
In the camera frame, p v is the position vector of the origin of the target frame. Obviously, using p v and (2), the image coordinates of the target origin, [u b , v b ], are obtained easily.
Fig. 4 shows the relation between a space point and its imaging point. According to the camera's pinhole model, the target point P 1 in space and the point P′ 1 on the plane Z c = p z share the same imaging coordinates. Using the imaging coordinates of P′ 1 and the angle β, the x-coordinate of the point P 1 in the target frame is obtained, where β x is the angle between the projections of the Z axis of the target frame and the Z c axis onto the plane X c O c Z c , as shown in (20), and OP′ 1 is the offset on the X c axis between the points P′ 1 and O on the plane Z c = p z , as in (21). Applying (21) to (19), P 1x , the coordinate of P 1 on the X axis in the target frame, is calculated.
Similarly, P 1y , the coordinate of P 1 on the Y axis in the target frame, is also obtained. Here m 1x and m 1y in (22) and (23) are the coordinates of P 1 in the virtual target frame, which lies in the normalized focal imaging plane, i.e. p z = 1.
In the target plane, the offsets on the Y axis between the top brim and the bottom brim of the rectangle (i.e. between the i-th top and bottom edge points, which share the same coordinate on the X axis) are integrated along the X axis to obtain the area S of the target rectangle, where S 1 is the corresponding area on the normalized focal imaging plane and M is the number of samples along the X axis in the target frame.
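The area integration described above can be sketched numerically: sample the top and bottom brims at M positions along the X axis and sum the Y offsets. The helper below illustrates that sampling scheme and is not the paper's implementation; applied to normalized imaging coordinates, the same sum would give S 1 .

```python
import numpy as np

def rectangle_area_by_sampling(x_left, x_right, y_top, y_bottom, M=200):
    """Integrate (y_top - y_bottom) along X with M samples, in the
    spirit of the paper's computation of the target area S.
    y_top and y_bottom are callables giving the brim Y coordinates."""
    xs = np.linspace(x_left, x_right, M)
    dx = (x_right - x_left) / (M - 1)
    offsets = y_top(xs) - y_bottom(xs)  # Y offset between the two brims
    # Trapezoidal sum: exact for straight (rectangle) brims.
    return dx * (offsets.sum() - 0.5 * (offsets[0] + offsets[-1]))
```

For straight edges the trapezoidal sum is exact, so for the 98 mm × 100 mm test rectangle of Section 5 the sampled area recovers 9800 mm² regardless of M.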

Camera Calibration
The two cameras on the robot head were well calibrated using the method described in (Zhang 2000, Heikkela 2000).Their intrinsic parameters are shown in Table 1.
The extrinsic parameters of the left camera relative to the end of the industrial robot are given in (26).

Verification of the visual measurement
An experiment was designed and conducted to verify the proposed method using a rectangular colour marker attached to a panel. A red rectangle, 98 mm × 100 mm in size, was treated as the valve, and two green parts were used to simulate the valve handles. The robot head was installed on the end of an industrial robot, as shown in Fig. 5(a). The target was laid on the ground under the head. Images captured by the two MINTRON 8055MK cameras in the head are shown in Fig. 5(b).
In the experiment, the target was fixed on the ground under the robot head. The position and pose of the robot's hand were changed so that the cameras could capture the fixed target. The position and pose of the target relative to the left camera at the i-th sampling is denoted T ci , that of the robot's hand T ei , and that of the target in the world frame of the industrial robot T wi . The edges of the red rectangle were detected using a Hough transformation (Tzvi 1990) after distortion correction. Two points were then selected from each line to calculate T ci , the position and pose of the target, according to the method described in Section 4. The four vertices were computed as the intersections of these lines; the fine position vector p v was then obtained and T ci was refined. Table 2 shows the verification results.
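Given the Hough-detected edge lines in normal form (rho, theta), the vertices are pairwise line intersections. A minimal sketch (the function name and the example parameters are ours):

```python
import numpy as np

def hough_intersection(rho1, theta1, rho2, theta2):
    """Intersection of two lines given in Hough normal form
    x*cos(theta) + y*sin(theta) = rho (angles in radians)."""
    A = np.array([[np.cos(theta1), np.sin(theta1)],
                  [np.cos(theta2), np.sin(theta2)]])
    b = np.array([rho1, rho2])
    return np.linalg.solve(A, b)  # raises LinAlgError if lines are parallel

# The four rectangle vertices are the pairwise intersections of the two
# roughly horizontal and two roughly vertical edge lines.
```

Because each line is fitted from many edge pixels by the Hough transform, these intersections average out random per-pixel errors, which is the robustness argument made later in the comparison with plain stereovision.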

A comparison with traditional stereovision
To compare the proposed method with a traditional stereo vision method, another experiment was conducted. The four points at which the rectangle intersects the x-axis and y-axis of the object frame were selected as feature points for stereovision. Their positions in Cartesian space were computed and used to determine the position and pose of the object. Measurements were taken three times under the same conditions. Table 3 shows the measuring results for the position and pose of the object. The first column shows the results computed with the traditional stereovision method, while the second column shows the results for the proposed method. Position values are in mm. The results with stereo vision varied between runs, while the results with our method were unchanged.

Table 3. Measuring results for the position and pose of an object using stereovision and the rectangle constraint

Table 4 shows the positioning results for the four feature points P1 to P4 for the stereovision method and our proposed method. The results with the proposed method are formed using the coordinates of the feature points in the object frame together with the position and pose of the object frame. It can be seen that the positioning results with our method are very stable.

It should be noted that the method proposed in this paper computes the position and pose of the target from imaging points on the edge lines of the rectangle, detected through a Hough transformation. Even if some imaging points contain errors, the edge lines remain accurate enough, because the Hough transformation eliminates the influence of random errors. Furthermore, the method does not need feature-point matching. The measuring results for the proposed method are therefore stable and insensitive to random noise; in other words, the proposed method is more robust to noise than the traditional stereovision method. The errors that remain in measurements taken with the proposed method are mainly due to system errors, such as camera calibration errors. Errors in poses should be smaller than those in positions, i.e. the pose measurements have higher accuracy.

Experimental results
Based on the proposed method, experiments were designed and conducted for our humanoid robot to approach and operate a valve with a rectangular coloured marker attached to its panel, as shown in Fig. 2. The red rectangle was 100 mm in height and 100 mm in width. The pose of the valve handles was marked in green, with a direction consistent with the X axis in Fig. 3. The head with the two MINTRON 8055MK cameras is shown in Fig. 1. Two mini cameras were fixed on the wrists of the two manipulators. The cameras on the head were well calibrated, but the ones on the wrists were not.

Approaching the Valve by the mobile base
At the beginning, the robot searched for the target valve in the laboratory. When the valve was found, the 1st stage described in Section 2 started. When the valve was within two metres of the robot, the 2nd stage began and the method described above came into operation. The position and pose of the mobile base were adjusted according to those of the valve until the robot was in an adequate operational area. When the robot stopped moving, the position and pose of the valve relative to the head were measured again using the proposed method. The position and pose of the target valve relative to the chest of the humanoid robot were then obtained through coordinate transformation. Table 5 shows the position and pose of the target relative to the reference frame at the chest. The pose and position of the target relative to the two end-effectors could also be calculated through coordinate transformation. During the 1st and 2nd stages, neither arm was in operation; both were kept in a static position and pose. While the humanoid robot approached the target, the two arms were positioned so that they did not block the head's view of the target valve.

Moving the end-effectors near to the Handle
Once the robot was in an adequate operational area, the hands of both arms moved, one to each of the handles of the valve. At the same time, the cameras on the head were inactive. The goal position and pose of the two end-effectors were determined according to the pose and position of the valve given above. The goal positions of the hands, especially in the cameras' view direction, had an offset added in order to avoid collisions between the end-effectors and the valve caused by any residual error. The moving paths were planned to satisfy the positions with high priority so as to avoid collisions, except in the cameras' view direction. The movements were controlled using the kinematic and inverse kinematic models of the manipulators, so the end-effectors could move to the given goal quickly. At the same time, the camera at each hand was in operation to measure the size of the image area of the valve handle (the green colour marker on each side of the valve). The size of the green marker increases as the hand moves closer to the handle. When the size was large enough or a given position was reached, the position adjustment ended and the process moved to the 4th stage of the proposed positioning method.
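The stopping rule of the 3rd stage can be sketched as a simple marker-area test on the hand camera's image. The colour thresholds and `AREA_THRESHOLD` below are invented placeholders for the robot's real segmentation:

```python
import numpy as np

AREA_THRESHOLD = 1200  # pixels; hypothetical stopping threshold

def green_marker_area(rgb):
    """Count pixels where green clearly dominates red and blue.
    A crude colour test standing in for the robot's real segmentation."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    mask = (g > 100) & (g > r + 40) & (g > b + 40)
    return int(mask.sum())

def keep_approaching(rgb, at_goal=False):
    """Stage-3 rule: keep moving until the marker looks big enough
    or the planned goal position has been reached."""
    return not at_goal and green_marker_area(rgb) < AREA_THRESHOLD
```

The marker's pixel area grows roughly with the inverse square of the camera-handle distance, so a fixed area threshold acts as a crude proximity test without metric depth.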
Fig. 6 provides a pair of images of one hand captured at the end of the 3rd stage. The two images, Fig. 6(a) and Fig. 6(b), were captured by the left and right cameras of the robot head, respectively. It can be seen that the end-effector is close to the handle with an appropriate pose, which shows that the pose calculated by the proposed method has good accuracy.

Approaching and Catching the Handle
An image-based visual servoing method was applied in the 4th stage to guide the end-effectors to reach and catch each handle. As pointed out in (Hager 1996), image-based visual servoing methods for an eye-in-hand system have the drawback that the target object may leave the camera's field of view during pose adjustment of the end-effector, resulting in servoing failure. If only positions are adjusted, with the pose held stationary, this drawback is overcome. However, to ensure that the pose can stay stationary during servoing and the end-effector can still catch the handle with an appropriate pose, the pose of the end-effector must be set accurately at the beginning of visual servoing. This is why the pose of the valve needs to be measured accurately in the 3rd stage of the proposed positioning method and kept unchanged in the 4th stage. The goal of the image-based visual servoing is that the image of the green marker, representing the handle, should match a given reference image as closely as possible. The position adjustments of the end-effectors were given high priority, except in the cameras' view direction, to avoid collision with the valve handles. The end-effector was open during the visual servoing process. The goal was to adjust the end-effector position within a small range so that the gripper reached the handle with an appropriate pose, guided by the camera in the hand. The final part of the process involved the gripper closing to grasp the handle. A hybrid control method using force and position was then employed to rotate the valve with the robot's two hands; it is omitted here. In a series of experiments, the humanoid robot was able to autonomously find, reach and operate the valve successfully. These experiments show that the position and pose of the valve calculated using the proposed methods are accurate enough to guide the two arms in operating the valve. The advantages of using both eye-to-hand and eye-in-hand systems are clearly demonstrated.
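The paper does not give the servoing law explicitly; a minimal translation-only proportional sketch in its spirit, with the orientation held fixed as described, drives the marker's image toward the reference image point. The gain `LAMBDA` and the pixel-to-metre scale are assumptions:

```python
import numpy as np

LAMBDA = 0.5  # proportional gain (assumed)

def ibvs_step(feature, reference, pixels_per_metre=2000.0):
    """One translation-only IBVS update: move the hand, in the image-plane
    directions, proportionally to the image error, keeping the
    orientation fixed as in the 4th stage."""
    error = np.asarray(feature, float) - np.asarray(reference, float)
    # Map the pixel error to a small Cartesian correction of the hand.
    return -LAMBDA * error / pixels_per_metre

# Tiny simulated servo loop: here the feature moves rigidly with the hand.
feature = np.array([340.0, 260.0])
reference = np.array([320.0, 240.0])
for _ in range(20):
    feature = feature + ibvs_step(feature, reference) * 2000.0
```

In the simulation the image error halves on each iteration, illustrating why a small-range, fixed-orientation adjustment keeps the marker in view while converging on the grasp point.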

Conclusions
A new visual servoing strategy for a humanoid robot to approach and grasp a valve has been proposed. It consists of four stages, namely rough base approaching, fine base approaching, rough hand approaching, and fine hand approaching and grasping. As an important part of the process of autonomous valve manipulation, a visual positioning and control method was proposed for a hand-eye system using rectangular shape constraints. It employs multiple imaging points that lie on lines with known parameters in the object frame. Positioning accuracy and robustness, especially for the pose, were increased, and the influence of position errors in the images was reduced. Based on the position and pose of the valve calculated with the proposed method, the end-effectors could smoothly reach the valve handles under the guidance of the hand-eye system. The end-effectors of our humanoid robot caught the handles successfully and rotated the valve. The results verify the effectiveness of the proposed methods, and the reliability and robustness of the system were significantly improved. The methods employed can be widely applied in real-world applications of humanoid robots and mobile manipulators.

Fig. 1. The humanoid robot

Fig. 3. The objective frame of a rectangle

Fig. 4. Space position and imaging point

p x and p y are recalculated by applying p z to (17) and (18), to obtain a more precise target position vector p v .

Fig. 5. The experimental scene and target image

Fig. 6. Images captured by the cameras on the robot head

To improve accuracy, points on two lines parallel to the Y axis are used to calculate a x , a y and a z . It should be noted that each line parallel to the Y axis has a different constant C 2 . Two points are taken from two lines, one on the line y = Y w and the other on the line y = −Y w . The corresponding constants C 1 for the lines are calculated from (6), denoted C 11 and C 12 respectively, with C 11 corresponding to y = Y w . To enhance precision, more points on each line are used, and C 11 and C 12 are calculated as the averages of C 1 over each line. From C 11 and C 12 , two equations, (15) and (16), in p x , p y and p z are obtained, and equation (17) is derived from them.

Table 1. Camera parameters

Table 4. Positioning results for the feature points using stereovision and rectangle constraints

Table 2. Verification experiment results

Table 5. Position and pose of the valve