Volumetric Calibration of Stereo Camera in Visual Servo Based Robot Control

The primary objective of this paper is to propose a calibration method for a stereo camera used in visual servo control of a robot manipulator. Specifically, the projection matrix between the stereo camera and world coordinates is established using a small number of calibration points and solved using the singular value decomposition (SVD) technique. Calibration accuracy is then compared for a randomized and a designed set of points, and an economical number of calibration points is recommended. Additionally, the nonlinear lens distortion is modeled and corrected to improve the accuracy. This research also covers the development and implementation of a fully automated visual servo control system using a stereo camera calibrated by the proposed method.


Introduction
Since their initial introduction in the 1960s, industrial robots have been widely used in various fields of the manufacturing industry and the service sector due to their ability to increase production rates with high quality while decreasing maintenance and purchasing costs. Each year the worldwide population of industrial robots increases as the related industries regard them as more important because of the product superiority they can offer (Litzenberger, G., 2007). A key to securing this superiority is the addition of intelligence by means of sensors, including computer vision. For instance, industrial robots equipped with a computer vision system are considered more reliable and cost-effective in manufacturing and automation processes. The computer vision system employs a traditional easy-to-use single camera or a more recently developed stereo camera in order to directly determine 3D coordinates of the scene on the image plane without intensive computation. More specifically, in stereo vision (also known as binocular vision) two imaging devices are placed a known distance apart and fixed within the environment. Both cameras then scan the scene and generate a picture matrix in which, for any point in the scene, there are two pixels representing that point: one in the left camera image and the other in the right camera image (Staugaard, A.C., 1987). The aforementioned high reliability and cost-effectiveness of industrial robots with a computer vision system can be ensured by accurate calibration of the applied vision system. In general, a vision system is calibrated for two purposes: (i) projection, which computes the image coordinates from given world coordinates; or (ii) back-projection, which estimates the 3D position of a world point from acquired stereo image points (Tsai, R.Y., 1986).
The purpose of projection and back-projection is mainly to make position measurements for 3D applications, including dimensional inspection and robotic manipulation. The concepts of projection and back-projection for the successful calibration of vision systems have been widely accepted and studied; the approaches most closely related to this research include neural networks, photogrammetry, and stereoscopic 3D metrology. Neural networks (NN) possess the following features for camera calibration (Jun, J. & Kim, C., 1999):
• The network structure is an interconnection of nonlinear neurons, so NNs have the potential to learn the nonlinear imaging process.
• The basic principles behind supervised NN learning and camera calibration are the same: both use a set of known data to find system parameters and later apply these parameters to data unseen during the learning stage.
• A neural-network solution to a given problem can be considered model-free, in common with the nature of implicit camera calibration techniques.
Since projection and back-projection can be considered a mapping between image and world coordinates, the function identifying the mapping can be approximated by an NN without complicated mathematical modeling. However, the determination of a proper network architecture for a specific problem is ambiguous. In addition, there are two practical difficulties in approximating projection or back-projection (Do, Y., 1999): (i) it is difficult, for example, to find the optimal number of nodes in the hidden layer; and (ii) it is in general computationally inefficient to achieve an accurate approximation by training a neural network. On the other hand, photogrammetry, which is characterized by its image plane and the position of the cameras' optical centers, incorporates the use of the camera parameters: the internal and external parameters. Due to its simplicity, photogrammetry is widely accepted for stereo camera calibration in robot vision systems. In this approach, the parameters for the left and right cameras can be computed from knowledge of at least six (6) corresponding scene and image points with the help of a perspective transformation equation (Weckesser, P. & Wallner, F., 1993). The practical problem with this approach, however, is the exact determination of the positions of the reference points in the images. Stereoscopic 3D metrology is an example in which the physical camera parameters do not need to be identified, provided a 3D point observed by the stereo cameras can be accurately determined through the use of intermediate parameters.
This research deals with solutions for the back-projection problem, i.e., obtaining a projection matrix that is formed by the internal and external camera parameters for a specific camera position and orientation. More specifically, an efficient stereo camera (left and right camera) calibration algorithm is developed to ensure faster calibration with higher accuracy than complicated calibration processes for the stereo camera. The algorithm developed in this research is based on perspective projection and introduces the use of forward and inverse kinematics of the manipulator links. In addition, the singular value decomposition (SVD) technique is used to solve the implicit relationship between world coordinates and the image coordinates obtained from the left and right cameras. Importantly, a computational experiment is designed to improve the calibration accuracy, and the impact of different sets of calibration points in the robot working volume on the calibration accuracy is studied. The improvement in accuracy is evaluated against a randomized set of calibration points, and an economical number of calibration points is then determined from the experiment. For further improvement of the calibration accuracy, the nonlinear lens distortion must be corrected by removing the nonlinearity in the camera image. Therefore, in this research, the lens distortion of the given stereo camera is modeled and corrected, considering incorrect cropping and skewness (Ojanen, H., 1996). This paper is organized as follows: Section 2 explains the camera model based on pinhole perspective projection. In Section 3, the calibration algorithm for a stereo camera is discussed in detail. The model of the camera lens distortion and its correction is described in Section 4. Section 5 summarizes the experimental design and results.

Camera Model
In camera calibration, each camera contains a lens that forms a 2D projection of the 3D scene on the image plane where the sensor is located. This projection causes direct depth information to be lost, so that each point on the image plane corresponds to a ray in 3D space. Therefore, additional information is needed to determine the 3D coordinates corresponding to points on an image plane. More specifically, the transformation from 3D world coordinates to 2D image coordinates must be determined by solving for the unknown parameters of the camera model. This information can be obtained from the utilization of multiple cameras, multiple views with a single camera, or knowledge of the geometric relationship between feature points on the target. For simplicity, stereo vision is integrated in this research, allowing a pinhole camera model based on perspective projection to be utilized. Stereo vision (binocular vision) is achieved by two imaging devices placed a known distance apart. In this paper, the stereo camera is fixed within the environment. Fig. 1 illustrates the pinhole camera model. In the configuration of a pure pinhole camera model, the center of projection is at the origin O of the camera frame C. The image plane П is parallel to the xy plane of the camera frame and is displaced a distance f (the focal length) from O along the z-axis. The z-axis is called the optical axis, or the principal axis, and its intersection with the image plane is the principal point. Under this model, a world point and its image point are related by the perspective projection λ[u, v, 1]^T = M[X_w, Y_w, Z_w, 1]^T (1), where λ is a scale factor and M is the 3×4 projection matrix. Note that the coordinates in the world frame and image frame can be obtained from the robot controller and the image processing software, respectively. In this research, an IRB-140 articulated industrial robot from ABB Inc. has been used with Point Grey's Bumblebee stereo camera.
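As a minimal sketch of the pinhole projection described above (the focal length, principal point, and test coordinates below are illustrative assumptions, not values measured in this research):

```python
def project_pinhole(p_cam, f, u0, v0):
    """Project a 3D point given in the camera frame C onto the image
    plane using the pure pinhole model: the image of (x, y, z) lies
    where the ray through the projection center O pierces the plane
    z = f; the principal point (u0, v0) shifts the result into
    pixel coordinates."""
    x, y, z = p_cam
    u = f * x / z + u0
    v = f * y / z + v0
    return u, v

# Hypothetical example: a point 2 m in front of the camera,
# f = 800 px, principal point at the center of a 1024x768 image.
u, v = project_pinhole((0.1, -0.05, 2.0), f=800, u0=512, v0=384)
```

Note how depth is lost: any point on the same ray (e.g. (0.2, -0.1, 4.0)) projects to the same (u, v), which is why a second camera or other constraints are needed to recover 3D coordinates.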
Once the world and image coordinates are obtained, the objective of perspective projection is to solve for the projection matrix, M, which can be written as follows using equation (1) (Heikkila, J., 2000):

λ[u, v, 1]^T = M[X_w, Y_w, Z_w, 1]^T, where M = [m11 m12 m13 m14; m21 m22 m23 m24; m31 m32 m33 m34] (2)

After rearrangement (eliminating the scale factor λ), we obtain the following pair of equations for each point:

m11·X + m12·Y + m13·Z + m14 − u(m31·X + m32·Y + m33·Z + m34) = 0 (3)
m21·X + m22·Y + m23·Z + m24 − v(m31·X + m32·Y + m33·Z + m34) = 0 (4)

Thus, each set of n reference points in world and image coordinates generates two linear equations per point, which can be rearranged into a [2n×12] coefficient matrix acting on the 12 elements of M. However, since the projection matrix has 12 elements yet only 11 degrees of freedom, the scale of the matrix can be chosen and set to unity; in other words, m34 = 1. As a result, the system reduces to the linear least-squares problem

A m = b (5)

where A is the [2n×11] coefficient matrix, m collects the remaining 11 unknown elements of M, and b stacks the measured image coordinates. Given any set of n ≥ 6 calibration points, the solution of equation (5) can be readily obtained using several techniques such as the Moore-Penrose pseudo-inverse and singular value decomposition (SVD). In this research, SVD is used because it improves the calibration accuracy over the pseudo-inverse technique, in which the projection matrix may be poorly approximated depending on the residual values. After this, the intrinsic and extrinsic parameters of the camera can be found. The relation between the intrinsic parameters, which represent the inner camera imaging parameters (α, β, u0, v0), and the extrinsic parameters, which represent the coordinate transformation between the camera coordinate system and the world coordinate system (R and t), can be expressed as follows (Wu, Y., 2004):

M = K[R | t], where K = [α 0 u0; 0 β v0; 0 0 1] (6)

Applying the QR-decomposition technique, the intrinsic and extrinsic parameters can be found; the result is the essential tool for the camera calibration process.
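A minimal sketch of solving equation (5) for the projection matrix is shown below. The point values and matrix entries are hypothetical; `np.linalg.lstsq` is used for the solve, which internally relies on SVD:

```python
import numpy as np

def calibrate_projection_matrix(world_pts, image_pts):
    """Build the [2n x 11] matrix A and right-hand side b of
    equation (5), with m34 fixed to 1, and solve A m = b in the
    least-squares sense. Each world/image correspondence
    contributes two rows."""
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z]); b.append(u)
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z]); b.append(v)
    m, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float),
                            rcond=None)
    return np.append(m, 1.0).reshape(3, 4)   # restore m34 = 1

# Hypothetical check: project synthetic points through a known M
# (with m34 = 1) and recover it from the correspondences.
M_true = np.array([[800.0,   0.0, 512.0, 10.0],
                   [  0.0, 800.0, 384.0, 20.0],
                   [  0.0,   0.0,   0.5,  1.0]])
world = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0)
         for z in (1.0, 2.0)] + [(0.5, 0.3, 1.5), (0.2, 0.8, 1.2)]
image = []
for p in world:
    ph = M_true @ np.append(p, 1.0)
    image.append((ph[0] / ph[2], ph[1] / ph[2]))
M_est = calibrate_projection_matrix(world, image)
```

With noise-free synthetic correspondences, `M_est` matches `M_true` to numerical precision; with real measurements the least-squares residual absorbs the measurement noise.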

Stereo Camera Calibration
Calibration is the process of establishing an accurate relationship between the coordinates of a point within an extracted image and its corresponding real-world values, as modeled in Section 2. The process begins with acquiring an image of the desired scene within the robot working volume. In this research, a Bumblebee stereo camera from Point Grey Research has been used, mounted on a frame with a pan-and-tilt head as shown in Fig. 2(a). This camera takes an image of the tool center point (TCP) of the gripper attached at the end of the ABB IRB-140 industrial robot as shown in Fig. 2(b); the seven calibration points (n = 7) therefore result in a [14×11] matrix for each of the right and left cameras in the stereo configuration. Using the corresponding image coordinates for those seven calibration points, the projection matrix M can be obtained by applying the SVD technique to equation (5) in Section 2, where m34 is set equal to 1. The M matrices for the right and left cameras are used to produce two equations, (7) and (8), which are solved for the corresponding world coordinate values. This procedure is repeated for n = 7 to 21 randomly chosen calibration points in order to gauge the optimal number of initial calibration positions for producing the most accurate results. The accuracy of the proposed algorithm is verified by inputting the original point coordinates used to create the projection matrices back into the model; if the same image coordinates are entered into the module, the result should be the same world coordinate values. This verification is also performed in the other direction, inputting the world coordinates to reproduce the same image coordinates. Validation is achieved by moving the robot to ten (10) new arbitrary positions and finding their world coordinates with the proposed algorithm.
The values are cross-checked against the values from the robot controller, and the error is calculated. This validation approach is also repeated for a set of ten specified positions: a horizontal movement of the manipulator across the scene from left to right, changing only the y-direction while keeping the x- and z-directions fixed.
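The back-projection step can be sketched as follows. Each camera's projection matrix contributes two linear equations in the unknown world coordinates (the role of equations (7) and (8) in the text); stacking both cameras gives an overdetermined 4×3 system. The camera matrices below are hypothetical:

```python
import numpy as np

def back_project(M_left, M_right, uv_left, uv_right):
    """Recover the world coordinates of a point observed in both
    stereo images. From u*(m3 . p) = m1 . p (and similarly for v),
    each camera gives two linear equations in (X, Y, Z); the
    stacked 4x3 system is solved in the least-squares sense."""
    A, b = [], []
    for M, (u, v) in ((M_left, uv_left), (M_right, uv_right)):
        A.append(M[0, :3] - u * M[2, :3]); b.append(u * M[2, 3] - M[0, 3])
        A.append(M[1, :3] - v * M[2, :3]); b.append(v * M[2, 3] - M[1, 3])
    X, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return X

# Hypothetical stereo pair: identical intrinsics, 0.1 m baseline.
K = np.array([[800.0, 0.0, 512.0], [0.0, 800.0, 384.0], [0.0, 0.0, 1.0]])
M_left = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
M_right = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

P = np.array([0.2, 0.1, 2.0])                  # synthetic TCP position
def proj(M, P):
    ph = M @ np.append(P, 1.0)
    return ph[0] / ph[2], ph[1] / ph[2]
P_est = back_project(M_left, M_right, proj(M_left, P), proj(M_right, P))
```

With noise-free image points the recovered `P_est` equals the synthetic world point; with real pixel measurements the least-squares solve distributes the reprojection error over both views.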

Lens Distortion Removal
The traditional calibration techniques based on the ideal pinhole camera model suffer from inaccurate results because the lens distortion effect is ignored (Do, Y., 1999). The camera used in this research has the lens distortion shown in Fig. 3, which must be corrected to reduce the error; otherwise the ideal image coordinates are matched with distorted (u, v) coordinates. Therefore, in this research the lens distortion is modeled and corrected to improve the calibration accuracy. Ojanen (1996) defined the mapping for a fixed object-plane distance as follows: a point in the object plane at a distance r from the optical axis is imaged by the lens to a point in the film plane at a distance L(r) from the axis. A lens with no distortion has L(r) = r. The transverse magnification at a distance r from the optical axis is given by dL(r)/dr. Writing the distortion as Δ(r) = L(r) − r, we can approximate the lens function and magnification in terms of Δ(r) and Δ'(r) (= dΔ/dr) as follows:

L(r) = r + Δ(r) (10)
dL(r)/dr = 1 + Δ'(r) (11)

The process of correcting lens distortion starts with capturing an image of a rectangular grid of dots where the vertical and horizontal distances between the dots are known; in this research, 0.5 inches. The lens distortion is then modeled by applying equations (10) and (11) to the captured image with a resolution of 1024 by 768 pixels, which is shown in Fig. 4. The distortion model is then optimized to improve accuracy by examining different functional forms for Δ(r), such as polynomial, sinusoidal and cubic-spline functions. This optimization process is shown in Fig. 5. After the optimization process, parameters such as the distortion functions are saved into a file and used to rectify all images taken by the camera. This entire process is conducted for both the left and right cameras of the stereo camera used in this research. Finally, the corrected images without lens distortion are used for the calibration. Fig. 6 compares captured images before and after lens distortion removal.
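A sketch of the correction step is shown below, assuming an odd polynomial form Δ(r) = k1·r³ + k2·r⁵ for the distortion function (one of several candidate forms; the coefficients and image center used here are illustrative, not the fitted values from this research). A distorted point at radius r_d is pulled back to the undistorted radius r solving L(r) = r_d by Newton's method:

```python
import numpy as np

def undistort_points(uv, center, k1, k2):
    """Correct radially distorted pixel coordinates using the lens
    function L(r) = r + Delta(r) with an assumed polynomial
    Delta(r) = k1*r**3 + k2*r**5. For each point at distorted
    radius r_d from the optical axis, solve L(r) = r_d for the
    ideal radius r with Newton iterations (using equation (11)
    for the derivative), then rescale the point."""
    uv = np.asarray(uv, float) - center
    r_d = np.hypot(uv[..., 0], uv[..., 1])
    r = r_d.copy()
    for _ in range(10):
        L = r + k1 * r**3 + k2 * r**5           # equation (10)
        dL = 1.0 + 3 * k1 * r**2 + 5 * k2 * r**4  # equation (11)
        r = r - (L - r_d) / dL
    scale = np.where(r_d > 0, r / np.where(r_d > 0, r_d, 1.0), 1.0)
    return uv * scale[..., None] + center

# Hypothetical example: a grid dot 100 px from the axis, displaced
# outward by barrel-like distortion with k1 = 1e-7, k2 = 0.
center = np.array([512.0, 384.0])
r = 100.0
r_d = r + 1e-7 * r**3                           # distorted radius
recovered = undistort_points([[512.0 + r_d, 384.0]], center, 1e-7, 0.0)
```

In practice the coefficients would be fitted so that the undistorted dot-grid image has equally spaced, collinear dots, and the saved function would then be applied to every image from each camera.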

Experimental Results
In this research, camera calibration is performed for two cases: (i) a randomized set of calibration points; and (ii) a designed set of calibration points from the robot working volume. For each case, the number of calibration positions (n) ranges from 7 to 21. Based on the experimental results for these two cases, the overall calibration performance is evaluated and the economical number of calibration points is also determined. Fig. 7 shows the randomly chosen calibration points used in this research as they appear on the image captured.

Randomized set
Here, the yellow dots represent the locations of the TCP as marked by the crosshairs. Calibration has been performed with different numbers of points ranging from 7 to 21. Although the world coordinates of these points are widely spread in the robot working volume, they appear irregularly distributed over certain areas of the image captured at a specific orientation of the stereo camera. For example, with the 21 calibration points shown in Fig. 7, the upper-right corner of the captured image is calibrated with relatively more points (13, 16, 20) than the lower-left corner, which has only one calibration point (17). This leads to variations in the camera calibration, such that visual servo control in an area with more calibration points would outperform the same control in an area with relatively fewer calibration points. The proposed calibration algorithm is verified by the average error (R) between the actual coordinates and the coordinates obtained from the projection matrices. Fig. 8 shows the change of the R-value with respect to the number of calibration points ranging from 7 to 21. It can be observed from the figure that the error increases as the number of calibration points increases. This is because, as the number of calibration points increases, the size of the A-matrix increases while the number of unknowns remains the same; the possibility of higher error is therefore introduced when attempting to recalculate the initial values used to formulate it, following a normal regression trend. On the other hand, ten new points (test points 1 to 10) have been used for the validation of the calibration model, as shown in Fig. 9. Here, the ten points differ only in their y-coordinates, with the x- and z-coordinates kept the same. Depending on the setting of the camera orientation, those points may not appear in a straight line in the image plane even though they physically lie on a straight line.
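The average error R between actual and recovered coordinates, as described above, can be computed as a mean Euclidean distance (the exact formula is not reproduced in this text, so this is an assumed but standard form of such a measure):

```python
import numpy as np

def average_error(actual, estimated):
    """Average Euclidean distance R between the actual world
    coordinates and the coordinates recovered through the
    projection matrices, over all n test points."""
    actual = np.asarray(actual, float)
    estimated = np.asarray(estimated, float)
    return float(np.mean(np.linalg.norm(actual - estimated, axis=1)))

# Hypothetical example: one point off by a 3-4-5 triangle, one exact.
R = average_error([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
                  [[3.0, 4.0, 0.0], [1.0, 1.0, 1.0]])
```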
For each test point, an image is captured and the corresponding world coordinates are calculated using the calibration model with different numbers of calibration points from 7 to 21. Fig. 10 shows the percent error of the x-coordinate with respect to the number of calibration points for test points 1 to 10. It can be observed from Fig. 10 that test positions 1 and 2 follow a decreasing error trend while the rest, with the exception of position 8, stay within an error range between 0.5% and 3%.

Fig. 11. Percent error of y- and z-coordinates

Position 8 experiences a higher percent error, averaging 3.58%, and does not improve as n increases. It is conjectured that position 8 is located where relatively few calibration points were used in the calibration process, so it has a relatively large error. This becomes more evident when the percent errors of the y- and z-coordinates are analyzed, as shown in Fig. 11. The test points close to the more densely populated calibration areas produce less percent error. Yet test positions such as 4 and 5, which lie within less populated areas, steadily increase in error as n increases. This is not the case in the x-coordinate results because the y-coordinate is the only coordinate direction changing between positions, making it more susceptible to error. It should be pointed out that these results contradict the expected effects of lens distortion. More specifically, as previously shown, the image is less distorted near the center and experiences the most distortion at the outer edges. Therefore, it is expected that the test points close to the center of the image should have the least error. Nevertheless, position 4 is very close to the center of the image and possesses the most error. Two possible explanations are conjectured: either the lens distortion is actually a combination of barrel and pincushion distortion, or the placement of the calibration positions mainly affects the accuracy and effectiveness of the calculated projection matrices used within the calibration algorithm. These observations lead to the creation of a more systematic method of positioning the calibration points in order to reduce the overall error experienced by these test positions.

Designed set with lens distortion
The designed set of calibration points is chosen in a manner that systematically spreads the calibration positions throughout the robot working volume, as illustrated in Fig. 12.
Note that the grid shown with dashed lines in the figure covers most of the robot working space with various orientations of the gripper, the so-called dexterous space. The actual size of the grid is X by Y by Z. In this case, the number of calibration points ranges from 8 to 21. Following the same procedure used in the randomized-set case, the R-value has been obtained as shown in Fig. 13 and compared to the randomized case. It is observed that the R-value error in the designed case is larger than in the randomized case; however, the error no longer increases with the number of positions, while in the randomized case the error increases as the number of calibration points increases. It should be pointed out that the errors for n = 14 and 18 are relatively larger than for other values because of an absence of calibration points inside the grid block. Although the same configuration can be noticed for n = 10, 12 and 16, the contributions of the other points provide enough information to keep the error within range at these points. The calibration model in the designed case has been validated using the same ten test positions as in the randomized case. The behavior of the x-coordinates in the designed-set case is shown in Fig. 14, and it is observed that the error decreases as the number of calibration positions, n, increases. Also, the same situation occurs as in the randomized case, where the test positions located toward the center of the image experience larger errors than those located along the border of the image. This implies that although introducing large numbers of designed calibration positions can reduce projection errors, lens distortion still has a significant influence. Fig. 15(left) presents the R-value error obtained from undistorted images of the designed set of calibration points. Here, the error appears more stable and no obvious fluctuation is observed.
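The "designed set" idea of spreading positions systematically over the dexterous space can be sketched as a regular grid (the ranges and counts below are illustrative; the paper's actual grid dimensions are left unspecified as "X by Y by Z"):

```python
import numpy as np

def designed_calibration_points(x_range, y_range, z_range, n_per_axis):
    """Generate calibration positions as a regular grid spanning
    the given working-volume ranges, so every region of the image
    receives a comparable density of calibration points (unlike a
    randomized set, which may cluster in one corner)."""
    xs = np.linspace(*x_range, n_per_axis)
    ys = np.linspace(*y_range, n_per_axis)
    zs = np.linspace(*z_range, n_per_axis)
    return np.array([(x, y, z) for x in xs for y in ys for z in zs])

# Hypothetical working volume in metres, 2 points per axis = 8 corners.
pts = designed_calibration_points((0.3, 0.6), (-0.2, 0.2), (0.1, 0.4), 2)
```

For n values that are not perfect cubes (as in the 8-to-21 range used in the experiments), the grid would be subsampled or augmented with interior points; the key property is coverage of the whole volume rather than the exact count per axis.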
In addition, the amount of error decreases from 15% to 5% with the correction of lens distortion. It should be pointed out that for n ≥ 9 the R-value error becomes stable; however, this should be supported by the model validation using the ten test points. Fig. 15(right) shows the results of the percent error of the y-coordinate, for example, measured with undistorted images of the designed calibration points for the ten test points. The percent error has been reduced to about 2% due to the correction of lens distortion. Note that the percent error becomes stable from n = 11 calibration points, so any number of calibration points greater than or equal to n = 11 can be recommended for real applications. It should be pointed out that the calibration module developed in this research is part of a fully automated visual servo control system for the ABB IRB-140 robot manipulator. Fig. 16 presents the flow chart of the visual servo system and snapshots of each stage of the whole system. For more details about the visual servo system software, contact the authors.

Conclusion
In this paper, an efficient stereo camera calibration method has been developed and implemented in a fully automated visual servo control system for an industrial robot. To ensure efficiency, the singular value decomposition technique has been used to solve the mapping between image and world coordinates in the stereo camera. To improve the calibration accuracy, a randomized set and a designed set of calibration points have been used to establish the mapping relationship, and it has been shown experimentally that the designed set improved the calibration accuracy from about 4% error to about 2% error. The lens distortion has also been successfully modeled and corrected in this research. In addition, from the experiment, any number of calibration points greater than or equal to n = 11 can be suggested for real applications, with almost 2% error. In this research, a fully automated visual servo control system using Matlab software with the calibration algorithm has been developed and applied to a simple pick-and-place operation using the ABB IRB-140 industrial robot. In the future, the developed visual servo control system will evolve toward more complicated applications such as the robotic deburring of an engine block.