Moving target tracking method for unmanned aerial vehicle/unmanned ground vehicle heterogeneous system based on AprilTags

Exploiting the complementary characteristics of unmanned aerial vehicles and unmanned ground vehicles, heterogeneous systems can accomplish many complex tasks cooperatively. Moving target tracking is an important basis for the relative positioning and formation maintenance of heterogeneous cooperative systems. This paper first introduces the unmanned aerial vehicle/unmanned ground vehicle collaborative tracking task and the heterogeneous system. To preserve the original flight stability of the unmanned aerial vehicle, a control method that simulates the remote control through the SBUS protocol is proposed. For the unmanned ground vehicle with Mecanum wheels, a control method is designed and described in detail. To address real-time performance and occlusion, a tracking scheme based on AprilTag identification is studied: the scheme tracks the Tag target when there is no occlusion, and tracks the color feature around the Tag when occlusion occurs, which greatly improves tracking accuracy and robustness to occlusion. Finally, the scheme is applied to the heterogeneous system. Simulation and experimental results show that the proposed method is suitable for the unmanned aerial vehicle/unmanned ground vehicle heterogeneous system to perform the collaborative tracking task.


Introduction
With the rapid development of science and technology, the applications of aerial robots (unmanned aerial vehicles (UAVs)) and ground robots (unmanned ground vehicles (UGVs)) are receiving increasing attention from researchers all over the world. Recently, the UAV/UGV heterogeneous system [1][2][3] has become a hot topic. In many missions, UAVs can quickly scout vast areas. In contrast, UGVs can accurately locate ground targets and complete complex interactions, but move slowly. Therefore, UAVs and UGVs can cooperate to finish many complex tasks.
In a UAV/UGV heterogeneous system, insufficient real-time performance and accuracy will lead to failures in collaboration. Therefore, the target tracking algorithm is an important basis for the relative positioning and formation of UAV/UGV teams. [4][5][6] The task of target tracking is to establish the position of an object through a continuous video sequence. Since attitude, scale, occlusion, and lighting change constantly during movement, much research has been carried out on this problem.
Moving target recognition and tracking mainly include the following classic methods: the frame difference method, 7 the MeanShift algorithm, 8 and the optical flow method. 9 The frame difference method is fast to compute and usually used in camera calibration. MeanShift is simple and easy to implement; however, its tracking performance is poor for targets that move fast or change in scale. CamShift [10][11][12] improved on MeanShift to solve the scale problem. CamShift converts the image from RGB space to HSV space to reduce the effect of illumination. Since only the color feature is used, tracking performance is poor when the background color is complex or close to the color of the target. The optical flow method is easily affected by illumination. To recover a target after it has been lost, combinations of detection and tracking have been proposed, such as the TLD (tracking-learning-detection) method. 13 The TLD algorithm achieves long-term tracking of a single target and solves the loss-of-target problem caused by target deformation and partial occlusion; however, because of its global search, its real-time performance is not satisfactory. Correlation filtering algorithms adopt a local search method with obvious advantages in real-time performance, but they have difficulty tracking targets that move at high speed or stay occluded for long, such as KCF (kernelized correlation filter) 14 based on the HOG (histogram of oriented gradients) feature. In 2016, Danelljan et al. 15 proposed the ECO (Efficient Convolution Operators) target tracking algorithm, a successor to C-COT (Learning Continuous Convolution Operators for Visual Tracking), 16 which greatly improved tracking speed. The deep learning version of ECO reaches about 8 FPS and ECO-HC about 60 FPS, making ECO one of the better tracking algorithms. However, the heterogeneous system requires highly accurate, real-time tracking, so these algorithms are not well suited to deployment on embedded devices.
Many scholars have applied tracking algorithms to unmanned devices. Qu et al. 17 studied long-term reliable visual tracking for UAVs and compared KCF and TLD with their proposed algorithm; however, the UAV was only used to shoot videos as a test dataset for comparison, and the research was not verified by physical experiments. AprilTag 18,19 is a visual fiducial library that is widely used in UAV positioning guidance. Xiao et al. 20 used AprilTag to visually locate a tethered UAV, but their experiment was carried out in a barrier-free indoor environment. Wang et al. 21 used AprilTags to implement UAV tracking of a UGV, but the method was only tested in the Gazebo simulation software.
This paper first introduces a UAV/UGV heterogeneous system. Then a quadrotor control method that simulates the remote control (RC) through the SBUS protocol is proposed. The design and control of a UGV with Mecanum wheels are also introduced. To improve real-time performance and accuracy, a tracking scheme based on AprilTag is proposed, with color information added around the original Tag to handle occlusion. In the experiments, the tracking scheme is deployed on the developed UAV/UGV heterogeneous system, and the results show that the proposed method is suitable for the collaborative tracking task. The experiments consist of two parts: first, a simulation experiment verifies the feasibility of the AprilTag algorithm; second, a physical experiment is completed by the heterogeneous system composed of the quadrotor and the Mecanum unmanned vehicle. The main contribution of this paper is the establishment and verification of a system that can provide a physical experiment platform for different control and image processing methods.

System structure
The heterogeneous system proposed in this paper is composed of a UAV and a UGV. The prototype is developed with a quadrotor as the UAV and a Mecanum vehicle as the UGV, and the system is finally tested in many physical experiments. Section ''UAV/UGV heterogeneous system'' therefore introduces the structure of the UAV/UGV heterogeneous system with the quadrotor and Mecanum vehicle as examples, and section ''UAV/UGV tracking system'' introduces the collaborative tracking function of the heterogeneous system.

UAV/UGV heterogeneous system
In the air, a UAV can move quickly to survey vast areas and even perform aerial fire suppression, but its ability to localize targets on the ground is limited. In contrast, a UGV cannot move rapidly or see over obstacles, but it can carry out complex and accurate interaction with the environment. The heterogeneous system consisting of UAV and UGV is not simply a change from ''single agent'' to ''multiple agents'': the heterogeneity brings unique advantages to collaborative missions and makes 1 + 1 > 2.

A UAV equipped with a camera and other sensors can obtain a two-dimensional (2D) horizontal image of the environment in front of the UGV. It supplements the obstacle information in front of the UGV, so it can provide local/global image information for UGV obstacle avoidance. Based on the mutual awareness of UAV and UGV, the heterogeneous system can achieve complex tasks such as cluster formation, avoidance guidance, and information fusion (Figure 1).

UAV/UGV tracking system
In Figure 1, the gray block in the diagram is relative positioning, which is particularly critical for the heterogeneous system. Although GPS (Global Positioning System) can achieve relative positioning of the two types of unmanned devices, its accuracy is low. Collaborative positioning is therefore an important connection between the UAV and UGV when the heterogeneous system completes collaborative tasks. Since the collaborative system operates at close range, tracking accuracy and real-time performance are the key issues that must be solved.
The tracking system in this paper consists of two parts: a UAV tracking subsystem and a UGV tracking subsystem (Figure 2). The UGV automatically tracks a moving target in front of it using AprilTag. The UAV automatically tracks the UGV and obtains the relative position and attitude with respect to the UGV.
In the figure, the upper left is a UAV, which is equipped with a camera that recognizes the UGV below. In the middle is UGV, which has an AprilTag tag (tag36h11_1) on its deck. The UGV is equipped with a camera that recognizes moving targets in front. The lower right is a moving target with an AprilTag attached.

Design and control of UAV
The UAV used in the heterogeneous system is a quadrotor. Its main structure consists of a frame, four motors, four electronic speed controllers, a signal translator, a flight controller, and wireless communication equipment. The accelerometer, gyroscope, magnetometer, and barometer on the flight controller enable attitude estimation. The hardware structure is shown in Figure 3.
The OpenMV camera acquires images in real time and sends the processed coordinates to the UAV, which automatically tracks the target according to the coordinates. In this system, the RC has the highest control authority and can be switched to manual mode at any time; this safeguard ensures the safety of the system when the UAV loses its target.

UAV modeling and automatic control
At present, there are many control algorithms for quadrotors, such as the linear quadratic regulator (LQR), 22 adaptive control, 23 genetic algorithms, 24,25 the proportional-integral-derivative (PID) controller, 26 and so on. To ensure control stability, the PID method is adopted in this paper and in the later physical experiments. First, the model of the quadrotor is established. Assuming that the quadrotor is rigid and symmetrical and that the origin of the body frame coincides with the center of gravity, the translational dynamics of the quadrotor can be described as 27

ẍ = (cos φ sin θ cos ψ + sin φ sin ψ) u_1/m + d_x
ÿ = (cos φ sin θ sin ψ − sin φ cos ψ) u_1/m + d_y
z̈ = (cos φ cos θ) u_1/m − g + d_z

where θ is the pitch angle, ψ is the yaw angle, φ is the roll angle, g is the gravitational acceleration, and d_x, d_y, and d_z are additional disturbances caused by external forces. The total thrust along the z-axis is

u_1 = F_1 + F_2 + F_3 + F_4

where m is the mass of the quadrotor and F_i (i = 1, 2, 3, 4) is the thrust of the i-th rotor. u_2 and u_3 are the pitch and roll inputs, respectively, and u_4 represents the yawing moment.
The structure of the PID controller system is shown in Figure 4.
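As a concrete illustration of the controller structure in Figure 4, the following is a minimal positional PID sketch in Python. The class name, gains, and the integral anti-windup clamp are illustrative assumptions, not the authors' implementation:

```python
class PID:
    """Positional PID controller with a simple integral clamp (illustrative)."""

    def __init__(self, kp, ki, kd, i_limit=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.i_limit = i_limit      # anti-windup bound (assumption)
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # Accumulate the integral term, clamped to avoid windup.
        self.integral = max(-self.i_limit,
                            min(self.i_limit, self.integral + error * dt))
        # Finite-difference derivative; zero on the first call.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In practice one such loop per controlled axis (e.g. roll, pitch, altitude) would run at the flight controller's update rate.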

Analog RC based on SBUS protocol
In this paper, the main contribution to UAV control is a multi-rotor autonomous flight control method based on simulating the RC through the SBUS protocol. A signal generation device simulates RC signals to control autonomous flight, so the method maintains the original stability of the UAV and reduces the workload of developing a flight controller. A Pixhawk flight controller is used and connected to a signal translator (Figure 5). The signal translator decodes and re-encodes the RC input signal and generates the control signal for the flight controller, so as to realize automatic/semi-automatic control of the UAV.
The UAV has three flight modes: manual RC mode, automatic flight mode, and emergency stop mode. The flow chart of control is shown in Figure 6.

Manual RC mode
The receiver receives RC data in real time and sends it to the signal translator through the SBUS protocol. The signal translator decodes and re-encodes the data and forwards it to the flight controller through the SBUS protocol unchanged, thereby achieving manual flight control.

Automatic flight mode
The signal translator decodes the data transmitted from the receiver but does not forward it to the flight controller. Instead, the controller calculates the signal for each channel and then encodes and transmits these signals to achieve autonomous flight.

Emergency stop mode
The signal translator immediately sets all channel values to their middle value and then encodes and sends them, so that the UAV hovers at a fixed point to ensure flight safety.
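All three modes ultimately emit standard SBUS frames. As a sketch of what the signal translator encodes, the following packs 16 channel values into the 25-byte SBUS frame: a 0x0F header, 22 data bytes holding 16 channels of 11 bits each (least-significant bits first), a flags byte, and a 0x00 footer. The function name and the simplified flag handling are illustrative:

```python
def encode_sbus(channels, flags=0x00):
    """Pack 16 channel values (0..2047, 11 bits each) into a 25-byte SBUS frame."""
    assert len(channels) == 16
    frame = bytearray(25)
    frame[0] = 0x0F                 # SBUS start byte
    bits, bitcount, idx = 0, 0, 1
    for ch in channels:
        bits |= (ch & 0x07FF) << bitcount   # append 11 bits, LSB first
        bitcount += 11
        while bitcount >= 8:                # flush whole bytes
            frame[idx] = bits & 0xFF
            idx += 1
            bits >>= 8
            bitcount -= 8
    frame[23] = flags               # failsafe / frame-lost flags
    frame[24] = 0x00                # end byte
    return bytes(frame)
```

The resulting frame would be sent over an inverted 100,000-baud 8E2 serial link, which is how SBUS is normally transported.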

Target tracking of UAV
An OpenMV camera collects the image below the UAV and determines whether a Tag is present. If so, the deviation between the center of the tag and the center of the image is calculated. The control system then computes the UAV control output through an incremental PID, encodes the data, and sends it to the UAV to track the target. If no target is detected or the target is lost, the UAV automatically hovers until it finds the target or receives a landing command.
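The incremental (velocity-form) PID mentioned above outputs a control increment computed from the last three errors rather than an absolute control value. A minimal sketch, with illustrative class and gain names:

```python
class IncrementalPID:
    """Incremental PID:
    delta_u = Kp*(e_k - e_{k-1}) + Ki*e_k + Kd*(e_k - 2*e_{k-1} + e_{k-2})."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e1 = 0.0   # e_{k-1}
        self.e2 = 0.0   # e_{k-2}

    def step(self, error):
        delta = (self.kp * (error - self.e1)
                 + self.ki * error
                 + self.kd * (error - 2.0 * self.e1 + self.e2))
        self.e2, self.e1 = self.e1, error
        return delta
```

Here the error would be the pixel deviation between the tag center and the image center, and the increment would be added to the current channel command before SBUS encoding.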

Design and control of UGV
The UGV is mainly composed of a car body, four motors and their drivers, a main control board, and wireless communication equipment (Figure 7).
The OpenMV camera acquires images in real time and transmits the processed coordinates to the UGV, which then tracks and moves toward the target autonomously.

Modeling and control of UGV
To simplify the mathematical model of kinematics, some assumptions are made:
1. The omnidirectional wheels do not slip, and the ground provides sufficient friction.
2. The four wheels are located at the corners of a rectangle or square, and the wheels are parallel to each other.
Assuming that the body coordinate system coincides with the geographic coordinate system, the Mecanum UGV motion direction is specified (Figure 8).
According to Figure 8, the motion of the UGV can be linearly decomposed into three components: movement along the X and Y directions and rotation around the Z direction. V_A, V_B, V_C, and V_D represent the velocities of the four wheels (Motor A, Motor B, Motor C, and Motor D), respectively. V_x is the velocity of the UGV along the x-axis, V_y is the velocity along the y-axis, and ω is the angular velocity around the z-axis. Here, a is half the width of the UGV and b is half its length.
When the UGV moves along the x-axis only, the wheel speeds satisfy

V_A = V_B = V_C = V_D = V_x    (2)

When the UGV moves along the y-axis only,

V_A = V_D = −V_y,  V_B = V_C = V_y    (3)

When the UGV rotates around its geometric center,

V_A = V_C = −(a + b)ω,  V_B = V_D = (a + b)ω    (4)

Based on equations (2)-(4), the velocity of each wheel can be calculated from the state of the UGV by superposition:

V_A = V_x − V_y − (a + b)ω
V_B = V_x + V_y + (a + b)ω
V_C = V_x + V_y − (a + b)ω
V_D = V_x − V_y + (a + b)ω    (5)

where the signs follow the direction convention of Figure 8. Implemented in C with input parameters V_x, V_y, and ω, the speeds of the four motors are calculated and sent to the PID controller of the UGV.
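The superposition of the three motion components amounts to a short inverse-kinematics routine. The paper implements this in C; the Python sketch below uses one common sign convention for an X-configuration Mecanum base, which may differ from the actual wheel/roller orientation in Figure 8:

```python
def mecanum_wheel_speeds(vx, vy, omega, a, b):
    """Inverse kinematics for a four-wheel Mecanum base.
    vx: forward speed, vy: lateral speed, omega: yaw rate,
    a: half the track width, b: half the wheelbase.
    Labeling (assumption): A = front-left, B = front-right,
    C = rear-left, D = rear-right."""
    k = a + b
    va = vx - vy - k * omega
    vb = vx + vy + k * omega
    vc = vx + vy - k * omega
    vd = vx - vy + k * omega
    return va, vb, vc, vd
```

Pure forward motion drives all four wheels equally; strafing and rotation appear as sign-alternating terms, which is what the Mecanum rollers make physically possible.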

Control mode of UGV
The UGV has two control schemes: remote control and automatic control. Each control scheme has two motion modes: speed mode and displacement mode. The remote control mode receives data from the RC or a mobile phone. The automatic mode receives commands transmitted by other controllers through the serial port.
Speed control mode. When the UGV is in speed control mode, the format of the speed control data in each frame is shown in Table 1. According to Figure 8, Tx[1] controls the speed of motor A, Tx[2] the speed of motor B, Tx[3] the speed of motor C, and Tx[4] the speed of motor D.
Tx[7] is the direction control byte with 8 bits of data. The upper 4 bits are left at their default values, and the lower 4 bits control the directions of the four motors (Table 2).
Displacement control mode. When the UGV is in displacement control mode, a displacement is input, and the Mecanum UGV automatically performs the kinematic analysis and converts the displacement to speeds. A 16-bit unsigned number synthesized from Tx[1] and Tx[2] controls the x-axis displacement; a 16-bit unsigned number synthesized from Tx[3] and Tx[4] controls the y-axis displacement; and a 16-bit unsigned number synthesized from Tx[5] and Tx[6] controls the z-axis displacement. The format of the displacement control data in each frame is shown in Table 3. Tx[7] is the direction control byte with 8 bits of data; the upper 5 bits are left at their default values and the lower 3 bits control the directions of the three axes (Table 4).
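For illustration, a displacement-mode frame along the lines of Tables 3 and 4 might be packed as below. The reserved header byte, the byte order within each 16-bit value, and the exact direction-bit mapping are assumptions, since the tables are not reproduced here:

```python
def pack_displacement_frame(dx, dy, dz):
    """Pack a displacement command: Tx[1..2] -> |dx|, Tx[3..4] -> |dy|,
    Tx[5..6] -> |dz| as 16-bit unsigned magnitudes; the low 3 bits of
    Tx[7] carry the sign of each axis (bit mapping is an assumption)."""
    frame = bytearray(8)            # Tx[0] reserved, e.g. a header (assumption)
    for i, v in enumerate((dx, dy, dz)):
        mag = min(abs(int(v)), 0xFFFF)
        frame[1 + 2 * i] = (mag >> 8) & 0xFF   # high byte first (assumed order)
        frame[2 + 2 * i] = mag & 0xFF
        if v < 0:
            frame[7] |= 1 << i      # direction bit for axis i
    return bytes(frame)
```

Splitting magnitude and sign this way keeps the 16-bit fields unsigned, matching the "16-bit unsigned number plus direction bit" layout the text describes.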

UGV target recognition and tracking
The UGV also has an OpenMV camera that detects image information in real time and determines whether a Tag is present. If there is a Tag, the deviation between the center of the tag and the center of the image is calculated. The deviation is then taken as the input of an incremental PID controller that drives the rotation of the two-axis pan/tilt. At the same time, the UGV plans its motion trajectory according to the current angle and distance to the target. If no target is detected or the target is lost, the UGV stops moving and the two-axis pan/tilt performs a global search over 180°+60°.

Tracking scheme based on AprilTag
AprilTag 28 is an improved visual positioning system based on ARToolkit 29 and ARTag. 30 It is a visual fiducial library 31 widely used in robot and UAV positioning guidance. 32 AprilTag uses a simple 2D code (similar in spirit to a Quick Response (QR) code) that carries only 4 to 12 bits of data, so it can be detected more robustly and at longer ranges.
AprilTag not only can identify and track the target but also can get the three-dimensional (3D) pose of the target. As long as the camera resolution, focal length, and the size of tag are known, the algorithm can identify the type, ID, distance, and attitude of the tag.

Detection and identification of Tags
The Tag is a quadrangle that is inner black and outer white, as shown in Figure 9.
The tag detection algorithm begins by computing the gradient at every pixel, including its magnitude (Figure 10(a)) and direction (Figure 10(b)). Using a graph-based method, pixels with similar gradient directions and magnitudes are clustered into components (Figure 10(c)). Weighted least squares is used to fit the pixels of each component with a line segment (Figure 10(d)). The direction of each line segment is determined by the gradient direction, so that segments are dark on the left and light on the right.
At this point, the Tag has been transformed into a set of directed segments, from which candidate quadrilaterals (sequences of four segments) are computed. The method used is based on a recursive depth-first search with a depth of four. 18

Calculation of the distance and angle from Tag to camera
In the homography transformation (mapping one plane to another) and external parameter estimation, a 3 × 3 homography matrix H (the conversion matrix of the mapping) needs to be calculated. It maps the coordinate system of the Tag to the 2D image coordinate system. The homography matrix is computed by the direct linear transform (DLT) algorithm. Recovering the position and orientation of the Tag requires additional information, namely the focal lengths of the camera and the physical size of the Tag. The homography can be written as the product of the 3 × 4 camera projection matrix P and the truncated extrinsics matrix E, up to an unknown scale factor s:

H = s P E,
P = [f_x 0 0 0; 0 f_y 0 0; 0 0 1 0],
E = [R_00 R_01 T_x; R_10 R_11 T_y; R_20 R_21 T_z; 0 0 1]    (6)

Here, h_ij is an element of the homography matrix H, and f_x and f_y are the focal lengths of the camera. It is not possible to solve for E directly because P is not full rank. Expanding the right side of equation (6), each h_ij can be written as a set of equations

h_00 = s f_x R_00,  h_01 = s f_x R_01,  h_02 = s f_x T_x
h_10 = s f_y R_10,  h_11 = s f_y R_11,  h_12 = s f_y T_y
h_20 = s R_20,      h_21 = s R_21,      h_22 = s T_z    (7)

from which the elements R_ij and T_k can easily be determined up to the scale s. Each column of the rotation matrix must be a unit vector, so this constraint fixes s; since only two columns of the rotation matrix are known, s can be set to the geometric mean of their magnitudes. Because the columns of a rotation matrix must be orthogonal, the third column can be recovered by computing the cross product of the two known columns.
The above DLT and normalization process cannot guarantee that the rotation matrix is strictly orthogonal. To solve this problem, R can be corrected by polar decomposition, which yields the orthogonal matrix with the minimum Frobenius-norm error.
Finally, through the homography matrix, the coordinate system of the tag is mapped to the image coordinate system, and the distance and angle from the Tag to the camera are obtained.
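The pose recovery described above can be sketched directly: read the two rotation columns and the translation off H, fix the scale with the geometric mean of the column norms, and recover the third column by a cross product. The final polar-decomposition re-orthogonalization is omitted for brevity, so this is a sketch rather than a full implementation:

```python
import math

def pose_from_homography(H, fx, fy):
    """Recover rotation columns (r0, r1, r2) and translation t from a tag
    homography H (3x3 row-major nested lists), given focal lengths fx, fy."""
    # Columns of E, each still scaled by the unknown factor s.
    r0 = [H[0][0] / fx, H[1][0] / fy, H[2][0]]
    r1 = [H[0][1] / fx, H[1][1] / fy, H[2][1]]
    t  = [H[0][2] / fx, H[1][2] / fy, H[2][2]]
    n0 = math.sqrt(sum(v * v for v in r0))
    n1 = math.sqrt(sum(v * v for v in r1))
    s = math.sqrt(n0 * n1)          # geometric mean of the column norms
    if t[2] < 0:                    # the tag must lie in front of the camera
        s = -s
    r0 = [v / s for v in r0]
    r1 = [v / s for v in r1]
    t  = [v / s for v in t]
    r2 = [r0[1] * r1[2] - r0[2] * r1[1],   # cross product r0 x r1
          r0[2] * r1[0] - r0[0] * r1[2],
          r0[0] * r1[1] - r0[1] * r1[0]]
    return (r0, r1, r2), t
```

Scaling t by the known physical tag size then gives the metric distance from the camera to the Tag.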

Improved tracking based on color histogram
A color histogram is a statistic of the color distribution, and it is not affected by changes in shape and attitude. The color histogram is used to improve the tracking method and obtain better stability and anti-occlusion ability. To reduce the influence of lighting changes, the color histogram is computed in the HSV (hue, saturation, value) color space. The three HSV components are quantized separately according to how sensitive they are to color changes. Suppose the quantized values of the three components are {0, 1, ..., L_H − 1}, {0, 1, ..., L_S − 1}, and {0, 1, ..., L_V − 1}, respectively, and the three sub-histograms are concatenated into one vector in the form [H, S, V]; its index range is then {0, ..., L_H − 1, L_H, ..., L_H + L_S − 1, L_H + L_S, ..., L_H + L_S + L_V − 1}. Suppose the number of pixels of color i is m_i and the total number of pixels in the image is

N = Σ_i m_i    (8)

The probability p_i of color i, which defines the color histogram, is

p_i = m_i / N    (9)

Since the color histogram is a vector, the Bhattacharyya distance can be used to measure the similarity of two histograms during tracking. The Bhattacharyya distance is calculated as

ρ(p, q) = Σ_i √(p_i q_i),  d = √(1 − ρ(p, q))    (10)

Here, ρ is the Bhattacharyya coefficient of the two histograms, p is the histogram of the target, q is the histogram of the template, and d is the Bhattacharyya distance. The smaller the value of d, the higher the similarity of the two histograms.
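The histogram construction and the Bhattacharyya distance can be sketched as follows. The bin counts L_H, L_S, L_V and the normalization of the concatenated vector to sum to 1 are illustrative choices, not values taken from the paper:

```python
import math

def hsv_histogram(hsv_pixels, lh=8, ls=4, lv=4):
    """Concatenated quantized-HSV histogram: one sub-histogram per channel,
    stacked into a vector of length lh+ls+lv and normalized to sum to 1.
    hsv_pixels: iterable of (h, s, v), already quantized to [0,lh), [0,ls), [0,lv)."""
    hist = [0.0] * (lh + ls + lv)
    n = 0
    for h, s, v in hsv_pixels:
        hist[h] += 1                # H sub-histogram
        hist[lh + s] += 1           # S sub-histogram
        hist[lh + ls + v] += 1      # V sub-histogram
        n += 1
    return [m / (3.0 * n) for m in hist]

def bhattacharyya_distance(p, q):
    """d = sqrt(1 - rho), rho = sum_i sqrt(p_i * q_i).
    Smaller d means more similar histograms."""
    rho = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - rho))
```

During tracking, p would be recomputed from the color region around the Tag every frame and compared against the stored template histogram q.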

Simulation and experiment
Simulation of tracking target

KCF is a common tracking algorithm and is used for comparison with AprilTag. KCF abstracts the tracking problem into a linear regression model. To adapt to the deformation of the target, KCF is modeled by ridge regression, which includes regularization. The objective function of the ridge regression is

min_w Σ_i (wᵀx_i − y_i)² + λ‖w‖²    (12)

where λ is a regularization parameter. Taking the derivative of equation (12) with respect to w and setting it equal to zero gives the extremum

w = (XᵀX + λI)⁻¹ Xᵀy    (13)

In equation (13), each row of X represents a sample x_i, y is a Gaussian regression label, and I is an identity matrix. Equation (13) can be written in complex-domain form as

w = (XᴴX + λI)⁻¹ Xᴴy    (14)

where Xᴴ is the complex conjugate transpose of X. To avoid computing a matrix inversion and to accelerate the calculation, equation (14) is transformed into the Fourier domain

ŵ = (x̂* ⊙ ŷ) / (x̂* ⊙ x̂ + λ)    (15)

where x̂ is the fast Fourier transform of x, ŷ is the fast Fourier transform of y, x̂* is the complex conjugate of x̂, and ⊙ denotes the element-wise (Hadamard) product.
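The closed-form Fourier-domain ridge solution above can be sketched with a naive DFT for clarity (a real tracker would use an FFT; the function names are illustrative):

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]

def ridge_filter_fourier(x, y, lam):
    """Closed-form ridge regression over all cyclic shifts of x (the circulant
    trick behind KCF): w_hat = conj(x_hat)*y_hat / (conj(x_hat)*x_hat + lambda),
    computed element-wise in the Fourier domain."""
    xh, yh = dft(x), dft(y)
    wh = [xh[i].conjugate() * yh[i] / (xh[i].conjugate() * xh[i] + lam)
          for i in range(len(x))]
    return [v.real for v in idft(wh)]
```

Because X is circulant (all cyclic shifts of one base sample), the matrix inverse of equation (14) becomes an element-wise division, which is where the speed of correlation-filter trackers comes from.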
The simulation is carried out on a computer with an Intel Core i5-7300HQ CPU and 8 GB of memory. The operating system is 64-bit Windows 10, with MATLAB 2016a and the OpenMV IDE installed. For comparison, both algorithms select tag36h11_1 as the target.

Comparison between KCF and AprilTag
After running KCF, select the tag36h11_1 tag as the initial frame (Figure 11(a)), and AprilTag also selects the tag36h11_1 tag as the tracking target (Figure 11(b)).
After the selection of the initial frame, both algorithms can accurately identify and track the target. To improve the running speed of AprilTag on embedded devices, the resolution is appropriately reduced, which introduces more noise (Figure 11(b)); however, this does not affect the accuracy of recognition. Figure 12 shows the results of KCF and AprilTag when occlusion occurs. When occlusion first occurs, KCF can still determine the probable location of the target (Figure 12(a)), but AprilTag loses the target (Figure 12(b)). This is because KCF uses real-time online training to handle occlusion, whereas once the target is occluded, AprilTag loses the features it needs and the tracking fails.
After occlusion persists for a short period of time, KCF loses the target (Figure 12(c)) while AprilTag tracks it again (Figure 12(d)). The comparison in Figure 15 shows that AprilTag can re-identify and track the target as soon as the occlusion is removed.
The two methods are also tested while the UGV moves. Both algorithms are able to identify and track the target at low speed. When the speed of the UGV is high or changes suddenly, KCF loses the target (Figure 13(a)), while AprilTag performs well (Figure 13(b)). Since KCF uses a local search method, a target that moves too fast leaves the search window and the tracking fails.

Effect of color features on the AprilTag
As can be seen from Figure 12(b), once the Tag is occluded, the target is lost, which can seriously affect the tracking of moving targets. Color information around the tag is therefore added to prevent the loss of the target caused by occlusion. Figure 14 shows a comparison of tracking with and without the color feature.
When the Tag is occluded, AprilTag improved by the color feature identifies the blue information around the Tag to ensure that the UGV is tracked continuously until the Tag appears again. When the Tag is visible, the method automatically switches back to the original algorithm. Since the color feature is easily affected by environmental factors, it is only used as a supplement to tracking.
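The switching logic between the Tag detector and the color fallback can be sketched as a small selection function; the tuple-based detection interface below is hypothetical, purely for illustration:

```python
def track_step(tag_detection, color_blob):
    """One step of the fallback scheme: prefer the Tag center when the
    detector sees it; otherwise fall back to the center of the colored
    region around the Tag; report lost if neither is available.
    Detections are (cx, cy) tuples or None (hypothetical interface)."""
    if tag_detection is not None:
        return ("tag", tag_detection)     # primary: full pose is available
    if color_blob is not None:
        return ("color", color_blob)      # fallback: position only
    return ("lost", None)
```

The returned mode string lets the motion planner degrade gracefully: full pose-based control with the Tag, position-only following with the color blob, and a stop-and-search behavior when lost.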

Tracking experiment
Tracking experiment of UGV. Since the UGV uses a forward-looking view to track the target and the target moves against a complex background, the algorithm is the original AprilTag without the added color feature. If the target is lost, the UGV stops moving, and the gimbal camera rotates to make a global search until it detects the target again. Figure 15 shows that the UGV can adjust its speed according to the distance to the target, so the distance between the UGV and the target remains basically unchanged.

Collaborative tracking experiment of UAV/UGV. Since the UAV uses a top view and the color features around the UGV are not too complicated, the algorithm adopts AprilTag improved by the color feature. When the target is lost, the UAV enters hover mode and waits for the target to reappear or for an RC command.
In the initial state, the UAV is parked on the UGV as shown in Figure 16(a). In Figure 16(b), the UAV takes off automatically and flies to a predetermined altitude. Figure 16(c) shows that the proposed method helps the UAV/UGV heterogeneous system complete the moving target tracking task successfully in good lighting. Figure 16(d) shows that the tracking effect is still good at night, when the lighting conditions are poor and even total reflection occurs.

Conclusion
The focus of this paper is the integration of a UAV/UGV system. A heterogeneous collaborative system consisting of a UAV and a UGV is designed. Then, an analog RC based on the SBUS protocol is proposed for the UAV. Next, a UGV with omnidirectional wheels is designed for the heterogeneous system. To improve the effectiveness and accuracy of tracking, a tracking scheme based on AprilTag is studied. Through simulation and experiment, the proposed heterogeneous collaborative system can realize real-time tracking of a moving target, and the conclusions are as follows:
1. The analog RC based on the SBUS protocol can maintain the original stability of the UAV and reduce the development workload.
2. The UGV has a variety of control modes to adapt to different tasks, and its omnidirectional wheels enable it to meet challenges from various environments.
3. The tracking scheme based on AprilTag runs well on embedded devices. Its effectiveness and accuracy are satisfactory, and the scheme is suitable for the UAV/UGV heterogeneous system to track a moving target.

Further work
The main contribution of this paper is the establishment and verification of a system that can provide a physical experiment platform for different control and image processing methods. Next, we will experiment with several tracking methods in an unknown, complex outdoor environment. In the future, a method of separating image acquisition from image processing will be studied, which will indirectly allow complex tracking algorithms to run on embedded devices. We have now basically implemented tracking based on arbitrary features, and Figure 17 shows some preliminary results, which still need some improvement.
In addition, the proposed heterogeneous system is being improved and further tested with a new structure that consists of a tethered UAV and a sport utility vehicle (SUV). In Figure 18, the tethered UAV has been developed and tested. It is a four-axis, eight-rotor UAV with stronger power and more stable flight. The electro-optical pod is equipped with an HD sport camera, and the tethered cable is responsible for power delivery and data return. In the future, the modified SUV will be added to the heterogeneous system for further real-world experiments.