A bagging tree-based pseudorange correction algorithm for global navigation satellite system positioning in foliage canyons

Global navigation satellite systems are indispensable for providing positioning, navigation, and timing information to pedestrians and vehicles in location-based services. However, tree canopies, although regarded as valuable city infrastructure in urban areas, degrade the accuracy of global navigation satellite system positioning because they attenuate the satellite signals. This article proposes a bagging tree-based global navigation satellite system pseudorange error prediction algorithm that uses two variables, the carrier-to-noise ratio C/N0 and the elevation angle θe, to improve global navigation satellite system positioning accuracy in foliage areas. The positioning accuracy improvement is then obtained by applying the predicted pseudorange errors as corrections. The experimental results show that, owing to the stationary character of the geostationary orbit satellites, the prediction accuracy of the BeiDou navigation satellite system solution (85.42% in light foliage and 83.99% in heavy foliage) is much higher than that of the global positioning system solution (70.77% in light foliage and 73.61% in heavy foliage). The positioning errors in the east, north, and up coordinates are reduced by the proposed algorithm, with an especially significant decrease in the up direction. Moreover, the improvement rate of the three-dimensional root mean square error of positioning in the light foliage area test is 86% for the BeiDou navigation satellite system/global positioning system combination solution, while the corresponding improvement rate is 82% in the heavy foliage area test.


Introduction
Nowadays, with the rapid development of intelligent and modernized urbanization, severe environmental issues such as overcrowding, air pollution, climate change, and ecocide are emerging in the social and economic development of cities. 1 To promote green and sustainable urbanization, the tree canopy, with its layers of leaves and branches covering the ground and the roadsides, is regarded as a functional means of delivering environmental benefits and supporting mental health. In addition, the global navigation satellite system (GNSS) is widespread as an underpinning technology for intelligent transportation systems in modern smart cities. 2 However, signal blockage, reflection, and diffraction by the tree canopy cause multipath interference and non-line-of-sight (NLOS) reception, which can degrade GNSS positioning performance in city areas.
Many studies have examined the performance of GNSS in foliage areas. Positioning accuracy and precision have been examined under typical peripheral canopies on Irish forest roads. 3 The seasonal parameters of deciduous tree foliage have been studied for their effect on GNSS signal attenuation. 4 So far, various research studies have been conducted to find robust approaches to detect, classify, and eliminate NLOS reception and multipath interference. 5 The main methods for eliminating NLOS reception and multipath interference can be classified as antenna design, signal processing, and measurement-based modeling methods. 6 It has been shown that antenna design methods such as dual-polarization antennas, choke-ring antennas, and antenna array technology are effective for NLOS and multipath detection and mitigation at low elevation angles. 7,8 However, their high cost and bulky size make them inappropriate for widespread application in either highly built-up areas or dense foliage streets.
The signal processing method mitigates multipath interference by narrowing the spacing between the receiver code correlators. 9 A more comprehensive review of signal processing-based multipath mitigation methods is described elsewhere. 10 Furthermore, the Newton method-based fast iterative maximum-likelihood algorithm (FIMLA), the advanced receiver architecture based on vector tracking, and probabilistic neural network (PNN)-based methods are functional methods for mitigating multipath. [11][12][13] However, signal processing-based methods are invalid for NLOS signal elimination.
Measurement-based methods integrate other sources of information, such as sensor, spatial, and geometric information, with GNSS to compensate for signal blockage and enhance positioning performance. Among the sensor-based approaches, the integration of GNSS with an inertial navigation system (INS) was proposed, which provides timely position, velocity, and attitude information by dead reckoning to improve the positioning of a moving object. Such integration is functional, but a high-accuracy INS is expensive for ground use and is therefore not recommended. [14][15][16] When carrier-phase and code-phase measurements are available, code-minus-carrier (CMC) measurements can be obtained from dual-frequency observations to estimate the code multipath error and evaluate multipath conditions. 17,18 To eliminate multipath effects effectively, it is essential to distinguish and classify direct, NLOS, and multipath signals correctly and robustly. The signal strength, defined as the carrier-to-noise ratio (C/N0), is a traditional signal identification indicator. Signals with higher C/N0 values are classified as line-of-sight (LOS), while those with lower C/N0 values are classified as NLOS or multipath. 19 In addition, the elevation angle (θe) is another important factor, which is combined with C/N0 to examine the signal types. 20 It has been indicated that signals from lower-elevation satellites are generally received as NLOS in the urban environment. 21 Besides, machine learning-based signal reception classification has been widely studied in recent years. 22 The support vector machine (SVM)-based method treats the elevation and azimuth angles as the key features to eliminate multipath effects. 22,23 Yozevitch et al. 24 selected signal strength, elevation, and other variables as the training indicators for a decision tree-based method to identify the types of received signals.
Sun et al. 25 defined nine indicators derived from raw global positioning system (GPS) measurements for an adaptive neuro fuzzy inference system (ANFIS)-based algorithm to classify the signal types. A gradient boosting decision tree (GBDT)-based signal classification method was proposed to improve the signal classification accuracy. 19 To mitigate NLOS and multipath caused by built-up constructions, a three-dimensional (3D) digital model that provides additional geometric information, such as the height of the building and the tilt, orientation, and area of the roof, is treated as an available measure. 26 Groves 27 indicates that the main limitation of the existing 3D digital models is that they assume the roofs of buildings are flat, whereas in reality most roofs are rough and have different shapes.
Moreover, it is even more difficult for 3D digital models to represent other detailed roadside objects, such as moving vehicles, growing tree foliage, and the changing, irregular shapes of leaves. In summary, 3D digital models cannot effectively solve the NLOS reception and multipath problem in foliage canyons, while GNSS and INS integration-based methods are effective in foliage canyons but carry a high equipment cost.
Thus, to improve the positioning performance in foliage canyons, we have developed a bagging tree-based GNSS positioning algorithm with a combined strategy of pseudorange error prediction and correction. The contributions are summarized as follows:
1. A bagging tree-based pseudorange error prediction and correction algorithm is proposed that uses the signal strength and elevation angle as the inputs.
2. The GNSS positioning accuracy in foliage canyons is improved by the combined pseudorange correction strategy. The field test has validated that the proposed algorithm effectively enhances the GNSS positioning performance in both light and heavy foliage areas. The positioning accuracy of single and multiple GNSS, that is, BeiDou navigation satellite system (BDS) only and BDS/GPS, is analyzed and compared.

Algorithm framework
The proposed bagging tree-based pseudorange determination algorithm is presented in Figure 1. The main processes could be divided into the training and testing stages.
In the training stage, first, an offline data set is created from GNSS raw pseudorange measurements, including LOS, NLOS, and multipath signals, collected at a known point in the foliage area. Second, the pseudorange error is calculated as the difference between the raw pseudorange measurement and the corresponding geometric range obtained from the known station coordinates and the satellite ephemeris. Then, as the input features for machine learning, the signal strength (C/N0) and elevation angle (θe) at each epoch are labeled with the pseudorange error to obtain the training data set. Finally, using this training data set, a bagging tree-based algorithm learns the rules relating the input features to the corresponding pseudorange errors. More details about each step are discussed in the following sections.
During the online testing stage, the rules extracted in the offline stage are used to predict the pseudorange errors from the newly received variables (i.e. signal strength and elevation angle). The predicted pseudorange errors are then treated as corrections to the new raw pseudorange measurements, and the position solutions are calculated from the corrected pseudoranges.
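The online stage described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `correct_pseudoranges`, the `predict_error` callable, the units (dB-Hz, degrees, meters), and the toy rule used in place of a trained model are all hypothetical.

```python
from typing import Callable, List, Sequence


def correct_pseudoranges(
    raw_pseudoranges_m: Sequence[float],
    cn0_dbhz: Sequence[float],
    elevation_deg: Sequence[float],
    predict_error: Callable[[float, float], float],
) -> List[float]:
    """Subtract the predicted pseudorange error from each raw measurement."""
    if not (len(raw_pseudoranges_m) == len(cn0_dbhz) == len(elevation_deg)):
        raise ValueError("all inputs must have one entry per satellite")
    return [
        p - predict_error(cn0, el)
        for p, cn0, el in zip(raw_pseudoranges_m, cn0_dbhz, elevation_deg)
    ]


def toy_predictor(cn0: float, el: float) -> float:
    """Hypothetical stand-in for the trained model (illustration only)."""
    return 20.0 if cn0 < 35.0 else 2.0


# Two satellites: one weak low-elevation signal, one strong high-elevation signal.
corrected = correct_pseudoranges(
    [21_000_020.0, 22_500_002.0], [30.0, 45.0], [15.0, 70.0], toy_predictor
)
```

The corrected pseudoranges would then be passed to whatever position solver is in use; the solver itself is outside the scope of this sketch.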

Feature selection
Signal strength and elevation angle are selected as the inputs. The strength of the signal is represented by the C/N0 value, defined as the ratio of the received signal power to the weighted noise power spectral density. High-elevation satellites are expected to have a high C/N0, while NLOS satellites are expected to have a lower received C/N0, although NLOS signals with high C/N0 can occur. 28 Thus, this indicator alone is not sufficient to identify the received signal type.

Elevation angle (θe).
Generally speaking, the elevation angle of a satellite influences its visibility. It has been shown that a signal at a higher elevation angle is less likely to be blocked by the surroundings. 29 Thus, the satellite elevation serves as a probabilistic LOS indicator.

Pseudorange error labeling process
Pseudorange error calculation. Theoretically, the raw pseudorange measurement $P$ is obtained by multiplying the speed of light $c$ by the signal propagation time, which is the difference between the departure time from the satellite $t_s$ and the arrival time at the receiver $t_r$, expressed as

$$P = c\,(t_r - t_s) \tag{1}$$

However, the pseudorange measurement is always affected by several errors, such as ionospheric delay, tropospheric delay, receiver clock and satellite ephemeris errors, receiver dynamic error, and multipath. The pseudorange measurement can therefore be written as

$$P = \rho + \Delta\rho, \qquad \Delta\rho = c\,(\delta t_r - \delta t_s) + I + T + \varepsilon \tag{2}$$

where the geometric range is denoted by $\rho$; $\Delta\rho$ is the pseudorange error; $\delta t_r$ and $\delta t_s$ are the receiver clock error and the satellite clock error, respectively; $I$ and $T$ are the ionospheric delay and the tropospheric delay, respectively; and $\varepsilon$ is the error from other sources such as multipath, receiver noise, and antenna delay. The geometric range $\rho$ in equation (2) can also be expressed by the following equation

$$\rho = \sqrt{(X_S - X_R)^2 + (Y_S - Y_R)^2 + (Z_S - Z_R)^2} \tag{3}$$

where $(X, Y, Z)_R$ is the coordinate of the GNSS receiver R, and $(X, Y, Z)_S$ is the coordinate of the satellite. Because the receiver clock error, satellite clock error, ionospheric delay, and tropospheric delay can be moderated by proper models and their residuals are small enough to ignore, the main remaining errors are considered to be multipath errors.
Labeling process. The pseudorange error $\Delta\rho$ is then attached as a label to the corresponding values of the input variables. In this way, the offline labeling phase yields a set of GNSS measurements containing C/N0, θe, and the pseudorange error at each epoch. This labeled data set is used for training.
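The error calculation and labeling steps above can be sketched as follows. This is an illustrative sketch only: the `Observation` record, its field names, and the toy coordinates are hypothetical, and the measured pseudorange is assumed to have already been corrected for the modeled clock and atmospheric terms, so that the remaining difference to the geometric range is the label.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Observation:
    """One satellite observation at one epoch (hypothetical record layout)."""
    pseudorange_m: float                    # assumed already model-corrected
    sat_xyz_m: Tuple[float, float, float]   # satellite position from ephemeris
    cn0_dbhz: float                         # signal strength C/N0
    elevation_deg: float                    # elevation angle


def geometric_range(receiver_xyz_m: Sequence[float], sat_xyz_m: Sequence[float]) -> float:
    """Euclidean distance between the known receiver point and the satellite."""
    return math.dist(receiver_xyz_m, sat_xyz_m)


def label_observations(
    observations: Sequence[Observation], receiver_xyz_m: Sequence[float]
) -> Tuple[List[Tuple[float, float]], List[float]]:
    """Return features (C/N0, elevation) and labels (pseudorange error in meters)."""
    features, labels = [], []
    for obs in observations:
        error_m = obs.pseudorange_m - geometric_range(receiver_xyz_m, obs.sat_xyz_m)
        features.append((obs.cn0_dbhz, obs.elevation_deg))
        labels.append(error_m)
    return features, labels


# Toy numbers chosen so the geometric range is 13,000 km (not real GNSS data).
receiver = (0.0, 0.0, 0.0)
epoch = [Observation(13_000_025.0, (3.0e6, 4.0e6, 12.0e6), 32.0, 20.0)]
features, labels = label_observations(epoch, receiver)
```

In this toy case the label is about 25 m, the excess of the measured pseudorange over the geometric range.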

Bagging tree-based pseudorange correction algorithm
Ensemble learning, as a kind of machine learning, aims to increase the accuracy of the results by combining multiple models instead of using a single model. Compared with ordinary machine learning methods, ensemble methods create a series of hypotheses and combine them. 30 Bagging and boosting are the two main ensemble methods. A gradient-boosting decision tree algorithm has been designed as a signal classifier. 19 Compared with the boosting technique, bagging has the advantages of minimizing the decision tree training error, reducing the variance of a predictor, and avoiding overfitting. 31,32 A bagging procedure consists of bootstrapping and aggregation. Bootstrapping is a sampling technique in which samples are selected randomly from the original data set with replacement. Aggregation is the procedure of combining the outcomes of all the individual predictions into the final prediction. The bagging method is shown in Figure 2.
In this study, the bagging method is used to form stronger and more robust prediction rules.
Step 1 in bagging is training multiple learners from different bootstrap samples. A bootstrap sample is obtained by sampling the training data set with replacement, and its size is the same as that of the training data set. 31 Step 2 in bagging is aggregating the multiple learners, by averaging in the case of regression.
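The bootstrap step can be illustrated with a short sketch; the helper name `bootstrap_indices` and the sample size are hypothetical choices. Because the sample is drawn with replacement and has the same size n as the training set, each draw repeats some examples and omits others. For large n the expected share of distinct examples is 1 - (1 - 1/n)^n, roughly 63%, which is what gives each learner a different view of the data.

```python
import random
from typing import List


def bootstrap_indices(n: int, rng: random.Random) -> List[int]:
    """Draw n indices from range(n) uniformly at random, with replacement."""
    return [rng.randrange(n) for _ in range(n)]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 10_000
sample = bootstrap_indices(n, rng)

# Share of the original examples that appear at least once in this sample.
unique_fraction = len(set(sample)) / n
```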
The process of the bagging regression tree-based pseudorange error prediction method is as follows:
1. Define an original training data set of n examples, $L = \{(x_1, \Delta\rho_1), (x_2, \Delta\rho_2), \ldots, (x_n, \Delta\rho_n)\}$, where $x_i = (C/N_{0,i}, \theta_{e,i})$, $i = 1, 2, 3, \ldots, n$. Here, n is the total number of samples, and $\Delta\rho_i$ is the labeled pseudorange error of $x_i$.
2. Generate a sequence of S bootstrap training sets $L_s$, $s = 1, 2, \ldots, S$, by sampling $L$ with replacement.
3. Form a sequence of weak predictors $w(x, L_s)$ by applying the decision tree algorithm to the sequence of training sets. For an input $x_i$, the pseudorange error is predicted by $w(x_i, L_s)$.
4. A bagged predictor $W_{aver}$ is obtained as the final predictor by averaging $w(x, L_s)$ over the S training sets:
$$W_{aver}(x) = \frac{1}{S} \sum_{s=1}^{S} w(x, L_s)$$
5. Once the final predictor $W_{aver}$ is obtained, the corresponding pseudorange errors can be predicted.
6. Positioning solutions are calculated. Each new test pseudorange measurement is corrected by subtracting the predicted pseudorange error obtained above.
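The procedure above can be sketched with the standard library only. This is a minimal sketch under stated assumptions, not the authors' implementation: the weak learner is a small greedy regression tree, the hyperparameters (25 trees, depth 4, minimum leaf size 3) are arbitrary, and the synthetic training data follow an invented relationship used purely for illustration.

```python
import random
from statistics import fmean


def best_split(X, y, min_leaf):
    """Return (sse, feature, threshold) of the best split, or None if none is allowed."""
    n, best = len(y), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        xs = [X[i][j] for i in order]
        ys = [y[i] for i in order]
        total, total_sq = sum(ys), sum(v * v for v in ys)
        left_sum = left_sq = 0.0
        for k in range(1, n):  # the k smallest values go to the left child
            left_sum += ys[k - 1]
            left_sq += ys[k - 1] ** 2
            if k < min_leaf or n - k < min_leaf or xs[k - 1] == xs[k]:
                continue
            right_sum, right_sq = total - left_sum, total_sq - left_sq
            sse = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
            if best is None or sse < best[0]:
                best = (sse, j, xs[k - 1])
    return best


def fit_tree(X, y, max_depth=4, min_leaf=3):
    """Fit a regression tree as nested dicts; leaves store the mean label."""
    split = best_split(X, y, min_leaf) if max_depth > 0 else None
    if split is None:
        return {"value": fmean(y)}
    _, j, thr = split
    left = [i for i in range(len(y)) if X[i][j] <= thr]
    right = [i for i in range(len(y)) if X[i][j] > thr]
    return {
        "feature": j,
        "threshold": thr,
        "left": fit_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, min_leaf),
        "right": fit_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, min_leaf),
    }


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while "value" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]


def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Steps 2 and 3: one tree per bootstrap sample of the training set."""
    rng, n, trees = random.Random(seed), len(y), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # with replacement, size n
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx]))
    return trees


def predict_bagged(trees, x):
    """Step 4: average the individual tree predictions."""
    return fmean([predict_tree(t, x) for t in trees])


# Synthetic data (invented relationship): error shrinks as C/N0 and elevation grow.
rng = random.Random(1)
X_train = [(rng.uniform(25.0, 50.0), rng.uniform(10.0, 85.0)) for _ in range(200)]
y_train = [60.0 - cn0 - 0.1 * el + rng.gauss(0.0, 1.0) for cn0, el in X_train]

trees = fit_bagged_trees(X_train, y_train)
pred_weak = predict_bagged(trees, (28.0, 15.0))    # weak, low-elevation signal
pred_strong = predict_bagged(trees, (48.0, 80.0))  # strong, high-elevation signal
```

On this synthetic set the bagged predictor returns a larger error for the weak, low-elevation input than for the strong, high-elevation one. In practice a library implementation of bagged regression trees would normally be preferred over a hand-written tree.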

Experiment setup
In order to validate the proposed pseudorange prediction and determination algorithm in foliage canyons, a field test was carried out in a foliage area on the Jiangjun Road Campus of Nanjing University of Aeronautics and Astronautics, China. The test consists of two stages: one obtains the ground truth, and the other collects the testing data. To obtain the ground truth, two locations in the open-sky area near the foliage area were selected to collect reference data for more than 1 h, and the network real-time kinematic (NRTK) service from Qianxun SI Ltd was then used to obtain centimeter-level positioning solutions for these locations. The accurate coordinates (centimeter-level accuracy) of the reference points in the foliage areas were then obtained with a Leica TS02 total station based on these two control points. The testing data were collected at a 10 Hz data rate by a NovAtel OEM7 receiver at two points, one in a light foliage area and the other in a heavy foliage area, as shown in Figures 3 and 4. The summary of the testing data sets is shown in

Results analysis
To assess the positioning performance, two indicators are measured and evaluated: the satellite visibility and the dilution of precision (DOP). The satellite visibility of the BDS-only and BDS/GPS combination observations over part of the epochs is shown in Figures 5 and 6. As these figures show, the number of visible satellites with the multiple-constellation strategy is higher than with the single-constellation strategy. In addition, owing to the severe signal blockage in heavy foliage, the number of visible satellites fluctuates more dramatically in both the BDS-only and BDS/GPS scenarios than that shown in Figure 5.
The position dilution of precision (PDOP) and horizontal dilution of precision (HDOP) of the BDS-only and BDS/GPS combination observations are shown in Figures 7 and 8. The results show that both the PDOP and HDOP values of BDS + GPS are lower than those of BDS-only. Besides, in all scenarios the PDOP value is higher than the HDOP value, which demonstrates that the positioning performance in three dimensions is inferior to that in the horizontal plane. This effect is more significant because the dense surrounding trees cause signal blockage and multipath interference in the up direction.
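The relation between PDOP and HDOP noted above follows from their standard definitions. As a brief reminder (these are the usual textbook definitions rather than formulas reproduced from this article, and the symbols $G$, $Q$, and $q$ are introduced here only for illustration), let $G$ be the geometry matrix built from the unit line-of-sight vectors in a local east-north-up frame together with the receiver clock column:

```latex
Q = (G^{\mathsf{T}} G)^{-1}, \qquad
\mathrm{HDOP} = \sqrt{q_{EE} + q_{NN}}, \qquad
\mathrm{VDOP} = \sqrt{q_{UU}}, \qquad
\mathrm{PDOP} = \sqrt{q_{EE} + q_{NN} + q_{UU}}
             = \sqrt{\mathrm{HDOP}^2 + \mathrm{VDOP}^2}
```

Under these definitions PDOP can never be smaller than HDOP, and the gap between the two is exactly the vertical contribution, so a widening gap points to weaker geometry in the up direction.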
The testing results are shown in Figures 9 and 10, respectively. Both the BDS-only and BDS/GPS positioning solutions obtained by the proposed method are closer to the ground truth than the conventional results.
In order to verify the performance of the proposed algorithm, the pseudorange error prediction accuracy is evaluated and shown in Figures 11-13. If the absolute value of the difference between the actual measured pseudorange error and the predicted pseudorange error is less than the actual value, the prediction is regarded as valid; otherwise, it is regarded as invalid. In the BDS-only scenario, the prediction accuracy of the proposed algorithm is 82.45% in the light foliage area and 81.48% in the heavy foliage area. In the BDS/GPS scenario, the prediction accuracy of the BDS solution and the GPS solution is 85.42% and 70.77% in the light foliage area, respectively, and 83.99% and 73.61% in the heavy foliage area, respectively. The testing results show that the proposed algorithm performs better for the BDS solutions than for the GPS solutions. This advantage arises mainly because BDS consists of not only medium earth orbit (MEO) satellites but also geostationary orbit (GEO) satellites and inclined geosynchronous orbit (IGSO) satellites. The modeling in the proposed algorithm is more effective for the GEO satellites, which are almost stationary, and less effective for the MEO and IGSO satellites, which move considerably. The performance of the proposed bagging tree-based pseudorange correction method in improving the positioning accuracy in various scenarios is shown above. In the light foliage area, the proposed bagging tree-based method provided a 14.39% (2D) and 77.61% (3D) improvement in root mean square error (RMSE) for the BDS-only solution. It can be concluded that the proposed algorithm achieves a significant improvement in the 2D and 3D positioning accuracy, especially in the 3D solutions.
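The validity criterion for the predicted pseudorange errors can be sketched as below. This is one possible reading, stated as an assumption: the difference is compared against the magnitude of the actual error, so a prediction counts as valid when applying it leaves a smaller residual than applying no correction. The function name and the sample numbers are hypothetical.

```python
from typing import Sequence


def prediction_accuracy(actual_errors_m: Sequence[float], predicted_errors_m: Sequence[float]) -> float:
    """Percentage of predictions with |actual - predicted| < |actual| (assumed criterion)."""
    if not actual_errors_m or len(actual_errors_m) != len(predicted_errors_m):
        raise ValueError("inputs must be non-empty and of equal length")
    valid = sum(
        abs(a - p) < abs(a) for a, p in zip(actual_errors_m, predicted_errors_m)
    )
    return 100.0 * valid / len(actual_errors_m)


# Made-up values: the third prediction overshoots and is counted as invalid.
actual = [12.0, -8.0, 3.0, 20.0]
predicted = [10.0, -5.0, 7.0, 1.0]
accuracy = prediction_accuracy(actual, predicted)  # 3 of 4 valid -> 75.0
```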
In addition, the various scenarios show that the percentage improvement of the 3D RMSE value is significantly higher than that of the 2D RMSE value, which indicates that the up direction is severely affected by the foliage canyon environment.
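The comparison of 2D and 3D improvement can be made concrete with a short sketch. The article does not spell out its formulas, so the definitions here are assumptions: 2D RMSE uses the east and north errors, 3D RMSE adds the up error, and the improvement rate is the relative reduction in RMSE with respect to the conventional solution. All numbers are made up for illustration.

```python
import math
from typing import Sequence, Tuple

EnuError = Tuple[float, float, float]  # (east, north, up) position error in meters


def rmse_2d_3d(enu_errors: Sequence[EnuError]) -> Tuple[float, float]:
    """Return (2D RMSE, 3D RMSE) over a series of epochs."""
    n = len(enu_errors)
    mean_sq_2d = sum(e * e + nn * nn for e, nn, _ in enu_errors) / n
    mean_sq_3d = sum(e * e + nn * nn + u * u for e, nn, u in enu_errors) / n
    return math.sqrt(mean_sq_2d), math.sqrt(mean_sq_3d)


def improvement_rate(rmse_conventional: float, rmse_proposed: float) -> float:
    """Relative RMSE reduction in percent (assumed definition)."""
    return 100.0 * (rmse_conventional - rmse_proposed) / rmse_conventional


# Toy series in which the up error dominates the conventional solution.
conventional = [(3.0, 4.0, 12.0), (3.0, 4.0, 12.0)]
proposed = [(2.4, 3.2, 3.0), (2.4, 3.2, 3.0)]

conv_2d, conv_3d = rmse_2d_3d(conventional)
prop_2d, prop_3d = rmse_2d_3d(proposed)
gain_2d = improvement_rate(conv_2d, prop_2d)
gain_3d = improvement_rate(conv_3d, prop_3d)
```

With these toy values the 3D gain is several times the 2D gain, because most of the removed error lies in the up component.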

Conclusion
This article proposed a bagging tree-based method to improve the GNSS positioning accuracy in foliage canyons by predicting the pseudorange errors and correcting the pseudorange measurements. With the bagging tree method, the pseudorange error is predicted from the input features of signal strength and elevation angle, and the positioning solutions are calculated with the corrected pseudorange measurements. Field tests were carried out in light and heavy foliage canyons, and the results indicate that the satellite visibility and DOP indicators of the BDS/GPS scenario are superior to those of the BDS-only scenario. Besides, owing to the severe signal blockage and multipath interference in the heavy foliage area, the number of visible satellites fluctuates more dramatically and the PDOP value is higher. Moreover, the test results show that the proposed bagging tree-based pseudorange correction method achieves a significant improvement in positioning accuracy compared with the conventional positioning calculation. The evaluation of the pseudorange error prediction accuracy demonstrated that the proposed algorithm tends to perform better for BDS satellites because of the stationary character of the GEO satellites. 33 The BDS/GPS error values in the east, north, and up directions obtained by both the conventional method and the bagging tree-based pseudorange correction method illustrate that the up coordinate error could be decreased significantly, from more than 30 to 10 m. Furthermore, because of the signal blockage and multipath caused by the tall foliage in the up direction, the improvement rate of the 3D RMSE value is much higher than that of the 2D RMSE value. With the proposed algorithm, the BDS-only solution in the light foliage area provides a 14.39% and 77.61% improvement in 2D and 3D RMSE, and the BDS/GPS combination solution in the light foliage area shows a 26.89% and 85.79% improvement in 2D and 3D RMSE.
With the proposed algorithm, the BDS-only solution in the heavy foliage area provides a 13.82% and 61.69% improvement in 2D and 3D RMSE, and the BDS/GPS combination solution in the heavy foliage area shows a 28.17% and 82.17% improvement in 2D and 3D RMSE. In future studies, it is advised to improve the weighting strategies in the multi-GNSS constellation system to further improve positioning.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.