RadarSLAM: A robust simultaneous localization and mapping system for all weather conditions

A Simultaneous Localization and Mapping (SLAM) system must be robust to support long-term mobile vehicle and robot applications. However, camera and LiDAR based SLAM systems can be fragile when facing challenging illumination or weather conditions which degrade imagery and point cloud data. Radar, whose operating electromagnetic spectrum is less affected by environmental changes, is promising, although its distinct sensor model and noise characteristics pose open challenges when exploited for SLAM. This paper studies the use of a Frequency-Modulated Continuous-Wave (FMCW) radar for SLAM in large-scale outdoor environments. We propose a full radar SLAM system, including a novel radar motion estimation algorithm that leverages radar geometry for reliable feature tracking, and that optimally compensates motion distortion while estimating pose by joint optimization. Its loop closure component is designed to be simple yet efficient for radar imagery by capturing and exploiting structural information of the surrounding environment. Extensive experiments on three public radar datasets, ranging from city streets and residential areas to countryside and highways, show competitive accuracy and reliability compared to state-of-the-art LiDAR, vision and radar methods. The results on the RADIATE Dataset show that our system remains technically viable in extreme weather conditions, for example heavy snow and dense fog, demonstrating the promising potential of radar for all-weather localization and mapping.


Introduction
Simultaneous Localization and Mapping (SLAM) has attracted substantial interest over recent decades, and extraordinary progress has been made in the last 10 years in both the robotics and computer vision communities. In particular, camera and LiDAR based SLAM algorithms have been extensively investigated (Engel et al. (2014), Mur-Artal et al. (2015), Zhang and Singh (2014), Shan and Englot (2018)) and progressively applied to various real-world applications. Their robustness and accuracy are also improved further by fusing with other sensing modalities, especially Inertial Measurement Unit (IMU) based motion as a prior (Qin et al. (2018), Campos et al. (2020), Shan et al. (2020)).
Most existing camera and LiDAR sensors fundamentally operate within or near the visible electromagnetic spectrum, which makes them more susceptible to illumination changes and to floating particles and water drops in the environment. It is well known that vision suffers from low illumination, causing image degradation with dramatically increased motion blur, pixel noise and texture loss. The quality of LiDAR point clouds and camera images can also degrade significantly, for instance when facing a realistic density of fog particles, raindrops and snowflakes in mist, rain and snow. Studies of the degradation of LiDAR sensors (Jokela et al. (2019); Carballo et al. (2020)) suggest that the data of all the tested LiDAR sensors degrades, to some extent, in foggy, rainy and snowy conditions. Given that a motion prior is mainly effective in addressing short-period, temporary sensor degradation, even visual-inertial or LiDAR-inertial SLAM systems are anticipated to fail in these challenging weather conditions.
Radar is an active sensor whose operating electromagnetic spectrum usually lies in a much lower frequency (GHz) band than camera and LiDAR (THz to PHz). Therefore, it can operate more reliably in the majority of weather and light conditions. It also has additional benefits, for example, a longer sensing range, relative velocity estimates from the Doppler effect and absolute range measurements. Recently, radar has gradually come to be considered indispensable for safe autonomy and has been increasingly adopted in the automotive industry for obstacle detection and Advanced Driver-Assistance Systems (ADAS). Meanwhile, recent advances in Frequency-Modulated Continuous-Wave (FMCW) radar systems make radar sensing more appealing, since it can provide a relatively dense representation of the environment instead of only returning sparse detections. With these advancements in radar sensing, radar-based SLAM systems can be deployed on various platforms and in different environments, for example, surface mining, underground mining and off-road driving.
However, radar has a distinct sensor model and its data is formed very differently from vision and LiDAR data, so radar-based SLAM faces different challenges from vision and LiDAR based SLAM. For example, its noise and clutter characteristics are complex, arising from sources such as electromagnetic radiation in the atmosphere and multi-path reflection, and its noise level tends to be much higher. This means that existing feature extraction and matching algorithms may not be well suited to radar images. Moreover, unlike LiDAR sensors, most FMCW radars do not provide 3D elevation information.
In this paper, we propose a novel SLAM system based on a FMCW radar. It can operate in various outdoor scenarios, for example, busy city streets and highways, and weather conditions, for example, heavy snowfall and dense fog, see Figure 1. Our main contributions are: · A robust data association and outlier rejection mechanism for radar-based feature tracking by leveraging radar geometry.
· A novel motion compensation model formulated to reduce motion distortion induced by a low scanning rate, jointly optimized with pose estimation in an optimization framework. · A fast and effective loop closure detection scheme designed for a FMCW radar with dense returns. · Extensive experiments on three public radar datasets, demonstrating and validating the feasibility of a SLAM system operating in extreme weather conditions. · Unique robustness and minimal parameter tuning, that is, the proposed radar SLAM system is the only method among those compared that works properly on all data sequences, using an identical set of parameters throughout.
The rest of the paper is structured as follows. In Sec. 2, we discuss related work. In Sec. 3, we elaborate on the geometry of radar sensing and the challenges of using radar for SLAM. The proposed motion compensation tracking model is presented in Sec. 4, followed by the loop closure detection and pose graph optimization in Sec. 5. Experiments, results and system parameters are presented in Sec. 6. Finally, the conclusions and future work are discussed in Sec. 7.

Related work
In this section, we discuss related work on localization and mapping in extreme weather conditions using optical sensor modalities, that is, camera and LiDAR. We also review past and current state-of-the-art radar-based localization and mapping methods.
[Figure 1 caption: Map and trajectory estimated by our proposed radar SLAM system on a self-collected Snow Sequence. Random noisy LiDAR points can be observed around the vehicle due to reflection from snowflakes, and the camera is completely covered by frozen snow. The magnified areas are compared with satellite images from Google Maps, showing reconstructed buildings and roads. Our proposed radar SLAM method successfully handles this challenging sequence with heavy snowfall.]

Vision and LiDAR based localization and mapping in adverse weather
Typical adverse weather conditions include rain, fog and snow, which usually cause degradation in image quality or produce undesired effects, for example, due to rain streaks or ice. Therefore, significant efforts have been made to alleviate this impact by pre-processing image sequences to remove the effects of rain (Garg and Nayar (2004); Ren et al. (2017)), for example using a model based on matrix decomposition to remove the effects of both snow and rain in the latter case. In contrast, (Li et al. (2016)) removes the effects of rain streaks from a single image by learning the static and dynamic background using a Gaussian Mixture Model. A de-noising generator that can remove noise and artefacts induced by the presence of adherent rain droplets and streaks is trained in (Porav et al. (2019)) using data from a stereo rig. A rain mask generated by temporal content alignment of multiple images is also used for keypoint detection (Huang et al. (2019); Yamada et al. (2019)). In spite of these pre-processing strategies, existing visual SLAM and visual odometry (VO) methods tend to be susceptible to these image degradations and to perform poorly under such conditions.
The quality of LiDAR scans can also be degraded when facing rain droplets, snowflakes and fog particles in extreme weather. A filtering based approach is proposed in (Charron et al. (2018)) to de-noise 3D point cloud scans corrupted by snow before using them for localization and mapping. To mitigate the noisy effects of LiDAR reflections from random rain droplets, a later approach proposes ground-reflectivity and vertical features to build a prior tile map, which is used for localization in rainy weather. In contrast to processing 3D LiDAR scans, (Aldibaja et al. (2016)) suggests the use of 2D LiDAR images reconstructed and smoothed by Principal Component Analysis (PCA). An edge-profile matching algorithm is then used to match the run-time LiDAR images with a mapped set of LiDAR images for localization. However, these methods are not reliable when the rain, snow or fog is moderate or heavy. The results of LIO-SAM (Shan et al. (2020)), a LiDAR based odometry and mapping algorithm fused with IMU data, in light snow show that a LiDAR based approach can work to some degree in snow. However, as the snow increases, the reconstructed 3D point cloud map is heavily corrupted with random points from the reflection of snowflakes, which reduces the map's quality and its reusability for localization.
In summary, camera and LiDAR sensors are naturally sensitive to rain, fog and snow. Therefore, attempts to use these sensors to perform localization and mapping tasks in adverse weather are limited.

Radar-based localization and mapping
Using Millimetre Wave (MMW) radar as a guidance sensor for autonomous vehicle navigation can be traced back two or three decades. An Extended Kalman Filter (EKF) based beacon localization system is proposed by (Clark and Durrant-Whyte (1998)) where the wheel encoder information is fused with range and bearing obtained by radar. One of the first substantial solutions for MMW radar-based SLAM is proposed in (Dissanayake et al. (2001)), detecting features and landmarks from radar to provide range and bearing information. (Jose and Adams (2005)) further extends the landmark description and formalizes an augmented state vector containing rich absorption and localization information about targets. A prediction model is formed for the augmented SLAM state. Instead of using the whole radar measurement stream to perform scan matching, (Chandran and Newman (2006)) suggests treating the measurement sequence as a continuous signal and proposes a metric to assess the quality of map and estimate the motion by maximizing the map quality. A consistent map is built using a FMCW radar, an odometer and a gyroscope (Rouveure et al. (2009)). Specifically, vehicle motion is corrected using an odometer and a gyrometer while the map is updated by registering radar scans. Instead of extracting and registering feature points, (Checchin et al. (2010)) uses the Fourier-Mellin Transform (FMT) to estimate the relative transformation between two radar images. In (Vivet et al. (2013)), two approaches are evaluated for localization and mapping in a semi-natural environment using only a radar. The first one is the aforementioned FMT computing relative transformation from whole images, while the second one uses a velocity prior to correct a distorted scan (Vivet et al. (2012)). However, both methods are evaluated without any loop closure detection. A landmark based pose graph radar SLAM system proves that it can work in dynamic environments (Schuster et al. (2016)). (Marck et al. 
(2013)) use an Iterative Closest Point (ICP) algorithm to register the returned radar point cloud and a particle filter to map the indoor environment. Another line of work studies localization against a prior LiDAR map using radar in low-visibility situations. A low-cost millimetre-wave radar is used to provide robust ego-motion estimation in indoor environments in (Almalioglu et al. (2021)) with an RNN-based motion model.
Recently, FMCW radar sensors have been increasingly adopted for vehicles and autonomous robots. (Cen and Newman (2018)) extract meaningful landmarks for robust radar scan matching, demonstrating the potential of using radar to provide odometry information for mobile vehicles in dynamic city environments. This work is extended with a graph based matching algorithm for data association. Radar odometry might fail in challenging environments, such as a road with hedgerows on both sides; therefore, (Aldera et al. (2019b)) train a classifier to detect failures in the radar odometry, using inertial measurements as supervision to automatically label good and bad odometry estimates. Recently, a direct radar odometry method has been proposed to estimate relative pose using the FMT, with local graph optimization to further boost the performance. In (Burnett et al. (2021)), the necessity of motion compensation and Doppler corrections for the recently emerging spinning radar is studied for urban navigation.
Deep Learning based radar odometry and localization approaches have been explored in (Barnes et al. (2020b), Aldera et al. (2019a), Barnes and Posner (2020), Gadd et al. (2020), Gadd et al. (2021), Tang et al. (2020a,b), Sȃftescu et al. (2020), Wang et al. (2021)). Specifically, in (Aldera et al. (2019a)) the coherence of multiple measurements is learnt to decide which information should be kept in the readings. In (Barnes et al. (2020b)), a mask is trained to filter out the noise from radar data and Fast Fourier Transform (FFT) cross correlation is applied to the masked images to compute the relative transformation. The experimental results show impressive odometry accuracy using radar. A self-supervised framework is also proposed for robust keypoint detection on Cartesian radar images, which are further used for both motion estimation and loop closure detection (Barnes and Posner (2020)). A hierarchical approach to place recognition and pose refinement for FMCW radar localization is presented with compelling performance using only one experience. Promising results for radar-based place recognition are shown in (Gadd et al. (2021)) by learning embeddings from sequences in an unsupervised manner.
Full radar-based SLAM systems are able to reduce drift and generate a more consistent map once a loop is closed. A real-time pose graph SLAM system is proposed in (Holder et al. (2019)), which extracts keypoints and computes the GLARE descriptor (Himstedt et al. (2014)) to identify loop closure. However, the system depends on other sensory information, for example, rear wheel speed, yaw rates and steering wheel angles.
2.2.1. Adverse weather. Although radar is considered more robust in adverse weather, the aforementioned methods do not directly demonstrate its operation in these conditions. (Yoneda et al. (2018)) proposes a radar and GNSS/IMU fused localization system by matching query radar images with mapped ones, and tests radar-based localization in three different snow conditions: without snow, partially covered by snow and fully covered by snow. It shows that the localization error grows as the volume of snow increases. However, they did not evaluate their system during snowfall but only afterwards. To explore the full potential of FMCW radar in all weather conditions, our previous work (Hong et al. (2020)) proposes a feature matching based radar SLAM system and performs experiments in adverse weather conditions without the aid of other sensors. It demonstrates that radar-based SLAM is capable of operating even in heavy snow when LiDAR and camera both fail. In other interesting recent work, ground penetrating radar (GPR) is used for localization in inclement weather (Ort et al. (2020)). This takes a completely different perspective to address the problem: the GPR is utilized to extract stable features beneath the ground. During the localization stage, the vehicle needs an IMU, a wheel encoder and GPR information to localize.
In this work, we extend our preliminary results presented in (Hong et al. (2020)) with a novel motion estimation algorithm that optimally compensates motion distortion, and with improved loop closure detection. We also carry out extensive additional experiments, demonstrating the radar SLAM system operating in various weather conditions, including on the MulRan dataset and on more sequences in adverse weather.

Radar sensing and system overview
In this section, we describe the working principle of a FMCW radar and its sensor model. We also elaborate on the challenges of employing a FMCW radar for localization and mapping.

Notation
Throughout this paper, a reference frame j is denoted as F_j and the homogeneous coordinates of a 2D point in frame F_j are defined as p_j = [x_j, y_j, 1]^⊤. A homogeneous transformation T_{i,j} ∈ SE(2), which transforms a point from the coordinate frame F_j to F_i, is denoted by a transformation matrix composed of a rotation matrix R_{i,j} ∈ SO(2) and a translation vector t_{i,j} ∈ R^2. A perturbation ω ∈ R^3 around the pose T_{i,j} uses a minimal representation, and its Lie algebra representation is expressed as ω^∧ ∈ se(2). We use the left multiplication convention to define its increment on T_{i,j} with an operator ⊕, that is, ω ⊕ T_{i,j} = exp(ω^∧) · T_{i,j}. A polar radar image and its bilinearly interpolated Cartesian counterpart are denoted as S and I, respectively. A point in the Cartesian image I is represented by its pixel coordinates P.
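As a concrete sketch of this notation, the following Python helpers implement the se(2) hat operator, the closed-form SE(2) exponential map and the left-multiplicative increment ω ⊕ T = exp(ω^∧)·T (the function names are our own, not from the paper):

```python
import numpy as np

def hat(omega):
    """Map a twist omega = [vx, vy, vtheta] in R^3 to its matrix form
    omega^ in se(2)."""
    vx, vy, vth = omega
    return np.array([[0.0, -vth, vx],
                     [vth,  0.0, vy],
                     [0.0,  0.0, 0.0]])

def exp_se2(omega):
    """Closed-form matrix exponential mapping se(2) to SE(2)."""
    vx, vy, vth = omega
    if abs(vth) < 1e-9:                      # pure translation limit
        V = np.eye(2)
    else:                                    # left Jacobian of SO(2)
        V = np.array([[np.sin(vth), -(1.0 - np.cos(vth))],
                      [1.0 - np.cos(vth), np.sin(vth)]]) / vth
    c, s = np.cos(vth), np.sin(vth)
    T = np.eye(3)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 2] = V @ np.array([vx, vy])
    return T

def oplus(omega, T):
    """Left-multiplicative increment omega ⊕ T = exp(omega^) · T."""
    return exp_se2(omega) @ T
```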

Geometry of a rotating frequency-modulated continuous-wave radar
There are two types of continuous-wave radar: unmodulated and frequency-modulated. Unmodulated continuous-wave radar can only measure the relative velocity of targeted objects using the Doppler effect, while a FMCW radar is also able to measure distance by detecting time shifts and/or frequency shifts between the transmitted and received signals. Some recently developed FMCW radars make use of multiple consecutive observations to calculate targets' speeds, so that separate Doppler processing is not strictly required. This improves the processing performance and the accuracy of target range measurements.
Assume a radar sensor rotates 360° clockwise in a full cycle with a total of N_s azimuth angles as shown in Figure 2(a), that is, the step size of the azimuth angle is 2π/N_s. For each azimuth angle, the radar emits a beam and collapses the return signal to the point where a target is sensed along a range, without considering elevation. Therefore, a radar image provides absolute metric distance information, different from a camera image which lacks depth by nature. As shown in Figure 2(b), given a point (a, r) in a polar image S, where a and r denote its azimuth and range, its homogeneous coordinates p can be computed by

p = [μ_p r cos θ, μ_p r sin θ, 1]^⊤,

where θ = −a · 2π/N_s is the ranging angle in Cartesian coordinates, and μ_p (m/pixel) is the scaling factor between the image space and the world metric space. This point on the polar image can also be related to a point on the Cartesian image I with a pixel coordinate P by

P = [w/2 + x/μ_c, h/2 − y/μ_c]^⊤,

where [x, y]^⊤ are the metric coordinates of p, w and h are the width and height of the Cartesian image, and μ_c (m/pixel) is the scale factor between the pixel space and the world metric space used in the Cartesian image. Therefore, the raw polar scan S can be transformed into a Cartesian space, represented by a greyscale Cartesian image I through bilinear interpolation, as shown in Figure 2(b).
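The polar-to-metric and metric-to-pixel mappings can be sketched as follows (a minimal illustration; the downward image y-axis in the pixel mapping is an assumed convention, and all parameter names are illustrative):

```python
import numpy as np

def polar_point_to_metric(a, r, N_s, mu_p):
    """Convert a polar-image cell (azimuth index a, range bin r) into
    homogeneous metric coordinates p = [x, y, 1]^T for a clockwise scan
    with N_s azimuths; mu_p is the polar range scale in m/pixel."""
    theta = -a * 2.0 * np.pi / N_s           # ranging angle in Cartesian coords
    x = mu_p * r * np.cos(theta)
    y = mu_p * r * np.sin(theta)
    return np.array([x, y, 1.0])

def metric_to_cartesian_pixel(p, w, h, mu_c):
    """Place a metric point into a w x h Cartesian image centred on the
    sensor; mu_c is the Cartesian image scale in m/pixel.  The sign flip
    on y assumes the image y-axis points down."""
    u = w / 2.0 + p[0] / mu_c
    v = h / 2.0 - p[1] / mu_c
    return np.array([u, v])
```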

Challenges of radar sensing for simultaneous localization and mapping
Despite the increasingly widespread adoption of radar systems for perception in autonomous robots and in Advanced Driver-Assistance Systems (ADAS), there are still significant challenges for an effective radar SLAM system.
3.3.1. Coupled artefacts. As an actively radiating sensor, radar suffers from multiple sources of artefacts and clutter, for example, speckle noise, receiver saturation and multi-path reflection, as shown in Figure 3(a). Speckle noise is the product of interference between different radar waves, which introduces light and dark random noisy pixels on the image. Meanwhile, multi-path reflection may create 'ghost' objects, presenting repetitive similar patterns on the image. The interaction of these multiple sources adds another dimension of complexity and difficulty when applying traditional vision based SLAM techniques to radar sensing.

Discontinuities of detection.
Radar operates at a longer wavelength than LiDAR, offering the advantage of perceiving beyond the closest object on a line of sight. However, this can become problematic for some key tasks in pose estimation, for example, frame-to-frame feature matching and tracking, since objects or clutter detected (or not detected) in the current radar frame might suddenly disappear (or appear) in the next frame. As shown in Figure 3(b), this can happen even during a small positional change. This discontinuity of detection can introduce ambiguities and challenges for SLAM, reducing the robustness and accuracy of motion estimation and loop closure.

Motion distortion.
In contrast to camera and LiDAR, a current mechanically scanning radar operates at a relatively low frame rate (4 Hz for our radar sensor). Within a full 360-degree radar scan, a high-speed vehicle can travel several metres and rotate several degrees, causing serious motion distortion and discontinuities in radar images, in particular between the beams at 0° and 360°. An example in Figure 3(c) shows this issue on the Cartesian image on the left, that is, skewed radar detections due to motion distortion; by contrast, there are no skewed detections when the sensor is static. Therefore, directly using these distorted Cartesian images for geometry estimation and mapping can introduce errors.

System overview
Having these challenges in mind, we propose a novel radar SLAM system which includes motion compensated radar motion estimation, loop closure detection and pose graph optimization. The system, shown in Figure 4, is divided into two threads. The main thread is the tracking thread which takes the Cartesian images as input, tracks the radar motion and creates new points and keyframes for mapping. The other parallel thread takes the polar images as input and is responsible for generation of the dense point cloud and computation of descriptors for loop closure detection. Finally, once a loop is detected, it performs pose graph optimization to correct the drift induced by tracking before updating the map.

Radar motion estimation
This section describes the proposed radar motion estimation algorithm, which includes feature detection and tracking, graph based outlier rejection and radar pose tracking with optimal motion distortion compensation.

Feature detection and tracking
For each radar Cartesian image I_j, we first detect keypoints using a blob detector based on the Hessian matrix. Keypoints with Hessian responses larger than a threshold are taken as candidate points. The candidates are then filtered by the adaptive non-maximal suppression (ANMS) algorithm (Bailo et al. (2018)), which selects points that are homogeneously distributed in space. Instead of using a descriptor to match keypoints as in (Hong et al. (2020)), we track them between frames I_{j−1} and I_j using the KLT tracker (Lucas and Kanade (1981)).
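This two-stage selection can be illustrated with the sketch below, which uses a determinant-of-Hessian blob response and a classic adaptive non-maximal suppression rule; both are simplified stand-ins for the detector and the ANMS variant of Bailo et al. (2018) used in the paper:

```python
import numpy as np

def hessian_response(img):
    """Determinant-of-Hessian blob response computed with finite
    differences (a simple stand-in for the paper's detector)."""
    gy, gx = np.gradient(img.astype(float))
    gyy, gyx = np.gradient(gy)
    gxy, gxx = np.gradient(gx)
    return gxx * gyy - gxy * gyx

def anms(points, scores, k):
    """Classic adaptive non-maximal suppression: keep the k points whose
    suppression radius (distance to the nearest stronger point) is
    largest, yielding a spatially homogeneous selection."""
    order = np.argsort(-scores)              # strongest first
    pts = points[order]
    radii = np.full(len(pts), np.inf)
    for i in range(1, len(pts)):
        d = np.linalg.norm(pts[:i] - pts[i], axis=1)
        radii[i] = d.min()                   # nearest stronger point
    keep = order[np.argsort(-radii)[:k]]
    return points[keep]
```

The selected keypoints would then be fed to a pyramidal KLT tracker between consecutive Cartesian images.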

Graph based outlier rejection
It is inevitable that some keypoints are detected and tracked on dynamic objects, for example, cars, cyclists and pedestrians, and on radar noise, for example, multi-path reflections. We leverage the absolute metric information that radar images directly provide to form geometric constraints for detecting and removing these outliers, applying the graph based outlier rejection algorithm described in (Howard (2008)). We impose a pairwise geometric consistency constraint on tracked keypoint pairs based on the fact that they should follow a similar motion tendency, under the assumption that most of the tracked points belong to the static scene. Therefore, any two pairs of keypoint matches between the current frame I_j and the last frame I_{j−1} should satisfy the pairwise constraint

| ‖P^m_{j−1} − P^n_{j−1}‖_2 − ‖P^m_j − P^n_j‖_2 | < δ_c,

where |·| denotes the absolute value, ‖·‖_2 the Euclidean distance, δ_c is a small distance threshold, and P^m_{j−1}, P^n_{j−1}, P^m_j and P^n_j are the pixel coordinates of two pairs of tracked points between I_{j−1} and I_j. Hence, P^m_{j−1} and P^m_j denote one pair of associated points while P^n_{j−1} and P^n_j is another pair; see Figure 5 for an intuitive example. A consistency matrix G is then used to represent all the associations that satisfy this pairwise consistency: if a pair of associations satisfies the constraint, the corresponding entry in G is set to one, as shown in Figure 5. Finding the maximum inlier set of matches that are mutually consistent is equivalent to deriving the maximum clique of the graph represented by G, which can be solved efficiently using (Konc and Janezic (2007)). Once the maximum inlier set is obtained, it is used to compute the relative transformation T_{j−1,j}, which transforms a point from local frame j to local frame j−1, using Singular Value Decomposition (SVD) (Challis (1995)).
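The consistency matrix and clique search can be sketched as follows; note that the paper solves the maximum clique exactly with the solver of Konc and Janezic (2007), whereas the greedy routine below is only an approximate stand-in:

```python
import numpy as np

def consistency_matrix(P_prev, P_curr, delta_c):
    """G[m, n] = 1 when match pairs m and n preserve their mutual pixel
    distance across the two frames within the threshold delta_c."""
    d_prev = np.linalg.norm(P_prev[:, None] - P_prev[None, :], axis=-1)
    d_curr = np.linalg.norm(P_curr[:, None] - P_curr[None, :], axis=-1)
    G = (np.abs(d_prev - d_curr) < delta_c).astype(int)
    np.fill_diagonal(G, 0)                   # no self-consistency edges
    return G

def greedy_max_clique(G):
    """Greedy approximation to the maximum clique of G: visit vertices
    in order of decreasing degree and keep those consistent with every
    vertex already in the clique."""
    order = np.argsort(-G.sum(axis=1))
    clique = []
    for v in order:
        if all(G[v, u] for u in clique):
            clique.append(v)
    return sorted(clique)
```

The surviving matches would then be passed to the SVD-based relative pose computation.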
Given T_{j−1,j} and the fixed radar frame rate, an initial guess of the current velocity v_j = [v_x, v_y, v_θ]^⊤ ∈ R^3 can be computed for the motion compensation tracking model. A graph based method is similarly applied to perform outlier rejection in (Cen and Newman (2018, 2019)); global geometric constraints are imposed in both methods. However, their method formulates graph matching as an optimization problem maximizing the global compatibility, while ours solves the maximum clique problem with an approximate vertex-colouring algorithm.

Motion distortion modelling
After the tracked points are associated, they can be used to estimate the motion. However, since the radar scanning rate is low, they tend to suffer from serious motion distortion as discussed in Sec. 3.3. This can dramatically degrade the accuracy of motion estimation, an issue far less pronounced for most vision and LiDAR based methods. Therefore, we explicitly model and compensate for motion distortion in radar pose tracking using an optimization approach.
Assume a full polar radar scan S_j takes Δt seconds to complete. Denote T_{w,j} as the pose of radar scan S_j in the world coordinate frame F_w and T_{j,j_t} as the pose of the radar in the local frame while capturing its azimuth beam at time t ∈ [−Δt/2, Δt/2]. Without loss of generality, we compensate the motion distortion relative to the central azimuth beam at t = 0, that is, F_{j_0} defines the local coordinate frame F_j of S_j. The motion distortion model is designed to correct the detections on each beam of a radar scan, that is, to optimally estimate the detections of an undistorted radar image as shown in Figure 6.
The radar pose in the world coordinate frame while capturing an azimuth beam at time t can be obtained by

T_{w,j_t} = T_{w,j} · T_{j,j_t}.

Considering a constant velocity model within a full scan, we can compute the relative transformation T_{j,j_t} given the velocity v_j, that is,

T_{j,j_t} = exp((t v_j)^∧),

where exp(·) is the matrix exponential map and ∧ is the operation transforming a vector into a matrix. If the ith keypoint p^i_w in the world frame F_w is observed as p^i_{j_t} on the azimuth beam at time t, its motion compensated location is then

p^i_{j_0} = T_{j,j_t} p^i_{j_t}.

In other words, p^i_{j_0} is the compensated location of p^i_w in the local frame F_{j_0}. Therefore, the feature residual between the locally observed and estimated (after motion compensation) locations of this ith keypoint can be computed as

e^i_p = ρ_c( T^{−1}_{w,j} p^i_w − T_{j,j_t} p^i_{j_t} ),

where ρ_c is the Cauchy robust cost function used to account for the perspective changes described in 3.3.2.
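A minimal sketch of this per-beam compensation under the constant velocity assumption (the closed-form SE(2) exponential is reproduced here so the block is self-contained):

```python
import numpy as np

def exp_se2(omega):
    """Closed-form exp map from a twist [vx, vy, vtheta] to SE(2)."""
    vx, vy, vth = omega
    if abs(vth) < 1e-9:
        V = np.eye(2)
    else:
        V = np.array([[np.sin(vth), -(1.0 - np.cos(vth))],
                      [1.0 - np.cos(vth), np.sin(vth)]]) / vth
    c, s = np.cos(vth), np.sin(vth)
    T = np.eye(3)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 2] = V @ np.array([vx, vy])
    return T

def compensate_point(p_jt, t, v_j):
    """Move an observation p_jt, taken on the beam at time t, back to
    the central-beam frame F_{j_0}: p_{j_0} = exp((t v_j)^) p_{j_t},
    assuming a constant velocity v_j over the scan."""
    T_j_jt = exp_se2(t * np.asarray(v_j, dtype=float))
    return T_j_jt @ p_jt
```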

Optimal motion compensated radar pose tracking
Radar pose tracking aims to find the optimal radar pose T_{w,j} and the current velocity v_j while considering motion distortion. To encourage smooth motion dynamics, a velocity error e_v is also introduced as a velocity prior term:

e_v = v_j − v_{j,prior},

where v_{j,prior} is a prior on the current velocity, parameterized as

v_{j,prior} = log(T^{−1}_{w,j−1} · T_{w,j})^∨ / Δt.

Here, ∨ is the operation converting a matrix to a vector. This velocity prior term establishes a constraint on velocity changes by considering the previous pose T_{w,j−1}, and it is crucial to stabilize the optimization; the results with and without this prior are compared in Figure 7. The pose tracking therefore optimizes the velocity v_j and the current radar pose T_{w,j} by minimizing a cost function that includes the feature residuals of all the N keypoints tracked in S_j and the velocity prior, that is,

{T*_{w,j}, v*_j} = argmin Σ^N_{i=1} (e^i_p)^⊤ Λ^i_p e^i_p + e^⊤_v Λ_v e_v,    (12)

where Λ^i_p and Λ_v are the information matrices of keypoint i and the velocity.
By formulating a state variable Θ containing all the variables to be optimized, that is, the velocity v_j and the current radar pose T_{w,j}, and denoting e(Θ) as the residual function containing both e^i_p and e_v, Θ can be solved in an optimization whose factor graph representation is shown in Figure 8. The optimization problem is then framed as finding the minimum of the weighted Sum of Squared Errors cost function

F(Θ) = Σ_i e_i(Θ)^⊤ W_i e_i(Θ),    (13)

where W_i is a symmetric positive definite weighting matrix. The total cost in equation (13) is minimized using the Levenberg-Marquardt algorithm. With the initial guess Θ, the residual e_i(ΔΘ ⊞ Θ) is approximated by its first order Taylor expansion around the current guess Θ:

e_i(ΔΘ ⊞ Θ) ≈ e_i(Θ) + J_i ΔΘ,

where J_i is the Jacobian of e_i evaluated at the current parameters Θ. ⊞ represents standard addition for the velocity variable v_j and the ⊕ operation for the pose variable T_{w,j}, that is, exp((ΔΘ)^∧)·T_{w,j} (Kümmerle et al. (2011), Schubert et al. (2018)). J_i can be decomposed into two parts: the Jacobian with respect to the velocity, ∂e_i/∂v_j, and the Jacobian with respect to the pose, ∂e_i/∂T_{w,j}. The latter is equivalent to computing the Jacobian with respect to a zero perturbation ΔΘ_p around the pose (Solà et al. (2018)), that is,

∂e_i(ΔΘ_p ⊞ T_{w,j}) / ∂ΔΘ_p evaluated at ΔΘ_p = 0.

The Levenberg-Marquardt algorithm computes a solution ΔΘ at each iteration such that it minimizes the residual function e_i(ΔΘ ⊞ Θ). We refer the reader to (Kümmerle et al. (2011)) for further details on the optimization process.
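The boxplus-style update can be illustrated on a simplified, pose-only version of the problem, using numeric Jacobians evaluated at a zero left perturbation; this is an illustrative sketch, not the paper's implementation (no velocity term or robust cost):

```python
import numpy as np

def exp_se2(omega):
    """Closed-form exp map from a twist [vx, vy, vtheta] to SE(2)."""
    vx, vy, vth = omega
    if abs(vth) < 1e-9:
        V = np.eye(2)
    else:
        V = np.array([[np.sin(vth), -(1.0 - np.cos(vth))],
                      [1.0 - np.cos(vth), np.sin(vth)]]) / vth
    c, s = np.cos(vth), np.sin(vth)
    T = np.eye(3)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 2] = V @ np.array([vx, vy])
    return T

def residuals(T, pts_local, pts_world):
    """Stacked feature residuals: world points projected into the local
    frame minus their local observations (homogeneous N x 3 inputs)."""
    e = (np.linalg.inv(T) @ pts_world.T).T[:, :2] - pts_local[:, :2]
    return e.ravel()

def lm_step(T, pts_local, pts_world, lam=1e-3, eps=1e-6):
    """One Levenberg-Marquardt step.  The Jacobian is taken numerically
    w.r.t. a zero perturbation dw applied as exp(dw^) T (left convention),
    and the update is T <- exp(dw^) T."""
    e0 = residuals(T, pts_local, pts_world)
    J = np.zeros((e0.size, 3))
    for k in range(3):
        dw = np.zeros(3)
        dw[k] = eps
        J[:, k] = (residuals(exp_se2(dw) @ T, pts_local, pts_world) - e0) / eps
    H = J.T @ J + lam * np.eye(3)            # damped normal equations
    dw = np.linalg.solve(H, -J.T @ e0)
    return exp_se2(dw) @ T
```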

New point generation
After tracking the current radar scan S_j, the total number of successfully tracked keypoints is checked to decide whether new keypoints should be generated. If it is below a certain threshold, new keypoints are extracted and added for tracking in image grid cells whose keypoint counts are low. Specifically, we divide the radar image into grid cells and count the number of tracked points in each cell. The cells are then sorted according to their keypoint counts, and new keypoints are extracted in the cells with fewer keypoints before being inserted for tracking, limited by the maximum number of keypoints allowed per cell. Once a new keypoint is associated with the current frame S_j, its global position can be derived from its direct (distorted) observation p^i_{j_t} as

p^i_w = T*_{w,j} T_{j,j_t} p^i_{j_t},

where T*_{w,j} and v*_j are the optimal radar pose and velocity derived in equation (12).
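The grid-based replenishment test can be sketched as follows (the grid size and per-cell capacity are illustrative parameters, not values from the paper):

```python
import numpy as np

def grids_needing_points(tracked_pts, img_shape, grid=8, max_per_cell=5):
    """Count tracked keypoints per grid cell and return the (row, col)
    indices of cells holding fewer than max_per_cell points, i.e. the
    cells where new keypoints should be extracted."""
    h, w = img_shape
    counts = np.zeros((grid, grid), dtype=int)
    for x, y in tracked_pts:
        r = min(int(y * grid / h), grid - 1)
        c = min(int(x * grid / w), grid - 1)
        counts[r, c] += 1
    return [(r, c) for r in range(grid) for c in range(grid)
            if counts[r, c] < max_per_cell]
```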

New keyframe generation
To scale the system in a large-scale environment, we use a pose-graph representation for the map with each node parameterized by a keyframe. Each keyframe which contains a velocity and a pose is connected with its neighbouring keyframes using its odometry derived from the motion estimation. The keyframe generation criterion is similar to that introduced in (Mur-Artal and Tardós (2017)) based on travelled distance and angle.
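A keyframe decision of this kind can be sketched as below (the distance and angle thresholds are illustrative, not the paper's values):

```python
import numpy as np

def is_new_keyframe(T_w_last_kf, T_w_curr, d_thresh=2.0,
                    a_thresh=np.radians(10.0)):
    """Declare a new keyframe when the travelled distance or the rotation
    since the last keyframe exceeds a threshold, computed from the
    relative SE(2) transform between the two poses."""
    T_rel = np.linalg.inv(T_w_last_kf) @ T_w_curr
    dist = np.linalg.norm(T_rel[:2, 2])               # translation norm
    angle = abs(np.arctan2(T_rel[1, 0], T_rel[0, 0]))  # rotation magnitude
    return dist > d_thresh or angle > a_thresh
```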

Loop closure and pose graph optimization
Robust loop closure detection is critical to reducing drift in a SLAM system. Although the Bag-of-Words model has proved efficient for visual SLAM algorithms, it is not adequate for radar-based loop closure detection for three main reasons: first, radar images have less distinctive pixel-wise characteristics than optical images, so similar feature descriptors can repeat widely across radar images, causing a large number of incorrect feature matches; second, the multi-path reflection problem in radar can introduce further ambiguity for feature description and matching; third, a small rotation of the radar sensor may produce tremendous scene changes, significantly distorting the histogram distribution of the descriptors. There have been some attempts to address this challenge in the context of place recognition (Sȃftescu et al. (2020), Gadd et al. (2021), Kim et al. (2020)). On the other hand, radar imagery encapsulates valuable absolute metric information, which is inherently missing from an optical image. Therefore, we propose a loop closure technique which captures the geometric scene structure and exploits the spatial signature of reflection density from radar point clouds.

Algorithm 1. Radar Polar Scan to Point Cloud Conversion
Input: Radar polar scan S ∈ R^{m×n}
Output: Point cloud C ∈ R^{z×2}
Parameters: Minimum peak prominence δ_p and minimum peak distance δ_d
Initialize empty point cloud set C;
for each azimuth a = 1, …, m do
    Detect the peaks of the power readings along azimuth a with minimum prominence δ_p and minimum distance δ_d;
    Compute the mean μ and standard deviation σ of the detected peak powers;
    for each detected peak with power q do
        if q > μ + σ then
            Convert the peak's polar coordinates (azimuth, range) to a Cartesian point p;
            Add the point p to C;
        end
    end
end

Point cloud generation
Considering the challenges of radar sensing in Sec. 3.3, we want to separate true targets from the noisy measurements on a polar scan. An intuitive but naive way would be to detect peaks by finding the local maxima of each azimuth reading. However, as shown in Figure 9, the detected peaks can be distributed randomly across the whole radar image, even in areas without a real object, due to the speckle noise described in Sec. 3.3.1. Therefore, we propose a simple yet effective point cloud generation algorithm using adaptive thresholding. Denoting the return power of a peak as q, we select the peaks which satisfy the inequality constraint q > μ + σ, where μ and σ are the mean and the standard deviation of the powers of the peaks in one azimuth scan. Estimating the power mean and standard deviation along one azimuth instead of the whole polar image mitigates the effect of receiver saturation, since the radar may be saturated in one direction while rotating. By selecting the peaks whose powers are greater than their mean power plus one standard deviation, the true detections tend to be separated from the false-positive ones. The procedure is shown in Algorithm 1. Once a point cloud C is generated from a radar image, M2DP (He et al. (2016)), a rotation invariant global descriptor designed for 3D point clouds, is adapted to describe the 2D radar point cloud for loop closure detection. M2DP computes the density signature of the point cloud on a plane and uses the left and right singular vectors of these signatures as the descriptor.
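A hedged sketch of the adaptive-threshold extraction: strict local maxima stand in for Algorithm 1's prominence- and distance-filtered peaks, and the per-azimuth rule q > μ + σ decides which peaks become Cartesian points. The range resolution (metres per bin) is an illustrative value, not a sensor constant to rely on.

```python
import numpy as np

def polar_scan_to_points(scan, range_res=0.0438):
    """scan: (m, n) polar power image, one azimuth per row.
    Returns a (z, 2) array of Cartesian points."""
    points = []
    m, _ = scan.shape
    for a in range(m):
        row = scan[a]
        # strict local maxima; the paper additionally enforces a minimum peak
        # prominence and spacing (delta_p, delta_d), omitted here for brevity
        idx = np.where((row[1:-1] > row[:-2]) & (row[1:-1] > row[2:]))[0] + 1
        if idx.size < 2:
            continue                          # need several peaks for mu, sigma
        q = row[idx]
        keep = idx[q > q.mean() + q.std()]    # adaptive threshold per azimuth
        ang = 2.0 * np.pi * a / m
        for r_bin in keep:
            rho = r_bin * range_res
            points.append((rho * np.cos(ang), rho * np.sin(ang)))
    return np.asarray(points)
```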

Loop candidate rejection with principal component analysis
We leverage principal component analysis (PCA) to determine whether the current frame can be a candidate to be matched against historical keyframes for loop closure detection. After performing PCA on the extracted 2D point cloud C, we compute the ratio r_pca = γ_1/γ_2 between the two eigenvalues γ_1 and γ_2, where γ_1 ≥ γ_2. The frame is selected for loop closure detection if its r_pca is less than a certain threshold δ_pca. The intuition is to attempt loop closure mainly on point clouds that have distinctive structural layouts, reducing the possibility of detecting false-positive loop closures. In other words, if γ_1 is dominant (that is, r_pca is large), the radar imagery is very likely collected in an environment, such as a highway or a country road, which exhibits less distinctive structural patterns and layouts and should be avoided for loop closure detection. Some examples are given in Figure 10.
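The PCA-based rejection test can be sketched in a few lines; δ_pca = 3.0 here is an illustrative threshold, not the paper's value.

```python
import numpy as np

def accept_for_loop_detection(cloud, delta_pca=3.0):
    """cloud: (N, 2) radar point cloud. Returns True if the eigenvalue ratio
    r_pca = gamma1 / gamma2 (gamma1 >= gamma2) of the point covariance is
    below delta_pca, i.e. the scene has a distinctive 2-D structure."""
    c = cloud - cloud.mean(axis=0)
    gamma2, gamma1 = np.linalg.eigvalsh(c.T @ c / len(cloud))  # ascending order
    r_pca = gamma1 / max(gamma2, 1e-12)       # guard against a degenerate cloud
    return r_pca < delta_pca
```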

Relative transformation
Once a loop closure is detected, the relative transformation T_{l,j} between the current radar image I_j and the matched radar image I_l is computed. Similar to Sec. 4.1, we associate the keypoints of the two frames using the KLT tracker. The challenge here is that I_l might have a large rotation with respect to I_j, causing the tracker to fail. To address this problem, we first estimate the relative rotation between the two frames and align them using the eigenvectors from the PCA of their point clouds, similar to Sec. 5.2. The keypoints of I_l are then tracked through the rotated version of I_j.
After obtaining the keypoint association, ICP is used to compute the relative transformation T l , j , which is added in the pose graph as a loop closure constraint for pose graph optimization.
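A sketch of the PCA-based rotation alignment described above: the relative rotation is read off the principal eigenvectors of the two point clouds. The sign ambiguity of an eigenvector limits the recoverable angle to a half-circle; the subsequent KLT tracking and ICP refinement are omitted here.

```python
import numpy as np

def principal_heading(cloud):
    """Angle of the principal eigenvector of an (N, 2) point cloud."""
    c = cloud - cloud.mean(axis=0)
    _, V = np.linalg.eigh(c.T @ c)
    v = V[:, -1]                       # eigenvector of the largest eigenvalue
    return np.arctan2(v[1], v[0])

def pca_relative_rotation(cloud_a, cloud_b):
    """Coarse rotation from cloud_a to cloud_b. An eigenvector's sign is
    arbitrary, so the result is wrapped into (-pi/2, pi/2]."""
    d = principal_heading(cloud_b) - principal_heading(cloud_a)
    return (d + np.pi / 2) % np.pi - np.pi / 2
```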

Pose graph optimization
A pose graph is gradually built as the radar moves. Once a new loop closure constraint is added in the pose graph, pose graph optimization is performed. After successfully optimizing the poses of the keyframes, we update the global map points. The g2o (Kümmerle et al. (2011)) library is used in this work for the pose graph optimization.

Experimental results
Both quantitative and qualitative experiments are conducted to evaluate the performance of the proposed radar SLAM method using three open radar datasets, covering large-scale environments and some adverse weather conditions.

Evaluation protocol
We perform both quantitative and qualitative evaluation using different datasets. Specifically, the quantitative evaluation is to understand the pose estimation accuracy of the SLAM system. For Relative/Odometry Error (RE), we follow the popular KITTI odometry evaluation criteria, that is, computing the mean translation and rotation errors from length 100-800 m with a 100 m increment. Absolute Trajectory Error (ATE) is also adopted to evaluate the localization accuracy of full SLAM, in particular after loop closure and global graph optimization. The trajectories of all methods (see full list in Sec. 6.3) are aligned with the ground truth trajectories using a 6 Degree-of-Freedom (DoF) transformation provided by the evaluation tool in (Zhang and Scaramuzza (2018)) for ATE evaluation. On the other hand, the qualitative evaluation focuses on how some challenging scenarios, for example, in adverse weather conditions, influence the performance of various vision, LiDAR and radar-based SLAM systems.
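For illustration, here is a simplified version of the segment-based relative error: it measures translation-only drift per metre over 100-800 m segments, omitting the rotation error and the per-segment pose alignment of the full KITTI protocol.

```python
import numpy as np

def trajectory_distances(positions):
    """Cumulative travelled distance along an (N, 2) trajectory."""
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(steps)])

def relative_translation_errors(gt, est, lengths=range(100, 801, 100)):
    """gt, est: (N, 2) aligned positions. Returns the mean per-metre
    translation drift over all segments of each evaluation length."""
    dist = trajectory_distances(gt)
    errs = []
    for L in lengths:
        for i in range(len(gt)):
            j = np.searchsorted(dist, dist[i] + L)   # end of this segment
            if j >= len(gt):
                break
            gt_seg = gt[j] - gt[i]
            est_seg = est[j] - est[i]
            errs.append(np.linalg.norm(gt_seg - est_seg) / L)
    return float(np.mean(errs)) if errs else float('nan')
```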

Datasets
So far there exist three public datasets that provide long-range radar data with dense returns: the Oxford Radar RobotCar Dataset (Barnes et al. (2020a), Maddern et al. (2017)), the MulRan Dataset (Kim et al. (2020)) and the RADIATE Dataset (Sheeny et al. (2021)). We choose the Oxford Radar RobotCar and MulRan datasets for detailed quantitative benchmarking and our RADIATE dataset mainly for qualitative evaluation.
6.2.1. Oxford radar RobotCar dataset. The Oxford Radar RobotCar Dataset (Barnes et al. (2020a), Maddern et al. (2017)) provides data from a Navtech CTS350-X Millimetre-Wave radar for about 280 km of driving in Oxford, UK, traversing the same route 32 times. It also provides stereo images from a Point Grey Bumblebee XB3 camera and LiDAR data from two Velodyne HDL-32E sensors with ground truth poses. The radar is configured to provide 4.38 cm and 0.9° resolution in range and azimuth, respectively, with a range up to 163 m. The radar scanning frequency is 4 Hz. See Figure 11 for some example data.
6.2.3. RADIATE dataset. The RADIATE dataset is our recently released dataset which includes radar, LiDAR, stereo camera and GPS/IMU data (Sheeny et al. (2021)). One of its unique features is that it provides data in extreme weather conditions, such as rain and snow, as shown in Figure 13. A Navtech CIR104-X radar is used with 0.175 m range resolution and a maximum range of 100 m at a 4 Hz scanning frequency. A 32-channel Velodyne HDL-32E LiDAR and a ZED stereo camera run at 10 Hz and 15 Hz, respectively. The seven sequences used in this work, recorded in the City of Edinburgh, UK, comprise 2 fog, 1 rain, 1 normal, 2 snow and 1 night; their lengths are given in Table 1. Note that only the rain, normal, snow and night sequences contain loop closures, and the GPS signal is occasionally lost in the snow sequence.

Competing methods and their settings
In order to validate the performance of our proposed radar SLAM system, state-of-the-art odometry and SLAM methods for large-scale environments using different sensor modalities (camera, LiDAR, radar) are chosen. These include ORB-SLAM2 (Mur-Artal and Tardós (2017)), SuMa (Behley and Stachniss (2018)) and our previous version of RadarSLAM (Hong et al. (2020)), as baseline algorithms for vision, LiDAR and radar-based approaches, respectively. For the Oxford Radar RobotCar Dataset, the results reported in (Cen and Newman (2018), Barnes et al. (2020b)) are also included as radar-based methods since their implementations are unavailable. We would like to highlight that we use an identical set of parameters for our radar odometry and SLAM algorithm across all the experiments and datasets. We believe this is worthwhile given that most existing odometry or SLAM algorithms require some level of parameter tuning to reduce or avoid result degradation.

6.3.1. Stereo vision based ORB-SLAM2. ORB-SLAM2 (Mur-Artal and Tardós (2017)) is a sparse feature based visual SLAM system which relies on ORB features. It also possesses loop closure and pose graph optimization capabilities. Local Bundle Adjustment is used to refine the map point positions, which boosts the odometry accuracy. Based on its official open-source implementation, we use its stereo setting in all experiments with loop closure enabled.
6.3.2. LiDAR based SuMa. SuMa (Behley and Stachniss (2018)) is one of the state-of-the-art LiDAR based odometry and mapping algorithms for large-scale outdoor environments, especially for mobile vehicles. It constructs and uses a surfel-based map to perform robust data association for loop closure detection and verification. We employ its open-source implementation and keep the original parameter setting used for the KITTI dataset in our experiments.

6.3.3. Radar based RadarSLAM. Our previous version of RadarSLAM (Hong et al. (2020)) extracts SURF features from Cartesian radar images and matches the keypoints based on their descriptors for pose estimation, which differs from the feature tracking technique in this work. It does not consider motion distortion, although it includes loop closure detection and pose graph optimization to reduce drift and improve map consistency. It is referred to as 'Baseline Odometry' and 'Baseline SLAM' for comparison.
6.3.4. Cen's radar odometry. Cen's method (Cen and Newman (2018)) is one of the first attempts using the Navtech FMCW radar sensor to estimate ego-motion of a mobile vehicle. Landmarks are extracted from polar scans before performing data association by maximizing the overall compatibility with pairwise constraints. Given the associated pairs, SVD is used to find the relative transformation.
6.3.5. Barnes' radar odometry. Barnes' method (Barnes et al. (2020b)) leverages deep learning to generate distraction-free feature maps and uses FFT cross-correlation to find relative poses between consecutive feature maps. After being trained end-to-end, the system is able to mask out multi-path reflection, speckle noise and dynamic objects. This facilitates the cross-correlation stage and produces accurate odometry. The spatial cross-validation results in the appendix of (Barnes et al. (2020b)) are chosen for fair comparison.

Figure 13. Images collected in Snow (left), Fog/Rain (middle), and Night (right). The image quality degrades in these conditions, making it extremely challenging for vision based odometry and SLAM algorithms. Note that for the snow sequence, the camera is completely covered by snow.
Experiments on the Oxford radar RobotCar dataset
6.4.1. Quantitative comparison. The RE and ATE results of each sequence are given in Table 2 and Table 3, respectively. Since the ground truth poses provided by the Oxford Radar RobotCar Dataset are 3-DoF, only the x, y and yaw of the estimated 6-DoF poses of ORB-SLAM2 and SuMa are evaluated. Note that SuMa fails on these eight sequences at 10-30% of their full lengths; its results are therefore reported up to the point of failure, and its ATE is not applicable due to the lack of fully estimated trajectories. Specifically, the stereo version of ORB-SLAM2 is able to complete seven sequences, successfully close the loops and achieve superior localization accuracy. SuMa also performs accurately when it works, although it is less robust on these sequences, possibly due to the large number of dynamic objects, for example, surrounding moving cars and buses. Regarding radar-based approaches, our proposed radar odometry/SLAM achieves lower RE than the baseline radar odometry/SLAM and Cen's method, and a similar mean RE to the learning based Barnes' method. It can also be seen that our proposed radar odometry and SLAM methods achieve better or comparable RE and ATE performance to ORB-SLAM2 and SuMa. Figure 14 shows the REs of ORB-SLAM2, SuMa, baseline radar odometry and our radar odometry/SLAM algorithms for different path lengths and speeds, following the popular KITTI odometry evaluation protocol. SuMa has the lowest translation and rotation errors against path length and speed until it fails, while our radar SLAM and odometry methods are the second and third lowest, respectively. The low, median and high translation and rotation errors are presented in Figures 15(a) and (b). It can be seen that our SLAM and odometry achieve low translation and rotation errors across different path lengths. The optimized velocities in x, y and yaw are compared to the ground truth in Figure 16.
The optimized velocities have very high accuracy, which verifies the superior performance of our proposed radar motion estimation algorithm.
6.4.2. Qualitative comparison. We show the estimated trajectories of six sequences in Figure 17 for qualitative evaluation. For most of the sequences, our SLAM results are closest to the ground truth, although the trajectories of the baseline SLAM and ORB-SLAM2 are also accurate except for sequence 18-14-14-42. Figure 18 elaborates on the trajectory of each method on sequence 17-13-26-39 for a closer qualitative comparison.

Experiments on the MulRan dataset
The RE and ATE of SuMa, baseline radar odometry/SLAM and our radar SLAM are shown in Table 4 and Table 5. ORB-SLAM2 is not applicable here since MulRan only contains radar and LiDAR data. Similar to the RobotCar dataset, we again transform its provided 6-DoF ground truth poses into 3-DoF for evaluation. Both RE and ATE are evaluated on nine sequences: DCC01, DCC02, DCC03, KAIST01, KAIST02, KAIST03, Riverside01, Riverside02 and Riverside03. In terms of RE, both our odometry and SLAM systems achieve comparable or better performance on all sequences. Our odometry method reduces both translation and rotation errors significantly compared to the baseline. Our SLAM system, to a great extent, outperforms both baseline SLAM and SuMa on ATE. More importantly, only our SLAM works reliably on all nine sequences, which cover diverse urban environments. Specifically, the baseline SLAM detects wrong loop closures on sequences KAIST03, Riverside02 and Riverside03 and fails to detect a loop in DCC03, which causes its large ATEs on these sequences. SuMa, on the other hand, fails to finish sequences Riverside01, 02 and 03, likely due to the less distinctive structures along the rather open and long road shown in Figure 20; it can be very challenging to register LiDAR scans accurately in this kind of environment.
The estimated trajectories on sequences DCC01, DCC02, DCC03, KAIST01, KAIST02 and KAIST03 are shown in Figure 21. These qualitative results support the same observations as the RE and ATE. For clarity, Figure 22 presents the trajectories of the SLAM algorithms on Riverside01 in separate figures.

Experiments on the RADIATE dataset
To further verify the superiority of radar over LiDAR and cameras in adverse weather and visually degraded environments, we perform a qualitative evaluation by comparing the estimated trajectories with those of a high-precision Inertial Navigation System (an inertial system fused with GPS) using our RADIATE dataset. Since ORB-SLAM2 fails to produce meaningful results due to the visual degradation caused by water drops, blur in low-light conditions and occlusion from snow (see Figure 13 for example images), its results are not reported in this section. 6.6.1. Experiments in adverse weather. Estimated trajectories of SuMa, the baseline and our odometry for Fog 1 and Fog 2 are shown in Figures 23(a) and 23(b), respectively. Our SLAM drifts less than SuMa and the baseline radar SLAM, although all suffer from drift without loop closure. SuMa also loses tracking on sequence Fog 2, likely due to the impact of fog on LiDAR sensing.
The impact of snowflakes on LiDAR reflection is more obvious. Figure 24 shows LiDAR point clouds of the same two places in snowy and normal conditions. Depending on the snow density, we can see two types of LiDAR degeneration in snow: both the number of correct LiDAR reflections and the point intensity drop dramatically in snow at place 1, while there are many noisy detections around the origin at place 2. Both cases can be challenging for LiDAR based odometry/SLAM methods. This matches the results of the Snow sequence in Figure 25. Specifically, when the snow was initially light, SuMa operated well. However, as the snow gradually became heavier, the LiDAR data degraded and SuMa eventually lost track. Three examples of LiDAR scans at the point where SuMa fails are shown in Figure 25.
The very limited surrounding structures sensed by LiDAR make it extremely challenging for LiDAR odometry/SLAM methods like SuMa. In contrast, our radar SLAM method is still able to operate accurately in heavy snow, estimating a more accurate trajectory than the baseline SLAM.
6.6.2. Experiments on the same route in different weather conditions. To compare the algorithms' performance on the same route under different weather conditions, we also provide results in normal weather, rain and snow. The estimated trajectories of SuMa, baseline SLAM and our SLAM in normal weather are shown in Figure 26(a), while those for the Rain sequence are shown in Figure 26(b). In the Rain sequence, there is moderate rain; LiDAR based SuMa is slightly affected and, as can be seen at the beginning of the sequence, estimates a shorter length. Our radar SLAM also performs better than the baseline SLAM. In the Snow 2 sequence, there is moderate snow, and the results are shown in Figure 26(c). The Snow 2 sequence was recorded while moving quickly; therefore, without motion compensation, the baseline SLAM drifts heavily and cannot close the loop, while our SLAM consistently performs well. Hence, the results in Figure 26 once again confirm that our proposed SLAM system is robust in all weather conditions. 6.6.3. Experiments at night. The estimated trajectories of SuMa, baseline SLAM and our SLAM on the Night sequence are shown in Figure 23(c). LiDAR based SuMa is almost unaffected by the dark, although it does not detect the loops. Both the baseline and our SLAM perform well on the Night sequence, producing more accurate trajectories after detecting loop closures.

Average completion percentage
We calculate the average completion percentage of each competing algorithm on each dataset to evaluate the robustness of each sensor modality. Denoting the number of frames a method completes before losing track as K_completed and the total number of frames as K_total, the metric is computed as K_completed / K_total × 100%. The MulRan dataset does not include camera data, so ORB-SLAM2 is shown as N/A. In the RADIATE dataset, the camera is either blocked by snow or blurred at night, so ORB-SLAM2 fails to initialize and is also shown as N/A. Table 6 shows that only the radar-based methods reliably complete all sequences; neither the vision based nor the LiDAR based method manages to finish on all three datasets.

Parameters used
The same set of parameters provided in Table 7 is employed in all the experiments, covering different cities, radar resolutions and ranges, weather conditions, road scenarios, etc. This selected set of parameters is tuned using the 16-13-09-37 sequence in the Oxford Radar RobotCar Dataset.

Runtime
The system is implemented in C++ without a GPU. The computation time of the tracking thread is shown in Figure 27: our proposed system runs at about 8 Hz, twice the 4 Hz radar frame rate, on a laptop with an Intel i7 2.60 GHz CPU and 16 GB RAM. Loop closure and pose graph optimization are performed in an independent thread and do not affect real-time performance.

Conclusion
In this paper, we have presented an FMCW radar-based SLAM system that includes pose tracking, loop closure and pose graph optimization. To address the motion distortion problem in radar sensing, we formulate pose tracking as an optimization problem that explicitly compensates for the motion without the aid of other sensors. A robust loop closure detection scheme is specifically designed for the FMCW radar. The proposed system is agnostic to radar resolution, radar range, environment and weather conditions: the same set of system parameters is used for the evaluation on three different datasets covering different cities and weather conditions. Extensive experiments show that the proposed FMCW radar SLAM algorithm achieves localization accuracy in normal weather comparable to state-of-the-art LiDAR and vision based SLAM algorithms. More importantly, it is the only one that is resilient to adverse weather conditions, for example, snow and fog, demonstrating the superiority and promising potential of using FMCW radar as the primary sensor for long-term mobile robot localization and navigation tasks. However, our current radar SLAM system depends on an expensive and cumbersome radar sensor; we aim to extend it to low-cost, lightweight radar sensors in future work. We also seek to use the map built by our SLAM system for long-term localization across all weather conditions.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article:

Supplementary Material
Supplementary material for this article is available online.