Decision-Theoretical Navigation of Service Robots Using POMDPs with Human-Robot Co-Occurrence Prediction

To improve the natural human-avoidance skills of service robots, a human-motion-predictive navigation method, PN-POMDP, is proposed. A human-robot motion co-occurrence estimation algorithm is introduced that incorporates both long-term and short-term human motion prediction. To improve the reliability of probabilistic and predictive navigation, a POMDP model is used to generate navigation control policies through theoretically optimal decisions. A layered motion control structure combines global path planning with reactive avoidance, and multiple comity policies are integrated into a decision-making module that generates efficient and human-compliant navigational behaviours for robots. Experimental results illustrate the effectiveness and reliability of the predictive navigation method.


Introduction
As service robots are designed to perform interactive tasks in domestic and office environments, they must navigate reliably around populated rooms. When robots and people encounter each other, human-aware motion planners [1] help robots treat people as social entities and aim to endow robots with safe and human-friendly navigational behaviours [2][3].
Predicting the motion of moving people is an effective means of achieving compliant robot navigation in dynamic environments [4]. Many researchers [5][6][7][8] have developed efficient replanning algorithms that cope with environmental dynamics and satisfy real-time requirements by feeding updated information into a grid map and optimizing the robot's path to minimize the expected time to its destination. Although reactive motion planners [9][10][11][12] can rapidly query the next appropriate action, they are prone to getting robots blocked in complex environments because of their greedy nature and the uncertainty of human motion. According to research on indoor human motion modelling and understanding, a person's daily movements in a specific room environment exhibit certain long-term patterns. Nevertheless, uncertainties are pervasive in the velocity and heading direction of people's movement. Several studies have exploited the spatial-temporal nature of human motion using a chain of Gaussian distributions [13], clustering trajectories with K-means [14], and learning human motion patterns from tracking data with an EM algorithm [6]. However, most past research ignores the combination of human motion uncertainty prediction with motion pattern prediction.
Another key factor in predictive navigation is the handling of inherent uncertainties [15][16], which typically requires reacting intelligently to unknown moving people. Due to the non-linear nature of human and robot motion, as well as sensor noise, the popular robot localization and people-tracking algorithms based on Bayes filtering can only estimate position distributions. Probabilistic human motion prediction algorithms are likewise liable to produce larger errors when making motion predictions.
Many researchers have pointed out that probabilistic representation and reasoning are appropriate and highly effective for navigating in noisy real-world environments. For probabilistic decision-making, Partially Observable Markov Decision Processes (POMDPs) [17][18] have been widely used in robot navigation and human-robot interaction. Robots such as Flo [19] and Pearl [20] use POMDPs at all levels of decision-making, not only in low-level navigation routines. But since finding optimal control strategies for POMDPs is computationally intractable due to the continuous, high-dimensional belief space, POMDPs have usually been applied to topological navigation. For example, Foka [16] proposes a method that combines the prediction of a moving obstacle's destination with its one-step-ahead positions. However, that method relies on a complex hierarchical decomposition of the environment and has only been implemented and tested successfully in simulation.
In this paper, a novel approach called POMDPs for Predictive Navigation (PN-POMDP) is proposed. The idea of predictive navigation is largely inspired by Sisbot's [1] human-aware planner, which focuses on providing human-friendly robot behaviours that imitate human motion habits. However, the human-aware planner does not take uncertainties into account when robots work in unstructured environments. This paper addresses the robustness and reliability requirements of a navigation system using probabilistic reasoning (POMDPs). We compute the uncertainties of human motion in two parts: the ambiguity in path selection and the motion uncertainty along each path. The human-robot co-occurrence probability is then estimated by analysing two situations: conflict and obstruction. Our major contribution is the PN-POMDP framework, which coordinates the global path planner, the motion reactor and the speed controller in the context of probabilistic decision-making under multiple uncertainties. The control framework combines the objectives of guiding the robot to its goal and reducing the probability of human-robot conflict. More importantly, by accounting for high perceptual aliasing and other uncertainty factors, it combines probabilistic robot localization, people-tracking and human motion prediction in a natural probabilistic decision-making framework to generate control policies that yield efficient and polite navigation behaviours. This paper is organized as follows. After an overview of the navigation system framework in Section 2, Section 3 describes the human motion prediction method and Section 4 introduces human-robot co-occurrence estimation in the spatial-temporal aspect. Section 5 describes the decision-making mechanism of predictive navigation and the POMDP. Finally, experimental results are reported in Section 6, followed by a discussion and conclusion that summarizes the paper.

Uncertainties in predictive navigation
Firstly, the trajectories of human motion are uncertain; velocity and direction usually vary within a range even when people are engaged in specific motion patterns. Secondly, localization errors are pervasive. A Simultaneous robot Localization And People-tracking (SLAP) system using global cameras and an onboard laser range finder was developed in our previous work [21]. It jointly estimates the robot's pose r_t and each person's ground-plane position (x_t, y_t, θ_t) in the global coordinate frame using two sets of particles, as shown in Figure 1. But in cluttered environments with table and chair legs, localization errors tend to deteriorate the human motion prediction. Thirdly, the control uncertainty [14] caused by wheel slip, time delays and other unexpected factors is commonly reported in the robot navigation domain. Finding optimal policies for a POMDP is computationally intractable because the belief space is continuous and high-dimensional. The solution adopted in this work is a hybrid control structure that combines reactive motion control with probabilistic strategy selection to generate optimal navigational behaviours, as shown in Figure 2. In the system, sensory data from the laser, global cameras and other sensors are processed by the SLAP module and fed to the Perception module. Human motion patterns learned by the Modelling module are also input to the Perception module, in which people's future motion tendencies are predicted in both long-term and short-term aspects. The human and robot motion states are then abstracted into three types of observation: People's Action Observation (PAO), People-robot Relation Observation (PRO) and Robot State Observation (RSO). These abstracted observations are input to the Control module.
In the Control module, the POMDP-based decision-making sub-module generates a suitable predictive navigation (PN) policy that minimizes the risk of conflict with humans. We have designed four types of action: detour, slow-down, speed-up and halt, which are explained in Section 5. To ensure goal-directed and predictive navigation performance, the motion controller is built as a two-layered architecture augmented by policies generated from the POMDP controller. The wavefront-based global path planner calculates the optimal path and the reference points along it from the mapped obstacles. The Nearness Diagram [12] based local reactive obstacle-avoidance controller computes the actual translational velocity v and rotational velocity ω from the reference points and real-time sensory data.
A more detailed illustration of the POMDP controller structure is depicted in the right part of Figure 2. The POMDP controller contains a state estimator (SE) and a policy generator. The state estimator computes the probability distribution over states, i.e., the belief b_t, from the latest observation o, the action a and the previous belief b_{t-1}. The policy generator then maps the belief onto an optimal behaviour of the robot, i.e., a = π(b_t).
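The state estimator described above is the standard Bayesian belief update for a discrete POMDP. A minimal sketch, assuming tabular transition and observation models (the matrices below are made-up illustrative numbers, not the paper's learned models):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One step of the state estimator (SE):
    b'(s') ∝ O(s', a, o) * sum_s T(s, a, s') * b(s).
    T[a] is an |S|x|S| transition matrix (rows: s, cols: s');
    O[a] is an |S|x|Ω| observation matrix (rows: s', cols: o).
    """
    b_pred = b @ T[a]            # predict: sum_s T(s, a, s') b(s)
    b_new = O[a][:, o] * b_pred  # correct with the observation likelihood
    return b_new / b_new.sum()   # normalize

# Tiny 2-state example with hypothetical probabilities
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]])}
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=0, T=T, O=O)
```

After observing o = 0, which is more likely in state 0, the belief mass shifts towards state 0.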

Long-term modelling of motion pattern
Based on the collection of tracked trajectories, a set of human motion patterns {Ψ_1, ..., Ψ_M} is clustered hierarchically using a fuzzy K-means algorithm over the spatial and temporal information. The spatial probability of the person being located at h_t, given step k of motion pattern Ψ_m, is computed from a Gaussian distribution and denoted p(h_t | Ψ_m, k). The pattern likelihood evaluates the probability that the person covering point h_t at time t is engaged in pattern Ψ_m, given the sequence of observations z_{1:t}; by Bayes' rule it is proportional to the observation likelihood of z_{1:t} under Ψ_m multiplied by two prior probability distributions, with η as a normalizer.
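The Gaussian spatial probability p(h_t | Ψ_m, k) can be sketched as follows. This is an illustrative isotropic form; the paper's exact parameterization (e.g., a full covariance per pattern step) may differ:

```python
import numpy as np

def spatial_prob(h, mu_k, sigma_k):
    """p(h | Ψ_m, k): Gaussian likelihood that a person at ground-plane
    position h lies on step k of motion pattern Ψ_m. mu_k is the learned
    mean position of step k and sigma_k its (assumed isotropic) std.
    """
    d2 = np.sum((np.asarray(h, float) - np.asarray(mu_k, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma_k ** 2)) / (2.0 * np.pi * sigma_k ** 2)

# A person close to a step's mean is far more likely to be on that step
p_near = spatial_prob([1.0, 1.0], [1.1, 0.9], sigma_k=0.5)
p_far = spatial_prob([1.0, 1.0], [3.0, 3.0], sigma_k=0.5)
```

Evaluating this likelihood for every step k of every pattern Ψ_m gives the long-term term that the pattern posterior weighs against the observation history.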
Examples of learned human motion patterns are shown in Figure 3, which indicates that people typically move between places with important objects to manipulate: a fridge, a printer, a washing machine, etc.

Short-term motion prediction
To account for the short-term uncertainty of movement along the path of pattern Ψ_m, the variations in the person's velocity and heading orientation are modelled.
We make the following assumptions about a person's movement [7]: ΔT is the time step over which a person keeps a constant velocity and heading direction; the possible ranges of motion velocity and orientation are represented as bounded intervals; and the person changes velocity and orientation only at every time step ΔT. In this sense, the velocity (orientation) within a time step is constant, and is randomly and independently selected within the corresponding range. Accordingly, the sequence of velocities (orientations) over time is a list of independently distributed random variables. This assumption describes a common indoor motion style in which people move smoothly between two places. Firstly, the orientation variance is modelled by a fan-shaped area called the field of view, as shown in Figure 4. The field of view defines a coordinate system whose origin is the goal of the movement in the next time step. Eq. 4 indicates that the larger the deviation angle θ, the less likely it is that the person will head in that direction.
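A minimal sketch of this field-of-view orientation model. Eq. 4 is not legible in this copy, so the exponential falloff and the concentration parameter kappa are assumptions that only preserve the stated monotonicity (larger deviation angle, lower likelihood):

```python
import math

def orientation_prob(theta, kappa=2.0):
    """Likelihood that the person heads at deviation angle theta
    (radians) from the direction to the next sub-goal, in the
    fan-shaped field of view. kappa is a hypothetical concentration
    parameter; larger kappa means a narrower fan.
    """
    return math.exp(-kappa * abs(theta))
```

Heading straight at the sub-goal (theta = 0) gets the maximal score of 1, and likelihood decays symmetrically to either side.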
Secondly, the velocity variance is modelled by a distribution over the distance travelled from the current position h_0. Moreover, according to the assumption that each v_i (i = 0, ..., t) follows the same, independent uniform distribution, the variables v_0, ..., v_t have the same mean and variance, so the mean and variance of the total travelled distance grow linearly with t. To combine the long-term and short-term prediction, the heading orientation probability p_orien(h | h_0) is used as the exponent discount factor on the velocity probability, and the probability of the motion pattern the person is engaged in at the current position h_0 is normalized by a factor η over all M motion patterns. The result is the probability p(h_t; t) of the person reaching h_t at time t.
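The fusion over the M motion patterns can be sketched as follows. The exact exponent form is garbled in this copy, so `p_vel ** (1 / p_orien)` is an assumed reading of "exponent discount factor": an unlikely heading (small p_orien) strongly discounts the velocity term before normalization.

```python
import numpy as np

def combined_prediction(p_vel, p_orien, p_pattern, eps=1e-6):
    """Fuse short-term and long-term prediction for one target cell h_t.
    p_vel[m], p_orien[m]: short-term velocity and heading probabilities
    under pattern m; p_pattern[m]: long-term pattern likelihood at h_0.
    The result is normalized over all M patterns (the factor eta).
    All inputs are hypothetical quantities for illustration.
    """
    p_vel = np.asarray(p_vel, float)
    p_orien = np.asarray(p_orien, float)
    p_pattern = np.asarray(p_pattern, float)
    scores = p_pattern * p_vel ** (1.0 / np.maximum(p_orien, eps))
    return scores / scores.sum()  # eta normalizes over the M patterns

# Two equally likely patterns; the person is heading along pattern 0
p = combined_prediction([0.5, 0.5], [0.9, 0.1], [0.5, 0.5])
```

With equal velocity and pattern terms, the pattern consistent with the observed heading dominates the fused distribution.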

Human-robot Co-occurrence Estimation
The probability of human and robot co-occurrence is estimated in the spatial-temporal aspect according to the robot's travelling route obtained from the global path planner and the human motion prediction. In the PN-POMDP system, situations of human-robot co-occurrence are classified into two types.
The first type is human-robot motion conflict: the robot's planned path and the person's predicted path cross at a place P_c. If the person and the robot move along their respective paths at constant speeds v_h and v_r, the robot will arrive at place P_c at a certain time t_c, at which point its distance to the person will be less than L_safe. If the motion uncertainty of the person is taken into account, the probability that he/she arrives at place P_c at a future time t is computed according to Eq. 18. The second situation is human-robot motion obstruction (as shown in Figure 7). This represents a situation where the robot's path will block a human's intended trajectory, which happens to traverse a narrow passage; in this type of situation, a detected person heading for the passage signals that the robot should yield.
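The deterministic core of the conflict test can be sketched as follows. The paper replaces the point test with the probability of Eq. 18, so this is a simplified constant-speed check with hypothetical distance arguments:

```python
def conflict_time(d_robot, v_robot):
    """Time for the robot to reach the predicted crossing point P_c,
    given its remaining path distance d_robot and speed v_robot."""
    return d_robot / v_robot

def is_conflict(d_robot, v_robot, d_human, v_human, l_safe):
    """Flag a human-robot motion conflict: with both agents at constant
    speed, conflict occurs when the human's distance to P_c at the
    robot's arrival time t_c falls below the safety distance l_safe.
    """
    t_c = conflict_time(d_robot, v_robot)
    human_dist_at_tc = abs(d_human - v_human * t_c)
    return human_dist_at_tc < l_safe

# Both agents 4 m from P_c at 1 m/s: they meet there -> conflict
clash = is_conflict(4.0, 1.0, 4.0, 1.0, l_safe=0.5)
# Human 10 m away: still 6 m from P_c when the robot crosses -> safe
clear = is_conflict(4.0, 1.0, 10.0, 1.0, l_safe=0.5)
```

Integrating this check against the human position distribution p(h_t; t) instead of a single point yields the conflict probability used by the decision-maker.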

The Elements: States, Actions and Observations
To automatically compile the POMDP model (S, A, T, R, Ω, O), it is necessary to define the action and observation uncertainties. Actions (A) are human-compliant avoidance behaviours that the robot can execute to give way to humans in a polite manner:


- Normal path following a_n;
- Accelerating along path a_u;
- Decelerating along path a_s;
- Dynamic replanning for detour a_r.
The first three actions indicate that the robot follows the planned path with only velocity changes. These actions are usually more efficient for avoiding conflict with, or obstruction of, humans. The fourth action indicates that the robot replans a new path according to the updated environmental map, obtained by incorporating the probability density function (PDF) p(h_t; t) into the occupancy grid map.
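The map update behind the detour action a_r can be sketched as follows. The fusion rule (taking the maximum) and the threshold value are illustrative assumptions; the paper does not state the exact operator:

```python
import numpy as np

def augment_grid(occ_grid, human_pdf, threshold=0.3):
    """For the detour action a_r: fuse the predicted human position PDF
    p(h_t; t) with the static occupancy grid so the wavefront planner
    treats likely human locations as obstacles.
    occ_grid, human_pdf: arrays of per-cell probabilities in [0, 1].
    Returns a boolean obstacle mask for replanning.
    """
    fused = np.maximum(occ_grid, human_pdf)  # assumed fusion rule
    return fused >= threshold

# Static wall at (0,1); human predicted near (0,0)
occ = np.array([[0.0, 1.0], [0.0, 0.0]])
pdf = np.array([[0.5, 0.0], [0.0, 0.1]])
mask = augment_grid(occ, pdf)
```

Cells blocked by either the static map or a likely human position become obstacles, so the replanned path detours around the predicted trajectory.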
The reward function (R) determines the immediate utility of executing action a in state s. In our system, the reward matrix is manually specified according to the criterion that behaviours ensuring more safety and politeness receive higher rewards. Nevertheless, the settable parameters can be tuned through a user-supervised learning procedure when a robot is installed in a new environment and performs a daily room-exploration task, as suggested by Lopez [22].
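A manually specified reward matrix of this kind might look like the sketch below. The two abstracted states and all numeric values are hypothetical placeholders that only encode the stated criterion (safer, politer behaviour scores higher); the paper's actual matrix is not given:

```python
import numpy as np

# Hypothetical reward matrix R(s, a) over the four PN actions
ACTIONS = ["a_n", "a_u", "a_s", "a_r"]  # follow, speed-up, slow-down, detour
R = np.array([
    [ 5.0,   2.0, -1.0, -3.0],  # state: path clear -> just follow the path
    [-8.0, -10.0,  4.0,  6.0],  # state: conflict predicted -> yield/detour
])

def greedy_action(state_idx):
    """Pick the action with the maximal immediate reward. Illustration
    only: the POMDP policy maximizes expected long-term reward over
    beliefs, not the one-step reward of a single state."""
    return ACTIONS[int(np.argmax(R[state_idx]))]
```

With these numbers, a clear path keeps the robot on normal path following, while a predicted conflict makes the detour most rewarding.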

POMDP Compilation
The transition model T(s, a, s') specifies the conditional probability of moving from state s to s' by executing action a. O(s', a, o) is the observation model that gives the probability of obtaining observation o (see Figure 8); the EM algorithm is employed to learn it from collected data until the output parameters converge. The Randomized Point-based Value Iteration algorithm [18] is utilized to solve the POMDP model defined above. In our system, 32 iterations and 63.578 seconds were required for offline model compilation. Figure 9 shows the errors during the iterations: the error between two successive iterations is plotted on the y-axis, and its rapid decrease indicates the convergence rate.

The proposed approach was validated in a real office environment of size 12 m × 7 m. An ActivMedia Peoplebot was used in the experiments. We assumed that participants walked at a smooth speed and intended to follow certain motion patterns. The sensory system for robot localization and people-tracking consists of five stationary CCD cameras mounted above head level on the sides of the room and the robot's onboard laser range finder. The environmental grid map was built beforehand by a SLAM algorithm with a resolution of 0.1 m. Based on the collection of tracking trajectories, typical indoor human motion patterns were learned, as presented in our previous work [23][24].
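The point-based backup at the core of randomized point-based value iteration can be sketched as below. This is a generic Perseus-style backup over a fixed belief set, not the authors' implementation; model matrices follow the layout used earlier (T[a]: |S|x|S|, O[a]: |S|x|Ω|, R: |S|x|A|):

```python
import numpy as np

def pbvi_backup(beliefs, alphas, T, O, R, gamma=0.95):
    """One point-based value-iteration backup: for each belief point,
    construct the best alpha-vector reachable from the current set.
    alpha_{a,o}(s) = sum_{s'} T(s,a,s') O(s',a,o) alpha(s').
    """
    n_actions, n_obs = len(T), O[0].shape[1]
    new_alphas = []
    for b in beliefs:
        best_val, best_vec = -np.inf, None
        for a in range(n_actions):
            vec = R[:, a].astype(float).copy()
            for o in range(n_obs):
                # best back-projected alpha-vector for this (a, o) branch
                g = [gamma * T[a] @ (O[a][:, o] * al) for al in alphas]
                vec += max(g, key=lambda v: float(b @ v))
            val = float(b @ vec)
            if val > best_val:
                best_val, best_vec = val, vec
        new_alphas.append(best_vec)
    return new_alphas
```

Repeating this backup until the change between successive alpha-vector sets falls below a tolerance corresponds to the iteration-error curve the paper plots in Figure 9.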

Predictive Navigation
During online predictive navigation, the robot collected laser scan data with a period of 200 ms and updated the local grid map with a period of 80 ms, according to the positions of detected human legs. In the PN-POMDP algorithm, th1, th2, th3 and th4 are threshold values that can be set and adjusted in the experiment.
In the first three test scenarios, the robot and the human were initially located in the same room area within a short distance of each other (less than 5 m). In the first case, where the robot was moving towards the human, the system predicted human motion 5 seconds ahead. Figure 10(a) shows that the robot began to avoid a possible human-robot conflict using the detour policy while it was still about 3 m away from the human. In the second scenario (Figure 10(b)), when the robot predicted that its path would intersect the predicted human trajectory from one side, it selected the slow-down action a_s. This policy was efficient because frequent replanning was avoided, and the robot resumed its normal speed along the path once the human had passed the predicted intersection point. To test the reliability of the algorithm, in the third scenario (Figure 10(c)) the robot followed a person through a narrow corridor. In this case, the robot predicted that it would not interfere with the human's motion as long as it did not overtake him/her, so it followed its planned path at regular speed. The fourth test scenario involved predictive navigation across rooms: the robot and the human were initially positioned in two different rooms, and global cameras were used for people-tracking. The robot started at place A, as in Figure 11(a), and planned to navigate through a narrow passage (passage II) to place B. Meanwhile, a person intended to walk through the same passage in the opposite direction.
Before the robot approached the passage entrance, the probability of human-robot motion conflict at the entrance (place C in Figure 11(b)) was estimated. Specifically, the likelihood that the person's current motion belonged to the motion pattern ending at place C was estimated to be as high as 97.5%. However, since the predicted occupancy probability of the grid cells within the passage was volatile, the traditional replanning method caused the robot to switch between two candidate routes (plan 1, a detour via passage I, and plan 2, continuing via passage II). This made the robot move back and forth unnecessarily at the passage entrance before finally reaching the goal, taking as long as 633 time periods. In comparison, the PN-POMDP method, supporting multiple comity policies, generated highly efficient and human-friendly behaviour: when a high human-robot conflict probability within passage II was predicted, the robot drove to a free space outside the passage entrance and waited.
After the person had passed through the passage, the robot crossed the doorway and continued on its route. This behaviour improved navigation efficiency (only 342 time periods to reach the goal) by avoiding the repeated zigzagging and wandering before entering the passage. The pose and translational velocity of the robot during the navigation test are shown in Figure 12. As shown in Figure 12(a), the replanning method caused the robot to switch between the two candidate paths during intervals (2) to (5). In contrast, Figure 12(b) shows that the PN-POMDP method ensures smooth and efficient robot navigation. More importantly, the polite navigation behaviour is comprehensible to humans and shows full respect for them. Figure 13 shows the experimental result in a crowded environment with three participants walking around the robot. Since the PN-POMDP method supports multiple predictive navigation policies, the robot frequently adjusted its policy according to the predicted human-robot co-occurrence situations. In fact, after raising the reward of the deceleration action a_s, the robot tended to slow down and let humans pass first. This indicates that the PN-POMDP method can feasibly be applied to service robots working in crowded environments such as exhibition halls and museums.

Trial study
A statistical trial study was also conducted to verify the success rate of the PN-POMDP method. We invited 12 participants (eight male and four female), ranging in age from 21 to 34; 33% of them were from non-technological fields, while 67% worked in technology-related areas. The trials covered the different types of situation described above. The following situations were treated as failures: (i) the robot blocked the human's intended route of movement (subjective scoring); (ii) the robot failed to reach the goal because it became trapped or localization failed; (iii) the robot reached the goal but took more than four times as long as it would have without humans moving around. Figure 14 shows that the PN-POMDP method achieved a higher success rate than the traditional real-time replanning method.

Conclusion
In this paper, we have presented a predictive navigation method for service robots in the POMDP framework. By learning human motion patterns and combining long-term and short-term human motion prediction, space-time estimation of human-robot co-occurrence is achieved. In order to execute tasks in typical partially observable environments, POMDP-based probabilistic decision-making is incorporated to generate a theoretically optimal policy that allows the robot to behave efficiently and politely, minimizing the risk of conflict with human motion. The feasibility of the proposed methodology is validated by navigation experiments as well as user trials, in which the robot's navigational behaviour was interpreted by humans as safe, comprehensible and polite.
Although the system uses external cameras for human tracking, the proposed framework does not rely on any specific means of acquiring human motion. In situations where robots are not close to people, we suggest using global cameras to ensure seamless and reliable human tracking, which improves the performance of predictive navigation.