Human-Robot Interaction Learning Using Demonstration-Based Learning and Q-Learning in a Pervasive Sensing Environment

Given that robots can provide services at any location after moving toward humans, a pervasive sensing environment can provide diverse services through robots regardless of where humans are located. To provide various services, robots need to learn accurate motor primitives such as walking and grabbing objects. However, learning motor primitives in a pervasive sensing environment is very time consuming. Several previous studies have considered robots that learn motor primitives and interact with humans in virtual environments. Because a robot in these studies learns motor primitives from observations, motor primitives that the robot cannot observe cannot be defined. In this paper, we develop a novel interaction learning approach based on a virtual environment. The motor primitives are defined by manipulating a robot directly using demonstration-based learning. In addition, the robot applies Q-learning to learn interactions with humans. In our experiments, the proposed method generated motor primitives intuitively, and in one experiment the amount of movement required by a virtual human was reduced by about 25% after the generated motor primitives were applied.


Introduction
In pervasive sensing environments, robots can provide various services proactively. Irrespective of where humans are, robots can provide services after moving toward them based on information about their daily lives [1]. However, the following problems may occur when a robot learns interactions with a human. First, a robot cannot learn interactions with humans rapidly, so learning time becomes a problem during interaction learning. Therefore, interactions with humans need to be learned without human participation. Second, because a robot's perception is incomplete, an interaction between a robot and a human could injure the human, who would therefore need protective equipment.
Previous studies have considered interaction learning with a human in virtual environments, which can solve the problems described above [2][3][4]. A virtual robot can generate its motor primitives by observing a virtual human in a virtual environment and applying demonstration-based learning. However, unobservable motor primitives cannot be generated this way. In addition, because a robot and a human differ in appearance, the motor primitives of a robot may differ from human movements, and the robot might not be able to perform the motor primitives generated for it. Thus, the methods used to generate motor primitives need to be improved, and further research is required to determine how to teach motor primitives to a robot while it learns interactions with humans in a virtual environment. In this paper, we propose an interaction learning method based on a virtual pervasive sensing environment that uses demonstration-based learning to learn motor primitives and Q-learning to execute them. Because the motor primitives are defined through manipulation based on demonstration learning, users who are not programmers can generate motor primitives intuitively. Applying Q-learning allows newly generated motor primitives to be performed without modifying the algorithm after they are produced.
The remainder of this paper is organized as follows. Section 2 introduces demonstration-based learning approaches and virtual environment-based learning. Section 3 proposes an interaction learning method for a virtual pervasive sensing environment. Section 4 presents the results of interaction learning experiments in virtual pervasive sensing environments. Finally, we provide our conclusions in Section 5.

Related Work
Various types of learning algorithms are required to allow robots to interact with humans. In this section, we summarize related research into learning motor primitives and learning interactions with humans in virtual environments.
The motor primitives learned by robots are very important for achieving their goals. Appropriate motor primitives can reduce the repulsion that humans feel toward robots, so various studies aim to produce robot motor primitives that look more natural, like those of humans. For example, one study defined natural motor primitives for following the shortest path [5,6], using a genetic algorithm to generate these movements: after mutation, motor primitives that failed to follow the shortest path were eliminated and new motor primitives were generated. Another approach is demonstration-based learning [7][8][9]. Some demonstration-based learning algorithms learn each motor primitive separately through repetition and then analyze the learned motor primitives [7,10]. Another approach learns motor primitives by dividing a series of movements [8], where each motor primitive is defined as a part of the series. Furthermore, an approach has been proposed that generates motor primitives as a hierarchical tree [9,11]. Within the same hierarchical tree, a robot initially executes the same motor primitive but executes different motor primitives in different states. The generated motor primitives are usually executed by planning algorithms [12]. However, applying planning algorithms causes problems. Because a planning algorithm is defined based on the generated motor primitives, it must also be changed whenever the motor primitives change. An advantage of demonstration-based learning is that humans can define motor primitives without any programming, but this advantage is lost when the planning algorithm must be reprogrammed after every change. Therefore, algorithms that are not affected by changes to the motor primitives are required.
One method learns interactions with humans by utilizing motor primitives after generating them with demonstration-based learning [13]. That study defined a virtual human and a virtual robot: the former is a virtual agent that behaves in a virtual environment in the same way as a real human, while the latter behaves like a real robot. A virtual robot therefore interacts with a virtual human to learn an interaction with a real human. When the virtual human executes a motor primitive, the virtual robot executes the same motor primitive at the same time. However, this virtual interaction learning has problems. For example, because motor primitives are generated by observing the virtual human, the virtual robot cannot obtain a motor primitive that the virtual human never executes. Therefore, another approach to generating motor primitives is required.
Thus, we propose a new approach for defining the motor primitives of a virtual robot. We also apply Q-learning to execute the motor primitives, because Q-learning does not need to be changed when the motor primitives are modified.

Concept.
In a pervasive sensing environment, learning interactions with humans takes a long time, and the number of interactions available to robots is limited. Therefore, the number of real interactions should be reduced while the amount of learning is increased, so that motor primitives can be executed with high quality. In our approach, interactions are learned in a virtual pervasive sensing environment, so no interactions are required in the real pervasive sensing environment, as shown in Figure 1.
We define two types of virtual agents for learning in a virtual pervasive sensing environment: a virtual human and a virtual robot. The virtual human acts like a human, while the virtual robot executes motor primitives to collaborate with the virtual human. The virtual robot learns interactions with real humans by interacting with virtual humans. The learning result is then embedded in the real robot, which executes motor primitives based on the results of virtual learning to interact with a real human. No interactions with real humans are required during learning. Whenever a human is involved in the learning process, learning time becomes a bottleneck that is very hard to reduce. In contrast, learning time can be reduced further by increasing the speed of interactions between a virtual human and a virtual robot, because they do not need to execute motor primitives at the same speed as a real human and a real robot.
In our approach, interaction learning includes human modeling, motor primitive learning, collaboration learning, deployment, and collaboration stages. In this paper, we only propose the processes used during the motor primitive learning stage and the collaboration learning stage, as shown in Table 1. During the human modeling stage, humans control a virtual human by executing predefined motor primitives so that it acts like a human, and the virtual human learns how to execute motor primitives by analyzing this control process. During the motor primitive learning stage, humans control the virtual robots directly to teach them how to move, and the virtual robots then generate their own motor primitives. Next, the virtual robot interacts with a virtual human by executing the learned motor primitives, and through this interaction it learns how to provide services to humans. The results of motor primitive generation and interaction learning are then applied in a real robot, which can interact with real humans.

Human-Robot Interaction Framework and Processes.
The roles of real humans are divided into two groups throughout the learning process: residents and operators. Operators teach real robots, while residents live in pervasive sensing environments. All of the virtual humans in the virtual pervasive sensing environment are virtual residents. We also define a robot server, which generates motor primitives and policies and transfers data between a real robot and a virtual robot. Our proposed framework is shown in Figure 2.
First, an operator controls a virtual human via a user interface. During the motor primitive learning stage, two modules are used: a motor measurer and a motor primitive generator. The motor measurer is deployed in the real robot; when the operator manipulates the real robot directly, the motor measurer measures the angles (in degrees) of the real robot's joints. The motor primitive generator is embedded in the robot server rather than in the real robot, which decouples the motor primitive generator from the robot platform. The generated motor primitives are deployed in both the virtual robot and the real robot.
During the collaboration learning stage, a policy generator and a motor primitive executor are utilized to learn the interactions between a resident and a real robot based on the interactions that occur between a virtual human and a virtual robot. The motor primitive executor executes the generated motor primitives and the policy generator then generates the results of the interaction. The interaction results are then deployed in the real robot. Finally, the real robot can provide various services by executing the motor primitives based on the interaction learning results.
In our approach, a robot executes multiple motor primitives.
$m_i$ is the $i$th motor primitive. A motor primitive is defined as a part of a series of movements, which is described by multiple joints of the robot. Therefore, $m_i$ comprises multiple joints. The $k$th joint of the $i$th motor primitive is defined by $j_{i,k} = \langle j_{i,k,1}, \ldots, j_{i,k,h}, \ldots \rangle$, where $j_{i,k,h}$ is the $h$th measured value of $j_{i,k}$. If $n$ is the number of joints, $m_i = \langle j_{i,1}, \ldots, j_{i,k}, \ldots, j_{i,n} \rangle$. Each joint moves irregularly, and $t_{i,k,h}$ denotes the time when $j_{i,k,h}$ is executed. Finally, the set $M$ is the motor primitive set. Figure 3 shows an example configuration of the motor primitive set.
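The motor primitive definition above can be mirrored in a small data structure. The following is a minimal Python sketch, not the authors' implementation: the class and field names (`JointTrajectory`, `MotorPrimitive`, `MotorPrimitiveSet`) and the use of degrees and seconds as units are assumptions. Each joint keeps its own list of measurement times because joints move irregularly.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JointTrajectory:
    """Measured values of one joint within a motor primitive (hypothetical class)."""
    name: str
    angles: List[float] = field(default_factory=list)  # h-th measured angle, degrees (assumed unit)
    times: List[float] = field(default_factory=list)   # time of the h-th measurement, seconds (assumed unit)

    def record(self, t: float, angle: float) -> None:
        # Joints move irregularly, so every measurement carries its own timestamp.
        self.times.append(t)
        self.angles.append(angle)


@dataclass
class MotorPrimitive:
    """A motor primitive: one trajectory per measured joint (hypothetical class)."""
    name: str
    joints: List[JointTrajectory]

    @property
    def num_joints(self) -> int:
        return len(self.joints)


# The motor primitive set, keyed by primitive name (container choice is an assumption).
MotorPrimitiveSet = Dict[str, MotorPrimitive]

if __name__ == "__main__":
    giving = MotorPrimitive("giving", [JointTrajectory(f"right_arm_{k}") for k in range(1, 6)])
    giving.joints[0].record(0.0, 10.0)
    giving.joints[0].record(0.5, 25.0)
    primitives: MotorPrimitiveSet = {giving.name: giving}
    print(primitives["giving"].num_joints)  # 5
```

Keeping times per joint, rather than one shared clock per primitive, matches the statement that each joint moves irregularly.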
To eliminate any differences between the motor primitives of a virtual robot and a real robot, the motor primitive generator generates the same motor primitives for both. To reduce the number of measured movements, any movement whose change is smaller than the difference calculated using (1) is eliminated. After similar movements are eliminated, the motor primitives are generated from the remaining measured movements.

Given that a pervasive sensing environment is usually complex, the policy generator in our approach uses Q-learning [14] to execute the generated motor primitives, because Q-learning does not require a model of the environment to be defined. In addition, the Q-learning algorithm does not need to be modified after new motor primitives are generated. The policy generator selects a motor primitive based on the current state and sends it to the motor primitive executor for execution. After each motor primitive is executed, the corresponding reward is calculated and transferred back to the policy generator, which updates the Q-values using

$$Q(s, m) \leftarrow Q(s, m) + \alpha \left[ r + \gamma \max_{m'} Q(s', m') - Q(s, m) \right],$$

where $m$ is an executed motor primitive, $s$ is a state, $r$ is the reward after executing $m$, $s'$ and $m'$ are the next state and the next motor primitive, respectively, $\alpha$ denotes the learning rate, and $\gamma$ is a discount factor.
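Equation (1), used by the motor primitive generator to eliminate similar movements, is not reproduced in this section, so the sketch below substitutes a simple stand-in criterion: a measurement is kept only if its angle differs from the last kept measurement by at least a fixed threshold. The function name `eliminate_similar` and the threshold are hypothetical, and always keeping the first and last samples (so that the primitive still starts and ends at the demonstrated poses) is also an assumption.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (time in seconds, joint angle in degrees); units assumed


def eliminate_similar(samples: List[Sample], threshold_deg: float) -> List[Sample]:
    """Drop measurements whose angle barely changes from the last kept one.

    Stand-in for the paper's equation (1): the real criterion may differ.
    """
    if len(samples) <= 2:
        return list(samples)
    kept = [samples[0]]                      # always keep the starting pose
    for t, angle in samples[1:-1]:
        if abs(angle - kept[-1][1]) >= threshold_deg:
            kept.append((t, angle))
    kept.append(samples[-1])                 # always keep the final pose
    return kept


if __name__ == "__main__":
    # One joint sampled every 0.5 s, matching the 500 ms interval used in the experiments.
    joint = [(0.0, 10.0), (0.5, 10.5), (1.0, 20.0), (1.5, 20.2), (2.0, 30.0)]
    print(eliminate_similar(joint, 5.0))  # [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0)]
```

Comparing against the last kept sample, rather than the immediately preceding one, prevents a slow drift from being discarded step by step.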
The motor primitive executor receives motor primitives from the motor primitive generator and executes the motor primitives according to the decisions made by the policy generator. After executing the motor primitives, the corresponding reward of the executed motor primitives is transferred to the policy generator.
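The loop between the policy generator and the motor primitive executor can be sketched as tabular Q-learning. This is a hedged illustration, not the authors' code: it uses the standard Q-learning target (reward plus the discounted maximum Q-value over the next motor primitives), epsilon-greedy selection, and hypothetical names (`PolicyGenerator`, `run_episode`, `toy_execute`). The toy environment, in which "walk" advances one step and reaching step 3 yields a reward of 1, stands in for the real executor; the hyperparameter values are also assumptions.

```python
import random
from collections import defaultdict
from typing import Callable, DefaultDict, Hashable, List, Tuple

# An executor runs a motor primitive in a state and reports (reward, next_state, done).
Executor = Callable[[Hashable, str], Tuple[float, Hashable, bool]]


class PolicyGenerator:
    """Tabular Q-learning over a set of motor primitives (hypothetical class)."""

    def __init__(self, primitives: List[str], alpha: float = 0.5, gamma: float = 0.9,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.primitives = list(primitives)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q: DefaultDict[Tuple[Hashable, str], float] = defaultdict(float)
        self.rng = random.Random(seed)

    def best(self, state: Hashable) -> str:
        # Greedy choice; ties go to the primitive listed first.
        return max(self.primitives, key=lambda m: self.q[(state, m)])

    def select(self, state: Hashable) -> str:
        # Epsilon-greedy exploration over motor primitives.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.primitives)
        return self.best(state)

    def update(self, s: Hashable, m: str, r: float, s_next: Hashable, done: bool) -> None:
        # Q(s,m) += alpha * (r + gamma * max_m' Q(s',m') - Q(s,m)); no bootstrap at episode end.
        future = 0.0 if done else max(self.q[(s_next, m2)] for m2 in self.primitives)
        self.q[(s, m)] += self.alpha * (r + self.gamma * future - self.q[(s, m)])


def run_episode(policy: PolicyGenerator, execute: Executor, start: Hashable,
                max_steps: int = 100) -> None:
    s = start
    for _ in range(max_steps):
        m = policy.select(s)              # policy generator picks a motor primitive
        r, s_next, done = execute(s, m)   # executor runs it and returns the reward
        policy.update(s, m, r, s_next, done)
        if done:
            break
        s = s_next


def toy_execute(s: int, m: str) -> Tuple[float, int, bool]:
    # Hypothetical stand-in environment: "walk" advances one step, "stand" stays put.
    s_next = s + 1 if m == "walk" else s
    done = s_next == 3
    return (1.0 if done else 0.0), s_next, done


if __name__ == "__main__":
    policy = PolicyGenerator(["walk", "stand"])
    for _ in range(300):
        run_episode(policy, toy_execute, start=0)
    print([policy.best(s) for s in range(3)])  # ['walk', 'walk', 'walk']
```

Because the learner only sees motor primitive names and rewards, adding a new primitive means extending the `primitives` list, not changing the update rule, which is the property the section relies on.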

Configurations of the Real and Virtual Pervasive Sensing Environments.
In our experiment, we used a Nao as the real robot. We also built a model house of a size suited to the Nao, as shown in Figure 4. The model house contained a kitchen, a living room, and a bedroom. The Nao was to learn interactions with a real human.
The objective of the Nao was to deliver the objects required by a real human. After recognizing an object, the Nao first moved toward it. Next, it grabbed the object, moved toward the real human, and gave the object to the human. The objects used in the experiments are shown in Table 2. There were two types of objects: static objects, which could not be moved, and movable objects, which the Nao and a human could grab, carry, and put down.
The state space must be defined in advance to use Q-learning. In this experiment, we represented the positions of the human and the robot as grid coordinates by taking a picture with an omnidirectional camera placed on the ceiling and dividing the picture into the grid shown in Figure 5. The size of each cell was set to the width of the Nao, which yielded 50 cells. We defined each state by the coordinates of the human, the robot, and the object located nearest to the human.
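The state definition above can be sketched as follows. The 10 × 5 grid layout, the pixel-to-cell conversion, and the Euclidean nearest-object rule are assumptions for illustration; the section states only that there were 50 cells, each as wide as the Nao, and that a state combines the cells of the human, the robot, and the object nearest to the human. All names are hypothetical.

```python
from math import hypot
from typing import Dict, Tuple

Cell = Tuple[int, int]
GRID_COLS, GRID_ROWS = 10, 5  # hypothetical layout that gives 50 cells
# Upper bound on the number of distinct (human, robot, nearest object) states.
MAX_STATES = (GRID_COLS * GRID_ROWS) ** 3


def to_cell(x_px: float, y_px: float, cell_px: float) -> Cell:
    """Map a ceiling-camera pixel position to a grid cell (cell_px = Nao width in pixels)."""
    col = min(int(x_px // cell_px), GRID_COLS - 1)
    row = min(int(y_px // cell_px), GRID_ROWS - 1)
    return (col, row)


def nearest_object(human: Cell, objects: Dict[str, Cell]) -> Cell:
    # Euclidean distance between cell coordinates (assumed metric).
    return min(objects.values(), key=lambda c: hypot(c[0] - human[0], c[1] - human[1]))


def make_state(human: Cell, robot: Cell, objects: Dict[str, Cell]) -> Tuple[Cell, Cell, Cell]:
    return (human, robot, nearest_object(human, objects))


if __name__ == "__main__":
    objects = {"cup": (1, 1), "newspaper": (9, 4)}
    print(make_state(to_cell(20, 20, 50), to_cell(260, 110, 50), objects))
```

With 50 cells, the state space has at most 50³ = 125,000 states, small enough for a tabular Q-function.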
To learn interactions between a real human and a real robot, the virtual pervasive sensing environment used in this experiment was modeled exactly on the real pervasive sensing environment, as shown in Figure 6. Therefore, the structure and size of the virtual pervasive sensing environment were the same as those of the real one, and objects were placed in the same way. We used two virtual agents: a virtual human and a virtual robot.

Configuration of the Motor Primitives.
A real operator controlled a virtual robot, while a virtual human and a robot server were also used, depending on the stage. The robot followed a different process during each stage, and the real operator also controlled the state of the real robot by touching the touch sensor on its head. The motor primitives of the robot were defined as follows. The real operator manipulated the robot directly so that the robot learned the motor primitives. There were two types of motor primitives. The first type was predefined by programming, as shown in Table 3. For example, because an initial motor primitive was required and a walking motor primitive was very hard to define by manipulation, the real robot executed two preprogrammed standing motor primitives and one preprogrammed walking motor primitive. The second type was defined by the manipulations performed by the operator. For the walking motor primitive, the algorithm determined a path from the current coordinates to specific target coordinates. We used the A* search algorithm because the grids of the virtual and real pervasive sensing environments were not complex and comprised only 50 cells. For example, if a real operator specified the position to which a virtual human needed to move, the virtual human moved to that position while avoiding objects and walls.

Figure 6: Virtual pervasive sensing environment used for interaction learning.

Table rows recovered from the extracted layout (incomplete):
Standing after grabbing: If a real robot has grabbed an object, it stands and waits to execute the next motor primitive.
10, Walking: A real robot follows a ball while remaining at a fixed distance from the ball.
Receiving: Receiving an object with the right hand.
Giving: Giving an object with the right hand.
Walking: Walking toward a specific object.
Sitting: Sitting on a chair or couch.
Laying: Lying down on a bed.

International Journal of Distributed Sensor Networks
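The grid path search used for walking can be sketched with a generic textbook A*. The details are assumptions not stated in the section: 4-connected moves of unit cost, a Manhattan-distance heuristic, and a set of blocked cells standing in for objects and walls. The function name `a_star` is hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def a_star(start: Cell, goal: Cell, blocked: Set[Cell], cols: int, rows: int) -> Optional[List[Cell]]:
    """Shortest 4-connected path from start to goal avoiding blocked cells, or None."""
    def h(c: Cell) -> int:
        # Manhattan distance is admissible and consistent for unit-cost 4-connected moves.
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    open_heap: List[Tuple[int, int, Cell]] = [(h(start), 0, start)]  # (f, g, cell)
    came_from: Dict[Cell, Cell] = {}
    g: Dict[Cell, int] = {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]
            while cur in came_from:          # walk parent links back to the start
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if cost > g[cur]:
            continue                         # stale heap entry
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < cols and 0 <= nxt[1] < rows) or nxt in blocked:
                continue
            new_cost = cost + 1
            if new_cost < g.get(nxt, float("inf")):
                g[nxt] = new_cost
                came_from[nxt] = cur
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None


if __name__ == "__main__":
    wall = {(2, 0), (2, 1), (2, 2), (2, 3)}  # hypothetical obstacle column
    print(a_star((0, 0), (4, 0), wall, cols=10, rows=5))
```

On a 50-cell grid the search visits at most 50 cells, which is consistent with the remark that the grids were simple enough for A*.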
While the real robot was learning the motor primitives, it measured its joints every 500 ms and transferred the joint values to the robot server. If the interval is set below 500 ms, the joints are not measured accurately and the real robot's movements are delayed.
We predefined the animations of the virtual human, as shown in Table 4. Because the objective of the Nao was to deliver objects to a virtual human, the animations of the virtual human also focused on transferring objects.

Motor Primitive Generation Experiment.
The first experiment aimed to generate motor primitives for the Nao. An operator defined motor primitives 2 to 7 by manipulating the arms and touching the touch sensors on the arms, as shown in Table 5. In this experiment, the operator controlled only the arms, because the legs moved only when the robot walked.
The real robot executed a series of motor primitives, and the end of each motor primitive had to connect naturally to the start of the next one. Thus, the standing motor primitive was executed after each motor primitive, and the next motor primitive started after the standing motor primitive ended. We therefore defined the sequence of motor primitives shown in Figure 7.
Some motor primitives could not be connected to the standing motor primitive because the robot was holding an object. Therefore, standing after grabbing was added. Standing after grabbing was performed after receiving or one-hand grabbing was executed, and was followed by one-hand placing or giving.
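The sequencing rule described in the two paragraphs above can be sketched as a small function that appends a connecting standing primitive after every motor primitive. The primitive names, and the assumption that the robot keeps holding the object (and therefore uses standing after grabbing) until it executes one-hand placing or giving, are illustrative and not taken from Figure 7; the function name `build_sequence` is hypothetical.

```python
from typing import List

# Hypothetical primitive names; the sets encode when an object is in the right hand.
PICK_UP = {"receiving", "one-hand grabbing"}
RELEASE = {"one-hand placing", "giving"}


def build_sequence(primitives: List[str]) -> List[str]:
    """Append a connecting standing primitive after each motor primitive."""
    sequence: List[str] = []
    holding = False
    for primitive in primitives:
        sequence.append(primitive)
        if primitive in PICK_UP:
            holding = True
        elif primitive in RELEASE:
            holding = False
        # While an object is held, the ordinary standing primitive cannot be used.
        sequence.append("standing after grabbing" if holding else "standing")
    return sequence


if __name__ == "__main__":
    print(build_sequence(["one-hand grabbing", "walking", "giving"]))
```

Tracking a single `holding` flag is the simplest state that decides between the two standing variants; a richer model could track which hand is occupied.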
Each motor primitive was generated from a separate manipulation performed by a real human. Figure 8 shows four of the generated motor primitives. Only five joints, all related to the right hand, were measured. Each generated motor primitive was then performed by the virtual robot.

Interaction Learning Experiment.
We specified a scenario for learning the interactions. First, we applied our approach to a scenario in which a human stood up, sat on a couch, and then picked up and read a newspaper, as shown in list (a).

Interaction Learning Results
(a) Scenario in which a virtual human lives alone: (i) the virtual human sleeps, (ii) the human wakes up in bed, (iii) the human walks to a couch, (iv) the human sits on the couch for a while, (v) the human stands up from the couch,

Conclusion
In this paper, we developed an approach to interaction learning based on a virtual pervasive sensing environment, in which operators taught motor primitives to a real robot by manipulating its arms directly. The learned motor primitives were used by a virtual robot and executed to learn interactions with a human. Because the operators defined the motor primitives through manipulation, various types of motor primitives could be defined intuitively, which overcame the problems of previous approaches. The virtual human, the virtual robot, and the Q-learning used in our proposed method are suited to single-agent learning, so the method should be improved by applying multi-agent Q-learning. A method is also required that allows a virtual robot to provide services to multiple virtual humans. Finally, we plan to develop an approach that facilitates applying the learned interaction results to a real robot.