Humanoid Upper Torso Complexity for Displaying Gestures

Body language is an important part of human-to-human communication; body language in humanoid robots is therefore important for successful communication and social interaction with humans. This work investigates the number of degrees of freedom (d.o.f) necessary to achieve realistic body language in robots. Using animation, three robots were simulated performing body language gestures: the complex model was given 25 d.o.f, the simplified model 18 d.o.f and the basic model 10 d.o.f. An online subjective survey was created using these animations to obtain people’s opinions on the realism of the gestures and to see whether they could recognise the emotions portrayed. It was concluded that the basic system was the least realistic, the complex system the most realistic, and the simplified system only slightly less realistic than the human. Modular robotic joints were then fabricated so that the gestures could be implemented experimentally. The experimental results demonstrate that, by reducing the required degrees of freedom, the gestures can be reproduced experimentally.


Introduction
Body language is the physical movement of body parts due to muscle activation that would not be required for normal function. Emotions are portrayed through both the trajectory of these movements and their duration. The common perception of body language is that it is purely subconscious and reveals underlying and, perhaps, hidden emotional states. However, often body language is displayed consciously to accentuate verbal communication or deliberately display a strong emotional response.
Gestures are well suited to being displayed by robotic systems because they are well bounded, with each action having a clear start and finish.
Automatic recognition of human body language from vision is a rapidly growing research area, but surprisingly little work has investigated the complexity a robot requires in order to display body language.
Several research institutions are developing humanoid robots. Asimo is a 4 ft tall, walking bipedal robot being developed by the American Honda Motor Co. [3]. It has 26 degrees of freedom, not including the 5 bending fingers on each hand. The SDR-4X from Sony is a small humanoid built for entertainment. It has 38 d.o.f in total and can recognise faces and speech. The body language of the SDR-4X has been considered; extra degrees of freedom were added in the head and wrist to improve the expression of the robot [4]. However, it cannot bend at the waist, which limits the emotions that can be expressed.
Waseda University in Tokyo is researching several humanoid robots: Robita, Wabian, Wendy and Wamoeba-2R. Wamoeba-2R is the only one of these capable of displaying body language in its arms; however, it is not able to move its shoulders or trunk, so it is likely that the 'emotional' experiences reported [5,6] result from facial features or speech synthesis.
The Massachusetts Institute of Technology has performed significant work in the area of socially interactive robots. Kismet is a robot built as just a head and neck. The main focus of the work on Kismet is to make it natural and expressive. The areas of research include facial expression, body posture and social cues. Head and eye orientation and facial expression are used as nonverbal signals to portray emotions [7,8]. MIT's 'Cog' has a head, torso and arms but no legs. It has 22 degrees of freedom, similar to a human. Body language has been implemented on Cog; however, it is again difficult to assess the realism of the body language when it is combined with facial expression.
Table 1 summarises the humanoid robots discussed.
No conclusive work has demonstrated the importance of robotic body language for ease of human communication and the robots under development have different degrees of freedom and movement ranges. A robot designed specifically for natural human communication must have the capability to display emotion; however each additional joint of a robotic system adds significant cost. It is unlikely that the full complexities of human joints are required to be duplicated by a robotic system in order to display basic emotional responses.
This study seeks to gain a greater understanding of the required complexity of a robotic system in order to display emotion through animation. In section 2, three robot kinematic configurations are presented; the first with complexity approaching that of a human, the second with reduced complexity and the third with a very basic configuration.
The implementation of these configurations as animations is discussed. Six different gestures with well-defined and understood meanings are then selected and their animations developed in section 3. Section 4 describes an internet survey performed to assess people's ability to perceive the displayed emotions. Section 5 describes the experimental development of the humanoid upper torso and section 6 illustrates single arm movement. Section 7 implements the gestures experimentally on the 10 d.o.f configuration and, finally, section 8 draws conclusions from the work.

Kinematic configurations
Three robot configurations were used in this study (figure 1).
Only the main joints have been analysed. Subtleties of motion, such as the rib cage moving up and down in humans for some emotional states, are not represented. The study deliberately ignores facial and finger gestures, as these are the dominant component of body language and their implementation would limit assessment of the limb movements. Each of the joints has a movement range similar to that of humans [9], irrespective of the system complexity. Initially, the three robotic systems were developed in animation. Although the robots have different degrees of freedom, their outward appearance was identical. Figure 2 illustrates a static pose of the animation. The animations were constructed without facial features or muscular shapes to try to ensure that they produce no strong emotion without motion. Grey was selected as a neutral colour scheme.

Gestures
Six emotional responses were selected as they have a good range of movements allowing a wide range of emotions to be portrayed. Furthermore, the gestures are not alike in meaning or in movement. Therefore, the chances of confusing the gestures are reduced. The gestures are described in Table 2 [10], [11].
Each robot figure was animated to implement these movements within the restrictions of its d.o.f. Ideally, the eyes should lead the movement, closely followed by the head, to suggest that it is the thoughts of the character that are driving its actions. Here the animation has no eyes, so it is very important that the head leads. How far the head leads depends upon how much thought is going into the action. When a character is happy, the body movements it makes are fast; the body movements of a sad character are slower and the head hangs down [12]. Subtle differences in motion can affect the believability of characters [13].
For each joint there are two types of human movement; ballistic movements and controlled movements. Ballistic movements are prepared in advance without any adjustments in motor control. Controlled movements are made at a moderate speed and are subject to change; they are amended during the movement, using feedback information [14]. The animations use ballistic movements as they are displaying emotional thoughts, hence they are more innate, already known, and will not be subject to changes.
People do not move symmetrically; therefore, to increase the realism, the right arm is made to move slightly before the left arm, which indicates right-handedness. The animation frames for each gesture are shown for the complex (A), simplified (B) and basic (C) configurations. It is apparent from examining the animation frames that the basic system is unable to realistically display some of the movements, such as arm cross.
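The timing ideas above can be sketched in code. The following Python fragment is only an illustration, not the method used to build the animations: it assumes a minimum-jerk blend as a stand-in for a pre-planned (ballistic) joint movement, and the function names, the 90 degree elbow target and the 0.1 s left-arm delay are hypothetical values chosen for the example.

```python
def min_jerk(tau: float) -> float:
    """Smooth 0-to-1 blend over normalised time tau (clamped to [0, 1]).

    Assumption: a minimum-jerk profile stands in for a ballistic movement.
    """
    tau = min(max(tau, 0.0), 1.0)
    return 10 * tau**3 - 15 * tau**4 + 6 * tau**5


def ballistic_angle(start, end, duration, t, delay=0.0):
    """Joint angle at time t for a pre-planned move with no feedback correction.

    The whole trajectory is fixed by (start, end, duration, delay) in advance.
    """
    return start + (end - start) * min_jerk((t - delay) / duration)


LEFT_ARM_DELAY = 0.1  # seconds; hypothetical lag so that the right arm leads


def elbow_angles(t, duration=1.0):
    """Return (right, left) elbow angles in degrees at time t (seconds)."""
    right = ballistic_angle(0.0, 90.0, duration, t)
    left = ballistic_angle(0.0, 90.0, duration, t, delay=LEFT_ARM_DELAY)
    return right, left
```

Shortening `duration` gives the quicker movements associated with a happy character, while lengthening it gives the slower movements of a sad one.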

1) Akimbo
The akimbo gesture is putting the hands on the hips. It is a confident and aggressive gesture, showing that the individual is prepared to "take steps". It is aggressive because it is used to make the person look bigger. The palms of the hands also face down, which shows dominance and confidence. The akimbo animation also includes a slight head tilt to the side as a questioning gesture: "what do you think you're doing?".

2) Arm cross
This is a defensive gesture. It is guarded and shows disagreement, dislike, arrogance or anxiety. It is used as a barrier to block out undesirable circumstances. The spine twists to angle away from the person they are facing, showing negative feelings. The head is moved slightly so that it is still facing forwards.

3) Hand behind head
The hand behind head gesture is negative. It is also known as the 'pain-in-the-neck' gesture. It is usually indicative of feelings such as uncertainty, frustration, anger or dislike. The gesture includes gazing down and angling the body away to represent feelings of defeat, guilt or shame.

4) Shrug display
The shoulder shrug display is a global body movement, including not only the shoulders, but also the head, elbows, hands and torso. This is a submissive gesture, showing uncertainty, resignation or helplessness. The movements are: the shoulders are raised, head tilted sideways, elbows bent and held in, palms shown and upraised, body bent forwards at the waist.

5) Dominance
Having the palm of the hand facing downwards shows dominance, confidence, assertiveness and authority. Moving the hand up and down (beating) is symbolically beating the listener into submission. The head tilting backwards shows superiority, arrogance and disdain. Having the other hand on the hip helps to confirm the attitude of confidence; it too makes the palm face down.

6) Excited
The gestures included in the Excited movement are nodding the head and rubbing the palms of the hands together. Nodding the head up and down is affirmative, showing understanding, approval or agreement; emphatic nods show feelings of conviction, excitement or sometimes rage. This excited movement has emphatic nods. Rubbing the hands together shows positive expectation; quickly rubbing together shows excitement.

Survey to assess the animations' emotional expression
A survey was performed to gain some insight into the realism of each gesture and to compare the different robot configurations.
The survey contained all the animations displaying 'emotional' states and each reviewer was asked the following questions: "What do you think the figure is feeling or thinking?" and "What do you feel or think when you look at the figure?" The people performing the survey were unaware that the animations had different d.o.f, and the animations were presented in a random order so that animations of the same gesture could not be directly compared. The complex animation has the most d.o.f; it was therefore expected to receive the highest score, the simplified animation a lower score and the basic animation the lowest. Nineteen anonymous responses received through the Internet survey were examined. To be labelled as correct, the emotion given had to describe the general emotional area, since emotions are very subjective and the movements that people make when experiencing them are very individual.
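As an illustration of how such responses might be tallied, the Python sketch below computes the percentage of correctly recognised animations per configuration. The function name, the record layout and the example judgements are assumptions made for this sketch; they are not the survey data, and deciding whether a free-text answer falls in the right general emotional area remains a manual judgement.

```python
from collections import defaultdict


def recognition_rates(judgements):
    """Percentage of correct identifications per robot configuration.

    judgements: iterable of (configuration, gesture, is_correct) tuples,
    where is_correct is the manual verdict on one free-text response.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for config, _gesture, is_correct in judgements:
        total[config] += 1
        correct[config] += int(is_correct)
    return {config: 100.0 * correct[config] / total[config] for config in total}


# Made-up judgements, for illustration only (not the survey results).
example = [
    ("complex", "akimbo", True),
    ("complex", "shrug", True),
    ("basic", "akimbo", True),
    ("basic", "shrug", False),
]
rates = recognition_rates(example)
```

Keeping the gesture name in each record means the same data could also be grouped by gesture rather than by configuration.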
The results of the survey for the recognised emotion are shown in Table 3. The body language of the basic animation is the least recognisable, with only 34% of the movements being identified. The complex animation has the most correct responses and therefore has the most recognisable movements.
These results indicate that the clavicle joint in particular plays little part in gesture representation. Reducing the d.o.f to 10 vastly reduces the clarity of the emotional states displayed. Little movement of the wrist was implemented on any of the configurations, and the complexity of the neck on the simplified configuration was not required for the majority of the movements. Therefore, the structure of the upper torso was reduced to that shown in figure 9 for experimental implementation.
Constructing an arm with four degrees of freedom is still a relatively complex task. Modularity is the best approach to keep the design simple and relatively affordable. Two different modular units, with different torque and weight performance, were used to create the arm. Each unit consists of a single motor and a potentiometer to allow precise joint angle control. The modules were designed to allow joint constructions for any serial/parallel combination of joints. This allows extremely versatile construction, with the drawback that spherical joints are modelled by three separate joints with an offset, which results in only an approximate spherical joint.
Table 4 describes the performance of the two modules and figure 10 illustrates the modular joint system in a pitch rotation configuration (around the x axis). The modules can also be connected in relative roll (around y) and yaw (around z) configurations. Each module can be connected directly end to end or spaced using hollow carbon fibre rod; this allows complex joints and structures to be implemented. The final kinematics of the constructed arm is shown in figure 11 and a photo of the system is shown in figure 12.
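The effect of building a spherical joint from offset single-axis modules can be sketched with a small forward kinematics calculation. The Python below is a minimal illustration under stated assumptions, not the kinematics of the constructed arm: the joint order (yaw, then pitch, then roll), the 0.05 m spacing after the first joint, the 0.3 m upper arm length and all function names are hypothetical. Only the axis convention (pitch about x, roll about y, yaw about z) is taken from the text.

```python
import math


def rot(axis, angle):
    """3x3 rotation matrix: pitch about x, roll about y, yaw about z."""
    c, s = math.cos(angle), math.sin(angle)
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    if axis == "z":
        return [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    raise ValueError(f"unknown axis: {axis}")


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(r, v):
    """Rotate a 3-vector v by the matrix r."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]


def end_position(chain):
    """Position at the end of a serial chain of single-axis modules.

    chain: list of (axis, angle_rad, link), where link is the offset from
    this joint to the next one, expressed in the frame after this rotation.
    """
    orientation = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    position = [0.0, 0.0, 0.0]
    for axis, angle, link in chain:
        orientation = matmul(orientation, rot(axis, angle))
        step = apply(orientation, link)
        position = [p + d for p, d in zip(position, step)]
    return position


def elbow_position(yaw, pitch, roll, spacing, upper_arm=0.3):
    """Elbow position for a hypothetical three-module shoulder.

    spacing = 0 gives an ideal spherical joint; spacing > 0 offsets the
    first joint sideways from the other two (illustrative layout only).
    """
    chain = [
        ("z", yaw, (spacing, 0.0, 0.0)),
        ("x", pitch, (0.0, 0.0, 0.0)),
        ("y", roll, (0.0, 0.0, -upper_arm)),
    ]
    return end_position(chain)
```

Comparing `elbow_position(0, 0, 0, 0.05)` with `elbow_position(math.pi / 2, 0, 0, 0.05)` shows the sideways offset swinging from the x direction to the y direction as the first joint turns, whereas with `spacing=0.0` the elbow position is the same in both cases. This is one way to see why an offset shoulder is only an approximate spherical joint.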
The main issue with using modular single degree of freedom joints is apparent in the shoulder joint. Ideally this should be a single spherical joint with all the axes aligned. However, here only 2 of the joints are aligned, resulting in translations that vary with joint orientation.
Although the animations contain two arms, head and torso movement, they offer a useful comparison against the performance of the simplified arm developed here. Figure 13 illustrates the akimbo single arm experimental response, with the animation frames repeated for ease of comparison. The illustrations show 'key points' of motion and do not necessarily correspond to sequential time slices. Figure 16 shows the full angle movements against time in joint space. The experimental arm is capable of accurately reproducing the akimbo action. Most joints are involved in the motion, apart from the first joint. It is important to note that the motion needs to be considered in joint space, as it is the full structural configuration that expresses gestures, in contrast to the traditional robotic focus on end effector movement.
The dominance gesture is shown in figures 14 & 17. To perform the gesture, the whole arm is rotated forward to allow it to be raised in line with the viewer. The rotation of the arm around joint 1 results in movement of the arm out of the page, due to the shoulder joints not being coincident. This also aligns the large joint lengthways, which looks ungainly. However, in general the gesture can be represented with reasonable accuracy. Note that to perform the gesture the two arms perform different actions. The other arm produces an akimbo action, which has already been shown to be reproducible on the system.
The shoulder shrug gesture is synonymous with raising the clavicle; this is a movement the arm is not capable of performing. Therefore, the gesture is expressed solely in the rest of the arm. Figures 15 & 18 illustrate the gesture being implemented. The arm creates the correct profile; however, the gesture is not easily recognisable without the distinctive shoulder raise. It may be that in specific contexts this gesture will be sufficient to be understood.
Following the successful implementation of the single arm, a full experimental system was constructed with the complexity shown in figure 9. The arm trajectories were defined as in the previous section; however, one arm leads the motion and the two arm motions were made slightly different to avoid a 'mechanical' look. Figure 19 shows motion frames of the akimbo gesture. Note that the right arm leads the left, indicating a 'right handed' motion. The trajectory paths of the motions are also slightly different. Figure 20 illustrates the dominance gesture. This gesture is formed from different left and right arm motions. The left arm reaches towards the observer, resulting in a stronger response than in animation. Figure 21 illustrates the shrug motion, with the right arm again leading. Supple movements of the head add to the effectiveness of the gesture.
These results show that a high number of degrees of freedom is not needed to display recognisable gestures.
This work has investigated the expression of emotion by the upper body motion of humanoid robots. It has been demonstrated both in animation and experimentally that gestures can be displayed by robot arms with far fewer degrees of freedom than humans have. The reduced complexity has enabled an experimental system to be constructed with relative ease to implement these gestures. Further work will perform detailed interaction studies to determine the emotional response to the gestures and to compare and contrast the emotions generated by the animations with those generated by the experimental system.