Robot sound research to be tested with a waiter robot

The emotional expression of robots was examined in a recent study involving the BME Faculty of Mechanical Engineering's Department of Mechatronics, Optics and Mechanical Engineering Informatics (BME-GPK MOGI). The results were produced in international cooperation between researchers, institutes and research groups working on robotics and ethology, and will next be tested with a waiter robot. The first author of the paper, published in Scientific Reports, a journal of the prestigious Nature portfolio, was Beáta Korcsok (left), assistant lecturer at the BME-GPK MOGI department. We interviewed Professor Péter Korondi (right), a member of the project.


How did you find this topic of research?

Péter Korondi: The Department of Mechatronics, Optics and Mechanical Engineering Informatics (MOGI) takes part in research on ethorobotics, a field of study that was born in its present form through the cooperation of Hungarian scientific institutions. Within this cooperation, BME-GPK MOGI handles the mechanical engineering tasks, while the ELTE Department of Ethology and the MTA-ELTE Comparative Ethology Research Group handle the ethological ones. The University of Miskolc Institute of Information Science and, as a foreign partner, the Tokyo-based Chuo University also participated in the research.

MOGI Robi (left), and MOGI Ethon (right)
The research was triggered by a conversation in which the ethologist Ádám Miklósi told us how they had rewritten mother-child attachment studies into dog-owner attachment tests. We decided then that if we replaced the dogs with robots, robot-owner attachment could be examined as well. Of course, this required examining the attachment behaviour of robots. Ádám Miklósi mentioned some rules they had found, and we agreed that these verbal ethological rules should be described with fuzzy logic. That was the moment Hungarian ethorobotics was founded. MOGI Robi, now retired, was created at that time.
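The idea of turning a verbal ethological rule into fuzzy logic can be illustrated with a minimal sketch. The rule, the membership functions and all numeric values below are hypothetical illustrations, not taken from the project:

```python
def tri(x, a, b, c):
    """Triangular fuzzy membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def follow_tendency(owner_distance_m):
    """Hypothetical fuzzy rule pair for attachment behaviour:
    'IF the owner is near THEN stay put; IF the owner is far THEN follow.'
    Returns a follow intensity in [0, 1] for the robot."""
    near = tri(owner_distance_m, -1.0, 0.0, 2.0)  # membership in 'near'
    far = tri(owner_distance_m, 1.0, 4.0, 7.0)    # membership in 'far'
    total = near + far
    if total == 0.0:
        return 0.9  # beyond the 'far' peak: follow strongly
    # Weighted average of the rule outputs (near -> 0.1, far -> 0.9)
    return (near * 0.1 + far * 0.9) / total
```

The point of the fuzzy formulation is that a vague verbal rule ("if the owner is far away, the dog follows") becomes a smooth, tunable function of distance instead of a hard threshold.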

Ethorobotics, a term merging 'ethology' and 'robotics', seeks ways to develop social robots whose behaviour and communication integrate them into the human environment and make them more acceptable to humans. The field came into being because, although the use of social robots is increasing, their communication towards humans is still an unsolved problem. A robot's communication must be understandable to anyone and must not become disturbing in the long term.


The most widespread direction of research is creating humanoid robots, which encounters two problems. First, humanoid robots that can actually manage tasks will not be technologically feasible for a long time; second, their human-like appearance can trigger fear or unease in people. The ethorobotic approach, in contrast, treats robots as an individual species whose communication should be tailored to their function and abilities.

During our research, we examined models based mainly on the behaviour of social animals, enabling us to design appropriate behaviour and communication signals. In this study, we examined such signals and sound effects.

What kind of tasks did you have to deal with?

Péter Korondi: Within this research series, we have been focusing on emotional expression. For instance, we designed emotional expressions with the help of visual signals, using an artificial agent appearing on a monitor: a maze-like abstract shape built from a circle and a square. Through motion and changes in its colour and size, we created emotional expressions following patterns that are present in the emotion-expressing behaviour of some animal species.

In the recent period, we examined whether, in artificially created voices, we can observe the patterns present in animal and human vocalisations. We also pointed out connections between the acoustic parameters of vocalisation and the emotional state of the individual.

Through non-verbal vocalisations (e.g. crying, screaming, laughing or sighing), humans can understand the emotional expressions not only of individuals with a different mother tongue or culture, but also of many land mammals, for example dogs and pigs.

This phenomenon is enabled by the similar vocal tracts and modes of vocalisation of mammals, and thus the similar expression of emotional states. One such pattern, for instance, is that sounds with a high fundamental frequency are perceived as more intense, while short sounds are perceived as more positive.

During our research, we generated 600 sounds with the Praat acoustics software by systematically varying the pitch and the length of the sounds; in addition, we varied multiple further acoustic parameters, creating seven sound categories. The artificial sounds range from simple sounds (the simplest and clearest ones) to complex sounds modelling mammalian vocalisation. The sounds were evaluated through an online survey followed by statistical data analysis.
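The study used Praat for synthesis; as a rough illustration of the idea of a systematic pitch-by-duration stimulus grid, a minimal Python sketch could look like the following. The frequency and duration values are invented for the example and are not the parameters used in the study:

```python
import math

SAMPLE_RATE = 44100  # samples per second

def pure_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Generate one sine tone as a list of float samples in [-1, 1]."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

# Hypothetical grid: the study varied pitch and length systematically,
# and layered further acoustic parameters on top (not reproduced here).
pitches_hz = [200, 400, 600, 800, 1000]
durations_s = [0.2, 0.4, 0.8, 1.6]

stimuli = {(f, d): pure_tone(f, d) for f in pitches_hz for d in durations_s}
```

Each (pitch, duration) cell of such a grid yields one stimulus, so perceived intensity and valence can later be analysed as functions of the two systematically varied parameters.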

What was the most significant challenge you faced in the research?

Péter Korondi: In this research, the most significant challenge was choosing the proper parameters and thresholds for the artificial sounds, as the complexity of animal vocalisation left a wide range of possibilities. We were helped by the long history of bioacoustic research, which examines not only the mechanisms of animal and human vocalisation but also emotional expression within and between species, and which is also studied at the ELTE Department of Ethology.
Illustration: when filling in the questionnaire, the sounds had to be placed in the coordinate system above, depending on their intensity and emotional valence

The study on voice-triggered emotions was conducted with an online questionnaire. How was the research carried out? How many individuals were in the sample, and with what selection method?

Péter Korondi: The research was done with an online interactive questionnaire, for which we sought participants through advertisements on social media platforms. Voluntary adults participated in the study. The questionnaire was available in both English and Hungarian. After the demographic questions and a few practice sounds showing the users how to handle the platform, the participants listened to ten sounds from each of the seven sample categories. After hearing a sound, the participants had to place it in a coordinate system according to its intensity and emotional valence. The answers were filtered by response time, and we excluded responses arriving after 20 seconds. We processed the responses of 237 participants: 95 responded in Hungarian and 142 in English.
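The filtering and aggregation described above can be sketched in a few lines. The record format, field names and sample values below are hypothetical, chosen only to illustrate the 20-second cut-off and the per-category placement in the valence-intensity plane:

```python
# Each response: (category, response_time_s, valence, intensity),
# with valence and intensity as coordinates in [-1, 1] (hypothetical format).
responses = [
    ("simple", 4.2, 0.6, -0.3),
    ("complex", 25.0, -0.8, 0.9),  # too slow: arrived after 20 s, excluded
    ("complex", 7.1, -0.4, 0.7),
    ("simple", 12.5, 0.2, -0.1),
]

MAX_RESPONSE_TIME_S = 20.0  # responses after 20 seconds were excluded

valid = [r for r in responses if r[1] <= MAX_RESPONSE_TIME_S]

def mean_coords(rows, category):
    """Mean (valence, intensity) placement for one sound category."""
    pts = [(v, a) for c, _, v, a in rows if c == category]
    return (sum(v for v, _ in pts) / len(pts),
            sum(a for _, a in pts) / len(pts))
```

Averaging the placements per category gives each of the seven sound categories a position in the coordinate system, which can then be analysed statistically.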

Which animal's sound set came closest to the optimal one?

Péter Korondi: The participants evaluated the artificial sounds according to the patterns observed in animal sounds, independently of biological complexity: higher-frequency sounds were perceived as more intense, while shorter sounds were perceived as more positive. This result suggests that the examined emotional coding patterns are very ancient and that the rules work similarly regardless of the exact species. Beyond vocalisation itself, the sound-processing neural mechanisms are likely similar among land mammals. So there was no optimal or less optimal sound set regarding emotional expression, which allows these features to be tuned further.

Biscee, the waiter robot
Do you plan further researches on the topic?

Péter Korondi: Yes, the artificial sounds created in the study should be examined in further research to see whether they trigger approach or avoidance, which serves as a third, social dimension of the model. With the inclusion of this dimension, we can separate sounds that were all rated negative and intense in the first questionnaire but can elicit totally different reactions. As an everyday example, consider the sounds of anger versus those of fear or pain: we approach an angrily barking, snarling dog less closely than a painfully whimpering one, although both produce intense sounds with negative valence. The research results on approach-avoidance triggered by artificial sounds will be published soon.
We are also going to study the created sounds in real-life situations with the help of a social robot, Biscee, which we plan to deploy as a waiter in a café environment, enabling us to analyse the sounds and further communication signals.

László Benesóczky
