The performance of Automatic Speech Recognition (ASR) systems can drop even at moderate levels of background noise, for example when multiple speakers talk simultaneously. Human listeners, by contrast, appear to have little difficulty in such environments and can follow one voice amongst a mixture. Various factors have been shown to contribute to this ability, including dynamic head movements. This active strategy for improving hearing has been extensively investigated for human sound source localization and separation in multi-source, reverberant environments, and has stimulated developments in the field of Robot Audition. The purpose of our research is to investigate the benefits of equipping a mobile robot platform with an active hearing system to improve its speech recognition and perception abilities. This paper presents a milestone in our work on tools for developing active hearing on the MiRo robot platform.
THE MIRO ROBOT
The MiRo robot is a programmable mobile developer platform for companion and social robotics that resembles a pet animal whilst being clearly a robot. Developed by Consequential Robotics, MiRo has a distinctive biomimetic design. Its two physically directable ears could give it an animal-like ability to actively localize and track sound sources, and thereby support speech recognition.
MODELLING ROBOT-SPECIFIC ACOUSTIC FACTORS
Building an active hearing system for a robot requires understanding two sets of robot-specific factors: the spatial filtering properties of the robot’s ears and the robot’s self-noise. The spatial filtering properties of MiRo were evaluated by placing the robot in an anechoic chamber and measuring the responses of its microphones to carefully controlled sounds played from a grid of directions around the robot. This yielded a set of impulse responses, and the process was repeated for a range of orientations of the robot’s ears. MiRo’s self-noise was also recorded through its own ears, enabling us to explore noise cancellation techniques that mitigate the noise of motors placed close to the microphones.
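A self-noise recording makes it possible to try standard single-channel enhancement baselines. The sketch below shows one of the simplest, magnitude spectral subtraction, purely as an illustration; it is not the technique used on MiRo. It assumes a mono signal, a separate noise-only recording of the motors, and a noise spectrum that is roughly stationary. The names `stft`, `istft` and `spectral_subtract`, the frame and hop sizes, and the over-subtraction factor `alpha` and floor `beta` are all hypothetical choices.

```python
import numpy as np

FRAME = 512  # hypothetical frame length in samples
HOP = 256    # hypothetical hop size (50% overlap)


def stft(x, frame=FRAME, hop=HOP):
    """Short-time Fourier transform with a periodic Hann window, shape (frames, bins)."""
    win = np.hanning(frame + 1)[:-1]
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, length, frame=FRAME, hop=HOP):
    """Inverse STFT by weighted overlap-add, normalised by the summed squared window."""
    win = np.hanning(frame + 1)[:-1]
    frames = np.fft.irfft(X, n=frame, axis=1) * win
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, f in enumerate(frames):
        y[i * hop:i * hop + frame] += f
        norm[i * hop:i * hop + frame] += win ** 2
    return y / np.maximum(norm, 1e-8)


def spectral_subtract(noisy, noise_only, alpha=2.0, beta=0.01):
    """Subtract an averaged self-noise magnitude spectrum from a noisy recording."""
    N = stft(noisy)
    # Average magnitude of the noise-only recording in each frequency bin.
    noise_mag = np.abs(stft(noise_only)).mean(axis=0)
    mag = np.abs(N)
    # Over-subtract, but keep at least beta * original magnitude (spectral floor),
    # a common way to limit "musical noise".
    clean_mag = np.maximum(mag - alpha * noise_mag, beta * mag)
    # Re-use the noisy phase, as is usual for spectral subtraction.
    return istft(clean_mag * np.exp(1j * np.angle(N)), len(noisy))
```

Because motor noise changes with the robot's movements, a single averaged profile is a simplification; a practical system might keep a separate noise profile for each motion state.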
We aim to develop a MiRo hearing simulator to be added to the existing MiRo motion simulator. This will allow hearing algorithms to be evaluated in simulation at an early stage. It will also allow standard speech and noise datasets to be filtered so that they sound as if they had been recorded on the robot, from a variety of directions and in a variety of environments. Such data augmentation is standard practice for creating the large multi-condition datasets needed to train the Deep Neural Network (DNN) based systems used in state-of-the-art ASR.
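A minimal sketch of this kind of augmentation is given below, assuming the measured impulse responses for one ear orientation are stored in a hypothetical dictionary keyed by azimuth in degrees, each entry holding a two-channel (left/right ear) response, and that a two-channel self-noise recording is available. Dry speech is convolved with the impulse response of the nearest measured direction, and the self-noise is added at a chosen signal-to-noise ratio. The names `ir_bank`, `nearest_direction`, `mix_at_snr` and `simulate_on_robot` are illustrative and are not part of any MiRo software.

```python
import numpy as np


def nearest_direction(ir_bank, azimuth_deg):
    """Return the measured azimuth closest to the requested one (circular distance)."""
    def circ_dist(a):
        return abs((a - azimuth_deg + 180) % 360 - 180)
    return min(ir_bank, key=circ_dist)


def mix_at_snr(signal, noise, snr_db):
    """Scale noise so that 10*log10(P_signal / P_noise) equals snr_db, then add it.

    Powers are averaged over all samples and channels jointly.
    """
    p_sig = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return signal + gain * noise


def simulate_on_robot(dry, ir_bank, azimuth_deg, self_noise, snr_db):
    """Render a dry mono utterance as a hypothetical two-ear robot recording.

    ir_bank: hypothetical dict {azimuth_deg: array of shape (ir_len, 2)}.
    self_noise: recorded robot self-noise, shape (>= output length, 2).
    """
    ir = ir_bank[nearest_direction(ir_bank, azimuth_deg)]
    # Full convolution of the dry signal with each ear's impulse response.
    wet = np.stack([np.convolve(dry, ir[:, ch]) for ch in range(ir.shape[1])], axis=1)
    noise = self_noise[: wet.shape[0]]
    return mix_at_snr(wet, noise, snr_db)
```

Anechoic responses capture only the robot's own spatial filtering; to mimic different rooms, one could additionally convolve with room impulse responses, which this sketch does not do.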