THE SHEFFIELD SEARCH AND RESCUE SPEECH CORPUS

A goal-oriented, two-party human/human conversational speech corpus with multiple levels of annotation for spoken language understanding.

ABOUT THE SSAR

As part of ongoing research into extracting mission-critical information from Search and Rescue speech communications, a corpus of unscripted, goal-oriented, two-party spoken conversations has been designed and collected. The Sheffield Search and Rescue (SSAR) corpus comprises about 12 hours of data from 96 conversations by 24 native speakers of British English with a southern accent. Each conversation centres on a collaborative task of exploring and estimating a simulated indoor environment. The task has been carefully designed to provide a quantitative measure of the amount of information exchanged about the discourse subject. SSAR includes several layers of annotation, which should be of interest to researchers working on a wide range of human/human conversation understanding problems as well as on automatic speech recognition. It also provides data for the analysis of multiple parallel conversations around a single subject.