FiCa speech dataset

The FiCa speech dataset is a private dataset consisting of 92 minutes of audio from a single female speaker. It was originally created to train a TTS system capable of synthesizing short feedback responses such as "mhm", "oh", and "wow". This work was published at SIGDIAL 2024.

SIGDIAL paper: Mhm... Yeah? Okay! Evaluating the Naturalness and Communicative Function of Synthesized Feedback Responses in Spoken Dialogue

Authors: Carol Figueroa, Marcel de Korte, Magalie Ochs, Gabriel Skantze

Voice Talent: Carol Figueroa

The dataset comprises four types of speech recordings:

Read speech: 43 minutes of sentences read from the CMU ARCTIC database (Kominek and Black, 2004).
Role-play acted speech: 4 minutes of dialogue acted out from the Taskmaster-2 dataset (Byrne et al., 2019).
Feedback imitations: 724 feedback responses imitated from Switchboard (Godfrey et al., 1992), amounting to 11 minutes.
Conversational speech: 34 minutes recorded while the voice talent chatted with other people, capturing 981 instances of feedback.
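
As a quick consistency check, the four durations listed above add up to the 92 minutes stated in the introduction. A minimal Python sketch of that tally follows. The dictionary keys and the `share` helper are illustrative names chosen for this example and are not part of the dataset's distribution.

```python
# Durations in minutes, copied from the list of recording types above.
# The key names are illustrative labels, not official dataset identifiers.
minutes = {
    "read speech": 43,
    "role-play acted speech": 4,
    "feedback imitations": 11,
    "conversational speech": 34,
}

# Feedback response counts reported for the two feedback-bearing subsets.
feedback_counts = {
    "feedback imitations": 724,
    "conversational speech": 981,
}

total_minutes = sum(minutes.values())
total_feedback = sum(feedback_counts.values())


def share(part: str) -> float:
    """Fraction of the total duration contributed by one recording type."""
    return minutes[part] / total_minutes


for name, mins in minutes.items():
    print(f"{name}: {mins} min ({share(name):.0%} of total)")
print(f"total duration: {total_minutes} min")
print(f"total feedback responses: {total_feedback}")
```

Only the imitated and conversational subsets report feedback counts, so the feedback total covers those two subsets alone.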

Feedback examples from the dataset

Access to the feedback imitations and conversational feedback responses is available on request. Please contact the first author, Carol Figueroa.

[Audio examples, in two columns: feedback imitations and conversational feedback responses]

To cite this dataset, please use the following:

@inproceedings{figueroa2024mhm,
  title={Mhm... Yeah? Okay! Evaluating the Naturalness and Communicative Function of Synthesized Feedback Responses in Spoken Dialogue},
  author={Figueroa, Carol and de Korte, Marcel and Ochs, Magalie and Skantze, Gabriel},
  booktitle={Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue},
  pages={544--553},
  year={2024}
}