Norwegian Voice Control Corpus

The Norwegian Voice Control Corpus (NVCC) is a text and speech corpus consisting of written queries in Norwegian Bokmål and Nynorsk, grouped into a number of intents, together with voice recordings of these queries. The queries are the kind of commands typically given to mobile phones to trigger specific functions, and the intents reflect the functions a mobile phone typically has.

NVCC consists of 10,706 queries within 183 different intents. The intents are sorted into 24 intent groups, which are further organised into 9 domains. Of these queries, 9,834 were recorded, read by eleven different speakers from five dialect groups. Each recorded query has been segmented into its own audio file. The transcriptions, the written queries, and information about the audio segments and speakers are organised in CSV files. See the documentation file for detailed information.
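Since the queries and their metadata are distributed as CSV files, a short script is often enough to get an overview of the corpus. The sketch below counts queries per intent using Python's standard `csv` module. It assumes a file with a header row and a column named `intent`; the function name, the column name and the default delimiter are hypothetical placeholders, so check the documentation file for the actual file names and headers.

```python
import csv
from collections import Counter
from pathlib import Path


def count_queries_per_intent(csv_path, intent_column="intent", delimiter=","):
    """Count how many queries fall under each intent in one CSV file.

    Hypothetical helper: the column name and delimiter are assumptions,
    not taken from the NVCC documentation.
    """
    counts = Counter()
    # newline="" is what the csv module expects for correct quoting/line handling.
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter=delimiter):
            counts[row[intent_column]] += 1
    return counts
```

For example, `count_queries_per_intent("queries.csv").most_common(5)` would list the five largest intents, with `queries.csv` standing in for whichever NVCC file holds the written queries.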

NVCC is open-source and primarily intended as training data for the kind of voice-controlled assistants found in mobile phones. However, since the text and speech parts of the corpus can be used separately, the corpus may also be useful for developing text-based language technology, such as chatbots.

NVCC is developed by the Language Bank at the National Library of Norway. We greatly appreciate any feedback and suggestions for improvement. Please contact us at sprakbanken@nb.no.

Download resources

Extended metadata

Download metadata (CMDI XML): https://www.nb.no/sprakbanken/oai?verb=GetRecord&identifier=oai:nb.no:sbr-75&metadataPrefix=cmdi
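The metadata link is an OAI-PMH `GetRecord` request, so the same record can also be built and fetched programmatically. The sketch below assumes only the endpoint and identifier visible in the URL above; the helper names `cmdi_record_url` and `fetch_record` are hypothetical. Passing `safe=":"` to `urlencode` keeps the colons in the identifier unescaped, so the default call reproduces the link exactly.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the metadata link; the helpers below are illustrative only.
OAI_ENDPOINT = "https://www.nb.no/sprakbanken/oai"


def cmdi_record_url(identifier="oai:nb.no:sbr-75", metadata_prefix="cmdi"):
    """Build an OAI-PMH GetRecord URL for one Språkbanken resource."""
    params = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    }
    # safe=":" leaves the colons in the OAI identifier as-is.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':')}"


def fetch_record(url, timeout=30):
    """Download the raw XML response as bytes (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

OAI-PMH repositories are required to support the `oai_dc` metadata format, so `cmdi_record_url(metadata_prefix="oai_dc")` should return the Dublin Core version of the record. The bytes returned by `fetch_record` can be parsed with `xml.etree.ElementTree.fromstring`.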

dc:type	corpus
dc:title	Norwegian Voice Control Corpus
dc:identifier	oai:nb.no:sbr-75
dc:description	The Norwegian Voice Control Corpus (NVCC) is a text and speech corpus consisting of written queries in Norwegian Bokmål and Nynorsk, grouped into a number of intents, together with voice recordings of these queries. The queries are the kind of commands typically given to mobile phones to trigger specific functions, and the intents reflect the functions a mobile phone typically has. NVCC consists of 10,706 queries within 183 different intents. The intents are sorted into 24 intent groups, which are further organised into 9 domains. Of these queries, 9,834 were recorded, read by eleven different speakers from five dialect groups. Each recorded query has been segmented into its own audio file. The transcriptions, the written queries, and information about the audio segments and speakers are organised in CSV files. See the documentation file for detailed information. NVCC is open-source and primarily intended as training data for the kind of voice-controlled assistants found in mobile phones. However, since the text and speech parts of the corpus can be used separately, the corpus may also be useful for developing text-based language technology, such as chatbots. NVCC is developed by the Language Bank at the National Library of Norway. We greatly appreciate any feedback and suggestions for improvement. Please contact us at sprakbanken@nb.no.
dc:publisher
dc:format	downloadable
dc:date	2020-01-06
dc:date	2022-12-15
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-ZERO (CC-ZERO)
dc:rights	https://creativecommons.org/publicdomain/zero/1.0/
dc:creator	National Library of Norway
dc:lang	Norwegian
