
Norwegian Voice Control Corpus

The Norwegian Voice Control Corpus (NVCC) is a text and speech corpus consisting of written queries in Norwegian Bokmål and Nynorsk, grouped by intent, together with voice recordings of these queries. The queries are the kind of commands typically given to mobile phones to trigger specific functions, and the intents reflect the functions a mobile phone typically has.

NVCC consists of 10,706 queries within 183 different intents. The intents are sorted into 24 intent groups, which are further organised into 9 domains. Of these queries, 9,834 were recorded, read aloud by eleven speakers from five dialect groups. The recordings have been segmented so that each query has its own audio file. The transcriptions, written queries and information about the audio segments and speakers are organised in CSV files. See the documentation file for detailed information.
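Because the metadata is distributed as CSV files, the intent hierarchy can be explored with Python's standard `csv` module. The following is a minimal sketch under stated assumptions: the column names `domain` and `intent`, the comma delimiter and the file name in the usage note are hypothetical placeholders, so check the documentation file for the actual headers and layout before using it.

```python
import csv
from collections import Counter
from typing import TextIO

# Hypothetical column names; the real NVCC headers are described in the
# documentation file and may differ.
DOMAIN_COL = "domain"
INTENT_COL = "intent"


def count_queries_by_domain(csv_file: TextIO, delimiter: str = ",") -> Counter:
    """Count how many query rows fall under each domain."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    return Counter(row[DOMAIN_COL] for row in reader)


def intents_per_domain(csv_file: TextIO, delimiter: str = ",") -> dict[str, set[str]]:
    """Collect the distinct intents that occur under each domain."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    result: dict[str, set[str]] = {}
    for row in reader:
        result.setdefault(row[DOMAIN_COL], set()).add(row[INTENT_COL])
    return result
```

Usage might look like `with open("queries.csv", encoding="utf-8", newline="") as f: counts = count_queries_by_domain(f)`, where `queries.csv` is a hypothetical file name. The functions take an open file object rather than a path so they work equally well with files and in-memory text.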

NVCC is open source and primarily intended as training data for voice-controlled assistants of the kind found in mobile phones. However, since the text and speech parts of the corpus can be used separately, the corpus may also be useful for developing text-based language technology, such as chatbots.

NVCC is developed by the Language Bank at the National Library of Norway. We greatly appreciate any feedback and suggestions for improvement. Please contact us at sprakbanken@nb.no.