Skip to content

Norwegian Conversation Speech Corpus

NB Samtale is a speech corpus made by the Language Bank at the National Library of Norway. The corpus contains orthographically transcribed speech from podcasts and recordings of live events at the National Library. The corpus is intended as an open source dataset for Automatic Speech Recognition (ASR) development, and is specifically aimed at improving ASR systems’ handle on conversational speech.

The corpus consists of 12,080 segments, a total of 24 hours transcribed speech from 69 speakers. The corpus ensures both gender and dialect variation, and speakers from five broad dialect areas are represented. Both Bokmål and Nynorsk transcriptions are present in the corpus, with Nynorsk making up approximately 25% of the transcriptions.

We greatly appreciate feedback and suggestions for improvements. PLease contact us at sprakbanken@nb.no.

NB Samtale is a speech corpus made by the Language Bank at the National Library of Norway. The corpus contains orthographically transcribed speech from podcasts and recordings of live events at the National Library. The corpus is intended as an open source dataset for Automatic Speech Recognition (ASR) development, and is specifically aimed at improving ASR systems’ handle on conversational speech.

The corpus consists of 12,080 segments, a total of 24 hours transcribed speech from 69 speakers. The corpus ensures both gender and dialect variation, and speakers from five broad dialect areas are represented. Both Bokmål and Nynorsk transcriptions are present in the corpus, with Nynorsk making up approximately 25% of the transcriptions.

We greatly appreciate feedback and suggestions for improvements. PLease contact us at sprakbanken@nb.no.

Extended metadata

Download resources

Download metadata