Norwegian Conversation Speech Corpus

NB Samtale is a speech corpus made by the Language Bank at the National Library of Norway. The corpus contains orthographically transcribed speech from podcasts and recordings of live events at the National Library. The corpus is intended as an open-source dataset for Automatic Speech Recognition (ASR) development, and is specifically aimed at improving how well ASR systems handle conversational speech.

The corpus consists of 12,080 segments, a total of 24 hours of transcribed speech from 69 speakers. The recordings were selected for gender balance and dialect variation, and speakers from five broad dialect areas are represented. Both Bokmål and Nynorsk transcriptions are present in the corpus, with Nynorsk making up approximately 25% of the transcriptions.
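As a rough orientation, the figures above imply an average segment length and an average amount of speech per speaker. The sketch below is only back-of-the-envelope arithmetic: it assumes the 24 hours are spread evenly, which the corpus description does not claim, and the variable names are illustrative.

```python
# Figures quoted in the corpus description.
SEGMENTS = 12_080
TOTAL_HOURS = 24
SPEAKERS = 69

# Averages only; real segments and speakers will vary around these values.
mean_segment_seconds = TOTAL_HOURS * 3600 / SEGMENTS
mean_minutes_per_speaker = TOTAL_HOURS * 60 / SPEAKERS

print(f"Mean segment length: about {mean_segment_seconds:.1f} s")
print(f"Mean speech per speaker: about {mean_minutes_per_speaker:.1f} min")
```

Since the total duration is itself a rounded figure, these averages are approximate.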

We greatly appreciate feedback and suggestions for improvements. Please contact us at sprakbanken@nb.no.

Extended metadata

resource Common Info:
resource Type: corpus
identification Info:
resource Name: NB Samtale
resource Name: Norwegian Conversation Speech Corpus
description: NB Samtale is a speech corpus of orthographically transcribed audio taken from podcasts and recordings of events at the National Library of Norway. The corpus contains conversations between several people; the speech is spontaneous and shows typical features of spoken language. The audio was selected with good gender balance and good dialect variation in mind, and the corpus has transcriptions in both Bokmål and Nynorsk. NB Samtale is intended as an open-source dataset for training automatic speech recognition, specifically recognition of spontaneous speech between several people in conversation. In total there are 24 hours of transcribed speech from 69 speakers, divided into 12,080 segments, each of which is an individual WAV file. The metadata include, among other things, the segments' source file, timecode and duration, as well as the speakers' gender, dialect and written standard (målform). NB Samtale was developed by the Language Bank (Språkbanken) at the National Library of Norway. We greatly appreciate feedback and suggestions for improvements. Contact us at sprakbanken@nb.no.
description: NB Samtale is a speech corpus made by the Language Bank at the National Library of Norway. The corpus contains orthographically transcribed speech from podcasts and recordings of live events at the National Library. The corpus is intended as an open-source dataset for Automatic Speech Recognition (ASR) development, and is specifically aimed at improving how well ASR systems handle conversational speech. The corpus consists of 12,080 segments, a total of 24 hours of transcribed speech from 69 speakers. The recordings were selected for gender balance and dialect variation, and speakers from five broad dialect areas are represented. Both Bokmål and Nynorsk transcriptions are present in the corpus, with Nynorsk making up approximately 25% of the transcriptions. We greatly appreciate feedback and suggestions for improvements. Please contact us at sprakbanken@nb.no.
url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-85/
PID: hdl:21.11146/85
identifier: sbr-85
distribution Info:
licence Info:
user Category: Public
distribution Access Medium: downloadable
download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-85/
licence:
licence Family: Creative Commons (CC)
licence Name: Creative_Commons-ZERO (CC-ZERO)
licence Url: https://creativecommons.org/publicdomain/zero/1.0/
licensor:
actor Info:
actor Type: organization
role: Licensor
organization Info:
organization Name: Nasjonalbiblioteket
organization Name: National Library of Norway
organization Short Name: NB
organization Short Name: NLN
department Name: Språkbanken
department Name: The Language Bank
communication Info:
email: sprakbanken@nb.no
url: https://www.nb.no/sprakbanken/
address: P.O. Box 2674 Solli
zip Code: 0203
city: Oslo
region: Oslo
country: Norway
contact
- actor Info:
- actor Type: organization
- role: Contact
- organization Info:
- organization Name: Nasjonalbiblioteket
- organization Name: National Library of Norway
- organization Short Name: NB
- organization Short Name: NLN
- department Name: Språkbanken
- department Name: The Language Bank
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
metadata Info:
metadata Creation Date: 18.08.2023
metadata Language Name: Norwegian Bokmål
metadata Language Name: English
metadata Language Id: nb
metadata Language Id: en
metadata Last Date Updated: 18.08.2023
metadata Creator
- actor Info:
- actor Type: organization
- role: Metadata creator
- organization Info:
- organization Name: Nasjonalbiblioteket
- organization Name: National Library of Norway
- organization Short Name: NB
- organization Short Name: NLN
- department Name: Språkbanken
- department Name: The Language Bank
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
version Info:
version: 1.0
last Date Updated: 18.08.2023
validation Info:
validated: true
validation Type: content
validation Mode: manual
validation Extent: full
validator:
actor Info:
actor Type: organization
role: Resource Validator
organization Info:
organization Name: Nasjonalbiblioteket
organization Name: National Library of Norway
organization Short Name: NB
organization Short Name: NLN
department Name: Språkbanken
department Name: The Language Bank
communication Info:
email: sprakbanken@nb.no
url: https://www.nb.no/sprakbanken/
address: P.O. Box 2674 Solli
zip Code: 0203
city: Oslo
region: Oslo
country: Norway
resource Creation Info:
creation Start Date: 01.07.2022
creation End Date: 18.08.2023
resource Creator
- actor Info:
- actor Type: organization
- role: Resource Creator
- organization Info:
- organization Name: Nasjonalbiblioteket
- organization Name: National Library of Norway
- organization Short Name: NB
- organization Short Name: NLN
- department Name: Språkbanken
- department Name: The Language Bank
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway

Download resources

Download metadata

Download metadata https://www.nb.no/sprakbanken/oai?verb=GetRecord&identifier=oai:nb.no:sbr-85&metadataPrefix=cmdi
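The link above is an OAI-PMH GetRecord request. The sketch below shows one way such a URL could be assembled and fetched using only the Python standard library. The function names are hypothetical, and the endpoint, identifier and `cmdi` prefix are simply taken from the link; the shape of the response is not documented on this page, so the code returns the raw bytes without parsing them.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as it appears in the metadata link on this page.
OAI_ENDPOINT = "https://www.nb.no/sprakbanken/oai"


def build_get_record_url(identifier: str, metadata_prefix: str = "cmdi") -> str:
    """Build an OAI-PMH GetRecord URL for one catalog record."""
    params = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    }
    # Keep ':' unescaped so the result matches the link shown on the page.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':')}"


def fetch_record(identifier: str, metadata_prefix: str = "cmdi",
                 timeout: float = 30.0) -> bytes:
    """Download the raw response body (requires network access)."""
    url = build_get_record_url(identifier, metadata_prefix)
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# Building the URL needs no network access.
print(build_get_record_url("oai:nb.no:sbr-85"))
```

Calling `fetch_record("oai:nb.no:sbr-85")` would then return the metadata document as bytes, which could be handed to an XML parser of your choice.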

dc:type	corpus
dc:title	Norwegian Conversation Speech Corpus
dc:identifier	oai:nb.no:sbr-85
dc:description	NB Samtale is a speech corpus made by the Language Bank at the National Library of Norway. The corpus contains orthographically transcribed speech from podcasts and recordings of live events at the National Library. The corpus is intended as an open-source dataset for Automatic Speech Recognition (ASR) development, and is specifically aimed at improving how well ASR systems handle conversational speech. The corpus consists of 12,080 segments, a total of 24 hours of transcribed speech from 69 speakers. The recordings were selected for gender balance and dialect variation, and speakers from five broad dialect areas are represented. Both Bokmål and Nynorsk transcriptions are present in the corpus, with Nynorsk making up approximately 25% of the transcriptions. We greatly appreciate feedback and suggestions for improvements. Please contact us at sprakbanken@nb.no.
dc:publisher
dc:format	downloadable
dc:date	2022-07-01
dc:date	2023-08-18
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-ZERO (CC-ZERO)
dc:rights	https://creativecommons.org/publicdomain/zero/1.0/
dc:creator	National Library of Norway
dc:lang	Norwegian
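Several Dublin Core fields above occur more than once (dc:date, dc:rights), and dc:publisher has no value. If the listing is copied as text, a small parser along the following lines could collect it into a mapping from field name to a list of values. This is a sketch that assumes each line is a field name followed by a tab and a value, as in the listing above; `parse_dc_lines` is a hypothetical helper, not part of any published tooling for this corpus.

```python
def parse_dc_lines(text: str) -> dict[str, list[str]]:
    """Collect tab-separated 'dc:' lines into {field: [values, ...]}."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        if not line.startswith("dc:"):
            continue  # skip anything that is not a Dublin Core line
        key, _, value = line.partition("\t")
        values = fields.setdefault(key.strip(), [])
        if value.strip():
            values.append(value.strip())  # repeated fields accumulate
    return fields


# A few lines copied from the listing above.
sample = "\n".join([
    "dc:type\tcorpus",
    "dc:publisher",
    "dc:date\t2022-07-01",
    "dc:date\t2023-08-18",
    "dc:rights\tPublic",
])

record = parse_dc_lines(sample)
print(record["dc:date"])       # ['2022-07-01', '2023-08-18']
print(record["dc:publisher"])  # []
```

Keeping a list per field avoids silently dropping the second dc:date or the additional dc:rights entries, and an empty list makes missing values such as dc:publisher explicit.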
