Tuva Taledatabase
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: Tuva Speech Database
- resource Name: Tuva Taledatabase
- description: Tuva Speech Database was created by Max Manus AS for testing and evaluating "Tuva", a speech recognition solution for Norwegian. The corpus consists of approximately 24 hours of recorded speech from 40 speakers of Norwegian, 36 of whom speak a dialect close to the Bokmål written standard, while four speak a dialect closer to the Nynorsk written standard. About 70% of the material is manuscript-read speech; the remaining 30% is spontaneous speech. The manuscripts are for the most part short news articles. 25% of the manuscripts are read by all speakers, while the remaining 75% are unique to each speaker. All punctuation (full stops, commas, paragraph breaks, etc.) is read aloud by the speakers, and all sound recordings are orthographically transcribed in two different formats. For Nynorsk, only manuscript-read speech is available. The speakers were selected to represent a cross section of the Norwegian working population, balanced for age, gender and dialect. All recordings were made at a 48 kHz sampling rate with 32-bit resolution, using one microphone in one channel (mono). The recordings were made in a recording studio in Oslo.
- description: Tuva Taledatabase er utarbeidd av Max Manus AS for test og evaluering av dikteringsløysinga «Tuva». Databasen inneheld omlag 24 timar innlesen tale frå 40 talarar. 36 av desse snakkar ei bokmålsnær dialekt, fire ei nynorsknær dialekt. Omlag 70% av materialet er manuskriptlesen tale og 30% er spontan tale. Manuskripta i den manuskriptlesne delen av korpuset er som regel korte avisartiklar. Av desse manuskripta vert 25% lesne av alle talarane, medan dei resterande 75% er unike for kvar talar. All punktuering (punktum, komma, avsnitt osb.) vert lesen opp av innlesarane, og alle lydopptaka er ortografisk transkriberte i to ulike format. For nynorsk finst det berre manuskriptlesen tale i korpuset. Innlesarane i Tuva Taledatabase har vorte utvalde for å representere eit tverrsnitt av den norske arbeidsbefolkninga, balansert for alder, kjønn og dialekt. Alle lydopptaka er utførde med 48 kHz punktprøvingsfrekvens og 32 bit oppløysing med ein mikrofon i ein kanal (mono). Opptaka vart gjennomførte i eit opptaksstudio i Oslo.
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-44/
- PID: hdl:21.11146/44
- identifier: sbr-44
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-44/
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY (CC-BY)
- licence Url: https://creativecommons.org/licenses/by/4.0/
- conditions Of Use: BY
- licensor:
- actor Info:
- actor Type: organization
- role: Licensor
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- distribution Rights Holder:
- actor Info:
- actor Type: organization
- role: Distribution Rights Holder
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info:
- actor Type: organization
- role: Contact
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- actor Info:
- actor Type: person
- role: Metadata Creator
- person Info:
- surname: Lindstad
- given Name: Arne Martinus
- affiliation:
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- actor Info:
- actor Type: organization
- role: Resource Creator
- organization Info:
- organization Name: Max Manus AS
- corpus Info:
- corpus Type: Multimodal Corpus
- corpus Part Info:
- media Type: audio
- corpus Audio Info:
- audio Size Info:
- size Info:
- size: 24
- size Unit: hours
- audio Format Info:
- mime Type: audio/wav
- corpus Part Info:
- media Type: text
- corpus Text Info:
- text Format Info:
- mime Type: text/plain
- character Encoding Info:
- character Encoding: UTF-8
- corpus Part General Info:
- linguality Info:
- linguality Type: monolingual
- language Info:
- language Id: no
- language Name: Norwegian
- language Variety Info:
- language Variety Type: dialect
- language Variety Name: Dialects representing various regions
- modality Info:
- modality Type: spokenLanguage
- modality Type Details: Manuscript read and spontaneous speech
- size Info:
- size: 24
- size Unit: hours
- annotation Info:
- annotation Type: speechAnnotation-orthographicTranscription
- segmentation Level: word
- annotation Mode: manual
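
The audio figures in this record (about 24 hours, 48 kHz, 32-bit, mono, 40 speakers, roughly 70% manuscript-read speech) imply an approximate uncompressed data volume and a rough split of the material. The following is a minimal sketch of that arithmetic, assuming uncompressed audio with no header overhead and treating the rounded figures as exact; the function name `estimate_corpus` and its parameters are illustrative and not part of the record, and the actual download size may differ.

```python
def estimate_corpus(hours=24, sample_rate_hz=48_000, bits_per_sample=32,
                    channels=1, speakers=40, read_percent=70):
    """Back-of-the-envelope figures derived from the record's rounded values."""
    # Bytes needed for one second of uncompressed audio.
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    # Total payload for the stated duration (header overhead ignored).
    total_bytes = hours * 3600 * bytes_per_second
    read_hours = hours * read_percent / 100
    return {
        "bytes_per_second": bytes_per_second,
        "total_bytes": total_bytes,
        "total_gb": total_bytes / 1e9,          # decimal gigabytes
        "read_hours": read_hours,               # manuscript-read share
        "spontaneous_hours": hours - read_hours,
        "minutes_per_speaker": hours * 60 / speakers,  # average only
    }


if __name__ == "__main__":
    for name, value in estimate_corpus().items():
        print(f"{name}: {value}")
```

The per-speaker figure is a plain average; the record does not state how the recording time is actually distributed across speakers.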
dc:type | corpus |
dc:title | Tuva Taledatabase |
dc:identifier | oai:nb.no:sbr-44 |
dc:description | Tuva Taledatabase was created by Max Manus AS for testing and evaluating the dictation solution "Tuva". The database contains about 24 hours of recorded speech from 40 speakers. 36 of them speak a dialect close to Bokmål, four a dialect close to Nynorsk. About 70% of the material is manuscript-read speech and 30% is spontaneous speech. The manuscripts in the manuscript-read part of the corpus are mostly short newspaper articles. Of these manuscripts, 25% are read by all speakers, while the remaining 75% are unique to each speaker. All punctuation (full stops, commas, paragraph breaks, etc.) is read aloud by the speakers, and all audio recordings are orthographically transcribed in two different formats. For Nynorsk, the corpus contains only manuscript-read speech. The speakers in Tuva Taledatabase were selected to represent a cross section of the Norwegian working population, balanced for age, gender and dialect. All recordings were made at a 48 kHz sampling rate with 32-bit resolution, using one microphone in one channel (mono). The recordings were made in a recording studio in Oslo. |
dc:publisher | |
dc:format | downloadable |
dc:date | 2016-01-01 |
dc:date | 2017-06-01 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY (CC-BY) |
dc:rights | https://creativecommons.org/licenses/by/4.0/ |
dc:creator | Max Manus AS |
dc:lang | Norwegian |
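
The Dublin Core rows above repeat some keys (`dc:date`, `dc:rights`), so loading them into a plain dictionary would silently drop values. The following is a minimal sketch of reading such rows into a key-to-list mapping, assuming each row has the `key | value |` shape shown on this page; the function name `parse_dc_rows` is hypothetical and is not part of any Språkbanken tooling.

```python
from collections import defaultdict


def parse_dc_rows(text):
    """Collect 'key | value |' rows into {key: [values, ...]}, keeping repeats."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip anything that is not a key/value row
        # Split on the first pipe only, so the value may itself contain pipes.
        key, _, rest = line.partition("|")
        # Drop the trailing pipe and surrounding whitespace from the value.
        value = rest.strip().rstrip("|").strip()
        fields[key.strip()].append(value)
    return dict(fields)


# A few rows copied from the record, including a repeated key and an empty value.
sample = """\
dc:type | corpus |
dc:date | 2016-01-01 |
dc:date | 2017-06-01 |
dc:publisher | |
"""
records = parse_dc_rows(sample)
```

Keeping every value as a list, even for keys that occur once, means callers do not need to special-case repeated fields; an empty field such as `dc:publisher` is kept as an empty string rather than dropped.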