Nordic Dialect Corpus – downloadable transcriptions
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: Nordic Dialect Corpus – downloadable transcriptions
- resource Name: Nordisk dialektkorpus – nedlastbare transkripsjoner
- description: Nordic Dialect Corpus v. 4.0 is a corpus of Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian spoken language. It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus come from a variety of sources (see homepage – Data Collection), recorded between 1998 and 2015. The corpus contains more than 2.75 million words from conversations and interviews with dialect speakers. The downloadable version of the corpus contains all transcriptions in the corpus, in both txt and html format. The Norwegian and Övdalian transcriptions are available in two versions: one phonetic and one orthographic. The other transcriptions are orthographic only.
- resource Short Name: NDC – downloadable transcriptions
- url: http://www.tekstlab.uio.no/scandiasyn/download.html
- P I D: http://hdl.handle.net/11538/0000-0005-E7C7-6
- distribution Info
- licence Info
- user Category: Public
- distribution Access Medium: downloadable
- download Location: http://www.tekstlab.uio.no/scandiasyn/download.html
- execution Location: http://www.tekstlab.uio.no/nota/scandiasyn/
- licence
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
- licence Url: http://creativecommons.org/licenses/by-nc-sa/4.0/
- conditions Of Use: BY
- conditions Of Use: NC
- conditions Of Use: SA
- non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the video and audio files are accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- licence Info
- contact
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- metadata Info
- metadata Creation Date: 03.02.2015
- metadata Last Date Updated: 16.04.2021
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- sex: female
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- validation Info
- validated: true
- validation Type: content
- validation Mode: manual
- validation Mode Details: The transcriptions have been proofread against the audio files. The national projects NorDiaSyn, DanDiaSyn and SweDiaSyn have proofread their own transcriptions; see homepage – Transcription.
- validation Extent: full
- resource Documentation Info
- documentation Structured
- role: documentation
- document Info
- document Type: other
- title: Nordic Dialect Corpus and Syntax Database
- author: The Text Laboratory
- year: 2013
- url: http://www.tekstlab.uio.no/nota/scandiasyn/
- document Language Id: en
- documentation Structured
- role: documentation
- document Info
- document Type: manual
- title: The Nordic Dialect Corpus – Search Interface Documentation
- author: Eirik Olsen
- year: 2014
- url: http://www.tekstlab.uio.no/nota/scandiasyn/help/
- document Language Id: en
- documentation Structured
- role: documentation
- document Info
- document Type: book
- title: Om artiklene i denne boka og Nordisk dialektkorpus
- editor: Janne Bondi Johannessen and Kristin Hagen
- year: 2014
- publisher: Novus forlag
- book Title: Språk i Norge og nabolanda. Ny forskning om talespråk.
- I S B N: 978-82-7099-795-4
- document Language Name: Norwegian bokmål
- document Language Id: nb
- documentation Structured
- resource Creation Info
- creation Start Date: 01.01.2005
- creation End Date: 01.10.2019
- funding Project:
- project Info
- project Name: Scandinavian Dialect Syntax
- project Short Name: ScanDiaSyn
- url: http://websim.arkivert.uit.no/scandiasyn/scandiasyn/index.html%3fcolapsemenu=colapsemenu
- url: http://www.tekstlab.uio.no/nota/scandiasyn/index.html
- funding Type: other
- funder: http://websim.arkivert.uit.no/scandiasyn/scandiasyn/29
- funding Project:
- project Info
- project Name: NorDiaSyn – Norsk dialektsyntaks
- project Name: Nordiasyn – Norwegian Dialect Syntax
- project Short Name: Nordiasyn
- url: http://www.tekstlab.uio.no/nota/NorDiaSyn/index.html
- url: http://www.tekstlab.uio.no/nota/NorDiaSyn/english/index.html
- funding Type: nationalFunds
- funder: The Research Council of Norway
- funding Country: Norway
- project Start Date: 01.01.2009
- project End Date: 31.12.2013
- funding Project:
- project Info
- project Name: For the funding of the national projects in Norway, Sweden, Denmark, Iceland and the Faroe Islands, see under National Projects: http://www.tekstlab.uio.no/nota/scandiasyn/dialect_data_collection.html
- url: http://www.tekstlab.uio.no/nota/scandiasyn/dialect_data_collection.html
- funding Type: nationalFunds
- corpus Info
- corpus Type: Written Corpus
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: Downloadable transcriptions in txt and html format
- size Per Text Format
- size Info
- size: 2 754 289
- size Unit: tokens
- size Info
- character Encoding Info
- character Encoding: utf-8
- text Format Info
- corpus Part General Info
- person Source Set Info
- number Of Persons: 737
- age Of Persons: teenager
- age Of Persons: adult
- age Of Persons: elderly
- age Range Start: 11
- age Range End: 94
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: Dialects from Norway, Sweden, Denmark, the Faroe Islands, Iceland and Älvdalen.
- geographic Distribution Of Persons: Norway, Sweden, Denmark, the Faroe Islands, Iceland and Älvdalen
- linguality Info
- linguality Type: multilingual
- multilinguality Type: other
- multilinguality Type Details: Interviews and conversations in 5 Scandinavian languages.
- language Info
- language Id: nb
- language Name: Norwegian Bokmål (the orthographic transcriptions)
- size Per Language
- size Info
- size: 1 997 920
- size Unit: tokens
- size Info
- language Variety Info
- language Variety Type: dialect
- language Variety Name: Dialects from 111 places in Norway, 438 informants
- language Info
- language Id: sv
- language Name: Swedish (Övdalian included)
- size Per Language
- size Info
- size: 376 868, of which 14 798 are Övdalian
- size Unit: tokens
- size Info
- language Variety Info
- language Variety Type: dialect
- language Variety Name: Dialects from 44 places in Sweden, 150 informants; 17 of the informants, from 7 places, are Övdalian
- language Info
- language Id: da
- language Name: Danish
- size Per Language
- size Info
- size: 220 360
- size Unit: tokens
- size Info
- language Variety Info
- language Variety Type: dialect
- language Variety Name: Dialects from 15 places in Denmark, 81 informants
- language Info
- language Id: is
- language Name: Icelandic
- size Per Language
- size Info
- size: 94 338
- size Unit: tokens
- size Info
- language Variety Info
- language Variety Type: dialect
- language Variety Name: Dialects from 8 places in Iceland, 48 informants
- language Info
- language Id: fo
- language Name: Faroese
- size Per Language
- size Info
- size: 64 803
- size Unit: tokens
- size Info
- language Variety Info
- language Variety Type: dialect
- language Variety Name: Dialects from 5 places on the Faroe Islands, 20 informants
- modality Info
- modality Type: spokenLanguage
- size Info
- size: 2 754 289
- size Unit: tokens
- annotation Info
- annotation Type: speechAnnotation-phoneticTranscription
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: Norwegian and Övdalian have phonetic transcriptions, see http://www.tekstlab.uio.no/nota/scandiasyn/transcription.html
- annotation Info
- annotation Type: speechAnnotation-orthographicTranscription
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: All languages are orthographically transcribed, see http://www.tekstlab.uio.no/nota/scandiasyn/transcription.html
- annotation Tool
- target Resource Name U R I: Transcriber (http://trans.sourceforge.net/en/presentation.php), ELAN (https://tla.mpi.nl/tools/tla-tools/elan/)
- annotation Tool
- target Resource Name U R I: For Norwegian and Övdalian: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/oslo-transliterator/index.html
- classification Info
- genre Info
- genre Type: speechGenre
- genre: informal
- unstandardised Genre: conversations
- genre Info
- classification Info
- genre Info
- genre Type: speechGenre
- genre: semi-formal
- unstandardised Genre: interviews
- genre Info
- time Coverage Info
- time Coverage: 1998 – 2015
- geographic Coverage Info
- geographic Coverage: 183 places in Norway, Sweden, Denmark, the Faroe Islands, Iceland and Älvdalen
- recording Info
- recording Environment: office
- recording Environment: closedPublicPlace
- recording Environment: conferenceRoom
- recording Environment: lectureRoom
- recording Environment: other
- person Source Set Info
dc:type | corpus |
dc:title | Nordic Dialect Corpus – downloadable transcriptions |
dc:identifier | oai:tekstlab.uio.no:nordic-dialect-corpus-transcriptions |
dc:description | Nordic Dialect Corpus v. 4.0 is a corpus of Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian spoken language. It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus come from a variety of sources (see homepage – Data Collection), recorded between 1998 and 2015. The corpus contains more than 2.75 million words from conversations and interviews with dialect speakers. The downloadable version of the corpus contains all transcriptions in the corpus, in both txt and html format. The Norwegian and Övdalian transcriptions are available in two versions: one phonetic and one orthographic. The other transcriptions are orthographic only. |
dc:publisher | |
dc:format | downloadable |
dc:date | 2005-01-01 |
dc:date | 2019-10-01 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY-NC-SA (CC-BY-NC-SA) |
dc:rights | http://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc:lang | Norwegian Bokmål (the orthographic transcriptions) |
dc:lang | Swedish (Övdalian included) |
dc:lang | Danish |
dc:lang | Icelandic |
dc:lang | Faroese |