Nordic Dialect Corpus v. 4.0
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: Nordic Dialect Corpus v. 4.0
- resource Name: Nordisk dialektkorpus v. 4.0
- description: Nordic Dialect Corpus v. 4.0 is a corpus of Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian spoken language. It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus comes from a variety of sources (see homepage – Data Collection), recorded in 1998 – 2015. The corpus contains more than 2.75 million words from conversations and interviews by dialect speakers. It is transcribed and linked to audio and video, has a map function, and can be searched in a large variety of ways. Although the corpus was built for Nordic syntax research, it is a general-purpose resource (a Norwegian Dialect Corpus, a Swedish Dialect Corpus and so on) that can be used in a wide range of research areas, such as phonology, morphology and lexicography. Note! v. 3.0 contains old recordings and transcriptions from Målførearkivet (Oslo Old Dialect Archive). The same transcriptions are now searchable in LIA Norwegian – Corpus of Old Dialect Recordings. Use v. 4.0 to search the corpus without the old Målførearkivet recordings.
- resource Short Name: NDC – Nordic Dialect Corpus v. 4.0
- url: http://www.tekstlab.uio.no/nota/scandiasyn/
- PID: http://hdl.handle.net/11538/0000-0005-E7C7-6
- distribution Info
- licence Info
- user Category: Academic
- distribution Access Medium: accessibleThroughInterface
- execution Location: http://www.tekstlab.uio.no/nota/scandiasyn/
- licence
- licence Family: CLARIN
- licence Name: CLARIN_ACA-NC-LOC-PRIV-ND-*
- licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
- conditions Of Use: *
- conditions Of Use: BY
- conditions Of Use: ID
- conditions Of Use: LOC
- conditions Of Use: NC
- conditions Of Use: ND
- conditions Of Use: NORED
- conditions Of Use: PRIV
- non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. The video and audio excerpts returned by the search interface cannot be shown in public unless you have an agreement with the Text Laboratory. Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- licence Info
- contact
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- metadata Info
- metadata Creation Date: 03.02.2015
- metadata Last Date Updated: 16.04.2021
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- sex: female
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- validation Info
- validated: true
- validation Type: content
- validation Mode: manual
- validation Mode Details: The transcriptions have been proofread against the audio files. The national projects NorDiaSyn, DanDiaSyn and SweDiaSyn have proofread their own transcriptions; see homepage – Transcription.
- validation Extent: full
- resource Documentation Info
- documentation Structured
- role: documentation
- document Info
- document Type: other
- title: Nordic Dialect Corpus and Syntax Database
- author: The Text Laboratory
- year: 2013
- url: http://www.tekstlab.uio.no/nota/scandiasyn/
- document Language Id: en
- documentation Structured
- role: documentation
- document Info
- document Type: manual
- title: The Nordic Dialect Corpus – Search Interface Documentation
- author: Eirik Olsen
- year: 2014
- url: http://www.tekstlab.uio.no/nota/scandiasyn/help/
- document Language Id: en
- documentation Structured
- role: documentation
- document Info
- document Type: book
- title: Om artiklene i denne boka og Nordisk dialektkorpus
- editor: Janne Bondi Johannessen and Kristin Hagen
- year: 2014
- publisher: Novus forlag
- book Title: Språk i Norge og nabolanda. Ny forskning om talespråk.
- ISBN: 978-82-7099-795-4
- document Language Name: Norwegian bokmål
- document Language Id: nb
- documentation Structured
- resource Creation Info
- creation Start Date: 01.01.2005
- creation End Date: 01.10.2019
- funding Project:
- project Info
- project Name: Scandinavian Dialect Syntax
- project Short Name: ScanDiaSyn
- url: http://websim.arkivert.uit.no/scandiasyn/scandiasyn/index.html%3fcolapsemenu=colapsemenu
- url: http://www.tekstlab.uio.no/nota/scandiasyn/index.html
- funding Type: other
- funder: http://websim.arkivert.uit.no/scandiasyn/scandiasyn/29
- funding Project:
- project Info
- project Name: NorDiaSyn – Norsk dialektsyntaks
- project Name: Nordiasyn – Norwegian Dialect Syntax
- project Short Name: Nordiasyn
- url: http://www.tekstlab.uio.no/nota/NorDiaSyn/index.html
- url: http://www.tekstlab.uio.no/nota/NorDiaSyn/english/index.html
- funding Type: nationalFunds
- funder: The Research Council of Norway
- funding Country: Norway
- project Start Date: 01.01.2009
- project End Date: 31.12.2013
- funding Project:
- project Info
- project Name: For the funding of the national projects in Norway, Sweden, Denmark, Iceland and the Faroe Islands, see under National Projects: http://www.tekstlab.uio.no/nota/scandiasyn/dialect_data_collection.html
- url: http://www.tekstlab.uio.no/nota/scandiasyn/dialect_data_collection.html
- funding Type: nationalFunds
- corpus Info
- corpus Type: Multimodal Corpus
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: txt
- size Per Text Format
- size Info
- size: 2 754 289
- size Unit: tokens
- size Info
- character Encoding Info
- character Encoding: utf-8
- text Format Info
- corpus Part Info
- media Type: video
- corpus Video Info
- video Content Info
- type Of Video Content: Some recordings in the corpus are audio only; the video recordings are listed here. Norway: informal conversations and semi-formal interviews, 438 informants from 111 places. Älvdalen, Sweden: interviews and conversations, 17 informants from 7 places. Denmark: interviews and conversations, 18 informants from 4 places. Faroe Islands: interviews and conversations, 20 informants from 5 places. Iceland: conversations, 6 informants from 2 places.
- text Included In Video: none
- dynamic Element Info
- body Parts: face
- body Parts: arms
- setting Info
- naturality: spontaneous
- conversational Type: dialogue
- audience: few
- interactivity: overlapping
- interaction: Two scenarios in the corpus: 1) Semi-formal interview: research assistant or researcher and informant(s). 2) Free conversation between two informants. Research assistants were sometimes passively present in the room during the conversations to prevent conversations about sensitive matters.
- video Format Info
- mime Type: videos in mpeg4 streaming format available through Glossa
- frame Rate: 25
- resolution Info
- size Width: 400
- size Height: 300
- resolution Standard: HD.720
- compression Info
- compression: true
- compression Name: mpg
- video Content Info
- corpus Part Info
- media Type: audio
- corpus Audio Info
- audio Size Info
- size Info
- size: approx 27 GB
- size Unit: gb
- size Info
- audio Content Info
- textual Description: Norway: 1) Old audio recordings from Målførearkivet, University of Oslo: interviews, 126 informants from 52 places. (In v. 4.0 these recordings have been moved to LIA Norwegian – Corpus of Old Dialect Recordings. They are still searchable in NDC v. 3.0.) 2) New recordings: informal conversations and semi-formal interviews, 438 informants from 111 places. Sweden: interviews, 133 informants from 37 places. Älvdalen, Sweden: interviews and conversations, 17 informants from 7 places. Denmark: interviews, 81 informants from 15 places. Iceland: interviews and conversations, 48 informants from 8 places. Faroe Islands: interviews and conversations, 20 informants from 5 places.
- setting Info
- naturality: spontaneous
- conversational Type: dialogue
- audience: few
- interactivity: overlapping
- interaction: Two scenarios: 1) (semiformal) interview: research assistant or researcher and informant(s). 2) Free conversation between two informants. Research assistants were sometimes passively present in the room during the conversations to prevent conversations about sensitive matters
- audio Format Info
- mime Type: wav and mp3
- signal Encoding: linearPCM
- sampling Rate: 32
- quantization: 64
- number Of Tracks: 1
- recording Quality: medium
- compression Info
- compression: true
- compression Name: mp3
- audio Size Info
- corpus Part General Info
- person Source Set Info
- number Of Persons: 737
- age Of Persons: teenager
- age Of Persons: adult
- age Of Persons: elderly
- age Range Start: 11
- age Range End: 94
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: Dialects from Norway, Sweden, Denmark, the Faroe Islands, Iceland and Älvdalen.
- geographic Distribution Of Persons: Norway, Sweden, Denmark, the Faroe Islands, Iceland and Älvdalen
- linguality Info
- linguality Type: multilingual
- multilinguality Type: other
- multilinguality Type Details: Interviews and conversations in 5 Scandinavian languages. Can be translated into English with Google Translate.
- language Info
- language Id: nb
- language Name: Norwegian Bokmål (the orthographic transcriptions)
- size Per Language
- size Info
- size: 1 997 920
- size Unit: tokens
- size Info
- language Variety Info
- language Variety Type: dialect
- language Variety Name: Dialects from 111 places in Norway, 438 informants
- language Info
- language Id: sv
- language Name: Swedish (Övdalian included)
- size Per Language
- size Info
- size: 376 868 (14 798 of them are Övdalian)
- size Unit: tokens
- size Info
- language Variety Info
- language Variety Type: dialect
- language Variety Name: Dialects from 44 places in Sweden, 150 informants; 17 of the informants, from 7 places, are Övdalian
- language Info
- language Id: da
- language Name: Danish
- size Per Language
- size Info
- size: 220 360
- size Unit: tokens
- size Info
- language Variety Info
- language Variety Type: dialect
- language Variety Name: Dialects from 15 places in Denmark, 81 informants
- language Info
- language Id: is
- language Name: Icelandic
- size Per Language
- size Info
- size: 94 338
- size Unit: tokens
- size Info
- language Variety Info
- language Variety Type: dialect
- language Variety Name: Dialects from 8 places in Iceland, 48 informants
- language Info
- language Id: fo
- language Name: Faroese
- size Per Language
- size Info
- size: 64 803
- size Unit: tokens
- size Info
- language Variety Info
- language Variety Type: dialect
- language Variety Name: Dialects from 5 places on the Faroe Islands, 20 informants
- modality Info
- modality Type: spokenLanguage
- size Info
- size: 2 754 289
- size Unit: tokens
- annotation Info
- annotation Type: morphosyntacticAnnotation-posTagging
- annotated Elements: other
- segmentation Level: word
- annotation Format: See http://www.tekstlab.uio.no/nota/scandiasyn/tagging.html for tagging of the five languages
- tagset: See http://www.tekstlab.uio.no/nota/scandiasyn/tagging.html for tagging of the five languages
- annotation Mode: automatic
- annotation Info
- annotation Type: speechAnnotation-phoneticTranscription
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: Norwegian and Övdalian have phonetic transcriptions, see http://www.tekstlab.uio.no/nota/scandiasyn/transcription.html
- annotation Info
- annotation Type: speechAnnotation-orthographicTranscription
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: All languages are orthographically transcribed, see http://www.tekstlab.uio.no/nota/scandiasyn/transcription.html
- annotation Tool
- target Resource Name URI: Transcriber (http://trans.sourceforge.net/en/presentation.php), ELAN (https://tla.mpi.nl/tools/tla-tools/elan/)
- annotation Tool
- target Resource Name URI: For Norwegian and Övdalian: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/oslo-transliterator/index.html
- classification Info
- genre Info
- genre Type: speechGenre
- genre: informal
- unstandardised Genre: conversations
- genre Info
- classification Info
- genre Info
- genre Type: speechGenre
- genre: semi formal
- unstandardised Genre: interviews
- genre Info
- time Coverage Info
- time Coverage: 1998 – 2015
- geographic Coverage Info
- geographic Coverage: 183 places in Norway, Sweden, Denmark, the Faroe Islands, Iceland and Älvdalen
- recording Info
- recording Device Type: tapeVHS
- recording Device Type: other
- recording Environment: office
- recording Environment: closedPublicPlace
- recording Environment: conferenceRoom
- recording Environment: lectureRoom
- recording Environment: other
- capture Info
- capturing Device Type: closeTalkMicrophone
- capturing Device Type: camera
- person Source Set Info
dc:type | corpus |
dc:title | Nordic Dialect Corpus v. 4.0 |
dc:identifier | oai:tekstlab.uio.no:nordic-dialect-corpus-v4 |
dc:description | Nordic Dialect Corpus v. 4.0 is a corpus of Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian spoken language. It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus comes from a variety of sources (see homepage – Data Collection), recorded in 1998 – 2015. The corpus contains more than 2.75 million words from conversations and interviews by dialect speakers. It is transcribed and linked to audio and video, has a map function, and can be searched in a large variety of ways. Although the corpus was built for Nordic syntax research, it is a general-purpose resource (a Norwegian Dialect Corpus, a Swedish Dialect Corpus and so on) that can be used in a wide range of research areas, such as phonology, morphology and lexicography. Note! v. 3.0 contains old recordings and transcriptions from Målførearkivet (Oslo Old Dialect Archive). The same transcriptions are now searchable in LIA Norwegian – Corpus of Old Dialect Recordings. Use v. 4.0 to search the corpus without the old Målførearkivet recordings. |
dc:publisher | |
dc:format | accessibleThroughInterface |
dc:date | 2005-01-01 |
dc:date | 2019-10-01 |
dc:rights | Academic |
dc:rights | CLARIN |
dc:rights | CLARIN_ACA-NC-LOC-PRIV-ND-* |
dc:rights | https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1 |
dc:lang | Norwegian Bokmål (the orthographic transcriptions) |
dc:lang | Swedish (Övdalian included) |
dc:lang | Danish |
dc:lang | Icelandic |
dc:lang | Faroese |