The NDC Treebank
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: NDC-trebanken
- resource Name: The NDC Treebank
- description: The NDC Treebank includes 4637 speech segments and 66 042 tokens from the Norwegian part of the Nordic Dialect Corpus. The segments are taken from 30 transcribed interviews from 17 places in Norway. The treebank is annotated with morphological and dependency-style syntactic analyses, which have been manually corrected. It is available in two versions: a downloadable version in CoNLL-X format and a searchable version in the Glossa search interface. The Nordic Dialect Corpus is a corpus of spontaneously spoken Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian dialects.
- description: The NDC Treebank contains 4637 speech segments and 66 042 words/tokens from the Norwegian part of the Nordic Dialect Corpus. The segments are taken from 30 transcribed interviews from 17 places in Norway. The treebank is annotated with morphological and syntactic information and manually corrected. It is available in two versions: a downloadable version in CoNLL-X format and a version searchable in the Glossa search interface. The Nordic Dialect Corpus is a spoken-language corpus of spontaneous speech from Norwegian, Swedish, Danish, Icelandic and Faroese dialects.
- resource Short Name: The NDC Treebank
- resource Short Name: NDC-trebanken
- url: http://www.tekstlab.uio.no/scandiasyn/index.html
- url: http://www.tekstlab.uio.no/nota/scandiasyn/treebank.html
- PID: http://hdl.handle.net/11538/0000-0005-E7C7-6
- distribution Info
- licence Info
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://github.com/textlab/spoken_norwegian_resources/tree/master/treebanks/Norwegian-BokmaalNDC
- licence
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
- conditions Of Use: BY
- conditions Of Use: NC
- conditions Of Use: SA
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UoO
- organization Short Name: UiO
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- department Name: Department of Linguistics and Scandinavian Studies
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: Oslo
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- department Name: Department of Linguistics and Scandinavian Studies
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: Oslo
- country: Norway
- licence Info
- user Category: Academic
- distribution Access Medium: accessibleThroughInterface
- execution Location: https://tekstlab.uio.no/glossa3/ndctrebanken
- licence
- licence Family: CLARIN
- licence Name: CLARIN_ACA-NC-LOC-PRIV-ND-*
- licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
- conditions Of Use: BY
- conditions Of Use: ID
- conditions Of Use: LOC
- conditions Of Use: NC
- conditions Of Use: ND
- conditions Of Use: NORED
- conditions Of Use: PRIV
- non Standard Conditions Of Use: The treebank includes audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the version of the treebank linked to audio and video files is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. Video and audio excerpts retrieved through the search interface cannot be shown publicly without an agreement with the Text Laboratory. Every individual researcher is responsible for treating the participants in the corpus with respect and sincerity, and the participants must be kept anonymous in every published paper or other output.
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/
- address: Box 1102 Blindern
- zip Code: 0317
- city: Oslo
- country: Norway
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: Oslo
- country: Norway
- metadata Creation Date: 16.11.2022
- metadata Last Date Updated: 04.01.2024
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: kristin.hagen@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: Oslo
- country: Norway
- version: CoNLL-X and Glossa versions, November 2022
- validated: true
- validation Type: content
- validation Mode: manual
- validation Mode Details: The treebank has been manually corrected by at least one person.
- validation Extent: partial
- documentation Unstructured
- role: documentation
- document Unstructured: http://www.tekstlab.uio.no/nota/scandiasyn/treebank.html
- documentation Structured
- role: documentation
- document Info
- document Type: proceedings
- title: The Norwegian Dialect Corpus Treebank
- author: Andre Kåsen, Kristin Hagen, Anders Nøklestad, Joel Priestley, Per Erik Solberg and Dag Trygve Truslew Haug
- editor: Nicoletta Calzolari et al.
- year: 2022
- book Title: Proceedings of the Thirteenth Language Resources and Evaluation Conference
- url: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.516.pdf
- creation Start Date: 01.06.2021
- creation End Date: 01.12.2022
- resource Creator
- actor Info
- actor Type: organization
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: Oslo
- country: Norway
- funding Project:
- project Info
- project Name: Common Language Resources and Technology Infrastructure Norway+
- project Short Name: CLARINO+
- project ID: 295700
- url: http://clarin.b.uib.no/
- funding Type: nationalFunds
- funder: The Research Council of Norway
- funding Country: Norway
- project Start Date: 01.03.2020
- project End Date: 31.12.2023
- corpus Info
- corpus Type: Treebank
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: CoNLL-X format (downloadable)
- size Per Text Format
- size Info
- size: 66 042
- size Unit: tokens
- size Info
- size: 4637 speech segments
- size Unit: utterances
- character Encoding Info
- character Encoding: utf-8
- media Type: audio
- corpus Audio Info
- audio Size Info
- size Info
- size: 30 wav files
- size Unit: files
- setting Info
- naturality: spontaneous
- conversational Type: dialogue
- audience: few
- interactivity: overlapping
- interaction: Semi-formal or informal interviews with one interviewer.
- audio Format Info
- mime Type: wav and mp3
- compression Info
- compression: true
- compression Name: mp3
- corpus Part Info
- media Type: video
- corpus Video Info
- video Content Info
- type Of Video Content: Semi-formal or informal interviews with one interviewer.
- setting Info
- naturality: spontaneous
- conversational Type: dialogue
- audience: few
- interactivity: overlapping
- interaction: Semi-formal or informal interviews with one interviewer.
- video Format Info
- mime Type: mp4
- compression Info
- compression: true
- compression Name: mpg
- corpus Part General Info
- person Source Set Info
- age Of Persons: teenager
- age Of Persons: adult
- age Of Persons: elderly
- age Range Start: 14
- age Range End: 91
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: Dialects from 17 places in Norway
- geographic Distribution Of Persons: All over Norway
- linguality Info
- linguality Type: monolingual
- language Info
- language Id: no
- language Name: Norwegian
- language Info
- language Id: nb
- language Name: Norwegian Bokmål
- modality Info
- modality Type: spokenLanguage
- modality Type Details: Norwegian dialects, orthographically transcribed
- size Info
- size: 66 042
- size Unit: tokens
- size Info
- size: 4637 speech segments
- size Unit: utterances
- annotation Info
- annotation Type: speechAnnotation-orthographicTranscription
- annotation Type: morphosyntacticAnnotation-posTagging
- annotation Type: syntacticAnnotation-treebanks
- annotation Description: Original version in CoNLL-X format, annotated with morphological and dependency-style syntactic analyses.
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: NDT Guidelines for Morphological and Syntactic Annotation
- author: Kari Kinn, Per Erik Solberg and Pål Kristian Eriksen. Translated from Norwegian into English by Per Erik Solberg
- year: 2013
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-10/
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: Retningslinjer for syntaktisk annotasjon i LIA (Guidelines for syntactic annotation in LIA)
- author: Andre Kåsen, Kristin Hagen, Lilja Øvrelid, Signe Laake, Håvard Østli
- year: 2019
- url: http://tekstlab.uio.no/LIA/pdf/parseretningslinjer-lia12042019.pdf
- annotation Tool
- target Resource Name URI: ConlluEditor
- annotation Tool
- target Resource Name URI: https://aclanthology.org/W19-8010/
- classification Info
- genre Info
- genre Type: speechGenre
- genre: informal
- unstandardised Genre: informal interviews
- classification Info
- genre Info
- genre Type: speechGenre
- genre: semi-formal
- unstandardised Genre: semi-formal interviews
- time Coverage Info
- time Coverage: 2007 – 2010
- geographic Coverage Info
- geographic Coverage: 17 places from all over Norway
| Element | Value |
|---|---|
| dc:type | corpus |
| dc:title | The NDC Treebank |
| dc:identifier | oai:tekstlab.uio.no:ndc-trebanken |
| dc:description | The NDC Treebank includes 4637 speech segments and 66 042 tokens from the Norwegian part of the Nordic Dialect Corpus. The segments are taken from 30 transcribed interviews from 17 places in Norway. The treebank is annotated with morphological and dependency-style syntactic analyses, which have been manually corrected. It is available in two versions: a downloadable version in CoNLL-X format and a searchable version in the Glossa search interface. The Nordic Dialect Corpus is a corpus of spontaneously spoken Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian dialects. |
| dc:publisher | |
| dc:format | downloadable |
| dc:date | 2021-06-01 |
| dc:date | 2022-12-01 |
| dc:rights | Public |
| dc:rights | Creative Commons (CC) |
| dc:rights | Creative_Commons-BY-NC-SA (CC-BY-NC-SA) |
| dc:rights | https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1 |
| dc:lang | Norwegian |
| dc:lang | Norwegian Bokmål |
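The downloadable version of the treebank is distributed in CoNLL-X format. As a hedged illustration of what that format involves, here is a minimal Python sketch of a reader. It assumes the standard CoNLL-X layout: ten tab-separated columns per token (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and one blank line after each sentence. The `read_conllx` function, the `Token` class and the two-token sample are hypothetical and not part of the treebank distribution; check the files at the download location for their actual conventions before relying on this.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class Token:
    """One CoNLL-X token line (the PHEAD/PDEPREL columns are dropped)."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: int
    deprel: str


def read_conllx(stream: TextIO) -> Iterator[List[Token]]:
    """Yield one list of Tokens per sentence (speech segment).

    Assumes ten tab-separated columns per token and a blank line
    after each sentence, as in the CoNLL-X shared-task format.
    """
    sentence: List[Token] = []
    for raw in stream:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: end of the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # CoNLL-X has no comment lines; skipped defensively in case
            # the file carries CoNLL-U-style metadata (an assumption).
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                              cols[4], cols[5], int(cols[6]), cols[7]))
    if sentence:
        # The input did not end with a blank line.
        yield sentence


# Toy two-token example; the tags and relation labels are illustrative only.
SAMPLE = (
    "1\tdet\tdet\tpron\tpron\t_\t2\tSUBJ\t_\t_\n"
    "2\tvar\tvære\tverb\tverb\t_\t0\tFINV\t_\t_\n"
    "\n"
)

sentences = list(read_conllx(io.StringIO(SAMPLE)))
n_tokens = sum(len(s) for s in sentences)
print(len(sentences), n_tokens)  # prints: 1 2
```

Applied to a downloaded file opened with `encoding="utf-8"` (the character encoding given in this record), the sentence and token totals can be compared with the sizes listed above (4637 speech segments, 66 042 tokens).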