NDC-trebanken
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: NDC-trebanken
- resource Name: The NDC Treebank
- description: The NDC Treebank includes 4637 speech segments and 66 042 tokens from the Norwegian part of the Nordic Dialect Corpus. The segments are taken from 30 transcribed interviews from 17 places in Norway. The treebank is annotated with morphological and dependency-style syntactic analysis, and the annotation is manually corrected. The treebank is available in two versions: a downloadable version in conllx format and a searchable version in the Glossa search interface. The Nordic Dialect Corpus is a corpus of spontaneous speech in Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian dialects.
- description: The NDC Treebank contains 4637 speech segments and 66 042 words/tokens from the Norwegian part of the Nordic Dialect Corpus. The segments are drawn from 30 transcribed interviews from 17 places in Norway. The treebank is annotated with morphological and syntactic information and manually corrected. The treebank is available in two versions: a downloadable version in conllx format and a searchable version in the Glossa search interface. The Nordic Dialect Corpus is a spoken-language corpus of spontaneous speech from Norwegian, Swedish, Danish, Icelandic and Faroese dialects.
- resource Short Name: The NDC Treebank
- resource Short Name: NDC-trebanken
- url: http://www.tekstlab.uio.no/scandiasyn/index.html
- url: http://www.tekstlab.uio.no/nota/scandiasyn/treebank.html
- P I D: http://hdl.handle.net/11538/0000-0005-E7C7-6
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://github.com/textlab/spoken_norwegian_resources/tree/master/treebanks/Norwegian-BokmaalNDC
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
- conditions Of Use: BY
- conditions Of Use: NC
- conditions Of Use: SA
- licensor:
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UoO
- organization Short Name: UiO
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- department Name: Department of Linguistics and Scandinavian Studies
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: Oslo
- country: Norway
- distribution Rights Holder:
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- department Name: Department of Linguistics and Scandinavian Studies
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: Oslo
- country: Norway
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- actor Info:
- actor Type: person
- person Info:
- surname: Hagen
- given Name: Kristin
- actor Info:
- actor Type: organization
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: Oslo
- country: Norway
- corpus Info:
- corpus Type: Treebank
- corpus Part Info:
- media Type: text
- corpus Text Info:
- text Format Info:
- mime Type: Downloadable in conllx-format
- size Per Text Format:
- size Info:
- size: 66 042
- size Unit: tokens
- size Info:
- size: 4637 speech segments
- size Unit: utterances
- character Encoding Info:
- character Encoding: utf-8
- corpus Part Info:
- media Type: audio
- corpus Audio Info:
- audio Size Info:
- size Info:
- size: 30 wav files
- size Unit: files
- setting Info:
- naturality: spontaneous
- conversational Type: dialogue
- audience: few
- interactivity: overlapping
- interaction: Semiformal or informal interviews with one interviewer.
- audio Format Info:
- mime Type: wav and mp3
- compression Info:
- compression: true
- compression Name: mp3
- corpus Part Info:
- media Type: video
- corpus Video Info:
- video Content Info:
- type Of Video Content: Semiformal or informal interviews with one interviewer.
- setting Info:
- naturality: spontaneous
- conversational Type: dialogue
- audience: few
- interactivity: overlapping
- interaction: Semiformal or informal interviews with one interviewer.
- video Format Info:
- mime Type: mp4
- compression Info:
- compression: true
- compression Name: mpg
- corpus Part General Info:
- person Source Set Info:
- age Of Persons: teenager
- age Of Persons: adult
- age Of Persons: elderly
- age Range Start: 14
- age Range End: 91
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: Dialects from 17 places in Norway
- geographic Distribution Of Persons: All over Norway
- linguality Info:
- linguality Type: monolingual
- language Info:
- language Id: No
- language Name: Norwegian
- language Info:
- language Id: Nb
- language Name: Norwegian Bokmål
- modality Info:
- modality Type: spokenLanguage
- modality Type Details: Norwegian dialects. Orthographic transcription
- size Info:
- size: 66 042
- size Unit: tokens
- size Info:
- size: 4637 speech segments
- size Unit: utterances
- annotation Info:
- annotation Type: speechAnnotation-orthographicTranscription
- annotation Type: morphosyntacticAnnotation-posTagging
- annotation Type: syntacticAnnotation-treebanks
- annotation Description: Original version in conllx-format, annotated with morphological and dependency-style syntactic analysis.
- annotation Manual Structured:
- role: annotationManual
- document Info:
- document Type: manual
- title: NDT Guidelines for Morphological and Syntactic Annotation
- author: Kari Kinn, Per Erik Solberg and Pål Kristian Eriksen. Translated from Norwegian to English by Per Erik Solberg
- year: 2013
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-10/
- annotation Manual Structured:
- role: annotationManual
- document Info:
- document Type: manual
- title: Retningslinjer for syntaktisk annotasjon i LIA
- author: Andre Kåsen, Kristin Hagen, Lilja Øvrelid, Signe Laake, Håvard Østli
- year: 2019
- url: http://tekstlab.uio.no/LIA/pdf/parseretningslinjer-lia12042019.pdf
- annotation Tool:
- target Resource Name U R I: ConlluEditor.
- annotation Tool:
- target Resource Name U R I: https://aclanthology.org/W19-8010/
- classification Info:
- genre Info:
- genre Type: speechGenre
- genre: informal
- unstandardised Genre: informal interviews
- classification Info:
- genre Info:
- genre Type: speechGenre
- genre: semi formal
- unstandardised Genre: semi formal interviews
- time Coverage Info:
- time Coverage: 2007 – 2010
- geographic Coverage Info:
- geographic Coverage: 17 places from all over Norway
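
The record above states that the downloadable version is distributed in conllx format. As a minimal sketch of how such a file could be read, the Python below assumes the standard CoNLL-X layout: ten tab-separated columns per token and a blank line between segments. The `Token` type, the `read_conllx` function and the sample rows are hypothetical illustrations, not part of the resource. The sample tokens and tags are made up, so check the column contents against the actual treebank files before relying on it.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    """One token row, using the ten column names of the CoNLL-X format."""
    id: str
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: str
    deprel: str
    phead: str
    pdeprel: str


def read_conllx(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per segment; a blank line ends a segment."""
    segment: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if segment:
                yield segment
                segment = []
            continue
        fields = line.split("\t")
        if len(fields) != 10:
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        segment.append(Token(*fields))
    if segment:  # the last segment may lack a trailing blank line
        yield segment


# Made-up sample (not taken from the treebank): two segments, three tokens.
SAMPLE = (
    "1\tdet\tdet\tpron\tpron\t_\t2\tSUBJ\t_\t_\n"
    "2\tregner\tregne\tverb\tverb\t_\t0\tROOT\t_\t_\n"
    "\n"
    "1\tja\tja\tinterj\tinterj\t_\t0\tROOT\t_\t_\n"
)

segments = list(read_conllx(SAMPLE.splitlines()))
n_tokens = sum(len(s) for s in segments)
print(f"{len(segments)} segments, {n_tokens} tokens")
```

To read a real file, pass an open file handle instead of the sample, for example `read_conllx(open(path, encoding="utf-8"))`, where `path` is wherever the download was saved. The segment and token counts can then be compared with the sizes listed in the record.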
dc:type | corpus |
dc:title | NDC-trebanken |
dc:identifier | oai:tekstlab.uio.no:ndc-trebanken |
dc:description | The NDC Treebank contains 4637 speech segments and 66 042 words/tokens from the Norwegian part of the Nordic Dialect Corpus. The segments are drawn from 30 transcribed interviews from 17 places in Norway. The treebank is annotated with morphological and syntactic information and manually corrected. The treebank is available in two versions: a downloadable version in conllx format and a searchable version in the Glossa search interface. The Nordic Dialect Corpus is a spoken-language corpus of spontaneous speech from Norwegian, Swedish, Danish, Icelandic and Faroese dialects. |
dc:publisher | |
dc:format | downloadable |
dc:date | 2021-06-01 |
dc:date | 2022-12-01 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY-NC-SA (CC-BY-NC-SA) |
dc:rights | https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1 |
dc:lang | Norwegian |
dc:lang | Bokmål |