Norsk talespråkskorpus – Oslodelen
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: Norsk talespråkskorpus – Oslodelen
- description: NoTa-Oslo is a speech corpus with interviews and conversations from 166 informants born and raised in Oslo and the Oslo area. The informants are carefully selected w.r.t. sociolinguistic variables and therefore representative in terms of age, gender, place of residence and education. NoTa-Oslo consists of approx. 957 000 words that are orthographically transcribed and morphologically tagged. The corpus is searchable in a specially designed search interface, and the transcriptions are linked to audio and video files.
- description: NoTa-Oslo is a speech corpus consisting of interviews and conversations with 166 informants born and raised in Oslo and the Oslo area. The informants are representative with respect to age, gender, place of residence and education. NoTa-Oslo consists of just over 957 000 words that are orthographically transcribed and morphologically tagged. The corpus is available for research and searchable through the search interface Glossa, and the transcriptions are linked to audio and video files. The transcriptions can also be downloaded. NoTa-Oslo was created by the Text Laboratory in the period 2004–2006.
- resource Short Name: NoTa-Oslo
- url:
- url:
- P I D:
- distribution Info
- licence Info
- user Category: Academic
- distribution Access Medium: accessibleThroughInterface
- execution Location:
- execution Location:
- licence
- licence Family: CLARIN
- licence Name: CLARIN_ACA-NC-LOC-PRIV-ND-*
- licence Url:
- conditions Of Use: *
- conditions Of Use: BY
- conditions Of Use: ID
- conditions Of Use: LOC
- conditions Of Use: NC
- conditions Of Use: ND
- conditions Of Use: NORED
- conditions Of Use: PRIV
- non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. The video and audio excerpts given by the search interface cannot be shown in public unless you have an agreement with the Text Laboratory. Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- licence Info
- contact
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- metadata Info
- metadata Creation Date: 26.11.2014
- metadata Last Date Updated: 16.04.2021
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- version Info
- version: Second version (2018)
- validation Info
- validated: true
- validation Type: content
- validation Mode: manual
- validation Mode Details: The transcriptions are proofread against the audio files.
- validation Extent: partial
- validator:
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- documentation Structured
- role: documentation
- document Info
- document Type: other
- title: Norsk talespråkskorpus – Oslodelen
- url:
- documentation Structured
- role: documentation
- document Info
- document Type: inBook
- title: Språk i Oslo. Ny forskning omkring talespråk.
- editor: Janne Bondi Johannessen and Kristin Hagen
- year: 2008
- publisher: Novus forlag
- book Title: Språk i Oslo. Ny forskning omkring talespråk.
- I S B N: 978-82-7099-471-7
- creation Start Date: 01.01.2004
- creation End Date: 31.12.2006
- resource Creator
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- funding Project:
- project Info
- project Name: Norsk talespråkskorpus – Oslodelen
- project Short Name: NoTa-Oslo
- url:
- url:
- funding Type: nationalFunds
- funder: The Research Council of Norway
- project Start Date: 01.01.2004
- project End Date: 31.12.2006
- corpus Info
- corpus Type: Multimodal Corpus
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: txt
- size Per Text Format
- size Info
- size: 957 063
- size Unit: tokens
- size Info
- character Encoding Info
- character Encoding: utf-8
- text Format Info
- corpus Part Info
- media Type: video
- corpus Video Info
- video Content Info
- type Of Video Content: Interviews and conversations from 166 informants born and raised in Oslo and the Oslo area.
- text Included In Video: none
- dynamic Element Info
- body Parts: arms
- body Parts: face
- setting Info
- naturality: spontaneous
- conversational Type: dialogue
- audience: few
- interactivity: overlapping
- interaction: Two scenarios: a semi-formal interview between a research assistant and an informant, and a free conversation between two informants. Research assistants were often passively present in the room during the conversations to prevent discussion of sensitive matters.
- video Format Info
- mime Type: videos in mpeg4 streaming format available through Glossa
- frame Rate: 25
- resolution Info
- size Width: 400
- size Height: 300
- resolution Standard: HD.720
- compression Info
- compression: true
- compression Name: mpg
- video Content Info
- corpus Part Info
- media Type: audio
- corpus Audio Info
- audio Size Info
- size Info
- size: Approx 40 GB
- size Unit: gb
- size Info
- setting Info
- naturality: spontaneous
- conversational Type: dialogue
- audience: few
- interactivity: overlapping
- interaction: Two scenarios: a semi-formal interview between a research assistant and an informant, and a free conversation between two informants. Research assistants were often passively present in the room during the conversations to prevent discussion of sensitive matters.
- audio Format Info
- mime Type: wav and mpeg4
- signal Encoding: linearPCM
- sampling Rate: 32
- quantization: 64
- number Of Tracks: 1
- recording Quality: medium
- compression Info
- compression: true
- compression Name: mpg
- audio Size Info
- corpus Part General Info
- person Source Set Info
- number Of Persons: 166
- age Of Persons: teenager
- age Of Persons: adult
- age Of Persons: elderly
- age Range Start: 16
- age Range End: 90
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: Oslo dialect: half of the informants come from East Oslo, the other half from West Oslo
- geographic Distribution Of Persons: Oslo and the surrounding Oslo area
- linguality Info
- linguality Type: monolingual
- language Info
- language Id: No
- language Name: Norwegian
- language Info
- language Id: Nb
- language Name: Norwegian Bokmål
- modality Info
- modality Type: spokenLanguage
- modality Type Details: Orthographic transcription
- size Info
- size: 957 063
- size Unit: tokens
- annotation Info
- annotation Type: morphosyntacticAnnotation-posTagging
- annotated Elements: other
- segmentation Level: word
- tagset: POS tagset created for the statistical NoTa-tagger – based on the tagset of the Oslo Bergen Tagger.
- tagset Language Id: nb
- tagset Language Name: Norwegian Bokmål
- theoretic Model: TreeTagger
- annotation Mode: automatic
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: NoTa-taggeren: TAGGEVEILEDNING
- author: Åshild Søfteland
- year: 2007
- url:
- document Language Name: Norwegian Bokmål
- document Language Id: nb
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: article
- title: Tagging a Norwegian Speech Corpus
- author: Anders Nøklestad and Åshild Søfteland
- editor: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek, Mare Koit
- year: 2007
- book Title: Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007
- pages: 245–248
- conference: Nodalida 2007
- document Language Name: English
- document Language Id: en
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: article
- title: Manuell morfologisk tagging av NoTa-materialet med støtte fra en statistisk tagger.
- author: Åshild Søfteland and Anders Nøklestad
- editor: Janne Bondi Johannessen and Kristin Hagen
- year: 2008
- publisher: Novus forlag
- book Title: Språk i Oslo. Ny forskning omkring talespråk
- pages: 226–234
- I S B N: 978-82-7099-471-7
- document Language Name: Norwegian
- document Language Id: nb
- annotation Info
- annotation Type: speechAnnotation-orthographicTranscription
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: Orthographic transcription, cf. Bokmålsordboka (Wangensteen 2004)
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: Transkripsjonsveiledning for NoTa-Oslo
- author: Kristin Hagen
- year: 2008
- url:
- annotation Tool
- target Resource Name U R I: Transcriber
- classification Info
- genre Info
- genre Type: speechGenre
- genre: informal
- unstandardised Genre: conversations
- genre Info
- classification Info
- genre Info
- genre Type: speechGenre
- genre: semi formal
- unstandardised Genre: interviews
- genre Info
- time Coverage Info
- time Coverage: 2004 – 2006
- geographic Coverage Info
- geographic Coverage: Oslo and the Oslo area
- recording Info
- recording Device Type: tapeVHS
- recording Environment: office
- recording Environment: closedPublicPlace
- recording Environment: conferenceRoom
- recording Environment: lectureRoom
- recording Environment: other
- recorder Actor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- person Source Set Info
- capture Info
- capturing Device Type: microphone
- capturing Device Type: camera
dc:type | corpus |
dc:title | Norsk talespråkskorpus – Oslodelen |
dc:identifier | |
dc:description | NoTa-Oslo is a speech corpus with interviews and conversations from 166 informants born and raised in Oslo and the Oslo area. The informants are carefully selected w.r.t. sociolinguistic variables and therefore representative in terms of age, gender, place of residence and education. NoTa-Oslo consists of approx. 957 000 words that are orthographically transcribed and morphologically tagged. The corpus is searchable in a specially designed search interface, and the transcriptions are linked to audio and video files. |
dc:publisher | |
dc:format | accessibleThroughInterface |
dc:date | 2004-01-01 |
dc:date | 2006-12-31 |
dc:rights | Academic |
dc:rights | CLARIN |
dc:rights | CLARIN_ACA-NC-LOC-PRIV-ND-* |
dc:rights | |
dc:creator | The Text Laboratory |
dc:lang | Norwegian |
dc:lang | Norwegian Bokmål |