Corpus of American Nordic Speech v.3.1
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: Amerikanordisk talespråkskorpus v.3.1
- resource Name: Corpus of American Nordic Speech v.3.1
- description: CANS v.3.1 – Corpus of American Nordic Speech – is a speech corpus with speakers from the USA and Canada who speak Norwegian and Swedish. Most of the informants learnt to speak their Nordic language as children at home. There are 268 speakers from 63 places in the corpus, in total more than 774 000 tokens. CANS v.3.1 contains both conversations and interviews. The transcriptions are both phonetic and orthographic and are linked to audio and video. CANS v.3.1 includes Norwegian recordings from Janne Bondi Johannessen et al. (2010 – 2016) together with older recordings and transcriptions from Didrik Arup Seip and Ernst W. Selmer (1931), Einar Haugen (1942) and Arnstein Hjelde (1987, 1990, 1992). The Swedish recordings were collected by Ida Larsson et al. (2011 – 2014).
- description: CANS v.3.1 – Corpus of American Nordic Speech – is a speech corpus with informants from the USA and Canada. The informants speak Norwegian and Swedish, and most of them learnt the language as children at home with their parents in America. There are 268 speakers from 63 places in the corpus, in total more than 774 000 tokens. CANS v.3.1 contains both conversations and interviews. The transcriptions exist both in a phonetic variant close to the spoken dialect and in an orthographic version. The transcriptions are linked to audio and video in the corpus. CANS v.3.1 contains recordings from Janne Bondi Johannessen et al. (2010 – 2016) together with older recordings and transcriptions from Didrik Arup Seip and Ernst W. Selmer (1931), Einar Haugen (1942) and Arnstein Hjelde (1987, 1990, 1992). The Swedish recordings were collected by Ida Larsson et al. (2011 – 2014).
- resource Short Name: CANS v.3.1
- url: http://www.tekstlab.uio.no/norskiamerika/english/index.html
- url: https://sites.google.com/site/svenskaniamerika/home/english
- PID: http://hdl.handle.net/11538/0000-0005-E7C9-4
- distribution Info
- licence Info
- user Category: Academic
- distribution Access Medium: accessibleThroughInterface
- execution Location: http://www.tekstlab.uio.no/norskiamerika/english/index.html
- execution Location: https://tekstlab.uio.no/glossa2/cans3
- licence
- licence Family: CLARIN
- licence Name: CLARIN_ACA-NC-LOC-PRIV-ND-*
- licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
- conditions Of Use: *
- conditions Of Use: BY
- conditions Of Use: ID
- conditions Of Use: LOC
- conditions Of Use: NC
- conditions Of Use: ND
- conditions Of Use: NORED
- conditions Of Use: PRIV
- non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. The video and audio excerpts returned by the search interface cannot be shown in public unless you have an agreement with the Text Laboratory. Every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- licence Info
- contact
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- metadata Info
- metadata Creation Date: 04.03.2015
- metadata Last Date Updated: 08.04.2021
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: kristin.hagen@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- version Info
- version: version 3.1
- validation Info
- validated: true
- validation Type: content
- validation Mode: manual
- validation Mode Details: The transcriptions are proofread against the audio files.
- validation Extent: full
- validator:
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- documentation Unstructured
- role: documentation
- document Unstructured: http://www.tekstlab.uio.no/norskiamerika/english/index.html
- documentation Unstructured
- role: documentation
- document Unstructured: User Manual for CANS v.2: http://tekstlab.uio.no/brukerveiledninger/CANS/index_eng_v2.html; User Manual for CANS v.3: http://tekstlab.uio.no/brukerveiledninger/CANS/index_eng_v3.html
- creation Start Date: 01.01.2010
- creation End Date: 01.11.2019
- resource Creator
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- actor Type: person
- person Info
- surname: Larsson
- given Name: Ida
- sex: female
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: ida.larsson@iln.uio.no
- actor Info
- funding Project:
- project Info
- project Name: Norwegian in America
- project Short Name: NorAmDiaSyn
- funding Type: nationalFunds
- funder: The Research Council of Norway
- funding Country: Norway
- project Name: Norwegian in America
- project Short Name: NorAmDiaSyn
- funding Type: other
- funder: Department of Linguistics and Scandinavian Studies, University of Tromsø (through Merete Anderssen and Marit Westergaard)
- funding Country: Norway
- project Name: Norwegian in America
- project Short Name: NorAmDiaSyn
- funding Type: ownFunds
- funder: The Text Laboratory
- funding Country: Norway
- project Name: Language Infrastructure made Accessible
- project Short Name: LIA
- url: http://www.hf.uio.no/iln/english/research/projects/language-infrastructure-made-accessible/index.html
- funding Type: nationalFunds
- funder: The Research Council of Norway
- funding Country: Norway
- project Start Date: 01.04.2014
- project End Date: 01.04.2019
- project Name: Swedish in America
- project Short Name: Swedish in America
- url: https://sites.google.com/site/svenskaniamerika/home/english
- funding Type: nationalFunds
- funder: Torsten Söderbergs stiftelse
- funding Country: Sweden
- project Start Date: 01.01.2010
- project End Date: 31.12.2011
- project Name: Swedish in America
- project Short Name: Swedish in America
- funding Type: ownFunds
- funder: Department of Linguistics and Scandinavian Studies, UiO
- funding Country: Norway
- project Start Date: 01.01.2015
- project End Date: 31.08.2017
- project Name: Transcriptions of older recordings Einar Haugen (1942) and Seip and Selmer (1931)
- funding Type: ownFunds
- funder: The Research Committee, ILN (funding granted to Kari Kinn and Ida Larsson for transcription of the older Seip and Selmer and Haugen recordings)
- project Start Date: 01.06.2019
- project End Date: 01.11.2019
- project Name: Norwegian across the Americas
- url: https://www.uib.no/lle/134610/norwegian-across-americas
- funding Type: nationalFunds
- funder: The Research Council of Norway
- project Start Date: 01.01.2020
- project End Date: 31.12.2024
- corpus Info
- corpus Type: Multilingual Corpus
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: text/plain
- size Per Text Format
- size Info
- size: 774 625
- size Unit: tokens
- size Info
- character Encoding Info
- character Encoding: utf-8
- text Format Info
- corpus Part Info
- media Type: video
- corpus Video Info
- video Content Info
- type Of Video Content: Interviews and conversations between American Norwegians
- text Included In Video: none
- setting Info
- naturality: spontaneous
- conversational Type: multilogue
- audience: some
- interactivity: overlapping
- video Format Info
- mime Type: video/mp4 (streaming format, available through Glossa)
- frame Rate: 25
- resolution Info
- size Width: 400
- size Height: 300
- resolution Standard: HD.720
- compression Info
- compression: true
- compression Name: mpg
- video Content Info
- corpus Part Info
- media Type: audio
- corpus Audio Info
- audio Size Info
- size Info
- size: approx. 10
- size Unit: GB
- size Info
- audio Content Info
- textual Description: Interviews and conversations between American Norwegians and American Swedes
- setting Info
- naturality: spontaneous
- conversational Type: dialogue
- audience: some
- interactivity: overlapping
- interaction: Two scenarios: a semiformal interview between a research assistant or researcher and an informant, and a free conversation between two informants. The older recordings are interviews with a Norwegian interviewer.
- audio Format Info
- mime Type: wav and mp4
- signal Encoding: linearPCM
- sampling Rate: 32
- quantization: 64
- number Of Tracks: 1
- recording Quality: medium
- compression Info
- compression: true
- compression Name: mp3
- audio Size Info
- corpus Part General Info
- person Source Set Info
- number Of Persons: 268
- age Of Persons: elderly
- age Of Persons: adult
- age Of Persons: teenager
- age Range Start: 12
- age Range End: 98
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: American-Norwegian and American-Swedish
- geographic Distribution Of Persons: USA and Canada
- linguality Info
- linguality Type: bilingual
- language Info
- language Id: nb
- language Name: Norwegian Bokmål
- language Variety Info
- language Variety Type: dialect
- language Variety Name: American Norwegian
- size Per Language Variety
- size Info
- size: 729 393
- size Unit: tokens
- size Info
- language Info
- language Id: sv
- language Name: Swedish
- size Per Language
- size Info
- size: 45 232
- size Unit: tokens
- size Info
- language Variety Info
- language Variety Type: dialect
- language Variety Name: American-Swedish dialects
- modality Info
- modality Type: spokenLanguage
- size Info
- size: 774 625
- size Unit: tokens
- annotation Info
- annotation Type: morphosyntacticAnnotation-posTagging
- annotated Elements: other
- segmentation Level: word
- tagset: POS tagset created for the statistical NoTa-tagger, based on the tagset of the Oslo-Bergen Tagger.
- tagset Language Id: nb
- tagset Language Name: Norwegian Bokmål
- theoretic Model: TreeTagger
- annotation Mode: automatic
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: NoTa-taggeren: TAGGEVEILEDNING
- author: Åshild Søfteland
- year: 2007
- url: http://www.tekstlab.uio.no/nota/oslo/Taggeveiledning2.pdf
- document Language Name: Norwegian Bokmål
- document Language Id: nb
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: article
- title: Tagging a Norwegian Speech Corpus
- author: Anders Nøklestad and Åshild Søfteland
- editor: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit
- year: 2007
- book Title: Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007
- pages: 245–248
- conference: Nodalida 2007
- document Language Name: English
- document Language Id: en
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: article
- title: Manuell morfologisk tagging av NoTa-materialet med støtte fra en statistisk tagger.
- author: Åshild Søfteland and Anders Nøklestad
- editor: Janne Bondi Johannessen and Kristin Hagen
- year: 2008
- publisher: Novus forlag
- book Title: Språk i Oslo. Ny forskning omkring talespråk
- pages: 226–234
- ISBN: 978-82-7099-471-7
- document Language Name: Norwegian
- document Language Id: nb
- annotation Info
- annotation Type: speechAnnotation-phoneticTranscription
- segmentation Level: word
- annotation Mode: interactive
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: http://www.tekstlab.uio.no/norskiamerika/english/index.html
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: Transkripsjons- og translittereringsveiledning for Norsk i Amerika
- author: Andre Kåsen, Eirik Olsen, Linn Iren Sjånes Rødvand and Eirik Tengesdal
- year: 2018
- url: http://tekstlab.uio.no/norskiamerika/Transkripsjons-translittereringsveiledning
- annotation Tool
- target Resource Name URI: Transcriber: http://trans.sourceforge.net/en/presentation.php
- annotation Tool
- target Resource Name URI: ELAN: https://tla.mpi.nl/tools/tla-tools/elan/
- annotation Tool
- target Resource Name URI: Oslo Transliterator: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/oslo-transliterator/index.html
- annotation Info
- annotation Type: morphosyntacticAnnotation-posTagging
- annotated Elements: other
- segmentation Level: word
- tagset: PAROLE tag set customized for Nordic Dialect Corpus
- tagset Language Name: Swedish
- theoretic Model: TnT
- annotation Mode: automatic
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: See documentation on the Nordic Dialect Corpus web page: http://www.tekstlab.uio.no/nota/scandiasyn/tagging.html
- classification Info
- genre Info
- genre Type: speechGenre
- genre: informal
- genre Info
- time Coverage Info
- time Coverage: Interviews and conversations mostly from 2010 – 2016. Some are from 1931, 1942, 1987, 1990 and 1992
- geographic Coverage Info
- geographic Coverage: Informants from 63 places in the USA and Canada speaking Norwegian and Swedish
- recording Info
- recording Device Type: hardDisk
- recording Environment: office
- recording Environment: closedPublicPlace
- recording Environment: conferenceRoom
- recording Environment: lectureRoom
- recording Environment: other
- recorder Actor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- person Source Set Info
- capture Info
- capturing Device Type: closeTalkMicrophone
- capturing Device Type: camera
- creation Info
- creation Mode: manual
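The size fields in this record can be cross-checked against each other: the Norwegian and Swedish token counts should add up to the corpus total. The following is a minimal sketch in Python under that assumption. The helper name `parse_count` is hypothetical, and the figures are copied from the size entries listed above.

```python
def parse_count(text: str) -> int:
    """Convert a space-grouped count such as '774 625' to an integer."""
    return int("".join(text.split()))


# Figures copied from the size entries of this record.
norwegian = parse_count("729 393")   # size Per Language Variety (American Norwegian)
swedish = parse_count("45 232")      # size Per Language (Swedish)
total = parse_count("774 625")       # size of the whole corpus

# The per-language counts should account for every token in the corpus.
assert norwegian + swedish == total

print(f"Norwegian share: {norwegian / total:.1%}")
print(f"Swedish share: {swedish / total:.1%}")
```

Stripping the whitespace before conversion keeps the check independent of how the thousands separator is rendered in the record.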
dc:type | corpus |
dc:title | Corpus of American Nordic Speech v.3.1 |
dc:identifier | oai:tekstlab.uio.no:cans |
dc:description | CANS v.3.1 – Corpus of American Nordic Speech – is a speech corpus with speakers from the USA and Canada who speak Norwegian and Swedish. Most of the informants learnt to speak their Nordic language as children at home. There are 268 speakers from 63 places in the corpus, in total more than 774 000 tokens. CANS v.3.1 contains both conversations and interviews. The transcriptions are both phonetic and orthographic and are linked to audio and video. CANS v.3.1 includes Norwegian recordings from Janne Bondi Johannessen et al. (2010 – 2016) together with older recordings and transcriptions from Didrik Arup Seip and Ernst W. Selmer (1931), Einar Haugen (1942) and Arnstein Hjelde (1987, 1990, 1992). The Swedish recordings were collected by Ida Larsson et al. (2011 – 2014). |
dc:publisher | |
dc:format | accessibleThroughInterface |
dc:date | 2010-01-01 |
dc:date | 2019-11-01 |
dc:rights | Academic |
dc:rights | CLARIN |
dc:rights | CLARIN_ACA-NC-LOC-PRIV-ND-* |
dc:rights | https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1 |
dc:creator | The Text Laboratory |
dc:creator | Ida Larsson |
dc:lang | Norwegian Bokmål |
dc:lang | Swedish |
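The licence URL given in the dc:rights rows carries the same condition flags that the record lists under conditions Of Use. The sketch below reads those flags from the query string using only the Python standard library. It is an illustration and not part of any CLARIN tooling: the helper name `licence_flags` is hypothetical, and treating a value of `1` as "condition applies" is an assumption based on how the flags line up with the listed conditions.

```python
from urllib.parse import parse_qs, urlsplit

# Licence URL copied from the record.
LICENCE_URL = (
    "https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca"
    "?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1"
)


def licence_flags(url: str) -> dict[str, str]:
    """Return the query parameters of a licence URL as a flat mapping."""
    query = urlsplit(url).query
    # parse_qs returns a list per key; each key occurs once here.
    return {key: values[0] for key, values in parse_qs(query).items()}


flags = licence_flags(LICENCE_URL)

# Assumption: a value of "1" means the condition applies.
enabled = sorted(key for key, value in flags.items() if value == "1")
print(enabled)
print("Affiliation:", flags["AFFIL"])
```

Under that assumption, the enabled flags are BY, ID, LOC, NC, ND, NORED and PRIV, which are the conditions of use listed in the record apart from the `*` entry.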