Corpus of American Nordic Speech – downloadable transcriptions
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: Amerikanordisk talespråkskorpus – nedlastbare transkripsjoner
- resource Name: Corpus of American Nordic Speech – downloadable transcriptions
- description: CANS v.3.1 – Corpus of American Nordic Speech – is a speech corpus with speakers from the USA and Canada who speak Norwegian and Swedish. Most of the informants learnt their Nordic language as children at home. The corpus has 268 speakers from 63 places and more than 774 000 tokens in total. It contains both conversations and interviews. The downloadable version contains all transcriptions in the corpus, some in txt format and some in html format. The transcriptions are available in two versions: one phonetic and one orthographic. CANS v.3.1 includes Norwegian recordings from Janne Bondi Johannessen et al. (2010 – 2016) together with older recordings and transcriptions from Didrik Arup Seip and Ernst W. Selmer (1931), Einar Haugen (1942) and Arnstein Hjelde (1987, 1990, 1992). The Swedish recordings were collected by Ida Larsson et al. (2011 – 2014).
- description: CANS v.3.1 – Corpus of American Nordic Speech – is a speech corpus with informants from the USA and Canada. The informants speak Norwegian and Swedish, and most of them learnt the language as children at home with their parents in America. There are 268 speakers from 63 places in the corpus, in total more than 774 000 tokens. The corpus contains both conversations and interviews. The downloadable version of the corpus contains all the transcriptions, some in text format and some in html. The transcriptions are available both in a phonetic variant that stays close to the spoken dialect and in an orthographic version. CANS v.3.1 includes recordings from Janne Bondi Johannessen et al. (2010 – 2016) together with older recordings and transcriptions from Didrik Arup Seip and Ernst W. Selmer (1931), Einar Haugen (1942) and Arnstein Hjelde (1987, 1990, 1992). The Swedish recordings were collected by Ida Larsson et al. (2011 – 2014).
- resource Short Name: CANS v.3.1
- url:
- url:
- P I D:
- distribution Info
- licence Info
- user Category: Public
- distribution Access Medium: downloadable
- download Location:
- execution Location:
- licence
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
- licence Url:
- conditions Of Use: BY
- conditions Of Use: NC
- conditions Of Use: SA
- non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the video and audio files are accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- licence Info
- contact
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- metadata Info
- metadata Creation Date: 04.03.2015
- metadata Last Date Updated: 08.04.2021
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- version Info
- version: version 3.1
- validation Info
- validated: true
- validation Type: content
- validation Mode: manual
- validation Mode Details: The transcriptions are proofread against the audio files.
- validation Extent: full
- validator:
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- documentation Unstructured
- role: documentation
- document Unstructured:
- documentation Unstructured
- role: documentation
- document Unstructured: User Manual for CANS v.2: User Manual for CANS v.3:
- creation Start Date: 01.01.2010
- creation End Date: 01.11.2019
- resource Creator
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Text Lab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email:
- url:
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- actor Type: person
- person Info
- surname: Larsson
- given Name: Ida
- sex: female
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email:
- actor Info
- funding Project:
- project Info
- project Name: Norwegian in America
- project Short Name: NorAmDiaSyn
- funding Type: nationalFunds
- funder: The Research Council of Norway
- funding Country: Norway
- project Name: Norwegian in America
- project Short Name: NorAmDiaSyn
- funding Type: other
- funder: Department of Linguistics and Scandinavian Studies, University of Tromsø (through Merete Anderssen and Marit Westergaard)
- funding Country: Norway
- project Name: Norwegian in America
- project Short Name: NorAmDiaSyn
- funding Type: ownFunds
- funder: The Text Laboratory
- funding Country: Norway
- project Name: Language Infrastructure made Accessible
- project Short Name: LIA
- url:
- funding Type: nationalFunds
- funder: The Research Council of Norway
- funding Country: Norway
- project Start Date: 01.04.2014
- project End Date: 01.04.2019
- project Name: Swedish in America
- project Short Name: Swedish in America
- url:
- funding Type: nationalFunds
- funder: Torsten Söderbergs stiftelse
- funding Country: Sweden
- project Start Date: 01.01.2010
- project End Date: 31.12.2011
- project Name: Swedish in America
- project Short Name: Swedish in America
- funding Type: ownFunds
- funder: Department of Linguistics and Scandinavian Studies, UiO
- funding Country: Norway
- project Start Date: 01.01.2015
- project End Date: 31.08.2017
- project Name: Transcriptions of older recordings Einar Haugen (1942) and Seip and Selmer (1931)
- funding Type: ownFunds
- funder: Kari Kinn and Ida Larsson received funding from the Research Committee, ILN, for transcription of the older recordings (Seip and Selmer, and Haugen).
- project Start Date: 01.06.2019
- project End Date: 11.01.2019
- project Name: Norwegian across the Americas
- url:
- funding Type: nationalFunds
- funder: The Research Council of Norway
- project Start Date: 01.01.2020
- project End Date: 31.12.2024
- corpus Info
- corpus Type: Written Corpus
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: Downloadable transcriptions in txt and html format
- size Per Text Format
- size Info
- size: 774 625
- size Unit: tokens
- size Info
- character Encoding Info
- character Encoding: utf-8
- text Format Info
- corpus Part General Info
- person Source Set Info
- number Of Persons: 268
- age Of Persons: elderly
- age Of Persons: adult
- age Of Persons: teenager
- age Range Start: 12
- age Range End: 98
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: American-Norwegian and American-Swedish
- geographic Distribution Of Persons: USA and Canada
- linguality Info
- linguality Type: bilingual
- language Info
- language Id: Nb
- language Name: Norwegian Bokmål
- language Variety Info
- language Variety Type: dialect
- language Variety Name: American Norwegian
- size Per Language Variety
- size Info
- size: 729 393
- size Unit: tokens
- size Info
- language Info
- language Id: Sv
- language Name: Swedish
- size Per Language
- size Info
- size: 45 232
- size Unit: tokens
- size Info
- language Variety Info
- language Variety Type: dialect
- language Variety Name: American-Swedish dialects
- modality Info
- modality Type: spokenLanguage
- size Info
- size: 774 625
- size Unit: tokens
- annotation Info
- annotation Type: speechAnnotation-phoneticTranscription
- annotation Type: speechAnnotation-orthographicTranscription
- segmentation Level: word
- annotation Mode: interactive
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured:
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: Transkripsjons-og translittereringsveiledning for Norsk i Amerika
- author: Andre Kåsen, Eirik Olsen, Linn Iren Sjånes Rødvand and Eirik Tengesdal
- year: 2018
- url:
- annotation Tool
- target Resource Name U R I: Transcriber
- annotation Tool
- target Resource Name U R I: ELAN
- annotation Tool
- target Resource Name U R I:
- classification Info
- genre Info
- genre Type: speechGenre
- genre: informal
- genre Info
- time Coverage Info
- time Coverage: Interviews and conversations mostly from 2010 – 2016. Some are from 1931, 1942, 1987, 1990 and 1992
- geographic Coverage Info
- geographic Coverage: Informants from 57 places in the USA and Canada who speak Norwegian and Swedish
- creation Info
- creation Mode: manual
- person Source Set Info
dc:type | corpus |
dc:title | Corpus of American Nordic Speech – downloadable transcriptions |
dc:identifier | |
dc:description | CANS v.3.1 – Corpus of American Nordic Speech – is a speech corpus with speakers from the USA and Canada who speak Norwegian and Swedish. Most of the informants learnt their Nordic language as children at home. The corpus has 268 speakers from 63 places and more than 774 000 tokens in total. It contains both conversations and interviews. The downloadable version contains all transcriptions in the corpus, some in txt format and some in html format. The transcriptions are available in two versions: one phonetic and one orthographic. CANS v.3.1 includes Norwegian recordings from Janne Bondi Johannessen et al. (2010 – 2016) together with older recordings and transcriptions from Didrik Arup Seip and Ernst W. Selmer (1931), Einar Haugen (1942) and Arnstein Hjelde (1987, 1990, 1992). The Swedish recordings were collected by Ida Larsson et al. (2011 – 2014). |
dc:publisher | |
dc:format | downloadable |
dc:date | 2010-01-01 |
dc:date | 2019-11-01 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY-NC-SA (CC-BY-NC-SA) |
dc:rights | |
dc:creator | The Text Laboratory |
dc:creator | Ida Larsson |
dc:lang | Norwegian Bokmål |
dc:lang | Swedish |