Amerikanordisk talespråkskorpus – nedlastbare transkripsjoner
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: Amerikanordisk talespråkskorpus – nedlastbare transkripsjoner
- resource Name: Corpus of American Nordic Speech – downloadable transcriptions
- description: CANS v.3.1 – Corpus of American Nordic Speech – is a speech corpus with speakers from the USA and Canada speaking Norwegian and Swedish. Most of the informants learnt to speak their Nordic language as children at home. There are 268 speakers from 63 places in the corpus, all in all more than 774 000 tokens. The corpus contains both conversations and interviews. The downloadable version of the corpus contains all transcriptions in the corpus, some in txt format and some in html format. The transcriptions are available in two versions: one phonetic and one orthographic. CANS v.3.1 includes Norwegian recordings from Janne Bondi Johannessen et al. (2010 – 2016) together with older recordings and transcriptions from Didrik Arup Seip and Ernst W. Selmer (1931), Einar Haugen (1942) and Arnstein Hjelde (1987, 1990, 1992). The Swedish recordings were collected by Ida Larsson et al. (2011 – 2014).
- description: CANS v.3.1 – Corpus of American Nordic Speech – is a speech corpus with informants from the USA and Canada. The informants speak Norwegian and Swedish, and most of them learnt the language as children at home with their parents in America. There are 268 speakers from 63 places in the corpus, all in all more than 774 000 tokens. The corpus contains both conversations and interviews. The downloadable version of the corpus contains all the transcriptions, some in text format and some in html. The transcriptions are available both in a phonetic variant close to the spoken language and in an orthographic version. CANS v.3.1 includes recordings from Janne Bondi Johannessen et al. (2010 – 2016) together with older recordings and transcriptions from Didrik Arup Seip and Ernst W. Selmer (1931), Einar Haugen (1942) and Arnstein Hjelde (1987, 1990, 1992). The Swedish recordings were collected by Ida Larsson et al. (2011 – 2014).
- resource Short Name: CANS v.3.1
- url: http://www.tekstlab.uio.no/norskiamerika/english/index.html
- url: https://sites.google.com/site/svenskaniamerika/home/english
- PID: http://hdl.handle.net/11538/0000-0005-E7C9-4
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location: http://tekstlab.uio.no/norskiamerika/english/corpus.html
- execution Location: http://tekstlab.uio.no/norskiamerika/english/corpus.html
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
- licence Url: http://creativecommons.org/licenses/by-nc-sa/4.0/
- conditions Of Use: BY
- conditions Of Use: NC
- conditions Of Use: SA
- non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the video and audio files are accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
- licensor:
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder:
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- actor Info:
- actor Type: person
- person Info:
- surname: Hagen
- given Name: Kristin
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: The Text Laboratory
- organization Short Name: Text Lab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- corpus Info:
- corpus Type: Written Corpus
- corpus Part Info:
- media Type: text
- corpus Text Info:
- text Format Info:
- mime Type: Downloadable transcriptions in txt and html format
- size Per Text Format:
- size Info:
- size: 774 625
- size Unit: tokens
- character Encoding Info:
- character Encoding: utf-8
- corpus Part General Info:
- person Source Set Info:
- number Of Persons: 268
- age Of Persons: elderly
- age Of Persons: adult
- age Of Persons: teenager
- age Range Start: 12
- age Range End: 98
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: American-Norwegian and American-Swedish
- geographic Distribution Of Persons: USA and Canada
- linguality Info:
- linguality Type: bilingual
- language Info:
- language Id: Nb
- language Name: Norwegian Bokmål
- language Variety Info:
- language Variety Type: dialect
- language Variety Name: American Norwegian
- size Per Language Variety:
- size Info:
- size: 729 393
- size Unit: tokens
- language Info:
- language Id: Sv
- language Name: Swedish
- size Per Language:
- size Info:
- size: 45 232
- size Unit: tokens
- language Variety Info:
- language Variety Type: dialect
- language Variety Name: American-Swedish dialects
- modality Info:
- modality Type: spokenLanguage
- size Info:
- size: 774 625
- size Unit: tokens
- annotation Info:
- annotation Type: speechAnnotation-phoneticTranscription
- annotation Type: speechAnnotation-orthographicTranscription
- segmentation Level: word
- annotation Mode: interactive
- annotation Manual Unstructured:
- role: annotationManual
- document Unstructured: http://www.tekstlab.uio.no/norskiamerika/english/index.html
- annotation Manual Structured:
- role: annotationManual
- document Info:
- document Type: manual
- title: Transkripsjons- og translittereringsveiledning for Norsk i Amerika
- author: Andre Kåsen, Eirik Olsen, Linn Iren Sjånes Rødvand and Eirik Tengesdal
- year: 2018
- url: http://tekstlab.uio.no/norskiamerika/Transkripsjons-translittereringsveiledning-norskiamerika.pdf
- annotation Tool:
- target Resource Name URI: Transcriber (http://trans.sourceforge.net/en/presentation.php)
- annotation Tool:
- target Resource Name URI: ELAN (https://tla.mpi.nl/tools/tla-tools/elan/)
- annotation Tool:
- target Resource Name URI: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/oslo-transliterator/index.html
- classification Info:
- genre Info:
- genre Type: speechGenre
- genre: informal
- time Coverage Info:
- time Coverage: Interviews and conversations mostly from 2010 – 2016. Some are from 1931, 1942, 1987, 1990 and 1992
- geographic Coverage Info:
- geographic Coverage: Informants from 57 places in USA and Canada speaking Norwegian and Swedish
- creation Info:
- creation Mode: manual
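The size figures in the record can be cross-checked: the token counts given for American Norwegian and for the American-Swedish dialects should add up to the total given for the corpus. The following is a minimal sketch of that check, assuming the counts are written with a space as thousands separator, as they are here. The helper name parse_tokens is hypothetical and not part of any corpus tooling.

```python
def parse_tokens(text: str) -> int:
    """Parse a token count such as '774 625' (whitespace as thousands separator)."""
    return int("".join(text.split()))


# Per-language sizes as listed in the record above.
sizes = {
    "American Norwegian": "729 393",
    "American-Swedish dialects": "45 232",
}

# Corpus total as listed under text format and modality.
total = parse_tokens("774 625")
parts = sum(parse_tokens(value) for value in sizes.values())

# The per-language counts should account for every token in the corpus.
assert parts == total
```

The same helper can be reused when comparing these figures against counts computed from the downloaded transcription files.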
dc:type | corpus |
dc:title | Amerikanordisk talespråkskorpus – nedlastbare transkripsjoner |
dc:identifier | oai:tekstlab.uio.no:cans-transcriptions |
dc:description | CANS v.3.1 – Corpus of American Nordic Speech – is a speech corpus with informants from the USA and Canada. The informants speak Norwegian and Swedish, and most of them learnt the language as children at home with their parents in America. There are 268 speakers from 63 places in the corpus, all in all more than 774 000 tokens. The corpus contains both conversations and interviews. The downloadable version of the corpus contains all the transcriptions, some in text format and some in html. The transcriptions are available both in a phonetic variant close to the spoken language and in an orthographic version. CANS v.3.1 includes recordings from Janne Bondi Johannessen et al. (2010 – 2016) together with older recordings and transcriptions from Didrik Arup Seip and Ernst W. Selmer (1931), Einar Haugen (1942) and Arnstein Hjelde (1987, 1990, 1992). The Swedish recordings were collected by Ida Larsson et al. (2011 – 2014). |
dc:publisher | |
dc:format | downloadable |
dc:date | 2010-01-01 |
dc:date | 2019-11-01 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY-NC-SA (CC-BY-NC-SA) |
dc:rights | http://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc:creator | The Text Laboratory |
dc:creator | Ida Larsson |
dc:lang | Norwegian Bokmål |
dc:lang | Swedish |
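The Dublin Core rows above follow a simple "key | value |" layout in which a key may repeat (dc:date, dc:rights, dc:creator) and a value may be empty (dc:publisher). As a rough illustration, they could be read into a dictionary of lists as sketched below. The function name parse_dc_rows is hypothetical, and the sketch assumes that values contain no "|" character, which holds for the rows in this record.

```python
from collections import defaultdict


def parse_dc_rows(lines):
    """Collect 'dc:key | value |' rows into a dict of lists, since keys may repeat."""
    record = defaultdict(list)
    for line in lines:
        parts = [part.strip() for part in line.split("|")]
        # Ignore anything that is not a two-column dc: row.
        if len(parts) < 2 or not parts[0].startswith("dc:"):
            continue
        key, value = parts[0], parts[1]
        if value:  # skip empty fields such as dc:publisher
            record[key].append(value)
    return dict(record)


# A few rows copied from the table above.
rows = [
    "dc:type | corpus |",
    "dc:publisher | |",
    "dc:date | 2010-01-01 |",
    "dc:date | 2019-11-01 |",
    "dc:creator | The Text Laboratory |",
    "dc:creator | Ida Larsson |",
]
record = parse_dc_rows(rows)
```

Keeping every value in a list preserves repeated fields in their original order, so both dates and both creators survive the parse.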