LIA Norwegian – Corpus of historical dialect recordings
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: LIA norsk – korpus av eldre dialektopptak
- resource Name: LIA Norwegian – Corpus of historical dialect recordings
- description: LIA Norwegian is a speech corpus of old recordings (1939 – 1996) from four Norwegian universities: NTNU, UoB, UoO and UoT. The recordings were made mainly for dialect and onomastic research, and the interviews and conversations typically concern old trades such as agriculture, fisheries, logging and life at the summer farm. Other topics are weaving, knitting, baking and dialects. The recordings are semi-formal or informal and often take place in an informant’s home. The first version of the corpus has 3.5 million tokens and 1374 speakers from 222 places in Norway. The corpus is morphologically tagged with a statistical speech tagger for Nynorsk.
- description: LIA norsk er et talespråkskorpus med gamle opptak (1939 – 1996) fra fire norske universitet: NTNU, UiB, UiO og UiT. Opptakene er gjort for dialektforskning og navneforskning, og handler ofte om landbruk, skogbruk, fiske, livet på setra og gamle håndverkstradisjoner. Som regel er opptakene gjort i private hjem, og intervjuene og samtalene er ganske uformelle. Den første versjonen av korpuset inneholder 3.5 millioner tokens og 1374 talere fra 222 steder i Norge. Korpuset er morfologisk tagget med en nyutviklet, statistisk talemålstagger for Nynorsk.
- resource Short Name: LIA Norwegian
- resource Short Name: LIA Norsk
- url: http://tekstlab.uio.no/LIA/norsk/index.html
- url: http://tekstlab.uio.no/LIA/norsk/index_english.html
- P I D: http://hdl.handle.net/11538/0000-000C-368B-B
- distribution Info
- licence Info
- user Category: Academic
- distribution Access Medium: accessibleThroughInterface
- execution Location: http://tekstlab.uio.no/LIA/norsk/index.html
- licence
- licence Family: CLARIN
- licence Name: CLARIN_ACA-NC-LOC-PRIV-ND-*
- licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
- conditions Of Use: *
- conditions Of Use: BY
- conditions Of Use: ID
- conditions Of Use: LOC
- conditions Of Use: NC
- conditions Of Use: ND
- conditions Of Use: NORED
- conditions Of Use: PRIV
- non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. The audio excerpts returned by the search interface cannot be presented in public without an agreement with the Text Laboratory. Please note that each researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- licence Info
- contact
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- metadata Info
- metadata Creation Date: 26.09.2018
- metadata Last Date Updated: 06.04.2021
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: kristin.hagen@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- version Info
- version: First version (15 September 2019)
- validation Info
- validated: true
- validation Type: content
- validation Mode: manual
- validation Mode Details: The transcriptions are proofread against the audio files.
- validation Extent: partial
- validator:
- actor Info
- actor Type: organization
- organization Info
- organization Name: The LIA project
- organization Short Name: LIA
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- documentation Unstructured
- role: documentation
- document Unstructured: Brukarrettleiing for LIA norsk – korpus av eldre dialektopptak: http://tekstlab.uio.no/brukerveiledninger/LIA%20norsk/index.html
- documentation Structured
- role: documentation
- document Info
- document Type: other
- title: Heimesida til LIA-korpuset for norske dialekter
- url: http://tekstlab.uio.no/LIA/norsk/index.html
- creation Start Date: 01.04.2014
- creation End Date: 01.07.2018
- resource Creator
- actor Info
- actor Type: organization
- organization Info
- organization Name: The LIA project (Project participants and employees in the LIA project)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://tekstlab.uio.no/LIA/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- funding Project:
- project Info
- project Name: LIA (Language Infrastructure made Accessible)
- project Short Name: LIA
- project I D: 225941
- url: http://tekstlab.uio.no/LIA/
- url: https://www.hf.uio.no/iln/english/research/projects/language-infrastructure-made-accessible/index.html
- funding Type: nationalFunds
- funder: The Research Council of Norway
- funding Country: Norway
- project Start Date: 04.01.2014
- project End Date: 31.12.2019
- corpus Info
- corpus Type: Multimodal Corpus
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: txt
- size Per Text Format
- size Info
- size: 3 481 547
- size Unit: tokens
- size Info
- character Encoding Info
- character Encoding: utf-8
- text Format Info
- corpus Part Info
- media Type: audio
- corpus Audio Info
- audio Size Info
- size Info
- size: Approx. 25.4 GB
- size Unit: gb
- size Info
- setting Info
- naturality: spontaneous
- conversational Type: dialogue
- audience: few
- interactivity: overlapping
- interaction: Semi-formal or informal interviews with one or more interviewers. The recordings often resemble conversations and were mostly made in people's homes.
- audio Format Info
- mime Type: wav and mp3
- recording Quality: medium
- compression Info
- compression: true
- compression Name: mp3
- audio Size Info
- corpus Part General Info
- person Source Set Info
- number Of Persons: 1374
- age Of Persons: teenager
- age Of Persons: adult
- age Of Persons: elderly
- age Range Start: 10
- age Range End: 99
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: Dialects from 222 places in Norway
- geographic Distribution Of Persons: All over Norway
- linguality Info
- linguality Type: monolingual
- language Info
- language Id: no
- language Name: Norwegian
- language Info
- language Id: nn
- language Name: Norwegian Nynorsk
- modality Info
- modality Type: spokenLanguage
- modality Type Details: Norwegian dialects. Two transcription modes: one phonetic (using the Norwegian alphabet) and one orthographic.
- size Info
- size: 3 481 547
- size Unit: tokens
- annotation Info
- annotation Type: morphosyntacticAnnotation-posTagging
- annotated Elements: other
- segmentation Level: word
- tagset: POS tagset created for the statistical LIA-tagger – based on the tagset of the Oslo Bergen Tagger.
- tagset Language Id: nn
- tagset Language Name: Norwegian Nynorsk
- theoretic Model: MarMoT
- annotation Mode: automatic
- annotation Info
- annotation Type: speechAnnotation-phoneticTranscription
- annotation Type: speechAnnotation-orthographicTranscription
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: Orthographic transcription, cf. Nynorskordboka: https://ordbok.uib.no/
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: Transkripsjonsrettleiing for LIA
- author: Kristin Hagen, Live Håberg, Eirik Olsen and Åshild Søfteland
- year: 2018
- url: http://tekstlab.uio.no/LIA/pdf/transkripsjonsrettleiing_lia.pdf
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: LIA: Translitterering frå dialekt til nynorsk
- author: Anneke Askeland, Kristin Hagen, Live Håberg, Janne Bondi Johannessen, Linn Iren Sjånes Rødvand and Eirik Tengesdal
- year: 2019
- url: http://www.tekstlab.uio.no/LIA/pdf/rettleiing-translitterator.pdf
- annotation Tool
- target Resource Name U R I: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/oslo-transliterator/index.html
- classification Info
- genre Info
- genre Type: speechGenre
- genre: informal
- unstandardised Genre: conversations and informal interviews
- genre Info
- classification Info
- genre Info
- genre Type: speechGenre
- genre: semi formal
- unstandardised Genre: interviews
- genre Info
- time Coverage Info
- time Coverage: 1951 – 1995
- geographic Coverage Info
- geographic Coverage: All over Norway
- recording Info
- recording Device Type: other
- recording Environment: other
- person Source Set Info
dc:type | corpus |
dc:title | LIA Norwegian – Corpus of historical dialect recordings |
dc:identifier | oai:tekstlab.uio.no:lia-norsk |
dc:description | LIA Norwegian is a speech corpus of old recordings (1939 – 1996) from four Norwegian universities: NTNU, UoB, UoO and UoT. The recordings were made mainly for dialect and onomastic research, and the interviews and conversations typically concern old trades such as agriculture, fisheries, logging and life at the summer farm. Other topics are weaving, knitting, baking and dialects. The recordings are semi-formal or informal and often take place in an informant’s home. The first version of the corpus has 3.5 million tokens and 1374 speakers from 222 places in Norway. The corpus is morphologically tagged with a statistical speech tagger for Nynorsk. |
dc:publisher | |
dc:format | accessibleThroughInterface |
dc:date | 2014-04-01 |
dc:date | 2018-07-01 |
dc:rights | Academic |
dc:rights | CLARIN |
dc:rights | CLARIN_ACA-NC-LOC-PRIV-ND-* |
dc:rights | https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1 |
dc:creator | The LIA project (Project participants and employees in the LIA project) |
dc:lang | Norwegian |
dc:lang | Norwegian Nynorsk |