The LIA Treebank
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: LIA-trebanken
- resource Name: The LIA Treebank
- description: The LIA Treebank includes 7536 speech segments and 77 701 tokens from LIA Norwegian. The treebank is annotated with morphological and dependency-style syntactic analysis and manually corrected. It is available in three versions: a downloadable version in conllx format, a searchable version in the search interface Glossa, and a downloadable version in conllu format. The conllu version is automatically converted to Universal Dependencies and includes 5250 speech segments and 55 410 tokens. LIA Norwegian is a speech corpus with old recordings (1939 – 1996) from four Norwegian universities: NTNU, UoB, UoO and UoT.
- resource Short Name: The LIA Treebank
- resource Short Name: LIA-trebanken
- url: http://tekstlab.uio.no/LIA/norsk/index_english.html
- url: http://tekstlab.uio.no/LIA/trebank.html
- PID: http://hdl.handle.net/11538/0000-000C-368B-B
- distribution Info
- licence Info
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://github.com/textlab/spoken_norwegian_resources/tree/master/treebanks/Norwegian-NynorskLIA
- download Location: https://github.com/UniversalDependencies/UD_Norwegian-NynorskLIA/
- licence
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
- licence Url: http://creativecommons.org/licenses/by-nc-sa/4.0/
- conditions Of Use: BY
- conditions Of Use: NC
- conditions Of Use: SA
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- department Name: Department of Linguistics and Scandinavian Studies
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Informatics
- department Name: Institutt for informatikk (IFI)
- communication Info
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Informatics
- department Name: Institutt for informatikk (IFI)
- communication Info
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/about/organization/text-laboratory/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- licence Info
- user Category: Academic
- distribution Access Medium: accessibleThroughInterface
- execution Location: https://tekstlab.uio.no/glossa2/liatrebanken
- licence
- licence Family: CLARIN
- licence Name: CLARIN_ACA-NC-LOC-PRIV-ND-*
- licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
- conditions Of Use: *
- conditions Of Use: BY
- conditions Of Use: ID
- conditions Of Use: LOC
- conditions Of Use: NC
- conditions Of Use: ND
- conditions Of Use: NORED
- conditions Of Use: PRIV
- non Standard Conditions Of Use: The treebank has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the treebank linked to audio and video files is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. The video and audio excerpts returned by the search interface cannot be shown in public unless you have an agreement with the Text Laboratory. Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- organization Short Name: ILN
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- metadata Creation Date: 26.09.2018
- metadata Last Date Updated: 13.12.2022
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: kristin.hagen@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- version: conllu version 2021; conllx and Glossa version November 2022
- validated: true
- validation Type: content
- validation Mode: manual
- validation Mode Details: The treebank is manually corrected by at least one person
- validation Extent: partial
- validator:
- actor Info
- actor Type: organization
- organization Info
- organization Name: The LIA project
- organization Short Name: LIA
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- documentation Unstructured
- role: documentation
- document Unstructured: http://tekstlab.uio.no/LIA/trebank.html (In Norwegian)
- documentation Structured
- role: documentation
- document Info
- document Type: proceedings
- title: The LIA Treebank of Spoken Norwegian Dialects
- author: Lilja Øvrelid and Andre Kåsen and Kristin Hagen and Anders Nøklestad and Per Erik Solberg and Janne Bondi Johannessen
- editor: Nicoletta Calzolari et al.
- year: 2018
- book Title: Proceedings of the Eleventh International Conference on Language Resources and Evaluation
- url: http://www.lrec-conf.org/proceedings/lrec2018/summaries/642.html
- creation Start Date: 01.04.2014
- creation End Date: 01.12.2022
- resource Creator
- actor Info
- actor Type: organization
- organization Info
- organization Name: The LIA project (Project participants and employees in the LIA project)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://tekstlab.uio.no/LIA/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- funding Project:
- project Info
- project Name: LIA (Language Infrastructure made Accessible)
- project Short Name: LIA
- project ID: 22 59 41
- url: http://tekstlab.uio.no/LIA/
- url: https://www.hf.uio.no/iln/english/research/projects/language-infrastructure-made-accessible/index.html
- funding Type: nationalFunds
- funder: The Research Council of Norway
- funding Country: Norway
- project Start Date: 04.01.2014
- project End Date: 31.12.2019
- corpus Info
- corpus Type: Treebank
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: Downloadable in two formats: conllx-format and conllu-format
- size Per Text Format
- size Info
- size: 77 701
- size Unit: tokens
- size Info
- size: 7536 speech segments
- size Unit: utterances
- size Info
- size: 55 410
- size Unit: tokens
- size Info
- size: 5250 speech segments
- size Unit: utterances
- character Encoding Info
- character Encoding: utf-8
- corpus Part Info
- media Type: audio
- corpus Audio Info
- audio Size Info
- size Info
- size: 19 wav files
- size Unit: files
- setting Info
- naturality: spontaneous
- conversational Type: dialogue
- audience: few
- interactivity: overlapping
- interaction: Semiformal or informal interviews with one or more interviewers. Often the recordings are more like conversations. The recordings are mostly from people's homes.
- audio Format Info
- mime Type: wav and mp3
- compression Info
- compression: true
- compression Name: mp3
- corpus Part General Info
- person Source Set Info
- age Of Persons: teenager
- age Of Persons: adult
- age Of Persons: elderly
- age Range Start: 14
- age Range End: 91
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: Dialects from 17 places in Norway
- geographic Distribution Of Persons: All over Norway
- linguality Info
- linguality Type: monolingual
- language Info
- language Id: No
- language Name: Norwegian
- language Info
- language Id: Nn
- language Name: Norwegian Nynorsk
- modality Info
- modality Type: spokenLanguage
- modality Type Details: Norwegian dialects. Orthographic transcription
- size Info
- size: 77 701
- size Unit: tokens
- size Info
- size: 7536 speech segments
- size Unit: utterances
- annotation Info
- annotation Type: speechAnnotation-orthographicTranscription
- annotation Type: morphosyntacticAnnotation-posTagging
- annotation Type: syntacticAnnotation-treebanks
- annotation Description: Original version in conllx-format, annotated with morphological and dependency-style syntactic analysis. The treebank has also been automatically converted to the UD scheme and is available in conllu-format.
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: NDT Guidelines for Morphological and Syntactic Annotation
- author: Kari Kinn, Per Erik Solberg and Pål Kristian Eriksen. Translated from Norwegian to English by Per Erik Solberg
- year: 2013
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-10/
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: Retningslinjer for syntaktisk annotasjon i LIA
- author: Andre Kåsen, Kristin Hagen, Lilja Øvrelid, Signe Laake, Håvard Østli
- year: 2019
- url: http://tekstlab.uio.no/LIA/pdf/parseretningslinjer-lia12042019.pdf
- annotation Tool
- target Resource Name URI: https://ufal.mff.cuni.cz/tred/
- annotation Tool
- target Resource Name URI: Read about the annotation process in Norwegian: http://tekstlab.uio.no/LIA/verktoy.html
- classification Info
- genre Info
- genre Type: speechGenre
- genre: informal
- unstandardised Genre: conversations and informal interviews
- classification Info
- genre Info
- genre Type: speechGenre
- genre: semi formal
- unstandardised Genre: interviews
- time Coverage Info
- time Coverage: 1958 – 1981
- geographic Coverage Info
- geographic Coverage: 17 places from all over Norway
dc:type | corpus |
dc:title | The LIA Treebank |
dc:identifier | oai:tekstlab.uio.no:lia-trebanken |
dc:description | The LIA Treebank includes 7536 speech segments and 77 701 tokens from LIA Norwegian. The treebank is annotated with morphological and dependency-style syntactic analysis and manually corrected. It is available in three versions: a downloadable version in conllx format, a searchable version in the search interface Glossa, and a downloadable version in conllu format. The conllu version is automatically converted to Universal Dependencies and includes 5250 speech segments and 55 410 tokens. LIA Norwegian is a speech corpus with old recordings (1939 – 1996) from four Norwegian universities: NTNU, UoB, UoO and UoT. |
dc:publisher | |
dc:format | downloadable |
dc:date | 2014-04-01 |
dc:date | 2022-12-01 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY-NC-SA (CC-BY-NC-SA) |
dc:rights | http://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc:creator | The LIA project (Project participants and employees in the LIA project) |
dc:lang | Norwegian |
dc:lang | Norwegian Nynorsk |
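
The sizes in this record are given in speech segments and tokens. As a minimal sketch of how such counts might be obtained from a conllu file, the Python function below counts blank-line-separated segments and ordinary token lines. The function name `count_conllu` and the three-word sample are hypothetical illustrations, not material from the treebank, and the sketch assumes standard CoNLL-U conventions (comment lines start with `#`, multiword ranges use IDs like `1-2`, empty nodes use IDs like `1.1`).

```python
def count_conllu(text):
    """Return (segments, tokens) for a string in CoNLL-U format."""
    segments = 0
    tokens = 0
    in_segment = False
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current segment.
            in_segment = False
            continue
        if line.startswith("#"):
            # Comment / metadata line, not a token.
            continue
        token_id = line.split("\t")[0]
        if "-" in token_id or "." in token_id:
            # Skip multiword-token ranges and empty nodes.
            continue
        if not in_segment:
            segments += 1
            in_segment = True
        tokens += 1
    return segments, tokens


# Invented sample for illustration only (not taken from the treebank).
SAMPLE = (
    "# text = det var kaldt\n"
    "1\tdet\tdet\tPRON\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tvar\tvere\tAUX\t_\t_\t3\tcop\t_\t_\n"
    "3\tkaldt\tkald\tADJ\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tja\tja\tINTJ\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    print(count_conllu(SAMPLE))
```

To apply it to a downloaded file, read the file as UTF-8 (the encoding stated in this record) and pass its contents to `count_conllu`. Whether the result matches the figures above depends on how the record's authors counted tokens, which this sketch does not know.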