The LIA Treebank

The LIA Treebank consists of 7536 speech segments and 77 701 words/tokens from the speech corpus LIA Norwegian. The treebank is morphologically and syntactically annotated and manually corrected. It is available in three versions: a downloadable version in conllx format, a searchable version in the Glossa search interface, and a downloadable version in conllu format. The conllu version is automatically converted to Universal Dependencies and contains 5250 speech segments and 55 410 words/tokens.

LIA Norwegian is a speech corpus with old recordings (1939 – 1996) from four Norwegian universities: NTNU, UiB, UiO and UiT.
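For readers working with the downloadable conllu version, the segment and token counts quoted above can be checked with a few lines of Python. The following is a minimal sketch, not part of the LIA distribution: the function name `count_conllu` and the two-segment sample are invented for illustration, and it assumes that a "token" is any line with a plain integer ID, which may not match exactly how the catalogue figures were computed.

```python
from typing import Iterable, Tuple


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (segments, tokens) for an iterable of CoNLL-U lines.

    Assumption: a token is a line whose first column is a plain integer.
    Multiword ranges (e.g. "1-2") and empty nodes (e.g. "1.1") are skipped.
    """
    segments = 0
    tokens = 0
    in_segment = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current segment.
            in_segment = False
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata, not tokens.
            continue
        token_id = line.split("\t", 1)[0]
        if not token_id.isdigit():
            continue
        if not in_segment:
            segments += 1
            in_segment = True
        tokens += 1
    return segments, tokens


# Invented sample in CoNLL-U layout (10 tab-separated columns per token).
SAMPLE = """\
# sent_id = hypothetical-1
1\tja\tja\tINTJ\t_\t_\t0\troot\t_\t_

# sent_id = hypothetical-2
1\teg\teg\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tveit\tvite\tVERB\t_\t_\t0\troot\t_\t_
"""

if __name__ == "__main__":
    print(count_conllu(SAMPLE.splitlines()))
```

To run it on a real file, pass an open file handle, for example `count_conllu(open(path, encoding="utf-8"))`, where `path` is a hypothetical local path to a .conllu file fetched from one of the download locations listed in the record.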

Extended metadata

resource Common Info:
resource Type: corpus
identification Info:
resource Name: LIA-trebanken
resource Name: The LIA Treebank
description: The LIA Treebank includes 7536 speech segments and 77 701 tokens from LIA Norwegian. The treebank is annotated with morphological and dependency-style syntactic analysis and manually corrected. The treebank is available in three versions: a downloadable version in conllx format, a searchable version in the search interface Glossa and a downloadable version in conllu format. The conllu version is automatically converted to Universal Dependencies and includes 5250 speech segments and 55 410 tokens. LIA Norwegian is a speech corpus with old recordings (1939 – 1996) from four Norwegian universities: NTNU, UoB, UoO and UoT.
resource Short Name: The LIA Treebank
resource Short Name: LIA-trebanken
url: http://tekstlab.uio.no/LIA/norsk/index_english.html
url: http://tekstlab.uio.no/LIA/trebank.html
P I D: http://hdl.handle.net/11538/0000-000C-368B-B
distribution Info:
licence Info:
user Category: Public
distribution Access Medium: downloadable
download Location: https://github.com/textlab/spoken_norwegian_resources/tree/master/treebanks/Norwegian-NynorskLIA
download Location: https://github.com/UniversalDependencies/UD_Norwegian-NynorskLIA/
licence:
licence Family: Creative Commons (CC)
licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
licence Url: http://creativecommons.org/licenses/by-nc-sa/4.0/
conditions Of Use: BY
conditions Of Use: NC
conditions Of Use: SA
licensor:
actor Info:
actor Type: organization
organization Info:
organization Name: University of Oslo
organization Name: Universitetet i Oslo
organization Short Name: UiO
organization Short Name: UoO
department Name: Institutt for lingvistiske og nordiske studier (ILN)
department Name: Department of Linguistics and Scandinavian Studies
communication Info:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zip Code: 0317
city: OSLO
country: Norway
actor Info:
actor Type: organization
organization Info:
organization Name: University of Oslo
organization Name: Universitetet i Oslo
organization Short Name: UiO
organization Short Name: UoO
department Name: Department of Informatics
department Name: Institutt for informatikk (IFI)
communication Info:
email: liljao@ifi.uio.no
url: https://www.mn.uio.no/ifi/english/people/aca/liljao/index.html
distribution Rights Holder
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Informatics
- department Name: Institutt for informatikk (IFI)
- communication Info:
- email: liljao@ifi.uio.no
- url: https://www.mn.uio.no/ifi/english/people/aca/liljao/index.html
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/about/organization/text-laboratory/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
licence Info:
user Category: Academic
distribution Access Medium: accessibleThroughInterface
execution Location: https://tekstlab.uio.no/glossa3/liatrebanken
licence:
licence Family: CLARIN
licence Name: CLARIN_ACA-NC-LOC-PRIV-ND-*
licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
conditions Of Use: *
conditions Of Use: BY
conditions Of Use: ID
conditions Of Use: LOC
conditions Of Use: NC
conditions Of Use: ND
conditions Of Use: NORED
conditions Of Use: PRIV
non Standard Conditions Of Use: The treebank has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the treebank linked to audio and video files is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. The video and audio excerpts given by the search interface can not be shown in public unless you have an agreement with the Text Laboratory. Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
licensor:
actor Info:
actor Type: organization
organization Info:
organization Name: University of Oslo
organization Name: Universitetet i Oslo
organization Short Name: UiO
organization Short Name: UoO
department Name: Department of Linguistics and Scandinavian Studies
department Name: Institutt for lingvistiske og nordiske studier (ILN)
communication Info:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/english/
address: Box 1102 Blindern
zip Code: 0317
city: OSLO
country: Norway
distribution Rights Holder
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- organization Short Name: ILN
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
contact
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
metadata Info:
metadata Creation Date: 26.09.2018
metadata Last Date Updated: 22.11.2023
metadata Creator
- actor Info:
- actor Type: person
- person Info:
- surname: Hagen
- given Name: Kristin
- organization Info:
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info:
- email: kristin.hagen@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
version Info:
version: conllu version 2021; conllx and Glossa version November 2022
validation Info:
validated: true
validation Type: content
validation Mode: manual
validation Mode Details: The treebank is manually corrected by at least one person
validation Extent: partial
validator:
actor Info:
actor Type: organization
organization Info:
organization Name: The LIA project
organization Short Name: LIA
department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
communication Info:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zip Code: 0317
city: OSLO
country: Norway
resource Documentation Info:
documentation Unstructured:
role: documentation
document Unstructured: http://tekstlab.uio.no/LIA/trebank.html (In Norwegian)
documentation Structured:
role: documentation
document Info:
document Type: other
title: The LIA Treebank of Spoken Norwegian Dialects
author: Lilja Øvrelid, Andre Kåsen, Kristin Hagen, Anders Nøklestad, Per Erik Solberg and Janne Bondi Johannessen
editor: Nicoletta Calzolari et al
year: 2018
book Title: Proceedings of the Eleventh International Conference on Language Resources and Evaluation
url: http://www.lrec-conf.org/proceedings/lrec2018/summaries/642.html
resource Creation Info:
creation Start Date: 01.04.2014
creation End Date: 01.12.2022
resource Creator
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: The LIA project (Project participants and employees in the LIA project)
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://tekstlab.uio.no/LIA/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
funding Project:
project Info:
project Name: LIA (Language Infrastructure made Accessible)
project Short Name: LIA
project I D: 22 59 41
url: http://tekstlab.uio.no/LIA/
url: https://www.hf.uio.no/iln/english/research/projects/language-infrastructure-made-accessible/index.html
funding Type: nationalFunds
funder: The Research Council of Norway
funding Country: Norway
project Start Date: 04.01.2014
project End Date: 31.12.2019

corpus Info:
corpus Type: Treebank
corpus Part Info:
media Type: text
corpus Text Info:
text Format Info:
mime Type: Downloadable in two formats: conllx-format and conllu-format
size Per Text Format:
size Info:
size: 77 701
size Unit: tokens
size Info:
size: 7536 speech segments
size Unit: utterances
size Info:
size: 55 410
size Unit: tokens
size Info:
size: 5250 speech segments
size Unit: utterances
character Encoding Info:
character Encoding: utf-8
corpus Part Info:
media Type: audio
corpus Audio Info:
audio Size Info:
size Info:
size: 19 wav files
size Unit: files
setting Info:
naturality: spontaneous
conversational Type: dialogue
audience: few
interactivity: overlapping
interaction: Semiformal or informal interviews with one or more interviewers. Often the recordings are more like conversations. The recordings are mostly from people's homes.
audio Format Info:
mime Type: wav and mp3
compression Info:
compression: true
compression Name: mp3
corpus Part General Info:
person Source Set Info:
age Of Persons: teenager
age Of Persons: adult
age Of Persons: elderly
age Range Start: 14
age Range End: 91
sex Of Persons: mixed
origin Of Persons: native
dialect Accent Of Persons: Dialects from 17 places in Norway
geographic Distribution Of Persons: All over Norway
linguality Info:
linguality Type: monolingual
language Info:
language Id: No
language Name: Norwegian
language Info:
language Id: Nn
language Name: Norwegian Nynorsk
modality Info:
modality Type: spokenLanguage
modality Type Details: Norwegian dialects. Orthographic transcription
size Info:
size: 77 701
size Unit: tokens
size Info:
size: 7536 speech segments
size Unit: utterances
annotation Info:
annotation Type: speechAnnotation-orthographicTranscription
annotation Type: morphosyntacticAnnotation-posTagging
annotation Type: syntacticAnnotation-treebanks
annotation Description: Original version in conllx-format, annotated with morphological and dependency-style syntactic analysis. The treebank has also been automatically converted to the UD scheme and is available in conllu-format.
annotation Manual Structured:
role: annotationManual
document Info:
document Type: manual
title: NDT Guidelines for Morphological and Syntactic Annotation
author: Kari Kinn, Per Erik Solberg and Pål Kristian Eriksen. Translated from Norwegian to English by Per Erik Solberg
year: 2013
url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-10/
annotation Manual Structured:
role: annotationManual
document Info:
document Type: manual
title: Retningslinjer for syntaktisk annotasjon i LIA
author: Andre Kåsen, Kristin Hagen, Lilja Øvrelid, Signe Laake, Håvard Østli
year: 2019
url: http://tekstlab.uio.no/LIA/pdf/parseretningslinjer-lia12042019.pdf
annotation Tool:
target Resource Name U R I: https://ufal.mff.cuni.cz/tred/
annotation Tool:
target Resource Name U R I: Read about the annotation process in Norwegian: http://tekstlab.uio.no/LIA/verktoy.html
classification Info:
genre Info:
genre Type: speechGenre
genre: informal
unstandardised Genre: conversations and informal interviews
classification Info:
genre Info:
genre Type: speechGenre
genre: semi formal
unstandardised Genre: interviews
time Coverage Info:
time Coverage: 1958 – 1981
geographic Coverage Info:
geographic Coverage: 17 places from all over Norway

Download resources

Go to resource page: https://tekstlab.uio.no/glossa3/liatrebanken

dc:type	corpus
dc:title	LIA-trebanken
dc:identifier	oai:tekstlab.uio.no:lia-trebanken
dc:description	The LIA Treebank consists of 7536 speech segments and 77 701 words/tokens from the speech corpus LIA Norwegian. The treebank is morphologically and syntactically annotated and manually corrected. It is available in three versions: a downloadable version in conllx format, a searchable version in the Glossa search interface, and a downloadable version in conllu format. The conllu version is automatically converted to Universal Dependencies and contains 5250 speech segments and 55 410 words/tokens. LIA Norwegian is a speech corpus with old recordings (1939 – 1996) from four Norwegian universities: NTNU, UiB, UiO and UiT.
dc:publisher
dc:format	downloadable
dc:date	2014-04-01
dc:date	2022-12-01
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
dc:rights	http://creativecommons.org/licenses/by-nc-sa/4.0/
dc:creator	The LIA project (Project participants and employees in the LIA project)
dc:lang	Norwegian
dc:lang	Norwegian Nynorsk
