LIA-trebanken
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: LIA-trebanken
- resource Name: The LIA Treebank
- description: The LIA Treebank includes 7536 speech segments and 77 701 tokens from LIA Norwegian. The treebank is annotated with morphological and dependency-style syntactic analysis and manually corrected. The treebank is available in three versions: a downloadable version in conllx format, a searchable version in the search interface Glossa, and a downloadable version in conllu format. The conllu version is automatically converted to Universal Dependencies and includes 5250 speech segments and 55 410 tokens. LIA Norwegian is a speech corpus with old recordings (1939 – 1996) from four Norwegian universities: NTNU, UoB, UoO and UoT.
- description: LIA-trebanken består av 7536 talemålssegment og 77 701 ord/token frå talespråkskorpuset LIA norsk. Trebanken er annotert morfologisk og syntaktisk og manuelt korrigert. LIA-trebanken er tilgjengelig i tre versjoner: en nedlastbar versjon i conllx-format, en søkbar versjon i søkegrensesnittet Glossa og en nedlastbar versjon i conllu-format. Conllu-versjonen er automatisk konvertert til Universal Dependencies og inneholder 5250 talemålssegment og 55 410 ord/token. LIA norsk er et talespråkskorpus med gamle opptak (1939 – 1996) fra fire norske universitet: NTNU, UiB, UiO og UiT.
- resource Short Name: The LIA Treebank
- resource Short Name: LIA-trebanken
- url: http://tekstlab.uio.no/LIA/norsk/index_english.html
- url: http://tekstlab.uio.no/LIA/trebank.html
- PID: http://hdl.handle.net/11538/0000-000C-368B-B
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://github.com/textlab/spoken_norwegian_resources/tree/master/treebanks/Norwegian-NynorskLIA
- download Location: https://github.com/UniversalDependencies/UD_Norwegian-NynorskLIA/
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
- licence Url: http://creativecommons.org/licenses/by-nc-sa/4.0/
- conditions Of Use: BY
- conditions Of Use: NC
- conditions Of Use: SA
- licensor:
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- department Name: Department of Linguistics and Scandinavian Studies
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Informatics
- department Name: Institutt for informatikk (IFI)
- communication Info:
- email: liljao@ifi.uio.no
- url: https://www.mn.uio.no/ifi/english/people/aca/liljao/index.html
- distribution Rights Holder:
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Informatics
- department Name: Institutt for informatikk (IFI)
- communication Info:
- email: liljao@ifi.uio.no
- url: https://www.mn.uio.no/ifi/english/people/aca/liljao/index.html
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info:
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/about/organization/text-laboratory/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- organization Short Name: ILN
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- actor Info:
- actor Type: person
- person Info:
- surname: Hagen
- given Name: Kristin
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: The LIA project (Project participants and employees in the LIA project)
- corpus Info:
- corpus Type: Treebank
- corpus Part Info:
- media Type: text
- corpus Text Info:
- text Format Info:
- mime Type: Downloadable in two formats: conllx-format and conllu-format
- size Per Text Format:
- size Info:
- size: 77 701
- size Unit: tokens
- size Info:
- size: 7536 speech segments
- size Unit: utterances
- size Info:
- size: 55 410
- size Unit: tokens
- size Info:
- size: 5250 speech segments
- size Unit: utterances
- character Encoding Info:
- character Encoding: utf-8
- corpus Part Info:
- media Type: audio
- corpus Audio Info:
- audio Size Info:
- size Info:
- size: 19 wav files
- size Unit: files
- setting Info:
- naturality: spontaneous
- conversational Type: dialogue
- audience: few
- interactivity: overlapping
- interaction: Semiformal or informal interviews with one or more interviewers. The recordings often resemble conversations more than interviews. Most were made in people's homes.
- audio Format Info:
- mime Type: wav and mp3
- compression Info:
- compression: true
- compression Name: mp3
- corpus Part General Info:
- person Source Set Info:
- age Of Persons: teenager
- age Of Persons: adult
- age Of Persons: elderly
- age Range Start: 14
- age Range End: 91
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: Dialects from 17 places in Norway
- geographic Distribution Of Persons: All over Norway
- linguality Info:
- linguality Type: monolingual
- language Info:
- language Id: No
- language Name: Norwegian
- language Info:
- language Id: Nn
- language Name: Norwegian Nynorsk
- modality Info:
- modality Type: spokenLanguage
- modality Type Details: Norwegian dialects. Orthographic transcription
- size Info:
- size: 77 701
- size Unit: tokens
- size Info:
- size: 7536 speech segments
- size Unit: utterances
- annotation Info:
- annotation Type: speechAnnotation-orthographicTranscription
- annotation Type: morphosyntacticAnnotation-posTagging
- annotation Type: syntacticAnnotation-treebanks
- annotation Description: Original version in conllx format, annotated with morphological and dependency-style syntactic analysis. The treebank has also been automatically converted to the UD scheme and is available in conllu format.
- annotation Manual Structured:
- role: annotationManual
- document Info:
- document Type: manual
- title: NDT Guidelines for Morphological and Syntactic Annotation
- author: Kari Kinn, Per Erik Solberg and Pål Kristian Eriksen. Translated from Norwegian to English by Per Erik Solberg
- year: 2013
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-10/
- annotation Manual Structured:
- role: annotationManual
- document Info:
- document Type: manual
- title: Retningslinjer for syntaktisk annotasjon i LIA
- author: Andre Kåsen, Kristin Hagen, Lilja Øvrelid, Signe Laake, Håvard Østli
- year: 2019
- url: http://tekstlab.uio.no/LIA/pdf/parseretningslinjer-lia12042019.pdf
- annotation Tool:
- target Resource Name URI: https://ufal.mff.cuni.cz/tred/
- annotation Tool:
- target Resource Name URI: Read about the annotation process in Norwegian: http://tekstlab.uio.no/LIA/verktoy.html
- classification Info:
- genre Info:
- genre Type: speechGenre
- genre: informal
- unstandardised Genre: conversations and informal interviews
- classification Info:
- genre Info:
- genre Type: speechGenre
- genre: semi formal
- unstandardised Genre: interviews
- time Coverage Info:
- time Coverage: 1958 – 1981
- geographic Coverage Info:
- geographic Coverage: 17 places from all over Norway
dc:type | corpus |
dc:title | LIA-trebanken |
dc:identifier | oai:tekstlab.uio.no:lia-trebanken |
dc:description | LIA-trebanken består av 7536 talemålssegment og 77 701 ord/token frå talespråkskorpuset LIA norsk. Trebanken er annotert morfologisk og syntaktisk og manuelt korrigert. LIA-trebanken er tilgjengelig i tre versjoner: en nedlastbar versjon i conllx-format, en søkbar versjon i søkegrensesnittet Glossa og en nedlastbar versjon i conllu-format. Conllu-versjonen er automatisk konvertert til Universal Dependencies og inneholder 5250 talemålssegment og 55 410 ord/token. LIA norsk er et talespråkskorpus med gamle opptak (1939 – 1996) fra fire norske universitet: NTNU, UiB, UiO og UiT. |
dc:publisher | |
dc:format | downloadable |
dc:date | 2014-04-01 |
dc:date | 2022-11-31 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY-NC-SA (CC-BY-NC-SA) |
dc:rights | http://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc:creator | The LIA project (Project participants and employees in the LIA project) |
dc:lang | Norwegian |
dc:lang | Norwegian Nynorsk |
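
The record gives segment and token counts for the conllu version. As a rough illustration of how such counts could be checked against a downloaded file, here is a minimal Python sketch. It assumes standard CoNLL-U conventions: comment lines start with `#`, sentences are separated by blank lines, and the first tab-separated column is the token ID. The function name `count_conllu` and the two sample sentences are invented for illustration and are not taken from the treebank.

```python
from typing import Iterable, Tuple


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentences, tokens) for an iterable of CoNLL-U lines."""
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Comment lines (sentence-level metadata) are not tokens.
            continue
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        # Count plain integer IDs only; multiword-token ranges ("1-2")
        # and empty nodes ("1.1") are skipped.
        if token_id.isdigit():
            tokens += 1
    # Handle a final sentence that is not followed by a blank line.
    if in_sentence:
        sentences += 1
    return sentences, tokens


# Invented sample data, not from the LIA Treebank.
SAMPLE = (
    "# sent_id = example-1\n"
    "1\tja\tja\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tdet\tdet\tPRON\t_\t_\t1\tdep\t_\t_\n"
    "3\tvar\tvere\tVERB\t_\t_\t1\tdep\t_\t_\n"
    "\n"
    "# sent_id = example-2\n"
    "1\tnei\tnei\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tda\tda\tADV\t_\t_\t1\tdep\t_\t_\n"
)

print(count_conllu(SAMPLE.splitlines()))
```

For a real file, the same function can be called on an open file handle, for example `with open(path, encoding="utf-8") as f: count_conllu(f)`, where `path` points to a locally downloaded conllu file. The result may differ from the figures in this record if the file counts segments or tokens under different conventions.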