Norwegian idioms
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: Norske idiom
- resource Name: Norwegian idioms
- description: This dataset consists of 3537 Norwegian idioms and phrases that each appear more than 100 times in Nettbiblioteket, the online library of the National Library of Norway. There are 3455 idioms in Norwegian Bokmål and 88 in Norwegian Nynorsk. In the future we will try to add more idioms for Nynorsk. See the documentation file for a description of the dataset. The data can be used to measure a generative language model's ability to complete well-known idioms, or as a masked language modeling task.
- url:
- PID: hdl:21.11146/96
- identifier: sbr-96
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location:
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-ZERO (CC-ZERO)
- licence Url:
- licensor:
- actor Info:
- actor Type: organization
- role: Licensor
- organization Info:
- organization Name: Nasjonalbiblioteket
- organization Name: National Library of Norway
- organization Short Name: NB
- organization Short Name: NLN
- department Name: Språkbanken
- department Name: The Language Bank
- communication Info:
- email:
- url:
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- contact:
- actor Info:
- actor Type: organization
- role: Contact
- organization Info:
- organization Name: Nasjonalbiblioteket
- organization Name: National Library of Norway
- organization Short Name: NB
- organization Short Name: NLN
- department Name: Språkbanken
- department Name: The Language Bank
- communication Info:
- email:
- url:
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: Nasjonalbiblioteket
- organization Name: National Library of Norway
- organization Short Name: NB
- organization Short Name: NLN
- department Name: Språkbanken
- department Name: The Language Bank
- actor Info:
- actor Type: person
- role: Resource Creator
- person Info:
- surname: Enstad
- given Name: Tita
- affiliation:
- organization Info:
- organization Name: Nasjonalbiblioteket
- organization Name: National Library of Norway
- organization Short Name: NB
- organization Short Name: NLN
- department Name: Språkbanken
- department Name: The Language Bank
- corpus Info:
- corpus Type: Written Corpus
- corpus Part Info:
- media Type: text
- corpus Text Info:
- text Format Info:
- mime Type: application/json
- size Per Text Format:
- size Info:
- size: 3537
- size Unit: idiomaticExpressions
- character Encoding Info:
- character Encoding: UTF-8
- corpus Part General Info:
- linguality Info:
- linguality Type: multilingual
- multilinguality Type: other
- language Info:
- language Id: nb
- language Name: Norwegian Bokmål
- size Per Language:
- size Info:
- size: 3455
- size Unit: idiomaticExpressions
- language Info:
- language Id: nn
- language Name: Norwegian Nynorsk
- size Per Language:
- size Info:
- size: 88
- size Unit: idiomaticExpressions
- modality Info:
- modality Type: writtenLanguage
dc:type | corpus |
dc:title | Norwegian idioms |
dc:identifier | |
dc:description | This dataset consists of 3537 Norwegian idioms and phrases that appear more than 100 times in the online library of the National Library of Norway. There are 3455 idioms in Norwegian Bokmål and 88 in Norwegian Nynorsk. In the future we will try to add more idioms for Nynorsk. See the documentation file for a description of the dataset. The data can be used to measure a generative language model's ability to complete well-known idioms, or as a masked language modeling task. |
dc:publisher | |
dc:format | downloadable |
dc:date | |
dc:date | 2024-10-10 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-ZERO (CC-ZERO) |
dc:rights | |
dc:creator | Tita Enstad |
dc:lang | Norwegian Bokmål |
dc:lang | Norwegian Nynorsk |
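
The description mentions two possible uses: idiom completion with a generative model, and masked language modeling. The Python sketch below shows one way such test items could be built from a plain list of idiom strings. It is only an illustration under stated assumptions: the layout of the downloadable JSON file is not described in this record (see the documentation file), so the sketch works on in-memory strings; the function names, the `[MASK]` token, and the exact-match scoring are hypothetical choices, and the example idiom is a commonly cited Norwegian expression that is not confirmed to be in the dataset.

```python
from typing import Dict, List

# Assumption: a BERT-style mask token; replace with your model's own.
MASK_TOKEN = "[MASK]"


def completion_item(idiom: str) -> Dict[str, str]:
    """Split an idiom into a prompt (all words but the last) and a target (last word)."""
    words = idiom.split()
    if len(words) < 2:
        raise ValueError(f"idiom too short to split: {idiom!r}")
    return {"prompt": " ".join(words[:-1]), "target": words[-1]}


def masked_items(idiom: str) -> List[Dict[str, str]]:
    """Create one masked variant per word, each paired with the hidden word."""
    words = idiom.split()
    return [
        {"text": " ".join(words[:i] + [MASK_TOKEN] + words[i + 1:]), "target": word}
        for i, word in enumerate(words)
    ]


def exact_match_accuracy(predictions: List[str], targets: List[str]) -> float:
    """Share of predictions equal to their target, ignoring case and outer whitespace."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(
        p.strip().lower() == t.strip().lower() for p, t in zip(predictions, targets)
    )
    return hits / len(targets)


# Illustrative idiom only; not confirmed to be part of the dataset.
example = "det er ugler i mosen"
print(completion_item(example))
print(masked_items(example)[2])
```

Masking or predicting a single word keeps scoring simple; a real evaluation would likely need to handle multi-token targets and the model's own tokenization.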