National Library of Norway | Språkbanken
The Norwegian Language Bank
Resource Catalogue

Norwegian Newspaper Corpus annotated (2001-2009)
This is a subpart of Norsk aviskorpus, grammatically annotated and classified. It comprises 35 692 210 tokens and covers Norwegian Bokmål in the time span 2001-2009. The full Norwegian Newspaper …
Language: Norwegian, Norwegian Bokmål
Origin: CLARINO Bergen Centre
Licence: Creative Commons BY-NC (CC BY-NC)
Type: Text
Updated: 28.01.2025

Norwegian Newspaper Corpus Nynorsk
The Norwegian Newspaper Corpus (NNC) Nynorsk is a large monitor corpus representing contemporary Norwegian language in the written variety Norwegian Nynorsk. A corresponding corpus is available for …
Language: Norwegian, Norwegian Nynorsk
Origin: CLARINO Bergen Centre
Licence: Creative Commons BY (CC BY)
Type: Text
Updated: 28.01.2025

Synthetic text images for North, South, Lule and Inari Sámi
This dataset contains synthetic line images meant for fitting OCR models for North, South, Lule and Inari Sámi. Clean line images are created using Pillow and they are subsequently distorted using …
Origin: Language Bank
Licence: Creative Commons BY (CC BY)
Type: Tool
Updated: 28.01.2025

OCR Models for Sámi Languages
This is a collection of models for OCR (optical character recognition) of Sámi languages. These can be used to recognize text in images of printed text (scanned books, magazines, etc.) in North …
Origin: Language Bank
Licence: Creative Commons BY (CC BY)
Type: Tool
Updated: 22.01.2025

Norwegian idioms
This dataset consists of 3537 Norwegian idioms and phrases that appear more than 100 times in the online library of the National Library of Norway. There are 3455 idioms in Norwegian Bokmål and 88 in …
Language: Norwegian Bokmål, Norwegian Nynorsk
Origin: Language Bank
Licence: Creative Commons Zero (CC0)
Type: Text
Updated: 10.10.2024

Norwegian Government Press Conference Speech Corpus
The Norwegian Government Press Conference Speech Corpus (NorGovPCC) consists of approximately 138 hours of speech generated from audio with aligned subtitles from press conferences published by the …
Origin: Language Bank
Licence: Norwegian Licence for Open Government Data (NLOD)
Type: Speech
Updated: 10.07.2024

TeflonNorL2
This page is currently a placeholder for the Norwegian data in the Teflon project. The Teflon project (https://teflon.aalto.fi/) aims at studying computer assisted language learning for immigrant …
Language: Norwegian
Origin: Language Bank
Licence: Unspecified
Type: Speech, Text
Updated: 23.03.2024

Grapheme-to-Phoneme Models for Norwegian Bokmål
This resource contains Grapheme-to-Phoneme (G2P) models for Norwegian Bokmål, which have been adapted to the G2P system Phonetisaurus (https://github.com/AdolfVonKleist/Phonetisaurus). The G2P models …
Origin: Language Bank
Licence: Creative Commons Zero (CC0)
Type: Tool
Updated: 09.02.2024

Målfrid 2024 – Freely Available Documents from Norwegian State Institutions
This corpus consists of documents from 497 domains of Norwegian state institutions and comprises approximately 2.6 billion tokens in total. In addition to Norwegian Bokmål and Nynorsk texts, the …
Language: Norwegian Bokmål, Norwegian Nynorsk, English, Northern Sami, Southern Sami, Lule Sami
Origin: Language Bank
Licence: Norwegian Licence for Open Government Data (NLOD)
Type: Text
Updated: 31.01.2024

Glossa
Glossa is a tool for researchers who want to search linguistically annotated corpora. Glossa is designed to make it easy for researchers to: - create complex searches - explore the result via e.g. …
Origin: CLARINO Text Laboratory Centre
Licence: MIT license
Type: Tool
Updated: 11.01.2024