Resources from the Resource Bank (Archive) - Språkbanken

Norwegian Newspaper Corpus Nynorsk

The Norwegian Newspaper Corpus (Nynorsk) is a freely accessible text corpus representing modern Norwegian in the written variety Norwegian Nynorsk. As of today, the material contains texts from 1998 …

Language:
Norwegian, Norwegian Nynorsk
Origin:
CLARINO Bergen Centre
Licence:
Creative_Commons-BY (CC-BY)
Type:
Text
Updated:
14.04.2025
Norwegian Newspaper Corpus annotated (2001-2009)

This is a subpart of Norsk aviskorpus, grammatically annotated and classified. It comprises 35,692,210 tokens and covers Norwegian Bokmål in the period 2001-2009. The full Norwegian Newspaper …

Language:
Norwegian, Norwegian Bokmål
Origin:
CLARINO Bergen Centre
Licence:
Creative_Commons-BY-NC (CC-BY-NC)
Type:
Text
Updated:
28.01.2025
Synthetic text images for North, South, Lule and Inari Sámi

This dataset contains synthetic line images meant for fitting OCR models for North, South, Lule and Inari Sámi. Clean line images are created using Pillow and they are subsequently distorted using …

Language:
Northern Sami, Southern Sami, Lule Sami, Inari Sami
Origin:
Language Bank
Licence:
Creative_Commons-BY (CC-BY)
Type:
Tool
Updated:
28.01.2025
OCR Models for Sámi Languages

This is a collection of models for OCR (optical character recognition) of Sámi languages. These can be used to recognize text in images of printed text (scanned books, magazines, etc.) in North …

Language:
Origin:
Language Bank
Licence:
Creative_Commons-BY (CC-BY)
Type:
Tool
Updated:
22.01.2025
Norwegian idioms

This dataset consists of 3537 Norwegian idioms and phrases that appear more than 100 times in the online library of the National Library of Norway. There are 3455 idioms in Norwegian Bokmål and 88 in …

Language:
Norwegian Bokmål, Norwegian Nynorsk
Origin:
Language Bank
Licence:
Creative_Commons-ZERO (CC-ZERO)
Type:
Text
Updated:
10.10.2024
Norwegian Government Press Conference Speech Corpus

The Norwegian Government Press Conference Speech Corpus (NorGovPCC) consists of approximately 138 hours of speech generated from audio with aligned subtitles from press conferences published by the …

Origin:
Language Bank
Licence:
Norwegian Licence for Open Government Data (NLOD)
Type:
Speech
Updated:
10.07.2024
TeflonNorL2

This page is currently a placeholder for the Norwegian data in the Teflon project. The Teflon project (https://teflon.aalto.fi/) aims at studying computer-assisted language learning for immigrant …

Language:
Norwegian
Origin:
Language Bank
Licence:
unspecified
Type:
Speech, Text
Updated:
23.03.2024
Grapheme-to-Phoneme Models for Norwegian Bokmål

This resource contains Grapheme-to-Phoneme (G2P) models for Norwegian Bokmål, which have been adapted to the G2P system Phonetisaurus (https://github.com/AdolfVonKleist/Phonetisaurus). The G2P models …

Language:
Norwegian Bokmål
Origin:
Language Bank
Licence:
Creative_Commons-ZERO (CC-ZERO)
Type:
Tool
Updated:
09.02.2024
Målfrid 2024 – Freely Available Documents from Norwegian State Institutions

This corpus consists of documents from 497 domains of Norwegian state institutions and comprises approximately 2.6 billion tokens in total. In addition to Norwegian Bokmål and Nynorsk texts, the …

Language:
Norwegian Bokmål, Norwegian Nynorsk, English, Northern Sami, Southern Sami, Lule Sami
Origin:
Language Bank
Licence:
Norwegian Licence for Open Government Data (NLOD)
Type:
Text
Updated:
31.01.2024
Glossa

Glossa is a tool for researchers who want to search linguistically annotated corpora. Glossa is designed to make it easy for researchers to:
- create complex searches
- explore the result via e.g. …

Language:
Origin:
CLARINO Text Laboratory Centre
Licence:
MIT license
Type:
Tool
Updated:
11.01.2024