Norwegian Newspaper Corpus Nynorsk

The Norwegian Newspaper Corpus (Nynorsk) is a freely accessible text corpus representing modern Norwegian in its Nynorsk written variety. The material currently covers texts from 1998 to 2020 and comprises about 21 million running words (tokens). Through the search interface Corpuscle, you can search all tokens and sort the results by source (newspaper name), year and date. The corpus is compiled through daily harvesting and processing of texts published in the online editions of 11 major Norwegian newspapers, and was created by the Norwegian Newspaper Corpus project at the University of Bergen (1998–2012). Although the project has ended, new material is still added regularly and made available through Corpuscle, so the corpus remains dynamic and growing. A similar corpus is also available for Norwegian Bokmål.

Extended metadata

resource Common Info:
resource Type: corpus
identification Info:
resource Name: Norwegian Newspaper Corpus Nynorsk
resource Name: Norsk aviskorpus nynorsk
description: The Norwegian Newspaper Corpus (Nynorsk) is a freely accessible text corpus representing modern Norwegian in its Nynorsk written variety. The material currently covers texts from 1998 to 2020 and comprises about 21 million running words (tokens). Through the search interface Corpuscle, you can search all tokens and sort the results by source (newspaper name), year and date. The corpus is compiled through daily harvesting and processing of texts published in the online editions of 11 major Norwegian newspapers, and was created by the Norwegian Newspaper Corpus project at the University of Bergen (1998–2012). Although the project has ended, new material is still added regularly and made available through Corpuscle, so the corpus remains dynamic and growing. A similar corpus is also available for Norwegian Bokmål.
description: The Norwegian Newspaper Corpus (Nynorsk) is an openly available text corpus representing modern Norwegian in the written variety Nynorsk. As of today, the corpus contains approx. 21 million words from 1998 to 2020, and you can search the running words (tokens) and sort by source (newspaper name), year and date. The corpus is built through daily harvesting and processing of published texts from the online editions of 11 major Norwegian newspapers, and was established through the project Norsk aviskorpus at the University of Bergen (1998–2012). Although the project has ended, the corpus is regularly updated with new material in the search interface Korpuskel as part of CLARINO operations, and is thus a dynamic, growing corpus. A corresponding corpus is also available for Norwegian Bokmål.
resource Short Name: NNC nynorsk
url: http://clarino.uib.no/korpuskel/landing-page?identifier=avis-nno&view=short
PID: hdl:11495/D9F7-42A1-ED26-0
identifier: avis-nno
distribution Info:
licence Info:
user Category: Public
distribution Access Medium: accessibleThroughInterface
download Location: http://clarino.uib.no/korpuskel/landing-page?identifier=avis-plain
licence:
licence Family: Creative Commons (CC)
licence Name: Creative_Commons-BY (CC-BY)
licence Url: http://creativecommons.org/licenses/by/4.0/
conditions Of Use: BY
ipr Holder
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: Uni Research AS
- department Name: Uni Research Computing
- communication Info:
- email: knut.hofland@uni.no
- url: http://uni.no/nb/staff/directory/knut-hofland/
- city: Bergen
- country: Norway
- telephone Number: +47 5558 9463
contact
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: CLARINO Bergen Centre
- organization Name: CLARINO Bergen senter
- communication Info:
- email: clarin@uib.no
- url: https://clarino.uib.no/
- actor Info:
- actor Type: person
- person Info:
- surname: Meurer
- given Name: Paul
- sex: male
- position: Senior engineer
- affiliation:
- organization Info:
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: University of Bergen Library
- department Name: Universitetsbiblioteket
- communication Info:
- email: clarin@uib.no
- url: https://repo.clarino.uib.no/
- url: https://clarin.b.uib.no
- city: Bergen
- country: Norway
metadata Info:
metadata Creation Date: 29.09.2015
metadata Last Date Updated: 14.04.2025
metadata Creator
- actor Info:
- actor Type: person
- person Info:
- surname: Lyse
- given Name: Gunn Inger
- sex: female
- position: Researcher (Ph.D)
- affiliation:
- organization Info:
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- communication Info:
- email: clarin@uib.no
resource Documentation Info:
documentation Unstructured:
role: documentation
document Unstructured: Andersen, Gisle, and Knut Hofland. 2012. “Building a Large Corpus Based on Newspapers from the Web.” In Exploring Newspaper Language: Using the Web to Create and Investigate a Large Corpus of Modern Norwegian, edited by Gisle Andersen, 1–28. Studies in Corpus Linguistics 49. Amsterdam/Philadelphia: John Benjamins Publishing Company
resource Creation Info:
creation Start Date: 1998
funding Project:
project Info:
project Name: Norsk aviskorpus
url: http://avis.uib.no/avis/om-aviskorpuset/english
funding Type: nationalFunds
funder: Research Council of Norway
funding Country: Norway
project Start Date: 1998
project End Date: 2012

Download metadata

dc:type	corpus
dc:title	Norwegian Newspaper Corpus Nynorsk
dc:identifier	oai:clarino.uib.no:avis-nno
dc:description	The Norwegian Newspaper Corpus (Nynorsk) is a freely accessible text corpus representing modern Norwegian in its Nynorsk written variety. The material currently covers texts from 1998 to 2020 and comprises about 21 million running words (tokens). Through the search interface Corpuscle, you can search all tokens and sort the results by source (newspaper name), year and date. The corpus is compiled through daily harvesting and processing of texts published in the online editions of 11 major Norwegian newspapers, and was created by the Norwegian Newspaper Corpus project at the University of Bergen (1998–2012). Although the project has ended, new material is still added regularly and made available through Corpuscle, so the corpus remains dynamic and growing. A similar corpus is also available for Norwegian Bokmål.
dc:publisher
dc:format	accessibleThroughInterface
dc:date	1998
dc:date
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-BY (CC-BY)
dc:rights	http://creativecommons.org/licenses/by/4.0/
dc:lang	Norwegian
dc:lang	Norwegian Nynorsk
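
The Dublin Core listing above repeats some fields (dc:rights, dc:lang) and leaves others empty (dc:publisher, the second dc:date). A minimal sketch of reading such a listing into a multi-valued record might look as follows. It assumes the listing is available as tab-separated "field<TAB>value" lines, as displayed here; the actual download format may differ, and the function name parse_dc is hypothetical.

```python
# Sample lines copied from the Dublin Core listing above (tab-separated).
DC_LINES = [
    "dc:type\tcorpus",
    "dc:title\tNorwegian Newspaper Corpus Nynorsk",
    "dc:identifier\toai:clarino.uib.no:avis-nno",
    "dc:publisher",
    "dc:date\t1998",
    "dc:date",
    "dc:rights\tPublic",
    "dc:rights\tCreative Commons (CC)",
    "dc:lang\tNorwegian",
    "dc:lang\tNorwegian Nynorsk",
]


def parse_dc(lines):
    """Group 'field<TAB>value' lines into {field: [values]}.

    Hypothetical helper: repeated fields accumulate their values in order,
    and fields listed without a value map to an empty list.
    """
    record = {}
    for line in lines:
        field, _, value = line.partition("\t")
        field, value = field.strip(), value.strip()
        if not field:
            continue  # skip blank lines
        values = record.setdefault(field, [])
        if value:
            values.append(value)
    return record


record = parse_dc(DC_LINES)
print(record["dc:lang"])
print(record["dc:publisher"])
```

Keeping every field as a list avoids silently dropping the second dc:rights or dc:lang value, and an empty list makes the missing publisher explicit rather than absent.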
