Norwegian Newspaper Corpus Nynorsk

The Norwegian Newspaper Corpus (Nynorsk) is a freely accessible text corpus of modern Norwegian in the Nynorsk written variety. The material currently covers texts from 1998 to 2020 and contains 21 million running words (tokens). Through the search interface Corpuscle, you can search all tokens in the texts and sort results by source (newspaper name), year, and date. The corpus was compiled by daily harvesting and processing of texts published in the online editions of 11 major Norwegian newspapers, and was created by the Norwegian Newspaper Corpus project at the University of Bergen (1998–2012). Although the project has ended, the corpus is still regularly updated with new material via Corpuscle, making it a dynamic, growing corpus. A similar corpus is available for Norwegian Bokmål.
