Randomized extraction of the New Norwegian Corpus

Randomized extraction of the New Norwegian Corpus (Nynorskkorpuset).

Contains sentences in New Norwegian (Nynorsk) from the year 2000 onward. The data is tab-separated with one word per line, lemmatized and morphologically tagged, and includes year and domain information. Annotation was done with the Oslo-Bergen Tagger. Sentences in the Bokmål standard have been removed.
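As a rough illustration of reading the tab-separated, one-word-per-line layout described above, here is a minimal Python sketch. The column order (word form, lemma, morphological tags), the blank-line sentence separator, and the sample rows with their tag labels are all assumptions made for this example and are not taken from the corpus documentation. The name `read_sentences` is hypothetical. How year and domain information is stored is not specified here, so the sketch does not handle it.

```python
import io
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    form: str   # word as it appears in the text
    lemma: str  # lemmatized form
    tags: str   # morphological tags, kept as one raw string


def read_sentences(
    lines: Iterable[str],
    form_col: int = 0,   # ASSUMED column index for the word form
    lemma_col: int = 1,  # ASSUMED column index for the lemma
    tags_col: int = 2,   # ASSUMED column index for the tags
) -> Iterator[List[Token]]:
    """Yield sentences as lists of Token.

    Assumes (not confirmed by the source) that a blank line ends a sentence.
    """
    needed = max(form_col, lemma_col, tags_col)
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) <= needed:
            continue  # skip lines that lack the expected columns
        sentence.append(Token(fields[form_col], fields[lemma_col], fields[tags_col]))
    if sentence:
        yield sentence


# Invented sample rows, for illustration only (placeholder tag labels).
sample = "Eg\teg\tpron\nles\tlese\tverb\n\nHo\tho\tpron\nskriv\tskrive\tverb\n"
sentences = list(read_sentences(io.StringIO(sample)))
```

The column indices are parameters so they can be adjusted once the actual file layout has been checked against the downloaded data.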

This corpus is intended for use in the development of language technology.

Size: 3.3 million sentences, 57.5 million words.
