
The Lexicographic Corpus for Norwegian Bokmål

The corpus consists of texts collected from available literature and prose from 1985 to 2013. It covers five genres: non-fiction prose (45 %), fiction (35 %), newspapers/magazines (10 %), TV subtitles (5 %), and non-standardized, unpublished texts (5 %), totalling 100 million words.
The corpus is grammatically tagged with the original version of the Oslo-Bergen Tagger.
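The genre percentages can be turned into approximate absolute sizes. The sketch below assumes that the stated total of 100 million words and the percentages are exact, although both are probably rounded. The names `GENRE_SHARES`, `TOTAL_WORDS` and `approximate_word_counts` are illustrative and not part of any corpus tooling.

```python
# Genre shares (percent) as stated in the corpus description.
GENRE_SHARES = {
    "non-fiction prose": 45,
    "fiction": 35,
    "newspapers/magazines": 10,
    "TV subtitles": 5,
    "non-standardized, unpublished texts": 5,
}

# Assumption: the stated "100 million words" is treated as an exact total.
TOTAL_WORDS = 100_000_000


def approximate_word_counts(shares, total):
    """Return an approximate word count per genre from percentage shares."""
    if sum(shares.values()) != 100:
        raise ValueError("genre shares must sum to 100 percent")
    # Integer arithmetic avoids floating-point rounding in the estimates.
    return {genre: total * pct // 100 for genre, pct in shares.items()}


if __name__ == "__main__":
    for genre, words in approximate_word_counts(GENRE_SHARES, TOTAL_WORDS).items():
        print(f"{genre:>40}: ~{words:,} words")
```

Because the published figures are rounded, the resulting counts (for example, about 45 million words of non-fiction prose) should be read as estimates, not exact token counts.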
