
N-gram – Norwegian Bokmål News Text

This corpus contains n-grams in Norwegian Bokmål derived from the Norwegian Newspaper Corpus. The source data for the corpus is 665 million words of running text harvested from Norwegian news sources on the web (1998-2011). Sequences of one to six words have been generated (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and ordered by frequency. This work was done by Uni Research on behalf of the National Library and the Language Bank.

For convenience, a collection of the 1000 most frequent n-grams of all types listed above is also made available as a separate download.
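
To make the description above concrete, here is a minimal sketch of how 1- to 6-gram frequency lists can be produced from running text. It is not the pipeline Uni Research used. It assumes plain whitespace tokenization with no lowercasing or punctuation handling, and it breaks frequency ties alphabetically. The function names (`ngram_counts`, `ordered_by_frequency`, `top_k_all_orders`) are hypothetical. `top_k_all_orders` takes "the 1000 most frequent n-grams of all types" to mean 1000 overall across every order. The actual download may instead list the top entries per order.

```python
from collections import Counter


def ngram_counts(tokens: list[str], max_n: int = 6) -> dict[int, Counter]:
    """Count every contiguous word sequence of length 1..max_n.

    Assumption: `tokens` is already tokenized (e.g. by str.split());
    the real corpus may have used a different tokenizer.
    """
    counts: dict[int, Counter] = {}
    for n in range(1, max_n + 1):
        # zip over n shifted views of the token list yields each n-gram once
        counts[n] = Counter(zip(*(tokens[i:] for i in range(n))))
    return counts


def ordered_by_frequency(counter: Counter) -> list[tuple[tuple[str, ...], int]]:
    """Sort n-grams by descending frequency; ties are broken alphabetically."""
    return sorted(counter.items(), key=lambda item: (-item[1], item[0]))


def top_k_all_orders(counts: dict[int, Counter], k: int = 1000):
    """Merge all orders and keep the k most frequent n-grams overall.

    Hypothetical reading of "1000 most frequent n-grams of all types":
    n-grams of different lengths never collide because their tuple keys
    have different lengths.
    """
    merged: Counter = Counter()
    for counter in counts.values():
        merged.update(counter)
    return ordered_by_frequency(merged)[:k]


if __name__ == "__main__":
    tokens = "det er fint i dag og det er kaldt i dag".split()
    counts = ngram_counts(tokens, max_n=6)
    # Most frequent bigrams in the toy sentence
    print(ordered_by_frequency(counts[2])[:3])
    # Most frequent n-grams of any order
    print(top_k_all_orders(counts, k=5))
```

On the toy sentence, the bigram list starts with `('det', 'er')` and `('i', 'dag')`, each seen twice. On 665 million words you would stream the text and count in chunks rather than hold every n-gram in memory at once, but the counting and sorting logic stays the same.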
