N-gram – Norwegian Bokmål News Text

This corpus contains n-grams in Norwegian Bokmål derived from the Norwegian Newspaper Corpus. The source data is 665 million words of running text harvested from Norwegian news sources on the web (1998-2011). Sequences of one to six words (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) have been generated and ordered by frequency. This work was done by Uni Research on behalf of the National Library and the Language Bank.

For convenience, a collection of the 1000 most frequent n-grams of all types listed above is also made available as a separate download.
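The generation step described above (sequences of one to six words, ordered by frequency, with a shortlist of the most frequent entries) can be illustrated with a minimal sketch. It is not the procedure Uni Research used: whitespace tokenization, the function names `count_ngrams` and `most_frequent`, and the alphabetical tie-breaking are all assumptions made for illustration, and the record does not say how the corpus handles sentence boundaries, casing, or punctuation.

```python
from collections import Counter


def count_ngrams(tokens, max_n=6):
    """Return {n: Counter of n-gram tuples} for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        # Slide a window of width n over the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def most_frequent(counter, k=1000):
    """Return the k most frequent n-grams as (ngram, count) pairs."""
    # Descending frequency; ties broken alphabetically (an assumption,
    # chosen only to make the order deterministic).
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy input; whitespace splitting is an assumed tokenization.
tokens = "det er en fin dag og det er en god dag".split()
counts = count_ngrams(tokens)
for ngram, freq in most_frequent(counts[2], k=3):
    print(" ".join(ngram), freq)
```

Keeping one `Counter` per n-gram order mirrors the way the resource is described, with each order listed separately and ranked by frequency.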

Extended metadata

resource Common Info:
resource Type: corpus
identification Info:
resource Name: N-gram – Norwegian Bokmål News Text
resource Name: N-gram – nyhetstekst på bokmål
description: This corpus contains n-grams in Norwegian Bokmål derived from the Norwegian Newspaper Corpus. The source data is 665 million words of running text harvested from Norwegian news sources on the web (1998-2011). Sequences of one to six words (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) have been generated and ordered by frequency. This work was done by Uni Research on behalf of the National Library and the Language Bank. For convenience, a collection of the 1000 most frequent n-grams of all types listed above is also made available as a separate download.
description: Dette korpuset inneholder n-grammer på bokmål, hentet ut fra Norsk aviskorpus. Tekstgrunnlaget for korpuset er 665 millioner ord med løpende tekst høstet fra forskjellige norske nettaviser. Sekvenser av ett til seks ord er generert (unigrammer, bigrammer, trigrammer, 4-grammer, 5-grammer og 6-grammer) og ordnet etter frekvens. Dette arbeidet ble gjort av Uni Research på vegne av Nasjonalbiblioteket og Språkbanken. For enkelhets skyld ble det også laget en samling med de 1000 mest frekvente n-grammene av alle typer nevnt ovenfor for nedlasting separat.
url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-29/
PID: hdl:21.11146/29
identifier: sbr-29
distribution Info:
licence Info:
user Category: Public
distribution Access Medium: downloadable
download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-29/
licence:
licence Family: Creative Commons (CC)
licence Name: Creative_Commons-ZERO (CC-ZERO)
licence Url: https://creativecommons.org/publicdomain/zero/1.0/
licensor:
actor Info:
actor Type: organization
role: Licensor
organization Info:
organization Name: National Library of Norway
organization Name: Nasjonalbiblioteket
organization Short Name: NLN
organization Short Name: NB
department Name: The Language Bank
department Name: Språkbanken
communication Info:
email: sprakbanken@nb.no
url: https://www.nb.no/sprakbanken/
address: P.O. Box 2674 Solli
zip Code: 0203
city: Oslo
region: Oslo
country: Norway
distribution Rights Holder
- actor Info:
- actor Type: organization
- role: Distribution Rights Holder
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
contact
- actor Info:
- actor Type: organization
- role: Contact
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
metadata Info:
metadata Creation Date: 08.10.2015
metadata Language Name: English
metadata Language Id: en
metadata Last Date Updated: 05.07.2023
metadata Creator
- actor Info:
- actor Type: person
- role: Metadata Creator
- person Info:
- surname: Birkenes
- given Name: Magnus Breder
- affiliation:
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info:
- actor Type: person
- role: Metadata Creator
- person Info:
- surname: Lindstad
- given Name: Arne Martinus
- affiliation:
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
version Info:
version: 1.0
last Date Updated: 22.12.2011
validation Info:
validated: false
resource Creation Info:
creation Start Date: 03.01.2011
creation End Date: 22.12.2011
resource Creator
- actor Info:
- actor Type: person
- role: Resource Creator
- person Info:
- surname: Hofland
- given Name: Knut
- affiliation:
- organization Info:
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UiB

Download metadata https://www.nb.no/sprakbanken/oai?verb=GetRecord&identifier=oai:nb.no:sbr-29&metadataPrefix=cmdi
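The metadata link above follows the OAI-PMH request pattern, with `verb`, `identifier` and `metadataPrefix` as query parameters. As a minimal sketch, the same URL can be assembled programmatically; the helper name `get_record_url` is hypothetical, and the base URL, identifier and `cmdi` prefix are copied from the link in this record rather than from any service documentation.

```python
from urllib.parse import urlencode

# Base URL taken from the download link in this record.
BASE_URL = "https://www.nb.no/sprakbanken/oai"


def get_record_url(identifier, metadata_prefix="cmdi"):
    """Build an OAI-PMH GetRecord URL (hypothetical helper for illustration)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        },
        safe=":",  # keep the colons in the OAI identifier unescaped
    )
    return f"{BASE_URL}?{query}"


url = get_record_url("oai:nb.no:sbr-29")
print(url)
```

The sketch only builds the request string; fetching and parsing the response is left out because the structure of the returned record is not described here.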

dc:type	corpus
dc:title	N-gram – Norwegian Bokmål News Text
dc:identifier	oai:nb.no:sbr-29
dc:description	This corpus contains n-grams in Norwegian Bokmål derived from the Norwegian Newspaper Corpus. The source data is 665 million words of running text harvested from Norwegian news sources on the web (1998-2011). Sequences of one to six words (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) have been generated and ordered by frequency. This work was done by Uni Research on behalf of the National Library and the Language Bank. For convenience, a collection of the 1000 most frequent n-grams of all types listed above is also made available as a separate download.
dc:publisher
dc:format	downloadable
dc:date	2011-01-03
dc:date	2011-12-22
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-ZERO (CC-ZERO)
dc:rights	https://creativecommons.org/publicdomain/zero/1.0/
dc:creator	Knut Hofland
dc:lang	Norwegian Bokmål
