N-gram – nyhetstekst på bokmål
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: N-gram – Norwegian Bokmål News Text
- resource Name: N-gram – nyhetstekst på bokmål
- description: This corpus contains n-grams in Norwegian Bokmål derived from the Norwegian Newspaper Corpus. The source data for the corpus is 665 million words of running text harvested from Norwegian news sources on the web (1998-2011). Sequences of one to six words have been generated (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and ordered by frequency. This work was done by Uni Research on behalf of the National Library and the Language Bank. For convenience, a collection of the 1000 most frequent n-grams of each type listed above is also available as a separate download.
- description: Dette korpuset inneholder n-grammer på bokmål, hentet ut fra Norsk aviskorpus. Tekstgrunnlaget for korpuset er 665 millioner ord med løpende tekst høstet fra forskjellige norske nettaviser. Sekvenser av ett til seks ord er generert (unigrammer, bigrammer, trigrammer, 4-grammer, 5-grammer og 6-grammer) og ordnet etter frekvens. Dette arbeidet ble gjort av Uni Research på vegne av Nasjonalbiblioteket og Språkbanken. For enkelhets skyld ble det også laget en samling med de 1000 mest frekvente n-grammene av alle typer nevnt ovenfor for nedlasting separat.
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-29/
- P I D: hdl:21.11146/29
- identifier: sbr-29
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-29/
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-ZERO (CC-ZERO)
- licence Url: https://creativecommons.org/publicdomain/zero/1.0/
- licensor:
- actor Info:
- actor Type: organization
- role: Licensor
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- distribution Rights Holder:
- actor Info:
- actor Type: organization
- role: Distribution Rights Holder
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info:
- actor Type: organization
- role: Contact
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- actor Info:
- actor Type: person
- role: Metadata Creator
- person Info:
- surname: Birkenes
- given Name: Magnus Breder
- affiliation:
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- actor Info:
- actor Type: person
- role: Resource Creator
- person Info:
- surname: Hofland
- given Name: Knut
- affiliation:
- organization Info:
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UiB
- corpus Info:
- corpus Type: Ngram Corpus
- corpus Part Info:
- media Type: textNgram
- corpus Text Ngram Info:
- ngram Info:
- base Item: word
- order: 6
- text Format Info:
- mime Type: text/plain
- size Per Text Format:
- size Info:
- size: 665000000
- size Unit: words
- character Encoding Info:
- character Encoding: Windows
- corpus Part General Info:
- linguality Info:
- linguality Type: monolingual
- language Info:
- language Id: nb
- language Name: Norwegian Bokmål
- size Per Language:
- size Info:
- size: 665000000
- size Unit: words
- language Variety Info:
- language Variety Type: other
- language Variety Name: news text
- modality Info:
- modality Type: writtenLanguage
- modality Type Details: news text
- size Info:
- size: 665000000
- size Unit: words
- time Coverage Info:
- time Coverage: 1998-2011
dc:type | corpus |
dc:title | N-gram – nyhetstekst på bokmål |
dc:identifier | oai:nb.no:sbr-29 |
dc:description | Dette korpuset inneholder n-grammer på bokmål, hentet ut fra Norsk aviskorpus. Tekstgrunnlaget for korpuset er 665 millioner ord med løpende tekst høstet fra forskjellige norske nettaviser. Sekvenser av ett til seks ord er generert (unigrammer, bigrammer, trigrammer, 4-grammer, 5-grammer og 6-grammer) og ordnet etter frekvens. Dette arbeidet ble gjort av Uni Research på vegne av Nasjonalbiblioteket og Språkbanken. For enkelhets skyld ble det også laget en samling med de 1000 mest frekvente n-grammene av alle typer nevnt ovenfor for nedlasting separat. |
dc:publisher | |
dc:format | downloadable |
dc:date | 2011-01-03 |
dc:date | 2011-12-22 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-ZERO (CC-ZERO) |
dc:rights | https://creativecommons.org/publicdomain/zero/1.0/ |
dc:creator | Knut Hofland |
dc:lang | bokmål |
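
The description states that sequences of one to six words were generated and ordered by frequency. As an illustration only, the following Python sketch shows one way such counts could be produced. It is not the procedure used by Uni Research, which the record does not describe. The function names `count_ngrams` and `most_frequent`, the whitespace tokenization, and the sample sentence are all assumptions made for this example.

```python
from collections import Counter


def count_ngrams(tokens, max_order=6):
    """Count all n-grams of order 1..max_order in a list of tokens.

    Returns a dict mapping each order n to a Counter keyed by n-gram tuples.
    """
    counts = {n: Counter() for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        # Slide a window of width n over the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def most_frequent(counter, k=1000):
    """Return the k most frequent n-grams as (ngram, count) pairs.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical sample text; whitespace splitting is an assumption,
# not the tokenization used for the actual corpus.
text = "det er en test og det er en prøve"
tokens = text.split()

counts = count_ngrams(tokens)
top_bigrams = most_frequent(counts[2], k=3)

for ngram, freq in top_bigrams:
    print(freq, " ".join(ngram))
```

The default `k=1000` mirrors the separate download of the 1000 most frequent n-grams mentioned in the description. A corpus of 665 million words would not fit this in-memory approach comfortably, so a real pipeline would likely count in batches or on disk.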