N-gram – bokmål
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: N-gram – Norwegian Bokmål
- resource Name: N-gram – bokmål
- description: These n-grams (n=1-6) are based on the texts in the Norwegian Newspaper Corpus (24 online newspapers) and the news texts in the text corpus from Nordic Language Technology AS (NST). In total, the source material consists of 1175 million words of running text. The n-grams are provided in two sort orders: alphabetical and by frequency. Frequency lists (unigrams) are also published as a separate download. A simplified version listing the 1000 most frequent n-grams is also available for download.
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-12/
- PID: hdl:21.11146/12
- identifier: sbr-12
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-12/
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-ZERO (CC-ZERO)
- licence Url: https://creativecommons.org/publicdomain/zero/1.0/
- licensor:
- actor Info:
- actor Type: organization
- role: Licensor
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- distribution Rights Holder:
- actor Info:
- actor Type: organization
- role: Distribution Rights Holder
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info:
- actor Type: organization
- role: Contact
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- actor Info:
- actor Type: person
- role: Metadata Creator
- person Info:
- surname: Ohren
- given Name: Oddrun Pauline
- affiliation:
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: Acquisition and Bibliographic Services
- department Name: Tilvekst og kunnskapsorganisering
- actor Info:
- actor Type: person
- role: Resource Creator
- person Info:
- surname: Hofland
- given Name: Knut
- affiliation:
- organization Info:
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- corpus Info:
- corpus Type: Ngram Corpus
- corpus Part Info:
- media Type: textNgram
- corpus Text Ngram Info:
- ngram Info:
- base Item: word
- order: 6
- text Format Info:
- mime Type: text/csv
- size Per Text Format:
- size Info:
- size: 1175000000
- size Unit: words
- size Info:
- size: 7940352
- size Unit: unigrams
- character Encoding Info:
- character Encoding: Windows
- corpus Part General Info:
- linguality Info:
- linguality Type: monolingual
- language Info:
- language Id: nb
- language Name: Norwegian Bokmål
- language Variety Info:
- language Variety Type: other
- language Variety Name: news text
- modality Info:
- modality Type: writtenLanguage
- modality Type Details: from newspapers and web-based media
- size Info:
- size: 1175000000
- size Unit: words
- size Info:
- size: 7940352
- size Unit: unigrams
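
The description above mentions word n-grams of order 1 to 6, delivered in alphabetical order and in frequency order. The sketch below illustrates that idea on a toy sentence. It is not the procedure used to build this resource: the whitespace tokenization, the function names, and the tie-breaking rule in the frequency sort are all assumptions made for the example.

```python
from collections import Counter


def count_ngrams(tokens, max_order=6):
    """Count every word n-gram of order 1..max_order in a list of tokens."""
    counts = Counter()
    for n in range(1, max_order + 1):
        # Slide a window of n tokens across the sequence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sort_alphabetically(counts):
    """Return (ngram, frequency) pairs ordered by the n-gram's words."""
    return sorted(counts.items(), key=lambda item: item[0])


def sort_by_frequency(counts):
    """Return (ngram, frequency) pairs, most frequent first.

    Ties are broken alphabetically (an assumption for this sketch).
    """
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy input; naive whitespace tokenization is assumed here.
tokens = "det er en fin dag og det er en god dag".split()
counts = count_ngrams(tokens, max_order=3)

for ngram, freq in sort_by_frequency(counts)[:5]:
    print(freq, " ".join(ngram))
```

With `max_order=6` the same function would yield all six n-gram orders covered by the resource; a lower value is used here only to keep the toy output short.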
dc:type | corpus |
dc:title | N-gram – bokmål |
dc:identifier | oai:nb.no:sbr-12 |
dc:description | These n-grams (n=1-6) are based on the texts in the Norwegian Newspaper Corpus (24 online newspapers) and the news texts in the text corpus from Nordic Language Technology AS (NST). In total, the source material consists of 1175 million words of running text. The n-grams are provided in two sort orders: alphabetical and by frequency. Frequency lists (unigrams) are also published separately. A simplified version listing the 1000 most frequent n-grams is also available. |
dc:publisher | |
dc:format | downloadable |
dc:date | 2012-01-02 |
dc:date | 2012-06-12 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-ZERO (CC-ZERO) |
dc:rights | https://creativecommons.org/publicdomain/zero/1.0/ |
dc:creator | Knut Hofland |
dc:lang | bokmål |
Download resources
- ngram_nob.tar.gz
- 1gram_nob_abc.zip
- 1gram_nob_f1_abc.zip
- 1gram_nob_f1_freq.zip
- ngram_nob_1000.zip
- ngram_nob.pdf