NST N-gram – Swedish
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: NST N-gram – Swedish
- resource Name: N-gram – svensk
- description: This collection of n-grams (n = 1–6) was produced from approximately 400 million words of running text in the Swedish text corpus of Nordic Language Technology AS. It contains all the n-grams in two orderings: sorted alphabetically and sorted by frequency. A second format is also available that makes it possible to select text types; this version contains more texts and is based on approximately 437 million words. A simplified version listing the 1,000 most frequent n-grams is available separately.
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-11/
- PID: hdl:21.11146/11
- identifier: sbr-11
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-11/
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-ZERO (CC-ZERO)
- licence Url: https://creativecommons.org/publicdomain/zero/1.0/
- licensor:
- actor Info:
- actor Type: organization
- role: Licensor
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- distribution Rights Holder:
- actor Info:
- actor Type: organization
- role: Distribution Rights Holder
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info:
- actor Type: organization
- role: Contact
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- actor Info:
- actor Type: person
- role: Metadata Creator
- person Info:
- surname: Ohren
- given Name: Oddrun Pauline
- affiliation:
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: Acquisition and Bibliographic Services
- department Name: Tilvekst og kunnskapsorganisering
- actor Info:
- actor Type: person
- role: Resource Creator
- person Info:
- surname: Hofland
- given Name: Knut
- affiliation:
- organization Info:
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- corpus Info:
- corpus Type: Ngram Corpus
- corpus Part Info:
- media Type: textNgram
- corpus Text Ngram Info:
- ngram Info:
- base Item: word
- order: 6
- text Format Info:
- mime Type: text/csv
- size Per Text Format:
- size Info:
- size: 436900000
- size Unit: words
- size Info:
- size: 4238495
- size Unit: unigrams
- size Info:
- size: 22.9
- size Unit: GB
- character Encoding Info:
- character Encoding: Windows
- corpus Part General Info:
- linguality Info:
- linguality Type: monolingual
- language Info:
- language Id: sv
- language Name: Swedish
- size Per Language:
- size Info:
- size: 436900000
- size Unit: words
- language Variety Info:
- language Variety Type: other
- language Variety Name: mixed genres
- modality Info:
- modality Type: writtenLanguage
- modality Type Details: news text, technical text and fiction
- size Info:
- size: 436900000
- size Unit: words
- size Info:
- size: 4238495
- size Unit: unigrams
- size Info:
- size: 22.9
- size Unit: GB
dc:type | corpus |
dc:title | NST N-gram – Swedish |
dc:identifier | oai:nb.no:sbr-11 |
dc:description | This collection of n-grams (n = 1–6) was produced from approximately 400 million words of running text in the Swedish text corpus of Nordic Language Technology AS. It contains all the n-grams in two orderings: sorted alphabetically and sorted by frequency. A second format is also available that makes it possible to select text types; this version contains more texts and is based on approximately 437 million words. A simplified version listing the 1,000 most frequent n-grams is available separately. |
dc:publisher | |
dc:format | downloadable |
dc:date | 2012-01-02 |
dc:date | 2012-06-11 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-ZERO (CC-ZERO) |
dc:rights | https://creativecommons.org/publicdomain/zero/1.0/ |
dc:creator | Knut Hofland |
dc:language | Swedish |
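The description states that the resource lists every word n-gram of order 1 to 6, once in alphabetical order and once by frequency, with a separate list of the 1,000 most frequent n-grams. The following is a minimal Python sketch of that kind of counting. It is not the procedure used to build the NST resource. It assumes simple whitespace tokenization, which the record does not describe. The function names `count_ngrams`, `sorted_views` and `top_k` are hypothetical, and the sketch applies the top-1,000 cut separately to each order n, which is also an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical helpers illustrating n-gram frequency lists (n = 1..6);
# whitespace tokenization is an assumption, not the NST method.

def count_ngrams(tokens: List[str], max_n: int = 6) -> Dict[int, Counter]:
    """Count every word n-gram of order 1..max_n in a token sequence."""
    counts: Dict[int, Counter] = {}
    for n in range(1, max_n + 1):
        # zip over n staggered slices of the token list yields each n-gram once
        grams = zip(*(tokens[i:] for i in range(n)))
        counts[n] = Counter(" ".join(g) for g in grams)
    return counts

def sorted_views(counter: Counter) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Return the same n-grams in two orderings: alphabetical and by frequency."""
    # plain code-point order, not locale-aware Swedish collation
    alphabetical = sorted(counter.items())
    # highest frequency first; ties broken alphabetically for a stable listing
    by_frequency = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return alphabetical, by_frequency

def top_k(counter: Counter, k: int = 1000) -> List[Tuple[str, int]]:
    """Simplified list: the k most frequent n-grams of one order."""
    return sorted_views(counter)[1][:k]

if __name__ == "__main__":
    tokens = "det var en gång en katt och en hund".split()
    counts = count_ngrams(tokens)
    alphabetical, by_frequency = sorted_views(counts[1])
    print(by_frequency[0])  # ('en', 3)
```

Python's default string sort compares code points, so `ä` (U+00E4) sorts before `å` (U+00E5). This differs from the Swedish alphabet order å, ä, ö. A Swedish alphabetical list would need locale-aware collation. At the scale of roughly 437 million words, an in-memory `Counter` for orders up to 6 would probably not fit in memory, and a sharded or disk-based count would be needed.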