NST N-gram – Danish News Text
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: NST N-gram – Danish News Text
- resource Name: NST N-gram – dansk nyhendetekst
- description: This corpus contains n-grams derived from a 290-million-word corpus of Danish news text from the newspapers Berlingske Tidende, Ekstrabladet and Politiken. The time period covered is 1995-1999. The corpus was originally developed by Nordic Language Technology (NST) in the period 1997-2003. The n-grams were generated by Uni Research for the National Library and the Language Bank. Sequences of one to six words have been generated (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and ordered both by frequency and alphabetically. For convenience, a collection of the 1000 most frequent n-grams of all the types listed above is also available as a separate download.
- description: This corpus contains Danish n-grams drawn from a 290-million-word corpus of Danish news text from the newspapers Berlingske Tidende, Ekstrabladet and Politiken. The newspapers date from the period 1995-1999. The corpus was originally developed by Nordisk Språkteknologi (NST) in the period 1997-2003. The n-grams were produced by Uni Research for Nasjonalbiblioteket and Språkbanken. Sequences of one to six words were generated (unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and then sorted alphabetically and by frequency. A simplified version containing the 1000 most frequent n-grams of all the types mentioned above is also available for download.
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-28/
- P I D: hdl:21.11146/28
- identifier: sbr-28
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-28/
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-ZERO (CC-ZERO)
- licence Url: https://creativecommons.org/publicdomain/zero/1.0/
- licensor:
- actor Info:
- actor Type: organization
- role: Licensor
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- distribution Rights Holder:
- actor Info:
- actor Type: organization
- role: Distribution Rights Holder
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info:
- actor Type: organization
- role: Contact
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- actor Info:
- actor Type: person
- role: Metadata Creator
- person Info:
- surname: Birkenes
- given Name: Magnus Breder
- affiliation:
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- actor Info:
- actor Type: person
- role: Resource Creator
- person Info:
- surname: Hofland
- given Name: Knut
- affiliation:
- organization Info:
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UiB
- corpus Info:
- corpus Type: Ngram Corpus
- corpus Part Info:
- media Type: textNgram
- corpus Text Ngram Info:
- ngram Info:
- base Item: word
- order: 6
- text Format Info:
- mime Type: text/plain
- size Per Text Format:
- size Info:
- size: 290000000
- size Unit: words
- character Encoding Info:
- character Encoding: Windows
- corpus Part General Info:
- linguality Info:
- linguality Type: monolingual
- language Info:
- language Id: da
- language Name: Danish
- language Variety Info:
- language Variety Type: other
- language Variety Name: news text
- modality Info:
- modality Type: writtenLanguage
- modality Type Details: news text
- size Info:
- size: 290000000
- size Unit: words
- time Coverage Info:
- time Coverage: 1995-1999
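The description above says that sequences of one to six words were generated and ordered both by frequency and alphabetically. The following is a minimal sketch of that kind of processing, not the procedure actually used to build this resource. The function names (`count_ngrams`, `by_frequency`, `alphabetical`), the whitespace tokenization, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def count_ngrams(tokens: list[str], max_order: int = 6) -> dict[int, Counter]:
    """Count every n-gram of order 1..max_order in a token sequence."""
    counts = {n: Counter() for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        # Slide a window of n tokens over the sequence.
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def by_frequency(counter: Counter, limit=None):
    """Most frequent first; ties are broken alphabetically (an assumption)."""
    items = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return items if limit is None else items[:limit]


def alphabetical(counter: Counter):
    """N-grams in alphabetical order with their counts."""
    return sorted(counter.items())


# Toy input; the real corpus would need a proper tokenizer (assumption:
# plain whitespace splitting is used here only to keep the sketch short).
tokens = "det er en god dag og det er en fin dag".split()
counts = count_ngrams(tokens)

for ngram, freq in by_frequency(counts[2], limit=3):
    print(" ".join(ngram), freq)
```

Calling `by_frequency(counts[n], limit=1000)` for each order `n` would give a list comparable in shape to the separate download of the 1000 most frequent n-grams, under the same assumptions.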
dc:type | corpus |
dc:title | NST N-gram – Danish News Text |
dc:identifier | oai:nb.no:sbr-28 |
dc:description | This corpus contains Danish n-grams drawn from a 290-million-word corpus of Danish news text from the newspapers Berlingske Tidende, Ekstrabladet and Politiken. The newspapers date from the period 1995-1999. The corpus was originally developed by Nordisk Språkteknologi (NST) in the period 1997-2003. The n-grams were produced by Uni Research for Nasjonalbiblioteket and Språkbanken. Sequences of one to six words were generated (unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and then sorted alphabetically and by frequency. A simplified version containing the 1000 most frequent n-grams of all the types mentioned above is also available for download. |
dc:publisher | |
dc:format | downloadable |
dc:date | 2012-01-02 |
dc:date | 2012-06-11 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-ZERO (CC-ZERO) |
dc:rights | https://creativecommons.org/publicdomain/zero/1.0/ |
dc:creator | Knut Hofland |
dc:lang | Danish |