Tagged Norwegian Bokmål texts from NBdigital
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: Tagged Norwegian Bokmål texts from NBdigital
- resource Name: Taggede bokmålstekster fra NBdigital
- description: This corpus contains 4,807 morphologically tagged texts in Norwegian Bokmål from the National Library of Norway's corpus of texts in the public domain. All texts were published after 1960. The texts were automatically tagged with the Oslo-Bergen tagger (see http://www.tekstlab.uio.no/obt-ny/english/index.html), with syntactic disambiguation. In theory, this should give an accuracy of approximately 96.5%. However, the texts were digitized and processed with OCR automatically (with an average word confidence of approximately 90%), so the overall accuracy is probably considerably lower. The data is stored as one XML file per text/book, with a simple XML structure. See the documentation file for an example.
- description: This corpus contains 4,807 morphologically tagged Bokmål texts from the National Library of Norway's corpus of texts that have entered the public domain or are not protected by copyright. All texts were published after 1960. The texts were automatically tagged with the Oslo-Bergen tagger (see http://www.tekstlab.uio.no/obt-ny/) and statistically disambiguated. This should give an accuracy of 96.5%. However, the texts were scanned and processed with OCR automatically (the average word confidence for this collection is approximately 90%), so the overall accuracy is probably considerably lower. The data is stored as one XML file per text/book, with a very simple XML structure. See the documentation file for an example.
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-43/
- PID: hdl:21.11146/43
- identifier: sbr-43
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-43/
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-ZERO (CC-ZERO)
- licence Url: https://creativecommons.org/publicdomain/zero/1.0/
- licensor:
- actor Info:
- actor Type: organization
- role: Licensor
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- distribution Rights Holder:
- actor Info:
- actor Type: organization
- role: Distribution Rights Holder
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info:
- actor Type: organization
- role: Contact
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- actor Info:
- actor Type: person
- role: Metadata Creator
- person Info:
- surname: Birkenes
- given Name: Magnus Breder
- affiliation:
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- actor Info:
- actor Type: person
- role: Resource Creator
- person Info:
- surname: Birkenes
- given Name: Magnus Breder
- affiliation:
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- corpus Info:
- corpus Type: Written Corpus
- corpus Part Info:
- media Type: text
- corpus Text Info:
- text Format Info:
- mime Type: text/xml
- size Per Text Format:
- size Info:
- size: 4807
- size Unit: texts
- size Info:
- size: 9.9
- size Unit: GB
- character Encoding Info:
- character Encoding: UTF-8
- corpus Part General Info:
- linguality Info:
- linguality Type: monolingual
- language Info:
- language Id: nb
- language Name: Norwegian Bokmål
- modality Info:
- modality Type: writtenLanguage
- size Info:
- size: 4807
- size Unit: texts
- size Info:
- size: 9.9
- size Unit: GB
- annotation Info:
- annotation Type: morphosyntacticAnnotation-posTagging
- segmentation Level: word
- annotation Mode: automatic
- time Coverage Info:
- time Coverage: 1960-2013
dc:type | corpus |
dc:title | Tagged Norwegian Bokmål texts from NBdigital |
dc:identifier | oai:nb.no:sbr-43 |
dc:description | This corpus contains 4,807 morphologically tagged texts in Norwegian Bokmål from the National Library of Norway's corpus of texts in the public domain. All texts were published after 1960. The texts were automatically tagged with the Oslo-Bergen tagger (see http://www.tekstlab.uio.no/obt-ny/english/index.html), with syntactic disambiguation. In theory, this should give an accuracy of approximately 96.5%. However, the texts were digitized and processed with OCR automatically (with an average word confidence of approximately 90%), so the overall accuracy is probably considerably lower. The data is stored as one XML file per text/book, with a simple XML structure. See the documentation file for an example. |
dc:publisher | |
dc:format | downloadable |
dc:date | 2016-01-12 |
dc:date | 2016-02-29 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-ZERO (CC-ZERO) |
dc:rights | https://creativecommons.org/publicdomain/zero/1.0/ |
dc:creator | Magnus Breder Birkenes |
dc:lang | Norwegian Bokmål |
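
The description states that the data is stored as one XML file per text/book, but the element names are given only in the documentation file and are not reproduced in this record. The following is a minimal sketch, assuming the download has been unpacked into a single directory of `.xml` files, of how one might inspect the structure without knowing the schema in advance. The function names `count_tags` and `summarize_corpus` are hypothetical helpers written for this illustration, not part of any tool distributed with the corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_tags(xml_path):
    """Count element tag names in one XML file, streaming to limit memory use."""
    counts = Counter()
    # iterparse yields each element when its closing tag is read,
    # so no element names need to be known beforehand.
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's children and text once counted
    return counts


def summarize_corpus(corpus_dir):
    """Sum tag counts over every *.xml file directly inside corpus_dir."""
    total = Counter()
    for path in sorted(Path(corpus_dir).glob("*.xml")):
        total += count_tags(path)
    return total
```

Calling `summarize_corpus("path/to/unpacked/corpus").most_common(10)` (the path is a placeholder) would list the ten most frequent element names, which may be a reasonable starting point before consulting the documentation file for their meaning. Streaming with `iterparse` is chosen here because the corpus is listed at 9.9 GB, so loading whole trees into memory may be impractical for the larger files.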