
Tagged Norwegian Bokmål texts from NBdigital

This corpus contains 4,807 morphologically tagged texts in Norwegian Bokmål from the National Library of Norway’s corpus of texts in the public domain. All of the texts were published after 1960.

The texts were tagged automatically with the Oslo-Bergen tagger (see http://www.tekstlab.uio.no/obt-ny/english/index.html), including syntactic disambiguation. On clean text, this should give an accuracy of approximately 96.5%. However, the texts were digitized and processed with automatic OCR (optical character recognition), with an average word confidence of approximately 90%, so the overall tagging accuracy of this corpus is probably considerably lower.
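
As a rough illustration of why the combined accuracy is likely lower, the sketch below simply multiplies the two figures. It rests on assumptions that are ours, not documented properties of the corpus: that the average OCR word confidence can stand in for the share of correctly read words, that a word can only be tagged correctly if it was read correctly, and that OCR errors and tagging errors are independent.

```python
# Back-of-the-envelope estimate (assumptions, not measured values):
# - 0.965 is the tagger's stated accuracy on clean text
# - 0.90 (average OCR word confidence) is used as a proxy for OCR word accuracy
# - OCR errors and tagging errors are treated as independent


def combined_accuracy(tagger_accuracy: float, ocr_word_accuracy: float) -> float:
    """Share of words that are both read correctly and tagged correctly,
    under the independence assumption above."""
    return tagger_accuracy * ocr_word_accuracy


est = combined_accuracy(0.965, 0.90)
print(round(est, 2))  # 0.87
```

This gives roughly 87%. Because a misread word can also mislead the disambiguation of its neighbors, the real figure may well be lower still.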

The data are stored as one XML file per text (book), using a simple XML structure. See the documentation file for an example.
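
The exact element names are given only in the documentation file, so the following is a schema-agnostic sketch using Python's standard library. It streams each file and counts how often every element name occurs, which can give a quick overview of the structure before writing a more specific parser. The helper names and the assumption that all files sit as `*.xml` in one directory are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# Hypothetical helpers for exploring the corpus; not part of any official tooling.


def count_elements(xml_path):
    """Count how often each element name occurs in one XML file.

    Uses iterparse so that large books are streamed rather than loaded whole.
    No element names are assumed; this only inspects whatever is present.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop contents of elements that have already been counted
    return counts


def summarize_corpus(corpus_dir):
    """Aggregate element counts over every *.xml file in a directory.

    Assumes (hypothetically) that the corpus files are stored flat in one directory.
    Returns the number of files read and the combined Counter.
    """
    total = Counter()
    n_files = 0
    for path in sorted(Path(corpus_dir).glob("*.xml")):
        total.update(count_elements(path))
        n_files += 1
    return n_files, total
```

For example, `n_files, counts = summarize_corpus("path/to/corpus")` followed by `counts.most_common(10)` lists the ten most frequent element names; `path/to/corpus` is a placeholder for wherever the downloaded files are stored.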
