The Lexicographic Corpus for Norwegian Bokmål

The corpus consists of texts collected from available literature and prose from 1985 to 2013. It comprises five genres: non-fiction prose (45 %), fiction (35 %), newspapers/magazines (10 %), TV subtitles (5 %), and non-standardized, unpublished texts (5 %), about 100 million words in all.
The corpus is grammatically tagged with the original version of the Oslo-Bergen Tagger.
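
The genre shares above translate into rough word counts per genre. The following is a minimal sketch, assuming that "100 mill words" can be treated as exactly 100,000,000; the real total is only stated approximately, so the results are estimates. The names `GENRE_SHARES`, `TOTAL_WORDS`, and `estimate_genre_sizes` are illustrative and not part of the corpus documentation.

```python
# Genre shares as stated in the corpus description (percent of the whole corpus).
GENRE_SHARES = {
    "non-fiction prose": 45,
    "fiction": 35,
    "newspapers/magazines": 10,
    "TV subtitles": 5,
    "non-standardized, unpublished texts": 5,
}

# Assumption: the approximate figure "100 mill words" is taken as exact.
TOTAL_WORDS = 100_000_000


def estimate_genre_sizes(shares, total):
    """Return an approximate word count per genre using integer arithmetic."""
    if sum(shares.values()) != 100:
        raise ValueError("genre shares must sum to 100 %")
    return {genre: total * pct // 100 for genre, pct in shares.items()}


GENRE_SIZES = estimate_genre_sizes(GENRE_SHARES, TOTAL_WORDS)

if __name__ == "__main__":
    for genre, words in GENRE_SIZES.items():
        print(f"{genre}: about {words:,} words")
```

Under this assumption, non-fiction prose accounts for roughly 45 million words and each of the two smallest genres for roughly 5 million.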

Extended metadata

resource Common Info
- resource Type: corpus
- identification Info
  - resource Name: Leksikografisk bokmålskorpus
  - resource Name: The Lexicographic Corpus for Norwegian Bokmål
  - description: The corpus consists of texts collected from available literature and prose from 1985 to 2013. It comprises five genres: non-fiction prose (45 %), fiction (35 %), newspapers/magazines (10 %), TV subtitles (5 %), and non-standardized, unpublished texts (5 %), about 100 million words in all. The corpus is grammatically tagged with the original version of the Oslo-Bergen Tagger.
  - description: Korpuset består av tekster hentet fra tilgjengelig litteratur/prosa fra 1985 til 2013. Korpuset har tekster fra fem sjangere: sakprosa (45 %), skjønnlitteratur (35 %), aviser og periodika (10 %), TV-teksting (5 %) og upublisert materiale, småtrykk (5 %), alt i alt 100 mill. ord. Korpuset er grammatisk merket med den opprinnelige versjonen av Oslo-Bergen-taggeren.
  - resource Short Name: LBK2013
  - url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/lbk/
  - P I D: http://hdl.handle.net/11538/0000-000B-C022-5
- distribution Info
  - licence Info
    - user Category: Academic
    - distribution Access Medium: accessibleThroughInterface
    - execution Location: https://tekstlab.uio.no/glossa2/bokmal
    - licence
      - licence Family: CLARIN
      - licence Name: CLARIN_ACA-NC-LOC-ND
      - licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&NORED=1&ND=1
      - conditions Of Use: BY
      - conditions Of Use: ID
      - conditions Of Use: LOC
      - conditions Of Use: NC
      - conditions Of Use: ND
      - conditions Of Use: NORED
      - non Standard Conditions Of Use: Due to agreements with the third party copyright holders, the corpus is only available through Glossa, a search and post-processing tool developed by the Text Laboratory.
    - licensor:
    - actor Info
      - actor Type: organization
      - organization Info
        organization Name: University of Oslo
        organization Name: Universitetet i Oslo
        organization Short Name: UiO
        organization Short Name: UoO
        department Name: Department of Linguistics and Scandinavian Studies
        department Name: Institutt for lingvistiske og nordiske studier (ILN)
      - communication Info
        email: r.e.v.fjeld@iln.uio.no
        url: http://www.hf.uio.no/iln/english/
        address: Box 1102 Blindern
        zip Code: 0317
        city: OSLO
        country: Norway
    - distribution Rights Holder
      - actor Info
        actor Type: organization
        organization Info
        organization Name: University of Oslo
        organization Name: Universitetet i Oslo
        organization Short Name: UiO
        organization Short Name: UoO
        department Name: Department of Linguistics and Scandinavian Studies
        department Name: Institutt for lingvistiske og nordiske studier (ILN)
        communication Info
        email: tekstlab-post@iln.uio.no
        url: http://www.hf.uio.no/iln/english/
        address: Box 1102 Blindern
        zip Code: 0317
        city: OSLO
        country: Norway
- contact
  - actor Info
    - actor Type: organization
    - organization Info
      - organization Name: The Text Laboratory
      - organization Short Name: Textlab
      - department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
    - communication Info
      - email: tekstlab-post@iln.uio.no
      - url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
      - address: Box 1102 Blindern
      - zip Code: 0317
      - city: OSLO
      - country: Norway
  - actor Info
    - actor Type: person
    - person Info
      - surname: Fjeld
      - given Name: Ruth E. Vatvedt
      - affiliation:
      - organization Info
        organization Name: University of Oslo
        organization Name: Universitetet i Oslo
        organization Short Name: UiO
        organization Short Name: UoO
        department Name: Department of Linguistics and Scandinavian Studies
        department Name: Institutt for lingvistiske og nordiske studier (ILN)
- metadata Info
  - metadata Creation Date: 07.08.2015
  - metadata Last Date Updated: 12.12.2018
  - metadata Creator
    - actor Info
      - actor Type: person
      - person Info
        surname: Hagen
        given Name: Kristin
      - organization Info
        organization Name: The Text Laboratory
        organization Short Name: Textlab
        department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
      - communication Info
        email: kristin.hagen@iln.uio.no
        url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
        address: Box 1102 Blindern
        zip Code: 0317
        city: OSLO
        country: Norway
- version Info
  - version: 2013
- resource Documentation Info
  - documentation Unstructured
    - role: documentation
    - document Unstructured: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/lbk/
- resource Creation Info
  - creation End Date: 31.12.2013
  - resource Creator
    - actor Info
      - actor Type: organization
      - organization Info
        organization Name: University of Oslo
        organization Name: Universitetet i Oslo
        organization Short Name: UiO
        organization Short Name: UoO
        department Name: Department of Linguistics and Scandinavian Studies
        department Name: Institutt for lingvistiske og nordiske studier (ILN)
      - communication Info
        email: r.e.v.fjeld@iln.uio.no
        url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
        address: Box 1102 Blindern
        zip Code: 0317
        city: OSLO
        country: Norway
    - actor Info
      - actor Type: organization
      - organization Info
        organization Name: The Text Laboratory
        organization Short Name: Textlab
        department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
      - communication Info
        email: tekstlab-post@iln.uio.no
        url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
        address: Box 1102 Blindern
        zip Code: 0317
        city: OSLO
        country: Norway
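
The licence Url listed under distribution Info carries query parameters that appear to mirror the conditions Of Use entries in the same record. The sketch below uses Python's standard `urllib.parse` to extract the parameters set to `1`. Reading those flags as conditions of use is an inference from this record, not documented behaviour of the licence page, and the helper name `conditions_from_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Licence URL copied verbatim from the record above.
LICENCE_URL = (
    "https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca"
    "?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&NORED=1&ND=1"
)


def conditions_from_url(url):
    """Return the sorted query keys whose value is exactly '1'.

    Assumption: a flag set to 1 marks a condition of use. AFFIL=EDU is not
    a 0/1 flag and is therefore left out of the result.
    """
    query = parse_qs(urlsplit(url).query)
    return sorted(key for key, values in query.items() if values == ["1"])


CONDITIONS = conditions_from_url(LICENCE_URL)

if __name__ == "__main__":
    print(CONDITIONS)
```

For this URL the extracted keys are BY, ID, LOC, NC, ND and NORED, the same six values the record lists as conditions Of Use.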

Download resources

Go to resource page https://tekstlab.uio.no/glossa2/bokmal

Dublin Core (DC)

dc:type	corpus
dc:title	The Lexicographic Corpus for Norwegian Bokmål
dc:identifier	oai:tekstlab.uio.no:LBK2013
dc:description	The corpus consists of texts collected from available literature and prose from 1985 to 2013. It comprises five genres: non-fiction prose (45 %), fiction (35 %), newspapers/magazines (10 %), TV subtitles (5 %), and non-standardized, unpublished texts (5 %), about 100 million words in all. The corpus is grammatically tagged with the original version of the Oslo-Bergen Tagger.
dc:publisher
dc:format	accessibleThroughInterface
dc:date
dc:date	2013-12-31
dc:rights	Academic
dc:rights	CLARIN
dc:rights	CLARIN_ACA-NC-LOC-ND
dc:rights	https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&NORED=1&ND=1
dc:creator	University of Oslo
dc:creator	The Text Laboratory
dc:lang	Norwegian Bokmål
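
The Dublin Core record above is a list of tab-separated key/value lines in which some keys repeat (for example dc:rights) and some have no value (dc:publisher, the first dc:date). The following is a minimal sketch of one way to read such a listing into a dictionary of lists. It assumes the fields are separated by a single tab, as in this page dump; `parse_dc` and `DC_SAMPLE` are illustrative names, and the sample holds only a few lines of the record.

```python
from collections import defaultdict

# A few lines copied from the record above; fields are tab-separated.
DC_SAMPLE = "\n".join([
    "dc:type\tcorpus",
    "dc:identifier\toai:tekstlab.uio.no:LBK2013",
    "dc:publisher",
    "dc:date",
    "dc:date\t2013-12-31",
    "dc:rights\tAcademic",
    "dc:rights\tCLARIN",
    "dc:rights\tCLARIN_ACA-NC-LOC-ND",
    "dc:creator\tUniversity of Oslo",
    "dc:creator\tThe Text Laboratory",
])


def parse_dc(text):
    """Group values by Dublin Core key, skipping keys that have no value."""
    fields = defaultdict(list)
    for line in text.splitlines():
        key, _, value = line.partition("\t")
        key, value = key.strip(), value.strip()
        if key and value:
            fields[key].append(value)
    return dict(fields)


DC_FIELDS = parse_dc(DC_SAMPLE)

if __name__ == "__main__":
    for key, values in DC_FIELDS.items():
        print(key, values)
```

Keeping a list per key preserves repeated elements such as the two dc:creator entries, while empty elements are dropped rather than stored as blank strings.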
