NoWaC v 1.0 (Norwegian Web as Corpus)

A web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus was built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. NoWaC was built with permission from the Norwegian Ministry of Culture (Kulturdepartementet).

The corpus contains no information about author, publisher, genre, etc.

NoWaC can be downloaded (scrambled version) or accessed through a search interface (Glossa).

Extended metadata

resource Common Info
- resource Type: corpus
- identification Info
  - resource Name: NoWaC v 1.0 (Norwegian Web as Corpus)
  - description: Web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus was built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. NoWaC was built with permission from the Norwegian Ministry of Culture (Kulturdepartementet). The corpus contains no information about author, publisher, genre, etc. NoWaC can be downloaded (scrambled version) or accessed through a search interface (Glossa).
  - resource Short Name: NoWaC
  - url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
  - P I D: http://hdl.handle.net/11538/0000-0005-E7C0-D
- distribution Info
  - licence Info
    - user Category: Public
    - distribution Access Medium: downloadable
    - download Location: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
    - licence
      - licence Family: Creative Commons (CC)
      - licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
      - licence Url: http://creativecommons.org/licenses/by-nc-sa/2.0/
      - conditions Of Use: BY
      - conditions Of Use: NC
      - conditions Of Use: SA
    - licensor:
    - actor Info
      - actor Type: organization
      - organization Info
        organization Name: University of Oslo
        organization Name: Universitetet i Oslo
        organization Short Name: UiO
        organization Short Name: UoO
        department Name: Department of Linguistics and Scandinavian Studies
        department Name: Institutt for lingvistiske og nordiske studier (ILN)
      - communication Info
        email: tekstlab-post@iln.uio.no
        url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
        address: Box 1102 Blindern
        zip Code: 0317
        city: OSLO
        country: Norway
    - distribution Rights Holder
      - actor Info
        actor Type: organization
        organization Info
        organization Name: University of Oslo
        organization Name: Universitetet i Oslo
        organization Short Name: UiO
        organization Short Name: UoO
        department Name: Department of Linguistics and Scandinavian Studies
        department Name: Institutt for lingvistiske og nordiske studier (ILN)
        communication Info
        email: tekstlab-post@iln.uio.no
        url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
        address: Box 1102 Blindern
        zip Code: 0317
        city: OSLO
        country: Norway
  - licence Info
    - user Category: Academic
    - distribution Access Medium: accessibleThroughInterface
    - execution Location: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
    - licence
      - licence Family: CLARIN
      - licence Name: CLARIN_ACA-NC-LOC-ND
      - licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&NORED=1&ND=1
      - conditions Of Use: BY
      - conditions Of Use: ID
      - conditions Of Use: LOC
      - conditions Of Use: NC
      - conditions Of Use: ND
      - conditions Of Use: NORED
      - non Standard Conditions Of Use: The unscrambled corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory.
    - licensor:
    - actor Info
      - actor Type: organization
      - organization Info
        organization Name: University of Oslo
        organization Name: Universitetet i Oslo
        organization Short Name: UiO
        organization Short Name: UoO
        department Name: Department of Linguistics and Scandinavian Studies
        department Name: Institutt for lingvistiske og nordiske studier (ILN)
      - communication Info
        email: tekstlab-post@iln.uio.no
        url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
        address: Box 1102 Blindern
        zip Code: 0317
        city: OSLO
        country: Norway
    - distribution Rights Holder
      - actor Info
        actor Type: organization
        organization Info
        organization Name: University of Oslo
        organization Name: Universitetet i Oslo
        organization Short Name: UiO
        organization Short Name: UoO
        department Name: Department of Linguistics and Scandinavian Studies
        department Name: Institutt for lingvistiske og nordiske studier (ILN)
        communication Info
        email: tekstlab-post@iln.uio.no
        url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
        address: Box 1102 Blindern
        zip Code: 0317
        city: OSLO
        country: Norway
  - ipr Holder
    - actor Info
      - actor Type: person
      - person Info
        surname: Guevara
        given Name: Emiliano
        affiliation:
        organization Info
        organization Name: The Text Laboratory
        department Name: Department of Linguistics and Scandinavian Studies
- contact
  - actor Info
    - actor Type: organization
    - organization Info
      - organization Name: The Text Laboratory
      - organization Short Name: Textlab
      - department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
    - communication Info
      - email: tekstlab-post@iln.uio.no
      - url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
      - address: Box 1102 Blindern
      - zip Code: 0317
      - city: OSLO
      - country: Norway
- metadata Info
  - metadata Creation Date: 28.11.2014
  - metadata Last Date Updated: 05.06.2018
  - metadata Creator
    - actor Info
      - actor Type: person
      - person Info
        surname: Hagen
        given Name: Kristin
        sex: female
      - organization Info
        organization Name: The Text Laboratory
        organization Short Name: Textlab
        department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
      - communication Info
        email: kristin.hagen@iln.uio.no
        url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
        address: Box 1102 Blindern
        zip Code: 0317
        city: OSLO
        country: Norway
- version Info
  - version: v 1.0
- resource Documentation Info
  - documentation Unstructured
    - role: documentation
    - document Unstructured: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
- resource Creation Info
  - creation Start Date: 01.08.2009
  - creation End Date: 31.12.2010
  - resource Creator
    - actor Info
      - actor Type: person
      - person Info
        surname: Guevara
        given Name: Emiliano
        affiliation:
        organization Info
        organization Name: The Text Laboratory
        organization Short Name: Textlab
        department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
      - communication Info
        email: emiguevara@gmail.com
  - funding Project:
  - project Info
    - project Name: Emiliano Guevara's PhD project
    - funding Type: nationalFunds
- relation Info
  - resource Relation
    - related Resource
      - reference Scope: externalResource
      - resource Reference: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/tjenester/nowac-frequency.html
    - related Resource
      - reference Scope: thisResource
    - relation Type
      - relation Name: derivate

Download resources

Go to resource page http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html

dc:type	corpus
dc:title	NoWaC v 1.0 (Norwegian Web as Corpus)
dc:identifier	oai:tekstlab.uio.no:nowac
dc:description	Web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus was built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. NoWaC was built with permission from the Norwegian Ministry of Culture (Kulturdepartementet). The corpus contains no information about author, publisher, genre, etc. NoWaC can be downloaded (scrambled version) or accessed through a search interface (Glossa).
dc:publisher
dc:format	downloadable
dc:date	2009-08-01
dc:date	2010-12-31
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
dc:rights	http://creativecommons.org/licenses/by-nc-sa/2.0/
dc:creator	Emiliano Guevara
dc:lang	Norwegian Bokmål
