NoWaC v 1.0 (Norwegian Web as Corpus)
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: NoWaC v 1.0 (Norwegian Web as Corpus)
- description: Web-based corpus of Norwegian Bokmål containing about 700 million tokens. The corpus was built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. NoWaC was built with permission from the Norwegian Ministry of Culture (Kulturdepartementet). The corpus contains no information about author, publisher, genre, etc. NoWaC can be downloaded (scrambled version) or accessed through a search interface (Glossa).
- resource Short Name: NoWaC
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
- PID: http://hdl.handle.net/11538/0000-0005-E7C0-D
- distribution Info
- licence Info
- user Category: Public
- distribution Access Medium: downloadable
- download Location: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
- licence
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
- licence Url: http://creativecommons.org/licenses/by-nc-sa/2.0/
- conditions Of Use: BY
- conditions Of Use: NC
- conditions Of Use: SA
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- licence Info
- licence Info
- user Category: Academic
- distribution Access Medium: accessibleThroughInterface
- execution Location: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
- licence
- licence Family: CLARIN
- licence Name: CLARIN_ACA-NC-LOC-ND
- licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&NORED=1&ND=1
- conditions Of Use: BY
- conditions Of Use: ID
- conditions Of Use: LOC
- conditions Of Use: NC
- conditions Of Use: ND
- conditions Of Use: NORED
- non Standard Conditions Of Use: The unscrambled corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory.
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- ipr Holder
- actor Info
- actor Type: person
- person Info
- surname: Guevara
- given Name: Emiliano
- affiliation:
- organization Info
- organization Name: The Text Laboratory
- department Name: Department of Linguistics and Scandinavian Studies
- actor Info
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- metadata Creation Date: 28.11.2014
- metadata Last Date Updated: 05.06.2018
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- sex: female
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: kristin.hagen@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- version: v 1.0
- documentation Unstructured
- role: documentation
- document Unstructured: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
- creation Start Date: 01.08.2009
- creation End Date: 31.12.2010
- resource Creator
- actor Info
- actor Type: person
- person Info
- surname: Guevara
- given Name: Emiliano
- affiliation:
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: emiguevara@gmail.com
- actor Info
- project Name: Emiliano Guevara's PhD project
- funding Type: nationalFunds
- resource Relation
- related Resource
- reference Scope: externalResource
- resource Reference: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/tjenester/nowac-frequency.html
- related Resource
- reference Scope: thisResource
- relation Type
- relation Name: derivate
- related Resource
- corpus Info
- corpus Type: Written Corpus
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: txt
- character Encoding Info
- character Encoding: utf-8
- text Format Info
- corpus Part General Info
- linguality Info
- linguality Type: monolingual
- language Info
- language Id: Nb
- language Name: Norwegian Bokmål
- size Info
- size: 700 000 000
- size Unit: tokens
- annotation Info
- annotation Type: morphosyntacticAnnotation-posTagging
- annotation Type: lemmatization
- segmentation Level: word
- tagset: The Oslo-Bergen Tagger tagset: http://tekstlab.uio.no/obt-ny/english/index.html
- tagset Language Id: NB
- tagset Language Name: Norwegian bokmål
- annotation Mode: automatic
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: http://www.tekstlab.uio.no/obt-ny/english/index.html
- annotation Tool
- target Resource Name URI: The Oslo-Bergen Tagger: http://tekstlab.uio.no/obt-ny/english/index.html
- classification Info
- conformance To Classification Scheme: other
- genre Info
- genre Type: textGenre
- genre: unstandardised
- unstandardised Genre: scrambled web corpus/searchable web corpus
- time Coverage Info
- time Coverage: November 2009 – January 2010
- linguality Info
dc:type | corpus |
dc:title | NoWaC v 1.0 (Norwegian Web as Corpus) |
dc:identifier | oai:tekstlab.uio.no:nowac |
dc:description | Web-based corpus of Norwegian Bokmål containing about 700 million tokens. The corpus was built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. NoWaC was built with permission from the Norwegian Ministry of Culture (Kulturdepartementet). The corpus contains no information about author, publisher, genre, etc. NoWaC can be downloaded (scrambled version) or accessed through a search interface (Glossa). |
dc:publisher | |
dc:format | downloadable |
dc:date | 2009-08-01 |
dc:date | 2010-12-31 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY-NC-SA (CC-BY-NC-SA) |
dc:rights | http://creativecommons.org/licenses/by-nc-sa/2.0/ |
dc:creator | Emiliano Guevara |
dc:lang | bokmål |