NoWaC v 1.0 (Norwegian Web as Corpus)

NoWaC is a web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus was built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. It was built with permission from the Norwegian Ministry of Culture (Kulturdepartementet).

The corpus contains no information about author, publisher, genre, etc.

NoWaC can be downloaded (scrambled version) or accessed through a search interface (Glossa).
