NoReC: The Norwegian Review Corpus

The NoReC dataset was created primarily for training and evaluating models for document-level sentiment analysis, though many other use cases are possible. The corpus comprises more than 35,000 full-text reviews extracted from eight major Norwegian news sources: Dagbladet, VG, Aftenposten, Bergens Tidende, Fædrelandsvennen, Stavanger Aftenblad, DinSide.no and P3.no. The reviews cover a range of domains, including literature, movies, video games, restaurants, music and theater, as well as product reviews across a range of categories. Each review is labeled with a manually assigned score from 1 to 6, taken from the rating given by the review's original author. The texts have been pre-processed using UDPipe and are distributed in the CoNLL-U format; HTML files with the raw texts are also provided. Documentation and an accompanying Python package are available in the following git repository: https://github.com/ltgoslo/norec
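
Since the texts are distributed in CoNLL-U, a minimal sketch of reading that format with the Python standard library may be helpful. This is not the accompanying Python package from the repository: the function name `read_conllu` and the sample sentence are assumptions made up for illustration, and only the ten-column, tab-separated layout of CoNLL-U is taken as given.

```python
from typing import Dict, Iterator, List

# The ten tab-separated columns defined by the CoNLL-U format.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries.

    Hypothetical helper for illustration; not part of the NoReC package.
    """
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata; skipped here.
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            # Skip malformed lines rather than guessing at their content.
            continue
        token = dict(zip(CONLLU_FIELDS, columns))
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if "-" in token["id"] or "." in token["id"]:
            continue
        sentence.append(token)
    if sentence:
        # Handle a final sentence with no trailing blank line.
        yield sentence


# Made-up sample in CoNLL-U layout; not taken from the corpus.
SAMPLE = (
    "# text = En god film.\n"
    "1\tEn\ten\tDET\t_\t_\t3\tdet\t_\t_\n"
    "2\tgod\tgod\tADJ\t_\t_\t3\tamod\t_\t_\n"
    "3\tfilm\tfilm\tNOUN\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE))
forms = [token["form"] for token in sentences[0]]
```

To read a file from the corpus, the same function could be applied to the file's contents, for example `read_conllu(open(path, encoding="utf-8").read())`, where `path` points to one of the distributed CoNLL-U files.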


Download metadata (CMDI XML): http://hdl.handle.net/11509/124@format=cmdi

dc:title	NoReC: The Norwegian Review Corpus
dc:identifier	oai:repo.clarino.uib.no:11509/124
