Corpus with Book Reviews from Bokelskere.no

This corpus is a dump of user-generated book reviews and discussions from Bokelskere.no (meaning “book lovers”), a web community where users review and discuss new and old literature, both fiction and non-fiction.

The corpus is structured as a JSON array in which each object corresponds to a review, or a comment on a review, on Bokelskere.no. Each object has the following fields:

– “post_id”: unique identifier for review
– “date”: date when the review was posted
– “user_id”: unique identifier for the user
– “isbn13”: ISBN for the reviewed book
– “post_title”: title of review
– “text”: review
– “score”: evaluation (from 1 to 6, where 6 is the best)
– “main_title”: title of reviewed book
– “author”: author of reviewed book
– “parent_id”: identifier of the review that has been commented upon

The corpus contains approximately 219,000 posts (objects) and 1.5 million word tokens (counted in the “text” field).
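
As a minimal sketch of how the structure described above could be read, the following Python loads the JSON array and separates reviews from comments. The function names and the file path are hypothetical, and the code assumes that a top-level review has a missing or null “parent_id” and that “score” may be absent on some posts; neither assumption is confirmed by the corpus description, so check them against the actual data.

```python
import json
from collections import Counter


def load_posts(path):
    """Load the corpus, assumed to be a single JSON array of post objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_posts(posts):
    """Split posts into (reviews, comments).

    Assumption: a post whose "parent_id" is missing or null is a review;
    any other post is a comment on the review it points to.
    """
    reviews = [p for p in posts if p.get("parent_id") is None]
    comments = [p for p in posts if p.get("parent_id") is not None]
    return reviews, comments


def score_distribution(posts):
    """Count how often each score occurs, skipping posts without a score."""
    return Counter(p["score"] for p in posts if p.get("score") is not None)


# Hypothetical usage (the file name is an assumption):
# posts = load_posts("bokelskere.json")
# reviews, comments = split_posts(posts)
# print(len(reviews), len(comments), score_distribution(reviews))
```

Loading the whole array with `json.load` keeps the sketch short; for a file of this size that is usually acceptable, but a streaming parser may be preferable on memory-constrained machines.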
