Discussions from Wikipedia
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: Diskusjonstekster frå Wikipedia
- resource Name: Discussions from Wikipedia
- description: This corpus is a dump of discussion threads from the Norwegian Wikipedia, in which contributors discuss issues concerning the publication of specific Wikipedia articles. The material is split into two files, one for Norwegian Bokmål (nb.wikipedia.json) and one for Norwegian Nynorsk (nn.wikipedia.json). Each file is a JSON array in which each element is one discussion: a single-level object holding the text and its metadata. Each discussion has eight key/value pairs: title (title of the article under discussion); pageid (identifier of the article); revid (revision information); wikidata (other data, if any); contentcategories (metadata); hiddencategories (metadata); text (the discussion text); bytelength (length of the text in bytes). An example can be found in the documentation file (2019_wikidisc.pdf).
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-66/
- PID: hdl:21.11146/66
- identifier: sbr-66
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-66/
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-SA (CC-BY-SA)
- licence Url: https://creativecommons.org/licenses/by-sa/4.0/
- conditions Of Use: BY
- conditions Of Use: SA
- licensor:
- actor Info:
- actor Type: organization
- role: Licensor
- organization Info:
- organization Name: Wikimedia Norge
- ipr Holder:
- actor Info:
- actor Type: organization
- role: IPR Holder
- organization Info:
- organization Name: Wikimedia Norge
- actor Info:
- actor Type: organization
- role: Contact
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- actor Info:
- actor Type: person
- role: Metadata Creator
- person Info:
- surname: Lindstad
- given Name: Arne Martinus
- affiliation:
- organization Info:
- organization Name: National Library of Norway
- organization Name: Nasjonalbiblioteket
- organization Short Name: NLN
- organization Short Name: NB
- department Name: The Language Bank
- department Name: Språkbanken
- actor Info:
- actor Type: organization
- role: Resource Creator
- organization Info:
- organization Name: Wikimedia Norge
- corpus Info:
- corpus Type: Written Corpus
- corpus Part Info:
- media Type: text
- corpus Text Info:
- text Format Info:
- mime Type: application/json
- size Per Text Format:
- size Info:
- size: 2
- size Unit: files
- size Info:
- size: 36864
- size Unit: entries
- size Info:
- size: 136.7
- size Unit: MB
- size Info:
- size: 18400000
- size Unit: words
- character Encoding Info:
- character Encoding: UTF-8
- corpus Part General Info:
- linguality Info:
- linguality Type: multilingual
- multilinguality Type: other
- multilinguality Type Details: Discussions of a similar kind in either Norwegian Bokmål or Norwegian Nynorsk
- language Info:
- language Id: nb
- language Name: Norwegian Bokmål
- size Per Language:
- size Info:
- size: 17000000
- size Unit: words
- size Info:
- size: 31364
- size Unit: entries
- size Info:
- size: 1
- size Unit: files
- size Info:
- size: 126.4
- size Unit: MB
- language Variety Info:
- language Variety Type: jargon
- language Variety Name: Informal written language
- language Info:
- language Id: nn
- language Name: Norwegian Nynorsk
- size Per Language:
- size Info:
- size: 1400000
- size Unit: words
- size Info:
- size: 5500
- size Unit: entries
- size Info:
- size: 1
- size Unit: files
- size Info:
- size: 10.3
- size Unit: MB
- language Variety Info:
- language Variety Type: jargon
- language Variety Name: Informal written language
- modality Info:
- modality Type: writtenLanguage
- size Info:
- size: 18400000
- size Unit: words
- size Info:
- size: 36864
- size Unit: entries
- size Info:
- size: 2
- size Unit: files
- size Info:
- size: 136.7
- size Unit: MB
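The per-language size figures above should add up to the corpus totals. A minimal sketch of that check follows; the numbers are copied from this record, while the dictionary layout and the helper name `summed` are illustrative assumptions rather than part of the resource.

```python
import math

# Size figures copied from this metadata record (per language and in total).
per_language = {
    "nb": {"words": 17_000_000, "entries": 31_364, "files": 1, "mb": 126.4},
    "nn": {"words": 1_400_000, "entries": 5_500, "files": 1, "mb": 10.3},
}
total = {"words": 18_400_000, "entries": 36_864, "files": 2, "mb": 136.7}


def summed(unit):
    """Add up one size unit across both language parts (hypothetical helper)."""
    return sum(sizes[unit] for sizes in per_language.values())


# Collect every unit whose per-language sum differs from the stated total.
# math.isclose absorbs floating-point rounding in the megabyte figures.
mismatches = {
    unit: (summed(unit), expected)
    for unit, expected in total.items()
    if not math.isclose(summed(unit), expected, abs_tol=1e-6)
}
```

With the figures as listed, `mismatches` stays empty, which means the Bokmål and Nynorsk parts account for the whole corpus.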
dc:type | corpus |
dc:title | Discussions from Wikipedia |
dc:identifier | oai:nb.no:sbr-66 |
dc:description | This corpus is a dump of discussion threads from the Norwegian Wikipedia, in which contributors discuss issues concerning the publication of specific Wikipedia articles. The material is split into two files, one for Norwegian Bokmål (nb.wikipedia.json) and one for Norwegian Nynorsk (nn.wikipedia.json). Each file is a JSON array in which each element is one discussion: a single-level object holding the text and its metadata. Each discussion has eight key/value pairs: title (title of the article under discussion); pageid (identifier of the article); revid (revision information); wikidata (other data, if any); contentcategories (metadata); hiddencategories (metadata); text (the discussion text); bytelength (length of the text in bytes). An example can be found in the documentation file (2019_wikidisc.pdf). |
dc:publisher | |
dc:format | downloadable |
dc:date | 2019-12-11 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY-SA (CC-BY-SA) |
dc:rights | https://creativecommons.org/licenses/by-sa/4.0/ |
dc:creator | Wikimedia Norge |
dc:lang | Norwegian Bokmål |
dc:lang | Norwegian Nynorsk |
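
The description states that each file is a JSON array with eight keys per discussion. A minimal sketch of reading one file and checking a record against those keys could look like the following. The function names `load_discussions` and `missing_keys` are hypothetical, and the record does not specify the value types of the fields, so the code only checks that the keys are present.

```python
import json

# The eight keys listed in the corpus description.
EXPECTED_KEYS = {
    "title", "pageid", "revid", "wikidata",
    "contentcategories", "hiddencategories", "text", "bytelength",
}


def load_discussions(path):
    """Read one corpus file (UTF-8 JSON) and return its discussions as a list."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    # The description says the top level is an array, one element per discussion.
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array")
    return data


def missing_keys(record):
    """Return the documented keys that a record lacks, sorted by name."""
    return sorted(EXPECTED_KEYS.difference(record))
```

Assuming the downloaded file sits in the working directory, `load_discussions("nb.wikipedia.json")` would return the Bokmål discussions, and `missing_keys(record)` would return an empty list for any record that carries all eight documented keys.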