The BigBrother Corpus
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: The BigBrother Corpus
- resource Name: BigBrother-korpuset
- description: BigBrother-korpuset er et talespråkskorpus som består av den første sesongen av realityserien BigBrother som ble sendt på TVNorge våren 2001. Deltakerne i BigBrother er i alderen 23-36 år og snakker ulike dialekter. BigBrother-korpuset inneholder lyd- og videoopptak av nesten alle de 100 sendingene som ble vist på tv, cirka 440 300 tokens. Materialet er ortografisk transkribert og lenket til lyd og video. Transkripsjonene er også tagget morfologisk. BigBrother-korpuset er et unikt talespråkskorpus der deltakerne arbeider sammen, diskuterer, argumenterer, krangler, gråter, ler, roper og elsker. I motsetning til kontrollerte talespråksinnspillinger som ofte er begrenset til intervjuer og dialog, har BigBrother-materialet samtaler om alle mulige temaer og innen ulike sjangre. Noen ganger er sterke følelser i sving, og dette kan tenkes å ha innvirkning på språket. Transkripsjonene er nedlastbare.
- description: The BigBrother Corpus is a speech corpus with recordings from the first season of the BigBrother show, broadcast on Norwegian television by TVNorge in the first half of 2001. The participants in BigBrother speak different dialects, but they come primarily from the east of Norway. They are aged 23-36 years. The BigBrother Corpus contains audio and video recordings of almost all of the 100 broadcasts that were shown on television, approx. 440 300 tokens. The recordings are linked to the orthographic transcriptions. The transcriptions are also tagged morphologically. The BigBrother Corpus is a unique speech corpus in which the participants work together, discuss, argue, quarrel, cry, laugh, shout, make love etc. In contrast to controlled recordings, which are often limited to interviews and dialogue, the BigBrother material has conversations about all possible topics and within different genres. Sometimes strong feelings are in play, which may conceivably have an impact on the language. The transcripts can be downloaded.
- resource Short Name: BigBrother
- url: http://www.tekstlab.uio.no/nota/bigbrother/index.html
- PID: http://hdl.handle.net/11538/0000-0005-E7C1-C
- distribution Info
- licence Info
- user Category: Academic
- distribution Access Medium: accessibleThroughInterface
- execution Location: https://tekstlab.uio.no/glossa2/bb
- licence
- licence Family: CLARIN
- licence Name: CLARIN_ACA-NC-LOC-PRIV-ND-*
- licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
- conditions Of Use: *
- conditions Of Use: BY
- conditions Of Use: ID
- conditions Of Use: LOC
- conditions Of Use: NC
- conditions Of Use: ND
- conditions Of Use: NORED
- conditions Of Use: PRIV
- non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. The production company Nordic Entertainment has generously given its consent to the use of the videos as a speech corpus. Every individual researcher is responsible for treating the participants with respect and sincerity. Furthermore, the informants in the corpus should be anonymized, e.g. by changing their names, in every published paper or other output.
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: Universitetet i Oslo
- organization Name: University of Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- licence Info
- ipr Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: Nordic Entertainment (ipr holder of the videos)
- actor Info
- contact
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- metadata Info
- metadata Creation Date: 24.02.2015
- metadata Last Date Updated: 12.03.2021
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: kristin.hagen@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- version Info
- version: Second version
- validation Info
- validated: true
- validation Type: content
- validation Mode: manual
- validation Mode Details: The transcriptions are proofread against the audio files.
- validation Extent: full
- validator:
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- documentation Unstructured
- role: documentation
- document Unstructured: http://www.tekstlab.uio.no/nota/bigbrother/index.html
- creation Start Date: 01.08.2007
- creation End Date: 31.12.2009
- resource Creator
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- funding Project:
- project Info
- project Name: Developing and completing language resources: The Big Brother show as a modern speech corpus
- url: http://www.tekstlab.uio.no/nota/bigbrother/index.html
- funding Type: nationalFunds
- funder: The Research Council of Norway, the KUNSTI program (Kunnskapsutvikling for norsk språkteknologi).
- funding Country: Norway
- project Start Date: 31.08.2007
- project End Date: 31.12.2007
- corpus Info
- corpus Type: Multimodal Corpus
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: txt
- size Per Text Format
- size Info
- size: 440 338
- size Unit: tokens
- size Info
- character Encoding Info
- character Encoding: utf8
- text Format Info
- corpus Part Info
- media Type: video
- corpus Video Info
- video Content Info
- type Of Video Content: Recordings from the first season of the BigBrother show, i.e. audio and video recordings of almost all of the 100 broadcasts that were shown on television.
- text Included In Video: none
- setting Info
- naturality: spontaneous
- conversational Type: multilogue
- audience: some
- interactivity: overlapping
- interaction: All kinds of situations in the BigBrother house. The participants prepare dinner, eat, sleep, make love, discuss, work together etc.
- video Format Info
- mime Type: videos in mpeg4 streaming format available through Glossa
- compression Info
- compression: true
- compression Name: mpg
- video Content Info
- corpus Part Info
- media Type: audio
- corpus Audio Info
- audio Size Info
- size Info
- size: approx. 29
- size Unit: gb
- size Info
- audio Content Info
- setting Info
- naturality: spontaneous
- conversational Type: multilogue
- audience: some
- interactivity: overlapping
- interaction: All kinds of situations in the BigBrother house. The participants prepare dinner, eat, sleep, make love, discuss, work together etc.
- audio Format Info
- mime Type: wav and mpeg
- signal Encoding: linearPCM
- sampling Rate: 32
- quantization: 64
- number Of Tracks: 1
- recording Quality: medium
- compression Info
- compression: true
- compression Name: mpg
- audio Size Info
- corpus Part General Info
- person Source Set Info
- number Of Persons: 12
- age Of Persons: adult
- age Range Start: 23
- age Range End: 36
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: Some dialects represented, all of them from Southern Norway.
- linguality Info
- linguality Type: monolingual
- language Info
- language Id: no
- language Name: Norwegian
- language Info
- language Id: nb
- language Name: Norwegian Bokmål
- modality Info
- modality Type: spokenLanguage
- modality Type Details: Informal language from all settings in the BigBrother house.
- annotation Info
- annotation Type: morphosyntacticAnnotation-posTagging
- annotated Elements: other
- segmentation Level: word
- tagset: POS tagset created for the statistical NoTa-tagger – based on the tagset of the Oslo Bergen Tagger.
- tagset Language Id: nb
- tagset Language Name: Norwegian Bokmål
- theoretic Model: TreeTagger
- annotation Mode: automatic
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: NoTa-taggeren: TAGGEVEILEDNING
- author: Åshild Søfteland
- year: 2007
- url: http://www.tekstlab.uio.no/nota/oslo/Taggeveiledning2.pdf
- document Language Name: Norwegian Bokmål
- document Language Id: nb
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: article
- title: Tagging a Norwegian Speech Corpus
- author: Anders Nøklestad and Åshild Søfteland
- editor: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit
- year: 2007
- book Title: Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007
- pages: 245–248
- conference: Nodalida 2007
- document Language Name: English
- document Language Id: en
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: article
- title: Manuell morfologisk tagging av NoTa-materialet med støtte fra en statistisk tagger.
- author: Åshild Søfteland and Anders Nøklestad
- editor: Janne Bondi Johannessen and Kristin Hagen
- year: 2008
- publisher: Novus forlag
- book Title: Språk i Oslo. Ny forskning omkring talespråk
- pages: 226–234
- ISBN: 978-82-7099-471-7
- document Language Name: Norwegian
- document Language Id: nb
- annotation Info
- annotation Type: speechAnnotation-orthographicTranscription
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: Orthographic transcription, cf. Bokmålsordboka (Wangensteen 2004)
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: http://www.tekstlab.uio.no/nota/bigbrother/
- annotation Tool
- target Resource Name URI: Transcriber (http://trans.sourceforge.net/en/presentation.php)
- classification Info
- genre Info
- genre Type: speechGenre
- genre: informal
- unstandardised Genre: All kinds of situations in the BigBrother house. The participants prepare dinner, eat, sleep, make love, discuss, work together etc. Lots of emotions.
- genre Info
- time Coverage Info
- time Coverage: 2001
- person Source Set Info
dc:type | corpus |
dc:title | The BigBrother Corpus |
dc:identifier | oai:tekstlab.uio.no:bigbrother |
dc:description | The BigBrother Corpus is a speech corpus with recordings from the first season of the BigBrother show, broadcast on Norwegian television by TVNorge in the first half of 2001. The participants in BigBrother speak different dialects, but they come primarily from the east of Norway. They are aged 23-36 years. The BigBrother Corpus contains audio and video recordings of almost all of the 100 broadcasts that were shown on television, approx. 440 300 tokens. The recordings are linked to the orthographic transcriptions. The transcriptions are also tagged morphologically. The BigBrother Corpus is a unique speech corpus in which the participants work together, discuss, argue, quarrel, cry, laugh, shout, make love etc. In contrast to controlled recordings, which are often limited to interviews and dialogue, the BigBrother material has conversations about all possible topics and within different genres. Sometimes strong feelings are in play, which may conceivably have an impact on the language. The transcripts can be downloaded. |
dc:publisher | |
dc:format | accessibleThroughInterface |
dc:date | 2007-08-01 |
dc:date | 2009-12-31 |
dc:rights | Academic |
dc:rights | CLARIN |
dc:rights | CLARIN_ACA-NC-LOC-PRIV-ND-* |
dc:rights | https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1 |
dc:creator | The Text Laboratory |
dc:lang | Norwegian |
dc:lang | Norwegian Bokmål |