COLA – Corpus Oral de Lenguaje Adolescente
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: COLA – Corpus Oral de Lenguaje Adolescente
- description: COLA (Corpus Oral de Lenguaje Adolescente) is a corpus of recorded, spontaneous speech among teenagers from different schools and youth clubs in Madrid, Buenos Aires and Santiago de Chile. It was created for the purpose of studying teenage language in Spanish. The sound files are coupled with anonymized orthographic transcriptions (text files), making the corpus searchable as text through a web search interface where you can read the text and listen to the corresponding recording. The full COLA corpus has three subparts: 1) COLAm: teenage language from Madrid; 2) COLAba: teenage language from Buenos Aires; 3) COLAs: teenage language from Santiago de Chile. The present metadata describe the part of COLA which is searchable through the corpus management and analysis system Corpuscle: http://clarino.uib.no/corpuscle. As of August 2015, the Madrid subpart of the corpus is available for search in Corpuscle. For enquiries about access to other parts of COLA, please contact Annette Myre Jørgensen (see the contact details in the metadata). About the making of the corpus: The corpus results from the COLA project, led by Annette Myre Jørgensen at the University of Bergen. The transcription work was coordinated and led by Esperanza Eguía Padilla. The technical development of the corpus was mainly done by Uni Research Computing, especially by Knut Hofland and Øystein Reigem. The third subpart, COLAs, was compiled by Eli Marie Drange in the same project. Formally, COLA belongs to the University of Bergen/Dept. of Foreign Languages. In agreement with the head of department, the executive copyright holders (on behalf of the University of Bergen) are Annette Myre Jørgensen and Eli Marie Drange. To access the corpus, a (short) research plan needs to be approved by Annette Myre Jørgensen.
- resource Short Name: COLA
- url: http://clarino.uib.no/korpuskel/landing-page?identifier=cola&view=short
- url: http://www.colam.org/
- P I D: hdl:11495/D98E-D689-6A14-5
- identifier: cola
- distribution Info
- licence Info
- user Category: Academic
- attribution Text: The COLA corpus is distributed by Corpuscle (http://hdl.handle.net/11495/D98E-D689-6A14-5) and was created in the COLA project at the University of Bergen. Jørgensen, Annette Myre. 2008. “COLA: Un corpus Oral de Lenguaje Adolescente”, Anejos a Oralia 3.1.
- licence
- licence Family: CLARIN
- licence Name: CLARIN_ACA-NC-LOC-PRIV-ND-*
- licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
- conditions Of Use: BY
- conditions Of Use: ID
- conditions Of Use: LOC
- conditions Of Use: NC
- conditions Of Use: ND
- conditions Of Use: NORED
- conditions Of Use: PRIV
- non Standard Conditions Of Use: Time limited access: The End-User’s access to the Resource being only valid for a specified task/project, the research plan must specify a time span for the project. The End-User’s access to the Resource will thus be limited to the End-User’s expected needs.
- licensor:
- actor Info
- actor Type: person
- person Info
- surname: Jørgensen
- given Name: Annette Myre
- sex: female
- position: Associate Professor
- affiliation:
- organization Info
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Foreign Languages
- department Name: Institutt for fremmedspråk (IF)
- communication Info
- email: Annette.Myre@if.uib.no
- licence Info
- ipr Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Foreign Languages
- department Name: Institutt for fremmedspråk (IF)
- communication Info
- email: Annette.Myre@if.uib.no
- email: eli.m.drange@uia.no
- actor Info
- actor Info
- actor Type: person
- person Info
- surname: Jørgensen
- given Name: Annette Myre
- sex: female
- position: Associate Professor
- affiliation:
- organization Info
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Foreign Languages
- department Name: Institutt for fremmedspråk (IF)
- communication Info
- email: Annette.Myre@if.uib.no
- actor Type: person
- person Info
- surname: Drange
- given Name: Eli Marie
- sex: female
- affiliation:
- organization Info
- organization Name: University of Agder
- organization Name: Universitetet i Agder
- organization Short Name: UiA
- organization Short Name: UoA
- email: eli.m.drange@uia.no
- actor Type: organization
- organization Info
- organization Name: CLARIN Bergen
- communication Info
- email: clarin@uib.no
- url: https://clarino.uib.no/
- metadata Creation Date: 27.08.2015
- metadata Last Date Updated: 31.10.2017
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Lyse
- given Name: Gunn Inger
- sex: female
- position: Researcher (Ph.D)
- affiliation:
- organization Info
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- communication Info
- email: clarin@uib.no
- actor Info
- documentation Structured
- role: documentation
- document Info
- document Type: article
- title: COLA: A Spanish spoken corpus of youth language
- author: Hofland, Knut and Jørgensen, Annette Myre and Drange, Eli-Marie and Stenström, Anna-Brita
- year: 2005
- url: http://www.colam.org/publikasjoner/COLA-cl2005-fig.htm
- document Language Name: English
- document Language Id: en
- documentation Structured
- role: documentation
- document Info
- document Type: article
- title: COLA: Un corpus Oral de Lenguaje Adolescente
- author: Jørgensen, Annette Myre
- year: 2008
- journal: Anejos de Oralia 3/1
- url: http://www.colam.org/publikasjoner/corpuslenguajeadoles.htm
- document Language Name: Spanish
- document Language Id: es
- documentation Structured
- role: documentation
- document Info
- document Type: other
- title: Project webpage. Lists the project participants, related publications etc.
- url: http://www.colam.org
- funding Project:
- project Info
- project Name: COLA (Corpus Oral de Lenguaje Adolescente)
- project Short Name: COLA
- funding Type: nationalFunds
- funder: University of Bergen, Faculty of Arts
- funder: Meltzer fund
- funder: Research Council of Norway
- funding Country: Norway
- project Start Date: 2002
- corpus Info
- corpus Type: Multimodal Corpus
- corpus Part Info
- media Type: audio
- corpus Audio Info
- audio Size Info
- size Info
- size: 500000
- size Unit: words
- duration Of Audio Info
- size: 50
- duration Unit: hours
- size Info
- audio Content Info
- textual Description: The method used for recording the data follows the same pattern as the COLT Corpus of English adolescents and the UNO Corpus of Norwegian adolescents, which in turn is patterned on the Longman model used for collecting the British National Corpus (BNC). The recruits were selected from schools in areas with different social status in order to create a corpus balanced with regard to gender, type of school and social status. The recruits are between 13 and 18 years old. Each recruit was equipped with a Minidisc recorder and a microphone, and asked to record his or her conversations with friends and at school for a few days. Some of the conversations were recorded at school, in breaks or during teamwork, and some were recorded at home or at places where adolescents usually meet, such as parks. The recruits filled in a questionnaire with some personal information, such as place of birth and language spoken at home, and they were also asked to write down some information about the other participants in their conversations. The Madrid subpart consists of 78 recordings (individual conversations), which roughly correspond to 50 hours of recording. Based on the transcriptions, the material consists of ca. 750,000 tokens, but since some 'tokens' form multiword units, there are ca. 500,000 lexemes.
- setting Info
- naturality: spontaneous
- conversational Type: multilogue
- audio Size Info
- corpus Text Info
- text Format Info
- mime Type: text/plain
- character Encoding Info
- character Encoding: UTF-8
- text Format Info
- corpus Part General Info
- linguality Info
- linguality Type: monolingual
- language Info
- language Id: es
- language Name: Spanish
- language Variety Info
- language Variety Type: jargon
- language Variety Name: teenage language
- language Variety Info
- language Variety Type: dialect
- language Variety Name: Corpus part COLAm: teenage language (spoken) in Madrid
- size Per Language Variety
- size Info
- size: 500000
- size Unit: words
- size Info
- modality Info
- modality Type: writtenLanguage
- modality Type Details: Transcriptions of the recorded speech
- modality Info
- modality Type: spokenLanguage
- modality Type Details: Spontaneous speech among teenagers
- size Info
- size: 751168
- size Unit: tokens
- size Info
- size: 500000
- size Unit: words
- annotation Info
- annotation Type: speechAnnotation-orthographicTranscription
- segmentation Level: word
- segmentation Level: wordGroup
- annotation Mode Details: COLA has been transcribed to make it searchable as text. The recordings were transcribed orthographically using the program Transcriber. Apart from the orthographic words, there are specific annotations for imitation and citing, incomplete words (%), unclear words (XXX), and rising vs. falling intonation for questions. The user is meant to listen to the sound file while reading the transcription; thus there is no annotation for non-linguistic sounds such as coughing or a dog's bark. In Corpuscle the user may click on the sound file to listen while reading the transcription.
- annotation Tool
- target Resource Name U R I: Transcriber
- annotator:
- actor Info
- actor Type: person
- person Info
- surname: Padilla
- given Name: Esperanza Eguía
- sex: female
- linguality Info
- classification Info
- genre Info
- genre Type: audioGenre
- genre: informal
- unstandardised Genre: teenage language
- genre Info
- time Coverage Info
- time Coverage: Recordings made from 2002 to 2004 and in 2007 (Madrid corpus subpart)
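
The size figures reported above for the Madrid subpart (751168 tokens, about 500000 words, about 50 hours of audio, 78 recordings) can be related to each other with simple arithmetic. The following is a minimal Python sketch of that calculation; the function name corpus_ratios and the constant names are illustrative and not part of the record, and since the word and hour counts are approximate, the results are rough estimates only.

```python
# Figures copied from the metadata record above (Madrid subpart, COLAm).
TOKENS = 751_168     # "size: 751168", unit: tokens
WORDS = 500_000      # "size: 500000", unit: words (approximate)
HOURS = 50           # duration of audio in hours (approximate)
RECORDINGS = 78      # individual conversations


def corpus_ratios(tokens: int, words: int, hours: float, recordings: int) -> dict:
    """Derive rough per-unit ratios from the reported corpus sizes."""
    return {
        "tokens_per_word": tokens / words,
        "tokens_per_hour": tokens / hours,
        "minutes_per_recording": hours * 60 / recordings,
    }


ratios = corpus_ratios(TOKENS, WORDS, HOURS, RECORDINGS)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

With the figures above this gives roughly 1.5 tokens per word, about 15000 tokens per hour of audio, and an average recording length of a little under 40 minutes.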
dc:type | corpus |
dc:title | COLA – Corpus Oral de Lenguaje Adolescente |
dc:identifier | oai:clarino.uib.no:cola |
dc:description | COLA (Corpus Oral de Lenguaje Adolescente) is a corpus of recorded, spontaneous speech among teenagers from different schools and youth clubs in Madrid, Buenos Aires and Santiago de Chile. It was created for the purpose of studying teenage language in Spanish. The sound files are coupled with anonymized orthographic transcriptions (text files), making the corpus searchable as text through a web search interface where you can read the text and listen to the corresponding recording. The full COLA corpus has three subparts: 1) COLAm: teenage language from Madrid; 2) COLAba: teenage language from Buenos Aires; 3) COLAs: teenage language from Santiago de Chile. The present metadata describe the part of COLA which is searchable through the corpus management and analysis system Corpuscle: http://clarino.uib.no/corpuscle. As of August 2015, the Madrid subpart of the corpus is available for search in Corpuscle. For enquiries about access to other parts of COLA, please contact Annette Myre Jørgensen (see the contact details in the metadata). About the making of the corpus: The corpus results from the COLA project, led by Annette Myre Jørgensen at the University of Bergen. The transcription work was coordinated and led by Esperanza Eguía Padilla. The technical development of the corpus was mainly done by Uni Research Computing, especially by Knut Hofland and Øystein Reigem. The third subpart, COLAs, was compiled by Eli Marie Drange in the same project. Formally, COLA belongs to the University of Bergen/Dept. of Foreign Languages. In agreement with the head of department, the executive copyright holders (on behalf of the University of Bergen) are Annette Myre Jørgensen and Eli Marie Drange. To access the corpus, a (short) research plan needs to be approved by Annette Myre Jørgensen. |
dc:publisher | |
dc:format | |
dc:date | |
dc:date | |
dc:rights | Academic |
dc:rights | CLARIN |
dc:rights | CLARIN_ACA-NC-LOC-PRIV-ND-* |
dc:rights | https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1 |
dc:lang | Spanish |
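
The licence URL in this record appears to encode the same conditions of use that are listed separately (BY, ID, LOC, NC, ND, NORED, PRIV). The following is a minimal Python sketch that cross-checks the two, assuming that a query parameter set to 1 means the condition applies; that reading is an interpretation of the URL, not something this record documents. The names LICENCE_URL, LISTED_CONDITIONS and conditions_from_url are illustrative only.

```python
from urllib.parse import parse_qs, urlsplit

# Licence URL and conditions of use, copied from the record above.
LICENCE_URL = (
    "https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca"
    "?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1"
)
LISTED_CONDITIONS = {"BY", "ID", "LOC", "NC", "ND", "NORED", "PRIV"}


def conditions_from_url(url: str) -> set:
    """Return the query parameters whose only value is "1".

    Assumption: "=1" marks a condition as active. Parameters with other
    values (here AFFIL=EDU) are treated as settings, not conditions.
    """
    query = parse_qs(urlsplit(url).query)
    return {key for key, values in query.items() if values == ["1"]}


url_conditions = conditions_from_url(LICENCE_URL)
print(sorted(url_conditions))
print(url_conditions == LISTED_CONDITIONS)
```

Under that assumption, the seven flags in the URL match the seven listed conditions of use, and AFFIL=EDU is left out because it carries a value rather than a flag.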