COLA – Corpus Oral de Lenguaje Adolescente

COLA (Corpus Oral de Lenguaje Adolescente Resource) is a corpus of recorded, spontaneous speech among teenagers from different schools and youth clubs in Madrid, Buenos Aires and Santiago de Chile. It was created for the purpose of studying teenage language in Spanish.
The sound files are coupled with orthographic transcriptions (text files) that are anonymized, making the corpus searchable as text through a web search interface where you can read the text and listen to the corresponding recording.

The full COLA corpus has three subparts:
1) COLAm: teenage language from Madrid
2) COLAba: teenage language from Buenos Aires
3) COLAs: teenage language from Santiago de Chile

The present metadata describe the part of COLA which is searchable through the corpus management and analysis system Corpuscle: http://clarino.uib.no/corpuscle.
As of August 2015, the Madrid subpart of the corpus is available for search in Corpuscle.
For enquiries about access to other parts of COLA, please contact Annette Myre Jørgensen (see contact information details in metadata).

About the making of the corpus: The corpus results from the COLA project, led by Annette Myre Jørgensen at the University of Bergen. The transcription work was coordinated and led by Esperanza Eguía Padilla.
The technical development of the corpus was mainly done by Uni Research Computing, especially by Knut Hofland and Øystein Reigem.
The third subpart COLAs was compiled by Eli Marie Drange in the same project.
Formally, COLA belongs to the University of Bergen/Dept. of Foreign Languages.
In agreement with the head of department, the executive copyright holders (on behalf of the University of Bergen) are Annette Myre Jørgensen and Eli Marie Drange.

To access the corpus, a (short) research plan needs to be approved by Annette Myre Jørgensen.

Extended metadata

resource Common Info
- resource Type: corpus
- identification Info
  - resource Name: COLA – Corpus Oral de Lenguaje Adolescente
  - description: COLA (Corpus Oral de Lenguaje Adolescente Resource) is a corpus of recorded, spontaneous speech among teenagers from different schools and youth clubs in Madrid, Buenos Aires and Santiago de Chile. It was created for the purpose of studying teenage language in Spanish. The sound files are coupled with orthographic transcriptions (text files) that are anonymized, making the corpus searchable as text through a web search interface where you can read the text and listen to the corresponding recording. The full COLA corpus has three subparts: 1) COLAm: teenage language from Madrid 2) COLAba: teenage language from Buenos Aires 3) COLAs: teenage language from Santiago de Chile. The present metadata describe the part of COLA which is searchable through the corpus management and analysis system Corpuscle: http://clarino.uib.no/corpuscle. As of August 2015, the Madrid subpart of the corpus is available for search in Corpuscle. For enquiries about access to other parts of COLA, please contact Annette Myre Jørgensen (see contact information details in metadata). About the making of the corpus: The corpus results from the COLA project, led by Annette Myre Jørgensen at the University of Bergen. The transcription work was coordinated and led by Esperanza Eguía Padilla. The technical development of the corpus was mainly done by Uni Research Computing, especially by Knut Hofland and Øystein Reigem. The third subpart COLAs was compiled by Eli Marie Drange in the same project. Formally, COLA belongs to the University of Bergen/Dept. of Foreign Languages. In agreement with the head of department, the executive copyright holders (on behalf of the University of Bergen) are Annette Myre Jørgensen and Eli Marie Drange. To access the corpus, a (short) research plan needs to be approved by Annette Myre Jørgensen.
  - resource Short Name: COLA
  - url: http://clarino.uib.no/korpuskel/landing-page?identifier=cola&view=short
  - url: http://www.colam.org/
  - P I D: hdl:11495/D98E-D689-6A14-5
  - identifier: cola
- distribution Info
  - licence Info
    - user Category: Academic
    - attribution Text: The COLA corpus is distributed by Corpuscle (http://hdl.handle.net/11495/D98E-D689-6A14-5) and was created in the COLA project at the University of Bergen. Jørgensen, Annette Myre. 2008. “COLA: Un corpus Oral de Lenguaje Adolescente”, Anejos a Oralia 3.1.
    - licence
      - licence Family: CLARIN
      - licence Name: CLARIN_ACA-NC-LOC-PRIV-ND-*
      - licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
      - conditions Of Use: BY
      - conditions Of Use: ID
      - conditions Of Use: LOC
      - conditions Of Use: NC
      - conditions Of Use: ND
      - conditions Of Use: NORED
      - conditions Of Use: PRIV
      - non Standard Conditions Of Use: Time limited access: The End-User’s access to the Resource being only valid for a specified task/project, the research plan must specify a time span for the project. The End-User’s access to the Resource will thus be limited to the End-User’s expected needs.
    - licensor:
    - actor Info
      - actor Type: person
      - person Info
        surname: Jørgensen
        given Name: Annette Myre
        sex: female
        position: Associate Professor
        affiliation:
        organization Info
        organization Name: University of Bergen
        organization Name: Universitetet i Bergen
        organization Short Name: UiB
        organization Short Name: UoB
        department Name: Department of Foreign Languages
        department Name: Institutt for fremmedspråk (IF)
      - communication Info
        email: Annette.Myre@if.uib.no
  - ipr Holder
    - actor Info
      - actor Type: organization
      - organization Info
        organization Name: University of Bergen
        organization Name: Universitetet i Bergen
        organization Short Name: UiB
        organization Short Name: UoB
        department Name: Department of Foreign Languages
        department Name: Institutt for fremmedspråk (IF)
      - communication Info
        email: Annette.Myre@if.uib.no
        email: eli.m.drange@uia.no
- contact
  - actor Info
    - actor Type: person
    - person Info
      - surname: Jørgensen
      - given Name: Annette Myre
      - sex: female
      - position: Associate Professor
      - affiliation:
      - organization Info
        organization Name: University of Bergen
        organization Name: Universitetet i Bergen
        organization Short Name: UiB
        organization Short Name: UoB
        department Name: Department of Foreign Languages
        department Name: Institutt for fremmedspråk (IF)
    - communication Info
      - email: Annette.Myre@if.uib.no
  - actor Info
    - actor Type: person
    - person Info
      - surname: Drange
      - given Name: Eli Marie
      - sex: female
      - affiliation:
      - organization Info
        organization Name: University of Agder
        organization Name: Universitetet i Agder
        organization Short Name: UiA
        organization Short Name: UoA
    - communication Info
      - email: eli.m.drange@uia.no
  - actor Info
    - actor Type: organization
    - organization Info
      - organization Name: CLARIN Bergen
    - communication Info
      - email: clarin@uib.no
      - url: https://clarino.uib.no/
- metadata Info
  - metadata Creation Date: 27.08.2015
  - metadata Last Date Updated: 31.10.2017
  - metadata Creator
    - actor Info
      - actor Type: person
      - person Info
        surname: Lyse
        given Name: Gunn Inger
        sex: female
        position: Researcher (Ph.D)
        affiliation:
        organization Info
        organization Name: University of Bergen
        organization Name: Universitetet i Bergen
        organization Short Name: UiB
        organization Short Name: UoB
        department Name: Department of Linguistic, Literary and Aesthetic Studies
      - communication Info
        email: clarin@uib.no
- resource Documentation Info
  - documentation Structured
    - role: documentation
    - document Info
      - document Type: article
      - title: COLA: A Spanish spoken corpus of youth language
      - author: Hofland, Knut and Jørgensen, Annette Myre and Drange, Eli-Marie and Stenström, Anna-Brita
      - year: 2005
      - url: http://www.colam.org/publikasjoner/COLA-cl2005-fig.htm
      - document Language Name: English
      - document Language Id: en
  - documentation Structured
    - role: documentation
    - document Info
      - document Type: article
      - title: COLA: Un corpus Oral de Lenguaje Adolescente
      - author: Jørgensen, Annette Myre
      - year: 2008
      - journal: Anejos de Oralia 3/1
      - url: http://www.colam.org/publikasjoner/corpuslenguajeadoles.htm
      - document Language Name: Spanish
      - document Language Id: es
  - documentation Structured
    - role: documentation
    - document Info
      - document Type: other
      - title: Project webpage. Lists the project participants, related publications etc.
      - url: http://www.colam.org
- resource Creation Info
  - funding Project:
  - project Info
    - project Name: COLA (Corpus Oral de Lenguaje Adolescente)
    - project Short Name: COLA
    - funding Type: nationalFunds
    - funder: University of Bergen, Faculty of Arts
    - funder: Meltzer fund
    - funder: Research Council of Norway
    - funding Country: Norway
    - project Start Date: 2002
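
The persistent identifier listed under identification Info (hdl:11495/D98E-D689-6A14-5) and the resolver link quoted in the attribution text (http://hdl.handle.net/11495/D98E-D689-6A14-5) are two forms of the same handle. The following is a minimal Python sketch of that conversion, assuming the `hdl:` prefix form used in this record. The function name `handle_to_url` is hypothetical and is not part of Corpuscle or any CLARIN tooling.

```python
# Resolver base exactly as it appears in the attribution text of this record.
HANDLE_RESOLVER = "http://hdl.handle.net/"


def handle_to_url(pid: str) -> str:
    """Turn a PID of the form 'hdl:<prefix>/<suffix>' into a resolver URL."""
    scheme, sep, handle = pid.partition(":")
    # Accept only the 'hdl:' scheme followed by a prefix/suffix pair.
    if sep != ":" or scheme.lower() != "hdl" or "/" not in handle:
        raise ValueError(f"not a handle PID: {pid!r}")
    return HANDLE_RESOLVER + handle


print(handle_to_url("hdl:11495/D98E-D689-6A14-5"))
```

Anything that does not start with `hdl:` is rejected instead of being guessed at, so a plain identifier such as `cola` raises an error.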

corpus Info
- corpus Type: Multimodal Corpus
- corpus Part Info
  - media Type: audio
  - corpus Audio Info
    - audio Size Info
      - size Info
        size: 500000
        size Unit: words
      - duration Of Audio Info
        size: 50
        duration Unit: hours
    - audio Content Info
      - textual Description: The method used for recording the data follows the same pattern as the COLT Corpus of English adolescents and the UNO Corpus of Norwegian adolescents, which in turn is patterned on the Longman model used for collecting the British National Corpus (BNC). The recruits were selected from schools in areas with different social status in order to create a corpus balanced with regard to gender, type of school and social status. The recruits are between 13 and 18 years old. Each recruit was equipped with a Minidisc recorder and a microphone, and asked to record his or her conversations with friends and at school for a few days. Some of the conversations were recorded at school, in breaks or during teamwork, and some were recorded at home or at places where adolescents usually meet, such as parks. The recruits filled in a questionnaire with some personal information, such as place of birth and language spoken at home, and they were also asked to write down some information about the other participants in their conversations. The Madrid subpart consists of 78 recordings (individual conversations), which roughly corresponds to 50 hours of recording. Based on the transcriptions, the material consists of ca. 750000 tokens, but since some 'tokens' form multiword units, there are ca. 500000 lexemes.
    - setting Info
      - naturality: spontaneous
      - conversational Type: multilogue
  - corpus Text Info
    - text Format Info
      - mime Type: text/plain
    - character Encoding Info
      - character Encoding: UTF-8
- corpus Part General Info
  - linguality Info
    - linguality Type: monolingual
  - language Info
    - language Id: es
    - language Name: Spanish
    - language Variety Info
      - language Variety Type: jargon
      - language Variety Name: teenage language
    - language Variety Info
      - language Variety Type: dialect
      - language Variety Name: Corpus part COLAm: teenage language (spoken) in Madrid
      - size Per Language Variety
        size Info
        size: 500000
        size Unit: words
  - modality Info
    - modality Type: writtenLanguage
    - modality Type Details: Transcriptions of the recorded speech
  - modality Info
    - modality Type: spokenLanguage
    - modality Type Details: Spontaneous speech among teenagers
  - size Info
    - size: 751168
    - size Unit: tokens
  - size Info
    - size: 500000
    - size Unit: words
  - annotation Info
    - annotation Type: speechAnnotation-orthographicTranscription
    - segmentation Level: word
    - segmentation Level: wordGroup
    - annotation Mode Details: COLA has been transcribed to make it searchable as text. The recordings were orthographically transcribed using the program Transcriber. Apart from the orthographic words, there are specific annotations for imitation and citing, incomplete words (%), unclear words (XXX), and rising vs. falling intonation for questions. The user is meant to listen to the sound file while reading the transcription; thus there is no annotation for non-linguistic sounds such as coughing or a dog barking. In Corpuscle the user may click on the sound file to listen while reading the transcription.
    - annotation Tool
      - target Resource Name U R I: Transcriber
    - annotator:
    - actor Info
      - actor Type: person
      - person Info
        surname: Padilla
        given Name: Esperanza Eguía
        sex: female
  - classification Info
    - genre Info
      - genre Type: audioGenre
      - genre: informal
      - unstandardised Genre: teenage language
  - time Coverage Info
    - time Coverage: Recordings made between 2002 and 2004 and in 2007 (Madrid corpus subpart)

dc:type	corpus
dc:title	COLA – Corpus Oral de Lenguaje Adolescente
dc:identifier	oai:clarino.uib.no:cola
dc:description	COLA (Corpus Oral de Lenguaje Adolescente Resource) is a corpus of recorded, spontaneous speech among teenagers from different schools and youth clubs in Madrid, Buenos Aires and Santiago de Chile. It was created for the purpose of studying teenage language in Spanish. The sound files are coupled with orthographic transcriptions (text files) that are anonymized, making the corpus searchable as text through a web search interface where you can read the text and listen to the corresponding recording. The full COLA corpus has three subparts: 1) COLAm: teenage language from Madrid 2) COLAba: teenage language from Buenos Aires 3) COLAs: teenage language from Santiago de Chile. The present metadata describe the part of COLA which is searchable through the corpus management and analysis system Corpuscle: http://clarino.uib.no/corpuscle. As of August 2015, the Madrid subpart of the corpus is available for search in Corpuscle. For enquiries about access to other parts of COLA, please contact Annette Myre Jørgensen (see contact information details in metadata). About the making of the corpus: The corpus results from the COLA project, led by Annette Myre Jørgensen at the University of Bergen. The transcription work was coordinated and led by Esperanza Eguía Padilla. The technical development of the corpus was mainly done by Uni Research Computing, especially by Knut Hofland and Øystein Reigem. The third subpart COLAs was compiled by Eli Marie Drange in the same project. Formally, COLA belongs to the University of Bergen/Dept. of Foreign Languages. In agreement with the head of department, the executive copyright holders (on behalf of the University of Bergen) are Annette Myre Jørgensen and Eli Marie Drange. To access the corpus, a (short) research plan needs to be approved by Annette Myre Jørgensen.
dc:publisher
dc:format
dc:date
dc:date
dc:rights	Academic
dc:rights	CLARIN
dc:rights	CLARIN_ACA-NC-LOC-PRIV-ND-*
dc:rights	https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
dc:lang	Spanish
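
The Dublin Core listing above repeats some keys (dc:rights, dc:date) and leaves others without a value (dc:publisher, dc:format). The following is a minimal Python sketch of how such a listing could be read into a dictionary. It assumes that each key is separated from its value by a tab, as the listing appears to be. The function name `parse_dc` and the shortened sample text are illustrative only and are not part of any CLARIN export tool.

```python
from typing import Dict, List


def parse_dc(text: str) -> Dict[str, List[str]]:
    """Collect 'dc:<key><TAB><value>' lines into a dict of value lists."""
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.startswith("dc:"):
            continue  # skip anything that is not a Dublin Core line
        key, _, value = line.partition("\t")
        values = fields.setdefault(key.strip(), [])
        if value.strip():
            values.append(value.strip())  # repeated keys accumulate
    return fields


# Shortened sample in the same layout as the listing above.
sample = (
    "dc:type\tcorpus\n"
    "dc:publisher\n"
    "dc:rights\tAcademic\n"
    "dc:rights\tCLARIN\n"
    "dc:lang\tSpanish\n"
)
record = parse_dc(sample)
print(record["dc:rights"])
```

Keys that carry no value are kept with an empty list, so a missing publisher stays visible as a gap in the record.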
