TAUS – downloadable transcriptions
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: TAUS – nedlastbare transkripsjoner
- resource Name: TAUS – downloadable transcriptions
- description: TAUS (The spoken language investigation in Oslo) v.3 is a speech corpus with 86 speakers and 387 551 tokens. The downloadable version of the corpus contains the transcriptions, approx. 387 500 tokens, all of them orthographically transcribed. Some of the interviews are also transcribed phonetically. The material from TAUS is based on informal interviews with people from Oslo, conducted in 1971-73. The informants are mainly from two eastern districts (Vålerenga and Kampen) and one western district (Frogner), and have a social background that can be considered representative with respect to education, occupation and the environment in which they grew up. The informants fall into three age groups: youth (15 – 17 years), young adults (20 – 30) and adults (34 – 75). The topics of the interviews are experiences and descriptions from childhood and adolescence. The interviews were conducted in the informants' homes in an unceremonious and informal tone, so the linguistic style can be described as informal vernacular. In 2006 – 2007 the TAUS tapes from the A and C series were digitized, and all the interviews were transcribed orthographically and linked to the digital audio files. The transcriptions are searchable via the search interface tool Glossa. In 2014 – 2019 the tapes from the B series were digitized and transcribed as part of the LIA project. In January 2020 TAUS v.3 was published with all available material from the A, B and C series.
- description: TAUS (Talemålsundersøkelsen i Oslo) v.3 is a speech corpus with 86 speakers and 387 551 tokens. This downloadable version contains the transcriptions, approximately 44 300 tokens. All the transcriptions are orthographic, and many also have a transcription close to the spoken form. The TAUS material is based on informal interviews with people from Oslo, conducted in 1971-73. The informants are mainly from two eastern districts (Vålerenga and Kampen) and one western district (Frogner), and have a social background that can be considered representative with respect to education, occupation and the environment in which they grew up. The informants fall into three age groups: youth (15 – 17 years), young adults (20 – 30) and adults (34 – 75). The topics of the interviews are experiences and descriptions from childhood and adolescence, and there are several instances of oral storytelling. The conversations took place in the informants' homes in an unceremonious and informal tone, so the linguistic style can be described as informal everyday speech. In 2006 – 2007 the A and C series of the TAUS tapes were digitized, and all the interviews were transcribed orthographically. The transcriptions are also linked to the digitized audio files. The entire material is searchable via the search tool Glossa. It is possible to search both the original phonetic TAUS transcriptions and the orthographic ones. Note that some of the original TAUS tapes have been lost, so those interviews are missing from the searchable material. Read more about this under the Informanter (Informants) tab. In 2014 – 2019 the B series was digitized and transcribed through the LIA project. In January 2020 TAUS v.3 was published with all available material from the A, B and C series.
- resource Short Name: TAUS
- url: http://www.tekstlab.uio.no/nota/taus/index.html
- P I D: http://hdl.handle.net/11538/0000-0005-E7C2-B
- distribution Info
- licence Info
- user Category: Public
- distribution Access Medium: downloadable
- download Location: http://www.tekstlab.uio.no/nota/taus/index.html
- execution Location: http://www.tekstlab.uio.no/nota/taus/index.html
- execution Location: http://www.tekstlab.uio.no/nota/taus/english.html
- licence
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
- licence Url: http://creativecommons.org/licenses/by-nc-sa/4.0/
- conditions Of Use: BY
- conditions Of Use: NC
- conditions Of Use: SA
- non Standard Conditions Of Use: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the video and audio files are accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- licence Info
- contact
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- metadata Info
- metadata Creation Date: 31.07.2015
- metadata Last Date Updated: 04.05.2021
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: kristin.hagen@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- version Info
- version: Transcriptions from the third version of TAUS
- validation Info
- validated: true
- validation Type: content
- validation Mode: manual
- validation Mode Details: The transcriptions are proofread against the audio files.
- validation Extent: full
- validator:
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- documentation Unstructured
- role: documentation
- document Unstructured: http://www.tekstlab.uio.no/nota/taus/index.html
- documentation Structured
- role: documentation
- document Info
- document Type: book
- title: Oslomål. TAUS skrift nr. 6. (Hovedrapport.)
- author: E. Hanssen, Th. Hoel, E. H. Jahr, O. Rekdal, G. Wiggen.
- year: 1978
- documentation Structured
- role: documentation
- document Info
- document Type: mastersThesis
- title: Sosio-syntaktisk undersøking av talemålet til utvalgte grupper Oslo-ungdom.
- author: Wiggen, Geirr
- year: 1974
- creation Start Date: 01.01.1970
- creation End Date: 15.01.2020
- resource Creator
- actor Info
- actor Type: organization
- role: Listed as first author of the project report. TAUS was otherwise a group effort.
- person Info
- surname: Hanssen
- given Name: Eskil
- sex: male
- organization Info
- organization Name: Prosjektet Talemålsundersøkelsen i Oslo (1971-1976)
- department Name: Formerly the Department of Scandinavian Languages and Literature (Institutt for nordisk språk og litteratur) at UiO.
- communication Info
- email: eskil.hanssen@iln.uio.no
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- funding Project:
- project Info
- project Name: Talemålsundersøkelsen i Oslo
- project Short Name: TAUS
- funding Type: nationalFunds
- funder: NAVF, Norges almennvitenskaplige forskningsråd
- funding Country: Norway
- project Start Date: 01.01.1971
- project End Date: 31.12.1976
- project Name: Digitization and retranscription of TAUS (Digitalisering og retranskribering av TAUS)
- funding Type: nationalFunds
- funder: Equipment funds from the Faculty of Humanities, University of Oslo
- funder: Professor Didrik Arup Seips fond
- funding Country: Norway
- project Start Date: 01.01.2006
- project End Date: 31.12.2007
- project Name: LIA (Language Infrastructure made Accessible)
- project Short Name: LIA
- project I D: 22 59 41
- url: http://tekstlab.uio.no/LIA/
- url: https://www.hf.uio.no/iln/english/research/projects/language-infrastructure-made-accessible/index.html
- funding Type: nationalFunds
- funder: The Research Council of Norway
- funding Country: Norway
- project Start Date: 04.01.2014
- project End Date: 31.12.2019
- corpus Info
- corpus Type: Written Corpus
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: Downloadable transcriptions in txt format
- size Per Text Format
- size Info
- size: 387 551
- size Unit: tokens
- size Info
- character Encoding Info
- character Encoding: Unicode
- text Format Info
- corpus Part General Info
- person Source Set Info
- number Of Persons: 86
- age Of Persons: teenager
- age Of Persons: adult
- age Of Persons: elderly
- age Range Start: 15
- age Range End: 75
- sex Of Persons: mixed
- origin Of Persons: native
- dialect Accent Of Persons: Oslo dialect: from Kampen, Vålerenga (Oslo east) and Frogner (Oslo west)
- linguality Info
- linguality Type: monolingual
- language Info
- language Id: No
- language Name: Norwegian
- language Info
- language Id: Nb
- language Name: Norwegian Bokmål
- modality Info
- modality Type: spokenLanguage
- modality Type Details: Orthographic transcription. Some of the interviews in the A series also have the original phonetic TAUS transcription linked to the orthographic transcription. The B series transcriptions have phonetic transcriptions following the LIA guidelines together with orthographic transcriptions.
- size Info
- size: 387 551
- size Unit: tokens
- annotation Info
- annotation Type: speechAnnotation-orthographicTranscription
- annotation Type: speechAnnotation-phoneticTranscription
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: Orthographic transcription, cf. Bokmålsordboka (Wangensteen 2004)
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: Transkripsjonsveiledning for NoTa-Oslo
- author: Kristin Hagen
- year: 2008
- url: http://www.tekstlab.uio.no/nota/oslo/transkripsjon/NoTa-transkripsjonsveil22.pdf
- annotation Manual Structured
- role: annotationManual
- document Info
- document Type: manual
- title: Transkripsjonsrettleiing for LIA
- author: Kristin Hagen and Live Håberg and Eirik Olsen and Åshild Søfteland
- year: 2018
- url: http://tekstlab.uio.no/LIA/pdf/transkripsjonsrettleiing_lia.pdf
- annotation Tool
- target Resource Name U R I: Transcriber (http://trans.sourceforge.net/en/presentation.php)
- annotation Tool
- target Resource Name U R I: ELAN: https://tla.mpi.nl/tools/tla-tools/elan/ (for the B series)
- annotation Tool
- target Resource Name U R I: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/oslo-transliterator/index.html
- classification Info
- genre Info
- genre Type: speechGenre
- genre: semi formal
- unstandardised Genre: interviews
- genre Info
- genre Type: speechGenre
- genre: informal
- unstandardised Genre: B series: Conversations between interviewer and informants. Some of them are friends, some of them are pretending to be friends as a part of the task.
- genre Info
- time Coverage Info
- time Coverage: 1971 – 1976
- time Coverage Info
- time Coverage: In 2006 – 2007 the TAUS-tapes were digitized, and all the interviews were transcribed orthographically and linked to the digital audio files.
- time Coverage Info
- time Coverage: In 2014 – 2019 the tapes from the B series were digitized and transcribed. In 2020 the new TAUS v.3 corpus was published.
- geographic Coverage Info
- geographic Coverage: Oslo (Vålerenga, Kampen and Frogner; the B series also includes some other locations in Oslo)
- recording Info
- recording Device Type: other
- recording Environment: other
- recorder Actor:
- actor Info
- actor Type: organization
- person Info
- surname: Hanssen
- given Name: Eskil
- sex: male
- organization Info
- organization Name: Prosjektet Talemålsundersøkelsen i Oslo (1971-1976)
- communication Info
- email: eskil.hanssen@iln.uio.no
- person Source Set Info
dc:type | corpus |
dc:title | TAUS – downloadable transcriptions |
dc:identifier | oai:tekstlab.uio.no:taus-transcriptions |
dc:description | TAUS (The spoken language investigation in Oslo) v.3 is a speech corpus with 86 speakers and 387 551 tokens. The downloadable version of the corpus contains the transcriptions, approx. 387 500 tokens, all of them orthographically transcribed. Some of the interviews are also transcribed phonetically. The material from TAUS is based on informal interviews with people from Oslo, conducted in 1971-73. The informants are mainly from two eastern districts (Vålerenga and Kampen) and one western district (Frogner), and have a social background that can be considered representative with respect to education, occupation and the environment in which they grew up. The informants fall into three age groups: youth (15 – 17 years), young adults (20 – 30) and adults (34 – 75). The topics of the interviews are experiences and descriptions from childhood and adolescence. The interviews were conducted in the informants' homes in an unceremonious and informal tone, so the linguistic style can be described as informal vernacular. In 2006 – 2007 the TAUS tapes from the A and C series were digitized, and all the interviews were transcribed orthographically and linked to the digital audio files. The transcriptions are searchable via the search interface tool Glossa. In 2014 – 2019 the tapes from the B series were digitized and transcribed as part of the LIA project. In January 2020 TAUS v.3 was published with all available material from the A, B and C series. |
dc:publisher | |
dc:format | downloadable |
dc:date | 1970-01-01 |
dc:date | 2020-01-15 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY-NC-SA (CC-BY-NC-SA) |
dc:rights | http://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc:creator | Prosjektet Talemålsundersøkelsen i Oslo (1971-1976) |
dc:creator | The Text Laboratory |
dc:lang | Norwegian |
dc:lang | Norwegian Bokmål |