The Kola Peninsula Spoken Corpus (KoPeSC) 1: Spoken Corpus accompanying “Речь поморов Терского берега Белого моря: Звучащая хрестоматия” [“Pomor Speech from the Ter Coast of the White Sea: A Spoken Anthology”] (Slavica Bergensia 15)

The Kola Peninsula Spoken Corpus (KoPeSC) is a dataset of sound recordings, with transcriptions in ELAN, of Pomor Russian dialect speech and of Sámi and Russian speech as spoken by the indigenous peoples of the Kola Peninsula. Most recordings are sociolinguistic interviews collected during fieldwork expeditions between 2001 and 2008, with Margje Post and David Pineda (then UiT, now UiB) as the main researchers.

KoPeSC 1, the first dataset, consists of all audio files (in MP3 and WAVE format) and their transcriptions (in ELAN), with metadata, accompanying the following publication:

Post, Margje & David Pineda (2024). Речь поморов Терского берега Белого моря. Звучащая хрестоматия [“Pomor Speech from the Ter Coast of the White Sea: A Spoken Anthology”]. Slavica Bergensia, Volume 15. DOI: xxx

The dataset in KoPeSC 1 consists of:

– the 30 audio files for the 29 texts in the anthology, in both WAVE (.wav) and MP3 (.mp3) format;
– 30 ELAN transcription files (.eaf) for these audio files, with transcriptions in both simplified phonetic script and standardized Russian;
– metadata: KoPeSC1_SlavBerg15_metadata.xlsx

Vol. 15 of Slavica Bergensia is an open-access anthology of the Pomor Russian dialect as spoken on the Ter Coast of the White Sea, with 30 short excerpts from interviews with 21 elderly dialect speakers. The publication, written in Russian, also contains background information on the region and its dialect, in-depth analyses of selected linguistic features, and a commentary on each individual text.

In the publication itself, the recordings are given in a simplified phonetic transcription. The ELAN files in this repository additionally contain transcriptions in Standard Russian, which are better suited for queries and analyses. ELAN can search across multiple annotation files, so one can look up an expression in all sound and transcription files of the anthology at once, listen to each individual token, or export a spreadsheet with all tokens of the expression; cf. https://www.mpi.nl/corpus/html/elan/ch07s02.html
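For users who prefer scripting over the ELAN interface, the same kind of multi-file search can be approximated outside ELAN, because .eaf files are XML. The following is a minimal sketch, not part of the dataset: it assumes that the .eaf files sit in one local folder and that the annotations of interest are time-aligned (ALIGNABLE_ANNOTATION elements). The function names are hypothetical, and since the tier names used in KoPeSC 1 are not stated here, the sketch searches all tiers and reports the tier ID with each hit.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def search_eaf(path, expression):
    """Yield (file, tier, start_ms, end_ms, text) for each time-aligned
    annotation in one .eaf file whose text contains `expression`."""
    root = ET.parse(str(path)).getroot()
    # Map time slot IDs to millisecond offsets; unaligned slots lack TIME_VALUE.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    needle = expression.casefold()  # case-insensitive substring match
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            text = ann.findtext("ANNOTATION_VALUE") or ""
            if needle in text.casefold():
                yield (
                    Path(path).name,
                    tier.get("TIER_ID"),
                    slots.get(ann.get("TIME_SLOT_REF1")),
                    slots.get(ann.get("TIME_SLOT_REF2")),
                    text,
                )


def search_corpus(folder, expression):
    """Collect hits from every .eaf file in `folder` (assumed local path)."""
    hits = []
    for eaf in sorted(Path(folder).glob("*.eaf")):
        hits.extend(search_eaf(eaf, expression))
    return hits


def hits_to_csv(hits, stream):
    """Write the hits as a spreadsheet-friendly CSV table to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(["file", "tier", "start_ms", "end_ms", "text"])
    writer.writerows(hits)
```

Calling search_corpus on the download folder with a Russian word form returns one row per matching annotation; the start and end times in milliseconds can then be used to locate the token in the corresponding audio file. The sketch does plain substring matching only and skips annotations that refer to other annotations (REF_ANNOTATION), so ELAN's own search remains the more complete tool.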

License: CC BY-NC-SA 4.0, https://creativecommons.org/licenses/by-nc-sa/4.0/
[версия на русском языке: https://creativecommons.org/licenses/by-nc-sa/4.0/deed.ru]

Although the sound and text data accompanying Slavica Bergensia 15 are freely available for access, printing and download for non-commercial use, the audio recordings are classified as personal data. Please note that every individual user is responsible for treating the participants in the interviews with respect and sincerity.
The publication of these data has been registered in RETTE (project no. F3438, https://rette.app.uib.no), UiB’s system for monitoring and control of the processing of personal data in research and student projects, and follows the Norwegian national research ethics guidelines for projects processing personal data (https://www.forskningsetikk.no/en/guidelines/).

The dialect recordings were collected during various fieldwork expeditions and for different projects between 1961 and 2006, most of them between 2001 and 2006 by Margje Post (then University of Tromsø, now University of Bergen) and colleagues from Tromsø, Moscow and Bochum. The dataset also contains five fairy tales, recorded in 1961 and 1964 by Dmitrij Balašov (Petrozavodsk – St. Petersburg), and an excerpt from a folkloristic interview by Marina Vlasova (St. Petersburg) from 1987.
Most speakers are from the village of Varzuga, but Umba, Kuzomen’, Tetrino and Čavanga are represented as well. For details, see KoPeSC1_SlavBerg15_metadata.xlsx.

More dialect recordings will be made available in a separate dataset as KoPeSC 2, including the long versions of the interviews from which the excerpts were taken.

The fieldwork expeditions, the cooperation with Prof. Christian Sappok from the University of Bochum, and the transcriptions have been supported by grants from UiT The Arctic University of Norway, DAAD, DFG and the University of Bergen. We are indebted to the Audio Archive of the Institute of Linguistics, Literature and History of the Karelian Research Centre of the Russian Academy of Sciences (KarRC RAS) in Petrozavodsk for the recordings of texts 2-5 (from 1964), and to the folklorist Marina Vlasova (Puškinskij Dom, Saint Petersburg) and her colleagues at the Audio Archive of Puškinskij Dom for texts 1 (1961) and 8 (1987).

For questions, or to receive the annotation guidelines or phonetic transcriptions, please contact the corpus manager (Margje Post, UiB).

––––––––––––––––
KoPeSC 1 is the first dataset of the Kola Peninsula Spoken Corpus. The Kola Peninsula Spoken Corpus (KoPeSC) consists of several datasets, which are planned to be archived in CLARINO. These include further recordings of Pomor Russian dialect speech from the Ter Coast from the Tromsø-Bergen archive, which have been transcribed in ELAN, as well as sound files and transcriptions collected during fieldwork in 2007 and 2008 in Lovozero and Krasnoščelje (Central Kola Peninsula) by Margje Post and David Pineda (then UiT, now UiB). The 2007 and 2008 recordings consist of interviews in Russian with native speakers of Sámi and Komi-Zyryan and with former Pomor Russian inhabitants of Ponoj, a coastal village in the easternmost part of the Kola Peninsula.
