The SKRIV Corpus
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: SKRIV-korpuset
- resource Name: The SKRIV Corpus
- description: Texts written by students in upper secondary vocational education programs. The corpus is especially suitable for the analysis of texts written by students with Norwegian as their second language. The corpus contains approximately 225 texts and 112 000 words. The texts differ in length, genre and type. The corpus has three different versions of each text: one scanned original in pdf format and two transcribed versions in txt format, one an original transcription with errors and one in which the errors are corrected. All versions are linked, and it is possible to search in both transcribed versions.
- description: SKRIV-korpuset (Skriving i videregående skole, "Writing in upper secondary school") consists of texts written by students in upper secondary vocational education programs. It is specially prepared for the analysis of texts written by students who have Norwegian as their second language. The material consists of authentic student texts from mock exams, schoolwork and work placement weeks. The texts were written within the Norwegian subject and within the students' various programme subjects from Building and Construction, Service and Transport, Electrical Trades, and Health and Childhood Studies. The corpus holds about 225 texts of varying length and in various genres and text types, approximately 112 000 words. The texts were collected at three different schools: a big-city school, a school in a smaller town and a school in a small community. The writers are both students with Norwegian as their first language and minority-language students with Norwegian as their second language, or multilingual students. Information about the students' mother tongue and number of years in Norwegian schools is attached to the texts. Most of the texts exist in three versions: a scanned original in pdf format and two transcribed versions in txt format, one of them with errors. In the other version the errors are corrected. The versions are linked to each other, and it is possible to search in both transcribed versions.
- resource Short Name: SKRIV
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/skriv/
- PID: http://hdl.handle.net/11538/0000-000B-C01F-A
- distribution Info
- licence Info
- user Category: Academic
- distribution Access Medium: accessibleThroughInterface
- execution Location: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/skriv/
- licence
- licence Family: CLARIN
- licence Name: CLARIN_ACA-NC-LOC-ND
- licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&NORED=1&ND=1
- conditions Of Use: BY
- conditions Of Use: ID
- conditions Of Use: LOC
- conditions Of Use: NC
- conditions Of Use: ND
- conditions Of Use: NORED
- non Standard Conditions Of Use: Due to agreements with the text contributors, the texts are only available through Glossa, a search and post-processing tool developed by the Text Laboratory.
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: elisabeth.selj@iln.uio.no
- url: http://www.hf.uio.no/iln/personer/vit/eselj/index.html
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/about/organization/text-laboratory/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- licence Info
- contact
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/about/organization/text-laboratory/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- actor Type: person
- person Info
- surname: Selj
- given Name: Elisabeth
- communication Info
- email: elisabeth.selj@iln.uio.no
- actor Info
- metadata Info
- metadata Creation Date: 21.03.2017
- metadata Last Date Updated: 05.06.2018
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: kristin.hagen@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- version Info
- version: 2
- last Date Updated: 01.04.2016
- resource Documentation Info
- tool Documentation Type: online
- documentation Unstructured
- role: documentation
- document Unstructured: http://www.tekstlab.uio.no/nota/skriv/
- resource Creation Info
- creation Start Date: 01.01.2013
- creation End Date: 01.04.2016
- resource Creator
- actor Info
- actor Type: person
- person Info
- surname: Selj
- given Name: Elisabeth
- sex: female
- communication Info
- email: elisabeth.selj@iln.uio.no
- actor Info
- funding Project:
- project Info
- project Name: SKRIV
- funding Type: ownFunds
- funder: Department of Linguistics and Scandinavian Studies, University of Oslo
- corpus Info
- corpus Type: Written Corpus
- corpus Part Info
- media Type: text
- corpus Text Info
- text Format Info
- mime Type: txt
- character Encoding Info
- character Encoding: utf-8
- size Per Character Encoding
- size Info
- size: 112 000
- size Unit: tokens
- size Info
- text Format Info
- corpus Part General Info
- source Work Info
- work Description: Texts written by students in upper secondary education programs. The texts differ in length, genre and type.
- language Info
- language Id: nb
- language Name: Norwegian Bokmål
- modality Info
- modality Type: writtenLanguage
- size Info
- size: 112 000
- size Unit: tokens
- annotation Info
- annotation Type: lemmatization
- annotation Type: morphosyntacticAnnotation-posTagging
- segmentation Level: word
- tagset: The Oslo-Bergen Tagger tagset: http://tekstlab.uio.no/obt-ny/english/index.html
- tagset Language Id: nb
- tagset Language Name: Norwegian Bokmål
- theoretic Model: Constraint Grammar
- annotation Mode: automatic
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: http://www.tekstlab.uio.no/obt-ny/english/index.html
- annotation Tool
- target Resource Name URI: The Oslo-Bergen Tagger: http://tekstlab.uio.no/obt-ny/english/index.html
- classification Info
- genre Info
- genre Type: textGenre
- genre: unstandardised
- unstandardised Genre: Texts written by students in upper secondary education programs. The texts are available in three different versions: one scanned original in pdf format and two transcribed versions in txt format: one original transcription with errors and one version where the errors are corrected. All versions are linked and it is possible to search in both transcribed versions.
- genre Info
- time Coverage Info
- time Coverage: The texts were mostly written in 2012
- source Work Info
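
The licence Url given under licence Info appears to carry the same conditions as the conditions Of Use entries, encoded as query parameters. The sketch below reads those parameters with Python's standard library. Two things in it are assumptions and not documented behaviour of the licence page: the helper name `licence_flags` is hypothetical, and so is the reading that a parameter set to `1` marks an active condition.

```python
from urllib.parse import urlsplit, parse_qs

# Licence URL copied from the record above.
LICENCE_URL = (
    "https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca"
    "?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&NORED=1&ND=1"
)


def licence_flags(url):
    """Return the sorted names of query parameters whose only value is '1'.

    Hypothetical helper: treating '1' as "condition applies" is an
    assumption made for illustration.
    """
    query = parse_qs(urlsplit(url).query)
    return sorted(name for name, values in query.items() if values == ["1"])


flags = licence_flags(LICENCE_URL)
print(flags)
```

Under that reading, the result can be compared with the conditions Of Use entries in the record. The `AFFIL=EDU` parameter is skipped because its value is not `1`.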
dc:type | corpus |
dc:title | The SKRIV Corpus |
dc:identifier | oai:tekstlab.uio.no:skriv |
dc:description | Texts written by students in upper secondary vocational education programs. The corpus is especially suitable for the analysis of texts written by students with Norwegian as their second language. The corpus contains approximately 225 texts and 112 000 words. The texts differ in length, genre and type. The corpus has three different versions of each text: one scanned original in pdf format and two transcribed versions in txt format, one an original transcription with errors and one in which the errors are corrected. All versions are linked, and it is possible to search in both transcribed versions. |
dc:publisher | |
dc:format | accessibleThroughInterface |
dc:date | 2013-01-01 |
dc:date | 2016-04-01 |
dc:rights | Academic |
dc:rights | CLARIN |
dc:rights | CLARIN_ACA-NC-LOC-ND |
dc:rights | https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&NORED=1&ND=1 |
dc:creator | Elisabeth Selj |
dc:lang | Norwegian Bokmål |
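
The Dublin Core rows above use a simple `key | value |` layout in which keys such as dc:date and dc:rights repeat. A minimal sketch of reading that layout into a dictionary of lists is shown below. The function name `parse_dc_rows` is hypothetical, the sample rows are a subset copied from the listing, and the sketch assumes values contain no `|` characters of their own.

```python
from collections import defaultdict

# A few rows copied from the Dublin Core listing above.
ROWS = """\
dc:type | corpus |
dc:title | The SKRIV Corpus |
dc:publisher | |
dc:date | 2013-01-01 |
dc:date | 2016-04-01 |
dc:creator | Elisabeth Selj |
"""


def parse_dc_rows(text):
    """Group 'key | value |' rows into a dict of lists, skipping empty values."""
    record = defaultdict(list)
    for line in text.splitlines():
        if "|" not in line:
            continue  # not a key/value row
        key, _, rest = line.partition("|")
        value = rest.strip()
        if value.endswith("|"):
            value = value[:-1].strip()  # drop the trailing column separator
        if value:
            record[key.strip()].append(value)
    return dict(record)


record = parse_dc_rows(ROWS)
print(record["dc:date"])
```

Keeping a list per key preserves repeated fields, so both dc:date values survive, while the empty dc:publisher row is dropped.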