ASK – The Norwegian Second Language Corpus
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: ASK – The Norwegian Second Language Corpus
- description: ASK is an electronic, searchable text corpus of Norwegian as a second language, with links between linguistic data and personal data. ASK was established by the Norwegian Second Language Corpus project. The corpus contains written texts produced by language learners from ten different language backgrounds: German, Dutch, English, Spanish, Russian, Polish, Bosnian-Croatian-Serbian, Albanian, Vietnamese and Somali. The size of the corpus and the flexible query system make it possible to develop a new methodological approach to the study of transfer when the L2 is Norwegian. The selection of texts is primarily based on the native language of the test takers, and the typological distribution of these languages is taken into consideration. A corpus of Norwegian as a second language makes it possible to use quantitative methods in second language research, and provides a basis for pedagogical developments. ACCESS: the material is available in searchable form via the corpus search engine Corpuscle (see links in metadata). One can enter the ASK corpus directly via the Corpuscle main page, or using a direct link to this specific corpus via the ASK project page. Four texts of the material have also been uploaded and parsed as a small treebank in INESS.
- description: The Norwegian Second Language Corpus (ASK) is an electronic, searchable text corpus of Norwegian as a second language, with the option of linking linguistic data and personal data. The data are drawn from Norsk språktest's archive of immigrants who have taken Språkprøven i norsk for voksne innvandrere and Test i norsk – høyere nivå. The selection of texts is based primarily on the test takers' native languages, and these languages are typologically diverse. In addition, we have collected texts of the same type written by native speakers of Norwegian. The coding of the texts generates a parallel corpus of the original texts in accordance with the Norwegian written norm. The search system is flexible and makes it possible to search for, for example, error categories, words, lemmas, strings of words, strings of words and parts of speech, and various combinations of these. An electronic second language corpus provides a basis for quantitative methods in second language research and for exploratory studies, and it can provide a basis for pedagogical development work.
- resource Short Name: ASK
- url: http://clarino.uib.no/korpuskel/landing-page?identifier=ask&view=short
- url: http://clarino.uib.no/korpuskel/landing-page?identifier=ask
- url: http://clarino.uib.no/iness/lfg-treebank?treebank=nob-ask
- url: http://www.uib.no/fg/askeladden/
- url: http://clarino.uib.no/korpuskel/corpus-list?collection=ASK
- P I D: hdl:11495/DA23-DEB6-9EE5-2
- identifier: ask
- distribution Info:
- licence Info:
- user Category: Restricted
- distribution Access Medium: accessibleThroughInterface
- execution Location: http://clarino.uib.no/korpuskel/corpus-list?collection=ASK
- attribution Text: Tenfjord, Kari; Meurer, Paul; Hofland, Knut. The ASK Corpus – A Language Learner Corpus of Norwegian as a Second Language. Proceedings from 5th International Conference on Language Resources and Evaluation (LREC), Genova 2006. URL http://www.lrec-conf.org/proceedings/lrec2006/pdf/573_pdf
- licence:
- licence Family: CLARIN
- licence Name: CLARIN_RES-PRIV
- licence Url: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaRes?ID=1&PERM=1&PLAN=1&BY=1&PRIV=1&NORED=1
- conditions Of Use: BY
- conditions Of Use: ID
- conditions Of Use: NORED
- conditions Of Use: PERM
- conditions Of Use: PLAN
- conditions Of Use: PRIV
- ipr Holder
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Bergen
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: CLARINO Bergen Centre
- actor Info:
- actor Type: person
- person Info:
- surname: Lyse
- given Name: Gunn Inger
- sex: female
- position: Researcher (Ph.D)
- affiliation:
- organization Info:
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- actor Info:
- actor Type: person
- role: Project leader
- person Info:
- surname: Tenfjord
- given Name: Kari
- sex: female
- position: Professor
- affiliation:
- organization Info:
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- department Name: Institutt for lingvistiske, litterære og estetiske studier (LLE)
- corpus Info:
- corpus Type: Written Corpus
- corpus Part Info:
- media Type: text
- corpus Part General Info:
- source Work Info:
- work Description: The corpus contains person information (anonymized) about, and texts written by, candidates who have completed two different tests in Norwegian: "Språkprøven i norsk for voksne innvandrere" [the language test for adult immigrants] and "Test i norsk – høyere nivå" [Test in Norwegian – higher level]
- linguality Info:
- linguality Type: bilingual
- multilinguality Type: comparable
- multilinguality Type Details: ASK is a corpus of comparable texts, consisting of one corpus of original texts written by second language learners of Norwegian and another corpus with reconstructions of each text that accord perfectly with the written norm of Norwegian bokmål. They are represented as two individual corpora, ASK and ASK-correct, respectively.
- language Info:
- language Id: nb
- language Name: Norwegian bokmål
- language Info:
- language Id: no
- language Name: Norwegian
- modality Info:
- modality Type: writtenLanguage
- size Info:
- size: 1129799
- size Unit: tokens
- size Info:
- size: 769892
- size Unit: words
- size Info:
- size: 1936
- size Unit: texts
- annotation Info:
- annotation Type: other
- annotation Description: texts written by second language learners of Norwegian are annotated, where needed, with the corresponding text according to the written norm of Norwegian bokmål
- annotation Info:
- annotation Type: morphosyntacticAnnotation-posTagging
- segmentation Level: word
- annotation Format: See: Oslo-Bergen-tagger: http://omilia.uio.no/obt/les.html
- tagset: See: Oslo-Bergen-tagger: http://omilia.uio.no/obt/les.html
- annotation Mode: automatic
dc:type | corpus |
dc:title | ASK – The Norwegian Second Language Corpus |
dc:identifier | oai:clarino.uib.no:ask |
dc:description | ASK is an electronic, searchable text corpus of Norwegian as a second language, with links between linguistic data and personal data. ASK was established by the Norwegian Second Language Corpus project. The corpus contains written texts produced by language learners from ten different language backgrounds: German, Dutch, English, Spanish, Russian, Polish, Bosnian-Croatian-Serbian, Albanian, Vietnamese and Somali. The size of the corpus and the flexible query system make it possible to develop a new methodological approach to the study of transfer when the L2 is Norwegian. The selection of texts is primarily based on the native language of the test takers, and the typological distribution of these languages is taken into consideration. A corpus of Norwegian as a second language makes it possible to use quantitative methods in second language research, and provides a basis for pedagogical developments. ACCESS: the material is available in searchable form via the corpus search engine Corpuscle (see links in metadata). One can enter the ASK corpus directly via the Corpuscle main page, or using a direct link to this specific corpus via the ASK project page. Four texts of the material have also been uploaded and parsed as a small treebank in INESS. |
dc:publisher | |
dc:format | accessibleThroughInterface |
dc:date | |
dc:date | |
dc:rights | Restricted |
dc:rights | CLARIN |
dc:rights | CLARIN_RES-PRIV |
dc:rights | https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaRes?ID=1&PERM=1&PLAN=1&BY=1&PRIV=1&NORED=1 |
dc:creator | Kari Tenfjord |
dc:lang | Norwegian bokmål |
dc:lang | Norwegian |
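
The Dublin Core rows above follow a simple `key | value |` layout, with some keys repeated (for example dc:rights and dc:lang) and some values left empty (dc:publisher, dc:date). As a minimal sketch, assuming the rows are available as plain text in exactly this layout, they could be collected into a dictionary of lists as follows. The function name `parse_dc_rows` and the `sample` string are illustrative only and are not part of any CLARINO tooling.

```python
from collections import defaultdict


def parse_dc_rows(text):
    """Collect 'dc:key | value |' rows into a dict mapping each key to a list of values.

    Repeated keys keep all their values in order; rows with empty values are skipped.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("dc:"):
            continue  # ignore anything that is not a Dublin Core row
        key, _, rest = line.partition("|")
        value = rest.strip()
        if value.endswith("|"):
            value = value[:-1].strip()  # drop the trailing column delimiter
        if value:
            fields[key.strip()].append(value)
    return dict(fields)


# A few rows copied from the record above (hypothetical input string).
sample = """dc:type | corpus |
dc:publisher | |
dc:rights | Restricted |
dc:rights | CLARIN |
dc:lang | Norwegian bokmål |
dc:lang | Norwegian |"""

record = parse_dc_rows(sample)
print(record["dc:rights"])
```

Keeping every key as a list avoids silently discarding repeated fields such as the several dc:rights entries.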