INESS NorGramBank collection
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: INESS NorGramBank collection
- description: NorGramBank is a parsebank of Norwegian that is under development in the INESS project. It covers the two written standards of Norwegian (Bokmål and Nynorsk) and contains varied text types, both fiction (adult and children’s fiction) and non-fiction (newspapers, information brochures, research articles, etc.). NorGramBank is being built by automatic parsing with NorGram, a hand-written, broad-coverage computational grammar. NorGram is written in the Lexical Functional Grammar (LFG) framework on the XLE (Xerox Linguistic Environment) platform. It provides detailed grammatical analyses on two levels: constituent structure (c-structure) and functional structure (f-structure). Parsing is done with XLE, and the LFG Parsebanker is used for semi-automatic disambiguation (Rosén et al. 2012, 2009, etc.; see the Publications page at http://clarino.uib.no/iness). Parts of the material are annotated manually; the rest is disambiguated using statistical parse ranking. This is work in progress, and the proportion of manually annotated material may increase. The full source texts remain copyright-protected and cannot be redistributed by INESS. The full list of authors and texts in each treebank can be found in the INESS portal via "Treebank overview" in the left-hand menu.
- url: http://clarino.uib.no/iness/landing-page?collection=NorGramBank
- P I D: hdl:11495/D977-14FF-354B-1
- identifier: NorGramBank
- distribution Info
- licence Info
- user Category: Unspecified
- licence
- licence Family: none
- licence Name: unspecified
- non Standard Conditions Of Use: Different licences apply to the treebanks in the collection; each licence must be accepted individually by clicking on the relevant treebank.
- licence Info
- contact
- actor Info
- actor Type: person
- person Info
- surname: Rosén
- given Name: Victoria
- sex: female
- position: Associate Professor
- affiliation:
- organization Info
- organization Name: University of Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- communication Info
- email: iness@uib.no
- actor Info
- metadata Info
- metadata Creation Date: 13.08.2015
- metadata Language Name: English
- metadata Language Id: en
- metadata Last Date Updated: 08.11.2017
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Lyse
- given Name: Gunn Inger
- sex: female
- position: Researcher (Ph.D.)
- affiliation:
- organization Info
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- communication Info
- email: iness@uib.no
- email: clarin@uib.no
- actor Info
- funding Project:
- project Info
- project Name: Infrastructure for the Exploration of Syntax and Semantics
- project Short Name: INESS
- project I D: 195323
- url: http://clarino.uib.no/iness/
- funding Type: nationalFunds
- funder: The Research Council of Norway under the Infrastruktur program
- funder: University of Bergen
- funding Country: Norway
- project Start Date: 2011
- project End Date: 2016
- corpus Info
- corpus Type: Treebank
- corpus Part Info
- media Type: text
- corpus Part General Info
- linguality Info
- linguality Type: monolingual
- language Info
- language Id: nb
- language Name: Norwegian Bokmål
- size Per Language
- size Info
- size: 3254218
- size Unit: sentences
- size Info
- language Info
- language Id: nn
- language Name: Norwegian Nynorsk
- size Per Language
- size Info
- size: 366727
- size Unit: sentences
- size Info
- language Info
- language Id: no
- language Name: Norwegian
- size Per Language
- size Info
- size: 3642868
- size Unit: sentences
- size Info
- modality Info
- modality Type: writtenLanguage
- annotation Info
- annotation Type: other
- segmentation Level: word
- annotation Mode: interactive
- annotation Mode Details: Text Preprocessing: When a corpus is parsed, some words will inevitably be unknown to the morphological analyzer and/or the lexicon, so the documents must be preprocessed before syntactic parsing. INESS has therefore developed an intelligent browser-based preprocessing interface that facilitates efficient text cleanup and the treatment of unknown word forms. For more details, cf. Rosén et al. (2012), 'An integrated web-based treebank annotation system', http://clarino.uib.no/iness/page?page-id=Publications.
- annotation Info
- annotation Type: syntacticAnnotation-treebanks
- annotation Standoff: false
- segmentation Level: sentence
- annotation Format: Negra/Tiger XML
- tagset: http://prosjekt.digital.uni.no/projects/inesspublic/wiki/NorGram_Lexical_Categories_(Preterminals); http://prosjekt.digital.uni.no/projects/inesspublic/wiki/NorGram_Phrase_Structure_Categories; http://prosjekt.digital.uni.no/projects/inesspublic/wiki/NorGram_F-structure_Features
- theoretic Model: Lexical Functional Grammar (LFG)
- annotation Mode: mixed
- annotation Mode Details: Automatic parsing, manual disambiguation using discriminants.
- annotation Manual Unstructured
- role: annotationManual
- document Unstructured: http://clarino.uib.no/iness/page?page-id=_NorGram_annotator_guidelines_
- annotator:
- actor Info
- actor Type: person
- person Info
- surname: Losnegaard
- given Name: Gyri Smørdal
- sex: female
- affiliation:
- organization Info
- organization Name: University of Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- actor Info
- actor Type: person
- person Info
- surname: Lyse
- given Name: Gunn Inger
- sex: female
- position: Researcher (Ph.D.)
- affiliation:
- organization Info
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- communication Info
- email: iness@uib.no
- email: clarin@uib.no
- actor Info
- actor Type: person
- person Info
- surname: Thunes
- given Name: Martha
- sex: female
- affiliation:
- organization Info
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- actor Info
- actor Type: person
- person Info
- surname: Haugereid
- given Name: Petter
- sex: male
- position: Researcher (Ph.D.)
- affiliation:
- organization Info
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- actor Type: person
- person Info
- surname: Fatnes
- given Name: Ingeborg
- sex: female
- position: Scientific assistant in INESS (text preprocessing)
- actor Type: person
- person Info
- surname: Dale
- given Name: Ingerid
- sex: female
- position: Scientific assistant in INESS (text preprocessing)
- actor Type: person
- person Info
- surname: Bergmann
- given Name: Julie
- sex: female
- position: Scientific assistant in INESS (text preprocessing)
- genre Info
- genre Type: textGenre
dc:type | corpus |
dc:title | INESS NorGramBank collection |
dc:identifier | oai:clarino.uib.no:oslo-bergen |
dc:description | NorGramBank is a parsebank of Norwegian that is under development in the INESS project. It covers the two written standards of Norwegian (Bokmål and Nynorsk) and contains varied text types, both fiction (adult and children’s fiction) and non-fiction (newspapers, information brochures, research articles, etc.). NorGramBank is being built by automatic parsing with NorGram, a hand-written, broad-coverage computational grammar. NorGram is written in the Lexical Functional Grammar (LFG) framework on the XLE (Xerox Linguistic Environment) platform. It provides detailed grammatical analyses on two levels: constituent structure (c-structure) and functional structure (f-structure). Parsing is done with XLE, and the LFG Parsebanker is used for semi-automatic disambiguation (Rosén et al. 2012, 2009, etc.; see the Publications page at http://clarino.uib.no/iness). Parts of the material are annotated manually; the rest is disambiguated using statistical parse ranking. This is work in progress, and the proportion of manually annotated material may increase. The full source texts remain copyright-protected and cannot be redistributed by INESS. The full list of authors and texts in each treebank can be found in the INESS portal via "Treebank overview" in the left-hand menu. |
dc:publisher | |
dc:format | |
dc:date | |
dc:date | |
dc:rights | Unspecified |
dc:rights | none |
dc:rights | unspecified |
dc:rights | |
dc:lang | Norwegian Bokmål |
dc:lang | Norwegian Nynorsk |
dc:lang | Norwegian |