The Oslo-Bergen Tagger
Extended metadata
- resource Common Info
- resource Type: toolService
- identification Info
- resource Name: The Oslo-Bergen Tagger
- description: The Oslo-Bergen tagger is a robust morphological and syntactic tagger developed at the University of Oslo and at Uni Computing in Bergen over several years. The tagger consists of three main modules: a preprocessor with multitagger and compound analyser (ref3), a grammar module for morphological and syntactic disambiguation (Constraint Grammar) (ref2) and a statistical module that removes the last of the remaining morphological ambiguity (only for Bokmål). The Constraint Grammar module uses a compiler developed at the University of Southern Denmark in Odense. The multitagger uses the lexicon Norsk ordbank.
- resource Short Name: obt
- url: http://www.tekstlab.uio.no/obt-ny/english/index.html
- PID: http://hdl.handle.net/11538/0000-0005-E7C6-7
- distribution Info
- licence Info
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://github.com/noklesta/The-Oslo-Bergen-Tagger
- licence
- licence Family: GNU
- licence Name: General Public License (GPL)
- licence Url: http://www.gnu.org/licenses/gpl.html
- conditions Of Use: BY
- conditions Of Use: SA
- licensor:
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- distribution Rights Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: University of Oslo
- organization Name: Universitetet i Oslo
- organization Short Name: UiO
- organization Short Name: UoO
- department Name: Department of Linguistics and Scandinavian Studies
- department Name: Institutt for lingvistiske og nordiske studier (ILN)
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/english/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- licence Info
- ipr Holder
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- actor Info
- actor Type: person
- organization Info
- organization Name: Uni Research AS
- department Name: Uni Research Computing
- actor Info
- contact
- actor Info
- actor Type: organization
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- actor Type: person
- person Info
- surname: Meurer
- given Name: Paul
- sex: male
- position: Senior researcher
- affiliation:
- organization Info
- organization Name: Uni Research AS
- department Name: Uni Research Computing
- communication Info
- email: paul.meurer@uni.no
- actor Info
- metadata Creation Date: 16.03.2015
- metadata Last Date Updated: 05.06.2018
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Hagen
- given Name: Kristin
- organization Info
- organization Name: The Text Laboratory
- organization Short Name: Textlab
- department Name: Department of Linguistics and Scandinavian Studies, University of Oslo
- communication Info
- email: kristin.hagen@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- validated: true
- validation Mode Details: Bokmål: Evaluation of the morphological Constraint Grammar module shows a success rate (recall) of 99% and a precision of 96%. This gives an F-measure of 97.5% (with recall and precision weighted equally). The tagger was tested on a 30 000-word evaluation corpus with texts from newspapers, magazines, journals, government reports and novels. Including the statistical module to fully disambiguate the evaluation corpus yields a tagger accuracy of 96.5%. This figure covers full disambiguation of both morphology and lemma. Nynorsk: So far, only the original CG1 module of the Oslo-Bergen tagger has been evaluated. This module had a success rate (recall) of 98.7% with a precision of 93.6%. This gives an F-measure of 96.2%. The Nynorsk evaluation corpus also contained about 30 000 words taken from newspapers, magazines, journals, government reports and novels.
- validation Report Unstructured
- role: validationReport
- document Unstructured: See in publications: http://www.tekstlab.uio.no/obt-ny/english/publications.html
- documentation Unstructured
- role: documentation
- document Unstructured: http://www.tekstlab.uio.no/obt-ny/english/index.html
- creation Start Date: 1996
- creation End Date: 2009
- resource Creator
- actor Info
- actor Type: organization
- communication Info
- email: tekstlab-post@iln.uio.no
- url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
- address: Box 1102 Blindern
- zip Code: 0317
- city: OSLO
- country: Norway
- actor Info
- funding Project:
- project Info
- project Name: the Tagger Project (Taggerprosjektet 1996 – 1998)
- funding Type: nationalFunds
- funder: The Research Council of Norway
- funding Country: Norway
- project Start Date: 01.01.1996
- project End Date: 31.12.1998
- project Name: Norwegian Newspaper Corpus (2007-2009)
- funding Type: nationalFunds
- funder: The Research Council of Norway
- funding Country: Norway
- project Start Date: 01.01.2007
- project End Date: 31.12.2009
- tool Info
- description: The tagger consists of three parts: 1) A multitagger (tokenizer, morphological analyzer, and compound analyzer), currently distributed only in binary form (ref3). 2) A Constraint Grammar (CG) tagger (ref2), comprising a) the VISL CG-3 compiler from the University of Southern Denmark and b) the Constraint Grammar rules. 3) OBT+stat, a statistical (HunPos) tagger that removes ambiguity not resolved in the CG step (currently only for Bokmål).
- input Info
- media Type: text
- resource Type: corpus
- modality Type: writtenLanguage
- language Name: Norwegian
- language Name: Norwegian Bokmål
- language Name: Norwegian Nynorsk
- language Id: No
- language Id: Nb
- language Id: Nn
- mime Type: txt, xml
- character Encoding: latin1, utf-8
- annotation Type: lemmatization
- annotation Type: morphosyntacticAnnotation-posTagging
- tagset: http://www.tekstlab.uio.no/obt-ny/english/tagset.html
- segmentation Level: word
- segmentation Level: clause
- output Info
- media Type: text
- resource Type: corpus
- modality Type: writtenLanguage
- language Name: Norwegian
- language Name: Norwegian Bokmål
- language Name: Norwegian Nynorsk
- language Id: No
- language Id: Nb
- language Id: Nn
- mime Type: txt, xml
- character Encoding: latin1, utf-8
- tagset: http://www.tekstlab.uio.no/obt-ny/english/tagset.html
- segmentation Level: clause
- segmentation Level: word
- tool Service Operation Info
- operating System: linux
- operating System: mac-OS
- running Environment Info
- required Software
- target Resource Name URI: VISL CG3: http://beta.visl.sdu.dk/cg3/chunked/installation.html
- required Software
- target Resource Name URI: HunPos: https://code.google.com/p/hunpos/
- required Software
dc:type | toolService |
dc:title | The Oslo-Bergen Tagger |
dc:identifier | oai:tekstlab.uio.no:oslo-bergen-tagger |
dc:description | The Oslo-Bergen tagger is a robust morphological and syntactic tagger developed at the University of Oslo and at Uni Computing in Bergen over several years. The tagger consists of three main modules: a preprocessor with multitagger and compound analyser (ref3), a grammar module for morphological and syntactic disambiguation (Constraint Grammar) (ref2) and a statistical module that removes the last of the remaining morphological ambiguity (only for Bokmål). The Constraint Grammar module uses a compiler developed at the University of Southern Denmark in Odense. The multitagger uses the lexicon Norsk ordbank. |
dc:publisher | |
dc:format | downloadable |
dc:date | 1996 |
dc:date | 2009 |
dc:rights | Public |
dc:rights | GNU |
dc:rights | General Public License (GPL) |
dc:rights | http://www.gnu.org/licenses/gpl.html |
dc:lang | Norwegian |
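
The validation details above quote an F-measure with recall and precision weighted equally. The record does not state the formula used, so the sketch below assumes the usual balanced F-measure (the harmonic mean of precision and recall). The function name `f_measure` and the `reported` dictionary are illustrative only and are not part of the tagger. Because the inputs are the rounded percentages from the record, the computed values may differ from the reported ones in the last digit.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1.0 weights precision and recall equally (assumed here).
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Rounded (precision, recall) figures as quoted in the validation details.
reported = {
    "Bokmål (CG module)": (0.96, 0.99),
    "Nynorsk (CG1 module)": (0.936, 0.987),
}

for name, (p, r) in reported.items():
    print(f"{name}: F-measure = {f_measure(p, r):.1%}")
```

Setting `beta` above 1 would weight recall more heavily, which may be of interest for a Constraint Grammar module whose main aim is to keep the correct reading among the remaining candidates.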