Training Corpus jos1M
Extended metadata
- resource Common Info
- resource Type: corpus
- identification Info
- resource Name: Training Corpus jos1M
- description: The jos1M corpus contains 1 million words of paragraphs sampled from the FidaPLUS corpus. It is meant to serve as a training corpus for word-level tagging of Slovene. This silver-standard corpus is annotated with morphosyntactic descriptions (fine-grained PoS tags) and lemmas, and about one fourth of the annotations, the most problematic ones, have been hand-validated. The corpus is available in source TEI P5 XML and in the simpler and smaller vertical format used by various concordancers.
- resource Short Name: jos1M
- url: http://clarino.uib.no/iness/landing-page?resource=jos1M&view=short
- url: http://clarino.uib.no/iness/landing-page?resource=jos1M
- P I D: hdl:11495/DC84-BF60-3823-5
- distribution Info
- licence Info
- user Category: Public
- licence
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-NC (CC-BY-NC)
- licence Url: http://creativecommons.org/licenses/by-nc/4.0/
- conditions Of Use: BY
- conditions Of Use: NC
- licence Info
- contact
- actor Info
- actor Type: person
- role: author
- person Info
- surname: Krek
- given Name: Simon
- affiliation:
- organization Info
- organization Name: “Jožef Stefan” Institute
- actor Info
- metadata Info
- metadata Creation Date: 28.03.2017
- metadata Last Date Updated: 06.03.2018
- metadata Creator
- actor Info
- actor Type: person
- person Info
- surname: Dione
- given Name: Cheikh Bamba
- sex: male
- position: Researcher (Ph.D.)
- affiliation:
- organization Info
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- communication Info
- email: clarin@uib.no
- email: iness@uib.no
- actor Info
- resource Creator
- actor Info
- actor Type: person
- person Info
- surname: Erjavec, Tomaž
- affiliation:
- organization Info
- organization Name: Jožef Stefan Institute
- actor Info
- actor Info
- actor Type: person
- person Info
- surname: Krek, Simon
- affiliation:
- organization Info
- organization Name: Jožef Stefan Institute
- corpus Info
- corpus Type: Written Corpus
- corpus Part Info
- media Type: text
- corpus Part General Info
- annotation Info
- annotation Type: morphosyntacticAnnotation-posTagging
- annotation Type: lemmatization
- annotation Info
dc:type | corpus |
dc:title | Training Corpus jos1M |
dc:identifier | oai:clarino.uib.no:Jos1M |
dc:description | The jos1M corpus contains 1 million words of paragraphs sampled from the FidaPLUS corpus. It is meant to serve as a training corpus for word-level tagging of Slovene. This silver-standard corpus is annotated with morphosyntactic descriptions (fine-grained PoS tags) and lemmas, and about one fourth of the annotations, the most problematic ones, have been hand-validated. The corpus is available in source TEI P5 XML and in the simpler and smaller vertical format used by various concordancers. |
dc:publisher | |
dc:format | |
dc:date | |
dc:date | |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY-NC (CC-BY-NC) |
dc:rights | http://creativecommons.org/licenses/by-nc/4.0/ |