Parallel Corpus of documents from the Technical Regulations Information System for German-Spanish
Extended metadata
- resource Common Info:
- resource Type: corpus
- identification Info:
- resource Name: Parallel Corpus of documents from the Technical Regulations Information System for German-Spanish
- description: The TRIS corpus (Parallel Corpus of documents from the Technical Regulations Information System for German-Spanish) is a specialized Spanish-German parallel corpus (ES-ES, DE-AT and DE-DE) of texts from the European Commission dated 1997-2010. The texts are technical regulations in a variety of domains. This third and final version is sentence aligned and is available in TMX and TEI formats. The TMX files are sentence aligned, while the TEI-encoded files hold the sentence alignment information in stand-off annotation. Every sentence carries information about its domain, year, source file and sentence number. The corpus contains files written in Austria and translated into European Spanish, from three domains: – B00: Construction (205 files; 70,648 sentences; 1,563,000 words; time frame: 1999-2010) – C00A: Agriculture, Fishing and Foodstuffs (12 files; 4,879 sentences; 137,354 words; time frame: 1999-2001) – H00: Domestic and Leisure Equipment (12 files; 1,229 sentences; 58,328 words; time frame: 2005-2010) The corpus has also been part-of-speech tagged with TreeTagger, and the POS-tagged files are available as well. TRIS version 0.3 is the final version; it subsumes versions 0.1 and 0.2 and corrects some errors present in those two versions. TRIS v0.3 is encoded in TEI P5 and includes files from two domains not included in versions 0.1 and 0.2: C00A (Agriculture, Fishing and Foodstuffs), which is currently under alignment, and H00 (Domestic and Leisure Equipment), which includes all files available in the database up to 2010.
- resource Short Name: TRIS Spanish-German parallel corpus
- url: http://clarino.uib.no/korpuskel/landing-page?identifier=tris&view=short
- url: http://hdl.handle.net/11509/79
- PID: http://hdl.handle.net/11509/79
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- distribution Access Medium: accessibleThroughInterface
- download Location: http://hdl.handle.net/11509/79
- execution Location: http://clarino.uib.no/korpuskel/landing-page?identifier=tris&view=short
- attribution Text: Parra Escartín, Carla. 2013. Encoding a parallel corpus: The TRIS corpus experience. In: Vol 3, No 1 (2013) The many facets of corpus linguistics in Bergen – in honour of Knut Hofland. Bergen: Bergen Language and Linguistics Studies (BeLLS) 2013. ISBN 978-82-998587-2-4. pp. 61-80.
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
- licence Url: http://creativecommons.org/licenses/by-nc-sa/3.0/
- conditions Of Use: BY
- conditions Of Use: NC
- conditions Of Use: SA
- ipr Holder:
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Faculty of Humanities
- department Name: Det humanistiske fakultet
- communication Info:
- email: clarin@uib.no
- url: https://repo.clarino.uib.no/
- url: https://clarin.b.uib.no
- city: Bergen
- country: Norway
- actor Info:
- actor Type: organization
- organization Info:
- organization Name: CLARINO Bergen Centre
- actor Info:
- actor Type: person
- person Info:
- surname: Lyse
- given Name: Gunn Inger
- sex: female
- position: Researcher (Ph.D.)
- affiliation:
- organization Info:
- organization Name: University of Bergen
- organization Name: Universitetet i Bergen
- organization Short Name: UiB
- organization Short Name: UoB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- actor Info:
- actor Type: person
- person Info:
- surname: Parra Escartín
- given Name: Carla
- sex: female
- position: Ph.D.
- affiliation:
- organization Info:
- organization Name: University of Bergen
- organization Short Name: UiB
- department Name: Department of Linguistic, Literary and Aesthetic Studies
- corpus Info:
- corpus Type: Multilingual Corpus
- corpus Part Info:
- media Type: text
- corpus Text Info:
- text Format Info:
- mime Type: application/xml
- corpus Part General Info:
- source Work Info:
- work Description: Texts from the European Commission dated 1997-2010. The texts are technical regulations in a variety of domains. They were written in Austria and translated into European Spanish, and come from three domains: – B00: Construction (205 files; 70,648 sentences; 1,563,000 words; time frame: 1999-2010) – C00A: Agriculture, Fishing and Foodstuffs (12 files; 4,879 sentences; 137,354 words; time frame: 1999-2001) – H00: Domestic and Leisure Equipment (12 files; 1,229 sentences; 58,328 words; time frame: 2005-2010)
- linguality Info:
- linguality Type: bilingual
- multilinguality Type: parallel
- language Info:
- language Id: de
- language Name: German
- size Per Language:
- size Info:
- size: 1129372
- size Unit: tokens
- language Variety Info:
- language Variety Type: dialect
- language Variety Name: Austria
- language Info:
- language Id: es
- language Name: Spanish
- size Per Language:
- size Info:
- size: 1497496
- size Unit: tokens
- language Variety Info:
- language Variety Type: dialect
- language Variety Name: Spain
- modality Info:
- modality Type: writtenLanguage
- size Info:
- size: 2626868
- size Unit: words
- size Info:
- size: 76756
- size Unit: sentences
- annotation Info:
- annotation Type: morphosyntacticAnnotation-posTagging
- annotation Standoff: false
- segmentation Level: word
- annotation Format: word POS lemma
- tagset: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
- annotation Mode: automatic
- annotation Mode Details: No correction has been made to the POS tagged texts.
- annotation Info:
- annotation Type: alignment
- annotation Standoff: true
- segmentation Level: sentence
- conformance To Standards Best Practices: TMX
- conformance To Standards Best Practices: TEI_P5
- annotation Mode: interactive
- annotation Mode Details: TMX files are sentence aligned while TEI encoded files have the information about sentence alignment in stand-off annotation.
- time Coverage Info:
- time Coverage: 1997-2010
- creation Info:
- creation Mode: mixed
- original Source:
- target Resource Name URI: http://ec.europa.eu/growth/tools-databases/tris/en/search/
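The record above states that the TMX files are sentence aligned. As a minimal sketch of how such alignments could be read, the Python below extracts German-Spanish sentence pairs from a TMX document using only the standard library. The function name `read_tmx_pairs`, the `SAMPLE` document and its sentences are illustrative assumptions, not part of the TRIS distribution; the actual files may use different language codes or carry extra attributes.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src="de", tgt="es"):
    """Return (source, target) sentence pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):  # one translation unit per aligned sentence
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain "lang".
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Key on the primary subtag so "de-AT" matches "de".
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


# Hypothetical two-sentence sample, invented for illustration only.
SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="de" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="de-AT"><seg>Dies ist ein Beispiel.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Esto es un ejemplo.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    for de, es in read_tmx_pairs(SAMPLE):
        print(de, "|||", es)
```

Matching on the primary language subtag is a design choice made here so that the DE-AT and DE-DE variants named in the description are both treated as German.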
dc:type | corpus |
dc:title | Parallel Corpus of documents from the Technical Regulations Information System for German-Spanish |
dc:identifier | oai:clarino.uib.no:tris |
dc:description | The TRIS corpus (Parallel Corpus of documents from the Technical Regulations Information System for German-Spanish) is a specialized Spanish-German parallel corpus (ES-ES, DE-AT and DE-DE) of texts from the European Commission dated 1997-2010. The texts are technical regulations in a variety of domains. This third and final version is sentence aligned and is available in TMX and TEI formats. The TMX files are sentence aligned, while the TEI-encoded files hold the sentence alignment information in stand-off annotation. Every sentence carries information about its domain, year, source file and sentence number. The corpus contains files written in Austria and translated into European Spanish, from three domains: – B00: Construction (205 files; 70,648 sentences; 1,563,000 words; time frame: 1999-2010) – C00A: Agriculture, Fishing and Foodstuffs (12 files; 4,879 sentences; 137,354 words; time frame: 1999-2001) – H00: Domestic and Leisure Equipment (12 files; 1,229 sentences; 58,328 words; time frame: 2005-2010) The corpus has also been part-of-speech tagged with TreeTagger, and the POS-tagged files are available as well. TRIS version 0.3 is the final version; it subsumes versions 0.1 and 0.2 and corrects some errors present in those two versions. TRIS v0.3 is encoded in TEI P5 and includes files from two domains not included in versions 0.1 and 0.2: C00A (Agriculture, Fishing and Foodstuffs), which is currently under alignment, and H00 (Domestic and Leisure Equipment), which includes all files available in the database up to 2010. |
dc:publisher | |
dc:format | downloadable |
dc:date | |
dc:date | 2014 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-BY-NC-SA (CC-BY-NC-SA) |
dc:rights | http://creativecommons.org/licenses/by-nc-sa/3.0/ |
dc:creator | Carla Parra Escartín |
dc:lang | German |
dc:lang | Spanish |
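The extended metadata gives the POS annotation format as "word POS lemma". The sketch below shows one way such lines could be parsed, assuming one token per line with three tab-separated columns (the usual TreeTagger output layout, but not confirmed for these files). The names `Token` and `parse_tagged_lines` and the sample lines are hypothetical.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    lemma: str


def parse_tagged_lines(text: str) -> List[Token]:
    """Parse 'word<TAB>POS<TAB>lemma' lines into Token tuples."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip markup or malformed lines (assumption)
        tokens.append(Token(*parts))
    return tokens


# Invented sample in the assumed three-column layout.
SAMPLE = "Das\tART\tdie\nHaus\tNN\tHaus\n.\t$.\t.\n"

if __name__ == "__main__":
    for tok in parse_tagged_lines(SAMPLE):
        print(tok.word, tok.pos, tok.lemma)
```

Since the record notes that no manual correction was made to the tagged texts, any downstream use of these columns should allow for tagging errors.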