Parallel Corpus of documents from the Technical Regulations Information System for German-Spanish (v0.3)

Specialised Spanish-German parallel corpus (ES-ES, DE-AT and DE-DE) of European Commission texts from 1997 to 2010.
The texts are technical regulations from a variety of domains. This third version is sentence-aligned and is available in TMX and TEI formats.
The TMX files are sentence-aligned, while the TEI-encoded files store the sentence alignment information as stand-off annotation.
Every sentence carries information about its domain, year, source file and sentence number. The corpus contains files written in Austria and translated into European Spanish, from three domains:
- B00: Construction (205 files; 70,648 sentences; 1,563,000 words; time frame: 1999-2010)
- C00A: Agriculture, Fishing and Foodstuffs (12 files; 4,879 sentences; 137,354 words; time frame: 1999-2001)
- H00: Domestic and Leisure Equipment (12 files; 1,229 sentences; 58,328 words; time frame: 2005-2010)
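
As a rough illustration of how the sentence-aligned TMX files might be consumed, the sketch below extracts German-Spanish sentence pairs from a TMX document. It relies only on the generic TMX layout (`tu` / `tuv` / `seg` elements with an `xml:lang` attribute). The function name `read_tmx_pairs`, the inline sample, and its two sentences are hypothetical and are not taken from the corpus; the exact language codes and any extra per-sentence properties in the real files should be checked against the downloaded data.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src_prefix="de", tgt_prefix="es"):
    """Return (source, target) sentence pairs from a TMX document string.

    Language matching is by prefix, so "de" matches both "de-AT" and "de-DE".
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segments[lang] = "".join(seg.itertext()).strip()
        src = next((t for l, t in segments.items() if l.startswith(src_prefix)), None)
        tgt = next((t for l, t in segments.items() if l.startswith(tgt_prefix)), None)
        if src is not None and tgt is not None:
            pairs.append((src, tgt))
    return pairs


# Hypothetical two-sentence sample; not an excerpt from the corpus.
SAMPLE = """<tmx version="1.4">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="de-AT"><seg>Dies ist ein Beispiel.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Esto es un ejemplo.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    for german, spanish in read_tmx_pairs(SAMPLE):
        print(german, "|||", spanish)
```

For a real file, the same function could be fed the file contents read as text; very large TMX files would be better handled with an incremental parser.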

The corpus has also been part-of-speech (POS) tagged with TreeTagger, and the POS-tagged files are also available.
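
A minimal sketch of reading POS-tagged text, assuming the tagged files follow TreeTagger's usual one-token-per-line output with tab-separated token, tag and lemma columns. Whether the distributed files use exactly this layout is an assumption to verify against the download. The function name `parse_treetagger` and the sample lines are hand-written for illustration and are not real tagger output from the corpus.

```python
def parse_treetagger(text):
    """Parse tab-separated token/tag/lemma lines into a list of tuples.

    Lines that do not have exactly three tab-separated fields
    (blank lines, markup lines) are skipped.
    """
    tokens = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            token, tag, lemma = parts
            tokens.append((token, tag, lemma))
    return tokens


# Hand-written illustration using STTS-style tags; not taken from the corpus.
SAMPLE_TAGGED = "Die\tART\tdie\nNorm\tNN\tNorm\ngilt\tVVFIN\tgelten\n.\t$.\t.\n"

if __name__ == "__main__":
    for token, tag, lemma in parse_treetagger(SAMPLE_TAGGED):
        print(f"{token:10} {tag:8} {lemma}")
```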

Versions 0.1 and 0.2 are kept as individual records because they are (currently) intended to be downloaded individually.

Version 0.3 is encoded in TEI P5 and includes files from two new domains not included in versions 0.1 and 0.2: C00A (Agriculture, Fishing and Foodstuffs), which is currently under alignment, and H00 (Domestic and Leisure Equipment), which includes all files available in the database up to 2010.

Extended metadata

resource Common Info
- resource Type: corpus
- identification Info
  - resource Name: Parallel Corpus of documents from the Technical Regulations Information System for German-Spanish (v0.3)
  - description: Specialised Spanish-German parallel corpus (ES-ES, DE-AT and DE-DE) of European Commission texts from 1997 to 2010. The texts are technical regulations from a variety of domains. This third version is sentence-aligned and is available in TMX and TEI formats. The TMX files are sentence-aligned, while the TEI-encoded files store the sentence alignment information as stand-off annotation. Every sentence carries information about its domain, year, source file and sentence number. The corpus contains files written in Austria and translated into European Spanish, from three domains: B00: Construction (205 files; 70,648 sentences; 1,563,000 words; time frame: 1999-2010); C00A: Agriculture, Fishing and Foodstuffs (12 files; 4,879 sentences; 137,354 words; time frame: 1999-2001); H00: Domestic and Leisure Equipment (12 files; 1,229 sentences; 58,328 words; time frame: 2005-2010). The corpus has also been part-of-speech (POS) tagged with TreeTagger, and the POS-tagged files are also available. Versions 0.1 and 0.2 are kept as individual records because they are (currently) intended to be downloaded individually. Version 0.3 is encoded in TEI P5 and includes files from two new domains not included in versions 0.1 and 0.2: C00A (Agriculture, Fishing and Foodstuffs), which is currently under alignment, and H00 (Domestic and Leisure Equipment), which includes all files available in the database up to 2010.
  - resource Short Name: TRIS corpus v03
  - url: http://clara.b.uib.no/fellows/carla-parra-escartin/tris/
  - identifier: tris_v03
- distribution Info
  - licence Info
    - user Category: Public
    - distribution Access Medium: downloadable
    - distribution Access Medium: accessibleThroughInterface
    - download Location: https://spaced.uib.no/xmlui/handle/11509/79
    - execution Location: http://clarino.uib.no/korpuskel/page
    - attribution Text: Parra Escartín, Carla. 2012. Design and compilation of a specialized Spanish-German parallel corpus, Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12) (Istanbul, Turkey) (Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis, eds.), European Language Resources Association (ELRA), May 2012, pp. 2199-2206. ISBN 978-2-9517408-7-7.
    - licence
      - licence Family: Creative Commons (CC)
      - licence Name: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
      - licence Url: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
      - conditions Of Use: BY
      - conditions Of Use: NC
      - conditions Of Use: SA
    - licensor:
    - actor Info
      - actor Type: organization
      - organization Info
        organization Name: University of Bergen
        organization Short Name: UiB
        organization Short Name: UoB
        department Name: Department of Linguistic, Literary and Aesthetic Studies
      - communication Info
        email: clarin@uib.no
        url: http://clarin.b.uib.no/about/
        country: Norway
    - distribution Rights Holder
      - actor Info
        actor Type: organization
        organization Info
        organization Name: University of Bergen
        organization Short Name: UiB
        organization Short Name: UoB
        department Name: CLARINO project
  - ipr Holder
    - actor Info
      - actor Type: organization
      - organization Info
        organization Name: University of Bergen
        organization Short Name: UiB
        organization Short Name: UoB
        department Name: Department of Linguistic, Literary and Aesthetic Studies
      - communication Info
        email: post@lle.uib.no
- contact
  - actor Info
    - actor Type: person
    - person Info
      - surname: Parra Escartín
      - given Name: Carla
      - affiliation:
      - organization Info
        organization Name: University of Bergen
        organization Short Name: UiB
        department Name: Department of Linguistic, Literary and Aesthetic Studies
- metadata Info
  - metadata Creation Date: 06.02.2015
  - metadata Language Name: English
  - metadata Language Id: en
  - metadata Last Date Updated: 12.02.2015
  - metadata Creator
    - actor Info
      - actor Type: person
      - person Info
        surname: Parra Escartín
        given Name: Carla
        sex: female
        affiliation:
        organization Info
        organization Name: University of Bergen
        organization Short Name: UiB
        department Name: Department of Linguistic, Literary and Aesthetic Studies
      - communication Info
        email: carla.parra@uib.no
        url: http://www.uib.no/en/persons/Carla.Parra.Escart%C3%ADn
        country: Norway
- version Info
  - version: v03
  - revision: Version 0.3 is encoded in TEI P5 and includes files from two new domains not included in versions 0.1 and 0.2: C00A (Agriculture, Fishing and Foodstuffs), which is currently under alignment, and H00 (Domestic and Leisure Equipment), which includes all files available in the database up to 2010.
  - last Date Updated: 08.01.2013
- resource Documentation Info
- resource Creation Info
  - resource Creator
    - actor Info
      - actor Type: person
      - person Info
        surname: Parra Escartín
        given Name: Carla
        sex: female
        affiliation:
        organization Info
        organization Name: University of Bergen
        organization Short Name: UiB
        department Name: Department of Linguistic, Literary and Aesthetic Studies
      - communication Info
        email: carla.parra@uib.no
        url: http://www.uib.no/en/persons/Carla.Parra.Escart%C3%ADn
        country: Norway
  - funding Project:
  - project Info
    - project Name: Common Language Resources and their Applications (CLARA – Project number: 238405)
    - project Short Name: CLARA
    - project ID: 238405
    - url: http://clara.uib.no
    - funding Type: euFunds
    - funder: SP3-People-ITN (Network for Initial Training, Marie Curie Actions, FP7)
    - project Start Date: 01.12.2009
    - project End Date: 30.11.2013

dc:type	corpus
dc:title	Parallel Corpus of documents from the Technical Regulations Information System for German-Spanish (v0.3)
dc:identifier	oai:repo.clarino.uib.no:11509/79
dc:description	Specialised Spanish-German parallel corpus (ES-ES, DE-AT and DE-DE) of European Commission texts from 1997 to 2010. The texts are technical regulations from a variety of domains. This third version is sentence-aligned and is available in TMX and TEI formats. The TMX files are sentence-aligned, while the TEI-encoded files store the sentence alignment information as stand-off annotation. Every sentence carries information about its domain, year, source file and sentence number. The corpus contains files written in Austria and translated into European Spanish, from three domains: B00: Construction (205 files; 70,648 sentences; 1,563,000 words; time frame: 1999-2010); C00A: Agriculture, Fishing and Foodstuffs (12 files; 4,879 sentences; 137,354 words; time frame: 1999-2001); H00: Domestic and Leisure Equipment (12 files; 1,229 sentences; 58,328 words; time frame: 2005-2010). The corpus has also been part-of-speech (POS) tagged with TreeTagger, and the POS-tagged files are also available. Versions 0.1 and 0.2 are kept as individual records because they are (currently) intended to be downloaded individually. Version 0.3 is encoded in TEI P5 and includes files from two new domains not included in versions 0.1 and 0.2: C00A (Agriculture, Fishing and Foodstuffs), which is currently under alignment, and H00 (Domestic and Leisure Equipment), which includes all files available in the database up to 2010.
dc:publisher
dc:format	downloadable
dc:date
dc:date
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
dc:rights	https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
dc:creator	Carla Parra Escartín
dc:lang	German
dc:lang	Spanish