Resources from the Resource Bank Archive - Page 11 of 189 - Språkbanken

The Freiburg – Brown Corpus of American English

The Freiburg - Brown Corpus of American English (Frown) contains texts from 1991. Like the original Brown and LOB corpora, Frown contains 500 texts of around 2000 words each, distributed across 15 …

Source: CLARINO Bergen
License: CLARIN_ACA
Updated: 27.08.2018
The Corpus of Free Trade Agreements (FTA)

Corpus of Free Trade Agreements (English/Spanish). The FTA corpus consists of 233 XML source files in each language. The corpus contains approximately 1,370,000 words in the English section and 1,483,000 …

Source: CLARINO Bergen
License: CLARIN_ACA
Updated: 27.08.2018
The LOB corpus (POS tagged)

The Lancaster - Oslo/Bergen (LOB) Corpus is a million-word collection of present-day (1961) British English texts. The corpus was compiled under the direction of Geoffrey Leech, University of …

Source: CLARINO Bergen
License: CLARIN_ACA
Updated: 27.08.2018
The London-Lund Corpus of Spoken English (LLC)

The London-Lund Corpus contains samples of educated spoken British English, in orthographic transcription with detailed prosodic marking. It consists of 100 'texts', each of some 5,000 running words. …

Source: CLARINO Bergen
License: CLARIN_ACA
Type: Text
Updated: 27.08.2018
Helsinki Corpus of Older Scots

The Helsinki Corpus of Older Scots was compiled as a supplement to the diachronic part of the Helsinki Corpus of English Texts. The Scottish texts were selected according to the same principles of …

Source: CLARINO Bergen
License: CLARIN_ACA
Type: Text
Updated: 27.08.2018
The Oslo-Bergen Tagger

The Oslo-Bergen tagger is a robust morphological and syntactic tagger developed at the University of Oslo and at Uni Computing in Bergen over several years. The tagger consists of three main modules: …

Language: Norwegian
Source: CLARINO Tekstlaboratoriet
License: General Public License (GPL)
Type: Tool
Updated: 05.06.2018
Nordisk syntaksdatabase

The database consists of judgments by 924 Nordic dialect speakers from 207 places on a list of sentences that illustrate various syntactic phenomena. Many of the speakers are the same in both database …

Language: Norwegian, Swedish, Danish, Icelandic, Faroese
Source: CLARINO Tekstlaboratoriet
License: CLARIN_ACA-NC-LOC-PRIV-ND-*
Type: Text
Updated: 05.06.2018
NoWaC v 1.0 (Norwegian Web as Corpus)

Web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus has been built by crawling, downloading and processing web documents in the .no top-level internet domain between …

Language: Norwegian Bokmål
Source: CLARINO Tekstlaboratoriet
License: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
Type: Text
Updated: 05.06.2018
Glossa (new version)

Glossa is a tool for researchers who want to search linguistically annotated corpora. Glossa is designed to make it easy for researchers to: - create complex searches - explore the result via e.g. …

Language:
Source: CLARINO Tekstlaboratoriet
License: MIT license
Type: Tool
Updated: 05.06.2018
Frequency lists from NoWaC – Norwegian Web as Corpus

Frequency lists from NoWaC - Norwegian Web as Corpus - a web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus has been built by crawling, downloading and processing …

Language: Norwegian, Norwegian Bokmål
Source: CLARINO Tekstlaboratoriet
License: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
Type: Text
Updated: 05.06.2018