NST Pronunciation Lexicon for Swedish
Extended metadata
- resource Common Info:
- resource Type: lexicalConceptualResource
- identification Info:
- resource Name: NST uttaleleksikon for svensk
- resource Name: NST Pronunciation Lexicon for Swedish
- description: Dette uttaleleksikonet for svensk vart opphavleg produsert av Nordisk språkteknologi (NST), og inneheld om lag 927.000 oppslag. Ordlista tek utgangspunkt i dei 100.000 mest frekvente ordformene i det svenske tekstkorpuset til NST. Heile leksikonet ligg føre som ei stor fil i rein tekst-format. Kvart oppslag er på ei line, det er 51 postar tilgjengeleg på kvar line, og postane er skilde med semikolon. Ikkje alle postane er like relevante for alle føremål, men gitt formatet er det lett å hente ut den informasjonen ein treng. Leksikonet inneheld mellom anna informasjon om dekomponeringsledd i samansettingar og ein eller fleire fonetiske transkripsjonar. Denne transkripsjonen er delvis gjort manuelt, men det meste er gjort automatisk ved hjelp av ein inflektor, og dette er delvis stikkprøvekontrollert. Sjølve inflektoren, og anna leksikalsk verktøy som kan nyttast til å handsame leksikonet, kan lastast ned som ein eigen zip-fil. Transkripsjonsformatet er SAMPA (Speech Assessment Methods Phonetic Alphabet).
- description: This pronunciation lexicon for Swedish was originally produced by Nordic Language Technology (NST) and contains approximately 927,000 entries. The word list is based on the 100,000 most frequent word forms in the Swedish text corpus of NST. The lexicon is available as one large file in plain-text format. Each entry occupies one line with 51 fields, separated by semicolons. Not all fields are equally relevant for all purposes, but the format makes it easy to extract the information you need. The lexicon contains, among other things, information about the decomposition of compounds and one or more phonetic transcriptions. The transcriptions were partly produced manually, but most were generated automatically with the help of an inflector, and random samples of the automatic output have been checked manually. The inflector itself, and other lexical tools that can be used to process the lexicon, can be downloaded as a separate zip file. The transcription format is SAMPA (Speech Assessment Methods Phonetic Alphabet).
- url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-22/
- PID: hdl:21.11146/22
- identifier: sbr-22
- distribution Info:
- licence Info:
- user Category: Public
- distribution Access Medium: downloadable
- download Location: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-22/
- licence:
- licence Family: Creative Commons (CC)
- licence Name: Creative_Commons-ZERO (CC-ZERO)
- licence Url: https://creativecommons.org/publicdomain/zero/1.0/
- licensor:
- actor Info:
- actor Type: organization
- role: Licensor
- organization Info:
- organization Name: Nasjonalbiblioteket
- organization Name: National Library of Norway
- organization Short Name: NB
- organization Short Name: NLN
- department Name: Språkbanken
- department Name: The Language Bank
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- distribution Rights Holder:
- actor Info:
- actor Type: organization
- role: Distribution Rights Holder
- organization Info:
- organization Name: Nasjonalbiblioteket
- organization Name: National Library of Norway
- organization Short Name: NB
- organization Short Name: NLN
- department Name: Språkbanken
- department Name: The Language Bank
- communication Info:
- email: sprakbanken@nb.no
- url: https://www.nb.no/sprakbanken/
- address: P.O. Box 2674 Solli
- zip Code: 0203
- city: Oslo
- region: Oslo
- country: Norway
- actor Info:
- actor Type: organization
- role: Contact
- organization Info:
- organization Name: Nasjonalbiblioteket
- organization Name: National Library of Norway
- organization Short Name: NB
- organization Short Name: NLN
- department Name: Språkbanken
- department Name: The Language Bank
- actor Info:
- actor Type: person
- role: Metadata Creator
- person Info:
- surname: Birkenes
- given Name: Magnus Breder
- affiliation:
- organization Info:
- organization Name: Nasjonalbiblioteket
- organization Name: National Library of Norway
- organization Short Name: NB
- organization Short Name: NLN
- department Name: Språkbanken
- department Name: The Language Bank
- actor Info:
- actor Type: organization
- role: Resource Creator
- organization Info:
- organization Name: Nordisk språkteknologi AS
- organization Name: Nordic Language Technology AS
- organization Short Name: NST
- organization Short Name: NST
- lexical Conceptual Resource Info Rev1:
- lexical Conceptual Resource Type: computationalLexicon
- lexical Conceptual Resource Part General Info:
- linguality Info:
- linguality Type: monolingual
- language Info:
- language Id: sv
- language Name: Swedish
- size Per Language:
- size Info:
- size: 927167
- size Unit: words
- language Variety Info:
- language Variety Type: other
- language Variety Name: standard language
- size Per Language Variety:
- size Info:
- size: 927167
- size Unit: words
- modality Info:
- modality Type: writtenLanguage
- modality Type Details: General, non-specific vocabulary, frequency-based with additions from other sources, including named entities.
- size Info:
- size: 927167
- size Unit: words
- creation Info:
- creation Mode: mixed
- creation Mode Details: Frequency-based with additions from other sources; about 25% of the words are manually transcribed or checked. Also contains part-of-speech information, among many other things.
- lexical Conceptual Resource Encoding Info:
- encoding Level: phonetics
- linguistic Information: phonetics-Transcription
- theoretic Model: SAMPA (https://www.phon.ucl.ac.uk/home/sampa/)
- lexical Conceptual Resource Part Info Rev1:
- media Type: text
- lexical Conceptual Resource Text Info:
- text Format Info:
- mime Type: text/csv
- character Encoding Info:
- character Encoding: UTF-8
dc:type | lexicalConceptualResource |
dc:title | NST Pronunciation Lexicon for Swedish |
dc:identifier | oai:nb.no:sbr-22 |
dc:description | This pronunciation lexicon for Swedish was originally produced by Nordic Language Technology (NST) and contains approximately 927,000 entries. The word list is based on the 100,000 most frequent word forms in the Swedish text corpus of NST. The lexicon is available as one large file in plain-text format. Each entry occupies one line with 51 fields, separated by semicolons. Not all fields are equally relevant for all purposes, but the format makes it easy to extract the information you need. The lexicon contains, among other things, information about the decomposition of compounds and one or more phonetic transcriptions. The transcriptions were partly produced manually, but most were generated automatically with the help of an inflector, and random samples of the automatic output have been checked manually. The inflector itself, and other lexical tools that can be used to process the lexicon, can be downloaded as a separate zip file. The transcription format is SAMPA (Speech Assessment Methods Phonetic Alphabet). |
dc:publisher | |
dc:format | downloadable |
dc:date | 2000-01-03 |
dc:date | 2003-02-24 |
dc:rights | Public |
dc:rights | Creative Commons (CC) |
dc:rights | Creative_Commons-ZERO (CC-ZERO) |
dc:rights | https://creativecommons.org/publicdomain/zero/1.0/ |
dc:creator | Nordic Language Technology AS |
dc:lang | Swedish |
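
The description above states that each entry occupies one line with 51 semicolon-separated fields, and the record lists the character encoding as UTF-8. A minimal Python sketch of extracting selected fields under those two facts could look like the following. This record does not document the field layout, so the positions used in the demo (index 0 for the word form, index 11 for a transcription) are purely hypothetical, as are the helper names and the synthetic sample line. The sketch also assumes that field values never contain a quoted semicolon.

```python
import io
from typing import Iterable, Iterator, List, Sequence, Tuple

# Stated in the resource description: 51 fields per line, separated by semicolons.
EXPECTED_FIELDS = 51


def read_entries(stream: Iterable[str],
                 expected_fields: int = EXPECTED_FIELDS) -> Iterator[List[str]]:
    """Yield each line as a list of fields, skipping blank or malformed lines."""
    for line in stream:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Assumption: semicolons never occur inside a field value.
        fields = line.split(";")
        if len(fields) != expected_fields:
            continue  # wrong field count, treat the line as malformed
        yield fields


def select_fields(entries: Iterable[List[str]],
                  indices: Sequence[int]) -> Iterator[Tuple[str, ...]]:
    """Keep only the fields at the given zero-based positions."""
    for fields in entries:
        yield tuple(fields[i] for i in indices)


def make_line(word: str, transcription: str) -> str:
    """Build a synthetic 51-field line for the demo (not real lexicon data)."""
    fields = [""] * EXPECTED_FIELDS
    fields[0] = word            # hypothetical position of the word form
    fields[11] = transcription  # hypothetical position of a transcription
    return ";".join(fields)


# Demo on synthetic input: one well-formed line and one malformed line.
sample = io.StringIO(
    make_line("exempel", "PLACEHOLDER") + "\n"
    + "too;few;fields\n"
)
rows = list(select_fields(read_entries(sample), indices=(0, 11)))
print(rows)
```

To run this against the downloaded file, the `io.StringIO` object can be replaced with a file opened via `open(path, encoding="utf-8")`, where `path` is wherever the lexicon was saved. The indices should be checked against the documentation distributed with the lexicon before relying on them.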