NST Pronunciation Lexicon for Swedish

This pronunciation lexicon for Swedish was originally produced by Nordic Language Technology (NST), and contains approximately 927,000 entries. The word list is based on the 100,000 most frequent word forms in the Swedish text corpus of NST.

The lexicon is available as one large file in simple text format. Each entry occupies one line containing 51 fields separated by semicolons. Not all fields are equally relevant for all purposes, but the format makes it easy to extract the information you need.
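The following minimal Python sketch shows how the one-entry-per-line, semicolon-separated layout can be read. It assumes exactly 51 fields (50 separators) per line, as stated above, and assumes that no field contains a literal semicolon. `read_entries`, `extract_columns` and `EXPECTED_FIELDS` are hypothetical names and are not part of the NST tools. This description does not state the file's character encoding, so the caller must choose it.

```python
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

# Field count taken from the lexicon description; an assumption to adjust
# if the actual file differs (e.g. lines ending in a trailing semicolon).
EXPECTED_FIELDS = 51


def read_entries(lines: Iterable[str],
                 expected: Optional[int] = EXPECTED_FIELDS) -> Iterator[List[str]]:
    """Yield each non-blank line as a list of its semicolon-separated fields.

    Assumes no field contains a literal semicolon (no quoting or escaping).
    Pass expected=None to skip the field-count check.
    """
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines, e.g. at the end of the file
        fields = line.split(";")
        if expected is not None and len(fields) != expected:
            raise ValueError(
                f"line {line_no}: expected {expected} fields, got {len(fields)}")
        yield fields


def extract_columns(lines: Iterable[str],
                    indices: Sequence[int],
                    expected: Optional[int] = EXPECTED_FIELDS) -> Iterator[Tuple[str, ...]]:
    """Yield only the selected fields (0-based indices) from every entry."""
    for fields in read_entries(lines, expected):
        yield tuple(fields[i] for i in indices)
```

To use the sketch, open the lexicon file with the appropriate encoding and pass the file object to `extract_columns` together with the 0-based indices of the columns you need. This description does not say which index holds the orthographic form, the compound decomposition or a SAMPA transcription, so check the documentation that accompanies the lexicon before choosing indices.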

The lexicon contains, among other things, information about the decomposition of compounds and one or more phonetic transcriptions per entry. Some of the transcriptions were made manually, but most were generated automatically with the help of an inflector, and random samples of the automatic output have been checked manually. The inflector itself, along with other lexical tools for handling the lexicon, can be downloaded as a separate zip file.

The transcription format is SAMPA (Speech Assessment Methods Phonetic Alphabet).

Extended metadata

Download metadata (CMDI XML): https://www.nb.no/sprakbanken/oai?verb=GetRecord&identifier=oai:nb.no:sbr-22&metadataPrefix=cmdi

dc:type	lexicalConceptualResource
dc:title	NST Pronunciation Lexicon for Swedish
dc:identifier	oai:nb.no:sbr-22
dc:description	This pronunciation lexicon for Swedish was originally produced by Nordic Language Technology (NST), and contains approximately 927,000 entries. The word list is based on the 100,000 most frequent word forms in the Swedish text corpus of NST. The lexicon is available as one large file in simple text format. Each entry occupies one line containing 51 fields separated by semicolons. Not all fields are equally relevant for all purposes, but the format makes it easy to extract the information you need. The lexicon contains, among other things, information about the decomposition of compounds and one or more phonetic transcriptions per entry. Some of the transcriptions were made manually, but most were generated automatically with the help of an inflector, and random samples of the automatic output have been checked manually. The inflector itself, along with other lexical tools for handling the lexicon, can be downloaded as a separate zip file. The transcription format is SAMPA (Speech Assessment Methods Phonetic Alphabet).
dc:publisher
dc:format	downloadable
dc:date	2000-01-03
dc:date	2003-02-24
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-ZERO (CC-ZERO)
dc:rights	https://creativecommons.org/publicdomain/zero/1.0/
dc:creator	Nordic Language Technology AS
dc:lang	Swedish
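The CMDI metadata link is an OAI-PMH `GetRecord` request. Its `verb`, `identifier` and `metadataPrefix` query parameters are defined by the OAI-PMH protocol. The minimal Python sketch below rebuilds that URL from its parts and, optionally, downloads the record. The helper names `get_record_url` and `fetch_record` are hypothetical. The sketch also assumes that the endpoint is still reachable at the address given in the link.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the CMDI metadata link; availability is not guaranteed.
OAI_BASE = "https://www.nb.no/sprakbanken/oai"


def get_record_url(identifier: str, metadata_prefix: str = "cmdi",
                   base: str = OAI_BASE) -> str:
    """Build an OAI-PMH GetRecord request URL."""
    query = urlencode(
        {"verb": "GetRecord",
         "identifier": identifier,
         "metadataPrefix": metadata_prefix},
        safe=":",  # keep the colons in OAI identifiers unescaped
    )
    return f"{base}?{query}"


def fetch_record(identifier: str, metadata_prefix: str = "cmdi") -> bytes:
    """Download the raw XML record (performs a network request)."""
    with urlopen(get_record_url(identifier, metadata_prefix), timeout=30) as resp:
        return resp.read()
```

`get_record_url("oai:nb.no:sbr-22")` reproduces the metadata link for this resource. `fetch_record` returns the response body unparsed, so an XML parser of your choice can then read the CMDI record.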