OCR Models for Sámi Languages

This is a collection of models for OCR (optical character recognition) of Sámi languages. The models recognize text in images of printed material (scanned books, magazines, etc.) in North Sámi, South Sámi, Lule Sámi, and Inari Sámi.

The training and evaluation of the models are described in detail in the article ‘Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway’, available at https://arxiv.org/abs/2501.07300.

The collection consists of three different types of models: Transkribus models, Tesseract models, and TrOCR models.

See the documentation file for more information.

Download resources

Extended metadata

Download metadata (CMDI XML): https://www.nb.no/sprakbanken/oai?verb=GetRecord&identifier=oai:nb.no:sbr-100&metadataPrefix=cmdi
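The metadata link above is an OAI-PMH GetRecord request. As a minimal sketch, the same URL can be assembled from its parts with the Python standard library. The function names `build_getrecord_url` and `fetch_record` are illustrative and not part of any official client, and the fetch step assumes the endpoint is reachable and returns UTF-8 encoded XML.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and identifier are taken from the metadata link above.
OAI_ENDPOINT = "https://www.nb.no/sprakbanken/oai"
RECORD_ID = "oai:nb.no:sbr-100"


def build_getrecord_url(identifier, metadata_prefix="cmdi", endpoint=OAI_ENDPOINT):
    """Return the OAI-PMH GetRecord URL for a single record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        },
        safe=":",  # keep the colons in the identifier unescaped, as in the link
    )
    return f"{endpoint}?{query}"


def fetch_record(identifier, metadata_prefix="cmdi", timeout=30):
    """Download the record as text (needs network access; not called here).

    Assumption: the response body is UTF-8 encoded XML.
    """
    url = build_getrecord_url(identifier, metadata_prefix)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(build_getrecord_url(RECORD_ID))
```

Passing a different `metadata_prefix` (for example `oai_dc`, which OAI-PMH repositories are required to support) would request the record in another format.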

dc:type	toolService
dc:title	OCR Models for Sámi Languages
dc:identifier	oai:nb.no:sbr-100
dc:description	This is a collection of models for OCR (optical character recognition) of Sámi languages. These can be used to recognize text in images of printed text (scanned books, magazines, etc.) in North Sámi, South Sámi, Lule Sámi, and Inari Sámi. You can read more detailed information about the training and evaluation of the models in the article 'Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway', see https://arxiv.org/abs/2501.07300. The collection consists of three different types of models: Transkribus models, Tesseract models, and TrOCR models. See the documentation file for more information.
dc:publisher
dc:format	downloadable
dc:date	2024-08-01
dc:date	2025-01-22
dc:rights	Public
dc:rights	Creative Commons (CC)
dc:rights	Creative_Commons-BY (CC-BY)
dc:rights	https://creativecommons.org/licenses/by/4.0/
dc:creator	National Library of Norway
dc:lang
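The Dublin Core listing above repeats some fields (`dc:date`, `dc:rights`) and leaves others empty (`dc:publisher`, `dc:lang`). A minimal sketch of reading such a listing into a dictionary of lists follows. It assumes each line is a field name, optionally followed by a tab and a value, as shown on this page; that layout is an assumption about this listing, not a documented export format, and `parse_dc_listing` is a hypothetical helper.

```python
def parse_dc_listing(text):
    """Map each field name to the list of its values, in order of appearance."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only, so values may themselves contain tabs.
        name, _, value = line.partition("\t")
        values = fields.setdefault(name.strip(), [])
        if value.strip():
            values.append(value.strip())  # empty fields keep an empty list
    return fields


# A shortened copy of the listing above, used only for illustration.
SAMPLE = (
    "dc:type\ttoolService\n"
    "dc:date\t2024-08-01\n"
    "dc:date\t2025-01-22\n"
    "dc:publisher\n"
)

record = parse_dc_listing(SAMPLE)
print(record["dc:date"])
```

Keeping every field as a list avoids silently dropping the second `dc:date` or the additional `dc:rights` entries.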
