OCR Models for Sámi Languages

This is a collection of models for OCR (optical character recognition) of Sámi languages. The models recognize text in images of printed material (scanned books, magazines, etc.) in North Sámi, South Sámi, Lule Sámi, and Inari Sámi.

The training and evaluation of the models are described in detail in the article ‘Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway’, available at https://arxiv.org/abs/2501.07300.

The collection consists of three different types of models: Transkribus models, Tesseract models, and TrOCR models.

See the documentation file for more information.
