INESS list of lexical units unknown to the NorGram lexicon

In the INESS project, texts in Norwegian Bokmål and Nynorsk are parsed with the NorGram grammar and lexicon. Whenever text is parsed, some words will be unknown to the morphological analyzer, the lexicon, or both. INESS has therefore developed an intelligent browser-based preprocessing interface that, among other things, makes the treatment of unknown word forms efficient. The list of word forms that are not automatically recognized is inspected manually.
While some of these word forms result from OCR errors and others are simply typos, most are productive compounds, words occurring only in multiword expressions (MWEs), names, foreign words, neologisms, interjections, dialect words, and systematic or intentional misspellings. For more information about the types of lexical units registered, see the documentation at http://clarino.uib.no/iness/page?page-id=Text_preprocessing.
