Lemmatization

MAXDictio can lemmatize words in various languages for the word frequency and word combination functions. When this option is activated, each word is reduced to its base form (lemma), so that all inflected forms of a word are counted together regardless of declension or conjugation. For example, the forms “give”, “gave”, and “given” are all counted under the single word “give”.
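The effect on a word frequency count can be illustrated with a minimal Python sketch. This is not MAXDictio's actual implementation: the `LEMMAS` dictionary is a hypothetical excerpt of a lemma list, and `lemma_frequencies` is a name chosen only for this illustration.

```python
from collections import Counter

# Hypothetical excerpt of a lemma list: inflected form -> base form (lemma).
LEMMAS = {"gave": "give", "given": "give", "gives": "give"}

def lemma_frequencies(words, lemmas=LEMMAS):
    """Count words after mapping each one to its lemma.

    Words that are not in the lemma list are counted under their own
    lower-cased form.
    """
    return Counter(lemmas.get(w.lower(), w.lower()) for w in words)

counts = lemma_frequencies(["Give", "gave", "given", "take"])
print(counts["give"], counts["take"])
```

Without the lemma lookup, “give”, “gave”, and “given” would each appear as a separate entry with a frequency of 1.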

In MAXDictio, lemmatization is performed using lemma lists, which are available for the following languages:

  • Bulgarian
  • German
  • English
  • Estonian
  • French
  • Italian
  • Catalan
  • Polish
  • Portuguese
  • Swedish
  • Spanish
  • Czech
  • Ukrainian
  • Hungarian

The lists are in TXT format (UTF-8) and can be edited and expanded as required. We recommend creating backup copies, because the files may be overwritten during a new installation. You can find the lists in the installation folder of MAXQDA:

Windows: local installation

C:\Program Files\MAXQDA2020\Resources\Lemmatization

Windows: portable installation on USB pen drive

USB drive > MAXQDA 2020

Mac: local installation

Applications > Right-click on MAXQDA2020 > Show package contents: Contents > Resources > Lemmatization

Mac: portable installation on USB pen drive

USB drive > MAXQDA 2020 Portable for Mac > Right-click on MAXQDA2020 > Show package contents: Contents > Resources > Lemmatization
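Because edited lists may be overwritten by a new installation, it can help to copy the whole Lemmatization folder to a safe place before and after editing. The following Python sketch shows one way to do this. The function name `backup_lemma_lists` and the backup folder naming are assumptions made for this illustration and are not part of MAXQDA.

```python
import shutil
from datetime import datetime
from pathlib import Path

def backup_lemma_lists(source_dir, backup_root):
    """Copy the Lemmatization folder into a new timestamped folder.

    Returns the path of the backup folder that was created.
    """
    source = Path(source_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"Lemmatization folder not found: {source}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"Lemmatization-backup-{stamp}"
    # copytree creates the target folder (and missing parent folders).
    shutil.copytree(source, target)
    return target

# Example call for a local Windows installation (path as listed above;
# the destination folder is an arbitrary choice):
# backup_lemma_lists(
#     r"C:\Program Files\MAXQDA2020\Resources\Lemmatization",
#     Path.home() / "Documents",
# )
```

Each call creates a separate timestamped copy, so earlier backups are not overwritten.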

Important license note: For this function, MAXDictio uses lemma lists that have been published under Creative Commons and Open Database licenses. If you use this function for a publication, you must give appropriate credit in a short note like the following:

Lemma list for German: “A lemma list has been used that is based on "Deutsche Morphologie-Daten" by Daniel Naber (http://www.danielnaber.de/morphologie/), which is made available under the Creative Commons Attribution-ShareAlike 4.0 license (http://creativecommons.org/licenses/by-sa/4.0/).”

Lemma list for other languages: “A lemma list has been used that was originally provided by Michal Boleslav Měchura (http://www.lexiconista.com/datasets/lemmatization/), which is made available under the Open Database License (ODbL) (http://opendatacommons.org/licenses/odbl/1.0/).”
