Word Frequencies: Analyze Word Frequencies

The simplest function of MAXDictio determines the vocabulary of all texts in the current project.

This function can be accessed by either:

  • selecting the option MAXDictio > Word Frequencies, or
  • clicking the corresponding quick button in the “MAXDictio” toolbar.
Start “Word frequencies” in MAXQDA

After starting the function, the following dialog window appears, in which you can select the options you need.

Word Frequency options in MAXQDA 2018

Selection of texts to be analyzed

Only for activated documents – the frequencies procedure will be restricted to the activated text files

Only in retrieved segments – the frequencies procedure will be restricted to the coded segments actually displayed in the “Retrieved Segments” window

If neither option is selected, all text and table documents in the MAXQDA project will be analyzed.

Please note: Hyphenation is not recognized in PDF documents.

Differentiation of results

None: The results table does not differentiate the results, providing only the totals over all analyzed texts.

By documents, document groups, document sets: The results table contains additional columns that can be used to compare word frequency within individual documents, document groups or document sets (see Differentiation by Documents, Document Groups and Document Sets). When the Only for activated documents option is selected, only activated documents within the document groups or document sets are taken into account, and only document groups or document sets containing activated documents will be analyzed.

By Codes: This option is available only if the analysis is restricted to the segments in the “Retrieved Segments” window and a “Simple Coding Query” has been performed. The results table contains additional columns with the word frequencies for each code that appears in the “Code System”. This option is particularly helpful when texts have been divided into text units using codes for MAXDictio analysis, as it allows you to compare the word frequencies within different codes.
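The idea behind a differentiated results table can be sketched in a few lines of Python: one frequency count per unit (document, document group, document set, or code) plus a total across all units. This is only an illustration under simple assumptions, not MAXDictio's implementation; the function name `frequencies_by_unit`, the unit names, and the sample texts are hypothetical, and words are split on whitespace only.

```python
from collections import Counter

def frequencies_by_unit(units):
    """Return (per-unit counts, total counts) for a dict of unit name -> text."""
    # One Counter per unit; lowercasing and whitespace splitting are simplifications.
    per_unit = {name: Counter(text.lower().split()) for name, text in units.items()}
    total = Counter()
    for counts in per_unit.values():
        total.update(counts)  # add this unit's counts to the overall total
    return per_unit, total

# Hypothetical sample data standing in for two documents.
units = {
    "Interview 1": "work is work",
    "Interview 2": "no work today",
}
per_unit, total = frequencies_by_unit(units)

# One row per word: the total first, then one column per unit.
for word, n in total.most_common():
    row = [per_unit[name][word] for name in units]
    print(word, n, row)
```

With the option None, only the total column would be shown; the additional columns correspond to the per-unit counts.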

Further options

Characters to be disregarded: By clicking the button with three dots, you open a dialog box in which you can enter characters that are to be cut off from the words and ignored during the count. The selected characters then apply to all word-based functions in MAXQDA and MAXDictio.

How does MAXDictio define a “word”? A word, as shown above, is any sequence of characters between two delimiting characters. Delimiters can be, for example, blank spaces or punctuation marks. Take the example “work”: as the last word of a sentence, it is delimited by a space on its left and a period on its right.

The characters to be used as delimiters must be entered in the “Characters to be cut off” dialog box. Normally these characters include punctuation marks, question marks, etc. The selection of characters is stored in the respective project so the same results will be obtained for MAXDictio functions even if you open the file on another computer. By default, the following characters are entered automatically in new projects:

@ ! § $ % & / ( ) = ? ^ ° ‘ ´ ` ” „ “ ” “ { } [ ] # + * _ . : , ; < > ~ —
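To make this definition of a word concrete, the following Python sketch splits a text at whitespace and at a set of delimiter characters and then counts the resulting words. It is an illustration only and not MAXDictio's actual implementation; the function name `count_words` is hypothetical, and `CUT_CHARS` is an abbreviated, assumed subset of the default characters listed above.

```python
import re
from collections import Counter

# Abbreviated, assumed subset of the default characters (not the full list).
CUT_CHARS = "@!$%&/()=?{}[]#+*_.:,;<>~"

def count_words(text, cut_chars=CUT_CHARS):
    """Count words, treating whitespace and every character in cut_chars as a delimiter."""
    # Build a character class containing whitespace plus the escaped delimiter characters.
    pattern = "[\\s" + re.escape(cut_chars) + "]+"
    # Splitting can yield empty strings at the edges of the text; drop them.
    words = [w for w in re.split(pattern, text) if w]
    return Counter(words)

counts = count_words("We work. They work, too!")
print(counts.most_common())
```

In this sketch, “work.” and “work,” are both counted as “work” because the period and the comma are among the delimiters.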

Hyphens can cause problems. If compound words are to be counted as one word rather than split into their separate parts, the hyphen must not be declared as a delimiter. It is best to experiment a little with different settings. Because the word frequency count can be repeated with no significant loss of time, it is advisable to check the results for anomalies, change the options if needed, and then repeat the analysis.
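The effect of the hyphen setting can be seen in a small Python sketch that splits the same sentence twice, once without and once with the hyphen among the delimiters. The helper `split_words` and the sample sentence are hypothetical and only illustrate the principle; they do not reproduce MAXDictio's internals.

```python
import re

def split_words(text, delimiters):
    """Split text at whitespace and at any character contained in delimiters."""
    pattern = "[\\s" + re.escape(delimiters) + "]+"
    return [w for w in re.split(pattern, text) if w]

sentence = "A well-known, state-of-the-art method."

# Hyphen not declared as a delimiter: compounds stay intact.
without_hyphen = split_words(sentence, ".,")
print(without_hyphen)  # ['A', 'well-known', 'state-of-the-art', 'method']

# Hyphen declared as a delimiter: compounds are split into their parts.
with_hyphen = split_words(sentence, ".,-")
print(with_hyphen)  # ['A', 'well', 'known', 'state', 'of', 'the', 'art', 'method']
```

Which variant is preferable depends on whether the compounds or their components matter for the analysis.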

Minimal number of characters – words with fewer characters will be skipped

Apply stop list – If a stop list is to be used, the corresponding box must be checked. Click on the button with the three dots to open and edit the stop lists.

Case sensitivity: If this setting is activated, “Give” and “give”, for example, will be counted as different words. If the setting is inactive, all words will be displayed in lowercase in the results list.

Lemmatize words – when this box is checked, the words identified in the texts will be reduced to their base forms (lemmas) by using a lemma lexicon in the chosen language. For example, if a text contains the words “gave”, “given”, and “gives”, MAXDictio will list only the base form “give” in the results table.
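How the four options above interact can be sketched in Python as a small filter applied before counting. This is a simplified illustration, not MAXDictio's implementation: the function `word_frequencies`, the tiny lemma lexicon `LEMMAS`, the stop list `STOP_LIST`, and the order in which the options are applied are all assumptions.

```python
from collections import Counter

# Hypothetical miniature lemma lexicon; a real lexicon covers a whole language.
LEMMAS = {"gave": "give", "given": "give", "gives": "give"}
# Hypothetical stop list.
STOP_LIST = {"the", "a", "and"}

def word_frequencies(words, min_chars=1, stop_list=None,
                     case_sensitive=False, lemmas=None):
    """Count words after applying the options (assumed order of application)."""
    counts = Counter()
    for word in words:
        if not case_sensitive:
            word = word.lower()  # "Give" and "give" become the same word
        if lemmas:
            word = lemmas.get(word.lower(), word)  # map inflected forms to the lemma
        if len(word) < min_chars:
            continue  # skip words that are too short
        if stop_list and word.lower() in stop_list:
            continue  # skip words on the stop list
        counts[word] += 1
    return counts

words = ["Give", "gave", "the", "given", "a", "gives", "give"]
freq = word_frequencies(words, min_chars=2, stop_list=STOP_LIST, lemmas=LEMMAS)
print(freq)

# With case sensitivity switched on, "Give" and "give" are counted separately.
case_freq = word_frequencies(["Give", "give"], case_sensitive=True)
print(case_freq)
```

In the first call, all inflected forms are counted under “give”, while “the” is removed by the stop list and “a” by the minimum length.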

Click OK to begin the analysis of word frequencies. Depending on the size of the texts, this process may take a few moments. A progress display shows how far the analysis has advanced.