Word Combinations

What is the “Word Combinations” function?

The Word Combinations function provides an overview of both the most frequent and the rarer combinations of up to 5 words in selected texts. It works like the Word Frequency feature, but lists combinations of up to 5 words instead of only individual words.

Opening the function and selecting settings

Begin the search for word combinations via the function MAXDictio > Word Combinations in the ribbon menu. The following dialog window will appear:

Set options for defining word combinations

Number of words

At the top of the dialog window, define how many words a word combination should contain. You can search for combinations of up to 5 words. The setting “Search for word combinations with 3 to 3 words” will search for all word combinations with exactly 3 words. The setting “Search for word combinations with 2 to 4 words” will list all 2-, 3-, and 4-word combinations.
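To make the effect of a setting such as “2 to 4 words” concrete, here is a minimal Python sketch that counts every run of consecutive words within a length range. It is only an illustration, not MAXDictio's implementation; the function name `word_combinations` and the sample sentence are assumptions made for this example.

```python
from collections import Counter

def word_combinations(words, min_n, max_n):
    """Count every run of min_n to max_n consecutive words."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        # Slide a window of n words across the word sequence.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

# Hypothetical sample text, already split into words.
words = "the cat sat on the mat".split()
counts = word_combinations(words, 2, 4)

for combination, frequency in counts.most_common():
    print(frequency, combination)
```

With `min_n` and `max_n` both set to 3, the same function would return only the three-word combinations.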

Selection of texts to be analyzed

Only for activated documents: This option restricts the analysis to currently activated documents.

Only in “Retrieved Segments” window: This option restricts the analysis to text segments that are currently displayed in the “Retrieved Segments” window.

If neither option is selected, all text and table documents in the MAXQDA project will be analyzed.

Differentiation of results

None: The results table will not contain any differentiation of results, but only the totals across all analyzed texts.

By documents, document groups, document sets: The results table will contain additional columns that can be used to compare the frequencies of word combinations across individual documents, document groups, or document sets. If the option “Only for activated documents” is selected, only the activated documents within the document groups or document sets are taken into account, and only groups and sets that contain activated documents are included.

By Codes: This option is only available if the analysis is restricted to the segments in the “Retrieved Segments” window and a “Simple Coding Query” has been performed. The results table contains additional columns of frequencies for each code that appears in the “Code System”. This option is particularly useful if you have divided texts into text segments using codes for analysis in MAXDictio, as it allows you to compare the frequencies of word combinations within different codes.
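The idea behind a differentiated results table can be sketched in Python as one frequency count per document plus a total across all documents. This is a simplified illustration only: the document names, texts, and the helper `bigrams` are hypothetical, and the sketch always lowercases words and looks at two-word combinations.

```python
from collections import Counter

def bigrams(text):
    """Count two-word combinations in a text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return Counter(" ".join(words[i:i + 2]) for i in range(len(words) - 1))

# Hypothetical documents standing in for texts in a project.
documents = {
    "Interview A": "we go home now",
    "Interview B": "we go out and we go home",
}

# One column of frequencies per document, plus the total across all documents.
per_document = {name: bigrams(text) for name, text in documents.items()}
total = sum(per_document.values(), Counter())

for combination, frequency in total.most_common():
    columns = [per_document[name][combination] for name in documents]
    print(combination, frequency, columns)
```

Grouping by document groups, document sets, or codes would follow the same pattern, with the dictionary keys being groups, sets, or codes instead of documents.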

Ignore

To exclude certain items from the analysis, such as email addresses or hashtags, select the corresponding entries.

Further options

Min. number of characters: Words containing fewer characters than specified here will not be included in the analysis. The default is 1 character. If the value is increased to 3 characters, words such as “he” and “it” will be excluded from the analysis and treated like words on the stop list.

Apply stop word list: Check this box to apply a stop list. Clicking on the button with 3 dots (...) will open the stop list window where stop lists can be selected and edited.

Case sensitivity: When this setting is activated, the word combination “Go home” will be treated as a different word combination than “go home”. When the setting is not activated, all words will be displayed in lowercase in the results table.
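The following Python sketch illustrates the difference the case sensitivity setting makes when counting two-word combinations. It is a simplified illustration under assumptions: the function `count_pairs` and the sample text are invented for this example.

```python
from collections import Counter

def count_pairs(text, case_sensitive):
    """Count two-word combinations, optionally folding everything to lowercase."""
    words = text.split()
    if not case_sensitive:
        words = [word.lower() for word in words]
    return Counter(" ".join(words[i:i + 2]) for i in range(len(words) - 1))

text = "Go home and then go home"  # hypothetical sample text

sensitive = count_pairs(text, case_sensitive=True)
insensitive = count_pairs(text, case_sensitive=False)

print(sensitive["Go home"], sensitive["go home"])  # counted separately
print(insensitive["go home"])                      # counted together
```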

Only word combinations within sentences: MAXDictio builds word combinations by treating all words of a text as one continuous sequence. Up to 5 consecutive words then form a word combination, regardless of any paragraph breaks, periods, exclamation marks, etc. between the words. It is therefore usually useful to turn this option on so that word combinations extending beyond the end of a sentence are ignored.

Example: “It is warm. I am going home.” If the above option is not activated, the two-word combination “warm I”, whose words are unrelated, would be counted.

Hint: MAXQDA defines sentences according to the following rules: A new sentence begins after a period, question mark, exclamation mark, or colon. The following exceptions apply:
  • A number that is not four digits appears before a period (e.g. 1. or 2.).
  • A single character appears before a period (to exclude abbreviations).
  • Two identical characters appear directly before a period (e.g. ff. or pp.).
  • Literal speech in quotation marks belonging to the sentence itself.
  • The first letter after the end of a sentence is lowercase.
  • A number appears directly following the end of a sentence.
  • Quotation marks appear immediately after the end of a sentence.
  • After a paragraph, a new sentence begins, without exception.
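The effect of restricting word combinations to sentences can be sketched in Python using the example “It is warm. I am going home.” Note the simplifying assumption: this sketch ends a sentence at every period, question mark, or exclamation mark and does not implement the exceptions listed above, so it is not a reproduction of MAXQDA's sentence detection. The function names are hypothetical.

```python
import re
from collections import Counter

def pairs(words):
    """Count two-word combinations in a list of words."""
    return Counter(" ".join(words[i:i + 2]) for i in range(len(words) - 1))

def pairs_across_text(text):
    """Treat the whole text as one sequence of words."""
    return pairs(re.findall(r"\w+", text))

def pairs_within_sentences(text):
    """Count pairs separately in each sentence (simplified sentence split)."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        counts += pairs(re.findall(r"\w+", sentence))
    return counts

text = "It is warm. I am going home."

print("warm I" in pairs_across_text(text))       # True
print("warm I" in pairs_within_sentences(text))  # False
```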

Only word combinations within parts of sentences. Separators …: Often it does not make sense to count combinations of words that are separated by, for example, a comma. It is therefore advisable to activate this option so that sentences are broken down into parts of sentences. Separators are defined by clicking the button with 3 dots (...). By default, the following characters are defined as separators:

; , – ( ) … [ ]

Example: “I’m tired, so I’m going home.” If the above option is selected, the sentence will be separated into two parts by the comma, so the 2-word combination “tired so” would not be counted.
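As a rough illustration of how separators break a sentence into parts, the following Python sketch splits the example sentence at the default separator characters and counts two-word combinations only within each part. It is a sketch under assumptions, not MAXDictio's implementation: the function name `pairs_within_parts` is hypothetical, and trailing sentence punctuation is simply stripped from each word.

```python
import re
from collections import Counter

# Default separator characters as listed above.
SEPARATORS = ";,–()…[]"

def pairs_within_parts(sentence, separators=SEPARATORS):
    """Count two-word combinations without crossing a separator character."""
    pattern = "[" + re.escape(separators) + "]"
    counts = Counter()
    for part in re.split(pattern, sentence):
        # Split on whitespace and drop sentence-final punctuation.
        words = [word.strip(".!?") for word in part.split()]
        words = [word for word in words if word]
        for i in range(len(words) - 1):
            counts[words[i] + " " + words[i + 1]] += 1
    return counts

counts = pairs_within_parts("I’m tired, so I’m going home.")
print("tired so" in counts)  # False: the comma separates the two words
print(sorted(counts))
```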

Lemmatize words: If this option is activated, each word is reduced to its base form using a lemma lexicon for the selected language. For example, the words “give”, “gave”, and “given” would all be counted as “give”.
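The following Python sketch shows in principle how lemmatization changes the counts of word combinations. The tiny dictionary `LEMMA_LEXICON` is a hypothetical stand-in for a real lemma lexicon, and the function names and sample text are assumptions made for this example.

```python
from collections import Counter

# Hypothetical miniature lemma lexicon: inflected form -> base form.
LEMMA_LEXICON = {"gave": "give", "given": "give", "gives": "give"}

def lemmatize(words, lexicon=LEMMA_LEXICON):
    """Replace each word by its base form if the lexicon knows it."""
    return [lexicon.get(word, word) for word in words]

def count_pairs(words):
    """Count two-word combinations in a list of words."""
    return Counter(" ".join(words[i:i + 2]) for i in range(len(words) - 1))

words = "they gave up and we give up".split()

without_lemmas = count_pairs(words)
with_lemmas = count_pairs(lemmatize(words))

print(without_lemmas["gave up"], without_lemmas["give up"])  # separate entries
print(with_lemmas["give up"])                                # merged into one entry
```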

When you click OK, the analysis of word combinations will begin. Depending on the length of the text, this process may take a few moments. A display will inform you about the progress of the analysis.

Results table

Results table for “Word combinations”

The table above displays the number of documents included in the analysis and the number of word combinations found. The table is automatically sorted by frequency each time it is reopened, with the most frequent word combination appearing in the first row.

The columns and information displayed (including toolbars) are the same as those described in “Word Frequencies: Table of Results”. The only difference is that word combinations, rather than individual words, are displayed.

Lemmatization and stop word lists

How do lemmatization and the application of a stop word list affect the results?

Stop word lists

  • If a word within a word combination, or a word combination itself appears on the selected stop word list, the word combination will be ignored.
  • If part of a word combination appears on the stop word list, the entire word combination will be ignored.
  • If lemmatization is activated, all words are lemmatized and the combinations of the lemmatized words are output.

Interaction of lemmatization and stop word lists

  • If a lemmatized word appears on the stop word list, the combination will also be ignored.
  • If a lemmatized combination appears on the stop word list, the combination will be ignored.
  • If a lemmatized part of a combination appears on the stop word list, the entire word combination will be ignored.
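The rules above can be summarized in a short Python sketch that decides whether a word combination is ignored. It is an illustration under assumptions rather than MAXDictio's actual logic: the function `is_ignored`, the stop word lists, and the small lemma lexicon are all hypothetical.

```python
def is_ignored(combination, stop_list, lemma_lexicon=None):
    """Return True if the combination, or any word in it, is on the stop list.

    If a lemma lexicon is given, the words are lemmatized first, so the check
    applies to the lemmatized words and the lemmatized combination.
    """
    words = combination.split()
    if lemma_lexicon is not None:
        words = [lemma_lexicon.get(word, word) for word in words]
    return " ".join(words) in stop_list or any(word in stop_list for word in words)

lexicon = {"went": "go"}  # hypothetical lemma lexicon

print(is_ignored("the house", {"the"}))               # True: one word is on the list
print(is_ignored("went home", {"go"}))                # False: "went" itself is not listed
print(is_ignored("went home", {"go"}, lexicon))       # True: lemma "go" is listed
print(is_ignored("went home", {"go home"}, lexicon))  # True: lemmatized combination is listed
```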
