Word Combinations

What is the “Word Combinations” Function?

The Word Combinations function lists the combinations of up to 5 consecutive words that occur in the selected texts, from the most frequent to the rarest. It works like the Word Frequencies function (see Analyze Word Frequencies), except that it counts combinations of up to 5 words instead of individual words.

Calling Up the Function and Setting Options

Begin the search for word combinations via the menu option MAXDictio > Word combinations. The following dialog window will appear:

Set options for defining word combinations
Number of words

At the top of the dialog window, define how many words a word combination should contain. You can search for combinations of up to 5 words. For example, the setting “Search for word combinations with 3 to 3 words” searches for all combinations of exactly 3 words, while “Search for word combinations with 2 to 4 words” lists all 2-, 3-, and 4-word combinations.
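The range setting can be pictured as a window of n words sliding over the word sequence, once for each n in the range. The following Python sketch only illustrates this idea and is not MAXDictio's implementation; the helper name `count_combinations` and the sample sentence are hypothetical.

```python
from collections import Counter


def count_combinations(tokens: list[str], min_n: int, max_n: int) -> Counter:
    """Count every run of min_n to max_n consecutive words (hypothetical helper)."""
    # Mirror the documented upper limit of 5 words per combination.
    if not 1 <= min_n <= max_n <= 5:
        raise ValueError("expected 1 <= min_n <= max_n <= 5")
    counts: Counter = Counter()
    for n in range(min_n, max_n + 1):
        # Slide a window of n words across the sequence.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


tokens = "the cat sat on the mat".split()
exactly_three = count_combinations(tokens, 3, 3)  # like "3 to 3 words"
two_to_four = count_combinations(tokens, 2, 4)    # like "2 to 4 words"
```

A text of k words yields k - n + 1 combinations of length n, so the six-word sample gives four 3-word combinations.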

Selection of texts to be analyzed

Only for activated documents: This option restricts the analysis to currently activated documents.

Only in retrieved segments: This option restricts the analysis to text segments that are currently displayed in the “Retrieved segments” window.

If neither option is selected, all text and table documents in the MAXQDA project will be analyzed.

Hint: Please be aware that hyphenation cannot be recognized in PDF documents.

Differentiation of results

None: The results table will not contain any differentiation of results, but only the totals across all analyzed texts.

By documents, document groups, document sets: The results table will contain additional columns that can be used to compare the frequencies of word combinations of the individual documents, document groups or document sets. With the option Only activated documents, only the activated documents are taken into account within the document groups or document sets, and only the groups and sets with documents that are activated are included.
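Conceptually, a differentiated results table holds one frequency count per document, a group column that adds up its member documents, and a total column. The Python sketch below shows this bookkeeping for two-word combinations; the document names, groups, and texts are hypothetical, and the sketch makes no claim about how MAXDictio stores its results.

```python
from collections import Counter


def pairs(tokens: list[str]) -> Counter:
    """Count the two-word combinations in one document."""
    return Counter(" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1))


# Hypothetical documents and document groups, for illustration only.
documents = {
    "Interview A": "go home now".split(),
    "Interview B": "go home go home".split(),
    "Interview C": "stay home now".split(),
}
groups = {"Group 1": ["Interview A", "Interview B"], "Group 2": ["Interview C"]}

# One column per document ...
per_document = {name: pairs(tokens) for name, tokens in documents.items()}
# ... one column per document group (the sum of its documents) ...
per_group = {
    group: sum((per_document[doc] for doc in members), Counter())
    for group, members in groups.items()
}
# ... and the overall total across all analyzed documents.
total = sum(per_document.values(), Counter())
```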

By Codes: This option is only available if the analysis is restricted to the segments in the “Retrieved segments” window and a “Simple Coding Query” has been performed. The results table then contains additional columns with the frequencies for each code that appears in the “Code System”. This option is particularly useful if you have divided texts into segments using codes for analysis in MAXDictio, as it allows you to compare the frequencies of word combinations across different codes.

Further options

Characters to be cut off: By clicking the button with three dots (…), you will open a dialog box in which you can enter characters which are to be cut off from the words and ignored during the count. The setting of the characters to be cut off applies to all functions in MAXDictio.

How does MAXDictio define a “word”? A word is any sequence of characters between two delimiting characters. Delimiting characters may include, for example, spaces or punctuation marks. For example, the word “marks” at the end of the previous sentence is delimited by a space to the left and a period to the right.

The characters to be used as delimiters must be entered in the “Characters to be cut off” dialog box. These normally include punctuation marks, quotation marks, etc.:

@ ! § $ % & / ( ) = ? ^ ° ‘ ´ ` ” „ “ ” “ { } [ ] # + * _ . : , ; < > ~ —

Hyphens require particular attention. If hyphenated compound words should be counted as one word rather than split into their parts, the hyphen must not be declared as a delimiter. It is best to experiment with different settings: since the count can be repeated quickly, it is advisable to look through the results for anomalies, adjust the options if needed, and then repeat the analysis.
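The word definition above amounts to splitting the text at whitespace and at every character in the cut-off list. This Python sketch is a simplified illustration, not MAXDictio's tokenizer; the `tokenize` helper is hypothetical, and `CUT_OFF` copies the default characters listed above. It shows how adding the hyphen to the list splits “well-known” into two words.

```python
# Default characters to be cut off, copied from the list above
# (the final character is the em dash, written as an escape).
CUT_OFF = "@!§$%&/()=?^°‘´`”„“{}[]#+*_.:,;<>~\u2014"


def tokenize(text: str, cut_off: str) -> list[str]:
    """Split text into words: whitespace and every cut-off character act as delimiters."""
    words, current = [], []
    for ch in text:
        if ch.isspace() or ch in cut_off:
            # A delimiter ends the current word (if any) and is itself discarded.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


text = "A well-known fact, isn't it?"
hyphen_kept = tokenize(text, CUT_OFF)         # "well-known" stays one word
hyphen_split = tokenize(text, CUT_OFF + "-")  # "well" and "known" are separate words
```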

Min. number of characters: Words containing fewer characters than specified here will not be included in the analysis. The default setting is 1 character. If this value is increased to 3 characters, two-letter words such as “he” and “it” will be excluded from the analysis and treated like words on the stop list.
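Because short words are treated like stop-list words, and a word combination containing a stop-list word is ignored (see the Stop Lists rules below), one plausible reading is that any combination containing a too-short word is dropped. The Python sketch below assumes exactly that; the helper `combinations_min_length` is hypothetical.

```python
def combinations_min_length(tokens: list[str], n: int, min_chars: int) -> list[str]:
    """List n-word combinations, skipping any that contain a word shorter than min_chars.

    Assumption: short words behave like stop-list entries, so the whole
    combination is ignored rather than the word being silently skipped.
    """
    result = []
    for i in range(len(tokens) - n + 1):
        combo = tokens[i:i + n]
        if all(len(word) >= min_chars for word in combo):
            result.append(" ".join(combo))
    return result


tokens = "he said it was late".split()
default_setting = combinations_min_length(tokens, 2, 1)  # nothing excluded
three_chars = combinations_min_length(tokens, 2, 3)      # "he" and "it" excluded
```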

Apply stop list: Check this box to apply a stop list. Clicking the button with three dots (…) opens the stop list window, where stop lists can be selected and edited.

Case sensitivity: When this setting is activated, the word combination “Go home” will be treated as a different word combination than “go home”. When the setting is not activated, all words will be displayed in lowercase in the results table.
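The effect of the case sensitivity setting can be imitated by lowercasing all words before counting. This Python sketch is illustrative only; `count_pairs` and its `case_sensitive` parameter are hypothetical names.

```python
from collections import Counter


def count_pairs(tokens: list[str], case_sensitive: bool) -> Counter:
    """Count two-word combinations, optionally folding everything to lowercase."""
    if not case_sensitive:
        tokens = [t.lower() for t in tokens]
    return Counter(" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1))


tokens = ["Go", "home", "and", "go", "home"]
sensitive = count_pairs(tokens, case_sensitive=True)     # "Go home" and "go home" differ
insensitive = count_pairs(tokens, case_sensitive=False)  # both counted as "go home"
```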

Only word combinations within sentences: By default, MAXDictio writes all words of a text into one continuous sequence and forms word combinations from up to 5 consecutive words, ignoring any paragraph breaks, periods, exclamation marks, and so on between them. It is therefore usually advisable to turn this option on, so that word combinations spanning the end of a sentence are ignored.

Example: “It is warm. I am going home.” If the above option is not activated, the two-word combination “warm I” would be counted, even though the two words are unrelated.
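The example can be reproduced with a short Python sketch that forms two-word combinations once across the whole text and once per sentence. For brevity it splits sentences naively at . ! and ?, which is a simplification of the sentence rules described in the hint below; the helpers `words` and `pairs` are hypothetical.

```python
import re


def words(text: str) -> list[str]:
    # Simplified word extraction: anything between whitespace and . , ! ? ; :
    return re.findall(r"[^\s.,!?;:]+", text)


def pairs(tokens: list[str]) -> list[str]:
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


text = "It is warm. I am going home."

# Option off: all words form one continuous series.
across_sentences = pairs(words(text))

# Option on: combinations are formed separately within each sentence
# (naive split at . ! ?, not MAXQDA's full sentence rules).
within_sentences = [p for s in re.split(r"[.!?]", text) for p in pairs(words(s))]
```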

Hint: MAXQDA defines sentences according to the following rules: A new sentence begins after a period, question mark, exclamation mark, or colon, except in the following cases:

# A number that does not have four digits appears before a period (e.g. 1. or 2.).

# A single character appears before a period (to exclude abbreviations).

# Two identical characters appear directly before a period (e.g. ff. or pp.).

# Direct speech in quotation marks is treated as part of the surrounding sentence.

# The first letter after the punctuation mark is lowercase.

# A number directly follows the punctuation mark.

# Quotation marks appear immediately after the punctuation mark.

In text or table documents, a new sentence always begins after a paragraph break.
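The rules in the hint can be approximated by a small rule-based splitter. The Python sketch below implements only a subset: paragraph breaks, the period exceptions, and the lowercase and number exceptions. The quotation-mark rules are not modelled. Reading “ff.”/“pp.” as a word of exactly two identical characters is an assumption, as are the helper names `ends_sentence` and `split_sentences`.

```python
import re

END_MARKS = ".?!:"


def ends_sentence(paragraph: str, i: int) -> bool:
    """Decide whether the mark at paragraph[i] closes a sentence (simplified rules)."""
    before = re.split(r"\s", paragraph[:i])[-1]  # word directly before the mark
    after = paragraph[i + 1:].lstrip()           # text following the mark
    if after and after[0] in END_MARKS:
        return False  # e.g. "?!": only the last mark of a run can end the sentence
    if paragraph[i] == ".":
        if before.isdigit() and len(before) != 4:
            return False  # "1." or "2."; a four-digit year such as "1999." may end a sentence
        if len(before) == 1:
            return False  # single character, e.g. an abbreviation or initial
        if len(before) == 2 and before[0] == before[1]:
            return False  # assumption: "ff." / "pp." read as two identical letters
    if after and (after[0].islower() or after[0].isdigit()):
        return False  # the following text starts with a lowercase letter or a number
    return True


def split_sentences(text: str) -> list[str]:
    sentences = []
    for paragraph in text.split("\n"):  # a paragraph break always ends a sentence
        start = 0
        for i, ch in enumerate(paragraph):
            if ch in END_MARKS and ends_sentence(paragraph, i):
                sentences.append(paragraph[start:i + 1].strip())
                start = i + 1
        sentences.append(paragraph[start:].strip())  # text after the last boundary
    return [s for s in sentences if s]
```

For instance, `split_sentences("Step 1. Mix well.")` keeps the text as one sentence, because the period follows a one-digit number.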

Only word combinations within parts of sentences. Separators …: It often makes little sense to count combinations of words that are separated by, for example, a comma. It is therefore advisable to activate this option, which breaks sentences down into parts of sentences. Separators are defined by clicking the button with three dots (…). By default, the following characters are defined as separators:

; , – ( ) … [ ]

Example: “I’m tired, so I’m going home.” If the above option is selected, the sentence will be separated into two parts by the comma, so the 2-word combination “tired so” would not be counted.
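A Python sketch of the example: the sentence is first cut at the default separators listed above, and two-word combinations are then formed within each part only. The word-extraction pattern and the helper names `words`, `pairs`, and `sentence_parts` are simplifying assumptions, not MAXDictio internals.

```python
import re

SEPARATORS = ";,–()…[]"  # default separators listed above


def words(text: str) -> list[str]:
    # Simplified word extraction: drop whitespace, sentence marks, and separators.
    return re.findall(r"[^\s.!?;,–()…\[\]]+", text)


def pairs(tokens: list[str]) -> list[str]:
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


def sentence_parts(sentence: str, separators: str = SEPARATORS) -> list[str]:
    """Cut a sentence into parts at any separator character."""
    pattern = "[" + re.escape(separators) + "]"
    return [part for part in re.split(pattern, sentence) if part.strip()]


sentence = "I’m tired, so I’m going home."
without_parts = pairs(words(sentence))
with_parts = [p for part in sentence_parts(sentence) for p in pairs(words(part))]
```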

Lemmatize words: If this option is activated, each word is reduced to its base form using a lemma lexicon for the selected language. For example, the words “give”, “gave”, and “given” would all be counted as “give”.
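Lemmatization can be thought of as a dictionary lookup applied to every word before the combinations are formed. In the Python sketch below, `LEMMAS` is a hypothetical, tiny stand-in for a real lemma lexicon, and words not found in it are kept unchanged (an assumption).

```python
from collections import Counter

# Hypothetical, tiny stand-in for a lemma lexicon.
LEMMAS = {"gave": "give", "given": "give", "gives": "give", "went": "go", "gone": "go"}


def lemmatize(tokens: list[str]) -> list[str]:
    """Replace each word by its base form; unknown words are kept unchanged."""
    return [LEMMAS.get(t, t) for t in tokens]


def count_pairs(tokens: list[str]) -> Counter:
    return Counter(" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1))


text = "they gave up and we have given up"
counts = count_pairs(lemmatize(text.split()))  # "gave up" and "given up" both become "give up"
```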

When you click OK, the analysis of word combinations will begin. Depending on the length of the text, this process may take a few moments. A display will inform you about the progress of the analysis.

Results Table

Results table for “Word combinations”

The table above displays the number of documents included in the analysis and the number of word combinations found. The table is automatically sorted by frequency each time it is reopened, with the most frequent word combination appearing in the first row.
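The header figures and the default ordering can be sketched in Python with `collections.Counter`, whose `most_common()` method returns entries from the most to the least frequent. The documents are hypothetical, and treating “number of word combinations found” as the number of distinct combinations is an assumption.

```python
from collections import Counter

# Hypothetical, already tokenized documents.
documents = {
    "Doc 1": "we go home and we go out".split(),
    "Doc 2": "we go home".split(),
}

counts: Counter = Counter()
for tokens in documents.values():
    counts.update(" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1))

n_documents = len(documents)
n_combinations = len(counts)  # assumption: counts distinct combinations
rows = counts.most_common()   # most frequent combination in the first row
```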

The columns and information displayed (including toolbars) are the same as those described in Word Frequencies: Table of Results; the only difference is that word combinations, rather than individual words, are displayed.

Lemmatization and Stop Lists

The following section describes how lemmatization and stop lists work:

Stop Lists

  • If a word within a word combination, or the word combination itself, appears on the stop list, the word combination will be ignored.
  • If part of a word combination appears on the stop list, the entire word combination will be ignored.

Lemmatization

  • If the “Lemmatize words” option is selected, all words will be lemmatized, and the combinations of the lemmatized words will be output.

Interaction of Lemmatization and Stop Lists

  • If a lemmatized word appears on the stop list, the combination will also be ignored.
  • If a lemmatized combination appears on the stop list, the combination will be ignored.
  • If a lemmatized part of a combination appears on the stop list, the entire word combination will be ignored.
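The rules above can be combined into one filter: optionally lemmatize first, then drop a combination if any contiguous part of it (a single word, a shorter combination, or the whole combination) is on the stop list. This Python sketch is one possible reading of the rules, not MAXDictio's code; it assumes the stop list is checked only against the lemmatized forms, and `filter_combinations` and the sample lexicon are hypothetical.

```python
def filter_combinations(tokens, n, stop_list, lemmas=None):
    """Return the n-word combinations that survive the stop list.

    Assumption: when lemmas are given, only the lemmatized forms are
    checked against the stop list.
    """
    if lemmas is not None:
        tokens = [lemmas.get(t, t) for t in tokens]
    kept = []
    for i in range(len(tokens) - n + 1):
        combo = tokens[i:i + n]
        # Every contiguous part: single words, shorter combinations, the whole combination.
        parts = {" ".join(combo[a:b]) for a in range(n) for b in range(a + 1, n + 1)}
        if parts & stop_list:
            continue  # some part is on the stop list: ignore the entire combination
        kept.append(" ".join(combo))
    return kept


tokens = "she gave up and went home".split()
lemmas = {"gave": "give", "went": "go"}  # hypothetical lemma lexicon
plain = filter_combinations(tokens, 2, {"and"})
lemmatized = filter_combinations(tokens, 2, {"give up"}, lemmas)
three_words = filter_combinations(tokens, 3, {"give up"}, lemmas)
```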