MAXQDA 2022 Online Manual: Topic Modeling

Topic Modeling is a method from the world of Natural Language Processing (NLP). With the help of unsupervised machine learning, text documents are statistically analyzed for word patterns in order to compile words into groups (the so-called "topics").

Please note: Topic Modeling is only available in TeamCloud projects.

Topic Modeling in MAXQDA

Topic Modeling in MAXQDA is primarily used for the exploration of data. Topic Modeling helps you to identify topics in your documents or survey responses and to include the identified topics in your analysis:

The identified topics can be saved as dictionary categories containing the corresponding words. You can then use the dictionary for autocoding and dictionary-based content analysis.
The dominant topic in each analyzed document can be recorded in a document variable.
The documents can be assigned to document sets according to the dominant topics.

Prerequisite for performing Topic Modeling in MAXQDA

To obtain meaningful results with Topic Modeling, you need to analyze more than just a few documents, and the documents should not all share very similar most frequent words. With fewer than 30 documents, meaningful results can hardly be expected; with survey data of 100 cases or more, the chances of getting meaningful results are greater.

Please note: The analyzed texts must stem from different documents. It is therefore (currently) not possible to analyze tweets or YouTube comments if several tweets or comments are located in one table document.

How to start Topic Modeling

Activate all documents that you want to include in the analysis.
Start the MAXDictio > Topic Modeling function.
Choose the desired settings in the dialog:
1. Number of topics
2. Restriction to certain documents or segments
3. Ignoring certain content and stop words
4. Lemmatization (reduction of words to their base form)
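
To get a feel for what the stop word and lemmatization settings do to a text before the topics are calculated, consider the following minimal Python sketch. It is not MAXQDA's implementation: the stop word list, the tiny lemma lookup table, and the function name preprocess are illustrative assumptions, and the tokenizer only handles unaccented Latin letters.

```python
import re

# Illustrative assumptions: MAXQDA's actual stop word lists and
# lemmatizer are not reproduced here.
STOP_WORDS = {"the", "a", "and", "of", "were", "was", "is", "are"}
LEMMAS = {"studies": "study", "teachers": "teacher", "interviewed": "interview"}


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and map words to a base form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Words without an entry in the lookup table are kept unchanged.
    return [LEMMAS.get(t, t) for t in kept]


print(preprocess("The teachers were interviewed and the studies compared."))
# prints ['teacher', 'interview', 'study', 'compared']
```

With lemmatization, "teachers" and "teacher" count as the same word, which usually makes the word patterns in small data sets easier to detect.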

After clicking OK the calculation starts. Topic Modeling is a very computationally intensive process and can therefore take several minutes, even with smaller amounts of data.

Tips: (1) You can continue working with other MAXQDA functions during the calculation. The calculation will continue if you leave the window open. (2) If you closed the window after the calculation, you can open it again via MAXDictio > Topic Modeling > Last Result Topic Modeling.

The results window

The results window presents the identified words for each topic.

The more important a word is for a topic, the larger and more intensely colored it is displayed in the word cloud and the higher up it appears in the list view.

The following options are available in the Results window:

Using the icons above the topics, you can switch the view from list to word cloud.
To name or rename the individual topics, click on the current name.
To ignore a topic for further analysis, click the crossed-out eye icon. Disabled topics are ignored in the following functions: “Topic Document Matrix” and “Save Topics as Dictionaries/Variables/Document Set”.
At the top left of the ribbon, you can change the number of topics and the number of words displayed per topic. If you change the number of topics, the calculation will be restarted, and the set topic labels will be reset.
On the top right you can save the current view in QTT, copy it to the clipboard, or export it.

Topic Document Matrix

Via Start > Topic Document Matrix in the result window you can call up a visualization that shows which topics dominate in which document.

The calculated probabilities that a topic occurs in a document serve as the basis for the visualization. The probability value between 0 and 1 is multiplied by 100 for the display.
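
The scaling of the values can be illustrated with a short Python sketch. The document names, the probabilities, and the rounding to one decimal place are assumptions made for the example; the manual only states that the probability is multiplied by 100.

```python
def to_display_matrix(doc_topic_probs, decimals=1):
    """Scale topic probabilities (0 to 1) to display values (0 to 100)."""
    return {
        doc: [round(p * 100, decimals) for p in probs]
        for doc, probs in doc_topic_probs.items()
    }


# Hypothetical probabilities for two documents and three topics.
probs = {
    "Interview 1": [0.72, 0.20, 0.08],
    "Interview 2": [0.05, 0.15, 0.80],
}
for doc, row in to_display_matrix(probs).items():
    print(doc, row)
```

In this example, the first topic dominates "Interview 1" with a value of 72, while the third topic dominates "Interview 2" with a value of 80.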

Using the icons above the display, you can adjust the view. For example, you can switch to a heatmap view as shown in the image above. The functions in the toolbar are described in detail in the Code Matrix Browser section of the manual.

Saving results

The assignments of words to topics and the probabilities of topics per document can be saved as follows:

Save Topics as Dictionaries – A new dictionary is created in MAXDictio. The topic names are used as category names and the top words per topic are entered as search items. You should check the dictionary for duplicate words, because it is possible that the same word is significant for different topics (usually with different weighting).
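
If you export the dictionary and want to check it for duplicates outside of MAXQDA, a check along the following lines may help. This is a minimal Python sketch; the function name find_duplicate_words and the example categories are hypothetical, and the dictionary is assumed to be available as a mapping from category names to word lists.

```python
from collections import defaultdict


def find_duplicate_words(dictionary):
    """Return the words that are search items in more than one category."""
    categories_by_word = defaultdict(list)
    for category, words in dictionary.items():
        # set() ensures a word repeated within one category counts once.
        for word in set(words):
            categories_by_word[word].append(category)
    return {w: cats for w, cats in categories_by_word.items() if len(cats) > 1}


# Hypothetical dictionary with two topic categories.
topics = {
    "School": ["teacher", "class", "time"],
    "Family": ["parent", "child", "time"],
}
print(find_duplicate_words(topics))
# prints {'time': ['School', 'Family']}
```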

Save Topics as Document Variable – A new document variable is created and for each analyzed document the topic name with the highest probability is entered. If several topics are equally likely, “not defined” is entered.
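
The rule for the variable value can be sketched in Python as follows. The function name dominant_topic, the topic names, and the small tolerance used to decide whether two probabilities count as equal are assumptions for illustration; how MAXQDA compares the probabilities internally is not documented here.

```python
import math


def dominant_topic(topic_probs, names, tol=1e-9):
    """Return the most probable topic name, or "not defined" on a tie."""
    best = max(topic_probs)
    winners = [
        name
        for name, p in zip(names, topic_probs)
        if math.isclose(p, best, abs_tol=tol)
    ]
    return winners[0] if len(winners) == 1 else "not defined"


names = ["School", "Family", "Leisure"]
print(dominant_topic([0.6, 0.3, 0.1], names))  # prints School
print(dominant_topic([0.4, 0.4, 0.2], names))  # prints not defined
```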

Save Topics as Document Sets – One document set is created per topic in the “Document System” window. Each document is assigned to the topic set with the highest membership probability.

Topic Modeling for Survey Responses

If you use the Analysis > Categorize Survey Responses feature to code your survey responses, you can invoke Topic Modeling directly from the analysis window in the Start menu.

Only the currently displayed responses are considered in the analysis.

Concluding remarks

Topic Modeling is a statistical modeling technique that does not consider the meaning of words. Accordingly, results may differ from subjective expectations.
MAXQDA uses Gensim with the Latent Dirichlet Allocation (LDA) algorithm to determine the topics. To reproduce the results in Gensim: MAXQDA uses 50 iterations and sets 1 as the random state to always obtain the same results for the same input.
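
As a starting point for experimenting in Gensim, the following Python sketch fits an LDA model with these two settings. Several points are assumptions: the "50 iterations" are mapped to the iterations parameter of LdaModel, all other parameters are left at Gensim's defaults, and the tokenized example documents are invented. Because MAXQDA's preprocessing (stop words, lemmatization) and remaining parameters are not specified here, the output will not necessarily match the topics shown in MAXQDA.

```python
def fit_lda(tokenized_docs, num_topics):
    """Fit a Gensim LDA model using the two settings named in the manual."""
    # Imported here so that this module can be loaded without Gensim installed.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    dictionary = Dictionary(tokenized_docs)
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    model = LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=num_topics,
        iterations=50,   # assumption: "50 iterations" refers to this parameter
        random_state=1,  # fixed seed for repeatable results
    )
    return model, corpus


if __name__ == "__main__":
    # Invented, already tokenized example documents.
    docs = [
        ["school", "teacher", "class", "homework"],
        ["family", "parent", "child", "home"],
        ["school", "class", "parent", "teacher"],
        ["family", "child", "home", "weekend"],
    ]
    model, corpus = fit_lda(docs, num_topics=2)
    for topic_id in range(2):
        # Top words per topic with their weights.
        print(topic_id, model.show_topic(topic_id, topn=3))
    # Topic probabilities for the first document.
    print(model.get_document_topics(corpus[0], minimum_probability=0.0))
```

The topic probabilities returned by get_document_topics correspond to the kind of values that the Topic Document Matrix is based on.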

Recommended procedure for Topic Modeling in MAXQDA

1. Preparation: Create a stop word list for the data

2. Create the first model

3. Check the model and create alternative models if necessary

4. Name the topics

5. Use and save the topics