Stop word lists: improving visualization of text data

The term “stop word” refers to the most common words in a language. These are typically words that add little meaning to a sentence, either because they occur very frequently across contexts or because they belong to the category of function words. When handling natural language data, it is often useful to filter out such words to clarify your analysis. As you can imagine, there is no such thing as a “universal list of stop words”. Indeed, it is generally understood that researchers must create their own stop word list or adjust existing ones to the scope of their project. Omitting irrelevant words in this way can improve the accuracy of the overall analysis and simplify the data when evaluating the content of lengthy text passages.
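To make the idea concrete outside of MAXQDA, here is a minimal Python sketch of stop word filtering applied before counting word frequencies. The stop word set, the sample sentence, and the function name `word_frequencies` are illustrative assumptions and do not reflect MAXQDA's predefined lists or its internal implementation.

```python
import re
from collections import Counter

# Tiny illustrative stop word set (an assumption, not MAXQDA's predefined list).
STOP_WORDS = {"the", "is", "and", "of", "a", "to", "in", "it", "that"}

def word_frequencies(text, stop_words=frozenset()):
    """Count lowercase word tokens, skipping any that appear in stop_words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

text = "The climate is changing and the impact of climate change is global."

raw = word_frequencies(text)                   # function words included
filtered = word_frequencies(text, STOP_WORDS)  # function words removed

print(raw.most_common(3))
print(filtered.most_common(3))
```

Without the filter, function words such as “the” and “is” are tied with “climate” as the most frequent tokens; with the filter, only the content words remain in the counts.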

How can stop word lists enhance my word cloud visualizations?

The main purpose of word clouds is to transform text data into a graphic reflection of their content. Accordingly, semantically rich words, i.e. words carrying meaning, are fundamental components for this visualization process. They help us to gain a quick overview of the subject, presenting the most relevant words at a glance. To accommodate a wide range of projects and input data, MAXQDA offers stop word lists consisting of multiple subcategories which can be combined and adapted in numerous ways.

What is contained in MAXQDA’s predefined stop word lists?

Our stop word lists are divided into the following part-of-speech subcategories:

Part of speech subcategories included on the MAXQDA stop words list.

Additionally, we’ve created lists of single letters and the 100 most common words of the given language to make it easier for the researcher to identify the most significant terms in a document.

Proper use of stop word lists: five steps to improve the visualization of your text data  

The following steps should help you use stop word lists effectively and in the situations where they are beneficial:

  1. Identify your research goal
  2. Examine the word cloud
  3. Select a stop word list
  4. Customize the list
  5. Apply and review results
  1. The first step is to identify the goal of your text analysis. Do you want to observe word frequencies in general? In that case, you may prefer not to use any stop words at all, so that your observations stay true to the actual text data.

However, if you want to gain an impression of the content-related substance of the material, then you should probably make use of stop words.

  2. Before applying any kind of stop word list, you should open your text data with the word cloud feature and go through the words being displayed. Can you identify a group of words which you would like to exclude from your evaluation? Those words could make up a class of stop words that you can apply in the next step.

A word cloud from MAXQDA 2020, before applying a stop word list. Climate Change – no stop words applied

  3. Depending on your field of research, you can now choose and combine our predefined stop word lists. In a linguistic context, you may want to take a closer look at the auxiliaries and conjunctions used in your data. In the social sciences, it may be more important to concentrate on meaningful words such as adjectives and nouns and to filter out function words.
  4. Once you have selected the lists you want to apply to your text, you should skim through them and check whether you have chosen the correct lists for your objective. Is there a word missing from the list? You can easily add it using the Stoplist Manager. Are there words listed that should be included in your analysis? You can look these up in the Stoplist Manager and delete them from the stop word list. They should then reappear in the word cloud.
  5. Finally, after having adapted the list to your needs, you can apply the list to your data and review the outcome. If you detect unwanted terms, you can add them to your stop words by right-clicking the word and selecting “add to stop words”.

A word cloud from MAXQDA 2020, after applying a stop word list. Climate Change – stop words applied
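For readers who want to reproduce steps 3 and 4 in a script, the following Python sketch shows one possible way to combine part-of-speech sublists and then customize the result. The sublist names and contents, as well as the helper `build_stop_list`, are hypothetical and chosen purely for illustration; they are not taken from MAXQDA or its Stoplist Manager.

```python
# Hypothetical sublists keyed by part of speech; contents are illustrative only.
SUBLISTS = {
    "articles": {"a", "an", "the"},
    "conjunctions": {"and", "but", "or"},
    "auxiliaries": {"is", "are", "was", "have", "will"},
}

def build_stop_list(selected, add=(), keep=()):
    """Union the selected sublists, then add extra words and drop words to keep."""
    stop_list = set()
    for name in selected:
        stop_list |= SUBLISTS[name]
    stop_list |= {w.lower() for w in add}   # words missing from the lists
    stop_list -= {w.lower() for w in keep}  # words that should stay in the analysis
    return stop_list

# Combine two sublists, add one unwanted term, and keep one listed word.
stop_list = build_stop_list(
    ["articles", "auxiliaries"],
    add=["said"],
    keep=["will"],
)
print(sorted(stop_list))
```

Modeling the lists as sets keeps combining, adding, and removing words simple, and a word removed via `keep` would reappear in any frequency count or word cloud built with the resulting list.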

