Post by Prof. Dr. Udo Kuckartz.
At the beginning of February 2020, in the early days of the coronavirus period, our knowledge about the virus which causes COVID-19 was very limited. As the virus spread, a great need for information arose in all sections of the population in Germany. In this respect, I felt the same way as millions of other people in the world. Since the beginning of March, all kinds of media have been trying to meet this need for information – from the tabloid press, radio, and television, to scientific journals, and preprints on university websites. Since I am considered as belonging to the risk group, I wanted to learn everything that was known about COVID-19 from the very beginning of the pandemic. However, the amount of information grew exponentially in the following weeks and months. That’s how I came up with the idea to use MAXQDA for support.
MAXQDA is not only a powerful tool for data analysis in research but can also provide useful services beyond the analysis of research data. For example, many people use MAXQDA for literature reviews and interaction with reference managers such as Endnote or Citavi.
I managed to make use of MAXQDA to get quick access to the rapidly growing amount of knowledge about COVID-19, its symptoms, the incubation period, disease duration and course, and many other questions. In doing so, I made particular use of the search functions and the functions of automatic coding – I will describe this in a condensed form below using the example of transcribed podcast data. In Germany, the virologist Prof. Drosten plays an important role as an advisor to the federal government and the Chancellor, Angela Merkel, as well as in the public sphere. Prof. Drosten is the chief physician for virology at the Charité, the university hospital of Humboldt University in Berlin. From February 26 to June 23, 2020- initially on a daily, later on a weekly basis- Drosten dealt with all questions concerning COVID-19 in a podcast entitled “The Coronavirus Update”, broadcast by Norddeutscher Rundfunk (NDR).
Preparing the data
NDR has made all podcasts freely available in its media library. These can easily be downloaded as MP3 files. In addition, all podcasts can be downloaded in transcribed form as PDF files. I regularly downloaded these files from February to the end of June, which created a collection of 50 audio files and 50 PDF files. The audio files were all imported into MAXQDA into a document group “Drosten podcasts audio”, and the transcripts into a document group “Drosten podcast transcripts (PDF)”.
After performing the import of audio and text files, I prepared the data in an additional step. This step is not absolutely necessary, but it presents me with more options for the subsequent analysis. I converted the PDF-formatted transcripts into DOCX files. This is done in MAXQDA with the function “Insert PDF text as new document” (available in the context menu of the respective document). I then created a document variable called “Podcast date” and entered the respective dates of the podcasts. This way, it is always immediately visible when something was communicated. This also makes it possible to work only with podcasts from a certain period in time.
Systematically searching for information with the Lexical Search function
The Lexical Search function (in the “Analysis” tab) opens up initial possibilities for compiling information from the podcast data. Here you can enter search words and determine what should be considered a hit. The search itself has several useful options such as “Find whole words” and “Case sensitive”. You can also include words from a lemma list. In lemmatization, word forms are reduced to their basic form, the lemma, by looking them up in an electronic dictionary. Thus, for example, “failures” is traced back to “failure”, just as the capitalized “Failure”.
Even the simple search comes up with useful results within seconds. For example: if I search for “incubation period” I find that astonishingly few statements have been made about this subject. The topic is mentioned for the first time in podcast number 5 on March 3. At that time it was assumed that the core of the incubation period is between two and seven days. Later, based on an English study, an average incubation period of 5 to 6 days is assumed, although the period can be up to 14 days. In podcast 16 on March 18th we learn that a group of researchers at Imperial College London assumes an incubation period of 5.1 days when constructing their models. In total, only 9 of the 50 podcasts contain the topic “incubation period”. The summary of collected statements from all podcasts shows that there is still little reliable knowledge about the incubation time and that the information is still rather vague- e.g. no information about the standard deviation of the incubation time is given. This information would of course be of great relevance for both individual and governmental measures.
In MAXQDA, search words can also be combined, so I got the following result from the search query for “day” and “infection”: “Infected people are most contagious the day before symptoms begin, and after four days, or at most after seven days, they are apparently no longer contagious. The virus can then only be detected as genetic material.” (Podcast 34 on March 22nd)
If my search query contains many search words or if I have utilized the function “use regular expressions”, it is useful to save the search query so that I do not have to retype it later.
Autocoding interesting topics
If new podcast transcripts are added to the database, the search for statements on specific topics can be started from scratch. However, you will soon notice that the level of information varies greatly from one retrieved text segment to another. Some segments are mere repetitions, while others are very detailed. Therefore, it would not be very effective to have to read all the references to a certain topic over and over. An excellent way to avoid this is to automatically code the passages and then run through them, marking the less interesting passages as not to be coded. This is a great advantage of MAXQDA’s automatic coding function: : it does not simply blindly code all found text passages, but allows the user to select the truly important ones. Once this has been done, these text passages are marked once and for all, and in a later search, e.g. for the topic “incubation period”, only those coded passages are listed which contain important and new information.
A very controversial question in Germany was whether and to what extent children are infectious. Searching for the occurrence of the words “child” and “infectious” in the same sentence yields 58 hits in 15 podcasts. The topic really only gained a certain importance from episode 36 (28 March) onwards; before then it had played hardly any role at all. I flip through the results table and mark the less interesting statements. Now I still have to decide what to code. MAXQDA again offers many possibilities here: I can code just the search string, the sentence, or the entire paragraph. With the options “sentence” and “paragraph” I can even specify how many sentences or paragraphs should be coded before and after the hit.
The autocode function creates a code in the code system that always starts with the word “autocode”,and the selected options are recorded as a code memo, so that I always know later on which data was automatically coded in which way.
Autocoding with Dictionary
A great function of MAXQDA is autocoding with a dictionary you have created yourself. Such dictionaries can be created using the MAXDictio module. I will soon describe in another blog post how this form of autocoding works with the coronavirus data.
Working with the podcasts of the “Coronavirus Update” has not only considerably expanded my knowledge about the pandemic, but has also evoked questions on a meta-level, for example, which implicit concept of evidence underlies the work of this most important German virologist in public discourse.