Structured Documents (Preprocessor)

What are structured texts?

Often you may wish to import documents that are already structured and pre-coded. Examples of this kind of document include:

  • Forms: Here you would like to code each section with the heading of the corresponding form field.
  • Questionnaires that respondents have filled in as a structured text file: Here you may wish to code the answers with their respective questions or instructions.
  • Asynchronous online discussions that you have retrieved with online tools and that are already tagged.

The problem is similar in each case: before the actual analysis begins, certain text sections have already been assigned to specific form fields or similar categories, and you would like to save yourself the effort of coding each one manually. To solve this problem, MAXQDA provides the preprocessor, which lets you split a text file into several text documents during import and code labeled text areas with one or more codes.

Import structured data with the preprocessor

The preprocessor lets you enter a large number of documents into a single file and have them separated out into different documents when imported into MAXQDA. The syntax rules are as follows:

#TEXT textname this is the content of the 1st text…

#TEXTtextname this is the content of the 2nd text…

#TEXT textname…

Every document must start with “#TEXT”, and “TEXT” must be written in capital letters.

The name that you want to give to the document should come immediately after the “#TEXT” without a space in between. If you do not enter a name, MAXQDA will automatically assign one when the document is imported. The first imported document will be called “Document nn,” and the following documents will be named in sequential order in the “Document System.” This automatic numbering is useful when, for example, you enter the answers to open questions in a partly standardized survey. The answers must then simply be entered in the order of the standardized data in the SPSS file. It is not necessary to enter a name for each text. Both texts will have the same name.

Document names are handled by MAXQDA as follows. You can enter any kind of string (up to 63 characters) as a document name – spaces are also allowed. If you enter a document name with more than 63 characters, MAXQDA will truncate it automatically. Once the document has been imported into MAXQDA, you can change the name to include up to 64 characters.
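
If the texts already exist in electronic form, the file for the preprocessor can also be generated by a script instead of being typed by hand. The following Python lines are only a minimal sketch of that idea: the function name build_structured_text and the constant MAX_NAME_LENGTH are made up for this illustration and are not part of MAXQDA. The sketch follows the rule that the name comes directly after “#TEXT” and cuts names at the 63 characters mentioned above. The resulting text would still have to be saved in a format that MAXQDA accepts for this import.

```python
MAX_NAME_LENGTH = 63  # limit for imported document names as stated in the text above


def build_structured_text(documents):
    """Join (name, body) pairs into one string with a #TEXT line per document.

    Hypothetical helper for illustration; it only produces the marker
    syntax described in this section and does not talk to MAXQDA.
    """
    parts = []
    for name, body in documents:
        # Cut the name ourselves so the result is predictable before import.
        safe_name = name.strip()[:MAX_NAME_LENGTH]
        parts.append(f"#TEXT{safe_name}\n\n{body.strip()}\n")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = [
        ("4(26,f,0k,sin)", "I've gained too much weight over the last several years."),
        ("3(34,f,2k,mar)", "Overall I am pretty happy with my health."),
    ]
    print(build_structured_text(sample))
```

Truncating in the script rather than leaving it to the import makes it easier to spot two long names that would end up identical after the cut.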

Example

In the example project, various interviewees were asked about their level of satisfaction with various aspects of their life. Their answers were transcribed and imported as Word documents.

The resulting Word file appears as follows:

#TEXT 4(26,f,0k,sin)

I’ve gained too much weight over the last several years and I don’t seem to be doing anything to get rid of it. I have high cholesterol levels, but I don’t attempt to change my eating habits. I’d like to jolt myself into becoming more physically active, so I can lose the weight and feel more energetic. I keep saying I’m going to do something about it, soon.

#TEXT 3(34,f,2k,mar)

Overall I am pretty happy with my mental, social and physical health. I would like to improve my dedication to working out. I am the type of person who will work out 5 times a week for a month straight and then it slowly turns into less days a week until it is none. I get distracted by school work, my job or just being tired.

Each #TEXT entry is followed by the name to be given to the document. For the purposes of this project, it made sense to assign a number to each of the interviewees in addition to some basic information about that person, including age, gender, number of kids, and marital status.

The Word file was then saved with the name “HealthSatisfaction” in RTF or DOC/X format. It could then be imported by going to the Import tab and selecting Structured Text.

Tip: MAXQDA automatically creates a new document group in the “Document System” during import and inserts all texts into this document group.

All texts will appear in the “Document System” after being successfully imported. Each document carries the name that was specified after its #TEXT identifier. In the above example, the document name consists only of a text number and the personal data indicated in parentheses.

Documents after being imported with the Preprocessor

Pre-coding text segments during import

During import, MAXQDA splits the original file into its individual documents and lists each one under its given name in the “Document System.”

In this way, it is possible to import and separate out many different documents from a single file very quickly. The Preprocessor can do significantly more than this, however. In many cases, you will already know how parts of the document should be coded, and the Preprocessor can apply those codes automatically during the import. To do so, you simply need to use additional keywords.

For each section of the document that is to be coded, type “#CODE” followed by the code name before the section and “#ENDCODE” after it. In the case of a standardized survey or questionnaire, you could, for example, code each answer with the number of the question it belongs to. The answer to the first question could then be coded with the code “question1”, as shown below.

#TEXTinterviewee1

#CODEquestion1

answer to question 1

#ENDCODE

#CODEquestion2

answer to question 2

#ENDCODE

 

#TEXTinterviewee2

#CODEquestion1

answer to question 1

#ENDCODE

#CODEquestion2

answer to question 2

#ENDCODE

Important: “CODE” must be written in capital letters.
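
When the answers are already available as data, the #CODE and #ENDCODE lines can be written by a short script. The Python function below is a sketch under assumptions: precode_answers is a hypothetical name, and the answers are assumed to be held in a dictionary that maps a code name such as “question1” to the answer text. It reproduces the layout of the example above, with each keyword on its own line.

```python
def precode_answers(respondent, answers):
    """Return one preprocessor document with every answer wrapped in a code.

    Hypothetical helper: `answers` maps a code name to the answer text,
    in the order the answers should appear in the document.
    """
    lines = [f"#TEXT{respondent}", ""]
    for code_name, answer in answers.items():
        # Keywords stand alone on their line, so no closing "#" is needed.
        lines += [f"#CODE{code_name}", "", answer, "", "#ENDCODE", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(precode_answers(
        "interviewee1",
        {"question1": "answer to question 1", "question2": "answer to question 2"},
    ))
```

Because the keywords are generated, the capital letters required for “TEXT” and “CODE” cannot be mistyped.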

To avoid typos when entering “#CODEquestion1,” you can use a placeholder instead, such as “§1” for the first question, “§2” for the second, etc. Later, you can run an automatic search and replace that finds all instances of “§” and replaces them with “#CODEquestion.” This saves time and helps you avoid typos.
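
The search and replace can be done in any word processor, but it can also be scripted. The following Python sketch assumes that the placeholder stands for the complete keyword including the leading “#” and that it is always followed directly by the question number; expand_placeholders is a hypothetical name chosen for this example.

```python
import re


def expand_placeholders(text, prefix="#CODEquestion"):
    """Replace each "§" that is directly followed by a digit with `prefix`.

    Hypothetical helper: a "§" that is not followed by a digit (for example
    in running text such as "see § 5") is left untouched.
    """
    return re.sub(r"§(?=\d)", prefix, text)


if __name__ == "__main__":
    draft = "§1\n\nanswer to question 1\n\n#ENDCODE\n\n§2\n\nanswer to question 2\n\n#ENDCODE\n"
    print(expand_placeholders(draft))
```

Restricting the replacement to a “§” followed by a digit is a design choice of this sketch; it protects legitimate uses of the symbol in the answers themselves.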

It is also possible to define and code with a subcode in a document prepared for the Preprocessor. To do so, use the following syntax:

#CODE Codename\Subcodename

This gives the complete information about the subcode, including its name and the code it belongs to. The code and subcode are separated by a “\” symbol, without spaces. As with any code used in a document imported with the Preprocessor, MAXQDA first checks whether the code already exists and creates it if it does not.

Important: As soon as “#CODE” appears in the text, the new code will be used, automatically ending the coded segment of the previous code.

Text excerpts, as well as full paragraphs, can be pre-coded with the help of the preprocessor during import, as shown by the following example:

#TEXTtextname

Here is a text. Encoding begins here with #CODECode 1\Subcode#. In the next sentence it ends with #ENDCODE# in the middle

#CODECode 2

Here is another text. The encoding ends here #ENDCODE# in the middle of the text.…

The following rules apply when pre-coding text excerpts with these keywords:

  • When the keyword #CODE or #ENDCODE stands alone on a line, it is not necessary to insert # at the end. If the keyword appears within a text, or at the end of a line of text, it must be enclosed in #.
  • It is not possible to nest (“layer”) codings. When a new #CODE command comes before an #ENDCODE command, the previous code is closed automatically.

Tip: To assign several codes to a text part, combine the codes with “&&”, e.g. #CODEFirst code && Second code && Third code
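
The rules for the closing “#”, for subcodes, and for “&&” can be collected in two small helper functions. This is only an illustrative Python sketch: the names code_tag and end_tag are hypothetical, and whether a subcode path may be combined with “&&” in the same tag is not stated in this section, so the example keeps the two features apart.

```python
def code_tag(*codes, inline=False):
    """Build a #CODE keyword from one or more codes.

    Hypothetical helper. Each code is either a string or a
    (code, subcode) tuple, which is written as code\\subcode.
    Several codes are joined with " && " as in the tip above.
    With inline=True the closing "#" needed inside running text is added.
    """
    names = ["\\".join(c) if isinstance(c, tuple) else c for c in codes]
    tag = "#CODE" + " && ".join(names)
    return tag + "#" if inline else tag


def end_tag(inline=False):
    """Build the matching #ENDCODE keyword."""
    return "#ENDCODE#" if inline else "#ENDCODE"


if __name__ == "__main__":
    # Keyword on its own line, several codes at once.
    print(code_tag("First code", "Second code", "Third code"))
    # Keyword inside a sentence, code with subcode.
    start = code_tag(("Code 1", "Subcode"), inline=True)
    print(f"Here is a text. {start} This part is coded {end_tag(inline=True)} and this is not.")
```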

Mark participants of a focus group

If you use the tag #SPEAKER inside a text section, MAXQDA will import the text as a focus group transcript and will code the text after the tag with a speaker code. The tag #ENDSPEAKER closes the speaker coding. The tag may be inserted within a sentence, but then a # has to be added after the name of the speaker.

#TEXTFocus group 1

 

#SPEAKERModerator

A warm welcome to our session …

#ENDSPEAKER

 

#SPEAKERParticipant 1

#CODETheme 1

I have the following opinion

#ENDCODE

#ENDSPEAKER
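
A transcript that already marks each contribution in the form “Speaker: text” can be converted to this syntax with a few lines of Python. The sketch below rests on assumptions that are not part of MAXQDA: the function name tag_speakers is hypothetical, every non-empty line of the transcript is taken to be one contribution, and the first colon on the line is taken to separate the speaker from the text.

```python
def tag_speakers(name, transcript):
    """Convert "Speaker: text" lines into #SPEAKER ... #ENDSPEAKER blocks.

    Hypothetical helper for illustration. Empty lines are skipped; a line
    without a colon raises ValueError so that it is not silently lost.
    """
    lines = [f"#TEXT{name}", ""]
    for raw in transcript.splitlines():
        if not raw.strip():
            continue
        speaker, separator, utterance = raw.partition(":")
        if not separator:
            raise ValueError(f"No speaker found in line: {raw!r}")
        lines += [f"#SPEAKER{speaker.strip()}", "", utterance.strip(), "", "#ENDSPEAKER", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(tag_speakers(
        "Focus group 1",
        "Moderator: A warm welcome to our session\nParticipant 1: I have the following opinion",
    ))
```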

This function is especially useful for data that have been collected online and exported with the tool http://www.kernwert.com.