Notes on PDF Documents

PDF documents, like normal text documents, can be imported into a MAXQDA project and coded. However, some special considerations apply, because the PDF format was designed not for text editing but as a layout format for printing:

Saving PDF files outside the MAXQDA project file

By default, PDF files smaller than 5 MB are saved in the project file when they are imported. PDF files larger than 5 MB are not saved in the MAXQDA project itself but in the folder for externally saved files; the project stores only a reference to the externally saved data. You can customize the maximum file size and the location for externally saved files under Project > Preferences … (Windows) or MAXQDA 12 > Preferences … (Mac).

Tip: If you are working with many large PDF files (e.g. with a total size of more than 1 GB), it makes sense to store them externally so that the MAXQDA project file remains small and can easily be backed up. For optimal performance, externally saved files should be located on the local hard disk rather than on a network drive where possible, although faster networks make this less and less of a problem.
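To estimate in advance how a folder of PDFs would be handled under the default settings, you could total the file sizes yourself before importing. The following is a minimal sketch using only the Python standard library; it is not part of MAXQDA. The function name `summarize_pdfs` is hypothetical, and it assumes that 1 MB means 1024 × 1024 bytes and that a file of exactly 5 MB counts as "larger", neither of which is specified here.

```python
from pathlib import Path

# Default threshold described above. Assumptions: 1 MB = 1024 * 1024 bytes,
# and a file of exactly 5 MB is treated as external (not documented by MAXQDA).
DEFAULT_LIMIT_BYTES = 5 * 1024 * 1024


def summarize_pdfs(folder, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return (total_bytes, embedded_count, external_count) for the *.pdf files
    directly inside `folder`. Hypothetical helper, not a MAXQDA function."""
    total = embedded = external = 0
    # Note: the "*.pdf" pattern is case-sensitive on most Linux file systems.
    for path in Path(folder).glob("*.pdf"):
        size = path.stat().st_size
        total += size
        if size < limit_bytes:
            embedded += 1  # would be saved inside the project file
        else:
            external += 1  # would go to the folder for externally saved files
    return total, embedded, external
```

For example, `summarize_pdfs("C:/Data/Interviews")` returns the combined size in bytes together with the number of files that would be embedded and stored externally. If you have changed the maximum file size in the preferences, pass the new value as `limit_bytes`.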

Text and image segments in PDF documents can be coded with the mouse: select the desired segment or drag a frame around it, then code the selection. With regard to code frequencies, MAXQDA does not distinguish between text codings and image codings. However, when the Coding Query searches for overlaps, it searches for overlaps (intersections) among text segments and among image segments separately; overlaps between a text segment and an image segment are ignored. The "Near" function always returns a result of 0 for image segments, both in the Complex Coding Query and in the Code Relations Browser.

If a PDF file contains a scanned text, text recognition (Optical Character Recognition, OCR) must be carried out beforehand. Only then can the text be selected and coded; otherwise, only image segments can be marked.

Multipage Coding

Since MAXQDA 12, text and image segments that span multiple pages can be coded without being split. If the project is transferred to MAXQDA 11, these segments are split again at the page breaks, and the number of coded segments may increase.

Absence of paragraphs in PDF files

Unlike text documents, PDF documents have no paragraph structure as such. MAXQDA functions that rely on the paragraph structure therefore cannot be used with PDF documents. These include, among others, automatic coding with the parameters "Sentence" or "Paragraph", as well as the "Near" function for segments in the Complex Coding Query and the Code Relations Browser.

Navigating the Document Browser

As soon as a PDF document is displayed in the Document Browser, additional icons appear in its toolbar. You can use them to page forward and backward, adjust the zoom level, and navigate via bookmarks (many PDF files contain bookmarks, e.g. one per chapter).

Tip: If you want to display a PDF in MAXQDA that contains data in form fields, print the document to a new PDF before importing it. Otherwise, the data in the form fields may not be displayed.
