
PDF Documents

Importing PDF documents

You can import PDF documents into MAXQDA in either of two ways:

  • by clicking the Import Documents icon in the "Document System", or
  • by clicking on the Documents icon on the Import tab.

Text from a PDF document as a separate text document

After a PDF document has been imported into a MAXQDA project, you can extract its text. Images and formatting are ignored; only the plain text is inserted as a new text document in the "Document System":

Click on a PDF document in the "Document System" and select the function Insert PDF text as a new document. The new text appears directly below the clicked document.

Tip: With many PDF texts, the conversion makes it possible to search within paragraphs when conducting a lexical search.

Working with PDF documents

There are some special considerations when working with PDF documents. The PDF format was not conceived for text editing but as a layout format for printing, and PDF files are therefore much larger than simple text documents.

Tip: PDF documents may contain pages in both portrait and landscape orientation. MAXQDA displays all pages of a PDF document in the same orientation as the first page. If your document contains mixed page orientations, you may need to rotate the pages into the same orientation before importing the document into MAXQDA.
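
Before importing, it can help to find out which pages deviate from the first page's orientation. The following Python sketch is not part of MAXQDA. It assumes you have already read the page sizes (width and height, in any unit) with a PDF tool of your choice, and the function names are hypothetical:

```python
def orientation(width, height):
    """Classify a page as 'landscape' or 'portrait' from its dimensions."""
    # Assumption: square pages are treated as portrait.
    return "landscape" if width > height else "portrait"


def pages_differing_from_first(page_sizes):
    """Return 1-based page numbers whose orientation differs from page 1.

    `page_sizes` is a list of (width, height) tuples, one per page.
    """
    if not page_sizes:
        return []
    first = orientation(*page_sizes[0])
    return [
        number
        for number, (width, height) in enumerate(page_sizes, start=1)
        if orientation(width, height) != first
    ]
```

The pages returned by this check are the candidates to rotate in a PDF editor before the import.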

Saving PDF files outside the MAXQDA project file

By default, all PDF files smaller than 5 MB are saved in the project file when they are imported. PDF files larger than 5 MB are not saved in the MAXQDA project itself but in the folder for externally saved files; the project then contains only a reference to the externally saved file. You can customize the maximum file size as well as the location for externally saved files in MAXQDA’s preferences, which you can access via the gear symbol in the top right corner of MAXQDA.
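
To estimate in advance which files in a folder would end up inside the project and which would be stored externally, you can compare their sizes against the limit. The following Python sketch only illustrates the size rule described above and does not interact with MAXQDA. It assumes that "5 MB" means 5 × 1024 × 1024 bytes and that a file of exactly the limit size is stored externally; neither detail is specified here, and the function name is hypothetical:

```python
from pathlib import Path

# Assumption: 5 MB is read as 5 * 1024 * 1024 bytes.
DEFAULT_LIMIT_BYTES = 5 * 1024 * 1024


def split_by_size(folder, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return (embedded, external) lists of the PDF paths found in `folder`."""
    embedded, external = [], []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # Files below the limit would be saved in the project file;
        # all others would go to the folder for externally saved files.
        if pdf.stat().st_size < limit_bytes:
            embedded.append(pdf)
        else:
            external.append(pdf)
    return embedded, external
```

If you change the maximum file size in the preferences, pass the same value as `limit_bytes` to keep the estimate in line with your settings.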

Tip: If you are working with many large PDF files (e.g. with a total size of more than 1 GB), it makes sense to store them externally so that the MAXQDA project file remains small and can be backed up easily. For optimal performance, externally saved files should be located on the local hard disk and, if possible, not on a network, although increasing network speeds mean that this poses less and less of a problem.

Coding text and image segments

Text and image segments in PDF documents can be coded with the mouse. Select the desired text, or draw a frame around the desired image area, and then code the selection. MAXQDA does not distinguish between text and image codings with regard to code frequency. In the Coding Query, however, a search for overlap looks for overlaps/intersections among text segments and among image segments separately; overlap between a text segment and an image segment is ignored. The “Near” function for image segments always returns a result of 0, both in the Complex Coding Query and in the Code Relations Browser.

If a text is available only as a scanned PDF file, Optical Character Recognition (OCR), a text recognition process, must be carried out beforehand. This makes it possible to select and code the text; otherwise, only image segments can be selected.

Absence of paragraphs in PDF files

PDF documents, unlike text documents, have no paragraph structure per se. MAXQDA functions that rely on the paragraph structure can therefore not be used in PDF documents. These functions include, among others, automatic coding with the parameters “Sentence” or “Paragraph”, as well as the “Near” function for segments in the Complex Coding Query and Code Relations Browser.

Navigating through the "Document Browser"

As soon as a PDF document is displayed in the Document Browser, several clickable icons will appear in the toolbar. You can flip forward and backward, adjust the zoom and use the bookmarks for navigation (many PDF files have several bookmarks, e.g. one per chapter).

Tip: MAXQDA does not support editable PDF form fields. To display content from PDF forms, save your PDF document via a PDF printer as a new PDF file that contains the contents of the form fields as pure text.
