PDF Documents

Importing PDF documents

You can import PDF documents into a MAXQDA project in various ways:

  • drag and drop PDF files from Windows Explorer or the macOS Finder directly into the “Document System” window,
  • click the Import Documents icon in the “Document System”, or
  • click the Texts, PDFs, Tables icon on the Import tab.

[Figure: Importing data via the “Import” tab]

For general information on importing files and organizing them, see Import and Group your Data.

Color highlighting and comments

Color highlighting that already exists in a PDF document is converted into regular coded segments during import into MAXQDA. A parent code named "Word/PDF highlighting" is created at the top level of the code system. For each color, a subcode with the color's English name is created and assigned to the corresponding text passages.

Note: When color highlighting is imported, the colors may deviate slightly from the original, because MAXQDA selects the best-matching color from a list of stored colors.
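
The best-fit selection mentioned in the note can be pictured as a nearest-color lookup. The following Python sketch is only an illustration: the palette, the color names, and the squared RGB distance are assumptions made for this example and are not MAXQDA's actual color list or matching rule.

```python
# Hypothetical palette for illustration only; MAXQDA's stored colors
# and its matching rule are not documented in this article.
PALETTE = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "magenta": (255, 0, 255),
}


def nearest_color_name(rgb, palette=PALETTE):
    """Return the palette name with the smallest squared RGB distance to rgb."""
    r, g, b = rgb

    def distance(name):
        pr, pg, pb = palette[name]
        return (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2

    return min(palette, key=distance)


# A slightly off-yellow highlight is mapped to the stored "yellow".
print(nearest_color_name((250, 240, 20)))
```

This is why a highlight in an unusual shade can end up under a subcode whose color differs slightly from the one in the original PDF.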

Existing comments in the PDF document are imported as in-document memos at the same position in the document and can be displayed in the "Document Browser". Related comments, such as those in a thread, are combined into a single memo.

The option to code highlighted text and import comments as in-document memos during import can be turned on or off in the settings of the “Document System”. To open the settings, click the gear icon in the toolbar.

Navigating through the “Document Browser”

As soon as a PDF document is displayed in the “Document Browser”, several clickable icons appear in the toolbar. You can jump directly to a page, page forward and backward, adjust the zoom, and use bookmarks for navigation (many PDF files contain several bookmarks, e.g., one per chapter).

[Figure: Navigation toolbar]

Tip: MAXQDA does not support editable PDF form fields. To display content from PDF forms, save your PDF document via a PDF printer as a new PDF file that contains the contents of the form fields as pure text.

Working with PDF documents

The PDF format was not designed for text editing but as a layout format for printing, and PDF files are therefore usually much larger than simple text documents. This leads to some special considerations when working with PDF documents.

Saving PDF files outside the MAXQDA project file

By default, all PDF files smaller than 5 MB will be saved in the project file upon insertion. PDF files larger than 5 MB are not saved in the MAXQDA project itself, but rather in the folder for externally saved files, and only a reference to the externally saved data is created. You can customize the maximum file size as well as the location for externally saved files in MAXQDA’s preferences, which you can access via the gear symbol in the top right corner of MAXQDA's main window.

For detailed information, see External Files.

Tip: If you are working with many large PDF files (e.g., with a total size of more than 50 MB), it makes sense to store them externally so that the MAXQDA file remains small and can be easily backed up. For optimal performance, it is recommended that externally saved files are located on the local hard disk. The use of a network drive can also work if the connection is very fast.
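
To estimate in advance which files would be stored externally under the default limit, a small script can help. The following Python sketch is an assumption-laden illustration, not part of MAXQDA: the function names are hypothetical, 1 MB is taken as 1024 × 1024 bytes, and a file of exactly the limit size is treated as external because the text above does not specify that case.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: MAXQDA's exact definition of "MB" is not stated


def pdf_sizes(folder):
    """Map each PDF file name in folder to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(folder).glob("*.pdf")}


def plan_storage(sizes, embed_limit_mb=5):
    """Split {name: bytes} into (embedded, external) lists by the size limit.

    Files below the limit would be saved in the project file; all others
    would go to the folder for externally saved files.
    """
    limit = embed_limit_mb * MB
    embedded = sorted(name for name, size in sizes.items() if size < limit)
    external = sorted(name for name, size in sizes.items() if size >= limit)
    return embedded, external


def total_mb(sizes):
    """Total size of all files in MB, e.g. to compare against the 50 MB tip."""
    return sum(sizes.values()) / MB


if __name__ == "__main__":
    sizes = pdf_sizes(".")  # current folder, purely as an example
    embedded, external = plan_storage(sizes)
    print("In project file:", embedded)
    print("External:", external)
    print(f"Total: {total_mb(sizes):.1f} MB")
```

If the reported total is well above 50 MB, lowering the limit in the preferences so that more files are stored externally may be worth considering.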

Coding text and image segments

Text and image segments in PDF documents can be coded with the mouse. Select the desired text, or draw a frame around the desired image area, and then code the selection. MAXQDA does not distinguish between text and image codings when counting code frequencies. When searching for overlaps, however, the Coding Query evaluates text segments and image segments independently, so an overlap between a text segment and an image segment is ignored. The “Near” function always returns 0 for image segments, both in the Complex Coding Query and in the Code Relations Browser.
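
The overlap rule can be summarized in a short Python sketch. It is a simplified model for illustration only: the classes, the character offsets for text segments, and the rectangle coordinates for image segments are assumptions, not MAXQDA's internal data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSegment:
    start: int  # hypothetical character offset, inclusive
    end: int    # hypothetical character offset, exclusive


@dataclass(frozen=True)
class ImageSegment:
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a, b):
    """Return True only if two segments of the same kind overlap."""
    if isinstance(a, TextSegment) and isinstance(b, TextSegment):
        return a.start < b.end and b.start < a.end
    if isinstance(a, ImageSegment) and isinstance(b, ImageSegment):
        return (
            a.page == b.page
            and a.x0 < b.x1 and b.x0 < a.x1
            and a.y0 < b.y1 and b.y0 < a.y1
        )
    # Mixed text/image pairs are never counted as overlapping.
    return False


print(overlaps(TextSegment(0, 10), TextSegment(5, 15)))
print(overlaps(TextSegment(0, 10), ImageSegment(1, 0, 0, 50, 50)))
```

In this model, the first call reports an overlap and the second does not, even if the image frame happens to cover the same area of the page as the text.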

If a text is only available as a scanned PDF file, Optical Character Recognition (OCR), a text recognition process, must be carried out with a suitable program before the file is imported into MAXQDA. This makes it possible to select and code the text in MAXQDA later. Without OCR, only image segments can be selected.

Paragraphs in PDF files

PDF documents, unlike text documents, have no paragraph structure per se. MAXQDA therefore tries to recognize paragraphs in PDF documents based on various criteria, so that, for example, the functions for finding words within a paragraph or autocoding paragraphs can be used.

Paragraph detection works very well in most PDF documents, but please consider the following limitations:

  • For MAXQDA, there are no paragraphs across page boundaries in PDF documents. This means that even if the content of a paragraph continues on the next page, for MAXQDA the paragraph ends at the end of the page.
  • Footnote characters in the text may be recognized as the end of a paragraph.
  • The quality of paragraph recognition depends on how the PDF was created and what structure it has. In PDF documents created from scanned text using OCR text recognition, the quality of paragraph recognition will be worse than in PDF documents created directly from Word.
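
The page-boundary limitation can be illustrated with a deliberately simple Python sketch. It assumes that each page is available as plain text and that paragraphs are separated by blank lines; MAXQDA's actual detection uses various criteria that are not described here, so this is not its algorithm.

```python
import re


def split_paragraphs(pages):
    """Split page texts into (page_number, paragraph) pairs.

    Paragraphs are detected per page (here: blocks separated by blank
    lines), so a paragraph never continues across a page boundary.
    """
    result = []
    for page_number, text in enumerate(pages, start=1):
        for block in re.split(r"\n\s*\n", text):
            paragraph = " ".join(block.split())  # normalize whitespace
            if paragraph:
                result.append((page_number, paragraph))
    return result


pages = [
    "First paragraph.\n\nThis sentence continues",
    "on the next page.\n\nLast paragraph.",
]
for page_number, paragraph in split_paragraphs(pages):
    print(page_number, paragraph)
```

The sentence that runs from page 1 to page 2 ends up as two separate paragraphs, which is the behavior described in the first limitation above. A search for two words "within a paragraph" would therefore miss a hit if the words fall on different pages.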

Extract text from a PDF document and save as text document

After a PDF document has been imported into a MAXQDA project, you can extract its text. Images and formatting are ignored; only the plain text is inserted as a new text document in the “Document System”.

Click on one or multiple PDF documents in the “Document System” and select the function Insert PDF Text as New Document. The new text appears directly below the clicked document.

[Figure: Import a PDF as a new document]

Tip: With many PDF texts, the conversion makes it possible to search within paragraphs when conducting a lexical search.
