Importing PDF documents
You can import PDF documents into MAXQDA in either of the following ways:
- by clicking the Import Documents icon in the “Document System”, or
- by clicking the Documents icon on the Import tab.
Inserting text from a PDF document as a separate text document
After a PDF document has been imported into a MAXQDA project, you can extract its text. Images and formatting are ignored; only the plain text is inserted as a new text document in the “Document System”:
Click on a PDF document in the “Document System” and select the function Insert PDF text as a new document. The new text appears directly below the clicked document.
Working with PDF documents
There are some special considerations when working with PDF documents. The PDF format was not conceived for text editing but rather as a layout format for printing, and PDF files are therefore much larger than simple text documents.
Saving PDF files outside the MAXQDA project file
By default, all PDF files smaller than 5 MB are saved in the project file when they are inserted. PDF files larger than 5 MB are not saved in the MAXQDA project itself but in the folder for externally saved files; the project stores only a reference to the externally saved file. You can customize the maximum file size as well as the location for externally saved files in MAXQDA’s preferences, which you can access via the gear symbol in the top right corner of MAXQDA.
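The size rule above can be sketched in a few lines of Python. This is only an illustration of the decision, not MAXQDA's actual implementation: the function name `decide_storage`, the folder name `external_files`, the 1024-based megabyte, and the treatment of a file of exactly 5 MB as external are all assumptions.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: the manual does not say how a megabyte is counted


@dataclass
class StorageDecision:
    embedded: bool   # True if the PDF is saved inside the project file
    location: str    # where the PDF data ends up


def decide_storage(size_bytes: int,
                   max_embedded_mb: float = 5.0,
                   external_folder: str = "external_files") -> StorageDecision:
    """Hypothetical helper mirroring the rule described in the text."""
    # Files below the limit are saved in the project file itself.
    if size_bytes < max_embedded_mb * MB:
        return StorageDecision(embedded=True, location="project file")
    # Larger files go to the external folder; the project keeps a reference.
    # Assumption: a file of exactly the limit is treated as external.
    return StorageDecision(embedded=False, location=external_folder)


small = decide_storage(2 * MB)
large = decide_storage(8 * MB)
print(small.embedded, small.location)
print(large.embedded, large.location)
```

Raising `max_embedded_mb` corresponds to increasing the maximum file size in the preferences, and `external_folder` corresponds to the configurable location for externally saved files.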
Coding text and image segments
Text and image segments in PDF documents can be coded with the mouse. Select the desired text, or draw a frame around the desired image area, and then code the selection. MAXQDA does not distinguish between text codings and image codings when counting code frequencies. When the Coding Query searches for overlap, however, it looks for overlap/intersection among text segments and among image segments independently; overlap between a text segment and an image segment is ignored. The “Near” function always returns a result of 0 for image segments, both in the Complex Coding Query and in the Code Relations Browser.
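The overlap rule can be pictured with a small Python sketch. How MAXQDA represents coded segments internally is not described here, so the `Segment` class, its fields, and the `overlaps` function are hypothetical; the sketch only shows that segments of different kinds are never reported as overlapping.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Segment:
    code: str
    kind: str                  # "text" or "image"
    page: int
    # Assumed layout: text -> (start, end) character positions,
    # image -> (x0, y0, x1, y1) frame coordinates on the page.
    bounds: Tuple[float, ...]


def overlaps(a: Segment, b: Segment) -> bool:
    """Return True only if both segments are of the same kind and intersect."""
    # Overlap between a text segment and an image segment is ignored.
    if a.kind != b.kind or a.page != b.page:
        return False
    if a.kind == "text":
        return a.bounds[0] < b.bounds[1] and b.bounds[0] < a.bounds[1]
    ax0, ay0, ax1, ay1 = a.bounds
    bx0, by0, bx1, by1 = b.bounds
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


text_a = Segment("Motivation", "text", 1, (100, 250))
text_b = Segment("Barriers", "text", 1, (200, 300))
image_a = Segment("Chart", "image", 1, (0, 0, 50, 50))
image_b = Segment("Legend", "image", 1, (40, 40, 90, 90))

print(overlaps(text_a, text_b))    # both text, ranges intersect
print(overlaps(image_a, image_b))  # both image, frames intersect
print(overlaps(text_a, image_a))   # mixed kinds are never compared
```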
If a text is available only as a scanned PDF file, Optical Character Recognition (OCR), a text recognition process, must be carried out beforehand. This process makes it possible to mark and code the text; otherwise, only image segments could be marked.
Absence of paragraphs in PDF files
PDF documents, unlike text documents, have no paragraph structure per se. MAXQDA functions that rely on the paragraph structure therefore cannot be used in PDF documents. These functions include, among others, automatic coding with the parameters “Sentence” or “Paragraph”, as well as the “Near” function for segments in the Complex Coding Query and the Code Relations Browser.
Navigating through the “Document Browser”
As soon as a PDF document is displayed in the Document Browser, several clickable icons will appear in the toolbar. You can flip forward and backward, adjust the zoom and use the bookmarks for navigation (many PDF files have several bookmarks, e.g. one per chapter).