MAXQDA Tip of the Month: Converting PDF files into text documents

Have you ever wished to make changes to the text of a PDF document while working on a project? With the “Insert PDF Text as New Document” function you can do exactly that without having to leave your MAXQDA project. All you need to do is right-click on the PDF document that you wish to convert into a text document, select “Insert PDF Text as New Document” from the context menu and MAXQDA will automatically create a new text document that contains all the text from the PDF file. In the resulting document images and formatting are ignored; only the plain text is inserted as a new text document in the “Document System”.

Screenshot from MAXQDA showing the convert PDF to text dialog

To create a new text document containing all the text from a PDF document click “Insert PDF Text as New Document”

With many PDF texts, the conversion makes it possible to search within paragraphs when conducting a lexical search. And as with all text documents, you can then go ahead and edit the text to your liking: you can adjust the font, font size, make the text bold or italic, create lists, change indent and a lot more. Of course, you can also add text or delete segments if you wish! This text document can then be exported as well, so that you can also edit it in other text processing software.

Screenshot from MAXQDA showing the created text document in the Document System

The new document will then appear right next to the original PDF.

The whole thing works using OCR (optical character recognition) technology: this means that the text contents of a PDF file are recognised so that the computer treats the individual characters as they would with a typical text document. Most PDF files that were created digitally have OCR, but beware: scans of texts might not have it. A good way to check whether or not a text has OCR is to simply highlight a segment in the document browser in MAXQDA. If you can do that, the individual characters have been recognised; if you’re only able to highlight segments as you would in images this might indicate that there is no OCR in your PDF file. But don’t worry, you can easily add OCR to your PDF files using free online tools.

Screenshot of the plain text document as converted from the PDF

The new text document contains all the text of the PDF document, images and formatting are ignored.

For further information on PDF documents and text recognition have a look at this tutorial video on our YouTube channel:

Notes on PDF documents in the MAXQDA Manual

MAXQDA Newsletter

Our research and analysis tips, straight to your inbox.