MAXQDA

How do you automatically code only certain sections of PDFs?


16.06.2022, 16:01

I want to use the dictionary and autocoding functions to code a large number of words across 10-20 large PDFs. The PDFs are mostly text, but they also include lots of tables and some images.

However, most of these PDFs have headers or footers that include the title or other repeated keywords. I also want to leave certain sections out, like the table of contents or the bibliography. I've experimented and searched, but I can't find any way to do this. Is it simply not possible?

The only other option I can think of is to extract all of the text and then edit out the parts I don't want, but that is of course quite tedious and time-consuming. I would also prefer to keep the documents as PDFs so that I have a better idea of where the text occurs in each document.

Is it generally better to just use pre-cleaned text? Even when I try to clean the text in Word, there are lots of formatting irregularities that I have to fix, and I assume these would be carried over to MAXQDA, right? For example, sentences that are abruptly split by paragraph breaks.

Any tips would be greatly appreciated!

Version: MAXQDA 2022
System: Mac OS X 12.x (Monterey)
libraryBook
 
Posts: 3
Joined: 16.06.2022, 13:35

Re: How do you automatically code only certain sections of P

16.06.2022, 16:03

Oh, I should also add that I've thought of just manually going through and deleting each coded instance that I don't want, but this isn't a great solution either, because I'm likely to add more words later and have to go through the whole process again. I also read on a similar topic that this wouldn't fix the word count issue, only the code count.
libraryBook

Re: How do you automatically code only certain sections of P

17.06.2022, 12:38

Hello,

At the moment there is no function for selectively analyzing a PDF document, that is, for excluding certain passages so that they are not found when, for example, you autocode using the dictionary. We will forward this idea as a feature request to our development team - the colleagues there regularly discuss feature requests from our customers.

But maybe this workaround works for you: you can always convert PDF documents into editable text documents in MAXQDA by right-clicking on the PDF document in the Document System and selecting "Insert PDF Text as New Document". MAXQDA will then create a new text document that contains only the text elements of the PDF file. You can edit this document in MAXQDA by opening it in the Document Browser and activating Edit mode, for example to remove redundant or irrelevant passages. You can then analyze the edited document using your dictionaries. The original PDF document stays intact, so you can always return to it for reference.

Further information can be found here: https://www.maxqda.com/blogpost/convert-pdf-text
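
If the repeated headers and footers are the main nuisance, part of the manual clean-up could in principle be scripted outside MAXQDA before the text is imported. The following is only a rough Python sketch and not a MAXQDA feature. It assumes you already have the text of each page as a separate string (how you export that is not covered here), and the function name `strip_repeated_lines` and its parameters `min_share` and `edge` are made up for illustration.

```python
from collections import Counter


def strip_repeated_lines(pages, min_share=0.6, edge=2):
    """Drop lines that recur at the top or bottom of most pages.

    pages: list of strings, one per page (assumed input format).
    min_share: share of pages a line must appear on to be treated
        as a running header or footer.
    edge: number of lines at the top and bottom of each page to inspect.
    """
    # Split each page into non-empty, whitespace-trimmed lines.
    split_pages = [
        [ln.strip() for ln in page.splitlines() if ln.strip()]
        for page in pages
    ]

    # Count on how many pages each edge line occurs.
    counts = Counter()
    for lines in split_pages:
        counts.update(set(lines[:edge]) | set(lines[-edge:]))

    # A line must occur on at least two pages, and on min_share of all pages.
    threshold = max(2, min_share * len(pages))
    repeated = {ln for ln, n in counts.items() if n >= threshold}

    cleaned = []
    for lines in split_pages:
        kept = [
            ln
            for i, ln in enumerate(lines)
            # Only remove a repeated line where it sits at a page edge.
            if not (ln in repeated and (i < edge or i >= len(lines) - edge))
        ]
        cleaned.append("\n".join(kept))
    return cleaned


# Made-up sample pages with a running title and a running footer.
pages = [
    "Annual Report 2021\nIntroduction text on page one.\nExample Institute",
    "Annual Report 2021\nMethods text on page two.\nExample Institute",
    "Annual Report 2021\nResults text on page three.\nExample Institute",
]
cleaned = strip_repeated_lines(pages)
```

Note that this only catches lines that are identical from page to page. Footers that change, such as page numbers, as well as whole sections like the table of contents or the bibliography, would still have to be removed by hand in Edit mode.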

All the best on behalf of the MAXQDA support team,

Alex
Alex_EN
 
Posts: 15
Joined: 10.02.2022, 13:50

Re: How do you automatically code only certain sections of P

17.06.2022, 16:47

Got it, thanks so much for the quick reply! And good to hear that the team would consider this. For now, I'm basically doing what you suggested by converting them to text and cleaning them up before analysis.
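
In case it helps anyone else with the split-sentence problem I mentioned: a small script can rejoin most of the broken lines before import. This is just a rough Python sketch under a simple assumption, namely that a line which does not end in `.`, `!`, `?` or `:` was cut off mid-sentence and belongs with the next line. The function name `rejoin_broken_paragraphs` is my own and has nothing to do with MAXQDA.

```python
# Assumption: a line ending in one of these characters finishes a sentence.
SENTENCE_END = (".", "!", "?", ":")


def rejoin_broken_paragraphs(text):
    """Merge lines that were split mid-sentence into single paragraphs."""
    paragraphs = []
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Skip blank lines; they may sit in the middle of a sentence.
            continue
        if current.endswith("-") and line[0].islower():
            # Word hyphenated across the break: glue the halves together.
            current = current[:-1] + line
        elif current:
            current = current + " " + line
        else:
            current = line
        if current.endswith(SENTENCE_END):
            paragraphs.append(current)
            current = ""
    if current:
        # Keep any trailing text that never reached a sentence end.
        paragraphs.append(current)
    return "\n".join(paragraphs)


# Made-up sample with a hyphenated word and an abrupt paragraph break.
sample = (
    "The interviews were con-\n"
    "ducted in 2021 and\n"
    "\n"
    "transcribed verbatim.\n"
    "All names were changed."
)
result = rejoin_broken_paragraphs(sample)
```

The heuristic has obvious limits: headings without closing punctuation get merged into the following text, every line that ends a sentence becomes its own paragraph, and a genuine compound that happens to break at its hyphen loses the hyphen. So the result still needs a quick read-through.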
libraryBook
