Please note: We have moved the forum to our new self-service portal. You are welcome to start a discussion in the new MAXQDA forum!

Adding and editing posts is no longer possible in this forum.

16.06.2022, 16:01

I want to use the dictionary and autocoding functions to code a large number of words more easily across 10-20 large PDFs, which are mostly text but also include lots of tables and some images.

However, most of these PDFs have headers or footers that include the title or other repeated keywords. I also want to leave certain sections out, like the table of contents or the bibliography. I've tried and searched, but I can't find any way to do this. Is it simply not possible?

The only other option I can think of is to extract all of the text and then edit out the parts I don't want, but that is of course quite tedious and time-consuming. I would also prefer to keep them as PDFs to maintain a better idea of where the text occurs in the document.

Is it generally better to just use pre-cleaned text? Even when trying to clean the text in Word, there are lots of formatting irregularities that I have to fix, and I assume these would be carried over to MAXQDA, right? For example, sentences that are abruptly split by paragraph breaks.
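For the split-sentence problem specifically, a small script run on the exported text before importing it can take care of most cases. The following is only a rough Python sketch, not a MAXQDA feature: the function name `rejoin_broken_paragraphs` and the list of sentence-ending characters are assumptions made for illustration, and the heuristic (merge a paragraph into the previous one when the previous one has no closing punctuation and the next starts with a lowercase letter) will miss some cases and should be checked by eye.

```python
import re

# Characters assumed to mark a complete sentence or heading line (illustrative choice).
SENTENCE_END = (".", "!", "?", ":", '"', "\u201d", ")")

def rejoin_broken_paragraphs(text: str) -> str:
    """Merge paragraphs that look like they were split mid-sentence."""
    # Paragraphs are taken to be blocks separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    merged = []
    for para in paragraphs:
        # Collapse line breaks and repeated spaces inside the paragraph.
        para = " ".join(para.split())
        if merged and not merged[-1].endswith(SENTENCE_END) and para[0].islower():
            # Previous paragraph has no closing punctuation and this one
            # continues in lowercase: treat it as the same sentence.
            merged[-1] = merged[-1] + " " + para
        else:
            merged.append(para)
    return "\n\n".join(merged)

sample = "The results show that\n\nthe effect is small.\n\nA new paragraph starts here."
print(rejoin_broken_paragraphs(sample))
```

The cleaned text can then be saved as a plain text or Word file and imported in the usual way.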

Any tips would be greatly appreciated!

Version: MAXQDA 2022
System: Mac OS X 12.x (Monterey)

16.06.2022, 16:03

Oh, I should also add that I've thought of just manually going through and deleting each coded instance that I don't want, but this isn't a great solution either, because I'm likely to add more words later and have to go through the whole process again. I also read on a similar topic that this wouldn't fix the word count issue, only the code count.

17.06.2022, 12:38

Hello,

A function that enables selective analysis of a PDF document, so that certain passages are excluded from the analysis and won't be found (for example, when you autocode using the dictionary), does not exist at the moment. We will forward this idea as a feature request to our development team; our colleagues there discuss feature requests from our customers on a regular basis.

But maybe this workaround works for you: you can always convert PDF documents into editable text documents in MAXQDA by right-clicking on the PDF document in the Document System and selecting "Insert PDF Text as New Document". MAXQDA will then create a new text document that only contains the text elements of the PDF file and you can easily edit the document in MAXQDA by opening it in the Document Browser and activating Edit mode - for example to remove redundant or irrelevant passages. You could then go ahead and analyze this edited document using your dictionaries. The original PDF document stays intact so that you can always return to it for reference.

Further information can be found here: https://www.maxqda.com/blogpost/convert-pdf-text

All the best on behalf of the MAXQDA support team,

Alex

17.06.2022, 16:47

Got it, thanks so much for the quick reply! And good to hear that the team would consider this. For now, I'm basically doing what you suggested by converting them to text and cleaning them up before analysis.
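For anyone doing the same cleanup on many documents, removing the repeated headers and footers can be partly scripted before the text goes into MAXQDA. This is a minimal Python sketch under stated assumptions, not something MAXQDA provides: it assumes the text of each page is already available as a separate string (for example from an external PDF-to-text export), and the names `strip_repeated_lines`, `normalize`, and `min_share` are hypothetical. A line is treated as a header or footer if, ignoring digits such as page numbers, it appears on at least a given share of the pages.

```python
import re
from collections import Counter

def normalize(line: str) -> str:
    # Replace digit runs and collapse whitespace so that "Page 3" and
    # "Page 12" are counted as the same line.
    return re.sub(r"\d+", "#", " ".join(line.split())).lower()

def strip_repeated_lines(pages, min_share=0.6):
    """Drop lines that recur on at least min_share of all pages."""
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({normalize(l) for l in page.splitlines() if l.strip()})
    # Require at least two occurrences so a single page is never stripped bare.
    threshold = max(2, min_share * len(pages))
    repeated = {line for line, n in counts.items() if n >= threshold}
    cleaned = []
    for page in pages:
        kept = [l for l in page.splitlines() if normalize(l) not in repeated]
        cleaned.append("\n".join(kept))
    return cleaned

pages = [
    "Annual Report 2021\nFirst body text.\nPage 1",
    "Annual Report 2021\nSecond body text.\nPage 2",
    "Annual Report 2021\nThird body text.\nPage 3",
]
print(strip_repeated_lines(pages))
```

Sections such as the table of contents or bibliography would still have to be removed by hand or with a separate rule, since they do not repeat from page to page.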
