MAXDictio Word Combinations - excluding text

20.11.2019, 23:09

Looking for a way to exclude certain parts of a pdf, or alternatively select the text in a pdf that MAXDictio looks at.

The problem is that some webpages include copious tag phrases on many pages, and these get seen as frequent terms in MAXDictio.

I use the Web Collector to create pdf documents for selected pages, and then analyze using MAXDictio to generate a phrase cloud. This week I used a lot of pages from the same website. Because they include a section of tag words on each page, and repeat them, and also include a list of "other pages" with the same anchor text, MAXDictio is seeing that as frequent sequences, and the cloud is nonsense.

Other than going through a few hundred items and manually adding them to the Stop List, is there a way to instead select which part of the PDF document MAXDictio should search?

Version: MAXQDA 2020
System: Windows 10
Posts: 117
Joined: 28.02.2014, 19:38
Location: Washington DC & Denver Colorado

Re: MAXDictio Word Combinations - excluding text

26.11.2019, 10:32

Hi Matthew,

Thanks for the question. There is indeed a way to select the text in a PDF, at least for the word frequency features, although I don't know whether it is feasible in your case. In principle, you can use the "Only in retrieved segments" option to limit those features to segments coded with, for example, an "Actual content" code. However, this would require you to go through each page and code everything that is not a tag phrase. That can be done manually or, in theory, with the lexical search and regular expressions, which could target those sections if they begin and end with a unique string (e.g. one of those tag phrases?).
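To illustrate the "begin and end with a unique string" idea, here is a minimal sketch in plain Python, run on text outside MAXQDA. It is not MAXQDA's lexical search and does not use any MAXQDA API. The marker strings "Tags:" and "End of tags" are hypothetical, as is the function name; real pages would need their own markers.

```python
import re

# Hypothetical markers: assume every tag block starts with "Tags:"
# and ends with "End of tags". Replace with strings from the real pages.
START_MARKER = "Tags:"
END_MARKER = "End of tags"

# Non-greedy match from a start marker to the nearest end marker.
# re.DOTALL lets "." match line breaks, so multi-line blocks are covered.
BLOCK_PATTERN = re.compile(
    re.escape(START_MARKER) + r".*?" + re.escape(END_MARKER),
    flags=re.DOTALL,
)

def strip_marked_blocks(text: str) -> str:
    """Return text with every START_MARKER..END_MARKER block removed."""
    return BLOCK_PATTERN.sub("", text)

sample = (
    "Main article text about policy.\n"
    "Tags: budget, health, policy\nmore tags\nEnd of tags\n"
    "Closing paragraph."
)
cleaned = strip_marked_blocks(sample)
print(cleaned)
```

The same pattern only works when both markers are reliably present. A block with a start marker but no end marker is left untouched, which is the safer failure mode for analysis text.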

Alternatively, as you said, one would usually exclude those tag phrases via the stop list.
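For the stop list route, finding the candidates does not have to be done by eye. The sketch below, in plain Python outside MAXQDA, counts in how many documents each word pair occurs and reports the pairs present in most of them, on the assumption that repeated navigation and tag text shows up on nearly every page. The function names, the 80% threshold, and the sample texts are all made up for illustration, and how the resulting phrases are added to a MAXDictio stop list is not covered here.

```python
from collections import Counter

def ngrams(words, n):
    """Return the set of distinct n-word phrases in a list of words."""
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def stop_candidates(documents, n=2, min_share=0.8):
    """Return phrases that occur in at least min_share of the documents."""
    doc_freq = Counter()
    for doc in documents:
        words = doc.lower().split()
        # A set is used so each phrase counts once per document.
        doc_freq.update(ngrams(words, n))
    threshold = min_share * len(documents)
    return sorted(p for p, count in doc_freq.items() if count >= threshold)

# Made-up page texts: each ends with the same navigation phrase.
docs = [
    "budget cuts hit schools related pages home news",
    "new clinic opens downtown related pages home news",
    "council votes on parks related pages home news",
]
candidates = stop_candidates(docs)
print(candidates)
```

Counting documents rather than raw occurrences is a deliberate choice: a phrase that is frequent within one genuinely relevant page is not flagged, while a phrase repeated once on every page is.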

I hope this helps! If not, or in case of any other questions, please don't hesitate to contact us again.

Best regards on behalf of the MAXQDA support team,

MAXQDA Support Team
Andreas V.
Posts: 272
Joined: 13.04.2017, 16:23

Re: MAXDictio Word Combinations - excluding text

28.11.2019, 19:12

Thanks Andreas - yes, "Only Coded Segments" would give me that control and would work well enough.

The more I thought about this problem, the more concerned I was about how loads of tag words and other extraneous text in collected web pages could be diluting the text and reducing my construct validity.

Your solution will help reduce that risk.
