'Split' PDF Documents?

01.12.2015, 14:30

Hi there,

I'm performing a thematic analysis (or argumentative discourse analysis) on a corpus of LexisNexis newspaper data.

I have downloaded the newspaper articles as one document - one long PDF document with each new newspaper article on a separate page.

I have coded all the documents, but now that I am performing my analysis this may be a problem. My aim is to cross-reference individual newspaper articles with individual codings to find the most prominent framing techniques / codings in each article, and also to denote the overall tone of each article.

Thus, is there any way of 'splitting' the PDF document into individual articles? Or can the Document Sets function be used to resolve this issue?


Posts: 1
Joined: 25.11.2015, 18:37

Re: 'Split' PDF Documents?

02.12.2015, 16:27

Hi Kevin,

You are right: most analysis functions in MAXQDA, for example the variable functions, work best when you work with separate documents. It is therefore important to plan the structure of the document system with the later analysis in mind before you start importing and coding the data. This is usually not a big issue, because in most cases the data will already be split into separate files.

There is no really good solution once you have already imported and coded one huge PDF file, because MAXQDA cannot split a PDF document at the stage you are describing. There are freeware tools that split a PDF file outside of MAXQDA, but they won't help you here: the coding information cannot be split along with the file, so the codings would refer to the wrong pages after splitting.

The only workaround I see is to re-import the PDF several times and divide up the coded segments:

1. Import the PDF into the project several times, once for each group of news articles you need. It might help to think of a good aggregation level, e.g. years or months; this depends on the comparison you want to make. Name the newly imported documents after their groups so that you can distinguish between them.

Be sure to save the PDF file as an external file, otherwise the project file will get too large!

2. Transfer the coded segments from the original document to the first new document: right-click the coded document and choose "Export teamwork", then right-click the target document and choose "Import teamwork".

3. Open the "Overview of coded segments" for the new document and delete all coded segments from passages that don't belong to the group (the "page" column is a good indicator of what to delete). The document will then contain only the coded segments that belong to the group it is named after.

4. Repeat steps 2 and 3 for each new document.
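
To make step 3 less error-prone, it can help to write down the page range of each group first and then check every segment's page number against it. The following is only a minimal sketch of that bookkeeping, not a MAXQDA function: the group names, the page ranges and the list of page numbers are made-up placeholders that you would replace with the values from your own PDF and from the "page" column.

```python
# Hypothetical page ranges (inclusive) for each group, noted by hand from the PDF.
# These numbers are placeholders, not values from any real project.
GROUP_RANGES = {
    "2013": (1, 120),
    "2014": (121, 260),
    "2015": (261, 400),
}


def group_for_page(page, ranges=GROUP_RANGES):
    """Return the group whose inclusive page range contains `page`, or None."""
    for name, (first, last) in ranges.items():
        if first <= page <= last:
            return name
    return None


def pages_to_delete(segment_pages, keep_group, ranges=GROUP_RANGES):
    """Return the page numbers of coded segments that do not belong to `keep_group`."""
    return [p for p in segment_pages if group_for_page(p, ranges) != keep_group]


if __name__ == "__main__":
    # Page numbers as they might be read off the "page" column for one document.
    pages = [5, 130, 300]
    print(pages_to_delete(pages, keep_group="2014"))
```

The list that comes back tells you which rows to delete in the overview for the document of that group; the deletion itself still has to be done by hand in MAXQDA.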

I really hope this helps. The document sets function won't help at this point, but you may be able to use it after splitting up the documents.

Best regards on behalf of the MAXQDA team,

P.S.: If you are working with LexisNexis news data without pictures, consider importing it as text files rather than PDF files next time, because text documents offer more functionality than PDF documents.
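
If you go the text route, one long text export can be cut into one string per article before importing. This is only a sketch under a loud assumption: that every article in your export starts with a separator line of the form "3 of 250 DOCUMENTS". The pattern and the function name are hypothetical, so please check your own export and adjust the pattern before relying on it.

```python
import re

# Assumption: each article begins with a line such as "3 of 250 DOCUMENTS".
# Adjust this pattern to whatever separator your export actually uses.
DELIMITER = re.compile(r"^\s*\d+ of \d+ DOCUMENTS\s*$", re.MULTILINE)


def split_articles(text):
    """Split one long text export into a list of article strings."""
    parts = DELIMITER.split(text)
    # Drop empty pieces (e.g. anything before the first separator line).
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    sample = (
        "1 of 2 DOCUMENTS\n"
        "First article\n"
        "\n"
        "2 of 2 DOCUMENTS\n"
        "Second article\n"
    )
    for number, article in enumerate(split_articles(sample), start=1):
        print(number, article)
```

Each string could then be saved as its own text file, so that every article becomes a separate document on import.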
Dr Stefan Rädiker
Training, Coaching, Analysis for Research & Evaluation
MAXQDA Professional Trainer
Stefan Rädiker
Posts: 254
Joined: 12.02.2007, 10:18
